Visualising Premier League Penalty Rates
In the Premier League, penalties seem to have been awarded at an increasing rate over the last few years. I sourced data on the number of penalties given each season and their outcomes, and experimented with some visualisation techniques to display it. As I was experimenting, the style may be inconsistent between plots as I try out different methods.
The data for these visualisations was obtained from myfootballfacts (correct as of 23/02/2020) and contains the number of penalties given, whether they were given to the home or away team, and whether they were scored, missed or saved. This data is given for all Premier League seasons.
As of writing, only 249 matches have been played so far in the current Premier League season. Additionally, it should be noted that the Premier League had 22 teams up until (and not including) the 1995/96 season. With each team playing every other team home and away, this means a total of 462 matches (22 × 21) were played in each of those seasons, compared to the 380 (20 × 19) played in every season since.
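The number of matches in a season follows from the double round-robin format, in which each team hosts every other team once. A minimal sketch of that arithmetic, with a function name of my own choosing rather than anything from the original analysis:

```python
def matches_in_season(teams: int) -> int:
    """Matches in a double round-robin: each team hosts every other team once."""
    return teams * (teams - 1)


print(matches_in_season(22))  # 22-team seasons (before 1995/96)
print(matches_in_season(20))  # 20-team seasons (1995/96 onwards)
```

This is the denominator needed later when converting season totals into penalties per match.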
Firstly, I created a CSV file from the data in the table above. After this, I imported it into Python as a pandas DataFrame. Then, I created a basic horizontal bar plot to show the number of penalties given in each season. The season runs along the y-axis and the number of penalties given runs along the x-axis. This is just the ‘For’ (or ‘Against’) column from the table, which has been renamed.
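The load-and-plot step can be sketched roughly as follows. To keep the example dependency-free it uses the standard-library csv module and a text-based bar chart rather than pandas and a plotting library (with pandas, `pd.read_csv` followed by `DataFrame.plot.barh` would be the usual route). The column names 'Season' and 'Penalties' are assumptions, and the rows are invented placeholders, not the real figures.

```python
import csv
import io

# Placeholder rows: season labels and counts are invented for illustration,
# NOT the real figures. 'Season' and 'Penalties' are assumed column names.
CSV_TEXT = """Season,Penalties
Season A,60
Season B,75
Season C,90
"""


def load_penalties(text: str) -> dict[str, int]:
    """Parse CSV text into a {season: penalties} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Season"]: int(row["Penalties"]) for row in reader}


def text_barplot(counts: dict[str, int], width: int = 30) -> list[str]:
    """Render a crude horizontal bar chart, one line per season."""
    longest = max(counts.values())
    return [
        f"{season:<10}|{'#' * round(width * n / longest)} {n}"
        for season, n in counts.items()
    ]


penalties = load_penalties(CSV_TEXT)
print("\n".join(text_barplot(penalties)))
```

Each bar is scaled relative to the largest season total, so the longest bar always fills the full width.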
Next, I adjusted the x-axis to represent the number of penalties per match. This is simply the number of penalties given divided by the number of matches played in that season. In addition, since every bar now measures the same quantity, I added a gradient colour map that uses darker colours for higher values and lighter colours for lower values. This would not have been appropriate for the raw number of penalties per season, as those totals come from different numbers of games and are not directly comparable.
The rate at which penalties are given has clearly increased over time. This can be seen in the basic scatter plot below.
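As a rough sketch of this step, the per-match rate is a single division, and a gradient colour map only needs each rate scaled to the range 0 to 1. The season labels and penalty counts below are invented placeholders, and the helper names are my own.

```python
# Placeholder inputs: (penalties, matches) pairs are invented for illustration.
seasons = {
    "Season A": (60, 462),
    "Season B": (76, 380),
    "Season C": (95, 380),
}


def per_match(penalties: int, matches: int) -> float:
    """Penalties awarded per match played."""
    return penalties / matches


def normalise(values: dict[str, float]) -> dict[str, float]:
    """Min-max scale to [0, 1] so a colour map can shade from light to dark."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


rates = {season: per_match(p, m) for season, (p, m) in seasons.items()}
shades = normalise(rates)

for season in seasons:
    print(f"{season}: rate={rates[season]:.3f}, shade={shades[season]:.2f}")
```

A shade of 0 would map to the lightest colour and 1 to the darkest, matching the darker-is-higher convention described above.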
Next, I chose to investigate the distribution of penalties between home and away teams. In football, ‘home advantage’ is often spoken about, and it would be interesting to see whether it extends to the awarding of penalties. Given the nature of penalties and how they’re awarded, you might expect the ‘better’ team to receive more and, as a result, home teams to receive more penalties overall. I created another bar plot to show this split in each season.
This graph shows that the blue value (home penalties) is nearly always higher than the orange value (away penalties). To see if there is a trend in this data, I plotted the difference between home and away penalties in the graph below. Note that a negative value means more penalties were given to away teams than to home teams.
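The difference being plotted is simply home penalties minus away penalties for each season. A minimal sketch, using invented placeholder counts rather than the real data:

```python
# Placeholder (home, away) counts, invented for illustration only.
home_away = {
    "Season A": (40, 25),
    "Season B": (30, 34),
    "Season C": (45, 44),
}

# Positive: more home penalties. Negative: more away penalties.
differences = {season: home - away for season, (home, away) in home_away.items()}

# Seasons where away teams were awarded more penalties than home teams.
away_heavy = [season for season, diff in differences.items() if diff < 0]

print(differences)
print(away_heavy)
```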
This shows that in every season but one, more penalties have been awarded to home teams than to away teams. The size of the gap does not seem to follow a clear trend. The interesting thing to note is that the previous 3 seasons have had quite a small difference between home and away penalties. For the second half of the 2019/20 season and the 2020/21 season, matches have been (mostly) played behind closed doors, potentially limiting the ‘home advantage’. It will be interesting to see whether this small difference continues when crowds return, or whether it was due to the lack of crowds and the negation of the home advantage.
The final idea I wanted to investigate was whether the outcomes of penalties have changed over time. Have goalkeepers become better at saving them, given the increasing amount of data available on penalties and particular takers? A penalty kick has an xG value of 0.75 to 0.8, depending on the model. That is to say, between 75% and 80% of penalties are expected to be scored. I was interested to see how close to this value the historical scoring rate has been, or whether penalties were once scored at a different rate.
The above graph is a stacked bar chart showing the total number of penalties each season and their outcomes. It is clear that in early Premier League seasons, the number of penalties missed or saved was lower than it has been in recent years. However, those years also had fewer penalties given. To better see the proportion in which each outcome occurs, we can stack the bars by percentage of outcomes, so that each bar adds up to 100%.
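Converting the counts to percentages just means dividing each outcome by the season total. A small sketch of that normalisation, where the function name and the example counts are my own placeholders and not real figures:

```python
def outcome_percentages(scored: int, missed: int, saved: int) -> dict[str, float]:
    """Convert outcome counts for one season into percentages summing to 100."""
    total = scored + missed + saved
    if total == 0:
        raise ValueError("season has no penalties to normalise")
    return {
        "scored": 100 * scored / total,
        "missed": 100 * missed / total,
        "saved": 100 * saved / total,
    }


# Placeholder counts, invented for illustration only.
example = outcome_percentages(scored=80, missed=5, saved=15)
print(example)
```

Because every season is scaled by its own total, seasons with few penalties and seasons with many become directly comparable.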
This graph shows clearly that the percentage of each penalty outcome has changed. In the early Premier League seasons, approximately 90–95% of penalties were scored, compared to about 80% in recent seasons. The difference appears to have been taken up by saved penalties (as seen by the growth of the middle bar in the chart). This might suggest that goalkeepers have indeed become better at saving penalties.
As a visual aid, in the graph below, I have added a line at 80% to see how close the percentage of scored penalties is to the ‘average’ value.
Again, this highlights that in the early Premier League seasons, penalties were scored at a rate far above the rate at which they are known (or expected) to be scored today. Since around the 2003/04 season, the scoring rate has been approximately 80%, with some seasons higher and some lower. This suggests that the rate at which penalties are scored (or saved) has changed over time and appears to have settled around the 80% mark for the Premier League.
It would be an interesting extension to investigate the potential cause of this change: whether goalkeepers have become better, takers worse, or a combination of both. Is it due to an increase in the data available? It is now possible to see the outcome of every penalty a player has taken. Does this give the goalkeeper an advantage? One way of testing this would be to see whether leagues where this data might not be available have a different rate of penalty success.