It appears that Arsene Wenger is determined to turn Mesut Özil into a left winger; is this really the best use of a player with Özil's creative talents and vision?
In this very brief post, I will use heatmaps derived from Opta data to look at the locations where Özil has received the ball.
The following gif shows season-long heatmaps for each season from 2011/12 to 2014/15 (this includes the 4 league games Arsenal have played this season).
(click to open the gif in a larger window)
Özil location of passes received
Özil at Real Madrid
In 2011/12, Özil received passes all across the width of the pitch, but we can see that he most often received the ball on the right wing.
In 2012/13 it is noticeable that Özil played a more central position; this is witnessed by the nice continuous streak of red right across the width of the pitch. It was at this time that Wenger, as indeed most people were, was so impressed by Özil's skills that he decided to raid the Arsenal piggy bank and spend in the region of £42m on the German star.
Özil's time in London
In the 2013/14 season Wenger decided that he didn't need to play Özil in a central position; Özil shows virtually no heat in the central portions of the pitch as he received the ball towards either wing. Although there is a little orange on the left wing, it is clear that Özil did most of his work from the right.
In the current season we can see some heat across the pitch, but his main area for receiving the ball is clearly towards the left wing. I know the Premier League season is only 4 games old, but I wonder why Wenger has decided that his German should be shunted across to the left wing. Lest we forget, it wasn't performances from this position that made Wenger pay in excess of £40m for his talents.
Surely Wenger should be trying to see if he can fit Özil into central positions, or at the very least towards the right wing?
The Rabbit
My daughter got a pet rabbit a few months ago; she enjoys its company, but she now wants a dog. I don't want to get her a dog, as I know the rabbit will be cast aside once all her attention is diverted to her new pet. Surely Mesut Özil is too good to be the equivalent of my daughter's pet rabbit.
There were goals aplenty at Goodison on Saturday evening, but how did the shape of the teams change as the game ebbed and flowed? Here's how the game looked through our Player Positional Tracker.
(Click on the image to make it larger)
They say that Beauty is in the eye of the Beholder, but this is what I picked up from watching the viz:
1st Half
Game state undoubtedly played a big part in this (as Everton went behind in the first minute), but between the 5th and 20th minutes, the game was all about Everton pressure and Chelsea were happy to soak it up by sitting in their own half. Typical Mourinho. Unfortunately for Everton the scoreboard read 2-0 at this stage.
During this phase of Everton pressure Lukaku wasn’t able to get involved to any great extent
When Chelsea started to regain some territory advantage it was driven by Willian and Fabregas – both appeared towards the right side at this time (around the 30 minute mark)
In the run up to half time, Willian and Cesc dropped a little deeper and the attacking positions were taken up by Hazard, Costa and Ramires. During this first half Chelsea showed a great amount of attacking fluidity.
At the end of the half, Coleman was playing very advanced down the right side and Azpilicueta stayed back to mind the house
2nd Half
Everton were almost playing with three up top at the start of the second half, with Mirallas, Lukaku and Naismith all advanced, and central. Eto’o then came on to join this party later.
Ramires played deep during the opening 15 minutes of the second half. During this period he provided more defensive cover than did Matic.
Hazard was Chelsea’s attacking outlet in the second half
The final stages of the game saw Chelsea pretty much give up possession as they retreated into defence
In an article written at the end of July I introduced a metric which measured the intensity of a team’s high press.
That article can be found here, but in summary, this metric measured the number of passes that a team allowed in its attacking areas of the pitch per each attempted defensive action. A defensive action was defined as one of the following Opta events: a tackle, foul, interception or challenge. I used the term PPDA for this metric.
The lower the PPDA, the more aggressive the high press employed by a team.
The area of the pitch used in calculating the PPDA value was the zone with an Opta x co-ordinate greater than 40; it is the area to the right of the line in the image below.
Pitch image created by Valentin
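For readers who prefer code to prose, the PPDA calculation can be sketched in a few lines. This is a minimal illustration, not the article's exact implementation: the event records and field names are invented stand-ins for Opta data, and I assume each team's co-ordinates run 0 to 100 in its own attacking direction.

```python
# Minimal PPDA sketch. Event records are illustrative stand-ins for
# Opta data: each has a team, an x co-ordinate (0-100, attacking
# left to right in that team's own frame) and an event type.
DEFENSIVE_ACTIONS = {"tackle", "foul", "interception", "challenge"}

def ppda(events, defending_team, x_threshold=40):
    """Opposition passes allowed per defensive action in the zone x > threshold.

    For the defending team, only defensive actions in its attacking
    area count. The opposition's co-ordinates are recorded in its own
    attacking frame, so we mirror them (100 - x) before applying the
    same threshold.
    """
    passes_allowed = sum(
        1 for e in events
        if e["team"] != defending_team
        and e["type"] == "pass"
        and (100 - e["x"]) > x_threshold   # mirror to the defender's view
    )
    actions = sum(
        1 for e in events
        if e["team"] == defending_team
        and e["type"] in DEFENSIVE_ACTIONS
        and e["x"] > x_threshold
    )
    return passes_allowed / actions if actions else float("inf")

# Toy match fragment: team A makes 2 qualifying defensive actions
# while the opposition completes 6 passes deep in its own half.
events = (
    [{"team": "B", "type": "pass", "x": 30} for _ in range(6)]
    + [{"team": "A", "type": "tackle", "x": 65},
       {"team": "A", "type": "interception", "x": 70}]
)
print(ppda(events, "A"))  # 3.0, i.e. a fairly aggressive press
```

Over a toy fragment like this the number is noisy, of course; the PPDA values discussed in this article are computed over whole matches and seasons.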
Manager Effect
In the previous article I gave a few examples of how a team’s manager could have an enormous influence on how much their teams used a high press. We saw how Pochettino had an immediate impact on Southampton’s rolling PPDA values, and that even the great Barcelona team of the last few years were not immune to the impact of the manager as Tito Vilanova totally changed the way they tried to win the ball back. Managers really do matter in this aspect of the game. When the high press, or “gegenpressing”, is mentioned, names like Marcelo Bielsa, Pep Guardiola, Jurgen Klopp, Pochettino and Andre Villas Boas tend to spring to mind.
Now, by using Opta data for the last four seasons across Europe’s “Big 5” leagues I am in the position to be able to objectively rank managers based on how aggressively they have implemented the high press, by using their PPDA values.
Data Rules
To appear in this list a manager must have managed at least 34 games over the last 4 seasons in the Big 5 leagues. I chose 34 games as that is a full season in the German Bundesliga (the league that plays the least games per season). This left me with a list of 154 managers. We’ll start with the 20 managers that had the highest PPDA values across their managerial reign. As these managers are associated with teams that allow a lot of passes per defensive action they very rarely employ a high press.
The 20 Managers that use High Press the least
The list is mostly made up of coaches from the English and French leagues. I don’t know a huge amount about the individual French managers, but it wouldn’t be an overstatement to say that the managers of the English teams in this list were, generally, pretty ordinary coaches. Martin Jol, Martin O’Neill, Chris Hughton, Avram Grant, Sam Allardyce, Steve Bruce and Steve Kean haven’t achieved a great deal of success over the last 4 seasons.
Which came first, the chicken or the egg? It’s difficult to know whether the teams those managers controlled didn’t employ a high press because they didn’t have the skilled playing staff to carry it out, or whether those teams were largely unsuccessful because they didn’t press. At this stage, I’m not quite sure how to untangle this correlation / causation question, but it would be wrong not to acknowledge the obvious potential link between the (lack of) use of the press and the (lack of) success of their teams.
20 Most Pressing Managers
Let’s now concentrate on the other end of the list. Who were the managers and coaches that have made the high press a fundamental part of the way they instruct their teams to play?
The first thing to notice is that first place in this table isn’t held by any of the managers I named previously as being synonymous with the use of the high press, although most of them appear somewhere on the list. Rayo Vallecano have played in La Liga the past 3 seasons, and both of the managers that have taken charge of them during this time appear amongst the top 6 high pressing managers. Jose Ramon Sandoval, their manager during the 2011/12 season, earns the accolade of being the manager that consistently made the most use of the high press.
After Sandoval’s stint at Rayo, he moved to Segunda side Sporting Gijon, and he appears to be currently out of work after parting ways with Gijon at the end of last season. The knowledge that Rayo Vallecano has been a leading exponent of the high press over the last few years is not new. This facet of the game employed by an otherwise ordinary team has been picked up by probably the two most influential tactical sites; Zonal Marking and Spielverlagerung.
It’s no surprise to see Marcelo Bielsa, who is held up as being the inspiration for the high press, appearing prominently in this list. It’ll be interesting to see the impact that the Argentine has on Marseille’s use of the high press this season. Unlike previous seasons, the French side did not favour the use of the high press during the last campaign, and one would assume that Bielsa would attempt to reinforce his preferred tactical style during this term.
The tactics employed by Marseille, and their PPDA value, are ones to keep an eye on as the new season unfolds. Bielsa is said to have been responsible for the footballing education of both Gerardo Martino and Mauricio Pochettino, and was also a huge influence on the style of Pep Guardiola. So although Bielsa only managed one club side, Athletic Bilbao, during this period, it’s no surprise to see that his fingerprints are all over this table of the top pressing managers.
Andre Villas-Boas
Some might wonder about the absence of one name from this list: Andre Villas-Boas. I am sure that Chelsea and Tottenham fans would suggest that AVB ran a kamikaze high defensive line during his tenure at both London clubs. However, my PPDA values don’t suggest that he actually used a high press during his two managerial reigns.
In fact, AVB ranks just 110th of the 154 managers in the list when ranked by how aggressively their teams pressed! If Villas-Boas attempted to operate the high press he was definitely doing it wrong. Is it possible that AVB simply had an awful setup, in which he played a high defensive line but his teams weren’t actually able to press high up the pitch?
The opposition players then had time on the ball to pick out a pass and play through his defence which was, by now, stranded high up the pitch. The end result is a great shooting opportunity for the opposition due to a lack of defensive pressure and cover. The relatively high PPDA values recorded by AVB teams over the period covered by this analysis certainly suggest that they weren’t successful in applying pressure where it needed to be applied given his favoured defensive setup.
Villas-Boas vs Pochettino comparison
My assertion that Villas-Boas’ teams were not successful in pressing may come as a surprise to many. With this in mind I thought it would be interesting to visually compare the PPDA values for Tottenham’s last two permanent managers – AVB and Pochettino. So yes, I am erasing the existence of Tim Sherwood for this exercise.
AVB only managed 81 games during this period, whereas Pochettino has been in charge for 143 games - this is why Pochettino’s red line is much longer than AVB’s blue line. However, what this image shows, due to his lower PPDA values, is that Pochettino has consistently been much more aggressive in his use of the high press. This demonstrates that the use of a high defensive line does not necessarily mean that the team operates a high press; indeed perhaps AVB is a case study as to why one without the other can be a recipe for disaster.
How high a line did Villas-Boas use?
I wanted to see if it was possible to test the hypothesis that AVB employed a high defensive line without a high press by using the data that I have access to. To obtain a proxy for the height of the defensive line I calculated the average Opta x co-ordinate of all defensive actions made by defenders. Like all Opta detailed events this only captures “on the ball” events, so if a defender was high up the pitch but didn’t make a tackle, a foul or an interception, I won’t be aware of his existence.
Although I am aware of this proxy’s shortcomings, over a large number of games I would expect the average x co-ordinate to give us an idea of where a team’s defenders typically engaged with the opposition. During the period that AVB was in charge of proceedings at White Hart Lane (from the start of the 2012/13 season until 15th December 2013) his Tottenham team operated the second highest defensive line in the Premier League – only Man City’s defenders had their average defensive touch higher up the pitch than Spurs’.
However, on the PPDA pressing metric Tottenham were just the 7th most pressing team in the league. So it does seem that there was a mismatch between the height of Tottenham's defensive line and the pressure that his players further up the field were able to put on the opposition. I’m sure that AVB’s failings were more complicated than just this, but the above seems to be a plausible explanation of at least part of why Villas-Boas’ brand of defending during his time in England was so porous.
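The defensive-line proxy described above is simple enough to sketch in code. Again this is illustrative only: the event dictionaries and their field names are assumptions rather than the real Opta schema.

```python
# Illustrative proxy for defensive-line height: the mean Opta x
# co-ordinate (0-100, attacking left to right) of defenders'
# on-the-ball defensive actions. The record layout is an assumption,
# not the real Opta schema.
from statistics import mean

LINE_ACTIONS = {"tackle", "foul", "interception"}

def defensive_line_height(events, team):
    """Average x of the team's defenders' defensive actions, or None."""
    xs = [e["x"] for e in events
          if e["team"] == team
          and e["position"] == "defender"
          and e["type"] in LINE_ACTIONS]
    return mean(xs) if xs else None

# Toy comparison: team A's defenders engage noticeably higher up the
# pitch than team B's.
events = [
    {"team": "A", "position": "defender", "type": "tackle", "x": 45},
    {"team": "A", "position": "defender", "type": "interception", "x": 55},
    {"team": "B", "position": "defender", "type": "tackle", "x": 30},
    {"team": "B", "position": "defender", "type": "foul", "x": 40},
]
print(defensive_line_height(events, "A"))  # 50
print(defensive_line_height(events, "B"))  # 35
```

As noted in the text, this is an "on the ball" proxy, so it only becomes meaningful when averaged over a large number of games.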
Osasuna
When compiling the two tables that appeared in this article I was intrigued by the PPDA values notched up by the various Osasuna managers. Mendilibar appears in the list as running one of the most aggressive high press systems, yet the manager that replaced him at the beginning of last season, Javi Gracia, totally abandoned this style as he ran a team with a seriously high PPDA figure.
The immediate change in style after Mendilibar was sacked is apparent, as Osasuna’s PPDA climbed in an almost vertical fashion over Gracia’s first dozen games in charge. At this stage, mid-November, Osasuna were occupying one of the relegation places. Over the next 14 games an improvement in results (they climbed to 13th position in the league) was matched by a substantial reduction in their PPDA value. Unfortunately, this momentum was not sustained: the final stretch saw them collect just 10 points from their last 12 games, a tally which ultimately saw them relegated.
Again, the downturn in results is matched by an increasing PPDA value. Although I’m not claiming that all teams will get better results by pressing more aggressively, we can see that in the case of Osasuna in 2013/14 this correlation definitely held true.
Arsenal 2 vs 1 Crystal Palace (16th August 2014)
A few things I noticed are listed below, but I am sure that people will have their own opinions on what the viz shows.
Other than Gibbs (and then Monreal), Arsenal were very much orientated towards the right side of the pitch. Cazorla was notionally on the right side of midfield, but the Spaniard played very centrally. Palace facilitated this as Puncheon (their right midfielder) also played narrow and central
Chamakh played exceptionally deep during the second half - he was behind his midfield for large parts of the second half
Arteta played a very disciplined role. He never moved outside the centre circle on this image (Note, we are not suggesting that he didn't move outside the centre circle all day!!)
Cazorla and Ramsey played very close to each other, with Cazorla always just in positions that were slightly closer to the Crystal Palace goal
(Click on the viz to open in a larger window)
This article has been co-written by Colin Trainor and Constantinos Chappas
Here at Statsbomb, we like to see how we can use data to assist us in gaining a better understanding of what actually happens on the football pitch. None of us are suggesting that data should replace watching games, but we are adamant that the intelligent use of data complements the watching of games.
It’s nigh on impossible for football fans to watch every game, even just in the Premier League. Data helps us get a picture of what happened in the games that we weren’t able to see live or watch a tape of.
Data, when used properly, also provides an objective view of what happened in a game. No longer do we need to rely solely on our recollection to see how a team or player performed throughout a game. Is there a danger that we only remember the good (or bad) things that a player did over the course of a game? If so, data can supplement our natural recall ability.
The Player Positional Tracker
Constantinos Chappas and I have realised that we can use the very detailed Opta event data to visualise the positions that the players took throughout the duration of the game. What follows is our first attempt at this visualisation and I have no doubt that improvements and tweaks will be made in the coming weeks as we begin to receive feedback.
Our aim is to have a selection of these visualisations posted on Statsbomb each weekend. I think that readers will find them of interest and they should help in understanding how players’ roles might have changed during a game. As an introduction the following is the gif for a game from the opening weekend of last season’s Premier League, Chelsea defeated Hull 2-0.
Chelsea 2 v Hull 0 (18th August 2013)
(click on the image to open it in a larger screen)
The reason I chose this game for our introduction is that it captures how the shape of a game changed. Chelsea dominated the first half and raced into a 2-0 lead by half time, during that first half they took 18 shots to Hull’s 2. In the video we can see that Hull’s players were barely able to get out of their own half and all the passes were between Chelsea players.
However, the second half was a different story, as it appeared that Mourinho instructed his team to shell. Both teams took just 5 shots in the second half as Chelsea retreated right back and were happy to invite pressure from Hull. As well as operating more defensively, it is noticeable how narrow the Chelsea defence went in the second half; Cole and Ivanovic both operated narrower than they did in the first.
Presumably this was a deliberate strategy. The viz also contains a passing network for the timescale around the minutes shown on the clock, with thicker lines representing a higher volume of passes.
Technical Appendix
For anyone interested in the technical details of how the data points were created, Constantinos wrote the following brief guide:
An often presented image is that of the average position of each player during a match. But that alone has its limitations. For example, here is the location of a fictional player’s successful actions during a match along with his average position shown in brown:
What is evidently a wide midfielder / winger operating on either side of the pitch is depicted by his average position as an attacking midfielder, playing down the middle, behind the striker(s). In truth this fictional winger must have switched sides sometime during the match.
The problem described above can be solved by breaking a match into smaller time intervals and monitoring the average position of players during those intervals. For example, here are the same player’s actions in either half:
During the first half the player was predominantly on one flank (dark blue) while in the second half he was mostly operating on the opposite side (light blue). Of course, this can be broken down further into even smaller time intervals. However, one must bear in mind that the smaller the time interval, the fewer the data points, and therefore the average position calculated from them may be erratic (i.e. jump around!)
To address this issue, we have taken this a step further. Instead of simply calculating average positions on the pitch, we have modelled these positions against time using what is known as local regression (http://en.wikipedia.org/wiki/Local_regression). This method can provide a smoothed, averaged representation of the position of a player’s actions around any chosen time in a match. Doing this for all players and plotting their position at any given time produces a “movie” which can help when examining team formations during a match. It should be noted that this tool is not designed to provide an actual representation of where each action took place but rather to capture each player’s general area of operation during a match.
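To give a flavour of the idea, here is a small numpy sketch of a local regression of position against time. It is a simplified stand-in for our actual smoother (we do not publish that code): at each query time it fits a weighted straight line, with Gaussian weights centred on that time, and evaluates the line there. The data, bandwidth and player are all invented.

```python
import numpy as np

def local_regression(t, y, t_query, bandwidth=5.0):
    """Smoothed position estimates at the query times t_query.

    A simplified stand-in for LOESS: at each query time we fit a
    straight line via np.polyfit with Gaussian weights centred on
    the query time, then evaluate the line there. Inputs are event
    times t (minutes) and one positional co-ordinate y per event.
    """
    t, y = np.asarray(t, float), np.asarray(y, float)
    out = []
    for tq in np.atleast_1d(t_query):
        w = np.exp(-0.5 * ((t - tq) / bandwidth) ** 2)
        slope, intercept = np.polyfit(t, y, 1, w=w)
        out.append(slope * tq + intercept)
    return np.array(out)

# Fictional winger who switches flanks at half time: his y position
# jumps from one side of the pitch (~20) to the other (~80) after
# minute 45, with some noise around each flank.
times = np.arange(0, 90, 3)
rng = np.random.default_rng(0)
ys = np.where(times < 45, 20.0, 80.0) + rng.normal(0, 3, times.size)
smooth = local_regression(times, ys, [10, 80])
print(smooth)  # roughly [20, 80]: each half's flank is recovered
```

The bandwidth plays the same role as the interval size discussed above: too small and the smoothed position jumps around, too large and the half-time flank switch is blurred away.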
In addition to this, we have included passing networks depicting the most popular combinations of players exchanging passes around the chosen time period. Thicker lines indicate a higher frequency of successful passes between those players.
Wayne Rooney's attacking heatmap, using Opta data, over the past 4 league seasons for Man United.
BTW, as with all of these heatmaps, a huge thanks goes to Constantinos Chappas who created the code that enables them to be produced.
2010/11 to 2013/14 Premier League Seasons
2010/11 and 2011/12 were quite similar with Rooney playing the role of the main Centre Forward on the United team during these two seasons. Very rarely was he required to play deep enough to make many passes in his own defensive half during these two seasons.
2012/13
Robin van Persie joined United at the start of the 2012/13 season, and we can clearly see the different role that Rooney was asked to play from that point on.
His heatmap is much more widely spread, especially defensively. In this 2012/13 season he dropped deeper as his defensive duties increased and he took on a more creative role. No longer was Rooney required to be the main goalscorer.
He notched up more than 2 Key Passes per90 in this term, up from the 1.5 that he had during the 2011/12 campaign. Unsurprisingly, his shots volume dropped to 3.6 per90 from a mark of 4.5 the year before.
2013/14
Rooney's attacking heatmap for last term is broadly similar to that of the 2012/13 season, however, it appears that he didn't function as much in the very centre of the pitch as he has done for the previous 3 years. We can see that he has some heatspots, signifying a lot of attacking actions, just to the right and left of centre during the 2013/14 season.
Whether this was a deliberate decision driven by Moyes to actively find pockets of space or simply a quirk of one season's data I am not sure. Incidentally, the location of his involvements did not change much after Mata's arrival as his heatmap before and after the end of January (when Mata arrived) show very little differences.
As has been posted previously here on Statsbomb, Wayne Rooney has been asked to play differing roles during his time at Old Trafford. Before the arrival of RVP he was the main goal getter for United, but that changed at the start of the 2012/13 season. However, regardless of the role that he is asked to play he has tended to deliver for the Red Devils.
In my end of season Liverpool review I used some stats that showed Liverpool hadn’t actually taken any more shots from Fast Breaks than they had in the previous season. No doubt, anyone that watched much of Liverpool would point to the lightning fast breaks that Sturridge, Sterling and Suarez seemed to mount on a regular basis.
It is for this reason that I qualified the assertion that they didn’t create more Fast Break chances in this past season by saying that this conclusion was based on the Opta definition of Fast Break. I don’t think Opta publish their definition of Fast Break, so it is conceivable that their definition of a Fast Break is quite narrow, and so many of Liverpool’s attacks, whilst fast, may not have tripped the Opta Fast Break qualifier.
I did say in the Liverpool review article that I would like to take another look at this subject and as StatsBomb now has access to several seasons of Opta's most detailed on the ball data I’m now in a position to look beyond just the Opta definition of Fast Break, and perform some advanced analysis on the speed that teams attacked with.
Does Speed of Attack matter?
It certainly seems intuitive that having fast attacks should be a great way to launch efficient attacking moves. The attacking team has the chance to drive forward against an opposition that doesn’t have time to set itself, or possibly has gaps in its otherwise well drilled defensive shape. Contrast that with a slow, laboured approach and the barriers that such pedestrian attacks must overcome to score. Of course the highly technical teams, such as Barcelona in their tiki-taka prime, have players that can retain possession long enough to eventually force the opposition to make a mistake. But this tends not to be the attacking type of choice for most of the world’s football teams.
So, although faster attacking moves would seem to be a more efficient way of scoring than slower moves let’s first establish some baseline numbers.
I have set out detailed methodology at the bottom of this article for the techniques that I used, but in summary I generated a metric for speed of attack (in metres per second, or m/s) for every shot in the Barclays Premier League over the last 4 seasons (from 10/11 to 13/14).
To start off, I simply sorted the data set by the Speed of Attack and divided the data into two groups at the median Speed of Attack value (2.684m/s) – this gave me two groups with an equal number of shots in each.
In this case, it is reassuring to see that the numbers back up the widely held assumption that attacking with speed and purpose is much more likely to lead to a goal than the shots that result from very slow attacks. Due to the large sample sizes the difference in conversion rates between the two groups is significant from a statistical point of view, with a p-value of <0.00001. Thus, we can confidently conclude that there is an advantage in having shots from fast attacks in comparison to slow attacking moves. OK, so that’s not groundbreaking, but we needed to stick some foundation pegs in the ground.
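The median split above can be sketched in a few lines of Python. This is a hedged illustration rather than my actual pipeline: the speed here is simply straight-line distance from the start of the move to the shot divided by elapsed time, the full methodology is set out at the bottom of this article, and the four shots are invented.

```python
# Hedged sketch of the Speed of Attack metric: straight-line distance
# (assumed already in metres) from the start of the attacking move to
# the shot, divided by the elapsed time. The article's full
# methodology may differ in detail.
from math import hypot
from statistics import median

def attack_speed(move):
    """Speed in m/s for a move given as (x0, y0, x_shot, y_shot, seconds)."""
    x0, y0, x1, y1, secs = move
    return hypot(x1 - x0, y1 - y0) / secs

def conversion_by_median_split(shots):
    """Split shots at the median attack speed; return (slow, fast) conversion rates."""
    cut = median(attack_speed(m) for m, _goal in shots)
    slow = [g for m, g in shots if attack_speed(m) <= cut]
    fast = [g for m, g in shots if attack_speed(m) > cut]
    return sum(slow) / len(slow), sum(fast) / len(fast)

# Four toy shots: two slow build-ups (no goals), two rapid breaks (both scored).
shots = [
    ((0, 0, 10, 0, 10), 0),   # 1.0 m/s
    ((0, 0, 20, 0, 10), 0),   # 2.0 m/s
    ((0, 0, 50, 0, 10), 1),   # 5.0 m/s
    ((0, 0, 60, 0, 10), 1),   # 6.0 m/s
]
print(conversion_by_median_split(shots))  # (0.0, 1.0)
```

With the real data the gap between the two groups is far smaller than this toy example, but as the p-value above shows, it is still highly significant.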
Deciles
If we divide the data sample into more groups does the relationship of faster attacks equalling higher conversion rates still hold?
To answer this question I divided the data set into deciles, with each decile containing just over 1,500 shots. The summary information for each decile can be seen in the table below:
…and in graph format we can see the very strong correlation with the shot conversion rate increasing almost in line with the step up in decile number.
I appreciate that I have only plotted 10 data points in this chart but we can see that there is a very good, uniform correlation between the speed of the attack and the conversion rate of the shots coming from those attacks.
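The decile bucketing itself is straightforward; the sketch below shows the mechanics on an invented sample in which only the fastest tenth of attacks are scored, which is an exaggeration for clarity rather than a reflection of the real rates.

```python
# Decile breakdown sketch: order shots by attack speed, split into ten
# equal groups, and compute the conversion rate within each group.
# Decile 1 holds the slowest tenth of attacks.
def decile_conversion(shots):
    """shots: (speed, goal) pairs; returns {decile: conversion rate}."""
    ordered = sorted(shots)                # slowest first
    size = len(ordered) // 10              # assumes a multiple of 10 shots
    return {d + 1: sum(g for _s, g in ordered[d * size:(d + 1) * size]) / size
            for d in range(10)}

# Synthetic sample of 100 shots where only the fastest tenth score.
sample = [(speed, 1 if speed >= 90 else 0) for speed in range(100)]
rates = decile_conversion(sample)
print(rates[1], rates[10])  # 0.0 1.0
```

In the real data the deciles each hold just over 1,500 shots, and the rise in conversion rate from decile 1 to decile 10 is gradual rather than a cliff edge.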
Shot Location vs Speed of Attack
I think it’s worth looking at some of the above figures in detail; specifically how the conversion rate changes depending on the decile of Speed of Attack and how that in turn relates to shot location.
This table plots the conversion rate for shots for each decile as well as the proportion of shots that were taken from the area that I term as the Prime Zone, ie the central portion of the penalty area.
At this stage, I imagine that everyone is aware that shot location should be taken into account when evaluating how the raw shot numbers translate into shot quality. However, the above table is super important as it seems to suggest that the Speed of Attack may actually be even more important than the location of the shot.
It can be seen that the two groups with the fastest attacks (Deciles 9 and 10) had the best conversion rates. Groups 7 and 8 had a higher proportion of shots in the Prime Zone yet had a slightly lower conversion rate than 9 and 10.
Generally, the shot location correlates well with the Speed of Attack, in that the very slowest attacks also tended to have the shots from the worst locations. This will be partly due to the fact that the slow, or even backwards in some cases, attacks will have allowed the defence to get set and force the attacking team to shoot from bad locations. Possibly a little out of necessity and a little out of sheer frustration.
However, the key takeaway here is that the swiftest attacks can overcome any potential advantage that working the ball into the best locations might bring. Of course, in many instances these two aspects will be intertwined as it will be easier to get into Prime locations from faster attacks than it is from slower attacks.
In Speed of Attack we have a metric that rivals, and perhaps even bests, shot location in terms of its impact on a shot resulting in a goal.
2013 Premier League
As well as calculating the speed of each attack I’ve also been able to count the number of touches (attacking events) that Opta assigned to each attacking move prior to the shot. This information can be used to provide an objective measure of how direct each team’s attacks were on average last season.
Directness of Attack
This table includes the shots from my filtered-down data set (see the Methodology piece at the bottom of this article); in summary, it includes any attacking moves that started in open play and saw at least 2 attacking events occur before the shot. It shows the average number of touches that each team took in their attacking moves prior to the shot.
It’s always comforting when a list of numbers calculated by a scientific method passes the eye test, and this table achieves that. The teams at the zenith of this table are certainly the teams that are happiest when in control of the ball: Man City, Arsenal and Southampton.
Sunderland
It may be surprising to see Sunderland appear so high up the average-touches table, as in general weaker teams tend to play a more direct style of football. However, Sunderland’s rank should come as no surprise to those who know their manager Gus Poyet. In fact, last November he pleaded for patience from their fans as he said:
I want the team to learn to be calmer, to pass the ball better. It is going to take time to get this way of football going but trust me, the fans will like it in the end.
Not only was Poyet successful in keeping the Black Cats in the Premier League this season, but he also succeeded in changing the way that his team played. Sunderland’s average touches per shot of 5.99 this season compares very favourably with the figures of 4.30, 4.46 and 4.53 recorded respectively in each of the three previous seasons. As can be seen from the above table, the jump from an average of 4.4 touches per shot to 6 is substantial and requires an entirely different playing style. Poyet deserves a large amount of credit for successfully pulling off such a transformation.
Most Direct Teams
At the other end of the table we see the very direct teams. This group includes Fulham, Crystal Palace, West Ham and Aston Villa; none of whom seem out of place in these positions. On the other hand, Chelsea’s place in this table sticks out with them being the 8th most direct team, yet they are undoubtedly one of the three best teams in the league. The Blues differ from the other top teams as they are extremely comfortable without the ball and are happy to sit back, soak up pressure and hit teams on the counter attack.
All in all, the simple measure of the average number of attacking events prior to each shot seems to objectively encapsulate the various playing styles of each of the teams in the Premier League.
Speed of Attack
Don't worry, I haven't forgotten that this article is supposed to be about the speed that teams attack with. To keep things simple to start, I calculated the median Speed of Attack for each team (in m/s).
However, using this method causes a problem which can clearly be seen when I plot the average touches per attacking move with the median Speed of Attack.
Apart from a few notable exceptions, namely Hull, Man City and Arsenal, we can see that there is a very strong relationship between the two measures. This is entirely understandable: effectively we are plotting the average Speed of Attack against the average number of touches in the attack, and thus we would expect a strong negative correlation, which can be clearly seen. It is apparent that we need a slightly different measure.
Although it is interesting to see the average (or median, to be correct) Speed of Attack of each team, this metric isn’t what I set out to investigate, as the median number will be compressed due to its nature. It is entirely feasible that a team (such as Liverpool) could have an average attacking speed of x m/s while a chunk of their attacks are substantially faster than that. It is a team's ability to mount fast attacks that I set out to measure.
The method I finally settled on was to look at each team's 95th and 90th percentile Speed of Attack. I used a percentile basis instead of simply looking at the speed of the (for example) 10th or 25th fastest attack for each team because teams that took relatively few shots would otherwise be at a disadvantage; the teams that took more shots would be expected to have a greater diversity in their Speeds of Attack, ie more very fast and very slow attacks.
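As a sketch of this percentile approach, the calculation for one team might look as follows; the attack speeds here are invented for illustration, as the real inputs come from the Opta event data:

```python
# Sketch of the percentile approach with invented attack speeds (m/s).
def percentile(values, pct):
    """Nearest-rank percentile of a list of values."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

# invented Speed of Attack values for one team
speeds = [1.1, 1.8, 2.2, 2.6, 3.0, 3.3, 3.7, 4.1, 4.6, 5.0,
          5.5, 6.0, 6.4, 6.9, 7.3, 7.8, 8.2, 8.8, 9.5, 10.4]
p90 = percentile(speeds, 90)  # roughly 90% of attacks are slower than this
p95 = percentile(speeds, 95)
```

Working from percentiles rather than the Nth-fastest attack means a team with 150 qualifying attacks and one with 500 are judged on the same basis.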
95th Percentile
The following table shows the attacking speed of each team’s 95th percentile, ie each team had 95% of their attacks slower than their respective values in this table:
It will probably surprise most readers of this article to learn that, when looking at the speed of the fastest 5% of each team's attacks, Hull City emerged as the Premier League team with the fastest attacks. It's important to note that The Tigers haven't got to that position just by playing long balls, unlike say, West Ham, who also appear towards the top of the table. In the table I included above we observed that Hull were in the middle of the pack in terms of how directly they tended to play.
Although we have seen that Arsenal take a lot of touches per attacking move, it is clear that they can occasionally launch very quick attacks, as the speed of their 95th percentile attacking move is almost 7.5m/s.
Needless to say, I am somewhat surprised to see Liverpool only appear in 7th place on this measure.
At the foot of the table, it appears that Swansea take their passing methodology and desire to retain the ball to an extreme. The pace of their 95th percentile attack is decidedly pedestrian (remember 95% of their shots are from attacks that are slower than this value) when compared to the other Premier League teams. I would suggest that the ability to mix up their tactics and introduce some pace, à la Arsenal, should be high on Garry Monk's "To Do List" next season.
The above table provides yet another reason why Man United struggled under David Moyes last season. His team simply seemed incapable of attacking with pace.
90th Percentile
In this next table I looked at the speed of the 90th percentile attack of each team, ie 90% of each team’s attacks were slower than this rate:
As with the 95th percentile table, Hull had the fastest attacks on this measure – this shows that they didn’t just have a very few fast attacks, but were able to maintain a fast attacking tempo fairly consistently. In doing this, they achieved what Arsenal was unable to do, as the Gunners slid down this table compared to the 95th percentile one.
We can see the teams that tend to play in a more direct fashion rise up this table. This is to be expected as we move towards the median speed of attack that we observed earlier in this article.
Liverpool
Once again, Liverpool is ranked right in the middle of the table. Despite what our memories told us about Liverpool's attacks last season, I arrive at the same conclusion as I did in my end of season review: it appears that The Reds were not attacking faster than they were the previous season. The number of Fast Breaks as defined by Opta suggested this, and after looking at the speed of each individual attack I am not finding any evidence to contradict this assertion. I would guess that the terrific success rate that Liverpool enjoyed on their Fast Breaks this season, where they converted 33% of such attacks, has meant that subconsciously we remembered a lot more of their fast attacks than we did for other teams.
Hull
When I first planned the mechanics behind this article I had assumed that I would see Liverpool posting the fastest attacks. However, this was certainly not the case. Instead I was very surprised to see that Hull claimed this accolade, and as stated above, these quick attacking movement numbers don't seem to have been earned by playing a very direct style. I'm sure that their numbers are flattered by the fact that they didn't see very much of the ball and didn't have a huge number of shots. According to Whoscored, only 5 teams had less possession of the ball than Hull and only 3 teams shot fewer times than they did.
This meant that the environment existed for them to launch fast attacks. However, other teams that ceded possession should also have been in a position to do this, but Hull managed to do it better than everyone else.
I would contend that Hull were arguably unlucky with the outcomes of their fast attacks. They failed to score with their 12 fastest attacks (measured in m/s), and their 13th fastest attack resulted in a goal scored by Jake Livermore which can be seen below. I even think I can offer a suggestion as to why The Tigers drew a blank with the shots that came from their 12 fastest attacks this season – 8 of them fell to either Shane Long or Nikica Jelavic.
Perhaps they need Jake Livermore on the end of a few more of them… Fast forward to 3:11 in this video to see the perfectly executed fast attack. [youtube id="kB0z-30FRfQ" width="633" height="356"]
Speed of Attack Methodology
For each shot, I worked backwards until I reached the start of the possession chain, and I noted the x co-ordinate of the start of the attacking move, the x co-ordinate of the shot position and the time that elapsed between those two events. I was only interested in the speed of the vertical movement of the attack, so I ignored the y co-ordinates for this exercise.
I calculated the vertical distance in metres of the attack with the formula {(shot x – move_start x)*1.05} and then divided that distance by the duration of the attack (in seconds) to give me a metres/second (or m/s) value for each attack.
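As a minimal sketch of that calculation (the function and variable names are my own, not Opta's):

```python
# Opta x co-ordinates run from 0 to 100 along the pitch, so multiplying
# the co-ordinate difference by 1.05 converts it to metres on a 105m pitch.
def attack_speed(move_start_x, shot_x, duration_seconds):
    """Vertical Speed of Attack in metres per second."""
    vertical_distance_m = (shot_x - move_start_x) * 1.05
    return vertical_distance_m / duration_seconds

# e.g. a move starting at x=30 that ends in a shot at x=85 after 11 seconds
speed = attack_speed(30, 85, 11)  # (55 * 1.05) / 11 = 5.25 m/s
```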
At this point I should mention that where a shot was the first action of an attacking team in the move, ie from a free kick, a penalty, or scoring directly from an opposition touch, such a shot was not assigned a Speed of Attack. Shots from these events were excluded from this study.
It would be unfair to compare attacks from open play with those that originated from a set piece or a goal kick as the latter attacks would generally face a solidly set defence. To combat this I am only looking at attacks where the first event in the move was not one of: goal kick, free kick, throw in or corner kick.
Own Goals were excluded from the data set.
I added two more filters to the data set that could be construed as subjective, but I am happy with the rationale for including them. The first of these is that I excluded any shots where the move began from a location with an Opta x co-ordinate of less than 17, ie inside the team's own 18 yard line. I introduced this to reduce the chances of counting a goalkeeper kicking a long clearance to his striker who has a shot. I've no doubt that such a move would have an exceptionally fast Speed of Attack number, but this tactic isn't what I'm trying to analyse here.
The final filter that I applied is that an attacking move had to have at least two Opta attacking events before the shot was taken, ie at least two passes or a successful take on and a pass. In reviewing tape of attacks that had just one event preceding the shot, I found that many of them came from a defensive header from inside the box or other instances where the defence didn't really have full control of the ball: the clearances were passed back into the box and a shot was taken. As the attacking sequence was very short, at just a couple of seconds, these attacks registered a very fast m/s value. Once again, this type of attack isn't what this analysis is about, and so I didn't want the outcomes of these attacks (be they good or bad) contaminating a data set which will mainly contain longer attacking moves.
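Pulling the inclusion rules above together, the filtering could be sketched as follows; the field names on the move record are my own invention, not Opta's schema:

```python
# Inclusion rules from the methodology; returns True if the move qualifies.
EXCLUDED_STARTS = {"goal kick", "free kick", "throw in", "corner kick"}

def include_move(move):
    if move["first_event"] in EXCLUDED_STARTS:
        return False  # move originated from a set piece or restart
    if move["start_x"] < 17:
        return False  # began inside the team's own 18 yard area
    if move["attacking_events_before_shot"] < 2:
        return False  # fewer than two attacking events before the shot
    if move["own_goal"]:
        return False  # own goals are excluded from the data set
    return True

sample = {"first_event": "pass", "start_x": 40,
          "attacking_events_before_shot": 3, "own_goal": False}
```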
All the above resulted in a final data set of 15,289 shots, out of more than 40,000 total shots across these four seasons.
I am aware that there could be some debate about which attacking moves to include in the study. It is extremely difficult to come up with watertight logic that ensures the study only includes "proper" attacking moves. As soon as the defence touches the ball I am working on the basis that the original attacking move has ended and a new one begins, even if the defence only had a brief touch and didn't have the ball under control. Of course, I can see how this isn't ideal, but we need to draw the line somewhere in terms of objectively deciding when one move stops and another one starts.
Despite that, I don’t feel that changing the data rules about which attacks ended up in the final data set would materially change any of the conclusions reached in the data. I’m merely acknowledging that there is no right or wrong answer in terms of how to piece together attacking moves, even from the detailed Opta data files.
With Man City's win over Aston Villa on Wednesday night it now appears that Liverpool will narrowly fail in their quest to win the 2013/14 Premier League. Regardless of the destination of the Premier League title, the general consensus is that the Reds have improved considerably from last season. A cursory look at the league tables appears to confirm this to be the case.
With one game left to play, Liverpool have already gained 20 league points more than last season. Unsurprisingly, their goal difference is also showing a marked improvement as they have scored 28 goals more whilst conceding just 6 goals more than last term – this equates to an improvement in goal difference of 22 goals compared to 12 months ago.
However, for me, the improvement in the general performances of the team isn't quite as marked as the league table suggests. Before Liverpool fans shut down this page, please note that this isn't entirely a bad thing.
Last Season
We live in the "here and now" and a lot can happen in 12 months, much of which we seem to have the capacity to forget. The Liverpool team of 2012/13 that were dreadful in front of goal now seem to be from another era. Last summer Constantinos Chappas wrote specifically about Liverpool's wastefulness in converting chances during the 2012/13 season in this article. The second plot in that article shows that Liverpool were just the 14th best Premier League team in terms of how they finished their chances (once the chances were controlled for quality).
Fast forward 12 months and things are entirely different, with the triumvirate of Suarez, Sturridge and Sterling seemingly knocking in goals for fun.
Expected Goals
Based on our ExpG model (in short it assigns goal scoring probabilities based on the specifics of the shot) we expected Liverpool to score 5 goals more than last season. However, we have seen that they have scored a whopping 28 goals more than last term. This substantial increase in goals scored is due to the double effect of Liverpool’s finishing last season being very poor and this one being exceptionally good.
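The actual ExpG model is proprietary, but the general idea can be illustrated with a toy stand-in that maps shot features to a scoring probability. The coefficients below are invented for illustration and are not those of the real model:

```python
# Toy illustration only: a logistic function of shot distance, with a
# small bonus for central shots. Not the author's actual ExpG model.
from math import exp

def toy_expg(distance_m, central):
    z = 1.2 - 0.17 * distance_m + (0.4 if central else 0.0)
    return 1 / (1 + exp(-z))

p_close = toy_expg(8, central=True)   # close-range central chance
p_long = toy_expg(28, central=True)   # long-range effort
```

A team's expected goals total is then simply the sum of these per-shot probabilities over all its shots.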
Liverpool fans may say that the brand of football their team is playing this season is much better and more expansive than last season; that there has to be a greater difference than just 5 goals between what my model expected them to score last season and this; that my model is wrong.
To borrow a phrase regularly used by Simon Gleave, this may be a case of Scoreboard Journalism. We (be that football fans or members of the media) tend to evaluate events by reference to the one-off outcome rather than evaluating the process or by adequate reference to what "should have happened". The difficulty with deciding what "should have happened" is that there is no one agreed uniform metric for measuring it; all I can say is that our ExpG measure is an objective one that has used the same calculation method over the two seasons in question.
Liverpool's Fast Breaks
Perhaps my ExpG measure doesn’t accurately take account of the scintillating counter attacks that seem to have been the signature of Liverpool’s charge for the title. Thanks to some research undertaken by Andrew Beasley we can see that Liverpool have had 27 shots from counter attacks this season. So how does that compare with last season?
It may surprise you to discover that they also had 27 last term!!!
In Beasley's article he makes the point that there is some subjective assessment by Opta in what qualifies as a Fast Break, and there may well be attacks that many observers would think qualify as a Fast Break but which Opta haven't denoted as such. However, on the assumption that Opta have been consistently applying the same criteria in denoting a Fast Break over the last few seasons, this point is moot for the purposes of this analysis as the same narrow definition would have been applicable last term too.
It appears that Liverpool haven’t actually had more Fast Breaks than they had last season, but their conversion rate of 33% (9 goals scored compared to just 1 last season) on such attacks has perhaps fooled us into thinking that this form of attack has been used more often this season than before. Perhaps that notion can be put down to Scoreboard Journalism as well, although I do agree that there is the possibility that Liverpool attack in a manner that could provide them with good scoring opportunities which Opta do not classify as a Fast Break. More on this later.
Liverpool’s Shot Choices
Liverpool's average shot quality has improved this season, with their average ExpG per shot (exc penalties) of 0.118 dwarfing the 0.101 they achieved last season. However, this improvement in shot quality has partly been offset by the fact that they are shooting considerably less often than last season. With 1 game remaining, the Reds have taken 101 fewer shots than during 2012/13. I'm not arguing that this drop off in shot volume is necessarily a bad thing (as the conversion rates are so low on long distance shots) but it helps explain why we have Liverpool down for just 5 more expected goals than last year despite an increase of almost 20% in their average chance quality per shot.
The difference in the shot choices taken by Liverpool this season can be clearly seen in the Shot Charts, with the 13/14 chart being shown on the left, and last season's Shot Chart appearing on the right.
We can see the noticeable reduction in shooting volume of almost 2.5 shots per game (19.3 down to 16.9), but shots from within the Prime Zone (the central area inside the penalty area) have reduced by just 0.8 per game. The remaining decrease of 1.5 shots per game has occurred in the Secondary and Marginal zones, where the expected conversion rates will be lower.
A template outlining the perimeters of the four zones has been included at the bottom of this article.
Although Liverpool are now shooting from smarter locations on average we can see why this only has the impact of increasing their Expected goal total by 5 year on year (albeit I am comparing the 37 games played so far this season with the previous full 38 game season).
So how the hell have they managed to score 99 league goals so far this campaign?
A 1 in 14 Season
Very simply, Liverpool have had an amazing season in converting their chances; they have run incredibly hot. I processed 10,000 simulations of the shots that Liverpool took this season and they achieved their current total of at least 95 goals (as this excludes Own Goals) just 7% of the time.
So, based on their shots taken, just 1 time in 14 could they be expected to score at least as many goals as they have achieved.
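The simulation itself is straightforward to sketch: treat each shot as a coin flip weighted by its ExpG value and count how often the season total reaches the target. The shot probabilities below are invented for illustration, not Liverpool's actual shot list:

```python
import random

def simulate_goal_totals(shot_probs, n_sims=10_000, seed=1):
    """Simulate season goal totals by flipping a weighted coin per shot."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for p in shot_probs) for _ in range(n_sims)]

# illustrative: 600 shots at an average of 0.12 ExpG each (~72 expected goals)
shot_probs = [0.12] * 600
totals = simulate_goal_totals(shot_probs)
p_at_least_95 = sum(t >= 95 for t in totals) / len(totals)
```

With these invented inputs a 95-goal season is vanishingly rare; it is the specifics of the real shot list that get the probability up to the 7% quoted above.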
Comparing to previous Premier League Performances
Using Opta data I have gone back 5 seasons in the EPL (until the start of the 2009/10 campaign) – this gives me a sample size of 100 (5 x 20) individual team seasons.
Shot Conversion %
Liverpool’s current conversion rate of 16% (99 goals from 638 shots) is the highest conversion rate for any team over the last 5 seasons. That is quite the accolade. Interestingly they are closely followed by this current Man City team.
(The table below includes OGs)
The two teams that immediately trail Liverpool in this metric are the mystical 2012/13 Man United side who somehow won the Premier League last year and the current version of Man City.
We have shown that Liverpool’s chance quality has improved this season, but their average ExpG (exc penalties) of 0.118 is lower than the two Manchester teams that immediately follow Liverpool in this table. Man United’s average shot quality was 0.13 last season, and the 2013/14 Man City team has an even higher average shot quality of 0.132. All three of these teams mentioned have over performed their Expected goals total, but none by as much as Liverpool have done this season.
On my numbers, Liverpool couldn't have expected to convert 16% of the shots they took this season - even allowing for their improved shot quality. However, this is exactly the feat they achieved, and I can help explain how that happened.
Liverpool’s Blocked Shots
As we draw towards the end of the season and I review Liverpool's shot numbers, their lack of blocked shots is extremely noticeable. Using the same 100 team sample as outlined above, I looked at the rate at which those teams had their efforts at goal blocked.
Once again, the Anfield side sit proudly atop this particular table, and comfortably so.
The other 99 teams had their shots blocked between 21% and 31% of the time, but we can see there is clear space between the remainder and Liverpool’s incredibly low blocked shots ratio this season of just 19%. It may be suggested that with Liverpool’s very fast transitions it is reasonable to expect a lower rate of blocked shots than the average team due to the opposition defences being unable to set themselves properly. However, I have shown that, according to Opta, Liverpool had the same number of Fast Breaks as they had last season and at 27, they have just 6 more than Man City have had this season.
It is possible that Liverpool’s attacking strategy is such that they do have faster transitions than other teams, yet they don’t trigger Opta’s definition of “Fast Break”. However, in the absence of more detailed data I have no alternative but to work with this (possibly) narrow definition. I would like to revisit this area again if detailed event data becomes available, but until then I can only make reference to the Opta designated Fast Breaks and this metric doesn't seem to explain away such a low rate of blocked shots.
Having Few Blocked Shots is good?
Of course this is the case, but if I were a Liverpool fan I would want to know how likely it is that their team will enjoy such a low rate of blocked shots again next season. Is this aspect of Liverpool’s attacking strategy something that they can hope to replicate next season?
Looking at the EPL for the last 5 seasons I plotted the correlation between Blocked Shots % in Year n and Year n+1, ie how repeatable are offensive blocked shot percentages.
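The repeatability check is a straightforward Pearson correlation of paired seasons. A sketch with invented blocked-shot percentages for a handful of teams:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# invented blocked-shot % pairs: season n vs season n+1 for six teams
year_n  = [0.22, 0.24, 0.26, 0.28, 0.30, 0.25]
year_n1 = [0.27, 0.23, 0.29, 0.24, 0.26, 0.28]
r = pearson_r(year_n, year_n1)
```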
As I needed two consecutive seasons in the Premier League, I was left with 68 pairs of Blocked % to plot as follows:
Although the sample size is small, unfortunately for Liverpool there appears to be virtually no correlation between a team’s percentage of shots that are blocked from one season to the next. A perfect example of this is Liverpool themselves.
The Reds’ record low Blocked % of 19% this season follows last year’s 29%. Out of the 100 teams, their 2012/13 rate of blocked shots was 84th, but just 1 season later Liverpool followed that up with the best ratio posted in terms of blocked shots %!!
I have previously shown that the further out a shot is taken the greater the chance that it will be blocked. We have seen that Liverpool have been shooting smarter this season, however, the differences in the rate of being blocked are nowhere near large enough to explain a difference of even 1%, never mind 10% over their entire shot sample for the two seasons in question. The above plot also visually shows us just how much of an outlier Liverpool’s rate of blocked shots this season has been.
Based on the data available to us, which currently excludes very detailed event data, I don't see any reason for Liverpool to expect to enjoy such a low rate of blocked shots again next season. There is always the possibility that I'll be proven wrong as every outcome has a chance, no matter how small - just like Liverpool had an extremely slim 1 in 300 chance of winning their 11 games in a row. Perhaps Brendan Rodgers may do something that EPL managers have generally failed to do over the last 5 seasons, ie come up with a tactical wrinkle that leads to a consistently low rate of blocked shots. But until that point, I'm firmly in the "Liverpool will regress in respect of their percentage of blocked shots next season" camp. Needless to say, if that does happen then Liverpool's scoring percentage will correspondingly drop.
Liverpool's Defence
One aspect that Liverpool could improve on next season is in defence. Jamie Carragher, amongst others, has been critical of Liverpool's defence. [youtube id="ptF89E-MEAs" width="640" height="360"]
I'm not going to comment on whether Liverpool have been below par in preventing opposition chances, but my numbers tell me that Liverpool have defended the chances they conceded poorly, allowing 6 more goals than I would have expected based on the opposition shots.
When I take into account the shot placement, it is clear that Simon Mignolet has had a poor season (as far as shot stopping is concerned). There are 17 Premier League goalkeepers that have faced at least 100 on target shots this season, and Mignolet has been the worst performing keeper of them.
Based on the Shots on Target faced by the Belgian, I estimated that he should have conceded 38 goals, instead of the 44 (this excludes Liverpool’s 5 Own Goals allowed) that he did concede. These numbers result in a save ratio of 87% which places him at the bottom of my Goalkeeper Saves ranking table for 2013/14 (of those that have faced at least 100 shots).
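As a sketch of the arithmetic involved: the expected goals conceded figure is a sum of per-shot scoring probabilities, and the save ratio follows from shots faced and goals conceded. The shots-on-target figure of roughly 338 below is simply back-calculated from the 44 goals and 87% ratio quoted above, not a number taken from the article's data:

```python
# Expected goals conceded is the sum of the scoring probabilities of the
# on-target shots faced; the save ratio is saves divided by shots faced.
def expected_conceded(on_target_probs):
    return sum(on_target_probs)

def save_ratio(shots_on_target_faced, goals_conceded):
    return 1 - goals_conceded / shots_on_target_faced

ratio = save_ratio(338, 44)  # roughly the 87% quoted for Mignolet
```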
Although this information will make sorry reading for the Liverpool defence, the analytical work undertaken in the area of goalkeeper saves, such as this by Sander Ijstma, suggests that there is next to no correlation in how saves above or below expectation carry over from season to season. One possible reason for this is that the difference in shot stopping skills between professional top-flight goalkeepers is so small that the inevitable variance inherent in facing fewer than 200 shots per season drowns out any signal that there may be in the data.
As with Liverpool and their blocked shots, Simon Mignolet is a perfect example of this variance. Last season he was the 4th best goalkeeper in the EPL as he conceded just 50 goals compared to the expected 62, yet 12 months later he appears right at the foot of the table. Proof, if it was ever needed, of how volatility and variance play such an important part in football, even over a full 38 game season.
The Future
In the event that Liverpool do not win the league this season I think that they will really rue this lost opportunity. Leaving analytics aside, the media have latched on to the fact that this season was a terrific opportunity for them to lift the title, and there are several reasons why it might be more difficult for Liverpool to launch a similar title charge next term. Rather than repeat these often trotted out reasons, examples can be found here and here.
The reasons stated in those articles may or may not come to pass; however, I am confident that Liverpool have had a season in front of goal that they are unlikely to replicate in the near future. And, in the likely event that they don't win the league this season, their inability to take full advantage of the way the cards have fallen in terms of their incredible shooting numbers will surely be frustrating for the fans and those involved with the club.
Liverpool fans point to the fact that they would have settled for Top 4 at the start of the season. On my ratings, Liverpool deserved to win a Champions League place last season; it was just their rank bad shot conversion that prevented this happening. But the underlying numbers don't back up the huge improvement in performance that the two league tables would suggest at first glance. As is almost always the case when looking at two extremes, I would suggest that the true picture is somewhere in the middle.
Liverpool are certainly amongst the best four teams in the country, and have been for at least the last two seasons. But although they could count themselves unfortunate to have come up against a Man City team that themselves have overperformed in terms of ExpG this season (although not as much as Liverpool have done) I can't help but feel that this was a terrific chance for the Reds to win their first Premier League.
Finally, unless their chance quality improves even more next season I'll be surprised if Liverpool are able to post the same sort of goal totals that we have seen this season, in which case 2013/14 really was a missed opportunity.
Unless of course Man City do the unthinkable on Sunday at home to West Ham.
What follows is a synopsis of my presentation at the OptaPro Forum which was held in London on Thursday 6th February. This article was first published on the OptaPro blog.
This analysis was only possible due to the data provided to me by Opta.
Expected Goals
The use of Expected Goals (ExpG or xG) as a metric in football is becoming more widespread. Even though all current versions of this metric are proprietary and use varying calculation methods, the aim of any Expected Goals measure is simply to assign a numerical value to the chance of any given shot being scored.
The ExpG model that I and Constantinos Chappas developed produces a goal probability of approximately 3% for any shot that is struck from a central position outside the penalty area. Over the past year there has been recognition that shooting from long range is sub-optimal and by doing so a team is merely giving up other, more lucrative attacking options – think Tottenham Hotspur and Andros Townsend.
However, although I will admit that I had jumped to this conclusion in my own mind, I was conscious that the alternative options open to the player in possession had never been evaluated before (at least not publicly). This desire to establish baseline conversion rates for the different attacking options available to a player who was 25 – 35 yards from goal formed the basis for my Abstract submission.
Opta very kindly granted me access to the detailed match events for the English Premier League 2011/12 and 2012/13 seasons so that I could undertake this study and present my findings.
Those who are interested in the methodology I used can scroll to the bottom of this article, but for the sanity of any casual readers I will go straight to the findings of this study.
How many times was each option chosen?
Figure 1: Number of Opportunities for each FirstAction
Take ons were attempted far less often than any of the other possible attacking options. With the exception of internal passes, all the other FirstActions were attempted between 11% - 18% of the time. At least part of the reason why there were so many internal passes is that some of the passes that were destined for forward central, wings or the corners would have been intercepted within the rectangle. As I'm using the end co-ordinates of the pass, and intentions can't be measured, these passes fall into the internal pass category.
But how often was a goal scored from each option?
As each possible attacking option has not only a chance of the team in possession scoring, but also the move breaking down and the opposition quickly countering I wanted to look at the net goals scored. It seemed reasonable to assume that the choice of attacking option would have a bearing on how likely the opposition were to score.
To calculate the net goals scored figure for each option I deducted the number of goals scored by the opposition from the number of goals scored by the original attacking team (both within 30 seconds from FirstAction taking place).
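That calculation is simple enough to sketch; the counts below are invented, chosen only to illustrate the formula:

```python
# Net conversion: goals for minus goals against (both within 30 seconds
# of the FirstAction), divided by the number of Opportunities.
def net_conversion_rate(goals_for, goals_against, opportunities):
    return (goals_for - goals_against) / opportunities

# e.g. 41 goals scored and 2 conceded on the counter from 1000 Opportunities
rate = net_conversion_rate(41, 2, 1000)  # 0.039, ie 3.9%
```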
Figure 2: Net Conversion Rates for each FirstAction
Shooting is good?
Much to my amazement, the choice of shooting was actually the joint most efficient attacking option for the player in possession to take.
I had certainly expected that a forward central pass would be one of the more efficient attacking options, but due to the lowly 3% success rate of shooting from this area I had expected shots to appear much further down the table.
Eagle eyed readers will have noticed that the net conversion rate for shots of 3.9% is much higher than the 3% I quoted at the start of my piece. Was I wrong in my initial understanding?
In my dataset a goal was scored directly from the initial shot in 2.9% of cases, however this was further supplemented by goals being scored from another 1.2% of initial shots due to secondary situations, ie rebounds or players following up. From this figure of 4.1%, a value of 0.2% was deducted to reflect the amount of times that the opposition scored within 30 seconds of the initial shot. And so we arrive at a net conversion rate of 3.9%.
Another surprising aspect is that, on average, a team only scored 1 in every 40 times that they had possession of the ball in the area under analysis. Without having any real knowledge, I had expected the number to be higher, but I guess it shows that our perception and memories can be misleading – hence why we should use data to aid us in our decision making processes.
What is the significance of these findings?
If these results can be taken at face value then no longer can we criticise a player for “having a go” from outside the area. He’s actually attempting one of the most efficient methods to score from his current location.
The findings are even more important to weaker teams. It appears that the option where the stronger teams have less of an advantage over the weaker teams is actually the option with the highest expected value (along with the forward central pass). I say that shooting is the option that should favour weaker teams because those teams are less likely to possess a number of players that can thread a well weighted through ball or play an intricate pass. They are also likely to struggle to attack in sufficient numbers to create an overlap down one of the wings or to have as many players in support of the ball carrier as the stronger teams will have.
As well as it being logical that weaker teams could benefit more from this knowledge than stronger teams, I was able to demonstrate this by ranking the teams based on average league points per game and splitting the sample into two halves – Top Half and Bottom Half teams.
Figure 3: Net Conversion Rates for Top and Bottom Half Teams
As expected, Top Half teams had a higher conversion rate than their Bottom Half equivalents across all FirstActions. However, we can see that the drop off between the Top and Bottom Half teams is at its lowest for the Shot option and also that a Shot was actually the most efficient option for Bottom Half teams; whereas Forward Central Passes were the most efficient options for the Top Half teams.
Statistical Rigour
I wanted to satisfy myself that the differences in the conversion rates for shots over the other options (excluding forward central passes) were statistically significant. I also excluded backward passes from these tests as I don’t think players choose a backwards pass with the expectation that their team will score a goal from it.
The Null Hypothesis used was that there were no differences in net conversion rates between the proportions.
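The article doesn't specify the exact test behind the p-values, but a standard two-proportion z-test of that null hypothesis could be sketched as follows (the counts here are invented):

```python
from math import sqrt, erf

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for H0: the two underlying proportions are equal."""
    p_pool = (x1 + x2) / (n1 + n2)             # pooled proportion under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # standard normal CDF via erf, then two-sided tail probability
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# e.g. shots converting at 39/1000 versus internal passes at 20/1000
p = two_proportion_p_value(39, 1000, 20, 1000)
```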
Figure 4: p-values for significance in Net Conversion Rates
It can be seen that the Net Conversion Rates for shots are significantly different from those for corner passes, internal passes and wing passes. The only comparison that didn't reach the statistical significance threshold was shots against take ons, and it is my opinion that with a larger data sample these proportions would also emerge as significantly different.
At this stage we have demonstrated that shots from outside the penalty area are just as efficient as forward central passes, and more efficient than the other possible options. However, I need to address the fact that there could be bias within the net conversion rates of shots.
Possible Sources of Bias
I am aware of four possible sources of bias that could be at play here which could artificially inflate the conversion rates of shots over other options.
Team Quality
Score Effects / Game State
Lack of Defensive Data
Natural Selection
I will briefly discuss each of these in turn and address them where possible.
1 - Team Quality Bias
We have already seen that Bottom Half teams convert shots at a higher rate than other options and that Top Half teams also convert shots at a relatively high rate. There are no statistically significant differences in how Top and Bottom Half teams convert shots.
2 - Score Effect Bias
It is accepted that the styles and tactics teams use vary depending on the scoreboard. A team that is trailing are more likely to attack in numbers and a team that is leading may remain more compact. It could be possible that shots are being attempted, or converted, at different rates depending on the current score line.
To investigate whether this was the case I temporarily removed the Opportunities that occurred when there were two goals or more between the teams. I then analysed the remaining Opportunities by looking separately at those which arose when the game was tied and when the game was close (i.e. tied, or with just one goal between the two teams):
Figure 5: Net Conversion Rate at Close and Tied Game States
Shots in the entire sample were converted at 3.9%; this is the same conversion rate for Opportunities arising when the game was tied, and almost the same for Opportunities occurring when a team was leading by just a single goal.
It appears that shots are converted at broadly similar rates regardless of the current match score, and so there is no bias attributable to this source.
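The game-state filter described above can be sketched as a small classifier; the score arguments here are hypothetical field names, but the thresholds follow the definitions in the text (tied, close, or excluded when two or more goals separate the teams).

```python
def game_state(goals_for, goals_against):
    """Classify the score line at the moment of an Opportunity.

    'tied'     -- level scores (tied games are also part of the 'close' sample)
    'close'    -- exactly one goal between the teams
    'excluded' -- two or more goals between the teams (removed from the test)
    """
    diff = abs(goals_for - goals_against)
    if diff >= 2:
        return "excluded"
    return "tied" if diff == 0 else "close"
```

The close-game sample in Figure 5 would then be every Opportunity whose state is either "tied" or "close".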
3 - Lack of Defensive Data
The Opta dataset is very comprehensive in relation to on the ball events, but unfortunately I was not given any data that could help me ascertain the amount of defensive pressure on each Opportunity.
It could be possible that players shoot from Opportunities which have the greatest chance of their team scoring a goal and they only take other options such as passes to the wings or the corners when a shot is not possible. Conversely, there will also be occasions when a player could take a shot but opts instead to play a ball for an overlapping runner or attempts to thread a through ball inside the penalty area.
I do not have the data to be able to form an opinion on this either way, but am making the reader aware that this could be a potential source of bias.
4 - Natural Selection
In analysing this dataset I have no knowledge of the tactics each team attempted to use on match day or the instructions handed down by coaches and managers to the players. The final potential source of bias identified is the possibility that the only players who attempted a shot as a FirstAction were elite shooters (think Gareth Bale last season).
A player that is poor at long range shooting could be instructed not to shoot from an Opportunity or to always seek out the elite shooter. If this was the case, then the 3.9% Net Conversion Rate for shots that was observed in my dataset wouldn’t be representative of the entire sample of Premier League players.
I would counter that by saying that we know that it’s not just elite players that shoot. There will have been long range shots taken during the last two Premier League seasons by players who are not skilled in shooting. So this figure of 3.9% will already be diluted (to some unknown extent) by the conversion rate of non-elite long range shooters.
Even if non-elite shooters are expected to have a conversion rate below the average of 3.9%, the buffer in conversion rates enjoyed by shots over the alternative plays of wing, corner and internal passes and take ons is sufficiently large to suggest that taking a shot may even be the optimum FirstAction for non-elite shooters.
Conclusion
The purpose of this study was to establish benchmark conversion rates for each possible attacking option from a defined area of the pitch. I knew that I couldn't capture all the information that existed for each individual Opportunity, but given the extent of the dataset used in this analysis I assume that I have obtained a representative sample at a macro level.
Given the visibly low conversion rates from long range shots I was surprised at just how efficient (relatively speaking) this option was. This reinforces the fact that it is not enough to simply know the success rate for any option; we must also be able to reference that against the opportunity cost or success rates of the other possible options.
I am not suggesting that players should shoot on every attack; however I have demonstrated that we should be wary of criticising players for attempting to shoot, especially those in less technically gifted teams. This study has shown that where players have opted to shoot it was, generally, the most efficient option open to them.
Armed with the information in this study it is no surprise that Tottenham had the highest conversion rate of their Opportunities over the last two seasons. Gareth Bale would certainly have contributed to the success rate last season, but the North London side converted their Opportunities in both seasons at 3.8% and Bale did not have an exceptional shooting performance during the 2011/12 season.
The logic and methodology used in this study could be carried out on other areas of the pitch and thus benchmark conversion rates could be established as required.
Methodology
I followed the flow of individual match events and created possession chains. For this analysis I was only interested in possession chains which had an attacking event (ie pass, shot or take on) take place within the boundaries of the red rectangle as displayed in Figure 6. Where an attacking event did take place within the rectangle I labelled this an “Opportunity” and it forms part of this analysis.
The boundaries of the rectangle in Figure 6 can be described (in Opta parlance) as:
80 ≥ x ≥ 67
65 ≥ y ≥ 35
In plain English, I was concentrating on Opportunities which occurred 23 – 37 yards from goal and in the central third of the pitch.
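The zone test can be expressed as a simple filter. This is a sketch only; the event-type labels are my own shorthand for the attacking events named above, not Opta's actual field values.

```python
# Attacking events that can start an Opportunity (per the definition above)
ATTACKING_EVENTS = {"pass", "shot", "take on"}

def is_opportunity(event_type, x, y):
    """True if an attacking event occurred inside the Figure 6 rectangle.

    Opta pitch coordinates run 0-100 in both directions, attacking
    left to right, so x=67..80 is roughly 23-37 yards from goal and
    y=35..65 is the central third of the pitch.
    """
    return (event_type in ATTACKING_EVENTS
            and 67 <= x <= 80
            and 35 <= y <= 65)
```

Running every event in each possession chain through a filter like this is what produced the sample of almost 24,000 Opportunities.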
Over the two Premier League seasons there were almost 24,000 such Opportunities to analyse.
Figure 6: Rectangle showing boundaries for Opportunities
For my analysis I decided to have seven categories of attacking options based on the FirstAction carried out by the player within the rectangle. These were:
Internal Pass (red)
Corner Pass (yellow)
Wing Pass (black)
Forward Central Pass (blue)
Backwards Pass (orange)
Shot
Take on
To aid identification the colours noted above relate to the colours of the zone boundaries shown in Figure 7.
Figure 7: Boundaries of Five Passing Zones
To determine whether a goal was scored from each Opportunity I took the time of the FirstAction and allowed a period of 30 seconds to elapse to see if the attacking team scored a goal. I decided to use 30 seconds in an attempt to allow fluid passing movements to have a reasonable chance of concluding whilst trying not to contaminate the analysis with events from subsequent movements.
The reason I chose a time based cut off instead of following the move until the team lost possession is that a clearing header by a defender does not necessarily mean the end of an attacking movement, as the ball could drop at the feet of the original attacking team. Creating logic to determine when possession was really lost is challenging and subjective, and so I avoided this method.
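The 30-second attribution rule can be sketched as follows. The event tuples here are hypothetical (match seconds, team, event type); the real Opta feed is richer, but the logic is the same.

```python
WINDOW = 30  # seconds allowed for the attacking move to conclude

def goal_from_opportunity(opp_time, opp_team, events, window=WINDOW):
    """True if opp_team scored within `window` seconds of the FirstAction.

    events: iterable of (match_seconds, team, event_type) tuples.
    """
    return any(team == opp_team
               and event == "goal"
               and 0 <= t - opp_time <= window
               for t, team, event in events)

# Hypothetical match fragment: a pass at 1210s, then a home goal at 1225s
events = [(1210, "home", "pass"), (1225, "home", "goal")]
```

In that fragment an Opportunity at 1200 seconds would be credited with a goal, while one at 1190 seconds would not, since the goal falls 35 seconds later.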
Following on from Daniel Altman’s excellent piece on the scoring rate of substitutes I thought I would undertake my own analysis on the impact of substitutes.
The methodology I will use is slightly different to that employed by Altman in his article. I will use the Big 5 European leagues for last season (2012/13), and I will study the goal scoring rates for all players that scored at least 6 league goals last season.
The use of this filter gives me a list of 268 players that scored a combined total of 2,782 goals in 617,331 minutes of playing time. This equates to an average scoring rate of 0.41 goals Per 90 minutes for our sample of players.
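That scoring rate is a straightforward Per90 calculation, sketched below using the totals quoted above.

```python
def goals_per90(goals, minutes):
    """Goals scored per 90 minutes of playing time."""
    return goals * 90 / minutes

# The sample's overall rate: 2,782 goals in 617,331 minutes
rate = goals_per90(2782, 617_331)   # ~0.41 Goals Per90
```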
It is a well-documented fact that more goals are scored in the second half of games than the first half, and the split in the Big 5 leagues last season was no different, with 44% of all goals scored in the first half and 56% in the second half.
The following is the distribution of goals in 5 minute time intervals for the 5 leagues last season:
We can see that, generally, goal scoring rates increase with the time elapsed in the match. For my purposes the minutiae of the goal scoring rates aren't important; we just need confirmation that this trend exists in my data sample.
In his piece, Daniel Altman found that forwards coming on as substitutes scored at a higher rate than starting forwards. But when we consider that more goals are scored in the second half than the first half, this is no great surprise: substitutes spend a greater proportion of their playing time in the second half, when goal expectation is higher, than starting players do.
So what do we take from this?
The fact that substitutes have a higher scoring rate means that you can’t directly compare Goals Per90 figures between players that regularly start and those who make frequent substitute appearances. Very simply, the substitute will have his numbers inflated and we would expect his Per90 numbers to drop in the event that he was handed a starting position.
However, Altman didn’t stop there and he found that “fatigue among forwards was a more powerful force than fatigue among defenders”. That sentence struck a chord with me and I wanted to investigate the general phenomenon of fatigue in footballers a little further.
Hierarchy of Goals Per90
We have established that the longer a match goes on, the greater the goal expectation. This is one of the reasons why substitutes score at a higher rate than starting players. By extension of this logic, we would therefore expect players who are substituted off to score at a lower rate Per90 than players who play the full 90 minutes.
Not only would the substituted player be swimming against the tide of playing at least as many minutes in the first half, when goal expectation is at its lowest, as in the second, but the fact that he was substituted may also indicate that he hadn't played a great game thus far.
That second suggestion certainly won’t be true all the time. The player may be injured, withdrawn for tactical reasons or just tired but it seems reasonable to assume that some of this cohort will have irked the manager enough with their performance to be substituted.
Even ignoring the suggestion that the substituted player has been having a less than stellar performance, due to the increasing goal expectation it is reasonable to assume that the hierarchy of Per90 goal scoring rates would rank as follows:
Substitutes_On
Full 90 minutes Players
Substitutes_Off
Now that we've finalised our hypothesis, how does it compare with what actually happened last season?
Each game that our 268 players took part in last season was divided into three categories: Substitutes_On, Full 90 and Substitutes_Off, and I totalled the number of goals and minutes that the group of 268 players as a whole racked up in each category.
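The three-way split and the aggregation can be sketched as below. The appearance records are hypothetical (minute on, minute off, goals) tuples, with 0 and 90 denoting the start and end of the match.

```python
def appearance_category(minute_on, minute_off):
    """Classify a single appearance into one of the three groups."""
    if minute_on > 0:
        return "Substitutes_On"
    if minute_off < 90:
        return "Substitutes_Off"
    return "Full 90"

def per90_by_category(appearances):
    """Aggregate Goals Per90 for each category.

    appearances: iterable of (minute_on, minute_off, goals) tuples.
    """
    totals = {}
    for on, off, goals in appearances:
        cat = appearance_category(on, off)
        g, m = totals.get(cat, (0, 0))
        totals[cat] = (g + goals, m + (off - on))
    return {cat: g * 90 / m for cat, (g, m) in totals.items()}
```

Pooling minutes and goals before dividing, rather than averaging each player's individual Per90, weights every minute played equally across the sample.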
Big 5 Leagues 2012/13
As expected, substitutes coming on scored at the highest rate of our three groups, at a clip of 0.65 Goals Per90. However, players that played the full 90 minutes actually posted the lowest figure of 0.38, with the players that were substituted off sandwiched in between at 0.42 Goals Per90.
I think this is a super interesting finding and it appears that Daniel Altman was spot on with his suggestion of fatigue being a big issue in the rate that forwards score goals. My sample doesn’t specifically just include forwards, but as it includes the leading goal scorers it will obviously be forward biased.
It looks like the fatigue factor is so strong that it is even able to overcome the fact that more goals are scored in the second half than the first half. We have shown that a player who starts the games and is withdrawn scores at a higher rate Per90 than a player who completes the full 90 minutes.
When you think about this, it is common sense. Players tire and it’s better to replace them with fresh legs, but I’ve never seen the impact of tiredness quantitatively assessed before. I have no doubt that clubs and organisations like Prozone have data that records the physical drop off in player performance due to fatigue but I am surprised that the impact is so strong for goal scorers that it outweighs the benefit of playing the entire second half of a game with its increasing goal expectation.
I’m sure that if we analysed the actual minutes that each player played and their scoring returns for those minutes we could remove the second half scoring bias and calculate exactly how much more likely a fresh player is to score than a player that has played the entire game. However, I’m going to stop short of these calculations in this article as that would require another level of data analysis.
I am conscious that the above findings are based on just one season of data, so to give me some comfort as to the integrity of those findings I looked at each of the 5 league separately to see how they performed individually.
Encouragingly, all 5 leagues follow exactly the same trend. The substitutes coming on comfortably post the highest Per90 scoring rates; this group have the double benefit of being fresh as well as spending proportionately more of their playing minutes in higher goal expectation periods of the game. The players that were withdrawn have a slightly higher Per90 figure than the footballers that played the full 90 minutes, with the benefit of freshness outweighing the back ended scoring bias.
I therefore feel that we can conclude that, not only do substitutes score at a higher rate than starting players but that the players who are subbed off score at a higher clip than their teammates that play the full 90 minutes.
What are the implications of this?
I can think of at least two implications. The first concerns comparing players' scoring rates: it was presumed that substitutes' scoring rates were inflated due to the back ended nature of the time they spent on the pitch, and Daniel Altman confirmed this in his article. However, we also need to be equally aware of players who were substituted off, as they too will tend to post higher Per90 numbers than players who play the full match duration.
The second impact is much more important. Unless there is a large difference in quality between the starting 11 and the substitutes, any manager that doesn't use all 3 substitutes is giving up some expected value. And by "using substitutes" I don't mean introducing them in the 85th minute or in injury time simply to run down the clock.
I find myself agreeing with Altman’s almost throwaway suggestion that players should be substituted early in the game. Not only do we get the boost of the player coming on having fresh legs but we also reduce the negative impact of the fatigue of the substituted player as the change is being made earlier than "normal".
I realize that managers may need to hold a substitute back to cover the chance of injury later in the game, but leaving that aside there really should be no reason why managers don’t ensure that they empty the bench in enough time to get the full benefit of the fresh player.
When are Substitutes used?
After establishing that managers should properly balance the trade off between ensuring they can finish the game with 11 players and obtaining maximum benefit from their substitutes, I found myself wondering how subs are currently used.
Here is the data from the first 20 Game Weeks of the 2013/14 Premier League season showing the percentage of possible substitutes that have played a minimum amount of minutes.
2013/14 Premier League (Weeks 1 - 20)
The blue plots are the first subs used by Premier League managers; 50% of all first substitutes played at least 30 minutes. The noticeable drop off at the 45 minute mark is interesting, clearly showing the reluctance to substitute a player in the final minute of the first half.
The red plots represent a team's second substitute. 50% of second substitutes play less than 20 minutes, and only approx 15% of second substitutes play at least 30 minutes.
We can see from the green plots that only 50% of the time does a third substitute play 6 minutes or more, and 1 in 5 managers waits until the 89th minute to make their last change. In fact, during the first 20 weeks of the Premier League season a total of 98 possible substitutions went unused. I know managers have a desire to finish the match with a full complement of players, but this prudence carries the opportunity cost of not making maximum use of fresh legs against a tiring opposition.
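The "minimum minutes played" percentages plotted above reduce to a simple survival-style calculation. A sketch, where `sub_minutes` would be a hypothetical list of minutes played by, say, every third substitute used in the sample:

```python
def pct_playing_at_least(sub_minutes, threshold):
    """Percentage of substitutes who played at least `threshold` minutes."""
    return 100 * sum(m >= threshold for m in sub_minutes) / len(sub_minutes)
```

Evaluating this at each whole-minute threshold from 0 to 90 for the first, second and third substitutes would reproduce the blue, red and green curves.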
Other Positions
In this article I have concentrated on scoring players, primarily forwards. Perhaps fatigue affects forwards more than other positions, but it's more likely the case that we are better able to measure a goal scorer’s output and thus comment on their performances.
Would it be far-fetched to assume that a central midfielder would suffer less fatigue than a forward? I don’t think so, and I assume that the clubs would be in the position to know how much physical fatigue each player suffers during a full 90 minutes. But are they in a position to be able to quantify how much that level of fatigue actually affects the chance of his team scoring a goal or conceding a goal? I have my thoughts on this, but I just don’t know.
Am I advocating that players should be substituted on the 30th minute, the 45th minute or the 60th minute? At this stage I cannot answer that; as stated above, I would need to undertake more detailed analysis to assess the fatigue impact on a minute by minute basis to arrive at a definitive answer. However, this analysis has shown that the fatigue impact is large enough to overcome the difference in scoring rates between the two halves, so there is really no reason for a manager not to use all of his available substitutions.
Indeed, the use of substitutes is just another facet of the game that good managers will use to their advantage, whilst poor managers will not realise the tactical advantage that smart substitutions could give them.
EDIT (16/01/14 at 10:39) - A few comments have suggested that there may be a forward bias in the players that are substituted off. Here is the split of only the starting forwards in my sample:
Even within this starting forwards group, the players that are substituted off have a higher Per90 rate than the forwards that play the entire 90 minutes. Any more granular analysis would involve identifying individual players to see how they perform when substituted off compared to when they played the full 90 minutes, but I would be concerned that we would be slicing the data very thinly at that point.
ADDITIONAL EDIT - (16/01/14 at 12:40)
To address the suggestion that the data may be contaminated by players with a higher Goals Per90 figure simply being more likely to be substituted than those with a lower figure, I divided my data set into two groups.
I ranked all 268 players by their Goals Per90 figure and divided the table in half, creating a top half that includes all the marquee strikers and a bottom half of players who reached at least 6 goals but weren't prolific goal scorers.
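A median split of this kind can be sketched as below; the player tuples are hypothetical (name, goals, minutes) records.

```python
def split_by_per90(players):
    """Rank players by Goals Per90 and return (top half, bottom half).

    players: iterable of (name, goals, minutes) tuples.
    """
    ranked = sorted(players, key=lambda p: p[1] * 90 / p[2], reverse=True)
    mid = len(ranked) // 2
    return ranked[:mid], ranked[mid:]
```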
Even when looking solely at the bottom half of this table (so the players that aren't prolific goal scorers), this group of players also show that they have a higher scoring rate when they are subbed off than when they play the full 90 minutes.