Player Positional Tracker: Arsenal v Crystal Palace

Arsenal 2 vs 1 Crystal Palace (16th August 2014)

A few things I noticed are listed below, but I am sure that people will have their own opinions on what the viz shows.

  • Other than Gibbs (and then Monreal), Arsenal were very much orientated towards the right side of the pitch. Cazorla was notionally on the right side of midfield, but the Spaniard played very centrally. Palace facilitated this as Puncheon (their right midfielder) also played narrow and central
  • Chamakh played exceptionally deep during the second half - he was behind his midfield for large parts of the second half
  • Arteta played a very disciplined role. He never moved outside the centre circle on this image (Note, we are not suggesting that he didn't move outside the centre circle all day!!)
  • Cazorla and Ramsey played very close to each other, with Cazorla always just in positions that were slightly closer to the Crystal Palace goal

Gifolution Heatmaps: Four Years of Wayne Rooney

Wayne Rooney's attacking heatmap, using Opta data, over the past 4 league seasons for Man United.

BTW, as with all of these heatmaps, a huge thanks goes to Constantinos Chappas who created the code that enables them to be produced.

2010/11 to 2013/14 Premier League Seasons

 

Rooney_HeatMaps_2010-2013

 

2010/11 and 2011/12 were quite similar with Rooney playing the role of the main Centre Forward on the United team during these two seasons.  Very rarely was he required to play deep enough to make many passes in his own defensive half during these two seasons.

2012/13

Robin van Persie joined United at the start of the 2012/13 season, and we can clearly see the different role that Rooney was asked to play from that point on.

His heatmap is much more widely spread, especially defensively.  In this 2012/13 season he dropped deeper as his defensive duties increased and he took on a more creative role.  No longer was Rooney required to be the main goalscorer.

He notched up more than 2 Key Passes per90 in this term, up from the 1.5 that he had during the 2011/12 campaign.  Unsurprisingly, his shots volume dropped to 3.6 per90 from a mark of 4.5 the year before.

2013/14

Rooney's attacking heatmap for last term is broadly similar to that of the 2012/13 season; however, it appears that he didn't function as much in the very centre of the pitch as he had done for the previous 3 years.  We can see that he has some heatspots, signifying a lot of attacking actions, just to the right and left of centre during the 2013/14 season.

Whether this was a deliberate decision driven by Moyes to actively find pockets of space or simply a quirk of one season's data I am not sure.  Incidentally, the location of his involvements did not change much after Mata's arrival, as his heatmaps before and after the end of January (when Mata arrived) show very little difference.

As has been posted previously here on Statsbomb, Wayne Rooney has been asked to play differing roles during his time at Old Trafford.  Before the arrival of RVP he was the main goal getter for United, but that changed at the start of the 2012/13 season.  However, regardless of the role that he is asked to play he has tended to deliver for the Red Devils.

Mythbusting: Is Long Range Shooting a Bad Option?

What follows is a synopsis of my presentation at the OptaPro Forum which was held in London on Thursday 6th February.  This article was first published on the OptaPro blog.

This analysis was only possible due to the data provided to me by Opta.


Expected Goals

The use of Expected Goals (ExpG or xG) as a metric in football is becoming more widespread.  Even though all current versions of this metric are proprietary and use varying calculation methods, the aim of any Expected Goal measure is simply to assign numerical values to the chances of any given shot being scored.

The ExpG model that I and Constantinos Chappas developed produces a goal probability of approximately 3% for any shot that is struck from a central position outside the penalty area.  Over the past year there has been recognition that shooting from long range is sub-optimal and by doing so a team is merely giving up other, more lucrative attacking options – think Tottenham Hotspur and Andros Townsend.

However, although I will admit that I had jumped to this conclusion in my own mind, I was conscious that the alternative options open to the player in possession had never been evaluated before (at least not publicly).  This desire to establish baseline conversion rates for the different attacking options available to a player who was 25 – 35 yards from goal formed the basis for my Abstract submission.

Opta very kindly granted me access to the detailed match events for the English Premier League 2011/12 and 2012/13 seasons so that I could undertake this study and present my findings.

Those who are interested in the methodology I used can scroll to the bottom of this article, but for the sanity of any casual readers I will go straight to the findings of this study.

How many times was each option chosen?

Totals

Figure 1: Number of Opportunities for each FirstAction

Take ons were attempted far less often than any of the other possible attacking options.  With the exception of internal passes, all the other FirstActions were attempted between 11% and 18% of the time.  At least part of the reason why there were so many internal passes is that some of the passes that were destined for forward central, wings or the corners would have been intercepted within the rectangle.  As I’m using the end co-ordinates of the pass, and intentions can’t be measured, these passes fall into the internal pass category.

But how often was a goal scored from each option?

As each possible attacking option has not only a chance of the team in possession scoring, but also the move breaking down and the opposition quickly countering I wanted to look at the net goals scored.  It seemed reasonable to assume that the choice of attacking option would have a bearing on how likely the opposition were to score.

To calculate the net goals scored figure for each option I deducted the number of goals scored by the opposition from the number of goals scored by the original attacking team (both within 30 seconds from FirstAction taking place).

Conv

Figure 2: Net Conversion Rates for each FirstAction

Shooting is good?

Much to my amazement, the choice of shooting was actually the joint most efficient attacking option for the player in possession to take.

I had certainly expected that a forward central pass would be one of the more efficient attacking options, but due to the lowly 3% success rate of shooting from this area I had expected shots to appear much further down the table.

Eagle eyed readers will have noticed that the net conversion rate for shots of 3.9% is much higher than the 3% I quoted at the start of my piece.  Was I wrong in my initial understanding?

In my dataset a goal was scored directly from the initial shot in 2.9% of cases, however this was further supplemented by goals being scored from another 1.2% of initial shots due to secondary situations, ie rebounds or players following up.  From this figure of 4.1%, a value of 0.2% was deducted to reflect the amount of times that the opposition scored within 30 seconds of the initial shot.  And so we arrive at a net conversion rate of 3.9%.
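The arithmetic behind that 3.9% figure can be written out as a short Python sketch.  The function and argument names are my own and purely illustrative; the three percentages are the ones quoted above for shots.

```python
def net_conversion_rate(direct_pct, secondary_pct, conceded_pct):
    """Net conversion rate (in %) for one FirstAction.

    direct_pct    -- % of Opportunities scored directly from the FirstAction
    secondary_pct -- % scored from secondary situations (rebounds, follow-ups)
    conceded_pct  -- % where the opposition scored within 30 seconds
    """
    return direct_pct + secondary_pct - conceded_pct


# Figures quoted in the text for shots
shots_net = net_conversion_rate(2.9, 1.2, 0.2)
print(round(shots_net, 1))
```

The same function applies to any of the other FirstActions once its three component rates are known.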

Another surprising aspect is that, on average, a team only scored 1 in every 40 times that they had possession of the ball in the area under analysis.  Without having any real knowledge, I had expected the number to be higher, but I guess it shows that our perception and memories can be misleading – hence why we should use data to aid us in our decision making processes.

What is the significance of these findings?

If these results can be taken at face value then no longer can we criticise a player for “having a go” from outside the area.  He’s actually attempting one of the most efficient methods to score from his current location.

The findings are even more important to weaker teams.  It appears that the option where the stronger teams have less of an advantage over the weaker teams is actually the option with the highest expected value (along with the forward central pass).  I say that shooting is the option that should favour weaker teams because those teams are less likely to possess a number of players that can thread a well weighted through ball or play an intricate pass.  They are also likely to struggle to attack in sufficient numbers to create an overlap down one of the wings or to have as many players in support of the ball carrier as the stronger teams will have.

As well as it being logical that weaker teams could benefit more from this knowledge than stronger teams, I was able to demonstrate this by ranking the teams based on average league points per game and splitting the sample into two halves – Top Half and Bottom Half teams.

Bars2

Figure 3: Net Conversion Rates for Top and Bottom Half Teams

As expected, Top Half teams had a higher conversion rate than their Bottom Half equivalents across all FirstActions.  However, we can see that the drop off between the Top and Bottom Half teams is at its lowest for the Shot option and also that a Shot was actually the most efficient option for Bottom Half teams; whereas Forward Central Passes were the most efficient options for the Top Half teams.

Statistical Rigour

I wanted to satisfy myself that the differences in the conversion rates for shots over the other options (excluding forward central passes) were statistically significant.  I also excluded backward passes from these tests as I don’t think players choose a backwards pass with the expectation that their team will score a goal from it.

The Null Hypothesis used was that there were no differences in net conversion rates between the proportions.

pvalues

Figure 4: p-values for significance in Net Conversion Rates

It can be seen that the Net Conversion Rates for shots are significantly different from those for corner passes, internal passes and wing passes.  The only comparison that didn’t reach the threshold for statistical significance was shots against take ons, and it is my opinion that with a larger data sample these proportions would also emerge as significantly different.
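For readers who want to run this kind of comparison themselves, here is a minimal sketch of one common way to test whether two conversion rates differ: a pooled two-proportion z-test.  This is an assumption on my part about a suitable test, not a description of the exact procedure behind Figure 4, and it treats each net rate as a simple proportion.  The counts in the example are hypothetical and are not taken from the Opta dataset.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test of H0: both groups share one conversion rate.

    Returns (z, p_value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis of no difference
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only:
# 120 goals from 3,000 shots vs 60 goals from 3,000 wing passes
z, p = two_proportion_z_test(120, 3000, 60, 3000)
print(f"z = {z:.2f}, p = {p:.6f}")
```

A small p-value would lead us to reject the Null Hypothesis that the two options convert at the same rate.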

At this stage we have demonstrated that shots from outside the penalty area are just as efficient as forward central passes, and more efficient than the other possible options.  However, I need to address the fact that there could be bias within the net conversion rates of shots.

Possible Sources of Bias

I am aware of four possible sources of bias that could be at play here which could artificially inflate the conversion rates of shots over other options.

  1. Team Quality
  2. Score Effects / Game State
  3. Lack of Defensive Data
  4. Natural Selection

I will briefly discuss each of these in turn and address them where possible.

1 - Team Quality Bias

We have already seen that Bottom Half teams convert shots at a higher rate than other options and that Top Half teams also convert shots at a relatively high rate.  There are no statistically significant differences in how Top and Bottom Half Teams convert shots.

2 - Score Effect Bias

It is accepted that the styles and tactics teams use vary depending on the scoreboard.  A team that is trailing is more likely to attack in numbers and a team that is leading may remain more compact.  It could be possible that shots are being attempted, or converted, at different rates depending on the current score line.

To investigate if this was the case I temporarily removed the Opportunities that occurred when there were two goals or more between the teams.  I then analysed the remaining Opportunities by looking separately at Opportunities which arose when the game was tied and when the game was close (ie tied or just one goal between the two teams):

GS

Figure 5: Net Conversion Rate at Close and Tied Game States

Shots in the entire sample were converted at 3.9%; this is the same conversion rate as for Opportunities arising when the game is tied and almost the same as for Opportunities occurring when a team is leading by just a single goal.

It appears that shots are converted at broadly similar rates regardless of the current match score, and so there is no bias attributable to this source.

3 - Lack of Defensive Data

The Opta dataset is very comprehensive in relation to on the ball events, but unfortunately I was not given any data that could help me ascertain the amount of defensive pressure on each Opportunity.

It could be possible that players shoot from Opportunities which have the greatest chance of their team scoring a goal and they only take other options such as passes to the wings or the corners when a shot is not possible.  Conversely, there will also be occasions when a player could take a shot but opts instead to play a ball for an overlapping runner or attempts to thread a through ball inside the penalty area.

I do not have the data to be able to form an opinion on this either way, but am making the reader aware that this could be a potential source of bias.

4 - Natural Selection

In analysing this dataset I do not have knowledge of the tactics that each team attempted to use on match day or the instructions that were handed down by coaches and managers to the players.  The final potential source of bias identified is the possibility that the only players that attempted to shoot as a FirstAction were elite shooters (think Gareth Bale last season).

A player that is poor at long range shooting could be instructed not to shoot from an Opportunity or to always seek out the elite shooter.  If this was the case, then the 3.9% Net Conversion Rate for shots that was observed in my dataset wouldn’t be representative of the entire sample of Premier League players.

I would counter that by saying that we know that it’s not just elite players that shoot.  There will have been long range shots taken during the last two Premier League seasons by players who are not skilled in shooting.  So this figure of 3.9% will already be diluted (to some unknown extent) by the conversion rate of non-elite long range shooters.

Even if non-elite shooters are expected to have a conversion rate below the average of 3.9%, the magnitude of the buffer in conversion rates enjoyed by shots over the alternative plays of wing, corner and internal passes and take ons is sufficiently large to suggest that taking a shot may even be the optimum FirstAction for non-elite shooters.

Conclusion

The purpose of this study was to establish benchmark conversion rates for each possible attacking Opportunity given a defined area of the pitch.   I knew that I couldn’t capture all the information that was existent for each individual Opportunity but given the extent of the dataset used in this analysis I assume that I have obtained a representative sample on a macro level.

Given the visibly low conversion rates from long range shots I was surprised at just how efficient (relatively speaking) this option was.  This reinforces the fact that it is not enough to simply know the success rate for any option; we must also be able to reference that against the opportunity cost or success rates of the other possible options.

I am not suggesting that players should shoot on every attack; however I have demonstrated that we should be wary of criticising players for attempting to shoot, especially those in less technically gifted teams.  This study has shown that where players have opted to shoot it was, generally, the most efficient option open to them.

Armed with the information in this study it is no surprise that Tottenham had the highest conversion rate of their Opportunities over the last two seasons.  Gareth Bale would certainly have contributed to the success rate last season, but the North London side converted their Opportunities in both seasons at 3.8% and Bale did not have an exceptional shooting performance during the 2011/12 season.

The logic and methodology used in this study could be carried out on other areas of the pitch and thus benchmark conversion rates could be established as required.

Methodology

I followed the flow of individual match events and created possession chains.  For this analysis I was only interested in possession chains which had an attacking event (ie pass, shot or take on) take place within the boundaries of the red rectangle as displayed in Figure 6.  Where an attacking event did take place within the rectangle I labelled this an “Opportunity” and it forms part of this analysis.

The boundaries of the rectangle in Figure 6 can be described (in Opta parlance) as:

80 ≥ x ≥ 67

65 ≥ y ≥ 35

In plain English, I was concentrating on Opportunities which occurred 23 – 37 yards from goal and in the central third of the pitch.
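As a minimal sketch, the boundary test above can be expressed as a single function.  The function name is hypothetical, and I am assuming Opta-style co-ordinates that run from 0 to 100 on each axis, with the attacking team shooting towards x = 100.

```python
def is_opportunity_location(x, y):
    """True if an event's (x, y) falls inside the rectangle under analysis.

    Assumes co-ordinates on a 0-100 scale, attacking towards x = 100.
    Boundaries are inclusive, matching 80 >= x >= 67 and 65 >= y >= 35.
    """
    return 67 <= x <= 80 and 35 <= y <= 65


print(is_opportunity_location(75, 50))  # central, inside the rectangle
print(is_opportunity_location(85, 50))  # too close to goal
print(is_opportunity_location(70, 20))  # too wide
```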

Over the two Premier League seasons there were almost 24,000 such Opportunities to analyse.

redzone

Figure 6: Rectangle showing boundaries for Opportunities

For my analysis I decided to have seven categories of attacking options based on the FirstAction carried out by the player within the rectangle.  These were:

  • Internal Pass  (red)
  • Corners Pass (yellow)
  • Wing Pass (black)
  • Forward Central Pass (blue)
  • Backwards Pass (orange)
  • Shot
  • Take on

To aid identification the colours noted above relate to the colours of the zone boundaries shown in Figure 7.

colourzones

Figure 7: Boundaries of Five Passing Zones

To determine whether a goal was scored from each Opportunity I took the time of the FirstAction and allowed a period of 30 seconds to elapse to see if the attacking team scored a goal.  I decided to use 30 seconds in an attempt to allow fluid passing movements to have a reasonable chance of concluding whilst trying not to contaminate the analysis with events from subsequent movements.
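The 30 second rule can be sketched as follows.  This is an illustration under stated assumptions rather than the code used for the study: the `Goal` class, the function names and the toy data are all hypothetical, every event is assumed to come from a single match with the clock expressed in seconds, and only the first goal inside the window is counted.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 30


@dataclass
class Goal:
    time_s: float  # match clock in seconds
    team: str


def net_outcome(first_action_time_s, attacking_team, goals):
    """+1 if the attacking team scores within the window,
    -1 if the opposition does, 0 if nobody scores."""
    for goal in sorted(goals, key=lambda g: g.time_s):
        if 0 <= goal.time_s - first_action_time_s <= WINDOW_SECONDS:
            return 1 if goal.team == attacking_team else -1
    return 0


def net_rate_from_events(opportunities, goals):
    """opportunities: list of (first_action_time_s, attacking_team) tuples."""
    total = sum(net_outcome(t, team, goals) for t, team in opportunities)
    return total / len(opportunities)


# Toy data for illustration only
goals = [Goal(100, "A"), Goal(500, "B")]
opportunities = [(80, "A"), (95, "A"), (480, "A"), (1000, "B")]
print(net_rate_from_events(opportunities, goals))
```

In the toy data, two Opportunities end in a goal for team A, one ends in a goal for the opposition and one ends in nothing, so the net figure is one goal from four Opportunities.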

The reason that I chose a time based cut off instead of following the move until the team lost possession is that a clearing header by a defender does not necessarily mean the end of an attacking movement as the ball could drop at the feet of the original attacking team.  Creating logic to determine when possession was really lost is challenging and subjective, and so I avoided this method.

Free kicks were excluded from this analysis.

Smart Use of Substitutes Can Make A Difference

Following on from Daniel Altman’s excellent piece on the scoring rate of substitutes I thought I would undertake my own analysis on the impact of substitutes.

The methodology I will use is slightly different to that employed by Altman in his article. I will use the Big 5 European leagues for last season (2012/13), and I will study the goal scoring rates for all players that scored at least 6 league goals last season.

The use of this filter gives me a list of 268 players that scored a combined total of 2,782 goals in 617,331 minutes of playing time.  This equates to an average scoring rate of 0.41 goals Per 90 minutes for our sample of players.
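For anyone unfamiliar with the Per90 convention, the calculation is simply goals divided by minutes, scaled up to 90 minutes.  A minimal sketch, using the totals quoted above (the function name is my own):

```python
def per90(goals, minutes):
    """Scoring rate expressed per 90 minutes played."""
    return goals * 90 / minutes


# Totals for the 268 players in the sample
print(round(per90(2782, 617331), 2))
```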

At this stage it is a well-documented fact that more goals are scored in the second half of games than the first half, and the apportionment in the Big 5 leagues last season was no different with just 44% of all goals scored in the first half and 56% in the second half.

The following is the distribution of goals in 5 minute time intervals for the 5 leagues last season:

GoalDistribution

We can see that, generally, the goal scoring rates increase in line with the time elapsed during the match.  For my purposes, the minutiae of the goal scoring rates aren’t important; instead we just need confirmation that this trend does exist in my data sample.

In his piece, Daniel Altman found that forwards coming on as substitutes scored at a higher rate than starting forwards.  But when we consider that more goals are scored in the second half than the first half then this is no great surprise.  Compared with a starting player, a substitute will spend a greater proportion of his playing time in the second half, when goal expectation is higher.

So what do we take from this?

The fact that substitutes have a higher scoring rate means that you can’t directly compare Goals Per90 figures between players that regularly start and those who make frequent substitute appearances.  Very simply, the substitute will have his numbers inflated and we would expect his Per90 numbers to drop in the event that he was handed a starting position.

However, Altman didn’t stop there and he found that “fatigue among forwards was a more powerful force than fatigue among defenders”.  That sentence struck a chord with me and I wanted to investigate the general phenomenon of fatigue in footballers a little further.

Hierarchy of Goals Per90

We have established that the longer a match goes on the greater the goal expectation.  This is one of the reasons why substitutes score at a higher rate than starting players.  So, by extension of this logic we would therefore expect players who are substituted to score less Per90 than players who played the full 90 minutes.

Not only would the substituted player be swimming against the tide of playing at least as many first half minutes as second half minutes when the goal expectation is at its lowest, but the fact that he is substituted may also indicate that he hasn’t played a great game thus far.

That second suggestion certainly won’t be true all the time.  The player may be injured, withdrawn for tactical reasons or just tired but it seems reasonable to assume that some of this cohort will have irked the manager enough with their performance to be substituted.

Even ignoring the suggestion that the substituted player has been having a less than stellar performance,  due to the increasing goal expectation it is reasonable to assume that the hierarchy of Per90 goal scoring rates would rank as follows:

  • Substitutes_On
  • Full 90 minutes Players
  • Substitutes_Off

Now we've finalised our hypothesis, how does that compare with what actually happened last season?

Each game that our 268 players took part in last season was divided into the 3 categories: Substitutes_On, Full 90 and Substitutes_Off and I totaled the number of goals and minutes that the group of 268 players as a whole racked up in each category.
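The grouping described above can be sketched in a few lines of Python.  The function names and the sample appearances are hypothetical; in particular, treating a player who comes on and is later withdrawn as Substitutes_On is my own simplifying assumption.

```python
from collections import defaultdict


def classify(started, finished):
    """Assign one player-game to one of the three categories."""
    if not started:
        return "Substitutes_On"
    return "Full 90" if finished else "Substitutes_Off"


def per90_by_category(appearances):
    """appearances: iterable of (category, minutes, goals), one per player-game.

    Minutes and goals are totalled per category before the rate is taken,
    so the result is the rate for the group as a whole.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [minutes, goals]
    for category, minutes, goals in appearances:
        totals[category][0] += minutes
        totals[category][1] += goals
    return {cat: g * 90 / m for cat, (m, g) in totals.items() if m > 0}


# Made-up appearances for illustration only
sample = [
    (classify(True, True), 90, 1),
    (classify(True, True), 90, 0),
    (classify(True, False), 60, 1),
    (classify(False, True), 30, 1),
]
print(per90_by_category(sample))
```

Totalling first and dividing once avoids giving a 5 minute cameo the same weight as a full match, which would happen if each appearance's Per90 figure were averaged instead.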

Big 5 Leagues 2012/13

Big5

As expected, substitutes coming on scored at the highest rate of our three groups.  This group scored at a clip of 0.65 Goals Per90.  However, players that played the full 90 minutes actually posted the lowest Per90 number of 0.38, with the players that were substituted off sandwiched in between at a rate of 0.42 Goals Per90.

I think this is a super interesting finding and it appears that Daniel Altman was spot on with his suggestion of fatigue being a big issue in the rate that forwards score goals.  My sample doesn’t specifically just include forwards, but as it includes the leading goal scorers it will obviously be forward biased.

It looks like the fatigue factor is so strong that it is even able to overcome the fact that more goals are scored in the second half than the first half.  We have shown that a player who starts the game and is withdrawn scores at a higher rate Per90 than a player who completes the full 90 minutes.

When you think about this, it is common sense.  Players tire and it’s better to replace them with fresh legs, but I’ve never seen the impact of tiredness quantitatively assessed before.  I have no doubt that clubs and organisations like Prozone have data that records the physical drop off in player performance due to fatigue but I am surprised that the impact is so strong for goal scorers that it outweighs the benefit of playing the entire second half of a game with its increasing goal expectation.

I’m sure that if we analysed the actual minutes that each player played and their scoring returns for those minutes we could remove the second half scoring bias and calculate exactly how much more likely a fresh player is to score than a player that has played the entire game.  However, I’m going to stop short of these calculations in this article as that would require another level of data analysis.

I am conscious that the above findings are based on just one season of data, so to give me some comfort as to the integrity of those findings I looked at each of the 5 leagues separately to see how they performed individually.

Bundesliga

 

LaLiga

Ligue1

SerieA

Encouragingly, all 5 of the leagues follow exactly the same trend.  The substitutes coming on comfortably post the highest Per90 scoring rates.  This group have the parlay of being fresh as well as spending proportionately more of their playing minutes in higher goal expectation periods of the game.  The players that were withdrawn have a slightly higher Per90 figure than the footballers that played the full 90 minutes with the benefit of freshness outweighing the back ended scoring bias.

I therefore feel that we can conclude that, not only do substitutes score at a higher rate than starting players but that the players who are subbed off score at a higher clip than their teammates that play the full 90 minutes.

What are the implications of this?

I can think of at least two implications.  The first concerns comparing players' scoring rates.  It was presumed that substitutes' scoring rates were inflated due to the nuances of the back ended time they spent on the pitch, and Daniel Altman confirmed this in his article.  However, we also need to be equally aware of players who were substituted off as they too will tend to possess higher Per90 performances than players who play the full match duration.

The second impact is much more important.  Unless there is a large difference in quality between a manager's starting 11 and his substitutes, any manager that doesn’t use all 3 substitutes is giving up some expected value.  And by "using substitutes" I don’t mean introducing them in the 85th minute or in injury time to simply run down the clock.

I find myself agreeing with Altman’s almost throwaway suggestion that players should be substituted early in the game.  Not only do we get the boost of the player coming on having fresh legs but we also reduce the negative impact of the fatigue of the substituted player as the change is being made earlier than "normal".

I realize that managers may need to hold a substitute back to cover the chance of injury later in the game, but leaving that aside there really should be no reason why managers don’t ensure that they empty the bench in enough time to get the full benefit of the fresh player.

When are Substitutes used?

After establishing that it is important that managers properly balance the trade off between ensuring they can finish the game with 11 players and ensuring that they obtain maximum benefit from the use of their substitutes I found myself wondering how subs are currently used.

Here is the data from the first 20 Game Weeks of the 2013/14 Premier League season showing the percentage of possible substitutes that have played a minimum amount of minutes.
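Each point on a chart of this kind is a simple share: the fraction of possible substitute slots in which the substitute played at least a given number of minutes.  The sketch below shows the calculation; the function name and the minutes listed are made up for illustration, and counting an unused slot as 0 minutes is my assumption about how "possible substitutes" would be handled.

```python
def share_playing_at_least(minutes_played, threshold):
    """Fraction of substitute slots where the sub played >= threshold minutes.

    minutes_played holds one entry per possible substitute slot;
    an unused slot is recorded as 0 minutes.
    """
    if not minutes_played:
        raise ValueError("minutes_played must not be empty")
    return sum(1 for m in minutes_played if m >= threshold) / len(minutes_played)


# Hypothetical minutes for eight first-substitute slots (one unused)
first_subs = [45, 35, 30, 25, 20, 10, 0, 60]
print(share_playing_at_least(first_subs, 30))
```

Evaluating the function at every minute from 1 to 90 would trace out one of the coloured plots.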

2013/14 Premier League (Weeks 1 - 20)

Subs

The blue plots are the first subs that were used by Premier League managers.  50% of all first substitutes played at least 30 minutes.  The noticeable drop off at the 45 minute mark is interesting; and this clearly shows the reluctance to substitute a player in the final minute of the first half.

The red plots represent a team's second substitute.  50% of second substitutes play fewer than 20 minutes, and only approx 15% of second substitutes play at least 30 minutes.

We can see from the green plots that only 50% of the time does a third substitute play 6 minutes or more, and 1 in 5 managers wait until the 89th minute to make their last change.  In fact, during the first 20 weeks in the Premier League there was a total of 98 possible substitutes that were not used.  I know the managers have a desire to finish the match with a full complement of players, but there is a trade off where this prudence has the opportunity cost of not making maximum use of fresh legs against a tiring opposition.

Other Positions

In this article I have concentrated on scoring players, primarily forwards.  Perhaps fatigue affects forwards more than other positions, but it's more likely the case that we are better able to measure a goal scorer’s output and thus comment on their performances.

Would it be far-fetched to assume that a central midfielder would suffer less fatigue than a forward?  I don’t think so, and I assume that the clubs would be in the position to know how much physical fatigue each player suffers during a full 90 minutes. But are they in a position to be able to quantify how much that level of fatigue actually affects the chance of his team scoring a goal or conceding a goal?  I have my thoughts on this, but I just don’t know.

Am I advocating that players should be substituted on the 30th minute, the 45th minute or the 60th minute?  At this stage I cannot answer that.  As stated above, I would need to undertake more detailed analysis to assess the fatigue impact on a minute by minute basis to arrive at a definitive answer.  However, this analysis has shown that the fatigue impact is large enough to overcome the difference in the scoring rates between the two halves, so with that in mind there is really no reason for a manager not to avail of all of his available substitute opportunities.

Indeed, the use of substitutes is just another facet to the game that good managers will use to their advantage whilst poor managers will not realise the tactical advantage that smart substitutions could be able to give them.

EDIT (16/01/14 at 10:39) - A few comments have suggested that there may be a forward bias in the players that are substituted off.  Here is the split of only the starting forwards in my sample:

Fwds

Even within this starting forwards group the players that are substituted have a higher Per90 rate than the forwards that play the entire 90 minutes.  Any further granular analysis than this would involve the identifying of individual players to see how they perform when substituted off compared to when they played the full 90 minutes.  But I would be concerned that we would be slicing the data very thinly at this point.

ADDITIONAL EDIT - (16/01/14 at 12:40)

To eliminate the data contamination that has been suggested may arise from players with a higher Goal Per90 figure being more likely to be substituted than those with a lower Per90 number I divided my data set into two groups.

I ranked all 268 players by their Goals Per90 figure and divided the table in half, thus creating a top half that includes all the marquee strikers and a bottom half that included players that scored 6 goals but who weren't prolific goal scorers.

Even when looking solely at the bottom half of this table (so the players that aren't prolific goal scorers), this group of players also show that they have a higher scoring rate when they are subbed off than when they play the full 90 minutes.

Bottom

Arsenal's Premier League Shots

Arsenal’s mid August crisis seems so far away at this stage.  At that time they had just lost their opening game of the new season at home to Aston Villa and Gunners’ fans were disappointed at Wenger’s typical Scrooge like dealings in the transfer market.

Then Mesut Ozil signed on the dotted line and all has gone swimmingly for Arsenal since then.

Following their comprehensive 2-0 win over Liverpool on Saturday evening, Arsenal now sits 5 points clear of the chasers at the top of the Premier League.  Their critics would say that they have faced an easy set of fixtures so far; and they would have a valid point.  The current average league position of the teams that Arsenal has faced has been 13th, which compares with the average position of 10th for Chelsea’s opponents. Still, Arsenal can only beat the opponents they face on each match day, and they have done that 8 times since their surprise upset to Aston Villa.  Their only less than perfect league result has been away to West Brom at the start of last month.

I’m keen to get a look at what the shots in Arsenal’s games can tell us about how they have performed and whether their league leading position after 10 games is justified.

Arsenal’s Defence

Although it’s been Arsenal’s attacking talent such as Ozil, Giroud, Cazorla and Ramsey that has received most of the plaudits I’ve been seriously impressed by the Arsenal defensive performance.

Here is the Shot Chart for the shots that Arsenal has conceded in the opening 10 games of the 2013/14 Premier League season.  And for those unfamiliar with these Shot Charts I am also showing the template that defines the boundary between the four zones that I use:

 

Shooting Zones

ArsenalConceded

Arsenal has conceded 9 goals this season, but the two penalties scored by Sunderland and Aston Villa are not included in the above chart.

The concession of 125 shots is not elite; in fact the average EPL team has conceded 129 shots.  However, what Arsenal has done superbly is limit the amount of dangerous shots that they give up.  Their concession of just 31 shots (3 per game) from the Prime Zone is the best in the league; Tottenham, Man City and Everton are next best in this measure with 35 shots.

ExpG

The result of preventing shots from good locations is that the average goal probability per shot allowed by Arsenal (at less than 7%) is the lowest in the Premier League.  I posted the following image in this look (link) at Roma, but it’s worth publishing here even if the figures do not take the latest round of games into account.

 

Roma AvgExpG Big5

 

Out of the 98 teams in the Big 5 leagues, only one team, Roma, forced teams to take shots where their average goal probability per shot was less than that allowed by Arsenal.  That metric must bring tremendous satisfaction to the team and their coaching staff.

I am measuring the average goal probability by using the ExpG measure created by Constantinos Chappas and me.  Some outline details about ExpG can be found in this article, but as we use this metric for betting purposes we’d prefer not to reveal the full details of the calculation method.  Incidentally an approach similar to this ExpG model seems to be used by Prozone and Joey Barton recently published some of Prozone’s stats on QPR.

At least we seem to be in good company…….

As well as owning the best average ExpG value allowed per shot, Arsenal have the lowest aggregate ExpG value conceded in the league.  This suggests that the Gunners are defensively sound and means that, on the whole, Arsenal’s low goals conceded total of 7 (excluding those 2 penalties) is deserved.  Although there are four teams that have conceded fewer league goals than Arsenal I would contend that the Goals Against column for those teams (Chelsea, Spurs, West Ham and Southampton) is much better than the shots they have given up would suggest.

Southampton

The most extreme example of this is Southampton where they have conceded just 4 league goals.  Using the ExpG value for every shot conceded I ran a simulation which replicated the 97 shots that Southampton has faced this season 10,000 times.  In only 2.03% of these simulations did Southampton concede 4 goals or less.
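For anyone curious about what a simulation of this kind looks like, below is a minimal Python sketch.  It is not our actual simulator: it simply treats every shot as an independent coin flip weighted by its goal probability, and the per-shot probabilities used here (a flat 0.10 for each of the 97 shots) are placeholders, because the real ExpG values are not published.  The function names are my own for illustration.

```python
import random


def simulate_goal_totals(shot_probs, n_sims=10_000, seed=0):
    """Replay the same set of shots n_sims times.

    Each shot is treated as an independent trial that is scored with its
    own probability. Returns the goal total from every simulated run.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        goals = sum(1 for p in shot_probs if rng.random() < p)
        totals.append(goals)
    return totals


def share_at_most(totals, threshold):
    """Fraction of simulated runs with `threshold` goals or fewer."""
    return sum(1 for t in totals if t <= threshold) / len(totals)


# Placeholder inputs: 97 shots, each given an assumed 10% goal probability.
# The real per-shot ExpG values would replace this list.
shot_probs = [0.10] * 97
totals = simulate_goal_totals(shot_probs)
print(f"Share of runs conceding 4 or fewer: {share_at_most(totals, 4):.2%}")
```

Swapping the `<=` for `>=` in the counting function gives the "at least N goals" version of the same question.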

Perhaps Southampton are doing something different where the probability of a team scoring against them is less than the average team in our data set but I don’t think so.  This incredibly low probability suggests that the Southampton goals conceded number is going to see some regression in the near future as the variance that they are currently getting the benefit of will turn on the Saints.

Who knows?  Perhaps Asmir Begovic’s goal for Stoke against the Saints on Saturday is an indicator of the “bad luck” that may be ahead of Southampton.

Arsenal in recent games

What’s even more impressive about Arsenal’s defensive performance is that they have improved as the season has gone on.  In 3 of their opening 4 games Arsenal conceded more than 1.00 ExpG (that is our estimation of the number of goals that a team should score given their shots); and their opposition in these games included Aston Villa, Fulham and Sunderland – none of which could be categorised as strong opposition.

However, in each of their last 6 games their ExpG against has been less than 1.00 and they conceded just 4 goals in those 6 games, so it’s not by luck that opposition teams are feeling frustrated after facing Arsenal.

I have seen Mathieu Flamini singled out for praise upon his return to Arsenal’s defensive system, and our ExpG numbers would corroborate that.  It’s probably no coincidence that he missed Arsenal’s opening two games of the season (when Arsenal shipped 2 of their 3 highest defensive ExpG figures).  The introduction of the Frenchman has corresponded with a tangible decrease in the chances that Arsenal have given up.

Arsenal Going Forward

ArsenalFor

Arsenal’s 142 shots represent the fourth highest volume of shots amongst teams.  Liverpool has also had 142 shots, and that pair trail behind Trigger Happy Tottenham, Chelsea and Man City in terms of efforts on goal.

Arsenal is very careful with their shooting locations: they have taken no shots so far from the Very Poor Locations zone, and 44% of their shots have come from the Prime Zone, which is well above the league average of 37%.

That Prime Zone figure of 44% is bettered by just Man City, West Brom and West Ham, and all of those teams have had more headers than Arsenal.

The significance of this is that a team that has headers making up a larger proportion of their total chances would expect to see those chances originating from closer to goal than shots.  The trade off here though is that headers are converted at lower rates than kicked shots from all spots on the pitch.

All of the above means that Arsenal, although very good from an attacking point of view, have not been exceptional when analysed through the lens of our objective ExpG measure.

Man City, Chelsea and Liverpool (by virtue of their excellent average shot probability) all post aggregate ExpG values in excess of Arsenal’s number.

For the record, we have Man City with an ExpG of 8 higher than Arsenal, and Chelsea and Liverpool both at 3 goals higher than Arsenal at this stage of the season.

It’ll not surprise anyone when I therefore contend that although Arsenal has been a joy to watch this season they have over-performed in scoring 21 goals from their 142 shots. I processed Arsenal’s 142 shots through my simulator and in just over 10% (10.13%) of the simulations did Arsenal score at least 21 goals from the shots they took on. So, the over achievement of Arsenal in front of goal is not quite as significant as Southampton’s in stopping the goals being scored but, for my money, Arsenal’s current goal tally of 21 goals (excluding the Giroud penalty) is somewhat inflated given the shots they have taken.

Aaron Ramsey

One of the stars of the season so far has been Aaron Ramsey with his 6 Premier League goals.  The Welsh midfielder has significantly upped his performances this season, probably due in no small part to the increased confidence that finding the net brings with it.

However, Ramsey is a perfect encapsulation of how Arsenal has scored more goals than the shots they have taken would suggest.

Using the same simulation methodology as above I have ascertained that Ramsey would score 6 goals less than 1% (0.85%) of the time based on the shots he has taken this season.

Even without access to advanced metrics, we know that shots are scored at a rate of approximately 10%.  Ramsey’s average shot location is certainly no better than would be expected for the league as a whole.  This simple logic would dictate that Ramsey would have been expected to have scored 2.30 goals from his 23 shots, not 6.
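The simple logic in the paragraph above can be written out in a few lines of Python.  This is only the back-of-the-envelope version: it assumes every one of the 23 shots has the same flat 10% chance of going in, which is not how our ExpG model works, and the function name is my own.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n trials at rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


shots = 23    # Ramsey's shots this season (from the article)
rate = 0.10   # assumed flat conversion rate for every shot

expected_goals = shots * rate
p_six_plus = prob_at_least(6, shots, rate)

print(f"Expected goals at a flat 10%: {expected_goals:.2f}")
print(f"Chance of 6 or more goals:    {p_six_plus:.2%}")
```

On these crude assumptions the chance of 6 or more goals comes out at a little over 2%, so even the simple version flags Ramsey's tally as very unusual.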

I want to be clear that our ExpG model uses much more complex inputs than laid out in the preceding paragraph, but even those simple numbers can give a sense of Ramsey’s over achievement in terms of putting the ball in the net.

I’m including a plot of all Ramsey’s shots this season and I spent some time trying to think of whose shots I could compare against.  I eventually settled on comparing the shots that Ramsey has taken this season with those taken by the same player last season.

 

Ramsey

The current season shots are the green dots, with the red and yellow checked dots representing the shots taken by Ramsey last season.

I think it’s fair to say that the general shapes of the shot locations are roughly similar from last season to this.  It is therefore surprising to see that last season Ramsey only scored 1 goal from his 46 shots, yet this season he’s shooting the lights out with 6 goals from half as many shots as last season.  You’d barely believe that these shots were struck by the same player.

I’d suggest that the true Aaron Ramsey conversion rate is somewhere between the two extremes, but the above image is a powerful reminder as to how much variance can exist when analysing individual players’ shots due to the relatively low numbers taken per season.

Interestingly, Ramsey’s variance of 6 actual goals against his 2 ExpG actually explains the majority of Arsenal’s attacking over achievement.

Summary

When we combine our attacking and defensive ExpG values for the entire Premier League, we rank Arsenal in third place overall, behind Man City and Chelsea.

Defensively they have been superb ensuring that teams shoot from unattractive locations, but the fact that Arsenal currently tops the league is partly due to variance in my opinion, and unless the Gunners create more and / or better attacking chances I would expect them to come back towards the chasing pack.

As a result of Arsenal’s relatively gentle start to the season in terms of opposition faced, it is inevitable that their strength of opposition will toughen up between now and Christmas.  In the event that goals start to dry up for the North London club the media will probably latch on to the fact that Arsenal have found it tougher as they face better opposition. In my opinion, this will be misguided as Arsenal is due a goal scoring regression regardless of who they face in their upcoming fixtures.

PDO

Another way of visualising the “luck” element that Arsenal has benefitted from this season is through PDO.  PDO is a concept that has been taken from ice hockey and it is supposed to measure luck by adding together the % of shots that a team scores and the % of opposition shots that they save. In summary, the league average is 100, and a number greater than 100 would suggest that a team has been lucky.
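As a rough illustration of the PDO calculation described above, here is a minimal Python sketch.  The function name and the example figures are made up for illustration; they are not taken from Ben's tables, and whether "shots" means all shots or only shots on target depends on whose version of PDO you follow, so the function simply uses whichever counts it is given.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO on a 100 scale: scoring % plus save %.

    The shot counts may be all shots or shots on target, as long as the
    same definition is used for both teams (an assumption left to the caller).
    """
    scoring_pct = 100 * goals_for / shots_for
    save_pct = 100 * (shots_against - goals_against) / shots_against
    return scoring_pct + save_pct


# Illustrative numbers only: a team that scores and concedes at the same rate
# lands exactly on the league average of 100.
print(pdo(goals_for=10, shots_for=40, goals_against=10, shots_against=40))

# Scoring more often and conceding less often than that pushes PDO above 100.
print(pdo(goals_for=14, shots_for=40, goals_against=6, shots_against=40))
```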

Our own Ben Pugsley has been keeping a record of PDO (amongst a host of other stats) over on his Bitter and Blue blog, and he has updated the stats for GW10.  His stats tables can be found here.

Arsenal tops the PDO table after 10 weeks.  Now I’m not totally convinced by the merits of PDO as it has at its core a belief that all shots for and against are equal, and I have shown that Arsenal allow poorer quality shots than anyone else in the league.

Still, even with that proviso I think it is useful to show that there is a measure other than our ExpG that shows that as good as Arsenal have been they are perhaps in a little bit of a false position.

I think it’s important for Arsenal fans to recognise this; they can certainly bask in the warm glow of being league leaders but be aware that the shooting performances suggest that there are currently one or two better teams in the Premier League than the Gunners.

Goalkeeper Saves Week8 EPL 2013/14

In the article I published on Monday I introduced the ExpG2 metric that Constantinos Chappas and I developed as an objective quantitative method to rate the shot saving performances of goalkeepers.  Due to the way that ExpG2 values are calculated it makes an excellent objective measure of how effectively a goalkeeper dealt with the shots he faced.

To recap, the ExpG2 value is the goal expectation that each shot has after it has been struck by the player shooting and it is calculated with reference to all of the data that we know about the shot.  One of the main drivers of the ExpG2 values is the placement of the shots.  Any shot that is blocked or off target has an ExpG2 value of zero.

What information do we know about the shot? We have the information that is provided by Squawka and StatsZone (both of which are powered by Opta).  Unfortunately, we have no defensive pressure data available to us so the ExpG2 values do not take into account defensive pressure.  Still, even with that omission we’re left with what, in my opinion, is the most objective measurement of how a goalkeeper performed from a shot stopping point of view.

My article on Monday looked at EPL ratings for last season, and I promised that I would use the same methodology to rate the goalkeepers for the current season.  I’m going to present the data from the first 8 games in the EPL this season, but due to the lack of games played I need to issue a warning as to the volatility of these results.

You will see that even just 1 more goal conceded or prevented may substantially change the ratings at this early stage of the season.  That’s not the fault of ExpG2; it’s simply reporting, mathematically, what has happened.

2013/14 EPL Season (thru 8 Games)

We’ll start off with looking at all shots for goalkeepers that have faced at least 32 shots (that’s 4 per game).

 

ExpG2 is the amount of goals that our model expected the goalkeepers to concede based on the type of shot and shot placement

Goals are the number of goals conceded by the respective GKs

Save Efficiency is (ExpG2 / Goals), with a higher number signifying fewer goals than expected were conceded
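To make the Save Efficiency definition concrete, here is a minimal Python sketch.  The function name is my own, and the ExpG2 figure of 6.9 in the example is not taken from the table: it is back-calculated from a 230% rating on 3 goals conceded, purely for illustration.

```python
def save_efficiency(expg2, goals_conceded):
    """Save Efficiency as a percentage: ExpG2 divided by goals conceded.

    Returns None when no goals have been conceded, because the ratio is
    undefined in that case (how to display that is a presentation choice).
    """
    if goals_conceded == 0:
        return None
    return 100 * expg2 / goals_conceded


# Illustrative inputs: an assumed ExpG2 of 6.9 against 3 goals conceded.
rating = save_efficiency(expg2=6.9, goals_conceded=3)
print(f"Save Efficiency: {rating:.0f}%")

# A keeper who concedes exactly what the model expects sits at 100%.
print(f"Save Efficiency: {save_efficiency(5.0, 5):.0f}%")
```

A value above 100% means fewer goals went in than the shot placements suggested; a value below 100% means more went in.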

This table has been sorted by default in descending order of Save Efficiency, but you can sort the table as you wish.

Southampton

We can immediately see part of the reason why Southampton has had such a great defensive record at the start of this season.  The concession of just 3 goals means that their defence has conceded the least so far in the Premier League.  As you would expect with such an excellent defensive performance, it has been achieved as a function of both great goalkeeping and a defence or team that protected him very well.

From the protection point of view, the fact that the shots faced by Boruc have lower ExpG2 values than those faced by all the other teams in the Premier League demonstrates that his teammates in front of him have excelled at ensuring Boruc had just the minimum amount of work to do.

The Save Efficiency % suggests that Boruc has massively outperformed as he has saved 4 more goals than his numbers would have suggested.  His value of 230% is excellent, but unless he now gets changed in a phone box before matches and wears his underpants outside his shorts his Save Eff number will inevitably have to reduce. For reference, De Gea was the league leader in this measure last season at 138%.

Although I’m absolutely confident that Boruc’s Save Eff % will regress, it will be really interesting to see if his teammates can keep him as well protected for the remainder of this season.

Here is Boruc’s Shot Chart for shots faced this season.  Saves are the white balls and goals conceded are the red ones.  You can see why he has earned a rating of 230%.

Boruc's Shots Faced

 

BorucShots

 

Cech, Lloris, Begovic and Mignolet are all bunched up fairly tight behind Boruc as the best of the rest.  Each of those 4 goalkeepers has saved their teams approximately 2 – 2.5 goals so far this season. Mignolet’s performance reaffirms the decision that Liverpool made to replace Reina with the Belgian.

Eagle eyed readers will see that this table doesn’t average at 100%.  The reason for this is that our model wasn’t fitted using just this dataset.  We used a number of leagues and a longer time period.  The fact that the average sits well above 100% tells us that goals haven’t been scored at the rates that the raw shot data would dictate.  Perhaps this may be due to the defensive pressure in the Premier League that we cannot measure, or it may also be due to short term variance.  Most likely, it’ll be a combination of those two factors.

Fulham

Anyone who follows me on Twitter will have seen the Fulham Shot Chart that I published yesterday morning.  They have conceded a huge amount of shots from what appear to be great locations:

 

FulhamShots

 

Yet they have only conceded 10 goals.  So, for me, it’s interesting to see that the ability of Fulham to prevent those shots turning into goals on a more regular basis doesn’t seem to be down to the performance of Stockdale in goal as he is actually one of the poorer performing keepers so far on this metric.

Everton fans won’t like to see Tim Howard with just 92%.  Mind you, the Man City fans might just wish that Tim Howard was their goalkeeper.

Joe Hart

In Monday’s article I concluded that Joe Hart was below average last season, and this under performance was solely attributable to the long range shots he faced.  As I said at the top of this piece, it’s early in the season and much too early to be drawing too many conclusions from these numbers but the lack of saves made thus far by Joe Hart is alarming.  He has conceded almost 2 goals more than the shots he has faced would suggest.

If I can briefly have a licence to be melodramatic, we can compare the sublime (Boruc) with the ridiculous (Hart) with Joe Hart’s shot chart:

 

HartShots

 

As England’s Number 1, or even just a Premier League standard goalkeeper, Hart will be disappointed with how poorly he has saved the shots aimed at him during City’s opening 8 games.  In this regard it will be interesting to see how Hart performs over the remainder of the season and just how high he can lift his rating from the current 81%.

Shots from Inside the Penalty Area

 

We have included goalkeepers that have faced at least 32 shots from inside the penalty area. Unsurprisingly, the general shape of this table is the same as the overall one with a few noticeable differences. Joe Hart has a more respectable 104% Save Efficiency rating for close range shots (this mirrors last season’s pattern) and Arsenal’s Szczesny goes the other way down the table with what looks to be some poor shot stopping from shots taken inside the area.

Shots from outside the Penalty Area

The final table shows the shots that each keeper faced from outside the area.  Given the relatively few goals that the keepers have conceded from long range shots at this stage of the season these ratings are incredibly volatile and will change quite a lot as more goals are conceded.

So with that in mind, a large dollop of prudence is required when trying to undertake any analysis of these values.  They are provided for information purposes as much as anything else.

Joe Hart’s unfortunate penchant for being beaten from long range has seemed to continue into this new season.  He has conceded 3 goals from outside the penalty area, when our model suggests that he should have been beaten on only one occasion.  It does appear that Hart has an issue with saving long range shots; this pattern has emerged from this data, the piece I looked at on Monday as well as this article by Paul Riley.  Generally, I’m not a fan of long range shooting, but if I was advising any of Man City’s opposition I certainly wouldn’t be discouraging them from trying their luck from long range a little more often than would ordinarily be good for them.

At the top of the table, Szczesny has done really well in producing a mirror image of Hart’s numbers – he conceded just 1 goal when he “should have” conceded 3, whilst two other London based keepers, Cech and Lloris, are yet to be defeated from shots outside the area.

It’s worth noting that Szczesny’s ExpG2 value for shots from outside the area is significantly higher than any of the other keepers in the league.  This is due to the fact that Arsenal have been excellent at forcing teams to shoot from long ranges. Here is the Shots Chart for shots conceded by Arsenal this season:

 

ArsenalShots

The Small Print

For those not familiar with my work, perhaps a brief introduction to me may help manage readers’ expectations.

I see myself primarily as a sports bettor and the articles that I write here on Statsbomb are only possible thanks to the huge amount of data that I have collated and analysed for betting purposes.  If I didn’t bet then I’m quite sure that I wouldn’t have spent the time required to collect the necessary data.

So the plus side is that my articles exist as a by-product of my football betting.  The downside is that I always need to be mindful of my betting edge.  This inevitably means that I can’t get into specifics as to how our models work or what’s taken into account in their calculation.   Sorry, but that's the deal I made with myself.

Newcastle's Shots Allowed

Before tonight's Premier League game away to Everton, Newcastle sit 16th in the Premier League table on 7 points with a Goal Difference of -3 (5 scored and 8 conceded).

Only 4 Premier League teams have conceded more goals than Newcastle this season.  At face value this is surprising given that they have conceded the 3rd fewest shots in the league at 51 (excluding penalties).

In giving up those 8 goals they have allowed the opposition to score with 16% of all attempts faced.  This concession rate of 16% is the highest in the league (coincidentally they share it with their North East rivals Sunderland), so they've been pretty unlucky to concede 8 goals?

I'm afraid the answer to that is a resounding "No". Have a look at where Newcastle have conceded their 51 shots from in their opening 5 games this season:

 

Newcastle

 

For some (not very smart) reason, more than half of all shots Newcastle have conceded have come from Prime Positions.  By permitting 90% of all shots to be taken from the two most favourable zones you can see why the Geordies have conceded a goal in almost 1 of every 6 shots they have faced.

90% of shots coming from those two most dangerous zones is by far the worst performance in the League; the next greatest offender (Swansea) only allows 79% of their conceded shots to be struck from those zones.

Obviously no team sets out to give up chances in these very dangerous areas, so it will be interesting to see in tonight's game whether they can keep Everton (who can pile on the pressure when required) out of the danger zone with greater effect than they have managed in their previous 5 games so far. If not, they can expect the goals against column to keep increasing at a rapid rate.

EDIT - as requested, below is an outline of the 4 zones.  Apologies for the omission in my original posting.

 

Shooting Zones

Chelsea's Striker Options

Given the huge amount of attacking talent currently residing at Stamford Bridge I wonder how Jose Mourinho is going to decide on which four attacking players he will field.

Formation

I assume that he will use a back four and will play two holding / central midfielders which will then allow him four out and out attacking players.

The widely held belief is that he will play three attacking midfielders, link men or “just off the shoulder” forwards.  Those 3 positions will probably be filled by some combination of Mata, Hazard, Oscar, Moses and the new signings of Schurrle and De Bruyne.  And the attacking talent has been assembled even before we consider the possibility of Wayne Rooney signing for Chelsea.

Such a formation would then leave room for just one traditional striker, and at the moment it would seem that this position will be contested by Lukaku, Torres or Ba.  It would appear that this position is Romelu Lukaku’s to lose but I wanted to take a look at the three strikers’ stats as well as visuals of their shot locations and placements from last season to see if this is indeed the correct decision.

Striker Options

Romelu Lukaku seems to be holding pole position in this battle right now, but at just 20 years old is he ready for such weight and responsibility to be placed on his shoulders?  Yes, he had a terrific season last year, but despite the fairly large transfer fee Chelsea paid for him (£19m) perhaps he was something of a surprise package to the defences he came up against last season.  Might they be better prepared this season?

Demba Ba didn’t have a great first 6 months at Chelsea; in fact it went pretty awfully for him, with just 2 goals from 46 shots since his move in January.  That’s the sort of conversion rate that makes the current Fernando Torres look, well, like the Fernando Torres of old.

Torres doesn’t need me to write much more about him, suffice to say it appears that El Nino’s best days are well behind him at this stage.  Although the fact that he played in approximately 75% of all Chelsea’s available minutes last year suggests that Roman Abramovich may not feel the same way. At this stage it does look like his time at Chelsea is running out as there has been a lot of chatter concerning a return to Spain.

To help put some context on how the 3 Chelsea strikers performed last year, I thought I would take a look at their performance from a statistical point of view.

Player Statistics

 

The above stats are for the entire 2012/13 Premier League season, so Demba Ba’s figures include both his time at Newcastle and Chelsea.

All the figures, with the exception of ExpG and ExpG Eff, should be both obvious and well known to readers of this post so no explanation will be necessary.

Lukaku’s Per90Shots on Target value of 2.11 is pretty special and at more than 4.3 ShotsPer90 he certainly kept defences busy.  Demba Ba was even more impressive with the amount of shots he took but unfortunately for him he lacked a little accuracy which then reduced his SoT value. Torres’ numbers are really subdued.  Despite playing more minutes than Lukaku and Ba he had substantially less activity in all outputs (shots, shots on target and goals) and he rounds it off with just 1 shot on target per90, which is a very poor return for a top level striker.

ExpG

The new metric introduced in the summary box, ExpG, is the number of Expected Goals that we** expected a league average player to score based on the type of chances that the players attempted.  The inputs to this measure won’t be disclosed, but we find that it is fairly accurate and allows us to compare the quality of chances created and then the efficiency with which they were finished.

The ExpG Eff metric is calculated as Actual Goals / ExpG, where an ExpG Eff of 1 represents an average player, a value greater than 1 represents above average finishing and a value less than 1 represents below average finishing.
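As a small worked illustration of the ExpG Eff ratio, here is a minimal Python sketch.  The function names are my own, and the goals and ExpG figures in the example are invented for illustration rather than taken from the stats table; the ExpG inputs themselves remain undisclosed.

```python
def expg_efficiency(actual_goals, expg):
    """ExpG Eff: actual goals divided by expected goals."""
    if expg <= 0:
        raise ValueError("ExpG must be positive to form the ratio")
    return actual_goals / expg


def describe(expg_eff):
    """Read the ratio against the league average player benchmark of 1."""
    if expg_eff > 1:
        return "above average"
    if expg_eff < 1:
        return "below average"
    return "average"


# Invented example: 8 goals scored from chances worth 10 expected goals.
eff = expg_efficiency(actual_goals=8, expg=10)
print(f"ExpG Eff = {eff:.2f} ({describe(eff)} finishing)")
```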

**We refers to Constantinos Chappas and me. Constantinos can be followed on Twitter @cchappas

From a Chelsea viewpoint it is perhaps worrying that Lukaku is the only one of the trio whose actual goals tally exceeded their ExpG value.  So whilst the finishing skills of Torres and Ba were very poor, with an ExpG Eff of 0.73 and 0.88 respectively, even Lukaku’s 1.05 (as the best of the trio) was not exceptional by Premier League standards.

As a means of comparison; Van Persie finished the season with an ExpG Eff of 1.15, Walcott 1.40, Berbatov 1.19 and even Suarez earned 1.08.

In fact, of the top 12 Premier League scorers last season only Dzeko (at 0.84) had a worse ExpG Eff ratio than Ba and Lukaku. Interestingly, Wayne Rooney who has been strongly linked with Chelsea this summer doesn’t look like he’ll be the answer to their lack of a clinical finisher either as he posted 1.06 last season.

Shot Visualisations

In order to provide the bare statistics with some context I had a look at the shooting locations that the players were faced with and the placements of their non-blocked shots.

Torres

TorresShotsFFS

The shot location images I use in this piece have been taken from the subscriptions section of Fantasy Football Scout website.

I certainly wouldn’t encourage players to take speculative, often wasteful long range shots, but the almost total absence of long range shots for Torres appears indicative of a player that is very low on confidence.  He also struggled to hit the target (green dots) from many shots that were outside the width of the 6 yard box.

 

TorresPlacements

The above image shows the shot placement from the striker’s Point of View with the red balls signifying goals.

Looking at the shot placements it would appear that Torres strongly favours shooting toward the right side of the target.  Aside from that there was an unhealthy attraction towards the centre of the goal.  His lack of accuracy and the amount of easy saves that opposition keepers were allowed to make would have contributed to his awful ExpG Eff ratio of 0.73.

Demba Ba

BaShotsFFS

 

We can see a lot more activity on Ba’s image than the Torres one, with a particular penchant displayed for attempting long range efforts.

BaPlacements

On the whole, Ba seemed to have two types of shots.

Most of his on target shots tended to be very low ground shots, which at least is preferred to shots that arrive at the goalkeeper a few feet off the ground.  However, he seemed to lack appropriate accuracy control when he attempted to put some elevation into his shots.

Lukaku

LukakuShotsFFS

Lukaku’s shooting appears to be the happy medium between Torres’ lack of activity and Ba’s overzealous shooting.

He has a decent smattering of long range shooting, but the highlight of that image for me is that he displayed great skill in ensuring that shots from the right side of the pitch generally hit the target.  Undoubtedly this is due to the fact that he favours his left foot and thus the right sided shots give him the best angle, but the amount of green dots on that image is admirable.

 

LukakuPlacements

If I was being critical of Lukaku’s shooting it’s that he fired too many shots toward the centre of the goal at heights that were favourable to the goalkeepers.

A rough count gives me 19 shots in the central region that didn’t stay along the ground, and only 2 of them were scored.  That shooting pattern will certainly reduce a player's conversion percentage rate.

Perhaps that might explain why although good, the Belgian youngster’s actual goal tally compared to his ExpG was not exceptional by Premier League standards last season.

Conclusion

Based on the statistics from last season and the three strikers I have considered I don’t see any reason why Lukaku shouldn’t be the starting centre forward for Chelsea this season. Torres can be discounted entirely.  His finishing of the chances he had was very much below par, but this is compounded by the fact that he didn’t get himself in the position to be taking shots anywhere near often enough.

Ba just didn’t do enough last season to suggest that he is ahead of Lukaku.  Yes, he had more shots but his average ExpG per shot was 25% less than Lukaku’s.  The lower average shot ExpG is caused by attempting more difficult shots which suggests that Ba was less prudent in his shot selection. This also comes across clearly in their shot location maps.

As a result of Ba’s more speculative shooting, Lukaku posts better Shots on Target and Goals per 90 than Ba.  But the clincher for me is that Ba didn’t even convert his chances at the average player rate of 1.00 whereas Lukaku slightly exceeded that threshold (1.05 vs 0.88).

It will be interesting to see how Lukaku progresses this season.  There is no doubting that he is a handful and he should improve considerably with maturity, but he will need to. In my opinion, a club with the expectations of Chelsea should have a main striker who is capable of putting away their chances at a rate that vastly exceeds that of a league average player.  Perhaps Lukaku will develop into that player, but if not, it’s important for Chelsea that they have someone playing at the top of the pitch who can.

Near or Far Post Shooting

In a previous article, which can be found here, I did some research on the percentage of shots and headers that a shooting player can expect to score given a specific shot placement. As it was my first attempt at looking at shot placements I grouped all shots together.  The difficulty with stats and data, though, is that you can never just take the first metric at face value: further analysis can always be undertaken, and this second level of analysis can provide interesting insights that are missed at the higher level of data review.

In order to refresh memories, here is the scoring percentage for each shot placement zone for all shots and headers:

AllShots

Remember that we are looking at the goal mouth from the point of view of the striker. I now want to undertake some further analysis to see what other information we can learn, and to do this I am going to look at shot placement based on which area on the pitch the shots or headers were struck from. I have divided all unblocked shots and headers into three pitch areas (right, left and central) as laid out in this image below:

Pitch Sides

The boundaries of the three zones have been deliberately chosen to ensure that approximately 50% of shots in the sample fall within the Central Shots zone, with the other 50% being split almost equally between right and left sides.

Central Shots Zone

Let’s have a look first of all at shots which were struck from the Central zone.

Central

No surprise to see the general “shape” of the scoring rates heatmap pretty similar to the one at the top of the article, which is for all shots. The main difference is that the scoring rates are higher across the board, hence the increased level of "redness" in this plot. As we are looking at the shots from the best positions (straight in front of goal) this is in line with our expectation.

Shots from the Right

We'll now cast our eyes at shots which came from the right side of the pitch as defined in the image above.

RightSide

Now it gets interesting!!!!!

The above image shows the scoring rates for shots taken from the right hand side of the pitch and immediately a clear pattern jumps out.  As expected there is considerably more blue and less red on this image than the previous heatmap due to the fact that we are now looking at attempts from less attractive shooting locations. However, that's not what is so intriguing about this heatmap.

The heatmap is extremely unbalanced, with all the red and orange zones concentrated onto the left side. The imbalance is so great that if we divide the plane of the goal into thirds, the average conversion rate for shots that hit the target in the left most third (Far Post) is 32%, the central third 7% and the right third (Near Post) is 14%.

As seen in my previous piece, and as logic would dictate, shots struck towards the centre of the goal would be expected to have the lowest scoring rate; but a conversion rate for on-target shots aimed towards the Far Post that is 2.25 times that of those aimed for the Near Post appears hugely significant to these eyes.

Shots from the Left

And what about for shots from the other side of the pitch, the left?

LeftSide

The exact same pattern, only in reverse, emerges.

From this left side, the Far Post third (right) has an on target conversion rate of 30%, 8% for central third and 15% for the left third. This means that Far Post on-target shots from the left hand side of the pitch are converted at twice the rate of Near Post on target shots.  This ties in pretty neatly with the finding from the other side of the pitch.

At this stage I think it’s safe to conclude that on-target shots towards the far post (third of the goal) have twice the success rate of near post shots. Even without going any further, that strikes me as a pretty darn important piece of information.
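As a quick check on the arithmetic, the ratios can be recomputed from the rounded percentages quoted above. This is only a sketch: the dictionary and function names are mine, and because the inputs are rounded the right-side ratio comes out slightly above the 2.25 quoted in the text, which was presumably calculated from unrounded figures.

```python
# On-target conversion rates by goal third, as quoted in the text (rounded).
# Keys and names are illustrative, not taken from the original dataset.
ON_TARGET_CONVERSION = {
    "right": {"far": 0.32, "central": 0.07, "near": 0.14},
    "left": {"far": 0.30, "central": 0.08, "near": 0.15},
}


def far_to_near_ratio(rates):
    """Return how many times more often Far Post on-target shots are scored."""
    return rates["far"] / rates["near"]


for side, rates in ON_TARGET_CONVERSION.items():
    print(f"{side}: far/near = {far_to_near_ratio(rates):.2f}")
```

Either way, both sides of the pitch point to a Far Post advantage of roughly two to one for on-target shots.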

Point of Order: For the rest of this piece, Far Post is defined as any shot where the ball would cross the plane of the goal line in the Far Post third of the goalmouth or wider. Near Post is the opposite: any shot where the ball would cross the goal line either in the Near Post third of the goalmouth or wider. Also, the remainder of this piece will concentrate on just the shots taken from the right and left sides of the pitch, as I want to investigate in greater detail the apparent Near and Far Post phenomenon.

Far Post is Superior

So what does this mean?  My first thoughts are that the Andy Gray cliché of “he should have went across the keeper there” is correct. However, I’m only going to give him half marks as I believe his assertion was based on the fact that, if missed, a shot across the keeper has a chance of being parried, allowing the attacking team to pick up the rebound and have another strike at goal.  A shot missed on the narrow side does not have this luxury. Not for one second do I think that Andy Gray was aware that on target shots to the far post are scored at rates of 2 to 2.25 times more than those shot towards the near post.  At least if he was aware of that fact then he, along with everyone else in football, kept that particular nugget very quiet.

Possible Reasons for Discrepancy

1 - The first possible explanation for this difference is that I’m only looking at shots that are on target, ie in this analysis I have ignored shots that were wide or high of the target. Perhaps looking at goals as a percentage of all unblocked shots is required as it may be more difficult to hit the target with cross shots than near post shots.

After investigation, it turns out that this was indeed the case as 68% of all Far Post shots missed the target, compared with 64% of Near Post shots.   However, that small difference isn’t anywhere near sufficient to explain the difference in goals scored as a percentage of unblocked shots.

After including missed shots, 9.9% of unblocked Far Post shots are scored, whereas the rate substantially falls to 5.3% for Near Post unblocked shots.  This means we end up with a final ratio of unblocked Far Post shots being scored at 1.8 times the rate of Near Post shots. So after ruling out the difference being attributable to off target shots we are still left with a significant unexplained difference in terms of the scoring rate for Far and Near Post shots.
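The link between the on-target and unblocked figures can be sketched in a few lines. The idea is simply that goals per unblocked shot equals the share of shots on target multiplied by goals per on-target shot. The names below are my own, and the inputs are the rounded percentages from the text, so treat the outputs as approximate.

```python
# Figures quoted in the text (rounded); names are illustrative.
MISSED_TARGET = {"far": 0.68, "near": 0.64}          # share of unblocked shots off target
SCORED_OF_UNBLOCKED = {"far": 0.099, "near": 0.053}  # goals / unblocked shots


def implied_on_target_conversion(post):
    """Goals per on-target shot implied by the two quoted figures."""
    on_target_share = 1.0 - MISSED_TARGET[post]
    return SCORED_OF_UNBLOCKED[post] / on_target_share


ratio = SCORED_OF_UNBLOCKED["far"] / SCORED_OF_UNBLOCKED["near"]
print(f"far/near for unblocked shots: {ratio:.2f}")
for post in ("far", "near"):
    print(f"{post}: implied on-target conversion = {implied_on_target_conversion(post):.1%}")
```

The implied on-target rates (about 31% and 15%) sit close to the per-third figures quoted earlier for each side of the pitch, which suggests the numbers hang together.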

2 - Could it be that goalkeepers are overly concerned with getting beaten at their near posts?

There is no doubt that it looks bad for a keeper if he is beaten at his near post, but perhaps they are trying to guard the near post to the detriment of the cross shot? At this point (with no access to goalkeeper positioning at the time of the shot) I don’t have any way to either prove or disprove this possible explanation, so unfortunately I have no other option than to move on to my next possibility.

3 - Another possible explanation for the difference is that I have so far excluded blocked shots from this analysis (as we never know where they will cross the plane of the goal).

Because a cross shot has to travel through the central area of the pitch, it certainly seems likely that shots aimed towards the Far Post have a greater chance of being blocked than those targeted towards the Near Post. But is the difference in the rates at which Near and Far Post shots are blocked enough to explain a conversion rate that is almost twice as high?

This could be quite a difficult question to answer as we have no way of knowing where the shots would have crossed the goal line had they not been blocked.  However, I have been spared some potentially impossible mental gymnastics as even if EVERY blocked shot was a Far Post shot (so none of the blocked shots were destined for Central or Near Post!!) the scoring rate for all Far Post shots would still exceed that of Near Post shots. That really is something. So although that is good news, as a numbers man I have an innate desire to quantify effects and so I’m going to try to make an educated guess at the location in the goal where blocked shots were destined for.

First up, what’s our split of non-blocked shots:

ShotsSplit

As stated above, I would assume that Near Post shots are likely to get blocked less than Far Post shots, but I would assume it would be reasonable for Central shots to be blocked at the same rate as Far Post shots.

Having established this, let’s then assume that Far Post and Central shots are blocked at twice the rate of Near Post shots (this is only a guess, but seems reasonable to me and I need to pick a number).

This blocked shots weighting combined with the volume of non blocked shots results in an assumed distribution of the Blocked Shots as follows:

Far Post     53%
Central      21%
Near Post    26%
Total        100%

I will therefore split the Blocked shots in my data sample as being destined for Far Post, Near Post and Centrally in the ratios of 53%, 26% and 21% respectively.
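The apportionment described above can be sketched as a weighted split. Note the assumptions: the function name is mine, and the split of unblocked shots used as input is HYPOTHETICAL (the real figures are in the ShotsSplit image, which is not reproduced as numbers here). I have picked values that land close to the 53% / 21% / 26% distribution so the mechanics are easy to follow.

```python
def apportion_blocked(unblocked_share, block_weight):
    """Split blocked shots across goal zones.

    Each zone's share of blocked shots is taken to be proportional to
    (its share of unblocked shots) x (its assumed relative block rate).
    """
    weighted = {zone: unblocked_share[zone] * block_weight[zone] for zone in unblocked_share}
    total = sum(weighted.values())
    return {zone: value / total for zone, value in weighted.items()}


# HYPOTHETICAL split of unblocked shots, not read from the ShotsSplit image.
unblocked_share = {"far": 0.42, "central": 0.17, "near": 0.41}
# Assumption from the text: Far Post and Central blocked at twice the Near Post rate.
block_weight = {"far": 2.0, "central": 2.0, "near": 1.0}

blocked_share = apportion_blocked(unblocked_share, block_weight)
for zone, share in blocked_share.items():
    print(f"{zone}: {share:.0%}")
```

Changing the weights (say 1.5 instead of 2) is an easy way to see how sensitive the blocked-shot split is to the guess.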

At this stage, I want to point out that the only purpose of the preceding couple of paragraphs is to approximate the number of blocked shots for each of the goal zones (Far and Near Posts and Central), as the analysis cannot be properly completed without some attempt at apportioning blocked shots.

Yes, some of my assumptions can be challenged, but I don’t think that I can be that far out in the approximations I have used, and certainly not by enough to change the core findings of this analysis piece.

Conversion % of All Shots           

Armed with an approximation of blocked shots for each goal zone we can now reach a conclusion which takes into account the percentage of all shots which are scored from the sides of the pitch (the areas denoted in the second image in this piece) depending on whether the ball would have ended up Near Post, Far Post or centrally in the goal.

Remarkably, 6.8% of Far Post shots were scored, compared with just 4.4% of Near Post shots. As raw numbers, both of those conversion rates appear fairly small, but don’t forget that we are dealing with shots that are struck from less attractive locations on the pitch (ie away from the central strip of the pitch).
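A rough sketch of what these figures imply: since adding blocked shots to the denominator does not change the number of goals, the drop from the unblocked conversion rate to the all-shots rate tells us what share of each zone's shots was treated as blocked. The names are mine and the inputs are the rounded rates from the text.

```python
# Conversion rates quoted in the text (rounded); names are illustrative.
SCORED_OF_UNBLOCKED = {"far": 0.099, "near": 0.053}  # before apportioning blocked shots
SCORED_OF_ALL = {"far": 0.068, "near": 0.044}        # after apportioning blocked shots


def implied_blocked_share(post):
    """Share of all shots in a zone that must have been blocked.

    Goals are unchanged by adding blocked shots, so
    scored_of_all = scored_of_unblocked * (1 - blocked_share).
    """
    return 1.0 - SCORED_OF_ALL[post] / SCORED_OF_UNBLOCKED[post]


ratio = SCORED_OF_ALL["far"] / SCORED_OF_ALL["near"]
print(f"far/near for all shots: {ratio:.2f}")
for post in ("far", "near"):
    print(f"{post}: implied blocked share = {implied_blocked_share(post):.1%}")
```

On these rounded inputs the implied blocked shares are roughly 31% for Far Post and 17% for Near Post, which is broadly in line with the "twice the rate" assumption made earlier.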

Conclusion

What I have laid out in this article appears to be quite fundamental. When shooting from less attractive positions, the player shooting has a conversion rate which is more than 1.5 times better for Far Post attempts than for Near Post attempts.

As if this fact wasn’t impressive enough in its own right, once it is combined with the chance of a Far Post shot being parried and the rebound scored, the advantage is even greater than the basic 1.5 multiple calculated above.

Why?

The question I haven’t been able to answer properly is why this phenomenon exists in professional football when clubs have access to both better data and bigger brains than mine?

I don’t think it can be due to variance as my sample has a huge number of shots: it contains every shot taken in the Big 5 Leagues during the 2012/13 season, which is almost 50,000 shots.
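One way to put a number on the variance argument is a two-proportion z-test. The sketch below is not part of the original analysis, and the shot counts in it are HYPOTHETICAL: the article does not state how many of the shots were Far Post or Near Post attempts from the sides, so 5,000 each is purely a placeholder.

```python
import math


def two_proportion_z(p_far, n_far, p_near, n_near):
    """z statistic for the difference between two conversion rates."""
    pooled = (p_far * n_far + p_near * n_near) / (n_far + n_near)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_far + 1.0 / n_near))
    return (p_far - p_near) / std_err


# Conversion rates are the all-shot figures from the text.
# The shot counts are HYPOTHETICAL placeholders, not figures from the article.
z = two_proportion_z(p_far=0.068, n_far=5000, p_near=0.044, n_near=5000)
print(f"z = {z:.1f}")
```

With counts in the thousands the z statistic sits well above the usual thresholds of 2 or 3; with much smaller samples it would shrink, so the real counts matter.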

After undertaking the work for this article the only conclusion I can arrive at is that it’s due to Goalkeeper positioning.  I have taken account of most other things, ie the difficulty of hitting the target and the apportionment of blocked shots. Could it really be that keepers are so conscious about the “Pride of the Near Post” that they over compensate?  I am unable to coherently put forward any other possible reasons.

In order to gauge reaction to this piece I sent a draft to David Sally and Chris Anderson, the co-authors of "The Numbers Game".  David made the point that a higher success rate for Far Post shots could be indicative of another aspect of the way goalkeepers play.  If they were slow to come off the line then due to basic geometry they would be more exposed to Far Post shots than Near Post efforts.

As alluded to in my preamble to this post, you can look at a facet of the game using just the headline measurements (conversion % for all shots in this instance), we can then go one level deeper into the data (slice the data by pitch sides) but even this may not be enough.  Chris Anderson made the point that I should probably further divide the data into shooting distances.  This would involve going yet another level deeper into the data. Perhaps I might further subdivide the data in a future article so that I can see the impact of shot distance on this Near and Far post phenomenon.  However, for my money that lack of further slicing of the data doesn't diminish the importance of the findings laid out here.

As an aside, this clearly demonstrates why basic match stats lack the detail needed to give fans a proper understanding of what has happened in a game. Despite using data in a format that I hadn't seen before (placement success rates), and then going one level deeper, I find myself in the position where I could go another level deeper still to try to complete our understanding of this quirk.

Whatever the reason, there is no getting away from the fact that shooting Far Post seems to have a significantly higher goal expectation than shooting Near Post. In a game of such small margins, where teams try to gain any advantage possible, let’s see if clubs and players learn from this and we begin to see either a greater proportion of shots being fired towards the Far Post or keepers minding their Near Post just a little less in this coming season.

How do Headers compare to Shots?

In performing some of my shooting analysis work I have struggled with how best to deal with headers.

It is obvious that, on average, headers are taken from locations closer to goal than non-headed attempts on goal (these non-headed attempts will be defined as “shots” during the rest of this piece).  That alone would be enough to ensure that we shouldn’t group together headers and shots when undertaking any aggregate analysis.

The combining together of shots and headers is even more problematic when we consider that they may have different outcome profiles even when taken from similar locations.  This realisation has had some recent airings on Twitter, so I thought I would use the data that I have collected from the Big 5 leagues last season to put on record just how a header compares against a shot.

Summary Numbers

Heads1

When looking at all shots and all headers we can see that there is only a negligible difference in the proportion of each type that is on target (34% of headers vs 33% of shots). However of those on target attempts, a header is more likely to be scored than a shot (12% v 9%). It is no surprise to see that headers are blocked far less frequently than shots; shots are blocked approximately three times as often as headers.

So, if headers are scored at a higher rate than shots, does that mean that, given the choice we would prefer our team to be having a headed attempt at goal rather than a shot struck with the foot?

I would suggest that the answer to that question would be “no”. The main reason headers are converted more frequently than shots is the location from which the attempts originate.

Location of Attempts

Heads2

Almost 95% of all headers are taken from the central portion (within the width of the 6yd box) of the penalty area, compared with just 25% of shots. Virtually no headers are taken from outside the penalty area, whereas more than 54% of shots originate from these longer distances. At this stage, it’s easy to see why headers are converted more frequently than shots.

Direct Comparisons

How would the conversion rates for shots and headers compare if we looked at like for like, ie removed the location bias that is inherent with headers?

Inside 6yd box

In6

Shots taken from inside the 6 yard box are converted at 40%, compared to less than 25% for headers.  So within these close range locations headers were scored only 62% as often as shots were.

One other takeaway from this grouping of shots is that less than 40% of headers from this extremely close location were put on target.  Presumably this is indicative of the pressure that is applied to headers that are attempted from such close range.

Other Central Locations Inside Penalty Area

InC

This time we are looking at shots within the central portion of the penalty area, but beyond the 6 yard line. Once again, shots are converted at vastly superior rates to headers.  This time the conversion rate for shots is almost double that of headers at 20% and 10% respectively.  As before, we can see the difficulty that headers have in even just hitting the target.
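The like-for-like comparison so far can be summarised with a small calculation. The names are mine, and the header rate inside the six-yard box is only given as "less than 25%", so 0.25 is used here as an upper bound.

```python
# Conversion rates quoted in the text; the header rate inside the six-yard box
# is given only as "less than 25%", so 0.25 is used here as an upper bound.
ZONE_CONVERSION = {
    "inside six-yard box": {"shot": 0.40, "header": 0.25},
    "central, beyond six-yard line": {"shot": 0.20, "header": 0.10},
}


def header_relative_to_shot(zone):
    """Header conversion as a fraction of shot conversion in the same zone."""
    rates = ZONE_CONVERSION[zone]
    return rates["header"] / rates["shot"]


for zone in ZONE_CONVERSION:
    print(f"{zone}: header/shot conversion = {header_relative_to_shot(zone):.3f}")
```

In both central zones a header is worth somewhere between half and just under two thirds of a shot from the same area.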

Sides of Penalty Area

InS

Now turning our attention to shots / headers that were struck from the sides of the penalty areas (outside the width of the 6 yard box) we can see the familiar pattern continuing as yet again shots are converted at twice the efficiency of headers.

Summary

In writing this article I set out to determine how much less likely a header was to score than a shot. Without adjusting for shot location, headers are scored at a greater rate than shots. However, on this particular topic the devil is in the detail: when shots and headers struck from similar places were compared, the conversion rate for headers was only approximately half of that for shots. This is a fact that should be remembered by anyone interested in the analytical side of football.
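To see how headers can look better in aggregate while being worse from every location, here is a simplified sketch of the location-mix effect. The pitch is collapsed into just two areas, the function name is mine, and the "elsewhere" conversion rates are MADE UP purely for illustration; only the location mix and the central 20% v 10% rates come from the figures above.

```python
def aggregate_conversion(location_mix, zone_conversion):
    """Overall conversion as the location-weighted average of per-zone rates."""
    return sum(location_mix[zone] * zone_conversion[zone] for zone in location_mix)


# Location mix from the text: 25% of shots and ~95% of headers come from the
# central portion of the penalty area. Everything else is lumped together.
shot_mix = {"central_box": 0.25, "elsewhere": 0.75}
header_mix = {"central_box": 0.95, "elsewhere": 0.05}

# Central rates follow the 20% v 10% quoted for the central area beyond the
# six-yard line. The "elsewhere" rates are MADE UP purely for illustration.
shot_rate = {"central_box": 0.20, "elsewhere": 0.04}
header_rate = {"central_box": 0.10, "elsewhere": 0.02}

shots_overall = aggregate_conversion(shot_mix, shot_rate)
headers_overall = aggregate_conversion(header_mix, header_rate)
print(f"shots: {shots_overall:.1%}, headers: {headers_overall:.1%}")
```

Even with headers converting at half the rate of shots in both areas, the heavy concentration of headers in the best locations is enough to lift their overall rate above that of shots.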

With shots and headers having such vast differences in conversion rates, perhaps the time has come for them to be disclosed separately in post match statistics instead of being aggregated together as is the current norm.