## Smart Use of Substitutes Can Make A Difference

Following on from Daniel Altman’s excellent piece on the scoring rate of substitutes I thought I would undertake my own analysis on the impact of substitutes.

The methodology I will use is slightly different to that employed by Altman in his article. I will use the Big 5 European leagues for last season (2012/13), and I will study the goal scoring rates for all players that scored at least 6 league goals last season.

The use of this filter gives me a list of 268 players that scored a combined total of 2,782 goals in 617,331 minutes of playing time.  This equates to an average scoring rate of 0.41 goals Per 90 minutes for our sample of players.
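As a quick check on that figure, here is a minimal sketch of the Per90 calculation using the totals quoted above. The function name `goals_per_90` is my own label for illustration, not something from the original data source.

```python
def goals_per_90(goals: int, minutes: int) -> float:
    """Goals scored per 90 minutes of playing time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return goals * 90 / minutes


# Totals for the 268 players in the sample, as quoted in the text.
rate = goals_per_90(2782, 617331)
print(round(rate, 2))  # 0.41
```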

At this stage it is a well-documented fact that more goals are scored in the second half of games than the first half, and the apportionment in the Big 5 leagues last season was no different with just 44% of all goals scored in the first half and 56% in the second half.

The following is the distribution of goals in 5 minute time intervals for the 5 leagues last season:

We can see that, generally, the goal scoring rates increase in line with the time elapsed during the match.  For my purposes, the minutiae of the goal scoring rates aren’t important; we just need confirmation that this trend does exist in my data sample.

In his piece, Daniel Altman found that forwards coming on as substitutes scored at a higher rate than starting forwards.  But when we consider that more goals are scored in the second half than the first half, this is no great surprise.  Substitutes spend a greater proportion of their playing time in the second half (when goal expectation is higher) than starting players do.

So what do we take from this?

The fact that substitutes have a higher scoring rate means that you can’t directly compare Goals Per90 figures between players that regularly start and those who make frequent substitute appearances.  Very simply, the substitute will have his numbers inflated and we would expect his Per90 numbers to drop in the event that he was handed a starting position.

However, Altman didn’t stop there and he found that “fatigue among forwards was a more powerful force than fatigue among defenders”.  That sentence struck a chord with me and I wanted to investigate the general phenomenon of fatigue in footballers a little further.

Hierarchy of Goals Per90

We have established that the longer a match goes on the greater the goal expectation.  This is one of the reasons why substitutes score at a higher rate than starting players.  So, by extension of this logic we would therefore expect players who are substituted to score less Per90 than players who played the full 90 minutes.

Not only would the substituted player be swimming against the tide of playing at least as many first half minutes as second half minutes when the goal expectation is at its lowest, but the fact that he is substituted may also indicate that he hasn’t played a great game thus far.

That second suggestion certainly won’t be true all the time.  The player may be injured, withdrawn for tactical reasons or just tired but it seems reasonable to assume that some of this cohort will have irked the manager enough with their performance to be substituted.

Even ignoring the suggestion that the substituted player has been having a less than stellar performance,  due to the increasing goal expectation it is reasonable to assume that the hierarchy of Per90 goal scoring rates would rank as follows:

• Substitutes_On
• Full 90 minutes Players
• Substitutes_Off

Now we’ve finalised our hypothesis, how does it compare with what actually happened last season?

Each game that our 268 players took part in last season was divided into the 3 categories: Substitutes_On, Full 90 and Substitutes_Off and I totaled the number of goals and minutes that the group of 268 players as a whole racked up in each category.
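The bucketing described here can be sketched roughly as follows. The `Appearance` record and its field names are assumptions of mine for illustration, the sample rows are made up, and I have assumed that a player who comes off the bench counts as Substitutes_On even if he is later withdrawn. Goals and minutes are pooled per category before the rate is taken, as in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Appearance:
    started: bool    # was the player in the starting 11?
    finished: bool   # was he still on the pitch at the final whistle?
    minutes: int     # minutes played in this game
    goals: int       # goals scored in this game


def categorise(app: Appearance) -> str:
    """Assign an appearance to one of the three categories."""
    if not app.started:
        return "Substitutes_On"
    return "Full 90" if app.finished else "Substitutes_Off"


def per90_by_category(apps):
    """Pool goals and minutes per category, then compute Goals Per90."""
    totals = defaultdict(lambda: [0, 0])  # category -> [goals, minutes]
    for app in apps:
        bucket = totals[categorise(app)]
        bucket[0] += app.goals
        bucket[1] += app.minutes
    return {cat: g * 90 / m for cat, (g, m) in totals.items() if m > 0}


# Made-up appearances, purely to show the mechanics.
sample = [
    Appearance(started=True, finished=True, minutes=90, goals=1),
    Appearance(started=True, finished=True, minutes=90, goals=0),
    Appearance(started=True, finished=False, minutes=60, goals=1),
    Appearance(started=False, finished=True, minutes=30, goals=1),
]
rates = per90_by_category(sample)
```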

Big 5 Leagues 2012/13

As expected, substitutes coming on scored at the highest rate of our three groups, at a clip of 0.65 Goals Per90.  However, players that played the full 90 minutes actually posted the lowest Per90 figure of 0.38, with the players that were substituted off sandwiched in between at a rate of 0.42 Goals Per90.

I think this is a super interesting finding and it appears that Daniel Altman was spot on with his suggestion of fatigue being a big issue in the rate that forwards score goals.  My sample doesn’t specifically just include forwards, but as it includes the leading goal scorers it will obviously be forward biased.

It looks like the fatigue factor is so strong that it is even able to overcome the fact that more goals are scored in the second half than the first half.  We have shown that a player who starts the game and is withdrawn scores at a higher rate Per90 than a player who completes the full 90 minutes.

When you think about this, it is common sense.  Players tire and it’s better to replace them with fresh legs, but I’ve never seen the impact of tiredness quantitatively assessed before.  I have no doubt that clubs and organisations like Prozone have data that records the physical drop off in player performance due to fatigue but I am surprised that the impact is so strong for goal scorers that it outweighs the benefit of playing the entire second half of a game with its increasing goal expectation.

I’m sure that if we analysed the actual minutes that each player played and their scoring returns for those minutes we could remove the second half scoring bias and calculate exactly how much more likely a fresh player is to score than a player that has played the entire game.  However, I’m going to stop short of these calculations in this article as that would require another level of data analysis.
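For what it is worth, a very coarse first pass at removing the second half bias could look like the sketch below. It is not the minute-by-minute analysis described above: it only uses the 44% / 56% first half / second half split quoted earlier, treats every minute within a half as equal, and ignores stoppage time. The function name and the `minute_on` / `minute_off` parameters are my own assumptions.

```python
# Each first-half minute is worth 0.44 / 0.5 of an "average" minute for
# goal scoring, and each second-half minute 0.56 / 0.5 (from the 44/56 split).
FIRST_HALF_WEIGHT = 0.44 / 0.5
SECOND_HALF_WEIGHT = 0.56 / 0.5


def adjusted_minutes(minute_on: int, minute_off: int) -> float:
    """Minutes on the pitch, re-weighted by when goals tend to be scored."""
    if not 0 <= minute_on <= minute_off <= 90:
        raise ValueError("expected 0 <= minute_on <= minute_off <= 90")
    first_half = max(0, min(minute_off, 45) - minute_on)
    second_half = max(0, minute_off - max(minute_on, 45))
    return first_half * FIRST_HALF_WEIGHT + second_half * SECOND_HALF_WEIGHT


def adjusted_per90(goals: int, spells) -> float:
    """Goals per 90 time-adjusted minutes over a list of (on, off) spells."""
    total = sum(adjusted_minutes(on, off) for on, off in spells)
    return goals * 90 / total


# A substitute playing the last 30 minutes gets credited with about 33.6
# adjusted minutes, so his raw Per90 figure is scaled down accordingly.
print(round(adjusted_minutes(60, 90), 1))  # 33.6
```

Under this weighting a full 90 minutes still counts as 90 adjusted minutes, so only the players with lopsided spells have their rates moved.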

I am conscious that the above findings are based on just one season of data, so to give me some comfort as to the integrity of those findings I looked at each of the 5 leagues separately to see how they performed individually.

Encouragingly, all 5 of the leagues follow exactly the same trend.  The substitutes coming on comfortably post the highest Per90 scoring rates.  This group have the parlay of being fresh as well as spending proportionately more of their playing minutes in higher goal expectation periods of the game.  The players that were withdrawn have a slightly higher Per90 figure than the footballers that played the full 90 minutes with the benefit of freshness outweighing the back ended scoring bias.

I therefore feel that we can conclude that, not only do substitutes score at a higher rate than starting players but that the players who are subbed off score at a higher clip than their teammates that play the full 90 minutes.

What are the implications of this?

I can think of at least two implications.  The first concerns comparing players’ scoring rates.  It was presumed that substitutes’ scoring rates were inflated due to the back ended time they spend on the pitch, and Daniel Altman confirmed this in his article.  However, we also need to be equally aware of players who were substituted off as they too will tend to possess higher Per90 performances than players who play the full match duration.

The second impact is much more important.  Unless there is a large difference in quality between a manager’s starting 11 and his substitutes, any manager that doesn’t use all 3 substitutes is giving up some expected value.  And by “using substitutes” I don’t mean introducing them in the 85th minute or in injury time to simply run down the clock.

I find myself agreeing with Altman’s almost throwaway suggestion that players should be substituted early in the game.  Not only do we get the boost of the player coming on having fresh legs but we also reduce the negative impact of the fatigue of the substituted player as the change is being made earlier than “normal”.

I realize that managers may need to hold a substitute back to cover the chance of injury later in the game, but leaving that aside there really should be no reason why managers don’t ensure that they empty the bench in enough time to get the full benefit of the fresh player.

When are Substitutes used?

After establishing that it is important that managers properly balance the trade off between ensuring they can finish the game with 11 players and ensuring that they obtain maximum benefit from the use of their substitutes I found myself wondering how subs are currently used.

Here is the data from the first 20 Game Weeks of the 2013/14 Premier League season showing the percentage of possible substitutes that have played a minimum amount of minutes.

2013/14 Premier League (Weeks 1 – 20)

The blue plots are the first subs that were used by Premier League managers.  50% of all first substitutes played at least 30 minutes.  The noticeable drop off at the 45 minute mark is interesting; and this clearly shows the reluctance to substitute a player in the final minute of the first half.

The red plots represent a team’s second substitute.  50% of second substitutes play less than 20 minutes, and only approx 15% of second substitutes play at least 30 minutes.

We can see from the green plots that a third substitute plays 6 minutes or more only 50% of the time, and 1 in 5 managers wait until the 89th minute to make their last change.  In fact, during the first 20 weeks in the Premier League there was a total of 98 possible substitutes that were not used.  I know the managers have a desire to finish the match with a full complement of players, but this prudence carries the opportunity cost of not making maximum use of fresh legs against a tiring opposition.
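The curves described in this section are shares of possible substitutions in which the player got at least a given number of minutes. A minimal sketch of that calculation is below; the list of minutes is made up for illustration, and recording an unused substitution as `None` (so that it still counts in the denominator) is my own convention.

```python
def share_playing_at_least(minutes_played, threshold: int) -> float:
    """Share of possible substitutions where the sub got >= threshold minutes.

    Unused substitutions are recorded as None and count in the denominator.
    """
    if not minutes_played:
        raise ValueError("need at least one possible substitution")
    hits = sum(1 for m in minutes_played if m is not None and m >= threshold)
    return hits / len(minutes_played)


# Hypothetical minutes played by the third substitute in eight team-games.
third_sub_minutes = [12, 6, 1, None, 3, 25, None, 8]

print(share_playing_at_least(third_sub_minutes, 6))  # 0.5
print(share_playing_at_least(third_sub_minutes, 1))  # 0.75
```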

Other Positions

In this article I have concentrated on scoring players, primarily forwards.  Perhaps fatigue affects forwards more than other positions, but it’s more likely the case that we are better able to measure a goal scorer’s output and thus comment on their performances.

Would it be far-fetched to assume that a central midfielder would suffer less fatigue than a forward?  I don’t think so, and I assume that the clubs would be in the position to know how much physical fatigue each player suffers during a full 90 minutes. But are they in a position to be able to quantify how much that level of fatigue actually affects the chance of his team scoring a goal or conceding a goal?  I have my thoughts on this, but I just don’t know.

Am I advocating that players should be substituted on the 30th minute, the 45th minute or the 60th minute?  At this stage I cannot answer that.  As stated above, I would need to undertake more detailed analysis to assess the fatigue impact on a minute by minute basis to arrive at a definitive answer.  However, this analysis has shown that the fatigue impact is large enough to overcome the difference in the scoring rates between the two halves, so with that in mind there is really no reason for a manager not to avail of all of his available substitute opportunities.

Indeed, the use of substitutes is just another facet of the game that good managers will use to their advantage, whilst poor managers will not realise the tactical advantage that smart substitutions could give them.

EDIT (16/01/14 at 10:39) - A few comments have suggested that there may be a forward bias in the players that are substituted off.  Here is the split of only the starting forwards in my sample:

Even within this starting forwards group the players that are substituted have a higher Per90 rate than the forwards that play the entire 90 minutes.  Any further granular analysis than this would involve the identifying of individual players to see how they perform when substituted off compared to when they played the full 90 minutes.  But I would be concerned that we would be slicing the data very thinly at this point.

ADDITIONAL EDIT – (16/01/14 at 12:40)

To eliminate the data contamination that has been suggested may arise from players with a higher Goal Per90 figure being more likely to be substituted than those with a lower Per90 number I divided my data set into two groups.

I ranked all 268 players by their Goals Per90 figure and divided the table in half, thus creating a top half that includes all the marquee strikers and a bottom half that includes players that scored at least 6 goals but who weren’t prolific goal scorers.

Even when looking solely at the bottom half of this table (so the players that aren’t prolific goal scorers), this group of players also show that they have a higher scoring rate when they are subbed off than when they play the full 90 minutes.
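The ranking and halving step is simple enough to sketch. The player names and Per90 values below are placeholders, not figures from my data set; with the real 268 players each half would hold 134.

```python
def split_in_half(players):
    """Rank (name, goals_per_90) pairs from high to low and cut the table in two."""
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    mid = len(ranked) // 2
    return ranked[:mid], ranked[mid:]


# Placeholder players with made-up Goals Per90 figures.
players = [("A", 0.31), ("B", 0.92), ("C", 0.45), ("D", 0.27)]
top_half, bottom_half = split_in_half(players)
```

Each half can then be run through the same Substitutes_On / Full 90 / Substitutes_Off comparison separately.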

## Near or Far Post Shooting

In a previous article, which can be found here, I did some research on the percentage of shots and headers that a player can expect to score given a specific shot placement. As it was my first attempt at looking at shot placements I grouped all shots together. The difficulty with stats and data is that you can never just take the first metric at face value: a second level of analysis can provide interesting insights that are missed at the higher level of data review.

In order to refresh memories, here is the scoring percentage for each shot placement zone for all shots and headers:

Remember that we are looking at the goal mouth from the point of view of the striker. I now want to undertake some further analysis to see what other information we can learn, and to do this I am going to look at shot placement based on which area on the pitch the shots or headers were struck from. I have divided all unblocked shots and headers into three pitch areas (right, left and central) as laid out in this image below:

The boundaries of the three zones have been deliberately chosen to ensure that approximately 50% of shots in the sample fall within the Central Shots zone, with the other 50% being split almost equally between right and left sides.

Central Shots Zone

Let’s have a look first of all at shots which were struck from the Central zone.

No surprise to see the general “shape” of the scoring rates heatmap pretty similar to the one at the top of the article which is for all shots.  The main difference is that the scoring rates are higher across the board, hence the increased level of “redness” in this plot.  As we are looking at the shots from the best positions (straight in front of goal) this would be in line with our expectation.

Shots from the Right

We’ll now cast our eyes at shots which came from the right side of the pitch as defined in the image above.

Now it gets interesting!!!!!

The above image shows the scoring rates for shots taken from the right hand side of the pitch and immediately a clear pattern jumps out.  As expected there is considerably more blue and less red on this image than the previous heatmap due to the fact that we are now looking at attempts from less attractive shooting locations. However, that’s not what is so intriguing about this heatmap.

The heatmap is extremely unbalanced, with all the red and orange zones concentrated onto the left side. The imbalance is so great that if we divide the plane of the goal into thirds, the average conversion rate for shots that hit the target in the left most third (Far Post) is 32%, the central third 7% and the right third (Near Post) is 14%.

As seen in my previous piece, and as logic would dictate, shots struck towards the centre of the goal would be expected to have the lowest scoring rate; but a conversion rate 2.25 times higher for on-target shots aimed towards the Far Post than for those aimed at the Near Post appears hugely significant to these eyes.

Shots from the Left

And what about for shots from the other side of the pitch, the left?

The exact same pattern, only in reverse, emerges.

From this left side, the Far Post third (right) has an on target conversion rate of 30%, 8% for central third and 15% for the left third. This means that Far Post on-target shots from the left hand side of the pitch are converted at twice the rate of Near Post on target shots.  This ties in pretty neatly with the finding from the other side of the pitch.

At this stage I think it’s safe to conclude that on target shots towards the far post (third of the goal) have roughly twice the success rate of near post shots.  Even without going any further, that strikes me as a pretty darn important piece of information.

Point of Order: For the rest of this piece, Far Post is defined as any shot where the ball would cross the plane of the goal line in the Far Post third of the goalmouth or wider.  Near Post is the opposite: any shot where the ball would cross the goal line in the Near Post third of the goalmouth or wider. Also, the remainder of this piece will concentrate on just the shots taken from the right and left sides of the pitch as I want to investigate in greater detail the apparent Near and Far Posts phenomenon.
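To make that definition concrete, here is a minimal sketch of how a shot could be labelled. The coordinate convention is an assumption of mine: `x` is where the ball would cross the goal line, in metres from the left hand post as the striker sees it, so negative values are wide of the left post and values above the goal width are wide of the right post. The 7.32 m figure is the standard width of a full size goal.

```python
GOAL_WIDTH_M = 7.32  # standard goal width (8 yards)


def placement(side: str, x: float) -> str:
    """Label a shot from the 'left' or 'right' of the pitch by goal-line position.

    x is measured in metres from the left post, from the striker's viewpoint.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    third = GOAL_WIDTH_M / 3
    if x < third:
        zone = "left"       # left third of the goal, or wide of the left post
    elif x > 2 * third:
        zone = "right"      # right third of the goal, or wide of the right post
    else:
        return "Central"
    # A shot finishing on the same side it was struck from is Near Post.
    return "Near Post" if zone == side else "Far Post"


print(placement("right", 1.0))  # Far Post
print(placement("left", 1.0))   # Near Post
```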

Far Post is Superior

So what does this mean?  My first thoughts are that the Andy Gray cliché of “he should have went across the keeper there” is correct. However, I’m only going to give him half marks as I believe his assertion was based on the fact that, if missed, a shot across the keeper has a chance of being parried, allowing the attacking team to pick up the rebound and have another strike at goal.  A shot missed on the narrow side does not have this luxury. Not for one second do I think that Andy Gray was aware that on target shots to the far post are scored at rates of 2 to 2.25 times more than those shot towards the near post.  At least if he was aware of that fact then he, along with everyone else in football, kept that particular nugget very quiet.

Possible Reasons for Discrepancy

1 – The first possible explanation for this difference is that I’m only looking at shots that are on target, ie in this analysis I have ignored shots that were wide or high of the target. Perhaps looking at goals as a percentage of all unblocked shots is required as it may be more difficult to hit the target with cross shots than near post shots.

After investigation, it turns out that this was indeed the case as 68% of all Far Post shots missed the target, compared with 64% of Near Post shots.   However, that small difference isn’t anywhere near sufficient to explain the difference in goals scored as a percentage of unblocked shots.

After including missed shots, 9.9% of unblocked Far Post shots are scored, whereas the rate substantially falls to 5.3% for Near Post unblocked shots.  This means we end up with a final ratio of unblocked Far Post shots being scored at 1.8 times the rate of Near Post shots. So after ruling out the difference being attributable to off target shots we are still left with a significant unexplained difference in terms of the scoring rate for Far and Near Post shots.
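The arithmetic linking the on target and unblocked figures can be checked with a short sketch. It only uses the rounded percentages quoted above, so the outputs are approximate; the function name is my own.

```python
def implied_on_target_conversion(unblocked_conversion: float, miss_rate: float) -> float:
    """On-target conversion implied by the unblocked conversion and the miss rate."""
    return unblocked_conversion / (1 - miss_rate)


far = implied_on_target_conversion(0.099, 0.68)   # Far Post: 9.9% scored, 68% miss
near = implied_on_target_conversion(0.053, 0.64)  # Near Post: 5.3% scored, 64% miss

print(round(far, 3))            # 0.309
print(round(near, 3))           # 0.147
print(round(0.099 / 0.053, 2))  # 1.87
```

The implied on target rates of roughly 31% and 15% sit comfortably alongside the Far and Near Post thirds reported for each side of the pitch, and the ratio recomputed from the rounded percentages is in line with the 1.8 quoted once rounding is allowed for.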

2 – Could it be that goalkeepers are overly concerned with getting beaten at their near posts?

There is no doubt that it looks bad for a keeper if he is beaten at his near post, but perhaps they are trying to guard the near post at the detriment of the cross shot? At this point (with no access to goalkeeper positioning at the time of the shot) I don’t have any way to either prove or disprove this possible explanation, so unfortunately I have no other option than moving on to my next possibility.

3 – Another possible explanation for the difference is that I have so far excluded blocked shots from this analysis (as we never know where they will cross the plane of the goal).

Due to the fact that a cross shot has to travel through the central area of the pitch it certainly seems likely that shots aimed towards the Far Post have a greater chance of being blocked than those targeted towards the near post.  But is the difference in the rates at which Near and Far Post shots are blocked enough to explain the near twofold conversion differential?

This could be quite a difficult question to answer as we have no way of knowing where the shots would have crossed the goal line had they not been blocked.  However, I have been spared some potentially impossible mental gymnastics as even if EVERY blocked shot was a Far Post shot (so none of the blocked shots were destined for Central or Near Post!!) the scoring rate for all Far Post shots would still exceed that of Near Post shots. That really is something. So although that is good news, as a numbers man I have an innate desire to quantify effects and so I’m going to try to make an educated guess at the location in the goal where blocked shots were destined for.

First up, what’s our split of non-blocked shots:

As stated above, I would assume that Near Post shots are likely to get blocked less than Far Post shots, but I would assume it would be reasonable for Central shots to be blocked at the same rate as Far Post shots.

Having established this, let’s then assume that Far Post and Central shots are blocked at twice the rate of Near Post shots (this is only a guess, but seems reasonable to me and I need to pick a number).

This blocked shots weighting combined with the volume of non blocked shots results in an assumed distribution of the Blocked Shots as follows:

| Goal zone | Assumed share of blocked shots |
| --- | --- |
| Far Post | 53% |
| Central | 21% |
| Near Post | 26% |
| Total | 100% |

I will therefore split the Blocked shots in my data sample as being destined for Far Post, Near Post and Centrally in the ratios of 53%, 26% and 21% respectively.
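The weighting can be sketched as below. Please note that the split of unblocked shots used here is NOT the actual split from my data; the values are illustrative ones I have picked because, under the "twice the rate" assumption, they land on the 53% / 21% / 26% distribution quoted above.

```python
def apportion_blocked(unblocked_share, block_weight):
    """Distribute blocked shots across goal zones.

    Each zone's share of unblocked shots is multiplied by its assumed relative
    block rate, then the results are normalised to sum to 1.
    """
    weighted = {zone: unblocked_share[zone] * block_weight[zone]
                for zone in unblocked_share}
    total = sum(weighted.values())
    return {zone: w / total for zone, w in weighted.items()}


# Illustrative split of unblocked shots (assumed, not the article's figures).
unblocked_share = {"far": 0.42, "central": 0.17, "near": 0.41}
# Far Post and Central shots assumed blocked at twice the Near Post rate.
block_weight = {"far": 2, "central": 2, "near": 1}

blocked_share = apportion_blocked(unblocked_share, block_weight)
print({zone: round(share * 100) for zone, share in blocked_share.items()})
# {'far': 53, 'central': 21, 'near': 26}
```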

At this stage, I want to point out that the only purpose of the preceding couple of paragraphs is to approximate the number of blocked shots for each of the goal zones (Far and Near Posts and Central) as the analysis cannot be properly completed without some attempt at apportioning blocked shots.

Yes, some of my assumptions can be challenged but I don’t think that I can be that far out in the approximations I have used; and importantly certainly not enough to change the core findings of this analysis piece.

Conversion % of All Shots

Armed with an approximation of blocked shots for each goal zone we can now reach a conclusion which takes into account the percentage of all shots which are scored from the sides of the pitch (the areas denoted in the second image in this piece) depending on whether the ball would have ended up Near Post, Far Post or centrally in the goal.

Remarkably, 6.8% of Far Post shots were scored, compared with just 4.4% of Near Post shots.  As raw numbers, both of those conversion rates appear fairly small, but don’t forget that we are dealing with shots that are struck from less attractive locations on the pitch (ie away from the central strip of the pitch).
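For clarity, the all shots conversion rate for a goal zone is the goals scored divided by the unblocked shots plus that zone's apportioned slice of the blocked shots. A minimal sketch follows; the counts in the example are hypothetical round numbers chosen only to show the mechanics, not figures from my sample.

```python
def all_shots_conversion(goals: int, unblocked_shots: int,
                         blocked_total: int, blocked_share: float) -> float:
    """Goals as a share of unblocked shots plus apportioned blocked shots."""
    apportioned_blocked = blocked_total * blocked_share
    return goals / (unblocked_shots + apportioned_blocked)


# Hypothetical counts: 70 goals from 800 unblocked shots, with half of
# 400 blocked shots assumed to have been heading for this zone.
rate = all_shots_conversion(goals=70, unblocked_shots=800,
                            blocked_total=400, blocked_share=0.5)
print(round(rate, 3))  # 0.07
```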

Conclusion

What I have laid out in this article appears to be quite fundamental. When shooting from less attractive positions, the player shooting has a conversion rate which is more than 1.5 times better for Far Post attempts than for Near Post attempts.

If this fact wasn’t impressive enough in its own right, when it is parlayed with the chance of a Far Post shot being parried and the rebound being scored, the advantage is even greater than the basic 1.5 multiple calculated above.

Why?

The question I haven’t been able to answer properly is why this phenomenon exists in professional football when clubs have access to both better data and bigger brains than mine?

I don’t think it can be due to variance as my sample has a huge number of shots: it contains every shot taken in the Big 5 Leagues during the 2012/13 season, which is almost 50,000 shots.

After undertaking the work for this article the only conclusion I can arrive at is that it’s due to Goalkeeper positioning.  I have taken account of most other things, ie the difficulty of hitting the target and the apportionment of blocked shots. Could it really be that keepers are so conscious about the “Pride of the Near Post” that they over compensate?  I am unable to coherently put forward any other possible reasons.

In order to gauge reaction to this piece I sent a draft to David Sally and Chris Anderson, the co-authors of “The Numbers Game”.  David made the point that a higher success rate for Far Post shots could be indicative of another aspect of the way goalkeepers play.  If they were slow to come off the line then due to basic geometry they would be more exposed to Far Post shots than Near Post efforts.

As alluded to in my preamble to this post, you can look at a facet of the game using just the headline measurements (conversion % for all shots in this instance), we can then go one level deeper into the data (slice the data by pitch sides) but even this may not be enough.  Chris Anderson made the point that I should probably further divide the data into shooting distances.  This would involve going yet another level deeper into the data. Perhaps I might further subdivide the data in a future article so that I can see the impact of shot distance on this Near and Far post phenomenon.  However, for my money that lack of further slicing of the data doesn’t diminish the importance of the findings laid out here.

As an aside, this clearly demonstrates why the basic match stats information is so lacking in detail to give fans a proper understanding of what has happened in a game.  Despite using data in a format that I hadn’t seen before (placement success rates), then going one level deeper, I find myself in the position where I could go another level deeper to try to complete our understanding of this quirk.

Whatever the reason, there is no getting away from the fact that shooting Far Post seems to have a significantly higher goal expectation than shooting Near Post. In a game of such small margins, where teams try to gain any advantage possible, let’s see if clubs and players learn from this and we begin to see either a greater proportion of shots being fired towards the Far Post or keepers minding their Near Post just a little less in this coming season.