# Smart Use of Substitutes Can Make A Difference

Following on from Daniel Altman’s excellent piece on the scoring rate of substitutes I thought I would undertake my own analysis on the impact of substitutes.

The methodology I will use is slightly different to that employed by Altman in his article.  I will use the Big 5 European leagues for last season (2012/13), and I will study the goal scoring rates for all players that scored at least 6 league goals last season.

The use of this filter gives me a list of 268 players that scored a combined total of 2,782 goals in 617,331 minutes of playing time.  This equates to an average scoring rate of 0.41 goals Per 90 minutes for our sample of players.
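As a quick check, the Per90 figure is just goals divided by minutes, scaled up to a 90 minute match.  Here is a minimal Python sketch using the sample totals quoted above; the function name `goals_per90` is my own label for illustration.

```python
def goals_per90(goals: int, minutes: int) -> float:
    """Goals scored per 90 minutes of playing time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return goals * 90 / minutes


# Totals for the 268-player sample quoted in the article.
sample_rate = goals_per90(2782, 617331)
print(round(sample_rate, 2))  # 0.41
```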

At this stage it is a well-documented fact that more goals are scored in the second half of games than the first half, and the apportionment in the Big 5 leagues last season was no different with just 44% of all goals scored in the first half and 56% in the second half.

The following is the distribution of goals in 5 minute time intervals for the 5 leagues last season:

We can see that, generally, the goal scoring rates increase in line with the time elapsed during the match.  For my purposes, the minutiae of the goal scoring rates aren’t important; we just need confirmation that this trend does exist in my data sample.

In his piece, Daniel Altman found that forwards coming on as substitutes scored at a higher rate than starting forwards.  But when we consider that more goals are scored in the second half than the first half then this is no great surprise.  Substitutes spend a greater proportion of their playing time in the second half (when goal expectation is higher) than starting players do.

So what do we take from this?

The fact that substitutes have a higher scoring rate means that you can’t directly compare Goals Per90 figures between players that regularly start and those who make frequent substitute appearances.  Very simply, the substitute will have his numbers inflated and we would expect his Per90 numbers to drop in the event that he was handed a starting position.

However, Altman didn’t stop there and he found that “fatigue among forwards was a more powerful force than fatigue among defenders”.  That sentence struck a chord with me and I wanted to investigate the general phenomenon of fatigue in footballers a little further.

Hierarchy of Goals Per90

We have established that the longer a match goes on the greater the goal expectation.  This is one of the reasons why substitutes score at a higher rate than starting players.  So, by extension of this logic we would therefore expect players who are substituted to score less Per90 than players who played the full 90 minutes.

Not only would the substituted player be swimming against the tide of playing at least as many first half minutes as second half minutes when the goal expectation is at its lowest, but the fact that he is substituted may also indicate that he hasn’t played a great game thus far.

That second suggestion certainly won’t be true all the time.  The player may be injured, withdrawn for tactical reasons or just tired but it seems reasonable to assume that some of this cohort will have irked the manager enough with their performance to be substituted.

Even ignoring the suggestion that the substituted player has been having a less than stellar performance,  due to the increasing goal expectation it is reasonable to assume that the hierarchy of Per90 goal scoring rates would rank as follows:

• Substitutes_On
• Full 90 minutes Players
• Substitutes_Off
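The timing-only part of this hypothesis can be made concrete with a rough sketch.  It assumes, purely for illustration, that a player's chance of scoring in any minute is proportional to the league-wide goal share of that half (44% first half, 56% second half), spread evenly within each half and ignoring injury time.  The function name and the example entry and exit minutes are hypothetical.

```python
FIRST_HALF_SHARE = 0.44   # share of goals in minutes 0-45 (from the article)
SECOND_HALF_SHARE = 0.56  # share of goals in minutes 45-90 (from the article)


def timing_multiplier(start: float, end: float) -> float:
    """Per90 inflation from timing alone for a player on the pitch from
    minute `start` to minute `end`.

    1.0 means the same rate as a full-90 player of equal ability.
    """
    if not 0 <= start < end <= 90:
        raise ValueError("need 0 <= start < end <= 90")
    first_half_minutes = max(0.0, min(end, 45) - start)
    second_half_minutes = max(0.0, end - max(start, 45))
    # Share of a match's goal expectation that falls while the player is on.
    goal_share = (first_half_minutes / 45 * FIRST_HALF_SHARE
                  + second_half_minutes / 45 * SECOND_HALF_SHARE)
    # Share of the match's playing time the player was on the pitch.
    time_share = (end - start) / 90
    return goal_share / time_share


# Assumed substitution minute of 60 for both substitute groups.
for label, start, end in [
    ("Substitutes_On (on at 60')", 60, 90),
    ("Full 90", 0, 90),
    ("Substitutes_Off (off at 60')", 0, 60),
]:
    print(f"{label}: {timing_multiplier(start, end):.2f}")
```

Under these assumptions the multipliers come out at roughly 1.12, 1.00 and 0.94, which is the ordering listed above.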

Now we’ve finalised our hypothesis, how does that compare with what actually happened last season?

Each game that our 268 players took part in last season was divided into the 3 categories: Substitutes_On, Full 90 and Substitutes_Off and I totaled the number of goals and minutes that the group of 268 players as a whole racked up in each category.
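A minimal sketch of that bookkeeping, assuming each appearance is recorded with whether the player started, whether he was on the pitch at the final whistle, and his goals and minutes.  The `Appearance` record, its field names and the sample rows are hypothetical and are not the actual data set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Appearance:
    started: bool    # named in the starting 11
    finished: bool   # still on the pitch at the final whistle
    minutes: int
    goals: int


def category(app: Appearance) -> str:
    # Assumption: a substitute who is later withdrawn himself still counts
    # as Substitutes_On, and sendings-off are not treated separately.
    if not app.started:
        return "Substitutes_On"
    return "Full_90" if app.finished else "Substitutes_Off"


def per90_by_category(apps):
    """Pool goals and minutes per category, then convert to Goals Per90."""
    goals = defaultdict(int)
    minutes = defaultdict(int)
    for app in apps:
        cat = category(app)
        goals[cat] += app.goals
        minutes[cat] += app.minutes
    return {cat: goals[cat] * 90 / minutes[cat]
            for cat in minutes if minutes[cat] > 0}


# Invented rows, only to exercise the code.
sample = [
    Appearance(started=True, finished=True, minutes=90, goals=1),
    Appearance(started=True, finished=True, minutes=90, goals=0),
    Appearance(started=True, finished=False, minutes=60, goals=1),
    Appearance(started=False, finished=True, minutes=30, goals=1),
]
print(per90_by_category(sample))
```

Pooling goals and minutes across the whole group before dividing, rather than averaging each player's own Per90, stops players with very few minutes in a category from dominating the result.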

Big 5 Leagues 2012/13

As expected, substitutes coming on scored at the highest rate of our three groups, at a clip of 0.65 Goals Per90.  However, players that played the full 90 minutes actually posted the lowest Per90 number of 0.38, with the players that were substituted off sandwiched in between at a rate of 0.42 Goals Per90.

I think this is a super interesting finding and it appears that Daniel Altman was spot on with his suggestion of fatigue being a big issue in the rate that forwards score goals.  My sample doesn’t specifically just include forwards, but as it includes the leading goal scorers it will obviously be forward biased.

It looks like the fatigue factor is so strong that it is even able to overcome the fact that more goals are scored in the second half than the first half.  We have shown that a player who starts the game and is withdrawn scores at a higher rate Per90 than a player who completes the full 90 minutes.

When you think about this, it is common sense.  Players tire and it’s better to replace them with fresh legs, but I’ve never seen the impact of tiredness quantitatively assessed before.  I have no doubt that clubs and organisations like Prozone have data that records the physical drop off in player performance due to fatigue but I am surprised that the impact is so strong for goal scorers that it outweighs the benefit of playing the entire second half of a game with its increasing goal expectation.

I’m sure that if we analysed the actual minutes that each player played and their scoring returns for those minutes we could remove the second half scoring bias and calculate exactly how much more likely a fresh player is to score than a player that has played the entire game.  However, I’m going to stop short of these calculations in this article as that would require another level of data analysis.

I am conscious that the above findings are based on just one season of data, so to give me some comfort as to the integrity of those findings I looked at each of the 5 leagues separately to see how they performed individually.

Encouragingly, all 5 of the leagues follow exactly the same trend.  The substitutes coming on comfortably post the highest Per90 scoring rates.  This group have the parlay of being fresh as well as spending proportionately more of their playing minutes in higher goal expectation periods of the game.  The players that were withdrawn have a slightly higher Per90 figure than the footballers that played the full 90 minutes with the benefit of freshness outweighing the back ended scoring bias.

I therefore feel that we can conclude that, not only do substitutes score at a higher rate than starting players but that the players who are subbed off score at a higher clip than their teammates that play the full 90 minutes.

What are the implications of this?

I can think of at least two implications.  The first concerns comparing players’ scoring rates.  It was presumed that substitutes’ scoring rates were inflated by the back-ended nature of the time they spent on the pitch, and Daniel Altman confirmed this in his article.  However, we need to be equally aware of players who were substituted off, as they too will tend to post higher Per90 figures than players who play the full match.

The second impact is much more important.  Unless there is a large difference in quality between the starting 11 and the substitutes, any manager that doesn’t use all 3 substitutes is giving up some expected value.  And by “using substitutes” I don’t mean introducing them in the 85th minute or in injury time to simply run down the clock.

I find myself agreeing with Altman’s almost throwaway suggestion that players should be substituted early in the game.  Not only do we get the boost of the player coming on having fresh legs but we also reduce the negative impact of the fatigue of the substituted player as the change is being made earlier than “normal”.

I realize that managers may need to hold a substitute back to cover the chance of injury later in the game, but leaving that aside there really should be no reason why managers don’t ensure that they empty the bench in enough time to get the full benefit of the fresh player.

When are Substitutes used?

After establishing that it is important that managers properly balance the trade off between ensuring they can finish the game with 11 players and ensuring that they obtain maximum benefit from the use of their substitutes I found myself wondering how subs are currently used.

Here is the data from the first 20 Game Weeks of the 2013/14 Premier League season showing the percentage of possible substitutes that have played a minimum amount of minutes.

2013/14 Premier League (Weeks 1 – 20)

The blue plots are the first subs that were used by Premier League managers.  50% of all first substitutes played at least 30 minutes.  The noticeable drop off at the 45 minute mark is interesting; and this clearly shows the reluctance to substitute a player in the final minute of the first half.

The red plots represent a team’s second substitute.  50% of second substitutes play less than 20 minutes, and only approx 15% of second substitutes play at least 30 minutes.

We can see from the green plots that only 50% of the time does a third substitute play 6 minutes or more, and 1 in 5 managers wait until the 89th minute to make their last change.  In fact, during the first 20 weeks in the Premier League there was a total of 98 possible substitutes that were not used.  I know the managers have a desire to finish the match with a full complement of players, but there is a trade off where this prudence has the opportunity cost of not making maximum use of fresh legs against a tiring opposition.
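For anyone wanting to rebuild these curves, each point is simply the share of possible substitutions in which the substitute played at least a given number of minutes.  The sketch below assumes unused substitutions are recorded as 0 minutes; the function name and the sample values are hypothetical and are not the Premier League data.

```python
def share_playing_at_least(minutes_played, threshold):
    """Fraction of possible substitutions where the substitute played
    at least `threshold` minutes.  Unused slots are recorded as 0."""
    if not minutes_played:
        raise ValueError("no substitution slots supplied")
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    used_long_enough = sum(1 for m in minutes_played if m >= threshold)
    return used_long_enough / len(minutes_played)


# Invented minutes played by the third substitute in ten team-matches;
# 0 marks a substitution that was never made.
third_sub_minutes = [0, 1, 2, 4, 6, 8, 10, 12, 15, 20]

for threshold in (1, 6, 15):
    share = share_playing_at_least(third_sub_minutes, threshold)
    print(f"at least {threshold} min: {share:.0%}")
```

Running the same function over the first, second and third substitute lists separately gives one curve per substitute.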

Other Positions

In this article I have concentrated on scoring players, primarily forwards.  Perhaps fatigue affects forwards more than other positions, but it’s more likely the case that we are better able to measure a goal scorer’s output and thus comment on their performances.

Would it be far-fetched to assume that a central midfielder would suffer less fatigue than a forward?  I don’t think so, and I assume that the clubs would be in the position to know how much physical fatigue each player suffers during a full 90 minutes.  But are they in a position to be able to quantify how much that level of fatigue actually affects the chance of his team scoring a goal or conceding a goal?  I have my thoughts on this, but I just don’t know.

Am I advocating that players should be substituted on the 30th minute, the 45th minute or the 60th minute?  At this stage I cannot answer that.  As stated above, I would need to undertake more detailed analysis to assess the fatigue impact on a minute by minute basis to arrive at a definitive answer.  However, this analysis has shown that the fatigue impact is large enough to overcome the difference in the scoring rates between the two halves, so with that in mind there is really no reason for a manager not to avail of all of his available substitute opportunities.

Indeed, the use of substitutes is just another facet to the game that good managers will use to their advantage whilst poor managers will not realise the tactical advantage that smart substitutions could give them.

EDIT (16/01/14 at 10:39)- A few comments have suggested that there may be a forward bias in the players that are substituted off.  Here is the split of only the starting forwards in my sample:

Even within this starting forwards group the players that are substituted have a higher Per90 rate than the forwards that play the entire 90 minutes.  Any further granular analysis than this would involve the identifying of individual players to see how they perform when substituted off compared to when they played the full 90 minutes.  But I would be concerned that we would be slicing the data very thinly at this point.

ADDITIONAL EDIT – (16/01/14 at 12:40)

To eliminate the data contamination that has been suggested may arise from players with a higher Goal Per90 figure being more likely to be substituted than those with a lower Per90 number I divided my data set into two groups.

I ranked all 268 players by their Goals Per90 figure and divided the table in half, thus creating a top half that includes all the marquee strikers and a bottom half that includes players that scored 6 goals but who weren’t prolific goal scorers.

Even when looking solely at the bottom half of this table (so the players that aren’t prolific goal scorers), this group of players also show that they have a higher scoring rate when they are subbed off than when they play the full 90 minutes.
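The split can be sketched as follows, assuming each player's goals and minutes are already totalled separately for the games he completed and the games in which he was substituted off.  The `PlayerSeason` record, its field names and the four sample players are hypothetical; the numbers are invented to exercise the code and are not results.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    full90_goals: int
    full90_minutes: int
    suboff_goals: int
    suboff_minutes: int

    @property
    def overall_per90(self) -> float:
        goals = self.full90_goals + self.suboff_goals
        minutes = self.full90_minutes + self.suboff_minutes
        return goals * 90 / minutes


def split_halves(players):
    """Rank by overall Goals Per90 and cut the table in half."""
    ranked = sorted(players, key=lambda p: p.overall_per90, reverse=True)
    mid = len(ranked) // 2
    return ranked[:mid], ranked[mid:]


def group_rates(players):
    """Pooled Goals Per90 for the Full 90 and Substitutes_Off games."""
    full_goals = sum(p.full90_goals for p in players)
    full_minutes = sum(p.full90_minutes for p in players)
    off_goals = sum(p.suboff_goals for p in players)
    off_minutes = sum(p.suboff_minutes for p in players)
    return {
        "Full_90": full_goals * 90 / full_minutes,
        "Substitutes_Off": off_goals * 90 / off_minutes,
    }


# Invented players, not taken from the real sample.
players = [
    PlayerSeason("A", 10, 1800, 5, 600),
    PlayerSeason("B", 8, 1800, 4, 540),
    PlayerSeason("C", 4, 1800, 2, 600),
    PlayerSeason("D", 5, 2250, 1, 300),
]
top_half, bottom_half = split_halves(players)
print(group_rates(bottom_half))
```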

• Thom Lawrence

I’ve been looking forward to more coverage of this sort of thing since earlier in the season. Roberto Martinez had a couple of games where he was credited with changing games with substitutes, in contrast with David Moyes who was sometimes accused of substituting poorly.

This seems like a great first step, analysing the player performances, but I think it’d be much more interesting to develop baseline 5min stats for all sorts of different measures for teams as a whole, and then analyse whether substitutes gave teams a bump or punished them. I think shots on target for/against would be interesting to analyse in terms of attacking and defending substitutions – did trailing teams use substitutes in a way that increased their shots on target? Did leading teams use them in ways that reduced shots on target against?

How can we tell if a _manager_ substitutes well (or correctly), as opposed to how just an individual performs as a substitute?

• Jonathan ford

Interesting – do you have the possibility to take defenders out of the 90 minute cohort? I suspect quite a few may get 6 goals a season while playing 30+ full games thus having a v low goals per 90 stat. Would it be significant though?

• Colin Trainor

I’m assuming that their impact would be almost negligible. I wanted to have a defined filter criteria so left it as simple as I could and decided on a minimum of 6 goals.
If you omit defenders then what about forwards when they line out as midfielders etc. I wanted to keep it simple for my first pass at this.

• http://11tegen11.net 11tegen11

Colin,

Nice work there!
I appreciated all the effort that went into this piece. Seems like a huge task to neatly compile this data.

However, I tend to disagree on all the fatigue stuff. For me, that’s just pure hypothesis.
The key sentence for me is this one.

” We have shown that a player who starts the games and is withdrawn scores at a higher rate Per90 than a player who completes the full 90 minutes.”

It is not to deny your work, as this is very much an intriguing finding, but let me raise some bias that may explain the finding quoted above.
It may be that teams withdraw their star strikers once they are less encouraged to find more goals, like in 2+ goal leads. It may be that better strikers get withdrawn more than poor strikers to protect the better striker, or to allow new signings and youngsters to bed in.
Also, it seems likely that strikers are substituted more often than midfielders or wide players, which may skew the numbers. There must be quite some non-strikers among 6+ goals players in the top-5 leagues.

I understand the idea to raise an explanation for the finding, but for me bias seems more likely than fatigue or any other reason.

• Colin Trainor

Sander, your comments may have some merit. However, for you to believe that fatigue is not real you would have to ignore the plot that Daniel Altman posted here (http://www.bsports.com/statsinsights/football/easier-score-sub/3) which shows that fatigued forwards score at only 10% of the rate at the end of the match that they do at the start.

I wanted to keep the analysis at a global level as sub dividing the data may lead to noise. Following your question I looked at just the starting forwards in my data sample and split out their Per90s between the players that played the Full 90 and those that were subbed off.

I get the following figures:
Full_90 – 0.44
Sub_Off – 0.52

So even when I narrow it down to just forwards, there is a greater return for subbed off players Per90 than those forwards that played the entire game. You might say that the star forwards will be substituted to protect them. But is it not as likely that the borderline starting forward will also be substituted just as much?

Neither you nor I know this. Anything else is pure conjecture at this stage but I think it is wrong to simply dismiss fatigue out of hand.

• http://11tegen11.net 11tegen11

I’m far from convinced and I would challenge Daniel to run that analysis again and this time not only correcting for the minute of play (more goals in second half), but also for game state, and team strengths.

My hunch is that forwards get subbed on when teams need a goal. So, it is to be expected that the scoring rate of subbed on forwards is higher. Probably those teams start conceding at higher rates too. All in line with the known effects of trailing a game by one goal.

All of this already fits a perfect explanation for more goals per minute by subbed strikers.
If there is a correlation between subbing strikers on and pushing to score goals, all you’ll ever find is more goals by subbed on strikers.

All we can presently do, on the basis of your work and Daniel’s work, is conclude that subbed on strikers score at a higher rate, even when corrected for minute of play. For the rest it could be down to any factor outside the analysis: tactical reasons related to game state would be the first to factor in, as we have data on that.

• Colin Trainor

Let’s leave aside subbed on players, but you are ignoring the fact that Subbed off forwards score at a higher rate than forwards that play the full 90.

How do you explain that?
For every leading team that is shelling, whose forwards playing the full 90 minutes mightn’t produce many goals at the end of the game, there will be a losing team that is throwing the kitchen sink at it and will probably still have its starting forward on (as well as perhaps another one that has replaced a midfielder). Hence my initial suggestion that game states should even out across the sample.

• http://fantasyformation.com Gummi

Brilliant, as well as the piece you mentioned from Altman. In my mind managers tend to think about substitutes in this order:

1) To make tactical changes (the first one is often a proactive change to affect the game or an answer to a change from the other manager)
2) To introduce impact subs (e.g. Solskjaer or Dzeko)
3) To cover for injuries

Considering this we understand why managers would be reluctant to use 1) until after the break and 3) until little time is left. However, using 2) should be viable at any time, but the problem is to convince managers to value 2) over 1).

• Jordan

Do you have minutes played and goal data for each game the players in your data set played in? I’d be curious in the results of a regression of the form Yig = a + b1Mig + b2Sig where Yig is goals scored for player i in game g, Mig is minutes played by player i in game g, and Sig is a binary variable equal to 1 when player i was substituted on in game g. A lot of managers (just anecdotal observation) sub off a prolific but defensively suspect striker in the last few minutes when they are leading, which might be biasing your sample for the goal scoring rates of subbed off players versus full 90 players.
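A regression of this form could be fitted along the following lines.  This is only a sketch in plain Python, using ordinary least squares via the normal equations; the function names and the per-game rows are hypothetical, and a real analysis would more likely use a statistics package and might prefer a count model for goals.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(goals, minutes, subbed_on):
    """Least-squares fit of goals = a + b1*minutes + b2*subbed_on.

    Returns [a, b1, b2].
    """
    rows = [[1.0, float(mi), float(s)] for mi, s in zip(minutes, subbed_on)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, goals)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Invented per-game rows: goals, minutes played, 1 if subbed on else 0.
goals = [1, 0, 0, 1, 1, 0, 0, 1]
minutes = [90, 90, 60, 75, 30, 20, 15, 25]
subbed_on = [0, 0, 0, 0, 1, 1, 1, 1]

a, b1, b2 = fit_ols(goals, minutes, subbed_on)
print(f"a={a:.3f}  b1={b1:.4f}  b2={b2:.3f}")
```

A positive b2 would mean substitutes score more than their minutes alone predict.  Adding a second binary variable for being substituted off would address the bias described in this comment.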

• Rufus

Great article. One additional thought to chuck into the mix which might be influencing the results. Because you have chosen players who have scored at least 6 goals it can safely be assumed that this will mainly be a mix of wingers, forwards and attacking central midfielders. Is it possible that forwards are more likely to be substituted off than central midfielders (because their change is more likely in a tactical move to close down a game etc), therefore the Sub Off group will be more heavily weighted towards higher strike rate forwards and the Full 90 Group more heavily weighted to lower strike rate (but still over 6) midfielders? ie it is in part the different mix of positional type player in each group which is driving the higher strike rate of the Sub Off Group vs the Full 90 group.

Rufus
@demonbitters

• Colin Trainor

Rufus, 11tegen11 submitted a similar comment and I have answered it in my reply to him.

• Angus Murray-Brown

Excellent work and very interesting.

This has been a massive tactical hole for me waiting for exploitation. I think there is a big argument for a new type of player – an attacking presser – think Dirk Kuyt on steroids.

The role would be to run and press as hard as possible until the point of collapse at around 35 minutes before being replaced by fresher players who can work hard but get the half time break to recharge.

These pressers would need different training for different physiological requirements. They would be expected to burn out quicker.

I would have 2 of these players with one sub as spare – is this why Chelsea collect so many wide forwards? (Although they make their changes post half time)

• toshack

Colin,
Really interesting (as well as Altman’s views). Good work.
For reference, I think there is a section in “The Numbers Game” which explicitly discusses the “correct” timing of substitutions (i.e. they even pinpoint the minutes when it is best to sub).

• Duncan

Brilliant article, with a great bit of insight. This level of nuance gives a great picture of how the game is optimally played.

• Nikhil

But surely teams that are shelling are more likely to make defensive (midfield) and not offensive substitutions? Hence, leaving the forwards on for the full 90 and thus a potential explanation for the reduced Per90? Even when teams shell, the good ones keep strikers on to have an out-ball and some amount of counter attacking efficiency. ‘Super-subs’ may be sent in with a clear mandate to change a potentially losing game state – tactical system of the team aligning to maximise the sub’s goal expectancy.

Of course, it is hard to argue against the drop off that comes about with fatigue. But the numbers could be skewed by these effects?

• Colin Trainor

Did you see the edits? In the first edit I looked purely at forwards and the same pattern held for them as the entire sample.

And in the second edit I looked just at lower half of the table (in terms of goals per90) and the same phenomenon existed in that data set too.

Unsure what more I can do.