Based on this year’s figures, Luis Suarez is an exceptional goal scorer. Daniel Sturridge less so…!

[Dramatic pause for effect.]

Now that the bomb is away, let’s focus on the stats.

This article explains how to compare what was expected to happen (according to some statistical model) with what actually happened, and how to quantify how likely the actual outcome was. It may appear to overlap somewhat with my previous article on the statistical significance of comparing metrics, but it should be stressed that both pieces were written to illustrate methods, using data for demonstration rather than to focus on the particular data analysed.

Goal expectation (ExpG) models have been growing like mushrooms in the football analytics community. Some analysts have published the details of their models; others have kept the calculations to themselves. In general, these models take into account a number of factors affecting the probability of a shot being scored and assign a numerical “goal expectation” value to each shot. This way, one can assess players’ scoring performance by comparing the actual number of goals scored with what would be expected given the nature of the chances they took.

This article is partly motivated by this post by Mark Taylor and this one by Daniel Altman, in which they indirectly highlight that knowing the total ExpG figure of each player/team is not sufficient: one also needs to know how that total breaks down, i.e. the ExpG value of each shot. The reason is that the distribution of the per-shot ExpG figures affects the probability of scoring 0, 1, 2, … goals.
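To make this point concrete, here is a minimal sketch comparing two invented shot profiles that share the same total ExpG of 1.0. The profiles (two shots at 0.5 each, ten shots at 0.1 each) are assumptions chosen purely for illustration; they are not taken from the linked posts or from our model. Because all shots within each profile have the same probability, the plain binomial formula applies here.

```python
from math import comb

def binomial_pmf(n_shots, p, goals):
    """Probability of exactly `goals` goals from n_shots independent shots,
    each scored with the same probability p."""
    return comb(n_shots, goals) * p**goals * (1 - p) ** (n_shots - goals)

# Hypothetical profiles (illustrative only): both have a total ExpG of 1.0.
profiles = {
    "two shots at 0.5": (2, 0.5),
    "ten shots at 0.1": (10, 0.1),
}

for label, (n_shots, p) in profiles.items():
    probs = [binomial_pmf(n_shots, p, goals) for goals in range(3)]
    print(label, "| total ExpG:", n_shots * p,
          "| P(0), P(1), P(2):", [round(x, 3) for x in probs])
```

Although the totals match, the chance of scoring no goals at all differs between the two profiles, which is why the per-shot breakdown matters.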

Armed with this, I set out to check how the number of goals actually scored by a player compares with what was expected of him, given the chances he took. I’ve used the goal expectation model developed by Colin Trainor and myself, which was introduced in this piece and whose results have been used on this site.

Let’s start with the Premier League top scorer. The following chart depicts a histogram of the ExpG of his shots:

[Chart: SuarezExpG]

More than 50% of his shots have an ExpG figure between 0.02 and 0.08. A few of his shots arise from even more difficult chances, but there are some shots from relatively good positions, including a couple which he was odds-on to score (and he did). I should note here that other analysts may have different numbers depending on how their models calculate ExpG. That is not important right now: the same theory applies, although the results may differ slightly (I presume).

Based on our results, the total ExpG figure for Suarez’s 116 shots was 13.8 goals. Instead, he found the net a total of 23 times. If we assume that the ExpG figure assigned for each shot closely resembles the goal expectation of that particular shot from an average player, we can calculate the probability of a player actually scoring 0 goals out of 116 similar shots to the ones taken so far by Suarez. That would be given by:

Prob( 0 goals ) = ( 1 – ExpG_1 ) * ( 1 – ExpG_2 ) * …. * ( 1 – ExpG_116 ) = 0.0000000653 = 0.00000653%

where ExpG_i is the goal expectation of the ith shot. Similarly, we could calculate the probability of our average player scoring 116 goals which would be:

Prob (116 goals) = ExpG_1 * ExpG_2 * … * ExpG_116 = 0.000 … (total of 124 zeros!) … 000217
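As a minimal sketch, the two products above can be computed as follows. The list of per-shot values is invented for illustration (it is not Suarez’s actual shot data), and the function names are my own.

```python
import math

def prob_no_goals(xg_values):
    """Probability that every shot is missed: product of (1 - ExpG_i)."""
    return math.prod(1 - p for p in xg_values)

def prob_all_goals(xg_values):
    """Probability that every shot is scored: product of ExpG_i."""
    return math.prod(xg_values)

# Hypothetical per-shot ExpG values, NOT the real data behind the article.
example_shots = [0.03, 0.05, 0.08, 0.12, 0.30, 0.76]

print("P(0 goals)   =", prob_no_goals(example_shots))
print("P(all goals) =", prob_all_goals(example_shots))
```

Both functions assume the shots are independent of each other, which is the same assumption behind the formulas above.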

Obviously, both probabilities are very small. We could sit down and write down algebraic expressions for the probability of the average player scoring 1, 2, 3, …, 114, 115 goals too. [For the statistically-oriented readers, that is the probability distribution of the sum of 116 Bernoulli variables which don’t necessarily have the same success probabilities (given by ExpG_i in each case here)]. Instead of doing that, we can simulate these shots/goals and look at the resulting distribution. Out of the 116 shots Luis Suarez took, the average player would score goals with the following distribution:

[Chart: SuarezDistribution]
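A simulation of this kind could look roughly like the sketch below. It is not the code used for the charts in this article: the number of simulated seasons, the seed and the example shot list are all assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_goals(xg_values, n_sims=10_000, seed=0):
    """Monte Carlo estimate of the goal distribution for independent shots.

    Each simulated run "takes" every shot once: shot i is scored when a
    uniform random number falls below its ExpG value.
    Returns a dict mapping number of goals -> estimated probability.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        goals = sum(rng.random() < p for p in xg_values)
        counts[goals] += 1
    return {goals: n / n_sims for goals, n in sorted(counts.items())}

# Hypothetical shot list (illustrative only, not a real player's shots).
example_shots = [0.04] * 30 + [0.10] * 15 + [0.35] * 4 + [0.76]

distribution = simulate_goals(example_shots)
mean_goals = sum(goals * prob for goals, prob in distribution.items())
print("Total ExpG:", round(sum(example_shots), 2))
print("Simulated mean goals:", round(mean_goals, 2))
```

The simulated mean should land close to the total ExpG, while the full dictionary gives the shape of the distribution around it.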

As designed, the average would be 13.8 goals, but inspecting the whole distribution is much more revealing. The probability of an average player scoring 23 goals or more (i.e. equaling or surpassing Suarez’s performance) is just 0.6% and is highlighted in blue in the chart above. This indicates how exceptional Suarez’s goal scoring performance has been this season.
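The “at least as many goals” probability can also be computed exactly rather than simulated. The sketch below swaps the simulation for a simple shot-by-shot recursion over the goal distribution; it is an alternative to the approach used in this article, and the example shots and tally are invented for illustration.

```python
def goal_distribution(xg_values):
    """Exact probabilities of scoring 0, 1, ..., n goals from independent
    shots with (possibly different) scoring probabilities."""
    dist = [1.0]  # before any shot: 0 goals with certainty
    for p in xg_values:
        updated = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            updated[goals] += prob * (1 - p)   # this shot is missed
            updated[goals + 1] += prob * p     # this shot is scored
        dist = updated
    return dist

def prob_at_least(xg_values, observed_goals):
    """Probability that an average player scores observed_goals or more."""
    return sum(goal_distribution(xg_values)[observed_goals:])

# Hypothetical shots and goal tally (illustrative only).
example_shots = [0.04] * 30 + [0.10] * 15 + [0.35] * 4 + [0.76]
observed = 9
print("P(goals >=", observed, ") =", prob_at_least(example_shots, observed))
```

With enough simulated runs, the simulation-based estimate and this exact figure should agree closely.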

If we repeat this process for Daniel Sturridge and his 63 shots, the goal distribution is as follows:

[Chart: SturridgeDistribution]

The probability of an average player equaling or surpassing Sturridge’s tally of 16 goals given the chances Daniel took is 6.1%. It’s still low, but not as impressive as Suarez’s.

I’ve carried out the same analysis for all players who have scored 9 or more goals (an arbitrarily chosen threshold) in the Premier League so far this season and calculated the probability that an average player would equal or exceed the observed performance (i.e. the number of goals scored) of each player. Note that the penalty goals (or misses) need not be removed from the simulation, as they have been assigned a higher ExpG value which reflects the goal expectation of those shots. The resulting probabilities are shown in increasing order in the following chart:

[Chart: TopStrikersProb]

If we were to use the 5% significance level, only 3 players (Suarez, Hazard and Yaya Toure) have demonstrated statistically significant, above-average goal scoring performances once the nature of their chances has been taken into consideration. They are followed by Sturridge, who narrowly misses the cut. However, the remaining leading goalscorers have not demonstrated any exceptional ability in converting their chances. In fact, players like Giroud and Negredo have been vastly underperforming in terms of the number of goals they actually scored compared to the number they were expected to contribute given the chances they took. In Giroud’s case especially, the 10 goals scored, compared to the 13.6 he was expected to score given his chances, constitute serious underperformance, which suggests that perhaps Arsenal should be looking for reinforcements in that area in the summer.

As a concluding remark, I’d reiterate that this is a method to compare actual versus expected goals scored which does not make any unnecessary distributional assumptions on ExpG. It can therefore accommodate comparisons either in terms of specific players who could have a non-homogeneous shot profile (e.g. including shots from favourable positions, penalty shots or long range attempts) or even be used to evaluate a team’s scoring performance. Finally, it can be applied from a defensive standpoint to evaluate how teams or goalkeepers have prevented shots from being scored.

  • Quincy

    Hi, thanks for a very interesting article.

    I have one comment, but I know almost nothing about stats, so please be patient 🙂

    You write: ” If we assume that the ExpG figure assigned for each shot closely resembles the goal expectation of that particular shot from an average player…”

    Does this mean you have a ExpG figure based on the position the shot was taken from, and that figure is just the average of goals from shots taken from that position? If so, am I correct in saying this analysis shows the likelihood that a player is shooting at better than average?

    • Constantinos Chappas

      Thanks for the comment, Quincy.

      Yes, our model (developed with Colin Trainor) assigns a probability of each shot being scored by an average player. It’s not based on shot location only, but also on some additional factors, like whether the shot was a header or not. So in effect, the distribution charts show the likelihood, as you put it, of an average player scoring 0, 1, 2, etc. goals out of those chances. The final graph turns the whole thing upside down and shows the probability that an average player (i.e. the one depicted in the distributions above) would score at least the number of goals the top strikers did score. This means that an average player would have very little chance of scoring 23 goals like Suarez did if he had the same shots. The same applies in the case of an average player outscoring Yaya Toure or Hazard given their chances.

      I hope the above helps.

      • Quincy

        I see. So the last graph gives the probability of an average player scoring as many goals as, say, Suarez.

        So the conclusion is that Suarez is an average finisher but very lucky, a very good finisher, or some mixture, a bit above average, but a bit lucky too. I suppose this doesn’t discount the possibility that Suarez is below average, but extremely, out-of-this-world lucky! All of this is not quite the same as the conclusion I came to in my comment above, that it’s likely Suarez is an above average finisher.

        Thanks for the response!

        • Constantinos Chappas

          They sometimes say that statistics means never having to say you are certain about something!

          Perhaps there is a positive, but extremely small, probability that Suarez is a below average striker who got extremely lucky and scored the goals he scored. Similarly, as mentioned in the article, there is a positive, but very small probability that he would fail to score in any of his 116 shots, even if he were an average player. But if we were to adopt this viewpoint, then anything could be down to luck.

          In my opinion, what the data say is that, if we were to choose a significance level of 5% or 1%, the null hypothesis that he has converted at an average rate would be rejected in favour of the hypothesis that his conversion rate is above average, based on this year’s data.

          • Quincy

            Thanks for taking the time to respond, I learnt a lot! 🙂

  • Ian Baldwin

    Hi Constantinos,

    Across all of your data, do you have a complete list of strikers who are above average finishers to a statistically significant degree? It would be very interesting to see if there are any dark horses at smaller clubs who are underrated due to the shots they get. Also it’d be good to see which ‘great’ strikers aren’t above average in terms of finishing, but maybe rely on shot volume and other such circumstances to produce the goals.


    • Constantinos Chappas

      I haven’t done this for all strikers. It’s something that I could perhaps try once I get the time. Thanks.

  • Dan

    Hi Constantinos – Thanks for the interesting article.
    I was thinking about ExpG type models when looking at relatively small sample sizes, and wondered if it was possible to put some bounds or expectations on the ExpG numbers themselves (rather than the actual goals). E.g. consider the simplified 1 dimensional case where there are only 3 zones in your ExpG model, A, B & C, with shot ExpG of 10%, 30% and 50% say. Say a player takes 5 shots from zone B with total ExpG of 1.5. Now you could say that all 5 of those shots could have been exactly on the border with zone A and so the true ExpG was closer to 0.5 (or inversely on the border with zone C for ExpG 2.5), so you can say the bounds for the ExpG of those 5 shots were 0.5 – 2.5, and you could also run simulations to say that for 5 random shots within zone B there is a 99% confidence that the ExpG of those shots was between x & y.
    Now if you extended this to your 2d ExpG model and the players shot counts in your various zones you could quantify what the likely possible range of ExpGs was for any specific shot sample.
    Does this make sense, any thoughts?

    • Constantinos Chappas

      If I’m understanding correctly, what you are suggesting is to include the uncertainty around ExpG for each shot. Without any relevant analysis to back this up, my gut feeling is that this would only increase the overall uncertainty, i.e. perhaps flattening the goal distributions (2nd and 3rd chart in the article above), thus making it even less likely for a player to make the (arbitrarily chosen) 5% cut-off point. In other words, it would make it even more difficult to reject the null hypothesis that a player’s conversion rate is no different from that of an average player.

      • Dan

        The purpose of this would be to quantify the probability that the player had better than average chances within the zones rather than that the player converted at a better than average rate, i.e. what is the probability that the distribution of shots within your individual zones had a meaningful impact on the ExpG?

        • Constantinos Chappas

          As the data is at a pretty granular level, it would be a bit difficult to implement. I think I can see your point if the data were grouped in categories/zones.

  • seb

    Great article. I really like the first graph. It seems to reflect some of the ridiculous efforts that he’s scored, the so called “goals out of nothing”. It would be interesting to see other strikers represented in the same way and whether a striker profile could be built up. For example Darren Bent may mainly score very easy chances, but never actually shoots from harder to score-in positions, his ability comes from being in the right place at the right time.

    Also, am I right in thinking that the final graph indicates that if an average player got into the same positions to have the same shots as Giroud, there is a 90% chance that they’d score as many as Giroud?

    • Constantinos Chappas

      Thanks for the comment. You are right regarding the “player profiles” as people tend to only look at average ExpG, and that is partly the point of the article. If for example a player had a number of shots from very difficult positions as well as some from extremely good positions, his ExpG per shot would perhaps be average, and could be the same as someone else who only took shots from average chances. A player profile, somehow detailing all this information may be more useful.

      Regarding the last graph, yes a player of “average” conversion ability has a 90% chance of scoring at least as many goals as Giroud.

  • Errorr

    What about shot creation as a skill? One of the things basketball analytics has learned is that efficiency isn’t as informative alone without the context of the ability to create shooting chances.

    I figure that strikers may be consistently worse along this metric and other players only shoot at the most opportune of times. Given the low negative consequences of a missed shot I would suggest that what the data means is that Suarez is too picky with his spots and may be giving up goals in the long run. If he took low quality chances in position where he would otherwise be dispossessed his scoring would likely go up.

    I guess I am just concerned that people focus too much on efficiency of shots to the detriment of higher volume strategies.

    • Constantinos Chappas

      I am not sure I agree with your suggestion that if Suarez “… took low quality chances in position where he would otherwise be dispossessed his scoring would likely go up.”. Based on Mark Taylor’s post which I link to in the article above, and having run the simulations, adding low quality – low ExpG – shots (i.e. increasing the mass towards the left of the first chart) may have added goals to his total tally due to sheer volume, but the inevitable low conversion rate from those shots would make it even more difficult to pass the simulation-based, statistical significance test mentioned in my article regarding his above-average conversion rate. Essentially, we’d be moving away from the equivalent of Mark’s two shots scenario and closer to his 12 shots example.