Suarez, Aguero and the importance of significance

First, a disclaimer: the use of the following player examples to demonstrate a bit of statistical theory is by no means a criticism or an opinion on their footballing ability. Oh, and I’m actually a Liverpool supporter so this is definitely NOT a dig at Suarez!

Now that the disclaimer is out of the way let’s start: Luis Suarez has so far scored 23 goals in 19 league appearances. Sergio Aguero has scored 15 goals in 15 starting and 2 substitute appearances. Let’s call them a total of 17 appearances for now and we can deal with this later on. Here are these figures in summary:

Based on these numbers Suarez is scoring at a much higher rate than Aguero. But could this simply be randomness instead of anything else?

It’s time for a bit of statistics. Goals, shots, tackles, interceptions and lots of other match statistics can be thought of as random variables that follow a probability distribution. So if you were to collect a zillion matches, a proportion p0 of them would have 0 events occurring (e.g. goals), a proportion p1 would have exactly 1 event, p2 would have exactly 2 events and… well, you get the drift. These probabilities (p0, p1, p2, …) form the probability distribution.
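To make the "zillion matches" idea concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that the goals in each match follow a Poisson distribution with a mean of 1.21 (the rate used for Suarez below). It draws many simulated matches and reports the proportions p0, p1, p2, … that emerge. The sampler is Knuth's simple multiplication method, chosen only because it needs nothing beyond the standard library. The function names are hypothetical.

```python
import math
import random
from collections import Counter

def sample_poisson(rate, rng):
    """Draw one Poisson(rate) value using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    # Keep multiplying uniforms until the running product drops to e^(-rate)
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def empirical_distribution(rate, n_matches, seed=0):
    """Simulate n_matches and return the proportion of matches with 0, 1, 2, ... goals."""
    rng = random.Random(seed)
    counts = Counter(sample_poisson(rate, rng) for _ in range(n_matches))
    return {goals: counts[goals] / n_matches for goals in sorted(counts)}

proportions = empirical_distribution(1.21, 100_000)
for goals in range(4):
    print(goals, round(proportions.get(goals, 0.0), 3))
```

With more simulated matches, the proportions settle ever closer to the theoretical probabilities.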

One distribution that is often used when dealing with the occurrence of events is the Poisson distribution. Given a mean, it tells us the probability of an event happening 0, 1, 2, 3, … times within a particular period. So for example, with an average rate of 1.21 goals per match, Suarez has the following goal probability distribution in a match:

There is about a 30% chance of him not scoring in a randomly selected match, around 36% of scoring exactly once and almost 9% of scoring a hat-trick (actually exactly 3 goals and not more!). Furthermore, if we were to look at the number of goals Suarez was likely to score in 19 matches, again using the average rate of 1.21 goals per match, the probability distribution would now look like the following:

Notice how the probability distribution approaches the bell-shaped Normal distribution as the sample size (i.e. the number of matches) increases. This is a result of what is known in statistics as the Central Limit Theorem. A very important takeaway from the above graph is that there is a lot of uncertainty in the actual number of goals scored. In fact, scoring exactly 23 goals in 19 appearances, as he has done so far in the Premier League, has a probability of just 8.3%.
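The figures above can be reproduced with the Poisson probability formula, P(X = k) = e^(−λ) λ^k / k!. The sketch below assumes, as the article does, a constant rate of 1.21 goals per match. It also assumes that matches are independent, so the total over 19 matches is itself Poisson with mean 19 × 1.21. The function name `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

RATE = 1.21      # Suarez's average goals per match
MATCHES = 19

# Single match: chances of 0, 1 and exactly 3 goals (about 0.30, 0.36 and 0.09)
single_match = {goals: poisson_pmf(goals, RATE) for goals in (0, 1, 3)}
print(single_match)

# 19 matches: assuming independent matches at the same rate,
# total goals are Poisson with mean MATCHES * RATE
p_exactly_23 = poisson_pmf(23, MATCHES * RATE)
print(round(p_exactly_23, 3))
```

The 19-match value comes out at roughly 0.083, the 8.3% quoted above.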

So let’s now turn our attention to Sergio Aguero, too. He scores at an average rate of 0.88 goals per match so if we were to add his goal probability distribution for a single match next to Suarez’s it would look like this:

Since Aguero has a lower scoring rate, his probability mass is shifted towards the lower number of goals compared to Suarez’s. He has more than 41% chance of ending the match without a goal and less than 5% chance of getting exactly 3 goals.
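A side-by-side table of the two single-match distributions takes only a few lines. This is a sketch that reuses a hypothetical `poisson_pmf` helper and the rounded rates from the text (1.21 and 0.88 goals per match).

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

SUAREZ_RATE = 1.21  # goals per match
AGUERO_RATE = 0.88

print("goals  Suarez  Aguero")
for goals in range(5):
    print(f"{goals:>5}  {poisson_pmf(goals, SUAREZ_RATE):.3f}   {poisson_pmf(goals, AGUERO_RATE):.3f}")
```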

Looking at these players over a wider sample, it would be unfair to compare the distribution of goals in Aguero’s 17 appearances with that of Suarez’s 19 appearances, so let’s see what those two distributions would look like (given their respective average scoring rates per match) if both had played 19 matches:

Evidently, even though Suarez is generally expected to score more goals in those 19 matches, it is possible for Aguero to outscore him.
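How likely is it that Aguero outscores Suarez? Here is a sketch that answers this under the same simple assumptions as the graphs: independent Poisson totals over 19 matches with means 19 × 1.21 and 19 × 0.88. It sums the joint probabilities of every scoreline in which Aguero finishes ahead. The cut-off `MAX_GOALS` is an arbitrary choice, large enough that the ignored tail is negligible for these means.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

MATCHES = 19
suarez_mean = MATCHES * 1.21   # expected goals over 19 matches
aguero_mean = MATCHES * 0.88

MAX_GOALS = 100  # probabilities beyond this are negligible for these means
suarez = [poisson_pmf(k, suarez_mean) for k in range(MAX_GOALS + 1)]
aguero = [poisson_pmf(k, aguero_mean) for k in range(MAX_GOALS + 1)]

# Aguero ahead: Aguero scores a goals and Suarez scores s < a
p_aguero_ahead = sum(aguero[a] * suarez[s] for a in range(MAX_GOALS + 1) for s in range(a))
p_level = sum(aguero[k] * suarez[k] for k in range(MAX_GOALS + 1))
p_suarez_ahead = 1 - p_aguero_ahead - p_level

print(f"Aguero ahead: {p_aguero_ahead:.3f}, level: {p_level:.3f}, Suarez ahead: {p_suarez_ahead:.3f}")
```

The probability of Aguero finishing ahead is clearly smaller than Suarez's, but it is far from zero.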

It’s therefore important, before jumping to any conclusions about the relative scoring ability of the two players, to take into consideration the uncertainty demonstrated by the two distributions. To compare these scoring records we can use statistical significance tests, which allow us to decide whether there is a significant difference between scoring rates or whether any observed difference could simply be attributed to randomness.

To carry out a significance test, a null hypothesis is first formulated. In this example, the null hypothesis is that there is no difference between the scoring rates of the two players. A test statistic is then calculated, which in turn results in what is known as a “p-value”: the probability of observing a difference at least as large as the one in the data if the null hypothesis were true. To cut a long story short, and to spare the few readers still following me the statistical theory behind this, we compare the p-value with a small probability, usually 5% or 1%, known as the significance level. If the p-value is smaller than the significance level, then the data provide evidence that leads us to reject the null hypothesis. If the p-value is large, then the test has failed to reject the null hypothesis, which in this particular case means that the two scoring rates could in fact be equal and any observed difference could simply be down to randomness.

Interestingly enough, comparing Poisson means is not as widespread as comparing the means of other distributions, but a widely used test is the conditional test developed by Przyborowski and Wilenski (1940). As I don’t want you to be scared away by the maths, I’ll jump straight to the p-value result! (But those of you who are interested can scroll to the end for the algebraic expression and then come back to continue.)

Applying the conditional test to the data at the start of the article (23 goals in 19 matches against 15 goals in 17 matches) results in a p-value of… 42.8%!

What if we completely ignored Aguero’s 2 substitute appearances and used a scoring rate of 15 goals in 15 appearances? Well, as the sample scoring rate approaches Suarez’s, the p-value in fact increases (p-value = 68.4%).
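The conditional test itself takes only a few lines. The sketch below assumes the standard formulation of the test. Given c = c1 + c2 goals in total, under the null hypothesis of equal rates c1 is binomially distributed with c trials and success probability t1/t. Producing a two-sided p-value by doubling the smaller tail is an assumed convention, although with this data it lands on the 42.8% and 68.4% quoted above. The function name is hypothetical.

```python
from math import comb

def conditional_poisson_test(c1, t1, c2, t2):
    """Two-sided conditional (binomial) test of equal Poisson rates.

    c1, c2: observed counts (e.g. goals); t1, t2: exposures (e.g. appearances).
    Under the null hypothesis, c1 given c = c1 + c2 is Binomial(c, t1 / t).
    The two-sided p-value doubles the smaller tail, capped at 1 (assumed convention).
    """
    c = c1 + c2
    p = t1 / (t1 + t2)
    pmf = [comb(c, k) * p**k * (1 - p) ** (c - k) for k in range(c + 1)]
    upper = sum(pmf[c1:])        # P(X >= c1)
    lower = sum(pmf[: c1 + 1])   # P(X <= c1)
    return min(1.0, 2 * min(upper, lower))

ALPHA = 0.05  # significance level

p_all_apps = conditional_poisson_test(23, 19, 15, 17)     # Aguero: 17 appearances
p_starts_only = conditional_poisson_test(23, 19, 15, 15)  # Aguero: 15 starts only

# roughly 0.43 and 0.68; neither is below ALPHA, so the null is not rejected
for p_value in (p_all_apps, p_starts_only):
    print(round(p_value, 3), "reject" if p_value < ALPHA else "fail to reject")
```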

As these p-values are very large, and well above 5% in either case, these results suggest that there is no significant evidence in the data that Suarez is scoring at a truly higher rate than Aguero. This may surprise a lot of readers, but it demonstrates the effect that uncertainty can have on estimates and highlights the care needed when interpreting differences in sample data. We can replace goals with shots, interceptions, tackles or whatever other metric we decide on, or we can use per90 data rather than unstandardized figures, but exactly the same theory applies.

Given the recent boom in football analytics, it’s therefore imperative to account for uncertainty when publishing results; otherwise, a lot of the findings will not stand up to (statistical) scrutiny. What’s more, by ignoring randomness and sample-size effects, the analytics community risks being seen as lacking credibility. That would be highly undesirable given its constant push to be acknowledged as a worthy contributor in the football world.

So, without further ado, which players is Suarez’s scoring rate (statistically) significantly better than? I’ve compared his rate against all players in the Premier League who have scored at least 7 goals. As I didn’t have their actual minutes played, I’ve used their number of appearances, either as a starter or as a substitute. After all, this is not really an analysis piece but rather an illustration of a point. The following table shows the p-value of the conditional test of each player’s scoring rate against Suarez’s:

The p-values highlighted in red show a total of 10 players whose scoring rate is not significantly different from Suarez’s at the 1% level of significance. So there you have it, some food for thought: perhaps surprisingly, from a statistical perspective Luis Suarez’s scoring rate is not significantly better than Sergio Aguero’s or even Danny Welbeck’s!

[Finally, as promised for any interested parties, the p-value of this test is given by:

where c1, c2 are the observed numbers of events (in the players’ example above, the number of goals) over t1, t2 time periods (i.e. the number of appearances). c is defined as the sum of c1 and c2, while t is the sum of t1 and t2.]
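Using only the symbols defined above, one way to write the two-sided conditional test is shown below. It assumes the convention of doubling the smaller binomial tail. That convention reproduces the 42.8% and 68.4% p-values quoted earlier, but it is offered as a sketch rather than as the exact expression from the original paper. Under the null hypothesis of equal rates, c1 given c follows a binomial distribution with success probability t1/t.

```latex
% Under H0 (equal rates), X = c_1 given c is Binomial(c, t_1 / t)
P(X = k) = \binom{c}{k} \left(\frac{t_1}{t}\right)^{k} \left(\frac{t_2}{t}\right)^{c-k}, \qquad k = 0, 1, \dots, c

% Two-sided p-value: double the smaller tail, capped at 1
p\text{-value} = \min\left\{1,\; 2\min\left[\sum_{k=c_1}^{c} P(X = k),\; \sum_{k=0}^{c_1} P(X = k)\right]\right\}
```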

Reference:

Przyborowski, J., Wilenski, H. (1940) Homogeneity of results in testing samples from Poisson series, Biometrika 31, 313–323.

To be honest, it’s too complicated for me:)

• http://statsbomb thez9zon

This is the first post I haven’t enjoyed, way too complicated.

• Hussain Dzan

Great work, Constantinos!

Love the piece! Could I trouble you for per/90 min analysis?

Thanks from Kuala Lumpur.

• Constantinos Chappas

Thanks. Unfortunately I don’t have the necessary per90 minute data for the analysis. The idea behind the article, however, was not the actual analytical results but rather to make the point that uncertainty should be considered when doing such analysis.

• Ben

Loved this piece, Constantinos.

I was just wondering how you generated the probability distributions. Did you assume their current GPG was the mean value, or did you use a more sophisticated expG per shot calculation?

Thanks

• Constantinos Chappas

Thanks for the kind comment, Ben.

To generate the distribution I simply assumed that their current GPG was the mean value. Whether that is the right number to use is of course debatable, as is my treating starting and substitute appearances the same, or my use of appearances rather than standardized per90 figures. But the point I was trying to make was not about accurate predictions but rather about understanding uncertainty and sample-size effects.

Perhaps once quantities like ExpG values, shot locations, opponents’ strength, etc. have been taken into account, the goal probability distributions for each player will differ. But what will remain is the uncertainty bit. And that was the point I was trying to highlight.

• http://highperformancestats.blogspot.com hpstats

Working out a population (actual) mean from a sample (observed) value is another statistical problem that I think needs addressing more in sports analytics. If a player scores a hat-trick in the first game of the season, you’d be wary of saying that he will score 3 goals a game going forward. The latest post on my blog discusses finding actual values from samples. In turn, my post was written after seeing this excellent NHL post: http://www.sbnation.com/nhl/2013/12/16/5215112/toronto-maple-leafs-stats-shooting-talent

• Constantinos Chappas

Two interesting articles which I will definitely have a look at. Thanks.

I wholeheartedly agree on the importance of choosing a reasonable estimate for population parameters. And I wouldn’t label working out estimates from the very first match of the season as “reasonable”!

• Simon Cavalini

Great work Constantinos

Based on your twitter conversations (which I read with great interest btw) and your math behind the ExpG(2) models I was kind of expecting some articles from a more statistical point of view and I like it.

Maybe if you take a 5% alpha in the last table, your point gets illustrated better, but that’s just a little note on the side. I don’t have the time to do it right now, but it could be nice to calculate the range of scoring rates within which both Suarez and Agüero could lie if they had the same scoring rate, given their current output. As the season (/years) continues one can see how this develops.

Anyway keep up the good work.

Cheers

• Constantinos Chappas

I, too, would have liked to contribute more articles like this but (a) time is often extremely limited and (b) I’m not sure how many people would actually be interested in this kind of piece. As a result, sometimes Twitter becomes a good compromise!

Using alpha = 5% would also be an option, as you suggest. As for the confidence intervals, that, I guess, could also be done, while considering (a) and (b) above… However, as more years of data are considered so that sample size increases, we’d be faced with a more general and harder (?) to prove assumption of whether scoring rates remain constant through time. But that’s another can of worms!

• John

Nice, expected a lot worse after the warnings regarding difficulty on twitter.

Would the conclusion really be that surprising to a lot of people as you say it is? Even without much knowledge about statistics it is understandable that there’s a lot of uncertainty regarding a variable as rare as goals in a sample of only 19 matches isn’t it?

• Constantinos Chappas

Perhaps you are right John, as you say regarding the infrequent nature of goals. At the same time, similar considerations should also be given to other metrics which are more frequent but are governed by uncertainty too, such as shots, tackles etc. Thanks for the comment.

• Seb

Very interesting article. I’m very interested in football analytics and love reading these articles, but I guess this article shows how difficult the game is to analyse and draw *meaningful* conclusions from.

My statistical knowledge is limited, and I read the articles looking for statistical insight into player styles in the hope that it informs my knowledge, interpretation and indeed enjoyment of the game. The articles from the community definitely achieve that, but I’m wondering just how important or achievable, statistical significance is in football?

I realise it’s not necessarily the point of this article, but if something as clear-cut as goals and appearances can’t produce true statistical significance between Suarez and Yaya Toure (players in two different positions!), then what hope do other metrics and statistical tests have? Football is a low-scoring game of very fine margins that exhibits a lot of variation/randomness between 22 different individuals. I would expect the majority of metrics to produce data where only the most obvious differences are statistically significant. The whole point is to help us tease out the finer differences between players (not the obvious ones!) and teams where it isn’t obvious, no?

• bozz

My head hurts after reading this but it’s a good hurt. Great work.

• Jake

This is very clear work and very important to anyone who works with and/or reads about statistics.

I think it’s also important to note that we cannot conclude Suarez is a superior scorer based solely on this season of data. We often have more information that we can use to construct a reasonable prior for each player.

• Nikhil

Great article. Where were you during my undergrad years?! 🙂

There is definitely a dearth of this type of understanding of the numbers in prevailing debates on efficacy of players.

• Constantinos Chappas

Thanks 🙂 I was probably doing my undergraduate too!

• Hussain Dzan

Mr. Chappas,