Part 1: distributions of goalscoring stats.
In the second instalment of my Distributions series (for book & movie rights, contact me on the address below) I look at central midfielders. Considerable versatility is required in this position, so I will look at a larger number of stats than usual, six in total: passing accuracy, key passes, tackles, interceptions, and resistance to opponents' dribbles and to dispossessions. The choice is arbitrary, and linked to the exclusion from the dataset of #10s, i.e. the advanced central playmakers. Perhaps I'll treat them separately in a future article; for now I didn't want them to dull the distributions of the defensive stats.
The dataset includes all central midfielders from the five big leagues since 2009/10. Different seasons are treated separately, so a player active since 2009/10 contributed five records to the dataset (the current season is included). To be included, the player had to spend at least 900 minutes on the pitch over the course of the season. As was the case with the previous piece, this one contains little actual analysis, that is, the clever things one does with one's dataset. I realise this makes these pieces a bit dry, but we here at StatsBomb think they are important, in that they are perhaps the first published benchmark for players' statistical output. Like so:
The vertical line in each histogram marks the 80th percentile, that is, the value of the stat beyond which a player outperforms 80% of the field. For each category I have also listed the 10 best player-seasons in the dataset. I plotted Spain-based players' interceptions separately because we suspect that the interception stat is inflated, likely erroneously, for the 2010/11 and (especially) 2011/12 seasons of La Liga.
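For anyone who wants to build the same kind of benchmark on their own data, here is a minimal Python sketch of the three steps described so far: apply the 900-minute filter, find the 80th percentile of a stat, and list the best player-seasons. The record layout, the function names and the sample numbers are all made up for illustration, and the linear-interpolation rule for the percentile is my assumption; it is not necessarily how the plots in this piece were produced.

```python
from typing import NamedTuple


class PlayerSeason(NamedTuple):
    # Hypothetical record layout: one row per player per season.
    player: str
    season: str
    minutes: int
    tackles_p90: float


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, by linear interpolation.

    The interpolation rule is an assumption; other conventions exist.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def benchmark(records, min_minutes=900, top_n=10):
    """Return (80th-percentile cutoff, top_n player-seasons) for tackles per 90."""
    # Keep only player-seasons with enough time on the pitch.
    eligible = [r for r in records if r.minutes >= min_minutes]
    cutoff = percentile([r.tackles_p90 for r in eligible], 80)
    best = sorted(eligible, key=lambda r: r.tackles_p90, reverse=True)[:top_n]
    return cutoff, best


if __name__ == "__main__":
    # Invented numbers, purely to show the call.
    demo = [
        PlayerSeason("Player A", "2012/13", 2700, 3.1),
        PlayerSeason("Player B", "2012/13", 1500, 2.2),
        PlayerSeason("Player C", "2012/13", 400, 5.0),  # dropped: under 900 minutes
    ]
    cutoff, best = benchmark(demo, top_n=2)
    print(cutoff, [r.player for r in best])
```

For a stat where lower is better (being dribbled past, being dispossessed) you would sort in ascending order and mark the 20th percentile instead.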
One question this graph immediately suggested to me was: how strong are the trade-offs between these six facets of midfield play? Unless you are Arturo Vidal, you're probably not both an elite key passer and an elite tackler, and even Vidal doesn't rank towards the top in, for example, avoiding dispossessions (a rather abject 2.53 per 90 in 2011/12). So how close to the perfect central midfielder can you get, and who would you resemble? A quick way to find out is to take each player's relative performance in all six categories (i.e. what % of the field does worse), and average that. The resulting histogram with the list of top 30 performers is below. Note that I didn't correct for the Spanish interception bias, so the 2010/11 and 2011/12 La Liga players had a bit of an upper hand. As an Arsenal fan, I'm delighted to see Mikel Arteta, my favourite player, in second position. As an analyst, I'm compelled to remind you that the choice of the six stats was arbitrary in the first place, and that averaging percentiles assumes all six stats are of equal importance.
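The averaging itself is simple enough to sketch in a few lines of Python. For each stat, work out what percentage of the field does worse than a given player, then take the plain mean across stats. Everything named here (the functions, the stat keys, the direction flags) is hypothetical, and two details are my assumptions rather than anything stated above: ties count as "not worse", and for stats where a lower number is better, "worse" means a higher value.

```python
def pct_worse(values, higher_is_better=True):
    """For each value, the percentage of the field that does strictly worse."""
    n = len(values)
    scores = []
    for v in values:
        if higher_is_better:
            worse = sum(1 for other in values if other < v)
        else:
            # Lower is better (e.g. dispossessions): a higher value is worse.
            worse = sum(1 for other in values if other > v)
        scores.append(100 * worse / n)
    return scores


def composite_score(table, directions):
    """Unweighted mean of per-stat percentile scores, one result per player.

    table:      stat name -> list of values, same player order in every list
    directions: stat name -> True if higher is better, False if lower is better
    """
    per_stat = [pct_worse(table[stat], directions[stat]) for stat in directions]
    # zip(*per_stat) regroups the scores player by player.
    return [sum(player) / len(player) for player in zip(*per_stat)]


if __name__ == "__main__":
    # Invented numbers for four imaginary players and two of the six stats.
    table = {
        "key_passes": [0.5, 1.0, 2.0, 1.5],
        "dispossessed": [2.5, 1.0, 0.5, 1.5],
    }
    directions = {"key_passes": True, "dispossessed": False}
    print(composite_score(table, directions))
```

The equal weighting is visible in the last line of `composite_score`: swapping the plain mean for a weighted one is where a view on the relative importance of the six stats would go.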
Somewhat related: comparing central midfielders.
Data collected by Opta.