Part 1: distributions of goalscoring stats.

In the second instalment of my Distributions series (for book & movie rights, contact me on the address below) I look at central midfielders. Considerable versatility is required in this position, and so I will look at a larger number of stats than usual, six in total: passing accuracy, key passes, tackles, interceptions, and resistance to opponents’ dribbles and to dispossessions. The choice is arbitrary, and linked to the exclusion from the dataset of #10s, i.e. the advanced central playmakers. Perhaps I’ll treat them separately in a future article; for now I didn’t want them to dull the distributions of the defensive stats.

The dataset includes all central midfielders from the five big leagues since 2009/10. Different seasons are treated separately, so a player active since 2009/10 contributed five records to the dataset (the current season is included). To be included, the player had to spend at least 900 minutes on the pitch over the course of the season. As was the case with the previous piece, this one contains little actual analysis, that is, clever things one does with one’s dataset. I realise this makes these pieces a bit dry, but we here at StatsBomb think they are important, in that they are perhaps the first published benchmark for players’ statistical output. Like so:



The vertical line in each histogram marks the 80th percentile, that is the value of the stat beyond which a player outperforms 80% of the field. For each category I have also listed the 10 best player-seasons in the dataset. The reason why I plotted Spain-based players’ interceptions separately is that we suspect that the interception stat is inflated, likely erroneously, for the 2010/11 and (especially) 2011/12 seasons of La Liga.
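As a rough illustration of how such a benchmark could be computed, here is a minimal Python sketch. The records, the player names and the tuple layout are all made up for the example, and the interpolation rule for the percentile is an assumption; it is not the code or data behind the histograms.

```python
from statistics import quantiles

# Hypothetical player-season records: (player, season, minutes, tackles per 90).
# All names and numbers are invented for illustration.
records = [
    ("Player A", "2011/12", 2700, 3.1),
    ("Player B", "2011/12", 850, 4.0),   # dropped below: under 900 minutes
    ("Player C", "2012/13", 1800, 1.2),
    ("Player D", "2012/13", 3000, 2.4),
    ("Player E", "2013/14", 950, 0.8),
    ("Player F", "2013/14", 2200, 1.9),
]

MIN_MINUTES = 900  # inclusion threshold stated in the article


def eligible(rows, min_minutes=MIN_MINUTES):
    """Keep only player-seasons with enough minutes on the pitch."""
    return [r for r in rows if r[2] >= min_minutes]


def percentile_80(values):
    """80th percentile with linear interpolation between data points.

    quantiles(n=5) returns the 20th, 40th, 60th and 80th percentile
    cut points; the last one is the value we want.
    """
    return quantiles(values, n=5, method="inclusive")[-1]


def top_n(rows, n=10):
    """Best player-seasons by the stat (higher is better here)."""
    return sorted(rows, key=lambda r: r[3], reverse=True)[:n]


kept = eligible(records)
cutoff = percentile_80([r[3] for r in kept])
best = top_n(kept, n=3)

print(f"80th percentile: {cutoff:.2f}")
for player, season, _, value in best:
    print(player, season, value)
```

For stats where a lower number is better, such as dispossessions, the sort order and the side of the distribution marked by the line would need to be flipped.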

One question this graph immediately suggested to me was: how strong are the trade-offs between these six facets of midfield play? Unless you are Arturo Vidal, you’re probably not an elite key passer and an elite tackler, and even Vidal doesn’t rank towards the top in, for example, avoiding dispossessions (a rather abject 2.53 per90 in 2011/12). So how close to the perfect central midfielder can you get, and who would you resemble? A quick way is to take the relative performance in all six categories (i.e. what % of the field does worse), and average that. The resulting histogram with the list of top 30 performers is below. Note that I didn’t correct for the Spanish interception bias, so the 2010/11 and 2011/12 La Liga players had a bit of an upper hand. As an Arsenal fan, I’m delighted to see Mikel Arteta, my favourite player, in the second position. As an analyst, I’m compelled to remind you that the choice of the six stats was arbitrary in the first place, and averaging percentiles assumes all six stats are of equal importance.
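The averaging step described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three player-seasons and their numbers are invented, the stat names are hypothetical labels, and treating "dribbled past" and "dispossessed" as lower-is-better is my reading of the two "resistance" stats rather than a confirmed detail of the original calculation.

```python
# Hypothetical per-90 figures for three player-seasons; all values are made up.
players = {
    "Player A": {"pass_pct": 90.0, "key_passes": 1.5, "tackles": 2.0,
                 "interceptions": 2.5, "dribbled_past": 0.5, "dispossessed": 1.0},
    "Player B": {"pass_pct": 85.0, "key_passes": 2.5, "tackles": 1.0,
                 "interceptions": 1.0, "dribbled_past": 1.5, "dispossessed": 2.5},
    "Player C": {"pass_pct": 80.0, "key_passes": 0.5, "tackles": 3.0,
                 "interceptions": 2.0, "dribbled_past": 1.0, "dispossessed": 0.5},
}

# Assumption: for these two stats a smaller number means a better performance.
LOWER_IS_BETTER = {"dribbled_past", "dispossessed"}


def pct_worse(value, field, lower_is_better=False):
    """Percentage of the other player-seasons that do strictly worse."""
    others = len(field) - 1
    if others <= 0:
        return 0.0
    if lower_is_better:
        worse = sum(v > value for v in field)
    else:
        worse = sum(v < value for v in field)
    return 100.0 * worse / others


def composite_scores(table):
    """Unweighted mean of the per-stat percentages for each player-season."""
    stats = list(next(iter(table.values())))
    scores = {}
    for name, row in table.items():
        per_stat = [
            pct_worse(row[s], [r[s] for r in table.values()], s in LOWER_IS_BETTER)
            for s in stats
        ]
        scores[name] = sum(per_stat) / len(per_stat)
    return scores


scores = composite_scores(players)
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 1))
```

The unweighted mean is exactly where the equal-importance assumption enters; replacing it with a weighted mean would be the natural way to relax it.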


Somewhat related: comparing central midfielders.

Data collected by Opta.

  • KC_Gunner

    Very interesting. I also am an Arsenal fan and very much a fan of Mikel Arteta, who I feel is somewhat under-appreciated by AFC fans at large for what he does to improve the team’s overall play. So yes, that is cool to see him up there, but seeing Xabi Alonso multiple times in that top 30 list is somewhat aggravating, given all the rumors about how close Arsenal were to securing his services from Liverpool a few years back, before Wenger allegedly stuck to one of his stubborn valuations and lost out on him. Oh well.

  • Errorr

    This is nice in confirming 2 strong beliefs I hold. 1. Fernandinho is essential to Man City’s 442 and has been their second best player this season (I believe Kompany is 1). 2. Mascherano’s skills are underutilized by Barca but only because his most useful position is occupied by one of the world’s best deep lying midfielders in Busquets (who isn’t the supreme tackler that Mascherano is but he doesn’t need to be to be the key piece of Barca’s defense). I always believed that Barca was at its best when they could use Mascherano to destroy any ball handler near him and allow him to gamble knowing Busquets wouldn’t allow anyone past him into the heart of the defense. It forced play wide which also provided the most space for the wingbacks to attack.

    • Marek Kwiatkowski

      Yes, absence of Busquets on this last list tells you everything you need to know about its value as a general ranking. In general, and contrary to some of my rhetoric, extra normalisation for opportunity is necessary before we can take the absolute numbers of defensive interventions as direct evidence of player quality (this point was made by several people on Twitter before and after publication of this post).

      I agree about Fernandinho, but disagree about Kompany. Do you know that Agüero is currently posting the 5th best NPG+A90 rate in the last 5 years in the big 5?

  • Derek

    this is the 2nd person on this website to assume that lots of tackles/ints/other counting stats=better player. why would that be true and why are each of those six categories weighted equally? there needs to be tons more research done on team context and stat context before stuff like this is worth much. right now it’s like counting errors and assists in baseball, it depends on the position you are in and is not a stat that tells you with much detail about the player at all

    • Marek Kwiatkowski

      I agree with this comment. I tried to make this post a bit more exciting to the reader, and this is why I included a couple of value statements, as well as the final list. I’ll be more careful about such additions in the future. For me, this is a descriptive article; the first part tells you how the six traits are distributed, the second how they trade off.

  • Tuiuan Almeida Veloso

    As a Liverpool fan, loved to see Leiva and Alonso there, two players I always loved and considered elite.

    As a Brazilian who is quite concerned about our chances in the World Cup, I’m quite mad that our manager is too stubborn to even consider putting Lucas in the Starting XI (even though he’s far better than Luiz Gustavo) and completely shocked when I read in the Brazilian Press these days that the chances of Lucas not even being in the 23 are quite high.

    • Jeff

      Note: Gustavo is #8 on passing % (and ranks only behind Xavi for the 12/13 season). That is likely valued more highly by Scolari than the defensive stats that end up weighing in Leiva’s favor on the aggregate histogram, which has four defensive metrics and only two offensive.

      • Jeff

        As someone pretty neutral about the Brazil team, but a fan of Bayern and very thankful for Gustavo’s service there, I have a feeling you would be disappointed if he went out of your lineup.

        • Tuiuan

          Sounds a bit funny, but the only reason the analysts and journalists (who have some access to Scolari) say that Luis Gustavo (and not Fernandinho, Leiva or Hernanes, or another midfielder) plays alongside Paulinho is for purely defensive purposes. And in the games you can clearly see it. He fills the role of the more defensive midfielder who mostly stays back for defensive balance. I think he’s a good player, of course. But we just have better options. Our coach has built an entire reputation on being stubborn and staying with the players he trusts wherever and whenever (heck, Júlio Cesar had to go to MLS to play some minutes before the WC and there isn’t a shade of doubt that he will be the starting goalkeeper, with Scolari even saying when he was at QPR that he was the starter even if he didn’t play at all before the WC). He has a history of success with the Brazilian National Team, but I can’t say I don’t have any trouble with some of his choices.
