(All data for the models is powered by Opta.)

As many of you know, I’ve been working on a statistical scouting model. The theory behind it is that by using a mixture of key performance indicators, you can unearth players that teams should be interested in watching and potentially signing, while making scouting more efficient and saving time and money in the process.

The background information for the attacking player system can be found here, and a data dump of the 2014 output can be found here.

Now I think the attacking player scout is really quite good. Backtested over four seasons, it seems to turn out near-superstar players about 70% of the time, and unfortunate duds at about a 15% rate. This falls in line with the fact that we understand attacking output fairly well, so it’s easy to pick those players up with stats.

But what about midfielders?

I can tell you for a fact, scouting midfielders via stats is much harder. Thus far it’s been hard enough that some of my more skeptical friends don’t think this type of scouting is possible. However, this is the type of feedback that I tend to use as encouragement to persevere, so that’s what I did. This model is only a prototype right now, but early results are fairly promising.

Here is the list of players the model flagged after the 2010 season. The midfield scout is set to pick up players up to a year older than the attacking scout, mostly because midfielders have a longer age curve than attackers.

CMDM_2010

At first glance this looks like a mixed bag, but I think the output is really promising. Regardless of what you think of Alex Song now, Alex Song in 2010 was one of the most promising young mids on the planet, and eventually moved to Barcelona. Also present are Ramsey, Lucas, Mikel, Mark Noble (underrated), and uh… Denilson (can’t win ‘em all). Three Leverkusen guys show up from Germany, one of whom is now considered one of the best all-around midfielders in the world, and another of whom is merely one of the best in Germany (Gonzalo Castro). Banega and Busquets are the two that show up from Spain, while you get Matuidi, Cigarini, and Marchisio out of France and Italy.

It’s an imperfect list, but it’s also a good start.

The other backtested years actually seem to improve on the overall results. In the big 5 leagues, a little over 50% of the names this model picks up become stars. About 65% are at least good or very good, while the outright failure rate is around 15%. Again, this is taking output of younger players and saying “we think these guys either already are or will become very good.” The failure rate on transfers as a whole is 55-60%, so a prototype that picks out stars at close to that rate, with a failure rate of only 15%, is very strong.
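For anyone who wants to keep score on the public picks, here is a minimal sketch of how hit rates like these could be tallied once each flagged player has been given an outcome label. The label names, the `outcome_rates` and `at_least_good` helpers, and the assumption that "at least good" includes the stars are all hypothetical and not taken from the model itself. The toy input is made up purely to show the arithmetic and is not real backtest data.

```python
from collections import Counter

# Hypothetical outcome buckets; the post does not define formal categories.
OUTCOMES = ("star", "good", "average", "failure")


def outcome_rates(labels):
    """Return the share of flagged players that fall in each outcome bucket."""
    if not labels:
        raise ValueError("need at least one labelled player")
    counts = Counter(labels)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(labels)
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}


def at_least_good(rates):
    """Share of players rated good or better (assumes stars count as good)."""
    return rates["star"] + rates["good"]


# Toy input, invented only to illustrate the calculation.
toy_labels = ["star"] * 10 + ["good"] * 3 + ["average"] * 4 + ["failure"] * 3
rates = outcome_rates(toy_labels)
print(f"stars: {rates['star']:.0%}")
print(f"good or better: {at_least_good(rates):.0%}")
print(f"failures: {rates['failure']:.0%}")
```

The labelling itself is the subjective part. The code only counts, so the same player list scored by two different people could give noticeably different rates.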

Much like with the attacking player model, with the midfielders I am going to be open with the players the model picked up for the 2013-14 season. I definitely won’t have time to write about them, but if you are interested and want to do so, feel free. Making these public also provides a testable record to look back on over the next few seasons to see who developed into good players and who didn’t make it.

I’m also including Eredivisie picks here with one caveat. I only have one season of data for the Eredivisie, so this isn’t backtested at all. My hope is that it still works with some minor tweaks, but without data to go by, it is only a hope and not something backed up by data analysis (right now).

EPL
EPL_CMDM

Bundesliga
Bundes_CMDM

La Liga
Laliga_CMDM

Serie A
SerieA_CMDM

Ligue 1
Ligue1_CMDM

Eredivisie
Erediv_CMDM

Anyway, I hope you guys are enjoying this series. You’ll see radars of these players gradually appear on my Twitter timeline over the coming weeks. I have to admit, despite watching a ton of football, I know nothing about the vast majority of them. If you have thoughts or comments, feel free to leave them here, or hit me up on Twitter.

If you work for a club and are interested in discussing the statistical scouting models I have been developing, feel free to send me an email at mixedknuts at gmail.

Cheers,

–TK


  • Daniel Andersson

    So Toni Kroos might become very good, huh?

    Ok so maybe this works ok, but scouting defenders by statistics is definitely impossible! 🙂 Just kidding

    Seriously though, looks intriguing and will be very interesting to follow up. Is there any ranking (or p-values of some sort) between them? Should also be a sanity check that those players that already are established as top players get a high ranking/statistical significance.

    Cheers

  • Enrico

    Mustafi is a defender, and there is only one Jorginho in Serie A.

    • tknutso

      A) Defenders sometimes show up in the defensive mid model.
      B) I just go by what the database tells me re: names.

  • Errorr

    Always been a big fan of Onazi and he had a pretty good showing at the WC.

    Jorginho, Xhaka, McCarthy, and Taider are all in that solid-potential midfielder group, but I don’t have any sense of how to compare them or which should be rated better, although I think Taider might be the one that was most impressive when I’ve seen him.

  • craig

    Do you have any weighting system to account for differences between leagues? For example, the German league produces far higher dribbling numbers than the other major leagues. Scoring in the Eredivisie isn’t nearly as difficult as scoring in England… in fact most Eredivisie attacking stats are easier to accrue. I thought of this looking at your attacking scout and the very high number of Eredivisie players it returned.

    Maybe a league based multiplier (different for each stat) like the Golden Boot uses?

    • tknutso

      There isn’t a scientific way to adjust the numbers between the leagues right now. One reason Erediv players show up in higher numbers is because many teams in that league have an explicit emphasis on playing young players. Also Chelsea have stashed almost half their top talent there and MANY show up in the list.

      However, I think as a team one thing you do is take all of this into account when buying. You have a higher risk buying guys from France or Holland, so you make a point of paying considerably lower prices. Art, not science unless we get a LOT more data

  • seb

    Can you explain how you decide whether a player is good value or not? While it’s not something you really talk about here specifically, it is something that is very important to the overall scouting process.

    I’m of the opinion that, ultimately, a player’s contract and valuation is more or less meaningless to us, the general public. The precise details of the contract and even the exact sum of a transfer fee are usually unclear, let alone the valuation of a player’s impact on a club’s finances after his signing (e.g. the apparent astronomical numbers of shirt sales for the big name transfers at Real Madrid). Surely the only people to know the true value of a signing are those very high up at a club?

    I haven’t worded that very well, but hopefully you know what I mean…
