Statistical Scouting Young Super Stars
Long-time followers know that I spent much of last summer sifting through young player stats in an attempt to spot potential gems before they became stars. I’m happy with my analysis from that period, but this summer I wanted to upgrade the process. Instead of scouting via stats, guidelines, and common sense, this time I wanted to construct some basic models. I’ve got access to many more seasons’ worth of data now, so I can backtest model output from up to five years ago and see how it fared.
Why would you want to do this?
Scouting time is expensive. Even if you aren’t paying scouts more than minimum wage (or hell, as Michael Calvin details in The Nowhere Men, some scouts barely get expenses), time costs money and so does travel. That’s where statistical scouting comes in.
Scouting via stats is not an effort to reduce scout jobs or eliminate scouting itself – I would never sign a player without having watched plenty of game film on how he plays. However, statistical scouting is an effort to identify targets quickly and efficiently, as well as to try and find ways to suss out future stars before most people even know about them.
So that’s what I’ve been working on. By taking a big database full of Opta stats from the major leagues over the last five seasons, I wanted to see if I could find some sort of predictive cocktails that would shake all the numbers together and spit out future stars.
I’m only going to write about attackers for now because those are the most interesting players to read about and they are also the ones that are easiest to analyse with numbers. I am also developing models that scout other positions, and I’ll be writing about those at some point, but attackers should keep me busy for quite a while. These models have been designed at the theoretical level first, based on outputs the community has discovered are actually important for winning matches.
One of the big things teams will care about is the hit rate any scouting model has. Daniel Altman wrote about this in detail over at BSports, but in order to properly test any predictive models against the real world, you not only need to figure out your hit rate, but you also need to determine how often the model returns false positives (transfer duds). Another thing to keep in mind is the frequency of targets your model returns. Ideally, you’d like to have a large group of potential stars to choose from instead of simply returning five guys who are guaranteed to be gold every season.
The current hit rate on the tight version of the model output for superstars is around 70%. Around 15% of the guys it recommends have been duds over the first three years of testing, and the other 15% have been useful players, but not great. There are other versions of the model that return more potential targets, but that comes at the cost of delivering more young duds as well.
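For anyone who wants to run the same kind of scorecard on their own backtests, here is a minimal sketch of the bookkeeping. It is not the model itself, only the tallying of outcomes afterwards. The function name `outcome_rates`, the three labels, and the 20-player sample are all hypothetical, and the sample is constructed purely to mirror the rates quoted above rather than taken from real results.

```python
from collections import Counter

LABELS = ("star", "useful", "dud")  # hypothetical outcome labels


def outcome_rates(outcomes):
    """Return the share of each outcome label among a model's recommendations."""
    if not outcomes:
        raise ValueError("need at least one recommended player")
    counts = Counter(outcomes)
    total = len(outcomes)
    # Counter returns 0 for labels that never occur, so every label gets a rate.
    return {label: counts[label] / total for label in LABELS}


# Made-up backtest of 20 recommendations, built to mirror the quoted rates.
backtest = ["star"] * 14 + ["useful"] * 3 + ["dud"] * 3
rates = outcome_rates(backtest)

for label, share in rates.items():
    print(f"{label}: {share:.0%}")
```

The "star" share is the hit rate and the "dud" share is the false positive rate. The length of the list is the third thing worth tracking, since a looser version of a model would normally produce a longer list with a lower star share.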
If you were a Director of Football, would a model that gave you 30 names each season, all between 18 and 23 years old and each around 70% likely to develop into a star, interest you? It should!
“Blah Blah WHATever…”
That’s a perfectly fair reaction. You just have a guy who says he has a model that uncovers future superstars at a very strong rate, but it’s a black box with no additional detail. The reason is that once the methodology is out there, everyone will have it. This type of information loses its edge if it’s public info. The various scouting models are also still in development for all positions, which means I’m going to sit on this for a while and see what happens. I can talk about the outputs, but not about the process.
However, what I can do today is produce a subset of hits and misses from the backtesting of the 2010 season, plus the ages of those players at the time, so that you can get an idea of why I think this is fairly cool stuff. Ages listed are how old the player would have been in June of 2010.
Obviously a number of those names were in decent teams at the time, but quite a few of them weren’t. Imagine if you could have bought Gareth Bale in the summer of 2010! Or Diego Costa and Marco Reus before they exploded? There are some misses in there as well – guys who disappeared into lesser leagues or never realized their potential – but the hit rate at a young age, when players tend to be much tougher to project overall, is surprisingly good.
Starting tomorrow and throughout the rest of the summer, I will work to profile the names that the model delivered for 2014. In the meantime, however, here are some summer 2013 names that you might find interesting.
Koke, Gotze, Shaqiri, De Bruyne, Grenier, Lamela, Canales, Draxler, Johannes Geis, El Shaarawy, Ljajic, Coutinho, Mattia Destro, Maximilian Beister, Nelson Oliveira, Lass, Romelu Lukaku.
Thanks for listening.
P.S. The guy at the top of this article is one of the top targets for 2014.