calhanoglu

Long-time followers know that I spent much of last summer sifting through young player stats in an attempt to spot potential gems before they became stars. I’m happy with my analysis from that period, but this summer I wanted to upgrade the process. Instead of scouting via stats, guidelines, and common sense, this time I wanted to construct some basic models. I’ve got access to many more seasons’ worth of data now, so I can backtest model output from up to five years ago and see how it fared.

Why would you want to do this?
Scouting time is expensive. Even if you aren’t paying scouts more than minimum wage (or hell, as Michael Calvin details in The Nowhere Men, some scouts barely get expenses), time costs money and so does travel. That’s where statistical scouting comes in.

Scouting via stats is not an effort to reduce scout jobs or eliminate scouting itself – I would never sign a player without having watched plenty of game film on how he plays. However, statistical scouting is an effort to identify targets quickly and efficiently, as well as to try and find ways to suss out future stars before most people even know about them.

So that’s what I’ve been working on. By taking a big database full of Opta stats from the major leagues over the last five seasons, I wanted to see if I could find some sort of predictive cocktails that would shake all the numbers together and spit out future stars.

I’m only going to write about attackers for now because those are the most interesting players to read about and they are also the ones that are easiest to analyse with numbers. I am also developing models that scout other positions, and I’ll be writing about those at some point, but attackers should keep me busy for quite a while. These models have been designed at the theoretical level first, based on outputs the community has discovered are actually important for winning matches.

Hit Rate
One of the big things teams will care about is the hit rate any scouting model has. Daniel Altman wrote about this in detail over at BSports, but in order to properly test any predictive models against the real world, you not only need to figure out your hit rate, but you also need to determine how often the model returns false positives (transfer duds). Another thing to keep in mind is the frequency of targets your model returns. Ideally, you’d like to have a large group of potential stars to choose from instead of simply returning five guys who are guaranteed to be gold every season.

The current hit rate on the tight version of the model output for superstars is around 70%. 15% of the guys it recommends have been duds over the first three years of testing, and then the other 15% have been useful players, but not great. There are other versions of the model that return more potential targets, but it comes at the cost of delivering more young duds as well.
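To make those percentages concrete, here is a minimal sketch of how a backtest like this could be tallied. It is not the model itself (that stays in the box); it only grades a list of past recommendations after the fact. The grade labels, the `summarize` function, and the 20-player example list are all made up for illustration, chosen to mirror the 70/15/15 split described above.

```python
from collections import Counter

# Hypothetical grade labels for backtested recommendations (an assumption,
# not the author's actual grading scheme or data).
GRADES = ("star", "useful", "dud")


def summarize(grades):
    """Return the share of recommendations that fall into each grade."""
    if not grades:
        raise ValueError("no recommendations to grade")
    counts = Counter(grades)
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unexpected grades: {sorted(unknown)}")
    total = len(grades)
    return {grade: counts[grade] / total for grade in GRADES}


# Illustrative backtest of 20 picks: 14 stars, 3 useful players, 3 duds.
example = ["star"] * 14 + ["useful"] * 3 + ["dud"] * 3
rates = summarize(example)

print(
    f"hit rate: {rates['star']:.0%}, "
    f"useful: {rates['useful']:.0%}, "
    f"dud rate: {rates['dud']:.0%}"
)
```

Tracking the dud share separately from the hit rate matters because a looser version of a model can return more names while quietly raising the false positive share.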

If you were a Director of Football, would a model that gave you 30 names each season between 18 and 23 years old that were 70% likely to develop into stars interest you? It should!

“Blah Blah WHATever…”

That’s a perfectly fair reaction. You just have a guy who says he has a model that uncovers future superstars at a very strong rate, but it’s a black box with no additional detail. The reason for this is, once the methodology is out there, everyone will have it. This type of information loses its edge if it’s public info. The various scouting models are also still in development for all positions, which means I’m going to sit on this for a while and see what happens. I can talk about the outputs, but not about the process.

However, what I can do today is produce a subset of hits and misses from the backtesting of the 2010 season, plus the ages those players were at the time, so that you can get an idea of why I think this is fairly cool stuff. Ages listed are how old the player would have been in June of 2010.

2010_Scout_Model_Subset

Obviously a number of those names were in decent teams at the time, but quite a few of them weren’t. Imagine if you could have bought Gareth Bale in the summer of 2010! Or Diego Costa and Marco Reus before they exploded? There are some misses in there as well – guys who disappeared into lesser leagues or never realized their potential – but the hit rate at a young age, when players tend to be much tougher to project overall, is surprisingly good.

Starting tomorrow and throughout the rest of the summer, I will work to profile the names that the model delivered for 2014. In the meantime, however, here are some summer 2013 names that you might find interesting.

Koke, Gotze, Shaqiri, De Bruyne, Grenier, Lamela, Canales, Draxler, Johannes Geis, El Shaarawy, Ljajic, Coutinho, Mattia Destro, Maximilian Beister, Nelson Oliveira, Lass, Romelu Lukaku.

Thanks for listening.

–TK

 

P.S. The guy at the top of this article is one of the top targets for 2014.


  • toshack

    Bring it on Ted!
    I’m ready for tomorrow… 🙂

  • ballsmcgee

    Thought Salvio looked pretty good when I saw him in Europa this year. And that’s after an ACL tear in August of 2013. Benfica also spent a good chunk (€13.5 million) on him and they seemingly never buy anyone for big money. Not sure if that was before or after he triggered the predictive model. I suppose they are being evaluated on a Reus, Bale, Alexis level of stardom, though.

    • tknutso

      Yeah, it’s tricky to pick exactly what is a hit and what is a miss. Chelsea fans frown at Marko Marin, but the guy has produced good numbers every single place he goes, including 3G, 9A the year after he shows up in 2010. I try to be as honest as possible in the evaluation.

  • Mark

    Good stuff, but the inclusion of Abel Hernandez as a hit is a little lenient. Check out his np per 90 goal numbers; one every 220 minutes in Serie B at age 23 is nothing to write home about, and while 12-13 and 11-12 (both in Serie A) are better, they still don’t really justify the constant rumors with clubs like Arsenal. And 4 of his 7 Uruguay goals came in an 8-0 beatdown of Tahiti – he hasn’t otherwise scored for the national team since 2011.

    • tknutso

      That’s fair. I don’t have NPG numbers for Serie B and found out about the Tahiti issue after publication. I would grade him a miss now instead. Basically no change on percentages over the 4 years of back testing though.

      • Mark

        Oh, I agree. This is really cool either way! Keep it up

      • Mark

        Also, a question: will you be posting reports about current young players you think have this star potential? I think those types of articles would be popular, and I’d certainly be reading them.

  • http://cravencottagenewsround.wordpress.com Rich

    So is this a possible area for teams to get talent on the cheap?
    Julian Schieber had big potential but hasn’t quite pushed on. Transfermarkt now has him at <£2m.
    Assuming due diligence doesn't throw up something weird, is this an opportunity for an up-and-coming team without a huge budget to potentially find a star?