Yesterday I wrote a piece about Max Kruse’s transfer, which to me encompasses the entire reason why player analytics are important. In that piece, I was harsh on basically every major team in football, mostly for comic effect. I still find it a bit ridiculous that Kruse was bought so cheaply when the statistical case is obvious (well done, Gladbach), but mostly I wanted to show that when people like Arsene Wenger say there are no more values to be found on the market, they are either lying to you or are wrong.
On the other hand, some people are likely to interpret that piece by itself as saying I think stats are the only way to do things. In fact, my actual belief is that using player stats alone to evaluate players in any sport is just stupid. The list of things stats cannot tell you is vast, but here are a few of the important ones off the top of my head:
- How fast a player is. Aggregate stats have no idea unless you have access to super-schmancy (and expensive) Prozone data, at which point you actually can see how fast they sprint in game conditions.
- How quick a player is.
- How strong a player is.
- What the manager’s instructions were for that player.
- Whether a player works hard or hardly works (though you can tell how much a player actually runs during a match with the aforementioned superdata).
- How smart a player is.
- Whether most of a player’s goals come via deflection or luck.
- Whether a player has a bad first touch (the kiss of death in the Premier League and Germany).
The list could go on and on, but you get the point – doing player recruiting by stats alone would be silly and wrong. I am officially a stats guy – I’ve got my membership card and everything – yet this is what I believe. However, doing recruiting just based on scouting is even more wrong, and probably a lot more unreliable and expensive.
All of this is based on the following belief: The worst thing a team can do is sign a dud player for a lot of money.
Doing this once is painful. Doing this multiple times can lead to poor performance, relegation, and bankruptcy. Even for potentially great clubs, it can lead to bloated wage bills and years of mid-table performance while you wait for the contracts of underperforming, unsellable players to expire.
The problem with scouting alone is that something like 1500 players appeared in the big 5 leagues in Europe this season. Combine that with 1826 matches per year (again, just in the big 5: four leagues of 380 matches plus 306 in the Bundesliga), and your chances of finding anything useful by watching a particular 90-minute match are fairly remote. On the other hand, using statistics allows you to cut your player pool down to only the guys that did not suck. You can also filter for any number of additional factors important to your team, including age, passing success rate, shots on target percentage, etc.
It is expensive to put a football expert in a room or stadium and make them watch football games for hours and hours and hours. It is far less expensive to allow a data geek with a computer to pre-select most of the guys that seem important and to just watch the players that matter. To get the best results, you absolutely need to do both.
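For the data geeks in the audience, here is a rough sketch of what that pre-selection step might look like in Python. Everything in it is an assumption made for illustration: the `PlayerSeason` fields, the `shortlist` function, the threshold values, and the placeholder players are all invented, and none of it reflects real data or any particular club's process.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    """One player's season totals. Field names are hypothetical."""
    name: str
    age: int
    minutes: int
    pass_success_pct: float
    shots: int
    shots_on_target: int

    @property
    def shots_on_target_pct(self) -> float:
        # Guard against players who never took a shot.
        if self.shots == 0:
            return 0.0
        return 100.0 * self.shots_on_target / self.shots


def shortlist(players, max_age=25, min_minutes=900,
              min_pass_pct=75.0, min_sot_pct=40.0):
    """Keep players who clear every threshold, best shot accuracy first.

    The default thresholds are arbitrary examples, not recommendations.
    """
    keep = [
        p for p in players
        if p.age <= max_age
        and p.minutes >= min_minutes
        and p.pass_success_pct >= min_pass_pct
        and p.shots_on_target_pct >= min_sot_pct
    ]
    return sorted(keep, key=lambda p: p.shots_on_target_pct, reverse=True)


# Placeholder pool with made-up numbers, purely for illustration.
POOL = [
    PlayerSeason("Player A", 23, 2400, 81.5, 60, 30),
    PlayerSeason("Player B", 31, 2700, 84.0, 70, 35),  # over the age cut-off
    PlayerSeason("Player C", 21, 400, 79.0, 10, 6),    # too few minutes
    PlayerSeason("Player D", 24, 1900, 77.0, 50, 21),
    PlayerSeason("Player E", 22, 2100, 68.0, 40, 20),  # passing too loose
]

if __name__ == "__main__":
    for player in shortlist(POOL):
        print(player.name, round(player.shots_on_target_pct, 1))
```

The filter is only there to shrink the list. Whoever survives it still needs to be watched by someone who knows what they are looking at.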
There’s a chapter in Nate Silver’s book The Signal and the Noise that talks about the PECOTA system he developed to analyse baseball players and project their future careers based on massive amounts of data. At the end of the chapter, Silver looks at applying PECOTA to minor league prospects vs. what the scouts at Baseball America think and he finds that, while PECOTA itself is pretty good, the scouts actually beat him. The reason for this is that modern scouts in baseball use both analytical models and their knowledge of scouting to project how players will do, while the computer just has data. The combined knowledge of the two disciplines is superior.
Anyway, as shown by yesterday’s Kruse piece, I don’t think most teams are adequately using data in player scouting, or what they are using isn’t flagging the right players quickly enough. One of the things I want to do this summer is examine which stats actually seem to matter, versus which ones are simply muddled or regularly throw up false positives. I only have two years of detailed data to use, and that is only on the big 5 leagues (gathered from Opta and WhoScored). I would love to be able to analyse the Czech league, Russian PL, Brazil and Eredivisie, but for now I just have to work with the data at hand.
Because transfer season is now upon us, and the progression from rumor to new signing happens in a flash, I’m going to post this now – a rough cut of statistically interesting offensive players (for various reasons) to examine and write about this summer. This way it won’t look like I am writing about guys after the fact, and it gives me time to do the analysis correctly as opposed to rushing it out and hoping it’s good enough. I’m also planning to try to unearth defensive gems, but that will come later in the summer, as figuring out the key performance indicators seems more complicated.
Wissam Ben Yedder