Football analytics has a learning curve. That's great, because learning is a fun, though occasionally painful, process. This summer I did a review of my past work, and there's some cool stuff in there from the early days along with some really boneheaded mistakes. It doesn't matter how smart you are - your work is not going to be perfect when it comes to something new. The trick is simply to get over it and do better next time.
Today, I wanted to talk a little more about what I learned regarding player evaluation while going from zero knowledge in 2013 to running worldwide recruitment for two clubs in 2015. As part of that, I'll introduce the new attacker radars in print for the first time, and I'll talk about three of the most famous players in the world: Neymar, Eden Hazard, and... Andros Townsend?!?
One of the first things you do when looking at a new data set is immediately boil it down to the important stuff and focus on that:
What is correlated with [important stuff]?
What causes [important stuff] to happen?
In football, we care about goals. In fact, for some pundits, that's all they care about. The only number that matters is the score.
Imagine a classroom of ten-year-olds talking through the data.
Alright children, today we are going to talk about football. Match of the Day and legendary England striker Alan Shearer said we care about goals more than anything else.
So the first thing we have to ask is, what causes goals?
"Shots, shots cause goals!"
Excellent, Timmy. You're too young to remember, but Alan scored an awful lot of goals back in the day.
Now if we take a step back and say we care about "scoring", which is actually a superset of goals, what else might we care about?
"Assists! Assists are passes that created a goal. They should count too."
Great. Now we have goals and assists. And let's find one more element to look at here - what do exciting players do a lot of when they attack?
"They uh... elbow people in the head?"
I know you like Diego Costa, David, but that wasn't quite what I was going for.
Outstanding, Samantha. So let's see if shots, assists, and dribbling are a great start to finding players who score more goals.
It's a bit forced, but this is literally what most people do when they start analysing football, which is great, because it's an excellent, logical process. There's one missing step in here going from assists to key passes, which is the functional equivalent of going from goals to shots, but that's it.
Want to find interesting attackers? Look at shots, key passes, and successful dribbles. Do this and good players start to magically show up at your doorstep.
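A minimal sketch of that first pass, assuming you have season totals and minutes played for each player and that you scale everything to per-90-minute rates (the scaling is my assumption here, not something spelled out above). The `SeasonTotals` class, the field names, and the numbers are all invented for illustration and are not real player data.

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """Hypothetical season summary for one player."""
    name: str
    minutes: int
    shots: int
    key_passes: int
    successful_dribbles: int


def per90(count: int, minutes: int) -> float:
    """Scale a raw season count to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90.0 / minutes


def attacking_profile(p: SeasonTotals) -> dict[str, float]:
    """Return the three driver stats as per-90 rates."""
    return {
        "shots_p90": per90(p.shots, p.minutes),
        "key_passes_p90": per90(p.key_passes, p.minutes),
        "dribbles_p90": per90(p.successful_dribbles, p.minutes),
    }


# Invented totals purely for illustration (1800 minutes = 20 full matches).
example = SeasonTotals("Player A", minutes=1800, shots=60,
                       key_passes=40, successful_dribbles=80)
profile = attacking_profile(example)
```

Sorting a league's worth of these profiles by any of the three rates is usually enough to make the interesting names jump out.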
For instance, take the numbers for these two guys...
We've isolated what we care about in attackers, and these two young guys stick out like sore thumbs. They are similar ages, and even play for bigger clubs in good leagues, so there are no worries about league translation or anything like that. Indicators are that Andros might actually be a slightly better player than Neymar, but they are both very good for their age.
Plot them side by side on the original forward radars and you get this.
Given our earlier conclusions about certain stats driving scoring outcomes, this begs the question...
Looking at this objectively, there might be a flaw in our process. These two players have a lot of similarities in driver stats, but the thing we actually care about - scoring - is massively different. Were either of the players lucky/unlucky in their output? Is it a teammate problem? A coach problem? You can think of a million different possible reasons why scoring might be different, but guessing is unacceptable.
So we now go back to the drawing board to find more clarity. There are lots of ways to do this, but one of the simplest, most effective ways of going about it stems from one of the most important lessons you learn as a data scientist.
Always plot your data.
Here we take locational data for shots and add it to the MK Shot Map format... and you get this.
(Made with Opta data)
It's as if someone put a force field around the danger zone shooting ring for Townsend, and he's not allowed to have the ball in that area. Meanwhile, almost every shot Neymar takes is from prime real estate.
The reason for the potential problem we flagged earlier immediately becomes clear.
So using numbers and visualizations, we have gone through a three-step advancement in the player evaluation process.
Step 1: These are numbers we care about. Let's look at those and see what happens.
Step 2: Visualizing them on the radar charts while normalizing them for the population shows that we might have a hole in our basic process. Was Townsend unlucky not to score from all those shots? How do we get more clarity on this?
Step 3: Visualizing the data on shot maps makes the problem crystal clear. Neymar takes great shots. Andros takes terrible shots. In fact, Neymar's expectation of scoring on an average shot is more than five times greater than Townsend's. This in turn has an absolutely massive impact on their probability of scoring a goal from any particular shot.
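To make Step 3 concrete, here is a rough sketch of comparing average shot quality between two shooters. It assumes each shot already has a scoring probability attached from some shot model. The function names and the probabilities below are made up to illustrate the arithmetic, and are not the actual Neymar or Townsend numbers.

```python
def mean_shot_quality(shot_probs: list[float]) -> float:
    """Average scoring probability per shot."""
    if not shot_probs:
        raise ValueError("need at least one shot")
    return sum(shot_probs) / len(shot_probs)


def expected_goals(shot_probs: list[float]) -> float:
    """Total goals you would expect from this set of shots."""
    return sum(shot_probs)


# Hypothetical per-shot probabilities, invented for illustration only.
close_range_shooter = [0.30, 0.25, 0.20, 0.25]  # shots from prime locations
long_range_shooter = [0.05, 0.04, 0.06, 0.05]   # speculative efforts from distance

# How many times better is the average shot of one player than the other?
quality_ratio = (mean_shot_quality(close_range_shooter)
                 / mean_shot_quality(long_range_shooter))
```

Both players here take the same number of shots, so a raw shot count would rate them equally. The gap only shows up once shot location, and therefore shot quality, enters the picture.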
Other Holes in the Process - The Eden Hazard Problem
Obviously with attackers we care about scoring, but what about players who, we know from watching, have a huge impact on the game, yet for whatever reason don't show up very well in traditional scoring stats?
To put it another way, how do you find players like Eden Hazard? Hazard might have been the best attacker in the Premier League in 14-15, but his scoring stats weren't close to overwhelming.
What can we do to tease out more data and find elite players who don't always directly contribute to goals or assists?
For me, the answer was to take another step back in the process. We look at key passes and shots and they matter, but what about the ability to generate successful touches inside the box? And since football is fundamentally a passing game, what about players who are able to make successful passes into the penalty box, which might be one of the rarest skills in the game? So I created two new metrics:
- PINTO = Successful passes in TO the box
- TINDA = Successful touches inside 'DA box.
It turns out when you start to isolate players by this particular combination of skills, you get a useful additional perspective on players who contribute to scoring, both directly and indirectly.
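For anyone who wants to try this at home, here is a hedged sketch of how PINTO and TINDA could be counted from event data. Everything about the data layout is an assumption on my part for this example: the `Event` class is hypothetical, the coordinates are taken to run 0-100 in both directions with the attacking box at x >= 83 and y between 21.1 and 78.9, and a pass only counts toward PINTO if it starts outside the box and ends inside it. Your data provider's conventions may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed penalty-box bounds on a 0-100 x 0-100 pitch, attacking left to right.
BOX_X_MIN = 83.0
BOX_Y_MIN = 21.1
BOX_Y_MAX = 78.9


def in_box(x: float, y: float) -> bool:
    """True if the location falls inside the attacking penalty box."""
    return x >= BOX_X_MIN and BOX_Y_MIN <= y <= BOX_Y_MAX


@dataclass
class Event:
    """Hypothetical event record; not any provider's real schema."""
    kind: str                      # "pass" or "touch"
    successful: bool
    x: float                       # where the event happened
    y: float
    end_x: Optional[float] = None  # pass destination, if any
    end_y: Optional[float] = None


def pinto(events: list[Event]) -> int:
    """Successful passes that start outside the box and end inside it."""
    return sum(
        1 for e in events
        if e.kind == "pass" and e.successful
        and e.end_x is not None and e.end_y is not None
        and not in_box(e.x, e.y) and in_box(e.end_x, e.end_y)
    )


def tinda(events: list[Event]) -> int:
    """Successful touches that happen inside the box."""
    return sum(
        1 for e in events
        if e.kind == "touch" and e.successful and in_box(e.x, e.y)
    )


# Invented events purely to exercise the two counters.
events = [
    Event("pass", True, 70, 50, 90, 50),   # completed pass into the box
    Event("pass", False, 70, 50, 90, 50),  # failed pass into the box
    Event("pass", True, 88, 50, 92, 45),   # starts inside the box already
    Event("pass", True, 60, 50, 75, 50),   # never reaches the box
    Event("touch", True, 90, 40),          # touch inside the box
    Event("touch", True, 85, 10),          # deep but wide of the box
    Event("touch", False, 90, 50),         # unsuccessful touch in the box
    Event("touch", True, 95, 60),          # touch inside the box
]
pinto_count = pinto(events)
tinda_count = tinda(events)
```

In practice you would divide both counts by minutes played, the same way you would for shots or key passes, before comparing players.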
Thus a new format of attacker radars was born.
I called the new template "predictive" because at this point, in my head, I was thinking of the old template as "narrative." The new template took a step back from narrative stats about what happened (goals, assists, goal conversion, etc.), and started to use a few of the advanced, more predictive measures we'd developed since I created the early versions.
The new format more clearly illustrates what a monstrously talented creative player Eden Hazard was that season compared to the population of attackers.
(Note: OP stands for 'Open Play', which I get asked about constantly on Twitter.)
Finally, circling back to our initial comparison, this is what those Townsend and Neymar seasons look like on the new template.
Learning how to use football data better is a process, but it's a worthwhile and rewarding one. The new radar format came about from continually asking questions on how to analyse the data better. Can we iterate and improve on old metrics?
The old format was good as a starting point, but the new format shows player value much more clearly. It also contains years of work and improved understanding about how both the data and the game operate.
It's also worth noting that even this "new" tech is 18 months old. If you are a club and interested in seeing some of the new stuff we've developed in the intervening months, drop me an email at firstname.lastname@example.org.
The latest tech is both cool and extremely useful in helping your club make better decisions, both on the pitch and off.