Note: This was originally given as a presentation at the Science + Football conference in May 2016. After a great many requests, it has been lazily converted into the article below.
In the beginning… Cavemen scouts watched cavemen footballers live and in the flesh.
Then… Cavemen invented television, and cavemen scouts could watch cavemen footballers on video.
Now… Cavemen have invented computers. And spreadsheets. And air conditioning. New, useful tools for evaluating cavemen footballers in the search for the best, the brightest, and the undervalued.
What Did Video Bring To Scouting?
It’s cheaper. Compare a year of video provider service (like WyScout or Instat) with a year of travel budget to put scouts at live matches.
Scouts now have easy access to more players and leagues than ever before.
Finally, and most importantly… it’s all on demand. Whenever your scouts need to watch footage of players, they can. No more of this waiting for actual football to be played in front of you while you sit in cold, rainy stands on a Tuesday in Stoke nonsense.
What Does Data Bring to Scouting?
Instead of evaluating hundreds of players a year, you can profile tens of thousands.
Of those players, you get every minute played and every event they were involved in.
Once the infrastructure is created, costs are whatever your data costs are.
DIFFERENT, OBJECTIVE DATA than traditional scouting reports.
Plenty of people talk about the “data revolution” in sports. For me, it’s just another step and another set of tools to use in the evolving recruitment landscape.
How Does It Work Inside a Club?
With a small recruitment team of 2 stats + 6 part-time scouts, we evaluated over 1000 players in a year for the first teams of Brentford and Midtjylland.
Yes, but were you successful?
This is the most important factor, and obviously it depends on how you look at it. After a disastrous start in the first 9 games due to a poor manager choice, Brentford earned points at nearly a playoff pace, despite awful injuries in the first half of the season. The team also led the league in goals scored and avoided an FFP-related transfer embargo.
And most importantly, they did it with one of the lowest wage budgets in the league and a £10-11m transfer fee surplus* in the year we were involved in recruitment. I’m going to chalk that up as a success, while admitting that at the start of the season, I was hoping for promotion just like the owner and every other Brentford fan out there.
*My estimate, not gospel truth.
Not Scouting Players Live is Ridiculous!
I get this sentiment a lot, both from fans and even from smart people who work inside football. My perspective is this:
By not spending time and money sending scouts to watch (many) live matches, we are able to watch a much greater volume of matches and more players. Travel time is a resource cost and it can be significant.
Also, by not sending scouts to watch live matches, we cut out an enormous source of cognitive bias.
Take this quote for example: “I can learn more about a player in 20 minutes in the stands than I can by watching him in hours of video footage.”
For this to be true, you either have to be some magical savant of player evaluation or you are full of shit. Conservatively, I would say 99% of people who feel this way have to be the latter.
This is a HUGE problem when it comes to scouting. If 4-5 games worth of video scouting work can be wiped away by sending one guy to the stadium, your process has a huge flaw, and likely so does your scout. It’s a correctable flaw, but if this type of thing is what decides whether you sign a player or not, good luck to you.
Yes, you need information about what the player is like with their coaches and teammates. That’s what personality profiles and background checks are for. Thinking you can get all of that information based on seeing a player once or twice in person is a myth the scouting world has sold itself.
Do I think there is potential value in watching certain positions live? Yes.
And some leagues have no usable video, so old-fashioned, in-person scouting is the only way to get the job done. In general though, I would do as much video work as possible in any club that I work in for the reasons explained above.
The Story of Radamel Falcao
And all of this was above and beyond the ACL injury, which for a 28-year-old striker is a real concern.
Loan fee: Unknown (rumored to be 5-10M a season)
Wages: Massive (rumored to be 13M+ a season)
Outcome: 4 goals and 4 assists at Manchester United. 1 goal, 0 assists at Chelsea.
Lesson: Even if the ONLY THING adding data to your recruitment does is save you from a couple of bad decisions every year, it still pays for itself. (In this case by saving something like 20M in wasted money on Falcao.)
Nate Silver and PECOTA
PECOTA = Statistical scouting system for baseball.
It was a very good system for evaluating Major League talent at the time.
In 2006, Silver started applying it to minor league prospects. At that time, the only competition was from the scouts.
In 2011, he compared PECOTA projections to the scouts…
Scout forecasts were 15% better than PECOTA’s predictions, resulting in $336M worth of extra wins during that period.
Reasoning behind this was two-fold. First, the scouts started to use basic numbers more to inform their scouting. Second, scouts have far more than just statistics at their fingertips to evaluate prospects. They know whether guys are smart. Or overweight. Or hard working. Or any number of other things PECOTA did not know.
Nate’s conclusion: “The only way a purely stat-based list should be able to beat a hybrid list is if the biases introduced by the [scouting] process are so strong that they overwhelm the benefit.”
Building a Hybrid Process
Names come in from any source – agents, coaches, players, scouts, numbers… whatever
Initial check on Age/Passport/Realism/Need
Stats check in leagues where you have stats – initial quick video scout in leagues where you don’t
If players pass that, they go into the proper scouting queue. The player also triggers a detailed stat check.
Scouting report comes back. Combined recruitment group ranks players for positions based on clear guidelines from head coach.
List with executive summaries + video delivered to coaches.
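The early filtering stages of that funnel are mechanical enough to sketch in code. This is a minimal illustration, assuming hypothetical fields for the candidate record (age, an eligibility flag standing in for the passport check, an estimated fee for "realism", a position for "need", and a 0-1 stat score or a quick video verdict). None of these names or thresholds come from a real club system; the later stages (full scouting reports, ranking against the head coach's guidelines) are human work and are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical fields standing in for the "Age/Passport/Realism/Need" check.
    name: str
    age: int
    eligible_to_play: bool              # passport / work-permit status
    est_fee_m: float                    # estimated fee in millions ("realism")
    position: str                       # compared against positional "need"
    league_has_stats: bool
    stat_score: Optional[float] = None  # quick stat check score, 0-1 (hypothetical scale)
    video_ok: Optional[bool] = None     # quick video scout verdict where no stats exist


def passes_initial_check(c: Candidate, max_age: int, budget_m: float, needs: set) -> bool:
    """Age / passport / realism / need filter."""
    return (c.age <= max_age and c.eligible_to_play
            and c.est_fee_m <= budget_m and c.position in needs)


def passes_quick_check(c: Candidate, threshold: float = 0.6) -> bool:
    """Stats check in leagues with stats, quick video scout in leagues without."""
    if c.league_has_stats:
        return c.stat_score is not None and c.stat_score >= threshold
    return bool(c.video_ok)


def scouting_queue(candidates, max_age, budget_m, needs):
    """Players who survive both filters go into the proper scouting queue
    (where they would also trigger a detailed stat check)."""
    return [c for c in candidates
            if passes_initial_check(c, max_age, budget_m, needs) and passes_quick_check(c)]


if __name__ == "__main__":
    pool = [
        Candidate("A", 22, True, 3.0, "CB", True, stat_score=0.8),
        Candidate("B", 31, True, 1.0, "CB", True, stat_score=0.9),   # too old
        Candidate("C", 24, True, 2.0, "ST", False, video_ok=True),   # no stats, video OK
        Candidate("D", 23, False, 2.0, "ST", True, stat_score=0.7),  # not eligible
    ]
    print([c.name for c in scouting_queue(pool, max_age=28, budget_m=5.0, needs={"CB", "ST"})])
    # ['A', 'C']
```

The point of writing it down this way is that the cheap checks run on every name from every source, so scouts only spend time on players who could realistically be signed.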
Scouting Biases
What follows are actual quotes from scouts, collated through networking with other football people over the last couple of years.
“Six months in the devs and maybe he’ll be good enough.” This player went on to a Big 5 move after being voted best player in his league.
“Poor tackler.” This player led his league in tackling.
“Couldn’t play in a midfield 2.” This player was the best DM in his league the year before. Then he was the best AM in his league the next season.
“Plays in a shit league.” Quants have solid algorithms that tell which leagues are good/bad. This league was actually solid.
“Few Shots. Snatches at the ball. Not a finisher.” Finished with an NPG90 of .9 for the season, and shots90 above 4. Whatever he was doing seemed to put the ball in the back of the net a surprising amount.
“Not enough heart for a [REDACTED] player.” Sadly, we have no algorithm that directly measures heart. Also, I’m not sure how you argue back about this in a recruitment meeting without being extremely sarcastic. Thankfully, I wasn't involved.
So if this is the general environment you are walking into, what do you do?
Profile Your Scouts
Scouting reports are DATA! Data is meant to be analysed.
Go back through past work and review it.
If using numbers, what are the average scores for each scout? For each position? What if some scouts never find a centerback or fullback that is deemed good enough to play in your team? You need to know that and adjust for it.
Are they biased:
Toward young players?
Against foreigners?
Just don’t understand the football in certain leagues? If the style of football in certain leagues differs a lot from what a scout is used to, they can have problems evaluating players in those leagues.
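Treating reports as data can start very simply. The sketch below assumes a hypothetical report format (one record per report with scout, position, player age, and a 1-10 grade) and does two things: averages grades by any grouping you care about (per scout, per scout and position, per scout and age band), and re-expresses each grade as a z-score within that scout's own history, so a harsh marker and a generous marker land on one scale. The z-score step is one possible way to "adjust for it", not a prescribed method.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical report format: one dict per scouting report, graded 1-10.
reports = [
    {"scout": "S1", "position": "CB", "age": 21, "grade": 4},
    {"scout": "S1", "position": "CB", "age": 26, "grade": 6},
    {"scout": "S2", "position": "ST", "age": 20, "grade": 8},
    {"scout": "S2", "position": "ST", "age": 27, "grade": 8},
]


def average_grade(reports, key):
    """Mean grade grouped by any key function, e.g. per scout or per (scout, position)."""
    groups = defaultdict(list)
    for r in reports:
        groups[key(r)].append(r["grade"])
    return {k: mean(v) for k, v in groups.items()}


def standardize_grades(reports):
    """Add a z-score computed within each scout's own grades.
    A scout whose grades never vary gets z = 0 for every report."""
    by_scout = defaultdict(list)
    for r in reports:
        by_scout[r["scout"]].append(r["grade"])
    params = {s: (mean(g), pstdev(g)) for s, g in by_scout.items()}
    out = []
    for r in reports:
        mu, sd = params[r["scout"]]
        out.append({**r, "z": 0.0 if sd == 0 else (r["grade"] - mu) / sd})
    return out


print(average_grade(reports, key=lambda r: r["scout"]))
print(average_grade(reports, key=lambda r: (r["scout"], r["age"] < 23)))  # youth bias check
print([r["z"] for r in standardize_grades(reports)])
```

Swapping the key function is how you ask the other questions above: group by league to see who struggles with unfamiliar styles, or by nationality to check for bias against foreigners.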
Data analysis is complicated and hard. Scouting is also complicated and hard, but uses an entirely different set of skills. One club I know of has scouts in 37 different countries, speaking more than 10 different native languages. Standardizing output across that organization simply based on linguistic differences is nearly impossible.
Standardize Your Scouts
Scouts are not just plug and play. Scouting is hard work and scouts from different clubs and backgrounds might be looking for totally different things in players than your club requires.
Scouts need training, and clear guidelines on what to look for when evaluating players.
Scouts may need training across multiple leagues and styles of football.
Youth scouts need even MORE guidance, since there are a ton of pitfalls regarding relative age and growth spurts they need to work around.
Lesson: You need to standardize traditional scouting output as much as possible, both to make better decisions and to keep your scouting output consistent.
Conclusions
Increased use of stats and data is just a part of the long-term evolution of the recruitment landscape.
Talented scouts are hard to find and will remain extremely valuable members of any club.
Hybrid processes – both stats and traditional scouting - are mandatory as best practice.
On-demand video and data allow teams to use these valuable scouting resources more effectively.
You MUST be aware of scouting biases in order to get the most out of these valuable employees.
You know the part in The Big Short where Michael Burry (Christian Bale) is sitting there at his desk, explaining to an irate investor that the housing market is guaranteed to crash, it’s just that no one knows it yet? And the fact that it has never crashed in modern times has nothing to do with his certainty that this future crash is now inevitable?
I feel that way about stats and data in football.
I believe with utter certainty that stats and data will play a huge role in the future of the sport. This is despite knowing that my certitude makes me sound like a bit of a loon to parties that approach this subject with some skepticism.
Today I’m going to explain why I carry this certainty about the future, but it requires the audience to shed one big misconception most people seem to carry about the game.
Football is not unique and unrelated to other sports.
Football actually bears reasonable similarities to basketball and both forms of hockey, such that certain ways of analyzing those sports are easily adaptable to football. And this goes beyond stats – German coaches have long consulted with elite field hockey coaches about defensive tactics, and Pep Guardiola includes legendary water polo player Manuel Estiarte in his coaching staff. Even now, researchers like Luke Bornn are taking what they learned applying spatial statistics to SportVU data in the NBA and seeing what they can learn from football's tracking data.
Yes, football has its own idiosyncrasies and you need to understand the game at a high level to get the most out of your analysis. No one intelligent disputes that. But the fact that we've seen massive revolutions in how other sports are analysed, which have led to changes in how those sports are played, means we should expect a revolution to hit football in the future as well.
I have been reading Andy Glockner’s book recently, which catalogs and explains the NBA’s analytics evolution, and it continually amazes me how much it parallels a movement still in its infancy in football.
“There is way more money involved in [the league] today than even ten years ago, and teams have to work harder and harder to find and maintain competitive edges. How they are doing so varies wildly from team to team, and heavily involves state-of-the-art technology to try to move ever closer to solving an impossibly complex and nuanced sport.”
Is that quote about basketball or football? The NBA or the English Premier League? It could be either, right? Except in the Premier League there are bushels full of competitive edges sitting in easy reach of anyone who knows where to look.
Another thing I believe for certain is that they won’t stay that way for long. Spending money now to grab the low-hanging fruit and discover new edges also gives a team a head start in what will assuredly be a brain race at some point down the road, and more importantly, will likely yield huge dividends in terms of points, money, and potential titles now.
That’s the thing that I think the analytics movement in football may have gotten wrong through absolutely no fault of anyone involved. The stats guys developing new ideas and doing the work often think of it as, “how do we apply stats to football to learn new things?”
That’s technically correct. However, it misses the major point.
The real goal for the analytics movement in any sport should be: how do we discover and deliver new competitive edges?
Stats and data are a very useful tool in doing that, but it’s a big tool box.
Sam Presti is 29. Sam Hinkie is 27. Celtics “Senior Vice President for Operations” Daryl Morey is… 31.
Anyway, Glockner’s book is excellent, especially if you read it with an eye that it may be foretelling the future of a football analytics movement that has yet to start across most of Europe.
FiveThirtyEight is a mixed bag, but their sports stuff is still generally pretty good. This piece, which examines the expansion of “numbers-savvy front-office staffers over time” is excellent.
“Although the analytical gold rush began before the period we examined, hiring has accelerated at an almost exponential rate over the last few years.”
One of the main takeaways from the article is that baseball teams are spending more and more on stats dorks because they provide a dramatically bigger boost to win totals on a per dollar basis than many free agent signings. Part of that is because baseball's player market has become more efficient over the years thanks to improved use of stats, but a bigger part comes down to basic economics.
They estimate a five-man analytics team costs about $350,000 per year, which still lags behind the minimum salary for a single player.
The takeaway: It paid to invest in analytics early. Teams with at least one analyst in 2009 outperformed their expected winning percentage by .044 (4.4 percentage points) over the 2012-14 period, relative to teams who didn’t: an enormous effect, equivalent to more than seven extra wins per season.
Even the minimum estimate of two extra wins per year would represent a return roughly 30 times as efficient as spending the same amount on the free-agent market.
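As a quick sanity check on the scale of that effect, here is the arithmetic for turning a winning-percentage edge into wins, assuming a 162-game MLB regular season. An edge of .044 (what baseball writers often call "44 points" of winning percentage) comes out to a little over seven wins, which lines up with the "more than seven extra wins per season" figure above.

```python
MLB_GAMES = 162  # regular-season games per MLB team


def wins_from_edge(win_pct_edge: float, games: int = MLB_GAMES) -> float:
    """Convert a winning-percentage edge (e.g. 0.044) into extra wins per season."""
    return win_pct_edge * games


print(round(wins_from_edge(0.044), 1))  # 7.1
```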
One more thing that really struck me out of that piece and that I feel is hugely applicable to football.
Although the big-budget Boston Red Sox were also one of the first teams to demonstrate that an analytics department could help win a World Series, a number of low-payroll, small-market teams (including not only the Moneyball A’s, but also the Rays, Indians, Padres and Pirates) were among the first to form quantitative departments and develop systems to house and display statistical data. It made sense: The more pressing a team’s financial imperative to stretch every dollar and wring out every win, the more likely it was to try a new approach.
How can teams compete with the traditional giants beyond just spending more money?
Apply the marginal gains.
Make consistently better decisions than other teams.
Play more efficient football.
Recruit better coaches.
Recruit better players.
Make fewer mistakes in the transfer market.
Find. The. Edges!
Baseball and basketball are hugely different sports. In fact, they are more different from each other than basketball is from football. And yet in both of these sports we have seen teams dramatically ramp up spending to get smarter faster than the competition.
Why? Because it helps them win more.
This WILL happen in football.
The only questions are how long it takes before it happens in scale across not just England, but European football as a whole, and which teams will lead the charge and reap the rewards as early adopters.
A radar is a way of visualizing a large number of stats at one time. In our case, the radars specifically deal with player stats. Some people also call them spider charts or graphs because they can look like a spider web.
Why bother creating them? What’s wrong with tables? Or bar charts?
Hrm, let’s deal with the last questions first. There is nothing wrong with tables of numbers. My brain loves them, and so do many others.
However, you have to admit that tables of numbers are a little boring. Bar charts are better, but they kind of fall apart when trying to compare many attributes at the same time. Radars allow exactly that.
Why bother creating them? That one is complicated. Why bother making infographics or doing data visualization at all? The answer is probably at least a book long, but the quick response is because people like to look at stats presented in this way far more than they like to look at a set of numbers. Radars invite you to engage with them. They create shapes that brains want to process. People have real reactions, and once you get used to what they display and how they display it, you can interpret them much faster than if you had to do the exact same analysis with a table of numbers.
Many of the shapes created correspond to “types” of players, at least when it comes to statistical output. Pacey, dribbling winger. Deep-lying playmaker. Shot monster center forward. Starfish of futility.
There’s a lot more methodology chat in the various articles I have written on StatsBomb, but I need to explain one very quick thing before I move on to player type shapes and examples.
Radar boundaries represent the top 5% and bottom 5% of all statistical production by players in that position across 5 leagues (EPL, Bundesliga, La Liga, Serie A, and Ligue 1) and 5 seasons of data. In stat-y terms, the cut-offs are at two standard deviations of statistical production.
In non-stat-y terms: when the boundaries were simply set by the best output in the data, Lionel Messi made EVERYONE look terrible. I know, that doesn’t sound that bad because it’s true, but trust me, the newer way the templates are constructed is better.
The design for these was taken from Ramimo's 2013 NBA All-Star poster. I thought it would be really interesting to apply this to football, and then through testing, became irritated by what Messi made everyone else look like if I just used pure stats output. That's when I added the standard deviations idea, and started playing with different positional templates.
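Here is a minimal sketch of the boundary idea, assuming the cut-offs are taken directly as the 5th and 95th percentiles of the positional population (one reading of the description above; an SD-based cut-off would give slightly different bounds). Each stat is mapped onto its spoke between those bounds and clipped, so a single outlier cannot squash everyone else toward the centre, and values beyond the bounds get the green/red flag used for the printed numbers. The function names are hypothetical, and stats where lower is better would need the scale flipped.

```python
import math
from statistics import quantiles


def cutoffs(population):
    """5th and 95th percentiles of one stat across the comparison population
    (e.g. all players in a position, pooled across leagues and seasons)."""
    qs = quantiles(population, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return qs[0], qs[-1]


def scale_spoke(value, lo, hi):
    """Map a value to [0, 1] between the cut-offs, clipping anything beyond them."""
    if hi == lo:
        return 0.5
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def flag(value, lo, hi):
    """Green for top-5% output, red for bottom-5% output, otherwise no colour."""
    if value >= hi:
        return "green"
    if value <= lo:
        return "red"
    return None


def radar_vertices(scaled):
    """Evenly spaced spokes; each vertex sits at its scaled distance from the centre."""
    n = len(scaled)
    return [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
            for i, r in enumerate(scaled)]


lo, hi = cutoffs(list(range(1, 101)))
print(lo, hi, scale_spoke(200, lo, hi), flag(99, lo, hi))  # 5.05 95.95 1.0 green
```

The polygon from `radar_vertices` is what gives each player type its shape; the clipping in `scale_spoke` is what keeps one Messi-sized outlier from dominating it.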
QUICK NOTES:
The only thing these represent is statistical output.
If you put players in different systems, it may change their output.
If you put them in different positions, it almost certainly WILL change their output.
Age will also change statistical output.
In short, these are a tool to help evaluate players. Like any tool, they have strengths and weaknesses. In general, I have found it much easier to evaluate players WITH this information than without it.
Explaining Bits and Bobs
Per 90 means that all the non-percentage stats on the radar are normalized to 90 minutes played. The reason you do this is to correct for the fact that some players don’t always play 90 minutes. Players who frequently get subbed on or off will inherently look worse on per-game stats than on per-90 stats.
Age is the age the player will be at the end of the season. We will soon change this to season age + birthday.
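The per-90 correction is a one-liner, but a worked example shows why per-game numbers punish impact subs. The player below is hypothetical: 3 goals in 10 appearances, but only 450 minutes on the pitch.

```python
def per90(count: float, minutes: float) -> float:
    """Normalize a counting stat to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90 / minutes


def per_game(count: float, appearances: int) -> float:
    """Naive per-appearance rate, for comparison."""
    return count / appearances


# Hypothetical impact sub: 10 appearances, 450 minutes, 3 goals.
print(per_game(3, 10))  # 0.3
print(per90(3, 450))    # 0.6
```

Per game, this player looks like a 0.3 goals-a-game forward; per 90, he scores at twice that rate whenever he is actually on the pitch.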
Why use non-penalty goals? Because penalties are converted at a 75-78% rate almost regardless of who takes them. They are a different skill to scoring goals that are not penalties (some teams have even had goalkeepers as their lead penalty takers), and so we strip them out of the scoring numbers.
DRAWING penalties is a great skill (and will be added to assist stats over time). Converting penalties is a very common one.
Shooting%
How many shots were on target out of ALL shots that a player has taken. This includes those that were blocked.
Key Passes
Passes that set up a teammate to take a shot. These are highly correlated with assists, which are passes to teammates who score a goal quickly after. (Note: This is the same stat as Chances Created. Somewhere along the way, Opta changed Key Passes to mean only passes that lead to shots that are NOT goals, while Chances Created covers all of them. Which is weird.)
Through Balls
Opta definition: a pass splitting the defence for a team-mate to run on to. Why do we care? These are generally considered the single type of pass most likely to lead to a goal.
Scoring Contribution
Combined non-penalty goals and assists per 90 minutes.
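Putting the definitions above together, here is a sketch of how a few radar numbers fall out of raw season totals. The `SeasonLine` fields and the example striker are hypothetical; the formulas follow the text: penalties are stripped before normalizing, Shooting% counts blocked shots in the denominator, and Scoring Contribution is non-penalty goals plus assists per 90.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    # Raw season totals for one player (hypothetical field names).
    minutes: float
    goals: int
    penalty_goals: int
    assists: int
    shots: int            # all shots, including blocked ones
    shots_on_target: int


def per90(count: float, minutes: float) -> float:
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90 / minutes


def npg90(p: SeasonLine) -> float:
    """Non-penalty goals per 90."""
    return per90(p.goals - p.penalty_goals, p.minutes)


def shooting_pct(p: SeasonLine) -> float:
    """Share of ALL shots (blocked included) that were on target."""
    return p.shots_on_target / p.shots if p.shots else 0.0


def scoring_contribution(p: SeasonLine) -> float:
    """Non-penalty goals plus assists, per 90."""
    return per90(p.goals - p.penalty_goals + p.assists, p.minutes)


# Hypothetical striker: 18 goals (5 penalties), 6 assists, 2700 minutes, 90 shots, 36 on target.
p = SeasonLine(2700, 18, 5, 6, 90, 36)
print(round(npg90(p), 2), round(shooting_pct(p), 2), round(scoring_contribution(p), 2))
# 0.43 0.4 0.63
```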
PAdj stands for “possession adjusted” stats. The reason why we do this is because it normalizes defensive stats for opportunity. Think about it this way: If your teammates always have the ball, then you can’t make any defensive actions, and you would look worse in this statistic compared to a Tony Pulis-style team that sits deep and constantly defends.
When adjusted for possession, tackles and interception output becomes moderately correlated with shots conceded and goals against, as opposed to having no correlation without the adjustment. In short, it’s an imperfect adjustment, but much better than not having the adjustment at all.
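To make the idea concrete, here is one simple possession adjustment. This is an assumed linear scheme chosen for illustration, not necessarily the exact formula StatsBomb uses: it rescales a player's tackles plus interceptions as if the opponent had the ball for 50% of the time, so defenders on dominant possession sides are no longer penalized for lack of opportunity. The players and numbers are hypothetical.

```python
def possession_adjust(defensive_actions_p90: float, team_possession_pct: float,
                      baseline_pct: float = 50.0) -> float:
    """Rescale tackles + interceptions per 90 to a common amount of opponent possession.
    Linear scheme assumed for illustration: actions scale with opponent possession."""
    opp_possession = 100.0 - team_possession_pct
    if opp_possession <= 0:
        raise ValueError("opponent possession must be positive")
    return defensive_actions_p90 * (100.0 - baseline_pct) / opp_possession


# Hypothetical: midfielder on a 65% possession side vs one on a deep-sitting 35% side.
print(round(possession_adjust(3.0, 65), 2))  # 4.29
print(round(possession_adjust(5.0, 35), 2))  # 3.85
```

Raw, the deep-sitting player looks far more active (5.0 vs 3.0); once opportunity is accounted for, the order flips.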
In the bottom left of every radar is the actual statistical output in numbers for each spoke of the radar. Numbers in green are in the Top 5% of output in that stat for the player population and numbers in red are the Bottom 5%.
The centerback templates were developed later, and to be perfectly honest, they are less valid overall than the other positional templates. I knew this ahead of time, but legendary Scotland, Everton, and Rangers player David Weir, who is also a centerback, asked me to take a swipe at creating these and I couldn't say no. They give you a sense of how a centerback plays, but become tricky beyond that.
I do know that Thiago Silva is pretty fantastic, though.
Some of you clicked on this just to ask, "WTF is PDO?" Which is fine - we take all kinds here.
The seeming acronym doesn't stand for anything; it was the online handle of Brian King, who created the stat in hockey. The definition of the metric PDO is listed below, and Wikipedia actually has a page on hockey analytics if you want to know more.
“PDO is the sum of a team's shooting percentage (goals/shots on target) and its save percentage (saves/shots on target against). It treats each shot as having an equal chance of being scored – regardless of location, the shooter, or the identity or position of the ‘keeper and any defenders. Despite this obvious shortcoming it regresses heavily towards the mean – meaning that it has a large luck component. In fact, over the course of a Premiership season, the distance a team's PDO is from 1000 is ~60% luck.”
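For reference, the definition quoted above translates directly into code. The sketch assumes saves equal shots on target against minus goals against (ignoring own goals and similar edge cases), and scales the sum by 1000, which is the league-wide average because every on-target shot is either a goal or a save. The team numbers are hypothetical.

```python
def pdo(goals_for, sot_for, goals_against, sot_against):
    """Shooting % (goals / shots on target) plus save % (saves / shots on target against),
    scaled by 1000. Saves are assumed to be SoT against minus goals against."""
    shooting_pct = goals_for / sot_for
    save_pct = (sot_against - goals_against) / sot_against
    return 1000 * (shooting_pct + save_pct)


# Hypothetical team: 50 goals from 150 SoT, 40 conceded from 130 SoT against.
print(round(pdo(50, 150, 40, 130)))  # 1026
```

Note what the inputs are: counts of goals and shots on target, and nothing about where or how those shots happened.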
Now you may have seen an occasional tweet from me expressing displeasure with the use of this particular metric, but I've never actually sat down to detail why I think it's dumb. Today I will do that.
Reason 1) It's Theoretically Flawed
Why? Because it treats all shots as equal.
Here's a clue: All shots in football are NOT equal.
Then add in Colin Trainor's work from way back when showing that headers are a lot harder to score than shots with the feet, and POOF, there goes your theory and your metric, and we haven't even gotten to all the other factors that affect a shot's probability of being a goal.
It's kind of sort of fine in hockey I guess because shotqualityomgwtfbbq, but it's just fantastically dumb to use anything that makes this assumption in football.
If you need an image in your head to help explain all of this in personal terms, picture yourself with a football on a football pitch facing a goalkeeper. You take 20 on target shots at the goal from 20 yards out in the center of the pitch.
You also take 20 on target shots at the goal from 6 yards out in the center of the pitch. Which one of those scenarios is going to yield more goals?
Reason 2) It Combines Attacking and Defensive Conversion As If They Are Remotely Related
They aren't. Teams technically have infinite choices in how they attack and how they defend. They don't have to be related at all. Therefore, why would we treat them as if they were?
You can have a normal, straightforward average attack and a league leading defense. Or you can have an attack that consistently creates insane chances and pairs it with a defense that gives up exactly the same.
Or you can... well, anything.
The point is that by combining the two separate phases of play into one metric, you miss out on the signal.
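A small illustration of the lost signal: two hypothetical teams with exactly the same PDO, one because of hot finishing and one because of a stingy defence. Splitting the metric into its two phases and comparing each against the league average (here assumed to be a 30% conversion rate on shots on target, with league save rate taken as the complement, ignoring own goals) immediately tells you which question to ask next.

```python
def components(goals_for, sot_for, goals_against, sot_against):
    """Attacking conversion and defensive save rate kept separate, plus the combined PDO."""
    shooting = goals_for / sot_for
    saving = 1 - goals_against / sot_against
    return {"shooting_pct": shooting, "save_pct": saving, "pdo": 1000 * (shooting + saving)}


def vs_league(team, league_shooting_pct):
    """Deviation of each phase from the league average. League save % is taken as
    1 - league shooting % (every on-target shot is a goal or a save, own goals ignored)."""
    return {"attack": team["shooting_pct"] - league_shooting_pct,
            "defence": team["save_pct"] - (1 - league_shooting_pct)}


a = components(40, 100, 30, 100)  # hot finishing, average defence
b = components(30, 100, 20, 100)  # average finishing, stingy defence
print(round(a["pdo"]), round(b["pdo"]))                          # 1100 1100
print({k: round(v, 2) for k, v in vs_league(a, 0.30).items()})  # {'attack': 0.1, 'defence': 0.0}
print({k: round(v, 2) for k, v in vs_league(b, 0.30).items()})  # {'attack': 0.0, 'defence': 0.1}
```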
"Hey, this team is overperforming PDO!"
Okay, why?
THIS IS ALWAYS THE NEXT QUESTION, and if it is always the next question, then maybe you can - I DUNNO - treat the two phases separately and immediately jump ahead a step.
"This team is giving up far fewer goals than expected in defense."
Aha, now you have my interest. Tell me more.
"This team brought in an attacking assistant coach in the summer to try and boost the number of goals scored..."
Excellent, let's analyze that.
Wait... no team would actually do that in the current football landscape, but if they DID then this would be a very good thing to analyse.
Reason 3) Every Team Does Not Completely Regress
This is a fundamental nerd point, but the fact of the matter is that not every team's PDO fully regresses toward the mean of 1000, even across multiple seasons.
Why?
BECAUSE ALL SHOTS ARE NOT EQUAL!
There are systemic reasons why some teams allow far worse chances season after season than others. If a team's defensive structure is such that the average shot distance it allows is from 20 yards instead of 15, your goalkeeper has more reaction time on average to make saves, there are likely more men between the ball and the goal, and the team is almost certainly going to post a better save percentage.
Or if you are a crazy high pressing team that tends to keep the number of opposing shots low, but the trade-off is that when someone beats your press they get awesome chances right on top of your goal, then your save percentage numbers are also going to look weird and are unlikely to regress to anything approaching average.
The same applies for elite attacking systems. Some head coaches have an attack that consistently creates better chances than average, which means their shots are more likely to go in the goal, and the team is more likely to post abnormal PDO numbers that have very good reasons to stay that way. And all of this is before we even touch the impact of super elite or sub-par players with regard to skill. One reason why it may look like teams revert to the mean over the course of many years is that manager or head coach tenures last between 12 and 15 months on average.
Start tracking these things by head coach tenure (or tracking head coach performance across different teams) and it yields a lot more clarity. A weird PDO by a team might be random variation, but there's a decent chance it isn't and for reasons you care about. Other ways of analyzing team performance would be a lot more insightful and should be examined first instead of simply assigning outliers to the random variation dustbin.
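A minimal sketch of tracking by tenure, assuming hypothetical match-level rows of (team, head coach, goals for, shots on target for, goals against, shots on target against). Totals are aggregated per (team, coach) spell before computing each phase, rather than pooling a club's whole multi-year history.

```python
from collections import defaultdict

# Hypothetical rows: (team, head_coach, goals_for, sot_for, goals_against, sot_against)
rows = [
    ("X", "Coach1", 3, 10, 2, 10),
    ("X", "Coach1", 3, 10, 2, 10),
    ("X", "Coach2", 1, 10, 4, 10),
]


def phases_by_tenure(rows):
    """Shooting %, save % and PDO per (team, head coach) spell."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for team, coach, gf, sf, ga, sa in rows:
        t = totals[(team, coach)]
        t[0] += gf
        t[1] += sf
        t[2] += ga
        t[3] += sa
    out = {}
    for key, (gf, sf, ga, sa) in totals.items():
        shooting = gf / sf
        saving = 1 - ga / sa
        out[key] = {"shooting_pct": shooting, "save_pct": saving,
                    "pdo": 1000 * (shooting + saving)}
    return out


for spell, stats in phases_by_tenure(rows).items():
    print(spell, {k: round(v, 3) for k, v in stats.items()})
```

Keying on the coach alone instead of (team, coach) gives the other view mentioned above: a head coach's numbers followed across different clubs.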
Conclusion
Regardless of its common usage in hockey, PDO is theoretically flawed in football and people need to stop using it. Yes, I know there may be data reasons why some analysts continue to use PDO, but as explained above, we should try to find a way past this at the earliest possible opportunity. Do something smarter that better relates directly to the sport you are analyzing. The good news here is that there is now a giant open space just waiting for a clever person to tell the world what they should be using in place of PDO, and that person could be you!
A couple of years ago, I used to produce regular mailbags, where I answered reader questions about whatever seemed interesting to them. Today we're going to do that again. Despite the fact that this is being published on April Fool's Day, I'm not going to post any idiotic jokes, pranks, or lies herein. These are all actual questions from actual readers and actual answers from actual mes.
Additionally, since I am not employed by any teams right now, we get to talk about transfers and I get to say whatever the hell I want to, regardless of whose plans it might screw up. I can see the world's recruitment analysts and technical scouts wincing already. This should be fun... Here we go!
Who should Arsenal buy as a CM/DM for next season?
A lot of this has to do with who do you think needs replacing and why. Most people asked for a defensive midfielder, but seemed to want passing range and versatility. That is a tough combination to come by, and I think the rumored Granit Xhaka is rather good. However... if I am buying one central midfielder in Europe right now, it's Naby Keita.
Just 21 years old, he played as an elite defensive midfielder in a pressing system last season.
This season he moved forward into an 8/10 role and has put up outrageous scoring stats while losing very little defensive output. No one does that. Only 1.72m tall, Naby is both fast and strong and has excellent balance. He's an outstanding dribbler. He's honestly one of the most athletic young central midfielders I have ever scouted.
The only question is whether his touch passing fits in with Arsenal's style well enough for Wenger to pick him. I think Arsenal need more of this type of athleticism in their squad for certain matchups, and this guy is wildly talented. I have been keeping track of him for quite a while now. At my old job, we [hit by electrical shocks]. So yeah, if I have to pick just one guy to fit in midfield for Arsenal, it's probably him.
What Manager Should Chelsea Hire? What Center Forward Should They Recruit?
The answers to this one are really boring and I apologize for that ahead of time, but these are the questions you gave me! I think Conte is an exceptional head coach who created utterly dominant teams in Serie A. The only real question is whether he can get players to buy into his methodology. I have information from very good sources that he is seriously intense. So is Diego Simeone. Those are my top two choices for manager, and I think Conte is far more likely to end up in London next season.
Can either of them win over the players and get the maximum out of them without losing the whole squad like Mourinho did?
As for a center forward, it's hard to see Chelsea improving much on Diego Costa and Bertrand Traore. I think Traore is one of the best young CFs in Europe and just needs some game time to adapt to the Premier League. I guess they could buy back Romelu Lukaku for twice what they sold him for (ouch), but barring that... Oh, and someone else said Chelsea need a new center back. The good news is you already own the guy I would probably recommend for you - Andreas Christensen. The bad news is that he's allegedly on loan to Gladbach for another season after this one.
What Goalkeeper Should Liverpool Buy?
Sorry folks, stats don't work on goalkeepers. Okay, that's not entirely true, but they only sort of work on GKs and I don't quite have enough time to answer this properly in full. Instead I'll just say they should buy Naby Keita for the midfield and that way whoever they do buy to compete with Mignolet next year will probably have less work to do. That's assuming Arsenal and Spurs don't buy him first. And let's be honest, assuming Arsene Wenger is not going to buy a central midfielder in the summer has been a safe bet for a very long time now. He's probably a more natural fit for Liverpool or Spurs than Arsenal anyway.
Who would you pick - Vincent Janssen or Sebastien Haller?
This is like making me choose between my kids. For those who don't follow the Eredivisie, these are two of the top young center forwards in the league. Do you know that Haller was bought by Utrecht on an option from Auxerre last spring for only 800k Euros? And bigger clubs than Utrecht wanted to buy him both last summer and in January and pay him a LOT more money, but he stayed put.
Rumor in the Netherlands is he only has eyes for Ajax right now, but I could see bigger fish with more money testing his desire to stay in Holland. I could also attempt to tell you an awful lot more about Haller and explain why I know a lot more about him, but that would trigger additional electrical shocks and I'm still kind of jittery after the last batch.
Meanwhile Janssen is one of the top scorers in Europe this season. He's a physical shot monster, and while I'm not sure about his pace, he certainly causes huge problems for Dutch defenders. A couple of scouts I trust have also insisted he's the real deal (stats suggested this was likely months ago), and from what I have seen they are probably right. I would say Haller has a bit more potential and creates a few more goals for his teammates, while Janssen is a tremendous goalscorer right now. For me, Haller wins by a whisker, but it's basically too close to call. (And in the end, it all comes down to price and what the player wants to do anyway.)
This is an interesting question, and the real answer is that no one actually knows. I suspect Arsenal are probably furthest along in football research and they should be, as StatDNA had the biggest head start (outside of the Bolton group that dissipated). I have met a number of Arsenal's top level people on the analysis side and they are wicked smaht. It is annoying when your favorite team is also the team that would need your skill set the least, but thems the breaks. Liverpool are somewhere in the "we develop new football research/tech" sphere.
They even have a Director of Research, so something must be happening there! That's pretty much all I know. Southampton and Spurs probably have some cool stuff going, but I don't know enough about either place to say what. At City it's really hard to tell what is getting generated and what gets used, but they do have some personnel working on it. I don't think Chelsea or United have been developing anything on the analytics side for some time. Leicester City are doing smart things, but I am on the fence about how much of that is related to stats research versus how much is just nailing normal decisions.
But... and this is important... I could be totally fucking wrong.
All this stuff is supposed to be secret. If you are developing edges inside a club, you should NOT be talking about it. That makes it a whole lot of guesswork on my part to say who is doing what well. I know we had some things at Smartodds that I was very happy with, and that I am pretty sure are bleeding-edge tech (not just cool visualizations), but I can't be totally certain no one else developed those ages ago and simply didn't talk about it.
Our research was developed with a little over a year of full-time access to the Opta database and about two man-years' worth of output. *light bulb switches on* Funnily enough, I no longer work inside a club, so if Chelsea or Manchester United wanted to find someone smart who COULD talk about cutting-edge research and what it could do for them... *makes the "call me" motion* The reason I am talking about what clubs may or may not have developed is that it's the baseline for state of the art. Some things public analysts have going for them are as follows:
You can collaborate and make each other smarter. Clubs can't do that except by hiring people from the outside, and they can never do that at scale. I'm not sure if you guys are the vehicle Voltron or the lion one, but you can certainly join together and fight space dragons and shit.
There are lots and lots and lots of you. Teams have a comparatively tiny number of analysts and most of their brain is likely occupied by day-to-day tasks like how to beat Alan Pardew.
Many of you have fascinating and unique skill sets to bring to bear on any number of football-related problems.
And some things that public analysts have going against them are as follows:
Poor access to data, and what you have is probably poorly organized unless you are a data/code pimp. If you are a data/code pimp, then you probably spent a lot of time getting your data organized and not doing any analysis or coding new tools or having any fun or...
Everyone is learning from scratch in most cases and there is no clear path to accelerate that. Those who might create such paths are disincentivized to do so. Constantly recreating the wheel is costly when it comes to edumacation.
Almost no one does this full-time. Or even half-time. It limits depth of expertise in the subject matter.
No one has access to all the cool tools you can build which really can accelerate knowledge growth.
Some of these club analysts have access to sweet proprietary data that only exist inside those clubs! The bastards.
Top clubs have big budgets that they could spend on this if they saw the value. *again with the "call me" motion (mixedknuts@gmail.com)*
Seriously, if I couldn't walk into any non-Arsenal, non-Liverpool club at this point and introduce them to a single competitive edge that could get them a minimum of 3 extra points a season (which equates to millions of pounds in revenue depending on where a team ends up in the table), I would hang up my analysis boots right now. The advantage left on the table OUTSIDE OF RECRUITMENT is fuck-ing massive at nearly every football club in existence. (Says the person who was just made redundant by a football club. Well, two of them, actually.)
AHEM.
I know it sounds incredibly arrogant or mildly insane that someone who no longer works at a Championship football club believes these Champions League clubs are missing out on enormous competitive edges, but that's exactly what I am saying. My work and reputation at this point are pretty solid, right? You guys trust me not to bullshit you at least a little bit? So you understand there is no way I would insist this to be the case unless I absolutely believed it to be true, and could damn well prove my case in private to people who wanted to listen.
I don't know everything. Hell, I barely know anything. As a whole, we know the tiniest bit about how football works. But that's the thing about sports - it's not about knowing everything, it's about understanding more than your competition. Once you start looking at these competitive edge problems with the right perspective, you notice improvements all over the place just waiting to be exploited. I dunno man... I assume the guys who figured out on-base plus slugging percentage were shocked no one else was exploiting it. So were the ones who developed the 3-pointers-and-drives offense in the NBA.
As were the guys who developed the spread offense in American football, and countless other sports innovators. Hell, this is the epitome of Bill James' early and middle career. That's kind of where I'm at about all this stuff right now. (No, I did not compare myself to Bill James. He's a legend and I'm just barely getting started. I'm just saying he had an awful lot of knowledge that no one inside the sport really paid attention to for way too long a period of time.)
And the funny thing is, [SO MUCH REDACTED ELECTRICAL SHOCK TREATMENT]. Manchester United copying a single set piece (badly) is the tip o' the fucking iceberg, my friends. Back to the question at hand, I think certain public analysts are creating new things or at least derivative work that is really interesting. So some research might be state of the art. I am definitely not one of those people who think everything has been done already with event data.
In fact, I think we've barely scratched the surface of possibilities there, and I say this as someone who has done an awful lot of scratching. Of course, whenever a public person introduces new material, there's a chance it gets absorbed into the club IP sphere without so much as a thank you, but most people seem to do it as a-fun-hobby-that-could-eventually-some-day-in-the-far-distant-future-possibly-lead-to-a-job-that-doesn't-totally-suck? Do this stuff for yourself.
Make sure you are having fun with it, or go do something else with your free time. And if something happens, great, but don't wait by the phone for that girl to call because there are no girls who actually work in European football. Well, except Sarah Rudd, who happens to be one of the top people in the world in this area.
So if you are a girl, there is still hope that you can also be totally awesome and work in football! And then there would be two of you... Or three, I guess, if we count Marina Granovskaia at Chelsea, whom I have not met but who by all reports is also awesome. But I digress! (And I am clearly going to get in trouble with this, so it's probably time to move on and cut my losses.)
Two quick thoughts before I answer other questions. I have run across multiple club owners and directors of football in the last year who have no idea what an expected goal is. Or a shots model. Or almost anything else to do with football stats and data. And these were definitely smart people I was talking to. I mention this not because I was surprised, but to detail a tiny piece of knowledge that people who are in the analytics community take for granted, but which has almost zero penetration beyond this particular football niche.
You don't need to know about expected goals to succeed in football. It helps. When applied correctly, it can definitely allow you to make smarter decisions. But the fact that Arsene Wenger mentioned it once in a press conference and you once-in-a-very-rare-while see it appear in mainstream media does not mean the concept has disseminated among the masses, either in fandom or by those employed in football.
This also relates back to why I think first mover advantage still exists and is enormous. The second anecdote goes back to this excerpt from "The Arm" by Jeff Passan that appeared on Yahoo Sports earlier in the week. That part of the book is awesome and you should read it. Kyle Boddy is the guy developing high velocity pitchers that gets profiled in that piece, and he's been a friend of mine for over a decade now. We've been grumpy old men on the internet together since well before either of us were of an age to be considered old.
We have always been grumpy. Anyway, he's a genius, but this isn't about him, it's about this bit further down.
"The Dodgers were run by Andrew Friedman, the hyperintelligent president of baseball operations who had just left the Rays after a decade-long run of success. In Los Angeles, no budget bound Friedman. The Dodgers had just started an $8 billion local-television contract that allowed their annual payrolls to threaten $300 million. Even better, Friedman and general manager Farhan Zaidi were allowing Fearing to build baseball’s biggest, best think tank. They were seeking experts in quantitative psychology and applied mathematics."
That's where baseball is right now. The only way to win more consistently is to be smarter, even when you have one of the biggest budgets in the league. And yet the excerpt from Passan's book goes a very long way toward detailing how backwards and wrong baseball has been about the development of pitchers for the last decade (and longer). So much of the accepted conventional wisdom was completely and utterly incorrect regarding the most valuable position in the game. Now think about football, which is arguably twenty years behind where baseball is right now.
How much advantage is there for a club who sees a glimpse of the future, has the brain trust and money in place to invest in finding smarter ways to do things, and has the decision making structure to exploit the new intelligence that is discovered? In no other sport in the world do clubs control so many elements they can use to create advantages. Football clubs have access to academy players when they are children.
Teams in the United States almost never get access to players until they are 18 or older. Football clubs also have the potential to completely revamp their roster on a yearly basis if they want to. None of the U.S. sports can do that. Also unique is the worldwide access to talent via the transfer market, and the fact that more people in the world play football than any other sport. Advantageous styles of play. Tactical edges. Better players, for cheaper prices.
Better coaches. Better, different training.
The potential for discovering and leveraging marginal gains for performance improvements is astounding, and this is not pie in the sky stuff. These are things that have been delivered successfully time and again as other sports grew into their analytical ages. But like my wish for a media provider that sees the value in people flocking to their site to generate radars and shot maps on the daily, this vision of what's possible in football is nothing more than a dream that you hope someone with power or influence or money eventually turns into a reality.
That is a project I would like to be a part of. Moving on!
You know how everyone always says that the real strength of using data in sports is to help teams avoid making stupid mistakes? Someone introduce Boro to the concept. Don't get me wrong, they have a great head coach - the hardest position to recruit - and they are lurking right at the top of the Championship again.
That said, their value for money in the transfer market this season has been poor, and the wages they pay out... woof.
It's weird for me to say this, as I have written publicly about Rhodes being a very interesting and sometimes undervalued striker in the past. The reason my tone has changed is because his data changed. After being one of the best forwards in the Football League for almost half a decade, he's now posting below average numbers.
This is a big red flag, especially when it comes to forwards, and double especially when it involves a monster £9M fee plus add-ons plus wages. It's too early to say this is a mistake, but there is an enormous amount of risk attached to it. Risk that Boro will happily deal with and/or write off, should they finally make the promised land of the Premier League.
Spurs' centerbacks last year were a huge stylistic clash. Actually, given they had AVB before, it's weird that Spurs still had CBs that were slow and completely incapable of playing a high-pressing style even through last season.
They solved that problem really well this summer. There are countless examples you could roll through - from an unsuitable Mario Balotelli trying to fill a Sturridge/Suarez role at Liverpool, to crossing wingers being added to possession teams, to purely defensive fullbacks being added to teams that need dynamic over- and under-laps to unlock teams in the final third.
Recruitment is tricky in the best of circumstances, but it absolutely must start with a clearly defined style of play that you can then fit players into and around. And it needs a coach who can coach that style of play, or your team is likely to end up in real trouble.
I did not.
I did manage to get a picture with it, which ended up being really important to me because in this business it is so easy to forget your successes. I was lucky enough to play a tiny role in a team winning its first league title and making the Europa League knockout stages. Given that I started all this when I was on chemo, the day FCM held their trophy celebration meant a lot to me. I still get goosebumps when I watch the video.
The reffing might be the biggest single difference between the leagues. A lack of called fouls makes it a lot harder on skill players, and you definitely focus a bit more on body type when scouting, so that players can have more durability in the Championship.
That said, some fitness guys go too far in having players put on muscle at the expense of actually being able to play football, so there's a balance there. People don't perceive it this way, but there is probably only a small difference in quality of play between the bottom half of the Premier League and the top 6 to 8 teams in the Championship. Plenty of Champ teams now go up and stay there, especially if they have good coaches. Swansea, Southampton, Watford, Bournemouth, and Leicester are all archetypes of clubs that not only go up, but who can perform pretty well once there.
Off the pitch, I think the facilities are tremendously different, especially at the sides who have been established in the Premier League consistently. Then again, most of my time has been spent at one of the lowest revenue clubs in the Championship, so maybe other clubs are way more posh than I expect. The last thing I think is very different is the quality of head coach or manager. The foreign influx in the Premier League has been enormous, even more than with the players, and as of next season it will be absolutely loaded with top coaches. I think the Championship is still a bit behind that right now, but we are seeing more foreign coach recruitment there as well, so it may not stay that way for long.
True story: I was watching set piece training, and comparing it to what I saw at FC Midtjylland.
Maybe some day. Here's the thing - you only get so many minutes on a training pitch each week. And yet you have a ton of things you need to teach players about the next opponent, about their own performance, about how they need to develop... about everything. I think at any club, the usage of coach analysts is a great way to bridge this gap, from young players through the first team. And at most clubs I don't think this is happening at all.
Very little, but it does depend a bit on culture/country. I think player capacity for learning is hugely underestimated, especially in England. That said, you need to be really careful with what you introduce and how you introduce it. If I were starting somewhere new, I would do this, but very gently, and I would go out of my way to find out who would likely be receptive ahead of time.
This is funny, because we actually looked at this in detail at work, but it was with Andros Townsend in mind instead of Coutinho. I don't want to ruin it because I still plan to use Townsend in a future presentation and article, but I will say that Coutinho's average shot is about twice as good as Townsend's.
(This assumes that my script isn't horribly bugged, which is not a guarantee right now.)
While on this topic, two more fun facts from the Opta data set. First, Bayern's lightning quick wide player Douglas Costa clocks in at around .05 xG per shot average, which is startling for a Pep player and explains the whole two goals in 1700 minutes thing for him. Second, Alessandro Diamanti (briefly of Watford this season) had the worst goal expectation per shot of any high volume guy we looked at back when we were arguing at work. There was one season where his expectation - and I am absolutely not exaggerating this - was about one goal in every 40 to 50 non-penalty shots.
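To make those per-shot numbers concrete: if every shot carried the same average xG, the number of shots you would expect to need per goal is just the reciprocal of that average. Here is a minimal Python sketch under that simplifying assumption (real shots vary a lot in quality, so treat it as a back-of-the-envelope check). The function names are hypothetical and not part of any Opta tooling.

```python
def shots_per_goal(xg_per_shot: float) -> float:
    """Shots needed per goal if every shot carried the same average xG.

    Simplifying assumption: the average is treated as if it applied to
    every shot, so expected shots-per-goal is just the reciprocal.
    """
    if xg_per_shot <= 0:
        raise ValueError("average xG per shot must be positive")
    return 1.0 / xg_per_shot


def expected_goals(shots: int, xg_per_shot: float) -> float:
    """Expected goals from a shot count at a given average xG per shot."""
    return shots * xg_per_shot


# Roughly .05 xG per shot, as quoted for Douglas Costa above
print(round(shots_per_goal(0.05)))  # 20
# The 1-in-40-to-50 range quoted for Diamanti, expressed as xG per shot
print(1 / 40, 1 / 50)  # 0.025 0.02
```

So .05 xG per shot means roughly 20 shots per goal, and Diamanti's one goal in every 40 to 50 shots is an average of about .02 to .025 xG per shot.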
James already knows I am going to say player evaluation and transfers. People love to read it, which in turn means more people will be reading smart data pieces, which HAS to be a good thing, right?
I also think a lot more people should poke around in the same areas that Dustin Ward and Thom Lawrence have been researching (click their names for links to the articles). Their stuff is very smart and as cutting edge as it gets. I'd suggest more people follow in the footsteps of Will Gurpinar-Morgan and Martin Eastwood, but I am pretty sure the education and skills required to do so would be a massive hurdle for just about anybody, myself included.
Five years is a long time, and it's hard to stay dumb that long about transfers and stay in the Premier League unless the team is unconscionably rich. I think Arsenal rarely make mistakes in who they buy, so they probably win the award for most impressive.
On the other hand, I think they frequently make mistakes with who they don't buy, or who they sign to new contracts, but those last two things are almost entirely down to Arsene Wenger. I think Chelsea waste a ton of money every year buying confusing players that are highly unlikely to succeed, but they do have some fairly high profile hits as well. They also have a gigantic portfolio of player assets out on loan that could potentially benefit from better management.
Overall though, things aren't that bad there. United have been really poor at signing new players right up until this past season, when they got smart really fast. I think they found a source of good advice in the summer that directed them to good players, even if they seemed to dramatically overpay in almost every instance. It's been really difficult to see a consistent plan at Liverpool.
In fact, from the outside their recruitment has often looked like two rival factions, each getting half the players they wanted and then attempting to assemble a competent squad on the pitch. That seems sub-optimal, but who really knows the truth? If I had to pick one long-term PL club that has shit the bed consistently with regard to transfers over a five-year period, it has to be either Villa or Sunderland. Given the money spent, I'm pretty sure Sunderland win this one by a nose. (And to be fair to them, Villa had so much dead money immediately after the Houllier era, they actually couldn't spend any more and are still digging out of that hole.)
I don't know what happens behind the scenes at Sunderland, but the recruitment there has been horrific for just about as long as I can remember. Finally, if I'm picking a club terrible at recruitment that used to be in the Premier League but isn't any more, it's probably Fulham. From the Europa League final in 2010 to 21st in the Championship as of right now. Someone needs to pull the cord there, and soon, or they will follow Wigan and Wolves in plunging from the Premier League to League One in no time at all. Nearly 4500 words of blathering, all in response to questions by you.
My wife assures me that I am absolutely, positively going on holiday next week, which means no new content from yours truly on the site. Thankfully, since there is actual football on television this week, James and company will be back and better than ever. Even though I had to lose my job for it to happen, I have really enjoyed being able to write about football again this week, and I hope you have enjoyed reading it.
Note: All football data in this piece is from Opta and the visualizations were built using that information.
A long, long time ago – November 2014 to be precise – I was lamenting the state of public shot maps. The ones floating around at the time were okay, but they provided neither the clarity I was looking for, nor the scaling I wanted for use looking across periods of more than one game. This isn’t to say the public ones are bad – more that I wanted to see if they could be done better.
My initial thoughts were that we might be able to do a Goldsberry-style approach, adapted for football.
I explained this to my usual partner in crime, DOCTOR Marek Kwiatkowski (as he emphatically reminds me to call him). I have worked closely with Marek for years, and it’s safe to say he’s a bloody genius. The quality of my own work would be nowhere near as good as it is without his feedback. We’re like this…
Anyway, he was intrigued by the idea and started programming. What follows is a design diary for how this idea developed from a frustration to something that was used constantly by the now defunct Football Analytics Team during my days at Smartodds (Brentford and FC Midtjylland).
Nov 10th, 2014
Hi Ted, This is for Arsenal's home performance last season vs Cardiff (two late goals by Bendtner and Walcott). Legend:
circle = header, square = foot/other (triangle for own goals?)
Thick black outline = goal, medium black = on target, gray = rest
Colour = ExpG (the actual numbers are still wobbly, but it doesn't matter for the concept).
Some ideas:
* player numbers in the marker?
* half-sized marker for blocked shots?
Ted:
Things to consider:
Modified shapes for the following precursor events:
1) Throughball (Arrow?)
2) Successful Dribble (Triangle?)
3) Crosses (Chop 1/3 off whatever shape there is?)
Break colors into buckets for the following probabilities:
1-.8: Red
.79-.7: Red Orange
.69-.6: Orange
.59-.5: Orange Yellow
.49-.4: Yellow
.39-.3: Yellow Green
.29-.2: Green
.19-.1: Sequential Blues (from the PY spectrums you sent)
.09-.0: Sequential Purples
As you said, there's a lot going on in the lower ranges and it needs more attention. Adding two sequential color ranges should cover that, but I'm open to changes here.
Obviously with this I probably just broke the green outline concept for goals. For shots on target, a thin black outline is good.
I kind of think blocked shots should just show up as grey, as if they have been blotted out of existence because they don't have a real expG value, but I'm not 100% sure.
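The bucket scheme in this email boils down to a simple lookup. Here is a minimal Python sketch, assuming each bucket runs from its lower bound up to the next bucket's lower bound (which closes the small gaps in the written ranges, so a shot at .795 lands in Red Orange) and using the colour names as placeholders for real colour values. Blocked shots come back grey, following the suggestion in the email. The constant and function names are hypothetical.

```python
from typing import Optional

# (lower bound, colour name) pairs, highest bucket first.
# Assumption: each bucket covers [lower bound, next higher bucket's lower bound).
XG_BUCKETS = [
    (0.8, "Red"),
    (0.7, "Red Orange"),
    (0.6, "Orange"),
    (0.5, "Orange Yellow"),
    (0.4, "Yellow"),
    (0.3, "Yellow Green"),
    (0.2, "Green"),
    (0.1, "Sequential Blues"),
    (0.0, "Sequential Purples"),
]


def shot_colour(xg: Optional[float], blocked: bool = False) -> str:
    """Colour bucket name for one shot's xG value."""
    if blocked or xg is None:
        return "Grey"  # blocked shots are "blotted out" rather than coloured
    if not 0.0 <= xg <= 1.0:
        raise ValueError(f"xG must be a probability in [0, 1], got {xg}")
    for lower, colour in XG_BUCKETS:
        if xg >= lower:
            return colour
    return XG_BUCKETS[-1][1]  # not reached for valid input; kept for safety


print(shot_colour(0.34))                # Yellow Green
print(shot_colour(None, blocked=True))  # Grey
```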
Long-term we can make these into an interactive app that has mouseover information for more detail.
Do you think adding player numbers inside the shapes will work or too much noise?
Marek: Hey, a few new versions.
Three versions: Manual, Goldsberry, Brewer. Manual = me trying to follow email, Goldsberry = colour-picked from him directly, Brewer = from colorbrewer2.org.
Outlines still to be worked out, unfortunately with the built-in scatter function I don't have enough control to do the black&white one. I can look into writing a custom scatter later.
In general, I think we are at the limit of the info we want to pack into these charts. I'm already not a fan of the dots/hatching, or even the many marker shapes.
As to the colours, I think you were right to mention the ExpG distribution itself. We should just partition it into ~10 classes of equal size and colour code these with a nice sequential map. It is in essence what Goldsberry is doing, I think. The downside is that the class boundaries will be at awkward ExpG values, but at the end of the day I'm not sure we care about that.
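Partitioning the ExpG distribution into about 10 classes of equal size amounts to cutting it at its deciles. One way to sketch that with Python's standard library, assuming you already have a list of xG values for the shots you want to colour (the function names are hypothetical, and the data below is a toy list rather than real shots):

```python
import bisect
import statistics
from collections import Counter
from typing import List, Sequence


def equal_size_class_edges(xg_values: Sequence[float], n_classes: int = 10) -> List[float]:
    """Cut points splitting the xG distribution into n_classes of roughly equal size.

    statistics.quantiles returns n_classes - 1 cut points (deciles for n=10).
    """
    return statistics.quantiles(xg_values, n=n_classes, method="inclusive")


def xg_class(xg: float, edges: Sequence[float]) -> int:
    """Class index for one shot: 0 is the lowest-xG class, len(edges) the highest."""
    return bisect.bisect_right(edges, xg)


# Toy data (not real shots): 100 evenly spaced values from 0.01 to 1.00
toy_xg = [i / 100 for i in range(1, 101)]
edges = equal_size_class_edges(toy_xg)
counts = Counter(xg_class(x, edges) for x in toy_xg)
print(sorted(counts.values()))  # ten classes of 10 shots each
```

As Marek says, the cut points land at awkward xG values. With a real shot sample, which is heavily weighted toward low-probability attempts, most of the cut points will likely sit in the low range, which is where the extra colour resolution was wanted in the first place.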
Ted:
Cool! Excellent effort. So much to process here, but that's good.
Now we can filter down what works and what doesn't. Of these I like Gradient + Manual best. Gradient + Brewer probably second, though it's close.
I hate the dots - they just don't work.
Get rid of the directional cross arrows.
It was a really good idea, but too information dense for the first pass.
Make headers circles (intuitive), regular shots hexes (or squares).
Stars and throughball triangles are pretty good, actually.
Black and green outlines aren't that bad. Maybe have no outline at all on normal shots? Obviously the legend will need to be crystal clear on meaning, but I think that will come quickly with usage as well.
Marek: Getting there a little bit, perhaps? I quite like this one.
The colormap (I know I'm anal about it) is the right half of 'jet' from this link. I can now easily try any section of any colormap there if you want more samples.
The goal outline works better with regular shapes to my eye (ie triangle and star are a bit iffy), but it's still easily the best I've tried. Hexagon works better as default marker imo: the difference b/w headers and shots doesn't jump at you, but it's clear enough to pick it up immediately when you want to.
Ted:
This works. We'll need to build a spiffy, detailed legend to explain it and then I'll work on the poster display for the top section over the next few days for the info display there.
At this point, Ted realizes he is WAY out of his depth trying to be helpful and this looks terrible. Therefore he asks actual professional designer and all around awesome dude @bootifulgame for feedback.
@bootifulgame sends back this, which makes Ted feel bad about how dumb he is, and about his life, and the fact that he’ll never be able to make truly pretty things.
It just goes to show what you can do with an awesome professional designer involved and not just data dorks trying to solve problems. Alas, the final versions never quite looked as amazing as Ben's.
These are some further test versions Marek did for single game plots.
For individual player seasons, they looked like this:
And for team seasons, they look like this:
What Changed?
There was some further tweaking to come.
Marek got rid of the Super Mario Brothers star for successful dribbles and moved to diamonds. The rest of the markers are fairly intuitive.
The lowest color on the plot was changed to .05 or lower probability.
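Pulling the marker decisions together (circles for headers, hexagons for regular shots, triangles for throughballs, diamonds for successful dribbles, and outlines for goals and shots on target), a per-shot style lookup might look like the Python sketch below. The `Shot` fields, the rule that a precursor event overrides body part, the outline widths, and the choice of no outline on other shots are all assumptions for illustration. The marker codes are matplotlib's ('o', 'h', '^', 'D'), and the dictionary keys match keyword arguments of `matplotlib.pyplot.scatter`.

```python
from dataclasses import dataclass
from typing import Optional

# matplotlib marker codes: 'o' circle, 'h' hexagon, '^' triangle, 'D' diamond
HEADER_MARKER = "o"
DEFAULT_MARKER = "h"
PRECURSOR_MARKERS = {"throughball": "^", "dribble": "D"}


@dataclass
class Shot:
    """Hypothetical minimal shot record, not a real Opta schema."""
    body_part: str                   # "head", "foot" or "other"
    precursor: Optional[str] = None  # "throughball", "dribble" or None
    outcome: str = "off_target"      # "goal", "on_target", "off_target" or "blocked"


def marker_style(shot: Shot) -> dict:
    """Marker shape and outline for one shot, keyed like pyplot.scatter kwargs."""
    # Assumption: a precursor event (throughball, dribble) overrides body part.
    if shot.precursor in PRECURSOR_MARKERS:
        marker = PRECURSOR_MARKERS[shot.precursor]
    elif shot.body_part == "head":
        marker = HEADER_MARKER
    else:
        marker = DEFAULT_MARKER
    # Assumed widths: thick outline for goals, thin for on target, none otherwise.
    if shot.outcome == "goal":
        edgecolors, linewidths = "black", 2.0
    elif shot.outcome == "on_target":
        edgecolors, linewidths = "black", 0.75
    else:
        edgecolors, linewidths = "none", 0.0
    return {"marker": marker, "edgecolors": edgecolors, "linewidths": linewidths}


print(marker_style(Shot("head", outcome="goal")))
# {'marker': 'o', 'edgecolors': 'black', 'linewidths': 2.0}
```

Because `scatter` takes a single marker per call, in practice you would group shots by marker and call it once per group, passing per-shot outline colours and widths as lists.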
There’s also plenty of other information that can be added in the legend, and you can make a million different data cuts for what you want to see. Open play is an obvious one here, but there are plenty of others.
They still get really busy for full-season maps for teams. Unfortunately there isn’t much you can do about that. We have a couple of different styles that were built later that try to suss out trends with less noise, but they also have issues.
Conclusion
So there you are, the MK (Marek Kwiatkowski) Shot Map variation and a detailed explanation of how they went from Marek fooling around on a problem to something attractive and useful. Combine them with expected goal race charts (originally seen in hockey, but something 11Tegen11 posts frequently on his Twitter account) and you end up with a fairly complete unit of game analysis, at least when it comes to shots.
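An expected goals race chart is each team's cumulative xG plotted against match time as a step line. Here is a minimal Python sketch of building that data, assuming a hypothetical list of (minute, team, xG) tuples, one per shot. The plotting itself (for example, one step plot per team) is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: one (minute, team, xG) tuple per shot
ShotEvent = Tuple[float, str, float]
Series = List[Tuple[float, float]]


def xg_race(shots: Iterable[ShotEvent], teams: Iterable[str],
            full_time: float = 90.0) -> Dict[str, Series]:
    """Cumulative xG step series per team, from kick-off to the final whistle."""
    series: Dict[str, Series] = {team: [(0.0, 0.0)] for team in teams}
    totals: Dict[str, float] = defaultdict(float)
    for minute, team, xg in sorted(shots):  # process shots in time order
        totals[team] += xg
        # setdefault keeps a team that was not listed in `teams`
        series.setdefault(team, [(0.0, 0.0)]).append((minute, totals[team]))
    for points in series.values():
        # extend each line flat to full time (or to the last shot, if later)
        points.append((max(full_time, points[-1][0]), points[-1][1]))
    return series


race = xg_race([(10, "Home", 0.1), (30, "Away", 0.3), (55, "Home", 0.25)],
               teams=["Home", "Away"])
print(race["Away"])  # [(0.0, 0.0), (30, 0.3), (90.0, 0.3)]
```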
Enjoy the Easter holidays, and maybe if I get some time next week, I'll explain a bit more on how to use these charts to analyze team trends.
During the World Cup, I made a bunch of Gifolutions of how different players' statistical radars have evolved over time. Two that I never got around to were Cristiano Ronaldo and Lionel Messi. This is odd because as any hit-whoring writer knows, those two guys will get you the most hits, period, but life moves fast, you know?
Messi's radar barely changes over the period of data I have - he pretty much lives in the top 5% of every forward stat ever, every single year. The little lion is metronomic in his alienness.
I didn't get around to Ronaldo because Portugal went out of the World Cup fairly early, but Cristiano is different. He's evolved over the years from a slightly gangly kid full of flash and promise into a man mountain of power, pace, and technique.
What's interesting is that Ronaldo's statistical output has changed as well. It's rare to see a player who is nominally a wide forward score at the levels that Ronaldo does. It is even weirder - unprecedented, actually - for anyone in the world to shoot as much as Ronaldo does. In fact, over the five years prior to this one, Ronaldo led every player in the world in shots per 90. By a mile.
No one shoots more than 7 times per 90 minutes.
Except Cristiano.
That's like 60-70% of the entire game volume for half of the teams in the world! The locations of his shots weren't always what one might wish for, but when a player scores goals in the volume that Ronaldo does, you kind of have to accept a little bad with all that good.
This year, Ronaldo is a little different. As his age creeps up on 30, his shot volume has gone down (by nearly 2 shots per 90 from last season, a jaw-dropping amount in itself), but his efficiency is way up. This translates to MORE goals, not fewer. Assists to teammates are also way up, which means that as he heads into his thirties, Ronaldo is having his best scoring season.
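Shots per 90 is just a raw count normalised to a full match, and goal output is roughly shot volume times conversion rate. A small Python sketch of both follows. The numbers in the example lines are hypothetical, chosen only to show how fewer shots can still mean more goals, and are not Ronaldo's actual figures.

```python
def per_90(count: float, minutes: float) -> float:
    """Normalise a raw count to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90.0 / minutes


def goals_per_90(shots_per_90: float, conversion: float) -> float:
    """Goal rate implied by shot volume times goals-per-shot conversion."""
    return shots_per_90 * conversion


# Hypothetical: 210 shots in 2700 minutes is exactly 7 shots per 90
print(per_90(210, 2700))  # 7.0
# Hypothetical: dropping to 5 shots per 90 while conversion rises from 12% to 20%
print(round(goals_per_90(7, 0.12), 2), round(goals_per_90(5, 0.20), 2))  # 0.84 1.0
```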
By having a little more confidence in the incredible cast around him at Real Madrid, he's finally become a truly complete player. It's impossible not to respect a guy who keeps learning and developing, even when at the pinnacle of world football.
This is generally a blog about analytics and football, and I’ll get there soon, but I was thinking recently about what might have happened if Bill James, the father of the baseball stats movement, had been hired by baseball teams back in the early ’80s. (For those who want more reading about James that doesn’t directly involve reading about baseball stats, his Wikipedia bio is here, and Moneyball has lots of info as well.)
James’ Baseball Abstracts, the early editions of which were written when he was a security guard at a pork and beans canning plant, were the spark for most of the modern statistics movement in baseball. It’s often hard to pinpoint the tipping points in time, but Baseball Abstract was fairly clearly one, and most of the modern baseball stats writers were hugely influenced by his work. That in turn led to guys like Billy Beane, General Manager of the Oakland A’s and primary focus of Moneyball, doing what he did, and the Red Sox hiring James as a consultant in 2003.
The first Baseball Abstract was in 1977. It would take 26 years for James to officially work in Major League Baseball. That is a long time to build a body of work and public base of support, and to teach and educate interested parties.
What if James had been adopted early on by some ownership group and the majority of his work had been kept secret, for use by that team alone? Obviously, that team would have benefited, but the world almost certainly would have been a poorer place.
Don’t believe me? Here’s a quote from Moneyball that I still find shocking almost every time I read it.
“The legendary GM Branch Rickey employed a professional statistician named Allan Roth who helped to compose an article under Rickey’s byline in Life magazine in 1954 that argued the importance of on-base and slugging percentages over batting average.”
1954! And yet it would take nearly half a century more and the publication of Michael Lewis’s book before most of the baseball world caught on to these core concepts.
The other thing to realize here is that the baseball stats movement eventually triggered movements in other major sports. Baseball had the luxury of detailed box scores with a century of usable (if not always useful) data available. And while James often lamented the quality of the data in his early work, which eventually led to the formation of STATS Inc, at least he had something to work on that consisted of more than batting average, home runs, and ERA.
Coming back to the question posed at the start: what happens if James doesn’t have all that influence? Does someone else pop up to take his place near the exact same time? Or does the absence of his high profile public work retard the development of baseball stats for another decade, and thus contribute to a drag on the development of statistical analysis in other sports as well?
Obviously we’ll never know, but it certainly could have happened. All it would have taken was one smart owner reading an early Baseball Abstract and POOF, James would have been sucked into the sky, while decades of future work would be gone.
What’s the Point of Rambling About Baseball?
Bear with me as I draft in Gabe Desjardins for a little guest spot.
Gabe is one of those brilliant hockey analytics guys that football has stolen a lot of concepts from over the last few years, but he was also writing original, fantastic work about football analytics back in 2010, before WhoScored or Squawka, or really any public data even existed. He was doing this stuff as a sideline to his hockey blog before most people even thought about it.
For those who don’t follow the NHL, there have been a huge number of hires this summer by professional hockey teams targeting statistical bloggers. Two of the most prominent were Tyler Dellow and Eric Tulsky, but the Toronto Maple Leafs also hired the guys behind extraskater.com, the primary hockey data site, and in the process shut the whole site down so no one else could use it.
It has been a crazy summer for anybody intelligent who was doing hockey analysis, but as Gabe explains here, it’s not really a new thing inside the walls of many teams.
Now, as I mentioned, Gabe was crunching soccer stats before most of us knew the data existed. And Gabe’s point about hockey analytics holds true for football as well.
(I wanted it to exist back in 2004-05 so that I could work on it, but there were no public sources. In fact, I have a notebook from a trip to Prague in 2005 with the business outline for a company just like Opta to collect football stats and do analytics. I knew Opta existed because their name was on the Premier League home page, but all we ever saw of their work was the ridiculous Opta Index - a single, useless black box number evaluating a player.)
You know who else has been crunching data for ages? Gavin Fleig. He’s currently Global Lead for Talent Management at Manchester City, but he started out way back at Bolton Wanderers with Sam Allardyce, and they crunched data and built game models to help Bolton punch well above their weight for quite a few years.
The same can be said for Steve Houston, currently of Sunderland, but formerly of Chelsea, Hamburg, and the Houston Rockets.
And Ian Graham, currently Director of Research at Liverpool, but formerly at analytics company DecTech. (Graham actually has a small archive listing at the SoccerAnalysts Blog dating back mostly to 2011! The things you find on the internet.)
There are a number of guys who have been working with soccer data inside of clubs for much longer than you might expect. Here’s an early Sloan Sports Conference soccer panel with all three of those guys plus Blake Wooster, formerly of Prozone and currently of 21stClub, discussing data stuff back in 2011. (I would embed it, but it's not on YouTube.)
Never heard of any of those guys before now? This wouldn’t be a huge surprise, especially if they don’t work specifically for your club, because they all work IN clubs. Therefore whatever their research uncovers is all top secret.
What I found fascinating, however, is how clearly all of them communicate in that panel how they wish there were more statistical analysts around. Football has tons of sports science analysts and minuscule numbers of stats geeks doing good work. These guys want to know more, and they want to read you and me writing it.
In fact, as Fleig explains in an interview here, that desire was a big part of the impetus behind Manchester City releasing their data set to the public back in 2012.
They knew fans needed to have the data in order to be able to ask and answer interesting questions about how football works. And they probably knew from American sports that increasing data availability actually triggers an enormous increase in fan interest and involvement from certain groups of fans (basically anyone who might play fantasy football). It’s clear that all of these teams wanted more people doing research about the game and hopefully writing about it, so that they could learn additional useful info for free.
The StatsBomb twitter account is dense with followers who work for teams, either publicly or privately. I know for a fact that a surprising amount of the work done by guys in the analytics community has been read and adopted into football teams already.
Free labor, plus competitive advantage if you know how to apply it. It’s hard to beat that sort of thing.
Two more people doing kickass stuff way back are Sarah Rudd and Ravi Ramineni. Ravi works for the Seattle Sounders in MLS now, while last I heard Sarah Rudd was a vice president at StatDNA. I also heard rumors that StatDNA was the analytics company purchased by Arsenal two years ago, but I can’t confirm that because despite looking all over the place, I never did see a name mentioned in the press. Assumption: Arsenal bought a fantastic analytics company who were totally ahead of the curve two years ago and who have probably continued to innovate since. Whether Wenger and co leverage that information is another question entirely.
Anyway, the point in all of this was that analytics usage in soccer/football isn’t new, but it’s also not terribly widespread. Some of the stuff we’ve done on StatsBomb might be new research, and was only possible after WhoScored and Squawka appeared, and after we took a ton of our own time to put that information into crunchable form.
On the other hand, much of the work we’ve done on StatsBomb has probably already been done at many clubs throughout England. This is hugely frustrating for me: despite reading everything I can get my hands on in this area for at least the last two years, there just isn’t that much new research being published.
Why did we have to redo all the work? Because football doesn’t have a Bill James or Rob Neyer. Or Gabe Desjardins, Vic Ferrari, Tyler Dellow, and Eric Tulsky of hockey fame. Or Dave Berri, John Hollinger, Zach Lowe, and Kirk Goldsberry (plus many others) of the NBA. Without that sort of long-term public framework to stand on, analysts reinvent the wheel again and again as they start to ask and answer the interesting questions about how the game works.
Bill James produced a book a year on this stuff from when he started in 1977, and a huge number of other writers sprang out of the interest in his work. Football pretty much has two books about stats, total (Soccernomics and The Numbers Game).
Yes, the way the world publishes things is different, and the total blog publication of what we've done would certainly stand up to any of James's busy years, but still... two books total. Football might develop a Bill James in a few years, but I think the odds are against that happening, and here’s why.
Unless you actually hate the game or they offer stupidly low compensation packages, it’s hard to turn down football clubs when they come calling.
And honestly, if you are putting all this work into crunching the stats, you almost certainly love the game.
So there you are, doing work, wishing for more/better data, and writing about it in public. You build a following, and start to have some interest from media and the occasional private email or DM from clubs asking about your work. Eventually that culminates in someone giving you a job offer to stop working in public, but to have a potentially real impact on an actual football club, with a fuckton of data including the secret stuff that in some cases no one really admits exists.
Poof, much like the myriad of hockey bloggers this summer, you get sucked into the sky and your future (and in some cases your past) work disappears with you.
It’s possible hockey research will experience a rough year or three now as well, since it will take time for new writers to fill the massive holes left by the most recent hiring sprees.
The funny thing is, if 10 Premier League teams immediately wanted to find and hire statistical bloggers, I’m not sure they could do it. And if another 10 clubs from the Championship and Spain and Germany wanted to find writing talent for immediate hires, they definitely would hit a wall when trying to hire amongst the football blogging community. There simply aren’t enough people out there writing, period, let alone enough who have displayed the kind of skill in analysis, math, and attention to detail that the hockey guys were showing.
Why?
To sum up, I think it comes back to three things.
1) Huge problems finding detailed data to crunch. American sports have had these issues off and on at varying levels, but in Europe the data is extremely expensive to buy, most data providers don’t have a public face, and those that do always have to keep an eye on the bottom line. It’s doable, but it’s certainly not easy to get started.
2) No Bill James-type figure to push the development with a huge body of public work because...
3) Every time a potential figure shows up, they get hired by clubs. This creates a big competitive advantage for the hiring club, but it retards the development of the discipline as a whole.
Back to the title question – what if Bill James had been hired in the early 80’s?
The development of baseball statistical analysis would have probably taken a lot longer to happen, which in turn might have delayed improvements across any number of other sports.
In fact, you might say that baseball even a decade or two after taking James out of the ecosystem would have ended up looking a lot like football/soccer analysis does today.
Michael Caley stated over the weekend that his model had Liverpool finishing 4th “by default.” He said basically the same thing about Arsenal finishing 3rd. After a weekend when so many “contending” teams imploded, this felt right on so many levels.
Those two teams are classic “hey, we’re not very good right now, but at least everyone else is worse!” material. Spurs are still wandering through the desert of Pochettino’s press, trying to find themselves. They registered all of seven shots against West Brom on Sunday while playing at home.
Seven. Spurs fans don’t even need their toes to count that high.
In fact, they’ve only averaged 11.6 shots a game so far this season, good enough for 12th in the league. Eeeeew. Poch is a good manager, but his style will take some getting used to. Spurs aren’t there yet. They will need to get there soon, or Thursday will remain Spursday forever more.
Speaking of teams that also aren’t there, Everton are waaaay down in 14th right now, only notching a solitary win across five matches. To be fair to them, they have faced Arsenal and Chelsea, plus the mighty Leicester in three of those five, and the loss to Palace was probably a bit unlucky as well, but a point a match to start the season just isn’t going to get you to the Champions League. They also have a tiny squad compared to most of the PL teams competing in Europe, meaning accumulated injuries will be a real problem.
Sitting just above Everton in the league table, but only because of goal difference, are the Gaalacticos. Despite finally opening and spending the entirety of Scrooge McDuck’s vault (normally used as a Glazer swimming pool) on players this summer, Manchester United have struggled. The shocking thing is that the struggles have come against the softest stretch of schedule they will have all season. Common wisdom had United roaring out of the gates, as the quality of their new purchases was likely to prove too much for Swansea, Sunderland, Burnley, QPR, and Leicester to handle. That hasn’t happened and now it’s anyone’s guess how long it will take for van Gaal to turn this band of highly paid misfits into a real football team.
For neutrals, the struggles of Manchester United stretching into a second season make for glorious theater. United fans, meanwhile, must feel like they are looking in the mirror and seeing themselves staring back, glassy-eyed and wearing Liverpool kits.
Despite their own issues, I think Arsenal are okay. Much like Liverpool last season, they have enough firepower up front to blow through most of the middle-to-bad teams without much resistance, though they will ship goals on occasion. This is what happens when despite years of pleas, Wenger once again fails to strengthen the squad at defensive midfielder and center back. The positive news is that their shot differential is finally good again. If they can avoid further defensive injuries and potentially strengthen in January, they should be able to keep the top of the table in sight.
Liverpool is another matter...
Tactical systems have tradeoffs. If your tactics are too defensive, your team will struggle to score goals. Sell out in attack, and even teams at the lower end of the table will score against your defense. Press poorly and teams will break through it and find themselves with great chances in transition. Fail to press at all and your team will likely be bitten regularly by the jaws of probability, as at some point all those longer range shots will start to fall.
Last year, Liverpool opted for offense and a fairly aggressive press. In a sense, they sold their souls to the devil in exchange for old Steven Gerrard’s legs being able to both break up opposition attacks, and fire long passes out to Sturridge and Suarez on the counter.
This year the bill seems to have come due.
What happens when Gerrard, playing regista, no longer racks up defensive stats? Apparently, you lose. I think Liverpool upgraded along their back line, with Moreno, Manquillo, and Lovren all improving their quality at the back. That said, most teams have one or two other bodies in midfield to break up opposition attacks. Liverpool frequently have none. Even the best center backs in the league are going to look foolish when facing top attackers running at them constantly. At some point you have to conclude that the problems for Liverpool either lie with the personnel in front of the center backs or that they are systemic, or both.
Steven Gerrard is a legend, but either the system changed for the worse this year, or Gerrard just can’t put in the same miles as he did last season. Emre Can is talented, but 20. Joe Allen is frequently (read: constantly) injured and he doesn’t even have to train for Roy Hodgson. Lucas Leiva is... well, Lucas.
It’s possible Rodgers could try and keep the system intact by putting either Henderson or Allen in the Gerrard role. It’s also possible Liverpool should have bought an identical-output-but-differently-named regista this summer as succession planning for when Stevie finally hits the wall. All I know is that Rodgers needs to fix this problem fast or one of these abysmal challengers will finally start performing at a decent level and take their Champions League spot.
Quick Stats
The current top 10 for non-penalty goal scoring rate looks like this (players must have played at least half the available minutes so far to qualify).
While the top 10 for combined scoring rate looks like this:
It’s too early to draw any real conclusions other than uncontroversial things like, “Angel di Maria is an outrageously talented footballer.”
If you tried to draw additional conclusions at this point, you might end up being really confused by the fact that both Joey Barton and Leroy Fer are in the top 10 in key passes thus far, and say ridiculous things like, “Hey, QPR must be really good.” Not so fast, my friend.
The only thing I can promise is that both of these lists will change a bunch between now and the end of the season.
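For anyone who wants to build lists like these themselves, here's a minimal Python sketch of the mechanics. It assumes non-penalty goals per 90 (NPG90) is (goals minus penalty goals) × 90 / minutes, and that the combined scoring rate is non-penalty goals plus assists per 90; neither formula is spelled out here, so treat both as assumptions. The qualifier is the half-the-available-minutes rule above. PlayerSeason, per90, npg90, combined90, top_n and the three players are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    """Hypothetical per-player totals for the season so far."""
    name: str
    minutes: int
    goals: int
    penalty_goals: int
    assists: int


def per90(count: float, minutes: int) -> float:
    """Scale a raw count to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90 / minutes


def npg90(p: PlayerSeason) -> float:
    """Non-penalty goals per 90: penalties are removed before scaling."""
    return per90(p.goals - p.penalty_goals, p.minutes)


def combined90(p: PlayerSeason) -> float:
    """Assumed combined scoring rate: non-penalty goals plus assists, per 90."""
    return per90(p.goals - p.penalty_goals + p.assists, p.minutes)


def top_n(players, available_minutes, key, n=10):
    """Rank by `key`, keeping only players with at least half the available minutes."""
    qualified = [p for p in players if p.minutes >= available_minutes / 2]
    return sorted(qualified, key=key, reverse=True)[:n]


# Hypothetical players after five matches, i.e. 450 available minutes.
players = [
    PlayerSeason("Player A", minutes=450, goals=5, penalty_goals=2, assists=0),
    PlayerSeason("Player B", minutes=300, goals=1, penalty_goals=0, assists=3),
    PlayerSeason("Player C", minutes=120, goals=2, penalty_goals=0, assists=0),
]

for label, key in (("NPG90", npg90), ("Combined", combined90)):
    ranked = top_n(players, available_minutes=450, key=key)
    print(label, [(p.name, round(key(p), 2)) for p in ranked])
```

Player C has the best raw rate (1.5 NPG90), but 120 of 450 minutes misses the cut, which is exactly why a qualifier exists this early in the season. Player A leads on NPG90 (0.6 vs 0.3), while Player B leads once assists count (1.2 vs 0.6), so the two lists can look quite different.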
Burnley have played 5 games and scored a grand total of 1 goal. Sean Dyche’s team were promoted last year on the back of a stout defense, and they seem to have brought that with them to the Premier League, only conceding 4 goals in the campaign. The problem is that they will need to score occasionally too - 33 more 0-0 or 1-1 scorelines won’t be enough for safety.
Interestingly, Burnley’s shot ratio looks alright, especially for a promoted club that has faced Chelsea, Swansea, and Man United already. We’ll see if the goalscoring regresses to expected levels, or if Burnley fans are in for a long, hard, boring season. The math models suggest they have a better chance of staying up than preseason odds indicated, assuming that at some point they actually manage to score a couple of goals.
In January, I looked at what usually gets teams relegated from the Premier League and discovered that teams conceding an average of 16 shots or more per game almost invariably circled the toilet bowl. (For the record, last season’s relegated teams conceded 18.2, 17.8, and 15.4 shots a game. Norwich were the oddballs, while both West Ham and Sunderland gave up more than 16 shots a game but survived.) Early season numbers have a ton of volatility, but the three teams over the 16-shot threshold are: Leicester, Hull, and Swansea, with Palace fourth worst at 15.4. This will be something to keep an eye on through the autumnal months.
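The 16-shot rule is simple enough to write down. A minimal sketch, assuming shots conceded per game is just total shots conceded divided by games played; the threshold and last season's three figures come from the paragraph above, while the function names and the 82-shots-in-5-games team are hypothetical.

```python
# Threshold from the January relegation analysis: shots conceded per game.
RELEGATION_SHOT_THRESHOLD = 16.0


def shots_conceded_per_game(shots_conceded: int, games: int) -> float:
    """Average shots conceded per game played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return shots_conceded / games


def at_risk(conceded_per_game: float, threshold: float = RELEGATION_SHOT_THRESHOLD) -> bool:
    """Flag a team whose average shots conceded meets or exceeds the threshold."""
    return conceded_per_game >= threshold


# Last season's three relegated teams, using the figures quoted above.
relegated_last_season = [18.2, 17.8, 15.4]
flags = [at_risk(x) for x in relegated_last_season]
print(flags)  # [True, True, False]

# A hypothetical team that has conceded 82 shots in 5 games (16.4 per game).
print(at_risk(shots_conceded_per_game(82, 5)))  # True
```

This is a warning light rather than a verdict: the 15.4 side went down anyway, while West Ham and Sunderland cleared the line and survived. And five games is a very small sample.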
Four games, 11 goals scored, 0 goals against for Barcelona. And Luis Suarez hasn’t been anywhere near the squad. They are conceding six shots a game so far. That’s obscene, even by Barcelona standards.
Through four matches in the Bundesliga, Hoffenheim have conceded 2 goals. They gave up more than two per game on average last year. They are still conceding a ton of shots though, so expect this one to come back to earth, and soon.
Oh, and Paderborn and Mainz currently top the table in Germany. Just like we all expected...
An Apology
I am currently transitioning between two jobs, which means content from me will be less frequent and is likely to stay that way until the first of the year. As much as I enjoy writing and researching, there just aren’t enough hours in a day right now to keep up with the demands of my career and family, while still producing high quality stuff here as well.
This doesn’t mean I won’t be writing at all, but it does mean updates from me may be spotty. I merely look at it as a good occasion for the rest of you to maybe start writing your own material and fill in?
There was a wordy preamble here, but I currently live on a building site, and the electrician inadvertently fried my working copy, so now you are spared such madness before we dive into the good stuff.
Falcao vs. Welbeck - To the Pain!
Regardless of whether or not it’s fair, these two men will be linked to each other based on the events of their transfers. Falcao arrived in Manchester at the tail end of deadline day, either on an outrageously expensive loan or a gobsmackingly expensive transfer (as Guillem Balague reported was the story told by Falcao’s agents). Also changing teams in those final hours was Danny Welbeck. Falcao’s arrival and Welbeck’s departure were nearly simultaneous, and the fee United garnered for Welbeck will pay for at least the first year of Falcao’s services.
The reality of the situation is that Falcao’s arrival was probably as a replacement for Robin van Persie, who may be gone for the season with another serious injury. However, the question the stats guys found themselves asking was:
If you have Danny Welbeck, why do you need Falcao?
Most fans will find that to be a stupid question on the surface – why are we comparing the performance of one of the world’s best number 9’s against, well... Danny Welbeck? Well, because there is a huge question mark over whether Falcao should be labeled as one of the world’s best any more.
Consider the following: While Falcao’s goalscoring at Atletico Madrid was very good, it was partially fueled by a high number of penalties. This took him out of the Ronaldo or Messi range of scoring contribution and returned Falcao to the land of mere (incredibly talented) mortals. Additionally, after moving to an easier league (Ligue 1 vs La Liga), Falcao’s scoring rate actually went down, from .62 to .45 non-penalty goals per 90. Add to that the fact that Falcao turns 29 this season and is just coming off a serious knee injury, and you have the basis to convene a grand jury.
Or um... to just ask the question above.
Now consider Danny Welbeck. In 12-13, I definitely said bad things about Danny Welbeck. In that season, he could not hit the broad side of a barn, and because of this he looked completely out of place as a goalscoring threat for either Manchester United or England. Since that time, however, I have learned a lot, and done a great deal of research into what to expect from young goalscorers. In short... I was wrong.
Welbeck’s four-year goalscoring trend looks like this: .31, .40, .07, .56. One of those seasons is not like the others... was it a blip or some glaring problem? I’m going to lean toward blip, if only because the rest of the trend is so clear.
Welbeck isn’t even 24 yet. That’s an excellent scoring rate for a young player, especially one who rarely played as a center forward. Yes, there are some sub effects included in that data, but there’s also the fact that he was often played out wide, where it is harder to score. What Welbeck doesn’t have is elite shot generation numbers, but he has speed (something Arsenal have desperately lacked without Walcott in the lineup), good feet, and is used to playing combination passes with talented players. Arsenal do generate a ton of shots for their center forwards (Sanogo has seen and missed a ton of them), and Welbeck will get good chances. At only £16M, Wenger somehow ended up getting a huge discount for a young scorer who is just entering his prime.
Whether Welbeck will have a better career than Falcao isn’t what’s up for debate here. Falcao has had some amazing seasons that Welbeck might have to get lucky to match. The question is whether, this year and next and the year after, Welbeck will score more goals than Falcao. Assuming they play the same amount of time, and even if they cost the exact same amount per season, I know who I would bet on.
I think Welbeck will be good for Arsenal, and there’s even an outside chance he will be a great goalscorer over the next four or five years. However, Sturridge is one of the best goalscorers in the world right now, and it would come as quite a surprise if Welbeck were to reach those heights.
Arsenal’s Glaring Deficiencies
Arsenal went into this transfer window with a couple of needs. First, they needed to replace their right back, who was leaving. Beyond that they needed an elite center forward. They also needed an elite, playmaking defensive midfielder, and depending on what happened with Thomas Vermaelen, they needed to add depth at center back. In my opinion they also needed to find an elite left-sided wide forward, and though they were rumored to bid for Marco Reus, that deal was never very likely to materialize.
They somehow managed to address the right back and center forward needs (doing perhaps better than I expected with the attacking spot), while completely failing to bring in a dynamic defensive mid or adding to the defensive cover at CB.
How could a team that had clearly defined needs from the start of the summer and an enormous stockpile of cash let the window close without finding players to fill those positions? Only Wenger knows. The most likely scenario now seems like they will buy Rabiot on a Bosman next summer while missing out on any CB targets that Wenger felt would fit in the team, but it’s a huge gamble in squad depth, especially with Mikel Arteta already out injured.
Arsenal currently lead the league in shots for, shots conceded, passing completion, and possession. However, part of that is due to the fact that they’ve faced two likely relegation candidates in their first three matches.
There is an adjustment Wenger can make to help address the DM issue, but it involves playing Aaron Ramsey in that spot (he was good there back in 12-13 when Arteta was out), and shifting Cazorla and/or Oxlade-Chamberlain into the center midfield roles to provide dynamic passing and runners. Wenger’s feeling is that this probably wastes part of Ramsey’s skill set, and given he was the best midfielder in the Premier League, I can kind of understand that. The alternative, however, is that Arsenal lack not only a physical destroyer (there is none in the squad), but also a passing quarterback to spring the sprinters on the wings into action.
You can probably get by without one of those things, but lacking both will cost points down the road. Again. Though I will say Arsenal should also be great fun to watch now.
Speed kills, and Liverpool have probably the fastest attack I have ever seen. At age 19, Raheem Sterling is already one of the best players in the Premier League. Liverpool’s ability to challenge for the title rests on whether Rodgers manages to adjust his system/personnel to control the midfield better, or whether his defenders are good enough now to consistently win battles against opposing attackers on their own. For the neutral, Liverpool’s balls-to-the-wall style is now must see TV.
After lauding Spurs last week, on Sunday we saw that they aren’t “there” yet. Kaboul looked awful yet again (which is why Spurs bought Fazio), and Liverpool’s normally porous defense was surprisingly stout against the Spurs attack. Pochettino’s press needs to work in order for his team to get a lot of good chances, and Liverpool acted like it wasn’t there. This level of defensive work always takes time to work perfectly – Spurs will need to be better if they are to compete well against the top teams.
Are Phil Jagielka and Sylvain Distin getting a divorce? What has long been the steadiest pair of center backs in the league has looked anything but that for the start of the season. There are rumors that Distin feels jilted and angry, and will now only speak to Jags in French. Roberto Martinez needs to get in there and do some counseling quickly, because so far this year it looks like Everton’s attack is married to Wigan’s defense, which is a recipe for disaster.
Manchester United’s cast-off forwards Javier Hernandez and Danny Welbeck ended up at Real Madrid and Arsenal, respectively. United are not in the Champions League this season, and yet the players they discarded were eagerly snapped up by two clubs that are. Some weird talent evaluation going on here.
In non-EPL news, PSV lead the Eredivisie despite some seriously questionable stats powering their wins so far. Memphis Depay and Luciano Narsingh will create some unexpected goals, but like @11tegen11 says here, expect regression to come. Cocu doesn’t seem like a good enough manager to enable them to just run away with the league.
Also outside of England, Iker Casillas is done. Like done done. Like stick a fork in him because he’s a Thanksgiving turkey done. Deterioration of goalkeeper skills is often a sudden thing, but for Saint Iker it’s been a creeping suspicion ever since Mourinho benched him two years ago. Since then there have been more and more gaffes, both in the Champions League and at the World Cup this summer. The final straw, barring injury to Keylor Navas, should have come this past weekend, when he looked absolutely terrified and barely attempted to command his box in a 4-2 loss to Real Sociedad. It’s a shame, because Iker was very very good for a very long time. At this point in his career though, almost any other option is preferable.
Stat Attack
Zlatan currently has an NPG90 rate of 2.51.
With a goal and four assists through three games, Gylfi Sigurdsson’s scoring contribution is 1.67. It’s unsustainable, but it lends credence to the theory that most players need to play in their best position to excel.
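For the curious, the 1.67 falls straight out of the arithmetic if scoring contribution is (non-penalty goals + assists) per 90. A minimal sketch, assuming Sigurdsson played all 270 minutes of those three games and that his goal wasn't a penalty; neither detail is stated here, and scoring_contribution is a hypothetical helper.

```python
def scoring_contribution(np_goals: int, assists: int, minutes: int) -> float:
    """Assumed definition: non-penalty goals plus assists, scaled per 90 minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (np_goals + assists) * 90 / minutes


# Assumption: three full matches (270 minutes) and a non-penalty goal.
rate = scoring_contribution(np_goals=1, assists=4, minutes=270)
print(round(rate, 2))  # 1.67
```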
Also part of the four assists in three games club is Cesc Fabregas. Dammit.
Memphis Depay had his first real off game of the season this past weekend against Heerenveen, and still ended up with four shots and four key passes. He probably would have had one or two goals as well, but his legs clearly tired at the end of the match after a midweek Europa League tie.
Steven Berghuis, a player I think is one of the top talents in the Eredivisie, has a scoring contribution of 1.44 so far this year on 87.5% shooting! Needless to say, he’s not going to maintain that.
There is a guy named Igor currently leading the Championship in goals.
Finally, we come to Junior Malanda. Want to know how to confuse the living hell out of stats guys in terms of your Expected Goals totals? Check these out.
https://vine.co/v/MLXYwUHDrUM
That's nearly impossible to do, even when you are trying, but Malanda presumably wanted the ball to go IN the net, not away from it.
However, to prove his special set of skills, Wolfsburg fans got this in the next game!