One year ago today, I stood in a lecture hall in South London, waiting for StatsBomb’s launch event to start. The Data project was secret, and had been under wraps from the outside world since inception. This event was the culmination of nearly a year of work.
It was also probably the biggest personal and financial risk I had taken in my career.
Needless to say, StatsBomb Data came as quite a surprise to the data world. That's because StatsBomb Data wasn’t supposed to exist.
It wasn’t supposed to be possible to build the infrastructure necessary to produce detailed event data at a higher spec than Opta and the other competitors in the space - at the quality we knew we wanted - without far more time, far more money, or both.
But there I was, happily setting up the room with our team as we prepared to officially announce our new baby. One of my favourite things in the world is releasing new products to an audience, but this was special.
A year later, one question I often get asked is: why? Why would we take all this risk to go after a market that already existed, one where a single giant company had developed a near monopoly on top-quality data?
Because someone needed to do it better.
The need to improve our understanding of football demanded it, and it was pretty clear that none of the major suppliers were going to deliver a better product. I know, because I talked to a number of them about it.
“Hey, what about this?” Silence.
“How about collecting this new thing?” Silence.
“We are a paying customer and have this problem - could you, I dunno, answer your customer service emails?” Silence.
In the end, I came to the conclusion that StatsBomb could do it best. So we did.
And boy was it hard. Like, not from a technical perspective - that was fairly straightforward. Collecting data from video has been around for ages. Adapting the software to allow collectors to add new events and qualifiers was not terribly difficult, and we had a great partner in Arqam FC (now part of StatsBomb) to help us pull it off.
But from a design perspective? A process perspective? A quality perspective? Really, really hard. So many difficult choices had to be made and challenging problems solved. Many more problems had no solutions; they only had consequences. You can make it this way or that way, but neither answer is optimal. Apparently kids these days call this “adulting.”
“I don’t think we expected this, but you are now our most important data provider.”
That’s a recent quote from one of our Champions League teams, and a huge compliment to the team at StatsBomb and Arqam for what we developed.
A compliment of a different sort is that teams who are on StatsBomb Data right now improved their points totals by 20% versus a year before. Better data = better analysis = better performances. It's something I hoped for when I started this project, but it's been pretty amazing to see it play out in the real world.
As I said, this stuff is hard, and often in ways I didn't even expect. We’re still not perfect, but like the teams that are our customers, we work our asses off every day to get better.
In fact, a lot has happened in a year. Maybe it’s best to review how StatsBomb has changed as a company (and a website) in that time.
Flashback one year: On May 9th, we introduced our new Data to the world. About an hour before our launch, Opta threw this tweet out. What curious timing.
Followed by this one a day later, and a blog explaining their new qualifiers.
Thanks for keeping us on our toes, folks! And for designing a data upgrade that can apparently only be interpreted via 3x3 matrices before disappearing from the world again. This announcement came as a huge relief to us because it proved to be so far inferior to what we had developed that we now knew we had a chance to succeed.
In contrast, StatsBomb customers know how much pressure a shooter is under from exactly how many defenders and the GK, in which locations, on every shot. This information has been available since we launched. We have even documented our research in this area extensively, here, here, and here.
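For readers who want a feel for what "how many defenders, in which locations" can look like in practice, here is a minimal sketch. It is not our pressure model. It assumes each shot carries a list of player snapshots with a `location` pair and a `teammate` flag; the function name `defenders_within`, the 3-unit radius, and the sample coordinates are all hypothetical and chosen only for illustration.

```python
import math


def defenders_within(shot_xy, freeze_frame, radius=3.0):
    """Count opposing players within `radius` pitch units of the shooter.

    `freeze_frame` is assumed to be a list of dicts with a `location`
    ([x, y]) and a boolean `teammate` flag. Both field names are assumptions.
    """
    sx, sy = shot_xy
    count = 0
    for player in freeze_frame:
        if player["teammate"]:
            continue  # only opponents can put the shooter under pressure
        px, py = player["location"]
        if math.hypot(px - sx, py - sy) <= radius:
            count += 1
    return count


# Made-up snapshot of one shot: three opponents and one teammate.
sample_frame = [
    {"location": [109.0, 40.5], "teammate": False},
    {"location": [110.0, 41.0], "teammate": False},
    {"location": [118.0, 40.0], "teammate": False},  # e.g. a GK on the line
    {"location": [107.0, 39.0], "teammate": True},
]

nearby = defenders_within((108.0, 40.0), sample_frame, radius=3.0)
print(nearby)
```

A simple count like this throws away most of what the locations can tell you, which is exactly why having every defender and the GK recorded on every shot matters.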
Back to our story… new data also meant we were no longer shackled by restrictions from our old data provider, which meant we could turn StatsBomb.com into a place to post analysis and insight from whomever we wanted, on whatever topics we wanted, five days a week. So that's what we did. We recruited Mike Goodman to both write and develop content, while allowing a new crop of talented writers to show off so many cool new things.
For example, we started to profile "pressures", something we now view as the basic unit of defensive activity and an event that is unique to SB Data. Our research shows having actual pressing data (not derived info, like our competitors) dramatically changes how you evaluate teams and players defensively. Why is Roberto Firmino amazing when he's a striker who doesn't score many goals? How can we better show just what made Burnley's defending in 17-18 so unique? Pressures unlocked this information in a way that simply wasn't available before.
Or how about pass height, which is unique to StatsBomb Data and is a fascinating indicator of team style? It's also a key component in creating better pass difficulty models, which is a hugely important area of research in a game that is largely composed of passes.
James Yorke recently wrote about pass footedness, which is also unique to StatsBomb Data. Why does this matter? Because with this information, we built a new passing model that lets teams fully profile the quality of player passing with each foot. Maybe one player is amazing with his right foot, but only attempts 5-10 yard passes with his left? That’s in the data. Maybe your coach demands a two-footed player who can make difficult passes with both feet? The model lets your recruitment department uncover those types of players easily.
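To make the footedness idea concrete, here is a rough sketch of the kind of per-foot summary the data allows. This is a plain descriptive tally, not the passing model itself. The record fields (`foot`, `length`, `completed`), the function name `passing_profile_by_foot`, and the sample passes are hypothetical and used only for illustration.

```python
from collections import defaultdict


def passing_profile_by_foot(passes):
    """Summarise attempts, completion rate and mean length for each foot.

    Each pass is assumed to be a dict with `foot` (str), `length` (float)
    and `completed` (bool). These field names are assumptions.
    """
    totals = defaultdict(lambda: {"attempts": 0, "completed": 0, "length": 0.0})
    for p in passes:
        t = totals[p["foot"]]
        t["attempts"] += 1
        t["completed"] += int(p["completed"])
        t["length"] += p["length"]
    return {
        foot: {
            "attempts": t["attempts"],
            "completion_rate": t["completed"] / t["attempts"],
            "mean_length": t["length"] / t["attempts"],
        }
        for foot, t in totals.items()
    }


# Made-up player: ambitious with the right foot, short and safe with the left.
sample_passes = [
    {"foot": "right", "length": 30.0, "completed": True},
    {"foot": "right", "length": 25.0, "completed": True},
    {"foot": "right", "length": 35.0, "completed": False},
    {"foot": "left", "length": 6.0, "completed": True},
    {"foot": "left", "length": 8.0, "completed": True},
]

profile = passing_profile_by_foot(sample_passes)
print(profile)
```

Raw completion rates flatter the weaker foot here because those passes are short and easy, which is why a proper model needs to account for pass difficulty rather than stopping at a tally like this.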
Possibly the release to StatsBomb IQ that got us the most attention in the last year was our Goalkeeper Module. Because our data has the location of the GK and defenders on every shot, we are now able to evaluate GKs via data in a way that was never possible before. Data scientist Derrick Yam used this to question the fee Chelsea spent on Kepa Arrizabalaga at the season’s start, and his framework for GK evaluation was accepted for poster presentations at the massive Sloan Sports Analytics Conference in Boston.
Beyond the public-facing research, we have also done unique and fascinating customer-only research. This includes detailing the new expected goals models, more information on GK evaluation, a groundbreaking analysis of [REDACTED] that I hope to be able to talk about in the future, and a recent study of how the Danish Superliga has changed over time, plus how it compares to bigger leagues like the English Premier League and German Bundesliga. This is above and beyond the typical head coach and player research we produce for our consulting customers on a monthly basis.
Behind the scenes, we produce regular white papers for customers, detailing our research, and unlike most companies, we discuss in detail where our research has failed. We think this is hugely valuable for customers to see, partly so they understand what we are working on for the future, and partly so they can learn what approaches to modify or steer clear of. Data science is hard and data scientists are expensive. Saving your customers time by educating them on failed approaches is a hugely valuable service, but one you'll never see out front.
I designed the initial data spec, but I've still been somewhat shocked to learn that there are so many new, useful elements inside of StatsBomb Data that it will take us years to explore them and learn what they have to teach us. However, as noted above, customers are already using the data to succeed.
Speaking of teaching, we also recently launched our first analysis courses to help anyone interested in the sport better understand how it works through our research. The introduction course is suitable for literally anyone who likes football, while the Set Pieces courses are geared for coaches and analysts who want to learn more about this phase of the game.
Although it is increasing every year, data use in football is still in its infancy. I decided we needed to get out there and teach the information to the masses, and I think what James and Euan have produced with these courses is both unique and exceptional. If interested, you can find more information here.
So That Was the Past Year, Wrapped Into a Tiny Bow - What’s Next?
We will keep getting better. We recently announced data upgrades for next season, including a video explanation of the new stuff we are collecting and why. We are pretty sure Shot Impact Height will improve the accuracy of expected goals models, so we have incorporated that into our data collection. We have also added body pose information about GKs on every shot and save into the data spec. From a football perspective, GK position on the pitch isn’t just a dot of x,y information, and our data will now convey a lot more about what GKs are actually doing when it comes to shot stopping.
We also have exciting new things coming to StatsBomb IQ, including a tactical suite that will change how coaches and analysts are able to use data to analyse their own teams and their opponents. It’s the culmination of our own work in football, years of talking to coaches and analysts about how they look at the game, and an understanding of how data can improve that process. There’s no hype when I say our product will be great, and there is nothing else like it. Expect to see more information on this product as European teams get into their preseason camps.
We will also continue to release free data to the football world. The FIFA Women’s World Cup data will go out to the public daily during the World Cup itself, and we will keep producing FAWSL and NWSL seasonal data for free. And... there will be a new release of free men's data beyond the 2018 World Cup that's already there, but I don’t want to say any more because I don’t want to spoil the fun.
Another thing that will happen this year is that our competitors will continue to try and copy us. That’s just how business works.
They copy, we innovate. They market, we produce. They appear in media…
If your team isn’t using StatsBomb products and services right now, you’re already behind. And given how quickly we are releasing new things that help our customers perform better, that is the one thing that will probably not change in the year to come.