What Happened At StatsBomb Evolve: 360, Data Quality, OBV, and more...
When you create something new and awesome, naturally you tell the whole world about it. So that’s what we did. StatsBomb Evolve took place on Wednesday 17th March to bring you – our friends, customers, stakeholders, and members of the football analytics community – up to date with (nearly) all that’s been going on behind closed StatsBomb doors since the last time we spoke to you. Stick around to the end for details on how to watch the event again.
Co-Founder and CEO Ted Knutson hosted the event and got us underway with a reflection and reminder of why StatsBomb as a company even exists.
StatsBomb launched as a data company in 2018 out of frustration with the lack of innovation and progression in the space. Ted told stories of previous experiences working with poor-quality data from providers with poor customer service (featuring Michael Jordan-related memes).
So we decided to do it ourselves. In came all the data points now synonymous with us: Pressures, Shot Freeze Frames, and Passing Height and Footedness data, followed later by the addition of Shot Impact Height to the data spec.
All of these things mattered then and still matter now, and all continue to create an edge for us and our customers when it comes to analysing and asking questions of the game.
But that was then. What about now?
StatsBomb’s new data product is the brainchild of many incredibly smart people we’re lucky to have working with us, and now, after its launch at StatsBomb Evolve, it’s out in the world.
Welcome to StatsBomb 360.
StatsBomb 360 is contextual event data that gives the location of all players on the pitch for over 3300 events per game.
With 360 Data, we can now answer the questions analysts have always wanted to ask but have never been able to answer with existing data.
Want to know which players break lines consistently and regularly?
Now you can.
Want to identify players who always find space to receive the ball between opposition defensive lines?
You can do that with 360.
Evaluate player decision making based on the choices they make from available passing options?
Yep, that too.
And that’s just on the player level.
On a team level, we can analyse team defensive shape around thousands of events per game, identify which patterns of play are most disruptive to your own or your opposition’s defensive shape, and get a clearer picture of where gaps are appearing in your structure.
Those are a few examples, but the truth is we don’t yet know everything that’s going to come out of this data. We’ve only just created it ourselves. But we do know there’ll be a lot, and it’ll give those who have it a significant edge in recruitment and opposition analysis.
A selection of Ted's slides from the day can be found below:
That’s not all we’ve been working on though.
Head of Data Product Ali Elfakharany followed Ted to talk us through our other recent work, with a sneak peek at products that are close to going, ahem, live, as well as an overview of the questions we’ve had to answer to get to that point.
Ali started off by setting the rumour mill in motion with talk of a potential LIVE data product on the horizon.
Live data has been one of the most requested features since the day StatsBomb formed as a data company. It has obvious uses for clubs, media and gambling organisations alike: people want to use the same StatsBomb data they already receive and love post-game for analysis, engagement and modelling during the game. Whether or not that becomes a reality in, say, this summer remains to be seen. We’ll keep you posted.
The biggest question the data team has had to answer over the last 18 months is how to scale up from a process that used to take several hours post-match to collect data at our level of granularity, to achieving the same quality of data in real time.
Ali explained how StatsBomb’s collection process now combines computer vision with human collection to improve both the speed and the quality of the data. Humans and machines working together have always produced better results than either working independently.
This has allowed us to ensure extremely high accuracy in our data and is also the development that means we are now a step closer to making Live StatsBomb data a reality.
A selection of Ali's slides from the day can be found below:
Head of Data Operations Hesham Abozekry took this further, giving more context on how StatsBomb maintains the highest quality and accuracy of data in the industry.
Ask a data collection company how accurate its data is and it will often come back with a 90%+ accuracy slogan, usually 99%.
The reality is that football data collection isn’t that black and white: several times in every match, events fall into a grey area between two or even three event qualifiers.
The biggest challenge for a data collector is ensuring consistency across all events, in all matches, in all leagues. Hesham described the two main review processes StatsBomb has implemented to keep inconsistency to a minimum.
One is a sport-specific rule engine that, during collection, immediately flags any logged event that appears illogical or very unlikely and sends it straight to review.
The other is Active Review: a selection of matches is collected twice by separate collectors, and a member of the resolution team compares the two data feeds, evaluates the rare instances of conflict, and decides which collector was correct and how severe the mistake was.
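To make the idea of a rule engine concrete, here is a minimal sketch of how such checks could be structured. It is not StatsBomb’s implementation. The `Event` fields, the three rules and the 50-unit shot-distance threshold are all hypothetical. The coordinates assume a 120 x 80 pitch, with the acting team attacking towards x = 120.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumed pitch coordinates: x runs 0-120 towards the attacked goal, y runs 0-80.
PITCH_LENGTH = 120.0
PITCH_WIDTH = 80.0


@dataclass
class Event:
    index: int
    event_type: str
    x: float
    y: float
    timestamp: float  # hypothetical field: seconds since the start of the period


# A rule looks at an event (plus the previous one) and returns a reason if it looks wrong.
Rule = Callable[[Event, Optional[Event]], Optional[str]]


def off_pitch(ev: Event, prev: Optional[Event]) -> Optional[str]:
    if not (0 <= ev.x <= PITCH_LENGTH and 0 <= ev.y <= PITCH_WIDTH):
        return "location outside pitch bounds"
    return None


def time_goes_backwards(ev: Event, prev: Optional[Event]) -> Optional[str]:
    if prev is not None and ev.timestamp < prev.timestamp:
        return "timestamp earlier than previous event"
    return None


def long_range_shot(ev: Event, prev: Optional[Event]) -> Optional[str]:
    # Possible but rare, so a human should confirm it (threshold is illustrative only).
    if ev.event_type == "Shot":
        distance = math.hypot(PITCH_LENGTH - ev.x, PITCH_WIDTH / 2 - ev.y)
        if distance > 50:
            return "shot from unusually long range"
    return None


RULES: List[Rule] = [off_pitch, time_goes_backwards, long_range_shot]


def flag_for_review(events: List[Event]) -> List[Tuple[int, str]]:
    """Run every rule over the event stream and collect (event index, reason) pairs."""
    flagged: List[Tuple[int, str]] = []
    prev: Optional[Event] = None
    for ev in events:
        for rule in RULES:
            reason = rule(ev, prev)
            if reason is not None:
                flagged.append((ev.index, reason))
        prev = ev
    return flagged
```

Because each rule is a small independent function, a new domain check can be added by appending it to `RULES`, and one event can trip several rules at once. For example, a pass logged at x = 125 with a timestamp earlier than the previous event is flagged twice.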
This has proved valuable not only for maintaining accuracy but also for identifying where the biggest areas of subjectivity lie in the data collection process.
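The comparison step of a double collection, and the way it reveals subjective areas, can be sketched as follows. This sketch assumes the hard part is already done: the events in the two feeds have been aligned to shared keys, and each feed is reduced to one label per event. The `Feed` shape and both function names are hypothetical. Counting disagreements per label pair is one simple way to surface the areas of subjectivity mentioned above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: each feed maps an aligned event key to the label its collector chose.
Feed = Dict[str, str]
Conflict = Tuple[str, str, str]  # (event key, label in feed A, label in feed B)


def compare_feeds(feed_a: Feed, feed_b: Feed) -> Tuple[List[Conflict], List[str], float]:
    """Return label conflicts, events present in only one feed, and the agreement rate."""
    shared = sorted(feed_a.keys() & feed_b.keys())
    conflicts = [(k, feed_a[k], feed_b[k]) for k in shared if feed_a[k] != feed_b[k]]
    unmatched = sorted(feed_a.keys() ^ feed_b.keys())
    agreement = 1.0 - len(conflicts) / len(shared) if shared else 1.0
    return conflicts, unmatched, agreement


def subjectivity_hotspots(conflicts: List[Conflict]) -> Counter:
    """Count disagreements per label pair, ignoring which collector chose which label."""
    return Counter(tuple(sorted((a, b))) for _, a, b in conflicts)
```

The conflict list is what a reviewer would adjudicate. Aggregating `subjectivity_hotspots` across many double-collected matches shows which pairs of event types (say, Clearance vs Pass) collectors most often disagree on.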
A selection of Hesham's slides from the day can be found below:
Dinesh Vatvani, Head of Data Science, gave the final presentation of the day with an update on the latest research and developments to come out of the StatsBomb data science team.
The headline act was the introduction of StatsBomb’s possession value model, On-Ball Value (OBV): our approach to valuing every event on the pitch by how it changes a team’s likelihood of scoring or conceding.
On-Ball Value will be made available to StatsBomb customers within our analytics platform StatsBomb IQ in the coming weeks, and represents a tangible upgrade on the possession value models that have come before it.
The introduction of OBV means we can now value players by their contribution to the team, regardless of whether the possession ended in a shot. OBV can identify good creators on teams that lack good shot takers. Where expected assists removed the need for a goal to be scored before the creator got credit, OBV removes the need for a shot, while also identifying players who are important to the progression of moves earlier in the possession chain.
Dinesh went into detail on the strengths of StatsBomb’s approach to OBV compared with earlier iterations elsewhere: the use of possession states instead of actions; the exclusion of possession-history features as a proxy for defender positioning, which in turn removes the team-strength bias present in other approaches to this metric; and the use of StatsBomb’s richer, more football-contextual data to train the model.
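The core arithmetic of a possession value model like this can be sketched in a few lines. An event’s value is the change it causes in the acting team’s likelihood of scoring minus its likelihood of conceding. Summing those values per player credits everyone in a move, whether or not it ends in a shot. This is a simplified illustration, not StatsBomb’s model. The `GameState` probabilities would come from a trained model, and the numbers below are made up. Combining the two probabilities into a single net value is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GameState:
    # Hypothetical model outputs, from the acting team's point of view.
    p_score: float    # probability the team goes on to score from this state
    p_concede: float  # probability the team goes on to concede from this state

    @property
    def net(self) -> float:
        return self.p_score - self.p_concede


def event_value(before: GameState, after: GameState) -> float:
    """Value of an event = change in net scoring likelihood that it causes."""
    return after.net - before.net


def player_totals(events: List[Tuple[str, GameState, GameState]]) -> Dict[str, float]:
    """Sum event values per player; no shot is needed for a player to earn credit."""
    totals: Dict[str, float] = defaultdict(float)
    for player, before, after in events:
        totals[player] += event_value(before, after)
    return dict(totals)


# A made-up possession that never produces a shot.
possession = [
    ("A", GameState(0.01, 0.01), GameState(0.03, 0.01)),  # progressive pass
    ("B", GameState(0.03, 0.01), GameState(0.06, 0.01)),  # carry into the final third
    ("C", GameState(0.06, 0.01), GameState(0.01, 0.03)),  # loses the ball
]
```

Here `player_totals(possession)` credits A with roughly +0.02 and B with roughly +0.03, even though the move ended with C losing the ball (roughly -0.07). A shot-based metric would give A and B nothing for this move.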
That’s to name just three. But of course, all this modelling work would be for nothing should a certain diminutive Argentine not be sat at the top of the rankings. One all-important question needed answering. Was Lionel Messi top?
The answer was yes, which, as we all know, validates OBV to the highest standard that all models seek to achieve. Case dismissed.*
*Other, more statistically robust tests were run on the model and were detailed both in the talk and in the white paper sent to StatsBomb customers immediately after the event.
A selection of Dinesh's slides from the day can be found below:
Lastly, it was back to Ted for a Q&A in which he gave more insight into the competitions and seasons covered in 360 data, the possible release of free 360 data to the public, and neither confirmed nor denied whether he'd been wearing trousers or pants during the presentation.
Now, to answer the most burning question we’ve been asked since Wednesday.
The StatsBomb Evolve event is available to watch On Demand. You can do so by clicking here.
StatsBomb 360 is no longer something that is happening in the future. It's live RIGHT NOW.
If you’re a team, media, or gambling organisation and want to speak to us about how 360 Data will significantly upgrade your operation, you can get in touch and we’ll be happy to help.
Thanks. We’ll see you again soon.