Tom Brady. 7-time Super Bowl Champion. 5-time Super Bowl MVP. 3-time NFL MVP. Most career passing yards. Most career touchdown passes. Most games won by a player. Most games started by a non-kicker. 1997 NCAA National Champion*. Birmingham City owner. Delta Air Lines strategic adviser. Former husband to Gisele Bündchen. The list of notable accolades for Thomas Edward Patrick Brady Jr. is endless.
Many, many moons ago now, StatsBomb released the Lionel Messi Data Biography. We wanted to give the world a full catalog of the league games played by soccer’s widely recognised GOAT, as some of his early seasons pre-dated any data company’s x, y collection. And so we did. For those who want to explore, the LMDB currently contains a full longitudinal data set of Messi’s Barcelona league career (2004-2021) that will receive a new update at the end of 2023.
Apparently, he’s still pretty good.
Turning our gaze to a slightly more padded sport, before we even finished our data spec for the American Football product, it was obvious who the first Data Biography in this sport would feature. The incredible thing, at least to me, is that the NFL has perhaps the shortest career life span of any major sport in the world, and yet Tom Brady somehow started his professional career before Lionel Messi (2000 vs 2004).
That is so much data to collect. Especially when you realise that it takes us about 80 hours per game right now to collect the full American Football data spec. And (assuming we can get the older video) we’re going to give all of that to the public, for free.
*Taps the roof of the data set*
So what exactly is in this bad boy…?
The answer is: SO. MUCH. STUFF.
It contains all of our event data spec, the details of which can be found in the documentation located [HERE]. It is the deepest data set in the industry, by far. It includes huge USPs like:
- Line Battles. When I designed the data spec, I wanted to give scouts far more objective information on what was happening in the trenches, on the theory that it would provide better details on player skill sets for some of the hardest positions to evaluate. Enter Line Battles. We collect blocks and attempted blocks across the OL/DL at 10 frames per second. You’ll get info on double teams, combo blocks, get-off, pushback and a lot more.
- Pass placement. We collect the x/y/z of passes on the receiver at the moment of catch. Combine that with route info, and you get very interesting information about passes in front of/behind the receiver across the field.
- As a follow-on to this, we’ve also put our completion probability models into the data. We think this is the first open release of model info incorporating tracking/freeze frame features, plus pass placement.
- Low frequency tracking data. We know where everyone on the field is at snap and update that info in the data a minimum of 2.5x per second. [Note, this is unsmoothed tracking data. We have a smoothed version at 30fps in our American Football Tracking product.]
- There is pre-snap movement. There’s x,y information about attempted tackles/missed tackles/where they started and where they ended. There are formations and personnel groupings. There are GSIS ids so you can join it to other data sets. And more, and more, and more. It’s too much stuff to reproduce in this small space, so check the docs and get started. OR… There’s actually a 30-page deck that details the stuff we’ve produced across the data and in StatsBomb IQ, so if you want more info on purchasing, please contact email@example.com
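To make the pass placement bullet a little more concrete, here is a minimal Python sketch of how "in front of/behind the receiver" could be measured. It is an illustration under assumptions, not our model or the actual data schema: the function name, the argument names, and the idea that you already have a receiver heading (for example, derived from route info or consecutive tracking frames) are all hypothetical.

```python
import math


def placement_relative_to_receiver(ball_xy, receiver_xy, receiver_heading_deg):
    """Split the ball-minus-receiver offset into two components.

    All inputs are hypothetical stand-ins for fields in the data:
    - ball_xy: (x, y) of the ball at the moment of catch
    - receiver_xy: (x, y) of the receiver at the same moment
    - receiver_heading_deg: direction the receiver is moving, in degrees,
      measured counter-clockwise from the +x axis

    Returns (along, across):
    - along > 0 means the ball is in front of the receiver, < 0 behind
    - across is the offset to the receiver's left (+) or right (-)
    """
    dx = ball_xy[0] - receiver_xy[0]
    dy = ball_xy[1] - receiver_xy[1]

    # Unit vector pointing in the receiver's direction of travel.
    heading = math.radians(receiver_heading_deg)
    ux, uy = math.cos(heading), math.sin(heading)

    along = dx * ux + dy * uy    # projection onto the direction of travel
    across = -dx * uy + dy * ux  # projection onto the perpendicular
    return along, across


# Made-up numbers: receiver running straight along +x, ball slightly ahead.
along, across = placement_relative_to_receiver((31.5, 19.0), (30.0, 20.0), 0.0)
print(f"along={along:.2f}, across={across:.2f}")
```

A positive `along` value would read as a pass that leads the receiver, and a negative one as a pass thrown behind them. Check the docs for the real field names and coordinate conventions before adapting this.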
The process for the data releases will look like this.
- The first batch of data is available right now. It contains event data and low frequency tracking data for the last two seasons of Brady’s career (2021 and 2022).
- The next installment will land next month, and we’ll do monthly releases thereafter, going backwards. This seemed largely in line with Brady’s sometimes Benjamin Button-esque career and facial feature arcs.
- Next month’s release will be a bit different, but I’ll explain more at the time.
Accessing The Data
You can access the data via our GitHub page, where you will also find example Python and R code to not only help you pull the data, but also to work with the data and get you started on this deep dataset.
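As a rough idea of what "working with the data" can look like, here is a small Python sketch that parses some event records and joins them to another table on a shared player id, which is the kind of join the GSIS ids are there to enable. Everything specific in it is an assumption for illustration: the JSON layout, the field names (`event_type`, `player_gsis_id`), the ids, and the roster table are invented, so use the example code on GitHub and the docs for the real structure.

```python
import json
from collections import Counter

# Hypothetical sample standing in for a downloaded file.
# The real file layout and field names may differ.
SAMPLE_EVENTS_JSON = """
[
  {"event_type": "pass", "player_gsis_id": "ID-A"},
  {"event_type": "pass", "player_gsis_id": "ID-A"},
  {"event_type": "rush", "player_gsis_id": "ID-B"}
]
"""

# Hypothetical external table keyed by the same id.
EXTERNAL_ROSTER = {
    "ID-A": {"position": "QB"},
    "ID-B": {"position": "RB"},
}


def join_events_to_roster(events, roster):
    """Left join: attach roster fields to each event via the shared id.

    Events whose id is missing from the roster are kept unchanged.
    """
    joined = []
    for event in events:
        extra = roster.get(event.get("player_gsis_id"), {})
        joined.append({**event, **extra})
    return joined


events = json.loads(SAMPLE_EVENTS_JSON)
joined = join_events_to_roster(events, EXTERNAL_ROSTER)

# Count events by type as a quick sanity check on what was loaded.
counts = Counter(e["event_type"] for e in joined)
print(counts)
```

The join is written as a left join so that no event is dropped when an id has no match in the other data set, which makes gaps easy to spot.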
Ts and Cs
This data is not for commercial use. We would love to license this project for media use, etc., but there will be a cost attached to that, and we can discuss more at Sales@Statsbomb.com
Meanwhile, feel free to use it as much as you want in research or for personal use. Create new vis, blogs, analysis, etc.
Full terms and conditions are available [HERE]
One thing to be aware of... None of the data will be perfect. We have a ton of QA and verification against official sources, but if you’ve ever seen the play-by-play feeds from those, you’ll understand that data collection is HARD. We’re just the ones that are honest about how hard it is and acknowledge that there are imperfections up front.
On the other hand, we also care about getting every detail as correct as possible, so if you find things that are wrong, please send emails to firstname.lastname@example.org and we’ll fix it.
Call To Action
One of the difficulties in doing the Lionel Messi Data Biography was procuring the old video of Messi’s early career. We tried to get in touch with Barcelona, who suggested we contact La Liga. We contacted La Liga, who told us they no longer had video available of those old matches because they had changed video and media companies multiple times in the intervening 14 years. That… was going to be a problem.
The internet archive contained some of these oldies, but not nearly enough to fill in the early data set.
Somehow, former employee Pablo Rodriguez found a Barcelona video otaku who agreed to give us the videos we needed as part of the project, or the earliest seasons would have been dead in the water. A minor Messi miracle, in a career full of them.
With Brady, we are set for video back to the 2016 season. For anything earlier than that, we could use some help.
Hello, NFL. Would you like to help us?
Given the challenges of creating tracking data, we strongly prefer to work off all-22, so if any of you kind souls have complete seasons of old Brady games - especially on all-22 - and would like to contribute those to our project, please get in touch. (My email is at the bottom.) We will not share these videos with anyone, we simply need them to create data that we will then release to the public, entirely for free.
The other huge challenge will be finding video for Brady’s ‘98 and ‘99 Michigan seasons. If you know someone who can help us out with the Maize and Blue historic video, again please get in touch. We do have the ability to create data off of old video formats (we did this back to 1996 for UEFA, and even earlier for our soccer Icons series) and will do so regularly for this project if we get the video sources, even if it’s a painful process. This is another feather in the cap for our outstanding collection team.
So yeah, free NFL data. A hugely ambitious project. And the greatest QB of all time.
Please enjoy and use the hashtag #TB12DB whenever you post stuff on social media, so that others can easily find and surface your work.
All the best,
*So technically Brady was on that 1997 Michigan team that won a national championship. He didn’t really do anything that year (12-15 for 103 yards), but it’s listed in Wikipedia and I was amused enough to include it.