2020

The Invincibles Project and Classics Data Pack 1

By Ted Knutson | June 1, 2020

As those of you who follow me on social media are aware, earlier this year we started working on The Invincibles Project. The idea behind this was to collect all of the data from this historic season to be able to look at it through a modern lens. I had initially pitched this as a follow-up project after the Messi Data Biography as something different, and another way of unlocking football's history.

As an Arsenal fan, I found the whole thing exciting. Prime Thierry Henry! Doing things like this:

The majesty of Robert Pires. Taking bodies!

Dennis Bergkamp! Patrick Vieira! Jose Antonio Reyes! Kolo Kolo Toure! Sol Campbell! Mad Jens!

*Highbury roars*

OMG SO EXCITING.

Cashley.

*crickets chirping*

Also as an Arsenal fan, I know that other Arsenal fans could use a little joy in their lives and this seemed like the only way we were getting anything fun out of the Gunners in 2019-20.

We started collecting this with an eye to releasing it side by side with the data set from a different red team, should they manage to finish their season undefeated. Sorry Liverpool fans, due to circumstances beyond our control, that data release slipped through our fingers. You'll have to settle for merely a league title and one of the largest title-winning margins in history.

The Problem

In order to collect data, we need to have video. It was fortunate for us that Lionel Messi has played his entire career for Barcelona, because that is one of the few teams in the world that has historic video available on the internet from pre-2010 without needing to jump through a million hoops. That doesn't mean that getting all of the video to reconstruct Messi's club career was easy - far from it. It was merely doable.

Arsenal? The only undefeated season in Premier League history? You would think this would be at least as simple as sourcing 15 seasons of Messi, right?

It was not.

We managed to get about half the 2003-04 season from the usual sources of football video history. And then we hit a wall. Our man in Spain and historic video expert Pablo Rodriguez then went to work, checking with various and sundry collectors that he knows who have large archives of historic, important football video. Through these wonderful people and the standard exchange of goods and services we were able to get to 32 matches of video. And then we hit another wall.

Why? Well, as Andrew Mangan of Arseblog reminded me, not all matches during that time period were broadcast on TV. In the modern day, every Premier League match is broadcast to air in multiple countries, which makes it easy to grab that video and store it away on a giant hard drive. Back then? A number of 3PM matches on Saturdays were simply never broadcast. (At least to our knowledge.) Which means that the collectors would not have that video unless they somehow tapped into different sources.

We checked with Arsenal. I've been lucky enough to meet people that work for the club over the years, and we figured maybe they would let us have access to the video to collaborate on the data release and some cool stuff with club media. And they totally would have...

Except they didn't have the video either.

Someone who worked for Prozone back in the day suggested that the opponents might have those videos, as they would have been delivered by courier as part of their service. But that ran into a variety of snags, including the fact that football clubs change personnel on this end with remarkable regularity. Finding a club that still had the archive, could access it, and could even tell us who to talk to proved insurmountable for us.

The other problem here is the transition from analog to digital. Pretty much all archives back then were tape archives that would later need to be digitised so the match would be preserved for history. Rob Bateman of Opta tells the tale of trying to collect old Premier League matches from the 90s and being surrounded by crumbling video tape from the league's first decade. These Arsenal matches came right at the tail end of that period, and my understanding is that the PL has started to archive its history as much as possible, but it's still very much a work in progress.

Finally you hit the problem of a license fee. We got in touch with the archive service with a willingness to pay a fee to obtain the final six matches needed to complete the project. We were quoted a figure to license the video for the entire Arsenal season that frankly didn't make any sense to me, and certainly eclipsed my budget for a public service project.

I wanted to get everyone a data gift to bring people some joy during the pandemic, but I didn't want to/could not pay the price of a car to make that happen.

The Premier League itself actually showed willingness to help us out, but as you can understand, they are rather busy with other priorities right now (like restarting the league in the middle of a viral pandemic) and suggested maybe we could revisit this when the world isn't quite so mad. Which totally makes sense.

But I have an anniversary data release deadline, and thus here we are.

Incomplete Invincibles.

Classics Data Pack 1

To make up for my own disappointment in not being able to complete this project, I added some extra matches I thought might interest people, including non-Arsenal fans. So what you are getting today as a gift from Hudl Statsbomb is a hefty little slice of football history, wrapped in the above-named package. In addition to delivering 32 of 38 matches from the Arsenal 2003-04 Premier League season, we are also giving you UEFA Champions League Finals data from 2000-2019. The collection on those CL matches isn't all finished, so they will trickle out to the repository gradually over the next week to complete the set.

Thank you to all of the fans out there who have supported Hudl Statsbomb over the years. Thank you to our customers who buy our products and give us feedback to make us better every day.

And thanks to Arsenal for a truly magnificent season and set of memories. It would be great if we could get some more of those sooner rather than later.

Information on how to access the data is here

A complete primer (in English and Espanol) on how to work with the data via Hudl StatsbombR is here

*EDIT: A new, updated version of the R Guide can be found here

The data comes with our standard non-commercial license that is usable for fan analysis and academic research. If you are a commercial entity that would like to use this data, get in touch with sales@statsbomb.com and we can have a conversation.

All the best,
--Ted Knutson
CEO, StatsBomb

*If we get video and I still run StatsBomb, we will finish this project.