This summer, StatsBomb is celebrating a special anniversary: 10 years since the site was formed and the first blog post was published.
A decade ago, the football analytics community was nascent, with a handful of prolific analysts experimenting with whatever football data they could get their hands on – which wasn’t a lot. But with every new blog post, a new analyst would be inspired, they'd write a new blog post... and so the community grew.
Ted (Knutson, now CEO of StatsBomb) created StatsBomb.com to house his own writing, but mostly to act as a centralised hub to amplify the work of the early analysts and researchers (for more on that you can read Ted’s 10 Years of StatsBomb blog post). Ten years on, we’ve spoken to some of those early contributors and will be sharing those conversations in a succession of articles that we’re calling the StatsBomb Originals series.
First up, Colin Trainor.
Colin was one of the earliest and most prolific analysts on the scene, publishing analysis and research that still gets used today. Ever used PPDA? That was Colin’s. Clubs at all levels and even media and broadcasters use it nearly every week in 2023 - and it was first published by Colin on StatsBomb. Like many early analysts, he was quickly hoovered up to consult for a Premier League club, but has more recently turned his hand to analytics in Gaelic Football.
Without further ado, here’s Colin.
What was the first thing you worked on as an analyst?
Colin Trainor (CT): Probably shot maps in Excel using conditional formatting. It must have been around 2011-2012. I realised that I had lots of data and could do stuff. Now, shot maps and stuff, they just are what they are, but it’s absolutely crazy they didn't exist before. That was probably the first thing. I was just presenting the data that I had without doing too much smart stuff with it.
Do you remember your first analytics experiment or lightbulb moment?
CT: Not a lightbulb moment, but I think back to how little was known about anything. I remember doing something on near and far post shooting. In ways it was easy pickings.
Eventually, I got to the end of my technical abilities; I couldn't do the more advanced modelling because I didn't have a nuclear physicist degree! Back in 2012, 2013, I would describe myself as just a normal person who's got a very inquisitive mind, reasonably smart but definitely not advanced technical skills. You could do some of that early stuff because there was nothing prior really. So you were creating foundational things. Then you need people like, say, Will Spearman to do the really smart stuff.
What has been your favourite piece to write or read on StatsBomb?
CT: The Dortmund one was a favourite. In hindsight it was just such a simple article, but at the time everybody was wondering what was up with them. It was over the Christmas holidays, during the break in the Bundesliga, and the Dortmund official account retweeted it. That was super cool.
None of my articles were really tactically difficult. I would just wonder: I have a whole big data set and what happens if I slice it this way or that way. It was really just getting foundational knowledge about how the game worked. I also remember Ted's one where he wrote about Eriksen having 9 key passes, which was notably high, and I’d sent him some data and he got huge traction on that.
Whose work did you read early on? Where did you read this early work?
CT: Twitter sparked it all. I think the whole community was on Twitter. I described Twitter as being like the pub: it's this big, infinite-sized pub and you can find the corner that you want to go to. You can go to your stamp collector's corner, or your betting corner. And then there was a sort of stats, football analytics corner. And that's all I did. I just spent my time on Twitter looking at a list I had, reading whatever came up.
Do you remember any particular articles that inspired you? Ideas or metrics or research?
CT: Some of Michael Caley’s stuff was really good. But the main thing for me was that, watching a football game, I'd be thinking (because, again, there was zero foundation there): how can I measure this in the data? For example, PPDA. I was sitting watching a game and the commentators were talking about a press. And then I thought, right: I know it isn't going to be perfect because you only have F24 on-the-ball data and you don't have any of the off-the-ball stuff, but could I come up with a metric that's at least going to give some sort of an approximation? PPDA came out of that, and I know it's not perfect, but it was better than what had come previously, which was zero.
I don't watch much football now. I don't get much time, but even Sky Sports, sometimes when they do stuff, they’ll use PPDA. I remember shortly after I wrote it John Coulson (Opta) sent me an email saying "Colin, we've had [a club] contact us. Can they get our PPDA model in their metrics?" They thought it was an Opta metric. Again, you do this work because you want to advance the game and you want to help the understanding, and it’s a nice boost to your morale when you create something and you get these outcomes.
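For readers who want to see the idea in concrete terms, here is a minimal sketch of a PPDA-style calculation. It is not Colin's original code. It assumes the commonly cited definition (opposition passes divided by the pressing team's defensive actions, with both counted only in the area more than 40% of the way up the pitch from the pressing team's own goal), and the `Event` type, its field names and the event labels are hypothetical stand-ins for whatever an on-the-ball event feed provides.

```python
from dataclasses import dataclass

# Assumed set of defensive actions; the labels are illustrative, not a real feed's codes.
DEFENSIVE_ACTIONS = {"tackle", "challenge", "interception", "foul"}


@dataclass
class Event:
    team: str   # name of the team performing the action
    kind: str   # e.g. "pass", "tackle", "interception"
    x: float    # 0-100, measured from the pressing team's own goal line (assumed)


def ppda(events, pressing_team, min_x=40.0):
    """Opposition passes per defensive action by `pressing_team`.

    Only events with x > min_x are counted. Lower values suggest a more
    intense press. Returns infinity if no defensive actions were recorded.
    """
    passes = sum(
        1 for e in events
        if e.team != pressing_team and e.kind == "pass" and e.x > min_x
    )
    actions = sum(
        1 for e in events
        if e.team == pressing_team and e.kind in DEFENSIVE_ACTIONS and e.x > min_x
    )
    if actions == 0:
        return float("inf")
    return passes / actions


if __name__ == "__main__":
    sample = (
        [Event("Away", "pass", 55.0)] * 6        # counted: inside the zone
        + [Event("Away", "pass", 20.0)]          # ignored: too close to Home's goal
        + [Event("Home", "tackle", 60.0), Event("Home", "foul", 45.0)]
        + [Event("Home", "interception", 30.0)]  # ignored: outside the zone
    )
    print(ppda(sample, "Home"))  # 6 passes / 2 defensive actions -> 3.0
```

As Colin notes, this only uses on-the-ball events, so it approximates pressing rather than measuring it directly.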
Are there any metrics/frameworks from the “early days” that you still use in your work now?
CT: I suppose xG is the big thing. I'm currently working in Gaelic football/GAA. It's an amateur sport, there isn't any money. There's actually no data. So, I spend 90% of my time clicking on buttons to create data with a view to creating an Expected Points Model, based on the location, based on the pressure and so on, to get a fairly rough and ready model. There's probably 16 people around Ireland coding the same game and everyone's doing it their own proprietary way, and it just doesn't make sense for anyone. It’s a totally different space in terms of advancements and professionalism outside of the athletes themselves, i.e. infrastructure.
Do you remember any particularly bad analytically-driven takes you had in the past, or work that you would approach differently knowing what you know now?
CT: I remember when I created xG2 [post-shot xG]. I remember being drawn to all sorts of pretty firm conclusions based on the over/under saving of a goalkeeper, and then with a bit more data you realise it just mightn't be that.
Is there any piece of work that you're particularly proud of?
CT: Probably the PPDA I would say. The fact that that was nine years ago and people are still using it.
Where has your analytics work taken you and your career?
CT: I consulted for Bournemouth the first year they were in the Premier League. So I did that and then sort of stepped out of it. And then in 2020, it was COVID and there was lots of stuff being streamed in terms of Gaelic football, so I started to do some Gaelic football analysis and, again, due to the wonders of Twitter, put stuff out. Then I started working with Kerry in 2021 - they’re the most successful football team in Ireland in terms of All-Irelands and were beaten in the semi-final in 2021, but won it in 2022. So it was great to be a small part of that. And I'm there with them again this year.
My lad got to be a mascot at Bournemouth too; he was 7 or 8 at the time. Coming from Northern Ireland, it would be a whole weekend thing and we went to a handful of matches. He still talks about those experiences: going over, staying in the hotel, going to the match, flying home. It was a nice side benefit to doing that.
Is there anything else you'd like to say about your experience in football analytics and writing for StatsBomb?
CT: StatsBomb was great. Obviously Ted was a driving force but the whole Twitter / football analytics world back at the time was enjoyable too. Ted was a leader of all of that and had the foresight or “get up and go” to create StatsBomb. He reached out to me and said “Look, I want to bring this here and get eyeballs on this work and create a hub”. And StatsBomb was the first place to bring these ideas under one roof like that.
Our sincere thanks to Colin for giving up his time to share his experiences with us. You can find him on Twitter @colintrainor.
We’ll be back next week with more from the StatsBomb Originals series.