This summer, StatsBomb is celebrating a special anniversary: 10 years since the site was formed and the first blog post was published.
A decade ago, the football analytics community was nascent, with a handful of prolific analysts experimenting with whatever football data they could get their hands on – which wasn’t a lot. But with every new blog post, a new analyst would be inspired, they'd write a new blog post... and so the community grew.
Ted (Knutson, now CEO of StatsBomb) created StatsBomb.com to house his own writing, but mostly to act as a centralised hub to amplify the work of the early analysts and researchers (for more on that you can read Ted’s 10 Years of StatsBomb blog post). Ten years on, we’ve spoken to some of those early contributors and will be sharing those conversations in a succession of articles that we’re calling the StatsBomb Originals series.
A warm welcome back to Mark Thompson.
Where everyone else was analysing the shots and expected goals totals of attacking players in the early days, Mark was donning his hard hat and digging into how data can be used to evaluate centre backs. Though his portfolio of work covers a wide range of topics and rabbit holes, he's probably best known for doing the hard yards in this area. He's been on the scene for around a decade, written literally hundreds of data-led articles, and worked for several data and analytics entities, and his opinion on football and data is held in high regard around the community.
Here’s Mark.
What was the first thing you worked on as an analyst? Do you remember your first “analytics experiment” or lightbulb moment?
Mark Thompson (MT): I guess the first thing I did that could be called an 'analytics experiment' was digging into some Danny Welbeck data, sometime during the 2013/14 season. He was at Manchester United, and playing out on the wing, but I liked him as a player more than I felt his reputation was. I think I used Squawka to get the data for his goal-scoring - I'm not sure if I even looked at shot stats - for the games he played as a striker and the ones he played as a winger. And then I did that for United's other options at the time, so I had what felt like a more comparable dataset.
It turned out that Welbeck wasn't actually better than the other options, if I'm remembering right, but his numbers were closer than in the overall figures, where the position being played was a confounding factor.
What has been your favourite piece to write or read on StatsBomb?
MT: To write: the one on Burnley. This company called Stratagem had made some data available which included stuff like defenders in the way of a shot, and this was early 2017 so Burnley were an obvious team to look at. They stuck out about as much as you'd expect them to. But it was also a piece where I combined the data with the video - which to be honest was partly just because I wanted to check whether the data was right - and I picked up on this thing that their defenders were doing, pinching inside the width of the goalposts.
To read: James Yorke's weekly round-ups were some of the best data-writing (or writing using data - I think the distinction is a fair one to make and his articles hit the mark on both fronts) that's been available anywhere. Outside that, there's no specific piece, but Mohamed Mohamed's writing, some of which has been on StatsBomb, has been a big influence on my own work as well; similarly, Grace Robertson's mix of analytics knowhow and knowing how to deliver a well-told story.
Whose work did you read early on? Where did you read this early work?
MT: I think I came to analytics more through Twitter than anything, although I'll have definitely read people's blogs. It will have been blogs of people I found via Twitter. I've definitely forgotten a lot of good work and good people that I've read so I'm wary of trying to list everyone.
Do you remember any particular articles that inspired you? Ideas or metrics or research?
MT: A big inspiration for first getting into analytics will have been Ted Knutson's early stuff around 2013 and 2014, both the radars and the scouting metrics. It seemed interesting, but it also seemed like he and others had attackers covered, so I tried picking up the defensive side. I'm glad, because central defence is objectively the best position on the pitch (both Tony Pulis and Pep Guardiola have played complete-CB back fours, so it's clearly something that nature bends towards, like fractals).
Thom Lawrence's experiments were always the great mix of interesting and clearly potentially useful, which maybe sounds halfway like an insult but to be clear: interesting without use is hollow, and useful without interest just doesn't make you want to have a go yourself.
Various papers from Sloan conferences around pitch control; Bobby Gardiner's write-up of Luke Bornn and Javier Fernández's paper, which among other things noted that Lionel Messi creates value by walking into space better than many players create value by running, is a classic.
Are there any metrics/frameworks from the “early days” that you still use in your work now?
MT: I think most of the stats are obvious ones, xG and stuff, and at work a large part of what we at Twenty3 do is make sure that all of the data provider's value is passed on to the user anyway, and that they can get the metrics/frameworks that they want to work with. The strong data vis people like Tom Worville, Maram AlBaharna, Peter McKeever, John Burn-Murdoch will all have been influential in the fact we make our vis look great. James Yorke (kindly) telling me that mine looked bad (around 2017, 2018) will also have been important there.
Another thing that I guess I still use, or subliminally draw from, would be Paul Riley's 'this doesn't need to be so complicated' mindset. Back in the day he put together, and showed people how to do it too, his own simple xG model. I think it's very valuable to have expert knowledge but to be able to gauge the level of your audience well, and to deliver the core essence to them without the complexity or frippery that might actually get in the way.
Do you remember any particularly bad analytically-driven takes you had in the past, or work that you would approach differently knowing what you know now?
MT: I am fascinated at how analytics and analytics-adjacent people - driven in part by StatsBomb - have recently moved towards being very pro-set pieces. Some of the earliest takes I remember were around the lack of value they brought, big defences of short corners for example - and now, the 'smart analytics edge' is to work on how you whip it into the box! I don't necessarily think those early takes were fully wrong, but sometimes I do wonder whether they, whether we, were closer to mistakes that we would, say, have accused Charles Reep of making [used data to back up an argument for long ball football], just in the opposite tactical direction.
Knowing what I know now, I think I'd also have been less wedded to the conception of what event data could look like. I've got a much broader knowledge of data providers as well as what tracking data brings to the table now - it was really easy, back in the day, to be limited to the idea of event data being on-ball data (at least, I think it was, in hindsight). I think I'd probably have tried out doing more self-collected data projects if I hadn't been so subconsciously attached to the conception of what event data was, which was heavily influenced by the fact that the data publicly available at the time was all about on-ball actions. I'm very aware, saying this, that this is something people in the game will think very foolish and naive to have thought, and to be fair to them they'd be right to.
Is there any piece of work that you're particularly proud of? Where has your analytics work taken you and your career?
MT: Well, I work in the space now, at Twenty3 - I actually joined Twenty3 as a writer on their editorial side (about five years ago now) and shifted over to the data science and product development team after about a year there. There's no way that I'd have had the space and time in a professional setting to develop as a coder and software developer without having the grounding in football analytics that I had and could bring to the table.
There are a few bits of work I'm proud of. There were a couple of old blogs where I self-collected some data, one on crossing and one on counter-attacks - it was time-intensive and I didn't really get enough data for substantive results, but I think the actually good thing is that they were good and interesting questions to look into: how much does the relationship between attackers and defenders in the box when a cross is made affect the outcome, how much does the volume and spread of attacking players in a counter-attack affect the outcome. To give him some credit, Sébastien Chapuis pushed me to spend more time thinking about football while thinking about data, and I think those projects were influenced by that.
Aside from that, I'm somewhat proud of spending a long time thinking about defensive statistics. Between the archive of my old blog and my still-existing newsletter Get Goalside I probably have a fair claim to having the most writing about defensive football statistics on the internet. Top four, at least. But we all know it's about chance quality, not chance quantity. It may well be that the output is a bit "André Villas-Boas's Tottenham".
How would you rate the progression of analytics in the last 10 years, in terms of both research and application?
MT: I dunno, does this assume I know what fancy stuff the big guns with well-funded research departments are doing? 'Good', probably?
I think the concept (and modelling) of pitch control, which I think happened (publicly) in the mid-2010s, was a big step. Units of value are useful (which, as a sidenote, is why I think expected goals has had such a huge influence), and having a unit of value for player positioning... it just seems like such a fundamental building block to me.
About the application, I think you can track the progression by the stuff that people are moaning about. A decade ago it was... well, I'm not even sure what it was a decade ago, for a lot of people who came through StatsBomb it will have all been very new and exciting in 2013. Then came the frustration at The Establishment not listening to us. Then about how 'just hiring one person isn't enough, they need to be integrated into the club properly'. Then more recently people have flagged data engineering as being of big importance.
Data is definitely being applied, but I guess that doesn't necessarily mean analytics is - that requires more time for analysts, genuine statistical skillsets, and (let's be honest) salary figures which recognise these skillsets in addition to the unsociable and hectic hours that a data scientist at a club is likely expected to work, which they wouldn't in a regular job.
What are you most excited about in the future for football analytics?
MT: Oh, so much.
Data used more for player training: sure, you can buy players, but it seems better for everyone if you not only bring in good players but you improve them as well.
Skeletal data: More data isn't necessarily more useful, but the question was about exciting, and it's definitely exciting.
The community: This is twee and all, but it's important and good and exciting that the people doing all this stuff keep going and growing. This is gonna sound rich and easy to write, but the most meaningfully positive thing about the increased usage of football analytics in clubs and media might be that people outside the traditional demographics see it as more accessible. It's been noted in the NBA that there's a dynamic of mostly Black players and mostly white analytics staff, and it's not that different in football either. There are people in the industry who care about this and other inclusion issues, but 1) I think it's inescapable that seeing a peer doing something tends to have a significant impact 2) I dunno how much is actively being done (including, I must admit, by myself) to address things. Women in Sports Data is a US-based org worth knowing about if it applies to you; please spam me with links to any similar ones I can share or help support.
Analytics as opportunity: There are two aspects of this - data and analytics doesn't necessarily require physical proximity, which opens the world up. There are people all across the world who love football but never had a footballing infrastructure to participate in the professional game through playing, but maybe they can through analytics, and maybe these places can use data to help turbocharge interest and investment in their domestic competitions.
This also applies with women's football, which has been let down or actively held back, to varying degrees, globally. The improvement in player quality (partly/largely driven, I think, just by a broadening in playing opportunity) could take years, a generation for policies to kick in, but the use of data doesn't need to.
Even if you decide that women's leagues need different data models to men's leagues, large providers have enough data to run those experiments and models - a possibility for implementation that's much quicker than research into injury prevention techniques (which are also needed). The obstacle is only ('only') cost. Yes, Premier League clubs with WSL teams currently spend more on agents' fees for their men's teams than their entire women's team operations, but the possibility that a fancy app with a pitch control model could be used in both leagues, right now, is very exciting to me. Even more so if it's already happening.
Our sincere thanks to Mark for giving up his time to share his experiences with us. You can find him on Twitter @TweetsByMarkT, or through his Get Goalside newsletter.
We’ll be back next week with more from the StatsBomb Originals series.