We Want to Hire You

StatsBomb is currently experiencing explosive growth and has sailed through the startup phase straight into the scaling phase. 

In order to do that, we need great people.

Like you.

Why Should You Work At StatsBomb?

Because you love new challenges.

We are shaping the future of data in sport. That includes new technology, new visualisations, and completely new ways of thinking about the game. That is challenging work that changes on a regular basis, but if you are the type of person that loves figuring out new things, StatsBomb is a great place to be.

Stock Options.

Nearly every one of our employees receives stock options in their first year on the job. We believe strongly that our employees deserve a piece of the business they are helping to build, and our compensation plans reflect that.

Our revenues have more than doubled in each of the last two years. If we can keep this up - and honestly, we are just getting started - it’s easy to see how those options can turn into significant additional earnings over time.

Space to grow

One of the best things about working in a young startup that is growing is that new positions open up all the time. This means employees who excel have plenty of scope to move up in the organisation as the company grows.

Speaking of space, we are about to move into a brand new office before the end of the year, just outside the train station in Bath, fully equipped with a kitchen, free coffee, plus bicycle storage and showers.

Only 3 Days in the Office per week.

We hire highly motivated individuals, and in return we are able to offer huge work-life flexibility to our employees. Gathering the team together remains important, but most of our employees find quiet working days at home hugely valuable. Office hours are also somewhat flexible, removing much of the daily stress of commutes, school runs, and the like that you get at stricter companies.

You want to work around incredibly talented people.

You can’t build a great company without a great team of people. We have that right now and already need more.  By joining our team, you get to work with some of the best people in the sports data field on a daily basis.  You also get to work with some of the biggest football clubs in the world.

Bath is gorgeous.

Bath is a UNESCO World Heritage city and one of the best cities in the UK for quality of life and work/life balance. It’s only 15 minutes by train from Bristol, 30 minutes by train from Swindon, and an hour from Reading and Cardiff.

Job Openings

Our careers page is a work in progress, but expect to see new job postings frequently in the coming weeks. In addition to the 2 Junior Front End Developer roles and the Accounts Administrator role currently listed, we will also have full-time positions for

  • UI Designer
  • Digital Project Manager
  • Computer Vision Developer
  • Graphic Designer
  • Junior Data Scientist
  • Quantitative Football Analyst

If you have ever wanted to come work at StatsBomb, now is the time. Even before a job description is posted, if you think your skill set fits one of these titles, please send a CV to careers@statsbomb.com. To fill these roles, you will need a valid UK work permit and to work in Bath three days a week.

Ted Knutson

CEO, StatsBomb

A Sneak Peek at IQ Tactics + A Brief History of Radials/Sonars/Wagon Wheels in Soccer

Our summer project in StatsBomb IQ is something I have wanted to develop and release for more than 18 months, but development work on the project was sidetracked by becoming a data company. Unintended consequences and all that jazz. Anyway, it looks like we will release this new section of StatsBomb IQ as a beta next week, and we are calling it IQ Tactics. I’ll give you all some brief previews of the new module toward the end of this article, but first we’re going to talk about wagon wheels.

No, not THAT wagon wheel… these.

Why are we talking about wagon wheels? A fair question, and one I am glad you asked. The answer is partly because we have incorporated them into IQ Tactics for some good reasons, and partly because I’m a nerd who feels the need to cite and credit past examples and influences when producing new things. 

So what is a wagon wheel? It’s a cricket vis that shows where batters have hit the ball around the pitch. You can use them to better place fielders, find batter tendencies, or for various and sundry other reasons specific to the game of cricket.

They make a lot of sense for cricket, because unlike nearly every other sport, cricket’s primary battles take place in one central spot of the pitch (okay, technically two) surrounded by a circular surface. Obviously football is played on a rectangular pitch and has no consistent central points of origin - why are we talking about this type of vis at all? Well, because passing data is a lot. Like, a lot, a lot. You can’t just map the data and have it make any sense because there is too much of it. This is sort of true for a single game, but especially true when it comes to mapping a high volume passer, or even a low volume team across a stretch of games or an entire season.

 



Here are maps of three different aggregations of passing data. Red represents one completed pass, yellow represents an incomplete one. The first map is Manchester City across a single game. The middle is Marco Verratti across the whole of last season. And the last one is Burnley from last year. As you can see, the last two kind of stretch our ability to make any sense of what is happening apart from the colors here suggesting that Marco Verratti is considerably better at completing his passes than… um… Burnley.

So what do you do? Well, lots of things are possible, but from a process perspective, you need to take all of this highly granular data and abstract it in a way that can be interpreted. Traditionally we use heat maps or zone maps to help here, but like all vis, these have their own strengths and weaknesses. These are zone maps from Engine Room in StatsBomb IQ. They allow us to compare passing tendencies when the ball gets to a particular position in the pitch. In this case, I’m comparing Manchester City and Cardiff City from last year.

This vis shows where the NEXT pass typically goes for both teams. Notice the difference between how often either team plays the ball wide vs central, or in Man City’s case, directly backward from the zone they are in.

 

And this vis shows where the buildup pass came from. An entertaining 4% of all passes played into the zone directly outside the 18-yard box for Cardiff City came directly from the GK, while practically none of the passes Man City played into that zone came from their own half. Good ol’ Neil Warnock, out there tacticsing the place up. Anyway, these are fine and zonal or topographical heat maps are probably better, but for our new module I wanted to explore the radial/wagon wheel/sonar vis style and see what we could do with that.  As noted before, football differs from cricket in that it has no fixed origin point, but what if you made the origin of the pass the central point of the vis, and then looked at all passes from that perspective? These types of plots have been around in football/soccer for quite some time.

The first time I remember seeing them was a link from Howard Hamilton pointing to some random Chelsea blog doing the visualisation work in Tableau. Now this was before I was doing any work in football analytics (I was in gambling back then), so I didn’t pay much attention other than to mentally note, “hey this thing exists,” before totally forgetting about it again. It turns out that piece was written by current interim editor-in-chief of SB Nation, Graham MacAree. Before he was being full-time obnoxious to Zito Madu (a noble cause, if ever there was one), Graham was creating unique data vis for Chelsea fans on We Ain’t Got No History. Two things continue to impress me about Graham’s foray into radial passing plots.

  1. He did this in 2011.
  2. The bloody thing STILL WORKS.


So yeah, Graham is very clever and has been for a very long time, and if you ask him, he will tell you all about it. The next time I remember seeing this type of vis was in David Sumpter’s FourFourTwo pieces circa 2015.


Sumpter took a zonal approach to wagon wheels on the pitch that he called a “distribution map.” Longer lines meant longer passes on average from that zone, and he used a black-to-white colour scheme to indicate how common passes were in each direction, with black meaning very common and white meaning uncommon. With Graham’s design, this colour scheme isn’t necessary, but the single line scheme quickly gets overwhelmed as the season progresses. The next iteration of these I saw came from Ben Torvaney in April 2016. He used radial shards from a single zone and analysed Middlesbrough’s passing by game state to see how aggressive or conservative they had been. It’s a lovely wrinkle to the analysis and a very readable blog post.

The aforementioned Howard Hamilton circled back on these in late 2016 with a very faithful cricket-style vis.


Finally we get to Eliot McKinley’s work on Twitter and the American Soccer Analysis blog. The ASA guys have been quietly innovating different vis approaches for years - partly due to a willingness to just try shit and a lot because they are smart - and Eliot’s passing “sonars” are by far the most attractive version of the radial/wagon wheel vis that I have seen. The initial ones I saw were the positional sonars like the one below…

 

I looked at this and started thinking about zonal versions like what Sumpter and Torvaney had done to help look at game model information to make things more easily interpretable. Discussion around the concept even ended up in my Barcelona Coaches Summit presentation.

So even though the vis wasn’t there yet, the application was now fairly clear in my mind. The idea stemmed back to some old work Oliver Gage described in writing about his coaches’ game model when he was an analyst at University of Virginia. I’m paraphrasing here, but it was ideas like, “How often did we get the ball into these various zones? What did we do when we got there? We need to pass the ball forward x% of the time when we get into these spaces in order to put pressure on the opponent and have a chance of success.” This felt like one of those situations that these types of plots were made for. Eliot also started doing zonal plots around the pitch, though he uses different sizes for his zones than we do at StatsBomb. For our own versions, we ended up scaling shard length against league average for the spatial component, and the colour component is just raw pass completion percentage by default. However, we’ve added a number of other options that may still see release for IQ users to tweak to their liking. We’re also keeping the name “sonars” out of respect for Eliot’s gorgeous work. He will tell you himself that he wasn’t the first to try this style of vis on passing data (and MacAree probably wasn’t either, he’s just the first I am aware of), but Eliot’s are certainly the best versions and he deserves the recognition. Though we may choose to give the throw-in specific “thrownars” a miss...
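For the curious, the basic mechanics of a sonar can be sketched in a few lines of Python. This is an illustration only, not the IQ Tactics implementation: the tuple layout for a pass, the number of angular bins, and the reading of "compared to league average" as a ratio of pass shares are all assumptions made for the example.

```python
import math
from collections import defaultdict


def pass_sonar(passes, n_bins=8):
    """Bin passes by direction and summarise each angular bin.

    `passes` is an iterable of (x_start, y_start, x_end, y_end, completed)
    tuples. That layout is an assumption for this sketch.
    Returns {bin_index: (share_of_passes, completion_rate)}.
    """
    counts = defaultdict(int)
    completed_counts = defaultdict(int)
    total = 0
    bin_width = 2 * math.pi / n_bins
    for x0, y0, x1, y1, completed in passes:
        # Treat the origin of the pass as the centre of the plot.
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # The extra modulo guards against an angle landing exactly on 2*pi.
        index = int(angle // bin_width) % n_bins
        counts[index] += 1
        if completed:
            completed_counts[index] += 1
        total += 1
    return {
        index: (counts[index] / total, completed_counts[index] / counts[index])
        for index in counts
    }


def shard_lengths(team_sonar, league_sonar):
    """Shard length as the team's pass share relative to the league's share."""
    return {
        index: share / league_sonar[index][0]
        for index, (share, _) in team_sonar.items()
        if index in league_sonar and league_sonar[index][0] > 0
    }
```

In this reading, a shard length above 1 means the team plays in that direction more often than the league does, and the completion rate in each bin is what would drive the colour.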

 



So these are seasonal sonars for passes made by Manchester City and Cardiff City. It's a heatmap scale where deep red is a very high level of completion and blue is a comparatively low completion percentage.

And then these are sonars for Man City’s passes (again) but the second image is passes from Manchester City’s opponents.

 

You can also do player seasonal sonars like this one for Lionel Messi, Barcelona 2018-19. We also can go from abstraction (with the sonars) to explicit data with a simple click of the button.

 


My tweet of Ederson’s goalkicks compared to PSG’s went viral last week while I was playing around with the new tool. Then came a flood of requests for other comparisons. Here are Ederson, Alisson, and Manuel Neuer’s goalkick maps from last season.



Even when compared to other famous goalkeepers, Ederson - and how Manchester City use him - really is something else.

And the new Tactics tool can do this type of vis for any team or player, with dozens of potential filters added on. And this is just the sonars section, which is probably the smallest and least powerful part of the new release. *deep breath* This is already long, but I’ll give you just one more teaser of what's coming next week before I wrap up. Let’s say you wanted to look at Virgil van Dijk and those raking crossfield balls he plays for Liverpool. First you load up VVD’s profile. Then you click passes, and you select a start location in his own half and an end location in the wide zones of the pitch.  Voila!

Okay, now just show me the ones to Salah and Mane.  And finally… just show me the passes he made with his left foot compared to those made with VVD's right.
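If you think of each click as a filter stacked on top of the previous one, the workflow above looks roughly like the Python sketch below. Everything in it is hypothetical: the `Pass` record, the 120 x 80 pitch coordinates, the width of the wide channels, and the sample passes are assumptions for illustration, not the IQ Tactics internals or real data.

```python
from dataclasses import dataclass

# Assumed pitch dimensions, with the team attacking left to right.
PITCH_LENGTH = 120.0
PITCH_WIDTH = 80.0
WIDE_CHANNEL = 18.0  # hypothetical width of each wide zone


@dataclass(frozen=True)
class Pass:
    passer: str
    recipient: str
    foot: str  # "left" or "right"
    start: tuple  # (x, y)
    end: tuple  # (x, y)


def starts_in_own_half(p):
    return p.start[0] < PITCH_LENGTH / 2


def ends_in_wide_zone(p):
    y = p.end[1]
    return y <= WIDE_CHANNEL or y >= PITCH_WIDTH - WIDE_CHANNEL


def to_any_of(*names):
    return lambda p: p.recipient in names


def with_foot(foot):
    return lambda p: p.foot == foot


def filter_passes(passes, *predicates):
    """Keep only the passes that satisfy every predicate."""
    return [p for p in passes if all(pred(p) for pred in predicates)]


# Made-up passes, purely to exercise the filters.
passes = [
    Pass("Van Dijk", "Salah", "right", (30, 30), (85, 72)),
    Pass("Van Dijk", "Mane", "left", (35, 45), (80, 8)),
    Pass("Van Dijk", "Wijnaldum", "right", (40, 40), (55, 40)),
    Pass("Van Dijk", "Salah", "right", (70, 30), (100, 75)),
]

crossfield = filter_passes(passes, starts_in_own_half, ends_in_wide_zone)
to_forwards = filter_passes(crossfield, to_any_of("Salah", "Mane"))
left_footed = filter_passes(to_forwards, with_foot("left"))
right_footed = filter_passes(to_forwards, with_foot("right"))
```

Writing each filter as a small predicate keeps them composable, so adding one more condition is one more argument rather than a new query.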

 


IQ Tactics will change how coaches and analysts work with data from a tactical and opposition scouting perspective. Our goal is to take this information and make it as simple and intuitive as possible to deliver insight that helps our customers win games. I helped design the thing and I still can’t quite wrap my head around all of the cool stuff it can do. It is genuinely that exciting.

Anyway, it goes into Beta release on the StatsBomb IQ platform next week for our customers. If you are interested in having a demo of the new toys AHEM tools, ping sales@statsbomb.com to get started.

--Ted Knutson CEO, StatsBomb ted@statsbomb.com

PostScript: If you find yourself wondering what can be done with IQ Tactics and the Messi Data Biography, you are in good company.

Introducing the Lionel Messi Data Biography

Scene: StatsBomb Strategy Meeting, Autumn 2018

“Should we release a new men’s data set soon? People seem to really enjoy the World Cup.”

“Probably? We’re definitely going to release the Women’s World Cup next summer, and we’ll put it out daily.”

“Oo, I like that.”

“Okay, but back to the men’s side… what could we release that would matter?”

“We could do a season of Premier League data. That would certainly get eyeballs.”

“Nah, Manchester City already did that in 12-13.”

“Really? Man, where did that data go? No one even knows that.”

“It’s not a terrible idea, but maybe we can do better.”

“What about a season of La Liga? I feel like that market has been under served and deserves some love.”

“Not bad. Maybe you do 17-18 so you get both Ronaldo and Messi in it.”

“That seems fine, but I’m still not excited.”

“I have this idea for some older matches. The Manchester United treble turns 20 this spring. It would be really interesting to collect some of that run and do the analysis of those games in a modern light.”

“Oo, I love that. Let’s do it.”

“Yeah, but it’s still not enough data for a public release. People want something they can sink their teeth into.”

“What if we release the last two seasons of Cristiano Ronaldo and Lionel Messi and really compare from a data perspective? They are clearly the two best players of all time.”

“Ronaldo, yuck.”

“One or two seasons of Lionel Messi isn’t cool.”

“You know what would be cool? ALL of Lionel Messi.”

*silence*

“Oh shit... that’s never been done. Messi started in like 2005 - I don’t think data companies produced x/y data that early.”

“Can we even get the video on that?!?”

“Let’s find out!”

And that is how the Messi Data Biography began. Getting the video was an enormous pain in my ass. None of the usual video platforms have video anywhere near that far back. We then talked to friends, clubs, and former media rights holders for months trying to track down all these matches. I pulled every string I could think of and we were still only able to get to about 90% completion, hitting a hard wall with the last 10%. Right about the point where we were going to buy DVDs off eBay in the hope that we could fill in as much as possible, Pablo Rodriguez found a super-fan video archivist, and this source filled in all the missing matches. You all owe Pablo many, many drinks for his service. I’ve basically been floating on a cloud ever since.

So what is the Messi Data Biography? Quite simply, it is a data archive of every match Lionel Messi has played in La Liga since his career began in 2004-05.

We collected all of this data with our own time, energy, and crossed eyeballs (the old video is really poor quality) over the last few months as a kind of passion project. At this point, every single member of the StatsBomb and Arqam team has contributed, and I can only thank them for all of the hard work getting us to this point.

The MDB exists on the top tier StatsBomb Data spec for the 18-19 season, so despite the fact that these matches occurred as far back as 04-05, the data is the same incredibly rich event data our Champions League customers use right now.

It was expensive. It was painful. It is… fucking brilliant.

Messi's first senior goal? We've got that. Messi's entire Pep career? That too. The body count from all of the opponents Messi nutmegged in his career? Also in the data!

And we will be releasing all of this data TO THE PUBLIC over the next four weeks.

To recap: Free data. For the entire La Liga career. Of the greatest footballer ever.

The schedule from July 15-August 9 (a.k.a 'Messi Month') looks like this:

Monday - Analysis from each set of seasons will be published on StatsBomb.com and by our media partners.

Tuesday - That same data will be released to the public for non-commercial use. The first Tuesday, we will also publish our own R primer written by Euan Dewar to help people who are new to the data and R get started.

Wednesday and every other day after - You get to analyse, visualise, and simply play with the data yourselves.

This is our gift to football. We hope you enjoy.

--Ted Knutson
CEO, StatsBomb
ted@statsbomb.com

P.S. I know there will be soooooo many questions people have. I may put out an FAQ next week to answer the bulk of the important ones. (Like will you release all of Messi's CL data, etc etc etc.) For now, just enjoy your weekend!

P.P.S. I said I wouldn't leak until July 15th. I didn't. This isn't a leak. This is an ANNOUNCEMENT. It's like, a totally different thing.

NOTE: If you wish to use any data from the Messi Data Biography for commercial purposes, please send an email to Sales@StatsBomb.com

StatsBomb Data, One Year On

One year ago today, I stood in a lecture hall in South London, waiting for StatsBomb’s launch event to start. The Data project was secret, and had been under wraps from the outside world since inception. This event was the culmination of nearly a year of work.

It was also probably the biggest personal and financial risk I had taken in my career.

Needless to say, StatsBomb Data came as quite a surprise to the data world. That's because StatsBomb Data wasn’t supposed to exist.

It wasn’t supposed to be possible to build the infrastructure necessary to produce detailed event data at a higher spec than Opta and the other competitors in the space - at the quality we knew we wanted - without far more time, far more money, or both.

But there I was, happily setting up the room with our team as we prepared to officially announce our new baby. One of my favourite things in the world is releasing new products to an audience, but this was special.

A year later, one question I often get asked is why? Why would we take all this risk to go after a market that already existed, and one where one giant company had developed a near monopoly on top quality data?

Because someone needed to do it better.

The need to improve our understanding of football demanded it, and it was pretty clear that none of the major suppliers were going to deliver a better product. I know, because I talked to a number of them about it.

“Hey, what about this?” Silence.
“How about collecting this new thing?” Silence.
“We are a paying customer and have this problem - could you, I dunno, answer your customer service emails?” Silence.

In the end, I came to the conclusion that StatsBomb could do it best. So we did.

And boy was it hard. Like, not from a technical perspective - that was fairly straightforward. Collecting data from video has been around for ages. Adapting the software to allow collectors to add new events and qualifiers was not terribly difficult, and we had a great partner in Arqam FC (now part of StatsBomb) to help us pull it off.

But from a design perspective? A process perspective? A quality perspective? Really, really hard. So many difficult choices were made, challenging problems were solved. Many more problems had no solutions, they only had consequences. You can make it this way or that way, but neither answer is optimal. Apparently kids these days call this “adulting.”

“I don’t think we expected this, but you are now our most important data provider.”

That’s a recent quote from one of our Champions League teams, and a huge compliment to the team at StatsBomb and Arqam for what we developed.

A compliment of a different sort is that teams who are on StatsBomb Data right now improved their points totals by 20% versus a year before. Better data = better analysis = better performances. It's something I hoped for when I started this project, but it's been pretty amazing to see it play out in the real world.

As I said, this stuff is hard, and often in ways I didn't even expect. We’re still not perfect, but like the teams that are our customers, we work our asses off every day to get better.

In fact, a lot has happened in a year. Maybe it’s best to review how StatsBomb has changed as a company (and a website) in that time.

Flashback one year: On May 9th, we introduced our new Data to the world. About an hour before our launch, Opta threw this tweet out. What curious timing.

Followed by this one a day later, and a blog explaining their new qualifiers.

Thanks for keeping us on our toes, folks! And for designing a data upgrade that can apparently only be interpreted via 3x3 matrices before disappearing from the world again. This announcement came as a huge relief to us because it proved to be so far inferior to what we had developed that we now knew we had a chance to succeed.

In contrast, StatsBomb customers know how much pressure a shooter is under from exactly how many defenders and the GK, in which locations, on every shot. This information has been available since we launched. We have even documented our research in this area extensively, here, here, and here.

Back to our story… new data also meant we were no longer shackled by restrictions from our old data provider, which meant we could turn StatsBomb.com into a place to post analysis and insight from whomever we wanted, on whatever topics we wanted, five days a week. So that's what we did. We recruited Mike Goodman to both write and develop content, while allowing a new crop of talented writers to show off so many cool new things.

For example, we started to profile "pressures", something we now view as the basic unit of defensive activity and an event that is unique to SB Data. Our research shows having actual pressing data (not derived info, like our competitors) dramatically changes how you evaluate teams and players defensively. Why is Roberto Firmino amazing - he's a striker that doesn't score many goals? How can we better show just what made Burnley's defending in 17-18 so unique? Pressures unlocked this information in a way that simply wasn't available before.

Or how about pass height, which is unique to StatsBomb Data, and is a fascinating indicator of team style. It's also a key component in creating better pass difficulty models, which is a hugely important area of research in a game that is largely comprised of passes.

James Yorke recently wrote about pass footedness, which is also unique to StatsBomb Data. Why does this matter? Because with this information, we built a new passing model that lets teams fully profile the quality of player passing with each foot. Maybe one player is amazing with his right foot, but only attempts 5-10 yard passes with his left? That’s in the data. Maybe your coach demands a two-footed player who can make difficult passes with both feet? The model lets your recruitment department uncover those types of players easily.

And possibly the thing that got us the most attention that we released to StatsBomb IQ in the last year was our Goalkeeper Module. Because our data has the location of the GK and defenders on every shot, we are now able to evaluate GKs via data in a way that was never possible before. Data scientist Derrick Yam used this to question the fee Chelsea spent on Kepa Arrizabalaga at the season’s start, and his framework for GK evaluation was accepted for poster presentations at the massive Sloan Sports Analytics Conference in Boston.

Beyond the public-facing research, we have also done unique and fascinating customer-only research. This includes detailing the new expected goals models, more information on GK evaluation, a groundbreaking analysis of [REDACTED] that I hope to be able to talk about in the future, and a recent study of how the Danish Superliga has changed over time, plus how it compares to bigger leagues like the English Premier League and German Bundesliga. This is above and beyond the typical head coach and player research we produce for our consulting customers on a monthly basis.

Behind the scenes, we produce regular white papers for customers, detailing our research, and unlike most companies, we discuss in detail where our research has failed. We think this is hugely valuable for customers to see, partly so they understand what we are working on for the future, and partly so they can learn what approaches to modify or steer clear of. Data science is hard and data scientists are expensive. Saving your customers time by educating them on failed approaches is a hugely valuable service, but one you'll never see out front.

I designed the initial data spec, but I've still been somewhat shocked to learn that there are so many new, useful elements inside of StatsBomb Data that it will take us years to explore it and learn what it has to teach us. However, as noted above, customers are already using it to succeed.

Speaking of teaching, we also recently launched our first analysis courses to help anyone interested in the sport better understand how it works through our research. The introduction course is suitable for literally anyone who likes football, while the Set Pieces courses are geared for coaches and analysts who want to learn more about this phase of the game.

Although it is increasing every year, data use in football is still in its infancy. I decided we needed to get out there and teach the information to the masses, and I think what James and Euan have produced with these courses is both unique and exceptional. If interested, you can find more information here.

So That Was the Past Year, Wrapped Into a Tiny Bow - What’s Next?

We will keep getting better. We recently announced data upgrades for next season, including a video explanation of the new stuff we are collecting and why. We are pretty sure Shot Impact Height will improve the accuracy of expected goals models, so we have incorporated that into our data collection. We have also added body pose information about GKs on every shot and save into the data spec. From a football perspective, GK position on the pitch isn’t just a dot of x,y information, and our data will now convey a lot more about what GKs are actually doing when it comes to shot stopping.

We also have exciting new things coming to StatsBomb IQ, including a tactical suite that will change how coaches and analysts are able to use data to analyse their own teams and their opponents. It’s the culmination of our own work in football combined with years of talking to coaches and analysts about how they look at the game, combined with understanding how data can improve that process. There’s no hype when I say our product will be great, and there is nothing else like it. Expect to see more information on this product as European teams get into their preseason camps.

We will also continue to release free data to the football world. The FIFA Women’s World Cup data will go out to the public daily during the World Cup itself, and we will keep producing FAWSL and NWSL seasonal data for free. And... there will be a new release of free men's data beyond the 2018 World Cup that's already there, but I don’t want to say any more because I don’t want to spoil the fun.

Another thing that will happen this year is that our competitors will continue to try and copy us. That’s just how business works.

They copy, we innovate. They market, we produce. They appear in media…

We change football.

If your team isn’t using StatsBomb products and services right now, you’re already behind. And given how quickly we are releasing new things that help our customers perform better, that is the one thing that will probably not change in the year to come.

--Ted Knutson

CEO, Co-Founder
StatsBomb
ted@statsbomb.com

Set Pieces Remain An Underutilised Gamechanger

This week I was lucky enough to present a comprehensive analysis of the Danish Superliga to an audience of 300 coaches, analysts, and administrators in Danish Football. The report was commissioned to not only analyse how the league has changed over the last five seasons, but also to benchmark it against the German Bundesliga and English Premier League. Our analyst Euan Dewar did a great job on the analysis and preparing the report, and it was fun to once again be in a packed room of football people, discussing data analysis. My understanding is that the entire report will be made available to the public at some point in the future.

StatsBomb do this type of analysis for clubs, federations, and governing bodies fairly regularly, and it’s a huge compliment to be trusted to produce honest, insightful analysis about the game.

One thing that was absolutely clear in the report was that Danish teams remain innovators in one specific area: set pieces. Danish teams consistently score more goals from set pieces than teams in pretty much every other league in the world, including leagues with considerably more money and more talent. (For more analysis on this, check out my earlier piece I Think We Broke Denmark.)

Let me also make something else clear - more goals are not being scored off set pieces because the defenses are bad at defending this phase of the game. More goals are being scored because a number of Danish teams are simply better at executing them. And they are better at executing because they do things differently.

What are the differences? First of all, they shoot more often off direct free kicks.

This might seem a basic point - OH GEEZ TAEK MORE SHOTS, SCORE MOAR GOALS RAAAAR - but they also score a higher percentage of those shots. Danish teams convert 8% of their DFKs compared to 6% in the Premier League, and 5.7% in the Bundesliga. That’s a significant gap, and one that seems to suggest there is a lot of slack in execution for teams in the bigger leagues.
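To put a rough number on that gap, here is a quick back-of-the-envelope sketch in Python. The conversion rates are the ones quoted above; the sample of 100 direct free kick shots is an arbitrary figure chosen for illustration, not something from the report.

```python
def extra_goals(rate_a, rate_b, shots):
    """Extra goals that rate_a yields over rate_b across the same shots."""
    return (rate_a - rate_b) * shots


def relative_uplift(rate_a, rate_b):
    """How much higher rate_a is than rate_b, as a fraction of rate_b."""
    return rate_a / rate_b - 1


# Direct free kick conversion rates quoted in the text.
DENMARK = 0.08
PREMIER_LEAGUE = 0.06
BUNDESLIGA = 0.057

SHOTS = 100  # hypothetical sample size

for name, rate in [("Premier League", PREMIER_LEAGUE), ("Bundesliga", BUNDESLIGA)]:
    goals = extra_goals(DENMARK, rate, SHOTS)
    uplift = relative_uplift(DENMARK, rate)
    print(f"vs {name}: {goals:.1f} extra goals per {SHOTS} shots ({uplift:.0%} higher)")
```

On those figures, the Danish rate is roughly a third higher than the Premier League's and about 40% higher than the Bundesliga's, which works out to around two extra goals per 100 direct free kick shots.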

Alright, what else?

Danish teams also target and succeed at exploiting different spaces off corners. If you know the better positions of maximum opportunity and are able to deliver balls to those areas, you can score more goals off what is traditionally a low-return phase of the game. (Teams score off corners between 2 and 2.5% across the full data set. We have seen certain teams double or treble that for multiple seasons.)

And…?

Well, remember how Andy Gray mocked Liverpool hiring a long throw coach?

Look ma, nearly free goals! (Approximate value in the Premier League, £2.5M each.)

Only possible in Denmark? Nope:

Find the edges, then exploit them. One team in Liverpool is suddenly scoring a bunch of goals from long throws. The other one hired our favourite long throw coach--Thomas Gronnemark.

Set piece execution is one of the main reasons Liverpool are having their greatest ever Premier League season. We have Liverpool scoring 17 goals so far in the league and conceding 6 for a goal difference of +11 in this phase of the game. Manchester City are +2 (9 scored, 7 conceded). Without that gap, the goal difference between the two contenders would go from a gap of 8 to 19, and there would likely be no title race.

The same is true further down the table as well. Given how tight the Top 4 race is right now, it’s entirely possible a difference of a few goals off this phase of the game could swing Champions League qualification for next season. When qualification is an automatic passport to tens of millions, and the least an English club will receive this season is £86m, any edge to traverse the gap or maintain participation is worth every penny of outlay. We’ll take some time to revisit this once the season ends.

A couple of notes before I wrap this up...

Set Piece Program
We are taking applications from professional teams that want to work with us on set pieces for next season. We only work with a couple of teams on this max every season, and are exclusive to one team per league. If you work for a professional team with significant budget (bringing you goals does not come cheap), please send me an email to ted@statsbomb.com. We will choose who we work with by the end of May, so if your team wants to be in the mix, now is the time.

Set Piece Courses
For everyone else, we have tickets available for three set piece courses in June in New York, London, and Los Angeles. The courses will be taught by me, and cover both process and execution of set pieces from a coaching and analysis perspective.

To my knowledge, no one else in the world teaches a course like this, and certainly no one who works with professional teams. I made the decision to teach this information to interested parties quite simply because I feel the game is ready to change, but needs more talented, educated people to carry it out. Part of my commitment to StatsBomb and its audience has been to teach people more about the game and how it operates instead of hoarding the info, and this once again falls squarely under that umbrella.

Links to buy tickets can be found here:

New York – June 2nd

London – June 11th

Los Angeles – July 7th

I hope to see a lot of you this summer.

Ted Knutson
CEO, StatsBomb
ted@statsbomb.com

Header Image Courtesy of the Press Association

Details on Our New Intro to Analytics and Set Piece Courses

A couple of weeks ago, we announced on our social media that we would be launching an Intro to Analytics for Football Professionals course here at our offices in Bath, England.

The idea behind the course is that analytics and data use is becoming more and more important in both the team and media spaces and there is currently a dearth of good places to learn this information from scratch. Coaches need this info. Analysts need this info. Pundits need this info (please jesus, let the pundits realise this). Future coaches and analysts need this info! Thus it makes sense for us to develop teaching material to fill the gap. There's a bit of risk here, because developing a full day of course materials is about a 6-week project for a single analyst, and honestly, we don't know if we're right. What if no one actually wants to attend this course?

On the other hand, it makes a ton of sense for us to teach it. We have been pioneers in this space since 2013, we have our own data for students to leverage, our own cutting edge analytics platform to use during courses, and have actually worked inside of football for teams both very small, and very very large. We also feel like there will be a shortage of qualified analysts for teams to hire as more transition from no data analysis to heavy data analysis, and we need to help pick up the slack.

The initial course offering of thirty slots filled in five days.

Okay then, demand question (mostly) settled.

We also received a flood of questions about when we would offer more courses and where we would offer them (London, Germany, Spain, the U.S., Australia, online)? Baby steps!

After about ten days of trying to find venues that made sense, we have now locked down space to host two fresh classes in London in June.

PLEASE SIGN UP FOR THE INTRODUCTORY COURSE HERE.

Along with the Introductory course we previously announced, we will also begin teaching a new course focused on Set Piece Design and Analysis.

First we broke Denmark. Now we're going to help all of you break the rest of football.

Our Set Pieces offering is a practical course designed for football/soccer coaches and analysts to learn how to get the most out of this undervalued phase of the game. In one day, we will present the building blocks for the success we have had executing set pieces at the professional level.

I don't believe there is anything else like it.

Given how many people ask me on a weekly basis if I can give them more info on how to improve their set pieces, and how many professional clubs have already expressed an interest in this new course, I suspect demand here will be high and spaces will fill quickly.

PLEASE SIGN UP TO THE SET PIECES COURSE HERE.

What's Next?

If the London courses fill quickly, we will begin looking for space to run additional Introduction and Set Piece courses outside the U.K. We will potentially do a U.S. tour this summer in major cities, plus Barcelona and Madrid (en español), and somewhere in central Germany, but everything depends on whether there is enough interest in these London courses to expand. You guys seem excited, but as usual, I could be wrong.

Alongside the next set of courses we announce, we will also build a new page on our website to better keep track of our education schedule.

Those of you who listen to our podcast also know that I teased the concept of building a Data-Based Recruitment course that we may do a few times a year, and it's possible we will do something very high end on data infrastructure, data visualisation, and programming somewhere down the road.

Football is in a significant period of change right now, and I'm genuinely happy StatsBomb is at the forefront of that change, while teaching people skills they will need to succeed in the future. Given the feedback we've had about our education initiative thus far, you guys are happy we're here too.

Ted Knutson

CEO, StatsBomb

ted@statsbomb.com

StatsBomb Elevates Their Industry-Leading Football Data Spec Yet Again

On May 9th, 2018 StatsBomb announced our new product, StatsBomb Data. Our football data features massive upgrades to the event data world including

  • Location of defenders and goalkeeper on every shot
  • Defensive pressures
  • Passing footedness
  • Pass Height
  • Ball Receptions

And so much more... On release, StatsBomb Data ended up with 60% more events per match than the competition. Our data is currently collected across 22 leagues and we plan to double the number of leagues we collect over the next 18 months. On the StatsBomb IQ side, we spent much of the last year unlocking the power of StatsBomb Data inside our analytics platform. Customers now have information about player and team defensive pressures where none existed before.

We also released an entire module focused on objective information that helps evaluate Goalkeepers, previously a problematic area of player analysis. Are you going to buy or sell a goalkeeper this summer? Then you really need to be on StatsBomb IQ. StatsBomb Data represents a paradigm shift in the football data industry.

Having been around this industry since 2013, I can tell you that you almost never see significant upgrades in event data specs, but we packaged a decade's worth of innovations into our launch product. But that was what we did last year... What have we done for you lately? Not content to already have the best data in this space, we introduced new upgrades.

Shot Impact Height

You know those crosses that are too high, but the attacker goes for the shot anyway and it glances off the top of his head as it’s vaguely looped toward goal? Those look the same in the data as a standing header with perfect contact. They won’t look the same with StatsBomb Data. We have added a z-coordinate to the start of shots so you’ll be able to tell at what height the shooter made contact. By doing this, we get more useful information about each individual chance and another small variable that we think will improve expected goal model performance. Those of you out there whose jobs do not involve improving the performance of expected goals models are probably like, “Whatever! This is boooooring.” I feel your pain. So how about this?

Goalkeeper Ragdolls

Introducing GK position information on shots has paid huge dividends when it comes to evaluating individual GK performance and positioning. However, we looked at what we were collecting and found a way to improve the information provided about goalkeepers in a massive way.

 

 

These are officially termed ragdolls because they are based on the dolls you see in ragdoll physics demos, but throughout design and development we have affectionately nicknamed them skellingtons. They capture the GK position at the start of a shot and at the point of a save/potential save in a way no company ever has before. And we will capture this information on every shot in every league we collect, from the English Premier League all the way down to League Two. These new upgrades plus a couple of other minor ones rolled out at the same time will give our data set twice as much information per game as our competitors. There is no extra charge on the new stuff to StatsBomb Data customers.

These upgrades to the data spec will start rolling out as part of our normal data delivery in March, and will extend backwards through all of our historic data. I think it’s been clear from the start that we’re a bit different from the other data companies out there. Our mission is to find innovative new ways to analyse and visualise the game, and provide our customers an edge over the competition. Watch this space - we’re just getting started.

Ted Knutson
CEO, Co-Founder, StatsBomb
ted@statsbomb.com

Introducing Goalkeeper Radars

If you pay attention to our social media, you know that we recently released the new goalkeeper (GK) module on our analytics platform StatsBomb IQ. This past weekend, phase 2 of the module went live, and included in that release were an awful lot of things, not least of which were the long-awaited GK radars.

Today I'm going to discuss what we've done with the GK metrics, why they differ from what you might see elsewhere, and why this is something people in football really need to care about. (Note: For those of you who want to know more about the framework we have chosen to analyse GKs, please check out my intro piece here.)

StatsBomb Data is Different

I have been working with player data in football since 2013, but I never bothered to do much work with GK data. It's not that I didn't think GKs were important - obviously they are. The problem was that I felt the data we had access to didn't add much insight into the job GKs actually do. Primary jobs for GKs consist of:

  1. Stopping shots
  2. Claiming crosses and high balls
  3. Distribution

When I was designing the data spec for our new data, I went around to most of the smart people I know in football and asked them how we could improve football data without widespread tracking data. We ended up with a long list of upgrades to what our competitors offer, but probably the most important element across everyone's list was the position of the GK on every shot. And the reason for this was that a big part of the GK's job is simply being in the right place to have the best chance of saving any particular shot.

Think of what you often hear in commentary when David de Gea is playing.

"It's not really a save, the ball just hit him and bounced off."

"Another shot right at him."

"Great reflex save from de Gea, but again the ball was right at him."

Being in the right position to make saves for a keeper is a huge skill, but you can't measure that if you don't have the data.

So we collected it, along with the position of all the defenders in the frame when a shot is taken, and we call them Freeze Frames.

(Credit for all the data science heavy lifting in the GK Module goes to Derrick Yam, who did great work on this one.)

Once we had enough shots, we were then able to investigate where GKs generally should be positioned on shots from any particular location in order to make a save and put that information into a model. We then use that model to evaluate each GK on each shot and produce two shot stopping metrics.

GSAA% - Goals Saved Above Average Percentage: How the Goalkeeper performed versus expectation. Calculated as: (PSxG - Goals)/Shots Faced

Positioning Error - How far from the optimal position for facing a shot the Goalkeeper is (on average).
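
To make the two definitions above concrete, here is a minimal Python sketch. The GSAA% function follows the formula exactly as written. The positioning error function assumes the optimal position for each shot is already supplied by a positioning model, which is not described in this piece. The function names and the sample numbers are made up for illustration.

```python
from math import dist


def gsaa_ratio(psxg, goals, shots_faced):
    """(PSxG - Goals) / Shots Faced, as defined above.

    Returns a ratio; multiply by 100 to display it as a percentage.
    """
    if shots_faced <= 0:
        raise ValueError("shots_faced must be positive")
    return (psxg - goals) / shots_faced


def positioning_error(actual_positions, optimal_positions):
    """Average distance between where the GK stood and the optimal spot.

    Both arguments are equal-length lists of (x, y) pairs, one per shot.
    The optimal positions are an assumed input from a separate model.
    """
    if not actual_positions or len(actual_positions) != len(optimal_positions):
        raise ValueError("need one optimal position per actual position")
    gaps = [dist(a, o) for a, o in zip(actual_positions, optimal_positions)]
    return sum(gaps) / len(gaps)


# Illustrative numbers only, not real goalkeeper data.
example_gsaa = gsaa_ratio(psxg=40.0, goals=34, shots_faced=120)
example_error = positioning_error([(0, 0), (1, 1)], [(3, 4), (1, 1)])
```

Read this way, a positive GSAA% means the keeper conceded fewer goals than the post-shot xG of the shots they faced would suggest.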

The next two metrics we produced focus on GK activity around the box.

CCAA% tries to answer how active GKs are at gathering claimables - high balls and crosses into the box that could be claimed.

The claimables model first defines the likelihood of a pass from and to a particular location being claimed and then evaluates GKs based on their activity. (This is made easier because StatsBomb Data also includes pass height, as you wouldn't generally expect GKs to claim ground passes.) Busy GKs that come off their line to claim lower xCL balls are graded higher than those who are consistently rooted to the goal line. The reason is that claims have some level of value in cutting out opposition chances, and GKs can be rewarded and penalised based on this activity.

(Note: There are a lot of additional technical details behind the scenes here that are only available to StatsBomb IQ customers right now.)

For GK Aggressive Distance, we wanted to look at how active GKs generally are at moving off their goal line to do football things. We investigate the distribution of the distance from goal for goalkeeper actions that are not passes, saves or claims. This includes clearances, interceptions, tackles and ball recoveries. This shows the presence a goalkeeper has further up the pitch and measures their defensive contribution in a manner more common to field players.
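
As a rough illustration of the filtering step described above, the Python sketch below keeps only goalkeeper actions that are not passes, saves or claims and summarises how far from goal they happened. The event format (a `type` string and a `distance_from_goal` number) is hypothetical, and the choice of mean, median and max as summary statistics is an assumption; the paragraph above only says the distribution is investigated.

```python
from statistics import mean, median

# Action types excluded from the metric, per the description above.
EXCLUDED_TYPES = {"pass", "save", "claim"}


def aggressive_distances(gk_events):
    """Distances from goal for GK actions that are not passes, saves or claims."""
    return [
        event["distance_from_goal"]
        for event in gk_events
        if event["type"] not in EXCLUDED_TYPES
    ]


# Made-up events for one goalkeeper, purely for illustration.
gk_events = [
    {"type": "pass", "distance_from_goal": 8.0},
    {"type": "clearance", "distance_from_goal": 14.0},
    {"type": "save", "distance_from_goal": 3.0},
    {"type": "interception", "distance_from_goal": 22.0},
    {"type": "ball_recovery", "distance_from_goal": 12.0},
]

distances = aggressive_distances(gk_events)
summary = {
    "mean": mean(distances),
    "median": median(distances),
    "max": max(distances),
}
```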

Finally, you get to the distribution metrics. Admittedly, these are more stylistic profiles as opposed to telling you whether a player is strictly good or bad at a skill set, but we chose these because we liked the insight they deliver in this area. In real world analysis, we produce something like twenty different distribution metrics in this area to dig deeper.

Pass into Danger% - Percentage of Passes made where the recipient was under pressure or otherwise in Danger.

Positive Outcome Contribution - How frequently is the player involved in sequences that soon resolve with a Positive Outcome.

Combine all of those into a visual plot with the outside ring as a top 5% cutoff and the inside ring as a bottom 5% cutoff and you get this:

If you have watched these GKs quite a bit over the years, these really do feel "right" in terms of profiling their skill sets. De Gea is great at stopping shots, but doesn't do that much with regard to coming off his line. Lloris is a solid shot stopper who remains very busy around his own penalty area.
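
For anyone curious how the 5%/95% ring cutoffs might translate into a point on a radar axis, here is a minimal Python sketch. It assumes a simple linear scale between the 5th and 95th percentile of a comparison population, clamped at both rings. The percentile interpolation method and the function names are my own choices for illustration, not a description of how StatsBomb IQ computes it.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile of a sorted list, with q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def radar_scale(value, population, low_q=0.05, high_q=0.95):
    """Map a metric to [0, 1]: 0 is the inside ring, 1 is the outside ring."""
    values = sorted(population)
    low = percentile(values, low_q)
    high = percentile(values, high_q)
    if high == low:
        return 0.5  # no spread in the population, so plot mid-axis
    return min(1.0, max(0.0, (value - low) / (high - low)))


# A toy population of 101 evenly spaced values, for illustration only.
population = list(range(0, 101))
middle = radar_scale(50, population)
above_outer_ring = radar_scale(200, population)
below_inner_ring = radar_scale(-5, population)
```

For metrics where lower is better (Positioning Error, for example), the axis would need flipping before plotting. That step is left out here.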

What about Chelsea's Kepa, who Derrick analysed early in the season as being largely average in most of our metrics?

And with our data, we now have detailed GK metrics for every league we collect, from the Premier League right down to League Two. Or MLS. Or Poland. Or your academy...

Goalkeeping is Unsolved

I hinted at this a little in my Barcelona presentation, but from talking to teams around the world, I get the impression very few understand goalkeeping from an analytic and training standpoint, and almost no one is closing the loop with regard to data driven coaching. I've been working with football data for nearly six years now, and it took us until now to build a framework we liked to evaluate GKs analytically. Because of this, there are just so many things we don't know.

  • How do GKs age? What does the age curve look like?
  • Does shot stopping ability - which appears largely stable - increase, plateau, and decrease at certain times?
  • Are shot stopping and positional error negatively correlated to claim activity and defensive aggression?
  • How do GK skills transfer from lower quality leagues to higher ones?
  • How do they transfer across top leagues?
  • Our model thinks David de Gea saved Manchester United thirteen goals more than an average GK would have last season. Is that type of elite performance sustainable?

And that barely scratches the surface. Not knowing things in sport is dangerous. It throws a random factor into every decision you make that could be tremendously costly down the line. But ignorance becomes way more dangerous when it shifts from "no one really knows these things" to "we're the only ones who don't know these things." If your opponents have better info, and you are the only sucker left on the block...

We designed StatsBomb Data to allow coaches and analysts to ask questions they never could before. And with StatsBomb IQ, we deliver powerful, easily understandable insights to answer those questions.

We're not just here to stop teams from making mistakes, though data is super useful for that. We are here to deliver info that makes teams better in every area of the game. Recruitment, self-analysis, opposition scouting...

And now goalkeeping.

--Ted Knutson

ted@statsbomb.com

@mixedknuts

PostScript

For good or for ill, next month is the five-year anniversary of the first player radars I ever created. For those who want a design history and defense of the visualisation format, relevant links are below.

The first terrible introduction article.

Understanding Radars for Mugs and Muggles

Defending Radars CASSIS Presentation - RADAR WARS. (Also an excuse to poke fun at Luke Bornn and Daryl Morey)

New Radars on StatsBomb Data

5 Easy Ways Data Can Give Football Clubs an Edge

In something a little different today, I'm going to discuss five simple ways data can help football teams gain an advantage. There's this idea among football's old guard that data is complicated and difficult, but the reality is, we try and provide useful insight that is easy to understand and interpret.

1) Corner Touch Maps

This is what we call a corner touch map. Marek and I designed it back in 2014 to help out with the set piece program, and it's probably the dumbest, simplest vis we'll ever build.

 

 

What it shows is the first touch by either team after a corner is taken.

Why?

Because I can show you a shot map of where teams have had shots off corners, but that only tells you about when they have been successful. These maps more clearly show their plan and - generally - their intended delivery zones.
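
The idea is simple enough to sketch in a few lines of Python. The version below assumes a time-ordered list of events where each corner appears as an event of type `corner`, and treats whatever event comes next as the first touch. The event fields and type names are hypothetical, and a real event feed would need more care (for example, short corners or a second touch by the taker).

```python
def first_touches_after_corners(events):
    """Return (team, x, y) for the first event that follows each corner."""
    touches = []
    awaiting_touch = False
    for event in events:
        if event["type"] == "corner":
            awaiting_touch = True
        elif awaiting_touch:
            touches.append((event["team"], event["x"], event["y"]))
            awaiting_touch = False
    return touches


# Made-up events in match order, purely for illustration.
events = [
    {"type": "corner", "team": "home", "x": 120.0, "y": 80.0},
    {"type": "header", "team": "away", "x": 112.0, "y": 44.0},
    {"type": "pass", "team": "away", "x": 100.0, "y": 40.0},
    {"type": "corner", "team": "home", "x": 120.0, "y": 80.0},
    {"type": "shot", "team": "home", "x": 114.0, "y": 38.0},
]

corner_touches = first_touches_after_corners(events)
```

Plot those (x, y) points on a pitch, coloured by team, and you have the basic map.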

Check out the map from right-sided corners from Manchester City last season.

 

 

This immediately tells you two things. First, they take a lot of short corners, and you need to be ready for those. Second...

 

 

City apparently only took outswingers from that side last season, and as a result, neither team had a first touch in the left HALF of the six yard box.

And honestly, if I am an opposing coach facing City, my life is nearly impossible as it is, so I am thanking little baby jesus for making my life much easier by allowing me to generally ignore marking that zone (unless there are runners) and overload the zones along the curve. This is just a tiny glimpse of how we use data to help execute set pieces at both ends of the pitch.

2) Arsenal's Left Lane

 

 

This is what we call a Defensive Activity Map. Teams are attacking from left to right. The vis attempts to profile where teams are making defensive actions (including pressures), and then compares their defensive activity in each area to the rest of the teams in the league. Zones where they make more actions than average are hotter, and zones where they have fewer actions are greyer or blue.
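
The comparison behind a map like this can be sketched in Python as follows. This is an assumption-heavy illustration: the 120 by 80 pitch units, the 6 by 4 grid, the per-match normalisation and all function names are choices made for the example, not details given above.

```python
from collections import Counter

PITCH_LENGTH, PITCH_WIDTH = 120.0, 80.0  # assumed pitch units
COLS, ROWS = 6, 4  # assumed grid resolution
ZONES = [(row, col) for row in range(ROWS) for col in range(COLS)]


def zone(x, y):
    """Grid cell (row, col) containing a pitch location."""
    col = min(int(x / PITCH_LENGTH * COLS), COLS - 1)
    row = min(int(y / PITCH_WIDTH * ROWS), ROWS - 1)
    return row, col


def actions_per_match(action_locations, matches):
    """Defensive actions per match in every zone (zero where there are none)."""
    counts = Counter(zone(x, y) for x, y in action_locations)
    return {z: counts[z] / matches for z in ZONES}


def relative_activity(team_rates, league_rates):
    """Team rate minus the league average rate, zone by zone."""
    n = len(league_rates)
    return {z: team_rates[z] - sum(r[z] for r in league_rates) / n for z in ZONES}


# A toy two-team league with made-up action locations.
team_a = actions_per_match([(10, 10), (10, 10), (110, 70)], matches=1)
team_b = actions_per_match([(10, 10)], matches=1)
diff_a = relative_activity(team_a, [team_a, team_b])
```

Positive values would be drawn hotter and negative values greyer or blue.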

Arsenal this season are slanted right, possibly because of personnel issues (left back injuries), but maybe as part of a plan? This type of vis doesn't deliver a magical recipe for how to solve/attack tactical issues, but it does help coaches and analysts ask interesting questions. As a coach, I go to the video and try to figure out what is weird. If I am an analyst, maybe I compare the success of attacks down Arsenal's left compared to the right/center and see if there is a vulnerability that way.

3) Similarity Scores

We use these a lot in recruitment, largely because it's easy to talk to coaches about who their ideal player for a position is as opposed to all of the precise things they need that player to do on the pitch.

Once you know which players fill their ideal archetypes, you can then dig into the data for what those players do on metrics you care about, and then plonk down a list of players to scout in the leagues you can afford.
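
How the similarity scores are actually calculated is not covered here, but one common way to do this kind of thing is to standardise each metric across the player pool and rank players by distance to the target. The Python sketch below shows that generic approach under those assumptions. The player names and numbers are placeholders, not real data.

```python
from math import dist
from statistics import mean, pstdev


def standardise(profiles):
    """Convert each metric to a z-score across all players."""
    names = list(profiles)
    columns = list(zip(*(profiles[name] for name in names)))
    # Fall back to 1.0 when a metric has no spread, to avoid dividing by zero.
    stats = [(mean(column), pstdev(column) or 1.0) for column in columns]
    return {
        name: [(value - m) / s for value, (m, s) in zip(profiles[name], stats)]
        for name in names
    }


def most_similar(target, profiles, k=3):
    """Names of the k players closest to the target in z-score space."""
    z = standardise(profiles)
    ranked = sorted((dist(z[target], z[name]), name) for name in profiles if name != target)
    return [name for _, name in ranked[:k]]


# Placeholder profiles: two made-up metrics per player.
profiles = {
    "target": [0.9, 5.0],
    "player_a": [0.8, 4.8],
    "player_b": [0.2, 1.0],
    "player_c": [0.5, 3.0],
}

shortlist = most_similar("target", profiles, k=2)
```

Narrowing to the leagues you can afford is then just a matter of filtering the player pool before ranking.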

Coach, who is your ideal wide forward?

"I want Lionel Messi."

(Seriously - this always happens. Every coach makes this exact same joke.)

And because we are indulgent number wonks who have this already set up in StatsBomb IQ, we can answer the question honestly.

The most similar players to Messi 17-18 in our current data set are:

Neymar

Messi (18-19 edition)

Eden Hazard

Raheem Sterling

and Nicolas Pepe, who has been on fire so far this year.

But the fun part of this is that you can actually narrow down the data to the leagues you can afford to buy players in and still have the exact same conversation.

Who is the Lionel Messi of League One? 2017-18 Bradley Dack, maybe? Or Conor Chaplin?

How about in the Austrian Bundesliga? Uh... Andrei Ivan?

Look, I'm not saying the data is always right in these situations, but shopping for the poor man's Messi apparently comes with serious limitations.

4) Evaluating Goalkeepers

On Monday, we released phase 1 of the Goalkeeper Module into StatsBomb IQ. It allows teams to profile goalkeepers statistically across a broad range of metrics that haven't really been available before because in other data sets, we never knew where the keeper was when a shot took place.

We were messing around with some of the visualisations in testing and came across this fun one for last year. David De Gea and Joe Hart faced almost exactly the same amount of xG in shots on target last season, but how that xG came about and what happened after that was dramatically different.

 

 

The vis above is broken into xG buckets, and you'll notice that the shots Hart had to contend with were generally much higher quality than those De Gea dealt with. Sadly, nearly every high xG shot Hart faced also made it into the back of the goal.

When it comes to analysing and evaluating GKs with stats, we're just getting started. Expect to see a lot more from us on this topic in the coming weeks.

5) Passing Tendencies at the Team and Player Level

 

 

TL;DR

Stats don't have to be complicated to deliver powerful, useful insight. And often the simple stuff is the most effective IF you know where to find it.

Ted Knutson

ted@statsbomb.com

@mixedknuts

Explaining xGChain Passing Networks

(Editor's Note: This was originally published on the StatsBomb Services blog, but the URL was lost in a server move. We have re-published it here so it can be referenced in future work.)
Some of the work we need to do on the StatsBomb Services side involves teaching people how to use what we create. If it’s not practically applicable and/or can’t be taught, then it’s just a piece of art, not analytics.
Today I’m going to discuss passing networks, with a specific emphasis on the xGChain passing networks you’ll find on the StatsBomb IQ platform and also on our Twitter feed.

What is a Passing Network?

It’s the application of network theory and social network analysis to passing data in football. Each player is a node, and the passes between them are connections.
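
In code terms, the basic structure is small. Here is a minimal Python sketch that turns a list of completed passes into nodes and weighted connections. The `(passer, recipient)` input format and the position labels are assumptions for illustration, and the connections are kept directed (A to B is counted separately from B to A), which is a design choice rather than a requirement.

```python
from collections import Counter


def build_passing_network(passes):
    """Build nodes and weighted connections from (passer, recipient) pairs.

    Returns (nodes, edges): nodes is the set of players involved and
    edges maps (passer, recipient) to the number of passes between them.
    """
    edges = Counter(passes)
    nodes = {player for pair in edges for player in pair}
    return nodes, edges


# Made-up passes using position labels instead of real players.
passes = [("LB", "LW"), ("LB", "LW"), ("LW", "ST"), ("ST", "LW")]
nodes, edges = build_passing_network(passes)
```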

The first time I saw them used in football was either a presentation by Pedro Marques of Man City at the first OptaPro Forum, or Devin Pleuler’s work at Central Winger on the MLS site.

We also used them at Brentford to do opposition analysis, specifically to find which players we might want to aggressively press whenever they get the ball, or looking at valuable connections between players we wanted to break.

The application is simple.

  1. Look at a bunch of recent matches for a club and you will often start to see patterns of play and interesting details you care about.
  2. Investigate a little further in the data to find usage information
  3. Go to the video and see what shakes out.

In many cases, analysts only have time to watch and analyse the last 3 matches of opposition on video. Using the passing networks gives them quick info in an easily digestible format that doesn’t cost them an extra 10-20 hours of video time.

Before we go any further though, I think it’s important to speak about the limitations of passing networks. These are a tool and meant to be part of an analytics suite to help you analyse games, but like any tool, you need to understand their weaknesses.

First, each node consists of the average location of a player’s touches. If they switch sides of the pitch regularly, their average will look central, even if they never touch the ball in that area. This is a limitation of the vis and why we ALWAYS use video to back stuff up. On the other hand, if you want to stay data-based, you could use things like heat maps, or even dot touch maps for every place a single player touched on the pitch to get more accuracy. This is a bit like using shot maps to supplement aggregate data in player radars to get a clearer picture.
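
The averaging problem is easy to see with a toy example in Python. The touch coordinates below are made up, on an assumed pitch that is 80 units wide: a wide player has two touches near one touchline and two near the other.

```python
from statistics import mean


def average_location(touches):
    """Average (x, y) of a player's touches, which is where the node is drawn."""
    xs, ys = zip(*touches)
    return mean(xs), mean(ys)


# Two touches near one touchline (y around 5), two near the other (y around 75).
touches = [(70.0, 5.0), (72.0, 6.0), (71.0, 74.0), (69.0, 75.0)]
node = average_location(touches)
```

The node lands at y = 40, right in the middle of the pitch, even though the player never touched the ball there.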

The second limitation is that this info is an extrapolation of what actually happened. Did the fullback pass 15 times to the left wing, exactly along the path in the vis? No, of course not. That information is also easily visualized, but it’s just not contained here.

The third limitation is that these don’t actually explain that much by themselves. They take snapshots of actions through a match and combine them into a bigger picture. It’s like a movie where you only see 20 of 50 scenes without seeing the whole thing. Sometimes, you’ll end up with a clear idea of the plot. Other times, you are going to be really surprised when your friends start talking about the whole Verbal Kint/Kaiser Soze thing. They are still useful, but this is another reason why - in practice - we almost always pair this analysis with video work to complete the picture.

Design Stuff

Right, so we have passing networks. Some people do them vertically. We do them horizontally.

Why?

For starters, most humans are accustomed to looking at football matches left to right. High angle tactical cam footage from behind the goal is quite useful if you can get it, but the vast majority of the audience views football in a left to right perspective.

The next thing you notice is that we stack ours on top of each other. This happened as a bit of a happy accident where I noticed a pressing team had a map very high up the pitch. I then put the map from their opponent underneath, and voila! we had a fairly clear view of territoriality in the touch maps.

If you take a step back, it seems fairly obvious, right? There are two teams on the pitch, and each of their actions impacts the other one, so visualize both together. However, actions between two teams aren’t always linked. The shot locations of one team don’t have any impact on the locations of the opponent. Passes do though, so at least in my opinion, pairing them as part of this vis makes sense.

We also have them both going the same direction, which seems to strike some people as odd. All I can tell you is I think the territory element is much clearer if they go in the same direction, but people are welcome to test their own implementations and judge for themselves.

What else do we have… ah yes, the big difference: colour.

With passing networks, there is a real danger of adding so much information that your vis basically becomes unusable. It’s an incredibly info-dense visualization to begin with, so adding more elements is likely to make understanding what you are trying to display harder instead of easier. I think Thom walked this tightrope perfectly, adding the extra xGChain layer of data while still leaving it interpretable, and to be honest, totally gorgeous.

That said, it may take looking at these a number of times before you become comfortable with what they are trying to display. The same caveat was true of radars and shot maps, and is another reason why analysis blends elements of art with data science.

The xGChain Layer

First you need to understand what the xGChain metric is, and to do that you should read Thom Lawrence’s intro piece here. So any time a player is involved in a pass in the possession, they get xGC credit, and then we sum up their involvement over the course of a match and colour their node based on that.
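
A rough Python sketch of the summing step looks like this. It assumes each possession carries the xG of the shot it produced (zero if there was no shot) and that every player involved gets that value once per possession. That reading of the credit rule, along with the data format and names, is an assumption for illustration; the intro piece mentioned above is the place to go for the actual definition.

```python
from collections import defaultdict


def xg_chain_totals(possessions):
    """Sum xGChain credit per player over a match.

    Each possession is a dict with 'players' (everyone involved, repeats
    allowed) and 'shot_xg' (xG of the resulting shot, 0.0 if none).
    """
    totals = defaultdict(float)
    for possession in possessions:
        # set() so a player involved twice in one possession is credited once.
        for player in set(possession["players"]):
            totals[player] += possession["shot_xg"]
    return dict(totals)


# Made-up possessions using position labels instead of real players.
possessions = [
    {"players": ["CB", "CM", "ST", "CM"], "shot_xg": 0.25},
    {"players": ["CM", "ST"], "shot_xg": 0.5},
    {"players": ["GK", "CB"], "shot_xg": 0.0},
]

match_totals = xg_chain_totals(possessions)
```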

Why?

Because this allows us to take the network vis beyond basic counting stats and start to examine the value of a player’s contribution to the match. Because the colour scales are tied to the 5%/95% cutoffs I started using back with the radars, you also get an easy reference for whether a player’s attacking contribution was pretty great (RED), pretty poor (GREEN), or somewhere in between.

We also start to get a sense of how non-attacking players are contributing to valuable build-up play in a way that just makes sense (at least to me).

Quick Reference

  • Size of node = number of touches
  • Thickness of line = number of passes between two nodes
  • Colour of node = linear scale from green to red (.6-1.4 xGCh based on 5%/95% cutoffs)
  • Colour of line = the total xGChain of possessions featuring a pass from A->B (0-.5 values based on 5%/95% cutoffs)
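
The node colour rule in the list above can be sketched as a clamped linear interpolation. In the Python below, the 0.6 and 1.4 endpoints come from the quick reference, while the exact RGB values for green and red are placeholders picked for the example.

```python
GREEN = (0, 128, 0)  # placeholder RGB for the low end of the scale
RED = (255, 0, 0)  # placeholder RGB for the high end of the scale


def node_colour(xg_chain, low=0.6, high=1.4):
    """Linear green-to-red colour for a match xGChain value, clamped at both ends."""
    t = min(1.0, max(0.0, (xg_chain - low) / (high - low)))
    return tuple(round(g + (r - g) * t) for g, r in zip(GREEN, RED))


low_end = node_colour(0.6)
high_end = node_colour(1.4)
off_the_scale = node_colour(2.0)
```

The line colours would work the same way with 0 and 0.5 as the endpoints.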

We Still Use Numbers

On Twitter, you will generally see just the visualization. This is mostly due to the limited, bite-size nature of the format. However, on the StatsBomb IQ app, Passing Networks also include all the individual and combination numbers you see below.

The combination of the vis and the numbers represents the whole of the analysis. The vis gives you basics, the numbers specifics, but both are still constrained by the limitations of this visualization format.

Examples

In this one you see Liverpool pushed quite far forward and had massive amounts of possession and created reasonable chances. Pretty much everyone is involved, but Coutinho and Lallana only put up good, not great xGChain numbers for the match. On the Swansea side, Llorente is the only guy up high most of the time, while he and Wayne Routledge both put up big numbers for the game, and Swansea came away with a vital win.

Just a single plot this time from Liverpool’s trip to Bournemouth earlier in the season, mostly to compare same team performance. Here Firmino is posted out wide instead of central, and had comparatively little impact in creating big scoring chances for LFC that match. Normally he’s a fiery red circle, but for this match he’s ineffective green. That’s another cool element these plots allow. Instead of focusing on the full match, you can isolate one player across a number of positions and games and see what it does to their performance.

I posted this one because both teams’ maps are pretty incredible. City’s front three have average touches nearly on the 18, and nearly everyone except Claudio Bravo is red or orange. Meanwhile Boro had almost none of the ball and created almost nothing as well. The match ended 1-1, with Boro scoring a very late equalizer. 90% of the time our simulations think City win that match.

It’s always fascinating to see what happens to these maps when two elite teams square off. This is from the 1-0 Dortmund home win earlier this season. Bayern dominated the touches, but Dortmund just edged them in xG, 1.40 to 1.24. Aubameyang was rampant the entire game, and every time Dortmund touched the ball, they felt dangerous while doing a pretty good job of stymying Bayern’s great attackers.

How Do You Use This Inside a Professional Football Club?

Typically, I would take the passing networks from the next opposition’s last 10 matches and divide them into home and away games. Stick the numbers next to each of them for reference, and start to look for patterns.

Which players provide the engine for plan A when this team attacks?

Which players have the most valuable touches?

Does their fullback tend to get really high in possession and can we play behind them?

Which players should we look at for potential pressing triggers?

If we have a choice, which center back would we allow to play the ball forward?
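
The preparation step described above (last 10 opposition matches, split into home and away) is easy to script. This is a rough sketch only: the `MatchNetwork` record and the `split_recent` function are hypothetical names made up for illustration, and how you actually store a passing network will depend on your own data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MatchNetwork:
    """Hypothetical record: one passing network for one opposition match."""
    played_on: date
    opponent: str
    venue: str  # "home" or "away", from the scouted team's point of view


def split_recent(networks, last_n=10):
    """Take the most recent `last_n` matches and split them by venue."""
    recent = sorted(networks, key=lambda m: m.played_on, reverse=True)[:last_n]
    by_venue = {"home": [], "away": []}
    for match in recent:
        by_venue[match.venue].append(match)
    return by_venue
```

Each group comes back most recent first, so you can lay the plots out side by side and work through the questions above one venue at a time.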

Conclusion

This is already long, so I will wrap it up here. We view passing networks as an integral part of data-based football analysis. Provided you understand their limitations, they can provide a huge productivity boost to opposition and own team analysis. We also think the addition of our xGChain metric adds a layer of value to a visualization that previously only contained counting stats.

If you work in football and want to see what else the StatsBomb IQ platform has to offer, please get in touch.

--Ted Knutson

ted@statsbomb.com

@mixedknuts

I Think We Broke Denmark

I was mucking around with an analysis for a customer this week when I ran across something I hadn't looked at in a really long time - the set piece table for Danish Superliga 14-15. That was the season FC Midtjylland (FCM) won their first ever Danish title, largely on the back of scoring tons of set piece goals. Brian Priske was the set piece and defensive coach that season and he and the players probably deserve 99% of the credit for those goals, but a tiny portion of what's left should probably be apportioned to Matthew Benham for the idea that this phase of the game was exploitable, and to my own work in designing the set piece program.

Anyway, the reason why I mention it is not to break my arm patting myself on the back, but because after this nostalgic instance of stumbling across the 14-15 stats, I wondered what the 17-18 set piece table looked like.

That's when things got weird...

Background

When FCM first started having success on set pieces, we discussed how to talk about this in a few internal meetings, especially with regard to questions from the press. I distinctly remember the message we landed on being one of happily crediting player skill and a bit of luck, but under no circumstances should anyone say that we worked on these more than normal.

Set piece goals to outsiders would hopefully be written off as things that magically happened, which was just fine by me.

(This splash image is from Daniel Taylor's scathing piece on Championship owners.)

That's why I thought it was weird when pieces like Sean Ingle's one from February 2015 started appearing in the press. Why was this thing that we knew was hugely important to us and driving a lot of our success, suddenly public knowledge? I still don't actually know, to be honest. My guess was that it provided a counterpoint of positivity to the ongoing Warburton mess at Brentford, but even acknowledging the edge existed - and one that would likely be sustainable long term - seemed incredibly dumb.

One of the big rules of conducting sports analytics inside a team is that when you find an edge, you exploit the hell out of it.

And you never talk about it in public.

Why not? Because professional sport is competitive and you don't want to make your competition any smarter. Plenty of them will ignore the information or not be able to figure out how to successfully exploit your edge directly, but even one team copying an edge for free is too many.

In many cases, the edge only exists because people don't know it's there to begin with. That's why many coaches and general managers/directors of football will outright lie when reporters start asking questions in these areas.

The Fallout

In a way, the public discussion created a fascinating economics question. How do actors in competitive economies adapt behaviour to new information over time? Or to put this in more obvious sports terms, what happens when a comparative league minnow wins a title on the back of scoring a lot of set piece goals, and then tells the entire world what they did?

Welcome to Denmark!

That is a lot of set piece goals. Like... a LOT. And from a whole bunch of non-Midtjylland teams. Brian Priske would later spend some time helping giants FC Copenhagen crush the league (partly also via dominating from set pieces), and his expertise may have dispersed into greater Denmark a bit, but this whole "we too can score lots of set piece goals" idea has clearly caught on up North.

In 14-15, FCM were the only team in the league to crush this particular phase of the game, scoring 25 goals, while three other teams barely cracked 10. Three years later, eleven of fourteen teams were in double digits.

I Now Have SO MANY QUESTIONS

One of the things I used to argue about with my long-time collaborator Marek Kwiatkowski was whether working on set pieces more in training forces trade-offs in other areas. I was firmly in the "you can score more goals, period" camp (Marek is a natural uber-skeptic), but it was mostly just theory. However, now we get a chance to look at exactly that.

Are the total goal outputs largely fixed and you just shuffle between open play and set pieces, or can you just plain score more goals by adding set piece expertise? To put it another way, can you create a bigger pie or are you just carving out different sized pieces?

Set piece goals per game, 14-15: .55
Set piece goals per game, 17-18: .75
Total goals per game, 14-15: 2.41
Total goals per game, 17-18: 2.91

Set piece goals per game have gone up by .20, while overall scoring is up half a goal a game. This lends weight to the bigger pie hypothesis, and not merely different sized pieces.

Note: The Danish league changed structure between 14-15 and 17-18 by adding additional teams and the world's most complicated playoff structure, so it's not as clean an analysis as it might otherwise be. Professional sports...

But wait, you say... some of that increase can be explained by competitive reasons, right? By increasing the league size, they probably brought in some weaker teams that were more likely to get blown out.

A fair point. Going to the first year of the 14-team league, we see... .63 goals from set pieces and 2.65 goals a game. Again, more set piece goals and more goals overall.
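
To make the pie comparison concrete, here is a quick back-of-the-envelope calculation in Python using only the per-game figures quoted above. "Everything else" is simply total goals minus set piece goals, following the split used in this piece. The season labels and the function name are just for illustration.

```python
# (set piece goals per game, total goals per game), as quoted in the text.
SEASONS = {
    "14-15": (0.55, 2.41),
    "first 14-team season": (0.63, 2.65),
    "17-18": (0.75, 2.91),
}


def breakdown(set_piece, total):
    """Return (other goals per game, set piece share of all goals)."""
    return round(total - set_piece, 2), round(set_piece / total, 3)


for label, (set_piece, total) in SEASONS.items():
    other, share = breakdown(set_piece, total)
    print(f"{label}: {other:.2f} other goals per game, {share:.1%} from set pieces")
```

On these figures, goals from everything other than set pieces go from 1.86 to 2.02 to 2.16 per game, while the set piece share of all goals climbs from roughly 23% to 26%. Both slices got bigger, which is what you would expect if the pie itself grew.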

Shouldn't there be an equilibrium, though? Teams are scoring more from set pieces, and they presumably know how to defend better against them as well, right? So why are we seeing so many more goals?

This is where we get to a bit of theory. Back when I was at FCM, someone asked our striker Duncan about defending set pieces, and his reply was that if the timing and delivery were right, the goals were basically unstoppable.

Now part of this comes back to creating complexity in your delivery and route patterns, and who you are targeting on all your different set pieces. You can't just do what England did in the World Cup and run the same play over and over again and succeed. You might be able to get away with that for a few games, but you'll struggle mightily through a league season. Defenses will adjust to that type of basic plan. However, if you are smart about your planning... well, maybe the goals actually are unstoppable.

Given the fact that the fewest set piece goals conceded in the entire league was nine (and three seasons earlier, it was just four), either everyone in Denmark suddenly became really bad at defending set pieces or everyone became much better at executing them in ways that were difficult to stop.

I lean toward the latter.

The analysis above isn't scientific or conclusive. There are confounding factors, and football is an inherently complex game that often defies simple explanations anyway. However, I find the dramatic increase in set piece goals across the entire league here fascinating, and if we were building a case that you can increase set piece goal production at the cost of basically nothing else, we now have some evidence that perspective may be correct.

My Own Work

One of the things I am happiest about regarding set pieces is that what we built at Midtjylland was sustainable, despite the fact the coach initially responsible for the success was poached by a bigger club. Well done to Mads Buttgereit for continuing to innovate in this area, and well done to FCM for listening when I said, "You HAVE to get Priske a fucking set piece assistant or you'll lose tens of millions of euros in value if he ever leaves."

Listening is underrated.

We also know for a fact that data analysis has dramatically changed the way that both baseball and basketball are played now. It's not just about finding better players, it's often about finding fundamentally superior styles of play before your competition and then beating them with it over and over again until they adopt your style.

In light of the above, I still find it amusing that this summer, no one on the club side came and talked to us about set pieces. The World Cup of Set Pieces was great. I broke down a lot of things, both on the site and on Twitter. Still, zero interaction.

¯\_(ツ)_/¯

In a way, this was really good, because I honestly did not have the bandwidth to spare while also launching StatsBomb Data. (I probably still don't, but football is a siren's call.) In another way, it's just continuing evidence that football is glacial when it comes to adopting new ideas from outsiders. Lest you think things are progressing behind the scenes in England, the Premier League scored 214 set piece goals in 2017-18... and 216 in 14-15. Fair dos to Bournemouth and Eddie Howe/Tom Webber for leading the league in this area last year though.

Our price for consulting on this is not cheap. It doesn't need to be. We still get you goals at a huge discount vs what you pay at the player or manager level without cannibalising anything else. And we teach your club personnel how to sustain this edge. The value you get at the club level is stupidly large.

And like I said above, it's not like teams were put off by the price... no one even had the conversation. *

One thing I do want to note is that if you are a national team and want help with set pieces for the Women's World Cup, definitely get in touch because like with our data, we will offer a deep discount to support the women's side of the game.

Conclusion

I will always be a nerd at heart, so finding data on how the Danish Superliga ecosystem changed after we shocked it in 2014-15 was super exciting to me and I had to write about it. While it doesn't offer conclusive proof of anything, it certainly allows you to ask interesting questions about what would happen if the rest of the football world started to adopt advice on better ways to play the game that were reached largely via data and analytics.

Thank you for listening!

Ted Knutson
CEO, Founder StatsBomb
ted@statsbomb.com
@mixedknuts

*And I also know that plenty of you are in clubs already and listening, and you'll take what you learn from us and do it on your own and probably succeed at least somewhat, because you are smart and it's not that hard to do better than what you have now. It's probably pretty hard to score 25 every season like FCM though.

Other Writing

Changing How the World Thinks About Set Pieces

Set Pieces and Market Inefficiency

Historic data used in this piece was licensed from Opta