It’s been ten months since I wrote xCommentary, which came out of frustration from hearing my 7-year-old, who is fully addicted to Sunday morning Match of the Day binges, parroting factually wrong commentary.
I don’t want to repeat what I said there because I think the piece stands on its own quite well. However, with the announcement that Match of the Day will now be using expected goals as part of the programme, combined with what is a clear push by Sky to move forward in this area, I did want to cover a bit about how to use these silly numbers in the first place.
The short answer, at least at first, is: with caution.
First of all, this move is a good thing.
The fact that broadcasters in the UK are willing to move in this direction is a positive for analytics in the sport. Period.
Huge credit to Opta, Sky, and the BBC for making this possible. I’m still quite staggered that it is happening at all, and using and explaining these numbers daily has been my job since 2014.
Yes, there may be rough patches to start, but everything new has those.
Yes, there may be quibbles about the precision of the model(s) used, but the remarkable fact here is that a model is going to be used at all. I have barely seen the numbers, but if there is a backlash about general discrepancy, then presumably there will be a push to improve the error of the models. That’s part of the natural process of data science.
Yes, expected goals discussion might be best served by having a smart stats guy on air to explain them clearly and concisely, but let’s give all of this a chance before we kill it.
Second of all, please be gentle…
Okay, so we’ve got an expected goals model. What do the numbers it spits out actually mean?
This is where you have to be really careful in making claims about what single shot xG numbers do and do not convey. The analytics community are all guilty of treating these as defaults, largely because the venue where we usually discuss these things is limited to 140 characters. That doesn’t allow much room for caveats. In reality, every tweet about xG values of single shots or even single games comes with a whole host of legal fine print that no one really cares about except the data scientists.
However… since this is going to be on TV, some caution is advised.
An xG value like 0.40 means that 40% of the time, a shot with these qualifiers from this location has been scored. All previous shots of that type are factored into that number, which includes a whole range of very simple chances as well as insanely hard ones.
So why do we care about this?
Because it doesn’t actually say much about this particular shot we are discussing right now. It’s more like “in the past, this has happened.”
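To make the "in the past, this has happened" idea concrete, here is a minimal Python sketch. It assumes the simplest possible reading of xG: group historic shots into buckets by location and qualifiers, then take the share that were scored. The bucket names and the shot records are invented for illustration, and real models (Opta's included) will be considerably more sophisticated than a plain average.

```python
from collections import defaultdict

# Hypothetical historic shots as (bucket, scored) pairs. The bucket names
# and counts are made up for illustration, not real data.
HISTORIC_SHOTS = (
    [("central_12_yards", True)] * 2
    + [("central_12_yards", False)] * 3
    + [("wide_header_10_yards", True)] * 1
    + [("wide_header_10_yards", False)] * 9
)


def historic_xg(shots):
    """Return the share of shots scored in each bucket."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for bucket, scored in shots:
        attempts[bucket] += 1
        goals[bucket] += int(scored)
    return {bucket: goals[bucket] / attempts[bucket] for bucket in attempts}


xg_by_bucket = historic_xg(HISTORIC_SHOTS)
for bucket, value in xg_by_bucket.items():
    print(f"{bucket}: {value:.2f}")
```

Note that the number for a bucket is an average over every shot in it, easy and hard alike, which is exactly why it says so little about any one chance.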
Now the reason we’re here at all is because most TV commentators have previously been really bad at estimating historic likelihood. (This is a verifiable claim.) For some reason they seem to think the modern incarnation of football is a much easier game than when they played, which makes them far too critical of whether any particular chance should have been scored. I don’t know why this is, but it’s an epidemic across the entirety of European commentary and there isn’t a way to change it without some sort of objective information.
This is where xG shines, because it provides an anchor point based on history. All the players in the data set taking these shots are/were professional footballers. It’s not like we’re comparing the expertise of children against the Sergio Agueros of the world; these are mostly like-for-like comparisons.
And this is where the commentators get to apply their expertise…
Because as noted above, xG models have very little information about the particulars of any one chance. Commentators, on the other hand, have all the information, including expertise in knowing what it’s like to be on the pitch trying to score those goals.
They can then apply their expertise and tell us why a single shot is likely easier or harder than all the other shots from that location. It won’t generally turn a 9% chance into a 90% chance (see also: wide angle headers from 10 yards out), but it could easily be double or treble what the model estimates.
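As a rough illustration of that adjustment, the sketch below treats the model's number as a baseline and the commentator's judgement as a multiplier on top of it. The function name and the multiplier idea are my own shorthand rather than part of any real xG model, and the cap at 1.0 is only there because a probability can't exceed 100%.

```python
def adjusted_chance(model_xg, expert_multiplier):
    """Scale a model's xG by an expert's judgement, capped at 1.0.

    Both the name and the multiplier approach are hypothetical: this is
    a way of picturing the idea, not how any broadcaster's model works.
    """
    if not 0.0 <= model_xg <= 1.0:
        raise ValueError("model_xg must be between 0 and 1")
    if expert_multiplier < 0:
        raise ValueError("expert_multiplier must not be negative")
    return min(1.0, model_xg * expert_multiplier)


baseline = 0.09  # the model's estimate for this type of shot
print(f"double: {adjusted_chance(baseline, 2):.2f}")
print(f"treble: {adjusted_chance(baseline, 3):.2f}")
```

Even at treble the estimate, a 9% chance is still far more likely to be missed than scored.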
As I stated in my article last year, I feel like the commentators don’t get enough chance to apply their expertise in place of cliché. Adding an underlying xG model gives them exactly that opportunity.
My show pitch
Opta have a lot of data from the entirety of the Premier League at their disposal. It would be brilliant to see someone walk ex-players back through the stats and data from their own careers and discuss it, especially when paired with video highlights.
It could also potentially be a huge conversion point for players and coaches on the value data represents to the game.
Example: Alan Shearer is easily one of the best forwards ever to play in the Premier League. This isn’t a claim anyone will argue with. However, as good as he was, Shearer probably only scored about one in every five shots he took. 20%. Maybe less.
If one of the PL’s best-ever forwards only scored at that rate, and you show him this with his own stats, maybe it will soften/improve his commentary when evaluating others?
Football has changed.
I’ve been saying this all summer, but even compared to 12 months ago, I am seeing massive differences in how interested clubs are in adding data analysis into their football process. The fact that media are picking up on this and moving forward is a clear sign that football itself is in transition. Whether certain groups of fans like it or not, the world is progressing from viewing data analysts as “xG Virgins” (as someone recently tweeted at me) to viewing them as people who work inside of football clubs and have their analysis appear regularly in the mainstream.
My suspicion is that this transition won’t be an entirely smooth one, but it is unequivocally positive. It’s also going to create an entire new generation of highly educated fans and coaches who view the game itself in a more knowledgeable light.
In the meantime, my only request is please, be gentle. With feedback, with drawn conclusions, with criticism. With everything.