A beginner's guide to analyzing teams using stats
The next in our continuing series where Kirsten asks me all (or at least one or two of) the questions you could ever want answered about using stats. This time we're talking about analyzing teams.
KS: So, even after all this education, I’m not quite clear on which stats apply to individuals, and which to teams. Is expected goals the number one valuation used for both players and teams?
MG: There are a bunch of stats that might be applied to players and teams. Pretty much everything you can count a player doing you can then total up and look at as a team number. Some numbers, of course, are more useful than others. And for starting a conversation about team analysis, xG is definitely the way to go. As a metric, xG is actually probably more useful at the team level than the player level.
KS: So that I wouldn’t panic about not knowing team numbers from individual ones, Mike sent me a piece entitled (in hindsight, ironically), “Who's gonna fix Wolves?” It’s actually a great place to start thinking about team numbers, given that when the FA hit the pause button on the season, Wolves were in sixth, but when the article was written, they were second from bottom and had yet to record a win. Mike asks if it’s time for Wolves to panic, and unequivocally states that it’s not. Turns out he was right, so it’s appropriate to ask what numbers he used to predict the future . . . and possibly to ask what other factors influenced what he saw in the numbers. Obviously the first number he looks at is xG, but for a defensive team like Wolves, it might be more appropriate to ask about what they’ve conceded. What’s the stat for that, Mike?
MG: The stat for that is xG conceded; it’s just like xG except on the other side of the ball. Look at all the shots a team has conceded, total up the expected number of goals those shots might lead to, and then compare that with the actual number of goals they’ve given up. The great thing about xG as a metric at the team level is that it operates just like goals do. You look at a team’s xG and the xG they’ve conceded and then the difference between the two, so just like you’d look at a team’s goal difference you can also look at their xG difference. And the basic rule is that we should all expect a team’s goals and xG to converge to the same values. Or, to say it slightly more nerdily, xG and xG conceded predict future goals scored and conceded better than current goals scored and goals conceded do. In Wolves' case the side had conceded a bunch more goals, 11, than the xG of the shots they’d conceded, 6.49, predicted, so it was easy to predict that their defense would improve going forward.
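For anyone who wants to see that arithmetic laid out, here is a minimal sketch in Python. The class and field names are made up for illustration and are not taken from any particular data provider. The defensive figures are the Wolves numbers quoted above, and the attacking figures (6 goals from 5.61 xG) are the ones quoted later in this conversation.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    """Season-to-date totals for one team (hypothetical structure)."""
    goals_for: int
    xg_for: float
    goals_against: int
    xg_against: float

    @property
    def goal_difference(self) -> int:
        # Actual goals scored minus actual goals conceded.
        return self.goals_for - self.goals_against

    @property
    def xg_difference(self) -> float:
        # Same idea as goal difference, built from expected goals.
        return self.xg_for - self.xg_against

    @property
    def goals_minus_xg(self) -> float:
        # Positive: scoring more than the shots taken would suggest.
        return self.goals_for - self.xg_for

    @property
    def conceded_minus_xg(self) -> float:
        # Positive: conceding more than the shots faced would suggest.
        return self.goals_against - self.xg_against


# Wolves at the time of the article, using the figures quoted in this piece.
wolves = TeamTotals(goals_for=6, xg_for=5.61, goals_against=11, xg_against=6.49)

print(f"Goal difference:   {wolves.goal_difference:+d}")
print(f"xG difference:     {wolves.xg_difference:+.2f}")
print(f"Goals minus xG:    {wolves.goals_minus_xg:+.2f}")
print(f"Conceded minus xG: {wolves.conceded_minus_xg:+.2f}")
```

The gap between the goal difference and the much smaller xG difference is the whole story here: almost all of it comes from the defensive side, which is the part expected to drift back toward xG.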
KS: And you were right! However, when you go on to explain the bad news, you state that Wolves’ xG isn’t nearly what it was last season—enough to land them just outside the top of the table. Instead, they’re in the European places. Does this indicate that looking at just xG and xG conceded isn’t enough?
MG: Well, I think there are three components to that answer. First is that position in the table is always going to be contingent on not only how well one team plays, but how well everybody else plays. So, part of what was going on in this once and maybe future season is Arsenal struggling, Spurs collapsing, and just in general a season where the league’s big six have been underwhelming. So, even if xG is fully capturing the contours of Wolves’ performance, the side’s performance in relation to everybody else can certainly change.
Second, we need to separate out two different ideas surrounding xG. There’s the idea of how a team’s actual goals stack up against their expected goals. In that arena we can say with confidence what we expect to happen (the goals they score and concede will eventually come in line with xG), but the story of how that is likely to happen is where science meets art. The question of what exactly is causing the divergence is an interesting one and highly relevant for players and managers and fans, even if we can say that whatever it is is likely temporary.
And finally, xG levels can themselves change. While xG is a pretty good proxy for how good a team is, teams get better and worse all the time. So we might observe a team’s xG improve on either side of the ball from season to season, or even within a season, and then we’d want to look for reasons why that was happening. Usually xG is pretty predictive of itself, which is to say that usually teams don’t improve a ton or get dramatically worse over the course of a season, but there are always exceptions.
So, to sum it up: yeah, looking at xG by itself isn’t enough. You need to look at a team’s metrics in relation to the rest of the league, look at how a team’s actual results are in relation to that metric and why they might differ, and then look at the movement of the metric itself and what might be causing it to change.
KS: Now, despite being barricaded inside my apartment, I don’t have time to go back and look for an article in which a team’s xG differed wildly from the actual number of goals scored. In attack, Wolves were pretty much on track: They’d scored 6 with an xG of 5.61. However, they had conceded 11 goals from 6.49 xG conceded, and this isn’t viewed as a problem. So given that defense is simply attack in reverse (shocking!) I can look at those defense numbers and wonder, what on earth causes this difference between what is predicted and what happens in reality?
MG: Right, early this season the biggest thing that was wrong with Wolves is that the team’s goals conceded was much higher than their xG conceded. And while we’d expect that the numbers would come back in line, fixing the problem, that doesn’t tell us much about how that’s going to happen. The biggest element involved in this kind of divergence is generally just variance in finishing. There’s really not much you can do if your opponents keep launching shots into the top corner against you, except rely on the numbers to reassure you it won’t keep happening.
But, there are other things that can contribute that are at least worth looking into. We can isolate keeper performance, for example. StatsBomb has a separate model, a post-shot xG model, that looks at the performance of keepers given the shots they’ve faced (this is different from normal xG models because it takes factors like the trajectory of the ball into account, which you’ll just have to trust me is the best way to go about things because working through those differences is a whole article unto itself). Using those tools we can determine if some of the divergence is down to keeper performance. Or we can separate out set pieces from open play. If the problem is that a team is conceding a lot more on set pieces than expected, it would be worthwhile to examine if there is in fact something going wrong in that phase of the game.
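As a rough illustration of that kind of breakdown, here is a sketch in Python. Everything in it is an assumption made for the example: the shot list is invented, the field names (`phase`, `xg`, `psxg`, `goal`) are hypothetical, and the post-shot values are placeholders rather than output from StatsBomb's model or anyone else's. It simply totals goals against xG for each phase of play, then compares goals conceded with post-shot xG on the shots that were on target.

```python
from collections import defaultdict

# Hypothetical shots conceded. Every number below is invented for illustration.
# "xg" is the pre-shot value, "psxg" is a post-shot value (None when the shot
# was off target, since there was nothing for the keeper to save).
shots_conceded = [
    {"phase": "open_play", "xg": 0.10, "psxg": 0.35, "goal": 1},
    {"phase": "open_play", "xg": 0.30, "psxg": None, "goal": 0},
    {"phase": "set_piece", "xg": 0.08, "psxg": 0.60, "goal": 1},
    {"phase": "set_piece", "xg": 0.12, "psxg": 0.20, "goal": 1},
    {"phase": "open_play", "xg": 0.05, "psxg": 0.02, "goal": 0},
]


def divergence_by_phase(shots):
    """Goals conceded minus xG conceded, split by phase of play."""
    totals = defaultdict(lambda: {"goals": 0, "xg": 0.0})
    for shot in shots:
        totals[shot["phase"]]["goals"] += shot["goal"]
        totals[shot["phase"]]["xg"] += shot["xg"]
    return {phase: t["goals"] - t["xg"] for phase, t in totals.items()}


def keeper_goals_above_psxg(shots):
    """Goals conceded minus post-shot xG, counting on-target shots only."""
    on_target = [s for s in shots if s["psxg"] is not None]
    goals = sum(s["goal"] for s in on_target)
    expected = sum(s["psxg"] for s in on_target)
    return goals - expected


by_phase = divergence_by_phase(shots_conceded)
keeper_gap = keeper_goals_above_psxg(shots_conceded)

for phase, gap in by_phase.items():
    print(f"{phase}: {gap:+.2f} goals versus xG")
print(f"keeper: {keeper_gap:+.2f} goals versus post-shot xG")
```

In this made-up sample most of the gap sits in set pieces, which is the sort of signal that would justify going back to the video for that phase of the game.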
All of which is to say that the fact that a team is diverging from their xG or xG conceded is the start of the story, the fact that they’ll eventually come back to expectations is the end, but there are chapters and chapters to investigate in the middle about why that divergence occurs and what the likely path back is.
KS: Ok, one last question because it’s been bugging me: why are penalty goals removed from the equation?
MG: Penalties are just not particularly predictive of anything. Just because one team got a bunch of penalties doesn’t mean they’re any more likely to get them in the future. So, including them doesn’t help us get a true picture of how good or bad a team is. It’s mostly just a function of the kind of variance that likely won’t continue. And, since the entire point of xG is to strip out the noise and look at what’s likely to continue, away go the penalties.
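In practice, stripping out penalties is just a filter applied before the totals are added up. A minimal sketch in Python, assuming a hypothetical shot list with an `is_penalty` flag (the field names and all of the values, including the xG assigned to the penalty, are invented for the example):

```python
# Hypothetical shots for one team. All values are invented for illustration.
shots = [
    {"xg": 0.76, "goal": 1, "is_penalty": True},
    {"xg": 0.12, "goal": 0, "is_penalty": False},
    {"xg": 0.31, "goal": 1, "is_penalty": False},
]


def non_penalty_totals(shot_list):
    """Return (goals, xG) with penalties left out of both totals."""
    kept = [s for s in shot_list if not s["is_penalty"]]
    return sum(s["goal"] for s in kept), sum(s["xg"] for s in kept)


np_goals, np_xg = non_penalty_totals(shots)
print(f"Non-penalty goals: {np_goals}, non-penalty xG: {np_xg:.2f}")
```

The important detail is that the penalty comes out of both sides of the comparison, the goals and the xG, so the two totals still describe the same set of shots.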