July 20, 2018

StatsBomb July Mail Bag

By Mike Goodman

It’s mailbag time! The World Cup is over, and the European domestic season will be here again in just a few weeks. That’s just enough time to fit mailbag questions into the interim. Let’s get right to it.

 

These are two similar questions about expected goals and goalkeeping that get at an important point. It’s really hard to evaluate goalkeeping. So, the first part here, how useful is the difference between goals allowed and expected goals allowed in measuring keeper performance? Well, it’s not useless. If there’s a large gap between the two it means there may at least be something worth looking at. It’s useful as a flag for further investigation, but not much more than that.
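As a rough illustration of using that gap as a flag, here is a minimal sketch. The keeper names, the numbers, and the five-goal threshold are all made up for the example, and `xg_conceded` is assumed to be the summed pre-shot xG of the shots each keeper faced. It isn't any particular model's method, just the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class KeeperSeason:
    name: str
    goals_conceded: int
    xg_conceded: float  # assumed: summed pre-shot xG of shots faced


def goals_minus_xg(keeper: KeeperSeason) -> float:
    """Positive means more goals conceded than the shots 'should' have produced."""
    return keeper.goals_conceded - keeper.xg_conceded


def flag_for_review(keepers, threshold=5.0):
    """Return names whose gap is large in either direction.

    The threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    return [k.name for k in keepers if abs(goals_minus_xg(k)) >= threshold]


# Hypothetical seasons, invented for illustration only.
keepers = [
    KeeperSeason("Keeper A", goals_conceded=30, xg_conceded=38.2),
    KeeperSeason("Keeper B", goals_conceded=41, xg_conceded=39.5),
    KeeperSeason("Keeper C", goals_conceded=52, xg_conceded=44.0),
]

print(flag_for_review(keepers))  # ['Keeper A', 'Keeper C']
```

A flag in this sketch says nothing about why the gap exists. It only narrows down which keepers are worth watching more closely.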

Question two gets at why. Without the kind of post shot information that expected goals doesn’t include, it’s impossible to tell just from the numbers how much to credit or blame a keeper. A keeper who concedes from a few shots with low expected goal values might do so because he had the misfortune of those shots finding the corner, or he might do so because he was caught flatfooted and left an entire side of the goal open. Traditional xG won’t differentiate.

Expected goal models that take post shot information into account will certainly be better judges of what’s happening on the field. They’ll better pinpoint the degree to which keepers just get unlucky and face a string of shooters aiming for the corners or are bungling more mundane shots. But, even there we should be careful. We simply know very little about the causal mechanisms of goalkeeping. In theory modeling involves testing outcomes against prior assumptions, and when it comes to keeping we should be very careful about giving our prior assumptions a lot of weight.

This cuts both ways because as the question points out, the conventional wisdom on evaluating keeping, at least in the public sphere, is awful. Generally speaking, analysis starts from whether or not the ball ended up going in the back of the net and works backwards. That’s not how you want to do things. An expected goal model with post shot information is certainly helpful against that baseline, but it’s an exceedingly low bar to clear. Ultimately, whether we’re talking about using analytics or our eyes to evaluate keeping everybody should be very aware of just how much they don’t know.

 

Why even run a mailbag if you don’t take at least one question to be self-indulgent? I’ve had a weird and somewhat peripatetic career path. I graduated college during the height of the online poker boom and was a professional poker player for a couple of years. I left poker to work in finance (literally on Wall Street, right above the New York Stock Exchange) for a couple of years, and then left that job when my now wife joined the United States Foreign Service. I needed a job that was portable and I could do from almost anywhere so I started writing, about finance for money, and about soccer as a hobby. Writing about soccer is the first thing I’ve done professionally for more than two years in a row.

The unifying factor through all of this is that in some way I’ve always been interested in explaining technical ideas to a broader audience. In poker, the online boom meant that a field that had been mostly based on conventional wisdom and some very basic math all of a sudden became an incredibly data rich environment. Some conventional wisdom turned out to be very wrong, though a lot turned out to be pretty accurate, and the devil was in the details, the edge cases, the places where understanding the rule of thumb meant figuring out what the exceptions were faster than everybody else. In finance I happened to work for a youngish boss who was a natural with numbers, and grew up staring at as many computer screens streaming numbers as was humanly possible. His boss, however, was an old school stock exchange floor broker, a world built on relationships as much as math. Lots of my job involved making sure the two of them didn’t strangle each other.

I view my work in soccer analytics in a similar light. Around five years ago when I started doing this as a hobby I saw a world which was highly technical, but conceptually not very far away from what most fans were used to. Lots of very smart people were bringing a new set of tools to the game, but they were trying to answer the same questions that football fans have been wondering about since time immemorial. I was drawn to the challenge of bridging that gap. Taking the tools that people were using and making the things they were learning accessible to a broader audience. I’m still pretty obsessed with that challenge.

 

Stats are useful immediately. The question is can you get good and reliable stats? It’s certainly true that lots of players, even ones that show up as promising statistically, will flame out. It’s also true that lots of players that scouts flag as promising will miss the mark. There’s no magic bullet to projecting talent. There’s only making the best decisions you can when faced with lots of limited information.

Even very early on stats can help in a number of ways. Most teams have limited budgets for scouting. It’s important to scout players, but figuring out who to spend limited man hours chasing down and watching is a huge part of recruitment. Also, scouts need to be able to set their expectations. How likely is it that the player they’re seeing is having an off day, or a spectacular one? What specific traits are they looking for? Historically, these kinds of decisions have been made by networking. Sometimes that works well, and sometimes it works in the favor of the agent or agency doing the networking. Stats are simply another tool to use to help make decisions in the face of imperfect information.

That said, especially at young ages, it’s simply hard to get good numbers. Kids don’t play that many games with the senior team, and when they do it’s almost always against skewed competition (weak sides in cup games for example). So there’s an extreme sample size problem. Numbers can help set your expectations but it’s important to maintain a willingness to change as the evidence evolves. Still, you’d certainly rather have numbers, even a small set of them, to complement what an agent or scout has, than not have them at all.
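To put a rough number on that sample size problem, here is a small sketch using a Wilson score interval for a conversion rate. The shot counts are invented for illustration, and treating every shot as an independent coin flip with the same probability is a simplifying assumption that real shots don't satisfy. The point is only how wide the uncertainty is when the sample is tiny.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Hypothetical numbers: the same 20% conversion rate on very different samples.
youngster = wilson_interval(successes=3, trials=15)    # a handful of senior minutes
veteran = wilson_interval(successes=60, trials=300)    # several full seasons

print("15 shots:  %.3f to %.3f" % youngster)
print("300 shots: %.3f to %.3f" % veteran)
```

Both players convert at the same rate on paper, but the interval for the 15-shot sample is several times wider, which is why early numbers are better at setting expectations than at settling arguments.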

One last point here is that teams themselves can use numbers within their own development system in all sorts of creative and effective ways. We talk all the time about just how hard it is to tease out finishing skill from the numbers. Well if you have players at your disposal for years on end, and you can track and record and compare all of their shooting in practice, and even design drills meant to provide information, then it becomes a lot easier to figure out who is better at what. That gives a team added information when it comes to both developing, and valuing their own players. It’s a great piece of private information that can help teams separate the signal from the noise among their own players and give them a leg up over the competition.

 

They would not. Because I am terrible beyond terrible. There’s actually a real point to be made in here. Part of the reason that analytics works is that sports operate in a relatively closed environment. Players that are too terrible to play in a league don’t end up playing in that league for long. When analytics people say that “shooting skill doesn’t exist” (a position which I don’t endorse but is still closer to being right than wrong) what they mean is that at the top of the football pyramid everybody is close enough to being equal that the differences don’t matter. They don’t mean that I, Michael Goodman, am as good at kicking a football as Sergio Aguero.

The reality of professional sports is that at some point everybody who steps on the field is really really good, and that the differences between them are dwarfed by the differences between professional athletes and everybody else. That’s the world analytics operates in. Players that are significantly worse than the general standard of the league they’re playing in by and large get found out quickly and nailed to the bench. The differences between players that football teams or scouts, or analytics nerds, or supporters are looking at are extremely minor compared to the differences between the set of professional players and the rest of humanity. It’s worth keeping in mind for two reasons. The first is that it properly contextualizes the work that’s being done, and the second is that it’s helpful in dealing with outliers. Sometimes a really unlucky seeming player is just so terrible that they fall outside the context that analytics provides. Not always, sometimes they’re just very (very very) unlucky. But doing thorough analysis means allowing for the possibility that an outlier of a player is just, simply, a terrible player.

 

I was, um, surprised by the number of relationship questions I actually got to fill this slot. I’d really love to answer all of them. I also like my job running a soccer analytics website, so one question it is (although if you happen to be an editor in the market for a totally unqualified advice columnist who is reading this because they just happen to love soccer stats and stumbled on my fun little gimmick but now wants to shower me with money to write an actual advice column you know where to find me).

Ok, enough preamble, advice time.

Dear Carolyn (I will absolutely not call you DD in D),

You seem like you’ve got a pretty good life going on! You’ve got friends, and family and a cool job and are writing a novel (some of these things I happen to know because we are Twitter friends and you tweet about them). As you say in your question 90% of your free time is taken up by hanging out with friends and family. Honestly that seems to me like it’s a lot of fun. So, I guess what I’m kind of wondering is, how much do you actually really want to be dating?

It doesn’t strike me that you are having trouble dating, it strikes me that you are having trouble wanting to date. You are prioritizing things other than dating, and honestly seem pretty happy about it. Look, nobody reads a fake advice column for bullshit pseudo economic buzzwords, but it sure seems like we’ve got a case of revealed preferences going on here. Because the easy answer to your question is if you want to date more and meet single people, then you need to juggle around the priorities in your life so that you aren’t spending 90% of your free time with friends and family and have more than the exhausted 10% left for thinking about dating before reasonably deciding that sleep seems more appealing.

There are of course ways this can be bad. If you were, say, secretly pining away for love at all hours and only deigning to show up at happy hour or nana’s birthday as a way to distract yourself from your misery, that would be a problem. Or, if you decided to give up on love because you talked yourself into believing it was impossible, and viewed your social life as a consolation prize, I’d say that that’s probably not healthy and something that you should go ahead and work through.

But, like, there’s nothing wrong with living a happy fulfilled life with dating as a deemphasized part of it. Maybe at some point your priorities will change and you’ll shift your time around and choose to skip the occasional hang in favor of dabbling with a dating app of your choice or doing those corny singles activity nights (is speed dating a thing still? It is on bad network TV dramas, but I am old and married and sit at home and watch bad network dramas and don’t do things like speed dating.). Or maybe something fun will happen when you least expect it. Friends and romance aren’t mutually exclusive of course, and entirely walling off the two from each other seems, at the very least, impractical.

For now though, there really isn’t anything wrong with not dating because you’d rather be doing other things. If you want to date more, you can always reprioritize your time and go do it.

 

Header image courtesy of the Press Association
