The Increasing Complexity of Analytics
Why does anybody do analytics?
Another Sloan Sports conference has come and gone. For those who don’t know, the conference, held annually at MIT in Boston, is basically a trade show for sports analytics. Panels abound, there is glad-handing galore, resumes are passed around, and on panel after panel numbers are endlessly hashed and rehashed. Oh, and also, Meek Mill was there, because reasons.
There’s also math. Papers are submitted, awards are handed out, posters are posted. And the thing about the math in the modern days of analytics is that it’s really complicated. There was a time when the bulk of analytics work was doing basic math and applying it to sports. That time is not this time, at least not at Sloan.
When the math gets as complicated as it has, it presents a new set of challenges for the people doing the work. Part of the reason that football analytics coalesced around expected goals (xG) is that it fits comfortably within how people, both coaches and fans, traditionally think about the game. How many chances were created? How good were those chances? Would you rather have a lot of speculative efforts or a couple of golden chances?
It’s not only that the questions xG sought to answer were ones that people intimately familiar with the game were already asking; its methodology was fairly simple as well. Look at all the shots, factor in all the things that went into them (where they were taken from, what part of the body they were taken with, and now, increasingly, where the defenders in front of them were), throw in just a little dash of math to figure out how best to weight the variables, and you have an answer that works well (but clearly not perfectly) for both descriptive and predictive purposes.
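To make that "little dash of math" concrete, here is a minimal sketch of the idea, not any particular provider's model. It assumes a plain logistic regression fitted by gradient descent, and it runs on synthetic shots invented purely for illustration. The three features (distance, header or not, defenders in front) and every name in the code are assumptions; a real xG model uses far more data and more variables.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_xg(shots, goals, lr=0.2, epochs=1000):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n_features = len(shots[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(shots)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(shots, goals):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_xg(shot, weights, bias):
    """Return the modelled probability that this shot becomes a goal."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, shot)))

# Synthetic shots, invented for illustration only (not real match data).
# Features: distance in tens of metres, header (1) or foot (0),
# number of defenders between the shooter and the goal.
random.seed(0)
shots, goals = [], []
for _ in range(400):
    dist = random.uniform(0.3, 3.5)
    header = random.randint(0, 1)
    defenders = random.randint(0, 3)
    # Assumed "true" relationship used only to generate the toy outcomes.
    p_goal = sigmoid(0.5 - 1.5 * dist - 0.8 * header - 0.4 * defenders)
    shots.append([dist, header, defenders])
    goals.append(1 if random.random() < p_goal else 0)

weights, bias = fit_xg(shots, goals)

close_xg = predict_xg([0.6, 0, 0], weights, bias)  # 6 metres, foot, unmarked
far_xg = predict_xg([3.0, 0, 1], weights, bias)    # 30 metres, one defender
print(f"close-range xG: {close_xg:.2f}, long-range xG: {far_xg:.2f}")
```

The fitted weights are the "how best to weight the variables" part: each one says how much a feature pushes the chance of a goal up or down, which is why the output stays easy to explain in football terms.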
But, as analytics increasingly moves into the spaces behind the shots, into the “why” of it all, the chasm gets harder to bridge. The next step, one which we’ve presented some methodology on, is building a passing model. The idea is to do for everything before the shot what xG did for shots: answer the same questions everybody is asking, but do it in a rigorous, stats-based manner. Who is moving the ball forward into the best areas? Who is too aggressive with their passing, or too conservative? Who uses space to spring attacks, and who consistently wastes it and fails to play the ball into advantageous areas? How much credit does a player who started a move deserve? Which passes are the most important and unique ones, and which ones are the kinds of passes that most players can make?
These aren’t easy questions to answer. And while xG involved taking factors that everybody understood and adding in a sprinkling of math in order to get results, passing models are the opposite. Because, from a stats standpoint, the problem is so hard, the math to try and solve it becomes, well, a whole lot more mathy. There’s no avoiding the fact that building a passing model involves doing a lot of work that people will have neither the inclination nor the training to unpack. So, why do it?
For people working inside the game that’s an easy answer. To get an edge. The hope is that a good passing model well implemented will turn up players that other methodologies miss. With limited time and limited scouting budgets, the ability to use numbers to unearth a handful of hidden gems to then send your scouts off to further investigate provides great value for teams. It’s hard to access that value of course. The process needs to run smoothly. A manager needs to communicate what he needs from a potential new player, the scouts and analytics department have to work together to unearth possible players that fit the bill, and then everybody has to crosscheck, see where their Venn diagrams overlap, and finally the business people need to execute the deal on any gems you may find. It’s hard to do it well, but it’s easy to explain why you’d go about that task.
It’s a little bit harder when that task moves over to the media side of the game. On one level, I, as managing editor of an analytics website, find the process of analysis itself interesting. There is clearly some degree of public interest in how teams try and improve themselves, and what they could be doing better as they try and get a leg up on their competitors. Whether it’s youth systems, or set piece coaches, or state of the art medical facilities, or any of a million other little edges teams pursue, reporting on what teams are doing to try and get better is interesting to supporters.
But, at a more basic level, using analytics is also supposed to help us understand the game we’re covering better. Using xG helps people covering the game explain what’s happening. It’s easier now to pinpoint which teams’ good results are coming from a streak of hot finishing that’s unlikely to continue (think Arsenal earlier this year, for example) or which struggling goal scorer is likely to come good.
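One rough way to put a number on "hot finishing" is to ask how likely a goal tally is given the quality of the shots taken. The sketch below is an illustration under stated assumptions, not a description of how any outlet does it: it treats each shot as an independent coin flip weighted by its xG, and the shot values are hypothetical.

```python
def goal_distribution(shot_xgs):
    """Probability of scoring exactly k goals, for k = 0..len(shot_xgs).

    Assumes each shot is an independent event that is scored with
    probability equal to its xG (a simplifying assumption).
    """
    dist = [1.0]  # before any shots: certain to have zero goals
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # this shot is missed
            new[k + 1] += prob * p     # this shot is scored
        dist = new
    return dist

def prob_at_least(shot_xgs, goals):
    """Chance of scoring at least `goals` from these shots."""
    return sum(goal_distribution(shot_xgs)[goals:])

# Hypothetical run of ten shots, each worth 0.1 xG (1.0 xG in total),
# from which a team has actually scored four times.
shot_xgs = [0.1] * 10
expected_goals = sum(shot_xgs)
hot_streak_chance = prob_at_least(shot_xgs, 4)
print(f"xG: {expected_goals:.1f}, chance of 4+ goals: {hot_streak_chance:.3f}")
```

A small probability here doesn't prove the finishing was luck, but it is the kind of gap between goals and xG that suggests a streak is unlikely to continue at the same rate.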
It will be more challenging for analytics writers to do the same with passing models. That’s not to say it’s impossible, but the added layer of mathematical abstraction requires more translation. It’s not enough to say that a model shows a player is good; it’s important to understand the model well enough to explain in football terms what he or she is good at. Similarly, if a model doesn’t rate a player, but that player seems to do some obvious things well, it’s incumbent upon the person citing the model to be able to explain in football terms what they’re taking off the table.
Complex modeling is valuable. It will often pick up on things that the human eye does not. Used in concert with other tools, it makes scouting and understanding the game easier. But modeling doesn’t explain itself. It’s never enough to shrug and defer to the model. Models, to work, must be integrated into a fundamental understanding of the game they’re modeling. When the models are relatively simple, like with xG, that’s a small hurdle to overcome. As the math gets more complex, the explanations get harder. That doesn’t make them any less important.
Header image courtesy of the Press Association