I have been involved in football analytics for four years and doing it for a living since 2014. It has been a wonderful adventure, but there is no denying that the public side of the field has stalled. But this is not really a “crisis of analytics” piece or an indictment of the community. Instead, I want to point out one critical barrier to further advancement and plot a course around it. In short, I want to argue for a more theoretical, concept-driven approach to football analysis, which is in my opinion overdue.
It is going to be easy to read this short article as a call to basic, as opposed to applied, research and consequently dismiss the ideas as impractical. Try not to do this. I like applied football analytics and I firmly believe that it has value — even the public variety. But I also believe that we have now reached the point where all obvious work has been done, and to progress we must take a step back and reassess the field as a whole.

I think about football analytics as a bona fide scientific discipline: quantitative study of a particular class of complex systems. Put like this it is not fundamentally different from other sciences like biology or physics or linguistics. It is just much less mature. And in my view we have now reached a point where the entire discipline is held back by a key aspect of this immaturity: the lack of theoretical developments. Established scientific disciplines rely on abstract concepts to organise their discoveries and provide a language in which conjectures can be stated, arguments conducted and findings related to each other. We lack this kind of language for football analytics. We are doing biology without evolution; physics without calculus; linguistics without grammar. As a result, instead of building a coherent and ever-expanding body of knowledge, we collect isolated factoids.

Almost the entire conceptual arsenal that we use today to describe and study football consists of on-the-ball event types, that is to say it maps directly to raw data. We speak of “tackles” and “aerial duels” and “big chances” without pausing to consider whether they are the appropriate unit of analysis. I believe that they are not. That is not to say that the events are not real; but they are merely side effects of the complex and fluid process that is football, and in isolation carry little information about its true nature. To focus on them, then, is to watch a passing train by looking at the sparks it sets off on the rails. The only established abstract concept in football analytics currently is expected goals. For good reasons it has become central to the field, a framework in its own right. But because it focuses on the end result (goal probability), all variation without impact on xG is ignored. This focus on the value of a football action or pattern rather than on its nature seriously undercuts our understanding of the fundamental principles of the game. Just like isolated on-the-ball events, expected goals tell us next to nothing about the dynamic properties of football.

Indeed it’s the quantitative dynamics of football that remains the biggest so-far unexplored area of the game. We have very little understanding of how the ball and the players cross time and space in the course of a game, and how their trajectories and actions coalesce into team dynamics and, eventually, produce team outputs including goals. This gap in knowledge casts real doubts on the entirety of quantitative player analysis: since we do not know how individual player actions fit in the team dynamics, how can we claim to be rating the players robustly? And before the obvious objection is raised: these dynamic processes remain unexplored not for lack of tracking data. The event data that is widely available nowadays contains plenty of dynamic information, but as long as our vocabulary forces us to consider every event in isolation, we can do no more than glimpse it.

Luckily, a newer concept is emerging into view and taking a central place: the possession chain (possession for short). A possession is a sequence of consecutive on-the-ball events during which the ball is under the effective control of a single team. A football game can then be seen as an ordered collection of such sequences. It is a very positive development since possessions make much more sense as the fundamental building blocks of the game than events. This is because they are inherently dynamic: they span time and space. I believe that they should be studied for their own sake, and if you only compute them to figure out who should get partial credit for the shot at the end of a chain, then in my opinion you are doing analytics wrong, or at least not as well as you could be.
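
To make the definition concrete, here is a minimal Python sketch, not a reference implementation. It assumes a hypothetical `Event` record (team, timestamp, pitch coordinates, event type) and approximates "effective control" with the crudest rule available: a possession is a maximal run of consecutive events by the same team. Real event feeds would need extra rules for set pieces, loose balls and brief deflections, all of which are left out here.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical on-the-ball event; the field names are assumptions."""
    team: str    # team performing the event
    t: float     # seconds since kick-off
    x: float     # pitch coordinates, in any consistent unit
    y: float
    kind: str    # e.g. "pass", "dribble", "shot"


def split_into_possessions(events: List[Event]) -> List[List[Event]]:
    """Split a match into possessions.

    Assumption: a possession is a maximal run of consecutive events
    by the same team, after ordering the events by time.
    """
    ordered = sorted(events, key=lambda e: e.t)
    # groupby only merges *adjacent* items with the same key, which is
    # exactly the "consecutive events by one team" rule.
    return [list(run) for _, run in groupby(ordered, key=lambda e: e.team)]
```

Under this rule a match becomes a list of lists of events, ordered in time, and each inner list keeps the coordinates and timestamps that make the possession a dynamic object rather than a count.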

To give an example of such a study and why it is important, consider the question: what makes two possessions similar? To a human brain, trained in pattern recognition for millions of years, it is a relatively easy question. It is however a quite difficult, basic research task to devise a formal similarity measure given the disparate nature of the data that makes up a possession (continuous spatial and time coordinates, discrete events, and their ordering). For the sake of argument, assume that we have a measure that we are happy with. It has an immediate, powerful application: we can now measure the similarity of playing styles of teams and players by measuring the similarity of possessions in which they are involved. This method is bound to be much more precise than the current, purely output-based methods, and as we know, playing style similarity has a wealth of applications in tactics and scouting. But that’s not the end of the story. Our hypothetical measure, having already provided a considerable applied benefit, can now be fed back into basic research. Under a few relatively mild additional assumptions, the measure gives a rich structure to the set of all possible possessions, potentially allowing us to deploy a century of research in general topology and metric spaces to make statements about football. But for all these potential rewards, the subject remains unexplored because of the twin obstacles of inadequate conceptual arsenal and perceived lack of immediate applied benefit.
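
As a purely illustrative stand-in for such a measure, the sketch below compares two possessions by dynamic time warping (DTW) over their ball positions. This is my assumption for the sake of example, not a proposed answer: it ignores event types and timing, and DTW does not in general satisfy the triangle inequality, so on its own it would not supply the metric-space structure discussed above. The function name `dtw_distance` and the representation of a possession as a list of `(x, y)` points are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dynamic time warping cost between two sequences of ball positions.

    Lower values mean the two paths trace more similar shapes, even
    when they contain different numbers of events.
    """
    if not a or not b:
        raise ValueError("both possessions need at least one position")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

One consequence worth noting: two possessions that differ only by a repeated position get a cost of zero, which shows how far even a reasonable-looking dissimilarity can be from a true metric, and why the formal question is a research task in its own right.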

* * *

Based on what I sketched above, my suggestion for everyone involved in the field is to be more ambitious, to think more expansively and to not settle for imperfect investigations of lesser questions just because the data seems to be limiting us in this way. It isn’t, at least not all the time. Instead of counting events in more and more sophisticated ways, let us focus on possessions, ask broader questions and interrogate the data in more creative ways than before. I firmly believe that the payoff of this approach will be far greater than anything we have achieved so far.


I want to thank Colin Trainor (@colintrainor), Ted Knutson (@mixedknuts), James Yorke (@jair1970) and especially Thom Lawrence (@deepxg) for their feedback on earlier versions of this article.

  • Seb Crutchfield

    Really like the possession chain idea, am I right in thinking the basic structure of this would look like a list of the players involved with the pass distances between them, any individual dribble distance, and basic positional coordinates of first/last touch? Being able to compare those points together as a chain over a few games would be a really interesting tool for understanding a team’s desired tactics. Haven’t seen anything like this in the public realm, is it old news in professional circles? Anyway, I’d be really interested to hear from anyone on how these chains are measured so I can try to apply them to my own work. Thanks.

    • Ron IsNotMyRealName

      A possession is simply a period in which a team controls the ball. Man U last year should tell us quite definitively that just possessing doesn’t tell you anything about effectiveness except that you cannot give up a goal during the time when you possess the ball.

      Basically I would direct you toward sports that already measure the advancement of a ball up a field, and how that is done.

      Pretty much every popular sport in the world is way ahead of football on this. Why re-invent the wheel?

      I’m not sure why one would want to understand desired tactics rather than the effectiveness of them.

      • yathaid

        Why would you not? You want to know if player A can fit into your team, but it depends on what kind of system the player was being utilized in their current team.

        • Ron IsNotMyRealName

          Does it? I’m not sure that’s exactly been proven. I think it’s up to the team to know what kind of player they can use best, regardless of the system they were in.

          • yathaid

            Regardless of the system they were in? Okayyy ……..

          • Ron IsNotMyRealName

            I don’t see what’s impossible about that.

            You don’t hear so much about systems making players in other sports. Many common systems in football are easily morphable into other shapes. IMO the talents and characteristics of players are more important than the system.

  • kidmugsy

    The other half is how you play out of possession. How could one characterise that?

    • TTTactics

      I don’t know about that. But we can use the positive to examine the negative. How does team A set-up to null the effect of possession chain X? Why do they always struggle against teams using possession chain Y? So on

  • Ron IsNotMyRealName

    As an American, I find the idea that ‘all the obvious work has been done’ in football utterly hilarious.

    Football cannot even agree on how to measure a goal scorer’s effectiveness (and let’s not even attempt anything but goal scoring). Baseball had this licked 100+ years ago. It wasn’t perfect, but it was pretty good: a .300 hitter is almost always better than a .200 hitter, and a .300 hitter is usually better than a .250 hitter. Billy Beane merely took advantage of the minority of cases that ran counter to that (and benefitted from pitching in the farm system). Slugging was already valued. Walks weren’t. And a lot of baseball teams now have followed that up with terrible assumptions by exception, like that batters striking out isn’t that bad.

    Basketball coaches have been doing possession analysis since the 70s, and had descriptive statistics for on-ball activities for longer than that. There’s nothing wrong with on-ball statistics at all. The problem is not knowing what to do with them.

    In football, we’re so far from even getting to either of those places. Expected goals is a fool’s game. It doesn’t take a genius to figure out that xG/shot is a matter of proximity to goal and centrality on the pitch. That’s the Pareto rule of goal scoring in football. But that’s like saying to get a hit in baseball you have to hit em where they aint — and baseball was at that point over 100 years ago also.

    It is frustrating to know that so many people are focusing on the wrong things, have the wrong ideas, but not be able to put forward things you know are better because the majority will either shout you down or steal your ideas. There’s so much “who you know” involved in working in sports that I suppose I’ll probably end up keeping things under my hat indefinitely.

    As a working rule though, I would say football should look at baseball and basketball (maybe hockey though that can lead to misunderstandings of things like PDO), and combine the ideas of the two into an integrated idea. Because football is fundamentally similar to basketball, and baseball is simply the gold standard for analytics.

    Trying to come up with some revolutionary kind of analytics when you’ve done so incredibly badly at even the old kind long ago mastered by other sports is the height of hubris.

    • Simon

      To suggest that football is “fundamentally similar to basketball” just shows you have a complete lack of knowledge about at least one of these games. And simply because baseball was the first sport to use advanced analysis does not mean that other sports can or should rely on these same principles as well. In fact, I believe that the differences between football and baseball are (clearly) so immense that trying to implement any of the fundamentals of baseball analyses into football would be a waste of time.

      • Ron IsNotMyRealName

        If you’re unwilling to learn from the people doing it better than you, you’re wasting chances to warp through the learning curve. The Statcast stuff that is the new cutting edge in baseball (actually not really, but the one that’s best known among the next-gen stuff) is nothing that football clubs couldn’t and shouldn’t be doing.

        I’m not going to argue about the similarities in game play between basketball and football because you’ve obviously got your mind made up. But it should be obvious that we know from data much better who is a good shooter in basketball than you do in football.

        Even American football does things fundamental to football analysis better than football does.

        Perhaps I’ve wasted my time with this post. Perhaps with the site as well. I’m really hoping Ted starts writing more about his observations and issues in analytics, putting forward ideas for overcoming weaknesses in data collection and presentation, actually evaluating players, and less…well, other stuff.

        Oh well, things to do.

  • Ron IsNotMyRealName

    There are a few statistics that tell you about off the ball play. How infrequently (on a relative and absolute basis) a defender gets dribbled past, for example, tells you how effective he is at keeping opponents in front. It might also tell you how deep a defender is playing, but it might not. Shots from inside the box tell you about more than just the obvious because rarely does a player get there on his own. So he had to make some kind of movement to be available for a pass either to the spot or into space such that he could then move into the box.

    Descriptive statistics about plays on the ball tell more than just that story, but you have to actually understand what it is they’re actually telling you. Inventing something new that you just then have to understand and comprehend all over again isn’t going to cover up for a lack of understanding and eventual dismissal of important but more straightforward things you already don’t understand.

  • Mike Driggs

    I didn’t end up fully completing my charts for last season as it got really time consuming, but I think it fits in with what you were talking about. I charted goals scored. I went back and when the scoring team gained possession, I charted the ball movement up to the goal.