The Passing Motifs methodology is something I’ve been working on for a couple of months now, and it has produced pretty satisfying results, convincingly representing team and player passing style. I got the original idea from a pre-print on arXiv titled “Searching for a Unique Style in Soccer” by Laszlo Gyarmati, Haewoon Kwak and Pablo Rodriguez. These guys do research for Spanish telecommunications giant Telefonica, and they took a slight detour into football analytics by applying a concept from graph theory that they use for communications networks, basically to prove the popular point that Barcelona have a pretty unique passing network. Since then, I have substantially modified their original idea and arrived at a pretty cool methodology of my own.
Here’s how it works:
The basic idea is to break up passing sequences into 3-pass long subsequences (usually overlapping) in which the identity of the nodes, i.e. the players, is relaxed. If you’re not used to mathematical jargon this might sound confusing, but it’s pretty simple to get your head around:
There are 5 possible 3-pass long motifs, identified by their acronyms: ABAB, ABAC, ABCA, ABCB and ABCD.
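To see why there are exactly five, here is a small Python sketch (an illustration, not code from the original paper). It enumerates every sequence of four touches by up to four players, assuming a player never passes to himself, and relabels each sequence by order of first appearance so that only the flow of passes remains.

```python
from itertools import product


def canonical(seq):
    """Relabel players by order of first appearance: A, B, C, D."""
    letters = {}
    for player in seq:
        if player not in letters:
            letters[player] = "ABCD"[len(letters)]
    return "".join(letters[p] for p in seq)


def all_motifs():
    """Enumerate every 3-pass motif (4 touches, no self-passes)."""
    motifs = set()
    for seq in product(range(4), repeat=4):
        # Consecutive touches must belong to different players.
        if all(a != b for a, b in zip(seq, seq[1:])):
            motifs.add(canonical(seq))
    return sorted(motifs)


print(all_motifs())  # ['ABAB', 'ABAC', 'ABCA', 'ABCB', 'ABCD']
```

Four labels are enough because a 3-pass window involves at most four different players.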
The process of identifying the passing motifs is simply taking a passing sequence, breaking it up into 3-pass subsequences and checking which motif each one fits into. The key concept is that at first we are not interested in which particular players perform the passes, simply in the flow of passes amongst them. A sequence Kroos – Modric – Bale – Kroos and a sequence Ronaldo – Kroos – Benzema – Ronaldo are simply two separate instances of ABCA.
At the end, we are left with a count for each of the 5 motifs, for each team, in each match for which we have the necessary data. Simple enough. So how do we use this to identify passing style?
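The windowing and counting described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a passing sequence is taken to be a list of the players who touched the ball in order (so n passes give n + 1 entries), windows slide one pass at a time, and the helper names `motif_counts` and `team_match_counts` are hypothetical.

```python
from collections import Counter


def canonical(window):
    """Replace player names with letters in order of first appearance."""
    letters = {}
    for player in window:
        if player not in letters:
            letters[player] = "ABCD"[len(letters)]
    return "".join(letters[p] for p in window)


def motif_counts(sequence):
    """Count motifs over all overlapping 3-pass windows of one sequence.

    Each 3-pass window spans 4 consecutive entries; sequences with
    fewer than 3 passes contribute nothing.
    """
    counts = Counter()
    for i in range(len(sequence) - 3):
        counts[canonical(sequence[i:i + 4])] += 1
    return counts


def team_match_counts(sequences):
    """Sum motif counts over all of one team's sequences in one match."""
    total = Counter()
    for sequence in sequences:
        total.update(motif_counts(sequence))
    return total


print(motif_counts(["Kroos", "Modric", "Bale", "Kroos"]))        # Counter({'ABCA': 1})
print(motif_counts(["Ronaldo", "Kroos", "Benzema", "Ronaldo"]))  # Counter({'ABCA': 1})
```

Both example sequences from the text collapse to the same ABCA count, which is exactly the point of relaxing player identity.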
Team Passing Style:
The original authors’ reasoning is that by understanding the motifs’ distribution for different teams, inherent information about a team’s playing style will become apparent. It seems like a reasonable intuition, if we consider for example that ABCD is a direct build-up passing sequence involving 4 different players, while ABAB most likely reveals a patient build up where 2 players give the ball back and forth in the style we usually attribute to Barcelona or Bayern Munich.
However, rather than looking at the raw numbers of how many times each team performed a certain motif, I found it more interesting to look at the relative frequencies. That is to say, for a given match I would break down each team’s motif distribution into something like 13% ABAB, 22% ABAC, 30% ABCA, etc., rather than looking at the actual number of times each motif was performed. This is interesting because it should represent something like ‘intent’: when you have the ball, what do you intend to do with it? If we focused on absolute numbers rather than relative frequencies, high-possession teams like Barcelona or Arsenal would always come out as unique, something we all already know and don’t need this methodology to tell us.
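A short Python sketch of the normalisation, using made-up counts: a team that completes twice as many motifs in the same proportions ends up with exactly the same relative-frequency vector, which is why possession volume alone no longer makes a team look unique. Returning zeros for a match with no motifs is an assumption; skipping such matches would be an equally reasonable choice.

```python
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]


def relative_frequencies(counts):
    """Turn one team's raw motif counts for a match into proportions."""
    total = sum(counts.get(m, 0) for m in MOTIFS)
    if total == 0:
        # No complete 3-pass windows in this match (assumed handling).
        return [0.0] * len(MOTIFS)
    return [counts.get(m, 0) / total for m in MOTIFS]


# Illustrative counts only: a high-possession and a low-possession team
# with the same underlying "intent".
high_possession = {"ABAB": 40, "ABAC": 60, "ABCA": 50, "ABCB": 30, "ABCD": 20}
low_possession = {"ABAB": 20, "ABAC": 30, "ABCA": 25, "ABCB": 15, "ABCD": 10}

print(relative_frequencies(high_possession))  # [0.2, 0.3, 0.25, 0.15, 0.1]
print(relative_frequencies(low_possession))   # [0.2, 0.3, 0.25, 0.15, 0.1]
```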
By taking each team’s relative frequencies for each motif and averaging over an entire season, I obtained a 5-dimensional vector representing each team over the course of a season. Below are the hierarchical clustering dendrograms for this methodology using data from the 2014-15 and 2015-16 Premier League seasons respectively.
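The season averaging and the clustering behind a dendrogram can be sketched in plain Python. The post does not say which distance or linkage rule was used, so Euclidean distance with average linkage is an assumption here, and the team names and vectors are invented. In practice `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.dendrogram` do the same job and draw the tree; the naive loop below just makes the merging explicit.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def season_vector(match_vectors):
    """Average one team's per-match 5-dim motif frequency vectors."""
    n = len(match_vectors)
    return [sum(column) / n for column in zip(*match_vectors)]


def average_linkage(vectors):
    """Naive agglomerative clustering (average linkage, Euclidean).

    `vectors` maps a team name to its season vector. Returns the merge
    history as (cluster_a, cluster_b, distance) tuples, where each cluster
    is a tuple of team names; this history is what a dendrogram draws.
    """
    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters, then repeat.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges


# Hypothetical season vectors for three made-up teams.
season = {
    "Team X": [0.30, 0.20, 0.20, 0.20, 0.10],
    "Team Y": [0.29, 0.21, 0.20, 0.20, 0.10],
    "Team Z": [0.10, 0.20, 0.30, 0.20, 0.20],
}
for a, b, d in average_linkage(season):
    print(a, "+", b, round(d, 3))
```

Teams with similar motif mixes merge early (low on the tree), while a team with a distinctive mix joins the rest only at a large distance.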
There are several interesting things to point out. First of all, 2015-16 title winners Leicester have the most distinctive motif frequency distribution, and they form a subgroup with the Premier League’s passing powerhouses Arsenal and Manchester City. Not only this, but Leicester’s distinctive motif distribution was also present in the 2014-15 season, in which they went on an impressive run at the end to avoid relegation. Of course, nobody could have foreseen their exploits the following season. Was this a sign, which we should have spotted, that there was something special about this team?
The consistency of Leicester’s motif vector across both seasons is a good sign that this method is capturing an underlying quality, which we can call “passing style”, rather than assigning values driven by statistical noise. The same consistency shows up in other teams to which the method attributes similar styles, such as Arsenal-Manchester City, Tottenham-Chelsea and Crystal Palace-Sunderland. If the method weren’t picking up on stable underlying qualities of the teams’ passing styles, the probability of it producing these same pairings by chance in two consecutive seasons would be very low.
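One simple way to make the pairing argument concrete, sketched here as an assumption and not necessarily the consistency measure referred to later in this post, is to check whether each team’s nearest neighbour (Euclidean distance between season vectors) is the same in two consecutive seasons. Only teams present in both seasons are compared, and each season needs at least two teams. The helper names are hypothetical.

```python
from math import dist  # Python 3.8+


def nearest_neighbours(vectors):
    """Map each team to the team whose season vector is closest."""
    return {
        name: min(
            (other for other in vectors if other != name),
            key=lambda other: dist(vectors[name], vectors[other]),
        )
        for name in vectors
    }


def stable_pairs(season_a, season_b):
    """Teams in both seasons whose nearest neighbour is unchanged."""
    nn_a = nearest_neighbours(season_a)
    nn_b = nearest_neighbours(season_b)
    return sorted(
        team for team in nn_a if team in nn_b and nn_a[team] == nn_b[team]
    )


# Made-up vectors: P/Q and R/S are two similar-style pairs.
season_a = {
    "P": [0.30, 0.20, 0.20, 0.20, 0.10],
    "Q": [0.31, 0.19, 0.20, 0.20, 0.10],
    "R": [0.10, 0.20, 0.30, 0.20, 0.20],
    "S": [0.11, 0.20, 0.29, 0.20, 0.20],
}
print(nearest_neighbours(season_a))
```

As a rough baseline, ignoring promotion and relegation, if a team’s nearest neighbour were picked uniformly at random from the other 19 teams of a 20-team league, it would stay the same across two seasons only about 1 time in 19 (roughly 5%), so several pairs recurring together would be unlikely to be luck.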
It seems we’re on the right track to quantify team passing style…
Player Passing Style:
Extrapolating this methodology convincingly to the player level is an exciting prospect for recruitment. Passing play is obviously a major factor in a team’s potential. Let’s say we could find economically efficient alternatives to Bayern Munich’s players and set up a low-cost team capable of executing a style of play similar to the one that has made Bayern so dominant. It seems a bit naïve, but there’s definitely a competitive advantage there for clubs.
The question, then, is how to manipulate the information to translate it into a player context and, once we have done this, how to validate that we are in fact picking up on stable underlying qualities of the players. The team-level analysis already introduced a key “validating element”: consistency across consecutive seasons. I won’t go into much detail here, but you can have a look at this entry from my blog on how I measure this consistency.
Long story short, I settled on a 45-dimensional (yes, 45) vector representing each player. Once again, for more details on how and why it was built this way, you can have a look at this other entry from my blog. Here is a summary of how I constructed this 45-dimensional vector:
Since 5+5+5+15+15=45, we are left with our 45-dimensional vector representing each player. This entry explains how we know that this vectorisation performs well; that is to say, the vector representations of players are in a sense “stable” across consecutive seasons, indicating that the methodology is picking up on underlying qualities of a player’s passing style rather than statistical noise.
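The contents of the five blocks are described in the blog entry mentioned above and are not reproduced here. Purely as an illustration of where a 15 can come from, and not a claim about what the 15-dimensional blocks actually contain, one can count the distinct roles (A, B, C or D) a single player could occupy across the five motifs: 2 in ABAB, 3 each in ABAC, ABCA and ABCB, and 4 in ABCD.

```python
MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]

# Hypothetical feature set: one (motif, role) pair for every letter a
# player could occupy within a motif, e.g. being "A" or "B" in ABAB.
motif_roles = [(motif, role) for motif in MOTIFS for role in sorted(set(motif))]

print(len(motif_roles))     # 15
print(5 + 5 + 5 + 15 + 15)  # 45
```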
I’m quite happy with the results and there is good evidence that the vectorisation contains valuable information. Presenting this information in a visual way to the reader isn’t exactly straightforward. One way to do it is by displaying hierarchical clustering dendrograms of the results.
Below is a link to a PDF of the hierarchical clustering dendrogram for the 2015-16 Premier League data set (only players who played in over 18 matches). Since there are 279 players, the tree labels are really tiny, so the image couldn't be uploaded to the site directly, but in the PDF you can use your viewer's zoom to explore the results.
https://drive.google.com/file/d/0Bzvjb5fnv1HtZjFtRDJjUVBua0E/view
If you’d rather not, here’s a selection of the method’s results:
This is a poor man’s substitute for actually exploring the dendrogram yourselves. Not to mention that a clustering dendrogram is not even the most faithful representation of the information captured by this vectorisation, but I’m more than happy with the results and feel there is real promise in the methodology. If I come up with better visualisations for the results, I’ll post them later on.
Please have a look through the results from the dendrogram and comment on whether you feel we’re getting close to convincingly capturing player passing style through passing motifs.
Find me on Twitter @dperdomomeza1
For prior work on this subject see my blog here