A Family Tree of European Offenses
In my previous two posts on StatsBomb, I used passing data to create team profiles and find similarity scores, and then built on that to create a family tree relating how teams defend all across Europe. Today brings the offensive side of the ball to the forefront. You can read the full process of how these metrics are created and related in the previous two pieces, but I will give a quick run-through here.
The metrics used are:
Shot tempo: shots per pass (3 highest tempos: QPR, Leverkusen, Crystal Palace. Lowest: PSG, Bayern, Manchester United)
Box activity: how often a team passes into the box per game (3 highest: Bayern, Dortmund, Man City. Lowest: Nantes, Bastia, Cordoba)
Intra-box success rate: completion % of passes that start and end inside the box (3 highest: Man City, Lyon, Bordeaux. Lowest: Hertha, Koln, Athletic Bilbao)
Centrality: % of completions in middle of pitch when in final third (3 highest: Torino, Dortmund, Hoffenheim. Lowest: Real Sociedad, Atletico Madrid, Levante)
Possession: share of possession (3 highest: Bayern, Barcelona, PSG. Lowest: Palace, Hertha, Eibar)
Forward play: % of completions that are forward (3 highest: Paderborn, Marseille, Hoffenheim. Lowest: Manchester United, Roma, Manchester City)
Field tilt: how far up the pitch the average pass is completed at (Highest: Man City, Barcelona, Chelsea. Lowest: Augsburg, Torino, Rennes)
And a few new ones for offense.
Instead of a simple long ball %, two metrics replace it:
Penalty box entry length: the average length of the passes with which a team enters the box (Shortest: Barcelona, Arsenal, Manchester City. Longest: Eibar, Evian, Levante)
Playout length: average length of completions from deep in own half (shortest: PSG, Cagliari, Inter. Longest: Burnley, Eibar, QPR)
Also new are:
Red Zone%: % of passes that are completed within 20 yards of the opposition goal (highest: Man City, Leverkusen, Burnley. Lowest: Cordoba, Elche, Levante)
Diagonals: added thanks to this treatise from Adin Osmanbasic, it is a measure of what % of a team's passes are long and diagonal. (highest: Bayern, Lyon, Lazio. Lowest: Palace, Palermo, Sunderland)
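To make a few of these definitions concrete, here is a minimal sketch of how some of them could be computed from pass records. This is not the code behind the post. The `Pass` record, the 115-yard pitch length, the choice of denominators, and the toy passes are all assumptions made for illustration, and the red zone is simplified to distance from the goal line.

```python
from dataclasses import dataclass

PITCH_LENGTH = 115.0    # assumed pitch length in yards
RED_ZONE_DEPTH = 20.0   # simplified: distance from the opposition goal line


@dataclass
class Pass:
    """Hypothetical pass record; x is yards from the team's own goal line."""
    start_x: float
    end_x: float
    completed: bool


def shot_tempo(shots, passes):
    """Shots per attempted pass (assumed denominator)."""
    return shots / len(passes)


def forward_play(passes):
    """Share of completions that end further upfield than they started."""
    completed = [p for p in passes if p.completed]
    return sum(p.end_x > p.start_x for p in completed) / len(completed)


def red_zone_pct(passes):
    """Share of completions that end inside the assumed red zone."""
    completed = [p for p in passes if p.completed]
    in_zone = [p for p in completed if PITCH_LENGTH - p.end_x <= RED_ZONE_DEPTH]
    return len(in_zone) / len(completed)


def field_tilt(passes):
    """Average distance upfield at which completions are received."""
    completed = [p for p in passes if p.completed]
    return sum(p.end_x for p in completed) / len(completed)


# Toy data, invented purely to exercise the functions.
passes = [
    Pass(30.0, 45.0, True),    # forward completion
    Pass(60.0, 55.0, True),    # backward completion
    Pass(80.0, 100.0, True),   # forward completion ending in the red zone
    Pass(70.0, 98.0, False),   # incomplete
]

print(shot_tempo(1, passes))
print(forward_play(passes))
print(red_zone_pct(passes))
print(field_tilt(passes))
```

Whether the denominators should be attempted or completed passes is a judgment call; the sketch just picks one and says so.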
Are these the best metrics for judging a style of play? Almost certainly not; getting there will be a long process full of tweaking and testing. Right now, I feel satisfied this gives us good groupings, as the variables are generally measuring different things (none correlate above .5 with each other) and each measures some reasonably distinctive part of the game. I've weighted some metrics more (shot tempo, box activity, possession, intra-box success rate) and some less (diagonals, forward play) before running these analyses.
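The correlation check and the weighting step can be sketched roughly as follows. The numbers and the weights (1.5 and 0.5) are invented for illustration, since the actual weights are not given here, and standardizing to z-scores before weighting is my assumption about a sensible way to do it.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def max_abs_correlation(columns):
    """Largest absolute pairwise correlation across all metric columns."""
    return max(abs(pearson(columns[a], columns[b])) for a, b in combinations(columns, 2))


def weighted_z_scores(columns, weights):
    """Standardize each metric, then scale it by its weight (default 1.0).

    Assumes no column is constant (a constant column would divide by zero).
    """
    scaled = {}
    for name, values in columns.items():
        mu, sigma = mean(values), pstdev(values)
        w = weights.get(name, 1.0)
        scaled[name] = [w * (v - mu) / sigma for v in values]
    return scaled


# Toy metric table: one value per team, invented for illustration.
columns = {
    "shot_tempo": [0.030, 0.022, 0.027, 0.019],
    "possession": [45.0, 60.0, 52.0, 48.0],
    "diagonals": [4.0, 6.5, 3.5, 5.0],
}
weights = {"shot_tempo": 1.5, "possession": 1.5, "diagonals": 0.5}  # hypothetical

print(max_abs_correlation(columns))
z = weighted_z_scores(columns, weights)
print(z["possession"])
```

After scaling, a heavily weighted metric has a larger spread, so it contributes more to the distances that the clustering works with.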
The first analysis was a k-means cluster analysis using those metrics to group similar teams together. "K" is the number of groups and there is always a debate as to how many you should choose. I ran the analysis with k ranging from 12 to 35 and then looked at how much variance was explained by each one. 20 was where the unexplained variance seemed to stop decreasing consistently, and since I used 20 groups in my defensive piece, I was happy to go with 20 again. If you choose a different k, you will get teams shuffled around a bit within groups, as certain teams are barely part of one group and could be moved to another without much concern. Once they had been grouped, I ran an agglomerative hierarchical clustering on those group metrics to create a tree graph relating all the groups of teams across Europe. The tree graph as a whole is below, then I will go through each branch for a quick overview.
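For anyone who wants to try the two-step process, here is a small self-contained sketch of the idea: k-means for a range of k with the explained variance for each, then an agglomerative pass over the group centers. It is not the code used for this piece. The toy 2D points stand in for the weighted team profiles, the deterministic farthest-point seeding and average linkage are my assumptions, and in practice a library such as scikit-learn or SciPy would be the more usual choice.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, chosen for reproducibility)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations: assign to nearest center, then recompute centers."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers


def explained_variance(points, labels, centers):
    """1 minus (within-group sum of squares / total sum of squares)."""
    overall = centroid(points)
    total = sum(dist(p, overall) ** 2 for p in points)
    within = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return 1 - within / total


def average_linkage(centers, a, b):
    """Mean distance between every pair of group centers drawn from a and b."""
    return sum(dist(centers[i], centers[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(centers):
    """Repeatedly merge the two closest clusters of groups; return the merge order."""
    clusters = [[i] for i in range(len(centers))]
    merges = []
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda ab: average_linkage(centers, clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges


# Toy "team profiles": three obvious blobs, invented for illustration.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (10.0, 0.0), (10.2, 0.1), (9.9, -0.1),
]

for k in range(2, 6):
    labels, centers = kmeans(points, k)
    print(k, round(explained_variance(points, labels, centers), 4))

labels, centers = kmeans(points, 3)
merges = agglomerate(centers)
print(merges)
```

The merge order returned by `agglomerate` is what a tree graph would draw: groups merged early sit on neighboring branches, and groups merged last are the most distant relatives.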
We will start with the 5 groups at the top.
From top to bottom:
Group 6: Lazio, Sampdoria, Torino
is closely related to
Group 5: Genoa, Frankfurt, Newcastle, Parma, Malaga, Villarreal
We start out with an enormous yawn. A bunch of solidly midtable teams along with Newcastle and Parma (whose defenses were historically awful; their offenses weren't nearly as bad). These teams have few standout characteristics either way. They do have well above average shot tempos and are good at passing inside the box. The main difference is that Group 6 plays extremely centrally and plays a lot of diagonal balls, while Group 5 plays higher up the pitch and spends a high % of its time in the "red zone" (within approximately 25 yards of goal).
Group 8: Palermo, Atletico Madrid
Don't play in the center and essentially never play diagonal passes. Have a very high field tilt, yet are well below average at time spent in the red zone and box attacks. I do wonder which metrics are mainly manager-related and which are player-related, and if it's even possible to separate them satisfactorily. If you have Messi and Neymar, you will hit a lot of diagonals and have a great intra-box passing % no matter what, even if they play for Diego Simeone, right? I tend to think Atletico rarely play diagonal passes because so many attacks go through the wings, where there is only one way to play a diagonal ball and that's into the teeth of the defense. The fact Atletico don't commit many men to attack and seem to set up defense first means there will be fewer options to hit across the field. Hopefully a piece on variance of styles throughout the season can help find the manager effect.
Group 3: Cesena, Hull City, Guingamp, Metz, Montpellier, Almeria
Group 1: Atalanta, Chievo, Burnley, Palace, Leicester, QPR, West Ham, Bremen
These teams rarely play with the ball and play it long repeatedly. Group 1's shot tempo is by some distance the highest of any group, and they generally spend a lot of time in the red zone and are above average at putting balls into the box. Group 3 has neither of those last two positive metrics, providing the difference.
West Ham at some point this season were being mentioned as a team that might break into the top 5 and were good enough to qualify for Europe (I guess they did, but I doubt many of those pundits were eyeing the Fair Play Table at the time). They wound up as a pretty poor team playing similarly to a lot of other poor teams.
Group 13: Stoke, Toulouse, Mainz, Augsburg, Hamburg, Freiburg, Stuttgart
Group 9: Sassuolo, Udinese, Verona, Koln, Hertha, Paderborn
Here we find most of the bottom of the German table. Teams in these groups play normal length passes deep in their own half but play long balls into the box. They have low possession rates, low field tilt and very low intra-box success rates. Group 13 has the ball and tests the box a lot more than Group 9.
Group 16: West Brom, Caen, Lens, Athletic, Espanyol, Real Sociedad
Group 14: Sunderland, Bastia, Cordoba, Deportivo, Eibar, Getafe, Granada, Levante
These teams are even worse at passing inside the box and couple it with rarely getting the ball to the box. They generally play long balls throughout the entire field, don't play centrally, and don't have a large share of the ball. Group 16 has a higher share of the ball, plays shorter passes into the box, and has a significantly higher field tilt. Last year David Moyes was in the Champions League managing Manchester United against Bayern Munich and he ends this season lumped in with Caen and West Brom by some guy on StatsBomb. What a fall.
Group 17: Evian, Nantes, Reims, Elche
Group 15: Swansea, Bordeaux, Lille, Lorient, Nice, St Etienne, Valencia
Here we have the patient, pick-your-spots teams. These teams breach the box at very low rates, but break into the box using short passes (Group 15 significantly shorter) and complete a very high rate of their intra-box passes. Group 15 sees more of the ball while Group 17 plays mainly through the wings.
Group 11: Aston Villa, Rennes
Low possession teams generally play directly and shoot quickly when they get the ball. They don't have the quality to hold the ball and play intricately so seem to rush the ball up the pitch and fire. Aston Villa and Rennes do not:
They couple their low shot tempo with the longest average pass length of any group when entering the ball into the box and below average intra-box pass success rates. So they aren't picking and choosing prime spots, but seem to simply have a lot of useless completions that don't get them closer to the box or a shot. Not pretty.
Group 18: Marseille, Wolfsburg, Rayo Vallecano
An interesting group here with two high-pressing defenses in Marseille and Rayo. These teams have the lowest average field tilt of any group and a high rate of forward play. They play short passes at both ends of the field and tend to play through the wings in the attacking third. It's a strange profile as they almost play counter-attack football with very high possession rates. I am guessing the fact the game has become very stretched in Marseille and Rayo's case leads to many of these numbers. When the other team is wide open you can play forward and don't spend a lot of time passing it around against a set box (which would raise your field tilt rating).
Group 7: Milan, Schalke
If nothing else comes from this, I think the grouping absolutely got this one right. On a gut level this just feels perfect. Two big-budget teams who performed absolutely dreadfully this season (barring maybe the most bizarre game of the season in Madrid). Slow, ponderous play that rarely gets the ball near the goal or tests the box does not make for good watching. For good measure they are atrocious at passing inside the box. At least they don't hit a lot of long balls, right?
After this group there is a big gap. Look back up at the main tree and you will see there isn't much similarity between groups 7 and 18 and then 20.
Group 20: Bayern, Barcelona, Celta Vigo
Group 4: Empoli, Inter, Roma, Spurs, Juventus
Now we start to get to the high possession, highly effective offenses clustered here at the bottom of the tree. Celta Vigo kind of stand out here and, while they certainly don't reach the heights of Bayern or Barca, they style themselves similarly. They are good inside the box, hold the ball at very high rates, attack the box a lot and play short passes to enter the box. When you combine this offensive style with the crazy Bielsa pressing tactics and taking impeccably named Chilean team O'Higgins to their first league title ever back in 2013, Celta manager Eduardo Berizzo should at least get a look for bigger and better jobs in the coming years.
Inter and AC Milan finished near each other in the table and are linked together in my generally EPL-centric mind but actually played very differently (a high box entry pass length bar here actually refers to a short pass in what was an astoundingly poor design choice):
Another team who is interesting in how they profile with these metrics is Empoli. Their defense was mixed in with Fiorentina, PSG, and Manchester United and now their offense reaches high class company as well. Kind of strange for a team that won 8 games all year and finished 15th a year after promotion from Serie B. Without knowing much else about him except these profiles, I'd wager that Maurizio Sarri will be another manager to watch going forward. And as soon as I typed that sentence I scrolled down on his Wiki page to find out that he has been confirmed as the new Napoli manager. Another instance of the profiles running ahead of my knowledge. To get a team with no players on more than $300,000 yearly salary to play like this is quite an achievement.
Group 12: Everton, Manchester United, Lyon, Monaco, PSG, Hannover, Gladbach
An imaging error has led to the Gladbach logo being left off. Any Foals fans feeling left out, please go read my long investigation into the entirety of Gladbach and get back to me. These teams have very slow developing attacks that don't get up the field at high rates. They are very good at passing inside the box.
Group 10: Arsenal, Man City, Liverpool, Chelsea, Southampton
Saints and Liverpool just barely make this group, but it shows the serious stratification of the EPL once again. These teams pepper the box (Saints 62nd percentile, all others 78+) from central areas (all above 80th percentile), with short passes (all above 72nd percentile) and are great at completing passes once inside the box (each team above 80th percentile). Chelsea and especially Arsenal and City are near the top of Europe at all of these things but Liverpool and Saints are like the little brothers who are doing what their big brothers do, just not quite as well.
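The percentile figures above rank each team against the rest of Europe on a single metric. The exact definition used is not spelled out here, so the sketch below assumes one common convention (teams strictly below, plus half of any ties), with a `reverse` flag for metrics such as box entry length where a lower value is the desirable end. The values are invented.

```python
def percentile_rank(values, x, reverse=False):
    """Percentile rank of x within values.

    Counts values strictly on the "worse" side of x plus half of any ties.
    With reverse=True, smaller values are treated as better.
    """
    if reverse:
        worse = sum(v > x for v in values)
    else:
        worse = sum(v < x for v in values)
    ties = sum(v == x for v in values)
    return 100.0 * (worse + 0.5 * ties) / len(values)


# Toy metric values for eight teams, invented for illustration.
box_entries = [10, 12, 15, 18, 20, 22, 25, 30]

print(percentile_rank(box_entries, 22))                # higher is better
print(percentile_rank(box_entries, 12, reverse=True))  # lower is better
```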
Group 2: Cagliari, Fiorentina, Napoli, Real Madrid
Only PSG played out of the back using shorter passes than Cagliari. As they play in Serie B next season, they can take some solace in that stat and in the fact their offense was grouped with these 3 teams. The main problem was that they allowed 68 goals and their defense was grouped with QPR, Burnley, and Chievo.
Group 19: Dortmund, Leverkusen, Sevilla, Hoffenheim
The crazy uncles who seem to have little relation to anyone else. Usually a high shot tempo goes with low possession, as the chart with Aston Villa and Rennes showed earlier. Here we see it flipped the other way: teams with high possession who still fire a lot of shots per pass.
This group also has a higher % of its passes in the "red zone" (within 25 yards or so of the goal) than any other group. They pepper the box more than anyone bar the Bayern/Barca group and play extremely centrally.
I think there is great potential with this type of analysis, especially once we begin to drill down into style vs style or game to game analysis. Tom Worville thought it could be used for transfers, for teams looking for flexible players or possibly players with experience playing the style they want. I am not sure I would feel comfortable basing player analyses on this broad, team-level data right now, but it could certainly be a good starting point. For example, if Sunderland is looking to fix their offense, maybe they would study what Athletic Bilbao does differently. Since Bilbao does a lot of things similarly to Sunderland, the differences might be easier to reach than by studying Arsenal or Barcelona. At the very least, a quick glance at these graphs can make anyone much more knowledgeable about the game across Europe, and they can then decide what to look at further from that. For example, I had no idea about Empoli or Celta Vigo's style of play before a week or so ago. Now I will keep my eye on Maurizio Sarri and Eduardo Berizzo going forward despite having not watched more than 30 minutes of those two teams total in the previous year.
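The "study a similar team" idea can be sketched as a nearest-neighbor lookup followed by a list of the biggest metric gaps. The team names and standardized values below are placeholders invented for illustration, not real profiles, and Euclidean distance is just one reasonable choice of similarity.

```python
from math import dist


def most_similar(profiles, team):
    """Name of the other team whose profile is closest in Euclidean distance."""
    metrics = list(profiles[team])
    target = [profiles[team][m] for m in metrics]
    others = [name for name in profiles if name != team]
    return min(others, key=lambda name: dist(target, [profiles[name][m] for m in metrics]))


def largest_gaps(profiles, team, other, top=2):
    """Metrics where `other` differs most from `team`, biggest gap first."""
    gaps = {m: profiles[other][m] - profiles[team][m] for m in profiles[team]}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# Hypothetical standardized profiles (z-scores), invented for illustration.
profiles = {
    "Team A": {"possession": -1.0, "box_activity": -1.2, "field_tilt": -0.8},
    "Team B": {"possession": -0.7, "box_activity": -1.0, "field_tilt": 0.4},
    "Team C": {"possession": 1.8, "box_activity": 2.0, "field_tilt": 1.5},
}

neighbor = most_similar(profiles, "Team A")
print(neighbor)
print(largest_gaps(profiles, "Team A", neighbor))
```

Here the closest match to Team A differs mainly in field tilt, which is the kind of narrow, reachable difference the paragraph above has in mind.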
And again, this is a rough guideline. If you change any of the metrics or the number of clusters you would get slightly different results. Southampton and Liverpool were close to breaking off into a separate group from the big 3 English offenses, Swansea was close to joining group 17, Bremen and Hannover are only loosely attached to their groups and several more things might have changed. These groups are not set in stone at all.
If you have any questions, criticisms, or comments, or want to discuss this further, you can reach me on Twitter @SaturdayonCouch or post a comment on my blog. I'd love to discuss.