Measuring Tactical Variance by League


Manchester City won the 2013-2014 Premier League with a diverse, international (and very expensive) squad. The players who made 20 or more league appearances represented eight different nationalities (nine if you count their Chilean manager, Manuel Pellegrini). Only one first-choice player, goalkeeper Joe Hart, was English.

In many ways Manchester City represents what many see as the future of European football, one in which hyper cross-pollination of playing styles and tactics renders our old heuristics (Spain = tiki-taka, Italy = catenaccio, etc.) useless. In that future, we might expect the distribution of formations and tactics to be fairly consistent across leagues. Of course, that is not the case now, and quite possibly never will be.

In the complex world of game theory and football formations, it sometimes behooves a manager to stick with an unsuccessful setup for no better reason than that everyone else in the league is using it; many people do not like to take risks, especially when their job is on the line. Conversely, in a league like Serie A, where changing formations and tactics from game to game is almost an obsession, adherence to a single formation might be frowned upon.

It should be stated that while this piece is about “tactical” variance, our only measurement tool is “formation” variance. Formations and tactics are not necessarily the same thing. For example, a 3-5-2 might in practice more closely resemble a 5-3-2, and any formation can exist in an attack-minded or defensive form. However, to the extent we are measuring tactical heterogeneity or homogeneity, formation variance is probably as good a proxy as any. Formation information comes from Opta, whose analysts watch every game for each team they are assigned. Note also that this data does not include in-game changes; it records only how each team lined up at the start of the game. The data covers the last completed season (’13-’14) and includes only formations used more than 3% of the time. Formations are listed from left (most used) to right (least used).

[Figure: formations by league]

A couple of things stand out here:

1. The Eredivisie loves the 4-3-3 and Russia loves the 4-2-3-1, almost to the exclusion of any other formation.

2. Serie A demonstrates a tactical diversity not seen in other leagues (see below).

[Figure: team avg formations]

The “favored” and “unfavored” formations are partially a symptom of the fairly eclectic mix of leagues included in the analysis. If we aggregate only the teams from the “Big 4” leagues (Bundesliga, EPL, La Liga, Serie A), this is what the results look like:

[Figure: formations in the Big 4 leagues]

The 4-2-3-1 is certainly the fancied approach at the moment, but things can change. For example, MLS has seen a rise in the use of the “diamond” 4-1-2-1-2 in 2014. Unfortunately, this analysis does not include any data from previous seasons. Will the homogeneity in the Dutch approach and the heterogeneity in the Italian approach hold in the face of football globalization? It will certainly be worth watching.
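The filtering described earlier in this post (keep only formations used more than 3% of the time, ordered from most to least used) could be sketched roughly as follows. This is a minimal illustration, not the method actually used on the Opta data: the function name `formation_shares`, the `min_share` parameter, and the sample list are all made up for the example.

```python
from collections import Counter


def formation_shares(starting_formations, min_share=0.03):
    """Return (formation, share) pairs used more than min_share, most used first.

    starting_formations: one entry per team per match, e.g. "4-2-3-1".
    min_share: assumed reading of the "more than 3%" cutoff.
    """
    counts = Counter(starting_formations)
    total = sum(counts.values())
    # most_common() already orders formations from most used to least used
    shares = [(formation, n / total) for formation, n in counts.most_common()]
    return [(formation, share) for formation, share in shares if share > min_share]


# Made-up starting formations for one hypothetical league
sample = ["4-2-3-1"] * 60 + ["4-3-3"] * 30 + ["3-5-2"] * 8 + ["4-4-2"] * 2
result = formation_shares(sample)
for formation, share in result:
    print(f"{formation}: {share:.0%}")
```

In this made-up sample the 4-4-2 falls below the cutoff and drops out, which is the same reason rarely used formations do not appear in the league charts.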

Passing Impact and David Silva

[Image: David Silva]

Overview

David Silva’s nasty-looking ankle injury is a demoralizing loss for Manchester City. Although City will not have the Spaniard down the stretch of their Premiership title pursuit, City fans (and fans of world football) will take some consolation in the fact that the injury is not as severe as it initially appeared. While reviewing his season, I was struck by Silva’s excellence across a number of passing statistical categories. It brought to mind a challenge I received from Matt Tomaszewicz, aka The Shin Guardian, to try to derive an overarching metric from a few ubiquitous passing statistics: number of passes, pass completion %, and key passes. The following is a (flawed) attempt to both quantify David Silva’s passing impact and meet Matt’s challenge.

Passing Impact

Who is the most impactful passer? The obvious answer is someone whose passes create goals (assists). But assists are so infrequent that we are really looking for passers who create goal-scoring opportunities (key passes). Of course, it would also be ideal to have data on the passes that set up key passes (secondary key passes), but that data is not publicly available. Also, as Colin Trainor (and others) have pointed out, being able to assess the quality of the shots that key passes create is very informative, and that quality is notably absent from the key pass metric. Nevertheless, key passes are what we have. Below is the list of the top total key passers in the EPL this season.

[Figure: total key passes]

This seems a pretty good list. Generally, these are names we associate with being impactful passers. But what about efficiency? Who is creating the most key passes per pass attempted?

[Figure: pass per key pass]

OK, so this list is quite different from the first. But look at the low passing % of some of these players, like Anichebe and Vydra. We have to take incomplete passes into account as well.

[Figure: KP incomplete]

This is probably the best measure of key pass efficiency. I included two versions of the same metric because, while I prefer KP/Incomplete %, some might prefer to visualize it the other way around. It should be noted that turnovers and dispossessions are not included in this analysis. Also absent? Pass usage rate. It is one thing for Kevin Mirallas to be incredibly efficient at creating goal-scoring opportunities, but as an attacker, how often does he see the ball relative to other players? (Note: pass usage rate = player passes attempted / team passes attempted.) This is the same list of the most efficient EPL key passers, but now with their pass usage rate.

[Figure: key pass efficiency with pass usage rate]

So how do we combine the two metrics (efficiency and volume)? I decided to measure each player’s total passing impact relative to an average EPL field player (non-GK).

[Figure: David Silva impact]

In David Silva’s case, his passing impact while on the field for Manchester City is equivalent to that of almost five average EPL players. If we exclude defenders and compare Silva to just midfielders and forwards, his passing impact is still equivalent to that of approximately 3.5 average midfielders or forwards. In short, Silva’s passing impact is equivalent to that of almost an entire midfield of an average EPL team. Here is the list of the top 10 players.

[Figure: top 10 players by passing impact]

There are obviously a lot of flaws in this analysis, chief among them the reliance on key passes as the primary indicator of passing impact. It is therefore no coincidence that a majority of the players on this list are creative attacking midfielders. Then again, if one were to create a “goal impact” rating, that list would be populated primarily by strikers. No matter the statistical inputs, it is clear that David Silva is having an exceptional season, and City, despite having the number two player on the list in Nasri, will no doubt miss his passing genius.
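For readers who want to reproduce the building blocks, the efficiency and volume measures discussed in this post could be computed along these lines. This is only a sketch under stated assumptions: the class name `PassingLine` and the sample numbers are hypothetical, and "KP/Incomplete" is read here as key passes divided by incomplete passes, which is one plausible interpretation. The combined passing impact rating is not reproduced, since its exact formula is not given.

```python
from dataclasses import dataclass


@dataclass
class PassingLine:
    """Season passing totals for one player (hypothetical container)."""
    passes_attempted: int
    passes_completed: int
    key_passes: int
    team_passes_attempted: int

    @property
    def incomplete_passes(self) -> int:
        return self.passes_attempted - self.passes_completed

    @property
    def key_passes_per_pass(self) -> float:
        # Efficiency: key passes per pass attempted
        return self.key_passes / self.passes_attempted

    @property
    def key_passes_per_incomplete(self) -> float:
        # Assumed reading of "KP/Incomplete": key passes per incomplete pass
        return self.key_passes / self.incomplete_passes

    @property
    def pass_usage_rate(self) -> float:
        # From the post: player passes attempted / team passes attempted
        return self.passes_attempted / self.team_passes_attempted


# Made-up totals for a hypothetical attacking midfielder
player = PassingLine(
    passes_attempted=1000,
    passes_completed=850,
    key_passes=60,
    team_passes_attempted=10000,
)
print(f"KP per pass:       {player.key_passes_per_pass:.3f}")
print(f"KP per incomplete: {player.key_passes_per_incomplete:.3f}")
print(f"Pass usage rate:   {player.pass_usage_rate:.1%}")
```

A player with no incomplete passes would cause a division by zero in `key_passes_per_incomplete`; with season-long totals that should not occur in practice, but a guard would be needed for small samples.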

Splitting Possession into Offense/Defense

Does Possession % Matter?

Any analytically inclined soccer fan (a.k.a. you) is probably well aware of the limits of possession % as a meaningful metric. In fact, its faults are so numerous and well documented that the ubiquitous ironic mentions of “but what about possession?” every time Barcelona loses have (mostly) stopped. I understand the collective derision, but if we look at the metric more deeply, can we glean some interesting information? I think so. One thing that does need to be stated is that there is a relationship between possession % and points (at least in the EPL; see graph below).

[Figure: EPL possession % vs. points]

The causes of this relationship are complex and difficult to disentangle, but probably the best way to think of possession % is as a symptom of playing winning football rather than the cause, though of course sometimes it is the cause! Confusing! A must-read on this subject is Devin Pleuler’s interesting take on possession as a defensive weapon.

How is Possession % Calculated?

Based on some good work a couple of years back by Graham Macaree, we know that the possession % most media outlets use is really just a pass ratio. The pass ratio approach is pretty simple: team possession % = team’s total passes / both teams’ total passes. This methodology was confirmed to me by an Opta employee. We can debate the merits of this approach until we are blue in the face, but for many sensible reasons I think it is probably the best proxy.

Splitting Possession % into Offense/Defense

Not all pass ratios/possession % are created equal. For example, let us assume that an average EPL match sees 900 passes between the two teams (450 for each team). On this particular match day Arsenal outpasses Swansea 600-400 (60%-40%). Across town, West Ham outpasses Crystal Palace 300-200 (60%-40%). Both Arsenal and West Ham have the same possession % (60%), but they have achieved it in vastly different ways. By comparing their passing numbers to the league average, we can essentially allocate Arsenal’s and West Ham’s 20% possession advantage (60%-40%) to an offensive and a defensive component, as demonstrated below. You start by comparing how many passes each team attempted and allowed with the league average. Arsenal, in this example, were 150 passes above an average offense (600-450). West Ham, by contrast, were 150 passes below an average offense (300-450). But West Ham make up this difference by allowing 250 fewer passes than an average defense (450-200).

[Figure: possession differential example]

That was a hypothetical, but what does this approach look like for this year’s EPL? (Stats are two weeks old.)

[Figure: EPL possession differential]

Talk about a tale of possession haves and have-nots. The gap between the #1 possession team (Swansea) and the #10 team (Chelsea) is smaller than the gap between Chelsea and the #11 team (Newcastle)! Another thing that jumps out is the comparison between Southampton and Arsenal: both have similar possession numbers, but achieve them in very different fashion, Arsenal with offense and Southampton with defense. You also might notice the larger variance in the offensive component compared to the defensive component. This makes sense, as a team might face a variety of passing styles over the course of the year, while its own offensive style is more persistent. Running some regressions (based on the past five years of EPL data, 100 teams) backs this up, as the offensive component has a much stronger correlation with total possession differential than the defensive component does. Interestingly, while you would expect a strong relationship (R² > 0.7) between the offensive and defensive components, the R² was only 0.49, which I think demonstrates that this exercise of decoupling possession into offense and defense has some merit.

[Figure: offense defense rsquared]

[Figure: offense v defense]
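The pass-ratio definition and the offense/defense split from the Arsenal and West Ham hypothetical can be worked through in a few lines. This is a sketch of the arithmetic only: the function names are made up, the 450-pass league average is the assumed figure from the example, and the components are expressed in raw passes, which may not match the units used in the charts.

```python
LEAGUE_AVG_PASSES = 450  # assumed per-team average from the hypothetical example


def possession_pct(passes_for, passes_against):
    """Pass-ratio possession: team passes / both teams' passes."""
    return passes_for / (passes_for + passes_against)


def split_possession(passes_for, passes_against, league_avg=LEAGUE_AVG_PASSES):
    """Return (offense, defense) components in passes relative to league average.

    offense > 0: the team attempted more passes than an average team.
    defense > 0: the team allowed fewer passes than an average team.
    """
    offense = passes_for - league_avg
    defense = league_avg - passes_against
    return offense, defense


# The two hypothetical matches from the text
for team, passes_for, passes_against in [("Arsenal", 600, 400), ("West Ham", 300, 200)]:
    pct = possession_pct(passes_for, passes_against)
    offense, defense = split_possession(passes_for, passes_against)
    print(f"{team}: possession {pct:.0%}, offense {offense:+d}, defense {defense:+d}")
```

Both teams come out at 60% possession, but Arsenal’s edge is mostly offensive (+150 offense, +50 defense) while West Ham’s is entirely defensive (-150 offense, +250 defense).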

Home Field Advantage in the MLS

Since 2008, 100 different MLS squads have taken the field, and 96 of them have played better at home than on the road (measured by goal differential). The average home team over the past five and a half years has a goal differential of +0.47 goals a game. Home field advantage is a big deal. But in a league as diverse in geography and supporter culture as MLS (think 5,000 people at a Chivas USA game vs. 40,000 at a Seattle Sounders game), you would expect some teams to have a greater advantage than others. In the table below you can see every team’s home field advantage for every season from 2008-2013. These years were chosen because they represent the contemporary, soccer-specific-stadium version of MLS. Home field advantage is measured by taking a team’s home goal differential and subtracting its road goal differential. This is not necessarily the most accurate possible measure (it doesn’t adjust for strength of schedule, etc.), but it is a sensible proxy.

[Table: home field advantage by team and season, 2008-2013]

Some Takeaways:

The variation from year to year can be quite staggering. The LA Galaxy, despite playing in the same stadium, went from having basically no home field advantage in 2009 and 2010 to having the largest differential so far in 2013. Because of this variability, among other reasons, only limited conclusions can be drawn. For example, although Montreal, Vancouver, and Portland all have great home field advantages, both statistically and by the eye test, the limited number of games played by all three precludes any broad declarations. Perhaps the one factor that does appear to have an effect, as Ted Knutson correctly hypothesized, is altitude. Of the 14 teams that have been in the league from 2008-2013, the two that play at altitude, RSL and Colorado, have the best and third-best home field advantages, respectively. Somewhat curiously, the New York Red Bulls have the second-best home field advantage over that same time frame, though it declined noticeably once they moved into Red Bull Arena in 2010.
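The home field advantage measure used in this post (home goal differential minus road goal differential) is simple enough to sketch directly. The function names and the sample season below are made up, and normalizing to a per-game figure is an assumption chosen to match the "+0.47 goals a game" framing; the original table may use a different scale.

```python
def goal_diff_per_game(goals_for, goals_against, games):
    """Goal differential per game over a set of matches."""
    return (goals_for - goals_against) / games


def home_field_advantage(home, road):
    """Home goal differential minus road goal differential, per game.

    home and road are (goals_for, goals_against, games) tuples.
    """
    return goal_diff_per_game(*home) - goal_diff_per_game(*road)


# Made-up season: 17 home games and 17 road games
hfa = home_field_advantage(home=(30, 15, 17), road=(18, 25, 17))
print(f"Home field advantage: {hfa:+.2f} goals per game")
```

A positive value means the team performed better at home than on the road; a team that is equally good (or bad) in both settings scores zero regardless of its overall quality.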