Packing

You’ve probably heard of Packing data by now. If you haven’t, it may be worth having a look at Raphael Honigstein’s primer. In effect, Packing assigns a value to the number of opposition players ‘taken out of the game’ by a pass or dribble.

Its popularity comes predominantly down to its intuitiveness. Where more complex metrics can alienate through their codification and statistical methods, this is a fairly comprehensible way of describing something in football terms. That has value, especially when it comes to selling to football people.

Nonetheless, there is an obvious question for anyone with experience of event data. Is Packing really an upgrade? Put more precisely: does Packing provide significant additional value over what we already have?

Using a sample of Euro 2016 data that I was kindly given, I found that around 70% of the variation in ‘average outplayed opponents’, effectively the mean number of opponents a player ‘takes out’ per 90, could be explained by a linear regression model with only forward passes and successful dribbles per 90 as inputs. These aren’t complicated or unintuitive metrics, merely rough proxies for ‘verticality’, and yet they explain a significant portion of the Packing numbers. That isn’t great.

Another version of Packing counts only the number of defenders taken out. Using a second regression model with only passes and dribbles in the box per 90 as inputs, 50% of the variation in these player values can be explained. Again, really simple metrics explain a lot of what is meant to be unique insight.
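
For readers who want to try something similar, a model of this shape takes only a few lines. The sketch below is not my exact setup, just the general idea; the file name and column names (packing_p90, fwd_passes_p90 and so on) are hypothetical stand-ins for whatever your per-90 aggregates are called.

```python
# Rough sketch of the two regressions described above (hypothetical file and column names).
import pandas as pd
import statsmodels.formula.api as smf

players = pd.read_csv("euro2016_per90.csv")  # hypothetical per-90 player aggregates

# Model 1: overall 'opponents outplayed' per 90 from two simple verticality proxies
m1 = smf.ols("packing_p90 ~ fwd_passes_p90 + succ_dribbles_p90", data=players).fit()

# Model 2: the defenders-only version, from box passes and box dribbles per 90
m2 = smf.ols("defenders_packed_p90 ~ box_passes_p90 + box_dribbles_p90", data=players).fit()

print(m1.rsquared, m2.rsquared)  # roughly 0.7 and 0.5 on the sample described above
```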

With event data, it is sometimes forgotten that just because something is not explicitly measured does not mean it is totally ignored. When players pass forwards or dribble successfully, they are implicitly taking players out of the game. Similarly, although Opta shot data doesn’t include defensive pressure, there is an implicit relationship between how close you are to the goal and the amount of pressure you are likely to face when shooting; qualifiers, like whether or not an attack is a counter attack, also carry implicit information about the pressure faced.

Any metric marketed as a one-hit solution to football analytics is going to be a disappointment, because such a thing can’t really exist in a game this complicated. Packing may be a useful part of quantifying player and team verticality, but I would be hesitant to call it much more than that, and it isn’t really anything that couldn’t be done with, say, Opta’s event data. To be fair, I know nothing about their pricing, and it may be that clubs who only want specific insights to complement coaching styles prefer to buy into something like this rather than an event data subscription.

I also think that Packing and its popularity highlight a weakness in the analytics community’s current attempts to quantify style beyond, say, goalscoring or chance creation. There is a lot of hype about positional data at the moment, but we are still nowhere near done with analysing event data alongside football theory to quantify tactical styles at the player and team level. What is the difference between how Sergio Busquets and Luka Modric play, and how are those differences useful in their respective team styles? These are questions that are perhaps more about the efficient use of data to answer football questions empirically than about statistical methods, but this is all part of what I believe to be the future of actionable analytics in the game.

Using the first regression model, I predicted Packing values per 90 for the 2015/16 Premier League season for players who played more than ten 90s. To be clear, this is players ranked by the predicted number of opponents they ‘take out of the game’:

  1. Santi Cazorla
  2. Gianelli Imbula
  3. Yaya Toure
  4. Mousa Dembele
  5. Cesc Fabregas
  6. Ross Barkley
  7. David Silva
  8. Eden Hazard
  9. Bastian Schweinsteiger
  10. Alexis Sanchez
  11. Jordon Ibe
  12. Aaron Ramsey
  13. Mamadou Sakho
  14. Fernandinho
  15. Alex Oxlade-Chamberlain
  16. Michael Carrick
  17. Wilfried Zaha
  18. Manuel Lanzini
  19. Lucas Leiva
  20. Francis Coquelin

It’s notable that Mamadou Sakho is the only centre-back in the top 20, which is down to his unusually high rate of successful forward passes per 90. In the Euros list, Mats Hummels and Jerome Boateng ranked highly for the same reason.

Some interesting results there, and a decent first attempt at quantifying player ‘verticality’.
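
For completeness, producing a list like the one above is just a matter of applying the fitted model to another season’s per-90 numbers. Continuing the hypothetical sketch above (reusing the fitted m1), rather than my actual pipeline:

```python
# Apply the Euro 2016 model (m1 from the sketch above) to PL 15/16 per-90 data.
import pandas as pd

pl = pd.read_csv("pl_2015_16_per90.csv")  # hypothetical file and columns
pl = pl[pl["minutes"] > 10 * 90]          # players with more than ten 90s

pl["pred_packing_p90"] = m1.predict(pl[["fwd_passes_p90", "succ_dribbles_p90"]])
top20 = pl.sort_values("pred_packing_p90", ascending=False).head(20)
print(top20[["player", "pred_packing_p90"]])
```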

Is Packing the future of football analytics? Generously speaking, maybe in spirit, but probably not in practice.

  • Barry

    I’ll caveat this by saying I’m just an interested outsider that doesn’t purchase stat packages, so I guess this article isn’t targeted at me, but I don’t really get what the specific problem is here.

    Your analysis suggests that the event data doesn’t wholly explain the Packing data, so it presumably contains more information (the number of players the ball beats has been counted?). If Packing is specifically quantifying this, it’s adding more context to a move than we have already. Surely this is an upgrade on an assumption based on the position of an event. Whether or not that information is useful is a different matter altogether that isn’t really addressed here.

    Is the point to just think twice before paying for Packing Stats?

    • Ron IsNotMyRealName

      Paying for, using, placing value on…I think all of the above.

  • Chris

    I think the most interesting aspect of Honigstein’s article was not how many players a pass “takes out” when executed by the passer, but the reverse direction: how many players are “taken out” by the player receiving the pass. This could be a good metric or base stat for “player movement”, which I think can be hard to quantify unless you watch videos of specific players.

    I know passes received and other Opta stats give information about attacking events, but as said previously, this “packing” stat can fill in the gaps or be used in conjunction with those events to paint a clearer picture of what is happening or how well a player moves off the ball.

    • Paul Tiensuu

      I too found this a particularly interesting way to (finally) value off-the-ball movement, but notice the list there: it consists mostly of passers (with some dribblers like Hazard, Sanchez and Imbula, though even most of the dribblers on the list are players who produce forward passes rather than runs), which suggests that passing the ball is of more consistent value than runs without it.

  • NinJa

    How does Leicester as a team do by your metric? How about Mahrez and Drinkwater?

    • Ron IsNotMyRealName

      They probably don’t as often have defenders to take out of the play because of the way they played. If there are only 3 opponents between you and the goalkeeper, you don’t have to take many out of the play to be in a scoring position.

      • Paul Tiensuu

        Even if it is so, there’s still somebody playing the ball past the defenders, unless possession is won very high up the pitch.

        • Ron IsNotMyRealName

          Or unless the opponent has piled people into the attacking third trying to move the parked bus.

  • NinJa

    You ask: “Does Packing provide significant additional value over what we already have?” But the real question you should ask is: “Does Packing do a better job of predicting (retroactively) what happened during the season?” For example, could Packing have predicted Chelsea’s poor season? Or that Newcastle would get relegated? Or the surprising performances of Leicester, Southampton, or West Ham?

  • NinJa

    The value of a model lies in its predictions. Most people in the soccer analytics community seem to ignore this point.

    For example, if you are using the “Expected Goals” model and a team significantly outperforms their ExpG, then you boil it down to “luck”. The implication is that the team will come back down to earth. However, there is not enough self-reflection as to whether the model itself is (partly) wrong, e.g., in some small systematic way.

  • NinJa

    Here is a relatively simple challenge. Consider the halfway table of last year’s PL season (after 19 games). Crystal Palace had 31 points (6th) and Swansea City had 19 points (17th).

    Could any analytic model take the data from these 19 games and make a reasonable prediction of the teams’ fortunes for the remainder of the season?

    In reality, Crystal Palace secured just 11 points over their next 19 games and finished 16th, while Swansea City secured 28 points over their next 19 games and finished 12th. So, could any model have predicted the reversal of fortunes for these two teams? If so, that model deserves more credence.

    • El Loco Crab

      Looking at the xG through 12/21 games, Crystal Palace had 16.2 for and 23.3 allowed. That is a swing of 14 goals to the positive compared to their 19 non-penalty goals scored and 13 non-penalty goals allowed, which suggested that they were considerably overperforming the underlying numbers.

      Swansea City’s numbers were 15.2 xG for and 17.9 xG allowed, compared to 11 non-penalty goals scored and 20 non-penalty goals allowed. This also suggests that they had a bit of bad luck and weren’t as bad as their place in the table said.

      So yes, if presented with the above data, you would guess that Swansea would be the slightly better team going forward; maybe not to the extent that it turned out, but over half a season variability can be quite large.

      Data can be found here: https://web.archive.org/web/20151229114409/http://cartilagefreecaptain.sbnation.com/2014/2/12/5404348/english-premier-league-shot-statistics
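
      As a rough sketch of that comparison, the over/underperformance is just the actual non-penalty goal difference minus the xG difference, using only the figures quoted above:

      ```python
      # Mid-season over/underperformance = actual npGD minus xGD (figures quoted above)
      teams = {
          "Crystal Palace": dict(xg_for=16.2, xg_against=23.3, goals_for=19, goals_against=13),
          "Swansea City":   dict(xg_for=15.2, xg_against=17.9, goals_for=11, goals_against=20),
      }
      for name, t in teams.items():
          swing = (t["goals_for"] - t["goals_against"]) - (t["xg_for"] - t["xg_against"])
          print(f"{name}: {swing:+.1f} goals vs expectation")
      # Palace come out around +13 (running hot), Swansea around -6 (running cold).
      ```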

      • James Yorke

        yeah, predicting a demise for Palace and improvement for Swansea was a pretty simple conclusion, and these are the type of things that we regularly highlight on this site.

        As for packing, could it predict the weird aspects of 2015-16?

        No chance. As Bobby has suggested it’s merely a slightly nuanced version of relatively straightforward event data.

        Nobody picked Chelsea, nobody picked Leicester. There’s a ton of info in all current metrics, and those two fell so far outside any rational prediction that any prediction which did forecast their seasons has to be considered suspect.

        • NinJa

          I don’t have a dog in the “Packing” vs. xG models. But it is also too facile to simply attribute a team’s performance relative to its xG to luck. So why was Palace overperforming its xG and why was Swansea underperforming?

          Did any of the “analytics” gurus predict that Swansea would comfortably avoid relegation based on their performance at the halfway mark? I am afraid I did not see a single such prediction.

          • James Yorke

            read this then, from November on Swansea, by me:
            http://statsbomb.com/2015/11/should-swansea-sack-garry-monk/

            and you’ll see that armed with little more than shot data and a simple points expectation, it was easy to see that Swansea were fine.

            As for Leicester, they ran up a sequence against weak opposition and ran particularly hot on a few metrics in 14/15, and were utterly abject beforehand. Even an optimistic projection would have been top half, let alone beyond. That’s why nobody “picked Leicester”: there was no good reason to, and it was a once in a generation (maybe more) outlier.

          • NinJa

            In that article on Swansea, I will give you credit for saying that Swansea “…have been a 45-52 point team since arriving in the league and it is probable that they will again land in a zone close to that projection - anywhere from 8th to 14th”. But your article was written after Matchday 14, when Swansea were 15th in the league (with 14 points). Predicting that they would finish at about the position they were would not exactly have qualified as a “bold” prediction.

            By the halfway point (Matchday 19), things had got worse for Swansea. Even after Matchday 22, they were on 22 points and in 17th place. If at that point you had written an article saying “Relax, Swansea fans, you have nothing to worry about and your team will comfortably avoid relegation”, then I would tip my hat to you.

          • James Yorke

            oh give over. I’m sorry I wrote a Swansea article at not quite the precise juncture when they were at their lowest point. Monk got sacked soon after my article IIRC, so they weren’t exactly confident of their future, and I think saying a team will end up 8th-14th qualifies as “comfortably safe from relegation”.

          • NinJa

            I know you guys are the best in the business of soccer analytics. And again, you deserve credit for the article you wrote on Swansea, which in hindsight was right on the money!

            But I would like to see you all go out on a limb and make more “bold” predictions. Something that would get your site (and by extension, the soccer analytics community) on the map.

            And also, I would like to see more introspection when your predictions don’t work out or when there are unexpected outcomes (over the course of a season).

          • Ron IsNotMyRealName

            I remember when Ted actually did this with players, but IMO didn’t explain well enough what his predictions were actually based on, though in some cases the radars told the story well enough. I can understand why he didn’t, though; it’s the same reason I’m careful about what I say here.

            I didn’t agree with a lot of his picks, and in particular as mentioned before he has way more faith in the Eredivisie than I do.

            I’ve toyed with doing some kind of “overhyped/underrated” bit based on data, but then you’re like the weatherman: if you’re wrong 20% of the time, that’s what everyone remembers.

            And I used to be a weatherman, so I know of what I speak. 🙂

          • NinJa

            So let me see if I understand this:

            Leicester in the second half of 14-15 = fluke, unsustainable.

            Then, they go on to sustain their performance for the first half of the 15-16 season: still a fluke, unsustainable.

            Then, they go on to win the league in the second half of the 15-16 season: that was a “once in a generation outlier”.

            Really?

          • James Yorke

            Go and read the many many articles that people have written in the aftermath of Leicester’s triumph, then come back and tell me you still think it was “predictable”. It wasn’t.

          • NinJa

            My point is that it was NOT predictable based on your current model! But perhaps that is a reason to revisit the model. Better than chalking it up to being an “outlier”.

          • Ron IsNotMyRealName

            I think there’s credence in this statement. Reverse engineering models based on real-world events the model utterly failed to capture is a reasonable action, or at least attempting to do so to see if it’s possible without breaking it (i.e. becoming wrong about the other 19 teams to be right about 1; that would suggest that it was truly unpredictable).

            I think at least 2 key aspects of Leicester’s performance WERE predictable based on 14-15 data. Kante’s ball hawking and Mahrez’s attacking effectiveness. Maybe Vardy’s effectiveness was less predictable. I’ve also thought for some time that Albrighton was underrated and that showed up importantly as well. Drinkwater not so much.

            Gueye signing for Everton suggests that Steve Walsh doesn’t think that Kante’s impact was unpredictable, as he’s gone and gotten the next best thing from a statistical perspective.

          • NinJa

            Exactly correct that Mahrez’s attacking effectiveness was there for all to see back in 2014-15! None of the analytics models picked that up, however, when it comes to predicting Leicester’s chances in 2015-16.

            My hypothesis is that there is underlying value in players who are particularly effective at DRIBBLING and beating their man, especially in the final third. Mahrez is the poster child for this. He does it time after time, even though he is not particularly fast.

          • Ron IsNotMyRealName

            Actually he was on my radar for a best fit model based on certain players. I wouldn’t have expected him to have that kind of return, and I suspect he won’t do quite as well next year no matter where he is (his on target rate relative to the number of shots he took outside the box suggests he overachieved a bit), but he is good, was good and will likely continue to be good.

            We’ll see if that’s in Leicester or not.

          • James Yorke

            And my point is nobody in the world had a model that predicted Leicester. Honestly, go find me one and you can carry the day; until then, your point is moot.

          • NinJa

            I agree that there is no model that predicted Leicester.

            The point this implies is that the existing models in soccer analytics (including xG) are ALL deeply flawed.

          • Ron IsNotMyRealName

            I go back and forth on whether this is true or not. Leicester wasn’t bad in the 2nd half last year. Picking 1st, maybe not, but perhaps we should have expected more based on how they finished? And if anyone should have been able to predict the impact Kante would have, it should have been the analytics community. He was a stats monster, plain and simple.

          • James Yorke

            you would have to weight a series of games where they ran hot (backend of 14/15) so highly and ignore all that went before.
            Kante had very similar stats to Idrissa Gueye, so should we have predicted Villa to go well based on that?

            Plus the stats he monstered aren’t tangibly linked to success, so predicting a tackling guy to land in a small but hot team and launch them to the title, or even top 4 is far fetched.

            Honestly Leicester are without precedent, trying to backfit reasons because it happened doesn’t necessarily become an informative practice.

          • Ron IsNotMyRealName

            Well we’ll see about Gueye. But Gueye didn’t play with a Vardy, a Mahrez, Albrighton, etc. The only other player on that team worth a hoot got injured early (Amavi).

            I’d say Steve Walsh thinks those stats were tangibly linked to success. Just because it hasn’t been modeled effectively doesn’t mean it doesn’t exist.

            I think it’s probably of more value right now to look at what happens when players actually play than to put much faith in a regression model.

            I’ve taken a finance course where one of the things we did was look at the market price and retrofit our model to it, to ask “does this make sense?”, as opposed to purely using the model to say “this is what this should cost”, which assumes the model is right. It isn’t: all models are wrong. But some are useful. And, importantly, some are not.

          • Ron IsNotMyRealName

            Btw, there’s precedent for the “what happens when teams/players actually do it” in other sports, even in baseball where so much more has been figured out.

            The original analytics CW was that bunting is a terrible idea in almost every instance because of expected value of a base vs. giving up an out…etc.

            But then someone else looked at “well what happens when teams actually bunt?” and found out teams actually do a pretty decent job already of deciding when they can and can’t add value with that.

        • NinJa

          As I recall, Leicester’s performance in the last half (or was it last 10-15 games) of the 2014-2015 season was completely in line with their performance over the entire 2015-2016 season. So why was it that “nobody picked Leicester”?

          • Ron IsNotMyRealName

            Another really good point, and one I’m interested in seeing whether it holds going forward for other teams, like Southampton and Liverpool (though the latter taking one of the former’s best players might cause some issues).

          • Paul Tiensuu

            NinJa, what you’re missing here is that no model will ever “pick” every winner, because there will always be outliers. That’s the truth about probability: improbable outcomes have a probability, and they will occur in proportion to it. If you refuse ever to accept that an event which occurred was an outlier, you are simply refusing to accept reality. Many of us were able to predict that Leicester would finish comfortably safe from relegation, and perhaps towards the top of the table if they did well, not because they were good at the end of 2014/15 but because they were good throughout the 2014/15 season. But nothing suggested that they were title-contender good, and that’s why nobody would have concluded that they were the PROBABLE winners. That’s why nobody “picked” them.

            And you can go on and demonstrate the effect of efficient dribbles in the final third, but for several reasons I’m going to bet my ass that no sustainably predictive model taking dribbling into account will turn the predictions around to support Leicester’s title bid. One reason is that Leicester’s squad was very thin and they were extremely lucky to avoid injuries to all of their key players for the whole season. You can’t simply assume that will happen.

            When you ask the stats people to make “bold predictions”, it seems to me that you want them to pick improbable winners. Certainly a statistician may find improbable bets that are nonetheless less improbable than the bookies think, and pick such a surprise, because surprises happen. But when making predictions of what will probably happen, one looks for the most PROBABLE outcomes. That will not give you your “bold predictions”, if that means predicting Hull to finish in the top 10 or Everton to win the league, because those outcomes are improbable, and it is literally impossible to say which team will this time have the necessary luck to loudly overachieve its potential. Models are based on probability, and probability cannot determine exactly what will happen; it can only approximate what is most likely.

          • NinJa

            Paul, thanks for a well-considered response. I do not disagree with most of what you say.

            However, the term “outlier” has a different interpretation to me (and I know my statistics). An outlier would win the PL by eking out 20+ wins by a 1-goal margin, suffering several bad losses by 4 or 5 goal margins and overall having a net goal differential of, say, +10.

            For comparison, Leicester scored 68 goals (only 2 teams scored more), conceded only 36 (only 2 teams conceded less), and their overall goal difference of +32 was bettered only by Spurs. Put simply, Leicester were the best team in the PL last year, not an outlier in terms of a vast gap between their fundamental performance and results.

          • Abel

            I don’t see how they can’t be both.

          • Paul Tiensuu

            Hello again NinJa, sorry for a way delayed reply, I have been busy. But let me now on my free day give you what should be a comprehensive answer.

            I believe you know your stats. 🙂 But with statistics the trick is seldom in the technicalities; it is in looking at the right things, and that is also where the errors tend to come from. I think that you are looking at the wrong factors in your search for an explanation. Goals scored and conceded are the results, not the causes. The fundamental performance is what leads to a likelihood of scoring and conceding goals.

            Of course, goal difference is a decent signal of continuing success: in general, a team with a good goal difference is more often able to keep on winning than a team with a bad one. But that is not because it has scored more and conceded fewer goals in the past; it is because it usually plays in a way that tends to produce significantly more goals for it than for its opponent.

            It is in this regard that Leicester are an outlier: they got that great goal difference through an extremely unlikely unfolding of events. Let me take just a few stats: over their last 24 matches, their opponents’ conversion rate was only 4.3%. Schmeichel? Quite a normal save percentage. This was a case of opponents missing the net very often. There is no way there is not a significant element of luck in that; we know that although there is some continuity to be found here, in general this is a volatile stat that tends to regress towards the mean.

            Their own conversion rate regressed towards the mean slowly during the season, but stayed unsustainably high. It’s partly an effect of shooting from good locations, but not exclusively: think of the Vardy strike that he will convert once in his lifetime, which won them three points at a crucial stage of the season.

            Leicester won 13 penalties in the PL. The median team wins 4 or 5 penalties in a season, conceding as many. So they doubled or tripled the usual number of penalties won while conceding the normal 4 against, a difference of 9 penalty kicks. Second-placed Arsenal, who spent a lot more time in the opposition box, won 2. Leicester more than sextupled that, getting a potential 11-goal advantage from penalties alone (and in practice a 9-goal advantage). Arsenal had 1 penalty called against them, so the overall difference was 9-1 in Leicester’s favour. In any case these are very volatile stats; I urge you to go and look at the penalty numbers, and you will find almost no continuity from one season to another. It is very unlikely they can have such a great penalty advantage over a season again. The second biggest penalty call difference was City’s, at 7 (8 for, 1 against). Leicester have already won 2 more penalties this season, at least one of which was a wrong call, so maybe they actually are getting high penalty numbers again somehow. But I would still take a bet that they won’t get more than 8 penalties in the PL this season.

            Finally, they had absolutely no injuries to their essential first-team players. Schmeichel, Albrighton and Morgan played the full 38 matches, Mahrez and Kanté 37, Vardy and Okazaki 36, Huth and Drinkwater 35. The only ones who didn’t feature essentially all season were Simpson and Fuchs, because they were not favoured from the start, and even they played 30 and 32 matches respectively. It is incredibly lucky that Ranieri was able to play his favoured eleven week after week without injuries. Now he has already suffered an injury to Mendy. What if Mahrez or Kanté had been sidelined for months from the early rounds? They weren’t, but you cannot assume before the season that it would not happen. Usually, all teams lose some of their most important players for significant periods, and very rarely can they avoid losing several of them at once. Particularly at the top, almost all teams have to rely on their 3rd or 4th choices in some positions at some point in the season. But Leicester played with their first choices virtually all season; at worst they used a 2nd choice in a few positions.

            These are factors that partially explain their strong goal difference, but they are not fully repeatable as such, because they depend significantly on things that are not in the team’s own hands. Nobody is saying that Leicester’s success was completely down to luck. Ranieri is a skilled and underrated manager, and while Vardy and Mahrez maybe overachieved, Schmeichel and Kanté were predictably going to do exactly what they did, given that they stayed fit. But the luck lies in the extremely small likelihood of all the things Ranieri cannot really control (conversion rates, injuries, penalty calls, and such) turning out in their favour at once.

            That is the massive luck factor that the StatsBomb people have been trying to explain: this was the absolute and unpredictable best-case scenario in which everything pans out perfectly for them, by which they get a great goal difference, leading to a lot of points, while their opponents all have a bad season in terms of conversion, injuries or penalty calls, or all of the above. Some team always benefits from one such factor or another, but you cannot really predict which one, let alone that a given team would benefit from them all at once. That’s why, while it is entirely possible that Everton or Southampton, in an absolute best-case scenario, would win the title, that kind of “bold prediction” would not show great foresight but rather a willingness to make predictions against the likelihoods.

            I know this was already a long reply, but I would add yet two points.

            Firstly, your proposed factor of “effective dribbling” would definitely not explain Leicester’s win. Leicester attempted dribbles at a rate among the top clubs, but didn’t complete them particularly successfully. Which team had the most successful dribbles? Arsenal, with 13 per match (Leicester had 11.3). And the most efficient, at the best rate? West Bromwich. But put that anomaly aside and the second was, again, Arsenal, with a 64% success rate. Leicester’s 54% was the league’s 6th worst. Maybe more crucially, dribbling stats have not generally been found to correlate well with success.

            Secondly, you say that “the predictions you will see at sites like the Guardian or ESPN…all of them will have the Big 6 (Man U, Man City, Arsenal, Chelsea, Tottenham, Liverpool) in some order as their predictions for the Top 6 of the PL this year. This despite the fact that last year two clubs outside this elite coterie cracked the Top 6 (Leicester and Southampton).” But when you look at the history of the PL, Leicester were the first team outside that big 6 to finish in the top 4 since 2005. That’s 11 years. Is it really unfounded for the pundits to predict these big-money clubs to finish at the top? That prediction has gone right every single season apart from the last one. The general prediction is not that these 6 will certainly finish in the top 6. Everybody knows it is likely that from time to time one or another of them will fail, even that in most seasons one will, and that some team close to the top will overachieve and take advantage of that failure. But while you can predict which teams could possibly leap into the top 6 in a best-case scenario, or drop out of it in the worst case, and you can have clues about weak points that could make a bad scenario likely for a given team (such as Arsenal being thin at defensive midfield last season), it is impossible before the season to predict which team will get which scenario.

            This is going through a slight and possibly long-lasting change, because the Premier League now has so much money in it that even its smaller clubs are significantly richer than almost all non-PL clubs, which means there is an overabundance of quality players to choose from, so even the smaller clubs can build excellent squads. This is particularly the case because French Ligue 1, on the other side of the Channel, has simultaneously become financially very poor, which makes the excellent player base produced there easily available to even the poorest Premier League clubs. The differences in wealth are unaltered: ManU, Arsenal, Chelsea and City still have riches beyond Everton’s, Leicester’s and Watford’s dreams, but those differences matter less. The smaller clubs are still not going to buy Pogba, Zlatan or Özil, but they can have squads strong enough to compete against the bigger clubs. This only makes it more possible for them to sometimes finish in the top 4 or top 6; it doesn’t mean that they are likelier to finish in the top 6 than the “big 6”. In particular, while it makes it possible that some team outside the big 6 will finish in the top 4, it does not make it likely for any particular team outside the big 6 to do so.

          • Paul Tiensuu

            Dear StatsBombers, I sincerely apologize for the length of my post. I didn’t realize it would be that long. I should create a blog of my own instead of spamming you like this, and worse yet with things that I have mostly learned from your earlier articles anyway. Sorry.

          • NinJa

            Thanks for the nice long post! You SHOULD have your own blog!

          • NinJa

            To avoid confirmation bias, a good stats-based analysis of the PL should basically THROW OUT recent history as well as the financial status of the clubs. The focus should mainly be on the performance stats of the first-team players, the style/tendencies of the coaches, and perhaps the overall strength of the squad.

            Then a prediction should be formulated in the following vein (all numbers for illustration purposes only).

            E.g. 1: Hull City: Probability of finishing in:
            Relegation places (18-20): 45%
            Places 15-17: 40%
            Places 10-14: 12%
            Places 7-9: 2.9%
            Places 1-6: 0.1% (1 in 1000)

            E.g. 2: Leicester City: Probability of finishing in:
            Relegation places (18-20): 5%
            Places 11-17: 60%
            Places 7-10: 20%
            Places 5-6: 10%
            Places 1-4: 5% (1 in 20)
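
            A forecast expressed this way is also easy to work with; here is a minimal sketch using the illustrative Leicester numbers above, so the output is equally illustrative:

            ```python
            # Illustrative finishing-place distribution: (low place, high place) -> probability
            leicester = {(18, 20): 0.05, (11, 17): 0.60, (7, 10): 0.20, (5, 6): 0.10, (1, 4): 0.05}

            assert abs(sum(leicester.values()) - 1.0) < 1e-9  # bins should cover all 20 places once

            p_top6 = sum(p for (lo, hi), p in leicester.items() if hi <= 6)
            print(f"P(top-6 finish) = {p_top6:.0%}")  # 15% with these illustrative numbers
            ```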

            What irks me is when the analytics folks act as if there are only 6 teams worth considering in the PL every year and discount everyone else. Even when Leicester were doing so well last year, you continued to see articles stating that they couldn’t possibly keep it up. While there were good reasons to expect a drop-off, all of which are mentioned above (such as low number of injuries, high number of penalties, etc.), one suspects that the real reason for discounting Leicester was the recent history of the PL (no teams in top 4 except for the elite 6) and therefore even many of the so-called “objective stats-based analyses” were simply engaging in confirmation bias.

          • NinJa

            The problem, as I see it, is that predictions about the PL are ALWAYS clouded by the money that lies at the top of the table. For example, take the predictions you will see at sites like the Guardian or ESPN…all of them will have the Big 6 (Man U, Man City, Arsenal, Chelsea, Tottenham, Liverpool) in some order as their predictions for the Top 6 of the PL this year. This despite the fact that last year two clubs outside this elite coterie cracked the Top 6 (Leicester and Southampton).

            In my opinion, a site focused on soccer stats should ignore the names of the clubs to make their predictions. Perhaps the glass ceiling has been cracked and we will continue to see the “small” clubs making inroads into the Top 6.

            In fact, this is exactly what Slaven Bilic predicted last year. He suggested that the gap in quality of players between the big and small clubs was far narrower than before. For example, if Everton were to hold on to Lukaku and Stones, why would they not be a candidate to win the league this year (or at least break into the top 4)?

  • Abel

    fwiw we at Bundesligafanatic broke this story nearly 2 months ago:
    http://bundesligafanatic.com/impect-packing-the-future-of-football-analytics-is-here/

  • Vishruth Srinath

    I am someone who is trying to understand football analytics a bit more. The article says that 70% and 50% of the variation in players taken out/defenders taken out is explained by existing metrics. Based on this, the writer then concludes that these existing metrics do a good job of predicting “packing”. However, to the untrained eye, the 30% and 50% not explained by existing metrics seem like large enough proportions not to draw the conclusion that the author draws. Am I missing something?

  • Sheikh Barabas

    Does this measure take into account failed attempts? Would it only value the Stevie G “Hollywood” pass that works once in 8 attempts and disregard the wasted low margin attempts?