This weekend, @7AMKickoff published a piece attacking the concept of adjusting defensive stats for possession. The piece was a bit dickish, but there were elements of it that deserve a reply, so I’ll do that today. For reference, here is my original look at possession adjusted defensive stats.

“I’ve done a fair number of regression analyses and I would probably never publish a .40 much less make some of the sweeping statements that Ted makes.”

For starters, I think the piece reads as a fairly cautious look at new research, not something that makes sweeping statements, but I guess mileage may vary. To address the regression bit, obviously I’m fairly well versed in statistics myself, so why would I publish something with just a .4?

There are two primary reasons. The first deals with statistical relevance in complex systems and the second has to do with the relevance of base defensive stats themselves.

Let’s quickly deal with statistical relevance. While it’s generally true that you would prefer to explain everything with just one figure, in most real world examples that’s impossible. After I linked to the 7AM piece this weekend, a number of social scientists spoke up saying they would often be happy to explain .10 to .20 of variation, while .40 is actually fairly useful. There are certain metrics that can have up to .80 r-squared in explaining total goal difference (depending on what adjusted shots model you use), but explaining smaller pieces of the puzzle often gets hard, fast.

Football is an extremely complex game, and defense in particular is a very complex system with multiple potential fail points, covering defenders, presses, low blocks, etc. (Unless of course, your manager is ‘Arry Redknapp, where your players just go fackin’ run about a bit.) When faced with complex systems, especially when just getting started, any additional relevant explanation is useful.

To put it another way, we know shots matter. How does one prevent shots? That’s a surprisingly tricky question to answer. Or at least it has been for me, and I’ve been frustrated about this for a while.

Now for the second point…

Did you know that by themselves, defensive stats like tackles and interceptions show zero correlation to anything useful? It’s true. They are just numbers on a page. They don’t correlate to shots against, goals against, goal difference, points, nada. This is despite them being an intrinsic part of the game, and the method by which most teams get the ball back.

But if you adjust for possession? Now you get the 40% explanation of variation. Going from absolutely no relevance to explaining 40% of important things seemed useful enough to write about. Keeping track of adjustments that have SOME explanatory power while continuing to search for better ones seems worthwhile.
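To make the comparison concrete, here is a minimal sketch of what "adjust, then check the r-squared" can look like. It assumes the simplest possible adjustment, a linear rescaling to a 50% opposition possession share, which is an illustrative stand-in and not necessarily the formula from the original piece. The function names and the team rows are made up for the example, so the printed values say nothing about real football.

```python
def possession_adjust(raw_per90, opp_possession_share):
    """Rescale a raw defensive count to a 50% opposition possession share.

    Assumption: plain linear scaling, chosen only for illustration.
    """
    if not 0 < opp_possession_share < 1:
        raise ValueError("opp_possession_share must be between 0 and 1")
    return raw_per90 * 0.5 / opp_possession_share


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r-squared is undefined when a variable is constant")
    return cov * cov / (var_x * var_y)


# Invented rows: (tackles + interceptions per 90,
#                 opposition possession share,
#                 shots conceded per 90)
teams = [
    (38.0, 0.62, 16.1),
    (35.5, 0.55, 13.4),
    (33.0, 0.48, 12.0),
    (30.5, 0.41, 10.2),
    (29.0, 0.36, 9.5),
]

raw = [t[0] for t in teams]
adjusted = [possession_adjust(t[0], t[1]) for t in teams]
shots_against = [t[2] for t in teams]

print("raw r^2:     ", round(r_squared(raw, shots_against), 3))
print("adjusted r^2:", round(r_squared(adjusted, shots_against), 3))
```

With real data, the two printed figures are the numbers being argued about: how much of the variation in shots conceded each version of the stat explains.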

That’s the phase that football analytics is in right now. It’s annoying to know that a lot of things you write about right now will be obsolete in a week, or a month, or in a year, but that’s part of the progression. If I talk about p-adjusted stuff now, maybe someone else will go down that line of research and create adjustments that account for 60 or 80% of the variation in shots conceded.

“Possession is a measure of offensive dominance.”

It’s really not, and I never suggested it was. For the most part, I ignore possession stats completely, as it seems like a relic of the tiki-taka Barcelona era and little more. It is, however, a useful measure for evaluating who had the ball more and made more passes, which in turn is tied to the opportunity to make defensive actions.

Break it out: The opportunity to make interceptions is tied to your opponent making passes. The opportunity to make tackles is directly tied to your opponent having possession of the ball.

This seems to be what 7AM disagrees with most, but to me it’s fairly clear. Just because every team doesn’t actually try to tackle or intercept the ball in all areas of the pitch doesn’t change the fact that these actions are tied together.

Is possession adjustment imperfect? Absolutely. I never suggested it to be otherwise. However, it lends statistical meaning to stats where none existed before. For me, that was enough to force me to change fairly significant amounts of existing code to include them when looking at player stats on the defensive side of the ball.

Is possession adjustment wrong (and um wrong)? That seems like a value judgment, so I guess it’s for you to decide.

I can tell you that p-adj stats are already on the fullback radars and will be used in the other templates soon, so they will be heavily incorporated in my work and the radar player charts I produce. What other people choose to do with them is out of my hands.

At the end of the day, I’m all for other people trying to adjust defensive stats to provide better explanations of metrics we care about. The only issue here is that they a) have to pass the theory barrier, and b) have to add statistical relevance. Do both and surpass what my initial attempt has done, and I’m sure the world will quickly adopt the new approach as more correct.

In the meantime, I’ll keep using imperfect stats as opposed to irrelevant ones.

• kslotay

While I agree that it is better to have 40% explanation of variation rather than none, I think the fundamental issue here is that for the other 60% of scenarios, the model ends up doing the opposite of what it was intended for, i.e. it results in the defensive metrics being even more “wrong”, for example Schneiderlin’s stats. This would imply that in 60% of variation, the raw defensive stats are better than the p-adj ones. 7am’s piece may not have been the most eloquent way to discuss the subject, but it did have a number of compelling arguments against the general use of p-adj stats. The concept is a step in the right direction, and kudos to you for taking the time and effort to come up with it, but until a better model is available I prefer to have the raw data. Of course, that is just my opinion.

• kslotay

Used a little bit too many commas there (Blame my phone). Also I’m not a stats expert so excuse me if I have the wrong idea about R^2 values (reading up on it now).

• Beige

“I’ve done a fair number of regression analyses and I would probably never publish a .40”
This is short sighted – I’ve seen great pieces of work, the results of which have been used to good effect, with r2 results of 0.01 or lower. A low r2 simply means we aren’t capturing a lot of what is going on, not that what we are capturing isn’t meaningful.

In this case moving from ~0 to 0.4 is very persuasive.

Do you have anything else which might capture the ‘opportunity to defend’? Perhaps something like passes / dribbles into or in the final third?

• Derek

What is the point of the player radars? Have you calibrated them where a certain amount of tackles per 90 is supposed to be equal to a certain passing %? Because that is what most people take away from the graphs and what they appear to imply but I doubt you have been able to prove anything like that.

• Errorr

The radars are calibrated to 2 std dev from mean of player seasons in the database. The data I believe covers 5 seasons in several top leagues.
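As a rough illustration of that calibration, the sketch below maps a stat onto a radar axis whose ends sit two standard deviations either side of the mean of all player seasons. How the real radars treat values beyond those bounds is not stated here, so the clipping to the edge of the axis is an assumption, as are the function name and the sample numbers.

```python
from statistics import mean, pstdev


def radar_scale(value, player_seasons):
    """Place a stat on a 0..1 radar axis bounded at mean +/- 2 std dev.

    Assumption: values outside the bounds are clipped to the axis edge.
    """
    mu = mean(player_seasons)
    sd = pstdev(player_seasons)
    if sd == 0:
        return 0.5  # every player season is identical, so sit mid-axis
    low, high = mu - 2 * sd, mu + 2 * sd
    fraction = (value - low) / (high - low)
    return min(1.0, max(0.0, fraction))


# Invented tackles-per-90 figures standing in for a database of player seasons.
tackles_per90 = [1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 3.6, 4.0]
print(radar_scale(3.3, tackles_per90))
```

Each axis is scaled on its own distribution, so a given distance from the centre on one axis is not calibrated to be "worth" the same as that distance on another.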

• chuksi

Does the database you’re using have positional data (not sure if that’s the right term)? One way to move to another level would be to include that. For example, it’s impossible to compare the defensive stats of Southampton and Chelsea. For Southampton to have a chance of getting the ball back, the ball can be anywhere on the pitch. For Chelsea to have a chance of getting the ball back, it has to be in their half (well, that’s probably not a perfect definition, but still better – although I guess that might change from game to game as well). If we could filter the time when the ball is in an area where the teams actually work to win the ball back then it might help to create a more meaningful thing than the naked possession based stats. I’m not sure how to put that in numbers, but thought I’d share this idea. Maybe you guys haven’t thought about it and it could be an area to explore.

• http://engageinganalysis.wordpress.com/ Oliver Gage (@G4gey)

Before I start, I would like to point out that I have an extremely limited statistical background and a year ago if someone mentioned a regression analysis or r2 to me I would have looked at them like they were an alien. I have an idea and it could very well be blown completely out of the water instantly….

Should defensive stats be adjusted based on more than possession? For example on Ted’s radars (which are fantastic) could we adjust interceptions based on passes against, tackles based on opposition dribble attempts and blocks based on number of shots attempted by the opposition?

You can’t intercept a dribble, you can’t block a pass and you can’t tackle a shot. So if the 3 main defensive metrics were measured against their individual ‘sources’ would that help? For example if a team makes 300 final 1/3 passes, it’s likely that interceptions will be higher than against a team that makes 150, but if the team that passes less is more direct and dribbles far more often, wouldn’t tackles be higher? Not to mention anybody playing against Andros Townsend has the opportunity to block 20+ shots from 35 yards per game.
Based on this I think we could see a more genuine look at frequency of defensive actions based on the opportunity to make them.
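A minimal sketch of that idea follows, dividing each defensive action by the opposition event that creates the chance to make it. The pairing of actions to sources follows the comment above; the function name, dictionary keys and match totals are made up for illustration.

```python
# Which opposition event creates the opportunity for each defensive action.
ACTION_SOURCES = {
    "interceptions": "passes",
    "tackles": "dribbles",
    "blocks": "shots",
}


def opportunity_rates(defence, opposition):
    """Return each defensive action as a rate per opposition opportunity.

    A rate is None when the opposition produced no events of that type,
    because there was nothing to defend against.
    """
    rates = {}
    for action, source in ACTION_SOURCES.items():
        chances = opposition[source]
        rates[action] = defence[action] / chances if chances else None
    return rates


# Invented single-match totals.
defence = {"interceptions": 15, "tackles": 10, "blocks": 4}
opposition = {"passes": 300, "dribbles": 20, "shots": 16}

for action, rate in opportunity_rates(defence, opposition).items():
    print(action, rate)
```

Swapping all opposition passes for final-third passes only changes the denominator that is passed in, not the calculation.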

As I said previously, there might be (and probably is) a reason why this cannot be done, is wrong, would take ages etc. but I think it’s possibly the next step for adjusted defensive stats?

Feedback very welcome to @g4gey or ogage@virginia.edu

• Som

Do you give more importance to stats like possession in the final 3rd or do you just use overall possession?

• Todd

“Break it out: The opportunity to make interceptions is tied to your opponent making passes. The opportunity to make tackles is directly tied to your opponent having possession of the ball.”

Speaking as a coach this is the best thing you can say and goes right to the point. You see, I have a saying in coaching with my teams: “possession is the best defense.” Players understand that they cannot get scored on if they have the ball so “keep ball” is the best defense. Sometimes statistical analysis gets in the way of common sense. I for one love the stats pieces and use them to understand player and team development. Understanding that keeping the ball will help with defensive actions in the game should compel someone to agree with the opposite; in that the team against you who is defending more instead of keeping the ball is then giving you less defensive actions in the game to execute than if the reverse were true.

This is why your break out is true. This is why coaches who teach possession consider possession to be the best defense tactically. This is why tiki-taka and the spacing to keep the ball in rondo boxes is so tight (short passes in small areas); because coaches teach tiki-taka not as the best way to keep the ball, rather as the best way to win the ball back through high pressing. Put another way, Barca teaches 9 metre boxes and now Bayern is teaching the same (Pep), why? Not to keep the ball, not as the primary reason. Winning it back right away when everyone is together is the primary reason. Keeping the ball with an overload in a 9 metre box is way harder and requires highly technical players to pull it off (which Bayern and Barca have), but the primary reason for the spacing they teach is defensive.

• http://blogs.columbian.com/portland-timbers/ Chris Gluck

Ted,
As always some intriguing information worthy of a good “think” :).

I think we’ve been over this before but tracking individual statistics on defensive activities is as much a function of what happens as it is a function of what ‘doesn’t happen’. In other words interceptions that don’t occur can very much mean that the defenders are doing such a good job of shutting down space that the unlikely bad pass doesn’t occur simply because the opponent decides to pass the ball elsewhere. Ben Knapper (head of stats at Arsenal FC) and I had a very long discussion on that concept at the World Conference on Science and Soccer this past June.

As you know I’ve looked at team defensive statistics for over 2 years in MLS and the analysis I have on Defensive Possession with Purpose continues to remain and sustain a significant R2 with respect to results – at this stage, this year, the defensive R2 is -.54 while the attacking R2 is .73 and the composite R2 is .76; only the goal differential R2 is greater (.84) than the Attacking PWP or Composite PWP and even the Defensive PWP R2 of -.54 is greater than the goals against R2 of -.46… I think you would be very hard pressed to find any other team attacking or defending formulas to match those R2 outputs…

but I wish you continued success in your effort – you know where you can find my latest research on MLS and I also took the same approach for the World Cup – it was surprising to see the relationship in that limited competition…

• mmiki

What Opta calls “possession” is actually pass volume, and not actual possession of the ball. It’s the main point of the 7am article that you didn’t address at all. Saying “The opportunity to make tackles is directly tied to your opponent having possession of the ball” is all well and good, but the “possession” stat does not tell you which team had the ball more, it tells you about the pass volume which is completely different.

• http://engageinganalysis.wordpress.com/ Oliver Gage (@G4gey)

How so? If the opponent has the ball for say 30 seconds and makes 5 passes, you have 5 chances at an interception. If the same opponent has the ball for 30 seconds and makes 10 passes you then have 10 chances to make an interception, so ‘possession’ adjusted stats actually do make total sense. This is why I suggested the actual defensive actions being tied to the event that causes them makes more sense. If the team makes 5 passes in 30 seconds and 2 dribbles, you have 5 chances at an interception and 2 chances at a tackle and your defensive stats can be adjusted accordingly.

Nobody is trying to claim that possession adjusted stats are the answer to it all, just a way of getting closer to an answer, right? Everybody knows that possession being calculated by pass volume is not perfect, but right now it’s the method every provider uses, so we have to make do with the numbers we are given.

• http://www.possessionwithpurpose.com Chris Gluck

Actually the possession stat does tell you who has the ball more (per possession) – in any given game – both teams ‘own’ the ball the same amount of times – what is different is the ‘length’ of that ‘ownership’ – and that length is measured as a function of the volume of passes a team makes.
It makes sense because to measure the time a team possesses the ball means someone has got to run a stopwatch the entire game – clicking from one team back to the other team… so who ‘owns the ball’ when the ref blows and there is dead time due to an injury – do you subtract that time from active ownership? What about dead time in retrieving the ball when it goes out of play? How about ‘the time wasting’ that a goal keeper does when preparing a goal kick? By the way – it’s not just OPTA that views possession this way – everyone does – it’s accepted as an industry standard…
Of course the other issue with this is that there are teams that ‘don’t’ look to tackle – they look to contain and that containment gets even tighter inside the final third as well as the 18 yard box.
But back to Oliver’s point made earlier…
The challenge of wrapping the head around interceptions, tackles, clearances, blocked crosses and the like is that they are a direct result of what opportunity is provided based upon what action the opponent takes…. for example – a player who makes a tackle is likely to be successful in that tackle because the opponent either 1) couldn’t handle the ball effectively, 2) had a poor first touch as they initially accepted a pass or 3) they simply fell down or lost balance… that tackle, in and of itself is a change in possession first and foremost – as such the importance of that tackle should be relative to ‘gaining possession of the ball’ and nothing more… Another view is that the more tackles one player has might mean that the opponent has chosen to penetrate that player’s area because they are weaker in defense than another player… is 20% good if it’s 5 tackles out of 25 opportunities or is it good if 20% means the player got 20 tackles based upon 100 opportunities?
I continue to offer thoughts on this thread because I personally don’t see the value in the individual tracking of defensive actions relative to ‘the individual’ being rated as better or worse than someone else..
Put another way – Mertens puts in 20 crosses against a team in the World Cup and every one of those crosses was cleared because the opposing team had two great center-backs. Does that mean Mertens is poor in delivering crosses? No – it means the center-backs were positioned correctly to make the clearance. And what if one of those centerbacks cleared 15 of those crosses and the other one only cleared five – does that mean one centerback is ‘better than the other one’? no… it merely means that the ‘team of centerbacks’ did their job in team defense.
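One way to look at the 5-from-25 versus 20-from-100 question is to put an uncertainty range around each rate. The sketch below uses the Wilson score interval, a standard choice for proportions from small samples. It is a technique brought in here for illustration, not something used in the original analysis, and the function name is made up.

```python
from math import sqrt


def wilson_interval(successes, attempts, z=1.96):
    """Wilson score interval for a success rate (z=1.96 is roughly 95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denom = 1 + z * z / attempts
    centre = (p + z * z / (2 * attempts)) / denom
    half_width = (
        z * sqrt(p * (1 - p) / attempts + z * z / (4 * attempts * attempts))
    ) / denom
    return centre - half_width, centre + half_width


# Same 20% success rate, different numbers of opportunities.
for won, chances in [(5, 25), (20, 100)]:
    low, high = wilson_interval(won, chances)
    print(f"{won}/{chances}: {low:.3f} to {high:.3f}")
```

Both players show the same 20%, but the range for the player with 25 opportunities is noticeably wider, so the rate alone says less about him.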

• http://engageinganalysis.wordpress.com/ Oliver Gage (@G4gey)

I do agree with Chris’ point there and in my opinion the answer is that there is no right answer. After all, defending is so much about positioning and preventing an attacking player from doing what he wants to do, rather than making him do it and then blocking/tackling/intercepting him. I also understand that playing vs Bayern for example means that probably at least 50% of their 999 passes you aren’t even trying to intercept, simply contain, so passing therefore needs to be split further into ball rotation, penetrating pass, key pass and probably a whole host of other types of passes depending on how picky you want to be. This simply isn’t going to happen anytime soon.

On the flip side, should Messi, Ronaldo or any attacking player therefore be judged based on his opportunity? Should an attacker in a team who has less ‘possession’ be judged on team passes per shot rather than simply shots per game? It doesn’t take a rocket scientist to see that Messi wouldn’t score as many goals for West Ham as he would for Barca.

I don’t think Ted was trying to say he has found the answer to defensive metrics as he’s smart enough to see that there probably isn’t one right now. But I do think there is some value in looking at individuals’ defensive output too.

• mmiki

“Actually the possession stat does tell you who has the ball more (per possession) – in any given game – both teams ‘own’ the ball the same amount of times – what is different is the ‘length’ of that ‘ownership’ – and that length is measured as a function of the volume of passes a team makes.”

It tells you how many passes a team has made, which is entirely different from “who has the ball more”, which was the crux of the argument.

Making a 15 second long run down a flank is significant possession of the ball that does not get registered, while pointless exchanges of a large number of safe passes between defenders can increase the possession significantly.

Industry standard or not, it doesn’t do what it says on the tin.

“It makes sense because to measure the time a team possesses the ball means someone has got to run a stopwatch the entire game – clicking from one team back to the other team… so who ‘owns the ball’ when the ref blows and there is dead time due to an injury – do you subtract that time from active ownership? What about dead time in retrieving the ball when it goes out of play? How about ‘the time wasting’ that a goal keeper does when preparing a goal kick? ”

Yes, while the ball is out of the game no one would own it. The remaining time on the ball would be split between the two teams. Why is this a problem?

It’s obviously more difficult to measure and there would be accuracy issues (due to humans doing the actual measuring), but even an imprecise measure of actual time would likely tell us something.

“The challenge of wrapping the head around interceptions, tackles, clearances, blocked crosses and the like is that they are a direct result of what opportunity is provided based upon what action the opponent takes…. for example – a player who makes a tackle is likely to be successful in that tackle because the opponent either 1) couldn’t handle the ball effectively, 2) had a poor first touch as they initially accepted a pass or 3) they simply fell down or lost balance…”

Ok, with you so far.

“that tackle, in and of itself is a change in possession first and foremost – as such the importance of that tackle should be relative to ‘gaining possession of the ball’ and nothing more…”

This does not follow from the previous paragraph.

Just observing defensive action as acts of ‘gaining possession’ is meaningless because just ‘gaining possession’ is not very meaningful without the context of the game state in which it happens.

The defensive action is first and foremost, denial of opportunity. If the opposition has 3 vs 2 in your half, which is highly likely to result in a goal for them, making a successful tackle or interception might be worth as much as scoring a goal. By contrast, dispossessing an opponent while surrounded by 3 others and no obvious passing options might not be worth much at all.

This, I suspect, is one of the reasons tackles and interceptions don’t seem to be related to much of anything – in much the same way that not all shots are created equal and you need shot positions and ExpG to see which ones are worth it and which ones aren’t really, we need a better way to figure out which tackles and which interceptions really matter.

Now I’m not saying that pass volume is unrelated – and I don’t have the data to prove it one way or another – my point is that if we’re discussing the logic rather than the data, it doesn’t follow that having a higher ‘possession’ stat will result in the opposition having more chances to perform defensive actions, because it’s not actually measuring how much time you have the ball in non-defensive areas of the pitch.
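The gap between the two meanings of "possession" is easy to show with a toy example. The sketch below computes both a pass-volume share and a time-on-ball share from the same list of possession spells. The spell format and the numbers are invented for illustration, and dead-ball time is assumed to be left out of the list already.

```python
def possession_shares(spells, team):
    """Return (pass_share, time_share) for one team.

    Each spell is (team, seconds_on_ball, passes_made); time while the
    ball is out of play is assumed to be excluded from the list.
    """
    total_passes = sum(passes for _, _, passes in spells)
    total_seconds = sum(seconds for _, seconds, _ in spells)
    if total_passes == 0 or total_seconds == 0:
        raise ValueError("need at least one pass and some time on the ball")
    team_passes = sum(passes for t, _, passes in spells if t == team)
    team_seconds = sum(seconds for t, seconds, _ in spells if t == team)
    return team_passes / total_passes, team_seconds / total_seconds


# Invented spells: a long run down the flank versus safe passing at the back.
spells = [
    ("A", 15, 1),   # 15 second run ending in a single pass
    ("B", 15, 9),   # nine short passes between defenders
]
pass_share, time_share = possession_shares(spells, "A")
print(pass_share, time_share)
```

In this toy case team A has the ball for half of the time in play but only a tenth of the passes, which is the kind of divergence the comment describes.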

• http://www.possessionwithpurpose.com Chris Gluck

Ted, So you know – this may help by the way…

I did extensive research on defensive statistics in MLS the first half of this year and here’s what I found out http://possessionwithpurpose.com/2014/05/27/hurried-passes-could-this-be-a-new-statistic-in-soccer/

In short though – without reading that article the primary information is this… focusing specifically on defending passes as dribbling was not seen, by me, as being a considerable threat in generating penetration into the final third. Hence tackling was not looked at… The focus was on “unsuccessful passes”…. and what are the statistics of value in tracking how effective a team is in forcing the opponent to generate an unsuccessful pass?

To summarize – Blocked Crosses, Interceptions and Clearances will be counted as defensive activities that should impact the volume of Unsuccessful Passes.

So what are the correlations between those combined Defensive Activities versus Unsuccessful Passes after 142 events?

Final Third Defensive Activities to Unsuccessful Passes = .6864

Final Third Defensive Activities to Unsuccessful Passes when the Defending Activities’ Team Wins = .7833

Final Third Defensive Activities to Unsuccessful Passes when the Defending Activities’ Team Draws = .6005

Final Third Defensive Activities to Unsuccessful Passes when the Defending Activities’ Team Loses = .6378

In conclusion:

It seems pretty clear that Teams who win have more Defensive Activities, that in turn increase their Opponents’ Unsuccessful Passes given the higher positive correlation than losing teams – in other words a team that wins generally executes more clearances, interceptions and blocked crosses to decrease the number of Successful Passes their Opponents make.

It also seems pretty clear that all those Defensive Activities don’t account for the total of Unsuccessful Passes generated by the Opponent. If they did then the correlation would be higher than .7833; it’d be near .9898 or so.

So what is missing from the generic soccer statistical community to account for the void in Unsuccessful Passes?

Is it another statistic like Tackles Won, Duels Won, Blocked Shots or Recoveries?

I don’t think so – none of them generated a marked increase in the overall correlation of those three Activities already identified.

I think it is the physical and spatial pressure applied by the defenders as they work man to man and zone defending efforts. Again, this is what Ted Knapper, myself and others from Prozone Sports discussed at the World Conference on Science and Soccer… a potential new statistic might be called ‘hurried pass’ or ‘passes hurried’ or something else? We all agreed that ‘what doesn’t happen’ has just as much value as ‘what does happen’…

hope that helps Ted…

• http://www.possessionwithpurpose.com Chris Gluck

typo – * Ben Knapper 😉 not Ted Knapper…

• http://www.possessionwithpurpose.com Chris Gluck

Ted,
One final thought – I read your original article and maybe I missed it but I didn’t see where you penalised a player for ‘poor tackling’ – in other words when a player garners a red or yellow card it’s because of a poor tackle.

Again, maybe I missed it but it would seem to me that if you track and measure individual statistics you should account for poor tackles by that individual as much as good tackles… and perhaps consider subtracting one from the other before arriving at your path forward…

My apologies if you’ve looked and addressed that issue in the past… all from me.

Best, Chris