There’s been a lot of really good work done on shot positions recently, much of which is both clever and tremendously useful. The short recap is that shots taken close to the goal in the center of the penalty area are good. Shots taken close and wide are pretty bad (because of the narrow angle of goal available to shoot into), and shots taken from outside the penalty area are also pretty bad and get worse the further out you go.

Obviously the location of the shot itself only tells you part of the story. Like most of the analysis we can do right now, shot location is an abstraction. It’s a valuable one, but it only accounts for half of the offense vs. defense equation.

Consider for a moment: which of the following two shots is more likely to result in a goal?

Figure 1: defenders in position to block the shot

Figure 2

The shot in Figure 1 falls in the 20-25% conversion range. It’s central and inside the penalty area, which is good. If all of a team’s shots were taken from this position, they would convert shots to goals very well over the course of a season. Obviously this is complicated by the two defenders within immediate blocking distance, who limit the angle the shooter has to put the ball in the goal.

Figure 2 is a shot from exactly the same area, except the player has somehow beaten the defensive line. Most estimates put the conversion rate for this shot closer to 40%, which is about as good as it gets. The only limit on the shooter’s angle is where the keeper is positioned at the time.

That means the shot in Figure 2, despite being taken from exactly the same area as the shot in Figure 1, is roughly twice as valuable as the first one.
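The "twice as valuable" figure is just the ratio of the two conversion rates. A quick check using only the rough rates quoted above (no other data assumed):

```python
def relative_value(p_open: float, p_blocked: float) -> float:
    """How many times more often the open shot is expected to score."""
    return p_open / p_blocked

# Rough rates from the text: 20-25% with two blockers, about 40% once the line is beaten
for p_blocked in (0.20, 0.25):
    print(f"{relative_value(0.40, p_blocked):.1f}x")
```

This prints 2.0x and 1.6x, so "about twice" is the top end of a 1.6x to 2x range depending on where in the 20-25% band the blocked shot sits.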

You run into the same problem with headers in the area. Very few headers are actually “free”, meaning no defender is doing anything to at least put a body on the shooter and throw off their aim. But free headers usually come from the center of the penalty area, within 12 yards of the goal. They are enormously valuable and have a high probability of producing goals, if you can get them.

That’s the issue with data abstraction. As “shots,” they all get lumped into the same areas and look the same, even though one shot can be twice as valuable as another depending on where the defenders are.
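To see what the lumping does, suppose a location-only model reports a single conversion rate for the zone in Figures 1 and 2. That rate is a weighted average of blocked and open shots. The 20% share of open shots used here is a hypothetical value chosen only for illustration; 22.5% is simply the midpoint of the 20-25% range.

```python
def zone_rate(p_open: float, p_blocked: float, share_open: float) -> float:
    """Conversion rate a location-only model would report for a zone in which
    share_open of the shots are unblocked and the rest are blocked."""
    return share_open * p_open + (1 - share_open) * p_blocked

# Hypothetical mix: one shot in five from this spot is unblocked
print(round(zone_rate(0.40, 0.225, 0.2), 3))
```

The blended rate comes out at 26%, and the location-only model assigns it to both shots: it overrates the blocked shot in Figure 1 and badly underrates the open one in Figure 2.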

That’s also one reason why I say positioning is everything.

One Moment In Time
Right now, a good baseball model can look at a number of inputs and give an extremely accurate probability of runs scoring. If you take the current pitcher’s stats, the current batter’s stats, whether and where there are men on base, and the park factor, you come up with a figure for the expected number of runs scored in this at-bat. Add in the rest of the batting order, and you can calculate that batter by batter throughout the course of a game.

What does this have to do with football? Football is in constant motion. Baseball has static stops and starts (each pitch). They are not remotely the same game.

But what if you sliced football into individual moments in time?

Take any moment in time when the ball is in the attacking third and record the current ball position and the positions of the defenders. Then estimate the probability of a goal being scored by a shot from that location. Do that with enough shot and position data, and you can construct a model that tells you how likely any shot is to produce a goal based not only on shot location, but also on defender location.
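A model like this needs each snapshot turned into numbers. Here is a minimal sketch of that step, under assumptions of my own rather than any data provider's format: coordinates are in yards, the goal line is x = 0 with the goal centered at y = 0 and posts at y = ±4, the keeper is excluded from the defender list, and a defender "blocks" if he stands inside the triangle formed by the ball and the two posts. `Snapshot` and `shot_features` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

GOAL_HALF_WIDTH = 4.0  # yards; a regulation goal is 8 yards wide


@dataclass
class Snapshot:
    ball: tuple[float, float]             # (x, y) of the ball at the moment of the shot
    defenders: list[tuple[float, float]]  # outfield defenders only, keeper excluded


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign says which side of o->a point b lies on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c) -> bool:
    """True if p is inside or on the edge of triangle abc (any winding order)."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def shot_features(snap: Snapshot) -> dict:
    x, y = snap.ball
    left, right = (0.0, -GOAL_HALF_WIDTH), (0.0, GOAL_HALF_WIDTH)
    # Vectors from the ball to each post; the angle between them is the open goal mouth
    v1 = (left[0] - x, left[1] - y)
    v2 = (right[0] - x, right[1] - y)
    angle = abs(math.atan2(v1[0] * v2[1] - v1[1] * v2[0], v1[0] * v2[0] + v1[1] * v2[1]))
    blockers = sum(in_triangle(d, snap.ball, left, right) for d in snap.defenders)
    return {
        "distance": math.hypot(x, y),
        "angle_deg": math.degrees(angle),
        "defenders_in_cone": blockers,
    }


snap = Snapshot(ball=(12.0, 0.0), defenders=[(3.0, 1.0), (6.0, -0.5), (5.0, 10.0)])
print(shot_features(snap))
```

For a shot from the penalty spot this gives a 12-yard distance, a goal-mouth angle of about 36.9 degrees, and two of the three defenders inside the shooting cone. Counting bodies in the cone is only one possible way to encode "defender location"; distance from each defender to the shooting line would be another.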

At roughly 25 shots per game and 380 games per league per season, it would take about two years to collect 100K shots from the big five leagues, giving an enormous database of shot success vs. defender positions to analyse.
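The back-of-the-envelope arithmetic, using only the figures in the text:

```python
def shots_per_season(shots_per_game: int, games_per_league: int, leagues: int) -> int:
    """Total shots collected across all leagues in one season."""
    return shots_per_game * games_per_league * leagues


def seasons_needed(target_shots: int, per_season: int) -> float:
    """Seasons of data needed to reach target_shots."""
    return target_shots / per_season


# Figures from the text; 380 games assumes every league has 20 teams
per_season = shots_per_season(25, 380, 5)
print(per_season, round(seasons_needed(100_000, per_season), 2))
```

That is 47,500 shots a season, so a little over two seasons to pass 100K. An 18-team league such as the Bundesliga plays 306 games rather than 380, which stretches the estimate slightly.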

Why would you want to do this?
Because it teaches you which situations actually yield the best chances. We don’t have to guess anymore; we can know what is most likely to happen.

If a good shooter has 2 yards of space to take a shot at 20 yards, is that a good shot? Is it better than the shot in figure 1 above?

What about scenarios where a player cuts to the byline and then crosses back to the center? Is it better to have a near post runner or a far? How about a near-post and a penalty spot filler?

With enough game and positional data, you can answer every one of those questions with an actual goal likelihood.
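One common way to turn snapshot features into a goal likelihood is logistic regression; the post doesn't name a method, so treat this as a swapped-in technique rather than the approach described here. The sketch fits a two-feature model (distance in tens of yards, defenders in the shooting cone) by plain batch gradient descent on the log-loss. The `TOY_SHOTS` records are invented solely so the code has something to run on; they are not match data, and the fitted probabilities mean nothing beyond demonstrating the mechanics.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss. Returns [bias, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = goal_probability(w, xi) - yi  # prediction minus outcome
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def goal_probability(w, x) -> float:
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Invented toy records: (distance / 10 yards, defenders in cone, goal scored?)
TOY_SHOTS = [
    (1.0, 0, 1), (1.0, 0, 1), (1.0, 0, 1), (1.0, 0, 0),
    (1.5, 0, 1), (1.5, 0, 1), (1.5, 0, 0), (1.5, 0, 0),
    (1.0, 2, 1), (1.0, 2, 0), (1.0, 2, 0), (1.0, 2, 0),
    (1.5, 2, 1), (1.5, 2, 0), (1.5, 2, 0), (1.5, 2, 0),
]
X = [[d, n] for d, n, _ in TOY_SHOTS]
y = [g for _, _, g in TOY_SHOTS]

w = fit_logistic(X, y)
p_open = goal_probability(w, [1.2, 0])     # 12 yards out, nobody in the cone
p_blocked = goal_probability(w, [1.2, 2])  # same spot, two defenders in the cone
print(f"open: {p_open:.2f}  two in the cone: {p_blocked:.2f}")
```

With real tracking data you would use many more features and a proper library (scikit-learn's `LogisticRegression`, for example), but the output has the same shape: a probability for a given combination of shot position and defender positions, which is what the near-post vs. far-post and 2-yards-of-space questions above are asking for.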

The value of this information to managers is immense. Every single thing you learn here is a coaching point, not just at the top level, but right down to the grassroots.

Baby Steps into Really Big Steps
If you feel like a lot of what we are learning in the stats community is exceptionally basic, that’s because it is. That’s not because the people analysing the data are dumb, or because they don’t understand how to do more complex things; there is a scary number of postgraduate degrees donking around with football stats in their free time. The reason a lot of the analysis seems pretty basic compared with what is happening in other sports is that, until recently, no data was available to the public.

Now that we have some data available, we’re taking baby steps, expanding the knowledge literally every day with new, useful information.

But this type of modelling and analysis… that’s the end game. Some of the data companies have positional data on everything that happens on the football pitch going back for years.

So the cool part is that it’s doable right now for teams or football associations that have the money, the talent, and the willpower to make it happen.

And positioning – plus analysis of how shot location position vs. defensive position affects goal probabilities – is everything.

  • Steve

    Very interesting post. It’s good that you acknowledge how basic the current state of analytics is and your emphasis on positioning is a step in the right direction.

    However, I don’t see how positioning is “everything”, as in “And positioning – plus analysis of how shot location position vs. defensive position affects goal probabilities – is everything.”

    The problem for analytics in football is that there are way too many variables, and even when there is data (I agree there is very little data available given the complexity of the sport), most people strip it down to linear relationships between one or two variables.

    In the case of shots, apart from the points you’ve mentioned, one would have to look at the following (and this list is not exhaustive):

    – The position of the ball vis-à-vis the attacker, i.e. is it just in front of him, forcing him to stretch while shooting; is it in his sweet spot; or is it just behind him, limiting the direction and pace he can generate? This has an obvious impact on the probability of scoring. Furthermore, a good goalscorer might be converting more chances from suboptimal ball positions, while another player might have scored more but received the ball in the sweet spot more often. Bland statistical analysis would not be able to identify this difference in skill if the position of the ball with respect to the striker is not considered. Similarly, the striker’s ability to adapt his body shape to get the ball into the sweet spot is very important and has to be measured and factored in.

    – Then we need to see whether the ball is on his stronger or weaker foot. Again, this is partly a matter of skill, i.e. how well a player manipulates the ball onto his stronger side. Experimentally, this can be seen easily by making different players attack the same ball against the same defensive setup. Coaches probably see this all the time in training. But how much data do we need from match situations to even come close to understanding this?

    – What about the height of the ball? Is it on the ground, is it a half-volley, a volley, how high is the volley? Did the ball wobble at the wrong moment?

    – Related to the above points about skill, what about decision making? Did the player have enough time to take a touch? Did he rush his shot?

    – What about scuffed and deflected shots? Goals are major events, but at times the shot resulting in a goal didn’t really deserve to go in, if that makes sense. A player might miskick the ball and inadvertently place it in the right spot, or it could beat the goalkeeper via an unintended change in direction. Do we have to eliminate these from the equation? How much data would that take away?

    – What about goalkeeping? A great attack and shot could be thwarted by a superb save, while a terrible shot straight at the keeper might go in. How often does this happen, and how does it affect the probabilities?

    At the most basic level it is obvious that shots from the centre are better than shots from wider areas. Similarly, shots from closer to the goal are better than shots from further away. But we are nowhere close to exactly mapping out these zones and assigning any sort of practically useful probabilities to them (I’m not sure it’s needed; the basic idea seems good enough from a coaching point of view).

    Positioning will be helpful but it will just be another baby step in a marathon if we want to quantify the quality of chances.