Is every goal unique? The instinct says yes, but one needs only to remember Steven Gerrard to realise that it is possible to make a fine career out of scoring the same three goals over and over. The truth sits somewhere in the boring middle, but it's undeniable that many goals share similarities, including how they are created. It is this similarity that I set out to explore here.
My analysis rests on the notion of possession chains. For every on-the-ball event -- such as a goal -- it is possible to find the unbroken, ordered sequence of previous events leading to it. This is the idea on which Colin Trainor based his recent article about players' attacking contributions. My definition of a chain is likely different from his, and too technical to give in full here, but the broad outline is as follows:
- I only look at chains terminating in a goal,
- The events in the chain are strictly consecutive (i.e. no intermediate events are excluded),
- Only actions by the scoring team belong in a chain,
- A set piece can only be the first event in the chain (i.e. we never look past a set piece),
- Ditto possession regain events (tackle, interception, recovery).
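The outline above can be sketched in code. This is only an illustration under stated assumptions, not the actual definition used for the results: the `Event` record, its `team` and `kind` fields, the event names and the `goal_chain` helper are all hypothetical, and none of the minor choices mentioned in the text are modelled.

```python
from typing import NamedTuple, Sequence, Tuple


class Event(NamedTuple):
    """Hypothetical on-the-ball event; real feeds carry many more fields."""
    team: str
    kind: str  # e.g. "pass", "cross", "corner", "tackle", "goal"


# Assumed event names, chosen for illustration only.
SET_PIECES = {"corner", "free_kick", "throw_in", "penalty"}
REGAINS = {"tackle", "interception", "recovery"}


def goal_chain(events: Sequence[Event], goal_index: int) -> Tuple[str, ...]:
    """Walk backwards from the goal and collect the scoring team's chain."""
    goal = events[goal_index]
    chain = [goal.kind]
    for ev in reversed(events[:goal_index]):
        # Only actions by the scoring team belong in the chain, and the
        # chain must be strictly consecutive, so an opposition touch ends it.
        if ev.team != goal.team:
            break
        chain.append(ev.kind)
        # A set piece or a possession regain can only be the first event,
        # so we never look past one.
        if ev.kind in SET_PIECES or ev.kind in REGAINS:
            break
    chain.reverse()
    return tuple(chain)
```

Returning a tuple rather than a list makes each chain hashable, which is convenient if identical chains are to be grouped later.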
The numerous minor choices I had to make on top of these may mean that the overall definition is so arbitrary that I am unsure how much of what follows is insightful (never mind useful), and how much is just having unwholesome fun with the data. Caveat emptor.
The data I looked at was kindly provided by Opta and comprises all games from 2010/11 to 2013/14 in the top divisions of England, Spain, Italy, France and Germany. For every goal I derived the possession chain and grouped identical chains. It turns out that the 50 most common goals look like this:
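Grouping identical chains amounts to counting equal sequences. A minimal sketch, assuming each chain is already available as a sequence of event codes (the `most_common_chains` name and the toy codes in the usage example are hypothetical):

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def most_common_chains(
    chains: Iterable[Sequence[str]], n: int = 50
) -> List[Tuple[Tuple[str, ...], int]]:
    """Count identical chains and return the n most frequent with their counts."""
    # Tuples are hashable, so two chains are grouped together exactly when
    # they contain the same events in the same order.
    counts = Counter(tuple(chain) for chain in chains)
    return counts.most_common(n)
```

For example, `most_common_chains([["goal"], ["corner", "head"], ["goal"]], n=1)` groups the two single-event chains together and reports that chain with a count of 2.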
[Yep, that's a screenshot.]
As you can see, by far the most common goal is scored with the team's first touch in the chain. I think this is partly a testament to the randomness of the game itself, but also to the strictness of the definition of the chain: if a defender manages to get a touch just before an intricate move is about to be crowned with a goal, none of the move will count in the chain. A penalty and a header from a corner complete the top 3.
Another summary of the results is provided in the figure below. Apologies for the terse, but hopefully still unambiguous, codes for individual events. Note that the listed event can occur anywhere in the chain for the goal to be counted, so for example the "head" bar comprises not only headed goals, but also any goals where there was a headed pass in the move.
It turns out that only 64.5% of goals have a completed pass in the buildup (again, under my restrictive definition of buildup). I was delighted to discover that this agrees nicely with the classic analysis of Reep and Benjamin ("Skill and Chance in Association Football" J. R. Stat. Soc. 134(4):623-9, 1968, cited here after The Numbers Game), whose number is 60.6%. A quarter of goals involve a cross (but not necessarily as an assist), and about 1 in 19 see a shot saved before the ball goes into the net. Own goals are 3% of the total.
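Shares like these count a goal once if the event occurs anywhere in its chain. A minimal sketch of that counting rule, assuming chains are sequences of event codes (the `share_with_event` helper and the codes in the example are hypothetical, not the ones used in the figure):

```python
from typing import Iterable, Sequence


def share_with_event(chains: Iterable[Sequence[str]], event: str) -> float:
    """Fraction of chains in which `event` appears at least once, anywhere."""
    chains = list(chains)
    if not chains:
        return 0.0
    # A chain counts once no matter how many times the event occurs in it.
    hits = sum(1 for chain in chains if event in chain)
    return hits / len(chains)
```

With the toy input `[("pass", "goal"), ("goal",), ("cross", "head"), ("pass", "cross", "goal")]`, the share for `"pass"` is 0.5 and the share for `"head"` is 0.25.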
Finally, for the theory-minded, here is the distribution of frequencies of individual chains on a log-log scale. It's tempting to drop some names here (cough Zipf cough), but in truth so many things look linear-ish on a log-log plot that it's best not to. Perhaps if and when the definition of the chain is made more robust, the distribution plot will be more interesting.
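For anyone who wants to reproduce this kind of plot, here is a rough sketch of how the rank-frequency points and a least-squares slope in log-log space could be computed. The function names are hypothetical, and in keeping with the caution above, a tidy slope here is not evidence of any particular distribution.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def rank_frequency(chains: Iterable[Sequence[str]]) -> List[Tuple[int, int]]:
    """Return (rank, frequency) pairs, rank 1 being the most common chain."""
    counts = sorted(Counter(tuple(c) for c in chains).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(points: Sequence[Tuple[int, int]]) -> float:
    """Ordinary least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in points]
    ys = [math.log(freq) for _, freq in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct ranks to fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

The pairs from `rank_frequency` can be passed straight to any plotting tool with both axes set to a logarithmic scale.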