This article is the second part in a series about StatsBomb's expected completion percentage model; you can read the first part here.
In football, data is often presented at a play level. Given the structure of the game, looking at plays is a natural way of quantifying what happens. However, StatsBomb data includes more granularity, providing data at an event level.
By event level, we mean a single action carried out by a player at a specific point in time. Two of the main events related to passing plays are the pass event and the pass end event. The pass event is easy to define: it is the moment the QB releases the ball in a forward pass (lateral passes are defined separately within the data). The pass end event can take many different forms; it is any event related to the outcome of the pass. In the simplest case, the end event is a catch (whether successful or unsuccessful).
In more complex cases, there can be multiple pass end events associated with a single pass event. For example, a pass could be batted by a defensive player before a catch attempt is made by a player on the offensive team. This would result in two pass end events: one for the batted ball and one for the catch attempt.
We can see from the figure below that although the majority of passes have a single end event, there are a significant number of passes with more than one end event. There are even passes with as many as five or six pass end events. While this seems like a lot, it is possible: take a look at this play from Auburn at LSU for an example.
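As a rough illustration of how such a distribution could be tallied, the sketch below counts end events per pass on a small hand-made sample. The `(pass_id, event_type)` layout and the values are assumptions made for the example; they are not the StatsBomb schema or real data.

```python
from collections import Counter

# Hypothetical records: (pass_id, end_event_type), listed in time order.
# Layout and values are illustrative only, not the StatsBomb schema.
pass_end_events = [
    (1, "catch"),
    (2, "batted_ball"),
    (2, "catch"),
    (3, "catch"),
    (4, "catch"),
    (4, "batted_ball"),
    (4, "interception"),
]

# Number of end events attached to each pass.
events_per_pass = Counter(pass_id for pass_id, _ in pass_end_events)

# Distribution: how many passes have 1, 2, 3, ... end events.
distribution = Counter(events_per_pass.values())

for n_events in sorted(distribution):
    print(f"{n_events} end event(s): {distribution[n_events]} pass(es)")
```

On real data the same two-step count (events per pass, then passes per count) would give the shape of the distribution described above.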
This presents an interesting modelling question: when there are numerous end events, which should we use for any model features that require information from the end of the pass? The answer to this question may vary depending on the analysis in question, but here we will explore this from the perspective of modelling the probability of a given pass resulting in a successful catch.
To help answer this question, let’s take a look at the breakdown of event types within the pass end events. For simplicity, the pass end events here have been grouped into seven broad categories based on the main event type. In practice, these event types also include additional information; for example, some catch events are recorded as “jump-catch,” based on the action of the receiver at the time of the catch. Although this information can be incorporated into modelling, it is not necessary for determining which single event to use.
We can immediately see some examples of event types that could co-occur within the same play. For example, an unsuccessful catch attempt by the offence could be followed by an interception by the defence.
One potential approach to defining a single end event is to select the event that occurs immediately after the pass event, with no consideration for the event type. If we are interested purely in assessing the quality of the pass and the decision making of the QB, this may make sense: throwing into an area where the ball is batted is potentially dangerous, regardless of whether the play ended in a successful catch.
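A minimal sketch of this first-event rule is shown below. The dictionary keys `seq` (order of the event within the pass) and `type` are hypothetical names chosen for the example, not fields from the StatsBomb data.

```python
def first_end_event(end_events):
    """Return the end event that occurs first after the pass, or None if there are none.

    Each event is a dict with hypothetical keys "seq" and "type".
    The event type is deliberately ignored.
    """
    return min(end_events, key=lambda e: e["seq"], default=None)


# A pass that is batted before being caught (illustrative data only).
batted_then_caught = [
    {"seq": 2, "type": "catch"},
    {"seq": 1, "type": "batted_ball"},
]

chosen = first_end_event(batted_then_caught)
print(chosen["type"])
```

Under this rule the batted ball is selected even though the pass was eventually caught, which is the behaviour one would want when judging the throw rather than the outcome.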
Another option is to consider a hierarchy of pass end events, with the primary event of interest (a catch) selected in preference to other events (such as a batted ball). This approach focuses interpretation on the main result of the play, regardless of how it came about. Given how we expect our final catch probability model to be used, we believe that this approach makes the most sense for our initial modelling work.
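The hierarchical rule might be sketched as follows. The only ordering taken from the text is that a catch outranks other events; falling back to the earliest event when no catch exists is an assumption made for this example, as are the `seq` and `type` key names.

```python
def primary_end_event(end_events):
    """Prefer a catch event; otherwise fall back to the earliest end event.

    Events are dicts with hypothetical keys "seq" and "type".
    The fallback to the earliest event is an assumption, not a documented rule.
    """
    def rank(event):
        # Catches sort ahead of everything else; ties are broken by time order.
        is_other = 0 if event["type"] == "catch" else 1
        return (is_other, event["seq"])

    return min(end_events, key=rank, default=None)


# Illustrative data only.
batted_then_caught = [
    {"seq": 1, "type": "batted_ball"},
    {"seq": 2, "type": "catch"},
]
batted_only = [{"seq": 1, "type": "batted_ball"}]

print(primary_end_event(batted_then_caught)["type"])
print(primary_end_event(batted_only)["type"])
```

Compared with taking the first event, the same batted-then-caught pass is now represented by its catch, which matches the focus on the main result of the play.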
To further investigate, we can use a Sankey diagram to visualise how the end events interact with each other within each sequence. Due to the small number of passes with four or more end events, this visualisation includes data up to and including the third pass end event only.
This plot shows that most of the catch events occur as the first pass end event, although some catch events occur as later events. This means that taking the first event in the sequence would certainly result in some catch events not being included in the analysis.
Interestingly, there are also some passes with multiple catch events. If a hierarchical approach were used, it would therefore be advisable to include additional information to determine which catch to use as the single paired end event. Specifically, a successful catch (most likely the last catch attempt in the sequence of attempted catches) could be used in preference to an unsuccessful catch.
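One way to express this tie-break is sketched below: prefer a successful catch, and otherwise take the last catch attempt. The second part of that rule is an assumption for the example, and the keys `seq`, `type` and `successful` are hypothetical names rather than StatsBomb fields.

```python
def select_catch(end_events):
    """Pick one catch event from a pass's end events, or None if there is no catch.

    A successful catch is preferred over unsuccessful ones. If several
    candidates remain, the latest one in the sequence is used (an assumption).
    """
    catches = [e for e in end_events if e["type"] == "catch"]
    if not catches:
        return None
    successful = [c for c in catches if c["successful"]]
    candidates = successful or catches
    return max(candidates, key=lambda e: e["seq"])


# Illustrative data: a dropped catch, a batted ball, then a successful catch.
tipped_pass = [
    {"seq": 1, "type": "catch", "successful": False},
    {"seq": 2, "type": "batted_ball", "successful": False},
    {"seq": 3, "type": "catch", "successful": True},
]

chosen_catch = select_catch(tipped_pass)
print(chosen_catch["seq"], chosen_catch["successful"])
```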
In summary, the richness of the event-level information within StatsBomb data presents opportunities for many different modelling approaches, depending on the research question of interest. The intended interpretation of the final analysis should play a key role in determining how to use the data most effectively.
This article was first featured in our football newsletter. Be the first to see these by subscribing here.