Free Data

Free Data: Bayer Leverkusen's Invincible Bundesliga Title Win

By Hudl Statsbomb | May 21, 2024 | 4 min read

At Hudl Statsbomb, we are committed to providing education and resources to support the development of the next generation of analysts. One way we've been doing this is through regular releases of our industry-leading data to enable analysis of interesting datasets, teams and players. We're excited to bring you this latest release: Bayer Leverkusen's unbeaten Bundesliga title win under Xabi Alonso – the first league title in the club's history.

The dataset contains the same event data used by our customers, averaging roughly 3,400 events per game, across all 34 of Leverkusen's unbeaten league matches. We're also including Hudl Statsbomb 360 data in the release: our tactical event data, which records the location of every player in the visible frame around each event. The 360 frames add critical context and enable deeper, more meaningful analysis than event data alone allows.
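
If you want to check the per-match event volume yourself, count events by match. The sketch below is an illustration on plain Python dicts; it assumes each event row carries a `match_id` key, as the event data loaded with StatsBombR or statsbombpy does, and the toy data is hypothetical.

```python
from collections import Counter


def average_events_per_match(events):
    """Return the mean number of events per match_id (0.0 if there are none)."""
    counts = Counter(event["match_id"] for event in events)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


# Hypothetical toy data: match 1 has 3 events, match 2 has 5.
toy_events = [{"match_id": 1}] * 3 + [{"match_id": 2}] * 5
print(average_events_per_match(toy_events))  # 4.0
```

On a pandas events DataFrame, `events_df.groupby("match_id").size().mean()` gives the same figure.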

Including 360 data was a deliberate choice on our part, given the importance of spatial control and positional context to Alonso and Leverkusen's success. They were excellent at both ends of the pitch, with the second-best open-play xG created and the best open-play xG conceded. They were also the most territorially dominant team in the league, progressing the ball into the final third most often while allowing their opponents into the final third least often.
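
Open-play xG created and conceded can be approximated by summing xG over open-play shots. This is a minimal sketch on dict rows, assuming the flattened field names `type`, `team`, `shot_type` and `shot_statsbomb_xg` used by statsbombpy (check them against the event data specification); the shots are hypothetical. Because the release covers only Leverkusen's 34 matches, this reproduces Leverkusen's own totals, not the league-wide ranking quoted above.

```python
from collections import defaultdict


def open_play_xg(events):
    """Sum StatsBomb xG on open-play shots for each team."""
    totals = defaultdict(float)
    for e in events:
        if e.get("type") == "Shot" and e.get("shot_type") == "Open Play":
            totals[e["team"]] += e.get("shot_statsbomb_xg") or 0.0
    return dict(totals)


def xg_created_and_conceded(events, team):
    """Open-play xG created by `team`, and open-play xG by all other teams in these events."""
    totals = open_play_xg(events)
    created = totals.get(team, 0.0)
    conceded = sum(xg for other, xg in totals.items() if other != team)
    return created, conceded


# Hypothetical toy shots; the penalty is excluded because it is not open play.
toy_shots = [
    {"type": "Shot", "team": "Bayer Leverkusen", "shot_type": "Open Play", "shot_statsbomb_xg": 0.3},
    {"type": "Shot", "team": "Bayer Leverkusen", "shot_type": "Open Play", "shot_statsbomb_xg": 0.2},
    {"type": "Shot", "team": "Bayer Leverkusen", "shot_type": "Penalty", "shot_statsbomb_xg": 0.78},
    {"type": "Shot", "team": "Opponent", "shot_type": "Open Play", "shot_statsbomb_xg": 0.1},
]
created, conceded = xg_created_and_conceded(toy_shots, "Bayer Leverkusen")
print(round(created, 2), round(conceded, 2))  # 0.5 0.1
```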

They achieved this without an overly aggressive press, while maintaining the highest defensive line in the league. Their success was built not on overwhelming opponents with hard running and aggression, but on intelligent positioning and coordinated pressing traps.
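
One rough, hypothetical proxy for how high a team defends is the average x-position of its defensive actions. It is not necessarily how the figures above were computed: the set of event types below is an assumption, as are the flattened `type`, `team` and `location` fields, and the comment on orientation assumes the spec's convention that coordinates are given from the acting team's perspective.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical choice of event types counted as "defensive actions".
DEFENSIVE_TYPES = {"Pressure", "Duel", "Interception", "Block",
                   "Clearance", "Ball Recovery", "Foul Committed"}


def defensive_action_height(events):
    """Mean x-coordinate of each team's defensive actions.

    Assumes x runs 0-120 from the acting team's own goal line, so a larger
    value means the team made its defensive actions further up the pitch.
    """
    xs = defaultdict(list)
    for e in events:
        if e.get("type") in DEFENSIVE_TYPES and e.get("location"):
            xs[e["team"]].append(e["location"][0])
    return {team: mean(values) for team, values in xs.items()}


toy_events = [
    {"type": "Pressure", "team": "A", "location": [70.0, 40.0]},
    {"type": "Interception", "team": "A", "location": [50.0, 20.0]},
    {"type": "Clearance", "team": "B", "location": [20.0, 40.0]},
    {"type": "Pass", "team": "B", "location": [90.0, 40.0]},  # not a defensive action
]
print(defensive_action_height(toy_events))  # {'A': 60.0, 'B': 20.0}
```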

On an individual level, there's the opportunity to analyse the influence of Granit Xhaka at the heart of the Leverkusen midfield. He moved the ball into the final third more often than any other player in the Bundesliga this season and also played the most line-breaking passes in the league.

Or perhaps the re-emergence of Florian Wirtz, named Bundesliga Player of the Season in his first full season since returning from a long-term injury. The 21-year-old contributed 11 goals and 10 assists, but what stood out most was his ability to constantly find space between the opponent's defensive and midfield lines.
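
Finding space is where the 360 frames help, since each frame lists the visible players around an event. One simple, hypothetical measure of space is the distance from the player on the ball to the nearest visible opponent. The sketch assumes freeze-frame entries with `teammate`, `actor` and `location` fields, as described in the 360 data specification, and needs Python 3.8+ for `math.dist`. Because only players inside the visible area are recorded, it measures distance to the nearest visible opponent and can overstate the true space; isolating space between the defensive and midfield lines would need further geometry not shown here.

```python
import math


def nearest_opponent_distance(freeze_frame):
    """Distance from the event's actor to the closest visible opponent.

    Returns None when the frame has no actor or no visible opponents.
    """
    actor = next((p for p in freeze_frame if p.get("actor")), None)
    opponents = [p["location"] for p in freeze_frame if not p.get("teammate")]
    if actor is None or not opponents:
        return None
    return min(math.dist(actor["location"], loc) for loc in opponents)


# Hypothetical freeze frame for a single event.
toy_frame = [
    {"teammate": True, "actor": True, "keeper": False, "location": [60.0, 40.0]},
    {"teammate": True, "actor": False, "keeper": False, "location": [62.0, 40.0]},
    {"teammate": False, "actor": False, "keeper": False, "location": [63.0, 44.0]},
    {"teammate": False, "actor": False, "keeper": False, "location": [50.0, 40.0]},
]
print(nearest_opponent_distance(toy_frame))  # 5.0
```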

Accessing the data

The easiest way to work with our data is in R or Python, for which we've released two packages that make our free datasets easier to work with: StatsBombR and statsbombpy.

To pull the data into your working environment, you'll need to use competition_id (9) and season_id (281).

Example code for both is here:

R

library(tidyverse)
library(StatsBombR)  # the package name is case-sensitive

# Bundesliga (competition_id 9), 2023/24 (season_id 281)
Comp <- FreeCompetitions() %>%
  filter(competition_id == 9 & season_id == 281)
Matches <- FreeMatches(Comp)
StatsbombData <- free_allevents(MatchesDF = Matches, Parallel = TRUE)
StatsbombData <- allclean(StatsbombData)

And to add 360 data:

data_360 <- free_allevents_360(MatchesDF = Matches, Parallel = TRUE)

# 360 frames reference their event via event_uuid; rename it to join on the event id,
# then keep a single match_id column after the join
data_360 <- data_360 %>% rename(id = event_uuid)
StatsbombData <- StatsbombData %>%
  left_join(data_360, by = "id") %>%
  rename(match_id = match_id.x) %>%
  select(-match_id.y)

Python

from statsbombpy import sb
import pandas as pd
from mplsoccer import Pitch, VerticalPitch  # optional: only needed for pitch plots

# All free event data for the 2023/24 Bundesliga release
events_df = sb.competition_events(
    country="Germany",
    division="1. Bundesliga",
    season="2023/2024",
    gender="male",
)

And to add 360 data:

frames_df = sb.competition_frames(
    country="Germany",
    division="1. Bundesliga",
    season="2023/2024",
    gender="male",
)

# 360 frames reference their event via event_uuid; rename it to match events_df["id"]
frames_df = frames_df.rename(columns={"event_uuid": "id"})
merged_df = pd.merge(frames_df, events_df, how="left", on=["match_id", "id"])
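
Because the merge keeps frames_df on the left, merged_df should contain one row per visible player per event, assuming frames_df has that one-row-per-player shape. A minimal sketch of summarising such rows back to one record per event, on plain dicts with hypothetical values and an assumed boolean `teammate` field:

```python
from collections import defaultdict


def players_visible_per_event(frame_rows):
    """Count visible teammates (including the actor) and opponents for each event id."""
    counts = defaultdict(lambda: {"teammates": 0, "opponents": 0})
    for row in frame_rows:
        side = "teammates" if row["teammate"] else "opponents"
        counts[row["id"]][side] += 1
    return dict(counts)


# Hypothetical rows: one per visible player, keyed by the renamed event "id".
toy_rows = [
    {"id": "a", "teammate": True},
    {"id": "a", "teammate": True},
    {"id": "a", "teammate": False},
    {"id": "b", "teammate": False},
]
print(players_visible_per_event(toy_rows))
# {'a': {'teammates': 2, 'opponents': 1}, 'b': {'teammates': 0, 'opponents': 1}}
```

With the merged DataFrame itself, `merged_df.groupby(["id", "teammate"]).size().unstack(fill_value=0)` produces the same counts as a table.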

We recommend keeping both the event data specification and 360 data specification handy while working with the data. These contain a list of all column names and variables in the data, with definitions.

To help you work with the data, we created the Using Hudl Statsbomb Data In R and Using Hudl Statsbomb Data In Python guides. There's also more advice and guidance available in the How To Get Started In Football Analytics article.

Lastly, if you intend to publish your work on social media – which we greatly encourage you to do if you want your skills to be noticed by professionals – then please remember to abide by our user agreement and credit StatsBomb as your data source when doing so.

As always, we hope you enjoy working with the data. We're looking forward to seeing what can be discovered about this incredible Leverkusen season under Xabi Alonso, one of the brightest prospects from the new generation of coaching talent.

Best of luck,
The Hudl Statsbomb Team