A historic Bundesliga season is in the books. Because Germany’s top flight managed to restart after the coronavirus-induced break sooner than the Premier League and La Liga, it was the first to experience how the new environment affects the game. The Bundesliga staged nine matchdays under unusual precautions, most notably the banning of fans from the stadiums. It quickly became clear that--at least in this league--the home-field advantage was reduced when there weren’t tens of thousands cheering the team, although this development has been overstated. In total, 37 out of 82 matches (81 matches on nine regular matchdays plus one match that was postponed before the break) were won by the away side, with nine of these wins for the underdog according to the standings. For comparison, we saw only 27 away wins in the 80 matches on the nine matchdays before the break, including five underdog wins.

The empty stands undeniably affected the football played on the pitch. There’s a point to be made that football is less consequential when no one – besides the coaching staff and a handful of bench players – is reacting to success and failure. Especially early after the break, several Bundesliga players were keen to show that they could escape situations with elegance instead of brute force. This resulted in pressing attacks being outmanoeuvred with dribbles and smart movements, whereas in normal times the long hoof might have been the typical reaction from defenders who don’t possess the best feet. But that’s just anecdotal. Let’s look at some of the numbers that indicate the various effects of the different environment in what Germans call Geisterspiele (ghost games). We compare the nine matchdays after the break with the nine before, starting on 20 December 2019.

Less intensity

The eye test suggested early on that teams were less intense, particularly when defending in advanced positions against the opponents’ build-up.
The high press has been a prominent feature of Bundesliga football for several years. Centre-backs with bad feet often fell victim to this kind of style, while coaches of smaller clubs sometimes decided to abandon any kind of constructive build-up play for security concerns. While the pressures in the opposing half averaged around 77 per team per match before the break, it fell to 66 after it. This significant change certainly proves the eye test correct. What’s also striking is how many teams didn’t necessarily change their overall approach as the ranking among the Bundesliga clubs has remained largely the same, with two exceptions: Schalke, somewhat surprisingly, recorded the most pressures in the opposing half after the break which goes to show that the team did not give up in the midst of a crisis but rather were not able to capitalise on its intensity. Meanwhile, Borussia Dortmund dropped from second down to rock bottom which might give Lucien Favre’s critics some new fodder, as BVB were already third from last in this category before they lost to Mainz and Hoffenheim late in the season. The number of counterpressures across the entire pitch also declined, falling from around 35 to 30 per team per match. Borussia Monchengladbach were outstanding in this category post-break, while Dortmund and Wolfsburg were less active immediately after turnovers. There are a few factors that contribute to these numbers, most importantly the unparalleled circumstances during and shortly after the break. Teams were not able to train as hard as they would have liked to for a couple of weeks, because close contact between players was not allowed. Hertha’s Bruno Labbadia and a few other head coaches stated how they could not practise any kind of intense pressing or just intense actions in general which hindered plans to bring their teams into a state where they could replicate or improve the effectiveness of pressing. 
Moreover, the overall fitness level was likely below par when the Bundesliga returned on 16 May, but the tracking data provided by the Bundesliga indicated that teams ran less and made fewer sprints only on the first matchday post-break; as time moved on, the numbers quickly approached pre-Corona levels.

No tactical changes

An instinctive response to these stats could be that coaches simply adjusted the tactical setup of their teams to suit the circumstances: they accepted that their teams could not play a high or even midfield press for the entirety of a match and instead settled for a more cautious style, with a deeper back line and a more compact structure that would rely less on defensive actions and defend space more effectively. However, the defensive distance, meaning the average distance from a team’s own goal at which they make defensive actions, remained almost the same, with a per-team average of 43.9 metres before and 44.9 metres after the break. It wasn't the case that suddenly a good portion of the league was sitting deep, hoping to keep opponents away from the goal. We also didn’t see a wave of tactical changes, as most coaches stuck to what they did before the break in terms of basic formations and the structures in all game phases.

Higher xG

Less pressing coupled with no significant tactical changes could logically be expected to mean a higher probability of scoring. Interestingly, on average, we saw fewer dribble attempts (17.3 compared to 18.3 per team per match) and fewer passes inside the box (2.4 compared to 2.7 before), but with a more even distribution across the league. This indicates that it was easier for most of the teams to get close to the goal without having to rely on attacking actions that require outstanding individual skills. Instead, it was due to the declining resistance of opponents that teams were simply able to advance more easily.
The open play xG rose from an average of 1.06 to 1.13 per team per match. The average number of shots per team per match, however, dropped from 13.12 to 12.32, with fewer shots resulting from a high press (3.0 to 2.5) and from counterattacks (1.3 to 1.1). Overall, the attacking output did not increase, even though the decline in defensive intensity could have facilitated it. What happened in the past few weeks was that there were fewer interactions between players, particularly in one-on-one duels, which allowed teams to play their way through defensive structures facing less resistance than usual. Whether the declining defensive resistance was caused by a lack of fans inspiring players to get physical, pressure opponents and generate turnovers is up for debate: it is just one factor in play during this period, and a hypothesis that can perhaps never be effectively tested.
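As a rough check on the figures quoted in this piece, the short Python sketch below recomputes the away-win share before and after the break and the relative change in each per-team average. The two-proportion z-test is our own choice of illustration rather than part of the original analysis, the pressure and counterpressure inputs are the rounded "around" values from the text, and all function and variable names are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Away wins quoted in the text: 37 of 82 after the break, 27 of 80 before.
post_share = 37 / 82
pre_share = 27 / 80
z, p_value = two_proportion_z(37, 82, 27, 80)

# Per-team, per-match averages quoted in the text: (before, after).
METRICS = {
    "pressures in opposing half": (77, 66),  # rounded in the source
    "counterpressures": (35, 30),            # rounded in the source
    "open-play xG": (1.06, 1.13),
    "shots": (13.12, 12.32),
}
changes = {name: pct_change(b, a) for name, (b, a) in METRICS.items()}

print(f"away-win share: {pre_share:.1%} -> {post_share:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

On these inputs the away-win share rises from roughly 34% to 45%, but with only around 80 matches on each side the two-sided p-value comes out at roughly 0.14, which is consistent with the caution that the effect has been overstated.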
With a daily dose of partidos, writing about La Liga feels very much like shooting at a moving target right now, but we’ll do our best.

The New Reality

La Liga is back, and much looks the same as before. Valencia are still conceding shots at an alarmingly high rate, Messi continues to bear outsize responsibility for the success of the Barcelona attack, and André Zambo Anguissa remains one of the league’s most active dribblers and ball carriers through midfield. There has been very little variation in the number of dribbles attempted and completed, in how quickly teams move forward and in how high up the pitch they defend. The ball has been in play for more or less the same amount of time. A comparable number of free-kicks have been conceded, and the overall number of pressures and counterpressures has remained more or less constant.

But some things have changed. We’re still dealing with a very small sample size here, so caution should be exercised in drawing any concrete conclusions, but a few patterns stand out. La Liga was already the major European league that saw the fewest shots and the lowest expected goals (xG) per match, and those figures have fallen even further since the restart. The number of shots has dropped from a pre-stoppage average of 22.36 per match down to 19.91, while the average xG has gone down from 2.12 to 1.92. It seems teams have struggled to advance into dangerous areas as often as before. Completed passes within 20 metres of the opposition goal are down by over 15%.

The aggressiveness with which teams are contesting possession has also decreased. On average, teams are allowing over one extra pass for each attempt to break up opposition passing chains. The number of pressures in the attacking half and the proportion of aggressive actions (tackles, pressure events and fouls recorded within two seconds of an opposition ball receipt) are also down.
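To put those restart figures in proportion, here is a minimal Python sketch that converts the quoted per-match averages into relative drops and a crude xG-per-shot figure. Dividing the average xG by the average shot count is our own simplification (it ignores how shots are distributed across matches), and the names used are hypothetical.

```python
def relative_drop(before, after):
    """Percentage fall from `before` to `after`."""
    return (before - after) / before * 100


# Per-match league averages quoted in the text.
shots_before, shots_after = 22.36, 19.91
xg_before, xg_after = 2.12, 1.92

shot_drop = relative_drop(shots_before, shots_after)
xg_drop = relative_drop(xg_before, xg_after)

# Crude shot-quality proxy: average xG divided by average shots.
xg_per_shot_before = xg_before / shots_before
xg_per_shot_after = xg_after / shots_after

print(f"shots: -{shot_drop:.1f}%  xG: -{xg_drop:.1f}%")
print(f"xG per shot: {xg_per_shot_before:.3f} -> {xg_per_shot_after:.3f}")
```

On these numbers shots fall by about 11% and xG by about 9%, so the average quality per shot is essentially unchanged at a little under 0.10 xG. Read that way, the decline looks like one of volume rather than quality, with the usual small-sample caveat.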
Llorente's New Role at Atlético Atlético Madrid have been in fine form since the restart, recording three wins and a draw to take advantage of slower starts from other Champions League contenders and move up into third, six points clear of Getafe in fifth. Diego Simeone’s side have had good underlying numbers all season, but a finishing slump that lasted all the way into early February meant that results didn’t follow. At that stage, they were running well over seven goals behind expectation, and all four of their primary forwards were underperforming their individual xG sums. Things do now seem to be evening themselves out. Atlético have performed four and a half goals ahead of expectation over their subsequent nine matches, yielding an unbeaten run of five wins and four draws. That surge primarily seems to be the result of a shift in fortune in front of goal, but Atlético’s results since the restart also have a fair bit to do with just how well Marcos Llorente has performed in an unfamiliar second striker role. We got a glimpse of the possibilities in Atlético’s dramatic Champions League triumph over Liverpool just before the shutdown. There, Llorente was involved in all three of their extra-time goals, scoring twice and then slipping Álvaro Morata in behind for the third. But few could have expected him to resume the campaign in an offensive role. A defensive midfielder by trade, there had been little in his performances at previous clubs or even at Atlético, where he hadn’t slotted in quite as well as expected, to suggest he had the skillset to thrive further up the pitch. His passing has always been more neat and tidy than incisive, and he’s never been even a medium-volume dribbler. Simeone, though, had seen something. 
“After watching him in the training sessions, we decided to push him further forward against Liverpool and we discovered a player with different characteristics to the others in that position,” he explained after Llorente provided his third assist since the restart in Atlético’s 1-0 win away to Levante on Tuesday. A repeat of the surprisingly deft footwork that led to his goal against Osasuna this time produced space for a cutback that Bruno González deflected into his own net. In just 231 minutes of football, Llorente has already completed double the number of dribbles he did in the previous 812, and has set up more chances than in the previous season and a half combined. After two seasons without completing a through ball, he has slipped through two in a week and a half. We are working with a super small sample size here, and it remains to be seen how much of this will hold over a larger one, but it appears that Simeone might just have engineered himself a new forward.

Betis Sack Rubi

It had been coming for a while, and finally the axe fell. Real Betis parted ways with head coach Rubi on Sunday following a 1-0 defeat away to Athletic Club that left them without a win since the restart and with just one in their last 10 matches. Rubi produced an excellent seventh-place finish at Espanyol last season but was unable to replicate the neat and progressive football of that side with what was, on paper at least, a more talented squad at Betis. The underlying numbers were okay -- upper, rather than lower, middle pack -- but not enough to offer strong support for his continuation given the club’s budget and pre-season pretensions. Rubi was never able to get on top of the defensive issues that led to his team conceding more goals than all but Espanyol and Mallorca, both of them relegation candidates. He also failed to derive any sort of output from summer arrival Borja Iglesias, signed for €28 million after an impressive season alongside Rubi at Espanyol.
This is just pitiful.

The reintroduction of elements of the approach of his predecessor Quique Setién did power a promising run of results through the back end of 2019 into the new year that was backed up by strong underlying numbers. But that swiftly petered out as attacking output cratered and their defensive numbers began to waver. Over the course of Rubi’s final 10 matches in charge, Betis were back to a pretty much even xG difference. More damagingly, they took just seven points -- alongside Eibar, the joint-lowest mark in the league.

Rubi leaves Betis down in 14th, clear of the bottom three but with no realistic chance of European qualification. Another reset is in order. It seems that however much the directors talk of modernising the club, this remains Betis: four head coaches and various backstage reshuffles in four seasons is about par for the course at the Benito Villamarín. Alexis Trujillo takes charge until the end of the season, but who comes next? Javi Gracia, Manuel Pellegrini, maybe even Unai Emery?
The Premier League follows in the footsteps of the Bundesliga and La Liga in returning to action this week. Given it has been three months since a ball was last kicked in anger, perhaps it’s time to remind ourselves of how things stand in the major contests at both ends of the table. The Title Let’s be honest, Liverpool are going to lift the Premier League trophy. They’ve crushed this season, winning 27 of their 29 matches to open up a 25-point lead at the top. They need just two wins (or some other combination of results that yields six points) from their remaining nine to claim their first league title in 30 years. European Places This is where it gets interesting, particularly if Manchester City’s European ban stands. Remove them from the equation, and do likewise with the third-placed Leicester side who, despite a downward trend in their underlying numbers, enjoy a 10-point cushion over the first non-qualifying position, and things look very spicy indeed in the competition for the two remaining Champions League spots. Six teams are separated by just eight points; Arsenal, the side at the bottom of that group, have a game in hand. [table id=83 /] Results and the underlying numbers over the course of the season to date have Chelsea pegged as a frontrunner, but after that it gets a little more murky. Manchester United and Wolverhampton Wanderers are separated by just two points, and are pretty much neck and neck on expected goal (xG) difference. The underlying numbers over the last 10 matches suggest the momentum is with Wolves... ...but it is United who have taken more points in that time: 17 to 13. These things often take much longer than nine matches to shake themselves out, especially when the differences are fairly minimal. United will also benefit from the return from injury of Marcus Rashford, their primary attacking contributor. 
Tottenham Hotspur have also profited from the pause in action, as it has allowed them to recover some much-needed firepower in the form of Harry Kane and Heung-Min Son (as well as January recruit Steven Bergwijn). That alone is unlikely to turn things around for a side who, after an initial surge under Jose Mourinho, had combined mid-table results with downright sad underlying numbers in the lead-up to the league stoppage, but it certainly won’t hurt their chances. Sheffield United are a couple of points ahead of Spurs and have a game in hand that, if won, would see them leapfrog Manchester United into fifth. They’ve already defied the odds by coming up and immediately establishing themselves as a very solid top-flight side, and while their numbers aren’t as good as those of some of the teams around them, they are still very much in the mix. Arsenal are at the back end of the group, and haven’t really shown enough signs of concrete improvement since Mikel Arteta replaced Unai Emery in December to give reason to believe they will end the campaign strongly.

The two Champions League places and the two currently available Europa League places are likely to be filled by four of those six teams. But if the outcome of the FA Cup results in eighth place also yielding a spot in the latter competition, it could open the way to other challengers, notably an Everton side who have so far performed well below their underlying numbers.

Relegation

At the bottom of the table, there are probably six teams fighting to avoid filling the three relegation spots. Southampton, five points up the road from 15th-placed Brighton and seven points clear of the bottom three, will likely be okay.

[table id=84 /]

Norwich are four points adrift at the bottom and it’s honestly difficult to see them making up the six points that separate them from safety. Their attempt to transplant their Championship approach and (largely the same) personnel to the top flight hasn’t really worked out for them.
Aston Villa have had awful underlying numbers all season, particularly on the defensive side, where their concession of almost 18 shots per match has inevitably led to the worst xG conceded figure in the league. When you are conceding an average of two goals per match, you need your attack to be pretty damn good. Villa’s is only okay. They do, though, have a game in hand over those around them.

Then comes the real action, four teams separated by just two points: Bournemouth, Watford, West Ham and Brighton. It is an incredibly hard race to call, and in truth, it wouldn’t be all that surprising if a similar number of points covered them come the end of the campaign.

Watford have the momentum. Since Nigel Pearson became their third head coach of the season in early December, they’ve matched top-half results to top-half underlying numbers. There has been clear improvement, especially in defence.

David Moyes has not yet had the same effect at West Ham. Results and underlying numbers have actually got marginally worse there since he replaced Manuel Pellegrini at the end of December. The stylistic changes are clear: deeper and more passive defending, and a greater reliance on transitional phases to create shots. The end result is all but unchanged.

Bournemouth are one of the worst six teams by the underlying numbers and slipped into the bottom three off the back of three defeats and a draw prior to the league stoppage. They don’t inspire a great deal of confidence at either end of the pitch, and may have to lean on their set-piece ingenuity to steer themselves to safety.

Brighton are yet to win a match this calendar year, and in fact have won only once in their last 14. No team have taken fewer points in 2020. While it is fairly easy to pinpoint the stylistic changes made since Graham Potter took charge last summer, they are not translating into results.
Their underlying numbers are those of a mid-table side, and that would normally be enough to suggest they’ll probably be okay. But then you look at their remaining schedule, which includes encounters with four of the current top five across their next six matches, and it’s hard to be quite so sure.
La Liga is back. Nearly a month after the Bundesliga became the first of the major European leagues to resume, the Spanish top flight returns with an enticing set of fixtures that begin with a city derby between Sevilla and Real Betis on Thursday evening. Up and down the league, there is still much to be decided across the remaining 11 rounds of action. The Title Race Realistically, Barcelona and Real Madrid are the only two teams in the title race. Barcelona lead the way, two points clear of Madrid and a further nine ahead of Sevilla in third. The momentum would appear to be with the leaders. Since Quique Setién replaced Ernesto Valverde in January, they’ve taken more points than any other side in La Liga. Over that fairly small sample size of eight matches, they’ve also had the best expected goal difference, and by some distance: Setién’s heavily possession-based style should also be a good fit for the hectic fixture list that, if everything runs smoothly, will see the teams play their remaining 11 matches over the course of just over five weeks of action. There are some counterpoints. On paper, Madrid look to have the easier run-in. The pause has also given Eden Hazard the opportunity to recover from what seemed likely to be a season-ending injury. With Marco Asensio likewise closing in on a return, Madrid look better equipped than they might otherwise have been to go toe-to-toe for the title. European Places The contest for the two remaining Champions League places will be a thrilling watch. Just two points currently span Sevilla in third, Real Sociedad in fourth, Getafe in fifth and Atlético Madrid in sixth. Sevilla possibly have a slight edge. January signings Youssef En-Nesyri and Suso have added some needed variety to their attack, and while both Getafe and Real Sociedad have gained ground on them since the turn of the year, Sevilla’s underlying numbers have remained strong. But this is a very difficult race to call. 
Real Sociedad have won admirers as a young and vibrant team playing attractive football, but they look to have the toughest schedule of any of the top-four aspirants. Getafe have been on a tear since the turn of the year, but can they maintain their intense play style through the crammed fixture list? With their finishing slump seemingly behind them, and with a fairly accessible run-in, can Atlético barge into the top four?

What seems clearer is that the race is limited to those four teams. Valencia are only actually three points back from Atlético but seem to have had more than their fair share of fortune. They’ve consistently over-performed their poor underlying numbers. Valencia’s numbers are trending in the wrong direction, and it seems improbable that a team taking fewer than nine shots a match while conceding nearly 15 can continue to get the results necessary to keep pace with those ahead.

In fact, their seventh place, a position that could yield a Europa League spot depending on the outcomes of domestic and continental cup competitions, could come under threat from behind. Villarreal have been frustratingly inconsistent but have enough quality in attack to make up the four-point difference if things go their way. Granada have impressed on their return to the top flight and made a good start to 2020. But that’s probably it. Athletic Club have one of the best defensive records in the league but their attacking output is below average, and they’ve benefited from a positive swing versus their underlying numbers. They’ve also taken just 11 points from their last 12 fixtures. The Copa del Rey final against Basque rivals Real Sociedad would seem to offer their best hope of European qualification.

Relegation

There are three relegation places to be filled and six teams trying to avoid them.
While there is still an outside chance that Alavés or Levante might get dragged into it, the battle against the drop is likely to be contested by Espanyol, Leganés, Mallorca, Celta Vigo, Eibar and Real Valladolid. Espanyol are bottom of the pile. Results have improved considerably since Abelardo became their third head coach of the campaign late into December, but there hasn’t been an accompanying improvement in their underlying numbers. In that time, they’ve been one of the league’s worst sides: With other teams towards the foot of the table also picking up good points in the lead up to the league stoppage, they are still six points shy of safety. All is not lost. There is still over a quarter of the campaign to be played. But Espanyol are not in a good position right now. Second from bottom are Leganés. They’ve had mid-table underlying numbers all season but have consistently underperformed those numbers at both ends of the pitch. Their luck may yet turn, but the January departures of En-Nesyri and Martin Braithwaite, between them scorers of almost half of their league-low tally of 21 goals, certainly didn’t help their cause. Next up are Mallorca, Celta Vigo and Eibar, all separated by just two points between 18th and 16th. Mallorca have shown signs of improvement since the turn of the year, particularly in defence, but they have one of the hardest closing schedules and a thin squad. Things look brighter for Celta Vigo, who have been a lot better under Óscar García and added players in three key positions during the January window. Eibar might be in trouble. While they’ve improved upon their pitiful early season performances, they’ve still been one of the worst three teams in the division by the underlying numbers since the turn of the year. A combination of the league’s oldest squad and an aggressive play style is unlikely to mesh well with the condensed schedule. Valladolid have a four-point cushion over the last relegation place. 
While they are far from home and dry, if they can continue to pick up points at their current rate -- which seems doable considering they are performing pretty much exactly in line with their underlying numbers -- it is unlikely that three teams will overtake them.
We complete our data history of the European Cup with the all-Bundesliga final of 2013. After seeing off the Spanish giants in their semi-finals, Bayern Munich and Borussia Dortmund met at Wembley, each seeking to become the first German winner in over a decade. This is the sixth and final part of the series. We’ve previously covered:

- 1960: Real Madrid 7 - 3 Eintracht Frankfurt
- 1972: Ajax 2 - 0 Inter Milan
- 1989: AC Milan 4 - 0 Steaua Bucharest
- 1995: Ajax 1 - 0 AC Milan
- 2009: Barcelona 2 - 0 Manchester United

Bayern were the favourites coming into the match, having run away with the Bundesliga and traversed a difficult route to the final that included a historic 7-0 aggregate thrashing of Barcelona in the final four; Dortmund had come ever so close to elimination against Málaga in the last eight before then seeing off Real Madrid to make it through to the final. Bayern had been extremely unfortunate to lose out to Chelsea in the 2012 final and were seeking to make amends and send coach Jupp Heynckes off into retirement on a high with victory.

New Style, Vintage Results

Just as we seemed to have settled into a stylistic tussle between patient possession and deep block defending, along came the Germans to upset the apfelkarren. Suddenly, the attention of the footballing world shifted to the Bundesliga. Gegenpressing (later translated as counterpressing) firmly entered the football lexicon and there was much talk of the importance of transitional phases of play. The meeting of two German sides at Wembley produced a high-paced encounter that was actually closer in style and output to the 1960 final than any of the others we’ve covered in this series. The shot count was nowhere near as high, but the 2013 final nevertheless sits second only to 1960 in terms of the expected goals (xG) total, although that was heavily tilted towards Bayern. There was also some of the frantic, back-and-forth play of that early final on display.
The average speed of attack was the fastest of all the finals we’ve covered, faster still than in 1960. Dortmund were especially swift to transition forward after gaining possession. The average pace towards goal for teams in last year’s Champions League was 2.53 metres per second; Dortmund raced forward at a rate of 4.61 metres per second.

Not that it led to a particularly dangerous set of shots. Jurgen Klopp’s team began on the front foot, getting off six efforts on goal before Bayern had even mustered one, and accumulated 12 over the course of the 90 minutes. But even with Robert Lewandowski, scorer of all four goals in Dortmund’s 4-1 thrashing of Real Madrid in the first leg of their semi-final and impeccable in his use of his body to shield the ball and turn defenders, and an effervescent Marco Reus among their starters, they not only managed five fewer shots than Bayern, but the average quality of those shots was also far below that of their opponents’. Despite a heavily aerial attack, there was very little fat on the Bayern shot map. Remove Dortmund’s penalty from the equation and they created under one expected goal.

It may have taken Bayern until the 89th minute to score their winner, when Arjen Robben skipped between two defenders and finished neatly to finally enjoy success in a major final after two failed attempts at the Champions League with Bayern and a World Cup final defeat with the Netherlands, but it was clearly deserved. Robben and Thomas Müller had been involved in much of their best play.

The pace with which the two teams attacked saw them regularly turn over possession. Even Bayern’s more patient buildup in deeper areas usually eventually resulted in a long ball forward from one of the two central defenders. The final featured the lowest passing completion percentage of any since 1960, with just a 71% completion rate -- nearly four percentage points lower than the next lowest.
Dortmund’s 65% rate was the first time since 1960 that a team had dipped below 70%. Not only did it stand out in comparison to the other finals in this series but also within the context of contemporary finals. The completion rate was the lowest of all those contested in the 2010s. [table id=82 /] And that’s the thing. For all that this was heralded as a new dawn in football, it didn’t start a revolution nor did it herald a new era of German dominance. The national team won the following year’s World Cup, but did so with a more possession-dependent style of play. At club level, Spain came back strongly, with Barcelona and Real Madrid lifting the Champions League trophy in each of the subsequent five seasons -- four times in Madrid’s case. Germany is yet to provide another finalist. Such is the widespread availability of footage in the modern age that even before Bayern and Dortmund took to the pitch at Wembley, their ideas had already been acutely analysed and elements incorporated elsewhere. They didn’t enjoy the same sort of extended advantage that a novel play style afforded Inter Milan in the 1960s or Ajax in the 1970s, for example. The totals for counterpressures and counterpressures in the respective attacking thirds in this match fell on or below the average points for those metrics during last season’s Champions League. What was once unique quickly became commonplace. Pep Guardiola’s arrival at Bayern in the summer of 2013 and some of his innovations, including narrowly positioned full-backs, also provided ready examples of how possession-based teams might seek to better protect themselves against rapid transitions. Add all that up and this final almost feels like a rapidly resolved glitch in the system. Dangerous Bayern Corners This Bayern side were a real force from set-pieces. Two of the goals in their semi-final rout of Barcelona had come from them, and they also created numerous chances from corners in the final. 
Seven shots from eight corners and pretty much an entire expected goal. There wasn’t all that much sign of some of the more advanced routines we see these days, although a neat early free-kick scheme saw Thomas Müller drop off to receive a central pass and lay wide for a cross headed on goal by Mario Mandzukic. The same player was unable to adjust his body sufficiently to successfully convert a near-post flick from a right-wing corner. But in Mandzukic, Müller and Javi Martínez, Bayern had three players very much capable of winning individual duels to get on the end of deliveries. ------------------------------ We hope you’ve enjoyed this series. Alongside our release of the Arsenal Invincibles data earlier this week, we also made our data from each of the last 20 Champions League finals freely available. If you fancy digging into some of the competition’s recent history, all the details for accessing the data can be found here: https://statsbomb.com/academy/ And a complete primer (in English and Espanol) on how to work with the data via StatsBombR is here: https://statsbomb.com/2019/07/messi-data-release-part-1-working-with-statsbomb-data-in-r/
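For readers who would rather work in Python than R, the sketch below shows one way a pass completion rate like the ones quoted in this piece could be computed from event records. It assumes the StatsBomb open-data JSON convention, in which a pass event has `type.name` equal to `"Pass"` and a completed pass carries no `outcome` entry under `pass`. The sample events are invented for illustration and are not taken from the 2013 final, and the function name is hypothetical.

```python
from collections import defaultdict


def pass_completion_by_team(events):
    """Return {team name: share of attempted passes that were completed}."""
    attempted = defaultdict(int)
    completed = defaultdict(int)
    for event in events:
        # Skip everything that is not a pass (shots, pressures, and so on).
        if event.get("type", {}).get("name") != "Pass":
            continue
        team = event["team"]["name"]
        attempted[team] += 1
        # Assumed convention: no "outcome" key means the pass was completed.
        if "outcome" not in event.get("pass", {}):
            completed[team] += 1
    return {team: completed[team] / attempted[team] for team in attempted}


# Invented sample events, shaped like StatsBomb open-data records.
SAMPLE_EVENTS = [
    {"type": {"name": "Pass"}, "team": {"name": "Team A"}, "pass": {}},
    {"type": {"name": "Pass"}, "team": {"name": "Team A"}, "pass": {}},
    {"type": {"name": "Pass"}, "team": {"name": "Team A"},
     "pass": {"outcome": {"name": "Incomplete"}}},
    {"type": {"name": "Shot"}, "team": {"name": "Team A"}},
    {"type": {"name": "Pass"}, "team": {"name": "Team B"}, "pass": {}},
    {"type": {"name": "Pass"}, "team": {"name": "Team B"},
     "pass": {"outcome": {"name": "Out"}}},
]

rates = pass_completion_by_team(SAMPLE_EVENTS)
for team, rate in rates.items():
    print(f"{team}: {rate:.0%}")
```

The same function could be pointed at the event list for any match in the free data once it has been loaded as a list of dictionaries, though exact definitions of a completed pass can vary between providers and analyses.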
As those of you who follow me on social media are aware, earlier this year we started working on The Invincibles Project. The idea behind this was to collect all of the data from this historic season to be able to look at it through a modern lens. I had initially pitched this as a follow-up project after the Messi Data Biography as something different, and another way of unlocking football's history.

As an Arsenal fan, I found the whole thing exciting. Prime Thierry Henry! Doing things like this:

The majesty of Robert Pires. Taking bodies! Dennis Bergkamp! Patrick Vieira! Jose Antonio Reyes! Kolo Kolo Toure! Sol Campbell! Mad Jens! *Highbury roars* OMG SO EXCITING.

Cashley. *crickets chirping*

Also as an Arsenal fan, I know that other Arsenal fans could use a little joy in their lives and this seemed like the only way we were getting anything fun out of the Gunners in 2019-20.

We started collecting this with an eye to releasing it side by side with the data set from a different red team, should they manage to finish their season undefeated. Sorry Liverpool fans, due to circumstances beyond our control, that data release slipped through our fingers. You'll have to settle for merely a league title and one of the largest title-winning margins in history.

The Problem

In order to collect data, we need to have video. It was fortunate for us that Lionel Messi has played his entire career for Barcelona, because that is one of the few teams in the world that has historic video available on the internet from pre-2010 without needing to jump through a million hoops. That doesn't mean that getting all of the video to reconstruct Messi's club career was easy - far from it. It was merely doable.

Arsenal? The only undefeated season in Premier League history? You would think this would be at least as simple as sourcing 15 seasons of Messi, right? It was not.

We managed to get about half the 2003-04 season from the usual sources of football video history. And then we hit a wall.
Our man in Spain and historic video expert Pablo Rodriguez then went to work, checking with various and sundry collectors that he knows who have large archives of historic, important football video. Through these wonderful people and the standard exchange of goods and services, we were able to get to 32 matches of video. And then we hit another wall.

Why? Well, as Andrew Mangan of Arseblog reminded me, not all matches during that time period were broadcast on TV. In the modern day, every Premier League match is broadcast to air in multiple countries, which makes it easy to grab that video and store it away on a giant hard drive. Back then? A number of 3PM matches on Saturdays were simply never broadcast. (At least to our knowledge.) Which means that the collectors would not have that video unless they somehow tapped into different sources.

We checked with Arsenal. I've been lucky enough to meet people that work for the club over the years, and we figured maybe they would let us have access to the video to collaborate on the data release and some cool stuff with club media. And they totally would have... Except they didn't have the video either.

Someone who worked for Prozone back in the day suggested that the opponents might have those videos, as they would have been delivered by courier as part of their service. But that ran into a variety of snags, including the fact that football clubs change personnel on this end with remarkable regularity. Finding out who had the archive, whether it could be accessed, and even who to talk to proved insurmountable for us.

The other problem here is the transition from analog to digital. Pretty much all archives back then were tape archives that would later need to be digitised so the match would be preserved for history. Rob Bateman of Opta tells the tale of trying to collect old Premier League matches from the 90s and being surrounded by crumbling video tape from the league's first decade.
These Arsenal matches came right at the tail end of that period, and my understanding is that the PL has started to archive its history as much as possible, but it's still very much a work in progress.

Finally you hit the problem of a license fee. We got in touch with the archive service with a willingness to pay a fee to obtain the final six matches needed to complete the project. We were quoted a figure to license the video for the entire Arsenal season that frankly didn't make any sense to me, and certainly eclipsed my budget for a public service project. I wanted to get everyone a data gift to bring people some joy during the pandemic, but I didn't want to/could not pay the price of a car to make that happen.

The Premier League itself actually showed willingness to help us out, but as you can understand, they are rather busy with other priorities right now (like restarting the league in the middle of a viral pandemic) and suggested maybe we could revisit this when the world wasn't quite so mad? Which totally makes sense.

But I have an anniversary data release deadline, and thus here we are. Incomplete Invincibles.

Classics Data Pack 1

To make up for my own disappointment in not being able to complete this project, I added some extra matches I thought might interest people, including non-Arsenal fans. So what you are getting today as a gift from StatsBomb is a hefty little slice of football history, wrapped in the above-named package.

In addition to delivering 32 of 38 matches from the Arsenal 2003-04 Premier League season, we are also giving you UEFA Champions League Finals data from 2000-2019. The collection on those CL matches isn't finished yet, so they will trickle out to the repository gradually over the next week to complete the set.

Thank you to all of the fans out there who have supported StatsBomb over the years. Thank you to our customers who buy our products and give us feedback to make us better every day.
And thanks to Arsenal for a truly magnificent season and set of memories. It would be great if we could get some more of those sooner rather than later.

Information on how to access the data is here: https://statsbomb.com/academy/

A complete primer (in English and Español) on how to work with the data via StatsBombR is here: https://statsbomb.com/2019/07/messi-data-release-part-1-working-with-statsbomb-data-in-r/

*EDIT: A new, updated version of the R Guide can be found here: https://statsbomb.com/wp-content/uploads/2021/11/Working-with-R.pdf

The data comes with our standard non-commercial license that is usable for fan analysis and academic research. If you are a commercial entity that would like to use this data, get in touch with email@example.com and we can have a conversation.

All the best,

--Ted Knutson
CEO, StatsBomb

*If we get video and I still run StatsBomb, we will finish this project.