This is generally a blog about analytics and football, and I’ll get there soon, but I was thinking recently about what might have happened if Bill James, the father of the baseball stats movement, had been hired by a baseball team back in the early 80’s. (For those who want more reading about James that doesn’t directly involve reading about baseball stats, his Wikipedia bio is here, and Moneyball has lots of info as well.) James’ Baseball Abstracts, the early editions of which were written when he was a security guard at a pork and beans canning plant, were the spark for most of the modern statistics movement in baseball. It’s often hard to pinpoint the tipping points in time, but Baseball Abstract was fairly clearly one, and most of the modern baseball stats writers were hugely influenced by his work. That in turn led to guys like Billy Beane, General Manager of the Oakland A’s and primary focus of Moneyball, doing what he did, and the Red Sox hiring James as a consultant in 2003.
The first Baseball Abstract was in 1977. It would take 26 years for James to officially work in Major League Baseball. That is a long time to build a body of work and public base of support, and to teach and educate interested parties.
What if James had been adopted early on by some ownership group and the majority of his work had been kept secret, for use by just one team? Obviously, that team would have benefitted, but the world almost certainly would have been a poorer place. Don’t believe me? Here’s a quote from Moneyball that I still find shocking almost every time I read it.
“The legendary GM Branch Rickey employed a professional statistician named Allan Roth who helped to compose an article under Rickey’s byline in Life magazine in 1954 that argued the importance of on-base and slugging percentages over batting average.”
1954! And yet it would take nearly half a century more and the publication of Lewis’s book before most of the baseball world caught on to these core concepts.
The other thing to realize here is that the baseball stats movement eventually triggered movements in other major sports. Baseball had the luxury of detailed box scores with a century of usable (if not always useful) data available. And while James often lamented the quality of the data in his early work, which eventually led to the formation of STATS Inc, at least he had something to work on that consisted of more than batting average, home runs, and ERA.
Coming back to the question posed at the start: what happens if James doesn’t have all that influence? Does someone else pop up to take his place near the exact same time? Or does the absence of his high profile public work retard the development of baseball stats for another decade, and thus contribute to a drag on the development of statistical analysis in other sports as well? Obviously we’ll never know, but it certainly could have happened. All it would have taken was one smart owner reading an early Baseball Abstract and POOF, James would have been sucked into the sky, while decades of future work would be gone.
What’s the Point of Rambling About Baseball?
Bear with me as I draft in Gabe Desjardins for a little guest spot. Gabe is one of those brilliant hockey analytics guys that football has stolen a lot of concepts from over the last few years, but he was also writing original, fantastic work about football analytics back in 2010, before WhoScored or Squawka, or really any public data even existed. He was doing this stuff as a sideline to his hockey blog before most people even thought about it. For those who don’t follow the NHL, there have been a huge number of hires this summer by professional hockey teams targeting statistical bloggers. 
Two of the most prominent were Tyler Dellow and Eric Tulsky, but the Toronto Maple Leafs also hired the guys behind extraskater.com, the primary hockey data site, and in the process shut the whole site down so no one else could use it. It has been a crazy summer for anybody intelligent who was doing hockey analysis, but as Gabe explains here, it’s not really a new thing inside the walls of many teams.
Now as I mentioned, Gabe was crunching soccer stats before most of us knew data existed. And Gabe’s point about hockey analytics holds true for football as well. (I wanted it to exist back in 2004-05 so that I could work on it, but there were no public sources. In fact, I have a notebook from a trip to Prague in 2005 with the business outline for a company just like Opta to collect football stats and do analytics. I knew Opta existed because their name was on the Premier League home page, but all we ever saw of their work was the ridiculous Opta Index - a single, useless black box number evaluating a player.)
You know who else has been crunching data for ages? Gavin Fleig. He’s currently Global Lead for Talent Management at Manchester City, but he started out way back at Bolton Wanderers with Sam Allardyce, and they crunched data and built game models to help Bolton punch well above their weight for quite a few years. The same can be said for Steve Houston, currently of Sunderland, but formerly of Chelsea, Hamburg, and the Houston Rockets. And Ian Graham, currently Director of Research at Liverpool, but formerly at analytics company DecTech. (Graham actually has a small archive listing at the SoccerAnalysts Blog dating back mostly to 2011! The things you find on the internet...) There are a number of guys who have been working with soccer data inside of clubs for much longer than you might expect. Here’s an early Sloan Sports Conference soccer panel with all three of those guys plus Blake Wooster, formerly of Prozone and currently of 21stClub, discussing data stuff back in 2011. (I would embed it, but it's not on YouTube.)
Never heard of any of those guys before now? This wouldn’t be a huge surprise, especially if they don’t work specifically for your club, because they all work IN clubs. Therefore whatever their research uncovers is all top secret. 
What I found fascinating, however, is how clearly all of them communicate in that panel how they wish there were more statistical analysts around. Football has tons of sports science analysts and minuscule numbers of stats geeks doing good work. These guys want to know more, and they want to read you and me writing it. In fact, as Fleig explains in an interview here, that desire was a big part of the impetus behind Manchester City releasing their data set to the public back in 2012. They knew fans needed to have the data in order to be able to ask and answer interesting questions about how football works. And they probably knew from American sports that increasing data availability actually triggers an enormous increase in fan interest and involvement from certain groups of fans (basically anyone who might play fantasy football). It’s clear that all of these teams wanted more people doing research about the game and hopefully writing about it, so that they could learn additional useful info for free. The StatsBomb Twitter account is dense with followers who work for teams, either publicly or privately. I know for a fact that a surprising amount of the work done by guys in the analytics community has been read and adopted into football teams already. Free labor, plus competitive advantage if you know how to apply it. It’s hard to beat that sort of thing.
Two more people doing kickass stuff way back are Sarah Rudd and Ravi Ramineni. Ravi works for the Seattle Sounders in MLS now, while last I heard Sarah Rudd was a vice president at StatDNA. I also heard rumors that StatDNA was the analytics company purchased by Arsenal two years ago, but I can’t confirm that because despite looking all over the place, I never did see a name mentioned in the press. Assumption: Arsenal bought a fantastic analytics company who were totally ahead of the curve two years ago and who have probably continued to innovate since. 
Whether Wenger and co leverage that information is another question entirely.
Anyway, the point in all of this was that analytics usage in soccer/football isn’t new, but it’s also not terribly widespread. Some of the stuff we’ve done on StatsBomb might be new research, and was only possible after WhoScored and Squawka appeared, and after we took a ton of our own time to put that information into crunchable form. On the other hand, much of the work we’ve done on StatsBomb has probably already been done at many clubs throughout England. This is hugely frustrating for me, but despite reading everything I can get my hands on in this area for the last two years minimum, there just isn’t that much new research being published.
Why did we have to redo all the work? Because football doesn’t have a Bill James or Rob Neyer. Or Gabe Desjardins, Vic Ferrari, Tyler Dellow, and Eric Tulsky of hockey fame. Or Dave Berri, John Hollinger, Zach Lowe, and Kirk Goldsberry (plus many others) of the NBA. Without that sort of long-term public framework to stand on, analysts reinvent the wheel again and again as they start to ask and answer the interesting questions about how the game works. Bill James produced a book a year on this stuff from when he started in 1977, and a huge number of other writers sprang out of the interest in his work. Football pretty much has two books about stats, total (Soccernomics and The Numbers Game). Yes, the way the world publishes things is different and the total blog publication of what we've done would certainly stand up to any of James's busy years, but still... two books total.
Football might develop a Bill James in a few years, but I think the odds are against that happening, and here’s why. Unless you actually hate the game or they offer stupidly low compensation packages, it’s hard to turn down football clubs when they come calling. And honestly, if you are putting all this work into crunching the stats, you almost certainly love the game. 
So there you are, doing work, wishing for more/better data, and writing about it in public. You build a following, and start to have some interest from media and the occasional private email or DM from clubs asking about your work. Eventually that culminates in someone giving you a job offer to stop working in public, but to have a potentially real impact on an actual football club, with a fuckton of data including the secret stuff that in some cases no one really admits exists. Poof, much like the myriad of hockey bloggers this summer, you get sucked into the sky and your future (and in some cases your past) work disappears with you. It’s possible hockey research will experience a rough year or three now as well, since it will take time for new writers to fill the massive holes left by the most recent hiring sprees.
The funny thing is, if 10 Premier League teams immediately wanted to find and hire statistical bloggers, I’m not sure they could do it. And if another 10 clubs from the Championship and Spain and Germany wanted to find writing talent for immediate hires, they definitely would hit a wall when trying to hire amongst the football blogging community. There simply aren’t enough people out there writing, period, let alone enough who have displayed the kind of skill in analysis, math, and attention to detail that the hockey guys did.
Why? To sum up, I think it comes back to three things.
1) Huge problems finding detailed data to crunch. American sports have had these issues off and on at varying levels, but in Europe the data is extremely expensive to buy, most data providers don’t have a public face, and those that do always have to keep an eye on the bottom line. It’s doable, but it’s certainly not easy to get started.
2) No Bill James-type figure to push the development with a huge body of public work because...
3) Every time a potential figure shows up, they get hired by clubs. 
This creates a big competitive advantage for the hiring club, but it retards the development of the discipline as a whole.
Back to the title question – what if Bill James had been hired in the early 80’s? The development of baseball statistical analysis would probably have taken a lot longer to happen, which in turn might have delayed improvements across any number of other sports. In fact, you might say that baseball even a decade or two after taking James out of the ecosystem would have ended up looking a lot like football/soccer analysis does today.