During the summer I cast my first analytical eye over goalkeepers in a piece on my old Statsbettor site; that article can be found here.  In that piece I analysed the saves made by Premier League goalkeepers during the 2012/13 season and concluded that Julio Cesar of QPR was the best shot-stopping keeper last season.

Just about the only other person within the analytics community who cares about the poor old goalkeeper is Paul Riley, and he produced this excellent piece yesterday.

As tends to be the difference in our styles, I am tackling the same subject in a slightly more quantitative way than Paul did yesterday, but I’ve no doubt that we’ll end up at almost the same place.

Regular readers of my writing will know that Constantinos Chappas and I have developed a number of models that numerically estimate the probability of a goal being scored.  We use Opta data served up by Squawka and / or StatsZone in order to estimate these probabilities.  As a result we do not have any data on defensive or goalkeeper positioning, but our models take into account whatever “on the ball” information is significant in determining the probability of a goal being scored.


In an article published last month I introduced a new metric that Constantinos and I calculated, ExpG2.  ExpG2 tells us the probability of a shot being scored AFTER the shot has been struck.
In short, ExpG2 is arrived at by looking at where and how the shot was taken and where it was eventually aimed for.  The further away from the centre of the goal the shot is placed the higher the ExpG2 value.

A blocked shot or a shot off target will always have an ExpG2 of zero because once these shots have been struck there is a 0% probability of them resulting directly in a goal.  A close range shot that is aimed towards the corners will have an ExpG2 value approaching 1.00.

According to my records there were 10,562 shots in last season’s EPL and 97 of them had an ExpG2 value of >0.90 (ie they had a greater than 90% chance of being scored).  89 of them were actually converted which suggests that the ExpG2 values are pretty well calibrated.
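As a rough sanity check on that calibration claim, the arithmetic can be sketched in Python. The shot counts are the ones quoted above; the function name is purely illustrative, and because the individual ExpG2 values are not published, the comparison can only be made against the lower bound of the bucket (0.90) rather than the bucket's true average.

```python
def conversion_rate(goals: int, shots: int) -> float:
    """Share of shots in a bucket that were actually scored."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    return goals / shots


# Figures quoted in the article: 97 shots had ExpG2 > 0.90, 89 were scored.
high_expg2_shots = 97
high_expg2_goals = 89
bucket_lower_bound = 0.90  # every shot in the bucket had ExpG2 above this

observed = conversion_rate(high_expg2_goals, high_expg2_shots)

# A well calibrated model should see the observed rate land inside the
# bucket, i.e. at or above its lower bound.
print(f"Observed conversion: {observed:.1%}")
print(f"Inside the >0.90 bucket: {observed >= bucket_lower_bound}")
```

The observed rate comes out at roughly 92%, which sits inside the >0.90 bucket. A fuller check would repeat this for every ExpG2 band, not just the top one.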

Rating Goalkeepers’ Saving Performances

After we created the ExpG2 values, their potential to rate the saves (or lack thereof) by goalkeepers became obvious to me.  In simple terms, we now have a numerical value of how difficult each save was to make.
At this point I must acknowledge once more that these values do not take into account defensive pressure.  However, the work that has gone into them and the detail included in their calculation means that I am not aware of any other goalkeeper save rating system that is more detailed.

The table below includes all Premier League goalkeepers that faced more than 100 shots last season.  It’s not important to know how many of those 100+ shots were on target, as we compare actual goals conceded with the expected number conceded.

[table id=33 /]

ExpG2 is the number of goals that our model expected the goalkeepers to concede based on the type of shot and shot placement
Goals is the number of goals conceded by the respective GKs
Save Efficiency is (ExpG2 / Goals) expressed as a percentage, with a higher number signifying that fewer goals than expected were conceded
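The Save Efficiency definition above can be sketched in a few lines of Python. The keeper totals below are hypothetical numbers picked for round arithmetic, not values from the table, and the function names are illustrative only.

```python
def save_efficiency(expg2: float, goals: int) -> float:
    """Save Efficiency = ExpG2 / Goals, as a ratio (1.0 means 100%)."""
    if goals <= 0:
        raise ValueError("Save Efficiency is undefined when no goals were conceded")
    return expg2 / goals


def goals_prevented(expg2: float, goals: int) -> float:
    """Expected goals conceded minus actual goals conceded."""
    return expg2 - goals


# Hypothetical keeper: the model expected 40.0 goals, he actually conceded 32.
expg2_total = 40.0
goals_conceded = 32

efficiency = save_efficiency(expg2_total, goals_conceded)
prevented = goals_prevented(expg2_total, goals_conceded)

print(f"Save Efficiency: {efficiency:.0%}")
print(f"Goals prevented: {prevented:.1f}")
```

A value above 100% means the keeper conceded fewer goals than the shots he faced would suggest, and a value below 100% means he conceded more.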

The first thing to note is that the 7 keepers at the top of the list are the same 7 that appeared atop my ranking table in my previous look at this topic, even though my method of evaluation was slightly different.  So that’s a good start.

The more exact data that has been used in this exercise sees David De Gea achieve the best shot stopping performances last season.  The shots he faced suggested that he should have conceded 8.6 goals more than he actually did; this along with United’s ruthless attacking efficiency last season would have been major reasons that help explain how Fergie could succeed where David Moyes is so far failing.

You’ve got to feel for Gerhard Tremmel: I have him in second place in terms of saves in the Premier League last season, yet he can’t even get a game.

Mignolet, Julio Cesar and Begovic were the only other goalkeepers to have a Save Efficiency of more than 125%.  I haven’t yet run the current season’s saves through ExpG2 (I intend to do this in the coming days) but it certainly feels that Mignolet has brought his super saving performances to Liverpool this season.

At the other end of the table you can see that Wigan struggled with whichever keeper they picked (as did Southampton).  Looking at the values returned by the two Man United keepers it is difficult to think that at one stage last season Lindegaard appeared to be the first choice custodian at Old Trafford.  Unfortunately for him, his and De Gea’s seasons went in entirely different directions.

In North London it is clear to see why Spurs decided to bring in Hugo Lloris to replace the aging (and ailing) Brad Friedel.  That said, at less than 100% Save Eff the Frenchman didn’t return sparkling numbers himself.

It strikes me as unusual to see the England GK (Joe Hart) having a worse than average record of saving shots last season; his Save Eff was just 95%.

Shots From Inside the Penalty Area

To give us a little more detail we can divide the shots faced between those coming from inside and from outside the penalty area.  This time we’ll concentrate on shots that were taken from inside the Penalty Area.  Again, the table below includes all goalkeepers that faced more than 100 shots from inside the area.

[table id=34 /]

For shots faced from inside the penalty area De Gea appears in just 4th position.  Three keepers (Begovic, Cesar and Mignolet) returned almost identical performances at the top of the rankings with Save Eff values of 127%.

Joe Hart noticeably climbs the rankings on this measure as he conceded 2.5 fewer goals from close-in shots than would have been expected.  Hart’s strength appears to be stopping the closer-in shots (perhaps due to good reflexes).
Reina underperformed badly when faced with shots from inside the area.  The information available to me suggests that he allowed 2 more goals than he should have done.  In both of these tables it can be seen that, from a shot stopping point of view, the replacement of Reina with Mignolet was a clear upgrade by Liverpool.

Shots From Outside the Penalty Area

The table below shows all goalkeepers that faced more than 50 shots from outside the penalty area last season.

[table id=35 /]

Due to the volatility inherent in long range shooting there is a much greater spread of performances than we saw with the inside penalty area saves.  As a result we perhaps need to be careful with any conclusions that we draw from this dataset, as a keeper allowing or stopping just 1 additional goal could have quite an influence on the values in this table.
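To illustrate why a single goal matters so much on a small sample, here is a minimal Python sketch. The totals are hypothetical and chosen only to contrast a small long-range sample with a larger all-shots sample; the helper names are illustrative.

```python
def save_efficiency(expg2: float, goals: int) -> float:
    """Save Efficiency as defined in the article: ExpG2 / Goals."""
    if goals <= 0:
        raise ValueError("undefined when no goals were conceded")
    return expg2 / goals


def swing_from_one_goal(expg2: float, goals: int) -> float:
    """Gap in Save Efficiency between conceding one fewer and one more goal."""
    return save_efficiency(expg2, goals - 1) - save_efficiency(expg2, goals + 1)


# Hypothetical totals: a small long-range sample versus a large all-shots one.
small_sample_swing = swing_from_one_goal(expg2=5.0, goals=5)
large_sample_swing = swing_from_one_goal(expg2=40.0, goals=40)

print(f"Swing on a 5-goal sample:  {small_sample_swing:.0%}")
print(f"Swing on a 40-goal sample: {large_sample_swing:.0%}")
```

With these assumed totals, one goal either way moves the small-sample figure by roughly 42 percentage points, against about 5 for the larger sample.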

Alex McCarthy of Reading did really well when dealing with long range shots as he only conceded 1 goal instead of the 4 that the shots aimed at him would have suggested.
Tremmel, Lloris and De Gea all conceded at rates of half (or less) of what would have been expected.

But let’s cast our eyes right to the foot of this table.  Joe Hart is in penultimate place in terms of Premier League goalkeepers saving long range shots.
Last season he conceded 9 goals from these shots, whereas our model suggests that only 5 of those efforts should have resulted in a goal.  This finding agrees with what Paul Riley concluded in his piece from yesterday: Joe Hart had a problem last season with stopping long range efforts.  With him being above average in terms of stopping closer shots, perhaps his footwork or anticipation lets him down on these long range efforts.

To put Hart’s (lack of) performance in context, Alex McCarthy conceded 1 goal and Joe Hart 9 last season from shots outside the area, when all the quantitative data tells us that Hart should have conceded just one more goal than McCarthy.  That’s quite the difference...


As with all the analytical work I perform, I ask myself whether I’m reporting on something that just happened or whether these values are repeatable and can thus be used in terms of picking teams or lining up transfer targets.
As we are just in the infancy of soccer analytics I can’t answer that question right now.  But with time, and the ability to run this type of analysis over previous seasons, I hope to be in a position to determine whether there is any repeatable material difference in the ability of Premier League goalkeepers to save the efforts headed their way.

  • http://fantasyformation.com Gummi

    Fascinating calculations.

    I find it interesting to see Boruc’s placement, given the amount of praise he has received for his performances this season. This perhaps indicates that Southampton’s defence or Pochettino’s organizational skills are bigger reasons for Southampton’s performance.

    • Colin Trainor


      What I would say is that these values are for last season. I hope to do something similar for the current season before the end of this week. It’ll be interesting to see how strong the correlation is.

  • Rich

    Whoah! This is fascinating: reason? Paul Robinson was England #1 for ages and was also massively vulnerable to long range shots. How odd that this should arise now with Hart.

    • Colin Trainor

      Interesting comparison Rich

  • Romu

    Hey, Colin. Can you give a more thorough explanation of the ExpG2 variable? To what degree of granularity does it take the x, y, z coordinates at which the ball was struck into account? To what degree of granularity does it take the coordinates of the ball as it enters the goal into account? Is there a way to incorporate the ball’s velocity and spin into the variable? Can you account for a potentially non-linear trajectory of the ball?

    • Colin Trainor

      Hi Romu,

      No, I can’t give a great deal about the exact calculations of ExpG2. All I will say is that it takes into account factors that can be identified from Squawka and StatsZone.
      Have a look at those sites for a shot and you’ll see what sort of variables you can find.

  • Romu

    Hi, Colin. I understand that there are reasons you’d want to keep your methods proprietary. However, coming from a background as a scientist, I can also vouch for the benefits of peer review and receiving commentary and critiques from peers. When others are aware of your methodology, they can try to reproduce your results (thus, potentially spotting errors), offer valuable insights and critiques, and find ways to build on your methods.

    It looks like you guys are doing some great work, and as you’ve pointed out, football analytics is an incipient field, so for the sake of the field’s progress, it would be great if methodology were described as openly, thoroughly, and accurately (preferably, in a separate methods section, as in scientific papers) as possible. In science and in most fields of research, it’s generally better to share too much than too little.

    • Colin Trainor

      Mr Sharklasers email address man,

      Thanks for the kind offer of allowing me to open up my data and methodologies to you, but in the field of sports betting it’s generally stupid to share too much than too little.

      • Romu

        I don’t care a bit for sports betting (which is not even permitted in my country, aside from a couple of cities), and I don’t want you to open up your methods to me, but for everyone who considers himself a researcher in sports analytics to open up their methods to anyone else.

        It’s sad that you prioritize sports betting over knowledge (much less that you suspect others of trying to capitalize on your methods). Just be aware that research published without insight into the methodology it uses cannot be taken seriously, and will generally remain a curiosity or sidenote.

        • Colin Trainor


          It’s only because I bet that I go to the bother and time to collect all the stats that I use in these articles. If I didn’t bet then these articles would never be created as the time required to collate and analyse the information is enormous.

          I either have the choice to publish as I currently am, which is to try to be helpful and identify patterns that are of interest to a great number of readers, but not giving away all the secrets of my sauce, or stop writing altogether.
          You may feel I’m being harsh, but it’s the way of the world.

          You might not care a bit for sports betting, but my wife and 3 kids certainly do. And I don’t intend jeopardising their standard of living for someone who wants me to be a researcher. I’m happy doing just what I’m doing right now.

  • Simon

    Hi Colin,

    First of all, nice read on an interesting topic. You state that these values do not incorporate defensive pressure. However, if I understood your article “defending in EPL” (great read as well by the way, really like what you’re doing on defensive measures) correctly, ExpG2 values could partly account for defensive pressure as defensive pressure could lead to more blocked, wide and aimed at the center shots, therefore lower ExpG2 values. I am aware you already stated that these goalkeeper save ratings are not solely based on goalkeeper performance, but is it fair to say that when looking at all shots instead of shots on target only, it is more of a defense save rating than a goalkeeper save rating?

    Nevertheless, really enjoying your effort and keep up the good work.


    • Colin Trainor

      Glad you enjoyed the read.

      No, ExpG2 does not account for defensive pressure the way you have suggested. If a shot is blocked or off target then the ExpG2 is zero as those shots don’t get as far as the goalkeeper.
      In my “Defending in the EPL” piece, I suggested that we could account for defensive pressure by looking at the DIFFERENCE between ExpG1 and ExpG2. ExpG1 is calculated in the millisecond before the ball is struck.

      Let’s take a far fetched example.

      If a team had 10 shots that seemed to be from very good positions and say they had an ExpG1 of 0.50 each, that means that the team would register an ExpG1 value of 5. But if every one of those shots were blocked or put wide then the ExpG2 would be zero.
      In that instance I would suggest that the difference between ExpG1 and ExpG2 of 5 is enormous and it appears that there must have been huge defensive pressure on the shots. Defensive pressure that was unable to be taken into account in the calculation of ExpG1.
      So I’m using the difference in ExpG1 and ExpG2 as a proxy for defensive pressure.

      Does that make any more sense?

  • Romu

    Fair enough, Colin. I’m not criticizing you for betting – just pointing out that I’m not trying to elicit knowledge for personal gain. I don’t want you to be a researcher – as a data modeler/analyst, you are a researcher. I was simply stating that your publications would be far more helpful if they were accompanied by a writeup of the methodology (and I don’t think anyone would dispute me on that). If you don’t want to publish that, it’s fine, but you responded with a somewhat snide remark (which had a tone as if someone is trying to steal something from you, or that I’m trying to personally benefit from your “secret sauce”) instead of calmly explaining your reasons, which was pretty obnoxious. Also, please don’t miss the irony that calling me out as “sharklasers” guy justifies my use of that anon email service.

    If you publish methodology, people could offer comments that could help you improve your system, which would surely benefit you (unless you’re convinced you’ve reached some pinnacle of your art; nobody does). It would then be up to you to publish or withhold that improved system. It’s not my intention to force you to do anything, but to give you some food for thought. We’re obviously coming from two different philosophies here.

    • Colin Trainor


      It’s obvious that you don’t know much about betting. I never suggested that you are trying to gain knowledge for personal gain; it’s not you that I’m worried about.

      Betting edges tend to be small, and very difficult to come by. The publication of anything related to my special sauce increases the probability that the market will move towards my true odds and thus my edge disappears.
      This is what happens in a betting market.

      My “snide remark” came after you asked for a second time to give some details. I had earlier replied to your first question stating “No, I can’t give a great deal about the exact calculations of ExpG2”.

      Might that simple answer not have done you, instead of lecturing me about the benefits of peer review? Then the rest of this need not have been required.

      Oh, and the “sharklasers” comment was my thoughts as to you asking me to divulge more information about my work, but you were asking that via an anonymous email account. That’s a little Alanis Morissette, non?

      • Romu

        “Oh, and the “sharklasers” comment was my thoughts as to you asking me to divulge more information about my work, but you were asking that via an anonymous email account. ”

        People stand to benefit from learning about others’ analytics methods. I don’t think you’ll accrue much benefit from learning any of my email addresses. Whenever I’ve published scientific papers, they’ve included detailed methodology sections (because no journal publishes papers without a detailed methodology section unless there is an exceptional reason for doing so).

        “My “snide remark” came after you asked for a second time to give some details. I had earlier replied to your first question stating “No, I can’t give a great deal about the exact calculations of ExpG2”. Might that simple answer not have done you, instead of lecturing me about the benefits of peer review? Then the rest of this need not have been required.”

        Right, you simply said that you can’t do something without divulging your reasoning. No, I typically don’t find much meaning in a stand-alone “yes” or “no” answer, so that sort of thing typically does not suffice for me.

        Your snide remark came after I tried to explain the benefits of communicating your methods to others; you’ll notice that my second post does not specifically ask you to give any details.

        “Betting edges tend to be small, and very difficult to come by. The publication of anything related to my special sauce increases the probability that the market will move towards my true odds and thus my edge disappears.”

        I’m aware of that – a similar principle is applicable to the stock market. As I stated above, divulging methodology might help you improve methods (which you could again choose to publish or not publish), as it could elicit meaningful input from others; you could improve your predictive power without compromising your odds by staying a step ahead of the crowd. It’s clear that you’re not interested in that approach, so that’s fine.

        I can certainly understand how tying your family’s standard of living to betting, as you claim to have done, would make you edgy, though, so I think I understand where all this is coming from.
