Meet the metric that could blow the soccer analytics game wide open

It looked done and dusted at Old Trafford last Sunday. After a game in which they sent endless probing crosses in from both flanks of the Fulham half, Manchester United were set to win their Premier League match 2-1 and pick up a much-needed three points in their long-shot quest for Champions League qualification.

Then came a freak moment, a save, and a point-blank header from Darren Bent deep into stoppage time, and the game ended in a 2-2 draw. Based on the raw shot statistics, this was a game one might presume United “should” have won: United took 31 shots to Fulham’s 6 and put 9 on target to Fulham’s 3.

This, of course, was just one game out of the 38 United will play in the league this season. Without looking at the match footage (never a good idea, guys), we can’t say whether Maarten Stekelenburg had a great night or United just suffered from a bit of rotten luck in front of goal. What matters statistically is whether United outshoot their opponents on a regular basis, from game to game and season to season (at the moment, they don’t). This is the essence of Total Shots Ratio (TSR): a team’s share of all the shots taken in its matches.
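As a minimal sketch of the arithmetic, TSR is just shots for divided by all shots (for plus against). The function names below are illustrative, and pooling shot counts across games before dividing, rather than averaging per-game ratios, is an assumption about how a season figure might be built.

```python
from typing import Iterable, Tuple


def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all shots taken by the team: for / (for + against)."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("TSR is undefined when no shots were taken")
    return shots_for / total


def season_tsr(games: Iterable[Tuple[int, int]]) -> float:
    """Pool (shots_for, shots_against) across games, then take one ratio."""
    games = list(games)
    return total_shots_ratio(sum(f for f, _ in games), sum(a for _, a in games))


# The United vs Fulham match above: 31 shots to 6.
print(round(total_shots_ratio(31, 6), 3))  # 0.838
# A hypothetical three-game stretch, as (shots for, shots against).
print(round(season_tsr([(31, 6), (10, 15), (12, 12)]), 3))  # 0.616
```

A single lopsided game like the Fulham draw says little on its own; the pooled figure is the one that tells you whether a team outshoots opponents on a regular basis.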

Even over the long term, though, raw TSR numbers throw up outliers. A quick look at Ben Pugsley’s awesome sortable tables reveals some interesting gaps between TSR and table position. Arsenal, for example, are second in the league on points but sit 8th in TSR.

There are several ways we can explain away this variation. One is to compensate for game states, known in hockey as score effects. Basically, the data suggests that teams leading by one goal shoot less often but with a higher conversion rate, while teams trailing by one shoot more but with less accuracy. You can control for this effect by looking at TSR only when the score is tied, as Pugsley does with his tables. Using Arsenal as our example, they are in fact third in TSR in tied game states. Moreover, Arsenal have spent the third-most time of any team in the league in a winning position, which has likely pushed their TSR down a bit.
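To make the game-state idea concrete, here is a hypothetical sketch that splits TSR by the score at the moment each shot was taken. The `Shot` record and the three-bucket simplification (leading, tied, trailing) are assumptions for illustration, not Pugsley’s actual method, and the shot log is invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Shot:
    by_us: bool      # True if our team took the shot, False if the opponent did
    score_diff: int  # our goals minus the opponent's at the moment of the shot


def game_state(score_diff: int) -> str:
    # Simplification: collapse every lead or deficit into three buckets.
    if score_diff > 0:
        return "leading"
    if score_diff < 0:
        return "trailing"
    return "tied"


def tsr_by_game_state(shots: List[Shot]) -> Dict[str, float]:
    """TSR computed separately for each game state that appears in the log."""
    counts = defaultdict(lambda: [0, 0])  # state -> [shots for, shots against]
    for shot in shots:
        counts[game_state(shot.score_diff)][0 if shot.by_us else 1] += 1
    return {state: f / (f + a) for state, (f, a) in counts.items()}


# Hypothetical log: an even-ish spell while tied, then sitting back after scoring.
log = [Shot(True, 0), Shot(True, 0), Shot(False, 0),
       Shot(False, 1), Shot(True, 1), Shot(False, 1), Shot(False, 1)]
print(tsr_by_game_state(log))
```

In this invented log the team posts a TSR of about 0.67 while tied but only 0.25 while leading, so its overall TSR (3 of 7 shots, about 0.43) understates how it plays in level games.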

The other, more traditional method (if something only several years old can be called traditional) is to take a peek at a team’s PDO. What’s PDO? Well, I’ve written a go-to definition here, but essentially it’s just shooting percentage plus save percentage. It’s useful because both of these numbers (one more than the other) quickly regress to the mean. That just means they vary a lot from game to game (though not entirely at random), and so we say they’re largely a product of random variation, or ‘luck.’ So, once again, Arsenal’s PDO is 1127, which is well above the usual range (roughly 980-1020) around the league average of 1000. They’ve enjoyed some good fortune this season too.
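A rough sketch of the PDO calculation follows, assuming shooting and save percentages are both computed from shots on target and the sum is scaled by 1,000. Plugging in the United vs Fulham numbers from above (2 goals from 9 shots on target for United, 2 from 3 for Fulham) shows how wild a single-game PDO can be.

```python
def pdo(goals_for: int, sot_for: int, goals_against: int, sot_against: int) -> float:
    """PDO on a 1000 scale: (shooting % + save %) * 1000, using shots on target."""
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


# Single-game PDOs from the United vs Fulham match (9 and 3 shots on target, 2 goals each).
united = pdo(2, 9, 2, 3)
fulham = pdo(2, 3, 2, 9)
print(round(united), round(fulham), round(united + fulham))  # 556 1444 2000
```

Because one side’s save percentage mirrors the other side’s shooting percentage, the two teams’ PDOs in a match always sum to 2000, which is why PDO is centred on 1000 and why extreme values tend to drift back toward it.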

Yet stepping back from the raw numbers, something here doesn’t feel intuitively right. Scoring goals in football clearly isn’t just a matter of luck, like flipping a coin over and over again. There is quite a big difference between the kind of chance Bent finished in that United game and a shot from the halfway line. One is much more clear-cut (to use Opta’s language) than the other. This is where we get into the Next Big Thing in football stats: Expected Goals, or ExpGs.

Now, while some analysts refer to a specific metric when they use ExpG, I think it’s more helpful to think of ExpG in a more general sense: as any method that incorporates shot characteristics, like type or location, into raw shot data. Why? Because, duh, there are certain areas of the pitch you’re more likely to score from than others (see Bent’s goal for Fulham). We know the average conversion rates for shots from these areas and of these types, so we can compare both teams and players against them. We can also see whether teams are creating better scoring opportunities on a consistent basis.
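In its simplest form, an ExpG model of this kind assigns each shot the average conversion rate for its location and sums them. The zone names and conversion rates below are made-up placeholders, not Opta figures or any published model; real models estimate rates from large shot samples and usually add shot type and other characteristics.

```python
from typing import Dict, List

# Hypothetical average conversion rates by shot zone (placeholders, not real data).
ZONE_CONVERSION: Dict[str, float] = {
    "six_yard_box": 0.30,
    "central_box": 0.15,
    "wide_box": 0.05,
    "outside_box": 0.03,
}


def expected_goals(shot_zones: List[str], rates: Dict[str, float] = ZONE_CONVERSION) -> float:
    """Sum the average conversion rate of the zone each shot was taken from."""
    return sum(rates[zone] for zone in shot_zones)


# Hypothetical match: 15 speculative shots against 3 from close range.
speculative = ["outside_box"] * 10 + ["wide_box"] * 5
close_range = ["six_yard_box", "central_box", "central_box"]
print(f"{expected_goals(speculative):.2f} vs {expected_goals(close_range):.2f}")
```

Under these placeholder rates, three close-range shots are worth slightly more ExpG than fifteen speculative ones, which is exactly the kind of difference raw shot counts miss.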

Now, my major initial concern about ExpGs mostly had to do with how some were leaping to use them to evaluate individual players (did they score more or fewer goals than their ExpG total?) before we knew anything (publicly, at least) about their repeatability as a metric.

Thankfully, we’re getting a bit of insight into this key question. For one, The Woolster looked at incorporating Opta’s admittedly observer-biased Clear Cut Chances (CCC) metric into TSR, and noted an improved correlation with team goal difference that remained relatively stable from season to season. The ever-interesting 11tegen11 blog also did the heavy lifting and found not only a strong correlation between Expected Goals Ratio (ExpGfor / (ExpGfor + ExpGagainst)) and both Points Per Game and Goal Difference, but also strong repeatability from season to season. Finally, Mark Taylor brilliantly illustrates how creating two really good chances in a match can be significantly more effective than creating 12 comparatively weak ones, which obviously carries implications for TSR.
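The sketch below puts two of these ideas side by side: the Expected Goals Ratio formula quoted above, and a hypothetical version of the few-good-chances versus many-weak-chances comparison. The chance probabilities are invented for illustration (they are not Mark Taylor’s numbers), and treating each chance as an independent event is a simplifying assumption.

```python
import math
from typing import List


def expg_ratio(expg_for: float, expg_against: float) -> float:
    """Expected Goals Ratio: ExpGfor / (ExpGfor + ExpGagainst)."""
    return expg_for / (expg_for + expg_against)


def prob_at_least_one_goal(chances: List[float]) -> float:
    """Chance of scoring at least once, assuming independent chances."""
    return 1 - math.prod(1 - p for p in chances)


# Hypothetical match: one side creates two good chances, the other twelve weak ones.
two_good = [0.4, 0.4]
twelve_weak = [0.05] * 12

expg_a, expg_b = sum(two_good), sum(twelve_weak)
tsr_a = len(two_good) / (len(two_good) + len(twelve_weak))

print(f"ExpG: {expg_a:.2f} vs {expg_b:.2f}")
print(f"P(score at least once): {prob_at_least_one_goal(two_good):.2f} "
      f"vs {prob_at_least_one_goal(twelve_weak):.2f}")
print(f"TSR of the two-chance side: {tsr_a:.2f}")
print(f"ExpG ratio of the two-chance side: {expg_ratio(expg_a, expg_b):.2f}")
```

With these made-up numbers, the two-chance side creates more ExpG (0.8 vs 0.6) and is more likely to score at least once (0.64 vs roughly 0.46), even though it takes only 2 of the 14 shots, a TSR of about 0.14.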

This paints a tentatively hopeful picture, I think, for getting a much more comprehensive view of how football clubs are performing beyond TSR and PDO. To that end, Michael Caley’s table adds yet another intriguing layer to team analysis by applying Shot Matrix data to each Premier League team. That allows us to take one more look at Arsenal.

And look! Arsenal are fourth in the league in shots in the ‘Danger Zone’ and first in shots on target from the same area. They may not be dominating their opponents in shooting, but they are taking more shots from dangerous areas of the pitch. This may be boosted by their time spent leading by one goal, but that’s the beauty of analytics: each metric is a tool to be used alongside the others to paint a more comprehensive picture.

If evidence emerges that this is repeatable across individual players (though I’m tentatively skeptical), well then we’re cooking with gas in a way that I think could blow the analytics field in football wide open.
