The Uncertainty of Statistics – Follow Up

I’ve received a lot of feedback on the Uncertainty of Statistics article from a few days ago – thank you very much for your comments. Here are a couple of additional points that I probably should have made clearer in the original article:

The statistics themselves are not the issue
As I said in my opening paragraph, I am not calling the accuracy of the statistics provided by Opta into question at all, whether regarding Szczesny, Friedel or any other statistic they provide. Those guys are fantastic.

The statistic chosen for the article is illustrative only
Szczesny vs Friedel is not the issue, it is just an example which I used to illustrate my point. Here’s another example: pass completion percentages. These are often produced after matches, particularly for midfielders, and it is commonly accepted that a higher percentage means a better performance, evidence that a particular player “ran the game”. So here are two identical stats from two different players on 21st December:

20111230-193829.jpg

Both players completed 67 passes from 74 attempts, for a pass completion percentage of 90.5%. But even a cursory glance at the image shows that Carrick completed his passes much further from goal and much more frequently with his defence compared to Silva, whose passes were often attempted in the opponent’s half. This is the context which suggests that Silva was generally attempting more difficult passes than Carrick in these matches. This suggestion is partly backed up by the fact that Silva created two chances to Carrick’s one. The point is, given the context of the passes attempted by each player, I would expect Silva’s pass completion percentage to be lower than Carrick’s, but that is not necessarily a bad thing as I’d expect Silva’s passes to be more likely to lead to goal scoring opportunities than Carrick’s. A simple 90.5% pass completion percentage conceals all of this context, so it must be used in concert with other stats such as the location of each pass and the number of chances created to add the context which makes it more meaningful.
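To make the arithmetic concrete, here is a minimal Python sketch of the comparison above. The `PassingLine` class and its field names are hypothetical, invented purely for illustration. The only figures used are the ones quoted in the text: 67 completed passes from 74 attempts for each player, and two chances created by Silva against Carrick's one.

```python
from dataclasses import dataclass


@dataclass
class PassingLine:
    """Hypothetical container for one player's passing numbers in one match."""
    player: str
    completed: int
    attempted: int
    chances_created: int

    @property
    def completion_pct(self) -> float:
        # Completed passes as a percentage of attempted passes.
        if self.attempted <= 0:
            raise ValueError("attempted must be positive")
        return 100.0 * self.completed / self.attempted


# Figures quoted in the article for the matches on 21st December.
carrick = PassingLine("Carrick", completed=67, attempted=74, chances_created=1)
silva = PassingLine("Silva", completed=67, attempted=74, chances_created=2)

for line in (carrick, silva):
    print(f"{line.player}: {line.completion_pct:.1f}% completion, "
          f"{line.chances_created} chance(s) created")
```

The headline percentage comes out identical for both players. Only the extra column separates them, which is exactly the context the single number hides.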

The original conclusion may indeed have been correct
I argued in the article that it is impossible to compare goalkeepers by their shots saved statistics alone. It may well be the case that Friedel is 18 percentage points better at saving shots than Szczesny, i.e. the original conclusion is valid once the quality of the shots each keeper faced has been controlled for. My point is that other conclusions could also be validly drawn from that same statistic; for example that Arsenal’s defence allow better goalscoring chances than Tottenham’s, meaning that Szczesny faces a higher proportion of shots which are better goalscoring chances (and therefore less likely to be saved by the goalkeeper) than Friedel. Without considering the context, it’s impossible to determine which conclusion is more likely to be correct. That is the point of the article.

The Uncertainty of Statistics

As you know, this site makes use of the fantastic football statistics put together by Opta and delivered by the Stats Zone iPhone app. I am hugely in awe that while I sit on my sofa watching a match I can receive near-live statistics on each pass, shot and tackle taking place for less than the cost of a pint per season. This is the future and I’m glad to be a part of it. In fact, without Colm McMullan’s Total Football app (the precursor to Stats Zone), this website would probably not exist.

This increase in the availability of match statistics makes analysing football matches much easier, and has given rise to a number of armchair pundits such as myself. For example, want to prove who is the best goalkeeper? The shots saved percentages are right there. Which player is best at keeping possession? Just look at the pass completion percentages. It couldn’t be more simple with the facts available to all.

Unfortunately, it isn’t that simple. There is a reason why statistical analysis underpins most of science. Physicists at the Large Hadron Collider at CERN recently announced that they have effectively found the Higgs boson, but it will take a year to prove that the statistics suggesting it exists are genuine and that all other potential explanations have been ruled out. They are currently doing the same thing with the statistics “proving” that neutrinos travelled faster than the speed of light. Don’t worry, I won’t mention nuclear physics again, nor will I suggest that this rigour needs to be applied to Christopher Samba’s heading stats, but it illustrates how large and important a field statistical analysis is.

To take an example from a discussion I recently had on Twitter: it was quoted that “Brad Friedel is far outperforming [Wojciech] Szczesny” as their shots saved percentages are 79% and 61% respectively. I believe that it is impossible to draw that conclusion from those statistics, for reasons I will shortly explain. Before that I should note for full disclosure that I am an Arsenal fan, although I would make exactly the same argument had the statistics been the other way around.

The reason why that conclusion is impossible to make (or to put it another way, why that statistic is meaningless as a method of evaluating goalkeepers) is because it suffers from a lack of context. Basically, it assumes that all shots are equal, when there is a very simple thought experiment which disproves this:

Picture two goalkeepers trialling for a place in a team you manage. You put them in a match on opposite sides and sit back to evaluate their performances. Team A has a midfield full of ‘Charlie Adam’s, constantly taking shots from 40 yards out which bobble towards goal. They get ten of these shots on target, and score three goals. Meanwhile Team B has wingers like Bale and Valencia, constantly firing dangerous crosses down the corridor of uncertainty for their strikers to shoot from six yards out. They also get ten of these shots on target and score seven goals. Which goalkeeper do you select?

Team B’s goalkeeper only let in three goals, while Team A’s let in seven. Team B’s goalkeeper had a shots saved percentage of 70%, while Team A’s managed only 30%. However, I’d pick Team A’s goalkeeper all day long. Why? Because based on the context of the shots he had to face he wouldn’t be expected to save any of those point blank shots but he managed to keep three of them out, while Team B’s keeper should have thrown his cap on all ten of those weak long range shots but he somehow managed to let in three of them.
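The thought experiment can be written down as a small Python sketch. Everything here is illustrative: the `KeeperTrial` class and the "saves above expected" label are my own naming, not an Opta statistic. The `expected_saves` values come straight from the wording above, where the keeper facing point blank shots is expected to save none of them and the keeper facing long range shots is expected to save all ten.

```python
from dataclasses import dataclass


@dataclass
class KeeperTrial:
    """Hypothetical record of one goalkeeper's trial match."""
    name: str
    shots_faced: int
    goals_conceded: int
    expected_saves: float  # assumed: saves a keeper "should" make on these shots

    @property
    def saves(self) -> int:
        return self.shots_faced - self.goals_conceded

    @property
    def save_percentage(self) -> float:
        # The raw, context-free statistic.
        return 100.0 * self.saves / self.shots_faced

    @property
    def saves_above_expected(self) -> float:
        # Positive means the keeper did better than the shots suggested.
        return self.saves - self.expected_saves


# Team A's keeper faces Team B's ten point blank shots and concedes seven.
keeper_a = KeeperTrial("Team A keeper", shots_faced=10, goals_conceded=7,
                       expected_saves=0.0)
# Team B's keeper faces Team A's ten long range shots and concedes three.
keeper_b = KeeperTrial("Team B keeper", shots_faced=10, goals_conceded=3,
                       expected_saves=10.0)

for keeper in (keeper_a, keeper_b):
    print(f"{keeper.name}: {keeper.save_percentage:.0f}% saved, "
          f"{keeper.saves_above_expected:+.0f} saves vs expectation")
```

The raw percentage ranks Team B's keeper first, while the context-adjusted number ranks Team A's keeper first. The two measures disagree purely because of the quality of shots each keeper faced.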

This is all very theoretical, so here’s an example from the games played on December 27th:

20111228-125223.jpg

As you can see, Friedel faced two shots, saved them both, so had a clean sheet and a shots saved percentage of 100%. Szczesny on the other hand only faced one shot, let it in so had a shots saved percentage of 0%. Pretty clear cut who was the better keeper, no?

Well, no. Friedel faced two shots from outside the area, both of which he would be expected to save, and that’s exactly what he did. The goal Szczesny conceded was a deflected shot which Fletcher expertly steered back across goal into the far corner, wrongfooting the keeper, and realistically no goalkeeper would have been expected to keep that chance out. I’m assuming you’ve seen the goal but if not try the ESPN Goals app or Eurosport highlights:

20111228-130215.jpg

The point is, 100% shots saved compared to 0% shots saved seems pretty damning, but when context is added, both percentages would be expected given the shots each keeper had to face, so using those statistics to compare the keepers is impossible. Note that I am not arguing that Szczesny is better than Friedel; this is not a defensive post and in my own personal opinion both are two of the best goalkeepers in the Premier League. All this post is claiming is that it is impossible to draw conclusions about goalkeepers based on shots saved percentages without context.

Is it possible to get around this problem and come up with a statistic which would allow genuine comparisons of goalkeepers? Well, if each shot was rated based on whether it should be saved or not, or perhaps given a score of 1-10 for how difficult a save it required, then that would provide the context needed to accurately compare keepers against each other. A goalkeeper saving a higher proportion of difficult shots would therefore be unarguably better than a goalkeeper letting in a higher proportion of easier shots. Without this context, the statistic is meaningless.
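As a rough sketch of how a 1-10 difficulty rating might be used, the Python below splits each keeper's shots into difficulty bands and reports a save percentage per band. The band boundaries (1-3 easy, 4-7 medium, 8-10 hard), the function names and the two shot logs are all assumptions made up for illustration. No such ratings are published in the data discussed here.

```python
from collections import defaultdict


def band(difficulty: int) -> str:
    """Map a 1-10 difficulty rating to an (assumed) band name."""
    if not 1 <= difficulty <= 10:
        raise ValueError("difficulty must be between 1 and 10")
    if difficulty <= 3:
        return "easy"
    if difficulty <= 7:
        return "medium"
    return "hard"


def save_pct_by_band(shots):
    """Return {band: save percentage} for a list of (difficulty, saved) pairs."""
    faced = defaultdict(int)
    saved = defaultdict(int)
    for difficulty, was_saved in shots:
        name = band(difficulty)
        faced[name] += 1
        if was_saved:
            saved[name] += 1
    return {name: 100.0 * saved[name] / faced[name] for name in faced}


def overall_save_pct(shots):
    """The raw shots saved percentage, ignoring difficulty."""
    return 100.0 * sum(1 for _, was_saved in shots if was_saved) / len(shots)


# Invented shot logs: (difficulty rating, was it saved?)
keeper_x = [(2, True), (3, True), (9, True), (9, False), (10, False)]
keeper_y = [(1, True), (2, True), (2, False), (3, True), (8, False)]

for name, shots in (("Keeper X", keeper_x), ("Keeper Y", keeper_y)):
    print(name, f"overall {overall_save_pct(shots):.0f}%", save_pct_by_band(shots))
```

In this made-up data both keepers have the same overall shots saved percentage, yet Keeper X saves every easy shot and some hard ones while Keeper Y lets in an easy one and saves no hard ones. That is the distinction the rating is meant to expose.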

You might suggest that these things even themselves out over a season, but this oft-used excuse seems to be accepted as conventional wisdom by many people in football without any proof whatsoever. In this instance, a solid back four will consistently lead to teams getting frustrated and shooting from distance over the course of a season, while a porous back four will consistently allow more point blank shots and one-on-ones. The number of shots taken against each side may be relatively consistent, as why would a team shoot from distance against a porous defence when they know they can create better chances by holding on to the ball? I refuse to believe that anything evens itself out over a season until somebody proves it with statistics, as there are a huge number of counterexamples, not least of which is the equivalent cliché of “when you’re down there, these things go against you”. There doesn’t seem to be much “evening out over a season” going on there. Like “narrowly wide”, clichés can be oxymoronic. Or just moronic.

In conclusion, the reason for this post was just to make people stop and think when looking at the wonderful array of statistics on offer to us football fans these days. This applies as much to shot saved percentages as pass completion percentages (how easy or difficult were those passes?) or shots on target percentages (was that a free header or was he surrounded by defenders?). Context is the key to statistical analysis.

Premier League Matchday 18 – Chalkboard Analysis

Some brief thoughts on the Boxing Day action:

Fernando Torres is working hard – winning freekicks, making tackles, playing passes – but surely he was bought for his goal threat rather than his link up play?

20111227-004417.jpg

Juan Mata is shouldering Chelsea’s creative burden, but surely he’s more of a Ljungberg than a Modric or a Silva? He is currently tasked with providing Chelsea with the service from the centre that ideally he’d like to be receiving in the channels. Note how his passing around the edge of the area is frequently very short, often backwards and rarely successful at playing into the area. It’s a square peg in a round hole, but he’s the best Chelsea have in this role until they renew their quest for Modric or a similar playmaker.

20111227-005234.jpg

No such problems in Fulham’s midfield though, with a typically great performance from Murphy, bettered by Dembele.

20111227-005551.jpg

Another top 5 team struggling to create were Manchester City. For such a supposedly free-flowing team, it was odd that Samir Nasri seemed to be restricted to left or right wing play only, never really coming into the middle to play more intricate football with Silva or Aguero.

20111227-010946.jpg

Meanwhile Charlie Adam continues to shoot from ridiculous distances, which Kenny Dalglish appears to tolerate for some reason. His shots are the thicker arrows; I had to show them like this because the shot chalkboard (justly) assumes you are at least within sight of goal before taking a shot, so Adam’s shots were off screen…

20111227-005846.jpg

Just ahead of him, Andy Carroll is warming up for an extended run in the Liverpool team. It’s clear from the aerial challenges (upside down Vs) that he’s very adept at the “long ball target man” role, but is that what Liverpool need, particularly against a team like Blackburn?

20111227-010551.jpg

It certainly suggests Liverpool will struggle to replace Suarez’s link up play between the lines and threat in the six yard box:

20111227-010647.jpg

Finally, after all this talk of midfielders it was rather scary to see that three of the top four players for *attacking* third passes in the Bolton v Newcastle game were in the Bolton back five – and I thought Owen Coyle was brought in to get Bolton playing sexy football…

20111227-011407.jpg

Premier League Matchday 16 – Chalkboard Analysis

This week’s chalkboard analysis can be found at FourFourTwo here. Meanwhile, here’s a cheeky chalkboard that didn’t make the article:

20111219-213005.jpg

I noted last week how Newcastle’s enforced reshuffle in defence caused them problems in the air, particularly from set pieces. Swansea were never going to test that aspect of their play this week, but I was struck with a) how little a threat Swansea posed, and b) how inaccurate Newcastle’s shooting was. Demba Ba registered two shots on target from his eight attempts and had a further two blocked, but what were the rest of the team up to?

Manchester City 1-0 Arsenal – Match Analysis

Arsenal unsurprisingly selected an unchanged team from the one that faced Everton, meaning four centrebacks in defence given their collection of injured wingbacks. The major concern would surely be on the right, where Djourou is the obvious weak link in the side and Mertesacker could struggle against the pace of Aguero. The lack of a true wingback perhaps also led to Nasri’s selection over Milner, as defensive acuity would be at less of a premium. Perhaps Mancini was also hoping for an “old-club” reaction from Nasri, although that wasn’t forthcoming in the recent league cup match at the Emirates between these two sides. Only two starters from that match made today’s Arsenal team, as on that day a City team featuring Nasri, Toure, Johnson, Dzeko and Aguero won with a single shot on target. Of course Balotelli was the big news on the City team sheet today, but would it be the Balotelli which appeared against Manchester United or Liverpool?

The opening phase of the game was intriguing for three aspects. Manchester City were pushing up when Szczesny had the ball, not allowing Arsenal to build from the back and instead forcing him to kick long towards van Persie. At the other end, Arsenal were intent on closing down City in possession and denying them space and time on the ball. Finally Walcott started on the left instead of his more usual right hand side, perhaps to scare Richards into not pushing up the field as often as he would like.

The first real chance came to City on 10 minutes, as they released Zabaleta down the left in behind Djourou much too easily. Zabaleta’s cross was controlled well by Aguero but he blasted his shot wildly over. Szczesny was called into action shortly afterwards to make a comfortable save as City began to settle and test Arsenal’s defensive solidity, which was looking suspect in these early moments.

Arsenal weathered this early storm though and began to create chances themselves. Gervinho was released through the inside right channel to test Hart, who saw the ball squirm under his body and trickle wide, then from the resulting corners Ramsey forced another palm around the post from Hart before Vermaelen got in a weak header. Possession was evenly matched in the opening 20 minutes, although Arsenal had the better of the territory as most of Manchester City’s possession was in their own half while Arsenal were able to retain the ball further up the field at the far edge of the centre circle.

20111218-163524.jpg

That chalkboard does reveal Manchester City’s focus on the wings, particularly in trying to use Zabaleta on the overlap around Nasri to test Djourou, and Walcott’s tracking back now that he had switched back to his more usual flank.

The game was certainly open, with even Koscielny driving forward from defence to join a five-on-three which led to a last ditch tackle by Kolo Toure on Ramsey. With four centrebacks on the pitch it wasn’t as if he was lacking cover. At the other end Aguero was playing really well in the buildup but his finishing was leaving a lot to be desired. Something else leaving a lot to be desired in the first half was Walcott, who was having minimal impact.

Early in the second half Djourou was subbed for Miquel due to what looked like an injury, but it did mean shifting Koscielny to right back where City had looked most dangerous, and Vermaelen moving into his best position in the centre. Almost immediately Balotelli was released down the left in behind, forced a save from Szczesny and Vermaelen could prevent Aguero from following up but not Silva, who was able to open the scoring from two yards. The reshuffle in the Arsenal defence seemed to cause enough confusion in the buildup to the goal to allow City in.

The goal seemed to spark Arsenal into life, with Walcott drawing a good save from Hart and van Persie putting the ball into the back of the net from a borderline offside position. Suddenly the game was open, there was no suggestion City were going to sit on this lead, and Arsenal would of course go all out for an equaliser. City had a glorious chance to make it 2-0 as Nasri was released in behind, but he overhit his pass with two City players waiting for a tap in. Then Zabaleta hit the post as Koscielny was too eager to break and gifted him the ball. Another borderline offside decision went against Arsenal as van Persie drew a great save from Hart.

It was frenetic stuff as all tactics seemed to have gone out the window. Both teams were causing problems and both defences were under pressure. Arsenal brought on Arshavin for Walcott who had been quiet apart from one good attempt. With 20 minutes to go the outcome of the match was still well and truly in the balance. Mindful of this, Mancini withdrew Balotelli for Milner to try to shore up the City midfield. This seemed to put the ball in Arsenal’s court as they seized possession and drew yet another debatable decision from the officials as Richards appeared to handle the ball in his own penalty area. Prior to this Arshavin proved he could do just as well as Nasri by also fluffing a simple cross to two Arsenal players waiting for a tap in.

Arsenal then brought on Chamakh for Mertesacker to search for an equaliser, raising the question of whether simply pushing Mertesacker up front might have been more of a threat given Chamakh’s recent form. Vermaelen drew another fine save out of Hart, then curled yet another attempt just wide, but Arsenal were unable to find an equaliser.

An entertaining game in which both sides had plenty of chances and the result could have gone either way. In the end a defensive shuffle caused enough momentary confusion to allow Manchester City to take the three points, but perhaps the two men of the match were the two goalkeepers as the scores would have been much higher without their excellent displays.

Wigan Athletic 1-1 Chelsea – Goal Analysis

0-1 Sturridge

Cole has plenty of time to bring the ball forward from left back and get his head up, eventually dropping a perfectly weighted ball over the top for Sturridge to run on to. Sturridge has taken up a position between centreback and fullback, and Figueroa is slow to push up in line with the rest of his back line as the ball is played. He is also square on to the play, seemingly unaware of Sturridge who has already started his run in behind.

As Sturridge shoots from a narrow angle, Al-Habsi makes a critical error; instead of trying to block the ball with his right foot, which would have been easy as the ball passes right next to it, he instead attempts to dive down and use his right hand which means he can only dive over the top of the ball and concede.

1-1 Gomez

At 1-0 down in the 88th minute, still Wigan look to play their way up the field instead of reverting to low percentage long balls, and they are rewarded for it. Di Santo is released on the left, and he cuts inside Ivanovic and plays a perfectly weighted reverse throughball to Rodallega which cuts out Bosingwa.

Rodallega can only toe-poke it towards goal as Terry comes out to close him down, but that’s enough to cause problems as Cole creates the latest Koscielny-Szczesny or Collins-Given confusion, putting Cech off, and the ball runs to Gomez to tap in. In these situations the keeper has to second-guess whether or not the defender is going to get a touch on the ball, which makes it much more difficult to deal with than it looks.

Terry and Bosingwa can only Morris Dance in disbelief in the background of the image.

Blackburn Rovers 1-2 West Bromwich Albion – Goal Analysis

0-1 Morrison

The ball is cleared to the edge of the box, where Morrison produces a stunning volley on the turn, managing to keep it down and beat a packed penalty area and Robinson’s dive. Phenomenal technique.

1-1 Dann

You couldn’t possibly come up with a more stereotypical Blackburn goal for the post-Mark Hughes era. A freekick in their own half is lumped forward by Robinson towards a giant centreback, who nods it down for his defensive partner to score. It’s not as if West Brom would have been expecting anything different, so perhaps they could have started further up the pitch to give Foster space to come and collect the ball? It’s impossible to be 100% certain of the offside call from this oblique angle but for my money Dann looks level with the ball.

1-2 Odemwingie

As Odemwingie picks the ball up on the right wing, Blackburn have two players ready to deal with him, and as is standard in this position you show him wide, prevent the cross and make him play the ball back upfield. Right?

Not at Blackburn. It turns out you just let him run between you both without even attempting a tackle and let him get into the penalty area.

Although to be fair, once he gets into this position Odemwingie pulls out a fantastic shot, bending the ball around Dann and into the far corner with pace.

Wolverhampton Wanderers 1-2 Stoke City – Goal Analysis

1-0 Hunt (pen.)

Unfortunately for Stoke, Chris Foy wasn’t refereeing this game so this was one blatant penalty decision they weren’t going to get away with. It should have been a yellow card though (and Woodgate’s second inside the first 17 minutes), so their luck with decisions continues.

1-1 Huth

Huth blasts a freekick and it takes a slight deflection off Doyle in the wall, wrong-footing Hennessey. Some people are claiming this should go down as a Doyle own goal. Those people are idiots. Others are claiming that it shouldn’t have been a freekick in the first place. Those people are also idiots. A scissor tackle from behind is a freekick all day long.

1-2 Crouch

No real pressure on the cross from the wing, even though Wolves have two-on-two defensively. Crouch pulls away at the back post and when there’s no pressure on the crossing player, he has time to get his head up and see the movement, which is what Etherington does here.

He plays the ball into what I’m contractually obliged to call the corridor of uncertainty, and Crouch only has to worry about not heading the post as he scores from a yard.

Fulham 2-0 Bolton Wanderers – Goal Analysis

1-0 Dempsey

Ruiz picks the ball up on the right and isn’t closed down particularly quickly by Alonso. This gives him space to whip the ball in towards the back post, where Dempsey is more alive than Boyata to the threat and gets ahead of him to glance a great header in at the far post.

2-0 Ruiz

As the ball is played in to Dempsey, both Bolton centrebacks charge up the pitch to close him down, leaving a huge gap through the centre of defence, which both Dempsey and Ruiz realise immediately.

Once in, Ruiz pulls out a brilliant scoop to lift it over Jaaskelainen.

Everton 1-1 Norwich City – Goal Analysis

0-1 Holt

Holt receives the ball in the penalty area with his back to goal, and Heitinga gets way too tight to him. A more flimsy player might have already fallen over to win a penalty by this point.

Then, as Hibbert approaches, Holt does a cheeky dragback to turn away from both.

Holt immediately shoots on the turn and the ball goes in off the far post past Baines on the line. If Messi had scored this goal, etc…

1-1 Osman

Hibbert’s overlapping run makes this goal, as it creates a huge space through the left side of Norwich’s defence. Drenthe uses the overlap to drive infield behind Johnson.

Drenthe shoots and it looks like Ruddy has it covered, but Osman very deliberately steers it past the Norwich keeper and into the near post. Lightning reflexes.