I’ve received a lot of feedback on the Uncertainty of Statistics article from a few days ago – thank you very much for your comments. Here are a couple of additional points that I probably should have made clearer in the original article:
The statistics themselves are not the issue
As I said in my opening paragraph, I am not calling the accuracy of the statistics provided by Opta into question at all, whether for Szczesny, Friedel or any other statistic they provide. Those guys are fantastic.
The statistic chosen for the article is illustrative only
Szczesny vs Friedel is not the issue; it is just an example I used to illustrate my point. Here’s another example: pass completion percentages. These are often produced after matches, particularly for midfielders, and it is commonly accepted that a higher percentage means a better performance, evidence that a particular player “ran the game”. So here are two identical stats from two different players on 21st December:
Both players completed 67 passes from 74 attempts, for a pass completion percentage of 90.5%. But even a cursory glance at the image shows that Carrick completed his passes much further from goal, and exchanged passes with his defence far more often, than Silva, whose passes were often attempted in the opponent’s half. This context suggests that Silva was generally attempting more difficult passes than Carrick in these matches, a suggestion partly backed up by the fact that Silva created two chances to Carrick’s one. The point is that, given where each player was passing, I would expect Silva’s pass completion percentage to be lower than Carrick’s. That is not necessarily a bad thing, as I’d also expect Silva’s passes to be more likely to lead to goalscoring opportunities. A bare 90.5% pass completion percentage conceals all of this, so it needs to be read alongside other stats, such as the location of each pass and the number of chances created, which supply the context that makes it meaningful.
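For anyone who likes to see the arithmetic, here is a minimal sketch in Python using only the figures quoted above (67 completed passes from 74 attempts for each player, one chance created by Carrick and two by Silva). The function and field names are my own invention for illustration, and "chances per 100 completed passes" is just one possible way of adding context, not an established metric.

```python
def completion_pct(completed, attempted):
    """Pass completion as a percentage, rounded to one decimal place."""
    return round(100 * completed / attempted, 1)


# Figures taken from the article; the field names are hypothetical.
players = {
    "Carrick": {"completed": 67, "attempted": 74, "chances_created": 1},
    "Silva": {"completed": 67, "attempted": 74, "chances_created": 2},
}

summary = {}
for name, stats in players.items():
    summary[name] = {
        # The headline stat: identical for both players.
        "completion_pct": completion_pct(stats["completed"], stats["attempted"]),
        # One illustrative piece of context: chances per 100 completed passes.
        "chances_per_100_completed": round(
            100 * stats["chances_created"] / stats["completed"], 1
        ),
    }

for name, row in summary.items():
    print(name, row)
```

The headline percentage comes out the same for both players, while the second column differs. That is all the sketch is meant to show: the same 90.5% can sit on top of quite different passing performances.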
The original conclusion may indeed have been correct
I argued in the article that it is impossible to compare goalkeepers by their shots saved statistics alone. It may well be the case that Friedel is 18 percentage points better at saving shots than Szczesny, i.e. that the original conclusion still holds once the quality of the shots each keeper faces has been controlled for. My point is that other conclusions could also be validly drawn from that same statistic: for example, that Arsenal’s defence allows better goalscoring chances than Tottenham’s, meaning that a higher proportion of the shots Szczesny faces are good chances (and therefore less likely to be saved) than is the case for Friedel. Without considering the context, it’s impossible to determine which conclusion is more likely to be correct. That is the point of the article.
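To make the alternative explanation concrete, here is a small Python sketch. Every number in it is hypothetical, invented purely for illustration; none of it is Opta data, and "Keeper A" and "Keeper B" are not meant to be Friedel and Szczesny. The split into "easy" and "hard" shots is also an assumption of mine. It shows two keepers who are exactly as good as each other at each type of shot, yet end up 18 percentage points apart overall because they face a different mix of shots.

```python
def save_pct(faced, saved):
    """Save percentage, rounded to one decimal place."""
    return round(100 * saved / faced, 1)


# HYPOTHETICAL numbers, invented for illustration only (not real data).
# Each entry is (shots faced, shots saved) for that type of chance.
keepers = {
    "Keeper A": {"easy": (180, 162), "hard": (20, 10)},
    "Keeper B": {"easy": (90, 81), "hard": (110, 55)},
}

by_type = {}
overall = {}
for name, shots in keepers.items():
    # Save percentage within each type of shot.
    by_type[name] = {kind: save_pct(f, s) for kind, (f, s) in shots.items()}
    # Headline save percentage across all shots faced.
    total_faced = sum(f for f, _ in shots.values())
    total_saved = sum(s for _, s in shots.values())
    overall[name] = save_pct(total_faced, total_saved)

print(by_type)
print(overall)
print("gap:", round(overall["Keeper A"] - overall["Keeper B"], 1))
```

In this made-up example both keepers save the same share of easy shots and the same share of hard shots, so the whole of the gap in the headline figure comes from the chances their defences allow. It does not show that this is what happened in reality, only that the headline number alone cannot rule it out.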
Site Twitter: @footballistix | Author Twitter: @footballistnick
Chalkboards provided by the brilliant Stats Zone iPhone app.