Lies, damn lies and statistics
May 28, 2005
That old yarn. I've heard it often enough, and seen it used in debates, arguments and as the first-line defense of those who argue points that are statistically hard to defend. Most recently, the BBC published an article whose second-to-last paragraph particularly annoyed me.
It is of course, possible to prove almost anything with careful manipulation of statistics.
Well, that's just bollocks.
It's possible to state anything with statistics. It is not possible to conclusively prove anything with statistics. This is precisely the sort of inaccurate statement that a statistician will get irate about. I'm not a statistician; in my case, my ire is fuelled by alcohol.
As with most things, it's a problem of definitions. 'Statistics' is a heavily overused word (as is 'prove'). On the one hand, any accurately quoted number can be a 'statistic', but a 'statistical analysis' must involve certain processes that define the measure of accuracy of the analysis.
In other words, using a statistical analysis, I can state that there's a 99% chance that the average value for a statistic is between x and y. Based on a proper analysis, and based on a number of easily defended principles (such as the central limit theorem), these statements stand up to infinite scrutiny. Of course, then you get the individuals determined to believe in the 1% chance, but when pushed they wouldn't bet money on that same 1%.
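To make that concrete, here is a rough sketch of how such an "x and y" might be worked out. It leans on the central limit theorem (a normal approximation for the sample mean), and everything in it is my own illustration: the function name `mean_confidence_interval` is hypothetical, and the data is randomly generated rather than taken from any real study.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.99):
    """Normal-approximation (CLT) confidence interval for the mean.

    Hypothetical helper for illustration; assumes the sample is large
    enough for the central limit theorem to be a fair approximation.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, roughly 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Made-up, skewed data purely for illustration (not from any real survey).
rng = random.Random(2005)
sample = [rng.expovariate(1 / 10) for _ in range(1000)]

low, high = mean_confidence_interval(sample)
print(f"99% confidence interval for the mean: {low:.2f} to {high:.2f}")
```

The interval narrows as the sample grows and widens as you demand more confidence, which is exactly the measure of accuracy that a bare quoted number leaves out.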
The BBC article in question quotes an analysis by a researcher from the University of Aberystwyth (which has a 99.9999999% chance of being in Wales) called 'Janet Jones' (who has a slightly lower than 99.9999999% chance of being female). The BBC paraphrases the study like this...
She told BBC News: "This means that over the course of the run, most respondents voted approximately ten times.
"A significant number voted up to 50 times and a minority voted from 50 to 100 times.
"And a few sorry individuals probably fuelled their addiction by voting well over 100 times."
Her research was based on 12,339 responses to a questionnaire on the Big Brother web site.
This is understandable.
But in so doing, they've replaced confidence intervals, standard deviations and other rigorous statistical measures with words like 'somewhat', 'probably' and 'approximately'. Since very few people will ever read the article this news item was taken from, and still fewer will have the requisite training to understand the information presented therein, the best anyone really has to go on is this populist reinterpretation of the underlying hard reality of statistics. Because it's presented in a blurry way, readers feel justified in using the phrase, 'lies, damn lies and statistics', and fail to elevate themselves from the level of ignorance they started at.
We're left with the last two paragraphs of the BBC article, which really don't help...
It is of course, possible to prove almost anything with careful manipulation of statistics.
This is only true if you're talking to people you've already brainwashed into thinking statistics are made up. They don't deserve the truth; they wouldn't believe it anyway.
But next time a politician appears on television claiming Big Brother is more popular than the general election among young voters, it is worth bearing in mind that they are almost certainly wrong.
While this may be true, nothing in the article explains why it might be true, and if politicians understood a few concepts like statistics, economics or finance, we'd all be much better off.