Quantification in science is a good thing, so long as you’re not counting rubbish.
The physicist Richard Feynman had a great anecdote that ought to be put on the wall of every lab and statistics classroom.
It concerned a man who wanted to know exactly how tall the emperor of China was. To find out, he went around China asking everyone he met for the height of the emperor. Having obtained several thousand answers, he went away and analysed his data. He plotted the distribution, calculated the mean, excluded outliers, and – just to show how rigorous he was – worked out the confidence intervals too. He then published the result alongside his (vast) sample size, so that everyone could see how thorough he’d been.
The problem? It was totally inaccurate, because none of the people he’d asked had ever seen the emperor.
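The anecdote is easy to reproduce as a toy simulation. The sketch below is purely illustrative: the true height, the height that people wrongly believe, and the spread of their guesses are all made-up numbers, and the function names are hypothetical. It shows that a vast sample gives a very narrow confidence interval, but one that sits around the shared guess rather than the truth.

```python
import math
import random
import statistics


def survey_emperor_height(n, believed_cm, spread_cm, seed=0):
    """Simulate n guesses from people who have never seen the emperor.

    Every guess is scattered around a shared (wrong) belief, not the truth.
    """
    rng = random.Random(seed)
    return [rng.gauss(believed_cm, spread_cm) for _ in range(n)]


def mean_with_ci95(values):
    """Return (mean, lower, upper) using the normal approximation."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - 1.96 * sem, mean + 1.96 * sem


# All three numbers below are invented for illustration only.
TRUE_HEIGHT_CM = 170.0  # hypothetical truth, unknown to every respondent
guesses = survey_emperor_height(n=5000, believed_cm=185.0, spread_cm=10.0)

mean, lower, upper = mean_with_ci95(guesses)
ci_contains_truth = lower <= TRUE_HEIGHT_CM <= upper

print(f"mean = {mean:.1f} cm, 95% CI = ({lower:.1f}, {upper:.1f}) cm")
print("interval contains the true height:", ci_contains_truth)
```

Increasing n only makes the interval narrower. It does nothing to move it towards the right answer, because the error is in the inputs rather than in the arithmetic.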
In today’s science, when Big Data has become a specialisation, if not a field, in its own right, and when computational power has made quantification, processing, and analysis semi- if not wholly automated, it’s a caveat that should be reiterated more than ever. Quantification is an essential feature of good science, but we need to be wary that we’re not quantifying crap.
Quantification, used correctly, is always a good thing because it reduces the “trust me” element in a paper. When only a single “representative” example is shown – invariably “exemplary” rather than truly “representative” – the reader basically has no choice but to take the authors’ words at face value. But when a single example is shown alongside a quantification of a larger collection of similar data (ideally obtained using multiple biological replicates and multiple independent experiments), it allows a dispassionate observer to take in the distribution and independently draw their own conclusions – which hopefully are the same as the authors’.
Quantification makes strong observations robust, and makes weak effects believable (such as the incremental changes that might occur as a population evolves).
But quantifying low-quality data gives a sheen of respectability to results that don’t deserve it. Even worse, quantifying poor data can be misleading, fooling scientists into thinking that they’ve discovered a genuine effect when there’s nothing there. One should always be wary of figures that don’t show what was being quantified, and apply the same level of scrutiny that direct readouts like blots, images, and so on receive. Nowadays we’re automatically sceptical of figures that rely on those single “representative” examples, but we should be equally cautious of any data summary that gives no indication of the inputs. Just because a computer says something doesn’t mean it’s automatically correct.
The widespread availability of computational tools has facilitated the application of statistics to biological data, but this has not gone hand in hand with a comparable increase in baseline statistical knowledge. Too many people still think that statistical significance means a result is real. An apparently significant result can arise from a number of sources other than a genuine effect, including chance alone and an insufficient sample size. Another widespread error is applying significance tests without knowing under what circumstances those tests are valid. Peddling p values as an indication of veracity is often an indication of blind faith in the quality of the data (or, more cynically, a lack of confidence), when a better strategy would be to try to reach the same conclusions using a different, independent approach.
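The gap between “significant” and “real” can be seen in a small simulation. The sketch below is a deliberately simplified illustration, not a recommended analysis: it runs many imaginary experiments in which both groups are drawn from the same distribution, so there is no effect to find, and it uses a two-sample z-test with an assumed known standard deviation (a real analysis would more typically use a t-test). The function names, group sizes, and number of experiments are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided two-sample z-test p value, assuming a known common SD."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(n_experiments, n_per_group, alpha=0.05, seed=1):
    """Count 'significant' results when both groups share one distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same population: no real effect exists.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits


n_experiments = 2000
false_positives = count_false_positives(n_experiments, n_per_group=10)
print(f"{false_positives} of {n_experiments} null experiments gave p < 0.05")
```

With a threshold of 0.05, roughly one in twenty of these effect-free experiments should come out as “significant”, which is exactly what the threshold means. A p value says how surprising the data would be if there were no effect; it says nothing about whether the measurements going into the test were any good.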
A wonderful piece of homespun wisdom is that “you can’t polish a turd”. If something’s crap, you’ve little hope of making it better through cosmetics. Extending the metaphor, possibly too graphically (ha ha), getting loads of turds and counting them, and then putting them into a shiny and tastefully coloured box marked n=200 and p<0.0001 doesn’t stop them being turds. It is not possible to carry out meaningful statistical analysis of data that is fundamentally inaccurate.
Scientists, biologists particularly, worry that they’re rubbish at counting, but it’s worse to be counting rubbish.