Wednesday, January 30, 2013

Data Bingo! Oh no!

Oh boy - look what a data hunter has dragged in this time! Why is this problem so common? And who on earth is Bonferroni?

Our friend here found one "statistically significant" result when he looked at goodness knows how many differences between groups of people. He's fallen totally for a statistical illusion that's a hazard of 'multiple testing'. And a lot of headline writers and readers will fall for it, too.

Then he's made it worse by taking his unproven hypothesis (that a particular drink on a particular day in a particular group of people prevented stroke) and whacking on another unproven hypothesis (that if everyone else drinks lots of it, benefits will ensue). But it's with the problem of multiple testing (also called multiplicity) that Bonferroni comes in.

It's pretty much inevitable that multiple testing will churn out some totally random, unreliable answers.

A "statistically significant" difference isn't proof that the difference is a "real" one that could hold true for others. But it estimates that the probability of finding a difference roughly like this in this data, if it's not real, is less than 5 in 100, or 5% (a "p" value of less than 0.05).

If you test for multiple possibilities, you need to expect chance alone to hand you a statistically significant "finding" on average 5 times out of 100 (or 1 in every 20 tests) when there is no real difference. If you test only a few things, your chance of this kind of random error is low.
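To put rough numbers on that, here's a minimal Python sketch. It assumes every test is independent and that none of the differences being tested are real, which is a simplification. The function name is made up for this illustration.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests comes up "significant" by luck.

    Assumes the tests are independent and no difference is real, so each
    test has an alpha chance of a false alarm.
    """
    chance_all_quiet = (1 - alpha) ** n_tests  # no test raises a false alarm
    return 1 - chance_all_quiet


for n in (1, 3, 20, 100):
    print(f"{n:>3} tests: {chance_of_any_false_positive(n):.1%}")
```

Under those assumptions, 3 tests carry roughly a 14% chance of at least one fluke, and 20 tests carry roughly a 64% chance.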

But especially if you have a big dataset, the more things you look at, the higher the chance is that you'll drag total nonsense out. With high-powered computers crunching big data, this becomes a big problem - large numbers of spurious findings that can't be replicated.
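A toy simulation can make this concrete. The sketch below relies on one standard fact: when there is no real difference, a well-behaved test's p value is equally likely to land anywhere between 0 and 1. So it simply draws random numbers in place of running real tests. The function name and the seed are made up for this illustration.

```python
import random


def count_chance_findings(n_tests, alpha=0.05, seed=2013):
    """Count "significant" results in n_tests where nothing real is going on.

    Each p value is a uniform random draw, standing in for a test of a
    difference that does not exist.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)


for n in (20, 1000):
    print(n, "tests ->", count_chance_findings(n), "significant by chance alone")
```

With 1,000 tests you would expect somewhere around 50 spurious "findings", give or take, even though there's nothing to find.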

Carlo Bonferroni (1892-1960) was an Italian mathematician. His name graces some statistical tests used to interpret results when doing multiple tests. But the multiple testing methods with his name that we use today were developed by Olive Jean Dunn, in papers she published in 1959 and 1961 [PDF].
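The basic Bonferroni correction is simple enough to sketch in a few lines of Python: divide the usual threshold (0.05) by the number of tests, and only count a result as significant if its p value is at or below that stricter bar. The p values below are invented for illustration, and the function name is an assumption of this sketch, not something from Dunn's papers.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return True/False for each p value after a Bonferroni correction."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)  # stricter bar: alpha shared across all tests
    return [p <= threshold for p in p_values]


made_up_p_values = [0.001, 0.02, 0.04, 0.3]  # hypothetical results from 4 tests
print(bonferroni_significant(made_up_p_values))
```

Here the bar drops from 0.05 to 0.0125, so the results at 0.02 and 0.04 no longer count as significant, even though they would have if each had been the only test done.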

There are other ways of approaching these problems. Some are concerned that techniques based on the Bonferroni correction are too conservative - too likely to throw the baby out with the bathwater, if you like. So they use measures that have a different basis, such as the False Discovery Rate (FDR) [PDF].
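One widely used way of controlling the FDR is the Benjamini-Hochberg procedure. I'm assuming that's the kind of method meant here, and the sketch below is only a minimal illustration of it. You rank the p values from smallest to largest, compare each with a threshold that grows with its rank, and keep everything up to the largest rank that still passes. The p values and the function name are invented for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return True/False for each p value under the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Positions of the p values, ordered from smallest p to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p value is at or below (k / m) * alpha.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            cutoff_rank = rank
    # Everything ranked at or before the cutoff counts as a discovery.
    keep = [False] * m
    for index in order[:cutoff_rank]:
        keep[index] = True
    return keep


made_up_p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                    0.06, 0.074, 0.205, 0.212, 0.216]  # hypothetical
print(benjamini_hochberg(made_up_p_values))
```

With these ten made-up p values, the procedure keeps the first two. A plain Bonferroni threshold of 0.05/10 = 0.005 would keep only the first, which is the sense in which it's the more conservative approach.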

Statistical tests can't totally eliminate the chance of random error, though. So you usually need more than just a single possibly random test result to be sure about something.

Getting more technical...

What about multiplicity issues in systematic reviews? As the Cochrane Handbook (section 16.7.2) points out, systematic reviews concentrate on estimating pre-specified effects - not searching for possible effects. Safeguards still matter, though. Even pre-specified analyses need to be kept to a minimum. And how many analyses were done needs to be kept in mind when interpreting results.

If you would like to read more technical information about multiple testing, here are some free slides from the University of Washington. And if you want to read more about the controversies and issues, here's a primer in Nature and an article in the Journal of Clinical Epidemiology (behind paywalls).

Update on 20 March 2016 added Olive Jean Dunn (including creating a biography page for her on Wikipedia) and refined the description of statistical significance - thanks to the feedback from an anonymous commenter.

Saturday, January 26, 2013

Newsflash: Honking causes cancer

In The Emperor of All Maladies, author Siddhartha Mukherjee describes a type of cancer as "terrifying to experience, terrifying to observe and terrifying to treat."

Somehow, though, in our efforts to stem the tide of the disease and our dread of it, we can end up making things worse for many people. The shadow of cancer angst is spreading much further than it needs to go.

We're struggling, as a culture, with the consequences of the over- and mis-use of associations from epidemiological data about cancer risks. The imposition of risk awareness has been called a form of cultural imperialism. Cancer awareness-raising continues relentlessly, though - even in cases where a community's problem has become over-estimation of risk, not a lack of awareness.

This week, Jeff Niederdeppe and I will be co-moderating a discussion for science writers and researchers on these issues in the Covering cancer causes, prevention and screening session at Science Online. Come along, or follow/share thoughts and resources at the Scio13 wiki or via Twitter: #Scio13 #SciCancer

Want to increase your skills at picking out the important signals from all the noise? There's a collection of (free) important books and articles at PubMed Health that could help.

Saturday, January 19, 2013

Fright night in the doctors' lounge

It doesn't come as a terrible shock to hear that a lot of patients struggle with statistics. It's a little more scary, though, to be reminded that doctors' understanding of health statistics and data on screening isn't all that fabulous either. And now this month we hear that "a considerable proportion of researchers" don't understand routinely used statistical terms in systematic reviews. Gulp.

We've probably only been scratching the surface of what can be done to improve this. A recent small trial found that hyperlinking explanations to statistical and methodological terms in journal articles could improve physicians' understanding.

Statistical literacy needs a combination of literacy, mathematical, and critical skills (PDF). In communication, numbers will always be tangled up with words (and sometimes words are better, as I discuss here).

Journalists are key to helping turn this problem around. They probably aren't getting the training they need, according to this study from 2010 - but that might be improving...slowly. Thankfully, Frank Swain from the Royal Statistical Society reports encouragingly on journalists' desire to learn more about statistics in the era of data journalism.

Want to learn more about basic statistics in health studies? Trials show that reading this book, Know Your Chances, could help.

And if you're wondering about how your own mathematics competency is faring since you left school, here's an online test. Mind you, it would help a lot if we had a clearer way of communicating numbers. The confusion over what means mean is a good case in point, covered here at Statistically Funny.

Another study on doctors' understanding and communication of data on the potential benefits and harms of treatment - published in August 2016.

This post was updated on 30 January 2016: the original shorter post was written when Evelyn Lamb and I were co-moderating a session at Science Online.

Additional study added on 3 September 2016.