Sunday, November 17, 2013

Does it work? Beware of the too-simple answer

Leonard is so lucky! He's just asked a very complicated question and he's not getting an over-confident and misleading answer. Granted, he was likely hoping for an easier one! But let's dive into it.

"Does": that verb packs a punch. How do we know whether something does or doesn't work?   It would be great if that were simple, but unfortunately it's not.

I talk a lot here at Statistically Funny about the need for trials and systematic reviews of them to help us find the answers to these questions. But whether we're talking about trials or other forms of research, statistical techniques are needed to help make sense of what emerges from a study.

Too often, this aspect of research leads us down the garden path. It's common for people to rely only, or largely, on testing for statistical significance - and to assume that a significance test gives categorical proof of whether or not a result was a coincidence.

However, a statistically significant result - especially from a single study - is often misunderstood and contributes to over-confidence about what we know. It's not a magic wand that uncovers the truth. The numbers alone cannot possibly "know" that. Here's my quick overview: 5 tips to avoid getting this wrong. I wrote about testing for statistical significance in some detail at Absolutely Maybe. Leonard's statistician is a Bayesian: you can find out some more about that, too, in that post.
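You can see why significance isn't categorical proof with a quick simulation. The sketch below is my own illustration, not anything from the studies discussed here: it runs many pretend "trials" in which the treatment truly does nothing (both groups are drawn from the same population), and counts how many still come out "statistically significant" at the conventional 5% level.

```python
import random

random.seed(42)

def fake_trial(n=50):
    """One trial where the treatment truly does nothing at all."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]  # same distribution!
    mean_diff = sum(treated) / n - sum(control) / n
    se = (2 / n) ** 0.5  # standard error of the difference (both SDs are 1)
    # crude z-test: is the difference "significant" at the 5% level?
    return abs(mean_diff / se) > 1.96

trials = 1000
false_alarms = sum(fake_trial() for _ in range(trials))
print(f"{false_alarms} of {trials} do-nothing trials came out 'significant'")
```

Roughly 5% of these coincidence-only trials clear the bar - which is exactly what the 5% threshold means. A single "significant" result can't tell you whether it's one of those.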

As chance would have it, there was also a lot of discussion this week in response to a paper published while I was writing that post. It called for a tightening of the threshold for significance, which isn't really the answer either. Thomas Lumley puts that into great perspective over at his wonderful blog, Biased and Inefficient: a very valuable read.

"It": now that part of our question should be easy, right? Actually, this can be particularly tricky. The treatment you could be using may not be very much like the one that was studied. Even if it's a prescription drug, the dose or regimen you're facing might not be the same as the one used in studies. Or it might be used with another intervention that could affect how it works.

Then there's the question of whether "it" is even what it says it is. Unlike prescription drugs, for example, the contents of herbal remedies and dietary supplements aren't closely regulated to ensure that what's on the label is what's inside. That was also recently in the news, and covered in detail here by Emily Willingham.

If it's a non-drug intervention, it's highly likely that the articles and other reports of the research never make clear exactly what "it" is. Paul Glasziou had a brainwave about this: he's started HANDI, the Handbook of Non-Drug Interventions. When a systematic review shows that something works, the HANDI team wants to dig out all the details and make sure we all know exactly what "it" is.

For example, if you heard that drinking water before meals can help you lose weight, and you want to try it, HANDI helpfully points out that what that actually means is drinking half a liter of water before every meal AND having a low-calorie diet.

"Work": this one really needs to get specific. You really need to be thinking about each possible outcome separately - the evidence is going to vary in quality and quantity from one outcome to another. I explain this in another post here.

Think of it this way: if you do a survey with 150 questions in it, there are going to be more answers to some of the questions than others. For example, if you had 400 survey respondents, they might all have answered the first easy question and there could be virtually no answers to a hard question near the end. So saying "a survey of 400 people found…" about that later question would be seriously misleading.
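The survey example above can be put in numbers. These response counts are made up by me purely to illustrate the point - not real survey data:

```python
# Hypothetical survey: 400 people start, but fewer and fewer answer
# as the questions get harder and the survey drags on.
responses = {
    "Q1 (easy, near the start)": 400,
    "Q75 (middle)": 310,
    "Q148 (hard, near the end)": 12,
}

for question, n in responses.items():
    print(f"{question}: {n} of 400 answered ({n / 400:.0%})")
```

A headline claiming "a survey of 400 people found…" about that last question would really be resting on just 12 answers.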

On top of that, you have to take the possible adverse effects into account, too. There can be complicated trade-offs between effects. And how much does it work for a particular outcome? Does a sliver of a benefit count to you as "working"? That might be enough for the person answering your question, but it might not be enough for it to count for you - especially if there are risks, costs or inconvenience involved.

And who did it work for in the research? Whether or not research results apply to a person in your situation can be straightforward, but it might not be.

Then there's the question of how high the researchers set the bar. Did the treatment effect have to be superior to doing nothing, or to doing something else - or is the information coming from comparing it to something that itself may not be all that effective? You might think that can't possibly happen, but it does more often than you might think. You can find out about this here at Statistically Funny, where I tackle the issue of drugs that are "no worse (more or less)."

Finally, one of the most common trip-ups of all: did they really measure the outcome, or a proxy for it? If it's a proxy for the real thing, how good a proxy is it? The use of surrogate measures or biomarkers is increasing fast: more about that in this post.

So while there are many who might have told Leonard, "Yes, it's been proven to work in clinical trials" in a few seconds flat, I wonder how long it would take his statistician to answer the question? There are no stupid questions, but beware of the too-simple answer.


  1. One other point: if the effects are averaged, effective responses and non-effective responses may cancel each other out. Heterogeneity. And your effect may not be the average effect. Kravitz wrote a fabulous article on this some time back.

  2. Thank you Hilda. I have added your cartoon on Flickr and plan to help you spread the word. Kind regards