Sunday, February 17, 2013

You will meet too much false precision

Precise numbers and claims - as though there is no margin for error - are all around us. When someone tells you that 54.3% of people with some disease will have a particular outcome, they're basically predicting the future of all groups of people based on what happened to another group of people in the past. Well, what are the chances of exactly that always happening, eh?

If our fortune teller were quoting the mean of a study here, it could be written like this: 67.5% (95% CI: 62%-73%). The CI stands for "confidence interval", and it gives you an idea of how much imprecision or uncertainty there is around the estimate. The confidence level - 95% here, which is common - is chosen when a confidence interval is calculated. A 95% level corresponds to a significance level of 0.05 (or 5%) - more about that here. It sets how much uncertainty the interval is built to cover: if the study were repeated many times, about 95% of the intervals calculated this way would contain the true value.
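To make that notation concrete, here is a minimal sketch of one common way to calculate a 95% confidence interval for a percentage (the normal-approximation, or "Wald", method). The counts of 189 out of 280 are made up for illustration, and the function name `wald_ci` is my own; a real study may well use a different method.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(successes, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    # z is about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin

# Hypothetical data: 189 of 280 people had the outcome
estimate, low, high = wald_ci(189, 280)
print(f"{estimate:.1%} (95% CI: {low:.0%}-{high:.0%})")
```

With these invented counts, the result lands on roughly the same estimate and interval as the example above.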

The chances of the result always being precisely 67.5% can be pretty slim or very high, depending on lots of things. If there is a lot of data, the confidence interval will be narrow: the best case scenario and the worst case scenario will be close together (say, 66% to 69%).
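A rough sketch of how the amount of data changes the width of the interval, assuming the same 67.5% estimate and the simple normal-approximation method. The sample sizes are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci_width(p, n):
    """Full width of a 95% normal-approximation interval for a proportion."""
    return 2 * Z_95 * sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, all with the same 67.5% estimate
sample_sizes = [100, 400, 4000]
widths = [ci_width(0.675, n) for n in sample_sizes]
for n, w in zip(sample_sizes, widths):
    print(f"n = {n:>5}: interval is about {w:.1%} wide")
```

Under this method, quadrupling the number of people halves the width of the interval, so narrow intervals take a lot of data.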

We give ranges for estimates all the time. If someone asks, "How long does it take to get to your house?", we don't say "39.35 minutes". We say, "Usually about half an hour to 45 minutes, depending on the traffic."

In a systematic review, you will often see the outcome of an individual study shown as a line. The length of that line shows you the width of the confidence interval around the result. A chart made up of these lines is called a forest plot. Find more from Statistically Funny on this in The Forest Plot Trilogy.
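As a rough text-only sketch of the idea, the snippet below draws each study's confidence interval as a line of dashes, with an "o" marking the estimate. The three studies and their numbers are invented, and `ci_line` is a hypothetical helper, not part of any plotting library.

```python
def ci_line(low, estimate, high, width=50):
    """Draw one interval on a 0-to-1 axis that is `width` characters long."""
    def pos(value):
        return round(value * (width - 1))
    chars = [" "] * width
    for i in range(pos(low), pos(high) + 1):
        chars[i] = "-"
    chars[pos(estimate)] = "o"  # the point estimate
    return "".join(chars)

# Hypothetical studies: (name, lower bound, estimate, upper bound)
studies = [
    ("Study A", 0.45, 0.60, 0.75),
    ("Study B", 0.64, 0.70, 0.76),
    ("Study C", 0.62, 0.675, 0.73),
]
for name, low, estimate, high in studies:
    print(f"{name} |{ci_line(low, estimate, high)}|")
```

The longer the line, the wider the confidence interval, and the less precise that study's estimate.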

What a confidence interval isn't: it doesn't mean that 95% of people's outcomes will be between those upper and lower boundaries. It's the range where the mean (or median, or whatever other statistic is being measured) is likely to fall.
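A small simulation can show the difference. This sketch assumes 1,000 made-up outcomes drawn from a bell curve with a mean of 50 and a standard deviation of 10, then checks how many individual outcomes actually sit inside the 95% confidence interval for the mean.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical outcomes for 1,000 people
outcomes = [random.gauss(50, 10) for _ in range(1000)]

m = mean(outcomes)
margin = 1.96 * stdev(outcomes) / sqrt(len(outcomes))
low, high = m - margin, m + margin

# Share of individual outcomes that fall inside the interval for the mean
inside = sum(low <= x <= high for x in outcomes) / len(outcomes)
print(f"95% CI for the mean: {low:.1f} to {high:.1f}")
print(f"Share of individual outcomes inside it: {inside:.0%}")
```

The interval around the mean is narrow, and only a small fraction of individual outcomes fall inside it - nowhere near 95%.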

If the statistical estimates are made with Bayesian methods, the range you will see around an estimate isn't a confidence interval: it's a credible interval. I explain a bit about Bayesian statistics in this post. Unlike a confidence interval, a credible interval incorporates prior information about how probable different values are, so it can be read as a range that probably contains the true value.
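For comparison, here is a minimal sketch of a 95% credible interval for a percentage. It assumes made-up counts (189 of 280 with the outcome) and a flat Beta(1, 1) prior, which says every percentage was equally plausible beforehand, and it approximates the interval by drawing random samples from the resulting Beta distribution rather than calculating it exactly.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical data: 189 people with the outcome, 91 without
successes, failures = 189, 91
# Assumed prior: Beta(1, 1), i.e. no percentage favoured in advance
prior_a, prior_b = 1, 1

# The updated (posterior) belief is Beta(prior_a + successes, prior_b + failures)
draws = sorted(
    random.betavariate(prior_a + successes, prior_b + failures)
    for _ in range(20000)
)
low = draws[int(0.025 * len(draws))]
high = draws[int(0.975 * len(draws))]
print(f"95% credible interval: {low:.1%} to {high:.1%}")
```

With a flat prior and this much data, the credible interval comes out close to the confidence interval for the same counts. A more opinionated prior would pull it towards whatever was believed beforehand.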

Update [4 June 2016]: The American Statistical Association (ASA) issued a statement encouraging people to consider estimates like confidence intervals instead of only looking at p-values and statistical significance. I've written an explanation of that in this post: 5 Tips for Avoiding P-Value Potholes.


  1. It's funny how I understand CI much better than in my statistics class 4 years ago. I think there may be more to it than what's written here, but this is a good start to review my statistics foundations.

    Thank you, Hilda. I love your writings.