Sunday, November 29, 2015

More Than Average Confusion About What Mean Means Mean

She's right: on average, when people talk about "average" for a number, they mean the mean.

The mean is the number we're talking about when we "even out" a bunch of numbers into a single number: 2 + 3 + 4 equals 9. Divide that total by 3 - the number of numbers in that set - and you get the mean: 3.
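If you'd like to see that "evening out" as code, here is a minimal Python sketch using the same three numbers. The variable names are just for illustration; the only library call is mean from Python's standard statistics module.

```python
from statistics import mean

numbers = [2, 3, 4]

total = sum(numbers)             # add everything up: 9
average = total / len(numbers)   # divide by how many numbers there are: 3.0

# The standard library's mean() does the same two steps for us.
assert average == mean(numbers)
print(average)
```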

But then you hear people make that joke about "almost half the people being below average" - and that's not the mean any more. That's a different average. It's the median - the number in the middle. It comes from the Latin word for "in the middle", just like the word medium. That's why we call the line that runs down the middle of a road the median strip, too.

If the numbers in a group are all pretty close to each other - like our example here, or, say, the ages of everyone in a class at school - then there's not much difference between the mean and median.

But if the numbers in a group are wildly far apart - the ages of the people who like Star Wars movies, for example, or whose favorite singer is Frank Sinatra - then it can make a very big difference. Even if Strangers In The Night had enough of a resurgence to drag the average age of Ol' Blue Eyes listeners down, the big Sinatra fan base would still skew older!
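Here is a small sketch of that difference in Python. The ages below are invented purely for illustration (they are not real data about any class or fan base): one group is bunched together, the other has a single much older member.

```python
from statistics import mean, median

# Made-up ages, for illustration only.
class_ages = [15, 16, 16, 16, 17]   # everyone close together
fan_ages = [9, 10, 12, 14, 70]      # one fan far older than the rest

for label, ages in [("class", class_ages), ("fans", fan_ages)]:
    print(label, "mean:", mean(ages), "median:", median(ages))
```

With these invented numbers, the class has a mean and a median of 16. The fan group has a mean of 23 but a median of only 12: one 70-year-old pulls the mean up, while the middle value stays put.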

How far apart numbers in a dataset are spread from each other is called variance: if the numbers bunch up in the middle, the variance is small. And understanding or dealing with variance is where we start to head in the direction of, well, sort of means of means.

The distance of a piece of data from the group's mean is a great standard way to measure the spread. This is called the deviation from the mean. A measure called the standard deviation from the mean will be bigger when the numbers are more spread out. When the data follow a roughly bell-shaped curve, about two-thirds of results (around 68%) fall within 1 standard deviation (SD) of the mean, and about 95% fall within 2 standard deviations.
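A short Python sketch can make the idea of deviations and the SD more concrete. It assumes the sample version of the standard deviation (the one that divides by n - 1), which is what stdev in Python's statistics module computes. The helper name deviations_from_mean and the second list are made up for this example.

```python
from statistics import mean, stdev

def deviations_from_mean(values):
    """Return how far each value sits from the group's mean."""
    m = mean(values)
    return [v - m for v in values]

tight = [2, 3, 4]   # the example from earlier in the post
wide = [1, 3, 5]    # same mean, but more spread out (invented)

print(deviations_from_mean(tight))   # [-1, 0, 1]
print(deviations_from_mean(wide))    # [-2, 0, 2]

# The more spread-out list has the bigger standard deviation.
print(stdev(tight), stdev(wide))
```

Both lists have a mean of 3, but the SD of the second list is twice the SD of the first, because its numbers sit twice as far from the mean.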

From here, it's a hop, skip, and a jump to another calculation based on the mean that you often come across in health studies. It's a way to standardize the differences in means (average results) called the standardized mean difference (SMD).

The SMD needs to be used when outcomes have been measured in similar, but different, ways in groups that researchers are comparing.

For example, there are several scales used to measure fatigue in people with cancer. When researchers wanted to find out whether exercise reduces or increases fatigue for people with cancer, the clinical trials of exercise they found used different scales to measure fatigue.

To get a perspective on the results of these trials, the researchers needed a way to standardize the result from each trial, and the SMD gave them that tool. Having one standard way of seeing whether fatigue went up or down meant the study results could be combined and compared. (The answer? Exercise reduces fatigue in people with cancer.)

There's a lot you can make sense of when you know what the means mean!

The SMD is calculated by dividing the difference between the means of two groups by the standard deviation of the outcome (often one pooled across both groups). You can read more about the SMD in meta-analyses here, and more on standard deviations here at Statistically Funny.
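To show why this helps when trials use different scales, here is a minimal Python sketch. It assumes one common version of the SMD: the difference in means divided by a pooled sample standard deviation (often called Cohen's d). Meta-analyses frequently apply further small-sample adjustments that are not shown here. The function name and all of the fatigue scores are invented for illustration; they are not from the exercise trials.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treatment, control):
    """Difference in means divided by the pooled sample SD (assumed formula)."""
    n1, n2 = len(treatment), len(control)
    pooled_variance = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)

# Hypothetical trial A: fatigue scored on a 0-10 scale.
trial_a = standardized_mean_difference([3, 4, 5], [5, 6, 7])

# Hypothetical trial B: same pattern, but scored on a 0-100 scale.
trial_b = standardized_mean_difference([30, 40, 50], [50, 60, 70])

print(trial_a, trial_b)
```

The raw differences in means look very different (2 points versus 20 points), but both made-up trials give the same SMD of -2, because each difference is expressed in units of that trial's own standard deviation. A negative value here means lower fatigue scores in the exercise group.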

Feel like testing your knowledge of the mean, median, and mode? (The mode is the number in a set that occurs the most often: so if our example had been 2 + 3 + 4 + 4, then the mode would have been 4.) Try the Khan Academy quiz.
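If you want to check your answers, here is a quick Python sketch of all three averages for that four-number example, using the standard statistics module.

```python
from statistics import mean, median, mode

numbers = [2, 3, 4, 4]

print("mean:", mean(numbers))       # 13 divided by 4
print("median:", median(numbers))   # halfway between the two middle numbers, 3 and 4
print("mode:", mode(numbers))       # the number that occurs most often
```

For this set, the mean is 3.25, the median is 3.5, and the mode is 4: three different "averages" from the same four numbers.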

Interested in the ancient roots of averages? Examples from Herodotus, Thucydides, and Homer are here.