Sunday, November 29, 2015

More Than Average Confusion About What Mean Means Mean

Cartoon about what people mean when they say average

She's right: on average, when people talk about "average" for a number, they mean the mean.

The mean is the number we're talking about when we "even out" a bunch of numbers into a single number: 2 + 3 + 4 equals 9. Divide that total by 3 - the number of numbers in that set - and you get the mean: 3.
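If it helps to see that arithmetic spelled out, here is a minimal Python sketch using the same three numbers. The variable names are just for illustration.

```python
# The three numbers from the example above
numbers = [2, 3, 4]

# Add them up, then divide by how many numbers there are
total = sum(numbers)          # 2 + 3 + 4 = 9
mean = total / len(numbers)   # 9 / 3

print(mean)  # 3.0
```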

But then you hear people make that joke about "almost half the people being below average" - and that's not the mean any more. That's a different average. It's the median - the number in the middle. It comes from the Latin word for "in the middle", just like the word medium. That's why we call the line that runs down the middle of a road the median strip, too.

If the numbers in a group are all pretty close to each other - like our example here, or, say, the ages of everyone in a class at school - then there's not much difference between the mean and median.

But if the numbers in a group are wildly far apart - the ages of the people who like Star Wars movies, for example, or whose favorite singer is Frank Sinatra - then it can make a very big difference. Even if Strangers In The Night had enough of a resurgence to drag the average age of Ol' Blue Eyes listeners down, the big Sinatra fan base would still skew older!
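Here is a rough sketch of that contrast in Python. Both lists of ages are made up purely for illustration (they are not real data about any class or fan base), and the `statistics` module is part of Python's standard library.

```python
import statistics

# Hypothetical ages: a school class, all close together
class_ages = [14, 15, 15, 15, 16]

# Hypothetical ages: fans of a movie series, wildly far apart
fan_ages = [8, 9, 10, 11, 12, 45, 70]

# In the tightly bunched group, mean and median agree
print(statistics.mean(class_ages), statistics.median(class_ages))  # 15 15

# In the spread-out group, a couple of older fans pull the mean
# well above the median (the number in the middle)
print(round(statistics.mean(fan_ages), 1))  # 23.6
print(statistics.median(fan_ages))          # 11
```

In the second group, the "average" fan is either about 24 or 11, depending on which average you mean.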

How far apart numbers in a dataset are spread from each other is called variance: if the numbers bunch up in the middle, the variance is small. And understanding or dealing with variance is where we start to head in the direction of, well, sort of means of means.

The distance of a piece of data from the group's mean is a great standard way to measure the spread. This is called the deviation from the mean. A measure called the standard deviation from the mean will be bigger when the numbers are more spread out. Lots of results will cluster within 1 standard deviation (SD) of the mean, and most will be within 2 standard deviations. For data that follow a bell-shaped curve, that's roughly 68% within 1 SD and 95% within 2.
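To make the standard deviation a little more concrete, here is a small sketch that works it out step by step. It assumes the "population" version of the formula (dividing by the number of values); the version often used for samples divides by one less than that. The function name `population_sd` and the second list of numbers are made up for illustration.

```python
import math

def population_sd(values):
    """Standard deviation, treating the values as the whole group."""
    mean = sum(values) / len(values)
    # Deviation of each value from the mean, squared so negatives don't cancel
    squared_deviations = [(v - mean) ** 2 for v in values]
    # The variance is the mean of those squared deviations
    variance = sum(squared_deviations) / len(values)
    # The standard deviation is the square root of the variance
    return math.sqrt(variance)

bunched = [2, 3, 4]   # the example set: close to the mean of 3
spread = [1, 3, 5]    # same mean of 3, but further apart

print(round(population_sd(bunched), 2))  # 0.82
print(round(population_sd(spread), 2))   # 1.63
```

Both sets have a mean of 3, but the more spread-out one has the bigger standard deviation.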

From here, it's a hop, skip, and a jump to another calculation based on the mean that you often come across in health studies. It's a way to standardize the differences in means (average results) called the standardized mean difference (SMD).

The SMD needs to be used when outcomes have been measured in similar, but different, ways - different rating scales for the same symptom, say - in groups that researchers are comparing.

There's a lot you can make sense of when you know what the means mean!

The SMD is calculated by dividing the difference between the means of two groups by a standard deviation (often one pooled from both groups). You can read more on standard deviations here at Statistically Funny.
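As a rough sketch only: one common version of the SMD divides the difference in means by a standard deviation pooled from both groups (often called Cohen's d). That choice is an assumption here, since there are other variants, and the function name and scores below are invented for illustration.

```python
import math
import statistics

def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    # Pool the two variances, weighting each by its degrees of freedom
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical scores on a 0-20 scale
treatment = [12.0, 14.0, 16.0, 18.0, 20.0]
control = [10.0, 12.0, 14.0, 16.0, 18.0]
smd = standardized_mean_difference(treatment, control)
print(round(smd, 2))  # 0.63

# The same results on a scale ten times bigger give the same SMD
treatment_x10 = [score * 10 for score in treatment]
control_x10 = [score * 10 for score in control]
smd_x10 = standardized_mean_difference(treatment_x10, control_x10)
print(round(smd_x10, 2))  # 0.63
```

The second calculation is the point of standardizing: the raw difference in means jumps from 2 to 20, but the SMD stays put, which is what lets results measured on different scales be compared.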

Feel like testing your knowledge of the mean, median, and mode? (The mode is the number in a set that occurs the most often: so if our example had been 2 + 3 + 4 + 4, then the mode would have been 4.) Try the Khan Academy quiz.
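For anyone who wants to check the three averages side by side before trying a quiz, here is a quick sketch with Python's standard `statistics` module, using the 2, 3, 4, 4 example.

```python
import statistics

numbers = [2, 3, 4, 4]

mean = statistics.mean(numbers)      # (2 + 3 + 4 + 4) / 4
median = statistics.median(numbers)  # halfway between the middle two: 3 and 4
mode = statistics.mode(numbers)      # the value that turns up most often

print(mean, median, mode)  # 3.25 3.5 4
```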

Interested in the ancient roots of averages? There are examples from Herodotus, Thucydides, and Homer here (very academic).

Note: Edited to address broken links, on November 6, 2022.

Hilda Bastian