The mean, median and mode are three useful measures that tell us about a data set. The mean and the median tell us about the ‘central’ part of the data, and the mode tells us about the most commonly occurring value in the data.


However, a typical data set also has other characteristics. For instance, look at these two simple data sets:

| Data set 1 | Data set 2 |
| --- | --- |
| 0, 5, 10, 15, 15, 20, 25, 30 | 12, 13, 14, 15, 15, 16, 17, 18 |

The mean of both data sets is 15 (check it if you don’t
believe me). The median of both sets is 15. The mode of both sets is 15 as
well. But if we just look at the data sets ourselves, we can see that they are
very different, even though they have the same mean, median and mode. Data set
1 is a lot more *spread out* than data set 2. To help describe how much
the values in a data set are spread out, there are some *measures of spread* we
can use.
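If you would rather check the claim about the mean, median and mode than take it on trust, here is a minimal sketch using Python's standard `statistics` module. The variable names `data_set_1` and `data_set_2` are just labels chosen for this example.

```python
from statistics import mean, median, mode

# The two example data sets from the table above.
data_set_1 = [0, 5, 10, 15, 15, 20, 25, 30]
data_set_2 = [12, 13, 14, 15, 15, 16, 17, 18]

for name, data in (("Data set 1", data_set_1), ("Data set 2", data_set_2)):
    # Each data set should report 15 for all three measures.
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

All three measures agree for both sets, which is exactly why they cannot tell us how spread out the values are.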

### Range

The range is a very easy measure of spread to understand – it’s the difference between the smallest value and the largest value. For our sample data sets, the range can be calculated like this:

$$\text{Range of data set 1} = 30 - 0 = 30$$

$$\text{Range of data set 2} = 18 - 12 = 6$$

As you can see, they have very different ranges.
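The range calculation can be sketched in a couple of lines of Python. The function name `value_range` is made up for this example (it avoids clashing with Python's built-in `range`).

```python
def value_range(data):
    """Return the difference between the largest and smallest values."""
    return max(data) - min(data)


data_set_1 = [0, 5, 10, 15, 15, 20, 25, 30]
data_set_2 = [12, 13, 14, 15, 15, 16, 17, 18]

print(value_range(data_set_1))
print(value_range(data_set_2))
```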

### Quarters and quartiles

You can split the data up into *quarters* by arranging
it in order of value, and then dividing it up into *four* equally sized
groups. For instance, the first data set could be divided up into quarters
this way: (0, 5), (10, 15), (15, 20), (25, 30).

*Quartiles* are different to quarters. Quartiles are
the values *between* the quarters. There are two commonly talked about
quartiles, the *upper quartile* and the *lower quartile*. The lower
quartile is the value *one quarter* of the way up the values, and the
upper quartile is the value *three quarters* of the way up the values.
The value *one half* of the way up the values is just the median, which
we’ve looked at before.

Because we have 8 values, there is no value exactly one
quarter of the way up the values. So to work out the lower quartile, we need
to take the *average* of the values just below a quarter of the way up (5)
and just above a quarter of the way up (10). This gives us a lower quartile
value of 7.5. The symbol for the lower quartile is often written as $Q_1$.

The same goes for the upper quartile: we have no value exactly
three quarters of the way up, so we’re going to have to take an average. The
two values are 20 and 25, so the upper quartile has a value of 22.5. The
symbol for the upper quartile is often written $Q_3$. What about $Q_2$,
you may ask? Well, $Q_2$ is the symbol for the *median* value.

### Interquartile range

The interquartile range is the difference between the upper quartile and the lower quartile. For our example, this is:

$$\text{Interquartile range} = Q_3 - Q_1 = 22.5 - 7.5 = 15$$
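The quartiles and the interquartile range can be sketched in Python as follows. This assumes the "median of each half" approach, which matches the averaging done above for 8 values. The function names `quartiles` and `interquartile_range` are invented for this example, and leaving out the middle value when the number of values is odd is an assumed convention (textbooks and software differ on that point, so other tools may give slightly different quartiles).

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) by taking the median of each half of the sorted data."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    # Assumed convention: with an odd count, the middle value is in neither half.
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)


def interquartile_range(data):
    """Return the upper quartile minus the lower quartile."""
    q1, _, q3 = quartiles(data)
    return q3 - q1


data_set_1 = [0, 5, 10, 15, 15, 20, 25, 30]
q1, q2, q3 = quartiles(data_set_1)
print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("Interquartile range:", interquartile_range(data_set_1))
```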

### Deviations

The word deviate means to stray or to differ – in a mathematical sense the word ‘deviation’ describes how much the values in a data set differ from the ‘central’ value.

### Mean deviation

The *mean deviation* is the average difference between
the values in the data set and the mean of the entire data set. So say we were
finding the mean deviation for data set 1. We’d need to follow this procedure:

· Find the mean of the entire data set.

· Find the difference between every value and the data set mean, as a *positive number*.

· Add up all these differences and then divide by the number of values.

$$\text{Mean} = \frac{0 + 5 + 10 + 15 + 15 + 20 + 25 + 30}{8} = \frac{120}{8} = 15$$

Once we’ve found the mean of the entire data set, we need
to find all the differences. Whether the value is larger or smaller than the
mean, we need to give the difference as a positive number. Think about it
this way – we don’t care that much whether a value is above or below the mean,
we just care how *far away* it is from the mean. We can always have a
positive number by picking the larger of the mean and the value, and
subtracting the other number from it. For instance, for the value ‘0’, the
difference from the mean can be found like this:

$$\text{Difference} = 15 - 0 = 15$$

| Value | Difference from mean |
| --- | --- |
| 0 | 15 |
| 5 | 10 |
| 10 | 5 |
| 15 | 0 |
| 15 | 0 |
| 20 | 5 |
| 25 | 10 |
| 30 | 15 |

The last step is to add all these differences up and divide by the number of values, which is 8:

$$\text{Mean deviation} = \frac{15 + 10 + 5 + 0 + 0 + 5 + 10 + 15}{8} = \frac{60}{8} = 7.5$$
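The three-step procedure for the mean deviation can be sketched in Python like this. The function name `mean_deviation` is made up for this example. It uses `abs()` to get each difference as a positive number, which gives the same result as subtracting the smaller number from the larger one.

```python
def mean_deviation(data):
    """Return the average absolute difference between each value and the mean."""
    # Step 1: find the mean of the entire data set.
    data_mean = sum(data) / len(data)
    # Step 2: find each difference from the mean as a positive number.
    differences = [abs(value - data_mean) for value in data]
    # Step 3: add up the differences and divide by the number of values.
    return sum(differences) / len(differences)


data_set_1 = [0, 5, 10, 15, 15, 20, 25, 30]
data_set_2 = [12, 13, 14, 15, 15, 16, 17, 18]

print(mean_deviation(data_set_1))
print(mean_deviation(data_set_2))
```

Data set 2 should come out with a much smaller mean deviation than data set 1, matching the impression that it is less spread out.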

So which is a better way to indicate how spread out data
is? Well, the range is quick to calculate, but the mean deviation is a more *robust*
measure. By robust, we mean that it is not too affected by one single very
large or very small value. For instance, take the following data set:

10, 11, 12, 243

The range for this data set is 233, while the mean deviation is 87. The mean deviation is less affected by the single large value than the range is, and hence is smaller. This is generally a good thing.
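To check the figures for this last data set, here is a short self-contained sketch. The variable and function names are invented for this example.

```python
def value_range(data):
    """Return the largest value minus the smallest value."""
    return max(data) - min(data)


def mean_deviation(data):
    """Return the average absolute difference from the mean."""
    data_mean = sum(data) / len(data)
    return sum(abs(value - data_mean) for value in data) / len(data)


# Three values close together plus one very large value.
outlier_data = [10, 11, 12, 243]

print("Range:", value_range(outlier_data))
print("Mean deviation:", mean_deviation(outlier_data))
```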