The average of a data set, formally called the arithmetic mean,
expresses the typical value in that data set. It is calculated by adding
up all of the values and dividing the sum by the size of the data set.
For example, suppose four friends take a quiz. Alice and Bob each get 3
questions right, Carol gets 4 right, and Dan gets 5 right. To calculate
the mean, do the following:
sum = 3 + 3 + 4 + 5 = 15
size = 4
mean = 15 ÷ 4 = 3.75
In the example above, 3.75 expresses the typical quiz score. Note
that none of the friends actually scored 3.75, which is not even
possible since a quiz score must be a whole number. Instead, the mean
summarizes how the group scored on the quiz overall.
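The quiz calculation above can be sketched in Python. This is a minimal
illustration, not a definitive implementation: the function name `mean`
and the variable `quiz_scores` are chosen here for the example, and the
choice to raise an error for an empty data set is an assumption.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        # Assumption: treat an empty data set as an error, since the mean is undefined.
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Alice, Bob, Carol, and Dan's quiz scores from the example.
quiz_scores = [3, 3, 4, 5]
print(mean(quiz_scores))  # 3.75
```

Python's standard library also provides `statistics.mean`, which gives
the same result for this data.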
Now let's say you want to get a general idea of how much money a group
of friends has. Alice and Bob each have a net worth of $150,000, Carol
is worth $75,000, Dan is worth $125,000, and Warren is worth $60
million. The mean is $12.1 million, which suggests that a typical friend
is very wealthy, when really most of them have a modest net worth while
one is extremely wealthy. For data sets that have extreme outliers, it
is better to calculate the median, which is the central value in that
data set. It is calculated by sorting the values and taking the middle
value (or, when the number of values is even, the mean of the two middle
values):
sorted = $75K, $125K, $150K, $150K, $60M
median = $150K
In the example above, $150,000 is the central value, which is a better
representation of how much money the group of friends has. Even if
Warren were worth $600 million, the median would remain the same. The
median is not skewed by extreme outliers.
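The net worth example can be sketched in Python as well. This is a
minimal illustration under stated assumptions: the function name
`median` and the variable names are chosen here for the example, and
the handling of an even number of values (taking the mean of the two
middle values) follows the usual convention rather than anything in the
example itself, which has an odd number of values.

```python
def median(values):
    """Return the central value of the data set after sorting."""
    if not values:
        # Assumption: treat an empty data set as an error.
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: use the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Alice, Bob, Carol, Dan, and Warren's net worths in dollars.
net_worths = [150_000, 150_000, 75_000, 125_000, 60_000_000]
print(sum(net_worths) / len(net_worths))  # 12100000.0 (the mean)
print(median(net_worths))                 # 150000

# Raising Warren's net worth to $600 million leaves the median unchanged.
richer = [150_000, 150_000, 75_000, 125_000, 600_000_000]
print(median(richer))                     # 150000
```

Python's standard library also provides `statistics.median`, which
follows the same convention for an even number of values.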