starting-statistics_6
Standardising Numbers
Chapter Overview
Recall that the term standardisation refers to changing original values to make them easier to understand and
compare. Chapter 5 looked at ways to standardise categorical data. This chapter focuses on standardising
numerical data. Measuring with numbers usually means that there are many more unique values than occur
with categorical variables, where you place the units of analysis into only a handful of categories. In addition,
you can manipulate numbers in ways that are impossible with named categories. The chapter starts with
simplifying original values, then recaps and develops some earlier comments about using ranks, and finally
looks at two more specialised techniques, standard scores and indexes.
This section looks at two often interrelated techniques, conversion of units and rounding. The basic aim is to
make complex numbers easier to understand, use, and remember, ideally by standardising them so that they
come within the range of 1 to 100. The downside is that each value becomes less precise, though often this
may not matter as we usually need a headline figure (e.g. ‘Man wins $1 million’) rather than an exact figure
(e.g. ‘Man wins $1,020,407’).
The Number column of Table 6.1 shows the number of barrels of oil consumed every day in different parts
of the world. These values are too large to mean very much to the average reader. To reduce the size of the
numbers, the next column shows oil consumption in thousands of barrels. This helps, but the numbers are
still too large, and so the next column shows millions of barrels. This is better, the values ranging from about
20 to less than 1. However, each value has far too many digits for most people to deal with comfortably. As
this level of detail is not necessary for reading the table, it's useful to round off each value.
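The conversion step can be sketched in a few lines of Python. This is only an illustration: the function name `convert` is made up for the example, and the two barrel counts are the US and New Zealand figures quoted in this section (20.6800 and 0.1584 million barrels).

```python
# Daily oil consumption in barrels, written out in full.
# These are the US and New Zealand figures quoted in the text.
BARRELS_PER_DAY = {"USA": 20_680_000, "New Zealand": 158_400}


def convert(barrels: int, unit_size: int) -> float:
    """Express a barrel count in a larger unit (e.g. thousands or millions)."""
    return barrels / unit_size


for country, barrels in BARRELS_PER_DAY.items():
    thousands = convert(barrels, 1_000)
    millions = convert(barrels, 1_000_000)
    print(f"{country}: {barrels:,} barrels = {thousands:,.1f} thousand = {millions:.4f} million")
```

Dividing by a larger unit shortens the number without changing what it measures; only the label on the column changes.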
Table 6.1 Converting and rounding units: oil consumption, barrels per day
Table 6.1 shows two columns of rounded data. The fixed rounding column shows the millions of barrels of oil
rounded to whole numbers. You do this as follows:
• If the digit immediately after the decimal point is a 0, 1, 2, 3, or 4, then the digit before the decimal
point remains the same (e.g. 14.3800 rounds to 14).
• If the digit immediately after the decimal point is a 5, 6, 7, 8, or 9, then the digit before the decimal
point goes up (e.g. 20.6800 rounds to 21).
This is fixed rounding because the rounding rules are fixed regardless of the size of the value. The fixed
rounding values are clearer than the unrounded values, but there are still problems. For example, the
rounding is too crude for countries in the bottom half of the list, especially New Zealand, which has a
nonsensical value of 0.
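If you want to try fixed rounding yourself, the rule above can be sketched in Python. The function name `fixed_round` is an assumption made for this example. Values are passed as text so that the digits are read exactly as written, and the sketch uses the `decimal` module because Python's built-in `round()` sends exact halves to the nearest even number (`round(2.5)` gives 2), which is not the rule described here.

```python
from decimal import Decimal, ROUND_HALF_UP


def fixed_round(value: str) -> int:
    """Round to a whole number: a first decimal of 0-4 stays, 5-9 goes up."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for original in ["14.3800", "20.6800", "0.1584"]:
    print(original, "rounds to", fixed_round(original))
```

The last value is the New Zealand figure, which fixed rounding reduces to 0.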
Another approach is to use variable rounding, as shown in the final column of Table 6.1. (The term variable
rounding contrasts with fixed rounding.) In variable rounding, you usually round each value to the first two
(occasionally three) digits, regardless of the number of digits. For example, the US value is 20.6800. Focus
on the first two digits (20) and round up only when the following digit is a 5, 6, 7, 8, or 9. Thus, round up
20.6800… to 21. So far, variable rounding doesn't seem much different from fixed rounding. But when you
use the same technique with the smaller values, you can see the difference between the two techniques. For
example, the New Zealand value is 0.1584. Again, focus on the first two digits (15) and round up only if the
following digit is a 5, 6, 7, 8, or 9. Thus, round 1584… up to 16 because the third digit in the original value is
8. The New Zealand value in the table is thus 0.16. Similarly, the India value (2.7220) rounds down to 2.7,
and the Russia value (2.6990) rounds up to 2.7. The advantage of variable rounding is that it leaves some
detail of the original value regardless of its absolute size.
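Variable rounding is what is often called rounding to a set number of significant digits. A minimal Python sketch, assuming the hypothetical function name `variable_round` and values supplied as text, might look like this:

```python
from decimal import Decimal, ROUND_HALF_UP


def variable_round(value: str, digits: int = 2) -> Decimal:
    """Keep the first `digits` digits of a value, whatever its size."""
    number = Decimal(value)
    if number == 0:
        return number
    # adjusted() gives the power of ten of the first non-zero digit:
    # 1 for 20.6800 (tens), 0 for 2.7220 (units), -1 for 0.1584 (tenths).
    exponent = number.adjusted() - digits + 1
    # Round at that position, sending a following 5-9 upwards.
    return number.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)


for original in ["20.6800", "0.1584", "2.7220", "2.6990"]:
    print(original, "rounds to", variable_round(original))
```

The rounding position moves with the size of the value, which is why the New Zealand figure keeps two meaningful digits instead of collapsing to 0.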
The values produced by variable rounding shown in Table 6.1 are a big improvement on the original number
of barrels. They show just how useful it can be to standardise data through: (i) the conversion of units; and (ii)
rounding. These simple standardisation techniques make the information as easy as possible to understand,
use, and remember.
One downside of converting and rounding is that we can lose sight of the absolute size
of the numbers we're looking at. For example, the CIA's World Factbook tells us that the
world is consuming 85.22 million barrels of oil per day. Our eyes can scan this figure
perhaps too quickly. The longer, original value (85,220,000) is more impressive. And this is
the number of barrels per day. Looked at on a yearly basis, the world consumption figure
is a staggering 31,105,300,000 barrels. The unit of measurement is an oil barrel, and
most people have little idea of the size of an oil barrel. The annual world oil consumption
figure becomes even more staggering when measured in units that are more familiar:
1,087,823,800,000 UK gallons, 1,306,422,600,000 US gallons, and 4,945,347,600,000
litres!
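The arithmetic behind these figures can be checked with a short Python sketch. The conversion factors are the standard definitions (an oil barrel is 42 US gallons; a US gallon is 3.785411784 litres; a UK gallon is 4.54609 litres) and are not taken from the text, so the last digits of the gallon and litre totals may differ slightly from the rounded figures above.

```python
BARRELS_PER_DAY = 85_220_000        # world consumption quoted in the text
DAYS_PER_YEAR = 365
US_GALLONS_PER_BARREL = 42          # standard definition of an oil barrel
LITRES_PER_US_GALLON = 3.785411784  # standard definition
LITRES_PER_UK_GALLON = 4.54609      # standard definition

barrels_per_year = BARRELS_PER_DAY * DAYS_PER_YEAR
us_gallons = barrels_per_year * US_GALLONS_PER_BARREL
litres = us_gallons * LITRES_PER_US_GALLON
uk_gallons = litres / LITRES_PER_UK_GALLON

print(f"Barrels per year: {barrels_per_year:,}")
print(f"US gallons:       {us_gallons:,}")
print(f"UK gallons:       {uk_gallons:,.0f}")
print(f"Litres:           {litres:,.0f}")
```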
Undoubtedly, the most widely used standardisation procedure for numerical variables is simply to rank
individuals from best to worst or highest to lowest. It is no surprise that one of the simplest techniques is also
the most widely used. Chapter 4 looked at ranking, using the data shown again here as Table 6.2, which
ranks movies in terms of their box office takings (i.e. the value of all cinema tickets sold). Titanic is ranked
number 1, taking $1840 million at the box office to be the most popular movie ever. And Lord of the Rings:
The Return of the King is ranked number 2, taking $1130 million.
Table 6.2 Ranks: movies, by worldwide box office takings
1 Box office takings only, in millions of US dollars. Figures not adjusted for inflation.
When dealing with a large number of values, often you don't need the precise ranking of an individual. In
these circumstances, use percentile ranks by grouping the ranked values into 100 equal parts and identifying
each part by its rank. The lowest 1% of values has a percentile rank of 1; the highest 1% of values has a
percentile rank of 100. For example, imagine the numbers 1 to 1000 in a long line. To divide the 1000 numbers
into 100 equal parts, you need 10 numbers in each part. The bottom 1% of values (values 1 to 10) have a
percentile rank of 1, the second lowest 1% of values (values 11 to 20) have a percentile rank of 2, and so on
up to the top 1% of values (values 991 to 1000) which have a percentile rank of 100. For example, Table 6.3
shows that the value 17 has a percentile rank of 2, and the value 985 has a percentile rank of 99.
Table 6.3 Percentile ranks: a thousand values from 1 to 1000
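The grouping into 100 equal parts can be sketched in Python as follows. The function name `percentile_rank` and its arguments are assumptions made for this example; `position` counts up from the lowest value (position 1). The sketch follows the grouping rule described for the numbers 1 to 1000 and divides evenly only when the total is a multiple of 100.

```python
def percentile_rank(position: int, total: int) -> int:
    """Return the percentile group (1 to 100) for a ranked position."""
    if not 1 <= position <= total:
        raise ValueError("position must lie between 1 and total")
    # Whole-number ceiling of position * 100 / total, so that with 1000
    # values positions 1-10 fall in group 1, 11-20 in group 2, and so on.
    return (position * 100 + total - 1) // total


for position in [10, 11, 17, 985, 991, 1000]:
    print(position, "has percentile rank", percentile_rank(position, 1000))
```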
For example, if your score ranks 8590th, counting up from the lowest, in a public exam involving a total of
8760 students, your percentile rank is:
$$\text{percentile rank} = \frac{8590}{8760} \times 100 \approx 98$$
If you have a percentile rank of 98, then 98% of examinees have a score equal to or less than your score.
You'd probably prefer to turn that around and say that only 2% of students have higher scores.
Percentiles divide a set of ranked data into 100 equal parts. You may also come across other similar systems
for dividing a ranked set of values. For example, Chapter 2 looked at quartiles, which divide ranked values
into four equal parts (each with 25% of all values).
Similarly, quintiles divide ranked values into five equal parts (each with 20% of all values), and deciles divide
ranked values into ten equal parts (each with 10% of all values). You often see quintiles and deciles used to
describe the distribution of wealth and income. Table 6.4 shows data collected by various national statistical
agencies about how household income varies between quintile groups. For example, in the USA, households
in the poorest quintile (the poorest 20% of households) have 3% of all income; households in the richest
quintile have 51% of all income. As each quintile includes 20% of households, the more the income figures
differ from a 20/20/20/20/20 split, the more unequal is the distribution of income. Look out for percentiles,
quartiles, quintiles, and deciles in your reading.
Table 6.4 Quintiles: households, by income quintiles
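To see how figures like these are produced, here is a minimal Python sketch. The function name `quintile_shares` and the ten household incomes are made up for illustration; they are not the data behind Table 6.4.

```python
def quintile_shares(incomes: list[float]) -> list[float]:
    """Percentage of total income held by each fifth, poorest fifth first."""
    if len(incomes) % 5 != 0:
        raise ValueError("this sketch needs a household count divisible by 5")
    ranked = sorted(incomes)
    size = len(ranked) // 5
    total = sum(ranked)
    return [100 * sum(ranked[i * size:(i + 1) * size]) / total for i in range(5)]


# Hypothetical incomes for ten households (illustrative only).
incomes = [250, 10, 100, 40, 200, 20, 70, 150, 60, 100]
print(quintile_shares(incomes))
```

For these invented incomes the split is 3/10/17/25/45, well away from an equal 20/20/20/20/20.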
Standard Scores
Ranks are easy to calculate and easy to understand. However, when standardising with ranks, you lose a lot
of the information contained in the original values. To use some of the jargon from Chapter 2, you go from
Starting Statistics: A Short, Clear Guide
SAGE SAGE Research Methods
2010 SAGE Publications, Ltd. All Rights Reserved.
an interval level of measurement to an ordinal level of measurement. For example, recall Table 6.2 about the
world's most commercially successful movies. Box office takings are at the interval level of measurement – so
called because you can measure the interval (or difference) between values. For example, Titanic took $710
million more than The Return of the King, and The Two Towers took $2 million more than Jurassic Park. But
when you simplify these dollar values to ranks, all you can now say is Titanic took more than The Return of
the King, and that The Two Towers took more than Jurassic Park. The fact that ‘more’ means $710 million in
the first pair of movies, and $2 million in the second pair, is lost in the rankings.
In contrast to ranks, standard scores (or z-scores) are not quite as straightforward to calculate or understand,
but do retain all the original information. Standard scores measure how far each value differs, or deviates,
from the mean. You then compare this deviation to the standard deviation. Look back to Chapter 2 if you need
to remind yourself about standard deviations.
For example, the 52 student marks below have a mean of 58.2 and a standard deviation of 14.3. The mean is
the starting point for standard scores and, like most starting points, it has a value of 0. The standard deviation
is the measuring unit (a ‘standard deviation unit’) against which you compare the deviation of each individual
value from the mean.
For example, a mark of 72.5 is 14.3 above the mean (72.5 − 58.2 = 14.3). As each standard deviation unit is
worth 14.3 marks, then 72.5 is one standard deviation unit above the mean. Thus, in this data set an original
value of 72.5 has a standard score of +1, the plus sign showing that the score is above the mean. Similarly,
a value of 43.9 is 14.3 below the mean (58.2 − 14.3 = 43.9), or one standard deviation unit below the mean.
Thus, as the diagram below shows, in this data set an original value of 43.9 has a standard score of −1, the
minus sign showing that the score is less than the mean:
The following points recap the above comments about standard scores:
• Original values that are the same as the mean have a standard score of 0.
• Those original values with an average deviation, or standard deviation, from the mean have a
standard score of 1 (+1 if above the mean, −1 if below).
• The bigger the deviation of an original value from the mean, the bigger its standard score.
• Original values that are more than the mean have positive standard scores.
• Original values that are less than the mean have negative standard scores.
• Calculate a standard score as follows:
$$\text{standard score} = \frac{\text{original value} - \text{mean}}{\text{standard deviation}}$$
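As a minimal Python sketch, a standard score is the deviation from the mean divided by the standard deviation. The function name `standard_score` is an assumption made for this example; the mean (58.2) and standard deviation (14.3) are those of the student marks discussed above.

```python
def standard_score(value: float, mean: float, sd: float) -> float:
    """How many standard deviation units a value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (value - mean) / sd


MEAN, SD = 58.2, 14.3
for mark in [72.5, 58.2, 43.9]:
    print(f"mark {mark}: standard score {standard_score(mark, MEAN, SD):+.1f}")
```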
Because these characteristics always apply, you can directly compare sets of standardised data. For
example, Jessica has a mark of 65 in Sociology and 70 in Psychology. Has she done better at Sociology or
Psychology? In absolute terms, she has done better in Psychology – 70 is higher than 65. But how has she
done in relation to all the other students in each course? The following figures give the information you need
to standardise the scores, and thus make a more meaningful comparison:
                        Sociology               Psychology
Jessica's mark          65                      70
Mean mark               55                      65
Standard deviation      5                       10
Standard score          (65 − 55) / 5 = +2.0    (70 − 65) / 10 = +0.5
The two plus signs show that she has done better than average in both Sociology and Psychology. But despite
the higher absolute mark in Psychology, her standard scores show that, relative to all the other students in
each course, she has done much better in Sociology (+2.0) than in Psychology (+0.5).
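The same comparison can be sketched in Python. The `Course` structure is an assumption made for this example; the marks, means, and standard deviations are Jessica's figures from above.

```python
from typing import NamedTuple


class Course(NamedTuple):
    mark: float  # Jessica's mark
    mean: float  # mean mark for all students on the course
    sd: float    # standard deviation of marks on the course


courses = {
    "Sociology": Course(mark=65, mean=55, sd=5),
    "Psychology": Course(mark=70, mean=65, sd=10),
}

# Standardise each mark against its own course before comparing.
scores = {name: (c.mark - c.mean) / c.sd for name, c in courses.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:+.1f}")
print("Best relative performance:", best)
```

Comparing the raw marks would pick Psychology; comparing the standard scores picks Sociology.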
Lecturers often standardise marks. They do this because they believe that although student ability remains
constant from year to year, tests may vary in their degree of difficulty, and staff may vary in their marking
standards. Standardisation ensures that the overall mean and spread of student marks do not vary from year to year.
For example, every year lecturers teaching a social statistics course standardise student marks to a mean of
60 and a standard deviation of 10. The actual marks this year show a mean of 65 and a standard deviation
of 5. To standardise this year's marks, the lecturers first have to convert all the original marks into standard
scores. For example, Michael has a mark of 77.5. His standard score is +2.5:
$$\text{standard score} = \frac{77.5 - 65}{5} = +2.5$$
Thus, Michael's original mark is 2.5 standard deviation units above the mean. The lecturers want to
standardise student marks to a mean of 60 and a standard deviation of 10. In this new scheme, Michael's
revised mark will still be 2.5 standard deviations more than the specified mean – in other words, 2.5 ‘lots’ of
10, or 25 marks, above 60. Thus, Michael's standardised mark is 85% (25 + 60). More generally, the lecturers
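standardise their students' marks as follows:
$$\text{standardised mark} = (\text{standard score} \times \text{specified standard deviation}) + \text{specified mean}$$
For example:
$$\text{Michael's standardised mark} = (2.5 \times 10) + 60 = 85$$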
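The two steps, converting a mark to a standard score and then placing it on the specified scale, can be sketched in Python. The function name `rescale_mark` and its parameter names are assumptions made for this example; the figures are those of the social statistics course above.

```python
def rescale_mark(mark: float, old_mean: float, old_sd: float,
                 new_mean: float, new_sd: float) -> float:
    """Move a mark from its actual distribution to a specified mean and SD."""
    if old_sd <= 0:
        raise ValueError("the original standard deviation must be positive")
    standard_score = (mark - old_mean) / old_sd   # step 1: standard score
    return new_mean + standard_score * new_sd     # step 2: specified scale


# This year's marks: mean 65, SD 5. Specified scale: mean 60, SD 10.
print(rescale_mark(77.5, old_mean=65, old_sd=5, new_mean=60, new_sd=10))
```

A student exactly on this year's mean of 65 would move to the specified mean of 60.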
Indexes
Most people like lists, and the media are full of them. For example, Forbes magazine is famous for its
lists. They include the ‘world's 100 most powerful women’, currently headed by Angela Merkel, the German
Chancellor (Egan and Schoenberger 2008), and the ‘world's 100 most powerful celebrities’, currently headed
by Angelina Jolie (Miller et al. 2009).
Of course, power is a concept that you can't measure directly, like height or eye colour. A multifaceted concept
that you can measure only by combining several simpler concepts is called a construct. Commercial list-
makers, like Forbes magazine, are usually rather vague about how they generate the lists – after all, they
don't want competitors copying them. But when you look at any research about a multifaceted concept,
always try to find as much information as you can about how the researchers have measured it. If they have
included the wrong things and/or missed out some of the right things, then the work will not be valid. In other
words, it will not measure what it is intended to measure.
To show how to measure a multifaceted concept, I'll use the familiar example of the best car awards given
by motoring organisations. The construct Best family car is not something researchers can measure directly.
Instead, they have to make sure that they take into account all the big things a typical buyer looks for. These
are likely to include value for money, safety, and drivability. These ‘big things’ are dimensions. Each dimension
then needs breaking into smaller parts. For example, ‘value for money’ will include sale price, depreciation,
fuel consumption, warranty, and so on. These more specific things are indicators. The final step is to find a
way to measure each indicator. For example, you can measure fuel consumption in miles per gallon or litres
per 100 kilometres. These are the variables. A car might have a fuel consumption of 24 miles per gallon (or
10 litres per 100 kilometres). These are the values. Table 6.5 shows the basic procedure for constructing an
index.
Table 6.5 Creating an index
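One way to picture the chain from construct to value is as a nested data structure. The Python sketch below is only an illustration: the class names are assumptions, and it records just the items named in the text, leaving the indicators for safety and drivability empty because the text does not list them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Indicator:
    name: str
    variable: Optional[str] = None  # how the indicator is measured
    value: Optional[float] = None   # one car's measurement on that variable


@dataclass
class Dimension:
    name: str
    indicators: List[Indicator] = field(default_factory=list)


@dataclass
class Construct:
    name: str
    dimensions: List[Dimension] = field(default_factory=list)


best_family_car = Construct(
    "Best family car",
    [
        Dimension(
            "Value for money",
            [
                Indicator("Sale price"),
                Indicator("Depreciation"),
                Indicator("Fuel consumption", variable="miles per gallon", value=24),
                Indicator("Warranty"),
            ],
        ),
        Dimension("Safety"),       # indicators not listed in the text
        Dimension("Drivability"),  # indicators not listed in the text
    ],
)

for dimension in best_family_car.dimensions:
    print(dimension.name, [indicator.name for indicator in dimension.indicators])
```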
France was the first country to officially adopt the metric system, during the period of the
French Revolution (1789–99). The new unit of length, the metre, was one ten-millionth of
the distance from the North Pole to the equator along a line of longitude running through
France. Today, all but three countries (USA, Liberia, Burma) have officially adopted the
metric system (see USMA 2009). French revolutionary zeal also led to a short-lived
experiment with metric time: each day had 10 hours; each hour had 100 minutes; and
each minute had 100 seconds. In 1786, the USA became the first country to use a
decimal currency. Today, all but two countries (Madagascar and Mauritania) have decimal
currencies.
Once researchers identify the variables and collect the data, the next stage is to combine values from all the
variables into a single figure. This is an index. To do this, all the values need to be in the same units. But
it's very likely that the variables will be in very different units. For example, you might measure the variable
Warranty by number of years (e.g. 3) and the variable Fuel consumption in miles per gallon (e.g. 24 mpg).
However, it does not make sense simply to add the two values (e.g. 3 years + 24 mpg does not equal 27!).
The simplest way to combine values from variables measured in very different units is to rank all cars on each
variable, and then add the rank scores to provide an overall index. For example, Table 6.6 shows how eight
cars (A to H) compare on the construct Best family car, measured using three dimensions, each dimension
measured by three variables. Values for each variable have been standardised using ranks.
For example, Variable 1 is Retail price. The Variable 1 column in Table 6.6 shows that Car C scores best,
and so ranks 1, Car F scores second best and ranks 2, and so on. Similarly, Variable 2 is Repair costs. The
Variable 2 column in Table 6.6 shows that Car H scores best, and so ranks 1, Car G scores second best
and ranks 2, and so on. Because all the values in the body of the table are ranks, you can legitimately add
together the ranks for each car in all nine columns. For example, Car A ranks third for Variable 1, fourth for
Variable 2, third for Variable 3, and so on, giving a total score of 35. This is the index score for that car. It
shows how well it has done overall relative to the other cars.
However, the index scores themselves are usually much less important than the rankings of the index scores,
which are in the final column of Table 6.6. Because of the ranking procedure, the best score is the lowest one.
In this example the best possible score is 9, which occurs only if a car is ranked first for all nine variables.
And the worst possible score is 72, the result of a car being last (i.e. ranked eighth) for all nine variables. The
final column of Table 6.6 shows that Car C takes the Best family car award, closely followed by Car G.
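The ranking procedure can be sketched in Python. Everything in the example data is hypothetical: three cars and three variables are used instead of the eight cars and nine variables of Table 6.6, the function names are made up, and the sketch assumes no two cars tie on any variable.

```python
def rank_cars(values: dict[str, float], higher_is_better: bool) -> dict[str, int]:
    """Rank cars on one variable; rank 1 is best. Assumes no tied values."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {car: position for position, car in enumerate(ordered, start=1)}


def rank_sum_index(data: dict[str, dict[str, float]],
                   higher_is_better: dict[str, bool]) -> dict[str, int]:
    """Add each car's ranks across all variables; the lowest total is best."""
    totals: dict[str, int] = {}
    for variable, values in data.items():
        for car, rank in rank_cars(values, higher_is_better[variable]).items():
            totals[car] = totals.get(car, 0) + rank
    return totals


# Hypothetical values for three cars on three variables.
data = {
    "Retail price": {"A": 20000, "B": 18000, "C": 25000},
    "Fuel consumption (mpg)": {"A": 24, "B": 30, "C": 28},
    "Warranty (years)": {"A": 3, "B": 5, "C": 2},
}
higher_is_better = {
    "Retail price": False,            # cheaper is better
    "Fuel consumption (mpg)": True,   # more miles per gallon is better
    "Warranty (years)": True,         # a longer warranty is better
}

index = rank_sum_index(data, higher_is_better)
final_ranking = sorted(index, key=index.get)  # best (lowest total) first
print(index, final_ranking)
```

With three cars and three variables the best possible index score is 3 and the worst is 9, in the same way that the scores in Table 6.6 run from 9 to 72.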
You may have wondered about the scoring of the Best family car index because simply adding the ranks
from all nine variables assumes that all variables are equally important. This is unlikely to be true, of course.
For example, buyers might regard price as critical to their choice of car, whereas a car's quietness has much
lower importance. When developing indexes to measure constructs, researchers often survey the public to
find out how important people think each feature is. For example, new car buyers might
say that price is twice as important as quietness. Using this survey result, researchers will make scores from
the price variable twice as important as scores for the quietness variable when calculating their Best family
car index. This is called weighting.
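One simple way to apply weights, sketched here in Python, is to multiply each rank by its weight before adding. This is an assumption about how the weighting might be done, not a description of any particular organisation's method, and the two cars and their ranks are hypothetical.

```python
def weighted_index(ranks: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of rank x weight across variables; the lowest total is best."""
    return sum(weights[variable] * rank for variable, rank in ranks.items())


# Survey result from the text: price is twice as important as quietness.
weights = {"price": 2, "quietness": 1}

# Hypothetical ranks for two cars (rank 1 is best).
car_x = {"price": 1, "quietness": 3}
car_y = {"price": 3, "quietness": 1}

print("Unweighted:", sum(car_x.values()), sum(car_y.values()))
print("Weighted:  ", weighted_index(car_x, weights), weighted_index(car_y, weights))
```

Without weights the two cars tie; with price counted double, the cheaper car comes out ahead.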
A real-world example of weighting is the development of the Times Higher Education–QS World University
Rankings (THES-QS 2009). Harvard University currently heads the list. There are five dimensions to the
index: (i) academic peer review (weighted at 40% of the total index); (ii) staff–student ratio (20%); (iii) citations
per member of staff (20%); (iv) employer review (10%); and (v) proportion of international staff and students
(10%). Clearly, the choice of dimensions is open to debate, as are the weightings. When you come across
indexes in your reading, always see if the researchers have used weightings. If they haven't, then ask if they
have evidence to allow them to treat all variables as equal. If they have used weightings, then ask what
evidence the researchers have to justify their weights.
This chapter is the final one on standardisation. The next chapter, Chapter 7, introduces the idea of statistical
correlations between two variables. And the following two chapters look in more detail at correlations between
categorical variables and then numerical variables.
http://dx.doi.org/10.4135/9781446287873.n6