Lesson 4 Data Description Measures of Position-1
Measures of Position
In addition to measures of central tendency and measures of variation, there are measures
of position or location. These measures include standard scores, percentiles, deciles, and quartiles.
They are used to locate the relative position of a data value in the data set. For example, if a value is
located at the 80th percentile, it means that 80% of the values fall below it in the distribution and
20% of the values fall above it. The median is the value that corresponds to the 50th percentile,
since one-half of the values fall below it and one-half of the values fall above it.
Standard Scores
There is an old saying, “You can’t compare apples and oranges.” But with the use of
statistics, it can be done to some extent. Suppose that a student scored 90 on a music test and 45 on
an English exam. Direct comparison of raw scores is impossible, since the exams might not be
equivalent in terms of number of questions, value of each question, and so on. However, the two
scores can be compared against a relative standard common to both. This comparison uses the mean
and standard deviation and is called a standard score or z score.
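A standard score or z score tells how many standard deviations a data value is above or
below the mean for a specific distribution of values. If a standard score is zero, then the data value is
the same as the mean. In words,
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
For samples:
$$z = \frac{X - \bar{X}}{s}$$
For populations:
$$z = \frac{X - \mu}{\sigma}$$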
Example:
Test Scores
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative
positions on the two tests.
Solution
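First, find the z scores. For calculus the z score is
$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$
For history the z score is
$$z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$$
Her z score for calculus (1.5) is larger than her z score for history (1.0), so her relative position
in the calculus class is higher than her relative position in the history class.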
Note that if the z score is positive, the score is above the mean. If the z score is 0, the score is the
same as the mean. And if the z score is negative, the score is below the mean.
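The calculation in the example can be sketched in Python as below. The function name z_score is a hypothetical helper for illustration; it simply applies z = (X − mean)/standard deviation and compares the two results.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Values from the calculus/history example above.
calculus_z = z_score(65, 50, 10)
history_z = z_score(30, 25, 5)

print(f"calculus z = {calculus_z}, history z = {history_z}")  # calculus z = 1.5, history z = 1.0
if calculus_z > history_z:
    print("Relative position is higher on the calculus test.")
```

The sign of the result carries the same meaning as in the note above: positive means above the mean, zero means at the mean, negative means below it.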
Example:
Test Scores
Find the z score for each test, and state which is higher.
When all data for a variable are transformed into z scores, the resulting distribution will have
a mean of 0 and a standard deviation of 1. A z score, then, is actually the number of standard
deviations each value is from the mean for a specific distribution.
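The claim that z scores have a mean of 0 and a standard deviation of 1 can be checked numerically. This sketch assumes the sample mean and sample standard deviation (Python's statistics.stdev, which divides by n − 1) are used both to transform the data and to summarize the result; using the population versions consistently would give the same conclusion.

```python
import statistics


def standardize(values):
    """Transform every value into a z score using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(x - mean) / sd for x in values]


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]  # test scores used later in this lesson
z = standardize(scores)

# Up to floating-point rounding, the z scores have mean 0 and standard deviation 1.
print(statistics.mean(z), statistics.stdev(z))
```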
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate
the position of an individual in a group.
Percentiles divide the data set into 100 equal groups.
In many situations, the graphs and tables showing the percentiles for various measures such
as test scores, heights, or weights have already been completed. Percentiles are also used to
compare an individual’s test score with the national norm.
Percentiles are not the same as percentages. That is, if a student gets 72 correct answers
out of a possible 100, she obtains a percentage score of 72. There is no indication of her position
with respect to the rest of the class. She could have scored the highest, the lowest, or somewhere in
between. On the other hand, if a raw score of 72 corresponds to the 64th percentile, then she did
better than 64% of the students in her class.
Percentiles are symbolized by P1, P2, P3, . . . , P99.
Percentile graphs can be constructed as shown below. Percentile graphs use the same values
as the cumulative relative frequency graphs, except that the proportions have been converted to
percents.
Figure 1. Weights of Girls by Age and Percentile Rankings (Source: Distributed by Mead Johnson Nutritional Division. Reprinted with permission.)
Example:
Systolic Blood Pressure
The frequency distribution for the systolic blood pressure readings (in millimeters of mercury, mm
Hg) of 200 randomly selected college students is shown here. Construct a percentile graph.
Solution
Step 1 Find the cumulative frequencies and place them in column C.
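Step 2 Find the cumulative percentages and place them in column D. To do this step, use the
formula
$$\text{cumulative \%} = \frac{\text{cumulative frequency}}{n} \cdot 100$$
where n = 200 is the total number of students.
Step 3 Graph the data, using class boundaries for the x axis and the percentages for the y axis.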
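Steps 1 and 2 (columns C and D) can be sketched in Python. The frequency table for this example is not reproduced in this text, so the frequencies below are hypothetical values that happen to sum to 200. They are not the blood pressure data.

```python
from itertools import accumulate


def cumulative_percentages(frequencies):
    """Return (cumulative frequencies, cumulative percentages) for a frequency column."""
    n = sum(frequencies)
    cum_freq = list(accumulate(frequencies))      # column C: running totals
    cum_pct = [100 * cf / n for cf in cum_freq]   # column D: running totals as percents of n
    return cum_freq, cum_pct


# Hypothetical class frequencies (NOT the table from the example), summing to 200.
freqs = [20, 60, 70, 30, 15, 5]
cum_freq, cum_pct = cumulative_percentages(freqs)
print(cum_freq)  # [20, 80, 150, 180, 195, 200]
print(cum_pct)   # [10.0, 40.0, 75.0, 90.0, 97.5, 100.0]
```

The last cumulative percentage is always 100, which is a quick check that no class was skipped.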
Once a percentile graph has been constructed, one can find the approximate corresponding
percentile ranks for given blood pressure values and find approximate blood pressure values for
given percentile ranks. For example, to find the percentile rank of a blood pressure reading of 130,
find 130 on the x axis of the figure above, and draw a vertical line to the graph. Then move
horizontally to the value on the y axis. Note that a blood pressure of 130 corresponds to
approximately the 70th percentile. If the value that corresponds to the 40th percentile is desired,
start on the y axis at 40 and draw a horizontal line to the graph. Then draw a vertical line to the x
axis and read the value.
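Reading a percentile graph amounts to following the straight line segments between plotted points, so the same readings can be approximated by linear interpolation. This is a sketch under stated assumptions: the helper read_graph is hypothetical, and the boundaries and cumulative percentages are invented for illustration (they are not read from the blood pressure figure). Swapping the two lists answers the reverse question, exactly as moving from the y axis to the x axis does on the graph.

```python
from bisect import bisect_right


def read_graph(xs, ys, x):
    """Linearly interpolate along the line graph through the points (xs[i], ys[i]).

    xs must be increasing; this mimics reading a value off a percentile graph.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the graph")
    i = bisect_right(xs, x) - 1
    if i == len(xs) - 1:  # x is the last plotted point
        return ys[-1]
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical percentile graph: class boundaries (x) and cumulative percentages (y).
boundaries = [89.5, 104.5, 119.5, 134.5, 149.5, 164.5, 179.5]
cum_pct = [0, 10, 40, 75, 90, 97.5, 100]

rank_of_127 = read_graph(boundaries, cum_pct, 127)   # percentile rank of a reading of 127
value_at_40th = read_graph(cum_pct, boundaries, 40)  # reading at the 40th percentile
print(rank_of_127, value_at_40th)  # 57.5 119.5
```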
The 40th percentile corresponds to a value of approximately 118. Thus, if a person has a
blood pressure of 118, he or she is at the 40th percentile. Finding values and the corresponding
percentile ranks by using a graph yields only approximate answers. Several mathematical methods
exist for computing percentiles for data. These methods can be used to find the approximate
percentile rank of a data value or to find a data value corresponding to a given percentile. When the
data set is large (100 or more), these methods yield better results.
Example:
Test Scores
A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of
a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution
Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
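Then substitute into the formula
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$$
Six values (2, 3, 5, 6, 8, 10) fall below 12, so
$$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100 = 65$$
Thus, a score of 12 corresponds to the 65th percentile.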
Note: One assumes that a score of 12 in the example, for instance, means theoretically any value
between 11.5 and 12.5.
Example:
Test Scores
Using the data in the previous example, find the percentile rank for a score of 6.
Solution
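There are three values below 6 (2, 3, and 5). Thus
$$\text{Percentile} = \frac{3 + 0.5}{10} \cdot 100 = 35$$
A score of 6 corresponds to the 35th percentile.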
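The percentile rank formula used in the last two examples can be written as a short Python function. The name percentile_rank is a hypothetical helper; it counts the values below X, adds 0.5 (the half-unit allowance described in the note above), and divides by the number of values.

```python
def percentile_rank(data, x):
    """Percentile rank of x: (number of values below x + 0.5) / total number of values * 100."""
    below = sum(1 for v in data if v < x)  # counting does not require the data to be sorted
    return (below + 0.5) * 100 / len(data)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # 65.0
print(percentile_rank(scores, 6))   # 35.0
```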
The next examples show a procedure for finding a value corresponding to a given percentile.
Example:
Test Scores
Using the scores in the previous example, find the value corresponding to the 25th percentile.
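Solution:
Step 1 Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2 Compute
$$c = \frac{n \cdot p}{100}$$
where n is the total number of values and p is the percentile. Here c = (10 · 25)/100 = 2.5.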
Step 3 If c is not a whole number, round it up to the next whole number; in this case, c = 3. Start at
the lowest value and count over to the third value, which is 5. Hence, the value 5 corresponds to the
25th percentile.
If c is a whole number, use the value halfway between the cth and (c + 1)th values when counting
up from the lowest value.
Example:
Using the data set in the previous example, find the value that corresponds to the 60th percentile.
Solution
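Step 1 Arrange the data in order from smallest to largest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2 Compute c = (n · p)/100 = (10 · 60)/100 = 6.
Step 3 Since c is a whole number, use the value halfway between the 6th and 7th values. The 6th
value is 10 and the 7th value is 12, so the value is (10 + 12)/2 = 11. Hence, 11 corresponds to the
60th percentile.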
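Both cases of the procedure (c not whole: round up and take the cth value; c whole: average the cth and (c + 1)th values) fit in one Python function. The name value_at_percentile is a hypothetical helper, and restricting p to strictly between 0 and 100 is an assumption of this sketch that keeps both positions inside the data.

```python
import math


def value_at_percentile(data, p):
    """Data value at the pth percentile using c = n * p / 100 (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    c = n * p / 100
    if c.is_integer():
        k = int(c)
        # c is whole: halfway between the cth and (c + 1)th values (1-based counting).
        return (ordered[k - 1] + ordered[k]) / 2
    # c is not whole: round up and take the cth value (1-based counting).
    return ordered[math.ceil(c) - 1]


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25))  # 5
print(value_at_percentile(scores, 60))  # 11.0
```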
Quartiles and Deciles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3. Note that Q1 is
the same as the 25th percentile; Q2 is the same as the 50th percentile, or the median; Q3
corresponds to the 75th percentile, as shown:
Example:
Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
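One way to find the quartiles is to take Q2 as the median, Q1 as the median of the values below Q2, and Q3 as the median of the values above Q2. This sketch assumes that method (it reproduces the answer listed for Exercise 3a at the end of this lesson); calculators and software packages may use other conventions and give slightly different values. For this data set it gives Q1 = 9, Q2 = 14, and Q3 = 20.

```python
import statistics


def quartiles(data):
    """Q1, Q2, Q3 as medians of the lower half, the whole set, and the upper half.

    When n is odd, the middle value is left out of both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)


q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)  # 9.0 14.0 20.0
```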
In addition to dividing the data set into four groups, quartiles can be used as a rough
measurement of variability. The interquartile range (IQR) is defined as the difference between Q3
and Q1 (IQR = Q3 − Q1) and is the range of the middle 50% of the data. The interquartile range is
used to identify outliers, and it is also used as a measure of variability in exploratory data analysis.
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.
Note that D1 corresponds to P10; D2 corresponds to P20; etc. Deciles can be found by using
the formulas given for percentiles. Taken together, then, these are the relationships among
percentiles, deciles, and quartiles:
Deciles are denoted by D1, D2, D3, . . . , D9, and they correspond to P10, P20, P30, . . . , P90.
Quartiles are denoted by Q1, Q2, Q3 and they correspond to P25, P50, P75.
The median is the same as P50 or Q2 or D5.
Outliers
A data set should be checked for extremely high or extremely low values. These values are
called outliers. An outlier is an extremely high or an extremely low data value when compared with
the rest of the data values.
An outlier can strongly affect the mean and standard deviation of a variable. For example,
suppose a researcher mistakenly recorded an extremely high data value. This value would then make
the mean and standard deviation of the variable much larger than they really were. Outliers can
have an effect on other statistics as well.
Example:
Check the following data set for outliers.
5, 6, 12, 13, 15, 18, 22, 50
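The solution is not shown here. A common check, assumed for this sketch, flags any value more than 1.5(IQR) below Q1 or above Q3 (the same 1.5(IQR) cutoff used for the modified boxplot later in this lesson). For these data Q1 = 9, Q3 = 20, and IQR = 11, so 1.5(IQR) = 16.5; values below 9 − 16.5 = −7.5 or above 20 + 16.5 = 36.5 are flagged, and only 50 qualifies. The helper names are hypothetical.

```python
import statistics


def q1_q3(data):
    """First and third quartiles as medians of the lower and upper halves of the ordered data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(upper)


def iqr_outliers(data, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, q3 = q1_q3(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


data = [5, 6, 12, 13, 15, 18, 22, 50]
print(iqr_outliers(data))  # [50]
```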
There are no hard-and-fast rules on what to do with outliers, nor is there complete agreement
among statisticians on ways to identify them. Obviously, if they occurred as a result of an error, an
attempt should be made to correct the error or else the data value should be omitted entirely.
When they occur naturally by chance, the statistician must make a decision about whether to
include them in the data set. When a distribution is normal or bell-shaped, data values that are
more than 3 standard deviations from the mean can be considered suspected outliers.
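The 3-standard-deviation guideline is the same as flagging values whose z score exceeds 3 in absolute value. The sketch below assumes the sample mean and sample standard deviation; the data are hypothetical. One caution worth knowing: with the sample standard deviation, no z score can exceed (n − 1)/√n in absolute value, so for n ≤ 10 this rule can never flag a value.

```python
import statistics


def sd_outliers(data, k=3):
    """Values whose z score (sample mean, sample standard deviation) exceeds k in absolute value."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > k]


readings = [10] * 20 + [100]  # hypothetical: twenty 10s and one 100
print(sd_outliers(readings))  # [100]
```

For the eight values in the previous example, the rule flags nothing (the z score of 50 is only about 2.3), even though the 1.5(IQR) check flags 50.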
Example:
Number of Meteorites Found
The number of meteorites found in 10 states of the United States is 89, 47, 164, 296, 30, 215, 138,
78, 48, 39. Construct a boxplot for the data.
Source: Natural History Museum.
Solution:
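A boxplot is drawn from five values: the minimum, Q1, the median, Q3, and the maximum. This sketch computes that five-number summary for the meteorite data, assuming the median-of-halves method for the quartiles; the function name is hypothetical. The result is 30, 47, 83.5, 164, 296, and the box is then drawn from Q1 = 47 to Q3 = 164 with a line at the median 83.5.

```python
import statistics


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum), with Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])


meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(five_number_summary(meteorites))  # (30, 47, 83.5, 164, 296)
```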
If the boxplots for two or more data sets are graphed on the same axis, the distributions can
be compared. To compare the averages, use the location of the medians. To compare the variability,
use the interquartile range, i.e., the length of the boxes.
Example:
Sodium Content of Cheese
A dietitian is interested in comparing the sodium content of real cheese with the sodium content of
a cheese substitute. The data for two random samples are shown. Compare the distributions, using
boxplots.
Solution:
Step 4 Compare the plots. The distribution of the cheese substitute data has a higher median than
the distribution of the real cheese data. The variation, or spread, of the real cheese data is larger
than that of the cheese substitute data.
A modified boxplot can be drawn and used to check for outliers. In exploratory data
analysis, hinges are used instead of quartiles to construct boxplots. When the data set consists of an
even number of values, hinges are the same as quartiles. Hinges for a data set with an odd number
of values differ somewhat from quartiles. However, most calculators and computer programs use
quartiles.
Another important point to remember is that the summary statistics (median and
interquartile range) used in exploratory data analysis are said to be resistant statistics. A resistant
statistic is relatively less affected by outliers than a nonresistant statistic. The mean and standard
deviation are nonresistant statistics. Sometimes when a distribution is skewed or contains outliers,
the median and interquartile range may more accurately summarize the data than the mean and
standard deviation, since the mean and standard deviation are more affected in this case.
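The difference between resistant and nonresistant statistics can be seen by removing the outlier 50 from the data set used earlier in this lesson. The mean moves from 17.625 to 13, while the median moves only from 14 to 13.

```python
import statistics

with_outlier = [5, 6, 12, 13, 15, 18, 22, 50]
without_outlier = [x for x in with_outlier if x != 50]  # remove the extreme value

# Change in each statistic caused by the single extreme value.
mean_shift = statistics.mean(with_outlier) - statistics.mean(without_outlier)
median_shift = statistics.median(with_outlier) - statistics.median(without_outlier)
print(f"mean moves by {mean_shift}, median moves by {median_shift}")
```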
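A modified boxplot can be drawn by placing a box around Q1 and Q3 and then extending the
whiskers to the smallest and largest data values that lie within 1.5 times the interquartile range
(IQR = Q3 − Q1) of the box, that is, no lower than Q1 − 1.5(IQR) and no higher than Q3 + 1.5(IQR).
Mild outliers are data values that lie between 1.5(IQR) and 3(IQR) below Q1 or above Q3.
Extreme outliers are data values that lie more than 3(IQR) below Q1 or above Q3.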
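The whisker ends and outlier categories of a modified boxplot can be computed as follows. The smog data are not reproduced in this text, so the sketch uses the data set 5, 6, 12, 13, 15, 18, 22, 50 from earlier in the lesson. It assumes median-of-halves quartiles and treats a value lying exactly on a limit as the less extreme category; the function name is hypothetical. Here Q1 = 9, Q3 = 20, the whiskers run from 5 to 22, and 50 is a mild outlier (it is more than 16.5 but not more than 33 above Q3).

```python
import statistics


def modified_boxplot(data):
    """Whisker ends and outlier categories for a modified boxplot (median-of-halves quartiles)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    q1, q3 = statistics.median(lower), statistics.median(upper)
    iqr = q3 - q1

    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # whiskers end at the last values inside these limits
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)      # values beyond these limits are extreme outliers

    inside = [x for x in ordered if inner[0] <= x <= inner[1]]
    mild = [x for x in ordered if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    extreme = [x for x in ordered if x < outer[0] or x > outer[1]]
    return {"whiskers": (inside[0], inside[-1]), "q1": q1, "q3": q3,
            "mild": mild, "extreme": extreme}


result = modified_boxplot([5, 6, 12, 13, 15, 18, 22, 50])
print(result)
```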
Example:
Unhealthful Smog Days
For the data shown here, draw a modified boxplot and identify any mild or extreme outliers. The
data represent the number of unhealthful smog days for a specific year for the highest 10 locations.
Solution:
Learning Task/Activity:
Exercise 4b
Data Description
C. Measures of Position
1. Miles per Hour
Using the data below, find the approximate percentile ranks of the following miles per hour (mph).
a. 380 mph
b. 425 mph
c. 455 mph
d. 505 mph
e. 525 mph
2. Test Scores
Find the percentile rank for each test score in the data set.
12, 28, 35, 42, 47, 49, 50
3. Another measure of average is called the midquartile; it is the numerical value halfway between
Q1 and Q3, and the formula is
$$\text{Midquartile} = \frac{Q_1 + Q_3}{2}$$
Using this formula and other formulas, find Q1, Q2, Q3, the midquartile, and the interquartile range
for each data set.
a. 5, 12, 16, 25, 32, 38 (Answers: Q1 = 12; Q2 = 20.5; Q3 = 32; midquartile = 22; IQR = 20)
b. 53, 62, 78, 94, 96, 99, 103
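To check answers for Exercise 3, the midquartile and interquartile range can be computed as below. This sketch assumes Q1 and Q3 are the medians of the lower and upper halves (with the middle value left out when n is odd), which reproduces the answers listed for data set a; the function name is hypothetical.

```python
import statistics


def midquartile_and_iqr(data):
    """Return (midquartile, interquartile range), with Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    q1, q3 = statistics.median(lower), statistics.median(upper)
    return (q1 + q3) / 2, q3 - q1


print(midquartile_and_iqr([5, 12, 16, 25, 32, 38]))  # (22.0, 20)
```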
B. Use each boxplot to identify the maximum value, minimum value, median, first quartile, third
quartile, and interquartile range.
4.
5.