Lesson 4 Data Description Measures of Position-1

Uploaded by alfredojrdavin4
Copyright © All Rights Reserved

FT T216/DC-EC SP4

Applied Statistics/Descriptive and Inferential Statistics

LESSON 4: Data Description: Measures of Position

Measures of Position
In addition to measures of central tendency and measures of variation, there are measures
of position or location. These measures include standard scores, percentiles, deciles, and quartiles.
They are used to locate the relative position of a data value in the data set. For example, if a value is
located at the 80th percentile, it means that 80% of the values fall below it in the distribution and
20% of the values fall above it. The median is the value that corresponds to the 50th percentile,
since one-half of the values fall below it and one-half of the values fall above it.

Standard Scores
There is an old saying, “You can’t compare apples and oranges.” But with the use of
statistics, it can be done to some extent. Suppose that a student scored 90 on a music test and 45 on
an English exam. Direct comparison of the raw scores is not meaningful, since the exams might not be
equivalent in terms of number of questions, value of each question, and so on. However, each score
can be compared with a relative standard common to both exams. This comparison uses the mean and
standard deviation and is called a standard score or z score.
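A standard score or z score tells how many standard deviations a data value is above or
below the mean for a specific distribution of values. In symbols,
z = (value − mean) / standard deviation
that is, z = (X − X̄)/s for samples and z = (X − μ)/σ for populations. If a standard score is zero,
then the data value is the same as the mean.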

Example:
Test Scores
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative
positions on the two tests.

Solution
First, find the z scores. For calculus the z score is
z = (X − X̄)/s = (65 − 50)/10 = 1.5
For history the z score is
z = (X − X̄)/s = (30 − 25)/5 = 1.0

Since the z score for calculus is larger, her relative position in the calculus class is higher than her
relative position in the history class.
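The same comparison can be checked with a short Python sketch. The helper name `z_score` is hypothetical; it takes the mean and standard deviation as given rather than computing them from raw scores.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

calculus = z_score(65, 50, 10)   # (65 - 50) / 10
history = z_score(30, 25, 5)     # (30 - 25) / 5
print(f"calculus z = {calculus}, history z = {history}")  # calculus z = 1.5, history z = 1.0

if calculus > history:
    print("Her relative position is higher on the calculus test.")
```

The larger z score wins the comparison regardless of how many points each test was worth, which is the point of standardizing.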

Note that if the z score is positive, the score is above the mean. If the z score is 0, the score is the
same as the mean. And if the z score is negative, the score is below the mean.

Example:
Test Scores
Find the z score for each test, and state which is higher.

When all data for a variable are transformed into z scores, the resulting distribution will have
a mean of 0 and a standard deviation of 1. A z score, then, is actually the number of standard
deviations each value is from the mean for a specific distribution.
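A minimal sketch of that transformation using Python's standard `statistics` module. As an assumption it uses the sample standard deviation (`statistics.stdev`); using the population version (`statistics.pstdev`) consistently would also give a standard deviation of 1 under that definition. The helper name `standardize` is hypothetical, and the ten values are the test scores used in the percentile examples of this lesson.

```python
import statistics

def standardize(values):
    """Convert each value to a z score using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(x - mean) / sd for x in values]

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
z = standardize(scores)

# Up to floating-point rounding, the z scores have mean 0 and standard deviation 1.
print(round(statistics.mean(z), 10), round(statistics.stdev(z), 10))
```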

Percentiles
Percentiles are position measures used in educational and health-related fields to indicate
the position of an individual in a group.
Percentiles divide the data set into 100 equal groups.
In many situations, the graphs and tables showing the percentiles for various measures such
as test scores, heights, or weights have already been completed. Percentiles are also used to
compare an individual’s test score with the national norm.
Percentiles are not the same as percentages. That is, if a student gets 72 correct answers
out of a possible 100, she obtains a percentage score of 72. There is no indication of her position
with respect to the rest of the class. She could have scored the highest, the lowest, or somewhere in
between. On the other hand, if a raw score of 72 corresponds to the 64th percentile, then she did
better than 64% of the students in her class.
Percentiles are symbolized by P1, P2, P3, . . . , P99 and divide the distribution into 100 groups.

Percentile graphs can be constructed as shown below. Percentile graphs use the same values
as the cumulative relative frequency graphs, except that the proportions have been converted to
percents.
Figure 1. Weights of Girls by Age and Percentile Rankings. Source: Distributed by Mead Johnson Nutritional Division. Reprinted with permission.

Example:
Systolic Blood Pressure
The frequency distribution for the systolic blood pressure readings (in millimeters of mercury, mm
Hg) of 200 randomly selected college students is shown here. Construct a percentile graph.

Solution
Step 1 Find the cumulative frequencies and place them in column C.
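Step 2 Find the cumulative percentages and place them in column D. To do this step, use the
formula
Cumulative % = (cumulative frequency / n) × 100%
where n is the total number of values (here, n = 200).
Step 3 Graph the data, using class boundaries for the x axis and the percentages for the y axis.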
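Steps 1 and 2 amount to a running total followed by a division by n. A minimal Python sketch, with a hypothetical helper name `cumulative_columns` and hypothetical class frequencies (not the blood pressure data):

```python
from itertools import accumulate

def cumulative_columns(frequencies):
    """Return column C (cumulative frequencies) and column D (cumulative percentages)."""
    cumulative = list(accumulate(frequencies))     # running totals, class by class
    n = cumulative[-1]                             # total number of values
    percents = [100 * c / n for c in cumulative]   # (cumulative frequency / n) x 100
    return cumulative, percents

# Hypothetical class frequencies, listed from the lowest class to the highest.
freqs = [10, 30, 40, 20]
col_c, col_d = cumulative_columns(freqs)
print(col_c)  # [10, 40, 80, 100]
print(col_d)  # [10.0, 40.0, 80.0, 100.0]
```

The last cumulative percentage is always 100, which is a quick check that no class was skipped.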

Once a percentile graph has been constructed, one can find the approximate corresponding
percentile ranks for given blood pressure values and find approximate blood pressure values for
given percentile ranks. For example, to find the percentile rank of a blood pressure reading of 130,
find 130 on the x axis of the figure above, and draw a vertical line to the graph. Then move
horizontally to the value on the y axis. Note that a blood pressure of 130 corresponds to
approximately the 70th percentile. If the value that corresponds to the 40th percentile is desired,
start on the y axis at 40 and draw a horizontal line to the graph. Then draw a vertical line to the x
axis and read the value.
The 40th percentile corresponds to a value of approximately 118. Thus, if a person has a
blood pressure of 118, he or she is at the 40th percentile. Finding values and the corresponding
percentile ranks by using a graph yields only approximate answers. Several mathematical methods
exist for computing percentiles for data. These methods can be used to find the approximate
percentile rank of a data value or to find a data value corresponding to a given percentile. When the
data set is large (100 or more), these methods yield better results.
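Reading a value off a percentile graph, as described above, amounts to linear interpolation between the plotted points, assuming those points are joined by straight line segments. The sketch below reads the graph in both directions; the helper name `interpolate` and the ogive points are hypothetical.

```python
from bisect import bisect_left

def interpolate(x, xs, ys):
    """Linear interpolation on the polyline through (xs[i], ys[i]); xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)                 # xs[i - 1] < x <= xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Hypothetical ogive: class boundaries and the cumulative percentage at each boundary.
boundaries = [0, 10, 20, 30, 40]
cum_pct = [0, 10, 40, 80, 100]

rank = interpolate(25, boundaries, cum_pct)   # value -> approximate percentile rank
value = interpolate(50, cum_pct, boundaries)  # percentile rank -> approximate value
print(rank, value)  # 60.0 22.5
```

Swapping the two lists reverses the direction of reading, just as one moves from the y axis to the x axis on the graph. Like the graphical method, this yields only approximate answers.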

Example:
Test Scores
A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of
a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Solution
Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
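Then substitute into the formula for the percentile rank of a value X. Six values (2, 3, 5, 6, 8, 10)
are below 12.
Percentile = [(number of values below X) + 0.5] / (total number of values) × 100
= (6 + 0.5)/10 × 100 = 65th percentile
Thus, a student whose score was 12 did better than 65% of the class.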
Note: One assumes that a score of 12 in the example, for instance, means theoretically any value
between 11.5 and 12.5.

Example:
Test Scores
Using the data in the previous example, find the percentile rank for a score of 6.
Solution
There are three values below 6. Thus
Percentile = (3 + 0.5)/10 × 100 = 35th percentile
A student who scored 6 did better than 35% of the class.
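The percentile-rank formula translates directly into Python. This is a minimal sketch; the function name `percentile_rank` is hypothetical, and the result is left unrounded (the lesson reports it as a whole-number percentile).

```python
def percentile_rank(data, x):
    """Percentile rank of x: ((number of values below x) + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # 65.0
print(percentile_rank(scores, 6))   # 35.0
```

The data need not be sorted for this count, although sorting makes the hand calculation easier.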

The next examples show a procedure for finding a value corresponding to a given percentile.
Example:
Test Scores
Using the scores in the previous example, find the value corresponding to the 25th percentile.

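Solution:
Step 1 Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2 Compute c = (n × p)/100, where n is the total number of values and p is the percentile.
c = (10 × 25)/100 = 2.5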
Step 3 If c is not a whole number, round it up to the next whole number; in this case, c = 3. Start at
the lowest value and count over to the third value, which is 5. Hence, the value 5 corresponds to the
25th percentile.

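If c is a whole number, use the value halfway between the cth and (c + 1)th values when counting up
from the lowest value.
Example:
Using the data set in the previous example, find the value that corresponds to the 60th percentile.
Solution
Step 1 Arrange the data in order from smallest to largest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2 Substitute into the formula.
c = (n × p)/100 = (10 × 60)/100 = 6
Step 3 Since c is a whole number, use the value halfway between the 6th and 7th values. The 6th
value is 10 and the 7th value is 12, so the value is (10 + 12)/2 = 11. Hence, the value 11
corresponds to the 60th percentile.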
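Both cases of the procedure fit in one small function. This is a minimal sketch: the name `value_at_percentile` is hypothetical, and it assumes the usual companion rule that when c is a whole number the value halfway between the cth and (c + 1)th values is used.

```python
import math

def value_at_percentile(data, p):
    """Data value corresponding to the pth percentile (0 < p < 100), using c = n * p / 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)              # Step 1: arrange from lowest to highest
    c = len(ordered) * p / 100          # Step 2: compute c
    if not c.is_integer():
        # c is not a whole number: round up and take the cth value (positions count from 1)
        return ordered[math.ceil(c) - 1]
    c = int(c)
    # c is a whole number: halfway between the cth and (c + 1)th values
    return (ordered[c - 1] + ordered[c]) / 2

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25))  # 5
print(value_at_percentile(scores, 60))  # 11.0
```

Python lists are indexed from 0, so the cth value in the lesson's counting is `ordered[c - 1]`.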
Quartiles and Deciles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3. Note that Q1 is
the same as the 25th percentile; Q2 is the same as the 50th percentile, or the median; Q3
corresponds to the 75th percentile, as shown:

Example:
Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
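One common hand procedure, assumed here, finds Q2 as the median, then Q1 and Q3 as the medians of the lower and upper halves (leaving the median itself out of both halves when n is odd). For this data set it gives the same answers as the percentile formula with Q1 = P25, Q2 = P50, and Q3 = P75. Library routines such as Python's `statistics.quantiles` interpolate differently and may return other values. The function name `quartiles` is hypothetical.

```python
import statistics

def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves procedure (needs at least 2 values)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]               # values below the median
    upper = ordered[half + n % 2:]       # values above the median (skip the middle one if n is odd)
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)  # 9.0 14.0 20.0
```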
In addition to dividing the data set into four groups, quartiles can be used as a rough
measurement of variability. The interquartile range (IQR) is defined as the difference between Q3
and Q1 (that is, IQR = Q3 − Q1) and is the range of the middle 50% of the data. The interquartile
range is used to identify outliers, and it is also used as a measure of variability in exploratory data analysis.
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.

Note that D1 corresponds to P10; D2 corresponds to P20; etc. Deciles can be found by using
the formulas given for percentiles. Taken together, then, these are the relationships among
percentiles, deciles, and quartiles:
Deciles are denoted by D1, D2, D3, . . . , D9, and they correspond to P10, P20, P30, . . . , P90.
Quartiles are denoted by Q1, Q2, and Q3, and they correspond to P25, P50, and P75.
The median is the same as P50 or Q2 or D5.

Outliers
A data set should be checked for extremely high or extremely low values. These values are
called outliers. An outlier is an extremely high or an extremely low data value when compared with
the rest of the data values.
An outlier can strongly affect the mean and standard deviation of a variable. For example,
suppose a researcher mistakenly recorded an extremely high data value. This value would then make
the mean and standard deviation of the variable much larger than they really were. Outliers can
have an effect on other statistics as well.

Example:
Check the following data set for outliers.
5, 6, 12, 13, 15, 18, 22, 50
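A minimal sketch of one way to carry out this check, assuming the common 1.5 × IQR rule (the same 1.5 multiplier used for the modified boxplot in this lesson) and median-of-halves quartiles: find Q1 and Q3, multiply the IQR by 1.5, subtract that from Q1 and add it to Q3, and flag any value outside those fences. The function names are hypothetical.

```python
import statistics

def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves procedure (assumed; needs at least 2 values)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    return (statistics.median(ordered[:half]),
            statistics.median(ordered),
            statistics.median(ordered[half + n % 2:]))

def iqr_outliers(data, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] and the two fences."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    fences = (q1 - k * iqr, q3 + k * iqr)
    flagged = [x for x in sorted(data) if x < fences[0] or x > fences[1]]
    return flagged, fences

outliers, fences = iqr_outliers([5, 6, 12, 13, 15, 18, 22, 50])
print(outliers, fences)  # [50] (-7.5, 36.5)
```

Here Q1 = 9, Q3 = 20, and IQR = 11, so 1.5(IQR) = 16.5, the fences are −7.5 and 36.5, and only 50 lies outside them.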

Reasons Why Outliers May Occur


1. The data value may have resulted from a measurement or observational error. Perhaps the
researcher measured the variable incorrectly.
2. The data value may have resulted from a recording error. That is, it may have been written
or typed incorrectly.
3. The data value may have been obtained from a subject that is not in the defined population.
For example, suppose test scores were obtained from a seventh-grade class, but a student in
that class was actually in the sixth grade and had special permission to attend the class. This
student might have scored extremely low on that particular exam on that day.
4. The data value might be a legitimate value that occurred by chance (although the probability
is extremely small).

There are no hard-and-fast rules on what to do with outliers, nor is there complete agreement
among statisticians on ways to identify them. Obviously, if they occurred as a result of an error, an
attempt should be made to correct the error or else the data value should be omitted entirely.
When they occur naturally by chance, the statistician must make a decision about whether to
include them in the data set. When a distribution is normal or bell-shaped, data values that are
more than 3 standard deviations from the mean can be considered suspected outliers.
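A minimal sketch of that rule of thumb, assuming the sample mean and sample standard deviation from Python's `statistics` module; the function name `beyond_three_sd` is hypothetical. The rule is only meaningful for roughly bell-shaped data.

```python
import statistics

def beyond_three_sd(data):
    """Values more than 3 sample standard deviations from the sample mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > 3 * sd]

print(beyond_three_sd([5, 6, 12, 13, 15, 18, 22, 50]))  # []
```

Applied to the eight values in the earlier outlier example, this rule flags nothing: the mean is 17.625 and s is about 14.25, so 50 is only about 2.3 standard deviations above the mean. The 1.5 × IQR rule used with modified boxplots does flag 50, which illustrates why statisticians do not fully agree on how to identify outliers.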

Exploratory Data Analysis


In traditional statistics, data are organized by using a frequency distribution. From this
distribution various graphs such as the histogram, frequency polygon, and ogive can be constructed
to determine the shape or nature of the distribution. In addition, various statistics such as the mean
and standard deviation can be computed to summarize the data.
The purpose of traditional analysis is to confirm various conjectures about the nature of the
data. For example, from a carefully designed study, a researcher might want to know if the
proportion of Americans who are exercising today has increased from 10 years ago. This study would
contain various assumptions about the population, various definitions such as of exercise, and so on.
In exploratory data analysis (EDA), data can be organized using a stem-and-leaf plot. The
measure of central tendency used in EDA is the median. The measure of variation used in EDA is the
interquartile range Q3 – Q1. In EDA the data are represented graphically using a boxplot
(sometimes called a box-and-whisker plot). The purpose of exploratory data analysis is to examine
data to find out what information can be discovered about the data such as the center and the
spread. Exploratory data analysis was developed by John Tukey and presented in his book
Exploratory Data Analysis (Addison-Wesley, 1977).

The Five-Number Summary and Boxplots


A boxplot can be used to graphically represent the data set. These plots involve five specific values:
1. The lowest value of the data set (i.e., minimum)
2. Q1
3. The median
4. Q3
5. The highest value of the data set (i.e., maximum)
These values are called a five-number summary of the data set.
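A minimal Python sketch of the five-number summary, assuming the median-of-halves procedure for Q1 and Q3 (the median is left out of both halves when n is odd). The function name is hypothetical; the data are the meteorite counts from the example later in this lesson.

```python
import statistics

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) with median-of-halves quartiles."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])           # median of the lower half
    q3 = statistics.median(ordered[half + n % 2:])   # median of the upper half
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(five_number_summary(meteorites))  # (30, 47, 83.5, 164, 296)
```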

Procedure for constructing a boxplot


1. Find the five-number summary for the data values, that is, the maximum and minimum data
values, Q1 and Q3, and the median.
2. Draw a horizontal axis with a scale such that it includes the maximum and minimum data values.
3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the
median.
4. Draw a line from the minimum data value to the left side of the box and a line from the maximum
data value to the right side of the box.
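Steps 2 to 4 can be imitated in plain text, one character per scale step. This is only a rough sketch: the function name `text_boxplot`, the `width` parameter, and the characters chosen are all hypothetical, and it takes an already computed five-number summary such as (30, 47, 83.5, 164, 296) for the meteorite data below.

```python
def text_boxplot(summary, width=61):
    """Draw a one-line text boxplot from (minimum, Q1, median, Q3, maximum).

    Step 2: the axis runs from the minimum (first column) to the maximum (last column).
    Step 3: '[' and ']' mark Q1 and Q3, '=' fills the box, and '|' inside it marks the median.
    Step 4: '-' whiskers join the minimum and maximum (outer '|') to the box.
    """
    low, q1, med, q3, high = summary
    if high == low:
        raise ValueError("minimum and maximum must differ to build a scale")
    scale = (width - 1) / (high - low)      # characters per data unit

    def col(v):
        return round((v - low) * scale)     # column where value v falls

    line = ["-"] * width                    # whiskers span the whole axis...
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                       # ...and the box is drawn over them
    line[col(q1)], line[col(q3)] = "[", "]"
    line[col(med)] = "|"
    line[col(low)] = line[col(high)] = "|"
    return "".join(line)

print(text_boxplot((30, 47, 83.5, 164, 296)))
```

Because values are rounded to whole columns, positions are approximate; a finer `width` gives a closer picture.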

Example:
Number of Meteorites Found
The number of meteorites found in 10 states of the United States is 89, 47, 164, 296, 30, 215, 138,
78, 48, 39. Construct a boxplot for the data.
Source: Natural History Museum.
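Solution:
Step 1 Arrange the data in order.
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
Step 2 Find the median.
Median = (78 + 89)/2 = 83.5
Step 3 Find Q1, the median of the values below the median (30, 39, 47, 48, 78).
Q1 = 47
Step 4 Find Q3, the median of the values above the median (89, 138, 164, 215, 296).
Q3 = 164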
Step 5 Draw a scale for the data on the x axis.


Step 6 Locate the lowest value, Q1, median, Q3, and the highest value on the scale.
Step 7 Draw a box around Q1 and Q3, draw a vertical line through the median, and connect the
upper value and the lower value to the box.

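The distribution is somewhat positively skewed, since the median lies closer to Q1 than to Q3 and
the right whisker is much longer than the left whisker.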

If the boxplots for two or more data sets are graphed on the same axis, the distributions can
be compared. To compare the averages, use the location of the medians. To compare the variability,
use the interquartile range, i.e., the length of the boxes.

Example:
Sodium Content of Cheese
A dietitian is interested in comparing the sodium content of real cheese with the sodium content of
a cheese substitute. The data for two random samples are shown. Compare the distributions, using
boxplots.
Solution:

Step 4 Compare the plots. It is quite apparent that the distribution for the cheese substitute data has
a higher median than the median for the distribution for the real cheese data. The variation or
spread for the distribution of the real cheese data is larger than the variation for the distribution of
the cheese substitute data.

A modified boxplot can be drawn and used to check for outliers. In exploratory data
analysis, hinges are used instead of quartiles to construct boxplots. When the data set consists of an
even number of values, hinges are the same as quartiles. Hinges for a data set with an odd number
of values differ somewhat from quartiles. However, most calculators and computer programs use
quartiles.
Another important point to remember is that the summary statistics (median and
interquartile range) used in exploratory data analysis are said to be resistant statistics. A resistant
statistic is relatively less affected by outliers than a nonresistant statistic. The mean and standard
deviation are nonresistant statistics. Sometimes when a distribution is skewed or contains outliers,
the median and interquartile range may more accurately summarize the data than the mean and
standard deviation, since the mean and standard deviation are more affected in this case.
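A small illustration of resistance, using the data from the outlier example with and without the value 50 (a hypothetical comparison, not one made in the lesson). The helper name `shift` is hypothetical.

```python
import statistics

data = [5, 6, 12, 13, 15, 18, 22]   # outlier-example data with 50 removed
with_outlier = data + [50]          # the same data with the outlier 50 included

def shift(statistic):
    """How much a summary statistic changes when the outlier is added."""
    return statistic(with_outlier) - statistic(data)

mean_shift = shift(statistics.mean)      # nonresistant
sd_shift = shift(statistics.stdev)       # nonresistant
median_shift = shift(statistics.median)  # resistant
print(f"mean moves by {mean_shift}, standard deviation by {sd_shift:.2f}, median by {median_shift}")
```

Adding the single value 50 moves the mean from 13 to 17.625 and more than doubles the standard deviation, while the median moves only from 13 to 14.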
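A modified boxplot can be drawn by placing a box around Q1 and Q3 and then extending the
whiskers to the largest and smallest data values that lie within 1.5 times the interquartile range
(that is, Q3 − Q1) of the box: no lower than Q1 − 1.5(IQR) and no higher than Q3 + 1.5(IQR).
Mild outliers are values that lie more than 1.5(IQR) but no more than 3(IQR) below Q1 or above Q3.
Extreme outliers are data values that lie more than 3(IQR) below Q1 or above Q3.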
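A minimal sketch of the mild/extreme classification, assuming median-of-halves quartiles and treating a value exactly 3(IQR) beyond a quartile as mild. The function name `classify_outliers` is hypothetical, and the data are the eight values from the earlier outlier example rather than the smog data.

```python
import statistics

def classify_outliers(data):
    """Return (value, 'mild' or 'extreme') for each value more than 1.5(IQR) beyond Q1 or Q3."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + n % 2:])
    iqr = q3 - q1
    labels = []
    for x in ordered:
        distance = max(q1 - x, x - q3)   # how far x lies below Q1 or above Q3 (negative inside the box)
        if distance > 3 * iqr:
            labels.append((x, "extreme"))
        elif distance > 1.5 * iqr:
            labels.append((x, "mild"))
    return labels

print(classify_outliers([5, 6, 12, 13, 15, 18, 22, 50]))  # [(50, 'mild')]
```

With Q1 = 9, Q3 = 20, and IQR = 11, the value 50 lies 30 above Q3, which is more than 16.5 but not more than 33, so it is a mild outlier.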

Example:
Unhealthful Smog Days
For the data shown here, draw a modified boxplot and identify any mild or extreme outliers. The
data represent the number of unhealthful smog days for a specific year for the highest 10 locations.

Solution:
Learning Task/Activity:

Name: _______________________________ Date: _____________


Course & Year: ________________________ Instructor: Dr. ANJIN PLEIADESS P. CABRERA

Exercise 4b
Data Description
C. Measures of Position
1. Miles per Hour
Using the data below, find the approximate percentile ranks of the following miles per hour (mph).
a. 380 mph
b. 425 mph
c. 455 mph
d. 505 mph
e. 525 mph

2. Test Scores
Find the percentile rank for each test score in the data set.
12, 28, 35, 42, 47, 49, 50

3. Another measure of average is called the midquartile; it is the numerical value halfway between
Q1 and Q3, and the formula is
Midquartile = (Q1 + Q3)/2
Using this formula and other formulas, find Q1, Q2, Q3, the midquartile, and the interquartile range
for each data set.
a. 5, 12, 16, 25, 32, 38 (Answer: Q1 = 12; Q2 = 20.5; Q3 = 32; midquartile = 22; IQR = 20)
b. 53, 62, 78, 94, 96, 99, 103

4. Check each data set for outliers.


a. 16, 18, 22, 19, 3, 21, 17, 20
b. 24, 32, 54, 31, 16, 18, 19, 14, 17, 20
c. 321, 343, 350, 327, 200
d. 88, 72, 97, 84, 86, 85, 100
e. 145, 119, 122, 118, 125, 116
f. 14, 16, 27, 18, 13, 19, 36, 15, 20

5. Driver’s License Exam Scores


The average score on a state CDL license exam is 76 with a standard deviation of 5. Find the
corresponding z score for each raw score.
a. 79
b. 70
c. 88
d. 65
e. 77
D. Exploratory Data Analysis

A. Identify the five-number summary and find the interquartile range.


1. 8, 12, 32, 6, 27, 19, 54
2. 19, 16, 48, 22, 7
3. 362, 589, 437, 316, 192, 188

B. Use each boxplot to identify the maximum value, minimum value, median, first quartile, third
quartile, and interquartile range.

4.

5.
