Question On Box Plot 1
http://www.purplemath.com/modules/boxwhisk2.htm
Box-and-Whisker Plots:
Quartiles, Boxes, and Whiskers (page 1 of 3)
Sections: Quartiles, boxes, and whiskers, Five-number summary, Interquartile ranges and outliers
Statistics assumes that your data points (the numbers in your list) are
clustered around some central value. The "box" in the box-and-whisker plot
contains, and thereby highlights, the middle half of these data points.
You have three points: the first middle point (the median), and the middle
points of the two halves (what I call the "sub-medians"). These three points
divide the entire data set into quarters, called "quartiles". The top point of
each quartile has a name, being a "Q" followed by the number of the
quarter. So the top point of the first quarter of the data points is "Q1", and so
forth. Note that Q1 is also the middle number for the first half of the list, Q2 is
also the middle number for the whole list, Q3 is the middle number for the
second half of the list, and Q4 is the largest value in the list.
Once you have these three points, Q1, Q2, and Q3, you have all you need in
order to draw a simple box-and-whisker plot. Here's an example of how it
works.
Question 1
4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3, 4.8,
4.4, 4.2, 4.5, 4.4
My first step is to order the set. This gives me:
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
The first number I need is the median of the entire set. Since there are seventeen values in this
list, I need the ninth value:
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, [4.4], 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
The ninth value is 4.4, so the median is Q2 = 4.4.
The next two numbers I need are the medians of the two halves. Since I used the "4.4" in the
middle of the list, I can't re-use it, so my two remaining data sets are:
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4 and 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
The first half has eight values, so the median is the average of the middle two:
Q1 = (4.3 + 4.3) ÷ 2 = 4.3
The median of the second half is:
Q3 = (4.7 + 4.8) ÷ 2 = 4.75
Copyright © Elizabeth Stapel 2004-2011 All Rights Reserved
The "box" part of the plot goes from Q1 to Q3:
By the way, box-and-whisker plots don't have to be drawn horizontally as I did above; they can be vertical,
too.
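The steps in Question 1 can be sketched in a few lines of Python. This is only an illustration of the median-of-halves method described above, not the only way quartiles are computed; the function name `quartiles` is my own, and other tools may use slightly different rules and give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method from the text.

    Assumes at least two values, so that both halves are non-empty.
    """
    data = sorted(values)          # step 1: order the set
    n = len(data)
    lower = data[: n // 2]         # values before the middle position
    upper = data[(n + 1) // 2 :]   # skips the median itself when n is odd
    return median(lower), median(data), median(upper)

# The seventeen values from Question 1.
data = [4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3, 4.8,
        4.4, 4.2, 4.5, 4.4]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # Q1 = 4.3, Q2 = 4.4, Q3 = 4.75 (up to floating-point rounding)
```

The box would then run from `q1` to `q3`, with a line inside it at `q2`.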
Question 2
This splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since the halves of
the data set each contain an even number of values, the sub-medians will be the average of the
middle two values.
As you can see, you only need the five values listed above (min, Q1, Q2, Q3, and max) in order to draw
your box-and-whisker plot. This set of five values has been given the name "the five-number summary".
Question 3
The five-number summary consists of the numbers I need for the box-and-whisker plot: the
minimum value, Q1 (the bottom of the box), Q2 (the median of the set), Q3 (the top of the box),
and the maximum value (which is also Q4). So I need to order the set, find the median and the
sub-medians, and then list the required values in order.
ordering the list: 53, 79, 80, 82, 87, 91, 93, 98, so the minimum is 53 and the maximum is 98
finding the median: (82 + 87) ÷ 2 = 84.5, so Q2 = 84.5
lower half of the list: 53, 79, 80, 82, so Q1 = (79 + 80) ÷ 2 = 79.5
upper half of the list: 87, 91, 93, 98, so Q3 = (91 + 93) ÷ 2 = 92
five-number summary: 53, 79.5, 84.5, 92, 98
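The same steps can be checked with a short Python sketch. The helper name `five_number_summary` is my own, and it assumes the median-of-halves rule used in this lesson; it is a sketch under that assumption rather than a standard library routine.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # lower half of the ordered list
    upper = data[(n + 1) // 2 :]   # upper half (median excluded when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

# The eight values from Question 3.
summary = five_number_summary([53, 79, 80, 82, 87, 91, 93, 98])
print(summary)  # (53, 79.5, 84.5, 92.0, 98)
```

These five numbers are all that is needed to draw the plot: the whiskers reach to the first and last values, and the box is drawn from the second to the fourth.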
Part of the point of a box-and-whisker plot is to show how spread out your values are. But what if one or
another of your values is way out of line? For this, we need to consider "outliers"....
Question 4
To find out if there are any outliers, I first have to find the IQR. There are fifteen data points, so
the median will be at position (15 + 1) ÷ 2 = 8. Then Q2 = 14.6. There are seven data points on
either side of the median, so Q1 is the fourth value in the list and Q3 is the twelfth: Q1 = 14.4 and
Q3 = 14.9. Then IQR = 14.9 – 14.4 = 0.5.
Outliers will be any points below Q1 – 1.5×IQR = 14.4 – 0.75 = 13.65 or above Q3 + 1.5×IQR
= 14.9 + 0.75 = 15.65.
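As a quick check of this arithmetic, here is a small Python sketch. The function name `outlier_fences` and the parameter `k` are my own labels, not terms from the lesson; `k` defaults to the 1.5 multiplier used above.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 and Q3 from Question 4.
low, high = outlier_fences(14.4, 14.9)
print(round(low, 2), round(high, 2))  # 13.65 15.65
```

Any data point below `low` or above `high` would then be counted as an outlier.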
Question 5
So I have an outlier at 49 but no extreme values. I won't have a top whisker, because Q3 is
also the highest non-outlier, and my plot looks like this:
Notes:
For Question 3
More terminology: The top end of your box may also be called the "upper hinge"; the lower end may also
be called the "lower hinge". The lower hinge is also called "the 25th percentile"; the median is "the 50th
percentile"; the upper hinge is "the 75th percentile". This means that 25%, 50% and 75% of the data,
respectively, is at or below that point. The distance between the hinges may be referred to as the
"H-spread" or, as you will see on the following page, the "Interquartile Range", abbreviated "IQR". ("Hinge"
actually has a different technical definition, but the term is sometimes used informally.)
For Question 4
Box-and-Whisker Plots:
Interquartile Ranges and Outliers (page 3 of 3)
The "interquartile range", abbreviated "IQR", is just the width of the box in the box-and-whisker plot. That
is, IQR = Q3 – Q1. The IQR can be used as a measure of how spread-out the values are.
Statistics assumes that your values are clustered around some central value. The IQR tells how spread out
the "middle" values are; it can also be used to tell when some of the other values are "too far" from the
central value. These "too far away" points are called "outliers", because they "lie outside" the range in
which we expect them.
(Why one and a half times the width of the box? Why does that particular value demarcate the difference
between "acceptable" and "unacceptable" values? Because, when John Tukey was inventing the
box-and-whisker plot in 1977 to display these values, he picked 1.5×IQR as the demarcation line for outliers.
This has worked well, so we've continued using that value ever since.)
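To see the 1.5×IQR rule applied to a whole list, here is a minimal Python sketch. The function name `find_outliers` is my own, the quartiles are found with the median-of-halves method from page 1, and the extra value 6.0 in the second call is hypothetical (it is not part of any data set in this lesson).

```python
from statistics import median

def find_outliers(values, k=1.5):
    """Return the values lying more than k*IQR below Q1 or above Q3."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])         # sub-median of the lower half
    q3 = median(data[(n + 1) // 2 :])   # sub-median of the upper half
    iqr = q3 - q1                       # width of the box
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# The seventeen values from Question 1.
question_1 = [4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3,
              4.8, 4.4, 4.2, 4.5, 4.4]
print(find_outliers(question_1))          # []
print(find_outliers(question_1 + [6.0]))  # [6.0]  (6.0 is a made-up extra value)
```

With the Question 1 data, every value falls inside the cut-offs, so nothing is flagged; the made-up 6.0 sits well above the upper cut-off and is reported.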