Box and Whisker Plot
Definition: A diagram drawn along a number line to show the distribution of a set of data.
The box-and-whisker plot is an exploratory graphic, created by John W. Tukey, used to show the distribution of a dataset at a glance. Think of the type of data you might use a histogram with, and the box-and-whisker plot (or box plot, for short) could probably be useful. The box plot, although very useful, seems to get lost in areas outside of Statistics, but I'm not sure why. It could be that people don't know about it, or don't know how to interpret it. In any case, here's how you read a box plot.
Reading a Box-and-Whisker Plot
Let's say we ask 2,852 people (and they miraculously all respond) how many hamburgers they've consumed in the past week. We'll sort those responses from least to greatest and then graph them with our box-and-whisker plot. Take the top 50% of the group (1,426 people), who ate more hamburgers; they are represented by everything above the median (the white line). Those in the top 25% of hamburger eating (713 people) are shown by the top "whisker" and dots. Dots represent those who ate a lot more than normal or a lot less than normal (outliers). If more than one outlier ate the same number of hamburgers, the dots are placed side by side.
As you can see, you only need the five values listed above (min, Q1, Q2, Q3, and max) in order to draw your box-and-whisker plot. This set of five values has been given the name "the five-number summary".
Give the five-number summary of the following data set:
In order, the values are 53, 79, 80, 82, 87, 91, 93, 98, so the minimum is 53 and the maximum is 98.
The median is the average of the two middle values: Q2 = (82 + 87) ÷ 2 = 84.5.
The lower half is 53, 79, 80, 82, so Q1 = (79 + 80) ÷ 2 = 79.5.
The upper half is 87, 91, 93, 98, so Q3 = (91 + 93) ÷ 2 = 92.
five-number summary: 53, 79.5, 84.5, 92, 98
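The calculation above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name five_number_summary is made up for this example, and it assumes the "median of each half" rule used in the worked example, leaving the middle value out of both halves when the count is odd. Statistical software often uses other quartile conventions and may report slightly different Q1 and Q3 values.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]          # values below the middle position
    upper = data[half + n % 2:]  # skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


scores = [53, 79, 80, 82, 87, 91, 93, 98]
print(five_number_summary(scores))  # (53, 79.5, 84.5, 92.0, 98)
```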
Box-and-Whisker Plot
Definition: A box-and-whisker plot or boxplot is a diagram based on the five-number summary of a data set.
To construct this diagram, we first draw an equal interval scale on which to make our box plot. Do not just draw a boxplot shape and label points with the numbers from the five-number summary. The boxplot is a visual representation of the distribution of the data, so greater distances in the diagram should correspond to greater distances between numeric values. Using the equal interval scale, we draw a rectangular box with one end at Q1 and the other end at Q3. Then we draw a vertical segment inside the box at the median value. Finally, we draw a horizontal segment on each side of the box, one down to the minimum value and one up to the maximum value (these segments are called the "whiskers").
Example 1: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21, 13, 18}.
Sorted, the data are 3, 5, 7, 8, 12, 13, 14, 18, 21, which gives the five-number summary: Minimum: 3, Q1: 6, Median: 12, Q3: 16, and Maximum: 21.
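To make the equal interval scale concrete, here is a rough text-only sketch of the construction for Example 1. The function name ascii_boxplot and its cells_per_unit parameter are invented for this illustration. Every unit on the number line gets the same number of characters, so a longer stretch of characters always means a larger gap between values.

```python
def ascii_boxplot(minimum, q1, med, q3, maximum, cells_per_unit=2):
    """Render a horizontal box plot as text on an equal-interval scale."""
    def col(x):
        # Same number of characters per unit, so distances stay proportional.
        return round((x - minimum) * cells_per_unit)

    width = col(maximum) + 1
    row = [" "] * width
    for c in range(col(minimum), col(q1)):
        row[c] = "-"                  # left whisker: minimum up to Q1
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                  # box: Q1 to Q3
    for c in range(col(q3) + 1, width):
        row[c] = "-"                  # right whisker: Q3 up to maximum
    row[col(minimum)] = "|"           # whisker ends
    row[col(maximum)] = "|"
    row[col(q1)] = "["                # box ends
    row[col(q3)] = "]"
    row[col(med)] = "|"               # median segment inside the box
    return "".join(row)


# Example 1: Minimum 3, Q1 6, Median 12, Q3 16, Maximum 21
print(ascii_boxplot(3, 6, 12, 16, 21, cells_per_unit=1))
# |--[=====|===]----|
```

In the printed row the left whisker spans 3 units, the box spans 10 units with the median 6 units in, and the right whisker spans 5 units, matching the gaps between the five values.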
Notice that in any box-and-whisker plot, the left-side whisker shows where we find approximately the lowest 25% of the data and the right-side whisker shows where we find approximately the highest 25% of the data. The box spans the interquartile range and holds approximately the middle 50% of all the data. The data is thus divided into four regions, each of which represents approximately 25% of the data. This gives us a nice visual representation of how the data is spread out across the range.
Example 2: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21, 15, 18, 14}.
Sorted, the data are 3, 5, 7, 8, 12, 14, 14, 15, 18, 21, which gives the five-number summary: Minimum: 3, Q1: 7, Median: 13, Q3: 15, and Maximum: 21.
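The word "approximately" matters here, and Example 2 shows why. The short check below counts how many of the ten values fall in each of the four regions. How values that sit exactly on a quartile are assigned is a choice made for this sketch (each boundary value is counted in the region to its right), not something the definition fixes.

```python
data = sorted([3, 7, 8, 5, 12, 14, 21, 15, 18, 14])
q1, med, q3 = 7, 13, 15  # quartiles quoted in Example 2

# Boundary values are counted in the region to their right (an assumption).
regions = {
    "left whisker (below Q1)": [x for x in data if x < q1],
    "box, Q1 to median": [x for x in data if q1 <= x < med],
    "box, median to Q3": [x for x in data if med <= x < q3],
    "right whisker (Q3 and above)": [x for x in data if x >= q3],
}

for name, members in regions.items():
    share = len(members) / len(data)
    print(f"{name}: {members} -> {share:.0%}")
```

With ten values the regions hold 2, 3, 2, and 3 values, or 20%, 30%, 20%, and 30%: close to a quarter each, but not exactly.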
When we relate two data sets based on the same scale, we may examine box-and-whisker plots to get an idea of how the two data sets compare.
Example 3: Suppose that the box-and-whisker plots below represent quiz scores out of 25 points for Quiz 1 and Quiz 2 for the same class. What do these box-and-whisker plots show about how the class did on Quiz 2 compared to Quiz 1?
These box-and-whisker plots show that the lowest score, highest score, and Q3 are all the same for both quizzes, so performance on the two quizzes was quite similar. However, the movement of Q1 up from a score of 6 to a score of 9 indicates that there was an overall improvement. On the first quiz, approximately 75% of the students scored at or above 6. On the second quiz, the same proportion of students (approximately 75%) scored at or above 9.
Median (middle value) = 22
Lower quartile (middle value of the lower half) = 12
Upper quartile (middle value of the upper half) = 36
(If there is an even number of data items, then we need to get the average of the two middle numbers.)
Step 3: Draw a number line that will include the smallest and the largest data values.
Step 4: Draw three vertical lines at the lower quartile (12), median (22) and the upper quartile (36), just above the number line.
Step 5: Join the lines for the lower quartile and the upper quartile to form a box.
Step 6: Draw a line from the smallest value (5) to the left side of the box and draw a line from the right side of the box to the biggest value (53).