The Five-Number Summary and Boxplots

The document explains the five-number summary, which includes the minimum, lower quartile, median, upper quartile, and maximum of a dataset, and illustrates its application through examples. It also describes boxplots as visual representations of the five-number summary, detailing how they depict data distribution and identify outliers. Additionally, it introduces the concept of fences to classify outliers based on the interquartile range (IQR).

Uploaded by

zengyangru

The Five-Number Summary


The five-number summary is a set of five statistics that describe the
centre and spread of a dataset.

It consists of the five most important sample percentiles:


• Minimum (smallest value)
• 𝑄₁: lower quartile (first quarter value)
• Median (middle value)
• 𝑄₃: upper quartile (third quarter value)
• Maximum (largest value)

Example Modified VCAA 2004 Exam 1 Question 2


The marks obtained by 26 students who sat for a test are displayed as an ordered stemplot as shown.
The five-number summary of these test marks is

The minimum is 9 and the maximum is 50

26 ÷ 2 = 13: group data into 13s. Median: (35 + 37) ÷ 2 = 36

13 ÷ 2 = 6.5: group each 13 into 6s, excluding the middle value of each 13 (that middle value is the quartile)

𝑄₁ = 30, 𝑄₃ = 43, ∴ 9, 30, 36, 43, 50

Example Modified VCAA 2002 Exam 1 Question 7


The following ages were recorded from 12 men: 26, 29, 32, 32, 34, 37, 39, 40, 41, 43, 45, 51
For these men, the five-number summary of these ages is

The minimum is 26 and the maximum is 51

12 ÷ 2 = 6: group data into 6s: 26, 29, 32, 32, 34, 37 | 39, 40, 41, 43, 45, 51, 𝑀 = 38

6 ÷ 2 = 3: group data into 3s: 26, 29, 32 | 32, 34, 37 | 39, 40, 41 | 43, 45, 51

𝑄₁ = 32, 𝑄₃ = 42, ∴ 26, 32, 38, 42, 51
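
The grouping method used in the two examples above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function names are made up for this sketch, and it assumes the quartiles are the medians of the lower and upper halves, with the middle value left out of both halves when the number of values is odd. Calculators and software may use other quartile conventions and give slightly different values.

```python
def median(sorted_values):
    """Middle value, or the mean of the two middle values, of a sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # values before the median position
    upper = data[n - half:]    # values after the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])


ages = [26, 29, 32, 32, 34, 37, 39, 40, 41, 43, 45, 51]
print(five_number_summary(ages))  # (26, 32.0, 38.0, 42.0, 51)
```

For the 12 ages, this reproduces the summary 26, 32, 38, 42, 51 found by hand.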

Boxplots (Box-and-Whisker Plots)


A boxplot is a visual representation of the five-number summary of a
numerical variable. A box is constructed between the lower and upper
quartiles, with a vertical dividing line where the median is located.
From the ends of the box, lines (called whiskers) are extended out to
the minimum and maximum values.

Percentage Distribution of Boxplots


The median splits the data into 2 equal sections.
Therefore, 50% of the data is above the median and 50% is below.

The quartiles split each 50% section in half, creating 25% sections.
Therefore, each whisker and each part of the box represents 25% of the
data, regardless of the shape. The whole box, which represents the IQR,
contains 50% of the data.

Example VCAA 2016 Question 2a


The weather station also records daily maximum temperatures. The five-number summary for the
distribution of maximum temperatures for the month of February is displayed in the table below.
There are no outliers in this distribution.
The five-number summary has been used to construct a boxplot on the grid below.

            Temperature (°C)
Minimum     16
𝑸𝟏          21
Median      25
𝑸𝟑          31
Maximum     38

The percentage of days that had a maximum temperature of 21°C or greater in this particular February is 75%.
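
The 75% answer follows from the 25% sections described earlier: 21°C is the lower quartile, so about 25% of days fall below it. A minimal Python sketch of that lookup is given here; the function name is hypothetical, and it only handles values that sit exactly on one of the five summary points.

```python
def percent_at_or_above(summary, value):
    """summary is (minimum, Q1, median, Q3, maximum); value must be one of them."""
    # Each step along the summary passes another 25% of the data.
    percent_below = [0, 25, 50, 75, 100]
    index = summary.index(value)  # raises ValueError if value is not a summary point
    return 100 - percent_below[index]


february = (16, 21, 25, 31, 38)
print(percent_at_or_above(february, 21))  # 75
```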

Example VCAA 2010 Exam 1 Question 2


To test the temperature control on an oven, the
control is set to 180 °C and the oven is heated for 15
minutes. The temperature of the oven is then
measured. Three hundred ovens were tested in this
way. Their temperatures were recorded and are
displayed below using a boxplot. The interquartile
range for temperature is 181.7 − 179 = 2.7 °C.

Shape of a Boxplot
We describe boxplots in terms of symmetry and skew.

Figure: example boxplots of three shapes: Negatively Skewed, (Approximately) Symmetric, Positively Skewed

Example VCAA 2008 Exam 1 Question 2


The box plot below shows the distribution of the time, in
seconds, that 79 customers spent moving along a
particular aisle in a large supermarket. The shape of the
distribution is best described as positively skewed with
outliers.
Outliers
Outliers are data values that lie well outside the range where the majority of the data set lies. That is, data values
that are much greater or much smaller than the rest of the data.

On a boxplot, outliers are generally drawn as dots or crosses, and the whisker ends at the most extreme data
value that is not an outlier. If outliers are present, they include the minimum and/or maximum value of the data set.

Example VCAA 2008 Exam 1 Question 1


The box plot below shows the distribution of the time, in seconds, that 79 customers spent moving
along a particular aisle in a large supermarket.

Example VCAA 2017 NHT Exam 2 Question 1c


A 1 m² solar array is located at a weather station.
The total amount of energy generated by the solar
array, in megajoules, is recorded each month. The
data for the month of February for the last 22 years is
displayed in the dot plot below.

For the data in the dot plot above, the five-number summary is 17.1, 20, 21, 21.8, 24.6. The data value
17.1 is an outlier. Using the data in the dot plot, a boxplot has been constructed below.
Note: the left whisker ends at 18.2, the smallest data value that is not an outlier.

Example VCAA 2017 Exam 1 Question 1 & 2


The boxplot below shows the distribution of the
forearm circumference, in centimetres, of 252
people.

The percentage of these 252 people with a forearm circumference of less than 30 cm is closest to 75%.

The five-number summary for the forearm circumference of these 252 people is
approximately 21, 27.4, 28.7, 30, 35.9
Classifying Data Values as Outliers using Fences
Fences are used to determine what data points are considered outliers and what data points are not.
They are positioned 1.5 × 𝐼𝑄𝑅 towards the extremes from the lower and upper quartiles.

Lower Fence: outliers are values less than 𝑄₁ − 1.5 × 𝐼𝑄𝑅
Upper Fence: outliers are values greater than 𝑄₃ + 1.5 × 𝐼𝑄𝑅
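
As a rough illustration, the fence rule can be written as two short Python functions. The names `fences` and `split_outliers` are invented for this sketch, and the quartiles are assumed to have been found already. The ages of the 12 men from the earlier example (𝑄₁ = 32, 𝑄₃ = 42) are reused as test data.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence), each 1.5 x IQR beyond a quartile."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(values, q1, q3):
    """Return the outliers and the whisker ends (extreme values inside the fences)."""
    lower, upper = fences(q1, q3)
    outliers = [v for v in values if v < lower or v > upper]
    inside = [v for v in values if lower <= v <= upper]
    return outliers, (min(inside), max(inside))


ages = [26, 29, 32, 32, 34, 37, 39, 40, 41, 43, 45, 51]
print(fences(32, 42))                # (17.0, 57.0)
print(split_outliers(ages, 32, 42))  # ([], (26, 51))
```

No age lies outside 17 and 57, so this data set has no outliers and the whiskers run to the minimum and maximum.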

Example VCAA 2011 Exam 2 Question 1


The stemplot shows the distribution of the average
age, in years, at which men first marry in 17
countries. For these countries, the interquartile
range (IQR) for the average age of men at first
marriage is 31 − 29.9 = 1.1.

If the data values displayed in Figure 2 were used to construct a boxplot with outliers, then the country
for which the average age of men at first marriage is 26.0 years would be shown as an outlier.
26.0 is on the lower side. So, determine the position of the lower fence: 29.9 − 1.5 × 1.1 = 28.25
Since 26.0 is less than 28.25, it is classified as an outlier.

Example VCAA 2018 NHT Exam 2 Question 1b


The dot plot and boxplot below display the distribution of skull length, in millimetres,
for a sample of the same species of bird.

Use information from the plots above to show why the bird with a skull length of 33.5 mm
is not plotted as an outlier in the boxplot.

𝐼𝑄𝑅 = 31.6 − 30.3 = 1.3

33.5 is on the upper side. So, determine the position of the upper fence: 31.6 + 1.5 × 1.3 = 33.55
Since 33.5 is less than 33.55, it is not classified as an outlier.
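
The two fence checks above can be reproduced with a short Python sketch. The function name `check_outlier` is hypothetical. It uses the standard `decimal` module so that the arithmetic matches the hand calculation exactly, which matters when a value such as 33.5 sits very close to a fence such as 33.55.

```python
from decimal import Decimal


def check_outlier(value, q1, q3):
    """Return (is_outlier, lower_fence, upper_fence) using exact decimal arithmetic."""
    value, q1, q3 = Decimal(value), Decimal(q1), Decimal(q3)
    iqr = q3 - q1
    lower = q1 - Decimal("1.5") * iqr
    upper = q3 + Decimal("1.5") * iqr
    return (value < lower or value > upper), lower, upper


# Average age at first marriage: Q1 = 29.9, Q3 = 31, value 26.0
marriage = check_outlier("26.0", "29.9", "31")
# Skull length: Q1 = 30.3, Q3 = 31.6, value 33.5
skull = check_outlier("33.5", "30.3", "31.6")

print(marriage[0], marriage[1])  # outlier flag and lower fence
print(skull[0], skull[2])        # outlier flag and upper fence
```

The inputs are passed as strings so that `Decimal` stores them exactly as written rather than as binary floating-point approximations.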

Example
The lower and upper fences of a set of data are 16.5 and 98.4 respectively.
The interquartile range of this set of data is
The fences are 4 interquartile ranges apart, since (𝑄₃ + 1.5 × 𝐼𝑄𝑅) − (𝑄₁ − 1.5 × 𝐼𝑄𝑅) = 𝐼𝑄𝑅 + 3 × 𝐼𝑄𝑅 = 4 × 𝐼𝑄𝑅.
So, 𝐼𝑄𝑅 = (98.4 − 16.5) ÷ 4 = 20.475
