STA101 Lecture 8 (1)
Introduction to Statistics
Farzana Zaman
Adjunct Faculty (Statistics)
Department of Mathematics and Natural Sciences (MNS)
BRAC University
Lecture 08
❖ Box-and-Whisker Plot
– Outlier and its detection with box plot
❖ Stem and leaf plot
❖ Measures of Shape Distribution
– Skewness
– Kurtosis
Q. The following data are the incomes (in thousands of dollars) for a
sample of 12 households:
75 69 84 112 74 104 81 90 94 144 79 98
Construct a box-and-whisker plot for these data.
Solution:
The following steps are performed to construct a box-and-whisker plot.
First, rank the data in increasing order and calculate the values of the
median, the first quartile, and the third quartile.
The ranked data are
69 74 75 79 81 84 90 94 98 104 112 144
For these data, the five statistics are:
Lowest Value = 69, Highest Value = 144,
Median = $\frac{84 + 90}{2} = 87$,
First Quartile, $Q_1 = \frac{75 + 79}{2} = 77$,
Third Quartile, $Q_3 = \frac{98 + 104}{2} = 101$
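The five statistics can be checked with a short Python sketch. It assumes the same rule the solution uses: each quartile is the median of the lower or upper half of the ranked data. The function name `five_number_summary` is illustrative and not part of the lecture.

```python
from statistics import median

def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # lower half of the ranked data
    upper = data[half + n % 2:]   # upper half (the middle value is skipped when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
print(five_number_summary(incomes))  # (69, 77.0, 87.0, 101.0, 144)
```

Other quartile conventions (for example, interpolation-based ones used by some software) can give slightly different values for Q1 and Q3.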
STA101 - Spring ’25 Prepared by Farzana Zaman (FZZ) 6/28
Example 1
Comment: In the figure, about 50% of the data values fall within the
box, about 25% of the values fall on the left side of the box, and about
25% fall on the right side of the box. Also, 50% of the values fall on the
left side of the median and 50% lie on the right side of the median.
The data of this example are positively skewed as the lower 50% of the
values are spread over a smaller range than the upper 50% of the values.
In other words, the left whisker/tail is shorter than the right whisker/tail.
Outlier Detection with Box Plot
Outliers: Points lying more than 1.5 times the inter-quartile range
(IQR = Q3 − Q1) above Q3 or below Q1 are known as outliers.
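As an illustration, the 1.5 × IQR rule can be applied to the household income data from the earlier example, using Q1 = 77 and Q3 = 101. This is a minimal sketch; the function name `iqr_outliers` and the parameter `k` are illustrative.

```python
def iqr_outliers(values, q1, q3, k=1.5):
    """Return the lower fence, the upper fence, and the values outside them."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
low, high, out = iqr_outliers(incomes, q1=77, q3=101)
print(low, high, out)  # 41.0 137.0 [144]
```

Here IQR = 101 − 77 = 24, so the fences are 77 − 36 = 41 and 101 + 36 = 137. Only the income 144 lies outside them.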
Set 1: Median = 10, Lower quartile =8, Upper quartile = 15, Lowest
value=6, Highest value = 19.
Draw a box plot to represent these data and comment on the distributions.
Solution:
The box plot for the two sets of data can be represented as:
Interpretation: The median for both sets is the same. However, the
values in set 2 are more evenly distributed, with a smaller range.
There is a bigger spread of values for set 1, and the distribution for
set 1 is positively skewed.
Stem-and-Leaf Plots
Solution:
Example
The final stem-and-leaf display would appear as follows, where we have
sorted all of the leaf values.
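The construction can be sketched in Python. The data for this slide's own example are not reproduced here, so the sketch reuses the household income data from the box plot example. It assumes non-negative whole numbers, with the units digit as the leaf and the remaining digits as the stem. The function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(values):
    """Map each stem to its sorted list of leaves (units digits)."""
    data = sorted(values)
    stems = range(data[0] // 10, data[-1] // 10 + 1)
    groups = {stem: [] for stem in stems}   # keep empty stems so gaps stay visible
    for v in data:
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return groups

incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
groups = stem_and_leaf(incomes)
for stem, leaves in groups.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Stems 12 and 13 print with no leaves, which makes the gap before the value 144 easy to see.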
Skewness
Skewness Examples
Measures of Skewness
$sk = \frac{3(\bar{x} - \text{Median})}{s}$
where:
x̄ denotes the sample mean.
Median is the median of the dataset.
s denotes the sample standard deviation.
The coefficient of skewness can range from -3 to 3.
A value near −3, such as −2.57, indicates considerable negative
skewness.
A value such as 1.63 indicates moderate positive skewness.
A value of 0 (when the mean and median are equal) indicates the
distribution is symmetrical, meaning there is no skewness present.
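A minimal Python sketch of this coefficient, applied to the household income data from the box plot example. It uses the standard library's `statistics.stdev`, which computes the sample standard deviation (divisor n − 1). The function name `pearson_skewness` is illustrative.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Coefficient of skewness: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
sk = pearson_skewness(incomes)
print(round(sk, 2))  # 0.72
```

For these data the mean is 92, the median is 87, and s is about 20.86, so sk is about 0.72. The positive sign agrees with the box plot, which showed a longer right whisker.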
Skewness
Measures of Skewness
Quartile-based (Bowley's) coefficient of skewness:
$\frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
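The quartile-based ratio above (commonly called Bowley's coefficient) can be evaluated directly from the three quartiles. The sketch below plugs in the quartiles from the household income example (Q1 = 77, Q2 = 87, Q3 = 101). The function name `bowley_skewness` is illustrative.

```python
def bowley_skewness(q1, q2, q3):
    """Quartile-based skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)

b = bowley_skewness(q1=77, q2=87, q3=101)
print(round(b, 3))  # 0.167
```

The numerator is 101 − 174 + 77 = 4 and the denominator is 24, giving about 0.167. A positive value means the median sits closer to Q1 than to Q3, which indicates positive skewness.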
Example
The median is the middle value in a set of data, arranged from smallest to
largest. In this case, there is an odd number of observations, so the middle
value is the median. It is $3.18.
The sample standard deviation can be calculated as:
Exercise
Q. A sample of five data entry clerks employed in the Horry County Tax
Office revised the following number of tax records last hour: 73, 98, 60,
92, and 84.
(a) Find the mean, median, and the standard deviation.
(b) Compute the coefficient of skewness using Pearson’s method.
(c) What is your conclusion regarding the skewness of the data?
Introduction to Kurtosis
Key Definition
Types of Kurtosis
Kurtosis Formula
$k = \frac{\mu_4}{\mu_2^2}$
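A minimal Python sketch of this ratio. It assumes that µ2 and µ4 are the second and fourth central moments computed with divisor n. The small data set and the function names are hypothetical and chosen only for illustration.

```python
def central_moment(values, r):
    """r-th central moment with divisor n (an assumption about the slide's notation)."""
    m = sum(values) / len(values)
    return sum((v - m) ** r for v in values) / len(values)

def kurtosis(values):
    """Moment ratio k = mu_4 / mu_2 ** 2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

data = [1, 2, 3, 4, 5]   # hypothetical values
k = kurtosis(data)
print(round(k, 2))  # 1.7
```

Here µ2 = 2 and µ4 = 6.8, so k = 6.8 / 4 = 1.7. For a normal distribution this ratio equals 3, which is the usual reference point for comparing peakedness.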
Graphical Representation