
STA101 Lecture 8 (1)

The document outlines a statistics course (STA101) taught by Farzana Zaman at BRAC University, focusing on key concepts such as box-and-whisker plots, stem-and-leaf plots, skewness, and kurtosis. It includes examples and exercises to illustrate how to detect outliers, analyze data distributions, and understand the shape of distributions. The course emphasizes practical applications of statistical techniques in analyzing data sets.


Course: STA101

Introduction to Statistics

Farzana Zaman
Adjunct Faculty (Statistics)
Department of Mathematics and Natural Sciences (MNS)
BRAC University

STA101 - Spring ’25 Prepared by Farzana Zaman (FZZ) 1/28


Outline

Lecture 08

❖ Box-and-Whisker Plot
– Outlier and its detection with box plot
❖ Stem and leaf plot
❖ Measures of Shape Distribution
– Skewness
– Kurtosis



Shape of the Distribution Based on the Relation between
Mean, Median, and Mode



Box-and-Whisker Plot

A box-and-whisker plot is a graphical display, based on quartiles, that helps us picture a set of data.
It shows the center, spread, and skewness of a data set.
It is constructed by drawing a box and two whiskers.
To construct a box plot, we need only five statistics:
– the minimum value,
– Q1 (the first quartile),
– the median (the second quartile, Q2),
– Q3 (the third quartile), and
– the maximum value.
These quantities are known as the five-number summary of a distribution.
A box plot also helps detect outliers.
We can compare different distributions by making box-and-whisker
plots for each of them.
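The five-number summary can be computed with a short Python sketch. Quartile conventions differ between textbooks and software; this sketch assumes the convention that reproduces Example 1 of this lecture: Q1 and Q3 are the medians of the lower and upper halves of the ranked data, with the middle value excluded from both halves when n is odd. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers.

    Assumed quartile convention: Q1 and Q3 are the medians of the lower and
    upper halves of the ranked data; when n is odd, the middle value is left
    out of both halves.
    """
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]            # lower half of the ranked data
    upper = ranked[half + n % 2:]    # upper half (skips the middle value if n is odd)
    return ranked[0], median(lower), median(ranked), median(upper), ranked[-1]


# Household incomes (thousands of dollars) from Example 1
incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
print(five_number_summary(incomes))  # (69, 77.0, 87.0, 101.0, 144)
```

Other conventions can give different quartiles for the same data: NumPy's `np.percentile`, which interpolates linearly by default, returns 78 and 99.5 for the first and third quartiles of these incomes instead of 77 and 101.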
Box-and-Whisker Plot

Examples of (a) a symmetric distribution, (b) a left-skewed distribution, and (c) a right-skewed distribution:



Example 1

Q. The following data are the incomes (in thousands of dollars) for a
sample of 12 households:
75 69 84 112 74 104 81 90 94 144 79 98
Construct a box-and-whisker plot for these data.

Solution:
The following steps are performed to construct a box-and-whisker plot.
First, rank the data in increasing order and calculate the values of the
median, the first quartile, and the third quartile.
The ranked data are
69 74 75 79 81 84 90 94 98 104 112 144
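For these data, the five statistics are:
Lowest value = 69, Highest value = 144,
$$\text{Median} = \frac{84 + 90}{2} = 87,$$
First quartile (the median of the lower six values, 69 to 84):
$$Q_1 = \frac{75 + 79}{2} = 77,$$
Third quartile (the median of the upper six values, 90 to 144):
$$Q_3 = \frac{98 + 104}{2} = 101.$$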
Example 1

The box-and-whisker plot for this dataset can be shown as:

Comment: In the figure, about 50% of the data values fall within the
box, about 25% of the values fall on the left side of the box, and about
25% fall on the right side of the box. Also, 50% of the values fall on the
left side of the median and 50% lie on the right side of the median.
The data of this example are positively skewed because the lower 50% of the
values are spread over a smaller range than the upper 50% of the values.
In other words, the left whisker (tail) is shorter than the right whisker (tail).
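The skewness argument in this comment can be checked from the five-number summary of Example 1 alone (69, 77, 87, 101, 144). In this sketch the whiskers are assumed to run to the minimum and maximum; the function name `half_spreads` is hypothetical.

```python
def half_spreads(minimum, q1, med, q3, maximum):
    """Spreads on the left and right of the median, from a five-number summary."""
    return {
        "lower_half": med - minimum,     # range of the lower 50% of values
        "upper_half": maximum - med,     # range of the upper 50% of values
        "left_box": med - q1,            # Q2 - Q1
        "right_box": q3 - med,           # Q3 - Q2
        "left_whisker": q1 - minimum,    # whisker assumed drawn to the minimum
        "right_whisker": maximum - q3,   # whisker assumed drawn to the maximum
    }


# Five-number summary of the Example 1 incomes
for name, value in half_spreads(69, 77, 87, 101, 144).items():
    print(f"{name}: {value}")
```

Every right-hand spread exceeds its left-hand counterpart (57 vs 18, 14 vs 10, 43 vs 8). If the outlier 144 is plotted as a separate point, the right whisker ends at 112 instead, giving 11 vs 8, which still points to positive skew.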
Outlier Detection with Box Plot

Outliers: Observations lying more than 1.5 times the interquartile range (IQR) below Q1, or more than 1.5 times the IQR above Q3, are known as outliers.

Steps for detecting outliers/extreme values:
Step 1: Calculate the IQR and obtain 1.5 × IQR.
For the previous example, the interquartile range is
IQR = Q3 − Q1 = 101 − 77 = 24.
Then, 1.5 × IQR = 1.5 × 24 = 36.
Step 2: Calculate the lower fence and upper fence values.
Lower fence, LF = Q1 − 1.5 × IQR = 77 − 36 = 41
Upper fence, UF = Q3 + 1.5 × IQR = 101 + 36 = 137
That is, we consider an observation an outlier if it is below 41 or above 137.
Therefore, the observation 144 (greater than 137) is an outlier in this example; no observation falls below 41.
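The two steps can be written as a small Python function. The quartiles are passed in rather than recomputed, because quartile conventions differ between textbooks and software; the function name `iqr_outliers` and the multiplier parameter `k` (1.5, as on the slide) are illustrative choices.

```python
def iqr_outliers(data, q1, q3, k=1.5):
    """Return (lower fence, upper fence, outliers) using Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1                      # Step 1: interquartile range
    lower_fence = q1 - k * iqr         # Step 2: fences
    upper_fence = q3 + k * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
# Q1 = 77 and Q3 = 101 as computed in Example 1
print(iqr_outliers(incomes, q1=77, q3=101))  # (41.0, 137.0, [144])
```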



Example 2

Q. The following information is obtained from two sets of data.

Set 1: Median = 10, Lower quartile = 8, Upper quartile = 15, Lowest value = 6, Highest value = 19.

Set 2: Median = 10, Lower quartile = 7, Upper quartile = 13, Lowest value = 4, Highest value = 16.

Draw box plots to represent these data and comment on the distributions.



Example 2

Solution:
The box plot for the two sets of data can be represented as:

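Interpretation: The median is the same (10) for both sets. However, the
values in set 2 are more evenly distributed, with a smaller range (16 − 4 = 12 versus 19 − 6 = 13) and a smaller interquartile range (13 − 7 = 6 versus 15 − 8 = 7).
Set 1 has a bigger spread of values, and its distribution is positively skewed: its median lies closer to the lower quartile (10 − 8 = 2) than to the upper quartile (15 − 10 = 5).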
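The comparison can be made concrete by computing spread measures directly from the two five-number summaries given in the question. A minimal sketch; the function name `box_plot_stats` is hypothetical.

```python
def box_plot_stats(summary):
    """Spread measures from a five-number summary (min, Q1, median, Q3, max)."""
    low, q1, med, q3, high = summary
    return {
        "range": high - low,
        "iqr": q3 - q1,
        "Q2-Q1": med - q1,           # left half of the box
        "Q3-Q2": q3 - med,           # right half of the box
        "left whisker": q1 - low,
        "right whisker": high - q3,
    }


# Five-number summaries given in Example 2
example2 = {"Set 1": (6, 8, 10, 15, 19), "Set 2": (4, 7, 10, 13, 16)}
for name, summary in example2.items():
    print(name, box_plot_stats(summary))
```

For Set 1 the right-hand quantities (5 and 4) exceed the left-hand ones (2 and 2), while Set 2 is balanced, with 3 on each side of both the box and the whiskers.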



Exercise

*** For more exercises:
Statistical Techniques in Business and Economics - Douglas A. Lind, William G. Marchal & Samuel A. Wathen.
- Exercises 15-18 [Pg-109]



Stem-and-Leaf Plots


In a stem-and-leaf display of quantitative data, each value is divided into two portions: a stem and a leaf.
The leaves for each stem are shown separately in a display.
Steps for construction of a stem-and-leaf plot:
1. Select one or more leading digits for the stem values. The trailing
digits become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for each observation beside the corresponding stem
value.
4. Indicate the units for stems and leaves somewhere in the display.
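The four construction steps can be sketched in Python for non-negative integers, assuming the stem is every digit except the last and the leaf is the last digit (leaf unit 1). The theater attendance data are not reproduced in this copy, so the Example 1 household incomes are used instead; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the lines of a ranked stem-and-leaf display for non-negative integers.

    Assumption: stem = value // 10 (all leading digits), leaf = value % 10.
    """
    leaves = defaultdict(list)
    for value in sorted(data):                    # sorting gives ranked leaves
        leaves[value // 10].append(value % 10)    # step 1 and step 3
    stems = range(min(leaves), max(leaves) + 1)   # step 2: list every possible stem
    width = len(str(max(leaves)))                 # right-align the stem column
    return [f"{stem:>{width}} | {' '.join(map(str, leaves[stem]))}".rstrip()
            for stem in stems]


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
print("Stem unit = 10, leaf unit = 1 (thousands of dollars)")  # step 4
for line in stem_and_leaf(incomes):
    print(line)
```

Empty stems (12 and 13) are kept, as step 2 requires, which makes the gap before the outlier 144 visible.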



Stem-and-Leaf Plots

Example: Two-Digit Numbers

Solution:

Figure: (a) unranked and (b) ranked stem-and-leaf displays of test scores



Stem-and-Leaf Plots

Example

Q. Listed below is the number of people attending each of the 45 performances at the Theater of the Republic last year.

Organize the data into a stem-and-leaf display.
Around what values does attendance tend to cluster?
What is the smallest attendance? The largest attendance?



Stem-and-Leaf Plots

Example
The final stem-and-leaf display would appear as follows, where we have
sorted all of the leaf values.

Several conclusions can be drawn from the stem-and-leaf display:
The minimum number of people attending is 88 and the maximum is 156.
There were two performances with fewer than 90 people attending, and
three performances with 150 or more.
There were fifteen performances with attendance between 110 and 119
and eight performances between 120 and 129. Within the 120 to 129
group, the actual attendances were spread evenly throughout the class. ...
Skewness


An important characteristic of a distribution is its shape. Commonly observed shapes are symmetric, positively skewed, and negatively skewed.
In a symmetric distribution, the mean and median are equal and the
data values are evenly spread around these values.
The shape of the distribution below the mean and median is a mirror
image of the distribution above the mean and median.
A distribution of values is skewed to the right, or positively skewed,
if there is a single peak but the values extend much farther to the
right of the peak than to the left of the peak.
In this case, the mean is larger than the median.
In a negatively skewed distribution, there is a single peak, but the
observations extend farther to the left, in the negative direction, than
to the right.
In a negatively skewed distribution, the mean is smaller than the
median.
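The mean-versus-median rule can be turned into a quick check. This is a rough guide rather than a formal test (it can mislead, for example, for multimodal data); the function name `skew_direction` is hypothetical, and the Example 1 incomes are reused as test data.

```python
from statistics import mean, median


def skew_direction(data):
    """Rough guide to the direction of skew: compare the mean with the median."""
    x_bar, med = mean(data), median(data)
    if x_bar > med:
        return "positive"   # mean pulled to the right by a long right tail
    if x_bar < med:
        return "negative"   # mean pulled to the left by a long left tail
    return "none"           # mean equals median


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
print(mean(incomes), median(incomes), skew_direction(incomes))
```

For the incomes, the mean is 92 and the median is 87, so the check reports positive skew, consistent with the box-plot comment in Example 1.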
Skewness

Skewness Examples



Skewness

Measures of Skewness

The simplest measure of skewness is Pearson’s coefficient of skewness.

Pearson’s Coefficient of Skewness

$$sk = \frac{3(\bar{x} - \text{Median})}{s}$$

where:
x̄ denotes the sample mean.
Median is the median of the dataset.
s denotes the sample standard deviation.
The coefficient of skewness can range from -3 to 3.
A value near −3, such as −2.57, indicates considerable negative
skewness.
A value such as 1.63 indicates moderate positive skewness.
A value of 0 (when the mean and median are equal) indicates the
distribution is symmetrical, meaning there is no skewness present.
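Pearson's coefficient can be sketched with Python's standard `statistics` module, whose `stdev` uses the n − 1 divisor that the sample standard deviation s requires. As an illustration (not part of the slides), it is applied to the Example 1 household incomes; the function name `pearson_skewness` is hypothetical.

```python
from statistics import mean, median, stdev


def pearson_skewness(data):
    """Pearson's coefficient of skewness: sk = 3 * (mean - median) / s.

    s is the sample standard deviation (n - 1 in the denominator).
    """
    return 3 * (mean(data) - median(data)) / stdev(data)


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
print(round(pearson_skewness(incomes), 2))  # 0.72
```

The result, about 0.72, is positive, which agrees with the positive skewness read from the box plot in Example 1 (mean 92 > median 87).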
Skewness

Measures of Skewness

Another measure of skewness is Bowley’s quartile coefficient of skewness, which is based on the three quartiles Q1, Q2, and Q3.

Bowley’s Quartile Coefficient of Skewness

$$\frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$$

This coefficient of skewness can range from -1 to +1 and is zero for a symmetric distribution.
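Bowley's coefficient needs only the three quartiles. Writing it as ((Q3 − Q2) − (Q2 − Q1)) / (Q3 − Q1) shows why it lies between −1 and +1: it compares the two halves of the box relative to the whole box. The sketch below (function name `bowley_skewness` is hypothetical) applies it to the quartiles from Examples 1 and 2.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's quartile coefficient of skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Quartiles of the household incomes in Example 1
print(bowley_skewness(77, 87, 101))
# Quartiles of Set 1 and Set 2 in Example 2
print(bowley_skewness(8, 10, 15))
print(bowley_skewness(7, 10, 13))
```

The values are about 0.17 for Example 1, about 0.43 for Set 1, and 0 for Set 2, matching the earlier conclusions that the Example 1 incomes and Set 1 are positively skewed while Set 2 has a symmetric box.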



Skewness

Example

Q. Compute the mean, median, and standard deviation. Find the coefficient of skewness using Pearson’s estimate. What is your conclusion regarding the shape of the distribution?
Solution: The mean can be calculated as:
$$\bar{x} = \frac{\sum x}{n} = \frac{74.26}{15} = \$4.95$$



Skewness

Example

The median is the middle value in a set of data arranged from smallest to
largest. In this case, there is an odd number of observations (15), so the median
is the single middle value (the 8th). It is $3.18.
The sample standard deviation can be calculated as:
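For reference, the sample standard deviation s that appears in Pearson’s formula is defined with n − 1 in the denominator (the slide’s own numerical computation relies on the data table, which is not reproduced here):

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

With n = 15 observations, the divisor is 14.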



Skewness

Exercise

Q. A sample of five data entry clerks employed in the Horry County Tax
Office revised the following number of tax records last hour: 73, 98, 60,
92, and 84.
(a) Find the mean, median, and the standard deviation.
(b) Compute the coefficient of skewness using Pearson’s method.
(c) What is your conclusion regarding the skewness of the data?

*** For more exercises:
Statistical Techniques in Business and Economics - Douglas A. Lind, William G. Marchal & Samuel A. Wathen.
- Exercises 19-22 [Pg-113]



Kurtosis

Introduction to Kurtosis

Kurtosis is a statistical measure that describes the shape of a distribution’s tails in comparison with a normal distribution.
It provides insight into the sharpness or flatness of the data
distribution.
The measure focuses on the extent of outliers in the data.

Key Definition

Kurtosis measures the "tailedness" of the probability distribution of a real-valued random variable.



Kurtosis

Types of Kurtosis

Kurtosis Formula
$$k = \frac{\mu_4}{\mu_2^2}$$
where µ2 and µ4 are the second and fourth central moments of the distribution.

There are three main types of kurtosis based on the comparison with a normal distribution:
Mesokurtic:
Distributions with kurtosis similar to that of a normal distribution (k = 3).
Example: Standard normal distribution.
Leptokurtic:
Distributions with heavier tails and a sharper peak (k > 3).
Indicates more extreme outliers.
Platykurtic:
Distributions with lighter (thinner) tails and a flatter, broader peak (k < 3).
Indicates fewer extreme outliers.
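A minimal sketch of the kurtosis formula above, assuming µ2 and µ4 are estimated from the data by the central moments m2 = Σ(x − x̄)²/n and m4 = Σ(x − x̄)⁴/n (divisor n; some software applies bias corrections instead, and `scipy.stats.kurtosis` reports the excess kurtosis k − 3 by default). The classification uses a small tolerance around 3 because sample values essentially never equal 3 exactly; the function names and the tolerance are illustrative choices.

```python
def kurtosis(data):
    """Moment coefficient of kurtosis k = m4 / m2**2.

    m2 and m4 are the second and fourth central moments, computed with divisor n.
    """
    n = len(data)
    x_bar = sum(data) / n
    m2 = sum((x - x_bar) ** 2 for x in data) / n
    m4 = sum((x - x_bar) ** 4 for x in data) / n
    return m4 / m2 ** 2


def kurtosis_type(k, tol=1e-9):
    """Classify k relative to the normal-distribution value 3."""
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


evenly_spread = [1, 2, 3, 4, 5]
incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]  # Example 1
for name, data in [("1..5", evenly_spread), ("Example 1 incomes", incomes)]:
    k = kurtosis(data)
    print(f"{name}: k = {k:.2f} ({kurtosis_type(k)})")
```

For 1, 2, 3, 4, 5 this gives k = 1.7 (platykurtic). For the Example 1 incomes it gives k ≈ 4.19 (leptokurtic): the single outlier 144, whose deviation from the mean is 52, contributes 52⁴ = 7,311,616 of the 8,009,316 total in the fourth-moment sum, which illustrates why kurtosis is sensitive to outliers.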
Kurtosis

Graphical Representation



References

❑ Statistical Techniques in Business and Economics - Douglas A. Lind, William G. Marchal & Samuel A. Wathen.
❑ Introductory Statistics - Neil A. Weiss.
❑ Introductory Statistics - Prem S. Mann.

