STAT101_2025 Week 3 Notes(1)-2
➢We can group the data into suitable categories that are ordinal,
equal, and non-overlapping.
Example 2: Cholesterol Level
The American Heart Association uses the following
classification for total cholesterol levels (mg/dL):
Example 3: Body Mass Index
Body mass index (BMI) is computed as the ratio of weight
in kilograms to height in meters squared and the following
categories are often used:
❖ Obese: BMI ≥ 30
$$\text{Class width} = \frac{\text{Range}}{k}$$
Step 4: Determine the Class Limits
➢Calculate the class limits for each interval.
➢To determine the lower limit of the second class, add the
class width to the lower limit of the first class.
➢To find the lower limit of the third class, add the class width to
the lower limit of the second class.
➢Continue this process until you reach the lower limit of the
k-th class.
Step 5: Create the Frequency Table
➢Count the number of observations falling within each interval
(class) and record the frequency.
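Steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the `scores` data, and the choice of 4 classes of width 10 are all hypothetical, and the sketch assumes integer data so that each upper limit is one less than the next lower limit.

```python
def lower_class_limits(first_lower, class_width, k):
    """Step 4: add the class width repeatedly to get each lower limit."""
    return [first_lower + j * class_width for j in range(k)]


def frequency_table(data, first_lower, class_width, k):
    """Step 5: count the observations falling within each class.

    Assumes integer data, so a class runs from its lower limit up to
    (next lower limit - 1).
    """
    table = []
    for lower in lower_class_limits(first_lower, class_width, k):
        upper = lower + class_width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
    return table


# Hypothetical data, not taken from the notes.
scores = [12, 15, 21, 24, 24, 30, 33, 38, 41, 47]

for lower, upper, freq in frequency_table(scores, first_lower=10, class_width=10, k=4):
    print(f"{lower}-{upper}: {freq}")
```

The frequencies should add up to the number of observations, which is a quick check that no value was missed or counted twice.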
$$\text{Adjusted class width} = \frac{\text{Range}}{k} + \delta,$$
where $\delta$ is chosen to ensure the last interval captures the
maximum value.
Data 1:
4,6,8,5,7,3,9,5,8,7,2,6,7,4,5,8,3,7,6,9
Data 2:
18,21,25,27,30,33,35,38,40,42,44,48,50,52,55,58,60,63,
65,68
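As an illustration of why the adjustment can be needed, the sketch below groups Data 2 into classes with and without it. The number of classes (k = 5), the adjustment (δ = 1), and the helper name `count_classes` are assumptions made for this example; the notes do not fix them.

```python
data2 = [18, 21, 25, 27, 30, 33, 35, 38, 40, 42,
         44, 48, 50, 52, 55, 58, 60, 63, 65, 68]

k = 5       # assumed number of classes (not given in the notes)
delta = 1   # assumed adjustment

data_range = max(data2) - min(data2)  # 68 - 18 = 50
width = data_range // k               # 50 / 5 = 10 (divides evenly here)
adjusted_width = width + delta        # 11


def count_classes(data, first_lower, class_width, k):
    """Return (lower limit, upper limit, frequency) for each of k classes."""
    rows = []
    for j in range(k):
        lower = first_lower + j * class_width
        upper = lower + class_width - 1  # integer data assumed
        rows.append((lower, upper, sum(lower <= x <= upper for x in data)))
    return rows


unadjusted = count_classes(data2, min(data2), width, k)
adjusted = count_classes(data2, min(data2), adjusted_width, k)

print(sum(f for _, _, f in unadjusted))  # 19: the maximum, 68, is left out
print(sum(f for _, _, f in adjusted))    # 20: every observation is counted
for lower, upper, freq in adjusted:
    print(f"{lower}-{upper}: {freq}")
```

With width 10 the last class is 58-67, so the maximum value 68 falls outside the table; with the adjusted width 11 the last class is 62-72 and all 20 observations are captured.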
Visualizing Quantitative data
Here are some visualization techniques commonly used for
numeric data:
$$CB = \frac{UCL_j + LCL_{j+1}}{2},$$
where
• $j = 1, 2, \ldots, k$ represents the class number,
• $UCL_j$ is the upper class limit of class $j$,
• $LCL_{j+1}$ is the lower class limit of class $j + 1$.
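The class boundary formula can be tried out directly. In this minimal sketch the function name `class_boundaries` and the class limits are hypothetical; each boundary is the average of one class's upper limit and the next class's lower limit.

```python
def class_boundaries(classes):
    """Boundaries between consecutive classes.

    `classes` is a list of (lower class limit, upper class limit) pairs
    in increasing order.
    """
    return [
        (classes[j][1] + classes[j + 1][0]) / 2
        for j in range(len(classes) - 1)
    ]


# Hypothetical class limits for integer data.
classes = [(18, 28), (29, 39), (40, 50), (51, 61), (62, 72)]
print(class_boundaries(classes))  # [28.5, 39.5, 50.5, 61.5]
```

Note that k classes give only k - 1 boundaries between neighbours, since the formula needs a following class.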
$$CM = \frac{UCL_j + LCL_j}{2},$$
or
$$CM = \frac{UCB_j + LCB_j}{2},$$
where $j = 1, 2, \ldots, k$ is the class number,
• $UCL_j$ is the upper class limit of class $j$,
• $LCL_j$ is the lower class limit of class $j$,
• $UCB_j$ is the upper class boundary of class $j$,
• $LCB_j$ is the lower class boundary of class $j$.
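A short check that both versions of the class mark formula give the same value. The class limits here are hypothetical, and the boundaries are obtained by assuming integer data with a gap of 1 between classes, so each boundary sits 0.5 beyond the limit.

```python
def class_mark(lower, upper):
    """Midpoint of a class, from either its limits or its boundaries."""
    return (lower + upper) / 2


# Hypothetical class limits for integer data.
limits = [(18, 28), (29, 39), (40, 50)]

# Assumed: neighbouring classes are 1 unit apart, so boundaries are
# 0.5 below the lower limit and 0.5 above the upper limit.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in limits]

marks_from_limits = [class_mark(lo, up) for lo, up in limits]
marks_from_boundaries = [class_mark(lo, up) for lo, up in boundaries]

print(marks_from_limits)      # [23.0, 34.0, 45.0]
print(marks_from_boundaries)  # [23.0, 34.0, 45.0]
```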
Calculation of the Class Width
➢Given class limits/intervals and/or class boundaries,
class width is calculated as follows:
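The sketch below uses two common textbook conventions, which are an assumption here: the class width is taken either as the difference between the lower limits of two consecutive classes, or as the upper class boundary minus the lower class boundary of a single class. The function names and class values are hypothetical.

```python
def width_from_limits(classes):
    """Difference between the lower limits of the first two classes."""
    return classes[1][0] - classes[0][0]


def width_from_boundaries(lower_boundary, upper_boundary):
    """Upper class boundary minus lower class boundary of one class."""
    return upper_boundary - lower_boundary


# Hypothetical class limits for integer data.
classes = [(18, 28), (29, 39), (40, 50)]

print(width_from_limits(classes))         # 29 - 18 = 11
print(width_from_boundaries(17.5, 28.5))  # 28.5 - 17.5 = 11.0
```

Under these conventions the width is not the upper limit minus the lower limit of one class (28 - 18 = 10), which would understate it by the gap between classes.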
How to Describe the Shape of Histograms
➢ A histogram has a right-skewed (positively skewed) distribution if
it has a “tail” on the right side of the distribution.
Frequency Polygon
➢ A frequency polygon represents the frequency distribution of a
numeric variable using class midpoints (class marks).
➢ Each point on the graph corresponds to the frequency of data in a
specific class interval.
➢ The points are connected by straight lines, forming a polygonal
shape.
➢ Frequency polygons are useful for comparing multiple distributions
and visualizing the shape of the data.
Key observations
➢ The frequency polygon (black line) connects the midpoints of each
bar in the histogram.
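The points of a frequency polygon can be computed before any plotting is done. In this sketch the classes and frequencies are illustrative values chosen for the example, and closing the polygon with zero-frequency points one class width beyond each end is a common convention assumed here, not something the notes state.

```python
# Illustrative classes and frequencies (assumed for this example).
classes = [(18, 28), (29, 39), (40, 50), (51, 61), (62, 72)]
freqs = [4, 4, 5, 4, 3]

# One point per class: (class mark, frequency).
points = [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, freqs)]

# Optional convention: anchor the polygon to the axis by adding a
# zero-frequency point one class width before and after the data.
class_width = classes[1][0] - classes[0][0]  # 29 - 18 = 11
closed = (
    [(points[0][0] - class_width, 0)]
    + points
    + [(points[-1][0] + class_width, 0)]
)

for x, y in closed:
    print(x, y)
```

Joining the listed points in order with straight lines gives the polygon.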
Use Case:
➢Helps determine percentiles, quartiles, and medians.
➢Answers questions like "How many students scored less than 70?"
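A less-than cumulative frequency table answers such questions by running totals. The exam-score classes and frequencies below are hypothetical, and the sketch assumes integer scores so that "less than 70" corresponds to the upper class boundary 69.5.

```python
from itertools import accumulate

# Hypothetical exam-score classes (integer scores) and frequencies.
classes = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [3, 7, 12, 10, 8]

# Less-than cumulative frequency at each upper class boundary.
upper_boundaries = [upper + 0.5 for _, upper in classes]
less_than_cf = list(accumulate(freqs))

for boundary, cf in zip(upper_boundaries, less_than_cf):
    print(f"less than {boundary}: {cf}")

# "How many students scored less than 70?" Scores are integers, so this
# is the cumulative frequency at the boundary 69.5.
below_70 = less_than_cf[upper_boundaries.index(69.5)]
print(below_70)  # 22

# Median class: the first class whose cumulative frequency reaches n / 2.
n = sum(freqs)
median_class = next(c for c, cf in zip(classes, less_than_cf) if cf >= n / 2)
print(median_class)  # (60, 69)
```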
More Than Ogive (cumulative frequency (CF))
1 | 8
2 | 1 5 7
3 | 0 3 5 8
4 | 0 2 4 8
5 | 0 2 5 8
6 | 0 3
➢ The stem represents the tens place.
➢ The leaves represent the ones place.
➢ The distribution is fairly symmetric.
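A stem-and-leaf plot like this one can be built by splitting each value into its tens digit and ones digit. The sketch below is one possible way to do it; the function name `stem_and_leaf` is hypothetical, and the values are the two-digit numbers read off the plot.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) with ones digits as leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 27 -> stem 2, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


# Values read off the plot: stem = tens place, leaf = ones place.
values = [18, 21, 25, 27, 30, 33, 35, 38, 40, 42,
          44, 48, 50, 52, 55, 58, 60, 63]

for stem, leaves in stem_and_leaf(values).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```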
Double Stem-and-Leaf Plots
➢ Used when a single stem has too many leaves.
➢ Each stem is split into two parts: one holding leaves 0-4 and the
other holding leaves 5-9.
Example:
3∗ | 5 7
4  | 0 0 2 3
4∗ | 5 6 6 6 8 9
5  | 2 3 4
5∗ | 5 6 7 8 9
6  | 1 2 2 4
6∗ | 9
7  | 2 3
7∗ | 8
8  | 1
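The splitting rule can be sketched as follows: leaves 0-4 stay on the plain stem and leaves 5-9 go on a starred stem. The function name and the small data set are hypothetical, and an ASCII `*` stands in for the star marker.

```python
def double_stem_and_leaf(values):
    """Split each stem in two: leaves 0-4 on 'stem', leaves 5-9 on 'stem*'."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        label = f"{stem}" if leaf <= 4 else f"{stem}*"
        rows.setdefault(label, []).append(leaf)
    return rows


# Hypothetical data, not taken from the notes.
values = [40, 42, 43, 45, 46, 48, 52, 55, 59]

for label, leaves in double_stem_and_leaf(values).items():
    print(f"{label:<2} | {' '.join(map(str, leaves))}")
```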
Other Stem-and-Leaf Plots
➢ There are various ways in which stem-and-leaf displays
can be modified.
For example, with two-digit leaves, the row
2 | 31 45 70 88
would represent the numbers 231, 245, 270, and 288.
Scatter Plot for two continuous variables
The purpose of the scatter plot is:
➢ To identify relationships (correlation) between two continuous
variables.
Identifying Relationships
What type of relationship (correlation) exists between the two
variables?
➢ Is it positive, negative, or no correlation?
Positive Correlation
➢ As one variable increases,
the other also increases.
➢ Engine size vs. car weight
(larger engines tend to be in
heavier cars).
Negative Correlation
➢ As one variable increases,
the other decreases.
No Correlation
➢ No visible pattern between
the variables.
➢ Car weight vs. driver’s age
(the weight of a car doesn’t
depend on who drives it).
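The direction of a relationship can also be checked numerically. One common summary, used here as an assumption since these notes describe correlation only visually, is the Pearson correlation coefficient r: positive r suggests a positive correlation, negative r a negative one, and r near 0 no linear pattern. All data values below are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical measurements for five cars.
engine_size = [1.2, 1.6, 2.0, 2.5, 3.0]     # litres
car_weight = [1000, 1150, 1300, 1500, 1700]  # kg
mpg = [40, 36, 32, 27, 22]

print(pearson_r(engine_size, car_weight) > 0)  # True: positive correlation
print(pearson_r(car_weight, mpg) < 0)          # True: negative correlation
```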
Data Variability & Spread, and Linear Trend
Does the data show a linear trend, or is it more scattered?
➢ Can a straight line fit the data well, or is the pattern irregular?
Does the relationship appear to be strong, moderate, or weak? Why?
IV. Are there any visible clusters of data points?
V. What does clustering suggest about the data?
VI. Are there any points that do not follow the general trend
(outliers)?
VII. Based on the trend, if a new car has an MPG of 32, what
would you predict its weight to be?
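For a question like VII, one possible approach is to fit a straight line to the scatter plot and read off the prediction. The sketch below uses a least-squares line, which is an assumption since the notes do not name a fitting method, and the MPG and weight values are hypothetical rather than the data behind the questions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: fuel economy (MPG) and weight (kg) for five cars.
mpg = [20, 25, 30, 35, 40]
weight = [1800, 1600, 1400, 1200, 1000]

slope, intercept = fit_line(mpg, weight)
predicted_weight = intercept + slope * 32

print(slope)             # -40.0: weight falls as MPG rises
print(predicted_weight)  # 1320.0
```

A prediction like this is only reasonable when the points follow a roughly linear trend and the new value (here MPG = 32) lies within the range of the observed data.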