Lec_4 (Summary Data)
02/09/2025 1
Learning objectives
At the end of this chapter, the student will be able to:
– Identify the different methods of data
summarization
– Compute appropriate summary values for a set of
data
– Identify the properties & limitations of summary
values
Introduction
• Although frequency distributions serve useful purposes,
• Descriptive measures can be computed from sample data or from
population data. To distinguish between the two, we use the
following definitions.
Statistic: a descriptive measure computed from sample data.
Parameter: a descriptive measure computed from population
data.
Numerical summary measures
Measures of Central Tendency
1. Arithmetic mean
If the variable x assumes n values x1, x2, …, xn, then the mean, $\bar{x}$, is given by:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
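As a minimal sketch of this definition, the snippet below adds up the observations and divides by n. The function name `arithmetic_mean` is an illustrative choice, and the ten observations are the ones used in the sample variance example later in this lecture.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Ten observations taken from the variance example in this lecture.
data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
mean_value = arithmetic_mean(data)
print(mean_value)  # 56.0
```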
B. Grouped Data
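• In calculating the mean from grouped data, we assume that all
values falling in a class interval are located at the mid-point of that interval.
• It is calculated as follows:
$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$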
• where,
– k = the number of class intervals
– mi = the mid-point of the ith class interval
– fi = the frequency of the ith class interval
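The grouped-data mean can be sketched as a frequency-weighted average of the class mid-points. The function name `grouped_mean` and the three-class table below are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the lecture's age data.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(m_i * f_i) / sum(f_i)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("need one frequency per class mid-point")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(m * f for m, f in zip(midpoints, frequencies)) / total


# Hypothetical table with k = 3 class intervals (illustration only).
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
result = grouped_mean(midpoints, frequencies)
print(result)  # (10 + 75 + 75) / 10 = 16.0
```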
Example. Compute the mean age of 169 subjects from the grouped data.
The mean can be thought of as the “balancing point” or “center of
gravity” of the data.
Properties of the Arithmetic Mean
2. Median
• The median is the value which divides the data set into two
equal parts.
E.g. 19 20 20 21 22 23 24 27 27 27 34 (n = 11)
E.g. 19 20 20 21 22 24 27 27 27 34 (n = 10)
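A short sketch of the two cases, using Python's standard `statistics.median`: for an odd number of ordered observations the median is the middle value, and for an even number it is the average of the two middle values. The variable names are illustrative.

```python
from statistics import median

odd_data = [19, 20, 20, 21, 22, 23, 24, 27, 27, 27, 34]  # n = 11
even_data = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]     # n = 10

median_odd = median(odd_data)    # the 6th ordered value
median_even = median(even_data)  # average of the 5th and 6th ordered values
print(median_odd, median_even)   # 23 23.0
```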
Exercise
Solution
• Step 1: Arrange the data in order.
E.g. Compute the median age of 169 subjects from the grouped data.
n/2 = 169/2 = 84.5
The 84.5th observation lies in the 3rd class interval.
Lower class boundary = 29.5, upper class boundary = 39.5
(class width = 10)
Frequency of the class = 47
Fc (cumulative frequency below the class) = 70
Median = 29.5 + ((84.5 − 70) / 47) × 10 = 32.585 ≈ 33
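The calculation above can be sketched as a small function. The name `grouped_median` and its parameter names are illustrative, and the inputs are the values quoted in this example (lower boundary 29.5, n = 169, Fc = 70, class frequency 47, class width 10).

```python
def grouped_median(lower_boundary, n, cum_freq_below, class_freq, class_width):
    """Median of grouped data by interpolating inside the median class."""
    if class_freq <= 0:
        raise ValueError("the median class must have a positive frequency")
    return lower_boundary + ((n / 2 - cum_freq_below) / class_freq) * class_width


median_age = grouped_median(
    lower_boundary=29.5, n=169, cum_freq_below=70, class_freq=47, class_width=10
)
print(f"{median_age:.3f}")  # 32.585
```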
Properties of median
3. Mode
Example-1
Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
Mode is 4 “Unimodal”
Example-2
Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
No mode, since all the values are different
Example-3
Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
There are two modes = 2 & 5
This distribution is said to be “bi-modal”
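The three examples can be reproduced with a short sketch. The helper `modes` is a hypothetical name; it follows the convention used here that a data set in which every value occurs only once has no mode, so it returns an empty list in that case.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # all values are different: no mode
    return sorted(v for v, c in counts.items() if c == highest)


example_1 = modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6])
example_2 = modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12])
example_3 = modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8])
print(example_1, example_2, example_3)  # [4] [] [2, 5]
```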
Mode of grouped data
E.g. Find the mode for the following data
Solution
Lmo = 19.5, F = 66, Fa = 47, Fb = 4, i = 10
Mode = 19.5 + ((66 − 47) / ((66 − 47) + (66 − 4))) × 10
= 21.85 ≈ 22
Properties of mode
Bimodal — Mean and median should be about
the same, but may take a value that is unlikely
to occur; two modes might be best
Skewed to the right (positively skewed): the mean
is sensitive to extreme values, so the median
might be more appropriate
[Figure: right-skewed curve with the mode, median, and mean marked]
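Skewed to the left (negatively skewed): as with
right skew, the mean is sensitive to extreme values,
so the median might be more appropriate
[Figure: left-skewed curve with the mode, median, and mean marked]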
When the data are skewed, the mean is “dragged” in the direction of the
skewness
It is possible in extreme cases for all but one of the sample points to be on
one side of the arithmetic mean & in this case, the mean is a poor measure of
central location or does not reflect the center of the sample.
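A small sketch of this extreme case, using a hypothetical five-value sample in which a single large observation drags the mean above every other data point while the median stays with the bulk of the data:

```python
from statistics import mean, median

# Hypothetical right-skewed sample: one extreme value among small ones.
skewed = [1, 2, 3, 4, 100]

sample_mean = mean(skewed)      # 110 / 5 = 22
sample_median = median(skewed)  # middle ordered value = 3
below_mean = sum(1 for x in skewed if x < sample_mean)

print(sample_mean, sample_median, below_mean)  # 22 3 4
```

Four of the five observations lie below the mean, so here the median describes the center of the sample better.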
Measures of variation (dispersion)
Measures that quantify the variation or dispersion of a set of data from its
central location.
Dispersion of a set of observations is the variety exhibited by the
observations
1. If all the values are the same → There is no dispersion
2. If all the values are different → There is dispersion
3. If the values are close to each other → The amount of dispersion is small
4. If the values are widely scattered/spread → The dispersion is greater
Common measures of dispersion
1. Range
2. Inter quartile range
3. Variance
4. Standard deviation
5. Coefficient of variation
Range
• The difference between the largest and smallest
observations in a sample.
Example:
Range = 42-5 = 37
c) The third quartile (Q3): 75% of the
observations are less than or equal to the third
quartile and 25% of the observations are greater
than or equal to the third quartile.
IQR is used when the median is used as the measure of central
tendency.
It gives the range in which the middle 50% of the distribution
lies.
The inter-quartile range quantifies the difference between the
third and first quartiles:
IQR = Q3 - Q1
E.g. 1: Given these data: 13, 7, 9, 15, 11, 5, 8, 4, find the IQR.
a. Arrange the observations in increasing order.
4, 5, 7, 8, 9, 11, 13, 15.
b. Find the position of the 1st and 3rd quartiles.
n=8.
Position of Q1 = ¼ (n+1) = ¼ (8+1) = 2.25th
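Position of Q3 = ¾ (n+1) = ¾ (8+1) = 6.75th
The value of Q3 is equal to the value of the 6th observation plus 0.75 of
the difference between the 7th and 6th observations.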
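The positional method in this example can be sketched as follows. It assumes that a fractional position such as 2.25 means "the 2nd ordered observation plus 0.25 of the gap to the 3rd" (linear interpolation); the helper name `value_at_position` is hypothetical.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    n = len(sorted_values)
    if not 1 <= position <= n:
        raise ValueError("position must lie between 1 and n")
    whole = int(position)          # index of the observation just below
    fraction = position - whole    # how far to move towards the next one
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


data = sorted([13, 7, 9, 15, 11, 5, 8, 4])      # 4, 5, 7, 8, 9, 11, 13, 15
n = len(data)
q1 = value_at_position(data, (n + 1) / 4)       # 2.25th position
q3 = value_at_position(data, 3 * (n + 1) / 4)   # 6.75th position
iqr = q3 - q1
print(q1, q3, iqr)  # 5.5 12.5 7.0
```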
E.g. 2: Suppose we have a small data set of twelve observations:
15 18 19 20 20 20 21 23 23 24 24 25
• we want to divide the data into four equal sets
• First, we find the median
15 18 19 20 20 20 ↑ median 21 23 23 24 24 25
• median = 20.5 (half way between the 6th and 7th observations),
• divides the data into two equal sets with exactly 50% of the
observations in each: the 1st to the 6th observations in the first set
and the 7th to 12th observations in the other.
• To find the first quartile we consider the
observations less than the median.
• 15 18 19 20 20 20
• The first quartile is the median of these
data.
• In this case, the first quartile is half way
between the 3rd and 4th observations and
is equal to 19.5.
Now, we consider the observations which are greater than the median.
21 23 23 24 24 25
The third quartile is the median of these data and
is equal to 23.5.
15 18 19 ↑ 20 20 20 ↑ 21 23 23 ↑ 24 24 25
Q1 Q2 Q3
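The "median of each half" approach used in this example can be sketched with `statistics.median`. The slicing below assumes an even number of observations, as in this data set; for an odd n it would leave the middle observation out of both halves, which is one of several conventions.

```python
from statistics import median

data = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]
ordered = sorted(data)
half = len(ordered) // 2

lower_half = ordered[:half]    # 1st to 6th observations
upper_half = ordered[-half:]   # 7th to 12th observations

q1 = median(lower_half)   # halfway between the 3rd and 4th observations
q2 = median(ordered)      # halfway between the 6th and 7th observations
q3 = median(upper_half)   # halfway between the 9th and 10th observations
print(q1, q2, q3, q3 - q1)  # 19.5 20.5 23.5 4.0
```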
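For grouped data, apply the same method as for the grouped median, using the
lower true class boundary of the interval containing the quartile:
$Q_1 = L_{Q_1} + \frac{n/4 - F_c}{f_{Q_1}} \times i$
where $L_{Q_1}$ = the lower true class boundary of the interval containing Q1,
$F_c$ = the cumulative frequency of the intervals below it, $f_{Q_1}$ = the
frequency of that interval, and i = the class width.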
It is important in selecting cut-off points in the
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
variance.
Degrees of freedom
In computing the variance there are (n-1) degrees of freedom.
Example
Data: 43,66,61,64,65,38,59,57,57,50.
Find the sample variance of the data (mean = 56).
S² = [(43 − 56)² + (66 − 56)² + … + (50 − 56)²] / (10 − 1)
= 810/9 = 90
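The worked example can be checked step by step with a short sketch. It also confirms that the deviations from the mean add up to zero, which is why only n − 1 of them are free to vary. The cross-check uses `statistics.variance`, which divides by n − 1.

```python
from statistics import variance

data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
n = len(data)
mean_value = sum(data) / n                        # 560 / 10 = 56.0

deviations = [x - mean_value for x in data]
sum_of_deviations = sum(deviations)               # always 0 around the mean
sum_of_squares = sum(d ** 2 for d in deviations)  # 810.0
sample_variance = sum_of_squares / (n - 1)        # divide by n - 1, not n

print(sum_of_deviations, sum_of_squares, sample_variance)  # 0.0 810.0 90.0
print(variance(data) == sample_variance)                   # True
```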
Variance for grouped data
where
Properties of variances
• The main disadvantage of variance is that its unit is the
square of the unit of the original measurements.
deviation.
Standard Deviation
Properties of SD
curve.
Coefficient of variation (CV)
It is the ratio of the standard deviation to the mean,
multiplied by 100: $CV = \frac{S}{\bar{x}} \times 100$
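A minimal sketch of the calculation, assuming the usual definition of the CV as the sample standard deviation divided by the mean and multiplied by 100. The function name is illustrative, and the data are the ten observations from the sample variance example (mean 56, variance 90).

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100


data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]
cv = coefficient_of_variation(data)
print(round(cv, 1))  # 16.9 (percent)
```

Because the units of the standard deviation and the mean cancel, the result is a unit-free percentage.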
• CV is also used to compare two or more sets of
Distributions
Normal distribution
It is symmetric about its mean/one half of
the curve is the mirror image of the other
half
The highest point is at its mean
The height of the curve decreases as one
moves away from the mean in either
direction, approaching, but never reaching
zero
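These three properties can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The standard normal curve (mean 0, standard deviation 1) is used here purely as an illustrative choice.

```python
from statistics import NormalDist

curve = NormalDist(mu=0.0, sigma=1.0)  # illustrative: standard normal

# Height of the curve at the mean and at 1, 2, 3 units above it.
heights = [curve.pdf(x) for x in (0, 1, 2, 3)]

# Symmetry: equal heights at equal distances either side of the mean.
symmetric = all(abs(curve.pdf(-x) - curve.pdf(x)) < 1e-12 for x in (0.5, 1, 2))

# Highest point at the mean, then decreasing while staying above zero.
decreasing = all(a > b for a, b in zip(heights, heights[1:]))
still_positive = heights[-1] > 0

print(symmetric, decreasing, still_positive)  # True True True
```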
Skewed distributions
The data are not distributed symmetrically in
skewed distributions
the mean, median, and mode are not equal and
Which measures to use?