
Chapter 2B QM (PC)

This document discusses various measures of dispersion used to describe the spread or variability in a data set. It defines range, quartile deviation, and percentiles as common measures of dispersion. Range is the difference between the largest and smallest values. Quartile deviation describes the spread using the interquartile range. Percentiles divide a data set into 100 equal parts to describe dispersion. Examples are provided to demonstrate calculating each measure for raw data, ungrouped frequency distributions, and grouped frequency distributions. Measures of dispersion provide additional information beyond measures of central tendency to better understand and compare the variability in different data sets.

Uploaded by

SEOW INN LEE
Copyright
© © All Rights Reserved

Chapter 2B Data Description (B)

Measures of Dispersion
➢ Measures of dispersion help us to understand the spread or
variability of a set of data. They give additional information to judge
the reliability of the measure of central tendency and help in
comparing the dispersion present in various samples.
➢ Two data sets can have the same mean, the same median and the
same mode, and yet be very different in other respects.
➢ Example: consider the heights (cm) of five employees from each
of the sales and production departments as shown:
Sales department: 183 185 193 193 198
Production department: 170 183 193 193 213

The two groups have the same mean heights, 190.4cm, the same median heights,
193cm, and the same modal heights, 193cm.
Nonetheless, it is clear that the two data sets differ. To describe this difference
quantitatively, we use a measure of dispersion.
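
The comparison above can be checked with a minimal Python sketch. It uses only the standard `statistics` module; the helper name `summary` is illustrative and not part of the original notes. It confirms that the two departments agree on mean, median and mode, while the range (largest value minus smallest value) already separates them.

```python
from statistics import mean, median, mode

sales = [183, 185, 193, 193, 198]
production = [170, 183, 193, 193, 213]

def summary(data):
    """Return (mean, median, mode, range) for a list of numbers."""
    return mean(data), median(data), mode(data), max(data) - min(data)

sales_summary = summary(sales)
production_summary = summary(production)
print(sales_summary)
print(production_summary)
```

The first three entries of both summaries agree; only the last entry, the range, differs (15 cm for sales against 43 cm for production).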

➢ There are several commonly used measures of dispersion: range,
quartile deviation, variance, standard deviation and coefficient of
variation.
➢ The more spread out or dispersed the data, the larger is the range,
the quartile deviation, the variance and the standard deviation.
Range (Julat)
➢ Range is the difference between the largest and the smallest
observations in a data set.
Range = largest value – smallest value

Eg 1: Find the range for the following data.
10 15 17 20 25 29 30 35 38 40 45
Solution: Range = 45 – 10 = 35

For grouped data (discrete):
Range = Upper class limit of the last class – Lower class limit of the first class

For grouped data (continuous):
Range = Upper class boundary of the last class – Lower class boundary of the first class

Eg 2A (Discrete): The following table shows the daily outputs of 80 workers in a
factory. Determine the range.
Daily outputs    No. of workers
10 – 19          6
20 – 29          10
30 – 39          30
40 – 49          20
50 – 59          10
60 – 69          4
Solution: Range = 69 – 10 = 59

Eg 2B (Continuous): Find the range of the following frequency distribution
regarding the time spent (in hours) by students in campus per week.
Time spent (hours)    No. of students
0 - < 6               2
6 - < 12              4
12 - < 18             10
18 - < 24             12
24 - < 30             8
Solution: Range = 30 – 0 = 30

Advantage of range: It is easy to understand and simple to calculate.
Disadvantage of range: Since only the largest and the smallest
values are considered, it can be very much influenced by them
especially if they are unrepresentative extreme values. (Remember
the influence of extreme value?)

Quartile Deviation (semi-interquartile range, ½ IQR)
$QD = \frac{Q_3 - Q_1}{2}$
where $Q_1$ = lower quartile or first quartile
      $Q_3$ = upper quartile or third quartile


Interquartile range – the difference between the third and the first
quartiles.
Interquartile range, IQR = Q3 – Q1
1) For raw data
✓ Arrange the data into an array in ascending order of magnitude.
✓ Locate the quartile items as:
$Q_1 = \frac{n+1}{4}$th item
$Q_3 = \frac{3(n+1)}{4}$th item

Eg 3: The following are the scores of 12 students in a mathematics class.
75 80 68 53 99 58 76 73 85 88 91 79
a) Find the values of $Q_1$ and $Q_3$.
b) Find the interquartile range and quartile deviation.

In ascending order:
53 58 68 73 75 76 79 80 85 88 91 99
a) $Q_1 = \left(\frac{12+1}{4}\right)$th item = 3.25th item = 68 + 0.25(73 – 68) = 69.25
   $Q_3 = \left(\frac{3(12+1)}{4}\right)$th item = 9.75th item = 85 + 0.75(88 – 85) = 87.25

b) IQR = $Q_3 - Q_1$ = 87.25 – 69.25 = 18
   $QD = \frac{18}{2} = 9$

Eg 4: The following are the ages of nine employees of an insurance
company.
47 28 39 51 33 37 59 24 33
Find the quartile deviation.

In ascending order:
24 28 33 33 37 39 47 51 59
$Q_1 = \left(\frac{9+1}{4}\right)$th item = 2.5th item = 28 + 0.5(33 – 28) = 30.5
$Q_3 = \left(\frac{3(9+1)}{4}\right)$th item = 7.5th item = 47 + 0.5(51 – 47) = 49
$QD = \frac{49 - 30.5}{2} = 9.25$
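
The raw-data procedure (sort, locate the (n + 1)/4 and 3(n + 1)/4 positions, interpolate between neighbouring items) can be sketched in Python as follows. The function names `nth_item` and `quartile_deviation` are illustrative, and the sketch assumes at least three observations so that both positions fall inside the data.

```python
def nth_item(sorted_data, position):
    """Value at a 1-based, possibly fractional, position using linear interpolation."""
    lower = int(position)          # whole part: the item just below the position
    fraction = position - lower    # fractional part used for interpolation
    value = sorted_data[lower - 1]
    if fraction > 0:
        value += fraction * (sorted_data[lower] - sorted_data[lower - 1])
    return value

def quartile_deviation(data):
    """Return (Q1, Q3, IQR, QD) using the (n + 1)/4 and 3(n + 1)/4 positions."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = nth_item(ordered, (n + 1) / 4)
    q3 = nth_item(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1, (q3 - q1) / 2

scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]   # Eg 3
ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]                 # Eg 4
eg3 = quartile_deviation(scores)
eg4 = quartile_deviation(ages)
print(eg3)
print(eg4)
```

Note that statistical packages may use other quartile conventions, so their results can differ slightly from the (n + 1) rule used in these notes.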

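2) For grouped data

Two methods:
a) By ogive: plot the cumulative frequency (cf) against the value and read off the quartiles.

b) By calculation (linear interpolation formula)
$Q_k = L_{Q_k} + \frac{C_{Q_k}}{f_{Q_k}}\left(\frac{kn}{4} - \sum f_{Q_k - 1}\right)$, k = 1, 3

where $L_{Q_k}$ = lower class boundary of the quartile class
      $C_{Q_k}$ = class size of the quartile class
      $f_{Q_k}$ = frequency of the quartile class
      $\sum f_{Q_k - 1}$ = cumulative frequency of the class before the quartile class
      n = total frequency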

Eg 5: The following frequency distribution shows the daily production level.
Production (units)    No. of days
13 – 17               2
18 – 22               22
23 – 27               10
28 – 32               14
33 – 37               3
38 – 42               4
43 – 47               6
48 – 52               1
Find the quartile deviation using (a) the formula; and (b) an ogive.

Production (units)    f     Class boundaries    cf    Upper boundary
                                                0     12.5
13 – 17               2     12.5 – 17.5         2     17.5
18 – 22               22    17.5 – 22.5         24    22.5    (Q1 class)
23 – 27               10    22.5 – 27.5         34    27.5
28 – 32               14    27.5 – 32.5         48    32.5    (Q3 class)
33 – 37               3     32.5 – 37.5         51    37.5
38 – 42               4     37.5 – 42.5         55    42.5
43 – 47               6     42.5 – 47.5         61    47.5
48 – 52               1     47.5 – 52.5         62    52.5

(a) $Q_1$ = value of the $\frac{62}{4}$th item = value of the 15.5th item
Q1 class: 17.5 – 22.5, so $L_{Q_1} = 17.5$, $C_{Q_1} = 5$, $f_{Q_1} = 22$, $\sum f_{Q_1 - 1} = 2$
$Q_1 = 17.5 + \frac{5}{22}(15.5 - 2) = 20.57$

$Q_3$ = value of the $\frac{3(62)}{4}$th item = value of the 46.5th item
Q3 class: 27.5 – 32.5, so $L_{Q_3} = 27.5$, $C_{Q_3} = 5$, $f_{Q_3} = 14$, $\sum f_{Q_3 - 1} = 34$
$Q_3 = 27.5 + \frac{5}{14}(46.5 - 34) = 31.96$

Quartile deviation = $\frac{31.96 - 20.57}{2} = 5.695$

(b) From the ogive:

Q1 = 20.5
25% of the days have production less than or equal to 20.5 units,
and the other 75% of the days have production more than or equal
to 20.5 units.
Q3 = 32
75% of the days have production less than or equal to 32 units,
and the other 25% of the days have production more than or equal
to 32 units.
Quartile deviation = $\frac{32 - 20.5}{2} = 5.75$
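
The calculation in part (a) of Eg 5 can be reproduced with a short Python sketch of the linear interpolation method. The function name `grouped_quantile` and its parameters are illustrative; the sketch assumes equal class sizes and that the lower class boundaries are listed in ascending order.

```python
def grouped_quantile(lower_boundaries, frequencies, class_size, position):
    """Linear interpolation inside the class that contains the given item position."""
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= position:
            return lower + class_size / f * (position - cumulative)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

lower_boundaries = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
frequencies = [2, 22, 10, 14, 3, 4, 6, 1]
n = sum(frequencies)

q1 = grouped_quantile(lower_boundaries, frequencies, 5, n / 4)
q3 = grouped_quantile(lower_boundaries, frequencies, 5, 3 * n / 4)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 3))
```

Because the sketch keeps the unrounded quartiles, it gives a quartile deviation of about 5.698, whereas the worked solution rounds Q1 and Q3 to two decimal places first and obtains 5.695.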

Advantages of quartile deviation:


It can be computed even though the end values of the distribution are
not known, as with the open-ended classes. Also, it is not influenced
by the extreme values.
Disadvantage of quartile deviation:
It is not fully representative of a set of measurements as it is not
based on all information available.

Range based on Percentiles

Percentiles are the summary measures that divide a ranked data set
into 100 equal parts. There are 99 percentiles in a ranked data set.
Consider n items arranged in ascending order. Then,
The kth percentile, $P_k = \frac{k}{100}(n + 1)$th value
$P_{25} = Q_1$; $P_{50} = Q_2$; $P_{75} = Q_3$

a) Raw data

Example: Find the P10 and P90 for the following raw data.

63, 105, 30, 43, 53, 73, 65, 77,89, 70, 68, 47, 38, 34, 41, 80, 60, 54, 59

Soln: Arrange the data in ascending order

30, 34, 38, 41, 43, 47, 53, 54, 59, 60, 63, 65, 68, 70, 73, 77, 80, 89, 105

$P_{10} = \frac{10}{100}(19 + 1)$th value = 2nd value = 34

$P_{90} = \frac{90}{100}(19 + 1)$th value = 18th value = 89

b) Ungrouped frequency distribution

Example: Find the P10 and P90 for the following distribution.
Marks                   10   20   30   40   50   60
Number of students      3    9    20   8    5    4
Cumulative frequency    3    12   32   40   45   49

Soln:
$P_{10} = \frac{10}{100}(49)$th value = 4.9th value = 20

$P_{90} = \frac{90}{100}(49)$th value = 44.1th value = 50
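
For an ungrouped frequency distribution, the percentile is the first value whose cumulative frequency reaches the required item position. The sketch below illustrates this in Python; the function name is hypothetical, and the position is taken as k/100 × n to match the worked example above (the general definition earlier in this section uses n + 1 instead).

```python
from itertools import accumulate

def percentile_from_frequencies(values, frequencies, k):
    """k-th percentile of an ungrouped frequency distribution via cumulative frequency."""
    n = sum(frequencies)
    position = k / 100 * n  # item position, following the worked example
    for value, cf in zip(values, accumulate(frequencies)):
        if cf >= position:  # first value whose cumulative frequency reaches the position
            return value
    return values[-1]

marks = [10, 20, 30, 40, 50, 60]
students = [3, 9, 20, 8, 5, 4]
p10 = percentile_from_frequencies(marks, students, 10)
p90 = percentile_from_frequencies(marks, students, 90)
print(p10, p90)
```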
c) Grouped frequency distribution
$P_k = L_{P_k} + \frac{C_{P_k}}{f_{P_k}}\left(\frac{nk}{100} - \sum f_{P_k - 1}\right)$

where $L_{P_k}$ = lower class boundary of the percentile class
      $C_{P_k}$ = the size of the percentile class
      $f_{P_k}$ = frequency of the percentile class
      $\sum f_{P_k - 1}$ = the cumulative frequency in the class before
                           the percentile class
      n = total frequency

Example: The following frequency distribution shows the daily
production level in a production line. Compute P10 and
P90 using a) formula and b) ogive.
Production (units)    Number of days    Upper boundary    cf
                                        12.5              0
13 – 17               2                 17.5              2
18 – 22               22                22.5              24    (P10 class)
23 – 27               10                27.5              34
28 – 32               14                32.5              48
33 – 37               3                 37.5              51
38 – 42               4                 42.5              55
43 – 47               6                 47.5              61    (P90 class)
48 – 52               1                 52.5              62

Soln:
n = 62
$P_{10}$ = value of the $\frac{10(62)}{100}$th item = value of the 6.2th item
$P_{90}$ = value of the $\frac{90(62)}{100}$th item = value of the 55.8th item

a) P10 class boundaries: 17.5 – 22.5, so $L_{10} = 17.5$, $C_{10} = 5$, $f_{10} = 22$, $\sum f_{10 - 1} = 2$
$P_{10} = 17.5 + \frac{5}{22}(6.2 - 2) = 18.4545$

P90 class boundaries: 42.5 – 47.5, so $L_{90} = 42.5$, $C_{90} = 5$, $f_{90} = 6$, $\sum f_{90 - 1} = 55$
$P_{90} = 42.5 + \frac{5}{6}(55.8 - 55) = 43.1667$

b) From the ogive, P10 = 18.5 units and P90 = 43.5 units

Standard Deviation and Variance


✓ The standard deviation, s, is a very important and useful measure
of spread. It measures the deviations of the readings from
the mean, $\bar{x}$, and is calculated using all the values in the distribution.

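1) For raw data

$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2$ and $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

where $\sigma^2$ = the population variance
      $s^2$ = the sample variance

Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$

Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$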

Eg 6: Find the variance and standard deviation for a sample of five
numbers 2, 3, 5, 6, 8.
$\sum x^2 = 138$, $\sum x = 24$, n = 5

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{138 - \frac{(24)^2}{5}}{5 - 1} = 5.7$

$s = \sqrt{5.7} = 2.39$

Eg 7: Following are the 1999 earnings (in thousands of dollars) before
taxes for all six employees of a small company.
29.50 16.20 35.45 21.35 49.70 24.60
Calculate the variance and standard deviation for these data.

$\sum x^2 = 5920.465$, $\sum x = 176.8$, N = 6

$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 = \frac{5920.465}{6} - \left(\frac{176.8}{6}\right)^2 = 118.460$

$\sigma = \sqrt{118.460} = 10.88$
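2) For grouped data

$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$ and $s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$

where $\sigma^2$ = the population variance
      $s^2$ = the sample variance
      x = midpoint of a class

Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$

Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}}$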

Eg 8: The following table shows the number of children in all their
families.
No. of children per family, x    1    2    3    4    5
Frequency, f                     3    4    8    2    3

Find the standard deviation.

Solution:
x     f     fx    fx²
1     3     3     3
2     4     8     16
3     8     24    72
4     2     8     32
5     3     15    75
$\sum f = 20$, $\sum fx = 58$, $\sum fx^2 = 198$

St. dev = $\sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{198}{20} - \left(\frac{58}{20}\right)^2} = 1.22$

Eg 9: The following data give the frequency distribution of daily
commuting times (in minutes) from home to work for all 25
employees of a company.
Daily commuting time (minutes)    Number of employees
0 - < 10                          4
10 - < 20                         9
20 - < 30                         6
30 - < 40                         4
40 - < 50                         2
Calculate the variance and standard deviation.
Solution:
x     f     fx     fx²
5     4     20     100
15    9     135    2025
25    6     150    3750
35    4     140    4900
45    2     90     4050
$\sum f = 25$, $\sum fx = 535$, $\sum fx^2 = 14825$

Variance = $\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{14825}{25} - \left(\frac{535}{25}\right)^2 = 135.04$

St. dev = $\sqrt{135.04} = 11.621$

Eg 10: The following data give the frequency distribution of the
number of orders received during the past 50 days at the office
of a mail-order company.

No. of orders    f
10 – 12          4
13 – 15          12
16 – 18          20
19 – 21          14

Calculate the variance and standard deviation.

x     f     fx     fx²
11    4     44     484
14    12    168    2352
17    20    340    5780
20    14    280    5600
$\sum f = 50$, $\sum fx = 832$, $\sum fx^2 = 14216$

Variance = $\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.582$

St. dev = $\sqrt{7.582} = 2.75$
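
The grouped-data calculations in Eg 9 and Eg 10 follow the same pattern: build Σf, Σfx and Σfx² from the class midpoints, then apply the population or the sample formula. A minimal Python sketch, with an illustrative function name and a `sample` flag that is an assumption of this sketch, is:

```python
from math import sqrt

def grouped_variance(midpoints, frequencies, sample=False):
    """Variance from class midpoints x and frequencies f using the shortcut formulas."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))
    if sample:
        return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return sum_fx2 / n - (sum_fx / n) ** 2

# Eg 9: commuting times of all 25 employees (population formula)
var_commute = grouped_variance([5, 15, 25, 35, 45], [4, 9, 6, 4, 2])
# Eg 10: orders over 50 days, treated as a sample as in the worked solution
var_orders = grouped_variance([11, 14, 17, 20], [4, 12, 20, 14], sample=True)

print(round(var_commute, 2), round(sqrt(var_commute), 3))
print(round(var_orders, 3), round(sqrt(var_orders), 2))
```

Because every observation in a class is represented by the class midpoint, the result is an approximation to the variance of the underlying raw data.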

Coefficient of Variation
➢ Useful for relative comparison.

$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$

Eg 11: Over a period of three months, the daily number of components
produced by two comparable machines was measured, giving
the following statistics.
Machine A: mean = 242.8, standard deviation = 20.5
Machine B: mean = 281.3, standard deviation = 23.0
Find the coefficient of variation of machines A and B.
Comment on the results.

CV of Machine A = $\frac{20.5}{242.8} \times 100\% = 8.44\%$

CV of Machine B = $\frac{23.0}{281.3} \times 100\% = 8.18\%$

Comment: Machine B is more stable than machine A, because it has the
lower CV. The higher the CV, the more variability.
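
The comparison in Eg 11 can be sketched in a few lines of Python. The function name is illustrative. Note that machine B has the larger standard deviation in absolute terms, but the smaller variability relative to its mean.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation / mean x 100."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(20.5, 242.8)
cv_b = coefficient_of_variation(23.0, 281.3)
more_stable = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_stable)
```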

Coefficient of Skewness
➢ The term skewness is used to describe the shape of a frequency
distribution.
➢ If the histogram of a frequency distribution is drawn, the
distribution is said to be skewed if the peak of the histogram lies
to either side of the centre of the distribution. The terms positive
and negative skewness are used to describe the direction of the
skewness.
➢ If mean = median = mode, the distribution of the data is said to
be symmetrical; otherwise it is asymmetrical, or skewed.
➢ There are 2 types of asymmetrical frequency distribution:
i) Positively skewed distribution
Mean > Median > Mode
Tail stretches out to the right

ii) Negatively skewed distribution
Mean < Median < Mode
Tail stretches out to the left

➢ Degree of skewness: Pearson's coefficient of skewness
$S_k = \frac{3(\text{Mean} - \text{Median})}{SD}$

For population data: $S_k = \frac{3(\mu - \tilde{\mu})}{\sigma}$
For sample data: $S_k = \frac{3(\bar{x} - \tilde{x})}{s}$

Range of Sk is [-3, 3]
For symmetrical distribution, Sk = 0

Example: If $\bar{x}$ = 30.9, $\tilde{x}$ = 28.8 and s = 13.23, find
$S_k$ and interpret the value.

Soln:
$S_k = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(30.9 - 28.8)}{13.23} = 0.476$

Since $S_k$ = 0.476 > 0, the distribution is slightly skewed to the right
(positively skewed).
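
Pearson's coefficient and its interpretation by sign can be sketched in Python as follows. Both function names are illustrative, and the wording of the labels is an assumption of this sketch.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

def direction(sk):
    """Label the direction of skewness from the sign of Sk."""
    if sk > 0:
        return "positively skewed (tail to the right)"
    if sk < 0:
        return "negatively skewed (tail to the left)"
    return "symmetrical"

sk = pearson_skewness(30.9, 28.8, 13.23)
print(round(sk, 3), direction(sk))
```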

Computer Application – Using Excel


Example
Table shows a sample of 50 final exam scores taken from last
semester’s elementary statistics class.

To get some basic statistics from the data, follow the following
procedure:
