
Chapter 1.3 Data description (B)

The document discusses measures of dispersion in statistical analysis, including range, quartile deviation, standard deviation, and variance, which help assess the variability of data sets. It provides examples and formulas for calculating these measures, emphasizing their importance in understanding data distribution and reliability. Additionally, the document outlines advantages and disadvantages of each measure, highlighting the significance of quartiles and standard deviation in data analysis.

Uploaded by

Zi Chen Aah
Copyright
© All Rights Reserved

BAMS2923 STATISTICAL ANALYSIS FOR BUSINESS

CHAPTER 1: DATA DESCRIPTION (B)

MEASURES OF DISPERSION

• Measures of dispersion help us to understand the spread or
variability of a set of data. They give additional information for judging
the reliability of the measure of central tendency and help in
comparing the dispersion present in various samples.
• Two data sets can have the same mean, the same median, or
the same mode and yet they are very different in other respects.
• Example: consider the heights (cm) of five employees from each
of the sales and production departments as shown:
Sales department: 183 185 193 193 198
Production department: 170 183 193 193 213
The two groups have the same mean height, 190.4 cm, the
same median heights, 193 cm, and the same modal height, 193
cm. Nonetheless, it is clear that the two data sets differ. To
describe this difference quantitatively, we use a measure of
dispersion.
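As a quick check of the example above, the following sketch uses Python's standard `statistics` module (a choice made for this illustration, not part of the course notes) to confirm that the two departments share the same mean, median and mode while their spreads differ. The helper name `describe` is hypothetical.

```python
import statistics

# Heights (cm) of five employees from each department, as in the example.
sales = [183, 185, 193, 193, 198]
production = [170, 183, 193, 193, 213]


def describe(heights):
    """Return the three averages plus the range as a simple spread measure."""
    return {
        "mean": statistics.mean(heights),
        "median": statistics.median(heights),
        "mode": statistics.mode(heights),
        "range": max(heights) - min(heights),
    }


sales_summary = describe(sales)
production_summary = describe(production)
print("Sales:     ", sales_summary)
print("Production:", production_summary)
```

Both summaries report a mean of 190.4 cm, a median of 193 cm and a mode of 193 cm, but the range is 15 cm for sales and 43 cm for production.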

• There are several commonly used measures of dispersion:
range, quartile deviation, variance, standard deviation
and coefficient of variation.

• The more spread out or dispersed the data, the larger is the
range, the quartile deviation, the variance and the standard
deviation.

Range
• Range is the difference between the largest and the smallest
observations in a data set.
Range = largest value – smallest value

Example: Find the range for the following data.


10 15 17 20 25 29 30 35 38 40 45

Solution: Range = 45 – 10 = 35

• For grouped data,
Range = upper class boundary of the last class – lower class boundary of the first class

Example
The following table shows the daily outputs of 80 workers in a
factory. Determine the range.
Daily outputs Number of workers
10 – 19 6
20 – 29 10
30 – 39 30
40 – 49 20
50 – 59 10
60 – 69 4
Solution: Range = 69.5 – 9.5 = 60 units
• Advantage:
It is easy to understand and simple to calculate.
• Disadvantage:
Since only the largest and the smallest values are considered, it
can be very much influenced by them especially if they are
unrepresentative extreme values.
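The two range calculations above can be sketched in Python as follows. This is a minimal illustration: the function names are hypothetical, and the grouped version assumes consecutive classes with a constant gap between limits (such as 10 – 19, 20 – 29), so that each class boundary lies half a gap beyond the class limit.

```python
def raw_range(values):
    """Range of ungrouped data: largest value minus smallest value."""
    return max(values) - min(values)


def grouped_range(class_limits):
    """Range of grouped data from a list of (lower_limit, upper_limit) pairs.

    Assumption: the gap between one class's upper limit and the next class's
    lower limit is constant, so boundaries sit half a gap outside the limits.
    """
    gap = class_limits[1][0] - class_limits[0][1]      # e.g. 20 - 19 = 1
    lower_boundary = class_limits[0][0] - gap / 2       # e.g. 10 - 0.5 = 9.5
    upper_boundary = class_limits[-1][1] + gap / 2      # e.g. 69 + 0.5 = 69.5
    return upper_boundary - lower_boundary


data = [10, 15, 17, 20, 25, 29, 30, 35, 38, 40, 45]
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

print(raw_range(data))         # 35
print(grouped_range(classes))  # 60.0
```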

Quartile deviation (Semi-inter quartile range)
• The disadvantage of the range can be overcome by ignoring the
extreme values. This is done by ignoring the top and the bottom
quarters and considering only the range between the quartiles
(called the inter-quartile range).
Inter-quartile range = Q3 − Q1
• If the inter-quartile range is divided by two, the figure obtained is
called the semi-inter-quartile range or quartile deviation, which gives
the average amount by which Q1 and Q3 differ from the median.
Quartile deviation = semi-inter-quartile range = $\frac{Q_3 - Q_1}{2}$
• Definition of quartiles:
(a) The first quartile, also called the lower quartile, is denoted
by Q1. It is defined as the value of the item one-quarter of the
way through a distribution.
(b) The third quartile, also called the upper quartile, is
denoted by Q3. It is defined as the value of the item
three-quarters of the way through a distribution.
Quartiles divide the data into 4 equal parts. Thus, with the
quartiles known, we can say that a quarter of the observations
lie below the first quartile, a quarter lie above the third quartile
and half of the observations lie between the two quartiles.

• Computation of the quartiles:
(a) Raw data
1. Arrange the data into an array in ascending order of
magnitude.
2. Locate the quartile items as:
$Q_1$ = value of the $\frac{n+1}{4}$th item
$Q_3$ = value of the $\frac{3(n+1)}{4}$th item
where n = no. of items in a data set.

Example
The following array shows the daily wages (in RM) of ten factory
workers: 20, 25, 26, 30, 32, 36, 38, 38, 40, 45
Calculate (1) the quartiles;
(2) the inter-quartile range;
(3) the quartile deviation.

Solution
n = 10
(1) $Q_1$ = value of the $\frac{n+1}{4}$th item = value of the $\frac{10+1}{4}$th item
= value of the 2.75th item = 2nd item + 0.75 (3rd item – 2nd item)
= 25 + 0.75(26 – 25) = 25.75 (RM)
$Q_3$ = value of the $\frac{3(n+1)}{4}$th item = value of the $\frac{3(10+1)}{4}$th item
= value of the 8.25th item = 8th item + 0.25 (9th item – 8th item)
= 38 + 0.25(40 – 38) = 38.5 (RM)
(2) The inter-quartile range = Q3 – Q1
= 38.5 – 25.75 = 12.75 (RM)
(3) The quartile deviation = $\frac{12.75}{2}$ = 6.375 (RM)
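The raw-data procedure above (sort, locate the (n + 1)/4-th and 3(n + 1)/4-th items, interpolate between neighbouring items) can be sketched in Python. The function names are hypothetical and chosen only for this illustration.

```python
def value_at_position(sorted_values, position):
    """Value of the `position`-th item (1-based), interpolating linearly
    between neighbouring items when the position is fractional."""
    n = len(sorted_values)
    position = min(max(position, 1), n)   # keep the position inside the array
    whole = int(position)                 # item just below the position
    fraction = position - whole
    if fraction == 0:
        return sorted_values[whole - 1]
    lower = sorted_values[whole - 1]
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Q1 and Q3 located at the (n+1)/4-th and 3(n+1)/4-th items."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3


wages = [20, 25, 26, 30, 32, 36, 38, 38, 40, 45]   # daily wages in RM
q1, q3 = quartiles(wages)
iqr = q3 - q1
quartile_deviation = iqr / 2
print(q1, q3, iqr, quartile_deviation)   # 25.75 38.5 12.75 6.375
```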

• Advantages:
It can be computed even though the end values of the
distribution are not known, as with the open-ended classes.
It is not influenced by the extreme values.

• Disadvantage:
It is not fully representative of a set of measurements as it is
not based on all the information available.
(b) Grouped data

1. Calculate the cumulative frequencies to position the
items in ascending order.
2. Locate the quartile classes as:
$Q_1$ = value of the $\frac{n}{4}$th item
$Q_3$ = value of the $\frac{3n}{4}$th item

3. Find the quartiles using (1) an ogive;
(2) the linear interpolation formula:
$Q_1 = L_{Q_1} + \frac{c_{Q_1}}{f_{Q_1}} \left( \frac{n}{4} - \sum f_{Q_1 - 1} \right)$
where $L_{Q_1}$ = lower class boundary of the Q1 class
$c_{Q_1}$ = class size of the Q1 class
$f_{Q_1}$ = frequency of the Q1 class
$\sum f_{Q_1 - 1}$ = cum. freq. of the class preceding the Q1 class

$Q_3 = L_{Q_3} + \frac{c_{Q_3}}{f_{Q_3}} \left( \frac{3n}{4} - \sum f_{Q_3 - 1} \right)$
where $L_{Q_3}$ = lower class boundary of the Q3 class
$c_{Q_3}$ = class size of the Q3 class
$f_{Q_3}$ = frequency of the Q3 class
$\sum f_{Q_3 - 1}$ = cum. freq. of the class preceding the Q3 class

Example
The following frequency distribution shows the daily production
level.
Production (units) No. of days
13 – 17 2
18 – 22 22
23 – 27 10
28 – 32 14
33 – 37 3
38 – 42 4
43 – 47 6
48 – 52 1
Calculate the quartile deviation using
(a) the linear interpolation method;
(b) an ogive.

Solution
Production (units)   Class boundaries   f    Cumulative f
13 – 17              12.5 – 17.5         2    2
18 – 22              17.5 – 22.5        22   24
23 – 27              22.5 – 27.5        10   34
28 – 32              27.5 – 32.5        14   48
33 – 37              32.5 – 37.5         3   51
38 – 42              37.5 – 42.5         4   55
43 – 47              42.5 – 47.5         6   61
48 – 52              47.5 – 52.5         1   62
Total                                   62
n = 62
$Q_1$ = value of the $\frac{62}{4}$th item = value of the 15.5th item
$Q_3$ = value of the $\frac{3 \times 62}{4}$th item = value of the 46.5th item

(a) Q1 class boundaries: 17.5 – 22.5
$Q_1 = 17.5 + \frac{5}{22}(15.5 - 2) = 20.57$ units

Q3 class boundaries: 27.5 – 32.5
$Q_3 = 27.5 + \frac{5}{14}(46.5 - 34) = 31.96$ units

Quartile deviation = $\frac{Q_3 - Q_1}{2} = \frac{31.96 - 20.57}{2} = 5.695$ units
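The linear interpolation method for grouped data can be sketched as a small Python function. The name `grouped_quantile` and its parameters are hypothetical; the logic follows the formula Q = L + (c / f)(target − cumulative frequency before the quartile class), with target = n/4 for Q1 and 3n/4 for Q3.

```python
def grouped_quantile(boundaries, frequencies, fraction):
    """Estimate a quantile of grouped data by linear interpolation.

    boundaries  -- list of (lower, upper) class boundaries, ascending
    frequencies -- class frequencies in the same order
    fraction    -- 0.25 for Q1, 0.75 for Q3
    """
    n = sum(frequencies)
    target = fraction * n                 # the n/4-th or 3n/4-th item
    cumulative_before = 0                 # cum. freq. of preceding classes
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative_before + f >= target:
            class_size = upper - lower
            return lower + class_size / f * (target - cumulative_before)
        cumulative_before += f
    raise ValueError("fraction must lie between 0 and 1")


# Daily production example: class boundaries and number of days.
boundaries = [(12.5, 17.5), (17.5, 22.5), (22.5, 27.5), (27.5, 32.5),
              (32.5, 37.5), (37.5, 42.5), (42.5, 47.5), (47.5, 52.5)]
days = [2, 22, 10, 14, 3, 4, 6, 1]

q1 = grouped_quantile(boundaries, days, 0.25)
q3 = grouped_quantile(boundaries, days, 0.75)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
```

Because the code keeps the unrounded quartiles, the quartile deviation comes out at about 5.70 units; the worked solution gets 5.695 by rounding Q1 and Q3 to two decimal places first.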

(b)
[Figure: '<' ogive: Production at a factory for a period of 62 days. Vertical axis: No. of days (cum. freq.), 0 to 70; horizontal axis: Production (units), class boundaries 12.5 to 52.5.]

From the ‘<’ ogive, Q1 = 20.6 units, which shows that on 25% of the
days production was less than or equal to 20.6 units and on the
other 75% of the days production was more than or equal to
20.6 units.
From the ‘<’ ogive, Q3 = 32.0 units, which shows that on 75% of the
days production was less than or equal to 32.0 units and on the
other 25% of the days production was more than or equal to
32.0 units.

∴ Quartile deviation = $\frac{Q_3 - Q_1}{2} = \frac{32.0 - 20.6}{2} = 5.7$ units

Standard deviation
• Standard deviation is the root-mean-square deviation between
the individual values and the mean in a distribution.
• Consider a set of data: $x_1, x_2, \ldots, x_n$
Let the mean of the data be: $\bar{x}$
The deviations of each value from the mean: $x_1 - \bar{x},\ x_2 - \bar{x},\ \ldots,\ x_n - \bar{x}$
square deviations: $(x_1 - \bar{x})^2,\ (x_2 - \bar{x})^2,\ \ldots,\ (x_n - \bar{x})^2$
mean-square deviation: $\frac{\sum (x - \bar{x})^2}{n}$
root-mean-square deviation: $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

• Computation of the standard deviation:
(a) Raw data
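Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Alternatively:

Population standard deviation: $\sigma = \sqrt{\frac{\sum x^2}{N} - \left( \frac{\sum x}{N} \right)^2}$
Sample standard deviation: $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$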

• The standard deviation computed from population data is
denoted by the symbol σ (pronounced as sigma); the standard
deviation computed from sample data is denoted by s.

Variance

• Variance is the mean-square deviation between the individual
values and the mean in a distribution.

• Variance is also the square of the standard deviation
in a distribution.

• In general, it is difficult to interpret the value of the
variance because its units are squared. Hence, the
standard deviation is more frequently used.

Example
Find the standard deviation and variance for the following data:
2, 12, 7, 5, 9
Solution
N = 5
∑x = 2 + 12 + 7 + 5 + 9 = 35
∑x² = 2² + 12² + 7² + 5² + 9² = 303
Population standard deviation,
$\sigma = \sqrt{\frac{\sum x^2}{N} - \left( \frac{\sum x}{N} \right)^2} = \sqrt{\frac{303}{5} - \left( \frac{35}{5} \right)^2} = \sqrt{11.6} = 3.41$
Population variance, $\sigma^2 = 3.41^2 \approx 11.6$

Example
During a particular summer month, the number of central air-
conditioning units sold by a random sample of 5 salespersons from
a heating and air-conditioning firm were as follows:
8, 11, 5, 12, 8
Find the sample standard deviation and the sample variance.
Solution
n = 5
∑x = 8 + 11 + 5 + 12 + 8 = 44
∑x² = 8² + 11² + 5² + 12² + 8² = 418

Sample standard deviation,
$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{418 - \frac{44^2}{5}}{5 - 1}} = \sqrt{7.7} = 2.77$ units.

Sample variance, $s^2 = 2.77^2 \approx 7.7$ units²
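The two raw-data examples above can be reproduced with a short Python sketch of the computational formulas. The function names are hypothetical; the cross-check against `statistics.pstdev` and `statistics.stdev` from the standard library is an addition of this illustration, not part of the notes.

```python
import math
import statistics


def population_sd(values):
    """sigma = sqrt( sum(x^2)/N - (sum(x)/N)^2 )"""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


def sample_sd(values):
    """s = sqrt( (sum(x^2) - (sum(x))^2 / n) / (n - 1) )"""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


population_data = [2, 12, 7, 5, 9]
units_sold = [8, 11, 5, 12, 8]        # sample of 5 salespersons

sigma = population_sd(population_data)
s = sample_sd(units_sold)
print(round(sigma, 2), round(sigma ** 2, 1))   # 3.41 11.6
print(round(s, 2), round(s ** 2, 1))           # 2.77 7.7

# Cross-check against the standard library implementations.
assert math.isclose(sigma, statistics.pstdev(population_data))
assert math.isclose(s, statistics.stdev(units_sold))
```

The only difference between the two functions is the divisor: N for a population, n − 1 for a sample.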

(b) Grouped data

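Population standard deviation: $\sigma = \sqrt{\frac{\sum f (x - \mu)^2}{\sum f}}$
Sample standard deviation: $s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}}$, where $n = \sum f$

Alternatively:

Population standard deviation: $\sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \left( \frac{\sum f x}{\sum f} \right)^2}$
Sample standard deviation: $s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}}$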

Example
Find the mean and standard deviation of the following frequency
distribution.
Class interval Frequency
0–6 2
6 – 12 4
12 – 18 10
18 – 24 12
24 – 30 8
30 – 36 4
Solution
Class interval Class mark, x f fx fx²
0–6 3 2 6 18
6 – 12 9 4 36 324
12 – 18 15 10 150 2250
18 – 24 21 12 252 5292
24 – 30 27 8 216 5832
30 – 36 33 4 132 4356
Total 40 792 18072

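Population mean, $\mu = \frac{\sum fx}{\sum f} = \frac{792}{40} = 19.8$

Population standard deviation,
$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left( \frac{\sum fx}{\sum f} \right)^2} = \sqrt{\frac{18072}{40} - \left( \frac{792}{40} \right)^2} = \sqrt{59.76} = 7.73$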

Example
The output distribution for a sample of 100 workers in BB Company
is shown below:
Output (units) Number of workers
21 – 25 10
26 – 30 35
31 – 35 16
36 – 40 14
41 – 45 12
46 – 50 10
51 – 55 3
Calculate the mean and the standard deviation.

Solution
Output (units) Class mark, x f fx fx²
21 – 25 23 10 230 5290
26 – 30 28 35 980 27440
31 – 35 33 16 528 17424
36 – 40 38 14 532 20216
41 – 45 43 12 516 22188
46 – 50 48 10 480 23040
51 – 55 53 3 159 8427
Total 100 3425 124025

Sample mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{3425}{100} = 34.25$ units

Sample standard deviation,
$s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}} = \sqrt{\frac{124025 - \frac{3425^2}{100}}{100 - 1}} = \sqrt{67.8662} = 8.24$ units
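Both grouped-data examples follow the same pattern: build ∑f, ∑fx and ∑fx² from the class marks, then apply the population or sample formula. A minimal Python sketch follows; the function name and the `sample` flag are hypothetical.

```python
import math


def grouped_mean_sd(class_marks, frequencies, sample=False):
    """Mean and standard deviation of a frequency distribution.

    class_marks -- mid-point x of each class
    frequencies -- frequency f of each class
    sample      -- False: divide by sum(f); True: divide by n - 1
    """
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(class_marks, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, frequencies))
    mean = sum_fx / n
    if sample:
        variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    else:
        variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(variance)


# First example: treated as a population (classes 0-6, 6-12, ..., 30-36).
mean_pop, sd_pop = grouped_mean_sd([3, 9, 15, 21, 27, 33],
                                   [2, 4, 10, 12, 8, 4])

# Second example: sample of 100 workers (classes 21-25, ..., 51-55).
mean_smp, sd_smp = grouped_mean_sd([23, 28, 33, 38, 43, 48, 53],
                                   [10, 35, 16, 14, 12, 10, 3],
                                   sample=True)

print(round(mean_pop, 2), round(sd_pop, 2))   # 19.8 7.73
print(round(mean_smp, 2), round(sd_smp, 2))   # 34.25 8.24
```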

COEFFICIENT OF VARIATION (CV)


• So far, we have been measuring degree of dispersion within a
series or a distribution and the measures used are called
absolute measures of dispersion, e.g. standard deviation.
• If we wish to compare the variability between two or more
different distributions, the absolute measures of dispersion
cannot achieve this; we have to make use of the relative
measures of dispersion instead.
• The relative measure of dispersion is a measure of dispersion
expressed as a percentage of a measure of location. The
commonly used relative measure of dispersion is
Coefficient of variation = $\frac{\text{standard deviation}}{\text{mean}} \times 100$
Example
Distribution 1 Distribution 2
Standard deviation 27 km RM 4.6
Mean 454.5 km RM 10
Coefficient of variation 5.94% 46%

Thus, the values in distribution 2 are more variable than the
values in distribution 1.

Example
Typist A can type 40 words per minute with standard deviation of 5
while typist B can type 160 words per minute with standard
deviation of 10. Which typist is more consistent in her work?

Solution
The standard deviation of typist B is twice that of typist A, but B types
at four times the speed of A. To take both facts into account, the
coefficient of variation is used.

CV = $\frac{\text{standard deviation}}{\text{mean}} \times 100$

CV for A = $\frac{5}{40} \times 100$ = 12.5%

CV for B = $\frac{10}{160} \times 100$ = 6.25%

The results show that the typing of typist B is more consistent
than that of typist A.

• When making comparisons, the rule of thumb is that the larger
the percentage, the greater is the relative variation.

• A larger relative variation implies less consistency, while a smaller
relative variation implies more consistency.
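The typist comparison can be written as a few lines of Python. This is only a sketch; the function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Relative dispersion as a percentage of the mean."""
    return standard_deviation / mean * 100


cv_a = coefficient_of_variation(5, 40)     # typist A: 40 wpm, sd 5
cv_b = coefficient_of_variation(10, 160)   # typist B: 160 wpm, sd 10

# The smaller CV marks the more consistent typist.
more_consistent = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, more_consistent)         # 12.5 6.25 B
```

Although B has the larger standard deviation in absolute terms, B's variation is smaller relative to the mean, which is why the CV rather than the standard deviation is compared.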

Mean deviation
• The mean deviation measures the average absolute difference between
individual values and the arithmetic mean in a distribution.
• In contrast to the range and the quartile deviation, the mean
deviation takes into account all the data and yet it is not greatly
affected by extreme values in the distribution due to the
averaging technique used.
Ungrouped data

Mean deviation = $\frac{\sum |x - \bar{x}|}{n}$
where | | denotes the absolute value (the sign of the difference is ignored)

Example
The following data shows the daily wages of 5 factory workers(in
RM):
20 40 58 60 32
Calculate the mean deviation.

Solution
Let x = daily wages of the workers(RM)
Mean = (20 + 40 + 58 + 60 + 32) / 5 = 42

x |x - mean|
20 |20-42|= |-22|=22
40 2
58 16
60 18
32 10
Total 68

So, mean deviation = $\frac{\sum |x - \bar{x}|}{n} = \frac{68}{5}$ = RM 13.6
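The daily-wage calculation can be checked with a minimal Python sketch of the ungrouped mean deviation, i.e. the average of the absolute differences from the mean. The function name is hypothetical.

```python
def mean_deviation(values):
    """Average absolute difference between each value and the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


wages = [20, 40, 58, 60, 32]          # daily wages in RM
md = mean_deviation(wages)
print(md)                              # 13.6
```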

Grouped data

Mean deviation = $\frac{\sum f|x - \bar{x}|}{\sum f}$
(note: in the case of a frequency distribution with multi-value grouping,
x = the class mark)

Advantage
~It measures the degree of dispersion in terms of all the values in
the distribution.

Disadvantage
~Since the signs of the differences are ignored in the calculation,
it is not suitable for further statistical analysis.

Example
No. of children No. of families, f
0 8
1 11
2 20
3 5
4 3
5 2
6 1
Total 50
Calculate the mean deviation.

Solution

Mean = $\frac{\sum fx}{\sum f} = \frac{94}{50}$ = 1.88 children
No. of children, x   No. of families, f   fx   |x − mean|   f|x − mean|
0                     8                     0   1.88         15.04
1                    11                    11   0.88          9.68
2                    20                    40   0.12          2.40
3                     5                    15   1.12          5.60
4                     3                    12   2.12          6.36
5                     2                    10   3.12          6.24
6                     1                     6   4.12          4.12
Total                50                    94   13.36        49.44

So, mean deviation = $\frac{\sum f|x - \bar{x}|}{\sum f} = \frac{49.44}{50} \approx 0.99$ children

Example
Calculate the mean deviation.
Length (cm)   f     Class mark, x   fx     |x − mean|   f|x − mean|
10-20          3    15                45   39.2         117.6
20-30          7    25               175   29.2         204.4
30-40         10    35               350   19.2         192
40-50         16    45               720    9.2         147.2
50-60         34    55              1870    0.8          27.2
60-70         13    65               845   10.8         140.4
70-80          7    75               525   20.8         145.6
80-90          6    85               510   30.8         184.8
90-100         4    95               380   40.8         163.2
Total        100                    5420               1322.4

Mean = $\frac{\sum fx}{\sum f} = \frac{5420}{100}$ = 54.2 cm

So, mean deviation = $\frac{\sum f|x - \bar{x}|}{\sum f} = \frac{1322.4}{100}$ = 13.224 cm
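For frequency distributions, each absolute deviation is weighted by its frequency. The sketch below applies this to the two grouped examples (number of children per family, and lengths in cm using class marks); the function name is hypothetical.

```python
def grouped_mean_deviation(class_marks, frequencies):
    """Return (mean, mean deviation) of a frequency distribution.

    Mean deviation = sum(f * |x - mean|) / sum(f), with x the value
    or, for multi-value classes, the class mark.
    """
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(class_marks, frequencies)) / n
    total_abs_dev = sum(f * abs(x - mean)
                        for x, f in zip(class_marks, frequencies))
    return mean, total_abs_dev / n


# Number of children per family.
mean_children, md_children = grouped_mean_deviation(
    [0, 1, 2, 3, 4, 5, 6], [8, 11, 20, 5, 3, 2, 1])

# Lengths (cm), using the class marks 15, 25, ..., 95.
mean_length, md_length = grouped_mean_deviation(
    [15, 25, 35, 45, 55, 65, 75, 85, 95],
    [3, 7, 10, 16, 34, 13, 7, 6, 4])

print(round(mean_children, 2), round(md_children, 4))   # 1.88 0.9888
print(round(mean_length, 1), round(md_length, 3))       # 54.2 13.224
```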

BOX-AND-WHISKER PLOT (OR BOX-PLOT)
~ A box-and-whisker plot provides a graphical representation of
the data based on the five-number summary, namely Xsmallest, Q1,
median, Q3 and Xlargest.
~ The general form of a box-and-whisker plot is shown in the
diagram below:

[Figure: box-and-whisker plot with Xsmallest, Q1, Median, Q3 and Xlargest marked from left to right]

The box-and-whisker plot in the above diagram indicates that the
data set being depicted is right-skewed: the right side whisker is
much longer than the left side whisker. However, the vertical
median line is unexpectedly closer to the right side of the box,
indicating that the middle 50% of the values are actually slightly
skewed to the left. In general, however, the data set appears to be
right-skewed.

Example
The following is a set of data from a sample of size n = 7:
12 7 4 9 0 7 3
(a)List the five number summary
(b)Form the box-and-whisker plot and describe the shape.
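Solution
(a) Rearranging the data in ascending order, we have 0 3 4 7 7 9 12
Xsmallest = 0, Xlargest = 12

Q1 location = $\frac{n+1}{4} = \frac{7+1}{4}$ = 2nd item, so Q1 = 3

Q3 location = $\frac{3(n+1)}{4} = \frac{3(7+1)}{4}$ = 6th item, so Q3 = 9
Median, Q2 = 7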
(b) Box plot diagram

[Figure: box-and-whisker plot with Xsmallest = 0, Q1 = 3, Q2 = 7, Q3 = 9 and Xlargest = 12]
From the box plot, we can say that the data set is fairly symmetrically
distributed, as the left and right whiskers are the same length. However,
the middle 50% of the values are left-skewed, as the vertical median line is
closer to the right side of the box.
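The five-number summary for this example can be checked with Python's standard library. The sketch assumes `statistics.quantiles` with `method="exclusive"` (Python 3.8+), which places the cut points at the i(n + 1)/4-th items and so agrees with the location rule used in these notes; the function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    ordered = sorted(values)
    # method="exclusive" locates the cut points at i * (n + 1) / 4.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, median, q3, ordered[-1]


data = [12, 7, 4, 9, 0, 7, 3]
smallest, q1, median, q3, largest = five_number_summary(data)

left_whisker = q1 - smallest      # length of the left whisker
right_whisker = largest - q3      # length of the right whisker
print(smallest, q1, median, q3, largest)
print(left_whisker, right_whisker)
```

The whiskers have equal length (3 units each), while the median sits nearer Q3 than Q1, which matches the description of the plot.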

COEFFICIENT OF SKEWNESS
~The term skewness is used to describe the shape of a frequency
distribution.
~If the histogram of a frequency distribution is drawn, the
distribution is said to be skewed if the peak of the histogram lies
to either side of the centre of the distribution. The terms positive
and negative skewness are used to describe the direction of the
skewness.

Positive skewness
The distribution is said to have a positive skewness if the peak of
the histogram lies to the left of the centre of the distribution.
Negative skewness
The distribution is said to have a negative skewness if the peak of
the histogram lies to the right of the centre of the distribution.

~ If the peak of the histogram lies at the centre of the distribution
with two slopes virtually identical, the distribution is said to be
symmetrical, or not skewed.

The coefficient of skewness is used to measure the degree of
skewness.
(a) Pearson measure of skewness (denoted by Sk(1) and Sk(2))
Pearson first coefficient of skewness,
Sk(1) = $\frac{\text{mean} - \text{mode}}{\text{standard deviation}}$
Pearson second coefficient of skewness,
Sk(2) = $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
Pearson measures can take values from -3 to +3. For a
symmetrical distribution, it is zero; for a positively or negatively
skewed curve, it takes the appropriate sign. For a moderately
positively/negatively skewed distribution, it takes the appropriate
sign and the absolute value of the measure is less than 1.

(b) Quartile measure of skewness (denoted by SkQ)

SkQ = $\frac{Q_3 + Q_1 - 2(\text{median})}{Q_3 - Q_1}$
The quartile measure of skewness takes values between -1 and +1. It
is convenient to use when the median and the quartiles are used to
describe the distribution.

Example
The lengths of stay of patients on the cancer floor of a local
hospital were organized into a frequency distribution. The mean
and median lengths of stay were 28 and 25 days respectively, and
the standard deviation was found to be 4.2 days. Calculate the
coefficient of skewness.

Solution
Since the values of the mean, median and standard deviation are given,
use

Sk(2) = $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(28 - 25)}{4.2}$ = 2.14

• Relationship between the mean, median and mode in a
frequency distribution:
~In a positively skewed distribution, mean > median > mode
~In a negatively skewed distribution, mean < median < mode.
~In a symmetrical distribution, mean = median = mode.
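The skewness measures in this section can be sketched in Python, assuming the standard textbook definitions: Pearson's first coefficient (mean − mode) / sd, Pearson's second coefficient 3(mean − median) / sd, and the quartile coefficient (Q3 + Q1 − 2·median) / (Q3 − Q1). The function names are hypothetical.

```python
def pearson_first(mean, mode, sd):
    """Sk(1) = (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Sk(2) = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def quartile_skewness(q1, median, q3):
    """SkQ = (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hospital example: mean 28 days, median 25 days, sd 4.2 days.
sk2 = pearson_second(28, 25, 4.2)
print(round(sk2, 2))     # 2.14, positive: mean > median

# Box-plot example: Q1 = 3, median = 7, Q3 = 9.
skq = quartile_skewness(3, 7, 9)
print(round(skq, 2))     # -0.33, negative: middle 50% is left-skewed
```

A positive result corresponds to mean > median (positive skewness) and a negative result to the median lying nearer Q3 (negative skewness), in line with the relationships listed above.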

BAMS2923 STATISTICAL ANALYSIS FOR BUSINESS


TUTORIAL 3 (Measures of dispersion)
1. A manager observes the amount of time taken by his secretary to prepare
a sample of 10 business letters in the office and the results are arranged
in ascending order to the nearest minute:
5, 5, 5, 7, 9, 14, 15, 15, 16, 18.
Determine
(a) the quartiles,
(b) the quartile deviation,
(c) the variance.

2. The following array shows the amounts spent (in RM) by a random sample
of 15 students at a primary school canteen:

0.50, 0.50, 0.75, 0.75, 0.75, 0.85, 0.90, 1.50, 1.90, 1.90, 2.35,
2.45, 2.71, 3.00, 3.10.
Determine
(a) the quartiles,
(b) the inter-quartile range,
(c) the standard deviation.

3. The following table shows the number of successful sales made by the
salesmen employed by a large micro-computer firm in a particular quarter.
No. of sales No. of salesmen
0–4 1
5–9 14
10 – 14 23
15 – 19 21
20 – 24 15
25 – 29 6
Calculate the quartile deviation and standard deviation of the number of
sales.

4. A company owns two garages, A and B. In garage A, a representative
sample of 200 consumers’ purchases was taken. The results were as
follows:

Petrol purchased (gallons) No. of consumers


0 and < 2 15
2 and < 4 40
4 and < 6 65
6 and < 8 40
8 and < 10 30
10 and < 12 10

(a) Calculate the mean and standard deviation of the number of
gallons purchased.
(b) A similar sample of garage B users showed a mean of 4 gallons
with a standard deviation of 2.2 gallons. In which garage were
the purchases of petrol relatively more variable?

5. Projected population in a large city (thousands)


Age group Males Females
0 – < 15 75 72
15 – < 30 59 58
30 – < 45 46 47
45 – < 60 43 48
60 – < 75 31 41
75 and over 6 14
Total 260 280
(a)Calculate the arithmetic mean and the standard deviation
for each distribution.
(b) Obtain a relative measure of dispersion for each distribution and
comment on the results.

6. (a) In the production department of a firm, the average weekly
earnings are RM366 with a standard deviation of RM28.2.
Calculate the relative dispersion.

(b) In the administrative department, the earnings (in RM) in a certain
week of the 10 members are as follows:

237 245 283 296 253 249 236 254 305 242

Calculate the mean and the standard deviation of this department.
Calculate the relative dispersion.

(c) Compare the variability of earnings of employees between the two
departments.

7. A survey of house prices yielded the following information:


Price of houses (RM’000) No. of houses for sale
Below 50 16
50 and under 60 41
60 and under 70 39
70 and under 80 22
80 and under 90 10
90 and under 100 11
100 and under 110 4
110 and under 120 5
120 and under 130 1
130 and under 140 5
140 and under 150 4
150 and under 160 2
Total 160

(a) Construct a “less than” cumulative frequency distribution and
draw an ogive.
(b) From the ogive, estimate the median, the lower quartile, the
upper quartile and the quartile deviation.
(c) The government is considering a stamp duty rebate for the 10%
most expensive houses. Estimate the cut-off point for the prices.

8. A survey of 15 selected university students gave the following
numbers of hours of computer gaming per week.

19 17 12 30 38 27 14 16 17 0 20 45 5 25 11
Construct a box-and-whisker plot and comment on the skewness of the
distribution.

Answers:
1. (a) 5 min, 15.25 min (b) 5.125 min (c) 26.989 min²

2. (a) RM 0.75, RM 2.45 (b) RM 1.70 (c) RM 0.947

3. (a) 4.62 sales, 6.07 sales

4. (a) 5.6 gallons, 2.58 gallons. (b) garage B (CV = 55%)

5. (a) Males: 32.54 years, 21.85 years; Females: 35.89 years, 23.39
years
(b) Males: CV = 67.2%; Females: CV = 65.2%; The ages of males are
more variable.

6. (a) CV = 7.70% (b) RM260, RM23.9; CV = 9.19% (c) Admin Dept
earnings are more variable.

7. (b) 66 (RM’000), 56 (RM’000), 82 (RM’000), 13 (RM’000)
(c) Prices ≥ 112 (RM’000)

Extra questions
1)

2)
