Lecture 4 Measures of Dispersion

The document discusses measures of dispersion, which quantify the variability of data around an average. Key measures include range, mean deviation, variance, and standard deviation, along with their relative measures such as coefficients of variation. It also covers the five-number summary, box-and-whisker diagrams, and the identification of outliers in data sets.

Uploaded by

kdarashana53
Copyright © All Rights Reserved

MEASURES OF DISPERSION

MEASURES OF DISPERSION
• A quantity that measures the variability among the data, that is, how the data are dispersed about the average, is known as a measure of dispersion, scatter, or variation.
Common Measures of Dispersion
• The main measures of dispersion are:
1. Range
2. Mean deviation (average deviation)
3. Variance and standard deviation
1. RANGE
• It is the difference between the largest and the smallest observation in a set of data.
• Range = $x_m - x_o$, where $x_m$ is the largest and $x_o$ the smallest observation.
• Its relative measure is known as the coefficient of dispersion:
$$\text{Coefficient of dispersion} = \frac{x_m - x_o}{x_m + x_o}$$
• It is used, for example, for daily temperature records and stock prices.
• It ignores all the information available in the middle of the data.
• It might give a misleading picture of the spread of the data.
1. RANGE
• Examples:
1. Find the range of the following data.
31, 26, 15, 43, 19, 10, 12, 37
Range = $x_m - x_o$ = 43 − 10 = 33
2. Find the range of the following frequency distribution (ungrouped).
X:  3   4   5   6   7   8
f:  5   8  12  10   4   2
Range = 8 − 3 = 5
3. Find the range of the following frequency distribution (grouped).
X:  10–20  20–30  30–40  40–50  50–60
f:    5      8      12     10     4
Range = 60 − 10 = 50
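The range calculations above can be reproduced with a few lines of Python. This is a minimal sketch: the names `value_range` and `coefficient_of_dispersion` are illustrative and not from any library, and for grouped data it assumes, as in Example 3, that the range runs from the lower limit of the first class to the upper limit of the last class.

```python
def value_range(data):
    """Range = largest observation - smallest observation."""
    return max(data) - min(data)


def coefficient_of_dispersion(data):
    """Relative measure of the range: (x_m - x_o) / (x_m + x_o)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


# Example 1: raw observations
data = [31, 26, 15, 43, 19, 10, 12, 37]
print(value_range(data))                          # 33
print(round(coefficient_of_dispersion(data), 4))  # 0.6226

# Example 2: for an ungrouped frequency distribution only the x values matter
x_values = [3, 4, 5, 6, 7, 8]
print(value_range(x_values))                      # 5

# Example 3: grouped data, upper limit of last class minus lower limit of first class
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
print(classes[-1][1] - classes[0][0])             # 50
```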
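MEAN (OR AVERAGE) DEVIATION
• It is defined as the arithmetic mean of the absolute deviations measured either from the mean or from the median.
• For ungrouped data:
$$M.D. = \frac{\sum |x - \bar{x}|}{n} \quad\text{or}\quad M.D. = \frac{\sum |x - \text{median}|}{n}$$
• For grouped data:
$$M.D. = \frac{\sum f\,|x - \bar{x}|}{\sum f} \quad\text{or}\quad M.D. = \frac{\sum f\,|x - \text{median}|}{\sum f}$$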
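The ungrouped formula can be sketched directly in Python. This is a minimal illustration, assuming a hypothetical helper `mean_deviation` that takes the centre as either the mean or the median; it uses only the standard-library `statistics.mean` and `statistics.median`, and is applied to the data of range Example 1.

```python
from statistics import mean, median


def mean_deviation(data, about="mean"):
    """Arithmetic mean of the absolute deviations from the mean or the median."""
    if about == "mean":
        centre = mean(data)
    elif about == "median":
        centre = median(data)
    else:
        raise ValueError("about must be 'mean' or 'median'")
    return sum(abs(x - centre) for x in data) / len(data)


data = [31, 26, 15, 43, 19, 10, 12, 37]
print(mean_deviation(data))                  # 10.125
print(mean_deviation(data, about="median"))  # 10.125
```

Both centres give 10.125 here because, with an even number of observations, any centre lying between the two middle values (19 and 26) gives the same total absolute deviation; in general the mean deviation about the median is the smaller of the two.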
MEAN (OR AVERAGE) DEVIATION
• Example 1: Calculate the mean deviation from the following frequency distribution (ungrouped data).

x    f    f·x   |x − 4.9|   f|x − 4.9|
2    3     6      2.9          8.7
4    9    36      0.9          8.1
6    5    30      1.1          5.5
8    2    16      3.1          6.2
10   1    10      5.1          5.1
Total:  Σf = 20   Σf·x = 98   Σf|x − 4.9| = 33.6

x̄ = 98 / 20 = 4.9
M.D. = 33.6 / 20 = 1.68
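The table above can be checked with a short Python sketch of the grouped formula. The helper name `mean_deviation_fd` is illustrative; it takes the x values and their frequencies as two parallel lists and measures deviations from the mean.

```python
def mean_deviation_fd(xs, fs):
    """Mean deviation about the mean for a frequency distribution:
    sum(f * |x - mean|) / sum(f)."""
    n = sum(fs)
    xbar = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * abs(x - xbar) for x, f in zip(xs, fs)) / n


xs = [2, 4, 6, 8, 10]
fs = [3, 9, 5, 2, 1]
print(round(mean_deviation_fd(xs, fs), 2))  # 1.68
```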
MEAN (OR AVERAGE) DEVIATION
• Example 2: Calculate the mean deviation from the following frequency distribution (grouped data).

Class    f   Class mark (x)   f·x   |x − 6.57|   f|x − 6.57|
2–4      2        3             6      3.57          7.14
4–6      3        5            15      1.57          4.71
6–8      6        7            42      0.43          2.58
8–10     2        9            18      2.43          4.86
10–12    1       11            11      4.43          4.43
Total:  Σf = 14   Σf·x = 92   Σf|x − 6.57| = 23.72

x̄ = 92 / 14 = 6.57
M.D. = 23.72 / 14 = 1.69
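For grouped data the only extra step is replacing each class by its class mark (mid-point). A minimal Python sketch, with illustrative helper names, is below; it keeps the mean unrounded (92/14) instead of using 6.57, which still gives 1.69 to two decimals.

```python
def class_marks(classes):
    """Mid-point of each class interval: (lower + upper) / 2."""
    return [(lower + upper) / 2 for lower, upper in classes]


def mean_deviation_fd(xs, fs):
    """sum(f * |x - mean|) / sum(f), with x the class marks."""
    n = sum(fs)
    xbar = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * abs(x - xbar) for x, f in zip(xs, fs)) / n


classes = [(2, 4), (4, 6), (6, 8), (8, 10), (10, 12)]
fs = [2, 3, 6, 2, 1]
xs = class_marks(classes)                   # [3.0, 5.0, 7.0, 9.0, 11.0]
print(round(mean_deviation_fd(xs, fs), 2))  # 1.69
```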
MEAN (OR AVERAGE) DEVIATION
• It is an absolute measure.
• Its relative measure is the coefficient of M.D.:
$$\text{Coefficient of M.D.} = \frac{M.D.}{\text{mean}} \quad\text{or}\quad \frac{M.D.}{\text{median}}$$
• It is based on all the observed values.


THE VARIANCE AND STANDARD DEVIATION
• The variance is defined as "the mean of the squares of the deviations of all the observations from their mean." Its square root is called the "standard deviation".
• It is usually denoted by σ² (for a population) and S² (for a sample).
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} \quad\text{for ungrouped data}$$
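THE VARIANCE AND STANDARD DEVIATION
$$\sigma^2 = \frac{\sum f\,(x - \bar{x})^2}{\sum f} \quad\text{for grouped data}$$
• It is an absolute measure.
• Its relative measure is the coefficient of variation:
$$C.V. = \frac{S.D.}{\bar{x}} \times 100 \quad\text{or}\quad C.V. = \frac{\sigma}{\bar{x}} \times 100$$
• Shortcut method:
$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 \quad\text{for ungrouped data}$$
$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2 \quad\text{for grouped data}$$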
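To see that the definitional formula and the shortcut method agree, here is a minimal Python sketch applied to the raw data of range Example 1. It divides by n, as the formulas here do (the convention that divides a sample variance by n − 1 would give a somewhat larger value); the function names are illustrative only.

```python
import math


def variance_definition(data):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / n


def variance_shortcut(data):
    """Shortcut form: sum(x^2)/n - (sum(x)/n)^2."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2


def coefficient_of_variation(data):
    """C.V. = standard deviation / mean * 100."""
    sd = math.sqrt(variance_definition(data))
    return sd / (sum(data) / len(data)) * 100


data = [31, 26, 15, 43, 19, 10, 12, 37]
print(variance_definition(data))                 # 128.609375
print(variance_shortcut(data))                   # 128.609375
print(round(coefficient_of_variation(data), 2))  # 47.01
```

The shortcut avoids computing every deviation, which is why it is convenient by hand; with floating-point arithmetic on very large values, the definitional form is numerically safer.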
VARIANCE AND STANDARD DEVIATION
• Example 1: Calculate the variance and standard deviation from the following frequency distribution (ungrouped data).

x    f    f·x   x²    f·x²
2    3     6     4     12
4    9    36    16    144
6    5    30    36    180
8    2    16    64    128
10   1    10   100    100
Total:  Σf = 20   Σf·x = 98   Σf·x² = 564

Using the shortcut method,
$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$$
σ² = (564 / 20) − (98 / 20)² = 28.2 − 24.01 = 4.19
S.D. = √σ² = √4.19 = 2.05
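The shortcut computation for a frequency distribution can be sketched in Python as follows. The helper `fd_variance_shortcut` is an illustrative name; it accumulates Σf·x and Σf·x² exactly as the table columns do.

```python
import math


def fd_variance_shortcut(xs, fs):
    """Shortcut variance for a frequency distribution:
    sum(f*x^2)/sum(f) - (sum(f*x)/sum(f))^2."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return sum_fx2 / n - (sum_fx / n) ** 2


xs = [2, 4, 6, 8, 10]
fs = [3, 9, 5, 2, 1]
var = fd_variance_shortcut(xs, fs)
print(round(var, 2))             # 4.19
print(round(math.sqrt(var), 2))  # 2.05
```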
VARIANCE AND STANDARD DEVIATION
• Example 2: Calculate the variance and standard deviation from the following frequency distribution (grouped data).

Class    f   Class mark (x)   f·x    x²    f·x²
2–4      2        3             6     9     18
4–6      3        5            15    25     75
6–8      6        7            42    49    294
8–10     2        9            18    81    162
10–12    1       11            11   121    121
Total:  Σf = 14   Σf·x = 92   Σf·x² = 670

Using the shortcut method,
$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$$
σ² = (670 / 14) − (92 / 14)² = 47.857 − 43.184 = 4.67
S.D. = √σ² = √4.67 = 2.16
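The same sketch works for grouped data once each class is replaced by its class mark. As before, `fd_variance_shortcut` is an illustrative helper, not a library function.

```python
import math


def fd_variance_shortcut(xs, fs):
    """sum(f*x^2)/sum(f) - (sum(f*x)/sum(f))^2."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return sum_fx2 / n - (sum_fx / n) ** 2


classes = [(2, 4), (4, 6), (6, 8), (8, 10), (10, 12)]
fs = [2, 3, 6, 2, 1]
xs = [(lower + upper) / 2 for lower, upper in classes]  # class marks 3, 5, ..., 11
var = fd_variance_shortcut(xs, fs)
print(round(var, 2))             # 4.67
print(round(math.sqrt(var), 2))  # 2.16
```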
Relative Measures of Dispersion
• Coefficient of Range
• Coefficient of Quartile Deviation
• Coefficient of Mean Deviation
• Coefficient of Variation (CV)
Relative Measures of Variation
$$\text{Coefficient of Range} = \frac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}}$$
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
$$\text{Coefficient of Mean Deviation} = \frac{MD}{\text{Mean}}$$
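These three ratios are one-liners. The Python sketch below uses illustrative function names and plugs in values taken from this lecture's own examples: the extremes 43 and 10 from range Example 1, the quartiles 22.0 and 36.5 of the weekly TV viewing times, and M.D. = 1.68 with mean 4.9 from mean deviation Example 1.

```python
def coefficient_of_range(largest, smallest):
    """(X_largest - X_smallest) / (X_largest + X_smallest)."""
    return (largest - smallest) / (largest + smallest)


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


def coefficient_of_mean_deviation(md, mean):
    """MD / Mean."""
    return md / mean


print(round(coefficient_of_range(43, 10), 4))                   # 0.6226
print(round(coefficient_of_quartile_deviation(22.0, 36.5), 4))  # 0.2479
print(round(coefficient_of_mean_deviation(1.68, 4.9), 4))       # 0.3429
```

Because each ratio divides a spread by a level, the results are unit-free, which is what lets them compare data sets on different scales.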
Coefficient of Variation (CV)
$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$
• It can be used to compare two or more sets of data measured in different units, or in the same units but with different average sizes.
Use of Coefficient of Variation
Stock A:
• Average price last year = $50
• Standard deviation = $5
$$CV_A = \left(\frac{S}{\bar{X}}\right) \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$$
Stock B:
• Average price last year = $100
• Standard deviation = $5
$$CV_B = \left(\frac{S}{\bar{X}}\right) \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
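The stock comparison reduces to one division per stock. This minimal Python sketch uses a hypothetical helper `coefficient_of_variation` that takes the standard deviation and mean directly; multiplying by 100 before dividing keeps the results exact here.

```python
def coefficient_of_variation(sd, mean):
    """CV = (S / mean) * 100, in percent."""
    return 100 * sd / mean


cv_a = coefficient_of_variation(5, 50)   # Stock A
cv_b = coefficient_of_variation(5, 100)  # Stock B
print(cv_a, cv_b)  # 10.0 5.0
print("Stock B is relatively less variable" if cv_b < cv_a
      else "Stock A is relatively less variable")
```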
Five Number Summary
The five-number summary of a data set consists of the minimum value, the first quartile, the second quartile, the third quartile and the maximum value, written in that order:
Min, Q1, Q2, Q3, Max.

From the three quartiles we can obtain a measure of central tendency (the median, Q2) and measures of variation of the two middle quarters of the distribution: Q2 − Q1 for the second quarter and Q3 − Q2 for the third quarter.
Five Number Summary
Weekly TV viewing times (in hours):

25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 37 30 20 21

The ordered array of the above data is given below:

5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Five Number Summary
$$\text{Location of } Q_1 = \frac{1(20+1)}{4}\text{th obs.} = 5.25\text{th obs.}$$
$$\text{Value of } Q_1 = \text{5th obs.} + 0.25\,(\text{6th obs.} - \text{5th obs.}) = 21 + 0.25\,(25 - 21) = 22.0 \text{ hrs}$$
$$\text{Location of } Q_2 = \frac{2(20+1)}{4}\text{th obs.} = 10.50\text{th obs.}$$
$$\text{Value of } Q_2 = \text{10th obs.} + 0.50\,(\text{11th obs.} - \text{10th obs.}) = 30 + 0.50\,(31 - 30) = 30.5 \text{ hrs}$$
$$\text{Location of } Q_3 = \frac{3(20+1)}{4}\text{th obs.} = 15.75\text{th obs.}$$
$$\text{Value of } Q_3 = \text{15th obs.} + 0.75\,(\text{16th obs.} - \text{15th obs.}) = 35 + 0.75\,(37 - 35) = 36.5 \text{ hrs}$$

Minimum value = 5.0   Maximum value = 66.0
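The quartile rule used above (locate the k(n + 1)/4-th ordered observation, then interpolate linearly between its neighbours) can be sketched in Python as below. The helpers `quartile` and `five_number_summary` are illustrative names. As a cross-check, the standard library's `statistics.quantiles` with its default `"exclusive"` method uses the same (n + 1)-based positions and gives the same three quartiles for this data.

```python
from statistics import quantiles


def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2, 3) at the k(n + 1)/4-th ordered observation,
    interpolating linearly between the two neighbouring observations."""
    n = len(sorted_data)
    position = k * (n + 1) / 4       # 1-based location, e.g. 5.25
    whole = int(position)            # e.g. 5 -> the 5th observation
    fraction = position - whole      # e.g. 0.25
    if whole < 1:
        return sorted_data[0]
    if whole >= n:
        return sorted_data[-1]
    below = sorted_data[whole - 1]   # whole-th observation (1-based)
    above = sorted_data[whole]       # (whole + 1)-th observation
    return below + fraction * (above - below)


def five_number_summary(data):
    s = sorted(data)
    return s[0], quartile(s, 1), quartile(s, 2), quartile(s, 3), s[-1]


hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]
print(five_number_summary(hours))  # (5, 22.0, 30.5, 36.5, 66)
print(quantiles(hours, n=4))       # [22.0, 30.5, 36.5]
```

Other textbooks and software locate quartiles differently (for example at k·n/4 or with an "inclusive" rule), so quartiles for the same data can differ slightly between sources.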
Box and Whisker Diagram
A box-and-whisker diagram, or box plot, is a graphical means of displaying the five-number summary of a set of data. In a box plot the first quartile is placed at the lower hinge and the third quartile at the upper hinge. The median is placed between these two hinges. The two lines emanating from the box are called whiskers. The box-and-whisker diagram was introduced by Professor John W. Tukey.
Construction of Box-Plot
1. Start the box at Q1 and end it at Q3.
2. Within the box, draw a line to represent Q2.
3. Draw the lower whisker from the minimum value up to Q1.
4. Draw the upper whisker from Q3 up to the maximum value.
[Figure: schematic box plot labelled, from bottom to top, Min Value, Q1, Q2, Q3, Max Value]
Construction of Box-Plot
For the weekly TV viewing times:
1. Q1 = 22.0, Q3 = 36.5
2. Q2 = 30.5
3. Minimum value = 5.0
4. Maximum value = 66.0
[Figure: box plot of the TV viewing times on a vertical axis from 0 to 70]
Interpretation of Box-Plot
A box-and-whisker plot is useful to identify:
• the maximum and minimum values in the data;
• the median of the data;
• IQR = Q3 − Q1; a longer box indicates more variability in the data;
• the shape of the data, from the position of the median line within the box:
  - line at the centre of the box: symmetrical;
  - line above the centre of the box: negatively skewed;
  - line below the centre of the box: positively skewed;
• outliers in the data.
[Figure: box plot on a vertical axis from 0 to 70]
Outliers
An outlier is a value that falls well outside the overall pattern of the data. It might be
• the result of a measurement or recording error,
• a member of a different population, or
• simply an unusual extreme value.

An extreme value need not be an outlier; it might instead be an indication of skewness.
Inner and Outer Fences
With Q1 = 22.0, Q2 = 30.5 and Q3 = 36.5, the interquartile range is IQR = Q3 − Q1 = 36.5 − 22.0 = 14.5.

Inner fences:
$$\text{Lower inner fence} = Q_1 - 1.5\,IQR = 22 - 1.5\,(36.5 - 22) = 0.25$$
$$\text{Upper inner fence} = Q_3 + 1.5\,IQR = 36.5 + 1.5\,(36.5 - 22) = 58.25$$

Outer fences:
$$\text{Lower outer fence} = Q_1 - 3\,IQR = 22 - 3\,(36.5 - 22) = -21.5$$
$$\text{Upper outer fence} = Q_3 + 3\,IQR = 36.5 + 3\,(36.5 - 22) = 80.0$$
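The four fences follow mechanically from Q1 and Q3. A minimal Python sketch, with an illustrative helper name `fences`, reproduces the values for the TV viewing times.

```python
def fences(q1, q3):
    """Inner fences 1.5 IQR and outer fences 3 IQR beyond the quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }


print(fences(22.0, 36.5))
# {'lower_inner': 0.25, 'upper_inner': 58.25, 'lower_outer': -21.5, 'upper_outer': 80.0}
```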
Identification of the Outliers
1. Values that lie within the inner fences are normal values.
2. Values that lie outside the inner fences but inside the outer fences are possible (suspected, mild) outliers.
3. Values that lie outside the outer fences are sure outliers.

Plot each suspected outlier with an asterisk and each sure outlier with a hollow dot.
For the weekly TV viewing times, only 66 is a mild outlier.
[Figure: box plot of the TV viewing times on a vertical axis from 0 to 80, with 66 marked by an asterisk]
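The three rules above can be applied to every observation in a loop. This is a minimal Python sketch with a hypothetical helper `classify`; it assumes that a value lying exactly on a fence counts as inside it, since the rules do not say how to treat ties.

```python
def classify(values, q1, q3):
    """Label each value 'normal', 'mild' (between inner and outer fences)
    or 'sure' (beyond the outer fences). Assumption: a value exactly on a
    fence is treated as inside it."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    labels = []
    for v in values:
        if inner_low <= v <= inner_high:
            labels.append((v, "normal"))
        elif outer_low <= v <= outer_high:
            labels.append((v, "mild"))
        else:
            labels.append((v, "sure"))
    return labels


hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]
labels = classify(hours, 22.0, 36.5)
print([(v, label) for v, label in labels if label != "normal"])  # [(66, 'mild')]
```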
Uses of Box and Whisker Diagram
Box plots are especially suitable for comparing two or more data sets. In such a situation the box plots are constructed on the same scale.
[Figure: side-by-side box plots for Female and Male]
Measures of Skewness
A distribution in which values equidistant from the centre have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness. For a symmetrical distribution:

1. Length of right tail = length of left tail
2. Mean = Median = Mode
3. Sk = 0, where Sk is measured by either
a) $Sk = \dfrac{\text{Mean} - \text{Mode}}{SD}$
b) $Sk = \dfrac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
Measures of Skewness
A distribution is positively skewed if the observations tend to concentrate more at the lower end of the possible values of the variable than at the upper end. A positively skewed frequency curve has a longer tail on the right-hand side.

1. Length of right tail > length of left tail
2. Mean > Median > Mode
3. Sk > 0, where Sk is measured by either
a) $Sk = \dfrac{\text{Mean} - \text{Mode}}{SD}$
b) $Sk = \dfrac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
Measures of Skewness
A distribution is negatively skewed if the observations tend to concentrate more at the upper end of the possible values of the variable than at the lower end. A negatively skewed frequency curve has a longer tail on the left-hand side.

1. Length of right tail < length of left tail
2. Mean < Median < Mode
3. Sk < 0, where Sk is measured by either
a) $Sk = \dfrac{\text{Mean} - \text{Mode}}{SD}$
b) $Sk = \dfrac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
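Both skewness coefficients can be evaluated on this lecture's own examples. The Python sketch below uses illustrative function names. For the mean-mode formula it takes the ungrouped frequency distribution from the variance example and assumes the mode is the x value with the highest frequency; for the quartile formula it uses the quartiles of the weekly TV viewing times.

```python
import math


def pearson_skewness(mean, mode, sd):
    """Sk = (Mean - Mode) / SD."""
    return (mean - mode) / sd


def bowley_skewness(q1, q2, q3):
    """Sk = (Q3 - 2*Q2 + Q1) / (Q3 - Q1), based on quartiles."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Ungrouped frequency distribution: x = 2, 4, ..., 10
xs = [2, 4, 6, 8, 10]
fs = [3, 9, 5, 2, 1]
n = sum(fs)
mean = sum(f * x for x, f in zip(xs, fs)) / n                   # 4.9
mode = xs[fs.index(max(fs))]                                    # 4 (highest frequency)
sd = math.sqrt(sum(f * x * x for x, f in zip(xs, fs)) / n - mean ** 2)
print(round(pearson_skewness(mean, mode, sd), 4))   # 0.4397 -> positive skew

# Quartiles of the weekly TV viewing times
print(round(bowley_skewness(22.0, 30.5, 36.5), 4))  # -0.1724 -> negative skew
```

The two coefficients need not agree in sign on a given data set: the quartile-based one looks only at the middle half of the data and ignores values beyond Q1 and Q3, such as the long upper whisker reaching 66.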
Measures of Kurtosis
Kurtosis is the degree of peakedness or flatness of a unimodal (single-humped) distribution.
• When the values of a variable are highly concentrated around the mode, the peak of the curve becomes relatively high; the curve is leptokurtic.
• When the values of a variable have low concentration around the mode, the peak of the curve becomes relatively flat; the curve is platykurtic.
• A curve that is neither very peaked nor very flat-topped is called mesokurtic (normal); it is taken as the basis for comparison.
Measures of Kurtosis
$$\text{Coefficient of Kurtosis} = \frac{n \sum (X - \bar{X})^4}{\left[\sum (X - \bar{X})^2\right]^2}$$

1. If coefficient of kurtosis > 3: leptokurtic.
2. If coefficient of kurtosis = 3: mesokurtic.
3. If coefficient of kurtosis < 3: platykurtic.
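The coefficient of kurtosis and the three-way classification can be sketched in Python as follows. The function names are illustrative, and the small tolerance `tol` around 3 is an assumption added because a floating-point result will almost never equal 3 exactly. Applied to the weekly TV viewing times, the single large value 66 makes the distribution leptokurtic.

```python
def kurtosis_coefficient(data):
    """n * sum((X - mean)^4) / (sum((X - mean)^2))^2."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data)
    s4 = sum((x - xbar) ** 4 for x in data)
    return n * s4 / s2 ** 2


def kurtosis_type(beta2, tol=1e-9):
    """Compare the coefficient with 3 (assumed tolerance tol)."""
    if beta2 > 3 + tol:
        return "Leptokurtic"
    if beta2 < 3 - tol:
        return "Platykurtic"
    return "Mesokurtic"


hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]
b2 = kurtosis_coefficient(hours)
print(kurtosis_type(b2))  # Leptokurtic
```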
