Lecture V Probability and Statistics
Relative measures of dispersion are used to compare the spreads of two or more sets of
observations. They are independent of the units of measurement. Each is a ratio and is called
a coefficient. The smaller the coefficient, the lower the spread, and vice versa.
Suppose that the two distributions to be compared are expressed in the same units and their
means are equal or nearly equal. Then their variability can be compared directly by using their
standard deviations. However, if their means are widely different or if they are expressed in
different units of measurement, we cannot use the standard deviations as such for comparing
their variability. We have to use the relative measures of dispersion in such situations.
Examples of relative measures of dispersion include the coefficient of quartile deviation,
the coefficient of mean deviation and the coefficient of variation.
Example 1 Consider the distribution of the yields (per plot) of two ground nut varieties. For
the first variety, the mean and standard deviation are 82 kg and 16 kg respectively. For the
second variety, the mean and standard deviation are 55 kg and 8 kg respectively. Then we have,
for the first variety, $\text{C.V}(x) = \frac{16}{82} \times 100 \approx 19.5\%$.
For the second variety, $\text{C.V}(x) = \frac{8}{55} \times 100 \approx 14.5\%$.
It is apparent that the variability in the second variety is less than that in the first variety.
In general, a comparison based on the standard deviations alone need not lead to the same conclusion.
Example 2 Below are the scores of two cricketers in 10 innings. Find who is the more "consistent
scorer" by the indirect method.
Cricketer A 204 68 150 30 70 95 60 76 24 19
Cricketer B 99 190 130 94 80 89 69 85 65 40
Solution:
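From a calculator, $\bar{x}_A = 79.6$, $S_A = 58.2$, $\bar{x}_B = 94.1$ and $S_B = 41.1$.
Coefficient of variation for player A is $\text{C.V}(x) = \frac{58.2}{79.6} \times 100 \approx 73.153\%$.
Coefficient of variation for player B is $\text{C.V}(x) = \frac{41.1}{94.1} \times 100 \approx 43.7028\%$.
(The percentages are computed from the unrounded calculator values of the standard deviations.)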
The coefficient of variation of A is greater than that of B, and hence we conclude
that player B is more consistent.
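The calculation in Example 2 can be checked with a short Python sketch. It assumes the sample standard deviation (divisor n - 1), which is what reproduces the calculator values quoted above; the function name `coefficient_of_variation` is only illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the coefficient of variation as a percentage.

    Assumption: the sample standard deviation (divisor n - 1) is used.
    """
    return stdev(values) / mean(values) * 100

scores_a = [204, 68, 150, 30, 70, 95, 60, 76, 24, 19]
scores_b = [99, 190, 130, 94, 80, 89, 69, 85, 65, 40]

cv_a = coefficient_of_variation(scores_a)
cv_b = coefficient_of_variation(scores_b)

# The player with the smaller coefficient is the more consistent scorer.
more_consistent = "A" if cv_a < cv_b else "B"
print(f"C.V. of A = {cv_a:.3f}%, C.V. of B = {cv_b:.3f}%")
print(f"More consistent scorer: {more_consistent}")
```

If the population standard deviation (divisor n) were used instead, both coefficients would be slightly smaller, but the comparison between the two players would not change.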
Exercise
1) Find the coefficient of quartile deviation, the coefficient of mean deviation and the
coefficient of variation of x for the following data:
a) 9, 3, 4, 2, 9, 5, 8, 4, 7, 4
b) 1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8 and 9
c) 3, 6, 9, 10, 7, 12, 13, 15, 6, 5, 13
d) data on marks given by the table below
Marks Obtained 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of Students 6 12 22 24 16 12 8
2) The weights of 7 ear-heads of sorghum are 89, 94, 102, 107, 108, 115 and 126 g. Find the
arithmetic mean and standard deviation using a calculator, and hence determine the coefficient
of variation of the ear-heads of sorghum.
3) The following are the heights (in cm) of 381 soybean plants collected from a particular plot.
Using the coding formula, find the mean and standard deviation of the plant heights, and hence
determine the coefficient of variation of the soybean plant heights:
Plant heights (cm) 6.8-7.2 7.3-7.7 7.8-8.2 8.3-8.7 8.8-9.2 9.3-9.7 9.8-10.2 10.3-10.7 10.8-11.2 11.3-11.7 11.8-12.2 12.3-12.7
No. of plants 9 10 11 32 42 58 65 55 37 31 24 7
The moment coefficient of skewness $\alpha_3$ is given by
$$\alpha_3 = \frac{\sum f(x - \bar{x})^3}{n s^3}$$
Karl Pearson's coefficient of skewness is based upon the divergence of the mean from the mode
in a skewed distribution. Recall the empirical relation between mean, median and mode, which
states that, for a moderately skewed distribution, we have
Mean - Mode = 3 (Mean - Median)
Hence Karl Pearson's coefficient of skewness is defined by
$$SK_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$
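As a rough illustration, the median-based form of Karl Pearson's coefficient can be sketched in Python. The data are cricketer A's scores from Example 2, reused here only for illustration. The choice of the sample standard deviation (divisor n - 1) and the function name `pearson_skewness` are assumptions of this sketch.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Median-based Karl Pearson coefficient: 3 * (mean - median) / SD.

    Assumption: the sample standard deviation (divisor n - 1) is used.
    """
    return 3 * (mean(values) - median(values)) / stdev(values)

# Cricketer A's scores from Example 2, reused purely as sample data.
scores_a = [204, 68, 150, 30, 70, 95, 60, 76, 24, 19]
sk_p = pearson_skewness(scores_a)
print(f"SK_p = {sk_p:.3f}")
```

The median-based form is used because the mode of a small raw data set is often not well defined.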
Bowley's coefficient of skewness is based on quartiles. For a symmetrical distribution,
Q1 and Q3 are equidistant from the median.
$$SK_B = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$$
where $Q_k$ is the kth quartile.
Kelly's coefficient of skewness is based on P90 and P10, so that only 10% of the
observations on each extreme are ignored. This is an improvement over Bowley's
coefficient, which leaves out 25% of the observations on each extreme of the distribution.
$$SK_k = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$$
where $P_k$ is the kth percentile.
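The quartile-based and percentile-based coefficients can be sketched for raw data as follows. This is a minimal sketch that assumes the "exclusive" quantile convention of Python's `statistics.quantiles`, which places the pth quantile at position p(n + 1); other conventions give slightly different values. The function names are illustrative only.

```python
from statistics import quantiles

def bowley_skewness(values):
    """Bowley's coefficient: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def kelly_skewness(values):
    """Kelly's coefficient: (P90 - 2*P50 + P10) / (P90 - P10)."""
    cuts = quantiles(values, n=100, method="exclusive")  # 99 cut points P1..P99
    p10, p50, p90 = cuts[9], cuts[49], cuts[89]
    return (p90 - 2 * p50 + p10) / (p90 - p10)

# Cricketer A's scores from Example 2, reused purely as sample data.
scores_a = [204, 68, 150, 30, 70, 95, 60, 76, 24, 19]
sk_b = bowley_skewness(scores_a)
sk_k = kelly_skewness(scores_a)
print(f"Bowley: {sk_b:.4f}, Kelly: {sk_k:.4f}")
```

Because Kelly's coefficient reaches further into the tails than Bowley's, the two need not agree closely on a small data set.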
Interpreting Skewness
If the coefficient of skewness is positive, the distribution is positively skewed or skewed right,
meaning that the right tail of the distribution is longer than the left. If the coefficient of skewness
is negative, the data are negatively skewed or skewed left, meaning that the left tail is longer.
If the coefficient of skewness = 0, the data are perfectly symmetrical. But a skewness of exactly
zero is quite unlikely for real-world data, so how can you interpret the skewness number?
Bulmer, M. G., Principles of Statistics (Dover,1979) — a classic — suggests this rule of thumb:
If the coefficient of skewness is:
• less than −1 or greater than +1, the distribution is highly skewed.
• between −1 and −1/2 or between +1/2 and +1, the distribution is moderately skewed.
• between −1/2 and +1/2, the distribution is approximately symmetric.
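The rule of thumb above can be written as a small helper. The rule does not say which category the exact boundary values ±1/2 and ±1 belong to, so their treatment here is an assumption of this sketch, as is the function name `describe_skewness`.

```python
def describe_skewness(sk):
    """Classify a skewness coefficient using the rule of thumb above.

    Assumption: the boundary values +/-0.5 and +/-1 are counted as
    'moderately skewed'; the rule itself leaves this open.
    """
    size = abs(sk)
    if size > 1:
        degree = "highly skewed"
    elif size >= 0.5:
        degree = "moderately skewed"
    else:
        return "approximately symmetric"
    direction = "right" if sk > 0 else "left"
    return f"{degree} to the {direction}"

print(describe_skewness(0.8))
print(describe_skewness(-1.4))
print(describe_skewness(0.1))
```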
Example The following figures relate to the size of capital of 285 companies:
Capital (in Ks lacs.) 1-5 6-10 11-15 16-20 21-25 26-30 31-35
No. of companies 20 27 29 38 48 53 70
Compute Bowley's coefficient of skewness and interpret the result.
Solution
Boundaries 0.5-5.5 5.5-10.5 10.5-15.5 15.5-20.5 20.5-25.5 25.5-30.5 30.5-35.5
CF 20 47 76 114 162 215 285
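$$Q_1 = \tfrac{1}{4}(286)\text{th value} = 71.5\text{th value} = 10.5 + \frac{71.5 - 47}{29} \times 5 \approx 14.7241$$
$$Q_2 = \tfrac{1}{2}(286)\text{th value} = 143\text{rd value} = 20.5 + \frac{143 - 114}{48} \times 5 \approx 23.5208$$
$$Q_3 = \tfrac{3}{4}(286)\text{th value} = 214.5\text{th value} = 25.5 + \frac{214.5 - 162}{53} \times 5 \approx 30.4528$$
$$SK_B = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{30.4528 - 2 \times 23.5208 + 14.7241}{30.4528 - 14.7241} \approx -0.11855$$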
This value lies between −1/2 and +1/2, therefore the distribution is approximately symmetric.
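The interpolation used in this solution can be reproduced with a short Python sketch. It follows the same convention as the worked solution, locating the quartiles at the (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 positions and interpolating linearly within the class that contains each position. The helper name `grouped_quantile` is illustrative only.

```python
def grouped_quantile(boundaries, freqs, position):
    """Interpolate the value at an ordinal position in grouped data.

    boundaries: (lower, upper) class boundaries, in order
    freqs:      class frequencies, in the same order
    position:   ordinal position of the wanted value, e.g. (n + 1) / 4 for Q1
    """
    cumulative = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= position:
            # Linear interpolation inside the class containing the position.
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position lies beyond the last class")

boundaries = [(0.5, 5.5), (5.5, 10.5), (10.5, 15.5), (15.5, 20.5),
              (20.5, 25.5), (25.5, 30.5), (30.5, 35.5)]
freqs = [20, 27, 29, 38, 48, 53, 70]
n = sum(freqs)  # 285 companies

q1 = grouped_quantile(boundaries, freqs, (n + 1) / 4)
q2 = grouped_quantile(boundaries, freqs, (n + 1) / 2)
q3 = grouped_quantile(boundaries, freqs, 3 * (n + 1) / 4)
sk_b = (q3 - 2 * q2 + q1) / (q3 - q1)
print(f"Q1 = {q1:.4f}, Q2 = {q2:.4f}, Q3 = {q3:.4f}, SK_B = {sk_b:.5f}")
```

Passing other positions, for example 10(n + 1)/100 and 90(n + 1)/100, gives the percentiles needed for Kelly's coefficient under the same convention.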
Question: Compute Karl Pearson's and Kelly's coefficients of skewness for the above
data and interpret the results.
5.2.2 Kurtosis
Kurtosis measures the peakedness of a distribution. If the values of x are very close to the mean,
the peak is very high and the distribution is said to be leptokurtic. On the other hand, if the
values of x are very far away from the mean, the peak is very low and the distribution is said to be
platykurtic. Finally, if the x values are at a moderate distance from the mean, then the peak is
moderate and the distribution is said to be mesokurtic (see figure on pg 42).
Measures of Kurtosis
Generally, for a set of values $x_1, x_2, x_3, \ldots, x_n$, the moment coefficient of kurtosis $\alpha_4$ is given by
$$\alpha_4 = \frac{\sum f(x - \bar{x})^4}{n s^4}$$
Solution
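$\bar{x} = \frac{1}{n}\sum x = \frac{42}{7} = 6$ and standard deviation $s = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \sqrt{\frac{16}{7}} = \frac{4}{\sqrt{7}}$
x 5 6 7 6 9 4 5 Total
$(x - \bar{x})^2$ 1 0 1 0 9 4 1 16
$(x - \bar{x})^3$ -1 0 1 0 27 -8 -1 18
$(x - \bar{x})^4$ 1 0 1 0 81 16 1 100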
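Coefficient of skewness:
$$\alpha_3 = \frac{\sum (x - \bar{x})^3}{n s^3} = \frac{18}{7\left(\frac{4}{\sqrt{7}}\right)^3} \approx 0.744118$$
Notice that this distribution is moderately skewed to the right.
Coefficient of kurtosis:
$$\alpha_4 = \frac{\sum (x - \bar{x})^4}{n s^4} = \frac{100}{7\left(\frac{4}{\sqrt{7}}\right)^4} \approx 2.73438$$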
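The two moment coefficients in this solution can be checked with a short Python sketch. As in the solution, it uses the standard deviation with divisor n. The function name `moment_coefficients` is illustrative only.

```python
from math import sqrt

def moment_coefficients(values):
    """Return the moment coefficients of skewness and kurtosis.

    The standard deviation uses divisor n, as in the worked solution.
    """
    n = len(values)
    x_bar = sum(values) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in values) / n)
    alpha_3 = sum((x - x_bar) ** 3 for x in values) / (n * s ** 3)
    alpha_4 = sum((x - x_bar) ** 4 for x in values) / (n * s ** 4)
    return alpha_3, alpha_4

data = [5, 6, 7, 6, 9, 4, 5]
alpha_3, alpha_4 = moment_coefficients(data)
print(f"skewness = {alpha_3:.6f}, kurtosis = {alpha_4:.5f}")
```

For a frequency table, the same sums would be weighted by the class frequencies, with class midpoints standing in for x.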
Exercise
1. Find the moment coefficient of skewness and kurtosis for the data below.
a) 9, 3, 4, 2, 9, 5, 8, 4, 7, 4
b) 1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8 and 9
c) 3, 6, 9, 10, 7, 12, 13, 15, 6, 5, 13
d) data on marks given by the table below
Marks Obtained 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of Students 6 12 22 24 16 12 8
e) data given by the table below
Marks Obtained 0-10 10-20 20-30 30-40
No. of Students 1 3 4 2
2. Compute Bowley's coefficient of skewness, Kelly's coefficient of skewness and the
percentile coefficient of kurtosis for the following data and interpret the results.
a) 9, 3, 4, 2, 9, 5, 8, 4, 7, 4
b) 1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8 and 9
c) 3, 6, 9, 10, 7, 12, 13, 15, 6, 5, 13
d) data on heights given by the table below
Height (in inches) 58 59 60 61 62 63 64 65
No. of persons 10 18 30 42 35 28 16 8
e) data on daily expenditure of families given by the table below
Daily Expenditure (Rs) 0-20 20-40 40-60 60-80 80-100
No. of persons 13 25 27 19 16
f) Data on marks given by the table below
Marks Obtained 0-20 20-40 40-60 60-80 80-100
No. of Students 8 28 35 17 12
3. The following measures were computed for a frequency distribution:
Mean = 50, coefficient of variation = 35% and Karl Pearson's coefficient of skewness
$SK_p = -0.25$. Compute the standard deviation, mode and median of the distribution.