Text Book Statistics Endsem

Uploaded by

Rintaro Okabe
Copyright
© © All Rights Reserved
UNIT III - DESCRIPTIVE STATISTICS:

MEASURES OF DISPERSION

Chapter 5
MEASURES OF DISPERSION
5.1 INTRODUCTION
We have seen that an average condenses information into a single value. However, an average alone is not sufficient to describe the distribution completely. Two distributions may have the same mean and still not be identical.
Illustration 1: Marks of students A, B, C in 5 subjects are as follows:
Student    Marks                   A.M.
A          51  52  50  48  49      50
B          30  35  50  65  70      50
C           0  15  45  95  95      50
Notice that the average marks of all students are the same but
they differ in variation. Clearly we can see that A is more
consistent than B, and B is more consistent than C.
For further study and analysis it becomes essential to measure the extent of variation. Observations are scattered or dispersed from the central value. This variation is called dispersion. Thus, the next important aspect of comparison or study of a frequency distribution is dispersion. Moreover, it plays a very important role in further analysis.
An average remains a good representative if the dispersion is small (i.e. if the observations are close to it). Thus, dispersion decides the reliability of the average.
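The illustration can be checked numerically. The following is a minimal Python sketch, assuming the marks of student C are 0, 15, 45, 95, 95; the name `marks` and the use of range and population standard deviation (both defined later in this chapter) as yardsticks are illustrative choices, not part of the original illustration.

```python
from statistics import mean, pstdev

# Marks of the three students in 5 subjects (from Illustration 1).
marks = {
    "A": [51, 52, 50, 48, 49],
    "B": [30, 35, 50, 65, 70],
    "C": [0, 15, 45, 95, 95],
}

# For each student: (arithmetic mean, range, population standard deviation).
summary = {
    name: (mean(values), max(values) - min(values), pstdev(values))
    for name, values in marks.items()
}

for name, (avg, rng, sd) in summary.items():
    print(f"{name}: mean = {avg}, range = {rng}, S.D. = {sd:.4f}")
```

All three means are equal, while both the range and the standard deviation grow from A to B to C, which matches the ordering of consistency described above.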
5.2 MEASURES OF DISPERSION
In this chapter we study the following measures of dispersion: (i) range, (ii) quartile deviation, (iii) mean deviation, (iv) standard deviation. These measures have the same units as those of the observations, for example ₹, cm, hours, etc.
Measure for Comparison of Dispersion:
It can be seen that these measures possess units and hence create difficulty in comparing the dispersion of two or more frequency distributions.
For example: For a group of persons, variation in height and variation in weight are to be compared. Height may be in cm and weight may be in kg. Therefore, comparison is not possible until a unitless quantity is available. Therefore, with respect to every such measure of dispersion, a measure of dispersion for comparison is defined. Such a measure can be obtained by dividing the measure by the corresponding average. It is called the coefficient of the respective measure of dispersion.
5.3 CHARACTERISTICS OF AN IDEAL MEASURE OF DISPERSION
It should be rigidly defined.
It should be easy to calculate and simple to understand.
It should be based on all the observations of the given series.
It should not be unduly affected by sampling fluctuations.
It should not be affected by the extreme values.
It should be capable of further mathematical treatment and statistical analysis.
5.4 RANGE AND COEFFICIENT OF RANGE
Range is a crude measure of dispersion. However, it is the simplest measure and is suitable if the extent of variation is small.
Definition: If L is the largest observation and S is the smallest observation, then range is the difference between L and S. Thus,
$$\text{Range} = L - S$$
and the corresponding relative measure is taken to be
$$\text{Coefficient of range} = \frac{L - S}{L + S}$$
In case of a frequency distribution, the mid-values of the first and last class intervals are taken to be the smallest and the largest observations respectively.
Note: Requisites of a good measure of dispersion are the same as those of a good average.
Merits of Range:
There is only one merit, viz. it is simple to understand and easy to calculate.
Demerits of Range:
It is not based on all observations. It does not give a proper idea regarding the variation between the extreme observations.
For example: Range of 0, 3, 5, 200 is the same as that of 0, 50, 100, 150, 200. However, the variation patterns are different.
Applications of Range:
Range is a suitable measure of dispersion in case of a small group with less variation.
It is widely used in the branch of statistics known as Statistical Quality Control.
For the changes in prices of shares, the lowest and highest values are recorded.
Temperature at a certain place is recorded using maximum and minimum values.
Range is used in medical science to check whether blood pressure, HB count, etc. is normal.

5.5 QUARTILE DEVIATION OR SEMI-INTERQUARTILE RANGE
The range uses only two extreme items. Hence, any change in the in-between observations is not going to affect the range. This is a main drawback of range. Moreover, in many situations extreme items are widely separated from the remaining items.

Fig. 5.1: The middle 50% of the items lie between $Q_1$ and $Q_3$, with 25% of the items on either side.

In this situation range will overestimate the dispersion. Thus, range fails to give a true picture of dispersion. In order to overcome these drawbacks, the range of the middle 50% of the items is computed.
Clearly the middle 50% of the items lie between the two quartiles $Q_1$ and $Q_3$. The measure of dispersion based on the quartiles is given below:
$$\text{Quartile Deviation (Q.D.) or Semi-Interquartile Range} = \frac{Q_3 - Q_1}{2}$$
and the corresponding relative measure is
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

SOLVED EXAMPLES
Example 5.1: Compute (i) range and coefficient of range, (ii) quartile deviation and coefficient of quartile deviation for the following data:
100, 24, 14, 105, 21, 35, 106, 16, 100, 72, 68, 103, 61, 90, 20.
Solution: (i) Here, smallest observation (S) = 14
Largest observation (L) = 106
Range = L - S = 106 - 14 = 92
$$\text{Coefficient of range} = \frac{L - S}{L + S} = \frac{92}{106 + 14} = \frac{92}{120} = 0.7667$$
(ii) To find the quartile deviation, we arrange the observations in ascending order as follows:
14, 16, 20, 21, 24, 35, 61, 68, 72, 90, 100, 100, 103, 105, 106
$Q_1$ = the value of the $\left(\frac{n+1}{4}\right)$th $= \left(\frac{15+1}{4}\right)$th = 4th item in the ordered arrangement = 21
$Q_3$ = the value of the $\left(\frac{3(n+1)}{4}\right)$th $= \left(\frac{3 \times 16}{4}\right)$th = 12th item in the ordered arrangement = 100
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{100 - 21}{2} = 39.5$$
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{79}{121} = 0.6529$$
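The computations of Example 5.1 can be reproduced with a short Python sketch. The helper `quartile_by_position` is a hypothetical name; it reads off the item at position k(n + 1)/4 as in the solution, and the linear interpolation for fractional positions is an added assumption (the position is a whole number in this example, so it is not exercised here).

```python
def quartile_by_position(ordered, k):
    """k-th quartile as the item at 1-based position k(n + 1)/4 of sorted data.

    Fractional positions are handled by linear interpolation between
    neighbouring items (an assumption; other conventions exist).
    """
    n = len(ordered)
    position = k * (n + 1) / 4
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return float(ordered[0])
    if whole >= n:
        return float(ordered[-1])
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


data = [100, 24, 14, 105, 21, 35, 106, 16, 100, 72, 68, 103, 61, 90, 20]
ordered = sorted(data)

largest, smallest = ordered[-1], ordered[0]
data_range = largest - smallest
coeff_range = data_range / (largest + smallest)

q1 = quartile_by_position(ordered, 1)
q3 = quartile_by_position(ordered, 3)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(data_range, round(coeff_range, 4), q1, q3, qd, round(coeff_qd, 4))
```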
Example 5.2: Compute Q.D. and coefficient of Q.D. for the following frequency distribution.
Daily wages (in ₹)   below 35   35-40   40-45   45-50   50-55   55-60   60-65   above 65
No. of workers       12         18      22      26      36      23      19      8
Solution:

Class       Frequency   Less than type cumulative frequency
below 35    12          12
35-40       18          30
40-45       22          52     ← Q1 class
45-50       26          78
50-55       36          114
55-60       23          137    ← Q3 class
60-65       19          156
above 65    8           164

$Q_1$ = the value of the $\left(\frac{N}{4}\right)$th $= \left(\frac{164}{4}\right)$th = 41st observation.
Therefore, (40 - 45) is the $Q_1$ class.
$$Q_1 = l + \frac{N/4 - \text{c.f.}}{f} \times h = 40 + \frac{41 - 30}{22} \times 5 = 42.5$$
$Q_3$ = the value of the $\left(\frac{3N}{4}\right)$th $= \left(\frac{3 \times 164}{4}\right)$th = 123rd observation.
Therefore, $Q_3$ lies in the class (55 - 60).
$$Q_3 = l + \frac{3N/4 - \text{c.f.}}{f} \times h = 55 + \frac{123 - 114}{23} \times 5 = 56.9565$$
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{56.9565 - 42.5}{2} = 7.2283$$
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{56.9565 - 42.5}{56.9565 + 42.5} = 0.1454$$
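The interpolation used in Example 5.2 can be sketched in Python as follows. The function name `grouped_quantile` and the use of `None` for the unknown limits of the open-end classes are illustrative assumptions; the sketch simply refuses to interpolate if a quartile happens to fall in an open-end class.

```python
def grouped_quantile(lower_limits, widths, freqs, fraction):
    """Value below which `fraction` of the observations lie, using
    l + (fraction*N - c.f.) / f * h within the class containing it."""
    total = sum(freqs)
    target = fraction * total
    cumulative = 0  # c.f. of the classes before the current one
    for lower, width, freq in zip(lower_limits, widths, freqs):
        if cumulative + freq >= target:
            if lower is None or width is None:
                raise ValueError("quantile falls in an open-end class")
            return lower + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("fraction must lie between 0 and 1")


# Daily wages: below 35, 35-40, ..., 60-65, above 65 (open ends marked None).
lower_limits = [None, 35, 40, 45, 50, 55, 60, 65]
widths = [None, 5, 5, 5, 5, 5, 5, None]
freqs = [12, 18, 22, 26, 36, 23, 19, 8]

q1 = grouped_quantile(lower_limits, widths, freqs, 0.25)
q3 = grouped_quantile(lower_limits, widths, freqs, 0.75)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(q1, round(q3, 4), round(qd, 4), round(coeff_qd, 4))
```

Because only the quartile classes are used, the open-end classes cause no difficulty here, which is one reason Q.D. is usable for such distributions.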
Remark: One of the requisites of a good measure is that it should be based on all the observations. However, Q.D. depends upon only two partition values. Therefore, it is not affected by any changes except those in the upper and lower quartiles.
5.6 MEAN DEVIATION AND COEFFICIENT OF MEAN DEVIATION
A requirement of a good statistical measure is that it should be based on all the observations. It is not satisfied by the range and the quartile deviation. Here we discuss a measure of dispersion which takes into account all the observations. Naturally, the use of deviations taken from a certain point of reference is appropriate. Preferably we take deviations from the arithmetic mean (A.M.). We need to combine all these deviations into a single value. One of the appropriate techniques is to take their arithmetic mean. However, the sum of deviations taken from A.M. is zero. Therefore, A.M. of deviations fails to serve the purpose. A.M. behaves like a centre of gravity: it balances the positive and negative deviations, giving total zero. Hence, it is required to get rid of the algebraic signs of deviations. This can be done in two ways: (a) taking absolute deviations, (b) taking squares of the deviations.
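Definition: The arithmetic mean of absolute deviations from any average (mean or median or mode) is called the mean deviation about the respective average.
(i) Mean Deviation (M.D.) about mean:
$$\text{M.D. about mean} = \frac{1}{n}\sum_{i=1}^{n} |d_i| \quad \text{for individual observations, where } |d_i| = |x_i - \bar{x}|$$
$$\text{M.D. about mean} = \frac{1}{N}\sum_{i=1}^{k} f_i |d_i| \quad \text{for frequency distribution, where } N = \sum f_i$$
Relative measure of dispersion is
$$\text{Coefficient of M.D. about mean} = \frac{\text{M.D. about mean}}{\text{Mean}}$$
(ii) M.D. about median:
$$\text{M.D. about median} = \frac{1}{n}\sum_{i=1}^{n} |d_i| \quad \text{for individual observations, where } |d_i| = |x_i - \text{median}|$$
$$\text{M.D. about median} = \frac{1}{N}\sum_{i=1}^{k} f_i |d_i| \quad \text{for frequency distribution}$$
Relative measure of dispersion is
$$\text{Coefficient of M.D. about median} = \frac{\text{M.D. about median}}{|\text{Median}|}$$
(iii) M.D. about mode:
$$\text{M.D. about mode} = \frac{1}{n}\sum_{i=1}^{n} |d_i| \quad \text{for individual observations, where } |d_i| = |x_i - \text{mode}|$$
$$\text{M.D. about mode} = \frac{1}{N}\sum_{i=1}^{k} f_i |d_i| \quad \text{for frequency distribution}$$
Relative measure of dispersion is
$$\text{Coefficient of M.D. about mode} = \frac{\text{M.D. about mode}}{|\text{Mode}|}$$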
Computational Procedure:
Step 1: Obtain the required average (mean, median or mode).
Step 2: Obtain the absolute deviation $|d_i| = |x_i - \text{average}|$ for each observation.
Step 3: Find the sum of $|d_i|$ as $\sum |d_i|$ for individual observations and $\sum f_i |d_i|$ for frequency distribution.
Step 4: Compute M.D. as $\frac{\sum |d_i|}{n}$ for individual observations and $\frac{\sum f_i |d_i|}{N}$ for frequency distribution.
Step 5: Obtain the coefficient of M.D. (if required) by the formula $\frac{\text{M.D.}}{\text{Average}}$.
Example 5.3: Compute (i) M.D. about mean and coefficient of M.D. about mean, (ii) M.D. about median and coefficient of M.D. about median for the prices per 10 kg of sugar for 7 days in a certain week.
80, 82, 79, 78, 85, 80, 83.
Solution:
(i) Arithmetic mean $= \bar{x} = \frac{\sum x}{n} = \frac{567}{7} = 81$

x                80   82   79   78   85   80   83   Total
|d| = |x - 81|    1    1    2    3    4    1    2    14

$$\text{M.D. about mean} = \frac{\sum |d_i|}{n} = \frac{14}{7} = 2$$
$$\text{Coefficient of M.D. about mean} = \frac{\text{M.D.}}{\text{Mean}} = \frac{2}{81} = 0.0247$$
(ii) To find the median, we use the ordered arrangement:
78, 79, 80, 80, 82, 83, 85
Median = value of the $\left(\frac{n+1}{2}\right)$th, i.e. 4th observation = 80.

x                80   82   79   78   85   80   83   Total
|d| = |x - 80|    0    2    1    2    5    0    3    13

$$\text{M.D. about median} = \frac{\sum |d_i|}{n} = \frac{13}{7} = 1.8571$$
$$\text{Coefficient of M.D. about median} = \frac{\text{M.D.}}{\text{Median}} = \frac{1.8571}{80} = 0.0232$$
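The five-step computational procedure can be sketched in Python for individual observations. The function name `mean_deviation` is hypothetical; the data are the sugar prices of Example 5.3.

```python
from statistics import mean, median


def mean_deviation(data, centre):
    """Arithmetic mean of the absolute deviations of `data` from `centre`."""
    return sum(abs(x - centre) for x in data) / len(data)


prices = [80, 82, 79, 78, 85, 80, 83]

# Step 1: the required averages.
avg = mean(prices)
med = median(prices)

# Steps 2 to 4: absolute deviations, their sum, and the M.D.
md_mean = mean_deviation(prices, avg)
md_median = mean_deviation(prices, med)

# Step 5: the relative measures.
coeff_mean = md_mean / avg
coeff_median = md_median / med

print(md_mean, round(coeff_mean, 4), round(md_median, 4), round(coeff_median, 4))
```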

Example 5.4: Obtain M.D. about (i) mean, (ii) median and the relative measure of dispersion in each case for the following frequency distribution.
Class       2-4   4-6   6-8   8-10
Frequency   3     4     2     1
Solution: First we find the mean and the median, which we need for further calculations.

Class    Mid-value x_i   f_i   f_i x_i   Cumulative frequency
2-4      3               3     9         3
4-6      5               4     20        7
6-8      7               2     14        9
8-10     9               1     9         10
Total                    10    52

$$\text{Mean} = \frac{\sum f_i x_i}{N} = \frac{52}{10} = 5.2$$
Median = the size of the $\left(\frac{N}{2}\right)$th, i.e. 5th observation.
Median class: 4-6
$$\text{Median} = l + \frac{N/2 - \text{c.f.}}{f} \times h \qquad (l = 4,\ N/2 = 5,\ \text{c.f.} = 3,\ f = 4,\ h = 2)$$
$$\text{Median} = 4 + \frac{5 - 3}{4} \times 2 = 5$$

x_i     f_i   |x_i - mean|   f_i |x_i - mean|   |x_i - Me|   f_i |x_i - Me|
3       3     2.2            6.6                2            6
5       4     0.2            0.8                0            0
7       2     1.8            3.6                2            4
9       1     3.8            3.8                4            4
Total   10                   14.8                            14

$$\text{M.D. about mean} = \frac{\sum f_i |x_i - \bar{x}|}{N} = \frac{14.8}{10} = 1.48$$
$$\text{Coefficient of M.D. about mean} = \frac{\text{M.D.}}{\text{Mean}} = \frac{1.48}{5.2} = 0.2846$$
$$\text{M.D. about median} = \frac{\sum f_i |x_i - \text{Me}|}{N} = \frac{14}{10} = 1.4$$
$$\text{Coefficient of M.D. about median} = \frac{1.4}{5} = 0.28$$
Minimality property of M.D.: Among all mean deviations, the mean deviation about median is minimum. Therefore, in order to avoid the effect of choice of average, mean deviation about median is preferred.
Merits of M.D.:
It is simple to understand and easy to calculate.
It is rigidly defined.
It is based on all observations.
Demerits of M.D. :
It is not applicable for qualitative data.
Since algebraic signs of deviations are ignored, it is not applicable for further mathematical treatment.
It cannot be computed for a frequency distribution with an open end class.
The serious drawback mentioned in demerit (2) of M.D. can be overcome by taking squares of the deviations. Based on it, a measure of dispersion is defined and is discussed below:
5.7 MEAN SQUARE DEVIATION
Suppose $d = x - a$ is the deviation taken from an arbitrary reference point 'a'. To get rid of the algebraic sign of d, we either use $|d|$ or $d^2$. The measure of dispersion based on $|d|$, viz. mean deviation, we have already studied. Using $d^2$ we can develop a measure of dispersion which is better than mean deviation. The arithmetic mean of $d^2$ is used as a measure of deviation. It is known as mean square deviation (M.S.D.).
Clearly,
$$\text{M.S.D.} = \frac{1}{n}\sum_{i=1}^{n} (x_i - a)^2 \quad \text{for individual observations}$$
$$\text{M.S.D.} = \frac{\sum_{i=1}^{k} f_i (x_i - a)^2}{\sum_{i=1}^{k} f_i} \quad \text{for frequency distribution}$$
However, M.S.D. is affected by the choice of a. Thus, it creates difficulty in measuring the dispersion properly. We need a measure which will overcome this difficulty.
We have studied the properties of arithmetic mean. One of the properties states that the sum of squares of deviations taken from arithmetic mean is the minimum. Using this fact we get the minimal property of M.S.D. It will enable us to develop a measure of dispersion.
Minimality property of M.S.D.: Mean square deviation is the least if the deviations are taken from arithmetic mean.
Since the sum of squares of deviations taken from arithmetic mean is the minimum, we get
$$\frac{\sum (x_i - a)^2}{n} \ge \frac{\sum (x_i - \bar{x})^2}{n}$$
i.e. M.S.D. about a $\ge$ M.S.D. about $\bar{x}$.
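The minimality property can be made explicit with a short derivation. This is a sketch that uses only the fact, noted earlier in the chapter, that the deviations taken from the arithmetic mean sum to zero; writing $x_i - a$ as $(x_i - \bar{x}) + (\bar{x} - a)$ is the only step added here.

```latex
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}(x_i-a)^2
  &= \frac{1}{n}\sum_{i=1}^{n}\bigl[(x_i-\bar{x})+(\bar{x}-a)\bigr]^2 \\
  &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
     + \frac{2(\bar{x}-a)}{n}\sum_{i=1}^{n}(x_i-\bar{x})
     + (\bar{x}-a)^2 \\
  &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 + (\bar{x}-a)^2
     \qquad \text{since } \sum_{i=1}^{n}(x_i-\bar{x}) = 0 \\
  &\ge \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 .
\end{align*}
```

Equality holds only when $a = \bar{x}$, so the M.S.D. about any other point exceeds the M.S.D. about the mean by exactly $(\bar{x} - a)^2$.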
5.8 VARIANCE, STANDARD DEVIATION AND COEFFICIENT OF VARIATION
The lower bound of M.S.D. is taken as a measure of dispersion. It is called variance.
Definition: The arithmetic mean of squares of deviations taken from arithmetic mean is called variance.
Clearly,
$$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{for individual observations}$$
$$\text{Variance} = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i} \quad \text{for frequency distribution}$$
Note: Symbolically we write the variance of x as Var(x). The term 'variance' was suggested by R. A. Fisher.
Remark: The units of the original items and those of the variance are not the same. For example, if items are measured in cm, then the variance will be expressed in (cm)². Therefore we take the positive square root of variance. It is called standard deviation or least root mean square deviation.
Definition: The positive square root of the mean of squares of the deviations taken from arithmetic mean is called Standard Deviation (S.D.).
It is denoted by $\sigma$ (read as sigma, a lower case Greek letter).
Therefore,
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{for individual observations}$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i}} \quad \text{for frequency distribution}$$
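For computational purposes the above formulae can be simplified as follows:

Case (i): Individual observations
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right)$$
$$= \frac{1}{n}\left[\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2\right]$$
$$= \frac{\sum x_i^2}{n} - 2\bar{x}^2 + \bar{x}^2 \qquad \left(\because \sum x_i = n\bar{x}\right)$$
$$= \frac{\sum x_i^2}{n} - \bar{x}^2$$

Case (ii): Frequency distribution
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i} = \frac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 \qquad \left(\text{let } N = \sum_{i=1}^{k} f_i\right)$$
$$= \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - 2\bar{x}\sum_{i=1}^{k} f_i x_i + \bar{x}^2 \sum_{i=1}^{k} f_i\right]$$
$$= \frac{\sum f_i x_i^2}{N} - 2\bar{x}\,\frac{\sum f_i x_i}{N} + \bar{x}^2$$
$$= \frac{\sum f_i x_i^2}{N} - 2\bar{x}^2 + \bar{x}^2 = \frac{\sum f_i x_i^2}{N} - \bar{x}^2$$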
Standard deviation is a measure of dispersion which satisfies most of the requisites of a good measure. It is free from the drawbacks present in the other measures of dispersion.
Coefficient of Variation: Prof. Karl Pearson suggested the relative measure of standard deviation. It is called coefficient of variation (C.V.).
It is given by
$$\text{C.V.} = \frac{\text{S.D.}}{|\text{A.M.}|} \times 100 = \frac{\sigma}{|\bar{x}|} \times 100\% \qquad \ldots (1)$$
Coefficient of variation is always expressed in percentage.
Remarks:
1. R.H.S. of (1) includes the multiplier 100, because $\sigma / \bar{x}$ is too small in many cases. Thus, for convenience it is multiplied by 100.
2. Frequently we need to compare the dispersions of two or more groups. If the values in a data set are large in magnitude, naturally the variation among them will be proportionately larger. For example, S.D. of weights of a group of elephants will be larger than that of a group of human beings. Suppose S.D. of weights of a group of elephants is 15 kg and that of human beings is also 15 kg. In this case we cannot say that both the groups have identical variation. This is because the average weight of a group of elephants is larger than the average weight of a group of persons. Therefore, for comparing variations between two different data sets, a measure based on the ratio of $\sigma$ and $\bar{x}$ would be appropriate. This is achieved in the coefficient of variation. It measures variation in all data sets using a common yardstick; moreover, it is free from units.
3. According to Prof. Karl Pearson, C.V. is the percentage variation in the mean whereas S.D. gives the total variation in the mean.
C.V. and Least Count:
Using a proper measuring instrument is also a way to check whether C.V. is maintained properly. If an appropriate instrument is not used, C.V. will be inflated. As a thumb rule in industry,
$$\text{Least count} \le \frac{1}{10} \times \text{specified range}.$$
For example, if the inner diameter of a cylinder is required to be between 0.95 cm and 1.05 cm, the least count of the gauge should be $\frac{1}{10}$th of the specified range, which is $\frac{1}{10}(1.05 - 0.95) = 0.01$ cm = 0.1 mm.
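Properties of Variance and S.D.:
1. Mean square deviation $\ge$ Variance.
2. Effect of change of origin: Variance (S.D.) is invariant to the change of origin. In other words, if a constant is added to (or subtracted from) each item, the variance (S.D.) remains the same.
Proof: Case (i) Individual observations: Suppose $x_1, x_2, \ldots, x_n$ is a set of observations. Let $y_i = x_i - a$ where 'a' is a constant. We have to show that
$$\text{Var}(y) = \text{Var}(x) \quad \text{or} \quad \sigma_y = \sigma_x$$
Since $y_i = x_i - a$, we get $\bar{y} = \bar{x} - a$.
By definition,
$$\text{Var}(y) = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} \left[(x_i - a) - (\bar{x} - a)\right]^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \text{Var}(x)$$
Case (ii) Frequency distribution: Suppose $\{(x_i, f_i),\ i = 1, 2, \ldots, k\}$ is a frequency distribution. Let $y_i = x_i - a$, hence $\bar{y} = \bar{x} - a$.
By definition,
$$\text{Var}(y) = \frac{1}{N}\sum_{i=1}^{k} f_i (y_i - \bar{y})^2, \quad \text{where } N = \sum_{i=1}^{k} f_i$$
$$= \frac{1}{N}\sum_{i=1}^{k} f_i \left[(x_i - a) - (\bar{x} - a)\right]^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \text{Var}(x)$$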
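3. Effect of change of origin and scale:
If $u = \frac{x - a}{h}$, a and h being constants, then $\text{Var}(u) = \frac{1}{h^2}\text{Var}(x)$ or $\sigma_u = \frac{1}{|h|}\sigma_x$.
Proof: Since $u = \frac{x - a}{h}$, we get $\bar{u} = \frac{\bar{x} - a}{h}$.
For a frequency distribution $\{(x_i, f_i),\ i = 1, 2, \ldots, k\}$,
$$\text{Var}(u) = \frac{1}{N}\sum_{i=1}^{k} f_i (u_i - \bar{u})^2, \quad \text{where } N = \sum_{i=1}^{k} f_i$$
$$= \frac{1}{N}\sum_{i=1}^{k} f_i \left[\frac{x_i - a}{h} - \frac{\bar{x} - a}{h}\right]^2 = \frac{1}{h^2 N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{1}{h^2}\text{Var}(x)$$
$$\therefore\ \sigma_u = \frac{1}{|h|}\sigma_x$$
Note:
(a) The properties (2) and (3) simplify the computations of variance and S.D. to a large extent.
(b) If we define $y = ax + b$, then $\text{Var}(y) = a^2\,\text{Var}(x)$ or $\sigma_y = |a|\,\sigma_x$.
(c) In property (3), if we take h = 1, then we get $u = x - a$. This amounts to a change of origin and not a change of scale.
(d) In property (3), if we take a = 0, then we get $u = x/h$. It is equivalent to a change of scale only.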
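Properties (2) and (3) can be checked numerically. The following Python sketch uses an arbitrary set of ten marks and illustrative constants a = 57 and h = 2 (both chosen here for the check only), together with `statistics.pvariance` and `statistics.pstdev`, which divide by n as in the definitions of this chapter.

```python
from math import isclose
from statistics import pstdev, pvariance

x = [54, 61, 64, 69, 58, 56, 49, 57, 55, 50]
a, h = 57, 2  # illustrative origin and scale

y = [xi - a for xi in x]        # change of origin only
u = [(xi - a) / h for xi in x]  # change of origin and scale
w = [-3 * xi + 5 for xi in x]   # y = ax + b with a negative multiplier

var_x = pvariance(x)
print(var_x, pvariance(y), pvariance(u) * h**2, pstdev(w) / pstdev(x))
```

The last printed ratio illustrates why the S.D. scales with the absolute value of the multiplier: a standard deviation is never negative.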
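4. Combined Variance and S.D.:
Suppose there are two groups. The first is of size $n_1$ with arithmetic mean $\bar{x}_1$ and variance $\sigma_1^2$. The second group is of size $n_2$ with arithmetic mean $\bar{x}_2$ and variance $\sigma_2^2$. Then the variance of the combined group of size $n_1 + n_2$ is given by
$$\sigma_c^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$$
where $d_1 = \bar{x}_1 - \bar{x}_c$, $d_2 = \bar{x}_2 - \bar{x}_c$ and $\bar{x}_c$ is the combined arithmetic mean.
Generalization: Let there be k groups ($k \ge 2$) with the size of the i-th group as $n_i$, arithmetic mean $\bar{x}_i$ and variance $\sigma_i^2$, $i = 1, 2, 3, \ldots, k$. The combined variance of the k groups is given by
$$\sigma_c^2 = \frac{\sum_{i=1}^{k} n_i(\sigma_i^2 + d_i^2)}{\sum_{i=1}^{k} n_i}$$
where $d_i = \bar{x}_i - \bar{x}_c$ and $\bar{x}_c$ = combined arithmetic mean.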


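5. S.D. $\ge$ M.D. about arithmetic mean:
Proof: Suppose $x_1, x_2, \ldots, x_n$ are the observations with mean $\bar{x}$. Let $d_i = x_i - \bar{x}$; then $\sigma = \sqrt{\sum d_i^2 / n}$ and M.D. about mean $= \frac{\sum |d_i|}{n}$.
Let $y_i = |d_i|$.
Note that $\sum_{i=1}^{n} (y_i - \bar{y})^2 \ge 0$, being a sum of squares.
$$\sum y_i^2 - 2\bar{y}\sum y_i + \sum (\bar{y})^2 \ge 0$$
Dividing by n we get,
$$\frac{\sum y_i^2}{n} - 2(\bar{y})^2 + (\bar{y})^2 \ge 0$$
$$\frac{\sum y_i^2}{n} \ge (\bar{y})^2$$
$$\sqrt{\frac{\sum d_i^2}{n}} \ge \frac{\sum |d_i|}{n} \qquad \left(\because y_i^2 = d_i^2 \text{ and } \bar{y} = \frac{\sum |d_i|}{n}\right)$$
$$\sigma \ge \text{M.D.}$$
$\therefore$ S.D. $\ge$ M.D. about arithmetic mean.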

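Example 5.5: Compute S.D. and C.V. for the following data: 36, 15, 25, 10, 14.
Solution: We use the computational formula.

x      36     15    25    10    14    Total = 100
x²     1296   225   625   100   196   Total = 2442

$$\bar{x} = \frac{\sum x_i}{n} = \frac{100}{5} = 20$$
$$\sigma = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} = \sqrt{\frac{2442}{5} - 20^2} = \sqrt{88.4} = 9.4021$$
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 = \frac{9.4021}{20} \times 100 = 47.0106\%$$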
Compute S.D. and C.V. for the above data using MS-Excel.
Solution: Using the command =STDEVP(range), in this case type =STDEVP(A5:A9), we get S.D.
$$\text{C.V.} = \frac{\text{S.D.}}{\text{Mean}} \times 100\%$$
Fig. 5.2: MS-Excel worksheet "Find S.D. and C.V. of given data" with the data in cells A5:A9, showing Mean = 20, S.D. = 9.40212742 (from =STDEVP(A5:A9)) and C.V. = 47.0106371.

This is not possible for a frequency distribution; therefore we use a worksheet.
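For readers without MS-Excel, the same figures can be obtained with Python's standard library. This is a minimal sketch; `statistics.pstdev` divides by n, which is the same convention as STDEVP and as the definition of S.D. used in this chapter.

```python
from statistics import mean, pstdev

data = [36, 15, 25, 10, 14]

avg = mean(data)
sd = pstdev(data)      # population S.D. (divides by n)
cv = sd / avg * 100    # coefficient of variation in percent

print(avg, round(sd, 4), round(cv, 4))
```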
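Example 5.6: Compute S.D. of the first n natural numbers.
Solution: We have to find S.D. of 1, 2, 3, ..., n.

x      1     2     ...   n     Total = n(n + 1)/2
x²     1²    2²    ...   n²    Total = n(n + 1)(2n + 1)/6

$$\bar{x} = \frac{n(n+1)/2}{n} = \frac{n+1}{2}$$
$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{n(n+1)(2n+1)}{n \times 6} - \frac{(n+1)^2}{4}$$
$$= \frac{n+1}{12}\left[2(2n+1) - 3(n+1)\right] = \frac{n+1}{12}\left[4n + 2 - 3n - 3\right] = \frac{n^2 - 1}{12}$$
$$\sigma = \sqrt{\frac{n^2 - 1}{12}}$$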

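Example 5.7: Compute S.D. and C.V. of marks scored by 10 candidates given below:
54, 61, 64, 69, 58, 56, 49, 57, 55, 50.
Solution: Let a = 57, u = x - 57.

x     54   61   64   69    58   56   49   57   55   50   Total
u     -3   4    7    12    1    -1   -8   0    -2   -7   3
u²    9    16   49   144   1    1    64   0    4    49   337

$$\sigma_x = \sigma_u = \sqrt{\frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2} = \sqrt{\frac{337}{10} - \left(\frac{3}{10}\right)^2} = \sqrt{33.61} = 5.7974$$
To compute C.V. we require $\bar{x}$. Note that u = x - 57, so
$$\bar{x} = \bar{u} + 57 = \frac{\sum u}{n} + 57 = \frac{3}{10} + 57 = 57.3$$
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 = \frac{5.7974}{57.3} \times 100 = 10.1176\%$$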
Example 5.8: Calculate the standard deviation and coefficient of variation for the frequency distribution of marks of 100 candidates given below:
Marks       0-20   20-40   40-60   60-80   80-100
Frequency   5      12      32      40      11

Solution: Let $u = \frac{x - 50}{20}$.

Class     Mid-value x_i   Frequency f_i   u_i = (x_i - 50)/20   f_i u_i   f_i u_i²
0-20      10              5               -2                    -10       20
20-40     30              12              -1                    -12       12
40-60     50              32              0                     0         0
60-80     70              40              1                     40        40
80-100    90              11              2                     22        44
Total                     100                                   40        116

Here, $u = \frac{x - 50}{20}$, therefore $x = 50 + 20u$.
$$\bar{x} = 50 + 20\bar{u} = 50 + 20\,\frac{\sum f_i u_i}{N} = 50 + 20 \times \frac{40}{100} = 58$$
$$\text{Var}(x) = 20^2\,\text{Var}(u) = 400\left[\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2\right] = 400\left[\frac{116}{100} - \left(\frac{40}{100}\right)^2\right] = 400$$
$$\text{S.D. of } x = \sqrt{\text{Var}(x)} = \sqrt{400} = 20$$
$$\text{C.V. of } x = \frac{\text{S.D.}}{\text{A.M.}} \times 100 = \frac{20}{58} \times 100 = 34.4828\%$$
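The step-deviation method of Example 5.8 can be sketched in Python as follows. The variable names are illustrative; the origin a = 50 and scale h = 20 are those used in the solution.

```python
from math import sqrt

mid_values = [10, 30, 50, 70, 90]
freqs = [5, 12, 32, 40, 11]
a, h = 50, 20  # origin and scale used in the solution

u = [(x - a) / h for x in mid_values]  # coded values u = (x - 50)/20
N = sum(freqs)
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui**2 for f, ui in zip(freqs, u))

mean_x = a + h * sum_fu / N                         # x-bar = a + h * u-bar
var_x = h**2 * (sum_fu2 / N - (sum_fu / N) ** 2)    # Var(x) = h^2 Var(u)
sd_x = sqrt(var_x)
cv_x = sd_x / mean_x * 100

print(sum_fu, sum_fu2, mean_x, round(sd_x, 4), round(cv_x, 4))
```

Working with the small coded values u keeps the column totals easy to check by hand, which is the point of properties (2) and (3).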

Merits of S.D. :
It is based on all observations.
It is rigidly defined.
It is capable of further mathematical treatment.
It does not ignore algebraic signs of deviations.
It is not much affected by sampling fluctuations.
Demerits of S.D.:
It is difficult to understand and to calculate.
It cannot be computed for a distribution with open-end class.
It is unduly affected due to extreme deviations.
It cannot be calculated for qualitative data.
Use of Variance and S.D.:
Practically, almost all advanced statistical methods such as sampling, statistical quality control and statistical inference deal with variance.
As far as variance is concerned, smaller variance is better in many situations. However, there are some situations in genetical sciences where larger variance is better.
Variance and standard deviation are used in a number of situations. Some of them are discussed below:
(a) Precision of an instrument is inversely proportional to variance. Therefore, precision $= \frac{k}{\text{variance}}$.
(b) In portfolio analysis, risk is described in terms of the variance of prices of shares.
(c) For the comparison of performance of two or more instruments or machines, coefficient of variation is used.
(d) The spread of a variable is approximately taken as $(\bar{x} - 3\sigma,\ \bar{x} + 3\sigma)$.
Thus, standard deviation helps in estimating the lower limit and upper limit of the items.
We state below the notes, which will be useful in solving numerical problems.
Note:
1. If all the observations are equal, S.D. is zero (why?)
2. If data contains only one observation, S.D. is zero (why?)
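Example 5.9: A group of 50 items has mean and S.D. 61 and 8 respectively. Another group of 100 observations has mean and S.D. 70 and 9 respectively. Find the mean and S.D. of the combined group.
Solution: We are given that $n_1 = 50$, $\bar{x}_1 = 61$, $\sigma_1 = 8$, $n_2 = 100$, $\bar{x}_2 = 70$ and $\sigma_2 = 9$. Therefore the combined mean is
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{(50 \times 61) + (100 \times 70)}{50 + 100} = 67$$
$\therefore\ d_1 = \bar{x}_1 - \bar{x}_c = 61 - 67 = -6$ and $d_2 = \bar{x}_2 - \bar{x}_c = 70 - 67 = 3$.
Combined S.D. is
$$\sigma_c = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}} = \sqrt{\frac{50(64 + 36) + 100(81 + 9)}{150}} = 9.6609$$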
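The combined mean and S.D. formula of property 4 extends directly to k groups, so it can be written once as a small Python function. The name `combine_groups` and the (size, mean, S.D.) tuple layout are illustrative choices.

```python
from math import sqrt


def combine_groups(groups):
    """Combined (mean, S.D.) of groups given as (size, mean, S.D.) tuples."""
    total = sum(n for n, _, _ in groups)
    mean_c = sum(n * m for n, m, _ in groups) / total
    # n_i * (sigma_i^2 + d_i^2) with d_i = mean_i - combined mean
    var_c = sum(n * (s**2 + (m - mean_c) ** 2) for n, m, s in groups) / total
    return mean_c, sqrt(var_c)


# Example 5.9: 50 items (mean 61, S.D. 8) and 100 items (mean 70, S.D. 9).
mean_c, sd_c = combine_groups([(50, 61, 8), (100, 70, 9)])
print(mean_c, round(sd_c, 4))

# A single extra observation is a group of size 1 with S.D. 0:
# 10 observations (mean 9.5, S.D. 2.5) plus one observation equal to 15.
mean_11, sd_11 = combine_groups([(10, 9.5, 2.5), (1, 15, 0)])
print(mean_11, round(sd_11, 4))
```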

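Example 5.10: The mean weight of 150 students is 60 kg. The mean weight of boys is 70 kg with S.D. of 10 kg. For girls the mean weight is 55 kg with S.D. of 15 kg. Find the number of boys and the combined S.D.
Solution: Let there be $n_1$ boys with mean $\bar{x}_1$ and S.D. $\sigma_1$. Similarly, there are $n_2$ girls with mean $\bar{x}_2$ and S.D. $\sigma_2$. Hence, we get $n_1 + n_2 = 150$, $\bar{x}_c = 60$, $\bar{x}_1 = 70$, $\bar{x}_2 = 55$, $\sigma_1 = 10$, $\sigma_2 = 15$.
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
$$60 = \frac{70n_1 + 55n_2}{n_1 + n_2}$$
$$60n_1 + 60n_2 = 70n_1 + 55n_2$$
$$n_2 = 2n_1 \qquad \ldots (1)$$
$$n_1 + n_2 = 150$$
$$n_1 + 2n_1 = 150 \qquad \ldots \text{from (1)}$$
$$n_1 = 50$$
Number of boys = 50, and hence the number of girls $n_2 = 100$.
We get $d_1 = \bar{x}_1 - \bar{x}_c = 70 - 60 = 10$ and $d_2 = \bar{x}_2 - \bar{x}_c = 55 - 60 = -5$.
$$\text{Combined S.D.} = \sigma_c = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}} = \sqrt{\frac{50(100 + 100) + 100(225 + 25)}{150}} = 15.2753 \text{ kg}.$$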
Example 5.11: The mean and S.D. of 10 observations were 9.5 and 2.5 respectively. If one more observation with value 15 is included in the group, obtain the mean and S.D. of these 11 observations.
Solution: Let there be two groups, the first group of the original 10 observations and the second group of the new single observation.
Hence,
$n_1 = 10$, $n_2 = 1$
$\bar{x}_1 = 9.5$, $\bar{x}_2 = 15$ (why?)
$\sigma_1 = 2.5$, $\sigma_2 = 0$ (why?)
$$\text{Combined mean} = \bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{10 \times 9.5 + 15}{11} = 10$$
$d_1 = \bar{x}_1 - \bar{x}_c = -0.5$ and $d_2 = \bar{x}_2 - \bar{x}_c = 5$
$$\sigma_c = \sqrt{\frac{10(6.25 + 0.25) + (0 + 25)}{11}} = 2.8604$$

Example 5.12: The number of runs scored by cricketers A and B in 10 test matches are shown below:
A    5    20   90   76   102   90   6    108   20   16
B    40   35   60   62   58    76   42   30    30   20

Find (i) which cricketer is better in average? (ii) which cricketer is more consistent?
Solution:
$$\text{Mean of A} = \frac{\sum x}{n} = \frac{533}{10} = 53.3$$
$$\text{S.D. of A} = \sqrt{\frac{45161}{10} - (53.3)^2} = 40.9293$$
C.V. of A = 76.79%
$$\text{Mean of B} = \frac{\sum x}{n} = \frac{453}{10} = 45.3$$
$$\text{S.D. of B} = \sqrt{\frac{23373}{10} - (45.3)^2} = 16.8882$$
C.V. of B = 37.28%
(i) A gives better average runs (mean of A > mean of B).
(ii) B is more consistent (C.V. of B < C.V. of A).
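The comparison in Example 5.12 can be reproduced with a few lines of Python. The helper name `coefficient_of_variation` is illustrative; it uses the population S.D. as defined in this chapter.

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """C.V. in percent, using the population S.D. (divides by n)."""
    return pstdev(data) / mean(data) * 100


runs_a = [5, 20, 90, 76, 102, 90, 6, 108, 20, 16]
runs_b = [40, 35, 60, 62, 58, 76, 42, 30, 30, 20]

cv_a = coefficient_of_variation(runs_a)
cv_b = coefficient_of_variation(runs_b)

better_average = "A" if mean(runs_a) > mean(runs_b) else "B"
more_consistent = "A" if cv_a < cv_b else "B"

print(mean(runs_a), round(cv_a, 2), mean(runs_b), round(cv_b, 2))
print(better_average, more_consistent)
```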
Example 5.13: Arithmetic mean and S.D. of 12 items are 22 and 3 respectively. Later on it was observed that the item 32 was wrongly taken as 23. Compute the correct mean, S.D. and C.V.
Solution:
Incorrect sum $\left(\sum x\right)$ = n × Incorrect mean = 12 × 22 = 264.
Correct $\sum x$ = Incorrect $\sum x$ + Correct item - Incorrect item
$\sum x$ = 264 - 23 + 32 = 273
$$\text{Correct mean} = \frac{273}{12} = 22.75$$
Incorrect $\sum x^2 = n\left(\sigma^2 + \bar{x}^2\right)$ with $\sigma$ and $\bar{x}$ incorrect
= 12 (9 + 484) = 5916
Correct $\sum x^2$ = Incorrect $\sum x^2$ + (Correct item)² - (Incorrect item)²
= 5916 + 32² - 23² = 6411
$$\text{Correct } \sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} \quad \text{with correct } \sum x^2 \text{ and } \sum x$$
$$= \sqrt{\frac{6411}{12} - (22.75)^2} = \sqrt{16.6875} = 4.0850$$
$$\text{Correct C.V.} = \frac{4.0850}{22.75} \times 100 = 17.9562\%$$
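The correction procedure of Example 5.13 works from the totals alone, so it can be sketched as a small Python function. The name `corrected_stats` and its parameter names are illustrative.

```python
from math import sqrt


def corrected_stats(n, mean, sd, wrong, right):
    """Mean, S.D. and C.V. after replacing one wrongly recorded item."""
    # Recover the totals implied by the incorrect mean and S.D., then fix them.
    sum_x = n * mean - wrong + right
    sum_x2 = n * (sd**2 + mean**2) - wrong**2 + right**2
    new_mean = sum_x / n
    new_sd = sqrt(sum_x2 / n - new_mean**2)
    return new_mean, new_sd, new_sd / new_mean * 100


mean_c, sd_c, cv_c = corrected_stats(12, 22, 3, wrong=23, right=32)
print(mean_c, round(sd_c, 4), round(cv_c, 4))
```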

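Example 5.14: For a set of 90 items the mean and S.D. are 59 and 9 respectively. For 40 items selected from those 90 items the mean and S.D. are 54 and 6 respectively. Find the mean and S.D. of the remaining items.
Solution:
Group 1               Group 2               Combined Group
$n_1 = 40$            $n_2 = 50$            $n = 90$
$\bar{x}_1 = 54$      $\bar{x}_2 = ?$       $\bar{x}_c = 59$
$\sigma_1 = 6$        $\sigma_2 = ?$        $\sigma_c = 9$
To find $\bar{x}_2$ we use $\bar{x}_c$:
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \quad \text{gives} \quad 59 = \frac{40 \times 54 + 50\,\bar{x}_2}{90}$$
$$\bar{x}_2 = 63$$
$d_1 = \bar{x}_1 - \bar{x}_c = -5$, $d_2 = \bar{x}_2 - \bar{x}_c = 4$
$$\sigma_c^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$$
$$81 = \frac{40(36 + 25) + 50(\sigma_2^2 + 16)}{90}$$
$$\sigma_2 = 9.$$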
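Example 5.15: Given that n = 10, $\sum (x - 20) = 8$, $\sum (x - 20)^2 = 762$. Find the mean and S.D.
Solution: Let u = x - 20. Hence
$$\bar{x} = 20 + \frac{\sum u}{n} = 20 + \frac{8}{10} = 20.8$$
$$\text{S.D.} = \sqrt{\frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2} = \sqrt{\frac{762}{10} - \left(\frac{8}{10}\right)^2} = 8.6925$$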
Example 5.16: A variable takes values a - kd, a - (k - 1)d, ..., a - d, a, a + d, ..., a + kd. Find its M.D. about arithmetic mean and S.D.
Solution: First of all we find the A.M.
$$\bar{x} = \frac{(a - kd) + (a - (k-1)d) + \ldots + (a - d) + a + (a + d) + \ldots + (a + kd)}{2k + 1} = \frac{(2k + 1)a}{2k + 1} = a$$
We need to compute $(x_i - \bar{x})$. It is useful for M.D. as well as S.D. Therefore, $\sigma = \sqrt{\sum (x_i - \bar{x})^2 / n}$ is a convenient formula for computations.

x_i             |d_i| = |x_i - x̄| = |x_i - a|    d_i²
a - kd          kd                                k²d²
a - (k-1)d      (k-1)d                            (k-1)²d²
...             ...                               ...
a - d           d                                 d²
a               0                                 0
a + d           d                                 d²
...             ...                               ...
a + (k-1)d      (k-1)d                            (k-1)²d²
a + kd          kd                                k²d²

$$\text{M.D. about } \bar{x} = \frac{\sum |d_i|}{n} = \frac{2(d + 2d + \ldots + kd)}{2k + 1} = \frac{k(k + 1)d}{2k + 1}$$
$$\text{S.D.} = \sqrt{\frac{\sum d_i^2}{n}} = \sqrt{\frac{2d^2(1^2 + 2^2 + \ldots + k^2)}{2k + 1}} = \sqrt{\frac{2d^2\,k(k + 1)(2k + 1)}{6(2k + 1)}} = d\sqrt{\frac{k(k + 1)}{3}}$$
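Example 5.17: Suppose $x_1, x_2, \ldots, x_n$ are the observations. Show that
$$\frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2 = \sigma^2$$
Solution: Suppose $\bar{x}$ is the A.M. Let us evaluate
$$\frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2 = \frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \left[(x_i - \bar{x}) - (x_j - \bar{x})\right]^2$$
$$= \frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \left[(x_i - \bar{x})^2 + (x_j - \bar{x})^2 - 2(x_i - \bar{x})(x_j - \bar{x})\right]$$
$$= \frac{1}{2n^2}\left[n\sum_{i=1}^{n} (x_i - \bar{x})^2 + n\sum_{j=1}^{n} (x_j - \bar{x})^2 - 2\sum_{i=1}^{n} (x_i - \bar{x})\sum_{j=1}^{n} (x_j - \bar{x})\right]$$
$$= \frac{\sum (x_i - \bar{x})^2}{2n} + \frac{\sum (x_j - \bar{x})^2}{2n} - \frac{2}{2n^2} \times 0 \times 0$$
$$= \frac{\sigma^2}{2} + \frac{\sigma^2}{2} - 0 = \sigma^2$$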
CASE STUDY
Parag Infotech Pvt. Ltd. is a company that provides software solutions. The directors of the company have taken a decision to double the capital and expand it in a big way. In view of this, the company decides to recruit at least 50 computer engineers. The company invited applications from fresh computer engineering graduates having at least 70% marks at their final examination. The company also expected applicants to furnish details of marks obtained from their SSC examination onwards. The company received 200 applications. Most of the applicants have secured marks between 70% and 73% in their final examination. Due to shortage of time to recruit, the company is not interested in conducting personal interviews of all the applicants, but wants to select 70 of the best applicants for personal interview for the final selection. The company feels that 2% to 5% variation in final examination marks may be due to chance and has no effect on performance.
Statisticians have advised the company to use the concept of measures of dispersion. Discuss the use of range and standard deviation in this regard to take the proper decision.
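POINTS TO REMEMBER
Range = Largest observation - Smallest observation.
$$\text{Coefficient of range} = \frac{\text{Largest observation} - \text{Smallest observation}}{\text{Largest observation} + \text{Smallest observation}}$$
$$\text{Standard deviation (S.D.)} = \sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \quad \text{for discrete series}$$
$$\sigma = \sqrt{\frac{\sum f x^2}{N} - \bar{x}^2} \quad \text{for frequency distribution}$$
$$\text{Coefficient of variation (C.V.)} = \frac{\sigma}{\bar{x}} \times 100\%.$$
C.V. is used for the comparison of variation.
$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$$
$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
$$\text{M.D. about } m = \frac{\sum f\,|x - m|}{N}, \quad \text{where } m \text{ is mean or median or mode.}$$
$$\text{Coefficient of mean deviation about } m = \frac{\text{M.D. about } m}{m}$$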

EXERCISE
(A) Theory Questions :
1. What is dispersion? What purpose does it serve in the study of distribution?
2. What type of measures will you use for comparison of dispersion in different distributions? Mention any two such measures.
3. Explain relative measure of dispersion and state its utility.
4. Define: Range, Quartile deviation, Mean deviation and Standard deviation. State the formula for each in case of ungrouped data and frequency distribution.
5. Compare critically the measures of dispersion: (i) range and Q.D., (ii) M.D. and S.D.
6. State the merits and demerits of each of the following measures of dispersion: (i) range, (ii) M.D., (iii) S.D.
7. Explain why S.D. is the best measure of dispersion.
8. What is the utility of C.V.?
9. Show that S.D. $\ge$ M.D. about arithmetic mean.
10. Show that mean squared deviation is greater than or equal to variance.
11. State and prove the minimal property of mean squared deviation.
12. Discuss the effect of change of origin on variance and S.D.
13. Discuss the effect of change of scale on variance and S.D.
14. Suggest a suitable measure of dispersion if (i) the frequency distribution includes an open end class; (ii) the data are qualitative.
15. Show that all the measures of dispersion are invariant to the change of origin.
16. Define deviation about $\bar{x}$ and explain how it can be used to measure dispersion.
17. Given the size, A.M. and S.D. of each of k ($k \ge 2$) groups, state the formula for combined S.D.
18. Two groups of $n_1$ and $n_2$ observations have the arithmetic means $\bar{x}_1$ and $\bar{x}_2$, and the standard deviations $\sigma_1$ and $\sigma_2$ respectively. State the formula for combined S.D. Also discuss the cases for combined S.D.: (i) $\bar{x}_1 = \bar{x}_2$, (ii) $n_1 = n_2$, (iii) $n_1 = n_2$ and $\bar{x}_1 = \bar{x}_2$, (iv) $n_1 = n_2$, $\bar{x}_1 = \bar{x}_2$ and $\sigma_1 = \sigma_2$.
19. Suppose $x_1, x_2, \ldots, x_n$ are n values of a variable x with A.M. $\bar{x}$. Can we measure the dispersion of x using $\sum (x_i - \bar{x})$? If not, give the reason and suggest how the deviations $(x_i - \bar{x})$, i = 1, 2, ..., n, can be used to measure dispersion.
20. Compare mean deviation and standard deviation as measures of dispersion.
21. Suggest a measure of dispersion which can be obtained graphically. Also explain the procedure for obtaining the same.
(B) Numerical Problems:
1. If Y = 2X + 3, then show that $\sigma_Y^2 = 4\sigma_X^2$.
2. If Y = aX + b, then show that $\bar{Y} = a\bar{X} + b$ and $\sigma_Y^2 = a^2\sigma_X^2$.
3. Show that S.D. of $\{x_1, x_2\} = \frac{|x_1 - x_2|}{2}$.
4. Show that S.D. of $\{x_1, x_2, \ldots, x_n\}$ = S.D. of $\{-x_1, -x_2, \ldots, -x_n\}$.

5. Show that () x2 nx, () E%: n

6. Find the variance of 1, 2, ..., n.
Ans.: $\frac{n^2 - 1}{12}$
7. A variable takes values a, a + d, a + 2d, ..., a + (n - 1)d; find its variance.
Ans.: $\frac{(n^2 - 1)\,d^2}{12}$
8. A variable takes values 1, 2, ..., n with frequencies 1, 2, ..., n respectively. Find the standard deviation.
Ans.: $\sqrt{\frac{n^2 + n - 2}{18}}$
9. A variable takes values 0, 1, ..., n with frequencies ${}^nC_0, {}^nC_1, \ldots, {}^nC_n$ respectively. Find the variance.
Ans.: $\frac{n}{4}$
10. Find the standard deviation of the following frequency distribution :
1 2 3 4 5

2a 3a 4a 5
Frequency
Ans.: 55.6
(C) Numerical Problems: Discrete Series:
1. Compute the (i) standard deviation, (ii) mean deviation about mean, (iii) coefficient of mean deviation about mean for the following data:
15, 18, 22, 25, 10.
Ans.: S.D. = 5.2536, M.D. = 4.4, Coefficient of M.D. = 0.2444.
2. Calculate the (i) coefficient of variation, (ii) mean deviation about the median for the following series:
12, 18, 15, 20, 16.
Ans.: C.V. = 16.75%, M.D. = 2.2
3. Find the standard deviation of the following observations:
2, 3, 5, 2, 7, 5, 7, 6, 11, 12.
Ans.: 3.2558
4. Find the (i) standard deviation, (ii) coefficient of variation, (iii) mean deviation about median, (iv) coefficient of mean deviation about median for the following data:
6, 4, 5, 3, 12, 10.
Ans.: $\sigma$ = 3.1842, C.V. = 46.5975%, M.D. = 2.5, Coefficient of M.D. = 0.4167

5. Which of the following two series A and B is more stable? Why?
-1
2
A 2 3 6
3 4
6 7
B 7 5 5

Ans.: Series B is more stable, since C.V. (A) = 89.1898% > C.V. (B) = 40%.
6. Using the coefficient of variation, find which of the following batsmen is more consistent in scoring.
119 36 84 29
7 19 73
Score of A 42 115
51 37 48 13
Score of B 47 12 76 42
Ans.: B is more consistent, since C.V. (A) = 75.54% > C.V. (B) = 70.82%.
7. Compare the variation between the weight and the height of a group of 10 persons using the coefficient of variation.
5 6 7 10
Sr. No. 2 4
69 63 65 70 71 62
65 65 64
Weight (kg) 70
160
145 165 167 156 153 168
Height (cm) 170 140 151
Ans.: CV. (weight) =4.67% < CV. (height) =6.18 %
(D) Numerical Problems: Frequency Distribution :
8. Calculate the standard deviation and coefficient of variation for the following frequency distribution :
2 6 10

4 14 8
Frequency 2

Also find the quartile deviation, coefficient of quartile deviation.


Ans.: = 1.9137, CV. =43.73%, Q.D. =1, Coefficient of Q.D. = 1/7
9. A survey conducted to determine the distance travelled (in km) per litre of petrol by a newly introduced motorcycle gives the following distribution:
Distance (km)        40-45  45-50  50-55  55-60  60-65
No. of motorcycles   10     17     23     40     10
Find the (i) standard deviation, (ii) quartile deviation and coefficient of quartile deviation, (iii) mean deviation about median and coefficient of mean deviation.
Ans.: $\sigma$ = 5.7383, Q.D. = 4.3566, Coefficient of Q.D. = 0.08103, M.D. = 4.85, Coefficient of M.D. = 0.0882
10. Find the variance for the following frequency distribution and the mean deviation about the mode.
Class       5-15  15-25  25-35  35-45  45-55
Frequency   5     15     22     18     10
Ans.: $\sigma^2$ = 129.4082, M.D. = 9.2727
11. Compute the standard deviation for the following data. Also find the mean deviation about median and its relative measure.
Marks             0-9  10-19  20-29  30-39  40-49
No. of students   3    7      25     10     5
Ans.: $\sigma$ = 9.8, M.D. = 7, Coefficient of M.D. = 0.2745
12, Calculate the coefficient of
variation (CV) for the following data:
1-10 11-20 21-30 31-40 41-50 51-60
5
15 21
Ans.: CV. = 42.9986 6 4
13. Find the standard deviation and the coefficient of variation from the following data:
Marks       0-10  10-20  20-30  30-40  40-50
Frequency   10    16     30     32     12
Ans.: $\sigma$ = 11.4891, C.V. = 42.5522%
14 Find the quartile deviation and coefficient of quartile deviation of distribution of daily wages.
Daily wages Below 20 21 - 40 41 -60 61-80 Above 80
Frequency 5 32 45 17 1
Ans.: Q.D. = 12.1944, Coefficient of Q.D. =0.1349
15. Two automatic tea filling machines A and B are tested for performance. The machines are supposed to fill 500 gm of tea in each packet. A random sample of 100 filled packets from each machine showed the following distribution.
Weight in gm Frequency A Frequency B
485-490 12 10
490-495 18 15
495-500 20 24
500-505 22 20
505-510 24 18
510-515 4 13
Which machine is more consistent ? Why ?
Ans.: C.V. (A) = 1.4294%, C.V. (B) = 1.5084%, Machine A is more consistent.
16. The following data pertain to two workers doing the same job in a factory:
                                            Worker A   Worker B
Mean time of completing the job (minutes)   40         42
Standard deviation (minutes)                8          6
Who is the more consistent worker? Why?
Ans.: B, since C.V. (A) = 20% > C.V. (B) = 14.2857%
(E) Numerical Problems on Combined Standard Deviation:
17. Two samples of sizes 40 and 50 have the same mean and standard deviations 20 and 10 respectively. Find the
variance of the combined group.

Ans.: X¢ = 20, o = 10
18. Out of 400 observations, 100 observations have the value one each and the rest of the observations are zero. Find the mean and standard deviation of the 400 observations together.
Ans.: Mean = 1/4, Variance = 3/16
19. The arithmetic mean and the standard deviation of the values of 100 items in a group are 80 and 5 respectively. In a second group of 25 items, each item has a value equal to 60. Find the combined standard deviation of the two groups taken together.
Ans.: $\bar{x}_c = 76$, $\sigma_c = \sqrt{84}$
20. Suppose X denotes the time required to complete a job by worker A and Y denotes the time required to complete a job by worker B. Ten jobs were assigned to both workers A and B. Information regarding completion times is as follows:
$\sum x = 300$, $\sum y = 250$, $\sum x^2 = 9360$, $\sum y^2 = 7850$.
(i) Which worker appears to be faster in completing the jobs?
(ii) Which worker is more consistent?
Ans.: (i) B (ii) A, since C.V. (A) = 20% < C.V. (B) = 50.5964%
21. Find the combined standard deviation from the following data:
Workers   Number   Average Salary   Standard Deviation
Male      80       1520
Female    20       1420
Ans.: 40.42029
22. Two workers on the same job show the following results over a long period of time:
                                                 Worker 'A'   Worker 'B'
Mean time of completing the job (in minutes)     30           24
Standard deviation (in minutes)                  6            4
Number of jobs                                   10           10
(i) Which worker appears to be more consistent in the time he requires to complete the job? Why?
(ii) Which worker is faster in completing the job? Why?
(iii) Find the combined mean and standard deviation of the two workers together.
Ans.: (i) C.V. (A) = 20% > C.V. (B) = 16.6667%, B is more consistent (ii) B (iii) Combined mean = 27, Combined S.D. = 5.9161
23. Information about the daily salaries of employees in firms A and B is stated below:
Firm   No. of employees   Mean Salary (₹)   S.D. of Salary (₹)
A      60                 400               10
B      40                 500               11
(i) Which firm gives more amount as salary?
(ii) Which firm has smaller variation in salary?
(iii) Find the combined mean and S.D. of the two firms.
Ans.: (i) B (ii) C.V. (A) = 2.5% > C.V. (B) = 2.2%, B has smaller variation.
(iii) Combined mean = 440, Combined S.D. = 50.0839.
24. Information regarding daily salaries of two companies A and B is given below:
                   Company A   Company B
No. of workers     600         400
Mean salary        180         200
S.D. of salary     9           10
(i) Which company pays the larger salary? Why?
(ii) Which company has less variation in salaries? Why?
(iii) Find the combined mean and S.D. of the two companies A and B.
Ans.: (i) B (ii) Both have the same C.V. = 5%, both are equal in variation.
(iii) Combined mean = 188, Combined S.D. = 14.2969.
25. Find the combined S.D. from the following data:
                   Group A   Group B   Group C
Size               100       150       250
Arithmetic mean    50        55        60
S.D.               10        11        12
Ans.: 11.9812
(F) Numerical Problems on Corrected Standard Deviation:
26. A sample of 10 numbers gave a mean of 13 and a variance of 4. Later it was discovered that the number 12 included in the sample should have been 21. Find the corrected mean and variance.
Ans.: Mean = 13.9, Variance = 9.49
27. The mean and standard deviation of 100 observations are 40 and 5.1 respectively. During cross-checking, it was found that an observation 40 was misread as 50. Compute the correct values of mean and standard deviation.
Ans.: Mean = 39.9, S.D. = 5
28. The mean and standard deviation of 20 observations are 10 and 2 respectively. Later on it was noticed that the item 8 was incorrect. Calculate the arithmetic mean and standard deviation if (i) the wrong item is omitted, (ii) the wrong item is replaced by 12.
Ans.: (i) Mean = 10.1053, S.D. = 1.9972 (ii) Mean = 10.2, S.D. = 1.99
(G) Miscellaneous Problems :
29. Find the missing observations, if the arithmetic mean and standard
deviation of the following series are 10 and
4 respectively.
14, ?, 11, 10, 13, 16, ?, 9, 12, 2.
Ans.: 8, 5
30. In a group of 10 children, the heaviest boy weighs 10 kg more than the average weight of the other children. Show that the standard deviation of the group cannot be less than 3 kg.
31. The range, arithmetic mean and standard deviation of a group of 10 items are 20, 62, 10 respectively. If each observation is increased by 5, what will be the range and the coefficient of variation?
Ans.: Range = 20, C.V. = 14.9254%
32. If n = 10, $\sum (x - 120) = 20$, $\sum (x - 120)^2 = 200$, find the mean and the standard deviation.
Ans.: Mean = 122, S.D. = 2
33. If n = 100, $\sum x = -20$, $\sum x^2 = 220$, find the standard deviation and coefficient of variation.
Ans.: S.D. = 1.4697, C.V. = 734.8469%
34. Find the standard deviation of Set A, Set B, Set C and Set D and comment on the findings.
Set A:  1   2   3   4   5
Set B: 11  12  13  14  15
Set C: 10  20  30  40  50
Set D:  4   4   4   4   4
Ans.: $\sigma_A = \sigma_B = \sqrt{2}$, $\sigma_C = 10\sqrt{2}$, $\sigma_D = 0$.
35. The range, arithmetic mean and standard deviation of 10 items are 12, 50, 6 respectively. If each observation is increased by 10, what will be the range, arithmetic mean and standard deviation?
Ans.: Range = 12, $\bar{x}$ = 60, $\sigma$ = 6.
(H) State whether the following statements are True or False:
1. The dispersion of a data set gives insight into the reliability of the measure of central tendency.
Ans.: True
2. The standard deviation is equal to the positive square root of the variance.
Ans.: True
3. The difference between the highest and lowest observations in a data set is called the inter-quartile range.
Ans.: False
4. The inter-quartile range is based on only two values taken from the data set.
Ans.: True

5. The standard deviation is measured in the same units as the observations in the data set.
Ans.: True
6. The variance, like the standard deviation, takes into account every observation in the data set.
Ans.: True
7. It is possible to measure the range of an open-ended distribution.
Ans.: False
8. The measure of dispersion ensures credibility to the measure of central tendency.
Ans.: True
9. The mean deviation is minimum when computed from the
median.
Ans.: True

10. Let X1, X2 ......,Xn be a set of values of X then the least root mean square deviation of X about x is known as
standard deviation.
Ans.: False: It is positive square root.
11. The standard deviation of a variable whose values are all equal must be zero, and the converse is also true.
Ans.: True
12. The standard deviation is less affected by extreme values than the mean deviation.
Ans.: False: Mean deviation is less affected.
13. A consistent player has more variability in test scores.
Ans.: False: Less variability.
14. If standard deviations of two groups are known, then value of combined S.D. lies between the S.D.s of two group.
Ans.: False

15. Range is not based on all observations and it does not give proper idea regarding variation between the extreme
observations.
Ans.: True

16. The range is widely used in the statistical quality control.


Ans.: True
17. The range of middle 50% items is computed with the help of quartile deviation.
Ans.. True
18. Variance is invariant to the change of origin.
Ans.: True
Chapter 6
MOMENTS

6.1 INTRODUCTION
There are several aspects of studying a frequency distribution. In earlier chapters we have studied two of them, viz. average and dispersion. In order to study a few more aspects, such as symmetry and shape of the frequency distribution (or frequency curve), moments are useful. For quantitative data, we have used $\bar{x} = \frac{\sum x_i}{n}$ and $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$ as the best measures of average and dispersion respectively. Here we study a more general type of descriptive measure, $\frac{\sum (x_i - a)^r}{n}$. It is called a moment.

In particular, for a = 0 and r = 1 we get the arithmetic mean, and for $a = \bar{x}$ and r = 2 we get the variance.
Moments: The quantity $\frac{\sum (x_i - a)^r}{n}$ is called the rth moment (or moment of order r) about 'a'.
According to the choice of 'a', we make three categories of moments, viz. raw moments, central moments and moments about an arbitrary value 'a'.

6.2 RAW MOMENTS


Raw moment of order r (or rth raw moment) is denoted by $\mu'_r$ and given by the following formula.

$\mu'_r = \frac{\sum_{i=1}^{n} x_i^r}{n}$ for individual observations

$\mu'_r = \frac{\sum_{i=1}^{k} f_i x_i^r}{\sum_{i=1}^{k} f_i}$ for frequency distribution

The first four moments are of prime importance for defining various descriptive measures of data. Therefore, for r = 1, 2, 3, 4 we get the first four raw moments $\mu'_1, \mu'_2, \mu'_3, \mu'_4$ respectively. We state the corresponding formulae in the following table.
Moments     Individual observations         Frequency distribution
$\mu'_1$    $\sum x_i / n = \bar{x}$        $\sum f_i x_i / N = \bar{x}$
$\mu'_2$    $\sum x_i^2 / n$                $\sum f_i x_i^2 / N$
$\mu'_3$    $\sum x_i^3 / n$                $\sum f_i x_i^3 / N$
$\mu'_4$    $\sum x_i^4 / n$                $\sum f_i x_i^4 / N$
where $N = \sum f_i$.
Note: (i) $\mu'_1$ = Arithmetic mean.
(ii) The raw moments are also called the moments about origin.

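6.3 CENTRAL MOMENTS

Central moment of order r (or rth order central moment) $\mu_r$ is given by the following formula.

$\mu_r = \frac{\sum (x_i - \bar{x})^r}{n}$ for individual observations

$\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}$ for frequency distribution, where $N = \sum f_i$.

Substituting r = 1, 2, 3, 4 in the above formula we get the first four central moments, $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ respectively.
Moments    Individual observations                       Frequency distribution
$\mu_1$    $\sum (x_i - \bar{x}) / n = 0$                $\sum f_i (x_i - \bar{x}) / N = 0$
$\mu_2$    $\sum (x_i - \bar{x})^2 / n = \sigma^2$       $\sum f_i (x_i - \bar{x})^2 / N = \sigma^2$
$\mu_3$    $\sum (x_i - \bar{x})^3 / n$                  $\sum f_i (x_i - \bar{x})^3 / N$
$\mu_4$    $\sum (x_i - \bar{x})^4 / n$                  $\sum f_i (x_i - \bar{x})^4 / N$
Mainly we need central moments. However, these are difficult to compute as compared to raw moments. Therefore, we require the relation between raw and central moments.
6.4 RELATION BETWEEN RAW AND CENTRAL MOMENTS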
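$\mu_1 = \frac{\sum f_i (x_i - \bar{x})}{N} = \frac{\sum f_i x_i}{N} - \frac{N \bar{x}}{N} = 0$

$\mu_2 = \mu'_2 - \mu'^2_1$ ... (1)

$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu'^3_1$ ... (2)

$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 \mu'^2_1 - 3\mu'^4_1$ ... (3)

The above relations also hold true for individual observations. Proofs can be given on similar lines.
Note: In the above relations we observe the following facts.
(i) The sum of the coefficients on the R.H.S. of each of the relations is zero.
(ii) The final expression of $\mu_r$ contains r terms.
(iii) The first term in the expression of $\mu_r$ is positive and alternate terms are negative.
(iv) The last term in the expression of $\mu_r$ is $(-1)^{r-1}(r-1)(\mu'_1)^r$.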
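The relations between raw and central moments translate directly into a few lines of code. The following is a minimal sketch, not a library routine: the function name `central_from_raw` and the input values are illustrative assumptions, and the raw moments may be taken about any common point because central moments do not depend on the origin.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert the first four raw moments (about a common point) to central moments.

    Uses mu2 = m2 - m1^2, mu3 = m3 - 3 m2 m1 + 2 m1^3,
    mu4 = m4 - 4 m3 m1 + 6 m2 m1^2 - 3 m1^4; mu1 is always zero.
    """
    mu1 = 0.0
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu1, mu2, mu3, mu4


# Illustrative raw moments (assumed values): 1, 4, 10, 46
print(central_from_raw(1, 4, 10, 46))
```

With these assumed inputs the sketch gives $\mu_2 = 3$, $\mu_3 = 0$ and $\mu_4 = 27$, and in each relation the coefficients 1, -1; 1, -3, 2; 1, -4, 6, -3 can be seen to add up to zero.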
SOLVED EXAMPLES
Example 6.1: Compute the first four central moments for the following frequency distribution.
No. of jobs completed   0-10  10-20  20-30  30-40  40-50
No. of workers          6     26     47     15     6
Solution:
Class    Mid pt. (x)   Freq. (f)   u = (x - 25)/10   fu    fu^2   fu^3   fu^4
0-10     5             6           -2                -12   24     -48    96
10-20    15            26          -1                -26   26     -26    26
20-30    25            47          0                 0     0      0      0
30-40    35            15          1                 15    15     15     15
40-50    45            6           2                 12    24     48     96
Total                  100                           -11   89     -11    233

Raw moments of U:
$\mu'_1 = \sum f_i u_i / \sum f_i = -11/100 = -0.11$
$\mu'_2 = \sum f_i u_i^2 / \sum f_i = 89/100 = 0.89$
$\mu'_3 = \sum f_i u_i^3 / \sum f_i = -11/100 = -0.11$
$\mu'_4 = \sum f_i u_i^4 / \sum f_i = 233/100 = 2.33$

Central moments of U:
$\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu'^2_1 = 0.89 - (-0.11)^2 = 0.8779$
$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu'^3_1 = -0.11 - 3 \times 0.89 \times (-0.11) + 2 \times (-0.11)^3$
$= -0.11 + 0.2937 - 0.002662 = 0.181038$
$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 \mu'^2_1 - 3\mu'^4_1$
$= 2.33 - 4 \times (-0.11) \times (-0.11) + 6 \times 0.89 \times (-0.11)^2 - 3 \times (-0.11)^4$
$= 2.33 - 0.0484 + 0.064614 - 0.00043923 = 2.3457748$

Central moments of x: Since $u = \frac{x - 25}{10}$ we get x = 25 + 10u, hence $\mu_r$ of x = $10^r \mu_r$ of u.
$\mu_1$ of x = 0
$\mu_2$ of x = $10^2 \times 0.8779 = 87.79$
$\mu_3$ of x = $10^3 \times 0.181038 = 181.038$
$\mu_4$ of x = $10^4 \times 2.3457748 = 23457.748$
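The step-deviation method of Example 6.1 can be checked numerically. The sketch below is one possible arrangement, assuming class mid-points, a working origin of 25 and a class width of 10; the function name `grouped_central_moments` is hypothetical and not part of any library.

```python
def grouped_central_moments(midpoints, freqs, origin, width):
    """First four central moments of x for grouped data via u = (x - origin) / width."""
    n = sum(freqs)
    u = [(x - origin) / width for x in midpoints]
    # Raw moments of u of orders 1 to 4
    m1, m2, m3, m4 = (
        sum(f * ui**r for f, ui in zip(freqs, u)) / n for r in range(1, 5)
    )
    # Central moments of u from the raw moments
    mu_u = [
        0.0,
        m2 - m1**2,
        m3 - 3 * m2 * m1 + 2 * m1**3,
        m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4,
    ]
    # Change of scale: mu_r of x = width^r * mu_r of u
    return [width**r * mu for r, mu in enumerate(mu_u, start=1)]


mu = grouped_central_moments([5, 15, 25, 35, 45], [6, 26, 47, 15, 6], origin=25, width=10)
print([round(v, 3) for v in mu])
```

Up to rounding, the second and fourth values agree with 87.79 and 23457.748, and the third equals $10^3 \times 0.181038 = 181.038$. Only the change of scale enters the final step, since central moments are unaffected by the choice of origin.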

Example 6.2: The first four moments of a distribution about the value '5' are 2, 20, 40 and 200 respectively. (i) Find the first four central moments. (ii) Find the arithmetic mean and S.D.

Solution: (i) Let u = x - 5. The raw moments of U are $\mu'_1 = 2$, $\mu'_2 = 20$, $\mu'_3 = 40$ and $\mu'_4 = 200$. Let us compute the central moments of U. Notice that the central moments are invariant to the change of origin, hence the central moments of U and x are the same.
Central moments:
$\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu'^2_1 = 20 - 2^2 = 16$
$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu'^3_1 = 40 - 3 \times 20 \times 2 + 2 \times 2^3 = -64$
$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 \mu'^2_1 - 3\mu'^4_1 = 200 - 4 \times 40 \times 2 + 6 \times 20 \times 2^2 - 3 \times 2^4 = 312$
(ii) Note that U = X - 5, so
$\bar{u} = \bar{x} - 5$
$2 = \bar{x} - 5$ (since $\mu'_1$ of U = $\bar{u}$)
$\bar{x} = 7$
S.D. = $\sqrt{\mu_2} = \sqrt{16} = 4$.


POINTS TO REMEMBER
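The relation between raw and central moments:
$\mu_2 = \mu'_2 - \mu'^2_1$
$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu'^3_1$
$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 \mu'^2_1 - 3\mu'^4_1$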
EXERCISE
(A) Theory Questions :
1. Define raw and central moments of (i) a frequency distribution, (ii) a series of individual observations.
2. Describe the utility of moments.
3. Express first four central moments in terms of raw moments.
4. Show that the central moments are invariant to the change of origin.
5. If u = kx, show that $\mu_r$ of u = $k^r \mu_r$ of x.
(B) Numerical Problems :
1. Find the third central moment of the following observations: 1, 2, 3, ..., n.
Ans.: 0
2. Find the first two moments of numbers 0, 0, 0, 1, 1, 1, 1, 1.
3. Find the first four central moments of the following observations: 4, 0, 2, 6, 3, 1, -7,-5,1,5.
Ans.: uz= 10.6, H3 =-35.4, J4 = 607.2.
4. Compute the first four central moments of the following frequency distribution:
2 3 4 5 7
2 9 25 35 20 1

Ans.: u, = 1.39, 3 = 0.018, ua = 5.5237.


5. Find the first four central moments of the frequency distribution given below:
Class 100-105 105-110 110-115 115-120 120-125
7 13 25 25 30
Frequency
Ans.: $\mu_2 = 38.09$, $\mu_3 = -110.772$, $\mu_4 = 3229.7056$.
6. Given that $\sum f_i = 125$, $\sum f_i (x_i - 10) = -46$, $\sum f_i (x_i - 10)^2 = 306$, $\sum f_i (x_i - 10)^3 = -292$, $\sum f_i (x_i - 10)^4 = 1962$, find the first four central moments of x.
Ans.: $\mu_2 = 2.3122$, $\mu_3 = 0.2769$, $\mu_4 = 14.1915$.
7. The first two moments of a distribution about the value 4 are 3 and 34. Find the mean and variance.
Ans.: 7, 25.
8. The first three moments of a distribution about 2 are 1, 22, 10. Find its mean, S.D. and the third central moment.
Ans.: $\bar{x} = 3$, $\sigma = \sqrt{21}$, $\mu_3 = -54$.
9. Given that $\bar{x} = 1$, $\mu_2 = 3$, $\mu_3 = 0$, $\mu_4 = 27$, find the first four raw moments.
Ans.: $\mu'_1 = 1$, $\mu'_2 = 4$, $\mu'_3 = 10$, $\mu'_4 = 46$.
(C) State whether the following statements are True or False. Ans.: True
1. Even ordered central moments are always positive. Ans.: True
2. The first central moment is zero.
Ans.: False
3. The first raw moment is equal to standard deviation. Ans.: False
4. The second central moment is equal to variance. Ans.: True
5. Central momentsare independent of change of origin.
(D) Answer in brief the following:
1. Define rth order raw moment.
2. Define rth order central moment.
3. Write the expressions for the first four central moments in terms of raw moments.
4. Show that $\mu'_2 \ge (\text{mean})^2$.
5. Show that $\mu_4 \ge \mu_2^2$.
6. Show that $\mu_2 \ge 0$, $\mu_4 \ge 0$.
7. State the Sheppard's correction for second and fourth central moments.
8. State $\mu_3$ of the data set (1, 2, 3, ..., n).
Ans.: $\mu_3 = 0$
9. State $\mu_2$ of the data set (1, 2, 3, ..., n).
Ans.: $\mu_2 = \frac{n^2 - 1}{12}$
10. If X has $\mu_r = 12$ and U = 2X + 5 then find $\mu_r$ of U.
Ans.: $12 \times 2^r$
Chapter 7
SKEWNESS AND KURTOSIS
7.1 INTRODUCTION
In the previous chapters we have studied two aspects in the study of a frequency distribution, viz. average and dispersion. However, in order to compare two frequency distributions, average and dispersion are not adequate. Sometimes two frequency distributions have the same average and dispersion, yet they differ in symmetry. Therefore, symmetry is the third aspect in the study of a frequency distribution. The term skewness carries the meaning opposite to symmetry, i.e. lack of symmetry. In this chapter we study various measures of skewness.
7.2 SYMMETRY
A frequency distribution is symmetric about a value 'a' if the corresponding frequency curve is symmetric about 'a' (see Fig. 7.1). In other words, the ordinate at x = a divides the frequency curve into two equal parts. For a symmetric frequency curve these two parts are mirror images of each other. The point 'a' turns out to be the arithmetic mean, mode as well as median.

Fig. 7.1: Symmetric frequency curve about 'a'
In case of symmetric frequency distribution, frequencies of classes equidistant from central class on either side are
same.

For Example:
Class 0-10 10-20 20-30 30-40 40-50
Frequency 5 12 20 12 5

Here, the frequency of first is the same as that of the last class. Similarly, second and second last classes have equal
frequencies and so on.
Properties of Symmetric Distribution:
(i) In case of bell shaped, unimodal, symmetric frequency distributions, the arithmetic mean, mode and median coincide.
(ii) The quartiles of a symmetric distribution are equispaced. By that we mean $Q_3 - Q_2 = Q_2 - Q_1$.
(iii) The odd order central moments of a symmetric distribution are zero.

In day-to-day life we come across several distributions which are not symmetric.
For Example: Distribution of income of individuals, distribution of agricultural land holdings, distribution of number of misprints per page. In these situations we require to measure the extent of departure from symmetry.
7.3 SKEWNESS
Skewness is a lack of symmetry or departure from symmetry. If the distribution is skew, the corresponding frequency curve is elongated on either side. If the curve is elongated towards the right side (Fig. 7.3), then the distribution is said to possess positive skewness. On the other hand, if it is elongated towards the left side (Fig. 7.4), the distribution is said to possess negative skewness. In other words, in case of positive skewness, the frequency increases rapidly to reach the maximum and then decreases slowly. Exactly the reverse process is observed in case of a distribution with negative skewness.
In case of a positively skew distribution we observe that
Mode < Median < Arithmetic mean
whereas, in case of a negatively skew distribution we observe that
Arithmetic mean < Median < Mode.

Fig. 7.2: Symmetric curve, Mo = Me = Mn
Fig. 7.3: Positively skew curve, Mo < Me < Mn
Fig. 7.4: Negatively skew curve, Mn < Me < Mo
(Mn = arithmetic mean, Me = median, Mo = mode)

Examples:
(1) The frequency curve of annual income is positively skew (Fig. 7.5: frequency against income).
(2) The frequency curve of deaths among adults is negatively skew (Fig. 7.6: number of deaths against age).
(3) The frequency curve of intelligence quotient is symmetric.
7.4 KARL PEARSON'S COEFFICIENT OF SKEWNESS
Mode is the average most sensitive to departure from symmetry. The larger the skewness, the larger is the difference between the arithmetic mean and the mode. In case of positively skew data we observe A.M. - mode > 0 and in case of negatively skew data A.M. - mode < 0. Therefore, the quantity (A.M. - mode) gives the extent of skewness as well as the type of skewness. Using the quantity (A.M. - mode), a relative measure of skewness is given below:
Karl Pearson's coefficient of skewness $S_k = \frac{\text{A.M.} - \text{Mode}}{\text{S.D.}}$
If $S_k < 0$, the distribution is negatively skew.
If $S_k = 0$, the distribution is symmetric.
If $S_k > 0$, the distribution is positively skew.
Remark:
(i) Karl Pearson's coefficient of skewness ($S_k$) is independent of change of origin and scale.
(ii) It cannot be computed for a distribution with open end classes as well as for qualitative data.
(iii) Theoretically, there is no limit on the value of $S_k$. However, in the majority of cases it lies between -1 and 1, and it rarely goes beyond -3 and 3.
(iv) Sometimes the mode is ill-defined and cannot be computed, hence there is difficulty in computing Karl Pearson's coefficient of skewness. In such a case, we use the following empirical relation for a moderately skew distribution: $(\bar{x} - \text{mode}) \approx 3(\bar{x} - \text{median})$. Hence,
$S_k = \frac{3(\bar{x} - \text{Median})}{\sigma}$
(v) $S_k$ is a unitless pure number.
In case of qualitative data and frequency distributions with open end classes we cannot compute the arithmetic mean and S.D. In order to overcome this difficulty, we measure skewness using quartiles.
7.5 BOWLEY'S COEFFICIENT OF SKEWNESS
The first and third quartiles of a symmetric distribution are equidistant from the median (Fig. 7.7). If the frequency curve is elongated towards the right side then the third quartile goes away from the median as compared to the first quartile. Accordingly, for a positively skew distribution $Q_3 - Q_2 > Q_2 - Q_1$ (Fig. 7.8). In case of a negatively skew distribution the left side tail of the frequency curve is elongated, which influences the first quartile to go away from the median. This results in $Q_3 - Q_2 < Q_2 - Q_1$ (Fig. 7.9).
The amount of skewness and the type of skewness are reflected by the quantity $(Q_3 - Q_2) - (Q_2 - Q_1)$. A relative measure based on this quantity is called Bowley's coefficient of skewness ($S_b$), which is given by the following formula:
$S_b = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$

Fig. 7.7: Symmetric, $Q_2 - Q_1 = Q_3 - Q_2$
Fig. 7.8: Positively skew, $Q_3 - Q_2 > Q_2 - Q_1$
Fig. 7.9: Negatively skew, $Q_3 - Q_2 < Q_2 - Q_1$

The corresponding box plots are shown in Fig. 7.10.
Fig. 7.10: Box plots marking $Q_1$, $Q_2$ and $Q_3$ for the three cases

Result: Bowley's coefficient of skewness $S_b$ lies between -1 and 1.

Interpretation:
If $S_b < 0$, the distribution is negatively skew.
If $S_b = 0$, the distribution is symmetric.
If $S_b > 0$, the distribution is positively skew.
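The two quartile- and mode-based coefficients are simple ratios, so they can be sketched as small helper functions. The function names and the numerical inputs below are illustrative assumptions only; the formulas are the ones stated in Sections 7.4 and 7.5.

```python
def karl_pearson_skewness(mean, mode, sd):
    """S_k = (A.M. - mode) / S.D."""
    return (mean - mode) / sd


def karl_pearson_skewness_from_median(mean, median, sd):
    """S_k = 3 (A.M. - median) / S.D., used when the mode is ill-defined."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, q2, q3):
    """S_b = (Q3 - 2 Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


def describe(coefficient):
    """Translate the sign of a skewness coefficient into words."""
    if coefficient > 0:
        return "positively skew"
    if coefficient < 0:
        return "negatively skew"
    return "symmetric"


# Assumed quartiles: Q3 exceeds the median by 10, the median exceeds Q1 by 7
s_b = bowley_skewness(13, 20, 30)
print(s_b, describe(s_b))
```

Here $S_b = 3/17$, which is positive because the upper quartile lies farther from the median than the lower one. Since $|Q_3 - 2Q_2 + Q_1| \le Q_3 - Q_1$ whenever $Q_1 \le Q_2 \le Q_3$, the value returned by `bowley_skewness` always stays within -1 and 1.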
7.6 PEARSONIAN COEFFICIENT OF SKEWNESS ($\beta_1$) (BASED ON MOMENTS)
In the earlier discussion we have studied that the odd order central moments of a symmetric distribution are zero. Further, those are positive for a positively skew distribution and negative for a negatively skew distribution (except $\mu_1$). Hence, odd order moments can be used to define a measure of skewness.
The first odd order central moment $\mu_1 = 0$, therefore, we use $\mu_3$ for measuring the amount of skewness. The Pearsonian coefficient of skewness is denoted by $\gamma_1$. It is a relative measure of skewness given by the following formula:
$\gamma_1 = \sqrt{\beta_1}$ where $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
Note that $\beta_1$ is never negative, so it fails to exhibit the type of skewness. Therefore $\gamma_1$, a measure which takes this into account, is obtained by simply taking the square root of $\beta_1$:
$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}$

Interpretation: Since $\mu_2 > 0$, we take $\mu_2^{3/2} > 0$. Thus, $\gamma_1$ possesses the sign of $\mu_3$.
If $\gamma_1 < 0$, the distribution is negatively skew.
If $\gamma_1 = 0$, the distribution is symmetric.
If $\gamma_1 > 0$, the distribution is positively skew.
Remark:
(i) It can be shown that the various measures of skewness which we have discussed earlier are invariant to change of origin and scale. These measures are based on the differences of similar quantities. Note that $S_k$ is based on $(\bar{x} - \text{mode})$, $S_b$ is based on $(Q_3 - Q_2) - (Q_2 - Q_1)$ and $\gamma_1$ is based on $(x_i - \bar{x})$. Hence, these measures are invariant to the change of origin. Moreover, these measures are expressed in terms of ratios of quantities possessing the same unit. Therefore, measures of skewness are invariant to the change of scale also.
(ii) Skewness is a lack of symmetry. This lack may be either positive or negative. Hence, while comparing two frequency distributions one has to compare the magnitudes of skewness. For example, consider two frequency distributions with coefficients of skewness 0.5 and -0.8. Then the latter has larger skewness, since |-0.8| > |0.5|. The nature of skewness however is different: the former is positively skew, while the latter is negatively skew.
(iii) Choice of measure of skewness: The Pearsonian coefficient of skewness $\gamma_1$ is the best among all the measures. However, it is not simple to compute. Hence, Karl Pearson's coefficient of skewness $S_k$ is preferred. If the frequency distribution has open end classes or qualitative data is under study then both of the above referred measures cannot be computed. Under these situations, Bowley's coefficient of skewness is the only measure which can be used.
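The invariance of the moment-based coefficient under a change of origin and scale can be seen on a small data set. This is a sketch under stated assumptions: the function name `moment_skewness`, the five observations and the transformation 5 + 2x are all made up for illustration, and the scale factor is taken to be positive.

```python
def moment_skewness(values):
    """Return (beta_1, gamma_1) computed from the central moments of raw observations."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((x - mean) ** 2 for x in values) / n
    mu3 = sum((x - mean) ** 3 for x in values) / n
    beta1 = mu3**2 / mu2**3
    gamma1 = mu3 / mu2**1.5  # carries the sign of mu3
    return beta1, gamma1


data = [1, 2, 3, 4, 10]               # assumed data with a long right tail
shifted = [5 + 2 * x for x in data]   # change of origin (5) and scale (2)

b1, g1 = moment_skewness(data)
b1_shifted, g1_shifted = moment_skewness(shifted)
print(round(g1, 4), round(g1_shifted, 4))
```

Both calls return the same $\gamma_1$ up to floating-point rounding, and its positive sign reflects the elongated right tail of the assumed data.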
SOLVED EXAMPLES
Example 7.1: Compute (i) Karl Pearson's coefficient of skewness, (ii) Bowley's coefficient of skewness, (iii) the Pearsonian coefficient of skewness for the following frequency distribution.
Marks             0-20  20-40  40-60  60-80  80-100
No. of students   5     12     32     40     11
Solution: (i) In this case we need to compute the mean, mode and S.D.
Marks    Mid pt. x   Freq. f   u = (x - 50)/20   fu    fu^2   fu^3   Cumulative freq. (less than type)
0-20     10          5         -2                -10   20     -40    5
20-40    30          12        -1                -12   12     -12    17
40-60    50          32        0                 0     0      0      49
60-80    70          40        1                 40    40     40     89
80-100   90          11        2                 22    44     88     100
Total                100                         40    116    76

Modal class: 60-80, $f_m = 40$, $f_1 = 32$, $f_2 = 11$, l = 60, h = 20, hence
Mode = $l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 60 + \frac{40 - 32}{80 - 32 - 11} \times 20 = 64.3243$
Note that $u = \frac{x - 50}{20}$, so x = 20u + 50. Therefore $\bar{x} = 20\bar{u} + 50$ and $\sigma_x = 20\sigma_u$.
$\bar{u} = \frac{\sum fu}{N} = \frac{40}{100} = 0.4$
$\bar{x} = 50 + 20\bar{u} = 50 + 20 \times 0.4 = 58$
$\sigma_u^2 = \frac{\sum fu^2}{N} - \bar{u}^2 = \frac{116}{100} - (0.4)^2 = 1$
$\sigma_u = 1$, $\sigma_x = 20\sigma_u = 20$
Karl Pearson's coefficient of skewness:
$S_k = \frac{\bar{x} - \text{mode}}{\sigma} = \frac{58 - 64.3243}{20} = -0.3162$
Interpretation: The distribution is negatively skew.

(ii) For determining Bowley's coefficient of skewness we need to compute the quartiles.
$Q_1$ = the value of the (N/4 = 100/4 = 25)th item
$Q_1$ class: 40-60
$Q_1 = l + \frac{N/4 - c.f.}{f} \times h = 40 + \frac{25 - 17}{32} \times 20 = 45$
$Q_2$ = the value of the (N/2 = 100/2 = 50)th item
$Q_2$ class: 60-80
$Q_2 = l + \frac{N/2 - c.f.}{f} \times h = 60 + \frac{50 - 49}{40} \times 20 = 60.5$
$Q_3$ = the value of the (3N/4 = 300/4 = 75)th item
$Q_3$ class: 60-80
$Q_3 = l + \frac{3N/4 - c.f.}{f} \times h = 60 + \frac{75 - 49}{40} \times 20 = 73$
Bowley's coefficient of skewness:
$S_b = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{73 - 2 \times 60.5 + 45}{73 - 45} = \frac{-3}{28} = -0.1071$
Interpretation: The distribution is negatively skew.

(iii) To find the Pearsonian coefficient of skewness, first we find the moments. Note that $u = \frac{x - 50}{20}$.
Raw moments of U:
$\mu'_1 = \frac{\sum fu}{N} = 40/100 = 0.4$
$\mu'_2 = \frac{\sum fu^2}{N} = 116/100 = 1.16$
$\mu'_3 = \frac{\sum fu^3}{N} = 76/100 = 0.76$
Central moments of U:
$\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu'^2_1$ = variance = $1.16 - (0.4)^2 = 1$
$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu'^3_1 = 0.76 - 3 \times 1.16 \times 0.4 + 2 \times (0.4)^3$
$= 0.76 - 1.392 + 0.128 = -0.504$
Since the coefficient of skewness is independent of both origin and scale, $\beta_1$ of X and $\beta_1$ of U are the same.
$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-0.504)^2}{1^3} = 0.2540$
$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = -0.504$
Interpretation: The distribution is negatively skew.
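The grouped-data formulas used in Example 7.1 for the quartiles and the mode can be sketched as two small functions. The names `grouped_quantile` and `grouped_mode` are hypothetical, and the sketch assumes continuous classes of equal width given by their lower limits.

```python
def grouped_quantile(lower_limits, width, freqs, fraction):
    """Quantile of a grouped distribution: l + (target - c.f.) / f * h."""
    total = sum(freqs)
    target = fraction * total      # e.g. N/4 for Q1
    cumulative = 0                 # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


def grouped_mode(lower_limits, width, freqs):
    """Mode of a grouped distribution: l + (fm - f1) / (2 fm - f1 - f2) * h."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # modal class
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                    # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0       # succeeding class
    return lower_limits[i] + (fm - f1) / (2 * fm - f1 - f2) * width


lowers = [0, 20, 40, 60, 80]
freqs = [5, 12, 32, 40, 11]
q1, q2, q3 = (grouped_quantile(lowers, 20, freqs, p) for p in (0.25, 0.5, 0.75))
mode = grouped_mode(lowers, 20, freqs)
bowley = (q3 - 2 * q2 + q1) / (q3 - q1)
print(q1, q2, q3, round(mode, 4), round(bowley, 4))
```

For the marks distribution this gives quartiles 45, 60.5 and 73, a mode of about 64.3243 and a Bowley coefficient of about -0.1071, in line with parts (i) and (ii) of the solution.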


Example 7.2: From the information given below, compare the skewness of the two groups.
                   Group I   Group II
Median             22        25
Arithmetic Mean    24        22
S.D.               10        12

Solution: Since the mode is not given we use Karl Pearson's coefficient of skewness in the form
$S_k = \frac{3(\bar{x} - \text{median})}{\sigma}$
$S_k$ for group I = $\frac{3(24 - 22)}{10} = 0.6$
$S_k$ for group II = $\frac{3(22 - 25)}{12} = -0.75$
Interpretation: (i) Group I is positively skew whereas group II is negatively skew. (ii) Since $|S_k| = 0.75$ for group II is larger than that of group I, group II possesses more skewness.
Example 7.3: A distribution has mean 30, coefficient of variation 20% and coefficient of skewness 0.3. Find its mode.

Solution: C.V. = $\frac{\sigma}{\bar{x}} \times 100 = \frac{\sigma}{30} \times 100 = 20$, therefore $\sigma = 6$.
Further, coefficient of skewness = $\frac{\bar{x} - \text{mode}}{\sigma} = 0.3$
$\frac{30 - \text{mode}}{6} = 0.3$
30 - mode = 1.8
mode = 28.2

Example 7.4: In a certain frequency distribution the sum of upper and lower quartiles is 45 and the difference between them is 15. If the median is 20, find the coefficient of skewness.
Solution: Note that $Q_3 + Q_1 = 45$, $Q_3 - Q_1 = 15$, $Q_2 = 20$.
Bowley's coefficient of skewness is
$S_b = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} = \frac{(Q_3 + Q_1) - 2Q_2}{Q_3 - Q_1} = \frac{45 - 40}{15} = 0.3333$

Example 7.5: In a certain frequency distribution the upper quartile exceeds the median by 10 units, whereas the median exceeds the lower quartile by 7 units. Compute the coefficient of skewness.
Solution: $Q_3 - Q_2 = 10$, $Q_2 - Q_1 = 7$
Bowley's coefficient of skewness = $\frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{10 - 7}{10 + 7} = \frac{3}{17}$
Example 7.6: Given the following summary, draw box plots and compare the data sets. State your interpretations.
        Min.  Q1   Q2   Q3   Max.
Set A   12    15   20   25   28
Set B
10 14 20
Set C 3 17 27 32
Fig. 7.11: Box plots of Set A, Set B and Set C (vertical axis from 10 to 35)
Interpretations:
(1) Average of B < Average of C < Average of A.
(2) Spread of B < Spread of A < Spread of C.
(3) A is symmetric, B is positively skew, C is positively skew.
7.7 KURTOSIS AND TYPES OF KURTOSIS
We have studied in the preceding chapters three aspects of comparison of frequency distributions, viz. average, dispersion and symmetry. However, these three aspects are not enough for comparison. Two bell shaped, unimodal frequency distributions may have the same average, dispersion and the same amount of skewness, yet they may differ in the fourth aspect, viz. the relative height of the curve. This is referred to as kurtosis. A detailed discussion is given below.
Definition: Clark and Schkade defined kurtosis as the property of a distribution which expresses its relative peakedness.
Types of Kurtosis:
Thus, kurtosis is the height of a unimodal, bell shaped curve or, according to Karl Pearson, the convexity of the curve. The main reason for variation in height is variation in the concentration or proportion of observations around the mode. If the proportion of observations around the mode is larger, then the curve will exhibit a sharper or higher peak. On the other hand, lower concentration around the mode will cause the curve to have a blunt peak or a peak with small height. The curves are classified in three groups according to the relative peakedness.
In this regard, the normal distribution is considered to be the standard. Distributions having peakedness equal to that of the normal distribution are called mesokurtic distributions. A distribution having more peak than that of the normal distribution is called a leptokurtic distribution; if it has less peak than that of the normal distribution then the distribution is called platykurtic (see Fig. 7.12).

Fig. 7.12: Leptokurtic, mesokurtic and platykurtic curves

7.8 MEASURES OF KURTOSIS ($\beta_2$)
Measurement of kurtosis using a figure poses several difficulties such as inaccuracy, subjectivity and lack of uniformity in scales. Moreover, curves with larger variance tend to have a small peak and vice versa. Considering all these facts, measures based on central moments, called the Pearsonian coefficients $\beta_2$ and $\gamma_2$, are used to measure kurtosis and are defined as follows:

$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ and $\gamma_2 = \beta_2 - 3$
Note:
1. $\gamma_2$ is called kurtosis or excess of kurtosis.
2. $\beta_2$ and $\gamma_2$ are invariant to change of origin and scale.
3. $\beta_2$ and $\gamma_2$ are both free from units.
4. $\beta_2$ and $\gamma_2$ cannot be used for qualitative data or for frequency distributions having open end classes. A measure based on quartiles and percentiles is used in this situation. It is denoted by $K_u$ and given by:
$K_u = \dfrac{(Q_3 - Q_1)/2}{P_{90} - P_{10}}$
For the normal distribution $K_u = 0.263$.
The detailed discussion is out of the scope of the book.
5. The moments used to find $\beta_2$ and $\gamma_2$ are the corrected ones.
Interpretation of $\beta_2$ and $\gamma_2$:
If $\beta_2 < 3$ i.e. $\gamma_2 < 0$, the distribution is platykurtic.
If $\beta_2 = 3$ i.e. $\gamma_2 = 0$, the distribution is mesokurtic.
If $\beta_2 > 3$ i.e. $\gamma_2 > 0$, the distribution is leptokurtic.
Result 1: $\beta_2 \ge 1$.
Result 2: $\beta_2 \ge \beta_1 + 1$.
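Example 7.7: The first four raw moments of a frequency distribution are 2, 20, 40, 200 respectively. Comment on the nature of kurtosis.
Solution: We are given that $\mu_1' = 2$, $\mu_2' = 20$, $\mu_3' = 40$, $\mu_4' = 200$. We need to find $\mu_2$ and $\mu_4$.
$\mu_2 = \mu_2' - \mu_1'^2 = 20 - 4 = 16$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4$
$= 200 - 4 \times 40 \times 2 + 6 \times 20 \times 4 - 3 \times 2^4 = 312$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{312}{16^2} = 1.2188$ and $\gamma_2 = \beta_2 - 3 = -1.7812$

Interpretation: Since $\beta_2 < 3$, the distribution is platykurtic.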
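
The conversion from raw moments to central moments and then to $\beta_2$ can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `central_from_raw`, `kurtosis_coefficients` and `classify` are hypothetical helpers introduced here, not part of any library, and the tolerance used to decide "mesokurtic" is an assumption.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert the first four raw moments into central moments (mu2, mu3, mu4)."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu2, mu3, mu4


def kurtosis_coefficients(m1, m2, m3, m4):
    """Return (beta2, gamma2) computed from the first four raw moments."""
    mu2, _, mu4 = central_from_raw(m1, m2, m3, m4)
    beta2 = mu4 / mu2**2
    return beta2, beta2 - 3


def classify(beta2, tol=1e-9):
    """Label the distribution by comparing beta2 with 3 (tol is an assumed tolerance)."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


# Raw moments of Example 7.7
beta2, gamma2 = kurtosis_coefficients(2, 20, 40, 200)
print(beta2, gamma2, classify(beta2))
```

With the raw moments of Example 7.7 the sketch reproduces $\mu_2 = 16$ and $\mu_4 = 312$, so the classification agrees with the hand calculation.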



POINTS TO REMEMBER
Karl Pearson's coefficient of skewness $S_k = \dfrac{\text{Mean} - \text{Mode}}{\text{S.D.}}$

Bowley's coefficient of skewness $S_B = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)}$, $-1 \le S_B \le 1$.

Pearsonian coefficient of skewness $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$

Coefficient of kurtosis $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, $\gamma_2 = \beta_2 - 3$.

If the coefficient of skewness is negative, the distribution is negatively skew; if it is positive, the distribution is positively skew; if it is 0, the distribution is symmetric.
If $\beta_2 > 3$, the distribution is leptokurtic; if $\beta_2 = 3$, the distribution is mesokurtic; if $\beta_2 < 3$, the distribution is platykurtic.
EXERCISE
(A) Theory Questions:
Skewness :

1. Explain the term 'skewness', using suitable diagrams. Explain the different types of skewness. Draw sketches to indicate the relative positions of mean, mode and median for these types.
2. Explain the following measures of skewness: (i) Karl Pearson's coefficient of skewness, (ii) Pearsonian coefficient of skewness based on moments, (iii) Bowley's coefficient of skewness.
3. If the mode is indeterminate, how is Karl Pearson's coefficient of skewness computed?
4. (a) Show that Bowley's coefficient of skewness lies between -1 and 1.
(b) State the relative positions of quartiles and draw sketches in case the distribution is (i) symmetric, (ii) positively skew, (iii) negatively skew.
5. Write a note on skewness.
6. Show that the measures of skewness are independent of both change of origin and scale.
7. Define skewness, state the types of skewness and state the various measures of skewness. Which of the measures are suitable for qualitative data?
8. In each of the following cases state the relative positions of mean, mode and median: (i) positively skew distribution, (ii) negatively skew distribution.
Kurtosis:
9. State what is 'kurtosis'.
10. What are the types of kurtosis?
11. Explain the use of moments in measuring kurtosis.
12. State the Pearsonian measure of kurtosis.
13. Show that $\beta_2 \ge 1$. Notations have their usual meanings.
(B) Numerical Problems :
1. For a group of 10 items, $\sum x = 452$, $\sum x^2 = 24270$ and mode = 43.7. Find the coefficient of skewness by an appropriate formula.
Ans.: 0.08
2. Given that arithmetic mean = 160, mode = 157, $\sigma = 50$, find (i) Karl Pearson's coefficient of skewness, (ii) median, (iii) coefficient of variation.
Ans.: (i) $S_k$ = 0.06 (ii) Median = 159 (iii) C.V. = 31.25%
3. The arithmetic mean, standard deviation and Karl Pearson's coefficient of skewness of a frequency distribution are 29.6, 6.5 and 0.32 respectively. Find the mode and median.
Ans.: Mode = 27.52, median = 28.9
4. The first three moments of a certain variable about the value 1 are 2, 25 and 80. Find the coefficient of skewness $\gamma_1$ and interpret the result.
Ans.: -0.5619
5. Show that the following series of observations is symmetric using the Pearsonian coefficient of skewness $\gamma_1$:
1, 2, 3, ..., n.
Ans.: 0
6. For a moderately symmetrical distribution the arithmetic mean, median and Karl Pearson's coefficient of skewness are 86, 80, 0.42 respectively. Find the mode and coefficient of variation.
Ans.: C.V. = 49.8339%, Mode = 68
7. The first three moments about the value 3 for a certain distribution are 1, 16, -40 respectively. Find the mean, variance and third central moment. Also find $\gamma_1$ and comment on the nature of skewness.
Ans.: Mean = 4, Variance = 15, $\mu_3$ = -86, $\gamma_1$ = -1.4803

8. For a symmetric distribution, with usual notation, prove that:


9. For a frequency distribution, Bowley's coefficient of skewness is 0.6. The sum of the first and third quartiles is 100 and the median is 38. Find the two quartiles.
Ans.: $Q_1$ = 30, $Q_3$ = 70
10. For the following frequency distribution of marks of candidates, find the Bowley's coefficient of skewness. Also draw
box plot and interpret.
Marks 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80

No, of candidates 5 25 40 70 90 40 20 10

Ans.: -0.1101
11. Compute coefficient of skewness for the following frequency distribution using appropriate formula :
Age (Year) No. of deaths
under 10 15
under 20 25
under 30 48
under 40 70
under 50 95
under 60 105
under 70 110
under 80 120
Also draw box plot and comment on the symmetry.
Ans.:-0.02846
12. Obtain (i) Karl Pearson's coefficient of skewness, (ii) Pearson's coefficient of skewness $\gamma_1$ based on moments, for the following data:
Height (in inches) 59-61 61-63 63-65 65-67 67-69
No. of students 4 30 45 15 6
Ans.: (i) 0.1238 (ii) 0.4926
13. For two distributions A and B the following summary statistics are available.
         A    B
Median   20   24
Q1       13   14
Q3       30   31
Compare the skewness of the two distributions (i) using an appropriate measure of skewness, (ii) using box plot.
Ans.: $S_B$ for A = 0.1765, $S_B$ for B = -0.1765
14. The first four moments about 4 of a certain distribution are 1.5, 17, -30 and 308. Find the coefficient of skewness and the coefficient of kurtosis and interpret.
Ans.: $\beta_2$ = 1.5734, platykurtic; $\beta_1$ = 0.7917, negatively skew.
15. Find the coefficients of skewness and kurtosis based on moments for the following frequency distribution.
Class 0-10 10-20 20-30 30-40
Frequency 1 3 4 2
Ans.: $\gamma_1$ = -0.1975, $\beta_2$ = 2.258, negatively skew and platykurtic.
16. Given that n = 100, $\sum x = -10$, $\sum x^2 = 140$, $\sum x^3 = -40$, $\sum x^4 = 560$. Find $\beta_1$, $\beta_2$ and comment upon the nature of skewness and kurtosis of the distribution.
Ans.: $\beta_1$ = 0.000121, $\beta_2$ = 2.8589, moderate positively skew and platykurtic.
17. If $\mu_1' = 1$, $\mu_2' = 4$, $\mu_3' = 10$, $\mu_4' = 46$, compute $\beta_1$, $\beta_2$; hence comment upon the nature of the distribution regarding skewness and kurtosis.
Ans.: $\beta_1$ = 0, $\beta_2$ = 3, symmetric and mesokurtic.
18. Is the following information consistent?
$\mu_1' = 2$, $\mu_2' = 20$, $\mu_3' = 40$, $\mu_4' = 50$
Justify your answer.
Ans.: Inconsistent, $\mu_4 < \mu_2^2$
19. Given that $\beta_2 = 2.6$, $\beta_1 = 0.19$, $\mu_2 = 1.2$. Find $\mu_3$ and $\mu_4$.
Ans.: 0.573, 3.744
20. Variance of a mesokurtic distribution is 4. Find $\mu_4$.
Ans.: 48
21. The first four moments about 5.2 are 0, 5.16, -2.3 and 60; find $\beta_1$ and $\beta_2$ and interpret.
Ans.: 0.0385, 2.2537, negatively skew and platykurtic.
22. Given that $\sum f = 100$, $\sum fu = 100$, $\sum fu^2 = 4000$, $\sum fu^3 = 24500$, $\sum fu^4 = 1386000$. Find $\beta_1$ and $\beta_2$.
Ans.: 0.255, 8.6239.
23. Find $\beta_1$, $\beta_2$, $\gamma_1$, $\gamma_2$ for the following frequency distribution.
Daily wages 70-90 90-110 110-130 130-150 150-170
No. of workers 16 22 36 18 8
Ans.: $\beta_1$ = 0.006261, $\gamma_1$ = 0.0791, $\beta_2$ = 2.3058, $\gamma_2$ = -0.6942.
(C) Simple Numerical Problems:
1. For a moderately skew distribution the mean is 29.6, the standard deviation is 6.5 and $(S_k)_p$ is 0.32. Find the mode and median.
2. Given that: Arithmetic mean = 225, Mode = 213, Variance = 164. Calculate Karl Pearson's coefficient of skewness and comment on the nature of the distribution.
The mean, median and coefficient of variation of hourly wages of a group of workers are R 45, 42 and 40%
3.
skewness for the distribution of wages.
Tespectively. Find (i) mode. (i variance and (i) coefficient of calculate B, and interprete.
La = 3.75,
4.
Deine r cement moment Given that ls = 12 and 1.22 respectively. Find values of
5 The first four central moments of a distribution are 0, 9.2, 3.6 and appropriate
measures of skewnes
For a certain and kurtosis.
distribution mean =Also, interpret=the
10, variance 16, resuits.
Y = + 1 and B, = 4. Obtain the values of third and fourth
6
the distribution.
Central four rawAlso
The firstmoments. comment
moments areon1, the
4, 10 andof46. Compute skewness and kurtosis of the distribution using a
nature
7
set.
unonthe skewness and kurtosis of data
measure based on moments Comment M,=2, find l, and ..
Define th central moment for afrequency distribution. Given that : Bi = 0.5,F) = 2.5,
For a certain data set, the following results were obtained: Mean = 90, Mode = 96, Coefficient of skewness =-
variation.
0.4. Find the values of standard devjation and coefficient mode ie o0
distribution is 0.4 and coefficient of variation is 30%. Its
10. Pearson's coefficient of skewness for a
the mean and are median, (Assume that mean is positive).
moments. AISO, arrange following distributioni
11. What is Kurtosis ? State a measure of Kurtosis based on
ascendingorder of magnitude
(a) Mesokurtic distribution
(b) Leptokurtic distribution
(c) Platykurtic distribution.
(D) State whether the following statements are True or False.
1. All the coefficients of skewness are unitless measures.
Ans.: True
2. For a symmetric distribution the quartiles are equispaced.
Ans.: True
3. Bowley's coefficient of skewness cannot be negative.
Ans.: False
4. If mean, mode, median coincide then the distribution is symmetric.
Ans.: True
5. For negatively skew distribution left tail is more elongated.
Ans.: True
6. The coefficients of kurtosis $\beta_2$ and $\gamma_2$ are unitless.
Ans.: True
7. The coefficient of kurtosis is always greater than unity.
Ans.: True
8. The box-plot helps find the kurtosis.
Ans.: False
9. A symmetric distribution is always mesokurtic.
Ans.: False
10. If two distributions have the same value of $\beta_2$, the coefficient of kurtosis, then the frequency curves A and B will have exactly the same height. Assume that both are plotted with the same scale.
Ans.: False
Chapter 10
CONTINUOUS PROBABILITY DISTRIBUTIONS
(A) CONTINUOUS RANDOM VARIABLES
10.1 INTRODUCTION
We have seen that there are three types of sample spaces: (i) finite, (ii) countably infinite and (iii) continuous. In this chapter, we discuss random variables defined on a continuous sample space.
A sample space which is finite or countably infinite is called denumerable or countable. If the sample space is not countable, then it is called continuous. In other words, for a continuous sample space we cannot have a one-to-one correspondence between $\Omega$ and the set of natural numbers {1, 2, ...}.
Illustrations of Uncountable Sample Space:
(1) Suppose the weight of an oil bag having capacity of 1 kg, filled by an automatic filling machine, is noted. The sample space will be an interval in the neighbourhood of 1 kg such as $\Omega$ = (0.980, 1.005).
(2) Suppose in an experiment, the life of an electronic component in hours is recorded. The sample space in this case may be an interval as a part of $R^+$ such as $\Omega$ = (0, 5000).
Note: A continuous sample space is an interval on the real line.
10.2 CONTINUOUS RANDOM VARIABLE
In general we define a random variable $X(\omega)$ as a real valued function on the domain $\Omega$. If the range set of $X(\omega)$ is continuous, the r.v. is continuous. The range set will be a subset of the real line.
Illustrations of Continuous r.v.:
(1) Weight of a person in kg.
(2) Consumption of electricity of a town in a specific month.
(3) Daily rainfall in mm at a particular place.
(4) Instrumental error (measured in suitable units) in the measurement.
(5) Life in hours of an electrical component.
Note: The distinction between a continuous random variable and a discrete random variable is as follows.
A continuous r.v. takes all possible values in a range set. The set is in the form of an interval. On the other hand, a discrete r.v. takes only specific or isolated values.
Since a continuous r.v. takes uncountably infinite values, no probability mass can be attached to a particular value of the r.v. X. Therefore, P(X = x) = 0 for all x. However, in case of a discrete r.v., probability mass is attached to the individual values taken by the r.v.
In case of a continuous r.v., probability is attached to an interval which is a subset of R.
10.3 CONTINUOUS PROBABILITY DISTRIBUTION
In case of a discrete r.v., using the p.m.f. we get the probability distribution of the r.v.; however, in case of a continuous r.v., probability is not attached to any particular value. It is attached to an interval. The probability attached to an interval depends upon its location. For example, P(a < X < b) varies for different values of a and b. In other words, it will not be uniform. In order to obtain the probability associated with any interval, we need to take into account the concept of probability density.
A function f(x) which is to be treated as a probability density function should be a non-negative and continuous function of x. The probability that a variable X takes values in a small interval $\left(x - \frac{\delta x}{2},\ x + \frac{\delta x}{2}\right)$ will be the product of the length of the interval and the value of the density function f(x) at the centre of the interval:
$P\left(x - \frac{\delta x}{2} < X < x + \frac{\delta x}{2}\right) = f(x)\,\delta x$

Note:
(1) Here we assume that the probability density is constant over the interval $\left(x - \frac{\delta x}{2},\ x + \frac{\delta x}{2}\right)$. This assumption will not be valid for a large interval. To overcome this difficulty we integrate f(x) w.r.t. x over the given interval. (In case of a discrete r.v. we take summation.) Thus $P(a < X < b) = \int_a^b f(x)\,dx$.
(2) The above probability is a definite integral; hence geometrically it is the area under the curve y = f(x) bounded by the X axis and the ordinates at a and b.
Precise definition of probability density function is as follows.
Definition: A real valued function f(x) is called a probability density function (p.d.f.) of a continuous random variable X if
(i) $f(x) \ge 0$ for all x;
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Note: (i) Since the probability associated with any individual value is zero, $P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) = \int_a^b f(x)\,dx$ = area under the curve f(x) bounded between the X axis and the ordinates at a and b. It is shown in Fig. 10.1 by the shaded region.

Fig. 10.1: Area under y = f(x) between the ordinates at a and b

(ii) $\int_{-\infty}^{\infty} f(x)\,dx$ can be interpreted as $P(-\infty < X < \infty)$. It also represents the total area under the density curve bounded by the X axis. Thus the total area under the curve = $P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$.

(iii) If $A \subset R$, then $P(X \in A) = \int_A f(x)\,dx$.

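SOLVED EXAMPLES
Example 10.1: Verify which of the following functions are p.d.f.s.
(i) $f(x) = 3x^2$, $0 \le x < 1$
  = 0, otherwise
(ii) $f(x) = 2e^{-x}$, x > 0
  = 0, otherwise
Solution: To verify whether f(x) is a p.d.f. we need to verify the following two conditions: (a) $f(x) \ge 0$ for all x and (b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

(i) $f(x) = 3x^2 \ge 0$ for all x and $\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 3x^2\,dx = \left[x^3\right]_0^1 = 1$.

Therefore, f(x) is a p.d.f.

(ii) $f(x) = 2e^{-x} > 0$ for all x > 0 and $\int_0^{\infty} 2e^{-x}\,dx = 2\int_0^{\infty} e^{-x}\,dx = 2 \ne 1$.

Hence f(x) is not a p.d.f.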
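
The two p.d.f. conditions can also be checked numerically. The following is a rough sketch, not a general-purpose tool: `simpson` is a hypothetical helper implementing composite Simpson's rule, and the infinite upper limit in case (ii) is replaced by 50, an assumed cut-off beyond which $2e^{-x}$ contributes a negligible area.

```python
import math


def simpson(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] by composite Simpson's rule."""
    if n % 2:          # Simpson's rule needs an even number of sub-intervals
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# (i) f(x) = 3x^2 on [0, 1): total area should be 1
area_i = simpson(lambda x: 3 * x**2, 0.0, 1.0)

# (ii) f(x) = 2e^{-x} on (0, infinity), truncated at the assumed cut-off 50
area_ii = simpson(lambda x: 2 * math.exp(-x), 0.0, 50.0)

print(area_i, area_ii)
```

The first area comes out as 1 (up to rounding) and the second as approximately 2, in line with the conclusion that only (i) is a p.d.f.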


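Example 10.2: Find the constant k such that the following functions are p.d.f.s.
(i) $f(x) = k(x - 1)^2$, 1 < x < 3
  = 0, otherwise
(ii) $f(x) = \dfrac{k}{1 + x^2}$, $-\infty < x < \infty$

Solution: (i) Since x > 1, $f(x) = k(x-1)^2 \ge 0 \Rightarrow k \ge 0$ ... (a)

$\int_{-\infty}^{\infty} f(x)\,dx = \int_1^3 k(x-1)^2\,dx = 1$

$k\left[\dfrac{(x-1)^3}{3}\right]_1^3 = 1$

$\dfrac{k}{3}\,[8 - 0] = 1$

$k = \dfrac{3}{8}$ ... (b)

Thus from (a) and (b) we get $k = \dfrac{3}{8}$, therefore $f(x) = \dfrac{3}{8}(x-1)^2$.

(ii) $f(x) = \dfrac{k}{1 + x^2} \ge 0$ implies that $k \ge 0$ ... (a)

$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \dfrac{k}{1 + x^2}\,dx = 1$

$k\left[\tan^{-1}(x)\right]_{-\infty}^{\infty} = k\left[\tan^{-1}(\infty) - \tan^{-1}(-\infty)\right] = 1$

$k\left[\pi/2 - (-\pi/2)\right] = 1$
$k\pi = 1$
$k = 1/\pi$ ... (b)

From equations (a) and (b) we get $k = \dfrac{1}{\pi}$, therefore

$f(x) = \dfrac{1}{\pi} \cdot \dfrac{1}{1 + x^2}$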
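Example 10.3: Suppose a continuous r.v. X has p.d.f.
$f(x) = \dfrac{x^2}{3}$, -1 < x < 2
  = 0, otherwise
If A = {x | x > 0}, B = {x | -1/2 < x < 1/2}, find P(A), P(B), P(A'), P(A ∩ B), P(A ∪ B), P(A' ∩ B), P(A' ∪ B'), P(A' ∩ B').
Solution:
(i) $P(A) = P(X > 0) = \int_0^2 f(x)\,dx = \int_0^2 \dfrac{x^2}{3}\,dx = \left[\dfrac{x^3}{9}\right]_0^2 = \dfrac{8}{9}$
(ii) $P(B) = P\left(-\tfrac12 < X < \tfrac12\right) = \int_{-1/2}^{1/2} \dfrac{x^2}{3}\,dx = \left[\dfrac{x^3}{9}\right]_{-1/2}^{1/2} = \dfrac{1}{36}$
(iii) $P(A') = 1 - P(A) = 1 - \dfrac{8}{9} = \dfrac{1}{9}$
(iv) $P(A \cap B) = P\left(0 < X < \tfrac12\right) = \int_0^{1/2} \dfrac{x^2}{3}\,dx = \left[\dfrac{x^3}{9}\right]_0^{1/2} = \dfrac{1}{72}$
(v) $P(A \cup B) = P(A) + P(B) - P(A \cap B) = P\left(-\tfrac12 < X < 2\right) = \dfrac{8}{9} + \dfrac{1}{36} - \dfrac{1}{72} = \dfrac{65}{72}$
(vi) $P(A' \cap B) = P(B) - P(A \cap B) = P\left(-\tfrac12 < X < 0\right) = \dfrac{1}{36} - \dfrac{1}{72} = \dfrac{1}{72}$
(vii) $P(A' \cup B') = 1 - P(A \cap B) = 1 - \dfrac{1}{72} = \dfrac{71}{72}$
(viii) $P(A' \cap B') = 1 - P(A \cup B) = 1 - \dfrac{65}{72} = \dfrac{7}{72}$
Note: In case of a discrete distribution $P(x) \le 1$; however, in case of a continuous distribution the p.d.f. f(x) may be greater than 1. In the above problem $f(2) = \dfrac{2^2}{3} > 1$, but $\int_a^b f(x)\,dx \le 1$.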
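
The interval probabilities in Example 10.3 can be reproduced exactly with rational arithmetic. The sketch below assumes the p.d.f. is $f(x) = x^2/3$ on (-1, 2), which is the density consistent with the answers of the example; `F` and `prob` are hypothetical helper names.

```python
from fractions import Fraction


def F(x):
    """Antiderivative x^3 / 9 of the assumed p.d.f. f(x) = x^2 / 3."""
    x = Fraction(x)
    return x**3 / 9


def prob(a, b):
    """P(a < X < b) for a, b inside the support (-1, 2)."""
    return F(b) - F(a)


half = Fraction(1, 2)
p_A = prob(0, 2)                    # A = {x > 0}
p_B = prob(-half, half)             # B = {-1/2 < x < 1/2}
p_A_and_B = prob(0, half)
p_A_or_B = p_A + p_B - p_A_and_B

results = {
    "P(A)": p_A,
    "P(B)": p_B,
    "P(A')": 1 - p_A,
    "P(A and B)": p_A_and_B,
    "P(A or B)": p_A_or_B,
    "P(A' and B)": p_B - p_A_and_B,
    "P(A' or B')": 1 - p_A_and_B,
    "P(A' and B')": 1 - p_A_or_B,
}
for name, value in results.items():
    print(name, "=", value)
```

Using `Fraction` avoids rounding, so the printed values can be compared directly with the fractions obtained by hand.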
10.4 DISTRIBUTION FUNCTION (D.F.) OR CUMULATIVE DISTRIBUTION FUNCTION (C.D.F.)
The distribution function is defined in case of a continuous r.v. analogous to that of a discrete r.v. The summation is to be replaced by integration.
The distribution function is an important entity in the field of statistical inference, reliability theory, life testing etc.
Definition: Let X be a continuous r.v. with p.d.f. f(x). The distribution function or cumulative distribution function, denoted by F(x), is defined as
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$

Note: A r.v. X is defined to be continuous if F(x) is continuous.
Example 10.4: A r.v. X has p.d.f.
$f(x) = 2e^{-2x}$, x > 0
  = 0, otherwise
Find its distribution function.
Solution: By definition,
$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^x 2e^{-2t}\,dt = \left[-e^{-2t}\right]_0^x = 1 - e^{-2x}$

Since F(x) is defined for all $x \in R$ we have to write

F(x) = 0, $x \le 0$
  = $1 - e^{-2x}$, x > 0
Note:
(1) We have seen above how to find the d.f. given the p.d.f. We can also find the p.d.f. from the d.f. as follows:
$f(x) = \dfrac{d}{dx} F(x)$
(2) Given the d.f. F(x) we can find P(a < X < b) as follows.
P(a < X < b) = P(X < b) - P(X < a)
= F(b) - F(a)
We state below the properties of distribution function.
Properties of Distribution Function:
(1) Non-negative: F(x) is a non-negative function. That is, $F(x) \ge 0$ for all x.
(2) Non-decreasing: F(x) is non-decreasing. That is, if a < b then $F(a) \le F(b)$.
(3) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$ and $F(\infty) = \lim_{x \to \infty} F(x) = 1$.
(4) Continuity: F(x) is a continuous function for a continuous r.v. X.
The graph of F(x) of a continuous r.v. is a smooth continuous curve as shown in Fig. 10.2.

Fig. 10.2: Graph of the distribution function F(x), rising from 0 to 1

The above stated properties of distribution function are called characteristic properties. It means every distribution function must satisfy these properties and any function satisfying these properties is a distribution function of some r.v.

Example 10.5: The life of a tubelight (X in hours) follows the distribution given by the p.d.f. f(x):
$f(x) = \dfrac{k}{x^2}$, 1000 < x < 2000
  = 0, elsewhere
Determine the constant k and the distribution function of X. Also compute the probability that a tubelight selected at random will have life between 1500 hours and 1600 hours.
Solution: Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, we have

$\int_{1000}^{2000} \dfrac{k}{x^2}\,dx = 1$

$k\left[-\dfrac{1}{x}\right]_{1000}^{2000} = k\left(\dfrac{1}{1000} - \dfrac{1}{2000}\right) = 1$

k = 2000
The distribution function F(x) is given by

$F(x) = P[X \le x] = \int_{-\infty}^{x} f(t)\,dt = 2000\int_{1000}^{x} \dfrac{1}{t^2}\,dt = 2000\left(\dfrac{1}{1000} - \dfrac{1}{x}\right)$

$F(x) = 2 - \dfrac{2000}{x}$, 1000 < x < 2000.
The probability that the life of a tube is between 1500 and 1600 hours is given by
$P(1500 < X < 1600) = F(1600) - F(1500) = 2000\left(\dfrac{1}{1500} - \dfrac{1}{1600}\right) = 0.0833$
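
The same calculation can be written as a short script. It is a sketch under the assumption, taken from the solution, that the density is $k/x^2$ on (1000, 2000); `k`, `F` and `p` are illustrative names.

```python
# Normalising constant: k * (1/1000 - 1/2000) = 1
k = 1 / (1 / 1000 - 1 / 2000)


def F(x):
    """Distribution function of the tubelight life X (in hours)."""
    if x <= 1000:
        return 0.0
    if x >= 2000:
        return 1.0
    return k * (1 / 1000 - 1 / x)


# P(1500 < X < 1600) = F(1600) - F(1500)
p = F(1600) - F(1500)
print(k, p)
```

The probability equals 2000(1/1500 - 1/1600) = 1/12, i.e. about 0.0833.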
10.5 MEAN AND VARIANCE
In order to find mean, variance etc. we need to learn the concept of expectation of a r.v. We have seen how to find the expectation of a discrete r.v. On similar lines it is defined for a continuous r.v. The summation sign used in case of a discrete r.v. is replaced by integration in case of a continuous r.v.
Definition: Let X be a continuous r.v. with p.d.f. f(x); then the mean or expectation of X is denoted by E(X) and is defined as
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
provided the integral exists.
If the p.d.f. of X is defined (non-zero) over an interval [a, b], then
$E(X) = \int_a^b x f(x)\,dx$
Expectation of a Function of X: If g(X) is a real valued function of a r.v. X, then E[g(X)] is given by
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
provided the integral exists.
Here also, whenever f(x) is defined over a finite interval [a, b], then
$E[g(X)] = \int_a^b g(x) f(x)\,dx$
Note: In order to find E[g(X)], we need not find the p.d.f. of g(X); it can be determined using the p.d.f. of X, viz. f(x).
Now we can define the variance of X as follows:
$\text{Var}(X) = E[X - E(X)]^2$
$= E(X^2) - (E(X))^2$
Example 10.6: If X is a r.v. with p.d.f.
f(x) = 6x(1 - x), 0 < x < 1
  = 0, otherwise
find (i) Mean (ii) Variance of X.
Solution:
Mean = $E(X) = \int_0^1 x f(x)\,dx = \int_0^1 x \cdot 6x(1-x)\,dx = 6\int_0^1 x^2\,dx - 6\int_0^1 x^3\,dx = 2 - 1.5 = 0.5$
$\text{Var}(X) = E(X^2) - [E(X)]^2$
Now, $E(X^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 6x^3(1-x)\,dx = 6\int_0^1 x^3\,dx - 6\int_0^1 x^4\,dx = 1.5 - 1.2 = 0.3$
Hence $\text{Var}(X) = 0.3 - (0.5)^2 = 0.05$
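
For a polynomial density the moments $E(X^r)$ can be integrated term by term. The sketch below represents $f(x) = 6x - 6x^2$ as a dictionary of coefficients; the representation and the helper name `moment` are assumptions made only for this illustration.

```python
from fractions import Fraction

# f(x) = 6x - 6x^2 on (0, 1), stored as {power: coefficient}
pdf = {1: Fraction(6), 2: Fraction(-6)}


def moment(r, pdf, a=0, b=1):
    """E(X^r): integrate x^r * f(x) term by term over (a, b)."""
    total = Fraction(0)
    for power, coef in pdf.items():
        n = power + r + 1          # exponent after integrating x^(power + r)
        total += coef * (Fraction(b) ** n - Fraction(a) ** n) / n
    return total


total_area = moment(0, pdf)        # should be 1 for a valid p.d.f.
mean = moment(1, pdf)
variance = moment(2, pdf) - mean**2
print(total_area, mean, variance)
```

The script gives total area 1, mean 1/2 and variance 1/20, matching 0.5 and 0.05 above.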
POINTS TO REMEMBER
A continuous r.v. takes all possible values in a range set. The set is in the form of an interval. On the other hand, a discrete r.v. takes only specific or isolated values.

A real valued function f(x) is called a probability density function (p.d.f.) of a continuous random variable X if (i) $f(x) \ge 0$ for all x and (ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

For a continuous r.v. the probability assigned to a particular value is zero.

$P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) = \int_a^b f(x)\,dx$ = area under the curve f(x) bounded between the X-axis and the ordinates at a and b.

If $A \subset R$, then $P(X \in A) = \int_A f(x)\,dx$.

Let X be a continuous r.v. with p.d.f. f(x). The distribution function or cumulative distribution function, denoted by $F_X(x)$, is defined as
$F_X(x) = P[X \le x] = \int_{-\infty}^{x} f(t)\,dt$

We can find the p.d.f. given the d.f. as follows:
$f(x) = \dfrac{d}{dx} F(x)$
$P(a < X \le b) = P(X \le b) - P(X \le a)$
= F(b) - F(a)
The characteristic properties of distribution function are:
(i) It is a non-negative function.
(ii) It is a non-decreasing function.
(iii) $F(-\infty) = 0$ and $F(+\infty) = 1$.
(iv) F(x) is a continuous function.
If X is a continuous r.v. with p.d.f. f(x), then the mean or expectation of X is denoted by E(X) and is defined as
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ provided the integral exists

$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
$\text{Var}(X) = E(X^2) - [E(X)]^2 = E[X - E(X)]^2$
EXERCISE 10 (A)
(A) Theoretical Questions
1. Define continuous random variable and explain the terms (i) probability density function, (ii) distribution function of a continuous r.v.
2. State the four characteristic properties of the distribution function of a continuous r.v.
3. Describe how the distribution function of a r.v. X is used in obtaining P(a < X < b).
(B) Numerical Problems
1. Verify which of the
following can be looked upon as p.d.f.of rv, X.
(a)
f (x) = 2 1
(b)
f(x) =
(c) xe-X X> 0
f(x) = X
= 2-x 0sxs1
(d)
f () = 1 1<x<2
(x + 1)2 x>0
AISICS ,AI & DS)
(10.9)
CONTINUOUS PRORABIIITY DISTRIBUTIONS
1
(e)
f() =e- k-ml ;
- 0< m
f (x) = <oo
sin x
Yes (b) Yes (c) Yes (d) Yes (e) Yes
() Yes
0sx<Tð2.
(9)
2 Findthe value of constant cin each of the following p.d.f.s.
As:

f () = cx3
0sx<1
b f () = C
-2 S x< 2
(C f(x) = c x (2-x) 0<xs 2
f(x) = Ce mX
X>0, m>0
(e f(x) = c sinx coS X; 0
<x< T/2.
4 (b) 1/n (c) 3/16 (d) m (e) 4
Ans:(a)
3. Suppose X is a r.v. with p.d.f.
$f(x) = \dfrac{1}{(1 + x)^2}$, x > 0.
If A = {1 < x < 9} and B = {0 < x < 4}, find P(A), P(B), P(A'), P(A ∩ B), P(A ∪ B), P(A' ∩ B), P(A' ∪ B'), P(A' ∩ B').
Ans.: P(A) = 0.4, P(B) = 0.8, P(A ∩ B) = 0.3, P(A ∪ B) = 0.9,
P(A' ∩ B) = 0.5, P(A' ∪ B') = 0.7, P(A' ∩ B') = 0.1.
4. IfY is a rv. with p.d.f.
k
f(y) = if 0 <y <4

= 0 elsewhere
find () the value of k
(i) distribution function of Y
(ii) P(1<Y<2)
Ans: (0) 1/4 (0) F() =0y s0 (Gi) V2 -1y/2 =Vy /2; 0sys4
= 1;y > 4
5. If X is a r.v. with distribution function
F(x) = 0, x < -1
  = (x + 1)/2, $-1 \le x < 1$
  = 1, $x \ge 1$
find (i) P(2 < X < 3)
(ii) p.d.f. of X
(iii) E(X), E(X²), Var(X).
Ans.: (i) 0 (ii) f(x) = 1/2; -1 < x < 1
(iii) E(X) = 0, E(X²) = Var(X) = 1/3
Arv.X has distribution function
F() = 0 X<5
25
= 1 x>5

Ans: 1J4find P(X >10),


STATISTICS (SE, AI &DS) (10.10) CONTINUOUS PROBABILITY DISTRIBUTI
7. If a r.v. X has p.d.f.
ONSONe
1 <x < 3
f (x) = X

then find (i) c(ii) E (X) (ii) var (X).


1 2 1
Ans.: ()
log 3 log 3 (ii) log 3 log 3)
8. The distribution of X is
X< - 1

F(x) = -1<xS2

1 X> 2
Evaluate P(0 < x< 1). Also obtain p.d.f. of X.
(C) State whether the following statements are true (T) or false (F) in question numbers 18 to 23.
1. IfA cR then P(xe A) = 1-J f(x) dx.
A
Ans.: False

2. If X is a continuous variable with distribution function


F(x) then P[X >x] = 1-F(x).
Ans.: True

3. If f() is p.d.f. of continuous random variable Xthen f(x) > 0, Vxe R.


Ans.: True
4. If the p.d.f. of a continuous random variable is given by;
f() = k (x 1)2 1<x<3
0 otherwise
then k =
3
Ans.: False
5. If X is a continuous r.v. with p.d.f. f(x) then

Var (X) = J x2f(x) dx

Ans.: False
b
6. If X is a continuous r.v. with p.d.f. f(x) then P(a < X <b) = J f(x) dx.
a
Ans.: True

(B) Normal Distribution (Gaussian Distribution)
10.6 INTRODUCTION
Normal distribution is one of the most commonly used distributions. Variables such as intelligence quotient, height of a person, weight of a person and errors in measurement of physical quantities follow normal distribution. Normal distribution is useful in statistical quality control, statistical inference, reliability theory, operations research, and educational and psychological statistics. In the theory of sampling and designs of experiment also, normal distribution plays an important role. Normal distribution works as a limiting distribution to several distributions such as binomial and Poisson.
Normal distribution was first discovered by De Moivre in 1733 as an approximation to binomial distribution for large n. The normal distribution is also called the Gaussian distribution. Gauss (1777-1855) and Laplace (1749-1827) brought out the important role of normal distribution in the field of astronomy as the distribution of errors in measurement. Quetelet (1796-1874) and Galton (1822-1911) fitted normal distribution to data on heights and weights of human beings as well as animals.
10.7 DEFINITION OF NORMAL DISTRIBUTION
A continuous random variable X is said to follow normal distribution with parameters $\mu$ and $\sigma^2$ if its p.d.f. is

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$

Note:
1. That a r.v. X follows normal distribution with parameters $\mu$ and $\sigma^2$ is symbolically written as $X \to N(\mu, \sigma^2)$.
2. If $\mu = 0$, $\sigma^2 = 1$, then the normal variable is called the standard normal variable i.e. N(0, 1). Generally it is denoted by Z.
The p.d.f. of Z is
$f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$, $-\infty < z < \infty$
with $\pi \approx 3.14159$ and $e \approx 2.71828$.
3. The probability density curve of $N(\mu, \sigma^2)$ is bell-shaped, symmetric about $\mu$ and mesokurtic as shown in Fig. 10.3.

Fig. 10.3: Probability density curve f(x) of $N(\mu, \sigma^2)$
Naturally, the curve of the standard normal distribution is symmetric around zero.
4. The maximum height of the probability density curve is $\dfrac{1}{\sigma\sqrt{2\pi}}$.
5. As the curve is symmetric about $\mu$, the mean, median and mode coincide and all are equal to $\mu$.
6. The parameter $\sigma^2$ is also the variance of X. Hence s.d.(X) = $\sigma$.
Relation between $N(\mu, \sigma^2)$ and N(0, 1)
Result 1: If $X \to N(\mu, \sigma^2)$ then $Z = \dfrac{X - \mu}{\sigma} \to N(0, 1)$. This result is useful while computing probabilities of a $N(\mu, \sigma^2)$ variable. The statistical tables give probabilities of a standard normal i.e. N(0, 1) variable.
10.8 COMPUTATION OF PROBABILITIES
If f(x) is the p.d.f. of X then $P(a < X < b) = \int_a^b f(x)\,dx$. It is equivalent to the area under the curve f(x), bounded by the X axis and the ordinates X = a, X = b.
If $X \to N(\mu, \sigma^2)$ then
$P(X > a) = \int_a^{\infty} \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$ ... (i)
The integral (i) cannot be evaluated directly by using the usual techniques. The values of the integral are tabulated only for the standard normal variable. The table is referred to as the 'normal probability integral table'. Thus to evaluate (i), we convert it in terms of N(0, 1) by using the transformation $Z = \dfrac{X - \mu}{\sigma}$.
The table gives the area of the right side tail for N(0, 1) i.e. P(Z > a), a > 0. See Fig. 10.4. In order to obtain the probability of any other region we use the following rules.

Fig. 10.4: P(Z > a) = table value

1. The total area under the probability density curve is 1.

2. The p.d.f. curve of N(0, 1) is symmetric about 0; therefore if we fold the curve about the Y axis (i.e. at Z = 0) the two parts coincide. Thus the area of the right tail and that of the symmetric left tail are the same. In other words, for $Z \to N(0, 1)$,
P(Z > a) = P(Z < -a)

Fig. 10.5: Equal tail areas P(Z < -a) and P(Z > a)
3. Area below the ordinate = 1 - Area above the ordinate
$P(Z < a) = 1 - P(Z \ge a)$
= 1 - (table value)
4. If $X \to N(\mu, \sigma^2)$ and we want to find $P[X \ge a]$, then we transform the variable X to Z, a N(0, 1) variable, and obtain the probability as follows:

$P[X \ge a] = P\left[\dfrac{X - \mu}{\sigma} \ge \dfrac{a - \mu}{\sigma}\right] = P\left[Z \ge \dfrac{a - \mu}{\sigma}\right]$
= table value

The following example will give a clear idea as to how the normal probabilities are computed using the tabulated values.
Example 10.7: Let $X \to N(3, 4)$. Find (i) P(X > 5) (ii) P(X < 1) (iii) P(X > 0) (iv) P(X < 6) (v) P(2 < X < 6) (vi) P(4 < X < 6) (vii) P(|X| > 4) (viii) P(|X - 3| < 3.92).
Solution: We first need to express the probabilities in terms of the standard normal variable $Z = \dfrac{X - \mu}{\sigma}$, where $\mu = 3$, $\sigma^2 = 4$ and hence $\sigma = 2$.
(i) $P(X > 5) = P\left(\dfrac{X - 3}{2} > \dfrac{5 - 3}{2}\right) = P(Z > 1)$
From the normal probability integral table, we get the area of the shaded region (Fig. 10.6) as
P(Z > 1) = 0.15866
(ii) $P(X < 1) = P\left(\dfrac{X - 3}{2} < \dfrac{1 - 3}{2}\right) = P(Z < -1)$ (Fig. 10.7)
= P(Z > 1) (due to symmetry)
= 0.15866 (from the table)
(iii) $P(X > 0) = P\left(Z > \dfrac{0 - 3}{2}\right) = P(Z > -1.5) = B$ (Fig. 10.8)
Since only the right tail area is given in the table, we use the fact that A + B = 1, where A = P(Z < -1.5).
P(Z > -1.5) = 1 - A = 1 - P(Z < -1.5)
= 1 - P(Z > 1.5) (due to symmetry)
= 1 - 0.066807 = 0.933193
(iv) $P(X < 6) = P\left(Z < \dfrac{6 - 3}{2}\right) = P(Z < 1.5) = A$ (Fig. 10.9)
= 1 - B (since A + B = 1)
= 1 - P(Z > 1.5) (from the table)
= 1 - 0.066807
= 0.933193
(v) $P(2 < X < 6) = P\left(\dfrac{2 - 3}{2} < Z < \dfrac{6 - 3}{2}\right) = P(-0.5 < Z < 1.5) = B$ (Fig. 10.10)
= 1 - A - C (since A + B + C = 1)
= 1 - P(Z < -0.5) - P(Z > 1.5)
= 1 - P(Z > 0.5) - P(Z > 1.5) (due to symmetry)
= 1 - 0.30854 - 0.066807
= 0.624653
(vi) $P(4 < X < 6) = P\left(\dfrac{4 - 3}{2} < Z < \dfrac{6 - 3}{2}\right) = P(0.5 < Z < 1.5) = A = (A + B) - B$ (Fig. 10.11)
= P(Z > 0.5) - P(Z > 1.5)
= 0.30854 - 0.066807 = 0.241733
(vii) P(|X| > 4) = P(X > 4) + P(X < -4)
= P(Z > 0.5) + P(Z < -3.5) (Fig. 10.12)
= P(Z > 0.5) + P(Z > 3.5) (due to symmetry)
= 0.30854 + 0.00023263
= 0.30877263
(viii) $P(|X - 3| < 3.92) = P\left(\left|\dfrac{X - 3}{2}\right| < \dfrac{3.92}{2}\right)$
= P(|Z| < 1.96)
= P(-1.96 < Z < 1.96) = B (Fig. 10.13)
= 1 - A - C
= 1 - 2C (due to symmetry, A = C)
= 1 - 2 × 0.024998
= 0.950004

Note: $P(X > \mu) = P(Z > 0) = \dfrac{1}{2}$ and $P(X < \mu) = P(Z < 0) = \dfrac{1}{2}$.

Hence $\mu$ is a median of X.
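
The table look-ups of Example 10.7 can be cross-checked with the complementary error function, since $P(Z > z) = \tfrac12\,\mathrm{erfc}(z/\sqrt{2})$. The sketch below uses only Python's standard `math` module; `upper_tail`, `z` and the dictionary `answers` are hypothetical names chosen for this illustration.

```python
import math


def upper_tail(z):
    """Right-tail area P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


MU, SIGMA = 3.0, 2.0          # X -> N(3, 4), so sigma = 2


def z(x):
    """Standardise x: z = (x - mu) / sigma."""
    return (x - MU) / SIGMA


answers = {
    "P(X > 5)": upper_tail(z(5)),
    "P(X < 1)": upper_tail(-z(1)),                  # symmetry: P(Z < -a) = P(Z > a)
    "P(X > 0)": 1 - upper_tail(-z(0)),
    "P(X < 6)": 1 - upper_tail(z(6)),
    "P(2 < X < 6)": 1 - upper_tail(-z(2)) - upper_tail(z(6)),
    "P(4 < X < 6)": upper_tail(z(4)) - upper_tail(z(6)),
    "P(|X| > 4)": upper_tail(z(4)) + upper_tail(-z(-4)),
    "P(|X - 3| < 3.92)": 1 - 2 * upper_tail(3.92 / SIGMA),
}
for name, value in answers.items():
    print(f"{name} = {value:.6f}")
```

Each expression mirrors the corresponding rule (symmetry, complement, difference of tails) used in the hand solution, so small differences from the tabulated values are due only to table rounding.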
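10.9 AREA BETWEEN THE ORDINATES $\mu - K\sigma$ AND $\mu + K\sigma$
If $X \to N(\mu, \sigma^2)$ then

(i) $P(\mu - \sigma < X < \mu + \sigma) = P\left(-1 < \dfrac{X - \mu}{\sigma} < 1\right) = P(-1 < Z < 1)$
= 0.68268

(ii) $P(\mu - 2\sigma < X < \mu + 2\sigma) = P\left(-2 < \dfrac{X - \mu}{\sigma} < 2\right)$
= P(-2 < N(0, 1) < 2)
= 0.9545
(iii) $P(\mu - 3\sigma < X < \mu + 3\sigma) = P(-3 < Z < 3)$
= 0.9973

Fig. 10.14: Areas 0.68268, 0.9545 and 0.9973 between the ordinates $\mu \pm \sigma$, $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$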
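
These three areas follow from the identity $P(-k < Z < k) = \mathrm{erf}(k/\sqrt{2})$, which can be evaluated with the standard library. The helper name `area_within` is an assumption for this sketch.

```python
import math


def area_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal variable X."""
    return math.erf(k / math.sqrt(2))


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"k = {k}: area = {area:.5f}")
```

The result does not depend on $\mu$ or $\sigma$, which is why the same three percentages apply to every normal distribution.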
10.10 ORDINATES WHICH INCLUDE 50%, 95%, 99% AREA
If $X \to N(\mu, \sigma^2)$, then we can find the ordinates equidistant from the median which cover area 50%, 95% and 99%.
$P(\mu - k\sigma < X < \mu + k\sigma) = 0.5$

$P\left(-k < \dfrac{X - \mu}{\sigma} < k\right) = 0.5$

P(-k < Z < k) = 0.5

From the table we get
P(-0.67 < Z < 0.67) = 0.5
$P(\mu - 0.67\sigma < X < \mu + 0.67\sigma) = 0.5$
Similarly $P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = 0.95$ and $P(\mu - 2.58\sigma < X < \mu + 2.58\sigma) = 0.99$

Fig. 10.15: Ordinates $\mu \pm 0.67\sigma$, $\mu \pm 1.96\sigma$ and $\mu \pm 2.58\sigma$ enclosing areas 0.5, 0.95 and 0.99
ADDITIVE PROPERTY

o)
relt 1:Suppose X’ N(, and Y’ N(Hz o) If Xand Y are independent random variables then X + Y follows

n
.lisation : If X1, X2, ... Xn are independent random variables such that Xi ’ N then Y = L X; ’ N
i=1
2

Corollary :
i X, X...,Xn are iid N (H, G) variables then :
n

(a) 2 X’N(nu, n¡?) and


1

(b) X

Result 2: If X’ N
( oi) and Y’ N (H2,o) are independent random variables
2 where, a, b, c are constants.
Then aX +bY + cN (au, + bå, + G, a²o 1+bo2)
Particular cases :
I. Ifa = 1, b=1, c= 0, we
get

aX + bY + c =X+ Y’N (H + z o +)
. Ita= 1, b=-1, c = 0,
we get
2
o)
aX +bY +c = X-Y’N(Ë- H2, i+
Result 3: If X1Xz .. nare independent random variables suchthat
X; ’ N( oi).
then +... anXn +b’NajuË +b, nEa;2 1
2
CONTINUOUS PROBABILITY DISTRIBUTIONS
STATISTICS (SE, AI & DS) (10.16)

10.12 NORMAL APPROXIMATION TO BINOMIAL DISTRIBUTION
If X follows binomial distribution with parameters n and p, then for large n and small p such that np is a fixed finite constant, we get the limiting distribution as Poisson distribution. However, for large n, without putting any condition on p, we get the limiting distribution of binomial as normal. The result in this connection is called De Moivre's theorem. It is given below:
De Moivre Theorem : If X follows binomial distribution with parameters n and p then the probability distribution of $\frac{X - np}{\sqrt{npq}}$ tends to N(0, 1) as $n \to \infty$. Note that E(X) = np and Var(X) = npq.
Note : This theorem has applications in developing tests for proportions.
Example 10.8 : A fair coin is tossed 400 times. Using normal approximation find the probability of getting :
(i) number of heads between 180 and 215.
(ii) number of heads less than 185.
Solution : A fair coin, after tossing, results in head or tail each with probability 0.5.
Let X = Number of heads in 400 tosses.
$\therefore X \sim B(400, 0.5)$
$E(X) = np = 400 \times 0.5 = 200$ and
$\text{Var}(X) = npq = 400 \times 0.5 \times 0.5 = 100$
(i) P = P(Number of heads is between 180 and 215)
$= P(180 < X < 215)$
$= P\left(\frac{180 - 200}{\sqrt{100}} < \frac{X - np}{\sqrt{npq}} < \frac{215 - 200}{\sqrt{100}}\right)$
Using normal approximation we get $Z = \frac{X - np}{\sqrt{npq}} \sim N(0, 1)$.
$\therefore P \approx P(-2 < Z < 1.5) = B$
$= 1 - A - C$
$= 1 - P(Z < -2) - P(Z > 1.5)$
$= 1 - P(Z > 2) - P(Z > 1.5)$ (due to symmetry)
$= 1 - 0.02275 - 0.066807$
$= 0.910443$

Fig. 10.16: Standard normal curve showing area B between -2 and 1.5, with tail areas A (below -2) and C (above 1.5).

(ii) P(Number of heads less than 185)
$= P(X < 185)$
$= P\left(\frac{X - np}{\sqrt{npq}} < \frac{185 - 200}{\sqrt{100}}\right)$ (using normal approximation)
$= P(Z < -1.5) = P(Z > 1.5)$ (due to symmetry)
$= 0.066807$
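The calculation in Example 10.8 can be reproduced as follows. This is a sketch under the same assumption as the text, namely the plain normal approximation without continuity correction; it relies on Python's `statistics.NormalDist`, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 400, 0.5
mean = n * p                      # E(X) = np = 200
sd = sqrt(n * p * (1 - p))        # sqrt(npq) = 10
approx = NormalDist(mean, sd)     # normal approximation to B(400, 0.5)

# (i) P(180 < X < 215), i.e. P(-2 < Z < 1.5)
p_between = approx.cdf(215) - approx.cdf(180)
# (ii) P(X < 185), i.e. P(Z < -1.5)
p_below = approx.cdf(185)

print(round(p_between, 6), round(p_below, 6))
```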
NORMAL APPROXIMATION TO POISSON DISTRIBUTION
Like the binomial distribution, the standardised Poisson distribution with parameter m also tends to the standard normal distribution.
Statement : If $X \sim$ Poisson(m), then the probability distribution of $\frac{X - m}{\sqrt{m}}$ tends to N(0, 1) as $m \to \infty$.
Example 10.9 : Let $X_1, X_2, \ldots, X_{100}$ be i.i.d. r.v.s with Poisson distribution with mean 4. Find approximately $P\left[\sum_{i=1}^{100} X_i > 430\right]$.
Solution : Using the additive property of Poisson distribution, we get
$Y = \sum_{i=1}^{100} X_i \sim P(400)$
Since m is large, we use normal approximation.
$\frac{Y - 400}{\sqrt{400}} = \frac{Y - 400}{20} \sim N(0, 1)$
$P\left[\sum_{i=1}^{100} X_i > 430\right] = P\left[\frac{Y - 400}{20} > \frac{430 - 400}{20}\right]$
$= P[N(0, 1) > 1.5]$
$= 0.066807$
Example 10.10 : Marks scored by a candidate in an examination follow normal distribution. Forty-four percent of the candidates obtained marks below 55 and 6 % of the candidates obtained marks above 80. Find the mean and variance of marks.

Solution : Let X = marks scored by a candidate. $X \sim N(\mu, \sigma^2)$. We are given that P(X < 55) = 0.44 and P(X > 80) = 0.06.
$P(X < 55) = P\left(\frac{X - \mu}{\sigma} < \frac{55 - \mu}{\sigma}\right) = P\left(Z < \frac{55 - \mu}{\sigma}\right) = 0.44$ ... (i)
From normal probability integral table we get
$P(Z < -0.15) = 0.44$ ... (ii)
From equations (i) and (ii) we get
$\frac{55 - \mu}{\sigma} = -0.15$
$\therefore 55 - \mu = -0.15\sigma$ ... (iii)

Fig. 10.17: Standard normal curve with area 0.44 to the left of z = -0.15.

Solving P(X > 80) = 0.06 we get one more equation in $\mu$ and $\sigma$.
$P(X > 80) = P\left(Z > \frac{80 - \mu}{\sigma}\right) = 0.06$ ... (iv)
From table we get
$P(Z > 1.555) = 0.06$ ... (v)
From equations (iv) and (v) we get
$\frac{80 - \mu}{\sigma} = 1.555$
$\therefore 80 - \mu = 1.555\sigma$ ... (vi)

Fig. 10.18: Standard normal curve with area 0.06 to the right of z = 1.555.

Taking ratio of (vi) to (iii) we get
$\frac{80 - \mu}{55 - \mu} = \frac{1.555}{-0.15} = -10.3667$
$\therefore \mu = 57.1994$
Putting $\mu = 57.1994$ in (iii) we get
$\sigma = 14.6628$
so that the variance is $\sigma^2 \approx 215.00$.
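Example 10.11 : If X is a random variable with p.d.f.
$f(x) = \frac{1}{2\sqrt{2\pi}} e^{-\frac{(x - 5)^2}{8}}, \quad -\infty < x < \infty$
Find (i) P(X > 6), (ii) P(2X + 3 > 14), (iii) E(3X - 2), (iv) Var(2X + 5).

Solution : $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} = \frac{1}{2\sqrt{2\pi}} e^{-\frac{(x - 5)^2}{8}}$
Comparing the p.d.f.s we get
$\mu = 5, \sigma^2 = 4$
$\therefore X \sim N(5, 4)$
(i) $P(X > 6) = P\left(\frac{X - 5}{2} > \frac{6 - 5}{2}\right) = P(Z > 0.5) = 0.30854$
(ii) $2X + 3 \sim N(\mu', \sigma'^2)$
where $\mu' = E(2X + 3) = 2\mu + 3 = 10 + 3 = 13$
and $\sigma'^2 = \text{Var}(2X + 3) = 4\,\text{Var}(X) = 16$
$\therefore 2X + 3 \sim N(13, 16)$
$P(2X + 3 > 14) = P\left(\frac{2X + 3 - 13}{4} > \frac{14 - 13}{4}\right)$
$= P(Z > 0.25) = 0.40129$
(iii) $E(3X - 2) = 3E(X) - 2 = 15 - 2 = 13$
(iv) $\text{Var}(2X + 5) = \text{Var}(2X) = 4\,\text{Var}(X) = 16$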
Example 10.12 : Suppose heights of soldiers follow normal distribution with mean 170 cm and variance 50 cm². In a regiment of 1000 soldiers, how many would you expect to be over 180 cm tall ?
Solution : Let X = height of a soldier
$X \sim N(170, 50)$
Proportion of soldiers having height above 180 cm
$= P(X > 180)$
$= P\left(Z > \frac{180 - 170}{\sqrt{50}}\right)$
$= P(Z > 1.4142) \approx 0.07927$ (table value at z = 1.41)
No. of soldiers out of 1000 having height above 180 cm
= 1000 × Proportion of soldiers having height above 180 cm
= 1000 × 0.07927 = 79.27 ≈ 79.
Normal Probabilities using MS-EXCEL
Problem 1 : If x follows normal distribution with mean 21.76923 and standard deviation 5.788302, find P(x ≤ 10).
Solution :
1. Choose function category Statistical and function name NORMDIST. The following window will appear.

Fig. 10.19: Insert Function dialogue box with category Statistical and function NORMDIST(x, mean, standard_dev, cumulative) selected. It returns the normal cumulative distribution for the specified mean and standard deviation.

Click OK to get the following dialogue box.

Fig. 10.20: Function Arguments dialogue box for NORMDIST with X = 10, Mean = 21.76923, Standard_dev = 5.788302, Cumulative = TRUE. Cumulative is a logical value: for the cumulative distribution function, use TRUE; for the probability mass function, use FALSE. Formula result = 0.021012197.

2. In the dialogue box enter X = 10, mean = 21.76923, standard dev = 5.788302. Respond cumulative = TRUE and click OK.
Alternatively the following command can be used to get the same result.
= NORMDIST (10, 21.76923, 5.788302, TRUE)
3. We get P(x ≤ 10) = 0.021012197
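The same probability can be obtained outside MS-Excel. The sketch below is one possible equivalent, assuming Python's `statistics.NormalDist`; the wrapper name `normdist_cumulative` is hypothetical and simply mirrors the argument order of the NORMDIST call with cumulative = TRUE.

```python
from statistics import NormalDist

def normdist_cumulative(x, mean, standard_dev):
    """Cumulative normal probability P(X <= x), as NORMDIST(x, mean, standard_dev, TRUE)."""
    return NormalDist(mu=mean, sigma=standard_dev).cdf(x)

p = normdist_cumulative(10, 21.76923, 5.788302)
print(round(p, 6))
```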
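POINTS TO REMEMBER
Definition of Normal Distribution : A continuous r.v. X is said to follow normal distribution with parameters $\mu$ and $\sigma^2$ if its p.d.f. is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}; \quad -\infty < x < \infty, \; -\infty < \mu < \infty, \; \sigma > 0$
We write it as $X \sim N(\mu, \sigma^2)$.
In particular if $\mu = 0$ and $\sigma = 1$ then the normal distribution is called standard normal distribution [N(0, 1) distribution].
The standard normal variate is denoted by Z. The p.d.f. of Z is
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}; \quad -\infty < z < \infty$
If $X \sim N(\mu, \sigma^2)$ then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
Suppose $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$. If $X_1$ and $X_2$ are independent random variables then $X_1 + X_2$ follows $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
In general if $X_1, X_2, \ldots, X_n$ are independent r.v.s such that $X_i \sim N(\mu_i, \sigma_i^2)$ then $Y = \sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n} \mu_i, \sum_{i=1}^{n} \sigma_i^2\right)$.
If $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ then
(i) $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ and (ii) $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
If $X_1, X_2, \ldots, X_n$ are independent random variables such that $X_i \sim N(\mu_i, \sigma_i^2)$ then
$a_1 X_1 + a_2 X_2 + \ldots + a_n X_n + b \sim N\left(\sum_{i=1}^{n} a_i \mu_i + b, \sum_{i=1}^{n} a_i^2 \sigma_i^2\right)$
If $X \sim N(\mu, \sigma^2)$ the density curve is symmetric about $\mu$.
Mean = Median = Mode.
All odd ordered central moments are equal to zero.
If $X \sim N(\mu, \sigma^2)$ then
(i) $P(\mu - \sigma < X < \mu + \sigma) = 0.68268$
(ii) $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9545$
(iii) $P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.9973$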

EXERCISE 10 (B)
(A) Theory Questions
1. Define normal probability distribution and discuss its importance in various fields.
2. State mean and variance of $N(\mu, \sigma^2)$.
3. State mean, mode and median of $N(\mu, \sigma^2)$.
4. Show that the quartiles Q1 and Q3 are equidistant from median.
5. Show that normal distribution is symmetric about mean.
6. State and prove additive property of two independent normal variates. State and prove its generalisation to n independent normal variates.
7. If $X \sim N(\mu, \sigma^2)$, what is the distribution of (i) and (ii) aX + b, where a and b are constants.
8. If X and Y are independent normal variates with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$ respectively; state the probability distribution of aX + bY + c; where a, b, c are constants.
9. If $X_1, X_2, \ldots, X_n$ are independent r.v.s such that $X_i \sim N(\mu_i, \sigma_i^2)$, i = 1, 2, ..., n then state the probability distribution of $\sum_{i=1}^{n} a_i X_i + b$ where $a_i$'s and b are constants.
10. If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$, what is the probability distribution of $\bar{X}$ ?
11. State De Moivre's theorem on normal approximation to binomial distribution.
12. State the theorem concerning normal approximation to Poisson distribution.
(B) Numerical Problems
1. Identify the parameters if f(x) given below is a p.d.f. of $N(\mu, \sigma^2)$.

f(x) =
1 (x-4x +4)

1
f(x) = C. exp 24 ^ 0e- 6x +9).c>0
(in) - (2x?- x)
f(x) =ke k>0
-10x + 22)
(iv) f(x) = Ce C>0
1 1
Ans.: () p = 2, G= 5, (i) u = 3, G= 2/3 (ii) u =,o=,(iv) =5, G=3
2. Let $X \sim N(15, 4)$, find (i) P(13 < X ≤ 17); (ii) P(X > 17); (iii) P(X < 15)
Ans.: (i) 0.68268, (ii) 0.15866, (iii) 0.5.
3. A r.v. X has the following p.d.f.
1
f() = -X< X<X

Compute (i) P(-5 < X < -2), (ii) P(X + 3 ≤ 1)
Ans.: (i) 0.022741, (ii) 0.0214001
4. An unbiased die is rolled 720 times. Using the normal approximation, find the probability that more than 128 sixes
will turn up. Also find the probability that the number of sixes will lie between 100 and 140.
Ans: 0.21186, 0.9545
5. There are 1000 students in the university of a certain age group and it is known that their weights are normally distributed with mean 55 kg and standard deviation 4.5 kg. Find the number of students having weight
(i) less than 48 kg.
(ii) between 50 kg and 58 kg.
Ans.: (i) 59.38, (ii) 615.07
6. Let X have a standard normal distribution. Let A and B be two events such that A = {x | -0.2 < x < 0.3}, B = {x | 0.1 < x < 0.2}.
Find P(A) and P(A ∪ B).
Ans.: P(A ∪ B) = P(A) = 0.27072
7. It is known that 30 % of adult population are smokers. What is the probability that in a random sample of 1000 adults 280 or less will be smokers ?
Ans.: 0.083793
8. Let X and Y be independent $N(2200, 400^2)$ and $N(2000, 300^2)$ variates respectively. Find P[200 < X - Y < 500].
Ans.: 0.22575
9. Let $X \sim N(\mu, \sigma^2)$. If P(X < 89) = 0.9 and P(X < 94) = 0.95, find $\mu$ and $\sigma^2$.
Ans.: $\mu$ = 71.2222, $\sigma^2$ = 192.9012
10. $X_1$ and $X_2$ are two independent normal variates. Their respective means are 2500 and 2000 in appropriate units. Their standard deviations are 500 and 400 respectively. Obtain the probabilities of following events.
(i) $X_1 \le 1500$; (ii) $X_2 > 2000$; (iii) $|X_1 - X_2| \le 100$
Ans.: (i) 0.02275, (ii) 0.5, (iii) 0.09402
11. The life time of a certain type of battery has a mean of 300 hours and a standard deviation of 35 hours. Assuming that the distribution of life time is normal, find,
(i) proportion of batteries having life time between 223 and 356 hours.
(ii) the life in hours above which we will find the best 15 % of the batteries.
Ans.: (i) 0.931298, (ii) 336.4 hours
12. A minimum height is to be prescribed for eligibility to Government services such that 60 % of the young men will have a fair chance of coming upto that standard. The heights of young men are normally distributed with mean 151.5 cm and standard deviation 6.375 cm. Determine the minimum specification of height.
Ans.: 149.9063 cm.
13. Suppose the marks scored by students in a certain examination are normally distributed. If 7 % of the students scored the marks less than 35 and 89 % of the students scored the marks less than 63, find mean and variance of the distribution.
Ans.: $\mu$ = 50.05263, $\sigma$ = 10.5263
14. A monthly balance on the bank account of credit card holders is assumed to be normally distributed with mean 5000 and standard deviation 1000. Find the proportion of credit card holders with balance :
(i) over ₹ 6500. (ii) between 4000 and 6000
Ans.: (i) 0.066807 (ii) 0.68268
15. Let X and Y be two independent normal random variables with means 1, 2 and variances 1, 4 respectively. Find :
(i) P(2X + Y > 3) (ii) P(3X - 2Y < 10) (iii) P(|X - Y| ≤ 6)
Ans.: (i) 0.63683, (ii) 0.986097, (iii) 0.986581
16. An electrical unit is made up of 3 resistors in series. The resistance of each of these is normal with mean 10 ohms and standard deviation 0.5 ohms. The unit becomes functionless if its resistance is greater than 31.732 ohms. The resistance of the unit is the sum of the resistances. Assuming that the three resistors are independent, find the number of units out of 1000, that might be expected to be useless.
Ans.: 22.75
find a, b, Csuch that

=0.9

= 0.9

Ans.: a = 1.64, b = 1.28, c = -1.28
18. The following is a distribution of scores in TOEFL examination of 250 candidates selected at random.
Score        No. of students    Score        No. of students
250 - 260    9                  300 - 310    54
260 - 270    11                 310 - 320    37
270 - 280    10                 320 - 330    26
280 - 290    44                 330 - 340    8
290 - 300    45                 340 - 350    6
Fit a normal distribution to the above data and find the expected frequencies.
Ans.: $\mu$ = 300.4, $\sigma$ = 19.6784
19. Fit a normal distribution to the following data and find the expected frequencies.
Wages (₹)         30 - 40   40 - 50   50 - 60   60 - 70   70 - 80   80 - 90   90 - 100
No. of workers    8         24        48        68        30        13        9
Ans.: $\mu$ = 63.15, $\sigma$ = 13.641022
20. Fit a normal distribution to the following data and obtain the expected frequencies.
Class        60 - 65   65 - 70   70 - 75   75 - 80   80 - 85
Frequency    15        32        43        17        3
Ans.: $\mu$ = 70.7272, $\sigma$ = 4.9353
21. A certain brand of tyre has the following frequency distribution for its life in thousand km. Fit a normal distribution to these data and find the expected frequencies.
Life            20 - 25   25 - 30   30 - 35   35 - 40   40 - 45   45 - 50
No. of tyres    8         12        15        18        13        9
Ans.: $\mu$ = 35.3667, $\sigma$ = 7.5354
22. Fit a normal distribution to the following data and obtain the expected frequencies.
Class        60 - 65   65 - 70   70 - 75   75 - 80   80 - 85   85 - 90   90 - 95   95 - 100
Frequency    3         21        150       335       326       135       26        4
Ans.: $\mu$ = 79.945, $\sigma$ = 5.4449
23. In a certain examination, mean of marks scored by 400 students is 45 with standard deviation 15. Assuming the distribution to be normal, find number of students securing marks between 30 and 60. Also find upper quartile of the distribution.
24. 5 % of the electric bulbs manufactured by a company are defective. Using normal approximation find probability that in a sample of 400 bulbs, 30 or more will be defective. Also find probability that the number of defectives will lie between 10 and 20.

(C) State whether the following statements are True (T) or False (F) in following questions.
1. Normal distribution is a discrete type of distribution.
Ans.: False
2. If $X \sim N[75, (10)^2]$ then mode = 75.
Ans.: True
3. If $X_1 \sim N(\mu_1, \sigma_1^2)$, $X_2 \sim N(\mu_2, \sigma_2^2)$ and $X_1$ and $X_2$ are independent then $aX_1 - bX_2 \sim N(a\mu_1 - b\mu_2, a^2\sigma_1^2 - b^2\sigma_2^2)$.
Ans.: False
4. Suppose X has normal distribution with mean 60 and variance 49. Then $\frac{X - 60}{49} \sim N(0, 1)$.
Ans.: False
5. Standard normal distribution is symmetric about zero.
Ans.: True
6. If X ’N (60, 62) then the value of B = 0.1
Ans.: False
7. If $X_1, X_2, \ldots, X_{50}$ are independent Poisson variables with parameter 4 then $P\left[\sum_{i=1}^{50} X_i \ge 200\right] = 0.5$.
Ans.: True

Chapter 11
UNIFORM DISTRIBUTION
11.1 INTRODUCTION
In continuous set-up, many a times it is observed that the variable follows a specific pattern. This behaviour of the r.v. can be described in a mathematical form called 'probability models'. In the subsequent chapters, we are going to study a few important continuous probability distributions, such as uniform, exponential, normal.
11.2 UNIFORM DISTRIBUTION
In this distribution, the p.d.f. of the r.v. remains constant over the range space of the variable.
Definition : A continuous type r.v. X is said to follow uniform distribution over interval [a, b], if its p.d.f. is given by
$f(x) = \frac{1}{b - a}$ ; $a \le x \le b$, a < b
$= 0$ ; otherwise
Notation : $X \sim U[a, b]$
Nature of Probability Curve : The distribution is also known as 'rectangular distribution', as the graph of p.d.f. describes a rectangle over the X axis and between the ordinates at X = a and X = b. [See Fig. 11.1].

Fig. 11.1: p.d.f. of U[a, b], a rectangle of height $\frac{1}{b - a}$ between x = a and x = b.

Note that (i) $f(x) \ge 0$ for all $x \in R$ and
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b \frac{1}{b - a}\,dx = \frac{b - a}{b - a} = 1$.
Note : If $X \sim U[-a, a]$ ; $a \in R^{+}$, then
$f(x) = \frac{1}{2a}$ ; $-a \le x \le a$
$= 0$ ; otherwise
1. Distribution Function : If $X \sim U[a, b]$, then
$F(x) = P[X \le x] = \int_{-\infty}^{x} f(t)\,dt = \int_a^x \frac{1}{b - a}\,dt = \frac{x - a}{b - a}$ for $a \le x \le b$.
$\therefore F(x) = 0$ ; x < a
$= \frac{x - a}{b - a}$ ; $a \le x \le b$
$= 1$ ; x > b
2. Mean and Variance : If $X \sim U[a, b]$, then
Mean $= E(X) = \int_a^b x f(x)\,dx = \int_a^b \frac{x}{b - a}\,dx = \frac{b^2 - a^2}{2(b - a)}$
$\therefore E(X) = \frac{a + b}{2}$
Now $\mu_2' = E(X^2) = \int_a^b x^2 f(x)\,dx = \frac{b^3 - a^3}{3(b - a)} = \frac{a^2 + ab + b^2}{3}$
$\text{Var}(X) = \mu_2' - \mu_1'^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a + b)^2}{4} = \frac{(b - a)^2}{12}$
$\therefore \text{Var}(X) = \frac{(b - a)^2}{12}$
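The p.d.f., distribution function, mean and variance of U[a, b] translate directly into code. The following is a minimal sketch in Python; the function names are illustrative, and the interval U[2, 8] in the usage lines is an arbitrary choice.

```python
def uniform_pdf(x, a, b):
    """p.d.f. of U[a, b]: constant 1/(b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Distribution function F(x) = P[X <= x] of U[a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean(a, b):
    """E(X) = (a + b) / 2."""
    return (a + b) / 2

def uniform_variance(a, b):
    """Var(X) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12

print(uniform_mean(2, 8), uniform_variance(2, 8), uniform_cdf(5, 2, 8))
```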

SOLVED EXAMPLES
Example 11.1 : If mean and variance of a U[a, b] r.v. are 5 and 3 respectively, determine the values of a and b.
Solution : If $X \sim U[a, b]$ then $E(X) = \frac{a + b}{2}$ and $\text{Var}(X) = \frac{(b - a)^2}{12}$
$\frac{a + b}{2} = 5 \Rightarrow a + b = 10$ ... (1)
$\frac{(b - a)^2}{12} = 3 \Rightarrow (b - a)^2 = 36 \Rightarrow b - a = 6$ (since a < b) ... (2)
Solving (1) and (2) we get a = 2, b = 8.
Example 11.2 : On X'mas, John gives party to his friends. A machine fills the ice cream cups. The quantity of ice cream per cup is uniformly distributed over 200 gm to 250 gm. (i) What is the probability that a friend of John gets a cup with more than 230 gm of ice cream ? (ii) If in all twenty five people attended the party and each had two cups of ice cream, what is the expected quantity of ice cream consumed in the party ?
Solution : X = Quantity of ice cream per cup.
$X \sim U[200, 250]$
(i) $P[X > 230] = 1 - P[X \le 230] = 1 - \frac{230 - 200}{50} = 0.4$
(ii) On an average, quantity per cup is given by
$E(X) = \frac{200 + 250}{2} = 225$ gm
Total quantity consumed = 225 × 2 × 25 = 11250 gm
= 11.25 kg

Example 11.3 : Let $X \sim U[-a, a]$. Find the value of a such that $P[|X| > 1] = \frac{6}{7}$.
Solution : $P[|X| > 1] = 1 - P[|X| \le 1] = 1 - P[-1 \le X \le 1]$
$P[-1 \le X \le 1] = \int_{-1}^{1} \frac{1}{2a}\,dx = \frac{1}{a}$
$\therefore 1 - \frac{1}{a} = \frac{6}{7}$
It gives a = 7.
11.3 APPLICATIONS OF UNIFORM DISTRIBUTION
Although it is the simplest continuous distribution, it has wide applicability in research, mainly used as a prior model for parameters in Bayes' theory.
It is used to represent the distribution of rounding-off errors.
It is also used in life testing and traffic flow experiments.
Probability Integral Transformation :
Theorem 1 (Statement only) : Suppose that a random variable X has a continuous distribution with cumulative distribution function (c.d.f.) $F_X(x)$, which is invertible, then the random variable Y defined as
$Y = F_X(X)$ follows U[0, 1]
The above result in probability theory is of immense use while drawing a random sample from any continuous distribution as follows :
Suppose $X \sim F_X(x)$, then by the theorem, $Y = F_X(X) \sim U[0, 1]$.
The fact that the c.d.f. $F_X(x)$ is invertible allows us to write x = g(y) for some function g.
We can then obtain a random sample from U[0, 1] for $Y = F_X(X)$ using any statistical software. For instance, the function RAND() in MS-Excel gives a random number between (0, 1). Using the relation x = g(y), we get a random value of X very easily.
Illustration : Suppose we want to draw a random sample of size 5 from the continuous distribution with p.d.f.
$f(x) = e^{-x}$ ; x > 0
$= 0$ ; elsewhere
Then the c.d.f. of X is given by, $Y = F(x) = P[X \le x] = \int_0^x f(t)\,dt = \int_0^x e^{-t}\,dt = 1 - e^{-x}$
$\therefore y = 1 - e^{-x}$
$\therefore 1 - y = e^{-x}$
$\therefore x = -\ln(1 - y)$ ... (1)
Random sample from $Y = F_X(X)$ is obtained from MS-Excel using the function RAND(). Random sample from the distribution of X, obtained by applying relation (1), is given in the table below.
y         x = -ln(1 - y)
0.3175    0.3820
0.2162    0.2436
0.9235    2.5705
0.4316    0.5649
0.5213    0.7367
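The illustration above can be mimicked in code. The sketch below assumes Python's `random` module in place of RAND() in MS-Excel; the function names and the fixed seed are illustrative choices only.

```python
import math
import random

def exp_from_uniform(y):
    """Invert y = 1 - exp(-x), giving x = -ln(1 - y) as in relation (1)."""
    return -math.log(1.0 - y)

def draw_sample(n, seed=0):
    """Draw n values from the p.d.f. f(x) = exp(-x), x > 0, via U[0, 1] numbers."""
    rng = random.Random(seed)           # seeded only to make the run repeatable
    return [exp_from_uniform(rng.random()) for _ in range(n)]

# Reproduce the first row of the table, then draw a fresh sample of size 5
print(round(exp_from_uniform(0.3175), 4))
print(draw_sample(5))
```

The same recipe works for any continuous distribution whose c.d.f. can be inverted in closed form; only `exp_from_uniform` changes.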
Note : The distribution has very striking properties, such as the distribution of sum of two uniform r.v.'s is not uniform. However, this is beyond the scope of the book.
POINTS TO REMEMBER
A continuous type r.v. X is said to follow uniform distribution over [a, b] if its p.d.f. is given by
$f(x) = \frac{1}{b - a}$ ; $a \le x \le b$, a < b
$= 0$ ; otherwise
We write it as $X \sim U[a, b]$.
If $X \sim U[a, b]$ then (i) Distribution function is given by
$F(x) = 0$ ; x < a
$= \frac{x - a}{b - a}$ ; $a \le x \le b$
$= 1$ ; x > b
(ii) $E(X) = \frac{a + b}{2}$
(iii) $\text{Var}(X) = \frac{(b - a)^2}{12}$
If a r.v. Y follows any continuous distribution then its distribution function X = F(Y) can be shown to follow U[0, 1].
EXERCISE
(A) Theory Questions:
1. Define a uniform r.v. over an interval [a, b]. Find its distribution function, mean and variance.
2. Discuss the applications of uniform
distribution.
(B) Numerical Problems:
1. Suppose $X \sim U[0, 10]$. Find (i) mean, (ii) variance, (iii) P(X > 4), (iv) P(X < 3), (v) P(2 ≤ X ≤ 7).
Ans.: (i) 5, (ii) 8.33, (iii) 0.6, (iv) 0.3, (v) 0.5.
2. IfXU[1, 2, find Ksuch that P [X > K+ 1
]=7: 1
Ans.:
3. Let $X \sim U(-a, a)$, determine a such that (i) $P(X > 1) = \frac{1}{3}$ (ii) P[X ≤ 0] = 0.5 (iii) P[X > 2] = 0.3.
Ans.: (i) 3 (ii) any +ve real no. (iii) 5
4. If X is uniformly distributed over [a, b] with mean $\mu$ and variance $\sigma^2$, find a and b in terms of $\mu$ and $\sigma$.
Ans.: $a = \mu - \sqrt{3}\,\sigma$, $b = \mu + \sqrt{3}\,\sigma$
5. If X is uniformly distributed with mean 1 and variance 3, find P[X < 0].
Ans.:
6. If X ’U[0, 1], find
()
1 1
Ans.: () 3 )
7. If $X \sim U[a, 10]$ and $P(3 < X < 7) = \frac{1}{2}$, find a.
Ans.: 2
8. A string, 1 metre long, is to be cut in two at a random point along its length. Let X be the point where cut occurs. What is the probability that the two pieces cut are of unequal length ?
Ans.: 1
9. A r.v. X uniformly distributed over an interval of unit length is such that $P(X < 3/2) = \frac{1}{2}$. Determine the interval.
Ans.: [1, 2]
10. On a route, the first bus is at 8.00 a.m. and after every 30 minutes, there is a bus. A passenger arrives at the stop at time X, which is uniformly distributed over the interval [8.15 a.m., 8.45 a.m.]. What is the probability that the passenger will have to wait for more than 15 minutes for a bus ?
Ans.:
11. A bus travels between two cities which are 100 km apart. There are service garages in the two cities as well as at the centre of the route. Whenever the bus fails, a tow truck is arranged from the service garage closest to the point of breakdown of the bus. What is the probability that the tow truck has to travel more than 10 km to reach the bus ?
Ans.: 0.6
(C) State whether the following statements are true (T) or false (F).
1. If $X \sim U[a, b]$ then $\text{Var}(X) = \frac{(b - a)^2}{4}$.
Ans.: False
2. If $X \sim U[-a, a]$ then E(X) = 0.
Ans.: True
3. Suppose X’ U-7, 7| then P[X|> 1] =÷ Ans.: False
4. Uniform distribution can be used to represent the distribution of rounding-off errors. Ans.: True
Chapter 12
EXPONENTIAL DISTRIBUTION
12.1 INTRODUCTION
Exponential distribution is one of the important distributions used in life testing experiments in reliability theory and survival analysis. Life time of an electronic component and time until decay of a radioactive element are modelled by the exponential distribution. The distribution has a close link with the Poisson distribution in the sense that, whenever the number of arrivals is according to Poisson distribution, the time interval between successive arrivals follows exponential distribution.
For example, if the number of customers arriving at a telephone booth follows Poisson distribution, then the time gap between the current arrival and the next arrival is exponential. This result makes the exponential distribution an important model in Queuing theory. We shall now introduce ourselves to the preliminaries of the distribution.
12.2 EXPONENTIAL DISTRIBUTION
Definition : A continuous r.v. X taking non-negative values is said to follow exponential distribution with mean $\theta$ if its probability density function (p.d.f.) is given by
$f(x) = \frac{1}{\theta} e^{-x/\theta}$ ; $x \ge 0$, $\theta > 0$
$= 0$ ; otherwise
Notation : $X \sim \text{Exp}(\theta)$
Note : (1) We shall verify that f(x) is a p.d.f.
(i) Obviously $f(x) \ge 0$ as $x \ge 0$, $\theta > 0$ and $e^{-x/\theta} > 0$
(ii) $\int_0^{\infty} f(x)\,dx = \frac{1}{\theta} \int_0^{\infty} e^{-x/\theta}\,dx = \frac{1}{\theta} \left[\frac{e^{-x/\theta}}{-1/\theta}\right]_0^{\infty} = 1$
From (i) and (ii), it is clear that f(x) is a p.d.f.
(2) Nature of Probability Curve : The density curve of an exponential distribution is as shown in Fig. 12.1. It is a positively skew curve. From Fig. 12.1, it is clear that mode of the distribution is 0. However, when this distribution is used, the value of mode cannot be taken as an average.

Fig. 12.1: Density curve of the exponential distribution with $\theta = 1$.

(3) If $\theta = 1$, then the distribution is called standard exponential distribution. Hence, its probability density function is given by
$f(x) = e^{-x}$ ; $x \ge 0$
$= 0$ ; otherwise
Mean and Variance : If $X \sim \text{Exp}(\theta)$, then it can be shown that
Mean $= E(X) = \theta$
$\text{Var}(X) = \theta^2$
S.D. $(X) = \theta$
Thus for exponential distribution, mean and standard deviation are same.

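12.3 DISTRIBUTION FUNCTION OF EXP ($\theta$)

Let $X \sim \text{Exp}(\theta)$. Then the distribution function is given by
$F_X(x) = P[X \le x] = \int_0^x f(t)\,dt = \int_0^x \frac{1}{\theta} e^{-t/\theta}\,dt = \frac{1}{\theta} \left[\frac{e^{-t/\theta}}{-1/\theta}\right]_0^x$
$\therefore F_X(x) = 1 - e^{-x/\theta}$ ; x > 0, $\theta > 0$.

Note : If X is life time of a component, P[X > x] is taken as reliability function or survival function.
Here $P[X > x] = 1 - P[X \le x] = 1 - F_X(x) = e^{-x/\theta}$ ; x > 0, $\theta > 0$.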
12.4 LACK OF MEMORY PROPERTY
This is an important property which the exponential distribution possesses.
Statement : If $X \sim \text{Exp}(\theta)$ then $P[X \ge s + t \mid X \ge s] = P[X \ge t]$ for s > 0, t > 0
Proof : We have already seen that if $X \sim \text{Exp}(\theta)$ then $P[X \ge x] = P[X > x] = 1 - F(x) = e^{-x/\theta}$
L.H.S. $= P[X \ge s + t \mid X \ge s] = \frac{P[(X \ge s + t), (X \ge s)]}{P[X \ge s]}$
$= \frac{P[X \ge s + t]}{P[X \ge s]} = \frac{e^{-(s + t)/\theta}}{e^{-s/\theta}} = e^{-t/\theta}$ ... (1)
R.H.S. $= P[X \ge t] = e^{-t/\theta}$ ... (2)
From (1) and (2) it is clear that
$P[X \ge s + t \mid X \ge s] = P[X \ge t]$ for s > 0, t > 0 ... (3)
Note : (1) The equality (3) can be written in the form
$P[X \ge s + t] = P[X \ge s]\,P[X \ge t]$ ; s > 0, t > 0
(2) Converse of the above statement is also true i.e. if X is a continuous random variable taking non-negative values such that equation (3) holds good then $X \sim \text{Exp}(\theta)$. Hence, this is the characteristic property of exponential distribution.
(3) Interpretation : Equation (3) is equivalent to
$P[X \ge s + t \mid X \ge s] = P[X \ge t \mid X \ge 0]$
Suppose X is life time of an electronic component. Then the above equality means that the probability that the component will survive t time units more, given that it has already survived s time units, is same as the probability that a newly installed component will survive up to t time units.
Thus whatever may be the current age of the component, the distribution of remaining life time is same as the original life time distribution. In short, so long as such a component is working it is as good as a new component. It neither improves nor deteriorates due to ageing. It is observed in electronic equipments.
Practical Situations :
(1) Some kinds of electrical components such as fuses, transistors etc. do not wear out with time. Thus, these do not experience ageing. In such situations exponential distribution, because of its lack of memory or forgetfulness property, is used.
(2) When atoms of radioactive isotopes like carbon, uranium, strontium split, they emit a pulse of radiation. This process is called radioactive decay. The variable, life time of decay, possesses lack of memory property. Hence, exponential distribution is used to model this variable.
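The lack of memory property can be checked numerically for particular values. The sketch below is illustrative only; the function names and the choice $\theta = 5$, t = 4 are assumptions made for the demonstration.

```python
import math

def survival(x, theta):
    """P(X > x) = exp(-x / theta) for X ~ Exp(theta), x >= 0."""
    return math.exp(-x / theta)

def conditional_survival(s, t, theta):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t, theta) / survival(s, theta)

theta, t = 5.0, 4.0
for s in (0.0, 4.0, 20.0):
    # The conditional probability does not depend on the age s already attained
    print(s, conditional_survival(s, t, theta), survival(t, theta))
```

Whatever age s is used, the conditional probability agrees with the unconditional probability P(X > t) up to floating-point rounding.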
SOLVED EXAMPLES
Example 12.1 : Suppose life time of a certain make of T.V. tube is exponentially distributed with a mean life 1600 hrs. What is the probability that
(i) the tube will work upto 2400 hrs ?
(ii) the tube will survive after 1000 hrs ?
Solution : Let X : Number of hours that the T.V. tube works
Given : $X \sim \text{Exp}(\theta)$ where $\theta = 1600$
We know that if $X \sim \text{Exp}(\theta)$ then
$P[X \le x] = 1 - e^{-x/\theta}$ ; x > 0, $\theta > 0$
(i) $P[X \le 2400] = 1 - e^{-2400/1600}$
$= 1 - e^{-1.5} = 1 - 0.223130$
$= 0.77687$
(ii) $P[X > 1000] = e^{-1000/1600}$ (if $X \sim \text{Exp}(\theta)$ then $P[X > x] = e^{-x/\theta}$)
$= e^{-0.625} = 0.535261$
Example 12.2 : The life time in hours of a certain electric component follows exponential distribution with distribution function
$F(x) = 1 - e^{-0.004x}$ ; x > 0
What is (i) the probability that the component will survive 200 hours ? (ii) the probability that it will fail during 250 to 350 hours ? (iii) expected life time of the component ?
Solution : Let X : life time (in hours) of the electric component
Given : $X \sim \text{Exp}(\theta)$
and $F(x) = 1 - e^{-x/\theta} = 1 - e^{-0.004x}$
$\therefore \theta = \frac{1}{0.004} = 250$ hours
(i) $P[X > 200] = e^{-200/250} = e^{-0.8} = 0.449329$
(ii) $P[250 < X < 350] = P[X > 250] - P[X > 350]$
$= e^{-250/250} - e^{-350/250} = e^{-1} - e^{-1.4}$
$= 0.367879 - 0.246597$
$= 0.121282$
(iii) $E(X) = \theta = 250$ hours.
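The probabilities in Examples 12.1 and 12.2 follow from the distribution function alone. A short sketch, with hypothetical helper names, that evaluates them in Python:

```python
import math

def exp_cdf(x, theta):
    """F(x) = P[X <= x] = 1 - exp(-x / theta) for X ~ Exp(theta); zero for x <= 0."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

def exp_interval(lower, upper, theta):
    """P[lower < X < upper] = F(upper) - F(lower)."""
    return exp_cdf(upper, theta) - exp_cdf(lower, theta)

# Example 12.1: theta = 1600 hours
print(round(exp_cdf(2400, 1600), 5))          # works up to 2400 hours
print(round(1 - exp_cdf(1000, 1600), 6))      # survives beyond 1000 hours

# Example 12.2: theta = 1 / 0.004 = 250 hours
print(round(1 - exp_cdf(200, 250), 6))        # survives 200 hours
print(round(exp_interval(250, 350, 250), 6))  # fails during 250 to 350 hours
```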
POINTS TO REMEMBER
A continuous r.v. X taking non-negative values is said to follow exponential distribution with mean $\theta$ if its probability density function (p.d.f.) is given by
$f(x) = \frac{1}{\theta} e^{-x/\theta}$ ; $x \ge 0$, $\theta > 0$
$= 0$ ; otherwise
We write it as $X \sim \text{Exp}(\theta)$.
If $\theta = 1$, the distribution is called standard exponential distribution.
If $X \sim \text{Exp}(\theta)$ then (i) the distribution function is given by :
$F_X(x) = 1 - e^{-x/\theta}$ ; x > 0, $\theta > 0$
(ii) Mean $= E(X) = \theta$
(iii) $\text{Var}(X) = \theta^2$, S.D. $(X) = \theta$
If $X \sim \text{Exp}(\theta)$ then $P(X > s + t \mid X > s) = P[X > t]$ for s > 0, t > 0.
It is called 'lack of memory property'. It is also the characteristic property of exponential distribution.
EXERCISE
(A) Theory Questions
1. Define exponential distribution with parameter $\theta$. Also state its mean and variance.
2. Obtain distribution function of exponential distribution with mean $\theta$. Hence find all quartiles.
3. State some practical situations in which use of exponential distribution is appropriate.
4. State and prove lack of memory property of exponential distribution. Also give its interpretation.
(B) Numerical Problems
1. A r.v. X has the p.d.f. $f(x) = e^{-x}$ ; $x \ge 0$. Prove that : $P[|X - 1| > 2] = e^{-3}$.
2. A random variable X has an exponential distribution with mean 5. Find P[X > 8 | X > 4].
Ans.: $e^{-0.8} = 0.449329$
3. Let X have an exponential distribution with mean = 4. Find P[X < 2].
Ans.: $1 - e^{-0.5} = 0.393469$
4. The life time of a certain battery is a random variable which has an exponential distribution with mean of 320 hours. What is the probability that such a battery will last at most 160 hours ? Also find the probability that such a battery will last between 640 and 960 hours.
Ans.: (i) 0.393469 (ii) $e^{-2} - e^{-3} = 0.085548$
5. The time until next earthquake occurs in a particular
region is assumed to be exponentially distributed with mean
per year. Find the probability that the next earthquake happens (i)
within two years (ii) after one and half years (ii)
between one and three years.
Ans.: 1-e= 0.981684
6. Suppose that component life times are exponentially distributed with mean 1500 hours. Find :
(i) the probability that a component survives upto 2400 hours.
(ii) the median component life time.
(iii) the s.d. of component life time.
Ans.: (i) 0.7981 (ii) 1039.6 (iii) 1500
7. Suppose that decay time of some radioactive source (X) has an exponential distribution such that $P(X > 0.01) = \frac{1}{2}$. Find a value of t so that P(X > t) = 0.9.
Ans.: $1.52 \times 10^{-3}$
8. The amount of time, in hours, that a computer functions before breaking down has exponential distribution with probability density function (p.d.f.) given by
$f(x) = 0.005\,e^{-0.005x}$ ; x > 0
$= 0$ ; otherwise
Find mean of above distribution. Also, write distribution function.
(C) State whether the following statements are true (T) or false (F).
1. Exponential distribution is a discrete type distribution.
Ans.: False
2. If X has exponential distribution then E(X) = S.D. (X).
Ans.: True
3. Exponential distribution does not satisfy lack of memory property.
Ans.: False
4. An exponential variate always takes values in interval (0, ∞).
Ans.: True
5. Exponential distribution cannot be used as life time distribution.
Ans.: False
Chapter 13
LOGNORMAL DISTRIBUTION
13.1 INTRODUCTION
Lognormal distribution is a continuous probability distribution of a non-negative random variable whose logarithm is normally distributed. Thus, if X has lognormal distribution then $Y = \log_e(X)$ is normally distributed. The data with a skewed distribution, low mean, large variance and all positive values often fit lognormal distribution. Lognormal distribution is used in finance to model stock prices, asset return, exchange rate etc.
Definition : A continuous random variable X is said to follow lognormal distribution with parameters a, $\mu$ and $\sigma^2$ if its p.d.f. is given by,
$f(x) = \frac{1}{\sigma\sqrt{2\pi}\,(x - a)} \exp\left\{-\frac{1}{2\sigma^2}\left[\log_e(x - a) - \mu\right]^2\right\}$ ; x > a, $-\infty < \mu < \infty$, $\sigma > 0$
Notation : $X \sim LN(a, \mu, \sigma^2)$
13.2 PROBABILITY DENSITY CURVE OF LOGNORMAL DISTRIBUTION

Fig. 13.1: Probability density curves of the lognormal distribution for a = 0, $\mu$ = 0 and $\sigma$ = 0.25, 0.5 and 1.

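Mean : Mean $= E(X) = \int_a^{\infty} x f(x)\,dx$
$= \int_a^{\infty} x\,\frac{1}{\sigma\sqrt{2\pi}\,(x - a)}\,e^{-\frac{1}{2\sigma^2}[\log(x - a) - \mu]^2}\,dx$ ... (1)
Put $\log(x - a) = u \Rightarrow x - a = e^u$
$\therefore x = a + e^u$
$\therefore dx = e^u\,du$
When $x \to a$ then $u \to -\infty$
$x \to \infty$ then $u \to \infty$
Thus expression (1) can be written as
Mean $= \int_{-\infty}^{\infty} (a + e^u)\,\frac{1}{\sigma\sqrt{2\pi}\,e^u}\,e^{-\frac{(u - \mu)^2}{2\sigma^2}}\,e^u\,du$
$= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (a + e^u)\,e^{-\frac{(u - \mu)^2}{2\sigma^2}}\,du$
$= a\,\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(u - \mu)^2}{2\sigma^2}}\,du + \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^u\,e^{-\frac{(u - \mu)^2}{2\sigma^2}}\,du$
$= a(1) + \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^u\,e^{-\frac{(u - \mu)^2}{2\sigma^2}}\,du$
Put $u = \mu + \sigma z$, $\therefore du = \sigma\,dz$
Mean $= a + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\mu + \sigma z}\,e^{-\frac{z^2}{2}}\,dz$
$= a + e^{\mu}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z^2 - 2\sigma z + \sigma^2 - \sigma^2)}\,dz$
$= a + e^{\mu + \frac{\sigma^2}{2}}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \sigma)^2}\,dz$
Put $z - \sigma = v$, $\therefore dz = dv$
When $-\infty < z < \infty$ then $-\infty < v < \infty$
Mean $= a + e^{\mu + \frac{\sigma^2}{2}}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{v^2}{2}}\,dv$
$\therefore$ Mean $= a + e^{\mu + \frac{\sigma^2}{2}}$
Variance : $\text{Var}(X) = E(X^2) - [E(X)]^2$ ... (2)
$E(X^2) = \int_a^{\infty} x^2 f(x)\,dx = \int_a^{\infty} x^2\,\frac{1}{\sigma\sqrt{2\pi}\,(x - a)}\,e^{-\frac{1}{2\sigma^2}[\log(x - a) - \mu]^2}\,dx$
After simplification we get
$E(X^2) = a^2 + 2a\,e^{\mu + \frac{\sigma^2}{2}} + e^{2\mu + 2\sigma^2}$
Equation (2) gives
$\text{Var}(X) = a^2 + 2a\,e^{\mu + \frac{\sigma^2}{2}} + e^{2\mu + 2\sigma^2} - a^2 - e^{2\mu + \sigma^2} - 2a\,e^{\mu + \frac{\sigma^2}{2}}$
$= e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$
$\therefore \text{Var}(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$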
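The closed forms for the mean and variance are easy to evaluate. The sketch below is a minimal illustration in Python; the function names are hypothetical, and the third argument is the variance parameter $\sigma^2$ of $LN(a, \mu, \sigma^2)$, not $\sigma$.

```python
import math

def lognormal_mean(a, mu, sigma2):
    """Mean of LN(a, mu, sigma^2): a + exp(mu + sigma^2 / 2)."""
    return a + math.exp(mu + sigma2 / 2)

def lognormal_variance(mu, sigma2):
    """Variance of LN(a, mu, sigma^2): exp(2*mu + sigma^2) * (exp(sigma^2) - 1).

    The shift a does not appear because it cancels in E(X^2) - [E(X)]^2.
    """
    return math.exp(2 * mu + sigma2) * (math.exp(sigma2) - 1)

# Illustrative parameters a = 0, mu = 0, sigma^2 = 1
print(round(lognormal_mean(0, 0, 1), 6), round(lognormal_variance(0, 1), 6))
```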
sTATISTICS(SE, AI && DS)
(13.3) LOGNORMAL DISTRIBUTION

SOLVED EXAMPLES
Example 13.1: Let X ~ LN(0, 4, 8). Calculate P(7.8 < X² < 20.2).
Solution: We use the result: if X ~ LN(0, μ, σ²) then X^α ~ LN(0, αμ, α²σ²).
Let X ~ LN(0, 4, 8)
∴ X² ~ LN(0, 8, 32)
∴ Y = log_e X² ~ N(8, 32)
∴ Z = (log_e X² − 8)/√32 ~ N(0, 1)
Consider P(7.8 < X² < 20.2) = P(log_e(7.8) < log_e(X²) < log_e(20.2))
= P(2.0541 < log_e(X²) < 3.0057)
= P((2.0541 − 8)/√32 < (log_e(X²) − 8)/√32 < (3.0057 − 8)/√32)
= P(−1.0511 < Z < −0.8829)
= P(Z < −0.8829) − P(Z < −1.0511)
= P(Z > 0.8829) − P(Z > 1.0511)
= 0.18943 − 0.14686
= 0.04257
∴ P(7.8 < X² < 20.2) = 0.04257
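The calculation in Example 13.1 can be reproduced without tables. The following is a minimal sketch using Python's standard library; the function name `prob_power_between` is hypothetical. Because it does not round the z-values to two decimals, the result (about 0.042) differs slightly from the table-based 0.04257.

```python
import math
from statistics import NormalDist

def prob_power_between(lower, upper, mu, sigma2, alpha):
    """P(lower < X**alpha < upper) for X ~ LN(0, mu, sigma2), alpha > 0."""
    # log(X**alpha) ~ N(alpha * mu, alpha**2 * sigma2)
    log_dist = NormalDist(mu=alpha * mu, sigma=alpha * math.sqrt(sigma2))
    return log_dist.cdf(math.log(upper)) - log_dist.cdf(math.log(lower))

# Example 13.1: X ~ LN(0, 4, 8), probability that X squared lies in (7.8, 20.2)
p = prob_power_between(7.8, 20.2, mu=4.0, sigma2=8.0, alpha=2.0)
print(round(p, 4))
```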
Example 13.2: Let X ~ LN(0, 1, 1). Calculate P(log₁₀ X < 0.1).
Solution: We know that if X ~ LN(0, μ, σ²) then log_e X ~ N(μ, σ²).
Consider P(log₁₀ X < 0.1) = P(log_e X / 2.3025 < 0.1)
= P(log_e X < 0.23025)
X ~ LN(0, 1, 1), therefore log_e X ~ N(1, 1).
∴ P((log_e X − 1)/1 < (0.23025 − 1)/1) = P(Z < −0.76975) = P(Z > 0.76975)
= 0.22065
∴ P(log₁₀ X < 0.1) = 0.22065


Example 13.3: Let X_i (i = 1, 2, 3, 4) be i.i.d. LN(0, μ, σ²). Identify the distribution of Y = X₁X₂/(X₃X₄).
Solution: We have
X_i ~ LN(0, μ, σ²); i = 1, 2, 3, 4
X_i's are independent.
∴ log(X_i) ~ N(μ, σ²); i = 1, 2, 3, 4
log(X_i)'s are independent.
∴ log(X₁) + log(X₂) − log(X₃) − log(X₄) ~ N(0, 4σ²)
∴ log(X₁X₂) − log(X₃X₄) ~ N(0, 4σ²)
∴ log(X₁X₂/(X₃X₄)) ~ N(0, 4σ²)
∴ Y = X₁X₂/(X₃X₄) ~ LN(0, 0, 4σ²)

Example 13.4: Let X ~ LN(0, 1, 4). Calculate E(X¹⁰).
Solution: X ~ LN(0, 1, 4)
∴ X¹⁰ ~ LN(0, 10, 400)
∴ E(X¹⁰) = a + e^(μ + σ²/2) = 0 + e^(10 + 400/2)
= e^210 = 1.59 × 10⁹¹
Example 13.5: If Y = log_e X ~ N(5, 16), calculate the median of X.
Solution: Y = log_e X ~ N(5, 16)
∴ X ~ LN(0, 5, 16)
Median of X = a + e^μ
= 0 + e⁵
∴ Median of X = 148.4132
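Examples 13.4 and 13.5 reduce to evaluating two exponentials, which the short sketch below does. The helper names are assumptions made for illustration; the formulas are the ones used in the solutions (X^k ~ LN(0, kμ, k²σ²) and median = a + e^μ).

```python
import math

def power_moment(mu, sigma2, k):
    """E[X**k] for X ~ LN(0, mu, sigma2), using X**k ~ LN(0, k*mu, k**2 * sigma2)."""
    return math.exp(k * mu + (k ** 2) * sigma2 / 2.0)

def lognormal_median(a, mu):
    """Median of LN(a, mu, sigma2): a + exp(mu); it does not depend on sigma2."""
    return a + math.exp(mu)

e_x10 = power_moment(mu=1.0, sigma2=4.0, k=10)   # Example 13.4: exp(210)
median = lognormal_median(a=0.0, mu=5.0)          # Example 13.5: exp(5)
print(f"{e_x10:.2e}", round(median, 4))
```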

EXERCISE
(A) Theory Questions
1. Derive an expression for the moments about a of the lognormal distribution. Hence find the mean and variance.
2. Let X ~ LN(0, μ, σ²); then prove that Y = log_e(X) has normal distribution with mean μ and variance σ².
3. Show that lognormal distribution is positively skewed.
1
4. Let X ~ LN(a, μ, σ²). Obtain the distribution of log(X − a).
5. Let X ~ LN(0, μ, σ²); then prove that Y = X^α ~ LN(0, αμ, α²σ²); α > 0.
6. Let X ~ N(μ, σ²); then prove that Y = e^X ~ LN(0, μ, σ²).
7. Let log_e X ~ N(μ, σ²); then prove that X ~ LN(0, μ, σ²).
8. Let X~ LN (0, 1, 1). Calculate P (Xx*< 1).
9. Let X ~ LN(a, μ, σ²). Prove that: mean > median > mode.
10. Let X ~ LN(0, 2, 4). Calculate
(i) P(X > 1)
(ii) P(2.3 < X < 8.4)
(iii) P(log_e X < 0.4)
11. Let X_i (i = 1, 2, 3, 4, 5, 6) be i.i.d. LN(0, 2, 6). Identify the distribution of
() y = (ii) y=
X3X4XsX6 X4XsX6
12. Let X ~ LN(0, 3, 5). Calculate E(X).
13. Let X ~ LN(2, 3, 4). Verify that mean > median > mode.
14. If log_e X ~ N(3, 9), calculate (i) Mean (ii) Median (iii) Mode.
Chapter 14
CHI-SQUARE (χ²) DISTRIBUTION
14.1 INTRODUCTION
If X₁, X₂, ..., Xₙ are n independent standard normal variates then the distribution of $\sum_{i=1}^{n} X_i^2$ is chi-square distribution with n degrees of freedom (d.f.). It is denoted as χ²ₙ. The degrees of freedom n (d.f.) represents the number of independent variables used in the construction of χ². Thus a χ²ₙ variate can be looked upon as the sum of squares of n independent standard normal variates.
Tests based on chi-square distribution like chi-square test of goodness of fit, chi-square test of independence of attributes etc. are widely used. It also plays an important role in statistical inference.
Remarks:
1. The probability density curves for n = 1, 2, 3, 4, 5 and 6 are as shown in Fig. 14.1.
[Fig. 14.1: Chi-square Probability Density curves for different values of n]
2. The probability density curve of χ² distribution has longer tail on the right hand side. Thus the probability density curve is positively skew. It is also clear from Fig. 14.1 that mode of chi-square distribution exists only when n > 2.
3. Some practical situations in which χ² variate occurs can also be stated. For instance, suppose X represents the error in measuring radius of a circle and X ~ N(0, 1). Then error in measuring area of the circle = A = πX², and A/π = X² follows chi-square distribution with 1 d.f.
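14.2 MOMENTS OF CHI-SQUARE DISTRIBUTION
If Y ~ χ²ₙ, then the rth raw moment of Y is given by
$$\mu_r' = E(Y^r) = \int_0^\infty y^r f(y)\, dy = \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)} \int_0^\infty y^r\, e^{-\frac{y}{2}}\, y^{\frac{n}{2}-1}\, dy$$
$$= \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)} \int_0^\infty e^{-\frac{y}{2}}\, y^{\frac{n}{2}+r-1}\, dy = \frac{2^{\frac{n}{2}+r}\,\Gamma\left(\frac{n}{2}+r\right)}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)} = \frac{2^r\,\Gamma\left(\frac{n}{2}+r\right)}{\Gamma\left(\frac{n}{2}\right)} \quad \ldots (i)$$
$$= \frac{2^r \left(\frac{n}{2}+r-1\right)\left(\frac{n}{2}+r-2\right) \cdots \left(\frac{n}{2}\right) \Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$$
$$\therefore\ \mu_r' = n(n+2) \cdots (n+2r-2); \quad r = 1, 2, 3, \ldots \quad \ldots (ii)$$
Putting r = 1, 2, 3 and 4 in equation (ii) we get the first four raw moments as:
μ₁' = Mean = n
μ₂' = n(n + 2)
μ₃' = n(n + 2)(n + 4)
μ₄' = n(n + 2)(n + 4)(n + 6)
Remarks:
From the above formulae we have
Var(Y) = μ₂' − (μ₁')² = n(n + 2) − n² = 2n
It is convenient to use cumulant generating function (c.g.f.) to obtain central moments.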
Coefficients of Skewness and Kurtosis
The coefficients of skewness β₁ and γ₁ are given by
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(8n)^2}{(2n)^3} = \frac{8}{n}, \qquad \gamma_1 = \sqrt{\beta_1} = \sqrt{\frac{8}{n}}$$
Hence as n → ∞, β₁ → 0 and γ₁ → 0. It means that when d.f. is very large, the distribution becomes approximately symmetric.
The coefficients of kurtosis β₂ and γ₂ are
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{48n + 12n^2}{4n^2} = 3 + \frac{12}{n} \quad \text{and} \quad \gamma_2 = \beta_2 - 3 = \frac{12}{n}$$
Therefore as n → ∞, β₂ → 3 and γ₂ → 0.
Thus the distribution becomes mesokurtic for large d.f. (n).
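The relations between the raw moments, the central moments and the coefficients β₁ and β₂ can be checked with exact arithmetic. The sketch below is an illustration for n = 11 (the value of n and the function names are arbitrary choices); it uses the product formula n(n + 2)...(n + 2r − 2) for the raw moments and the usual conversion from raw to central moments.

```python
from fractions import Fraction

def raw_moment(n, r):
    """r-th raw moment of chi-square with n d.f.: n(n+2)...(n+2r-2)."""
    result = 1
    for j in range(r):
        result *= n + 2 * j
    return result

def central_moments(n):
    """Second, third and fourth central moments computed from the raw moments."""
    m1, m2, m3, m4 = (raw_moment(n, r) for r in range(1, 5))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

n = 11
mu2, mu3, mu4 = central_moments(n)
beta1 = Fraction(mu3 ** 2, mu2 ** 3)   # should equal 8/n
beta2 = Fraction(mu4, mu2 ** 2)        # should equal 3 + 12/n
print(mu2, mu3, mu4, beta1, beta2)
```

For n = 11 this gives μ₂ = 22 = 2n, μ₃ = 88 = 8n and μ₄ = 1980 = 48n + 12n², in agreement with the formulas above.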
14.3 ADDITIVE PROPERTY
Statement: If Y₁ and Y₂ are independent χ² variates with n₁ and n₂ d.f. respectively, then Y₁ + Y₂ has also χ² distribution with (n₁ + n₂) degrees of freedom.
Proof: Y₁ ~ χ²_{n₁}
The m.g.f. of Y₁ = M_{Y₁}(t) = (1 − 2t)^(−n₁/2)
Similarly, Y₂ ~ χ²_{n₂}
The m.g.f. of Y₂ = M_{Y₂}(t) = (1 − 2t)^(−n₂/2)
The m.g.f. of Y₁ + Y₂ is given by
M_{Y₁+Y₂}(t) = M_{Y₁}(t) · M_{Y₂}(t)    (∵ Y₁ and Y₂ are independent)
= (1 − 2t)^(−n₁/2) (1 − 2t)^(−n₂/2)
= (1 − 2t)^(−(n₁ + n₂)/2)
which is the m.g.f. of χ² distribution with n₁ + n₂ d.f.
Thus Y₁ + Y₂ ~ χ²_{n₁+n₂}
Remark: Generalisation of the above stated property holds good. Hence if Y₁, Y₂, ..., Y_k are independent χ² variates with n₁, n₂, ..., n_k d.f. respectively then $\sum_{i=1}^{k} Y_i$ follows χ² distribution with $\sum_{i=1}^{k} n_i$ d.f.
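The additive property can be illustrated by simulation. The sketch below is only a rough check under assumed values (n₁ = 3, n₂ = 5, 20000 trials, a fixed seed): it builds each χ² variate as a sum of squared standard normal draws and compares the mean and variance of Y₁ + Y₂ with n₁ + n₂ and 2(n₁ + n₂). Matching two moments does not by itself establish the distribution; the m.g.f. argument above does that.

```python
import random

def chi_square_sample(df, rng):
    """One chi-square draw: the sum of squares of df independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(2024)   # fixed seed so that the run is repeatable
n1, n2, trials = 3, 5, 20000
sums = [chi_square_sample(n1, rng) + chi_square_sample(n2, rng) for _ in range(trials)]

mean = sum(sums) / trials
variance = sum((s - mean) ** 2 for s in sums) / (trials - 1)
print(round(mean, 2), round(variance, 2))   # theory: n1 + n2 = 8 and 2(n1 + n2) = 16
```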

14.4 USE OF χ² TABLES FOR CALCULATION OF PROBABILITIES
(A) Use of Tables for Calculation of Probabilities
Let Y be a χ² variate with n d.f. We need to find probabilities such as P[Y ≥ a], P[Y ≤ b], P[c ≤ Y ≤ d] where a, b, c, d are constants. However, the computation of these probabilities by evaluating the integral is a very difficult task. The computation of these probabilities is facilitated by using the tables giving values of χ² variates for various d.f. from 1 to 30 [Table of distribution of χ² from statistical tables].
Suppose χ²_{n,α} represents the value of χ² variate such that
$$\int_{\chi^2_{n,\alpha}}^{\infty} \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\, y^{\frac{n}{2}-1}\, e^{-\frac{y}{2}}\, dy = \alpha$$
Then values of χ²_{n,α} are available in the table for α = 0.99, 0.98, 0.95, 0.90, 0.80, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01.
The area corresponding to this probability is the shaded area as shown in Fig. 14.2.
[Fig. 14.2: χ² probability density curve f(y); the area to the right of χ²_{n,α} is P[Y > χ²_{n,α}] = α and the area to its left is 1 − α]
Illustration: Let Y ~ χ²₁₁. Find: (i) P[Y ≥ 14.631] (ii) P[Y ≤ 8.148] (iii) P[4.575 ≤ Y ≤ 17.275]
(iv) Median of distribution of Y, (v) Second and seventh deciles of distribution of Y.
Solution:
(i) P[Y ≥ 14.631] = 0.2 [From statistical tables]
[Fig. 14.3: Probability density curve of χ²₁₁ showing P[Y ≤ 8.148] to the left of 8.148 and the area 0.2 to the right of 14.631]
(ii) P[Y ≤ 8.148] = P[χ²₁₁ ≤ 8.148] = 1 − P[χ²₁₁ ≥ 8.148]
= 1 − 0.7 = 0.3
(iii) P[4.575 ≤ Y ≤ 17.275] = P[4.575 ≤ χ²₁₁ ≤ 17.275]
= P[χ²₁₁ ≥ 4.575] − P[χ²₁₁ ≥ 17.275]
= 0.95 − 0.1 = 0.85
(iv) Suppose median = M, then P[Y ≤ M] = 0.5
∴ P[Y ≥ M] = 0.5, as shown in Fig. 14.4.
∴ P[χ²₁₁ ≥ 10.341] = 0.5
∴ M = 10.341
(v) Let D₂ and D₇ represent second and seventh deciles respectively; then
P[Y ≤ D₂] = 0.2
∴ P[Y ≥ D₂] = 0.8, as shown in Fig. 14.5.
∴ P[χ²₁₁ ≥ 6.989] = 0.8
∴ D₂ = 6.989
Similarly, P(Y ≤ D₇) = 0.7
∴ P(Y ≥ D₇) = 0.3
From Fig. 14.5, P[χ²₁₁ ≥ 12.899] = 0.3
∴ D₇ = 12.899
[Fig. 14.4: Probability density curve of χ²₁₁ with median M = 10.341]
[Fig. 14.5: Probability density curve of χ²₁₁ with deciles D₂ = 6.989 and D₇ = 12.899]
(B) USE OF MS-EXCEL IN COMPUTING PROBABILITIES
Suppose we want to find P[χ²₁₁ > 8.148] using MS-EXCEL.
Step 1: Click on Insert on the Excel sheet.
Step 2: Click on fx.
Step 3: Select function CHIDIST. Then a Function Arguments box will appear on the screen. Enter the value of X = 8.148 and Deg_freedom as 11.
[Fig. 14.6 (a): Function Arguments box for CHIDIST with X = 8.148 and Deg_freedom = 11; formula result = 0.699988061]
Step 4: Click on OK.
[Fig. 14.6 (b): Cell A2 containing =CHIDIST(8.148,11) displays 0.699988]
The answer 0.699988 appears on the screen.
Note that we always get probability of upper tail.
Note: The above probability can be directly obtained using MS-EXCEL command =CHIDIST(8.148, 11). Thus CHIDIST(x, n) gives P(χ²ₙ ≥ x).
Computation of Partition Values of χ²
(i) To find the median we use the command =CHIINV(0.5, n) where n is the degrees of freedom.
=CHIINV(0.5, 11) gives 10.341. Hence 10.341 is the median of χ²₁₁.
(ii) To find the first quartile Q₁ we use the command =CHIINV(0.75, n).
=CHIINV(0.75, 11) gives 7.584143, which is the first quartile of χ²₁₁.
(iii) To find the second decile D₂ we use the command =CHIINV(0.8, n). =CHIINV(0.8, 11) gives 6.9887, which is the second decile of χ²₁₁.
In general, =CHIINV(1 − α, n) gives an ordinate x such that P(χ²ₙ ≥ x) = 1 − α, or equivalently P(χ²ₙ ≤ x) = α.
(C) USE OF R SOFTWARE IN COMPUTING PROBABILITIES
R software supports the following two functions for computing probabilities and ordinates.
(i) pchisq(x, n): It is used to find P[X ≤ x] where X has chi-square distribution with n degrees of freedom.
(a) Suppose X has chi-square distribution with 10 d.f. and we want to find P[X ≤ 12.5]; then we use the command
> pchisq(12.5, 10)
It gives answer 0.7470147.
(b) Similarly, to find P[X > 8.148] when X has chi-square distribution with 11 d.f., use the command
> 1 - pchisq(8.148, 11)
It gives answer 0.6999881.
(ii) qchisq(p, n): It is used to find the value of k such that P[X ≤ k] = p where X has chi-square distribution with n d.f.
(a) Suppose we want to find k such that P[X ≤ k] = 0.78 when X has chi-square distribution with 12 d.f. We use the command
> qchisq(0.78, 12)
It gives answer 15.40562, which is the value of k.
(b) Suppose we want to find k such that P[X ≥ k] = 0.58 when X has chi-square distribution with 15 d.f. Now P[X ≥ k] = 0.58 gives 1 − P[X < k] = 0.58, hence P[X ≤ k] = 0.42. We use the command
> qchisq(0.42, 15)
It gives answer 13.28882, which is the value of k.
Illustration: To find median, quartiles and deciles we use the function qchisq(). Suppose Y ~ χ²₁₁; then
(i) median is given by command > qchisq(0.5, 11) = 10.341.
(ii) first quartile Q₁ is given by command > qchisq(0.25, 11) = 7.584.
(iii) second decile D₂ is given by command > qchisq(0.2, 11) = 6.989.
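For readers without Excel or R, the same lower-tail probabilities and ordinates can be approximated with a few lines of standard-library Python. This is a minimal sketch, not how pchisq or qchisq are implemented: the names `chi2_cdf` and `chi2_quantile` are hypothetical, the distribution function uses the power series of the incomplete gamma function (adequate for moderate x), and the quantile is found by bisection for 0 < p < 1.

```python
import math

def chi2_cdf(x, df):
    """P[X <= x] for chi-square with df d.f. (series for the incomplete gamma)."""
    if x <= 0:
        return 0.0
    s, half_x = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:          # add series terms until they stop mattering
        k += 1
        term *= half_x / (s + k)
        total += term
    return total * math.exp(s * math.log(half_x) - half_x - math.lgamma(s + 1.0))

def chi2_quantile(p, df):
    """Value k with P[X <= k] = p (0 < p < 1), found by bisection on chi2_cdf."""
    lo, hi = 0.0, float(df) + 1.0
    while chi2_cdf(hi, df) < p:          # widen the bracket until it contains k
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(chi2_cdf(12.5, 10))           # the text reports 0.7470147 from pchisq(12.5, 10)
print(1.0 - chi2_cdf(8.148, 11))    # the text reports 0.6999881
print(chi2_quantile(0.5, 11))       # median of chi-square with 11 d.f. (10.341 in the text)
print(chi2_quantile(0.25, 11))      # first quartile (7.584143 from CHIINV(0.75, 11))
```

Note the difference in convention: CHIDIST and CHIINV work with upper-tail probabilities, whereas this sketch, like pchisq and qchisq, works with lower-tail probabilities.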
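Fisher's Approximation:
Statement: If Y ~ χ²ₙ then $\sqrt{2Y} - \sqrt{2n-1} \to N(0, 1)$ as n → ∞,
or $\sqrt{2Y} \to N(\sqrt{2n-1},\, 1)$ for large n.
Proof of this statement is out of scope of this book. However, we use it for solving problems.
Note:
$$P[Y \le a] = P\left(\sqrt{2Y} \le \sqrt{2a}\right) = P\left(\sqrt{2Y} - \sqrt{2n-1} \le \sqrt{2a} - \sqrt{2n-1}\right)$$
$$= P\left[N(0, 1) \le \sqrt{2a} - \sqrt{2n-1}\right] = \Phi\left(\sqrt{2a} - \sqrt{2n-1}\right)$$
or, using the normal approximation,
$$P[Y \le a] = P\left[\frac{Y - n}{\sqrt{2n}} \le \frac{a - n}{\sqrt{2n}}\right] = \Phi\left(\frac{a - n}{\sqrt{2n}}\right)$$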
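The two large-sample approximations to P[Y ≤ a] can be compared directly. In the sketch below the function names are assumptions, and the values a = 162 and n = 145 are chosen only because they make Fisher's argument √(2a) − √(2n − 1) = 18 − 17 equal to exactly 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # N(0, 1)

def fisher_cdf(a, n):
    """Fisher's approximation to P[Y <= a] for Y ~ chi-square with n d.f."""
    return STD_NORMAL.cdf(math.sqrt(2.0 * a) - math.sqrt(2.0 * n - 1.0))

def normal_cdf_approx(a, n):
    """Normal approximation using mean n and variance 2n."""
    return STD_NORMAL.cdf((a - n) / math.sqrt(2.0 * n))

print(round(fisher_cdf(162, 145), 5), round(normal_cdf_approx(162, 145), 5))
```

Fisher's approximation gives Φ(1) = 0.84134 here; the plain normal approximation gives a slightly smaller value because its argument is 17/√290, a little under 1.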
POINTS TO REMEMBER
The p.d.f. of chi-square distribution with n d.f. is
$$f(y) = \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\, e^{-\frac{y}{2}}\, y^{\frac{n}{2}-1}; \quad y > 0.$$
If Y ~ χ²ₙ, then
(a) First 4 raw moments are given by
μ₁' = mean = n, μ₂' = n(n + 2), μ₃' = n(n + 2)(n + 4), μ₄' = n(n + 2)(n + 4)(n + 6).
(b) First 4 central moments are given by
μ₁ = 0, μ₂ = 2n, μ₃ = 8n, μ₄ = 48n + 12n².
(c) β₁ = 8/n, β₂ = 3 + 12/n, γ₂ = 12/n.
(d) The rth raw moment is μ_r' = n(n + 2) ... (n + 2r − 2); r = 1, 2, ...
(e) The moment generating function (m.g.f.) is given by
M_Y(t) = (1 − 2t)^(−n/2); t < 1/2.
(f) The cumulant generating function (c.g.f.) is
K_Y(t) = −(n/2) log_e(1 − 2t) and the rth cumulant is K_r = (r − 1)! n 2^(r−1); r = 1, 2, ...
(g) The recurrence relation among the central moments is
μ_{r+1} = 2r[n μ_{r−1} + μ_r]; r = 1, 2, 3, ...
(h) The mode of the distribution is n − 2 for n > 2.
(i) (Y − E(Y))/√Var(Y) = (Y − n)/√(2n) ~ N(0, 1) as n → ∞ [normal approximation].
(j) √(2Y) − √(2n − 1) ~ N(0, 1) for large n [Fisher's approximation].
Additive property: If Y₁ and Y₂ are independent χ² variates with n₁ and n₂ d.f. respectively then Y₁ + Y₂ has also χ² distribution with (n₁ + n₂) degrees of freedom. The generalization of this property also holds good.
Important Results:
(i) If X and Y are independent chi-square variates with m and n d.f. respectively then U = X + Y and V = X/Y are independently distributed. Similarly U = X + Y and V = X/(X + Y) are independently distributed. In this case U ~ χ²_{m+n}.
(ii) If X and Y are independent χ² variates with m and n d.f. respectively then U = X + Y and V = X/Y are independently distributed. Here U ~ χ²_{m+n}. Similarly U = X + Y and V = X/(X + Y) are also independently distributed.
(iii) If X is a continuous random variable with probability density function
$$f(x) = \frac{1}{B(m, n)}\, x^{m-1} (1 - x)^{n-1}; \quad 0 < x < 1 \text{ with } m > 0,\ n > 0$$
= 0 otherwise
then for large m and comparatively small n, −2m log_e X follows approximately χ² distribution with 2n d.f.
EXERCISE
(A) Theory Questions
1. (a) Let X₁, X₂, ..., Xₙ be i.i.d. N(0, 1) variates. Obtain the moment generating function of $Y = \sum_{i=1}^{n} X_i^2$. Identify the distribution of Y.
(b) Define a chi-square variate with n d.f. and derive its p.d.f.
2. (a) Define χ² variate with n degrees of freedom. Find its mean and variance.
(b) Obtain the rth raw moment of χ² distribution with n d.f. Hence, find mean and variance.
3. Obtain the m.g.f. and hence find y and for a chi-square distribution.
4. State and prove additive property for χ² variates.
5. Obtain Karl Pearson's coefficient of skewness for a chi-square distribution.
6. Show that the mode of the chi-square distribution with n d.f. is n − 2. Also find P(X > 2.343) if mode of the distribution is 3.
7. If X and Y are independent chi-square variates with m and n d.f. respectively, show that X/Y and X + Y are independently distributed.
8. Show that if X and Y are independent χ² variates with m and n d.f. respectively then X + Y and X/(X + Y) are independently distributed.
9. If X is a continuous r.v. with probability density function
$$f(x) = \frac{1}{B(m, n)}\, x^{m-1} (1 - x)^{n-1}; \quad 0 \le x \le 1 \text{ with } m > 0,\ n > 0$$
= 0; otherwise
then show that for large m and comparatively small n, −2m log_e X follows approximately χ² distribution with 2n d.f.
10. Show that for a χ² variate with n d.f.
μ_{r+1} = 2r[n μ_{r−1} + μ_r]; r = 1, 2, 3, ... and hence obtain the first four central moments of the distribution.
11. (a) Derive the recurrence relation between raw moments of chi-square distribution.
(b) Let X be a chi-square variate with n d.f. Find the m.g.f. of Y = (X − n)/√(2n). Hence, find its limiting value as n → ∞ and interpret the result.
12. Let X₁ and X₂ be two independent N(0, 1) variates. Show that Y₁ = (X₁ + X₂)²/2 and Y₂ = (X₁ − X₂)²/2 are independent chi-square variates with 1 d.f. each.
13. Obtain the cumulant generating function (c.g.f.) of chi-square distribution. Hence find β₁, γ₁, β₂ and γ₂.
14. State the limiting behaviour of χ²ₙ as n → ∞ according to (i) Normal approximation, (ii) Fisher's approximation.
15. Let X₁, X₂, ..., X₆ be i.i.d. N(0, 1) variates. State giving reasons the distribution of
U=
x{+ Xí+ X
X
16. If X and Y are two independent chi-square variates with m and n d.f. respectively, state the distribution of (i) X + Y, (ii) √(2X) − √(2m − 1) for large m.
17. Let X₁, X₂, ..., X_m, X_{m+1}, X_{m+2}, ..., X_{m+n} be i.i.d. standard normal variates. If $U = \sum_{i=1}^{m} X_i^2$ and $V = \sum_{j=1}^{n} X_{m+j}^2$, obtain the distribution of W = nU/(mV).
18. Let X have a standard normal distribution. Then derive the distribution of Y = X².
n-1

variates with i2 k d.f. state the distribution of (Y;-i Yn) +


19. Let Y; (i = 1, 2... n) be n independent chi-square i=1

20. Let the p.d.f. of r.v. X be given by
$$f(x) = (2\pi)^{-1/2}\, e^{-\frac{x^2}{2}}; \quad -\infty < x < \infty$$
Show that Y = X² follows chi-square distribution with 1 degree of freedom.
21. Discuss the nature of probability curves for chi-square distribution for n = 1, n = 2 and n > 2, where n is degrees of freedom. Write the type of skewness for the curve when n < 2.
22. Establish the recurrence relation among the central moments of χ² distribution with n d.f.
23. Let X and Y be two independent chi-square variates with 10 and 20 d.f. respectively. Find the median of the probability distribution of X + Y.
24. The probability density of continuous random variable Xis given by
1 - oo <X < oo,
f(x) =
of
X, If X,, X, .... Xn are n independent standard normal variates then state the distribution
tie distribution of
n

Y= X.
i=1

(B) Numerical Problems


1. Let X₁, X₂, ..., X₆₀ be i.i.d. N(0, 1) variates. Find $P\left[\sum_{i=1}^{60} X_i^2 < 65.227\right]$.
Ans.: 0.7
2. If X₁ and X₂ are two independent N(0, 1) and N(0, 1/2) variates respectively, state the probability distribution of X₁² + 2X₂².
Ans.: χ²₂
3. Let X_i (i = 1, 2, ..., 145) be i.i.d. N(0, 2) variates. Find P(X₁² + X₂² + ... + X₁₄₅² ≤ 162).
Ans.: P[N(0, 1) < 1] = 0.84134
4. Identify the distribution of a r.v. X if its m.g.f. is M_X(t) = (1 − 2t)^(−8) where t < 1/2.
Ans.: χ² distribution with 16 d.f.
5. If M_X(t) = (1 − 2t)^(−20), where t < 1/2, is the m.g.f. of a r.v. X, find the median of X.
Ans.: Median = 39.335
6. Show that for a χ² distribution with 2 d.f. P₀ = P[χ² ≥ v] = e^(−v/2). If P₀ = 0.05 find v.
Ans.: v = 3.012

10
7. (a) Let X,, X2, ., X1o be ii.d. N(5, 10) variates. Calculate P
Li=1
(X;-5)² 72.67 .
(b) Let X, i= 1,2...8be independent and
identically distributed N (u = 20, s² = 20) variates.
Calculate P (X- 20) > 190.48
Li= 1
Ans.: 0.7
8. Let X1, X,...Xg be arandom sample
from N(4, 25) distribution. Find mean and
8
median ofYwhere.
Y = 2 X;-4) 2
i=1
8
Ans.: E(Y = 25
o
9. If X₁ and X₂ are two independent normal variates with mean 4 and S.D. 1 each, find P[0.296 < (X₁ − X₂)² < 0.910].
Ans.: 0.2
10. Let X₁, X₂, ..., X_r, ..., X₁₂ be independent normal variates such that E(X_r) = 0 and Var(X_r) = r. Find
$$P\left[\frac{X_1^2}{1} + \frac{X_2^2}{2} + \cdots + \frac{X_{12}^2}{12} \le 11.340\right]$$
Ans.: 0.5
11. Let X₁ ~ N(5.5, 1), X₂ ~ N(5.5, 1) and X₁, X₂ be independent. Find P[2.148 < (X₁ − X₂)² < 5.412].
Ans.: 0.2
12. If X₁, X₂ are independent standard normal variates then find
P[(X₁ + X₂)² ≤ 0.296, (X₁ − X₂)² > 5.412].
Ans.: 0.03
(C) State True or False
, The p.d.f. of x' distribution with n (> 1)d.f. is same as that of
Ans.: False
G,distribution.
2 The p.d.f. of x' distribution with n d.f. is same as that of Gdistribution.
Ans.: True

3. The p..f. of y? distribution withn d.f. is isame as that of G|..


Ans.: False

4 The rth raw moment of x²


distribution with n d.f. is given by 2h
r4)
Ans.: False

5. For x² distribution with n d.f. the rth rawmoment is given by 2.


rl9)
Ans.: True
6. χ² distribution is symmetric.
Ans.: False
7. The moment generating function of chi-square distribution with 10 degrees of freedom is (1 − 2t)^(−5).
Ans.: True
8. The recurrence relation among the central moments of χ² distribution with n d.f. is
μ_{r+1} = 2n[r μ_{r−1} + μ_r] for r = 1, 2, 3, ...
Ans.: False
9. The recurrence relation among the central moments of χ² distribution with n d.f. is
μ_{r+1} = 2r[n μ_{r−1} + μ_r] for r = 1, 2, 3, ...
Ans.: True
10. Chi-square probability distribution satisfies additive property.
Ans.: True
11. If X₁, X₂, ..., X₁₀ are independent normal variates such that E(X_i) = 0 and Var(X_i) = i; i = 1, 2, ..., 10 then
$$P\left[\frac{X_1^2}{1} + \frac{X_2^2}{2} + \cdots + \frac{X_{10}^2}{10} > 7.267\right] = 0.7.$$
Ans.: True
12. Let X₁, X₂, X₃, ..., X₁₀ be a random sample with population mean μ and variance σ². Then $\sum_{i=1}^{10} (X_i - \bar{X})^2 / \sigma^2$ has χ² distribution with 10 d.f. (where X̄ is the sample mean).
Ans.: False
13. The p.d.f. of G(1/2, 1/2) is same as that of χ² distribution with 1 d.f.
Ans.: True
14. If X₁, X₂, ..., X₁₀ are independent identically distributed N(0, 1) then $\sum_{i=1}^{10} X_i^2$ has chi-square distribution with 4 d.f.
Ans.: False
UNIT V-
INFERENTIAL STATISTICS: HYPOTHESIS
Chapter 15
STATISTICAL INFERENCE (THE IDEA OF
ESTIMATION AND TESTING OF HYPOTHESIS)
15.1 INTRODUCTION
In order to study a group of a large number of items or individuals we need to draw a sample. We use the technique of sampling several times in everyday life. Thus, sampling is a well-accepted means of collecting information. Moreover, it is believed to be a scientific and objective procedure. Sampling plays a very important role in statistical inference. Hence, we shall define terms like population, sample, simple random sampling with replacement (SRSWR), simple random sampling without replacement (SRSWOR) etc.
15.2 RANDOM SAMPLE FROM A PROBABILITY DISTRIBUTION
Consider a population with probability law f(x, θ) [p.m.f. in case of discrete distribution and p.d.f. in case the probability distribution is continuous].
Definition: A sample X₁, X₂, ..., Xₙ from the population with probability law f(x, θ) is called a random sample if X₁, X₂, ..., Xₙ are independent and identically distributed random variables with probability law f(x, θ).
Note: A sample drawn for estimating parameter θ or for any other inference is generally a random sample. For drawing inference we use numerical values of X₁, X₂, ..., Xₙ.
15.3 STATISTIC AND PARAMETER
Using the random sample X₁, X₂, ..., Xₙ we draw conclusions about the unknown probability distribution. However, a probability distribution can be studied if the parameter is known. In other words, study of a probability distribution reduces to the study of parameter θ. In practice, parameter values are not known and estimates based on the sample values are generally used. We use functions of sampled values in place of the unknown parameters. These functions are known as statistics. A statistic is used to 'estimate' the value of a parameter. There are various ways of summarizing the sampled observations. The summarized quantity is called a statistic. We define it precisely as follows:
Definition: If X₁, X₂, ..., Xₙ is a random sample from a probability distribution f(x, θ), then T = T(X₁, X₂, ..., Xₙ), a function of sample values which does not involve the unknown parameter, is called a statistic (or estimator).
Some typical statistics are given below:
(i) Sample mean: $T = T(X_1, X_2, \ldots, X_n) = \frac{\sum X_i}{n} = \bar{X}$
∴ T = X̄ is a statistic.
(ii) Sample variance: $T = T(X_1, X_2, \ldots, X_n) = \frac{1}{n-1} \sum (X_i - \bar{X})^2$ is a statistic.
(iii) Similarly, it can be shown that the sample proportion (p) is also a statistic.
Note:
(1) Verbally, a statistic is a summarized quantity of sample values such as mean, variance, proportion, correlation coefficient. On the other hand, similar quantities which correspond to the population are called parameters. If f(x, θ) is a p.d.f. then the constant θ involved in it is also called a parameter. The population mean, variance etc. are functions of θ. Thus by the term parameter we mean either θ or a function of θ.
(2) A noteworthy difference between a statistic and a parameter is that the former is a random variable and the latter is a constant. Since a statistic is a function of random variables X₁, X₂, ..., Xₙ, it is also a random variable. It varies from sample to sample.
(3) Since a statistic is a random variable it possesses some probability distribution; it may not be the same as the parent distribution f(x, θ). However, a parameter, being a constant, does not possess a probability distribution.
15.4 SAMPLING DISTRIBUTION OF STATISTIC AND STANDARD ERROR
Further statistical inference is based on a statistic, therefore we need to study its probability distribution. The general theory is developed in the subsequent discussion.
Definition: If X₁, X₂, ..., Xₙ is a random sample from f(x, θ) then the probability distribution of statistic T(X₁, X₂, ..., Xₙ) is called its sampling distribution and the standard deviation of T is called its standard error (S.E.).
$$\text{S.E.}(T) = \sqrt{E[T - E(T)]^2}$$
In the problem of testing of hypothesis, the standard error of the statistic used in the test procedure turns out to be an important quantity. We list below some typical statistics along with their standard errors.
(i) T = X̄, S.E.(X̄) = σ/√n, where σ² is the population variance.
(ii) Suppose there are two samples of size n₁ and n₂ respectively. The first is drawn from a population with variance σ₁² and the second is drawn from a population with variance σ₂², independently.
Let T = X̄₁ − X̄₂; then
$$\text{S.E.}(T) = \sqrt{\text{Var}(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \quad \text{(verify)}$$
(iii) If p is the sample proportion then
$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}}$$
where P: population proportion and Q = 1 − P.
(iv) Suppose p₁ and p₂ are proportions obtained using two samples, one from the first population with population proportion P₁ and the other from the second population with population proportion P₂. Let T = p₁ − p₂; then it can be shown that
$$\text{S.E.}(T) = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$
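The four standard errors listed above are simple to compute once the population quantities are specified. The sketch below collects them as small functions; the function names and all numerical inputs are hypothetical and serve only to show the arithmetic.

```python
import math

def se_mean(sigma2, n):
    """S.E. of the sample mean: sigma / sqrt(n)."""
    return math.sqrt(sigma2 / n)

def se_mean_difference(sigma2_1, n1, sigma2_2, n2):
    """S.E. of the difference of two independent sample means."""
    return math.sqrt(sigma2_1 / n1 + sigma2_2 / n2)

def se_proportion(P, n):
    """S.E. of a sample proportion: sqrt(PQ / n) with Q = 1 - P."""
    return math.sqrt(P * (1.0 - P) / n)

def se_proportion_difference(P1, n1, P2, n2):
    """S.E. of the difference of two independent sample proportions."""
    return math.sqrt(P1 * (1.0 - P1) / n1 + P2 * (1.0 - P2) / n2)

# Hypothetical inputs, used only to illustrate the formulas.
print(se_mean(25.0, 100))                          # sigma = 5, n = 100
print(se_mean_difference(9.0, 36, 16.0, 64))
print(se_proportion(0.5, 100))
print(se_proportion_difference(0.5, 100, 0.4, 150))
```

Quadrupling the sample size halves the standard error of the mean, which is the practical reading of the σ/√n formula.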

15.5 TESTING OF HYPOTHESIS
15.5.1 Introduction
Quite often we need to test some claims regarding populations on the basis of a sample. For example:
(1) A particular scooter gives an average of 50 km per litre.
(2) The proportion of unemployed persons is the same in two different states.
(3) The average life of an article produced by company A is larger than that of company B.
(4) Marketing agencies advertise goods on radio, television or in newspapers etc. with a view to increasing the sales. In this situation one may be interested in testing whether such an advertisement campaign was really effective in increasing the sales.
Thus, mainly, we are interested in testing certain claims. These claims are to be stated in terms of population parameters or statistical distribution.
15.5.2 Hypothesis: Definition
It is a statement or claim or assertion about the statistical distribution or a parameter of the statistical distribution. In other words, a hypothesis is an assumption to be tested.
For example: (a) Average life of a bulb is 1000 hours. It can be written as H₀: μ = 1000, where μ represents the population mean.
(b) Proportion of literates in a certain locality is 55%. It can be written using the population proportion P as H₀: P = 0.55.
There are two types of hypotheses in every problem of tests of significance. If one is rejected the other is to be accepted and vice versa. These hypotheses are referred to as null hypothesis and alternative hypothesis.
Null Hypothesis: A hypothesis of "no difference" is called null hypothesis or, according to R. A. Fisher, null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true. Null hypothesis is denoted by H₀.
For example: H₀: μ = 100. Here the hypothesis states that there is no difference between the population mean and 100. H₀: μ₁ = μ₂ or H₀: μ₁ − μ₂ = 0. This hypothesis states that there is no difference between two population means. While conducting the test some difference will be observed between the sample value and the hypothesised value. Whether this difference is significant or is just due to the chance element is decided in the testing procedure.
Alternative Hypothesis: It is a hypothesis to be accepted in case the null hypothesis is rejected. In other words, a hypothesis complementary to the null hypothesis is called alternative hypothesis. It is denoted by H₁.
For example: If H₀: μ₁ = μ₂ then the alternative hypothesis may be H₁: μ₁ ≠ μ₂ or H₁: μ₁ < μ₂ or H₁: μ₁ > μ₂.
Alternative hypothesis is also called research hypothesis. Many a time, it is the alternative hypothesis which is of interest to be proved. For instance, a test of a drug intended to lower the diastolic blood pressure has null hypothesis H₀: μ₁ = μ₂ and alternative H₁: μ₁ < μ₂, so that H₁: μ₁ < μ₂ is the hypothesis of interest.
One and Two Sided Hypotheses:
By considering the nature of the hypothesis, hypotheses are classified as one sided or two sided.
Hypotheses of the type H₁: μ > μ₀, H₁: P < 0.5, H₁: μ₁ < μ₂, H₁: σ₁² > σ₂², etc. are called one sided hypotheses. On the other hand, hypotheses of the type H₁: P ≠ 0.5, H₁: σ₁² ≠ σ₂², H₁: μ ≠ μ₀ etc. are called two sided hypotheses.
15.5.3 Test of Hypothesis
In order to study the above stated claims (hypotheses) it is impracticable to study each and every item in the population. Naturally one has to use sampling methods. In this situation a sample of appropriate size is selected, and a suitable statistic such as sample mean, sample variance or sample proportion is used to take a decision about accepting or rejecting the hypothesis. How to carry out the test will be clear in the subsequent discussion.
15.5.4 Type I and Type II Errors
Since the decision of acceptance or rejection of H₀ is based on sampling, it is subject to two kinds of errors. For instance, in the inspection of a lot of manufactured items, the inspector will choose a sample of suitable size and accordingly take a decision whether to accept or reject the lot. In this process, two errors are possible viz., rejection of a good lot and acceptance of a bad lot. In testing of hypothesis these errors are called type I and type II errors.
Type I error: Rejecting H₀ when it is true.
Type II error: Accepting H₀ when it is false.
These errors can be put in tabular form to remember easily.

                      Decision
Actual Situation      Accept H₀            Reject H₀
H₀ is true            Correct decision     Type I error
H₀ is false           Type II error        Correct decision
15.5.5 Critical Region
Let X₁, X₂, ..., Xₙ be a random sample taken for testing H₀. While deciding whether H₀ is true or false, the entire sample space (the set of all possible sample observations) is divided into two non-overlapping regions. In other words, the sample space is partitioned into two subsets W and Wᶜ in the test procedure.
One of the regions is the acceptance region of H₀ (or rejection region of H₁) and the other is the rejection region of H₀ (or acceptance region of H₁). The region in which H₀ is rejected is treated as the critical region. If the value of the statistic which is used in testing H₀ falls in the critical region, H₀ is rejected, and it is accepted otherwise. Critical region is denoted by W or C.
[Fig. 15.1: Sample space partitioned into acceptance region (Wᶜ) and critical region (W)]
15.5.6 Test
A rule which leads to the decision of acceptance of H₀ or rejection of H₀ is called a test.
Test Statistic: A function of sample observations which is used to test H₀ is called a test statistic.
Level of Significance: The probability of rejecting H₀ when it is true is called the level of significance. It is denoted by α. Level of significance can be interpreted as the proportion of cases in which H₀ is rejected though it is true. It is the probability of committing type I error.
Level of significance cannot be made zero. However, we can fix it in advance as 0.01 (i.e. 1%) or 0.05 (i.e. 5%). In most of the cases it is taken as 5%.
15.5.7 One Sided and Two Sided Tests
As seen earlier, the alternative hypothesis H₁ is either one sided or two sided. The test used to test H₀ when the alternative hypothesis H₁ is one sided is termed a one sided test or one tailed test, while the test applied to test H₀ when the alternative hypothesis is two sided is called a two sided (two tailed) test. The details of these tests are discussed in the next chapter.
15.5.8 Observed Level of Significance (p-value)
Use of the probability value (p-value) or observed level of significance is becoming popular in tests of hypothesis. Instead of fixing the level of significance uniformly the same for all data sets, sometimes it is computed for the data set used in testing of hypothesis. If Z is the test statistic and its value under H₀ for a given data set is Z₀, then for the above two tailed test the observed p-value will be P(|Z| > |Z₀|). In the above illustration p-value = P(|Z| > |−6|) ≈ 0. Thus the probability of type I error in this case is practically 0.
If Z₀ would have been 1.5 then p-value = P(|Z| > 1.5) = 2 × 0.0668 = 0.1336.
p-value is the smallest level of significance at which H₀ would be rejected.
p-value is more informative as compared to level of significance. The details are beyond the scope of the book. Some software packages use p-value instead of level of significance.
If p-value < α, then H₀ is rejected at 100α% l.o.s.
If p-value > α, then H₀ is accepted at 100α% l.o.s.
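The p-value rule can be sketched in a few lines for a two tailed test based on a standard normal statistic. The function names are hypothetical; the two values of Z₀ (1.5 and −6) are the ones mentioned in the text.

```python
from statistics import NormalDist

def two_tailed_p_value(z0):
    """Observed level of significance P(|Z| > |z0|) for a two tailed Z test."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z0)))

def decision(p_value, alpha=0.05):
    """Reject H0 when the p-value is below the chosen level of significance."""
    return "reject H0" if p_value < alpha else "accept H0"

for z0 in (1.5, -6.0):
    p = two_tailed_p_value(z0)
    print(z0, round(p, 4), decision(p))
```

For Z₀ = 1.5 the two tailed p-value is twice the one tailed area 0.0668, so H₀ is accepted at the 5% level; for Z₀ = −6 the p-value is practically zero and H₀ is rejected.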
15.6 POWER OF THE TEST
Definition: Power of a test is the probability of rejecting H₀ when H₁ is true.
Probability of type II error is denoted by β.
β = P(Accept H₀ | H₁ is true) = P(Wᶜ | H₁), where W is the critical region.
The behaviour of α and β is opposite to each other. If α is smaller, β becomes larger and vice versa. To obtain a good balance between the two, we fix α at 5% or 1% and minimise β.
Power of test = P(Reject H₀ | H₁) = P(W | H₁) = 1 − P(Type II error) = 1 − β
Thus power of the test is the probability of a correct decision.
Hence we maximise power (1 − β) when α is fixed.
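The trade-off between α and β can be illustrated with a one sided test of a mean when σ is known. Everything numerical in this sketch is an assumption made for illustration (H₀: μ = 1000 against H₁: μ = 1020, σ = 50, n = 25), and the function name is hypothetical; the critical region is taken as W = {x̄ > c}, where c is chosen so that P(W | H₀) = α.

```python
import math
from statistics import NormalDist

def power_upper_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Return (power, beta) for H0: mu = mu0 against H1: mu = mu1 > mu0, sigma known."""
    se = sigma / math.sqrt(n)
    # Critical region W = {xbar > critical_value}, with P(W | H0) = alpha.
    critical_value = mu0 + NormalDist().inv_cdf(1.0 - alpha) * se
    beta = NormalDist(mu1, se).cdf(critical_value)   # P(accept H0 | H1 is true)
    return 1.0 - beta, beta

# Hypothetical numbers: bulb life with sigma = 50 hours and n = 25 bulbs.
power, beta = power_upper_tail(mu0=1000.0, mu1=1020.0, sigma=50.0, n=25)
power_01, beta_01 = power_upper_tail(mu0=1000.0, mu1=1020.0, sigma=50.0, n=25, alpha=0.01)
print(round(power, 4), round(beta, 4))
print(round(power_01, 4), round(beta_01, 4))
```

Lowering α from 0.05 to 0.01 moves the critical value further from μ₀, so β increases and the power falls, which is the opposite behaviour of α and β described above.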

POINTS TO REMEMBER
Population is a collection of items or individuals having some common characteristic. Any part of the population selected according to some rule to study the population is called a sample.
If X₁, X₂, ..., Xₙ is a random sample from a probability distribution f(x, θ) then T = T(X₁, X₂, ..., Xₙ), a function of sample values which does not involve the unknown parameter θ, is called a statistic. For example, sample mean, sample proportion and sample variance are statistics.
If X₁, X₂, ..., Xₙ is a random sample from f(x, θ) then the probability distribution of statistic T(X₁, X₂, ..., Xₙ) is called its sampling distribution and the standard deviation of T is called its standard error (S.E.).
S.E.(T) = √Var(T)
For example, if T = X̄ then S.E.(X̄) = σ/√n, and S.E.(p) = √(PQ/n) where Q = 1 − P.
A hypothesis of "no difference" is called null hypothesis or, according to Prof. R. A. Fisher, null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true. It is denoted by H₀.
For example: H₀: μ = 100 or H₀: μ₁ = μ₂ etc.
Alternative hypothesis is a hypothesis which is to be accepted in case the null hypothesis is rejected. It is denoted by H₁.
For example: If H₀: μ = 100 then H₁: μ > 100 or H₁: μ < 100 or H₁: μ ≠ 100.
Hypotheses of the type H₁: μ > μ₀, H₁: P < 0.5, H₁: μ₁ < μ₂ are called one sided hypotheses; on the other hand H₁: μ ≠ 100, H₁: μ₁ ≠ μ₂ etc. are called two sided alternative hypotheses.
Type I error in testing null hypothesis H₀ is rejecting H₀ when it is true. Probability of type I error is called level of significance (l.o.s.) and it is denoted by α. Generally, α = 0.05 or 0.01. Type II error in testing null hypothesis H₀ is accepting H₀ when it is false.
While deciding whether H₀ is true or false the entire sample space is divided into two non-overlapping regions. The region corresponding to which H₀ is rejected is called critical region or rejection region. It is denoted by W or C.
If Z₀ is the value of test statistic Z then for a two tailed test p-value = P(|Z| > |Z₀|). It is also called observed level of significance.

EXERCISE 15 (A)
(A) Theory Questions
1. Explain the terms:
(i) Population, (ii) Sample,
(iii) Statistic, (iv) Hypothesis,
(v) Null hypothesis, (vi) Alternative hypothesis,
(vii) Critical region, (viii) Type I error,
(ix) Type II error, (x) Level of significance,
(xi) p-value, (xii) Power of test.
2. State the utility of sampling in statistical inference.
3. Distinguish between a parameter and a statistic.
4. Explain the term sampling distribution of a statistic with illustration.
5. Explain the term standard error with illustrations.
6. State the standard errors of the following: (i) sample mean, (ii) sample proportion, (iii) difference between two sample means, (iv) difference between two sample proportions.
(B) State whether the following statements are true or false.
1. Statistic is a constant.
Ans.: False
2. Parameter is a constant.
Ans.: True
3. Standard error is human error in using method of sampling.
Ans.: False
4. Random selection of units must be done while drawing a sample for statistical inference.
Ans.: True
5. p-value is the smallest level of significance at which $H_0$ would be rejected.
Ans.: True
STATISTICS (SE, AI &DS) (15.6) STATISTICAL INFERENCE
(B) Sampling Distributions
15.6 INTRODUCTION
In order to draw inference about a certain phenomenon, sampling is a well accepted tool. The entire population cannot be studied due to several reasons. In such a situation sampling is the only alternative. A properly drawn sample is very useful in drawing reliable conclusions. Here, we draw a sample from a probability distribution rather than from a group of objects. The sample is drawn using a simulation technique.
15.7 RANDOM SAMPLE FROM A CONTINUOUS DISTRIBUTION
A random sample from a continuous probability distribution $f(x, \theta)$ is nothing but the values of independent and identically distributed random variables with the common probability density function $f(x, \theta)$.
Definition: Random sample: If $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with p.d.f. $f(x, \theta)$, then we say that they form a random sample from the population with p.d.f. $f(x, \theta)$.
Note:
(1) For drawing inference, we use the numerical values of $X_1, X_2, \ldots, X_n$.
(2) The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is
$f(x_1, x_2, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i)$
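Drawing a random sample by simulation and evaluating its joint p.d.f. can be sketched as follows. This is only an illustration: it assumes an exponential distribution with p.d.f. $f(x, \theta) = \theta e^{-\theta x}$, $x \geq 0$, and uses the inverse-transform method, neither of which is prescribed by the text. The function names are hypothetical.

```python
import math
import random


def exponential_pdf(x, theta):
    """p.d.f. f(x, theta) = theta * exp(-theta * x) for x >= 0, else 0."""
    return theta * math.exp(-theta * x) if x >= 0 else 0.0


def draw_sample(n, theta, rng):
    """Draw n i.i.d. exponential values by the inverse-transform method.

    If U is uniform on [0, 1), then X = -ln(1 - U) / theta is exponential.
    """
    return [-math.log(1.0 - rng.random()) / theta for _ in range(n)]


def joint_pdf(sample, theta):
    """Joint p.d.f. of an i.i.d. sample: the product of the individual p.d.f. values."""
    return math.prod(exponential_pdf(x, theta) for x in sample)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is reproducible
    sample = draw_sample(5, theta=2.0, rng=rng)
    print(sample)
    print(joint_pdf(sample, theta=2.0))
```

Because the observations are independent, the joint p.d.f. depends on the sample only through $\sum x_i$ in this exponential case, namely $\theta^n e^{-\theta \sum x_i}$.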
15.8 STATISTIC AND PARAMETER
Using the random sample $X_1, X_2, \ldots, X_n$ we draw conclusions about the unknown probability distribution. However, a probability distribution can be studied if the parameter $\theta$ is known. In other words, the study of a probability distribution reduces to the study of the parameter $\theta$. We use sampled observations for this purpose. There are various ways of summarizing the sampled observations. The summarized quantity is called a statistic. We define it precisely as follows.
Definition: If $X_1, X_2, \ldots, X_n$ is a random sample from a probability distribution $f(x, \theta)$, then $T = T(X_1, X_2, \ldots, X_n)$, a function of sample values which does not involve the unknown parameter, is called a statistic (or estimator).
Some typical statistics are given below:
(i) Sample mean: $T = T(X_1, X_2, \ldots, X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$ is a statistic.
(ii) Sample variance: $T = T(X_1, X_2, \ldots, X_n) = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ is a statistic.
(iii) $T = \max\{X_1, X_2, \ldots, X_n\}$, $T = \min\{X_1, X_2, \ldots, X_n\}$ and $T$ = median of $\{X_1, X_2, \ldots, X_n\}$ are also statistics.
(iv) Sample proportion ($p$) of observations less than $u$ is also a statistic.
Let $Y_i = 1$ if $X_i < u$
$\quad\;\; = 0$ if $X_i \geq u$
then $T = T(X_1, X_2, \ldots, X_n) = \frac{Y_1 + Y_2 + \cdots + Y_n}{n} = \frac{\text{No. of observations less than } u}{n} = p$
$p$ is a statistic.
In this manner several statistics are defined and the statistic suitable for the purpose is selected.
Note:
(1) Verbally, a statistic is a summarized quantity of sample values such as mean, variance, proportion, correlation coefficient. On the other hand, similar quantities which correspond to the population are called parameters. If $f(x, \theta)$ is a p.d.f. then the constant $\theta$ involved in it is also called a parameter. Note that the population mean, variance etc. are functions of $\theta$. Thus by the term parameter we mean either $\theta$ or a function of $\theta$.
(2) A noteworthy difference between a statistic and a parameter is that the former is a random variable and the latter is a constant. Since a statistic is a function of random variables $X_1, X_2, \ldots, X_n$, it is also a random variable. It varies from sample to sample.
(3) Since a statistic is a random variable, it possesses some probability distribution; it may not be the same as that of the distribution $f(x, \theta)$. However, a parameter, being a constant, does not possess a probability distribution.
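The typical statistics of Section 15.8 can be computed from observed sample values as in the sketch below. The data values and the threshold `u` are purely illustrative, and the function name is hypothetical. Each returned quantity depends only on the sample (and the chosen `u`), not on any unknown parameter.

```python
import statistics


def sample_statistics(sample, u):
    """Return typical statistics of a sample as a dictionary.

    The variance uses the divisor n - 1; the proportion counts values below u.
    """
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    proportion = sum(1 for x in sample if x < u) / n  # mean of the indicators Y_i
    return {
        "mean": mean,
        "variance": variance,
        "max": max(sample),
        "min": min(sample),
        "median": statistics.median(sample),
        "proportion_below_u": proportion,
    }


if __name__ == "__main__":
    observed = [30, 35, 50, 65, 70]  # illustrative values only
    print(sample_statistics(observed, u=50))
```

A different sample from the same population would give different values of these statistics, which is why each statistic has a sampling distribution.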
15.9 SAMPLING DISTRIBUTION OF STATISTIC AND STANDARD ERROR
Further statistical inference is based on a statistic, therefore we need to study its probability distribution. The general theory is developed in the subsequent discussion.
Definition: If $X_1, X_2, \ldots, X_n$ is a random sample from $f(x, \theta)$ then the probability distribution of the statistic $T(X_1, X_2, \ldots, X_n)$ is called its sampling distribution and the standard deviation of $T$ is called its standard error (S.E.).
S.E.$(T) = \sqrt{E[T - E(T)]^2}$
In the problem of testing of hypothesis, the standard error of the statistic used in the test procedure turns out to be an important quantity. We list below some typical statistics along with their standard errors.
(i) $T = \bar{X}$, S.E.$(T) = \dfrac{\sigma}{\sqrt{n}}$ where $\sigma^2$ is the population variance.
(ii) Suppose there are two samples of sizes $n_1$ and $n_2$ respectively. The first is drawn from a population with variance $\sigma_1^2$ and the second is drawn from a population with variance $\sigma_2^2$.
Let $T = \bar{X}_1 - \bar{X}_2$, then
S.E.$(T) = \sqrt{\mathrm{Var}(\bar{X}_1 - \bar{X}_2)} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
(iii) If $p$ is the sample proportion then
S.E.$(p) = \sqrt{\dfrac{PQ}{n}}$
where $P$: population proportion and $Q = 1 - P$.
(iv) Let $p_1$ and $p_2$ be the proportions obtained using two samples, one from the first population with population proportion $P_1$ and the other from the second population with population proportion $P_2$. Let $T = p_1 - p_2$, then
S.E.$(T) = \sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}$
(v) If $T = S^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
then S.E.$(T) = \sigma^2\sqrt{\dfrac{2}{n}}$ (approximately, for a large sample from a normal population).
(vi) If $T$ is the sample median of a sample from $N(\mu, \sigma^2)$, then
S.E.$(T) = 1.2533\,\dfrac{\sigma}{\sqrt{n}}$
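The standard errors of the mean, the difference of means, the proportion, the difference of proportions and the median can be evaluated with a few small helper functions. The sketch below assumes the usual formulas $\sigma/\sqrt{n}$, $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, $\sqrt{PQ/n}$, $\sqrt{P_1Q_1/n_1 + P_2Q_2/n_2}$ and $1.2533\,\sigma/\sqrt{n}$; the function names and the numbers in the demonstration are hypothetical.

```python
import math


def se_mean(sigma, n):
    """S.E. of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_diff_means(sigma1, n1, sigma2, n2):
    """S.E. of the difference between two independent sample means."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def se_proportion(P, n):
    """S.E. of a sample proportion: sqrt(P * Q / n) with Q = 1 - P."""
    return math.sqrt(P * (1.0 - P) / n)


def se_diff_proportions(P1, n1, P2, n2):
    """S.E. of the difference between two independent sample proportions."""
    return math.sqrt(P1 * (1.0 - P1) / n1 + P2 * (1.0 - P2) / n2)


def se_median_normal(sigma, n):
    """S.E. of the sample median for a normal population (large-sample value)."""
    return 1.2533 * sigma / math.sqrt(n)


if __name__ == "__main__":
    print(se_mean(10, 25))
    print(se_diff_means(3, 9, 4, 16))
    print(se_proportion(0.5, 100))
    print(se_diff_proportions(0.5, 100, 0.4, 150))
    print(se_median_normal(10, 25))
```

Comparing `se_mean` with `se_median_normal` for the same $\sigma$ and $n$ shows that the median has the larger standard error when the population is normal.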

15.10 SAMPLING DISTRIBUTION OF MEAN AND VARIANCE FROM NORMAL POPULATION

In the majority of cases a sample is assumed to be drawn from a normal population. In order to develop tests regarding the mean $\mu$ and the variance $\sigma^2$, we need to find the sampling distributions of the sample mean $\bar{X}$ and the sample variance $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$. The derivation of the distribution of $S^2$ needs an orthogonal transformation and some results. Those are introduced and included in brief in the following discussion.
Theorem 1: If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$ then,
(i) Sample mean $\bar{X}$ and sample variance $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are independently distributed.
(ii) $\bar{X}$ follows $N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
(iii) $\dfrac{nS^2}{\sigma^2}$ follows chi-square distribution with $(n - 1)$ degrees of freedom.
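The statements of Theorem 1 can be checked empirically by simulation. The following sketch is a rough numerical illustration, not a proof: it repeatedly draws samples from a normal population and compares the simulated mean and variance of $\bar{X}$ with $\mu$ and $\sigma^2/n$, and the simulated mean of $nS^2/\sigma^2$ with $n - 1$, the mean of a chi-square variable with $n - 1$ degrees of freedom. The function name, the population values and the number of replications are assumptions made for the example.

```python
import random


def simulate(mu, sigma, n, replications, seed):
    """Simulate the sampling behaviour of X-bar and n*S^2/sigma^2.

    S^2 uses the divisor n, as in Theorem 1.
    Returns (mean of X-bar, variance of X-bar, mean of n*S^2/sigma^2).
    """
    rng = random.Random(seed)
    means = []
    scaled_variances = []
    for _ in range(replications):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / n
        means.append(xbar)
        scaled_variances.append(n * s2 / sigma ** 2)
    mean_of_means = sum(means) / replications
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / replications
    mean_scaled = sum(scaled_variances) / replications
    return mean_of_means, var_of_means, mean_scaled


if __name__ == "__main__":
    # Population N(20, 25), i.e. sigma = 5, with samples of size 16.
    print(simulate(mu=20.0, sigma=5.0, n=16, replications=20000, seed=1))
    # Theory: mean of X-bar = 20, variance of X-bar = 25/16, mean of n*S^2/sigma^2 = 15.
```

The simulated values only approximate the theoretical ones; the agreement improves as the number of replications grows.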
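POINTS TO REMEMBER
If $X_1, X_2, \ldots, X_n$ is a random sample from a probability distribution then $T = T(X_1, X_2, \ldots, X_n)$, a function of sample values which does not involve the unknown parameter, is called a statistic. e.g.
(i) $T$ = sample mean $= \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$. (ii) $T = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. (iii) $T$ = sample proportion $= p$.
(iv) $T = \max\{X_1, X_2, \ldots, X_n\}$ or $T = \min\{X_1, X_2, \ldots, X_n\}$.
Statistic is a variable while parameter is a constant.
If $T$ is a statistic then the Standard Error (S.E.) of $T$ is S.E.$(T) = \sqrt{E[T - E(T)]^2}$.
Statistic and its Standard Error (S.E.):
$T = \bar{X}$ = sample mean: S.E.$(T) = \dfrac{\sigma}{\sqrt{n}}$ where $\sigma^2$ is the population variance and $n$ is the sample size.
$T = \bar{X}_1 - \bar{X}_2$ where $\bar{X}_1$ and $\bar{X}_2$ are means of samples of sizes $n_1$, $n_2$ drawn from the 1st and 2nd population respectively: S.E.$(T) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ where $\sigma_1^2$ and $\sigma_2^2$ are population variances and $n_1$, $n_2$ are sample sizes.
$T = p$ = sample proportion: S.E.$(p) = \sqrt{\dfrac{P(1 - P)}{n}}$ where $P$: population proportion.
$T = p_1 - p_2$ where $p_1$ and $p_2$ are proportions in the samples of sizes $n_1$ and $n_2$ drawn from the two populations: S.E.$(T) =$ S.E.$(p_1 - p_2) = \sqrt{\dfrac{P_1(1 - P_1)}{n_1} + \dfrac{P_2(1 - P_2)}{n_2}}$
If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$ then
(i) Sample mean $\bar{X}$ and sample variance $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are independently distributed.
(ii) $\bar{X}$ follows $N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
(iii) $\dfrac{nS^2}{\sigma^2} = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$ follows chi-square distribution with $(n - 1)$ degrees of freedom.
If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$ then $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, follows t-distribution with $(n - 1)$ d.f.
If $X_1, X_2, \ldots, X_{n_1}$ is a random sample from $N(\mu_1, \sigma^2)$ and $Y_1, Y_2, \ldots, Y_{n_2}$ is a random sample from $N(\mu_2, \sigma^2)$ then
$\dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ follows t-distribution with $n_1 + n_2 - 2$ d.f., where $s^2 = \dfrac{\sum (X_i - \bar{X})^2 + \sum (Y_i - \bar{Y})^2}{n_1 + n_2 - 2}$.
$\dfrac{s_1^2}{s_2^2}$, where $s_1^2 = \dfrac{\sum (X_i - \bar{X})^2}{n_1 - 1}$ and $s_2^2 = \dfrac{\sum (Y_i - \bar{Y})^2}{n_2 - 1}$, follows F-distribution with $(n_1 - 1)$ and $(n_2 - 1)$ d.f.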

EXERCISE 15 (B)
(A) Theory Questions
1. Explain the term random sample from a probability distribution.
2. Explain the following terms:
(i) Statistic
(ii) Sampling distribution of a statistic
(iii) Standard error of a statistic.
3. Distinguish between 'parameter' and 'statistic'.
4. Obtain the sampling distribution of the mean of a random sample drawn from
(i) Normal distribution.
(ii) Exponential distribution.
(iii) Gamma distribution.
5. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Show that $\bar{X}$ and $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are independently distributed random variables. Also find the probability distribution of ... Discuss the importance of the result.
6. Explain the term sampling distribution of a statistic. Also obtain the sampling distribution of the mean of a random sample of size $n$ drawn from exponential distribution with parameter $\alpha$.
(B) Numerical Problems
1. If $X_1, X_2, \ldots, X_{20}$ is a random sample from $N(\mu, \sigma^2)$, find the probability distribution of
$\dfrac{\sum_{i=1}^{10}(X_i - a)^2}{\sum_{i=11}^{20}(X_i - b)^2}$ where $a = \dfrac{1}{10}\sum_{i=1}^{10} X_i$ and $b = \dfrac{1}{10}\sum_{i=11}^{20} X_i$.
Ans.: $F_{9,9}$
2. (a) If $\bar{X}$ and $S^2$ are the mean and the variance of a random sample of size 16 from $N(3, 64)$, then find
$P(-1 < \bar{X} < 5,\ 34.188 < S^2 < 77.244)$.
(b) Let $X_1, X_2, \ldots, X_{16}$ be a random sample of size 16 from $N(20, 25)$ population. Find
$P(19 < \bar{X} < 21.5,\ 13.354 < S^2 < 34.854)$.
Ans.: (a) 0.573
3. If $\bar{X}$ and $S^2$ denote the mean and variance of a random sample of size 10 from $N(4, 160)$ then evaluate
$P[0 < \bar{X} < 4,\ 86.08 < S^2 < 170.496]$
Ans.: 0.1707
4. Let $X_1, X_2, \ldots, X_{15}$ be a random sample from $N(3, 10)$. Determine $P\left[\sum_{i=1}^{15}(X_i - \bar{X})^2 \leq 77.9\right]$.
5. Let $S^2$ be the sample variance of a random sample from $N(\mu, \sigma^2)$. Find $E(S^2)$ and $\mathrm{Var}(S^2)$.
6. If $X_1, X_2, \ldots, X_{10}$ is a r.s. from $N(8, 16)$, find $E(S^2)$, $V(S^2)$, $P(S^2 \geq 27.0704)$.
7. Let $X_1$ and $X_2$ be i.i.d. $N(0, \sigma^2)$. If $Y_1 = \dfrac{X_1 - X_2}{\sqrt{2}}$ and $Y_2 = \dfrac{X_1 + X_2}{\sqrt{2}}$, show that $Y_1$ and $Y_2$ are independent. Hence or otherwise show that $\bar{X}$ and $S^2$ are independently distributed. Also show that $\dfrac{2S^2}{\sigma^2}$ follows $\chi^2$ distribution with one d.f.
8. Let $X_1$, $X_2$ and $X_3$ be i.i.d. $N(0, 1)$.
Let $Y_1 = \dfrac{X_1 - X_2}{\sqrt{2}}$, $Y_2 = \dfrac{X_1 + X_2 - 2X_3}{\sqrt{6}}$ and $Y_3 = \dfrac{X_1 + X_2 + X_3}{\sqrt{3}}$. Show that $Y_1$, $Y_2$, $Y_3$ are independently distributed. Hence or otherwise show that $\bar{X}$ and $S^2$ are independently distributed. Also show that $\bar{X} = \dfrac{Y_3}{\sqrt{3}} \to N\left(0, \dfrac{1}{3}\right)$ and $3S^2 = Y_1^2 + Y_2^2$ follows chi-square distribution with 2 d.f.
9. Let $\bar{X}$ and $S^2$ be the mean and variance of a random sample of size 25 from $N(3, 100)$ distribution. Compute
$P(0 < \bar{X} \leq 6,\ 55.2 < S^2 \leq 145.6)$.
Ans.: 0.78
(C) State whether the following statements are true or false.
1. If $X_1, X_2, \ldots, X_{25}$ is a random sample from $N(35, 100)$ then $\mathrm{Var}\left(\dfrac{\sum_{i=1}^{25} X_i}{25}\right)$ is 2.
Ans.: False
2. Let $X_1, X_2, \ldots, X_{16}$ be a random sample from $N(27, 100)$ with sample variance $S^2$, then $0.16\,S^2$ follows chi-square distribution with 15 d.f.
Ans.: True
3. Statistic is a function of sample values involving unknown population parameter.
Ans.: False
4. Statistic is a random variable while parameter is a constant.
Ans.: True
5. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a population with unknown parameter $\theta$ and $T$ is a statistic based on the sample values. Then S.E.$(T) = E(T) = \theta$.
Ans.: False
6. If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$ then sample mean $\bar{X}$ and sample variance $S^2$ are not independent.
Ans.: False
7. If $X_1, X_2, \ldots, X_{25}$ is a random sample from $N(50, \sigma^2)$ then the probability distribution of $\dfrac{5(\bar{X} - 50)}{\sqrt{\dfrac{\sum (X_i - \bar{X})^2}{24}}}$ is t with 24 d.f.
Ans.: True
8. If $X_1, X_2, \ldots, X_{n_1}$ is a random sample from $N(\mu_1, \sigma^2)$ and $Y_1, Y_2, \ldots, Y_{n_2}$ is a random sample from $N(\mu_2, \sigma^2)$ then with usual notations $\dfrac{\bar{X} - \bar{Y}}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s^2$ = mean square of pooled sample, follows t-distribution with $n_1 + n_2$ d.f.
Ans.: False
MODEL QUESTION PAPER
Mid Sem. Examination

Time: 1 Hours
Max. Marks: 30
Instructions to the candidates:
(1) Attempt any Three questions.
(2) Figures to the right indicate full marks.
(3) Use of scientific (non-programmable) calculators is allowed.
(4) Use of statistical tables is allowed.
1. (a) Explain the scope of statistics in engineering, technology and management. [5]
(b) State the requisites of an ideal average. [5]
2. (a) Define mean, mode and median. Make a critical comparison between them. [5]
(b) Compute the arithmetic mean of the series $a,\ a + d,\ a + 2d,\ \ldots,\ a + (n - 1)d$. [5]
3. (a) Find the missing frequency in the following table if the arithmetic mean is 26.90. [5]
Class 10-15 15-20 20-25 25-30 30-35 35-40 40-45
Frequency 5 6 8 7 5 4
(b) Find the combined arithmetic mean from the following information. [5]
Group 1: $\bar{X}_1 = 2100$, $n_1 = 100$
Group 2: $\bar{X}_2 = 1500$, $n_2 = 200$
4. (a) Explain the terms with illustration: (i) Sample (ii) Population (iii) Frequency (iv) Less than cumulative frequency (v) Primary data. [5]
(b) Find the mode graphically for the following frequency distribution. [5]
Income (in Thousand) 20-30 30-40 40-50 50-60 60-70
Frequency 20 58 95 62 35
5. (a) Find the geometric mean of the following series $a,\ ar,\ ar^2,\ \ldots,\ ar^{n-1}$. [5]
(b) State advantages of sampling. [5]
MODEL QUESTION PAPER
End Sem. Examination
Time: 2 Hours
Max. Marks: 70
Instructions to the candidates:
(1) Attempt any Five questions.
(2) Figures to the right indicate full marks.
(3) Use of scientific (non-programmable) calculators is allowed.
(4) Use of statistical tables is allowed.
1. (a) Explain the term dispersion and define standard deviation. State any two properties of standard deviation. [5]
(b) Find the standard deviation of the following frequency distribution. [5]
Marks 0-10 10-20 20-30 30-40 40-50
Frequency 10 16 30 32 12
(c) The following data pertain to two workers doing the same job in a factory.
Worker A Worker B
Mean job time ($\bar{X}$) 40 minutes 42 minutes
Standard deviation ($\sigma$) 8 minutes 6 minutes
Who is the more consistent worker? Why? [4]
2. (a) Define Karl Pearson's coefficient of correlation and state its properties. Discuss its applications. [4]
(b) Suppose X is rainfall in suitable units and Y is the level of rusting of iron material.
X 43 45 59 21 80
Y 6 7 8 1 10
(i) Find corr (X, Y) and interpret.
(ii) Find the regression line of Y on X. Also estimate the amount of rusting given that the rainfall is 30 units. [10]
3. (a) If X is a random variable with probability mass function as follows
x 1 2 3 4
P(x) 0.15 0.40 0.30 0.15
Find E(X), Var (X), median of X. [7]
(b) Define normal probability distribution and describe its important properties. [4]
(c) Suppose X (in cm) is the length of a screw manufactured. If $X \to N(4, 0.25)$, find $P(|X| > 4.50)$. [3]
4. (a) Define exponential probability distribution. Explain its lack of memory property and interpret. [5]
(b) Define lognormal probability distribution. If $X \to LN(0, 0, 1)$, find P(X* > e). [5]
(c) If $X \to B(\ldots)$, find $P(X + Y = 3)$, $P(X + Y > 3)$, $E(X + Y)$. [4]
5. (a) Explain the terms: Statistical hypothesis, Critical region, Type I error, Power of test. [4]
(b) Develop a test for testing $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$ when two independent samples from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ are available.
(c) A random sample of 10 boys had the following Intelligence Quotients (I.Q.):
72, 120, 110, 101, 88, 83, 95, 88, 107, 100
Do these data support the assumption that the population mean I.Q. is 100? Use an appropriate statistical test. [5]
6. (a) Define geometric probability distribution. State its applications and mention important properties. [5]
(b) Develop a likelihood ratio test to test equality of variances of two groups from different normal populations with unknown means. [6]
(c) Define chi-square distribution. State the relationship with normal distribution. [4]
7. (a) Define uniform distribution on (a, b) and find its mean and variance. [5]
(b) A continuous random variable has probability density function
$f(x) = 5x^4$, $0 \leq x < 1$
$\quad\;\; = 0$, otherwise
Find E(X) and Var (X). [5]
(c) Explain the terms most powerful test, uniformly most powerful test. [4]
8. (a) Develop the most powerful test based on a random sample of size 10 from Poisson ($\lambda$) to test $H_0: \lambda = 0.1$ against $H_1: \lambda = 0.2$. [5]
(b) In order to start a new S.T. bus on a certain route, it is required to get an average fare of 5000 daily. A report on 21 days revealed an average fare of 4900 with a standard deviation of 200. Can we recommend starting the S.T. bus? Justify. [5]
(c) If X and Y are independent $N(5, 1)$ and $N(4, 1)$ respectively, find $P(X - Y < 0)$, $P(2X + Y > 14)$, $E(X + Y)$, $E(2X + 3Y)$. [4]
APPENDIX
TABLE 1: THE NORMAL PROBABILITY INTEGRAL
5 6
1 2 3
47608 47210 46812
0.0 S0000 49601 49292 48803 48405 48006 46414
43644 43251 42858 42465
45224 44828 44433 44038
0.1 46017 45620 39358 38974
40905 40517 40129 39743 38591
0.2 42074 41683 41294 35942 35569 35197 34827
37448 37070 36693 36317
0.3 38209 37828 31918 31561 31207
32997 32636 32276
0.4 34458 34090 33724 33360 28096 27760
29116 28774 28434
0.5 30854 30503 30153 29806 29460
25785 25463 25143 24825 24510
0.6 27425 27093 26763 26435 26109
22363 22065 21770 21476
0.7 24196 23885 23576 23270 22965 22663
20327 20045 19766 19489 19215 18943 18673
0.8 21186 20897 206; 1 16354 16109
17361 17106 16853 16602
0.9 18406 18141 17879 17619
14686 14457 14231 14007 13786
1.0 15866 15625 15386 15151 14917
12302 12100 11900 11702
1.1 13567 13350 13136 12924 12714 12507
10565 10383 10204 10027 98525
1.2 11507 11314 11123 10935 10749
88508 86915 85343 83793 82264
1.3 0.0 96800 95098 93418 91759 90123
74934 73529 72145 70781 69437 68112
14 80757 79270 77804 76359
59380 54208 57053 55917
1.5 66807 65522 64255 63008 61780 60571
50503 49471 48457 47460 46479 45514
1.6 54799 53699 52616 51551
1.7 44565 43633 42716 41815 40930 40059 39204 38364 37538
30054
36727
29379
1.8 35930 35148 34380 33625 32884 32157 31443 30742
28717 28067 27429 26803 26190 25588 24998 24419 23852 23295
1.9
2.0 22750 22216 21692 21178 20675 20182 19699 19226 18763 18309
17003 16586 16177 15778 15386 15Q03 14629 14262
2.1 17864 17429
12545 12224 11911 11604 11304 11011
2.2 13903 13553 13209 12874
2.3 10724 10444 10170 99031 96419 93867 91375 88940 86563 84242
77603 75494 73436 71428 69469 67557 65691 63872
2.4 0.02 81975 79763
62097 60366 58677 57031 55426 53861 52336 S0849 49400 47388
2.5
2.6 46612 45271 43965 42692 41453 40246 39070 37926 36811 35726
2.7 34670 33642 32641 31667 30720 29798 28901 28028 27179 26354
25551 24771 24012 23274 22557 21860 21182 20524 19884 19262
2.8
2.9 18658 18071 17502 16948 16411 15889 15382 14890 14412 13949
13062 12639 12228 11829 11442 11067 10703 10350 10008
3.0 13499
3.1 0.0 96760 93544 90426 87403 84474 81635 78885 76219 73638 71136
68714 66367 64095 61895 59765 57703 55706 53774 S1904 S0094
3.2
3.3 48342 46648 45009 43423 41889 40406 38971 37584 36243 34946
33693 32481 31311 30179 29086 28029 27009 26023 25071 24151
3.4
23263 22405 21577 20778 20006 19262 18543 17849 17180 16534
3.5
15310 14730 14171 13632 13112 12611 12128 11662 11213
3.6 15911
10780 10363 99611 95740 92010 88417 84957 &1624 78414 75324
3.7
0.04 72348 69483 66726 64072 61517 59059 56694 54418 52228 50122
3.8
48096 46148 44274 42473 40741 39076 37475 35936 34458 33037
3.9
4.0 31671 30359 29099 27888 26726 25609 24536 23507 22518 21569
19783 18944 18138 17365 16624 15912 15230 14575 13948
4.1 20658
13346 12769 12215 11685 11176 10689 10221 97736 93447 89337
4.2
0.0s 85369 81627 78015 74555, 71241 68069 65031 62123 59340 56675
4.3
42935 40980 39110 37322 35612
4.4 54125 51685 49350 47117 44979
26823 25577 24386 23249 22162
4.5 33977 32414 30920 29492 28127
19187 18283 17420 16597 15810 15060 14344 13660
4.6 21125 20133
13008 12386 11792 11226 10686 10171 96796 92113 87648 83391
4.7 50418
71779 68267 64920 61731 58693 55799 53043
4.8 0.06 79333 75465 30190
47918 45538 43272 41115 39061 37107 35247 33476 31792
4.9
TABLE 2: DISTRIBUTION OF t
Probability
n .9 .8 .7 .6 .5 .4 .3 .2 .1 .05 .02 .01 .001
1 .158 .325 .510 .727 1.000 1.376 1.963 3.078 6.314 12.706 31.821 63.657 636.619
2 .142 .289 .445 .617 .816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.598
3 .137 .277 .424 .584 .765 .978 1.250 1.638 2.353 3.182 4.541 5.841 12.924
4 .134 .271 .414 .569 .741 .941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 .132 .267 .408 .559 .727 .920 1.156 1.476 2.015 2.571 3.365 4.032 6.869
6 .131 .265 .404 .553 .718 .906 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 .130 .263 .402 .549 .711 .896 1.119 1.415 1.895 2.365 2.998 3.499 5.408
8 .130 .262 .399 .546 .706 .889 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 .129 .261 .398 .543 .703 .883 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 .129 .260 .397 .542 .700 .879 1.093 1.372 1.812 2.228 2.764 3.169 4.587
11 .129 .260 .396 .540 .697 .876 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 .128 .259 .395 .539 .695 .873 1.083 1.356 1.782 2.179 2.681 3.055 4.318
13 .128 .259 .394 .538 .694 .870 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 .128 .258 .393 .537 .692 .868 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 .128 .258 .393 .536 .691 .866 1.074 1.341 1.753 2.131 2.602 2.947 4.073
16 .128 .258 .392 .535 .690 .865 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 .128 .257 .392 .534 .689 .863 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 .127 .257 .392 .534 .688 .862 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 .127 .257 .391 .533 .688 .861 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 .127 .257 .391 .533 .687 .860 1.064 1.325 1.725 2.086 2.528 2.845 3.850
21 .127 .257 .391 .532 .686 .859 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 .127 .256 .390 .532 .686 .858 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 .127 .256 .390 .532 .685 .858 1.060 1.319 1.714 2.069 2.500 2.807 3.767
24 .127 .256 .390 .531 .685 .857 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 .127 .256 .390 .531 .684 .856 1.058 1.316 1.708 2.060 2.485 2.787 3.725
26 .127 .256 .390 .531 .684 .856 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 .127 .256 .389 .531 .684 .855 1.057 1.314 1.703 2.052 2.473 2.771 3.690
28 .127 .256 .389 .530 .683 .855 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 .127 .256 .389 .530 .683 .854 1.055 1.311 1.699 2.045 2.462 2.756 3.659
30 .127 .256 .389 .530 .683 .854 1.055 1.310 1.697 2.042 2.457 2.750 3.646
40 .126 .255 .388 .529 .681 .851 1.050 1.303 1.684 2.021 2.423 2.704 3.551
60 .126 .254 .387 .527 .679 .848 1.046 1.296 1.671 2.000 2.390 2.660 3.460
120 .126 .254 .386 .526 .677 .845 1.041 1.289 1.658 1.980 2.358 2.617 3.373
∞ .126 .253 .385 .524 .674 .842 1.036 1.282 1.645 1.960 2.326 2.576 3.291
TABLE 3: DISTRIBUTION OF $\chi^2$
Probability
n .99 .98 .95 .90 .80 .70 .50 .30 .20 .10 .05 .02 .01 .001
0.0315 0.03628 0.00393 0,0158 0.0642 0.148 0.455 1.074 1.642 2.706 3.841 5.412 6.635 10.827
0.0201 0.0404 0.103 0.211 0,446 0.713 1.386 2.408 3.219 4.605 5991 7.824 9.210 13.815
3 0.115 0.185 0.352 0.584 1.005 1.424 2.366 3.665 4,642 6.251 7,115. 9.837 11.345 16.266
4 0.297 0.429 0.711 L.064 1.649 2.195 3.357 4.878 5.989 7.779 9.488 11.668 13.277 18.467
5 0.554 0.752 1.145 1.610 2.343 3.000 4.351 6.064 7.289 9.236 11.070 13.388 15.086 20.515

6 0.872 1.134 1.635 2.204 3.070 3.828 5.348 7.231 8.558 10.645 12.592 15.033 16.812 22.457
1.239 1.564 2.167 2.833 3.822 4.671 6.346 8.383 9.803 12.017 14.057 16.622 18.475 24.322
8 1.646 2.032 2.733 3.490 4.594 5.527 7.344 9.524 11.003 13.362 15.507 18.168 20.090 26.125
2.088 2.532 3.325 4.168 5.380 6.393 8.343 10.656 12.242 14.684 16919 19.679 21.666 27.877
10 2.558 3.059 3.940 4.865 6.179 7.267 9.342 11.781 13.442 15.987 18.307 21.161 23.209 29.588
3.053 3.609 4.575 S.578 6.989 8.148 10.341 12.899 14.631 17.275 19.675 22.618 24.725 31.264
12 3.571 4.178 5.226 6.304 7.807 9.034 11.340 14.011 15.812 18.549 21.026 24.034 26.217 32.909
13 4.107 4.765 5.892 7.042 8.634 9.926 12.340 15.119 16.985 19.812 22.362 25.472 27.688 34.528
14 4.660 5.368 6.571 7.790 9.467 10.821 13.339 16.222 18.151 21.064 23.685 26.873 29.141 36.123
15 5.229 S.985 7.261 8.547 10.307 11.721 14.339 17.322 19.311 22.307 24.996 28.259 30.578 37.697
16 5.812 6.614 7.962 9.312 11.152 12.624 15.338 18.418 20.465 23.542 26.296 29.633 32.000 39.252
17 6.408 7.255 8.672 10.085 12.002 13.531 16.338 19.511 21.615 24.769 27.587 30.995 33.409 40.790
18 7.015 7.906 9.390 12.857 14,44 17.338 20.601 22.760 25.989 28.859 22.346 34.805 42.312
10.865
19 7.633 8.567 10.117 11.651 13.716 15.352 18.338 21.689 23.900 27.204 30.144 33.637 36.191 43.820
20 8.260 9.237 10.851 12.443 14.578 16.266 19.337 22.775 25.038 28.412 31.410 35.020 37.566 45.315
21 8.897 9.915 11.591 13.240 15.445 17.182 20.337 23.858 26.171 29.615 32.671 36.343 38.932 46.797
22 9.542 10.600 12.338 14.041 16.314 18.101 21.337 24.939 27.301 30.813 33.924 37.659 40.289 48.268
23 10.196 11.293 13.091 14.848 17.187 19.021 22.337 26.018 28,429 32.007 35.172 38.968 41.638 49.728
24 10.856 I1.992 13.848 15.659 18.062 19.943 23.337 27.096 29.553 33.196 36.415 40.270 42.980 51.179
25 11.524 12.697 14.611 16.473 18.940 20.867 24.337 28.J72 30.675 34.382 37.652 41.566 44.314 52.620

26 12.198 13.409 15.379 17.292 19.820 21.792 25.336 29.246 31.795 35.563 38.885 42.856 45.642 54.052
27 12.879 14.125 16.151 18.114 20.703 22.719 26.336 30.319 32.912 36.741 40.113 44.140 46.963 55.476
28 13.565 14.847 16.928 18.939 21.588 23.647 27.336 31.391 34.027 37.916 41.337 45,419 48.278 56.893
29 14.256 15.574 17.708 19.786 22.475 24.577 28.336 32,461 35.139 39.087 42.557 46.693 49.588 58.302
30 14.953 16.306 18.493 20.599 23.364 25.508 29.336 33.530 36.250 40.256 43.773 47.962 50.892 59.703
32 16.362 17.783 20.072 22.271 25.148 27.373 31.336 35.665 38.466 42.585 46.194 50.487 53.486 62.487
34 17.789 19.275 21.664 23.952 26.938 29.242 33.336 37.795 40.676 44.903 48.602 52.995 56.061 65.247
36 19.233 20.783 23.269 25.643 28.735 31.115 35.336 39.922 42.879 47.212 50.999 55.489 58.619 67.985
38 20.691 22.304 24.884 27.343 30.537 32.992 37.335 42.045 45.076 49.513 53.384 57.969 61.162 70.703
40 22.164 23.838 26.509 29.0S! 32.34S 34.872 39.335 44.165 47.269 51.805 55.759 60.435 63.691 73.402

42 23.650 25.383 28.144 30.765 34.I57 36.755 41.335 46.282 49.456 S4.090 S8.124 62.892 66.206 76.084
44 25.148 26.939 29.787 32.487 35.974 38.641 43.33S 48.396 51.639 56.369 60.481 65.337 68.710 78.750
46 26.657 28.504 31.439 34.215 37.795 40.529 45.335 50.507 53.818 58.641 62.830 67.771 71.201 81.400
48 28.177 30.080 33.098 35.949 39.621 42.420 47.335 S2.616 5S.993 60.907 65.171 70.197 73.683 84.037
29.707 31.664 34.764 37.689 41.449 44.313 49.335 54.723 58.164 63.167 67.505 72.613 76.154 86.661

52 31.246 33.256 36.437 39.433 43.281 46.209 S1.335 56.827 60.332 65422 69.832 75.021 78.616 89.272
54 32.793 34.856 38,116 41.183 45.117 48.106 $3.335 58.930 62.496 67.673 72.153 77.422 81.069 91.872
56 34.350 36.464 39.801 42.937 46.955 50.005 55.335 61.031 64.658 69.919 74.463 79.81s 83.513 94.461
58 35.913 38.078 41.492 44.696 48.797 51.906 57.335 63.129 66.816 72.160 76.778 82.201 85.950 97.039
60 37.485 39.699 43.188 46.459 50.641 53.809 59.335 65.227 68.972 74.397 79.082 84,580 83.379 99.607
62 39.063 41.327 44.889 48.226 52.487 55.714 61.335 67.322 71.125 76.630 81.381 86.953 90.802 102,166
64 40.649 42.960 46.595 49.996 54.336 57.62063.335 69.416 73.276 78.860 83.675 89.320 93.217 104.716
66 42.240 44.599 48.305 51.770 56.188 59.527 65.335 71,508 75.424 81.085 85.965 91.681 95.626 107.258
68 43.838 46.244 50.020 53.548 58.042 61,436 67.335 73.600 77.571 83.308 88.250 94.037 98.028 109.791
70 45.442
47.893 51.739 55.329 59.898 63.346 69.334 75.689 79.715 85.527 90.531 96.388 100.425 112.31I7
For odd values of n between 30 and 70 the mean of the tabular values for n - 1 and n + 1 may be taken. For larger values of n, the expression $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be used as a normal deviate with unit variance, remembering that the probability for $\chi^2$ corresponds with that of a single tail of the normal curve. (For fuller formulae, see Introduction.)
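The large-n rule in the footnote can be sketched in a few lines: $\sqrt{2\chi^2} - \sqrt{2n - 1}$ is treated as a standard normal deviate and the single upper-tail area is taken as the probability. The function names are hypothetical, and the check against the tabulated 5 per cent point for n = 70 (90.531) is only a rough illustration of how close the approximation is.

```python
import math


def normal_upper_tail(z):
    """Return P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chi_square_tail_approx(chi2, n):
    """Approximate P(chi-square with n d.f. > chi2) for large n.

    Uses sqrt(2 * chi2) - sqrt(2 * n - 1) as a normal deviate with unit variance.
    """
    z = math.sqrt(2.0 * chi2) - math.sqrt(2.0 * n - 1.0)
    return normal_upper_tail(z)


if __name__ == "__main__":
    # The table gives 90.531 as the 5 per cent point for n = 70.
    print(chi_square_tail_approx(90.531, 70))
```

The approximate probability is close to, but not exactly, 0.05; the agreement improves as n increases.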
TABLE 4: F-DISTRIBUTION
5 per cent points of $e^{2z}$

$n_2 \backslash n_1$ 1 2 3 4 5 6 8 12 24 ∞
161.4 199.5 215.7 224.6 230.2 234.0 238.9 243.9
18.51 19.00 19.16 19.25 19.30 249.0 254.3
2 19,33 19.37 1941
10.13 9.55 9.28 9.12 9.01
1945 19.50
3 8.94 8.84 8.74
7.71 6.94 6.59 6.39 8.64 8.53
4 6.26 6.16 6.04
6.61 5.79 5.41 5.91 5.77 5.63
5 5.19 5.05 4.95 4.82 4.68 4.53 4.36
5.99 5.14 4.76 4.53 4.39 4.28 4.15
5.59 4.74 4.35 4.00 3.84 3.67
7 4.12 3.97 3.87
5.32 4.46 3.73 3.57 3.41
4.07 3.84 3.69 3.23
3.58 3.44 3.28
5.12 4.26 3.86 3.63 3.12 2.93
3,48 3.37 3.23
4.96 4.10 3.71 3.07 2.90 2.71
10 3.48 3.33 3.22 3.07 2.91 2.74 2.54
4.84 3.98 3.59 3.36 3.20 3.09 2.95
12 4.75 3.88 3.49 2.79 2.61 2.40
3.26 3.11 3.00
4.67 3.80 2.85 2.69 2.50
13 3.41 3.18 3.02 2.30
2.92 2.77 2.60
14 4.60 3.74 3.34 3.11 2.42 2.21
2.96 2.85 2.70
15 4.54 3.68 3.29 2.53 2.35 2.13
3.06 2.90 2.79 2.64 2.48 2.29 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.59 2,42
17 4.45 3.59 3.20 2.96 2.24 2.01
2.81 2.70 2.55 2.38
18 4.41 3.55 3.16 2.93 2.19 1.96
2.77 2.66 2.51
19 4.38 3.52 3.13 2.34 2,15 1.92
2.90 2.74 2.63
20 4.35 3.49 2.48 2.31 2.11 1.88
3.10 2.87 2.71 2.60 2.45 2.28 2.08 1.84
2 4.32 3.47 3.07 2.84 2.68 2.57
22 4.30 3.44 2.42 2.25 2.05
3.05 2.82 2.66 1.81
2.55 2.40 2.23
23 4.28 3.42 3.03 2.80 2.03 1.78
2.64 2.53 2.38
24 4.26 3.40 2.20 2.00 1.76
3.01 2.78 2.62
25 2.51 2.36 2.18 1.98
4.24 3.38 2.99 2.76 1.73
2.60 2.49 2.34 2.16 1.96 1.71
26 4.22 3.37 2.98 2.74 2.59 2.47
27 4.21 2.32 2.15 1.95 1.69
3.35 2.96 2.73 2.57
28 2.46 2.30 2.13 1.93
4.20 3.34 2.95 2.71 1.67
20 2.56 2.44 2.29 2.12
4.18 3.33 2.93 1.91 1.65
2.70 2.54 2.43 2.28
30 4.17 2.10 1.90 1.64
3.32 2.92 2.69 2.53 2.42 2.27 2.09 1.89 1.62
40
4.08 3.23 2.84
60 2,61 2.45 2.34 2.18
4.00 2.00 1,79 1.51
88120 3.15 2.76 2.52 2.37 2.25 2.10 1.92 1.70
3.92 3.07 2.68 2.45 2.29 2.17 2.02 1.83 1.61
1.39
3.84 1.25
2.99 2.60 2.37 : 2.21 2.10 1.94 1.75 1.52 1.00

Lower 5 per cent points are found by interchange of $n_1$ and $n_2$, i.e. $n_1$ must always correspond with the greater mean square.

TABLE 5: F-DISTRIBUTION - contd.
1 per cent points of $e^{2z}$

$n_2 \backslash n_1$ 1 2 3 4 5 6 8 12 24 ∞

4052 4999 S403 5625 S764 5859 5982 6106 6234 6366
2 98.50 99.00 99.17 99.25 99.30 99.33 99.37 99.42 99.46 99.50
3 34.12 30.82 29.46 28.71 28.24 27.91 27.49 27.05 26.6 26.12
21.20 18.00 16.69 15.98 15.52 15.21 14.80 14.37 13.93 13.46
16.26 13.27 12.06 11.39 10.97 10.67 10.29 9.89 9.47 9.02
6 13.74 10.,92 9.78 9.15 8.75 8.47 8.10 7.72 7.31 6.88
7 12.25 9.55 8.45 7.85 6.84 6.47 6.07 5.65
7.46 7.19
11.26 8.65 7.59 7.01 6.63 6.37 6.03 5.67 5.28 4.86
10.56 8.02 6.99 6.42 6.06 5.80 5.47 5.11 4.73 4.31
10 10.04 7.56 6.55 5.99 5.64 5.39 5.06 4.71 4.33 3.91
11 9.65 7.20 6.22 5.67 5.32 5.07 4.74 4.40 4.02 3.60
12 9.33 6.93 5.95 5.41 5.06 4.82 4.50 3.36
4.16 3.78
13 9.07 6.70 5.74 5.20 4.86 4.62 4.30 3.96 3.59 3.16
14 8.86 6.51 5.56 5.03 4.69 4.46 4.14 3.80 3.43 3.00
15 8.68 6.36 5.42 4.89 4.56 4.32 4.00 3.67 3.29 2.87
16 8.53 6.23 S.29 4.77 4.44 4.20 3.89 3.55 3.18 2.75
17 8.40 6.11 5.18 4.67 4.34 4.10 3.79 3.45 3.08 2.65
18 8.28 6.01 5.09 4.58 4.25 4.01 3.71 3.37 3.00 2.57
19 8.18 5.93 5.01 4.50 4.17 3.94 3.63 3.30 2.92 2.49
20 8.10 5.85 4.94 4.43 4.10 3.87 3.56 3.23 2.86 2.42
21 8.02 5.78 4.87 4.37 4.04 3.81 3.51 3.17 2.80 2.36
22 7.94 5.72 4.82 4.31 3.99 3.76 3.45 3.12 2.75 2.31
23 7.88 5.66 4.76 4.26 3.94 3.71 3.41 3.07 2.70 2.26
24 7.82 5.61 4.72 4.22 3.90 3.67 3.36 3.03 2.66 2.21
25 7.77 5.57 4.68 4.18 3.86 3.63 3.32 2.99 2,62 2.17
26 7.72 5.53 4.64 4.14 3.82 3.59 3.29 2.96 2.58 2.13
27 7.68 5.49 4.60 4.11 3.78 3.56 3.26 2.93 2.55 2.10
28 7.64 5.45 4.57 4.07 3.75 3.53 3.23 2.90 2.52 2.06
29 7.60 5.42 4.54 4.04 3.73 3.50 3.20 2.87 2.49 2,03
30 7.56 5.39 4.51 4.02 3.70 3.47 3.17 2.84 2.47 2.01
40 7.31 S.18 4.31 3.83 3.51 3.29 2.99 2.66 2.29 1.80
60 7.08 4.08 4.13 3.65 3.34 3.12 2.82 2.50 2,.12 L.60
120 6.85 4.70 3.95 3.48 3.17 2.96 2.66 2.34 1.95 138
6.64 4.60 3.78 3.32 3.02 2.80 2.51 2.18 1.79 1.00

Lower 1 per cent points are found by interchange of $n_1$ and $n_2$, i.e. $n_1$ must always correspond with the greater mean square.
