Iii - 4 - Statistical Methods and Its Applications - II
PERIYAR UNIVERSITY
CENTRE FOR DISTANCE AND ONLINE EDUCATION
(CDOE)
Prepared by:
Centre for Distance and Online Education (CDOE)
Periyar University, Salem – 11.
Unit-V: Test of Significance (Small Samples Tests): Small sample tests with
regard to Mean, Difference between Means and Paired t- test, F-test -
Definition of Chi-square test – Assumptions – Characteristics – Chi-square
tests for Goodness of fit and Independence of attributes – Simple
Problems.
Unit Objectives
variable.
infinite values of X ; that the range space R contains a finite or countably infinite points.
A random variable is said to be continuous if it can assume all points on the real
line.
variable then, $F(x) = \int_{-\infty}^{x} p(t)\,dt$.
1. $F(-\infty) = 0$ and $F(\infty) = 1$.
2. $F(x)$ is a non-decreasing function.
Let Us Sum Up
Learners, in this section we have seen the definition of a random variable, discrete
and continuous random variables, the cumulative distribution function, and the properties
of the distribution function.
consider a random experiment with more than one characteristic. For example, we study
the heights and weights of some persons, resulting in the outcomes (H, W). Here the
outcome has two characteristics, height and weight. This can be regarded as a two-
Let S be the sample space associated with a random experiment E. Let X = X(s),
Y = Y(s) be two functions, each assigning a real number to every outcome s ∈ S. Then
1.2.2 EXAMPLE
obtained in the first toss and Y as the number of heads observed in the second toss.
Then,
The sample space of the two-dimensional random variable (X, Y) is {(1,1), (1,0), (0,1),
(0,0)}. If the values of (X, Y) are finite or countably infinite, then (X, Y) is called a
two-dimensional discrete random variable. If (X, Y) assumes all values in some region in
$s_1 = (H, H),\ s_2 = (H, T),\ s_3 = (T, H),\ s_4 = (T, T)$
$p(s_1) = p(s_2) = p(s_3) = p(s_4) = \frac{1}{4}$
Define X as the number of heads in two throws and Y as the number of tails in two
associated with each $(x_i, y_j)$, $i, j = 1, 2, 3, \ldots$. Then p is called the joint probability function
(ii) $\sum_i \sum_j p(x_i, y_j) = 1$.
ii. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
Let Us Sum Up
Learners, in this section we have seen the definition of two-dimensional random
variable and its examples.
A. pq+(1-p)(1-q)
B. pq
C. p(1-q)
D.1-pq
1.3.2 DEFINITION
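If (X, Y) is a two-dimensional continuous random variable with pdf f, then the marginal
distribution of X is defined by
$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
and the marginal distribution of Y is defined by
$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$
Note: $P(a \le X \le b) = P(a \le X \le b,\ -\infty < Y < \infty) = \int_a^b \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = \int_a^b g(x)\,dx$
Similarly, $P(c \le Y \le d) = \int_c^d h(y)\,dy$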
Let Us Sum Up
Learners, in this section we have seen the definition of the marginal probability
distribution.
A. Bravais distribution
C. Gaussian distribution
$p(x_i / y_j) = P(X = x_i / Y = y_j) = \frac{p(x_i, y_j)}{q(y_j)}$ if $q(y_j) > 0$.
1.4.2 DEFINITION
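Let (X, Y) be a two-dimensional continuous random variable with joint probability density
function f(x, y). Then the conditional probability density function of X given that Y = y is
defined by $g(x/y) = \frac{f(x, y)}{h(y)}$ if $h(y) > 0$, and the conditional probability density function of Y
given that X = x is defined by $h(y/x) = \frac{f(x, y)}{g(x)}$ if $g(x) > 0$.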
Let Us Sum Up
Learners, in this section we have seen the definition of the conditional probability distribution.
A. E(XY)=E(X) E(Y/x)
B. E(XY)=E(Y) E(Y/x)
C. E(XY)=X E(Y/x)
D. E(XY)=E[X E(Y/x)]
p( xi ) . q( y j ).
1.5.2 DEFINITION
Let (X, Y) be a two-dimensional continuous random variable with joint probability density
function f(x, y). Let g(x) and h(y) be the marginal probability density functions of the
random variables X and Y respectively. Then the random variables X and Y are said to
1.5.3 EXAMPLE
y \ x     1       2       3       4
1         4/36    2/36    5/36    1/36
2         1/36    3/36    1/36    2/36
3         3/36    3/36    1/36    1/36
4         2/36    1/36    1/36    5/36
Find (a) The marginal probability functions of X and Y
$P(x_i) = \sum_j p(x_i, y_j)$,   $q(y_j) = \sum_i p(x_i, y_j)$
P(1) = p(1,1) + p(1,2) + p(1,3) + p(1,4),   q(1) = p(1,1) + p(2,1) + p(3,1) + p(4,1)
P(1) = (4 + 1 + 3 + 2)/36 = 10/36,   q(1) = (4 + 2 + 5 + 1)/36 = 12/36
P(2) = (2 + 3 + 3 + 1)/36 = 9/36,    q(2) = (1 + 3 + 1 + 2)/36 = 7/36
P(3) = (5 + 1 + 1 + 1)/36 = 8/36,    q(3) = (3 + 3 + 1 + 1)/36 = 8/36
P(4) = (1 + 2 + 1 + 5)/36 = 9/36,    q(4) = (2 + 1 + 1 + 5)/36 = 9/36
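The conditional distribution of Y given X = 2 is q(y / x = 2) = p(2, y)/P(2):
$q(y / x = 2) = \begin{cases} \frac{2}{9} & \text{for } y = 1 \\ \frac{1}{3} & \text{for } y = 2 \\ \frac{1}{3} & \text{for } y = 3 \\ \frac{1}{9} & \text{for } y = 4 \\ 0 & \text{otherwise} \end{cases}$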
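$p(x / y) = \frac{p(x, y)}{q(y)}$ if $q(y) > 0$
$p(1 / y = 1) = \frac{p(1, 1)}{q(1)} = \frac{4/36}{12/36} = \frac{1}{3}$
$p(2 / y = 1) = \frac{p(2, 1)}{q(1)} = \frac{2/36}{12/36} = \frac{1}{6}$
$p(3 / y = 1) = \frac{p(3, 1)}{q(1)} = \frac{5/36}{12/36} = \frac{5}{12}$
$p(4 / y = 1) = \frac{p(4, 1)}{q(1)} = \frac{1/36}{12/36} = \frac{1}{12}$
$p(x / y = 1) = \begin{cases} \frac{1}{3} & \text{for } x = 1 \\ \frac{1}{6} & \text{for } x = 2 \\ \frac{5}{12} & \text{for } x = 3 \\ \frac{1}{12} & \text{for } x = 4 \\ 0 & \text{otherwise} \end{cases}$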
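The marginal and conditional distributions of Example 1.5.3 can be checked mechanically. The following is a minimal sketch, assuming the joint table is read with rows as y = 1..4 and columns as x = 1..4; the function names are illustrative and not part of the original text.

```python
from fractions import Fraction

# Joint probabilities p(x, y) from Example 1.5.3, as numerators over 36.
# Assumption: rows are y = 1..4 and columns are x = 1..4.
NUMERATORS = [
    [4, 2, 5, 1],
    [1, 3, 1, 2],
    [3, 3, 1, 1],
    [2, 1, 1, 5],
]
joint = {(x, y): Fraction(NUMERATORS[y - 1][x - 1], 36)
         for x in range(1, 5) for y in range(1, 5)}


def marginal_x(joint):
    """P(x) = sum over y of p(x, y)."""
    out = {}
    for (x, _y), p in joint.items():
        out[x] = out.get(x, 0) + p
    return out


def marginal_y(joint):
    """q(y) = sum over x of p(x, y)."""
    out = {}
    for (_x, y), p in joint.items():
        out[y] = out.get(y, 0) + p
    return out


def conditional_x_given_y(joint, y):
    """p(x / y) = p(x, y) / q(y), defined when q(y) > 0."""
    q = marginal_y(joint)[y]
    return {x: p / q for (x, yy), p in joint.items() if yy == y}


def conditional_y_given_x(joint, x):
    """q(y / x) = p(x, y) / P(x), defined when P(x) > 0."""
    big_p = marginal_x(joint)[x]
    return {y: p / big_p for (xx, y), p in joint.items() if xx == x}


print(marginal_x(joint))
print(conditional_x_given_y(joint, 1))
print(conditional_y_given_x(joint, 2))
```

Exact fractions are used so that the results can be compared directly with the hand computation, without rounding.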
Let Us Sum Up
Learners, in this section we have seen definition of independent random variable
and examples.
A. E[X+Y]=E[X]+E[Y]
B. Var(X+Y)=Var(X)+Var(Y)
C. Cov(X,Y)=0
D. E[X.Y]=E[X].E[Y]
A. 1
B. -1
C. 0
expected value of the random variable. Before going to the definition, let us consider an
example. Suppose we get, from someone, one rupee per dot that appears when a
perfect die is tossed. If this experiment is repeated 6,000 times (a large number of
times), we would obtain approximately 1,000 1’s, 1,000 2’s, etc. Therefore, total amount
= Rs. $\left(1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6}\right)$
= Rs. 3.5
Which is the summation of the amounts associated with the outcomes multiplied by the
say the average or mean value of x is 3.5. Here we have to note that the expected value
1.6.2 DEFINITION
Let X be a discrete random variable with possible values $x_1, x_2, \ldots, x_n, \ldots$. Let,
continuous random variable with pdf f(x), then $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, if the integral is
absolutely convergent.
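Note 1: Let us note the similarity between the expected value (when X assumes a finite
number of values) and the notion of the average of a set of numbers $x_1, x_2, \ldots, x_n$. We usually
denote the average of the above set of numbers by $\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}$. Suppose that in a sample
of n observations the value $x_1$ occurs $n_1$ times, $x_2$ occurs $n_2$ times, and so on.
Then $\bar{x} = \frac{n_1 x_1 + n_2 x_2 + \cdots + n_k x_k}{n_1 + n_2 + \cdots + n_k} = \sum_{i=1}^{k} \frac{x_i n_i}{n} = \sum_{i=1}^{k} x_i f_i$,
letting $f_i = \frac{n_i}{n}$.
If n is sufficiently large, $f_i \approx p(x_i)$, so that
$\bar{x} \approx \sum_{i=1}^{k} x_i\, p(x_i)$
= E(X)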
concept ‘centre of mass’ in mechanics. If a unit mass is distributed along a line at the
discrete points $x_1, x_2, \ldots, x_n, \ldots$ and $p(x_i)$ is the mass at the point $x_i$, then $\sum_i x_i\, p(x_i)$
represents the centre of mass about the origin. Similarly, if a unit mass is distributed
continuously over a line and f(x) represents the mass density at x, then $\int_{-\infty}^{\infty} x f(x)\,dx$ may
again be interpreted as the centre of mass. In this sense E(X) represents a centre of the
1.6.3 EXAMPLE
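Suppose a random variable X takes the values 0!, 1!, 2!, … with probabilities
$P(X = i!) = \frac{e^{-1}}{i!}$, $i = 0, 1, 2, \ldots$ Then
$E(X) = \sum_{i=0}^{\infty} i! \cdot \frac{e^{-1}}{i!} = e^{-1}[1 + 1 + 1 + \cdots]$,
which diverges, so E(X) does not exist.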
1.6.4 EXAMPLE
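A Cauchy distribution is defined by the pdf $f(x) = \frac{1}{\pi(1 + x^2)}$ for $-\infty < x < \infty$.
Then $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \frac{1}{2\pi}\left[\log(1 + x^2)\right]_{-\infty}^{\infty}$, which does not exist.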
Let Us Sum Up
Learners, in this section we have seen definition of mathematical
expectations and examples.
Check Your Progress
1. If X and Y are independent random variables, what is the expected
A. E[X] E[Y]
B. E[X+Y]
C. E[X]+E[Y]
D. Cov(X,Y)
A. The median
B. The mode
C. The mean
D. The variance
knowing the probability distribution of Y. Let us now see these two approaches.
1.7.1 DEFINITION
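possible values $y_1, y_2, \ldots$, and $q(y_1), q(y_2), \ldots$ are their probabilities, the expectation of Y
is defined as $E(Y) = \sum_{i=1}^{\infty} y_i\, q(y_i)$, provided the series on the R.H.S. is absolutely convergent. If
Y is a continuous random variable with pdf q(y), then
$E(Y) = \int_{-\infty}^{\infty} y\, q(y)\,dy$
if it exists.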
The following theorem gives the method on finding E(y) without the knowledge of the
probability distribution of Y.
1.7.2 THEOREM
$E(Y) = \sum_{i=1}^{\infty} H(x_i)\, p(x_i)$
(i) $E(3X + 1) = \sum (3x + 1)\, p(x)$
= 1.9
(ii) $E(X^2) = \sum x^2\, p(x)$
= 1(0.2) + 0(0.3) + 1(0.5)
= 0.7
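The theorem above says that E[H(X)] can be computed directly from the distribution of X. A minimal sketch follows, assuming X takes the values -1, 0, 1 with probabilities 0.2, 0.3, 0.5; this table is inferred from the terms 1(0.2) + 0(0.3) + 1(0.5) in part (ii), since the original table does not survive in this section. The helper name `expectation` is illustrative.

```python
def expectation(pmf, h=lambda x: x):
    """E[H(X)] = sum of H(x) * p(x) over the support of X."""
    return sum(h(x) * p for x, p in pmf.items())


# Assumed distribution of X, inferred from the worked terms in part (ii).
pmf = {-1: 0.2, 0: 0.3, 1: 0.5}

e_linear = expectation(pmf, lambda x: 3 * x + 1)   # E(3X + 1)
e_square = expectation(pmf, lambda x: x * x)       # E(X^2)
print(round(e_linear, 4), round(e_square, 4))
```

Passing H as a function keeps the same routine usable for any H(X), without first deriving the distribution of Y = H(X).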
So far, we have seen the expectation of a random variable and the expectation of a
function of a random variable of one dimension. These concepts can be extended for
higher dimensional random variables. Let us now give the definition in the case of a
Then (X, Y) is a two-dimensional random variable. Let us now give the define Z= sum of
the numbers in two throws of the die. Clearly Z is a function of (X, Y) (i.e,) Z=H(X,Y). Let
variable.
Let (X, Y) be a two dimensional random variable and Z=H(X, Y) be a real valued
function of (X, Y). Then Z is a one-dimensional random variable and E(Z) is defined as
follows.
$p(z_i) = P(Z = z_i)$, then $E(Z) = \sum_{i=1}^{\infty} z_i\, p(z_i)$ if the series on the R.H.S. is absolutely
convergent.
(ii) If Z is a continuous random variable with pdf f(z), then $E(Z) = \int_{-\infty}^{\infty} z f(z)\,dz$ if the
We shall now state the theorem (without proof), with the help of which we can
1.7.5 THEOREM
$p(x_i, y_j)$, then $E(Z) = \sum_j \sum_i H(x_i, y_j)\, p(x_i, y_j)$ if it exists.
1.7.6 EXAMPLE
Three urns contain respectively 3 green and 2 white balls, 5 green and 6 white
balls and 2 green and 4 white balls. One ball is drawn from each urn. Find the
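Solution: Let X be a random variable denoting the number of white balls drawn. Then
        G    W
U1      3    2
U2      5    6
U3      2    4
$P(X = 0) = \frac{3}{5} \cdot \frac{5}{11} \cdot \frac{2}{6} = \frac{30}{330}$
$P(X = 1) = \frac{2}{5} \cdot \frac{5}{11} \cdot \frac{2}{6} + \frac{3}{5} \cdot \frac{6}{11} \cdot \frac{2}{6} + \frac{3}{5} \cdot \frac{5}{11} \cdot \frac{4}{6} = \frac{20 + 36 + 60}{330} = \frac{116}{330}$
$P(X = 2) = \frac{24 + 40 + 72}{330} = \frac{136}{330}$
$P(X = 3) = \frac{48}{330}$
$E(X) = 0 \cdot \frac{30}{330} + 1 \cdot \frac{116}{330} + 2 \cdot \frac{136}{330} + 3 \cdot \frac{48}{330} = \frac{532}{330} \approx 1.61$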
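The distribution of the number of white balls can also be obtained by enumerating the eight possible colour patterns. The sketch below assumes the urn contents stated in the problem (3 green and 2 white, 5 green and 6 white, 2 green and 4 white); the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

# (green, white) counts for each urn, as stated in the problem.
URNS = [(3, 2), (5, 6), (2, 4)]


def white_count_distribution(urns):
    """Return {k: P(exactly k white balls)} when one ball is drawn from each urn."""
    dist = {}
    # Each outcome is a tuple of 0/1 flags; 1 means a white ball from that urn.
    for outcome in product((0, 1), repeat=len(urns)):
        prob = Fraction(1)
        for (green, white), is_white in zip(urns, outcome):
            prob *= Fraction(white if is_white else green, green + white)
        k = sum(outcome)
        dist[k] = dist.get(k, 0) + prob
    return dist


dist = white_count_distribution(URNS)
expected_white = sum(k * p for k, p in dist.items())
print(dist, expected_white)
```

The same expectation also follows from adding the three individual white-ball probabilities, 2/5 + 6/11 + 4/6, because expectation is additive.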
Show that the expected number of failures preceding the first success in a series of
Bernoulli trials with a constant probability of success p is $\frac{1 - p}{p}$.
Let Us Sum Up
Learners, this section discusses the definition of function of a random variable and
given example.
A. E[g(X)]
B. G(E[x])
C. E[x]
D. G(E[g(X)])
A. a2 Var(X)
B. Var(X)
C. aVar(X)+b
D. Var(x)+b2
E(X) Expectation of X
Var(X) Variance of X
Cov(X,Y) Covariance of X and Y
$E(e^{tX})$ Moment generating function of X
y \ x     1       2       3       4
1         4/36    2/36    5/36    1/36
2         1/36    3/36    1/36    2/36
3         3/36    3/36    1/36    1/36
4         2/36    1/36    1/36    5/36
(b) The conditional probability density functions of X when Y = 1 and that Y when X =2.
Values of X -1 0 1
4. Three urns contain respectively 3 green and 2 white balls, 5 green and 6 white balls
and 2 green and 4 white balls. One ball is drawn from each urn. Find the expected
1.11 EXERCISES
1. If a bag containing 40 blue marbles and 60 red marbles. Choose 10 marbles (without
replacement) at random. Let X be the number of blue marbles and Y be the number
oranges were bad. Is it reasonable to assume that 20% of the oranges were bad?
3. A die is thrown 9000 times and throw of 3 or 4 observed 3240 times. Show that the die
4. The following table gives the number of aircraft accidents that occurred during the various
days of the week. Test whether the accidents are uniformly distributed over the week.
No. of Accidents: 14 18 12 11 15 14
N follows Poisson distribution. Assume that each customer purchases a drink with
probability p, independently from other customers, and independently from the values
of N. Let X be the number of customers that do not purchase drinks: so, X+Y=N.
6. I toss a coin twice. Let X be the number of observed heads. Find the CDF of X.
7. Let X be a discrete random variable with range R2 = {1,2,3,…} .Suppose the PMF of
X is given by
0 otherwise
1. Gupta S. P. (2001), Statistical Methods, Sultan Chand and Sons, New Delhi.
2. Gupta. S. C. and Kapoor. V. K. Fundamentals of Applied Statistics, Sultan Chand and Sons,
New Delhi
3. Pillai R. S. N. And Bagavathi. V. (2005), Statistics, S. Chand and Company Ltd., New Delhi.
4. Sancheti D. C. And Kapoor. V. K (2005), Statistics (7th Edition), Sultan Chand & Sons, New
Delhi.
5. Arora P. N, Comprehensive Statistical Methods, Sultan Chand & Sons, New Delhi.
6. Murthy M. N (1978), Sampling Theory and Methods, Statistical Publishing Society, Kolkata.
7. Pillai R. S. N. And Bagavathi. V. (1987), Practical Statistics, S. Chand & Company Ltd., New
Delhi.
8. Agarwal B. L, Basic Statistics, Wiley Eastern Ltd., Publishers, New Delhi.
9. Gupta C. B (1978), An Introduction to Statistical Methods, Vikas Publishing House, New
Delhi.
10. Snedecor G.W and Cochran W.G., Statistical Methods, Oxford Press and IBH.
Unit objectives
2. To introduce the definition of Binomial distribution and its mean and variance.
with its own Boolean - valued outcome: success (with probability p) or failure (with
single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial
size n drawn with replacement from a population of size N. If the sampling is carried out
without replacement, the draws are not independent and so the resulting distribution is
a hypergeometric distribution, not a binomial one. However, for N much larger than n,
the binomial distribution remains a good approximation, and is widely used. We are
1. There are only two possible outcomes for each trial (arbitrarily called “success” and
2. The probability of success is the same for each trial. The outcomes from different
Let X be the random variable that equals the number of successes in n trials. To obtain
of success and failure on any one trial, then the probability of getting x successes and
(n−x) failures, in some specific order, is $p^x (1 - p)^{n-x}$. Clearly, in this product of p’s and
(1−p)’s there is one factor p for each success, one factor 1−p for each failure. The x
factors p and n−x factors 1−p are all multiplied together by virtue of the generalized
multiplication rule for more than two independent events. Since this probability applies
to any point of the sample space that represents x successes and (n−x) failures (in any
specific order), we have only to count how many points of this kind there are, and then
This probability distribution is called the binomial distribution because for x = 0, 1, 2, . . . ,
n, the values of the probabilities are the successive terms of the binomial expansion
of $[p + (1 - p)]^n$. For the same reason, the combinatorial quantities C(n, x) are referred to
as binomial coefficients.
2.1.3 EXAMPLE
It has been claimed that in 60% of all solar-heat installations the utility bill is
reduced by at least one-third. Accordingly, what are the probabilities that the utility
Solution: (a) Substituting x = 4, n = 5, and p = 0.60 into the formula for the binomial
distribution, we get
$b(4; 5, 0.60) = \binom{5}{4} (0.60)^4 (1 - 0.60)^{5-4}$
= 0.259;
(b) Substituting x = 5, n = 5, and p = 0.60 into the formula for the binomial distribution,
we get
$b(5; 5, 0.60) = \binom{5}{5} (0.60)^5 (1 - 0.60)^{5-5}$
= 0.078
and the answer is b(4; 5, 0.60) + b(5; 5, 0.60) = 0.259 + 0.078 = 0.337.
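The two binomial probabilities in this example can be reproduced with a few lines of code. This is a minimal sketch; `binomial_pmf` is an illustrative name for the formula b(x; n, p) = C(n, x) p^x (1 − p)^(n − x).

```python
from math import comb


def binomial_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


p4 = binomial_pmf(4, 5, 0.60)   # exactly four successes in five trials
p5 = binomial_pmf(5, 5, 0.60)   # exactly five successes in five trials
at_least_4 = p4 + p5
print(round(p4, 3), round(p5, 3), round(at_least_4, 3))
```

Summing the pmf over x = 0, 1, ..., n should give 1, which is a quick sanity check on any implementation of the formula.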
2.1.4 EXAMPLE
Let X have the binomial distribution with probability distribution
$b(x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n-x}$ for $x = 0, 1, \ldots, n$.
Show that
$M'(t) = n p e^t (p e^t + 1 - p)^{n-1}$
$E(X) = np$
$E(X^2) = n(n - 1)p^2 + np$
If X is B($n_1$, p) and Y is B($n_2$, p) and they are independent, then the mgf of their sum X + Y is
$M_{X+Y}(t) = M_X(t)\, M_Y(t)$
$= (q + p e^t)^{n_1} \times (q + p e^t)^{n_2}$
$= (q + p e^t)^{n_1 + n_2}$
= mgf of B($n_1 + n_2$, p)
Therefore,
$X + Y \sim B(n_1 + n_2, p)$
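$\mu_{r+1} = pq\left[nr\,\mu_{r-1} + \frac{d\mu_r}{dp}\right]$
Proof : We have
$\mu_r = E[X - E(X)]^r$
$= E(X - np)^r$
$= \sum_{x=0}^{n} (x - np)^r\; {}^nC_x\, p^x q^{n-x}$
Therefore, differentiating with respect to p (with q = 1 − p),
$\frac{d\mu_r}{dp} = \sum_{x=0}^{n} \left[ {}^nC_x\, p^x q^{n-x}\, r (x - np)^{r-1} (-n) + (x - np)^r\; {}^nC_x \left( x p^{x-1} q^{n-x} - (n - x) p^x q^{n-x-1} \right) \right]$
$= -nr\,\mu_{r-1} + \sum_{x=0}^{n} (x - np)^r\; {}^nC_x\, p^x q^{n-x} \left[ \frac{x}{p} - \frac{n - x}{q} \right]$
$= -nr\,\mu_{r-1} + \frac{1}{pq} \sum_{x=0}^{n} (x - np)^{r+1}\; {}^nC_x\, p^x q^{n-x}$, since $\frac{x}{p} - \frac{n - x}{q} = \frac{x - np}{pq}$
$= -nr\,\mu_{r-1} + \frac{\mu_{r+1}}{pq}$
Therefore,
$\mu_{r+1} = pq\left[nr\,\mu_{r-1} + \frac{d\mu_r}{dp}\right]$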
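Proof:
We have,
$B(x; n, p) = {}^nC_x\, p^x q^{n-x}$
$B(x + 1; n, p) = {}^nC_{x+1}\, p^{x+1} q^{n-(x+1)}$
$\frac{B(x + 1; n, p)}{B(x; n, p)} = \frac{{}^nC_{x+1}\, p^{x+1} q^{n-x-1}}{{}^nC_x\, p^x q^{n-x}}$
$= \frac{n!}{(x + 1)!\,(n - x - 1)!} \times \frac{x!\,(n - x)!}{n!} \times \frac{p}{q}$
$= \frac{n - x}{x + 1} \cdot \frac{p}{q}$
Therefore,
$B(x + 1; n, p) = \frac{n - x}{x + 1} \cdot \frac{p}{q}\, B(x; n, p)$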
= 𝑛(𝑛 − 1)p2 + 𝑛𝑝
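$E(X^3) = \sum_{x=0}^{n} \left[ x(x - 1)(x - 2) + 3x^2 - 2x \right] \frac{n!}{x!\,(n - x)!}\, p^x q^{n-x}$
$= \sum_{x=0}^{n} x(x - 1)(x - 2) \frac{n!}{x!\,(n - x)!}\, p^x q^{n-x} + \sum_{x=0}^{n} 3x^2 \frac{n!}{x!\,(n - x)!}\, p^x q^{n-x} - \sum_{x=0}^{n} 2x \frac{n!}{x!\,(n - x)!}\, p^x q^{n-x}$
$= n(n - 1)(n - 2)p^3 \sum_{x=3}^{n} \frac{(n - 3)!}{(x - 3)!\,(n - x)!}\, p^{x-3} q^{n-x} + 3E(X^2) - 2E(X)$
Similarly,
$E(X^4) = n(n - 1)(n - 2)(n - 3)p^4 + 6n(n - 1)(n - 2)p^3 + 7n(n - 1)p^2 + np$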
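$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(q - p)^2}{npq}$
$\gamma_1 = \sqrt{\beta_1} = \frac{q - p}{\sqrt{npq}}$
$\beta_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{1 - 6pq}{npq}$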
The mgf,
$M_X(t) = E(e^{tX})$
$= \sum_x e^{tx} P(X = x)$
$= (q + p e^t)^n$
A set of three similar coins is tossed 100 times with the following results
Number of heads    0     1     2     3
Frequency          36    40    22    2
Solution:
(i) Mean $\bar{x} = \frac{\sum fx}{\sum f} = \frac{90}{100} = 0.9$
(ii) $p = \frac{\bar{x}}{n} = \frac{0.9}{3} = 0.3$
(vii) $F(x + 1) = \frac{n - x}{x + 1} \cdot \frac{p}{q} \cdot F(x)$
$F(1) = F(0 + 1) = \frac{3 - 0}{0 + 1} \cdot \frac{0.3}{0.7} \times 34.3 = 44.1$
$F(2) = F(1 + 1) = \frac{3 - 1}{1 + 1} \cdot \frac{0.3}{0.7} \times 44.1 = 18.9$
x                          0     1     2     3     Total
Observed frequency (Oi)    36    40    22    2     100
Expected frequency (Ei)    34    44    19    3     100
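The fitting steps above (estimate p from the sample mean, start from F(0) = N q^n, then apply the recurrence) can be sketched in code. The function name `fit_binomial` and the list layout of the observed frequencies are assumptions made for this illustration.

```python
def fit_binomial(observed):
    """Fit a binomial distribution to observed[x] = frequency of x successes, x = 0..n.

    Returns the estimate of p and the list of expected frequencies.
    """
    n = len(observed) - 1
    total = sum(observed)
    mean = sum(x * f for x, f in enumerate(observed)) / total
    p = mean / n                 # since the binomial mean is n * p
    q = 1 - p
    expected = [total * q ** n]  # F(0) = N * q^n
    for x in range(n):
        # Recurrence: F(x + 1) = (n - x) / (x + 1) * (p / q) * F(x)
        expected.append(expected[-1] * (n - x) / (x + 1) * p / q)
    return p, expected


p_hat, expected = fit_binomial([36, 40, 22, 2])
print(p_hat, [round(e, 1) for e in expected])
```

For the coin data the unrounded expected frequencies are approximately 34.3, 44.1, 18.9 and 2.7, which round to the 34, 44, 19 and 3 shown in the table.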
Let Us Sum Up
Learners, in this section we have seen that definition of Binomial distribution,
additive property, recurrence relation of Binomial distribution, mean and variance and
2. If for a binomial distribution, b(n,p) , n=4 and also P(X=2)=3P(X=3), the value of
p is:
A. 9/11
B. 1
C. 1/3
D. None of the above.
The Poisson distribution often serves as a model for counts which do not have a
natural upper bound. The Poisson distribution, with mean λ (lambda), has probabilities
given by;
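$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, 2, \ldots$
Show that
(a) $M(t) = e^{\lambda(e^t - 1)}$ for all t
Solution:
$M(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$
where we have used the series $e^y = \sum_{k=0}^{\infty} \frac{y^k}{k!}$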
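Additive property of Poisson distribution: Let $X_1$ and $X_2$ be two independent Poisson random
variables with parameters $\lambda_1$ and $\lambda_2$. Then $X = X_1 + X_2$ is a Poisson random variable
with parameter $\lambda_1 + \lambda_2$.
Proof:
$X_1 \sim P(\lambda_1) \Rightarrow M_{X_1}(t) = e^{\lambda_1(e^t - 1)}$
$X_2 \sim P(\lambda_2) \Rightarrow M_{X_2}(t) = e^{\lambda_2(e^t - 1)}$
Since $X_1$ and $X_2$ are independent, $M_X(t) = M_{X_1}(t)\, M_{X_2}(t) = e^{(\lambda_1 + \lambda_2)(e^t - 1)}$, which is the mgf of $P(\lambda_1 + \lambda_2)$.
Thus,
$X = X_1 + X_2 \sim P(\lambda_1 + \lambda_2)$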
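x        f      fx
0        211    0
1        90     90
2        19     38
3        5      15
4        0      0
Total    325    143
(i) Mean $\bar{x} = \frac{\sum fx}{\sum f} = \frac{143}{325} = 0.44$
(ii) $\lambda = \bar{x} = 0.44$
(iii) $P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-0.44} (0.44)^x}{x!}$
(iv) $P(0) = \frac{e^{-0.44} (0.44)^0}{0!} = e^{-0.44} = 0.6440$ (from the Poisson table)
(vi) $F(x + 1) = \frac{\lambda}{x + 1} F(x)$
$F(1) = F(0 + 1) = \frac{0.44}{0 + 1} \times 209.43 = 92.15$
$F(2) = F(1 + 1) = \frac{0.44}{1 + 1} \times 92.15 = 20.27$
$F(3) = F(2 + 1) = \frac{0.44}{2 + 1} \times 20.27 = 2.97$
$F(4) = F(3 + 1) = \frac{0.44}{3 + 1} \times 2.97 = 0.33$
(1) The fitted Poisson distribution is $P(X = x) = \frac{e^{-0.44} (0.44)^x}{x!}$, $x = 0, 1, 2, \ldots$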
x 0 1 2 3 4 Total
Observed Frequencies (Oi) 211 90 19 5 0 325
Expected Frequencies (Ei) 210 92 20 3 0 325
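The Poisson fit follows the same pattern as the binomial one: estimate λ by the sample mean, start from F(0) = N e^(−λ), and apply F(x + 1) = λ/(x + 1) · F(x). A minimal sketch is given below; `fit_poisson` is an illustrative name, and the observed frequencies are those of the table above.

```python
from math import exp


def fit_poisson(observed):
    """Fit a Poisson distribution to observed[x] = frequency of the count x.

    Returns the estimate of lambda and the expected frequencies for the same x values.
    """
    total = sum(observed)
    lam = sum(x * f for x, f in enumerate(observed)) / total
    expected = [total * exp(-lam)]  # F(0) = N * e^(-lambda)
    for x in range(len(observed) - 1):
        # Recurrence: F(x + 1) = lambda / (x + 1) * F(x)
        expected.append(expected[-1] * lam / (x + 1))
    return lam, expected


lam_hat, expected = fit_poisson([211, 90, 19, 5, 0])
print(lam_hat, [round(e, 2) for e in expected])
```

Because the code keeps full precision at every step, its values can differ slightly in the first or second decimal place from a hand computation that rounds intermediate figures.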
Let Us Sum Up
Learners, in this section we have seen the definition of Poisson distribution, mean and
A. Binomial distribution
B. Gamma distribution
C. Normal distribution
D. Poisson distribution
2. What If X and Y are two Poisson variates such X follows P(1) and Y follows P(2)
A. $e^{-3}$
B. $3e^{-3}$
C. $4e^{-3}$
D. $8.5e^{-3}$
Binomial distribution, Poisson distribution, Mean and variance of Binomial and Poisson
2.4 Glossary
KEYWORDS MEANING
2. A machine produces 10 percent defective item. Ten items are selected at random. Find the
$b(x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n-x}$ for $x = 0, 1, \ldots, n$.
Show that
4. A set of three similar coins are tossed 100 times with the following results
Number of heads 0 1 2 3
Frequency 36 40 22 2
Fit a binomial distribution and estimate the expected frequencies.
2.6 EXERCISES
1. A coin that is fair in nature is tossed n number of times. The probability of the occurrence of
a head six times is the same as the probability that a head comes 8 times, the find the
value of n.
2. The probability that a person can achieve a target is 3/4. The count of tries is 5. What is the
3. There are four fused bulbs in a lot of 10 good bulbs. If three bulbs are drawn at random with
replacement, find the probability of distribution of the number of fused bulbs drawn.
answer so that the probability of guessing the correct answer at least once is greater than
2/3?
5. A set of three similar coins are tossed 100 times with the following results.
Number of heads: 0 1 2 3
Frequency : 36 40 22 2
7. As only 3 students came to attend the class today, find the probability for exactly 4
2/min. Calculate the probability that exactly two calls will be received during each of
Unit objectives
4. To introduce the concepts of fitting of straight line and second-degree parabola and
simple problems.
The normal distribution is the most widely known and used of all distributions.
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction from the
mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation
described by two parameters: mean and standard deviation. See the following
figure.
5. Total area under the curve sums to 1, i.e., the area of the distribution on each side
7. The probability that a random variable will have a value between any two points is
3.1.2 - EXAMPLE
To solve: for positive values of a, look up and report the value for
variable and Y the dependent variable. The general problem in curve fitting is to find, if
possible, an analytic expression of the form y = f(x), for the functional relationship
3.1.6 EXAMPLE
Fitting a straight-line using Least square method for the following data.
x 5 4 3 2 1
y 1 2 3 4 5
Solution:
Method-1 of solution:
The normal equations for the line y = a + bx are
∑y = an + b∑x
∑xy = a∑x + b∑x²
x        y        x²        xy
5        1        25        5
4        2        16        8
3        3        9         9
2        4        4         8
1        5        1         5
---      ---      ---       ---
∑x=15    ∑y=15    ∑x²=55    ∑xy=35
Substituting these sums into the normal equations,
5a + 15b = 15
15a + 55b = 35
Dividing each equation by 5,
a + 3b = 3 → (1)
3a + 11b = 7 → (2)
Equation (1) × 3 ⇒ 3a + 9b = 9
Equation (2) × 1 ⇒ 3a + 11b = 7
Subtracting ⇒ −2b = 2
⇒ b = −1
Putting b = −1 in equation (1),
a + 3(−1) = 3
⇒ a = 3 + 3
⇒ a = 6
y = 6 − x
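The two normal equations for a straight line have a closed-form solution, which the sketch below applies to the data of this example. The function name `fit_line` is illustrative; the formulas are the standard solution of ∑y = an + b∑x and ∑xy = a∑x + b∑x².

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x obtained by solving the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a from the normal equations gives b; a then follows from the first one.
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


a, b = fit_line([5, 4, 3, 2, 1], [1, 2, 3, 4, 5])
print(a, b)
```

The denominator n∑x² − (∑x)² is zero only when all x values are equal, in which case no unique line exists.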
Let Us Sum Up
characteristics of normal distribution, fitting of normal distribution, curve fitting and its
types.
A. 5 Q.D=4 S.D
B. 4 Q.D=5 S.D
C. 2 Q.D=3 S.D
2. For a normal curve the Q.D, M.D, and S.D are in the ratio
A. 5:6:7
B. 10:12:15
C. 2:3:4
3.2.1 DEFINITION
∑y = an + b∑x + c∑x²
∑xy = a∑x + b∑x² + c∑x³
∑x²y = a∑x² + b∑x³ + c∑x⁴
3.2.2 EXAMPLE
Fit a second-degree parabola y = a + bx + cx² to the following data using the least squares
method.
X    1     2     3     4     5     6     7
Y    -5    -2    5     16    31    50    73
x        y         x²         x³         x⁴          xy          x²y
1        -5        1          1          1           -5          -5
2        -2        4          8          16          -4          -8
3        5         9          27         81          15          45
4        16        16         64         256         64          256
5        31        25         125        625         155         775
6        50        36         216        1296        300         1800
7        73        49         343        2401        511         3577
---      ---       ---        ---        ---         ---         ---
∑x=28    ∑y=168    ∑x²=140    ∑x³=784    ∑x⁴=4676    ∑xy=1036    ∑x²y=6440
Substituting these sums into the normal equations,
7a + 28b + 140c = 168 → (1)
28a + 140b + 784c = 1036 → (2)
140a + 784b + 4676c = 6440 → (3)
Select the equations (1) and (2), and eliminate the variable a.
(1) × 4 − (2) ⇒ −28b − 224c = −364 → (4)
Select the equations (1) and (3), and eliminate the variable a.
(1) × 20 − (3) ⇒ −224b − 1876c = −3080 → (5)
Select the equations (4) and (5), and eliminate the variable b.
(4) × 8 − (5) ⇒ 84c = 168 → (6)
From (6)
84c = 168
⇒ c = 168/84 = 2
From (4)
−28b − 224c = −364
⇒ −28b − 224(2) = −364
⇒ −28b − 448 = −364
⇒ −28b = −364 + 448 = 84
⇒ b = 84/(−28) = −3
From (1)
7a + 28b + 140c = 168
⇒ 7a + 28(−3) + 140(2) = 168
⇒ 7a + 196 = 168
⇒ 7a = 168 − 196 = −28
a = −4, b = −3, c = 2
y = −4 − 3x + 2x²
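The three normal equations of the parabola can be built from the data and solved by elimination, mirroring the hand computation. The sketch below is one possible implementation using exact fractions and Gauss-Jordan elimination; the helper names are illustrative, and a singular system (for example, fewer than three distinct x values) is not handled.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x^2 via the three normal equations."""
    n = len(xs)
    sx = lambda k: sum(x ** k for x in xs)                       # sum of x^k
    sxy = lambda k: sum((x ** k) * y for x, y in zip(xs, ys))    # sum of x^k * y
    matrix = [[n, sx(1), sx(2)],
              [sx(1), sx(2), sx(3)],
              [sx(2), sx(3), sx(4)]]
    rhs = [sxy(0), sxy(1), sxy(2)]
    return solve_linear_system(matrix, rhs)


a, b, c = fit_parabola([1, 2, 3, 4, 5, 6, 7], [-5, -2, 5, 16, 31, 50, 73])
print(a, b, c)
```

For this data the fitted parabola passes through every point, so the residuals are all zero.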
Let Us Sum Up
Learners, in this section we have seen the definition of second degree parabola
1. What is the equation of the parabola with vertex (0,0) and focus (0,2)?
A. Y2=8x
B. X2=8y
C. Y2=4x
D. X2=4y
2. What is the vertex of the parabola given by the equation y=2(x-3)2 +4?
A. (3,4)
B. (-3,-4)
C. (3,-4)
D. (-3,4)
3.2.1 DEFINITION
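$\log_{10} y = \log_{10}(a e^{bx})$
$\log_{10} y = \log_{10} a + \log_{10} e^{bx}$
$\log_{10} y = \log_{10} a + bx \log_{10} e$
Writing $Y = \log_{10} y$, $A = \log_{10} a$ and $B = b \log_{10} e$, this becomes Y = A + Bx,
which is linear in Y and x. The normal equations are
∑Y = nA + B∑x
∑xY = A∑x + B∑x²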
3.2.2 EXAMPLE
Fitting an exponential equation ($y = a e^{bx}$) - Curve fitting using the least squares method.
Solution:
$\log_{10} y = \log_{10} a + bx \log_{10} e$
∑Y = nA + B∑x
∑xY = A∑x + B∑x²
x         y            Y = log10(y)    x²          xY
0         0.1          -1              0           0
0.5       0.45         -0.3468         0.25        -0.1734
1         2.15         0.3324          1           0.3324
1.5       9.15         0.9614          2.25        1.4421
2         40.35        1.6058          4           3.2117
2.5       180.75       2.2571          6.25        5.6427
---       ---          ---             ---         ---
∑x=7.5    ∑y=232.95    ∑Y=3.81         ∑x²=13.75   ∑xY=10.4556
6A + 7.5B = 3.81
7.5A + 13.75B = 10.4556
Solving, we obtain A = −0.9916, B = 1.3013
∴ a = antilog10(A) = antilog10(−0.9916) = 0.102
and b = B / log10(e) = 1.3013 / 0.4343 = 2.9963
$y = 0.102\, e^{2.9963x}$
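The exponential fit reduces to a straight-line fit of Y = log10(y) on x, followed by a back-transformation. A minimal sketch under that approach is shown below; `fit_exponential` is an illustrative name, and all y values are assumed to be positive so that the logarithm exists.

```python
from math import e, log10


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on Y = log10(y) = A + B * x."""
    big_y = [log10(y) for y in ys]          # requires every y > 0
    n = len(xs)
    sx, s_y = sum(xs), sum(big_y)
    sxx = sum(x * x for x in xs)
    sx_y = sum(x * v for x, v in zip(xs, big_y))
    # Solve the normal equations for the line Y = A + B * x.
    big_b = (n * sx_y - sx * s_y) / (n * sxx - sx ** 2)
    big_a = (s_y - big_b * sx) / n
    # Back-transform: a = 10^A and b = B / log10(e).
    return 10 ** big_a, big_b / log10(e)


a, b = fit_exponential([0, 0.5, 1, 1.5, 2, 2.5], [0.1, 0.45, 2.15, 9.15, 40.35, 180.75])
print(round(a, 3), round(b, 3))
```

Note that this minimises squared errors in log10(y), not in y itself, so it is not identical to a direct non-linear least-squares fit of the exponential curve.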
normal distribution, curve fitting, fitting of straight line and second degree parabola with
simple problems.
KEYWORDS MEANING
N             Signifies that the distribution is normal.
μ             Mean
σ²            Variance (σ is the standard deviation)
Y = a + bx    Straight line
Exp(m)        Exponential distribution
3. It has been claimed that in 60% of all solar-heat installations the utility bill is
reduced by at least one-third. Accordingly, what are the probabilities that the
2. Fit a straight line using Least square method for the following data.
X 5 4 3 2 1
Y 1 2 3 4 5
X 1 2 3 4 5 6 7
Y -5 -2 5 16 31 50 73
3.7 EXERCISES
1. For some computers, the time period between charges of the battery is normally
has one of these computers and needs to know the probability that the time period
2. The speeds of cars are measured using a radar unit, on a motor way. The speeds
km/hr . what is the probability that a car selected at chance is moving at more than
100 km/hr?
distribution with a mean of 266 days and a standard deviation of 16 days. What
proportion of all pregnancies will last between 240 and 270 days (roughly between 8
and 9 months)?
X 1 2 3 4 5 6 7
Y -5 -2 5 16 31 50 73
7. The mileage which car owners get with a certain kind of radial tire is a random
variable having an exponential distribution with mean 40,000 km. Find the
probabilities that one of these tires will last (i) at least 20,000 km and (ii) at most
30,000 km.
8. The length of time a person speaks over phone follows exponential distribution with
mean 6. What is the probability that the person will talk for (i) more than 8 minutes
Module 1
1. D. 3 Q.D = 2 S.D
2. B. 10:12:15
Module 2
1. B. x² = 8y
2. A. (3, 4)
Unit objectives
4. To introduce the concepts of large sample tests for proportion and difference of
proportions.
4.1.1 POPULATION
4.1.3 - PARAMETER
The statistical constants about population are called parameters. e.g. Population
4.1.4 - STATISTIC
The statistical constants about sample are called statistic. e.g. Sample mean, sample
variance etc.
Let Us Sum Up
1. A population consists of all the items which are physically present is called:
A. Hypothetical population
B. Real population
C. Infinite population
4.2.1 TYPES
Null Hypothesis: A hypothesis which is tested for possible rejection under the
the null hypothesis H0 is called as critical region. The region complement to critical
4.2.3 ERRORS
Type I error: Rejecting the hypothesis H0 when it is true is called Type I error.
Type II error: Accepting the Hypothesis H0 when it is wrong is called type II error.
denoted by α.
When the sample size is less than 30, then the sample is said to be small sample.
Step 3: Choose the level of significance and state the degrees of freedom.
Let Us Sum Up
Learners, in this section we have seen the definition of hypothesis, types, critical
region, errors, level of significance, degrees of freedom, small sample test and its test
procedure.
A. Formula errors
B. Planning error
A. Simple hypothesis
C. Null hypothesis
Let x1, x2, …, xn be a random sample from a population with mean µ. We want
to test whether there is any significant difference between the population mean and the
sample mean.
The test statistic is given by $t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$
4.3.2 PROBLEM
through a large no. of retail shops. Before a heavy advertising campaign the mean sales per
week per shop was 140 dozen. After the campaign, a sample of 26 shops was taken
and the mean sales was found to be 147 dozen with a standard deviation of 16 dozen,
n = 26, $\bar{x}$ = 147, s = 16
Reject H0 if t > 1.708
$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} = \frac{147 - 140}{16 / \sqrt{25}} = 2.19$
Conclusion : Since the calculated value is greater than tabulated value, so we may
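The test statistic for this problem can be evaluated directly from the summary figures. The sketch below uses the document's formula t = (x̄ − μ)/(s/√(n − 1)), in which s is the sample standard deviation computed with divisor n; the function name is illustrative, and the critical value 1.708 is the one quoted in the solution.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n - 1)), with s computed using divisor n."""
    return (sample_mean - mu0) / (s / sqrt(n - 1))


t = one_sample_t(sample_mean=147, mu0=140, s=16, n=26)
reject_h0 = t > 1.708  # one-tailed critical value quoted in the solution (25 df)
print(round(t, 2), reject_h0)
```

If s were instead computed with divisor n − 1, the denominator would be s/√n; the two conventions give the same value of t for the same data.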
Suppose we want to test if two independent samples x1, x2, …xn1 and y1, y2,… yn2 of
sizes n1 and n2 have been drawn from two normal populations with population mean
μ1and μ2 respectively. Under the Null hypothesis H0: μ1= μ2(There is no significant
4.3.4 - PROBLEM
Samples of two types of electric bulbs were tested for length of life and following
               Type - I      Type - II
Sample No.     8             7
Sample Mean    1234 hours    1036 hours
Sample S.D.    36 hours      40 hours
Is the difference in the means sufficient to warrant that type I is superior to
Sol:
We are given n1 = 8, $\bar{x}_1$ = 1234 hrs, s1 = 36 hrs
n2 = 7, $\bar{x}_2$ = 1036 hrs, s2 = 40 hrs
$t = \frac{\bar{x}_1 - \bar{x}_2}{S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
$S = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}$
= 40.73
since the calculated value of t is much greater than table value, hence H 0 rejected. So,
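The pooled standard deviation and the t statistic for two independent samples can be computed from the summary statistics alone. This is a minimal sketch, taking the second sample mean as 1036 hours, the value used in the worked solution; the function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic with pooled S, where s1 and s2 use divisor n."""
    pooled_sd = sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return pooled_sd, t


# Assumption: second sample mean is 1036 hours, as in the worked solution.
S, t = pooled_t(mean1=1234, s1=36, n1=8, mean2=1036, s2=40, n2=7)
print(round(S, 2), round(t, 2))
```

The statistic has n1 + n2 − 2 = 13 degrees of freedom, so it is compared with the tabulated t value for 13 df at the chosen level of significance.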
Let Us Sum Up
Learners, in this section we have seen the test for population mean ,test
1. When performing a one sample-t test for the population mean, which
A. Normal distribution
B. t-distribution
C. Chi-square distribution
D. F-distribution
2. Which test is appropriate when the population standard deviation is known and
C. Paired t-test
D. Chi-square test
period before the advertisement and y1, y2, …, yn are the corresponding sales of the same product for
the same period after the advertisement. Now we want to test the significant difference between
Let the Null hypothesis H0: μ1 = μ2 (There is no significant difference between the sales
$t = \frac{\bar{d}}{s / \sqrt{n}}$ follows t distribution with (n−1) df.
4.4.2 PROBLEM
An I.Q test was conducted to 5 persons before and after they were trained, the results
I.Q. before training    110    120    123    132    125
I.Q. after training     120    118    125    136    121
Test whether there is any change in IQ after the training programme at 5% level.
Solution:
df = 3.747. We may reject H0 if t< -3.747. Assuming H0 is true, the test statistic is given
by
d
t=
s
n
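x        y        d = x − y    d²
110      120      −10          100
120      118      2            4
123      125      −2           4
132      136      −4           16
125      121      4            16
Total             ∑d = −10     ∑d² = 140
$\bar{d} = \frac{\sum d}{n} = \frac{-10}{5} = -2$
$s^2 = \frac{\sum d^2 - n \bar{d}^2}{n - 1} = \frac{140 - 20}{4} = 30$, so s = 5.48
$t = \frac{\bar{d}}{s / \sqrt{n}} = \frac{-2}{5.48 / \sqrt{5}} = -0.82$, so |t| = 0.82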
Conclusion: Since the calculated value is less than the table value. So, we accept the
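The paired t computation can be reproduced from the raw before and after scores. The sketch below follows the formula t = d̄/(s/√n) with s based on divisor n − 1; the function name `paired_t` is illustrative.

```python
from math import sqrt


def paired_t(before, after):
    """Paired t statistic: d = before - after, t = d_bar / (s / sqrt(n))."""
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    # Sample standard deviation of the differences, with divisor n - 1.
    s = sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
    return d_bar, s, d_bar / (s / sqrt(n))


d_bar, s, t = paired_t([110, 120, 123, 132, 125], [120, 118, 125, 136, 121])
print(d_bar, round(s, 2), round(t, 2))
```

The sign of t only reflects the direction of the difference (before minus after); its absolute value is what is compared with the table value in a two-sided test.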
Let Us Sum Up
Learners, in this section we have seen the test procedure for Paired t-test with
simple problems.
1. The degrees of freedom for statistic-t for paired t-test based on n pairs of
observations is:
A. 2(n-1)
B. n-1
C. 2n-1
A. paired
B. correlated
C. Equal in number
variables; the resulting table is called a two-way frequency table or a contingency table.
One characteristic or attribute is shown along the rows and the other is shown along the
column. Each cell of the table gives the count or the number of cases corresponding to
that cell. We wish to test whether the two attributes are independent or not.
Let Oij and Eij denote the observed and expected frequencies respectively in the
ith row and the jth column. When the null hypothesis is true, the expected frequencies
After finding the expected frequencies of all the cells, we calculate the chi-square value
as follows.
$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
follows Chi-Square distribution with (r−1)×(c−1) df.
The degrees of freedom is (r−1)×(c−1), where r is the number of rows and c is the number
of columns. The critical value can be seen from the statistical tables corresponding to
these degrees of freedom at the chosen level of significance. If the chi-square value
exceeds the critical value, we reject the null hypothesis; the conclusion is that the two
characteristics are not independent, i.e., they are dependent on each other.
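The procedure above can be sketched in code for any r × c contingency table. The example counts below are hypothetical and are not taken from the text; the expected frequencies use the usual rule E_ij = (row total × column total) / grand total, and 3.841 is the 5% critical value of chi-square with 1 degree of freedom.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of the two attributes.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 counts, chosen only to illustrate the calculation.
chi2, df = chi_square_independence([[30, 20], [20, 30]])
reject_h0 = chi2 > 3.841  # 5% critical value for 1 degree of freedom
print(chi2, df, reject_h0)
```

In this hypothetical table every expected frequency is 25, so the statistic is 4 × (5² / 25) = 4, which exceeds 3.841 and leads to rejection of independence at the 5% level.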
From the following table, test whether the son’s eye colour is dependent with father’s
eye colour;
Null Hypothesis H0: The eye colour of father and eye colour of son are independent.
Alternative Hypothesis H1: The eye colour of father and eye colour of son are not
independent.
χ² = 0.095
df = (r−1)(c−1) = (2−1)(2−1) = 1.
CONCLUSION: Since the calculated value of Chi-square is less than the table value,
we can accept the null hypothesis H0; Hence we conclude that eye-colour of son is
Let Us Sum Up
order(2X2) is
A. 3
B. 4
C. 2
D. 1
2. The hypothesis that the population variance has a specified value can be
tested by:
A. F-test
B. Z-test
C. Chi-square test
errors and types, sampling distribution and standard error, test of significance for large
problems.
4.7 Glossary
KEYWORDS MEANING
P     Population proportion (parameter).
p     Sample proportion (statistic).
H0 Null hypothesis.
H1 Alternative hypothesis.
1. Define (i) Hypothesis and its types (ii) errors and its types
through a large no. of retail shops. Before a heavy add campaign the mean sales
per week per shop was 140 dozen. After the campaign, a sample of 26 shops was
taken and the mean sales was found to be 147 dozen with a standard deviation of
3. Samples of two types of electric bulbs were tested for length of life and following
Type - I Type - II
Sample no. 8 7
Sample Mean 1234 hours 1036 hours
Sample SD 36hours 40 hours
4. An I.Q test was conducted to 5 persons before and after they were trained, the
Candidates I II III IV V
Test whether there is any change in IQ after the training programme at 5% level.
eye colour
4.9 EXERCISES
1. A survey of 320 families with 5 children each reveals the following distribution:
No. of Boys: 5 4 3 2 1 0
No. of Girls: 0 1 2 3 4 5
Is this result being consistent with the hypothesis that male and female births were
equally probable?
2. Fit a Poisson distribution to the following data which gives the number of yeast cells
No. of cells
per square: 0 1 2 3 4 5 6 7 8 9 10
No. of
3. 15000 random numbers were taken from some logarithmic table and the following
Freq: 1493 1441 1461 1552 1494 1454 1613 1491 1482 1519
Using chi-square test, test the hypothesis that each digit has an equal chance
of being chosen.
4. The theory predicts the proportion of beans in the groups A, B, C and D should be
9:3:3:1. In an experiment among the beans, the numbers in the four groups were
882, 313, 287, and 118. Does the experimental report support the theory?
5. Samples of two types of electric bulbs were tested for length of life and the following
Test the significant difference between the average life of the bulbs.
6. A group of five patients treated with medicine A weigh 42, 39, 48, 60 and 41 kg. A
second group of 7 patients from the same hospital treated with medicine B weigh 38,
42, 56, 64, 68, 69 and 62 kg. Do you agree with the claim that medicine ‘B’
7. The mean life of a sample of 10 electric bulbs was found to be 1456 hours with a
S.D of 423 hours. A second of 17 bulbs chosen from a different batch showed a
mean life of 1280 hours with S.D 398 hours. Is there significant difference between
with S.D 20 and 25 respectively on the basis of records of 25 days production. Can you
Unit objectives
1. To impart statistical concepts with small sample tests with regard to mean.
3. To introduce the definition of the Chi-square test, its assumptions and its characteristics.
4. To introduce the concepts of Chi-square test for goodness of fit and independence of
Step 3: Choose the level of significance and state the degrees of freedom.
Test procedure: Let x1, x2, …xn be a random sample from a population with mean
µ, then we want to test whether there is any significant difference between population
The test statistic is given by $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}} \sim t_{(n-1)}$
5.1.3 PROBLEM
large no. of retail shops. Before a heavy advertising campaign the mean sales per week per
shop was 140 dozen. After the campaign, a sample of 26 shops was taken and the
mean sales was found to be 147 dozen with a standard deviation of 16 dozen, can
n = 26, $\bar{x}$ = 147, s = 16, µ = 140
Reject H0 if t > 1.708
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}} = \dfrac{147 - 140}{16/\sqrt{25}} = 2.19 \sim t_{(n-1)}$
Conclusion: Since the calculated value is greater than the tabulated value, we may
reject H0.
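The arithmetic of this problem can be checked with a short script. This is only a sketch: the function name `one_sample_t` is an illustrative choice, and the critical value 1.708 is the one quoted in the problem (one-tailed, 25 d.f.).

```python
import math

def one_sample_t(sample_mean, pop_mean, s, n):
    """Small-sample statistic t = (x_bar - mu) / (s / sqrt(n - 1)), with n - 1 d.f."""
    return (sample_mean - pop_mean) / (s / math.sqrt(n - 1))

# Data of the advertising-campaign problem above.
t_value = one_sample_t(sample_mean=147, pop_mean=140, s=16, n=26)
critical_value = 1.708  # table value quoted in the problem
reject_h0 = t_value > critical_value

print(f"t = {t_value:.4f}, reject H0: {reject_h0}")
```

Here s is the sample standard deviation computed with divisor n, which is why the statistic divides by the square root of n - 1.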
Suppose we want to test if two independent samples x1, x2, …, xn1 and y1, y2, …, yn2
of sizes n1 and n2 have been drawn from two normal populations with population means
μ1 and μ2 respectively. Under the Null hypothesis H0: μ1 = μ2 (There is no significant
difference between the two population means), the test statistic is
$t = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{(n_1+n_2-2)}$
where $S = \sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}$
Samples of two types of electric bulbs were tested for length of life and the following data
were obtained.
                Type - I      Type - II
Sample no.      8             7
Sample Mean     1234 hours    1046 hours
Sample s.d      36 hours      40 hours
Is the difference in the means sufficient to warrant that type I is superior to type II?
Sol:
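n2 = 7, $\bar{x}_2$ = 1036 hrs, s2 = 40 hrs
$t = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{(n_1+n_2-2)}$
$S = \sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{8(36)^2 + 7(40)^2}{13}} = 40.73$
Since the calculated value of t is much greater than the table value, H0 is rejected. So,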
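The pooled standard deviation and the t statistic can be reproduced as follows. This is a sketch under stated assumptions: the function names are illustrative, the Type II mean is taken as 1036 hours as written in the solution line, and 1.771 is the standard one-tailed 5% table value for 13 d.f.

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """S = sqrt((n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)), with s1, s2 computed with divisor n."""
    return math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))

def two_sample_t(mean1, mean2, n1, s1, n2, s2):
    """t = (x_bar - y_bar) / (S * sqrt(1/n1 + 1/n2)), with n1 + n2 - 2 d.f."""
    S = pooled_sd(n1, s1, n2, s2)
    return (mean1 - mean2) / (S * math.sqrt(1 / n1 + 1 / n2))

S = pooled_sd(8, 36, 7, 40)
# Assumption: Type II mean = 1036 hours, as in the solution line.
t_value = two_sample_t(1234, 1036, 8, 36, 7, 40)
critical_value = 1.771  # one-tailed 5% table value for 13 d.f.
reject_h0 = t_value > critical_value

print(f"S = {S:.2f}, t = {t_value:.2f}, reject H0: {reject_h0}")
```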
Let Us Sum Up
Learners, in this section we have seen the testing of significance for small
samples, the test procedure, the test for a population mean and the test for the difference of
population means.
1. Which of the following tests is typically used for hypothesis testing with large
samples?
A. T-test
B. Chi-square test
C. Z-test
D. Wilcoxon test
2. Which test is appropriate for hypothesis testing with small sample sizes?
A. Z-test
B. Chi-square test
C. T-test
D. F-test
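Let x1, x2, …, xn be the sales of a product in n independent stores for a certain
period before the advertisement and y1, y2, …, yn the corresponding sales of the same product for
the same period after the advertisement. Now we want to test the significant difference between
the sales before and after the advertisement.
Let the Null hypothesis H0: μ1 = μ2 (There is no significant difference between the
means before and after). With $d_i = x_i - y_i$ and $s^2 = \dfrac{\sum (d_i - \bar{d})^2}{n-1}$,
$t = \dfrac{\bar{d}}{s/\sqrt{n}}$ follows the t distribution with (n-1) d.f.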
5.2.2 PROBLEM
An I.Q test was conducted on 5 persons before and after they were trained; the results
are given below.
Candidates             I     II    III   IV    V
I.Q before training    110   120   123   132   125
I.Q after training     120   118   125   136   121
Test whether there is any change in IQ after the training programme at 5% level.
Solution:
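Level of significance = 5%
$t = \dfrac{\bar{d}}{s/\sqrt{n}}$, where d = (I.Q before training) - (I.Q after training)
$\sum d = -10$, $\sum d^2 = 140$
$\bar{d} = \dfrac{\sum d}{n} = -2$
$s^2 = \dfrac{\sum d^2 - n\bar{d}^2}{n-1} = \dfrac{140 - 20}{4} = 30$, so s = 5.48
$|t| = \dfrac{2}{5.48/\sqrt{5}} = 0.82$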
Conclusion: Since the calculated value is less than the table value, we accept the
null hypothesis H0.
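The paired t computation for the I.Q data can be verified with a few lines of code. This is a minimal sketch: the variable names are illustrative, differences are taken as before minus after, and 2.776 is the standard two-tailed 5% table value for 4 d.f.

```python
import math

before = [110, 120, 123, 132, 125]
after = [120, 118, 125, 136, 121]

# Differences d = before - after for each candidate.
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n

# Standard deviation of the differences with divisor n - 1.
s = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))

t_value = d_bar / (s / math.sqrt(n))
critical_value = 2.776  # two-tailed 5% table value for 4 d.f.
reject_h0 = abs(t_value) > critical_value

print(f"sum d = {sum(d)}, d_bar = {d_bar}, s = {s:.3f}, |t| = {abs(t_value):.2f}")
print("reject H0:", reject_h0)
```

Only the magnitude of t matters for the two-sided question "is there any change", so the sign of the differences does not affect the conclusion.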
from a normal population with mean µ and variance σ², then the sample mean is
distributed normally with mean µ and variance σ²/n, i.e., $\bar{x} \sim N(\mu, \sigma^2/n)$.
The test statistic is given by $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
Then compare the calculated value with the tabulated value at given level of
significance, reject H0 if the calculated value is greater than the tabulated value.
5.2.4 PROBLEM
A sample of 900 members has a mean of 3.4 cm with S.D 2.61 cm. Is the sample
drawn from a population with mean 3.25 cm?
Null Hypothesis H0 : The sample has been drawn from the population with mean 3.25
Cm
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, since n is large.
Here we are given $\bar{x}$ = 3.4 cm, n = 900, µ = 3.25 cm, σ = 2.61 cm
$Z = \dfrac{3.4 - 3.25}{2.61/\sqrt{900}} = 1.72$
Hence, we conclude that the sample has come from the population with mean 3.25.
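The large-sample statistic for this problem can be recomputed as a check. This is a sketch only: the function name is illustrative, and the comparison uses 1.96, the standard two-tailed 5% table value of the normal distribution, which is an assumption because the problem text does not state the level.

```python
import math

def one_sample_z(sample_mean, pop_mean, sigma, n):
    """Large-sample statistic Z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (sigma / math.sqrt(n))

z_value = one_sample_z(sample_mean=3.4, pop_mean=3.25, sigma=2.61, n=900)
critical_value = 1.96  # assumed: two-tailed 5% level
reject_h0 = abs(z_value) > critical_value

print(f"Z = {z_value:.4f}, reject H0: {reject_h0}")
```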
Let $\bar{x}_1$ be the mean of a sample of size n1 from a population with mean μ1 and
variance $\sigma_1^2$ and $\bar{x}_2$ be the mean of an independent sample of size n2 from another
population with mean μ2 and variance $\sigma_2^2$. Then, since the sample sizes are large,
$\bar{x}_1 \sim N\left(\mu_1, \dfrac{\sigma_1^2}{n_1}\right)$, $\bar{x}_2 \sim N\left(\mu_2, \dfrac{\sigma_2^2}{n_2}\right)$
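Under the Null hypothesis H0: μ1 = μ2 (There is no significant difference between the two
population means) against the Alternative Hypothesis H1: μ1 ≠ μ2 (There is a significant
difference between the two population means), the test statistic is
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$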
Then compare the calculated value with the tabulated value at given level of
significance, reject H0 if the calculated value is greater than the tabulated value.
Remark 1:
If both the samples are drawn from populations with a common S.D σ, i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$,
then we get $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$
Remark 2:
5.2.6 - PROBLEM
The means of two single large samples of 1000 and 2000 members are 67.5 and 68
inches respectively. Can the samples be regarded as drawn from the same population of
standard deviation 2.5 inches?
Solution: We are given: n1 = 1000, n2 = 2000, $\bar{x}_1$ = 67.5, $\bar{x}_2$ = 68, σ = 2.5
population means.)
Alternative Hypothesis H1: μ1≠ μ2 (There is a significant difference between the two
population means.)
$z = \dfrac{67.5 - 68}{2.5\sqrt{\dfrac{1}{1000} + \dfrac{1}{2000}}} = -5.16$
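The statistic for two large samples with a common standard deviation can be checked numerically. This is a sketch: the function name is illustrative, and 1.96 (two-tailed 5% level) is an assumed critical value, since the level is not stated in the problem.

```python
import math

def two_sample_z_common_sd(mean1, mean2, sigma, n1, n2):
    """z = (x1_bar - x2_bar) / (sigma * sqrt(1/n1 + 1/n2)) for a common S.D. sigma."""
    return (mean1 - mean2) / (sigma * math.sqrt(1 / n1 + 1 / n2))

z_value = two_sample_z_common_sd(mean1=67.5, mean2=68, sigma=2.5, n1=1000, n2=2000)
critical_value = 1.96  # assumed: two-tailed 5% level
reject_h0 = abs(z_value) > critical_value

print(f"z = {z_value:.3f}, reject H0: {reject_h0}")
```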
Let Us Sum Up
Learners, in this section we have seen the Paired t-test, test procedure, test for
population distribution?
2.When comparing the means of two independent samples with equal variances,
A. Paired t-test
C. Z-test
D.ANOVA
the population having a particular characteristic of interest. For example, the researcher
may be concerned with the proportion of defective items turned out by the production
consumers expressing a preference for one packaging over another. In such situations,
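Test Procedure: If p represents the proportion of success, then using the binomial
distribution, the mean and the standard deviation of the proportion of success are given
by $p$ and $\sqrt{\dfrac{pq}{n}}$ respectively, where q = 1 - p. For testing the null hypothesis H0: p = p0 against the
alternative hypothesis, the test statistic is $Z = \dfrac{p - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \sim N(0,1)$,
where p is the sample proportion of success and q0 = 1 - p0.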
Problem: A fruit marketing federation claims that not more than 4% of the apples
Solution:
The sample proportion, p = 36/600 = 0.06. The test statistic is given by
$Z = \dfrac{0.06 - 0.04}{\sqrt{\dfrac{(0.04)(0.96)}{600}}} = 2.5$
Since the calculated value is greater than the table value, we may reject the
Hypothesis H0.
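The single-proportion statistic can be reproduced as follows. This is a sketch: the function name is illustrative, and 1.645 is the standard one-tailed 5% table value of the normal distribution, assumed here because the claim "not more than 4%" is one-sided and the level is not quoted in the solution.

```python
import math

def single_proportion_z(successes, n, p0):
    """Z = (p - p0) / sqrt(p0 * q0 / n), where p is the sample proportion."""
    p = successes / n
    q0 = 1 - p0
    return (p - p0) / math.sqrt(p0 * q0 / n)

z_value = single_proportion_z(successes=36, n=600, p0=0.04)
critical_value = 1.645  # assumed: one-tailed 5% level
reject_h0 = z_value > critical_value

print(f"Z = {z_value:.2f}, reject H0: {reject_h0}")
```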
PROPORTIONS
this section, we will discuss tests associated with the difference between two population
proportions. We have two populations and a sample is drawn from each of them.
We are interested in testing the significance in difference between the two population
proportions, or equivalently whether the proportions (of success) of the two populations
are equal or not. Representing p1 and p2 as the proportion of success of the first and
H0: p1=p2 Vs H1 p1 ≠ p2
from which the samples have been taken. Under H0, an unbiased estimate of the
common population proportion is $P = \dfrac{X_1 + X_2}{n_1 + n_2}$ and $Q = 1 - P$
Then compare the calculated value of Z with table value at given level of significance,
reject H0 if the calculated value is greater than tabulated value. Otherwise accept H0.
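The procedure can be sketched in code. The counts below are hypothetical (they do not come from this material), the function name is illustrative, and the statistic used is the usual pooled form Z = (p1 - p2) / sqrt(PQ(1/n1 + 1/n2)), which is stated here as an assumption.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (Z, P): Z = (p1 - p2) / sqrt(P*Q*(1/n1 + 1/n2)), P = (x1 + x2)/(n1 + n2)."""
    p1 = x1 / n1
    p2 = x2 / n2
    P = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion under H0
    Q = 1 - P
    z = (p1 - p2) / math.sqrt(P * Q * (1 / n1 + 1 / n2))
    return z, P

# Hypothetical data: 60 successes out of 200 in sample 1, 40 out of 200 in sample 2.
z_value, pooled_p = two_proportion_z(60, 200, 40, 200)
critical_value = 1.96  # two-tailed 5% table value of the normal distribution
reject_h0 = abs(z_value) > critical_value

print(f"P = {pooled_p:.2f}, Z = {z_value:.3f}, reject H0: {reject_h0}")
```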
Let Us Sum Up
population proportion, test for single proportion, test for the difference between two
population proportions.
1. You are performing a hypothesis test for a proportion and obtain a z-score of -1.75.
If the significance level is 0.10 for a two-tailed test, what is the p-value?
A. 0.0818
B. 0.0898
C. 0.10
D. 0.05
2. A hypothesis test for a population proportion results in a p-value of 0.03, and the
significance level is 0.05. What is the decision regarding the null hypothesis?
ATTRIBUTES
the resulting table is called a two-way frequency table or a contingency table. One
characteristic or attribute is shown along the rows and the other is shown along the
column. Each cell of the table gives the count or the number of cases corresponding to
that cell. We wish to test whether the two attributes are independent or not.
Let Oij and Eij denote the observed and expected frequencies respectively in the
ith row and the jth column. When the null hypothesis is true, the expected frequencies
After finding the expected frequencies of all the cells, we calculate the chi-square
value as follows.
$\chi^2 = \sum_{i}\sum_{j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}$
where r is the number of rows and c is the number of columns. The critical value can be seen from the statistical tables corresponding to
these degrees of freedom at the chosen level of significance. If the chi-square value
exceeds the critical value, we reject the null hypothesis and conclude that the two
characteristics are not independent, that is, they depend on each other.
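The calculation can be sketched for a small contingency table. The 2 x 2 counts below are hypothetical (they do not come from this material), the function name is illustrative, and each expected frequency is taken as (row total x column total) / grand total, the usual rule under independence, stated here as an assumption.

```python
def chi_square_independence(table):
    """Return (chi2, df) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency of cell (i, j) when the attributes are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical observed frequencies.
observed_table = [[20, 30],
                  [30, 20]]
chi2_value, df = chi_square_independence(observed_table)
critical_value = 3.841  # 5% table value of chi-square for 1 d.f.
reject_h0 = chi2_value > critical_value

print(f"chi-square = {chi2_value:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```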
5.4.1 EXAMPLE
From the following table, test whether the son’s eye colour depends on the father’s
eye colour.
Solution:
Null Hypothesis H0: The eye colour of father and eye colour of son are independent.
Alternative Hypothesis H1: The eye colour of father and eye colour of son are not
independent.
χ2= 0.095
CONCLUSION
Since the calculated value of Chi-square is less than the table value, we can accept
the null hypothesis H0.
Chi-square test of goodness of fit is given by Karl Pearson which is used to test
the significance of the discrepancy between theory and experiment. It enables us to find
if the deviation of the experiment from theory is just by chance or is it really due to the
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$ follows Chi-square distribution with (n-1) d.f.
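As an illustration of the goodness-of-fit statistic, the sketch below uses the bean counts 882, 313, 287 and 118 with the theoretical ratio 9:3:3:1, figures that appear among the exercises of this material. The function name is illustrative, and 7.815 is the standard 5% table value of chi-square for 3 d.f.

```python
def chi_square_goodness_of_fit(observed, ratio):
    """Return (chi2, df, expected) for observed counts against a theoretical ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    # Expected frequencies share the observed total in the theoretical ratio.
    expected = [total * r / ratio_sum for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, expected

chi2_value, df, expected = chi_square_goodness_of_fit([882, 313, 287, 118], [9, 3, 3, 1])
critical_value = 7.815  # 5% table value of chi-square for 3 d.f.
reject_h0 = chi2_value > critical_value

print("expected:", expected)
print(f"chi-square = {chi2_value:.4f}, d.f. = {df}, reject H0: {reject_h0}")
```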
5.4.3 PROBLEM
Test Procedure:
Let two independent random samples of sizes n1 and n2 be drawn from two normal
populations. We want to test whether these two population variances are equal or not.
Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$
$S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1}$, $S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1}$
$F = \dfrac{S_1^2}{S_2^2}$ (if $S_1^2 > S_2^2$) or $F = \dfrac{S_2^2}{S_1^2}$ (if $S_2^2 > S_1^2$),
which follows the F-distribution with $(n_1 - 1, n_2 - 1)$ d.f. or $(n_2 - 1, n_1 - 1)$ d.f. respectively.
Reject H0 if the calculated value is greater than the tabulated value. Otherwise accept it.
5.4.5 - PROBLEM
In a sample of 8 observations, the sum of the squared deviations of items from the
mean was 94.5. In another sample of 10 observations, the value was found to be 101.7.
Solution:
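Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$
We have $S_1^2 = \dfrac{\sum (X_i - \bar{X})^2}{n_1 - 1}$; $S_2^2 = \dfrac{\sum (Y_i - \bar{Y})^2}{n_2 - 1}$
We are given n1 = 8, $\sum (X_i - \bar{X})^2$ = 94.5; n2 = 10, $\sum (Y_i - \bar{Y})^2$ = 101.7
$S_1^2 = \dfrac{94.5}{7} = 13.5$; $S_2^2 = \dfrac{101.7}{9} = 11.3$
$F = \dfrac{S_1^2}{S_2^2} = \dfrac{13.5}{11.3} = 1.1947$
D.F = $(n_1 - 1, n_2 - 1)$ = (7, 9)
The calculated value is less than the table value, so we accept the hypothesis H0. The two
population variances may be regarded as equal.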
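The F ratio of this problem can be recomputed in a few lines. This is a sketch: the function name is illustrative, and 3.29 is the standard 5% table value of F for (7, 9) d.f., quoted here as an assumption because the solution does not print it.

```python
def f_test_from_sums_of_squares(ss1, n1, ss2, n2):
    """Return (F, df) with the larger variance estimate in the numerator.

    ss1 and ss2 are the sums of squared deviations from the sample means.
    """
    var1 = ss1 / (n1 - 1)
    var2 = ss2 / (n2 - 1)
    if var1 >= var2:
        return var1 / var2, (n1 - 1, n2 - 1)
    return var2 / var1, (n2 - 1, n1 - 1)

f_value, df = f_test_from_sums_of_squares(ss1=94.5, n1=8, ss2=101.7, n2=10)
critical_value = 3.29  # 5% table value of F for (7, 9) d.f.
reject_h0 = f_value > critical_value

print(f"F = {f_value:.4f}, d.f. = {df}, reject H0: {reject_h0}")
```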
Let Us Sum Up
2. If you have a contingency table with 3 rows and 4 columns, how many degrees
A. 6
B. 8
C. 12
D.9
This unit covers small sample tests with regard
to mean, difference between means and paired t-test, F-test, definition of the chi-square
test, its assumptions and characteristics, and chi-square tests for goodness of fit and independence
of attributes.
5.6 Glossary
KEYWORDS MEANING
t-test Small sample
z-test Large sample
X Sample mean
5.7 Self-Assessment Questions
1. Samples of two types of electric bulbs were tested for length of life and following
                Type - I      Type - II
Sample no.      8             7
Sample Mean     1234 hours    1046 hours
Sample s.d      36 hours      40 hours
Is the difference in the means sufficient to warrant that type I is superior to
2. An I.Q test was conducted to 5 persons before and after they were trained, the
Candidates             I     II    III   IV    V
I.Q before training    110   120   123   132   125
I.Q after training     120   118   125   136   121
Test whether there is any change in IQ after the training programme at 5% level.
3. A sample of 900 members has a mean 3.4 Cms with S.D 2.61 cms. Is the sample
4. The means of two single large samples of 1000 and 2000 members are 67.5 and 68
inches respectively. Can the samples be regarded as drawn from the same
5.8 EXERCISES
1. Given a sample mean of 83, a sample standard deviation of 12.5 and a sample size
of 22, test the hypothesis that the value of the population mean is 70 against the
alternative that is more than 70. Use the 0.025 significance level.
5,2,8,-1,3,0,6,-2,1,5,0,4
B.P?
3. The mean lifetime of a sample of 25 bulbs is found as 1550 hours with a S.D of 120
hours. The company manufacturing the bulbs claims that the average life of their bulbs
5. A sample of coin flips is collected from three different coins. The results are below.
Use one hypothesis test to test the claim that all three coins have the same probability
of landing heads. Use the critical value method with significance level 0.10.
6. To test the claim that snack choices are related to the gender of the customer, a
survey at a ball park shows this selection of snacks purchased. Write the null
hypothesis and check the assumptions. Do not do the rest of the hypothesis test.
7. In an experiment in breeding mice, a geneticist has obtained 120 brown mice with
pink eyes,48 brown mice with brown eyes,36 white mice with pink eyes and 13 white
mice with brown eyes. Theory predicts that these types of mice should be obtained in
the ratios 9:3:3:1. Test the compatibility of the data with theory, using a 5% critical
value.
8. A small component in an electronic device has two small holes where another tiny
part is fitted. In the manufacturing process the average distance between the two holes
must be tightly controlled at 0.02 mm, else many units would be defective and wasted.
Many times throughout the day quality control engineers take a small sample of the
components from the production line, measure the distance between the two holes,
and make adjustments if needed. Suppose at one time four units are taken and the
Module 1 1. C. Z-test
2. B. GDP Growth
2. B. Two-sample t-test.
1. A. 0.0818
Module 3
2. A. Reject the null hypothesis.