
Bcs301 Imp Notes PRP


Inspire before you expire…, BCS301

||JAI SRI GURUDEV||


S.J.C. INSTITUTE OF TECHNOLOGY, CHICKBALLAPUR
Department of Mathematics
LECTURE NOTES
MATHEMATICS-3 FOR COMPUTER SCIENCE STREAM (BCS301)
MODULE - 1
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

Prepared by:
Purushotham P
Assistant Professor
SJC Institute of Technology
Email id: [email protected]

Random Experiment:
An activity that yields some result is called a random experiment. A random variable is a real
number X associated with the outcomes of a random experiment.
Definition: Let S be the sample space associated with a random experiment. A real-valued function
defined on S is called a random variable.
Random variables are of two types:
i) Discrete Random Variables (DRV)
ii) Continuous Random Variables (CRV)

Discrete Random Variables: A discrete random variable is a variable which can take only a countable
number of values.
For example, if a coin is tossed three times, the number of heads obtained can be 0, 1, 2 or 3. The
probabilities of each of these values can be tabulated as shown.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
X     0    1    2    3
P(x)  1/8  3/8  3/8  1/8
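
The table above can be reproduced by listing the sample space and counting heads. The following is a minimal Python sketch, assuming a fair coin so that all eight outcomes are equally likely; the variable names are illustrative and not part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (outcomes with x heads) / (total number of outcomes).
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

Exact fractions are used so that the probabilities can be compared directly with 1/8, 3/8, 3/8, 1/8.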

Continuous Random Variables: A continuous random variable is a random variable that can take any
value in an interval, so its possible values are uncountably infinite. For example, a random variable
measuring the time taken for something to be done is continuous, since there are infinitely many
possible times that can be taken.
Ex: Temperature of a place, age of a person, etc.

Probability Mass Function:


The probability mass function is the probability distribution of a discrete random variable; it gives
the possible values and their associated probabilities. It satisfies:
1. $P(x_i) \ge 0$
2. $\sum_{i=1}^{n} P(X = x_i) = 1$
3. $0 \le P(x) \le 1$
4. Mean $\mu = \sum_{i=1}^{n} x_i P(x_i)$
5. Variance $\sigma^2 = \sum_{i=1}^{n} x_i^2 P(x_i) - \mu^2$
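
The properties listed above translate directly into a small check-and-compute routine. This is a minimal sketch, assuming the distribution is supplied as a dictionary of exact fractions; the function name `pmf_mean_variance` is hypothetical. It is applied to the coin-tossing distribution given earlier.

```python
from fractions import Fraction

def pmf_mean_variance(pmf):
    """Return (mean, variance) of a discrete pmf given as {x: P(x)}."""
    # Properties 1 and 2: non-negative probabilities that sum to 1.
    # Exact equality is safe here only because Fractions are assumed.
    if any(p < 0 for p in pmf.values()) or sum(pmf.values()) != 1:
        raise ValueError("not a valid probability mass function")
    mean = sum(x * p for x, p in pmf.items())                        # property 4
    variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2    # property 5
    return mean, variance

heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mean, variance = pmf_mean_variance(heads)
print(mean, variance)
```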

Probability Density Function:


Prepared by: PURUSHOTHAM P, SJC INSTITUTE OF TECHNOLOGY, | 1 TAKEITEASY ENGINEERS
The probability density function f(x) describes the probability distribution of a continuous random
variable; probabilities are obtained by integrating f(x) over an interval. It satisfies:
1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. Mean $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
4. Variance $\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
5. $P(a \le x \le b) = P(a < x \le b) = P(a \le x < b) = P(a < x < b) = \int_{a}^{b} f(x)\,dx$

PROBLEMS
1) Show that the following probabilities satisfy the properties of a discrete random variable, and
hence find its mean and variance.
x     10   20   30   40
P(x)  1/8  3/8  3/8  1/8
Soln: Let X be the random variable taking the values
x1 = 10, x2 = 20, x3 = 30, x4 = 40
with the given probabilities
$P(X = x_1) = P(x_1) = p_1 = \frac{1}{8}$
$P(X = x_2) = P(x_2) = p_2 = \frac{3}{8}$
$P(X = x_3) = P(x_3) = p_3 = \frac{3}{8}$
$P(X = x_4) = P(x_4) = p_4 = \frac{1}{8}$

$\sum_{i=1}^{4} P(X = x_i) = P(X = x_1) + P(X = x_2) + P(X = x_3) + P(X = x_4)$
$= \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8}$
$= \frac{8}{8}$
$= 1$
Each probability is non-negative and the probabilities sum to 1, hence they satisfy the DRV properties.

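Mean $\mu = \sum_{i=1}^{4} x_i P(x_i)$
$= 10 \times \frac{1}{8} + 20 \times \frac{3}{8} + 30 \times \frac{3}{8} + 40 \times \frac{1}{8}$
$= \frac{200}{8}$
$= 25$

Variance $\sigma^2 = \sum_{i=1}^{4} x_i^2 P(x_i) - \mu^2$
$= 10^2 \times \frac{1}{8} + 20^2 \times \frac{3}{8} + 30^2 \times \frac{3}{8} + 40^2 \times \frac{1}{8} - 25^2$
$= 700 - 625$
$= 75$

S.D $= \sqrt{\text{Variance}} = \sqrt{75} = 8.66$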

2) Find the value of k such that the following distribution represents a discrete probability
distribution. Hence find the mean, S.D, $P(x \le 1)$, $P(x > 1)$ and $P(-1 < x \le 2)$.

x     -3  -2  -1  0   1   2   3
P(x)  k   2k  3k  4k  3k  2k  k

Soln: Let X be the random variable for the random values,


x1 = -3, x2 = -2, x3 = -1, x4 = 0, x5 = 1, x6 = 2, x7 = 3
and the given probabilities are,
𝑃(𝑋 = 𝑥1 ) = 𝑃(−3) = 𝑘
𝑃(𝑋 = 𝑥2 ) = 𝑃(−2) = 2𝑘
𝑃(𝑋 = 𝑥3 ) = 𝑃(−1) = 3𝑘
𝑃(𝑋 = 𝑥4 ) = 𝑃(0) = 4𝑘
𝑃(𝑋 = 𝑥5 ) = 𝑃(1) = 3𝑘
𝑃(𝑋 = 𝑥6 ) = 𝑃(2) = 2𝑘
𝑃(𝑋 = 𝑥7 ) = 𝑃(3) = 𝑘

We know that,
$\sum_{i=1}^{7} P(X = x_i) = 1$
$\Rightarrow k + 2k + 3k + 4k + 3k + 2k + k = 1$
$\Rightarrow 16k = 1$
$\Rightarrow k = \frac{1}{16}$

x      P(x)  xP(x)  x^2  x^2 P(x)
-3     k     -3k    9    9k
-2     2k    -4k    4    8k
-1     3k    -3k    1    3k
0      4k    0      0    0
1      3k    3k     1    3k
2      2k    4k     4    8k
3      k     3k     9    9k
Total        0           40k

Mean $\mu = \sum_{i=1}^{7} x_i P(x_i) = 0$

Variance $\sigma^2 = \sum_{i=1}^{7} x_i^2 P(x_i) - \mu^2$
$= 40k - 0^2$
$= 40 \times \frac{1}{16}$
$= 2.5$
S.D $= \sqrt{2.5} = 1.5811$

i) $P(x \le 1) = P(-3) + P(-2) + P(-1) + P(0) + P(1)$
$\Rightarrow P(x \le 1) = k + 2k + 3k + 4k + 3k$
$\Rightarrow P(x \le 1) = 13k = \frac{13}{16} = 0.8125$

ii) $P(x > 1) = P(2) + P(3) = 2k + k = 3k = \frac{3}{16} = 0.1875$

iii) $P(-1 < x \le 2) = P(0) + P(1) + P(2) = 4k + 3k + 2k = 9k = \frac{9}{16} = 0.5625$
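
The step "probabilities sum to 1, so solve for k" is the same in every problem of this type. Below is a minimal sketch, assuming each probability is an integer multiple of k as in this problem; the helper name `normalise` is hypothetical.

```python
from fractions import Fraction

def normalise(weights):
    """Turn {x: multiple of k} into (k, pmf) so that the probabilities sum to 1."""
    k = Fraction(1, sum(weights.values()))
    return k, {x: w * k for x, w in weights.items()}

# P(x) = k, 2k, 3k, 4k, 3k, 2k, k for x = -3, ..., 3
k, pmf = normalise({-3: 1, -2: 2, -1: 3, 0: 4, 1: 3, 2: 2, 3: 1})

p_le_1 = sum(p for x, p in pmf.items() if x <= 1)       # P(x <= 1)
p_gt_1 = sum(p for x, p in pmf.items() if x > 1)        # P(x > 1)
p_mid = sum(p for x, p in pmf.items() if -1 < x <= 2)   # P(-1 < x <= 2)
print(k, p_le_1, p_gt_1, p_mid)
```

Filtering the dictionary by the inequality keeps strict and non-strict bounds explicit, which is where most slips occur in these problems.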

3) Find the value of k, such that the following distribution represents discrete probability
distribution. Hence find Mean, S.D, 𝑷(𝒙 ≥ 𝟓) and 𝑷(𝟑 < 𝒙 ≤ 𝟔).
𝑥 0 1 2 3 4 5 6
𝑃(𝑥) 𝑘 3𝑘 5𝑘 7𝑘 9𝑘 11𝑘 13𝑘

Soln: Let X be the random variable for the random values,


x1 = 0, x2 = 1, x3 = 2, x4 = 3, x5 = 4, x6 = 5, x7 = 6
and the given probabilities are,
𝑃(𝑋 = 𝑥1 ) = 𝑃(0) = 𝑘
𝑃(𝑋 = 𝑥2 ) = 𝑃(1) = 3𝑘
𝑃(𝑋 = 𝑥3 ) = 𝑃(2) = 5𝑘
𝑃(𝑋 = 𝑥4 ) = 𝑃(3) = 7𝑘
𝑃(𝑋 = 𝑥5 ) = 𝑃(4) = 9𝑘
𝑃(𝑋 = 𝑥6 ) = 𝑃(5) = 11𝑘
𝑃(𝑋 = 𝑥7 ) = 𝑃(6) = 13𝑘

We know that,
$\sum_{i=1}^{7} P(X = x_i) = 1$
$\Rightarrow k + 3k + 5k + 7k + 9k + 11k + 13k = 1$
$\Rightarrow 49k = 1$
$\Rightarrow k = \frac{1}{49}$

x      P(x)  xP(x)  x^2  x^2 P(x)
0      k     0      0    0
1      3k    3k     1    3k
2      5k    10k    4    20k
3      7k    21k    9    63k
4      9k    36k    16   144k
5      11k   55k    25   275k
6      13k   78k    36   468k
Total        203k        973k

Mean $\mu = \sum_{i=1}^{7} x_i P(x_i) = 203k = \frac{203}{49} = 4.1429$

Variance $\sigma^2 = \sum_{i=1}^{7} x_i^2 P(x_i) - \mu^2$
$= 973k - \left(\frac{203}{49}\right)^2$
$= \frac{973}{49} - 17.1633$
$= 2.6939$

S.D $= \sqrt{2.6939} = 1.6413$

i) $P(x \ge 5) = P(5) + P(6)$
$\Rightarrow P(x \ge 5) = 11k + 13k$
$\Rightarrow P(x \ge 5) = 24k = \frac{24}{49} = 0.4898$

ii) $P(3 < x \le 6) = P(4) + P(5) + P(6) = 9k + 11k + 13k = 33k = \frac{33}{49} = 0.6735$

4) A random variable X has the following probability function for various values of x. Find i) k,
ii) $P(x < 6)$, iii) $P(x \ge 6)$ and iv) $P(3 < x \le 6)$. Also find the probability distribution and
the distribution function of x.

x     0  1  2   3   4   5    6     7
P(x)  0  k  2k  2k  3k  k^2  2k^2  7k^2 + k

Soln: Let X be the random variable for the random values,


x1 = 0, x2 = 1, x3 = 2, x4 = 3, x5 = 4, x6 = 5, x7 = 6, x8 = 7
and the given probabilities are,
𝑃(𝑋 = 𝑥1 ) = 𝑃(0) = 0
𝑃(𝑋 = 𝑥2 ) = 𝑃(1) = 𝑘
𝑃(𝑋 = 𝑥3 ) = 𝑃(2) = 2𝑘
𝑃(𝑋 = 𝑥4 ) = 𝑃(3) = 2𝑘
𝑃(𝑋 = 𝑥5 ) = 𝑃(4) = 3𝑘
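$P(X = x_6) = P(5) = k^2$
$P(X = x_7) = P(6) = 2k^2$
$P(X = x_8) = P(7) = 7k^2 + k$

i) We know that,
$\sum_{i=1}^{8} P(X = x_i) = 1$
$\Rightarrow 0 + k + 2k + 2k + 3k + k^2 + 2k^2 + 7k^2 + k = 1$
$\Rightarrow 10k^2 + 9k = 1$
$\Rightarrow 10k^2 + 9k - 1 = 0$
$\Rightarrow (10k - 1)(k + 1) = 0$
$\Rightarrow 10k - 1 = 0$ or $k + 1 = 0$
$\Rightarrow k = \frac{1}{10}$, since $k = -1$ would give negative probabilities and is rejected.

The probability distribution and the distribution function (cumulative probability, CP) are:

x     0  1    2    3    4    5     6     7
P(x)  0  0.1  0.2  0.2  0.3  0.01  0.02  0.17
CP    0  0.1  0.3  0.5  0.8  0.81  0.83  1

ii) $P(x < 6) = 1 - P(x \ge 6) = 1 - \{P(6) + P(7)\} = 1 - \{0.02 + 0.17\} = 0.81$

iii) $P(x \ge 6) = P(6) + P(7) = 0.02 + 0.17 = 0.19$

iv) $P(3 < x \le 6) = P(4) + P(5) + P(6) = 0.3 + 0.01 + 0.02 = 0.33$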
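
The quadratic for k and the cumulative probabilities can be checked numerically. This is a minimal sketch using the quadratic formula and a running sum; the variable names are illustrative, and floating-point values are used, so results agree with the table only up to rounding.

```python
import math
from itertools import accumulate

# 10k^2 + 9k - 1 = 0, solved with the quadratic formula.
a, b, c = 10, 9, -1
disc = math.sqrt(b * b - 4 * a * c)
roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]

# A probability cannot be negative, so keep the non-negative root.
k = next(r for r in roots if r >= 0)

# P(0), P(1), ..., P(7) as given in the problem.
pmf = [0, k, 2 * k, 2 * k, 3 * k, k**2, 2 * k**2, 7 * k**2 + k]

# Distribution function F(x) = P(X <= x) as a running sum of the pmf.
cdf = list(accumulate(pmf))

p_less_than_6 = cdf[5]            # P(x < 6) = P(x <= 5)
p_between = cdf[6] - cdf[3]       # P(3 < x <= 6)
print(k, [round(v, 2) for v in cdf])
```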

5) A random variable has the following probability function for the various values of X=x. Find
i) Value of k, ii) 𝑷(𝒙 < 𝟏), iii)𝑷(𝒙 ≥ 𝟏).
𝒙 -2 -1 0 1 2 3
𝑷(𝒙) 0.1 𝒌 0.2 2𝒌 0.3 𝒌

Soln: Let X be the random variable for the random values,


x1 = -2, x2 = -1, x3 = 0, x4 = 1, x5 = 2, x6 = 3,
and the given probabilities are,
𝑃(𝑋 = 𝑥1 ) = 𝑃(−2) = 0.1
𝑃(𝑋 = 𝑥2 ) = 𝑃(−1) = 𝑘
𝑃(𝑋 = 𝑥3 ) = 𝑃(0) = 0.2
𝑃(𝑋 = 𝑥4 ) = 𝑃(1) = 2𝑘
𝑃(𝑋 = 𝑥5 ) = 𝑃(2) = 0.3
𝑃(𝑋 = 𝑥6 ) = 𝑃(3) = 𝑘

i) We know that,
∑6𝑖=1 𝑃(𝑋 = 𝑥𝑖 ) = 1
⇒ 0.1 + 𝑘 + 0.2 + 2𝑘 + 0.3 + 𝑘 = 1

⇒ 4𝑘 + 0.6 = 1

⇒ 4𝑘 = 0.4

⇒ 𝑘 = 0.1

ii) $P(x < 1) = P(-2) + P(-1) + P(0) = 0.1 + k + 0.2 = k + 0.3 = 0.1 + 0.3 = 0.4$

iii) $P(x \ge 1) = P(1) + P(2) + P(3) = 2k + 0.3 + k = 3k + 0.3 = 0.6$

6) A random variable has the following probability function for the various values of X=x. Find
i)Value of k, ii)𝑷(𝒙 ≤ 𝟏), iii) 𝑷(𝟎 ≤ 𝒙 < 𝟑).
𝒙 0 1 2 3 4 5
𝑷(𝒙) 𝒌 𝟓𝒌 𝟏𝟎𝒌 𝟏𝟎𝒌 𝟓𝒌 𝒌

Soln: Let X be the random variable for the random values,


x1 = 0, x2 = 1, x3 = 2, x4 = 3, x5 = 4, x6 = 5,
and the given probabilities are,
𝑃(𝑋 = 𝑥1 ) = 𝑃(0) = 𝑘
𝑃(𝑋 = 𝑥2 ) = 𝑃(1) = 5𝑘
𝑃(𝑋 = 𝑥3 ) = 𝑃(2) = 10𝑘
𝑃(𝑋 = 𝑥4 ) = 𝑃(3) = 10𝑘
𝑃(𝑋 = 𝑥5 ) = 𝑃(4) = 5𝑘
𝑃(𝑋 = 𝑥6 ) = 𝑃(5) = 𝑘

i) We know that,
$\sum_{i=1}^{6} P(X = x_i) = 1$
$\Rightarrow k + 5k + 10k + 10k + 5k + k = 1$
$\Rightarrow 32k = 1$
$\Rightarrow k = \frac{1}{32}$

ii) $P(x \le 1) = P(0) + P(1) = k + 5k = 6k = \frac{6}{32} = 0.1875$

iii) $P(0 \le x < 3) = P(0) + P(1) + P(2) = k + 5k + 10k = 16k = \frac{16}{32} = 0.5$
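7) Show that the function $f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$ is a probability density function. Hence find
$P(1.5 < x < 2.5)$.

Soln: The given function is
$f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$
Clearly $f(x) \ge 0$ for all x. Also,
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{\infty} f(x)\,dx$
$= \int_{-\infty}^{0} 0\,dx + \int_{0}^{\infty} e^{-x}\,dx$
$= 0 + \left[\frac{e^{-x}}{-1}\right]_{0}^{\infty}$
$= -[e^{-\infty} - e^{0}]$
$= -[0 - 1]$
$= 1$
Hence the given function is a p.d.f.

$P(1.5 < x < 2.5) = \int_{1.5}^{2.5} f(x)\,dx$
$= \int_{1.5}^{2.5} e^{-x}\,dx$
$= -[e^{-x}]_{1.5}^{2.5}$
$= -[e^{-2.5} - e^{-1.5}] = \frac{1}{e^{1.5}} - \frac{1}{e^{2.5}} \approx 0.1410$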
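
The value of this integral can be cross-checked numerically. The following is a minimal sketch that uses a simple midpoint rule (an assumption made for illustration; any quadrature method would do) and compares it with the closed form $e^{-1.5} - e^{-2.5}$. The helper name `midpoint_integral` is hypothetical.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    # The density from the problem: e^{-x} for x >= 0, and 0 otherwise.
    return math.exp(-x) if x >= 0 else 0.0

numeric = midpoint_integral(f, 1.5, 2.5)
exact = math.exp(-1.5) - math.exp(-2.5)
print(numeric, exact)
```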

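8) A random variable X has probability density function $f(x) = \begin{cases} kx^2, & 0 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$. Evaluate i) k,
ii) $P(x \le 1)$, iii) $P(x > 1)$, iv) $P(1 \le x \le 2)$, v) $P(x \le 2)$, vi) $P(x \ge 2)$.

Soln: The given probability density function is
$f(x) = \begin{cases} kx^2, & 0 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$

i) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\Rightarrow \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx = 1$
$\Rightarrow 0 + \int_{0}^{3} kx^2\,dx + 0 = 1$
$\Rightarrow k\left[\frac{x^3}{3}\right]_{0}^{3} = 1$
$\Rightarrow 9k = 1$
$\Rightarrow k = \frac{1}{9}$

ii) $P(x \le 1) = \int_{-\infty}^{1} f(x)\,dx$
$\Rightarrow P(x \le 1) = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx$
$\Rightarrow P(x \le 1) = 0 + \int_{0}^{1} kx^2\,dx$
$\Rightarrow P(x \le 1) = k\left[\frac{x^3}{3}\right]_{0}^{1}$
$\Rightarrow P(x \le 1) = \frac{k}{3} = \frac{1}{27}$

iii) $P(x > 1) = \int_{1}^{\infty} f(x)\,dx$
$\Rightarrow P(x > 1) = \int_{1}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx$
$\Rightarrow P(x > 1) = \int_{1}^{3} kx^2\,dx + 0$
$\Rightarrow P(x > 1) = k\left[\frac{x^3}{3}\right]_{1}^{3}$
$\Rightarrow P(x > 1) = \frac{26k}{3} = \frac{26}{27}$

iv) $P(1 \le x \le 2) = \int_{1}^{2} f(x)\,dx$
$\Rightarrow P(1 \le x \le 2) = \int_{1}^{2} kx^2\,dx$
$\Rightarrow P(1 \le x \le 2) = k\left[\frac{x^3}{3}\right]_{1}^{2}$
$\Rightarrow P(1 \le x \le 2) = \frac{7k}{3} = \frac{7}{27}$

v) $P(x \le 2) = \int_{-\infty}^{2} f(x)\,dx$
$\Rightarrow P(x \le 2) = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{2} f(x)\,dx$
$\Rightarrow P(x \le 2) = 0 + \int_{0}^{2} kx^2\,dx$
$\Rightarrow P(x \le 2) = k\left[\frac{x^3}{3}\right]_{0}^{2}$
$\Rightarrow P(x \le 2) = \frac{8k}{3} = \frac{8}{27}$

vi) $P(x \ge 2) = \int_{2}^{\infty} f(x)\,dx$
$\Rightarrow P(x \ge 2) = \int_{2}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx$
$\Rightarrow P(x \ge 2) = \int_{2}^{3} kx^2\,dx + 0$
$\Rightarrow P(x \ge 2) = k\left[\frac{x^3}{3}\right]_{2}^{3}$
$\Rightarrow P(x \ge 2) = \frac{19k}{3} = \frac{19}{27}$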

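9) A random variable X has the pdf $f(x) = \begin{cases} kx^2, & -3 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$. Find i) k, ii) $P(x \le 2)$,
iii) $P(x \ge 2)$, iv) $P(x > 1)$, v) $P(1 \le x \le 2)$.

Soln: The given probability density function is
$f(x) = \begin{cases} kx^2, & -3 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$

i) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\Rightarrow \int_{-\infty}^{-3} f(x)\,dx + \int_{-3}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx = 1$
$\Rightarrow 0 + \int_{-3}^{3} kx^2\,dx + 0 = 1$
$\Rightarrow k\left[\frac{x^3}{3}\right]_{-3}^{3} = 1$
$\Rightarrow 18k = 1$
$\Rightarrow k = \frac{1}{18}$

ii) $P(x \le 2) = \int_{-\infty}^{2} f(x)\,dx$
$\Rightarrow P(x \le 2) = \int_{-\infty}^{-3} f(x)\,dx + \int_{-3}^{2} f(x)\,dx$
$\Rightarrow P(x \le 2) = 0 + \int_{-3}^{2} kx^2\,dx$
$\Rightarrow P(x \le 2) = k\left[\frac{x^3}{3}\right]_{-3}^{2}$
$\Rightarrow P(x \le 2) = \frac{35k}{3} = \frac{35}{3} \times \frac{1}{18} = \frac{35}{54}$

iii) $P(x \ge 2) = \int_{2}^{\infty} f(x)\,dx$
$\Rightarrow P(x \ge 2) = \int_{2}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx$
$\Rightarrow P(x \ge 2) = \int_{2}^{3} kx^2\,dx$
$\Rightarrow P(x \ge 2) = k\left[\frac{x^3}{3}\right]_{2}^{3}$
$\Rightarrow P(x \ge 2) = \frac{19k}{3} = \frac{19}{3} \times \frac{1}{18} = \frac{19}{54}$

iv) $P(x > 1) = \int_{1}^{\infty} f(x)\,dx$
$\Rightarrow P(x > 1) = \int_{1}^{3} f(x)\,dx + \int_{3}^{\infty} f(x)\,dx$
$\Rightarrow P(x > 1) = \int_{1}^{3} kx^2\,dx$
$\Rightarrow P(x > 1) = k\left[\frac{x^3}{3}\right]_{1}^{3}$
$\Rightarrow P(x > 1) = \frac{26k}{3} = \frac{26}{3} \times \frac{1}{18} = \frac{26}{54} = \frac{13}{27}$

v) $P(1 \le x \le 2) = \int_{1}^{2} f(x)\,dx$
$\Rightarrow P(1 \le x \le 2) = \int_{1}^{2} kx^2\,dx$
$\Rightarrow P(1 \le x \le 2) = k\left[\frac{x^3}{3}\right]_{1}^{2}$
$\Rightarrow P(1 \le x \le 2) = \frac{7k}{3} = \frac{7}{3} \times \frac{1}{18} = \frac{7}{54}$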

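10) The diameter of an electric cable is assumed to be a CRV with pdf
$f(x) = \begin{cases} kx(1 - x), & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$. Find i) the value of k, ii) the mean and variance.

Soln: The given probability density function is
$f(x) = \begin{cases} kx(1 - x), & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$

i) WKT, $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\Rightarrow \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx = 1$
$\Rightarrow 0 + \int_{0}^{1} kx(1 - x)\,dx + 0 = 1$
$\Rightarrow k\int_{0}^{1} (x - x^2)\,dx = 1$
$\Rightarrow k\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{0}^{1} = 1$
$\Rightarrow \frac{k}{6} = 1$
$\Rightarrow k = 6$

ii) Mean $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
$= \int_{0}^{1} x \cdot kx(1 - x)\,dx$
$= k\int_{0}^{1} x^2(1 - x)\,dx$
$= k\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_{0}^{1}$
$= k\left[\frac{1}{3} - \frac{1}{4}\right] = \frac{k}{12} = \frac{6}{12} = \frac{1}{2}$

Variance $\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
$= \int_{0}^{1} kx^3(1 - x)\,dx - \mu^2$
$= k\int_{0}^{1} (x^3 - x^4)\,dx - \left[\frac{1}{2}\right]^2$
$= k\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_{0}^{1} - \frac{1}{4}$
$= \frac{k}{20} - \frac{1}{4} = \frac{6}{20} - \frac{1}{4} = \frac{1}{20}$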


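11) Find the constant k such that $f(x) = \begin{cases} kxe^{-x}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$ is a pdf. Find the mean.

Soln: The given function is
$f(x) = \begin{cases} kxe^{-x}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
and f(x) is given to be the pdf of the CRV X.

WKT, $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\Rightarrow \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{1} f(x)\,dx + \int_{1}^{\infty} f(x)\,dx = 1$
$\Rightarrow \int_{0}^{1} kxe^{-x}\,dx = 1$
Integrating by parts,
$\Rightarrow k\left\{\left[x \int e^{-x}\,dx\right]_{0}^{1} - \int_{0}^{1} \left(1 \times \int e^{-x}\,dx\right) dx\right\} = 1$
$\Rightarrow k\left\{\left[-xe^{-x}\right]_{0}^{1} + \int_{0}^{1} e^{-x}\,dx\right\} = 1$
$\Rightarrow k(-e^{-1} - e^{-1} + 1) = 1$
$\Rightarrow k\left(1 - \frac{2}{e}\right) = 1$
$\Rightarrow k = \frac{e}{e - 2}$

Mean $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
$= \int_{0}^{1} x \cdot kxe^{-x}\,dx$
$= k\int_{0}^{1} x^2 e^{-x}\,dx$
$= k\left\{\left[x^2 \int e^{-x}\,dx\right]_{0}^{1} - \int_{0}^{1} \left(2x \times \int e^{-x}\,dx\right) dx\right\}$
$= k\left\{-\left[x^2 e^{-x}\right]_{0}^{1} + 2\int_{0}^{1} xe^{-x}\,dx\right\}$
$= k\left\{-e^{-1} + 2(1 - 2e^{-1})\right\}$
$= k(2 - 5e^{-1})$
$= k\left(2 - \frac{5}{e}\right)$
$= \frac{e}{e - 2} \times \frac{2e - 5}{e}$
$\mu = \frac{2e - 5}{e - 2}$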
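
Closed forms obtained by integration by parts are easy to get wrong, so a numerical cross-check is useful. The sketch below assumes a simple midpoint rule (chosen for illustration only) and verifies that $k = e/(e-2)$ makes the total area 1 and that the mean agrees with $(2e-5)/(e-2)$; the helper name `midpoint_integral` is hypothetical.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

k = math.e / (math.e - 2)

def pdf(x):
    # f(x) = k x e^{-x} on 0 < x < 1, and 0 otherwise.
    return k * x * math.exp(-x) if 0 < x < 1 else 0.0

area = midpoint_integral(pdf, 0, 1)                    # should be close to 1
mean = midpoint_integral(lambda x: x * pdf(x), 0, 1)   # numerical mean
closed_form_mean = (2 * math.e - 5) / (math.e - 2)
print(area, mean, closed_form_mean)
```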

Binomial Distribution:
Let X be a discrete random variable counting the number of successes in n independent trials, let p be
the probability of success and q the probability of failure in a single trial. Then the probability
mass function of the binomial distribution is defined as
$P(X = x) = b(n, p, x) = \begin{cases} {}^{n}C_{x}\, p^{x} q^{n-x}, & x = 0, 1, 2, \dots, n \\ 0, & \text{otherwise} \end{cases}$
where n is the number of trials, and n and p are the parameters. It satisfies:
i) $P(X = x) = b(n, p, x) \ge 0$
ii) $p + q = 1$
iii) $\sum_{x=0}^{n} {}^{n}C_{x}\, p^{x} q^{n-x} = 1$
iv) The mean of the binomial distribution is $\mu = np$, the variance is $\sigma^2 = npq$ and the S.D is $\sigma = \sqrt{npq}$.

MEAN & VARIANCE OF A BINOMIAL DISTRIBUTION:

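WKT, the probability mass function of the binomial distribution is
$P(X = x) = f(x) = \begin{cases} {}^{n}C_{x}\, p^{x} q^{n-x}, & x = 0, 1, 2, \dots, n \\ 0, & \text{otherwise} \end{cases}$

i) Mean:
$\mu = E(x) = \sum_{x=0}^{n} x P(X = x)$
$= \sum_{x=0}^{n} x\, {}^{n}C_{x}\, p^{x} q^{n-x}$
$= \sum_{x=0}^{n} x \frac{n!}{x!\,(n - x)!} p^{x-1} p^{1} q^{n-x}$
The x = 0 term is zero, so the sum can start at x = 1:
$= \sum_{x=1}^{n} x \frac{n\,(n - 1)!}{x\,(x - 1)!\,(n - x)!} p^{x-1} p^{1} q^{n-x}$
$= np \sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!\,(n - x)!} p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!\,((n - 1) - (x - 1))!} p^{x-1} q^{(n-1)-(x-1)}$
$= np \sum_{x=1}^{n} {}^{(n-1)}C_{(x-1)}\, p^{x-1} q^{(n-1)-(x-1)}$
$= np\,(p + q)^{n-1} = np\,(1)$
$\mu = E(x) = np$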

ii) Variance:
$\sigma^2 = E(x^2) - [E(x)]^2$ ---- (1)
$\Rightarrow E(x^2) = E(x(x - 1) + x)$
$\Rightarrow E(x^2) = E(x(x - 1)) + E(x)$ ---- (2)

$E(x(x - 1)) = \sum_{x=0}^{n} x(x - 1) P(x)$
$= \sum_{x=0}^{n} x(x - 1)\, {}^{n}C_{x}\, p^{x} q^{n-x}$
$= \sum_{x=0}^{n} x(x - 1) \frac{n!}{x!\,(n - x)!} p^{x} q^{n-x}$
The x = 0 and x = 1 terms are zero, so the sum can start at x = 2:
$= \sum_{x=2}^{n} x(x - 1) \frac{n(n - 1)(n - 2)!}{x(x - 1)(x - 2)!\,(n - x)!} p^{x-2} p^{2} q^{n-x}$
$= \sum_{x=2}^{n} \frac{n(n - 1)(n - 2)!}{(x - 2)!\,(n - x)!} p^{x-2} p^{2} q^{n-x}$
$= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!\,(n - x)!} p^{x-2} q^{n-x}$
$= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!\,((n - 2) - (x - 2))!} p^{x-2} q^{(n-2)-(x-2)}$
$= n(n - 1)p^2 \sum_{x=2}^{n} {}^{(n-2)}C_{(x-2)}\, p^{x-2} q^{(n-2)-(x-2)}$
$= n(n - 1)p^2 (p + q)^{n-2} = n(n - 1)p^2 (1)$
$E(x(x - 1)) = n(n - 1)p^2$

(2) $\Rightarrow E(x^2) = E(x(x - 1)) + E(x)$
$\Rightarrow E(x^2) = n(n - 1)p^2 + np$

(1) $\Rightarrow \sigma^2 = E(x^2) - [E(x)]^2$
$\Rightarrow \sigma^2 = n(n - 1)p^2 + np - [np]^2$
$\Rightarrow \sigma^2 = n^2p^2 - np^2 + np - n^2p^2$
$\Rightarrow \sigma^2 = np - np^2$
$\Rightarrow \sigma^2 = np(1 - p)$
but $1 - p = q$
$\sigma^2 = npq$
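
The results $\mu = np$ and $\sigma^2 = npq$ can be confirmed numerically by summing over the pmf. This is a minimal sketch using `math.comb` from the Python standard library; the values n = 12 and p = 0.1 are illustrative assumptions, and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) = nCx * p^x * q^(n-x) with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_moments(n, p):
    """Mean and variance computed directly from the definition of expectation."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(n, p, x) for x in xs)
    second_moment = sum(x * x * binomial_pmf(n, p, x) for x in xs)
    return mean, second_moment - mean**2

n, p = 12, 0.1  # illustrative values
mean, variance = binomial_moments(n, p)
print(mean, n * p)                # both approximately 1.2
print(variance, n * p * (1 - p))  # both approximately 1.08
```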

PROBLEMS
1) Let X be a binomially distributed random variable based on 6 repetitions of an experiment. If p=0.3,
evaluate the following probabilities i) 𝑷(𝒙 ≤ 𝟑), ii) 𝑷(𝑿 > 𝟒).
Soln: Given p=0.3 and n=6, hence q = 1-p = 1-0.3 = 0.7
and 𝑃(𝑋 = 𝑥) = 𝑏(6,0.3, 𝑥) = 6𝐶𝑥 (0.3)𝑥 (0.7)6−𝑥

i) 𝑃(𝑥 ≤ 3) = 𝑃(0) + 𝑃(1) + 𝑃(2) + 𝑃(3)


= 6𝐶0 (0.3)0 (0.7)6 + 6𝐶1 (0.3)1 (0.7)5 + 6𝐶2 (0.3)2 (0.7)4 + 6𝐶3 (0.3)3 (0.7)3
= 0.1176 + 0.3025 + 0.3241 + 0.1852
= 0.9294

ii) $P(x > 4) = P(5) + P(6)$


= 6𝐶5 (0.3)5 (0.7)1 + 6𝐶6 (0.3)6 (0.7)0
= 0.0102 + 0.0007
= 0.0109
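
The two probabilities in this problem can be recomputed without intermediate rounding. This is a minimal sketch, assuming n = 6 and p = 0.3 as given; the helper name `binomial_pmf` is hypothetical. The unrounded results agree with the hand calculation to about three decimal places.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.3
p_at_most_3 = sum(binomial_pmf(n, p, x) for x in range(0, 4))        # P(x <= 3)
p_more_than_4 = sum(binomial_pmf(n, p, x) for x in range(5, n + 1))  # P(x > 4)
print(p_at_most_3, p_more_than_4)
```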

2) The probability that a pen manufactured by a company will be defective is 0.1. If 12 such pens are
selected at random, find the probability that
i) Exactly 2 pens will be defective
ii) At most 2 pens will be defective
iii) None will be defective

Soln: Let the probability that a manufactured pen is defective be p = 0.1,
then q = 1-p = 1-0.1 = 0.9, and given n = 12.
Hence $P(X = x) = b(12, 0.1, x) = {}^{12}C_{x} (0.1)^{x} (0.9)^{12-x}$

i) The probability that exactly 2 pens will be defective, 𝑃(2) = 12𝐶2 (0.1)2 (0.9)12−2
= (66)(0.01)(0.3487)
= 0.2301

ii) The probability that atmost 2 pens will be defective, 𝑃(𝑥 ≤ 2) = 𝑃(0) + 𝑃(1) + 𝑃(2)
= 12𝐶0 (0.1)0 (0.9)12 + 12𝐶1 (0.1)1 (0.9)11 + 12𝐶2 (0.1)2 (0.9)10
= 0.2824 + 0.3766 + 0.2301
= 0.8891

iii) The probability that none will be defective, 𝑃(0) = 12𝐶0 (0.1)0 (0.9)12
= (1)(1)(0.2824)
= 0.2824

3) The number of telephonic lines busy at an instant is a binomial variant with a probability 0.1. If 10
lines are chosen at random, what is the probability that,
i)No line is busy
ii)All lines are busy
iii)Atleast one line is busy
iv)Atmost two lines are busy

Soln: Let the probability that a telephonic line is busy p=0.1


then q = 1-p = 1-0.1 = 0.9 and number of lines chosen is n = 10
Hence 𝑃(𝑋 = 𝑥) = 𝑏(10,0.1, 𝑥) = 10𝐶𝑥 (0.1)𝑥 (0.9)10−𝑥

i) The probability that no line is busy, 𝑃(0) = 10𝐶0 (0.1)0 (0.9)10


= (1)(1)(0.3487)
= 0.3487

ii) The probability that all lines are busy, 𝑃(10) = 10𝐶10 (0.1)10 (0.9)0
= (1)(10−10 )(1)
= 10−10

iii) The probability that atleast one line is busy, 𝑃(𝑥 ≥ 1) = 1 − 𝑃(0)
= 1 − 10𝐶0 (0.1)0 (0.9)10
= 1 − 0.3487
= 0.6513

iv) The probability that atmost two lines are busy, 𝑃(𝑥 ≤ 2) = 𝑃(0) + 𝑃(1) + 𝑃(2)
= 10𝐶0 (0.1)0 (0.9)10 + 10𝐶1 (0.1)1 (0.9)9 + 10𝐶2 (0.1)2 (0.9)8
= 0.3487 + 0.3874 + 0.1937
= 0.9298

4) When a coin is tossed 4 times, find the probability of getting i)Exactly one head, ii)Atmost three
heads, iii)Atleast two heads.

Soln: The number of times the coin is tossed, n = 4
Let X be the binomial variate counting the number of heads, with probability of a head p = 0.5
then q = 1-p = 1-0.5 = 0.5
Hence 𝑃(𝑋 = 𝑥) = 𝑏(4,0.5, 𝑥) = 4𝐶𝑥 (0.5)𝑥 (0.5)4−𝑥
= 4𝐶𝑥 (0.5)4 = 4𝐶𝑥 (0.0625)

i) The probability of getting exactly one head, 𝑃(1) = 4𝐶1 (0.0625) = 4 × 0.0625 = 0.25

ii) The probability of getting atmost three heads, 𝑃(𝑥 ≤ 3) = 1 − 𝑃(4)


= 1 − 4𝐶4 (0.0625)
= 1 − 0.0625
= 0.9375

iii) The probability of getting at least two heads, $P(x \ge 2) = P(2) + P(3) + P(4)$
= 4𝐶2 (0.0625) + 4𝐶3 (0.0625) + 4𝐶4 (0.0625)
= 0.375 + 0.25 + 0.0625
= 0.6875

5) The probability of germination of a seed in a packet of seeds is found to be 0.7. If 10 seeds are
taken for experimenting on germination in a laboratory, find the probability that
i)8 seeds germinate
ii)Atleast 8 seeds germinate
iii)Atmost 8 seeds germinate

Soln: Let X be the binomial variant of seed germination.


Given the number of seeds taken for experimenting in laboratory, n=10
The probability of germination of a seed in a packet of seeds is, p=0.7
then q = 1-p = 1-0.7 = 0.3
Hence, 𝑃(𝑋 = 𝑥) = 10𝐶𝑥 (0.7)𝑥 (0.3)10−𝑥

i) The probability that exactly 8 seeds germinate, $P(8) = {}^{10}C_{8} (0.7)^{8} (0.3)^{2}$
= (45)(0.057648)(0.09)
= 0.2335

ii) The probability that at least 8 seeds germinate, $P(x \ge 8) = P(8) + P(9) + P(10)$
$= {}^{10}C_{8} (0.7)^{8} (0.3)^{2} + {}^{10}C_{9} (0.7)^{9} (0.3)^{1} + {}^{10}C_{10} (0.7)^{10} (0.3)^{0}$
= 0.2335 + 0.1211 + 0.0282
= 0.3828

iii) The probability that at most 8 seeds germinate, $P(x \le 8) = 1 - \{P(9) + P(10)\}$
$= 1 - \{{}^{10}C_{9} (0.7)^{9} (0.3)^{1} + {}^{10}C_{10} (0.7)^{10} (0.3)^{0}\}$
= 1 − {0.1211 + 0.0282}
= 0.8507

6) A communication channel receives independent pulses at the rate of 12 pulses per micro second.
The probability of transmission error is 0.001 for each micro second. Compute the probability of,
i)No error during a micro second
ii)1 error
iii)Atleast 1 error
iv)2 error
v)Atmost 2 error

Soln: Let X be the binomial variant of Transmission error.


Given the number of pulses per micro second, n=12
Let p be the probability of transmission error, p=0.001
then q = 1-p = 1-0.001 = 0.999

The pmf of binomial distribution is, 𝑃(𝑋 = 𝑥) = 𝑃(𝑥) = 𝑛𝐶𝑥 𝑝 𝑥 𝑞 𝑛−𝑥


= 12𝐶𝑥 (0.001)𝑥 (0.999)12−𝑥

i) The probability of no error during a micro second, 𝑃(0) = 12𝐶0 (0.001)0 (0.999)12
= (1)(1)(0.9880)
= 0.9880

ii) The probability of only one error during a micro second, 𝑃(1) = 12𝐶1 (0.001)1 (0.999)11
= (12)(0.001)(0.9890)
= 0.01186

iii) The probability of atleast one error during a micro second, 𝑃(𝑥 ≥ 1) = 1 − 𝑃(0)
= 1 − 12𝐶0 (0.001)0 (0.999)12
= 1 − 0.9880
= 0.0120

iv) The probability of two error during a micro second, 𝑃(2) = 12𝐶2 (0.001)2 (0.999)10
= (66)(0.000001)(0.9900)
= 0.00006534

v) The probability of at most two errors during a micro second, $P(x \le 2) = P(0) + P(1) + P(2)$
$= {}^{12}C_{0} (0.001)^{0} (0.999)^{12} + {}^{12}C_{1} (0.001)^{1} (0.999)^{11} + {}^{12}C_{2} (0.001)^{2} (0.999)^{10}$
= 0.9880658 + 0.0118687 + 0.0000653
= 0.9999998

7) In 800 families with 5 children each, how many family would be expected to have,
i)3 boys
ii)5 girls
iii)Atmost 2 girls
iv)Either 2 or 3 boys
by assuming probability for boys and girls to be equal.

Soln: The total number of families given is 800 and number of children per family is, n=5
Given the probability of boy or girl to born, p=0.5
then q = 1-p = 1-0.5 = 0.5

The pmf of binomial distribution is, 𝑃(𝑋 = 𝑥) = 𝑃(𝑥) = 𝑛𝐶𝑥 𝑝 𝑥 𝑞 𝑛−𝑥 = 5𝐶𝑥 (0.5)𝑥 (0.5)5−𝑥
= 5𝐶𝑥 (0.5)5 = 5𝐶𝑥 (0.03125)

i) The probability to have exactly 3 boys, 𝑃(3) = 5𝐶3 (0.03125)


= (10)(0.03125)
= 0.3125
The total number of families may have exactly 3 boys = 800 × 0.3125 = 250.

ii) The probability to have exactly 5 girls, 𝑃(5) = 5𝐶5 (0.03125)


= (1)(0.03125)
= 0.03125
The total number of families may have exactly 5 girls, = 800 × 0.03125 = 25.

iii) The probability to have atmost two girls, 𝑃(𝑥 ≤ 2) = 𝑃(0) + 𝑃(1) + 𝑃(2)
= 5𝐶0 (0.03125) + 5𝐶1 (0.03125) + 5𝐶2 (0.03125)
= 0.03125 + 0.15625 + 0.3125
= 0.5
The total number of families may have atmost two girls, = 800 × 0.5 = 400.

iv) The probability to have either 2 or 3 boys, $P(2 \le x \le 3) = P(2) + P(3)$
$= {}^{5}C_{2} (0.03125) + {}^{5}C_{3} (0.03125)$
= 0.3125 + 0.3125
= 0.625
The expected number of families having either 2 or 3 boys = 800 × 0.625 = 500.

Poisson Distribution:
Let X be a discrete random variable and let $\lambda > 0$ be a real number. The probability mass function
of the Poisson distribution is defined as
$P(X = x) = P(x) = \begin{cases} \frac{e^{-\lambda} \lambda^{x}}{x!}, & x = 0, 1, 2, \dots \\ 0, & \text{otherwise} \end{cases}$
where $\lambda$ is called the parameter, and

i) $P(X = x) = P(x) \ge 0$

ii) $\sum_{x=0}^{\infty} P(x) = \sum_{x=0}^{\infty} \frac{e^{-\lambda} \lambda^{x}}{x!} = 1$

iii) Mean $\mu = np = \lambda$

iv) Variance $\sigma^2 = \lambda$, S.D $= \sqrt{\lambda}$

The Poisson distribution can be used to find the probability that an event happens a definite
number of times, based on how often it usually occurs. Companies can use the Poisson
distribution to examine how they may be able to improve their operational efficiency.

MEAN & VARIANCE OF A POISSON DISTRIBUTION:

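WKT, the probability mass function of the Poisson distribution is
$P(X = x) = P(x) = \begin{cases} \frac{e^{-\lambda} \lambda^{x}}{x!}, & x = 0, 1, 2, \dots \\ 0, & \text{otherwise} \end{cases}$

i) Mean:
$\mu = E(x) = \sum_{x=0}^{\infty} x P(x)$
$= \sum_{x=0}^{\infty} x \frac{e^{-\lambda} \lambda^{x}}{x!}$
$= \sum_{x=1}^{\infty} x \frac{e^{-\lambda} \lambda^{x-1} \lambda}{x\,(x - 1)!}$
$= \lambda \sum_{x=1}^{\infty} \frac{e^{-\lambda} \lambda^{x-1}}{(x - 1)!}$
$= \lambda\,(1)$
$\mu = \lambda$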

ii) Variance:
$\sigma^2 = E(x^2) - \mu^2$ ---- (1)
$= E(x(x - 1) + x) - \mu^2$
$= E(x(x - 1)) + E(x) - \mu^2$ ---- (2)

$E(x(x - 1)) = \sum_{x=0}^{\infty} x(x - 1) P(x)$
$= \sum_{x=0}^{\infty} x(x - 1) \frac{e^{-\lambda} \lambda^{x}}{x!}$
$= \sum_{x=2}^{\infty} x(x - 1) \frac{e^{-\lambda} \lambda^{x-2} \lambda^{2}}{x(x - 1)(x - 2)!}$
$= \lambda^2 \sum_{x=2}^{\infty} \frac{e^{-\lambda} \lambda^{x-2}}{(x - 2)!}$
$= \lambda^2 (1)$
$E(x(x - 1)) = \lambda^2$

(2) $\Rightarrow \sigma^2 = \lambda^2 + \lambda - \lambda^2$
$\Rightarrow \sigma^2 = \lambda$
S.D $= \sigma = \sqrt{\lambda}$
Mean $= \lambda = np$
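
That the mean and the variance are both equal to the parameter can be checked by truncating the infinite sums. This is a minimal sketch; the cut-off of 100 terms and the value λ = 3 are illustrative assumptions (the neglected tail is negligible for small λ), and the function names are hypothetical.

```python
import math

def poisson_pmf(lam, x):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

def poisson_moments(lam, terms=100):
    """Mean and variance from the definition, truncating the series at `terms`."""
    xs = range(terms)
    mean = sum(x * poisson_pmf(lam, x) for x in xs)
    second_moment = sum(x * x * poisson_pmf(lam, x) for x in xs)
    return mean, second_moment - mean**2

mean, variance = poisson_moments(3.0)
print(mean, variance)  # both approximately 3
```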

PROBLEMS
1) The number of accidents in a year to taxi drivers in a city follows a poisson distribution with mean
3. Out of 1000 taxi drivers find approximately the number of drivers with,
i)No accident in a year
ii)More than 3 accidents in a year.

Soln: Let X be the Poisson variate denoting the number of accidents to a taxi driver in a year.
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
Given that the mean of the Poisson distribution is $\mu = \lambda = 3$,
$P(X = x) = \frac{e^{-3}\, 3^{x}}{x!}$

i) Number of drivers out of 1000 with no accident in a year = 1000 × P(0)
$= 1000 \times \frac{e^{-3}\, 3^{0}}{0!}$
= 1000 × 0.0498
= 49.8 ≈ 50
Hence about 50 drivers out of 1000 have no accidents in a year.

ii) Number of drivers out of 1000 with more than 3 accidents in a year = 1000 × P(x > 3)
$= 1000 \times [1 - P(x \le 3)]$
$= 1000 \times [1 - P(0) - P(1) - P(2) - P(3)]$
$= 1000 \times \left[1 - \frac{e^{-3}\, 3^{0}}{0!} - \frac{e^{-3}\, 3^{1}}{1!} - \frac{e^{-3}\, 3^{2}}{2!} - \frac{e^{-3}\, 3^{3}}{3!}\right]$
$= 1000 \times \left(1 - \left(e^{-3} + e^{-3}(3) + e^{-3}\left(\frac{9}{2}\right) + e^{-3}\left(\frac{27}{6}\right)\right)\right)$
= 1000 × (1 − 0.6472) = 352.8
≈ 353
Therefore about 353 drivers out of 1000 have more than 3 accidents in the year.
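
The expected counts in this problem are the Poisson probabilities scaled by the number of drivers. This is a minimal sketch, assuming λ = 3 and 1000 drivers as given; the helper name `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(lam, x):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam, drivers = 3, 1000

# i) Expected number of drivers with no accident.
no_accident = drivers * poisson_pmf(lam, 0)

# ii) Expected number with more than 3 accidents: 1 - P(x <= 3).
more_than_3 = drivers * (1 - sum(poisson_pmf(lam, x) for x in range(4)))

print(round(no_accident), round(more_than_3))
```

Using the complement avoids summing the infinite tail P(4) + P(5) + ... directly.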
2) In a certain factory turning out razor blades there is a small probability of $\frac{1}{500}$ for any blade to be
defective. The blades are supplied in packets of 10. Use the Poisson distribution to calculate the
approximate number of packets containing,
i) No defective
ii) 2 defective
iii) 3 defective
in a consignment of 10000 packets.

Soln: Let X be the Poisson variate denoting the number of defective blades in a packet.
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
Given, $p = \frac{1}{500} = 0.002$, n = 10, $\mu = np = 0.002 \times 10 = 0.02 = \lambda$
$P(X = x) = \frac{e^{-0.02} (0.02)^{x}}{x!}$

i) Number of packets with no defective blade out of 10000 packets = 10000 × P(x = 0)
$= 10000 \times \frac{e^{-0.02} (0.02)^{0}}{0!}$
= 10000 × 0.9802
= 9802
About 9802 packets out of 10000 contain no defective blade.

ii) Number of packets with 2 defective blades out of 10000 packets = 10000 × P(x = 2)
$= 10000 \times \frac{e^{-0.02} (0.02)^{2}}{2!}$
= 10000 × 0.000196
= 1.96 ≈ 2
About 2 packets out of 10000 contain 2 defective blades.

iii) Number of packets with 3 defective blades out of 10000 packets = 10000 × P(x = 3)
$= 10000 \times \frac{e^{-0.02} (0.02)^{3}}{3!}$
= 10000 × 0.0000013
= 0.013 ≈ 0
Approximately no packet out of 10000 contains 3 defective blades.

3) If the probability of bad reaction from a certain injection is 0.001, determine the probability that out
of 2000 individuals more than 2 will get a bad reaction.

Soln: Let X be the Poisson variate denoting the number of individuals who get a bad reaction.
WKT, the probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
Given, n = 2000, p = 0.001 and $\mu = np = 2000 \times 0.001 = 2 = \lambda$
$P(X = x) = \frac{e^{-2} (2)^{x}}{x!}$
The probability that more than two individuals get a bad reaction = P(x > 2)
$= 1 - P(x \le 2)$
$= 1 - [P(0) + P(1) + P(2)]$
$= 1 - \frac{e^{-2} (2)^{0}}{0!} - \frac{e^{-2} (2)^{1}}{1!} - \frac{e^{-2} (2)^{2}}{2!}$
$= 1 - \frac{5}{e^{2}} = 0.3233$
4) The probability that a news reader commits no mistakes in reading the news is $\frac{1}{e^{3}}$. Find the probability
that on a particular news broadcast he commits,
i) Only 2 mistakes
ii) More than 3 mistakes
iii) At most 3 mistakes

Soln: Let X be the Poisson variate denoting the number of mistakes committed in a news broadcast.
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
Given, $P(X = 0) = \frac{1}{e^{3}}$
$\Rightarrow \frac{e^{-\lambda} \lambda^{0}}{0!} = \frac{1}{e^{3}} \Rightarrow \frac{1}{e^{\lambda}} = \frac{1}{e^{3}} \Rightarrow \lambda = 3$
$P(X = x) = \frac{e^{-3} (3)^{x}}{x!}$

i) The probability that the news reader commits 2 mistakes = P(2)
$= \frac{e^{-3} (3)^{2}}{2!}$
= 0.2240

ii) The probability that the news reader commits more than 3 mistakes, $P(x > 3) = 1 - P(x \le 3)$
$\Rightarrow P(x > 3) = 1 - [P(0) + P(1) + P(2) + P(3)]$
$\Rightarrow P(x > 3) = 1 - e^{-3}\left[\frac{3^{0}}{0!} + \frac{3^{1}}{1!} + \frac{3^{2}}{2!} + \frac{3^{3}}{3!}\right]$
$\Rightarrow P(x > 3) = 1 - 0.049787\,(1 + 3 + 4.5 + 4.5)$
$\Rightarrow P(x > 3) = 1 - 0.6472$
$\Rightarrow P(x > 3) = 0.3528$

iii) The probability that the news reader commits at most 3 mistakes = $P(x \le 3)$
$\Rightarrow P(x \le 3) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)$
$\Rightarrow P(x \le 3) = e^{-3}\left[\frac{3^{0}}{0!} + \frac{3^{1}}{1!} + \frac{3^{2}}{2!} + \frac{3^{3}}{3!}\right]$
$\Rightarrow P(x \le 3) = (0.049787)(1 + 3 + 4.5 + 4.5)$
$\Rightarrow P(x \le 3) = 0.6472$

5) Suppose 300 misprints are randomly distributed throughout a book of 500 pages, find the
probability that a given page contains,
i)Exactly 3 misprints
ii)Less than 3 misprints
iii)4 or more misprints

Soln: Let X be the Poisson variate denoting the number of misprints on a page.
Given that 300 misprints are randomly distributed throughout a book of 500 pages,
Mean $\lambda = \frac{300}{500} = 0.6$
WKT, the pmf of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!} = \frac{e^{-0.6} (0.6)^{x}}{x!}$

i) The probability of exactly 3 misprints = P(3)
$= \frac{e^{-0.6} (0.6)^{3}}{3!}$
= 0.01976

ii) The probability that there are fewer than three misprints, $P(x < 3) = P(0) + P(1) + P(2)$
$\Rightarrow P(x < 3) = \frac{e^{-0.6} (0.6)^{0}}{0!} + \frac{e^{-0.6} (0.6)^{1}}{1!} + \frac{e^{-0.6} (0.6)^{2}}{2!}$
$\Rightarrow P(x < 3) = e^{-0.6}\,[1 + 0.6 + 0.18]$
$\Rightarrow P(x < 3) = 0.548812 \times 1.78$
$\Rightarrow P(x < 3) = 0.9769$

iii) The probability that there are 4 or more misprints, $P(x \ge 4) = 1 - P(x < 4)$
$\Rightarrow P(x \ge 4) = 1 - P(0) - P(1) - P(2) - P(3)$
$\Rightarrow P(x \ge 4) = 1 - \frac{e^{-0.6} (0.6)^{0}}{0!} - \frac{e^{-0.6} (0.6)^{1}}{1!} - \frac{e^{-0.6} (0.6)^{2}}{2!} - \frac{e^{-0.6} (0.6)^{3}}{3!}$
$\Rightarrow P(x \ge 4) = 1 - e^{-0.6}\,(1 + 0.6 + 0.18 + 0.036)$
$= 1 - 0.548812 \times 1.816 = 0.00336$

6) A certain screw making machine produces an average 2 defective out of 100 and packs of them in
boxes of 500. Find the probability that the box contains,
i)3 defective
ii)Atleast 1 defective
iii)Between 2 & 4 defective
Soln: Given that the machine produces on average 2 defective screws out of 100, $p = \frac{2}{100} = 0.02$
also given, n = 500, $\mu = np = 500 \times 0.02 = 10 = \lambda$

The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
$P(X = x) = \frac{e^{-10} (10)^{x}}{x!}$

i) The probability of exactly 3 defective screws = P(3)
$= \frac{e^{-10} (10)^{3}}{3!}$
= 0.007566

ii) The probability that at least 1 screw is defective = 1 − P(0)
$= 1 - \frac{e^{-10} (10)^{0}}{0!}$
= 1 − 0.0000454
= 0.9999546

iii) The probability that between 2 and 4 screws (inclusive) are defective = $P(2 \le x \le 4)$
$= P(2) + P(3) + P(4)$
$= \frac{e^{-10} (10)^{2}}{2!} + \frac{e^{-10} (10)^{3}}{3!} + \frac{e^{-10} (10)^{4}}{4!}$
$= e^{-10}\left(\frac{100}{2} + \frac{1000}{6} + \frac{10000}{24}\right)$
$= e^{-10} \times 633.33$
= 0.02875

Exponential Distribution:
Let X be a continuous random variable and let $\alpha > 0$ be a real number. The probability density function
of the exponential distribution is defined as
$f(x) = \begin{cases} \alpha e^{-\alpha x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$
It satisfies:

i) $f(x) \ge 0$

ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$

iii) Mean $\mu = \frac{1}{\alpha}$

iv) Variance $\sigma^2 = \frac{1}{\alpha^2}$

v) S.D of the exponential distribution, $\sigma = \frac{1}{\alpha}$
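
Integrating the density from 0 to x gives $P(X \le x) = 1 - e^{-\alpha x}$ for x > 0, which is the form used repeatedly in the problems that follow. This is a minimal sketch of the density and that cumulative probability; the function names and the sample value α = 0.5 are illustrative assumptions.

```python
import math

def exp_pdf(alpha, x):
    """Density alpha * e^(-alpha x) for x > 0, and 0 otherwise."""
    return alpha * math.exp(-alpha * x) if x > 0 else 0.0

def exp_cdf(alpha, x):
    """P(X <= x) = 1 - e^(-alpha x) for x > 0, obtained by integrating the pdf."""
    return 1.0 - math.exp(-alpha * x) if x > 0 else 0.0

def exp_tail(alpha, x):
    """P(X > x), the complement of the cumulative probability."""
    return 1.0 - exp_cdf(alpha, x)

alpha = 0.5  # illustrative rate; the mean is 1 / alpha = 2
print(exp_cdf(alpha, 2), exp_tail(alpha, 2))
```

An interval probability then follows as `exp_cdf(alpha, b) - exp_cdf(alpha, a)`.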

PROBLEMS
1) If X is an Exponential variant with mean 3, then find 𝑷(𝒙 > 𝟏) & 𝑷(𝒙 < 𝟑).

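Soln: Given that X is a continuous random variable following the exponential distribution,
$f(x) = \begin{cases} \alpha e^{-\alpha x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$
and given that the mean of the exponential distribution is 3,
$\Rightarrow \mu = 3 \Rightarrow \frac{1}{\alpha} = 3 \Rightarrow \alpha = \frac{1}{3}$
$\therefore f(x) = \begin{cases} \frac{1}{3} e^{-x/3}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$

i) $P(x > 1) = \int_{1}^{\infty} f(x)\,dx = \frac{1}{3}\int_{1}^{\infty} e^{-x/3}\,dx = -\left[e^{-x/3}\right]_{1}^{\infty} = -\left[e^{-\infty} - e^{-1/3}\right] = -\left[0 - e^{-1/3}\right] = e^{-1/3} \approx 0.7165$

ii) $P(x < 3) = \int_{-\infty}^{3} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{3} f(x)\,dx = 0 + \frac{1}{3}\int_{0}^{3} e^{-x/3}\,dx = -\left[e^{-x/3}\right]_{0}^{3} = -\left[e^{-1} - e^{0}\right] = 1 - e^{-1} \approx 0.6321$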

2) If X is an exponential variant with mean 4, then find 𝑷(𝟎 < 𝒙 < 𝟏), 𝑷(𝒙 > 𝟐) & 𝑷(−∞ < 𝒙 < 𝟏𝟎).

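Soln: Given that X is a continuous random variable following the exponential distribution,
$f(x) = \begin{cases} \alpha e^{-\alpha x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$
and given that the mean of the exponential distribution is 4,
$\Rightarrow \mu = 4 \Rightarrow \frac{1}{\alpha} = 4 \Rightarrow \alpha = \frac{1}{4}$
$\therefore f(x) = \begin{cases} \frac{1}{4} e^{-x/4}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$

$P(0 < x < 1) = \int_{0}^{1} f(x)\,dx = \frac{1}{4}\int_{0}^{1} e^{-x/4}\,dx = -\left[e^{-x/4}\right]_{0}^{1} = -\left[e^{-1/4} - e^{0}\right] = 1 - \frac{1}{e^{1/4}} \approx 0.2212$

$P(x > 2) = \int_{2}^{\infty} f(x)\,dx = \frac{1}{4}\int_{2}^{\infty} e^{-x/4}\,dx = -\left[e^{-x/4}\right]_{2}^{\infty} = -\left[e^{-\infty} - e^{-2/4}\right] = e^{-1/2} \approx 0.6065$

$P(-\infty < x < 10) = \int_{-\infty}^{10} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{10} f(x)\,dx = 0 + \frac{1}{4}\int_{0}^{10} e^{-x/4}\,dx$
$\Rightarrow P(-\infty < x < 10) = -\left[e^{-x/4}\right]_{0}^{10} = -\left[e^{-10/4} - e^{0}\right] = 1 - \frac{1}{e^{5/2}} \approx 0.9179$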

3) In a certain town the duration of a shower is exponentially distributed with mean 5 minutes. What is the probability that a shower will last for
i) 10 minutes or more
ii) Less than 10 minutes
iii) Between 10 and 12 minutes?

Soln: X is a continuous random variable following the exponential distribution, with density

$$f(x) = \begin{cases} \alpha e^{-\alpha x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

The mean of the distribution is given as 5.

$$\Rightarrow \mu = 5 \Rightarrow \frac{1}{\alpha} = 5 \Rightarrow \alpha = \frac{1}{5}$$

$$\therefore f(x) = \begin{cases} \frac{1}{5} e^{-x/5}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

i) The probability that the shower will last 10 minutes or more is

$$P(x \ge 10) = \int_{10}^{\infty} f(x)\,dx = \frac{1}{5}\int_{10}^{\infty} e^{-x/5}\,dx = -\left[e^{-x/5}\right]_{10}^{\infty} = -\left[0 - e^{-2}\right] = \frac{1}{e^2}$$

ii) The probability that the shower will last less than 10 minutes is

$$P(x < 10) = \int_{-\infty}^{0} f(x)\,dx + \int_0^{10} f(x)\,dx = 0 + \frac{1}{5}\int_0^{10} e^{-x/5}\,dx = -\left[e^{-x/5}\right]_0^{10} = -\left[e^{-2} - 1\right] = 1 - \frac{1}{e^2}$$

iii) The probability that the shower will last between 10 and 12 minutes is

$$P(10 < x < 12) = \int_{10}^{12} f(x)\,dx = \frac{1}{5}\int_{10}^{12} e^{-x/5}\,dx = -\left[e^{-x/5}\right]_{10}^{12} = -\left[e^{-12/5} - e^{-2}\right] = \frac{1}{e^2} - \frac{1}{e^{12/5}}$$

4) The life of a TV tube manufactured by a company is known to have mean 200 months. Assuming
that the life of tube has an exponential distribution, find the probability that the life of a tube
manufactured by a company is,
i)Less than 200 months
ii)Between 100 & 300 months
iii)More than 200 months

Soln: X is a continuous random variable following the exponential distribution, with density

$$f(x) = \begin{cases} \alpha e^{-\alpha x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

The mean of the distribution is given as 200 months.

$$\Rightarrow \mu = 200 \Rightarrow \frac{1}{\alpha} = 200 \Rightarrow \alpha = \frac{1}{200}$$

$$\therefore f(x) = \begin{cases} \frac{1}{200} e^{-x/200}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

i) The probability that the life of a tube is less than 200 months is

$$P(x < 200) = \int_{-\infty}^{200} f(x)\,dx = \frac{1}{200}\int_0^{200} e^{-x/200}\,dx = -\left[e^{-x/200}\right]_0^{200} = -\left[e^{-1} - e^{0}\right] = 1 - \frac{1}{e}$$

ii) The probability that the life of a tube is between 100 and 300 months is

$$P(100 \le x \le 300) = \int_{100}^{300} f(x)\,dx = \frac{1}{200}\int_{100}^{300} e^{-x/200}\,dx = -\left[e^{-x/200}\right]_{100}^{300} = -\left[e^{-3/2} - e^{-1/2}\right] = \frac{1}{e^{1/2}} - \frac{1}{e^{3/2}}$$

iii) The probability that the life of a tube is more than 200 months is

$$P(x > 200) = \int_{200}^{\infty} f(x)\,dx = \frac{1}{200}\int_{200}^{\infty} e^{-x/200}\,dx = -\left[e^{-x/200}\right]_{200}^{\infty} = -\left[0 - e^{-1}\right] = e^{-1} = \frac{1}{e}$$

5) The length of a telephone conversation is an exponential variant with mean 3 minutes. Find the
probability that a call,
i)ends in less than 3 minutes
ii)ends between 3 & 5 minutes
iii)ends in more than 4 minutes

Soln: X is a continuous random variable following the exponential distribution, with density

$$f(x) = \begin{cases} \alpha e^{-\alpha x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

The mean of the distribution is given as 3 minutes.

$$\Rightarrow \mu = 3 \Rightarrow \frac{1}{\alpha} = 3 \Rightarrow \alpha = \frac{1}{3}$$

$$\therefore f(x) = \begin{cases} \frac{1}{3} e^{-x/3}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

i) The probability that the conversation ends in less than 3 minutes is

$$P(x < 3) = \int_{-\infty}^{3} f(x)\,dx = \frac{1}{3}\int_0^3 e^{-x/3}\,dx = -\left[e^{-x/3}\right]_0^3 = -\left[e^{-1} - e^{0}\right] = 1 - \frac{1}{e}$$

ii) The probability that the conversation ends between 3 and 5 minutes is

$$P(3 \le x \le 5) = \int_3^5 f(x)\,dx = \frac{1}{3}\int_3^5 e^{-x/3}\,dx = -\left[e^{-x/3}\right]_3^5 = -\left[e^{-5/3} - e^{-1}\right] = \frac{1}{e} - \frac{1}{e^{5/3}}$$

iii) The probability that the conversation ends in more than 4 minutes is

$$P(x > 4) = \int_4^{\infty} f(x)\,dx = \frac{1}{3}\int_4^{\infty} e^{-x/3}\,dx = -\left[e^{-x/3}\right]_4^{\infty} = -\left[0 - e^{-4/3}\right] = e^{-4/3} = \frac{1}{e^{4/3}}$$

Normal Distribution:
Let X be a continuous random variable. For any real $\mu$ and $\sigma^2 > 0$, the normal distribution is defined by the density

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$

where $-\infty < x < \infty$ and $-\infty < \mu < \infty$. Here $\mu$ and $\sigma^2$ are called the mean and variance of the normal distribution. The normal distribution is widely used in statistical inference, hypothesis testing and data analysis, in particular when continuous data are equally likely to fall above or below their average value. It is also known as the Gaussian distribution or the probability bell curve. It is symmetric about the mean, so values near the mean occur more frequently than values far from the mean.

Probabilities for the normal distribution are computed as

$$P(a \le x \le b) = \int_a^b f(x)\,dx = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx$$

Let $z = \frac{x-\mu}{\sigma}$, so that $dx = \sigma\,dz$, and let $z_1 = \frac{a-\mu}{\sigma}$, $z_2 = \frac{b-\mu}{\sigma}$. Then

$$P(a \le x \le b) = P(z_1 \le z \le z_2) = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{z^2}{2}}\,dz$$

Here $F(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$ is called the standard normal function and $z = \frac{x-\mu}{\sigma}$ is called the standard normal variate.

When $z_1 = 0$ and $z_2 = z$, the area under the normal curve from 0 to $z$ is defined as

$$A(z) = \varphi(z) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac{z^2}{2}}\,dz$$

These values are taken from the area table of the normal distribution.
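
When an area table is not at hand, A(z) can be evaluated through the error function, since $A(z) = \frac{1}{2}\operatorname{erf}(z/\sqrt{2})$. The sketch below is one way to do this with the Python standard library; the function names and the sample values (mean 70, S.D. 5) are illustrative assumptions.

```python
import math

def area_0_to_z(z):
    """A(z): signed area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2))

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X normal with mean mu and standard deviation sigma."""
    z1 = (a - mu) / sigma
    z2 = (b - mu) / sigma
    # erf is odd, so a negative z gives a negative signed area and the
    # difference A(z2) - A(z1) covers every case in one formula.
    return area_0_to_z(z2) - area_0_to_z(z1)

a_of_1 = area_0_to_z(1.0)                   # table value is 0.3413
p_65_75 = normal_prob(65, 75, mu=70, sigma=5)

print(round(a_of_1, 4), round(p_65_75, 4))
```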

Sl. No.   Probability range                                              Result
1         P(−∞ < z < ∞)                                                  1
2         P(−∞ < z < 0) = P(0 < z < ∞)                                   0.5
3         P(−z₁ < z < z₁) = 2P(0 < z < z₁)                               2A(z₁)
4         P(−∞ < z < z₁) = 0.5 + P(0 < z < z₁)                           0.5 + A(z₁)
5         P(z₁ < z < ∞) = P(0 < z < ∞) − P(0 < z < z₁)                   0.5 − A(z₁)
6         P(z₁ < z < z₂) = P(0 < z < z₂) − P(0 < z < z₁)                 A(z₂) − A(z₁)
7         P(−z₁ < z < z₂) = P(0 < z < z₂) + P(0 < z < z₁)                A(z₂) + A(z₁)

(In each row z₁ and z₂ are positive. The Graph column of the original table contained figures that are not reproduced here.)

PROBLEMS

1) The marks of 1000 students in an examination follow a normal distribution with mean 70 and standard deviation 5. Find the number of students whose marks will be
i) Less than 65
ii) More than 75
iii) Between 65 and 75. [A(1) = 0.3413]
Sol.
Let X be the continuous random variable.
Given:
Mean of the normal distribution $\mu = 70$
Standard deviation of the normal distribution $\sigma = 5$
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma} \Rightarrow z = \frac{x-70}{5}$

When $x = 65$, $z = \frac{65-70}{5} = -1$
When $x = 75$, $z = \frac{75-70}{5} = 1$

i) Probability that a student scored less than 65 marks:
$P(x < 65) = P(z < -1) = P(z > 1) = 0.5 - A(1) = 0.5 - 0.3413 = 0.1587$
No. of students who scored less than 65 marks out of 1000 = 1000 × 0.1587 = 158.7 ≈ 159

ii) Probability that a student scored more than 75 marks:
$P(x > 75) = P(z > 1) = 0.5 - A(1) = 0.5 - 0.3413 = 0.1587$
No. of students who scored more than 75 marks out of 1000 = 1000 × 0.1587 = 158.7 ≈ 159

iii) Probability that a student scored between 65 and 75 marks:
$P(65 < x < 75) = P(-1 < z < 1) = 2P(0 < z < 1) = 2A(1) = 2 \times 0.3413 = 0.6826$
No. of students who scored between 65 and 75 marks out of 1000 = 1000 × 0.6826 = 682.6 ≈ 683

2) 200 students appeared in an examination. The distribution of marks is assumed to be normal with mean 30 and standard deviation 6.25. How many students are expected to get marks
i) Between 20 and 40
ii) Less than 35 [A(1.6) = 0.4452, A(0.8) = 0.2881]

Sol.
Let X be the continuous random variable.
Given:
Mean of the normal distribution $\mu = 30$
Standard deviation of the normal distribution $\sigma = 6.25$
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma} \Rightarrow z = \frac{x-30}{6.25}$

When $x = 20$, $z = \frac{20-30}{6.25} = -1.6$
When $x = 40$, $z = \frac{40-30}{6.25} = 1.6$
When $x = 35$, $z = \frac{35-30}{6.25} = 0.8$

i) The probability that a student scores between 20 and 40 marks:
$P(20 < x < 40) = P(-1.6 < z < 1.6) = 2 \times P(0 < z < 1.6) = 2 \times A(1.6) = 2 \times 0.4452 = 0.8904$

The number of students expected to score between 20 and 40 marks out of 200:
= 200 × 0.8904 = 178.08 ≈ 178

ii) The probability that a student scores less than 35 marks:
$P(x < 35) = P(z < 0.8) = P(-\infty < z < 0) + P(0 < z < 0.8) = 0.5 + A(0.8) = 0.5 + 0.2881 = 0.7881$

The number of students expected to score less than 35 marks out of 200:
= 200 × 0.7881 = 157.62 ≈ 158

3) The weekly wages of workers in a company are normally distributed with a mean of Rs.700 and S.D. of Rs.50. Find the probability that the weekly wage of a randomly chosen worker is i) Between Rs.650 and Rs.750 ii) More than Rs.750.
Sol.
Let X be the continuous random variable.
Given:
Mean of the normal distribution $\mu = 700$
Standard deviation of the normal distribution $\sigma = 50$
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma} \Rightarrow z = \frac{x-700}{50}$

When $x = 650$, $z = \frac{650-700}{50} = -1$
When $x = 750$, $z = \frac{750-700}{50} = 1$

i) The probability that the weekly wage is between Rs.650 and Rs.750 is:
$P(650 < x < 750) = P(-1 < z < 1) = 2 \times P(0 < z < 1) = 2 \times A(1) = 2 \times 0.3413 = 0.6826$

ii) The probability that the weekly wage is more than Rs.750 is:
$P(x > 750) = P(z > 1) = 0.5 - P(0 < z < 1) = 0.5 - A(1) = 0.5 - 0.3413 = 0.1587$

4) In a test on 2000 electric bulbs, it was found that the life of a particular make was normally distributed with an average life of 2040 hours and S.D. of 60 hours. Estimate the number of bulbs likely to burn for
i) More than 2150 hours
ii) Less than 1950 hours
iii) Between 1920 and 2160 hours.

Sol.
Let X be the continuous random variable.
Given:
Mean of the normal distribution $\mu = 2040$
Standard deviation of the normal distribution $\sigma = 60$
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma} \Rightarrow z = \frac{x-2040}{60}$

When $x = 2150$, $z = \frac{2150-2040}{60} = 1.83$
When $x = 1950$, $z = \frac{1950-2040}{60} = -1.5$
When $x = 1920$, $z = \frac{1920-2040}{60} = -2$
When $x = 2160$, $z = \frac{2160-2040}{60} = 2$

i) The probability that a bulb burns for more than 2150 hours:
$P(x > 2150) = P(z > 1.83) = 0.5 - A(1.83) = 0.5 - 0.4664 = 0.0336$
The number of bulbs likely to burn for more than 2150 hours out of 2000 bulbs:
= 2000 × 0.0336 = 67.2 ≈ 67

ii) The probability that a bulb burns for less than 1950 hours:
$P(x < 1950) = P(z < -1.5) = P(z > 1.5) = 0.5 - A(1.5) = 0.5 - 0.4332 = 0.0668$
The number of bulbs likely to burn for less than 1950 hours out of 2000 bulbs:
= 2000 × 0.0668 = 133.6 ≈ 134

iii) The probability that a bulb burns for between 1920 and 2160 hours:
$P(1920 < x < 2160) = P(-2 < z < 2) = 2P(0 < z < 2) = 2A(2) = 2 \times 0.4772 = 0.9544$
The number of bulbs likely to burn for between 1920 and 2160 hours out of 2000 bulbs:
= 2000 × 0.9544 = 1908.8 ≈ 1909

5) The lifetime of a certain type of electric bulb of a particular brand is normally distributed with an average life of 2000 hours and S.D. 60 hours. If a firm purchases 2500 bulbs, find the number of bulbs that are likely to last for
(i) more than 2100 hours
(ii) less than 1950 hours
(iii) between 1900 and 2100 hours.
Sol.
Let X be the continuous random variable.
Given:
Mean of the normal distribution $\mu = 2000$
Standard deviation of the normal distribution $\sigma = 60$
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma} \Rightarrow z = \frac{x-2000}{60}$

When $x = 1950$, $z = \frac{1950-2000}{60} = -0.83$
When $x = 1900$, $z = \frac{1900-2000}{60} = -1.66$
When $x = 2100$, $z = \frac{2100-2000}{60} = 1.66$

i) The probability that a bulb lasts for more than 2100 hours:
$P(x > 2100) = P(z > 1.66) = 0.5 - A(1.66) = 0.5 - 0.4515 = 0.0485$
The number of bulbs likely to last for more than 2100 hours out of 2500 bulbs:
= 2500 × 0.0485 = 121.25 ≈ 121

ii) The probability that a bulb lasts for less than 1950 hours:
$P(x < 1950) = P(z < -0.83) = 0.5 - A(0.83) = 0.5 - 0.2967 = 0.2033$
The number of bulbs likely to last for less than 1950 hours out of 2500 bulbs:
= 2500 × 0.2033 = 508.25 ≈ 508

iii) The probability that a bulb lasts for between 1900 and 2100 hours:
$P(1900 < x < 2100) = P(-1.66 < z < 1.66) = 2P(0 < z < 1.66) = 2A(1.66) = 2 \times 0.4515 = 0.9030$
The number of bulbs likely to last for between 1900 and 2100 hours out of 2500 bulbs:
= 2500 × 0.9030 = 2257.5 ≈ 2258

6) In a normal distribution, 7% of the items are under 35 and 89% of the items are under 63. Find the mean and standard deviation of the distribution.
Sol.
Let X be the continuous random variable, and let $\mu$ and $\sigma$ be the mean and standard deviation of the distribution.
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma}$ --------------(1)
When $x = 35$, $z = \frac{35-\mu}{\sigma} = z_1$ (say)
When $x = 63$, $z = \frac{63-\mu}{\sigma} = z_2$ (say)

Given $P(x < 35) = P(z < z_1) = 0.07$. Since this probability is less than 0.5, $z_1$ is negative.
$\Rightarrow P(z < z_1) = P(-\infty < z < 0) - P(0 < z < |z_1|) = 0.07$
$\Rightarrow 0.5 - A(|z_1|) = 0.07$
$\Rightarrow A(|z_1|) = 0.5 - 0.07 = 0.43$
$\Rightarrow A(|z_1|) = A(1.48)$ (nearest table value)
$\Rightarrow z_1 = -1.48$
$\Rightarrow \frac{35-\mu}{\sigma} = -1.48$
$\Rightarrow \mu - 1.48\sigma = 35$ ------(2)

And $P(x < 63) = P(z < z_2) = 0.89$
$\Rightarrow P(z < z_2) = P(-\infty < z < 0) + P(0 < z < z_2) = 0.89$
$\Rightarrow 0.5 + A(z_2) = 0.89$
$\Rightarrow A(z_2) = 0.89 - 0.5 = 0.39$
$\Rightarrow A(z_2) = A(1.23)$
$\Rightarrow z_2 = 1.23$
$\Rightarrow \frac{63-\mu}{\sigma} = 1.23$
$\Rightarrow \mu + 1.23\sigma = 63$ --------(3)

Subtracting eq (2) from eq (3) gives $2.71\sigma = 28$, so
$\sigma = 10.332$
$\mu = 50.2915$
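
The table lookup and the pair of linear equations can be checked with an inverse normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `mean_sd_from_percentiles` is an illustrative assumption. Because it uses unrounded z values instead of two-decimal table entries, its output differs slightly from the hand calculation.

```python
from statistics import NormalDist

def mean_sd_from_percentiles(x1, p1, x2, p2):
    """Solve (x1 - mu)/sigma = z1 and (x2 - mu)/sigma = z2,
    where P(X < x1) = p1 and P(X < x2) = p2."""
    std = NormalDist()            # standard normal: mean 0, S.D. 1
    z1 = std.inv_cdf(p1)          # z value with area p1 to its left
    z2 = std.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1) # subtract the two equations
    mu = x1 - z1 * sigma          # back-substitute
    return mu, sigma

mu, sigma = mean_sd_from_percentiles(35, 0.07, 63, 0.89)
print(round(mu, 2), round(sigma, 2))
```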

7) In a normal distribution, 31% of the items are under 45 and 8% of the items are over 64. Find the mean and standard deviation of the distribution.
Sol.
Let X be the continuous random variable, and let $\mu$ and $\sigma$ be the mean and standard deviation of the distribution.
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma}$ --------------(1)
When $x = 45$, $z = \frac{45-\mu}{\sigma} = z_1$ (say)
When $x = 64$, $z = \frac{64-\mu}{\sigma} = z_2$ (say)

Given $P(x < 45) = P(z < z_1) = 0.31$. Since this probability is less than 0.5, $z_1$ is negative.
$\Rightarrow P(z < z_1) = P(-\infty < z < 0) - P(0 < z < |z_1|) = 0.31$
$\Rightarrow 0.5 - A(|z_1|) = 0.31$
$\Rightarrow A(|z_1|) = 0.5 - 0.31 = 0.19$
$\Rightarrow A(|z_1|) = A(0.5)$
$\Rightarrow z_1 = -0.5$
$\Rightarrow \frac{45-\mu}{\sigma} = -0.5$
$\Rightarrow \mu - 0.5\sigma = 45$ ------(2)

And $P(x > 64) = P(z > z_2) = 0.08$
$\Rightarrow P(z > z_2) = 0.5 - P(0 < z < z_2) = 0.08$
$\Rightarrow 0.5 - A(z_2) = 0.08$
$\Rightarrow A(z_2) = 0.5 - 0.08 = 0.42$
$\Rightarrow A(z_2) = A(1.4)$
$\Rightarrow z_2 = 1.4$
$\Rightarrow \frac{64-\mu}{\sigma} = 1.4$
$\Rightarrow \mu + 1.4\sigma = 64$ --------(3)

Subtracting eq (2) from eq (3) gives $1.9\sigma = 19$, so
$\sigma = 10$
$\mu = 50$

8) In an examination 7% of the students scored less than 35% of the marks and 89% of the students scored less than 60% of the marks. Find the mean and standard deviation if the marks are normally distributed.
Sol.
Let X be the continuous random variable, and let $\mu$ and $\sigma$ be the mean and standard deviation of the distribution.
∴ The standard normal variate is $z = \frac{x-\mu}{\sigma}$ --------------(1)
When $x = 35$, $z = \frac{35-\mu}{\sigma} = z_1$ (say)
When $x = 60$, $z = \frac{60-\mu}{\sigma} = z_2$ (say)

Given $P(x < 35) = P(z < z_1) = 0.07$. Since this probability is less than 0.5, $z_1$ is negative.
$\Rightarrow P(z < z_1) = P(-\infty < z < 0) - P(0 < z < |z_1|) = 0.07$
$\Rightarrow 0.5 - A(|z_1|) = 0.07$
$\Rightarrow A(|z_1|) = 0.5 - 0.07 = 0.43$
$\Rightarrow A(|z_1|) = A(1.48)$ (nearest table value)
$\Rightarrow z_1 = -1.48$
$\Rightarrow \frac{35-\mu}{\sigma} = -1.48$
$\Rightarrow \mu - 1.48\sigma = 35$ ------(2)

And $P(x < 60) = P(z < z_2) = 0.89$
$\Rightarrow P(z < z_2) = P(-\infty < z < 0) + P(0 < z < z_2) = 0.89$
$\Rightarrow 0.5 + A(z_2) = 0.89$
$\Rightarrow A(z_2) = 0.89 - 0.5 = 0.39$
$\Rightarrow A(z_2) = A(1.23)$
$\Rightarrow z_2 = 1.23$
$\Rightarrow \frac{60-\mu}{\sigma} = 1.23$
$\Rightarrow \mu + 1.23\sigma = 60$ --------(3)

Subtracting eq (2) from eq (3) gives $2.71\sigma = 25$, so
$\sigma \approx 9.23$
$\mu \approx 48.65$

*****


||Jai Sri Gurudev||


S.J.C. INSTITUTE OF TECHNOLOGY, CHICKBALLAPUR
DEPARTMENT OF MATHEMATICS
MATHEMATICS-3 FOR COMPUTER SCIENCE STREAM (BCS301)
MODULE - 2
JOINT DISTRIBUTION, STOCHASTIC PROCESS & MARKOV CHAIN

Prepared by:
Purushotham P
Assistant Professor
SJC Institute of Technology
Email id: [email protected]

Joint Probability:
Let $X = \{x_1, x_2, x_3, \ldots, x_m\}$ and $Y = \{y_1, y_2, y_3, \ldots, y_n\}$ be two discrete random variables. Then the joint probability function of X and Y is defined as

$$P(X = x_i, Y = y_j) = P(x_i, y_j) = f(x_i, y_j) = p_{ij} = f_{ij}$$

where the function $f(x, y)$ satisfies the conditions
i) $f(x, y) \ge 0$
ii) $\sum_i \sum_j f(x_i, y_j) = 1$

The joint probability table is shown below:

X \ Y     y1      y2      y3      .....   yn      f(xi)
x1        p11     p12     p13     .....   p1n     f(x1)
x2        p21     p22     p23     .....   p2n     f(x2)
x3        p31     p32     p33     .....   p3n     f(x3)
.         .       .       .       .....   .       .
xm        pm1     pm2     pm3     .....   pmn     f(xm)
g(yj)     g(y1)   g(y2)   g(y3)   .....   g(yn)   1

Marginal Probability Distributions:
In the joint probability table, $f(x_1), f(x_2), f(x_3), \ldots, f(x_m)$ and $g(y_1), g(y_2), g(y_3), \ldots, g(y_n)$ are called the marginal probability distributions of X and Y respectively. Each $f(x_i)$ is the sum of the entries in the i-th row and each $g(y_j)$ is the sum of the entries in the j-th column.

Independent Random Variables:

The discrete random variables X and Y are said to be independent if

$$P(X = x_i, Y = y_j) = P(X = x_i) \cdot P(Y = y_j) = f(x_i) \cdot g(y_j) \quad \text{for every } i, j.$$

If X and Y are independent then COV(X,Y) = 0. The converse does not hold in general: a zero covariance alone does not guarantee independence.

Expectation, Variance & Covariance:


Let X be a random variable taking the values $x_1, x_2, x_3, \ldots, x_m$ with marginal probability function $f(x)$, and let Y be a random variable taking the values $y_1, y_2, y_3, \ldots, y_n$ with marginal probability function $g(y)$. Then,

a) The expectation of X is denoted by E(X) and is defined as $\mu_X = E(X) = \sum_{i=1}^{m} x_i f(x_i)$

b) The expectation of Y is denoted by E(Y) and is defined as $\mu_Y = E(Y) = \sum_{j=1}^{n} y_j g(y_j)$

c) The variance of X is denoted by $\sigma_X^2$ and is defined as $\sigma_X^2 = E(X^2) - [E(X)]^2$

$$\Rightarrow \sigma_X^2 = \sum_{i=1}^{m} x_i^2 f(x_i) - \mu_X^2$$

d) The variance of Y is denoted by $\sigma_Y^2$ and is defined as $\sigma_Y^2 = E(Y^2) - [E(Y)]^2$

$$\Rightarrow \sigma_Y^2 = \sum_{j=1}^{n} y_j^2 g(y_j) - \mu_Y^2$$

e) The covariance of X and Y is denoted by COV(X,Y) and is defined as $COV(X, Y) = E(XY) - E(X) \cdot E(Y)$

$$COV(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i y_j f(x_i, y_j) - \mu_X \cdot \mu_Y$$

f) The correlation between X and Y is $\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y}$
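
The definitions above translate directly into a short routine. The following is a minimal sketch that takes a joint probability table and returns the marginals, expectations, covariance and correlation; the function name `joint_summary` and the small 2×3 table used to exercise it are illustrative assumptions. Exact fractions are used so that sums such as 1/8 + 1/4 are not affected by rounding.

```python
from fractions import Fraction as F
import math

def joint_summary(xs, ys, p):
    """p[i][j] = P(X = xs[i], Y = ys[j]). Returns a dict of summary values."""
    f = [sum(row) for row in p]                  # marginal of X: row sums
    g = [sum(col) for col in zip(*p)]            # marginal of Y: column sums
    ex = sum(x * fx for x, fx in zip(xs, f))
    ey = sum(y * gy for y, gy in zip(ys, g))
    exy = sum(xs[i] * ys[j] * p[i][j]
              for i in range(len(xs)) for j in range(len(ys)))
    var_x = sum(x * x * fx for x, fx in zip(xs, f)) - ex ** 2
    var_y = sum(y * y * gy for y, gy in zip(ys, g)) - ey ** 2
    cov = exy - ex * ey
    rho = float(cov) / math.sqrt(float(var_x) * float(var_y))
    return {"f": f, "g": g, "EX": ex, "EY": ey, "EXY": exy,
            "var_x": var_x, "var_y": var_y, "cov": cov, "rho": rho}

# Illustrative table: X takes 1, 5 and Y takes -4, 2, 7.
summary = joint_summary(
    xs=[1, 5],
    ys=[-4, 2, 7],
    p=[[F(1, 8), F(1, 4), F(1, 8)],
       [F(1, 4), F(1, 8), F(1, 8)]],
)
print(summary["EX"], summary["EY"], summary["cov"], round(summary["rho"], 4))
```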

PROBLEMS

1) The joint distribution of two random variables X and Y is as follows:

X \ Y    -4     2      7
1        1/8    1/4    1/8
5        1/4    1/8    1/8

Compute the following:
i) E(X) and E(Y)
ii) E(XY)
iii) $\sigma_X$ and $\sigma_Y$
iv) $\rho(X, Y)$

Soln: Given,
$x_1 = 1, x_2 = 5, y_1 = -4, y_2 = 2, y_3 = 7$
and the probabilities are
$p_{11} = \frac{1}{8}, p_{12} = \frac{1}{4}, p_{13} = \frac{1}{8}, p_{21} = \frac{1}{4}, p_{22} = \frac{1}{8}, p_{23} = \frac{1}{8}$

The joint probability distribution with its marginal totals is:

X \ Y    -4     2      7      f(xi)
1        1/8    1/4    1/8    1/2
5        1/4    1/8    1/8    1/2
g(yj)    3/8    3/8    1/4    1

The marginal distributions of X and Y are:

xi       1      5
f(xi)    1/2    1/2

yj       -4     2      7
g(yj)    3/8    3/8    1/4

i) $\mu_X = E(X) = \sum_{i=1}^{2} x_i f(x_i) = \left(1 \times \frac{1}{2}\right) + \left(5 \times \frac{1}{2}\right) = 3$

$\mu_Y = E(Y) = \sum_{j=1}^{3} y_j g(y_j) = \left(-4 \times \frac{3}{8}\right) + \left(2 \times \frac{3}{8}\right) + \left(7 \times \frac{1}{4}\right) = 1$

ii) $E(XY) = \sum_{i=1}^{2} \sum_{j=1}^{3} x_i y_j f(x_i, y_j)$
$= \left(1 \times (-4) \times \frac{1}{8}\right) + \left(1 \times 2 \times \frac{1}{4}\right) + \left(1 \times 7 \times \frac{1}{8}\right) + \left(5 \times (-4) \times \frac{1}{4}\right) + \left(5 \times 2 \times \frac{1}{8}\right) + \left(5 \times 7 \times \frac{1}{8}\right) = \frac{3}{2}$

iii) $\sigma_X^2 = E(X^2) - \mu_X^2 = \sum_{i=1}^{2} x_i^2 f(x_i) - \mu_X^2 = \left(1^2 \times \frac{1}{2}\right) + \left(5^2 \times \frac{1}{2}\right) - 9 = 13 - 9 = 4 \Rightarrow \sigma_X = 2$

$\sigma_Y^2 = E(Y^2) - \mu_Y^2 = \sum_{j=1}^{3} y_j^2 g(y_j) - \mu_Y^2 = \left((-4)^2 \times \frac{3}{8}\right) + \left(2^2 \times \frac{3}{8}\right) + \left(7^2 \times \frac{1}{4}\right) - 1^2 = \frac{75}{4} \Rightarrow \sigma_Y = 4.33$

iv) $COV(X, Y) = E(XY) - \mu_X \mu_Y = \frac{3}{2} - (3)(1) = -\frac{3}{2}$

$\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{-3/2}{2 \times 4.33} = -0.1732$

Since the covariance is not zero, the given random variables are not independent.

2) The joint distribution of two random variables X and Y is as follows:

X \ Y    -2     -1     4      5
1        0.1    0.2    0      0.3
2        0.2    0.1    0.1    0

Find the marginal distributions of X and Y. Also find the covariance of X and Y.

Soln: Given,
$x_1 = 1, x_2 = 2, y_1 = -2, y_2 = -1, y_3 = 4, y_4 = 5$
and the probabilities are
$p_{11} = 0.1, p_{12} = 0.2, p_{13} = 0, p_{14} = 0.3, p_{21} = 0.2, p_{22} = 0.1, p_{23} = 0.1, p_{24} = 0$

The joint probability distribution with its marginal totals is:

X \ Y    -2     -1     4      5      f(xi)
1        0.1    0.2    0      0.3    0.6
2        0.2    0.1    0.1    0      0.4
g(yj)    0.3    0.3    0.1    0.3    1

The marginal distributions of X and Y are:

xi       1      2
f(xi)    0.6    0.4

yj       -2     -1     4      5
g(yj)    0.3    0.3    0.1    0.3

i) $\mu_X = E(X) = \sum_{i=1}^{2} x_i f(x_i) = (1 \times 0.6) + (2 \times 0.4) = 1.4$

$\mu_Y = E(Y) = \sum_{j=1}^{4} y_j g(y_j) = (-2 \times 0.3) + (-1 \times 0.3) + (4 \times 0.1) + (5 \times 0.3) = 1$

ii) $E(XY) = \sum_{i=1}^{2} \sum_{j=1}^{4} x_i y_j f(x_i, y_j)$
$= (1)(-2)(0.1) + (1)(-1)(0.2) + (1)(4)(0) + (1)(5)(0.3) + (2)(-2)(0.2) + (2)(-1)(0.1) + (2)(4)(0.1) + (2)(5)(0)$
$= -0.2 - 0.2 + 0 + 1.5 - 0.8 - 0.2 + 0.8 + 0 = 2.3 - 1.4 = 0.9$

iii) $\sigma_X^2 = E(X^2) - \mu_X^2 = \sum_{i=1}^{2} x_i^2 f(x_i) - \mu_X^2 = (1^2 \times 0.6) + (2^2 \times 0.4) - (1.4)^2 = 2.2 - 1.96 = 0.24 \Rightarrow \sigma_X = 0.4899$

$\sigma_Y^2 = E(Y^2) - \mu_Y^2 = \sum_{j=1}^{4} y_j^2 g(y_j) - \mu_Y^2 = ((-2)^2 \times 0.3) + ((-1)^2 \times 0.3) + (4^2 \times 0.1) + (5^2 \times 0.3) - 1^2 = 10.6 - 1 = 9.6 \Rightarrow \sigma_Y = 3.0984$

iv) $COV(X, Y) = E(XY) - \mu_X \mu_Y = 0.9 - (1.4)(1) = -0.5$

v) $\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{-0.5}{0.4899 \times 3.0984} = -0.3294$

Since the covariance is not zero, the given random variables are not independent.

3) Determine,
i) Marginal distribution.
ii) Covariance between the discrete random variables X and Y,
using the joint probability distribution.

X \ Y    3       4       5
2        1/6     1/6     1/6
5        1/12    1/12    1/12
7        1/12    1/12    1/12

Soln: Given,
$x_1 = 2, x_2 = 5, x_3 = 7, y_1 = 3, y_2 = 4, y_3 = 5$
and the probabilities are
$p_{11} = p_{12} = p_{13} = \frac{1}{6}$, $p_{21} = p_{22} = p_{23} = \frac{1}{12}$, $p_{31} = p_{32} = p_{33} = \frac{1}{12}$

The joint distribution table with its marginal totals is:

X \ Y    3       4       5       f(xi)
2        1/6     1/6     1/6     1/2
5        1/12    1/12    1/12    1/4
7        1/12    1/12    1/12    1/4
g(yj)    1/3     1/3     1/3     1

The marginal distributions of X and Y are:

xi       2      5      7
f(xi)    1/2    1/4    1/4

yj       3      4      5
g(yj)    1/3    1/3    1/3

$\mu_X = E(X) = \sum_{i=1}^{3} x_i f(x_i) = \left(2 \times \frac{1}{2}\right) + \left(5 \times \frac{1}{4}\right) + \left(7 \times \frac{1}{4}\right) = 4$

$\mu_Y = E(Y) = \sum_{j=1}^{3} y_j g(y_j) = \left(3 \times \frac{1}{3}\right) + \left(4 \times \frac{1}{3}\right) + \left(5 \times \frac{1}{3}\right) = 4$

$E(XY) = \sum_{i=1}^{3} \sum_{j=1}^{3} x_i y_j f(x_i, y_j)$
$= \left(2 \times 3 \times \frac{1}{6}\right) + \left(2 \times 4 \times \frac{1}{6}\right) + \left(2 \times 5 \times \frac{1}{6}\right) + \left(5 \times 3 \times \frac{1}{12}\right) + \left(5 \times 4 \times \frac{1}{12}\right) + \left(5 \times 5 \times \frac{1}{12}\right) + \left(7 \times 3 \times \frac{1}{12}\right) + \left(7 \times 4 \times \frac{1}{12}\right) + \left(7 \times 5 \times \frac{1}{12}\right) = 16$

$\therefore COV(X, Y) = E(XY) - \mu_X \mu_Y = 16 - 4 \times 4 = 0$

Moreover, $p_{ij} = f(x_i) \cdot g(y_j)$ for every i, j (for example $p_{11} = \frac{1}{6} = \frac{1}{2} \times \frac{1}{3}$), hence the given random variables are independent.

4) The joint probability distribution of discrete random variables X and Y is given below:

X \ Y    1       3       6
1        1/9     1/6     1/18
3        1/6     1/4     1/12
6        1/18    1/12    1/36

Determine,
i) Marginal distribution of X and Y.
ii) Are X and Y statistically independent?

Soln: Given,
$x_1 = 1, x_2 = 3, x_3 = 6, y_1 = 1, y_2 = 3, y_3 = 6$
and the probabilities are
$p_{11} = \frac{1}{9}, p_{12} = \frac{1}{6}, p_{13} = \frac{1}{18}, p_{21} = \frac{1}{6}, p_{22} = \frac{1}{4}, p_{23} = \frac{1}{12}, p_{31} = \frac{1}{18}, p_{32} = \frac{1}{12}, p_{33} = \frac{1}{36}$

The joint distribution table with its marginal totals is:

X \ Y    1       3       6       f(xi)
1        1/9     1/6     1/18    1/3
3        1/6     1/4     1/12    1/2
6        1/18    1/12    1/36    1/6
g(yj)    1/3     1/2     1/6     1

i) The marginal distributions of X and Y are:

xi       1      3      6
f(xi)    1/3    1/2    1/6

yj       1      3      6
g(yj)    1/3    1/2    1/6

ii) $\mu_X = E(X) = \sum_{i=1}^{3} x_i f(x_i) = \frac{1}{3} + \frac{3}{2} + 1 = 2.8333$

$\mu_Y = E(Y) = \sum_{j=1}^{3} y_j g(y_j) = \frac{1}{3} + \frac{3}{2} + 1 = 2.8333$

$E(XY) = \sum_{i=1}^{3} \sum_{j=1}^{3} x_i y_j p_{ij} = \frac{1}{9} + \frac{3}{6} + \frac{6}{18} + \frac{3}{6} + \frac{9}{4} + \frac{18}{12} + \frac{6}{18} + \frac{18}{12} + \frac{36}{36} = 8.0278$

$COV(X, Y) = E(XY) - \mu_X \mu_Y = 8.0278 - (2.8333)(2.8333) = 0$ (exactly $\frac{289}{36} - \left(\frac{17}{6}\right)^2 = 0$)

Checking each entry against the product of the marginals gives $p_{ij} = f(x_i) \cdot g(y_j)$ for every i, j (for example $p_{11} = \frac{1}{9} = \frac{1}{3} \times \frac{1}{3}$ and $p_{33} = \frac{1}{36} = \frac{1}{6} \times \frac{1}{6}$). Hence the given random variables X and Y are statistically independent.

5) Determine,
i) Marginal distribution.
ii) Covariance between the discrete random variables X and Y, along with the correlation, using the joint probability distribution.

X \ Y    1       3       9
2        1/8     1/24    1/12
4        1/4     1/4     0
6        1/8     1/24    1/12

Soln: Given
$x_1 = 2, x_2 = 4, x_3 = 6, y_1 = 1, y_2 = 3, y_3 = 9$
and the probabilities are
$p_{11} = \frac{1}{8}, p_{12} = \frac{1}{24}, p_{13} = \frac{1}{12}, p_{21} = \frac{1}{4}, p_{22} = \frac{1}{4}, p_{23} = 0, p_{31} = \frac{1}{8}, p_{32} = \frac{1}{24}, p_{33} = \frac{1}{12}$

The joint distribution table with its marginal totals is:

X \ Y    1       3       9       f(xi)
2        1/8     1/24    1/12    1/4
4        1/4     1/4     0       1/2
6        1/8     1/24    1/12    1/4
g(yj)    1/2     1/3     1/6     1

The marginal distributions of X and Y are:

xi       2      4      6
f(xi)    1/4    1/2    1/4

yj       1      3      9
g(yj)    1/2    1/3    1/6

$\mu_X = E(X) = \sum_{i=1}^{3} x_i f(x_i) = \left(2 \times \frac{1}{4}\right) + \left(4 \times \frac{1}{2}\right) + \left(6 \times \frac{1}{4}\right) = 0.5 + 2 + 1.5 = 4$

$\mu_Y = E(Y) = \sum_{j=1}^{3} y_j g(y_j) = \left(1 \times \frac{1}{2}\right) + \left(3 \times \frac{1}{3}\right) + \left(9 \times \frac{1}{6}\right) = 0.5 + 1 + 1.5 = 3$

$E(XY) = \sum_{i=1}^{3} \sum_{j=1}^{3} x_i y_j f(x_i, y_j)$
$= (2)(1)\frac{1}{8} + (2)(3)\frac{1}{24} + (2)(9)\frac{1}{12} + (4)(1)\frac{1}{4} + (4)(3)\frac{1}{4} + (4)(9)(0) + (6)(1)\frac{1}{8} + (6)(3)\frac{1}{24} + (6)(9)\frac{1}{12}$
$= 0.25 + 0.25 + 1.5 + 1 + 3 + 0 + 0.75 + 0.75 + 4.5 = 12$

$\sigma_X^2 = E(X^2) - \mu_X^2 = \sum_{i=1}^{3} x_i^2 f(x_i) - \mu_X^2 = \left(2^2 \times \frac{1}{4}\right) + \left(4^2 \times \frac{1}{2}\right) + \left(6^2 \times \frac{1}{4}\right) - 4^2 = 18 - 16 = 2 \Rightarrow \sigma_X = 1.4142$

$\sigma_Y^2 = E(Y^2) - \mu_Y^2 = \sum_{j=1}^{3} y_j^2 g(y_j) - \mu_Y^2 = \left(1^2 \times \frac{1}{2}\right) + \left(3^2 \times \frac{1}{3}\right) + \left(9^2 \times \frac{1}{6}\right) - 3^2 = 17 - 9 = 8 \Rightarrow \sigma_Y = 2.8284$

$COV(X, Y) = E(XY) - \mu_X \mu_Y = 12 - (4)(3) = 12 - 12 = 0$

$\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{0}{1.4142 \times 2.8284} = 0$

Although the covariance is zero, $p_{23} = 0$ while $f(x_2) \cdot g(y_3) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}$, so $p_{ij} \ne f(x_i) g(y_j)$ for this entry.
The given random variables X and Y are not statistically independent.
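
Independence has to be tested entry by entry against the product of the marginals, because a zero covariance does not settle it. A minimal sketch of that check follows, applied to the table of this problem and, for contrast, to a product table; the function name `is_independent` and the variable names are illustrative assumptions.

```python
from fractions import Fraction as F

def is_independent(p):
    """True when p[i][j] == f[i] * g[j] for every cell of the joint table."""
    f = [sum(row) for row in p]            # row sums: marginal of X
    g = [sum(col) for col in zip(*p)]      # column sums: marginal of Y
    return all(p[i][j] == f[i] * g[j]
               for i in range(len(f)) for j in range(len(g)))

# Joint table of this problem (rows x = 2, 4, 6; columns y = 1, 3, 9).
p5 = [[F(1, 8), F(1, 24), F(1, 12)],
      [F(1, 4), F(1, 4), F(0)],
      [F(1, 8), F(1, 24), F(1, 12)]]

# A table built as a product of marginals (1/2, 1/4, 1/4) and (1/3, 1/3, 1/3).
p_product = [[F(1, 6)] * 3, [F(1, 12)] * 3, [F(1, 12)] * 3]

independent_p5 = is_independent(p5)
independent_product = is_independent(p_product)
print(independent_p5, independent_product)
```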

6) Determine,
i) Marginal distribution.
ii) Covariance between the discrete random variables X and Y, along with the correlation, using the joint probability distribution.

X \ Y    -3     2      4
1        0.1    0.2    0.2
2        0.3    0.1    0.1

Soln: Given
$x_1 = 1, x_2 = 2, y_1 = -3, y_2 = 2, y_3 = 4$
and the probabilities are
$p_{11} = 0.1, p_{12} = 0.2, p_{13} = 0.2, p_{21} = 0.3, p_{22} = 0.1, p_{23} = 0.1$

The joint distribution table with its marginal totals is:

X \ Y    -3     2      4      f(xi)
1        0.1    0.2    0.2    0.5
2        0.3    0.1    0.1    0.5
g(yj)    0.4    0.3    0.3    1

The marginal distributions of X and Y are:

xi       1      2
f(xi)    0.5    0.5

yj       -3     2      4
g(yj)    0.4    0.3    0.3

$\mu_X = E(X) = \sum_{i=1}^{2} x_i f(x_i) = (1 \times 0.5) + (2 \times 0.5) = 0.5 + 1 = 1.5$

$\mu_Y = E(Y) = \sum_{j=1}^{3} y_j g(y_j) = (-3 \times 0.4) + (2 \times 0.3) + (4 \times 0.3) = -1.2 + 0.6 + 1.2 = 0.6$

$E(XY) = \sum_{i=1}^{2} \sum_{j=1}^{3} x_i y_j f(x_i, y_j)$
$= (1)(-3)(0.1) + (1)(2)(0.2) + (1)(4)(0.2) + (2)(-3)(0.3) + (2)(2)(0.1) + (2)(4)(0.1)$
$= -0.3 + 0.4 + 0.8 - 1.8 + 0.4 + 0.8 = 2.4 - 2.1 = 0.3$

$\sigma_X^2 = E(X^2) - \mu_X^2 = \sum_{i=1}^{2} x_i^2 f(x_i) - \mu_X^2 = (1^2 \times 0.5) + (2^2 \times 0.5) - 1.5^2 = 2.5 - 2.25 = 0.25 \Rightarrow \sigma_X = 0.5$

$\sigma_Y^2 = E(Y^2) - \mu_Y^2 = \sum_{j=1}^{3} y_j^2 g(y_j) - \mu_Y^2 = ((-3)^2 \times 0.4) + (2^2 \times 0.3) + (4^2 \times 0.3) - 0.6^2 = 9.6 - 0.36 = 9.24 \Rightarrow \sigma_Y = 3.0397$

$COV(X, Y) = E(XY) - \mu_X \mu_Y = 0.3 - (1.5)(0.6) = 0.3 - 0.9 = -0.6$

$\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{-0.6}{0.5 \times 3.0397} = -0.3948$

Since the covariance is not zero, the given random variables X and Y are not statistically independent.
7) X and Y are independent random variables. X takes the values 2, 5 and 7 with probabilities 1/2, 1/4 and 1/4 respectively. Y takes the values 3, 4 and 5 with probabilities 1/3, 1/3 and 1/3.
a) Find the JPD of X and Y
b) Show that COV(X, Y) = 0

Soln:
X and Y are independent random variables with the marginal probabilities below.

x        2      5      7
f(x)     1/2    1/4    1/4

y        3      4      5
g(y)     1/3    1/3    1/3

Since X and Y are independent, $p_{ij} = f(x_i) \cdot g(y_j)$, so the joint distribution table is:

X \ Y    3       4       5       f(xi)
2        1/6     1/6     1/6     1/2
5        1/12    1/12    1/12    1/4
7        1/12    1/12    1/12    1/4
g(yj)    1/3     1/3     1/3     1

$\therefore \mu_X = E(X) = \sum x_i f(x_i) = \left(2 \times \frac{1}{2}\right) + \left(5 \times \frac{1}{4}\right) + \left(7 \times \frac{1}{4}\right) = 4$

$\therefore \mu_Y = E(Y) = \sum y_j g(y_j) = \left(3 \times \frac{1}{3}\right) + \left(4 \times \frac{1}{3}\right) + \left(5 \times \frac{1}{3}\right) = 4$

$\therefore E(XY) = \sum_{i=1}^{3} \sum_{j=1}^{3} x_i y_j f(x_i, y_j) = 1 + \frac{8}{6} + \frac{10}{6} + \frac{15}{12} + \frac{20}{12} + \frac{25}{12} + \frac{21}{12} + \frac{28}{12} + \frac{35}{12} = \frac{192}{12} = 16$

$\therefore COV(X, Y) = E(XY) - \mu_X \mu_Y = 16 - (4)(4) = 16 - 16 = 0$

Stochastic Process
A stochastic process consists of a sequence of experiments in which each experiment has a finite number of outcomes with given probabilities.

Probability Vector
A vector $V = [v_1, v_2, v_3, \ldots, v_n]$ is called a probability vector if each of its components is non-negative and their sum is equal to 1.

Ex: $V = [0.1, 0.6, 0.3]$, $V = \left[\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right]$, etc.

Stochastic Matrix
A square matrix P is called a stochastic matrix if all the entries of P are non-negative and the sum of the entries of every row is 1
(or)
A square matrix P is called a stochastic matrix if each row is a probability vector.

Ex: $P = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix}$, $P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \end{bmatrix}$

Regular Stochastic Matrix

A stochastic matrix P is said to be a regular stochastic matrix if all the entries of some power $P^n$ are positive. A regular stochastic matrix P has a unique fixed probability vector Q such that QP = Q; the components of Q are non-negative and their sum is equal to 1.

$$P = \begin{bmatrix} p_{11} & p_{12} & p_{13} & \ldots & p_{1n} \\ p_{21} & p_{22} & p_{23} & \ldots & p_{2n} \\ p_{31} & p_{32} & p_{33} & \ldots & p_{3n} \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ p_{n1} & p_{n2} & p_{n3} & \ldots & p_{nn} \end{bmatrix}$$

Transition Matrix
A transition matrix, also known as a stochastic or probability matrix, is a square matrix (n x n) representing the transition probabilities of a stochastic system.

Ex: $P = \begin{bmatrix} 0 & 1 & 0 \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}$
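
The two definitions (probability vector, stochastic matrix) can be expressed as simple checks. The sketch below is one possible way to write them; the function names are illustrative assumptions, and exact fractions are used because floating-point row sums may differ from 1 by a rounding error.

```python
from fractions import Fraction as F

def is_probability_vector(v):
    """Every component is non-negative and the components sum to 1."""
    return all(c >= 0 for c in v) and sum(v) == 1

def is_stochastic(m):
    """Square matrix in which every row is a probability vector."""
    n = len(m)
    return all(len(row) == n and is_probability_vector(row) for row in m)

P = [[F(0), F(1), F(0)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)]]

not_stochastic = [[F(1), F(1)],     # first row sums to 2
                  [F(0), F(1)]]

print(is_stochastic(P), is_stochastic(not_stochastic))
```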

PROBLEMS

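1) Verify that the matrix $A = \begin{bmatrix} 0 & 0 & 1 \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ 0 & 1 & 0 \end{bmatrix}$ is a regular stochastic matrix.

Soln: In the given matrix A, each element is non-negative and the sum of the elements in each row is equal to 1.
∴ A is a stochastic matrix.

$$A^2 = \begin{bmatrix} 0 & 0 & 1 \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ \frac{1}{8} & \frac{5}{16} & \frac{9}{16} \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \end{bmatrix}$$

$$A^3 = A^2 \times A = \begin{bmatrix} 0 & 1 & 0 \\ \frac{1}{8} & \frac{5}{16} & \frac{9}{16} \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{5}{32} & \frac{41}{64} & \frac{13}{64} \\ \frac{1}{8} & \frac{5}{16} & \frac{9}{16} \end{bmatrix}$$

∴ All the entries in $A^3$ are positive and the sum of each row is 1.

∴ The given matrix A is a regular stochastic matrix.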
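
The regularity test amounts to multiplying the matrix by itself until a power with all positive entries appears. A minimal sketch follows, assuming exact fractions and an arbitrary cut-off `max_power` (an illustrative choice, not taken from the notes); the function names are also illustrative.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def first_positive_power(p, max_power=20):
    """Smallest n <= max_power with all entries of p^n positive, else None."""
    power = p
    for n in range(1, max_power + 1):
        if all(entry > 0 for row in power for entry in row):
            return n
        power = mat_mul(power, p)
    return None

A = [[F(0), F(0), F(1)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(1), F(0)]]

A3 = mat_mul(mat_mul(A, A), A)
n_regular = first_positive_power(A)
print(n_regular, A3[1])
```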

2) Prove that the Markov chain with transition matrix $A = \begin{bmatrix} 0 & \frac{2}{3} & \frac{1}{3} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix}$ is irreducible.

Soln: The given matrix A is a stochastic matrix (being a transition matrix): all its elements are non-negative and the sum of each row is 1.

$$\therefore A^2 = \begin{bmatrix} 0 & \frac{2}{3} & \frac{1}{3} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} 0 & \frac{2}{3} & \frac{1}{3} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & \frac{1}{6} & \frac{1}{3} \\ \frac{1}{4} & \frac{7}{12} & \frac{1}{6} \\ \frac{1}{4} & \frac{1}{3} & \frac{5}{12} \end{bmatrix}$$

∴ All the entries in $A^2$ are positive and the sum of each row is 1.
Hence the given transition matrix A is regular, and consequently the given Markov chain is irreducible.

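3) Find the fixed probability vector for the regular stochastic matrix $A = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix}$.

Soln: Given $A = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix}$

Since the given matrix A is of second order, let $Q = [x \;\; y]$ be the fixed probability vector, with $x \ge 0$, $y \ge 0$ and $x + y = 1$.

$$\therefore QA = [x \;\; y] \begin{bmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix} = \left[\tfrac{1}{3}x + \tfrac{1}{4}y \;\;\; \tfrac{2}{3}x + \tfrac{3}{4}y\right]$$

Since QA = Q,

$$\left[\tfrac{1}{3}x + \tfrac{1}{4}y \;\;\; \tfrac{2}{3}x + \tfrac{3}{4}y\right] = [x \;\; y]$$

$$\Rightarrow \tfrac{1}{3}x + \tfrac{1}{4}y = x, \quad \tfrac{2}{3}x + \tfrac{3}{4}y = y$$

Both equations reduce to $\tfrac{2}{3}x = \tfrac{1}{4}y$ ....(1). We have $x + y = 1 \Rightarrow y = 1 - x$.

$$\therefore (1) \Rightarrow \tfrac{2}{3}x = \tfrac{1}{4}(1 - x) \Rightarrow \tfrac{2}{3}x + \tfrac{1}{4}x = \tfrac{1}{4} \Rightarrow \frac{8x + 3x}{12} = \frac{1}{4} \Rightarrow \frac{11}{3}x = 1 \Rightarrow x = \frac{3}{11}$$

$$\therefore y = 1 - x \Rightarrow y = 1 - \frac{3}{11} \Rightarrow y = \frac{8}{11}$$

Thus, the required fixed probability vector is $Q = [x \;\; y] = \left[\frac{3}{11} \;\; \frac{8}{11}\right]$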
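
For a regular stochastic matrix the fixed probability vector can also be approximated numerically by repeatedly multiplying a starting probability vector by the matrix. This is a different technique from the algebraic solution above and is shown only as a cross-check; the function name `fixed_probability_vector` and the iteration count are illustrative assumptions.

```python
def fixed_probability_vector(p, iterations=200):
    """Approximate Q with QP = Q by iterating q <- qP from a uniform start.
    Assumes p is a regular stochastic matrix, so the iteration converges."""
    n = len(p)
    q = [1.0 / n] * n
    for _ in range(iterations):
        # component j of the row vector q multiplied by the matrix p
        q = [sum(q[i] * p[i][j] for i in range(n)) for j in range(n)]
    return q

A = [[1 / 3, 2 / 3],
     [1 / 4, 3 / 4]]

q = fixed_probability_vector(A)
print([round(c, 4) for c in q])   # compare with 3/11 and 8/11
```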

4) Find the fixed probability vector of the regular stochastic matrix $P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix}$.

Soln: Since the given matrix P is of order 3x3, the required fixed probability vector Q must be of order 1x3.
Let $Q = [x \;\; y \;\; z]$, with $x \ge 0$, $y \ge 0$, $z \ge 0$ and $x + y + z = 1$.
Also, QP = Q.

$$\therefore [x \;\; y \;\; z] \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \frac{1}{2} & \frac{1}{2} & 0 \end{bmatrix} = \left[\tfrac{z}{2} \;\;\; x + \tfrac{z}{2} \;\;\; y\right]$$

$$\Rightarrow \left[\tfrac{z}{2} \;\;\; x + \tfrac{z}{2} \;\;\; y\right] = [x \;\; y \;\; z]$$

$$\Rightarrow \tfrac{z}{2} = x, \quad x + \tfrac{z}{2} = y, \quad y = z$$

Substituting $z = 1 - x - y$:

$$\Rightarrow \tfrac{1}{2}(1 - x - y) = x \;....(1), \quad x + \tfrac{1}{2}(1 - x - y) = y \;....(2), \quad y = 1 - x - y \;....(3)$$

$$\Rightarrow 3x + y = 1, \quad x - 3y = -1, \quad x + 2y = 1$$

$$\Rightarrow x = \frac{1}{5}, \; y = \frac{2}{5} \Rightarrow z = 1 - \frac{1}{5} - \frac{2}{5} = \frac{2}{5}$$

Hence the required fixed probability vector is $Q = [x \;\; y \;\; z] = \left[\frac{1}{5} \;\; \frac{2}{5} \;\; \frac{2}{5}\right]$
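5) Find the fixed probability vector of the regular stochastic matrix $P = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \end{bmatrix}$.

Soln:
Given, $P = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \end{bmatrix}$

Since the given matrix P is of order 3x3, the required fixed probability vector Q must be of order 1x3.
Let $Q = [x \;\; y \;\; z]$, with $x \ge 0$, $y \ge 0$, $z \ge 0$ and $x + y + z = 1$.
Also, QP = Q.

$$\therefore QP = [x \;\; y \;\; z] \begin{bmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \end{bmatrix} = \left[\tfrac{1}{2}x + \tfrac{1}{2}y \;\;\; \tfrac{1}{4}x + z \;\;\; \tfrac{1}{4}x + \tfrac{1}{2}y\right]$$

WKT QP = Q

$$\Rightarrow \left[\tfrac{1}{2}x + \tfrac{1}{2}y \;\;\; \tfrac{1}{4}x + z \;\;\; \tfrac{1}{4}x + \tfrac{1}{2}y\right] = [x \;\; y \;\; z]$$

$$\Rightarrow x = \tfrac{x}{2} + \tfrac{y}{2}, \quad y = \tfrac{x}{4} + z, \quad z = \tfrac{1}{4}x + \tfrac{1}{2}y$$

Substituting $z = 1 - x - y$:

$$\Rightarrow \tfrac{x}{2} - \tfrac{y}{2} = 0, \quad \tfrac{x}{4} + (1 - x - y) - y = 0, \quad \tfrac{1}{4}x + \tfrac{1}{2}y = 1 - x - y$$

$$\Rightarrow x - y = 0, \quad 3x + 8y = 4 \;.....(1), \quad 5x + 6y = 4 \;......(2)$$

By solving eq (1) & (2):

$$\Rightarrow x = \frac{4}{11}, \; y = \frac{4}{11}, \; z = 1 - \frac{4}{11} - \frac{4}{11} = 1 - \frac{8}{11} = \frac{3}{11}$$

$$Q = [x \;\; y \;\; z] = \left[\frac{4}{11} \;\; \frac{4}{11} \;\; \frac{3}{11}\right]$$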
6) Find the fixed probability vector of the regular stochastic matrix $P = \begin{bmatrix} 0 & 1 & 0 \\ 1/6 & 1/2 & 1/3 \\ 0 & 2/3 & 1/3 \end{bmatrix}$.

Soln: Given $P = \begin{bmatrix} 0 & 1 & 0 \\ 1/6 & 1/2 & 1/3 \\ 0 & 2/3 & 1/3 \end{bmatrix}$.
Since the given matrix $P$ is of order $3 \times 3$, the required fixed probability vector $Q$ must be a row vector of order $1 \times 3$.
Let $Q = \begin{bmatrix} x & y & z \end{bmatrix}$, where $x \ge 0$, $y \ge 0$, $z \ge 0$ and $x + y + z = 1$.
\[
QP = \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1/6 & 1/2 & 1/3 \\ 0 & 2/3 & 1/3 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{6}y & x + \tfrac{1}{2}y + \tfrac{2}{3}z & \tfrac{1}{3}y + \tfrac{1}{3}z \end{bmatrix}
\]
WKT $QP = Q$:
\[
x = \tfrac{1}{6}y, \qquad y = x + \tfrac{1}{2}y + \tfrac{2}{3}z, \qquad z = \tfrac{1}{3}y + \tfrac{1}{3}z
\]
Substituting $z = 1 - x - y$ in the second and third equations:
\[
6x - y = 0, \qquad 2x - 7y = -4 \;\ldots (1), \qquad 2x + 3y = 2 \;\ldots (2)
\]
By solving eq (1) & (2):
\[
x = \tfrac{1}{10}, \quad y = \tfrac{6}{10}, \quad z = 1 - \tfrac{1}{10} - \tfrac{6}{10} = 1 - \tfrac{7}{10} = \tfrac{3}{10}
\]
\[
Q = \begin{bmatrix} x & y & z \end{bmatrix} = \begin{bmatrix} \tfrac{1}{10} & \tfrac{6}{10} & \tfrac{3}{10} \end{bmatrix}
\]

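7) Find the fixed probability vector of the regular stochastic matrix $P = \begin{bmatrix} 0 & 2/3 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix}$.

Soln: Given $P = \begin{bmatrix} 0 & 2/3 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix}$.
Since the given matrix $P$ is of order $3 \times 3$, the required fixed probability vector $Q$ must be a row vector of order $1 \times 3$.
Let $Q = \begin{bmatrix} x & y & z \end{bmatrix}$, where $x \ge 0$, $y \ge 0$, $z \ge 0$ and $x + y + z = 1$.
\[
QP = \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} 0 & 2/3 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{2}y + \tfrac{1}{2}z & \tfrac{2}{3}x + \tfrac{1}{2}z & \tfrac{1}{3}x + \tfrac{1}{2}y \end{bmatrix}
\]
WKT $QP = Q$:
\[
x = \tfrac{1}{2}y + \tfrac{1}{2}z, \qquad y = \tfrac{2}{3}x + \tfrac{1}{2}z, \qquad z = \tfrac{1}{3}x + \tfrac{1}{2}y
\]
Substituting $z = 1 - x - y$:
\[
3x - 1 = 0, \qquad x - 9y = -3, \qquad 8x + 9y = 6
\]
\[
\Rightarrow x = \tfrac{9}{27} = \tfrac{1}{3}, \quad y = \tfrac{10}{27}, \quad z = \tfrac{8}{27}
\]
\[
\therefore Q = \begin{bmatrix} x & y & z \end{bmatrix} = \begin{bmatrix} \tfrac{9}{27} & \tfrac{10}{27} & \tfrac{8}{27} \end{bmatrix}
\]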
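The hand calculations in problems 3 to 7 all solve the same linear system: $QP = Q$ together with the condition that the components of $Q$ sum to 1. As a rough cross-check, the sketch below solves that system exactly with Python's `fractions` module. The function name `fixed_vector` and the elimination strategy (replace the last equation by the sum condition, then Gauss-Jordan) are our own choices, not the notes' method, and the code assumes the matrix is regular so that the solution is unique.

```python
from fractions import Fraction as F

def fixed_vector(P):
    """Solve Q P = Q with sum(Q) = 1 exactly (assumes P is a regular stochastic matrix)."""
    n = len(P)
    # Equation i reads: sum_j Q_j * P[j][i] - Q_i = 0; the last column is the right-hand side.
    A = [[F(P[j][i]) - (1 if i == j else 0) for j in range(n)] + [F(0)] for i in range(n)]
    # The n equations are dependent, so replace the last one by x1 + ... + xn = 1.
    A[-1] = [F(1)] * n + [F(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

# Problem 3 (2 x 2) and problem 7 (3 x 3) from above.
A3 = [[F(1, 3), F(2, 3)], [F(1, 4), F(3, 4)]]
P7 = [[F(0), F(2, 3), F(1, 3)], [F(1, 2), F(0), F(1, 2)], [F(1, 2), F(1, 2), F(0)]]

print(fixed_vector(A3))  # [Fraction(3, 11), Fraction(8, 11)]
print(fixed_vector(P7))
```

Exact fractions are used instead of floating point so that the output can be compared directly with the fractions obtained by hand.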

8) If $P_1 = \begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix}$ and $P_2 = \begin{bmatrix} 1-b & b \\ a & 1-a \end{bmatrix}$, show that $P_1$, $P_2$ and $P_1 P_2$ are stochastic matrices.

Soln: We take $0 \le a \le 1$ and $0 \le b \le 1$, so that all entries are non-negative.
In $P_1$ the row sums are $(1 - a) + a = 1$ and $b + (1 - b) = 1$.
In $P_2$ the row sums are $(1 - b) + b = 1$ and $a + (1 - a) = 1$.
Thus, $P_1$ and $P_2$ are stochastic matrices.

Now,
\[
P_1 P_2 = \begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix} \begin{bmatrix} 1-b & b \\ a & 1-a \end{bmatrix}
= \begin{bmatrix} (1-a)(1-b) + a^2 & b(1-a) + a(1-a) \\ b(1-b) + a(1-b) & b^2 + (1-a)(1-b) \end{bmatrix}
= \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} \text{ (say)}
\]
Every entry is a sum of products of non-negative numbers and is therefore non-negative. We need to show that $a_1 + b_1 = 1$ and $a_2 + b_2 = 1$.
Now,
\[
\begin{aligned}
a_1 + b_1 &= (1-a)(1-b) + a^2 + b(1-a) + a(1-a) \\
&= (1-a)\{1 - b + b\} + a\{a + 1 - a\} \\
&= 1 - a + a = 1
\end{aligned}
\]
Also,
\[
\begin{aligned}
a_2 + b_2 &= b(1-b) + a(1-b) + (1-b)(1-a) + b^2 \\
&= b\{1 - b + b\} + (1-b)\{a + 1 - a\} \\
&= b + 1 - b = 1
\end{aligned}
\]
Thus, $P_1 P_2$ is a stochastic matrix.
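The algebra above can be spot-checked numerically. The sketch below is only an illustration, not a proof: the values chosen for `a` and `b` are arbitrary assumptions in the range 0 to 1, and the helper names `mat_mul` and `is_stochastic` are ours.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def is_stochastic(M):
    """True if every entry is non-negative and every row sums to exactly 1."""
    return all(x >= 0 for row in M for x in row) and all(sum(row) == 1 for row in M)

a, b = F(1, 4), F(2, 3)  # illustrative values only; any 0 <= a, b <= 1 should work
P1 = [[1 - a, a], [b, 1 - b]]
P2 = [[1 - b, b], [a, 1 - a]]

print(is_stochastic(P1), is_stochastic(P2), is_stochastic(mat_mul(P1, P2)))
```

Using exact fractions avoids the rounding error that would make a floating-point row sum differ slightly from 1.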

Markov Chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the
probability of each event depends only on the state attained in the previous event.

Higher transition probabilities:
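Let $P = [p_{ij}]$ be the $m \times m$ transition probability matrix of a Markov chain with states $a_1, a_2, \ldots, a_m$, where $p_{ij}$, $1 \le i, j \le m$, is the probability that the system changes from the state $a_i$ to the state $a_j$ in one step, that is, $a_i \to a_j$. The probability that the system changes from $a_i$ to the state $a_j$ in exactly $n$ steps is denoted by $p_{ij}^{(n)}$, and the matrix formed by the probabilities $p_{ij}^{(n)}$ is called the $n$-step transition matrix, denoted by $P^{(n)}$; it is equal to the $n$-th power $P^n$. The initial probability distribution and the distributions after $1, 2, \ldots, n$ steps are defined as
\[
\begin{aligned}
p^{(0)} &= \begin{bmatrix} p_1^{(0)} & p_2^{(0)} & p_3^{(0)} & \cdots & p_m^{(0)} \end{bmatrix} \\
p^{(1)} &= \begin{bmatrix} p_1^{(1)} & p_2^{(1)} & p_3^{(1)} & \cdots & p_m^{(1)} \end{bmatrix} \\
p^{(2)} &= \begin{bmatrix} p_1^{(2)} & p_2^{(2)} & p_3^{(2)} & \cdots & p_m^{(2)} \end{bmatrix} \\
&\;\;\vdots \\
p^{(n)} &= \begin{bmatrix} p_1^{(n)} & p_2^{(n)} & p_3^{(n)} & \cdots & p_m^{(n)} \end{bmatrix}
\end{aligned}
\]
and $p^{(1)}, p^{(2)}, p^{(3)}, \ldots$ are evaluated as
\[
p^{(1)} = p^{(0)}P, \quad p^{(2)} = p^{(1)}P = p^{(0)}P^2, \quad p^{(3)} = p^{(2)}P = p^{(0)}P^3, \quad \ldots, \quad p^{(n)} = p^{(n-1)}P = p^{(0)}P^n
\]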

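1) Consider the t.p.m. $P = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix}$ with the states $t$ (train) and $b$ (bus), in that order. Find $P^2$ and $P^3$. Also find $p^{(3)}$, taking as the initial probability distribution that of a person who rolls a die and decides to go by bus if the number shown on the face is divisible by 3.

Soln: Given,
\[
P = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} = \begin{bmatrix} p_{tt} & p_{tb} \\ p_{bt} & p_{bb} \end{bmatrix}
\]
\[
\therefore P^2 = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix}
= \begin{bmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{bmatrix}
= \begin{bmatrix} p_{tt}^{(2)} & p_{tb}^{(2)} \\ p_{bt}^{(2)} & p_{bb}^{(2)} \end{bmatrix}
\]
\[
\therefore P^3 = P^2 \cdot P = \begin{bmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix}
= \begin{bmatrix} 1/4 & 3/4 \\ 3/8 & 5/8 \end{bmatrix}
= \begin{bmatrix} p_{tt}^{(3)} & p_{tb}^{(3)} \\ p_{bt}^{(3)} & p_{bb}^{(3)} \end{bmatrix}
\]
$p_{tb}^{(2)} = \tfrac{1}{2}$ means that the probability that the system changes from the state $t \to b$ in exactly 2 steps is $\tfrac{1}{2}$.
$p_{bt}^{(3)} = \tfrac{3}{8}$ means that the probability that the system changes from the state $b \to t$ in exactly 3 steps is $\tfrac{3}{8}$.

The initial probability distribution comes from rolling a die: the person goes by bus if the number shown is divisible by 3, that is, if it is 3 or 6.
\[
\therefore p(b) = \tfrac{2}{6} = \tfrac{1}{3} \Rightarrow p(t) = 1 - p(b) = 1 - \tfrac{1}{3} = \tfrac{2}{3}
\]
\[
p^{(0)} = \begin{bmatrix} p(t) & p(b) \end{bmatrix} = \begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix}
\]
\[
\therefore p^{(2)} = p^{(0)}P^2 = \begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{bmatrix}
= \begin{bmatrix} \tfrac{5}{12} & \tfrac{7}{12} \end{bmatrix}
\]
\[
\therefore p^{(3)} = p^{(0)}P^3 = \begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix} \begin{bmatrix} 1/4 & 3/4 \\ 3/8 & 5/8 \end{bmatrix}
= \begin{bmatrix} \tfrac{7}{24} & \tfrac{17}{24} \end{bmatrix}
= \begin{bmatrix} p_t^{(3)} & p_b^{(3)} \end{bmatrix}
\]
∴ The probability of travelling by train after 3 days $= \tfrac{7}{24}$

∴ The probability of travelling by bus after 3 days $= \tfrac{17}{24}$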
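The relation $p^{(n)} = p^{(n-1)}P$ translates directly into a loop. The following is a minimal sketch of that recurrence applied to the train/bus problem above; the function names `step` and `distribution_after` are our own, and exact fractions are used so that the result can be compared with the hand calculation.

```python
from fractions import Fraction as F

def step(p, P):
    """One transition: return the row vector p P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]

def distribution_after(p0, P, n):
    """Apply p(k) = p(k-1) P repeatedly, n times, starting from p0."""
    p = list(p0)
    for _ in range(n):
        p = step(p, P)
    return p

# States in the order (train, bus), as in the problem above.
P = [[F(0), F(1)], [F(1, 2), F(1, 2)]]
p0 = [F(2, 3), F(1, 3)]

print(distribution_after(p0, P, 2))  # [Fraction(5, 12), Fraction(7, 12)]
print(distribution_after(p0, P, 3))  # [Fraction(7, 24), Fraction(17, 24)]
```

Stepping the vector forward needs only vector-matrix products, which is cheaper than forming $P^n$ first when only one initial distribution is of interest.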

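2) The transition matrix $P$ of a Markov chain is given by $\begin{bmatrix} 1/2 & 1/2 \\ 3/4 & 1/4 \end{bmatrix}$ with initial probability distribution $p^{(0)} = \begin{bmatrix} 1/4 & 3/4 \end{bmatrix}$. Define and find the following: i) $p_{21}^{(2)}$ ii) $p_{12}^{(2)}$ iii) $p^{(2)}$ iv) $p_1^{(2)}$ v) the vector that $p^{(0)}P^n$ approaches vi) the matrix that $P^n$ approaches.

Soln: Given transition matrix $P = \begin{bmatrix} 1/2 & 1/2 \\ 3/4 & 1/4 \end{bmatrix}$.
\[
P^2 = P \cdot P = \begin{bmatrix} 1/2 & 1/2 \\ 3/4 & 1/4 \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 \\ 3/4 & 1/4 \end{bmatrix}
= \begin{bmatrix} 5/8 & 3/8 \\ 9/16 & 7/16 \end{bmatrix}
= \begin{bmatrix} p_{11}^{(2)} & p_{12}^{(2)} \\ p_{21}^{(2)} & p_{22}^{(2)} \end{bmatrix}
\]
\[
\therefore p_{21}^{(2)} = \tfrac{9}{16}, \qquad p_{12}^{(2)} = \tfrac{3}{8}
\]
Given initial probability distribution is $p^{(0)} = \begin{bmatrix} 1/4 & 3/4 \end{bmatrix}$.
\[
\therefore p^{(2)} = p^{(0)}P^2 = \begin{bmatrix} 1/4 & 3/4 \end{bmatrix} \begin{bmatrix} 5/8 & 3/8 \\ 9/16 & 7/16 \end{bmatrix}
= \begin{bmatrix} \tfrac{37}{64} & \tfrac{27}{64} \end{bmatrix}
= \begin{bmatrix} p_1^{(2)} & p_2^{(2)} \end{bmatrix}
\]
\[
\therefore p_1^{(2)} = \tfrac{37}{64}
\]
$p^{(0)}P^n$ approaches the unique probability vector $Q = \begin{bmatrix} x & y \end{bmatrix}$ for which $QP = Q$.
\[
\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 \\ 3/4 & 1/4 \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}
\Rightarrow \begin{bmatrix} \tfrac{x}{2} + \tfrac{3y}{4} & \tfrac{x}{2} + \tfrac{y}{4} \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}
\]
\[
\Rightarrow \tfrac{x}{2} + \tfrac{3y}{4} = x, \qquad \tfrac{x}{2} + \tfrac{y}{4} = y
\]
Using $y = 1 - x$ in the first equation:
\[
-\tfrac{x}{2} + \tfrac{3(1 - x)}{4} = 0
\Rightarrow -\tfrac{x}{2} - \tfrac{3x}{4} = -\tfrac{3}{4}
\Rightarrow \tfrac{5}{4}x = \tfrac{3}{4}
\Rightarrow x = \tfrac{3}{5} \Rightarrow y = \tfrac{2}{5}
\]
\[
\therefore Q = \begin{bmatrix} x & y \end{bmatrix} = \begin{bmatrix} \tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix}
\]
Therefore, the vector $p^{(0)}P^n$ approaches the vector $\begin{bmatrix} \tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix}$.

Therefore, the matrix $P^n$ approaches the matrix $\begin{bmatrix} 3/5 & 2/5 \\ 3/5 & 2/5 \end{bmatrix}$, each of whose rows is $Q$.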
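The limiting behaviour in parts v) and vi) can be observed numerically by raising $P$ to a large power. The snippet below is a sketch under our own assumptions: the helper `mat_mul` and the choice of 50 as a "large" power are illustrative and not taken from the notes.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

P = [[0.5, 0.5], [0.75, 0.25]]

# Build P^50 by repeated multiplication (50 is an arbitrary "large" exponent).
M = P
for _ in range(49):
    M = mat_mul(M, P)

# Each row of a high power of a regular matrix should be close to Q = [3/5, 2/5].
for row in M:
    print([round(v, 6) for v in row])

# The limit of p(0) P^n does not depend on the initial distribution.
p0 = [0.25, 0.75]
limit = [sum(p0[i] * M[i][j] for i in range(2)) for j in range(2)]
print([round(v, 6) for v in limit])
```

Floating-point numbers are adequate here because only the approximate limit is wanted.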

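3) The t.p.m. of a Markov chain is given by $P = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 1 & 0 & 0 \\ 1/4 & 1/2 & 1/4 \end{bmatrix}$ and the initial probability distribution is $p^{(0)} = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \end{bmatrix}$. Find $p_{13}^{(2)}$, $p_{23}^{(2)}$, $p^{(2)}$ and $p_1^{(2)}$.

Soln: Given transition matrix of the Markov chain $P = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 1 & 0 & 0 \\ 1/4 & 1/2 & 1/4 \end{bmatrix}$
and the initial probability distribution $p^{(0)} = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \end{bmatrix}$.
\[
\therefore P^2 = P \cdot P = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 1 & 0 & 0 \\ 1/4 & 1/2 & 1/4 \end{bmatrix} \begin{bmatrix} 1/2 & 0 & 1/2 \\ 1 & 0 & 0 \\ 1/4 & 1/2 & 1/4 \end{bmatrix}
= \begin{bmatrix} 3/8 & 1/4 & 3/8 \\ 1/2 & 0 & 1/2 \\ 11/16 & 1/8 & 3/16 \end{bmatrix}
= \begin{bmatrix} p_{11}^{(2)} & p_{12}^{(2)} & p_{13}^{(2)} \\ p_{21}^{(2)} & p_{22}^{(2)} & p_{23}^{(2)} \\ p_{31}^{(2)} & p_{32}^{(2)} & p_{33}^{(2)} \end{bmatrix}
\]
\[
\therefore p_{13}^{(2)} = \tfrac{3}{8}, \qquad p_{23}^{(2)} = \tfrac{1}{2}
\]
\[
\therefore p^{(2)} = p^{(0)}P^2 = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} 3/8 & 1/4 & 3/8 \\ 1/2 & 0 & 1/2 \\ 11/16 & 1/8 & 3/16 \end{bmatrix}
\]
\[
\therefore p^{(2)} = \begin{bmatrix} p_1^{(2)} & p_2^{(2)} & p_3^{(2)} \end{bmatrix} = \begin{bmatrix} \tfrac{7}{16} & \tfrac{1}{8} & \tfrac{7}{16} \end{bmatrix}
\]
\[
\therefore p_1^{(2)} = \tfrac{7}{16}
\]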

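4) Prove that the Markov chain whose t.p.m. is $P = \begin{bmatrix} 0 & 2/3 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix}$ is irreducible. Find the corresponding stationary probability vector.

Soln: Given transition matrix of the Markov chain
\[
P = \begin{bmatrix} 0 & 2/3 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix}
= \frac{1}{6}\begin{bmatrix} 0 & 4 & 2 \\ 3 & 0 & 3 \\ 3 & 3 & 0 \end{bmatrix}
\]
\[
\Rightarrow P^2 = P \cdot P = \frac{1}{6}\begin{bmatrix} 0 & 4 & 2 \\ 3 & 0 & 3 \\ 3 & 3 & 0 \end{bmatrix} \cdot \frac{1}{6}\begin{bmatrix} 0 & 4 & 2 \\ 3 & 0 & 3 \\ 3 & 3 & 0 \end{bmatrix}
= \frac{1}{36}\begin{bmatrix} 18 & 6 & 12 \\ 9 & 21 & 6 \\ 9 & 12 & 15 \end{bmatrix}
\]
Since all the entries of $P^2$ are positive, the given t.p.m. $P$ is regular and hence the Markov chain having t.p.m. $P$ is irreducible.
Let $Q = \begin{bmatrix} x & y & z \end{bmatrix}$ be the unique probability vector for which $QP = Q$, with $x + y + z = 1$.
\[
QP = \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} 0 & 2/3 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix} = \begin{bmatrix} x & y & z \end{bmatrix}
\]
\[
\Rightarrow \begin{bmatrix} \tfrac{y}{2} + \tfrac{z}{2} & \tfrac{2x}{3} + \tfrac{z}{2} & \tfrac{x}{3} + \tfrac{y}{2} \end{bmatrix} = \begin{bmatrix} x & y & z \end{bmatrix}
\]
\[
\Rightarrow \tfrac{y}{2} + \tfrac{z}{2} = x, \qquad \tfrac{2x}{3} + \tfrac{z}{2} = y, \qquad \tfrac{x}{3} + \tfrac{y}{2} = z
\]
\[
\Rightarrow 2x - y - z = 0, \qquad 4x - 6y + 3z = 0, \qquad 2x + 3y - 6z = 0
\]
Substituting $z = 1 - x - y$ in the first two equations:
\[
2x - y - (1 - x - y) = 0, \qquad 4x - 6y + 3(1 - x - y) = 0
\Rightarrow 3x = 1 \;\;\&\;\; x - 9y = -3
\]
\[
\Rightarrow x = \tfrac{1}{3}, \qquad \tfrac{1}{3} - 9y = -3 \Rightarrow 9y = \tfrac{10}{3} \Rightarrow y = \tfrac{10}{27}
\]
\[
\therefore z = 1 - x - y = 1 - \tfrac{1}{3} - \tfrac{10}{27} = \tfrac{8}{27}
\]
∴ $Q = \begin{bmatrix} x & y & z \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{10}{27} & \tfrac{8}{27} \end{bmatrix}$ is the required stationary probability vector.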

5) A student’s study habits are as follows. If he studies one night, he is 30% sure to study the next
night. On the other hand, if he does not study one night, he is 40% sure to study the next night.
Find the transition matrix for the chain of his study.

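Soln: We have two possible states:
$a_1$ = Studying, $a_2$ = Not studying
Therefore, given that
$p_{11}$ = Probability of studying on a night, given that he has studied the previous night = 30% = 0.3
$p_{12}$ = Probability of not studying on a night, given that he has studied the previous night = 70% = 0.7
$p_{21}$ = Probability of studying on a night, given that he has not studied the previous night = 40% = 0.4
$p_{22}$ = Probability of not studying on a night, given that he has not studied the previous night = 60% = 0.6

Accordingly, the transition matrix of the chain of study is
\[
P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{bmatrix}
\]
To see how often he studies in the long run, let $Q = \begin{bmatrix} x & y \end{bmatrix}$ be the unique probability vector for which $QP = Q$, with $x + y = 1$.
\[
\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}
\Rightarrow \begin{bmatrix} 0.3x + 0.4y & 0.7x + 0.6y \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}
\]
\[
\Rightarrow 0.3x + 0.4y = x, \qquad 0.7x + 0.6y = y
\Rightarrow 0.7x - 0.4y = 0
\]
\[
\Rightarrow 0.7x - 0.4(1 - x) = 0 \Rightarrow 1.1x - 0.4 = 0
\Rightarrow x = \tfrac{0.4}{1.1} = \tfrac{4}{11} \Rightarrow y = 1 - \tfrac{4}{11} = \tfrac{7}{11}
\]
\[
\therefore Q = \begin{bmatrix} x & y \end{bmatrix} = \begin{bmatrix} \tfrac{4}{11} & \tfrac{7}{11} \end{bmatrix} = \begin{bmatrix} p_{a_1} & p_{a_2} \end{bmatrix}
\]
Thus, we conclude that in the long run the student will study $\tfrac{4}{11}$ of the time, or 36.36% of the time.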

6) A software engineer goes to his work-place every day by motor bike or by car. He never goes by a
bike on two consecutive days; but if he goes by car on a day then he is equally likely to go by car
or bike on the next day. Find the transition matrix for the chain of the mode of transport he uses.
If car is used on the first day of a week, find the probability that, (i) Bike is used, (ii) Car is used
on the fifth day.
Soln: The Markov chain of the mode of transport has the following two states:
$a_1$ = Using bike, $a_2$ = Using car
We find,
$p_{11}$ = Probability of using bike on a day, given that bike has been used on the previous day = 0
(Because bike is not used on two consecutive days)
$p_{12}$ = Probability of using car on a day, given that bike has been used on the previous day = 1
(Because it is certain that car is used on a day if bike is used on the previous day)
$p_{21}$ = Probability of using bike on a day, given that car is used on the previous day = 1/2
(Because using car or bike on a day are equally likely if car is used on the previous day)
$p_{22}$ = Probability of using car on a day, given that car is used on the previous day = 1/2
Hence the transition matrix for the chain of the mode of transport is
\[
P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix}
\]
Since car is used on the first day, the initial probability distribution vector of the mode of transport is $p^{(0)} = \begin{bmatrix} p_1^{(0)} & p_2^{(0)} \end{bmatrix} = \begin{bmatrix} 0 & 1 \end{bmatrix}$. The fifth day is four steps after the first day, so we need $p^{(4)}$.
\[
P^2 = P \cdot P = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{bmatrix}
\]
\[
P^4 = P^2 \cdot P^2 = \begin{bmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{bmatrix} = \begin{bmatrix} 3/8 & 5/8 \\ 5/16 & 11/16 \end{bmatrix}
\]
\[
\therefore p^{(4)} = p^{(0)}P^4 = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 3/8 & 5/8 \\ 5/16 & 11/16 \end{bmatrix}
= \begin{bmatrix} \tfrac{5}{16} & \tfrac{11}{16} \end{bmatrix} = \begin{bmatrix} p_1^{(4)} & p_2^{(4)} \end{bmatrix}
\]
Therefore, on the fifth day the probability of using the bike is $p_1^{(4)} = \tfrac{5}{16}$ and the probability of using the car is $p_2^{(4)} = \tfrac{11}{16}$.

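7) A man's smoking habits are as follows. If he smokes filter cigarettes one week, he switches to non-filter cigarettes the next week with probability 0.2. On the other hand, if he smokes non-filter cigarettes one week, there is a probability of 0.7 that he will smoke non-filter cigarettes the next week as well. In the long run, how often does he smoke filter cigarettes?

Soln:
Let A = Smoking filter cigarettes, B = Smoking non-filter cigarettes.
Therefore, the associated transition probability matrix is as follows:
\[
P = \begin{bmatrix} p_{AA}^{(1)} & p_{AB}^{(1)} \\ p_{BA}^{(1)} & p_{BB}^{(1)} \end{bmatrix} = \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}
\]
Let $Q = \begin{bmatrix} x & y \end{bmatrix}$ be the unique probability vector for which $QP = Q$, with $x + y = 1$.
\[
\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}
\Rightarrow \begin{bmatrix} 0.8x + 0.3y & 0.2x + 0.7y \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}
\]
\[
\Rightarrow 0.8x + 0.3y = x, \qquad 0.2x + 0.7y = y
\]
Both equations reduce to $0.2x - 0.3y = 0$. Using $y = 1 - x$:
\[
0.2x - 0.3(1 - x) = 0 \Rightarrow 0.2x + 0.3x - 0.3 = 0 \Rightarrow 0.5x = 0.3
\Rightarrow x = \tfrac{0.3}{0.5} = \tfrac{3}{5} \Rightarrow y = \tfrac{0.2}{0.5} = \tfrac{2}{5}
\]
\[
\therefore Q = \begin{bmatrix} \tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix} = \begin{bmatrix} p_A & p_B \end{bmatrix}
\]
Thus, in the long run, he will smoke filter cigarettes $\tfrac{3}{5}$ or 60% of the time.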
(SMOKING IS INJURIOUS TO HEALTH, IT CAUSES CANCER AND TOBACCO CAUSES PAINFUL DEATH)
8) Three boys A, B, C are throwing ball to each other. “A” always throws the ball to “B” and “B”
always throws ball to “C”. “C” is just as likely to throw the ball to “B” as to “A”. If, “C” was the
first person to throw the ball, find the probabilities that after three throws.
i) A has the ball
ii) B has the ball
iii) C has the ball

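Soln: The three boys A, B, C throwing the ball form a Markov chain with the transition probability matrix
\[
P = \begin{bmatrix} p_{AA}^{(1)} & p_{AB}^{(1)} & p_{AC}^{(1)} \\ p_{BA}^{(1)} & p_{BB}^{(1)} & p_{BC}^{(1)} \\ p_{CA}^{(1)} & p_{CB}^{(1)} & p_{CC}^{(1)} \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}
\]
\[
\Rightarrow P^2 = P \cdot P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \end{bmatrix}
\]
\[
\therefore P^3 = P^2 \cdot P = \begin{bmatrix} 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}
= \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{bmatrix}
\]
Initially C has the ball, so the initial probability vector is $p^{(0)} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$.
\[
\therefore p^{(3)} = p^{(0)}P^3 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{2} \end{bmatrix}
= \begin{bmatrix} p_A^{(3)} & p_B^{(3)} & p_C^{(3)} \end{bmatrix}
\]
Thus, after three throws, the probability that the ball is with A is $p_A^{(3)} = \tfrac{1}{4}$, with B is $p_B^{(3)} = \tfrac{1}{4}$ and with C is $p_C^{(3)} = \tfrac{1}{2}$.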

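9) A gambler's luck follows a pattern: if he wins a game, the probability of winning the next game is 0.6. However, if he loses a game, the probability of losing the next game is 0.7. There is an even chance of the gambler winning the first game. If so,
i) What is the probability of winning the second game?
ii) What is the probability of winning the third game?
iii) In the long run, how often will he win?

Soln: Let W = Win the game, L = Lose the game.
The transition probability matrix is
\[
P = \begin{bmatrix} p_{ww} & p_{wl} \\ p_{lw} & p_{ll} \end{bmatrix} = \begin{bmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{bmatrix}
\]
Winning and losing the first game are equally likely, so the probability vector for the first game is
\[
p^{(0)} = \begin{bmatrix} p_w^{(0)} & p_l^{(0)} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}
\]
\[
\therefore p^{(1)} = p^{(0)}P = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} \begin{bmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{bmatrix} = \begin{bmatrix} 0.45 & 0.55 \end{bmatrix}
\]
\[
\therefore p^{(2)} = p^{(1)}P = \begin{bmatrix} 0.45 & 0.55 \end{bmatrix} \begin{bmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{bmatrix} = \begin{bmatrix} 0.435 & 0.565 \end{bmatrix}
\]
\[
\therefore p^{(3)} = p^{(2)}P = \begin{bmatrix} 0.435 & 0.565 \end{bmatrix} \begin{bmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{bmatrix} = \begin{bmatrix} 0.4305 & 0.5695 \end{bmatrix}
\]
Since $p^{(0)}$ describes the first game, $p^{(1)}$ describes the second game and $p^{(2)}$ the third.
i) ∴ Probability of winning the second game $= p_w^{(1)} = 0.45 = 45\%$
ii) ∴ Probability of winning the third game $= p_w^{(2)} = 0.435 = 43.5\%$
iii) Let $Q = \begin{bmatrix} x & y \end{bmatrix}$ be the probability vector for which $x + y = 1$ and $QP = Q$.
\[
\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}
\Rightarrow 0.6x + 0.3y = x, \qquad 0.4x + 0.7y = y
\]
\[
\Rightarrow 0.4x - 0.3y = 0 \Rightarrow 0.4x - 0.3(1 - x) = 0 \Rightarrow 0.4x - 0.3 + 0.3x = 0 \Rightarrow 0.7x = 0.3
\Rightarrow x = \tfrac{3}{7}
\]
\[
\Rightarrow y = 1 - x = 1 - \tfrac{3}{7} = \tfrac{4}{7}
\]
\[
\therefore Q = \begin{bmatrix} p_w & p_l \end{bmatrix} = \begin{bmatrix} \tfrac{3}{7} & \tfrac{4}{7} \end{bmatrix}
\]
Thus, in the long run he wins $\tfrac{3}{7}$ of the games, about 42.86% of the time; the winning probabilities 0.45, 0.435, 0.4305 found above approach this value.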

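10) A salesman's territory consists of three cities A, B, C. He never sells in the same city on successive days. If he sells in city A, then the next day he sells in city B. If he sells in B or C, then the next day he is twice as likely to sell in city A as in the other city. In the long run, how often does he sell in each of the cities?

Soln:
The salesman moves between the cities A, B, C with the probabilities below (rows and columns in the order A, B, C):
\[
P = \begin{bmatrix} p_{AA}^{(1)} & p_{AB}^{(1)} & p_{AC}^{(1)} \\ p_{BA}^{(1)} & p_{BB}^{(1)} & p_{BC}^{(1)} \\ p_{CA}^{(1)} & p_{CB}^{(1)} & p_{CC}^{(1)} \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 2/3 & 0 & 1/3 \\ 2/3 & 1/3 & 0 \end{bmatrix}
\]
Let $Q = \begin{bmatrix} x & y & z \end{bmatrix}$ be the probability vector for which $x + y + z = 1$ and $QP = Q$.
\[
\begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 2/3 & 0 & 1/3 \\ 2/3 & 1/3 & 0 \end{bmatrix} = \begin{bmatrix} x & y & z \end{bmatrix}
\Rightarrow \begin{bmatrix} \tfrac{2y}{3} + \tfrac{2z}{3} & x + \tfrac{z}{3} & \tfrac{y}{3} \end{bmatrix} = \begin{bmatrix} x & y & z \end{bmatrix}
\]
\[
\Rightarrow \tfrac{2y}{3} + \tfrac{2z}{3} = x, \qquad x + \tfrac{z}{3} = y, \qquad \tfrac{y}{3} = z
\]
\[
\Rightarrow 3x - 2y - 2z = 0, \qquad 3x - 3y + z = 0
\]
Substituting $z = 1 - x - y$:
\[
3x - 2y - 2(1 - x - y) = 0, \qquad 3x - 3y + (1 - x - y) = 0
\Rightarrow 5x = 2, \qquad 2x - 4y = -1
\]
\[
\Rightarrow x = \tfrac{2}{5}, \qquad 4y = \tfrac{9}{5} \Rightarrow y = \tfrac{9}{20}, \qquad z = 1 - \tfrac{2}{5} - \tfrac{9}{20} = \tfrac{3}{20}
\]
\[
\therefore Q = \begin{bmatrix} x & y & z \end{bmatrix} = \begin{bmatrix} \tfrac{2}{5} & \tfrac{9}{20} & \tfrac{3}{20} \end{bmatrix}
\]
Thus, in the long run the salesman sells in city A $\tfrac{2}{5} = 40\%$ of the time, in city B $\tfrac{9}{20} = 45\%$ of the time, and in city C $\tfrac{3}{20} = 15\%$ of the time.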

11) Every year, a man trades his car for a new car. If he has a Maruthi, he trades it for an Ambassador.
If he has an Ambassador, he trades it for Santro. However if he had a Santro, he is just as likely
to trade it for a Maruthi or an Ambassador. In 2000 he bought his first car which was a Santro.
Find the probability that he has,
i) 2002 Santro
ii) 2002 Maruthi
iii) 2003 Ambassador
iv) 2003 Santro

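Soln:
The man trades his car for a new car with the probabilities below (rows and columns in the order M = Maruthi, A = Ambassador, S = Santro):
\[
P = \begin{bmatrix} p_{MM}^{(1)} & p_{MA}^{(1)} & p_{MS}^{(1)} \\ p_{AM}^{(1)} & p_{AA}^{(1)} & p_{AS}^{(1)} \\ p_{SM}^{(1)} & p_{SA}^{(1)} & p_{SS}^{(1)} \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}
\]
Also given, his first car, bought in 2000, was a Santro.
∴ The initial probability vector is $p^{(0)} = \begin{bmatrix} p_M^{(0)} & p_A^{(0)} & p_S^{(0)} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$.
\[
\therefore P^2 = P \cdot P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \end{bmatrix}
\]
\[
\Rightarrow P^3 = P^2 \cdot P = \begin{bmatrix} 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}
= \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{bmatrix}
\]
\[
\therefore p^{(2)} = p^{(0)}P^2 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \end{bmatrix}
= \begin{bmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} p_M^{(2)} & p_A^{(2)} & p_S^{(2)} \end{bmatrix}
\]
\[
\therefore p^{(3)} = p^{(0)}P^3 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} p_M^{(3)} & p_A^{(3)} & p_S^{(3)} \end{bmatrix}
\]
i) ∴ The probability of having a Santro in the year 2002 is $p_S^{(2)} = \tfrac{1}{2} = 50\%$
ii) ∴ The probability of having a Maruthi in the year 2002 is $p_M^{(2)} = 0 = 0\%$
iii) ∴ The probability of having an Ambassador in the year 2003 is $p_A^{(3)} = \tfrac{1}{4} = 25\%$
iv) ∴ The probability of having a Santro in the year 2003 is $p_S^{(3)} = \tfrac{1}{2} = 50\%$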

***

||Jai Sri Gurudev||


S.J.C. INSTITUTE OF TECHNOLOGY, CHICKBALLAPUR
DEPARTMENT OF MATHEMATICS
MATHEMATICS-3 FOR COMPUTER SCIENCE (BCS301)
MODULE - 3
STATISTICAL INFERENCE -1
Prepared by:
Purushotham P
Assistant Professor
SJC Institute of Technology
Email id: [email protected]

Introduction:
Sampling is a statistical method of obtaining representative data (observations) from a group. We have
been using sampling concepts in our day to day lives knowingly or unknowingly; for instance we take a
handful of rice to check the rice quality of the full lot. This is an example of random sampling from a large
population.

Population (Universe):
The group of objects (individuals) under study
is called population or universe. Universe may be finite
or infinite.

Sample:
A part containing objects(individuals), selected from
the population is called a sample.

Sample size:
The number of individuals in a sample is called the sample size. If the sample size n is less than or equal to 30, then
the sample is said to be small; otherwise it is called a large sample.

Random Sampling:
The selection of objects (individuals) from the universe in such a way that each object (individual) of the
universe has the same chance of being selected is called random sampling. Lottery system is the most common
example of random sampling.
Every random sampling need not be simple. For example, if balls are drawn without replacement from
a bag of balls containing different balls; the probability of success changes in every trial. Thus, the sampling
though random is not simple.
Simple Sampling:
Simple sampling is a special case of random sampling in which each event has same probability of
success or failure.

Hypothesis:
A hypothesis is an assumption based on limited evidence that lends itself to further testing and
experimentation. For example, a farmer claims a significant increase in crop production after using a particular
fertilizer; after a season of experimenting, his hypothesis may be proved true or false. Any hypothesis may
be accepted or rejected at a specified confidence level and must be open to refutation.

Null Hypothesis:
The null hypothesis is a general statement or default position that there is no relationship between two
measured phenomena or no association among groups.
Example: Given the test scores of two random samples, one of men and one of women, does one group differ from
the other? A possible null hypothesis is that the mean male score is the same as the mean female score:
H0: μ1 = μ2
where
H0 = the null hypothesis,
μ1 = the mean of population 1, and
μ2 = the mean of population 2.
A stronger null hypothesis is that the two samples are drawn from the same population, such that the
variances and shapes of the distributions are also equal.

Alternative Hypothesis:
It is the opposite statement of the null hypothesis and is denoted by H1: μ1 ≠ μ2

Significance levels (α):


The significance level α of a statistical test is the probability of rejecting the null hypothesis when it is
in fact true. It is fixed before the test is carried out. If an observed result would occur by chance with a
probability smaller than α when the null hypothesis holds, we say the result is significant.
The level of significance therefore decides whether the null hypothesis is accepted or rejected: the null
hypothesis is rejected when the test statistic falls beyond the critical value for the chosen level, and is
accepted otherwise.

Example: A level of significance of α = 0.05 means that, when the null hypothesis is true (no real
relationship or difference exists between the groups being compared), there is a 5% chance of wrongly
rejecting it because of chance variation alone. Equivalently, the test is carried out at the 95% confidence level.

Standard Error:
The standard deviation of the sampling distribution of a statistic is known as the standard error (S.E.).

Precision:
Reciprocal of standard error is known as precision.

Confidence Limits:
In short, confidence limits show how accurate an estimation of the mean is or is likely to be. Confidence limits
are the lowest and the highest numbers at the end of a confidence interval.

Confidence Interval:
A confidence interval is a range of values around an estimate, computed from the sample, that conveys how precise
the estimate is. It is constructed so that, over repeated sampling, the interval contains the population parameter
in a stated proportion of cases, called the confidence level. Analysts most often use confidence levels of 95% or 99%.

Critical Value:
A critical value is the value of the test statistic which defines the upper and lower bounds of a confidence interval,
or which defines the threshold of statistical significance in a statistical test.


Critical values of Z:

                    Level of Significance
Type of test        1%       5%       10%
Two-tailed test     2.58     1.96     1.645
One-tailed test     2.33     1.645    1.28
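The tabulated critical values are quantiles of the standard normal distribution, so they can be reproduced with Python's standard library. The sketch below uses `statistics.NormalDist` (available from Python 3.8); the function name `critical_value` is our own.

```python
from statistics import NormalDist

def critical_value(alpha, two_tailed=True):
    """Return z such that the tail area beyond z (both tails if two_tailed) equals alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.01, 0.05, 0.10):
    two = round(critical_value(alpha), 3)
    one = round(critical_value(alpha, two_tailed=False), 3)
    print(f"alpha = {alpha:.2f}: two-tailed {two}, one-tailed {one}")
```

The values agree with the table once rounded to the number of decimals shown there.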

Critical Region:
A critical region, also known as the rejection region, is a set of values for the test statistic for which the null
hypothesis is rejected. i.e. if the observed test statistic is in the critical region then we reject the null hypothesis and
accept the alternative hypothesis.

Type I and Type II Errors:


When we test a statistic at a specified confidence level, there is a chance of taking a wrong decision due to
small sample size, sampling fluctuations, etc.
Type I error is the incorrect rejection of a true null hypothesis, i.e. we reject H0 when it is true, whereas
Type II error is the incorrect acceptance of a false null hypothesis, i.e. we accept H0 when it is false.

One Tailed and Two Tailed Tests:


When testing a hypothesis at a given level of significance, either a one-tailed test or a two-tailed test is used
for accepting or rejecting the hypothesis. A one-tailed test is used when the alternative hypothesis is directional
(the deviation from the reference value is considered in one direction only), so that the critical region lies
entirely in one tail of the distribution. Tests based on distributions with a single tail, such as the chi-square
distribution, are also one-tailed.
A two-tailed test is appropriate if the estimated value may lie on either side of the reference value. The critical
region is then split equally between the two tails of the distribution, such as the normal distribution.

Test of hypothesis:
Let $x$ be the observed number of successes in a sample of size $n$ and $\mu = np$ be the expected number of successes. Then
the standard normal variate $Z$ is defined as
\[
Z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{npq}}
\]

Test of hypothesis for means:


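Let $\mu_1, \mu_2$ be the means and $\sigma_1, \sigma_2$ be the standard deviations of two populations, and let $\bar{x}_1, \bar{x}_2$ be the
means of samples of sizes $n_1, n_2$ drawn from them. Then
\[
Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
\]
If the samples are drawn from the same population, then $\sigma_1 = \sigma_2 = \sigma$ and we have
\[
Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
\]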
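As a rough illustration of the formula for the difference of means, the sketch below evaluates $Z$ for two samples. The function name `z_two_means` and all the numbers in the example call are hypothetical; they are not data from these notes.

```python
from math import sqrt

def z_two_means(mean1, mean2, sigma1, sigma2, n1, n2):
    """Z statistic for the difference of two sample means with known population s.d.s."""
    return (mean1 - mean2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Hypothetical samples assumed to come from one population, so sigma1 = sigma2 = sigma.
sigma = 2.5
z = z_two_means(67.5, 68.0, sigma, sigma, 1000, 2000)
print(round(z, 3))

# Two-tailed decision at the 5% level, using the critical value 1.96 from the table of critical values.
print("reject H0" if abs(z) > 1.96 else "accept H0")
```

With equal standard deviations the general formula reduces to the same-population form, so a single function covers both cases.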

PROBLEMS
1) A coin is tossed 1000 times and head turns up 540 times. Decide on the hypothesis that the
coin is unbiased at 1 % level of significance.
Sol.
Let us suppose that the coin is unbiased,
and let p = the probability of getting a head in one toss = 1/2 = 0.5.
Since p + q = 1, q = 1 − p = 1/2 = 0.5.
Expected number of heads in 1000 tosses: μ = np = 1000 × 0.5 = 500, and npq = 250.
∴ The difference is x − μ = 540 − 500 = 40.
\[
Z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{npq}} = \frac{40}{\sqrt{250}} = 2.53 < 2.58
\]
The 1% level of significance corresponds to the 99% confidence level, with critical value 2.58.
Therefore we accept the hypothesis that the coin is unbiased.

2) A coin is tossed 400 times and turns up head 216 times. Test the hypothesis that the coin is unbiased
at 5%level of significance.
Sol.
Let us suppose that the coin is unbiased,
and let p = the probability of getting a head in one toss = 1/2 = 0.5.
Since p + q = 1, q = 1 − p = 1/2 = 0.5.
Expected number of heads in 400 tosses: μ = np = 400 × 0.5 = 200, and npq = 100.
∴ The difference is x − μ = 216 − 200 = 16.
\[
Z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{npq}} = \frac{16}{\sqrt{100}} = \frac{16}{10} = 1.6 < 1.96
\]
The critical value of Z at α = 0.05 is 1.96.
Therefore we accept the hypothesis that the coin is unbiased at the 5% level of significance.

3) A coin was tossed 1600 times and the tail turned up 864 times. Test the hypothesis that the
coin is unbiased at 1% level of significance.
Sol.
Let us suppose that the coin is unbiased,
and let p = the probability of getting a tail in one toss = 1/2 = 0.5.
Since p + q = 1, q = 1 − p = 1/2 = 0.5.
Expected number of tails in 1600 tosses: μ = np = 1600 × 0.5 = 800, and npq = 400.
∴ The difference is x − μ = 864 − 800 = 64.
\[
Z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{npq}} = \frac{64}{\sqrt{400}} = \frac{64}{20} = 3.2 > 2.58
\]
The 1% level of significance corresponds to the 99% confidence level, with critical value 2.58.
Therefore we reject the hypothesis that the coin is unbiased and conclude that the coin is biased.

4) In 324 throws of a six faced 'die' , an odd number turned up 181 times. Is it possible to think
that the 'die' is an unbiased one?
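Sol.
Let us suppose that the die is unbiased,
and let p = the probability of the turn up of an odd number = 3/6 = 1/2 = 0.5.
Since p + q = 1, q = 1 − p = 1/2 = 0.5.
Expected number of successes: μ = np = 324 × 0.5 = 162, and npq = 81.
∴ The difference is x − μ = 181 − 162 = 19.
\[
Z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{npq}} = \frac{19}{\sqrt{81}} = \frac{19}{9} = 2.11 < 2.58
\]
Thus we can conclude, at the 1% level of significance, that the die is unbiased. (At the 5% level, where the critical value is 1.96, the hypothesis would be rejected.)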

5) A die is thrown 9000 times and a throw of 3 or 4 was observed 3240 times. Show that the die
cannot be regarded as an unbiased one.
Sol.
The probability of getting 3 or 4 in a single throw is $p = \tfrac{2}{6} = \tfrac{1}{3}$,
and $q = 1 - p = 1 - \tfrac{1}{3} = \tfrac{2}{3}$.
∴ Expected number of successes $= np = \tfrac{1}{3} \times 9000 = 3000$
∴ The difference $= 3240 - 3000 = 240$
\[
Z = \frac{x - np}{\sqrt{npq}} = \frac{3240 - 9000 \times \tfrac{1}{3}}{\sqrt{9000 \times \tfrac{1}{3} \times \tfrac{2}{3}}} = \frac{240}{\sqrt{2000}} = 5.37
\]
Since Z = 5.37 > 2.58,
we conclude that the die is biased.
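The five problems above all evaluate the same statistic $Z = (x - np)/\sqrt{npq}$. The following is a minimal sketch that recomputes them in one pass; the function name `z_binomial` and the list `cases` are our own constructions, with the numbers taken from problems 1 to 5.

```python
from math import sqrt

def z_binomial(x, n, p):
    """Standard normal variate for x observed successes in n trials with success probability p."""
    q = 1 - p
    return (x - n * p) / sqrt(n * p * q)

# (observed successes, number of trials, p under the null hypothesis) for problems 1 to 5.
cases = [(540, 1000, 0.5), (216, 400, 0.5), (864, 1600, 0.5), (181, 324, 0.5), (3240, 9000, 1 / 3)]

for x, n, p in cases:
    z = z_binomial(x, n, p)
    verdict = "accept H0" if abs(z) < 2.58 else "reject H0"  # two-tailed test at the 1% level
    print(f"x = {x}, n = {n}: Z = {z:.2f}, {verdict}")
```

The loop applies the 1% critical value throughout for uniformity; problem 2 is stated at the 5% level, where the threshold would be 1.96 instead.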

Test of significance for proportion:

Test of significance of single proportion:


To test the significant difference between the sample proportion $p$ and the population proportion $P$,
we use the statistic
\[
Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}, \quad \text{where } P + Q = 1 \Rightarrow Q = 1 - P, \quad n = \text{sample size}
\]
The null and alternative hypotheses are formulated as $H_0: P =$ a specified value, $H_1: P \ne$ the specified value.

Test of significance of Difference between two sample proportions:


To test the significance of the difference between two sample proportions, the test statistic under the null
hypothesis $H_0$ that there is no significant difference between the two sample proportions is
\[
Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \text{ or } P = \frac{x_1 + x_2}{n_1 + n_2}, \text{ and } P + Q = 1
\]
Here $p_1$ and $p_2$ are the sample proportions in respect of an attribute corresponding to two large samples of sizes
$n_1$ and $n_2$ drawn from the two populations, and $x_1$, $x_2$ are the corresponding numbers of successes.

PROBLEMS
1. A coin is tossed 400 times and it turns up head 216 times. Discuss whether the coin may be regarded as unbiased one.
Sol.
Set the null hypothesis $H_0: P = \tfrac{1}{2}$
Set the alternative hypothesis $H_1: P \ne \tfrac{1}{2}$
The level of significance α = 0.05 (5%)
∴ The test statistic is $Z = \dfrac{p - P}{\sqrt{PQ/n}}$, where P + Q = 1 ⇒ Q = 1 − P
Under $H_0$ the coin turns up heads and tails in equal proportion:
\[
P = \tfrac{1}{2} \Rightarrow Q = 1 - P = \tfrac{1}{2}
\]
The coin turns up head 216 times when it is tossed n = 400 times:
\[
\therefore p = \frac{216}{400} = 0.54
\]
\[
\therefore Z = \frac{0.54 - 0.5}{\sqrt{\dfrac{0.5 \times 0.5}{400}}} = \frac{0.04}{\sqrt{0.000625}} = 1.6
\]
At 5% level, the tabulated value of $Z_\alpha$ is 1.96.
Since |Z| = 1.6 < 1.96,
the null hypothesis is accepted at 5% level of significance and the coin may be regarded as unbiased.

2. In a sample of 500 people from a city, 280 are tea drinkers and the rest are coffee drinkers. Can we assume that both coffee
and tea are equally popular in this city at 5% LOS?
Sol.
Set the null hypothesis $H_0: P = \tfrac{1}{2}$ (coffee and tea are equally popular)
Set the alternative hypothesis $H_1: P \ne \tfrac{1}{2}$
The level of significance α = 0.05 (5%)
∴ The test statistic is $Z = \dfrac{p - P}{\sqrt{PQ/n}}$, where P + Q = 1 ⇒ Q = 1 − P
\[
P = \tfrac{1}{2} \Rightarrow Q = 1 - P = \tfrac{1}{2}
\]
\[
\therefore p = \frac{280}{500} = 0.56, \quad \text{where } n = 500
\]
\[
\therefore Z = \frac{0.56 - 0.5}{\sqrt{\dfrac{0.5 \times 0.5}{500}}} = \frac{0.06}{\sqrt{0.0005}} = 2.68
\]
At 5% level, the tabulated value of $Z_\alpha$ is 1.96.
Since |Z| = 2.68 > 1.96,
the null hypothesis is rejected at 5% level of significance: coffee and tea are not equally popular in this city.

3. A manufacturing company claims that at least 95% of its products supplied conform to the specifications. Out of a
sample of 200 products, 18 are defective. Test the claim at 5% LOS.
Sol.
Set the null hypothesis $H_0: P = 95\% = 0.95$
Set the alternative hypothesis $H_1: P \ne 0.95$
The level of significance α = 0.05 (5%)
∴ The test statistic is $Z = \dfrac{p - P}{\sqrt{PQ/n}}$, where P + Q = 1 ⇒ Q = 1 − P
Given,
\[
P = 95\% = 0.95 \Rightarrow Q = 1 - P = 1 - 0.95 = 0.05
\]
18 products are found defective out of the sample of 200 products.
∴ The number of non-defective products = 200 − 18 = 182
\[
\therefore p = \frac{182}{200} = 0.91
\]
\[
\therefore Z = \frac{0.91 - 0.95}{\sqrt{\dfrac{0.95 \times 0.05}{200}}} = -\frac{0.04}{\sqrt{0.0002375}} = -2.5955
\]
At 5% level, the tabulated value of $Z_\alpha$ is 1.96.
Since |Z| = 2.5955 > 1.96,
the null hypothesis is rejected at 5% level of significance, i.e. the company's claim is not supported by the sample.

4. In a sample of 300 units of a manufactured product, 65 units were found to be defective, and in another sample of 200
units there were 35 defectives. Is there a significant difference in the proportion of defectives in the two samples at 5%
LOS?
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the Alternative hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2
The level of significance 𝛼 = 0.05 (5%)
Given
𝑛1 = 300, 𝑛2 = 200
In the sample of 300 units, 65 units were found to be defective:
$\therefore p_1 = \frac{65}{300} = 0.2167$
In the sample of 200 units, 35 units were found to be defective:
$\therefore p_2 = \frac{35}{200} = 0.1750$
We know that $P = \frac{x_1 + x_2}{n_1 + n_2}$
$\Rightarrow P = \frac{65 + 35}{300 + 200} = \frac{100}{500}$
$\Rightarrow P = 0.2$
$\Rightarrow Q = 1 - P = 1 - 0.2 = 0.8$
$\therefore Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\Rightarrow Z = \frac{0.2167 - 0.1750}{\sqrt{(0.2 \times 0.8)\left(\frac{1}{300} + \frac{1}{200}\right)}}$
$\Rightarrow Z = \frac{0.0417}{\sqrt{(0.16)(0.00833)}} = \frac{0.0417}{\sqrt{0.001333}} = \frac{0.0417}{0.03651}$
$\Rightarrow Z = 1.14$
At 5% level, the tabulated value of $Z_\alpha$ is 1.96
Since $|Z| = 1.14 < 1.96$
Hence, the null hypothesis is accepted at 5% level of significance
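The two-proportion test with a pooled proportion can be sketched as follows. The helper name `two_proportion_z` is an assumption made for illustration; it follows the formula $Z = (p_1 - p_2)/\sqrt{PQ(1/n_1 + 1/n_2)}$ used in these problems. Because the code carries full precision, its result can differ in the later decimals from a hand calculation that rounds intermediate values.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference of two sample proportions (pooled P)."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)      # pooled proportion
    Q = 1 - P
    standard_error = math.sqrt(P * Q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# 65 defectives out of 300 versus 35 defectives out of 200
z = two_proportion_z(65, 300, 35, 200)
print(round(z, 3), "reject H0" if abs(z) > 1.96 else "accept H0")
```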

5. In a large city A, 20% of a random sample of 900 school boys had a slight physical defect. In another large city B,
18.5% of a random sample of 1600 school boys had the same defect. Is the difference between the proportions
significant?
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the Alternative hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2
The level of significance 𝛼 = 0.05 (5%)
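Given
$n_1 = 900, n_2 = 1600$
$x_1$ = 20% of the random sample of 900 = 0.2 × 900 = 180
$x_2$ = 18.5% of the random sample of 1600 = 0.185 × 1600 = 296
$\therefore p_1 = 20\% = \frac{20}{100} = 0.2, \quad p_2 = 18.5\% = \frac{18.5}{100} = 0.185$
We know that $P = \frac{x_1 + x_2}{n_1 + n_2}$
$\Rightarrow P = \frac{180 + 296}{900 + 1600} = \frac{476}{2500}$
$\Rightarrow P = 0.1904 \Rightarrow Q = 1 - P = 1 - 0.1904 = 0.8096$
$\therefore Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\Rightarrow Z = \frac{0.2 - 0.185}{\sqrt{(0.1904 \times 0.8096)\left(\frac{1}{900} + \frac{1}{1600}\right)}}$
$\Rightarrow Z = \frac{0.015}{\sqrt{(0.1541)(0.00173)}} = \frac{0.015}{\sqrt{0.00026}} = \frac{0.015}{0.01612}$
$\Rightarrow Z = 0.9305$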
At 5% level, the tabulated value of 𝑍𝛼 is 1.96
Since |𝑍| = 0.9305 < 1.96
Hence, the null hypothesis 𝐻0 is accepted at 5% level of significance and hence there is no significant difference.

6. Before an increase in excise duty on tea, 800 persons out of a sample of 1000 persons were found to be tea drinkers.
After the increase in excise duty, 800 people were tea drinkers in a sample of 1200 people. Test whether there is a
significant decrease in the consumption of tea after the increase in excise duty at 5% LOS.
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the alternative hypothesis $H_1: P_1 > P_2$ (one-tailed, since we test for a decrease after the duty increase)
The level of significance 𝛼 = 0.05 (5%)
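Given
$n_1 = 1000, n_2 = 1200$ and $x_1 = 800, x_2 = 800$
$\therefore p_1 = \frac{800}{1000} = 0.8, \quad p_2 = \frac{800}{1200} = 0.667$
We know that $P = \frac{x_1 + x_2}{n_1 + n_2}$
$\Rightarrow P = \frac{800 + 800}{1000 + 1200} = \frac{1600}{2200}$
$\Rightarrow P = 0.7272 \Rightarrow Q = 1 - P = 1 - 0.7272 = 0.2728$
$\therefore Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\Rightarrow Z = \frac{0.8 - 0.667}{\sqrt{(0.7272 \times 0.2728)\left(\frac{1}{1000} + \frac{1}{1200}\right)}}$
$\Rightarrow Z = \frac{0.133}{\sqrt{(0.1983)(0.00183)}} = \frac{0.133}{\sqrt{0.00036}} = \frac{0.133}{0.0189}$
$\Rightarrow Z = 7.037$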
At 5% level, the tabulated value of Zα is 1.645.
Since |Z| = 7.037 > 1.645
Hence Null Hypothesis 𝐻0 is rejected at 5% level of significance.
There is a significant decrease in the consumption of tea after the increase in excise duty.

7. In a sample of 600 men from a certain city, 450 are found to be smokers. In another sample of 900 men from another city,
450 are smokers. Do the data indicate that the cities are significantly different with respect to the habit of smoking among
men? Test at 5% significance level.
(Warning: Smoking is injurious to health, causes cancer; tobacco causes painful death)
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the Alternative hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2
The level of significance 𝛼 = 0.05 (5%)
Given
$n_1 = 600, n_2 = 900$ and $x_1 = 450, x_2 = 450$
$\therefore p_1 = \frac{450}{600} = 0.75, \quad p_2 = \frac{450}{900} = 0.5$
We know that $P = \frac{x_1 + x_2}{n_1 + n_2}$
$\Rightarrow P = \frac{450 + 450}{600 + 900} = \frac{900}{1500} = 0.6$
$\Rightarrow P = 0.6 \Rightarrow Q = 1 - P = 1 - 0.6 = 0.4$
$\therefore Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\Rightarrow Z = \frac{0.75 - 0.5}{\sqrt{(0.6 \times 0.4)\left(\frac{1}{600} + \frac{1}{900}\right)}}$
$\Rightarrow Z = \frac{0.25}{\sqrt{(0.24)(0.00277)}} = \frac{0.25}{\sqrt{0.0006648}} = \frac{0.25}{0.02578}$
$\Rightarrow Z = 9.69$
At 5% level, the tabulated value of $Z_\alpha$ for this two-tailed test is 1.96.
Since $|Z| = 9.69 > 1.96$
Hence Null Hypothesis 𝐻0 is rejected at 5% level of significance.

8. One type of aircraft is found to develop engine trouble in 5 flights out of a total of 100, and another type in 7 flights
out of a total of 200 flights. Is there a significant difference between the two types of aircraft so far as engine defects
are concerned? Test at 5% significance level.
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the Alternative hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2
The level of significance 𝛼 = 0.05 (5%)
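Given
$n_1 = 100, n_2 = 200$ and $x_1 = 5, x_2 = 7$
$\therefore p_1 = \frac{5}{100} = 0.05, \quad p_2 = \frac{7}{200} = 0.035$
We know that $P = \frac{x_1 + x_2}{n_1 + n_2}$
$\Rightarrow P = \frac{5 + 7}{100 + 200} = \frac{12}{300}$
$\Rightarrow P = 0.04 \Rightarrow Q = 1 - P = 1 - 0.04 = 0.96$
$\therefore Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\Rightarrow Z = \frac{0.05 - 0.035}{\sqrt{(0.04 \times 0.96)\left(\frac{1}{100} + \frac{1}{200}\right)}}$
$\Rightarrow Z = \frac{0.015}{\sqrt{(0.0384)(0.015)}} = \frac{0.015}{\sqrt{0.000576}} = \frac{0.015}{0.024}$
$\Rightarrow Z = 0.625$
At 5% level, the tabulated value of $Z_\alpha$ for this two-tailed test is 1.96.
Since $|Z| = 0.625 < 1.96$
Hence the null hypothesis $H_0$ is accepted at 5% level of significance, i.e. there is no significant difference between
the two types of aircraft so far as engine defects are concerned.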

9. A machine produced 16 defective articles in a batch of 500. After overhauling it produced 3 defectives in a batch of
100. Has the machine improved?
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the alternative hypothesis $H_1: P_1 > P_2$ (one-tailed, since we test for an improvement)
The level of significance 𝛼 = 0.01 (1%)
Given
$n_1 = 500, n_2 = 100$ and $x_1 = 16, x_2 = 3$
$\therefore p_1 = \frac{16}{500} = 0.032, \quad p_2 = \frac{3}{100} = 0.03$
We know that $P = \frac{x_1 + x_2}{n_1 + n_2}$
$\Rightarrow P = \frac{16 + 3}{500 + 100} = \frac{19}{600}$
$\Rightarrow P = 0.03166 \Rightarrow Q = 1 - P = 1 - 0.03166 = 0.96834$
$\therefore Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\Rightarrow Z = \frac{0.032 - 0.03}{\sqrt{(0.03166 \times 0.96834)\left(\frac{1}{500} + \frac{1}{100}\right)}}$
$\Rightarrow Z = \frac{0.002}{\sqrt{(0.03065)(0.012)}} = \frac{0.002}{\sqrt{0.0003678}} = \frac{0.002}{0.01917}$
$\Rightarrow Z = 0.1043$
At 1% level, the tabulated value of $Z_\alpha$ for a one-tailed test is 2.33.
Since $|Z| = 0.1043 < 2.33$
Hence the null hypothesis $H_0$ is accepted at 1% level of significance.
$\therefore$ The data do not show that the machine has improved after overhauling.

10. A machine produced 25 defective articles in a batch of 400. After over hauling it produced 15 defectives in a batch
of 200. Test at 1% level of significance whether there is a reduction of defective articles after overhauling.
Sol.

Set the null hypothesis $H_0: P_1 = P_2$
Set the alternative hypothesis $H_1: P_1 > P_2$ (one-tailed, since we test for a reduction in defectives)
The level of significance 𝛼 = 0.01 (1%)
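Given
$n_1 = 400, n_2 = 200$ and $x_1 = 25, x_2 = 15$
$\therefore p_1 = \frac{25}{400} = 0.0625, \quad p_2 = \frac{15}{200} = 0.075$
We know that $P = \frac{x_1 + x_2}{n_1 + n_2}$
$\Rightarrow P = \frac{25 + 15}{400 + 200} = \frac{40}{600}$
$\Rightarrow P = 0.0666 \Rightarrow Q = 1 - P = 1 - 0.0666 = 0.9334$
$\therefore Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$\Rightarrow Z = \frac{0.0625 - 0.075}{\sqrt{(0.0666 \times 0.9334)\left(\frac{1}{400} + \frac{1}{200}\right)}}$
$\Rightarrow Z = -\frac{0.0125}{\sqrt{(0.0621)(0.0075)}} = -\frac{0.0125}{\sqrt{0.00046}} = -\frac{0.0125}{0.0214}$
$\Rightarrow Z = -0.5841$
At 1% level, the tabulated value of $Z_\alpha$ for a one-tailed test is 2.33.
Since $|Z| = 0.5841 < 2.33$
Hence the null hypothesis $H_0$ is accepted at 1% level of significance, i.e. there is no evidence of a reduction in
defective articles after overhauling.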

11. In an examination given to students at a large number of different schools the mean grade was 74.5 and S.D grade
was 8. At one particular school where 200 students took the examination the mean grade was 75.9. Discuss the
significance of this result at both 5% and 1% level of significance.
Sol.
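Set the null hypothesis $H_0: \mu = 74.5$ (the mean grade of this school does not differ from the overall mean)
Set the alternative hypothesis $H_1: \mu \neq 74.5$
The level of significance $\alpha = 0.05$ (5%) $\Rightarrow Z_{0.05} = 1.96$
The level of significance $\alpha = 0.01$ (1%) $\Rightarrow Z_{0.01} = 2.58$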
Given
$n = 200$
$\sigma = 8$
$\mu = 74.5$ and $\bar{x} = 75.9$
We calculate Z through the test statistic,
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$\Rightarrow Z = \frac{75.9 - 74.5}{8/\sqrt{200}} = \frac{1.4}{8/14.1421} = \frac{1.4 \times 14.1421}{8}$
$\Rightarrow Z = 2.4748$
i) At 5% level, the tabulated value of $Z_\alpha$ is 1.96.
Since $|Z| = 2.4748 > 1.96$
Hence the null hypothesis is rejected at 5% level of significance.
ii) At 1% level, the tabulated value of $Z_\alpha$ is 2.58.
Since $|Z| = 2.4748 < 2.58$
Hence the null hypothesis is accepted at 1% level of significance.
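The test of a sample mean against a known population mean and standard deviation can be checked with a short Python sketch. The function names `mean_z` and `two_tailed_critical` are assumptions introduced for illustration; the critical values are taken from the standard normal distribution in Python's `statistics` module rather than from a printed table (about 1.96 at 5% and 2.58 at 1% for a two-tailed test).

```python
import math
from statistics import NormalDist

def mean_z(sample_mean, mu, sigma, n):
    """Z statistic for a sample mean when the population S.D. is known."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

def two_tailed_critical(alpha):
    """Two-tailed critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z = mean_z(75.9, 74.5, 8, 200)   # school example: n = 200, mean grade 75.9
for alpha in (0.05, 0.01):
    crit = two_tailed_critical(alpha)
    decision = "reject H0" if abs(z) > crit else "accept H0"
    print(f"alpha={alpha}: |Z|={abs(z):.4f}, critical={crit:.3f} -> {decision}")
```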

12. Intelligence tests were given to two groups of boys and girls:
          Mean    S.D.    Size
Girls     75      8       60
Boys      73      10      100
Find out if the two means differ significantly at 5% level of significance.
Soln:
Set the null hypothesis $H_0: \mu_1 = \mu_2$
Set the alternative hypothesis $H_1: \mu_1 \neq \mu_2$
where $\mu_1$ refers to the girls and $\mu_2$ refers to the boys.
Given, the means, S.D.s and sizes of the two groups of girls and boys are as follows,
$\bar{x}_1 = 75, \bar{x}_2 = 73, \sigma_1 = 8, \sigma_2 = 10, n_1 = 60, n_2 = 100$
WKT, $Z = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{73 - 75}{\sqrt{\frac{64}{60} + \frac{100}{100}}} = -\frac{2}{\sqrt{2.07}} = -1.3898$

Thus At 5% level, the tabulated value of Zα is 1.96.


Since |Z| = 1.3898 < 1.96
Hence, the null hypothesis is accepted at 5% level of significance, i.e., there is no significant difference.
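The comparison of two sample means can be sketched in the same way. The helper `two_mean_z` is an assumption for illustration; it uses the statistic $Z = (\bar{x}_2 - \bar{x}_1)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ from the solution above, so small differences in the last decimals relative to the hand calculation come only from rounding.

```python
import math

def two_mean_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference of two sample means (large samples)."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean2 - mean1) / standard_error

# Girls: mean 75, S.D. 8, size 60; Boys: mean 73, S.D. 10, size 100
z = two_mean_z(75, 8, 60, 73, 10, 100)
print(round(z, 4), "reject H0" if abs(z) > 1.96 else "accept H0")
```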

***


||Jai Sri Gurudev||


S.J.C. INSTITUTE OF TECHNOLOGY, CHICKBALLAPUR
DEPARTMENT OF MATHEMATICS
MATHEMATICS-3 FOR COMPUTER SCIENCE (BCS301)
MODULE - 4
STATISTICAL INFERENCE -2
Prepared by:
Purushotham P
Assistant Professor
SJC Institute of Technology
Email id: [email protected]

Statistic: Any function of the sample values is known as a statistic.
Eg: Sample mean, sample median, sample variance etc. are all statistics.

Sampling Distribution: A sampling distribution is the distribution of a statistic over all possible
samples. That is, the sampling distribution is the probability distribution of the statistic.

Sampling Variables: Variables sampling is the process used to predict the value of a specific
variable within a population. For example, a limited sample size can be used to compute the average
accounts receivable balance, as well as a statistical derivation of the plus or minus range of the total
receivables value that is under review.

The Central Limit Theorem: Suppose that a sample of size $n$ is selected from a population that
has mean $\mu$ and standard deviation $\sigma$. Let $x_1, x_2, x_3, \ldots, x_n$ be the $n$ observations, which
are independent and identically distributed, with sample mean $\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
The central limit theorem states that, when the sample size is large ($n \geq 30$), the sample mean $\bar{X}$
approximately follows the normal distribution with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$
(also called the standard error), i.e. $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, where $\mu$ and $\sigma$ are the
mean and standard deviation of the population from which the sample was selected.

Degrees of freedom: Degrees of freedom refer to the maximum number of logically independent
values, which may vary in a data sample. Degrees of freedom are calculated by subtracting one
from the number of items within the data sample (𝑛 -1).

Description           Population notation    Sample notation
Size                  $N$                    $n$
Mean                  $\mu$                  $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance              $\sigma^2$             $s^2 = \frac{1}{n-1}\sum(x - \bar{x})^2$
Standard deviation    $\sigma$               $s = \sqrt{\frac{1}{n-1}\sum(x - \bar{x})^2}$

Confidence Intervals:
Suppose we want to estimate an actual population mean $\mu$. We can only obtain $\bar{x}$, the
mean of a sample randomly selected from the population of interest. We can use $\bar{x}$ to find a range of
values,
Lower value < population mean $\mu$ < Upper value,
that we can be confident contains the population mean $\mu$. The range of values is called a
"confidence interval."
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Confidence interval C.I. = Mean ± Z (Standard Deviation / √(Sample Size)), i.e. $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
Confidence Level    99%     98%     95%     90%      50%
Z                   2.58    2.33    1.96    1.645    0.6745
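The Z values in the table above can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `z_for_confidence` is an assumption introduced here, and the printed values are unrounded versions of the table entries (for example 2.5758 for 99%, which the table rounds to 2.58).

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Z such that the central area under the standard normal curve equals `level`."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.99, 0.98, 0.95, 0.90, 0.50):
    print(f"{level:.0%}: Z = {z_for_confidence(level):.4f}")
```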

PROBLEMS
1. State Central limit theorem. Use the theorem to evaluate P[50 < 𝑿̄ < 56] where 𝑿̄ represents
the mean of a random sample of size 100 from an infinite population with mean 𝜇 = 53 and
variance 𝝈𝟐 = 400.
Sol.
The central limit theorem states that the sample mean $\bar{x}$ approximately follows the normal
distribution with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$ (also called the standard error), i.e., $\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$,
where $\mu$ and $\sigma$ are the mean and standard deviation of the population from which the sample is drawn.
Given,
Sample size n = 100
Mean of the population $\mu = 53$
Variance of the population $\sigma^2 = 400 \Rightarrow \sigma = \sqrt{400} = 20$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
$\Rightarrow \bar{X} \sim N\left(53, \frac{20}{\sqrt{100}}\right)$
$\Rightarrow \bar{X} \sim N(53, 2)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$\Rightarrow Z = \frac{\bar{X} - 53}{20/\sqrt{100}} = \frac{\bar{X} - 53}{2}$
$\therefore$ At $\bar{X} = 50 \Rightarrow Z = \frac{50 - 53}{2} = -\frac{3}{2} = -1.5 = z_1$
At $\bar{X} = 56 \Rightarrow Z = \frac{56 - 53}{2} = \frac{3}{2} = 1.5 = z_2$
$\therefore P(50 < \bar{X} < 56) = P(-1.5 < z < 1.5)$
$= 2P(0 < z < 1.5)$
$= 2A(1.5)$
$= 2 \times 0.4332$
$\therefore P(50 < \bar{X} < 56) = 0.8664$
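This probability can be cross-checked numerically. The sketch below assumes, as the central limit theorem states, that the sample mean is normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$; the function name `sample_mean_prob` is introduced here only for illustration.

```python
import math
from statistics import NormalDist

def sample_mean_prob(mu, sigma, n, low, high):
    """P(low < sample mean < high) under the normal approximation of the CLT."""
    dist = NormalDist(mu, sigma / math.sqrt(n))   # standard error = sigma / sqrt(n)
    return dist.cdf(high) - dist.cdf(low)

p = sample_mean_prob(53, 20, 100, 50, 56)
print(round(p, 4))   # 0.8664
```

Changing the arguments gives the other probabilities of this type; one-sided probabilities such as $P(\bar{X} > a)$ follow from `1 - dist.cdf(a)`.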

2. An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size 𝑛
= 25 are drawn randomly from the population. Find the probability that the sample mean is
between 85 and 92.
Sol.
Given,
Sample size n=25
Mean of the population 𝜇 = 90
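Standard deviation of the population $\sigma = 15$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
$\Rightarrow \bar{X} \sim N\left(90, \frac{15}{\sqrt{25}}\right)$
$\Rightarrow \bar{X} \sim N(90, 3)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$\Rightarrow Z = \frac{\bar{X} - 90}{15/\sqrt{25}} = \frac{\bar{X} - 90}{3}$
$\therefore$ At $\bar{X} = 85 \Rightarrow z = \frac{85 - 90}{3} = -\frac{5}{3} = -1.67$
$\therefore$ At $\bar{X} = 92 \Rightarrow z = \frac{92 - 90}{3} = \frac{2}{3} = 0.67$
$\therefore P(85 < \bar{X} < 92) = P(-1.67 < z < 0.67)$
$\Rightarrow P(-1.67 < z < 0.67) = P(0 < z < 1.67) + P(0 < z < 0.67)$
$= 0.4525 + 0.2486$
$\Rightarrow P(-1.67 < z < 0.67) = 0.7011$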

3. A random sample of size 64 is taken from an infinite population having mean 112 and
variance 144. Using central limit theorem, find the probability of getting the sample mean
𝑋̅ greater than 114.5.
Sol.
Given,
Sample size n=64
Mean of the population 𝜇 = 112
Variance of the population $\sigma^2 = 144 \Rightarrow \sigma = 12$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
$\Rightarrow \bar{X} \sim N\left(112, \frac{12}{\sqrt{64}}\right)$
$\Rightarrow \bar{X} \sim N(112, 1.5)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$\Rightarrow Z = \frac{\bar{X} - 112}{12/\sqrt{64}} = \frac{\bar{X} - 112}{1.5}$
$\therefore$ At $\bar{X} = 114.5 \Rightarrow z = \frac{114.5 - 112}{1.5} = 1.67$
$\therefore P(\bar{X} > 114.5) = P(z > 1.67)$
$\Rightarrow P(z > 1.67) = 0.5 - P(0 < z < 1.67)$
$= 0.5 - 0.4525$
$\Rightarrow P(z > 1.67) = 0.0475$

4. Let 𝑿̄denote the mean of a random sample of size 100 from a distribution, that is 𝝌𝟐 (𝟓𝟎).
Compute an approximate value of P(49<𝑿̄<51).
Sol.
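The sample size is n = 100
The distribution is chi-square, $X \sim \chi^2(50)$, where d.f. = 50
The mean of a chi-square distribution equals its d.f., so $\mu = 50$
The variance is $\sigma^2 = 2 \times d.f. = 2 \times 50 = 100 \Rightarrow \sigma = 10$
By the central limit theorem, the sample mean approximately follows the normal distribution with mean $\mu$ and
standard error $\frac{\sigma}{\sqrt{n}}$.
$\therefore \bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
$\Rightarrow \bar{X} \sim N\left(50, \frac{10}{\sqrt{100}}\right)$
$\Rightarrow \bar{X} \sim N(50, 1)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$\therefore$ At $\bar{X} = 49 \Rightarrow Z = \frac{49 - 50}{1} = -1 = z_1$
At $\bar{X} = 51 \Rightarrow Z = \frac{51 - 50}{1} = 1 = z_2$
$\therefore P(49 < \bar{X} < 51) = P(-1 < z < 1)$
$= 2P(0 < z < 1)$
$= 2A(1)$
$= 2 \times 0.3413$
$\therefore P(49 < \bar{X} < 51) = 0.6826$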
5. An electrical firm manufactures light bulbs that have a length of life that is approximately
normally distributed with mean 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an average life of less than 775
hours.
Sol.
Number of bulbs in the sample n = 16
Mean life of the bulbs $\mu = 800$
Standard deviation of the bulb life $\sigma = 40$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
$\Rightarrow \bar{X} \sim N\left(800, \frac{40}{\sqrt{16}}\right)$
$\Rightarrow \bar{X} \sim N(800, 10)$
$\therefore$ We know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 800}{10}$
$\therefore$ At $\bar{X} = 775 \Rightarrow Z = \frac{775 - 800}{10} = -\frac{25}{10} = -2.5$
$\therefore P(\bar{X} < 775) = P(z < -2.5)$
$\Rightarrow P(z < -2.5) = P(z > 2.5)$
$\Rightarrow P(z < -2.5) = 0.5 - P(0 < z < 2.5)$
$\Rightarrow P(z < -2.5) = 0.5 - A(2.5)$
$\Rightarrow P(z < -2.5) = 0.5 - 0.4938$
$\Rightarrow P(z < -2.5) = 0.0062$

6. The heights of a random sample of 50 college students showed a mean of 174.5 centimeters
and a standard deviation of 6.9 centimeters. Construct a 99% confidence interval for the
mean height of all college students.
Sol.
Given the sample size n = 50
Sample mean height of the students $\bar{x} = 174.5$ cm
Sample standard deviation of the heights $\sigma = 6.9$ cm
We know that for a confidence level of 99% the corresponding z value is 2.576. This is
determined from the normal distribution table.
Confidence interval C.I. = Mean ± Z (Standard Deviation / √(Sample Size)) $= \bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$
$\therefore C.I. = 174.5 \pm \left(2.576 \times \frac{6.9}{\sqrt{50}}\right)$
$\Rightarrow C.I. = 174.5 \pm (2.576 \times 0.9758)$
$\Rightarrow C.I. = 174.5 \pm 2.5136$
The lower end of the confidence interval is = 174.5 − 2.5136 = 171.9864
The upper end of the confidence interval is = 174.5 + 2.5136 = 177.0136
Therefore, with 99% confidence interval, the mean height of all college students is
between 171.9864 centimeters and 177.0136 centimeters.
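A confidence interval of this form can be computed with a small helper. This is a sketch under the same large-sample normal assumption as the solution; the function name `confidence_interval` is introduced here for illustration, and it takes Z from the normal distribution directly, so its limits may differ from table-based values in the fourth decimal.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, level):
    """Normal-theory confidence interval: mean ± Z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 2.576 for level = 0.99
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval(174.5, 6.9, 50, 0.99)
print(f"({low:.4f}, {high:.4f})")
```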

7. The mean and SD of the diameters of a sample of 250 rivet heads manufactured by a
company are 7.2642 mm and 0.0058 mm respectively. Find,
a) 99% b) 98% c) 95% d) 90% e) 50%
Confidence limits for the mean diameter of all the rivet heads manufactured by the
company.
Sol.
Given the sample size n = 250
Sample mean of the diameters $\bar{x} = 7.2642$ mm
Sample standard deviation of the diameters $\sigma = 0.0058$ mm
We know that,

Confidence Level    99%     98%     95%     90%      50%
Z                   2.58    2.33    1.96    1.645    0.6745

Confidence interval C.I. = Mean ± Z (Standard Deviation / √(Sample Size)) $= \bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$, where $\frac{\sigma}{\sqrt{n}} = \frac{0.0058}{\sqrt{250}} = 0.000367$

Confidence Level    C.I. $= \bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$                         Final C.I.            Interval
99%                 $7.2642 \pm (2.58 \times \frac{0.0058}{\sqrt{250}})$      7.2642 ± 0.00095      (7.26325, 7.26515)
98%                 $7.2642 \pm (2.33 \times \frac{0.0058}{\sqrt{250}})$      7.2642 ± 0.00085      (7.26335, 7.26505)
95%                 $7.2642 \pm (1.96 \times \frac{0.0058}{\sqrt{250}})$      7.2642 ± 0.00072      (7.26348, 7.26492)
90%                 $7.2642 \pm (1.645 \times \frac{0.0058}{\sqrt{250}})$     7.2642 ± 0.00060      (7.26360, 7.26480)
50%                 $7.2642 \pm (0.6745 \times \frac{0.0058}{\sqrt{250}})$    7.2642 ± 0.00025      (7.26395, 7.26445)

8. A random sample of size 25 from a normal distribution (𝜎 2 = 4) yields, sample mean 𝑋̅ = 78.3.
Obtain a 99% confidence interval for 𝜇.
Sol.
Given the sample size n=25
Mean of sample 𝑋̄ = 78.3
Standard deviation 𝜎 = 2
We know that for a confidence level of 99% the corresponding z value is 2.58. This is determined
from the normal distribution table.
Confidence interval C.I. = Mean ± Z (Standard Deviation / √(Sample Size)), i.e. $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
$\therefore \mu = 78.3 \pm \left(2.58 \times \frac{2}{\sqrt{25}}\right)$
$\Rightarrow \mu = 78.3 \pm 1.032$
$\Rightarrow C.I. = (78.3 - 1.032, 78.3 + 1.032) = (77.268, 79.332)$

9. Let the observed value of the mean 𝑿̄of a random sample of size 20 from a normal
distribution with 𝑚𝑒𝑎𝑛 𝜇 and variance 𝜎 2 = 80 be 81.2. Find a 90% and 95% confidence
intervals for 𝜇.
Sol.
Given the sample size n=20
Mean of sample 𝑋̄ = 81.2
Variance 𝜎 2 = 80 ⇒ 𝜎 = √80 = 8.9442
We know that for confidence levels of 95% and 90% the corresponding z values are 1.96 and 1.645. These are
determined from the normal distribution table.
Confidence interval C.I. = Mean ± Z (Standard Deviation / √(Sample Size)), i.e. $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
For 95%:
$\therefore \mu = 81.2 \pm \left(1.96 \times \frac{8.9442}{\sqrt{20}}\right)$
$\Rightarrow \mu = 81.2 \pm 3.92$
$\Rightarrow C.I. = (81.2 - 3.92, 81.2 + 3.92) = (77.28, 85.12)$

For 90%:
$\therefore \mu = 81.2 \pm \left(1.645 \times \frac{8.9442}{\sqrt{20}}\right)$
$\Rightarrow \mu = 81.2 \pm 3.29$
$\Rightarrow C.I. = (81.2 - 3.29, 81.2 + 3.29) = (77.91, 84.49)$

10. Suppose that 10, 12, 16, 19 is a sample taken from a normal population with variance 6.25.
Find at 95% confidence interval for the population mean.
Sol.
Given samples are 10, 12, 16 and 19
Therefore, sample size n=4
Mean 𝑋̄ =14.25
Variance 𝜎 2 = 6.25 ⇒ 𝜎 = √6.25 = 2.5
We know that for a confidence level of 95% the corresponding z value is 1.96. This is determined
from the normal distribution table.
Confidence interval C.I. = Mean ± Z (Standard Deviation / √(Sample Size)), i.e. $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
$\therefore \mu = 14.25 \pm \left(1.96 \times \frac{2.5}{\sqrt{4}}\right)$
$\Rightarrow \mu = 14.25 \pm 2.45$
$\Rightarrow C.I. = (14.25 - 2.45, 14.25 + 2.45) = (11.80, 16.70)$

SAMPLING DISTRIBUTIONS
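Student's $t$-distribution:
Let $\mu$ be the mean of the population, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the mean and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ be
the standard deviation of a sample; then the Student's $t$ statistic is defined as
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - \mu}{s}\sqrt{n}$

Another formula, for the $t$-test of two samples, is
$t = \frac{\bar{x}_2 - \bar{x}_1}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where $s^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}$ or $s = \sqrt{\frac{1}{n_1 + n_2 - 2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x}_1)^2 + \sum_{i=1}^{n_2}(x_i - \bar{x}_2)^2\right]}$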

Chi-square distribution:
Let $O_i$ $(i = 1, 2, 3, \ldots, n)$ and $E_i$ $(i = 1, 2, 3, \ldots, n)$ be the sets of observed frequencies and expected
frequencies respectively; then the chi-square statistic is defined as
$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \cdots + \frac{(O_n - E_n)^2}{E_n}$
$\Rightarrow \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$

F-Distribution:
The F-distribution is useful in hypothesis testing. Hypothesis testing is used by scientists to
statistically compare data from two or more populations. The F-distribution is needed to determine
whether the F-value for a study indicates any statistically significant differences between two
populations.

The F-test is used to determine whether two independent estimates of population variance differ
significantly. In this case, the F-ratio is $F = \frac{\sigma_1^2}{\sigma_2^2}$, where $\sigma^2 = \frac{1}{n}\sum(x - \mu)^2$, or
$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$,
where $\sigma_1$ = Standard deviation of population-1
$\sigma_2$ = Standard deviation of population-2
$s_1$ = Standard deviation of sample-1
$s_2$ = Standard deviation of sample-2
It is also used to find out whether two samples drawn from normal populations have the same
variance. In this case, the F-ratio is
$F = \frac{s_1^2}{s_2^2}$, where $s_1^2 = \frac{1}{n_1 - 1}\sum(x - \bar{x})^2$, $s_2^2 = \frac{1}{n_2 - 1}\sum(y - \bar{y})^2$
It should be noted that the numerator is always taken to be the larger variance in the F-ratio:
$F = \frac{\text{Larger Variance}}{\text{Smaller Variance}}$
$n_1$ = d.f. for the sample having the larger variance
$n_2$ = d.f. for the sample having the smaller variance
Expected value of F:
$F_E = \frac{s_2^2}{s_1^2}$ follows the F-distribution with $v_1 = n_1 - 1$, $v_2 = n_2 - 1$ d.f.
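The F-ratio of two sample variances can be sketched as follows. The function name `f_ratio` is an assumption for illustration, and the two data lists are the samples quoted in problem 23 of these notes. The sketch only forms the ratio (larger variance over smaller) and the two degrees of freedom; the critical value still has to be read from an F table.

```python
from statistics import variance

def f_ratio(sample_a, sample_b):
    """Return (F, d.f. of numerator, d.f. of denominator) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)   # divisor n - 1
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

sample_1 = [20, 16, 26, 27, 22, 23, 18, 24, 19, 25]
sample_2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]
F, df_num, df_den = f_ratio(sample_1, sample_2)
print(round(F, 4), df_num, df_den)
```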

11. A certain stimulus administered to each of the 12 patients resulted in the following change
in the blood pressure 5,2,8,-1,3,0,6,-2,1,5,0,4. Can it be concluded that the stimulus will
increase the blood pressure? (Note: t0.05 for 11 d.f. is 2.201).
Sol.
Given the change in blood pressure
$x$: 5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4
$\therefore \bar{x} = \frac{1}{n}\sum x = \frac{31}{12} = 2.5833$
Variance, $s^2 = \frac{1}{n-1}\sum(x - \bar{x})^2$
$\Rightarrow s^2 = \frac{1}{11}\{(5 - 2.58)^2 + (2 - 2.58)^2 + (8 - 2.58)^2 + (-1 - 2.58)^2 + (3 - 2.58)^2 + (0 - 2.58)^2 + (6 - 2.58)^2$
$+ (-2 - 2.58)^2 + (1 - 2.58)^2 + (5 - 2.58)^2 + (0 - 2.58)^2 + (4 - 2.58)^2\}$
$\Rightarrow s^2 = 9.538 \Rightarrow s = 3.088$
Let us suppose that the stimulus administration is not accompanied by an increase in blood
pressure, i.e. we take $\mu = 0$.
We have,
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$\Rightarrow t = \frac{2.5833 - 0}{3.088/\sqrt{12}}$
$\Rightarrow t = 2.8979 \approx 2.9 > 2.201$
Hence the hypothesis is rejected at 5%level of significance. We conclude with 95%
Confidence that the stimulus in general is accompanied with increase of blood pressure.
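The one-sample $t$ statistic can be computed directly from the raw data. This is a minimal sketch; the helper name `one_sample_t` is an assumption, and `statistics.stdev` uses the divisor $n - 1$, matching the definition of $s$ in these notes.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu):
    """Student's t for a single sample against a hypothesised mean mu."""
    n = len(data)
    return (mean(data) - mu) / (stdev(data) / math.sqrt(n))

changes = [5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4]   # blood pressure changes
t = one_sample_t(changes, 0)
print(round(t, 2), "reject H0" if abs(t) > 2.201 else "accept H0")   # 2.201: t(0.05), 11 d.f.
```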

12. A random sample of 10 boys had the following I.Q: 70, 120, 110, 101, 88, 83, 95, 98, 107,
100. Does this data support the assumption of a population mean I.Q. of 100 at 5% level of
Significance? (Note:𝒕𝟎.𝟎𝟓 =2.262 for 9 d.f.).
Sol.
Given the I.Q. of 10 boys
$x$: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100
$\therefore \bar{x} = \frac{1}{n}\sum x = \frac{972}{10} = 97.2$
Variance, $s^2 = \frac{1}{n-1}\sum(x - \bar{x})^2$
$\Rightarrow s^2 = \frac{1}{9} \times 1833.6$
$\Rightarrow s^2 = 203.73333$
$\Rightarrow s = 14.2735$
Given the mean of the population $\mu = 100$
We have,
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$\Rightarrow t = \frac{97.2 - 100}{14.2735/\sqrt{10}}$
$\Rightarrow t = \frac{-2.8}{4.5136} \approx -0.6203$
Since $|t| = 0.6203 < 2.262$, the hypothesis is accepted at 5% level of significance, i.e. the data support the
assumption of a population mean I.Q. of 100.

13. Ten individuals are chosen at random from a population and their heights in inches are
found to be 63, 63,66,67,68,69,70,70,71,71 . Test the hypothesis that the mean height of the
universe is 66inches (𝒕𝟎.𝟎𝟓 =2.262 for 9 d.f.).
Sol.
Given the heights of the sampled individuals in inches
$x$: 63, 63, 66, 67, 68, 69, 70, 70, 71, 71
$\therefore \bar{x} = \frac{1}{n}\sum x = \frac{678}{10} = 67.8$
Variance, $s^2 = \frac{1}{n-1}\sum(x - \bar{x})^2$
$\Rightarrow s^2 = \frac{1}{9}\{(63 - 67.8)^2 + (63 - 67.8)^2 + (66 - 67.8)^2 + (67 - 67.8)^2 + (68 - 67.8)^2 + (69 - 67.8)^2$
$+ (70 - 67.8)^2 + (70 - 67.8)^2 + (71 - 67.8)^2 + (71 - 67.8)^2\}$
$\Rightarrow s^2 = 9.067 \Rightarrow s = 3.011$
And given the mean of the population $\mu = 66$
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$\Rightarrow t = \frac{67.8 - 66}{3.011/\sqrt{10}}$
We have $\Rightarrow t = 1.8904 \approx 1.89 < 2.262$
Thus, the hypothesis is accepted at 5% level of significance.

14. The nine items of a sample have the following values: 45, 47, 50, 52, 48, 47, 49, 53, 51. Does
the mean of these differ significantly from the assumed mean of 47.5 at 5% significance
level?
Sol.
Given sample values: 45, 47, 50, 52, 48, 47, 49, 53, 51
Therefore, sample size n=9
Population mean $\mu = 47.50$
$\therefore$ Sample mean $\bar{x} = \frac{1}{n}\sum x = \frac{442}{9} = 49.11$
Variance, $s^2 = \frac{1}{n-1}\sum(x - \bar{x})^2$
$\Rightarrow s^2 = \frac{1}{8}\{(45 - 49.11)^2 + (47 - 49.11)^2 + (50 - 49.11)^2 + (52 - 49.11)^2 + (48 - 49.11)^2$
$+ (47 - 49.11)^2 + (49 - 49.11)^2 + (53 - 49.11)^2 + (51 - 49.11)^2\}$
$\Rightarrow s^2 = \frac{54.9}{8} = 6.8625 \Rightarrow s = \sqrt{6.8625} = 2.6196$
$\therefore$ The null hypothesis $H_0: \mu = 47.5$
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$\Rightarrow t = \frac{49.11 - 47.5}{2.6196/\sqrt{9}}$
$\Rightarrow t = \frac{1.61}{0.8732}$
$\Rightarrow t = 1.8437$

∴ Level of significance =5%


Critical value at 5 % level of significance for v=9-1=8 degrees of freedom is 2.3060.
Since the calculated value 1.8437 is less than the tabulated value 2.3060.
Hence the Null hypothesis is accepted.

15. Two types of batteries are tested for their length of life and the following results are
obtained:
Battery A: 𝒏𝟏 = 𝟏𝟎, 𝒙̄ 𝟏 = 𝟓𝟎𝟎𝒉𝒓𝒔. , 𝝈𝟏 𝟐 = 𝟏𝟎𝟎
Battery B: 𝒏𝟐 = 𝟏𝟎, 𝒙̄ 𝟐 = 𝟓𝟔𝟎𝒉𝒓𝒔. , 𝝈𝟐 𝟐 = 𝟏𝟐𝟏
Compute Student's t and test whether there is a significant difference in the two means.
Sol.
Given
Battery A: 𝑛1 = 10, 𝑥̄ 1 = 500ℎ𝑟𝑠. , 𝜎1 2 = 100
Battery B: 𝑛2 = 10, 𝑥̄ 2 = 560ℎ𝑟𝑠. , 𝜎2 2 = 121
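We know that,
$s^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}$
$\Rightarrow s^2 = \frac{(10 \times 100) + (10 \times 121)}{10 + 10 - 2}$
$\Rightarrow s^2 = 122.78$
$\Rightarrow s = 11.0805$
We have,
$t = \frac{\bar{x}_2 - \bar{x}_1}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
$\Rightarrow t = \frac{560 - 500}{11.0805\sqrt{0.1 + 0.1}}$
$\Rightarrow t = 12.1081 \approx 12.11$
The value of t is greater than the table value of t for 18 d.f. at all levels of significance, so there is a
significant difference between the two means.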
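The two-sample $t$ statistic from summary figures can be sketched as below. The helper `pooled_t` is an assumption for illustration; it uses the pooled estimate $s^2 = (n_1\sigma_1^2 + n_2\sigma_2^2)/(n_1 + n_2 - 2)$ exactly as written in these notes, with the sample variances passed in directly.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t with the pooled variance used in these notes; returns (t, d.f.)."""
    s2 = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)   # pooled variance
    s = math.sqrt(s2)
    t = (mean2 - mean1) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Battery A: n = 10, mean 500, variance 100; Battery B: n = 10, mean 560, variance 121
t, df = pooled_t(500, 100, 10, 560, 121, 10)
print(round(t, 2), df)
```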

16. A group of boys and girls were given an intelligence test. The mean score , SD score and
numbers in each group are as follows.
Boys Girls
Mean 74 70
SD 8 10
n 12 10

Is the difference between the means of the two groups significant at 5% level of significance
(𝒕𝟎.𝟎𝟓 = 𝟐. 𝟎𝟖𝟔 for 20 𝒅. 𝒇.)
Sol.
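Given
$\bar{x}_1 = 74, \sigma_1 = 8, n_1 = 12$ (Boys)
$\bar{x}_2 = 70, \sigma_2 = 10, n_2 = 10$ (Girls)
We know that
$s^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}$
$\Rightarrow s^2 = \frac{(12 \times 64) + (10 \times 100)}{12 + 10 - 2}$
$\Rightarrow s^2 = \frac{1768}{20} = 88.4$
$\Rightarrow s = 9.402 \approx 9.4$
We have
$t = \frac{|\bar{x}_2 - \bar{x}_1|}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
$\Rightarrow t = \frac{74 - 70}{9.4\sqrt{\frac{1}{12} + \frac{1}{10}}} = \frac{4}{9.4 \times 0.4281} = \frac{4}{4.0244} = 0.9939$
Since $t = 0.9939 < t_{0.05} = 2.086$, the hypothesis that there is no difference between the means of the two groups
is accepted at 5% level of significance, i.e. the difference is not significant.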

17. Two horses A and B were tested according to the time (In Seconds) to run a particular
race with the following results:

Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29 -
Test whether you can discriminate between the two horses.
Sol.
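Let the variables x and y respectively correspond to Horse A and Horse B
$x$: 28, 30, 32, 33, 33, 29, 34
$y$: 29, 30, 30, 24, 27, 29
$\therefore \bar{x} = \frac{1}{n_1}\sum_i x_i = \frac{219}{7} \approx 31.30$, $\quad \bar{y} = \frac{1}{n_2}\sum_i y_i = \frac{169}{6} \approx 28.20$
$\sum(x - \bar{x})^2 = (28 - 31.3)^2 + (30 - 31.3)^2 + (32 - 31.3)^2 + (33 - 31.3)^2 + (33 - 31.3)^2 + (29 - 31.3)^2 + (34 - 31.3)^2 = 31.4$
$\sum(y - \bar{y})^2 = (29 - 28.20)^2 + (30 - 28.20)^2 + (30 - 28.20)^2 + (24 - 28.20)^2 + (27 - 28.20)^2 + (29 - 28.20)^2 = 26.84$
$\therefore s^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum(x - \bar{x})^2 + \sum(y - \bar{y})^2\right]$
$\Rightarrow s^2 = \frac{31.4 + 26.84}{7 + 6 - 2} = 5.2945$
$\Rightarrow s = 2.301$
We have,
$t = \frac{|\bar{x} - \bar{y}|}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \Rightarrow t = \frac{31.30 - 28.20}{2.301\sqrt{\frac{1}{7} + \frac{1}{6}}} \Rightarrow t = 2.42$
Since $t = 2.42 > t_{0.05} = 2.2$ but $t = 2.42 < t_{0.02} = 2.72$ (11 d.f.), the difference between the two horses is
significant at the 5% level but not at the 2% level.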

18. Four coins are tossed 100 times and the following results were obtained:
No. of Heads 0 1 2 3 4
Frequency 5 29 36 25 5
Fit a binomial distribution for the data and test the goodness of fit ($\chi^2_{0.05} = 9.49$ for 4 d.f.).
Sol.
Given the 4 coins are tossed 100 times
The probability of getting head is p=0.5, q=0.5
The probability mass function of a binomial distribution is
𝑃(𝑋 = 𝑥) = 4𝐶𝑥 (0.5)𝑥 (0.5)4−𝑥
𝑃(0) = 4𝐶0 (0.5)0 (0.5)4−0 = 0.0625
𝑃(1) = 4𝐶1 (0.5)1 (0.5)4−1 = 0.25
𝑃(2) = 4𝐶2 (0.5)2 (0.5)4−2 = 0.375
𝑃(3) = 4𝐶3 (0.5)3 (0.5)4−3 = 0.25
𝑃(4) = 4𝐶4 (0.5)4 (0.5)4−4 = 0.0625

∴ 𝐸0 = 100 × 0.0625 = 6.25


𝐸1 = 100 × 0.25 = 25
𝐸2 = 100 × 0.375 = 37.5
𝐸3 = 100 × 0.25 = 25
𝐸4 = 100 × 0.0625 = 6.25
where 100 is the sum of the frequencies.

$O_i$    5       29    36      25    5
$E_i$    6.25    25    37.5    25    6.25

$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$\Rightarrow \chi^2 = \frac{1.5625}{6.25} + \frac{16}{25} + \frac{2.25}{37.5} + 0 + \frac{1.5625}{6.25}$
$\Rightarrow \chi^2 = 0.25 + 0.64 + 0.06 + 0.25$
$\Rightarrow \chi^2 = 1.2 < \chi^2_{0.05} = 9.49$

Hence the fit is good.
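The binomial fit and the chi-square statistic above can be reproduced with a short sketch. The function names `binomial_expected` and `chi_square` are assumptions introduced for illustration; the observed frequencies and $p = 0.5$ are taken from the problem.

```python
from math import comb

def binomial_expected(n_trials, p, total):
    """Expected frequencies total * P(X = k) for k = 0..n_trials under Binomial(n_trials, p)."""
    return [total * comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
            for k in range(n_trials + 1)]

def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [5, 29, 36, 25, 5]                      # heads = 0, 1, 2, 3, 4
expected = binomial_expected(4, 0.5, sum(observed))
chi2 = chi_square(observed, expected)
print(expected, round(chi2, 2), "good fit" if chi2 < 9.49 else "poor fit")
```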

19. A die is thrown 264 times and the number appearing on the face (𝒙) follows the following
frequency (𝒇) distribution.
𝒙 1 2 3 4 5 6
𝒇 40 32 28 58 54 60
Calculate the value of 𝝌𝟐 .
Sol.
The frequencies in the given data are the observed frequencies. Assuming that the die is
unbiased, the expected frequency for each of the numbers 1, 2, 3, 4, 5, 6 to appear on the
face is $\frac{264}{6} = 44$.
Now the data is as follows:
$x$      1     2     3     4     5     6
$O_i$    40    32    28    58    54    60
$E_i$    44    44    44    44    44    44
$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$\Rightarrow \chi^2 = \frac{(40 - 44)^2}{44} + \frac{(32 - 44)^2}{44} + \frac{(28 - 44)^2}{44} + \frac{(58 - 44)^2}{44} + \frac{(54 - 44)^2}{44} + \frac{(60 - 44)^2}{44}$
$\Rightarrow \chi^2 = \frac{1}{44}[16 + 144 + 256 + 196 + 100 + 256] = \frac{968}{44}$
$\Rightarrow \chi^2 = 22$

20. A die was thrown 60 times and the following frequency distribution was observed:
Faces 1 2 3 4 5 6
Frequency 15 6 4 7 11 17
Test whether the die is unbiased at 5% significance level.
Sol.
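The frequencies in the given data are the observed frequencies. Assuming that the die is
unbiased, the expected frequency for each of the numbers 1, 2, 3, 4, 5, 6 to appear on the
face is $\frac{60}{6} = 10$.
Now the data is as follows:
$x$      1     2     3     4     5     6
$O_i$    15    6     4     7     11    17
$E_i$    10    10    10    10    10    10
$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$\Rightarrow \chi^2 = \frac{(15 - 10)^2}{10} + \frac{(6 - 10)^2}{10} + \frac{(4 - 10)^2}{10} + \frac{(7 - 10)^2}{10} + \frac{(11 - 10)^2}{10} + \frac{(17 - 10)^2}{10}$
$\Rightarrow \chi^2 = \frac{1}{10}[25 + 16 + 36 + 9 + 1 + 49] = \frac{136}{10}$
$\Rightarrow \chi^2 = 13.6$
The tabulated value of $\chi^2$ at 5% level of significance for 5 d.f. is 11.07. Since $\chi^2 = 13.6 > 11.07$, the
hypothesis that the die is unbiased is rejected at 5% level of significance.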

21. A survey of 320 families with 5 children each revealed the following distribution.
No. of boys 5 4 3 2 1 0
No. of girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12
Is the result consistent with the hypothesis that male and female births are equally
probable at 5% level of significance?
Sol.
Given,
Number of families selected for the survey = 320
The probabilities of female and male births are equal, $p = \frac{1}{2} = 0.5 \Rightarrow q = 1 - p = 1 - 0.5 = 0.5$
Number of children in each selected family, n = 5

No. of boys 5 4 3 2 1 0
No. of girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12

The statistical hypothesis is,


𝐻0 : The probability of female and male birth is equal.
𝐻1 : The probability of female and male birth is not equal.
Here Chi square distribution is used to test the hypothesis.
Therefore, by the Binomial distribution.
We have,
𝑃(𝑥) = 𝑛𝐶𝑥 𝑝 𝑥 𝑞𝑛−𝑥
∴ 𝑃(𝑥) = 5𝐶𝑥 (0.5)𝑥 (0.5)5−𝑥
⇒ 𝑃(𝑥) = 5𝐶𝑥 (0.5)5
The expected frequencies can be calculated for 320 families as
E(x) = 320 × P(x) = 320 × 5Cx × (0.5)⁵
∴ E(0) = 320 × 5C0 × (0.5)⁵ = 320 × 1 × (0.5)⁵ = 10 = E0
E(1) = 320 × 5C1 × (0.5)⁵ = 320 × 5 × (0.5)⁵ = 50 = E1
E(2) = 320 × 5C2 × (0.5)⁵ = 320 × 10 × (0.5)⁵ = 100 = E2
E(3) = 320 × 5C3 × (0.5)⁵ = 320 × 10 × (0.5)⁵ = 100 = E3
E(4) = 320 × 5C4 × (0.5)⁵ = 320 × 5 × (0.5)⁵ = 50 = E4
E(5) = 320 × 5C5 × (0.5)⁵ = 320 × 1 × (0.5)⁵ = 10 = E5

No. of boys   No. of girls   Observed (Oi)   Expected (Ei)   Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
5             0              14              10              4         16           1.6
4             1              56              50              6         36           0.72
3             2              110             100             10        100          1
2             3              88              100             −12       144          1.44
1             4              40              50              −10       100          2
0             5              12              10              2         4            0.4
Total                        320             320                                    7.16

The tabulated value of χ² for 6 − 1 = 5 degrees of freedom at the 5% level of significance
is 11.07.
∴ χ² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ ⇒ χ² = 7.16 < 11.07
Since the calculated χ² value is less than the tabulated χ² value, we fail to reject 𝐻0
(accept 𝐻0); that is, the data are consistent with male and female births being equally probable.
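As a cross-check, the binomial expected frequencies and the χ² value can be recomputed in Python. The sketch below assumes x counts the number of boys; the helper name `binomial_expected` is hypothetical.

```python
from math import comb


def binomial_expected(n_trials, p, total):
    """Expected frequencies total * P(x) for x = 0..n_trials under Binomial(n_trials, p)."""
    q = 1 - p
    return [total * comb(n_trials, x) * p ** x * q ** (n_trials - x)
            for x in range(n_trials + 1)]


# 320 families with 5 children each, p = 0.5 for a boy.
expected = binomial_expected(5, 0.5, 320)

# Observed number of families with 0, 1, 2, 3, 4, 5 boys (table read from right to left).
observed = [12, 40, 88, 110, 56, 14]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected)
print(round(chi2, 2))
```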

22. The theory predicts the proportion of beans in the four groups A, B, C and D should be
9:3:3:1. In an experiment among 1600 beans, the number in four groups were 882, 313, 287
and 118. Find the approximate value of chi-square.
Sol.
Given,
The total number of beans: 882+313+287+118=1600
Sum of the ratios: 9+3+3+1=16
E(A) = 1600 × 9/16 = 900
E(B) = 1600 × 3/16 = 300
E(C) = 1600 × 3/16 = 300
E(D) = 1600 × 1/16 = 100
Oi     Ei     Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
882    900    −18       324          0.36
313    300    13        169          0.5633
287    300    −13       169          0.5633
118    100    18        324          3.24
∴ χ² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ = 0.36 + 0.5633 + 0.5633 + 3.24
⇒ χ² ≈ 4.73
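The same calculation for a theoretical ratio can be sketched in Python as follows; `expected_from_ratio` is an illustrative helper name, not something defined in the notes.

```python
def expected_from_ratio(ratio, total):
    """Split `total` observations according to the given theoretical ratio."""
    ratio_sum = sum(ratio)
    return [total * r / ratio_sum for r in ratio]


observed = [882, 313, 287, 118]                            # groups A, B, C, D
expected = expected_from_ratio([9, 3, 3, 1], sum(observed))

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected)
print(round(chi2, 4))
```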

23. Two random samples drawn from two normal populations are:
Sample-I 20 16 26 27 22 23 18 24 19 25 - -
Sample-II 27 33 42 35 32 34 38 28 41 43 30 37
Obtain the estimates of the variance of the populations and test, at the 5% level of significance,
whether the two populations have the same variance.
Sol.
Null Hypothesis: 𝐻0: σ1² = σ2²
i.e., the two samples are drawn from two populations having the same variance.
Alternate Hypothesis: 𝐻1: σ1² ≠ σ2²
Given,

Sample-I 20 16 26 27 22 23 18 24 19 25 - -
Sample-II 27 33 42 35 32 34 38 28 41 43 30 37
x̄1 = (Σ xi)/n1 ⇒ x̄1 = (20+16+26+27+22+23+18+24+19+25)/10 ⇒ x̄1 = 220/10 ⇒ x̄1 = 22
x̄2 = (Σ xi)/n2 ⇒ x̄2 = (27+33+42+35+32+34+38+28+41+43+30+37)/12 ⇒ x̄2 = 420/12 ⇒ x̄2 = 35

∴ The statistic F is defined by the ratio:
F0 = S1²/S2² ------ (1)
where S1² = (1/(n1 − 1)) Σ(x1 − x̄1)² = 120/9 = 13.33
      S2² = (1/(n2 − 1)) Σ(x2 − x̄2)² = 314/11 = 28.54
Since S2² > S1², F = Larger variance / Smaller variance:
F0 = S2²/S1² = 28.54/13.33 = 2.14
Tabulated value:
F0 = S2²/S1² follows the F-distribution with the degrees of freedom given below. Because S2² is
in the numerator, the numerator degrees of freedom come from Sample-II:
ν1 = n2 − 1 = 12 − 1 = 11, ν2 = n1 − 1 = 10 − 1 = 9
At the 5% level of significance, the tabulated value is FE = F(11, 9) = 3.10.

Since F0 < FE, we accept the null hypothesis at the 5% level of significance and conclude that
the two samples may be regarded as drawn from populations having the same variance.
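A minimal Python sketch of the same variance-ratio calculation is given below. It uses the standard-library `statistics.variance`, which divides by n − 1; the variable names are illustrative.

```python
from statistics import variance

sample_1 = [20, 16, 26, 27, 22, 23, 18, 24, 19, 25]
sample_2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]

# Unbiased estimates of the population variances (divisor n - 1).
s1_sq = variance(sample_1)
s2_sq = variance(sample_2)

# Put the larger variance in the numerator; its sample supplies the numerator d.f.
if s1_sq >= s2_sq:
    f_ratio = s1_sq / s2_sq
    df_num, df_den = len(sample_1) - 1, len(sample_2) - 1
else:
    f_ratio = s2_sq / s1_sq
    df_num, df_den = len(sample_2) - 1, len(sample_1) - 1

print(round(s1_sq, 2), round(s2_sq, 2), round(f_ratio, 2), (df_num, df_den))
```

The computed ratio is then compared by hand with the tabulated F value for (df_num, df_den) degrees of freedom.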

24. The table shows the population standard deviation and the sample standard deviation for both
men and women. Find the F statistic, taking the men's population in the numerator.
Population   Population standard deviation   Sample standard deviation
Men          30                              35
Women        50                              45
Sol.
Given,
𝜎1 =Standard deviation of population-1=30
𝜎2 =Standard deviation of population-2=50
𝑠1 =Standard deviation of sample-1=35
𝑠2 =Standard deviation of sample-2=45
We know that,
F = (s1²/σ1²) / (s2²/σ2²)
⇒ F = (35²/30²) / (45²/50²) ⇒ F = (1225/900) / (2025/2500) ⇒ F = 1.3611/0.81 ⇒ F = 1.68
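The ratio can be verified with a short Python sketch; the function name `f_statistic` is an assumption made for illustration.

```python
def f_statistic(s_num, sigma_num, s_den, sigma_den):
    """F = (s_num^2 / sigma_num^2) / (s_den^2 / sigma_den^2)."""
    return (s_num ** 2 / sigma_num ** 2) / (s_den ** 2 / sigma_den ** 2)


# Men in the numerator (s = 35, sigma = 30), women in the denominator (s = 45, sigma = 50).
f_value = f_statistic(35, 30, 45, 50)
print(round(f_value, 2))
```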

***


||JAI SRI GURUDEV||


S.J.C. INSTITUTE OF TECHNOLOGY, CHICKBALLAPUR
Department of Mathematics
LECTURE NOTES
MATHEMATICS-3 FOR COMPUTER SCIENCE STREAM (BCS301)
MODULE - 5
DESIGN OF EXPERIMENTS AND ANOVA

Prepared by:
Purushotham P
Assistant Professor
SJC Institute of Technology
Email id: [email protected]

Experimental unit:
For conducting an experiment, the experimental material is divided into smaller parts and
each part is referred to as an experimental unit. An experimental unit is the unit to which a
treatment is randomly assigned. The phrase “randomly assigned” is very important in this
definition.

Experiment:
A way of getting an answer to a question which the experimenter wants to know.

Treatment:
Different objects or procedures which are to be compared in an experiment are called
treatments.

Sampling unit:
The object that is measured in an experiment is called the sampling unit. This may be different
from the experimental unit.

Factor:
A factor is a variable defining a categorization. A factor can be fixed or random in nature. A
factor is termed as a fixed factor if all the levels of interest are included in the experiment. A
factor is termed as a random factor if all the levels of interest are not included in the experiment
and those that are can be considered to be randomly chosen from all the levels of interest.

Replication:
It is the repetition of the experimental situation by replicating the experimental unit.

Experimental error:
The unexplained random part of the variation in any experiment is termed as
experimental error. An estimate of experimental error can be obtained by replication.
Treatment design:
A treatment design is the manner in which the levels of treatments are arranged in an
experiment.

ANOVA:
Analysis of variance (ANOVA) is a statistical tool that splits the aggregate variability
observed in a data set into two parts: variability due to systematic factors and variability due
to random factors. The systematic factors have a statistical influence on the given data set,
while the random factors do not.
ANOVA is used to analyze the differences between the means of two or more groups or
treatments, that is, to determine whether there are any statistically significant differences
between those means.
There are two main types of ANOVA: one-way (or unidirectional) and two-way. There are also
variations of ANOVA.

Real Life Applications of ANOVA:

• In the social sciences, ANOVA tests can be used to study the statistical significance of the
effect of various study environments on test scores.
• In medical research, the ANOVA test can be used to compare the effect of various types or
brands of medication on individuals with migraines or depression.
• We can use the ANOVA test to compare different suppliers and select the best available.
ANOVA is used when we have more than two sample groups and want to determine whether
there are any statistically significant differences between the means of these independent
sample groups.
CRD: A completely randomized design (CRD) is one where the treatments are assigned completely
at random so that each experimental unit has the same chance of receiving any one treatment.

RBD: A randomized block design is a restricted randomized design, in which experimental units are
first organized into homogeneous blocks and then the treatments are assigned at random to these units

within these blocks. The main advantage of this design is, if done properly, it provides more precise
results.

LSD: The Latin Square Design gets its name from the fact that we can write it as a square with Latin
letters to correspond to the treatments. The treatment factor levels are the Latin letters in the Latin
square design. The number of rows and columns has to correspond to the number of treatment levels.
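The defining property of a Latin square, that every treatment letter occurs exactly once in each row and each column, can be checked mechanically. The sketch below is illustrative only; the function name and the cyclic 4 × 4 arrangement are assumptions, not taken from the notes.

```python
def is_latin_square(layout):
    """Return True if every symbol occurs exactly once in each row and each column."""
    size = len(layout)
    symbols = set(layout[0])
    if len(symbols) != size:
        return False
    rows_ok = all(len(row) == size and set(row) == symbols for row in layout)
    cols_ok = all(set(col) == symbols for col in zip(*layout))
    return rows_ok and cols_ok


# Hypothetical cyclic arrangement of four treatments A, B, C, D.
valid_layout = ["ABCD", "BCDA", "CDAB", "DABC"]
# Repeating the first row puts two A's in the first column, so this is not a Latin square.
invalid_layout = ["ABCD", "ABCD", "CDAB", "DABC"]

print(is_latin_square(valid_layout))
print(is_latin_square(invalid_layout))
```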

ONE WAY CLASSIFICATION:

• Define the problem for different varieties and different treatments.


Varieties                                  Sum    Squares
x11   x12   x13   …   x1n1                 T1     T1²
x21   x22   x23   …   x2n2                 T2     T2²
x31   x32   x33   …   x3n3                 T3     T3²
---   ---   ---   …   ---                  ---    ---
xk1   xk2   xk3   …   xknk                 Tk     Tk²

• Define the null hypothesis 𝐻0: μ1 = μ2 = μ3 = … = μk for the level of significance.
• Find the sum of each variety (row-wise), T1, T2, …, Tk, and the sum of all N observations,
say T.
• Find the correction factor CF = T²/N
• Find the total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
• Find the sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
• Find the sum of squares within the classes (sum of squares due to error) by subtraction:
SSE = TSS − SST.
• Here k represents the total number of varieties and N represents the total number of observations.
• Construct the ANOVA table

Sources of variation   d.f.    SS     MSS                  F ratio
Between treatments     k − 1   SST    MST = SST/(k − 1)    F = MST/MSE
Error                  N − k   SSE    MSE = SSE/(N − k)
Total                  N − 1   TSS    -                    -
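The steps of the one-way classification can be sketched as a small Python function. The function name `one_way_anova`, the dictionary keys and the three-group data set are all made up for illustration; the comparison with the tabulated F value is still done by hand.

```python
def one_way_anova(groups):
    """One-way ANOVA by the correction-factor method described above."""
    k = len(groups)                                   # number of treatments
    n_total = sum(len(g) for g in groups)             # N
    grand_total = sum(sum(g) for g in groups)         # T
    cf = grand_total ** 2 / n_total                   # CF = T^2 / N
    tss = sum(x * x for g in groups for x in g) - cf
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = tss - sst
    mst = sst / (k - 1)
    mse = sse / (n_total - k)
    return {"CF": cf, "TSS": tss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse,
            "df": (k - 1, n_total - k)}


# Hypothetical data: three treatments with three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

For this made-up data T = 33, N = 9 and CF = 121, so TSS = 153 − 121 = 32, SST = 147 − 121 = 26 and SSE = 6.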


PROBLEMS:

1. Three processes A, B and C are tested to see whether their outputs are equivalent. The
following observations of outputs are made:

A 10 12 13 11 10 14 15 13

B 9 11 10 12 13 - - -

C 11 10 15 14 12 13 - -
Carry out the analysis of variance and state your conclusion.

Sol. To carry out the analysis of variance, we form the following tables

                                              Total      Squares
A    10   12   13   11   10   14   15   13    T1 = 98    T1² = 9604
B    9    11   10   12   13                   T2 = 55    T2² = 3025
C    11   10   15   14   12   13              T3 = 75    T3² = 5625
Total                                         T = 228    -

The squares are as follows

                                                     Sum of squares
A    100   144   169   121   100   196   225   169   1224
B    81    121   100   144   169                     615
C    121   100   225   196   144   169               955
Grand total                                          Σᵢ Σⱼ xᵢⱼ² = 2794


Set the null hypothesis 𝐻0: μ1 = μ2 = μ3
Correction factor CF = T²/N = (228)²/19 = 51984/19 = 2736

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 2794 − 2736
⇒ TSS = 58

Sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
SST = 9604/8 + 3025/5 + 5625/6 − 2736
⇒ SST = 1200.5 + 605 + 937.5 − 2736
⇒ SST = 2743 − 2736
⇒ SST = 7

Therefore, sum of squares due to error SSE = TSS − SST
⇒ SSE = 58 − 7
⇒ SSE = 51

Sources of variation   d.f.          SS         MSS                    F ratio
Between treatments     3 − 1 = 2     SST = 7    MST = 7/2 = 3.5        F = 3.5/3.1875 = 1.0980
Error                  19 − 3 = 16   SSE = 51   MSE = 51/16 = 3.1875
Total                  19 − 1 = 18   TSS = 58   -                      -

Since the calculated value 1.0980 < 3.63, the tabulated value of F(2, 16) at the 5% level of
significance, the null hypothesis is accepted: there is no significant difference between the
three processes.

2. A test was given to five students taken at random from the fifth class of three schools
of a town. The individual scores are
School I 9 7 6 5 8
School II 7 4 5 4 5
School III 6 5 6 7 6

Carry out the analysis of variance.


Sol.

To carry out the analysis of variance, we form the following tables

                             Total      Squares
S1    9    7    6    5    8  T1 = 35    T1² = 1225
S2    7    4    5    4    5  T2 = 25    T2² = 625
S3    6    5    6    7    6  T3 = 30    T3² = 900
Total                        T = 90     -

The squares are as follows

                                  Sum of squares
S1    81    49    36    25    64  255
S2    49    16    25    16    25  131
S3    36    25    36    49    36  182
Grand total                       Σᵢ Σⱼ xᵢⱼ² = 568


Set the null hypothesis 𝐻0: μ1 = μ2 = μ3
Correction factor CF = T²/N = (90)²/15 = 8100/15 = 540

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 568 − 540
⇒ TSS = 28

Sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
SST = 1225/5 + 625/5 + 900/5 − 540
⇒ SST = 245 + 125 + 180 − 540
⇒ SST = 550 − 540
⇒ SST = 10

Therefore, sum of squares due to error SSE = TSS − SST
⇒ SSE = 28 − 10 ⇒ SSE = 18

Sources of variation   d.f.          SS         MSS                  F ratio
Between treatments     3 − 1 = 2     SST = 10   MST = 10/2 = 5       F = 5/1.5 = 3.33
Error                  15 − 3 = 12   SSE = 18   MSE = 18/12 = 1.5
Total                  15 − 1 = 14   TSS = 28   -                    -
Since the calculated value 3.33 < 3.89, the tabulated value of F(2, 12) at the 5% level of
significance, the null hypothesis is accepted: there is no significant difference between the
three schools.

3. Three different kinds of food are tested on three groups of rats for 5 weeks. The
objective is to check the difference in mean weight (in grams) of the rats per week.
Apply one-way ANOVA using a 0.05 significance level to the following data:
Food 1 8 12 19 8 6 11
Food 2 4 5 4 6 9 7
Food 3 11 8 7 13 7 9
Sol. To carry out the analysis of variance, we form the following tables

                                     Total      Squares
F1    8     12    19    8     6   11   T1 = 64    T1² = 4096
F2    4     5     4     6     9   7    T2 = 35    T2² = 1225
F3    11    8     7     13    7   9    T3 = 55    T3² = 3025
Total                                  T = 154    -

The squares are as follows

                                          Sum of squares
F1    64     144    361    64     36    121    790
F2    16     25     16     36     81    49     223
F3    121    64     49     169    49    81     533
Grand total                                    Σᵢ Σⱼ xᵢⱼ² = 1546


Set the null hypothesis 𝐻0: μ1 = μ2 = μ3
Correction factor CF = T²/N = (154)²/18 = 23716/18 = 1317.55

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 1546 − 1317.55
⇒ TSS = 228.45

Sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
SST = 4096/6 + 1225/6 + 3025/6 − 1317.55
⇒ SST = 682.666 + 204.166 + 504.166 − 1317.55
⇒ SST = 1391 − 1317.55
⇒ SST = 73.45

Therefore, sum of squares due to error SSE = TSS − SST
⇒ SSE = 228.45 − 73.45 ⇒ SSE = 155

Sources of variation   d.f.          SS            MSS                      F ratio
Between treatments     3 − 1 = 2     SST = 73.45   MST = 73.45/2 = 36.725   F = 36.725/10.33 = 3.55
Error                  18 − 3 = 15   SSE = 155     MSE = 155/15 = 10.33
Total                  18 − 1 = 17   TSS = 228.45  -                        -
Since the calculated value 3.55 < 3.68, the tabulated value of F(2, 15) at the 5% level of
significance, the null hypothesis is accepted: there is no significant difference between the
three kinds of food.

4. Three types of fertilizers are used on three groups of plants for 5 weeks. We want
to check if there is a difference in the mean growth of each group. Using the data
given below, apply a one-way ANOVA test at the 0.05 significance level.
Fertilizer 1 6 8 4 5 3 4
Fertilizer 2 8 12 9 11 6 8
Fertilizer 3 13 9 11 8 7 12

Sol.
To carry out the analysis of variance, we form the following tables

                                     Total      Squares
F1    6     8     4     5     3   4    T1 = 30    T1² = 900
F2    8     12    9     11    6   8    T2 = 54    T2² = 2916
F3    13    9     11    8     7   12   T3 = 60    T3² = 3600
Total                                  T = 144    -

The squares are as follows

                                          Sum of squares
F1    36     64     16     25     9     16     166
F2    64     144    81     121    36    64     510
F3    169    81     121    64     49    144    628
Grand total                                    Σᵢ Σⱼ xᵢⱼ² = 1304


Set the null hypothesis 𝐻0: μ1 = μ2 = μ3
Correction factor CF = T²/N = (144)²/18 = 20736/18 = 1152

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 1304 − 1152
⇒ TSS = 152

Sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
SST = 900/6 + 2916/6 + 3600/6 − 1152
⇒ SST = 150 + 486 + 600 − 1152
⇒ SST = 1236 − 1152
⇒ SST = 84

Therefore, sum of squares due to error SSE = TSS − SST
⇒ SSE = 152 − 84 ⇒ SSE = 68

Sources of variation   d.f.          SS          MSS                    F ratio
Between treatments     3 − 1 = 2     SST = 84    MST = 84/2 = 42        F = 42/4.533 = 9.2653
Error                  18 − 3 = 15   SSE = 68    MSE = 68/15 = 4.533
Total                  18 − 1 = 17   TSS = 152   -                      -

Since the calculated value 9.2653 > 3.68, the tabulated value of F(2, 15) at the 5% level of
significance, the null hypothesis is rejected: there is a significant difference between the
three fertilizers.

5. Set an analysis of variance table for the following data.


A 6 7 3 8
B 5 5 3 7
C 5 4 3 4
Sol.

To carry out the analysis of variance, we form the following tables

                          Total      Squares
A    6    7    3    8     T1 = 24    T1² = 576
B    5    5    3    7     T2 = 20    T2² = 400
C    5    4    3    4     T3 = 16    T3² = 256
Total                     T = 60     -

The squares are as follows

                              Sum of squares
A    36    49    9    64      158
B    25    25    9    49      108
C    25    16    9    16      66
Grand total                   Σᵢ Σⱼ xᵢⱼ² = 332
Set the null hypothesis 𝐻0: μ1 = μ2 = μ3
Correction factor CF = T²/N = (60)²/12 = 3600/12 = 300

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 332 − 300
⇒ TSS = 32

Sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
SST = 576/4 + 400/4 + 256/4 − 300
⇒ SST = 144 + 100 + 64 − 300
⇒ SST = 308 − 300
⇒ SST = 8

Therefore, sum of squares due to error SSE = TSS − SST
⇒ SSE = 32 − 8
⇒ SSE = 24

Sources of variation   d.f.          SS         MSS                  F ratio
Between treatments     3 − 1 = 2     SST = 8    MST = 8/2 = 4        F = 4/2.66 = 1.5037
Error                  12 − 3 = 9    SSE = 24   MSE = 24/9 = 2.66
Total                  12 − 1 = 11   TSS = 32   -                    -

Since the calculated value 1.5037 < 4.26, the tabulated value of F(2, 9) at the 5% level of
significance, the null hypothesis is accepted: there is no significant difference between
A, B and C.

6. A trial was run to check the effects of different diets. Positive numbers indicate weight loss
and negative numbers indicate weight gain. Check if there is an average difference in the
weight of people following different diets using an ANOVA Table.
Low fat   Low calorie   Low protein   Low carbohydrate
8         2             3             2
9         4             5             2
6         3             4             −1
7         5             2             0
3         1             3             3
Sol.
To carry out the analysis of variance, we form the following tables

      Low fat   Low calorie   Low protein   Low carbohydrate   Total
      8         2             3             2
      9         4             5             2
      6         3             4             −1
      7         5             2             0
      3         1             3             3
T     33        15            17            6                  71
T²    1089      225           289           36                 -
The squares are as follows
      Low fat   Low calorie   Low protein   Low carbohydrate
      64        4             9             4
      81        16            25            4
      36        9             16            1
      49        25            4             0
      9         1             9             9
Sum   239       55            63            18                 Σᵢ Σⱼ xᵢⱼ² = 375

Set the null hypothesis 𝐻0: μ1 = μ2 = μ3 = μ4
Correction factor CF = T²/N = (71)²/20 = 5041/20 = 252.05 ≈ 252

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 375 − 252
⇒ TSS = 123

Sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
SST = 1089/5 + 225/5 + 289/5 + 36/5 − 252
⇒ SST = 217.8 + 45 + 57.8 + 7.2 − 252
⇒ SST = 327.8 − 252
⇒ SST = 75.80

Therefore, sum of squares due to error SSE = TSS − SST
⇒ SSE = 123 − 75.80
⇒ SSE = 47.2

Sources of variation   d.f.          SS            MSS                     F ratio
Between treatments     4 − 1 = 3     SST = 75.80   MST = 75.80/3 = 25.26   F = 25.26/2.95 = 8.56
Error                  20 − 4 = 16   SSE = 47.20   MSE = 47.20/16 = 2.95
Total                  20 − 1 = 19   TSS = 123     -                       -
Since the calculated value 8.56 > 3.24, the tabulated value of F(3, 16) at the 5% level of
significance, the null hypothesis is rejected: there is a significant difference between the
four diets.

7. The following data show the number of worms quarantined from the GI areas of four groups
of muskrats in a carbon tetrachloride anthelmintic study. Conduct a
one-way ANOVA test.
I II III IV
33 41 12 38
32 38 35 43
26 40 46 25
14 23 22 13
30 21 11 26

Sol.
Given
I II III IV
33 41 12 38
32 38 35 43
26 40 46 25
14 23 22 13
30 21 11 26
Subtracting 30 from all the observations, we get
      I      II     III    IV     Total
      3      11     −18    8
      2      8      5      13
      −4     10     16     −5
      −16    −7     −8     −17
      0      −9     −19    −4
T     −15    13     −24    −5     −31
T²    225    169    576    25     -

The squares are as follows

      I      II     III    IV
      9      121    324    64
      4      64     25     169
      16     100    256    25
      256    49     64     289
      0      81     361    16
Sum   285    415    1030   563    Σᵢ Σⱼ xᵢⱼ² = 2293
Set the null hypothesis 𝐻0: μ1 = μ2 = μ3 = μ4
Correction factor CF = T²/N = (−31)²/20 = 961/20 = 48.05 ≈ 48

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 2293 − 48
⇒ TSS = 2245

Sum of squares between the treatments SST = Σᵢ Tᵢ²/nᵢ − CF
SST = 225/5 + 169/5 + 576/5 + 25/5 − 48
⇒ SST = 45 + 33.8 + 115.2 + 5 − 48
⇒ SST = 199 − 48
⇒ SST = 151

Therefore, sum of squares due to error SSE = TSS − SST
⇒ SSE = 2245 − 151
⇒ SSE = 2094

Sources of variation   d.f.          SS           MSS                      F ratio
Between treatments     4 − 1 = 3     SST = 151    MST = 151/3 = 50.33      F = 130.87/50.33 = 2.6
Error                  20 − 4 = 16   SSE = 2094   MSE = 2094/16 = 130.87
Total                  20 − 1 = 19   TSS = 2245   -                        -
Here MST < MSE, so the ratio is taken as F = MSE/MST, with (16, 3) degrees of freedom.
Since the calculated value 2.6 < 8.69, the tabulated value of F(16, 3) at the 5% level of
significance, the null hypothesis is accepted: there is no significant difference between the
four groups.
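Subtracting a constant from every observation (coding) does not change the sums of squares, which is why the working above can use the smaller coded values. The Python sketch below illustrates this for the muskrat data; the function name `sums_of_squares` is an illustrative choice. It keeps the exact correction factor, so the results differ slightly from the hand calculation, where CF was rounded to 48.

```python
def sums_of_squares(groups):
    """Return (TSS, SST) for a one-way layout using the correction-factor method."""
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n_total
    tss = sum(x * x for g in groups for x in g) - cf
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    return tss, sst


# Groups I, II, III, IV from the problem (columns of the data table).
raw = [[33, 32, 26, 14, 30],
       [41, 38, 40, 23, 21],
       [12, 35, 46, 22, 11],
       [38, 43, 25, 13, 26]]
coded = [[x - 30 for x in g] for g in raw]   # subtract 30 from every observation

tss_raw, sst_raw = sums_of_squares(raw)
tss_coded, sst_coded = sums_of_squares(coded)
print(round(tss_raw, 2), round(tss_coded, 2))
print(round(sst_raw, 2), round(sst_coded, 2))
```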

TWO WAY CLASSIFICATION:

• Define the problem for different varieties and different treatments.


Varieties                                  Sum    Squares
x11   x12   x13   …   x1n                  T1     T1²
x21   x22   x23   …   x2n                  T2     T2²
x31   x32   x33   …   x3n                  T3     T3²
---   ---   ---   …   ---                  ---    ---
xk1   xk2   xk3   …   xkn                  Tk     Tk²
Sum       P1    P2    P3    …   Pn         = G
Squares   P1²   P2²   P3²   …   Pn²
• Define the null hypotheses for the level of significance: the row means are equal, and the
column means are equal.
• Find the sum of each variety (row-wise) and of each column, and the sum of all N
observations, say T (the grand total G of the table).
• Find the correction factor CF = T²/N
• Find the total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
• Find the sum of squares of rows SSR = Σᵢ Tᵢ²/nᵢ − CF
• Find the sum of squares of columns SSC = Σⱼ Pⱼ²/nⱼ − CF
• Find the sum of squares within the classes (sum of squares due to error) by subtraction:
SSE = TSS − SSR − SSC.
• Construct the ANOVA table

Sources of variation   d.f.             SS     MSS                          F ratio
Rows                   r − 1            SSR    MSR = SSR/(r − 1)            Fr = MSR/MSE
Columns                c − 1            SSC    MSC = SSC/(c − 1)            Fc = MSC/MSE
Error                  (r − 1)(c − 1)   SSE    MSE = SSE/((r − 1)(c − 1))
Total                  N − 1            TSS    -                            -

1. Set up an analysis of variance table for the following per acre production data for
three varieties of wheat, each grown on 4 plots, and state if the variety differences
are significant at the 5% significance level.
Per acre production data
Plot of land   Variety of wheat
               A    B    C
1              6    5    5
2              7    5    4
3              3    3    3
4              8    7    4
Sol.

To carry out the analysis of variance, we form the following tables


Per acre production data
Plot of land   Variety              T     T²
               A      B      C
1              6      5      5      16    256
2              7      5      4      16    256
3              3      3      3      9     81
4              8      7      4      19    361
P              24     20     16     60    -
P²             576    400    256
The squares are as follows

Variety
A     B     C
36    25    25
49    25    16
9     9     9
64    49    16
Grand total Σᵢ Σⱼ xᵢⱼ² = 332
Set the null hypotheses: there is no significant difference between the plots (rows) or
between the varieties (columns). Here N = 12.
Correction factor CF = T²/N = (60)²/12 = 3600/12 = 300

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 332 − 300
⇒ TSS = 32

Sum of the row squares SSR = Σᵢ Tᵢ²/nᵢ − CF
SSR = 256/3 + 256/3 + 81/3 + 361/3 − 300
⇒ SSR = 85.33 + 85.33 + 27 + 120.33 − 300
⇒ SSR = 318 − 300
⇒ SSR = 18

Sum of the column squares SSC = Σⱼ Pⱼ²/nⱼ − CF
SSC = 576/4 + 400/4 + 256/4 − 300
⇒ SSC = 144 + 100 + 64 − 300
⇒ SSC = 308 − 300
⇒ SSC = 8

Therefore SSE = TSS − SSR − SSC

SSE = 32 − 18 − 8 = 6

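Sources of variation   d.f.                 SS         MSS              F ratio
Rows                   4 − 1 = 3            SSR = 18   MSR = 18/3 = 6   Fr = 6/1 = 6
Columns                3 − 1 = 2            SSC = 8    MSC = 8/2 = 4    Fc = 4/1 = 4
Error                  (4 − 1)(3 − 1) = 6   SSE = 6    MSE = 6/6 = 1
Total                  12 − 1 = 11          TSS = 32   -                -

Fr = 6 > F(3, 6) = 4.76, so the difference between the plots is significant at the 5% level.
Fc = 4 < F(2, 6) = 5.14, so the variety differences are not significant at the 5% level.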
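The two-way classification steps can be sketched in Python and checked against the wheat data above. The function name `two_way_anova` and the dictionary keys are assumptions made for illustration; the sketch always forms MSR/MSE and MSC/MSE and leaves the comparison with tabulated F values to the reader.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell) by the correction-factor method."""
    r, c = len(table), len(table[0])          # number of rows and columns
    n_total = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n_total
    tss = sum(x * x for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf        # each row has c values
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf  # each column has r values
    sse = tss - ssr - ssc
    msr, msc = ssr / (r - 1), ssc / (c - 1)
    mse = sse / ((r - 1) * (c - 1))
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SSE": sse,
            "F_rows": msr / mse, "F_cols": msc / mse}


# Rows are plots 1-4, columns are wheat varieties A, B, C.
wheat = [[6, 5, 5],
         [7, 5, 4],
         [3, 3, 3],
         [8, 7, 4]]
result = two_way_anova(wheat)
print({name: round(value, 2) for name, value in result.items()})
```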

2. Three varieties of coal were analysed by four chemists and the ash-content in the varieties
was found to be as under.
Varieties Chemists
1 2 3 4
A 8 5 5 7
B 7 6 4 4
C 3 6 5 4
Carry out the analysis of variance.

Sol. To carry out the analysis of variance, we form the following tables

           Chemists                     T     T²
Variety    1      2      3      4
A          8      5      5      7       25    625
B          7      6      4      4       21    441
C          3      6      5      4       18    324
P          18     17     14     15      64    -
P²         324    289    196    225
The squares are as follows

Chemists
1     2     3     4
64    25    25    49
49    36    16    16
9     36    25    16
Grand total Σᵢ Σⱼ xᵢⱼ² = 366
Set the null hypotheses: there is no significant difference between the varieties (rows) or
between the chemists (columns). Here N = 12.
Correction factor CF = T²/N = (64)²/12 = 4096/12 = 341.33

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 366 − 341.33
⇒ TSS = 24.67

Sum of the row squares SSR = Σᵢ Tᵢ²/nᵢ − CF
SSR = 625/4 + 441/4 + 324/4 − 341.33
⇒ SSR = 156.25 + 110.25 + 81 − 341.33
⇒ SSR = 347.50 − 341.33
⇒ SSR = 6.17

Sum of the column squares SSC = Σⱼ Pⱼ²/nⱼ − CF
SSC = 324/3 + 289/3 + 196/3 + 225/3 − 341.33
⇒ SSC = 108 + 96.33 + 65.33 + 75 − 341.33
⇒ SSC = 344.66 − 341.33
⇒ SSC = 3.33

Therefore SSE = TSS − SSR − SSC

SSE = 24.67 − 6.17 − 3.33 = 15.17

Sources of variation   d.f.                 SS            MSS                     F ratio
Rows                   3 − 1 = 2            SSR = 6.17    MSR = 6.17/2 = 3.085    Fr = 3.085/2.53 = 1.22
Columns                4 − 1 = 3            SSC = 3.33    MSC = 3.33/3 = 1.11     Fc = 2.53/1.11 = 2.28
Error                  (3 − 1)(4 − 1) = 6   SSE = 15.17   MSE = 15.17/6 = 2.53
Total                  12 − 1 = 11          TSS = 24.67   -                       -

Here MSC < MSE, so Fc is taken as MSE/MSC, with (6, 3) degrees of freedom.
Fr = 1.22 < F(2, 6) = 5.14 and Fc = 2.28 < F(6, 3) = 8.94 at the 5% level of significance.
Hence there is no significant difference between the varieties or between the chemists.

3. Perform ANOVA and test at the 0.05 level of significance whether there are differences between
the detergents or between the engines for the following data:
Detergent Engine
I II III
A 45 43 51
B 47 46 52
C 48 50 55
D 42 37 49
Sol.
Given the data
Engine
Detergent
I II III
A 45 43 51
B 47 46 52
C 48 50 55
D 42 37 49
Subtract 45 from all the observations, we get
Detergent   Engine                T     T²
            I     II    III
A           0     −2    6         4     16
B           2     1     7         10    100
C           3     5     10        18    324
D           −3    −8    4         −7    49
P           2     −4    27        25    -
P²          4     16    729
The squares are
Detergent   Engine                Sum
            I     II    III
A           0     4     36        40
B           4     1     49        54
C           9     25    100       134
D           9     64    16        89
Grand total Σᵢ Σⱼ xᵢⱼ² = 317

Set the null hypotheses: there is no significant difference between the detergents (rows) or
between the engines (columns). Here N = 12.
Correction factor CF = T²/N = (25)²/12 = 625/12 = 52.08

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 317 − 52.08
⇒ TSS = 264.92

Sum of the row squares SSR = Σᵢ Tᵢ²/nᵢ − CF
SSR = 16/3 + 100/3 + 324/3 + 49/3 − 52.08
⇒ SSR = 5.33 + 33.33 + 108 + 16.33 − 52.08
⇒ SSR = 163 − 52.08
⇒ SSR = 110.92

Sum of the column squares SSC = Σⱼ Pⱼ²/nⱼ − CF
SSC = 4/4 + 16/4 + 729/4 − 52.08
⇒ SSC = 1 + 4 + 182.25 − 52.08
⇒ SSC = 187.25 − 52.08
⇒ SSC = 135.17

Therefore SSE = TSS − SSR − SSC

SSE = 264.92 − 110.92 − 135.17 = 18.83

Sources of variation   d.f.                 SS             MSS                      F ratio
Rows                   4 − 1 = 3            SSR = 110.92   MSR = 110.92/3 = 36.97   Fr = 36.97/3.14 = 11.77
Columns                3 − 1 = 2            SSC = 135.17   MSC = 135.17/2 = 67.58   Fc = 67.58/3.14 = 21.52
Error                  (4 − 1)(3 − 1) = 6   SSE = 18.83    MSE = 18.83/6 = 3.14
Total                  12 − 1 = 11          TSS = 264.92   -                        -

Fr = 11.77 > F(3, 6) = 4.76 and Fc = 21.52 > F(2, 6) = 5.14 at the 5% level of significance.

Hence both null hypotheses are rejected: there are significant differences between the
detergents and between the engines.

4. Analyze and interpret the following statistics concerning the output of wheat per field,
obtained as a result of an experiment conducted to test four varieties of wheat, viz. A, B, C
and D, under a Latin square design.

C B A D
25 23 20 20
A D C B
19 19 21 18
B A D C
19 14 17 20
D C B A
17 20 21 15
Sol.

Given observations are

C B A D
25 23 20 20
A D C B
19 19 21 18
B A D C
19 14 17 20
D C B A
17 20 21 15
Null hypothesis Ho : There is no significant difference between rows, columns and treatment

Code the data by subtracting 20 from each value, we get

                              T      T²
      C     B     A     D
      5     3     0     0     8      64
      A     D     C     B
      −1    −1    1     −2    −3     9
      B     A     D     C
      −1    −6    −3    0     −10    100
      D     C     B     A
      −3    0     1     −5    −7     49
P     0     −4    −1    −7    −12    -
P²    0     16    1     49    -      -

The squares are as follows

      C     B     A     D
      25    9     0     0
      A     D     C     B
      1     1     1     4
      B     A     D     C
      1     36    9     0
      D     C     B     A
      9     0     1     25
Sum   36    46    11    29    Σᵢ Σⱼ xᵢⱼ² = 122
Correction factor CF = T²/N = (−12)²/16 = 144/16 = 9

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 122 − 9
⇒ TSS = 113

Sum of the row squares SSR = Σᵢ Tᵢ²/nᵢ − CF
SSR = 64/4 + 9/4 + 100/4 + 49/4 − 9
⇒ SSR = 16 + 2.25 + 25 + 12.25 − 9
⇒ SSR = 55.5 − 9
⇒ SSR = 46.5

Sum of the column squares SSC = Σⱼ Pⱼ²/nⱼ − CF
SSC = 0/4 + 16/4 + 1/4 + 49/4 − 9
⇒ SSC = 0 + 4 + 0.25 + 12.25 − 9
⇒ SSC = 16.5 − 9
⇒ SSC = 7.5

To find the sum of the treatments

     Observations            Q = Σ(Observations)   Q²
A    0     −1    −6    −5    −12                   144
B    3     −2    −1    1     1                     1
C    5     1     0     0     6                     36
D    0     −1    −3    −3    −7                    49

Sum of the squares of treatments SST = Σᵢ Qᵢ²/nᵢ − CF
SST = 144/4 + 1/4 + 36/4 + 49/4 − 9
⇒ SST = 36 + 0.25 + 9 + 12.25 − 9
⇒ SST = 57.50 − 9
⇒ SST = 48.50

∴ SSE = TSS − SSR − SSC − SST ⇒ SSE = 113 − 46.5 − 7.5 − 48.50 = 10.5
We know that F(3, 6) = 4.76 at the 5% level of significance.

Sources of variation   d.f.                 SS           MSS                    F ratio                 Conclusion
Rows                   4 − 1 = 3            SSR = 46.5   MSR = 46.5/3 = 15.5    Fr = 15.5/1.75 = 8.86   Fr > F(3, 6), H0 rejected
Columns                4 − 1 = 3            SSC = 7.5    MSC = 7.5/3 = 2.5      Fc = 2.5/1.75 = 1.43    Fc < F(3, 6), H0 accepted
Treatments             4 − 1 = 3            SST = 48.5   MST = 48.5/3 = 16.17   FT = 16.17/1.75 = 9.24  FT > F(3, 6), H0 rejected
Error                  (4 − 1)(4 − 2) = 6   SSE = 10.5   MSE = 10.5/6 = 1.75    -                       -
Total                  16 − 1 = 15          TSS = 113    -                      -                       -
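The Latin square calculation adds a treatment sum of squares to the row and column sums of squares. The Python sketch below applies it to the uncoded wheat data of this problem; the function name `latin_square_anova` and the dictionary keys are illustrative assumptions.

```python
def latin_square_anova(layout, values):
    """Sums of squares and F ratios for an m x m Latin square design."""
    m = len(values)
    grand_total = sum(sum(row) for row in values)
    cf = grand_total ** 2 / (m * m)
    tss = sum(x * x for row in values for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in values) / m - cf
    ssc = sum(sum(col) ** 2 for col in zip(*values)) / m - cf
    # Accumulate the total yield of each treatment letter.
    totals = {}
    for letters, row in zip(layout, values):
        for letter, x in zip(letters, row):
            totals[letter] = totals.get(letter, 0) + x
    sst = sum(q * q for q in totals.values()) / m - cf
    sse = tss - ssr - ssc - sst
    mse = sse / ((m - 1) * (m - 2))
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SST": sst, "SSE": sse,
            "F_rows": ssr / (m - 1) / mse,
            "F_cols": ssc / (m - 1) / mse,
            "F_treatments": sst / (m - 1) / mse}


layout = ["CBAD", "ADCB", "BADC", "DCBA"]
yields = [[25, 23, 20, 20],
          [19, 19, 21, 18],
          [19, 14, 17, 20],
          [17, 20, 21, 15]]
result = latin_square_anova(layout, yields)
print({name: round(value, 2) for name, value in result.items()})
```

Because coding does not affect the sums of squares, the uncoded yields give the same SSR, SSC, SST and SSE as the hand calculation with 20 subtracted.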

5. Five varieties of paddy A, B, C, D, and E are tried. The plan, the varieties shown in each plot
and yields obtained in Kg are given in the following table (LSD)
B E C A D
95 85 139 117 97
E D B C A
90 89 75 146 87
C A D B E
116 95 92 89 74
A C E D B
85 130 90 81 77
D B A E C
87 65 99 89 93
Test whether there is a significant difference between rows and columns at 5% LOS.

Sol. Given observations are


B E C A D
95 85 139 117 97
E D B C A
90 89 75 146 87
C A D B E
116 95 92 89 74
A C E D B
85 130 90 81 77
D B A E C
87 65 99 89 93
Null hypothesis Ho: There is no significant difference between rows, columns and treatment,
Code the data by subtracting 100 from each value, we get

                                      T       T²
      B      E      C      A      D
      −5     −15    39     17     −3     33      1089
      E      D      B      C      A
      −10    −11    −25    46     −13    −13     169
      C      A      D      B      E
      16     −5     −8     −11    −26    −34     1156
      A      C      E      D      B
      −15    30     −10    −19    −23    −37     1369
      D      B      A      E      C
      −13    −35    −1     −11    −7     −67     4489
P     −27    −36    −5     22     −72    −118    -
P²    729    1296   25     484    5184   -       -

The squares are as follows:

      B      E      C      A      D
      25     225    1521   289    9
      E      D      B      C      A
      100    121    625    2116   169
      C      A      D      B      E
      256    25     64     121    676
      A      C      E      D      B
      225    900    100    361    529
      D      B      A      E      C
      169    1225   1      121    49
Sum   775    2496   2311   3008   1432   Σᵢ Σⱼ xᵢⱼ² = 10022
Correction factor CF = T²/N = (−118)²/25 = 13924/25 = 556.96 ≈ 557

Therefore, total sum of squares TSS = Σᵢ Σⱼ xᵢⱼ² − CF
⇒ TSS = 10022 − 557
⇒ TSS = 9465

Sum of the row squares SSR = Σᵢ Tᵢ²/nᵢ − CF
SSR = 1089/5 + 169/5 + 1156/5 + 1369/5 + 4489/5 − 557
⇒ SSR = 217.8 + 33.8 + 231.2 + 273.8 + 897.8 − 557
⇒ SSR = 1654.4 − 557
⇒ SSR = 1097.4

Sum of the column squares SSC = Σⱼ Pⱼ²/nⱼ − CF
SSC = 729/5 + 1296/5 + 25/5 + 484/5 + 5184/5 − 557
⇒ SSC = 145.8 + 259.2 + 5 + 96.8 + 1036.8 − 557
⇒ SSC = 1543.6 − 557
⇒ SSC = 986.6

To find the sum of the treatments,

     Observations                       Q = Σ(Observations)   Q²
A    17     −13    −5     −15    −1     −17                   289
B    −5     −25    −11    −23    −35    −99                   9801
C    39     46     16     30     −7     124                   15376
D    −3     −11    −8     −19    −13    −54                   2916
E    −15    −10    −26    −10    −11    −72                   5184

Sum of the squares of treatments SST = Σᵢ Qᵢ²/nᵢ − CF
SST = 289/5 + 9801/5 + 15376/5 + 2916/5 + 5184/5 − 557
⇒ SST = 57.8 + 1960.2 + 3075.2 + 583.2 + 1036.8 − 557
⇒ SST = 6713.2 − 557
⇒ SST = 6156.2

∴ SSE = TSS − SSR − SSC − SST
⇒ SSE = 9465 − 1097.4 − 986.6 − 6156.2
⇒ SSE = 1224.8

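| Source of variation | d.f. | SS | MSS | F ratio | Conclusion |
|---|---|---|---|---|---|
| Rows | 5 - 1 = 4 | SSR = 1097.4 | $MSR = \frac{1097.4}{4} = 274.35$ | $F_r = \frac{274.35}{102.07} = 2.688$ | $F_r < F(4,12)$; $H_0$ accepted |
| Columns | 5 - 1 = 4 | SSC = 986.6 | $MSC = \frac{986.6}{4} = 246.65$ | $F_c = \frac{246.65}{102.07} = 2.417$ | $F_c < F(4,12)$; $H_0$ accepted |
| Treatments | 5 - 1 = 4 | SST = 6156.2 | $MST = \frac{6156.2}{4} = 1539.05$ | $F_T = \frac{1539.05}{102.07} = 15.08$ | $F_T > F(4,12)$; $H_0$ rejected |
| Error | 4 x 3 = 12 | SSE = 1224.8 | $MSE = \frac{1224.8}{12} = 102.07$ | - | - |
| Total | 25 - 1 = 24 | TSS = 9465 | - | - | - |

At the 5% level of significance, the tabulated value is $F(4,12) = 3.26$. Hence the rows and the columns do not differ significantly, while the treatments do.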
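The mean squares and F ratios follow mechanically from the sums of squares. Here is a small Python sketch of that step, assuming the tabulated 5% value F(4, 12) = 3.26; the helper name `anova_rows` is hypothetical. Dividing SSE = 1224.8 by 12 gives a mean square error of about 102.07, and the F ratios come out near 2.69, 2.42 and 15.08.

```python
def anova_rows(ss_effects, sse, df_effect, df_error, f_crit):
    """Return (MSE, {source: (mean square, F ratio, decision)})."""
    mse = sse / df_error
    table = {}
    for source, ss in ss_effects.items():
        ms = ss / df_effect          # mean square of the effect
        f = ms / mse                 # F ratio against the error mean square
        decision = "H0 rejected" if f > f_crit else "H0 accepted"
        table[source] = (ms, f, decision)
    return mse, table

F_CRIT_4_12 = 3.26  # assumed: tabulated F(4, 12) at the 5% level

mse, table = anova_rows(
    {"Rows": 1097.4, "Columns": 986.6, "Treatments": 6156.2},
    sse=1224.8, df_effect=4, df_error=12, f_crit=F_CRIT_4_12,
)
print(f"MSE = {mse:.2f}")
for source, (ms, f, decision) in table.items():
    print(f"{source:<10} MS = {ms:8.2f}  F = {f:6.3f}  {decision}")
```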

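6. Present your conclusions after doing analysis of variance to the following results of the Latin-square design experiment conducted in respect of five fertilizers which were used on plots of different fertility.

| Row | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 |
|---|---|---|---|---|---|
| 1 | A: 16 | B: 10 | C: 11 | D: 9 | E: 9 |
| 2 | E: 10 | C: 9 | A: 14 | B: 12 | D: 11 |
| 3 | B: 15 | D: 8 | E: 8 | C: 10 | A: 18 |
| 4 | D: 12 | E: 6 | B: 13 | A: 13 | C: 12 |
| 5 | C: 13 | A: 11 | D: 10 | E: 7 | B: 14 |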
Sol. The given observations are those tabulated in the problem statement above.

Null hypothesis $H_0$: There is no significant difference between rows, columns or treatments.
Coding the data by subtracting 10 from each value, we get the following table, where T denotes a row total and P a column total.

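| Row | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | T | $T^2$ |
|---|---|---|---|---|---|---|---|
| 1 | A: 6 | B: 0 | C: 1 | D: -1 | E: -1 | 5 | 25 |
| 2 | E: 0 | C: -1 | A: 4 | B: 2 | D: 1 | 6 | 36 |
| 3 | B: 5 | D: -2 | E: -2 | C: 0 | A: 8 | 9 | 81 |
| 4 | D: 2 | E: -4 | B: 3 | A: 3 | C: 2 | 6 | 36 |
| 5 | C: 3 | A: 1 | D: 0 | E: -3 | B: 4 | 5 | 25 |
| P | 16 | -6 | 6 | 1 | 14 | Grand total = 31 | - |
| $P^2$ | 256 | 36 | 36 | 1 | 196 | - | - |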
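Coding by subtracting a constant is only a convenience: it changes the totals but leaves every sum of squares unchanged, because each of them measures deviations from a mean. A short Python sketch illustrating this on the fertilizer data (the helper names `total_ss` and `row_ss` are illustrative assumptions):

```python
# Original fertilizer yields, row by row, and the same data coded by -10.
original = [
    [16, 10, 11, 9, 9],
    [10, 9, 14, 12, 11],
    [15, 8, 8, 10, 18],
    [12, 6, 13, 13, 12],
    [13, 11, 10, 7, 14],
]
coded = [[x - 10 for x in row] for row in original]

def correction_factor(data):
    values = [x for row in data for x in row]
    return sum(values) ** 2 / len(values)

def total_ss(data):
    """Total sum of squares: sum of x^2 minus the correction factor."""
    return sum(x * x for row in data for x in row) - correction_factor(data)

def row_ss(data):
    """Between-rows sum of squares: sum of T_i^2 / n_i minus CF."""
    return sum(sum(row) ** 2 / len(row) for row in data) - correction_factor(data)

# Both versions of the data give the same sums of squares.
print(round(total_ss(original), 2), round(total_ss(coded), 2))
print(round(row_ss(original), 2), round(row_ss(coded), 2))
```

The coded data simply keep the intermediate numbers small enough for hand calculation.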
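The squares are as follows:

| Row | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 |
|---|---|---|---|---|---|
| 1 | A: 36 | B: 0 | C: 1 | D: 1 | E: 1 |
| 2 | E: 0 | C: 1 | A: 16 | B: 4 | D: 1 |
| 3 | B: 25 | D: 4 | E: 4 | C: 0 | A: 64 |
| 4 | D: 4 | E: 16 | B: 9 | A: 9 | C: 4 |
| 5 | C: 9 | A: 1 | D: 0 | E: 9 | B: 16 |
| Column sum | 74 | 22 | 30 | 23 | 86 |

$\sum_i \sum_j x_{ij}^2 = 74 + 22 + 30 + 23 + 86 = 235$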
Correction factor $CF = \frac{T^2}{N} = \frac{(31)^2}{25} = \frac{961}{25} = 38.44$

Therefore, total sum of squares $TSS = \sum_i \sum_j x_{ij}^2 - CF$
$\Rightarrow TSS = 235 - 38.44$
$\Rightarrow TSS = 196.56$
Sum of the row squares $SSR = \sum_i \frac{T_i^2}{n_i} - CF$
$SSR = \frac{25}{5} + \frac{36}{5} + \frac{81}{5} + \frac{36}{5} + \frac{25}{5} - 38.44$
$\Rightarrow SSR = 5 + 7.2 + 16.2 + 7.2 + 5 - 38.44$
$\Rightarrow SSR = 40.60 - 38.44$
$\Rightarrow SSR = 2.16$
Sum of the column squares $SSC = \sum_i \frac{P_i^2}{n_i} - CF$
$SSC = \frac{256}{5} + \frac{36}{5} + \frac{36}{5} + \frac{1}{5} + \frac{196}{5} - 38.44$
$\Rightarrow SSC = 51.2 + 7.2 + 7.2 + 0.2 + 39.2 - 38.44$
$\Rightarrow SSC = 105 - 38.44$
$\Rightarrow SSC = 66.56$
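To find the sum of squares of the treatments, collect the coded observations by treatment:

| Treatment | Observations | $Q = \sum(\text{Observations})$ | $Q^2$ |
|---|---|---|---|
| A | 6, 4, 8, 3, 1 | 22 | 484 |
| B | 0, 2, 5, 3, 4 | 14 | 196 |
| C | 1, -1, 0, 2, 3 | 5 | 25 |
| D | -1, 1, -2, 2, 0 | 0 | 0 |
| E | -1, 0, -2, -4, -3 | -10 | 100 |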
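Collecting observations by treatment is the step most prone to slips when done by hand, since each letter must be picked exactly once from every row and every column. The sketch below, with hypothetical helper names `is_latin_square` and `treatment_totals`, first checks that the layout is a valid Latin square and then accumulates the treatment totals Q from the coded values.

```python
# Treatment letters per row and the coded yields (original value minus 10).
layout = ["ABCDE", "ECABD", "BDECA", "DEBAC", "CADEB"]
coded = [
    [6, 0, 1, -1, -1],
    [0, -1, 4, 2, 1],
    [5, -2, -2, 0, 8],
    [2, -4, 3, 3, 2],
    [3, 1, 0, -3, 4],
]

def is_latin_square(layout):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(layout)
    symbols = set(layout[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in layout)
    cols_ok = all(set(col) == symbols for col in zip(*layout))
    return len(symbols) == n and rows_ok and cols_ok

def treatment_totals(layout, data):
    """Sum the observations that belong to each treatment letter."""
    totals = {}
    for letters, row in zip(layout, data):
        for letter, x in zip(letters, row):
            totals[letter] = totals.get(letter, 0) + x
    return dict(sorted(totals.items()))

print(is_latin_square(layout))
print(treatment_totals(layout, coded))
```

The treatment totals must add up to the grand total (here 31), which gives a quick consistency check on the table.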
Sum of the squares of treatments $SST = \sum_i \frac{Q_i^2}{n_i} - CF$
$SST = \frac{484}{5} + \frac{196}{5} + \frac{25}{5} + \frac{0}{5} + \frac{100}{5} - 38.44$
$\Rightarrow SST = 96.8 + 39.2 + 5 + 0 + 20 - 38.44$
$\Rightarrow SST = 161 - 38.44$
$\Rightarrow SST = 122.56$
$\therefore SSE = TSS - SSR - SSC - SST$
$\Rightarrow SSE = 196.56 - 2.16 - 66.56 - 122.56$
$\Rightarrow SSE = 5.28$
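| Source of variation | d.f. | SS | MSS | F ratio | Conclusion |
|---|---|---|---|---|---|
| Rows | 5 - 1 = 4 | SSR = 2.16 | $MSR = \frac{2.16}{4} = 0.54$ | $F_r = \frac{0.54}{0.44} = 1.227$ | $F_r < F(4,12)$; $H_0$ accepted |
| Columns | 5 - 1 = 4 | SSC = 66.56 | $MSC = \frac{66.56}{4} = 16.64$ | $F_c = \frac{16.64}{0.44} = 37.82$ | $F_c > F(4,12)$; $H_0$ rejected |
| Treatments | 5 - 1 = 4 | SST = 122.56 | $MST = \frac{122.56}{4} = 30.64$ | $F_T = \frac{30.64}{0.44} = 69.64$ | $F_T > F(4,12)$; $H_0$ rejected |
| Error | 4 x 3 = 12 | SSE = 5.28 | $MSE = \frac{5.28}{12} = 0.44$ | - | - |
| Total | 25 - 1 = 24 | TSS = 196.56 | - | - | - |

At the 5% level of significance, the tabulated value is $F(4,12) = 3.26$. Hence the rows do not differ significantly, while the columns and the treatments (fertilizers) do.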

7. Set up an ANOVA table for the following information relating to three drugs tested to judge their effectiveness in reducing blood pressure for three different groups of people:

| Group of people | Drug X | Drug Y | Drug Z |
|---|---|---|---|
| A | 14, 15 | 10, 9 | 11, 11 |
| B | 12, 11 | 7, 8 | 10, 11 |
| C | 10, 11 | 11, 11 | 8, 7 |

Do the drugs act differently? Are the different groups of people affected differently? Is the interaction term significant? Answer the above questions taking a significance level of 5%.
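Sol.
The given observations for the different groups of people (A, B, C) under the different drugs (X, Y, Z), with row totals T and column totals P, are as follows:

| Group of people | Drug X | Drug Y | Drug Z | T | $T^2$ |
|---|---|---|---|---|---|
| A | 14, 15 | 10, 9 | 11, 11 | 70 | 4900 |
| B | 12, 11 | 7, 8 | 10, 11 | 59 | 3481 |
| C | 10, 11 | 11, 11 | 8, 7 | 58 | 3364 |
| P | 73 | 56 | 58 | Grand total = 187 | - |
| $P^2$ | 5329 | 3136 | 3364 | - | - |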

where $N = 6 + 6 + 6 = 18$.
Correction factor $CF = \frac{T^2}{N} = \frac{(187)^2}{18} = \frac{34969}{18} = 1942.722$
The squares are as follows:

| Group of people | Drug X | Drug Y | Drug Z | Sum of squares |
|---|---|---|---|---|
| A | 196, 225 | 100, 81 | 121, 121 | 844 |
| B | 144, 121 | 49, 64 | 100, 121 | 599 |
| C | 100, 121 | 121, 121 | 64, 49 | 576 |

$\sum_i \sum_j x_{ij}^2 = 844 + 599 + 576 = 2019$

Therefore, total sum of squares $TSS = \sum_i \sum_j x_{ij}^2 - CF$
$\Rightarrow TSS = 2019 - 1942.722$
$\Rightarrow TSS = 76.28$
Sum of the row squares $SSR = \sum_i \frac{T_i^2}{n_i} - CF$
$SSR = \frac{4900}{6} + \frac{3481}{6} + \frac{3364}{6} - 1942.722$
$\Rightarrow SSR = 816.67 + 580.16 + 560.67 - 1942.722$
$\Rightarrow SSR = 14.78$
Sum of the column squares $SSC = \sum_i \frac{P_i^2}{n_i} - CF$
$SSC = \frac{5329}{6} + \frac{3136}{6} + \frac{3364}{6} - 1942.722$
$\Rightarrow SSC = 888.16 + 522.66 + 560.67 - 1942.722$
$\Rightarrow SSC = 28.77$
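Sum of squares within samples (the error term), taken about the mean of each cell:
$SS_{within} = (14 - 14.5)^2 + (15 - 14.5)^2 + (10 - 9.5)^2 + (9 - 9.5)^2 + (11 - 11)^2 + (11 - 11)^2$
$\quad + (12 - 11.5)^2 + (11 - 11.5)^2 + (7 - 7.5)^2 + (8 - 7.5)^2 + (10 - 10.5)^2 + (11 - 10.5)^2$
$\quad + (10 - 10.5)^2 + (11 - 10.5)^2 + (11 - 11)^2 + (11 - 11)^2 + (8 - 7.5)^2 + (7 - 7.5)^2$
$\Rightarrow SS_{within} = 3.50$
Therefore, the sum of squares due to the interaction between groups and drugs is the remainder:
$SS_{int} = TSS - SSR - SSC - SS_{within}$
$\Rightarrow SS_{int} = 76.28 - 14.78 - 28.77 - 3.5$
$\Rightarrow SS_{int} = 29.23$
At the 5% level of significance we have $F(2,9) = 4.26$ and $F(4,9) = 3.63$.

| Source of variation | d.f. | SS | MSS | F ratio | Conclusion |
|---|---|---|---|---|---|
| Rows (groups of people) | 3 - 1 = 2 | SSR = 14.78 | $MSR = \frac{14.78}{2} = 7.39$ | $F_r = \frac{7.39}{0.389} = 19.0$ | $F_r > F(2,9)$; $H_0$ rejected |
| Columns (drugs) | 3 - 1 = 2 | SSC = 28.77 | $MSC = \frac{28.77}{2} = 14.385$ | $F_c = \frac{14.385}{0.389} = 37.0$ | $F_c > F(2,9)$; $H_0$ rejected |
| Interaction | 2 x 2 = 4 | $SS_{int}$ = 29.23 | $MS_{int} = \frac{29.23}{4} = 7.31$ | $F_{int} = \frac{7.31}{0.389} = 18.79$ | $F_{int} > F(4,9)$; $H_0$ rejected |
| Within samples (error) | 18 - 9 = 9 | $SS_{within}$ = 3.5 | $MSE = \frac{3.5}{9} = 0.389$ | - | - |
| Total | 18 - 1 = 17 | TSS = 76.28 | - | - | - |

Hence the groups of people are affected differently, the drugs act differently, and the interaction term is significant at the 5% level.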
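With two observations per cell, the analysis is a two-way ANOVA with replication. The Python sketch below is one way to check the arithmetic, assuming a balanced design; the function name `two_way_with_replication` and the dictionary layout are illustrative choices, not from the notes. It computes the within-cell sum of squares (9 degrees of freedom), takes the group-by-drug interaction sum of squares (4 degrees of freedom) as the remainder TSS - SSR - SSC - (within-cell SS), and tests every effect against the within-cell mean square. Working unrounded, it gives an interaction sum of squares of about 29.22 and F ratios of 19.0, 37.0 and about 18.79.

```python
# cells[group][drug] holds the replicate observations for that cell.
cells = {
    "A": {"X": [14, 15], "Y": [10, 9], "Z": [11, 11]},
    "B": {"X": [12, 11], "Y": [7, 8], "Z": [10, 11]},
    "C": {"X": [10, 11], "Y": [11, 11], "Z": [8, 7]},
}

def two_way_with_replication(cells):
    """Sums of squares, degrees of freedom and F ratios (balanced design)."""
    rows = list(cells)
    cols = list(cells[rows[0]])
    values = [x for r in rows for c in cols for x in cells[r][c]]
    n_total = len(values)
    cf = sum(values) ** 2 / n_total
    tss = sum(x * x for x in values) - cf

    per_row = n_total // len(rows)   # observations in each row
    per_col = n_total // len(cols)   # observations in each column
    ss_rows = sum(sum(x for c in cols for x in cells[r][c]) ** 2
                  for r in rows) / per_row - cf
    ss_cols = sum(sum(x for r in rows for x in cells[r][c]) ** 2
                  for c in cols) / per_col - cf

    # Within-cell (error) sum of squares: deviations from each cell mean.
    ss_within = 0.0
    for r in rows:
        for c in cols:
            obs = cells[r][c]
            mean = sum(obs) / len(obs)
            ss_within += sum((x - mean) ** 2 for x in obs)

    ss_inter = tss - ss_rows - ss_cols - ss_within
    ss = {"rows": ss_rows, "cols": ss_cols,
          "interaction": ss_inter, "within": ss_within, "total": tss}
    df = {"rows": len(rows) - 1, "cols": len(cols) - 1,
          "interaction": (len(rows) - 1) * (len(cols) - 1),
          "within": n_total - len(rows) * len(cols)}
    ms_within = ss_within / df["within"]
    f = {k: ss[k] / df[k] / ms_within for k in ("rows", "cols", "interaction")}
    return ss, df, f

ss, df, f = two_way_with_replication(cells)
for name in ("rows", "cols", "interaction"):
    print(f"{name:<12} SS = {ss[name]:6.2f}  df = {df[name]}  F = {f[name]:6.2f}")
print(f"within       SS = {ss['within']:6.2f}  df = {df['within']}")
```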

All the very best…..
