Bcs301 Imp Notes PRP
Prepared by:
Purushotham P
Assistant Professor
SJC Institute of Technology
Email id: [email protected]
Random Experiment:
An activity that yields some result is called a random experiment. A random variable is a real number X associated with each outcome of a random experiment.
Definition: Let S be the sample space associated with a random experiment. A real-valued function defined on S is called a random variable.
Random variables are of two types:
i) Discrete Random Variables (DRV)
ii) Continuous Random Variables (CRV)
Discrete Random Variables: A Discrete random variable is a variable which can only take a countable
number of values.
For example, if a coin is tossed three times, the number of heads obtained can be 0, 1, 2 or 3. The probabilities of these values can be tabulated as shown.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
X      0     1     2     3
P(x)   1/8   3/8   3/8   1/8
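The table above can be reproduced by enumerating the sample space. The following is a minimal Python sketch; the names `sample_space` and `pmf` are illustrative and not part of the notes.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three coin tosses.
sample_space = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in sample_space)

# P(X = x) = (number of outcomes with x heads) / (total outcomes).
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

Exact fractions are used so that the probabilities can be compared with the table without rounding.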
Continuous Random Variables: A continuous random variable is a random variable that can take any value in an interval, and hence uncountably many values. For example, a random variable measuring the time taken to complete a task is continuous, since the time can be any value in a range.
Ex: Temperature, age of a person, etc.
PROBLEMS
1) Show that the following probabilities satisfy the properties of a discrete random variable, and hence find its mean and variance.
x      10    20    30    40
P(x)   1/8   3/8   3/8   1/8
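Soln: Let X be the random variable taking the values
$x_1 = 10,\ x_2 = 20,\ x_3 = 30,\ x_4 = 40$
with
$P(X = x_1) = p_1 = \frac{1}{8},\quad P(X = x_2) = p_2 = \frac{3}{8},\quad P(X = x_3) = p_3 = \frac{3}{8},\quad P(X = x_4) = p_4 = \frac{1}{8}$
Each $p_i \ge 0$, and
$\sum_{i=1}^{4} p_i = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = \frac{8}{8} = 1$
Hence the given probabilities satisfy the properties of a discrete random variable.
Mean: $\mu = \sum_{i=1}^{4} x_i p_i = 10 \times \frac{1}{8} + 20 \times \frac{3}{8} + 30 \times \frac{3}{8} + 40 \times \frac{1}{8} = \frac{200}{8} = 25$
Variance: $\sigma^2 = \sum_{i=1}^{4} x_i^2 p_i - \mu^2 = 10^2 \times \frac{1}{8} + 20^2 \times \frac{3}{8} + 30^2 \times \frac{3}{8} + 40^2 \times \frac{1}{8} - 25^2 = 700 - 625 = 75$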
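As a check on the arithmetic of problem 1, here is a small Python sketch that validates a discrete distribution and computes its mean and variance. The helper names (`is_valid_pmf`, `mean`, `variance`) are illustrative assumptions, not notation from the notes.

```python
from fractions import Fraction

xs = [10, 20, 30, 40]
ps = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

def is_valid_pmf(probs):
    """A pmf needs non-negative probabilities that sum to 1."""
    return all(p >= 0 for p in probs) and sum(probs) == 1

def mean(values, probs):
    """mu = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """sigma^2 = sum of x_i^2 * p_i minus mu^2."""
    mu = mean(values, probs)
    return sum(x * x * p for x, p in zip(values, probs)) - mu * mu

print(is_valid_pmf(ps), mean(xs, ps), variance(xs, ps))
```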
2) Find the value of k, such that the following distribution represents discrete probability
distribution. Hence find Mean, S.D, 𝑷(𝒙 ≤ 𝟏), 𝑷(𝒙 > 𝟏) and 𝑷(−𝟏 < 𝒙 ≤ 𝟐).
Prepared by: PURUSHOTHAM P, SJC INSTITUTE OF TECHNOLOGY, | 2 TAKEITEASY ENGINEERS
Inspire before you expire…, BCS301
x      -3   -2   -1   0    1    2    3
P(x)   k    2k   3k   4k   3k   2k   k
We know that,
$\sum_{i=1}^{7} P(X = x_i) = 1$
$\Rightarrow k + 2k + 3k + 4k + 3k + 2k + k = 1$
$\Rightarrow 16k = 1$
$\Rightarrow k = \frac{1}{16}$
x       P(x)   xP(x)   x²   x²P(x)
-3      k      -3k     9    9k
-2      2k     -4k     4    8k
-1      3k     -3k     1    3k
0       4k     0       0    0
1       3k     3k      1    3k
2       2k     4k      4    8k
3       k      3k      9    9k
Total          0            40k
Mean $\mu = \sum_{i=1}^{7} x_i P(x_i) = 0$
Variance $\sigma^2 = \sum_{i=1}^{7} x_i^2 P(x_i) - \mu^2 = 40k - 0^2 = 40 \times \frac{1}{16} = 2.5$
S.D $= \sqrt{2.5} = 1.5811$
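Problem 2 also asks for $P(x \le 1)$, $P(x > 1)$ and $P(-1 < x \le 2)$. A minimal Python sketch of those sums follows, assuming the distribution k, 2k, 3k, 4k, 3k, 2k, k on x = -3, ..., 3 given in the problem; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

# Coefficients of k for each value of x, taken from the problem statement.
weights = {-3: 1, -2: 2, -1: 3, 0: 4, 1: 3, 2: 2, 3: 1}

# Total probability must be 1, so k = 1 / (sum of coefficients).
k = Fraction(1, sum(weights.values()))
pmf = {x: w * k for x, w in weights.items()}

def prob(condition):
    """Add up P(x) over the values of x that satisfy the condition."""
    return sum(p for x, p in pmf.items() if condition(x))

p_le_1 = prob(lambda x: x <= 1)        # 13k
p_gt_1 = prob(lambda x: x > 1)         # 3k
p_mid = prob(lambda x: -1 < x <= 2)    # 9k

print(k, p_le_1, p_gt_1, p_mid)
```

Note that $P(x > 1) = 1 - P(x \le 1)$, which gives a quick consistency check.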
We know that,
$\sum_{i=1}^{7} P(X = x_i) = 1$
$\Rightarrow k + 3k + 5k + 7k + 9k + 11k + 13k = 1$
$\Rightarrow 49k = 1$
$\Rightarrow k = \frac{1}{49}$
Variance $\sigma^2 = 973k - (4.1428)^2$
$= \frac{973}{49} - 17.1628$
$= 2.6943$
x      0   1   2    3    4    5    6     7
P(x)   0   k   2k   2k   3k   k²   2k²   7k² + k
We know that,
$\sum_{x=0}^{7} P(X = x) = 1$
$\Rightarrow 0 + k + 2k + 2k + 3k + k^2 + 2k^2 + 7k^2 + k = 1$
$\Rightarrow 10k^2 + 9k = 1$
$\Rightarrow 10k^2 + 9k - 1 = 0$
$\Rightarrow (10k - 1)(k + 1) = 0$
$\Rightarrow 10k - 1 = 0,\ k + 1 = 0$
$\Rightarrow k = \frac{1}{10}$ (the root $k = -1$ is rejected because a probability cannot be negative)
x      0   1     2     3     4     5      6      7
P(x)   0   0.1   0.2   0.2   0.3   0.01   0.02   0.17
CP     0   0.1   0.3   0.5   0.8   0.81   0.83   1
𝑖𝑖𝑖) 𝑃(3 < 𝑥 ≤ 6) = 𝑃(4) + 𝑃(5) + 𝑃(6) = 0.3 + 0.01 + 0.02 = 0.33
i) We know that,
∑6𝑖=1 𝑃(𝑋 = 𝑥𝑖 ) = 1
⇒ 0.1 + 𝑘 + 0.2 + 2𝑘 + 0.3 + 𝑘 = 1
⇒ 4𝑘 + 0.6 = 1
⇒ 4𝑘 = 0.4
⇒ 𝑘 = 0.1
𝑖𝑖) 𝑃(𝑥 < 1) = 𝑃(−2) + 𝑃(−1) + 𝑃(0) = 0.1 + 𝑘 + 0.2 = 𝑘 + 0.3 = 0.1 + 0.3 = 0.4
𝑖𝑖𝑖) 𝑃(𝑥 ≥ −1) = 𝑃(−1) + 𝑃(0) + 𝑃(1) + 𝑃(2) + 𝑃(3) = 𝑘 + 0.2 + 2𝑘 + 0.3 + 𝑘 = 4𝑘 + 0.5 = 0.9
6) A random variable has the following probability function for the various values of X=x. Find
i)Value of k, ii)𝑷(𝒙 ≤ 𝟏), iii) 𝑷(𝟎 ≤ 𝒙 < 𝟑).
𝒙 0 1 2 3 4 5
𝑷(𝒙) 𝒌 𝟓𝒌 𝟏𝟎𝒌 𝟏𝟎𝒌 𝟓𝒌 𝒌
i) We know that,
$\sum_{x=0}^{5} P(X = x) = 1$
$\Rightarrow k + 5k + 10k + 10k + 5k + k = 1$
$\Rightarrow 32k = 1$
$\Rightarrow k = \frac{1}{32}$
ii) $P(x \le 1) = P(0) + P(1) = k + 5k = 6k = \frac{6}{32} = 0.1875$
iii) $P(0 \le x < 3) = P(0) + P(1) + P(2) = k + 5k + 10k = 16k = \frac{16}{32} = 0.5$
7) Show that the function $f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$ is a probability density function. Hence find $P(1.5 < x < 2.5)$.
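Soln: Clearly $f(x) \ge 0$ for all $x$, and
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{\infty} e^{-x}\,dx$
$= 0 + \left[\frac{e^{-x}}{-1}\right]_{0}^{\infty}$
$= -[e^{-\infty} - e^{0}]$
$= -[0 - 1]$
$= 1$
Hence the given function is a p.d.f.
$P(1.5 < x < 2.5) = \int_{1.5}^{2.5} f(x)\,dx$
$= \int_{1.5}^{2.5} e^{-x}\,dx$
$= -\left[e^{-x}\right]_{1.5}^{2.5}$
$= -[e^{-2.5} - e^{-1.5}] = \frac{1}{e^{1.5}} - \frac{1}{e^{2.5}} \approx 0.1410$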
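The two steps of problem 7 (total area equal to 1, then the probability over an interval) can be checked numerically. This is a rough Python sketch; the trapezoidal helper `trapezoid`, the cut-off at x = 50 and the step count are assumptions made for illustration.

```python
import math

def f(x):
    """The density of problem 7: e^(-x) for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0

def trapezoid(func, a, b, n=100000):
    """Approximate the integral of func over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

# The tail beyond x = 50 is negligible (about e^-50), so [0, 50] stands in for [0, infinity).
total_area = trapezoid(f, 0.0, 50.0)

# P(1.5 < x < 2.5), numerically and from the closed form e^-1.5 - e^-2.5.
p_numeric = trapezoid(f, 1.5, 2.5)
p_exact = math.exp(-1.5) - math.exp(-2.5)

print(total_area, p_numeric, p_exact)
```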
8) A random variable X has probability density function $f(x) = \begin{cases} kx^2, & 0 \le x \le 3 \\ 0, & \text{otherwise} \end{cases}$. Evaluate i) $k$, ii) $P(x \le 1)$, iii) $P(x > 1)$, iv) $P(1 \le x \le 2)$, v) $P(x \le 2)$, vi) $P(x \ge 2)$.
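$\Rightarrow k\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{0}^{1} = 1$
$\Rightarrow \frac{k}{6} = 1$
$\Rightarrow k = 6$
ii) Mean $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
$= \int_{0}^{1} x \cdot kx(1 - x)\,dx$
$= k\int_{0}^{1} x^2 (1 - x)\,dx$
$= k\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_{0}^{1}$
$= k\left[\frac{1}{3} - \frac{1}{4}\right] = \frac{k}{12} = \frac{6}{12} = \frac{1}{2}$
iii) Variance $\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
$= \int_{0}^{1} kx^3 (1 - x)\,dx - \mu^2$
$= k\int_{0}^{1} (x^3 - x^4)\,dx - \left[\frac{1}{2}\right]^2$
$= k\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_{0}^{1} - \frac{1}{4}$
$= \frac{k}{20} - \frac{1}{4} = \frac{6}{20} - \frac{1}{4} = \frac{1}{20}$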
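$= \int_{0}^{1} x \cdot kxe^{-x}\,dx$
$= k\int_{0}^{1} x^2 e^{-x}\,dx$
$= k\left\{\left[x^2 \int e^{-x}\,dx\right]_{0}^{1} - \int_{0}^{1}\left(2x \int e^{-x}\,dx\right)dx\right\}$
$= k\left\{-\left[x^2 e^{-x}\right]_{0}^{1} + 2\int_{0}^{1} xe^{-x}\,dx\right\}$
$= k(2 - 5e^{-1})$
$= k\left(2 - \frac{5}{e}\right)$
$= \frac{e}{e - 2} \times \frac{2e - 5}{e}$
$\mu = \frac{2e - 5}{e - 2}$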
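i) Mean:
$\mu = E(x) = \sum_{x=0}^{n} x\,P(X = x)$
$= \sum_{x=0}^{n} x\,{}^{n}C_{x}\,p^{x} q^{n-x}$
$= \sum_{x=0}^{n} x\,\frac{n!}{x!\,(n - x)!}\,p^{x-1} p^{1} q^{n-x}$
$= \sum_{x=1}^{n} x\,\frac{n\,(n - 1)!}{x\,(x - 1)!\,(n - x)!}\,p^{x-1} p^{1} q^{n-x}$
$= np \sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!\,(n - x)!}\,p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!\,((n - 1) - (x - 1))!}\,p^{x-1} q^{(n-1)-(x-1)}$
$= np \sum_{x=1}^{n} {}^{(n-1)}C_{(x-1)}\,p^{x-1} q^{(n-1)-(x-1)}$
$= np\,(p + q)^{n-1} = np\,(1)$
$\mu = E(x) = np$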
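ii) Variance:
$\sigma^2 = E(x^2) - [E(x)]^2$ ----(1)
$E(x^2) = E(x(x - 1) + x) = E(x(x - 1)) + E(x)$
$E(x(x - 1)) = \sum_{x=0}^{n} x(x - 1)\,{}^{n}C_{x}\,p^{x} q^{n-x}$
$= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!\,(n - x)!}\,p^{x-2} q^{n-x}$
$= n(n - 1)p^2 \sum_{x=2}^{n} \frac{(n - 2)!}{(x - 2)!\,((n - 2) - (x - 2))!}\,p^{x-2} q^{(n-2)-(x-2)}$
$= n(n - 1)p^2 \sum_{x=2}^{n} {}^{(n-2)}C_{(x-2)}\,p^{x-2} q^{(n-2)-(x-2)}$
$= n(n - 1)p^2\,(p + q)^{n-2} = n(n - 1)p^2$
$\Rightarrow E(x^2) = n(n - 1)p^2 + np$
Substituting in (1):
$\Rightarrow \sigma^2 = n^2p^2 - np^2 + np - n^2p^2$
$\Rightarrow \sigma^2 = np - np^2$
$\Rightarrow \sigma^2 = np(1 - p)$
but $1 - p = q$
$\sigma^2 = npq$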
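The results $\mu = np$ and $\sigma^2 = npq$ can be checked by summing the binomial pmf directly. A minimal Python sketch follows; the values n = 10 and p = 0.3 are illustrative and not taken from a problem in these notes, and the function names are assumptions.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_moments(n, p):
    """Mean and variance obtained by direct summation over x = 0..n."""
    mean = sum(x * binomial_pmf(n, p, x) for x in range(n + 1))
    second_moment = sum(x * x * binomial_pmf(n, p, x) for x in range(n + 1))
    return mean, second_moment - mean**2

n, p = 10, 0.3  # illustrative values
mean, var = binomial_moments(n, p)

print(mean, n * p)             # summed mean next to np
print(var, n * p * (1 - p))    # summed variance next to npq
```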
PROBLEMS
1) Let X be a binomially distributed random variable based on 6 repetitions of an experiment. If p=0.3,
evaluate the following probabilities i) 𝑷(𝒙 ≤ 𝟑), ii) 𝑷(𝑿 > 𝟒).
Soln: Given p=0.3 and n=6, hence q = 1-p = 1-0.3 = 0.7
and 𝑃(𝑋 = 𝑥) = 𝑏(6,0.3, 𝑥) = 6𝐶𝑥 (0.3)𝑥 (0.7)6−𝑥
i) The probability that exactly 2 pens will be defective, 𝑃(2) = 12𝐶2 (0.1)2 (0.9)12−2
= (66)(0.01)(0.3487)
= 0.2301
ii) The probability that atmost 2 pens will be defective, 𝑃(𝑥 ≤ 2) = 𝑃(0) + 𝑃(1) + 𝑃(2)
= 12𝐶0 (0.1)0 (0.9)12 + 12𝐶1 (0.1)1 (0.9)11 + 12𝐶2 (0.1)2 (0.9)10
= 0.2824 + 0.3766 + 0.2301
= 0.8891
iii) The probability that none will be defective, 𝑃(0) = 12𝐶0 (0.1)0 (0.9)12
= (1)(1)(0.2824)
= 0.2824
3) The number of telephone lines busy at an instant is a binomial variate with probability 0.1 that a line is busy. If 10 lines are chosen at random, what is the probability that,
i) No line is busy
ii) All lines are busy
iii) At least one line is busy
iv) At most two lines are busy
ii) The probability that all lines are busy, 𝑃(10) = 10𝐶10 (0.1)10 (0.9)0
= (1)(10−10 )(1)
= 10−10
iii) The probability that atleast one line is busy, 𝑃(𝑥 ≥ 1) = 1 − 𝑃(0)
= 1 − 10𝐶0 (0.1)0 (0.9)10
= 1 − 0.3487
= 0.6513
iv) The probability that atmost two lines are busy, 𝑃(𝑥 ≤ 2) = 𝑃(0) + 𝑃(1) + 𝑃(2)
= 10𝐶0 (0.1)0 (0.9)10 + 10𝐶1 (0.1)1 (0.9)9 + 10𝐶2 (0.1)2 (0.9)8
= 0.3487 + 0.3874 + 0.1937
= 0.9298
i) The probability of getting exactly one head, 𝑃(1) = 4𝐶1 (0.0625) = 4 × 0.0625 = 0.25
iii) The probability of getting atleast two heads, , 𝑃(𝑥 ≥ 2) = 𝑃(2) + 𝑃(3) + 𝑃(4)
= 4𝐶2 (0.0625) + 4𝐶3 (0.0625) + 4𝐶4 (0.0625)
= 0.375 + 0.25 + 0.0625
= 0.6875
5) The probability of germination of a seed in a packet of seeds is found to be 0.7. If 10 seeds are
taken for experimenting on germination in a laboratory, find the probability that
i)8 seeds germinate
ii)Atleast 8 seeds germinate
iii)Atmost 8 seeds germinate
i) The probability that exactly 8 seeds germinate, 𝑃(8) = 10𝐶8 (0.7)8 (0.3)2
= (45)(0.0576)(0.09)
= 0.2334
ii) The probability that atleast 8 seeds germinate, 𝑃(𝑥 ≥ 8) = 𝑃(8) + 𝑃(9) + 𝑃(10)
= 10𝐶8 (0.7)8 (0.3)2 + 10𝐶9 (0.7)9 (0.3)1 + 10𝐶10 (0.7)10 (0.3)0
= 0.2334 + 0.1210 + 0.0282
= 0.3826
iii) The probability that atmost 8 seeds germinate, 𝑃(𝑥 ≤ 8) = 1 − {𝑃(9) + 𝑃(10)}
= 1 − {10𝐶9 (0.7)9 (0.3)1 + 10𝐶10 (0.7)10 (0.3)0 }
= 1 − {0.1210 + 0.0282}
= 0.8508
6) A communication channel receives independent pulses at the rate of 12 pulses per micro second.
The probability of transmission error is 0.001 for each micro second. Compute the probability of,
i)No error during a micro second
ii)1 error
iii)Atleast 1 error
iv)2 error
v)Atmost 2 error
i) The probability of no error during a micro second, 𝑃(0) = 12𝐶0 (0.001)0 (0.999)12
= (1)(1)(0.9880)
= 0.9880
ii) The probability of only one error during a micro second, 𝑃(1) = 12𝐶1 (0.001)1 (0.999)11
= (12)(0.001)(0.9890)
= 0.01186
iii) The probability of atleast one error during a micro second, 𝑃(𝑥 ≥ 1) = 1 − 𝑃(0)
= 1 − 12𝐶0 (0.001)0 (0.999)12
= 1 − 0.9880
= 0.0120
iv) The probability of two error during a micro second, 𝑃(2) = 12𝐶2 (0.001)2 (0.999)10
= (66)(0.000001)(0.9900)
= 0.00006534
v) The probability of atmost two error during a micro second, 𝑃(𝑥 ≤ 2) = 𝑃(0) + 𝑃(1) + 𝑃(2)
= 12𝐶0 (0.001)0 (0.999)12 + 12𝐶1 (0.001)1 (0.999)11 + 12𝐶2 (0.001)2 (0.999)10
= 0.9880 + 0.01186 + 0.00006534
= 0.999925
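The hand calculation above rounds each term to four significant figures before adding. As a cross-check, here is a short Python sketch of the same binomial sums for n = 12 and p = 0.001 at full floating-point precision; the variable names are illustrative. Carrying more decimal places gives $P(x \le 2) \approx 0.9999998$, so the value 0.999925 reflects the rounding of $(0.999)^{12}$ to 0.9880.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 12, 0.001  # 12 pulses per microsecond, error probability 0.001

p0 = binomial_pmf(n, p, 0)          # no error
p1 = binomial_pmf(n, p, 1)          # exactly one error
p2 = binomial_pmf(n, p, 2)          # exactly two errors
p_at_least_1 = 1 - p0               # at least one error
p_at_most_2 = p0 + p1 + p2          # at most two errors

print(p0, p1, p2, p_at_least_1, p_at_most_2)
```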
7) In 800 families with 5 children each, how many family would be expected to have,
i)3 boys
ii)5 girls
iii)Atmost 2 girls
iv)Either 2 or 3 boys
by assuming probability for boys and girls to be equal.
Soln: The total number of families given is 800 and number of children per family is, n=5
Given the probability of boy or girl to born, p=0.5
then q = 1-p = 1-0.5 = 0.5
The pmf of binomial distribution is, 𝑃(𝑋 = 𝑥) = 𝑃(𝑥) = 𝑛𝐶𝑥 𝑝 𝑥 𝑞 𝑛−𝑥 = 5𝐶𝑥 (0.5)𝑥 (0.5)5−𝑥
= 5𝐶𝑥 (0.5)5 = 5𝐶𝑥 (0.03125)
iii) The probability to have atmost two girls, 𝑃(𝑥 ≤ 2) = 𝑃(0) + 𝑃(1) + 𝑃(2)
= 5𝐶0 (0.03125) + 5𝐶1 (0.03125) + 5𝐶2 (0.03125)
= 0.03125 + 0.15625 + 0.3125
= 0.5
The total number of families may have atmost two girls, = 800 × 0.5 = 400.
Poisson Distribution:
Let X be a discrete random variable and let $\lambda > 0$ be a real number. The probability mass function of the Poisson distribution is defined as
$P(X = x) = P(x) = \begin{cases} \frac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$
where $\lambda$ is called the parameter and,
i) $P(X = x) = P(x) \ge 0$
ii) $\sum_{x=0}^{\infty} P(x) = \sum_{x=0}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{x!} = 1$
iii) Mean $\mu = np = \lambda$
The Poisson distribution can be used to find the probability that an event happens a given number of times, based on how often it usually occurs. Companies can use the Poisson distribution to examine how they may improve their operational efficiency.
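i) Mean:
$\mu = E(x) = \sum_{x=0}^{\infty} x\,P(x)$
$= \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^{x}}{x!}$
$= \sum_{x=1}^{\infty} x\,\frac{e^{-\lambda}\lambda^{x-1}\lambda}{x\,(x - 1)!}$
$= \lambda \sum_{x=1}^{\infty} \frac{e^{-\lambda}\lambda^{x-1}}{(x - 1)!}$
$= \lambda\,(1)$
$\mu = \lambda$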
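ii) Variance:
$\sigma^2 = E(x^2) - \mu^2$ ----(1)
$= E(x(x - 1) + x) - \mu^2 = E(x(x - 1)) + E(x) - \mu^2$ ----(2)
$E(x(x - 1)) = \sum_{x=0}^{\infty} x(x - 1)\,\frac{e^{-\lambda}\lambda^{x}}{x!}$
$= \sum_{x=2}^{\infty} x(x - 1)\,\frac{e^{-\lambda}\lambda^{x-2}\lambda^{2}}{x(x - 1)(x - 2)!}$
$= \lambda^2 \sum_{x=2}^{\infty} \frac{e^{-\lambda}\lambda^{x-2}}{(x - 2)!}$
$= \lambda^2\,(1)$
(2) $\Rightarrow \sigma^2 = \lambda^2 + \lambda - \lambda^2$
$\Rightarrow \sigma^2 = \lambda$
S.D $= \sigma = \sqrt{\lambda}$
Mean $= \lambda = np$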
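The Poisson mean and variance derived above can be checked numerically by truncating the infinite sums. A minimal Python sketch, assuming an illustrative parameter of 3 and a cut-off of 100 terms (the remaining tail is negligible at this value); the function names are not from the notes.

```python
from math import exp, factorial

def poisson_pmf(lam, x):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

def poisson_moments(lam, terms=100):
    """Total probability, mean and variance from the first `terms` values of x."""
    probs = [poisson_pmf(lam, x) for x in range(terms)]
    total = sum(probs)
    mean = sum(x * p for x, p in enumerate(probs))
    second_moment = sum(x * x * p for x, p in enumerate(probs))
    return total, mean, second_moment - mean**2

lam = 3.0  # illustrative parameter
total, mean, var = poisson_moments(lam)

print(total, mean, var)  # total close to 1; mean and variance both close to lam
```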
PROBLEMS
1) The number of accidents in a year to taxi drivers in a city follows a poisson distribution with mean
3. Out of 1000 taxi drivers find approximately the number of drivers with,
i)No accident in a year
ii)More than 3 accidents in a year.
Soln: Let X be the Poisson variate denoting the number of accidents in a year.
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
Given that the mean of the Poisson distribution is $\mu = \lambda = 3$,
$P(X = x) = \frac{e^{-3}\,3^{x}}{x!}$
ii) Number of drivers out of 1000 with more than 3 accidents in a year $= 1000 \times P(x > 3)$
$= 1000 \times [1 - P(x \le 3)]$
$= 1000 \times [1 - P(0) - P(1) - P(2) - P(3)]$
$= 1000 \times \left[1 - \frac{e^{-3}3^{0}}{0!} - \frac{e^{-3}3^{1}}{1!} - \frac{e^{-3}3^{2}}{2!} - \frac{e^{-3}3^{3}}{3!}\right]$
$= 1000 \times \left(1 - \left(e^{-3} + e^{-3}(3) + e^{-3}\left(\frac{9}{2}\right) + e^{-3}\left(\frac{27}{6}\right)\right)\right)$
$= 1000 \times (1 - 0.6472) = 352.8$
$\approx 353$
Therefore about 353 drivers out of 1000 are expected to have more than 3 accidents in the year.
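The taxi-driver problem can be checked with a few lines of Python. This is a sketch under the stated data (mean 3, 1000 drivers); the variable names are illustrative, and the first quantity corresponds to part (i) of the problem, whose worked solution is not shown above.

```python
from math import exp, factorial

def poisson_pmf(lam, x):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

lam = 3          # mean number of accidents per driver per year
drivers = 1000

# i) No accident in a year: P(X = 0).
p_none = poisson_pmf(lam, 0)

# ii) More than 3 accidents: 1 - [P(0) + P(1) + P(2) + P(3)].
p_more_than_3 = 1 - sum(poisson_pmf(lam, x) for x in range(4))

print(drivers * p_none, drivers * p_more_than_3)
```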
2) In a certain factory turning out razor blades, there is a small probability of $\frac{1}{500}$ for any blade to be defective. The blades are supplied in packets of 10. Use the Poisson distribution to calculate the approximate number of packets containing,
i)No defective
ii)2 defective
iii)3 defective
in the consignment of 10000 packets.
Soln: Let X be the Poisson variate denoting the number of defective blades in a packet.
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
Given $p = \frac{1}{500} = 0.002$, $n = 10$, $\mu = np = 10 \times 0.002 = 0.02 = \lambda$
$P(X = x) = \frac{e^{-0.02}(0.02)^{x}}{x!}$
i) Number of packets with no defective blade out of 10000 packets $= 10000 \times P(x = 0)$
$= 10000 \times \frac{e^{-0.02}(0.02)^{0}}{0!}$
$= 10000 \times 0.9802$
$= 9802$
About 9802 packets out of 10000 contain no defective blade.
3) If the probability of bad reaction from a certain injection is 0.001, determine the probability that out
of 2000 individuals more than 2 will get a bad reaction.
Soln: Let X be the Poisson variate denoting the number of individuals who get a bad reaction.
WKT, the probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
Given $n = 2000$, $p = 0.001$ and $\mu = np = 2000 \times 0.001 = 2 = \lambda$
$P(X = x) = \frac{e^{-2}(2)^{x}}{x!}$
The probability that more than two individuals get a bad reaction $= P(x > 2)$
$= 1 - P(x \le 2)$
$= 1 - [P(0) + P(1) + P(2)]$
$= 1 - \frac{e^{-2}(2)^{0}}{0!} - \frac{e^{-2}(2)^{1}}{1!} - \frac{e^{-2}(2)^{2}}{2!}$
$= 1 - \frac{5}{e^{2}} = 0.3233$
4) The probability that a news reader commits no mistakes in reading the news is $\frac{1}{e^{3}}$. Find the probability that on a particular news broadcast he commits,
i) Only 2 mistakes
ii) More than 3 mistakes
iii) At most 3 mistakes
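Soln: Let X be the Poisson variate denoting the number of mistakes committed by the news reader.
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
Given $P(X = 0) = \frac{1}{e^{3}}$
$\Rightarrow \frac{e^{-\lambda}\lambda^{0}}{0!} = \frac{1}{e^{3}} \Rightarrow \frac{1}{e^{\lambda}} = \frac{1}{e^{3}} \Rightarrow \lambda = 3$
$P(X = x) = \frac{e^{-3}(3)^{x}}{x!}$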
ii) The probability that the news reader commits more than 3 mistakes is $P(x > 3) = 1 - P(x \le 3)$
$\Rightarrow P(x > 3) = 1 - [P(0) + P(1) + P(2) + P(3)]$
$\Rightarrow P(x > 3) = 1 - e^{-3}\left[\frac{3^{0}}{0!} + \frac{3^{1}}{1!} + \frac{3^{2}}{2!} + \frac{3^{3}}{3!}\right]$
$\Rightarrow P(x > 3) = 1 - 0.04979\,(1 + 3 + 4.5 + 4.5)$
$\Rightarrow P(x > 3) = 1 - 0.6472$
$\Rightarrow P(x > 3) = 0.3528$
iii) The probability that the news reader commits at most 3 mistakes $= P(x \le 3)$
$\Rightarrow P(x \le 3) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)$
$\Rightarrow P(x \le 3) = e^{-3}\left[\frac{3^{0}}{0!} + \frac{3^{1}}{1!} + \frac{3^{2}}{2!} + \frac{3^{3}}{3!}\right]$
$\Rightarrow P(x \le 3) = (0.04979)(1 + 3 + 4.5 + 4.5)$
$\Rightarrow P(x \le 3) = 0.6472$
5) Suppose 300 misprints are randomly distributed throughout a book of 500 pages, find the
probability that a given page contains,
i)Exactly 3 misprints
ii)Less than 3 misprints
iii)4 or more misprints
Soln: Let X be the Poisson variate denoting the number of misprints on a page.
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
Given that 300 misprints are randomly distributed throughout a book of 500 pages,
Mean $\lambda = \frac{300}{500} = 0.6$
$P(X = x) = \frac{e^{-0.6}(0.6)^{x}}{x!}$
ii) The probability that there are less than three misprints, $P(x < 3) = P(0) + P(1) + P(2)$
iii) The probability that there are 4 or more misprints, $P(x \ge 4) = 1 - P(x < 4)$
6) A certain screw making machine produces an average 2 defective out of 100 and packs of them in
boxes of 500. Find the probability that the box contains,
i)3 defective
ii)Atleast 1 defective
iii)Between 2 & 4 defective
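Soln: Given that the machine produces on average 2 defective screws out of 100, $p = \frac{2}{100} = 0.02$
Also given $n = 500$, $\mu = np = 500 \times 0.02 = 10 = \lambda$
The probability mass function of the Poisson distribution is $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
$P(X = x) = \frac{e^{-10}(10)^{x}}{x!}$
iii) The probability that between 2 and 4 screws are defective $= P(2 \le x \le 4)$
$= P(2) + P(3) + P(4)$
$= e^{-10}\left(\frac{10^{2}}{2!} + \frac{10^{3}}{3!} + \frac{10^{4}}{4!}\right) = e^{-10} \times 633.33$
$= 0.02875$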
Exponential Distribution:
Let X be a continuous random variable and let $\alpha > 0$ be a real number. The probability density function of the exponential distribution is defined as
$f(x) = \begin{cases} \alpha e^{-\alpha x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$
and it satisfies:
i) $f(x) \ge 0$
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
iii) Mean $\mu = \frac{1}{\alpha}$
iv) Variance $\sigma^2 = \frac{1}{\alpha^2}$
v) S.D of the exponential distribution, $\sigma = \frac{1}{\alpha}$
PROBLEMS
1) If X is an Exponential variant with mean 3, then find 𝑷(𝒙 > 𝟏) & 𝑷(𝒙 < 𝟑).
ii) $P(x < 3) = \int_{-\infty}^{3} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{3} f(x)\,dx = 0 + \frac{1}{3}\int_{0}^{3} e^{-x/3}\,dx = -\left[e^{-x/3}\right]_{0}^{3} = -[e^{-1} - e^{0}] = 1 - \frac{1}{e}$
2) If X is an exponential variant with mean 4, then find 𝑷(𝟎 < 𝒙 < 𝟏), 𝑷(𝒙 > 𝟐) & 𝑷(−∞ < 𝒙 < 𝟏𝟎).
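$P(-\infty < x < 10) = \int_{-\infty}^{10} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{10} f(x)\,dx = 0 + \frac{1}{4}\int_{0}^{10} e^{-x/4}\,dx$
$\Rightarrow P(-\infty < x < 10) = -\left[e^{-x/4}\right]_{0}^{10} = -\left[e^{-10/4} - e^{0}\right] = 1 - \frac{1}{e^{5/2}}$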
3) In a certain town the duration of shower has mean 5 minutes, what is the probability that shower
will last for,
i)10 minutes and more
ii)Less than 10 minutes
iii)Between 10 & 12 minutes.
i) The probability that the shower will last 10 minutes and more is,
$P(x \ge 10) = \int_{10}^{\infty} f(x)\,dx = \frac{1}{5}\int_{10}^{\infty} e^{-x/5}\,dx = -\left[e^{-x/5}\right]_{10}^{\infty} = -[0 - e^{-2}] = \frac{1}{e^{2}}$
ii) The probability that the shower will last less than 10 minutes is,
$P(x < 10) = \int_{-\infty}^{10} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{10} f(x)\,dx$
$\Rightarrow P(x < 10) = 0 + \frac{1}{5}\int_{0}^{10} e^{-x/5}\,dx = -\left[e^{-x/5}\right]_{0}^{10} = -[e^{-2} - 1] = 1 - \frac{1}{e^{2}}$
iii) The probability that the shower will last between 10 & 12 minutes is,
$P(10 < x < 12) = \int_{10}^{12} f(x)\,dx = \frac{1}{5}\int_{10}^{12} e^{-x/5}\,dx$
$\Rightarrow P(10 < x < 12) = -\left[e^{-x/5}\right]_{10}^{12} = -\left[e^{-12/5} - e^{-2}\right] = \frac{1}{e^{2}} - \frac{1}{e^{12/5}}$
4) The life of a TV tube manufactured by a company is known to have mean 200 months. Assuming
that the life of tube has an exponential distribution, find the probability that the life of a tube
manufactured by a company is,
i)Less than 200 months
ii)Between 100 & 300 months
iii)More than 200 months
i) The probability that the life of a tube is less than 200 months is,
$P(x < 200) = \int_{-\infty}^{200} f(x)\,dx = \frac{1}{200}\int_{0}^{200} e^{-x/200}\,dx = -\left[e^{-x/200}\right]_{0}^{200} = -[e^{-1} - e^{0}] = 1 - \frac{1}{e}$
ii) The probability that the life of a tube is between 100 & 300 months is,
$P(100 \le x \le 300) = \int_{100}^{300} f(x)\,dx = \frac{1}{200}\int_{100}^{300} e^{-x/200}\,dx = -\left[e^{-x/200}\right]_{100}^{300} = e^{-1/2} - e^{-3/2}$
iii) The probability that the life of a tube is more than 200 months is,
$P(x > 200) = \int_{200}^{\infty} f(x)\,dx = \frac{1}{200}\int_{200}^{\infty} e^{-x/200}\,dx$
$\Rightarrow P(x > 200) = -\left[e^{-x/200}\right]_{200}^{\infty} = -\left[e^{-\infty} - e^{-1}\right] = e^{-1} = \frac{1}{e}$
5) The length of a telephone conversation is an exponential variant with mean 3 minutes. Find the
probability that a call,
i)ends in less than 3 minutes
ii)ends between 3 & 5 minutes
iii)ends in more than 4 minutes
i) The probability that the conversation ends in less than 3 minutes is,
$P(x < 3) = \int_{-\infty}^{3} f(x)\,dx = \frac{1}{3}\int_{0}^{3} e^{-x/3}\,dx = -\left[e^{-x/3}\right]_{0}^{3} = -[e^{-1} - e^{0}] = 1 - \frac{1}{e}$
ii) The probability that the conversation ends between 3 & 5 minutes is,
$P(3 \le x \le 5) = \int_{3}^{5} f(x)\,dx = \frac{1}{3}\int_{3}^{5} e^{-x/3}\,dx$
$\Rightarrow P(3 \le x \le 5) = -\left[e^{-x/3}\right]_{3}^{5} = -\left[e^{-5/3} - e^{-1}\right] = \frac{1}{e} - \frac{1}{e^{5/3}}$
iii) The probability that the conversation ends in more than 4 minutes is,
$P(x > 4) = \int_{4}^{\infty} f(x)\,dx = \frac{1}{3}\int_{4}^{\infty} e^{-x/3}\,dx$
$\Rightarrow P(x > 4) = -\left[e^{-x/3}\right]_{4}^{\infty} = -\left[e^{-\infty} - e^{-4/3}\right] = e^{-4/3} = \frac{1}{e^{4/3}}$
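All of the exponential problems above reduce to differences of the cumulative distribution function $1 - e^{-x/\text{mean}}$. A minimal Python sketch for the telephone problem (mean 3 minutes) follows; `exp_cdf` and `exp_prob` are hypothetical helper names introduced here.

```python
from math import exp

def exp_cdf(x, mean):
    """P(X <= x) for an exponential variate with the given mean (alpha = 1/mean)."""
    return 1 - exp(-x / mean) if x > 0 else 0.0

def exp_prob(a, b, mean):
    """P(a <= X <= b) as a difference of two cdf values."""
    return exp_cdf(b, mean) - exp_cdf(a, mean)

mean = 3  # minutes

p_less_3 = exp_cdf(3, mean)          # 1 - 1/e
p_3_to_5 = exp_prob(3, 5, mean)      # 1/e - 1/e^(5/3)
p_more_4 = 1 - exp_cdf(4, mean)      # 1/e^(4/3)

print(p_less_3, p_3_to_5, p_more_4)
```

The same two helpers cover the shower and TV-tube problems by changing `mean` to 5 or 200.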
Normal Distribution:
Let X be a continuous random variable and let $\mu$ and $\sigma^2$ be real numbers. The probability density function of the normal distribution is defined as
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$,
where $-\infty < x < \infty$, $-\infty < \mu < \infty$, and $\mu$ and $\sigma^2\ (> 0)$ are called the mean and variance of the normal distribution. It is widely used in statistical inference, hypothesis testing and data analysis, in particular when continuous data are equally likely to fall above or below their average value. The normal distribution is also known as the Gaussian distribution or the probability bell curve. It is symmetric about the mean, so data near the mean occur more frequently than data far from the mean.
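For the normal distribution,
$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$
$\Rightarrow P(a \le x \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,dx$
$\Rightarrow P(a \le x \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\,dx$
Let $z = \frac{x - \mu}{\sigma}$, so that $dx = \sigma\,dz$. Then
$P(a \le x \le b) = P(z_1 \le z \le z_2) = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-\frac{z^2}{2}}\,dz$
where $z = \frac{x - \mu}{\sigma}$, $z_1 = \frac{a - \mu}{\sigma}$, $z_2 = \frac{b - \mu}{\sigma}$,
$F(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$ is called the standard normal function,
and $z = \frac{x - \mu}{\sigma}$ is called the standard normal variate.
When $z_1 = 0$ and $z_2 = z$, the area under the normal curve from 0 to $z$ is defined as
$A(z) = \varphi(z) = \frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-\frac{t^2}{2}}\,dt$
The values of $A(z)$ are taken from the area table of the normal distribution.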
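When an area table is not at hand, $A(z)$ can be computed from the error function, since $A(z) = \frac{1}{2}\,\mathrm{erf}\!\left(z/\sqrt{2}\right)$. A minimal Python sketch follows; the function name `area_0_to_z` is an assumption made for illustration.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """A(z): area under the standard normal curve between 0 and z."""
    return 0.5 * erf(z / sqrt(2))

# Values that a standard area table lists for z = 1, 1.6 and 2.
for z in (1.0, 1.6, 2.0):
    print(z, round(area_0_to_z(z), 4))

# P(z > 1) = 0.5 - A(1) and P(-1 < z < 1) = 2 A(1), using the symmetry of the curve.
p_right_tail = 0.5 - area_0_to_z(1.0)
p_within_one_sd = 2 * area_0_to_z(1.0)
print(p_right_tail, p_within_one_sd)
```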
1) The marks of 1000 students in an examination follow a normal distribution with mean 70 and standard deviation 5. Find the number of students whose marks will be
i) Less than 65
ii) More than 75
iii) Between 65 and 75. [A(1)=0.3413]
Sol.
Let X be the continuous random variable
Given
Mean of the Normal distribution 𝜇 = 70
Standard deviation of the Normal distribution 𝜎 = 5
∴ The standard normal variate is $z = \frac{x - \mu}{\sigma} \Rightarrow z = \frac{x - 70}{5}$
When $x = 65$, $z = \frac{65 - 70}{5} = -1$
When $x = 75$, $z = \frac{75 - 70}{5} = 1$
i) Probability that a student scored less than 65 marks $= P(x < 65) = P(z < -1) = P(z > 1) = 0.5 - A(1) = 0.5 - 0.3413 = 0.1587$
No. of students who scored less than 65 marks out of 1000 students $= 1000 \times 0.1587 = 158.7 \approx 159$
ii) Probability that a student scored more than 75 marks $= P(x > 75) = P(z > 1) = 0.5 - A(1) = 0.5 - 0.3413 = 0.1587$
No. of students who scored more than 75 marks out of 1000 students $= 1000 \times 0.1587 = 158.7 \approx 159$
iii) Probability that a student scored between 65 and 75 marks $= P(65 < x < 75) = P(-1 < z < 1)$
$= 2P(0 < z < 1)$
$= 2A(1)$
$= 2 \times 0.3413$
$= 0.6826$
No. of students who scored between 65 and 75 marks out of 1000 students $= 1000 \times 0.6826 = 682.6 \approx 683$
Sol.
Let X be the continuous random variable
Given
Mean of the Normal distribution 𝜇 = 30
Standard deviation of the Normal distribution 𝜎 = 6.25
∴ The standard normal variate is $z = \frac{x - \mu}{\sigma} \Rightarrow z = \frac{x - 30}{6.25}$
When $x = 20$, $z = \frac{20 - 30}{6.25} = -1.6$
When $x = 40$, $z = \frac{40 - 30}{6.25} = 1.6$
When $x = 35$, $z = \frac{35 - 30}{6.25} = 0.8$
The probability that a student scores between 20 and 40 marks:
$P(20 < x < 40) = P(-1.6 < z < 1.6)$
$\Rightarrow P(20 < x < 40) = 2 \times P(0 < z < 1.6)$
$\Rightarrow P(20 < x < 40) = 2 \times A(1.6)$
$\Rightarrow P(20 < x < 40) = 2 \times 0.4452 = 0.8904$
The number of students expected to score between 20 and 40 marks out of 200:
$= 200 \times 0.8904 = 178.08 \approx 178$
The probability that a student scores less than 35:
$P(x < 35) = P(z < 0.8)$
$= 0.5 + A(0.8) = 0.5 + 0.2881 = 0.7881$
The number of students expected to score less than 35 marks out of 200:
$= 200 \times 0.7881 = 157.62 \approx 158$
3) The weekly wages of workers in a company are normally distributed with mean of
Rs.700 and S.D. of Rs.50.Find the probability that the weekly wage of randomly
chosen workers is i) Between Rs.650 and Rs.750 ii) More than Rs.750.
Sol.
Let X be the continuous random variable
Given
Mean of the Normal distribution 𝜇 = 700
Standard deviation of the Normal distribution 𝜎 = 50
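∴ The standard normal variate is $z = \frac{x - \mu}{\sigma} \Rightarrow z = \frac{x - 700}{50}$
When $x = 650$, $z = \frac{650 - 700}{50} = -1$
When $x = 750$, $z = \frac{750 - 700}{50} = 1$
The probability that the weekly wage is between Rs.650 and Rs.750 is:
$P(650 < x < 750) = P(-1 < z < 1) = 2A(1) = 2 \times 0.3413 = 0.6826$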
4) In a test on 2000 electric bulbs, it was found that the life of a particular make was normally distributed with an average life of 2040 hours and S.D. of 60 hours. Estimate the number of bulbs likely to burn for
i) More than 2150 hours
ii) Less than 1950 hours
iii) Between 1920 and 2160 hours
Sol.
Let X be the continuous random variable
Given
Mean of the Normal distribution 𝜇 = 2040
Standard deviation of the Normal distribution 𝜎 = 60
∴ The standard normal variate is $z = \frac{x - \mu}{\sigma} \Rightarrow z = \frac{x - 2040}{60}$
When $x = 2150$, $z = \frac{2150 - 2040}{60} = 1.83$
When $x = 1950$, $z = \frac{1950 - 2040}{60} = -1.5$
When $x = 1920$, $z = \frac{1920 - 2040}{60} = -2$
When $x = 2160$, $z = \frac{2160 - 2040}{60} = 2$
i) The probability that the number of bulbs likely to burn of more than 2150 hours:
𝑃(𝑥 > 2150) = 𝑃(𝑧 > 1.83) =
𝑃(𝑧 > 1.83) = 0.5 − 𝐴(1.83) = 0.5 − 0.4664 = 0.0336
The number of bulbs likely to burn of more than 2150 hours out of 2000 bulbs :
=2000x0.0336
=67.2=67
ii) The probability that the number of bulbs likely to burn of less than 1950 hours:
=𝑃(𝑥 < 1950) = 𝑃(𝑧 < −1.5) =
𝑃(𝑧 > 1.5) = 0.5 − 𝐴(1.5) = 0.5 − 0.4332 = 0.0668
The number of bulbs likely to burn of less than 1950 hours out of 2000 bulbs :
= 2000 × 0.0668
= 133.6 ≈ 134
iii) The probability that the number of bulbs likely to burn between 1920 and 2160 hours
=𝑃(1920 < 𝑥 < 2160) = 𝑃(−2 < 𝑧 < 2)
= 2𝑃(0 < 𝑧 < 2)
= 2𝐴(2)
= 2 × 0.4772
= 0.9544
The number of bulbs likely to burn between 1920 and 2160 hours out of 2000 bulbs :
=2000x0.9544
=1908.8=1909
5) If the life time of a certain types electric bulbs of a particular brand was distributed
normally with an average life of 2000 hours and S.D.60 hours. If a firm purchases 2500
bulbs, find the number of bulbs that are likely to last for (i) more than 2100 hours
(ii) less than 1950 hours
(iii) between 1900 and 2100 hours.
Sol.
Let X be the continuous random variable
Given
Mean of the Normal distribution $\mu = 2000$
Standard deviation of the Normal distribution $\sigma = 60$
∴ The standard normal variate is $z = \frac{x - \mu}{\sigma} \Rightarrow z = \frac{x - 2000}{60}$
When $x = 1950$, $z = \frac{1950 - 2000}{60} = -0.83$
i) The probability that the number of bulbs likely to burn of more than 2100 hours:
𝑃(𝑥 > 2100) = 𝑃(𝑧 > 1.66)=0.5 − 𝐴(1.66) = 0.5 − 0.4515 = 0.0485
The number of bulbs likely to burn of more than 2100 hours out of 2500 bulbs:
=2500x0.0485
=121.25=121
ii) The probability that the number of bulbs likely to burn of less than 1950 hours:
=𝑃(𝑥 < 1950) = 𝑃(𝑧 < −0.83) = 0.5 − 𝐴(0.83) = 0.5 − 0.2967 = 0.2033
The number of bulbs likely to burn of less than 1950 hours out of 2500 bulbs :
=2500x0.2033=508.25=508
iii) The probability that the number of bulbs likely to burn between 1900 and 2100 hours
$= P(1900 < x < 2100) = P(-1.66 < z < 1.66)$
$= 2P(0 < z < 1.66)$
$= 2A(1.66)$
$= 2 \times 0.4515$
$= 0.9030$
The number of bulbs likely to burn between 1900 and 2100 hours out of 2500 bulbs :
=2500x0.9030
=2257.5=2258
6) In a normal distribution , 7% of items are under 35 and 89% of the items are under
63.Find the mean and standard deviation of the distribution.
Sol.
Given
$P(x < 35) = P(z < z_1) = 0.07$, where $z_1 = \frac{35 - \mu}{\sigma}$
Since this probability is less than 0.5, $z_1$ is negative, so
$P(z < z_1) = P(-\infty < z < 0) - P(z_1 < z < 0) = 0.5 - A(|z_1|) = 0.07$
$\Rightarrow A(|z_1|) = 0.5 - 0.07 = 0.43$
$\Rightarrow |z_1| = 1.48$ (from the area table, $A(1.48) = 0.4306$)
$\Rightarrow z_1 = -1.48$
$\Rightarrow \frac{35 - \mu}{\sigma} = -1.48$
$\Rightarrow \mu - 1.48\sigma = 35$ ----(2)
And $P(x < 63) = P(z < z_2) = 0.89$, where $z_2 = \frac{63 - \mu}{\sigma}$
$\Rightarrow P(z < z_2) = P(-\infty < z < 0) + P(0 < z < z_2) = 0.89$
$\Rightarrow 0.5 + A(z_2) = 0.89$
$\Rightarrow A(z_2) = 0.89 - 0.5 = 0.39$
$\Rightarrow A(z_2) = A(1.23)$
$\Rightarrow z_2 = 1.23$
$\Rightarrow \frac{63 - \mu}{\sigma} = 1.23$
$\Rightarrow \mu + 1.23\sigma = 63$ ----(3)
Solving eq (2) and (3), $2.71\sigma = 28$, and we get
$\sigma = 10.332$
$\mu = 50.2915$
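The two-percentile method of problem 6 can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` method returns the z value for a given cumulative probability. The function name `solve_mu_sigma` is a hypothetical helper; because it does not round z to two decimals as the area table does, its results can differ slightly from the table-based values in the notes.

```python
from statistics import NormalDist

def solve_mu_sigma(x1, p1, x2, p2):
    """Solve P(X < x1) = p1 and P(X < x2) = p2 for (mu, sigma)."""
    std = NormalDist()           # standard normal: mean 0, S.D. 1
    z1 = std.inv_cdf(p1)         # z value with area p1 to its left
    z2 = std.inv_cdf(p2)         # z value with area p2 to its left
    # From x1 = mu + z1*sigma and x2 = mu + z2*sigma:
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

# Problem 6: 7% of items are under 35 and 89% are under 63.
mu, sigma = solve_mu_sigma(35, 0.07, 63, 0.89)
print(round(mu, 2), round(sigma, 2))
```

For a condition of the form "8% of items are over 64", pass the complementary probability, that is `p2 = 0.92`.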
7) In a normal distribution, 31% of items are under 45 and 8% of the items are over 64. Find the mean and standard deviation of the distribution.
Sol.
Given
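$P(x < 45) = P(z < z_1) = 0.31$, where $z_1 = \frac{45 - \mu}{\sigma}$
Since this probability is less than 0.5, $z_1$ is negative, so
$P(z < z_1) = P(-\infty < z < 0) - P(z_1 < z < 0) = 0.5 - A(|z_1|) = 0.31$
$\Rightarrow A(|z_1|) = 0.5 - 0.31 = 0.19$
$\Rightarrow |z_1| = 0.5$ (from the area table, $A(0.5) = 0.1915$)
$\Rightarrow z_1 = -0.5$
$\Rightarrow \frac{45 - \mu}{\sigma} = -0.5$
$\Rightarrow \mu - 0.5\sigma = 45$ ----(2)
And $P(x > 64) = P(z > z_2) = 0.08$, where $z_2 = \frac{64 - \mu}{\sigma}$
$\Rightarrow P(z > z_2) = 0.5 - P(0 < z < z_2) = 0.08$
$\Rightarrow 0.5 - A(z_2) = 0.08$
$\Rightarrow A(z_2) = 0.5 - 0.08 = 0.42$
$\Rightarrow A(z_2) = A(1.4)$
$\Rightarrow z_2 = 1.4$
$\Rightarrow \frac{64 - \mu}{\sigma} = 1.4$
$\Rightarrow \mu + 1.4\sigma = 64$ ----(3)
Solving eq (2) and (3), $1.9\sigma = 19$, and we get
$\sigma = 10$
$\mu = 50$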
8) In an examination 7% of the students scored less than 35% of the marks and 89% of
the students scored less than 60% of the marks. Find the mean and standard
deviation if marks are normally distributed.
Sol.
Let X be the continuous random variable
Given
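Let $\mu$ and $\sigma$ be the mean and standard deviation of the distribution.
∴ The standard normal variate is $z = \frac{x - \mu}{\sigma}$ ----(1)
When $x = 35$, the standard normal variate is $z = \frac{35 - \mu}{\sigma} = z_1$ (say)
When $x = 60$, the standard normal variate is $z = \frac{60 - \mu}{\sigma} = z_2$ (say)
Given
$P(x < 35) = P(z < z_1) = 0.07$
Since this probability is less than 0.5, $z_1$ is negative, so
$P(z < z_1) = P(-\infty < z < 0) - P(z_1 < z < 0) = 0.5 - A(|z_1|) = 0.07$
$\Rightarrow A(|z_1|) = 0.5 - 0.07 = 0.43$
$\Rightarrow |z_1| = 1.48$ (from the area table, $A(1.48) = 0.4306$)
$\Rightarrow z_1 = -1.48$
$\Rightarrow \frac{35 - \mu}{\sigma} = -1.48$
$\Rightarrow \mu - 1.48\sigma = 35$ ----(2)
And $P(x < 60) = P(z < z_2) = 0.89$
$\Rightarrow P(z < z_2) = P(-\infty < z < 0) + P(0 < z < z_2) = 0.89$
$\Rightarrow 0.5 + A(z_2) = 0.89$
$\Rightarrow A(z_2) = 0.89 - 0.5 = 0.39$
$\Rightarrow A(z_2) = A(1.23)$
$\Rightarrow z_2 = 1.23$
$\Rightarrow \frac{60 - \mu}{\sigma} = 1.23$
$\Rightarrow \mu + 1.23\sigma = 60$ ----(3)
Solving eq (2) and (3), $2.71\sigma = 25$, and we get
$\sigma = 9.23$
$\mu = 48.65$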
*****
Joint Probability:
Let $X = \{x_1, x_2, x_3, \ldots, x_m\}$ and $Y = \{y_1, y_2, y_3, \ldots, y_n\}$ be two discrete random variables. Then the joint probability function of X and Y is defined as
$P(X = x_i, Y = y_j) = P(x_i, y_j) = f(x_i, y_j) = p_{ij} = f_{ij}$
where the function $f(x, y)$ satisfies the conditions
i) $f(x, y) \ge 0$
ii) $\sum_i \sum_j f(x_i, y_j) = 1$
The joint probability table is as shown below:
X \ Y    y1      y2      y3      ...   yn      f(xi)
x1       p11     p12     p13     ...   p1n     f(x1)
x2       p21     p22     p23     ...   p2n     f(x2)
x3       p31     p32     p33     ...   p3n     f(x3)
...      ...     ...     ...     ...   ...     ...
xm       pm1     pm2     pm3     ...   pmn     f(xm)
g(yj)    g(y1)   g(y2)   g(y3)   ...   g(yn)   1
e) The covariance of X and Y is denoted by COV(X, Y) and defined as $COV(X, Y) = E(XY) - E(X)\,E(Y)$, i.e.
$COV(X, Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} x_i y_j f(x_i, y_j) - \mu_X\,\mu_Y$
f) The correlation between X and Y is $\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y}$
PROBLEMS
Soln: Given,
𝑥1 = 1, 𝑥2 = 5, 𝑦1 = −4, 𝑦2 = 2, 𝑦3 = 7
And the probabilities are
$p_{11} = \frac{1}{8},\ p_{12} = \frac{1}{4},\ p_{13} = \frac{1}{8},\ p_{21} = \frac{1}{4},\ p_{22} = \frac{1}{8},\ p_{23} = \frac{1}{8}$
The given joint probability distribution, with its marginals, is as follows:
X \ Y    -4     2      7      f(xi)
1        1/8    1/4    1/8    1/2
5        1/4    1/8    1/8    1/2
g(yj)    3/8    3/8    1/4    1
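i) $\mu_X = E(X) = \sum_{i=1}^{2} x_i f(x_i) = \left(1 \times \frac{1}{2}\right) + \left(5 \times \frac{1}{2}\right) = 3$
$\mu_Y = E(Y) = \sum_{j=1}^{3} y_j g(y_j) = \left(-4 \times \frac{3}{8}\right) + \left(2 \times \frac{3}{8}\right) + \left(7 \times \frac{1}{4}\right) = 1$
iii) $\sigma_X^2 = E(X^2) - \mu_X^2 = \sum_{i=1}^{2} x_i^2 f(x_i) - \mu_X^2 = \left(1^2 \times \frac{1}{2}\right) + \left(5^2 \times \frac{1}{2}\right) - 9 = 13 - 9 = 4 \Rightarrow \sigma_X = 2$
$\sigma_Y^2 = E(Y^2) - \mu_Y^2 = \sum_{j=1}^{3} y_j^2 g(y_j) - \mu_Y^2 = \left((-4)^2 \times \frac{3}{8}\right) + \left(2^2 \times \frac{3}{8}\right) + \left(7^2 \times \frac{1}{4}\right) - 1^2 = \frac{75}{4}$
$\Rightarrow \sigma_Y = 4.33$
iv) $COV(X, Y) = E(XY) - \mu_X \mu_Y = \frac{3}{2} - (3)(1) = -\frac{3}{2}$
v) $\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{-3/2}{2 \times 4.33} = -0.1732$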
Hence the given random variables are not independent
vi) $\rho(X, Y) = \frac{COV(X, Y)}{\sigma_X \sigma_Y} = \frac{-0.5}{0.4898 \times 3.0983} = -0.3294$
Hence the given random variables are not independent
3) Determine,
i) Marginal distribution.
ii) Covariance between the discrete random variables X and Y,
using the joint probability distribution.
X \ Y    3      4      5
2        1/6    1/6    1/6
5        1/12   1/12   1/12
7        1/12   1/12   1/12
Soln: Given,
𝑥1 = 2, 𝑥2 = 5, 𝑥3 = 7, 𝑦1 = 3, 𝑦2 = 4, 𝑦3 = 5
And the probabilities are
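$p_{11} = \frac{1}{6},\ p_{12} = \frac{1}{6},\ p_{13} = \frac{1}{6},\ p_{21} = \frac{1}{12},\ p_{22} = \frac{1}{12},\ p_{23} = \frac{1}{12},\ p_{31} = \frac{1}{12},\ p_{32} = \frac{1}{12},\ p_{33} = \frac{1}{12}$
The marginal distributions are:
xi      2     5     7         yj      3     4     5
f(xi)   1/2   1/4   1/4       g(yj)   1/3   1/3   1/3
$\therefore Cov(X, Y) = E(XY) - \mu_X \mu_Y$
$\Rightarrow Cov(X, Y) = 16 - 4 \times 4$
$\Rightarrow Cov(X, Y) = 0$
Moreover, $p_{ij} = f(x_i)\,g(y_j)$ for every pair $(i, j)$ (for example, $p_{11} = \frac{1}{2} \times \frac{1}{3} = \frac{1}{6}$), hence the given random variables are independent.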
4) The joint probability distribution of discrete random variables X and Y is given below:
X \ Y    1      3      6
1        1/9    1/6    1/18
3        1/6    1/4    1/12
6        1/18   1/12   1/36
Determine,
i) Marginal distribution of X and Y.
ii) Are X and Y statistically independent?
Soln: Given,
𝑥1 = 1, 𝑥2 = 3, 𝑥3 = 6, 𝑦1 = 1, 𝑦2 = 3, 𝑦3 = 6
And the probabilities are
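$p_{11} = \frac{1}{9},\ p_{12} = \frac{1}{6},\ p_{13} = \frac{1}{18},\ p_{21} = \frac{1}{6},\ p_{22} = \frac{1}{4},\ p_{23} = \frac{1}{12},\ p_{31} = \frac{1}{18},\ p_{32} = \frac{1}{12},\ p_{33} = \frac{1}{36}$
i) The marginal distributions are:
xi      1     3     6         yj      1     3     6
f(xi)   1/3   1/2   1/6       g(yj)   1/3   1/2   1/6
ii) $\mu_X = E(X) = \sum_{i=1}^{3} x_i f(x_i) = \frac{1}{3} + \frac{3}{2} + 1 = 2.8333$
$\mu_Y = E(Y) = \sum_{j=1}^{3} y_j g(y_j) = \frac{1}{3} + \frac{3}{2} + 1 = 2.8333$
$E(XY) = \sum_{i=1}^{3}\sum_{j=1}^{3} x_i y_j p_{ij} = \frac{1}{9} + \frac{3}{6} + \frac{6}{18} + \frac{3}{6} + \frac{9}{4} + \frac{18}{12} + \frac{6}{18} + \frac{18}{12} + \frac{36}{36} = 8.0278$
Since $p_{ij} = f(x_i)\,g(y_j)$ for every pair $(i, j)$ (for example, $p_{11} = \frac{1}{3} \times \frac{1}{3} = \frac{1}{9}$), X and Y are statistically independent. Consistently, $COV(X, Y) = E(XY) - \mu_X \mu_Y = 8.0278 - (2.8333)^2 = 0$.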
5) Determine,
i) Marginal distribution.
ii) Covariance between the discrete random variables X and Y along with corelation using the joint
probability distribution.
X \ Y    1      3      9
2        1/8    1/24   1/12
4        1/4    1/4    0
6        1/8    1/24   1/12
Soln: Given
𝑥1 = 2, 𝑥2 = 4, 𝑥3 = 6, 𝑦1 = 1, 𝑦2 = 3, 𝑦3 = 9
And the probabilities are
p11 = 1/8, p12 = 1/24, p13 = 1/12, p21 = 1/4, p22 = 1/4, p23 = 0, p31 = 1/8, p32 = 1/24, p33 = 1/12
X \ Y    1      3      9      f(xi)
2        1/8    1/24   1/12   1/4
4        1/4    1/4    0      1/2
6        1/8    1/24   1/12   1/4
g(yj)    1/2    1/3    1/6    1
i) Marginal distributions:
xi       2     4     6
f(xi)    1/4   1/2   1/4
yj       1     3     9
g(yj)    1/2   1/3   1/6
ii) μX = E(X) = Σ xi f(xi) = (2 × 1/4) + (4 × 1/2) + (6 × 1/4) = 0.5 + 2 + 1.5 = 4
μY = E(Y) = Σ yj g(yj) = (1 × 1/2) + (3 × 1/3) + (9 × 1/6) = 0.5 + 1 + 1.5 = 3
E(XY) = Σ Σ xi yj pij = 12
COV(X, Y) = E(XY) − μX μY = 12 − (4)(3) = 0
ρ(X, Y) = COV(X, Y) / (σX σY) = 0 / (1.4142 × 2.8284) = 0
Although ρ(X, Y) = 0, the random variables X and Y are not statistically independent, because p23 = 0 while f(x2) g(y3) = 1/2 × 1/6 = 1/12.
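Zero covariance does not by itself give independence, so the cell-by-cell condition pij = f(xi) g(yj) has to be tested. A minimal Python sketch of that test follows; the name `is_independent` is hypothetical, and the two tables are those of Problems 5 and 3.

```python
from fractions import Fraction as F

def is_independent(joint):
    """True when every cell equals the product of its row and column marginals."""
    f = [sum(row) for row in joint]          # marginal of X
    g = [sum(col) for col in zip(*joint)]    # marginal of Y
    return all(joint[i][j] == f[i] * g[j]
               for i in range(len(f)) for j in range(len(g)))

joint5 = [[F(1, 8), F(1, 24), F(1, 12)],
          [F(1, 4), F(1, 4), F(0)],
          [F(1, 8), F(1, 24), F(1, 12)]]
joint3 = [[F(1, 6)] * 3, [F(1, 12)] * 3, [F(1, 12)] * 3]

print(is_independent(joint5), is_independent(joint3))
```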
6) Determine,
i) Marginal distribution.
ii) Covariance between the discrete random variables X and Y along with corelation using the joint
probability distribution.
Y -3 2 4
X
1 0.1 0.2 0.2
2 0.3 0.1 0.1
Soln: Given
𝑥1 = 1, 𝑥2 = 2, 𝑦1 = −3, 𝑦2 = 2, 𝑦3 = 4
And the probabilities are
𝑝11 = 0.1, 𝑝12 = 0.2, 𝑝13 = 0.2, 𝑝21 = 0.3, 𝑝22 = 0.1, 𝑝23 = 0.1
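i) Marginal distributions:
xi       1     2
f(xi)    0.5   0.5
yj       −3    2     4
g(yj)    0.4   0.3   0.3
ii) μX = E(X) = (1 × 0.5) + (2 × 0.5) = 1.5
μY = E(Y) = (−3 × 0.4) + (2 × 0.3) + (4 × 0.3) = 0.6
E(XY) = Σ Σ xi yj pij = −0.3 + 0.4 + 0.8 − 1.8 + 0.4 + 0.8 = 0.3
COV(X, Y) = E(XY) − μX μY = 0.3 − (1.5)(0.6) = −0.6
ρ(X, Y) = COV(X, Y) / (σX σY) = −0.6 / (0.5 × 3.0397) = −0.3948
Since COV(X, Y) ≠ 0, the given random variables X and Y are not statistically independent.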
7) X and Y are independent random variables. X takes the values 2, 5 and 7 with probabilities 1/2, 1/4 and 1/4
respectively. Y takes the values 3, 4 and 5 with the probabilities 1/3, 1/3 and 1/3.
a) Find the JPD of X and Y
b) Show that COV (X, Y) = 0
Soln:
Given that X and Y are independent random variables with the marginal probabilities below.
x        2     5     7
f(x)     1/2   1/4   1/4
y        3     4     5
g(y)     1/3   1/3   1/3
a) Since X and Y are independent, the joint probabilities are f(xi, yj) = f(xi) g(yj):
X \ Y    3      4      5      f(xi)
2        1/6    1/6    1/6    1/2
5        1/12   1/12   1/12   1/4
7        1/12   1/12   1/12   1/4
g(yj)    1/3    1/3    1/3    1
b) ∴ μX = E(X) = Σ xi f(xi) = (2 × 1/2) + (5 × 1/4) + (7 × 1/4) = 4
∴ μY = E(Y) = Σ yj g(yj) = (3 × 1/3) + (4 × 1/3) + (5 × 1/3) = 4
∴ E(XY) = Σ Σ xi yj f(xi, yj) (i = 1 to 3, j = 1 to 3) = 6/6 + 8/6 + 10/6 + 15/12 + 20/12 + 25/12 + 21/12 + 28/12 + 35/12 = 192/12 = 16
∴ COV(X, Y) = E(XY) − μX μY = 16 − (4)(4) = 16 − 16 = 0
Stochastic Process
A stochastic process consists of a sequence of experiments in which each experiment has a finite number of
outcomes with given probabilities.
Probability Vector
A vector V = [v1, v2, v3, ..., vn] is called a probability vector if each of its components is non-negative
and their sum is equal to 1.
Ex: V = [0.1, 0.6, 0.3], V = [1/3, 1/3, 1/3], etc.
Stochastic Matrix
A square matrix P is called a stochastic matrix if all the entries of P are non-negative and the sum of all the
entries of any row is 1
(or)
A square matrix P is called a stochastic matrix where each row is in the form of the probability vector.
Ex:
P = [ 1/2  1/2 ]
    [ 2/3  1/3 ]
P = [ 0    1    0   ]
    [ 0    1/2  1/2 ]
    [ 1/2  1/4  1/4 ]
Transition Matrix
A transition matrix, also known as a stochastic or probability matrix, is a square (n × n) matrix representing
the transition probabilities of a stochastic system.
Ex:
P = [ 0    1    0   ]
    [ 1/2  1/4  1/4 ]
    [ 1/3  1/3  1/3 ]
PROBLEMS
1) Verify that the matrix
A = [ 0    0    1   ]
    [ 1/2  1/4  1/4 ]
    [ 0    1    0   ]
is a regular stochastic matrix.
Soln: In the given matrix A, each element is non-negative and the sum of the elements in each row is equal to 1.
∴ A is a stochastic matrix.
A² = A·A =
    [ 0    1     0    ]
    [ 1/8  5/16  9/16 ]
    [ 1/2  1/4   1/4  ]
A³ = A²·A =
    [ 1/2   1/4    1/4   ]
    [ 5/32  41/64  13/64 ]
    [ 1/8   5/16   9/16  ]
∴ All the entries of A³ are positive and the sum of each row is 1. Hence A is a regular stochastic matrix.
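The check above (row sums equal to 1, then some power with all entries positive) can be sketched in Python. The helper names and the cap `max_power=10` are assumptions made for illustration; they are not part of these notes.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_stochastic(m):
    """Every entry non-negative and every row summing to 1."""
    return all(all(e >= 0 for e in row) and sum(row) == 1 for row in m)

def is_regular(m, max_power=10):
    """True if some power m, m^2, ..., m^max_power has all entries positive."""
    power = m
    for _ in range(max_power):
        if all(e > 0 for row in power for e in row):
            return True
        power = matmul(power, m)
    return False

A = [[F(0), F(0), F(1)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(1), F(0)]]
A3 = matmul(matmul(A, A), A)
print(is_stochastic(A), is_regular(A), A3[1])
```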
3) Find the fixed probability vector for the regular stochastic matrix
A = [ 1/3  2/3 ]
    [ 1/4  3/4 ]
Soln: Given
A = [ 1/3  2/3 ]
    [ 1/4  3/4 ]
Since the given matrix A is of second order,
let Q = [x y] be the fixed probability vector, where x ≥ 0, y ≥ 0 and x + y = 1.
∴ QA = [x y]·A = [(1/3)x + (1/4)y   (2/3)x + (3/4)y]
Since QA = Q,
⇒ [(1/3)x + (1/4)y   (2/3)x + (3/4)y] = [x y]
⇒ (1/3)x + (1/4)y = x, (2/3)x + (3/4)y = y
⇒ (2/3)x = (1/4)y, (2/3)x = (1/4)y .... (1). We have x + y = 1 ⇒ y = 1 − x
∴ (1) ⇒ (2/3)x = (1/4)(1 − x)
⇒ (2/3)x + (1/4)x = 1/4
⇒ (8x + 3x)/12 = 1/4
⇒ (11/3)x = 1 ⇒ x = 3/11
∴ y = 1 − x ⇒ y = 1 − 3/11 ⇒ y = 8/11
Thus, the required fixed probability vector is Q = [x y] = [3/11 8/11]
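The system QA = Q together with x + y = 1 can also be solved mechanically. The sketch below uses Gauss-Jordan elimination with exact fractions; the function name `fixed_vector` is hypothetical, and the code assumes the matrix is regular so that the solution is unique.

```python
from fractions import Fraction as F

def fixed_vector(p):
    """Solve Q P = Q with sum(Q) = 1 (assumes a unique solution exists)."""
    n = len(p)
    # One equation per column j (all but the last): sum_i Q_i (P[i][j] - delta_ij) = 0
    rows = [[F(p[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)]
            for j in range(n - 1)]
    # Normalisation equation: Q_1 + ... + Q_n = 1
    rows.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

A = [[F(1, 3), F(2, 3)], [F(1, 4), F(3, 4)]]
Q = fixed_vector(A)
print(Q)
```

The same function works for the 3 × 3 matrices in the problems that follow.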
𝟎 𝟏 𝟎
4) Find the fixed probability vector of the regular stochastic matrix 𝐏 = [𝟎 𝟎 𝟏].
𝟏 𝟏
𝟎
𝟐 𝟐
Soln: Since the given matrix P is of order 3 × 3, the required fixed probability vector Q must be of order
1 × 3.
Let Q = [x y z], where x ≥ 0, y ≥ 0, z ≥ 0 and x + y + z = 1
Also, QP = Q
⇒ x = 4/11, y = 4/11, z = 1 − 4/11 − 4/11 = 1 − 8/11 = 3/11
Q = [x y z] = [4/11 4/11 3/11]
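6) Find the fixed probability vector of the regular stochastic matrix
P = [ 0    1    0   ]
    [ 1/6  1/2  1/3 ]
    [ 0    2/3  1/3 ]
Soln:
Given,
P = [ 0    1    0   ]
    [ 1/6  1/2  1/3 ]
    [ 0    2/3  1/3 ]
Since the given matrix P is of order 3 × 3, the required fixed probability vector Q must be of order 1 × 3.
Let Q = [x y z], where x ≥ 0, y ≥ 0, z ≥ 0 and x + y + z = 1
Also, QP = Q
⇒ [(1/6)y   x + (1/2)y + (2/3)z   (1/3)y + (1/3)z] = [x y z]
⇒ x = (1/6)y, z = (1/3)y + (1/3)z ⇒ z = (1/2)y
⇒ (1/6)y + y + (1/2)y = 1 ⇒ y = 6/10
⇒ x = 1/10, y = 6/10, z = 1 − 1/10 − 6/10 = 1 − 7/10 = 3/10
Q = [x y z] = [1/10 6/10 3/10]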
7) Find the fixed probability vector of the regular stochastic matrix
P = [ 0    2/3  1/3 ]
    [ 1/2  0    1/2 ]
    [ 1/2  1/2  0   ]
Soln:
Given,
P = [ 0    2/3  1/3 ]
    [ 1/2  0    1/2 ]
    [ 1/2  1/2  0   ]
Since the given matrix P is of order 3 × 3, the required fixed probability vector Q must be of order 1 × 3.
Let Q = [x y z], where x ≥ 0, y ≥ 0, z ≥ 0 and x + y + z = 1
Also, QP = Q
∴ QP = [x y z]·P
⇒ QP = [(1/2)y + (1/2)z   (2/3)x + (1/2)z   (1/3)x + (1/2)y]
WKT QP = Q
⇒ [(1/2)y + (1/2)z   (2/3)x + (1/2)z   (1/3)x + (1/2)y] = [x y z]
⇒ x = (1/2)y + (1/2)z, y = (2/3)x + (1/2)z, z = (1/3)x + (1/2)y
Substituting z = 1 − x − y,
⇒ 3x − 1 = 0, x − 9y = −3, 8x + 9y = 6
⇒ x = 9/27, y = 10/27, z = 8/27
∴ Q = [x y z] = [9/27 10/27 8/27]
8) If
P1 = [ 1−a  a   ]   and   P2 = [ 1−b  b   ]
     [ b    1−b ]              [ a    1−a ]
show that P1, P2 and P1P2 are stochastic matrices.
For 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1, every entry of P1 and P2 is non-negative and each row sums to 1, so P1 and P2 are stochastic matrices.
Now,
P1P2 = [ (1 − a)(1 − b) + a²        b(1 − a) + a(1 − a)   ]  =  [ a1  b1 ]  (say)
       [ b(1 − b) + a(1 − b)        (1 − a)(1 − b) + b²   ]     [ a2  b2 ]
We shall show that a1 + b1 = 1 and a2 + b2 = 1.
Now,
a1 + b1 = (1 − a)(1 − b) + a² + b(1 − a) + a(1 − a)
= (1 − a){1 − b + b} + a{a + 1 − a}
= 1 − a + a
= 1
∴ a1 + b1 = 1
Also,
a2 + b2 = b(1 − b) + a(1 − b) + (1 − b)(1 − a) + b²
= b{1 − b + b} + (1 − b){a + 1 − a}
= b + 1 − b
= 1
∴ a2 + b2 = 1
Since all the entries of P1P2 are also non-negative, P1P2 is a stochastic matrix.
Markov Chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the
probability of each event depends only on the state attained in the previous event.
1) Consider the t.p.m.
          t     b
P = t [ 0     1   ]
    b [ 1/2   1/2 ]
Hence find P² and P³. Also find p(3), taking the initial probability distribution from the following:
the person rolled a die and decided that he will go by bus if the number
appearing on the face is divisible by 3.
Soln: Given,
P = [ 0    1   ]  =  [ p_tt  p_tb ]
    [ 1/2  1/2 ]     [ p_bt  p_bb ]
∴ P² = P·P =
    [ 1/2  1/2 ]  =  [ p_tt^(2)  p_tb^(2) ]
    [ 1/4  3/4 ]     [ p_bt^(2)  p_bb^(2) ]
∴ P³ = P²·P =
    [ 1/4  3/4 ]  =  [ p_tt^(3)  p_tb^(3) ]
    [ 3/8  5/8 ]     [ p_bt^(3)  p_bb^(3) ]
∴ p_tb^(2) = 1/2 means that the probability that the system changes from the state t → b in exactly 2
steps is 1/2.
∴ p_bt^(3) = 3/8 means that the probability that the system changes from the state b → t in exactly 3
steps is 3/8.
The initial probability distribution is obtained from the rule that the person rolled a die and decided that he will
go by bus if the number appearing on the face is divisible by 3 (that is, 3 or 6).
∴ p(b) = 2/6 = 1/3 ⇒ p(t) = 1 − p(b) = 1 − 1/3 = 2/3
p(0) = [p(t) p(b)] = [2/3 1/3]
∴ p(2) = p(0) P² = [2/3 1/3]·[1/2 1/2; 1/4 3/4] = [5/12 7/12]
∴ p(3) = p(0) P³ = [2/3 1/3]·[1/4 3/4; 3/8 5/8] = [7/24 17/24] = [p_t^(3) p_b^(3)]
∴ The probability of travelling by train after 3 days = 7/24
∴ The probability of travelling by bus after 3 days = 17/24
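The distribution after n steps, p(n) = p(0) Pⁿ, can be obtained by multiplying the row vector by P repeatedly. The following Python sketch reproduces the train/bus figures; the function names are hypothetical and exact fractions are an assumed choice.

```python
from fractions import Fraction as F

def step(dist, p):
    """One transition: row vector dist times matrix p."""
    return [sum(dist[i] * p[i][j] for i in range(len(p)))
            for j in range(len(p[0]))]

def distribution_after(dist, p, n):
    """p(n) = p(0) P^n, computed by n successive steps."""
    for _ in range(n):
        dist = step(dist, p)
    return dist

P = [[F(0), F(1)], [F(1, 2), F(1, 2)]]   # states: t (train), b (bus)
p0 = [F(2, 3), F(1, 3)]                  # from the die rule: p(b) = 1/3
p2 = distribution_after(p0, P, 2)
p3 = distribution_after(p0, P, 3)
print(p2, p3)
```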
2) The transition matrix P of a Markov chain is given by
P = [ 1/2  1/2 ]
    [ 3/4  1/4 ]
with initial probability distribution p(0) = [1/4 3/4]. Define and find the following: i) p21^(2) ii) p12^(2) iii) p(2) iv) p1^(2) v)
the vector that p(0) Pⁿ approaches vi) the matrix that Pⁿ approaches.
Soln:
Given transition matrix
P = [ 1/2  1/2 ]
    [ 3/4  1/4 ]
P² = P·P =
    [ 5/8   3/8  ]  =  [ p11^(2)  p12^(2) ]
    [ 9/16  7/16 ]     [ p21^(2)  p22^(2) ]
∴ p21^(2) = 9/16, p12^(2) = 3/8
Given initial probability distribution is p(0) = [1/4 3/4]
∴ p(2) = p(0) P² = [1/4 3/4]·[5/8 3/8; 9/16 7/16]
⇒ p(2) = [37/64 27/64] = [p1^(2) p2^(2)]
∴ p1^(2) = 37/64
p(0) Pⁿ approaches the unique probability vector Q = [x y] for which QP = Q
⇒ [x y]·[1/2 1/2; 3/4 1/4] = [x y]
⇒ [x/2 + 3y/4   x/2 + y/4] = [x y]
⇒ x/2 + 3y/4 = x, x/2 + y/4 = y
Substituting y = 1 − x in the first equation,
⇒ −x/2 + 3(1 − x)/4 = 0
⇒ −x/2 − 3x/4 = −3/4
⇒ (5/4)x = 3/4
⇒ x = 3/5 ⇒ y = 2/5
∴ Q = [x y] = [3/5 2/5]
Therefore, the vector p(0) Pⁿ approaches the vector [3/5 2/5], and the matrix Pⁿ approaches the matrix whose rows are both Q:
    [ 3/5  2/5 ]
    [ 3/5  2/5 ]
Soln:
Given transition matrix of the Markov chain
P = [ 1/2  0    1/2 ]
    [ 1    0    0   ]
    [ 1/4  1/2  1/4 ]
and the initial probability distribution p(0) = (1/2 1/2 0)
∴ P² = P·P =
    [ 3/8    1/4  3/8  ]     [ p11^(2)  p12^(2)  p13^(2) ]
    [ 1/2    0    1/2  ]  =  [ p21^(2)  p22^(2)  p23^(2) ]
    [ 11/16  1/8  3/16 ]     [ p31^(2)  p32^(2)  p33^(2) ]
∴ p13^(2) = 3/8, p23^(2) = 1/2
∴ p(2) = p(0) P² = (1/2 1/2 0)·[3/8 1/4 3/8; 1/2 0 1/2; 11/16 1/8 3/16]
∴ p(2) = [p1^(2) p2^(2) p3^(2)] = [7/16 1/8 7/16]
∴ p1^(2) = 7/16
4) Prove that the Markov chain whose t.p.m. is
P = [ 0    2/3  1/3 ]
    [ 1/2  0    1/2 ]
    [ 1/2  1/2  0   ]
is irreducible. Find the corresponding stationary probability vector.
Soln:
5) A student’s study habits are as follows. If he studies one night, he is 30% sure to study the next
night. On the other hand, if he does not study one night, he is 40% sure to study the next night.
Find the transition matrix for the chain of his study.
6) A software engineer goes to his work-place every day by motor bike or by car. He never goes by a
bike on two consecutive days; but if he goes by car on a day then he is equally likely to go by car
or bike on the next day. Find the transition matrix for the chain of the mode of transport he uses.
If car is used on the first day of a week, find the probability that, (i) Bike is used, (ii) Car is used
on the fifth day.
Soln: The Markov chain of the mode of transport has the following two states:
a1 = Using bike, a2 = Using car
We need to find,
𝑝11 =Probability of using bike on a day, given that bike has been used on the previous day=0
(Because bike is not used on two consecutive days)
𝑝12 =Probability of using car on a day, given that the bike has been used on the previous day=1
(Because it is certain that car is used on a day if bike is used on the previous day)
𝑝21 =Probability of using bike on a day, given that car is used on the previous day= ½
(Because using car or bike on a day are equally likely if car is used on the previous day)
𝑝22 =Probability of using car on a day, given that car is used on the previous day=1/2
Hence the transition matrix for the chain of the mode of transport is
P = [ p11  p12 ]  =  [ 0    1   ]
    [ p21  p22 ]     [ 1/2  1/2 ]
Since the car is used on the first day, the initial probability distribution vector of the mode of transport is
p(0) = [p1^(0) p2^(0)] = [0 1]
P² = P·P =
    [ 1/2  1/2 ]
    [ 1/4  3/4 ]
⇒ P⁴ = P²·P² =
    [ 3/8   5/8   ]
    [ 5/16  11/16 ]
∴ p(4) = p(0) P⁴ = [0 1]·[3/8 5/8; 5/16 11/16]
⇒ p(4) = [p1^(4) p2^(4)] = [5/16 11/16]
Therefore, on the fifth day the probability of using the bike is p1^(4) = 5/16 and the probability of
using the car is p2^(4) = 11/16.
7) A man’s smoking habits are as follows. If he smokes filter cigarettes one week, he switches to non-
filter cigarettes the next week with the probability 0.2. On the other hand, if he smokes non filter
cigarettes one week there is a probability of 0.7that he will smoke non filter cigarettes the next
week as well. In the long run how often does he smoke filter cigarettes?
Soln:
Let A = Smoking filter cigarettes, B = Smoking non-filter cigarettes
Therefore, the associated transition probability matrix is as follows
P = [ p_AA  p_AB ]  =  [ 0.8  0.2 ]
    [ p_BA  p_BB ]     [ 0.3  0.7 ]
Let Q = [x y] be the unique probability vector for which QP = Q, with x + y = 1
∴ [x y]·[0.8 0.2; 0.3 0.7] = [x y]
⇒ [0.8x + 0.3y   0.2x + 0.7y] = [x y]
⇒ 0.8x + 0.3y = x, 0.2x + 0.7y = y
⇒ 0.2x − 0.3y = 0, 0.2x − 0.3y = 0
⇒ 0.2x − 0.3(1 − x) = 0
⇒ 0.2x + 0.3x − 0.3 = 0
⇒ 0.5x = 0.3
⇒ x = 0.3/0.5 = 3/5 ⇒ y = 0.2/0.5 = 2/5
∴ Q = [3/5 2/5] = [p_A p_B]
Thus, in the long run, he will smoke filter cigarettes 3/5 or 60% of the time.
(SMOKING IS INJURIOUS TO HEALTH, IT CAUSES CANCER AND TOBACCO CAUSES PAINFUL DEATH)
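The long-run vector can also be approximated numerically by applying P again and again until the distribution stops changing. This is a minimal sketch in floating point; the function name `long_run`, the tolerance and the step cap are assumptions chosen for illustration.

```python
def long_run(p, start, tol=1e-12, max_steps=10_000):
    """Repeat dist -> dist * P until successive distributions differ by less than tol."""
    dist = list(start)
    for _ in range(max_steps):
        nxt = [sum(dist[i] * p[i][j] for i in range(len(p)))
               for j in range(len(p[0]))]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist

P = [[0.8, 0.2], [0.3, 0.7]]       # states: A (filter), B (non-filter)
q = long_run(P, [1.0, 0.0])        # any starting distribution may be used
print([round(v, 6) for v in q])
```

For a regular transition matrix the result does not depend on the starting distribution, which is why the choice `[1.0, 0.0]` is arbitrary.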
8) Three boys A, B, C are throwing ball to each other. “A” always throws the ball to “B” and “B”
always throws ball to “C”. “C” is just as likely to throw the ball to “B” as to “A”. If, “C” was the
first person to throw the ball, find the probabilities that after three throws.
i) A has the ball
ii) B has the ball
iii) C has the ball
Soln: Given three boys A, B, C are throwing a ball associated with the transition probability matrix of the
Markov chain as below,
P = [ p_AA  p_AB  p_AC ]     [ 0    1    0 ]
    [ p_BA  p_BB  p_BC ]  =  [ 0    0    1 ]
    [ p_CA  p_CB  p_CC ]     [ 1/2  1/2  0 ]
⇒ P² = P·P =
    [ 0    0    1   ]
    [ 1/2  1/2  0   ]
    [ 0    1/2  1/2 ]
∴ P³ = P²·P =
    [ 1/2  1/2  0   ]
    [ 0    1/2  1/2 ]
    [ 1/4  1/4  1/2 ]
Since C has the ball initially, the initial probability vector is p(0) = [0 0 1]
∴ p(3) = p(0) P³ = [0 0 1]·P³ = [1/4 1/4 1/2]
∴ p(3) = [p_A^(3) p_B^(3) p_C^(3)] = [1/4 1/4 1/2]
Thus, after three throws, the probability that the ball is with A is p_A^(3) = 1/4, with B is p_B^(3) = 1/4 and
with C is p_C^(3) = 1/2
9) A gambler’s luck follows a pattern: if he wins a game, the probability of winning next game is 0.6.
However, he loses the game, the probability of losing the next game is 0.7. There is an even chance
of gambler winning the first game if so,
i) What is the probability of winning second game.
ii) What is the probability of winning the third game.
iii) In the long run, how often he will win.
10) A salesman’s territory consists of three cities A, B, C. He never sells in the same city on
successive days. If he sells in city A, then the next day he sells in city B. If he sells in B or C, then
the next day he is twice as likely to sell in city A as in the other city. In the long run, how often does he
sell in each of the cities?
Soln:
Given a salesman can move to the cities A, B, C with the probabilities as below,
          A     B     C
P = A [ p_AA  p_AB  p_AC ]     [ 0    1    0   ]
    B [ p_BA  p_BB  p_BC ]  =  [ 2/3  0    1/3 ]
    C [ p_CA  p_CB  p_CC ]     [ 2/3  1/3  0   ]
Let Q = [x y z] be the probability vector for which x + y + z = 1
∴ QP = Q
∴ [x y z]·[0 1 0; 2/3 0 1/3; 2/3 1/3 0] = [x y z]
⇒ [2y/3 + 2z/3   x + z/3   y/3] = [x y z]
⇒ 2y/3 + 2z/3 = x, x + z/3 = y, y/3 = z
⇒ 3x − 2y − 2z = 0, 3x − 3y + z = 0
⇒ 3x − 2y − 2(1 − x − y) = 0, 3x − 3y + (1 − x − y) = 0
⇒ 3x − 2y − 2 + 2x + 2y = 0, 3x − 3y + 1 − x − y = 0
⇒ 5x = 2, 2x − 4y = −1
⇒ x = 2/5
⇒ 4y = 9/5 ⇒ y = 9/20
⇒ z = 1 − x − y ⇒ z = 1 − 2/5 − 9/20 ⇒ z = 3/20
∴ Q = [x y z] = [2/5 9/20 3/20]
Thus, the salesman in the long run sells,
in city A 2/5 = 40%, in city B 9/20 = 45%, in city C 3/20 = 15% of the time.
Soln:
Given a man trades his car for a new car with the probabilities as below,
          M     A     S
P = M [ p_MM  p_MA  p_MS ]       M [ 0    1    0 ]
    A [ p_AM  p_AA  p_AS ]  =    A [ 0    0    1 ]
    S [ p_SM  p_SA  p_SS ]       S [ 1/2  1/2  0 ]
Also given, the first car, bought in 2000, was a Santro.
∴ The initial probability vector p(0) = [p_M^(0) p_A^(0) p_S^(0)] = [0 0 1]
∴ P² = P·P =
    [ 0    0    1   ]
    [ 1/2  1/2  0   ]
    [ 0    1/2  1/2 ]
⇒ P³ = P²·P =
    [ 1/2  1/2  0   ]
    [ 0    1/2  1/2 ]
    [ 1/4  1/4  1/2 ]
∴ p(2) = p(0) P² = [0 0 1]·P² = [0 1/2 1/2] = [p_M^(2) p_A^(2) p_S^(2)]
∴ p(3) = p(0) P³ = [0 0 1]·P³ = [1/4 1/4 1/2] = [p_M^(3) p_A^(3) p_S^(3)]
i) ∴ The probability to have a Santro car in the year 2002, p_S^(2) = 1/2 = 50%
ii) ∴ The probability to have a Maruthi car in the year 2002, p_M^(2) = 0 = 0%
iii) ∴ The probability to have an Ambassador car in the year 2003, p_A^(3) = 1/4 = 25%
iv) ∴ The probability to have a Santro car in the year 2003, p_S^(3) = 1/2 = 50%
***
Introduction:
Sampling is a statistical method of obtaining representative data (observations) from a group. We have
been using sampling concepts in our day to day lives knowingly or unknowingly; for instance we take a
handful of rice to check the rice quality of the full lot. This is an example of random sampling from a large
population.
Population (Universe):
The group of objects (individuals) under study is called the population or universe. The universe may be finite
or infinite.
Sample:
A part containing objects (individuals) selected from the population is called a sample.
Sample size:
The number of individuals in a sample is called the sample size. If the sample size n is less than or equal to 30, then
the sample is said to be small; otherwise it is called a large sample.
Random Sampling:
The selection of objects (individuals) from the universe in such a way that each object (individual) of the
universe has the same chance of being selected is called random sampling. Lottery system is the most common
example of random sampling.
Every random sampling need not be simple. For example, if balls are drawn without replacement from
a bag of balls containing different balls; the probability of success changes in every trial. Thus, the sampling
though random is not simple.
Simple Sampling:
Simple sampling is a special case of random sampling in which each event has same probability of
success or failure.
Hypothesis:
A hypothesis is an assumption based on limited evidence that lends itself to further testing and
experimentation. For example, a farmer claims a significant increase in crop production after using a particular
fertilizer; after a season of experimenting, his hypothesis may be proved true or false. Any hypothesis may
be accepted or rejected at a specified confidence level and must be open to refutation.
Null Hypothesis:
The null hypothesis is a general statement or default position that there is no relationship between two
measured phenomena or no association among groups.
Example: Given the test scores of two random samples, one of men and one of women, does one group differ from
the other? A possible null hypothesis is that the mean male score is the same as the mean female score:
H0: μ1 = μ2
where
H0 = the null hypothesis,
μ1 = the mean of population 1, and
μ2 = the mean of population 2.
A stronger null hypothesis is that the two samples are drawn from the same population, such that the
variances and shapes of the distributions are also equal.
Alternative Hypothesis:
It is the opposite statement of the null hypothesis and is denoted by H1: μ1 ≠ μ2.
Example (level of significance): A level of significance of α = 0.05 means that, when no true relationship or
difference exists between the groups being compared, there is a 5% chance that the test rejects the null hypothesis
by chance alone. It corresponds to a 95% confidence level.
Standard Error:
The standard deviation of the sampling distribution of a statistic is Known as Standard Error (S.E.).
Precision:
Reciprocal of standard error is known as precision.
Confidence Limits:
In short, confidence limits show how accurate an estimation of the mean is or is likely to be. Confidence limits
are the lowest and the highest numbers at the end of a confidence interval.
Confidence Interval:
A confidence interval is a range around a measurement that conveys how precise the measurement is. In statistics,
it is a range of values, computed from the sample, that is expected to contain the population parameter in a stated
proportion of repeated samples (the confidence level). Analysts often use confidence levels of 95% or 99%.
Critical Value:
A critical value is the value of the test statistic which defines the upper and lower bounds of a confidence interval,
or which defines the threshold of statistical significance in a statistical test.
Critical values of Z:
Level of significance    1%      5%      10%
Two-tailed test          2.58    1.96    1.645
One-tailed test          2.33    1.645   1.28
Critical Region:
A critical region, also known as the rejection region, is a set of values for the test statistic for which the null
hypothesis is rejected. i.e. if the observed test statistic is in the critical region then we reject the null hypothesis and
accept the alternative hypothesis.
Test of hypothesis:
Let x be the observed number of successes in a sample of size n and μ = np be the expected number of successes. Then
the standard normal variate Z is defined as
Z = (x − μ)/σ = (x − np)/√(npq)
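The test statistic can be evaluated directly. The short Python sketch below applies it to the coin problems that follow; the function name `z_successes` is hypothetical.

```python
from math import sqrt

def z_successes(x, n, p):
    """Z = (x - np) / sqrt(npq) for x observed successes in n trials."""
    q = 1 - p
    return (x - n * p) / sqrt(n * p * q)

z1 = z_successes(540, 1000, 0.5)   # 540 heads in 1000 tosses
z2 = z_successes(216, 400, 0.5)    # 216 heads in 400 tosses
print(round(z1, 2), round(z2, 2))
```

The computed |Z| is then compared with the tabulated critical value for the chosen level of significance.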
PROBLEMS
1) A coin is tossed 1000 times and head turns up 540 times. Decide on the hypothesis that the
coin is unbiased at 1 % level of significance.
Sol.
Let us suppose that the coin is unbiased.
and let p= the probability of getting a head in one toss=1/2=0.5
Since p+q=1, q=1-p=1/2=0.5
Expected number of heads in 1000 tosses=np=1000x0.5=500 , npq=250
∴ The difference is 𝑥 − 𝜇 =540-500=40
∴ Consider Z = (x − μ)/σ = (x − np)/√(npq)
⇒ Z = 40/√250 = 2.53 < 2.58
At the 1% level of significance (99% confidence level), the critical value of Z is 2.58.
Therefore, accept the hypothesis that the coin is unbiased.
2) A coin is tossed 400 times and turns up head 216 times. Test the hypothesis that the coin is unbiased
at 5%level of significance.
Sol.
Let us suppose that the coin is unbiased.
and let p= the probability of getting a head in one toss=1/2=0.5
Since p+q=1, q=1-p=1/2=0.5
Expected number of heads in 400 tosses=np=400x0.5=200 , npq=100
∴ The difference is 𝑥 − 𝜇 =216-200=16
∴ Consider Z = (x − μ)/σ = (x − np)/√(npq)
⇒ Z = 16/√100 = 16/10 = 1.6 < 1.96
The critical value of Z at α = 0.05 is 1.96.
Therefore, accept the hypothesis that the coin is unbiased at the 5% level of significance.
3) A coin was tossed 1600 times and tail turned up 864 times. Test the hypothesis that the
coin is unbiased at 1% level of significance.
Sol.
Let us suppose that the coin is unbiased .
and let p= the probability of getting a tail in one toss=1/2=0.5
Since p+q=1, q=1-p=1/2=0.5
Expected number of tails in 1600 tosses = np = 1600 × 0.5 = 800, npq = 400
∴ The difference is x − μ = 864 − 800 = 64
∴ Consider Z = (x − μ)/σ = (x − np)/√(npq)
⇒ Z = 64/√400 = 64/20 = 3.2 > 2.58
At the 1% level of significance (99% confidence level), the critical value of Z is 2.58.
Therefore, reject the hypothesis that the coin is unbiased; the coin is biased.
4) In 324 throws of a six faced 'die' , an odd number turned up 181 times. Is it possible to think
that the 'die' is an unbiased one?
Sol.
Let us suppose that the die is unbiased.
and let p= the probability of the turn up of an odd number is=3/6=1/2=0.5
Since p+q=1, q=1-p=1/2=0.5
Expected number of successes=np=324x0.5=162, npq=81
∴ The difference is 𝑥 − 𝜇 =181-162=19
∴ Consider Z = (x − μ)/σ = (x − np)/√(npq)
⇒ Z = 19/√81 = 19/9 = 2.11 < 2.58
Thus, at the 1% level of significance, we can conclude that the die is unbiased.
5) A die is thrown 9000 times and a throw of 3 or 4 was observed 3240 times. Show that the die
Can not be regarded as an unbiased one.
Sol.
The probability of getting 3 or 4 in a single throw is p = 2/6 = 1/3
and q = 1 − p = 1 − 1/3 = 2/3
∴ Expected number of successes = (1/3) × 9000 = 3000
∴ The difference = 3240 − 3000 = 240
Consider Z = (x − np)/√(npq)
⇒ Z = (3240 − 9000 × 1/3)/√(9000 × 1/3 × 2/3)
⇒ Z = 240/√2000
⇒ Z = 5.37
Since Z = 5.37 > 2.58,
we conclude that the die is biased.
PROBLEMS
1. A coin is tossed 400 times and it turns up head 216 times. Discuss whether the coin may be regarded as unbiased one.
Sol.
Set the null hypothesis H0: P = 1/2
Set the alternative hypothesis H1: P ≠ 1/2
The level of significance α = 0.05 (5%)
∴ The test statistic Z = (p − P)/√(PQ/n), where P + Q = 1 ⇒ Q = 1 − P
Under H0 the coin turns up heads and tails in equal proportion, so
P = 1/2 ⇒ Q = 1 − P = 1 − 1/2 = 1/2
The coin turns up head 216 times when it is tossed n = 400 times
∴ p = 216/400 = 0.54
∴ Z = (0.54 − 0.5)/√(0.5 × 0.5/400)
⇒ Z = 0.04/√0.000625 = 1.6
At 5% level, the tabulated value of Zα is 1.96
Since |Z| = 1.6 < 1.96
Hence, the null hypothesis is accepted at 5% level of significance and the coin may be regarded as unbiased.
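2. In a sample of 500 people from a city, 280 are tea drinkers and the rest are coffee drinkers. Can we assume that both coffee
and tea are equally popular in this city at 5% LoS?
Sol.
Set the null hypothesis H0: P = 1/2 (coffee and tea are equally popular)
Set the alternative hypothesis H1: P ≠ 1/2
The level of significance α = 0.05 (5%)
∴ The test statistic Z = (p − P)/√(PQ/n), where P + Q = 1 ⇒ Q = 1 − P
P = 1/2 ⇒ Q = 1 − P = 1 − 1/2 = 1/2
∴ p = 280/500 = 0.56, where n = 500
∴ Z = (0.56 − 0.5)/√(0.5 × 0.5/500)
⇒ Z = 0.06/√0.0005 = 2.6833
At 5% level, the tabulated value of Zα is 1.96
Since |Z| = 2.6833 > 1.96
Hence, the null hypothesis is rejected at 5% level of significance; coffee and tea are not equally popular.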
3. A manufacturing company claims that at least 95% of its products supplied conform to the specifications. Out of a
sample of 200 products, 18 are defective. Test the claim at 5% LoS.
Sol.
Set the null hypothesis H0: P = 95% = 0.95
Set the alternative hypothesis H1: P ≠ 0.95
The level of significance α = 0.05 (5%)
∴ The test statistic Z = (p − P)/√(PQ/n), where P + Q = 1 ⇒ Q = 1 − P
Given,
P = 95% = 0.95 ⇒ Q = 1 − P
⇒ Q = 1 − 0.95 = 0.05
18 products are found defective out of the sample of 200 products
∴ The number of non-defective products = 200 − 18 = 182
∴ p = 182/200 = 0.91
∴ Z = (0.91 − 0.95)/√(0.95 × 0.05/200)
⇒ Z = −0.04/√0.0002375 = −2.5955
At 5% level, the tabulated value of Zα is 1.96
Since |Z| = 2.5955 > 1.96
Hence, the null hypothesis is rejected at 5% level of significance; the sample does not support the company's claim.
4. In a sample of 300 units of a manufactured product, 65 units were found to be defective, and in another sample of 200
units there were 35 defectives. Is there a significant difference in the proportion of defectives in the samples at 5%
LoS?
Sol.
Set the null hypothesis H0: P1 = P2
Set the alternative hypothesis H1: P1 ≠ P2
The level of significance α = 0.05 (5%)
Given
n1 = 300, n2 = 200
In the sample of 300 units, 65 units were found to be defective
∴ p1 = 65/300 = 0.2167
In the sample of 200 units, 35 units were found to be defective
∴ p2 = 35/200 = 0.1750
We know that P = (x1 + x2)/(n1 + n2)
⇒ P = (65 + 35)/(300 + 200)
⇒ P = 100/500
⇒ P = 0.2
⇒ Q = 1 − P = 1 − 0.2 = 0.8
∴ Z = (p1 − p2)/√(PQ(1/n1 + 1/n2))
⇒ Z = (0.2167 − 0.1750)/√((0.2 × 0.8)(1/300 + 1/200))
⇒ Z = 0.0417/√0.001333 = 0.0417/0.0365 = 1.14
At 5% level, the tabulated value of Zα is 1.96
Since |Z| = 1.14 < 1.96
Hence, the null hypothesis H0 is accepted at 5% level of significance; there is no significant difference in the proportions of defectives.
5. In a large city A, 20% of a random sample of 900 school boys had a slight physical defect. In another large city B,
18.5% of a random sample of 1600 school boys had the same defect. Is the difference between the proportions
significant?
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the Alternative hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2
The level of significance 𝛼 = 0.05 (5%)
Given
𝑛1 = 900, 𝑛2 = 1600
𝑥1 = 20%of random sample of 900=0.2x900=180
𝑥2 = 18.5%of random sample of 1600=0.185x1600=296
∴ p1 = 20% = 20/100 = 0.2, p2 = 18.5% = 18.5/100 = 0.185
We know that P = (x1 + x2)/(n1 + n2)
⇒ P = (180 + 296)/(900 + 1600)
⇒ P = 476/2500
⇒ P = 0.1904 ⇒ Q = 1 − P = 1 − 0.1904 = 0.8096
∴ Z = (p1 − p2)/√(PQ(1/n1 + 1/n2))
⇒ Z = (0.2 − 0.185)/√((0.1904 × 0.8096)(1/900 + 1/1600))
⇒ Z = 0.015/√((0.1541)(0.001736))
⇒ Z = 0.015/√0.0002676
⇒ Z = 0.015/0.01636
⇒ Z = 0.917
At 5% level, the tabulated value of Zα is 1.96
Since |Z| = 0.917 < 1.96
Hence, the null hypothesis H0 is accepted at 5% level of significance and hence there is no significant difference.
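The two-sample proportion test used in these problems can be sketched in Python as below. The function name `two_proportion_z` is hypothetical; it follows the pooled formula P = (x1 + x2)/(n1 + n2) quoted above and keeps full precision instead of rounding intermediate values.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z = (p1 - p2) / sqrt(P Q (1/n1 + 1/n2)) with pooled proportion P."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(180, 900, 296, 1600)   # school boys in cities A and B
print(round(z, 3), abs(z) < 1.96)
```

Rounding the intermediate square root by hand can shift the last digits of Z slightly, but it does not change the conclusion here.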
6. Before an increase in excise duty on tea, 800 persons out of a sample of 1000 persons were found to be tea drinkers.
After an increase is excise duty. 800 people were tea drinkers in a sample of 1200 people. Test whether there is a
significant decrease in the consumption of tea after the increase in excise duty at 5% Los.
Sol.
Set the null hypothesis H0: P1 = P2
Set the alternative hypothesis H1: P1 > P2 (one-tailed, since a decrease in consumption is being tested)
The level of significance α = 0.05 (5%)
Given
n1 = 1000, n2 = 1200 and x1 = 800, x2 = 800
∴ p1 = 800/1000 = 0.8, p2 = 800/1200 = 0.6667
We know that P = (x1 + x2)/(n1 + n2)
⇒ P = (800 + 800)/(1000 + 1200)
⇒ P = 1600/2200
⇒ P = 0.7273 ⇒ Q = 1 − P = 1 − 0.7273 = 0.2727
∴ Z = (p1 − p2)/√(PQ(1/n1 + 1/n2))
⇒ Z = (0.8 − 0.6667)/√((0.7273 × 0.2727)(1/1000 + 1/1200))
⇒ Z = 0.1333/√((0.1983)(0.001833))
⇒ Z = 0.1333/√0.0003636
⇒ Z = 0.1333/0.01907
⇒ Z = 6.99
At 5% level, the tabulated value of Zα for a one-tailed test is 1.645.
Since |Z| = 6.99 > 1.645
Hence the null hypothesis H0 is rejected at 5% level of significance.
There is a significant decrease in the consumption of tea due to the increase in excise duty.
7. In a sample of 600 men from a certain city, 450 are found to be smokers. In another sample of 900 men from another city,
450 are smokers. Do the data indicate that the cities are significantly different with respect to the habit of smoking among
men? Test at 5% significance level.
(Warning: Smoking is injurious to health, causes cancer, tobacco causes painful death)
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the Alternative hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2
The level of significance 𝛼 = 0.05 (5%)
Given
𝑛1 = 600, 𝑛2 = 900 & 𝑥1 = 450, 𝑥2 = 450
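∴ p1 = 450/600 = 0.75, p2 = 450/900 = 0.5
We know that P = (x1 + x2)/(n1 + n2)
⇒ P = (450 + 450)/(600 + 900)
⇒ P = 900/1500 = 0.6
⇒ P = 0.6 ⇒ Q = 1 − P = 1 − 0.6 = 0.4
∴ Z = (p1 − p2)/√(PQ(1/n1 + 1/n2))
⇒ Z = (0.75 − 0.5)/√((0.6 × 0.4)(1/600 + 1/900))
⇒ Z = 0.25/√((0.24)(0.002778))
⇒ Z = 0.25/√0.0006667
⇒ Z = 0.25/0.02582
⇒ Z = 9.68
At 5% level, the tabulated value of Zα is 1.96
Since |Z| = 9.68 > 1.96
Hence, the null hypothesis H0 is rejected at 5% level of significance; the two cities differ significantly with respect to the habit of smoking among men.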
8. One type of air craft is found to develop engine trouble in 5 flights out of a total of 100 and another type in 7 flights
out of a total of 200 flights. Is there a significance difference in the two types of air craft’s so far as engine defects
are concerned? Test at 5% significance level.
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
Set the Alternative hypothesis 𝐻1 : 𝑃1 ≠ 𝑃2
The level of significance 𝛼 = 0.05 (5%)
Given
𝑛1 = 100, 𝑛2 = 200 & 𝑥1 = 5, 𝑥2 = 7
∴ p1 = 5/100 = 0.05, p2 = 7/200 = 0.035
We know that P = (x1 + x2)/(n1 + n2)
⇒ P = (5 + 7)/(100 + 200)
⇒ P = 12/300
⇒ P = 0.04 ⇒ Q = 1 − P = 1 − 0.04 = 0.96
∴ Z = (p1 − p2)/√(PQ(1/n1 + 1/n2))
⇒ Z = (0.05 − 0.035)/√((0.04 × 0.96)(1/100 + 1/200))
⇒ Z = 0.015/√((0.0384)(0.015))
⇒ Z = 0.015/√0.000576
⇒ Z = 0.015/0.024
⇒ Z = 0.625
At 5% level, the tabulated value of Zα for a two-tailed test is 1.96.
Since |Z| = 0.625 < 1.96
Hence the null hypothesis H0 is accepted at 5% level of significance; there is no significant difference between the two types of aircraft as far as engine defects are concerned.
9. A machine produced 16 defective articles in a batch of 500. After overhauling it produced 3 defectives in a batch of
100. Has the machine improved?
Sol.
Set the null hypothesis 𝐻0 ; 𝑃1 = 𝑃2
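Set the alternative hypothesis H1: P1 > P2 (one-tailed, since an improvement is being tested)
The level of significance α = 0.01 (1%)
Given
n1 = 500, n2 = 100 and x1 = 16, x2 = 3
∴ p1 = 16/500 = 0.032, p2 = 3/100 = 0.03
We know that P = (x1 + x2)/(n1 + n2)
⇒ P = (16 + 3)/(500 + 100)
⇒ P = 19/600
⇒ P = 0.03167 ⇒ Q = 1 − P = 1 − 0.03167 = 0.96833
∴ Z = (p1 − p2)/√(PQ(1/n1 + 1/n2))
⇒ Z = (0.032 − 0.03)/√((0.03167 × 0.96833)(1/500 + 1/100))
⇒ Z = 0.002/√((0.03066)(0.012))
⇒ Z = 0.002/√0.000368
⇒ Z = 0.002/0.01918
⇒ Z = 0.104
At 1% level, the tabulated value of Zα for a one-tailed test is 2.33.
Since |Z| = 0.104 < 2.33
Hence the null hypothesis H0 is accepted at 1% level of significance; the data do not show that the machine has improved.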
10. A machine produced 25 defective articles in a batch of 400. After over hauling it produced 15 defectives in a batch
of 200. Test at 1% level of significance whether there is a reduction of defective articles after overhauling.
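Sol.
Set the null hypothesis H0: P1 = P2
Set the alternative hypothesis H1: P1 > P2 (one-tailed, since a reduction is being tested)
The level of significance α = 0.01 (1%)
Given
n1 = 400, n2 = 200 and x1 = 25, x2 = 15
∴ p1 = 25/400 = 0.0625, p2 = 15/200 = 0.075
P = (x1 + x2)/(n1 + n2) = 40/600 = 0.0667 ⇒ Q = 1 − P = 0.9333
∴ Z = (p1 − p2)/√(PQ(1/n1 + 1/n2))
⇒ Z = (0.0625 − 0.075)/√((0.0667 × 0.9333)(1/400 + 1/200))
⇒ Z = −0.0125/√((0.0622)(0.0075))
⇒ Z = −0.0125/√0.0004667
⇒ Z = −0.0125/0.0216
⇒ Z = −0.579
At 1% level, the tabulated value of Zα for a one-tailed test is 2.33.
Since |Z| = 0.579 < 2.33
Hence the null hypothesis H0 is accepted at 1% level of significance; there is no evidence of a reduction of defective articles after overhauling.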
11. In an examination given to students at a large number of different schools the mean grade was 74.5 and S.D grade
was 8. At one particular school where 200 students took the examination the mean grade was 75.9. Discuss the
significance of this result at both 5% and 1% level of significance.
Sol.
Set the null hypothesis $H_0$: the mean grade of this school does not differ from the overall mean, $\mu = 74.5$
The level of significance $\alpha = 0.05$ (5%): $Z_{0.05} = 1.96$
The level of significance $\alpha = 0.01$ (1%): $Z_{0.01} = 2.58$
Given
$n = 200$
$\sigma = 8$
$\mu = 74.5$ and $\bar{x} = 75.9$
We calculate Z through the test statistic,
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$\Rightarrow Z = \frac{75.9 - 74.5}{8/\sqrt{200}} = \frac{1.4}{8/14.1421} = \frac{1.4 \times 14.1421}{8}$
$\Rightarrow Z = 2.4749$
i) At 5% level, the tabulated value of $Z_\alpha$ is 1.96.
Since |Z| = 2.4749 > 1.96
Hence the null hypothesis $H_0$ is rejected at 5% level of significance.
ii) At 1% level, the tabulated value of $Z_\alpha$ is 2.58.
Since |Z| = 2.4749 < 2.58
Hence the null hypothesis $H_0$ is accepted at 1% level of significance.
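As a cross-check of the single-mean Z test, the following Python sketch computes the statistic and the two-tailed critical values from the standard normal distribution instead of a printed table. The helper names `one_sample_z` and `two_tailed_critical` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

def two_tailed_critical(alpha):
    """Critical value z with P(|Z| > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z_school = one_sample_z(75.9, 74.5, 8, 200)
reject_at_5 = abs(z_school) > two_tailed_critical(0.05)   # critical value about 1.96
reject_at_1 = abs(z_school) > two_tailed_critical(0.01)   # critical value about 2.58
print(round(z_school, 4), reject_at_5, reject_at_1)
```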
12. Intelligent tests were given to the two groups of boys and girls,
Mean S.D Size
Girls 75 8 60
Boys 73 10 100
Find out if the two mean significantly differ at 5% level of significance.
Soln:
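Set the null hypothesis $H_0: \mu_1 = \mu_2$
Set the alternative hypothesis $H_1: \mu_1 \neq \mu_2$
where $\mu_1$ refers to the girls and $\mu_2$ refers to the boys.
Given, the means, S.D's and sizes of both the groups of girls and boys are as follows,
$\bar{x}_1 = 75,\ \bar{x}_2 = 73,\ \sigma_1 = 8,\ \sigma_2 = 10,\ n_1 = 60,\ n_2 = 100$
WKT, $Z = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{73 - 75}{\sqrt{\frac{64}{60} + \frac{100}{100}}} = -\frac{2}{\sqrt{2.07}} = -1.39$
Since |Z| = 1.39 < 1.96 (the two-tailed tabulated value at 5% level), the null hypothesis $H_0$ is accepted: the two means do not differ significantly.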
***
Sampling Variables: Variables sampling is the process used to predict the value of a specific
variable within a population. For example, a limited sample can be used to estimate the average
accounts receivable balance, together with a statistically derived plus-or-minus range for the total
receivables value that is under review.
The Central Limit Theorem: Suppose that a sample of size $n$ is selected from a population that
has mean $\mu$ and standard deviation $\sigma$. Let $x_1, x_2, x_3, \ldots, x_n$ be the $n$ observations, which
are independent and identically distributed, with sample mean $\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
The central limit theorem states that, when the sample size is large ($n \geq 30$), the sample mean $\bar{X}$
follows approximately the normal distribution with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$
(also called the standard error), i.e. $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, where $\mu$ and $\sigma$ are the
mean and standard deviation of the population from which the sample was selected.
Degrees of freedom: Degrees of freedom refer to the maximum number of logically independent
values, which may vary in a data sample. Degrees of freedom are calculated by subtracting one
from the number of items within the data sample (𝑛 -1).
Confidence Intervals:
Suppose we want to estimate an actual population mean $\mu$. We can only obtain $\bar{x}$, the
mean of a sample randomly selected from the population of interest. We can use $\bar{x}$ to find a range of
values
Lower value < population mean $\mu$ < Upper value
that we can be confident contains the population mean $\mu$. This range of values is called a
"confidence interval."
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Confidence interval C.I. = Mean ± Z (Standard Deviation / √Sample Size), i.e. $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
Confidence Level   99%    98%    95%    90%     50%
Z                  2.58   2.33   1.96   1.645   0.6745
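The Z values in this table are the two-sided critical points of the standard normal distribution. A short Python sketch (the function name `z_for_confidence` is an assumption for illustration) reproduces them up to rounding:

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return z such that P(-z < Z < z) = level for a standard normal Z."""
    return NormalDist().inv_cdf((1 + level) / 2)

z_table = {level: round(z_for_confidence(level), 3)
           for level in (0.99, 0.98, 0.95, 0.90, 0.50)}
print(z_table)
```

The printed values agree with the table to within the rounding used there (for example 2.576 against 2.58 for 99%).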
PROBLEMS
1. State Central limit theorem. Use the theorem to evaluate P[50 < 𝑿̄ < 56] where 𝑿̄ represents
the mean of a random sample of size 100 from an infinite population with mean 𝜇 = 53 and
variance 𝝈𝟐 = 400.
Sol.
The central limit theorem states that the sample mean $\bar{x}$ follows approximately the normal
distribution with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$ (also called the standard error), i.e. $\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$,
where $\mu$ and $\sigma$ are the mean and standard deviation of the population from which the sample is drawn.
Given,
Sample size n = 100
Mean of the population $\mu = 53$
Variance of the population $\sigma^2 = 400 \Rightarrow \sigma = \sqrt{400} = 20$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \Rightarrow \bar{X} \sim N\left(53, \frac{20}{\sqrt{100}}\right) \Rightarrow \bar{X} \sim N(53, 2)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 53}{20/\sqrt{100}} = \frac{\bar{X} - 53}{2}$
$\therefore$ At $\bar{X} = 50 \Rightarrow Z = \frac{50 - 53}{2} = -\frac{3}{2} = -1.5 = z_1$
At $\bar{X} = 56 \Rightarrow Z = \frac{56 - 53}{2} = \frac{3}{2} = 1.5 = z_2$
$\therefore P(50 < \bar{X} < 56) = P(-1.5 < z < 1.5)$
$= 2P(0 < z < 1.5)$
$= 2A(1.5)$
$= 2 \times 0.4332$
$\therefore P(50 < \bar{X} < 56) = 0.8664$
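The same probability can be obtained without a table by treating the sample mean as a normal variable with standard error σ/√n. The sketch below is illustrative only; `prob_sample_mean_between` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(lo, hi, mu, sigma, n):
    """P(lo < Xbar < hi) under the CLT approximation Xbar ~ N(mu, sigma/sqrt(n))."""
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(hi) - xbar.cdf(lo)

p_clt = prob_sample_mean_between(50, 56, mu=53, sigma=20, n=100)
print(round(p_clt, 4))
```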
2. An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size 𝑛
= 25 are drawn randomly from the population. Find the probability that the sample mean is
between 85 and 92.
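Sol.
Given,
Sample size n = 25
Mean of the population $\mu = 90$
Standard deviation of the population $\sigma = 15$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \Rightarrow \bar{X} \sim N\left(90, \frac{15}{\sqrt{25}}\right) \Rightarrow \bar{X} \sim N(90, 3)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 90}{15/\sqrt{25}} = \frac{\bar{X} - 90}{3}$
$\therefore$ At $\bar{X} = 85 \Rightarrow z = \frac{85 - 90}{3} = -\frac{5}{3} = -1.67$
At $\bar{X} = 92 \Rightarrow z = \frac{92 - 90}{3} = \frac{2}{3} = 0.67$
$\therefore P(85 < \bar{X} < 92) = P(-1.67 < z < 0.67) = A(1.67) + A(0.67) = 0.4525 + 0.2486 = 0.7011$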
3. A random sample of size 64 is taken from an infinite population having mean 112 and
variance 144. Using central limit theorem, find the probability of getting the sample mean
𝑋̅ greater than 114.5.
Sol.
Given,
Sample size n = 64
Mean of the population $\mu = 112$
Variance of the population $\sigma^2 = 144 \Rightarrow \sigma = 12$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \Rightarrow \bar{X} \sim N\left(112, \frac{12}{\sqrt{64}}\right) \Rightarrow \bar{X} \sim N(112, 1.5)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 112}{12/\sqrt{64}} = \frac{\bar{X} - 112}{1.5}$
$\therefore$ At $\bar{X} = 114.5 \Rightarrow z = \frac{114.5 - 112}{1.5} = 1.67$
$\therefore P(\bar{X} > 114.5) = P(z > 1.67)$
$\Rightarrow P(z > 1.67) = 0.5 - P(0 < z < 1.67)$
$= 0.5 - 0.4525$
$\Rightarrow P(z > 1.67) = 0.0475$
4. Let 𝑿̄denote the mean of a random sample of size 100 from a distribution, that is 𝝌𝟐 (𝟓𝟎).
Compute an approximate value of P(49<𝑿̄<51).
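Sol.
The sample size is n = 100
The chi-square distribution is given as $X \sim \chi^2(50)$, where d.f. = 50
The mean of a chi-square distribution equals its degrees of freedom, $\mu = 50$
Its variance is $\sigma^2 = 2 \times d.f. = 2 \times 50 = 100 \Rightarrow \sigma = 10$
By the central limit theorem, the sample mean follows approximately the normal distribution with mean $\mu$ and
standard error $\frac{\sigma}{\sqrt{n}}$.
$\therefore \bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \Rightarrow \bar{X} \sim N\left(50, \frac{10}{\sqrt{100}}\right) \Rightarrow \bar{X} \sim N(50, 1)$
$\therefore$ we know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$\therefore$ At $\bar{X} = 49 \Rightarrow Z = \frac{49 - 50}{1} = -1 = z_1$
At $\bar{X} = 51 \Rightarrow Z = \frac{51 - 50}{1} = 1 = z_2$
$\therefore P(49 < \bar{X} < 51) = P(-1 < z < 1)$
$= 2P(0 < z < 1)$
$= 2A(1)$
$= 2 \times 0.3413$
$\therefore P(49 < \bar{X} < 51) = 0.6826$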
5. An electrical firm manufactures light bulbs that have a length of life that is approximately
normally distribute with mean 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an average life of less than 775
hours.
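Sol.
Total number of bulbs n = 16
Average life of the bulbs $\mu = 800$
Standard deviation of the life of the bulbs $\sigma = 40$
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \Rightarrow \bar{X} \sim N\left(800, \frac{40}{\sqrt{16}}\right) \Rightarrow \bar{X} \sim N(800, 10)$
$\therefore$ We know that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 800}{10}$
$\therefore$ At $\bar{X} = 775 \Rightarrow Z = \frac{775 - 800}{10} = -\frac{25}{10} = -2.5$
$\therefore P(\bar{X} < 775) = P(z < -2.5)$
$\Rightarrow P(z < -2.5) = P(z > 2.5)$
$\Rightarrow P(z < -2.5) = 0.5 - P(0 < z < 2.5)$
$\Rightarrow P(z < -2.5) = 0.5 - A(2.5) = 0.5 - 0.4938 = 0.0062$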
6. The heights of a random sample of 50 college students showed a mean of 174.5 centimeters
and a standard deviation of 6.9 centimeters. Construct a 99% confidence interval for the
mean height of all college students.
Sol.
Given the sample size n=50
Sample mean height of the students $\bar{x} = 174.5$ cm
Sample standard deviation of the heights $s = 6.9$ cm (used in place of $\sigma$, since the sample is large)
For a confidence level of 99%, the corresponding z value is 2.576. This is
determined from the normal distribution table.
Confidence interval C.I. = Mean ± Z (Standard Deviation / √Sample Size) $= \bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$
$\therefore C.I. = 174.5 \pm \left(2.576 \times \frac{6.9}{\sqrt{50}}\right)$
⇒ 𝐶. 𝐼. = 174.5 ± (2.576 × 0.9758)
⇒ 𝐶. 𝐼. = 174.5 ± 2.5136
The lower end of the confidence interval is = 174.5 − 2.5136 = 171.9864
The upper end of the confidence interval is = 174.5 + 2.5136 = 177.0136
Therefore, with 99% confidence interval, the mean height of all college students is
between 171.9864 centimeters and 177.0136 centimeters.
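A confidence interval of this form is easy to compute directly. The following Python sketch uses a hypothetical helper `z_confidence_interval` and the numbers of this problem; it is an illustration rather than a prescribed method.

```python
from math import sqrt

def z_confidence_interval(mean, sd, n, z):
    """Return (lower, upper) = mean -/+ z * sd / sqrt(n)."""
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Heights of 50 students: mean 174.5 cm, S.D. 6.9 cm, z = 2.576 for 99%
lower, upper = z_confidence_interval(174.5, 6.9, 50, z=2.576)
print(round(lower, 4), round(upper, 4))
```

Passing a different z (for example 1.96 or 1.645) gives the limits for other confidence levels.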
7. The mean and SD of the diameters of a sample of 250 rivet heads manufactured by a
company are 7.2642 mm and 0.0058 mm respectively. Find,
a) 99% b) 98% c) 95% d) 90% e) 50%
Confidence limits for the mean diameter of all the rivet heads manufactured by the
company.
Sol.
Given the sample size n=250
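Sample mean diameter $\bar{x} = 7.2642$ mm
Sample standard deviation of the diameter $s = 0.0058$ mm
We know that the confidence limits are $\bar{x} \pm Z\frac{s}{\sqrt{n}}$, where $\frac{0.0058}{\sqrt{250}} \approx 0.00037$:
Level   Confidence limits                  Simplified           Interval
99%     7.2642 ± (2.58 × 0.0058/√250)      7.2642 ± 0.00095     (7.26325, 7.26515)
98%     7.2642 ± (2.33 × 0.0058/√250)      7.2642 ± 0.00086     (7.26334, 7.26506)
95%     7.2642 ± (1.96 × 0.0058/√250)      7.2642 ± 0.00073     (7.26347, 7.26493)
90%     7.2642 ± (1.645 × 0.0058/√250)     7.2642 ± 0.00061     (7.26359, 7.26481)
50%     7.2642 ± (0.6745 × 0.0058/√250)    7.2642 ± 0.00025     (7.26395, 7.26445)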
8. A random sample of size 25 from a normal distribution (𝜎 2 = 4) yields, sample mean 𝑋̅ = 78.3.
Obtain a 99% confidence interval for 𝜇.
Sol.
Given the sample size n=25
Mean of sample 𝑋̄ = 78.3
Standard deviation 𝜎 = 2
We know, Confidence level of 99%, the corresponding z value is 2.58. This is determined
from the normal distribution table.
9. Let the observed value of the mean 𝑿̄of a random sample of size 20 from a normal
distribution with 𝑚𝑒𝑎𝑛 𝜇 and variance 𝜎 2 = 80 be 81.2. Find a 90% and 95% confidence
intervals for 𝜇.
Sol.
Given the sample size n=20
Mean of sample 𝑋̄ = 81.2
Variance 𝜎 2 = 80 ⇒ 𝜎 = √80 = 8.9442
We know that for confidence levels of 95% and 90% the corresponding z values are 1.96 and 1.645. These are
determined from the normal distribution table.
Confidence interval C.I. = Mean ± Z (Standard Deviation / √Sample Size), i.e. $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
For 95%:
$\therefore \mu = 81.2 \pm \left(1.96 \times \frac{8.9442}{\sqrt{20}}\right)$
$\Rightarrow \mu = 81.2 \pm 3.92$
$\Rightarrow C.I. = (81.2 - 3.92,\ 81.2 + 3.92) = (77.28,\ 85.12)$
For 90%:
$\therefore \mu = 81.2 \pm \left(1.645 \times \frac{8.9442}{\sqrt{20}}\right)$
$\Rightarrow \mu = 81.2 \pm 3.29$
$\Rightarrow C.I. = (81.2 - 3.29,\ 81.2 + 3.29) = (77.91,\ 84.49)$
10. Suppose that 10, 12, 16, 19 is a sample taken from a normal population with variance 6.25.
Find at 95% confidence interval for the population mean.
Sol.
Given samples are 10, 12, 16 and 19
Therefore, sample size n=4
Mean 𝑋̄ =14.25
Variance 𝜎 2 = 6.25 ⇒ 𝜎 = √6.25 = 2.5
We know that for a confidence level of 95% the corresponding z value is 1.96. This is determined
from the normal distribution table.
Confidence interval C.I. = Mean ± Z (Standard Deviation / √Sample Size), i.e. $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
$\therefore \mu = 14.25 \pm \left(1.96 \times \frac{2.5}{\sqrt{4}}\right)$
$\Rightarrow \mu = 14.25 \pm 2.45$
$\Rightarrow C.I. = (14.25 - 2.45,\ 14.25 + 2.45) = (11.80,\ 16.70)$
SAMPLING DISTRIBUTIONS
Student's t-distribution:
Let $\mu$ be the mean of the population, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the mean and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ be
the standard deviation of a sample; then the Student's t statistic is defined as
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - \mu}{s}\sqrt{n}$
Another formula for the t-test of two samples is
$t = \frac{\bar{x}_2 - \bar{x}_1}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where $s^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}$ or $s = \sqrt{\frac{1}{n_1 + n_2 - 2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x}_1)^2 + \sum_{i=1}^{n_2}(x_i - \bar{x}_2)^2\right]}$
Chi-square distribution:
Let $O_i\ (i = 1, 2, 3, \ldots, n)$ and $E_i\ (i = 1, 2, 3, \ldots, n)$ be the sets of observed frequencies and expected
frequencies respectively; then the Chi-square statistic is defined as
$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \cdots + \frac{(O_n - E_n)^2}{E_n}$
$\Rightarrow \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
F-Distribution:
The F-distribution is useful in hypothesis testing. Hypothesis testing is used by scientists to
statistically compare data from two or more populations. The F-distribution is needed to determine
whether the F-value for a study indicates any statistically significant differences between two
populations.
11. A certain stimulus administered to each of the 12 patients resulted in the following change
in the blood pressure 5,2,8,-1,3,0,6,-2,1,5,0,4. Can it be concluded that the stimulus will
increase the blood pressure? (Note: t0.05 for 11 d.f. is 2.201).
Sol.
Given the change in blood pressure
𝑥: 5,2,8,-1,3,0,6,-2,1,5,0,4
$\therefore \bar{x} = \frac{1}{n}\sum x = \frac{31}{12} = 2.5833$
Variance, $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$
$\Rightarrow s^2 = \frac{1}{11}\{(5 - 2.58)^2 + (2 - 2.58)^2 + (8 - 2.58)^2 + (-1 - 2.58)^2 + (3 - 2.58)^2 + (0 - 2.58)^2 + (6 - 2.58)^2$
$\qquad + (-2 - 2.58)^2 + (1 - 2.58)^2 + (5 - 2.58)^2 + (0 - 2.58)^2 + (4 - 2.58)^2\}$
$\Rightarrow s^2 = 9.538 \Rightarrow s = 3.088$
Let us suppose that the stimulus administration is not accompanied by an increase in blood
pressure, so we can take $\mu = 0$.
We have,
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$\Rightarrow t = \frac{2.5833 - 0}{3.088/\sqrt{12}}$
$\Rightarrow t = 2.8979 \approx 2.9 > 2.201$
Hence the hypothesis is rejected at 5%level of significance. We conclude with 95%
Confidence that the stimulus in general is accompanied with increase of blood pressure.
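The one-sample t statistic can be verified from the raw data with Python's standard library, where `statistics.stdev` uses the same n − 1 divisor as the formula for s above. The function name `one_sample_t` is an assumption for this sketch.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """t = (xbar - mu) / (s / sqrt(n)), with s the sample S.D. (divisor n - 1)."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))

bp_change = [5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4]
t_bp = one_sample_t(bp_change, mu=0)
print(round(t_bp, 3))   # compare with the table value 2.201 for 11 d.f.
```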
12. A random sample of 10 boys had the following I.Q: 70, 120, 110, 101, 88, 83, 95, 98, 107,
100. Does this data support the assumption of a population mean I.Q. of 100 at 5% level of
Significance? (Note:𝒕𝟎.𝟎𝟓 =2.262 for 9 d.f.).
Sol.
Given the I.Q. of 10 boys
$x$: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100
$\therefore \bar{x} = \frac{1}{n}\sum x = \frac{972}{10} = 97.2$
Variance, $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$
$\Rightarrow s^2 = \frac{1}{9} \times 1833.6$
$\Rightarrow s^2 = 203.73333$
$\Rightarrow s = 14.2735$
Given the mean of the population $\mu = 100$
We have,
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$\Rightarrow t = \frac{97.2 - 100}{14.2735/\sqrt{10}}$
$\Rightarrow t = \frac{-2.8}{4.5136} \approx -0.6203$
Since |t| = 0.6203 < 2.262, the data support the assumption of a population mean I.Q. of 100 at 5% level of significance.
13. Ten individuals are chosen at random from a population and their heights in inches are
found to be 63, 63,66,67,68,69,70,70,71,71 . Test the hypothesis that the mean height of the
universe is 66inches (𝒕𝟎.𝟎𝟓 =2.262 for 9 d.f.).
Sol.
Given the heights of the population in inches
𝑥: 63, 63,66,67,68,69,70,70,71,71
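$\therefore \bar{x} = \frac{1}{n}\sum x = \frac{678}{10} = 67.8$
Variance, $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$
$\Rightarrow s^2 = \frac{1}{9}\{(63 - 67.8)^2 + (63 - 67.8)^2 + (66 - 67.8)^2 + (67 - 67.8)^2 + (68 - 67.8)^2 + (69 - 67.8)^2$
$\qquad + (70 - 67.8)^2 + (70 - 67.8)^2 + (71 - 67.8)^2 + (71 - 67.8)^2\}$
$\Rightarrow s^2 = 9.067 \Rightarrow s = 3.011$
With $\mu = 66$, $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{67.8 - 66}{3.011/\sqrt{10}} = 1.89 < 2.262$, so the hypothesis that the mean height of the universe is 66 inches is accepted.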
14. The nine items of a sample have the following values: 45, 47, 50, 52, 48, 47, 49, 53, 51. Does
the mean of these differ significantly from the assumed mean of 47.5 at 5% significance
level?
Sol.
Given sample values: 45, 47, 50, 52, 48, 47, 49, 53, 51
Therefore, sample size n=9
Population Mean 𝜇 = 47.50
$\therefore$ Sample mean $\bar{x} = \frac{1}{n}\sum x = \frac{442}{9} = 49.11$
15. Two types of batteries are tested for their length of life and the following results are
obtained:
Battery A: 𝒏𝟏 = 𝟏𝟎, 𝒙̄ 𝟏 = 𝟓𝟎𝟎𝒉𝒓𝒔. , 𝝈𝟏 𝟐 = 𝟏𝟎𝟎
Battery B: 𝒏𝟐 = 𝟏𝟎, 𝒙̄ 𝟐 = 𝟓𝟔𝟎𝒉𝒓𝒔. , 𝝈𝟐 𝟐 = 𝟏𝟐𝟏
Compute Student's t and test whether there is a significant difference in the two means.
Sol.
Given
Battery A: 𝑛1 = 10, 𝑥̄ 1 = 500ℎ𝑟𝑠. , 𝜎1 2 = 100
Battery B: 𝑛2 = 10, 𝑥̄ 2 = 560ℎ𝑟𝑠. , 𝜎2 2 = 121
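We know that,
$s^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2 - 2}$
$\Rightarrow s^2 = \frac{(10 \times 100) + (10 \times 121)}{10 + 10 - 2}$
$\Rightarrow s^2 = 122.78$
$\Rightarrow s = 11.0805$
We have,
$t = \frac{\bar{x}_2 - \bar{x}_1}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
$\Rightarrow t = \frac{560 - 500}{11.0805\sqrt{0.1 + 0.1}}$
$\Rightarrow t = 12.1081 \approx 12.11$
The value of t is greater than the table value of t for 18 d.f. at all levels of significance.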
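For two samples summarised by size, mean and variance, the pooled t statistic above can be sketched in Python as follows. The name `pooled_t` is hypothetical, and the pooling formula is the one used in these notes, $s^2 = (n_1\sigma_1^2 + n_2\sigma_2^2)/(n_1 + n_2 - 2)$.

```python
from math import sqrt

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Two-sample t using s^2 = (n1*var1 + n2*var2) / (n1 + n2 - 2)."""
    s2 = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return (mean2 - mean1) / sqrt(s2 * (1 / n1 + 1 / n2))

# Battery A: n=10, mean 500 hrs, variance 100; Battery B: n=10, mean 560 hrs, variance 121
t_battery = pooled_t(10, 500, 100, 10, 560, 121)
print(round(t_battery, 2))   # degrees of freedom: 10 + 10 - 2 = 18
```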
16. A group of boys and girls were given an intelligence test. The mean score , SD score and
numbers in each group are as follows.
Boys Girls
Mean 74 70
SD 8 10
n 12 10
17. Two horses A and B were tested according to the time (In Seconds) to run a particular
race with the following results:
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29 -
Test whether you can discriminate between the two horses.
Sol.
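Let the variables x and y respectively correspond to Horse A and B
$x$: 28, 30, 32, 33, 33, 29, 34
$y$: 29, 30, 30, 24, 27, 29
$\therefore \bar{x} = \frac{1}{n_1}\sum_i x_i = \frac{219}{7} \approx 31.3$, $\quad \bar{y} = \frac{1}{n_2}\sum_i y_i = \frac{169}{6} \approx 28.2$
$\sum (x - \bar{x})^2 = (28 - 31.3)^2 + (30 - 31.3)^2 + (32 - 31.3)^2 + (33 - 31.3)^2 + (33 - 31.3)^2 + (29 - 31.3)^2 + (34 - 31.3)^2 = 31.4$
$\sum (y - \bar{y})^2 = (29 - 28.2)^2 + (30 - 28.2)^2 + (30 - 28.2)^2 + (24 - 28.2)^2 + (27 - 28.2)^2 + (29 - 28.2)^2 = 26.84$
$\therefore s^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right]$
$\Rightarrow s^2 = \frac{31.4 + 26.84}{7 + 6 - 2} = 5.2945$
$\Rightarrow s = 2.3010$
We have,
$t = \frac{|\bar{x} - \bar{y}|}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \Rightarrow t = \frac{31.3 - 28.2}{2.3010\sqrt{\frac{1}{7} + \frac{1}{6}}} \Rightarrow t = 2.42$
Here $t = 2.42 > t_{0.05} = 2.2$ but $t = 2.42 < t_{0.02} = 2.72$, so the two horses can be discriminated at the 5% level but not at the 2% level.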
18. Four coins are tossed 100 times and the following results were obtained:
No. of Heads 0 1 2 3 4
Frequency 5 29 36 25 5
Fit a binomial distribution for the data and test the goodness of fit ($\chi^2_{0.05} = 9.49$ for 4 d.f.).
Sol.
Given the 4 coins are tossed 100 times
The probability of getting head is p=0.5, q=0.5
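The probability mass function of a binomial distribution is
$P(X = x) = {}^4C_x (0.5)^x (0.5)^{4-x}$
$P(0) = {}^4C_0 (0.5)^0 (0.5)^{4} = 0.0625$
$P(1) = {}^4C_1 (0.5)^1 (0.5)^{3} = 0.25$
$P(2) = {}^4C_2 (0.5)^2 (0.5)^{2} = 0.375$
$P(3) = {}^4C_3 (0.5)^3 (0.5)^{1} = 0.25$
$P(4) = {}^4C_4 (0.5)^4 (0.5)^{0} = 0.0625$
The expected frequencies are $E_i = 100 \times P(i)$:
$O_i$   5      29   36     25   5
$E_i$   6.25   25   37.5   25   6.25
$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$\Rightarrow \chi^2 = \frac{1.5625}{6.25} + \frac{16}{25} + \frac{2.25}{37.5} + 0 + \frac{1.5625}{6.25}$
$\Rightarrow \chi^2 = 0.25 + 0.64 + 0.06 + 0.25$
$\Rightarrow \chi^2 = 1.2 < \chi^2_{0.05} = 9.49$
Hence the binomial fit is good.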
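The goodness-of-fit calculation for the four-coin data can be reproduced with a few lines of Python. The helper names `binomial_expected` and `chi_square` are assumptions made for this sketch.

```python
from math import comb

def binomial_expected(n_trials, p, total):
    """Expected frequencies total * C(n, k) p^k (1-p)^(n-k) for k = 0..n."""
    return [total * comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
            for k in range(n_trials + 1)]

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [5, 29, 36, 25, 5]               # heads = 0, 1, 2, 3, 4
expected = binomial_expected(4, 0.5, 100)   # 4 coins tossed 100 times
chi2_coins = chi_square(observed, expected)
print(expected, round(chi2_coins, 2))
```

The `chi_square` function works unchanged for the die problems, with every expected frequency set to the total number of throws divided by 6.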
19. A dice thrown 264 times and the number appearing on the face (𝒙) follows the following
frequency (𝒇) distribution.
𝒙 1 2 3 4 5 6
𝒇 40 32 28 58 54 60
Calculate the value of 𝝌𝟐 .
Sol.
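The frequencies in the given data are the observed frequencies. Assuming that the dice is
unbiased, the expected frequency for each of the numbers 1, 2, 3, 4, 5, 6 to appear on the
face is $\frac{264}{6} = 44$.
Now the data is as follows:
$x$     1    2    3    4    5    6
$O_i$   40   32   28   58   54   60
$E_i$   44   44   44   44   44   44
$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$\Rightarrow \chi^2 = \frac{(40-44)^2}{44} + \frac{(32-44)^2}{44} + \frac{(28-44)^2}{44} + \frac{(58-44)^2}{44} + \frac{(54-44)^2}{44} + \frac{(60-44)^2}{44}$
$\Rightarrow \chi^2 = \frac{1}{44}[16 + 144 + 256 + 196 + 100 + 256] = \frac{968}{44} = 22$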
20. A die was thrown 60 times and the following frequency distribution was observed:
Faces 1 2 3 4 5 6
Frequency 15 6 4 7 11 17
Test whether the die is unbiased at 5% significance level.
Sol.
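The frequencies in the given data are the observed frequencies. Assuming that the die is
unbiased, the expected frequency for each of the numbers 1, 2, 3, 4, 5, 6 to appear on the
face is $\frac{60}{6} = 10$.
Now the data is as follows:
$x$     1    2    3    4    5    6
$O_i$   15   6    4    7    11   17
$E_i$   10   10   10   10   10   10
$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$\Rightarrow \chi^2 = \frac{(15-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(4-10)^2}{10} + \frac{(7-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(17-10)^2}{10}$
$\Rightarrow \chi^2 = \frac{1}{10}[25 + 16 + 36 + 9 + 1 + 49] = \frac{136}{10}$
$\Rightarrow \chi^2 = 13.6$
The table value of $\chi^2$ for 5 degrees of freedom at 5% level of significance is 11.07. Since 13.6 > 11.07, the hypothesis that the die is unbiased is rejected.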
21. A survey of 320 families with 5 children each revealed the following distribution.
No. of boys 5 4 3 2 1 0
No. of girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12
Is the result consistent with the hypothesis that male and female births are equally
probable at 5% level of significance?
Sol.
Given,
Number of families selected for the survey = 320
The probability of female and male birth is equal, $p = \frac{1}{2} = 0.5 \Rightarrow q = 1 - p = 1 - 0.5 = 0.5$
Number of children in the selected families, n = 5
Under $H_0$ the expected number of families with $x$ boys is $320 \times {}^5C_x (0.5)^5$, i.e. 10, 50, 100, 100, 50, 10 for $x$ = 5, 4, 3, 2, 1, 0.
The table value of $\chi^2$ for 5 degrees of freedom at 5% level of significance from the
chi-square table is 11.07.
$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \Rightarrow \chi^2 = 7.16 < 11.07$
Since the calculated $\chi^2$ value is less than the tabulated $\chi^2$ value, we fail to reject
$H_0$ (accept $H_0$); that is, male and female births are equally probable.
22. The theory predicts the proportion of beans in the four groups A, B, C and D should be
9:3:3:1. In an experiment among 1600 beans, the number in four groups were 882, 313, 287
and 118. The chi square value is approximately equal to.
Sol.
Given,
The total number of beans: 882+313+287+118=1600
Sum of the ratios: 9+3+3+1=16
$E(A) = 1600 \times \frac{9}{16} = 900$
$E(B) = 1600 \times \frac{3}{16} = 300$
$E(C) = 1600 \times \frac{3}{16} = 300$
$E(D) = 1600 \times \frac{1}{16} = 100$
$O_i$   $E_i$   $O_i - E_i$   $(O_i - E_i)^2$   $(O_i - E_i)^2 / E_i$
882     900     -18           324               0.36
313     300     13            169               0.5633
287     300     -13           169               0.5633
118     100     18            324               3.24
$\therefore \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 0.36 + 0.5633 + 0.5633 + 3.24$
$\Rightarrow \chi^2 = 4.7266 \approx 4.73$
23. Two random samples drawn from two normal populations are:
Sample-I 20 16 26 27 22 23 18 24 19 25 - -
Sample-II 27 33 42 35 32 34 38 28 41 43 30 37
Obtain the estimates of the variance of the population and test 5% level of significance
whether the two populations have the same variance.
Sol.
Set Null Hypothesis:𝐻0 : 𝜎1 2 = 𝜎2 2
i.e., The two samples are drawn from two populations having the same variance.
Alternate Hypothesis: 𝐻1 : 𝜎1 2 ≠ 𝜎2 2
Given,
Sample-I    20  16  26  27  22  23  18  24  19  25  -   -
Sample-II   27  33  42  35  32  34  38  28  41  43  30  37
$\bar{x}_1 = \frac{\sum_{i=1}^{n_1} x_i}{n_1} = \frac{20+16+26+27+22+23+18+24+19+25}{10} = \frac{220}{10} \Rightarrow \bar{x}_1 = 22$
$\bar{x}_2 = \frac{\sum_{i=1}^{n_2} x_i}{n_2} = \frac{27+33+42+35+32+34+38+28+41+43+30+37}{12} = \frac{420}{12} \Rightarrow \bar{x}_2 = 35$
24. The table shows the standard Deviation and Sample Standard Deviation for both men and
women. Find the f statistic considering the Men population in numerator.
Population Population Standard Sample Standard
Deviation Deviation
Men 30 35
Women 50 45
Sol.
Given,
𝜎1 =Standard deviation of population-1=30
𝜎2 =Standard deviation of population-2=50
𝑠1 =Standard deviation of sample-1=35
𝑠2 =Standard deviation of sample-2=45
We know that,
$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \Rightarrow F = \frac{35^2/30^2}{45^2/50^2} \Rightarrow F = \frac{1225/900}{2025/2500} \Rightarrow F = \frac{1.3610}{0.81} \Rightarrow F = 1.68$
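The F ratio in this problem is a ratio of two scaled variances, which a small Python function makes explicit. The name `f_statistic` is an assumption for this illustration.

```python
def f_statistic(s1, sigma1, s2, sigma2):
    """F = (s1^2 / sigma1^2) / (s2^2 / sigma2^2), population 1 in the numerator."""
    return (s1 ** 2 / sigma1 ** 2) / (s2 ** 2 / sigma2 ** 2)

# Men: sigma = 30, s = 35; Women: sigma = 50, s = 45
f_men_women = f_statistic(35, 30, 45, 50)
print(round(f_men_women, 2))
```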
***
Prepared by:
Purushotham P
Assistant Professor
SJC Institute of Technology
Email id: [email protected]
Experimental unit:
For conducting an experiment, the experimental material is divided into smaller parts and
each part is referred to as an experimental unit. The experimental unit is the unit that is
randomly assigned to a treatment. The phrase “randomly assigned” is very important in this
definition.
Experiment:
A way of getting an answer to a question which the experimenter wants to know.
Treatment
Different objects or procedures which are to be compared in an experiment are called
treatments.
Sampling unit:
The object that is measured in an experiment is called the sampling unit. This may be different
from the experimental unit.
Factor:
A factor is a variable defining a categorization. A factor can be fixed or random in nature. A
factor is termed as a fixed factor if all the levels of interest are included in the experiment. A
factor is termed as a random factor if all the levels of interest are not included in the experiment
and those that are can be considered to be randomly chosen from all the levels of interest.
Replication:
It is the repetition of the experimental situation by replicating the experimental unit.
ANOVA:
Analysis of variance (ANOVA) is an analysis tool used in statistics that splits the observed
aggregate variability found inside a data set into two parts: systematic factors and random factors.
The systematic factors have a statistical influence on the given data set, while the random factors do
not.
ANOVA is used to analyze the differences between the means of two or more independent groups or
treatments, and in particular to determine whether those differences are statistically significant.
There are two main types of ANOVA: one-way (or unidirectional) and two-way. There are also
variations of ANOVA.
• In social sciences, ANOVA tests can be used to study the statistical significance of various study
environments on test scores.
• In medical research, the ANOVA test can be used to compare the effect of various types or brands of
medication on individuals with migraines or depression.
• The ANOVA test can also be used to compare different suppliers and select the best available.
CRD: A completely randomized design (CRD) is one where the treatments are assigned completely
at random so that each experimental unit has the same chance of receiving any one treatment.
RBD: A randomized block design is a restricted randomized design, in which experimental units are
first organized into homogeneous blocks and then the treatments are assigned at random to these units
LSD: The Latin Square Design gets its name from the fact that we can write it as a square with Latin
letters to correspond to the treatments. The treatment factor levels are the Latin letters in the Latin
square design. The number of rows and columns has to correspond to the number of treatment levels.
PROBLEMS:
1. Three processes A, B and C are tested to see whether their outputs are equivalent. The
following observations of outputs are made:
A 10 12 13 11 10 14 15 13
B 9 11 10 12 13 - - -
C 11 10 15 14 12 13 - -
Carry out the analysis of variance and state your conclusion.
Sol. To carry out the analysis of variance, we form the following tables
Total Squares
A 10 12 13 11 10 14 15 13 T1=98 T21=9604
B 9 11 10 12 13 T2=55 T22=3025
C 11 10 15 14 12 13 T3=75 T23=5625
Total T 228 -
Sum of Squares
⇒ 𝑆𝑆𝐸 = 58 − 7
⇒ 𝑆𝑆𝐸 = 51
Total Squares
S1 9 7 6 5 8 T1=35 T21=1225
S2 7 4 5 4 5 T2=25 T22=625
S3 6 5 6 7 6 T3=30 T23=900
Total T= 90 -
Sum of Squares
S1 81 49 36 25 64 255
S2 49 16 25 16 25 131
S3 36 25 36 49 36 182
3. Three different kinds of food are tested on three groups of rats for 5 weeks. The
objective is to check the difference in mean weight (in grams) of the rats per week.
Apply one-way ANOVA using a 0.05 significance level to the following data:
Food 1 8 12 19 8 6 11
Food 2 4 5 4 6 9 7
Food 3 11 8 7 13 7 9
Sol. To carry out the analysis of variance, we form the following tables
      Observations               Total      Squares
F1    8   12   19   8    6   11  T1 = 64    T1² = 4096
F2    4   5    4    6    9   7   T2 = 35    T2² = 1225
F3    11  8    7    13   7   9   T3 = 55    T3² = 3025
Total                            T = 154    -
Sum of Squares
F2 16 25 16 36 81 49 223
Total 18-1=17 - -
Since the evaluated value 3.55 < 3.68 = F(2,15) at 5% level of significance,
the null hypothesis is accepted: there is no significant difference between the three kinds of food.
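The one-way ANOVA steps used in these problems (correction factor, total sum of squares, between-treatment sum of squares, error sum of squares and F ratio) can be sketched in Python as below. The function name `one_way_anova` and the dictionary keys are assumptions for this illustration; the data are the three food groups of this problem.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares and F ratio via the correction-factor method."""
    N = sum(len(g) for g in groups)                 # total number of observations
    T = sum(sum(g) for g in groups)                 # grand total
    cf = T ** 2 / N                                 # correction factor T^2 / N
    tss = sum(x * x for g in groups for x in g) - cf
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # between treatments
    sse = tss - sst                                 # within treatments (error)
    k = len(groups)
    f_ratio = (sst / (k - 1)) / (sse / (N - k))
    return {"CF": cf, "TSS": tss, "SST": sst, "SSE": sse, "F": f_ratio}

foods = [[8, 12, 19, 8, 6, 11],
         [4, 5, 4, 6, 9, 7],
         [11, 8, 7, 13, 7, 9]]
result = one_way_anova(foods)
print({name: round(value, 2) for name, value in result.items()})
```

The F value is compared with the table value F(k − 1, N − k), here F(2, 15) = 3.68 at the 5% level. Groups of unequal size are handled as well, since each group total is divided by its own size.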
4. Three types of fertilizers are used on three groups of plants for 5 weeks. We want
to check if there is a difference in the mean growth of each group. Using the data
given below apply a one-way ANOVA test at 0.05 significant level
Fertilizer 1 6 8 4 5 3 4
Fertilizer 2 8 12 9 11 6 8
Fertilizer 3 13 9 11 8 7 12
Sol.
To carry out the analysis of variance, we form the following tables
Total Squares
6 8 4 5 3 4
F1 T1=30 T21=900
8 12 9 11 6 8
F2 T2=54 T22=2916
13 9 11 8 7 12
F3 T3=60 T23=3600
Total T
144 -
Sum of Squares
F1 36 64 16 25 9 16 166
Total Squares
6 7 3 8
A T1=24 T21=576
5 5 3 7
B T2=20 T22=400
5 4 3 4
C T3=16 T23=256
Total T
60 -
Total Squares
36 49 9 64
A 158
25 25 9 49
B 108
25 16 9 16
C 66
2
Grand Total - ∑𝑖 ∑𝑗 𝑥𝑖𝑗
332
Set the null hypotheses 𝐻0 : 𝜇1 = 𝜇2= 𝜇3
𝑇2 (60)2 3600
Correction Factor 𝐶𝐹 = = = = 300
𝑁 12 12
⇒ 𝑆𝑆𝐸 = 32 − 8
⇒ 𝑆𝑆𝐸 = 24
6. A trial was run to check the effects of different diets. Positive numbers indicate weight loss
and negative numbers indicate weight gain. Check if there is an average difference in the
weight of people following different diets using an ANOVA Table.
Low Fat Low Low protein Low
Calorie carbohydrate
8 2 3 2
9 4 5 2
6 3 4 -1
7 5 2 0
3 1 3 3
Sol.
To carry out the analysis of variance, we form the following tables
7. The following data show the number of worms quarantined from the GI areas offour groups
of muskrats in a carbon tetrachloride anthelmintic study. Conduct a
two-way ANOVA test.
I II III IV
33 41 12 38
32 38 35 43
26 40 46 25
14 23 22 13
30 21 11 26
I II III IV
3 11 -18 8
2 8 5 13
-4 10 16 -5
-16 -7 -8 -17
0 -9 -19 -4
T -15 13 -24 -5 -31
T2 225 169 576 25
⇒ 𝑇𝑆𝑆 = 2293 − 48
⇒ 𝑇𝑆𝑆 = 2245
𝑇𝑖 2
Sum of the squares of between the treatments 𝑆𝑆𝑇 = ∑𝑖 − 𝐶𝐹
𝑛𝑖
1. Set up an analysis of variance table for the following per acre production data for
three varieties of wheat, each grown on 4 plots and state it the variety differences
are significant at 5% significant level.
Per acre production data
Plot of land Variety of wheat
A B C
1 6 5 5
2 7 5 4
3 3 3 3
4 8 7 4
Sol.
Variety
A B C
36 25 25
49 25 16
9 9 9
64 49 16 Grand Total -
∑𝑖 ∑𝑗 𝑥𝑖𝑗 2 =332
Set the null hypotheses 𝐻0 : 𝜇1 = 𝜇2= 𝜇3 , N=12
𝑇2 (60)2 3600
Correction Factor 𝐶𝐹 = = = = 300
𝑁 12 12
𝑇𝑖 2
Sum of the row squares 𝑆𝑆𝑅 = ∑𝑖 − 𝐶𝐹
𝑛𝑖
Therefore SSE=TSS-SSR-SSC
SSE=32-18-8=6
2. Three varieties of coal were analysed by four chemists and the ash-content in the varieties
was found to be as under.
Varieties Chemists
1 2 3 4
A 8 5 5 7
B 7 6 4 4
C 3 6 5 4
Carry out the analysis of variance.
Chemists T T2
Variety 1 2 3 4
A 8 5 5 7 25 625
B 7 6 4 4 21 441
C 3 6 5 4 18 324
P 18 17 14 15 =64 -
2
P 324 289 196 225
The squares are as follows
Chemists
1 2 3 4
64 25 25 49
49 36 16 16
9 36 25 16 Grand Total - ∑𝑖 ∑𝑗 𝑥𝑖𝑗 2 =366
Set the null hypotheses 𝐻0 : 𝜇1 = 𝜇2= 𝜇3 , N=12
𝑇2 (64)2 4096
Correction Factor 𝐶𝐹 = = = = 341.33
𝑁 12 12
Therefore SSE=TSS-SSR-SSC
SSE=24.67-6.17-3.33=15.17
3. Perform ANOVA and test at 0.05 level of significant whether these are differences in the
detergent or in the engines for the following data:
Detergent Engine
I II III
A 45 43 51
B 47 46 52
C 48 50 55
D 42 37 49
Sol.
Given the data
Engine
Detergent
I II III
A 45 43 51
B 47 46 52
C 48 50 55
D 42 37 49
Subtract 45 from all the observations, we get
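Detergent   Engine              T        T²
            I     II    III
A           0     -2    6       4        16
B           2     1     7       10       100
C           3     5     10      18       324
D           -3    -8    4       -7       49
P           2     -4    27      T = 25   -
P²          4     16    729     -        -
The squares are
Detergent   Engine              Sum
            I     II    III
A           0     4     36      40
B           4     1     49      54
C           9     25    100     134
D           9     64    16      89
Grand Total $\sum_i \sum_j x_{ij}^2 = 317$
Correction Factor $CF = \frac{T^2}{N} = \frac{(25)^2}{12} = 52.08$
$TSS = \sum_i \sum_j x_{ij}^2 - CF = 317 - 52.08 = 264.92$
Sum of the row squares $SSR = \sum_i \frac{T_i^2}{n_i} - CF$
$SSR = \frac{16}{3} + \frac{100}{3} + \frac{324}{3} + \frac{49}{3} - 52.08$
$\Rightarrow SSR = 5.33 + 33.33 + 108 + 16.33 - 52.08$
$\Rightarrow SSR = 163 - 52.08$
$\Rightarrow SSR = 110.92$
Sum of the column squares $SSC = \sum_i \frac{P_i^2}{n_i} - CF$
$SSC = \frac{4}{4} + \frac{16}{4} + \frac{729}{4} - 52.08$
$\Rightarrow SSC = 1 + 4 + 182.25 - 52.08$
$\Rightarrow SSC = 187.25 - 52.08$
$\Rightarrow SSC = 135.17$
Therefore SSE = TSS - SSR - SSC
SSE = 264.92 - 110.92 - 135.17 = 18.83
Hence the null hypothesis is rejected: there are significant differences between the detergents and
between the engines.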
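The two-way (randomized block) calculation above follows a fixed recipe, sketched here in Python. The function name `two_way_anova` and its return keys are assumptions for this illustration. The sketch works on the original observations, since subtracting a constant such as 45 from every value leaves all sums of squares unchanged.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication: rows and columns are the two factors."""
    r, c = len(table), len(table[0])
    N = r * c
    T = sum(sum(row) for row in table)
    cf = T ** 2 / N                                          # correction factor
    tss = sum(x * x for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf       # between rows
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf # between columns
    sse = tss - ssr - ssc
    mse = sse / ((r - 1) * (c - 1))
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SSE": sse,
            "F_rows": (ssr / (r - 1)) / mse, "F_cols": (ssc / (c - 1)) / mse}

# Rows: detergents A-D, columns: engines I-III
detergent = [[45, 43, 51],
             [47, 46, 52],
             [48, 50, 55],
             [42, 37, 49]]
anova2 = two_way_anova(detergent)
print({name: round(value, 2) for name, value in anova2.items()})
```

`F_rows` is compared with F(r − 1, (r − 1)(c − 1)) and `F_cols` with F(c − 1, (r − 1)(c − 1)) from the F table.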
C B A D
25 23 20 20
A D C B
19 19 21 18
B A D C
19 14 17 20
D C B A
17 20 21 15
Sol.
C B A D
25 23 20 20
A D C B
19 19 21 18
B A D C
19 14 17 20
D C B A
17 20 21 15
Null hypothesis $H_0$: There is no significant difference between rows, columns and treatments.
Subtracting 20 from each observation, we get
                                  T          T²
C  5     B  3     A  0     D  0   8          64
A  -1    D  -1    C  1     B  -2  -3         9
B  -1    A  -6    D  -3    C  0   -10        100
D  -3    C  0     B  1     A  -5  -7         49
P  0     -4       -1       -7     T = -12    -
P² 0     16       1        49     -          -
The squares are
C  25    B  9     A  0     D  0
A  1     D  1     C  1     B  4
B  1     A  36    D  9     C  0
D  9     C  0     B  1     A  25
   36       46       11       29   $\sum_i \sum_j x_{ij}^2 = 122$
Correction Factor $CF = \frac{T^2}{N} = \frac{(-12)^2}{16} = \frac{144}{16} = 9$
$TSS = \sum_i \sum_j x_{ij}^2 - CF$
$\Rightarrow TSS = 122 - 9$
$\Rightarrow TSS = 113$
Sum of the row squares $SSR = \sum_i \frac{T_i^2}{n_i} - CF$
$SSR = \frac{64}{4} + \frac{9}{4} + \frac{100}{4} + \frac{49}{4} - 9$
$\Rightarrow SSR = 16 + 2.25 + 25 + 12.25 - 9$
$\Rightarrow SSR = 55.5 - 9$
$\Rightarrow SSR = 46.5$
Sum of the column squares $SSC = \sum_i \frac{P_i^2}{n_i} - CF$
$SSC = 0 + \frac{16}{4} + \frac{1}{4} + \frac{49}{4} - 9$
$\Rightarrow SSC = 4 + 0.25 + 12.25 - 9$
$\Rightarrow SSC = 16.5 - 9$
$\Rightarrow SSC = 7.5$
Treatment   Observations          Q = ∑(Observations)   Q²
A           0    -1   -6   -5     -12                   144
B           3    -2   -1   1      1                     1
C           5    1    0    0      6                     36
D           0    -1   -3   -3     -7                    49
Sum of the squares of treatments $SST = \sum_i \frac{Q_i^2}{n_i} - CF$
$SST = \frac{144}{4} + \frac{1}{4} + \frac{36}{4} + \frac{49}{4} - 9$
$\Rightarrow SST = 36 + 0.25 + 9 + 12.25 - 9$
$\Rightarrow SST = 57.50 - 9$
$\Rightarrow SST = 48.50$
$\therefore SSE = TSS - SSR - SSC - SST \Rightarrow SSE = 113 - 46.5 - 7.5 - 48.50 = 10.5$
We know that $F(3,6) = 4.76$
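The Latin square sums of squares can be checked with the following Python sketch. The function name `latin_square_ss` is hypothetical; the layout strings give the treatment letter of each plot, and the original yields are used because subtracting 20 from every value does not change any sum of squares.

```python
def latin_square_ss(values, letters):
    """Sums of squares for an m x m Latin square (rows, columns, treatments, error)."""
    m = len(values)
    N = m * m
    T = sum(sum(row) for row in values)
    cf = T ** 2 / N                                            # correction factor
    tss = sum(x * x for row in values for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in values) / m - cf        # rows
    ssc = sum(sum(col) ** 2 for col in zip(*values)) / m - cf  # columns
    totals = {}                                                # treatment totals Q
    for value_row, letter_row in zip(values, letters):
        for value, letter in zip(value_row, letter_row):
            totals[letter] = totals.get(letter, 0) + value
    sst = sum(q * q for q in totals.values()) / m - cf         # treatments
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SST": sst,
            "SSE": tss - ssr - ssc - sst}

yields = [[25, 23, 20, 20],
          [19, 19, 21, 18],
          [19, 14, 17, 20],
          [17, 20, 21, 15]]
layout = ["CBAD", "ADCB", "BADC", "DCBA"]
ls = latin_square_ss(yields, layout)
print(ls)
```

Each mean square is the sum of squares divided by its degrees of freedom (m − 1 for rows, columns and treatments, (m − 1)(m − 2) for error), and each F ratio is compared with F(3, 6) for a 4 × 4 square.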
5. Five varieties of paddy A, B, C, D, and E are tried. The plan, the varieties shown in each plot
and yields obtained in Kg are given in the following table (LSD)
B E C A D
95 85 139 117 97
E D B C A
90 89 75 146 87
C A D B E
116 95 92 89 74
A C E D B
85 130 90 81 77
D B A E C
87 65 99 89 93
Test whether there is a significant difference between rows and columns at 5% LOS.
T T2
B E C A D
-5 -15 39 17 -3 33 1089
E D B C A -
-10 -11 -25 46 -13 13 169
C A D B E
-
16 -5 -8 -11 -26
34 1156
A C E D B -
-15 30 -10 -19 -23 37 1369
D B A E C -
-13 -35 -1 -11 -7 67 4489
P -27 -36 -5 22 -72 = - 118
2
P 729 1296 25 484 5184 - -
B E C A D
25 225 1521 289 9
E D B C A
100 121 625 2116 169
C A D B E
256 25 64 121 676
A C E D B
225 900 100 361 529
D B A E C
169 1225 1 121 49
∑𝑖 ∑𝑗 𝑥𝑖𝑗 2 =10022
775 2496 2311 3008 1432
𝑇2 (−118)2 13924
Correction Factor 𝐶𝐹 = = = = 557
𝑁 25 25
𝑃𝑖 2
Sum of the column squares 𝑆𝑆𝐶 = ∑𝑖 − 𝐶𝐹
𝑛𝑖
Treatment   Observations                    Q = ∑(Observations)    Q²
A            17   -13    -5   -15    -1     -17                      289
B            -5   -25   -11   -23   -35     -99                     9801
C            39    46    16    30    -7     124                    15376
D            -3   -11    -8   -19   -13     -54                     2916
E           -15   -10   -26   -10   -11     -72                     5184
Sum of the squares of treatments $SST = \sum_i \frac{Q_i^2}{n_i} - CF$
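The notes state the formulas for this problem but do not carry the arithmetic through. A minimal sketch of the remaining calculation is given below, assuming the coded values (observation minus 100) and the plot layout from the table of the problem; the names `coded` and `layout` are illustrative only. It uses the exact correction factor 556.96 rather than the rounded 557.

```python
# Sketch: sums of squares for the 5x5 paddy Latin square from the coded data.
coded = [
    [-5, -15, 39, 17, -3],
    [-10, -11, -25, 46, -13],
    [16, -5, -8, -11, -26],
    [-15, 30, -10, -19, -23],
    [-13, -35, -1, -11, -7],
]
# Variety grown in each plot, row by row (same positions as `coded`).
layout = ["BECAD", "EDBCA", "CADBE", "ACEDB", "DBAEC"]

n = len(coded)                                   # 5 rows, columns, varieties
N = n * n                                        # 25 observations

row_totals = [sum(row) for row in coded]         # T for each row
col_totals = [sum(col) for col in zip(*coded)]   # P for each column
trt_totals = {}                                  # Q for each variety
for letters, row in zip(layout, coded):
    for letter, x in zip(letters, row):
        trt_totals[letter] = trt_totals.get(letter, 0) + x

T = sum(row_totals)                              # grand total
cf = T ** 2 / N                                  # correction factor

tss = sum(x * x for row in coded for x in row) - cf
ssr = sum(t * t for t in row_totals) / n - cf
ssc = sum(p * p for p in col_totals) / n - cf
sst = sum(q * q for q in trt_totals.values()) / n - cf
sse = tss - ssr - ssc - sst

for name, value in [("TSS", tss), ("SSR", ssr), ("SSC", ssc),
                    ("SST", sst), ("SSE", sse)]:
    print(f"{name} = {value:.2f}")
```

With the exact correction factor this gives SSR = 1097.44, SSC = 986.64, SST = 6156.24 and SSE = 1224.72; using the rounded value 557 shifts each figure slightly.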
6. Present your conclusions after doing analysis of variance to the following results of
the Latin-square design experiment conducted in respect of five fertilizers which
A B C D E
16 10 11 9 9
E C A B D
10 9 14 12 11
B D E C A
15 8 8 10 18
D E B A C
12 6 13 13 12
C A D E B
13 11 10 7 14
Sol. Given observations are
A B C D E
16 10 11 9 9
E C A B D
10 9 14 12 11
B D E C A
15 8 8 10 18
D E B A C
12 6 13 13 12
C A D E B
13 11 10 7 14
Subtracting 10 from each observation, the coded values with row totals T and column totals P are
                                              T      T²
          A      B      C      D      E
          6      0      1     -1     -1       5      25
          E      C      A      B      D
          0     -1      4      2      1       6      36
          B      D      E      C      A
          5     -2     -2      0      8       9      81
          D      E      B      A      C
          2     -4      3      3      2       6      36
          C      A      D      E      B
          3      1      0     -3      4       5      25
P        16     -6      6      1     14    = 31
P²      256     36     36      1    196
The squares are as follows:
A B C D E
36 0 1 1 1
E C A B D
0 1 16 4 1
B D E C A
25 4 4 0 64
D E B A C
4 16 9 9 4
C A D E B
9 1 0 9 16
Column sums: 74 22 30 23 86, so $\sum_i \sum_j x_{ij}^2 = 235$
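Correction Factor $CF = \frac{T^2}{N} = \frac{(31)^2}{25} = \frac{961}{25} = 38.44$
⇒ 𝑇𝑆𝑆 = 235 − 38.44 = 196.56
Sum of the row squares $SSR = \sum_i \frac{T_i^2}{n_i} - CF$
$SSR = \frac{25}{5} + \frac{36}{5} + \frac{81}{5} + \frac{36}{5} + \frac{25}{5} - 38.44$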
⇒ 𝑆𝑆𝑅 = 5 + 7.2 + 16.2 + 7.2 + 5 − 38.44
⇒ 𝑆𝑆𝑅 = 40.60 − 38.44
⇒ 𝑆𝑆𝑅 = 2.16
Sum of the column squares $SSC = \sum_i \frac{P_i^2}{n_i} - CF$
$SSC = \frac{256}{5} + \frac{36}{5} + \frac{36}{5} + \frac{1}{5} + \frac{196}{5} - 38.44$
⇒ 𝑆𝑆𝐶 = 51.2 + 7.2 + 7.2 + 0.2 + 39.2 − 38.44
⇒ 𝑆𝑆𝐶 = 105 − 38.44 ⇒ 𝑆𝑆𝐶 = 66.56
To find the sum of the treatments
Treatment   Observations               Q = ∑(Observations)    Q²
A             6    4    8    3    1     22                    484
B             0    2    5    3    4     14                    196
C             1   -1    0    2    3      5                     25
D            -1    1   -2    2    0      0                      0
E            -1    0   -2   -4   -3    -10                    100
Sum of the squares of treatments $SST = \sum_i \frac{Q_i^2}{n_i} - CF$
$SST = \frac{484}{5} + \frac{196}{5} + \frac{25}{5} + \frac{0}{5} + \frac{100}{5} - 38.44$
⇒ 𝑆𝑆𝑇 = 96.8 + 39.2 + 5 + 0 + 20 − 38.44
⇒ 𝑆𝑆𝑇 = 161 − 38.44
⇒ 𝑆𝑆𝑇 = 122.56
∴ 𝑆𝑆𝐸 = 𝑇𝑆𝑆 − 𝑆𝑆𝑅 − 𝑆𝑆𝐶 − 𝑆𝑆𝑇
⇒ 𝑆𝑆𝐸 = 196.56 − 2.16 − 66.56 − 122.56
⇒ 𝑆𝑆𝐸 = 5.28
Sources of variation   d.f.     SS          MSS                            F Ratio                              Conclusion
Rows                   5-1=4    SSR=2.16    $MSR = \frac{2.16}{4} = 0.54$   $F_r = \frac{0.54}{0.44} = 1.227$    $F_r < F(4,12)$, $H_0$ accepted
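The solution above works with coded values (each observation minus 10) rather than the raw yields. The short check below illustrates why this is legitimate: shifting every observation by a constant leaves the sums of squares unchanged. It is a sketch only; the helper name `sums_of_squares` is introduced here for illustration and covers just TSS, SSR and SSC.

```python
# Sketch: the sums of squares are the same for raw and coded observations.
raw = [
    [16, 10, 11, 9, 9],
    [10, 9, 14, 12, 11],
    [15, 8, 8, 10, 18],
    [12, 6, 13, 13, 12],
    [13, 11, 10, 7, 14],
]


def sums_of_squares(grid):
    """Return (TSS, SSR, SSC) for a square grid of observations."""
    n = len(grid)
    total = sum(sum(row) for row in grid)
    cf = total ** 2 / (n * n)                    # correction factor T^2 / N
    tss = sum(x * x for row in grid for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in grid) / n - cf
    ssc = sum(sum(col) ** 2 for col in zip(*grid)) / n - cf
    return tss, ssr, ssc


coded = [[x - 10 for x in row] for row in raw]   # change of origin by 10

from_raw = sums_of_squares(raw)
from_coded = sums_of_squares(coded)

for name, a, b in zip(("TSS", "SSR", "SSC"), from_raw, from_coded):
    print(f"{name}: raw = {a:.2f}, coded = {b:.2f}")
```

Both versions should agree with the values in the worked solution (TSS = 196.56, SSR = 2.16, SSC = 66.56), up to floating-point rounding.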
7. Set up an ANOVA table for the following information relating to three drugs tested to judge their effectiveness in
reducing blood pressure for three different groups of people:
X Y Z
A 14 10 11
15 9 11
B 12 7 10
11 8 11
C 10 11 8
11 11 7
Do the drugs act differently? Are the different groups of people affected differently? Is the interaction term
significant? Answer the above questions taking a significance level of 5%.
Sol.
Given observations from the different groups of people (A, B, C) for the different drugs (X, Y, Z) are as follows:
Group of people   Drug X   Drug Y   Drug Z      T      T²
A                     14       10       11     70    4900
                      15        9       11
B                     12        7       10     59    3481
                      11        8       11
C                     10       11        8     58    3364
                      11       11        7
P                     73       56       58  = 187
Where N=6+6+6=18
Correction Factor $CF = \frac{T^2}{N} = \frac{(187)^2}{18} = \frac{34969}{18} = 1942.722$
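The totals and the correction factor above can be cross-checked with a few lines of code. This is only a sketch: the nested dictionary `data` is one assumed way of holding the two observations per (group, drug) cell and is not part of the notes.

```python
# Sketch: group totals T, drug totals P, N and CF for the two-way layout.
data = {
    "A": {"X": [14, 15], "Y": [10, 9], "Z": [11, 11]},
    "B": {"X": [12, 11], "Y": [7, 8], "Z": [10, 11]},
    "C": {"X": [10, 11], "Y": [11, 11], "Z": [8, 7]},
}

# Row totals T: all observations for one group of people.
group_totals = {g: sum(sum(obs) for obs in cells.values())
                for g, cells in data.items()}

# Column totals P: all observations for one drug.
drug_totals = {d: sum(sum(data[g][d]) for g in data) for d in "XYZ"}

N = sum(len(obs) for cells in data.values() for obs in cells.values())
grand_total = sum(group_totals.values())
cf = grand_total ** 2 / N                        # correction factor T^2 / N

print("T :", group_totals)
print("P :", drug_totals)
print(f"N = {N}, grand total = {grand_total}, CF = {cf:.3f}")
```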
The squares are as follows
Group Drug Sum of
of Squares
people X Y Z