Probability
Sample Space
Definition: A sample space of an experiment is the set or collection of all possible outcomes of
that experiment, such that any outcome of the experiment corresponds to exactly one element
in the set. A sample space is usually denoted by the symbol S.
• Discrete Sample Space
Definition: If a sample space contains a finite number of possibilities or an unending sequence
with as many elements as there are whole numbers, it is called a discrete sample space.
Example 1: If the experiment consists of rolling a die, then the sample space can be presented
as follows:
S = {x | x = 1, 2, 3, …, 6}
where x represents the number appearing on the uppermost face of the die. A more fundamental
sample space for the above experiment is as follows:
S = {1, 2, 3, 4, 5, 6}
Example 2: A businessman may make a profit, incur a loss, or break even while running his
business. With these possible outcomes, the sample space is:
S = {Profit, Loss, Break-even}
Example 3: If the experiment involves rolling a pair of dice, then the resulting sample space
is of the following form:
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
     (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
     (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
     (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
     (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
     (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
where the outcome (3,6), for example, is said to occur if 3 occurs on the first die and 6 occurs
on the second die.
• Continuous Sample Space
Definition: If a sample space contains an infinite number of possibilities equal to the number
of points on a line segment, it is called a continuous sample space.
Example: In measuring the longevity of a light bulb, the sample space could be
S = {x | x ≥ 0}, where x is the time until the bulb burns out.
Event
When an experiment is performed, it can result in one or more experimental outcomes, which
are called events. Any subset of sample space S is known as an event.
Example-1: If an experiment consists of tossing two coins and noting whether they land Heads
(H) or Tails (T) then the set S is
S= {HH, HT, TH, TT}
If A = {HH, HT}, then A is the event that the first coin lands on heads.
Example-2: A businessman may make a profit, incur a loss, or break even while running his
business. With these possible outcomes, the sample space is
S= {Profit, Loss, Break-even}
If A= {Loss}, then A is an event that the businessman will incur a loss while running his
business.
• Equally Likely Events
Two or more events are said to be equally likely if they have the same chance of occurrence.
Example-1: In the experiment of tossing a coin, if A is the event of getting a head and B is
the event of getting a tail, then A and B are equally likely events, since both events have the
same chance of occurrence.
Example-2: In the experiment of rolling a die, if A is the event of getting a 1 and B is the
event of getting a 2, then A and B are equally likely events, since both events have the same
chance of occurrence.
• Mutually Exclusive Events
Two or more events are said to be mutually exclusive if the happening of any one of the events
excludes the happening of all the others (events) that is, no two or more of the events can
happen simultaneously in the same trial. (The joint occurrence is not possible, disjoint events)
(A ∩ B = ∅).
Example-1: In a coin tossing experiment, either a head or a tail lands in each trial, but never both.
So the events 'head' and 'tail' are mutually exclusive.
• Exhaustive Events
A set of events is said to be exhaustive if, taken together, the events account for all possible
outcomes of the trial.
Example: When a pesticide is applied, a pest either survives or dies. The two events, survival
and death, are exhaustive.
[Tree diagram: tossing a coin three times produces the eight outcomes HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.]
Multiplication Rule for Independent Events
The multiplication rule for independent events is obtained by using the properties of conditional
probability. We know that the conditional probability of event A given that B has occurred is
denoted by 𝑃(𝐴|𝐵) and is given by P(A|B) = P(A ∩ B)/P(B), provided P(B) > 0.
We aim to prove that if events A and B are independent, then the probability of both events A
and B occurring simultaneously is the product of their individual probabilities, i.e.,
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
We'll start by using the definition of conditional probability:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴|𝐵) × 𝑃(𝐵)
This equation represents the probability of both events A and B occurring simultaneously.
𝑃(𝐴|𝐵) denotes the conditional probability of event A occurring given that event B has already
occurred, and 𝑃(𝐵) is the probability of event B occurring.
Now, since events A and B are independent, the occurrence of one event does not affect the
occurrence of the other. Mathematically, this independence is expressed as: 𝑃(𝐴|𝐵) = 𝑃(𝐴)
This implies that the probability of event A occurring given that event B has occurred is the
same as the probability of event A occurring alone, which is P(A).
Substituting P(A|B) = P(A) into the equation for 𝑃(𝐴 ∩ 𝐵), we get:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
This concludes the proof of the multiplication rule of probability when events A and B are
independent.
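As an illustrative aside, the multiplication rule can be checked numerically by simulating two independent events, for example two fair coin tosses. The sketch below is an addition to these notes; the event names and the use of Python's random module are assumptions made purely for illustration.

```python
import random

def simulate(trials=100_000, seed=42):
    """Estimate P(A), P(B) and P(A and B) for two independent fair coin tosses.

    A: heads on the first toss, B: heads on the second toss (hypothetical events).
    """
    random.seed(seed)
    count_a = count_b = count_ab = 0
    for _ in range(trials):
        first = random.random() < 0.5   # True = heads on the first toss
        second = random.random() < 0.5  # True = heads on the second toss
        count_a += first
        count_b += second
        count_ab += first and second
    p_a, p_b, p_ab = count_a / trials, count_b / trials, count_ab / trials
    print(f"P(A) ~= {p_a:.3f}, P(B) ~= {p_b:.3f}")
    print(f"P(A and B) ~= {p_ab:.3f}, P(A)*P(B) ~= {p_a * p_b:.3f}")

simulate()
```

The two values printed on the last line should agree up to simulation noise, which is exactly the statement P(A ∩ B) = P(A) × P(B).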
Permutation
A permutation is an arrangement of objects in a definite order.
If 𝑛 is the total number of objects and 𝑟 is the number of objects you want to arrange, the
number of permutations at a time is given by,
nPr = n! / (n − r)!
Example: If 𝑎, 𝑏, 𝑐 and 𝑑 are four letters and if we want to arrange these by taking two at a
time, the possible arrangements are: 𝑎𝑏, 𝑎𝑐, 𝑎𝑑, 𝑏𝑎, 𝑐𝑎, 𝑑𝑎, 𝑏𝑐, 𝑐𝑏, 𝑏𝑑, 𝑑𝑏, 𝑐𝑑 and 𝑑𝑐.
Here, 𝑛 = 4, 𝑟 = 2.
The number of permutations of n distinct objects taken 2 at a time is
4P2 = 4! / (4 − 2)! = (4 × 3 × 2 × 1) / (2 × 1) = 12
When one or more objects are repeated, the number of permutations needs adjustment. The
number of distinct permutations of 𝑛 things of which 𝑛1 are of one kind, 𝑛2 are of a second
kind… and 𝑛𝑘 of a 𝑘th kind is,
n! / (n1! n2! … nk!)
Example: There are 9 birthday candles, of which four are yellow, three are red and two are
blue. The number of ways these candles can be arranged in 9 positions is 9!/(4! 3! 2!) = 1260.
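For a quick numerical check of these counting formulas, the short sketch below (an addition, not part of the original notes) uses Python's standard math module; math.perm requires Python 3.8 or later.

```python
from math import factorial, perm

# Permutations of 4 distinct letters taken 2 at a time: 4!/(4-2)! = 12
print(perm(4, 2))  # 12

# Distinct arrangements of 9 candles: 4 yellow, 3 red, 2 blue
print(factorial(9) // (factorial(4) * factorial(3) * factorial(2)))  # 1260
```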
Combination
Very often we are interested in the number of ways of selecting 𝑟 objects from 𝑛 without regard
to order of arrangements. These selections are called combinations. Here the arrangements 𝑎𝑏
and 𝑏𝑎 are regarded as the same. Thus, with the 3 letters 𝑎, 𝑏, 𝑐 the number of permutations
taking two at a time, the arrangements are 𝑎𝑏, 𝑏𝑎, 𝑎𝑐, 𝑐𝑎, 𝑏𝑐 𝑎𝑛𝑑 𝑐𝑏. But if the order of the
arrangements is disregarded (𝑖. 𝑒 𝑏𝑎 = 𝑎𝑏, 𝑎𝑐 = 𝑐𝑎, 𝑏𝑐 = 𝑐𝑏), the number of combinations
will be 3. A combination is a partition with two cells. The one cell containing 𝑟 objects selected
and the other cell containing (𝑛 − 𝑟) objects that are left.
Let (𝑛𝑟) denote the number of combinations of 𝑛 objects taken 𝑟 at a time irrespective of order.
The symbol (𝑛𝑟) is sometimes called binomial coefficient. We have noted earlier that for each
of set 𝑟 thing, there are 𝑟! permutations. Since combination of 𝑟 things are a set with 𝑟 elements,
(𝑛𝑟)𝑟! must be equal to the number of permutations of 𝑛 things taken 𝑟 at a time.
Thus, (n r) × r! = n!/(n − r)!  and  (n r) = n!/{r!(n − r)!}.
In particular, (10 7) = (10 3), (50 48) = (50 2), etc.
Example: There are 20 people in a room of whom 12 are men and 8 are women. A committee
of 3 is to have formed them. How many ways can this be done? If it is desired that the
committee will consist of 2 men and 1 woman, in how many ways can this be done?
Solution: It is evident that the order of the selected people is of no importance. Therefore, this
is a problem of combination with n = 20 and r = 3, so the committee can be formed in
(20 3) = 20!/(3! 17!) = 1140 ways.
The number of ways in which 2 men and 1 woman can be placed on the committee is
(12 2) × (8 1) = 66 × 8 = 528.
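Both counts can be reproduced with math.comb from Python's standard library; the sketch below is an added illustration, not part of the original notes.

```python
from math import comb

print(comb(20, 3))               # any committee of 3 from 20 people: 1140
print(comb(12, 2) * comb(8, 1))  # 2 men and 1 woman: 66 * 8 = 528
```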
Definition of Probability
In assigning probabilities to experimental outcomes, there are various acceptable approaches.
There are usually three different approaches to define probability.
1. Classical approach
2. Relative frequency approach
3. Subjective approach
Here a brief description of these approaches is given below:
Classical Approach
Definition: If a random experiment can result in 𝑛(𝑆) mutually exclusive, exhaustive and
equally likely outcomes and if 𝑛(𝐴) of these outcomes are favorable to an event A, then the
probability of A is the ratio of 𝑛(𝐴) to 𝑛(𝑆). Symbolically
P(A) = n(A)/n(S)
The definition under classical approach is also known as a priori or mathematical definition.
Example: An ordinary die is rolled once. Find the probability that,
i) an even number occurs?
ii) a number greater than 4 occurs?
Solution: Let S = {1,2, … . ,6}.If A denotes an even number and B a number greater than 4, then
A = {2,4,6} and B = {5,6} then,
I. P(A) = n(A)/n(S) = 3/6 = 1/2
II. P(B) = n(B)/n(S) = 2/6 = 1/3
Example: A bag contains 4 white and 6 red balls. A ball is drawn at random from the bag.
What is the probability it is red? That it is white? Are the events obtaining a red ball and
obtaining a white ball equally likely?
Solution: Here 𝑛(𝑆) = 10, 𝑛(𝑊) = 4, 𝑎𝑛𝑑 𝑛(𝑅) = 6.Hence,
i. P(R) = n(R)/n(S) = 6/10 = 0.6
ii. P(W) = n(W)/n(S) = 4/10 = 0.4
Since 𝑃(𝑅) ≠ 𝑃(𝑊),the occurrences of the event R and W are not equally likely.
Example: An ordinary die is rolled once. Find the probability that,
i. An even number occurs?
ii. A number greater than 4 occurs?
Solution: Let 𝑆 = {1,2,3,4,5,6}. If A denotes an even number and B a number greater than 4,
then 𝐴 = {2,4,6} and 𝐵 = {5,6},then
i. P(A) = n(A)/n(S) = 3/6 = 1/2
ii. P(B) = n(B)/n(S) = 2/6 = 1/3
Example: A newly married couple plans to have two children and suppose that each child is
equally likely to be a boy or a girl. In order to find a sample space for this experiment, let B
denote that a child is a boy and G denote that a child is a girl. Then one possible sample space
that can be formed is S= {BB, BG, GB, GG}
The pair BG, for instance, represents the outcome 'the older child is a boy and the younger one
is a girl'.
a. What is the probability that the couple will have two boys?
b. What is the probability that the couple will have one boy and one girl?
c. What is the probability that the couple will have at most one boy?
Solution: Let A1, A2, and A3 be the events that the couple will have two boys, one boy and one
girl, and at most one boy, respectively, so that
A1 = {BB}, A2 = {BG, GB}, A3 = {BG, GB, GG}
Since, by assumption, all the points in S are equally likely, that is,
𝑃(𝐵𝐵) = 𝑃(𝐵𝐺) = 𝑃(𝐺𝐵) = 𝑃(𝐺𝐺) = 1⁄4. Hence,
a. P(A1) = n(A1)/n(S) = P(BB) = 1/4
b. P(A2) = n(A2)/n(S) = P(BG) + P(GB) = 1/4 + 1/4 = 1/2
c. P(A3) = n(A3)/n(S) = P(BG) + P(GB) + P(GG) = 1/4 + 1/4 + 1/4 = 3/4
Example: A fair coin is tossed three times. Compute the probability that,
a. Exactly two tosses result in heads?
b. At most one toss results in a head?
Solution: The experiment consists of observing the outcome for each of the three tosses of
the coin. One of the ways of presenting the sample space is as follows:
S= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Because the coin is fair, we would expect the outcomes to be equally likely. That is, if Ai
represents the i-th outcome, then,
P(Ai) = 1/8
For (a), let the event of interest be A, so that
A = {HHT, HTH, THH}
P(A) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
For (b), let the event of interest be B, so that
B= {TTT, THT, TTH, HTT}
And hence
P(B) = P(TTT) + P(THT) + P(TTH) + P(HTT) = 1/8 + 1/8 + 1/8 + 1/8 = 1/2
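These two probabilities can also be obtained by brute-force enumeration of the sample space; the sketch below is an added illustration using itertools.product from Python's standard library.

```python
from itertools import product

# All 8 equally likely outcomes of tossing a fair coin three times
outcomes = list(product("HT", repeat=3))

exactly_two_heads = sum(o.count("H") == 2 for o in outcomes) / len(outcomes)
at_most_one_head = sum(o.count("H") <= 1 for o in outcomes) / len(outcomes)

print(exactly_two_heads)  # 3/8 = 0.375
print(at_most_one_head)   # 4/8 = 0.5
```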
Example: For the above coin-tossing experiment, find the probability of obtaining (a) exactly
two runs and (b) fewer than two runs.
Solution: Any unbroken sequence of like letters is called a run, even if the sequence has only
one letter. Thus, the outcome HHH has one run while the outcome HTT has two runs.
We now enumerate the number of runs in the above coin tossing experiment in a tabular form
as below:
Outcome Event Number of runs
HHH A1 1
HHT A2 2
HTH A3 3
HTT A4 2
THH A5 2
THT A6 3
TTH A7 2
TTT A8 1
Let 𝑋 denote the number of runs. Hence, for (a), there are four cases which are favorable to (a)
and hence the required probability is
P(X = 2) = P(A2) + P(A4) + P(A5) + P(A7) = 4/8 = 1/2.
Similarly, for (b) the event of interest is 𝑋 < 2 and number of cases favourable to this event is
2, so that the required probability is
P(X < 2) = P(A1) + P(A8) = 2/8 = 1/4
Example: A businessman has a stock of 8400 items of baby wear imported from 5 different
countries. The distribution of the items was as follows:
Country Number of wears
USA 1500
India 1200
China 2700
Korea 1000
Thailand 2000
Total 8400
A piece of baby wear was selected at random. What is the probability that it was imported
(i) from the USA, (ii) from China, and (iii) either from India or from Thailand?
Solution: Using classical definition of probability, we find that
P(USA) = 1500/8400 ≈ 0.18,  P(China) = 2700/8400 ≈ 0.32,
P(India or Thailand) = 1200/8400 + 2000/8400 = 3200/8400 ≈ 0.38
Axioms of Probability
Axiom 1: 𝟎 ≤ 𝑷(𝑨) ≤ 𝟏
The relative frequency of occurrence of any event must be greater than or equal to zero.
For example
For one dice, total sample space, 𝑆 = {1,2,3,4,5,6}
Let, A is an event, 𝐴 = {1,2,3}
Now probability of A
P(A) = n(A)/n(S)   [n(A) = number of elements in the event; n(S) = number of elements in the sample space]
     = 3/6 = 1/2 > 0
Axiom 2: 𝑷(𝑺) = 𝟏
The probability of whole sample space is equal to one.
For example:
Sample space, 𝑆 = { 1,2,3,4,5,6}
Now,
P(S) = n(S)/n(S) = 1
Axiom 3: If A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B).
For example, with the same die, let A = {1, 2, 3} and B = {4, 5, 6}, which are mutually exclusive. Then
P(A ∪ B) = P(A) + P(B) = 1/2 + 1/2 = 1
To find the joint probability of A and B, we ascertain the probability of the specific outcome 𝐻𝑇, yielding
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐻𝑇) = 1/4.
Example: Two cards are drawn in succession, without replacement, from a standard deck of 52 cards.
Let A be the event that the first card is red and B the event that the second card is a face card.
To calculate P(A ∩ B), we need to find the probability of drawing a red card first and a face
card second.
There are 26 red cards in a deck of 52, and 12 face cards (4 Jacks, 4 Queens, and 4 Kings).
So, P(A) = 26/52 = 1/2 (probability of drawing a red card first).
Now, after drawing a red card, there are 51 cards remaining, out of which 12 are face cards.
So, P(B|A) = 12/51 (probability of drawing a face card given that the first card drawn was red).
Hence, P(A ∩ B) = P(A) × P(B|A) = (1/2) × (12/51) = 6/51.
Properties of Joint Probability
Understanding the properties of joint probability elucidates its utility in probabilistic analysis.
Key properties include:
• Commutativity: The order of events does not influence the joint probability.
Mathematically, 𝑃 (𝐴 ∩ 𝐵) = 𝑃(𝐵 ∩ 𝐴).
• Independence: If events A and B are independent, their joint probability simplifies to
the product of their individual probabilities, i.e., 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) ∗ 𝑃(𝐵).
• Conditional Probability: Joint probability facilitates the computation of conditional
probability, enabling the assessment of the likelihood of one event given the occurrence
of another.
Related Questions:
Problem 01: In an office of 100 employees, 75 read English dailies, 50 read Bangla dailies and 40
read both. An employee is selected at random. What is the probability that the selected
employee
(a) reads an English newspaper? (b) reads at least one of the papers? (c) reads none? (d) reads
Bangla but not English?
Solution: Let us define the above events,
𝐸 = Reads English, 𝐵 = Reads Bangla, 𝐸̅ ∩ 𝐵̅ = Reads none,
𝐵 ∩ 𝐸̅ = Reads Bangla but not English.
The number of the cases favorable to the above events can be placed in a tabular form as
follows:
𝐸 𝐸̅ Total
𝐵 𝑛(𝐵 ∩ 𝐸) = 40 𝑛(𝐵 ∩ 𝐸̅ ) =? 𝑛(𝐵) = 50
𝐵̅ 𝑛(𝐵̅ ∩ 𝐸) =? 𝑛(𝐵̅ ∩ 𝐸̅ ) =? 𝑛(𝐵̅) = 50
Total 𝑛(𝐸) = 75 𝑛(𝐸̅ ) = 25 𝑛(𝑆) = 100
(a) The probability that the selected employee reads an English newspaper is:
P(E) = n(E)/n(S) = 75/100 = 0.75
(b) The probability that the selected employee reads at least one of the papers is:
P(E ∪ B) = P(E) + P(B) − P(E ∩ B) = (75/100 + 50/100 − 40/100) = 85/100 = 0.85
(c) The probability that the selected employee reads none is:
P(B̅ ∩ E̅) = 1 − P(E ∪ B) = 1 − 0.85 = 0.15
(d) The probability that selected employee reads Bangla but not English is:
P(B ∩ E̅) = [n(B) − n(B ∩ E)]/n(S) = (50 − 40)/100 = 10/100 = 0.10
𝑛(𝑆) 100 100
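A quick numeric check of the four answers, added here as an illustration (the variable names are mine, not from the notes):

```python
n_total, n_english, n_bangla, n_both = 100, 75, 50, 40

p_english = n_english / n_total                               # (a) 0.75
p_at_least_one = (n_english + n_bangla - n_both) / n_total    # (b) 0.85, inclusion-exclusion
p_none = 1 - p_at_least_one                                   # (c) 0.15
p_bangla_only = (n_bangla - n_both) / n_total                 # (d) 0.10

print(p_english, p_at_least_one, p_none, p_bangla_only)
```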
Problem 02: Of the students at a women's college, 60% wear neither a ring nor a necklace,
20% wear a ring, and 30% wear a necklace. If one of the women is randomly chosen, find the
probability that she is wearing (a) a ring or a necklace, (b) both.
Solution:
Let R and N denote, respectively, the events that a woman wears a ring and a necklace. We are
given that P(R) = 0.20, P(N) = 0.30 and P(R̅ ∩ N̅) = 0.60.
(a) The probability that she is wearing a ring or a necklace is:
P(R ∪ N) = 1 − P((R ∪ N)ᶜ) = 1 − P(R̅ ∩ N̅) = 1 − 0.60 = 0.40
(b) The probability that she wears both is:
𝑃(𝑅 ∩ 𝑁) = 𝑃(𝑅) + 𝑃(𝑁) − 𝑃(𝑅 ∪ 𝑁) = 0.20 + 0.30 − 0.40 = 0.10
Problem 03: A class contains 10 men and 20 women of whom half of the men and half of the
women have brown eyes. A person is chosen at random. What is the probability that the person
is either a man or has brown eyes?
Solution: To solve this problem, we can construct a table as follows:
Man (𝑀) Woman (𝑊) Total
Brown (𝐵) 𝑛(𝑀 ∩ 𝐵) = 5 𝑛(𝑊 ∩ 𝐵) = 10 𝑛(𝐵) = 15
Not brown (𝐵̅) 𝑛(𝑀 ∩ 𝐵̅) = 5 𝑛(𝑊 ∩ 𝐵̅) = 10 𝑛(𝐵̅) = 15
Total 𝑛(𝑀) = 10 𝑛(𝑊) = 20 𝑛(𝑆) = 30
P(M) = n(M)/n(S) = 10/30 = 1/3
P(B) = n(B)/n(S) = 15/30 = 1/2
P(M ∩ B) = n(M ∩ B)/n(S) = 5/30 = 1/6
Thus,
P(M ∪ B) = P(M) + P(B) − P(M ∩ B) = (1/3 + 1/2 − 1/6) = 2/3
The probability that the person is either a man or has brown eyes is 2/3.
3
Conditional Probability
The probability of an event A when it is known that some other event B has occurred is called
a conditional probability and is denoted by P(A|B). The symbol P(A|B) is usually read as ‘the
probability that A occurs given that B occurs or simply probability of A given B, where the
slash ‘|’ stands for ‘given that’. In general P(A|B) is not equal to P(A).
With two events A and B, the most fundamental formula to compute conditional probability
for A given B is
P(A|B) = P(A ∩ B)/P(B),  P(B) ≠ 0
For three events A, B and C, the rule extends to
P(A ∩ B ∩ C) = P(A)P(B|A)P(C|A ∩ B),
where C|A ∩ B is read as "C occurs given that A and B have already occurred".
For k events, the rule is as follows:
P(A1 ∩ A2 ∩ … ∩ Ak) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) … P(Ak|A1 ∩ A2 ∩ … ∩ Ak−1).
Refer to the values in the preceding section. Suppose that the selected person was known to be
a male. We now ask: what is the probability under the changed situation that he is employed?
This is a problem of conditional probability and we symbolically write this as P(E|M).
The probability P(E|M) can be computed once P(E∩ 𝑀) and P(M) are known from the
original sample space:
P(E|M) = P(E ∩ M)/P(M) = 0.51/0.55 = 0.93.
An alternative way of computing P(E|M) is to use the reduced sample space M, which is a
part of S. To accomplish this, note that
P(E ∩ M) = n(E ∩ M)/n(S) and P(M) = n(M)/n(S),
so that
P(E|M) = P(E ∩ M)/P(M) = n(E ∩ M)/n(M),
as before.
Example 01: A pair of dice is thrown. Find the probability that sum of the points on the two
dice is 10 or greater if a 5 appears on the first die.
Solution: Let A be the event that sum of the points on the two dice is 10 or greater and B be
the event that a 5 appears on the first toss. Symbolically, we want to evaluate the conditional
probability 𝑃(𝐴|𝐵).
Now,
A ={(4,6), (5,5), (5,6), (6,4), (6,5), (6,6)},
B ={(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)}
𝐴 ∩ 𝐵 = {(5,5), (5,6)}
Hence,
P(A|B) = P(A ∩ B)/P(B) = (2/36)/(6/36) = 1/3
Alternatively, if B is considered as a reduced sample space, then only two sample points, viz.
(5,5) and (5,6) are favorable to the event that the sum is 10 or more. Since there are 6 sample
points in B , the required probability is
P(A|B) = 2/6 = 1/3,
as ought to be.
Example 02: The probability that a married man watches a certain TV show is 0.4 and the
probability that his wife watches the show is 0.5. The probability that a man watches the show,
given that his wife does, is 0.7. Find
(a) The probability that a married couple watches the show.
(b) The probability that a wife watches the show given that her husband does.
(c) The probability that at least one of the partners will watch the show.
Solution: Let us define two events H and W as follows:
H: Husband watches the show
W: Wife watches the show
We are given that
P(H)=0.4, P(W)=0.5 and P(H|W) = 0.7.
(a) The probability that the couple watches the show is
P(W∩ 𝐻)=P(W)P(H|W) = 0.5× 0.7 = 0.35
(b) The conditional probability that a wife watches the show given that her husband also
watches
P(W|H) = P(W ∩ H)/P(H) = 0.35/0.40 = 0.875
(c) The probability that at least one (either H or W or both) watches
𝑃(𝑊 ∪ 𝐻) = 𝑃(𝑊) + 𝑃(𝐻) − 𝑃(𝑊 ∩ 𝐻) = 0.40 + 0.50 − 0.35 = 0.55.
Example 03: A coin is tossed until a head appears or it has been tossed three times. Given
that the head does not appear on the first toss, what is the probability that the coin is tossed
three times?
Solution: A sample space for the experiment is S = {H, TH, TTH, TTT}.
The associated probabilities are
P(H) = 1/2, P(TH) = 1/4, P(TTH) = 1/8, P(TTT) = 1/8.
Let A be the event that the coin is tossed 3 times and B be the event that no heads appear in
the first toss so that
𝐴 = {𝑇𝑇𝐻, 𝑇𝑇𝑇}, 𝐵 = {𝑇𝐻, 𝑇𝑇𝐻, 𝑇𝑇𝑇},
and hence
𝐴 ∩ 𝐵 = {𝑇𝑇𝐻, 𝑇𝑇𝑇}.
The associated probabilities are
P(A) = 1/8 + 1/8 = 1/4,  P(B) = 1/4 + 1/8 + 1/8 = 1/2,
and thus P(A ∩ B) = 1/8 + 1/8 = 1/4.
Hence the required probability is
P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2.
Example 04: In a community there are equal number of males and females. Suppose 5% of
the males and 2% of females are disabled. A person is chosen at random. If this person is male,
what is the probability that he is disabled?
Solution: Let D stand for the event 'disabled' and M and F respectively for male and female.
As males and females are in equal proportion,
P(M) = P(F) = 0.5.
Also, since 5% of the males and 2% of the females are disabled,
P(D|M) = 0.05 and P(D|F) = 0.02,
so that P(M ∩ D) = P(M)P(D|M) = 0.5 × 0.05 = 0.025.
We want the conditional probability that the selected person is disabled, given that he is male:
P(D|M) = P(D ∩ M)/P(M) = 0.025/0.5 = 0.05
Relative Frequency Approach
Definition: If an experiment is repeated n times under essentially identical conditions and an
event A occurs m times, then the ratio m/n approaches a fixed value as n becomes large. This
idealized value is called the probability of the event A. Symbolically,
lim(n→∞) m/n = P(A).
[Table: observed relative frequency of a particular face in repeated rolls of a die — 0.1724, 0.1664, 0.1628, 0.1648, 0.1672, 0.1664 — settling near the theoretical value.]
Note: 1/6 = 0.166667
The definition provided under relative frequency approach is also known as aposteriori or
empirical or statistical definition of probability.
Conceptually, the frequency definition of probability is a more appropriate definition of
probability.
Example 1: Suppose you want to predict whether a student being admitted to the first-year
honours class in economics in a particular year will belong to a tribal area. If your admission
records for several past years reveal that 12 percent of the admitted students came from tribal
areas, then it is reasonable to assume that the probability of a tribal student being admitted to
the class is approximately 0.12.
Example 2: The dean of science has noticed that, according to past records, only 55% of the
students who begin a program successfully graduate from the programs 4 years later. We
choose a name at random from the list of beginning students to evaluate the chance that he will
successfully graduate from the program in 4 years. Basing probabilities on the statistical record,
the student has a 55% chance and, hence, a probability of 55/100, or more simply 11/20 of
graduating successfully. This is a problem that falls under the frequency interpretation of
probability.
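The convergence of the relative frequency to the theoretical probability can be illustrated with a short simulation; this sketch is an addition for illustration only and uses Python's random module.

```python
import random

random.seed(1)
for n in (100, 1_000, 10_000, 100_000):
    hits = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
    # relative frequency of rolling a six in n rolls; should approach 1/6 ~= 0.1667
    print(f"n = {n:>6}: m/n = {hits / n:.4f}")
```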
Independence of Events
If A and B are two events and if the occurrence of A does not affect, and is not affected by the
occurrence of B, then A and B are said to be independent. Two events A and B are said to be
independent if and only if 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵).
Example 1: Three coins are tossed. Show that the event "heads on the first coin" and the
event "tails on the last two coins" are independent.
Let
A: heads on the first coin,  B: tails on the last two coins.
Now,
P(A) = 4/8 = 1/2,  P(B) = 2/8 = 1/4,  P(A ∩ B) = 1/8.
Since
P(A) × P(B) = (1/2) × (1/4) = 1/8 = P(A ∩ B),
the events A and B are independent.
Example 2: In a community, 36% of the families own a dog and 22% of the families own both
a dog and a cat. If a randomly selected family owns a dog, what is the probability that it owns
a cat too?
Solution: Let D be the event that a family owns a dog and C the event that it owns a cat.
Then P(D) = 0.36 and P(C ∩ D) = 0.22. Hence
P(C|D) = P(C ∩ D)/P(D) = 0.22/0.36 ≈ 0.61
Independence of more than two events:
Bayes’ Theorem
Bayes' theorem is a mathematical formula used to determine the conditional probability of
events. Let A and B be two events and let P(A|B) be the conditional probability of A given that
B has occurred. Then Bayes' theorem states that
P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]
Proof: From the definition of conditional probability,
P(B|A) = P(A ∩ B)/P(A), so that P(A ∩ B) = P(B|A)P(A)  … (1)
Since B = (B ∩ A) ∪ (B ∩ Aᶜ) and these two pieces are mutually exclusive,
P(B) = P(B ∩ A) + P(B ∩ Aᶜ) = P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)  … (2)
Note that A ∩ B = B ∩ A. Now,
P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B)   [using (1)]
⟹ P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]   [using (2)]
Proved
For events A, B, C that partition the sample space and any event D,
P(A|D) = P(A)P(D|A)/P(D) = P(A)P(D|A) / [P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C)]
More generally, if A1, A2, …, Ak partition the sample space, then
P(Ai|D) = P(Ai)P(D|Ai) / Σ(i=1 to k) P(Ai)P(D|Ai)
Example 1: A blood test is 90% effective in detecting a certain disease when the disease is
present. However, the test yields a false-positive result for 5% of the healthy patients tested.
Suppose 1% of the population has the disease. Find the conditional probability that a randomly
chosen person actually has the disease given that his test result is positive.
Solution: Let D be the event that the person has the disease and P the event that the test
result is positive. We are given P(D) = 0.01, P(P|D) = 0.90 and P(P|Dᶜ) = 0.05.
Now,
P(D|P) = P(D ∩ P)/P(P) = P(D)P(P|D) / [P(D)P(P|D) + P(Dᶜ)P(P|Dᶜ)]
⇒ P(D|P) = (0.01 × 0.90) / [(0.01 × 0.90) + (0.99 × 0.05)] ≈ 0.15
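The same posterior probability can be computed directly; the small sketch below is an illustrative addition (the function and variable names are mine, not from the notes).

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Disease example: prior 1%, sensitivity 90%, false-positive rate 5%
print(round(posterior(0.01, 0.90, 0.05), 4))  # ~0.1538
```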
Example 2: Consider two urns. The first urn contains 2 white and 7 black balls and second urn
contains 5 white & 6 black balls. We flip a fair coin and then draw a ball from the first urn or
the second urn depending on whether the outcome was heads or tails. What is the conditional
probability that the outcomes of the toss were heads given that a white ball was selected?
Solution: Let H be the event that the coin lands heads (so the ball is drawn from the first urn)
and W the event that a white ball is selected. Then P(H) = P(Hᶜ) = 1/2, P(W|H) = 2/9 and
P(W|Hᶜ) = 5/11.
Now,
P(H|W) = P(H ∩ W)/P(W) = P(H)P(W|H)/P(W)
       = P(H)P(W|H) / [P(H)P(W|H) + P(Hᶜ)P(W|Hᶜ)]
       = (1/2)(2/9) / [(1/2)(2/9) + (1/2)(5/11)] = 22/67.
Random Variable
A variable, whose values are any definite numbers or quantities that arise as result of chance
factors such that they cannot exactly be predicted in advance, is called a random variable.
The random variable may also be viewed as a function. For example, A school consists of 7
teachers of whom 4 are males and 3 are females. A committee of 2 teachers is to be formed.
If 𝑌 stands for the number of male teachers selected, then 𝑌 is a random variable assuming the
values 0, 1 and 2. The possible outcomes and the corresponding values of 𝑌 are:
FF → Y = 0, MF or FM → Y = 1, MM → Y = 2.
Discrete random variable:
A random variable defined over a discrete sample space (i.e. one that may take on only a finite
or countable number of different isolated values) is referred to as a discrete random variable.
Example: The number of heads obtained when a coin is tossed three times.
Continuous random variable:
A random variable defined over a continuous sample space (i.e. one which may take on any
value in a certain interval) is referred to as a continuous random variable.
Example: The time until a light bulb burns out.
Probability Distribution:
Any statement of a function associating each of a set of mutually exclusive and exhaustive
classes or class intervals with its probability is a probability distribution. A probability
distribution will be either discrete or continuous according to the random variable of interest. It
is a mathematical function that describes the likelihood of obtaining the possible values that a
random variable can take. It provides a systematic way to assign probabilities to different
outcomes, reflecting the uncertainty associated with random phenomena. Probability
distributions can be categorized into two main types: discrete and continuous.
Types of Probability Distributions:
1. Discrete probability distribution: A distribution where the random variable can only
take distinct, separate values. Examples- Bernoulli distribution (models a binary
outcome), binomial distribution (describes the number of successes in a fixed number
of independent Bernoulli trials), Poisson distribution (models the number of events
occurring in a fixed interval of time or space).
2. Continuous probability distribution: A distribution where the random variable can take
any value within a given range. Examples- uniform distribution (all values within a
given interval are equally likely), normal distribution (symmetric bell-shaped curve;
characterized by mean and standard deviation), exponential distribution (models the
time until an event occurs in a Poisson process). Understanding these types of
distributions is fundamental for analyzing and interpreting data in various fields, from
statistics and finance to science and engineering.
Example 1: A bag contains 10 balls of which 4 are black. If 3 balls are drawn at random without
replacement, obtain the probability distribution for the number of black balls drawn.
Solution: If 𝑋 denotes the number of black balls drawn, then clearly 𝑋 can assume the values 0,
1, 2 and 3. To obtain the probability distribution of 𝑋, we need to compute the probabilities
associated with 0, 1, 2 and 3. Since 3 balls are to be chosen from 10, the number of ways in
which this choice can be made is 10C3 = 120, and in general
f(x) = P(X = x) = (4Cx × 6C(3−x)) / 10C3.
Thus,
X      f(x)
0      20/120
1      60/120
2      36/120
3      4/120
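The table can be generated with math.comb; the snippet below is an added illustration of the counting argument, not part of the original notes.

```python
from math import comb

total = comb(10, 3)  # 120 ways to choose 3 balls from 10
for x in range(4):
    # choose x of the 4 black balls and 3 - x of the 6 non-black balls
    print(x, comb(4, x) * comb(6, 3 - x), "/", total)
```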
Example 2: An unbiased coin is tossed three times. Let 𝑋 be the number of runs obtained as a
result of the outcomes of this experiment. Find the probability distribution of 𝑋 .
Solution: Any unbroken sequence of a particular outcome is counted as a run. Thus, for
the outcome 𝐻𝐻𝐻, the number of runs is 1, since there is no break or discontinuity in the sequence from the
first to the third outcome. Similarly, for the outcome 𝐻𝑇𝐻, the number of runs is equal to 3.
We may put a bar on the top of the outcome to denote a run as follows: 𝐻 𝐻 𝐻, 𝐻 𝑇 𝐻 and so
on. We now construct a sample space for the above experiment along with the values of the
random variable 𝑋 together with their associated probabilities:
Outcome Number of runs 𝑃(𝑋 = 𝑥)
(𝑋 = 𝑥)
𝐻𝐻𝐻 1 1/8
𝐻𝐻𝑇 2 1/8
𝐻𝑇𝐻 3 1/8
𝐻𝑇𝑇 2 1/8
𝑇𝐻𝐻 2 1/8
𝑇𝐻𝑇 3 1/8
𝑇𝑇𝐻 2 1/8
𝑇𝑇𝑇 1 1/8
The random variable 𝑋 is seen to take on three distinct values 1, 2 and 3 with probabilities
2/8, 4/8 and 2/8 respectively. The values of the random variable and their associated
probabilities are summarized in tabular form below:
X:        1     2     3
P(X=x):   2/8   4/8   2/8
Example 3: Obtain the probability distribution of the number of turning points in all possible
permutations of first four natural numbers.
Solution: In a sequence of three successive values, the middle value is said to form a turning
point if it is either greater than both of its neighbours or less than both of them. Thus, in the
sequence 1, 2, 3, the value 2 is greater than its preceding value (1) but less than 3, so it is not a
turning point; the number of turning points for this sequence is 0. For the sequence 1, 4, 2, we
have a turning point, since 4 is greater than both 1 and 2; the same holds for 4, 2, 3, where 2 is
less than both of its neighbours. We now make all possible
permutations of the numbers 1, 2, 3 and 4:
Permutation Turning points Permutation Turning points
1234 0 3124 1
1243 1 3142 2
1324 2 3241 2
1342 1 3214 1
1423 2 3412 2
1432 1 3421 1
2134 1 4123 1
2143 2 4132 2
2341 1 4231 2
2314 2 4213 1
2413 2 4312 1
2431 1 4321 0
Since these 24 possible permutations (4! = 24 ) are equally likely, each sequence has a
probability of 1/24. The random variable 𝑋 is seen to take on values 0, 1 and 2. Hence the
probability distribution of 𝑋 is as displayed in the accompanying table:
Values of 𝑋 = 𝑥 𝑃(𝑋 = 𝑥)
0 2/24
1 12/24
2 10/24
Total 1.0
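The distribution can be verified by enumerating all 24 permutations; the sketch below is an illustrative addition using itertools and collections from the Python standard library.

```python
from itertools import permutations
from collections import Counter

def turning_points(seq):
    """Count interior positions that are strict local maxima or minima."""
    return sum(
        (seq[i - 1] < seq[i] > seq[i + 1]) or (seq[i - 1] > seq[i] < seq[i + 1])
        for i in range(1, len(seq) - 1)
    )

counts = Counter(turning_points(p) for p in permutations((1, 2, 3, 4)))
for x in sorted(counts):
    print(x, f"{counts[x]}/24")   # 0 -> 2/24, 1 -> 12/24, 2 -> 10/24
```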
If a random variable “𝑋” has a discrete distribution, the probability distribution of 𝑋 is defined
as the function 𝑓 such that for any real numbers 𝑥, 𝑓(𝑥) = 𝑃(𝑋 = 𝑥).
The function 𝑓(𝑥) defined above must satisfy the following conditions to be a pmf.
1. 𝑓(𝑥) ≥ 0
2. ∑𝑥 𝑓(𝑥) = 1
3. 𝑃(𝑋 = 𝑥) = 𝑓(𝑥)
Example: Determine whether each of the following functions can serve as a probability mass function:
a. f(x) = (2x − 1)/8,  x = 0, 1, 2, 3
b. f(x) = (3x + 6)/21,  x = 1, 2
c. f(x) = (x + 1)/16,  x = 0, 1, 2, 3
Solution:
a. Σ(x=0 to 3) f(x) = f(0) + f(1) + f(2) + f(3) = −1/8 + 1/8 + 3/8 + 5/8 = 1
Here the total probability is 1, but P(X = 0) = f(0) = −1/8, which is an impossible (negative)
probability. So f(x) is not a probability mass function.
b. Here, Σ(x=1 to 2) f(x) = 9/21 + 12/21 = 1, and f(x) ≥ 0 for both values of x. So f(x) is a
probability mass function.
c. Σ(x=0 to 3) f(x) = 1/16 + 2/16 + 3/16 + 4/16 = 10/16 ≠ 1
Here the total probability is not 1. So, f(x) is not a probability mass function.
Example: A random variable X has the following probability distribution:
X        0   1   2    3    4    5     6      7
P(X=x)   0   k   2k   2k   3k   k²    2k²    7k² + k
(1) Find the value of k. (2) Find (i) P(X ≤ 6) and (ii) P(3 < X ≤ 6).
Solution:
(1) Since the probabilities must sum to 1,
0 + k + 2k + 2k + 3k + k² + 2k² + 7k² + k = 1
⟹ 10k² + 9k − 1 = 0 ⟹ (10k − 1)(k + 1) = 0 ⟹ k = 1/10
(the root k = −1 is rejected, since probabilities cannot be negative).
(2) (i)
P(X ≤ 6) = 1 − P(X > 6)
         = 1 − (7k² + k)
         = 1 − (7(1/10)² + 1/10)
         = 1 − 17/100
         = (100 − 17)/100
         = 83/100
Therefore, P(X ≤ 6) = 83/100.
(ii) P(3 < X ≤ 6) = P(X = 4) + P(X = 5) + P(X = 6)
                  = 3k + k² + 2k²
                  = 3(1/10) + (1/10)² + 2(1/10)²
                  = 3/10 + 1/100 + 2/100
                  = (30 + 3)/100
                  = 33/100
Therefore, P(3 < X ≤ 6) = 33/100.
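A numeric cross-check of k and of the two probabilities (an added sketch; the quadratic is solved directly rather than by factoring):

```python
# Solve 10k^2 + 9k - 1 = 0 for the admissible root and rebuild the pmf
a, b, c = 10, 9, -1
k = (-b + (b * b - 4 * a * c) ** 0.5) / (2 * a)   # positive root: 0.1

pmf = [0, k, 2 * k, 2 * k, 3 * k, k**2, 2 * k**2, 7 * k**2 + k]  # x = 0..7
print(k)             # 0.1
print(sum(pmf))      # 1.0, total probability
print(sum(pmf[:7]))  # P(X <= 6) = 0.83
print(sum(pmf[4:7])) # P(3 < X <= 6) = 0.33
```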
Example: Refer to the school with 7 teachers, of whom 4 are males and 3 are females, from
which a committee of 2 teachers is formed. If X denotes the number of male teachers selected,
find the probability distribution of X.
Solution:
f(0) = P(X = 0) = 4C0 × 3C2 / 7C2 = 3/21
f(1) = P(X = 1) = 4C1 × 3C1 / 7C2 = 12/21
f(2) = P(X = 2) = 4C2 × 3C0 / 7C2 = 6/21
X:      0      1       2
f(x):   3/21   12/21   6/21
Example: Find the value of α for which f(x) = α(3/4)^x, x = 0, 1, 2, …, is a probability mass
function, and hence write down the complete pmf of X.
Solution:
Since f(x) is a probability function, Σ(x=0 to ∞) f(x) = 1.
Now,
f(0) = α(3/4)⁰ = α
f(1) = α(3/4)¹ = α(3/4)
f(2) = α(3/4)²
f(3) = α(3/4)³
and so on.
Hence,
Σ(x=0 to ∞) f(x) = α + α(3/4) + α(3/4)² + α(3/4)³ + ⋯ = 1
⟹ α(1 + 3/4 + (3/4)² + (3/4)³ + ⋯) = 1
⟹ α × 1/(1 − 3/4) = 1   [sum of an infinite geometric series]
⟹ α × 4 = 1
⟹ α = 1/4
So, the complete pmf of X is f(x) = (1/4)(3/4)^x, x = 0, 1, 2, …
Discrete Distribution Function
In many occasions, we are interested to know the probability that a random variable takes on a
value less than or equal to a prescribed number 𝑥1 , say. If two dice are thrown, for example,
what is the probability that the sum is less than or equal to 5? If 3 coins are tossed, what is the
probability that at most 2 show heads? Answer to such questions is provided by what is known
as the cumulative distribution function, which applies to both continuous variable and discrete
variable.
Definition: The cumulative distribution function (CDF), or simply the distribution function,
F(x) of a discrete random variable X with probability function f(x), defined for all real numbers
x, is the cumulative probability up to and including the point x. Symbolically, it is defined as
F(x) = P(X ≤ x) = Σ(t ≤ x) f(t), for −∞ < x < ∞.
The function F(x) is monotonically increasing, i.e. F(a) ≤ F(b) whenever a ≤ b, and the limit
of F(x) as x → −∞ is 0 while the limit as x → +∞ is 1.
The value of F(x) at any point must be a number in the interval 0 ≤ F(x) ≤ 1, because F(x)
is the probability of the event (X ≤ x).
Example: Consider the following table shows the probability distribution of a random variable
𝑋
𝑿: 0 1 2 3
𝒇(𝒙): 20/120 60/120 36/120 4/120
Here,
F(0) = P(X = 0) = f(0) = 20/120
F(1) = P(X ≤ 1) = f(0) + f(1) = F(0) + f(1) = 80/120
F(2) = P(X ≤ 2) = f(0) + f(1) + f(2) = F(1) + f(2) = 116/120
F(3) = P(X ≤ 3) = f(0) + f(1) + f(2) + f(3) = F(2) + f(3) = 1

X       x < 0   0        1        2         3       x > 3
f(x)    0       20/120   60/120   36/120    4/120   0
F(x)    0       20/120   80/120   116/120   1       1

F(x) = 0,        x < 0
     = 20/120,   0 ≤ x < 1
     = 80/120,   1 ≤ x < 2
     = 116/120,  2 ≤ x < 3
     = 1,        x ≥ 3
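Building a discrete CDF is just a running sum of the pmf; the short sketch below (an added illustration) reproduces the table above with itertools.accumulate.

```python
from fractions import Fraction
from itertools import accumulate

pmf = [Fraction(20, 120), Fraction(60, 120), Fraction(36, 120), Fraction(4, 120)]
cdf = list(accumulate(pmf))  # F(0), F(1), F(2), F(3)

for x, (f, F) in enumerate(zip(pmf, cdf)):
    print(x, f, F)   # fractions print in lowest terms, e.g. x = 1: 1/2 and 2/3
```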
Example: A coin is tossed three times. If 𝑋 is the random variable representing the number
of heads obtained, find the probability distribution of 𝑋 and hence obtain 𝐹(𝑥).
Solution:
Sample space, 𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}
[𝑋 = 3,2,2,1,2,1,1,0]
𝑋 = 0,1,2,3
So,
P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(HTT, THT, TTH) = 3/8
And so on.
So, the probability distribution is:
𝑿: 0 1 2 3
𝒇(𝒙): 1/8 3/8 3/8 1/8
Therefore,
F(0) = f(0) = 1/8,
F(1) = f(0) + f(1) = 4/8,
F(2) = f(0) + f(1) + f(2) = 7/8,
F(3) = f(0) + f(1) + f(2) + f(3) = 1.
Thus,
F(x) = 0,     x < 0
     = 1/8,   0 ≤ x < 1
     = 4/8,   1 ≤ x < 2
     = 7/8,   2 ≤ x < 3
     = 1,     x ≥ 3
Example: A fair six-sided die is rolled once. Obtain the probability distribution of the number of
points shown and hence the distribution function.
Solution: Let 𝑋 be the number of points on the die so that 𝑋 = 1,2,3,4,5,6. If P assigns equal
mass to each of points on the die, then clearly
f(x) = P(X = x) = 1/6
Then
F(x) = P(X ≤ x) = 0,    x < 1
                = 1/6,  1 ≤ x < 2
                = 2/6,  2 ≤ x < 3
                = 3/6,  3 ≤ x < 4
                = 4/6,  4 ≤ x < 5
                = 5/6,  5 ≤ x < 6
                = 1,    x ≥ 6
In dealing with a continuous variable, 𝑓(𝑥) is usually called a probability density function (pdf),
or simply a density function. A formal definition may be given as follows:
Definition: A function f(x) is called the probability density function of a continuous random
variable X if, for any two real numbers a ≤ b, P(a ≤ X ≤ b) = ∫(a to b) f(x) dx.
The above definition leads to the conclusion that a pdf possesses the following properties:
1. 𝑓(𝑥) ≥ 0
2. ∫(−∞ to ∞) f(x) dx = 1
3. P(a < X < b) = ∫(a to b) f(x) dx
Example: A continuous random variable X has the density function
f(x) = kx,  0 < x < 4
     = 0,   elsewhere
(i) Find the value of k. (ii) Find P(1 < X < 2) and P(X > 2).
Solution: (i) Since the total probability must equal 1,
k ∫(0 to 4) x dx = 1
or,
k [x²/2] from 0 to 4 = 1,
from which
k = 1/8.
Hence
f(x) = x/8,  0 < x < 4
     = 0,    elsewhere
(ii) Again,
P(1 < X < 2) = (1/8) ∫(1 to 2) x dx = [x²/16] from 1 to 2 = 3/16
and
P(X > 2) = (1/8) ∫(2 to 4) x dx = [x²/16] from 2 to 4 = 12/16 = 3/4.
Example: If X has the density function
f(x) = (2/27)(1 + x),  2 < x < 5
     = 0,              elsewhere
(i) verify that it satisfies the condition ∫(−∞ to ∞) f(x) dx = 1, (ii) find P(X < 4), and (iii) find P(3 < X < 4).
Solution: (i) Integrating between 2 and 5,
∫(2 to 5) f(x) dx = (2/27) ∫(2 to 5) (1 + x) dx = (2/27) [x + x²/2] from 2 to 5 = 1
(ii) Since the lower limit of the range is 2, we integrate between 2 and 4 to evaluate P(X < 4):
P(X < 4) = (2/27) ∫(2 to 4) (1 + x) dx = (2/27) [x + x²/2] from 2 to 4 = 16/27.
(iii) Evaluating the integral between 3 and 4, we obtain P(3 < X < 4):
P(3 < X < 4) = (2/27) ∫(3 to 4) (1 + x) dx = (2/27) [x + x²/2] from 3 to 4 = 1/3.
The distribution function F(x) of a continuous random variable X with density f(x) is
F(x) = P(X ≤ x) = ∫(−∞ to x) f(t) dt,
and, wherever the derivative exists,
f(x) = dF(x)/dx = F′(x).
The distribution function has the following properties:
1. 0 ≤ F(x) ≤ 1
2. F(−∞) = 0
3. F(+∞) = 1
4. P(a < X < b) = F(b) − F(a) = ∫(−∞ to b) f(x) dx − ∫(−∞ to a) f(x) dx
Example: If 𝑋 has the density function
f(x) = (2/27)(1 + x),  2 < x < 5
     = 0,              elsewhere
Obtain the distribution function and hence find 𝐹(3) and 𝐹(4). Also verify 𝑃(3 < 𝑥 < 4) =
𝐹(4) − 𝐹(3).
Solution:
F(x) = (2/27) ∫(2 to x) (1 + t) dt
     = (1/27)(x² + 2x − 8),  2 < x < 5
For x = 3 and x = 4,
F(3) = (1/27)(3² + 2 × 3 − 8) = 7/27
F(4) = (1/27)(4² + 2 × 4 − 8) = 16/27
Hence,
F(4) − F(3) = 16/27 − 7/27 = 9/27 = 1/3,
which agrees with P(3 < X < 4) found earlier.
Example: If X has the density function
f(x) = x,      0 < x ≤ 1
     = 2 − x,  1 < x ≤ 2
     = 0,      otherwise
obtain F(x).
Solution:
F(x) = P(X ≤ x) = 0,                                                     x ≤ 0
                = ∫(0 to x) t dt = x²/2,                                 0 < x ≤ 1
                = ∫(0 to 1) t dt + ∫(1 to x) (2 − t) dt = 2x − x²/2 − 1,  1 < x ≤ 2
                = 1,                                                     x > 2
Example: A box contains good and defective items. If an item drawn is good, the number 1 is
assigned to the drawing; otherwise, the number 0 is assigned. Let p be the probability of
drawing a good item at random.
Then
1−𝑝, 𝑥=0
𝑓(𝑥) = 𝑃(𝑋 = 𝑥) = {
𝑝, 𝑥=1
0, 𝑥<0
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = {1 − 𝑝 , 0≤𝑥<1
1 , 1≤𝑥
If two or more random variables are given that are defined on the same probability space, the
joint probability distribution is the corresponding probability distribution on all possible
combinations of outputs.
Or simply, Joint probability distribution shows probability distribution for two or more random
variables.
Suppose that a given experiment involves two discrete variables 𝑋 and 𝑌. Then the joint
probability distribution of 𝑋 and 𝑌 can be expressed as
f(x, y) = P(X = x, Y = y),
which has the following properties:
1. f(x, y) ≥ 0 for every pair (x, y)
2. Σx Σy f(x, y) = 1
Example: A coin is tossed three times. If 𝑋 denotes the number of heads and 𝑌 denotes the
number of tails in the last two tosses, then find the joint probability distribution of 𝑋 and 𝑌.
Solution: The outcomes of the experiment and the associated probabilities are shown below:
Outcome 𝑿 𝒀 𝑷(𝑿, 𝒀)
HHH 3 0 1/8
HHT 2 1 1/8
HTH 2 1 1/8
HTT 1 2 1/8
THH 2 0 1/8
THT 1 1 1/8
TTH 1 1 1/8
TTT 0 2 1/8
It is easy to see that 𝑋 assumes values 0, 1, 2, and 3, while 𝑌 assumes values 0,1, and 2. The
joint probability distribution can be written as:
𝑿 values
𝒀 values 0 1 2 3 Row sum
0 0 0 1/8 1/8 2/8
1 0 2/8 2/8 0 4/8
2 1/8 1/8 0 0 2/8
Column sum 1/8 3/8 3/8 1/8 1
Let 𝑋 and 𝑌 be two continuous random variables. Then the function 𝑓(𝑥, 𝑦) is called the joint
probability density function of 𝑋 and 𝑌 if
1. 𝑓(𝑥, 𝑦) ≥ 0, for all (𝑥, 𝑦)
2. ∫(−∞ to ∞) ∫(−∞ to ∞) f(x, y) dx dy = 1
3. P[(X, Y) ∈ A] = ∫∫A f(x, y) dx dy for any region A in the xy-plane.
The joint distribution function of X and Y is
F(x, y) = P(X ≤ x, Y ≤ y) = ∫(−∞ to x) ∫(−∞ to y) f(s, t) dt ds,
which satisfies
1. 0 ≤ F(x, y) ≤ 1
2. ∂²F(x, y)/∂x∂y = f(x, y), wherever F is differentiable.
Example: Show that
f(x, y) = x² + xy/3,  0 ≤ x ≤ 1, 0 ≤ y ≤ 2
        = 0,          elsewhere
satisfies the conditions f(x, y) ≥ 0 and ∫∫ f(x, y) dx dy = 1 of a joint density function.
Solution: Clearly, f(x, y) ≥ 0 for all values of x and y in the given range. And
∫(−∞ to ∞) ∫(−∞ to ∞) f(x, y) dx dy = ∫(0 to 2) ∫(0 to 1) (x² + xy/3) dx dy
= ∫(0 to 2) (1/3 + y/6) dy = [y/3 + y²/12] from 0 to 2 = 1
Example: The joint density of X and Y is
f(x, y) = x(1 + 3y²)/4,  0 ≤ x ≤ 2, 0 ≤ y ≤ 1
        = 0,             elsewhere
Find P[0 < X < 1, 1/4 < Y < 1/2].
Solution:
P[0 < X < 1, 1/4 < Y < 1/2] = ∫(1/4 to 1/2) ∫(0 to 1) x(1 + 3y²)/4 dx dy
= ∫(1/4 to 1/2) [x²/8 + 3x²y²/8] evaluated from x = 0 to x = 1 dy
= ∫(1/4 to 1/2) (1/8 + 3y²/8) dy = [y/8 + y³/8] from 1/4 to 1/2 = 23/512
Marginal Distribution:
When the distribution of the random variable (Say, 𝑋 or 𝑌) is derived from a joint probability
distribution (say, 𝑓(𝑥, 𝑦)), then the resulting distribution is known as a marginal distribution
(of 𝑋 or 𝑌).
When the random variables X and Y are discrete, the marginal distribution of X is
g(x) = Σy f(x, y), and that of Y is h(y) = Σx f(x, y).
When X and Y are continuous, the marginal densities are
g(x) = ∫(−∞ to ∞) f(x, y) dy, for −∞ < x < ∞, and
h(y) = ∫(−∞ to ∞) f(x, y) dx, for −∞ < y < ∞.
A marginal density is itself a probability density; for instance,
∫(−∞ to ∞) g(x) dx = ∫(−∞ to ∞) ∫(−∞ to ∞) f(x, y) dy dx = 1,
and
P[a < X < b] = P[a < X < b, −∞ < Y < ∞] = ∫(a to b) ∫(−∞ to ∞) f(x, y) dy dx = ∫(a to b) g(x) dx.
Question: The joint probability distribution of X and Y is given below. Find the marginal
distributions of X and Y.
Values of X
Values of Y     0      1      2      3
0               0      1/8    2/8    1/8
1               1/8    2/8    1/8    0
Solution:
g(0) = P(X = 0) = Σ(y=0 to 1) f(0, y) = f(0,0) + f(0,1) = 0 + 1/8 = 1/8
g(1) = P(X = 1) = Σ(y=0 to 1) f(1, y) = f(1,0) + f(1,1) = 1/8 + 2/8 = 3/8
g(2) = P(X = 2) = Σ(y=0 to 1) f(2, y) = f(2,0) + f(2,1) = 2/8 + 1/8 = 3/8
g(3) = P(X = 3) = Σ(y=0 to 1) f(3, y) = f(3,0) + f(3,1) = 1/8 + 0 = 1/8
Similarly, for 𝑌:
h(0) = P(Y = 0) = f(0,0) + f(1,0) + f(2,0) + f(3,0) = 0 + 1/8 + 2/8 + 1/8 = 4/8
h(1) = P(Y = 1) = f(0,1) + f(1,1) + f(2,1) + f(3,1) = 1/8 + 2/8 + 1/8 + 0 = 4/8
Marginal distribution of X:
x       0     1     2     3     Sum
g(x)    1/8   3/8   3/8   1/8   1
Marginal distribution of Y:
y       0     1     Sum
h(y)    4/8   4/8   1
*Find the marginal densities of 𝑋 and 𝑌 from the following density function and verify that
marginal distributions are also probability distributions.
f(x, y) = (1/8)(6 − x − y), for 0 < x < 2, 2 < y < 4, and 0 elsewhere.
Also compute 𝑃[𝑋 + 𝑌 < 3]and 𝑃[𝑋 < 1.5, 𝑌 < 2.5].
Solution: The marginal density of X is
g(x) = (1/8) ∫(2 to 4) (6 − x − y) dy = (1/8)[6y − xy − y²/2] from 2 to 4 = (1/4)(3 − x),  0 < x < 2,
and that of Y is
h(y) = (1/8) ∫(0 to 2) (6 − x − y) dx = (1/8)[6x − x²/2 − xy] from 0 to 2 = (1/4)(5 − y),  2 < y < 4.
Also,
∫(0 to 2) g(x) dx = (1/4) ∫(0 to 2) (3 − x) dx = (1/4)[3x − x²/2] from 0 to 2 = 1
and
∫(2 to 4) h(y) dy = (1/4) ∫(2 to 4) (5 − y) dy = (1/4)[5y − y²/2] from 2 to 4 = 1.
Here g(x) and h(y) satisfy all the conditions of a density function.
Now, the event X + Y < 3 requires 2 < y < 3 − x, which is possible only when 0 < x < 1. Hence
P(X + Y < 3) = (1/8) ∫(0 to 1) ∫(2 to 3−x) (6 − x − y) dy dx
             = (1/8) ∫(0 to 1) [6y − xy − y²/2] from 2 to 3−x dx
             = (1/8) ∫(0 to 1) (x²/2 − 4x + 7/2) dx
             = (1/8) [x³/6 − 2x² + 7x/2] from 0 to 1 = 5/24.
Finally,
P(X < 3/2, Y < 5/2) = (1/8) ∫(0 to 3/2) ∫(2 to 5/2) (6 − x − y) dy dx = 9/32.
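Because the limits of integration for P(X + Y < 3) are easy to get wrong, a brute-force numeric check is useful; the sketch below is an added illustration (plain Python only) that integrates the joint density over each region with a fine grid.

```python
def joint(x, y):
    # joint density, zero outside 0 < x < 2, 2 < y < 4
    return (6 - x - y) / 8 if 0 < x < 2 and 2 < y < 4 else 0.0

def prob(region, n=400):
    """Midpoint Riemann sum of the density over the rectangle (0,2) x (2,4)."""
    hx, hy = 2 / n, 2 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            y = 2 + (j + 0.5) * hy
            if region(x, y):
                total += joint(x, y) * hx * hy
    return total

print(round(prob(lambda x, y: True), 4))                  # total probability ~= 1
print(round(prob(lambda x, y: x + y < 3), 4))             # ~= 0.2083 = 5/24
print(round(prob(lambda x, y: x < 1.5 and y < 2.5), 4))   # ~= 0.2813 = 9/32
```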
Conditional Distribution:
The conditional distributions are exactly analogous to the conditional probabilities of the type
𝑃(𝐴|𝐵)𝑜𝑟 𝑃(𝐵|𝐴),where 𝐴 and 𝐵 are two events in a sample space. Using the definition of
conditional probability
P(B|A) = P(A ∩ B)/P(A),  P(A) > 0
Replacing the events 𝐴 and 𝐵 by the random variables 𝑋 and 𝑌 respectively, we define the
conditional probability of 𝑌 for given 𝑋 as follows:
P(Y = y|X = x) = P(X = x, Y = y)/P(X = x) = f(x, y)/g(x)
For discrete X and Y, the conditional distributions are
f(y|x) = f(x, y)/Σy f(x, y) = f(x, y)/g(x), for g(x) > 0
f(x|y) = f(x, y)/Σx f(x, y) = f(x, y)/h(y), for h(y) > 0
For continuous X and Y,
f(y|x) = f(x, y)/∫(−∞ to ∞) f(x, y) dy = f(x, y)/g(x), g(x) > 0
f(x|y) = f(x, y)/∫(−∞ to ∞) f(x, y) dx = f(x, y)/h(y), h(y) > 0.
If one wished to find the probability that the random variable 𝑋 falls between 𝑎 and 𝑏, when it
is known that variable 𝑌 = 𝑦, we evaluate
P(a < X < b | Y = y) = Σ(a < x < b) f(x|y), if X is discrete
                     = ∫(a to b) f(x|y) dx, if X is continuous
Example: For the random variables X and Y with the joint probabilities shown below, find the
conditional distribution of X given that Y = 1.
Values of 𝑿
Values of 𝒀     0      1      2
1               6/28   6/28   0
2               1/28   0      0
Solution: f(x|1) = f(x, 1)/h(1) and f(y|1) = f(1, y)/g(1).
Now, h(1) = Σ(x=0 to 2) f(x, 1) = f(0,1) + f(1,1) + f(2,1) = 6/28 + 6/28 + 0 = 3/7.
So f(x|1) = f(x, 1)/h(1) = (7/3) f(x, 1), for x = 0, 1, 2.
Therefore,
f(0|1) = (7/3) f(0,1) = (7/3) × (6/28) = 1/2
f(1|1) = (7/3) f(1,1) = (7/3) × (6/28) = 1/2
f(2|1) = (7/3) f(2,1) = (7/3) × 0 = 0
X:        0     1     2
f(x|1):   1/2   1/2   0
Example: The joint density of X and Y is
f(x, y) = (6/5)(x + y²),  0 < x < 1, 0 < y < 1
        = 0,              elsewhere
Find (i) f(x|y), (ii) f(x|0.5), and (iii) P(X < 0.5 | Y = 0.5).
Solution: The marginal density of Y is
h(y) = (6/5) ∫(0 to 1) (x + y²) dx = (6/5)[x²/2 + xy²] from 0 to 1 = (6/5)(1/2 + y²),  0 < y < 1.
(i)
f(x|y) = f(x, y)/h(y) = (6/5)(x + y²) / [(6/5)(1/2 + y²)] = 2(x + y²)/(1 + 2y²),  0 < x < 1, 0 < y < 1
(ii)
f(x|0.5) = 2(x + 0.5²)/(1 + 2(0.5)²) = (1/3)(4x + 1),  0 < x < 1
(iii)
P(X < 0.5 | Y = 0.5) = ∫(0 to 0.5) (1/3)(4x + 1) dx = (1/3)[2x² + x] from 0 to 0.5 = 1/3.
Two random variables X and Y with joint distribution f(x, y) and marginal densities g(x) and h(y),
respectively, are said to be independent if and only if
f(x, y) = g(x)h(y) for all (x, y) within their range.
Example: Examine whether X and Y with the following joint distribution are independent.
Values of X
Values of Y     2      4      h(y)
1               0.10   0.15   0.25
3               0.20   0.30   0.50
5               0.10   0.15   0.25
g(x)            0.40   0.60   1.00
For independence, f(x, y) = g(x)h(y) must hold at every point (x, y).
Solution:
i) 𝑓(2,1) = 0.10 and 𝑔(2) = 0.40, ℎ(1) = 0.25, hence 𝑔(2). ℎ(1) = 0.10 = 𝑓(2,1).
𝑖𝑖) 𝑓(4,1) = 0.15 and 𝑔(4) = 0.60, ℎ(1) = 0.25, hence 𝑔(4). ℎ(1) = 0.15 =
𝑓(4,1).
.
.
.
vi) 𝑓(4,5) = 0.15 and 𝑔(4) = 0.60, ℎ(5) = 0.25, hence 𝑔(4). ℎ(5) = 0.15 =
𝑓(4,5).
For all points (x, y) of the random variables X and Y, f(x, y) = g(x)h(y); hence X and Y are
independent. The same conclusion follows from the conditional distributions:
P(Y = 1|X = 2) = f(1|2) = f(2,1)/g(2) = 0.10/0.40 = 0.25 = h(1)
P(Y = 3|X = 2) = f(3|2) = f(2,3)/g(2) = 0.20/0.40 = 0.50 = h(3)
.
.
.
P(Y = 5|X = 4) = f(5|4) = f(4,5)/g(4) = 0.15/0.60 = 0.25 = h(5)
Here, the conditional distributions of Y for all X’s are equal to the marginal distribution
of Y. So, X and Y are independent.
Example: Examine whether X and Y are independent if their joint density is
f(x, y) = (x + y)/8,  0 < x < 2, 0 < y < 2
        = 0,          elsewhere
Solution: The marginal densities are
g(x) = (1/8) ∫(0 to 2) (x + y) dy = (x + 1)/4,  0 < x < 2
h(y) = (1/8) ∫(0 to 2) (x + y) dx = (y + 1)/4,  0 < y < 2
Thus, g(x)h(y) = (x + 1)(y + 1)/16 ≠ f(x, y).
Moreover,
f(y|x) = f(x, y)/g(x) = [(x + y)/8] / [(x + 1)/4] = (x + y)/(2(x + 1)) ≠ h(y)
f(x|y) = f(x, y)/h(y) = [(x + y)/8] / [(y + 1)/4] = (x + y)/(2(y + 1)) ≠ g(x)
Since the conditional distributions are not equal to the marginal distributions, the variables are
not independent.
MATHEMATICAL EXPECTATION
Historically, the term mathematical expectation or expected value derives from games of chance. In
such games, the players were concerned with how much, on average, one would expect to win if
the game were continued for a sufficiently long time. In statistical terminology, this term is associated with
a random variable and is, in fact, the average value of this random variable generated through a random
experiment.
The computation of the expected value of a random variable is straightforward. When the random variable
is discrete, it is simply the sum of the products of all possible values of the random variable multiplied
by their respective probabilities. For a continuous variable, it is defined analogously.
If 𝑋 is a discrete random variable with the probability function 𝑓(𝑥), then the expected value or the
mathematical expectation of 𝑋, 𝐸 (𝑋) is defined as,
E(X) = Σx x f(x)
If X is a continuous random variable with density function f(x), then
E(X) = ∫(−∞ to ∞) x f(x) dx
Some useful properties of expectation:
* If C is a constant, E(C) = C.
* E[W1(X) + W2(X) + ⋯ + Wk(X)] = E[W1(X)] + E[W2(X)] + ⋯ + E[Wk(X)]
* For a function W(X) of the random variable X,
E[W(X)] = Σx W(x) f(x), if X is discrete
        = ∫(−∞ to ∞) W(x) f(x) dx, if X is continuous
* The variance of a random variable X is
V(X) = E[(X − μ)²] = E(X²) − μ², where μ = E(X).
Example: Find the mean and variance of the random variable X with the following probability distribution:
X                 −3     −2     0      1      2
P(X = x) = f(x)   0.10   0.30   0.15   0.40   0.05
Solution:
μ = E(X) = Σx x f(x) = (−3)(0.10) + (−2)(0.30) + (0)(0.15) + (1)(0.40) + (2)(0.05) = −0.4
E(X²) = Σx x² f(x) = (9)(0.10) + (4)(0.30) + (0)(0.15) + (1)(0.40) + (4)(0.05) = 2.7
V(X) = E(X²) − μ² = 2.7 − (−0.4)² = 2.54
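The same arithmetic in a few lines of Python (an added check; the values are those of the table above):

```python
xs = [-3, -2, 0, 1, 2]
ps = [0.10, 0.30, 0.15, 0.40, 0.05]

mean = sum(x * p for x, p in zip(xs, ps))
second_moment = sum(x * x * p for x, p in zip(xs, ps))
variance = second_moment - mean ** 2

print(round(mean, 2), round(variance, 2))   # -0.4  2.54
```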
Example: A life insurance company in Bangladesh offers to sell a TK. 25,000 one-year term
life insurance policy to a 25-year-old man for a premium of TK. 2,500. According to the Bangladesh
life table, the probability of surviving one year for a 25-year-old man is 0.97. What is the
company's expected gain in the long run?
Solution:
The gain X is a random variable that may take on the value 2500 if the man survives, or
2500 − 25000 = −22500 (TK.) if he dies. Consequently, the probability distribution of X is
X:      2500    −22500
f(x):   0.97    0.03
Hence the expected gain is E(X) = (2500)(0.97) + (−22500)(0.03) = 2425 − 675 = TK. 1750.
Example: Find the mean (expected value) and the variance of the random variable X with density function
f(x) = 2(1 − x),  0 < x < 1
     = 0,         elsewhere
Solution:
E(X) = ∫(0 to 1) x ⋅ 2(1 − x) dx = 2[∫(0 to 1) x dx − ∫(0 to 1) x² dx] = 2(1/2 − 1/3) = 1/3
E(X²) = ∫(0 to 1) x² ⋅ 2(1 − x) dx = 2(1/3 − 1/4) = 1/6
∴ V(X) = σ² = E(X²) − [E(X)]² = 1/6 − (1/3)² = 1/18
Example: A random variable X has the probability function
f(x) = e^(−m) m^x / x!,  x = 0, 1, 2, …
Find E(X).
Solution:
By definition,
E(X) = Σ(x=0 to ∞) x f(x) = Σ(x=0 to ∞) x e^(−m) m^x / x! = e^(−m) Σ(x=1 to ∞) m^x / (x − 1)!
     = e^(−m) m (1 + m + m²/2! + m³/3! + ⋯) = e^(−m) m e^m = m.
Example: A lot of 7 markers is submitted to a quality inspector; the lot contains 4 good markers
and 3 defective markers. A sample of 3 markers is taken by the inspector. Find the expected value of
the number of good markers in this sample.
Solution:
Let X represent the number of good markers in the sample. It can be shown that the probability
distribution of X is
f(x) = (4Cx)(3C(3−x)) / 7C3,  x = 0, 1, 2, 3.
Values of X     0      1       2       3
f(x)            1/35   12/35   18/35   4/35
Therefore, E(X) = (0)(1/35) + (1)(12/35) + (2)(18/35) + (3)(4/35) = 60/35 ≈ 1.7.
Thus, if a sample of 3 markers is selected at random over and over again from a lot of 4 good
markers and 3 defective markers, it would contain, on average 1.7 good markers.
Example: In a coin tossing game, a man is promised TK. 5 if he gets all heads or all tails when
three coins are tossed, and he pays (loses) TK. 3 if either one or two heads appear. How much is
he expected to gain in the long run?
Solution: The random variable here is the amount the man can win. If X is the random variable,
then X takes the value 5 when the coins show all heads or all tails, and −3 otherwise. The table
below shows the outcomes of the experiment, the values of X and the associated probabilities:
Outcome   HHH   HHT   HTH   HTT   THH   THT   TTH   TTT
X         5     −3    −3    −3    −3    −3    −3    5
It appears from the table that the variable X assumes the values −3 and 5 with probabilities
6/8 and 2/8, respectively. Hence the expected value of X is
E(X) = Σ x f(x) = −3(6/8) + 5(2/8) = −1.
Let us now examine what happens if the man receives TK. 5 for all heads or all tails, Tk. “0”
for 2 heads and pays off Tk. 3 for 1 head. The random variable X will now assume the values,
5, 0 and -3 with associated probabilities 2⁄8, 3⁄8 and 3⁄8 respectively. The expected value
in this case will be
E(X) = Σ x f(x) = 5(2/8) + 0(3/8) + (−3)(3/8) = 1/8 = 0.125.
This shows that the man is now a marginal gainer, winning on average only 12.5 paisa per game.
EXPECTED VALUE OF A FUNCTION OF TWO RANDOM VARIABLES
The notion of mathematical expectation can be extended to two or more random variables. We
deal here with the case of two variables, which can be analogously extended to 3 or more
variables. Let X and Y be two random variables with joint probability distribution f(x, y). The
expected value of a function w(X, Y) is defined as
E[w(X, Y)] = Σx Σy w(x, y) f(x, y), if X and Y are discrete
           = ∫(−∞ to ∞) ∫(−∞ to ∞) w(x, y) f(x, y) dx dy, if X and Y are continuous.
Further, if w(X, Y) is a function of the random variables X and Y, and c is a constant, then
E[c w(X, Y)] = c E[w(X, Y)].
Also, if X and Y are two random variables and w1(X, Y), w2(X, Y) are functions of X and Y, then
E[w1(X, Y) + w2(X, Y)] = E[w1(X, Y)] + E[w2(X, Y)].
Theorem 1: The expected value of the sum of two random variables X and Y is the sum of
their expected values: E(X + Y) = E(X) + E(Y).
Corollary: If a and b are two constants, then E(aX + bY) = aE(X) + bE(Y).
Theorem 2: If X and Y are independent random variables, the expected value of their product
is equal to the product of their expected values:
E(XY) = E(X)E(Y)
Or, in other words, the expected value of the product of two independent random variables is
equal to the product of their expectations.
Example: Let X and Y have the joint density
f(x, y) = 2(x + y − 2xy),  0 < x < 1, 0 < y < 1
        = 0,               elsewhere
Verify that E(X + Y) = E(X) + E(Y), and examine whether E(XY) = E(X)E(Y).
Solution: The marginal density of X is
g(x) = 2 ∫(0 to 1) (x + y − 2xy) dy = 2(x + 1/2 − x) = 1,  0 < x < 1.
Hence,
E(X) = ∫(0 to 1) x g(x) dx = ∫(0 to 1) x dx = 1/2
Similarly,
h(y) = 2 ∫(0 to 1) (x + y − 2xy) dx = 2(1/2 + y − y) = 1,  0 < y < 1,
and
E(Y) = ∫(0 to 1) y h(y) dy = ∫(0 to 1) y dy = 1/2
Now,
E(X + Y) = ∫(0 to 1) ∫(0 to 1) (x + y) ⋅ 2(x + y − 2xy) dy dx
         = 2 ∫(0 to 1) (x² + 1/3 + x − x² − 2x/3) dx
         = 2 ∫(0 to 1) (1/3 + x/3) dx = 1
Since E(X) + E(Y) = 1/2 + 1/2 = 1, we establish that E(X + Y) = E(X) + E(Y).
Again,
E(XY) = ∫(0 to 1) ∫(0 to 1) xy ⋅ 2(x + y − 2xy) dx dy
      = 2 ∫(0 to 1) ∫(0 to 1) (x²y + xy² − 2x²y²) dx dy
      = 2(1/6 + 1/6 − 2/9) = 2/9
Now,
E(X)E(Y) = 1/2 × 1/2 = 1/4, while E(XY) = 2/9.
Since E(XY) is not equal to E(X) × E(Y), the variables are not independent.
Example: Given the following joint density function of X and Y:
f(x, y) = 4xy,  0 < x < 1, 0 < y < 1
        = 0,    elsewhere
obtain E(X) and E(Y).
Solution:
E(X) = ∫(0 to 1) ∫(0 to 1) x f(x, y) dx dy = 4 ∫(0 to 1) ∫(0 to 1) x²y dx dy = 2 ∫(0 to 1) x² dx = 2/3
E(Y) = ∫(0 to 1) ∫(0 to 1) y f(x, y) dx dy = 4 ∫(0 to 1) ∫(0 to 1) xy² dx dy = (4/3) ∫(0 to 1) x dx = 2/3
Binomial Distribution
When an experiment has two possible outcomes, success and failure and the experiment is
repeated 𝑛 times independently and the probability 𝑃 of success of any given trial remains
constant from trial to trial, the experiment is known as binomial experiment.
Example: The probability that a patient recovers from a disease is 0.9. What is the probability
that exactly 5 out of the next 7 patients will survive?
Solution:
We assume that the patients recover independently of one another, with p = 0.9 for each of the
seven patients (each patient is a Bernoulli trial: 1 = survives, 0 = does not survive), so that X,
the number of patients who survive, follows a binomial distribution with n = 7 and p = 0.9.
P(X = 5) = (7 5)(0.9)^5 (0.1)^2 = 21 × 0.59049 × 0.01 ≈ 0.124.
The expected number of survivors is E(X) = np = 7 × 0.9 = 6.3.
So, 6.3 patients out of 7 patients will survive on an average. That is, 63 patients out of 70
patients will survive on an average.
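The binomial probabilities used in this and the next example can be evaluated with a tiny helper; the sketch below is an added illustration based on math.comb.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(round(binom_pmf(5, 7, 0.9), 4))                            # ~0.124
print(round(sum(binom_pmf(x, 5, 0.25) for x in (3, 4, 5)), 4))   # P(X >= 3) ~0.1035
```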
Example: A traffic control officer reports that 75% of the trucks passing through a check post
are from within Dhaka city. What is the probability that at least 3 of the next 5 trucks are from
out of the city?
Solution: Let X be the number of trucks that pass through are from out of Dhaka city. The
probability of such an event is: p=1-0.75=0.25
X~ Bin (n=5, p=0.25)
P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5)
         = Σ(x=3 to 5) (5 x) p^x (1 − p)^(5−x)
         = (5 3)(0.25)³(0.75)² + (5 4)(0.25)⁴(0.75)¹ + (5 5)(0.25)⁵(0.75)⁰
         = 0.1035
Or, P(X ≥ 3) = 1 − P(X = 0) − P(X = 1) − P(X = 2)
            = 1 − (5 0)(0.25)⁰(0.75)⁵ − (5 1)(0.25)¹(0.75)⁴ − (5 2)(0.25)²(0.75)³
*Probability that more than 3 trucks are from out of the city:
P(X > 3) = P(X = 4) + P(X = 5)
         = (5 4)(0.25)⁴(0.75)¹ + (5 5)(0.25)⁵(0.75)⁰
         = 0.0156
Example: If X is a binomial random variable with E(X) = 5 and V(X) = 2, identify the distribution of X.
Solution: Since E(X) = np = 5 and V(X) = np(1 − p) = 2,
np(1 − p)/np = 2/5
or, 1 − p = 2/5
or, p = 1 − 2/5 = 3/5
Now, np = 5
or, n = 5/p = (5 × 5)/3 = 25/3
So X ~ Bin(n = 25/3, p = 3/5).
(Note that n is not an integer here, so no binomial distribution has exactly this mean and variance; the example only illustrates the algebra.)
Poisson Distribution
Let µ be the mean of successes in a specified time or space and the random variable X is the
number of successes in a given time interval or specified region. Then X follows Poisson
distribution as
𝑒 −µ µ𝑥
f(x) = , x=0,1, ...∞
𝑥!
where e= 2.718.
Example: The average number of calls received by a telephone operator during the 10-minute
interval from 5:00 PM to 5:10 PM daily is 3. What is the probability that the operator will
receive,
i. no call
ii. exactly one call
iii. at least two calls tomorrow during the same time interval.
Solution: Let X be the random variable representing the number of calls made during the
interval.
X ~ Poisson (3),
f(X = x) = e^(−3) 3^x / x!,  x ≥ 0
i. P(X = 0) = e^(−3) 3⁰ / 0! = 0.0498
ii. P(X = 1) = e^(−3) ⋅ 3 / 1! = 0.1494
𝑖𝑖𝑖. 𝑃(𝑋 ≥ 2) = 1 − 𝑃(𝑋 = 0) − 𝑃(𝑥 = 1)
= 1 − 0.0498 − 0.1494
= 0.8008
Mean:
𝐸(𝑋) = 𝜇
Variance:
𝑉(𝑋) = 𝜇
Example: Find the mean and standard deviation of a Poisson variate X for which P(X = 1) = P(X = 2).
Solution: Let X ~ Poisson(µ). Then
P(X = 1) = e^(−µ) µ¹ / 1!  and  P(X = 2) = e^(−µ) µ² / 2!
Setting these equal,
e^(−µ) µ / 1! = e^(−µ) µ² / 2!
⇒ µ = 2
Hence the mean is E(X) = µ = 2 and the standard deviation is σ = √µ = √2 ≈ 1.41.
Example: In a certain industrial facility, accidents occur infrequently. It is known that the
probability of an accident on any given day is 0.005 and accidents are independent of each
other.
a. What is the probability that in any given period of 400 days (about 1 year) there will be an
accident on exactly one day?
b. What is the probability that there are at most three days with an accident?
Solution: Let X be a binomial random variable with n = 400 and p = 0.005. Thus, np = 2.
Using the Poisson approximation,
(a) P(X = 1) = e^(−2) 2¹ / 1! = 0.271
and
(b) P(X ≤ 3) = Σ(x=0 to 3) e^(−2) 2^x / x! = 0.857.
Example: In a manufacturing process where glass products are made, defects or bubbles occur,
occasionally rendering the piece undesirable for marketing. It is known that, on average, 1 in
every 1000 of these items produced has one or more bubbles. What is the probability that a
random sample of 8000 will yield fewer than 7 items possessing bubbles?
Solution: This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is
very close to 0 and n is quite large, we approximate with the Poisson distribution using
µ = (8000)(0.001) = 8.
Hence, if X represents the number of items with bubbles, we have
P(X < 7) = Σ(x=0 to 6) b(x; 8000, 0.001) ≈ Σ(x=0 to 6) p(x; 8) = 0.3134.
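These Poisson probabilities are easy to reproduce; the sketch below is an added illustration using only math.exp and math.factorial.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**x / factorial(x)

print(round(poisson_pmf(1, 2), 3))                          # 0.271
print(round(sum(poisson_pmf(x, 2) for x in range(4)), 3))   # P(X <= 3) ~ 0.857
print(round(sum(poisson_pmf(x, 8) for x in range(7)), 4))   # P(X < 7)  ~ 0.3134
```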
Normal Distribution
A random variable X is said to have a normal distribution with mean µ and variance σ²
(−∞ < µ < ∞ and σ² > 0) if X has a continuous distribution with probability density function
f(x) = (1/(σ√(2π))) e^(−(1/2)((x − µ)/σ)²),  −∞ < x < ∞
P(µ − σ < X < µ + σ) = 68.27%,  P(µ − 2σ < X < µ + 2σ) = 95.45%,  P(µ − 3σ < X < µ + 3σ) = 99.73%
If a random variable X has a normal distribution with mean µ and variance σ² (i.e. X ~ N(µ, σ²)),
then the variable Z = (X − µ)/σ is called a standard normal variable (or Z-score) and its
distribution is referred to as the standard normal distribution, with density function
f(z) = (1/√(2π)) e^(−z²/2),  −∞ < z < ∞
E(Z) = E((X − µ)/σ) = (1/σ) E(X − µ) = (1/σ)[E(X) − µ] = (1/σ)(µ − µ) = 0
V(Z) = V((X − µ)/σ) = (1/σ²) V(X − µ) = (1/σ²) V(X) = σ²/σ² = 1
Thus, Z ~ N (0,1)
The cumulative distribution function (cdf) of the standard normal variable Z is usually denoted
by φ(Z). Thus
φ(z) = (1/√(2π)) ∫(−∞ to z) e^(−t²/2) dt
Example: The GPA score of 80 students of the Department of Physics of University of Dhaka
in their 1st year final exam was found to follow approximately a normal distribution with mean
2.1 and standard deviation 0.6. How many of these students are expected to have a score
between 2.5 and 3.5?
𝑋 ~ 𝑁(2.1, 0.62 )
Now, P(2.5 < X < 3.5) = P((2.5 − 2.1)/0.6 < Z < (3.5 − 2.1)/0.6) = P(0.67 < Z < 2.33)
= P(Z < 2.33) − P(Z < 0.67) = 0.9901 − 0.7486 = 0.2415.
Hence the expected number of students with a GPA between 2.5 and 3.5 is 80 × 0.2415 ≈ 19.
Example: If X is a normal variate with mean 25 and variance 9, find K such that
i. 30% of the area under the normal curve lies to the left of K.
ii. 15% of the area under the normal curve lies to the right of K.
Solution:
X ~ N (25, 9)
i.e., µ= 25, σ= 3
i. P(X<K)=0.30
=> P(Z<(𝐾 − 25)/3) = 0.30
The standard normal Table shows that
P (Z< -0.525) = 0.30
Hence,
(𝐾 − 25)/3= -0.525
=> K = 23.425
ii. P(X > K) = 0.15
or, P(Z > (K − 25)/3) = 0.15
or, P(Z < (K − 25)/3) = 1 − 0.15 = 0.85
From the standard normal table, P(Z < 1.04) ≈ 0.85, so
(K − 25)/3 = 1.04
⟹ K = 25 + 3(1.04) = 28.12
Example: Given a standard normal distribution, find the area under the curve that lies
(a) to the right of z = 1.84, and
(b) between z = −1.97 and z = 0.86.
Solution:
(a) The area in Figure 6.9(a) to the right of z = 1.84 is equal to 1 minus the area in the Z-table to
the left of z = 1.84, namely, 1 − 0.9671 = 0.0329.
(b) The area in Figure 6.9(b) between z = −1.97 and z = 0.86 is equal to the area to the left of
z = 0.86 minus the area to the left of z = −1.97. From Z-table we find the desired area to be
0.8051 − 0.0244 = 0.7807.
Example: Given a random variable X having a normal distribution with μ = 50 and σ = 10,
find the probability that X assumes a value between 45 and 62.
Solution: The z values corresponding to x1 = 45 and x2 = 62 are
z1 = (45 − 50)/10 = −0.5 and z2 = (62 − 50)/10 = 1.2.
Therefore,
𝑃(45 < 𝑋 < 62) = 𝑃(−0.5 < 𝑍 < 1.2).
𝑃(−0.5 < 𝑍 < 1.2) is shown by the area of the shaded region in the Figure.
This area may be found by subtracting the area to the left of the ordinate 𝑧 = −0.5 from the
entire area to the left of z = 1.2. Using Z-table, we have
𝑃 (45 < 𝑋 < 62) = 𝑃(−0.5 < 𝑍 < 1.2) = 𝑃 (𝑍 < 1.2) − 𝑃(𝑍 < −0.5)
= 0.8849 − 0.3085 = 0.5764.
Example: A random variable X is normally distributed with µ = 3.0 and σ = 0.005. Find P(2.99 < X < 3.01).
Solution: The corresponding z values are
Z1 = (2.99 − 3.0)/0.005 = −2.0 and Z2 = (3.01 − 3.0)/0.005 = +2.0
Hence,
P(2.99 < X < 3.01) = P(−2.0 < Z < 2.0).
From the Z-table, P(Z < −2.0) = 0.0228 and P(Z < 2.0) = 0.9772, so
P(2.99 < X < 3.01) = 0.9772 − 0.0228 = 0.9544.
Continuity Correction
Let, 𝑋~𝐵𝑖𝑛 (n = 10, P = 0.5). Find P (2 ≤ X ≤ 4) using both binomial distribution and normal
distribution:
Solution: Using binomial distribution:
P(2 ≤ X ≤ 4) = Σ(x=2 to 4) (10 x) (0.5)^x (1 − 0.5)^(10−x) = 0.3662
Now, with the continuity correction, P(2 ≤ X ≤ 4) ≈ P(2 − 0.5 ≤ X′ ≤ 4 + 0.5) = P(1.5 ≤ X′ ≤ 4.5).
Using the normal approximation with µ = np = 5 and σ² = np(1 − p) = 2.5,
P(1.5 ≤ X′ ≤ 4.5) = P((1.5 − 5)/√2.5 ≤ Z ≤ (4.5 − 5)/√2.5)
                  = P(−2.21 ≤ Z ≤ −0.316)
                  = P(Z ≤ −0.316) − P(Z ≤ −2.21)
                  = 0.3764 − 0.0136
                  = 0.362,
which is close to the exact binomial value 0.3662.
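Both the exact binomial value and the normal approximation can be checked in a few lines; the sketch below is an added illustration (math.erf gives the standard normal CDF).

```python
from math import comb, erf, sqrt

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 10, 0.5
exact = sum(binom_pmf(x, n, p) for x in range(2, 5))      # P(2 <= X <= 4)

mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = normal_cdf((4.5 - mu) / sigma) - normal_cdf((1.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))   # ~0.3662  ~0.3624
```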