Probability

Random Experiment

A random experiment is an experiment in which,


a) All possible outcomes of the experiment are known in advance.
b) Any possible performance of the experiment results in an outcome that is not exactly
known in advance.
c) The experiment can be repeated under identical conditions.
Example:
• A fair die with six faces marked 1, 2, …, 6 is tossed once. This is an experiment with six possible outcomes, {1, 2, …, 6}, but before the toss we cannot say whether, for instance, a 2 or a 6 will land uppermost. This makes the experiment a random experiment.
• An experiment consists of counting the number of bacteria in a portion of food. We might observe any of a countably infinite set of outcomes: 0 bacteria, 1 bacterium, 2 bacteria, and so on. This experiment also represents a random experiment.

Sample Space
Definition: A sample space of an experiment is a set or collection of all possible outcomes of
the same experiment such that any outcome of the experiment corresponds to exactly one
element in the set. A sample space is usually denoted by the symbol S.
• Discrete Sample Space
Definition: If a sample space contains a finite number of possibilities or an unending sequence
with as many elements as there are whole numbers, it is called a discrete sample space.
Example 1: If the experiment consists of rolling a die, then the sample space can be presented
as follows:
S = {x | x = 1, 2, 3, …, 6}
where x represents the number appearing on the uppermost face of the die. A more fundamental
sample space for the above experiment is as follows:
S = {1, 2, 3, 4, 5, 6}
Example 2: A businessman may make a profit, incur a loss, or break even while running his business. With these possible outcomes, the sample space is:
S = {Profit, Loss, Break − even }
Example 3: If the experiment involves rolling a pair of dice, then the resulting sample space
is of the following form:

S = {(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
     (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
     (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
     (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
     (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
     (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)}
where the outcome (3,6), for example, is said to occur if 3 occurs on the first die and 6 occurs
on the second die.
• Continuous Sample Space
Definition: If a sample space contains an infinite number of possibilities equal to the number
of points on a line segment, it is called a continuous sample space.
Example: In measuring the longevity of a bulb, the sample space could be S = {x | x ≥ 0}, where x is the time taken by the bulb to burn out.

Event
When an experiment is performed, it can result in one or more experimental outcomes, which
are called events. Any subset of sample space S is known as an event.
Example-1: If an experiment consists of tossing two coins and noting whether they land Heads
(H) or Tails (T) then the set S is
S= {HH, HT, TH, TT}
If A = {HH, HT}, then A is the event that the first coin lands on heads.
Example-2: A businessman may make a profit, incur a loss, or break even while running his business. With these possible outcomes, the sample space is
S= {Profit, Loss, Break-even}
If A= {Loss}, then A is an event that the businessman will incur a loss while running his
business.
• Equally Likely Events
Two or more events are said to be equally likely if they have the same chance of occurrence.
Example-1: In the experiment of tossing a coin, if A is the event of getting a head and B is the event of getting a tail, then A and B are equally likely events, since both have the same chance of occurrence.
Example-2: In the experiment of rolling a die, if A is the event of getting a 1 and B is the event of getting a 2, then A and B are equally likely events, since both have the same chance of occurrence.

• Mutually Exclusive Events
Two or more events are said to be mutually exclusive if the happening of any one of them excludes the happening of all the others; that is, no two of the events can happen simultaneously in the same trial. Their joint occurrence is impossible: the events are disjoint, A ∩ B = ∅.
Example-1: In a coin-tossing experiment, either a head or a tail will land in each trial. So obtaining a head and obtaining a tail are mutually exclusive events.
• Exhaustive Events
A set of events is said to be exhaustive if together the events account for all possible outcomes of the trial.
Example: When pesticide is applied, a pest may survive or die. There are two exhaustive
events: survival, death.

Figure: Tree diagram of the sample space for a three-coin toss. The eight equally likely outcomes are HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.

Multiplication Rule of Probability


The multiplication rule of probability describes the relationship between two events. For two events A and B associated with a sample space S, the set A ∩ B denotes the outcomes in which both A and B occur. Hence A ∩ B denotes the simultaneous occurrence of A and B, and is often written simply as AB. The probability of AB is obtained by using the properties of conditional probability. We know that the conditional probability of event A given that B has occurred is denoted by P(A|B) and is given by:

P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.

We aim to prove that if events A and B are independent, then the probability of both events A
and B occurring simultaneously is the product of their individual probabilities, i.e.,
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
We'll start by using the definition of conditional probability:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴|𝐵) × 𝑃(𝐵)
This equation represents the probability of both events A and B occurring simultaneously.
𝑃(𝐴|𝐵) denotes the conditional probability of event A occurring given that event B has already
occurred, and 𝑃(𝐵) is the probability of event B occurring.
Now, since events A and B are independent, the occurrence of one event does not affect the
occurrence of the other. Mathematically, this independence is expressed as: 𝑃(𝐴|𝐵) = 𝑃(𝐴)
This implies that the probability of event A occurring given that event B has occurred is the
same as the probability of event A occurring alone, which is P(A).
Substituting P(A|B) = P(A) into the equation for 𝑃(𝐴 ∩ 𝐵), we get:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
This concludes the proof of the multiplication rule of probability when events A and B are
independent.
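The rule just proved is easy to check empirically. Below is a minimal Python simulation (the choice of events, dice, and trial count are illustrative assumptions of this sketch, not part of the notes) estimating P(A), P(B) and P(A ∩ B) for two events driven by independent dice:

```python
import random

# Monte Carlo sketch of the multiplication rule for independent events.
# A: first die shows an even number, B: second die shows a six.
# The dice are rolled independently, so P(A ∩ B) should be close to
# P(A) * P(B) = (1/2) * (1/6) = 1/12 ≈ 0.0833 for a large number of trials.
trials = 100_000
count_a = count_b = count_ab = 0
for _ in range(trials):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    a = d1 % 2 == 0
    b = d2 == 6
    count_a += a
    count_b += b
    count_ab += a and b

print("P(A)     ~", count_a / trials)
print("P(B)     ~", count_b / trials)
print("P(A n B) ~", count_ab / trials)
print("P(A)P(B) ~", (count_a / trials) * (count_b / trials))
```

The last two printed values should agree to within sampling noise, illustrating the identity P(A ∩ B) = P(A)P(B) for independent events.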

Permutation
A permutation is an arrangement of objects in a definite order.
If n is the total number of objects and r is the number of objects to be arranged, the number of permutations of n objects taken r at a time is given by

nPr = n! / (n − r)!
Example: If 𝑎, 𝑏, 𝑐 and 𝑑 are four letters and if we want to arrange these by taking two at a
time, the possible arrangements are: 𝑎𝑏, 𝑎𝑐, 𝑎𝑑, 𝑏𝑎, 𝑐𝑎, 𝑑𝑎, 𝑏𝑐, 𝑐𝑏, 𝑏𝑑, 𝑑𝑏, 𝑐𝑑 and 𝑑𝑐.
Here, 𝑛 = 4, 𝑟 = 2.
The number of permutations of 𝑛 distinct objects taken 2 at a time is,

4P2 = 4! / (4 − 2)! = (4 × 3 × 2 × 1) / (2 × 1) = 12

When one or more objects are repeated, the number of permutations needs adjustment. The
number of distinct permutations of 𝑛 things of which 𝑛1 are of one kind, 𝑛2 are of a second
kind… and 𝑛𝑘 of a 𝑘th kind is,
n! / (n1! n2! … nk!)
Example: There are 9 birthday candles, of which four are yellow, three are red and two are blue. The number of ways these candles can be arranged in 9 positions is 9!/(4! 3! 2!) = 1260.
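Both formulas are straightforward to evaluate with integer factorials. A minimal Python sketch (the helper names are our own, chosen for illustration):

```python
from math import factorial

def permutations(n, r):
    """Number of ordered arrangements of r objects chosen from n: nPr = n!/(n-r)!"""
    return factorial(n) // factorial(n - r)

def arrangements_with_repeats(n, *group_sizes):
    """Arrangements of n objects where each group of identical objects has the given size."""
    result = factorial(n)
    for size in group_sizes:
        result //= factorial(size)
    return result

print(permutations(4, 2))                     # 12, the a, b, c, d example
print(arrangements_with_repeats(9, 4, 3, 2))  # 1260, the birthday-candle example
```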

Combination
Very often we are interested in the number of ways of selecting r objects from n without regard to the order of arrangement. These selections are called combinations. Here the arrangements ab and ba are regarded as the same. Thus, with the 3 letters a, b, c taken two at a time, the permutations are ab, ba, ac, ca, bc and cb; but if the order of arrangement is disregarded (i.e. ba = ab, ca = ac, cb = bc), the number of combinations is 3. A combination is a partition with two cells: one cell contains the r objects selected and the other contains the (n − r) objects that are left.

Let C(n, r) denote the number of combinations of n objects taken r at a time, irrespective of order. The symbol C(n, r) is sometimes called a binomial coefficient. We noted earlier that each set of r things has r! permutations. Since a combination of r things is a set with r elements, C(n, r) × r! must equal the number of permutations of n things taken r at a time.

Thus, C(n, r) × r! = n! / (n − r)!, and therefore C(n, r) = n! / (r! (n − r)!).

It is helpful to note that C(n, r) = C(n, n − r). In particular, C(10, 7) = C(10, 3), C(50, 48) = C(50, 2), etc.

Example: There are 20 people in a room of whom 12 are men and 8 are women. A committee
of 3 is to be formed from them. In how many ways can this be done? If it is desired that the committee consist of 2 men and 1 woman, in how many ways can this be done?

Solution: It is evident that the order of the selected people is of no importance. Therefore, this
is a problem of combination with 𝑛 = 20 𝑎𝑛𝑑 𝑟 = 3.

Thus C(20, 3) = 20! / (3! (20 − 3)!) = (20 × 19 × 18 × 17!) / (3! 17!) = 1140.

The number of ways in which 2 men and 1 woman will be in the committee is C(12, 2) × C(8, 1) = 66 × 8 = 528.
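Python's standard library exposes these counts directly through math.comb, so the committee example can be checked in a few lines (a minimal sketch):

```python
from math import comb

# Committees from 20 people (12 men, 8 women); order of selection is irrelevant.
print(comb(20, 3))                 # 1140 possible committees of 3
print(comb(12, 2) * comb(8, 1))    # 528 committees with exactly 2 men and 1 woman
print(comb(10, 7) == comb(10, 3))  # True: C(n, r) = C(n, n - r)
```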

Definition of Probability
In assigning probabilities to experimental outcomes, there are various acceptable approaches.
There are usually three different approaches to define probability.
1. Classical approach
2. The relative frequency approach
3. Subjective approach
Here a brief description of these approaches is given below:
Classical Approach
Definition: If a random experiment can result in 𝑛(𝑆) mutually exclusive, exhaustive and
equally likely outcomes and if 𝑛(𝐴) of these outcomes are favorable to an event A, then the
probability of A is the ratio of 𝑛(𝐴) to 𝑛(𝑆). Symbolically
P(A) = n(A) / n(S)
The definition under classical approach is also known as a priori or mathematical definition.
Example: An ordinary die is rolled once. Find the probability that,
i) an even number occurs?
ii) a number greater than 4 occurs?
Solution: Let S = {1,2, … . ,6}.If A denotes an even number and B a number greater than 4, then
A = {2,4,6} and B = {5,6} then,
i. P(A) = n(A)/n(S) = 3/6 = 1/2
ii. P(B) = n(B)/n(S) = 2/6 = 1/3

Example: A bag contains 4 white and 6 red balls. A ball is drawn at random from the bag.
What is the probability it is red? That it is white? Are the events obtaining a red ball and
obtaining a white ball equally likely?
Solution: Here n(S) = 10, n(W) = 4, and n(R) = 6. Hence,
i. P(R) = n(R)/n(S) = 6/10 = 0.6
ii. P(W) = n(W)/n(S) = 4/10 = 0.4

Since P(R) ≠ P(W), the events R and W are not equally likely.

Example: A newly married couple plans to have two children and suppose that each child is
equally likely to be a boy or a girl. In order to find a sample space for this experiment, let B
denote that a child is a boy and G denote that a child is a girl. Then one possible sample space
that can be formed is S= {BB, BG, GB, GG}
The pair BG, for instance, represents the outcome "the older child is a boy and the younger one is a girl".
a. What is the probability that the couple will have two boys?
b. What is the probability that the couple will have one boy and one girl?
c. What is the probability that the couple will have at most one boy?
Solution: Let A1, A2, and A3 be the events that the couple will have two boys, one boy and one girl, and at most one boy, respectively, so that
A1 = {BB}, A2 = {BG, GB}, A3 = {BG, GB, GG}
Since, by assumption, all the points in S are equally likely,
P(BB) = P(BG) = P(GB) = P(GG) = 1/4. Hence,
a. P(A1) = n(A1)/n(S) = P(BB) = 1/4
b. P(A2) = n(A2)/n(S) = P(BG) + P(GB) = 1/4 + 1/4 = 1/2
c. P(A3) = n(A3)/n(S) = P(BG) + P(GB) + P(GG) = 1/4 + 1/4 + 1/4 = 3/4

Example: A fair coin is tossed three times. Compute the probability that,
a. Exactly two tosses result in heads?
b. At most one toss results in a head?
Solution: The experiment consists of observing the outcome for each of the three tosses of
the coin. One of the ways of presenting the sample space is as follows:
S= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Because the coin is fair, we would expect the outcomes to be equally likely. That is, if Ai represents the i-th outcome, then
P(Ai) = 1/8
For (a), let the event of interest be A, so that
A = {HHT, HTH, THH}
P(A) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
For (b), let the event of interest be B, so that
B = {TTT, THT, TTH, HTT}
and hence
P(B) = P(TTT) + P(THT) + P(TTH) + P(HTT) = 1/8 + 1/8 + 1/8 + 1/8 = 1/2

For the same coin-tossing experiment, find the probability of obtaining (a) exactly two runs and (b) less than two runs.
Solution: Any unbroken sequence of like letters is called a run, even though the sequence has
only one letter. Thus, the outcome HHH has one run while the outcome HTT has two runs.
We now enumerate the number of runs in the above coin tossing experiment in a tabular form
as below:
Outcome Event Number of runs
HHH A1 1
HHT A2 2
HTH A3 3
HTT A4 2
THH A5 2
THT A6 3
TTH A7 2
TTT A8 1

Let X denote the number of runs. For (a), there are four outcomes favorable to the event X = 2, and hence the required probability is
P(X = 2) = P(A2) + P(A4) + P(A5) + P(A7) = 4/8 = 1/2.

Similarly, for (b) the event of interest is X < 2 and the number of cases favourable to this event is 2, so that the required probability is
P(X < 2) = P(A1) + P(A8) = 2/8 = 1/4
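The run-counting argument can be reproduced mechanically. A minimal Python sketch (using standard-library itertools; a run is a maximal block of identical letters, which is exactly what groupby yields):

```python
from itertools import groupby, product
from collections import Counter

# Enumerate all 8 equally likely outcomes of three coin tosses and count runs:
# groupby gives one group per maximal block of identical letters.
counts = Counter(
    sum(1 for _ in groupby(outcome))
    for outcome in product("HT", repeat=3)
)
for runs, n in sorted(counts.items()):
    print(f"P(X = {runs}) = {n}/8")
# P(X = 1) = 2/8, P(X = 2) = 4/8, P(X = 3) = 2/8
```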
A businessman has a stock of 8400 items of baby wear imported from 5 different countries. The distribution of the items was as follows:
Country Number of items
USA 1500
India 1200
China 2700
Korea 1000
Thailand 2000
Total 8400

A piece of baby wear was selected at random. What is the probability that it was imported (i) from the USA, (ii) from China, and (iii) either from India or from Thailand?
Solution: Using the classical definition of probability, we find that
P(USA) = 1500/8400 ≈ 0.18, P(China) = 2700/8400 ≈ 0.32,
P(India or Thailand) = 1200/8400 + 2000/8400 ≈ 0.14 + 0.24 = 0.38.

Axioms of Probability
Axiom 1: 𝟎 ≤ 𝑷(𝑨) ≤ 𝟏
The probability of any event must lie between zero and one inclusive.
For example:
For one die, the total sample space is S = {1, 2, 3, 4, 5, 6}.
Let A be an event, A = {1, 2, 3}.
Now the probability of A is
P(A) = n(A)/n(S)   [n(A) = number of elements in the event; n(S) = total number of elements]
= 3/6 = 1/2 > 0

Axiom 2: 𝑷(𝑺) = 𝟏
The probability of the whole sample space is equal to one.
For example:
Sample space, S = {1, 2, 3, 4, 5, 6}
Now,
P(S) = n(S)/n(S) = 1

Axiom 3: P(A ∪ B) = P(A) + P(B) for mutually exclusive events A and B

If events are disjoint (i.e., there is no overlap between them), then the probability of their union must be the sum of their probabilities.
For example, let A and B be mutually exclusive events on a die roll:
A is the even-number event, A = {2, 4, 6},
and B is the odd-number event, B = {1, 3, 5}.
Sample space, S = {1, 2, 3, 4, 5, 6}
Now,
P(A) = n(A)/n(S) = 3/6 = 1/2
P(B) = n(B)/n(S) = 3/6 = 1/2
P(A ∪ B) = n(A ∪ B)/n(S) = 6/6 = 1
Hence,
P(A ∪ B) = P(A) + P(B) = 1/2 + 1/2 = 1

Joint Probability: A Comprehensive Analysis with Mathematical Examples


Probability theory serves as a cornerstone in various disciplines, providing a framework to
quantify uncertainty and assess the likelihood of events. Within this realm, joint probability
plays a pivotal role by addressing the simultaneous occurrence of multiple events. In this
comprehensive discussion, we delve into the intricacies of joint probability, elucidating its
concepts through mathematical examples across diverse scenarios.
Joint probability pertains to the probability of two or more events occurring simultaneously.
Formally, it is denoted as 𝑃 (𝐴 ∩ 𝐵), where A and B represent distinct events. The joint
probability captures the intersection of these events within the sample space, enabling a
nuanced assessment of their combined occurrence.

Example 1: Coin Toss


Consider a simple yet illustrative example involving the tossing of a fair coin twice. Let A denote the event of obtaining heads on the first toss, and B the event of obtaining tails on the second toss. In this scenario, the sample space consists of four possible outcomes: {HH, HT, TH, TT}, where H signifies heads and T signifies tails. To compute the joint probability of A and B, we ascertain the probability of the specific outcome HT, yielding P(A ∩ B) = P(HT) = 1/4.

Example 2: Deck of Cards


Expanding our exploration, let's examine a scenario involving a standard deck of 52 playing
cards. Suppose two cards are drawn successively without replacement. Let A signify the event
of drawing a red card on the first draw, and B represent the event of drawing a face card (Jack,
Queen, or King) on the second draw. The sample space comprises all possible pairs of cards
drawn from the deck. To calculate 𝑃 (𝐴 ∩ 𝐵), we delineate the probability of drawing a red
card initially and a face card subsequently. By leveraging conditional probability, we deduce
the joint probability as the product of individual probabilities, encapsulating the intricate
relationship between the events.
Here, the sample space S consists of all possible pairs of cards that can be drawn from the deck.

To calculate 𝑃(𝐴 ∩ 𝐵), we need to find the probability of drawing a red card first and a face
card second.
There are 26 red cards in a deck of 52, and 12 face cards (4 Jacks, 4 Queens, and 4 Kings), of which 6 are red.
So, P(A) = 26/52 = 1/2 (probability of drawing a red card first).
After a red card is drawn, 51 cards remain, but how many of them are face cards depends on whether the first card was itself a red face card. Splitting into the two cases,
P(B|A) = (6/26)(11/51) + (20/26)(12/51) = 306/1326 = 3/13.
Hence, P(A ∩ B) = P(A) × P(B|A) = (1/2) × (3/13) = 3/26.

Example 3: Two Dice


Further enriching our understanding, let's explore the scenario of rolling two fair six-sided dice.
Here, A signifies the event of obtaining a sum of 7, while B represents the event of the first die
displaying a 4. The sample space encompasses 36 equiprobable outcomes, corresponding to all
possible combinations of dice rolls. Through meticulous enumeration of favorable outcomes
satisfying both events A and B, we compute the joint probability, unraveling the probabilistic
intricacies inherent in dice rolling experiments.
To find P(A ∩ B), we need the outcomes where the sum is 7 and the first die shows a 4. The outcomes satisfying A are {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, but of these only (4, 3) also satisfies B.
So, P(A ∩ B) = 1/36
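For a sample space this small, direct enumeration is a reliable check. A minimal Python sketch (the event definitions mirror A and B above):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = set(product(range(1, 7), repeat=2))
A = {o for o in outcomes if sum(o) == 7}   # sum of the two dice is 7
B = {o for o in outcomes if o[0] == 4}     # first die shows 4

def p(event):
    return Fraction(len(event), len(outcomes))

print(p(A), p(B), p(A & B))  # 1/6, 1/6, 1/36
```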

Properties of Joint Probability
Understanding the properties of joint probability elucidates its utility in probabilistic analysis.
Key properties include:
• Commutativity: The order of events does not influence the joint probability.
Mathematically, 𝑃 (𝐴 ∩ 𝐵) = 𝑃(𝐵 ∩ 𝐴).
• Independence: If events A and B are independent, their joint probability simplifies to
the product of their individual probabilities, i.e., 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) ∗ 𝑃(𝐵).
• Conditional Probability: Joint probability facilitates the computation of conditional
probability, enabling the assessment of the likelihood of one event given the occurrence
of another.

Related Questions:
Problem 01: In an office of 100 employees, 75 read English, 50 read Bangla dailies and 40
read both. An employee is selected at random. What is the probability that the selected
employee
(a) reads English newspapers? (b) reads at least one of the papers? (c) reads none? (d) reads Bangla but not English?
Solution: Let us define the above events,
𝐸 = Reads English, 𝐵 = Reads Bangla, 𝐸̅ ∩ 𝐵̅ = Reads none,
𝐵 ∩ 𝐸̅ = Reads Bangla but not English.
The number of the cases favorable to the above events can be placed in a tabular form as
follows:
𝐸 𝐸̅ Total
𝐵 𝑛(𝐵 ∩ 𝐸) = 40 𝑛(𝐵 ∩ 𝐸̅ ) =? 𝑛(𝐵) = 50
𝐵̅ 𝑛(𝐵̅ ∩ 𝐸) =? 𝑛(𝐵̅ ∩ 𝐸̅ ) =? 𝑛(𝐵̅) = 50
Total 𝑛(𝐸) = 75 𝑛(𝐸̅ ) = 25 𝑛(𝑆) = 100

(a) The probability that selected employee reads English is,


P(E) = n(E)/n(S) = 75/100 = 0.75
(b) The probability that selected employee reads at least one is:
P(E ∪ B) = P(E) + P(B) − P(E ∩ B)
= (75/100 + 50/100 − 40/100) = 85/100 = 0.85
(c) The probability that the selected employee reads none is:
P(B̄ ∩ Ē) = P((E ∪ B)ᶜ) = 1 − P(E ∪ B) = 1 − 0.85 = 0.15
(d) The probability that the selected employee reads Bangla but not English is:
P(B ∩ Ē) = (n(B) − n(B ∩ E)) / n(S) = (50 − 40)/100 = 0.10

Problem 02: Of the total students at a women's college, 60% wear neither a ring nor a necklace, 20% wear a ring, and 30% wear a necklace. If one of the women is randomly chosen, find the probability that she is wearing (a) a ring or a necklace, (b) both.
Solution:
Let R and N respectively denote the events that a woman wears a ring and a necklace. We are given that P(R) = 0.20, P(N) = 0.30, P(R̄ ∩ N̄) = 0.60.
(a) The probability that she is wearing a ring or a necklace is:
P(R ∪ N) = 1 − P((R ∪ N)ᶜ) = 1 − P(R̄ ∩ N̄) = 1 − 0.60 = 0.40
(b) The probability that she wears both is:
P(R ∩ N) = P(R) + P(N) − P(R ∪ N) = 0.20 + 0.30 − 0.40 = 0.10

Problem 03: A class contains 10 men and 20 women of whom half of the men and half of the
women have brown eyes. A person is chosen at random. What is the probability that the person
is either a man or has brown eyes?
Solution: To solve this problem, we can construct a table as follows:
Man (𝑀) Woman (𝑊) Total
Brown (𝐵) 𝑛(𝑀 ∩ 𝐵) = 5 𝑛(𝑊 ∩ 𝐵) = 10 𝑛(𝐵) = 15
Not brown (𝐵̅) 𝑛(𝑀 ∩ 𝐵̅) = 5 𝑛(𝑊 ∩ 𝐵̅) = 10 𝑛(𝐵̅) = 15
Total 𝑛(𝑀) = 10 𝑛(𝑊) = 20 𝑛(𝑆) = 30

We must compute P(M ∪ B), where
P(M ∪ B) = P(M) + P(B) − P(M ∩ B)
From the above table we have,
P(M) = n(M)/n(S) = 10/30 = 1/3
P(B) = n(B)/n(S) = 15/30 = 1/2
P(M ∩ B) = n(M ∩ B)/n(S) = 5/30 = 1/6
Thus,
P(M ∪ B) = (1/3 + 1/2 − 1/6) = 2/3
The probability that the person is either a man or has brown eyes is 2/3.

Conditional Probability
The probability of an event A when it is known that some other event B has occurred is called
a conditional probability and is denoted by P(A|B). The symbol P(A|B) is usually read as ‘the
probability that A occurs given that B occurs or simply probability of A given B, where the
slash ‘|’ stands for ‘given that’. In general P(A|B) is not equal to P(A).
With two events A and B, the most fundamental formula to compute conditional probability
for A given B is
P(A|B) = P(A ∩ B) / P(B), P(B) ≠ 0

and that for B given A is


P(B|A) = P(A ∩ B) / P(A), P(A) ≠ 0
It thus follows from the above equations that for two dependent events A and B,
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴) = 𝑃(𝐵)𝑃(𝐴|𝐵)
since 𝐴 ∩ 𝐵 = 𝐵 ∩ 𝐴
This rule is frequently referred to as the multiplication law, multiplication theorem or law of
compound probability.
The law may be stated more precisely as follows:
Definition: For two events A and B, the probability of their simultaneous occurrence is equal
to the product of the unconditional probability of A and conditional probability of B, given
that A has actually occurred. Symbolically
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴).
We can extend the above multiplication rule when the number of events is 3 or more. For 3
events A1, A2 and A3,
P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2)
where A3 | A1 ∩ A2 is read as "A3 occurs given that A1 and A2 have already occurred".
For k events, the rule is as follows:
𝑃(𝐴1 ∩ 𝐴2 ∩ … . .∩ 𝐴𝐾 ) = P(𝐴1 )P(𝐴2 |𝐴1 )P(𝐴3 |𝐴1 ∩ 𝐴2 )…P(𝐴𝐾 |𝐴1 ∩ 𝐴2 ….∩ 𝐴𝐾−1 ).
Refer to the values in the preceding section. Suppose that the selected person was known to be
a male. We now ask: what is the probability under the changed situation that he is employed?
This is a problem of conditional probability and we symbolically write this as P(E|M).
The probability P(E|M) can be computed once P(E∩ 𝑀) and P(M) are known from the
original sample space:
P(E|M) = P(E ∩ M) / P(M) = 0.51 / 0.55 = 0.93.

An alternative way of computing P(E|M) is to use the reduced sample space M, which is a
part of S. To accomplish this, note that
P(E ∩ M) = n(E ∩ M)/n(S) and P(M) = n(M)/n(S)

Substituting these quantities into the conditional probability formula above,


P(E|M) = n(E ∩ M) / n(M).

Referring once again to the tabular values


P(E|M) = n(E ∩ M)/n(M) = 255/275 = 0.93,
as before.
Example 01: A pair of dice is thrown. Find the probability that sum of the points on the two
dice is 10 or greater if a 5 appears on the first die.
Solution: Let A be the event that sum of the points on the two dice is 10 or greater and B be
the event that a 5 appears on the first toss. Symbolically, we want to evaluate the conditional
probability 𝑃(𝐴|𝐵).
Now,
A ={(4,6), (5,5), (5,6), (6,4), (6,5), (6,6)},
B ={(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)}
𝐴 ∩ 𝐵 = {(5,5), (5,6)}
Hence,
P(A|B) = P(A ∩ B)/P(B) = (2/36)/(6/36) = 1/3

Alternatively, if B is considered as a reduced sample space, then only two sample points, viz.
(5,5) and (5,6) are favorable to the event that the sum is 10 or more. Since there are 6 sample
points in B, the required probability is
P(A|B) = 2/6 = 1/3,
as it ought to be.
Example 02: The probability that a married man watches a certain TV show is 0.4 and that his wife watches the show is 0.5. The probability that a man watches the show, given that his wife does, is 0.7. Find
(a) The probability that a married couple watches the show.
(b) The probability that a wife watches the show given that her husband does.
(c) The probability that at least one of the partners will watch the show.
Solution: Let us define two events H and W as follows:
H: Husband watches the show
W: Wife watches the show
We are given that
P(H)=0.4, P(W)=0.5 and P(H|W) = 0.7.
(a) The probability that the couple watches the show is
P(W ∩ H) = P(W) P(H|W) = 0.5 × 0.7 = 0.35
(b) The conditional probability that a wife watches the show given that her husband also
watches
P(W|H) = P(W ∩ H)/P(H) = 0.35/0.40 = 0.875
(c) The probability that at least one (either H or W or both) watches
𝑃(𝑊 ∪ 𝐻) = 𝑃(𝑊) + 𝑃(𝐻) − 𝑃(𝑊 ∩ 𝐻) = 0.40 + 0.50 − 0.35 = 0.55.

Example 03: A coin is tossed until a head appears or it has been tossed three times. Given
that the head does not appear on the first toss, what is the probability that the coin is tossed
three times?
Solution: A sample space for the experiment is S = {H, TH, TTH, TTT}.
The associated probabilities are
P(H) = 1/2, P(TH) = 1/4, P(TTH) = 1/8, P(TTT) = 1/8.

Let A be the event that the coin is tossed 3 times and B be the event that no head appears on the first toss, so that
𝐴 = {𝑇𝑇𝐻, 𝑇𝑇𝑇}, 𝐵 = {𝑇𝐻, 𝑇𝑇𝐻, 𝑇𝑇𝑇},
and hence
𝐴 ∩ 𝐵 = {𝑇𝑇𝐻, 𝑇𝑇𝑇}.

The associated probabilities are
P(A) = 1/8 + 1/8 = 1/4, P(B) = 1/4 + 1/8 + 1/8 = 1/2,
and thus P(A ∩ B) = 1/8 + 1/8 = 1/4.

Hence the required conditional probability is


P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2.

Example 04: In a community there are equal numbers of males and females. Suppose disabled males make up 5% of the community and disabled females 2%. A person is chosen at random. If this person is a male, what is the probability that he is disabled?
Solution: Let D stand for the event ‘disabled’ and M and F respectively for male and female.
As males and females are in equal proportion,
𝑃(𝑀) = 𝑃(𝐹) = 0.5 .
Also
𝑃(𝑀 ∩ 𝐷) = 0.05, 𝑃(𝐹 ∩ 𝐷) = 0.02
We want the conditional probability that the selected person is disabled:
P(D|M) = P(D ∩ M)/P(M) = 0.05/0.5 = 0.1

The relative frequency approach


If an experiment is repeated n times under similar conditions and an event A occurs m times, then the ratio m/n tends to an idealized value as n becomes infinitely large. This idealized value is called the probability of the event A. Symbolically,
lim (n→∞) m/n = P(A).

A more specific interpretation of this definition is that, if an experiment is repeated a large number of times, there is a high probability that the proportion of repeats producing a specific event will be very close to the probability of that event. As an illustration of the relationship between relative frequency and probability, we show below a simulated result of rolling a fair die 10,000 times, from Ross (2005: 154).
Face of the die:     1      2      3      4      5      6
Frequency:           1724   1664   1628   1648   1672   1664
Relative frequency:  .1724  .1664  .1628  .1648  .1672  .1664
Note: 1/6=0.166667
The definition provided under the relative frequency approach is also known as the a posteriori, empirical or statistical definition of probability. Conceptually, the frequency definition is a more appropriate definition of probability.
Example 1: Suppose you want to predict whether a student being admitted to the first-year honors class in economics will belong to a tribal area in a particular year. If your admission records of several past years reveal that 12 percent of the admitted students come from tribal areas, then it might be reasonable to assume that the probability of a tribal student being admitted to the class is approximately 0.12.
Example 2: The dean of science has noticed that, according to past records, only 55% of the
students who begin a program successfully graduate from the programs 4 years later. We
choose a name at random from the list of beginning students to evaluate the chance that he will
successfully graduate from the program in 4 years. Basing probabilities on the statistical record,
the student has a 55% chance and, hence, a probability of 55/100, or more simply 11/20 of
graduating successfully. This is a problem that falls under the frequency interpretation of
probability.
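A quick simulation in the spirit of the Ross table above is easy to run; this is a minimal sketch (the trial count is an arbitrary choice, and each run will give slightly different frequencies):

```python
import random
from collections import Counter

# Simulate 10,000 rolls of a fair die; each relative frequency should be
# close to 1/6 ≈ 0.1667, and closer still as the number of rolls grows.
n = 10_000
counts = Counter(random.randint(1, 6) for _ in range(n))
for face in range(1, 7):
    print(face, counts[face], round(counts[face] / n, 4))
```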

Independence of Events

If A and B are two events and if the occurrence of A does not affect, and is not affected by the
occurrence of B, then A and B are said to be independent. Two events A and B are said to be
independent if and only if 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵).

(Joint probability is the product of their marginal/individual probability for independent


events).

Example 1: Three coins are tossed. Show that the event "heads on the first coin" and the event "tails on the last two" are independent.

Solution: The sample space for this experiment is:

𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}

Let,

A: heads on the first coin

B: tails on the last two coins

So, 𝐴 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇}, 𝐵 = {𝐻𝑇𝑇, 𝑇𝑇𝑇}

Now,

P(A) = 4/8 = 1/2, P(B) = 2/8 = 1/4, P(A ∩ B) = 1/8

Since,

P(A) × P(B) = 1/2 × 1/4 = 1/8 = P(A ∩ B)

The events are independent.

Example 2: In a community, 36% of the families own a dog and 22% of the families own both
a dog and a cat. If a randomly selected family owns a dog, what is the probability that it owns
a cat too?

Solution: Let us define the events of interest as follows:

D: Family owns a dog

C: Family owns a cat

Then

𝑃(𝐷) = 0.36 𝑎𝑛𝑑 𝑃(𝐷 ∩ 𝐶) = 0.22

The question asks for the conditional probability that the family owns a cat given that it owns a dog:

P(C|D) = P(D ∩ C) / P(D) = 0.22 / 0.36 ≈ 0.61

Independence of more than two events:

𝑃(𝐴1 ∩ 𝐴2 ∩ … … ∩ 𝐴𝑛 ) = 𝑃(𝐴1 ) × 𝑃(𝐴2 ) × … … × 𝑃(𝐴𝑛 )

Bayes’ Theorem

Bayes' theorem is a mathematical formula used to determine the conditional probability of events. Let A and B be two events and let P(A|B) be the conditional probability of A given that B has occurred. Then Bayes' theorem states that

P(A|B) = P(A ∩ B)/P(B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ)]

Proof: First, consider the following facts.

According to the definition of conditional probability:

P(B|A) = P(A ∩ B) / P(A)

⟹ P(A ∩ B) = P(B|A) P(A) → (1)

Now, using the total probability theorem:

P(B) = P((B ∩ A) ∪ (B ∩ Aᶜ))

⟹ P(B) = P(B ∩ A) + P(B ∩ Aᶜ)

⟹ P(B) = P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ) → (2)

Note that, 𝐴 ∩ 𝐵 = 𝐵 ∩ 𝐴

Now,

P(A|B) = P(A ∩ B)/P(B) = P(B|A) P(A) / P(B);  [using (1)]

⟹ P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ)];  [using (2)]

Proved

For the events 𝐴, 𝐵, 𝐶, 𝐷;

P(A|D) = P(A) P(D|A) / P(D) = P(A) P(D|A) / [P(A) P(D|A) + P(B) P(D|B) + P(C) P(D|C)]

In general, for k mutually exclusive and exhaustive events A1, A2, …, Ak,

P(Ai|D) = P(Ai) P(D|Ai) / Σ (i=1 to k) P(Ai) P(D|Ai)

Example 1: A blood test is 90% effective in detecting a certain disease when the disease is
present. However, the test yields a false-positive result for 5% of the healthy patients tested.
Suppose 1% of the population has the disease. Find the conditional probability that a randomly
chosen person actually has the disease given that his test result is positive.

Solution: Let,

D: the person has the disease

P: the test is positive

Now,

P(D|P) = P(D ∩ P)/P(P) = P(D) P(P|D) / [P(D) P(P|D) + P(Dᶜ) P(P|Dᶜ)]
= (0.01 × 0.90) / ((0.01 × 0.90) + (0.99 × 0.05)) ≈ 0.15
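The posterior can be computed in a few lines; this minimal sketch simply restates the arithmetic above with named quantities (the variable names are ours):

```python
# Bayes' theorem for the blood-test example: prior P(D) = 0.01,
# sensitivity P(Pos|D) = 0.90, false-positive rate P(Pos|not D) = 0.05.
p_d = 0.01
p_pos_given_d = 0.90
p_pos_given_not_d = 0.05

# Total probability of a positive test, then the posterior P(D|Pos).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
posterior = p_pos_given_d * p_d / p_pos
print(round(posterior, 4))  # ~ 0.1538, i.e. about 0.15
```

Note how small the posterior is despite the 90% sensitivity: the 1% prior dominates, which is the usual lesson of this example.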

Example 2: Consider two urns. The first urn contains 2 white and 7 black balls and second urn
contains 5 white & 6 black balls. We flip a fair coin and then draw a ball from the first urn or
the second urn depending on whether the outcome was heads or tails. What is the conditional
probability that the outcomes of the toss were heads given that a white ball was selected?

Solution: Let,

W: the event that a white ball is drawn

H: the event that the coin comes up heads

Now,

P(H|W) = P(H ∩ W)/P(W) = P(H) P(W|H) / [P(H) P(W|H) + P(Hᶜ) P(W|Hᶜ)]
= (1/2)(2/9) / [(1/2)(2/9) + (1/2)(5/11)] = 22/67.

Random Variable

A random variable is a special term used in discussing probability distributions of numerical scores. It is also known as a stochastic or chance variable: a chance variable in the sense that its values cannot be exactly predicted beforehand. We may define a random variable as follows:

A variable, whose values are any definite numbers or quantities that arise as result of chance
factors such that they cannot exactly be predicted in advance, is called a random variable.

The random variable may also be viewed as a function. For example, a school consists of 7 teachers of whom 4 are males and 3 are females. A committee of 2 teachers is to be formed. If Y stands for the number of male teachers selected, then Y is a random variable assuming the values 0, 1 and 2. The possible outcomes and the values of the random variable Y are:

Event   Sequence of events   Y = y
e1      Male, male           2
e2      Male, female         1
e3      Female, female       0

Thus, we may more formally define a random variable as follows:

A random variable is a real-valued function defined over a sample space.

Types of random variable:

A random variable may be classified as either discrete or continuous.

Discrete random variable:

A random variable defined over a discrete sample space (i.e. that may only take on a finite or
countable number of different isolated values) is referred to as a discrete random variable.

Example:

a. Number of telephone calls received in a telephone booth per day.

b. Number of under five children in a family.

Continuous random variable:

A random variable defined over a continuous sample space (i.e. which may take on any value
in a certain interval) is referred to as a continuous random variable.

Example:

a. Weight of a six-month-old baby.

b. Longevity of an electric bulb.

Probability Distribution:
Any statement of a function associating each of a set of mutually exclusive and exhaustive
classes or class intervals with its probability is a probability distribution. A probability
distribution will be either discrete or continuous according to the random variable of interest. It
is a mathematical function that describes the likelihood of obtaining the possible values that a
random variable can take. It provides a systematic way to assign probabilities to different
outcomes, reflecting the uncertainty associated with random phenomena. Probability
distributions can be categorized into two main types: discrete and continuous.
Types of Probability Distributions:
1. Discrete probability distribution: A distribution where the random variable can only
take distinct, separate values. Examples- Bernoulli distribution (models a binary
outcome), binomial distribution (describes the number of successes in a fixed number
of independent Bernoulli trials), Poisson distribution (models the number of events
occurring in a fixed interval of time or space).
2. Continuous probability distribution: A distribution where the random variable can take any value within a given range. Examples: uniform distribution (all values within a given interval are equally likely), normal distribution (symmetric bell-shaped curve, characterized by mean and standard deviation), exponential distribution (models the time until an event occurs in a Poisson process). Understanding these types of distributions is fundamental for analyzing and interpreting data in various fields, from statistics and finance to science and engineering.
Example 1: A bag contains 10 balls of which 4 are black. If 3 balls are drawn at random without
replacement, obtain the probability distribution for the number of black balls drawn.
Solution: If 𝑋 denotes the number of black balls drawn, then clearly 𝑋 can assume values 0,
1, 2 and 3. To obtain the probability distribution of 𝑋 , we need to compute the probabilities
associated with 0, 1, 2 and 3. Since 3 balls are to be chosen, the number of ways in which this choice can be made is C(10, 3) = 120.
Thus,

f(0) = P(X = 0) = C(4, 0) × C(6, 3) / C(10, 3) = 20/120

f(1) = P(X = 1) = C(4, 1) × C(6, 2) / C(10, 3) = 60/120

f(2) = P(X = 2) = C(4, 2) × C(6, 1) / C(10, 3) = 36/120

f(3) = P(X = 3) = C(4, 3) × C(6, 0) / C(10, 3) = 4/120


Hence, the tabular form of the probability distribution of X will be as follows:

X    f(x)
0    20/120
1    60/120
2    36/120
3    4/120
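The probabilities above follow the hypergeometric pattern f(x) = C(K, x) C(N − K, n − x) / C(N, n). A minimal Python sketch (the variable names are ours) reproduces the table exactly:

```python
from math import comb
from fractions import Fraction

# Hypergeometric pmf for the number of black balls when 3 balls are drawn
# without replacement from 10 balls of which 4 are black.
N, K, n = 10, 4, 3  # total balls, black balls, balls drawn
for x in range(n + 1):
    f = Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))
    print(f"f({x}) = {f}")  # 1/6, 1/2, 3/10, 1/30 (i.e. 20/120, 60/120, 36/120, 4/120)
```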

Example 2: An unbiased coin is tossed three times. Let 𝑋 be the number of runs obtained as a
result of the outcomes of this experiment. Find the probability distribution of 𝑋 .

Solution: Any unbroken sequence of a particular outcome is counted as a run. Thus, for the outcome HHH, the number of runs is 1, since there is no break in the sequence from the first to the third outcome. Similarly, for the outcome HTH, the number of runs is 3. We now construct a sample space for the experiment along with the values of the random variable X and their associated probabilities:

Outcome   Number of runs (X = x)   P(X = x)
HHH       1                        1/8
HHT       2                        1/8
HTH       3                        1/8
HTT       2                        1/8
THH       2                        1/8
THT       3                        1/8
TTH       2                        1/8
TTT       1                        1/8

The random variable 𝑋 is seen to take on three distinct values 1, 2 and 3 with probabilities
2/8 , 4/8 , and 2/8 respectively. The values of the random variables and their associated
probabilities are summarized in a tabular form, which appear below:

Number of runs (X = x)   P(X = x)
1                        2/8
2                        4/8
3                        2/8
Total                    1.0

Example 3: Obtain the probability distribution of the number of turning points in all possible
permutations of first four natural numbers.

Solution: In any three successive values of a sequence, the middle value is said to form a turning point if it is either greater than both the value preceding it and the value following it, or less than both. Thus, in the sequence 1, 2, 3 the value 2 is greater than its preceding value (1) but less than 3, so the number of turning points is 0. For the sequence 1, 4, 2 we have a turning point, since 4 is greater than both 1 and 2; so is the case for 4, 2, 3. We now make all possible permutations of the numbers 1, 2, 3 and 4:

Permutation Turning points Permutation Turning points
1234 0 3124 1
1243 1 3142 2
1324 2 3241 2
1342 1 3214 1
1423 2 3412 2
1432 1 3421 1
2134 1 4123 1
2143 2 4132 2
2341 1 4231 2
2314 2 4213 1
2413 2 4312 1
2431 1 4321 0

Since these 24 possible permutations (4! = 24 ) are equally likely, each sequence has a
probability of 1/24. The random variable 𝑋 is seen to take on values 0, 1 and 2. Hence the
probability distribution of 𝑋 is as displayed in the accompanying table:

Values of 𝑋 = 𝑥 𝑃(𝑋 = 𝑥)
0 2/24
1 12/24
2 10/24
Total 1.0

Discrete probability distribution or Probability mass function (pmf):

If a random variable “𝑋” has a discrete distribution, the probability distribution of 𝑋 is defined
as the function 𝑓 such that for any real numbers 𝑥, 𝑓(𝑥) = 𝑃(𝑋 = 𝑥).

The function 𝑓(𝑥) defined above must satisfy the following conditions to be a pmf.

1. f(x) ≥ 0
2. Σx f(x) = 1
3. P(X = x) = f(x)

Verify whether the following functions are probability mass functions.

a. f(x) = (2x − 1)/8, x = 0, 1, 2, 3
b. f(x) = (3x + 6)/21, x = 1, 2
c. f(x) = (x + 1)/16, x = 0, 1, 2, 3

Solution:

a. Summing the function over the entire range of X,

Σ (x=0 to 3) f(x) = f(0) + f(1) + f(2) + f(3) = −1/8 + 1/8 + 3/8 + 5/8 = 1

Here the total probability is 1, but P(X = 0) = f(0) = −1/8, an impossible negative value. So f(x) is not a probability mass function.

b. Here, Σ (x=1 to 2) f(x) = 9/21 + 12/21 = 1, and f(x) > 0 for x = 1, 2. So f(x) is a probability mass function.

c. Summing the function over the entire range of X,

Σ (x=0 to 3) f(x) = 1/16 + 2/16 + 3/16 + 4/16 = 10/16 ≠ 1

Here, the total probability is not 1. So, 𝑓(𝑥) is not a probability mass function.

Example with PMF table

The probability mass function example is given below:

Question: Let X be a random variable and P (X = x) is the PMF given by:

X 0 1 2 3 4 5 6 7
P(X=x) 0 k 2k 2k 3k 𝑘2 2𝑘 2 7𝑘 2 + 𝑘

1. Determine the value of k

2. Find the probabilities (i) P(X ≤ 6), (ii) P(3 < X ≤ 6)

Solution:

(1) We know that


∑P(Xi) = 1.

Therefore,
0 + k + 2k + 2k + 3k + k² + 2k² + 7k² + k = 1

10k² + 9k − 1 = 0;  (10k − 1)(k + 1) = 0

so k = 1/10 or k = −1.

k = −1 is not possible, because a probability must lie between 0 and 1. Hence the value of k is 1/10.

(2) (i)
P(X ≤ 6) = 1 − P(X > 6) = 1 − P(X = 7)
= 1 − (7k² + k)
= 1 − (7(1/10)² + 1/10)
= 1 − 17/100
= 83/100
Therefore, P(X ≤ 6) = 83/100

(ii) P(3 < X ≤ 6) = P(X = 4) + P(X = 5) + P(X = 6)
= 3k + k² + 2k²
= 3(1/10) + (1/10)² + 2(1/10)²
= 3/10 + 1/100 + 2/100
= 33/100
Therefore, P(3 < X ≤ 6) = 33/100
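The whole exercise can be checked with exact fractions; this minimal sketch (the dictionary encoding of the pmf is our own) verifies both the normalization and the two probabilities:

```python
from fractions import Fraction

k = Fraction(1, 10)  # the root of 10k^2 + 9k - 1 = 0 lying in [0, 1]
pmf = {0: 0, 1: k, 2: 2*k, 3: 2*k, 4: 3*k, 5: k**2, 6: 2*k**2, 7: 7*k**2 + k}

assert sum(pmf.values()) == 1                         # a valid pmf sums to 1
print(sum(p for x, p in pmf.items() if x <= 6))       # 83/100
print(sum(p for x, p in pmf.items() if 3 < x <= 6))   # 33/100
```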

Obtain the probability mass function for the school-teacher example given earlier:

Solution:

The random variable X is the number of males which takes values 0, 1, 2.

Now, f(0) = P(X = 0) = C(4, 0) × C(3, 2) / C(7, 2) = 3/21

f(1) = P(X = 1) = C(4, 1) × C(3, 1) / C(7, 2) = 12/21

f(2) = P(X = 2) = C(4, 2) × C(3, 0) / C(7, 2) = 6/21

Hence, the probability distribution of X is

X:    0     1      2
f(x): 3/21  12/21  6/21

The above probability distribution can be expressed as

f(x) = P(X = x) = C(4, x) × C(3, 2 − x) / C(7, 2), x = 0, 1, 2.

Example: The pmf of 𝑋 is given as


f(x) = α (3/4)ˣ, x = 0, 1, 2, …
     = 0, elsewhere
Evaluate α and find 𝑃(𝑋 ≤ 3).

Solution:
Since f(x) is a probability function, Σ (x=0 to ∞) f(x) = 1.
Now,
f(0) = α(3/4)⁰ = α
f(1) = α(3/4)¹
f(2) = α(3/4)²
f(3) = α(3/4)³
and so on. Hence,

Σ (x=0 to ∞) f(x) = α + α(3/4) + α(3/4)² + α(3/4)³ + ⋯ = 1
⟹ α (1 + 3/4 + (3/4)² + (3/4)³ + ⋯) = 1
⟹ α (1 / (1 − 3/4)) = 1
⟹ α × 4 = 1
⟹ α = 1/4

So the complete pmf of X is f(x) = (1/4)(3/4)ˣ, x = 0, 1, 2, …

Now, P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
= (1/4)(1 + 3/4 + (3/4)² + (3/4)³)
= 175/256.

Discrete Distribution Function

On many occasions, we are interested in the probability that a random variable takes on a
value less than or equal to a prescribed number 𝑥1 , say. If two dice are thrown, for example,
what is the probability that the sum is less than or equal to 5? If 3 coins are tossed, what is the
probability that at most 2 show heads? Answer to such questions is provided by what is known
as the cumulative distribution function, which applies to both continuous variable and discrete
variable.

Definition: The cumulative distribution function (CDF) or simply the distribution function
𝐹(𝑥) of a discrete random variable 𝑋 with probability function 𝑓(𝑥) defined over all real
numbers 𝑥 is the cumulative probability up to and including the point 𝑥. Symbolically, it is
defined as follows:

𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = ∑𝑡≤𝑥 𝑓(𝑡).

The function 𝐹(𝑥) is monotonic increasing, i.e. 𝐹(𝑎) ≤ 𝐹(𝑏), whenever 𝑎 ≤ 𝑏. And the limit
of 𝐹 to the left is 0 and to the right is 1:

lim (x→−∞) F(x) = 0 and lim (x→+∞) F(x) = 1.

The value of 𝐹(𝑥) at any point must be a number in the interval 0 ≤ 𝐹(𝑥) ≤ 1, because 𝐹(𝑥)
is the probability of the event (𝑋 ≤ 𝑥).

Example: Consider the following table shows the probability distribution of a random variable
𝑋

𝑿: 0 1 2 3
𝒇(𝒙): 20/120 60/120 36/120 4/120

Here,

F(0) = P(X = 0) = f(0) = 20/120

F(1) = P(X ≤ 1) = f(0) + f(1) = F(0) + f(1) = 80/120

F(2) = P(X ≤ 2) = f(0) + f(1) + f(2) = F(1) + f(2) = 116/120

F(3) = P(X ≤ 3) = f(0) + f(1) + f(2) + f(3) = F(2) + f(3) = 1

A tabular presentation of 𝐹(𝑥) is as follows:

𝑋 <0 0 1 2 3 >3
𝑓(𝑥) 0 20⁄120 60⁄120 36⁄120 4⁄120 0
𝐹(𝑥) 0 20⁄120 80⁄120 116⁄120 1 1

A formal way of presenting the distribution function 𝐹(𝑥) is as follows:

F(x) = 0,        x < 0
     = 20/120,   0 ≤ x < 1
     = 80/120,   1 ≤ x < 2
     = 116/120,  2 ≤ x < 3
     = 1,        x ≥ 3

Example: A coin is tossed three times. If X is the random variable representing the number of heads obtained, find the probability distribution of X and hence obtain F(x).

Solution:

Sample space, 𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}

Let X be the number of heads observed.

So, from the sample space,

[𝑋 = 3,2,2,1,2,1,1,0]

𝑋 = 0,1,2,3

So,

P(X = 0) = P(TTT) = 1/8

P(X = 1) = P({HTT, THT, TTH}) = 3/8

And so on.

So, the probability distribution is:

𝑿: 0 1 2 3
𝒇(𝒙): 1/8 3/8 3/8 1/8

Therefore,

F(0) = f(0) = 1/8,

F(1) = f(0) + f(1) = 4/8,

F(2) = f(0) + f(1) + f(2) = 7/8,

F(3) = f(0) + f(1) + f(2) + f(3) = 1.

Thus,

F(x) = 0,    x < 0
     = 1/8,  0 ≤ x < 1
     = 4/8,  1 ≤ x < 2
     = 7/8,  2 ≤ x < 3
     = 1,    x ≥ 3
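Since F is just the running total of f, it can be built with a cumulative sum. A minimal Python sketch for this example:

```python
from fractions import Fraction
from itertools import accumulate

# Build the distribution function F(x) by cumulating the pmf of the
# number of heads in three tosses: f = (1/8, 3/8, 3/8, 1/8).
f = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
F = list(accumulate(f))
for x, Fx in enumerate(F):
    print(f"F({x}) = {Fx}")  # 1/8, 1/2, 7/8, 1
```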

Example: A fair six-sided die is rolled once. Obtain the probability distribution of the number of points shown and hence the distribution function.

Solution: Let X be the number of points on the die, so that X = 1, 2, 3, 4, 5, 6. If P assigns equal mass to each of the six faces, then clearly

f(x) = P(X = x) = 1/6, x = 1, 2, …, 6

Then

F(x) = P(X ≤ x) = 0,    x < 1
                = 1/6,  1 ≤ x < 2
                = 2/6,  2 ≤ x < 3
                = 3/6,  3 ≤ x < 4
                = 4/6,  4 ≤ x < 5
                = 5/6,  5 ≤ x < 6
                = 1,    x ≥ 6

Continuous Probability Distribution

In dealing with continuous variable, 𝑓(𝑥) is usually called a probability density function (pdf)
or simply density function. A formal definition of a probability density function may be
presented as follows:

Definition: A probability density function is a non-negative function and is constructed so that


the area under its curve bounded by the 𝑥-axis is equal to unity when computed over the range
of 𝑥, for which 𝑓(𝑥) is defined.

The above definition leads to conclude that a pdf is one that possesses the following properties:

1. f(x) ≥ 0
2. ∫ (−∞ to ∞) f(x) dx = 1
3. P(a < X < b) = ∫ (a to b) f(x) dx

Example: A random variable 𝑋 has the following functional form:

𝑓(𝑥) = 𝑘𝑥 , 0<𝑥<4

= 0, elsewhere

(i) Determine 𝑘 for which 𝑓(𝑥) is a density function.

(ii) Find 𝑃(1 < 𝑋 < 2) and 𝑃(𝑋 > 2).



Solution: (i) For f(x) to be a density function, we must have ∫ (−∞ to ∞) f(x) dx = 1.

Thus,

k ∫ (0 to 4) x dx = 1

or

k [x²/2] (0 to 4) = 8k = 1,

from which

k = 1/8.

The complete density function is thus

f(x) = x/8, 0 < x < 4

= 0, elsewhere

(ii) Again,

P(1 < X < 2) = (1/8) ∫ (1 to 2) x dx = [x²/16] (1 to 2) = 3/16

and

P(X > 2) = (1/8) ∫ (2 to 4) x dx = [x²/16] (2 to 4) = 12/16 = 3/4.
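These integrals are simple enough to verify numerically. A minimal Python sketch using a midpoint rule (the step count is an arbitrary choice of this sketch):

```python
def integrate(f, a, b, steps=100_000):
    """Simple midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

pdf = lambda x: x / 8  # the density found above, on 0 < x < 4

print(round(integrate(pdf, 0, 4), 6))  # ~ 1.0, the total probability
print(round(integrate(pdf, 1, 2), 6))  # ~ 0.1875 = 3/16
print(round(integrate(pdf, 2, 4), 6))  # ~ 0.75   = 3/4
```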

Example: A continuous random variable 𝑋 has the following density function:

f(x) = (2/27)(1 + x), 2 < x < 5

= 0, elsewhere

(i) Verify that it satisfies the condition ∫ (−∞ to ∞) f(x) dx = 1,
(ii) Find P(X < 4), and (iii) find P(3 < X < 4).

Solution: (i) Integrating between 2 and 5,

∫ (2 to 5) f(x) dx = (2/27) ∫ (2 to 5) (1 + x) dx = (2/27) [x + x²/2] (2 to 5) = 1,

showing that the given function is a density function.

(ii) Since the lower limit is 2, we integrate between 2 and 4 to evaluate 𝑃(𝑋 < 4).

P(X < 4) = (2/27) ∫ (2 to 4) (1 + x) dx = (2/27) [x + x²/2] (2 to 4) = 16/27.

(iii) Evaluating the integral between 3 and 4, we obtain 𝑃(3 < 𝑋 < 4),

P(3 < X < 4) = (2/27) ∫ (3 to 4) (1 + x) dx = (2/27) [x + x²/2] (3 to 4) = 1/3.

Continuous Distribution Function

Exactly analogous to the distribution function of a discrete random variable, a continuous random variable also has a distribution function. This is defined as follows:

Definition: The cumulative distribution or distribution function 𝐹(𝑥) of a continuous random


variable X with density function f(x) is defined as
F(x) = P(X ≤ x) = ∫ (−∞ to x) f(t) dt.

If the derivative of F(x) exists, then

f(x) = dF(x)/dx = F′(x).

𝐹(𝑥) has the following properties

1. F(x) ≥ 0

2. F(−∞) = 0

3. F(+∞) = 1

4. P(a < X < b) = F(b) − F(a) = ∫ (−∞ to b) f(x) dx − ∫ (−∞ to a) f(x) dx

Example: If 𝑋 has the density function

f(x) = (2/27)(1 + x), 2 < x < 5

= 0, elsewhere

Obtain the distribution function and hence find 𝐹(3) and 𝐹(4). Also verify 𝑃(3 < 𝑥 < 4) =
𝐹(4) − 𝐹(3).

Solution:

The distribution function is,

F(x) = (2/27) ∫ (2 to x) (1 + t) dt = (1/27)(x² + 2x − 8), 2 ≤ x < 5

For x = 3 and x = 4,

F(3) = (1/27)(3² + 2 × 3 − 8) = 7/27

F(4) = (1/27)(4² + 2 × 4 − 8) = 16/27

Hence,

F(4) − F(3) = 16/27 − 7/27 = 9/27 = 1/3

which agrees with P(3 < X < 4) = 1/3 found earlier, verifying that P(3 < X < 4) = F(4) − F(3).

Example: Let 𝑋 have the following probability density function:

f(x) = x,      0 < x ≤ 1
     = 2 − x,  1 < x ≤ 2
     = 0,      otherwise

Obtain 𝐹(𝑥).

Solution:

F(x) = P(X ≤ x) = 0,                                             x ≤ 0
                = ∫ (0 to x) t dt = x²/2,                        0 < x ≤ 1
                = ∫ (0 to 1) t dt + ∫ (1 to x) (2 − t) dt = 2x − x²/2 − 1,  1 < x ≤ 2
                = 1,                                             x > 2

Example: A box contains good and defective items. If an item drawn is good, the number 1 is
assigned to the drawing; otherwise, the number 0 is assigned. Let p be the probability of
drawing a good item at random.

Then

1−𝑝, 𝑥=0
𝑓(𝑥) = 𝑃(𝑋 = 𝑥) = {
𝑝, 𝑥=1

Consequently, the distribution function of 𝑋 is

0, 𝑥<0
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = {1 − 𝑝 , 0≤𝑥<1
1 , 1≤𝑥

Joint Probability Distribution

If two or more random variables are given that are defined on the same probability space, the
joint probability distribution is the corresponding probability distribution on all possible
combinations of outputs.

Or simply, Joint probability distribution shows probability distribution for two or more random
variables.

Joint probability distribution for discrete random variables:

Suppose that a given experiment involves two discrete variables 𝑋 and 𝑌. Then the joint
probability distribution of 𝑋 and 𝑌 can be expressed as

𝑓(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦),

which has the following properties:

1. f(x, y) ≥ 0 for all (x, y)
2. Σx Σy f(x, y) = 1
3. P[(X, Y) ∈ A] = Σ Σ over (x, y) ∈ A of f(x, y), for any region A in the xy-plane.

Example: A coin is tossed three times. If 𝑋 denotes the number of heads and 𝑌 denotes the
number of tails in the last two tosses, then find the joint probability distribution of 𝑋 and 𝑌.

Solution: The outcomes of the experiment and the associated probabilities are shown below:

Outcome 𝑿 𝒀 𝑷(𝑿, 𝒀)
HHH 3 0 1/8
HHT 2 1 1/8
HTH 2 1 1/8
HTT 1 2 1/8
THH 2 0 1/8
THT 1 1 1/8
TTH 1 1 1/8
TTT 0 2 1/8

It is easy to see that 𝑋 assumes values 0, 1, 2, and 3, while 𝑌 assumes values 0,1, and 2. The
joint probability distribution can be written as:

𝑿 values
𝒀 values 0 1 2 3 Row sum
0 0 0 1/8 1/8 2/8
1 0 2/8 2/8 0 4/8
2 1/8 1/8 0 0 2/8
Column sum 1/8 3/8 3/8 1/8 1
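The joint table can be generated mechanically from the sample space. A minimal Python sketch (using Counter over the eight equally likely outcomes; variable names are ours):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Joint pmf of X = number of heads in three tosses and
# Y = number of tails in the last two tosses.
joint = Counter(
    (outcome.count("H"), outcome[1:].count("T"))
    for outcome in ("".join(t) for t in product("HT", repeat=3))
)
for (x, y), n in sorted(joint.items()):
    print(f"f({x}, {y}) = {Fraction(n, 8)}")
```

The printed cells match the table above, e.g. f(2, 1) = 2/8 from the outcomes HHT and HTH.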

Joint distribution for continuous variables:

Let 𝑋 and 𝑌 be two continuous random variables. Then the function 𝑓(𝑥, 𝑦) is called the joint
probability density function of 𝑋 and 𝑌 if
1. f(x, y) ≥ 0, for all (x, y)
2. ∫ (−∞ to ∞) ∫ (−∞ to ∞) f(x, y) dx dy = 1
3. P[(X, Y) ∈ A] = ∬ over A of f(x, y) dx dy, for any region A in the xy-plane.

The Cumulative distribution function (cdf) for 𝑓(𝑥, 𝑦) is defined by

F(x, y) = P[X ≤ x, Y ≤ y] = ∫ (−∞ to x) ∫ (−∞ to y) f(u, v) dv du

The cdf has the following properties:

1. 0 ≤ F(x, y) ≤ 1
2. ∂²F(x, y)/∂x∂y = f(x, y), whenever F is differentiable
3. F(x, −∞) = F(−∞, y) = 0
4. F(∞, ∞) = 1.

Example: Let 𝑋 and 𝑌 have the following distribution

f(x, y) = x² + xy/3, 0 ≤ x ≤ 1, 0 ≤ y ≤ 2

= 0, elsewhere

Check that 𝑓(𝑥, 𝑦) is a density function.

Solution: The function is a joint density function if

𝑓(𝑥, 𝑦) ≥ 0, for all (𝑥, 𝑦)

And
∞ ∞

∫ ∫ 𝑓(𝑥, 𝑦)𝑑𝑥𝑑𝑦 = 1
−∞ −∞

Clearly, f(x, y) ≥ 0 for all values of x and y in the given range. And

∫ (−∞ to ∞) ∫ (−∞ to ∞) f(x, y) dx dy = ∫ (0 to 2) ∫ (0 to 1) (x² + xy/3) dx dy
= ∫ (0 to 2) (1/3 + y/6) dy = [y/3 + y²/12] (0 to 2) = 1

Hence the proof.

Example: Consider the following density function:

f(x, y) = x(1 + 3y²)/4, 0 ≤ x ≤ 2, 0 ≤ y ≤ 1

= 0, elsewhere

Find P[0 < X < 1, 1/4 < Y < 1/2].

Solution: The joint probability is obtained as follows:

P[0 < X < 1, 1/4 < Y < 1/2] = ∫ (1/4 to 1/2) ∫ (0 to 1) x(1 + 3y²)/4 dx dy
= ∫ (1/4 to 1/2) [x²/8 + 3x²y²/8] (x = 0 to 1) dy
= ∫ (1/4 to 1/2) (1/8 + 3y²/8) dy = [y/8 + y³/8] (1/4 to 1/2) = 23/512

Marginal Distribution:

When the distribution of the random variable (Say, 𝑋 or 𝑌) is derived from a joint probability
distribution (say, 𝑓(𝑥, 𝑦)), then the resulting distribution is known as a marginal distribution
(of 𝑋 or 𝑌).

When the random variable 𝑋 and 𝑌 are discrete, the marginal distribution of 𝑋 is

𝑔(𝑥) = ∑𝑦 𝑓(𝑥, 𝑦).

And marginal distribution of 𝑌 is

ℎ(𝑦) = ∑𝑥 𝑓(𝑥, 𝑦).

When 𝑋 and 𝑌 are random continuous variables,



g(x) = ∫ (−∞ to ∞) f(x, y) dy, for −∞ < x < ∞

h(y) = ∫ (−∞ to ∞) f(x, y) dx, for −∞ < y < ∞.

Note that ∫ (−∞ to ∞) g(x) dx = ∫ (−∞ to ∞) ∫ (−∞ to ∞) f(x, y) dx dy = 1, and

P[a < X < b] = P[a < X < b, −∞ < Y < ∞] = ∫ (a to b) ∫ (−∞ to ∞) f(x, y) dy dx = ∫ (a to b) g(x) dx

Question:

Suppose that X and Y have the following joint probability distribution.

Values of Y \ Values of X    0     1     2     3     Row sum
0                            0     1/8   2/8   1/8   4/8
1                            1/8   2/8   1/8   0     4/8
Column sum                   1/8   3/8   3/8   1/8   1

Find the marginal distribution of 𝑋 and 𝑌.

Solution:

For the random variable X:

g(0) = P(X = 0) = Σ (y=0 to 1) f(0, y) = f(0,0) + f(0,1) = 0 + 1/8 = 1/8
g(1) = P(X = 1) = Σ (y=0 to 1) f(1, y) = f(1,0) + f(1,1) = 1/8 + 2/8 = 3/8
g(2) = P(X = 2) = Σ (y=0 to 1) f(2, y) = f(2,0) + f(2,1) = 2/8 + 1/8 = 3/8
g(3) = P(X = 3) = Σ (y=0 to 1) f(3, y) = f(3,0) + f(3,1) = 1/8 + 0 = 1/8

Similarly, for 𝑌

h(0) = P(Y = 0) = Σ (x=0 to 3) f(x, 0) = f(0,0) + f(1,0) + f(2,0) + f(3,0) = 0 + 1/8 + 2/8 + 1/8 = 4/8 = 1/2
h(1) = P(Y = 1) = Σ (x=0 to 3) f(x, 1) = f(0,1) + f(1,1) + f(2,1) + f(3,1) = 1/8 + 2/8 + 1/8 + 0 = 4/8 = 1/2
Marginal distribution of X:
x:    0    1    2    3    Sum
f(x): 1/8  3/8  3/8  1/8  1

Marginal distribution of Y:
y:    0    1    Sum
f(y): 4/8  4/8  1

This shows that the marginal distributions are themselves probability distributions.

Find the marginal densities of X and Y from the following density function and verify that the marginal distributions are also probability distributions.

f(x, y) = (1/8)(6 − x − y), for 0 < x < 2, 2 < y < 4.

Also compute P[X + Y < 3] and P[X < 1.5, Y < 2.5].

Solution: The marginal density of 𝑋 is

g(x) = (1/8) ∫ (2 to 4) (6 − x − y) dy = (1/8)[6y − xy − y²/2] (2 to 4) = (1/4)(3 − x), 0 < x < 2

The marginal density of Y is

h(y) = (1/8) ∫ (0 to 2) (6 − x − y) dx = (1/8)[6x − x²/2 − xy] (0 to 2) = (1/4)(5 − y), 2 < y < 4.

Now, we verify that 𝑔(𝑥) and ℎ(𝑦) are probability distributions.

It is clear that in the given range of 𝑋 and 𝑌,

𝑔(𝑥) ≥ 0 and ℎ(𝑦) ≥ 0

Also

∫ (0 to 2) g(x) dx = (1/4) ∫ (0 to 2) (3 − x) dx = (1/4)[3x − x²/2] (0 to 2) = 1

and

∫ (2 to 4) h(y) dy = (1/4) ∫ (2 to 4) (5 − y) dy = (1/4)[5y − y²/2] (2 to 4) = 1

Thus g(x) and h(y) satisfy all the conditions of a density function.

Now

Since y > 2, the condition X + Y < 3 requires y < 3 − x, which is possible only when x < 1. Hence

P(X + Y < 3) = (1/8) ∫ (0 to 1) ∫ (2 to 3−x) (6 − x − y) dy dx = (1/8) ∫ (0 to 1) [6y − xy − y²/2] (2 to 3−x) dx
= (1/8) ∫ (0 to 1) (x²/2 − 4x + 7/2) dx
= (1/8) [x³/6 − 2x² + 7x/2] (0 to 1) = 5/24

P(X < 3/2, Y < 5/2) = (1/8) ∫ (0 to 3/2) ∫ (2 to 5/2) (6 − x − y) dy dx = 9/32.
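Because the region of integration is the delicate part here, a symbolic check is reassuring. A minimal sketch assuming the sympy library is available (not part of the notes):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = (6 - x - y) / 8  # joint density on 0 < x < 2, 2 < y < 4

g = sp.integrate(f, (y, 2, 4))  # marginal of X: (3 - x)/4
h = sp.integrate(f, (x, 0, 2))  # marginal of Y: (5 - y)/4
print(sp.simplify(g), sp.simplify(h))

# X + Y < 3 forces y < 3 - x, which intersects y > 2 only when x < 1.
p = sp.integrate(f, (y, 2, 3 - x), (x, 0, 1))
print(p)  # 5/24
```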

Conditional Distribution:

The conditional distributions are exactly analogous to the conditional probabilities of the type
𝑃(𝐴|𝐵)𝑜𝑟 𝑃(𝐵|𝐴),where 𝐴 and 𝐵 are two events in a sample space. Using the definition of
conditional probability

𝑃(𝐴 ∩ 𝐵)
𝑃(𝐵|𝐴) = , 𝑃(𝐴) > 0
𝑃(𝐴)

Replacing the events 𝐴 and 𝐵 by the random variables 𝑋 and 𝑌 respectively, we define the
conditional probability of 𝑌 for given 𝑋 as follows:

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x) = f(x, y)/g(x)

When 𝑋 and 𝑌 are discrete random variables,

f(y|x) = f(x, y) / Σy f(x, y) = f(x, y)/g(x), for g(x) > 0

f(x|y) = f(x, y) / Σx f(x, y) = f(x, y)/h(y), for h(y) > 0

When 𝑋 and 𝑌 are the continuous random variables,

f(y|x) = f(x, y) / ∫ (−∞ to ∞) f(x, y) dy = f(x, y)/g(x), g(x) > 0

f(x|y) = f(x, y) / ∫ (−∞ to ∞) f(x, y) dx = f(x, y)/h(y), h(y) > 0.

If one wishes to find the probability that the random variable X falls between a and b when it is known that Y = y, one evaluates

P(a < X < b | Y = y) = Σ over a < x < b of f(x|y), if X is discrete
                     = ∫ (a to b) f(x|y) dx, if X is continuous

Example:

Values of 𝑿

Values of 𝒀 0 1 2

0 3/28 3/28 3/28

1 6/28 6/28 0

2 1/28 0 0

Find 𝑓(𝑥|1), 𝑓(𝑦|1) and 𝑃(𝑋 = 0|Y = 1).

Solution: f(x|1) = f(x, 1)/h(1) and f(y|1) = f(1, y)/g(1).

Now, h(1) = Σ (x=0 to 2) f(x, 1) = f(0,1) + f(1,1) + f(2,1) = 6/28 + 6/28 + 0 = 3/7.
Hence, the conditional distribution of X given Y = 1 is

f(x|1) = f(x, 1)/h(1) = (7/3) f(x, 1), for x = 0, 1, 2.
Therefore,

f(0|1) = (7/3) f(0,1) = (7/3) × (6/28) = 1/2

f(1|1) = (7/3) f(1,1) = (7/3) × (6/28) = 1/2

f(2|1) = (7/3) f(2,1) = (7/3) × 0 = 0
In tabular form, the conditional distribution of X given Y = 1 is

X         0     1     2
f(x|1)   1/2   1/2    0

In particular, P(X = 0|Y = 1) = f(0|1) = 1/2. Similarly, g(1) = Σ_{y=0}^{2} f(1, y) = 3/28 + 6/28 + 0 = 9/28, so f(y|1) = (28/9) f(1, y), which gives 1/3, 2/3 and 0 for y = 0, 1, 2 as the conditional distribution of Y given X = 1.
Example: Given the following joint density function of X and Y:

f(x, y) = (6/5)(x + y²), 0 < x < 1, 0 < y < 1
        = 0, elsewhere
Find the following:

(i) f(x|y)
(ii) f(x|0.5)
(iii) P(X < 0.5|Y = 0.5)
Page 46 of 68
Solution:
Now,

h(y) = ∫_0^1 (6/5)(x + y²) dx = (6/5)[x²/2 + xy²]_0^1 = (6/5)(1/2 + y²), 0 < y < 1
(i)

f(x|y) = f(x, y)/h(y) = (6/5)(x + y²) / [(6/5)(1/2 + y²)] = 2(x + y²)/(1 + 2y²), 0 < x < 1, 0 < y < 1
(ii)

From f(x|y) above,

f(x|0.5) = 2(x + 0.5²)/(1 + 2(0.5)²) = (1/3)(4x + 1), 0 < x < 1
(iii)

P(X < 0.5|Y = 0.5) = ∫_0^{0.5} (1/3)(4x + 1) dx = (1/3)[2x² + x]_0^{0.5} = 1/3.
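As a sanity check, one can verify numerically that f(x|0.5) integrates to 1 and recompute the conditional probability; a minimal sketch in Python, assuming SciPy is available:

from scipy.integrate import quad

f_cond = lambda x: (4 * x + 1) / 3   # f(x | y = 0.5) derived above

area, _ = quad(f_cond, 0, 1)         # should be 1: a valid conditional density
p, _ = quad(f_cond, 0, 0.5)          # P(X < 0.5 | Y = 0.5)

print(area)   # 1.0
print(p)      # 0.3333... = 1/3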
Independence of Random Variables:

Two random variables 𝑋 and 𝑌 with marginal densities 𝑔(𝑥) and ℎ(𝑦), respectively, are said
to be independent if and only if
f(x|y) = g(x) or f(y|x) = h(y)

where f(x|y) is the conditional density of X for given Y.

If 𝑋 and 𝑌 are independent, then

𝑓(𝑥, 𝑦) = 𝑔(𝑥). ℎ(𝑦)

For all values of 𝑥 and 𝑦.
Values of Y \ Values of X      x₁            x₂            x₃         Row Sum
y₁                           f(x₁, y₁)     f(x₂, y₁)     f(x₃, y₁)    h(y₁)
y₂                           f(x₁, y₂)     f(x₂, y₂)     f(x₃, y₂)    h(y₂)
Column Sum                    g(x₁)         g(x₂)         g(x₃)         1

For independence,

f(x₁, y₁) = g(x₁) · h(y₁)
f(x₂, y₁) = g(x₂) · h(y₁)
⋮
f(x₃, y₂) = g(x₃) · h(y₂)
Example: Suppose X and Y have the following joint probability function:

Values of Y \ Values of X      2       4      Row Sum
1                            0.10    0.15     0.25
3                            0.20    0.30     0.50
5                            0.10    0.15     0.25
Column Sum                   0.40    0.60       1

Check if 𝑋 and 𝑌 are independent.

Solution:

i) f(2,1) = 0.10 and g(2) = 0.40, h(1) = 0.25; hence g(2) · h(1) = 0.10 = f(2,1).

ii) f(4,1) = 0.15 and g(4) = 0.60, h(1) = 0.25; hence g(4) · h(1) = 0.15 = f(4,1).

⋮

vi) f(4,5) = 0.15 and g(4) = 0.60, h(5) = 0.25; hence g(4) · h(5) = 0.15 = f(4,5).

For all points (x, y) of the random variables X and Y, f(x, y) = g(x)h(y).

Hence, the variables are independent.

Alternatively, we can use the concept of conditional probability in this case.

P(Y = 1|X = 2) = f(1|2) = f(2,1)/g(2) = 0.10/0.40 = 0.25 = h(1)

P(Y = 3|X = 2) = f(3|2) = f(2,3)/g(2) = 0.20/0.40 = 0.50 = h(3)
⋮

P(Y = 5|X = 4) = f(5|4) = f(4,5)/g(4) = 0.15/0.60 = 0.25 = h(5)
g(4) 0.60

Here, the conditional distributions of Y for all X’s are equal to the marginal distribution
of Y. So, X and Y are independent.
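For a discrete table like this, the independence check f(x, y) = g(x)h(y) at every cell is easy to automate; a minimal sketch in Python, assuming NumPy is installed:

import numpy as np

# Joint table f(x, y): rows are y = 1, 3, 5; columns are x = 2, 4
f = np.array([[0.10, 0.15],
              [0.20, 0.30],
              [0.10, 0.15]])

g = f.sum(axis=0)    # marginal of X (column sums)
h = f.sum(axis=1)    # marginal of Y (row sums)

# X and Y are independent iff f(x, y) = g(x) h(y) at every cell
print(np.allclose(f, np.outer(h, g)))   # True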
Example:

f(x, y) = (x + y)/8, 0 < x < 2, 0 < y < 2

Show that X and Y are not independent.

Solution:
g(x) = (1/8) ∫_0^2 (x + y) dy = (x + 1)/4, 0 < x < 2

h(y) = (1/8) ∫_0^2 (x + y) dx = (y + 1)/4, 0 < y < 2

Thus, g(x)h(y) = (x + 1)(y + 1)/16 ≠ f(x, y).

Hence, 𝑋 and 𝑌 are not independent.

Alternatively, using the concept of conditional distribution,
f(y|x) = f(x, y)/g(x) = [(x + y)/8] / [(x + 1)/4] = (x + y)/(2(x + 1)) ≠ h(y)

f(x|y) = f(x, y)/h(y) = [(x + y)/8] / [(y + 1)/4] = (x + y)/(2(y + 1)) ≠ g(x)

Since the conditional distributions are not equal to the marginal distributions, the variables are
not independent.

MATHEMATICAL EXPECTATION

Historically, the term mathematical expectation or expected value derives from games of chance. In such games, gamblers were concerned with how much, on average, one would expect to win if the game were continued for a sufficiently long time. In statistical terminology, the term is associated with a random variable and is, in fact, the average value of that random variable generated through a random experiment.

The computation of the expected value of a random variable is straightforward. When the random variable is discrete, it is simply the sum of the products of all possible values of the random variable and their respective probabilities. For a continuous variable, it is analogously defined.

If 𝑋 is a discrete random variable with the probability function 𝑓(𝑥), then the expected value or the
mathematical expectation of 𝑋, 𝐸 (𝑋) is defined as,

E(X) = Σ_x x f(x)

If X is continuous with density function f(x), then

E(X) = ∫_{−∞}^{∞} x f(x) dx

*If C is a constant, 𝐸 (𝐶 ) = 𝐶

*E[C · g(X)] = C · E[g(X)]

*E[W₁(X) + W₂(X) + ⋯ + W_k(X)] = E[W₁(X)] + E[W₂(X)] + ⋯ + E[W_k(X)]

*E[W(X)] = Σ_x W(x)f(x), if X is discrete
 E[W(X)] = ∫_{−∞}^{∞} W(x)f(x) dx, if X is continuous
*The variance of a random variable X is

V(X) = E[(X − μ)²] = E(X²) − μ²,

where μ = E(X) is the expected value.

Example:

x                  −3     −2      0      1      2
f(x) = P(X = x)   0.10   0.30   0.15   0.40   0.05

Find 𝐸(𝑋) and 𝑉(𝑋).

Solution:
μ = E(X) = Σ_x x f(x) = (−3 × 0.10) + (−2 × 0.30) + (0 × 0.15) + (1 × 0.40) + (2 × 0.05) = −0.4

V(X) = E[(X − μ)²] = Σ_x (x − μ)² f(x)

= (−3 + 0.4)² × 0.10 + (−2 + 0.4)² × 0.30 + ⋯ + (2 + 0.4)² × 0.05 = 2.54

Or, E(X²) = Σ_x x² f(x) = (−3)² × 0.10 + (−2)² × 0.30 + ⋯ + 2² × 0.05 = 2.7

∴ V(X) = E(X²) − μ² = 2.7 − (−0.4)² = 2.54

∎ The standard deviation (σ) is the square root of the variance, i.e.

σ = √V(X) = √2.54 ≈ 1.59.
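The same calculation is a one-liner with vectorized arithmetic; a minimal sketch in Python, assuming NumPy is installed:

import numpy as np

x = np.array([-3, -2, 0, 1, 2])
p = np.array([0.10, 0.30, 0.15, 0.40, 0.05])

mu = np.sum(x * p)                 # E(X) = -0.4
var = np.sum((x - mu) ** 2 * p)    # V(X) = E[(X - mu)^2] = 2.54
sigma = np.sqrt(var)               # standard deviation = 1.59...

print(mu, var, sigma)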
Example: A life insurance company in Bangladesh offers to sell a TK. 25,000 one-year term life insurance policy to a 25-year-old man for a premium of TK. 2,500. According to the Bangladesh life table, the probability of surviving one year for a 25-year-old man is 0.97. What is the company's expected gain in the long run?

Solution:

The gain X is a random variable that may take on the value 2500 if the man survives, or 2500 − 25000 = −TK. 22,500 if he dies. Consequently, the probability distribution of X is

X    :   2500    −22500
f(x) :   0.97     0.03

So, E(X) = (2500 × 0.97) + (−22500 × 0.03) = 1750

Thus, the expected long-run gain of the company is TK. 1,750.

Example: Find the mean (expected value) and the variance of the following density function

f(x) = 2(1 − x), 0 < x < 1

Solution:

E(X) = ∫_0^1 x · 2(1 − x) dx = 2[∫_0^1 x dx − ∫_0^1 x² dx] = 2(1/2 − 1/3) = 1/3

E(X²) = ∫_0^1 x² · 2(1 − x) dx = 2(1/3 − 1/4) = 1/6

∴ V(X) = σ² = E(X²) − [E(X)]² = 1/6 − (1/3)² = 1/18

Example: Given the following discrete distribution
f(x) = e^{−m} m^x / x!, x = 0, 1, 2, …

Find 𝐸(𝑋).

𝑆𝑜𝑙𝑢𝑡𝑖𝑜𝑛:

By definition
E(X) = Σ_{x=0}^{∞} x f(x) = Σ_{x=0}^{∞} x e^{−m} m^x / x! = e^{−m} Σ_{x=1}^{∞} m^x / (x − 1)!

= e^{−m} m (1 + m + m²/2! + m³/3! + ⋯) = e^{−m} m e^m = m.

Example: A lot of 7 markers is submitted to a quality inspector; the lot contains 4 good markers and 3 defective markers. A sample of 3 is taken by the inspector. Find the expected value of the number of good markers in this sample.

Solution:

Let X represent the number of good markers in the sample. It can be shown that the probability
distribution of X is
f(x) = (4Cx)(3C(3−x)) / (7C3), x = 0, 1, 2, 3.
Calculation shows that the probability distribution of X is as given in the following table:

Values of X     0      1      2      3
f(x)          1/35   12/35  18/35   4/35

Therefore, E(X) = (0)(1/35) + (1)(12/35) + (2)(18/35) + (3)(4/35) = 12/7 ≈ 1.7.

Thus, if a sample of 3 markers is selected at random over and over again from a lot of 4 good
markers and 3 defective markers, it would contain, on average 1.7 good markers.
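The pmf above is that of a hypergeometric distribution, so the table and mean can be reproduced directly; a minimal sketch in Python, assuming SciPy is available (parameter names follow scipy.stats.hypergeom):

from scipy.stats import hypergeom

# M = 7 markers in the lot, n = 4 good ones, N = 3 drawn without replacement
X = hypergeom(M=7, n=4, N=3)

print([X.pmf(k) for k in range(4)])   # [1/35, 12/35, 18/35, 4/35]
print(X.mean())                       # 1.714... = 12/7, about 1.7 good markers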
Example: In a coin tossing program, a man is promised to receive TK. 5 if he gets all heads or all tails when three coins are tossed, and he pays off (loses) TK. 3 if either one or two heads appear. How much is he expected to gain in the long run?

Solution: The random variable here is the amount the man can win. If X is the random variable, then X will take on the value 5 when the coins show all heads or all tails, and −3 otherwise. The table below shows the outcomes of the experiment, the values of X, and the associated probabilities:

Outcome:   HHH   HTT   HTH   HHT   THH   THT   TTH   TTT
X:          5    −3    −3    −3    −3    −3    −3     5
f(x):      1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8

It appears from the above table that the variable X assumes the values −3 and 5 with probabilities 6/8 and 2/8 respectively. Since the value −3 occurs 6 times and 5 occurs 2 times, the expected value of X is

E(X) = Σ x f(x) = −3(6/8) + 5(2/8) = −1.

Thus, the man is expected to lose TK. 1 in the long run.

Let us now examine what happens if the man receives TK. 5 for all heads or all tails, Tk. “0”
for 2 heads and pays off Tk. 3 for 1 head. The random variable X will now assume the values,
5, 0 and -3 with associated probabilities 2⁄8, 3⁄8 and 3⁄8 respectively. The expected value
in this case will be

E(X) = Σ x f(x) = 5(2/8) + 0(3/8) + (−3)(3/8) = 1/8 = 0.125.

This shows that the man will be a marginal gainer, winning only 12.5 paisa on average.
EXPECTED VALUE OF A FUNCTION OF TWO RANDOM VARIABLES

The notion of mathematical expectation can be extended to two or more random variables. We deal here with the case of two variables, which can be analogously extended to three or more variables. Let X and Y be two random variables with joint probability distribution f(x, y). The expected value of the function w(X, Y) is defined as

E[w(X, Y)] = Σ_x Σ_y w(x, y) f(x, y), if X and Y are discrete

E[w(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} w(x, y) f(x, y) dx dy, if X and Y are continuous

Further, if w(X, Y) is a function of the random variables X and Y, and c is a constant, then

E[c w(X, Y)] = c E[w(X, Y)] = c Σ_y Σ_x w(x, y) f(x, y)

Also, if X and Y are two random variables and w₁(X, Y), w₂(X, Y) are functions of X and Y, then

E[w1 (X, Y) + w2 (X, Y)] = E[w1 (X, Y)] + E[w2 (X, Y)].

Theorem 1: The expected value of the sum of two random variables X and Y is the sum of the expected values of the random variables. Symbolically,

E(X + Y) = E(X) + E(Y).

Corollary 1: If X ≥ Y, then E(X) ≥ E(Y).

Corollary 2: If a and b are two constants, then E(aX + bY) = aE(X) + bE(Y).

Corollary 3: If a₁, a₂, …, a_k are k constants, then for k random variables X₁, X₂, …, X_k,

E[Σ_{i=1}^{k} aᵢ Xᵢ] = Σ_{i=1}^{k} aᵢ E(Xᵢ).

Theorem 2: The expected value of the product of two random variables X and Y is equal to the product of their expected values when the variables are independent, i.e.,

E(XY) = E(X)E(Y)

In other words, the expected value of the product of two independent random variables is equal to the product of their expectations.

Example: Given the following density function

f(x, y) = 2(x + y − 2xy), 0 < x < 1, 0 < y < 1
        = 0, elsewhere

(a) Find E(X), E(Y), E(X + Y) and E(XY)


(b) Also verify whether E(X + Y) = E(X) + E(Y)
(c) Are X and Y independent?

𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧:

The marginal density of X is
g(x) = 2 ∫_0^1 (x + y − 2xy) dy = 2(x + 1/2 − x) = 1, 0 < x < 1

Hence,

E(X) = ∫_0^1 x g(x) dx = ∫_0^1 x dx = 1/2

Similarly,

h(y) = 2 ∫_0^1 (x + y − 2xy) dx = 2(1/2 + y − y) = 1, 0 < y < 1

And

E(Y) = ∫_0^1 y h(y) dy = ∫_0^1 y dy = 1/2

E(X + Y) = 2 ∫_0^1 ∫_0^1 (x + y)(x + y − 2xy) dy dx

= 2 ∫_0^1 ∫_0^1 (x² + y² + 2xy − 2x²y − 2xy²) dy dx

= 2 ∫_0^1 (x² + 1/3 + x − x² − 2x/3) dx

= 2 ∫_0^1 (1/3 + x/3) dx = 1

Since E(X) + E(Y) = 1/2 + 1/2 = 1, we establish that E(X + Y) = E(X) + E(Y).

E(XY) = ∫_0^1 ∫_0^1 xy f(x, y) dy dx

= 2 ∫_0^1 ∫_0^1 xy(x + y − 2xy) dy dx

= 2 ∫_0^1 ∫_0^1 (x²y + xy² − 2x²y²) dy dx

= 2(1/6 + 1/6 − 2/9) = 2/9

Now

E(X)E(Y) = (1/2) × (1/2) = 1/4

And

E(XY) = 2/9

Since E(XY) is not equal to E(X) × E(Y), the variables are not independent.
Example: Given the following density function of X and Y:

f(x, y) = 4xy, 0 < x < 1, 0 < y < 1
        = 0, elsewhere

Obtain E(X) and E(Y).
Solution:

E(X) = ∫_0^1 ∫_0^1 x f(x, y) dy dx = 4 ∫_0^1 ∫_0^1 x²y dy dx = 2 ∫_0^1 x² dx = 2/3

E(Y) = ∫_0^1 ∫_0^1 y f(x, y) dy dx = 4 ∫_0^1 ∫_0^1 xy² dy dx = (4/3) ∫_0^1 x dx = 2/3
Binomial Distribution
When an experiment has two possible outcomes, success and failure, is repeated n times independently, and the probability p of success remains constant from trial to trial, the experiment is known as a binomial experiment.

If X is a binomial random variable, then X ~ Binomial(n, p) and

P(X = x) = f(x) = (nCx) p^x (1 − p)^(n−x), x = 0, 1, 2, …, n;

where n is the number of trials and p is the probability of success.

Mean of a binomial random variable: E(X) = np

Variance of a binomial random variable: V(X) = np(1 − p)

Example: The probability that a patient recovers from a disease is 0.9. What is the probability that exactly 5 out of the next 7 patients will survive?

Solution:

We assume that the recoveries occur independently and that p = 0.9 for each of the seven patients. Each patient is a Bernoulli trial (1: success, i.e. recovers; 0: failure, i.e. does not recover), and the total number of recoveries among the 7 patients is binomial. For example:

patient 1   p2   p3   p4   p5   p6   p7   Total
   1        1    0    0    0    1    0      3
   0        0    0    0    1    0    0      1
(each entry: Bernoulli trial)       (total: binomial)

So, X ~ Binomial(n = 7, p = 0.9)

P(X = x) = (nCx) p^x (1 − p)^(n−x)

P(X = 5) = (7C5)(0.9)^5 (1 − 0.9)^(7−5) = 0.1240

How many patients will survive on average?

E(X) = np = 7 × 0.9 = 6.3

So, 6.3 patients out of 7 will survive on average; that is, 63 patients out of 70 will survive on average.

Example: A traffic control officer reports that 75% of the trucks passing through a check post are from within Dhaka city. What is the probability that at least 3 of the next 5 trucks are from out of the city?

Solution: Let X be the number of trucks, out of the next 5, that are from outside Dhaka city. The probability of this event for any one truck is p = 1 − 0.75 = 0.25, so

X ~ Bin(n = 5, p = 0.25)

P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5)

= Σ_{x=3}^{5} (5Cx)(0.25)^x (0.75)^(5−x)

= (5C3)(0.25)³(0.75)² + (5C4)(0.25)⁴(0.75)¹ + (5C5)(0.25)⁵(0.75)⁰

= 0.1035

Or, P(X ≥ 3) = 1 − P(X = 0) − P(X = 1) − P(X = 2)

= 1 − (5C0)(0.25)⁰(0.75)⁵ − (5C1)(0.25)¹(0.75)⁴ − (5C2)(0.25)²(0.75)³
*Probability that more than 3 trucks are from out of the city:

P(X > 3) = P(X = 4) + P(X = 5) = (5C4)(0.25)⁴(0.75)¹ + (5C5)(0.25)⁵(0.75)⁰ = 0.0156

*Probability that at most 2 trucks are from out of the city:

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = (5C0)(0.25)⁰(0.75)⁵ + ⋯

*Probability that less than 2 trucks are from out of the city:

P(X < 2) = P(X = 0) + P(X = 1)

*P(1 ≤ X < 3) = P(X = 1) + P(X = 2)

*P(1 < X < 3) = P(X = 2)
What is the mean number of trucks from outside Dhaka city passing through the check post?

E(X) = np = 5 × 0.25 = 1.25

So, on average, 125 trucks out of every 500 passing through will be from outside Dhaka city.
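All of these binomial probabilities can be computed directly; a minimal sketch in Python, assuming SciPy is available:

from scipy.stats import binom

n, p = 5, 0.25                    # 5 trucks, P(from outside the city) = 0.25

print(1 - binom.cdf(2, n, p))     # P(X >= 3) = 0.1035...
print(1 - binom.cdf(3, n, p))     # P(X > 3)  = 0.0156...
print(binom.cdf(2, n, p))         # P(X <= 2)
print(binom.mean(n, p))           # E(X) = 1.25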
−𝑂−
If E(X) = 5 and V(X) = 2, then np = 5 and np(1 − p) = 2.

np(1 − p)/np = 2/5

or, 1 − p = 2/5

or, p = 1 − 2/5 = 3/5

Now, np = 5

or, n = 5/p = (5 × 5)/3 = 25/3

X ~ Bin(n = 25/3, p = 3/5)

Poisson Distribution
Let µ be the mean number of successes in a specified time or space, and let the random variable X be the number of successes in the given time interval or specified region. Then X follows the Poisson distribution

f(x) = e^{−µ} µ^x / x!, x = 0, 1, …, ∞

where e ≈ 2.718.

□ The Poisson distribution is used for count data. [µ = mean = variance]

Example: The average number of calls received by a telephone operator during the 10-minute interval from 5:00 PM to 5:10 PM daily is 3. What is the probability that the operator will receive
i. no call
ii. exactly one call
iii. at least two calls
tomorrow during the same time interval?

Solution: Let X be the random variable representing the number of calls made during the interval.

X ~ Poisson(3),

f(x) = e^{−3} 3^x / x!, x ≥ 0

i. P(X = 0) = e^{−3} 3⁰ / 0! = 0.0498
ii. P(X = 1) = e^{−3} · 3 / 1! = 0.1494

iii. P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − 0.0498 − 0.1494 = 0.8008

Mean: E(X) = μ

Variance: V(X) = μ
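A minimal numerical check of these three probabilities in Python, assuming SciPy is available:

from scipy.stats import poisson

mu = 3                            # average number of calls in the interval

print(poisson.pmf(0, mu))         # P(X = 0) = 0.0498
print(poisson.pmf(1, mu))         # P(X = 1) = 0.1494
print(1 - poisson.cdf(1, mu))     # P(X >= 2) = 0.8009 (0.8008 with table rounding)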

Example: Find the mean and standard deviation of a Poisson variate X for which P(X = 1) = P(X = 2).

Solution: Let X ~ Poisson(μ). Then

P(X = 1) = e^{−μ} μ¹ / 1! and P(X = 2) = e^{−μ} μ² / 2!

∴ e^{−μ} μ¹ / 1! = e^{−μ} μ² / 2!

⇒ μ = 2

So, the mean = 2 and the standard deviation = √2.

Poisson approximation to the binomial distribution:

If n is large and p is small, then the Poisson distribution with μ = np provides an approximation to the binomial distribution.

Example: The probability of breaking a glass beaker while heating in the laboratory is 0.012. If we heat 250 such glass beakers, find the probability that there will be exactly 2 breaks.

Solution: Let X be the number of breakages.

X ~ Binomial(n = 250, p = 0.012)

Now, P(X = 2) = (250C2)(0.012)²(1 − 0.012)²⁴⁸ = 0.2245

Since n is large and p is small, we may obtain an approximate probability by using the Poisson distribution with μ = np = 250 × 0.012 = 3:

P(X = 2) ≈ e^{−3} · 3² / 2! = 0.2241.

Example: In a certain industrial facility, accidents occur infrequently. It is known that the probability of an accident on any given day is 0.005 and accidents are independent of each other.

a. What is the probability that in any given period of 400 days (about 1 year) there will be an accident on one day?

b. What is the probability that there are at most three days with an accident?

Solution: Let X be a binomial random variable with n = 400 and p = 0.005. Thus, np = 2. Using the Poisson approximation,

(a) P(X = 1) = e^{−2} · 2¹ = 0.271

and

(b) P(X ≤ 3) = Σ_{x=0}^{3} e^{−2} 2^x / x! = 0.857.
Example: In a manufacturing process where glass products are made, defects or bubbles occasionally occur, rendering the piece undesirable for marketing. It is known that, on average, 1 in every 1000 of these items produced has one or more bubbles. What is the probability that a random sample of 8000 will yield fewer than 7 items possessing bubbles?

Solution: This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is very close to 0 and n is quite large, we approximate with the Poisson distribution using μ = (8000)(0.001) = 8. Hence, if X represents the number of items with bubbles, we have

P(X < 7) = Σ_{x=0}^{6} b(x; 8000, 0.001) ≈ Σ_{x=0}^{6} p(x; 8) = 0.3134.
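How good is the approximation in these two examples? A minimal comparison in Python, assuming SciPy is available:

from scipy.stats import binom, poisson

# Beaker example: n = 250, p = 0.012, so mu = np = 3
print(binom.pmf(2, 250, 0.012))    # exact binomial: 0.2245
print(poisson.pmf(2, 3))           # Poisson approximation: 0.2240

# Bubbles example: n = 8000, p = 0.001, so mu = np = 8
print(binom.cdf(6, 8000, 0.001))   # exact: 0.3134...
print(poisson.cdf(6, 8))           # approximation: 0.3134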
Normal Distribution
A random variable X is said to have a normal distribution with mean µ and variance σ² (−∞ < µ < ∞ and σ² > 0) if X has a continuous distribution with probability density function

f(x) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²}, −∞ < x < ∞

*Normal probability law:


P (µ-3σ < X < µ+3σ) = 99.73%

P (µ-2σ < X < µ+2σ) = 95.45%
P (µ-σ < X < µ+σ) = 68.27%

Standard Normal Distribution:

If a random variable X has a normal distribution with mean µ and variance σ² (i.e. X ~ N(µ, σ²)), then the variable Z = (X − µ)/σ is called a standard normal variable (or Z-score) and its distribution is referred to as the standard normal distribution, having the following density function:

f(z) = (1/√(2π)) e^{−z²/2}, −∞ < z < ∞
E(Z) = E((X − µ)/σ) = (1/σ) E(X − µ) = (1/σ)[E(X) − µ] = (1/σ)[µ − µ] = 0

V(Z) = V((X − µ)/σ) = (1/σ²) V(X − µ) = (1/σ²) V(X) = σ²/σ² = 1

(since µ is a constant, V(X − µ) = V(X))

Thus, Z ~ N (0,1)

The cumulative distribution function (cdf) of the standard normal variable Z is usually denoted by Φ(z). Thus

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−t²/2} dt

Example: The GPA score of 80 students of the Department of Physics of University of Dhaka
in their 1st year final exam was found to follow approximately a normal distribution with mean
2.1 and standard deviation 0.6. How many of these students are expected to have a score
between 2.5 and 3.5?

Solution: Let X be the GPA score of the 80 students.

X ~ N(2.1, 0.6²)
Now, P(2.5 < X < 3.5) = P((2.5 − 2.1)/0.6 < (X − µ)/σ < (3.5 − 2.1)/0.6)

= P(0.67 < Z < 2.33)

= P(Z < 2.33) − P(Z < 0.67)

= 0.9901 − 0.7486 = 0.2415

Hence, about 24% or approximately 0.2415 × 80 ≈ 19 students out of 80 are expected to make a score between 2.5 and 3.5.

Example: If X is a normal variate with mean 25 and variance 9, find K such that
i. 30% of the area under the normal curve lies to the left of K.
ii. 15% of the area under the normal curve lies to the right of K.

Solution:

X ~ N(25, 9), i.e., µ = 25, σ = 3

i. P(X < K) = 0.30

⇒ P(Z < (K − 25)/3) = 0.30

The standard normal table shows that P(Z < −0.525) = 0.30
Hence,

(K − 25)/3 = −0.525 ⇒ K = 23.425

ii. P(X > K) = 0.15

or, P(Z > (K − 25)/3) = 0.15

or, P(Z < (K − 25)/3) = 1 − 0.15 = 0.85

The standard normal table shows that P(Z < 1.035) = 0.85

Hence,

(K − 25)/3 = 1.035 ⇒ K = 28.105.
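Such cut-off values come from the inverse of the normal cdf (the percent-point function); a minimal sketch in Python, assuming SciPy is available:

from scipy.stats import norm

mu, sigma = 25, 3

K1 = norm.ppf(0.30, loc=mu, scale=sigma)   # 30% of the area to the left of K
K2 = norm.ppf(0.85, loc=mu, scale=sigma)   # 15% to the right = 85% to the left

print(K1)   # 23.43 (the table value -0.525 gives 23.425)
print(K2)   # 28.11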

Example: Given a standard normal distribution, find the area under the curve that lies
(a) to the right of z = 1.84

(b) between z = −1.97 and z = 0.86.

Solution: See Figure 6.9 for the specific areas.
(a) The area in Figure 6.9(a) to the right of z = 1.84 is equal to 1 minus the area in Z-table to
the left of z = 1.84, namely, 1 − 0.9671 = 0.0329.

(b) The area in Figure 6.9(b) between z = −1.97 and z = 0.86 is equal to the area to the left of
z = 0.86 minus the area to the left of z = −1.97. From Z-table we find the desired area to be
0.8051 − 0.0244 = 0.7807.

Example: Given a random variable X having a normal distribution with μ = 50 and σ = 10,
find the probability that X assumes a value between 45 and 62.

Solution: The z values corresponding to x₁ = 45 and x₂ = 62 are

z₁ = (45 − 50)/10 = −0.5 and z₂ = (62 − 50)/10 = 1.2

Therefore,
𝑃(45 < 𝑋 < 62) = 𝑃(−0.5 < 𝑍 < 1.2).
𝑃(−0.5 < 𝑍 < 1.2) is shown by the area of the shaded region in the Figure.

This area may be found by subtracting the area to the left of the ordinate 𝑧 = −0.5 from the
entire area to the left of z = 1.2. Using Z-table, we have
𝑃 (45 < 𝑋 < 62) = 𝑃(−0.5 < 𝑍 < 1.2) = 𝑃 (𝑍 < 1.2) − 𝑃(𝑍 < −0.5)
= 0.8849 − 0.3085 = 0.5764.
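The same area is obtained directly from the normal cdf; a minimal sketch in Python, assuming SciPy is available:

from scipy.stats import norm

mu, sigma = 50, 10

p = norm.cdf(62, mu, sigma) - norm.cdf(45, mu, sigma)
print(p)    # 0.5764... = P(45 < X < 62)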

Example: In an industrial process, the diameter of a ball bearing is an important measurement.


The buyer sets specifications for the diameter to be 3.0 ± 0.01 cm. The implication is that no
part falling outside these specifications will be accepted. It is known that in the process the
diameter of a ball bearing has a normal distribution with mean μ = 3.0 and standard deviation
σ = 0.005. On average, how many manufactured ball bearings will be scrapped?

Solution: The distribution of diameters is illustrated by Figure 6.16. The values


corresponding to the specification limits are x1 = 2.99 and x2 = 3.01. The corresponding z
values are
z₁ = (2.99 − 3.0)/0.005 = −2.0 and z₂ = (3.01 − 3.0)/0.005 = +2.0

Hence,

P(2.99 < X < 3.01) = P(−2.0 < Z < 2.0).

From the Z-table, P(Z < −2.0) = 0.0228.

Due to the symmetry of the normal distribution, we find that

P(Z < −2.0) + P(Z > 2.0) = 2(0.0228) = 0.0456.

As a result, it is anticipated that, on average, 4.56% of manufactured ball bearings will be scrapped.

Continuity Correction

P(a ≤ X ≤ b) ≈ P(a − 0.5 < X′ < b + 0.5)

where X′ is the corresponding continuous normal variable.

Let X ~ Bin(n = 10, p = 0.5). Find P(2 ≤ X ≤ 4) using both the binomial distribution and the normal distribution.

Solution: Using the binomial distribution:

P(2 ≤ X ≤ 4) = Σ_{x=2}^{4} (10Cx)(0.5)^x (1 − 0.5)^(10−x) = 0.3662

Now, P(2 ≤ X ≤ 4) ≈ P(2 − 0.5 ≤ X′ ≤ 4 + 0.5) = P(1.5 ≤ X′ ≤ 4.5)

Using the normal distribution with µ = np = 5 and σ² = np(1 − p) = 2.5:

P(1.5 ≤ X′ ≤ 4.5) = P((1.5 − 5)/√2.5 ≤ Z ≤ (4.5 − 5)/√2.5)

= P(−2.21 ≤ Z ≤ −0.316)

= P(Z ≤ −0.316) − P(Z ≤ −2.21)

= 0.3764 − 0.0136 = 0.362.
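The exact and continuity-corrected answers can be compared directly; a minimal sketch in Python, assuming SciPy and NumPy are available:

import numpy as np
from scipy.stats import binom, norm

n, p = 10, 0.5
exact = binom.cdf(4, n, p) - binom.cdf(1, n, p)                # P(2 <= X <= 4) = 0.3662

mu, sigma = n * p, np.sqrt(n * p * (1 - p))
approx = norm.cdf(4.5, mu, sigma) - norm.cdf(1.5, mu, sigma)   # continuity-corrected

print(exact, approx)    # 0.3662 vs about 0.362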