
Probability and Statistics (Stat 2061) Lecture Note

CHAPTER 3
INTRODUCTION TO PROBABILITY
3.1 Introduction
• Probability theory is the foundation upon which the logic of inference is built.
• It helps us to cope with uncertainty.
• In general, probability is the chance of an outcome of an experiment. It is the measure of
how likely an outcome is to occur.
Deterministic and nondeterministic models
1. Deterministic models: a model which stipulates that the conditions under which the experiment is performed determine the outcome of that experiment. For specified values of an experiment's input parameters, the result is a known constant.
• Example: Ohm's law, V = IR.
2. Nondeterministic models: also called probabilistic or stochastic models. A model in which chance plays an important role in determining the outcome.
Example: the number of α particles emitted from a piece of radioactive material in one minute.
Definitions of some probability terms
1. Experiment: Any process of observation or measurement, or any process which generates a well defined outcome.
2. Probability Experiment: An experiment that can be repeated any number of times under similar conditions, where it is possible to enumerate the total number of outcomes without predicting an individual outcome. It is also called a random experiment.
Example: If a fair die is rolled once, it is possible to list all the possible outcomes, i.e. 1, 2, 3, 4, 5, 6, but it is not possible to predict which outcome will occur.
3. Outcome: The result of a single trial of a random experiment.
4. Sample Space: The set of all possible outcomes of a probability experiment, usually denoted by S.
A sample space can be
 Countable (finite or infinite)
 Uncountable.
5. Event: A subset of the sample space; a statement about one or more outcomes of a random experiment. Events are denoted by capital letters. In other words, an event is a set of outcomes, but not necessarily all of them.
An event can be:
 Simple or elementary event: an event having only a single element or sample point.
 Compound or composite event: an event which contains more than one element.
 Impossible or null event: an event equal to the empty set.
 Sure event: an event equal to the sample space.

Example: Considering the above experiment let A be the event of odd numbers, B be the
event of even numbers, and C be the event of number 8.
⇒ A = {1,3,5}
B = {2,4,6}
C = { }, the empty set: an impossible event.
Remark: If S (sample space) has n members then there are exactly 2^n subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an Event: the complement of an event A means the non-occurrence of A and is denoted by A′ or Aᶜ; it contains those points of the sample space which do not belong to A.
8. Mutually Exclusive Events: Two events which cannot happen at the same time.
9. Independent Events: Two events are independent if the occurrence of one does
not affect the probability of the other occurring.
10. Dependent Events: Two events are dependent if the first event affects the outcome or occurrence of the second event in a way that changes its probability.
11. Finite sample space: it is a sample space which consists of a finite number of
elements. Suppose that S = {x1, x2, …, xn} where xi’s are possible outcomes of an
experiment, then S is a finite sample space.
Definition: To every point in the sample space we assign a real number between 0 and 1 called its probability, satisfying the following conditions.
Let Pi be the probability of the i-th element of the sample space:
• Pi ≥ 0 for all i
• ∑_{i=1}^{n} Pi = 1, i.e. the sum of the probabilities of all points in the sample space is one.

Example: Suppose that only three outcomes are possible in an experiment, say a1, a2 and a3. Suppose furthermore that a1 is twice as probable as a2, which in turn is twice as probable as a3. Find the probability of each ai, i = 1, 2, 3.
Solution: Let p1, p2, and p3 be their respective probabilities; then ∑_{i=1}^{3} pi = 1 with p1 = 2p2 and p2 = 2p3. Hence
p1 + p2 + p3 = 4p3 + 2p3 + p3 = 7p3 = 1 ⇔ p3 = 1/7
Therefore p2 = 2p3 = 2/7 and p1 = 2p2 = 4/7.
Equally likely outcomes
The most common assumption for a finite sample space is that the outcomes are equally likely: two outcomes are equally likely if neither is expected to occur in preference to the other.
If an experiment has n possible outcomes, all equally likely, then the probability of each outcome is Pi = 1/n.

Proof: Let S = {a1, a2, a3, …, an} be a finite sample space with respective probabilities pi, so that ∑_{i=1}^{n} pi = 1.
Since p1, p2, p3, …, pn are equally likely, let p1 = p2 = p3 = … = pn = pi; then
p1 + p2 + p3 + … + pn = n·pi = 1 ⇔ pi = 1/n.
Generally, suppose that there is a situation with n equally likely possible outcomes and that m of those n outcomes correspond to a particular event A; then the probability of event A is defined as
P(A) = m/n
Proof: A = {a1, a2, a3, …, am}, m ≤ n.
P(A) = p(a1) + p(a2) + … + p(am) = pi + pi + … + pi = m·pi, and since pi = 1/n,
P(A) = m·pi = m/n.
n
3.2 Counting Rules
In order to calculate probabilities, we have to know
• The number of elements of an event
• The number of elements of the sample space.
That is in order to judge what is probable, we have to know what is possible.
• In order to determine the number of outcomes, one can use several rules of counting:
- The addition rule
- The multiplication rule
- The permutation rule
- The combination rule
• To list the outcomes of a sequence of events, a useful device called a tree diagram is used.
Addition Rule
If event A can occur in m possible ways and event B can occur in n possible ways, there are m + n possible ways for either event A or event B to occur, but only if there are no outcomes in common between them; in general n(A or B) = n(A) + n(B) − n(A∩B). Suppose that there are k procedures and the 1st procedure can be performed in n1 ways, the 2nd procedure in n2 ways, ..., and the kth procedure in nk ways. Assume that no two of the procedures 1 to k can be performed together (the procedures are mutually exclusive); then the number of ways in which we may perform procedure 1 or 2 or 3 or ... or k is
n1 + n2 + n3 + ... + nk.
Example 1: Suppose you are planning a trip and are deciding between bus and train transport. If there are 2 bus routes and 3 train routes to go from A to B, how many different routes are available for the trip?
Solution: Assuming that bus routes and train routes are mutually exclusive, by the addition rule we have n1 + n2 = 2 + 3 = 5 different routes.

The Multiplication Rule:
If a choice consists of k steps, of which the first can be made in n1 ways, the second in n2 ways, …, and the kth in nk ways, then the whole choice (step one, followed by step two, …, followed by step k) can be made in n1 · n2 · … · nk ways.
Example 1: A quiz consists of 3 true-or-false questions. In how many ways can a student
answer the quiz?
Solution: There are 3 questions. Each question has 2 possible answers (true or false), so the quiz
may be answered in 2 · 2 · 2 = 8 different ways.
Example 2: How many sample points are, there in the sample space when a pair of dice is
thrown once?
Solution: The first die can land in any one of n1 = 6 ways and the second die can also land in n2
= 6 ways. Therefore, the pair of dice can land in n1n2 = (6)(6) = 36 possible ways.
Example 3: A computer lab technician is going to assemble a computer by himself. He has the choice of ordering chips from two brands, a hard drive from four, memory from three, and an accessory bundle from five local stores. In how many different ways can he order the parts?
Solution: Since n1 = 2, n2 = 4, n3 = 3, and n4 = 5, there are n1·n2·n3·n4 = 2 × 4 × 3 × 5 = 120 different ways to order the parts.
Note: Multiplication principle is used when we are sampling with replacement.
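As a quick sanity check, here is a minimal Python sketch (standard library only; the brute-force enumeration and variable names are illustrative, not part of the notes) confirming the product rule for Example 3:

from itertools import product

# Multiplication rule: 2 chip brands x 4 hard drives x 3 memory options x 5 stores
chips, drives, memory, stores = 2, 4, 3, 5
print(chips * drives * memory * stores)  # 120

# Brute-force enumeration of all part combinations agrees with the product rule
orders = list(product(range(chips), range(drives), range(memory), range(stores)))
print(len(orders))  # 120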
Permutation principle
Definition: A permutation is an arrangement of all or part of a set of objects in a specified
order.
If n is a positive integer then n! = 1·2·3·…·(n−1)·n.
By convention, 0! = 1! = 1.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is nPn = n!
2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n objects taken r at a time. It is written as nPr, and the formula is
nPr = n! / (n − r)!
3. The number of permutations of n objects in which k1 are alike, k2 are alike, etc. is
n! / (k1!·k2!·…·km!), where k1 + k2 + … + km = n.
Remark: A set of n distinct objects can be arranged in a circle in (n-1)! ways.
Examples:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four letters?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word “CORRECTION”?
Solutions:

1.
a) Here n = 4; there are four distinct objects, so there are 4! = 24 permutations.
b) Here n = 4 and r = 2, so there are 4P2 = 4!/(4 − 2)! = 24/2 = 12 permutations.
2. Here n = 10, of which 2 are C, 2 are O, 2 are R, and E, T, I, N occur once each, so
k1 = 2, k2 = 2, k3 = 2, k4 = k5 = k6 = k7 = 1.
Using the 3rd rule of permutation, there are 10!/(2!·2!·2!·1!·1!·1!·1!) = 453,600 permutations.
Example 3: In one year, three awards (research, teaching, and service) will be given to a class of 25 graduate students in a statistics department. If each student can receive at most one award, how many possible selections are there?
Solution: Since the awards are distinguishable, it is a permutation problem. The total number of sample points is 25P3 = 25!/(25 − 3)! = 25!/22! = 25 × 24 × 23 = 13,800.
Notice: Permutations are used when we are sampling without replacement and order matters.
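The permutation counts above can be checked with a short Python sketch (math.perm assumes Python 3.8+; the factorial expression mirrors the 3rd rule):

import math
from math import factorial

print(math.perm(4, 4))   # 4P4 = 4! = 24 arrangements of A, B, C, D
print(math.perm(4, 2))   # 4P2 = 12 two-letter arrangements
print(math.perm(25, 3))  # 25P3 = 13,800 award selections

# Permutations with repeated letters: "CORRECTION" has C, O, R twice each
print(factorial(10) // (factorial(2) ** 3))  # 453,600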
Combination principle
Definition: Combination is a selection of objects without regard to order.
Example 1: Given the letters A, B, C, and D list the permutation and combination for
selecting two letters.
Solutions:
Permutation: AB AC AD BA BC BD CA CB CD DA DB DC
Combination: AB AC AD BC BD CD
Note that in permutation AB is different from BA. But in combination AB is the same as BA.
The combination rule states that the number of combinations of r objects selected from n objects is denoted by nCr or C(n, r) and is defined as:
C(n, r) = n! / ((n − r)!·r!)
Example 2: Among 15 clocks there are two defectives. In how many ways can an inspector choose three of the clocks for inspection so that:
a) There is no restriction;
b) None of the defective clocks is included;
c) Only one of the defective clocks is included;
d) Two of the defective clocks are included?
Solutions: n = 15, of which 2 are defective and 13 are non-defective; r = 3.
a) If there is no restriction, select three clocks from 15 clocks, which can be done in
C(15, 3) = 15!/(12!·3!) = 455 ways.
b) None of the defective clocks is included.

This is equivalent to zero defective and three non-defective, which can be done in
C(2, 0)·C(13, 3) = 286 ways.
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defective, which can be done in
C(2, 1)·C(13, 2) = 156 ways.
d) Two of the defective clocks are included.
This is equivalent to two defective and one non-defective, which can be done in
C(2, 2)·C(13, 1) = 13 ways.
Notice: Combinations are used when we are sampling without replacement and order does
not matter.
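A minimal check of the clock counts in Python (math.comb assumes Python 3.8+):

from math import comb

print(comb(15, 3))               # (a) 455 choices with no restriction
print(comb(2, 0) * comb(13, 3))  # (b) 286 ways with no defectives
print(comb(2, 1) * comb(13, 2))  # (c) 156 ways with exactly one defective
print(comb(2, 2) * comb(13, 1))  # (d) 13 ways with both defectives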
3.3 Approaches to measuring Probability
If A is an event associated with an experiment, we cannot state with certainty whether A will occur or not. Hence it becomes very important to associate a number with the event A which measures how likely it is that A will occur. This leads to the theory of probability.
There are four different conceptual approaches to the definition of probability:
• The classical approach
• The frequentist approach
• The axiomatic approach
• The subjective approach

The classical approach: This approach is used when:


- All outcomes are equally likely.
- The total number of outcomes is finite, say N.
Definition: If a random experiment with N equally likely outcomes is conducted, and out of these N_A outcomes are favourable to the event A, then the probability that event A occurs, denoted P(A), is defined as:
P(A) = N_A / N = (number of outcomes favourable to A) / (total number of outcomes) = n(A)/n(S)
Assumptions:
 It cannot be used if the outcomes are not equally likely.
 It can be used only for a finite sample space.
Example: A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are selected at random, what is the probability that
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?
Solutions: Total number of selections = C(80, 10) = N = n(S).
a) Let A be the event that all will be defective.
The number of ways in which A can occur = C(30, 10)·C(50, 0) = N_A = n(A).


 30   50 
 * 
n( A)  10   0 
⇒ P ( A) = = = 0.00001825
n( S )  80 
 
 10 
b) Let A be the event that 6 will be non defective.
 30   50 
Total way in which A occur =   *   = N A = n( A)
4 6
 30   50 
 * 
n ( A )  4   6 
⇒ P ( A) = = = 0 .265
n(S )  80 
 
 10 
c) Let A be the event that all will be non defective.
 30   50 
Total way in which A occur =   *   = N A = n ( A )
 0   10 
 30   50 
 * 
n( A)  0   10 
⇒ P ( A) = = = 0.00624
n( S )  80 
 
 10 
Exercise 1: What is the probability that a waitress will refuse to serve alcoholic beverages
to only three minors if she randomly checks the I.D’s of five students from among ten
students of which four are not of legal age?
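The classical-approach probabilities for the candle example can be reproduced with a short Python sketch (math.comb, Python 3.8+):

from math import comb

total = comb(80, 10)                       # n(S), all ways to pick 10 of 80
print(comb(30, 10) * comb(50, 0) / total)  # (a) ~0.00001825, all defective
print(comb(30, 4) * comb(50, 6) / total)   # (b) ~0.265, six non-defective
print(comb(30, 0) * comb(50, 10) / total)  # (c) ~0.00624, all non-defective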
Frequentist Approach
If, after n repetitions of an experiment where n is very large, an event A is observed to occur in k of these, then the probability of event A is the proportion of outcomes favourable to A in the long run when the experiment is repeated a large number of times:
P(A) = lim_{n→∞} k/n
Note: the frequentist definition of probability holds in the case where we have large n (n → ∞).
Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. For any event A,
defined on S we associated a real number P(A), called the probability of the event A
satisfying the following requirements called axioms of probability or rules of probability.
1. 0 ≤ P(A) ≤ 1
2. P(S) = 1, where S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. P(A∪B) = P(A) + P(B).


4. If A1, A2, …, Ai, … is a sequence of pairwise mutually exclusive events, then
P(∪_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} P(Ai)
Some Important Theorems on Probability


1. P(∅) = 0, where ∅ is the impossible event.
2. If A′ is the complement (non-occurrence) of event A, then P(A′) = 1 − P(A).
3. Let A and B be two events associated with the sample space of the same random experiment; then p(A∪B) = p(A) + p(B) − p(A∩B).
4. For any three events A, B, and C,
p(A∪B∪C) = p(A) + p(B) + p(C) − p(A∩B) − p(A∩C) − p(B∩C) + p(A∩B∩C)
5. If A is a subset of B (A ⊆ B), then P(A) ≤ P(B).
6. The probability that exactly one of the events A or B will occur is given by
p(A) + p(B) − 2p(A∩B)

Remark: Venn-diagrams can be used to solve probability problems

Examples:
1. Three companies A, B and C provide cell phone coverage in a rural area. For a randomly chosen location in this area, the probabilities of coverage for the first two companies are P(A) = 0.8 and P(B) = 0.75; we also know P(A∪B) = 0.9 and P(B∩C) = 0.45.
a. What is the probability of not having coverage from company A?
b. What is the probability of having coverage from both company A and company B?
c. Company A claims it has better coverage than company C. Can you verify this?
d. If you own two cell phones, one from company B and one from company C, what is your worst-case coverage?
Solution:
(a) P(A′) = 1 − P(A) = 1 − 0.8 = 0.2.
(b) P(A∩B) = P(A) + P(B) − P(A∪B) = 0.8 + 0.75 − 0.9 = 0.65.
(c) Let us find the maximum possible value of P(C). The only information we have relating to C is P(B∩C) = 0.45. Now P(B) − P(B∩C) = 0.75 − 0.45 = 0.3 is the probability of B occurring without C; since C can at most be the complement of this event,
max(P(C)) = 1 − 0.3 = 0.7 < 0.8 = P(A). Hence company A's claim is true.


(d) This question asks for the minimum possible value of P(B∪C), which occurs when C is a subset of B. We have P(B∪C) = P(B) + P(C) − P(B∩C), and in this case P(B∩C) = P(C), so P(B∪C) = P(B) + P(C) − P(C) = P(B). Hence min(P(B∪C)) = P(B) = 0.75.
2. Suppose that A and B are two events for which P(A) = a, P(B) = b and P(A∩B) = c. Find:
i. P(A′∪B)
ii. P(A′∩B′)
Solution:
(i) P(A′∪B) = P(A′) + P(B) − P(A′∩B) = 1 − P(A) + P(B) − (P(B) − P(A∩B))
= 1 − P(A) + P(A∩B) = 1 − a + c.
(ii) P(A′∩B′) = P((A∪B)′) = 1 − P(A∪B) = 1 − (P(A) + P(B) − P(A∩B))
= 1 − a − b + c.
Exercise:
1. If P(A) = 0.75 and P(B) = 0.25, can we say that A and B are mutually exclusive?
2. Let P(A′) = a and P(B′) = b; show that P(A∩B) ≥ 1 − a − b.

CHAPTER 4
CONDITIONAL PROBABILITY AND INDEPENDENCE
4.1 Conditional probability
Recall that when we use the symbol P(A) for the probability of event A, we really mean the probability of event A with respect to the sample space S. Suppose that we have some additional information that the outcome of a trial is contained in a subset of the sample space, say B, with P(B) ≠ 0. The resulting probability is called a conditional probability.
Conditional Events: If the occurrence of one event has an effect on the occurrence of the other event, then the two events are conditional or dependent events.
Example: Suppose we have two red and three white balls in a bag.
1. Draw a ball with replacement.
Let A = the event that the first draw is red, so p(A) = 2/5, and
B = the event that the second draw is red, so p(B) = 2/5. A and B are independent.
2. Draw a ball without replacement.
Let A = the event that the first draw is red, so p(A) = 2/5, and
B = the event that the second draw is red; then p(B) = ? This is conditional. To determine P(B) we need some information about P(A).


Definition: Let A and B be two events in the sample space S with P(B) ≠ 0. The probability that event A occurs, given that event B has already occurred, is called the conditional probability of A given B, denoted p(A|B), and is defined as
P(A|B) = P(A∩B)/P(B), p(B) ≠ 0.
Note:
1. P(A|B) satisfies the various axioms of probability:
i. 0 ≤ P(A|B) ≤ 1
ii. P(S|B) = 1
iii. P(B1∪B2 | B) = p(B1|B) + p(B2|B), provided that B1∩B2 = ∅
iv. For any sequence of events B1, B2, …, where Bj∩Bi = ∅ for all i ≠ j,
P(∪_{i=1}^{∞} Bi | B) = ∑_{i=1}^{∞} p(Bi|B)
2. If B = S then P(A|B) = P(A).
Proof: p(A|S) = p(A∩S)/p(S) = p(A)/1 = p(A), because A∩S = A.
Note: P(A|B) ≠ P(B|A) in general.
Theorem 1: p(A|B) + p(A′|B) = 1
Proof: Using the definition of conditional probability we have
p(A|B) + p(A′|B) = p(A∩B)/p(B) + p(A′∩B)/p(B) = [p(A∩B) + p(A′∩B)]/p(B);
but since A and A′ form a partition of S we have
p(B) = p(A∩B) + p(A′∩B) ⇒ p(A|B) + p(A′|B) = 1.
It follows that:
i. p(A′|B) = 1 − p(A|B)
ii. p(B′|A) = 1 − p(B|A)

4.2 Multiplication theorem of probability


Recall from the definition of conditional probability that P(A|B) = P(A∩B)/P(B), P(B) ≠ 0; it follows that P(A∩B) = p(A|B)·p(B). Therefore the probability that two events A and B both occur is given by the multiplication rule
P(A∩B) = P(B|A)·P(A) = P(A|B)·P(B)
In general, let A1, A2, A3, …, An be events defined on the same sample space such that P(∩_{i=1}^{n} Ai) > 0; then
P(∩_{i=1}^{n} Ai) = p(A1)·p(A2|A1)·p(A3|A1∩A2)⋯p(An|A1∩A2∩…∩A_{n−1})
Note: For any two events A and B the following relation holds:
p(B) = p(B|A)·p(A) + p(B|A′)·p(A′)
4.3 Theorem of Total Probability
Let A be an event defined on S, and let B1, B2, …, Bm be a partition of S (mutually exclusive events of nonzero probability whose union is the sample space S).


Then
p(A) = p((A∩B1) ∪ (A∩B2) ∪ … ∪ (A∩Bm)) = p(A∩B1) + p(A∩B2) + … + p(A∩Bm)
= p(A|B1)p(B1) + p(A|B2)p(B2) + … + p(A|Bm)p(Bm)
Hence p(A) = ∑_{i=1}^{m} p(Bi)·p(A|Bi). This is called the total probability theorem.
Now, if we are interested in determining the value of P(Bk|A),
P(Bk|A) = P(Bk∩A)/P(A) = P(Bk)·P(A|Bk) / ∑_{i=1}^{m} P(Bi)·P(A|Bi), k = 1, 2, …, m.
This is called Bayes' Theorem.
Example: In a binary communication system a zero and a one are transmitted with probability 0.6 and 0.4 respectively. Due to error in the communication system, a zero becomes a one with probability 0.1, and a one becomes a zero with probability 0.08. Determine the probability (i) of receiving a one, and (ii) that a one was transmitted when the received message is a one.
Solution: Let S be the sample space corresponding to the binary communication. Let T0 be the event of transmitting 0, T1 the event of transmitting 1, and R0 and R1 the corresponding events of receiving 0 and 1 respectively.
Given: P(T0) = 0.6, P(T1) = 0.4, P(R1|T0) = 0.1 and P(R0|T1) = 0.08, so P(R1|T1) = 1 − 0.08 = 0.92.
(i) p(R1), the probability of receiving 'one', by the total probability theorem:
P(R1) = p(T1)p(R1|T1) + p(T0)p(R1|T0) = 0.4 × 0.92 + 0.6 × 0.1 = 0.428
(ii) Using Bayes' rule,
P(T1|R1) = p(T1)p(R1|T1)/p(R1) = (0.4 × 0.92)/(0.4 × 0.92 + 0.6 × 0.1) = 0.368/0.428 ≈ 0.860
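A minimal numeric sketch of this total-probability/Bayes computation in Python (plain arithmetic, nothing assumed beyond the figures given above):

p_t0, p_t1 = 0.6, 0.4        # prior transmit probabilities
p_r1_given_t0 = 0.1          # a transmitted 0 is flipped to 1
p_r1_given_t1 = 1 - 0.08     # a transmitted 1 is received correctly

p_r1 = p_t1 * p_r1_given_t1 + p_t0 * p_r1_given_t0  # total probability theorem
p_t1_given_r1 = p_t1 * p_r1_given_t1 / p_r1         # Bayes' theorem
print(p_r1, p_t1_given_r1)   # 0.428 and ~0.8598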

4.4 Independent events


Two events A and B are independent if the occurrence of one does not affect the probability of the other occurring. Two events A and B are independent if and only if P(A∩B) = P(A)·P(B); otherwise, A and B are called dependent events.
Remark: If A and B are mutually exclusive events, then they are independent only if P(A) = 0 or P(B) = 0.


Proof: P(A∩B) = 0 since A and B are mutually exclusive, while independence requires P(A∩B) = P(A)·P(B). Hence
p(A)·p(B) = 0 ⇒ p(A) = 0 or p(B) = 0 ⇔ either A or B is (possibly) the empty set.
Theorem: Let A and B be independent events. Then
i. A′ and B are independent.
Proof: P(A′∩B) = P(B\A) = P(B) − P(A∩B) = P(B) − P(A)P(B)
= P(B){1 − P(A)} = P(B)·P(A′).
Hence A′ and B are independent.
ii. Similarly, one can show that A and B′ are independent.
iii. A′ and B′ are independent.
Proof: P(A′∩B′) = P((A∪B)′) = 1 − P(A∪B) = 1 − {P(A) + P(B) − P(A)P(B)}
= {1 − P(A)}·{1 − P(B)} = P(A′)·P(B′)
Hence A′ and B′ are independent.
Example 1: The probability that a regularly scheduled flight departs on time is P(D) = 0.83; the probability that it arrives on time is P(A) = 0.82; and the probability that it departs and arrives on time is P(D∩A) = 0.78. Find the probability that a plane (a) arrives on time given that it departed on time, and (b) departed on time given that it has arrived on time.
Solution: The probability that a plane arrives on time given that it departed on time is
P(A|D) = P(A∩D)/P(D) = 0.78/0.83 = 0.94
The probability that a plane departed on time given that it has arrived on time is
P(D|A) = P(A∩D)/P(A) = 0.78/0.82 = 0.95
Example 2: A new computer program consists of two modules. The first module contains an error with probability 0.2. The second module is more complex; it has a probability of 0.4 of containing an error, independently of the first module. An error in the first module alone causes the program to crash with probability 0.5. For the second module, this probability is 0.8. If there are errors in both modules, the program crashes with probability 0.9. Suppose the program crashed. What is the probability of errors in both modules?
Solution: Define the events A, B, and C as follows:
A = {errors in module I}, B = {errors in module II}, C = {program crash}.
It is given that P(A) = 0.2 and P(B) = 0.4, and by independence P(A∩B) = (0.2)(0.4) = 0.08.
We need to compute P(A∩B | C).
Since A is a union of the disjoint events A∩B′ and A∩B, we compute
P(A∩B′) = P{errors in module I alone} = P(A) − P(A∩B) = 0.2 − 0.08 = 0.12.
Similarly, P(B∩A′) = P{errors in module II alone} = 0.4 − 0.08 = 0.32.
By independence,
P(C | A∩B′) = 0.5, P(C | B∩A′) = 0.8, and P(C | A∩B) = 0.9.


The events A∩B′, B∩A′, A∩B, and (A∪B)′ form a partition of S, because they are mutually exclusive and exhaustive.
Therefore, combining Bayes' rule and the law of total probability,
P(A∩B | C) = P(C | A∩B)·P(A∩B) / P(C)
where
P(C) = P(C|A∩B′)·P(A∩B′) + P(C|B∩A′)·P(B∩A′) + P(C|A∩B)·P(A∩B) + P(C|A′∩B′)·P(A′∩B′).
Then P(A∩B | C) = (0.9 × 0.08) / (0.5 × 0.12 + 0.8 × 0.32 + 0.9 × 0.08 + 0) ≈ 0.1856
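The same partition-and-condition argument, sketched numerically in Python (the variable names are illustrative):

p_a, p_b = 0.2, 0.4
p_ab = p_a * p_b                               # errors in both modules, by independence
p_a_only, p_b_only = p_a - p_ab, p_b - p_ab    # errors in exactly one module
p_none = 1 - (p_a_only + p_b_only + p_ab)      # remaining cell of the partition

# law of total probability over the partition, then Bayes' rule
p_crash = 0.5 * p_a_only + 0.8 * p_b_only + 0.9 * p_ab + 0.0 * p_none
print(0.9 * p_ab / p_crash)  # P(A∩B | C) ≈ 0.1856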
Exercise 1: In a manufacturing plant three machines B1, B2 and B3 make 30%, 30% and
40% of the products respectively. It is also known that some of these are defective products
(D). It is known that: P(D|B1) = 0.1, P(D|B2) = 0.4, P(D|B3) = 0.07.
Find (a). P(D) and (b). P(B1|D)
Exercise 2: Let A and B be independent events with P(A) = 0.25 and P(A ∪ B) = 2P(B) −
P(A). Then find (a). P(B); (b). P(A|B); and (c). P(B’|A).
Exercise 3: Let P(A) = 0.6 and P(A∪B) = 0.9; find P(B) such that:
i. A and B are mutually exclusive
ii. A and B are independent

CHAPTER 5
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Definition: Given a random experiment with sample space S, a function X that assigns to
each element s in S one and only one real number X(s) = x is called a random variable
(r.v.). A random variable is a numerical description of the outcomes of the experiment or
a numerical valued function defined on sample space, usually denoted by capital letters.
Example 1: Let X be the number of defective products when three products are tested.
Let sample space S = {DDD, DDN, DND, DNN, NDD, NDN, NND, NNN}.
Let D = defective and N = non defective item
Let X be a function defined on S such that X(DDD) = 3, X(DDN) = X(DND) = X(NDD) = 2,
X(DNN) = X(NDN) = X(NND) = 1 and X(NNN) = 0. The function X assigns a real numbers to
each element s in S. Thus X is a random variable.
Rx = {0, 1, 2, 3} is called the range space of X.
For notational purposes we shall denote the event {s : s ∈ S and X(s) = a} by {X = a}, and we denote:
P(X = a) = P({s : s ∈ S and X(s) = a}) and P(a < X < b) = P({s : s ∈ S and a < X(s) < b})
Then X is a discrete random variable, which takes on the values 0, 1, 2, 3.


5.1 Discrete Random Variable


A discrete random variable is a random variable with a finite (or countably infinite)
range. Let X denote a random variable with range space R, a subset of real numbers.
Suppose that the space R contains a countable number of points. Such a set R is called
discrete sample space. The random variable X is called a discrete random variable, and X is
said to have a discrete distribution.
For a random variable X of the discrete type, we associate a number P(xi), 0 ≤ P(xi) ≤ 1, for i = 1, 2, 3, …, which is the probability of xi.
The function P(x) = P(X = x) is called the probability mass function (pmf) if P(x), x ∈ R, satisfies the following properties:
i. P(xi) ≥ 0 for all x ∈ R;
ii. ∑_{x∈R} P(x) = 1;
iii. P(X = x) = P(x).


The collection of pairs {xi, P(xi)} for i = 1, 2, 3, … is called a probability distribution, since X
is discrete random variable we call it a discrete probability distribution.
Example 1: Find the probability mass function of the number of defective products when three products are tested.
Solution: Let X equal the number of defective outcomes. Then X ∈ {0, 1, 2, 3} and the pmf of X is given by:
X      0    1    2    3
P(x)  1/8  3/8  3/8  1/8

Example 2: Let X be the random variable denoting the number of bits equal to 1 in an 8-bit random binary number. Find the sample space and construct the probability distribution of X. What is the probability that at most 3 bits are ON?
Solution: The sample space consists of 2^8 = 256 possible binary numbers, each equally likely. The random variable can take values between 0 and 8. Computing the probabilities:
P(X = 0) = 1/256, P(X = 1) = C(8,1)/256 = 8/256, …, P(X = k) = C(8,k)/256, …, P(X = 8) = C(8,8)/256 = 1/256
so that
P(x) = C(8,x)/256 if x = 0, 1, 2, …, 8; 0 otherwise.
P(X ≤ 3) = ∑_{k=0}^{3} P(X = k) = ∑_{k=0}^{3} C(8,k)/256 = 93/256 ≈ 0.36
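A short Python check of this pmf (math.comb, Python 3.8+):

from math import comb

pmf = {k: comb(8, k) / 256 for k in range(9)}   # P(X = k), k = 0..8
print(sum(pmf.values()))                        # 1.0, a legitimate pmf
print(sum(pmf[k] for k in range(4)))            # P(X <= 3) = 93/256 ≈ 0.3633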


5.2 Continuous Random variable


If a sample space contains an infinite number of outcomes (non countably infinite) equal to
the numbers of points on a line segment, it is called a continuous sample space. Let X
denote a random variable with space R, an interval or union of intervals. Such a set R is
called continuous sample space. The random variable X is called a continuous r.v., and X is
said to have a continuous distribution denoted as f(x).
The function f(x) is called the probability density function (pdf) of a r.v. X if it satisfies the following properties:
• f(x) ≥ 0, x ∈ R;
• ∫_R f(x) dx = 1;
• P(a ≤ X ≤ b) = ∫_a^b f(x) dx

 Probability means area for continuous random variable.


Remark:
A continuous random variable has a probability of zero of assuming exactly any one of its values, i.e. if X is continuous, P(X = a) = 0.
Example 1: Consider the function
f(x) = c(2x − x²) if 0 ≤ x ≤ 2, c > 0
     = 0 elsewhere
i. Find the value of c so that f(x) defines a probability density function.
ii. Find P(0 ≤ X ≤ 1) and P(1/2 ≤ X ≤ 3/2).
Solution: For a probability density function, ∫_{−∞}^{∞} f(x) dx = 1 (note f(x) ≥ 0 for all x).
So ∫_{−∞}^{∞} f(x) dx = c·∫_0^2 (2x − x²) dx = c·[x² − x³/3]_0^2 = c·(4 − 8/3) = 4c/3 = 1, so c = 3/4.
P(0 ≤ X ≤ 1) = (3/4)·∫_0^1 (2x − x²) dx = (3/4)·[x² − x³/3]_0^1 = (3/4)·(1 − 1/3) = 1/2.
P(1/2 ≤ X ≤ 3/2) = (3/4)·∫_{1/2}^{3/2} (2x − x²) dx = (3/4)·[x² − x³/3]_{1/2}^{3/2}
= (3/4)·(9/8 − 5/24) = 33/48 = 11/16.


Exercise: Let X be a continuous random variable with pdf
f(x) = (1/6)x + k if 0 ≤ x ≤ 3
     = 0 elsewhere
(i) Evaluate k.
(ii) Find P(1 ≤ X ≤ 2).

5.3 The cumulative distribution function and its properties
Let X be a random variable, discrete or continuous; we define F(x) to be the cumulative distribution function of the random variable X.
o The function F(x) = P(X ≤ x) is called the cumulative distribution function of the discrete-type r.v. X, defined as:
F(x) = P(X ≤ x) = ∑_{t ≤ x} P(t).
Notice: If X takes on a finite number of values x1, x2, …, xn, then its cumulative distribution function is given by
F(x) = 0 if x < x1
     = P(x1) if x1 ≤ x < x2
     = P(x1) + P(x2) if x2 ≤ x < x3
     = P(x1) + P(x2) + P(x3) if x3 ≤ x < x4
     ⋮
     = 1 if x ≥ xn
If X is a random variable of the discrete type, then F(x) is a step function, and the height of the step at x, x ∈ R, equals the probability P(X = x).
o The function F(x) of a continuous random variable X with density function f(x), where
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt,
is called the cumulative distribution function (cdf) of the continuous r.v. X.
 P(a < X < b) = ∫_a^b f(x) dx = F(b) − F(a)
 From the definition of F(x) we have f(x) = dF(x)/dx
Example 1: Consider the example on the number of defective products when 3 products are tested. Construct the CDF of X.
Solution:
x      0    1    2    3
F(x)  1/8  4/8  7/8   1


F(x) = 0 if x < 0
     = 1/8 if 0 ≤ x < 1
     = 1/2 if 1 ≤ x < 2
     = 7/8 if 2 ≤ x < 3
     = 1 if x ≥ 3
Example 2: A random variable X has the probability density function
f(x) = x²/3 if −1 ≤ x ≤ 2
     = 0 otherwise
Find the CDF of X and, using the CDF, find the probability that X is between 0 and 1.
Solution: F(x) = ∫_{−∞}^{x} f(t) dt
Case 1: x < −1, F(x) = ∫_{−∞}^{x} 0 dt = 0
Case 2: −1 ≤ x < 2, F(x) = ∫_{−∞}^{−1} 0 dt + ∫_{−1}^{x} (t²/3) dt = (x³ + 1)/9
Case 3: x ≥ 2, F(x) = ∫_{−∞}^{−1} 0 dt + ∫_{−1}^{2} (t²/3) dt + ∫_{2}^{x} 0 dt = 1
Therefore
F(x) = 0 if x < −1
     = (x³ + 1)/9 if −1 ≤ x < 2
     = 1 if x ≥ 2
p(0 ≤ x ≤ 1) = F(1) − F(0) = 2/9 − 1/9 = 1/9
Exercise 1: A discrete pmf is given by
f(x) = (x − 2)/k if x = 2, 3, 5
     = 0 otherwise
a) Find the constant k for which f(x) is a pmf.
b) Find the CDF of X.
c) Compute P(X ≥ 3).
d) Compute P(X ≥ 3 | 3 ≤ X ≤ 5).
e) Compute P(−2 ≤ X ≤ 1).
Exercise 2: Find the CDF of the following distribution


f(x) = x if 0 ≤ x ≤ 1
     = 2 − x if 1 ≤ x ≤ 2
     = 0 otherwise

Properties of the cumulative distribution function (CDF)
 The function F(x) is a probability; consequently 0 ≤ F(x) ≤ 1.
 F(x) is a non-decreasing function of x: for any two values x1 and x2, if x1 ≤ x2 then F(x1) ≤ F(x2).
 F(∞) = lim_{x→∞} F(x) = 1 and F(−∞) = lim_{x→−∞} F(x) = 0
 If X is continuous then f(x) = dF(x)/dx.
 If X is discrete then P(X = xi) = F(xi) − F(x_{i−1}).
- The probability of a fixed value of a continuous random variable is zero:
⇒ P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
- If X is a discrete random variable, then
P(a < X < b) = ∑_{x=a+1}^{b−1} P(x),  P(a ≤ X < b) = ∑_{x=a}^{b−1} P(x)
P(a < X ≤ b) = ∑_{x=a+1}^{b} P(x),  P(a ≤ X ≤ b) = ∑_{x=a}^{b} P(x)
 Remark: Once we know the pmf or pdf, we can easily calculate the corresponding CDF, and vice versa.
CHAPTER 6
FUNCTIONS OF RANDOM VARIABLES

6.1 Equivalent events
It is often the case that we know the probability distribution of a random variable X and are interested in determining the distribution of some function of X, i.e. we know the distribution of X and we want to find the distribution of H(X). The function y = H(x) is called a function of a random variable.
(Diagram: S maps to Rx via X(s), and Rx maps to Ry via y = H(x).)

Note that y = H(x) is a real valued function; hence its domain and range are sets of real numbers.
Definition: Let E be an experiment and S be its corresponding sample space. Let X be the
random variable defined on the sample space S and Rx be the range space of x. Let B be an
event with respect to Rx. i.e B ⊆ R x .
Suppose that A is defined as A = {s ∈ S : x(s) ∈ B} . Then we say that the two events A and B
are equivalents and written as A ≅ B .
Example 1: Consider the tossing of two coins. Let X be the number of heads obtained, and let B be the event B = {1} with respect to Rx. Find an event A such that A ≅ B.
Solution: S = {HH, HT, TH, TT}
By definition A = {s ∈ S : x(s) ∈ B} ⇒ A = {s ∈ S : x(s) ∈ {1}};
x(s) ∈ B ⇒ x(s) = 1, which holds for s = HT or TH.


A = {HT,TH} ⇔ A ≅ B .
Remark: For any equivalent events A and B, P(A) = P(B).
Definition: Let X be a random variable defined on the sample space S. Let R x be the range
space of X. Let H(x) be a real valued function and consider the random variable Y = H(x)
with range space Ry. Then for any event C ⊆ Ry we have P(C ) = P( x ∈ Rx : H ( x) ∈ C ) .

(Diagram: each si ∈ S maps via x(s) to xi ∈ Rx, and each xi maps via y = H(x) to yi ∈ Ry; A = {s ∈ S : x(s) ∈ B}.)
P(C) = P{x(s) ∈ Rx : H(x) ∈ C}
Example 2: Let X be a continuous random variable with probability density function
f(x) = e^{−x} if x > 0
     = 0 otherwise
a. Is f(x) really a probability density function?
b. If Y = H(x) = 2x + 1, determine the range spaces of X and Y.
c. Suppose that the event C is defined as C = {y ≥ 5}; determine the event B = {x ∈ Rx : H(x) ∈ C}, where H(x) = 2x + 1.
d. Determine P{y ≥ 5} from the event B.
Solution:
a. f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = ∫_0^∞ e^{−x} dx = 1; hence f(x) is a pdf.
b. Rx = {x : x > 0} (given); Ry = {y : y > 1}, since y = 2x + 1 and x > 0 imply y > 1.
c. B = {x ∈ Rx : 2x + 1 ∈ C} = {x ∈ Rx : 2x + 1 ≥ 5} ⇒ B = {x ∈ Rx : x ≥ 2}
d. P(y ≥ 5) = P(2x + 1 ≥ 5) = P(x ≥ 2) = ∫_2^∞ e^{−x} dx = 1/e²
6.2 Functions of discrete random variables and their distributions
If X is a discrete random variable then Y = H(x) is also a discrete random variable. Let x1, x2, x3, …, xn, … be the possible values of X with P(xi) = P(X = xi), and let yi = H(xi) for i = 1, 2, 3, … be the possible values of Y; then P(Y = yi) = P(X = xi).
Example 1: Suppose the random variable X assumes the three values −1, 0, 1 with probabilities 1/3, 1/2 and 1/6 respectively. Find the probability function of (a) y = x² and (b) y = 2x + 5.
Solution: The pmf of X is given as
X     -1    0    1
P(x)  1/3  1/2  1/6
a. Y = x²: if x = −1, y = 1; if x = 0, y = 0; and if x = 1, y = 1.
Ry = {0, 1}

P(y = 0) = P(x = 0) = 1/2 and P(y = 1) = P(x = −1) + P(x = 1) = 1/3 + 1/6 = 1/2.
The pmf of y = x² is
Y     0    1
P(y) 1/2  1/2
b. Y = 2x + 5: if x = −1, y = 3; if x = 0, y = 5; and if x = 1, y = 7.
P(y = 3) = P(x = −1) = 1/3, P(y = 5) = P(x = 0) = 1/2 and P(y = 7) = P(x = 1) = 1/6.
The pmf of y = 2x + 5 is
Y     3    5    7
P(y) 1/3  1/2  1/6
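The bookkeeping in Example 1 (summing P(x) over all x-values that map to the same y) can be captured in a small Python sketch; the helper name pmf_of_function is illustrative:

from collections import defaultdict
from fractions import Fraction as F

pmf_x = {-1: F(1, 3), 0: F(1, 2), 1: F(1, 6)}

def pmf_of_function(pmf, H):
    out = defaultdict(F)   # Fraction() is 0, so each sum starts at zero
    for x, p in pmf.items():
        out[H(x)] += p     # x-values with equal H(x) pool their probability
    return dict(out)

print(pmf_of_function(pmf_x, lambda x: x * x))      # {1: 1/2, 0: 1/2}
print(pmf_of_function(pmf_x, lambda x: 2 * x + 5))  # {3: 1/3, 5: 1/2, 7: 1/6}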
Exercise: Let P(x) = (1/2)^x for x = 1, 2, 3, 4, …, and let H(x) = 1 if x is even, −1 if x is odd.
(a) Show that P(x) is a legitimate probability function.
(b) Find the probability distribution of y = H(x).

6.3 Functions of continuous random variables and their distributions
Case 1: A continuous random variable may give rise to a function which is discrete.
Example: Let X be a continuous random variable defined over ℜ. Define
Y = H(x) = 1 if x ≥ 0
         = −1 if x < 0
Then Y is a discrete random variable with
P(y = 1) = P(x ≥ 0) = ∫_0^∞ f(x) dx and P(y = −1) = P(x < 0) = ∫_{−∞}^0 f(x) dx
Case 2: A continuous random variable may give rise to a function which is continuous. Suppose that X is a continuous random variable with pdf f(x) and y = H(x) is a continuous random variable. If we want to find the pdf of y, say g, we follow these steps:
 Obtain the CDF of y, G(y) = P(Y ≤ y), from the pdf or CDF of X.
 Differentiate G(y) with respect to y in order to get g(y).
 Determine the range space of y for which g(y) > 0.
Example 1: Let f(x) = 2x if 0 ≤ x ≤ 1; 0 otherwise.
Let y = H(x) = 2x + 1; find g(y), i.e. the pdf of Y.
Solution: G(y) = P(Y ≤ y) = P(2x + 1 ≤ y) = P(x ≤ (y − 1)/2) = ∫_0^{(y−1)/2} 2x dx = (y − 1)²/4
g(y) = G′(y) = [(y − 1)²/4]′ = (y − 1)/2

Since f(x) > 0 for 0 ≤ x ≤ 1, g(y) > 0 for 0 ≤ (y − 1)/2 ≤ 1, i.e. 1 ≤ y ≤ 3.
g(y) = (y − 1)/2 for 1 ≤ y ≤ 3
     = 0 otherwise
Example 2: With f(x) as defined in Example 1, determine the probability density function of Y = H(x) = e^{−x}.
Solution: G(y) = P(Y ≤ y) = P(e^{−x} ≤ y) = P(−x ≤ ln y) = P(x ≥ −ln y) = ∫_{−ln y}^{1} 2x dx = 1 − (ln y)²
g(y) = G′(y) = [1 − (ln y)²]′ = −2 ln y / y
Since f(x) > 0 for 0 ≤ x ≤ 1, g(y) > 0 for 0 ≤ −ln y ≤ 1, i.e. 1/e ≤ y ≤ 1.
g(y) = −2 ln y / y for 1/e ≤ y ≤ 1
     = 0 otherwise
Theorem 1: Let X be a continuous random variable with pdf f where f(x) > 0 for a < x < b. Suppose that Y = H(x) is a strictly monotonic (increasing or decreasing) function of x. Assume that this function is differentiable and continuous for all x. Then the random variable Y defined as Y = H(x) has pdf g given by
g(y) = f(H⁻¹(y))·|dH⁻¹(y)/dy|
Remark:
1. If H is strictly increasing, {H(x) ≤ y} is equivalent to {x ≤ H⁻¹(y)}.
2. If H is strictly decreasing, {H(x) ≤ y} is equivalent to {x ≥ H⁻¹(y)}.
Proof:
Case 1: Assume that H is a strictly increasing function. Then
G(y) = P(Y ≤ y) = p(H(x) ≤ y) = p(x ≤ H⁻¹(y)) = F(H⁻¹(y))
g(y) = G′(y) = [F(H⁻¹(y))]′ = f(H⁻¹(y))·dH⁻¹(y)/dy ………(*)
Case 2: Assume that H is a strictly decreasing function. Then
G(y) = P(Y ≤ y) = p(H(x) ≤ y) = p(x ≥ H⁻¹(y)) = 1 − p(x ≤ H⁻¹(y)) = 1 − F(H⁻¹(y))
⇒ g(y) = G′(y) = [1 − F(H⁻¹(y))]′ = −f(H⁻¹(y))·dH⁻¹(y)/dy ………(**)
Since dH⁻¹(y)/dy < 0 in Case 2, (*) and (**) combine to g(y) = f(H⁻¹(y))·|dH⁻¹(y)/dy|.

Example 1: Suppose that f(x) = 1 if 0 ≤ x ≤ 1; 0 otherwise.
Find the pdf of y = −ln x.
Solution: The function y = −ln x is strictly decreasing.
y = −ln x ⇒ −y = ln x ⇔ x = e^{−y} = H⁻¹(y) and dH⁻¹(y)/dy = −e^{−y}
⇒ g(y) = f(H⁻¹(y))·|dH⁻¹(y)/dy| = e^{−y} for y > 0
       = 0 otherwise
 1
if − 1 ≤ x ≤ 1
Example 2: Suppose f ( x) =  2
 0 otherwise
2
Find the pdf of y = x .
Solution: The function y = x2 is not monotonic function over the range -1 to 1. Hence the above
theorem is not applicable.
{ } { }
G ( y ) = P(Y ≤ y ) = p( x 2 ≤ y ) = p x ≤ y = p − y ≤ x ≤ y = p( x ≤ y ) − p( x ≤ − y )
= F ( y ) − F (− y )
1 1 1 1 1 −1 1
g ( y ) = f ( y ). − f (− y ). = . − . = .
2 y 2 y 2 2 y 2 2 y 2 y
 1
 if 0 ≤ x ≤ 1
g ( y) =  2 y
0 otherwise
Theorem 2: Let X be a continuous random variable with pdf f(x), and let y = x². Then the random variable y has pdf g(y) given by
g(y) = (1/(2√y))·{f(√y) + f(−√y)}
Example 3: Let f(x) = x²/81 if −3 ≤ x ≤ 6; 0 otherwise.
Find the pdf of y = H(x) = x².
Solution: For 0 ≤ y ≤ 9, y = x² is not monotonic (both ±√y lie in the support of f), so use Theorem 2 above or the definitional approach:
g(y) = (1/(2√y))·{f(√y) + f(−√y)} = (1/(2√y))·(y/81 + y/81) = √y/81
For 9 < y ≤ 36, only x = √y lies in the support, so y = x² is monotonic there; use Theorem 1:
g(y) = f(H⁻¹(y))·|dH⁻¹(y)/dy|

With y = x² ⇒ x = √y = H⁻¹(y) and dH⁻¹(y)/dy = 1/(2√y), so
g(y) = ((√y)²/81)·(1/(2√y)) = √y/162, for 9 < y ≤ 36.
Therefore
g(y) = √y/81 if 0 ≤ y ≤ 9
     = √y/162 if 9 < y ≤ 36
     = 0 otherwise
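As a consistency check, the transformed density should integrate to 1 over both pieces; a minimal sketch assuming the third-party sympy package:

import sympy as sp

y = sp.symbols('y', positive=True)
total = sp.integrate(sp.sqrt(y) / 81, (y, 0, 9)) + sp.integrate(sp.sqrt(y) / 162, (y, 9, 36))
print(total)  # 1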

CHAPTER 7
INTRODUCTION TO TWO-DIMENSIONAL RANDOM VARIABLES
7.1 Two Dimensional Random Variables
We may define two or more random variables on the same sample space. Let S be the sample space associated with a random experiment E, and let X and Y be two real random variables defined on the same space, with X = X(s) and Y = Y(s) two functions each assigning a real number to each outcome s ∈ S. Then (X, Y) is called a two dimensional random variable or random vector.
Example: Suppose in a communication system X is the transmitted signal and Y is the
corresponding noisy received signal. Then (x, y) is a joint random variable.
Note
i If the possible values of (X, Y) are finite or countable infinite, (X, Y) is called a two
dimensional discrete random variable.
ii If (X, Y) can assume all values in a specified region R in the xy-plane, (X, Y) is called a
two dimensional continuous random variable.

7.2 Joint Probability Distribution Function


Recall the definition of the distribution of a single random variable: the event {X ≤ x} was used to define the probability distribution function F(x), and given F(x) we can find the probability of any event involving the random variable. Similarly, for two random variables X and Y, the event {X ≤ x, Y ≤ y} = {X ≤ x} ∩ {Y ≤ y} is considered as the representative event. The probability P{X ≤ x, Y ≤ y} for all (x, y) ∈ S is called the joint distribution function of the random variables X and Y, denoted by F(x, y).

a. Joint probability Mass Function (Discrete Case)


If X and Y are two discrete random variables defined on the same probability space such
that X takes values from the countable subset RX and Y takes values from the countable
subset RY . Then the joint random variable ( X , Y ) can take values from the countable subset
in RX × RY . If (X, Y) is a two-dimensional discrete random variable such that


P(X = xi, Y = yj) = p(x, y), then p(x, y) is called the probability mass function of (X, Y) provided that:
i. p(x, y) ≥ 0 for all x, y;
ii. ∑∑_R p(x, y) = 1;
iii. P(X = x, Y = y) = p(x, y).
The set of triples {(xi, yj), p(xi, yj)} is called the joint probability distribution of (X, Y).
Example 1: Suppose that two machines are used for a particular task in the morning and for a different task in the afternoon. Let X and Y represent the number of times a machine breaks down in the morning and in the afternoon respectively. The table below gives the joint probability distribution of (X, Y).
                 Y
         0      1      2     Total
X   0   0.25   0.15   0.10   0.50
    1   0.10   0.08   0.07   0.25
    2   0.05   0.07   0.13   0.25
Total   0.40   0.30   0.30   1
a. What is the probability that the machine breaks down an equal number of times in the morning and in the afternoon?
b. What is the probability that the machine breaks down more times in the morning than in the afternoon?
Solution:
(a) p(x = y) = p(x = 0, y = 0) + p(x = 1, y = 1) + p(x = 2, y = 2) = 0.25 + 0.08 + 0.13 = 0.46.
(b) p(x > y) = p(x = 1, y = 0) + p(x = 2, y = 0) + p(x = 2, y = 1) = 0.10 + 0.05 + 0.07 = 0.22.
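The joint table lends itself to a compact numpy sketch (numpy is an assumed third-party dependency; rows index X and columns index Y, as in the table above):

import numpy as np

p = np.array([[0.25, 0.15, 0.10],
              [0.10, 0.08, 0.07],
              [0.05, 0.07, 0.13]])

print(np.trace(p))           # (a) P(X = Y) = 0.46, the diagonal cells
print(np.tril(p, -1).sum())  # (b) P(X > Y) = 0.22, cells strictly below the diagonal
print(p.sum(axis=1))         # row sums: marginal of X = [0.50, 0.25, 0.25]
print(p.sum(axis=0))         # column sums: marginal of Y = [0.40, 0.30, 0.30]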

b. Joint probability Density Function (Continuous Case)


If (X, Y) is a two-dimensional continuous RV then f(x, y) is called the joint pdf of (X, Y), provided that f(x, y) satisfies the following conditions:
i. f(x, y) ≥ 0 for all x, y;
ii. ∬_R f(x, y) dx dy = 1; probabilities are obtained by integrating f(x, y).
Note: P{a < X ≤ b, c < Y ≤ d} = ∫_a^b ∫_c^d f(x, y) dy dx

Example 1: Let (X, Y) be a continuous two dimensional RV with joint pdf
f(x, y) = c if 0 < x < 2, 0 < y < 4
        = 0 otherwise
a. Determine c.
b. Find P(X < 1, Y < 3).
c. Find P(X > Y).
Solution:


a. ∫_{−∞}^{∞}∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^4 ∫_0^2 c dx dy = c·(2)(4) = 8c = 1 ⇔ c = 1/8
f(x, y) = 1/8 if 0 < x < 2, 0 < y < 4
        = 0 otherwise
b. p(x < 1, y < 3) = ∫_0^3 ∫_0^1 (1/8) dx dy = (1/8)·(1)·(3) = 3/8
c. p(x > y) = ∫_0^2 ∫_y^2 (1/8) dx dy = (1/8)·∫_0^2 (2 − y) dy = (1/8)·[2y − y²/2]_0^2 = (1/8)·(4 − 2) = 1/4
Example 2: Suppose X and Y have joint pdf
f(x, y) = cx if x² < y < x, 0 < x < 1
        = 0 otherwise
a. Determine c so that f(x, y) is a legitimate pdf.
b. Find P(x < 1/2, y < 1/2).
Solution:
a. ∫∫ f(x, y) dx dy = ∫_0^1 ∫_y^{√y} cx dx dy = c·∫_0^1 [x²/2]_y^{√y} dy = (c/2)·∫_0^1 (y − y²) dy
= (c/2)·[y²/2 − y³/3]_0^1 = (c/2)·(1/2 − 1/3) = c/12 = 1 ⇔ c = 12
b. Since y < x on the support, the event {x < 1/2, y < 1/2} reduces to {x < 1/2}:
P(x < 1/2, y < 1/2) = ∫_0^{1/2} ∫_{x²}^{x} 12x dy dx = ∫_0^{1/2} 12x(x − x²) dx = [4x³ − 3x⁴]_0^{1/2} = 1/2 − 3/16 = 5/16
7.3 Joint Cumulative Distribution Function

If (X, Y) is a two-dimensional RV (discrete or continuous), then F(x, y) = P{X ≤ x and Y ≤ y} is called the CDF of (X, Y).
In the discrete case, F(x, y) = P(X ≤ x, Y ≤ y) = ∑_{xi ≤ x} ∑_{yj ≤ y} P(xi, yj)
In the continuous case, F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(t, z) dz dt
Properties of Joint CDF, F (x , y)
i. F(-∞ , y) = 0 = F( x , - ∞ ) and F (∞ , ∞ ) = 1
ii. P {a < X < b , Y ≤ y} = F (b , y) – F (a , y)
iii. P{X ≤ x , c <Y <d} = F (x , d) – F (x , c)


iv. P{a < X < b , c < Y <d } = F (b, d) – F(a, d) – F(b , c) + F(a , c)

∂ 2 F ( x, y )
v. At points of continuity of f (x , y), f ( x, y ) =
∂x ∂y
7.4 Marginal Probability Distribution
With a two dimensional random variable (X, Y) we may associate a one dimensional random variable, namely X or Y; that is, we may be interested in the probability distribution of X or of Y alone. We call such a probability distribution a marginal probability distribution.
a. In the discrete case
Let X assume the values x1, x2, x3, …, xm and Y assume the values y1, y2, y3, …, yn, and consider the events X = xi and Y = yj for i = 1, 2, 3, …, m and j = 1, 2, 3, …, n.
P(X = xi) = P{(X = xi and Y = y1) or (X = xi and Y = y2) or …} = ∑_j p{x = xi, y = yj} is called the marginal probability function of X.
The collection of pairs {xi, Pi.}, i = 1, 2, 3, …, m is called the marginal probability distribution of X.
Similarly, P(Y = yj) = P{(Y = yj and X = x1) or (Y = yj and X = x2) or …} = ∑_i p{x = xi, y = yj} is called the marginal probability function of Y.
The collection of pairs {yj, P.j}, j = 1, 2, 3, …, n is called the marginal probability distribution of Y.
b. In the continuous case
Let (X, Y) be a two dimensional continuous random vector with joint pdf f(x, y). Then the individual or marginal distributions of X and Y are defined by the pdfs
g(x) = ∫_{−∞}^{∞} f(x, y) dy and h(y) = ∫_{−∞}^{∞} f(x, y) dx, respectively.
Example 1: Recall the example on machine operation. Find the marginal distributions of X and Y.
Solution: Marginal of X: p(x = xi) = ∑_j p{x = xi, y = yj}.
If x = 0, p(x = 0) = p(x = 0, y = 0) + p(x = 0, y = 1) + p(x = 0, y = 2) = 0.25 + 0.15 + 0.10 = 0.50.
Similarly find p(x = xi) for i = 1, 2:
p(x) = 0.5 if x = 0
     = 0.25 if x = 1, 2
     = 0 otherwise
Marginal of Y: p(y = yj) = ∑_i p{x = xi, y = yj}.
If y = 0, p(y = 0) = p(x = 0, y = 0) + p(x = 1, y = 0) + p(x = 2, y = 0) = 0.25 + 0.10 + 0.05 = 0.40.
Similarly find p(y = yj) for j = 1, 2:


p(y) = 0.4 if y = 0
     = 0.3 if y = 1, 2
     = 0 otherwise

Example 2: Let (X, Y) be a two dimensional continuous random variable with joint pdf
f(x, y) = 1/8 if 0 < x < 2, 0 < y < 4
        = 0 otherwise
Find the marginals of X and Y.
Solution:
Marginal of X: g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^4 (1/8) dy = (1/8)·y|_0^4 = 1/2
g(x) = 1/2 if 0 < x < 2; 0 otherwise.
Marginal of Y: h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^2 (1/8) dx = (1/8)·x|_0^2 = 1/4
h(y) = 1/4 if 0 < y < 4; 0 otherwise.
Example 3: Let f(x, y) = 12x if x² < y < x, 0 < x < 1; 0 otherwise.
Find the marginals of X and Y.
Solution:
Marginal of X: g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{x²}^{x} 12x dy = 12x·y|_{x²}^{x} = 12x(x − x²)
g(x) = 12x(x − x²) if 0 < x < 1; 0 otherwise.
Marginal of Y: h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{y}^{√y} 12x dx = 6x²|_{y}^{√y} = 6(y − y²) = 6y(1 − y)
h(y) = 6y(1 − y) if 0 < y < 1; 0 otherwise.
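Both marginals of Example 3 can be reproduced symbolically; a minimal sketch assuming sympy:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = 12 * x
g = sp.integrate(f, (y, x**2, x))        # marginal of X: 12x(x - x^2)
h = sp.integrate(f, (x, y, sp.sqrt(y)))  # marginal of Y: 6y(1 - y)
print(sp.expand(g), sp.expand(h))
print(sp.integrate(g, (x, 0, 1)))        # 1, so g is a legitimate pdf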
Exercise: Suppose (X, Y) has joint pdf
f(x, y) = 2(x + y − 2xy) if 0 < x < 1, 0 < y < 1
        = 0 otherwise
Find the marginals of X and Y.

7.5 Conditional Distribution Function


When two random variables are defined in a random experiment, knowledge of one can change
the probabilities of the other.


a. In the discrete case
Definition: Given two discrete random variables X and Y with joint pmf P(x, y), the conditional pmf of X given Y is defined as:
P(X = xi | Y = yj) = P(X = xi, Y = yj)/P(Y = yj) = P(xi, yj)/P(yj);
similarly, the conditional pmf of Y given X is defined as:
P(Y = yj | X = xi) = P(X = xi, Y = yj)/P(X = xi) = P(xi, yj)/P(xi).
b. In the continuous case
Definition: Let X and Y denote two random variables with joint probability density function f(x, y) and marginal densities g(x) and h(y). Then the conditional pdf of X given Y = y is defined as
g(x|y) = f(x, y)/h(y), h(y) > 0;
similarly, the conditional pdf of Y given X = x is defined as
h(y|x) = f(x, y)/g(x), g(x) > 0.
Example 1: In a binary communication channel, let X denote the bit sent by the transmitter and let Y denote the bit received at the other end of the channel. X is a discrete random variable with two possible outcomes (0, 1), and Y is a discrete random variable with two possible outcomes (0, 1). Due to noise in the channel we don't always have y = x. The joint probability distribution is given as:
P(x, y):       X
            0      1
Y   0     0.45   0.03
    1     0.05   0.47

i. Find the marginals of X and Y.
ii. Evaluate the conditionals of X and Y.
Solution:
i. Marginal of X: p(x) = 0.5 if x = 0 or x = 1; 0 otherwise.
Marginal of Y: p(y) = 0.48 if y = 0; 0.52 if y = 1; 0 otherwise.
ii.
P(y = 0 | x = 0) = p(x = 0, y = 0)/p(x = 0) = 0.45/0.5 = 0.90, P(y = 0 | x = 1) = p(x = 1, y = 0)/p(x = 1) = 0.03/0.5 = 0.06
P(y = 1 | x = 0) = p(x = 0, y = 1)/p(x = 0) = 0.05/0.5 = 0.10, P(y = 1 | x = 1) = p(x = 1, y = 1)/p(x = 1) = 0.47/0.5 = 0.94
P(x = 0 | y = 0) = p(x = 0, y = 0)/p(y = 0) = 0.45/0.48 = 0.9375, P(x = 0 | y = 1) = p(x = 0, y = 1)/p(y = 1) = 0.05/0.52 = 0.0962


P(x = 1 | y = 0) = p(x = 1, y = 0)/p(y = 0) = 0.03/0.48 = 0.0625,   P(x = 1 | y = 1) = p(x = 1, y = 1)/p(y = 1) = 0.47/0.52 = 0.9038
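These conditional tables are easy to verify numerically. Below is a minimal Python sketch (assuming numpy is available; the array layout and names are ours, not part of the notes): marginals are row and column sums of the joint table, and each conditional is the joint divided by the appropriate marginal.

    import numpy as np

    # Joint pmf of the channel example; rows index x = 0, 1 and columns index y = 0, 1.
    P = np.array([[0.45, 0.05],
                  [0.03, 0.47]])

    p_x = P.sum(axis=1)                  # marginal of x: [0.5, 0.5]
    p_y = P.sum(axis=0)                  # marginal of y: [0.48, 0.52]

    cond_y_given_x = P / p_x[:, None]    # row i is the pmf of y given x = i
    cond_x_given_y = P / p_y[None, :]    # column j is the pmf of x given y = j

    print(cond_y_given_x)                # [[0.90, 0.10], [0.06, 0.94]]
    print(cond_x_given_y)                # [[0.9375, 0.0962], [0.0625, 0.9038]] (approx.)

Note that each row of cond_y_given_x and each column of cond_x_given_y sums to 1, a useful consistency check on the hand computations above.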
Example 2: Let f(x, y) = 2 if x > 0, y > 0, x + y < 1, and 0 otherwise.
Find (a) P(X < 1/2 | Y = 1/4), (b) P(Y > 1/3 | X = 1/2).
Solution: First find the marginals:
g(x) = ∫_{−∞}^{∞} f(x, y)dy = ∫_0^{1−x} 2dy = 2(1 − x), 0 < x < 1;   h(y) = ∫_0^{1−y} 2dx = 2(1 − y), 0 < y < 1.
Then find the conditionals:
g(x | y) = f(x, y)/h(y) = 2/[2(1 − y)] = 1/(1 − y), 0 < x < 1 − y;   h(y | x) = f(x, y)/g(x) = 2/[2(1 − x)] = 1/(1 − x), 0 < y < 1 − x.
a. If y = 1/4, then g(x | y = 1/4) = 1/(1 − 1/4) = 4/3 for 0 < x < 3/4, so
   P(X < 1/2 | Y = 1/4) = ∫_0^{1/2} (4/3)dx = 2/3.
b. If x = 1/2, then h(y | x = 1/2) = 1/(1 − 1/2) = 2 for 0 < y < 1/2, so
   P(Y > 1/3 | X = 1/2) = ∫_{1/3}^{1/2} 2dy = 1/3.

7.6 Independent Random Variables


Two random variables X and Y are said to be independent if:
a. Discrete: for two discrete random variables X and Y,
P(x, y) = P(x)·P(y) for all (x, y), or equivalently
P(X = xi | Y = yj) = P(X = xi) and P(Y = yj | X = xi) = P(Y = yj).
b. Continuous: for two continuous random variables X and Y,
f(x, y) = g(x)·h(y) for all (x, y), or equivalently g(x | y) = g(x) and h(y | x) = h(y).
Example 1: The joint probability mass function (pmf) of X and Y is
P(x, y):           Y = 0    Y = 1    Y = 2
          X = 0    0.10     0.04     0.02
          X = 1    0.08     0.20     0.06
          X = 2    0.06     0.14     0.30
Compute the marginal pmfs of X and of Y, P[X ≤ 1, Y ≤ 1], and check whether X and Y are independent.
Solution:
The marginal pmf of X is P(X = 0) = 0.16, P(X = 1) = 0.34 and P(X = 2) = 0.5; similarly the marginal pmf of Y is P(Y = 0) = 0.24, P(Y = 1) = 0.38 and P(Y = 2) = 0.38.
Now P[X ≤ 1,Y ≤ 1] = P[X = 0,Y = 0] +P[X = 0,Y = 1] +P[X = 1;Y = 0] + P[X = 1;Y = 1]
= 0.1 + 0.04 + 0.08 + 0.20 = 0.42
If Pij = Pi.·P.j for all i, j, then X and Y are independent. Here we have P0. = 0.16 and P.0 = 0.24,

∴ P0.·P.0 = 0.0384 ≠ 0.10 = P00, so Pij ≠ Pi.·P.j in general.
Hence X and Y are not independent.
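The same conclusion can be reached in one step by comparing the joint table with the outer product of its marginals; a short sketch (numpy assumed, names ours):

    import numpy as np

    # Joint pmf of Example 1; rows index x = 0, 1, 2 and columns index y = 0, 1, 2.
    P = np.array([[0.10, 0.04, 0.02],
                  [0.08, 0.20, 0.06],
                  [0.06, 0.14, 0.30]])

    p_x = P.sum(axis=1)                          # [0.16, 0.34, 0.50]
    p_y = P.sum(axis=0)                          # [0.24, 0.38, 0.38]
    print(np.allclose(P, np.outer(p_x, p_y)))    # False -> X and Y are not independent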
Example 2: The joint probability mass function of (X, Y) is given by P( x, y) = k (2x + 3y);
x = 0,1,2, y =1,2,3. Find the marginal probability distribution of X
Solution: Since P(x, y) is a probability mass function, we have
3K + 6K + 9K + 5K + 8K + 11K + 7K + 10K + 13K = 1 ⇒ 72K = 1 ⇒ K = 1/72
The marginal probability distribution of X:
                 Y = 1    Y = 2    Y = 3      P(X = xi)
        X = 0    3/72     6/72     9/72       P(X = 0) = 18/72
        X = 1    5/72     8/72     11/72      P(X = 1) = 24/72
        X = 2    7/72     10/72    13/72      P(X = 2) = 30/72

Hence the marginal probability distribution of X is given by
P(X = 0) = 18/72, P(X = 1) = 24/72 and P(X = 2) = 30/72.
Example 3: Suppose f(x, y) = 4xy if 0 < x < 1, 0 < y < 1, and 0 otherwise.
Are X and Y independent?
Solution:
g(x) = ∫_0^1 4xy dy = 2x,   h(y) = ∫_0^1 4xy dx = 2y
g(x)·h(y) = 2x·2y = 4xy = f(x, y)
Hence X and Y are independent.
Example 4: Let p(x = i, y = j) = 2^{−(i+j)} if i = 1, 2, 3, …, j = 1, 2, 3, …, and 0 otherwise.
Are X and Y independent?
Solution:
p(x = i) = Σ_{j=1}^∞ p(x = i, y = j) = Σ_{j=1}^∞ 2^{−i}·2^{−j} = 2^{−i} Σ_{j=1}^∞ 2^{−j} = 2^{−i}(1) = 2^{−i}, and similarly
p(y = j) = Σ_{i=1}^∞ p(x = i, y = j) = 2^{−j} Σ_{i=1}^∞ 2^{−i} = 2^{−j}(1) = 2^{−j}
p(x = i)·p(y = j) = 2^{−i}·2^{−j} = 2^{−(i+j)} = p(x = i, y = j). Hence X and Y are independent.

CHAPTER 8
INTRODUCTION TO EXPECTATION OF A RANDOM VARIABLE
8.1 Introduction
Definition:
1. Let a discrete random variable X assume the values X1, X2, …. with the probabilities P(X1),
P(X2), …., respectively. Then the expected value of X or the mean of X, denoted as E(X) is
defined as:


E(X) = X1·P(X1) + X2·P(X2) + … = Σ_{i=1}^∞ Xi·P(Xi), provided that the series Σ |xi|·p(xi) converges (i.e. the series converges absolutely).
If X assumes only a finite number of values, say n, then E(X) = Σ_{i=1}^n Xi·P(Xi).
If all the n possible values of X are equally probable, so that P(Xi) = 1/n for each i, then
E(X) = Σ_{i=1}^n Xi·(1/n) = (1/n) Σ_{i=1}^n Xi = X̄.

Example 1: Suppose that the probability mass function of X is given as
P(x) = 1/2 if x = 1, 2, and 0 otherwise.
Find the expected value of X.
Solution:
E(X) = Σ_{i=1}^2 Xi·P(Xi) = 1·P(x = 1) + 2·P(x = 2) = 1·(1/2) + 2·(1/2) = 3/2
Example 2: Assume X has the pmf f(x) = x/10 if x = 1, 2, 3, 4, and 0 elsewhere.
E(X) = Σ_{x=1}^4 x·(x/10) = 1·(1/10) + 2·(2/10) + 3·(3/10) + 4·(4/10) = 3
2. Let X be a continuous random variable assuming values in the interval (a, b); then the expected value of X is defined as E(X) = ∫_a^b x·f(x) dx.

Example 3: The probability density function for a continuous random variable X is given as
f(x) = (x + 2)/18 if −2 < x < 4, and 0 otherwise.
Find the expected value of X.
Solution: E(X) = ∫_{−2}^{4} x·(x + 2)/18 dx = ∫_{−2}^{4} (x²/18 + x/9) dx = [x³/54 + x²/18]_{−2}^{4} = 2.
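As a quick numerical check of this integral (a sketch assuming scipy is installed):

    from scipy.integrate import quad

    f = lambda x: (x + 2) / 18.0                # the pdf of Example 3 on (-2, 4)
    mean, _ = quad(lambda x: x * f(x), -2, 4)   # integrate x*f(x) over the support
    print(mean)                                  # 2.0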
8.2 Expectation of a function of random variable
Definition: Let X be a random variable and Y = H(X) a function of X; then Y is also a random variable. There are two ways of evaluating E(Y).
Case 1: If Y is a discrete random variable with possible values y1, y2, … and pmf p(yi) = P(Y = yi), then E(Y) = Σ_{i=1}^∞ yi·p(yi).
If Y is a continuous random variable with pdf g(y), then E(Y) = ∫_{−∞}^{∞} y·g(y) dy.
Case 2: If X is a discrete random variable with p(xi) = P(X = xi), we have E(Y) = Σ_i H(xi)·p(xi).
If X is a continuous random variable with pdf f(x), then E(Y) = ∫_{−∞}^{∞} H(x)·f(x) dx.

Example 1: Let X be a continuous random variable with pdf
f(x) = (1/2)e^{x} if x ≤ 0, and (1/2)e^{−x} if x > 0.
Let Y = |X|; find E(Y).
Solution:
E(Y) = ∫_{−∞}^{∞} |x|·f(x) dx = ∫_{−∞}^{0} (−x)·(1/2)e^{x} dx + ∫_{0}^{∞} x·(1/2)e^{−x} dx = (1/2)(1) + (1/2)(1) = 1
Exercise: Find E(X).
Properties of expectation
1. If X is a constant, X = c, then E(c) = c.
2. If c is a constant and X is a random variable, then E(cX) = cE(X).
3. Let (X, Y) be a two-dimensional random variable; then E(X + Y) = E(X) + E(Y).
4. If Y = a + bX, then E(Y) = a + bE(X).
5. Let (X, Y) be a two-dimensional random variable; if X and Y are independent, then E(XY) = E(X)E(Y).
Variance of a random variable and its properties
Definition: Let X be a random variable with probability distribution f(x) and mean E(X) = μ. We define the variance of X, denoted var(X) or σx², as follows:
var(X) = E[(X − μ)²]
The positive square root of σx² is called the standard deviation of X.
Theorem: var(X) = E[(X − E(X))²] = E(X²) − (E(X))²
Properties of variance
1. If c is a constant, then var(cX) = c²·var(X)
2. If c is a constant, then var(c) = 0
3. If b and c are constants, then var(bX + c) = b²·var(X)
4. If (X, Y) is a two-dimensional random variable and X and Y are independent, then var(X + Y) = var(X) + var(Y)
Example 1: Let X have the pmf f(x) = 0.125 if x = 0, 3; 0.375 if x = 1, 2; and 0 elsewhere.
Find the mean, variance and the standard deviation of X.
Solution: μ = E(X) = 0(0.125) + 1(0.375) + 2(0.375) + 3(0.125) = 1.5
σ² = E[(X − μ)²] = (−1.5)²(0.125) + (−0.5)²(0.375) + (0.5)²(0.375) + (1.5)²(0.125) = 0.75
σ = √(E[(X − μ)²]) = √0.75 ≈ 0.866
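These values are easy to confirm in a few lines of Python (numpy assumed):

    import numpy as np

    # pmf of Example 1: 0.125 at x = 0, 3 and 0.375 at x = 1, 2.
    x = np.array([0, 1, 2, 3])
    p = np.array([0.125, 0.375, 0.375, 0.125])
    mu = np.sum(x * p)                  # 1.5
    var = np.sum((x - mu) ** 2 * p)     # 0.75
    print(mu, var, np.sqrt(var))        # 1.5 0.75 0.866...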
Example 2: The probability distribution function for a discrete random variable X is

f(x) = 2k if x = 1; 3k if x = 3; 4k if x = 5; and 0 otherwise, where k is some constant.
Then find (a) k, (b) E(X) and Var(X).
Solution:
(a) Σ_x f(x) = f(1) + f(3) + f(5) = 2k + 3k + 4k = 9k = 1 ⇔ k = 1/9.
(b) E(X) = Σ_x x·f(x) = 1·(2/9) + 3·(3/9) + 5·(4/9) = 31/9, and
Var(X) = Σ_x (x − 31/9)²·f(x) = (1 − 31/9)²·(2/9) + (3 − 31/9)²·(3/9) + (5 − 31/9)²·(4/9)
       = (484/81)·(2/9) + (16/81)·(3/9) + (196/81)·(4/9) = 1800/729 = 200/81.
Example 3: The probability density function for a continuous random variable X is
f(x) = a + bx² if 0 ≤ x ≤ 1, and 0 otherwise, where a, b are some constants. Then find
(a) a, b if E(X) = 3/5, (b) Var(X).
Solution:
(a) ∫_0^1 f(x)dx = 1 ⇔ ∫_0^1 (a + bx²)dx = [ax + (b/3)x³]_0^1 = a + b/3 = 1
and E(X) = ∫_0^1 x(a + bx²)dx = [(a/2)x² + (b/4)x⁴]_0^1 = a/2 + b/4 = 3/5.
Solving the two equations, we have a = 3/5, b = 6/5.
(b) f(x) = 3/5 + (6/5)x² if 0 ≤ x ≤ 1, and 0 otherwise. Thus,
Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]² = E(X²) − (3/5)²
= ∫_0^1 x²·f(x)dx − 9/25 = ∫_0^1 [(3/5)x² + (6/5)x⁴]dx − 9/25
= [(1/5)x³ + (6/25)x⁵]_0^1 − 9/25 = 1/5 + 6/25 − 9/25 = 2/25.
Example 4: The probability density function for a continuous random variable X is

f(x) = (x + 2)/18 if −2 < x < 4, and 0 otherwise.
Then find (a) P(X² < 1), (b) P(X² < 9), (c) E(X) and Var(X).
Solution:
(a) P(X² < 1) = P(−1 < X < 1) = ∫_{−1}^{1} (x + 2)/18 dx = [x²/36 + x/9]_{−1}^{1} = (1/36 + 1/9) − (1/36 − 1/9) = 2/9
(b) P(X² < 9) = P(−3 < X < 3) = ∫_{−3}^{−2} 0 dx + ∫_{−2}^{3} (x + 2)/18 dx = [x²/36 + x/9]_{−2}^{3} = 25/36
(c) E(X) = μ = ∫_{−2}^{4} x·(x + 2)/18 dx = ∫_{−2}^{4} (x²/18 + x/9)dx = [x³/54 + x²/18]_{−2}^{4} = 2.
Since E(X²) = ∫_{−2}^{4} x²·(x + 2)/18 dx = ∫_{−2}^{4} (x³/18 + x²/9)dx = [x⁴/72 + x³/27]_{−2}^{4} = 6,
Var(X) = E[(X − μ)²] = E(X²) − μ² = 6 − 2² = 2.
Exercise 1: Suppose X is a continuous random variable with pdf
f(x) = 1 + x if −1 ≤ x ≤ 0; 1 − x if 0 < x ≤ 1; and 0 elsewhere.
Find the mean and variance of X.
Exercise 2: A random current I flows through a resistor with R = 50 Ω. The probability density function for the current is given as:
f(x) = 2kx if 0 ≤ x ≤ 0.5; 2k(1 − x) if 0.5 < x ≤ 1; and 0 elsewhere.
a. Find the value of k which makes f(x) a valid probability density function
b. Find the expected value of the current I
c. Find the expected value of the power dissipated, P =I2R
d. Find the variance of the current random variable
8.3 Moment and moment generating function
Moments
Definition: The kth moment of a random variable X about its expectation (mean) is defined as
μk = E[(X − E(X))^k], k = 0, 1, 2, 3, …
Remark:
If k = 0, then μ0 = 1
If k = 1, then μ1 = 0
If k = 2, then μ2 = σx²
The kth moment of a random variable X about the origin (zero) is defined as μ′k = E(X^k), k = 0, 1, 2, 3, …

Remark:
If k = 0, then μ′0 = 1
If k = 1, then μ′1 = E(X) = mean
If k = 2, then μ′2 = E(X²)
var(X) = E(X²) − (E(X))² = σ² = μ′2 − (μ′1)²

The moment generating function


Definition: The moment generating function of a random variable X is defined as
Mx(t) = E(e^{tX}) = Σ_x e^{tx}·p(x) if X is discrete, and Mx(t) = ∫ e^{tx}·f(x)dx if X is continuous.
We call Mx(t) the moment generating function of X because all the moments of X can be evaluated from Mx(t) by successively differentiating and evaluating at t = 0:
E(X) = M′x(t) |_{t=0}
E(X²) = M″x(t) |_{t=0}, …
E(X^k) = Mx^{(k)}(t) |_{t=0}
Example 1: Let X be a random variable with pdf f(x) = e^{−x} if x > 0, and 0 elsewhere.
Find the moment generating function and the first two moments of X, and find the variance of X using the moment generating function.
Solution:
Mx(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx}·f(x)dx = ∫_0^∞ e^{tx}·e^{−x}dx = ∫_0^∞ e^{−x(1−t)}dx = 1/(1 − t), for t < 1
E(X) = M′x(t) |_{t=0} = (1 − t)^{−2} |_{t=0} = 1
E(X²) = M″x(t) |_{t=0} = 2(1 − t)^{−3} |_{t=0} = 2
var(X) = E(X²) − (E(X))² = 2 − 1 = 1
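The same derivation can be reproduced symbolically; the sketch below (assuming sympy is available) computes Mx(t) for this pdf and recovers the first two moments by differentiation at t = 0:

    import sympy as sp

    t, x = sp.symbols('t x', real=True)
    # MGF of f(x) = e^(-x), x > 0; conds='none' returns the value assuming convergence (t < 1).
    M = sp.simplify(sp.integrate(sp.exp(t * x) * sp.exp(-x), (x, 0, sp.oo), conds='none'))
    EX = sp.diff(M, t, 1).subs(t, 0)    # first moment: 1
    EX2 = sp.diff(M, t, 2).subs(t, 0)   # second moment: 2
    print(M, EX, EX2, EX2 - EX**2)      # 1/(1 - t) (possibly printed as -1/(t - 1)), 1, 2, 1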

Exercise: Derive the moment generating function for the distribution with pdf
f(x) = λ·e^{−λx} if x > 0, and 0 elsewhere,
and find the mean and variance of X.
Properties of moment generating functions
1. If the moment generating function of a random variable exists, it uniquely determines the distribution function of that random variable, i.e. if Mx(t) = My(t) for all t, then X and Y have the same distribution.
2. If Y = a + bX, then My(t) = e^{ta}·Mx(tb).
3. If X and Y are independent and Z = X + Y, then Mz(t) = Mx(t)·My(t).

4. The moment generating function cannot always be used to find the mean and variance, because it does not exist for every distribution.
Chebyshev’s Inequality
Let X be a random variable with E(X) = c, c ∈ ℜ. If E(X − c)² is finite and ε is any positive number, we have
P{|X − c| ≥ ε} ≤ (1/ε²)·E(X − c)²
This is called Chebyshev's inequality.
By choosing the complementary event, P{|X − c| < ε} ≥ 1 − (1/ε²)·E(X − c)².
By choosing c = μ, P{|X − μ| ≥ ε} ≤ σ²/ε².
By considering c = μ and ε = kσ, P{|X − μ| ≥ kσ} ≤ σ²/(k²σ²) = 1/k², i.e. P{|X − μ| < kσ} ≥ 1 − 1/k².
Theorem: the probability that any random variable X will assume a value within k standard deviations of its mean (that is, departs from its mean by less than k times the standard deviation) is at least 1 − 1/k².
Note that this holds regardless of the specific distribution, and it is a lower bound, not an exact equality.
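To see how conservative the bound can be, the following sketch (scipy assumed; the standard normal is only an illustrative choice, not part of the notes) compares the Chebyshev bound for k = 2 with the exact tail probability:

    from scipy import stats

    k = 2.0
    bound = 1.0 / k**2            # Chebyshev: P(|X - mu| >= k*sigma) <= 0.25
    exact = 2 * stats.norm.sf(k)  # exact two-sided tail for a standard normal, ~0.0455
    print(exact, '<=', bound)     # the bound holds, but is far from tight here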
8.4 Covariance and correlation coefficient
The correlation coefficient measures the degree of linear association between X and Y.
Definition: Let (X, Y) be a two-dimensional random variable; the correlation coefficient of X and Y, denoted ρxy, is defined as
ρxy = E[(X − E(X))(Y − E(Y))] / √(var(X)·var(Y)) = cov(X, Y) / √(var(X)·var(Y)) = [E(XY) − E(X)E(Y)] / √(var(X)·var(Y))
Definition: Let (X, Y) be a two-dimensional random variable; the covariance, denoted cov(X, Y) or σxy, is defined as σxy = E[(X − E(X))(Y − E(Y))].
Theorem 1: σxy = E(XY) − E(X)E(Y)
Theorem 2: If X and Y are independent, then σxy = 0; but the converse is not always true (zero covariance does not imply independence).
Example 1: Let f(x, y) = 2 if 0 ≤ x ≤ y ≤ 1, and 0 otherwise.
Find σxy and ρxy.
Solution:
Marginal of X: g(x) = ∫_{−∞}^{∞} f(x, y)dy = ∫_x^1 2dy = 2(1 − x) if 0 ≤ x ≤ 1, and 0 otherwise.
Marginal of Y: h(y) = ∫_{−∞}^{∞} f(x, y)dx = ∫_0^y 2dx = 2y if 0 ≤ y ≤ 1, and 0 otherwise.
E(XY) = ∫∫ xy·f(x, y)dxdy = ∫_0^1 ∫_x^1 2xy dy dx = 1/4
E(X) = ∫_0^1 x·2(1 − x)dx = 1/3 and E(Y) = ∫_0^1 y·2y dy = 2/3,
E(X²) = 1/6, E(Y²) = 1/2
var(X) = E(X²) − (E(X))² = 1/18 and var(Y) = E(Y²) − (E(Y))² = 1/18
σxy = E(XY) − E(X)E(Y) = 1/4 − (1/3)(2/3) = 1/36
ρxy = cov(X, Y)/√(var(X)·var(Y)) = (1/36)/√((1/18)(1/18)) = (1/36)/(1/18) = 1/2
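These double integrals can be verified symbolically; a minimal sympy sketch (the helper E is our own shorthand for expectation under this density):

    import sympy as sp

    x, y = sp.symbols('x y')
    # f(x, y) = 2 on the triangle 0 <= x <= y <= 1: integrate x over (0, y), then y over (0, 1).
    E = lambda g: sp.integrate(2 * g, (x, 0, y), (y, 0, 1))
    EX, EY, EXY = E(x), E(y), E(x * y)           # 1/3, 2/3, 1/4
    VX, VY = E(x**2) - EX**2, E(y**2) - EY**2    # 1/18, 1/18
    cov = EXY - EX * EY                          # 1/36
    print(cov, cov / sp.sqrt(VX * VY))           # 1/36, 1/2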
Exercise 1: Let f(x, y) = 2(1 − x) if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.
i. Find σxy and ρxy.
ii. Show that X and Y are independent.
Exercise 2: Let f(x, y) = (6/5)[1 − (x − y)²] if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.
Compute σxy and ρxy.
Exercise 3: Let X and Y denote voltages at two points in a circuit. The joint density function is given as: f(x, y) = (x/4)(1 + 3y²) if 0 ≤ x ≤ 2, 0 ≤ y ≤ 1, and 0 otherwise.
i. Compute the means of X and Y.
ii. Compute σxy and ρxy.
iii. Let Z = X/Y (the ratio of the voltages); compute E(Z).
Properties of ρxy
ρ²xy ≤ 1, i.e. −1 ≤ ρxy ≤ 1
ρxy = ±1 indicates a perfect linear relationship between X and Y
Let V = aX + b and W = cY + d, where a ≠ 0, c ≠ 0; then ρvw = (ac/|ac|)·ρxy
Conditional expectations
Definition: If (X, Y) is a two-dimensional random variable, then we define the conditional expectation of X for a given Y = y as:
E(X | Y = y) = Σ_x x·p(x | y) if X is discrete, and E(X | Y = y) = ∫ x·g(x | y)dx if X is continuous,
where p(x | y) = P(X = x | Y = y) = p(X = x, Y = y)/p(Y = y).

CHAPTER 9
DISCRETE AND CONTINUOUS DENSITY FUNCTIONS
9.1 Common Discrete Probability Distributions
1. Binomial Distribution
A discrete random variable X is said to have a binomial distribution if x satisfies the following
conditions:
• An experiment is repeated for a fixed number of identical trials n.
• All trials of the experiment are independent from one another.
• All possible outcomes for each trial of the experiment can be divided into two
complementary events called “success” and “failure”.
• The probability of success has a constant value of p for every trial and the probability of
failure has a constant value of q for every trial, where q = 1 − p .
• The random variable x counts the number of trials on which success occurred.
• The trials are independent; in sampling terms, this means we must sample with replacement (or from a very large population).
Example: Fourteen percent of flights from Bole International Airport are delayed. If 20 flights
are chosen at random, then we can consider each flight to be an independent trial. If we define a
successful trial to be that a flight takes off on time, then the random variable X representing the
number of on-time flights will be binomially distributed with n = 20 , p = .86 , and q = .14 .
Definition: The outcomes of the binomial experiment and the corresponding probabilities
of these outcomes are called Binomial Distribution.
If X is a binomial random variable with n trials, probability of success p, and probability of
failure q, then by the fundamental counting principle, the probability of any outcome in which
there are x successes (and therefore n − x failures) is p^x·q^{n−x}.
To count the number of outcomes with x successes and n − x failures, we observe that the x
successes could occur on any x of the n trials. The number of ways of choosing x trials out of n
is nCx, so the probability of getting x successes in n trials becomes:
P(X = x) = (nCx)·p^x·q^{n−x}, x = 0, 1, 2, …, n
and we write X ~ Bin(n, p).
Example 1: A student takes a 10-question multiple-choice quiz and guesses each answer. For each question there are 4 possible answers, only one of which is correct. If we consider "success" to be getting a question right and let the random variable X represent the number of correct answers on the quiz, what is the probability of the student guessing 3 answers correctly?
X ~ Bin(n = 10, p = 0.25) ⇒ P(X = x) = (10Cx)·(0.25)^x·(0.75)^{10−x}, x = 0, 1, 2, …, 10.
P(3) = 10C3·(1/4)³·(3/4)⁷ = 120 × (1/64) × (2187/16384) ≈ 0.25, while the probability of guessing seven answers correctly is
P(7) = 10C7·(1/4)⁷·(3/4)³ = 120 × (1/16384) × (27/64) ≈ 0.003.
Example 2: An allergist claims that 45% of the patients she tests are allergic to some type
of weed. What is the probability that
(a) Exactly 3 of her next 4 patients are allergic to weeds?
P(X = 3) = (4C3)·(0.45)³·(0.55)¹ = 0.20
(b) None of her next 4 patients are allergic to weeds?
P(X = 0) = (4C0)·(0.45)⁰·(0.55)⁴ = 0.0915
Remark: In many cases, we are interested in the mean and standard deviation of a binomial
random variable. If x is a binomial random variable with n trials, probability of success p and
probability of failure q, then the mean and standard deviation of x can be calculated by the
following:
E(X) = np, Var(X) = npq, and σ(X) = √(npq)
Note: A binomial distribution is symmetric if p = q , left skewed if p > q and right skewed if
p<q.
2. Poisson Distribution
A random variable X is said to have a Poisson distribution if its probability distribution is given by:
P(X = x) = λ^x·e^{−λ}/x!, x = 0, 1, 2, …, and 0 elsewhere,
where λ is the average number of occurrences.
Assumptions of the Poisson distribution
• Homogeneity assumption: Events occur at a constant rate λ such that on average for
any length of time t we would expect to see λt events.
• Independence assumption: For any two non-overlapping intervals the number of
observed events is independent.
• If the interval is very small, then the probability of observing two or more events in
that interval is essentially zero.
- The Poisson distribution depends only on the average number of occurrences per unit of time or space.
Poisson distribution is applicable when n is very large and p is very small. Hence some of the
applications of Poisson distribution are as follows:
- Number of faulty blades produced by a reputed firm
- Number of deaths from a disease such as heart attack or cancer.
- Number of telephone calls received at a particular telephone exchange.

- Number of cars passing a crossing per minute.


- Number of printing mistakes in a page of a book.
- Hereditary.
- The processes that give rise to such events are called Poisson processes.
Example 1: If 1.6 accidents can be expected at an intersection on any given day, what is the probability that there will be 3 accidents on any given day?
Solution: Let X = the number of accidents; λ = 1.6.
X ~ Poisson(1.6) ⇒ p(X = x) = 1.6^x·e^{−1.6}/x!
p(X = 3) = 1.6³·e^{−1.6}/3! = 0.1380
Note: If X is a Poisson random variable with parameter λ, then E(X) = λ and Var(X) = λ.
Note: The Poisson probability distribution provides a close approximation to the binomial probability distribution when n is large and p is quite small or quite large, with λ = np:
P(X = x) = (np)^x·e^{−np}/x!, x = 0, 1, 2, …, where λ = np is the average number.
Usually we use this approximation if np ≤ 5. In other words, if n > 20 and np ≤ 5 [or n(1 − p) ≤ 5], then we may use the Poisson distribution as an approximation to the binomial distribution.
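A short scipy sketch illustrates both Example 1 and the quality of this approximation; the values n = 100, p = 0.03 are our own illustrative choice, with np = 3 ≤ 5:

    from scipy.stats import binom, poisson

    print(poisson.pmf(3, mu=1.6))       # ~0.1378, Example 1 above

    n, p = 100, 0.03                    # n large, p small
    print(binom.pmf(2, n, p))           # ~0.2246 (exact binomial)
    print(poisson.pmf(2, mu=n * p))     # ~0.2240 (Poisson approximation)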
3. Geometric Distribution
Characteristics of a Geometric distribution
 Each observation (or trial) has two categories: success or failure.
 The observations are all independent.
 The probability of success (p) is the same for each trial.
 We wish to find the number of trials needed to obtain the first success.
What are the Differences between the Geometric and the Binomial Distributions?
• The most obvious difference is that the Geometric Distribution does not have a set
number of observations, n.
• The second most obvious difference is the question being asked:
o Binomial: Asks for the probability of a certain number of successes.
o Geometric: Asks for the probability of the first success.
Geometric distribution is used to model the number of Bernoulli trials needed until the first success occurs (P(S) = p):
– First success on trial 1 ⇒ S, y = 1 ⇒ p(1) = p
– First success on trial 2 ⇒ FS, y = 2 ⇒ p(2) = (1 − p)p
– First success on trial k ⇒ F…FS, y = k ⇒ p(k) = (1 − p)^{k−1}·p

p(y) = (1 − p)^{y−1}·p, y = 1, 2, …
Σ_{y=1}^∞ p(y) = Σ_{y=1}^∞ (1 − p)^{y−1}·p = p Σ_{y=1}^∞ (1 − p)^{y−1}
Setting y* = y − 1 and noting that y = 1, 2, … ⇒ y* = 0, 1, …
⇒ Σ_{y=1}^∞ p(y) = p Σ_{y*=0}^∞ (1 − p)^{y*} = p·[1/(1 − (1 − p))] = p/p = 1
Let X be the number of trials required until the first success is obtained; then
p(X = x) = (1 − p)^{x−1}·p, x = 1, 2, …, and 0 otherwise,
is called the geometric probability distribution, and X has a geometric distribution with probability of success p.
Example 1: Someone is trying to take the road test to get a driver’s license. If the probability of
passing the test is 40%.
a. What is the probability that this person will pass the test at second shot?
b. What is the probability that someone will pass the road test in 5 trials?
c. Given someone has taken the test 4 times and still has not got the license, what is that
person’s chance of passing the next time?
Solution: Let X denote the number of attempts needed to get the driving license.
X ~ Geo(p = 0.4) ⇒ p(X = x) = 0.4·0.6^{x−1}
a) p(X = 2) = 0.4·0.6 = 0.24
b) p(X = 5) = 0.4·0.6⁴ = 0.0518
c) By the memoryless property of the geometric distribution, the four previous failures do not change the chance of success on the next attempt: P(X = 5 | X > 4) = p = 0.4.
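scipy's geometric distribution uses the same convention (X counts the trial on which the first success occurs), so these answers can be verified directly (a sketch assuming scipy):

    from scipy.stats import geom

    p = 0.4
    print(geom.pmf(2, p))    # 0.24, part a)
    print(geom.pmf(5, p))    # ~0.0518, part b)
    print(geom.sf(4, p))     # P(X > 4) = 0.6**4 ~ 0.1296, survival probability relevant to part c)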
9.2 Common Continuous Probability Distributions
1. Uniform distribution
Let X be a continuous random variable whose density function is constant on an interval, say a ≤ x ≤ b, and 0 elsewhere, i.e.
 1
a ≤ x≤b
f(x) =  b − a
0
 elsewhere
Such a random variable is uniformly distributed on [a, b], abbreviated X ~ Uniform(a, b). If we take a measurement of X, we are equally likely to obtain any value within the interval.
Hence, for any subinterval (c, d) ⊆ (a, b), we have P(c ≤ X ≤ d) = ∫_c^d [1/(b − a)]dx = (d − c)/(b − a).
For this distribution to be a probability distribution we require ∫_a^b [1/(b − a)]dx = 1.
The cumulative distribution function is given as

 0 −∞ < x< a
 x 1 x−a
FX (x ) = ∫ dt = a≤ x≤b
a b − a b −a
 1 x ≥ b
So the mean of the uniform distribution is
μ = ∫_{−∞}^{+∞} x·f(x)dx = ∫_a^b [x/(b − a)]dx = [x²/(2(b − a))]_a^b = (a + b)/2,
the midpoint of the interval (a, b).
And E[X²] = ∫_{−∞}^{+∞} x²·f(x)dx = [1/(b − a)]·∫_a^b x²dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3.
Then the variance is σ² = E[X²] − μ² = (b² + ab + a²)/3 − (a + b)²/4 = (b − a)²/12, and the standard deviation is σ = (b − a)/(2√3).
Uniform random variables are the correct model for choosing a number “at random” from an
interval. They are also natural choices for experiments in which some event is “equally likely” to
happen at any time or place within some interval.

Note: the longer the interval (a, b), the larger the values of the variance and standard deviation.
Example 1: Suppose a point is chosen at random on a line segment [0, 2]. What is the probability that the chosen point lies between 1 and 1.5, assuming that X is uniform on [0, 2]?
Solution: f(x) = 1/2 if 0 ≤ x ≤ 2, and 0 elsewhere.
P(1 ≤ X ≤ 1.5) = ∫_1^{1.5} f(x)dx = ∫_1^{1.5} (1/2)dx = 1/4

Exercise: Suppose that the random variable x has possible values between 0 and 100.
Assuming x has uniform distribution between 0 and 100.
Find (a). E(X) (b). p ( x ≥ 50) (c). p ( 25 ≤ x ≤ 75)
2. The Exponential Distribution
Let λ be a positive real number. We write X~exponential(λ) and say that X is an exponential
random variable with parameter λ if the pdf of X is
f(x) = λ·e^{−λx} if x ≥ 0, and 0 otherwise.
If X ~ exponential(λ), then the expected value and variance of X are: μ_X = E[X] = 1/λ and Var[X] = σ_X² = 1/λ².
The cumulative distribution function is

F(x) = P[X ≤ x] = ∫_0^x λ·e^{−λt}dt = [−e^{−λt}]_0^x = 1 − e^{−λx}, x ≥ 0
⇒ P[X > x] = e^{−λx}, x ≥ 0
Note: S(x) = P(X ≥ x) = 1 − P(X < x) = e^{−λx} is called the survival function.
Example 1: The random quantity X follows an exponential distribution with parameter λ = 0.25. Find the mean, standard deviation and P[X > 4].
Solution: μ = σ = 1/λ = 1/0.25 = 4;  P[X > 4] = e^{−λx} = e^{−0.25×4} = e^{−1} = 0.367879… ≈ 0.368
Note: The exponential random variable can be used to describe the lifetime of a machine, an industrial product, or a human being. Also, it can be used to describe the waiting time of a customer for some service.
Example 2: Let X represent the life time of a washing machine. Suppose the average lifetime for
this type of washing machine is 15 years. What is the probability that this washing machine can
be used for less than 6 years? Also, what is the probability that this washing machine can be used
for more than 18 years?
Solution: X has the exponential density function with λ = 1/15. Then,
P(X ≤ 6) = 1 − e^{−6/15} = 0.3297 and P(X ≥ 18) = e^{−18/15} = 0.3012.
Thus, for this washing machine, there is roughly a 30% chance each that it can be used for quite a long time or only a short time.
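The two probabilities can be verified with scipy's exponential distribution, which is parameterized by the scale 1/λ (a sketch):

    from scipy.stats import expon

    rv = expon(scale=15)     # mean lifetime 15 years, i.e. lambda = 1/15
    print(rv.cdf(6))         # P(X <= 6)  ~ 0.3297
    print(rv.sf(18))         # P(X >= 18) ~ 0.3012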

3. Normal Distribution
A random variable X is said to have a normal distribution if its probability density function is given by
f(x) = [1/(σ√(2π))]·e^{−(1/2)((x − μ)/σ)²}, −∞ < x < ∞, −∞ < μ < ∞, σ > 0
Here μ and σ are the parameters of the distribution: μ = E(X), the mean of the random variable X (or of the probability distribution), and σ = the standard deviation of X.
Properties of Normal Distribution:
1. It is bell shaped, symmetrical about its mean, and mesokurtic. The maximum ordinate is at x = μ and is given by f(μ) = 1/(σ√(2π)).
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. The inflection points are at µ - σ and µ + σ.
5. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a
different normal distribution. Thus, the normal distribution is completely described by two
parameters: mean and standard deviation.

6. The total area under the curve sums to 1, i.e., the area of the distribution on each side of the mean is 0.5: ∫_{−∞}^{∞} f(x)dx = 1.

7. It is unimodal, i.e., values mound up only in the center of the curve.


8. Mean = Median = Mode = μ
9. The probability that a random variable will have a value between any two points is equal to
the area under the curve between those points.
Note: To facilitate the use of the normal distribution, the following distribution, known as the standard normal distribution, was derived by using the transformation
Z = (X − μ)/σ ⇒ f(z) = [1/√(2π)]·e^{−z²/2}
Properties of the Standard Normal Distribution:
Same as a normal distribution, but also...
• Mean is zero
• Variance is one
• Standard Deviation is one
- Areas under the standard normal distribution curve have been tabulated in various ways.
The most common ones are the areas between Z = 0 and a positive value of Z .
- Given a normally distributed random variable X with mean μ and standard deviation σ,
P(a < X < b) = P((a − μ)/σ < (X − μ)/σ < (b − μ)/σ)
⇒ P(a < X < b) = P((a − μ)/σ < Z < (b − μ)/σ)
Note: P ( a < X < b ) = P ( a ≤ X < b ) = P ( a < X ≤ b ) = P (a ≤ X ≤ b)
Examples:
1. Find the area under the standard normal distribution which lies
a) Between Z = 0 and Z = 0.96
Solution: Area = P(0 < Z < 0.96) = 0.3315
b) Between Z = −1.45 and Z = 0
Solution: Area = P(−1.45 < Z < 0) = P(0 < Z < 1.45) = 0.4265
c) To the right of Z = −0.35
Solution:
Area = P(Z > −0.35) = P(−0.35 < Z < 0) + P(Z > 0)
     = P(0 < Z < 0.35) + P(Z > 0) = 0.1368 + 0.50 = 0.6368
d) To the left of Z = −0.35
Solution: Area = P(Z < −0.35) = 1 − P(Z > −0.35) = 1 − 0.6368 = 0.3632
e) Between Z = −0.67 and Z = 0.75
Solution:

Area = P(−0.67 < Z < 0.75) = P(−0.67 < Z < 0) + P(0 < Z < 0.75)
     = P(0 < Z < 0.67) + P(0 < Z < 0.75) = 0.2486 + 0.2734 = 0.5220
f) Between Z = 0.25 and Z = 1.25
Solution:
Area = P(0.25 < Z < 1.25) = P(0 < Z < 1.25) − P(0 < Z < 0.25)
     = 0.3944 − 0.0987 = 0.2957
2. Find the value of Z if
a) The normal curve area between 0 and z (positive) is 0.4726
Solution:
P(0 < Z < z) = 0.4726, and from the table P(0 < Z < 1.92) = 0.4726
⇔ z = 1.92 (by uniqueness of the area).
b) The area to the left of z is 0.9868
Solution:
P(Z < z) = 0.9868 = P(Z < 0) + P(0 < Z < z) = 0.50 + P(0 < Z < z)
⇒ P(0 < Z < z) = 0.9868 − 0.50 = 0.4868, and from the table P(0 < Z < 2.2) = 0.4868
⇔ z = 2.2
3. A random variable X has a normal distribution with mean 80 and standard deviation 4.8.
What is the probability that it will take a value:
a) Less than 87.2
b) Greater than 76.4
c) Between 81.2 and 86.0
Solution: X is normal with mean μ = 80 and standard deviation σ = 4.8.
a) P(X < 87.2) = P((X − μ)/σ < (87.2 − 80)/4.8) = P(Z < 1.5)
   = P(Z < 0) + P(0 < Z < 1.5) = 0.50 + 0.4332 = 0.9332
b) P(X > 76.4) = P(Z > (76.4 − 80)/4.8) = P(Z > −0.75)
   = P(Z > 0) + P(0 < Z < 0.75) = 0.50 + 0.2734 = 0.7734
c) P(81.2 < X < 86.0) = P((81.2 − 80)/4.8 < Z < (86.0 − 80)/4.8) = P(0.25 < Z < 1.25)
   = P(0 < Z < 1.25) − P(0 < Z < 0.25) = 0.3944 − 0.0987 = 0.2957
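Today such probabilities are usually computed directly rather than read from a Z-table; the sketch below (scipy assumed) reproduces parts a)–c):

    from scipy.stats import norm

    mu, sigma = 80, 4.8
    print(norm.cdf(87.2, mu, sigma))                              # a) ~0.9332
    print(norm.sf(76.4, mu, sigma))                               # b) ~0.7734
    print(norm.cdf(86.0, mu, sigma) - norm.cdf(81.2, mu, sigma))  # c) ~0.2957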
4. A normal distribution has mean 62.4. Find its standard deviation if 20.0% of the area under the normal curve lies to the right of 72.9.
Solution:

P(X > 72.9) = 0.2005 ⇒ P((X − μ)/σ > (72.9 − μ)/σ) = 0.2005
⇒ P(Z > (72.9 − 62.4)/σ) = 0.2005
⇒ P(Z > 10.5/σ) = 0.2005
⇒ P(0 < Z < 10.5/σ) = 0.50 − 0.2005 = 0.2995, and from the table
P(0 < Z < 0.84) = 0.2995 ⇔ 10.5/σ = 0.84
⇒ σ = 12.5

5. A random variable has a normal distribution with σ = 5. Find its mean if the probability that the random variable will assume a value less than 52.5 is 0.6915.
Solution:
P(Z < z) = P(Z < (52.5 − μ)/5) = 0.6915
⇒ P(0 < Z < z) = 0.6915 − 0.50 = 0.1915. But from the table,
P(0 < Z < 0.5) = 0.1915
⇔ z = (52.5 − μ)/5 = 0.5 ⇒ μ = 50
Exercise:
1. A city installs 2000 electric lamps for street lighting. These lamps have a mean burning
life of 1000 hours with a standard deviation of 200 hours. The normal distribution is a
close approximation to this case.
a. What is the probability that a lamp will fail in the first 700 burning hours?
b. What is the probability that a lamp will fail between 900 and 1300 burning hours?
c. How many lamps are expected to fail between 900 and 1300 burning hours?
d. What is the probability that a lamp will burn for exactly 900 hours?
e. What is the probability that a lamp will burn between 899 hours and 901 hours before
it fails?
f. After how many burning hours would we expect 10% of the lamps to be left?
g. After how many burning hours would we expect 90% of the lamps to be left?
2. Bolts coming off a production line are normally distributed with a mean length of 2.4 cm and
standard deviation 0.1 cm. What is the probability that a bolt chosen at random will be of length
between 2.6 cm and 2.7 cm?
3. A robot eye rotates at a constant rate and its field of vision is 45°. Suppose 50 like objects,
recognizable to the robot, are randomly and independently placed momentarily in the area
surveyed by the eye. If X counts the number of objects seen by the eye then what is the
probability that the number of observed objects will be within 2 standard deviations of the mean.
