CHAPTER 3
INTRODUCTION TO PROBABILITY
3.1 Introduction
• Probability theory is the foundation upon which the logic of inference is built.
• It helps us to cope with uncertainty.
• In general, probability is the chance of an outcome of an experiment. It is the measure of
how likely an outcome is to occur.
Deterministic and nondeterministic models
1. Deterministic models: a model which stipulates that the conditions under which an
experiment is performed determine the outcome of that experiment. For specified values of the
experiment’s input parameters, the result is a known constant.
• Example: Ohm’s law, V = IR,
2. Nondeterministic models: also called probabilistic models or stochastic models. A
model in which chance plays an important role in determining the outcome.
Example: the number of α particles emitted from a piece of radioactive material in one
minute.
Definitions of some probability terms
1. Experiment: Any process of observation or measurement or any process which
generates well defined outcome.
2. Probability Experiment: an experiment that can be repeated any number of
times under similar conditions, where it is possible to enumerate the total
number of outcomes but not to predict an individual outcome. It is also called a
random experiment.
Example: If a fair die is rolled once it is possible to list all the possible outcomes i.e.1, 2,
3, 4, 5, 6 but it is not possible to predict which outcome will occur.
3. Outcome: The result of a single trial of a random experiment.
4. Sample Space: Set of all possible outcomes of a probability experiment,
usually denoted by S.
Sample space can be
Countable (finite or infinite)
Uncountable.
5. Event: A subset of the sample space; a statement about one or more outcomes of
a random experiment. Events are denoted by capital letters. In other words, an event is a
set of outcomes, but not necessarily all of them.
An event can be:
Simple event or elementary event: an event having only a single element or sample
point.
Compound or composite event: an event which contains more than one element.
Impossible or null event: an event equal to the empty set.
Sure event: an event equal to the sample space.
Example: Considering the above experiment let A be the event of odd numbers, B be the
event of even numbers, and C be the event of number 8.
⇒ A = {1,3,5}
B = {2,4,6}
C = { }, the empty set, i.e., an impossible event
Remark: If S (sample space) has n members, then there are exactly 2^n subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an Event: the complement of an event A means the non-occurrence
of A and is denoted by A′ or A^c; it contains those points of the sample space
which do not belong to A.
8. Mutually Exclusive Events: Two events which cannot happen at the same time.
9. Independent Events: Two events are independent if the occurrence of one does
not affect the probability of the other occurring.
10. Dependent Events: Two events are dependent if the first event affects the outcome
or occurrence of the second event in such a way that the probability is changed.
11. Finite sample space: it is a sample space which consists of a finite number of
elements. Suppose that S = {x1, x2, …, xn} where xi’s are possible outcomes of an
experiment, then S is a finite sample space.
Definition: To every point in the sample space, we assign a real number between 0 and 1
called its probability, satisfying the following conditions.
Let Pi be the probability of the i-th element of the sample space.
• Pi ≥ 0 for all i
• ∑_{i=1}^{n} Pi = 1, i.e., the sum of the probabilities of all points in the sample space is one.
Example: Suppose that only three outcomes are possible in an experiment, say a1, a2 and a3.
Suppose furthermore that a1 is twice as probable as a2, which is again twice as probable as a3.
Find the probability of each ai, i = 1, 2, 3.
Solution: Let p1, p2 and p3 be their respective probabilities. Then ∑_{i=1}^{3} pi = 1,
with p1 = 2p2 and p2 = 2p3. Hence
∑_{i=1}^{3} pi = p1 + p2 + p3 = 4p3 + 2p3 + p3 = 7p3 = 1 ⇒ p3 = 1/7.
Therefore p2 = 2p3 = 2/7 and p1 = 2p2 = 4/7.
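As a quick numerical check, the same result can be reproduced with a short Python sketch (the weights and variable names below are ours, chosen to mirror the 4:2:1 ratio above):

    # Relative weights: a1 is twice as probable as a2, which is twice as probable as a3.
    weights = {"a1": 4, "a2": 2, "a3": 1}
    total = sum(weights.values())                       # 7
    probs = {k: w / total for k, w in weights.items()}  # normalize so probabilities sum to 1
    print(probs)                 # {'a1': 4/7, 'a2': 2/7, 'a3': 1/7}
    print(sum(probs.values()))   # 1.0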
Equally likely outcomes
This is the most common assumption for a finite sample space: two outcomes are said to be
equally likely if neither is expected to occur in preference to the other.
If an experiment has n possible outcomes and all of them are equally likely, then the
probability of each outcome is Pi = 1/n, and ∑_{i=1}^{n} Pi = 1, where the Pi are the respective
probabilities.
3. The number of permutations of n objects in which k1 are alike, k2 are alike, …, kn are alike is
n! / (k1! · k2! · … · kn!)
Remark: A set of n distinct objects can be arranged in a circle in (n-1)! ways.
Examples:
1. Suppose we have a letters A,B, C, D
a) How many permutations are there taking all the four?
b) How many permutations are there two letters at a time?
2. How many different permutations can be made from the letters in the word “CORRECTION”?
Solutions:
a) Let A be the event that all 10 selected items are defective.
⇒ P(A) = n(A)/n(S) = [C(30,10) · C(50,0)] / C(80,10) = 0.00001825
b) Let A be the event that 6 will be non-defective.
Total ways in which A can occur: n(A) = C(30,4) · C(50,6)
⇒ P(A) = n(A)/n(S) = [C(30,4) · C(50,6)] / C(80,10) = 0.265
c) Let A be the event that all will be non-defective.
Total ways in which A can occur: n(A) = C(30,0) · C(50,10)
⇒ P(A) = n(A)/n(S) = [C(30,0) · C(50,10)] / C(80,10) = 0.00624
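The combinatorial probabilities above can be checked with a short Python sketch; it assumes the setup implied by the computations (80 items in total, 30 defective and 50 non-defective, 10 items drawn), which is our reading of the worked numbers:

    from math import comb

    total = comb(80, 10)                                   # C(80, 10) possible samples
    p_all_defective = comb(30, 10) * comb(50, 0) / total   # ≈ 0.0000183
    p_six_good      = comb(30, 4)  * comb(50, 6) / total   # ≈ 0.265
    p_all_good      = comb(30, 0)  * comb(50, 10) / total  # ≈ 0.00624
    print(p_all_defective, p_six_good, p_all_good)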
Exercise 1: What is the probability that a waitress will refuse to serve alcoholic beverages
to only three minors if she randomly checks the I.D’s of five students from among ten
students of which four are not of legal age?
Frequentist Approach
If, after n repetitions of an experiment (n very large), an event A is observed to occur in
k of these repetitions, then the probability of event A is the proportion of outcomes
favourable to A in the long run when the experiment is repeated a large number of times,
defined as: P(A) = lim_{n→∞} k/n
Note: the frequentist definition of probability applies only in the limit of a large number of
repetitions (n → ∞).
Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. For any event A,
defined on S we associated a real number P(A), called the probability of the event A
satisfying the following requirements called axioms of probability or rules of probability.
1. 0 ≤ P( A) ≤ 1
2. P( S ) = 1, S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occur equals
the sum of the two probabilities. i. e. P ( A ∪ B ) = P ( A ) + P ( B )
Examples:
1. Three companies A, B and C provide cell phone coverage in a rural area. For a
randomly chosen location in this area, the probabilities of coverage for the first two
companies are P(A) = 0.8 and P(B) = 0.75. We also know P(A∪B) = 0.9 and P(B∩C) = 0.45.
a. What is the probability of not having coverage from company A?
b. What is the probability of having coverage from both company A and company B?
c. Company A claims it has better coverage than company C. Can you verify this?
d. If you own two cell phones, one from company B and one from company C, what is
your worst-case coverage?
Solution:
(a) P(A′) = 1 − P(A) = 1 − 0.8 = 0.2.
(b) P(A∩B) = P(A) + P(B) − P(A∪B) = 0.8 + 0.75 − 0.9 = 0.65.
(c) Let us find the maximum possible value of P(C). The only information we have relating
to C is P(B∩C) = 0.45. Now P(B∩C′) = P(B) − P(B∩C) = 0.75 − 0.45 = 0.3, and since C lies in
the complement of B∩C′, max P(C) = 1 − 0.3 = 0.7 < 0.8 = P(A). Hence company A’s claim is true.
(d) This question asks for the minimum possible value of P(B∪C), which occurs when C is a
subset of B. Since P(B∪C) = P(B) + P(C) − P(B∩C) and P(B∩C) = P(C) in that case,
P(B∪C) = P(B). Hence min P(B∪C) = P(B) = 0.75.
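A minimal Python sketch of the arithmetic above (the variable names are ours; only the four given probabilities are used):

    P_A, P_B, P_AuB, P_BnC = 0.8, 0.75, 0.9, 0.45   # given probabilities

    P_not_A   = 1 - P_A                  # (a) complement rule -> 0.2
    P_AnB     = P_A + P_B - P_AuB        # (b) inclusion-exclusion -> 0.65
    P_C_max   = 1 - (P_B - P_BnC)        # (c) largest P(C) consistent with P(B n C) -> 0.7 < P(A)
    P_BuC_min = P_B                      # (d) smallest P(B u C), attained when C is a subset of B
    print(P_not_A, P_AnB, P_C_max, P_BuC_min)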
2. Suppose that A and B are two events for which P(A) = a, P(B) = b and P(A∩B) = c. Then
find:
i. P(A′∪B)
ii. P(A′∩B′)
Solution:
(i) P(A′∪B) = P(A′) + P(B) − P(A′∩B) = 1 − P(A) + P(B) − (P(B) − P(A∩B))
= 1 − P(A) + P(A∩B) = 1 − a + c.
(ii) P(A′∩B′) = P((A∪B)′) = 1 − P(A∪B) = 1 − (P(A) + P(B) − P(A∩B)) = 1 − a − b + c.
Exercise:
1. If P(A) = 0.75 and P(B) = 0.25, can we say that A and B are mutually exclusive?
2. Let P(A′) = a and P(B′) = b; show that P(A∩B) ≥ 1 − a − b.
CHAPTER 4
CONDITIONAL PROBABILITY AND INDEPENDENCY
4.1 Conditional probability
Recall that when we use the symbol P(A) for the probability of event A, we really mean the
probability of event A with respect to the sample space S. Suppose that we have some
additional information that the outcome of a trial is contained in a subset of the sample
space, say B, with P(B) ≠ 0. The resulting probability is called a conditional probability.
Conditional Events: If the occurrence of one event has an effect on the occurrence of
the other event, then the two events are conditional or dependent events.
Example: Suppose we have two red and three white balls in a bag
1. Draw a ball with replacement
Let A = the event that the first draw is red, P(A) = 2/5
B = the event that the second draw is red, P(B) = 2/5. A and B are independent.
2. Draw a ball without replacement
Let A = the event that the first draw is red, P(A) = 2/5
B = the event that the second draw is red, P(B) = ? This is conditional. To
determine P(B) we need some information about A.
Definition: Let A and B be two events in the sample space S with P(B) ≠ 0. The probability that
event A occurs, given that event B has already occurred, is called the conditional probability of
A given B, denoted by P(A|B), and is defined as
P(A|B) = P(A∩B) / P(B), P(B) ≠ 0.
Note:
1. P(A|B) satisfies the various axioms of probability
i. 0≤ P(A|B) ≤ 1
ii. P(S|B) = 1
iii. P(B1∪B2 | B) = P(B1|B) + P(B2|B), provided that B1∩B2 = ∅
iv. For any sequence of events B1, B2, …, where Bi∩Bj = ∅ for all i ≠ j,
P(⋃_{i=1}^{∞} Bi | B) = ∑_{i=1}^{∞} P(Bi|B)
2. If B = S then P(A|B) = P(A)
Proof: P(A|S) = P(A∩S)/P(S) = P(A)/1 = P(A), because A∩S = A.
Note: P(A|B) ≠P(B|A)
Theorem 1: P(A|B) + P(A′|B) = 1
Proof: Using the definition of conditional probability we have:
P(A|B) + P(A′|B) = P(A∩B)/P(B) + P(A′∩B)/P(B) = [P(A∩B) + P(A′∩B)] / P(B).
But since A and A′ form a partition of S, P(B) = P(A∩B) + P(A′∩B), so P(A|B) + P(A′|B) = 1.
Consequently,
i. P(A′|B) = 1 − P(A|B)
ii. P(B′|A) = 1 − P(B|A)
Law of total probability: if B1, B2, …, Bk form a partition of S, then
P(A) = P((A∩B1) ∪ (A∩B2) ∪ … ∪ (A∩Bk)) = P(A∩B1) + P(A∩B2) + … + P(A∩Bk)
= P(A|B1)P(B1) + P(A|B2)P(B2) + … + P(A|Bk)P(Bk)
The events A∩B′, B∩A′, A∩B and A′∩B′ form a partition of S, because they are mutually
exclusive and exhaustive.
Therefore, combining Bayes’ rule and the law of total probability,
P(A∩B | C) = P(C | A∩B) P(A∩B) / P(C)
where
P(C) = P(C|A∩B′)P(A∩B′) + P(C|B∩A′)P(B∩A′) + P(C|A∩B)P(A∩B) + P(C|A′∩B′)P(A′∩B′).
Then P(A∩B | C) = (0.9 × 0.08) / (0.5 × 0.12 + 0.8 × 0.32 + 0.9 × 0.08 + 0) = 0.1856
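The total-probability and Bayes computation above can be checked with a small Python sketch; we assume P(A′∩B′) = 0.48 (so the four pieces of the partition sum to 1) and P(C|A′∩B′) = 0, which reproduces the zero-valued last term in the denominator:

    # Partition of S: A∩B', B∩A', A∩B, A'∩B', with the probabilities from the worked example.
    priors = {"AnB'": 0.12, "BnA'": 0.32, "AnB": 0.08, "A'nB'": 0.48}   # assumed to sum to 1
    cond_C = {"AnB'": 0.5,  "BnA'": 0.8,  "AnB": 0.9,  "A'nB'": 0.0}    # P(C | each piece)

    P_C = sum(cond_C[e] * priors[e] for e in priors)       # law of total probability -> 0.388
    P_AnB_given_C = cond_C["AnB"] * priors["AnB"] / P_C    # Bayes' rule -> ≈ 0.1856
    print(P_C, P_AnB_given_C)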
Exercise 1: In a manufacturing plant three machines B1, B2 and B3 make 30%, 30% and
40% of the products respectively. It is also known that some of these are defective products
(D). It is known that: P(D|B1) = 0.1, P(D|B2) = 0.4, P(D|B3) = 0.07.
Find (a). P(D) and (b). P(B1|D)
Exercise 2: Let A and B be independent events with P(A) = 0.25 and P(A ∪ B) = 2P(B) −
P(A). Then find (a). P(B); (b). P(A|B); and (c). P(B’|A).
Exercise 3: Let P(A) = 0.6 and P(A∪B) = 0.9; find P(B) such that:
i. A and B are mutually exclusive
ii. A and B are independent
CHAPTER 5
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Definition: Given a random experiment with sample space S, a function X that assigns to
each element s in S one and only one real number X(s) = x is called a random variable
(r.v.). A random variable is a numerical description of the outcomes of the experiment or
a numerical valued function defined on sample space, usually denoted by capital letters.
Example 1: Let X be the number of defective products when three products are tested.
Let sample space S = {DDD, DDN, DND, DNN, NDD, NDN, NND, NNN}.
Let D = defective and N = non defective item
Let X be a function defined on S such that X(DDD) = 3, X(DDN) = X(DND) = X(NDD) = 2,
X(DNN) = X(NDN) = X(NND) = 1 and X(NNN) = 0. The function X assigns a real numbers to
each element s in S. Thus X is a random variable.
Rx = {0, 1, 2, 3} is called the range space of X.
For notational purposes we shall denote the event {s : s ∈ S and X(s) = a} by {X = a}, and we
denote:
P(X = a) = P({s : s ∈ S and X(s) = a}) and P(a < X < b) = P({s : s ∈ S and a < X(s) < b})
Then X is a discrete random variable, which takes on the values 0, 1, 2, 3.
Example 2: Let X be the random variable denoting the number of bits equal to 1 in an 8-bit
random binary number. Find the sample space and construct the probability distribution of
X. What is the probability that at most 3 bits are ON?
Solution: The sample space consists of 2^8 = 256 possible binary numbers, each equally likely.
The random variable can take values between 0 and 8. Let us compute the probabilities.
P(x = 0) = 1/256,
P(x = 1) = C(8,1)/256 = 8/256,
and in general
P(x = k) = C(8,k)/256 for k = 0, 1, 2, …, 8, and 0 otherwise;
P(x = 8) = C(8,8)/256 = 1/256.
P(x ≤ 3) = ∑_{k=0}^{3} P(x = k) = ∑_{k=0}^{3} C(8,k)/256 = 93/256 ≈ 0.36
F(x) = P(X ≤ x) = ∑_{t ≤ x} P(t).
Notice: If X takes on a finite number of values x1, x2, …, xn, then its cumulative distribution
function is given by
F(x) = 0 if x < x1
     = P(x1) if x1 ≤ x < x2
     = P(x1) + P(x2) if x2 ≤ x < x3
     = P(x1) + P(x2) + P(x3) if x3 ≤ x < x4
     …
     = 1 if x ≥ xn
If X is a random variable of the discrete type, then F(x) is a step function, and the height of the
step at x, x ∈ R, equals the probability P(X = x).
o The function F(x) of a continuous random variable X with density function f(x), where
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt,
is called the cumulative distribution function.
From the definition of F(x) we have f(x) = dF(x)/dx.
Example 1: Consider the example on the number of defective products when 3 products
are tested. Construct the cdf of X.
Solution:
x      0     1     2     3
F(x)  1/8   4/8   7/8    1
F(x) = 0 if x < 0
     = 1/8 if 0 ≤ x < 1
     = 4/8 if 1 ≤ x < 2
     = 7/8 if 2 ≤ x < 3
     = 1 if x ≥ 3
Example 2: A random variable X has the probability density function
f(x) = x²/3 if −1 ≤ x ≤ 2; 0 otherwise.
Find the CDF of X and, using the cdf, find the probability that X is between 0 and 1.
Solution: F(x) = ∫_{−∞}^{x} f(t) dt
Case 1: x < −1, F(x) = ∫_{−∞}^{x} 0 dt = 0
Case 2: −1 ≤ x < 2, F(x) = ∫_{−∞}^{−1} 0 dt + ∫_{−1}^{x} (t²/3) dt = (x³ + 1)/9
Case 3: x ≥ 2, F(x) = ∫_{−∞}^{−1} 0 dt + ∫_{−1}^{2} (t²/3) dt + ∫_{2}^{x} 0 dt = 1
Therefore
F(x) = 0 if x < −1
     = (x³ + 1)/9 if −1 ≤ x < 2
     = 1 if x ≥ 2
P(0 ≤ x ≤ 1) = F(1) − F(0) = 2/9 − 1/9 = 1/9
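A short Python check of the derived CDF, using numerical integration (scipy is assumed to be available; the function names are ours):

    from scipy.integrate import quad

    f = lambda t: t**2 / 3                 # density on [-1, 2]
    F = lambda x: (x**3 + 1) / 9           # CDF derived above, valid for -1 <= x <= 2

    for x in [0.0, 0.5, 1.0, 2.0]:
        numeric, _ = quad(f, -1, x)        # numerical integral of the density from -1 to x
        print(x, numeric, F(x))            # the two columns should agree

    print(F(1) - F(0))                     # P(0 <= X <= 1) = 1/9 ≈ 0.111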
Exercise 1: A discrete pmf is given by
f(x) = (x − 2)/k if x = 2, 3, 5; 0 otherwise,
and a continuous pdf is given by
f(x) = x if 0 ≤ x ≤ 1; 2 − x if 1 ≤ x ≤ 2; 0 otherwise.
If X is continuous then f(x) = dF(x)/dx.
If X is discrete then P(X = xi) = F(xi) − F(xi−1).
- Probability of a fixed value of a continuous random variable is zero.
⇒ P (a < X < b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a ≤ X ≤ b)
- If X is a discrete random variable then
P(a < X < b) = ∑_{x=a+1}^{b−1} P(x),   P(a ≤ X < b) = ∑_{x=a}^{b−1} P(x),
P(a < X ≤ b) = ∑_{x=a+1}^{b} P(x),   P(a ≤ X ≤ b) = ∑_{x=a}^{b} P(x).
Remark: Once we know the pmf or pdf we can easily calculate the corresponding CDF, and
vice versa.
CHAPTER 6
FUNCTIONS OF RANDOM VARIABLES
6.1 Equivalent events
It is often the case that we know the probability distribution of a random variable X and are
interested in determining the distribution of some function of X, i.e., we know the
distribution of X and we want to find the distribution of H(X). The function y = H(x) is called
a function of a random variable.
Schematically: s ∈ S → X(s) ∈ Rx → y = H(x) ∈ Ry.
Note y = H(x) is a real valued function. Hence its domain and range are set of real numbers.
Definition: Let E be an experiment and S be its corresponding sample space. Let X be the
random variable defined on the sample space S and Rx be the range space of x. Let B be an
event with respect to Rx. i.e B ⊆ R x .
Suppose that A is defined as A = {s ∈ S : x(s) ∈ B} . Then we say that the two events A and B
are equivalents and written as A ≅ B .
Example 1: consider the tossing of two coins. Let x be the number of heads obtained. Let B
be the event B = {1} with respect to Rx. Then find an event set A such that A ≅ B .
Solution: S = {HH, HT, TH, TT}
By definition A = {s ∈ S : x(s) ∈ B} ⇒ A = {s ∈ S : x(s) ∈ {1}}
x(s) ∈ B ⇒ x(s) = 1 , if s = HT, TH.
A = {HT,TH} ⇔ A ≅ B .
Remark: For any events A and B which are equivalent then P(A) = P(B).
Definition: Let X be a random variable defined on the sample space S. Let R x be the range
space of X. Let H(x) be a real valued function and consider the random variable Y = H(x)
with range space Ry. Then for any event C ⊆ Ry we have P(C ) = P( x ∈ Rx : H ( x) ∈ C ) .
Example 2: Let X be a continuous random variable with probability density function
f(x) = e^{−x} if x > 0; 0 otherwise.
a. Is f(x) really a probability density function?
b. If Y= H(x) = 2x + 1 determine the range space of X and Y
c. Suppose that the event C is defined as C = {y ≥ 5}, then determine the event
B = {x ∈ Rx : H( x) ∈ C} , where H(x) = 2x + 1.
d. Determine P{y ≥ 5} from the event B.
Solution:
a. f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = ∫_{0}^{∞} e^{−x} dx = 1, hence f(x) is a pdf.
b. Rx = {x : x > 0} (given); Ry = {y : y > 1}, since y = 2x + 1 and x > 0 imply y > 1.
c. B = {x ∈ Rx : 2x + 1 ∈ C} = {x ∈ Rx : 2x + 1 ≥ 5} ⇒ B = {x ∈ Rx : x ≥ 2}
d. P(y ≥ 5) = P(2x + 1 ≥ 5) = P(x ≥ 2) = ∫_{2}^{∞} f(x) dx = ∫_{2}^{∞} e^{−x} dx = 1/e²
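A simulation sketch of part (d), assuming numpy is available; it draws X from the density e^{−x} and checks P(Y ≥ 5) for Y = 2X + 1 against the exact value 1/e²:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)   # X has pdf e^{-x}, x > 0
    y = 2 * x + 1                                    # Y = H(X) = 2X + 1

    print((y >= 5).mean())      # simulated P(Y >= 5)
    print(np.exp(-2))           # exact value 1/e^2 ≈ 0.1353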
6.2 Functions of discrete random variables and their distributions
If X is a discrete random variable then Y = H(X) is also a discrete random variable. Let x1, x2,
x3, …, xn, … be the possible values of X with P(xi) = P(X = xi), and let yi = H(xi), i = 1, 2, 3, …, be
the corresponding values of Y. Then P(Y = yi) = P(X = xi); when several xi map to the same
value y, P(Y = y) is the sum of the corresponding P(xi), as in the example below.
Example 1: Suppose the random variable X assumes the three values −1, 0, 1 with probabilities
1/3, 1/2 and 1/6 respectively. Find the probability function of (a) y = x² and (b) y = 2x + 5.
Solution: The pmf of X is given as
X      −1    0    1
P(x)   1/3  1/2  1/6
a. Y = x²: if x = −1, y = 1; if x = 0, y = 0; and if x = 1, y = 1. So Ry = {0, 1}.
P(y = 0) = P(x = 0) = 1/2 and P(y = 1) = P(x = −1) + P(x = 1) = 1/3 + 1/6 = 1/2.
The pmf of y = x² is
Y      0    1
P(y)  1/2  1/2
b. Y = 2x + 5: if x = −1, y = 3; if x = 0, y = 5; and if x = 1, y = 7.
P(y = 3) = P(x = −1) = 1/3, P(y = 5) = P(x = 0) = 1/2 and P(y = 7) = P(x = 1) = 1/6.
The pmf of y = 2x + 5 is
Y      3    5    7
P(y)  1/3  1/2  1/6
Exercise: Let P(x) = 1/2^x for x = 1, 2, 3, 4, …, and let
H(x) = 1 if x is even; −1 if x is odd.
Then (a) show that P(x) is a legitimate probability function, and (b) find the probability
distribution of y = H(x).
Solution (for Y = 2x + 1 with f(x) = 2x, 0 ≤ x ≤ 1):
G(y) = P(Y ≤ y) = P(2x + 1 ≤ y) = P(x ≤ (y − 1)/2) = ∫_{0}^{(y−1)/2} 2x dx = (y − 1)²/4
g(y) = G′(y) = [(y − 1)²/4]′ = (y − 1)/2
f(x) > 0 for 0 ≤ x ≤ 1 ⇒ g(y) > 0 for 0 ≤ (y − 1)/2 ≤ 1 ⇔ 1 ≤ y ≤ 3
g(y) = (y − 1)/2 for 1 ≤ y ≤ 3; 0 otherwise.
Example 2: Suppose that f(x) is as defined in Example 1. Determine the probability density
function of Y = H(x) = e^{−x}.
Solution: G(y) = P(Y ≤ y) = P(e^{−x} ≤ y) = P(−x ≤ ln y) = P(x ≥ −ln y) = ∫_{−ln y}^{1} 2x dx = 1 − (ln y)²
g(y) = G′(y) = [1 − (ln y)²]′ = −2 ln y / y
f(x) > 0 for 0 ≤ x ≤ 1 ⇒ g(y) > 0 for 0 ≤ −ln y ≤ 1 ⇔ 1/e ≤ y ≤ 1
g(y) = −2 ln y / y for 1/e ≤ y ≤ 1; 0 otherwise.
Theorem 1: Let X be a continuous random variable with pdf f where f(x) > 0 for a < x < b. Suppose that
Y = H(x) is a strictly monotonic (increasing or decreasing) function of x. Assume that this function is
differentiable and continuous for all x. Then the random variable Y = H(x) has pdf g given by
g(y) = f(H⁻¹(y)) |dH⁻¹(y)/dy|
Remark:
1. If H is strictly increasing, {H(x) ≤ y} is equivalent to {x ≤ H⁻¹(y)}.
2. If H is strictly decreasing, {H(x) ≤ y} is equivalent to {x ≥ H⁻¹(y)}.
Proof:
Case 1: Assume that H is a strictly increasing function. Then
G(y) = P(Y ≤ y) = P(H(x) ≤ y), and {H(x) ≤ y} ≅ {x ≤ H⁻¹(y)}
G(y) = P(x ≤ H⁻¹(y)) = F(H⁻¹(y))
g(y) = G′(y) = [F(H⁻¹(y))]′ = F′(H⁻¹(y)) · (H⁻¹(y))′ ⇒ g(y) = f(H⁻¹(y)) dH⁻¹(y)/dy   ………(*)
Example 1: Suppose that f(x) = 1 if 0 ≤ x ≤ 1; 0 otherwise.
Find the pdf of y = −ln x.
Solution: The function y = −ln x is a strictly decreasing function.
y = −ln x ⇒ −y = ln x ⇔ x = e^{−y} = H⁻¹(y) and dH⁻¹(y)/dy = −e^{−y}
⇒ g(y) = f(H⁻¹(y)) |dH⁻¹(y)/dy| = e^{−y} for y > 0; 0 otherwise.
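A small simulation sketch of this example (numpy assumed): drawing X uniformly on [0, 1] and transforming by y = −ln x should reproduce the exponential CDF 1 − e^{−y}:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=1_000_000)   # X uniform on [0, 1], f(x) = 1
    y = -np.log(x)                              # Y = -ln X; derived pdf g(y) = e^{-y}, y > 0

    # compare the empirical distribution of Y with the exponential CDF 1 - e^{-y}
    for t in [0.5, 1.0, 2.0]:
        print(t, (y <= t).mean(), 1 - np.exp(-t))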
Example 2: Suppose f(x) = 1/2 if −1 ≤ x ≤ 1; 0 otherwise.
Find the pdf of y = x².
Solution: The function y = x² is not monotonic over the range −1 to 1, hence the above
theorem is not applicable.
G(y) = P(Y ≤ y) = P(x² ≤ y) = P(|x| ≤ √y) = P(−√y ≤ x ≤ √y) = F(√y) − F(−√y)
g(y) = f(√y)·(1/(2√y)) + f(−√y)·(1/(2√y)) = (1/2)·(1/(2√y)) + (1/2)·(1/(2√y)) = 1/(2√y)
g(y) = 1/(2√y) if 0 ≤ y ≤ 1; 0 otherwise.
Theorem 2: Let X be a continuous random variable with pdf f(x). Let y = x²; then the random
variable Y has a pdf g(y) given by
g(y) = (1/(2√y)) [f(√y) + f(−√y)].
Example 3: Let f(x) = x²/81 if −3 ≤ x ≤ 6; 0 otherwise.
Find the pdf of y = H(x) = x².
Solution: For 0 ≤ y ≤ 9, y = x² is not monotonic (both −√y and √y lie in [−3, 6]), hence use
Theorem 2 above or the definitional approach:
g(y) = (1/(2√y)) [f(√y) + f(−√y)] = (1/(2√y)) (y/81 + y/81) = √y/81.
For 9 ≤ y ≤ 36, only x = √y lies in [−3, 6], so y = x² is monotonic there with x = √y = H⁻¹(y)
and dH⁻¹(y)/dy = 1/(2√y):
g(y) = (y/81)·(1/(2√y)) = √y/162 for 9 ≤ y ≤ 36.
Therefore
g(y) = √y/81 if 0 ≤ y ≤ 9
     = √y/162 if 9 ≤ y ≤ 36
     = 0 otherwise
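A Monte Carlo sketch checking the derived density (numpy assumed): X is sampled by inverting its CDF F(x) = (x³ + 27)/243 on [−3, 6], and the empirical distribution of Y = X² is compared with the CDF implied by g(y):

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.uniform(size=1_000_000)
    x = np.cbrt(243 * u - 27)        # inverse-CDF sampling from f(x) = x^2/81 on [-3, 6]
    y = x**2                         # transformed variable Y = X^2

    def G(t):
        # CDF implied by g(y): integral of sqrt(y)/81 up to 9, then sqrt(y)/162 beyond
        if t <= 9:
            return (2 / 243) * t**1.5
        return (2 / 243) * 27 + (1 / 243) * (t**1.5 - 27)

    for t in [4, 9, 16, 36]:
        print(t, (y <= t).mean(), G(t))   # empirical vs derived CDF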
CHAPTER 7
INTRODUCTION TO TWO DIMENSIONAL RANDOM VARIABLES
7.1 Two Dimensional Random Variable
We may define two or more random variables on the same sample space. Let S be the sample
space associated with a random experiment E. Let X and Y be two real random variables defined
on the same sample space, with X = X(s) and Y = Y(s) two functions each assigning a real number
to each outcome s ∈ S. Then (X, Y) is called a two dimensional random variable or random
vector.
Example: Suppose in a communication system X is the transmitted signal and Y is the
corresponding noisy received signal. Then (x, y) is a joint random variable.
Note
i If the possible values of (X, Y) are finite or countable infinite, (X, Y) is called a two
dimensional discrete random variable.
ii If (X, Y) can assume all values in a specified region R in the xy-plane, (X, Y) is called a
two dimensional continuous random variable.
If P(X = xi, Y = yj) = p(xi, yj), then p(x, y) is called the probability mass function of (X, Y)
provided that:
i. p(x, y) ≥ 0 for all x, y;
ii. ∑_R p(x, y) = 1;
iii. P(X = x, Y = y) = p(x, y).
The set of triples {(xi, yj), p(xi, yj)} is called the joint probability distribution of (X, Y).
Example 1: Suppose that two machines are used for a particular task in the morning and for a
different task in the afternoon. Let X and Y represent the number of times that a machine breaks
down in the morning and in the afternoon respectively. The table below give the joint probability
distribution of (x, y).
Y
0 1 2 Total
X 0 0.25 0.15 0.10 0.50
1 0.10 0.08 0.07 0.25
2 0.05 0.07 0.13 0.25
Total 0.40 0.30 0.30 1
a. What is the probability that the machine breaks down an equal number of times in the
morning and in the afternoon?
b. What is the probability that the machine breaks down a greater number of times in the
morning than in the afternoon?
Solution:
(a) P(X = Y) = p(0, 0) + p(1, 1) + p(2, 2) = 0.25 + 0.08 + 0.13 = 0.46.
(b) P(X > Y) = p(1, 0) + p(2, 0) + p(2, 1) = 0.10 + 0.05 + 0.07 = 0.22.
Example 1: Let (X, Y) be a continuous two dimensional RV with their joint pdf
f(x, y) = c if 0 < x < 2, 0 < y < 4; 0 otherwise
a. Determine c
b. Find P(X < 1, Y < 3)
c. Find P(X > Y)
Solution:
a. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{0}^{4} ∫_{0}^{2} c dx dy = ∫_{0}^{4} 2c dy = 8c = 1 ⇔ c = 1/8
f(x, y) = 1/8 if 0 < x < 2, 0 < y < 4; 0 otherwise
b. P(x < 1, y < 3) = ∫_{0}^{3} ∫_{0}^{1} (1/8) dx dy = ∫_{0}^{3} (1/8) dy = 3/8
c. P(x > y) = ∫_{0}^{2} ∫_{y}^{2} (1/8) dx dy = ∫_{0}^{2} (2 − y)/8 dy = (1/8)(2y − y²/2)|₀² = (1/8)(4 − 2) = 1/4
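The double integrals in parts (b) and (c) can be checked numerically with scipy's dblquad (a sketch, with our own lambda names):

    from scipy.integrate import dblquad

    f = lambda y, x: 1 / 8          # joint density on 0 < x < 2, 0 < y < 4 (dblquad integrates y first)

    # P(X < 1, Y < 3): x from 0 to 1, y from 0 to 3
    p_b, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 3)
    # P(X > Y): x from 0 to 2, y from 0 to x
    p_c, _ = dblquad(f, 0, 2, lambda x: 0, lambda x: x)
    print(p_b, p_c)                 # expected 3/8 = 0.375 and 1/4 = 0.25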
Example 2: Suppose X and Y have the joint pdf
f(x, y) = cx if 0 < x² < y < x < 1; 0 otherwise
a. Determine c that makes f(x, y) a legitimate pdf.
b. Find P(x < 1/2, y < 1/2).
Solution:
a. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{0}^{1} ∫_{y}^{√y} cx dx dy = c ∫_{0}^{1} (x²/2)|_{y}^{√y} dy
= (c/2) ∫_{0}^{1} (y − y²) dy = (c/2)(1/2 − 1/3) = c/12 = 1 ⇔ c = 12
b. Since the support requires y < x, the event {x < 1/2, y < 1/2} reduces to {x < 1/2}:
P(x < 1/2, y < 1/2) = ∫_{0}^{1/2} ∫_{x²}^{x} 12x dy dx = ∫_{0}^{1/2} 12x(x − x²) dx
= (4x³ − 3x⁴)|₀^{1/2} = 1/2 − 3/16 = 5/16
7.3 Joint Cumulative Distribution Function
iv. P(a < X < b, c < Y < d) = F(b, d) − F(a, d) − F(b, c) + F(a, c)
v. At points of continuity of f(x, y), f(x, y) = ∂²F(x, y)/∂x∂y
7.4 Marginal Probability Distribution
With a two dimensional random variable (X, Y) we may associate a one dimensional random
variable, namely X or Y; that is, we may be interested in the probability distribution of X or of Y
only. We call such probability distributions marginal probability distributions.
a. In the discrete case
Let X assume the values x1, x2, x3, …, xm and Y assume the values y1, y2, y3, …, yn, with
P(X = xi, Y = yj) defined for i = 1, 2, …, m and j = 1, 2, …, n. Then
P(X = xi) = P{(X = xi and Y = y1) or (X = xi and Y = y2) or …} = ∑_j p{x = xi, y = yj}
is called the marginal probability function of X.
The collection of pairs {xi, pi.}, i = 1, 2, 3, …, m is called the marginal probability distribution
of X.
Similarly, P(Y = yj) = P{(Y = yj and X = x1) or (Y = yj and X = x2) or …} = ∑_i p{x = xi, y = yj}
is called the marginal probability function of Y.
The collection of pairs {yj, p.j}, j = 1, 2, 3, …, n is called the marginal probability distribution
of Y.
b. In the continuous case
Let (X,Y) be a two dimensional continuous random vector with joint pdf f(x,y). Then the
individual or marginal distribution of X and Y are defined by the pdf’s
∞ ∞
g ( x) = ∫ f ( x, y) dy and h( y) = ∫ f ( x, y) dx , respectively.
−∞ −∞
Example 1: Recall the example on machine operation. Find the marginal distributions of X and Y.
Solution: Marginal of X: p(x = xi) = ∑_j p{x = xi, y = yj}, giving
p(x) = 0.5 if x = 0; 0.25 if x = 1, 2; 0 otherwise.
Marginal of Y (the column totals): p(y) = 0.40 if y = 0; 0.30 if y = 1, 2; 0 otherwise.
Example 2: Let (X, Y) be a two dimensional continuous random variable with joint pdf
f(x, y) = 1/8 if 0 < x < 2, 0 < y < 4; 0 otherwise.
Find the marginals of X and Y.
Solution:
Marginal of X: g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{0}^{4} (1/8) dy = 4/8 = 1/2,
so g(x) = 1/2 if 0 < x < 2; 0 otherwise.
Marginal of Y: h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{0}^{2} (1/8) dx = 2/8 = 1/4,
so h(y) = 1/4 if 0 < y < 4; 0 otherwise.
Example 3: Let f(x, y) = 12x if 0 < x² < y < x < 1; 0 otherwise.
Find the marginals of X and Y.
Solution:
Marginal of X: g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{x²}^{x} 12x dy = 12x(x − x²) if 0 < x < 1; 0 otherwise.
Marginal of Y: h(y) = ∫_{y}^{√y} 12x dx = 6(y − y²) = 6y(1 − y) if 0 < y < 1; 0 otherwise.
Exercise: Suppose (x, y) has a joint pdf
f(x, y) = 2(x + y − 2xy) if 0 < x < 1, 0 < y < 1; 0 otherwise
Find marginal of x and y.
P(x = 1 | y = 0) = p(x = 1, y = 0)/p(y = 0) = 0.05/0.48 = 0.1042,
P(x = 1 | y = 1) = p(x = 1, y = 1)/p(y = 1) = 0.47/0.52 = 0.9038
Example 2: Let f(x, y) = 2 if x > 0, y > 0, x + y < 1; 0 otherwise.
Find (a) P(x < 1/2 | y = 1/4), (b) P(y > 1/3 | x = 1/2).
Solution: First find the marginals:
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{0}^{1−x} 2 dy = 2(1 − x),   h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{0}^{1−y} 2 dx = 2(1 − y)
Then find the conditionals:
g(x | y) = f(x, y)/h(y) = 2/(2(1 − y)) = 1/(1 − y), 0 < x < 1 − y
h(y | x) = f(x, y)/g(x) = 2/(2(1 − x)) = 1/(1 − x), 0 < y < 1 − x
a. If y = 1/4, g(x | y) = 1/(1 − 1/4) = 4/3 for 0 < x < 3/4, so
P(x < 1/2 | y = 1/4) = ∫_{0}^{1/2} (4/3) dx = 2/3.
b. If x = 1/2, h(y | x) = 1/(1 − 1/2) = 2 for 0 < y < 1/2, so
P(y > 1/3 | x = 1/2) = ∫_{1/3}^{1/2} 2 dy = 1/3.
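A quick numerical check of parts (a) and (b) using the conditional densities derived above (scipy assumed):

    from scipy.integrate import quad

    # conditional densities for f(x, y) = 2 on the triangle x > 0, y > 0, x + y < 1
    g_x_given_y = lambda x, y: 1 / (1 - y)      # valid for 0 < x < 1 - y
    h_y_given_x = lambda y, x: 1 / (1 - x)      # valid for 0 < y < 1 - x

    p_a, _ = quad(lambda x: g_x_given_y(x, 0.25), 0, 0.5)    # P(X < 1/2 | Y = 1/4) -> 2/3
    p_b, _ = quad(lambda y: h_y_given_x(y, 0.5), 1/3, 0.5)   # P(Y > 1/3 | X = 1/2) -> 1/3
    print(p_a, p_b)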
g(x)·h(y) = 2x · 2y = 4xy = f(x, y) ⇒ X and Y are independent.
Example 4: Let p(x = i, y = j) = 2^{−(i+j)} if i = 1, 2, 3, …, j = 1, 2, 3, …; 0 otherwise.
Are X and Y independent?
Solution:
p(x = i) = ∑_{j=1}^{∞} p{x = i, y = j} = ∑_{j=1}^{∞} 2^{−i} 2^{−j} = 2^{−i} ∑_{j=1}^{∞} 2^{−j} = 2^{−i}(1) = 2^{−i}, and similarly
p(y = j) = ∑_{i=1}^{∞} p{x = i, y = j} = 2^{−j} ∑_{i=1}^{∞} 2^{−i} = 2^{−j}(1) = 2^{−j}
P(x = i, y = j) = p(x = i)·p(y = j) = 2^{−i}·2^{−j} = 2^{−(i+j)}. Hence X and Y are independent.
CHAPTER 8
INTRODUCTION TO EXPECTATION OF A RANDOM VARIABLE
8.1 Introduction
Definition:
1. Let a discrete random variable X assume the values X1, X2, …. with the probabilities P(X1),
P(X2), …., respectively. Then the expected value of X or the mean of X, denoted as E(X) is
defined as:
E(X) = X1 P(X1) + X2 P(X2) + … = ∑_{i=1}^{∞} Xi P(Xi), provided that the series
∑ xi p(xi) converges.
If X assumes only a finite number of values, say n, then E(X) = ∑_{i=1}^{n} Xi P(Xi).
If all the possible values of X are equally probable, then
E(X) = ∑_{i=1}^{n} Xi·(1/n) = (1/n) ∑_{i=1}^{n} xi = X̄.
Example 3: The probability density function for a continuous random variable X is given as
f(x) = (x + 2)/18 if −2 < x < 4; 0 otherwise.
Find the expected value of X.
Solution: E(X) = ∫_{−2}^{4} x·(x + 2)/18 dx = ∫_{−2}^{4} (x²/18 + x/9) dx = (x³/54 + x²/18)|_{−2}^{4} = 2.
8.2 Expectation of a function of random variable
Definition: Let X be a random variable and Y = H(x) a function of X, hence Y is a random
variable. There are two ways of evaluating E(Y).
Case 1: If Y is a discrete random variable with possible values y1, y2, … and with
probabilities P(yi) = P(Y = yi), then E(Y) = ∑_{i=1}^{∞} yi P(yi).
If Y is a continuous random variable with pdf g(y), then E(Y) = ∫_{−∞}^{∞} y g(y) dy.
Case 2: If X is a discrete random variable with P(xi) = P(X = xi), then E(Y) = ∑_i H(xi) P(xi).
If X is a continuous random variable with pdf f(x), then E(Y) = ∫_{−∞}^{∞} H(x) f(x) dx.
Example (for f(x) = (1/2)e^{−|x|}, −∞ < x < ∞, and Y = |X|):
E(Y) = ∫_{−∞}^{∞} |x| f(x) dx = ∫_{−∞}^{0} (−x)·(1/2)e^{x} dx + ∫_{0}^{∞} x·(1/2)e^{−x} dx = 1/2 + 1/2 = 1.
Exercise: find E(X)
Properties of expectations
1. If X is constant, X = c then E(c) = c
2. If c is constant and X is a random variable then E(cX) = cE(X)
3. Let X and Y be a two dimensional random variables then E(X+Y) = E(X) + E(Y)
4. If Y = a + bX, then E(Y) = a + bE(X)
5. Let (X, Y) be a two dimensional random variable; if X and Y are independent, then
E(XY) = E(X)E(Y).
Variance of a random variable and its properties
Definition: Let X be a random variable with probability distribution f(x) and mean
E(X) = μ. The variance of X, denoted var(X) or σx², is defined as
var(X) = E[(X − μ)²]
The positive square root of σx² is called the standard deviation of X.
Theorem: var(X) = E[(X − E(X))²] = E(X²) − (E(X))²
Properties of variance
1. If c is constant, then var(cx) = c2var(x)
2. If c is constant, then var(c) = 0
3. If b and c are constant, then var(bx+c) = b2var(x)
4. If (x, y) is a two dimensional random variable and if they are independent, then
var(X + Y) = var(x) + var(y)
Example 1: Let X have the p.m.f.
f(x) = 0.125 if x = 0, 3; 0.375 if x = 1, 2; 0 elsewhere.
Find the mean, variance and the standard deviation of X.
Solution: μ = E(X) = 0(0.125) + 1(0.375) + 2(0.375) + 3(0.125) = 1.5
σ² = E[(X − μ)²] = (−1.5)²(0.125) + (−0.5)²(0.375) + (0.5)²(0.375) + (1.5)²(0.125) = 0.75
σ = √0.75 ≈ 0.866
Example 2: The probability distribution function for a discrete random variable X is
f(x) = 2k if x = 1; 3k if x = 3; 4k if x = 5; 0 otherwise, where k is some constant.
Then find (a) k, (b) E(X) and Var(X).
Solution:
(a) ∑_x f(x) = f(1) + f(3) + f(5) = 2k + 3k + 4k = 9k = 1 ⇔ k = 1/9.
(b) E(X) = ∑_x x f(x) = 1·(2/9) + 3·(3/9) + 5·(4/9) = 31/9, and
Var(X) = ∑_x (x − μ)² f(x) = (1 − 31/9)²·(2/9) + (3 − 31/9)²·(3/9) + (5 − 31/9)²·(4/9)
= (−22/9)²·(2/9) + (−4/9)²·(3/9) + (14/9)²·(4/9) = 200/81
Example 3: The probability density function for a continuous random variable X is
f(x) = a + bx² if 0 ≤ x ≤ 1; 0 otherwise, where a, b are some constants. Then find
(a) a, b if E(X) = 3/5, (b) Var(X).
Solution: (a)
∫_{0}^{1} f(x) dx = 1 ⇔ ∫_{0}^{1} (a + bx²) dx = (ax + (b/3)x³)|₀¹ = a + b/3 = 1
and E(X) = ∫_{0}^{1} x f(x) dx = ∫_{0}^{1} x(a + bx²) dx = ((a/2)x² + (b/4)x⁴)|₀¹ = a/2 + b/4 = 3/5
Solving the two equations, we have a = 3/5, b = 6/5.
(b) f(x) = 3/5 + (6/5)x² if 0 ≤ x ≤ 1; 0 otherwise.
Thus,
Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]² = E(X²) − (3/5)²
= ∫_{0}^{1} x² f(x) dx − 9/25 = ∫_{0}^{1} ((3/5)x² + (6/5)x⁴) dx − 9/25
= ((1/5)x³ + (6/25)x⁵)|₀¹ − 9/25 = 1/5 + 6/25 − 9/25 = 2/25
Example 4: The probability density function for a continuous random variable X is
f(x) = (x + 2)/18 if −2 < x < 4; 0 otherwise.
Then find (a) P(|X| < 1), (b) P(X² < 9), (c) E(X) and Var(X).
Solution:
(a) P(|X| < 1) = P(−1 < X < 1) = ∫_{−1}^{1} (x + 2)/18 dx = (x²/36 + x/9)|_{−1}^{1}
= (1/36 + 1/9) − (1/36 − 1/9) = 2/9
(b) P(X² < 9) = P(−3 < X < 3) = ∫_{−3}^{−2} 0 dx + ∫_{−2}^{3} (x + 2)/18 dx = (x²/36 + x/9)|_{−2}^{3} = 25/36
(c) E(X) = ∫_{−2}^{4} x·(x + 2)/18 dx = ∫_{−2}^{4} (x²/18 + x/9) dx = (x³/54 + x²/18)|_{−2}^{4} = 2.
Since E(X²) = ∫_{−2}^{4} x²·(x + 2)/18 dx = ∫_{−2}^{4} (x³/18 + x²/9) dx = (x⁴/72 + x³/27)|_{−2}^{4} = 6,
Var(X) = E(X²) − μ² = 6 − 2² = 2.
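A numerical check of Example 4 with scipy's quad (a sketch; the names are ours):

    from scipy.integrate import quad

    f = lambda x: (x + 2) / 18                      # density on (-2, 4)

    EX,  _ = quad(lambda x: x * f(x), -2, 4)        # mean, expected 2
    EX2, _ = quad(lambda x: x**2 * f(x), -2, 4)     # E(X^2), expected 6
    p_a, _ = quad(f, -1, 1)                         # P(|X| < 1), expected 2/9
    p_b, _ = quad(f, -2, 3)                         # P(X^2 < 9) = P(-3 < X < 3), expected 25/36
    print(EX, EX2 - EX**2, p_a, p_b)                # 2, 2, 0.222..., 0.694...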
Exercise 1: suppose X is a continuous random variable with pdf
f(x) = 1 + x if −1 ≤ x ≤ 0; 1 − x if 0 ≤ x ≤ 1; 0 elsewhere.
Find mean and variance of X
Exercise 2: A random current I follows through a resistor with R = 50Ω . The probability
density function for the current is given as:
f(x) = 2kx if 0 ≤ x ≤ 0.5; 2k(1 − x) if 0.5 ≤ x ≤ 1; 0 elsewhere.
a. Find the value of k which makes f(x) a valid probability density function
b. Find the expected value of the current I
c. Find the expected value of the power dissipated, P =I2R
d. Find the variance of the current random variable
8.3 Moment and moment generating function
Moments
Definition: The kth moment of a random variable X about its expectation (mean) is
defined as
μk = E[(X − E(X))^k], k = 0, 1, 2, 3, …
Remark:
If k = 0, then μ0 = 1
If k = 1, then μ1 = 0
If k = 2, then μ2 = σx²
The kth moment of a random variable X about the origin (zero) is defined as
μ′k = E(X^k), k = 0, 1, 2, 3, …
Remark:
If k = 0, then μ′0 = 1
If k = 1, then μ′1 = E(X) = mean
If k = 2, then μ′2 = E(X²)
var(X) = E(X²) − (E(X))² = σ² = μ′2 − (μ′1)²
The moment generating function (mgf) of a random variable X is defined as M_X(t) = E(e^{tX}).
Exercise: Derive the moment generating function for the distribution with pdf
f(x) = e^{−x} if x > 0; 0 elsewhere.
Find the mean and variance of X.
Properties of moment generating functions
1. If the moment generating function of a random variable exists, it uniquely
determines the distribution function of that random variable, i.e., if M_X(t) = M_Y(t)
then X and Y have the same distribution.
2. If Y = a + bX, then M_Y(t) = e^{ta} M_X(tb).
3. If X and Y are independent and Z = X + Y, then M_Z(t) = M_X(t)·M_Y(t).
4. The moment generating function does not always exist, so it cannot always be used to
find the mean and variance.
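As a sketch related to the exercise above (using sympy and taking the known closed form M_X(t) = 1/(1 − t), valid for t < 1, for the density e^{−x}, x > 0), the mean and variance follow from the derivatives of the mgf at 0:

    import sympy as sp

    s = sp.symbols('s')
    M = 1 / (1 - s)                        # mgf of f(x) = e^{-x}, x > 0 (valid for s < 1)

    mean = sp.diff(M, s).subs(s, 0)        # first moment  E(X)   = 1
    m2   = sp.diff(M, s, 2).subs(s, 0)     # second moment E(X^2) = 2
    print(mean, m2 - mean**2)              # mean 1, variance 1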
Chebyshev’s Inequality
Let X be a random variable with E(X) = c, c ∈ ℝ. Then if E(X − c)² is finite and ε is any
positive number, we have
P{|X − c| ≥ ε} ≤ (1/ε²) E(X − c)²
This is called Chebyshev’s inequality.
By choosing the complementary event, P{|X − c| < ε} ≥ 1 − (1/ε²) E(X − c)².
By choosing c = μ, P{|X − μ| ≥ ε} ≤ σ²/ε².
By considering c = μ and ε = kσ, P{|X − μ| ≥ kσ} ≤ σ²/(k²σ²) = 1/k², equivalently
P{|X − μ| < kσ} ≥ 1 − 1/k².
Theorem: the probability that any random variable X will assume a value within k standard
deviations of its mean, i.e., departs from its mean by less than k times the standard deviation,
is at least 1 − 1/k².
Note: this holds regardless of the specific distribution, and it is a lower bound, not an exact equality.
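A small simulation sketch (numpy assumed) comparing the Chebyshev bound 1/k² with the actual tail probability for one particular distribution; the bound is loose but never violated:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)   # any distribution works; here mu = 0, sigma = 1

    for k in [1.5, 2, 3]:
        empirical = (np.abs(x) >= k).mean()   # P(|X - mu| >= k*sigma) for this sample
        bound = 1 / k**2                      # Chebyshev upper bound
        print(k, empirical, bound)            # the empirical value never exceeds the bound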
8.4 Covariance and correlation coefficient
The correlation coefficient measures the degree of linear association between X and Y.
Definition: Let (X, Y) be a two dimensional random variable. The correlation coefficient of
X and Y, denoted ρxy, is defined as
ρxy = E[(X − E(X))(Y − E(Y))] / √(var(X) var(Y)) = cov(X, Y) / √(var(X) var(Y)) = [E(XY) − E(X)E(Y)] / √(var(X) var(Y))
Definition: Let (X, Y) be a two dimensional random variable. The covariance, denoted
cov(X, Y) or σxy, is defined as σxy = E[(X − E(X))(Y − E(Y))].
Theorem 1: σxy = E(XY) − E(X)E(Y)
Theorem 2: If X and Y are independent, then σxy = 0; but the converse is not always true.
Example 1: Let f(x, y) = 2 if 0 ≤ x ≤ y ≤ 1; 0 otherwise.
Find σxy and ρxy.
Solution:
Marginal of X: g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{x}^{1} 2 dy = 2(1 − x) if 0 ≤ x ≤ 1; 0 otherwise.
Marginal of Y: h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{0}^{y} 2 dx = 2y if 0 ≤ y ≤ 1; 0 otherwise.
E(XY) = ∫∫ xy f(x, y) dx dy = ∫_{0}^{1} ∫_{x}^{1} 2xy dy dx = 1/4
E(X) = ∫_{0}^{1} x·2(1 − x) dx = 1/3 and E(Y) = ∫_{0}^{1} y·2y dy = 2/3,
E(X²) = 1/6, E(Y²) = 1/2
var(X) = E(X²) − (E(X))² = 1/18 and var(Y) = E(Y²) − (E(Y))² = 1/18
σxy = E(XY) − E(X)E(Y) = 1/4 − (1/3)(2/3) = 1/36
ρxy = cov(X, Y)/√(var(X) var(Y)) = (1/36)/√((1/18)(1/18)) = (1/36)/(1/18) = 1/2
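The moments in Example 1 can be verified numerically with scipy's dblquad over the triangle 0 ≤ x ≤ y ≤ 1 (a sketch; the helper E is ours):

    from scipy.integrate import dblquad
    from math import sqrt

    # joint density f(x, y) = 2 on 0 <= x <= y <= 1; dblquad integrates the inner variable (y) first
    E = lambda g: dblquad(lambda y, x: g(x, y) * 2.0, 0, 1, lambda x: x, lambda x: 1)[0]

    EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
    EX2, EY2    = E(lambda x, y: x**2), E(lambda x, y: y**2)

    cov = EXY - EX * EY                                 # 1/36
    rho = cov / sqrt((EX2 - EX**2) * (EY2 - EY**2))     # 1/2
    print(cov, rho)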
Exercise 1: Let f(x, y) = 2(1 − x) if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise.
i. Find σxy and ρxy
ii. Show that X and Y are independent
Exercise 2: Let f(x, y) = (6/5)[1 − (x − y)²] if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise.
Compute σxy and ρxy.
Exercise 3: Let X and Y denote voltages at two points in a circuit. The joint density function is
given as f(x, y) = (x/4)(1 + 3y²) if 0 ≤ x ≤ 2, 0 ≤ y ≤ 1; 0 otherwise.
i. Compute the means of X and Y
ii. Compute σxy and ρxy
iii. Let Z = X/Y (the ratio of the voltages); compute E(Z).
Properties of ρxy
ρxy² ≤ 1, i.e., −1 ≤ ρxy ≤ 1
ρxy = ±1 indicates a high degree of linearity between X and Y
If V = aX + b and W = cY + d, then ρvw = (ac/|ac|) ρxy, where a ≠ 0, c ≠ 0.
Conditional expectations
Definition: If (X, Y) is a two dimensional random variable, then we define the conditional
expectation of X given Y = y as
E(X | Y = y) = ∑_x x p(x | y) if X is discrete
E(X | Y = y) = ∫ x g(x | y) dx if X is continuous
where p(x | y) = P(X = x, Y = y) / P(Y = y).
CHAPTER 9
DISCRETE AND CONTINUOUS DENSITY FUNCTIONS
9.1 Common Discrete Probability Distributions
1. Binomial Distribution
A discrete random variable X is said to have a binomial distribution if x satisfies the following
conditions:
• An experiment is repeated for a fixed number of identical trials n.
• All trials of the experiment are independent from one another.
• All possible outcomes for each trial of the experiment can be divided into two
complementary events called “success” and “failure”.
• The probability of success has a constant value of p for every trial and the probability of
failure has a constant value of q for every trial, where q = 1 − p .
• The random variable x counts the number of trials on which success occurred.
• The trials are independent, thus we must sample with replacement.
Example: Fourteen percent of flights from Bole International Airport are delayed. If 20 flights
are chosen at random, then we can consider each flight to be an independent trial. If we define a
successful trial to be that a flight takes off on time, then the random variable X representing the
number of on-time flights will be binomially distributed with n = 20 , p = .86 , and q = .14 .
Definition: The outcomes of the binomial experiment and the corresponding probabilities
of these outcomes are called Binomial Distribution.
If X is a binomial random variable with n trials, probability of success p, and probability of
failure q, then by the fundamental counting principle, the probability of any outcome in which
there are x successes (and therefore n − x failures) is p x q n − x .
To count the number of outcomes with x successes and n − x failures, we observe that the x
successes could occur on any x of the n trials. The number of ways of choosing x trials out of n
is n C x , so the probability of getting x successes in n trials becomes:
P(X = x) = C(n, x) p^x q^{n−x}, x = 0, 1, 2, …, n
and written as X ~ Bin(n, p).
Example 1: A student takes a 10 question multiple-choice quiz and guesses each answer. For
each question, there are 4 possible answers, only one of which is correct. If we consider
“success” to be getting a question right and let the random variable X represent the number
correct answers on the multiple-choice quiz. Then what is the probability of a student guessing 3
answers correctly?
X ~ Bin(n = 10, p = 0.25) ⇒ P(X = x) = C(10, x) p^x q^{10−x}, x = 0, 1, 2, …, 10.
P(3) = C(10, 3)·(1/4)³·(3/4)⁷ = 120 × (1/64) × (2187/16384) ≈ 0.25,
while the probability of guessing seven answers correctly is P(7) = C(10, 7)·(1/4)⁷·(3/4)³ ≈ 0.0031.
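A short Python check of these binomial probabilities (math.comb is available in Python 3.8+):

    from math import comb

    n, p = 10, 0.25
    pmf = lambda x: comb(n, x) * p**x * (1 - p)**(n - x)

    print(pmf(3))                              # ≈ 0.2503, probability of exactly 3 correct guesses
    print(pmf(7))                              # ≈ 0.0031, probability of exactly 7 correct guesses
    print(sum(pmf(x) for x in range(n + 1)))   # sanity check: the pmf sums to 1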
2. Geometric Distribution
The probability that the first success occurs on trial y is
p(y) = (1 − p)^{y−1} p, y = 1, 2, …
∑_{y=1}^{∞} p(y) = ∑_{y=1}^{∞} (1 − p)^{y−1} p = p ∑_{y*=0}^{∞} (1 − p)^{y*} = p · 1/(1 − (1 − p)) = p/p = 1
Let X be the number of trials required until the first success is obtained; then
P(X = x) = (1 − p)^{x−1} p, x = 1, 2, …; 0 otherwise
is called the geometric probability distribution, and X has a geometric distribution with
probability of success p.
Example 1: Someone is trying to take the road test to get a driver’s license. If the probability of
passing the test is 40%.
a. What is the probability that this person will pass the test at second shot?
b. What is the probability that someone will pass the road test in 5 trials?
c. Given someone has taken the test 4 times and still has not got the license, what is that
person’s chance of passing the next time?
Solution: Let X denote the number of trials needed to get the driving license.
X ~ Geo(p = 0.4) ⇒ P(X = x) = 0.4 × 0.6^{x−1}
a) P(x = 2) = 0.4 × 0.6 = 0.24
b) P(x = 5) = 0.4 × 0.6⁴ = 0.0518
c) By the memoryless property, P(pass on the 5th attempt | first 4 attempts failed)
= P(X = 5)/P(X > 4) = (0.4 × 0.6⁴)/(0.6⁴) = 0.4.
9.2 Common Continuous Probability Distributions
1. Uniform distribution
Let X be a continuous random variable whose density function is constant on an interval, say
a ≤ x ≤ b, and 0 elsewhere, i.e.
f(x) = 1/(b − a) if a ≤ x ≤ b; 0 elsewhere
Such a random variable is uniformly distributed on [a, b] abbreviated X ~ Uniform (a, b). If
we take a measurement of X, we are equally likely to obtain any value within the interval.
Hence, for some subinterval (c, d) ⊆ (a, b), we have P(c ≤ x ≤ d) = ∫_{c}^{d} 1/(b − a) dx = (d − c)/(b − a).
For this distribution to be a probability distribution we require ∫_{a}^{b} 1/(b − a) dx = 1.
The cumulative distribution function is given as
F_X(x) = 0 if −∞ < x < a
       = ∫_{a}^{x} 1/(b − a) dt = (x − a)/(b − a) if a ≤ x ≤ b
       = 1 if x ≥ b
So the mean of the uniform distribution is
μ = ∫_{−∞}^{+∞} x f(x) dx = ∫_{a}^{b} x/(b − a) dx = (1/(b − a))·(x²/2)|_{a}^{b} = (a + b)/2,
the midpoint of the interval (a, b).
And E(X²) = ∫_{−∞}^{+∞} x² f(x) dx = ∫_{a}^{b} x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3.
Then the variance is
σ² = E(X²) − μ² = (b² + ab + a²)/3 − (b² + 2ab + a²)/4 = (b − a)²/12,
and the standard deviation is σ = (b − a)/(2√3).
Uniform random variables are the correct model for choosing a number “at random” from an
interval. They are also natural choices for experiments in which some event is “equally likely” to
happen at any time or place within some interval.
Note: the longer the interval (a, b), the larger the values of the variance and standard deviation.
Example 1: Suppose a point is chosen at random on a line segment [0, 2] what is the
probability that a chosen point lies between 1 and 1.5. Assuming that x is uniform on [0, 2].
Solution: f(x) = 1/2 if 0 ≤ x ≤ 2; 0 elsewhere
P(1 ≤ x ≤ 1.5) = ∫_{1}^{1.5} f(x) dx = ∫_{1}^{1.5} (1/2) dx = 1/4
Exercise: Suppose that the random variable x has possible values between 0 and 100.
Assuming x has uniform distribution between 0 and 100.
Find (a). E(X) (b). p ( x ≥ 50) (c). p ( 25 ≤ x ≤ 75)
2. The Exponential Distribution
Let λ be a positive real number. We write X~exponential(λ) and say that X is an exponential
random variable with parameter λ if the pdf of X is
f(x) = λe^{−λx} if x ≥ 0; 0 otherwise
If X ~ exponential(λ), then the expected value and variance of X are E[X] = 1/λ and
Var[X] = σx² = 1/λ².
The cumulative distribution function is
F(x) = P[X ≤ x] = ∫_{0}^{x} λe^{−λt} dt = 1 − e^{−λx} (x ≥ 0)
⇒ P[X > x] = e^{−λx} (x ≥ 0)
Note: S(x) = P(X ≥ x) = 1 − P(X < x) = e^{−λx} is called the survival function.
Example 1: The random quantity X follows an exponential distribution with parameter λ =
0.25. Find the mean, the standard deviation and P[X > 4].
Solution: μ = 1/λ = 1/0.25 = 4, σ = 1/λ = 4, and
P[X > 4] = e^{−λ·4} = e^{−0.25×4} = e^{−1} = 0.367879… ≈ 0.368.
Note: The exponential random variable can be used to describe the life time of a machine,
industrial product and Human being. Also, it can be used to describe the waiting time of a
customer for some service.
Example 2: Let X represent the life time of a washing machine. Suppose the average lifetime for
this type of washing machine is 15 years. What is the probability that this washing machine can
be used for less than 6 years? Also, what is the probability that this washing machine can be used
for more than 18 years?
Solution: X has the exponential density function with λ = 1/15. Then
P(X ≤ 6) = 1 − e^{−6/15} = 0.3297 and P(X ≥ 18) = e^{−18/15} = 0.3012.
Thus, for this washing machine, there is about a 30% chance that it lasts less than 6 years and
about a 30% chance that it lasts more than 18 years.
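A quick Python check of the exponential computations in Examples 1 and 2:

    from math import exp

    lam = 1 / 15                       # rate for a mean lifetime of 15 years
    F = lambda x: 1 - exp(-lam * x)    # exponential CDF

    print(F(6))                        # P(X <= 6)  ≈ 0.3297
    print(1 - F(18))                   # P(X >= 18) ≈ 0.3012
    print(exp(-0.25 * 4))              # Example 1: P(X > 4) with lambda = 0.25 -> e^{-1} ≈ 0.368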
3. Normal Distribution
A random variable X is said to have a normal distribution if its probability density function
is given by
f(x) = (1/(σ√(2π))) e^{−(1/2)((x − μ)/σ)²}, −∞ < x < ∞, −∞ < μ < ∞, σ > 0
Here µ and σ are the parameters of the distribution; µ = E(X), the mean of the random variable X
(or of the probability distribution); and σ = the standard deviation of X.
Properties of Normal Distribution:
1. It is bell shaped and symmetrical about its mean, and it is mesokurtic. The maximum
ordinate is at x = μ and is given by f(μ) = 1/(σ√(2π)).
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. The inflection points are at µ - σ and µ + σ.
5. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a
different normal distribution. Thus, the normal distribution is completely described by two
parameters: mean and standard deviation.
6. The total area under the curve is 1, i.e., the area of the distribution on each side of the
mean is 0.5: ∫_{−∞}^{∞} f(x) dx = 1.
b) P(X > 76.4) = P((X − μ)/σ > (76.4 − 80)/4.8) = P(Z > −0.75) = 0.7734
P(X > 72.9) = 0.2005 ⇒ P((X − μ)/σ > (72.9 − 62.4)/σ) = 0.2005
⇒ P(Z > 10.5/σ) = 0.2005
⇒ P(0 < Z < 10.5/σ) = 0.50 − 0.2005 = 0.2995, and from the table
P(0 < Z < 0.84) = 0.2995 ⇔ 10.5/σ = 0.84
⇒ σ = 12.5
5. A random variable has a normal distribution with σ = 5. Find its mean if the probability
that the random variable will assume a value less than 52.5 is 0.6915.
Solution:
P(X < 52.5) = P(Z < (52.5 − μ)/5) = 0.6915
⇒ P(0 < Z < z) = 0.6915 − 0.50 = 0.1915. But from the table,
P(0 < Z < 0.5) = 0.1915
⇔ z = (52.5 − μ)/5 = 0.5 ⇒ μ = 50
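The standardization steps above can be reproduced with scipy.stats.norm (a sketch; the mean 80 and standard deviation 4.8 for the first computation are the values appearing in that worked step):

    from scipy.stats import norm

    print(norm.sf(76.4, loc=80, scale=4.8))    # P(X > 76.4) = P(Z > -0.75) ≈ 0.7734

    # solving for sigma when P(X > 72.9) = 0.2005 and the mean is 62.4
    z = norm.ppf(1 - 0.2005)                   # ≈ 0.84
    print((72.9 - 62.4) / z)                   # sigma ≈ 12.5

    # solving for the mean when sigma = 5 and P(X < 52.5) = 0.6915
    z = norm.ppf(0.6915)                       # ≈ 0.5
    print(52.5 - 5 * z)                        # mean ≈ 50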
Exercise:
1. A city installs 2000 electric lamps for street lighting. These lamps have a mean burning
life of 1000 hours with a standard deviation of 200 hours. The normal distribution is a
close approximation to this case.
a. What is the probability that a lamp will fail in the first 700 burning hours?
b. What is the probability that a lamp will fail between 900 and 1300 burning hours?
c. How many lamps are expected to fail between 900 and 1300 burning hours?
d. What is the probability that a lamp will burn for exactly 900 hours?
e. What is the probability that a lamp will burn between 899 hours and 901 hours before
it fails?
f. After how many burning hours would we expect 10% of the lamps to be left?
g. After how many burning hours would we expect 90% of the lamps to be left?
2. Bolts coming off a production line are normally distributed with a mean length of 2.4 cm and
standard deviation 0.1 cm. What is the probability that a bolt chosen at random will be of length
between 2.6 cm and 2.7 cm?
3. A robot eye rotates at a constant rate and its field of vision is 45°. Suppose 50 like objects,
recognizable to the robot, are randomly and independently placed momentarily in the area
surveyed by the eye. If X counts the number of objects seen by the eye then what is the
probability that the number of observed objects will be within 2 standard deviations of the mean.