Random Variable & Random Process
2. Sample Space: The sample space S is the collection of all outcomes of a random experiment.
The elements of S are called sample points.
• A sample space may be finite, countably infinite or uncountable.
• A finite or countably infinite sample space is called a discrete sample space.
• An uncountable sample space is called a continuous sample space.
[Figure: a sample space S containing sample points s1, s2, …]
3. Event: An event A is a subset of the sample space such that probability can be assigned to it.
Thus
• A ⊆ S.
• For a discrete sample space, all subsets are events.
• S is the certain event (sure to occur) and φ is the impossible event.
Consider the following examples.
Example 1 Tossing a fair coin –The possible outcomes are H (head) and T (tail). The
associated sample space is S = {H , T }. It is a finite sample space. The events associated with the
sample space S are: S ,{H },{ T } and φ .
Example 2 Rolling a fair die – The associated finite sample space is S = {1, 2, 3, 4, 5, 6}. Some events are
A = the event of getting an odd face = {1, 3, 5},
B = the event of getting a six = {6},
and so on.
According to the classical definition, the probability of an event A is
P(A) = N_A / N
where N is the total number of equally likely outcomes and N_A is the number of outcomes favourable to A.
Example 4 A fair die is rolled once. What is the probability of getting a‘6’?
Here S = {1, 2, 3, 4, 5, 6} and A = {6}
∴ N = 6 and N_A = 1
∴ P(A) = 1/6
Example 5 A fair coin is tossed twice. What is the probability of getting two ‘heads’?
Total number of outcomes is 4 and all four outcomes are equally likely.
The only outcome favourable to A is {HH}.
∴ P(A) = 1/4
Discussion
• The classical definition is limited to a random experiment which has only a finite
number of outcomes. In many experiments like that in the above example, the sample
space is finite and each outcome may be assumed ‘equally likely.’ In such cases, the
counting method can be used to compute probabilities of events.
• Consider the experiment of tossing a fair coin until a 'head' appears. As we have
discussed earlier, there are countably infinite outcomes. Can you believe that all these
outcomes are equally likely?
• The notion of 'equally likely' is important here. Equally likely means equally probable.
Thus this definition presupposes that all outcomes occur with equal probability, so the
definition is circular: it uses the very concept it is trying to define.
If an experiment is repeated n times under similar conditions and the event A occurs n_A times, then the relative-frequency definition assigns
P(A) = lim_{n→∞} n_A / n
This definition is also inadequate from the theoretical point of view.
• We cannot repeat an experiment an infinite number of times.
• How do we ascertain that the above ratio will converge for all possible sequences of outcomes of the experiment?
Example Suppose a die is rolled 500 times. The following table shows the frequency of each face.
Face 1 2 3 4 5 6
Frequency 82 81 88 81 90 78
Relative frequency 0.164 0.162 0.176 0.162 0.18 0.156
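A quick sketch (not part of the original notes) of the relative-frequency idea: simulating 500 rolls of a fair die and tabulating n_A / n for each face, which should hover near 1/6.

```python
# Minimal sketch: estimating relative frequencies of die faces by simulation.
import random
from collections import Counter

n = 500                                   # number of rolls, as in the table above
rolls = [random.randint(1, 6) for _ in range(n)]
counts = Counter(rolls)
for face in range(1, 7):
    print(face, counts[face], counts[face] / n)   # face, frequency, relative frequency
```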
We have earlier defined an event as a subset of the sample space. Does each subset of the sample space form an event?
The answer is yes for a finite sample space. However, we may not be able to assign probability
meaningfully to all the subsets of a continuous sample space. We have to eliminate those subsets.
The concept of the sigma algebra is meaningful now.
Definition: Let S be a sample space and F a sigma field defined over it. Let P : F → ℜ be a
mapping from the sigma-algebra F into the real line such that for each A ∈ F , there exists a unique
P(A) ∈ ℜ. Clearly P is a set function and is called probability if it satisfies the following axioms:
1. P(A) ≥ 0 for all A ∈ F
2. P(S) = 1
3. Countable additivity: If A_1, A_2, … are pairwise disjoint events, i.e. A_i ∩ A_j = φ for i ≠ j, then
   P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i)
Remark
• The triplet (S, F, P) is called the probability space.
• Any assignment of probability must satisfy the above three axioms.
• If A ∩ B = φ, then P(A ∪ B) = P(A) + P(B).
  This is a special case of axiom 3 and, for a discrete sample space, this simpler version may be taken as axiom 3. We shall give a proof of this result below.
• The events A and B are called mutually exclusive if A ∩ B = φ.
1. P(φ ) = 0
This is because,
S ∪φ = S
⇒ P( S ∪ φ ) = P( S )
⇒ P( S ) + P(φ ) = P ( S )
∴ P (φ ) = 0
2. P(A^c) = 1 − P(A), where A ∈ F
We have
A ∪ Ac = S
⇒ P( A ∪ Ac ) = P( S )
⇒ P( A) + P( Ac ) = 1 ∵ A ∩ Ac = φ
∴ P ( A) = 1 − P ( Ac )
3. If A, B ∈ F and A ∩ B = φ, then P(A ∪ B) = P(A) + P(B)
We have
A ∪ B = A ∪ B ∪ φ ∪ φ ∪ …
∴ P(A ∪ B) = P(A) + P(B) + P(φ) + P(φ) + …   (using axiom 3)
∴ P(A ∪ B) = P(A) + P(B)
4. If A, B ∈ F, then P(A ∩ B^c) = P(A) − P(A ∩ B)
We have A = (A ∩ B^c) ∪ (A ∩ B), where the two events on the right-hand side are disjoint (see the Venn diagram of A and B).
∴ P[(A ∩ B^c) ∪ (A ∩ B)] = P(A)
⇒ P(A ∩ B^c) + P(A ∩ B) = P(A)
⇒ P(A ∩ B^c) = P(A) − P(A ∩ B)
5. If A, B ∈ F, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
We have
A ∪ B = (A^c ∩ B) ∪ (A ∩ B) ∪ (A ∩ B^c)
∴ P(A ∪ B) = P[(A^c ∩ B) ∪ (A ∩ B) ∪ (A ∩ B^c)]
           = P(A^c ∩ B) + P(A ∩ B) + P(A ∩ B^c)
           = P(B) − P(A ∩ B) + P(A ∩ B) + P(A) − P(A ∩ B)
           = P(B) + P(A) − P(A ∩ B)
6. We can apply the properties of sets to establish the following result for
A, B, C ∈ F, P ( A ∪ B ∪ C ) = P ( A) + P ( B ) + P(C ) − P ( A ∩ B) − P( B ∩ C ) − P( A ∩ C ) + P( A ∩ B ∩ C )
The following generalization is known as the principle of inclusion-exclusion.
7. Principle of Inclusion-exclusion
P(∪_{i=1}^{n} A_i) = Σ_{i=1}^{n} P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − … + (−1)^{n+1} P(∩_{i=1}^{n} A_i)
Discussion
We require some rules to assign probabilities to some basic events in F. For other events we can compute the probabilities in terms of the probabilities of these basic events.
Consider a finite sample space S = {s_1, s_2, …, s_n}. Then the sigma algebra F is defined by the power set of S. For any elementary event {s_i} ∈ F, we can assign a probability P({s_i}) such that
Σ_{i=1}^{n} P({s_i}) = 1
In a special case, when the outcomes are equi-probable, we can assign equal probability p to each
elementary event.
∴ Σ_{i=1}^{n} p = 1
⇒ p = 1/n
∴ P(A) = P(∪_{s_i ∈ A} {s_i}) = (1/n) × n(A) = n(A)/n
where n(A) denotes the number of elements in A.
Example Consider the experiment of rolling a fair die considered in example 2.
Suppose Ai , i = 1,.., 6 represent the elementary events. Thus A1 is the event of getting ‘1’, A2 is the
event of getting ’2’ and so on.
Since all six disjoint events are equiprobable and S = A_1 ∪ A_2 ∪ … ∪ A_6, we get
P(A_1) = P(A_2) = … = P(A_6) = 1/6
Suppose A is the event of getting an odd face. Then
A = A_1 ∪ A_3 ∪ A_5
∴ P(A) = P(A_1) + P(A_3) + P(A_5) = 3 × 1/6 = 1/2
Example Consider the experiment of tossing a fair coin until a head is obtained discussed in
Example 3. Here S = {H , TH , TTH ,......}. Let us call
s1 = H
s2 = TH
s3 = TTH
and so on. If we assign P({s_n}) = 1/2^n, then Σ_{s_n ∈ S} P({s_n}) = 1. Let A = {s_1, s_2, s_3} be the event that a head appears within the first three tosses; then P(A) = 1/2 + 1/4 + 1/8 = 7/8.
Suppose the sample space S is continuous and un-countable. Such a sample space arises when the
outcomes of an experiment are numbers. For example, such sample space occurs when the
experiment consists in measuring the voltage, the current or the resistance. In such a case, the sigma
algebra consists of the Borel sets on the real line.
In many applications we have to deal with a finite sample space S and the elementary
events formed by single elements of the set may be assumed equiprobable. In this case,
we can define the probability of the event A according to the classical definition
discussed earlier:
P(A) = n_A / n
Thus calculation of probability involves finding the number of elements in the sample
space S and the event A. Combinatorial rules give us quick algebraic rules to find the
elements in S . We briefly outline some of these rules:
(1) Product rule: Suppose we have a set A with m distinct elements and a set B with n distinct elements, and A × B = {(a_i, b_j) | a_i ∈ A, b_j ∈ B}. Then A × B contains mn ordered pairs of elements. This is illustrated in the figure for m = 5 and n = 4. In other words, if we can choose an element a in m possible ways and an element b in n possible ways, then the ordered pair (a, b) can be chosen in mn possible ways.
[Figure: the 5 × 4 grid of ordered pairs (a_i, b_j), i = 1,…,5, j = 1,…,4]
Suppose we have to choose k objects from a set of n objects, and after every choice the object is placed back in the set. In this case, the number of distinct ordered k-tuples is n × n × … × n (k times) = n^k.
Suppose we have to choose k objects from a set of n objects by picking one object after another at random, without replacement. In this case the first object can be chosen from n objects, the second object from n − 1 objects, and so on. Therefore, by the product rule, the number of distinct ordered k-tuples in this case is
n × (n − 1) × … × (n − k + 1) = n! / (n − k)!
The number n!/(n − k)! is called the permutation of n objects taking k at a time and is denoted by nP_k. Thus
nP_k = n! / (n − k)!
Clearly, nP_n = n!
Suppose nC_k is the number of ways in which k objects can be chosen out of a set of n objects, where the ordering of the chosen objects is not considered. Each such choice can be ordered in k! ways, so
nC_k × k! = nP_k = n! / (n − k)!
∴ nC_k = n! / (k!(n − k)!)
nC_k is also called the binomial coefficient.
Example:
An urn contains 6 red balls, 5 green balls and 4 blue balls. 9 balls were picked at random
from the urn without replacement. What is the probability that out of the balls 4 are red, 3
are green and 2 are blue?
Solution:
9 balls can be picked from a population of 15 balls in 15C_9 = 15!/(9! 6!) ways.
Therefore the required probability is
(6C_4 × 5C_3 × 4C_2) / 15C_9
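A quick numerical sketch of this result (not in the original notes), using exact integer arithmetic:

```python
# Sketch: evaluating the urn probability above with math.comb.
from math import comb
from fractions import Fraction

p = Fraction(comb(6, 4) * comb(5, 3) * comb(4, 2), comb(15, 9))
print(p, float(p))   # 180/1001, approximately 0.18
```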
More generally, the number of ways in which n objects can be partitioned into k groups of sizes n_1, n_2, …, n_k (with n_1 + n_2 + … + n_k = n) is
n! / (n_1! n_2! … n_k!)
This follows by choosing the groups one after another:
[n! / (n_1!(n − n_1)!)] × [(n − n_1)! / (n_2!(n − n_1 − n_2)!)] × … × [(n − n_1 − … − n_{k−1})! / (n_k!(n − n_1 − … − n_k)!)]
= n! / (n_1! n_2! … n_k!)
Example: What is the probability that in a throw of 12 dice each face occurs twice?
Solution: The total number of elements in the sample space of the outcomes of a single
throw of 12 dice is = 612
The number of favourable outcomes is the number of ways in which 12 dice can be
arranged in six groups of size 2 each – group 1 consisting of two dice each showing 1,
group 2 consisting of two dice each showing 2 and so on.
Therefore, the total number of distinct arrangements is
12! / (2! 2! 2! 2! 2! 2!)
Hence the required probability is
12! / ((2!)^6 × 6^12)
Conditional probability
Consider the probability space ( S , F, P ). Let A and B two events in F . We ask the following
question –
Given that A has occurred, what is the probability of B?
The answer is the conditional probability of B given A denoted by P( B / A). We shall develop the
concept of the conditional probability and explain under what condition this conditional
probability is same as P ( B ).
Notation: P(B/A) denotes the conditional probability of B given A.
Let us consider the case of equiprobable events discussed earlier. Let N_AB sample points be favourable for the joint event A ∩ B.
Clearly
P(B/A) = (number of outcomes favourable to A and B) / (number of outcomes favourable to A)
       = N_AB / N_A
       = (N_AB / N) / (N_A / N)
       = P(A ∩ B) / P(A)
This suggests the following definition of conditional probability. The probability of an event B under the condition that another event A has occurred is called the conditional probability of B given A and is defined by
P(B/A) = P(A ∩ B) / P(A),   P(A) ≠ 0
Example 1
Consider the example of tossing the fair die. Suppose
A = event of getting an even number = {2, 4, 6}
B = event of getting a number less than 4 = {1, 2,3}
∴ A ∩ B = {2}
∴ P(B/A) = P(A ∩ B) / P(A) = (1/6) / (3/6) = 1/3
Example 2
A family has two children. It is known that at least one of the children is a girl. What is the
probability that both the children are girls?
A = event of at least one girl
B = event of two girls
Clearly
S = {gg , gb, bg , bb}, A = {gg , gb, bg} and B = {gg}
A ∩ B = {gg}
∴ P(B/A) = P(A ∩ B) / P(A) = (1/4) / (3/4) = 1/3
In the following we show that the conditional probability satisfies the axioms of probability.
By definition, P(B/A) = P(A ∩ B) / P(A), P(A) ≠ 0.
Axiom 1: Since P(A ∩ B) ≥ 0 and P(A) > 0,
P(B/A) = P(A ∩ B) / P(A) ≥ 0
Axiom 2: We have S ∩ A = A, so
P(S/A) = P(S ∩ A) / P(A) = P(A) / P(A) = 1
Axiom 3 Consider a sequence of disjoint events B1 , B2 ,..., Bn ,... We have
(∪_{i=1}^{∞} B_i) ∩ A = ∪_{i=1}^{∞} (B_i ∩ A)
(See the Venn diagram below for an illustration of the finite version of the result.)
Note that the sequence B_i ∩ A, i = 1, 2, … is also a sequence of disjoint events.
∴ P(∪_{i=1}^{∞} (B_i ∩ A)) = Σ_{i=1}^{∞} P(B_i ∩ A)
∴ P(∪_{i=1}^{∞} B_i / A) = P((∪_{i=1}^{∞} B_i) ∩ A) / P(A) = Σ_{i=1}^{∞} P(B_i ∩ A) / P(A) = Σ_{i=1}^{∞} P(B_i / A)
Theorem of total probability: Suppose the events A_1, A_2, …, A_n form a partition of S and B is any event in F. Then
P(B) = Σ_{i=1}^{n} P(A_i) P(B / A_i)
Proof: We have ∪_{i=1}^{n} (B ∩ A_i) = B and the events B ∩ A_i, i = 1, …, n, are disjoint.
∴ P(B) = P(∪_{i=1}^{n} (B ∩ A_i))
       = Σ_{i=1}^{n} P(B ∩ A_i)
       = Σ_{i=1}^{n} P(A_i) P(B / A_i)
[Figure: Venn diagram of S partitioned into A_1, A_2, A_3]
Remark
(1) A decomposition of a set S into 2 or more disjoint nonempty subsets is called a partition of S .
The subsets A1 , A2 , . . . An form a partition of S if
S = A1 ∪ A2 ..... ∪ An and Ai ∩ Aj = φ for i ≠ j.
(2) The theorem of total probability can be used to determine the probability of a complex event in
terms of related simpler events. This result will be used in Bayes' theorem, to be discussed at the end of the lecture.
Example 3 Suppose a box contains 2 white and 3 black balls. Two balls are picked at random
without replacement. Let A1 = event that the first ball is white and Let A1c = event that the first
ball is black. Clearly A1 and A1c form a partition of the sample space corresponding to picking
two balls from the box. Let B = the event that the second ball is white. Then
P( B) = P( A1 ) P( B / A1 ) + P( A1c ) P( B / A1c )
     = (2/5 × 1/4) + (3/5 × 2/4) = 2/5
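A Monte Carlo sketch (my addition, not part of the notes) of Example 3, confirming that the theorem of total probability gives P(B) = 2/5:

```python
# Sketch: simulate drawing two balls without replacement from {2 white, 3 black}.
import random

trials = 200_000
count = 0
for _ in range(trials):
    box = ['w', 'w', 'b', 'b', 'b']
    random.shuffle(box)
    first, second = box[0], box[1]   # two balls picked without replacement
    if second == 'w':
        count += 1
print(count / trials)                # should be close to 0.4
```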
Independent events
Two events are called independent if the probability of occurrence of one event does not affect the
probability of occurrence of the other. Thus the events A and B are independent if
P( B / A) = P( B) and P( A / B) = P( A).
where P( A) and P( B) are assumed to be non-zero.
Equivalently if A and B are independent, we have
P(A ∩ B) / P(A) = P(B)
or
P(A ∩ B) = P(A) P(B)
Two events A and B are called statistically dependent if they are not independent.
Similarly, we can define the independence of n events. The events A_1, A_2, …, A_n are called independent if and only if, for every choice of distinct indices,
P(A_i ∩ A_j) = P(A_i) P(A_j)
P(A_i ∩ A_j ∩ A_k) = P(A_i) P(A_j) P(A_k)
⋮
P(A_1 ∩ A_2 ∩ … ∩ A_n) = P(A_1) P(A_2) … P(A_n)
Example 4 Consider the example of tossing a fair coin twice. The resulting sample space is given
by S = {HH , HT , TH , TT } and all the outcomes are equiprobable.
Let A = {TH , TT } be the event of getting ‘tail’ in the first toss and B = {TH , HH } be the event of
getting ‘head’ in the second toss. Then
P(A) = 1/2 and P(B) = 1/2.
Again, A ∩ B = {TH} so that
P(A ∩ B) = 1/4 = P(A) P(B)
Hence the events A and B are independent.
Example 5 Consider the experiment of picking two balls at random discussed in Example 3. In this case, P(B) = 2/5 and P(B/A_1) = 1/4.
Therefore, P(B) ≠ P(B/A_1) and A_1 and B are dependent.
Bayes’ Theorem
Suppose A_1, A_2, …, A_n form a partition of S, so that S = A_1 ∪ A_2 ∪ … ∪ A_n and A_i ∩ A_j = φ for i ≠ j. Suppose the event B occurs if one of the events A_1, A_2, …, A_n occurs, and that we know the probabilities P(A_i) and P(B/A_i), i = 1, 2, …, n. We ask the following question:
Given that B has occurred, what is the probability that a particular event A_k has occurred? In other words, what is P(A_k / B)?
We have P(B) = Σ_{i=1}^{n} P(A_i) P(B/A_i)   (using the theorem of total probability)
∴ P(A_k / B) = P(A_k) P(B/A_k) / P(B)
             = P(A_k) P(B/A_k) / Σ_{i=1}^{n} P(A_i) P(B/A_i)
This result is known as Bayes' theorem. The probability P(A_k) is called the a priori probability and P(A_k / B) is called the a posteriori probability. Thus Bayes' theorem enables us to determine the a posteriori probability P(A_k / B) from the observation that B has occurred. This result is of practical importance and is at the heart of Bayesian classification, Bayesian estimation, etc.
Example 6
In a binary communication system a zero and a one are transmitted with probabilities 0.6 and 0.4 respectively. Due to errors in the communication system a zero becomes a one with probability 0.1 and a one becomes a zero with probability 0.08. Determine the probability (i) of receiving a one and (ii) that a one was transmitted when the received symbol is a one.
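The notes do not include the worked solution here; the following sketch carries out the arithmetic implied by the problem statement using total probability and Bayes' theorem.

```python
# Sketch of Example 6: P(0 sent)=0.6, P(1 sent)=0.4,
# P(receive 1 | 0 sent)=0.1, P(receive 0 | 1 sent)=0.08.
p0, p1 = 0.6, 0.4
p_r1_given_0 = 0.1           # a zero becomes a one
p_r1_given_1 = 1 - 0.08      # a one is received correctly

p_receive_1 = p0 * p_r1_given_0 + p1 * p_r1_given_1       # total probability
p_sent_1_given_r1 = p1 * p_r1_given_1 / p_receive_1        # Bayes' theorem
print(p_receive_1)           # (i)  0.428
print(p_sent_1_given_r1)     # (ii) ≈ 0.860
```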
Suppose two experiments E1 and E2 with the corresponding sample space S1 and S 2 are
performed sequentially. Such a combined experiment is called the product of two
experiments E1 and E2.
Clearly, the outcome of this combined experiment consists of the ordered pair
( s1 , s2 ) where s1 ∈ S1 and s2 ∈ S 2 . The sample space corresponding to the combined
experiment is given by S = S1 × S2 . The events in S consist of all the Cartesian products
of the form A1 × A2 where A1 is an event in S1 and A2 is an event in S 2 . Our aim is to
define the probability P ( A1 × A2 ) .
The sample space S_1 × S_2 and the events A_1 × S_2, S_1 × A_2 and A_1 × A_2 are illustrated in the figure.
P ( S1 × A2 ) = P2 ( A2 )
and
P ( A1 × S2 ) = P1 ( A1 )
where P_i is the probability defined on the events of S_i, i = 1, 2. This is because the event A_1 × S_2 in S occurs whenever A_1 in S_1 occurs, irrespective of the outcome in S_2.
[Figure: the product space S_1 × S_2 with the events A_1 × S_2 and S_1 × A_2]
Independent Experiments:
In many experiments, the events A1 × S2 and S1 × A2 are independent for every selection
of A_1 ∈ S_1 and A_2 ∈ S_2. Such experiments are called independent experiments. In this case we can write
P(A_1 × A_2) = P[(A_1 × S_2) ∩ (S_1 × A_2)]
             = P(A_1 × S_2) P(S_1 × A_2)
             = P_1(A_1) P_2(A_2)
Example 1
Suppose S_1 is the sample space of the experiment of rolling a six-faced fair die and S_2 is the sample space of the experiment of tossing a fair coin.
Clearly,
S_1 = {1, 2, 3, 4, 5, 6}, A_1 = {2, 3}
and
S_2 = {H, T}, A_2 = {H}
∴ S_1 × S_2 = {(1,H), (1,T), (2,H), (2,T), (3,H), (3,T), (4,H), (4,T), (5,H), (5,T), (6,H), (6,T)}
and
P(A_1 × A_2) = (2/6) × (1/2) = 1/6
Example 2: More generally, for n independent experiments with events A_i in the i-th experiment,
P(A_1 × A_2 × … × A_n) = P_1(A_1) P_2(A_2) … P_n(A_n)
Bernoulli trial
Binomial Law:
pn ( k ) = nCk p k (1 − p ) n − k
Consider n independent repetitions of the Bernoulli trial. Let S_1 be the sample space associated with each trial and suppose we are interested in a particular event A ⊆ S_1 and its complement A^c such that P(A) = p and P(A^c) = 1 − p. If A occurs in a trial, then we have a 'success', otherwise a 'failure'.
Any event in S is of the form A_1 × A_2 × … × A_n where some A_i's are A and the remaining A_i's are A^c, and
P(A_1 × A_2 × … × A_n) = P(A_1) P(A_2) … P(A_n).
There are nC_k events in S with k occurrences of A and n − k occurrences of A^c. For
example, if n = 4, k = 2 , the possible events are
A × Ac × Ac × A
A × A × Ac × A c
A × Ac × A × A c
Ac × A × A × A c
Ac × A c × A × A
A c × A × Ac × A
We also note that all the nCk events are mutually exclusive.
Example 1: A fair die is rolled 6 times. What is the probability that a 4 appears thrice?
Solution:
P_6(3) = 6C_3 × (1/6)^3 × (5/6)^3 = 20 × (1/216) × (125/216) ≈ 0.054
Example2:
A communication source emits binary symbols 1 and 0 with probability 0.6 and 0.4
respectively. What is the probability that there will be 5 1s in a message of 20 symbols?
Solution:
S_1 = {0, 1}
A = {1}, P(A) = p = 0.6
∴ P_20(5) = 20C_5 (0.6)^5 (0.4)^15 ≈ 0.0013
Example 3
In a binary communication system, a bit error occurs with probability 10^−5. What is the probability of getting at least one error bit in a message of 8 bits?
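The notes leave this example unsolved; a short numerical sketch using the binomial law with k = 0 errors:

```python
# Sketch for Example 3: P(at least one error) = 1 - P(no error).
p = 1e-5
n = 8
p_no_error = (1 - p) ** n
print(1 - p_no_error)   # ≈ 8.0e-05
```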
Case 1
Suppose n is very large, p is very small and np = λ, a constant. Then
P_n(k) = nC_k p^k (1 − p)^{n−k}
       = nC_k (λ/n)^k (1 − λ/n)^{n−k}
       = [n(n − 1)…(n − k + 1) / k!] × (λ^k / n^k) × (1 − λ/n)^n / (1 − λ/n)^k
       = [(1 − 1/n)(1 − 2/n) … (1 − (k − 1)/n) λ^k (1 − λ/n)^n] / [k! (1 − λ/n)^k]
As n → ∞, each factor (1 − i/n) → 1, (1 − λ/n)^n → e^{−λ} and (1 − λ/n)^k → 1, so that
∴ p_n(k) → λ^k e^{−λ} / k!
This distribution is known as Poisson probability and widely used in engineering and
other fields. We shall discuss more about this distribution in a later class.
Case 2
When n is sufficiently large and np(1 − p) ≫ 1, p_n(k) may be approximated as
p_n(k) ≈ (1 / √(2π np(1 − p))) e^{−(k − np)² / (2np(1 − p))}
The right hand side is an expression for normal distribution to be discussed in a later
class.
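A numerical sketch of both approximations (my own illustrative parameters n = 100, p = 0.03, not taken from the notes), comparing the exact binomial pmf with the Poisson and normal approximations:

```python
# Sketch: exact binomial pmf vs. Poisson and normal approximations.
from math import comb, exp, factorial, pi, sqrt

n, p = 100, 0.03
lam = n * p
var = n * p * (1 - p)
for k in range(7):
    exact   = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    normal  = exp(-(k - n*p)**2 / (2*var)) / sqrt(2*pi*var)
    print(k, round(exact, 4), round(poisson, 4), round(normal, 4))
```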
Mathematical Preliminaries
Real-valued point function on a set
[Figure: a point function f mapping the sample points s_1, s_2, s_3, s_4 in its domain to the values f(s_1), f(s_2), f(s_3), f(s_4) in its range]
The function X : S → ℝ is called a random variable if the inverse image of every Borel set under X is an event. Thus, if X is a random variable, then
X^{−1}(B) = {s | X(s) ∈ B} ∈ F.
Notations:
• Random variables are represented by upper-case letters.
• Values of a random variable are denoted by lower-case letters.
• X(s) = x means that x is the value of the random variable X at the sample point s.
• Usually the argument s is omitted and we simply write X = x.
[Figure: the random variable X mapping the event A = X^{−1}(B) in S to the Borel set B on the real line]
Remark
• S is the domain of X.
• The range of X, denoted by R_X, is given by
  R_X = {X(s) | s ∈ S}.
  Clearly R_X ⊆ ℝ.
• The above definition of the random variable requires that the mapping X be such that X^{−1}(B) is a valid event in S. If S is a discrete sample space, this requirement is met by any mapping X : S → ℝ. Thus any mapping defined on a discrete sample space is a random variable.
Example 1: Consider the example of tossing a fair coin twice. The sample space is S = {HH, HT, TH, TT} and all four outcomes are equally likely. Then we can define a random variable X as follows: for example, X(HH) = 0, X(HT) = 1, X(TH) = 2, X(TT) = 3.
Example 2: Consider the sample space associated with the single toss of a fair die. The sample
space is given by S = {1, 2,3, 4,5,6}
If we define the random variable X that associates with each outcome the number shown on the face of the die, then
R_X = {1, 2, 3, 4, 5, 6}
The random variable X induces a probability measure P_X on the Borel sets, defined by P_X(B) = P(X^{−1}(B)) = P({s | X(s) ∈ B}). P_X satisfies the three axioms of probability:
Axiom 1: P_X(B) = P(X^{−1}(B)) ≥ 0
Axiom 2: P_X(ℝ) = P(X^{−1}(ℝ)) = P(S) = 1
Axiom 3: Suppose B_1, B_2, … are disjoint Borel sets. Then X^{−1}(B_1), X^{−1}(B_2), … are disjoint events in F. Therefore,
P_X(∪_{i=1}^{∞} B_i) = P(∪_{i=1}^{∞} X^{−1}(B_i))
                     = Σ_{i=1}^{∞} P(X^{−1}(B_i))
                     = Σ_{i=1}^{∞} P_X(B_i)
Thus the random variable X induces a probability space (ℝ, B, P_X).
Probability Distribution Function
We have seen that the events B and {s | X(s) ∈ B} are equivalent and P_X(B) = P({s | X(s) ∈ B}). The underlying sample space is omitted in notation and we simply write {X ∈ B} and P({X ∈ B}) instead of {s | X(s) ∈ B} and P({s | X(s) ∈ B}) respectively.
Consider the Borel set (−∞, x] where x represents any real number. The equivalent
event X −1 ((−∞, x]) = {s | X ( s) ≤ x, s ∈ S} is denoted as { X ≤ x}. The event { X ≤ x} can
be taken as a representative event in studying the probability description of a random
variable X . Any other event can be represented in terms of this event. For example,
{X > x} = {X ≤ x}^c,   {x_1 < X ≤ x_2} = {X ≤ x_2} \ {X ≤ x_1},
{X = x} = ∩_{n=1}^{∞} ({X ≤ x} \ {X ≤ x − 1/n})
and so on.
The probability distribution function (CDF) of the random variable X is defined as
F_X(x) = P({X ≤ x}),  −∞ < x < ∞.
Example 3 Consider the random variable X in Example 1.
We have

Value of the random variable X = x    P({X = x})
0                                      1/4
1                                      1/4
2                                      1/4
3                                      1/4
For x < 0,
F_X(x) = P({X ≤ x}) = 0
For 0 ≤ x < 1,
F_X(x) = P({X ≤ x}) = P({X = 0}) = 1/4
For 1 ≤ x < 2,
F_X(x) = P({X ≤ x}) = P({X = 0} ∪ {X = 1}) = P({X = 0}) + P({X = 1}) = 1/4 + 1/4 = 1/2
For 2 ≤ x < 3,
F_X(x) = P({X ≤ x}) = P({X = 0} ∪ {X = 1} ∪ {X = 2}) = P({X = 0}) + P({X = 1}) + P({X = 2}) = 1/4 + 1/4 + 1/4 = 3/4
For x ≥ 3,
F_X(x) = P({X ≤ x}) = P(S) = 1
• F_X(x) is a non-decreasing function of x: if x_1 < x_2, then
  {X(s) ≤ x_1} ⊆ {X(s) ≤ x_2}
  ⇒ P{X(s) ≤ x_1} ≤ P{X(s) ≤ x_2}
  ∴ F_X(x_1) ≤ F_X(x_2)
• F_X(x) is right continuous:
  F_X(x⁺) = lim_{h→0, h>0} F_X(x + h) = F_X(x)
  because lim_{h→0, h>0} P{X(s) ≤ x + h} = P{X(s) ≤ x} = F_X(x).
  (Recall that a real function f(x) is continuous at a point a if and only if f(a) is defined and lim_{x→a+} f(x) = lim_{x→a−} f(x) = f(a); f(x) is right-continuous at a if and only if f(a) is defined and lim_{x→a+} f(x) = f(a).)
• FX (−∞) = 0
Because, FX (−∞) = P{s | X ( s) ≤ −∞} = P(φ ) = 0
• FX (∞) = 1
Because, FX (∞) = P{s | X ( s) ≤ ∞} = P( S ) = 1
• P({x1 < X ≤ x2 }) = FX ( x2 ) − FX ( x1 )
We have
{ X ≤ x2 } = { X ≤ x1} ∪ {x1 < X ≤ x2 }
∴ P ({ X ≤ x2 }) = P({ X ≤ x1}) + P ({x1 < X ≤ x2 })
⇒ P ({x1 < X ≤ x2 }) = P ({ X ≤ x2 }) − P ({ X ≤ x1}) = FX ( x2 ) − FX ( x1 )
• F_X(x⁻) = F_X(x) − P(X = x)
  F_X(x⁻) = lim_{h→0, h>0} F_X(x − h)
          = lim_{h→0, h>0} P{X(s) ≤ x − h}
          = P{X(s) ≤ x} − P(X(s) = x)
          = F_X(x) − P(X = x)
We can further establish the following results on probability of events on the real line:
P{x1 ≤ X ≤ x2 } = FX ( x2 ) − FX ( x1 ) + P ( X = x1 )
P ({x1 ≤ X < x2 }) = FX ( x2 ) − FX ( x1 ) + P ( X = x1 ) − P ( X = x2 )
P ({ X > x}) = P ({x < X < ∞}) = 1 − FX ( x)
Thus we have seen that, given F_X(x), −∞ < x < ∞, we can determine the probability of any event involving values of the random variable X. Thus F_X(x), x ∈ ℝ, is a complete description of the random variable X.
Example: Suppose the random variable X has the distribution function
F_X(x) = 0,               x < −2
       = (1/8)x + 1/4,   −2 ≤ x < 0
       = 1,               x ≥ 0
Find a) P(X = 0)
b) P { X ≤ 0}
c) P { X > 2}
d) P {−1 < X ≤ 1}
Solution:
a) P(X = 0) = F_X(0⁺) − F_X(0⁻) = 1 − 1/4 = 3/4
b) P{X ≤ 0} = F_X(0) = 1
c) P{X > 2} = 1 − F_X(2) = 1 − 1 = 0
d) P{−1 < X ≤ 1} = F_X(1) − F_X(−1) = 1 − 1/8 = 7/8
[Figure: plot of F_X(x) versus x]
Conditional Distribution and Density function:
Clearly, the conditional probability can be defined on events involving a random variable
X.
Consider the event { X ≤ x} and any event B involving the random variable X. The
conditional distribution function of X given B is defined as
F_X(x / B) = P[{X ≤ x} / B]
           = P[{X ≤ x} ∩ B] / P(B),   P(B) ≠ 0
We can verify that F_X(x/B) satisfies all the properties of a distribution function. The conditional density function of X given B is
f_X(x/B) = d F_X(x/B) / dx
Suppose, in particular, B = {X ≤ b} with F_X(b) ≠ 0. Then
F_X(x/B) = P[{X ≤ x} ∩ B] / P(B)
         = P[{X ≤ x} ∩ {X ≤ b}] / P{X ≤ b}
         = P[{X ≤ x} ∩ {X ≤ b}] / F_X(b)
Case 1: x < b
F_X(x/B) = P[{X ≤ x} ∩ {X ≤ b}] / F_X(b) = P{X ≤ x} / F_X(b) = F_X(x) / F_X(b)
and f_X(x/B) = (d/dx) [F_X(x) / F_X(b)] = f_X(x) / F_X(b)
Case 2: x ≥ b
F_X(x/B) = P[{X ≤ x} ∩ {X ≤ b}] / F_X(b) = P{X ≤ b} / F_X(b) = 1
and f_X(x/B) = (d/dx) F_X(x/B) = 0
Suppose the sample space S is partitioned into non-overlapping events B_1, B_2, …, B_n, i.e. S = ∪_{i=1}^{n} B_i. Then, by the theorem of total probability,
F_X(x) = Σ_{i=1}^{n} P(B_i) F_X(x / B_i)
and for any event B_j of the partition,
P(B_j / {X ≤ x}) = P[B_j ∩ {X ≤ x}] / F_X(x)
                 = P[B_j ∩ {X ≤ x}] / Σ_{i=1}^{n} P(B_i) F_X(x / B_i)
Mixed-Type Random Variable:
Thus for a mixed-type random variable X, F_X(x) has discontinuities but is not of the staircase type as in the case of a discrete random variable. A typical plot of the distribution function of a mixed-type random variable is shown in the figure.
Writing S_D and S_C for the discrete and continuous parts,
P{X ≤ x} = P(S_D) P({X ≤ x} | S_D) + P(S_C) P({X ≤ x} | S_C)
         = p F_D(x) + (1 − p) F_C(x)
where
p = P(S_D) = Σ_{x ∈ X_D} p_X(x)
Example 1: Express the following distribution function in the form p F_D(x) + (1 − p) F_C(x):
F_X(x) = 0,                   x < 0
       = 1/4 + x/16,         0 ≤ x < 4
       = 3/4 + (x − 4)/16,   4 ≤ x ≤ 8
       = 1,                   x > 8
Solution:
The discrete part carries the jumps at x = 0 and x = 4, each of size 1/4, so
p = p_X(0) + p_X(4) = 1/4 + 1/4 = 1/2
and
F_D(x) = [p_X(0) u(x) + p_X(4) u(x − 4)] / p = (1/2) u(x) + (1/2) u(x − 4)
F_C(x) = 0,     x < 0
       = x/8,  0 ≤ x ≤ 8
       = 1,    x > 8
Example 2:
X is the random variable representing the lifetime of a device, with pdf f_X(x) for x > 0. Define
Y = X   if X ≤ a
  = a   if X > a
Then S_D = {a}, S_C = (0, a), and
p = P{Y ∈ S_D} = P{X > a} = 1 − F_X(a)
∴ F_Y(x) = p F_D(x) + (1 − p) F_C(x)
Discrete, Continuous and Mixed-type random variables
[Figure: plot of F_X(x) vs. x for a discrete random variable (staircase)]
[Figure: plot of F_X(x) vs. x for a mixed-type random variable]
A random variable is said to be discrete if the number of elements in the range of RX is finite or
countably infinite. Examples 1 and 2 are discrete random variables.
Assume R_X to be finite. Let x_1, x_2, x_3, …, x_N be the elements of R_X. Here the mapping X(s) partitions S into N subsets {s | X(s) = x_i}, i = 1, 2, …, N.
The discrete random variable in this case is completely specified by the probability mass function (pmf) p_X(x_i) = P({s | X(s) = x_i}), i = 1, 2, …, N.
Clearly,
• p_X(x_i) ≥ 0 for all x_i ∈ R_X
• Σ_{x_i ∈ R_X} p_X(x_i) = 1
• If D ⊆ R_X, then P({X ∈ D}) = Σ_{x_i ∈ D} p_X(x_i)
[Figure: the mapping from the sample points s_1, s_2, s_3, s_4 to the values X(s_1), X(s_2), X(s_3), X(s_4)]
Example Consider the random variable X with the distribution function
F_X(x) = 0,    x < 0
       = 1/4,  0 ≤ x < 1
       = 1/2,  1 ≤ x < 2
       = 1,    x ≥ 2
[Figure: staircase plot of F_X(x) with steps at x = 0, 1 and 2]
The corresponding probability mass function is

Value of the random variable X = x    p_X(x)
0                                      1/4
1                                      1/4
2                                      1/2
We shall describe some useful discrete probability mass functions in a later class.
Continuous random variables: A random variable X is called continuous if F_X(x) is a continuous function of x. Then for every x
p_X(x) = P({X = x}) = F_X(x) − F_X(x⁻) = 0
Therefore, the probability mass function of a continuous RV X is zero for all x. A continuous random variable cannot be characterized by a probability mass function. Instead, it has a very important characterization in terms of a function called the probability density function (pdf), defined as
f_X(x) = d F_X(x) / dx
Interpretation of f_X(x)
f_X(x) = d F_X(x)/dx
       = lim_{Δx→0} [F_X(x + Δx) − F_X(x)] / Δx
       = lim_{Δx→0} P({x < X ≤ x + Δx}) / Δx
so that
• f_X(x) ≥ 0
• F_X(x) = ∫_{−∞}^{x} f_X(u) du
• ∫_{−∞}^{∞} f_X(x) dx = 1
• P(x_1 < X ≤ x_2) = ∫_{x_1}^{x_2} f_X(x) dx
[Figure: P({x_0 < X ≤ x_0 + Δx_0}) ≈ f_X(x_0) Δx_0, the area of a thin strip under f_X(x)]
Example: Consider the random variable X with the distribution function
F_X(x) = 0,             x < 0
       = 1 − e^{−ax},  x ≥ 0, a > 0
Differentiating, the pdf is f_X(x) = a e^{−ax} for x ≥ 0 and 0 otherwise.
Remark: Using the Dirac delta function we can define a density function for discrete random variables.
Consider the random variable X defined by the probability mass function (pmf) p_X(x_i) = P({s | X(s) = x_i}), i = 1, 2, …, N.
The distribution function F_X(x) can be written as
F_X(x) = Σ_{i=1}^{N} p_X(x_i) u(x − x_i)
Then the density function f_X(x) can be written in terms of the Dirac delta function as
f_X(x) = Σ_{i=1}^{N} p_X(x_i) δ(x − x_i)
i =1
Example
Consider the discrete random variable of the preceding example. Its distribution function F_X(x) can be written as
F_X(x) = (1/4) u(x) + (1/4) u(x − 1) + (1/2) u(x − 2)
and
f_X(x) = (1/4) δ(x) + (1/4) δ(x − 1) + (1/2) δ(x − 2)
For a mixed-type random variable the density can similarly be written as
f_X(x) = p f_{Xd}(x) + (1 − p) f_{Xc}(x)
where
f_{Xd}(x) = Σ_{i=1}^{N} p_X(x_i) δ(x − x_i)
Example: Consider the random variable X with the distribution function
F_X(x) = 0,            x < 0
       = 0.1,          x = 0
       = 0.1 + 0.8x,  0 < x < 1
       = 1,            x ≥ 1
[Figure: plot of F_X(x) showing jumps of 0.1 at x = 0 and at x = 1]
F_X(x) can be expressed as
F_X(x) = 0.2 F_{Xd}(x) + 0.8 F_{Xc}(x)
where
F_{Xd}(x) = 0,    x < 0
          = 0.5,  0 ≤ x < 1
          = 1,    x ≥ 1
and
F_{Xc}(x) = 0,  x < 0
          = x,  0 ≤ x ≤ 1
          = 1,  x > 1
Correspondingly, f_X(x) = 0.2 f_{Xd}(x) + 0.8 f_{Xc}(x), where
f_{Xd}(x) = 0.5 δ(x) + 0.5 δ(x − 1)
and
f_{Xc}(x) = 1 for 0 ≤ x ≤ 1 and 0 elsewhere.
[Figure: plot of f_X(x) showing two impulses at x = 0 and x = 1 on top of a uniform density over (0, 1)]
Functions of Random Variables
Often we have to consider random variables which are functions of other random
variables. Let X be a random variable and g (.) is function \. Then Y = g ( X ) is a
random variable. We are interested to find the pdf of Y . For example, suppose
X represents the random voltage input to a full-wave rectifier. Then the rectifier output
Y is given by Y = X . We have to find the probability description of the random variable
Y . We consider the following cases:
= ∑ p X ( x)
{ x| g ( x ) = y}
f X ( x) f X ( x) ⎤
fY ( y ) = =
dy g ′( x) ⎥⎦ x = g −1 ( y )
dx
[Figure: the transformation Y = F_X(X), with x = F_X^{−1}(y)]
If, more generally, y = g(x) has a finite number of roots x_1, x_2, …, x_n, then (choosing the sign of each dx so that every contribution is a positive probability)
f_Y(y) = Σ_{i=1}^{n} f_X(x_i) / |dy/dx|_{x = x_i}
Example: Suppose Y = |X|, −a ≤ X ≤ a, a > 0.
[Figure: the function y = |x|]
For y ≥ 0, y = |x| has the two solutions x_1 = y and x_2 = −y, and |dy/dx| = 1 at each solution point.
∴ f_Y(y) = f_X(x)|_{x=y} + f_X(x)|_{x=−y} = f_X(y) + f_X(−y),  y ≥ 0
Example: Suppose Y = cX², c > 0.
y = cx² ⇒ x = ±√(y/c) for y ≥ 0
and dy/dx = 2cx, so that |dy/dx| = 2c√(y/c) = 2√(cy)
∴ f_Y(y) = [f_X(√(y/c)) + f_X(−√(y/c))] / (2√(cy)),  y > 0
         = 0 otherwise
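A quick check of this derived density by simulation (my own choice of X ~ N(0,1) and c = 1, not taken from the notes): the empirical probability of Y falling in a small interval should match the integral of f_Y over that interval.

```python
# Sketch: Monte Carlo check of f_Y for Y = c X^2 with X standard normal, c = 1.
import math, random

c = 1.0
def f_X(x):                        # standard normal pdf
    return math.exp(-x*x/2) / math.sqrt(2*math.pi)

def f_Y(y):                        # formula derived above
    if y <= 0:
        return 0.0
    r = math.sqrt(y / c)
    return (f_X(r) + f_X(-r)) / (2*math.sqrt(c*y))

samples = [c * random.gauss(0, 1)**2 for _ in range(200_000)]
emp = sum(0.5 < y <= 0.6 for y in samples) / len(samples)
print(emp, f_Y(0.55) * 0.1)        # both should be ≈ 0.041
```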
Expected Value of a Random Variable
The expected value (mean) of a random variable X with pdf f_X(x) is defined as
μ_X = EX = ∫_{−∞}^{∞} x f_X(x) dx
Note that, for a discrete RV X defined by the probability mass function (pmf) p_X(x_i), i = 1, 2, …, N, the pdf f_X(x) is given by
f_X(x) = Σ_{i=1}^{N} p_X(x_i) δ(x − x_i)
∴ μ_X = EX = ∫_{−∞}^{∞} x Σ_{i=1}^{N} p_X(x_i) δ(x − x_i) dx
           = Σ_{i=1}^{N} p_X(x_i) ∫_{−∞}^{∞} x δ(x − x_i) dx
           = Σ_{i=1}^{N} x_i p_X(x_i)
Thus, for a discrete random variable, μ_X = Σ_{i=1}^{N} x_i p_X(x_i).
Example 1 Suppose X is uniformly distributed over [a, b], so that f_X(x) = 1/(b − a) for a ≤ x ≤ b. Then
EX = ∫_{−∞}^{∞} x f_X(x) dx = ∫_{a}^{b} x (1/(b − a)) dx = (a + b)/2
Example 2 Consider the random variable X with pmf as tabulated below.

Value of the random variable x    0     1     2     3
p_X(x)                            1/8   1/8   1/4   1/2

∴ μ_X = Σ_{i=1}^{N} x_i p_X(x_i) = 0 × 1/8 + 1 × 1/8 + 2 × 1/4 + 3 × 1/2 = 17/8
∞
Remark If f X ( x) is an even function of x, then ∫ xf X ( x)dx = 0. Thus the mean of a RV
−∞
with an even symmetric pdf is 0.
Expected value of a function of a random variable
Suppose Y = g ( X ) is a function of a random variable X as discussed in the last class.
Then EY = E g(X) = ∫_{−∞}^{∞} g(x) f_X(x) dx
To see this, suppose first that g is one-to-one with inverse g^{−1}. Then
f_Y(y) = f_X(g^{−1}(y)) / g′(g^{−1}(y))
and
EY = ∫_{−∞}^{∞} y f_Y(y) dy = ∫_{y_1}^{y_2} y [f_X(g^{−1}(y)) / g′(g^{−1}(y))] dy
where y_1 = g(−∞) and y_2 = g(∞). Substituting x = g^{−1}(y), so that y = g(x) and dy = g′(x) dx, we get
EY = ∫_{−∞}^{∞} g(x) f_X(x) dx
The following important properties of the expectation operation can be immediately
derived:
(a) If c is a constant, Ec = c.
Clearly
Ec = ∫_{−∞}^{∞} c f_X(x) dx = c ∫_{−∞}^{∞} f_X(x) dx = c
Mean-square value
EX² = ∫_{−∞}^{∞} x² f_X(x) dx
Variance
For a random variable X with pdf f_X(x) and mean μ_X, the variance of X is denoted by σ_X² and defined as
σ_X² = E(X − μ_X)² = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx
For a discrete random variable,
σ_X² = Σ_{i=1}^{N} (x_i − μ_X)² p_X(x_i)
Example: For the uniform random variable of Example 1,
σ_X² = E(X − μ_X)²
     = ∫_{a}^{b} (x − (a + b)/2)² (1/(b − a)) dx
     = (1/(b − a)) [∫_{a}^{b} x² dx − 2 × ((a + b)/2) ∫_{a}^{b} x dx + ((a + b)/2)² ∫_{a}^{b} dx]
     = (b − a)²/12
Example: For the discrete random variable of Example 2,
σ_X² = (0 − 17/8)² × 1/8 + (1 − 17/8)² × 1/8 + (2 − 17/8)² × 1/4 + (3 − 17/8)² × 1/2
     = 71/64
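A short numerical sketch (my addition) computing the same mean and variance directly from the pmf of Example 2:

```python
# Sketch: mean and variance of the discrete RV of Example 2 from its pmf.
values = [0, 1, 2, 3]
probs  = [1/8, 1/8, 1/4, 1/2]

mean = sum(x * p for x, p in zip(values, probs))
var  = sum((x - mean)**2 * p for x, p in zip(values, probs))
print(mean, var)   # 2.125 (= 17/8) and 1.109375 (= 71/64)
```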
Remark
• Variance is a central moment and measure of dispersion of the random
variable about the mean.
• E ( X − µ X ) 2 is the average of the square deviation from the mean. It
gives information about the deviation of the values of the RV about the
mean. A smaller σ_X² implies that the random values are more clustered about the mean. Similarly, a bigger σ_X² means that the random values are more scattered.
For example, consider two random variables X_1 and X_2 with pmfs as shown below. Note that each of X_1 and X_2 has zero mean, while σ²_{X_1} = 1/2 and σ²_{X_2} = 5/3, implying that X_2 has more spread about the mean.

x            −1    0    1
p_{X_1}(x)   1/4   1/2  1/4

x            −2    −1    0    1    2
p_{X_2}(x)   1/6   1/6   1/3  1/6  1/6
The figure shows the pdfs of two continuous random variables with the same mean but different variances.
• We could have used the mean absolute deviation E|X − μ_X| for the same purpose, but it is more difficult to handle both analytically and numerically.
Properties of variance
(1) σ_X² = EX² − μ_X²
σ_X² = E(X − μ_X)²
     = E(X² − 2μ_X X + μ_X²)
     = EX² − 2μ_X EX + μ_X²
     = EX² − 2μ_X² + μ_X²
     = EX² − μ_X²
(3) If c is a constant,
var(c ) = 0.
nth moment of a random variable
We can define the nth moment and the nth central-moment of a random variable X
by the following relations
nth-order moment: EX^n = ∫_{−∞}^{∞} x^n f_X(x) dx,  n = 1, 2, …
nth-order central moment: E(X − μ_X)^n = ∫_{−∞}^{∞} (x − μ_X)^n f_X(x) dx,  n = 1, 2, …
Note that
• The mean µ X = EX is the first moment and the mean-square value EX 2
is the second moment
• The first central moment is 0 and the variance σ X2 = E ( X − µ X ) 2 is the
second central moment
• The third central moment measures the lack of symmetry of the pdf of a random variable. The quantity E(X − μ_X)³ / σ_X³ is called the coefficient of skewness; if the pdf is symmetric about the mean, this coefficient is zero.
Chebyshev Inequality
Suppose X is a parameter of a manufactured item with known mean μ_X and variance σ_X². The quality control department rejects the item if the absolute deviation of X from μ_X is greater than 2σ_X. What fraction of the manufactured items does the quality control department reject? Can you roughly guess it?
The standard deviation gives us an intuitive idea of how the random variable is distributed about the mean. This idea is made more precise by the remarkable Chebyshev inequality stated below. For a random variable X with mean μ_X and variance σ_X²,
P{|X − μ_X| ≥ ε} ≤ σ_X² / ε²
Proof:
σ_X² = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx
     ≥ ∫_{|x − μ_X| ≥ ε} (x − μ_X)² f_X(x) dx
     ≥ ∫_{|x − μ_X| ≥ ε} ε² f_X(x) dx
     = ε² P{|X − μ_X| ≥ ε}
∴ P{|X − μ_X| ≥ ε} ≤ σ_X² / ε²
(In the quality-control question above, taking ε = 2σ_X shows that at most 1/4 of the items are rejected.)
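The bound is often loose; a sketch comparing it with an exact tail probability (my own choice of an exponential random variable as the test case):

```python
# Sketch: Chebyshev bound vs. exact tail probability for X ~ Exp(1).
import math

lam = 1.0
mu, var = 1/lam, 1/lam**2
eps = 2 * math.sqrt(var)             # two standard deviations

bound = var / eps**2                 # Chebyshev bound: 0.25
exact = math.exp(-lam * (mu + eps))  # P(|X - mu| >= eps) = P(X >= 3) here
print(bound, exact)                  # 0.25 vs. ≈ 0.0498
```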
Markov Inequality
For a random variable X which takes only nonnegative values,
P{X ≥ a} ≤ E(X) / a,  where a > 0.
Proof:
E(X) = ∫_{0}^{∞} x f_X(x) dx
     ≥ ∫_{a}^{∞} x f_X(x) dx
     ≥ ∫_{a}^{∞} a f_X(x) dx
     = a P{X ≥ a}
∴ P{X ≥ a} ≤ E(X) / a
Remark: Applying the result to the nonnegative random variable (X − k)² gives P{(X − k)² ≥ a} ≤ E(X − k)² / a.
Example
A nonnegative random variable X has mean E(X) = 1. Find an upper bound on the probability P(X ≥ 3).
By Markov's inequality,
P(X ≥ 3) ≤ E(X)/3 = 1/3.
Hence the required upper bound is 1/3.
Characteristic Functions of Random Variables
Characteristic function
Consider a random variable X with probability density function f_X(x). The characteristic function of X, denoted by φ_X(ω), is defined as
φ_X(ω) = E e^{jωX} = ∫_{−∞}^{∞} e^{jωx} f_X(x) dx
where j = √(−1).
Note the following:
• φ_X(ω) always exists, since |e^{jωx}| = 1 and ∫_{−∞}^{∞} f_X(x) dx = 1. [Recall that the Fourier transform of a function f(t) exists if ∫_{−∞}^{∞} |f(t)| dt < ∞, i.e., f(t) is absolutely integrable.]
• f_X(x) can be recovered by the inverse transform
  f_X(x) = (1/2π) ∫_{−∞}^{∞} φ_X(ω) e^{−jωx} dω
Example Suppose X is uniformly distributed, with f_X(x) = 1/(b − a) for a ≤ x ≤ b and 0 otherwise. The characteristic function is given by
φ_X(ω) = ∫_{a}^{b} (1/(b − a)) e^{jωx} dx
       = (1/(b − a)) [e^{jωx}/(jω)]_{a}^{b}
       = (e^{jωb} − e^{jωa}) / (jω(b − a))
Example For the exponential random variable with pdf f_X(x) = λ e^{−λx}, x ≥ 0,
φ_X(ω) = ∫_{0}^{∞} λ e^{−λx} e^{jωx} dx = λ / (λ − jω)
Suppose X is a random variable taking values in the discrete set R_X = {x_1, x_2, …} with probability mass function p_X(x_i) for the value x_i. Then
φ_X(ω) = E e^{jωX} = Σ_{x_i ∈ R_X} p_X(x_i) e^{jωx_i}
Example For the binomial random variable with pmf p_X(k) = nC_k p^k (1 − p)^{n−k}, k = 0, 1, …, n,
φ_X(ω) = Σ_{k=0}^{n} nC_k p^k (1 − p)^{n−k} e^{jωk}
       = Σ_{k=0}^{n} nC_k (p e^{jω})^k (1 − p)^{n−k}
       = [p e^{jω} + (1 − p)]^n   (using the binomial theorem)
Example For the geometric random variable with pmf p_X(k) = p(1 − p)^k, k = 0, 1, …,
φ_X(ω) = Σ_{k=0}^{∞} e^{jωk} p(1 − p)^k
       = p Σ_{k=0}^{∞} [(1 − p) e^{jω}]^k
       = p / (1 − (1 − p) e^{jω})
Moments and the characteristic function
Given the characteristic function, the kth moment is obtained as
EX^k = (1/j^k) [d^k φ_X(ω)/dω^k]_{ω=0}
This follows from the expansion
φ_X(ω) = 1 + jω EX + (jω)² EX²/2! + … + (jω)^n EX^n/n! + …
Example For the exponential random variable, φ_X(ω) = λ/(λ − jω), so
EX = (1/j) [jλ/(λ − jω)²]_{ω=0} = 1/λ
EX² = (1/j²) [2j²λ/(λ − jω)³]_{ω=0} = 2/λ²
2 3
If the random variable under consideration takes non negative integer values only, it is convenient
to characterize the random variable in terms of the probability generating function G (z) defined
by
GX ( z ) = Ez X
∞
= ∑ pX ( k ) z
k =0
Note that
GX ( z ) is related to z-transform, in actual z-transform, z − k is used instead of z k .
The characteristic function of X is given by φ X (ω ) = GX ( e jω )
∞
GX (1) = ∑ p X ( k ) = 1
k =0
∞
GX ' ( z ) = ∑ kp X ( k )z k −1
k =0
∞
G '(1) = ∑ kp X ( k ) = EX
k =0
∞ ∞ ∞
GX ''( z ) = ∑ k (k − 1) p X ( k ) z k − 2 = ∑ k 2 px ( k ) z k − 2 − ∑ k px ( k ) z k − 2
k =0 k =0 k =0
∞ ∞
∴ GX ''(1) = ∑ k 2 p X ( k ) − ∑ kp X ( k ) = EX 2 − EX
k =0 k =0
Example For the binomial random variable, p_X(x) = nC_x p^x (1 − p)^{n−x}, so
G_X(z) = Σ_x nC_x p^x (1 − p)^{n−x} z^x = Σ_x nC_x (pz)^x (1 − p)^{n−x} = (1 − p + pz)^n
G_X′(1) = EX = np
∴ EX² = G_X″(1) + EX = n(n − 1)p² + np = (np)² + npq
∴ Var(X) = EX² − (EX)² = npq,  where q = 1 − p.
Example For the geometric random variable, p_X(x) = p(1 − p)^x, x = 0, 1, 2, …, so
G_X(z) = Σ_x p(1 − p)^x z^x = p Σ_x [(1 − p)z]^x = p / (1 − (1 − p)z)
G_X′(z) = p(1 − p) / (1 − (1 − p)z)²
G_X′(1) = p(1 − p)/(1 − 1 + p)² = p(1 − p)/p² = q/p = EX
G_X″(z) = 2p(1 − p)² / (1 − (1 − p)z)³
G_X″(1) = 2pq²/p³ = 2(q/p)²
EX² = G_X″(1) + EX = 2(q/p)² + q/p
Var(X) = EX² − (EX)² = 2(q/p)² + q/p − (q/p)² = (q/p)² + q/p = q/p²
Moment Generating Function:
Sometimes it is convenient to work with a function similar to the Laplace transform, known as the moment generating function, defined by
M_X(s) = E e^{sX} = ∫_{−∞}^{∞} f_X(x) e^{sx} dx
Then
M_X′(s) = ∫_{−∞}^{∞} x f_X(x) e^{sx} dx, so that M_X′(0) = EX
and in general
[d^k M_X(s)/ds^k]_{s=0} = ∫_{−∞}^{∞} x^k f_X(x) dx = EX^k
Example Let X be a continuous random variable with
f_X(x) = α / (π(x² + α²)),  −∞ < x < ∞, α > 0
Then
∫_{0}^{∞} x f_X(x) dx = (α/π) ∫_{0}^{∞} x/(x² + α²) dx = (α/2π) [ln(x² + α²)]_{0}^{∞} → ∞
Hence EX does not exist. This density is known as the Cauchy density function.
The joint characteristic function of two random variables X and Y is defined by
φ_{X,Y}(ω_1, ω_2) = E e^{jω_1 X + jω_2 Y} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) e^{jω_1 x + jω_2 y} dy dx
Similarly, the joint moment generating function is
φ_{X,Y}(s_1, s_2) = E e^{s_1 X + s_2 Y} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) e^{s_1 x + s_2 y} dx dy
If X and Y are independent, this factors as φ_{X,Y}(s_1, s_2) = φ_X(s_1) φ_Y(s_2). In particular, if Z = X + Y with X and Y independent, then
φ_Z(s) = φ_{X,Y}(s, s) = φ_X(s) φ_Y(s)
Using the convolution property of the Laplace transform we get
f_Z(z) = f_X(z) * f_Y(z)
Example For a Gaussian random variable X ~ N(μ_X, σ_X²),
M_X(s) = E e^{sX} = (1/(√(2π) σ_X)) ∫_{−∞}^{∞} e^{sx} e^{−(x − μ_X)²/(2σ_X²)} dx
Completing the square in the exponent,
−(x − μ_X)²/(2σ_X²) + sx = −[x − (μ_X + σ_X² s)]²/(2σ_X²) + μ_X s + σ_X² s²/2
so that
M_X(s) = e^{μ_X s + σ_X² s²/2} (1/(√(2π) σ_X)) ∫_{−∞}^{∞} e^{−[x − (μ_X + σ_X² s)]²/(2σ_X²)} dx = e^{μ_X s + σ_X² s²/2}
since the remaining integral is that of a Gaussian density and equals 1.
We have
φ_{X,Y}(s_1, s_2) = E e^{Xs_1 + Ys_2}
                  = E(1 + Xs_1 + Ys_2 + (Xs_1 + Ys_2)²/2! + …)
                  = 1 + s_1 EX + s_2 EY + s_1² EX²/2 + s_2² EY²/2 + s_1 s_2 EXY + …
Hence,
EX = [∂φ_{X,Y}(s_1, s_2)/∂s_1]_{s_1 = 0, s_2 = 0}
EY = [∂φ_{X,Y}(s_1, s_2)/∂s_2]_{s_1 = 0, s_2 = 0}
EXY = [∂²φ_{X,Y}(s_1, s_2)/∂s_1 ∂s_2]_{s_1 = 0, s_2 = 0}
We can generate the joint moments of the RVS from the moment generating function.
Important Discrete Random Variables
Discrete uniform random variable: Suppose X takes the values x_1, x_2, …, x_n with equal probability, so that
p_X(x_i) = 1/n,  i = 1, 2, …, n
Its CDF is
F_X(x) = (1/n) Σ_{i=1}^{n} u(x − x_i)
[Figure: the pmf p_X(x), n equal spikes of height 1/n at x_1, x_2, …, x_n]
Its mean and variance are
μ_X = EX = Σ_{i=1}^{n} x_i p_X(x_i) = (1/n) Σ_{i=1}^{n} x_i
EX² = Σ_{i=1}^{n} x_i² p_X(x_i) = (1/n) Σ_{i=1}^{n} x_i²
∴ σ_X² = EX² − μ_X² = (1/n) Σ_{i=1}^{n} x_i² − ((1/n) Σ_{i=1}^{n} x_i)²
Example: Suppose X is the random variable representing the outcome of a single roll of a
fair dice. Then X can assume any of the 6 values in the set {1, 2, 3, 4, 5, 6} with the
probability mass function
1
p X ( x) = x = 1, 2,3, 4,5, 6
6
Suppose X is a random variable that takes two values 0 and 1, with probability mass function
p_X(1) = P{X = 1} = p and p_X(0) = 1 − p,  0 ≤ p ≤ 1
Such a random variable X is called a Bernoulli random variable, because it describes the outcomes of a Bernoulli trial.
[Figure: the CDF F_X(x), with a jump of 1 − p at x = 0 and a jump of p at x = 1]
Remark We can define the pdf of X with the help of the delta function:
f_X(x) = (1 − p) δ(x) + p δ(x − 1)
μ_X = EX = Σ_{k=0}^{1} k p_X(k) = 1 × p + 0 × (1 − p) = p
EX² = Σ_{k=0}^{1} k² p_X(k) = 1 × p + 0 × (1 − p) = p
∴ σ_X² = EX² − μ_X² = p(1 − p)
Remark
• The Bernoulli RV is the simplest discrete RV. It can be used as the building block
for many discrete RVs.
• For the Bernoulli RV, EX^m = p for m = 1, 2, 3, … . Thus all the moments of the Bernoulli RV have the same value p.
Binomial random variable: Suppose X is a discrete random variable taking values from the set {0, 1, …, n}. X is called a binomial random variable with parameters n and p, 0 ≤ p ≤ 1, if
p_X(k) = nC_k p^k (1 − p)^{n−k},  k = 0, 1, …, n
where
nC_k = n! / (k!(n − k)!)
As we have seen, the probability of k successes in n independent repetitions of the
Bernoulli trial is given by the binomial law. If X is a discrete random variable
representing the number of successes in this case, then X is a binomial random variable.
For example, the number of heads in ‘n’ independent tossing of a fair coin is a binomial
random variable.
The probability mass function for a binomial random variable with n = 6 and p =0.8 is
shown in the figure below.
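Since the figure itself is not reproduced here, the following sketch generates the same pmf values (n = 6, p = 0.8 as stated above):

```python
# Sketch: binomial pmf values for n = 6, p = 0.8.
from math import comb

n, p = 6, 0.8
for k in range(n + 1):
    print(k, round(comb(n, k) * p**k * (1 - p)**(n - k), 4))
```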
Mean and variance of the binomial random variable
We have
EX = Σ_{k=0}^{n} k p_X(k)
   = Σ_{k=0}^{n} k nC_k p^k (1 − p)^{n−k}
   = 0 × (1 − p)^n + Σ_{k=1}^{n} k [n!/(k!(n − k)!)] p^k (1 − p)^{n−k}
   = Σ_{k=1}^{n} [n!/((k − 1)!(n − k)!)] p^k (1 − p)^{n−k}
   = np Σ_{k=1}^{n} [(n − 1)!/((k − 1)!(n − k)!)] p^{k−1} (1 − p)^{n−1−(k−1)}
   = np Σ_{k_1=0}^{n−1} [(n − 1)!/(k_1!(n − 1 − k_1)!)] p^{k_1} (1 − p)^{n−1−k_1}   (substituting k_1 = k − 1)
   = np (p + 1 − p)^{n−1}
   = np
Similarly
EX² = Σ_{k=0}^{n} k² p_X(k)
    = Σ_{k=0}^{n} k² nC_k p^k (1 − p)^{n−k}
    = Σ_{k=1}^{n} k [n!/((k − 1)!(n − k)!)] p^k (1 − p)^{n−k}
    = np Σ_{k=1}^{n} (k − 1 + 1) [(n − 1)!/((k − 1)!(n − k)!)] p^{k−1} (1 − p)^{n−1−(k−1)}
    = np [Σ_{k=1}^{n} (k − 1) [(n − 1)!/((k − 1)!(n − k)!)] p^{k−1} (1 − p)^{n−1−(k−1)} + Σ_{k=1}^{n} [(n − 1)!/((k − 1)!(n − k)!)] p^{k−1} (1 − p)^{n−1−(k−1)}]
    = np × [(n − 1)p + 1]   (the first sum is the mean of a B(n − 1, p) random variable, the second sum equals 1)
    = n(n − 1)p² + np
∴ σ_X² = variance of X = n(n − 1)p² + np − n²p² = np(1 − p)
Geometric random variable:
Suppose independent Bernoulli trials, each with probability of success p, are performed until the first success occurs, and let X be the number of trials required. Then
p_X(k) = (1 − p)^{k−1} p,  k = 1, 2, …
• R_X is countably infinite, because we may have to wait arbitrarily long before the first success occurs.
• The geometric random variable X with parameter p is denoted by X ~ geo(p).
• The CDF of X ~ geo(p) is given by
  F_X(k) = Σ_{i=1}^{k} (1 − p)^{i−1} p = 1 − (1 − p)^k
  which gives the probability that the first 'success' will occur before the (k + 1)th trial.
The figure shows the pmf of X ~ geo(p) for p = 0.25 and p = 0.5 respectively. Observe that the plots have a mode at k = 1.
Example:
Suppose X is the random variable representing the number of independent tosses of a coin needed before a head shows up. Clearly X is a geometric random variable.
Example: A fair die is rolled repeatedly. What is the probability that a 6 will be shown before the fourth roll?
Suppose X is the random variable representing the number of independent rolls of the die before a '6' shows up. Clearly X is a geometric random variable with p = 1/6.
P(a '6' will be shown before the 4th roll) = P(X = 1 or X = 2 or X = 3)
 = p_X(1) + p_X(2) + p_X(3)
 = p + p(1 − p) + p(1 − p)²
 = p(3 − 3p + p²)
 = 91/216
Mean and variance of the geometric random variable
EX = Σ_{k=1}^{∞} k p(1 − p)^{k−1}
   = −p d/dp [Σ_{k=0}^{∞} (1 − p)^k]   (differentiating the geometric series term by term)
   = −p d/dp [1/p]
   = p/p²
   = 1/p
EX² = Σ_{k=1}^{∞} k² p(1 − p)^{k−1}
    = p Σ_{k=1}^{∞} [k(k − 1) + k] (1 − p)^{k−1}
    = p(1 − p) Σ_{k=1}^{∞} k(k − 1)(1 − p)^{k−2} + p Σ_{k=1}^{∞} k(1 − p)^{k−1}
    = p(1 − p) d²/dp² [Σ_{k=0}^{∞} (1 − p)^k] + 1/p
    = p(1 − p) × 2/p³ + 1/p
    = 2(1 − p)/p² + 1/p
∴ σ_X² = 2(1 − p)/p² + 1/p − 1/p² = (1 − p)/p²
Mean μ_X = 1/p,  Variance σ_X² = (1 − p)/p²
Poisson Random Variable:
A discrete random variable X is called a Poisson random variable with parameter λ > 0 if
p_X(k) = e^{−λ} λ^k / k!,  k = 0, 1, 2, …
Remark
• p_X(k) = e^{−λ} λ^k / k! is a valid pmf, because
  Σ_{k=0}^{∞} p_X(k) = Σ_{k=0}^{∞} e^{−λ} λ^k / k! = e^{−λ} Σ_{k=0}^{∞} λ^k / k! = e^{−λ} e^{λ} = 1
• The mean is μ_X = EX = λ, which can be shown in the same way as the second-moment computation below.
EX² = Σ_{k=0}^{∞} k² p_X(k)
    = 0 + Σ_{k=1}^{∞} k² e^{−λ} λ^k / k!
    = e^{−λ} Σ_{k=1}^{∞} k λ^k / (k − 1)!
    = e^{−λ} Σ_{k=1}^{∞} (k − 1 + 1) λ^k / (k − 1)!
    = e^{−λ} Σ_{k=2}^{∞} λ^k / (k − 2)! + e^{−λ} Σ_{k=1}^{∞} λ^k / (k − 1)!
    = e^{−λ} λ² e^{λ} + e^{−λ} λ e^{λ}
    = λ² + λ
∴ σ_X² = EX² − μ_X² = λ
Example: The number of calls received in a telephone exchange follows a Poisson distribution with an average of 10 calls per minute. What is the probability that in a one-minute duration
(i) no call is received
(ii) exactly 5 calls are received
(iii) more than 3 calls are received?
Let X be the random variable representing the number of calls received. Given p_X(k) = e^{−λ} λ^k / k! with λ = 10. Therefore,
(i) probability that no call is received = p_X(0) = e^{−10} ≈ 4.54 × 10^{−5}
(ii) probability that exactly 5 calls are received = p_X(5) = e^{−10} × 10^5 / 5! ≈ 0.0378
(iii) probability that more than 3 calls are received
 = 1 − Σ_{k=0}^{3} p_X(k) = 1 − e^{−10}(1 + 10 + 10²/2! + 10³/3!) ≈ 0.9897
The Poisson distribution is used to model many practical problems. It is used in many
counting applications to count events that take place independently of one another. Thus
it is used to model the count during a particular length of time of:
• customers arriving at a service station
• telephone calls coming to a telephone exchange
• packets arriving at a particular server
• particles decaying from a radioactive specimen
Poisson approximation of the binomial random variable
The Poisson distribution is also used to approximate the binomial distribution B (n, p )
when n is very large and p is small.
Note that lim_{n→∞} (1 − λ/n)^n = e^{−λ}, where λ = np. Therefore,
p_X(k) = lim_{n→∞} [(1 − 1/n)(1 − 2/n) … (1 − (k − 1)/n) λ^k (1 − λ/n)^n] / [k! (1 − λ/n)^k] = e^{−λ} λ^k / k!
Thus the Poisson approximation can be used to compute binomial probabilities for large n. It also makes the analysis of such probabilities easier. Typical examples are:
• number of bit errors in a received binary data file
• number of typographical errors in a printed page
Example
Suppose there is an error probability of 0.01 per word in typing. What is the probability that there will be more than one error in a page of 120 words?
Suppose X is the RV representing the number of errors per page of 120 words. Then X ~ B(120, p) with p = 0.01, which can be approximated by a Poisson distribution with λ = np = 1.2.
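The notes stop before the numerical answer; a sketch comparing the exact binomial value with the Poisson approximation:

```python
# Sketch: P(more than 1 error in 120 words), exact vs. Poisson with λ = 1.2.
from math import comb, exp

n, p = 120, 0.01
exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))
lam = n * p
approx = 1 - exp(-lam) * (1 + lam)
print(exact, approx)   # ≈ 0.338 and ≈ 0.337
```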
Uniform random variable: A continuous random variable X is called uniformly distributed over the interval [a, b] if its pdf is
f_X(x) = 1/(b − a),  a ≤ x ≤ b
       = 0 otherwise
We use the notation X ~ U(a, b) to denote a random variable X uniformly distributed over the interval [a, b]. Also note that
∫_{−∞}^{∞} f_X(x) dx = ∫_{a}^{b} (1/(b − a)) dx = 1.
Distribution function F_X(x)
For x < a,  F_X(x) = 0
For a ≤ x ≤ b,
F_X(x) = ∫_{−∞}^{x} f_X(u) du = ∫_{a}^{x} du/(b − a) = (x − a)/(b − a)
For x > b,  F_X(x) = 1
The characteristic function, as computed earlier, is φ_X(ω) = (e^{jωb} − e^{jωa}) / (jω(b − a)).
Example:
Suppose a random noise voltage X across an electronic circuit is uniformly distributed between −4 V and 5 V. What is the probability that the noise voltage will lie between 2 V and 3 V? What is the variance of the voltage?
P(2 < X ≤ 3) = ∫_{2}^{3} dx / (5 − (−4)) = 1/9
σ_X² = (5 + 4)²/12 = 27/4
Remark
• The uniform distribution is the simplest continuous distribution.
• It is used, for example, to model quantization errors. If a signal is discretized into steps of Δ, then the quantization error is uniformly distributed between −Δ/2 and Δ/2.
• The unknown phase of a sinusoid is assumed to be uniformly distributed over [0, 2π] in many applications. For example, in studying the noise performance of a communication receiver, the carrier signal is modeled as
  X(t) = A cos(ω_c t + Φ)
  where Φ ~ U(0, 2π).
• A random variable of arbitrary distribution can be generated with the help of a routine that generates uniformly distributed random numbers. This follows from the fact that the distribution function evaluated at the random variable is uniformly distributed over [0, 1]: if X is a continuous random variable, then F_X(X) ~ U(0, 1). (See the section on simulation of random variables below.)
Normal or Gaussian random variable: The normal distribution is the most important distribution used to model natural and man-made phenomena. In particular, when a random variable is the sum of a large number of random variables, it can be modeled as a normal random variable.
A continuous random variable X is called a normal or Gaussian random variable with parameters μ_X and σ_X², written X ~ N(μ_X, σ_X²), if its probability density function is given by
f_X(x) = (1/(√(2π) σ_X)) e^{−(1/2)((x − μ_X)/σ_X)²},  −∞ < x < ∞
If μ_X = 0 and σ_X² = 1,
f_X(x) = (1/√(2π)) e^{−x²/2}
and X is called a standard normal random variable.
The distribution function is
F_X(x) = P(X ≤ x) = (1/(√(2π) σ_X)) ∫_{−∞}^{x} e^{−(1/2)((t − μ_X)/σ_X)²} dt
Substituting u = (t − μ_X)/σ_X, we get
F_X(x) = (1/√(2π)) ∫_{−∞}^{(x − μ_X)/σ_X} e^{−u²/2} du = Φ((x − μ_X)/σ_X)
where Φ(·) is the distribution function of the standard normal variable. Thus F_X(x) can be computed from tabulated values of Φ(x). The table of Φ(x) was very useful in the pre-computer days.
The complementary function, called the Q function, is
Q(x) = 1 − Φ(x) = (1/√(2π)) ∫_{x}^{∞} e^{−u²/2} du
Note that Q(0) = 1/2 and Q(−x) = 1 − Q(x).
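In place of the printed tables, Φ(x) and Q(x) can be evaluated in terms of the complementary error function; a minimal sketch:

```python
# Sketch: Φ(x) and Q(x) via the complementary error function erfc.
from math import erfc, sqrt

def Phi(x):
    return 0.5 * erfc(-x / sqrt(2))

def Q(x):
    return 0.5 * erfc(x / sqrt(2))

print(Phi(0), Q(0))        # 0.5 0.5
print(Q(1.0))              # ≈ 0.1587
print(Q(-1.0), 1 - Q(1.0)) # both ≈ 0.8413
```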
If X is N(μ_X, σ_X²) distributed, then
EX = μ_X and var(X) = σ_X²
Proof:
EX = ∫_{−∞}^{∞} x f_X(x) dx = (1/(√(2π) σ_X)) ∫_{−∞}^{∞} x e^{−(1/2)((x − μ_X)/σ_X)²} dx
Substituting u = (x − μ_X)/σ_X, so that x = uσ_X + μ_X,
   = (1/√(2π)) ∫_{−∞}^{∞} (uσ_X + μ_X) e^{−u²/2} du
   = (σ_X/√(2π)) ∫_{−∞}^{∞} u e^{−u²/2} du + (μ_X/√(2π)) ∫_{−∞}^{∞} e^{−u²/2} du
   = 0 + μ_X (1/√(2π)) ∫_{−∞}^{∞} e^{−u²/2} du
   = μ_X
since the first integrand is odd and the last integral equals √(2π), as shown below.
Evaluation of ∫_{−∞}^{∞} e^{−x²/2} dx
Suppose I = ∫_{−∞}^{∞} e^{−x²/2} dx. Then
I² = (∫_{−∞}^{∞} e^{−x²/2} dx)(∫_{−∞}^{∞} e^{−y²/2} dy)
   = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x² + y²)/2} dy dx
   = ∫_{0}^{2π} ∫_{0}^{∞} e^{−r²/2} r dr dθ   (changing to polar coordinates)
   = 2π ∫_{0}^{∞} e^{−r²/2} r dr
   = 2π ∫_{0}^{∞} e^{−s} ds   (substituting s = r²/2)
   = 2π
∴ I = √(2π)
Var(X) = E(X − μ_X)²
       = (1/(√(2π) σ_X)) ∫_{−∞}^{∞} (x − μ_X)² e^{−(1/2)((x − μ_X)/σ_X)²} dx
       = (σ_X²/√(2π)) ∫_{−∞}^{∞} u² e^{−u²/2} du   (substituting u = (x − μ_X)/σ_X)
       = 2 (σ_X²/√(2π)) ∫_{0}^{∞} u² e^{−u²/2} du
       = 2 (σ_X²/√(2π)) ∫_{0}^{∞} √(2t) e^{−t} dt   (putting t = u²/2 so that dt = u du)
       = (2σ_X²/√π) ∫_{0}^{∞} t^{1/2} e^{−t} dt
       = (2σ_X²/√π) Γ(3/2)
       = (2σ_X²/√π) (1/2) √π
       = σ_X²
Note the definition and properties of the gamma function:
Γ(n) = ∫_{0}^{∞} t^{n−1} e^{−t} dt,   Γ(n) = (n − 1) Γ(n − 1),   Γ(1/2) = √π
Exponential Random Variable
A continuous random variable X is called exponentially distributed with the parameter λ > 0 if
the probability density function is of the form
f_X(x) = λ e^{−λx},  x ≥ 0
       = 0 otherwise
The corresponding probability distribution function is
F_X(x) = ∫_{−∞}^{x} f_X(u) du
       = 0,            x < 0
       = 1 − e^{−λx},  x ≥ 0
We have
μ_X = EX = ∫_{0}^{∞} x λ e^{−λx} dx = 1/λ
σ_X² = E(X − μ_X)² = ∫_{0}^{∞} (x − 1/λ)² λ e^{−λx} dx = 1/λ²
The following figure shows the pdf of an exponential RV.
• The time between two consecutive occurrences of independent events can be modeled by the exponential RV. For example, the exponential distribution gives the probability density function of the time between two successive counts of a Poisson process.
• It is used to model the service time in a queueing system.
• In reliability studies, the expected lifetime of a part, the average time between successive failures of a system, etc., are determined using the exponential distribution.
Laplace Distribution
A continuous random variable X is called Laplace distributed with parameter λ > 0 if its probability density function is of the form
f_X(x) = (λ/2) e^{−λ|x|},  −∞ < x < ∞
We have
μ_X = EX = ∫_{−∞}^{∞} x (λ/2) e^{−λ|x|} dx = 0
σ_X² = E(X − μ_X)² = ∫_{−∞}^{∞} x² (λ/2) e^{−λ|x|} dx = 2/λ²
Chi-square random variable
A random variable is called a chi-square random variable with n degrees of freedom if its pdf is given by
f_X(x) = x^{n/2−1} e^{−x/2σ²} / (2^{n/2} σ^n Γ(n/2)),  x > 0
       = 0,  x ≤ 0
with the parameter σ > 0 and Γ(·) denoting the gamma function. A chi-square random variable with n degrees of freedom is denoted by χ_n².
The pdfs of χ_n² RVs with different degrees of freedom are shown in the figure below.
Mean and variance of the chi-square random variable
μ_X = ∫_{−∞}^{∞} x f_X(x) dx
    = ∫_{0}^{∞} x [x^{n/2−1} / (2^{n/2} σ^n Γ(n/2))] e^{−x/2σ²} dx
    = ∫_{0}^{∞} [x^{n/2} / (2^{n/2} σ^n Γ(n/2))] e^{−x/2σ²} dx
    = ∫_{0}^{∞} [(2σ²)^{n/2} u^{n/2} / (2^{n/2} σ^n Γ(n/2))] e^{−u} (2σ²) du   (substituting u = x/2σ²)
    = 2σ² Γ((n + 2)/2) / Γ(n/2)
    = 2σ² (n/2) Γ(n/2) / Γ(n/2)
    = nσ²
Similarly,
EX² = ∫_{−∞}^{∞} x² f_X(x) dx
    = ∫_{0}^{∞} x² [x^{n/2−1} / (2^{n/2} σ^n Γ(n/2))] e^{−x/2σ²} dx
    = ∫_{0}^{∞} [x^{(n+2)/2} / (2^{n/2} σ^n Γ(n/2))] e^{−x/2σ²} dx
    = ∫_{0}^{∞} [(2σ²)^{(n+2)/2} u^{(n+2)/2} / (2^{n/2} σ^n Γ(n/2))] e^{−u} (2σ²) du   (substituting u = x/2σ²)
    = 4σ⁴ Γ((n + 4)/2) / Γ(n/2)
    = 4σ⁴ [(n + 2)/2][n/2] Γ(n/2) / Γ(n/2)
    = n(n + 2)σ⁴
∴ σ_X² = EX² − μ_X² = n(n + 2)σ⁴ − n²σ⁴ = 2nσ⁴
Rayleigh random variable
A continuous random variable X is called Rayleigh distributed with parameter σ > 0 if its pdf is
f_X(x) = (x/σ²) e^{−x²/2σ²},  x ≥ 0, and 0 otherwise.
Mean:
EX = ∫_{0}^{∞} x (x/σ²) e^{−x²/2σ²} dx
   = (√(2π)/σ) ∫_{0}^{∞} [x²/(√(2π) σ)] e^{−x²/2σ²} dx
   = (√(2π)/σ) (σ²/2)
   = √(π/2) σ
Similarly,
EX² = ∫_{−∞}^{∞} x² f_X(x) dx
    = ∫_{0}^{∞} x² (x/σ²) e^{−x²/2σ²} dx
    = 2σ² ∫_{0}^{∞} u e^{−u} du   (substituting u = x²/2σ²)
    = 2σ²   (noting that ∫_{0}^{∞} u e^{−u} du is the mean of the exponential RV with λ = 1)
∴ σ_X² = 2σ² − (√(π/2) σ)² = (2 − π/2) σ²
If X_1 ~ N(0, σ²) and X_2 ~ N(0, σ²) are independent, then the envelope X = √(X_1² + X_2²) has the Rayleigh distribution.
Simulation of Random Variables
• In many fields of science and engineering, computer simulation is used to study
random phenomena in nature and the performance of an engineering system in a
noisy environment. For example, we may study through computer simulation the
performance of a communication receiver. Sometimes a probability model may
not be analytically tractable and computer simulation is used to calculate
probabilities.
• The heart of all these applications is that it is possible to simulate a random variable with an empirical CDF or pdf that fits well with the theoretical CDF or pdf.
There are several algorithms to generate U [0 1] random numbers. Note that these
algorithms generate random number by a reproducible deterministic method. These
numbers are pseudo random numbers because they are reproducible and the same
sequence of numbers repeats after some period of count specific to the generating
algorithm. This period is very high and a finite sample of data within the period appears
to be uniformly distributed. We will not discuss about these algorithms. Software
packages provide routines to generate such numbers.
Example Suppose we want to generate a random variable with the pdf given by
f_X(x) = (2/9)x for 0 ≤ x ≤ 3, and 0 otherwise.
The CDF of X is given by
F_X(x) = 0 for x < 0,  = x²/9 for 0 ≤ x ≤ 3,  = 1 for x > 3
(The CDF is sketched in the figure: F_X(x) = x²/9 rising from 0 at x = 0 to 1 at x = 3.)
Therefore, we generate a random number y from the U[0, 1] distribution and set F_X(x) = y. We have
x²/9 = y  ⇒  x = √(9y) = 3√y
Similarly, for the exponential random variable we set F_X(x) = 1 − e^(−λx) = y, which gives
x = −log_e(1 − y)/λ
Since 1 − y is also uniformly distributed over [0, 1], the above expression can be simplified to
x = −log_e(y)/λ
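A minimal sketch of the inverse-transform method for the two examples above (lam is an assumed illustrative rate):

```python
import numpy as np

# Sketch: inverse-transform sampling. For F_X(x) = x^2/9 on [0,3] the inverse is
# x = 3*sqrt(y); for the exponential it is x = -ln(y)/lambda.
rng = np.random.default_rng(2)
y = rng.uniform(0.0, 1.0, size=100_000)

x_tri = 3.0 * np.sqrt(y)          # samples with pdf (2/9)x on [0, 3]
lam = 1.5
x_exp = -np.log(y) / lam          # samples with pdf lam*exp(-lam*x), x >= 0

print("mean of x_tri:", x_tri.mean(), " theory:", 2.0)      # E[X] = int_0^3 x*(2/9)x dx = 2
print("mean of x_exp:", x_exp.mean(), " theory:", 1 / lam)
```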
Generation of Gaussian random numbers
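The notes do not spell out the construction at this point; one standard way (a sketch under that assumption, not necessarily the method intended by the author) is the Box–Muller transformation, which converts two independent U[0, 1] numbers into two independent N(0, 1) numbers:

```python
import numpy as np

# Sketch: Box-Muller transformation of uniform random numbers into Gaussian ones.
rng = np.random.default_rng(3)
u1 = rng.uniform(size=100_000)
u2 = rng.uniform(size=100_000)

r = np.sqrt(-2.0 * np.log(u1))                   # Rayleigh-distributed radius
theta = 2.0 * np.pi * u2                         # uniform phase
z1, z2 = r * np.cos(theta), r * np.sin(theta)    # two independent N(0,1) samples

print("mean:", z1.mean(), " variance:", z1.var())   # close to 0 and 1
```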
The inverse-transform idea also applies to discrete random variables. Suppose X is a discrete random variable with probability mass function p_X(x_i), i = 1, 2, …, n. Given y = F_X(x), the inverse mapping x_k = F_X^(−1)(y) is defined as shown in the figure below, and the simulation steps are the same: generate y from U[0, 1] and output the value x_k for which F_X(x_{k−1}) < y ≤ F_X(x_k). For example, for a Bernoulli(p) random variable we set
x = 0 for y ≤ 1 − p, and x = 1 otherwise.
Jointly distributed random variables
We may define two or more random variables on the same sample space. Let X and
Y be two real random variables defined on the same probability space ( S , F, P). The
mapping S → ℝ², defined such that for every s ∈ S, (X(s), Y(s)) ∈ ℝ², is called a joint random variable.
Example1: Suppose we are interested in studying the height and weight of the students
in a class. We can define the joint RV ( X , Y ) where X represents height and
Y represents the weight.
The joint cumulative distribution function of X and Y is defined as F_{X,Y}(x, y) = P{X ≤ x, Y ≤ y}. It has the following properties:
• FX ,Y (−∞, y) = FX ,Y ( x, −∞) = 0
Note that
{ X ≤ −∞, Y ≤ y} ⊆ { X ≤ −∞}
• FX ,Y (∞, ∞) = 1.
• FX ,Y ( x, y ) is right continuous in both the variables.
• If x₁ < x₂ and y₁ < y₂, then
P{x₁ < X ≤ x₂, y₁ < Y ≤ y₂} = F_{X,Y}(x₂, y₂) − F_{X,Y}(x₁, y₂) − F_{X,Y}(x₂, y₁) + F_{X,Y}(x₁, y₁) ≥ 0.
Given FX ,Y ( x, y ), -∞ < x < ∞, -∞ < y < ∞, we have a complete description of
the random variables X and Y .
• FX ( x) = FXY ( x,+∞).
To prove this
( X ≤ x ) = ( X ≤ x ) ∩ ( Y ≤ +∞ )
∴ F X ( x ) = P (X ≤ x ) = P (X ≤ x , Y ≤ ∞ )= F XY ( x , +∞ )
Example
Consider two jointly distributed random variables X and Y with the joint CDF
⎧(1 − e −2 x )(1 − e − y ) x ≥ 0, y ≥ 0
FX ,Y ( x, y ) = ⎨
⎩0 otherwise
⎧1 − e−2 x x ≥ 0
FX ( x) = lim FX ,Y ( x, y ) = ⎨
y →∞
⎩0 elsewhere
(a)
⎧1 − e − y y ≥ 0
FY ( y ) = lim FX ,Y ( x, y ) = ⎨
x →∞
⎩0 elsewhere
(b)
P{1 < X ≤ 2, 1 < Y ≤ 2} = F_{X,Y}(2, 2) + F_{X,Y}(1, 1) − F_{X,Y}(1, 2) − F_{X,Y}(2, 1)
= (1 − e −4 )(1 − e −2 ) + (1 − e −2 )(1 − e −1 ) − (1 − e−2 )(1 − e−2 ) − (1 − e−4 )(1 − e −1 )
=0.0272
Joint Probability Mass Function: For discrete random variables X and Y with ranges R_X and R_Y, the joint probability mass function is p_{X,Y}(x, y) = P{X = x, Y = y}.
• Σ_{(x,y)∈R_X×R_Y} p_{X,Y}(x, y) = 1
This is because
Σ_{(x,y)∈R_X×R_Y} p_{X,Y}(x, y) = P( ∪_{(x,y)∈R_X×R_Y} {X = x, Y = y} ) = P{s | (X(s), Y(s)) ∈ R_X × R_Y} = P(S) = 1
• Marginal Probability Mass Functions: The probability mass functions p_X(x) and p_Y(y) are obtained from the joint probability mass function as follows:
p_X(x) = P{X = x} = P( ∪_{y∈R_Y} {X = x, Y = y} ) = Σ_{y∈R_Y} p_{X,Y}(x, y)
and similarly
p_Y(y) = Σ_{x∈R_X} p_{X,Y}(x, y)
Example Consider two discrete random variables X and Y with the joint probability mass function tabulated below; the last row and the last column give the marginal probability mass functions.

          X = 0   X = 1   X = 2   p_Y(y)
 Y = 0    0.25    0.10    0.15    0.5
 Y = 1    0.14    0.35    0.01    0.5
 p_X(x)   0.39    0.45    0.16    1
Joint Probability Density Function
If X and Y are two continuous random variables and their joint distribution function
is continuous in both x and y, then we can define joint probability density function
f X ,Y ( x, y ) by
∂2
f X ,Y ( x, y ) = FX ,Y ( x, y ), provided it exists.
∂x∂y
x y
Clearly FX ,Y ( x, y ) = ∫ ∫ f X ,Y (u , v)dudv
−∞ −∞
Properties of Joint Probability Density Function
• f_{X,Y}(x, y) ≥ 0 for all (x, y) ∈ ℝ²
• ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1
Example For the jointly distributed random variables with the joint CDF of the earlier example,
f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x∂y = ∂²[(1 − e^(−2x))(1 − e^(−y))]/∂x∂y = 2e^(−2x)e^(−y),   x ≥ 0, y ≥ 0
Example: The joint pdf of two random variables X and Y are given by
f X ,Y ( x, y ) = cxy 0 ≤ x ≤ 2, 0 ≤ y ≤ 2
= 0 otherwise
(i) Find c.
(ii) Find F_{X,Y}(x, y).
(iii) Find f X ( x) and fY ( y ).
(iv) What is the probability P (0 < X ≤ 1, 0 < Y ≤ 1) ?
∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dy dx = c ∫_0^2 ∫_0^2 xy dy dx = 4c = 1
∴ c = 1/4
F_{X,Y}(x, y) = (1/4) ∫_0^y ∫_0^x uv du dv = x²y²/16,   0 ≤ x ≤ 2, 0 ≤ y ≤ 2
f_X(x) = ∫_0^2 (xy/4) dy = x/2,   0 ≤ x ≤ 2
Similarly,
f_Y(y) = y/2,   0 ≤ y ≤ 2
P(0 < X ≤ 1, 0 < Y ≤ 1) = F_{X,Y}(1, 1) + F_{X,Y}(0, 0) − F_{X,Y}(0, 1) − F_{X,Y}(1, 0) = 1/16 + 0 − 0 − 0 = 1/16
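A minimal numerical check of this example by Riemann-sum integration on a grid:

```python
import numpy as np

# Sketch: verify that f_{X,Y}(x,y) = xy/4 on [0,2]x[0,2] integrates to 1, that
# P(0<X<=1, 0<Y<=1) = 1/16 and that f_X(1) = 1/2, using a simple Riemann sum.
x = np.linspace(0.0, 2.0, 401)
y = np.linspace(0.0, 2.0, 401)
dx = dy = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing="ij")
f = X * Y / 4.0

print("total mass        :", f.sum() * dx * dy)                        # ~ 1
print("P(0<X<=1, 0<Y<=1) :", f[(X <= 1) & (Y <= 1)].sum() * dx * dy)   # ~ 1/16
print("f_X at x = 1      :", f[x.searchsorted(1.0), :].sum() * dy)     # ~ 1/2
```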
Conditional probability mass functions
The conditional probability mass function of Y given X = x is defined as
p_{Y/X}(y/x) = P({Y = y}/{X = x}) = P({X = x} ∩ {Y = y})/P{X = x} = p_{X,Y}(x, y)/p_X(x),   provided p_X(x) ≠ 0
Similarly, we can define the conditional probability mass function p_{X/Y}(x/y).
• From the definition of the conditional probability mass functions, we can define independent random variables: two discrete random variables X and Y are said to be independent if and only if
p_{Y/X}(y/x) = p_Y(y)
so that
p_{X,Y}(x, y) = p_X(x) p_Y(y)
• Bayes' rule:
p_{X/Y}(x/y) = P({X = x}/{Y = y}) = P({X = x} ∩ {Y = y})/P{Y = y} = p_{X,Y}(x, y)/p_Y(y) = p_{X,Y}(x, y) / Σ_{x∈R_X} p_{X,Y}(x, y)
Example Consider the random variables X and Y with the joint probability
mass function as tabulated in Table .
          X = 0   X = 1   X = 2   p_Y(y)
 Y = 0    0.25    0.10    0.15    0.5
 Y = 1    0.14    0.35    0.01    0.5
 p_X(x)   0.39    0.45    0.16    1

The marginal probabilities are shown in the last column and the last row. For example,
p_{Y/X}(1/0) = p_{X,Y}(0, 1)/p_X(0) = 0.14/0.39 ≈ 0.36
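A minimal sketch computing the marginals and a conditional pmf directly from the table:

```python
import numpy as np

# Sketch: joint pmf of the table as an array indexed by [y, x]; the conditional
# pmf p_{Y|X}(y|x) is obtained by dividing each column by the marginal p_X(x).
p_xy = np.array([[0.25, 0.10, 0.15],      # row for y = 0
                 [0.14, 0.35, 0.01]])     # row for y = 1
p_x = p_xy.sum(axis=0)                    # marginal of X: [0.39, 0.45, 0.16]
p_y = p_xy.sum(axis=1)                    # marginal of Y: [0.5, 0.5]
p_y_given_x = p_xy / p_x                  # broadcasting divides each column

print("p_X:", p_x, " p_Y:", p_y)
print("p_{Y|X}(1|0) =", p_y_given_x[1, 0])    # 0.14/0.39
```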
Conditional distribution and density functions (continuous case): For a continuous random variable X, the conditional distribution function of Y given X = x cannot be defined as
F_{Y/X}(y/x) = P(Y ≤ y, X = x)/P(X = x)
as both the numerator and the denominator are zero for the above expression. It is instead defined in a limiting sense:
F_{Y/X}(y/x) = lim_{∆x→0} P(Y ≤ y, x < X ≤ x + ∆x)/P(x < X ≤ x + ∆x)
            = lim_{∆x→0} [∫_{−∞}^{y} f_{X,Y}(x, u) ∆x du] / [f_X(x) ∆x]
            = ∫_{−∞}^{y} f_{X,Y}(x, u) du / f_X(x)
The conditional density is defined in the limiting sense as follows:
f_{Y/X}(y/X = x) = lim_{∆y→0} [F_{Y/X}(y + ∆y/X = x) − F_{Y/X}(y/X = x)]/∆y
               = lim_{∆y→0, ∆x→0} [F_{Y/X}(y + ∆y/x < X ≤ x + ∆x) − F_{Y/X}(y/x < X ≤ x + ∆x)]/∆y
∴ f_{Y/X}(y/x) = f_{X,Y}(x, y)/f_X(x)    (2)
Similarly we have
f_{X/Y}(x/y) = f_{X,Y}(x, y)/f_Y(y)    (3)
Given the joint density function we can find out the conditional density function.
Example: For random variables X and Y, the joint probability density function is given by
f_{X,Y}(x, y) = (1 + xy)/4 for |x| ≤ 1, |y| ≤ 1, and 0 otherwise.
Find the marginal densities f_X(x), f_Y(y) and the conditional density f_{Y/X}(y/x). Are X and Y independent?
f_X(x) = ∫_{−1}^{1} (1 + xy)/4 dy = 1/2,   −1 ≤ x ≤ 1
Similarly,
f_Y(y) = 1/2,   −1 ≤ y ≤ 1
and
f_{Y/X}(y/x) = f_{X,Y}(x, y)/f_X(x) = (1 + xy)/2,   |y| ≤ 1
Since f_{Y/X}(y/x) ≠ f_Y(y), X and Y are not independent.
Bayes' Rule for mixed random variables
Example Suppose X is a discrete random variable taking the values +1 and −1 with P{X = 1} = p and P{X = −1} = 1 − p, and the observation is Y = X + N, where the noise N ~ N(0, σ²) is independent of X. Then
p_{X/Y}(x = 1/y) = p_X(1) f_{Y/X}(y/1) / Σ_x p_X(x) f_{Y/X}(y/x)
               = p e^(−(y−1)²/2σ²) / [p e^(−(y−1)²/2σ²) + (1 − p) e^(−(y+1)²/2σ²)]
Independent random variables: Let X and Y be two random variables characterised by the joint distribution function F_{X,Y}(x, y) = P{X ≤ x, Y ≤ y} and the corresponding joint density function f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x∂y. Then X and Y are independent if f_{X,Y}(x, y) = f_X(x) f_Y(y) for all (x, y),
Remark:
Suppose X and Y are two discrete random variables with joint probability mass function
p X ,Y ( x, y ). Then X and Y are independent if
p X ,Y ( x, y ) = p X ( x ) pY ( y ) ∀(x,y) ∈ RX × RY
Transformation of two random variables:
We are often interested in finding out the probability density function of a function of
two or more RVs. Following are a few examples.
• The received signal by a communication receiver is given by
Z = X +Y
where Z is received signal which is the superposition of the message signal X
and the noise Y .
Figure: the region D_z = {(x, y) : x + y ≤ z} in the X−Y plane corresponding to the event {Z ≤ z}.
∴ F_Z(z) = P({Z ≤ z}) = P{(x, y) | (x, y) ∈ D_z} = ∫∫_{(x,y)∈D_z} f_{X,Y}(x, y) dy dx
and f_Z(z) = dF_Z(z)/dz.
Z ≤ z  ⇒  X + Y ≤ z
∴ F_Z(z) = ∫∫_{(x,y)∈D_z} f_{X,Y}(x, y) dx dy
         = ∫_{−∞}^{∞} [ ∫_{−∞}^{z−x} f_{X,Y}(x, y) dy ] dx
         = ∫_{−∞}^{∞} [ ∫_{−∞}^{z} f_{X,Y}(x, u − x) du ] dx      (substituting y = u − x)
         = ∫_{−∞}^{z} [ ∫_{−∞}^{∞} f_{X,Y}(x, u − x) dx ] du      (interchanging the order of integration)
∴ f_Z(z) = d/dz ∫_{−∞}^{z} [ ∫_{−∞}^{∞} f_{X,Y}(x, u − x) dx ] du = ∫_{−∞}^{∞} f_{X,Y}(x, z − x) dx
If X and Y are independent, f_{X,Y}(x, z − x) = f_X(x) f_Y(z − x), so that
∴ f_Z(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx = f_X(z) * f_Y(z)
i.e., the pdf of the sum of two independent random variables is the convolution of their individual pdfs.
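A minimal numerical illustration of this convolution result, using two uniform densities (the choice of uniform(a, b) anticipates the example discussed next):

```python
import numpy as np

# Sketch: pdf of Z = X + Y for independent U(a,b) variables via numerical
# convolution of their densities; the result is triangular on (2a, 2b).
a, b = 0.0, 1.0
dx = 0.001
x = np.arange(a, b, dx)
fx = np.full_like(x, 1.0 / (b - a))       # density of U(a,b) on its support

fz = np.convolve(fx, fx) * dx             # f_Z = f_X * f_Y (discretised)
z = 2 * a + np.arange(len(fz)) * dx

print("peak of f_Z :", fz.max(), " at z ≈", z[fz.argmax()])   # ≈ 1/(b-a) at z = a+b
print("area under f_Z:", fz.sum() * dx)                        # ≈ 1
```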
Suppose X and Y are independent random variables, each uniformly distributed over (a, b), with f_X(x) and f_Y(y) as shown in the figure below. The convolution of the two rectangular pdfs gives a triangular pdf for Z = X + Y over (2a, 2b).
Probability density function of Z = XY
Z ≤ z  ⇒  XY ≤ z
F_Z(z) = ∫∫_{(x,y)∈D_z} f_{X,Y}(x, y) dy dx
       = ∫_{−∞}^{∞} (1/|x|) [ ∫_{−∞}^{z} f_{X,Y}(x, u/x) du ] dx      (substituting u = xy, du = x dy; the absolute value accounts for x < 0)
∴ f_Z(z) = dF_Z(z)/dz = ∫_{−∞}^{∞} (1/|x|) f_{X,Y}(x, z/x) dx = ∫_{−∞}^{∞} (1/|y|) f_{X,Y}(z/y, y) dy
Probability density function of Z = Y/X
Z ≤ z  ⇒  Y/X ≤ z
∴ D_z = {(x, y) | y/x ≤ z} = {(x, y) | y ≤ xz for x > 0} ∪ {(x, y) | y ≥ xz for x < 0}
∴ F_Z(z) = ∫∫_{(x,y)∈D_z} f_{X,Y}(x, y) dy dx
         = ∫_{−∞}^{∞} |x| [ ∫_{−∞}^{z} f_{X,Y}(x, ux) du ] dx      (substituting y = ux)
∴ f_Z(z) = dF_Z(z)/dz = ∫_{−∞}^{∞} |x| f_{X,Y}(x, xz) dx
and, if X and Y are independent, f_Z(z) = ∫_{−∞}^{∞} |x| f_X(x) f_Y(xz) dx.
Example:
Suppose X and Y are independent zero-mean Gaussian random variables with unit standard deviation and Z = Y/X. Then
f_Z(z) = ∫_{−∞}^{∞} |x| (1/√(2π)) e^(−x²/2) (1/√(2π)) e^(−z²x²/2) dx
       = (1/2π) ∫_{−∞}^{∞} |x| e^(−x²(1 + z²)/2) dx
       = (1/π) ∫_0^∞ x e^(−x²(1 + z²)/2) dx
       = 1/(π(1 + z²))
i.e., Z has the Cauchy density.
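A minimal Monte Carlo check that the ratio of two independent standard Gaussians follows this Cauchy density:

```python
import numpy as np

# Sketch: compare an empirical histogram of Z = Y/X with 1/(pi*(1+z^2)).
rng = np.random.default_rng(4)
x = rng.normal(size=500_000)
y = rng.normal(size=500_000)
z = y / x

edges = np.linspace(-4, 4, 9)
hist, edges = np.histogram(z, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("empirical:", np.round(hist, 3))
print("theory   :", np.round(1.0 / (np.pi * (1 + centers**2)), 3))
```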
Probability density function of Z = √(X² + Y²)
Z ≤ z  ⇒  √(X² + Y²) ≤ z
∴ D_z = {(x, y) | √(x² + y²) ≤ z} = {(r, θ) | 0 ≤ r ≤ z, 0 ≤ θ ≤ 2π}      (in polar coordinates x = r cos θ, y = r sin θ)
∴ F_Z(z) = ∫∫_{(x,y)∈D_z} f_{X,Y}(x, y) dy dx = ∫_0^{2π} ∫_0^{z} f_{X,Y}(r cos θ, r sin θ) r dr dθ
∴ f_Z(z) = dF_Z(z)/dz = ∫_0^{2π} f_{X,Y}(z cos θ, z sin θ) z dθ
Example Suppose X and Y are two independent Gaussian random variables each with
mean 0 and variance σ 2 and Z = X 2 + Y 2 . Then
f_Z(z) = ∫_0^{2π} f_{X,Y}(z cos θ, z sin θ) z dθ
       = z ∫_0^{2π} f_X(z cos θ) f_Y(z sin θ) dθ
       = z ∫_0^{2π} [e^(−z²cos²θ/2σ²) · e^(−z²sin²θ/2σ²) / (2πσ²)] dθ
       = (z/σ²) e^(−z²/2σ²),   z ≥ 0
The above is the Rayleigh density function we discussed earlier.
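A minimal simulation check of the Rayleigh envelope (sigma is an illustrative assumption):

```python
import numpy as np

# Sketch: the envelope Z = sqrt(X^2 + Y^2) of two independent N(0, sigma^2)
# variables should have E[Z] = sigma*sqrt(pi/2) and E[Z^2] = 2*sigma^2.
rng = np.random.default_rng(5)
sigma = 2.0
x = rng.normal(0.0, sigma, size=300_000)
y = rng.normal(0.0, sigma, size=300_000)
z = np.sqrt(x**2 + y**2)

print("E[Z]  :", z.mean(),      " theory:", sigma * np.sqrt(np.pi / 2))
print("E[Z^2]:", (z**2).mean(), " theory:", 2 * sigma**2)
```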
Rician Distribution:
Suppose X and Y are independent Gaussian random variables with non-zero means µ_X and µ_Y respectively and the same variance σ². We have to find the density function of the random variable Z = √(X² + Y²). Typical applications are
• the envelope of a sinusoid plus narrowband Gaussian noise, and
• the received signal envelope in a multipath situation.
Here
f_{X,Y}(x, y) = (1/2πσ²) e^(−[(x − µ_X)² + (y − µ_Y)²]/2σ²)
and Z = √(X² + Y²). We have shown that
f_Z(z) = ∫_0^{2π} f_{X,Y}(z cos θ, z sin θ) z dθ
Writing µ_X = µ cos φ and µ_Y = µ sin φ with µ = √(µ_X² + µ_Y²),
f_{X,Y}(z cos θ, z sin θ) = (1/2πσ²) e^(−(z² + µ²)/2σ²) e^(zµ cos(θ − φ)/σ²)
∴ f_Z(z) = ∫_0^{2π} (1/2πσ²) e^(−(z² + µ²)/2σ²) e^(zµ cos(θ − φ)/σ²) z dθ
         = [z e^(−(z² + µ²)/2σ²) / 2πσ²] ∫_0^{2π} e^(zµ cos(θ − φ)/σ²) dθ
The remaining integral equals 2π I₀(zµ/σ²), where I₀(·) is the modified Bessel function of the first kind of order zero, giving the Rician density f_Z(z) = (z/σ²) e^(−(z² + µ²)/2σ²) I₀(zµ/σ²), z ≥ 0.
Joint Probability Density Function of two functions of two random variables
We consider the transformation ( g1 , g 2 ) : R 2 → R 2 . We have to find out the joint
probability density function f_{Z₁,Z₂}(z₁, z₂), where Z₁ = g₁(X, Y) and Z₂ = g₂(X, Y). Suppose the inverse mapping relation is
x = h₁(z₁, z₂) and y = h₂(z₁, z₂)
Consider a differential rectangle of area dz₁dz₂ at the point (z₁, z₂) in the Z₁−Z₂ plane. Under the inverse mapping, (z₁, z₂) maps to (x, y), and (z₁ + dz₁, z₂) maps approximately to (x + (∂h₁/∂z₁)dz₁, y + (∂h₂/∂z₁)dz₁).
We can similarly find the points in the X−Y plane corresponding to (z₁, z₂ + dz₂) and (z₁ + dz₁, z₂ + dz₂). The mapping is shown in the figure. We notice that each differential region in the X−Y plane is a parallelogram. It can be shown that the differential parallelogram at (x, y) has an area |J(z₁, z₂)| dz₁ dz₂, where J(z₁, z₂) is the Jacobian of the transformation defined as the determinant
J(z₁, z₂) = det [ ∂h₁/∂z₁   ∂h₁/∂z₂ ;  ∂h₂/∂z₁   ∂h₂/∂z₂ ]
Further, it can be shown that the absolute values of the Jacobians of the forward and the
inverse transform are inverse of each other so that
|J(z₁, z₂)| = 1/|J(x, y)|
where
J(x, y) = det [ ∂g₁/∂x   ∂g₁/∂y ;  ∂g₂/∂x   ∂g₂/∂y ]
Therefore, the differential parallelogram in the figure has an area of dz₁ dz₂ / |J(x, y)|.
Suppose the transformation z1 = g1 ( x, y ) and z2 = g 2 ( x, y ) has n roots and let
( xi , yi ), i = 1, 2,..n be the roots. The inverse mapping of the differential region in the
X − Y plane will be n differential regions corresponding to n roots. The inverse
mapping is illustrated in the following figure for n = 4. As these parallelograms are non-
overlapping,
f_{Z₁,Z₂}(z₁, z₂) dz₁ dz₂ = Σ_{i=1}^{n} f_{X,Y}(x_i, y_i) dz₁ dz₂ / |J(x_i, y_i)|
∴ f_{Z₁,Z₂}(z₁, z₂) = Σ_{i=1}^{n} f_{X,Y}(x_i, y_i) / |J(x_i, y_i)|
Remark
• If z1 = g1 ( x, y ) and z2 = g 2 ( x, y ) does not have a root in ( x, y ),
then f Z1 , Z2 ( z1 , z2 ) = 0.
Figure: the differential rectangle with corners (z₁, z₂), (z₁ + dz₁, z₂), (z₁, z₂ + dz₂) and (z₁ + dz₁, z₂ + dz₂) in the Z₁−Z₂ plane maps to a differential parallelogram at (x, y) in the X−Y plane, with sides determined by the partial derivatives of the inverse mapping.
Example Suppose X and Y are independent zero-mean Gaussian random variables, each with variance σ², and let R = √(X² + Y²) and Θ = tan⁻¹(Y/X). Find f_{R,Θ}(r, θ), f_R(r) and f_Θ(θ).
Solution:
r = √(x² + y²) ……………… (1)
and tan θ = y/x ……………… (2)
From (1)
∂r/∂x = x/r = cos θ  and  ∂r/∂y = y/r = sin θ
From (2),
∂θ/∂x = −y/(x² + y²) = −sin θ/r  and  ∂θ/∂y = x/(x² + y²) = cos θ/r
∴ J(x, y) = det [ cos θ   sin θ ;  −sin θ/r   cos θ/r ] = 1/r
∴ f_{R,Θ}(r, θ) = f_{X,Y}(x, y)/|J(x, y)| evaluated at x = r cos θ, y = r sin θ
             = (r/2πσ²) e^(−r²cos²θ/2σ²) · e^(−r²sin²θ/2σ²)
             = (r/2πσ²) e^(−r²/2σ²)
∴ f_R(r) = ∫_0^{2π} f_{R,Θ}(r, θ) dθ = (r/σ²) e^(−r²/2σ²),   0 ≤ r < ∞
and
f_Θ(θ) = ∫_0^∞ f_{R,Θ}(r, θ) dr = (1/2πσ²) ∫_0^∞ r e^(−r²/2σ²) dr = 1/2π,   0 ≤ θ ≤ 2π
Thus R is Rayleigh distributed, Θ is uniform over (0, 2π), and since f_{R,Θ}(r, θ) = f_R(r) f_Θ(θ), R and Θ are independent.
Rician Distribution:
• X and Y are independent Gaussian random variables with non-zero means µ_X and µ_Y respectively and the same variance σ².
Z = √(X² + Y²),   Φ = tan⁻¹(Y/X)
From z = √(x² + y²) and tan φ = y/x, we have z² = x² + y². Therefore,
∂z/∂x = x/z = cos φ  and  ∂z/∂y = y/z = sin φ
Also,
∂φ/∂x = −y/(x² sec²φ) = −(y/x²) cos²φ  and  ∂φ/∂y = 1/(x sec²φ) = cos²φ/x
∴ J(x, y) = det [ cos φ   sin φ ;  −(y/x²) cos²φ   cos²φ/x ]
          = cos³φ/x + (y sin φ cos²φ)/x²
          = cos²φ (x cos φ + y sin φ)/x²
          = z cos²φ/x² = 1/z
With µ_X = µ cos φ₀ and µ_Y = µ sin φ₀ (so that x − µ_X = z cos φ − µ cos φ₀ and y − µ_Y = z sin φ − µ sin φ₀),
f_{X,Y}(x, y) = (1/2πσ²) e^(−[(z cos φ − µ cos φ₀)² + (z sin φ − µ sin φ₀)²]/2σ²)
             = (1/2πσ²) e^(−(z² − 2zµ cos(φ − φ₀) + µ²)/2σ²)
             = (1/2πσ²) e^(−(z² + µ²)/2σ²) e^(zµ cos(φ − φ₀)/σ²)
∴ f_{Z,Φ}(z, φ) = f_{X,Y}(x, y)/|J(x, y)| = (z/2πσ²) e^(−(z² + µ²)/2σ²) e^(zµ cos(φ − φ₀)/σ²)
and integrating over φ gives the Rician density f_Z(z) = (z/σ²) e^(−(z² + µ²)/2σ²) I₀(zµ/σ²), z ≥ 0, as before.
Expected Values of Functions of Random Variables
Recall that
• If Y = g ( X ) is a function of a continuous random variable X , then
∞
EY = Eg ( X ) = ∫ g ( x) f X ( x)dx
−∞
Suppose Z = g(X, Y). The event {z < Z ≤ z + ∆z} corresponds to a set of non-overlapping differential regions ∆D_i in the X−Y plane (see the figure below). Therefore,
z f_Z(z)∆z = Σ_{(x_i,y_i)∈∆D_i} z f_{X,Y}(x_i, y_i)∆x_i∆y_i = Σ_{(x_i,y_i)∈∆D_i} g(x_i, y_i) f_{X,Y}(x_i, y_i)∆x_i∆y_i
As z is varied over the entire Z axis, the corresponding (non-overlapping) differential regions in the X−Y plane cover the entire plane.
∴ ∫_{−∞}^{∞} z f_Z(z) dz = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy
Thus,
Eg(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy
Figure: the event {z < Z ≤ z + ∆z} on the Z axis corresponds to the non-overlapping regions ∆D₁, ∆D₂, ∆D₃, … in the X−Y plane.
Example: The joint pdf of two random variables X and Y are given by
f_{X,Y}(x, y) = (1/4)xy for 0 ≤ x ≤ 2, 0 ≤ y ≤ 2, and 0 otherwise.
Find the joint expectation of g ( X , Y ) = X 2Y
Eg ( X , Y ) = EX 2Y
∞ ∞
= ∫ ∫ g ( x, y ) f X ,Y ( x, y )dxdy
−∞ −∞
= ∫_0^2 ∫_0^2 x²y (1/4)xy dx dy
= (1/4) ∫_0^2 x³ dx ∫_0^2 y² dy
= (1/4) × (2⁴/4) × (2³/3)
= 8/3
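A minimal numerical check of this joint expectation by Riemann-sum integration:

```python
import numpy as np

# Sketch: approximate E[X^2 Y] = 8/3 for f_{X,Y}(x,y) = xy/4 on [0,2]x[0,2].
x = np.linspace(0, 2, 801)
y = np.linspace(0, 2, 801)
dx = dy = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing="ij")

integrand = (X**2 * Y) * (X * Y / 4.0)
print("E[X^2 Y] ≈", integrand.sum() * dx * dy, " theory:", 8 / 3)
```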
Example: If Z = aX + bY , where a and b are constants, then
EZ = aEX + bEY
Proof:
∞ ∞
EZ = ∫ ∫ (ax + by ) f X ,Y ( x, y )dxdy
−∞ −∞
∞ ∞ ∞ ∞
= ∫ ∫ axf X ,Y ( x, y ) dxdy + ∫ ∫ byf X ,Y ( x, y ) dxdy
−∞ −∞ −∞ −∞
∞ ∞ ∞ ∞
= ∫ ax ∫ f X ,Y ( x, y )dydx + ∫ by ∫ f X ,Y ( x, y )dxdy
−∞ −∞ −∞ −∞
∞ ∞
= a ∫ xf X ( x)dx + b ∫ yfY ( y )dy
−∞ −∞
= aEX + bEY
Thus, expectation is a linear operator.
Example:
Consider the discrete random variables X and Y discussed in Example .The
joint probability mass function of the random variables are tabulated in Table .
Find the joint expectation of g ( X , Y ) = XY
          X = 0   X = 1   X = 2   p_Y(y)
 Y = 0    0.25    0.10    0.15    0.5
 Y = 1    0.14    0.35    0.01    0.5
 p_X(x)   0.39    0.45    0.16    1

Clearly, EXY = Σ_{(x,y)∈R_X×R_Y} x y p_{X,Y}(x, y) = 1×1×0.35 + 2×1×0.01 = 0.37.
Remark
(1) We have earlier shown that expectation is a linear operator. We can generally write
E[a1 g1 ( X , Y ) + a2 g 2 ( X , Y )] = a1 Eg1 ( X , Y ) + a2 Eg 2 ( X , Y )
Thus E ( XY + 5log e XY ) = EXY + 5E log e XY
(2) If X and Y are independent random variables and g ( X , Y ) = g1 ( X ) g 2 (Y ), then
Eg ( X , Y ) = Eg1 ( X ) g 2 (Y )
Eg(X, Y) = Eg₁(X)g₂(Y)
         = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g₁(x) g₂(y) f_{X,Y}(x, y) dx dy
         = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g₁(x) g₂(y) f_X(x) f_Y(y) dx dy
         = ∫_{−∞}^{∞} g₁(x) f_X(x) dx ∫_{−∞}^{∞} g₂(y) f_Y(y) dy
         = Eg₁(X) Eg₂(Y)
Joint Moments of Random Variables
Just like the moments of a random variable provides a summary description of the
random variable, so also the joint moments provide summary description of two random
variables.
For two continuous random variables X and Y the joint moment of order m + n is
defined as
∞ ∞
E ( X m Y n ) = ∫ ∫ x m y n f X ,Y ( x, y )dxdy
−∞ −∞ and
the joint central moment of order m + n is defined as
∞ ∞
E ( X − µ X ) m (Y − µY ) n = ∫ ∫ ( x − µ X ) m ( y − µY ) n f X ,Y ( x, y )dxdy
−∞ −∞
µ X = EX µY = EY
where and
Remark
(1) If X and Y are discrete random variables, the joint expectation of order m and
n is defined as
E ( X mY n ) = ∑ ∑ x y p X ,Y ( x, y )
m n
( x , y )∈RX ,Y
E ( X − µ X ) m (Y − µY ) n = ∑ ∑ ( x − µ X ) ( y − µY ) p X ,Y ( x, y )
m n
( x , y )∈R X ,Y
The ratio ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y) is called the correlation coefficient. We will show that |ρ(X, Y)| ≤ 1, and give an interpretation of ρ below.
Proof:
Consider the random variable Z = aX + Y. Then
E(aX + Y)² ≥ 0
⇒ a²EX² + EY² + 2aEXY ≥ 0
Non-negativity of the left-hand side implies that its minimum over a must also be non-negative. For the minimum value,
dEZ²/da = 0  ⇒  a = −EXY/EX²
and the corresponding minimum is E²XY/EX² + EY² − 2E²XY/EX² = EY² − E²XY/EX².
The minimum being non-negative gives
EY² − E²XY/EX² ≥ 0
⇒ E²XY ≤ EX² EY²
⇒ |EXY| ≤ √(EX² EY²)
Now
ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y) = E(X − µ_X)(Y − µ_Y)/√(E(X − µ_X)² E(Y − µ_Y)²)
Applying the above inequality to the zero-mean random variables X − µ_X and Y − µ_Y,
|ρ(X, Y)| ≤ √(E(X − µ_X)² E(Y − µ_Y)²)/√(E(X − µ_X)² E(Y − µ_Y)²) = 1
∴ |ρ(X, Y)| ≤ 1
Regression: Suppose Y is predicted from X by the linear predictor Ŷ = aX + b. The prediction error is Y − Ŷ and the mean-square prediction error is
E(Y − Ŷ)² = E(Y − aX − b)²
Minimising the error gives the optimal values of a and b:
∂/∂a E(Y − aX − b)² = 0  and  ∂/∂b E(Y − aX − b)² = 0
Solving for a and b,
Ŷ − µ_Y = (σ_{X,Y}/σ_X²)(x − µ_X)
so that
Ŷ − µ_Y = ρ_{X,Y} (σ_Y/σ_X)(x − µ_X)
where ρ_{X,Y} = σ_{XY}/(σ_X σ_Y) is the correlation coefficient.
Figure: the regression line passes through the origin of the (x − µ_X, Ŷ − µ_Y) plane with slope ρ_{X,Y} σ_Y/σ_X.
Remark
• If ρ_{X,Y} > 0, then X and Y are called positively correlated; if ρ_{X,Y} < 0, they are called negatively correlated.
• If ρ_{X,Y} = 0, the random variables X and Y are called uncorrelated. In this case
Ŷ − µ_Y = 0
⇒ Ŷ = µ_Y is the best linear prediction.
Note that independence ⇒ uncorrelatedness. But uncorrelatedness generally does not imply independence (except for jointly Gaussian random variables).
Example:
Suppose Y = X², where X is uniformly distributed over (−1, 1). X and Y are dependent, but they are uncorrelated, because
Cov(X, Y) = E(X − µ_X)(Y − µ_Y) = EXY − EX EY = EX³ − 0 = 0     (∵ EX = 0)
In fact, for any zero-mean symmetric distribution of X, X and X² are uncorrelated.
Jointly Gaussian Random Variables: Two random variables X and Y are called jointly Gaussian with
- means µ_X and µ_Y,
- variances σ_X² and σ_Y², and
- correlation coefficient ρ_{X,Y}
if their joint probability density function is
f_{X,Y}(x, y) = [1/(2πσ_X σ_Y √(1 − ρ²_{X,Y}))] exp{ −[ (x − µ_X)²/σ_X² − 2ρ_{X,Y}(x − µ_X)(y − µ_Y)/(σ_X σ_Y) + (y − µ_Y)²/σ_Y² ] / (2(1 − ρ²_{X,Y})) }
We denote the jointly Gaussian random variables X and Y with these parameters as
( X , Y ) ~ N ( µ X , µY , σ X2 , σ Y2 , ρ X ,Y )
The pdf has a bell shape centred at ( µ X , µY ) as shown in the Fig. below. The variances
σ X2 and σ Y2 determine the spread of the pdf surface and ρ X ,Y determines the orientation of the
surface in the X − Y plane.
Properties of jointly Gaussian random variables
(1) If X and Y are jointly Gaussian, then X and Y are both Gaussian.
We have
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
Writing the exponent of the joint pdf as
−(x − µ_X)²/(2σ_X²) − [y − µ_Y − ρ_{X,Y}(σ_Y/σ_X)(x − µ_X)]²/(2σ_Y²(1 − ρ²_{X,Y}))
we get
f_X(x) = (1/√(2π)σ_X) e^(−(x − µ_X)²/2σ_X²) ∫_{−∞}^{∞} [1/(√(2π)σ_Y√(1 − ρ²_{X,Y}))] e^(−[y − µ_Y − ρ_{X,Y}(σ_Y/σ_X)(x − µ_X)]²/(2σ_Y²(1 − ρ²_{X,Y}))) dy
The remaining integral is the total area under a Gaussian pdf and equals 1, so
f_X(x) = (1/√(2π)σ_X) e^(−(x − µ_X)²/2σ_X²)
Similarly,
f_Y(y) = (1/√(2π)σ_Y) e^(−(y − µ_Y)²/2σ_Y²)
(2) The converse of the above result is not true: if each of X and Y is Gaussian, X and Y are not necessarily jointly Gaussian. For example, suppose (taking µ_X = µ_Y = 0)
f_{X,Y}(x, y) = (1/2πσ_X σ_Y) e^(−(x²/σ_X² + y²/σ_Y²)/2) (1 + sin x sin y)
This f_{X,Y}(x, y) is non-Gaussian, yet it qualifies as a joint pdf: f_{X,Y}(x, y) ≥ 0 (since |sin x sin y| ≤ 1), and
∫∫ f_{X,Y}(x, y) dy dx = ∫∫ (1/2πσ_X σ_Y) e^(−(x²/σ_X² + y²/σ_Y²)/2) dy dx + (1/2πσ_X σ_Y) ∫ e^(−x²/2σ_X²) sin x dx ∫ e^(−y²/2σ_Y²) sin y dy = 1 + 0 = 1
because each of the last two integrals vanishes (odd integrand). The marginal densities are
f_X(x) = ∫ f_{X,Y}(x, y) dy = (1/√(2π)σ_X) e^(−x²/2σ_X²) + 0 = (1/√(2π)σ_X) e^(−x²/2σ_X²)      (the cross term integrates to zero, being an odd function of y)
and similarly f_Y(y) = (1/√(2π)σ_Y) e^(−y²/2σ_Y²), i.e., both marginals are Gaussian although the joint pdf is not.
(3) If X and Y are jointly Gaussian and uncorrelated (ρ_{X,Y} = 0), then they are independent:
f_{X,Y}(x, y) = (1/2πσ_X σ_Y) e^(−[(x − µ_X)²/σ_X² + (y − µ_Y)²/σ_Y²]/2)
             = (1/√(2π)σ_X) e^(−(x − µ_X)²/2σ_X²) · (1/√(2π)σ_Y) e^(−(y − µ_Y)²/2σ_Y²)
             = f_X(x) f_Y(y)
Joint Characteristic Function
The joint characteristic function of two random variables X and Y is defined by
φ_{X,Y}(ω₁, ω₂) = E e^(jω₁X + jω₂Y)
Note that φ_{X,Y}(ω₁, ω₂) is the same as the two-dimensional Fourier transform with the basis function e^(jω₁x + jω₂y) instead of e^(−(jω₁x + jω₂y)). The joint pdf is recovered by the inversion formula
f_{X,Y}(x, y) = (1/4π²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} φ_{X,Y}(ω₁, ω₂) e^(−jω₁x − jω₂y) dω₁ dω₂
If X and Y are discrete random variables, we can define the joint characteristic function in terms
of the joint probability mass function as follows:
φ_{X,Y}(ω₁, ω₂) = Σ_{(x,y)∈R_X×R_Y} p_{X,Y}(x, y) e^(jω₁x + jω₂y)
The joint characteristic function has properties similar to those of the characteristic function of a single random variable. We can easily establish the following properties:
1. φ_X(ω) = φ_{X,Y}(ω, 0)
2. φ_Y(ω) = φ_{X,Y}(0, ω)
3. If X and Y are independent random variables, then
φ_{X,Y}(ω₁, ω₂) = E e^(jω₁X + jω₂Y) = E(e^(jω₁X) e^(jω₂Y)) = E e^(jω₁X) E e^(jω₂Y) = φ_X(ω₁) φ_Y(ω₂)
4. We have
φ_{X,Y}(ω₁, ω₂) = E e^(jω₁X + jω₂Y) = E[1 + (jω₁X + jω₂Y) + (jω₁X + jω₂Y)²/2! + …]
              = 1 + jω₁EX + jω₂EY + (j²ω₁²EX²)/2 + (j²ω₂²EY²)/2 + j²ω₁ω₂EXY + …
Hence,
1 = φ_{X,Y}(0, 0)
EX = (1/j) ∂φ_{X,Y}(ω₁, ω₂)/∂ω₁ |_{ω₁=0, ω₂=0}
EY = (1/j) ∂φ_{X,Y}(ω₁, ω₂)/∂ω₂ |_{ω₁=0, ω₂=0}
EXY = (1/j²) ∂²φ_{X,Y}(ω₁, ω₂)/∂ω₁∂ω₂ |_{ω₁=0, ω₂=0}
and in general
EX^m Y^n = (1/j^(m+n)) ∂^(m+n) φ_{X,Y}(ω₁, ω₂)/∂ω₁^m ∂ω₂^n |_{ω₁=0, ω₂=0}
Example Joint characteristic function of the jointly Gaussian random variables X and Y with the joint pdf
f_{X,Y}(x, y) = [1/(2πσ_X σ_Y √(1 − ρ²_{X,Y}))] exp{ −[ ((x − µ_X)/σ_X)² − 2ρ_{X,Y}((x − µ_X)/σ_X)((y − µ_Y)/σ_Y) + ((y − µ_Y)/σ_Y)² ] / (2(1 − ρ²_{X,Y})) }
Let us recall the characteristic function of a Gaussian random variable X ~ N(µ_X, σ_X²):
φ_X(ω) = E e^(jωX) = (1/√(2π)σ_X) ∫_{−∞}^{∞} e^(−(x − µ_X)²/2σ_X²) e^(jωx) dx
Completing the square in the exponent,
φ_X(ω) = e^(µ_X jω − σ_X²ω²/2) (1/√(2π)σ_X) ∫_{−∞}^{∞} e^(−(x − µ_X − σ_X²jω)²/2σ_X²) dx      (the remaining integral is the area under a Gaussian and equals 1)
       = e^(µ_X jω − σ_X²ω²/2)
Proceeding in the same way with the joint pdf, the joint characteristic function of two jointly Gaussian random variables can be shown to be
φ_{X,Y}(ω₁, ω₂) = e^(j(µ_Xω₁ + µ_Yω₂) − (σ_X²ω₁² + 2ρ_{X,Y}σ_Xσ_Yω₁ω₂ + σ_Y²ω₂²)/2)
We can use the joint characteristic function to simplify probabilistic analysis, as illustrated below. Suppose Z = aX + bY, where a and b are constants. Then
φ_Z(ω) = E e^(jωZ) = E e^(j(aX + bY)ω) = φ_{X,Y}(aω, bω)
If X and Y are jointly Gaussian and Z = X + Y (a = b = 1), then
φ_Z(ω) = φ_{X,Y}(ω, ω) = e^(j(µ_X + µ_Y)ω − (σ_X² + 2ρ_{X,Y}σ_Xσ_Y + σ_Y²)ω²/2)
so Z is Gaussian with mean µ_Z = µ_X + µ_Y and variance σ_Z² = σ_X² + 2ρ_{X,Y}σ_Xσ_Y + σ_Y².
If X and Y are independent (not necessarily Gaussian) and Z = X + Y, then
φ_Z(ω) = φ_{X,Y}(ω, ω) = φ_X(ω) φ_Y(ω)
and, using the property of the Fourier transform, we get f_Z(z) = f_X(z) * f_Y(z).
Conditional Expectation
Recall that
• If X and Y are continuous random variables, then the conditional density function of Y given X = x is f_{Y/X}(y/x) = f_{X,Y}(x, y)/f_X(x).
• If X and Y are discrete random variables, then the conditional probability mass function of Y given X = x is p_{Y/X}(y/x) = p_{X,Y}(x, y)/p_X(x).
The conditional expectation of Y given X = x is defined by
µ_{Y/X=x} = E(Y/X = x) = ∫_{−∞}^{∞} y f_{Y/X}(y/x) dy      if X and Y are continuous
          = Σ_{y∈R_Y} y p_{Y/X}(y/x)                        if X and Y are discrete
Remark
• The conditional expectation of Y given X = x is also called the conditional mean of Y given X = x.
• The conditional variance of Y given X = x is σ²_{Y/X=x} = E[(Y − µ_{Y/X=x})²/X = x].
Example
Suppose X and Y are jointly uniform random variables with the joint probability density function
f_{X,Y}(x, y) = 1/2 for x ≥ 0, y ≥ 0, x + y ≤ 2, and 0 otherwise.
Find E(Y/X = x).
From the figure, f_{X,Y}(x, y) = 1/2 in the shaded triangular region. We have
f_X(x) = ∫_0^{2−x} f_{X,Y}(x, y) dy = ∫_0^{2−x} (1/2) dy = (2 − x)/2,   0 ≤ x ≤ 2
∴ f_{Y/X}(y/x) = f_{X,Y}(x, y)/f_X(x) = 1/(2 − x),   0 ≤ y ≤ 2 − x
∴ E(Y/X = x) = ∫_{−∞}^{∞} y f_{Y/X}(y/x) dy = ∫_0^{2−x} y/(2 − x) dy = (2 − x)/2
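A minimal rejection-sampling check of this conditional mean:

```python
import numpy as np

# Sketch: sample uniformly from the triangle x>=0, y>=0, x+y<=2 and compare the
# average of Y in a thin slice around X = x0 with the formula (2 - x0)/2.
rng = np.random.default_rng(6)
pts = rng.uniform(0.0, 2.0, size=(2_000_000, 2))
x, y = pts[:, 0], pts[:, 1]
keep = x + y <= 2.0                       # accept only points inside the triangle
x, y = x[keep], y[keep]

for x0 in (0.5, 1.0, 1.5):
    sel = np.abs(x - x0) < 0.02           # thin slice around X = x0
    print(f"x = {x0}: E(Y|X=x) ≈ {y[sel].mean():.3f}, theory {(2 - x0) / 2:.3f}")
```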
Example
Suppose X and Y are jointly Gaussian random variables with the joint probability density function given earlier. We have
f_{Y/X}(y/x) = f_{X,Y}(x, y)/f_X(x)
Dividing the joint pdf by f_X(x) = (1/√(2π)σ_X) e^(−(x − µ_X)²/2σ_X²) and simplifying,
f_{Y/X}(y/x) = [1/(√(2π)σ_Y√(1 − ρ²_{X,Y}))] exp{ −[y − µ_Y − ρ_{X,Y}(σ_Y/σ_X)(x − µ_X)]² / (2σ_Y²(1 − ρ²_{X,Y})) }
which is a Gaussian density in y. Therefore,
E(Y/X = x) = ∫_{−∞}^{∞} y f_{Y/X}(y/x) dy = µ_Y + ρ_{X,Y}(σ_Y/σ_X)(x − µ_X)
Conditional Expectation as a random variable
Using the function φ(x) = E(Y/X = x), we may define the random variable φ(X) = E(Y/X). Thus E(Y/X) is a function of the random variable X, and E(Y/X = x) is the value of E(Y/X) at X = x. An important property is
EE(Y/X) = EY
Proof:
EE(Y/X) = ∫_{−∞}^{∞} E(Y/X = x) f_X(x) dx
        = ∫_{−∞}^{∞} [∫_{−∞}^{∞} y f_{Y/X}(y/x) dy] f_X(x) dx
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f_X(x) f_{Y/X}(y/x) dy dx
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f_{X,Y}(x, y) dy dx
        = ∫_{−∞}^{∞} y [∫_{−∞}^{∞} f_{X,Y}(x, y) dx] dy
        = ∫_{−∞}^{∞} y f_Y(y) dy
        = EY
Similarly, EE(X/Y) = EX.
Figure: estimation set-up — a random variable X with density f_X(x) is observed through the conditional density f_{Y/X}(y/x), producing the observation Y = y. In estimation terminology, f_{X,Y}(x, y) = f_X(x) f_{Y/X}(y/x).
Suppose the optimum estimator X̂(Y) is a function of the random variable Y chosen to minimize the mean-square error E(X̂(Y) − X)². Such an estimator is known as the minimum mean-square error (MMSE) estimator. The estimation problem is:
Minimize ∫_{−∞}^{∞} ∫_{−∞}^{∞} (X̂(y) − x)² f_{X,Y}(x, y) dx dy
       = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (X̂(y) − x)² f_Y(y) f_{X/Y}(x/y) dx dy
       = ∫_{−∞}^{∞} [∫_{−∞}^{∞} (X̂(y) − x)² f_{X/Y}(x/y) dx] f_Y(y) dy
Since f_Y(y) is always non-negative, the above integral is minimum if the inner integral is minimum for each y. This results in the problem:
Minimize ∫_{−∞}^{∞} (X̂(y) − x)² f_{X/Y}(x/y) dx
Setting the derivative with respect to X̂(y) to zero,
2 ∫_{−∞}^{∞} (X̂(y) − x) f_{X/Y}(x/y) dx = 0
⇒ ∫_{−∞}^{∞} X̂(y) f_{X/Y}(x/y) dx = ∫_{−∞}^{∞} x f_{X/Y}(x/y) dx = E(X/Y = y)
⇒ X̂(y) = E(X/Y = y)
i.e., the MMSE estimator is the conditional mean.
Multiple Random Variables
In many applications we have to deal with many random variables. For example, in the
navigation problem, the position of a space craft is represented by three random variables
denoting the x, y and z coordinates. The noise affecting the R, G, B channels of colour
video may be represented by three random variables. In such situations, it is convenient to
define the vector-valued random variables where each component of the vector is a
random variable.
In this lecture, we extend the concepts of joint random variables to the case of multiple random variables. A generalized analysis will be presented for n random variables defined on the same sample space. We represent the n random variables by the random vector
X = [X₁ X₂ … X_n]′
and a particular value of the random vector by x = [x₁ x₂ … x_n]′.
The CDF of the random vector X. is defined as the joint CDF of X 1 , X 2 ,.., X n . Thus
FX1 , X 2 ,.., X n ( x1 , x2 ,..xn ) = FX (x)
= P ({ X 1 ≤ x1 , X 2 ≤ x2 ,.. X n ≤ xn })
Some of the most important properties of the joint CDF are listed below. These properties are mere extensions of the properties of two joint random variables.
(e) The marginal CDF of a random variable X_i is obtained from F_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) by letting all random variables except X_i tend to ∞. Thus
F_{X₁}(x₁) = F_{X₁,X₂,…,X_n}(x₁, ∞, …, ∞),   F_{X₂}(x₂) = F_{X₁,X₂,…,X_n}(∞, x₂, …, ∞)
and so on.
Given the joint probability mass function p_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n), we can find the marginal probability mass function of X_i by summing over the remaining n − 1 variables (n − 1 summations).
(2) If the joint CDF is continuous in its arguments, then X can be specified by the joint probability density function
f_X(x) = f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) = ∂ⁿ F_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) / ∂x₁∂x₂…∂x_n
(3) Given f_X(x) = f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) for all (x₁, x₂, …, x_n) ∈ ℝⁿ, we can find the probability of a Borel set (region) B ⊆ ℝⁿ as
P({(x₁, x₂, …, x_n) ∈ B}) = ∫∫…∫_B f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) dx₁ dx₂ … dx_n
The marginal density of X_i is obtained by integrating out the other n − 1 variables:
f_{X_i}(x_i) = ∫_{−∞}^{∞} … ∫_{−∞}^{∞} f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) dx₁ … dx_{i−1} dx_{i+1} … dx_n
and so on.
The conditional density functions are defined in a similar manner. Thus
f_{X_{m+1},…,X_n / X₁,…,X_m}(x_{m+1}, …, x_n / x₁, …, x_m) = f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) / f_{X₁,X₂,…,X_m}(x₁, x₂, …, x_m)
The random variables X₁, X₂, …, X_n are called independent if
f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) = Π_{i=1}^{n} f_{X_i}(x_i)
For example, if X₁, X₂, …, X_n are independent Gaussian random variables with X_i ~ N(µ_i, σ_i²), then
f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) = Π_{i=1}^{n} (1/√(2π)σ_i) e^(−(x_i − µ_i)²/2σ_i²)
Remark: X₁, X₂, …, X_n may be pairwise independent, but may not be mutually independent.
If, in addition, F_{X₁}(x) = F_{X₂}(x) = … = F_{X_n}(x) ∀x, the random variables are called independent and identically distributed (iid). For example, for n iid Bernoulli(1/2) random variables,
p_X(1, 1, …, 1) = (1/2)ⁿ
Moments of Multiple Random Variables
Consider n jointly distributed random variables represented by the random vector X = [X₁, X₂, …, X_n]′. The expected value of any scalar-valued function g(X) is defined as
Eg(X) = ∫_{−∞}^{∞} … ∫_{−∞}^{∞} g(x) f_X(x) dx₁ … dx_n
and the mean vector is µ_X = EX = [EX₁, EX₂, …, EX_n]′. Similarly, for each (i, j), i = 1, 2, …, n, j = 1, 2, …, n, we can define the covariance
Cov(X_i, X_j) = E(X_i − µ_{X_i})(X_j − µ_{X_j})
All the possible covariances can be represented in terms of a matrix called the covariance matrix C_X defined by
C_X = E(X − µ_X)(X − µ_X)′
    = [ var(X₁)        cov(X₁, X₂)   …   cov(X₁, X_n)
        cov(X₂, X₁)    var(X₂)       …   cov(X₂, X_n)
        ⋮              ⋮                  ⋮
        cov(X_n, X₁)   cov(X_n, X₂)  …   var(X_n)     ]
• CX is a non-negative definite matrix in the sense that for any real vector z ≠ 0,
the quadratic form z ′CX z ≥ 0. The result can be proved as follows:
z ′C X z = z ′E (( X − µ X )( X − µ X )′)z
= E (z ′( X − µ X )( X − µ X )′z )
= E (z ′( X − µ X )) 2
≥0
The covariance matrix represents second-order relationship between each pair of the
random variables and plays an important role in applications of random variables.
• The n random variables X 1 , X 2 ,.., X n are called uncorrelated if for each
(i, j ) i = 1, 2,.., n, j = 1, 2,.., n
Cov( X i , X j ) = 0
If X 1 , X 2 ,.., X n are uncorrelated, CX will be a diagonal matrix.
Multiple Jointly Gaussian Random Variables
For any positive integer n, the random variables X₁, X₂, …, X_n are called jointly Gaussian if their joint probability density function is
f_X(x) = [1/((2π)^(n/2) √det(C_X))] e^(−(x − µ_X)′ C_X^(−1) (x − µ_X)/2)
where µ_X and C_X are the mean vector and the covariance matrix of the random vector X = [X₁, X₂, …, X_n]′.
Vector Space Interpretation of Random Variables
Consider a set V with elements called vectors and the field of real numbers ℝ. V is called a vector space if vector addition and scalar multiplication satisfying the usual axioms are defined on it. It is easy to verify that the set of all random variables defined on a probability space (S, F, P) forms a vector space with respect to addition and scalar multiplication. Similarly, the set of all n-dimensional random vectors forms a vector space. The interpretation of random variables as elements of a vector space helps in understanding many operations involving random variables.
Linear Independence
Consider N random vectors v₁, v₂, …, v_N. They are called linearly independent if Σ_{i=1}^{N} c_i v_i = 0 implies c_i = 0 for every i.
Inner Product
If v and w are real vectors in a vector space V defined over the field ℝ, the inner product <v, w> is a scalar such that, ∀ v, w, z ∈ V and r ∈ ℝ,
1. <v, w> = <w, v>
2. <v, v> = ‖v‖² ≥ 0, where ‖v‖ is the norm induced by the inner product
3. <v + w, z> = <v, z> + <w, z>
4. <rv, w> = r <v, w>
In the case of two random variables X and Y, the joint expectation EXY defines an inner product between X and Y. Thus
<X, Y> = EXY
We can easily verify that EXY satisfies the axioms of the inner product. The norm of a random variable X is given by
‖X‖² = EX²
For two n-dimensional random vectors X = [X₁ X₂ … X_n]′ and Y = [Y₁ Y₂ … Y_n]′, the inner product is
<X, Y> = EX′Y = Σ_{i=1}^{n} EX_iY_i
• The set of RVs along with the inner product defined through the joint expectation
operation and the corresponding norm defines a Hilbert Space.
Schwarz Inequality
For any two vectors v and w belonging to a Hilbert space V,
|<v, w>| ≤ ‖v‖ ‖w‖
For two random variables this gives |E(XY)| ≤ √(EX² EY²).
Orthogonality: Two random variables X and Y are called orthogonal if <X, Y> = EXY = 0. Just like the classes of independent random variables and uncorrelated random variables, the orthogonal random variables form an important class of random variables.
Remark
If X and Y are uncorrelated, then
E(X − µ_X)(Y − µ_Y) = 0
∴ (X − µ_X) is orthogonal to (Y − µ_Y).
If each of X and Y is zero-mean, then Cov(X, Y) = EXY, and in this case EXY = 0 ⇔ Cov(X, Y) = 0, i.e., orthogonality and uncorrelatedness coincide.
Minimum Mean-square-error Estimation
Suppose X is a random variable which is not observable and Y is another observable random variable which is statistically dependent on X through the joint probability density function f_{X,Y}(x, y). We pose the following problem: estimate X from the noisy observation Y.
Figure: estimation set-up — the signal X is corrupted by noise to give the observation Y, from which the estimated signal X̂ is produced.
Let X̂(Y) be the estimate of the random variable X based on the random variable Y. As shown earlier, the mean-square error is minimized by choosing, for each y,
∂/∂X̂ ∫_{−∞}^{∞} (x − X̂(y))² f_{X/Y}(x/y) dx = 0
or  2 ∫_{−∞}^{∞} (x − X̂(y)) f_{X/Y}(x/y) dx = 0
⇒ ∫_{−∞}^{∞} X̂(y) f_{X/Y}(x/y) dx = ∫_{−∞}^{∞} x f_{X/Y}(x/y) dx
⇒ X̂(y) = E(X/Y = y)
Example Consider two zero-mean jointly Gaussian random variables X and Y with the joint pdf
f_{X,Y}(x, y) = [1/(2πσ_X σ_Y √(1 − ρ²_{X,Y}))] exp{ −[ x²/σ_X² − 2ρ_{X,Y} xy/(σ_X σ_Y) + y²/σ_Y² ] / (2(1 − ρ²_{X,Y})) }
Then
f_{X/Y}(x/y) = f_{X,Y}(x, y)/f_Y(y) = [1/(√(2π)σ_X√(1 − ρ²_{X,Y}))] exp{ −[x − ρ_{X,Y}(σ_X/σ_Y) y]² / (2σ_X²(1 − ρ²_{X,Y})) }
which is Gaussian with mean ρ_{X,Y}(σ_X/σ_Y) y. Therefore, the MMSE estimator of X given Y = y is
X̂(y) = E(X/Y = y) = ρ_{X,Y}(σ_X/σ_Y) y
This example illustrates that in the case of jointly Gaussian random variables X and Y ,
the mean-square estimator of X given Y = y , is linearly related with y. This important result
gives us a clue to have simpler version of the mean-square error estimation problem discussed
below.
Figure: linear estimation set-up — the signal X is observed in noise as Y, and the estimator is constrained to the linear form X̂ = aY.
The optimum a minimizes E(X − aY)²:
d/da E(X − aY)² = 0
⇒ E[d/da (X − aY)²] = 0
⇒ E(X − aY)Y = 0
⇒ E eY = 0
where e = X − aY is the estimation error.
Thus the optimum value of a is such that the estimation error (X − aY) is orthogonal to the observed random variable Y, and the optimal estimator aY is the orthogonal projection of X on Y. This orthogonality principle forms the heart of a class of estimation problems called Wiener filtering. Geometrically, the error e = X − aY is orthogonal to Y, and aY is the orthogonal projection of X on Y, as illustrated in the figure.
E ( X − aY )Y = 0
⇒ EXY − aEY 2 = 0
EXY
⇒ a=
EY 2
The corresponding minimum linear mean-square error (LMMSE) is
LMMSE = E ( X − aY ) 2
= E ( X − aY ) X − aE ( X − aY )Y
= E ( X − aY ) X − 0
( E ( X − aY )Y = 0, using the orthogonality principle)
= EX 2 − aEXY
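A minimal sketch of the LMMSE coefficient a = E[XY]/E[Y²] estimated from samples; the toy model Y = X + N with independent zero-mean X and N is an assumption made only for this illustration:

```python
import numpy as np

# Sketch: estimate a = E[XY]/E[Y^2] from data and check the orthogonality E[e*Y] = 0.
rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=200_000)
n = rng.normal(0.0, 0.5, size=200_000)
y = x + n

a = np.mean(x * y) / np.mean(y * y)       # orthogonality principle
err = x - a * y
print("a      :", a, " theory:", 1.0 / 1.25)    # E[XY]=1, E[Y^2]=1.25
print("E[e*Y] :", np.mean(err * y))             # ≈ 0 (error orthogonal to observation)
print("LMMSE  :", np.mean(err**2))
```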
The orthogonality principle can also be applied to the optimal estimation of a random variable from more than one observation.
Convergence of a Sequence of Random Variables
Consider a sequence of iid random variables X₁, X₂, …, X_n, …. Suppose we want to estimate the mean of the random variable on the basis of the observed data by means of the relation
µ_n = (1/n) Σ_{i=1}^{n} X_i
How closely does µ_n represent the true mean µ_X as n is increased? How do we measure the closeness between µ_n and µ_X?
• The Cauchy criterion gives the condition for convergence of a sequence without actually finding the limit: the sequence x₁, x₂, …, x_n, … converges if and only if, for every ε > 0, there exists a positive integer N such that |x_n − x_m| < ε for all n, m > N.
For a sequence of random variables, convergence is to be defined using different criteria. Five of these criteria are explained below.
Convergence Everywhere
A sequence of random variables {X_n} is said to converge everywhere to X if X_n(s) → X(s) as n → ∞ for every s ∈ S. Note that here the sequence of numbers corresponding to each sample point is convergent.
Convergence Almost Surely (with probability 1)
The sequence X₁, X₂, …, X_n, … is said to converge to X almost surely or with probability 1 if
P{s | X_n(s) → X(s) as n → ∞} = 1,
or equivalently, for every ε > 0,
lim_{N→∞} P{s : |X_n(s) − X(s)| < ε for all n ≥ N} = 1
We write X_n →a.s. X in this case.
The strong law of large numbers (SLLN): If X₁, X₂, … are iid random variables with finite mean µ_X, then (1/n) Σ_{i=1}^{n} X_i → µ_X with probability 1 as n → ∞.
Remark:
• µ_n = (1/n) Σ_{i=1}^{n} X_i is called the sample mean.
• The strong law of large numbers states that the sample mean converges to the true mean as the sample size increases.
• The SLLN is one of the fundamental theorems of probability. There is a weaker version of the law that we will discuss later.
Convergence in mean square sense
A random sequence X 1 , X 2 ,.... X n .... is said to converge in the mean-square sense (m.s) to
a random variable X if
E ( X n − X )2 → 0 as n→∞
• The following Cauchy criterion gives the condition for m.s. convergence of a random sequence without actually finding the limit: the sequence X₁, X₂, …, X_n, … converges in m.s. if and only if, for every ε > 0, there exists a positive integer N such that E(X_n − X_m)² < ε for all n, m > N.
Example :
If X₁, X₂, …, X_n, … are iid random variables with mean µ_X and finite variance σ_X², then (1/n) Σ_{i=1}^{n} X_i → µ_X in the mean square sense as n → ∞.
We have to show that lim_{n→∞} E[(1/n) Σ_{i=1}^{n} X_i − µ_X]² = 0.
Now,
E[(1/n) Σ_{i=1}^{n} X_i − µ_X]² = E[(1/n) Σ_{i=1}^{n} (X_i − µ_X)]²
  = (1/n²) Σ_{i=1}^{n} E(X_i − µ_X)² + (1/n²) Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} E(X_i − µ_X)(X_j − µ_X)
  = nσ_X²/n² + 0      (because of independence)
  = σ_X²/n
∴ lim_{n→∞} E[(1/n) Σ_{i=1}^{n} X_i − µ_X]² = 0
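A minimal simulation of this result (the distribution and sigma are illustrative assumptions):

```python
import numpy as np

# Sketch: the mean-square error of the sample mean falls like sigma^2/n.
rng = np.random.default_rng(8)
sigma = 2.0
for n in (10, 100, 1000, 10000):
    means = rng.normal(0.0, sigma, size=(5000, n)).mean(axis=1)
    print(f"n = {n:5d}:  E(mean - mu)^2 ≈ {np.mean(means**2):.5f},"
          f" theory {sigma**2 / n:.5f}")
```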
Convergence in probability
Associated with the sequence of random variables X₁, X₂, …, X_n, …, we can consider the sequence of probabilities P{|X_n − X| > ε}. The sequence {X_n} is said to converge to X in probability if, for every ε > 0,
P{|X_n − X| > ε} → 0 as n → ∞.
Example:
Suppose {X_n} is a sequence of random variables with
P{X_n = 1} = 1 − 1/n  and  P{X_n = −1} = 1/n
Clearly, for every ε > 0,
P{|X_n − 1| > ε} = P{X_n = −1} = 1/n → 0 as n → ∞.
Therefore {X_n} →P X, where X is the degenerate random variable X = 1.
Convergence in distribution
Consider the random sequence X₁, X₂, …, X_n, … and a random variable X. Suppose
F_{X_n}(x) → F_X(x) as n → ∞
for all x at which F_X(x) is continuous; here the two distribution functions eventually coincide. We write X_n →d X to denote convergence in distribution of the random sequence {X_n} to X.
Example: Suppose X₁, X₂, …, X_n, … is a sequence of iid random variables, each uniformly distributed over (0, a):
f_{X_i}(x) = 1/a for 0 ≤ x ≤ a, and 0 otherwise.
Define Z_n = max(X₁, X₂, …, X_n). We can show that
F_{Z_n}(z) = 0 for z < 0,  = (z/a)ⁿ for 0 ≤ z < a,  = 1 for z ≥ a
Clearly,
lim_{n→∞} F_{Z_n}(z) = F_Z(z) = 0 for z < a,  = 1 for z ≥ a
∴ {Z_n} converges in distribution to the degenerate random variable Z = a.
The relations among the modes of convergence are as follows: convergence almost surely (X_n →a.s. X) implies convergence in probability (X_n →P X), which in turn implies convergence in distribution (X_n →d X); convergence in the mean-square sense (X_n →m.s. X) also implies convergence in probability.
Central Limit Theorem
Consider n independent random variables X₁, X₂, …, X_n. The mean and variance of each X_i are µ_i and σ²_{X_i} respectively. Let Y_n = Σ_{i=1}^{n} X_i. Then EY_n = Σ_{i=1}^{n} µ_i and
var(Y_n) = σ²_{Y_n} = E{Σ_{i=1}^{n} (X_i − µ_i)}²
         = Σ_{i=1}^{n} E(X_i − µ_i)² + Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} E(X_i − µ_i)(X_j − µ_j)
         = σ²_{X₁} + σ²_{X₂} + … + σ²_{X_n}      (∵ X_i and X_j are independent for i ≠ j)
Thus we can determine the mean and variance of Y_n. Can we guess the probability distribution of Y_n?
The central limit theorem (CLT) provides an answer to this question. The CLT states that, under very general conditions, the normalized sum (Y_n − EY_n)/σ_{Y_n} converges in distribution to a standard Gaussian random variable as n → ∞. One common set of sufficient conditions is:
1. The random variables X₁, X₂, …, X_n are independent with the same mean and the same finite variance, although not necessarily identically distributed.
Remarks
• The central limit theorem is really a property of convolution. Consider the sum of two statistically independent random variables, say Y = X₁ + X₂. Then the pdf f_Y(y) is the convolution of f_{X₁}(y) and f_{X₂}(y). This can be shown with the help of the characteristic functions as follows:
φ_Y(ω) = E[e^(jω(X₁ + X₂))] = E(e^(jωX₁)) E(e^(jωX₂)) = φ_{X₁}(ω) φ_{X₂}(ω)
∴ f_Y(y) = f_{X₁}(y) * f_{X₂}(y) = ∫_{−∞}^{∞} f_{X₁}(τ) f_{X₂}(y − τ) dτ
Consider now iid zero-mean random variables X₁, X₂, …, X_n, each with variance σ_X², and let
Y_n = (X₁ + X₂ + … + X_n)/√n
Clearly, µ_{Y_n} = 0 and σ²_{Y_n} = σ_X². The characteristic function of Y_n is
φ_{Y_n}(ω) = E(e^(jωY_n)) = 1 + jωµ_{Y_n} + [(jω)²/2!] E(Y_n²) + [(jω)³/3!] E(Y_n³) + …
Substituting µ_{Y_n} = 0 and E(Y_n²) = σ²_{Y_n} = σ_X², we get
φ_{Y_n}(ω) = 1 − (ω²/2!) σ_X² + R(ω, n)
where R(ω, n) contains the terms involving ω³ and higher powers of ω. Carrying out the expansion for each X_i separately,
φ_{Y_n}(ω) = [φ_X(ω/√n)]ⁿ = [1 − ω²σ_X²/2n + o(1/n)]ⁿ → e^(−σ_X²ω²/2) as n → ∞,
which is the characteristic function of a Gaussian random variable with 0 mean and variance σ_X².
∴ Y_n →d N(0, σ_X²)
Remark:
(2) The CLT is a statement about the convergence of the distribution function. The theorem does not say that the pdf f_{Y_n}(y) converges to a Gaussian pdf in the limit. For example, suppose each X_i has a Bernoulli distribution. Then the pdf of Y_n consists of impulses and can never approach a Gaussian pdf.
(3) The Cauchy distribution does not meet the conditions for the central limit theorem to hold. As we have noted earlier, this distribution does not have a finite mean or variance. Suppose each random variable X_i has the Cauchy distribution
f_{X_i}(x) = 1/(π(1 + x²)),   −∞ < x < ∞.
The characteristic function of X_i is given by
φ_{X_i}(ω) = e^(−|ω|)
The sample mean µ̂_X = (1/n) Σ_{i=1}^{n} X_i then has the characteristic function
φ_{µ̂_X}(ω) = [φ_X(ω/n)]ⁿ = e^(−|ω|)
i.e., the sample mean is again Cauchy distributed. Thus the sum of a large number of Cauchy random variables will not follow a Gaussian distribution.
(4) The central limit theorem is one of the most widely used results of probability. If a random variable is the result of several independent causes, then it can be modelled as Gaussian. For example,
- the thermal noise in a resistor is the result of the independent motion of billions of electrons and is modelled as Gaussian;
- the observation/measurement error of any process is modelled as Gaussian.
(5) The CLT can be used to simulate a Gaussian distribution given a routine to simulate a particular random variable.
Suppose X₁, X₂, X₃, …, X_n, … is a sequence of Bernoulli(p) random variables with P{X_i = 1} = p and P{X_i = 0} = 1 − p. Then Y_n = Σ_{i=1}^{n} X_i has a binomial distribution with µ_{Y_n} = np and σ²_{Y_n} = np(1 − p). Thus, by the CLT,
(Y_n − np)/√(np(1 − p)) →d N(0, 1)
or Y_n →d N(np, np(1 − p)).
∴ P(k − 1 < Y_n ≤ k) ≈ ∫_{k−1}^{k} [1/√(2πnp(1 − p))] e^(−(y − np)²/2np(1−p)) dy
∴ P(Y_n = k) ≈ [1/√(2πnp(1 − p))] e^(−(k − np)²/2np(1−p))      (approximating the integral over an interval of length 1 by the value of the integrand)
This normal approximation to the binomial probabilities is known as the De Moivre–Laplace approximation.
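A minimal numerical comparison of the exact binomial probability with this approximation (n, p, k are illustrative choices):

```python
import math

# Sketch: compare P(Y_n = k) for a Binomial(n, p) with the De Moivre-Laplace
# normal approximation.
n, p = 100, 0.5
for k in (45, 50, 55):
    exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    var = n * p * (1 - p)
    approx = math.exp(-(k - n * p) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    print(f"k = {k}: exact {exact:.5f}, normal approximation {approx:.5f}")
```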
RANDOM PROCESSES
In practical problems we deal with time varying waveforms whose value at a time is
random in nature. For example, the speech waveform, the signal received by
communication receiver or the daily record of stock-market data represents random
variables that change with time. How do we characterize such data? Such data are
characterized as random or stochastic processes. This lecture covers the fundamentals of
random processes..
Random processes
Recall that a random variable maps each sample point in the sample space to a point in
the real line. A random process maps each sample point to a waveform.
Consider a probability space {S , F, P}. A random process can be defined on {S , F, P} as
an indexed family of random variables { X ( s, t ), s ∈ S,t ∈ Γ} where Γ is an index set which
may be discrete or continuous usually denoting time. Thus a random process is a function
of the sample point ξ and index variable t and may be written as X (t , ξ ).
Remark
• For a fixed t (= t 0 ), X (t 0 , ξ ) is a random variable.
Figure (Random Process): each sample point s₁, s₂, s₃, … of the sample space is mapped to a waveform X(t, s); X(t, s₁), X(t, s₂), … are the sample functions of the process.
Example Consider a random experiment with two equally likely outcomes, and define X₁(t) = cos ωt for one outcome and X₂(t) = −cos ωt for the other. At a particular time t₀, X(t₀) is a random variable with the two values cos ωt₀ and −cos ωt₀.
If the index set Γ is a countable set, { X (t ), t ∈ Γ} is called a discrete-time process.
Such a random process can be represented as { X [n], n ∈ Z } and called a random sequence.
The value of a random process X(t) at any time t can be described by its probabilistic model.
The state is the value taken by X(t) at a time t, and the set of all such states is called the state space. A random process is discrete-state if the state space is finite or countable; this also means that the corresponding sample space is finite or countable. Otherwise the random process is called continuous-state.
Example Consider the random sequence { X n , n ≥ 0} generated by repeated tossing of a
fair coin where we assign 1 to Head and 0 to Tail.
Clearly X n can take only two values- 0 and 1. Hence { X n , n ≥ 0} is a discrete-time two-
state process.
How to describe a random process?
As we have observed above that X (t ) at a specific time t is a random variable and can be
described by its probability distribution function FX (t ) ( x) = P ( X (t ) ≤ x). This distribution
t1 and t2 is defined by
The autocorrelation function and the autocovariance functions are widely used to
characterize a class of random process called the wide-sense stationary process.
Example
(a) Gaussian Random Process: A random process {X(t)} is called a Gaussian process if, for any n and any t₁, t₂, …, t_n, the random vector X = [X(t₁), X(t₂), …, X(t_n)]′ is jointly Gaussian with the joint density function
f_{X(t₁),X(t₂),…,X(t_n)}(x₁, x₂, …, x_n) = [1/((2π)^(n/2) √det(C_X))] e^(−(x − µ_X)′ C_X^(−1) (x − µ_X)/2)
where C_X = E(X − µ_X)(X − µ_X)′ and µ_X = E(X) = [EX(t₁), EX(t₂), …, EX(t_n)]′.
The Gaussian random process is completely specified by the mean vector and the autocovariance matrix, and hence by the mean and the autocorrelation matrix R_X = EXX′.
(b) Bernoulli Random Process
A Bernoulli process is a discrete-time random process consisting of a sequence of
independent and identically distributed Bernoulli random variables. Thus the discrete –
time random process { X n , n ≥ 0} is Bernoulli process if
P{ X n = 1} = p and
P{ X n = 0} = 1 − p
Example
Consider the random sequence { X n , n ≥ 0} generated by repeated tossing of a fair coin
1
p X (1) = P{ X n = 1} = and
2
1
p X (0) = P{ X n = 0} =
2
(c) A sinusoid with a random phase
X (t ) = A cos( w0 t + φ ) where A and w0 are constants and φ is uniformly distributed
between 0 and 2π . Thus
1
f Φ (φ ) =
2π
X(t) at a particular t is a random variable, and it can be shown that
f_{X(t)}(x) = 1/(π√(A² − x²)) for |x| < A, and 0 otherwise.
The pdf is sketched in the figure below.
The mean and autocorrelation of X(t):
µ_X(t) = EX(t) = E A cos(ω₀t + Φ) = ∫_0^{2π} A cos(ω₀t + φ) (1/2π) dφ = 0
R_X(t₁, t₂) = E[A cos(ω₀t₁ + Φ) A cos(ω₀t₂ + Φ)]
           = A² E[cos(ω₀t₁ + Φ) cos(ω₀t₂ + Φ)]
           = (A²/2) E[cos(ω₀(t₁ − t₂)) + cos(ω₀(t₁ + t₂) + 2Φ)]
           = (A²/2) cos(ω₀(t₁ − t₂)) + (A²/2) ∫_0^{2π} cos(ω₀(t₁ + t₂) + 2φ) (1/2π) dφ
           = (A²/2) cos(ω₀(t₁ − t₂))
In practical situations we deal with two or more random processes. We often deal with
the input and output processes of a system. To describe two or more random processes
we have to use the joint distribution functions and the joint moments.
Consider two random processes {X(t), t ∈ Γ} and {Y(t), t ∈ Γ}. For any positive integers n and m, X(t₁), X(t₂), …, X(t_n), Y(t₁′), Y(t₂′), …, Y(t_m′) represent n + m jointly distributed random variables. Thus these two random processes can be described by the (n + m)th-order joint distribution function
F_{X(t₁),…,X(t_n),Y(t₁′),…,Y(t_m′)}(x₁, …, x_n, y₁, …, y_m)
and the corresponding joint density function
f_{X(t₁),…,X(t_n),Y(t₁′),…,Y(t_m′)}(x₁, …, x_n, y₁, …, y_m) = ∂^(n+m) F_{X(t₁),…,X(t_n),Y(t₁′),…,Y(t_m′)}(x₁, …, x_n, y₁, …, y_m) / ∂x₁…∂x_n ∂y₁…∂y_m
On the basis of the above definitions, we can study the degree of dependence between
two random processes
Having characterized the random process by the joint distribution ( density) functions and
joint moments we define the following two important classes of random processes.
For example, an independent and identically distributed (iid) discrete-state process satisfies
p_{X_{n₁},X_{n₂},…,X_{n_k}}(x₁, x₂, …, x_k) = p_X(x₁) p_X(x₂) … p_X(x_k)
For the Bernoulli process, EX_n = p, var(X_n) = p(1 − p), and for n₁ ≠ n₂,
R_X(n₁, n₂) = EX_{n₁}X_{n₂} = EX_{n₁} EX_{n₂} = p²
Random Walk process: Let Z₁, Z₂, … be iid random variables with P{Z_i = 1} = p and P{Z_i = −1} = q = 1 − p, and define X_n = Σ_{i=1}^{n} Z_i. Then EX_n = n(2p − 1) and
var(X_n) = Σ_{i=1}^{n} var(Z_i)      (∵ the Z_i are independent random variables)
        = 4npq
Remark If the increment Z_n of the random walk process takes the values s and −s, then
EX_n = Σ_{i=1}^{n} EZ_i = n(2p − 1)s
and
var(X_n) = Σ_{i=1}^{n} var(Z_i) = 4npqs²
Wiener Process
Consider the symmetric random walk (p = q = 1/2) in which a step of size s is taken every ∆ seconds, so that at time t = n∆ the position is X_n. Clearly,
EX_n = 0
var(X_n) = 4pqns² = 4 × (1/2) × (1/2) × ns² = ns²
For large n, the distribution of X_n approaches the normal distribution with mean 0 and variance
ns² = (t/∆)s² = αt,      where α = s²/∆.
As ∆ → 0 and n → ∞, X_n becomes the continuous-time process X(t) with the pdf
f_{X(t)}(x) = (1/√(2παt)) e^(−x²/2αt)
This process {X(t)} is called the Wiener process.
(1) X ( 0 ) = 0
(2) X ( t ) is an independent increment process.
(3) For each s ≥ 0, t ≥ 0, X(s + t) − X(s) has the normal distribution with mean 0 and variance αt:
f_{X(s+t)−X(s)}(x) = (1/√(2παt)) e^(−x²/2αt)
• The Wiener process was used to model Brownian motion: microscopic particles suspended in a fluid are subject to continuous molecular impacts, resulting in the zigzag motion of the particles, named Brownian motion after the botanist Robert Brown.
• The Wiener process is the integral of the white noise process.
Similarly, if t_1 > t_2,
R_X(t_1, t_2) = αt_2
∴ R_X(t_1, t_2) = α min(t_1, t_2)
The first-order pdf of the process is
f_{X(t)}(x) = (1/√(2παt)) e^{−x²/(2αt)}
Remark
C_X(t_1, t_2) = α min(t_1, t_2)
X(t) is a Gaussian process.
Poisson Process
The counting process {N(t), t ≥ 0} is called a Poisson process with the rate parameter λ if
(i) N(0) = 0
(ii) N(t) is an independent increment process.
Thus the increments N(t_2) − N(t_1) and N(t_4) − N(t_3), etc. are independent.
These assumptions are valid for many counting applications.
Next, to find P(N(t) = 1):
d/dt P({N(t) = 1}) = −λ P({N(t) = 1}) + λ P({N(t) = 0})
                  = −λ P({N(t) = 1}) + λ e^{−λt}
Solving this equation and proceeding by induction, we obtain
P({N(t) = n}) = (λt)^n e^{−λt} / n!
Remark
(1) The parameter λ is called the rate or intensity of the Poisson process.
It can be shown that
P({N(t_2) − N(t_1) = n}) = (λ(t_2 − t_1))^n e^{−λ(t_2 − t_1)} / n!
Thus the probability of the increments depends on the length of the interval t_2 − t_1 and not on the absolute times t_2 and t_1. Thus the Poisson process is a process with stationary increments.
(2) The independent and stationary increment properties help us to compute the joint
probability mass function of N (t ). For example,
P ({ N (t1 ) = n1 , N (t2 ) = n2 } ) = P ({ N (t1 ) = n1}) P ({N (t2 ) = n2 } /{N (t1 ) = n1})
= P ({ N (t1 ) = n1}) P({N (t2 ) − N (t1 ) = n2 − n1} )
= [(λt_1)^{n_1} e^{−λt_1} / n_1!] × [(λ(t_2 − t_1))^{n_2 − n_1} e^{−λ(t_2 − t_1)} / (n_2 − n_1)!]
We observe that at any time t > 0, N (t ) is a Poisson random variable with the parameter
λt. Therefore,
EN (t ) = λ t
and var N (t ) = λ t
Thus both the mean and the variance of a Poisson process vary linearly with time.
As N (t ) is a random process with independent increment, we can readily show that
CN (t1 , t2 ) = var( N (min(t1 , t2 )))
= λ min(t1 , t2 )
∴ RN (t1 , t2 ) = CN (t1 , t2 ) + EN (t1 ) EN (t2 )
= λ min(t1 , t2 ) + λ 2t1t2
Example: A petrol pump serves on the average 30 cars per hour. Find the probability that during a period of 5 minutes (i) no car comes to the station, (ii) exactly 3 cars come to the station and (iii) more than 3 cars come to the station.
Here λ = 30 cars/hour = 1/2 car per minute, so λt = (1/2) × 5 = 2.5.
(i) P{N(5) = 0} = e^{−2.5} = 0.0821
(ii) P{N(5) = 3} = ((2.5)³ / 3!) e^{−2.5} = 0.2138
(iii) P{N(5) > 3} = 1 − [P{N(5) = 0} + P{N(5) = 1} + P{N(5) = 2} + P{N(5) = 3}] = 1 − 0.7576 = 0.2424
Binomial model:
P = probability of a car arriving in 1 minute = 1/2, n = 5
∴ P(X = 0) = (1 − P)⁵ = 0.5⁵
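The Poisson probabilities in this example can be checked directly; the following short sketch (added here for convenience, not part of the original notes) uses scipy.

from scipy.stats import poisson

rate_per_min = 30 / 60          # 30 cars/hour = 0.5 cars/minute
lam = rate_per_min * 5          # mean count in a 5-minute window

print("P(N=0) =", poisson.pmf(0, lam))   # about 0.0821
print("P(N=3) =", poisson.pmf(3, lam))   # about 0.2138
print("P(N>3) =", poisson.sf(3, lam))    # 1 - P(N<=3), about 0.2424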
(Fig.: event occurrence times t_1, t_2, ..., t_{n−1}, t_n on the time axis and the inter-arrival times T_1, T_2, ..., T_{n−1}, T_n between them.)
Let T_n = the time elapsed between the (n−1)st event and the nth event. The random process {T_n, n = 1, 2, ...} represents the sequence of inter-arrival times of the Poisson process.
T_1 = the time elapsed before the first event takes place. Clearly T_1 is a continuous random variable.
Similarly,
P({T_n > t}) = P({0 events occur in the interval (t_{n−1}, t_{n−1} + t] / the (n−1)th event occurs at time t_{n−1}})
            = P({0 events occur in the interval (t_{n−1}, t_{n−1} + t]})
            = e^{−λt}
∴ f_{T_n}(t) = λ e^{−λt}, t ≥ 0
Thus the inter-arrival times of a Poisson process with the parameter λ are exponentially distributed, with
f_{T_n}(t) = λ e^{−λt}, t ≥ 0, for every n ≥ 1.
Remark
• We have seen that the inter-arrival times are identically distributed with the exponential pdf. Further, it can be shown that the inter-arrival times are also independent.
• It is interesting to note that the converse of the above result is also true. If the inter-arrival times between events of a discrete-state process {N(t), t ≥ 0} are exponentially distributed with mean 1/λ, then {N(t), t ≥ 0} is a Poisson process with the parameter λ.
• The exponential distribution of the inter-arrival process indicates that the arrival
process has no memory. Thus
P ({Tn > t0 + t1 / Tn > t1}) = P ({Tn > t0 }) ∀t0 , t1
Another important quantity is the waiting time Wn . This is the time that elapses before the
nth event occurs. Thus
W_n = Σ_{i=1}^{n} T_i
How to find the first-order pdf of W_n is left as an exercise. Note that W_n is the sum of n independent and identically distributed random variables.
∴ E W_n = Σ_{i=1}^{n} E T_i = n/λ
and
var(W_n) = var(Σ_{i=1}^{n} T_i) = Σ_{i=1}^{n} var(T_i) = n/λ²
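Since W_n is a sum of i.i.d. exponential random variables, its mean and variance can be checked by simulation. The sketch below is illustrative; λ, n and the number of trials are assumed values.

import numpy as np

rng = np.random.default_rng(2)
lam, n, trials = 2.0, 10, 100_000       # rate, event index, Monte Carlo size

# W_n is the sum of n i.i.d. Exp(lam) inter-arrival times
T = rng.exponential(1.0 / lam, size=(trials, n))
W = T.sum(axis=1)

print("E[W_n]  :", W.mean(), " theory:", n / lam)
print("var(W_n):", W.var(),  " theory:", n / lam**2)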
Example
The number of customers arriving at a service station is a Poisson process with a rate of 10 customers per minute.
(a) What is the mean arrival time of the customers?
(b) What is the probability that the second customer will arrive 5 minutes after the
first customer has arrived?
(c) What is the average waiting time before the 10th customer arrives?
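A minimal computational sketch for this example is given below. It assumes part (b) asks for the probability that the gap between the first and second customers exceeds 5 minutes; this interpretation, the variable names and the unit conventions are mine, not the notes'.

import math

lam = 10.0                       # customers per minute

mean_interarrival = 1.0 / lam    # (a) mean inter-arrival time, in minutes
p_gap_gt_5 = math.exp(-lam * 5)  # (b) P(T_2 > 5) for an Exp(lam) inter-arrival time
mean_wait_10th = 10.0 / lam      # (c) E[W_10], mean waiting time before the 10th arrival

print(mean_interarrival, p_gap_gt_5, mean_wait_10th)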
We can also find the conditional and joint probability mass functions. For example, for
t1 < t2 ,
p_{X(t_1), X(t_2)}(1, 1) = P({X(t_1) = 1}) P({X(t_2) = 1}/{X(t_1) = 1})
= e^{−λt_1} cosh λt_1 · P({N(t_2) is even}/{N(t_1) is even})
= e^{−λt_1} cosh λt_1 · P({N(t_2) − N(t_1) is even}/{N(t_1) is even})
= e^{−λt_1} cosh λt_1 · P({N(t_2) − N(t_1) is even})
= e^{−λt_1} cosh λt_1 · e^{−λ(t_2 − t_1)} cosh λ(t_2 − t_1)
= e^{−λt_2} cosh λt_1 cosh λ(t_2 − t_1)
Similarly
p X ( t1 ), X ( t2 ) (1, −1) = e − λt2 cosh λ t1 sinh λ (t2 − t1 ),
p X ( t1 ), X ( t2 ) (−1,1) = e − λt2 sinh λt1 sinh λ (t2 − t1 )
p X ( t1 ), X ( t2 ) (−1, −1) = e − λt2 sinh λ t1 cosh λ (t2 − t1 )
EX (t ) = 1× e − λt cosh λt − 1× e− λt sinh λ t
= e − λt (cosh λt − sinh λt )
= e −2 λ t
EX 2 (t ) = 1× e − λt cosh λ t + 1× e− λt sinh λ t
= e − λt (cosh λt + sinh λt )
= e − λt eλt
=1
∴ var( X (t )) = 1 − e −4 λt
For t1 < t2
R_X(t_1, t_2) = E X(t_1) X(t_2)
= 1×1×p_{X(t_1),X(t_2)}(1,1) + 1×(−1)×p_{X(t_1),X(t_2)}(1,−1) + (−1)×1×p_{X(t_1),X(t_2)}(−1,1) + (−1)×(−1)×p_{X(t_1),X(t_2)}(−1,−1)
= e^{−λt_2} cosh λt_1 cosh λ(t_2 − t_1) − e^{−λt_2} cosh λt_1 sinh λ(t_2 − t_1) − e^{−λt_2} sinh λt_1 sinh λ(t_2 − t_1) + e^{−λt_2} sinh λt_1 cosh λ(t_2 − t_1)
= e^{−λt_2} cosh λt_1 (cosh λ(t_2 − t_1) − sinh λ(t_2 − t_1)) + e^{−λt_2} sinh λt_1 (cosh λ(t_2 − t_1) − sinh λ(t_2 − t_1))
= e^{−λt_2} e^{−λ(t_2 − t_1)} (cosh λt_1 + sinh λt_1)
= e^{−λt_2} e^{−λ(t_2 − t_1)} e^{λt_1}
= e^{−2λ(t_2 − t_1)}
Similarly, for t_1 > t_2,
R_X(t_1, t_2) = e^{−2λ(t_1 − t_2)}
∴ R_X(t_1, t_2) = e^{−2λ|t_1 − t_2|}
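This autocorrelation can be checked by simulating the semirandom telegraph signal X(t) = +1 when N(t) is even and −1 when N(t) is odd. The sketch below is illustrative; λ, t_1, t_2 and the number of trials are assumed values.

import numpy as np

rng = np.random.default_rng(3)
lam, t1, t2, trials = 1.0, 0.7, 1.3, 200_000

n1 = rng.poisson(lam * t1, size=trials)               # N(t1)
n2 = n1 + rng.poisson(lam * (t2 - t1), size=trials)   # N(t2), via an independent increment
x1 = np.where(n1 % 2 == 0, 1.0, -1.0)                 # X(t1)
x2 = np.where(n2 % 2 == 0, 1.0, -1.0)                 # X(t2)

print("estimated R_X(t1,t2):", np.mean(x1 * x2))
print("theoretical value   :", np.exp(-2 * lam * abs(t1 - t2)))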
Random Telegraph signal
Consider a two-state random process {Y (t )} with the states Y (t ) = 1 and Y (t ) = −1.
Suppose P({Y(0) = 1}) = 1/2 and P({Y(0) = −1}) = 1/2, and Y(t) changes polarity with equal probability at each occurrence of an event in a Poisson process of parameter λ. Such a random process {Y(t)} is called a random telegraph signal and can be expressed as
Y(t) = A X(t)
where {X(t)} is the semirandom telegraph signal and A is a random variable independent of X(t) with P({A = 1}) = 1/2 and P({A = −1}) = 1/2.
Clearly,
EA = (−1) × 1/2 + 1 × 1/2 = 0
and
EA² = (−1)² × 1/2 + 1² × 1/2 = 1
Therefore,
E Y(t) = E A X(t) = EA · E X(t) = 0   (∵ A and X(t) are independent)
R_Y(t_1, t_2) = E[A X(t_1) A X(t_2)] = EA² · E X(t_1) X(t_2) = e^{−2λ|t_1 − t_2|}
Stationary Random Process
A random process {X(t)} is called strict-sense stationary (SSS) if, for every n, every choice of t_1, ..., t_n and every time shift t_0,
F_{X(t_1), X(t_2), ..., X(t_n)}(x_1, x_2, ..., x_n) = F_{X(t_1+t_0), X(t_2+t_0), ..., X(t_n+t_0)}(x_1, x_2, ..., x_n)
Thus the joint distribution functions of any set of random variables X(t_1), X(t_2), ..., X(t_n) do not depend on the placement of the origin of the time axis. This requirement is very strict; less strict forms of stationarity may be defined.
Particularly, if
F_{X(t_1), X(t_2), ..., X(t_n)}(x_1, ..., x_n) = F_{X(t_1+t_0), X(t_2+t_0), ..., X(t_n+t_0)}(x_1, ..., x_n) for n = 1, 2, ..., k,
then {X(t)} is called stationary up to order k.
• If {X(t)} is stationary up to order 1,
F_{X(t_1)}(x_1) = F_{X(t_1+t_0)}(x_1) ∀t_0 ∈ T
As a consequence,
E X(t_1) = E X(0) = µ_X(0) = constant
• If {X(t)} is stationary up to order 2,
F_{X(t_1), X(t_2)}(x_1, x_2) = F_{X(t_1+t_0), X(t_2+t_0)}(x_1, x_2)
Put t_0 = −t_2. Then
R_X(t_1, t_2) = E X(t_1) X(t_2) = E X(t_1 − t_2) X(0) = R_X(t_1 − t_2)
Similarly,
C_X(t_1, t_2) = C_X(t_1 − t_2)
Therefore, the autocorrelation function of a SSS process depends only on the time lag
t1 − t2 .
We can also define the joint stationarity of two random processes. Two processes
{ X (t )} and {Y (t )} are called jointly strict-sense stationary if their joint probability
distributions of any order is invariant under the translation of time. A complex process
{Z (t ) = X (t ) + jY (t )} is called SSS if { X (t )} and {Y (t )} are jointly SSS.
Remark
(1) For a WSS process {X(t)}, E X(t) = µ_X is constant and R_X(t_1, t_2) = R_X(t_2 − t_1), so that
E X²(t) = R_X(0) = constant
var(X(t)) = E X²(t) − (E X(t))² = constant
C_X(t_1, t_2) = E X(t_1) X(t_2) − E X(t_1) E X(t_2) = R_X(t_2 − t_1) − µ_X²
∴ C_X(t_1, t_2) is a function of the lag (t_2 − t_1) only.
(2) An SSS process is always WSS, but the converse is not always true.
Example: Sinusoid with random phase
Consider the random process {X(t)} given by X(t) = A cos(w_0 t + φ), where A and w_0 are constants and φ ~ U[0, 2π].
Note that
f_Φ(φ) = 1/2π for 0 ≤ φ ≤ 2π, and 0 otherwise
f_{X(t)}(x) = 1/(π √(A² − x²)) for −A ≤ x ≤ A, and 0 otherwise
which is independent of t. Hence {X(t)} is first-order stationary.
Note that
E X(t) = E A cos(w_0 t + φ)
       = ∫_0^{2π} A cos(w_0 t + φ) (1/2π) dφ
       = 0, which is a constant,
and
R_X(t_1, t_2) = E X(t_1) X(t_2)
             = E[A cos(w_0 t_1 + φ) A cos(w_0 t_2 + φ)]
             = (A²/2) E[cos(w_0 t_1 + φ + w_0 t_2 + φ) + cos(w_0 t_1 + φ − w_0 t_2 − φ)]
             = (A²/2) E[cos(w_0 (t_1 + t_2) + 2φ) + cos(w_0 (t_1 − t_2))]
             = (A²/2) cos(w_0 (t_1 − t_2)), which is a function of the lag t_1 − t_2.
Hence {X(t)} is wide-sense stationary.
Example: Sinusoid with random amplitude
Consider the random process {X(t)} given by X(t) = A cos(w_0 t + φ), where A is a random variable. Then
E X(t) = E A · cos(w_0 t + φ)
A realization of the random binary wave is shown in the Fig. above. Such waveforms are used in binary communication: a pulse of amplitude 1 is used to transmit '1' and a pulse of amplitude −1 is used to transmit '0'.
X(t) = Σ_{n=−∞}^{∞} A_n rect((t − nT − D)/T)
For any t,
E X(t) = 1 × (1/2) + (−1) × (1/2) = 0
E X²(t) = 1² × (1/2) + (−1)² × (1/2) = 1
To find the autocorrelation function R_X(t_1, t_2), consider the case 0 < t_1 < t_2 = t_1 + τ < T with 0 ≤ τ ≤ T. Depending on the delay D, the points t_1 and t_2 may lie in one or two pulse intervals.
Case 1: t_1 and t_2 lie in the same pulse interval (0 < D < t_1 or t_2 < D < T). Then X(t_1) = X(t_2) = A_n for some n, so
E[X(t_1) X(t_2) | same interval] = E A_n² = 1
Case 2: t_1 and t_2 lie in two adjacent pulse intervals (t_1 < D < t_2). Then X(t_1) and X(t_2) are independent amplitudes, so
E[X(t_1) X(t_2) | different intervals] = E A_n E A_{n+1} = 0
Since D is uniformly distributed over [0, T],
P(same interval) = 1 − (t_2 − t_1)/T for t_2 − t_1 ≤ T
so that R_X(t_1, t_2) = 1 − (t_2 − t_1)/T, t_2 − t_1 ≤ T.
Thus the autocorrelation function of the random binary waveform depends only on the lag τ = t_2 − t_1:
R_X(τ) = 1 − |τ|/T, |τ| ≤ T, and 0 otherwise.
(Fig.: triangular autocorrelation R_X(τ) with R_X(0) = 1 and R_X(±T) = 0.)
Example Gaussian Random Process
Consider the Gaussian process {X(t)} discussed earlier. For any positive integer n, X(t_1), X(t_2), ..., X(t_n) are jointly Gaussian with the joint density function
f_{X(t_1), X(t_2), ..., X(t_n)}(x_1, x_2, ..., x_n) = (1 / ((2π)^{n/2} √(det C_X))) e^{−(1/2)(x − µ_X)' C_X^{−1} (x − µ_X)},   x = [x_1, x_2, ..., x_n]'
where C X = E ( X − µ X )( X − µ X ) '
and µ X = E ( X) = [ E ( X 1 ), E ( X 2 )......E ( X n ) ] '.
If {X(t)} is WSS, then
µ_X = [E X(t_1), E X(t_2), ..., E X(t_n)]' = [µ_X, µ_X, ..., µ_X]'
and
C_X = E[(X − µ_X)(X − µ_X)']
    = [ C_X(0)         C_X(t_2 − t_1) ...  C_X(t_n − t_1)
        C_X(t_2 − t_1)  C_X(0)        ...  C_X(t_2 − t_n)
        ...
        C_X(t_n − t_1)  C_X(t_n − t_2) ... C_X(0) ]
We see that f_{X(t_1), X(t_2), ..., X(t_n)}(x_1, x_2, ..., x_n) depends only on the time lags. Thus, for a Gaussian random process, WSS implies strict-sense stationarity, because this process is completely described by the mean and the autocorrelation functions.
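Because a WSS Gaussian process is fully described by µ_X and C_X(τ), its finite-dimensional samples can be generated from a Toeplitz covariance matrix. The sketch below is illustrative only: the autocovariance C_X(τ) = exp(−|τ|), the time grid and the seed are assumed choices.

import numpy as np
from scipy.linalg import toeplitz

t = np.linspace(0.0, 5.0, 101)
C = toeplitz(np.exp(-np.abs(t - t[0])))     # covariance matrix depends only on the lags

rng = np.random.default_rng(4)
samples = rng.multivariate_normal(mean=np.zeros_like(t), cov=C, size=5)  # 5 realizations
print(samples.shape)                        # (5, 101)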
Properties of the Autocorrelation Function of a Real WSS Random Process
Autocorrelation of a deterministic signal
Consider a deterministic signal x(t ) such that
0 < lim_{T→∞} (1/2T) ∫_{−T}^{T} x²(t) dt < ∞
Such signals are called power signals. For a power signal x(t), the autocorrelation function is defined as
R_x(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x(t + τ) x(t) dt
Particularly, R_x(0) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x²(t) dt is the mean-square value. If x(t) is a voltage waveform across a 1 ohm resistance, then R_x(0) is the average power delivered to the resistance. In this sense, R_x(0) represents the average power of the signal.
Example Suppose x(t) = A cos ωt. The autocorrelation function of x(t) at lag τ is given by
R_x(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} A cos ω(t + τ) A cos ωt dt
       = lim_{T→∞} (A²/4T) ∫_{−T}^{T} [cos(ω(2t + τ)) + cos ωτ] dt
       = (A² cos ωτ)/2
We see that R_x(τ) of the above periodic signal is also periodic and its maximum occurs when τ = 0, ±2π/ω, ±4π/ω, etc. The power of the signal is R_x(0) = A²/2.
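The time-averaged autocorrelation of the cosine can be approximated numerically over a long but finite window. The sketch below is illustrative; A = 2, ω = 2π, the window T and the lags are assumed values.

import numpy as np

A, w = 2.0, 2 * np.pi
T = 500.0                                  # large but finite averaging window
t = np.arange(-T, T, 1e-3)

def R_x(tau):
    # finite-window approximation of lim (1/2T) \int x(t+tau) x(t) dt
    return np.mean(A * np.cos(w * (t + tau)) * A * np.cos(w * t))

for tau in (0.0, 0.25, 0.5):
    print(tau, R_x(tau), (A**2 / 2) * np.cos(w * tau))   # estimate vs closed form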
The autocorrelation of the deterministic signal gives us insight into the properties of the
autocorrelation function of a WSS process. We shall discuss these properties next.
Properties of the autocorrelation function of a WSS process
Consider a real WSS process {X(t)}. Since the autocorrelation function R_X(t_1, t_2) of such a process depends only on the lag τ = t_1 − t_2, we write it as R_X(τ).
1. R_X(0) = E X²(t) ≥ 0. If X(t) is a voltage signal applied across a 1 ohm resistance, then R_X(0) is the ensemble-average power delivered to the resistance.
2. R_X(−τ) = R_X(τ). This is because
R_X(−τ) = E X(t − τ) X(t)
        = E X(t) X(t − τ)
        = E X(t_1 + τ) X(t_1)   (substituting t_1 = t − τ)
        = R_X(τ)
3. R_X(τ) is maximum at τ = 0. This follows from the Cauchy-Schwarz inequality
⟨X(t), X(t + τ)⟩² ≤ ‖X(t)‖² ‖X(t + τ)‖²
We have
R_X²(τ) = {E X(t) X(t + τ)}²
        ≤ E X²(t) E X²(t + τ)
        = R_X(0) R_X(0)
∴ |R_X(τ)| ≤ R_X(0)
4. R_X(τ) is a positive semi-definite function in the sense that for any positive integer n and any real a_1, a_2, ..., a_n,
Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j R_X(t_i − t_j) ≥ 0
Proof
Define the random variable
Y = Σ_{i=1}^{n} a_i X(t_i)
Then we have
0 ≤ E Y² = Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j E X(t_i) X(t_j) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j R_X(t_i − t_j)
5. If {X(t)} is mean-square periodic with period T_p, that is,
E(X(t + T_p) − X(t))² = 0,
then R_X(τ) is periodic with the same period. Indeed,
E(X(t + T_p) − X(t))² = 0
⇒ E X²(t + T_p) + E X²(t) − 2 E X(t + T_p) X(t) = 0
⇒ R_X(0) + R_X(0) − 2 R_X(T_p) = 0
⇒ R_X(T_p) = R_X(0)
Again, by the Cauchy-Schwarz inequality,
(E[(X(t + τ + T_p) − X(t + τ)) X(t)])² ≤ E(X(t + τ + T_p) − X(t + τ))² E X²(t)
⇒ (R_X(τ + T_p) − R_X(τ))² ≤ 2(R_X(0) − R_X(T_p)) R_X(0)
⇒ (R_X(τ + T_p) − R_X(τ))² ≤ 0   (∵ R_X(0) = R_X(T_p))
∴ R_X(τ + T_p) = R_X(τ)
6. Suppose X(t) = µ_X + V(t), where V(t) is a zero-mean WSS process and lim_{τ→∞} R_V(τ) = 0. Then
lim_{τ→∞} R_X(τ) = µ_X²
A process whose autocorrelation function decays slowly has fewer high-frequency components. Later on we shall see that R_X(τ) is directly related to the frequency-domain representation of a WSS process.
If {X(t)} and {Y(t)} are two real jointly WSS random processes, their cross-correlation functions are independent of t and depend only on the time lag. We can write the cross-correlation function
R_XY(τ) = E X(t + τ) Y(t)
with the property
R_XY(τ) = R_YX(−τ)
(Fig.: typical plots of R_XY(τ) and R_YX(τ) against τ.)
We discussed the convergence and the limit of a random sequence. The continuity of the
random process can be defined with the help of convergence and limits of a random
process. We can define continuity with probability 1, mean-square continuity, and
continuity in probability etc. We shall discuss the mean-square continuity and the
elementary concepts of corresponding mean-square calculus.
A random sequence {X_n} converges to X in the mean-square sense if lim_{n→∞} E(X_n − X)² = 0, and we write
l.i.m._{n→∞} X_n = X
Mean-square continuity: The random process {X(t)} is mean-square continuous at t_0 if
lim_{t→t_0} E[X(t) − X(t_0)]² = 0
A sufficient condition is that the autocorrelation function R_X(t_1, t_2) be continuous at (t_0, t_0).
Proof:
E[X(t) − X(t_0)]² = E(X²(t) − 2X(t)X(t_0) + X²(t_0))
                 = R_X(t, t) − 2R_X(t, t_0) + R_X(t_0, t_0)
If R_X(t_1, t_2) is continuous at (t_0, t_0), then as t → t_0,
E[X(t) − X(t_0)]² → R_X(t_0, t_0) − 2R_X(t_0, t_0) + R_X(t_0, t_0) = 0
Moreover, since (E Z)² ≤ E Z²,
(E[X(t) − X(t_0)])² ≤ E[X(t) − X(t_0)]²
∴ lim_{t→t_0} (E[X(t) − X(t_0)])² ≤ lim_{t→t_0} E[X(t) − X(t_0)]² = 0
∴ E X(t) is continuous at t_0.
Example
Consider the random binary wave {X(t)} discussed in the earlier example. A typical realization of the process, shown in the Fig. below, is a discontinuous function. Nevertheless, R_X(τ) = 1 − |τ|/T_p is continuous at τ = 0, so the process is mean-square continuous.
(Fig.: a realization of the random binary wave taking values +1 and −1 over intervals of length T_p.)
Mean-square differentiability
The random process {X(t)} is said to have the mean-square derivative X'(t) at a point t ∈ Γ provided (X(t + ∆t) − X(t))/∆t approaches X'(t) in the mean-square sense as ∆t → 0. In other words, the random process {X(t)} has an m-s derivative X'(t) if
lim_{∆t→0} E[ (X(t + ∆t) − X(t))/∆t − X'(t) ]² = 0
Remark
(1) If all the sample functions of a random process X(t) are differentiable, then the above condition is satisfied and the m-s derivative exists.
Example Consider the random-phase sinusoid {X(t)} given by X(t) = A cos(w_0 t + φ), where A and w_0 are constants and φ ~ U[0, 2π]. Then for each φ, X(t) is differentiable. Therefore, the m-s derivative is X'(t) = −A w_0 sin(w_0 t + φ).
Applying the Cauchy criterion, the condition for the existence of the m-s derivative is
lim_{∆t_1, ∆t_2 → 0} E[ (X(t + ∆t_1) − X(t))/∆t_1 − (X(t + ∆t_2) − X(t))/∆t_2 ]² = 0
Expanding the square and taking expectations,
E[ (X(t + ∆t_1) − X(t))/∆t_1 − (X(t + ∆t_2) − X(t))/∆t_2 ]²
= [R_X(t + ∆t_1, t + ∆t_1) + R_X(t, t) − 2R_X(t + ∆t_1, t)] / ∆t_1²
+ [R_X(t + ∆t_2, t + ∆t_2) + R_X(t, t) − 2R_X(t + ∆t_2, t)] / ∆t_2²
− 2 [R_X(t + ∆t_1, t + ∆t_2) − R_X(t + ∆t_1, t) − R_X(t, t + ∆t_2) + R_X(t, t)] / (∆t_1 ∆t_2)
Each of the terms within square brackets converges to ∂²R_X(t_1, t_2)/∂t_1∂t_2 |_{t_1 = t_2 = t}, provided this second partial derivative exists. In that case
lim_{∆t_1, ∆t_2 → 0} E[ (X(t + ∆t_1) − X(t))/∆t_1 − (X(t + ∆t_2) − X(t))/∆t_2 ]²
= [∂²R_X(t_1, t_2)/∂t_1∂t_2 + ∂²R_X(t_1, t_2)/∂t_1∂t_2 − 2 ∂²R_X(t_1, t_2)/∂t_1∂t_2]_{t_1 = t_2 = t} = 0
Thus, {X(t)} is m-s differentiable at t ∈ Γ if ∂²R_X(t_1, t_2)/∂t_1∂t_2 exists at (t, t) ∈ Γ × Γ.
Particularly, if X(t) is WSS,
R_X(t_1, t_2) = R_X(t_1 − t_2)
Substituting τ = t_1 − t_2, we get
∂²R_X(t_1, t_2)/∂t_1∂t_2 = ∂/∂t_1 [ dR_X(τ)/dτ · ∂(t_1 − t_2)/∂t_2 ]
                        = −d²R_X(τ)/dτ² · ∂τ/∂t_1
                        = −d²R_X(τ)/dτ²
Therefore, a WSS process X(t) is m-s differentiable if R_X(τ) has a second derivative at τ = 0.
Example
Consider a WSS process {X(t)} with autocorrelation function
R_X(τ) = exp(−a|τ|)
R_X(τ) does not have first and second derivatives at τ = 0, so {X(t)} is not mean-square differentiable.
Example The random binary wave {X(t)} has the autocorrelation function
R_X(τ) = 1 − |τ|/T_p for |τ| ≤ T_p, and 0 otherwise.
R_X(τ) does not have first and second derivatives at τ = 0. Therefore, {X(t)} is not mean-square differentiable.
Example For a Wiener process {X(t)},
R_X(t_1, t_2) = α min(t_1, t_2)
where α is a constant. Thus
∂R_X(t_1, t_2)/∂t_1 = α if t_1 < t_2, 0 if t_1 > t_2, and it does not exist at t_1 = t_2.
∴ ∂²R_X(t_1, t_2)/∂t_1∂t_2 does not exist at (t_1 = 0, t_2 = 0) (indeed, anywhere on the line t_1 = t_2), so the Wiener process is not mean-square differentiable.
We have,
E X'(t) = E lim_{∆t→0} (X(t + ∆t) − X(t))/∆t
        = lim_{∆t→0} (E X(t + ∆t) − E X(t))/∆t
        = lim_{∆t→0} (µ_X(t + ∆t) − µ_X(t))/∆t
        = µ_X'(t)
Similarly, for the autocorrelation of the derivative process,
E X'(t_1) X'(t_2) = ∂²R_X(t_1, t_2)/∂t_1∂t_2
For a WSS process, with τ = t_1 − t_2,
E X(t_1) X'(t_2) = ∂R_X(t_1 − t_2)/∂t_2 = −dR_X(τ)/dτ
and
E X'(t_1) X'(t_2) = R_{X'}(t_1 − t_2) = −d²R_X(τ)/dτ²
∴ var(X'(t)) = −d²R_X(τ)/dτ² |_{τ=0}
Mean-Square Integral
Recall that the definite integral (Riemann integral) of a function x(t) over the interval [t_0, t] is defined as the limiting sum
∫_{t_0}^{t} x(τ) dτ = lim_{n→∞, ∆_k→0} Σ_{k=0}^{n−1} x(τ_k) ∆_k
where t_0 < t_1 < ... < t_{n−1} < t_n = t are partition points of the interval [t_0, t], ∆_k = t_{k+1} − t_k and τ_k ∈ [t_k, t_{k+1}].
For a random process {X(t)}, the m-s integral can be similarly defined as the process {Y(t)} given by
Y(t) = ∫_{t_0}^{t} X(τ) dτ = l.i.m._{n→∞, ∆_k→0} Σ_{k=0}^{n−1} X(τ_k) ∆_k
It can be shown that a sufficient condition for the m-s integral to exist is that the double integral ∫_{t_0}^{t} ∫_{t_0}^{t} R_X(τ_1, τ_2) dτ_1 dτ_2 exists.
The mean of the integral process is
E Y(t) = ∫_{t_0}^{t} E X(τ) dτ = µ_X (t − t_0)
Therefore, if µ_X ≠ 0, {Y(t)} is necessarily non-stationary.
RY ( t1 , t2 ) = EY (t1 )Y (t2 )
t1 t2
= E ∫ ∫ X (τ 1 ) X (τ 2 ) dτ 1dτ 2
t0 t0
t1 t2
= ∫ ∫ EX (τ 1 ) X (τ 2 ) dτ 1dτ 2
t0 t0
t1 t2
= ∫ ∫ RX (τ 1 − τ 2 ) dτ 1dτ 2
t0 t0
Remark The non-stationarity of the m-s integral of a random process has physical importance: the output of an integrator driven by stationary noise grows without bound.
Example The random binary wave {X(t)} has the autocorrelation function
R_X(τ) = 1 − |τ|/T_p for |τ| ≤ T_p, and 0 otherwise.
Often we are interested in finding the various ensemble averages of a random process
{ X ( t )} by means of the corresponding time averages determined from single realization
of the random process. For example we can compute the time-mean of a single
realization of the random process by the formula
⟨µ_x⟩_T = lim_{T→∞} (1/2T) ∫_{−T}^{T} x(t) dt
Can ⟨µ_x⟩_T and ⟨x_rms⟩_T represent µ_X and √(E X²(t)) respectively?
To answer such a question we have to understand various time averages and their properties.
The time average of a function g of the process is defined as
⟨g(X(t))⟩_T = (1/2T) ∫_{−T}^{T} g(X(t)) dt
where the integral is defined in the mean-square sense.
where the integral is defined in the mean-square sense.
The above definitions are in contrast to the corresponding ensemble average defined by
E g(X(t)) = ∫_{−∞}^{∞} g(x) f_{X(t)}(x) dx for the continuous case
          = Σ_{i ∈ R_{X(t)}} g(x_i) p_{X(t)}(x_i) for the discrete case
The time-averaged autocorrelation function is defined by
⟨R_X(τ)⟩_T = (1/2T) ∫_{−T}^{T} X(t) X(t + τ) dt   (continuous case)
⟨R_X[m]⟩_N = (1/(2N + 1)) Σ_{i=−N}^{N} X_i X_{i+m}   (discrete case)
Note that ⟨g(X(t))⟩_T and ⟨g(X_n)⟩_N are functions of random variables and are governed by respective probability distributions. However, determination of these distribution functions is difficult and we shall discuss the behaviour of these averages in terms of their means and variances. We shall further assume that the random processes {X(t)} and {X_n} are WSS.
The mean of ⟨µ_X⟩_N:
E⟨µ_X⟩_N = E (1/(2N + 1)) Σ_{i=−N}^{N} X_i = (1/(2N + 1)) Σ_{i=−N}^{N} E X_i = µ_X
and the variance
E(⟨µ_X⟩_N − µ_X)² = E( (1/(2N + 1)) Σ_{i=−N}^{N} (X_i − µ_X) )²
= (1/(2N + 1)²) [ Σ_{i=−N}^{N} E(X_i − µ_X)² + Σ_{i≠j} Σ_j E(X_i − µ_X)(X_j − µ_X) ]
If the samples X_i are uncorrelated, the cross terms vanish and
E(⟨µ_X⟩_N − µ_X)² = (1/(2N + 1)²) Σ_{i=−N}^{N} E(X_i − µ_X)² = σ_X² / (2N + 1)
Let us consider the time-averaged mean for the continuous case. We have
⟨µ_X⟩_T = (1/2T) ∫_{−T}^{T} X(t) dt
∴ E⟨µ_X⟩_T = (1/2T) ∫_{−T}^{T} E X(t) dt = (1/2T) ∫_{−T}^{T} µ_X dt = µ_X
and the variance
E(⟨µ_X⟩_T − µ_X)² = E( (1/2T) ∫_{−T}^{T} X(t) dt − µ_X )²
= E( (1/2T) ∫_{−T}^{T} (X(t) − µ_X) dt )²
= (1/4T²) ∫_{−T}^{T} ∫_{−T}^{T} E(X(t_1) − µ_X)(X(t_2) − µ_X) dt_1 dt_2
= (1/4T²) ∫_{−T}^{T} ∫_{−T}^{T} C_X(t_1 − t_2) dt_1 dt_2
The above double integral is evaluated over the square region bounded by t_1 = ±T and t_2 = ±T. We divide this square region into thin strips parallel to the line t_1 − t_2 = 0. Putting t_1 − t_2 = τ and noting that the differential area between the lines t_1 − t_2 = τ and t_1 − t_2 = τ + dτ is (2T − |τ|) dτ, the double integral is converted to a single integral as follows:
E(⟨µ_X⟩_T − µ_X)² = (1/4T²) ∫_{−T}^{T} ∫_{−T}^{T} C_X(t_1 − t_2) dt_1 dt_2
= (1/4T²) ∫_{−2T}^{2T} (2T − |τ|) C_X(τ) dτ
= (1/2T) ∫_{−2T}^{2T} (1 − |τ|/2T) C_X(τ) dτ
(Fig.: the square integration region |t_1| ≤ T, |t_2| ≤ T with strips between the lines t_1 − t_2 = τ and t_1 − t_2 = τ + dτ.)
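The shrinking of this variance with the observation window can be illustrated numerically. The sketch below is not from the notes: it uses a discrete-time AR(1) sequence as a stand-in for a WSS process, and the coefficient, lengths and seed are assumed values.

import numpy as np

rng = np.random.default_rng(5)
a, n_real, n = 0.9, 2000, 4000

x = np.zeros((n_real, n))
for k in range(1, n):
    x[:, k] = a * x[:, k - 1] + rng.normal(size=n_real)   # zero-mean correlated sequence

for window in (100, 1000, 4000):
    time_means = x[:, :window].mean(axis=1)                # time-averaged mean per realization
    print(window, time_means.var())                        # variance decreases with the window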
Ergodicity Principle
If the time averages converge to the corresponding ensemble averages in the probabilistic
sense, then a time-average computed from a large realization can be used as the value for
the corresponding ensemble average. Such a principle is the ergodicity principle to be
discussed below:
A WSS process {X(t)} is called mean ergodic if
lim_{T→∞} E⟨µ_X⟩_T = µ_X and lim_{T→∞} var⟨µ_X⟩_T = 0
We have earlier shown that
E⟨µ_X⟩_T = µ_X
and
var⟨µ_X⟩_T = (1/2T) ∫_{−2T}^{2T} C_X(τ) [1 − |τ|/2T] dτ
Therefore, the condition for ergodicity in mean is
lim_{T→∞} (1/2T) ∫_{−2T}^{2T} C_X(τ) [1 − |τ|/2T] dτ = 0
Further,
(1/2T) ∫_{−2T}^{2T} C_X(τ) [1 − |τ|/2T] dτ ≤ (1/2T) ∫_{−2T}^{2T} |C_X(τ)| dτ
Hence a sufficient condition for mean ergodicity is
∫_{−∞}^{∞} |C_X(τ)| dτ < ∞
Example Consider the random binary wave, for which C_X(τ) = 1 − |τ|/T_p for |τ| ≤ T_p and 0 otherwise. Here, for 2T > T_p,
∫_{−2T}^{2T} |C_X(τ)| dτ = 2 ∫_0^{T_p} (1 − τ/T_p) dτ = T_p
∴ ∫_{−∞}^{∞} |C_X(τ)| dτ < ∞
Hence {X(t)} is mean ergodic.
Autocorrelation ergodicity
⟨R_X(τ)⟩_T = (1/2T) ∫_{−T}^{T} X(t) X(t + τ) dt
If we consider Z(t) = X(t) X(t + τ), so that µ_Z = R_X(τ), then {X(t)} is autocorrelation ergodic if {Z(t)} is mean ergodic, i.e. if
lim_{T→∞} (1/2T) ∫_{−2T}^{2T} (1 − |τ_1|/2T) C_Z(τ_1) dτ_1 = 0
where
C_Z(τ_1) = E Z(t) Z(t − τ_1) − E Z(t) E Z(t − τ_1)
        = E X(t) X(t + τ) X(t − τ_1) X(t + τ − τ_1) − R_X²(τ)
Thus the condition for autocorrelation ergodicity involves a fourth-order moment of the process, which is why it is usually checked for specific cases (for example, for a Gaussian process this moment can be expressed in terms of R_X). Equivalently, {X(t)} is autocorrelation ergodic if
lim_{T→∞} (1/2T) ∫_{−2T}^{2T} (1 − |α|/2T) (E Z(t) Z(t + α) − R_X²(τ)) dα = 0
Example
Consider the random-phase sinusoid given by X(t) = A cos(w_0 t + φ), where A and w_0 are constants and φ ~ U[0, 2π] is a random variable. We have earlier proved that this process is WSS with µ_X = 0 and
R_X(τ) = (A²/2) cos w_0 τ
For any particular realization x(t) = A cos(w_0 t + φ_1),
⟨µ_x⟩_T = (1/2T) ∫_{−T}^{T} A cos(w_0 t + φ_1) dt = (A cos φ_1 sin w_0 T)/(T w_0) → 0 as T → ∞
and
⟨R_x(τ)⟩_T = (1/2T) ∫_{−T}^{T} A cos(w_0 t + φ_1) A cos(w_0 (t + τ) + φ_1) dt
          = (A²/4T) ∫_{−T}^{T} [cos w_0 τ + cos(w_0 (2t + τ) + 2φ_1)] dt
          → (A²/2) cos w_0 τ as T → ∞
Hence the random-phase sinusoid is both mean ergodic and autocorrelation ergodic.
Remark
A random process {X(t)} is ergodic if its time averages converge in the mean-square sense to the corresponding ensemble averages. This is a stronger requirement than stationarity: the ensemble averages of all orders of such a process are independent of time, which implies that an ergodic process is necessarily stationary in the strict sense. The converse is not true: there are stationary random processes which are not ergodic.
The following Fig. shows a hierarchical classification of random processes.
(Fig.: nested classes — random processes ⊃ WSS processes ⊃ ergodic processes.)
Example
Suppose X(t) = C, where C ~ U[0, a]. {X(t)} is a family of straight lines as illustrated in the Fig. below.
(Fig.: realizations X(t) = c, horizontal lines at levels such as a, 3a/4, a/2, a/4 and 0.)
Here µ_X = a/2, whereas
⟨µ_X⟩_T = lim_{T→∞} (1/2T) ∫_{−T}^{T} C dt = C, which is a different constant for each realization. Hence {X(t)} is not mean ergodic.
The signal g (t ) can be obtained from G (ω ) by the inverse Fourier transform (IFT) as
follows:
g(t) = IFT(G(ω)) = (1/2π) ∫_{−∞}^{∞} G(ω) e^{jωt} dω
The existence of the inverse Fourier transform implies that we can represent a function g (t ) as a
superposition of continuum of complex sinusoids. The Fourier transform G (ω ) is the strength of
the sinusoids of frequency ω present in the signal. If g (t ) is a voltage signal measured in volt,
G (ω ) has the unit of volt/radian. The function G (ω ) is also called the spectrum of g (t ).
We can define the Fourier transform also in terms of the frequency variable f = ω/2π. In this case, we can define the Fourier transform and the inverse Fourier transform as follows:
∞
G ( f ) = ∫ g (t )e − j 2π ft dt
−∞
and
∞
g (t ) = ∫ G ( f )e j 2π ft df
−∞
The Fourier transform is a linear transform and has many interesting properties. Particularly, the energy of the signal g(t) is related to its Fourier transform by Parseval's theorem:
∫_{−∞}^{∞} g²(t) dt = (1/2π) ∫_{−∞}^{∞} |G(ω)|² dω
For a WSS random process {X(t)}, we cannot straightaway define a Fourier transform by the integral
FT(X(t)) = ∫_{−∞}^{∞} X(t) e^{−jωt} dt
The existence of the above integral would imply the existence of the Fourier transform of every realization of X(t). But the very notion of stationarity demands that the realizations do not decay with time, and the first Dirichlet condition is violated.
This difficulty is avoided by a frequency-domain representation of X (t ) in terms of the
power spectral density (PSD). Recall that the power of a WSS process X (t ) is a constant
and given by EX 2 (t ). The PSD denotes the distribution of this power over frequencies.
Definition of Power Spectral Density of a WSS Process
Let us define
X_T(t) = X(t) for −T < t < T, and 0 otherwise;
that is, X_T(t) = X(t) rect(t/2T)
where rect(t/2T) is the unity-amplitude rectangular pulse of width 2T centred at the origin. As T → ∞, X_T(t) will represent the random process X(t).
Define the mean-square integral
FTX_T(ω) = ∫_{−T}^{T} X_T(t) e^{−jωt} dt
By Parseval's theorem, the energy of the truncated realization satisfies
∫_{−T}^{T} X_T²(t) dt = (1/2π) ∫_{−∞}^{∞} |FTX_T(ω)|² dω
Therefore E|FTX_T(ω)|² / 2T represents the average power per unit frequency of {X(t)} over the interval [−T, T], and the power spectral density of the process is defined as
S_X(ω) = lim_{T→∞} E|FTX_T(ω)|² / 2T
Relation between the autocorrelation function and PSD: Wiener-Khinchin-Einstein
theorem
We have
E|FTX_T(ω)|² / 2T = E[FTX_T(ω) FTX_T*(ω)] / 2T
= (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} E X_T(t_1) X_T(t_2) e^{−jωt_1} e^{+jωt_2} dt_1 dt_2
= (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} R_X(t_1 − t_2) e^{−jω(t_1 − t_2)} dt_1 dt_2
(Fig.: the square integration region |t_1| ≤ T, |t_2| ≤ T with strips between the lines t_1 − t_2 = τ and t_1 − t_2 = τ + dτ.)
Note that the above integral is to be performed on a square region bounded by
t1 = ±T and t2 = ±T . Substitute t1 − t 2 = τ so that t 2 = t1 − τ is a family of straight
lines parallel to t1 − t2 = 0. The differential area in terms of τ is given by the shaded area
and equal to (2T − | τ |)dτ . The double integral is now replaced by a single integral in τ .
Therefore,
E[FTX_T(ω) FTX_T*(ω)] / 2T = (1/2T) ∫_{−2T}^{2T} R_X(τ) e^{−jωτ} (2T − |τ|) dτ
= ∫_{−2T}^{2T} R_X(τ) e^{−jωτ} (1 − |τ|/2T) dτ
If R_X(τ) is integrable, then the right-hand integral converges to ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ as T → ∞,
∴ lim_{T→∞} E|FTX_T(ω)|² / 2T = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ
that is,
S_X(ω) = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ
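The Wiener-Khinchin-Einstein relation can be illustrated numerically. The sketch below (not from the notes) uses a discrete-time AR(1) process, whose autocorrelation a^{|m|}/(1 − a²) is known; the coefficient, lengths and seed are assumed values.

import numpy as np

rng = np.random.default_rng(8)
a, n, n_real, nfft = 0.5, 1024, 500, 1024

w = rng.normal(size=(n_real, n))
y = np.zeros_like(w)
for k in range(1, n):
    y[:, k] = a * y[:, k - 1] + w[:, k]            # AR(1) realizations

# Averaged periodogram estimate of the PSD
S_est = (np.abs(np.fft.fft(y, nfft, axis=1)) ** 2 / n).mean(axis=0)

# DTFT of the theoretical autocorrelation sequence
m = np.arange(-200, 201)
R = a ** np.abs(m) / (1 - a**2)
omega = 2 * np.pi * np.fft.fftfreq(nfft)
S_theory = (R[None, :] * np.exp(-1j * np.outer(omega, m))).sum(axis=1).real

for i in (0, nfft // 8, nfft // 4):
    print(omega[i], S_est[i], S_theory[i])          # should roughly agree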
Example Consider the modulated process X(t) = A M(t) cos(ω_c t + Φ), where {M(t)} is a WSS process, A and ω_c are constants, and Φ ~ U[0, 2π] is independent of M(t). Then
R_X(τ) = E[A M(t + τ) cos(ω_c(t + τ) + Φ) · A M(t) cos(ω_c t + Φ)]
       = A² E[M(t + τ) M(t)] E[cos(ω_c(t + τ) + Φ) cos(ω_c t + Φ)]   (using the independence of M(t) and the sinusoid)
       = (A²/2) R_M(τ) cos ω_c τ
2
∴ S_X(ω) = (A²/4) (S_M(ω + ω_c) + S_M(ω − ω_c))
where S M (ω ) is the PSD of M(t)
(Fig.: S_M(ω) and the corresponding S_X(ω) with components centred at ±ω_c.)
Example For a band-pass noise process {N(t)} with PSD S_N(ω) = N_0/2 for |ω − ω_c| < B/2 and |ω + ω_c| < B/2 (and 0 otherwise),
R_N(τ) = (1/2π) ∫_{−∞}^{∞} S_N(ω) e^{jωτ} dω
       = (1/2π) × 2 × ∫_{ω_c − B/2}^{ω_c + B/2} (N_0/2) cos ωτ dω
       = (N_0/2π) [sin((ω_c + B/2)τ) − sin((ω_c − B/2)τ)] / τ
       = (N_0 B/2π) (sin(Bτ/2)/(Bτ/2)) cos ω_c τ
Properties of the PSD
S_X(ω), being the Fourier transform of R_X(τ), shares the properties of the Fourier transform. In particular:
• The average power of the process is
E X²(t) = R_X(0) = (1/2π) ∫_{−∞}^{∞} S_X(ω) dω
• The average power in the frequency band [ω_1, ω_2] is
(1/π) ∫_{ω_1}^{ω_2} S_X(ω) dω
(the factor accounts for the symmetric band [−ω_2, −ω_1]).
• For a real WSS process, R_X(τ) is real and even, and therefore
S_X(ω) = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ
       = ∫_{−∞}^{∞} R_X(τ)(cos ωτ − j sin ωτ) dτ
       = ∫_{−∞}^{∞} R_X(τ) cos ωτ dτ
       = 2 ∫_0^{∞} R_X(τ) cos ωτ dτ
so S_X(ω) is a real and even function of ω.
impulses.
Remark
1) The function SX (ω) is the PSD of a WSS process { X (t )} if and only
expansion { X (t )} .
Cross power spectral density
Consider a random process Z (t ) which is sum of two real jointly WSS random processes
X(t) and Y(t). As we have seen earlier,
RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RYX (τ )
If we take the Fourier transform of both sides,
S Z (ω ) = S X (ω ) + SY (ω ) + FT ( RXY (τ )) + FT ( RYX (τ ))
where FT (.) stands for the Fourier transform.
Thus we see that S Z (ω ) includes contribution from the Fourier transform of the cross-
correlation functions RXY (τ ) and RYX (τ ). These Fourier transforms represent cross power
spectral densities.
Definition of Cross Power Spectral Density
Given two real jointly WSS random processes X(t) and Y(t), the cross power spectral
density (CPSD) S XY (ω ) is defined as
S_XY(ω) = lim_{T→∞} E[FTX_T*(ω) FTY_T(ω)] / 2T
where FTX_T(ω) and FTY_T(ω) are the Fourier transforms of the truncated processes X_T(t) = X(t) rect(t/2T) and Y_T(t) = Y(t) rect(t/2T) respectively, and * denotes the complex conjugate operation.
We can similarly define SYX (ω ) by
S_YX(ω) = lim_{T→∞} E[FTY_T*(ω) FTX_T(ω)] / 2T
Proceeding in the same way as the derivation of the Wiener-Khinchin-Einstein theorem
for the WSS process, it can be shown that
∞
S XY (ω ) = ∫ RXY (τ )e − jωτ dτ
−∞
and
∞
SYX (ω ) = ∫ RYX (τ )e − jωτ dτ
−∞
The cross-correlation function and the cross-power spectral density form a Fourier
transform pair and we can write
R_XY(τ) = (1/2π) ∫_{−∞}^{∞} S_XY(ω) e^{jωτ} dω
and
R_YX(τ) = (1/2π) ∫_{−∞}^{∞} S_YX(ω) e^{jωτ} dω
(1) S_XY(ω) = S_YX*(ω)
(3) If X(t) and Y(t) are uncorrelated and have constant means, then
S_XY(ω) = S_YX(ω) = 2π µ_X µ_Y δ(ω)
Observe that
R_XY(τ) = E X(t + τ) Y(t) = E X(t + τ) E Y(t) = µ_X µ_Y = µ_Y µ_X = R_YX(τ)
∴ S_XY(ω) = S_YX(ω) = 2π µ_X µ_Y δ(ω)
(5) The cross power PXY between X(t) and Y(t) is defined by
P_XY = lim_{T→∞} E (1/2T) ∫_{−T}^{T} X(t) Y(t) dt
     = lim_{T→∞} E (1/2T) ∫_{−∞}^{∞} X_T(t) Y_T(t) dt
     = lim_{T→∞} (1/2T) (1/2π) E ∫_{−∞}^{∞} FTX_T*(ω) FTY_T(ω) dω
     = (1/2π) ∫_{−∞}^{∞} lim_{T→∞} [E FTX_T*(ω) FTY_T(ω) / 2T] dω
     = (1/2π) ∫_{−∞}^{∞} S_XY(ω) dω
∴ P_XY = (1/2π) ∫_{−∞}^{∞} S_XY(ω) dω
Similarly,
P_YX = (1/2π) ∫_{−∞}^{∞} S_YX(ω) dω = (1/2π) ∫_{−∞}^{∞} S_XY*(ω) dω = P_XY*
Example Consider the random process Z (t ) = X (t ) + Y (t ) discussed in the beginning of
the lecture. Here Z (t ) is the sum of two jointly WSS orthogonal random processes
X(t) and Y(t).
We have,
RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RYX (τ )
Taking the Fourier transform of both sides,
S Z (ω ) = S X (ω ) + SY (ω ) + S XY (ω ) + SYX (ω )
∴ (1/2π) ∫_{−∞}^{∞} S_Z(ω) dω = (1/2π) ∫_{−∞}^{∞} S_X(ω) dω + (1/2π) ∫_{−∞}^{∞} S_Y(ω) dω + (1/2π) ∫_{−∞}^{∞} S_XY(ω) dω + (1/2π) ∫_{−∞}^{∞} S_YX(ω) dω
Therefore,
P_Z = P_X + P_Y + P_XY + P_YX
Remark
• P_XY + P_YX is the additional power contributed by the interaction of X(t) and Y(t) to the power of X(t) + Y(t).
• If X(t) and Y(t) are orthogonal, then
S_Z(ω) = S_X(ω) + S_Y(ω) + 0 + 0 = S_X(ω) + S_Y(ω)
Consequently,
P_Z = P_X + P_Y
Thus, in the case of two jointly WSS orthogonal processes, the power of the sum of the processes is equal to the sum of the respective powers.
For a discrete-time WSS process {X[n]}, the power spectral density is similarly defined as
S_X(ω) = lim_{N→∞} E|DTFTX_N(ω)|² / (2N + 1)
where
DTFTX_N(ω) = Σ_{n=−∞}^{∞} X_N[n] e^{−jωn} = Σ_{n=−N}^{N} X[n] e^{−jωn}
Note that the average power of {X[n]} is R_X[0] = E X²[n], and S_X(ω) describes how this power is distributed over frequency: it is the power contributed per unit frequency by the spectral component at frequency ω.
Wiener-Einstein-Khinchin theorem
The Wiener-Einstein-Khinchin theorem is also valid for discrete-time random processes.
The power spectral density S_X(ω) of the WSS process {X[n]} is the discrete-time Fourier transform of the autocorrelation sequence:
S_X(ω) = Σ_{m=−∞}^{∞} R_X[m] e^{−jωm},   −π ≤ ω ≤ π
Clearly, S_X(ω) = S_X(z)|_{z = e^{jω}}.
Example Suppose R_X[m] = 2^{−|m|}, m = 0, ±1, ±2, ±3, .... Then
S_X(ω) = Σ_{m=−∞}^{∞} R_X[m] e^{−jωm}
       = 1 + Σ_{m≠0} (1/2)^{|m|} e^{−jωm}
       = 3/(5 − 4 cos ω)
The plot of the autocorrelation sequence and the power spectral density is shown in Fig.
below.
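The closed form above can be checked by truncating the DTFT sum; the sketch below is illustrative (the frequency grid and truncation length are assumed values).

import numpy as np

w = np.linspace(-np.pi, np.pi, 7)
m = np.arange(-200, 201)                      # truncation; terms decay as 2^{-|m|}
S_sum = (2.0 ** (-np.abs(m)) * np.exp(-1j * np.outer(w, m))).sum(axis=1).real
S_closed = 3.0 / (5.0 - 4.0 * np.cos(w))
print(np.max(np.abs(S_sum - S_closed)))       # near zero, up to truncation error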
Example
Properties of the PSD of a discrete-time WSS process
• For the real discrete-time process { X [ n]}, the autocorrelation function RX [m] is
real and even. Therefore, S X (ω ) is real and even.
• S X (ω ) ≥ 0.
• The average power of { X [n]} is given by
E X²[n] = R_X[0] = (1/2π) ∫_{−π}^{π} S_X(ω) dω
Similarly, the average power in the frequency band [ω_1, ω_2] is given by
(1/π) ∫_{ω_1}^{ω_2} S_X(ω) dω
If the discrete-time process {X[n]} is obtained by uniformly sampling a continuous-time WSS process {X_a(t)} at interval T, i.e. X[n] = X_a(nT), then
R_X[m] = E X[n + m] X[n] = E X_a(nT + mT) X_a(nT) = R_{X_a}(mT),   m = 0, ±1, ±2, ...
The frequency ω of the discrete-time WSS process is related to the frequency Ω of the continuous-time process by the relation Ω = ω/T.
White noise process
A white noise process {W(t)} is defined by
S_W(ω) = N_0/2,   −∞ < ω < ∞
where N_0 is a real constant called the intensity of the white noise. The corresponding autocorrelation function is given by
R_W(τ) = (N_0/2) δ(τ), where δ(τ) is the Dirac delta.
The average power of white noise is
P_avg = E W²(t) = (1/2π) ∫_{−∞}^{∞} (N_0/2) dω → ∞
The autocorrelation function and the PSD of a white noise process are shown in the Fig. below.
(Fig.: S_W(ω) = N_0/2 for all ω; R_W(τ) = (N_0/2) δ(τ).)
Remarks
• The term white noise is analogous to white light which contains all visible light
frequencies.
• We generally consider zero-mean white noise process.
• A white noise process is unpredictable, as the noise samples at different instants of time are uncorrelated:
C_W(t_i, t_j) = 0 for t_i ≠ t_j.
A band-limited white noise process {X(t)} has PSD S_X(ω) = N_0/2 for |ω| ≤ B and 0 otherwise. The corresponding autocorrelation function is
R_X(τ) = (N_0 B / 2π) · sin Bτ / (Bτ)
The plots of S_X(ω) and R_X(τ) of a band-limited white noise process are shown in the Fig.
(Fig.: S_X(ω) = N_0/2 on [−B, B].)
Observe that
• The average power of the process is E X²(t) = R_X(0) = N_0 B / 2π.
• R_X(τ) = 0 for τ = ±π/B, ±2π/B, ±3π/B, .... This means that X(t) and X(t + nπ/B), where n is a non-zero integer, are uncorrelated. Thus we can get uncorrelated samples by sampling a band-limited white noise process at a uniform interval of π/B.
• A band-limited white noise process may also be a band-pass process with PSD
S_X(ω) = N_0/2 for |ω − ω_0| < B/2 (and |ω + ω_0| < B/2), and 0 otherwise,
as shown in the Fig.
(Fig.: band-pass PSD of height N_0/2 centred at ±ω_0 with bandwidth B.)
Coloured Noise
A noise process which is not white is called coloured noise. Thus the noise process {X(t)} with R_X(τ) = a² e^{−b|τ|}, b > 0, and PSD S_X(ω) = 2a²b/(b² + ω²) is an example of coloured noise.
A white noise sequence {W[n]} is defined by the PSD
S_W(ω) = N/2,   −π ≤ ω ≤ π
Therefore
R_W[m] = (N/2) δ[m]
where δ[m] is the unit impulse sequence. The autocorrelation function and the PSD of a white noise sequence are shown in the Fig.
(Fig.: R_W[m] = (N/2) δ[m] and S_W(ω) = N/2 on [−π, π].)
A realization of a white noise sequence is shown in the Fig. below.
Remark
• The average power of the white noise sequence is E W²[n] = (1/2π) × (N/2) × 2π = N/2. The average power of the white noise sequence is finite and uniformly distributed over all frequencies.
• If the white noise sequence {W [n]} is a Gaussian sequence, then {W [n]} is
called a white Gaussian noise (WGN) sequence.
• An i.i.d. random sequence is always white. Such a sequence may be called a strict-sense white noise sequence. A WGN sequence is a strict-sense stationary white noise sequence.
• The model white noise sequence looks artificial, but it plays a key role in
random signal modelling. It plays the similar role as that of the impulse
function in the modeling of deterministic signals. A class of WSS processes
called regular processes can be considered as the output of a linear system
with white noise as input as illustrated in Fig.
• The notion of the sequence of i.i.d. random variables is also important in
statistical inference.
(Fig.: white noise process → linear system → regular WSS random process.)
Response of Linear time-invariant system to WSS input:
In many applications, physical systems are modeled as linear time invariant (LTI) system.
The dynamic behavior of an LTI system to deterministic inputs is described by linear
differential equations. We are familiar with time and transfer domain (such as Laplace
transform and Fourier transform) techniques to solve these equations. In this lecture, we
develop the technique to analyse the response of an LTI system to WSS random process.
Consider a system with input x(t) and output y(t) = T[x(t)].
Linear system
The system is called linear if superposition applies: the weighted sum of inputs results in the weighted sum of the corresponding outputs. Thus for a linear system
T[a_1 x_1(t) + a_2 x_2(t)] = a_1 T[x_1(t)] + a_2 T[x_2(t)]
Example Consider the differentiator y(t) = d x(t)/dt. Then
d/dt (a_1 x_1(t) + a_2 x_2(t)) = a_1 d x_1(t)/dt + a_2 d x_2(t)/dt
so the differentiator is a linear system.
Consider a linear system with y(t) = T x(t). The system is called time-invariant if
T x(t − t_0) = y(t − t_0) ∀ t_0
It is easy to check that the differentiator in the above example is a linear time-invariant system.
Causal system
The system is called causal if the output of the system at t = t0 depends only on the
present and past values of input. Thus for a causal system
y ( t0 ) = T ( x(t ), t ≤ t0 )
Response of a linear time-invariant system to deterministic input
A linear system can be characterised by its impulse response h(t) = T[δ(t)], where δ(t) is the Dirac delta function.
(Fig.: δ(t) → LTI system → h(t).)
Recall that any function x(t) can be represented in terms of the Dirac delta function as follows:
x(t) = ∫_{−∞}^{∞} x(s) δ(t − s) ds
Therefore, the response of a linear system to x(t) is
y(t) = T[ ∫_{−∞}^{∞} x(s) δ(t − s) ds ]
     = ∫_{−∞}^{∞} x(s) T[δ(t − s)] ds   [using the linearity property]
     = ∫_{−∞}^{∞} x(s) h(t, s) ds
where h(t, s) is the response at time t to an impulse applied at time s. For a time-invariant system h(t, s) = h(t − s), so that
y(t) = ∫_{−∞}^{∞} x(s) h(t − s) ds = x(t) * h(t)
(Fig.: X(ω) → LTI system H(ω) → Y(ω).)
In the frequency domain,
Y(ω) = H(ω) X(ω)
where H(ω) = FT[h(t)] = ∫_{−∞}^{∞} h(t) e^{−jωt} dt is the frequency response of the system.
Consider an LTI system with impulse response h(t). Suppose { X (t )} is a WSS process
input to the system. The output {Y (t )} of the system is given by
∞ ∞
Y (t ) = ∫ h ( s ) X ( t − s ) ds = ∫ h (t − s ) X ( s ) ds
−∞ −∞
where we have assumed that the integrals exist in the mean square (m.s.) sense.
∞
EY ( t ) = E ∫ h ( s )X ( t − s ) ds
−∞
∞
= ∫ h ( s )EX ( t − s ) ds
−∞
∞
= ∫ h ( s )µ
−∞
X ds
∞
= µX ∫ h ( s )ds
−∞
= µ X H (0)
∫ h ( t )e ∫ h ( t )dt
− jω t
H (ω ) ω =0 = dt =
−∞ ω =0 −∞
E X(t + τ) Y(t) = E X(t + τ) ∫_{−∞}^{∞} h(s) X(t − s) ds
              = ∫_{−∞}^{∞} h(s) E X(t + τ) X(t − s) ds
              = ∫_{−∞}^{∞} h(s) R_X(τ + s) ds
              = ∫_{−∞}^{∞} h(−u) R_X(τ − u) du   [put s = −u]
              = h(−τ) * R_X(τ)
∴ R_XY(τ) = h(−τ) * R_X(τ)
Also R_YX(τ) = R_XY(−τ) = h(τ) * R_X(−τ) = h(τ) * R_X(τ)
Thus we see that RXY (τ ) is a function of lag τ only. Therefore, X ( t ) and Y ( t ) are
jointly wide-sense stationary.
∞
∴ EY ( t + τ )Y (t ) ) = E ∫ h ( s ) X ( t + τ − s ) dsY (t )
−∞
∞
=
−∞
∫ h(s) E X ( t + τ − s ) Y (t ) ds
∞
=
−∞
∫ h(s) R XY (τ − s ) ds
Thus the autocorrelation of the output process {Y ( t )} depends on the time-lag τ , i.e.,
EY ( t ) Y ( t + τ ) = RY (τ ) .
Thus
RY (τ ) = RX (τ ) * h (τ ) * h ( −τ )
The above analysis indicates that for an LTI system with WSS input
(1) the output is WSS and
(2) the input and output are jointly WSS.
The mean-square value (average power) of the output is
P_Y = R_Y(0) = [R_X(τ) * h(τ) * h(−τ)]|_{τ=0}
Power spectrum of the output process
Using the property of Fourier transform, we get the power spectral density of the output
process given by
S_Y(ω) = S_X(ω) H(ω) H*(ω) = S_X(ω) |H(ω)|²
Also
S_XY(ω) = H*(ω) S_X(ω)
and
S_YX(ω) = H(ω) S_X(ω)
(Fig.: R_X(τ) → h(−τ) → R_XY(τ) → h(τ) → R_Y(τ); equivalently S_X(ω) → H*(ω) → S_XY(ω) → H(ω) → S_Y(ω).)
Example:
(a) White noise process X(t) with power spectral density N_0/2 is input to an ideal low-pass filter of bandwidth B. Find the PSD and autocorrelation function of the output process.
(Fig.: ideal low-pass response H(ω) = 1 for |ω| ≤ B.)
The input process X(t) is white noise with power spectral density S_X(ω) = N_0/2.
S_Y(ω) = |H(ω)|² S_X(ω) = 1 × N_0/2 = N_0/2,   −B ≤ ω ≤ B
Taking the inverse Fourier transform, R_Y(τ) = (N_0 B/2π) sin Bτ/(Bτ).
The output PSD S_Y(ω) and the output autocorrelation function R_Y(τ) are illustrated in the Fig. below.
(Fig.: S_Y(ω) = N_0/2 on [−B, B].)
Example 2:
A random voltage modeled by a white noise process X(t) with power spectral density N_0/2 is input to the RC network shown in the Fig.
H(ω) = (1/jCω) / (R + 1/jCω) = 1/(1 + jRCω)
Therefore,
(a) S_Y(ω) = |H(ω)|² S_X(ω) = (1/(1 + R²C²ω²)) × N_0/2
(b) Taking the inverse Fourier transform,
R_Y(τ) = (N_0 / 4RC) e^{−|τ|/RC}
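The output power R_Y(0) = N_0/(4RC) can be checked with a discrete-time approximation of the RC filter. The sketch below is illustrative only; the step size, component values, noise intensity and seed are assumed, and the discretization is a simple Euler scheme valid for dt << RC.

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(6)
N0, R, C, dt, n = 1.0, 1.0, 1.0, 1e-3, 2_000_000

x = rng.normal(0.0, np.sqrt(N0 / (2 * dt)), size=n)   # discrete surrogate for white noise
a = dt / (R * C)
y = lfilter([a], [1.0, -(1.0 - a)], x)                # Euler discretization of RC: y' = (x - y)/(RC)

print("estimated R_Y(0):", y[n // 2:].var())          # discard the initial transient
print("theoretical     :", N0 / (4 * R * C))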
For discrete-time systems, the unit impulse sequence is
δ[n] = 1 for n = 0, and 0 otherwise.
Any discrete-time signal x[n] can be expressed in terms of δ[n] as follows:
x[n] = Σ_{k=−∞}^{∞} x[k] δ[n − k]
An analysis similar to that for the continuous-time LTI system shows that the response y[n] of a linear time-invariant system with impulse response h[n] to a deterministic input x[n] is given by
∞
y[ n] = ∑ x[ k ]h[ n − k ] = x[ n]* h[n]
k =−∞
Consider a discrete-time linear system with impulse response h[n] and WSS input X[n]:
(Fig.: X[n] → h[n] → Y[n].)
Y[n] = X[n] * h[n] = Σ_{k=−∞}^{∞} h[k] X[n − k]
Proceeding as in the continuous-time case,
R_Y[m] = E Y[n] Y[n − m]
       = E (X[n] * h[n])(X[n − m] * h[n − m])
       = R_X[m] * h[m] * h[−m]
and, in the frequency domain, S_Y(ω) = |H(ω)|² S_X(ω).
• Note that though the input may be an uncorrelated process, the output is in general a correlated process.
Consider the case of a discrete-time system with a random sequence x[n] as input.
R_Y[m] = R_X[m] * h[m] * h[−m]
Taking the z-transform, we get
S_Y(z) = S_X(z) H(z) H(z^{−1})
(Fig.: R_X[m] (S_X(z)) → H(z) → H(z^{−1}) → R_Y[m] (S_Y(z)).)
Example
If H(z) = 1/(1 − αz^{−1}), |α| < 1, and x[n] is a unity-variance white-noise sequence, then
S_X(z) = σ_X² = 1
and
S_Y(z) = H(z) H(z^{−1}) S_X(z) = 1/((1 − αz^{−1})(1 − αz))
By partial fraction expansion and inverse z-transform, we get
R_Y[m] = α^{|m|}/(1 − α²)
which is the autocorrelation of the output sequence.
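This result is easy to verify by filtering a white noise sequence. The sketch below is illustrative; the value of α, the sequence length and the seed are assumed.

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(7)
alpha, n = 0.6, 1_000_000

x = rng.normal(size=n)                      # unit-variance white noise
y = lfilter([1.0], [1.0, -alpha], x)        # H(z) = 1/(1 - alpha z^{-1})

for m in range(4):
    est = np.mean(y[m:] * y[:n - m])        # sample autocorrelation at lag m
    print(m, est, alpha**m / (1 - alpha**2))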
Spectral factorization: If S_X(ω) is an analytic function of ω and ∫_{−π}^{π} |ln S_X(ω)| dω < ∞, then the power spectrum can be factorized as
S_X(z) = σ_v² H_c(z) H_a(z)
where H_c(z) is the causal minimum-phase transfer function and H_a(z) = H_c(z^{−1}) is the corresponding anticausal maximum-phase transfer function.
(Fig.: X[n] → 1/H_c(z) → V[n], the innovation sequence.)
Define
c[k] = (1/2π) ∫_{−π}^{π} ln S_X(ω) e^{jωk} dω
as the kth-order cepstral coefficient. For a real signal, c[k] = c[−k], and
c[0] = (1/2π) ∫_{−π}^{π} ln S_X(ω) dω
Then
S_X(z) = e^{Σ_{k=−∞}^{∞} c[k] z^{−k}}
       = e^{c[0]} · e^{Σ_{k=1}^{∞} c[k] z^{−k}} · e^{Σ_{k=−∞}^{−1} c[k] z^{−k}}
Let
H_C(z) = e^{Σ_{k=1}^{∞} c[k] z^{−k}},   |z| > ρ
       = 1 + h_c(1) z^{−1} + h_c(2) z^{−2} + ...
(∵ h_c[0] = lim_{z→∞} H_C(z) = 1)
H_C(z) and ln H_C(z) are both analytic in |z| > ρ, so H_C(z) is a minimum-phase transfer function. Therefore
S_X(z) = σ_V² H_C(z) H_C(z^{−1})
where σ_V² = e^{c[0]}.
Salient points
• S_X(z) can be factorized into a minimum-phase factor and a maximum-phase factor, i.e. H_C(z) and H_C(z^{−1}).
• In general spectral factorization is difficult; however, for a signal with a rational power spectrum, spectral factorization can be done easily.
• Since H_C(z) is a minimum-phase filter, 1/H_C(z) exists and is stable. Therefore we can filter the given signal by 1/H_C(z) to obtain the innovation sequence.
• X[n] and V[n] are related through an invertible transform; so they contain the same information.
Example
Wold’s Decomposition
Any WSS signal X[n] can be decomposed as a sum of two mutually orthogonal processes, X[n] = X_r[n] + X_p[n], where
• X_r[n] is a regular process, which can be modeled as the output of a linear system with a white noise sequence as input.
• X p [n] is a predictable process, that is, the process can be predicted from its own