
Topical Notes on Probabilistic Analysis

Kazi Ashrafuzzaman∗

This note sheds light on a number of topics in probabilistic modeling and analysis, extending some of the discussions already held in class. It is also expected to complement the study materials provided earlier.

1 Probabilities in Aloha Network


We pose and study some problems of a probabilistic nature involving (slotted) Aloha, one of the earliest network protocols. In networked communication a protocol is basically a set of rules of engagement: the agreed-upon steps followed by the nodes comprising the network to accomplish a communication task. Aloha belongs to the random-access category of multi-access protocols, enabling common channel sharing among the nodes within the immediate neighborhood (the other major category is contention-free scheduled access, suitable for periodic packet traffic). The following simplified description of slotted Aloha should suffice for what we need to model and analyze the relevant problems.

• A large number of network nodes are sharing a common communications channel. We are assuming that there are n active nodes amongst them with data packets to transmit to other nodes (or to a common access point). Since everyone can hear each other, only one pair of nodes (sender and receiver) can effectively make use of the shared channel at a time.

• Time is divided into slots.¹ Each slot can accommodate a single packet, and transmission is allowed only at the beginning of a slot. We are assuming that the packets are all of the same size.

• The nodes act in a distributed fashion, without any centralized coordination. In other words, each node is unaware of what the other transmitting nodes are up to, and acts independently. Accordingly, if two or more nodes happen to transmit in a slot, they collide. In that case the transmitted packets don’t get through and the channel slot is wasted (recall that this is a shared channel).

• To reduce the chances of colliding transmissions, an active node transmits its packet in
a slot with probability p, and refrains from transmitting with probability 1 − p.

• If no node transmits in a channel slot, the slot goes idle, whereas the slot is successfully utilized whenever exactly one node transmits its packet in it.
∗For corrections and comments, shoot me an email at [email protected]
¹There is a variation of Aloha known as pure Aloha, where time is not slotted. The advantage of pure Aloha is that it does not require time synchronization among the nodes at slot boundaries. However, it is more vulnerable to collisions and loses efficiency by a factor of two compared to the slotted version. Refer to a standard networking textbook or its Wikipedia page.


The listing below illustrates the protocol with the slot-wise outcomes of one realization, out of the large number of ways the random occurrences could possibly unfold (eight nodes, labeled A through H, contending over 14 slots):

slot  outcome    transmitting nodes
  1   idle       none
  2   success    only A
  3   success    only D
  4   collision  C, F and H
  5   success    only C
  6   success    only F
  7   collision  C, E and G
  8   idle       none
  9   success    only C
 10   collision  D and F
 11   collision  B, E and G
 12   success    only D
 13   collision  C and H
 14   success    only E

Empirical rates of events of interest can be calculated for the above example as follows: idle rate = 2/14 ≈ 0.143, success rate = 7/14 = 0.5, collision rate = 5/14 ≈ 0.357 (clearly, these values, obtained from a run of only 14 slots, do not fully represent the long-run averages).
Since the success rate represents the proportion of time the channel is actually utilized, it is of particular interest and is known as the channel efficiency or throughput.
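For completeness, these empirical rates can be tallied mechanically; the short Python snippet below is only a sketch, with the outcome list transcribed directly from the table above.

outcomes = ["idle", "success", "success", "collision", "success", "success",
            "collision", "idle", "success", "collision", "collision",
            "success", "collision", "success"]
for kind in ("idle", "success", "collision"):
    count = outcomes.count(kind)
    print(f"{kind} rate = {count}/{len(outcomes)} = {count / len(outcomes):.3f}")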
In the above illustration we merely observed a realization of the process, without any need to know the value of p (in fact, the nodes may even have different values of p among themselves). In probabilistic analysis for a network of n nodes, however, a given value of p is sufficient to deduce the probabilities of the channel being idle, carrying a successful transmission, or seeing a collision (note that the probability of an arbitrary slot carrying a successful transmission is exactly the throughput).

Probabilities and expectations: The probabilities of idle, success and collision in a slot
are as follows:

Pr{idle} = Pr{all n nodes refrain from transmission} = (1 − p)^n

Pr{success} = Pr{exactly one of the n nodes transmits and the rest refrain} = np(1 − p)^{n−1}

Pr{collision} = Pr{two or more nodes transmit}
             = \sum_{k=2}^{n} Pr{k nodes transmit and the rest refrain} = \sum_{k=2}^{n} \binom{n}{k} p^k (1 − p)^{n−k}
             = 1 − (1 − p)^n − np(1 − p)^{n−1}

As the channel cannot observe any other event in a slot beyond these three, the last equality
for Pr{collision} is deduced from the normalization condition

Pr{idle} + Pr{success} + Pr{collision} = 1


Equivalently, the same result also follows from the binomial identity \sum_{k=0}^{n} \binom{n}{k} p^k (1 − p)^{n−k} = 1.
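These expressions are easy to sanity-check numerically. The sketch below is a minimal Monte Carlo simulation in plain Python (standard library only); the values n = 8 and p = 0.15 are illustrative choices, not prescribed by the protocol.

import random

def simulate_slots(n, p, num_slots=200_000, seed=1):
    # Estimate the idle/success/collision probabilities of slotted Aloha by
    # simulating num_slots independent slots with n active nodes, each
    # transmitting with probability p.
    random.seed(seed)
    counts = {"idle": 0, "success": 0, "collision": 0}
    for _ in range(num_slots):
        transmitters = sum(random.random() < p for _ in range(n))
        if transmitters == 0:
            counts["idle"] += 1
        elif transmitters == 1:
            counts["success"] += 1
        else:
            counts["collision"] += 1
    return {k: v / num_slots for k, v in counts.items()}

n, p = 8, 0.15                         # illustrative values
estimated = simulate_slots(n, p)
idle = (1 - p) ** n
success = n * p * (1 - p) ** (n - 1)
collision = 1 - idle - success
print("estimated:", estimated)
print("analytic : idle=%.4f success=%.4f collision=%.4f" % (idle, success, collision))

For such values the estimated and analytic probabilities should agree up to Monte Carlo noise.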
The following are some examples of interesting analytic questions that can be addressed in this setting.

Given that there is a collision in a slot, what is the expected number of nodes
involved in the collision? To deduce this conditional expectation, it helps to look at the
associated conditional probabilities first.

Pr{k nodes transmitted | collision} = Pr{k nodes transmitted ∩ collision} / Pr{collision}
                                    = \binom{n}{k} p^k (1 − p)^{n−k} / [1 − (1 − p)^n − np(1 − p)^{n−1}],    2 ≤ k ≤ n

Let C denote the random variable representing the number of nodes involved in a collision.
The required conditional expectation can now be computed as
E[C | collision] = \sum_{k=2}^{n} k Pr{C = k | collision}
                 = \sum_{k=2}^{n} k Pr{k nodes transmitted | collision}
                 = \sum_{k=2}^{n} k \binom{n}{k} p^k (1 − p)^{n−k} / [1 − (1 − p)^n − np(1 − p)^{n−1}]
                 = [np − np(1 − p)^{n−1}] / [1 − (1 − p)^n − np(1 − p)^{n−1}],

where the last step uses the identity \sum_{k=0}^{n} k \binom{n}{k} p^k (1 − p)^{n−k} = np and subtracts the k = 0 and k = 1 terms.
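As a quick check of the closed form, the defining sum and the final expression can be evaluated side by side; the sketch below uses illustrative parameter values, and the helper function name is arbitrary.

from math import comb

def expected_collision_size(n, p):
    # E[C | collision] computed two ways: directly from the defining sum,
    # and from the closed form derived above.
    p_collision = 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)
    by_sum = sum(k * comb(n, k) * p ** k * (1 - p) ** (n - k)
                 for k in range(2, n + 1)) / p_collision
    closed_form = (n * p - n * p * (1 - p) ** (n - 1)) / p_collision
    return by_sum, closed_form

print(expected_collision_size(8, 0.15))    # the two numbers should coincide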

What is the probability that at least one transmission occurs in a slot? This is the complement of the slot being idle, and it accounts for both successful and colliding slots. Hence

Pr{one or more nodes transmit} = 1 − Pr{idle} = 1 − (1 − p)^n.

Now try to work out the following questions, and think of others along similar lines:
What is the expected number of nodes that transmit in a given slot?
What is the probability that at least three nodes transmit in a given slot?
What is the probability that at most one node transmits in a given slot?

1.1 Independent & identically distributed random variables/processes


In many probabilistic models we specify a collection of random variables (RVs) as independent and identically distributed, often abbreviated as i.i.d. (or iid, or sometimes IID). In this case, the RVs are mutually independent as well as identically distributed. Being identically distributed requires that the RVs have the same distribution type with the same parameter values. The concept arises frequently when studying certain stochastic/random processes; e.g., a Bernoulli process is an i.i.d. process.
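As a concrete, minimal illustration, a Bernoulli process is nothing more than an i.i.d. sequence of Bernoulli(p) variables; the sketch below makes this explicit (the value p = 0.3 is an arbitrary choice).

import random

def bernoulli_process(p, length, seed=0):
    # Each element is an independent Bernoulli(p) draw, so the sequence is i.i.d.
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(length)]

print(bernoulli_process(0.3, 20))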


1.2 Convergence and probability laws


The strong law of large numbers (SLLN), stated below, is probably the most well-known result
in probability theory. It states that the average of a sequence of independent random variables
having the same distribution will, with probability 1, converge to the mean of that distribution.

Theorem 1 (Strong Law of Large Numbers). Let X_1, X_2, . . . be a sequence of independent random variables having a common distribution, and let E[X_i] = µ. Then, with probability 1,

(X_1 + X_2 + · · · + X_n)/n → µ as n → ∞.

Remark. Contrast the SLLN stated above with the weaker version discussed earlier in class. With the iid random variables X_i introduced, take S_n := \sum_{i=1}^{n} X_i. Now the SLLN says that

Pr{ lim_{n→∞} |S_n/n − µ| ≥ ϵ } = 0.

The weak law of large numbers, which comes from Chebyshev’s inequality, represents the following:

lim_{n→∞} Pr{ |S_n/n − µ| ≥ ϵ } = 0.
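Both laws concern the behaviour of the running average S_n/n. A small simulation along a single sample path illustrates it settling toward µ; here the X_i are Bernoulli(1/2) variables with µ = 1/2, purely as an illustrative choice.

import random

random.seed(2)
mu = 0.5                 # mean of a Bernoulli(1/2) variable
total, n = 0, 0
for target in (10, 100, 1_000, 10_000, 100_000):
    while n < target:
        total += random.random() < 0.5   # one more X_i along the same sample path
        n += 1
    print(f"n = {n:>6}: S_n/n = {total / n:.4f}, |S_n/n - mu| = {abs(total / n - mu):.4f}")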

Here we state the central limit theorem (CLT), arguably the most celebrated result in
probability theory.

Theorem 2 (Central limit theorem). Let X_1, X_2, . . . be a sequence of independent, identically distributed random variables, each with mean µ and variance σ². Then the distribution of

(X_1 + X_2 + · · · + X_n − nµ) / (σ√n)

tends to the standard normal N(0, 1) as n → ∞. That is,

Pr{ (X_1 + X_2 + · · · + X_n − nµ) / (σ√n) ≤ a } → (1/√(2π)) \int_{−∞}^{a} e^{−x²/2} dx.

Notice that the precise distribution of the iid random variables X_i does not matter for this convergence, which contributes to the significance of the theorem. It also helps explain why the normal distribution is observed all over the place, or, in other words, why the normal (i.e., Gaussian) distribution is "normal".
Let us illustrate how the CLT can be employed to analyze relevant problems.

Example 1. The lifetime of a special type of battery is a random variable with mean 40 hours
and standard deviation 20 hours. A battery is used until it fails, at which point it is replaced by
a new one. Assuming a stockpile of 25 such batteries, the lifetimes of which are independent,
approximate the probability that over 1100 hours of use can be obtained.
If we let Xi denote the lifetime of the ith battery to be put in use, then we desire to
determine p, which is given by

p = Pr{X1 + · · · + X25 > 1100}.


Clearly (here nµ = 25 · 40 = 1000 and σ√n = 20√25 = 100),

[X_1 + · · · + X_25 > 1100] ⇔ (X_1 + · · · + X_25 − 1000)/(20√25) > (1100 − 1000)/(20√25) = 1.

Now (X_1 + · · · + X_25 − 1000)/(20√25) can be approximated per the CLT as N(0, 1) (though n = 25 falls short of n → ∞, we get a useful approximation nonetheless). This allows the desired probability to be framed via the complementary CDF:

Pr{N(0, 1) > 1} = 1 − Pr{N(0, 1) ≤ 1} = 1 − Φ(1) ≈ 0.1587.

Therefore p ≈ 0.1587. □
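The normal-approximation step is easy to reproduce numerically; the sketch below evaluates 1 − Φ(1) using only the standard library. It reproduces the approximation, not the exact probability, which would require knowing the actual lifetime distribution.

from math import erf, sqrt

def phi(a):
    # Standard normal CDF Φ(a)
    return 0.5 * (1 + erf(a / sqrt(2)))

mean, sd, n, threshold = 40, 20, 25, 1100
z = (threshold - n * mean) / (sd * sqrt(n))   # (1100 - 1000) / (20 * 5) = 1
print("z =", z, " p ≈", 1 - phi(z))           # ≈ 0.1587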

1.3 Covariance and correlation


Covariance is a generalization of variance that provides a measure of how two random variables vary together.

Definition 1. The covariance of two random variables X and Y is

Cov(X, Y ) := E[(X − E[X])(Y − E[Y ])].    (1)

This also yields Var[X] = Cov(X, X) as a special case. □

The following equivalent formulation of covariance follows immediately by expanding and


applying linearity of expectation to equation (1).

Lemma 3.
Cov(X, Y ) = E[X · Y ] − E[X] · E[Y ]. (2)

Another simple consequence of linearity of expectation is:

Lemma 4. For a, b, c, d ∈ R,

Cov(aX + b, cY + d) = ac Cov(X, Y ). (3)

Exercise 1. Do the algebra to fill in the necessary detail to formally prove the two lemmas above. ☺
Since covariance can vary widely, the correlation coefficient, defined as

ρ(X, Y ) := Cov(X, Y ) / √(Var[X] Var[Y ]),

is often computed for convenience as a normalized measure of the covariance of random variables X and Y. Notice that the normalization implies that −1 ≤ ρ(X, Y ) ≤ 1.
Exercise 2. Think about what a positive or a negative value of ρ(X, Y ) signifies. If X is a
discrete uniform random variable taking values from {−1, 0, 1}, and Y = 1/(2 + X), what
connection do you find between ρ(X, Y ) and the way Y changes with respect to X? ☺


Definition 2. Two variables are said to be uncorrelated if and only if their covariance (or,
equivalently, correlation coefficient) is zero. By equation (2), we can also put it another way:
X and Y are uncorrelated if and only if

E[X · Y ] = E[X] · E[Y ]. (4)


We know that if X and Y are independent, then equation (4) holds, and hence X and Y are uncorrelated. But the converse is not true: there are uncorrelated random variables that are not independent. In other words, if we denote independence between X and Y by X ⊥ Y, then

X ⊥ Y =⇒ [ Cov(X, Y ) = 0 ],  but  [ Cov(X, Y ) = 0 ] ⇏ X ⊥ Y.    (5)

Example 2. Let X be a random variable taking values −1, 0, 1 with a uniform distribution.
So E[X] = 0. Now let Y be the indicator variable for the event [X = 0]. So Y = 0 if and only
if X ̸= 0, and hence XY = 0. So
E[XY ] = E[0] = 0 = 0 · (1/3) = E[X]E[Y ],

confirming that X and Y are uncorrelated.
Also, X and Y are obviously not independent, since

Pr{X = 0 | Y = 0} = 0 ≠ 1/3 = Pr{X = 0}.
There are also examples of uncorrelated but dependent variables whose expectations are
nonzero. For example, by equation (3), Cov(X + 2, Y + 1) = Cov(X, Y ), so X + 2 and Y + 1
are also uncorrelated and only take positive values. □
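A direct numerical check of Example 2, computed exhaustively over the three equally likely values of X (no sampling involved), looks as follows; the helper function name is of course arbitrary.

def expectation(f, values, probs):
    return sum(pr * f(v) for v, pr in zip(values, probs))

xs, probs = [-1, 0, 1], [1/3, 1/3, 1/3]
y = lambda v: 1 if v == 0 else 0            # Y is the indicator of the event [X = 0]

ex  = expectation(lambda v: v, xs, probs)          # E[X] = 0
ey  = expectation(y, xs, probs)                    # E[Y] = 1/3
exy = expectation(lambda v: v * y(v), xs, probs)   # E[XY] = 0
print("Cov(X, Y) =", exy - ex * ey)                # 0.0, so X and Y are uncorrelated
print("Pr{X=0 | Y=0} =", 0.0, "vs Pr{X=0} =", 1/3) # yet clearly dependent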
Exercise 3. Let X, a random variable, take values uniformly from the discrete set {−1, 0, 1}. Define Y = |X|. Clearly, then, X and Y are not independent (Y is a function of X, after all). Find Cov(X, Y ) and show how the implications in (5) fare in this case. ☺
For correlated variables, variances do not simply add (linearity fails), but covariance provides exactly the right corrective term.
Theorem 5 (General Variance Additivity). For any random variables X, Y ,

Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov(X, Y ). (6)


Correlation, or more specifically the lack of it, is therefore important because it is sufficient for variances to obey linearity. That is, Cov(X, Y ) = 0 in equation (6) yields

Var[X + Y ] = Var[X] + Var[Y ]. (7)

The above linearity can be extended to n random variables via straightforward induction. A set of variables X_1, X_2, . . . , X_n is said to be pairwise uncorrelated if and only if X_i and X_j are uncorrelated (i.e., Cov(X_i, X_j) = 0) for all i ≠ j.
Theorem 6. If X1 , X2 , . . . , Xn are pairwise uncorrelated random variables, then

Var[X1 + X2 + · · · + Xn ] = Var[X1 ] + Var[X2 ] + · · · + Var[Xn ]. (8)
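Equation (6), and by extension (8), can be verified empirically. The sketch below builds two correlated variables from common randomness (the particular construction X = Z, Y = Z + W is just an illustration) and compares both sides of (6) on simulated data.

import random

random.seed(3)
N = 200_000
z = [random.gauss(0, 1) for _ in range(N)]
w = [random.gauss(0, 1) for _ in range(N)]
x = z                                       # X = Z
y = [zi + wi for zi, wi in zip(z, w)]       # Y = Z + W, correlated with X through Z

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

s = [xi + yi for xi, yi in zip(x, y)]
print("Var[X + Y]                    ≈", var(s))
print("Var[X] + Var[Y] + 2 Cov(X, Y) ≈", var(x) + var(y) + 2 * cov(x, y))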


1.4 Chernoff bound


The Chernoff bound comes in a number of flavors. The form we already used is equivalent to the other forms in which it is reported elsewhere. In particular, the following form, for an RV X with E[X] = µ, is also widely used.
Pr(X ≥ (1 + δ)µ) ≤ ( e^δ / (1 + δ)^{1+δ} )^µ,    δ > −1.

Exercise 4. Do the required algebra to establish the equivalence between the above and the following form we already employed:

Pr(X ≥ cµ) ≤ e^{−(c ln c − c + 1)µ},    c > 0.
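Without doing the algebra of Exercise 4, one can at least see the bound in action numerically. The sketch below assumes X is a Binomial(n, p) sum of independent indicators, a standard setting in which this multiplicative Chernoff bound holds, and compares the bound with the exact tail probability; the values n = 100 and p = 0.1 are arbitrary.

from math import comb, exp, log

def binom_tail(n, p, k):
    # Exact Pr(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

n, p = 100, 0.1
mu = n * p                                   # E[X] = 10
for c in (1.5, 2.0, 3.0):
    exact = binom_tail(n, p, int(c * mu))    # c*mu happens to be an integer here
    bound = exp(-(c * log(c) - c + 1) * mu)
    print(f"c = {c}: exact tail = {exact:.6f}, Chernoff bound = {bound:.6f}")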

1.5 Probability/Moment generating functions and transforms


You are supposed to have studied ordinary generating functions in discrete mathematics. We are going to appeal briefly to your familiarity with that analytic toolbox, as we seek to plug a PMF into it.
Let X be a nonnegative integer-valued random variable. Define G, the probability generating function (PGF) of X, by

G(t) = E[t^X],    0 ≤ t ≤ 1.    (9)

If p(·) is the probability mass function of X, then G has the power series form

G(t) = \sum_{n=0}^{∞} p(n) t^n.    (10)

Note that the condition 0 ≤ t ≤ 1 is to prevent the power series from diverging.
Putting t = e^s in the above equations, we get an exponential function M_X(s) such that

M_X(s) = G(e^s) = E[e^{sX}] = 1 + sE[X] + (s²/2!)E[X²] + (s³/3!)E[X³] + · · · .

We call M_X(s) the moment generating function (MGF), or transform, of X.
In this case X can be a general random variable as well (i.e., it need not take nonnegative integer values). Specifically,

M_X(s) = E[e^{sX}] = { \sum_x e^{sx} p(x)                      if X is discrete,
                     { \int_{−∞}^{∞} e^{sx} f_X(x) dx          if X is continuous.

The MGF is moment generating for the following reason. All of the moments, i.e., the expectations E[X^k] of the powers of X, appear on the right-hand side, and we can sift out whichever moment we need with a cute trick. If we want E[X], we can differentiate both sides with respect to s and set s = 0, killing all the terms but E[X]. In symbols,

d/ds [M_X(s)] = E[X] + sE[X²] + (s²/2!)E[X³] + · · ·


If we wanted E[X²], we could just take another derivative to get

d²/ds² [M_X(s)] = E[X²] + sE[X³] + · · ·

and set s = 0. In general, if we take n derivatives, we find

d^n/ds^n [M_X(s)] = E[X^n] + sE[X^{n+1}] + · · ·

from which setting s to 0 gives us the nth moment.
Not only are MGFs great for easily finding the moments of X, they also make it easy to deal with convolutions of random variables (think about finding the PDF/PMF of Z = X + Y, where X and Y are two independent random variables).
With s ≥ 0 and a nonnegative random variable X, the Laplace transform g(s) is the MGF evaluated at −s, i.e., g(s) = M_X(−s) = E[e^{−sX}]. There are cases where the Laplace transform is particularly convenient to work with.

1.5.1 MGF examples


Let’s get our hands dirty and find the MGFs of some common distributions.

Example 3 (Exponential MGF). Suppose we have an RV X ∼ Exp(λ), which has PDF f_X(x) = λe^{−λx} for x ≥ 0. Then

E[e^{sX}] = \int_0^{∞} e^{sx} f_X(x) dx = λ \int_0^{∞} e^{−λx} e^{sx} dx = λ [ e^{−(λ−s)x} / (−(λ−s)) ]_0^{∞} = λ/(λ − s),

where s < λ must hold in order for the integral to converge.

Using this we can get E[X] = M_X′(0) = λ/(λ − s)² |_{s=0} = 1/λ, and E[X²] = M_X′′(0) = 2/λ². □
Example 4 (Poisson MGF). Now let’s do the same for the Poisson distribution. Let X ∼ Poisson(λ), so that Pr(X = k) = e^{−λ}λ^k/k! for k ≥ 0. Then

M_X(s) = E[e^{sX}] = \sum_{k=0}^{∞} e^{sk} e^{−λ}λ^k/k! = e^{−λ} \sum_{k=0}^{∞} (λe^s)^k/k! = e^{−λ+λe^s}.

From this we can calculate M_X′(0) = λ and M_X′′(0) = λ² + λ. □
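Both examples can be reproduced symbolically by differentiating the MGFs, mirroring the moment-extraction recipe above. The sketch below assumes the sympy library is available.

import sympy as sp

s, lam = sp.symbols('s lambda', positive=True)

M_exp = lam / (lam - s)                 # MGF of Exp(lambda), valid for s < lambda
M_poi = sp.exp(lam * (sp.exp(s) - 1))   # MGF of Poisson(lambda)

for name, M in (("Exponential", M_exp), ("Poisson", M_poi)):
    first = sp.simplify(sp.diff(M, s, 1).subs(s, 0))    # E[X]
    second = sp.simplify(sp.diff(M, s, 2).subs(s, 0))   # E[X^2]
    print(name, ": E[X] =", first, ", E[X^2] =", second)
# Expected: 1/lambda and 2/lambda**2 for the exponential; lambda and lambda**2 + lambda
# (possibly printed as lambda*(lambda + 1)) for the Poisson.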

Exercise 5. Try to determine the MGF of random variables that have the following distribu-
tions: Uniform, Gaussian, Binomial and Geometric. ☺
