Stat Note
Kazi Ashrafuzzaman∗
This note sheds light on a number of topics in probabilistic modeling and analysis, extending some of the discussions already held during the classes. It is also expected to complement the study materials provided earlier.
• Time is divided into slots¹. Each slot can accommodate a single packet, and transmission is allowed only at the beginning of a slot. We assume that all packets are of the same size.
• The nodes act in a distributed fashion, without any centralized coordination. In other words, each node is unaware of what the other transmitting nodes are up to, and acts independently. Accordingly, if two or more nodes happen to transmit in a slot, they collide. In that case the transmitted packets do not get through and the channel slot is wasted (recall that this is a shared channel).
• To reduce the chances of colliding transmissions, an active node transmits its packet in
a slot with probability p, and refrains from transmitting with probability 1 − p.
• If no node transmits in a channel slot it goes idle, whereas the slot is successfully utilized
whenever there is one and only one node transmitting its packet in it.
∗ For corrections and comments, shoot me an email at [email protected]
¹ There is a variation of Aloha known as pure Aloha, where time is not slotted. The advantage of pure Aloha is that it does not require time synchronization among the nodes at slot boundaries. However, it brings extra vulnerability to collisions and loses efficiency by a factor of two compared to the slotted version. Refer to a standard networking textbook or its Wikipedia page.
There is an illustration of the protocol in the diagram below, alongside the listing of the slot-wise outcomes. Note that this is one realization out of the large number of ways the random occurrences can possibly unfold (with eight nodes contending for 14 slots):
[Figure: slot-by-slot realization of slotted Aloha with eight nodes over 14 slots, each slot marked idle, success, or collision.]
Empirical rates of events of interest can be calculated for the above example as follows: idle rate = 2/14 ≈ 0.143, success rate = 7/14 = 0.5, collision rate = 5/14 ≈ 0.357 (clearly, these values, obtained from a run of only 14 slots, do not fully represent what would emerge as long-run averages).
Since the success rate represents the proportion of time the channel is actually utilized, it is of particular interest and is known as the channel efficiency or throughput.
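To make the example concrete, here is a minimal Python simulation sketch of the protocol. The node count, number of slots, transmission probability p, and seed below are illustrative assumptions, not values fixed by the note.

```python
import random

def simulate_slotted_aloha(n_nodes=8, n_slots=14, p=0.2, seed=1):
    """Simulate one realization of slotted Aloha and return empirical event rates."""
    random.seed(seed)
    counts = {"idle": 0, "success": 0, "collision": 0}
    for _ in range(n_slots):
        # Each node independently transmits in the slot with probability p.
        transmitters = sum(1 for _ in range(n_nodes) if random.random() < p)
        if transmitters == 0:
            counts["idle"] += 1
        elif transmitters == 1:
            counts["success"] += 1
        else:
            counts["collision"] += 1
    return {event: c / n_slots for event, c in counts.items()}

print(simulate_slotted_aloha())                 # one short 14-slot run, as in the example
print(simulate_slotted_aloha(n_slots=100000))   # a long run approaches the true probabilities
```

As the long run shows, the empirical rates settle near fixed values, which the probabilistic analysis below computes exactly.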
In the above illustration we merely observed a realization of the process, without any need to know the value of p (in fact, the nodes may even use different values of p among themselves). In a probabilistic analysis of a network of n nodes, however, a given value of p suffices for us to deduce the probabilities of the channel being idle, carrying a successful transmission, or suffering a collision (note that the probability of an arbitrary slot carrying a successful transmission is the throughput).
Probabilities and expectations: The probabilities of idle, success and collision in a slot are as follows:
$$\Pr\{\text{idle}\} = (1-p)^n, \qquad \Pr\{\text{success}\} = np(1-p)^{n-1}, \qquad \Pr\{\text{collision}\} = 1 - (1-p)^n - np(1-p)^{n-1}.$$
As no event other than these three can occur in a slot, the last equality, the one for Pr{collision}, is deduced from the normalization condition: the three probabilities must sum to one. Equivalently, that is also due to the Binomial identity
$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1.$$
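As a quick sanity check of the formulas above, the following sketch evaluates the three slot probabilities and verifies the normalization condition numerically; the values of n and p are arbitrary illustrative choices.

```python
from math import comb

def slot_probabilities(n, p):
    """Pr{idle}, Pr{success}, Pr{collision} for n contending nodes, transmit probability p."""
    idle = (1 - p) ** n
    success = n * p * (1 - p) ** (n - 1)
    collision = 1 - idle - success
    return idle, success, collision

n, p = 8, 0.2
idle, success, collision = slot_probabilities(n, p)
assert abs(idle + success + collision - 1) < 1e-12      # normalization condition
# Equivalently, the Binomial identity: sum_k C(n,k) p^k (1-p)^(n-k) = 1.
assert abs(sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)) - 1) < 1e-12
print(idle, success, collision)
```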
The following are some examples of interesting analytic questions that can be addressed in this setting.
Given that there is a collision in a slot, what is the expected number of nodes
involved in the collision? To deduce this conditional expectation, it helps to look at the
associated conditional probabilities first.
Let C denote the random variable representing the number of nodes involved in a collision.
The required conditional expectation can now be computed as
$$\begin{aligned}
E\{C \mid \text{collision}\} &= \sum_{k=2}^{n} k \Pr\{C = k \mid \text{collision}\} \\
&= \sum_{k=2}^{n} k \Pr\{k \text{ nodes transmitted} \mid \text{collision}\} \\
&= \sum_{k=2}^{n} \frac{k \binom{n}{k} p^k (1-p)^{n-k}}{1 - (1-p)^n - np(1-p)^{n-1}} \\
&= \frac{np - np(1-p)^{n-1}}{1 - (1-p)^n - np(1-p)^{n-1}}.
\end{aligned}$$
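The closed form above can be cross-checked numerically against the defining conditional sum; the values of n and p below are arbitrary illustrative choices.

```python
from math import comb

def expected_colliders(n, p):
    """E{C | collision}: direct conditional sum vs. the closed form derived above."""
    pr_collision = 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)
    direct = sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2, n + 1)) / pr_collision
    closed = (n * p - n * p * (1 - p) ** (n - 1)) / pr_collision
    return direct, closed

print(expected_colliders(8, 0.2))   # the two values should coincide
```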
What is the probability that at least one transmission occurs in a slot? This is the complementary event to the slot being idle, with both colliding and successful slots accounted for. Hence
$$\Pr\{\text{at least one transmission}\} = 1 - \Pr\{\text{idle}\} = 1 - (1-p)^n.$$
Now try to work out the following questions and think about similar others:
What is the expected number of nodes that transmit in a given slot?
What is the probability that at least three nodes transmit in a given slot?
What is the probability that at most one node transmits in a given slot?
$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty.$$
■
Remark. Contrast the SLLN stated above with the weaker version discussed earlier in the class. With the iid random variables $X_i$ introduced, take $S_n \stackrel{\text{def}}{=} \sum_{i=1}^{n} X_i$. Now SLLN says that
$$\Pr\left\{ \lim_{n\to\infty} \left| \frac{S_n}{n} - \mu \right| \ge \epsilon \right\} = 0.$$
The weak law of large numbers, which comes from Chebyshev's inequality, represents the following:
$$\lim_{n\to\infty} \Pr\left\{ \left| \frac{S_n}{n} - \mu \right| \ge \epsilon \right\} = 0.$$
■
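A small simulation can illustrate the law of large numbers at work; the Bernoulli distribution, sample sizes, and seed below are arbitrary choices made purely for illustration.

```python
import random

def running_mean(mu=0.5, n=100000, seed=0):
    """Print the sample mean of iid Bernoulli(mu) variables at increasing sample sizes."""
    random.seed(seed)
    s = 0
    for i in range(1, n + 1):
        s += 1 if random.random() < mu else 0
        if i in (10, 100, 1000, 10000, 100000):
            print(i, s / i)   # the running average drifts toward mu

running_mean()
```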
Here we state the central limit theorem (CLT), arguably the most celebrated result in probability theory.
Theorem (CLT). Let $X_1, X_2, \ldots$ be iid random variables with mean $\mu$ and variance $\sigma^2$. Then, for every $x$,
$$\Pr\left\{ \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le x \right\} \longrightarrow \Phi(x) \quad \text{as } n \to \infty,$$
where $\Phi$ denotes the CDF of the standard normal distribution $N(0,1)$. ■
Notice that the precise distribution of these iid random variables $X_i$ does not matter for the said convergence. That contributes to the significance of the theorem. This might as well explain why the normal distribution is observed all over the place, or, in other words, why the normal (i.e., Gaussian) distribution is normal.
Let us illustrate how CLT can be employed to analyze relevant problems.
Example 1. The lifetime of a special type of battery is a random variable with mean 40 hours
and standard deviation 20 hours. A battery is used until it fails, at which point it is replaced by
a new one. Assuming a stockpile of 25 such batteries, the lifetimes of which are independent,
approximate the probability that over 1100 hours of use can be obtained.
If we let $X_i$ denote the lifetime of the $i$th battery to be put in use, then we desire to determine $p$, which is given by
$$p = \Pr\{X_1 + X_2 + \cdots + X_{25} > 1100\}.$$
Clearly,
$$[X_1 + \cdots + X_{25} > 1100] \iff \left[ \frac{X_1 + \cdots + X_{25} - 1000}{20\sqrt{25}} > \frac{1100 - 1000}{20\sqrt{25}} = 1 \right].$$
Now $\frac{X_1 + \cdots + X_{25} - 1000}{20\sqrt{25}}$ can be approximated, per the CLT, as $N(0,1)$ (though $n = 25$ falls well short of $n \to \infty$, we get a useful approximation nonetheless). This allows the desired probability to be framed as the complementary CDF
$$p \approx \Pr\{N(0,1) > 1\} = 1 - \Phi(1).$$
Therefore p ≈ 0.1587. □
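The same answer can be reproduced numerically. The normal-approximation value is just $1 - \Phi(1)$; a Monte Carlo cross-check requires choosing some lifetime distribution with mean 40 and standard deviation 20 (the example fixes no distribution, so the Gamma(shape=4, scale=10) below is purely an assumption made to match those two numbers).

```python
import random
from math import erf, sqrt

# Normal approximation from the CLT: p ~ Pr{N(0,1) > 1} = 1 - Phi(1).
def phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

print(1 - phi(1.0))   # approximately 0.1587

# Monte Carlo cross-check with an assumed Gamma(4, 10) lifetime (mean 40, std 20).
random.seed(0)
trials = 100000
hits = sum(sum(random.gammavariate(4, 10) for _ in range(25)) > 1100 for _ in range(trials))
print(hits / trials)  # lands in the vicinity of 0.16
```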
Lemma 3.
Cov(X, Y ) = E[X · Y ] − E[X] · E[Y ]. (2)
■
Lemma 4. For $a, b, c, d \in \mathbb{R}$,
$$\mathrm{Cov}(aX + b,\, cY + d) = ac\,\mathrm{Cov}(X, Y). \qquad (3)$$
Exercise 1. Do the algebra to fill in the necessary detail to formally prove the two lemmas above. ☺
Since covariance can vary widely in magnitude, the correlation coefficient, defined as
$$\rho(X, Y) \stackrel{\text{def}}{=} \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}},$$
is often used instead as a normalized measure; it always lies between $-1$ and $1$.
Definition 2. Two variables are said to be uncorrelated if and only if their covariance (or, equivalently, correlation coefficient) is zero. By equation (2), we can also put it another way: $X$ and $Y$ are uncorrelated if and only if
$$E[XY] = E[X]\,E[Y]. \qquad (4)$$
□
We know that if $X$ and $Y$ are independent, then equation (4) holds, and hence $X$ and $Y$ are uncorrelated. But the converse is not true: there are uncorrelated random variables that are not independent. In other words, if we denote independence between $X$ and $Y$ by $X \perp Y$,
$$X \perp Y \;\Longrightarrow\; \mathrm{Cov}(X, Y) = 0, \qquad \mathrm{Cov}(X, Y) = 0 \;\not\Longrightarrow\; X \perp Y. \qquad (5)$$
Example 2. Let $X$ be a random variable taking values $-1, 0, 1$ with a uniform distribution, so $E[X] = 0$. Now let $Y$ be the indicator variable for the event $[X = 0]$. So $Y = 0$ if and only if $X \ne 0$, and hence $XY = 0$. Thus
$$E[XY] = E[0] = 0 = 0 \cdot \tfrac{1}{3} = E[X]\,E[Y],$$
confirming that X and Y are uncorrelated.
Also, X and Y are obviously not independent, since
$$\Pr\{X = 0 \mid Y = 0\} = 0 \ne \tfrac{1}{3} = \Pr\{X = 0\}.$$
There are also examples of uncorrelated but dependent variables whose expectations are
nonzero. For example, by equation (3), Cov(X + 2, Y + 1) = Cov(X, Y ), so X + 2 and Y + 1
are also uncorrelated and only take positive values. □
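The claims in Example 2 can be verified by direct enumeration over the three equally likely values of $X$; the helper names in the sketch below are ad hoc.

```python
from fractions import Fraction

# Exact check of Example 2 over the three equally likely values of X.
values = [-1, 0, 1]
prob = Fraction(1, 3)

def expect(f):
    return sum(prob * f(x) for x in values)

E_X  = expect(lambda x: x)                         # 0
E_Y  = expect(lambda x: 1 if x == 0 else 0)        # 1/3
E_XY = expect(lambda x: x * (1 if x == 0 else 0))  # 0
print(E_XY == E_X * E_Y)   # True: Cov(X, Y) = 0, so X and Y are uncorrelated
# Dependence: Pr{X = 0 | Y = 0} = 0, yet Pr{X = 0} = 1/3.
```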
Exercise 3. Let X, a random variable, take values uniformly from the discrete set {−1, 0, 1}.
Define Y = |X|. Clearly, then, X and Y are not independent (Y is a function of X, after all).
Find the Cov(X, Y ) and show how the implications (5) fare in this case. ☺
For correlated variables, variances do not simply add, i.e., linearity fails; covariance, however, provides just the right corrective term.
Theorem 5 (General Variance Additivity). For any random variables $X, Y$,
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y). \qquad (6)$$
■
Correlation, or more specifically the lack of it, is therefore important because it is sufficient for variances to obey linearity. That is, setting $\mathrm{Cov}(X, Y) = 0$ in equation (6) yields
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).$$
The above linearity can be extended to $n$ random variables via straightforward induction. A set of variables $X_1, X_2, \ldots, X_n$ is said to be pairwise uncorrelated if and only if $X_i$ and $X_j$ are uncorrelated (i.e., $\mathrm{Cov}(X_i, X_j) = 0$) for all $i \ne j$.
Theorem 6. If $X_1, X_2, \ldots, X_n$ are pairwise uncorrelated random variables, then
$$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n).$$
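A quick empirical illustration of this additivity: the three independent (hence pairwise uncorrelated) samples below use arbitrarily chosen distributions, purely for demonstration.

```python
import random

def var(values):
    """Population variance of a list of samples."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Independent, hence pairwise uncorrelated, samples from three arbitrary distributions.
random.seed(0)
N = 100000
X = [random.uniform(0, 1) for _ in range(N)]
Y = [random.gauss(0, 2) for _ in range(N)]
Z = [random.expovariate(1.0) for _ in range(N)]
S = [x + y + z for x, y, z in zip(X, Y, Z)]
print(var(S), var(X) + var(Y) + var(Z))   # the two numbers should nearly agree
```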
Exercise 4. Do the required algebra to yield equivalence between the above and the following
form we already employed
If $p(\cdot)$ is the probability mass function of $X$, taking values in the nonnegative integers, the power series $G$ is as follows:
$$G(t) = \sum_{n=0}^{\infty} p(n)\, t^n. \qquad (10)$$
Note that the condition 0 ≤ t ≤ 1 is to prevent the power series from diverging.
Putting $t = e^s$ in the above equations, we get an exponential function $M_X(s)$ such that
$$M_X(s) = G(e^s) = E[e^{sX}] = 1 + sE[X] + \frac{s^2}{2!}E[X^2] + \frac{s^3}{3!}E[X^3] + \cdots.$$
We call MX (s) the moment generating function (MGF), or transform, of X.
In this case X can be a general random variable as well (i.e., it need not take nonnegative
integer values). Specifically,
$$M_X(s) = E[e^{sX}] = \begin{cases} \displaystyle\sum_{x} e^{sx}\, p(x) & \text{if } X \text{ is discrete,} \\[1.5ex] \displaystyle\int_{-\infty}^{\infty} e^{sx} f_X(x)\, dx & \text{if } X \text{ is continuous.} \end{cases}$$
The MGF is moment generating for the following reason. All of the moments, i.e., the expectations of the powers $X^k$, appear on the right-hand side, and we can sift out whichever moment we need with a cute trick. If we want $E[X]$, we can differentiate both sides with respect to $s$ and set $s = 0$, killing all the terms but $E[X]$. In symbols,
$$\frac{d}{ds} M_X(s) = E[X] + sE[X^2] + \frac{s^2}{2!}E[X^3] + \cdots.$$
For $E[X^2]$, we differentiate once more:
$$\frac{d^2}{ds^2} M_X(s) = E[X^2] + sE[X^3] + \cdots$$
and set s = 0. In general, if we take n derivatives, we find
$$\frac{d^n}{ds^n} M_X(s) = E[X^n] + sE[X^{n+1}] + \cdots,$$
from which setting s to 0 gives us the nth moment.
Not only do MGFs make it easy to find the moments of $X$, they also make it easy to deal with convolutions of random variables (think about finding the PDF/PMF of $Z$, given by $Z = X + Y$, where $X$ and $Y$ are two independent random variables).
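As an illustration of that convolution property, the following symbolic sketch (assuming sympy is available) checks a standard fact: the product of the MGFs of independent Bernoulli(p) trials equals the Binomial MGF of their sum.

```python
import sympy as sp

s, p = sp.symbols('s p', positive=True)
M_bernoulli = 1 - p + p * sp.exp(s)   # MGF of a single Bernoulli(p) trial
M_sum = M_bernoulli ** 3              # product of MGFs = MGF of the sum of 3 independent trials

# Directly, the Binomial(3, p) MGF is sum_k C(3,k) p^k (1-p)^(3-k) e^{sk}.
M_binomial = sum(sp.binomial(3, k) * p**k * (1 - p)**(3 - k) * sp.exp(s * k) for k in range(4))
print(sp.expand(M_sum - M_binomial))   # 0: the two transforms coincide
```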
For a nonnegative random variable $X$ and $s \ge 0$, the Laplace transform is the MGF evaluated at $-s$: $g(s) = M_X(-s) = E[e^{-sX}]$. There are cases where the Laplace transform is particularly convenient to deal with.
Example 3 (Exponential MGF). Suppose we have a RV X ∼ Exp(λ) which has PDF fX (x) =
λe−λx for x ≥ 0. Then
$$E[e^{sX}] = \int_0^{\infty} e^{sx} f_X(x)\, dx = \lambda \int_0^{\infty} e^{-\lambda x} e^{sx}\, dx = \lambda \left. \frac{e^{-(\lambda - s)x}}{-(\lambda - s)} \right|_0^{\infty} = \frac{\lambda}{\lambda - s}, \qquad s < \lambda.$$
Expanding this as a power series for $s < \lambda$,
$$\frac{\lambda}{\lambda - s} = \sum_{k=0}^{\infty} \left(\frac{s}{\lambda}\right)^k = \sum_{k=0}^{\infty} \frac{k!}{\lambda^k} \cdot \frac{s^k}{k!},$$
and comparing with the moment expansion of $M_X(s)$ gives $E[X^k] = k!/\lambda^k$. □
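The moment-extraction trick can also be checked symbolically on this MGF (again assuming sympy is available): differentiating $k$ times and setting $s = 0$ should recover $E[X^k] = k!/\lambda^k$.

```python
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
M = lam / (lam - s)   # MGF of Exp(lam), valid for s < lam

# k-th moment = k-th derivative of the MGF evaluated at s = 0.
for k in range(1, 4):
    moment = sp.diff(M, s, k).subs(s, 0)
    print(k, sp.simplify(moment))   # 1/lam, 2/lam**2, 6/lam**3, i.e. k!/lam**k
```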
Exercise 5. Try to determine the MGF of random variables that have the following distribu-
tions: Uniform, Gaussian, Binomial and Geometric. ☺