Maths Assignment

This document provides information about probability distributions. It discusses the binomial distribution and derives its mean and variance. It also discusses the Poisson distribution and derives its mean. MATLAB code is provided to calculate and plot the probability mass functions (pmfs) of the binomial and Poisson distributions. As the number of trials (n) increases, the binomial distribution converges to the Poisson distribution, as evidenced by the decreasing mean squared error between their pmfs.

Uploaded by

maxamed

Week 2: Probability review

Bernoulli, binomial, Poisson, and normal distributions


Solutions

A Binomial distribution. To evaluate the mean and variance of a binomial RV $B_n$ with
parameters $(n, p)$, we will rely on the relation between the binomial and the Bernoulli. First,
let $\{L_i\}_{i=1,\dots,n}$ be independent Bernoulli RVs with probability of success $p$. Then, the
expected value of a single $L_i$ can be computed as

\[
E[L_i] = 1 \cdot p + 0 \cdot (1 - p) = p. \tag{1}
\]

By definition, the expected value is the sum of the possible outcomes weighted by their
probabilities. Using the fact that $B_n$ is the sum of $n$ independent and identically distributed
(i.i.d.) Bernoulli RVs $L_i$, we can derive its expected value from (1) by linearity:

\begin{align*}
B_n = \sum_{i=1}^{n} L_i \;\Rightarrow\; E[B_n] &= E\left[\sum_{i=1}^{n} L_i\right] = \sum_{i=1}^{n} E[L_i] && \text{[from the linearity of expectation]} \\
&= \sum_{i=1}^{n} p = np.
\end{align*}

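We can proceed in a similar way for the variance. First, we compute some second-order moments
of Bernoulli RVs. Explicitly,

\[
E[L_i^2] = 1^2 \cdot p + 0^2 \cdot (1 - p) = p,
\]
\[
E[L_i L_j] = E[L_i] \cdot E[L_j] = p \cdot p = p^2 \quad (i \neq j),
\]

where we recall that $L_i$ and $L_j$ are independent for $i \neq j$. Then,

\begin{align*}
\mathrm{var}(B_n) &= E\left[(B_n - E[B_n])^2\right] = E[B_n^2] - (E[B_n])^2 = E\left[\left(\sum_{i=1}^{n} L_i\right)\left(\sum_{j=1}^{n} L_j\right)\right] - (np)^2 \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} E[L_i L_j] - n^2 p^2 = np + n(n-1)p^2 - n^2 p^2 = np(1 - p).
\end{align*}

The double sum contains $n$ terms with $i = j$, each equal to $E[L_i^2] = p$, and $n(n-1)$ terms
with $i \neq j$, each equal to $p^2$.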
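As a quick sanity check on the two results above, the mean and variance can also be computed numerically from the pmf. The snippet below is only a sketch: the values n = 10 and p = 0.3 are arbitrary choices for illustration, not part of the assignment.

```matlab
% Numerical check of E[B_n] = n*p and var(B_n) = n*p*(1-p).
% n = 10 and p = 0.3 are arbitrary illustrative values (assumptions).
n = 10;
p = 0.3;
k = 0:n;

% Binomial pmf evaluated on the full support 0, 1, ..., n
coeffs = arrayfun(@(kk) nchoosek(n, kk), k);
pmf = coeffs .* p.^k .* (1 - p).^(n - k);

% Mean and variance computed directly from the pmf
mean_num = sum(k .* pmf);
var_num = sum((k - mean_num).^2 .* pmf);

fprintf('mean: %.6f (np      = %.6f)\n', mean_num, n*p);
fprintf('var:  %.6f (np(1-p) = %.6f)\n', var_num, n*p*(1 - p));
```

Both printed pairs should agree up to floating-point rounding.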

A MATLAB script that computes the values of the binomial pmf is presented below.

% Takes vector k and scalars n and p and returns a vector of the same size
% as k, whose entries represent the probability of a binomial RV with
% parameters n and p at the points of the elements of k.

function pmf = binomial_pmf(k, n, p)

    pmf = zeros(size(k)); % Initialize output

    for i = 1:length(k)
        % Check that the element of k is in the support of the binomial(n,p).
        % Otherwise, pmf = 0.
        if (k(i) >= 0) && (k(i) <= n)
            pmf(i) = nchoosek(n, k(i)) * p^k(i) * (1-p)^(n-k(i));
        end
    end

end

You could write similar code replacing nchoosek with the function factorial, but be aware
that factorial is only exact for arguments up to 21. This is because MATLAB represents numbers in
double precision, which has roughly 15 "usable" digits. The function nchoosek uses different
approximations for large n.
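One common way to sidestep large factorials altogether is to evaluate the pmf in log space with gammaln, which is part of base MATLAB. The sketch below is an alternative to the approach in binomial_pmf.m, not the method used in these solutions; n = 200 and p = 0.025 are illustrative values chosen so that factorial(n) would overflow to Inf in double precision.

```matlab
% Binomial pmf evaluated in log space to avoid large factorials.
% n = 200 and p = 0.025 are arbitrary illustrative values (assumptions).
n = 200;
p = 0.025;
k = 0:n;

% log of the binomial coefficient: log C(n,k) = log n! - log k! - log (n-k)!
log_coeff = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1);

% log pmf, then back to the linear scale
log_pmf = log_coeff + k .* log(p) + (n - k) .* log(1 - p);
pmf = exp(log_pmf);

fprintf('sum of pmf over the support: %.12f\n', sum(pmf));
```

Because only logarithms of the factorials are formed, the intermediate quantities stay well within double-precision range even when n is in the thousands.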
We can use binomial_pmf.m to plot the pmfs and cdfs requested in part A. Notice that we use
stem to show the pmf and stairs for the cdf, since these are discrete RVs.

% Delete all variables and close figures
clear all
close all

n_vector = [6, 10, 20, 50];

for i = 1:length(n_vector)
    n = n_vector(i);
    p = 5/n; % E[B_n] = 5

    % pmf in panel i of figure 1
    h1 = figure(1);
    subplot(2,2,i);
    stem(0:n, binomial_pmf(0:n, n, p), '.');
    title(['n = ' num2str(n)]);
    xlabel('k');
    ylabel('pmf');
    grid;
    xlim([0,50]);
    ylim([0,0.5]);

    % cdf (cumulative sum of the pmf) in panel i of figure 2
    h2 = figure(2);
    subplot(2,2,i);
    stairs(0:n, cumsum(binomial_pmf(0:n, n, p)), 'LineWidth', 2);
    title(['n = ' num2str(n)]);
    xlabel('k');
    ylabel('cdf');
    grid;
    xlim([0,50]);
    ylim([0,1]);
end

%%% Export figure %%%
% export_fig is a third-party function and is not built into MATLAB.
set(h1,'Color','w');
export_fig('-q101', '-pdf', 'HW2_A.pdf', h1);
set(h2,'Color','w');
export_fig('-q101', '-pdf', '-append', 'HW2_A.pdf', h2);

The pmfs and cdfs are shown in Figures 1 and 2.

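B Binomial and Poisson distributions. The expected value of a Poisson RV $P$ with parameter
$\lambda$ is given by

\begin{align*}
E[P] &= \sum_{k=0}^{\infty} k \cdot \frac{e^{-\lambda} \lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=0}^{\infty} \frac{k \lambda^{k-1}}{k!} = \lambda e^{-\lambda} \frac{d}{d\lambda} \left( \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} \right) \\
&= \lambda e^{-\lambda} \frac{d}{d\lambda} \left( e^{\lambda} \right) = \lambda e^{-\lambda} e^{\lambda} = \lambda.
\end{align*}

Notice that the first equality in the second line comes from recognizing that the infinite sum is
the Taylor series of the exponential.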
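The derivation can be checked numerically by truncating the infinite sum. This is a minimal sketch: the truncation point k = 100 is an assumption, chosen because the Poisson tail beyond it is negligible for λ = 5.

```matlab
% Numerical check that the mean of a Poisson RV with parameter lambda is lambda.
lambda = 5;
k = 0:100;  % truncation of the infinite sum (assumption: the tail is negligible)

poisson_pmf = exp(-lambda) * (lambda.^k) ./ factorial(k);

total = sum(poisson_pmf);          % should be very close to 1
mean_num = sum(k .* poisson_pmf);  % should be very close to lambda

fprintf('sum of pmf: %.12f\n', total);
fprintf('mean:       %.12f (lambda = %g)\n', mean_num, lambda);
```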
The MATLAB code to plot the Poisson distribution is shown below.

% Delete all variables and close figures
clear all
close all

lambda = 5;
k = 0:50;

poisson_pmf = exp(-lambda) * (lambda.^k) ./ factorial(k);
% Note that we use the "dot" to perform elementwise operations and evaluate
% the pmf for points in k at once.

figure();
stem(k, poisson_pmf, '.');
title(['Poisson pmf with \lambda = ' num2str(lambda)]);
xlabel('k');
ylabel('pmf');
grid;

%%% Export figure %%%
% export_fig is a third-party function and is not built into MATLAB.
set(gcf,'Color','w');
export_fig('-q101', '-pdf', 'HW2_B1.pdf');

The result can be found in Figure 3. Note the similarity with Figure 1 for large n.
To calculate the MSE, we first check a few values of the Poisson pmf to find the cut-off point for
the exercise. Simple trial and error shows that we need only consider values of k ∈ [3, 9]. Hence,
we can use the following code to evaluate and plot the MSE.

% Delete all variables and close figures
clear all
close all

lambda = 5;
k = 3:9;

poisson_pdf = exp(-lambda)*(lambda.^k) ./ factorial(k);

n_vector = [6 10 20 50];
mse = zeros(length(n_vector), 1);

for i = 1:length(n_vector)
    n = n_vector(i);
    % Squared difference between the two pmfs, weighted by the Poisson pmf.
    % binomial_pmf returns 0 for any k > n, so n = 6 is handled correctly.
    mse(i) = sum( (binomial_pmf(k, n, lambda/n) - poisson_pdf).^2 .* poisson_pdf );
end

figure();
plot(n_vector, mse, 'x', 'LineWidth', 2);
title('MSE between Poisson and binomial pmfs')
xlabel('n');
ylabel('MSE');
grid;

disp(mse);

%%% Export figure %%%
% export_fig is a third-party function and is not built into MATLAB.
set(gcf,'Color','w');
export_fig('-q101', '-pdf', 'HW2_B2.pdf');

The resulting plot is shown in Figure 4. Values are reported in Table 1.

Table 1: MSE between the pmf of a Poisson with λ = 5 and binomials with parameters (n, λ/n)
for n = 6, 10, 20, 50.

n     MSE
6     0.01684
10    0.00168
20    0.00025
50    0.00003

The table above shows that the MSE between the binomial and Poisson distributions rapidly
decreases to zero as n increases, indicating that the distributions become increasingly similar for
large n.

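C Binomial and Poisson distributions again. We start by writing out the pmf of $B_n$ with
$p = \lambda/n$. Explicitly,

\[
P[B_n = k] = \frac{n!}{(n-k)!\,k!}\, p^k (1 - p)^{n-k} = \frac{n!}{(n-k)!\,k!} \left( \frac{\lambda}{n} \right)^{k} \left( 1 - \frac{\lambda}{n} \right)^{n-k}.
\]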

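Then, notice that we can rearrange the first two terms to read

\begin{align*}
P[B_n = k] &= \frac{\lambda^k}{k!} \cdot \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \left( 1 - \frac{\lambda}{n} \right)^{n-k} \\
&= \frac{\lambda^k}{k!} \cdot \frac{n}{n} \cdot \frac{(n-1)}{n} \cdot \frac{(n-2)}{n} \cdots \frac{(n-k+1)}{n} \left( 1 - \frac{\lambda}{n} \right)^{n-k}.
\end{align*}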

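Finally, we can take the limit as $n \to \infty$. Recall that the limit of a product is the product
of the limits as long as all limits exist, which is the case here:

\begin{align*}
\lim_{n \to \infty} P[B_n = k] &= \frac{\lambda^k}{k!} \lim_{n \to \infty} \frac{n}{n} \cdot \frac{(n-1)}{n} \cdot \frac{(n-2)}{n} \cdots \frac{(n-k+1)}{n} \left( 1 - \frac{\lambda}{n} \right)^{n-k} \\
&= \frac{\lambda^k}{k!} \, (1 \times 1 \times \cdots \times 1) \lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{n} \lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{-k}.
\end{align*}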

The last limit is simply $1^{-k} = 1$. Also, by definition, we have

\[
\lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{n} = e^{-\lambda}.
\]

Thus,

\[
\lim_{n \to \infty} P[B_n = k] = \frac{\lambda^k}{k!} e^{-\lambda} = P[P = k].
\]
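The limit can also be observed numerically. The sketch below compares the binomial pmf with parameters (n, λ/n) against the Poisson pmf on k = 0, ..., 10 for increasing n. The grid of n values and the range of k are assumptions made for illustration, and the binomial pmf is evaluated in log space with gammaln so that large n poses no problem.

```matlab
% Largest pointwise gap between binomial(n, lambda/n) and Poisson(lambda) pmfs.
lambda = 5;
k = 0:10;                          % range of k compared (assumption)
n_vector = [10, 100, 1000, 10000]; % illustrative values of n (assumption)

poisson_pmf = exp(-lambda) * (lambda.^k) ./ factorial(k);
max_err = zeros(size(n_vector));

for i = 1:length(n_vector)
    n = n_vector(i);
    p = lambda / n;

    % Binomial pmf in log space: log C(n,k) + k*log(p) + (n-k)*log(1-p)
    log_pmf = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1) ...
        + k .* log(p) + (n - k) .* log(1 - p);

    max_err(i) = max(abs(exp(log_pmf) - poisson_pmf));
end

% One row per n: [n, largest absolute difference]
disp([n_vector(:), max_err(:)]);
```

The second column should shrink as n grows, consistent with the limit derived above.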

D Binomial and normal distributions. If $Z_n$ is a normal RV with zero mean and unit
variance (also known as a standard normal), then from

\[
Z_n = \frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma \sqrt{n}},
\]

it holds that $Y_n = \sum_{i=1}^{n} Y_i$ is also normally distributed but with mean $n\mu = np$ and
variance $n\sigma^2$, where $\mu = p$ and $\sigma^2 = p(1 - p)$ are the mean and variance of a single
Bernoulli $Y_i$. The MATLAB code to display this approximation is given below.

% Delete all variables and close figures
clear all
close all

n_vector = [10 20 50];
p = 0.5;

for i = 1:length(n_vector)
    n = n_vector(i);

    Yn_mean = n*p;
    Yn_var = n*p*(1-p);
    k = 0:n;

    figure(1);
    subplot(3,1,i);
    stairs(k, cumsum(binomial_pmf(k, n, p)), 'LineWidth', 2);
    hold on
    % normcdf requires the Statistics and Machine Learning Toolbox
    stairs(k, normcdf(k, Yn_mean, sqrt(Yn_var)), 'LineWidth', 2);
    title(['n=',num2str(n)]);
    xlabel('k');
    ylabel('cdf');
    legend('Binomial', 'Normal');
    grid;
end

%%% Export figure %%%
% export_fig is a third-party function and is not built into MATLAB.
set(gcf,'Color','w');
export_fig('-q101', '-pdf', 'HW2_D.pdf');

The resulting plot can be found in Figure 5.
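To put a number on what Figure 5 shows, one can measure the largest gap between the two cdfs at the integer points. The sketch below is an illustration rather than part of the requested solution. It writes the normal cdf with erfc so that no toolbox is needed, and it applies no continuity correction, which is a simplifying assumption.

```matlab
% Largest gap between the binomial cdf and its normal approximation.
p = 0.5;
n_vector = [10, 20, 50];
max_err = zeros(size(n_vector));

for i = 1:length(n_vector)
    n = n_vector(i);
    k = 0:n;

    % Binomial cdf as the cumulative sum of the pmf
    coeffs = arrayfun(@(kk) nchoosek(n, kk), k);
    binom_cdf = cumsum(coeffs .* p.^k .* (1 - p).^(n - k));

    % Normal cdf with mean n*p and variance n*p*(1-p), written with erfc
    z = (k - n*p) ./ sqrt(n*p*(1 - p));
    normal_cdf = 0.5 * erfc(-z ./ sqrt(2));

    max_err(i) = max(abs(binom_cdf - normal_cdf));
end

% One row per n: [n, largest absolute difference between the cdfs]
disp([n_vector(:), max_err(:)]);
```

The gap shrinks slowly with n because the binomial cdf jumps at each integer while the normal cdf is continuous.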

E Normal and Poisson approximations. I'll provide you with a hint for this part: The
Poisson limit theorem is about counting a large number of increasingly improbable events. In
particular, note that for the distribution of a sum of i.i.d. Bernoulli RVs (i.e., a binomial RV) to
converge to a Poisson distribution with mean λ, the probability of success of each Bernoulli trial
must be p = λ/n, which goes to zero as n → ∞. On the other hand, p is fixed for the CLT. Hence,
the CLT and the Poisson limit theorem address fundamentally different limits. I leave more
contemplation on this matter to you!
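One way to explore the hint numerically is to take the rare-event regime (large n, p = λ/n) and compare how well each approximation matches the binomial pmf. Everything in the sketch below is an assumption made for illustration: the values n = 1000 and λ = 5, the range of k, and the use of the normal density at integer points as a stand-in for a pmf.

```matlab
% Rare-event regime: binomial(n, lambda/n) against Poisson and normal fits.
lambda = 5;
n = 1000;        % illustrative value (assumption)
p = lambda / n;
k = 0:20;        % range of k compared (assumption)

% Binomial pmf in log space to avoid large factorials
log_pmf = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1) ...
    + k .* log(p) + (n - k) .* log(1 - p);
binom_pmf = exp(log_pmf);

% Poisson pmf with mean lambda
poisson_pmf = exp(-lambda) * (lambda.^k) ./ factorial(k);

% Normal density with matching mean and variance, evaluated at the integers
mu = n*p;
sigma2 = n*p*(1 - p);
normal_pdf = exp(-(k - mu).^2 ./ (2*sigma2)) ./ sqrt(2*pi*sigma2);

err_poisson = max(abs(binom_pmf - poisson_pmf));
err_normal = max(abs(binom_pmf - normal_pdf));

fprintf('max error, Poisson approximation: %.6f\n', err_poisson);
fprintf('max error, normal approximation:  %.6f\n', err_normal);
```

Repeating the experiment with p held fixed while n grows is a natural next step for comparing the two limits.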

[Figure 1 placeholder: four panels (n = 6, 10, 20, 50), each plotting pmf against k.]

Figure 1: Binomial pmf for n = 6, 10, 20, 50 and p = 5/n (part A).

[Figure 2 placeholder: four panels (n = 6, 10, 20, 50), each plotting cdf against k.]

Figure 2: Binomial cdf for n = 6, 10, 20, 50 and p = 5/n (part A).

[Figure 3 placeholder: single panel titled "Poisson pmf with λ = 5", plotting pmf against k for k = 0 to 50.]

Figure 3: The pmf of a Poisson RV with λ = 5 (part B). Note that the support of the Poisson
distribution is the whole non-negative integer line. However, we display only the first 50 points.

[Figure 4 placeholder: single panel titled "MSE between Poisson and binomial pmfs", plotting MSE against n.]

Figure 4: MSE between the pmf of a Poisson with λ = 5 and binomials with parameters (n, λ/n)
for n = 6, 10, 20, 50 (part B).

[Figure 5 placeholder: three panels (n = 10, 20, 50), each plotting the binomial and normal cdfs against k.]

Figure 5: Cumulative distribution function of the binomial RV Yn for n = 10, 20, 50 and its
approximation by a normal cdf (part D).
