
Chapter 1: Basic Concepts

STAT2602B Probability and Statistics II


(2024-2025 2nd Semester)
Contents

1.1 Discrete distribution

1.2 Continuous distribution

1.3 Empirical distribution

1.4 Expectation
1.1 Discrete distribution
I Definition: Let x be a realization of a discrete random
variable X ∈ R. Then,

f (x) = P(X = x)

is the discrete probability density function (discrete p.d.f.) or


probability mass function of X .
I Theorem 1.1 (Discrete univariate probability density
function) A discrete univariate probability density function
has the following properties:
(1) f(x) > 0 for x ∈ S;
(2) ∑_{x∈S} f(x) = 1;
(3) P(X ∈ A) = ∑_{x∈A} f(x), where A ⊂ S.
Then,
f (x) = P(X = x)
is the probability density function (p.d.f.) of X .
Based on the p.d.f. f (x), we further define cumulative distribution
function.
1.1 Discrete distribution
I Definition: we define the cumulative distribution function
(c.d.f.) F (x) of the discrete random variable X by
F(x) = P(X ≤ x) = ∑_{s∈S, s≤x} f(s).

Note that F(x) is a step function on R, and the height of the step at x, x ∈ S, equals the probability f(x).
From Theorem 1.1, we can obtain the following properties of F(x).
I Theorem 1.2 (Discrete cumulative distribution function) A
discrete univariate cumulative distribution function has the
following properties:
(1) 0 ≤ F (x) ≤ 1 for x ∈ R;
(2) F (x) is a nondecreasing function of x;
(3) F (∞) = 1 and F (−∞) = 0.
Remark 1.1 The discrete p.d.f. f(x) and the c.d.f. F(x) are in one-to-one correspondence. We can first define the c.d.f. F(x), and then define the discrete p.d.f. f(x) by
f(x) = F(x) − F(x−) for x ∈ S,
where F(x−) denotes the left limit of F at x.
1.1 Discrete distribution

I Example: A discrete random variable X with p.d.f.


f (x) = P(X = x) = x/6 for x = 1, 2, 3

[Figure 1: the p.d.f. f(x); Figure 2: the c.d.f. F(x)]


1.1 Discrete distribution

I Property 1.1 Two discrete random variables X and Y are


independent if and only if F(x, y) = FX(x)FY(y) for all
(x, y) ∈ S, where F is the joint c.d.f. of X and Y, and FX
(or FY) is the marginal c.d.f. of X (or Y).
I Property 1.2 Let X and Y be two independent discrete
random variables. Then,
(a) for arbitrary countable sets A and B,

P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B);

(b) for any real functions g (·) and h(·), g (X ) and h(Y ) are
independent.
1.2 Continuous distribution

I Definition: Let X ∈ R be a continuous random variable.


The probability that X lies in an interval (a, b] is
P(a < X ≤ b) = ∫_a^b f(x) dx

for some non-negative function f (·). We call f (x) the p.d.f.


of the continuous random variable X .
I Theorem 1.3 (Continuous univariate probability density
function) A continuous univariate probability density function
has the following properties:
(1) f(x) ≥ 0 for x ∈ R;
(2) ∫_R f(x) dx = 1;
(3) P(X ∈ A) = ∫_A f(x) dx for A ⊂ R.
Based on the p.d.f. f (x), we further define cumulative distribution
function of continuous random variable.
1.2 Continuous distribution
I Definition: We define the cumulative distribution function of
continuous random variable X by
F(x) = P(X ≤ x) = ∫_{−∞}^x f(s) ds,

which also satisfies Theorem 1.2. By the fundamental theorem of calculus, F′(x) = f(x) wherever the derivative exists. Since there are no steps or jumps in a continuous c.d.f., it must be true that P(X = b) = 0 for every real value of b.
I Example 1.1 (Uniform distribution) A random variable X
has a uniform distribution if
f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

Briefly, we say that X ∼ U(a, b).


1.2 Continuous distribution
I Property 1.3 If F is a continuous c.d.f. and X ∼ U(0, 1),
then Y = F −1 (X ) ∼ F .
Proof.

P(Y ≤ y ) = P(F −1 (X ) ≤ y ) = P(X ≤ F (y )) = F (y ).


Note that this property helps us generate a random variable from a given distribution.
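As an illustration, here is a minimal Python sketch of this inverse-transform idea. It assumes the Exp(1) distribution as the target (so F(y) = 1 − e^{−y} and F⁻¹(x) = −log(1 − x)); the seed and sample size are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)  # X ~ U(0, 1)
y = -np.log(1.0 - x)                     # Y = F^{-1}(X) should follow Exp(1)

print(y.mean())  # close to 1, the mean of the Exp(1) distribution
```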
I Example 1.2 (Normal distribution) A random variable X has a
normal distribution if
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)) for x ∈ R,

where µ ∈ R is the location parameter and σ > 0 is the scale


parameter. Briefly, we say that X ∼ N(µ, σ 2 ).
1.2 Continuous distribution

I Example 1.2 (con’t) Further, Z = (X − µ)/σ ∼ N(0, 1) (the


standard normal distribution), and the c.d.f. of Z is typically
denoted by Φ(x), where
Φ(x) = P(Z ≤ x) = ∫_{−∞}^x (1/√(2π)) exp(−s²/2) ds.

Numerical approximations for Φ(x) have been well tabulated


in practice.
1.2 Continuous distribution

I Property 1.4 If the p.d.f. of a continuous random variable X


is fX(x) for x ∈ R, then the p.d.f. of Y = aX + b for a > 0 is
fY(x) = (1/a) fX((x − b)/a) for x ∈ R.
I Quick proof of Property 1.4 Let FX (x) be the c.d.f. of X .
Then, the c.d.f. of Y is
FY(x) = P(Y = aX + b ≤ x) = P(X ≤ (x − b)/a) = FX((x − b)/a)

for x ∈ R. Hence,
fY(x) = FY′(x) = (1/a) FX′((x − b)/a) = (1/a) fX((x − b)/a).

This completes the proof.


1.2 Continuous distribution

I Property 1.5 Two continuous random variables X and Y are


independent if and only if

F(x, y) = FX(x)FY(y) for all (x, y) ∈ R².

I Property 1.6 Let X and Y be two independent continuous


random variables. Then,
(a) for arbitrary intervals A and B,

P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B);

(b) for any real functions g (·) and h(·), g (X ) and h(Y ) are
independent.
1.3 Empirical distribution

I Brief Introduction: Suppose that X ∼ F (x) is a random


variable resulting from a random experiment. Repeating this experiment n independent times, we obtain n random variables X1, · · · , Xn associated with these outcomes. The collection of
these random variables is called a sample from a distribution
with c.d.f. F (x) (or p.d.f. f (x)). The number n is called the
sample size.
As all random variables in a sample follow the same c.d.f. as X, we expect that they give us information about the c.d.f. of X.
Next, we are going to show that the empirical distribution of
{X1 , · · · , Xn } is close to F (x) in some probability sense.
1.3 Empirical distribution

I Definition: The empirical distribution of {X1 , · · · , Xn } is


defined as
Fn(x) = (1/n) ∑_{k=1}^n I(Xk ≤ x)

for x ∈ R, where I(A) is an indicator function such that


I(A) = 1 if A holds and I(A) = 0 otherwise.
Fn (x) assigns the probability 1/n to each Xk , and we can check
that it satisfies Theorem 1.2 (please do it by yourself). Since Fn (x)
is the relative frequency of the event X ≤ x, it is an approximation
of the probability P(X ≤ x) = F (x). Thus, the following result is
expected.
1.3 Empirical distribution

I Theorem 1.4 (Glivenko-Cantelli) As n → ∞, sup_{x∈R} |Fn(x) − F(x)| → 0 almost


surely (a.s.).
Remark The proof of the aforementioned theorem is omitted. A rough explanation of this theorem can be found in the lecture notes. More details about almost sure convergence can be found in the book Probability: Theory and Examples by Rick Durrett.
I Fn (x) provides an estimate of the c.d.f. F (x) for each
realization x1 , . . . , xn .
I Example: In the figure below, the black step function is the empirical distribution function Fn(x) based on a realization x1, . . . , xn with Xi ∼ N(0, 1), and the red solid line is the c.d.f. Φ(x) of N(0, 1).
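A minimal Python sketch of this comparison, assuming a simulated N(0, 1) sample purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.standard_normal(200)        # X_1, ..., X_n with n = 200

def ecdf(sample, x):
    """Empirical distribution F_n(x) = (1/n) * sum_k I(X_k <= x)."""
    return np.mean(np.asarray(sample) <= x)

print(ecdf(sample, 0.0))  # roughly 0.5, since Phi(0) = 0.5 for N(0, 1)
```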
1.3 Empirical distribution

I Example 1.3 Let X denote the number of observed heads


when four coins are tossed independently and at random.
Recall that the distribution of X is B(4, 1/2). One thousand
repetitions of this experiment (actually simulated on the
computer) yielded the following results:
Number of heads Frequency
0 65
1 246
2 358
3 272
4 59
1.3 Empirical distribution
I Example 1.3 (con’t) The information above determines the following empirical distribution function:
x F1000 (x) x F1000 (x)
(−∞, 0) 0.000 [2, 3) 0.669
[0, 1) 0.065 [3, 4) 0.941
[1, 2) 0.311 [4, ∞) 1.000
The graphs of the empirical distribution function F1000(x) and the theoretical distribution function F(x) of the binomial distribution are very close (please check this yourself).
I Example 1.4 The following numbers are a random sample of
size 10 from some distribution:

−0.49, 0.90, 0.76, −0.97, −0.73, 0.93, −0.88, −0.75, 0.88, 0.96.

(a) Write down the empirical distribution function; (b) use the empirical distribution to estimate P(X ≤ −0.5) and P(−0.5 ≤ X ≤ 0.5).
1.3 Empirical distribution

I Example 1.4 Solution Order the random sample:

−0.97, −0.88, −0.75, −0.73, −0.49, 0.76, 0.88, 0.90, 0.93, 0.96.

Then, the empirical distribution function F10 (x) is as follows:


x F10 (x) x F10 (x)
(−∞, −0.97) 0.0 [−0.49, 0.76) 0.5
[−0.97, −0.88) 0.1 [0.76, 0.88) 0.6
[−0.88, −0.75) 0.2 [0.88, 0.90) 0.7
[−0.75, −0.73) 0.3 [0.90, 0.93) 0.8
[−0.73, −0.49) 0.4 [0.93, 0.96) 0.9
[0.96, ∞) 1.0
Thus, P(X ≤ −0.5) = F (−0.5) ≈ F10 (−0.5) = 0.4 and
P(−0.5 ≤ X ≤ 0.5) = F (0.5) − F (−0.5) ≈
F10 (0.5) − F10 (−0.5) = 0.5 − 0.4 = 0.1.
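The same estimates can be reproduced numerically; the Python sketch below (an illustration only, not part of the original example) evaluates F10 directly from the ten observations.

```python
import numpy as np

data = np.array([-0.49, 0.90, 0.76, -0.97, -0.73,
                 0.93, -0.88, -0.75, 0.88, 0.96])

def F10(x):
    # empirical c.d.f. of the sample of size 10
    return np.mean(data <= x)

print(F10(-0.5))              # 0.4, the estimate of P(X <= -0.5)
print(F10(0.5) - F10(-0.5))   # 0.1, the estimate of P(-0.5 <= X <= 0.5)
```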
1.3 Empirical distribution
I Question: How can we estimate the p.d.f. f(x)?
Explanation: Apply the relative frequency histogram. For a discrete random variable X, we can estimate f(x) = P(X = x) by the relative frequency of occurrences of x. That is,
f(x) ≈ fn(x) = (1/n) ∑_{k=1}^n I(Xk = x).
I Reuse of Example 1.3 The relative frequency of observing x = 0, 1, 2, 3 or 4 is listed in the second column; it is close to the value of f(x), the p.d.f. of B(4, 1/2).
x f1000 (x) f (x)
0 0.065 0.0625
1 0.246 0.2500
2 0.358 0.3750
3 0.272 0.2500
4 0.059 0.0625
As n increases, the difference between fn(x) and f(x) becomes small, as the simulation sketch below illustrates.
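A minimal simulation sketch of this comparison (the random seed and the use of NumPy's binomial sampler are assumptions made here only for illustration):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
n = 1000
heads = rng.binomial(4, 0.5, size=n)   # number of heads in each of n repetitions

for x in range(5):
    f_n = np.mean(heads == x)          # relative frequency f_n(x)
    f = comb(4, x) * 0.5 ** 4          # exact p.d.f. of B(4, 1/2)
    print(x, round(float(f_n), 3), f)
```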
1.3 Empirical distribution - Relative frequency histogram
The following points explain how to define the relative frequency histogram and its properties.
I Class intervals For the continuous random variable X , we
first define the so-called class intervals. Choose an integer
l ≥ 1, and a sequence of real numbers c0 , c1 , · · · , cl such that
c0 < c1 < · · · < cl . The class intervals are

(c0 , c1 ], (c1 , c2 ], · · · , (cl−1 , cl ].

Roughly speaking, the class intervals form a non-overlapping partition of the interval [Xmin, Xmax]. As f(x) = F′(x), we expect that when cj−1 and cj are close,
f(x) ≈ (F(cj) − F(cj−1)) / (cj − cj−1) for x ∈ (cj−1, cj], j = 1, 2, · · · , l.
1.3 Empirical distribution - Relative frequency histogram
Note that
F(cj) − F(cj−1) = P(X ∈ (cj−1, cj]) ≈ (1/n) ∑_{k=1}^n I(Xk ∈ (cj−1, cj]),
the relative frequency of occurrences of Xk ∈ (cj−1, cj]. Thus, we can approximate f(x) by
f(x) ≈ hn(x) = ∑_{k=1}^n I(Xk ∈ (cj−1, cj]) / (n(cj − cj−1)),
for x ∈ (cj−1, cj], j = 1, 2, · · · , l.
We call hn (x) the relative frequency histogram.
Clearly, the way that we define the class intervals is not unique,
and hence the value of hn (x) is not unique. When the sample size
n is large and the length of the class interval is small, hn (x) is
expected to be a good estimate of f (x).
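A minimal Python sketch of hn(x); the N(0, 1) sample and the equally spaced class boundaries are assumptions chosen only to illustrate the computation.

```python
import numpy as np

def rel_freq_histogram(sample, boundaries, x):
    """Estimate f(x) by h_n(x) on the class interval (c_{j-1}, c_j] containing x."""
    sample = np.asarray(sample)
    boundaries = np.asarray(boundaries)
    j = np.searchsorted(boundaries, x, side="left")
    if j == 0 or j == len(boundaries):
        return 0.0                                   # x lies outside (c_0, c_l]
    lo, hi = boundaries[j - 1], boundaries[j]
    count = np.sum((sample > lo) & (sample <= hi))   # occurrences in (c_{j-1}, c_j]
    return count / (len(sample) * (hi - lo))

rng = np.random.default_rng(3)
data = rng.standard_normal(5000)
cuts = np.linspace(-4.0, 4.0, 41)                    # 40 class intervals of length 0.2
print(rel_freq_histogram(data, cuts, 0.0))           # close to f(0) = 1/sqrt(2*pi) ~ 0.399
```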
1.3 Empirical distribution - Relative frequency histogram

I Properties of hn (x)
(i) hn (x) ≥ 0 for all x;
(ii) The total area bounded by the x axis and below hn(x) equals one, i.e.,
∫_{c0}^{cl} hn(x) dx = 1;
(iii) The probability of an event A, which is composed of a union of class intervals, can be estimated by the area above A bounded by hn(x), i.e.,
P(A) ≈ ∫_A hn(x) dx.
1.3 Empirical distribution - Relative frequency histogram
I Example 1.5 A random sample of 50 college-bound high
school seniors yielded the following high school cumulative
GPA’s.
3.77 2.78 3.40 2.20 3.26
3.00 2.85 2.65 3.08 2.92
3.69 2.83 2.75 3.97 2.74
2.90 3.38 2.38 2.71 3.31
3.92 3.29 4.00 3.50 2.80
3.57 2.84 3.18 3.66 2.86
2.81 3.10 2.84 2.89 2.59
2.95 2.77 3.90 2.82 3.89
2.83 2.28 3.20 2.47 3.00
3.78 3.48 3.52 3.20 3.30
(a) Construct a frequency table for these 50 GPA’s using 10
intervals of equal length with c0 = 2.005 and c10 = 4.005.
(b) Construct a relative frequency histogram for the grouped
data.
(c) Estimate f (3) and f (4).
1.3 Empirical distribution - Relative frequency histogram
I Example 1.5 Solution (a) and (b). The frequency and the
relative frequency histogram based on the class intervals are
given in the following table:
C.I.            F.   R.F.H.    C.I.            F.   R.F.H.
(2.005, 2.205]  1    0.1       (3.005, 3.205]  5    0.5
(2.205, 2.405]  2    0.2       (3.205, 3.405]  6    0.6
(2.405, 2.605]  2    0.2       (3.405, 3.605]  4    0.4
(2.605, 2.805]  7    0.7       (3.605, 3.805]  4    0.4
(2.805, 3.005]  14   1.4       (3.805, 4.005]  5    0.5
C.I. = Class Interval; F. = Frequency; R.F.H. = Relative Frequency Histogram
(c) As 3 ∈ (2.805, 3.005] and 4 ∈ (3.805, 4.005],
f(3) ≈ h50(3) = 14 / (50 × (3.005 − 2.805)) = 1.4,
f(4) ≈ h50(4) = 5 / (50 × (4.005 − 3.805)) = 0.5.
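For reference, a short Python sketch (using NumPy's histogram routine, an assumption of this illustration rather than part of the course material) reproduces the frequency table and the two estimates.

```python
import numpy as np

gpas = np.array([
    3.77, 2.78, 3.40, 2.20, 3.26, 3.00, 2.85, 2.65, 3.08, 2.92,
    3.69, 2.83, 2.75, 3.97, 2.74, 2.90, 3.38, 2.38, 2.71, 3.31,
    3.92, 3.29, 4.00, 3.50, 2.80, 3.57, 2.84, 3.18, 3.66, 2.86,
    2.81, 3.10, 2.84, 2.89, 2.59, 2.95, 2.77, 3.90, 2.82, 3.89,
    2.83, 2.28, 3.20, 2.47, 3.00, 3.78, 3.48, 3.52, 3.20, 3.30,
])
cuts = np.linspace(2.005, 4.005, 11)        # c_0, c_1, ..., c_10
freq, _ = np.histogram(gpas, bins=cuts)     # frequency of each class interval
rfh = freq / (len(gpas) * np.diff(cuts))    # relative frequency histogram h_50

print(freq)               # [ 1  2  2  7 14  5  6  4  4  5]
print(rfh[4], rfh[9])     # h_50(3) = 1.4 and h_50(4) = 0.5
```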
1.4 Expectation

I Definition: (Expectation of a discrete statistic) If u(X ) is a


function of a discrete random variable X that has a p.d.f. f(x), then
E[u(X)] = ∑_{x∈S} u(x) f(x),

where the summation is taken over all possible values of x. If


E[u(X )] exists, it is called the mathematical expectation (or
expected value) of u(X ).
I Remark 1.2 E[u(X)] exists if ∑_{x∈S} |u(x)| f(x) < ∞.

I Uncorrelation: Two random variables X1 and X2 are


uncorrelated, if Cov(X1 , X2 ) = 0, where

Cov(X1 , X2 ) = E(X1 X2 ) − E(X1 )E(X2 )

is the covariance of X1 and X2 .
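Returning to the definition of E[u(X)] above, here is a small sketch of the formula applied to the earlier discrete example with p.d.f. f(x) = x/6 on S = {1, 2, 3}; the use of Python fractions is only a convenience for exact arithmetic.

```python
from fractions import Fraction

f = {x: Fraction(x, 6) for x in (1, 2, 3)}   # p.d.f. f(x) = x/6 on S = {1, 2, 3}

def expectation(u, pdf):
    """E[u(X)] = sum over x in S of u(x) * f(x)."""
    return sum(u(x) * p for x, p in pdf.items())

mean = expectation(lambda x: x, f)           # E(X) = (1 + 4 + 9)/6 = 7/3
second = expectation(lambda x: x ** 2, f)    # E(X^2) = (1 + 8 + 27)/6 = 6
print(mean, second, second - mean ** 2)      # variance = 6 - 49/9 = 5/9
```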


1.4 Expectation

I Property 1.7 Let X be a discrete random variable with finite


mean E(X ), and let a and b be constants. Then,
(i) E(aX + b) = aE(X ) + b;
(ii) if P(X = b) = 1, then E(X ) = b;
(iii) if P(a < X ≤ b) = 1, then a < E(X ) ≤ b;
(iv) if g (X ) and h(X ) have finite mean, then

E(g (X ) + h(X )) = E(g (X )) + E(h(X )).

I Property 1.8 If X ≥ 0 takes integer values, then
E(X) = ∑_{x=1}^∞ P(X ≥ x) = ∑_{x=0}^∞ P(X > x).
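A quick numerical check of Property 1.8, assuming a B(10, 0.3) variable (mean 3) as the example:

```python
from math import comb

def pmf(x, n=10, p=0.3):
    """p.d.f. of the B(10, 0.3) distribution."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

mean_direct = sum(x * pmf(x) for x in range(11))                    # E(X)
mean_tails = sum(sum(pmf(y) for y in range(x, 11)) for x in range(1, 11))
print(mean_direct, mean_tails)   # both equal 3.0 up to rounding error
```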
1.4 Expectation

I Definition: (Expectation of a continuous statistic) If u(X ) is


a function of a continuous random variable X that has a p.d.f. f(x), then
E[u(X)] = ∫_R u(x) f(x) dx.
If E[u(X )] exists, it is called the mathematical expectation (or
expected value) of u(X ).
I Remark 1.3 E[u(X)] exists if ∫_R |u(x)| f(x) dx < ∞.
1.4 Expectation

I Example 1.6 Let X have the N(µ, σ²) distribution. Then,
E(X) = ∫_{−∞}^∞ (x/(√(2π) σ)) exp(−(x − µ)²/(2σ²)) dx
     = ∫_{−∞}^∞ ((sσ + µ)/(√(2π) σ)) exp(−s²/2) d(sσ + µ)   (letting s = (x − µ)/σ)
     = ∫_{−∞}^∞ (sσ/√(2π)) exp(−s²/2) ds + µ ∫_{−∞}^∞ (1/√(2π)) exp(−s²/2) ds.
The first integrand is an odd function, so its integral over R is zero. The second integral equals one, since its integrand is the standard normal density. Hence, E(X) = µ.
1.4 Expectation

I Property 1.9 Let X be a continuous random variable, a and


b be constants, and g and h be functions. Then,
(i) if g (X ) and h(X ) have finite mean then

E(ag (X ) + bh(X )) = aE(g (X )) + bE(h(X ));

(ii) if P(a ≤ X ≤ b) = 1, then a ≤ E(X ) ≤ b;


(iii) if h is non-negative, then for a > 0, P(h(X) ≥ a) ≤ E(h(X))/a;
(iv) if g is convex, then g (E(X )) ≤ E(g (X )).
I Property 1.10 Let X be a non-negative random variable with c.d.f. F, p.d.f. f, and finite expected value E(X). Then,
E(X) = ∫_0^∞ (1 − F(x)) dx.
1.4 Expectation

I Property 1.11 Let a, b, c, and d be constants. Then,


(i) E(X²) = 0 if and only if P(X = 0) = 1;
(ii) Cov(aX + b, cY + d) = ac Cov(X, Y);
(iii) Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y);
(iv) if X and Y are independent, E(h(X)g(Y)) = E(h(X))E(g(Y)), provided that E(h(X)) < ∞ and E(g(Y)) < ∞;
(v) −1 ≤ ρ(X, Y) ≤ 1, where ρ(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)) denotes the correlation coefficient;
(vi) |ρ(X, Y)| = 1 if and only if P(X = aY + b) = 1 for some constants a and b;
(vii) ρ(aX + b, cY + d) = sgn(ac) ρ(X, Y), where sgn(x) denotes the sign of x;
(viii) if X and Y are independent, then ρ(X, Y) = 0.
1.4 Expectation

I Property 1.12 (Cauchy-Schwarz inequality) If E(X²)E(Y²) < ∞, then
E(XY) ≤ √(E(X²)E(Y²)).
Quick Proof: Without loss of generality, assume that E(Y²) > 0. Note that
0 ≤ E[(X E(Y²) − Y E(XY))²] = E(Y²)[E(X²)E(Y²) − (E(XY))²].
Hence, the conclusion holds.
