01 Ecdf Plugin
Advanced Statistics II
Visualizing data
[Figure: histogram of the data, with Frequency (0 to 7) on the vertical axis and values from −3 to 3 on the horizontal axis.]
[Figure: two panels over x from −3 to 3 with vertical axes from 0.0 to 1.0, showing the normal cdf (pnorm) and the ecdf Fn(x).]
For a given sample, the ecdf is the cdf of the distribution that puts
probability mass 1/n at each data point Xi in the sample.¹
The pmf (discrete pdf) corresponds to the discrete uniform distribution on the set
{x1, ..., xn}.
¹ So there may be some (other) population out there having exactly this distribution.
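The definition above translates almost directly into code. The following is a minimal sketch in Python (the helper name `ecdf` and the five data points are made up for illustration): the ecdf at t is the share of observations that are less than or equal to t.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical cdf of `sample` as a function t -> F_n(t)."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(t):
        # Count the observations <= t; each one carries probability mass 1/n.
        return bisect_right(xs, t) / n

    return F_n

data = [0.3, -1.2, 0.3, 2.1, -0.4]    # hypothetical sample, n = 5
F_n = ecdf(data)
print(F_n(-2.0), F_n(0.3), F_n(5.0))  # 0.0 0.8 1.0
```

Tied observations (here 0.3 appears twice) simply produce a jump of 2/n at that point.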
Statistics and Econometrics (CAU Kiel) Summer 2021 5 / 29
The empirical cdf
Sampling properties
To interpret the observed ecdf (which is a sample outcome), we need to
know the sampling distribution of the corresponding estimator.
Theorem (2.1)
Let X ∼ F and {X1 , . . . , Xn } be an iid sample from the population X.
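The pmf (discrete pdf) of F̂n(t) is
\[
P\left(\hat F_n(t) = \frac{j}{n}\right) =
\begin{cases}
\binom{n}{j}\,[F(t)]^{j}\,[1 - F(t)]^{n-j} & \text{for } j \in \{0, 1, 2, \dots, n\}, \\
0 & \text{otherwise.}
\end{cases}
\]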
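The theorem says that n F̂n(t) is binomial with parameters n and F(t). A small Monte Carlo sketch can make this plausible. The setup is an assumption chosen for convenience: a Uniform(0, 1) population, so that F(t) = t, and a hypothetical helper `ecdf_value_pmf`.

```python
import random
from math import comb

def ecdf_value_pmf(j, n, F_t):
    """P(F_n(t) = j/n) for an iid sample of size n, where F_t = F(t)."""
    if not 0 <= j <= n:
        return 0.0
    return comb(n, j) * F_t**j * (1.0 - F_t)**(n - j)

# Monte Carlo check with a Uniform(0, 1) population, so that F(t) = t.
random.seed(0)
n, t, reps = 10, 0.3, 20000
counts = [0] * (n + 1)
for _ in range(reps):
    j = sum(random.random() <= t for _ in range(n))  # j = n * F_n(t)
    counts[j] += 1

# Columns: j, simulated relative frequency, theoretical probability.
for j in range(n + 1):
    print(j, round(counts[j] / reps, 3), round(ecdf_value_pmf(j, n, t), 3))
```

The simulated and theoretical columns should agree up to simulation noise of a few thousandths.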
Uniform convergence
It is easily shown that the ecdf F̂n (t) converges in probability to the cdf
F (t) for each value of t. But there’s more...
For large enough n, the ecdf provides a good approximation of the cdf
over its entire domain (not only for individual points).
Provided that X is a continuous random variable, one may even show that
the distribution of the maximal deviation Dn = sup_t |F̂n(t) − F(t)| does not depend on the true F.
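To see uniform convergence at work, one can compute the largest gap between the ecdf and the true cdf for growing n. The sketch below reads Dn as the supremum distance sup_t |F̂n(t) − F(t)| and assumes a standard normal population; `sup_distance` is a hypothetical helper name. For a continuous F the supremum is attained at a jump of the ecdf, so it suffices to check both sides of each jump.

```python
import random
from statistics import NormalDist

def sup_distance(sample, cdf):
    """D_n = sup_t |F_n(t) - F(t)| for a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # F_n jumps from (i - 1)/n to i/n at x, so check both sides of the jump.
        d = max(d, i / n - F, F - (i - 1) / n)
    return d

random.seed(1)
Phi = NormalDist().cdf  # standard normal cdf (assumed population)
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(sup_distance(sample, Phi), 4))
```

The printed distances should shrink roughly at the rate 1/√n.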
Recall
Definition
A statistical functional τ (F ) is any function of F .
Simplest examples:
the mean, $\mu = \int x F'(x)\,dx = \int x\,dF(x)$,
the variance, $\sigma^2 = \int (x - \mu)^2\,dF(x)$, or
the quantiles, $q_p = F^{-1}(p)$.²
² This is assuming uniqueness; otherwise use $q_p = \inf\{x : F(x) \ge p\}$.
Definition
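The plug-in estimator of θ = τ(F) is defined by
\[ \hat\theta = \tau(\hat F_n). \]
Sample moments: $M'_r = \int x^r\,d\hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n} X_i^r$.
Sample quantiles: $\hat q_p = \hat F_n^{-1}(p)$.
F̂n is not invertible (not even if F is!),
... so we take $\hat q_p = \inf\{x : \hat F_n(x) \ge p\}$.
The infimum is attained at the $\lceil np \rceil$th smallest observation. A closely related rounding convention takes the $r$th smallest observation with $r = \lfloor np + 0.5 \rfloor$,
where $\lfloor \cdot \rfloor$ denotes the integer part.³
A plug-in sample pdf, however, is meaningless even when F is differentiable, because F̂n is a step function. See
Advanced Statistics III for nonparametric pdf estimation.
³ And therefore $\lfloor x + 0.5 \rfloor$ rounds to the integer nearest to x.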
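The two plug-in estimators can be sketched in a few lines of Python. The function names and the data are hypothetical. The quantile function implements the infimum definition directly: it returns the smallest order statistic whose ecdf value reaches p, which is the observation with rank ⌈np⌉. This rank can differ by one from rounding rules such as ⌊np + 0.5⌋.

```python
from math import ceil

def sample_moment(sample, r):
    """Plug-in estimator of the r-th non-central moment: (1/n) * sum of X_i^r."""
    return sum(x**r for x in sample) / len(sample)

def sample_quantile(sample, p):
    """Plug-in quantile inf{x : F_n(x) >= p} for 0 < p <= 1."""
    xs = sorted(sample)
    n = len(xs)
    # Smallest rank k with k/n >= p (floating-point rounding in n * p is ignored here).
    k = max(ceil(n * p), 1)
    return xs[k - 1]

data = [2.0, 7.0, 1.0, 9.0, 4.0]   # hypothetical sample
print(sample_moment(data, 1))      # 4.6
print(sample_quantile(data, 0.5))  # 4.0
```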
Plug-in I: sample moments
The rth order central sample moment (or moment about the mean) is
\[ M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X_n)^r, \]
where $\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
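As a quick numerical illustration, the central sample moment can be computed as follows. This is a sketch with made-up data and a hypothetical function name. Note that the first central sample moment is zero by construction, and the second one divides by n rather than n − 1.

```python
def central_sample_moment(sample, r):
    """M_r = (1/n) * sum of (X_i - mean)^r."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean)**r for x in sample) / n

data = [1.0, 2.0, 3.0, 4.0, 10.0]      # hypothetical sample with mean 4
print(central_sample_moment(data, 1))  # 0.0
print(central_sample_moment(data, 2))  # 10.0
```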
Sampling properties
Let $M'_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r$ be the rth order non-central sample moment for a random sample X1, ..., Xn.
Asymptotics
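\[ M'_r \xrightarrow{m} \mu'_r \;\Rightarrow\; \operatorname{plim} M'_r = \mu'_r. \]
\[ \frac{\sqrt{n}\left(\frac{1}{n}\sum_{i} X_i^r - \mu'_r\right)}{\sqrt{\mu'_{2r} - (\mu'_r)^2}} \xrightarrow{d} N(0, 1). \]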
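The limiting result can be checked by simulation. The sketch below assumes a Uniform(0, 1) population, for which the non-central moments are known in closed form (µ'_r = 1/(r + 1)); the choice of r, n and the number of replications is arbitrary, and `standardized_moment` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def standardized_moment(sample, r, mu_r, mu_2r):
    """sqrt(n) * (M'_r - mu'_r) / sqrt(mu'_2r - (mu'_r)^2)."""
    n = len(sample)
    m_r = sum(x**r for x in sample) / n
    return sqrt(n) * (m_r - mu_r) / sqrt(mu_2r - mu_r**2)

random.seed(42)
r, n, reps = 2, 400, 2000
mu_r, mu_2r = 1 / (r + 1), 1 / (2 * r + 1)  # Uniform(0, 1) moments

z = [
    standardized_moment([random.random() for _ in range(n)], r, mu_r, mu_2r)
    for _ in range(reps)
]
share = sum(abs(v) <= 1.96 for v in z) / reps

# Mean and standard deviation should be close to 0 and 1,
# and about 95% of the draws should fall inside [-1.96, 1.96].
print(round(mean(z), 2), round(pstdev(z), 2), round(share, 3))
```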
Special cases
Definition (Sample Mean)
Let X1 , ..., Xn denote a random sample. The sample mean is
\[ \bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i = M'_1. \]
Theorem (2.3)
Let Sn2 be the sample variance of a random sample X1 , ..., Xn from a
population distribution. Assuming that the population moments exist,
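a. $E(S_n^2) = \frac{n-1}{n}\,\sigma^2$,
b. $\operatorname{Var}(S_n^2) = \frac{1}{n}\left[\left(\frac{n-1}{n}\right)^2 \mu_4 - \frac{(n-1)(n-3)}{n^2}\,\sigma^4\right]$,
c. $\operatorname{plim} S_n^2 = \sigma^2$,
d. $\sqrt{n}\,\left(S_n^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \mu_4 - \sigma^4\right)$,
e. $S_n^2 \overset{a}{\sim} N\left(\sigma^2, \frac{1}{n}(\mu_4 - \sigma^4)\right)$.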
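Parts a and b can be illustrated with a small simulation. Two assumptions are made here: the sample variance is taken to be the version that divides by n (which is what part a implies), and the population is standard normal, so that µ4 = 3σ⁴. The helper name is hypothetical.

```python
import random
from statistics import mean

def sample_variance_n(sample):
    """Sample variance with divisor n (assumed definition of S_n^2)."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m)**2 for x in sample) / n

random.seed(7)
n, reps, sigma2 = 5, 40000, 1.0
s2 = [sample_variance_n([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]

mu4 = 3.0 * sigma2**2  # fourth central moment of a normal population
exp_theory = (n - 1) / n * sigma2
var_theory = (((n - 1) / n)**2 * mu4 - (n - 1) * (n - 3) / n**2 * sigma2**2) / n

exp_sim = mean(s2)
var_sim = mean([(v - exp_sim)**2 for v in s2])
print(round(exp_sim, 3), round(exp_theory, 3))  # simulated vs. theoretical mean
print(round(var_sim, 3), round(var_theory, 3))  # simulated vs. theoretical variance
```

With n = 5 the downward bias is clearly visible: the simulated mean is close to 0.8 rather than 1.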
Sample Covariance
Order statistics
Definition
Let X1, X2, ..., Xn be a random sample. Then X[1] ≤ X[2] ≤ ... ≤ X[n],
where the X[i]s are the Xis arranged in increasing order, are
the order statistics of the sample; X[i] is called the ith order statistic.
Sample quantiles are also order statistics: e.g. for odd n the median is X[(n+1)/2].⁴
⁴ Beware the multiple definitions in the literature.
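In code, order statistics are just the sorted sample. A minimal sketch with made-up data and hypothetical function names, using the 1-based indexing of the definition and the median convention for odd n:

```python
def kth_order_statistic(sample, k):
    """X_[k], with k counted from 1 as in the definition."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must lie between 1 and n")
    return sorted(sample)[k - 1]

data = [0.8, -1.5, 2.2, 0.1, -0.3]  # hypothetical sample, n = 5 (odd)
n = len(data)
print(kth_order_statistic(data, 1))             # -1.5 (sample minimum)
print(kth_order_statistic(data, n))             # 2.2 (sample maximum)
print(kth_order_statistic(data, (n + 1) // 2))  # 0.1 (median for odd n)
```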
Plug-in II: sample quantiles and order statistics
Stock returns...
Example
Let the rv X be the return of a portfolio of risky assets. Then the 1st
order statistic X[1] = min{X1, ..., Xn} is a critical variable for a risk
manager, who might be interested in the probability
P(X[1] ≤ −10%).
Worst-case scenario
[Figure: simulated series y (range −4 to 4) plotted against Time (0 to 100).]
[Figure: grid of panels with horizontal axes from −3 to 3 (left column) and from 0 to 15 (right column); the second row is labelled "t(50) population".]
Larger sample
[Figure: distribution of the sample max for 500 sample elements (left) and for 5000 sample elements (right), horizontal axes from 0 to 15; top row: standard normal population, bottom row: t(50) population.]
Distribution
Theorem (2.10)
Let (X1 , . . . , Xn ) be a random sample from a population distribution with
cdf F , and let X[k] be the kth order statistic. Then the cdf of X[k] is given
by
\[ F_{X_{[k]}}(b) = \sum_{j=k}^{n} \binom{n}{j}\, F(b)^{j}\,[1 - F(b)]^{n-j}. \]
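The sum in the theorem is the probability that at least k of the n observations are no larger than b. A short sketch (hypothetical function name, arbitrary values n = 5 and F(b) = 0.3) evaluates it and compares the cases k = 1 and k = n with the closed forms for the minimum and the maximum.

```python
from math import comb

def order_stat_cdf(k, n, F_b):
    """P(X_[k] <= b) given F_b = F(b): at least k of n observations are <= b."""
    return sum(comb(n, j) * F_b**j * (1.0 - F_b)**(n - j) for j in range(k, n + 1))

n, F_b = 5, 0.3
# Minimum: the sum from j = 1 equals 1 - [1 - F(b)]^n.
print(order_stat_cdf(1, n, F_b), 1.0 - (1.0 - F_b)**n)
# Maximum: only the term j = n survives, giving F(b)^n.
print(order_stat_cdf(n, n, F_b), F_b**n)
```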
Corollary
The cdfs of X[1] and X[n] are given by
\[ F_{X_{[1]}}(b) = 1 - [1 - F(b)]^{n}, \quad\text{and}\quad F_{X_{[n]}}(b) = F(b)^{n}. \]
In any case, the distribution of the order statistics FX[k] (b) depends on the
particular cdf of the parent distribution F .
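The dependence on the parent distribution can be made concrete with the risk-manager question P(X[1] ≤ −10%). The numbers below are purely hypothetical: 20 iid returns with mean 0.5% and standard deviation 5%, once under a normal parent and once under a Laplace parent with the same mean and standard deviation. The normal case is also checked by simulation.

```python
import random
from math import exp, sqrt
from statistics import NormalDist

def min_cdf(b, n, cdf):
    """P(X_[1] <= b) = 1 - [1 - F(b)]^n for an iid sample of size n."""
    return 1.0 - (1.0 - cdf(b))**n

def max_cdf(b, n, cdf):
    """P(X_[n] <= b) = F(b)^n for an iid sample of size n."""
    return cdf(b)**n

MU, SD, N = 0.005, 0.05, 20  # hypothetical return parameters and sample size
normal_cdf = NormalDist(MU, SD).cdf

def laplace_cdf(x, mu=MU, scale=SD / sqrt(2.0)):
    """Laplace cdf with the same mean and standard deviation as the normal."""
    z = (x - mu) / scale
    return 0.5 * exp(z) if z < 0 else 1.0 - 0.5 * exp(-z)

p_normal = min_cdf(-0.10, N, normal_cdf)
p_laplace = min_cdf(-0.10, N, laplace_cdf)

# Monte Carlo check of the normal case.
random.seed(3)
reps = 20000
hits = sum(min(random.gauss(MU, SD) for _ in range(N)) <= -0.10 for _ in range(reps))
p_sim = hits / reps

print(round(p_normal, 4), round(p_sim, 4), round(p_laplace, 4))
```

Although both parents share the same mean and variance, the heavier-tailed Laplace parent gives a noticeably larger probability of a worst-case return below −10%.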