
Statistics and Data Analysis I – IDC – 2017

Avner Halevy

Lecture 15 – Point Estimation

In a certain population, a proportion p of the people have a certain property. We would like to estimate
p, which is an unknown population parameter. We collect a sample of people, compute the sample
proportion P̂ and use it as our estimate of p. Since we are estimating the parameter p using a single
value of P̂, this process is known as point estimation. P̂ is known as the estimator, and its value
based on the sample is known as the estimate.
For example, we would like to estimate the proportion p of people who like chocolate. We collect a
sample of n = 50 people. We discover that 20 of them like chocolate. We compute P̂ = 20/50 = 0.4
and report it as our estimate of p.
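This computation is easy to mirror in code. Here is a minimal Python sketch; the 0/1 data are hypothetical, with 1 marking a person who likes chocolate:

```python
# Hypothetical sample: 1 = likes chocolate, 0 = does not.
sample = [1] * 20 + [0] * 30

n = len(sample)              # sample size, n = 50
p_hat = sum(sample) / n      # sample proportion, the point estimate
print(p_hat)                 # 0.4
```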
Recall that in the last lecture we used the central limit theorem to conclude that when n is large,

P̂ ∼ N(p, p(1 − p)/n).

The fact that E[P̂] = p means that we expect the estimate we get using P̂ to be near the parameter p
we are trying to estimate, at least on average. However, since we only have one estimate, we would
like to know how values of P̂ spread around their expected value p. For this purpose we compute the
standard deviation of P̂ from its variance:
σ[P̂] = √(p(1 − p)/n)
Since we don’t know the value of p, we estimate the standard deviation, which in this context is also
called the standard error, by using P̂ in place of p:
σ[P̂] ≈ √(P̂(1 − P̂)/n)
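Both facts, that E[P̂] = p and that the spread of P̂ is √(p(1 − p)/n), can be checked with a short simulation. Below is a minimal sketch assuming numpy is available; the true proportion p = 0.3 is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 0.3, 50, 100_000   # arbitrary true proportion, sample size, repetitions

# Each row is one sample of n Bernoulli(p) observations;
# each row mean is one realization of the estimator P-hat.
p_hats = rng.binomial(1, p, size=(trials, n)).mean(axis=1)

print(p_hats.mean())              # close to p = 0.3, since E[P-hat] = p
print(p_hats.std())               # close to the theoretical standard deviation
print(np.sqrt(p * (1 - p) / n))   # sqrt(p(1 - p)/n) = 0.0648...
```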

In our example we have P̂ = 0.4. If we pretend that the unknown parameter is p = 0.5, then the
estimation error is given by
|P̂ − p| = |0.4 − 0.5| = 0.1
The precise standard error is

σ[P̂] = √(0.5(1 − 0.5)/50) = 0.0707
The approximate standard error is

σ[P̂] ≈ √(0.4(1 − 0.4)/50) = 0.0693

The error in our estimate of the standard error is

|0.0707 − 0.0693| = 0.0014

We see that using P̂ in place of p to approximate the standard error has resulted in an error which is
much smaller than the estimation error, and this is quite typical.
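Here is the same comparison as a minimal Python sketch:

```python
from math import sqrt

n = 50
p = 0.5        # the pretended true proportion
p_hat = 0.4    # the estimate from our sample

se_precise = sqrt(p * (1 - p) / n)          # 0.0707...
se_approx = sqrt(p_hat * (1 - p_hat) / n)   # 0.0693...
print(abs(se_precise - se_approx))          # 0.0014..., far below the estimation error 0.1
```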
Remember that the standard error tells us how far typical values of P̂ will be from their expected value
p, which is the parameter we are trying to estimate. If the standard error is small, the estimation error
is likely to be small. Therefore, we would like the standard error to be as small as possible. Looking
at the expression for the standard error, we see that it is determined by two quantities:
• p, over which we have no control
• n, the sample size: the larger n is, the smaller the standard error (see the sketch below)
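A quick sketch of the second point, with an arbitrary fixed p = 0.5:

```python
from math import sqrt

p = 0.5   # arbitrary fixed proportion; any p shows the same trend
for n in (50, 200, 800, 3200):
    print(n, sqrt(p * (1 - p) / n))
# Each quadrupling of n halves the standard error,
# since the standard error scales like 1/sqrt(n).
```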
Let’s consider another example. In a survey before the elections, 100 people in a sample of n = 225
were found to support a certain candidate.
1. As before, we use P̂ as our estimator of the proportion p supporting the candidate in the
population:
P̂ = 100/225 ≈ 0.44
2. We estimate the standard error:
σ[P̂] ≈ √(0.44(1 − 0.44)/225) = 0.033

3. We can find the probability that our estimation error will not exceed, say, 0.01:

P(|P̂ − p| ≤ 0.01) = P(−0.01 ≤ P̂ − p ≤ 0.01)

We divide all parts of the inequality by the standard error in order to standardize:
P(−0.01/σ[P̂] ≤ (P̂ − p)/σ[P̂] ≤ 0.01/σ[P̂]) = P(−0.01/0.033 ≤ Z ≤ 0.01/0.033) = P(−0.303 ≤ Z ≤ 0.303) = 0.2358

This means that in 23.58% of surveys with a sample size of n = 225, the estimation error will
not exceed 0.01.
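This computation can be reproduced numerically. Below is a minimal sketch assuming scipy is available; the small discrepancy with 0.2358 comes from rounding z to 0.30 when reading the normal table:

```python
from math import sqrt
from scipy.stats import norm

n = 225
p_hat = 100 / n                      # 0.444...
se = sqrt(p_hat * (1 - p_hat) / n)   # ~0.0331, the estimated standard error

z = 0.01 / se                        # ~0.30 after standardizing
print(norm.cdf(z) - norm.cdf(-z))    # P(-z <= Z <= z), roughly 0.237
```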
Since the standard error is important in determining how confident we are in our estimate, and since
there is some uncertainty about the standard error itself, which is estimated by using P̂ instead of p,
we can also compute an upper bound on what the standard error could possibly be. We can do this
since for 0 ≤ p ≤ 1 the expression p(1 − p) is bounded above by 1/4. Therefore the standard error is
bounded above by

σ[P̂] = √(p(1 − p)/n) ≤ √((1/4)/n) = √(1/(4n))
In our example, this would have led to an upper bound of √(1/(4 · 225)) = 0.0333, which is very close
to the estimate we computed. This is no accident, since the upper bound is attained when p = 1/2.
