0% found this document useful (0 votes)

91 views

Unbiased Statistic

This document discusses unbiased estimation. It begins by introducing the concept of bias in parameter estimation and defines an unbiased estimator as one where the expected value of the estimator equals the parameter value. Several examples are provided to illustrate unbiased and biased estimators. The document then discusses how to compute the bias of an estimator and presents methods for decomposing the mean squared error of an estimator into variance and bias components. It also explores how to identify and compensate for biased estimators, including through the use of bias-corrected estimators. The key points are that unbiased estimators are preferred and that bias can be reduced or removed to improve estimator quality.

Uploaded by

OrYuenyuen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

91 views

Unbiased Statistic

Uploaded by

OrYuenyuen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

Topic 14

Unbiased Estimation
14.1

Introduction

In creating a parameter estimator, a fundamental question is whether or not the estimator differs from the parameter
in a systematic manner. Lets examine this by looking a the computation of the mean and the variance of 16 flips of a
fair coin.
Give this task to 10 individuals and ask them report the number of heads. We can simulate this in R as follows
> (x<-rbinom(10,16,0.5))
[1]
8 5 9 7 7 9 7

8 10

Our estimate is obtained by taking these 10 answers and averaging them. Intuitively we anticipate an answer
around 8. For these 10 observations, we find, in this case, that
> sum(x)/10
[1] 7.8
The result is a bit below 8. Is this systematic? To assess this, we appeal to the ideas behind Monte Carlo to perform
a 1000 simulations of the example above.
> meanx<-rep(0,1000)
> for (i in 1:1000){meanx[i]<-mean(rbinom(10,16,0.5))}
> mean(meanx)
[1] 8.0049
From this, we surmise that we the estimate of the sample mean x
neither systematically overestimates or underestimates the distributional mean. From our knowledge of the binomial distribution, we know that the mean
also has mean
= np = 16 0.5 = 8. In addition, the sample mean X
1
80
(8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8) =
=8
10
10
verifying that we have no systematic error.
is an unbiased estimator of the distributional mean . Here is
The phrase that we use is that the sample mean X
the precise definition.
=
EX

Definition 14.1. For observations X = (X1 , X2 , . . . , Xn ) based on a distribution having parameter value , and for
d(X) an estimator for h(), the bias is the mean of the difference d(X) h(), i.e.,
bd () = E d(X)

h().

(14.1)

If bd () = 0 for all values of the parameter, then d(X) is called an unbiased estimator. Any estimator that is not
unbiased is called biased.
205

Introduction to the Science of Statistics

Unbiased Estimation

Example 14.2. Let X1 , X2 , . . . , Xn be Bernoulli trials with success parameter p and set the estimator for p to be
the sample mean. Then,
d(X) = X,
= 1 (EX1 + EX2 + + EXn ) = 1 (p + p + + p) = p
Ep X
n
n
is an unbiased estimator for p. In this circumstance, we generally write p instead of X.
In addition, we can
Thus, X
use the fact that for independent random variables, the variance of the sum is the sum of the variances to see that
1
(Var(X1 ) + Var(X2 ) + + Var(Xn ))
n2
1
1
= 2 (p(1 p) + p(1 p) + + p(1 p)) = p(1
n
n

Var(
p) =

p).

is an unbiased
Example 14.3. If X1 , . . . , Xn form a simple random sample with unknown finite mean , then X
2
estimator of . If the Xi have variance , then
=
Var(X)

(14.2)

We can assess the quality of an estimator by computing its mean square error, defined by
h())2 ].

E [(d(X)

(14.3)

Estimators with smaller mean square error are generally preferred to those with larger. Next we derive a simple
relationship between mean square error and variance. We begin by substituting (14.1) into (14.3), rearranging terms,
and expanding the square.
E [(d(X)

h())2 ] = E [(d(X)
= E [(d(X)

bd ()))2 ] = E [((d(X)

(E d(X)
2

E d(X)) ] + 2bd ()E [d(X)

E d(X)) + bd ())2 ]

E d(X)] + bd ()2

= Var (d(X)) + bd ()2

Thus, the representation of the mean square error as equal to the variance of the estimator plus the square of the
bias is called the bias-variance decomposition. In particular:
The mean square error for an unbiased estimator is its variance.
Bias always increases the mean square error.

14.2

Computing Bias

For the variance

, we have been presented with two choices:

1X
(xi
n i=1

x
)2

and

1
n

n
X

(xi

x
)2 .

(14.4)

i=1

Using bias as our criterion, we can now resolve between the two choices for the estimators for the variance 2 .
Again, we use simulations to make a conjecture, we then follow up with a computation to verify our guess. For 16
tosses of a fair coin, we know that the variance is np(1 p) = 16 1/2 1/2 = 4
P10
For the example above, we begin by simulating the coin tosses and compute the sum of squares i=1 (xi x
)2 ,
> ssx<-rep(0,1000)
> for (i in 1:1000){x<-rbinom(10,16,0.5);ssx[i]<-sum((x-mean(x))2)}
> mean(ssx)
[1] 35.8511
206

Introduction to the Science of Statistics

Unbiased Estimation

The choice is to divide either by 10, for the first

choice, or 9, for the second.
Histogram of ssx
250

> mean(ssx)/10;mean(ssx)/9
[1] 3.58511
[1] 3.983456

150
0

In this case, because we know all the aspects of the

simulation, and thus we know that the answer ought to
be near 4. Consequently, division by 9 appears to be the
appropriate choice. Lets check this out, beginning with
what seems to be the inappropriate choice to see what
goes wrong..

100

Frequency

200

Exercise 14.4. Repeat

P10 the simulation above, compute
the sum of squares i=1 (xi 8)2 . Show that these simulations
support dividing by 10 rather than 9. verify that
Pn
2
2
(X
for ini ) /n is an unbiased estimator for
i=1
dependent random variable X1 , . . . , Xn whose common
distribution has mean and variance 2 .

Example 14.5. If a simple random sample X1 , X2 , . . . ,

has unknown finite variance 2 , then, we can consider the sample variance

100

120

ssx

Figure 14.1: Sum of squares about x

for 1000 simulations.

1X
S2 =
(Xi
n i=1

2.
X)

To find the mean of S 2 , we divide the difference between an observation Xi and the distributional mean into two steps
- the first from Xi to the sample mean x
and and then from the sample mean to the distributional mean, i.e.,
Xi

+ (X

= (Xi

We shall soon see that the lack of knowledge of is the source of the bias. Make this substitution and expand the
square to obtain
n
X

(Xi

) =

i=1

n
X

((Xi

+ (X

))2

i=1

n
X

(Xi

i=1

n
X

2+2
X)

n
X
i=1

(Xi

2 + 2(X

) +

n
X

i=1

n
X

X)(

(Xi

n
X

(Xi

+ n(X

i=1

(Xi

2 + n(X

i=1

(Check for yourself that the middle term in the third line equals 0.) Subtract the term n(X
divide by n to obtain the identity
n

1X
(Xi
n i=1

X
2= 1
X)
(Xi
n i=1
207

)2 .

)2 from both sides and

Introduction to the Science of Statistics

Unbiased Estimation

Using the identity above and the linearity property of expectation we find that
" n
#
1X
2
2

ES = E
(Xi X)
n i=1
" n
#
1X
2
2

=E
(Xi )
(X )
n i=1
n

1X
E[(Xi
n i=1

E[(X

)2 ]

1X
=
Var(Xi )
n i=1
=

1
n
n

1
n

Var(X)

The last line uses (14.2). This shows that S 2 is a biased estimator for 2 . Using the definition in (14.1), we can
see that it is biased downwards.
n 1 2
1 2
2
b( 2 ) =
=
.
n
n
In addition, because
Note that the bias is equal to Var(X).

n
n
n
n 1 2
E
S2 =
E S2 =
= 2
n 1
n 1
n 1
n
and

Su2 =

n
n

S2 =

1
n

n
X

(Xi

2
X)

i=1

is an unbiased
estimator for . As we shall learn in the next section, because the square root is concave downward,
p
Su = Su2 as an estimator for is downwardly biased.
2

Example 14.6. We have seen, in the case of n Bernoulli trials having x successes, that p = x/n is an unbiased
estimator for the parameter p. This is the case, for example, in taking a simple random sample of genetic markers
at a particular biallelic locus. Let one allele denote the wildtype and the second a variant. If the circumstances in
which variant is recessive, then an individual expresses the variant phenotype only in the case that both chromosomes
contain this marker. In the case of independent alleles from each parent, the probability of the variant phenotype is
p2 . Navely, we could use the estimator p2 . (Later, we will see that this is the maximum likelihood estimator.) To
determine the bias of this estimator, note that
E p2 = (E p)2 + Var(
p) = p2 +
Thus, the bias b(p) = p(1

1
p(1
n

p).

p)/n and the estimator p2 is biased upward.

Exercise 14.7. For Bernoulli trials X1 , . . . , Xn ,

1X
(Xi
n i=1

p)2 = p(1

p).

Based on this exercise, and the computation above yielding an unbiased estimator, Su2 , for the variance,
"
#

n
1
1
1 X
1
1
1
E
p(1 p) = E
(Xi p)2 = E[Su2 ] = Var(X1 ) = p(1 p).
n 1
n
n 1 i=1
n
n
n
208

(14.5)

Introduction to the Science of Statistics

Unbiased Estimation

In other words,

1
n

is an unbiased estimator of p(1

E p2

p(1

p)/n. Returning to (14.5),

1
1
p(1 p) = p2 + p(1
n 1
n

Thus,

pb2 u = p2

p(1

1
p(1
n

p) = p2 .

is an unbiased estimator of p2 .
To compare the two estimators for p2 , assume that we find 13 variant alleles in a sample of 30, then p = 13/30 =
0.4333,
2
2

13
1 13
17
13
2
b
2
p =
= 0.1878, and p u =
= 0.1878 0.0085 = 0.1793.
30
30
29 30
30

The bias for the estimate p2 , in this case 0.0085, is subtracted to give the unbiased estimate pb2 u .
The heterozygosity of a biallelic locus is h = 2p(1 p). From the discussion above, we see that h has the unbiased
estimator
x n x 2x(n x)
= 2n p(1 p) = 2n
h
=
.
n 1
n 1 n
n
n(n 1)

14.3

Compensating for Bias

as an estimator for g(). If g is a convex function, we

In the methods of moments estimation, we have used g(X)
can say something about the bias of this estimator. In Figure 14.2, we see the method of moments estimator for the
for a parameter in the Pareto distribution. The choice of = 3 corresponds to a mean of = 3/2 for
estimator g(X)
is nearly normally distributed
the Pareto random variables. The central limit theorem states that the sample mean X

with mean 3/2. Thus, the distribution of X is nearly symmetric around 3/2. From the figure, we can see that the
interval from 1.4 to 1.5 under the function g maps into a longer interval above = 3 than the interval from 1.5 to 1.6
above = 3 more than below. Consequently, we
maps below = 3. Thus, the function g spreads the values of X

anticipate that the estimator will be upwardly biased.

To address this phenomena in more general terms, we use the characterization of a convex function as a differentiable function whose graph lies above any tangent line. If we look at the value for the convex function g, then this
statement becomes
g(x) g() g 0 ()(x ).
and take expectations.
Now replace x with the random variable X

E [g(X)

g()]

Consequently,

E [g 0 ()(X

)] = g 0 ()E [X

E g(X)

] = 0.
(14.6)

g()

is biased upwards. The expression in (14.6) is known as Jensens inequality.

and g(X)
Exercise 14.8. Show that the estimator Su is a downwardly biased estimator for .
To estimate the size of the bias, we look at a quadratic approximation for g centered at the value
g(x)

g() g 0 ()(x

1
) + g 00 ()(x
2

209

)2 .

Introduction to the Science of Statistics

Unbiased Estimation

4.5

g(x) = x/(x!1)

3.5

y=g()+g()(x!)
3

2.5

2
1.25

1.3

1.35

1.4

1.45

1.5

1.55

1.6

1.65

1.7

1.75

Figure 14.2: Graph of a convex function. Note that the tangent line is below the graph of g. Here we show the case in which = 1.5 and
= g() = 3. Notice that the interval from x = 1.4 to x = 1.5 has a longer range than the interval from x = 1.5 to x = 1.6 Because g spreads
above = 3 more than below, the estimator for is biased upward. We can use a second order Taylor series expansion to correct
the values of X
most of this bias.

and then take expectations. Then, the bias

Again, replace x in this expression with the random variable X

bg () = E [g(X)]

)] + E[g 00 ()(X
2

g() E [g 0 ()(X

(Remember that E [g 0 ()(X

)2 ] =

2
1 00
= 1 g 00 () . (14.7)
g ()Var(X)
2
2
n

)] = 0.) Thus, the bias has the intuitive properties of being

large for strongly convex functions, i.e., ones with a large value for the second derivative evaluated at the mean
,
large for observations having high variance

, and

small when the number of observations n is large.

Exercise 14.9. Use (14.7) to estimate the bias in using p2 as an estimate of p2 is a sequence of n Bernoulli trials and
note that it matches the value (14.5).
Example 14.10. For the method of moments estimator for the Pareto random variable, we determined that
g() =

has
and that X
mean

and

variance

By taking the second derivative, we see that g 00 () = 2(

Next, we have

2
g 00
=
1
1

210

n
3

1)2 (

> 0 and, because > 1, g is a convex function.

3 = 2(
1

1)3 .

Introduction to the Science of Statistics

Thus, the bias

bg ( )

Unbiased Estimation

2
1 00
1
g ()
= 2(
2
n
2

1)3

1)2 (

(
n(

1)
.
2)

So, for = 3 and n = 100, the bias is approximately 0.06. Compare this to the estimated value of 0.053 from the
simulation in the previous section.
Example 14.11. For estimating the population in mark and recapture, we used the estimate
N = g() =

for the total population. Here is the mean number recaptured, k is the number captured in the second capture event
and t is the number tagged. The second derivative
g 00 () =

2kt
>0
3

and hence the method of moments estimate is biased upwards. In this siutation, n = 1 and the number recaptured is a
hypergeometric random variable. Hence its variance
2

kt (N t)(N k)
.
N
N (N 1)

Thus, the bias

bg (N ) =

1 2kt kt (N t)(N k)
(N t)(N k)
(kt/ t)(kt/ k)
kt(k )(t )
=
=
=
.
2 3 N
N (N 1)
(N 1)
(kt/ 1)
2 (kt )

In the simulation example, N = 2000, t = 200, k = 400 and = 40. This gives an estimate for the bias of 36.02. We
can compare this to the bias of 2031.03-2000 = 31.03 based on the simulation in Example 13.2.
This suggests a new estimator by taking the method of moments estimator and subtracting the approximation of
the bias.

= kt kt(k r)(t r) = kt 1 (k r)(t r) .

N
r
r2 (kt r)
r
r(kt r)
p
The delta method gives us that the standard deviation of the estimator is |g 0 ()| / n. Thus the ratio of the bias
of an estimator to its standard deviation as determined by the delta method is approximately
g 00 () 2 /(2n)
1 g 00 ()
p =
p .
0
2 |g 0 ()| n
|g ()| / n
If this ratio is 1, then the bias correction is not very important. In the case of the example above, this ratio is
36.02
= 0.134
268.40
and its usefulness in correcting bias is small.

14.4

Consistency

Despite the desirability of using an unbiased estimator, sometimes such an estimator is hard to find and at other times
impossible. However, note that in the examples above both the size of the bias and the variance in the estimator
decrease inversely proportional to n, the number of observations. Thus, these estimators improve, under both of these
criteria, with more observations. A concept that describes properties such as these is called consistency.
211

Introduction to the Science of Statistics

Unbiased Estimation

Definition 14.12. Given data X1 , X2 , . . . and a real valued function h of the parameter space, a sequence of estimators dn , based on the first n observations, is called consistent if for every choice of
lim dn (X1 , X2 , . . . , Xn ) = h()

n!1

whenever is the true state of nature.

Thus, the bias of the estimator disappears in the limit of a large number of observations. In addition, the distribution
of the estimators dn (X1 , X2 , . . . , Xn ) become more and more concentrated near h().
For the next example, we need to recall the sequence definition of continuity: A function g is continuous at a real
number x provided that for every sequence {xn ; n 1} with
xn ! x, then, we have that g(xn ) ! g(x).
A function is called continuous if it is continuous at every value of x in the domain of g. Thus, we can write the
expression above more succinctly by saying that for every convergent sequence {xn ; n 1},
lim g(xn ) = g( lim xn ).

n!1

Example 14.13. For a method of moment estimator, lets focus on the case of a single parameter (d = 1). For
independent observations, X1 , X2 , . . . , having mean = k(), we have that
n = ,
EX
n , the sample mean for the first n observations, is an unbiased estimator for = k(). Also, by the law of large
i. e. X
numbers, we have that
n = .
lim X

n!1

Assume that k has a continuous inverse g = k 1 . In particular, because = k(), we have that g() = . Next,
using the methods of moments procedure, define, for n observations, the estimators

n ).
n (X1 , X2 , . . . , Xn ) = g
(X1 + + Xn ) = g(X
n
for the parameter . Using the continuity of g, we find that
n ) = g( lim X
n ) = g() =
lim n (X1 , X2 , . . . , Xn ) = lim g(X

n!1

n ) is a consistent sequence of estimators for .

and so we have that g(X

14.5

Cramer-Rao Bound

This topic is somewhat more advanced and can be skipped for the first reading. This section gives us an introduction to
the log-likelihood and its derivative, the score functions. We shall encounter these functions again when we introduce
maximum likelihood estimation. In addition, the Cramer Rao bound, which is based on the variance of the score
function, known as the Fisher information, gives a lower bound for the variance of an unbiased estimator. These
concepts will be necessary to describe the variance for maximum likelihood estimators.
Among unbiased estimators, one important goal is to find an estimator that has as small a variance as possible, A
more precise goal would be to find an unbiased estimator d that has uniform minimum variance. In other words,
d(X) has has a smaller variance than for any other unbiased estimator d for every value of the parameter.
212

Introduction to the Science of Statistics

Unbiased Estimation

Var d(X) Var d(X)

for all 2 .

of unbiased estimator d is the minimum value of the ratio

The efficiency e(d)
Var d(X)

Var d(X)

over all values of . Thus, the efficiency is between 0 and 1 with a goal of finding estimators with efficiency as near to
one as possible.
For unbiased estimators, the Cramer-Rao bound tells us how small a variance is ever possible. The formula is a bit
mysterious at first. However, we shall soon learn that this bound is a consequence of the bound on correlation that we
have previously learned
Recall that for two random variables Y and Z, the correlation
Cov(Y, Z)

(14.8)

Cov(Y, Z)2 Var(Y )Var(Z).

(14.9)

(Y, Z) = p

Var(Y )Var(Z)

takes values between -1 and 1. Thus, (Y, Z)2 1 and so

Exercise 14.14. If EZ = 0, the Cov(Y, Z) = EY Z

We begin with data X = (X1 , . . . , Xn ) drawn from an unknown probability P . The parameter space R.
Denote the joint density of these random variables
f (x|),

where x = (x1 . . . , xn ).

In the case that the data comes from a simple random sample then the joint density is the product of the marginal
densities.
f (x|) = f (x1 |) f (xn |)
(14.10)
For continuous random variables, the two basic properties of the density are that f (x|)
Z
1=
f (x|) dx.
Rn

0 for all x and that

(14.11)

Now, let d be the unbiased estimator of h(), then by the basic formula for computing expectation, we have for
continuous random variables
Z
h() = E d(X) =
d(x)f (x|) dx.
(14.12)
Rn

If the functions in (14.11) and (14.12) are differentiable with respect to the parameter and we can pass the
derivative through the integral, then we first differentiate both sides of equation (14.11), and then use the logarithm
function to write this derivate as the expectation of a random variable,

@ ln f (X|)
h0 () = E d(X)
.
@
213

(14.14)

Introduction to the Science of Statistics

Unbiased Estimation

Now, return to the review on correlation with Y = d(X), the unbiased estimator for h() and the score function
Z = @ ln f (X|)/@. From equations (14.14) and then (14.9), we find that

2
@ ln f (X|)
@ ln f (X|)
@ ln f (X|)
h0 ()2 = E d(X)
= Cov d(X),
Var (d(X))Var
,
@
@
@
or,

where
I() = Var

Var (d(X))

h0 ()2
.
I()

@ ln f (X|)
@

= E

(14.15)

@ ln f (X|)
@

2 #

is called the Fisher information. For the equality, recall that the variance Var(Z) = EZ 2 (EZ)2 and recall from
equation (14.13) that the random variable Z = @ ln f (X|)/@ has mean EZ = 0.
Equation (14.15), called the Cramer-Rao lower bound or the information inequality, states that the lower bound
for the variance of an unbiased estimator is the reciprocal of the Fisher information. In other words, the higher the
information, the lower is the possible value of the variance of an unbiased estimator.
If we return to the case of a simple random sample, then take the logarithm of both sides of equation (14.10)
ln f (x|) = ln f (x1 |) + + ln f (xn |)
and then differentiate with respect to the parameter ,
@ ln f (x|)
@ ln f (x1 |)
@ ln f (xn |)
=
+ +
.
@
@
@
The random variables {@ ln f (Xk |)/@; 1 k n} are independent and have the same distribution. Using the fact
that the variance of the sum is the sum of the variances for independent random variables, we see that In , the Fisher
information for n observations is n times the Fisher information of a single observation.

@ ln f (X1 |)
@ ln f (Xn |)
@ ln f (X1 |)
@ ln f (X1 |) 2
In () = Var
+ +
= nVar(
) = nE[(
) ].
@
@
@
@
Notice the correspondence. Information is linearly proportional to the number of observations. If our estimator
is a sample mean or a function of the sample mean, then the variance is inversely proportional to the number of
observations.
Example 14.15. For independent Bernoulli random variables with unknown success probability , the density is
f (x|) = x (1
The mean is and the variance is (1

)(1

). Taking logarithms, we find that

ln f (x|) = x ln + (1
@
x
ln f (x|) =
@

1
1

x) ln(1

x
x
=

.
)

The Fisher information associated to a single observation

"
2 #
@
1
I() = E
ln f (X|)
= 2
E[(X
@
(1 )2
=

2 (1
214

)2 ] =
) =

1
2 (1
1

)2
.

Var(X)

Introduction to the Science of Statistics

Unbiased Estimation

Thus, the information for n observations In () = n/((1 )). Thus, by the Cramer-Rao lower bound, any unbiased
estimator of based on n observations must have variance al least (1 )/n. Now, notice that if we take d(x) = x
,
then
= , and Var d(X) = Var(X)
= (1 ) .
E X
n
is a unbiased estimator having uniformly minimum variance.
These two equations show that X
Exercise 14.16. For independent normal random variables with known variance
uniformly minimum variance unbiased estimator.
Exercise 14.17. Take two derivatives of ln f (x|) to show that
"
2 #
@ ln f (X|)
I() = E
=
@

2
0

is a
and unknown mean , X

@ 2 ln f (X|)
.
@2

(14.16)

This identity is often a useful alternative to compute the Fisher Information.

Example 14.18. For an exponential random variable,
ln f (x| ) = ln

@ 2 f (x| )
=
@ 2

Thus, by (14.16),

I( ) =

1
2

is an unbiased estimator for h( ) = 1/ with variance

Now, X
1
.
n 2
By the Cramer-Rao lower bound, we have that
g 0 ( )2
1/
=
nI( )
n

1
.
n 2

has this variance, it is a uniformly minimum variance unbiased estimator.

Because X
Example 14.19. To give an estimator that does not achieve the Cramer-Rao bound, let X1 , X2 , . . . , Xn be a simple
random sample of Pareto random variables with density
fX (x| ) =

x > 1.

1)2 (

The mean and the variance

is an unbiased estimator of = /(
Thus, X

=
Var(X)

1)2 (

To compute the Fisher information, note that

ln f (x| ) = ln

( + 1) ln x

and thus
215

@ 2 ln f (x| )
=
@ 2

1
2

Introduction to the Science of Statistics

Unbiased Estimation

Using (14.16), we have that

I( ) =

1
2

Next, for
= g( ) =

g0 ( ) =

1
(

1)2

g 0 ( )2 =

and

1
(

1)4

2)
=1
1)2

Thus, the Cramer-Rao bound for the estimator is

g 0 ( )2
=
In ( )
n(

1)4

and the efficiency compared to the Cramer-Rao bound is

g 0 ( )2 /In ( )
=

n(
Var(X)

1)4

1)2 (

The Pareto distribution does not have a variance unless

Cramer-Rao bound is low but improves with larger .

14.6

(
(

> 2. For

1)2

just above 2, the efficiency compared to its

A Note on Efficient Estimators

For an efficient estimator, we need find the cases that lead to equality in the correlation inequality (14.8). Recall that
equality occurs precisely when the correlation is 1. This occurs when the estimator d(X) and the score function
@ ln fX (X|)/@ are linearly related with probability 1.
@
ln fX (X|) = a()d(X) + b().
@
After integrating, we obtain,
ln fX (X|) =

a()dd(X) +

b()d + j(X) = ()d(X) + B() + j(X)

Note that the constant of integration of integration is a function of X. Now exponentiate both sides of this equation
fX (X|) = c()h(X) exp(()d(X)).

(14.17)

Here c() = exp B() and h(X) = exp j(X).

We shall call density functions satisfying equation (14.17) an exponential family with natural parameter ().
Thus, if we have independent random variables X1 , X2 , . . . Xn , then the joint density is the product of the densities,
namely,
f (X|) = c()n h(X1 ) h(Xn ) exp(()(d(X1 ) + + d(Xn )).
(14.18)
In addition, as a consequence of this linear relation in (14.18),
d(X) =

1
(d(X1 ) + + d(Xn ))
n

is an efficient estimator for h().

Example 14.20 (Poisson random variables).
x

f (x| ) =

=e
216

1
exp(x ln ).
x!

Introduction to the Science of Statistics

Unbiased Estimation

Thus, Poisson random variables are an exponential family with c( ) = exp(

( ) = ln . Because

= E X,

), h(x) = 1/x!, and natural parameter

is an unbiased estimator of the parameter .

X
The score function
@
@
x
ln f (x| ) =
(x ln
ln x!
)=
1.
@
@
The Fisher information for one observation is
"
2 #
X
1
1
I( ) = E
1
= 2 E [(X
)2 ] = .
Thus, In ( ) = n/ is the Fisher information for n observations. In addition,
=
Var (X)
and d(x) = x
has efficiency

Var(X)
= 1.
1/In ( )

This could have been predicted. The density of n independent observations is

f (x| ) =

e
x1 !

e
xn !

x1 +xn

x1 ! xn !

e n nx
x1 ! xn !

and so the score function

@
@
ln f (x| ) =
( n + n
x ln ) =
@
@
showing that the estimate x
and the score function are linearly related.

n
x

Exercise 14.21. Show that a Bernoulli random variable with parameter p is an exponential family.
Exercise 14.22. Show that a normal random variable with known variance
family.

14.7

2
0

and unknown mean is an exponential

Answers to Selected Exercises

14.4. Repeat the simulation, replacing mean(x) by 8.

> ssx<-rep(0,1000)
> for (i in 1:1000){x<-rbinom(10,16,0.5);ssx[i]<-sum((x-8)2)}
> mean(ssx)/10;mean(ssx)/9
[1] 3.9918
[1] 4.435333
Note that division by 10 gives an answer very close to the correct value of 4. To verify that the estimator is
unbiased, we write
" n
#
n
n
n
1X
1X
1X
1X 2
2
2
E
(Xi ) =
E[(Xi ) ] =
Var(Xi ) =
= 2.
n i=1
n i=1
n i=1
n i=1
217

Introduction to the Science of Statistics

Unbiased Estimation

14.7. For a Bernoulli trial note that Xi2 = Xi . Expand the square to obtain
n
X

(Xi

p)2 =

i=1

n
X
i=1

Xi2

n
X

Xi + n
p2 = n
p

2n
p2 + n
p2 = n(
p

p2 ) = n
p(1

p).

i=1

Divide by n to obtain the result.

p
14.8. Recall that ESu2 = 2 . Check the second derivative to see that g(t) = t is concave down for all t. For concave
down functions, the direction of the inequality in Jensens inequality is reversed. Setting t = Su2 , we have that
ESu = Eg(Su2 ) g(ESu2 ) = g(

and Su is a downwardly biased estimator of .

14.9. Set g(p) = p2 . Then, g 00 (p) = 2. Recall that the variance of a Bernoulli random variable
bias
2
1
1 p(1 p)
p(1 p)
bg (p) g 00 (p)
= 2
=
.
2
n
2
n
n
14.14. Cov(Y, Z) = EY Z

EY EZ = EY Z whenever EZ = 0.

14.16. For independent normal random variables with known variance

f (x|) =
0

ln f (x|) =
Thus, the score function

1
p

ln(

2
0

@
1
ln f (x|) = 2 (x
@
0

and the Fisher information associated to a single observation

"
2 #
@
1
I() = E
ln f (X|)
= 4 E[(X
@
0

p) and the

= p(1

and unknown mean , the density

exp

2
0

)2 ] =

4 Var(X)
0

2.
0

Again, the information is the reciprocal of the variance. Thus, by the Cramer-Rao lower bound, any unbiased estimator
based on n observations must have variance al least 02 /n. However, if we take d(x) = x
, then
Var d(X) =

2
0

and x
is a uniformly minimum variance unbiased estimator.
14.17. First, we take two derivatives of ln f (x|).
@ ln f (x|)
@f (x|)/@
=
@
f (x|)

(14.19)

and
@ 2 ln f (x|)
@ 2 f (x|)/@2
=
@2
f (x|)
=

@ 2 f (x|)/@2
f (x|)

(@f (x|)/@)2
@ 2 f (x|)/@2
=
f (x|)2
f (x|)

2
@ ln f (x|)
@
218

@f (x|)/@)
f (x|)

Introduction to the Science of Statistics

Unbiased Estimation

upon substitution from identity (14.19). Thus, the expected values satisfy
E

2
@ 2 ln f (X|)
@ f (X|)/@2
=
E

@2
f (X|)

@ ln f (X|)
@

2 #

p
p) exp x ln
.
1 p
1 p

p, h(x) = 1 and the natural parameter (p) = ln 1 p p , the log-odds.

f (x|p) = px (1

Thus, c(p) = 1

p)1

1
p

exp

= (1

14.22. The normal density

f (x|) =
0

Thus, c() =

1
p

2 /2

, h(x) = e

x2 /2

(x
2
0

2
0

=
0

1
p

2 /2

x2 /2

and the natural parameter () = /

219

exp
2
0.

x
2
0

N Unbiased
No ratings yet
N Unbiased
15 pages
16 Intro to Point Estimation, Bias, MSE, Efficiency, (8.1-8.2, 9.1-9.2)-1
No ratings yet
16 Intro to Point Estimation, Bias, MSE, Efficiency, (8.1-8.2, 9.1-9.2)-1
26 pages
S1B 15 02 Estimation Bias 4
No ratings yet
S1B 15 02 Estimation Bias 4
2 pages
Unit 2
No ratings yet
Unit 2
41 pages
Unbiased Estimator
No ratings yet
Unbiased Estimator
70 pages
Unit-16 IGNOU STATISTICS
No ratings yet
Unit-16 IGNOU STATISTICS
16 pages
Lectura 2 Point Estimator Basics
No ratings yet
Lectura 2 Point Estimator Basics
11 pages
Chapter 6 Statistical Estimation Method of Moments MLE
No ratings yet
Chapter 6 Statistical Estimation Method of Moments MLE
29 pages
Estimation Bertinoro09 Cristiano Porciani 1
No ratings yet
Estimation Bertinoro09 Cristiano Porciani 1
42 pages
2 Hypothesis Testing
No ratings yet
2 Hypothesis Testing
22 pages
9.0 Lesson Plan
No ratings yet
9.0 Lesson Plan
16 pages
6.CHAPTER 4
No ratings yet
6.CHAPTER 4
9 pages
Lecture 21
No ratings yet
Lecture 21
4 pages
Estimation
No ratings yet
Estimation
25 pages
Module3
No ratings yet
Module3
5 pages
UMass Stat 516 Solutions Chapter 8
No ratings yet
UMass Stat 516 Solutions Chapter 8
26 pages
DS Estimator
No ratings yet
DS Estimator
2 pages
Properties of Estimators New Update Spin
No ratings yet
Properties of Estimators New Update Spin
43 pages
Handout 14: Unbiasedness and MSE
No ratings yet
Handout 14: Unbiasedness and MSE
3 pages
bias_issues
No ratings yet
bias_issues
16 pages
Topic 2a Theory of Estimation
No ratings yet
Topic 2a Theory of Estimation
12 pages
Sta 2
No ratings yet
Sta 2
13 pages
INFERENCE-UNBIASEDNESS
No ratings yet
INFERENCE-UNBIASEDNESS
4 pages
Chapter10 Solutions (1)
No ratings yet
Chapter10 Solutions (1)
62 pages
Mean and Variance Estimation
No ratings yet
Mean and Variance Estimation
2 pages
Variance Estimation
No ratings yet
Variance Estimation
2 pages
MIT15 097S12 Lec04
No ratings yet
MIT15 097S12 Lec04
6 pages
Theory of Estimation by P.G.dixit, Nirali Publication
No ratings yet
Theory of Estimation by P.G.dixit, Nirali Publication
186 pages
Chap 3
No ratings yet
Chap 3
25 pages
IST172 Note 7
No ratings yet
IST172 Note 7
15 pages
Statistical Inference II
No ratings yet
Statistical Inference II
3 pages
Module 4: Point Estimation: Statistics (OA3102)
No ratings yet
Module 4: Point Estimation: Statistics (OA3102)
41 pages
Adobe Scan 07 Aug 2023
No ratings yet
Adobe Scan 07 Aug 2023
5 pages
Lecture 11
100% (1)
Lecture 11
33 pages
Estimator Properties
No ratings yet
Estimator Properties
17 pages
Estimators1 PDF
No ratings yet
Estimators1 PDF
2 pages
Basic Stats Estimation
No ratings yet
Basic Stats Estimation
8 pages
Chap 10
No ratings yet
Chap 10
7 pages
Estimation Theory - Module1
No ratings yet
Estimation Theory - Module1
44 pages
Statistical Inference
No ratings yet
Statistical Inference
35 pages
Lecture Notes For Mathematical Statistics
No ratings yet
Lecture Notes For Mathematical Statistics
184 pages
Probability and Statistics ch7
No ratings yet
Probability and Statistics ch7
19 pages
Chapter 3 - Statistical Inference (Point Estimation
No ratings yet
Chapter 3 - Statistical Inference (Point Estimation
15 pages
PP 01 Soln
No ratings yet
PP 01 Soln
10 pages
MATH321 F08 LEC2 Solutions - Chapter 8
No ratings yet
MATH321 F08 LEC2 Solutions - Chapter 8
2 pages
Statinf Estimation
No ratings yet
Statinf Estimation
110 pages
Sample Statistics: N I N I
No ratings yet
Sample Statistics: N I N I
13 pages
mseee
No ratings yet
mseee
5 pages
ch7
No ratings yet
ch7
29 pages
Problem Set 1 - Answers
No ratings yet
Problem Set 1 - Answers
7 pages
TP_stat_inf_103957
No ratings yet
TP_stat_inf_103957
32 pages
Why Is The Sample Variance A Biased Estimator?: Stephen So
No ratings yet
Why Is The Sample Variance A Biased Estimator?: Stephen So
4 pages
Lec3_estimatorproperties (1)
No ratings yet
Lec3_estimatorproperties (1)
23 pages
lecture_note_15
No ratings yet
lecture_note_15
6 pages
Topic_10_Point_estmation_of_parameters
No ratings yet
Topic_10_Point_estmation_of_parameters
36 pages
Differential Forms
From Everand
Differential Forms
Henri Cartan
5/5 (2)
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
From Everand
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
Jeffrey M. Wooldridge
No ratings yet
Algebraic Equations
From Everand
Algebraic Equations
Demetrios P. Kanoussis
No ratings yet
Mathematics 1St First Order Linear Differential Equations 2Nd Second Order Linear Differential Equations Laplace Fourier Bessel Mathematics
From Everand
Mathematics 1St First Order Linear Differential Equations 2Nd Second Order Linear Differential Equations Laplace Fourier Bessel Mathematics
Andrew Igla
No ratings yet