Best Linear Estimators

We restrict attention to linear estimators of the form $\hat{\theta} = \sum_{i=1}^{n} w_i y_i$, where the $y_i$ are independent observations with mean $E(y_i) = \theta$ and variance $Var(y_i) = \sigma^2$, and the weights $w_i$ are choice variables, to make calculating expectations simpler and to pose the estimation problem in a tractable way.
Writing the mean of $\hat{\theta}$ as
$$E(\hat{\theta}) = E\!\left(\sum_{i=1}^{n} w_i y_i\right) = \sum_{i=1}^{n} w_i E(y_i) = \theta\sum_{i=1}^{n} w_i \,, \qquad (1)$$
the variance of $\hat{\theta}$ is equal to
$$\sigma_{\hat{\theta}}^2 = E\big\{[\hat{\theta} - E(\hat{\theta})]^2\big\} = E\left\{\left[\sum_{i=1}^{n} w_i y_i - \theta\sum_{i=1}^{n} w_i\right]^2\right\} = E\left\{\left[\sum_{i=1}^{n} w_i (y_i - \theta)\right]^2\right\}. \qquad (2)$$
We seek to choose the weights $w_i$, for $i = 1,\dots,n$, to minimize this function. Using the composite function theorem, the necessary first-order conditions are
$$\frac{\partial\sigma_{\hat{\theta}}^2}{\partial w_i} = 2E\left[(y_i - \theta)\sum_{j=1}^{n} w_j (y_j - \theta)\right] = 0, \quad i = 1,\dots,n. \qquad (3)$$
Re-arranging terms and using the fact that independence implies zero covariance,
$$\frac{\partial\sigma_{\hat{\theta}}^2}{\partial w_i} = 2\sum_{j=1}^{n} w_j E[(y_i - \theta)(y_j - \theta)] = 2\sigma^2 w_i = 0, \quad i = 1,\dots,n, \qquad (4)$$
which holds if and only if $w_i = 0$ for all $i = 1,\dots,n$. Thus, the choice $\hat{\theta} = 0$ achieves the global unrestricted minimum variance of zero. But although this is a very precise estimator, it is likely to be inaccurate.
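As a quick numerical illustration of this point (my own sketch, not part of the original notes; the values of $\theta$, $\sigma$, and $n$ below are arbitrary), the all-zero-weight estimator has zero variance but a squared bias of $\theta^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 5.0, 2.0, 25                  # hypothetical true mean, std. dev., sample size
y = rng.normal(theta, sigma, size=(10_000, n))  # 10,000 simulated samples of size n

# The estimator with all weights w_i = 0 is identically zero in every sample.
theta_hat = np.zeros(y.shape[0])

print("variance of estimator:", theta_hat.var())            # exactly 0 -- maximally precise
print("bias of estimator    :", theta_hat.mean() - theta)   # -theta    -- badly inaccurate
```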
It is prudent, therefore, to take the bias of an estimator into account. We will now develop and discuss the statistical properties of two estimators that do: the BLUE and the BLMSE. To obtain the BLUE, note that for $\hat{\theta}$ to be unbiased, it must satisfy $E(\hat{\theta}) = \theta$.
Applying this condition to (1), we have
$$E(\hat{\theta}) = \theta\sum_{i=1}^{n} w_i = \theta \quad\text{if and only if}\quad \sum_{i=1}^{n} w_i = 1. \qquad (5)$$
Thus, we now seek to find appropriate weights $w_i$ to minimize the variance (2) subject to the adding-up condition in (5) implied by unbiasedness. To accomplish this, we form the Lagrangean function,
$$\mathcal{L} = E\left\{\left[\sum_{i=1}^{n} w_i (y_i - \theta)\right]^2\right\} + \lambda\left(1 - \sum_{i=1}^{n} w_i\right), \qquad (6)$$
and find a saddle point of $\mathcal{L}$ (a relative minimum with respect to the $w_i$'s and a relative maximum with respect to $\lambda$). Since we do not have any inequality or sign restrictions on the choice variables, and since the Lagrangean is convex in $w$ (this is easy to prove, and you should do it as an exercise), the first-order necessary and sufficient conditions are:
$$\frac{\partial\mathcal{L}}{\partial w_j} = 2E\left\{(y_j - \theta)\sum_{i=1}^{n} w_i (y_i - \theta)\right\} - \lambda = 0, \quad j = 1,\dots,n, \qquad (7)$$
$$\frac{\partial\mathcal{L}}{\partial\lambda} = 1 - \sum_{i=1}^{n} w_i = 0. \qquad (8)$$
Rearranging the terms inside the {} and then passing the expectation operator through by
the distributive law, we can rewrite (7) as
$$\frac{\partial\mathcal{L}}{\partial w_j} = 2\sum_{i=1}^{n} w_i E[(y_i - \theta)(y_j - \theta)] - \lambda = 0, \quad j = 1,\dots,n. \qquad (9)$$
Now, again using the fact that independence implies zero covariance, we obtain
$$\frac{\partial\mathcal{L}}{\partial w_j} = 2\sigma^2 w_j - \lambda = 0, \quad j = 1,\dots,n. \qquad (10)$$
Solving for the $w_j$ terms, we have $w_j = \lambda/(2\sigma^2)$ for every $j$. Then substituting this into (8) and solving for $\lambda$ gives
$$\sum_{j=1}^{n} w_j = \frac{n\lambda}{2\sigma^2} = 1 \;\Rightarrow\; \lambda = \frac{2\sigma^2}{n} \;\Rightarrow\; w_j = \frac{1}{n} \;\Rightarrow\; \hat{\theta} = \sum_{i=1}^{n}\frac{y_i}{n} = \bar{y}.$$
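A small simulation (again my own sketch, with arbitrary parameter values) can be used to confirm that, among weight vectors that sum to one, the equal weights $w_i = 1/n$ give the smallest variance:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n = 5.0, 2.0, 10
y = rng.normal(theta, sigma, size=(100_000, n))   # many independent samples of size n

def estimator_variance(w):
    """Simulated variance of the linear estimator sum_i w_i * y_i."""
    return (y @ w).var()

w_blue = np.full(n, 1.0 / n)                  # equal weights (the BLUE)
w_alts = rng.dirichlet(np.ones(n), size=5)    # other weight vectors summing to one

print("BLUE variance       :", estimator_variance(w_blue))    # ~ sigma^2 / n = 0.4
for w in w_alts:
    print("alternative variance:", estimator_variance(w))     # each one is >= sigma^2 / n
```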
Before proceeding to the BLMSE, we will briefly develop the statistical properties of the sample mean. First, by construction, $\hat{\theta} = \bar{y}$ is unbiased. We can easily verify this by using the linearity of the expectation operator to show that
$$E(\bar{y}) = E\!\left(\sum_{i=1}^{n}\frac{y_i}{n}\right) = \sum_{i=1}^{n}\frac{E(y_i)}{n} = \frac{n\theta}{n} = \theta. \qquad (11)$$
Second, also by construction, $\hat{\theta} = \bar{y}$ has the smallest variance among all possible unbiased estimators for $\theta$ that are formed as linear combinations of the $y_i$. We can easily calculate
its variance by using the fact that the $y_i$ are statistically independent, and therefore uncorrelated, so that
$$E[(\bar{y} - \theta)^2] = E\left[\left(\sum_{i=1}^{n}\frac{y_i}{n} - \theta\right)^2\right] = E\left[\left(\sum_{i=1}^{n}\frac{y_i - \theta}{n}\right)^2\right]$$
$$= E\left[\frac{1}{n^2}\sum_{i=1}^{n}(y_i - \theta)^2 + \frac{2}{n^2}\sum_{i=2}^{n}\sum_{j=1}^{i-1}(y_i - \theta)(y_j - \theta)\right]$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}E[(y_i - \theta)^2] + \frac{2}{n^2}\sum_{i=2}^{n}\sum_{j=1}^{i-1}E[(y_i - \theta)(y_j - \theta)]$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \frac{2}{n^2}\sum_{i=2}^{n}\sum_{j=1}^{i-1}0$$
$$= \frac{\sigma^2}{n}. \qquad (12)$$
Finally, the sample-size-adjusted and mean-deviated random variable, $\sqrt{n}\,(\bar{y} - \theta)$, has mean zero and variance $\sigma^2$ for all values of $n \ge 2$. As long as this variance is finite, it can be shown that as $n \to \infty$, this random variable converges to one which has a normal distribution (also with zero mean and variance $\sigma^2$). This is one of the main justifications for using normal distribution theory and standard normal probability tables to construct confidence intervals and perform hypothesis tests for all sorts of probability distributions, as long as the sample size is reasonably large.
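The following sketch (my own illustration; the skewed population is an arbitrary choice scaled to have mean $\theta$ and variance $\sigma^2$) shows both results: $\sqrt{n}(\bar{y} - \theta)$ has mean near zero and variance near $\sigma^2$, and its upper tail is close to that of a normal distribution even though the individual observations are far from normal:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma, n, reps = 1.0, 3.0, 200, 200_000

# A skewed (gamma) population scaled to have mean theta and variance sigma^2.
y = rng.gamma(shape=theta**2 / sigma**2, scale=sigma**2 / theta, size=(reps, n))

z = np.sqrt(n) * (y.mean(axis=1) - theta)      # sqrt(n) * (ybar - theta)

print("mean of z          :", z.mean())                  # close to 0
print("variance of z      :", z.var())                   # close to sigma^2 = 9
print("95th pct of z      :", np.percentile(z, 95))
print("95th pct of N(0,9) :", 1.645 * sigma)             # normal benchmark
```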
The BLUE applies a lexicographic preference ordering to the bias and variance of an estimator. That is, any degree of bias is strictly less preferred to no bias, regardless of what the variance of the estimator is. This is a somewhat restrictive subjective criterion function (i.e., no utility function exists for such a preference ordering). An alternative to the best linear unbiased estimator (BLUE) is the best linear mean square error estimator (BLMSE). This does not require the estimator to be unbiased. Instead, this principle weights the squared bias of an estimator equally with the variance as the criterion for selecting the estimator. The criterion now is
$$MSE = E[(\hat{\theta} - \theta)^2] = E\left\{\left[\sum_{i=1}^{n} w_i y_i - \theta\right]^2\right\}. \qquad (13)$$
By adding and subtracting $E(\hat{\theta}) = E\!\left(\sum_{i=1}^{n} w_i y_i\right) = \theta\sum_{i=1}^{n} w_i$ inside of the squared term, we find that the mean square error is the sum of the variance and the squared bias,
$$MSE = E\big\{[\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta]^2\big\}$$
$$= E\big\{[\hat{\theta} - E(\hat{\theta})]^2 + 2[\hat{\theta} - E(\hat{\theta})][E(\hat{\theta}) - \theta] + [E(\hat{\theta}) - \theta]^2\big\}$$
$$= E\big\{[\hat{\theta} - E(\hat{\theta})]^2\big\} + 2[E(\hat{\theta}) - \theta]\,E[\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta]^2$$
$$= \sigma_{\hat{\theta}}^2 + B(\hat{\theta},\theta)^2, \qquad (14)$$
where
$$\sigma_{\hat{\theta}}^2 \equiv E\big\{[\hat{\theta} - E(\hat{\theta})]^2\big\} \quad\text{and}\quad B(\hat{\theta},\theta) \equiv E(\hat{\theta}) - \theta, \qquad (15)$$
and the cross term drops out because $E[\hat{\theta} - E(\hat{\theta})] = 0$.
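As a quick numerical check of the decomposition in (14)-(15) (my own sketch; the weights and parameter values are arbitrary), the simulated mean square error of a linear estimator matches its simulated variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, n = 5.0, 2.0, 10
y = rng.normal(theta, sigma, size=(200_000, n))

w = np.full(n, 0.08)                  # arbitrary weights that do NOT sum to one
theta_hat = y @ w                     # the linear estimator in each simulated sample

mse      = ((theta_hat - theta) ** 2).mean()
variance = theta_hat.var()
bias     = theta_hat.mean() - theta

print("MSE              :", mse)
print("variance + bias^2:", variance + bias**2)   # agrees with MSE up to simulation noise
```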
The necessary first-order conditions for choosing the weights to minimize (13) are $0 = \partial MSE/\partial w_i = 2E\{(\sum_{j=1}^{n} w_j y_j - \theta)\, y_i\}$ for $i = 1,\dots,n$. Distributing the $y_i$ inside the parentheses and then distributing the expectation operator inside the square brackets then gives
$$0 = \frac{\partial MSE}{\partial w_i} = 2\left[\sum_{j=1}^{n} w_j E(y_i y_j) - \theta E(y_i)\right] = 2\left[\sigma^2 w_i + \theta^2\sum_{j=1}^{n} w_j - \theta^2\right], \quad i = 1,\dots,n. \qquad (16)$$
Now, summing across all i gives
$$0 = 2\left[\sigma^2\sum_{i=1}^{n} w_i + n\theta^2\sum_{j=1}^{n} w_j - n\theta^2\right], \qquad (17)$$
which can be solved directly for the total sum of the linear weights, since $\sum_{j=1}^{n} w_j = \sum_{i=1}^{n} w_i$,
which gives
$$\sum_{j=1}^{n} w_j = \frac{n\theta^2}{\sigma^2 + n\theta^2} = \frac{1}{1 + \sigma^2/(n\theta^2)} < 1. \qquad (18)$$
Plugging this back into (16) and solving for each (equal) $w_i$ then gives
$$w_i = \frac{\theta^2}{\sigma^2}\left(1 - \sum_{j=1}^{n} w_j\right) = \frac{\theta^2}{\sigma^2}\left(1 - \frac{1}{1 + \sigma^2/(n\theta^2)}\right) = \frac{\theta^2}{\sigma^2 + n\theta^2} = \frac{1}{n[1 + \sigma^2/(n\theta^2)]} < \frac{1}{n}. \qquad (19)$$
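For concreteness (my own sketch; the values of $\theta$, $\sigma^2$, and $n$ are arbitrary), the weight formula in (19) can be evaluated directly and compared with the BLUE weight $1/n$:

```python
n, theta, sigma2 = 10, 5.0, 4.0        # hypothetical sample size, mean, and variance

w_blmse = theta**2 / (sigma2 + n * theta**2)   # each BLMSE weight from (19)
print("BLMSE weight  :", w_blmse)              # about 0.0984, which is < 1/n = 0.1
print("sum of weights:", n * w_blmse)          # about 0.984, which is < 1
```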
Thus, we see that the BLMSE is biased toward zero (i.e., $E(\hat{\theta}) < \theta$ for $\theta > 0$), trading off a smaller variance (due to weights that are less than $1/n$) for some bias. In particular, by taking the expected value of the linear estimator $\hat{\theta}$, we obtain the bias as
$$B(\hat{\theta},\theta) = E\left(\sum_{i=1}^{n} w_i y_i\right) - \theta = \frac{\theta}{1 + \sigma^2/(n\theta^2)} - \theta = \frac{-\theta\,(\sigma^2/n)}{(\sigma^2/n) + \theta^2}. \qquad (20)$$
Again using the fact that the $y_i$ are statistically independent, we also can calculate the variance of the linear estimator $\hat{\theta}$ as the sum of the squared weights times the variance of the individual $y_i$'s, which gives
$$\sigma_{\hat{\theta}}^2 = \sigma^2\sum_{i=1}^{n} w_i^2 = \frac{\sigma^2}{n}\left(\frac{1}{1 + \sigma^2/(n\theta^2)}\right)^2 < \frac{\sigma^2}{n}. \qquad (21)$$
Finally, we can see the gain in mean square error from trading off some bias for an associated reduction in variance by squaring the bias in (20) and adding it to the variance in (21), which gives
$$MSE(\hat{\theta},\theta) = \sigma_{\hat{\theta}}^2 + B(\hat{\theta},\theta)^2$$
$$= \frac{(\sigma^2/n)\,\theta^4}{[(\sigma^2/n) + \theta^2]^2} + \frac{\theta^2(\sigma^2/n)^2}{[(\sigma^2/n) + \theta^2]^2}$$
$$= \frac{\sigma^2}{n}\left(\frac{1}{1 + \sigma^2/(n\theta^2)}\right) < \frac{\sigma^2}{n} = Var(\bar{y}). \qquad (22)$$
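A Monte Carlo comparison (my own sketch, with the same arbitrary parameter values as above, and treating $\theta$ and $\sigma^2$ as known when forming the BLMSE weights) illustrates the bias, variance, and MSE trade-off in (20)-(22):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, sigma, n = 5.0, 2.0, 10
y = rng.normal(theta, sigma, size=(500_000, n))

ybar  = y.mean(axis=1)                              # the BLUE
w     = theta**2 / (sigma**2 + n * theta**2)        # BLMSE weight, using the true theta and sigma^2
blmse = y.sum(axis=1) * w                           # the (infeasible) BLMSE estimator

for name, est in [("BLUE ", ybar), ("BLMSE", blmse)]:
    print(name,
          "bias:", round(est.mean() - theta, 4),
          "variance:", round(est.var(), 4),
          "MSE:", round(((est - theta) ** 2).mean(), 4))

# Theory: Var(ybar) = sigma^2/n = 0.4, while
# MSE(BLMSE) = (sigma^2/n) / (1 + sigma^2/(n*theta^2)) = 0.4 / 1.016 ~ 0.3937.
```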
Comments:
1. The optimal weights for the BLMSE for $\theta$ are functions of the unknown parameters $\theta$ and $\sigma^2$. This makes the BLMSE more difficult to calculate than the BLUE. However, it is possible to show that we can use $\bar{y}$ and $s^2$ in the formulas for the weights of the BLMSE and still dominate the BLUE in terms of finite sample mean square error (a sketch of this plug-in construction follows these comments).
2. The differences in the mean and variance properties of the BLMSE and BLUE converge to zero as the sample size gets large (i.e., as $n \to \infty$). This large sample (asymptotic) property is important and is a general characteristic of the relationship between mean square error minimizing estimators and best unbiased estimators.
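A minimal sketch of the plug-in (feasible) version mentioned in Comment 1, replacing $\theta$ and $\sigma^2$ in (19) with the sample mean and sample variance; this only illustrates the construction and does not by itself demonstrate the dominance result:

```python
import numpy as np

def feasible_blmse(y):
    """Plug-in BLMSE: substitute the sample mean and sample variance for the
    unknown theta and sigma^2 in the weight formula (19)."""
    n    = y.size
    ybar = y.mean()
    s2   = y.var(ddof=1)
    w    = ybar**2 / (s2 + n * ybar**2)   # estimated common weight
    return n * w * ybar                   # sum_i w * y_i = n * w * ybar

rng = np.random.default_rng(5)
y = rng.normal(5.0, 2.0, size=10)
print("sample mean (BLUE):", y.mean())
print("feasible BLMSE    :", feasible_blmse(y))   # shrunk slightly toward zero
```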