Appendix E
The Linear Regression Model in Matrix Form

This appendix derives various results for ordinary least squares estimation of the multiple linear regression model using matrix notation and matrix algebra (see Appendix D for a summary). The material presented here is much more advanced than that in the text.
The model studied here is

$$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \dots + \beta_k x_{tk} + u_t, \quad t = 1, 2, \dots, n, \tag{E.1}$$

where $y_t$ is the dependent variable for observation $t$, and $x_{tj}$, $j = 1, 2, \dots, k$, are the independent variables. As usual, $\beta_0$ is the intercept and $\beta_1, \dots, \beta_k$ denote the slope parameters.

For each $t$, define a $1 \times (k+1)$ vector $\mathbf{x}_t = (1, x_{t1}, \dots, x_{tk})$, and let $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_k)'$ be the $(k+1) \times 1$ vector of all parameters. Then we can write (E.1) as

$$y_t = \mathbf{x}_t \boldsymbol{\beta} + u_t, \quad t = 1, 2, \dots, n. \tag{E.2}$$
Stacking the $n$ observations gives the $n \times (k+1)$ regressor matrix, whose $t$-th row is $\mathbf{x}_t$:

$$\mathbf{X} \equiv \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1k} \\ 1 & x_{21} & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{nk} \end{bmatrix}.$$

With $\mathbf{y}$ the $n \times 1$ vector of observations on the dependent variable and $\mathbf{u}$ the $n \times 1$ vector of errors, the $n$ equations in (E.2) can be written compactly as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}. \tag{E.3}$$

Remember, because $\mathbf{X}$ is $n \times (k+1)$ and $\boldsymbol{\beta}$ is $(k+1) \times 1$, $\mathbf{X}\boldsymbol{\beta}$ is $n \times 1$.
Estimation of $\boldsymbol{\beta}$ proceeds by minimizing the sum of squared residuals, as in Section 3.2. Define the sum of squared residuals function for any possible $(k+1) \times 1$ parameter vector $\mathbf{b}$ as

$$\text{SSR}(\mathbf{b}) \equiv \sum_{t=1}^{n} (y_t - \mathbf{x}_t \mathbf{b})^2.$$

The OLS estimator $\hat{\boldsymbol{\beta}}$ minimizes $\text{SSR}(\mathbf{b})$ over all possible $(k+1) \times 1$ vectors $\mathbf{b}$. Differentiating SSR with respect to $\mathbf{b}$ and setting the result to zero yields the first order condition

$$\sum_{t=1}^{n} \mathbf{x}_t' (y_t - \mathbf{x}_t \hat{\boldsymbol{\beta}}) = \mathbf{0}. \tag{E.5}$$
(We have divided by $-2$ and taken the transpose.) We can write this first order condition as

$$\sum_{t=1}^{n} (y_t - \hat{\beta}_0 - \hat{\beta}_1 x_{t1} - \dots - \hat{\beta}_k x_{tk}) = 0$$
$$\sum_{t=1}^{n} x_{t1}(y_t - \hat{\beta}_0 - \hat{\beta}_1 x_{t1} - \dots - \hat{\beta}_k x_{tk}) = 0$$
$$\vdots$$
$$\sum_{t=1}^{n} x_{tk}(y_t - \hat{\beta}_0 - \hat{\beta}_1 x_{t1} - \dots - \hat{\beta}_k x_{tk}) = 0,$$
which is identical to the first order conditions in equation (3.13). We want to write these
in matrix form to make them easier to manipulate. Using the formula for partitioned
multiplication in Appendix D, we see that (E.5) is equivalent to
$$\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0} \tag{E.6}$$

or

$$(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}. \tag{E.7}$$
It can be shown that (E.7) always has at least one solution. Multiple solutions do not help us, as we are looking for a unique set of OLS estimates given our data set. Assuming that the $(k+1) \times (k+1)$ symmetric matrix $\mathbf{X}'\mathbf{X}$ is nonsingular, we can premultiply both sides of (E.7) by $(\mathbf{X}'\mathbf{X})^{-1}$ to solve for the OLS estimator $\hat{\boldsymbol{\beta}}$:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \tag{E.8}$$
This is the critical formula for matrix analysis of the multiple linear regression model. The assumption that $\mathbf{X}'\mathbf{X}$ is invertible is equivalent to the assumption that $\text{rank}(\mathbf{X}) = k+1$, which means that the columns of $\mathbf{X}$ must be linearly independent. This is the matrix version of MLR.3 in Chapter 3.

Before we continue, (E.8) warrants a word of warning. It is tempting to simplify the formula for $\hat{\boldsymbol{\beta}}$ as follows:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{X}^{-1}(\mathbf{X}')^{-1}\mathbf{X}'\mathbf{y} = \mathbf{X}^{-1}\mathbf{y}.$$

The flaw in this reasoning is that $\mathbf{X}$ is usually not a square matrix, so it cannot be inverted. In other words, we cannot write $(\mathbf{X}'\mathbf{X})^{-1} = \mathbf{X}^{-1}(\mathbf{X}')^{-1}$ unless $n = k+1$, a case that virtually never arises in practice.
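As a computational illustration of (E.7) and (E.8) (not from the original text), the following NumPy sketch computes $\hat{\boldsymbol{\beta}}$ on simulated data; the sample size, the design, and all variable names are assumptions made only for this example. Solving the normal equations with np.linalg.solve avoids forming $(\mathbf{X}'\mathbf{X})^{-1}$ explicitly, which is numerically preferable.

```python
# Minimal sketch of the OLS formula (E.8) on simulated data; all names are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1), first column of ones
beta = np.array([1.0, 0.5, -0.3, 2.0])                      # "true" parameters for the simulation
y = X @ beta + rng.normal(scale=0.8, size=n)

# Solve the normal equations (X'X) b = X'y rather than inverting X'X explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```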
The $n \times 1$ vectors of OLS fitted values and residuals are given by

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}, \qquad \hat{\mathbf{u}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}},$$

respectively. From (E.6) and the definition of $\hat{\mathbf{u}}$, the first order condition can be expressed as

$$\mathbf{X}'\hat{\mathbf{u}} = \mathbf{0}. \tag{E.9}$$
Because the first column of $\mathbf{X}$ consists entirely of ones, (E.9) implies that the OLS residuals always sum to zero when an intercept is included in the equation and that the sample covariance between each independent variable and the OLS residuals is zero. (We discussed both of these properties in Chapter 3.)
The sum of squared residuals can be written as

$$\text{SSR} = \sum_{t=1}^{n} \hat{u}_t^2 = \hat{\mathbf{u}}'\hat{\mathbf{u}} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}). \tag{E.10}$$
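A short numerical check (again on simulated data, with illustrative names) confirms the orthogonality condition (E.9) and computes the SSR in (E.10):

```python
# Verify X'u_hat = 0 (equation E.9) and compute SSR (equation E.10) on simulated data.
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(scale=0.8, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat            # fitted values
u_hat = y - y_hat               # OLS residuals

print(np.allclose(X.T @ u_hat, 0.0, atol=1e-8))   # residuals orthogonal to each column of X
print(u_hat @ u_hat)                               # SSR = u_hat'u_hat
```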
All of the algebraic properties from Chapter 3 can be derived using matrix algebra. For example, we can show that the total sum of squares is equal to the explained sum of squares plus the sum of squared residuals [see (3.27)]. The use of matrices does not provide a simpler proof than summation notation, so we do not provide another derivation.

The matrix approach to multiple regression can be used as the basis for a geometrical interpretation of regression. This involves mathematical concepts that are even more advanced than those we covered in Appendix D. [See Goldberger (1991) or Greene (1997).]
This is a careful statement of the assumption that rules out linear dependencies among the explanatory variables. Under Assumption E.2, $\mathbf{X}'\mathbf{X}$ is nonsingular, so $\hat{\boldsymbol{\beta}}$ is unique and can be written as in (E.8).
$$E(\mathbf{u}\,|\,\mathbf{X}) = \mathbf{0}. \tag{E.11}$$
This assumption is implied by MLR.4 under the random sampling assumption, MLR.2. In time series applications, Assumption E.3 imposes strict exogeneity on the explanatory variables, something discussed at length in Chapter 10. This rules out explanatory variables whose future values are correlated with $u_t$; in particular, it eliminates lagged dependent variables. Under Assumption E.3, we can condition on the $x_{tj}$ when we compute the expected value of $\hat{\boldsymbol{\beta}}$.
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \mathbf{u}) = (\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}, \tag{E.12}$$

where we use the fact that $(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X}) = \mathbf{I}_{k+1}$. Taking the expectation conditional on $\mathbf{X}$ gives

$$E(\hat{\boldsymbol{\beta}}\,|\,\mathbf{X}) = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{u}\,|\,\mathbf{X}) = \boldsymbol{\beta},$$

because $E(\mathbf{u}\,|\,\mathbf{X}) = \mathbf{0}$ under Assumption E.3. This argument clearly does not depend on the value of $\boldsymbol{\beta}$, so we have shown that $\hat{\boldsymbol{\beta}}$ is unbiased.
Part (i) of Assumption E.4 is the homoskedasticity assumption: the variance of $u_t$ cannot depend on any element of $\mathbf{X}$, and the variance must be constant across observations, $t$. Part (ii) is the no serial correlation assumption: the errors cannot be correlated across observations. Under random sampling, and in any other cross-sectional sampling schemes with independent observations, part (ii) of Assumption E.4 automatically holds. For time series applications, part (ii) rules out correlation in the errors over time (both conditional on $\mathbf{X}$ and unconditionally).

Because of (E.13), we often say that $\mathbf{u}$ has a scalar variance-covariance matrix when Assumption E.4 holds. We can now derive the variance-covariance matrix of the OLS estimator.
Formula (E.14) means that the variance of $\hat{\beta}_j$ (conditional on $\mathbf{X}$) is obtained by multiplying $\sigma^2$ by the $j$-th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. For the slope coefficients, we gave an interpretable formula in equation (3.51). Equation (E.14) also tells us how to obtain the covariance between any two OLS estimates: multiply $\sigma^2$ by the appropriate off-diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. In Chapter 4, we showed how to avoid explicitly finding covariances for obtaining confidence intervals and hypothesis tests by appropriately rewriting the model.
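To make (E.14) concrete, the sketch below (simulated design; $\sigma^2$ is treated as known purely for illustration, which is never the case in practice) reads variances and covariances of the OLS estimates off $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$:

```python
# Read variances and covariances of the OLS estimates off sigma^2 (X'X)^{-1} (equation E.14).
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
sigma2 = 0.8**2                         # error variance, assumed known here only for illustration

V = sigma2 * np.linalg.inv(X.T @ X)     # (k+1) x (k+1) variance-covariance matrix of beta_hat
var_b1 = V[1, 1]                        # Var(beta_hat_1 | X): a diagonal element
cov_b1_b2 = V[1, 2]                     # Cov(beta_hat_1, beta_hat_2 | X): an off-diagonal element
print(var_b1, cov_b1_b2)
```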
because $[\text{Var}(\tilde{\boldsymbol{\beta}}\,|\,\mathbf{X}) - \text{Var}(\hat{\boldsymbol{\beta}}\,|\,\mathbf{X})]$ is p.s.d. Therefore, when it is used for estimating any linear combination of $\boldsymbol{\beta}$, OLS yields the smallest variance. In particular, $\text{Var}(\hat{\beta}_j\,|\,\mathbf{X}) \le \text{Var}(\tilde{\beta}_j\,|\,\mathbf{X})$ for any other linear, unbiased estimator $\tilde{\beta}_j$ of $\beta_j$.
As in Chapter 3, the estimator of the error variance is

$$\hat{\sigma}^2 = \hat{\mathbf{u}}'\hat{\mathbf{u}}/(n - k - 1).$$

THEOREM E.4 (UNBIASEDNESS OF $\hat{\sigma}^2$): Under Assumptions E.1 through E.4, $\hat{\sigma}^2$ is unbiased for $\sigma^2$: $E(\hat{\sigma}^2\,|\,\mathbf{X}) = \sigma^2$ for all $\sigma^2 > 0$.

PROOF: Write $\hat{\mathbf{u}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{y} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{M}\mathbf{y} = \mathbf{M}\mathbf{u}$, where $\mathbf{M} = \mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, and the last equality follows because $\mathbf{M}\mathbf{X} = \mathbf{0}$. Because $\mathbf{M}$ is symmetric and idempotent,

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = \mathbf{u}'\mathbf{M}'\mathbf{M}\mathbf{u} = \mathbf{u}'\mathbf{M}\mathbf{u}.$$

Conditional on $\mathbf{X}$, $E(\mathbf{u}'\mathbf{M}\mathbf{u}\,|\,\mathbf{X}) = \text{tr}[\mathbf{M}\,E(\mathbf{u}\mathbf{u}'\,|\,\mathbf{X})] = \sigma^2\,\text{tr}(\mathbf{M}) = \sigma^2(n - k - 1)$. The last equality follows from $\text{tr}(\mathbf{M}) = \text{tr}(\mathbf{I}_n) - \text{tr}[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'] = n - \text{tr}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}] = n - \text{tr}(\mathbf{I}_{k+1}) = n - (k+1) = n - k - 1$. Therefore, $E(\hat{\sigma}^2\,|\,\mathbf{X}) = E(\mathbf{u}'\mathbf{M}\mathbf{u}\,|\,\mathbf{X})/(n - k - 1) = \sigma^2$.
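A small Monte Carlo experiment (data-generating values chosen only for illustration, not taken from the text) is consistent with Theorem E.4: averaging $\hat{\sigma}^2$ across many simulated samples comes out close to the true $\sigma^2$, whereas dividing the SSR by $n$ instead of $n-k-1$ would understate it.

```python
# Monte Carlo illustration of Theorem E.4: E(sigma_hat^2) = sigma^2 with the n-k-1 divisor.
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma2 = 50, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # keep X fixed across replications
beta = np.array([1.0, 0.5, -0.3, 2.0])

draws = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    u_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    draws.append(u_hat @ u_hat / (n - k - 1))                # sigma_hat^2 for this sample

print(np.mean(draws))   # close to sigma2 = 4.0; dividing by n instead would understate it
```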
Under Assumption E.5, each $u_t$ is independent of the explanatory variables for all $t$. In a time series setting, this is essentially the strict exogeneity assumption.
THEOREM E.5 (NORMALITY OF $\hat{\boldsymbol{\beta}}$): Under the classical linear model Assumptions E.1 through E.5, $\hat{\boldsymbol{\beta}}$ conditional on $\mathbf{X}$ is distributed as multivariate normal with mean $\boldsymbol{\beta}$ and variance-covariance matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.

Theorem E.5 is the basis for statistical inference involving $\boldsymbol{\beta}$. In fact, along with the properties of the chi-square, t, and F distributions that we summarized in Appendix D, we can use Theorem E.5 to establish that t statistics have a t distribution under Assumptions E.1 through E.5 (under the null hypothesis) and likewise for F statistics. We illustrate with a proof for the t statistics.
$$(\hat{\beta}_j - \beta_j)/\text{se}(\hat{\beta}_j) \sim t_{n-k-1}, \quad j = 0, 1, \dots, k.$$
PROOF: The proof requires several steps; the following statements are initially conditional on $\mathbf{X}$. First, by Theorem E.5, $(\hat{\beta}_j - \beta_j)/\text{sd}(\hat{\beta}_j) \sim \text{Normal}(0,1)$, where $\text{sd}(\hat{\beta}_j) = \sigma\sqrt{c_{jj}}$, and $c_{jj}$ is the $j$-th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. Next, under Assumptions E.1 through E.5, conditional on $\mathbf{X}$,

$$(n - k - 1)\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-k-1}. \tag{E.18}$$

It can also be shown, using Property 5 of the multivariate normal distribution in Appendix D, that $\hat{\boldsymbol{\beta}}$ and $\mathbf{M}\mathbf{u}$ are independent. Because $\hat{\sigma}^2$ is a function of $\mathbf{M}\mathbf{u}$, $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^2$ are also independent. Now, write

$$(\hat{\beta}_j - \beta_j)/\text{se}(\hat{\beta}_j) = [(\hat{\beta}_j - \beta_j)/\text{sd}(\hat{\beta}_j)]/(\hat{\sigma}^2/\sigma^2)^{1/2},$$

which is the ratio of a standard normal random variable and the square root of a $\chi^2_{n-k-1}/(n - k - 1)$ random variable. We just showed that these are independent, so, by definition of a t random variable, $(\hat{\beta}_j - \beta_j)/\text{se}(\hat{\beta}_j)$ has the $t_{n-k-1}$ distribution. Because this distribution does not depend on $\mathbf{X}$, it is the unconditional distribution of $(\hat{\beta}_j - \beta_j)/\text{se}(\hat{\beta}_j)$ as well.
From this theorem, we can plug in any hypothesized value for $\beta_j$ and use the t statistic for testing hypotheses, as usual.
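The following sketch (simulated data; the hypothesized values are set to zero simply as an example) forms the t statistics $(\hat{\beta}_j - 0)/\text{se}(\hat{\beta}_j)$ and two-sided p-values from the $t_{n-k-1}$ distribution, using scipy.stats for the tail probabilities:

```python
# Compute t statistics (beta_hat_j - 0)/se(beta_hat_j) and two-sided p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(scale=0.8, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
df = n - k - 1
sigma2_hat = u_hat @ u_hat / df
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))       # se(beta_hat_j) = sigma_hat * sqrt(c_jj)

t_stats = beta_hat / se                            # t statistics for H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df)     # two-sided p-values from t_{n-k-1}
print(t_stats, p_values)
```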
Under Assumptions E.1 through E.5, we can compute what is known as the Cramer-Rao lower bound for the variance-covariance matrix of unbiased estimators of $\boldsymbol{\beta}$ (again conditional on $\mathbf{X}$) [see Greene (1997, Chapter 4)]. This can be shown to be $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$,
which is exactly the variance-covariance matrix of the OLS estimator. This implies that $\hat{\boldsymbol{\beta}}$ is the minimum variance unbiased estimator of $\boldsymbol{\beta}$ (conditional on $\mathbf{X}$): $\text{Var}(\tilde{\boldsymbol{\beta}}\,|\,\mathbf{X}) - \text{Var}(\hat{\boldsymbol{\beta}}\,|\,\mathbf{X})$ is positive semi-definite for any other unbiased estimator $\tilde{\boldsymbol{\beta}}$; we no longer have to restrict our attention to estimators linear in $\mathbf{y}$.
It is easy to show that the OLS estimator is in fact the maximum likelihood estimator of $\boldsymbol{\beta}$ under Assumption E.5. For each $t$, the distribution of $y_t$ given $\mathbf{X}$ is $\text{Normal}(\mathbf{x}_t\boldsymbol{\beta}, \sigma^2)$. Because the $y_t$ are independent conditional on $\mathbf{X}$, the likelihood function for the sample is obtained from the product of the densities:

$$\prod_{t=1}^{n} (2\pi\sigma^2)^{-1/2}\exp[-(y_t - \mathbf{x}_t\boldsymbol{\beta})^2/(2\sigma^2)],$$

where $\prod$ denotes product. Maximizing this function with respect to $\boldsymbol{\beta}$ and $\sigma^2$ is the same as maximizing its natural logarithm:

$$\sum_{t=1}^{n} [-(1/2)\log(2\pi\sigma^2) - (y_t - \mathbf{x}_t\boldsymbol{\beta})^2/(2\sigma^2)].$$
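As a numerical illustration of this equivalence (simulated data; the optimizer, starting values, and the log parameterization of $\sigma^2$ are arbitrary choices made only for the sketch), maximizing the normal log likelihood reproduces the OLS coefficients:

```python
# Check numerically that the MLE of beta under normality equals the OLS estimator.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

def neg_loglik(params):
    b, log_s2 = params[:-1], params[-1]
    s2 = np.exp(log_s2)                      # keep sigma^2 positive via its log
    resid = y - X @ b
    return 0.5 * np.sum(np.log(2 * np.pi * s2) + resid**2 / s2)

start = np.zeros(k + 2)                       # k+1 coefficients plus log sigma^2
res = minimize(neg_loglik, start, method="BFGS")
print(np.round(res.x[:k + 1], 4))             # MLE of beta ...
print(np.round(beta_ols, 4))                  # ... matches the OLS estimates
```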
Proof of Theorem 11.1. As in Problem E.1 and using Assumption TS.1′, we write the OLS estimator as

$$\hat{\boldsymbol{\beta}} = \left(\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1}\left(\sum_{t=1}^{n} \mathbf{x}_t' y_t\right) = \left(\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1}\left(\sum_{t=1}^{n} \mathbf{x}_t'(\mathbf{x}_t\boldsymbol{\beta} + u_t)\right)$$
$$= \boldsymbol{\beta} + \left(\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1}\left(\sum_{t=1}^{n} \mathbf{x}_t' u_t\right) \tag{E.19}$$
$$= \boldsymbol{\beta} + \left(n^{-1}\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1}\left(n^{-1}\sum_{t=1}^{n} \mathbf{x}_t' u_t\right). \tag{E.20}$$

By the law of large numbers,

$$\left(n^{-1}\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1} \stackrel{p}{\longrightarrow} \mathbf{A}^{-1}, \tag{E.21}$$

and $n^{-1}\sum_{t=1}^{n} \mathbf{x}_t' u_t$ converges in probability to zero, so that

$$\text{plim}(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta} + \mathbf{A}^{-1}\cdot\mathbf{0} = \boldsymbol{\beta}.$$
Next, for the asymptotic normality result, write

$$\sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \left(n^{-1}\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1}\left(n^{-1/2}\sum_{t=1}^{n} \mathbf{x}_t' u_t\right)$$
$$= \mathbf{A}^{-1}\left(n^{-1/2}\sum_{t=1}^{n} \mathbf{x}_t' u_t\right) + o_p(1), \tag{E.22}$$

where the term "$o_p(1)$" is a remainder term that converges in probability to zero. This term is equal to $\left[\left(n^{-1}\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1} - \mathbf{A}^{-1}\right]\left(n^{-1/2}\sum_{t=1}^{n} \mathbf{x}_t' u_t\right)$. The term in brackets converges in probability to zero (by the same argument used in the proof of Theorem 11.1), while $n^{-1/2}\sum_{t=1}^{n} \mathbf{x}_t' u_t$ is bounded in probability because it converges to a multivariate normal distribution by the central limit theorem. A well-known result in asymptotic theory is that the product of such terms converges in probability to zero. Further, $\sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ inherits its asymptotic distribution from $\mathbf{A}^{-1}\left(n^{-1/2}\sum_{t=1}^{n} \mathbf{x}_t' u_t\right)$. See Wooldridge (2010, Chapter 3) for more details on the convergence results used in this proof.
$$\sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \overset{a}{\sim} \text{Normal}(\mathbf{0}, \sigma^2\mathbf{A}^{-1}). \tag{E.23}$$

Replacing $\sigma^2$ with $\hat{\sigma}^2$ and $\mathbf{A}$ with $n^{-1}\mathbf{X}'\mathbf{X}$, and dividing by $n$ to approximate the variance of $\hat{\boldsymbol{\beta}}$ itself, gives the estimated asymptotic variance

$$\widehat{\text{Avar}}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}. \tag{E.24}$$
Notice how the two divisions by $n$ cancel, and the right-hand side of (E.24) is just the usual way we estimate the variance matrix of the OLS estimator under the Gauss-Markov assumptions. To summarize, we have shown that, under Assumptions TS.1′ to TS.5′ (which contain MLR.1 to MLR.5 as special cases), the usual standard errors and t statistics are asymptotically valid. It is perfectly legitimate to use the usual t distribution to obtain critical values and p-values for testing a single hypothesis. Interestingly, in the general setup of Chapter 11, assuming normality of the errors (say, $u_t$ given $\mathbf{x}_t, u_{t-1}, \mathbf{x}_{t-1}, \dots, u_1, \mathbf{x}_1$ is distributed as $\text{Normal}(0, \sigma^2)$) does not necessarily help, as the t statistics would not generally have exact t distributions under this kind of normality assumption. When we do not assume strict exogeneity of the explanatory variables, exact distributional results are difficult, if not impossible, to obtain.
If we modify the argument above, we can derive a heteroskedasticity-robust variance-covariance matrix. The key is that we must estimate $E(u_t^2\mathbf{x}_t'\mathbf{x}_t)$ separately because this matrix no longer equals $\sigma^2 E(\mathbf{x}_t'\mathbf{x}_t)$. But, if the $\hat{u}_t$ are the OLS residuals, a consistent estimator is

$$(n - k - 1)^{-1}\sum_{t=1}^{n} \hat{u}_t^2\,\mathbf{x}_t'\mathbf{x}_t, \tag{E.25}$$

where the division by $n - k - 1$ rather than $n$ is a degrees of freedom adjustment that typically helps the finite sample properties of the estimator. When we use the expression in equation (E.25), we obtain
$$\widehat{\text{Avar}}(\hat{\boldsymbol{\beta}}) = [n/(n - k - 1)](\mathbf{X}'\mathbf{X})^{-1}\left(\sum_{t=1}^{n} \hat{u}_t^2\,\mathbf{x}_t'\mathbf{x}_t\right)(\mathbf{X}'\mathbf{X})^{-1}. \tag{E.26}$$
The square roots of the diagonal elements of this matrix are the same heteroskedasticity-robust standard errors we obtained in Section 8.2 for the pure cross-sectional case. A matrix extension of the serial correlation- (and heteroskedasticity-) robust standard errors we obtained in Section 12.5 is also available, but the matrix that must replace (E.25) is complicated because of the serial correlation. See, for example, Hamilton (1994, Section 10.5).
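The sketch below (simulated heteroskedastic data; the variable names and the particular form of heteroskedasticity are assumptions for the example) computes the robust variance matrix in (E.26), including the (E.25) degrees-of-freedom scaling, and the corresponding standard errors:

```python
# Heteroskedasticity-robust variance matrix (E.26) and standard errors on simulated data.
import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 0.5, -1.0])
u = rng.normal(size=n) * np.exp(0.5 * X[:, 1])     # error variance depends on x_1: heteroskedasticity
y = X @ beta + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat

meat = (X * u_hat[:, None]**2).T @ X               # sum over t of u_hat_t^2 x_t' x_t
V_robust = (n / (n - k - 1)) * XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(V_robust))             # heteroskedasticity-robust standard errors
print(se_robust)
```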
Similar arguments give the asymptotic distribution of the statistic for testing multiple linear hypotheses. Write the null as $H_0\colon \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$, where $\mathbf{R}$ is a $q \times (k+1)$ matrix with rank $q$. Then, under $H_0$,

$$n(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})'[\sigma^2\mathbf{R}\mathbf{A}^{-1}\mathbf{R}']^{-1}(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}) \overset{a}{\sim} \chi^2_q,$$

where $\mathbf{A} = E(\mathbf{x}_t'\mathbf{x}_t)$, as in the proofs of Theorems 11.1 and 11.2. The intuition behind this result is simple. Because $\sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ is roughly distributed as $\text{Normal}(\mathbf{0}, \sigma^2\mathbf{A}^{-1})$, $\mathbf{R}[\sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})] = \sqrt{n}\,\mathbf{R}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ is approximately $\text{Normal}(\mathbf{0}, \sigma^2\mathbf{R}\mathbf{A}^{-1}\mathbf{R}')$ by Property 3 of the multivariate normal distribution in Appendix D. Under $H_0$, $\mathbf{R}\boldsymbol{\beta} = \mathbf{r}$, so $\sqrt{n}(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}) \overset{a}{\sim} \text{Normal}(\mathbf{0}, \sigma^2\mathbf{R}\mathbf{A}^{-1}\mathbf{R}')$ under $H_0$. By Property 3 of the chi-square distribution, $\mathbf{z}'(\sigma^2\mathbf{R}\mathbf{A}^{-1}\mathbf{R}')^{-1}\mathbf{z} \sim \chi^2_q$ if $\mathbf{z} \sim \text{Normal}(\mathbf{0}, \sigma^2\mathbf{R}\mathbf{A}^{-1}\mathbf{R}')$. To obtain the final result formally, we need to use an asymptotic version of this property, which can be found in Wooldridge (2010, Chapter 3).
Given this asymptotic result, we obtain a computable statistic by replacing $\mathbf{A}$ and $\sigma^2$ with their consistent estimators; doing so does not change the asymptotic distribution. The result is the so-called Wald statistic, which, after cancelling the sample sizes and doing a little algebra, can be written as

$$W = (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})'[\mathbf{R}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}']^{-1}(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})/\hat{\sigma}^2.$$

Under $H_0$, $W \overset{a}{\sim} \chi^2_q$, where we recall that $q$ is the number of restrictions being tested. If $\hat{\sigma}^2 = \text{SSR}/(n - k - 1)$, it can be shown that $W/q$ is exactly the F statistic we obtained in Chapter 4 for testing multiple linear restrictions. [See, for example, Greene (1997, Chapter 7).] Therefore, under the classical linear model assumptions TS.1 through TS.6 in Chapter 10, $W/q$ has an exact $F_{q, n-k-1}$ distribution. Under Assumptions TS.1′ through TS.5′, we only have the asymptotic result above. Nevertheless, it is appropriate, and common, to treat the usual F statistic as having an approximate $F_{q, n-k-1}$ distribution.
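As a computational sketch (simulated data; the particular restriction matrix R, vector r, and hypothesis are chosen only for illustration), the Wald statistic and its F form can be obtained as follows:

```python
# Wald statistic W for H0: R beta = r, and the F form W/q, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.0, 0.0, 2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
df = n - k - 1
sigma2_hat = u_hat @ u_hat / df

# Example hypothesis: beta_1 = 0 and beta_2 = 0 (q = 2 restrictions).
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
r = np.zeros(2)
q = R.shape[0]

diff = R @ beta_hat - r
W = diff @ np.linalg.solve(R @ XtX_inv @ R.T, diff) / sigma2_hat   # Wald statistic
F = W / q                                                           # equals the usual F statistic
print(W, F, stats.chi2.sf(W, q), stats.f.sf(F, q, df))
```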
A Wald statistic that is robust to heteroskedasticity of unknown form is obtained by
using the matrix in (E.26) in place of $\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}$, and similarly for a test statistic robust
to both heteroskedasticity and serial correlation. The robust versions of the test statistics
cannot be computed via sums of squared residuals or R-squareds from the restricted and
unrestricted regressions.
Summary

This appendix has provided a brief treatment of the linear regression model using matrix notation. This material is included for more advanced classes that use matrix algebra, but it is not needed to read the text. In effect, this appendix proves some of the results that we either stated without proof, proved only in special cases, or proved through a more cumbersome method of proof. Other topics—such as asymptotic properties, instrumental variables estimation, and panel data models—can be given concise treatments using matrices. Advanced texts in econometrics, including Davidson and MacKinnon (1993), Greene (1997), Hayashi (2000), and Wooldridge (2010), can be consulted for details.
Key Terms
First Order Condition
Matrix Notation
Minimum Variance Unbiased Estimator
Quasi-Maximum Likelihood Estimator (QMLE)
Scalar Variance-Covariance Matrix
Variance-Covariance Matrix of the OLS Estimator
Wald Statistic
Problems
1 Let $\mathbf{x}_t$ be the $1 \times (k+1)$ vector of explanatory variables for observation $t$. Show that the OLS estimator $\hat{\boldsymbol{\beta}}$ can be written as

$$\hat{\boldsymbol{\beta}} = \left(\sum_{t=1}^{n} \mathbf{x}_t'\mathbf{x}_t\right)^{-1}\left(\sum_{t=1}^{n} \mathbf{x}_t' y_t\right).$$

$$\text{SSR}(\mathbf{b}) = \hat{\mathbf{u}}'\hat{\mathbf{u}} + (\hat{\boldsymbol{\beta}} - \mathbf{b})'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol{\beta}} - \mathbf{b}).$$
(iv) Let the $\hat{\beta}_j$ be the OLS estimates from regressing $y_t$ on $1, x_{t1}, \dots, x_{tk}$, and let the $\tilde{\beta}_j$ be the OLS estimates from the regression of $y_t$ on $1, a_1x_{t1}, \dots, a_kx_{tk}$, where $a_j \neq 0$, $j = 1, \dots, k$. Use the results from part (i) to find the relationship between the $\tilde{\beta}_j$ and the $\hat{\beta}_j$.
(v) Assuming the setup of part (iv), use part (iii) to show that $\text{se}(\tilde{\beta}_j) = \text{se}(\hat{\beta}_j)/|a_j|$.
(vi) Assuming the setup of part (iv), show that the absolute values of the t statistics for $\tilde{\beta}_j$ and $\hat{\beta}_j$ are identical.
4 Assume that the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$ satisfies the Gauss-Markov assumptions, let $\mathbf{G}$ be a $(k+1) \times (k+1)$ nonsingular, nonrandom matrix, and define $\boldsymbol{\delta} = \mathbf{G}\boldsymbol{\beta}$, so that $\boldsymbol{\delta}$ is also a $(k+1) \times 1$ vector. Let $\hat{\boldsymbol{\beta}}$ be the $(k+1) \times 1$ vector of OLS estimators and define $\hat{\boldsymbol{\delta}} = \mathbf{G}\hat{\boldsymbol{\beta}}$ as the OLS estimator of $\boldsymbol{\delta}$.
(i) Show that $E(\hat{\boldsymbol{\delta}}\,|\,\mathbf{X}) = \boldsymbol{\delta}$.
(ii) Find $\text{Var}(\hat{\boldsymbol{\delta}}\,|\,\mathbf{X})$ in terms of $\sigma^2$, $\mathbf{X}$, and $\mathbf{G}$.
(iii) Use Problem E.3 to verify that $\hat{\boldsymbol{\delta}}$ and the appropriate estimate of $\text{Var}(\hat{\boldsymbol{\delta}}\,|\,\mathbf{X})$ are obtained from the regression of $\mathbf{y}$ on $\mathbf{X}\mathbf{G}^{-1}$.
(iv) Now, let $\mathbf{c}$ be a $(k+1) \times 1$ vector with at least one nonzero entry. For concreteness, assume that $c_k \neq 0$. Define $\theta = \mathbf{c}'\boldsymbol{\beta}$, so that $\theta$ is a scalar. Define $\delta_j = \beta_j$, $j = 0, 1, \dots, k-1$, and $\delta_k = \theta$. Show how to define a $(k+1) \times (k+1)$ nonsingular matrix $\mathbf{G}$ so that $\boldsymbol{\delta} = \mathbf{G}\boldsymbol{\beta}$. (Hint: Each of the first $k$ rows of $\mathbf{G}$ should contain $k$ zeros and a one. What is the last row?)
(v) Show that for the choice of $\mathbf{G}$ in part (iv),

$$\mathbf{G}^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ -c_0/c_k & -c_1/c_k & \cdots & -c_{k-1}/c_k & 1/c_k \end{bmatrix}.$$

Use this expression for $\mathbf{G}^{-1}$, together with part (iii), to find the regression that delivers $\hat{\theta}$ and its standard error. This regression is exactly the one obtained by writing $\beta_k$ in terms of $\theta$ and $\beta_0, \beta_1, \dots, \beta_{k-1}$, plugging the result into the original model, and rearranging. Therefore, we can formally justify the trick we use throughout the text for obtaining the standard error of a linear combination of parameters.
5 Assume that the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$ satisfies the Gauss-Markov assumptions and let $\hat{\boldsymbol{\beta}}$ be the OLS estimator of $\boldsymbol{\beta}$. Let $\mathbf{Z} = \mathbf{G}(\mathbf{X})$ be an $n \times (k+1)$ matrix function of $\mathbf{X}$ and assume that $\mathbf{Z}'\mathbf{X}$ [a $(k+1) \times (k+1)$ matrix] is nonsingular. Define a new estimator of $\boldsymbol{\beta}$ by $\tilde{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{X})^{-1}\mathbf{Z}'\mathbf{y}$.
(i) Show that $E(\tilde{\boldsymbol{\beta}}\,|\,\mathbf{X}) = \boldsymbol{\beta}$, so that $\tilde{\boldsymbol{\beta}}$ is also unbiased conditional on $\mathbf{X}$.
(ii) Find $\text{Var}(\tilde{\boldsymbol{\beta}}\,|\,\mathbf{X})$. Make sure this is a symmetric, $(k+1) \times (k+1)$ matrix that depends on $\mathbf{Z}$, $\mathbf{X}$, and $\sigma^2$.
(iii) Which estimator do you prefer, $\hat{\boldsymbol{\beta}}$ or $\tilde{\boldsymbol{\beta}}$? Explain.