
City University of Hong Kong

EF3450 Principles of Econometrics Paper B


Semester B 2020-2021

Midterm Examination

Full Name: _____________________________

Student ID: _____________________________

Instructions:
Mark down at the top right-hand corner of your answer sheets whether you are answering Paper
A or B (write down “A” or “B”).

Section 1: Multiple Choice (15 points)

1. The R2 of a multiple regression


a. may go up or down when more explanatory variables are introduced into the model.
b. is just another name for correlation coefficient.
c. is useful for detecting omitted variable bias.
d. None of the above.

2. The homoskedasticity assumption of the regression random error term is required


a. to establish the unbiasedness property of the least squares estimator.
b. for the OLS estimator to be BLUE.
c. for hypothesis testing in finite sample.
d. Both b and c.

3. The least squares estimator is no longer BLUE (Best Linear Unbiased)


a. if some of the regression parameters are known to be zero.
b. if a relevant explanatory variable has been omitted from the regression.
c. if the random error term fails to be normally distributed.
d. All of the above.

4. The least squares estimator


a. cannot be calculated if the mean of the random error term e is non-zero.
b. has a smaller variance than any other estimators that one can imagine.
c. will be biased if the sample size is too small.
d. None of the above.

5. Which of the following violates the assumption of homoskedasticity?
a. The mean of the random error term is non-zero.
b. The variance of the dependent variable is not constant.
c. The regression function is non-linear.
d. None of the above.

6. Let SSRU be the sum of squared residuals from estimating the unrestricted model
Yt =β1 +β2X2t +β3X3t +β4X4t + et
and SSRR be the sum of squared residuals from estimating the restricted model
Yt =β1 +β2X2t + et
a. SSRU>SSRR is impossible.
b. SSRU<SSRR is impossible.
c. SSRU ≥SSRR is impossible.
d. SSRU may be larger or smaller than SSRR, depending on whether the restrictions are true
or false.

7. Which of the following statements is a conclusion of the Gauss-Markov Theorem?


a. The least squares estimator is normally distributed.
b. The least squares estimator is the only unbiased estimator that exists.
c. The least squares estimator has the smallest variance among all possible estimators.
d. None of the above.

8. As the degrees of freedom get large, the t-distribution


a. will become more asymmetric.
b. will become a probability distribution with zero variance.
c. will look like the standard normal distribution.
d. None of the above.

9. If E(Y | X) = 0 and E(Y) = 0, then


a. X and Y must be uncorrelated.
b. X and Y must be independent.
c. Y can be written as a linear function of X.
d. Both a and b.

10. By applying the method of least squares, which of the following will be minimized?
a. The size of the random error term.
b. The sum of residuals.
c. The sum of squares of the dependent variable.
d. None of the above.
11. Which of the following is not a linear regression model?
a. log(Yt) = β0 + β1 log(Xt) + et
b. Yt = β0 + β1 Xt² + et
c. Yt = β0 + β1 (1/Xt) + et
d. Yt = β0 + β1 Xt^β2 + et

12. The true random error term e of a regression model


a. will vanish if the regression parameters are known.
b. is the deviation of the observed y from the sample mean of y.
c. can be calculated once the least squares estimates have been obtained.
d. None of the above.

13. Find the 5% critical value of the F–test for


H0: β1 = 1.2 and 2β2 - 3β3 = 0
for the multiple regression model Y = β1 + β2X2 + β3X3 + e, sample size = 30.
a. 4.17
b. 3.35
c. 3.20
d. None of the above.

Answer the next TWO questions based on the following hedonic price regression for houses:
ln(price) = 1.1 + 0.5ln(size) + 0.1good + e,
where good = 1 if the house is located in a good neighborhood, 0 otherwise.

14. The hedonic regression predicts that if we compare two houses of the same size but of
different locations,
a. the one in the good neighborhood will cost about 0.1% more.
b. the one in the good neighborhood will cost about $0.1 million more.
c. the one in the good neighborhood will cost about 10% more.
d. None of the above.

15. Comparing two houses of the same location, if one is twice as large as the other, the
hedonic regression predicts that
a. the smaller house should be $0.5 million less expensive.
b. the smaller house should be 50% less expensive.
c. the smaller house should be 100% less expensive.
d. None of the above.

Section 2: Short questions (15 points)
Write down your answers in the answer book. You can use formulae, graphs or examples
to illustrate if you want.

1. (5 points) Give the motivation for using the adjusted 𝑅 2 instead of the 𝑅 2 when
comparing the goodness of fit of various multiple regression models.

2. (5 points) If the magnitude of the covariance coefficient between two random variables
is 0, does it indicate no relationship between the two random variables?
[Hint: Use an example to justify your answer].

3. (5 points) Is the following statement true? Explain.


“Omitting relevant explanatory variables from the regression will always lead to an
upward bias of the regression coefficient estimators.”

Answers:
1. (2 points)
R² = ESS/TSS = 1 − RSS/TSS, where ESS is the explained sum of squares, RSS is the
residual sum of squares and TSS is the total sum of squares.
R² generally increases when we add another explanatory variable to the regression,
even if the added variable is a ‘garbage’ variable, so it is not a valid indicator for comparing
the goodness of fit of regression models with different numbers of independent variables.

(2 points)
The adjusted R² = 1 − [RSS/(T − K)] / [TSS/(T − 1)], where K is the number of parameters in the
regression model. More explanatory variables increase K and reduce the adjusted R²,
ceteris paribus. It ‘penalizes’ us for including another explanatory variable.

(1 point) Overall, the adjusted R² may increase or decrease when more explanatory
variables are added to the regression model, depending on whether the effect of the
reduction in RSS or of the decrease in the degrees of freedom T − K is larger.
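As a quick numerical illustration (a Python sketch with hypothetical values: TSS = 100, T = 50, and a new regressor that lowers RSS only from 40 to 39.8 while K rises from 3 to 4), R² necessarily creeps up while the adjusted R² falls:

```python
# Hypothetical values: TSS = 100, T = 50; adding a 'garbage' regressor
# lowers RSS only slightly (40 -> 39.8) while K rises from 3 to 4.
T = 50

def r2(rss, tss):
    return 1 - rss / tss

def adj_r2(rss, tss, t, k):
    return 1 - (rss / (t - k)) / (tss / (t - 1))

print(r2(40, 100), r2(39.8, 100))                      # R^2 rises: 0.600 -> 0.602
print(adj_r2(40, 100, T, 3), adj_r2(39.8, 100, T, 4))  # adjusted R^2 falls
```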

2. (3 points) If the covariance between two random variables is 0, it does not mean
that there is no relationship between the two random variables: zero covariance only
means that there is no linear relationship between the two variables; there can still
be a nonlinear relationship between them.

(2 points) For example, if (X, Y) is uniformly distributed on the circle 𝑋 2 + 𝑌 2 = 1,
X and Y have zero covariance, but they are clearly not unrelated.
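A small simulation makes this concrete (a sketch, assuming (X, Y) is drawn uniformly on the unit circle; the sample covariance is computed from scratch):

```python
import math
import random

random.seed(0)
n = 200_000
theta = [random.uniform(0, 2 * math.pi) for _ in range(n)]
xs = [math.cos(t) for t in theta]  # X^2 + Y^2 = 1 holds by construction
ys = [math.sin(t) for t in theta]

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
print(round(cov, 3))  # essentially 0, yet Y is determined by X up to sign
```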

3. (1 point) No.

If the true model has two explanatory variables X2 and X3,
𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + 𝜀𝑖
but X3 is omitted in the estimated regression,
𝑌𝑖 = 𝛽̂1∗ + 𝛽̂2∗ 𝑋2𝑖 + 𝜀̂𝑖∗
(2 points) the estimator of the slope coefficient of X2 (𝛽̂2∗ ) will be biased as follows:
𝐸[𝛽̂2∗ ] = 𝛽2 + 𝛽3 𝑏32
where the second term, 𝛽3 𝑏32 , is the bias and 𝑏32 is the estimated coefficient from
regressing the omitted variable (𝑋3 ) on the included variable (𝑋2 ).

(2 points) The direction of the bias depends on the sign of 𝛽3 𝑏32 . If 𝛽3 and 𝑏32 have
the same sign, the bias term is positive (an upward bias); if they have opposite signs,
the bias term is negative (a downward bias).
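The bias formula can be checked by simulation (a sketch with hypothetical parameters β2 = 2, β3 = 1.5 and X3 = 0.8·X2 + noise, so that b32 ≈ 0.8 and the bias is upward):

```python
import random

random.seed(1)
b1, b2, b3 = 1.0, 2.0, 1.5  # hypothetical true parameters
n = 100_000

x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.8 * v + random.gauss(0, 1) for v in x2]  # X3 correlated with X2
y = [b1 + b2 * a + b3 * c + random.gauss(0, 1) for a, c in zip(x2, x3)]

def slope(x, y):
    """Simple-regression slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (c - my) for a, c in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

b2_star = slope(x2, y)  # short regression: X3 omitted, so biased
b32 = slope(x2, x3)     # auxiliary regression of X3 on X2
print(b2_star, b2 + b3 * b32)  # nearly equal: E[b2*] = b2 + b3*b32
```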

Section 3: Long questions (19 points)
Write all your answers on your answer sheets. Make sure that you write out all your
steps clearly.

We have estimated the wage equation


log(𝑤𝑎𝑔𝑒) = 𝛽0 + 𝛽1 ∙ 𝑐𝑜𝑙𝑙𝑒𝑔𝑒 + 𝛽2 ∙ 𝑢𝑛𝑖𝑣𝑒𝑟 + 𝛽3 ∙ 𝑒𝑥𝑝𝑒𝑟 + 𝑒
where
“log(wage)” is the natural log of hourly wage,
“college” is the number of years attending a two-year junior college,
“univer” is the number of years at a four-year university,
“exper” is the number of years of work experience in the workforce.

The R regression output is as follows:


Estimate Std. Error t value Pr(>|t|)
(intercept) 0.284360 0.104190 2.729 0.00656 **
college 0.022067 0.003094 7.133 3.29e-12 ***
univer 0.092029 0.007330 12.555 <2e-16 ***
exper 0.004121 0.001723 2.391 0.01714 *

Residual standard error: 0.4409 on 552 degrees of freedom


Multiple R-squared: 0.316 Adjusted R-squared: 0.3121
F statistic: 80.39 on 3 and 552 DF p-value: <2.2e-16

(a) (3 points) Based on the estimation result, interpret the coefficient of “exper” in the
regression.

(b) (1 point) Given the degrees of freedom provided in the regression output, what is the
number of observations in the sample?

(c) (3 points) Construct a 95% confidence interval for the coefficient of “exper”.

(d) (3 points) Test the hypothesis that every year of increase in work experience will
increase the wage by more than 0.1%, use a 5% significance level. (Hint: A 0.1% is
equivalent to a proportion of 0.001.)

(e) (3 points) Use the p-value method to test the hypothesis that an increase in “exper” will
increase the wage (please state the null and alternative hypothesis and the p-value for
doing the hypothesis testing. Use the 5% significance level).

(f) (3 points) Interpret the value of “Multiple R-squared” in the regression output.

(g) (3 points) What hypothesis can you test by using the F statistic reported at the bottom
of the regression output? Perform the test and state your decision. (State the null and
alternative hypothesis and your conclusion).

Solution:

(a) Since 𝛽̂3 = 0.004121, wage is predicted to increase by a proportion of 0.004121 (or about
0.4121%) for every year of increase in experience [2 points], other variables held constant
[1 point].

(b) The degrees of freedom T − K = 552; since K = 4, the number of observations is T = 556.

(c) The confidence interval is given by:

[𝛽̂3 − 𝑡𝑐,𝑇−𝐾=552 (α/2) × 𝑠𝑒(𝛽̂3 ), 𝛽̂3 + 𝑡𝑐,𝑇−𝐾=552 (α/2) × 𝑠𝑒(𝛽̂3 )]
= [0.004121 − 1.96 × 0.001723, 0.004121 + 1.96 × 0.001723]
= [0.000744, 0.007498]
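The interval can be reproduced numerically (a sketch; with 552 degrees of freedom the t critical value is essentially the standard normal quantile, ~1.96):

```python
from statistics import NormalDist

beta_hat, se = 0.004121, 0.001723  # estimate and std. error of 'exper'

# Large-df approximation: the t(552) quantile ~ the standard normal quantile.
t_crit = NormalDist().inv_cdf(0.975)

lo = beta_hat - t_crit * se
hi = beta_hat + t_crit * se
print(round(lo, 6), round(hi, 6))  # ~[0.000744, 0.007498]
```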

(d)
𝐻0 : 𝛽3 ≤ 0.001
𝐻1 : 𝛽3 > 0.001

Test statistic = (𝛽̂3 − 0.001) / 𝑠𝑒(𝛽̂3 ) = (0.004121 − 0.001) / 0.001723 = 1.8114

Since the test statistic exceeds the one-sided critical value at the 5% significance level,
which is 1.645, we reject the null hypothesis at the 5% significance level and conclude that
every year of increase in work experience increases the wage by more than 0.1%, other
things held constant.
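The same test can be reproduced in a few lines (a sketch; the one-sided critical value with 552 degrees of freedom is approximated by the normal quantile, ~1.645):

```python
from statistics import NormalDist

beta_hat, se = 0.004121, 0.001723
t_stat = (beta_hat - 0.001) / se
print(round(t_stat, 4))            # 1.8114

crit = NormalDist().inv_cdf(0.95)  # one-sided 5% critical value, ~1.645
print(t_stat > crit)               # True -> reject H0
```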

(e)
𝐻0 : 𝛽3 ≤ 0
𝐻1 : 𝛽3 > 0

The reported p-value of 0.01714 is two-sided, so the one-sided p-value is
0.01714 / 2 ≈ 0.0086. Since this is smaller than the significance level of 0.05, we
reject the null hypothesis and conclude that an increase in work experience will
increase wage, other variables held constant.

(f) Multiple R-squared = 0.316 means that about 31.6% of the variation in the dependent
variable is explained by the explanatory variables included in the model.

(g) The F statistic is used to test the overall significance of the model, i.e. whether all
slope coefficients are jointly zero. The null and alternative hypotheses are as follows:

H0: β1 = 0 and β2 = 0 and β3 = 0
H1: At least one of the slope coefficients is different from 0

Since the p-value is close to 0, we reject the null hypothesis and conclude that at least
one of the slope coefficients is nonzero.

- END –

Formula sheet
Expected value
E[X] = x1 f(x1) + x2 f(x2) + … + xk f(xk) = Σ_{i=1}^{k} xi f(xi)

Sample mean
sample mean = x̄ = (Σ_{i=1}^{n} xi) / n
Variance
σX² or V(X) = E[(X − E(X))²] = Σ_{i=1}^{k} f(xi)[xi − E(X)]²

Sample variance

sample variance of a random variable X = Σ_{i=1}^{n} (xi − x̄)² / (n − 1)

Covariance
cov(X, Y) = E[(X − E(X))(Y − E(Y))] = Σ_{i=1}^{n} f(xi, yi)(xi − E(X))(yi − E(Y))

where f ( xi , yi ) is the joint distribution of X and Y.

Correlation
corr(X, Y) = ρXY = cov(X, Y) / √(var(X) var(Y)) = cov(X, Y) / (σX σY)
where σX and σY are the standard deviations of X and Y respectively.

Sample covariance
sample cov(X, Y) = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / (n − 1)
Sample correlation
sample corr(X, Y) = sample cov(X, Y) / √(sample V(X) × sample V(Y)) = sample cov(X, Y) / (sX sY)
where sX and sY are the sample standard deviations of X and Y respectively.
Ordinary least squares estimators:
β̂2 = [T Σ xt yt − (Σ xt)(Σ yt)] / [T Σ xt² − (Σ xt)²]
   = [Σ xt yt − T x̄ ȳ] / [Σ xt² − T x̄²]
   = Σ (xt − x̄)(yt − ȳ) / Σ (xt − x̄)²

where all sums run over t = 1, …, T.

β̂1 = ȳ − β̂2 x̄

The variances and covariance of β̂1 and β̂2 are given by:

Var(β̂1) = E[β̂1 − E(β̂1)]² = σ² [Σ xt² / (T Σ (xt − x̄)²)]

Var(β̂2) = E[β̂2 − E(β̂2)]² = σ² / Σ (xt − x̄)²

cov(β̂1, β̂2) = E[(β̂1 − E(β̂1))(β̂2 − E(β̂2))] = −σ² [x̄ / Σ (xt − x̄)²]

t test statistic:
ˆ j  c
test statistic
se(ˆ j )

Measures of goodness of fit:

R² = Explained sum of squares / Total sum of squares = ESS / TSS = Σ (ŷi − ȳ)² / Σ (yi − ȳ)²

or

R² = 1 − Residual sum of squares / Total sum of squares = 1 − RSS / TSS = 1 − Σ êi² / Σ (yi − ȳ)²

Adjusted R² = 1 − [RSS / (T − K)] / [TSS / (T − 1)] = 1 − (1 − R²)(T − 1) / (T − K)
The upper and lower bounds of a (1 − α) × 100% confidence interval for βj are given by
β̂j ± t_{c,T−K}(α/2) × se(β̂j):

[β̂j − t_{c,T−K}(α/2) × se(β̂j) (lower bound), β̂j + t_{c,T−K}(α/2) × se(β̂j) (upper bound)]

- END -
