EF3450 2021B MID
Midterm Examination
Instructions:
Mark down at the top right hand corner of your answer sheets whether you are answering Paper
A or B (mark down “A” or “B”).
5. Which of the following violates the assumption of homoskedasticity?
a. The mean of the random error term is non-zero.
b. The variance of the dependent variable is not constant.
c. The regression function is non-linear.
d. None of the above.
6. Let SSRU be the sum of squared residuals from estimating the unrestricted model
Yt =β1 +β2X2t +β3X3t +β4X4t + et
and SSRR be the sum of squared residuals from estimating the restricted model
Yt =β1 +β2X2t + et
a. SSRU>SSRR is impossible.
b. SSRU<SSRR is impossible.
c. SSRU ≥SSRR is impossible.
d. SSRU may be larger or smaller than SSRR, depending on whether the restrictions are true
or false.
10. By applying the method of least squares, which of the following will be minimized?
a. The size of the random error term.
b. The sum of residuals.
c. The sum of squares of the dependent variable.
d. None of the above.
11. Which of the following is not a linear regression model?
a. log(Yt ) = β0 + β1 log(Xt ) + et
b. Yt = β0 + β1 Xt² + et
c. Yt = β0 + β1 X +et
1
t
d. Yt = β0 + β1 Xt^β2 + et
Answer the next TWO questions based on the following hedonic price regression for houses:
ln(price) = 1.1 + 0.5ln(size) + 0.1good + e,
where good = 1 if the house is located in a good neighborhood, 0 otherwise.
14. The hedonic regression predicts that if we compare two houses of the same size but of
different locations,
a. the one in the good neighborhood will cost about 0.1% more.
b. the one in the good neighborhood will cost about $0.1 million more.
c. the one in the good neighborhood will cost about 10% more.
d. None of the above.
15. Comparing two houses of the same location, if one is twice as large as the other, the
hedonic regression predicts that
a. the smaller house should be $0.5 million less expensive.
b. the smaller house should be 50% less expensive.
c. the smaller house should be 100% less expensive.
d. None of the above.
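The coefficients in a log-linear hedonic regression translate into percentage differences in price. The following is a minimal sketch of that arithmetic, using only the coefficients 0.5 and 0.1 from the regression above; the variable names are illustrative and not part of the exam.

```python
import math

# Coefficients taken from the hedonic regression in the question
coef_ln_size = 0.5
coef_good = 0.1

# Good vs. other neighborhood, same size:
# ln(price) differs by 0.1, so the price ratio is exp(0.1)
pct_more_good = (math.exp(coef_good) - 1) * 100

# One house twice as large as the other, same location:
# ln(price) differs by 0.5 * ln(2), so the price ratio is 2 ** 0.5
price_ratio_large_to_small = 2 ** coef_ln_size
pct_less_small = (1 - 1 / price_ratio_large_to_small) * 100

print(f"Good neighborhood premium: {pct_more_good:.2f}%")
print(f"Smaller house is cheaper by: {pct_less_small:.2f}%")
```

The dummy coefficient of 0.1 corresponds to a premium of roughly 10%, while halving the size lowers the predicted price by clearly less than 50%.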
Section 2: Short questions (15 points)
Write down your answers in the answer book. You can use formulae, graphs or examples
to illustrate if you want.
1. (5 points) Explain the motivation for using the adjusted $R^2$ instead of $R^2$ when
comparing the goodness of fit of various multiple regression models.
2. (5 points) If the covariance between two random variables is 0, does this indicate
that there is no relationship between the two random variables?
[Hint: Use an example to justify your answer].
Answers:
1. (2 points)
$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$, where ESS is the explained sum of squares, RSS is the
residual sum of squares and TSS is the total sum of squares.
$R^2$ never decreases, and generally increases, when we add another explanatory variable
to our regression, even if the added variable is a ‘garbage’ variable. It is therefore not a
valid indicator for comparing the goodness of fit of regression models with different
numbers of independent variables.
(2 points)
The adjusted $R^2 = 1 - \frac{RSS/(T-K)}{TSS/(T-1)}$, where T is the number of observations and K is the
number of parameters in the regression model. More explanatory variables increase K and
reduce the adjusted $R^2$, ceteris paribus. It ‘penalizes’ us for including another
explanatory variable.
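The penalty can be illustrated numerically. The sketch below uses hypothetical values of RSS, TSS, T and K (none of them come from the exam) for a model to which one nearly useless regressor is added, so that RSS falls only slightly.

```python
def r_squared(rss, tss):
    """R^2 = 1 - RSS/TSS."""
    return 1.0 - rss / tss


def adjusted_r_squared(rss, tss, T, K):
    """Adjusted R^2 = 1 - [RSS/(T-K)] / [TSS/(T-1)]."""
    return 1.0 - (rss / (T - K)) / (tss / (T - 1))


# Hypothetical figures, chosen for illustration only
T = 50        # number of observations
tss = 100.0   # total sum of squares (same for both models)

# Small model: K = 3 parameters; large model adds one 'garbage' regressor
r2_small = r_squared(40.0, tss)
r2_large = r_squared(39.9, tss)
adj_small = adjusted_r_squared(40.0, tss, T, K=3)
adj_large = adjusted_r_squared(39.9, tss, T, K=4)

print(f"R^2:          {r2_small:.4f} -> {r2_large:.4f}")
print(f"adjusted R^2: {adj_small:.4f} -> {adj_large:.4f}")
```

With these numbers the plain $R^2$ rises while the adjusted $R^2$ falls, because the small drop in RSS does not compensate for the lost degree of freedom.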
2. (3 points) If the covariance between two random variables is 0, it does not
mean that there is no relationship between the two random variables. Zero
covariance only means that there is no linear relationship between the two variables;
there can still be a nonlinear relationship between them.
(2 points) For example, if (X, Y) is uniformly distributed on the unit circle
$X^2 + Y^2 = 1$, X and Y will have zero covariance but they are not unrelated.
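A quick numerical check of this example is sketched below. It assumes points equally spaced around the unit circle as a stand-in for a uniform distribution on it; the helper `sample_cov` is illustrative and simply applies the sample covariance formula from the formula sheet.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Assumption: 360 equally spaced points on the unit circle
n_points = 360
angles = [2 * math.pi * i / n_points for i in range(n_points)]
xs = [math.cos(a) for a in angles]
ys = [math.sin(a) for a in angles]

cov_xy = sample_cov(xs, ys)

# Every point satisfies X^2 + Y^2 = 1, so Y is tightly tied to X
max_deviation = max(abs(x * x + y * y - 1.0) for x, y in zip(xs, ys))

print(f"sample covariance: {cov_xy:.2e}")
print(f"max |x^2 + y^2 - 1|: {max_deviation:.2e}")
```

The covariance is zero up to floating-point error even though knowing X pins Y down to at most two values.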
3. (1 point) No.
(2 points) The direction of the bias (positive or negative) depends on the sign of $\beta_3 b_{32}$.
If $\beta_3$ and $b_{32}$ have the same sign, the bias term will be positive (an upward bias). If they
have opposite signs, the bias term will be negative (a downward bias).
Section 3: Long questions (19 points)
Write all your answers on your answer sheets. Make sure that you write out all your
steps clearly.
(a) (3 points) Based on the estimation result, interpret the coefficient of “exper” in the
regression.
(b) (1 point) Given the degrees of freedom provided in the regression output, what is the
number of observations in the sample?
(c) (3 points) Construct a 95% confidence interval for the coefficient of “exper”.
(d) (3 points) Test the hypothesis that every year of increase in work experience will
increase the wage by more than 0.1%. Use a 5% significance level. (Hint: 0.1% is
equivalent to a proportion of 0.001.)
(e) (3 points) Use the p-value method to test the hypothesis that an increase in “exper” will
increase the wage. (State the null and alternative hypotheses and the p-value used for
the hypothesis test. Use the 5% significance level.)
(f) (3 points) Interpret the value of “Multiple R-squared” in the regression output.
(g) (3 points) What hypothesis can you test by using the F statistic reported at the bottom
of the regression output? Perform the test and state your decision. (State the null and
alternative hypotheses and your conclusion.)
Solution:
(a) Since 𝛽̂3 = 0.004121, wage is predicted to increase by a proportion of 0.004121 (or about
0.4121%) for every year of increase in experience [2 points], other variables held constant
[1 point].
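(c) The 95% confidence interval for $\beta_3$ is
$[\hat\beta_3 - t_{c,T-K=552}(\alpha/2) \times se(\hat\beta_3),\ \hat\beta_3 + t_{c,T-K=552}(\alpha/2) \times se(\hat\beta_3)]$
= [0.004121 − 1.96 × 0.001723, 0.004121 + 1.96 × 0.001723]
= [0.000744, 0.007498]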
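The interval arithmetic can be reproduced in a few lines. This is a minimal sketch using the estimate, standard error and critical value quoted in the solution; the variable names are illustrative.

```python
# Values quoted in the solution
beta_hat = 0.004121   # estimated coefficient of exper
se_beta = 0.001723    # its standard error
t_crit = 1.96         # two-sided 5% critical value used in the solution

margin = t_crit * se_beta
lower = beta_hat - margin
upper = beta_hat + margin

print(f"95% CI: [{lower:.6f}, {upper:.6f}]")
```

Since the interval does not contain 0, it is consistent with rejecting a zero coefficient in a two-sided test at the 5% level.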
(d)
$H_0: \beta_3 \le 0.001$
$H_1: \beta_3 > 0.001$
Test statistic $= \frac{\hat\beta_3 - 0.001}{se(\hat\beta_3)} = \frac{0.004121 - 0.001}{0.001723} = 1.8114$
Since the test statistic exceeds the one-sided critical value at the 5% significance level, which
is 1.645, we reject the null hypothesis at the 5% significance level and conclude that every
year of increase in work experience will increase the wage by more than 0.1%, other things
held constant.
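The test statistic and decision for part (d) can be checked with the sketch below. It uses the estimate and standard error quoted in the solution, the hypothesized value 0.001 from the hint, and the one-sided critical value 1.645; the variable names are illustrative.

```python
beta_hat = 0.004121   # estimated coefficient of exper
se_beta = 0.001723    # its standard error
c = 0.001             # hypothesized value (0.1% as a proportion)
crit_one_sided = 1.645  # one-sided 5% critical value used in the solution

# t statistic from the formula sheet: (estimate - c) / se
t_stat = (beta_hat - c) / se_beta
reject_h0 = t_stat > crit_one_sided

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```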
(e)
𝐻0 : 𝛽3 ≤ 0
𝐻1 : 𝛽3 > 0
Since p-value≈ 0.01714 which is smaller than the significance level of 0.05, we
reject the null hypothesis and conclude that an increase in work experience will
increase wage, other variables held constant.
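The p-value in the solution is taken from the regression output. As a rough cross-check, the sketch below recomputes tail probabilities from the quoted estimate and standard error, assuming the standard normal distribution is an adequate approximation to the t distribution with 552 degrees of freedom (the exact t-based values differ slightly).

```python
import math

beta_hat = 0.004121   # estimated coefficient of exper
se_beta = 0.001723    # its standard error

# t statistic for H0: beta_3 <= 0
t_stat = beta_hat / se_beta

# Upper-tail probability under the standard normal approximation
p_one_sided = 0.5 * math.erfc(t_stat / math.sqrt(2))
p_two_sided = 2 * p_one_sided

reject_h0 = p_one_sided < 0.05

print(f"t = {t_stat:.4f}")
print(f"one-sided p ~ {p_one_sided:.4f}, two-sided p ~ {p_two_sided:.4f}")
```

Under this approximation the two-sided value is roughly 0.017 and the one-sided value is about half of that. Both are below 0.05, so the decision to reject the null hypothesis is the same either way.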
(f) Multiple R-Squared = 0.316 means that about 31.6% of the variation in the dependent
variable is explained by the variations of the explanatory variables included in the model.
(g) The F statistic is used to test the overall significance of the model, i.e. whether the
slope coefficients are jointly significant. The null and alternative hypotheses are as follows:
$H_0: \beta_1 = 0$ and $\beta_2 = 0$ and $\beta_3 = 0$
$H_1$: At least one of the slope coefficients is different from 0
Since the p-value is close to 0, we reject the null hypothesis and conclude that at least
one of the slope coefficients is nonzero.
- END –
Formula sheet
Expected value
$$E[X] = x_1 f(x_1) + x_2 f(x_2) + \dots + x_k f(x_k) = \sum_{i=1}^{k} x_i f(x_i)$$
Sample mean
$$\text{sample mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Variance
$$\sigma_X^2 \text{ or } V(X) = E[X - E(X)]^2 = \sum_{i=1}^{n} f(x_i)\,[x_i - E(X)]^2$$
Sample variance
$$\text{sample variance of a random variable } X = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
Covariance
$$\text{cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = \sum_{i=1}^{n} f(x_i, y_i)\,(x_i - E(X))(y_i - E(Y))$$
Correlation
$$\text{corr}(X, Y) = \rho_{XY} = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{\text{cov}(X, Y)}{\sigma_X \sigma_Y}$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y respectively.
Sample covariance
$$\text{sample cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
Sample correlation
$$\text{sample corr}(X, Y) = \frac{\text{sample cov}(X, Y)}{\sqrt{\text{sample } V(X)\ \text{sample } V(Y)}} = \frac{\text{sample cov}(X, Y)}{\sigma_X \sigma_Y}$$
where $\sigma_X$ and $\sigma_Y$ are the (sample) standard deviations of X and Y respectively.
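The sample covariance and sample correlation formulas can be applied directly. The sketch below uses a small hypothetical data set (not from the exam) and illustrative helper names.

```python
import math


def sample_cov(x, y):
    """Sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample covariance divided by the product of sample standard deviations."""
    # The sample variance of X is the sample covariance of X with itself
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_cov(x, y)
corr_xy = sample_corr(x, y)

print(f"sample cov = {cov_xy:.4f}, sample corr = {corr_xy:.4f}")
```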
Ordinary least squares estimators:
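$$\hat\beta_2 = \frac{T\sum_{t=1}^{T} x_t y_t - \sum_{t=1}^{T} x_t \sum_{t=1}^{T} y_t}{T\sum_{t=1}^{T} x_t^2 - \left(\sum_{t=1}^{T} x_t\right)^2} = \frac{\sum_{t=1}^{T} x_t y_t - T\bar{x}\bar{y}}{\sum_{t=1}^{T} x_t^2 - T\bar{x}^2} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$
$$\hat\beta_1 = \bar{y} - \hat\beta_2 \bar{x}$$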
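The equivalent expressions for the slope estimator can be checked on a small example. The sketch below uses hypothetical data (not from the exam) and computes the slope in two of the equivalent forms, followed by the intercept.

```python
# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T = len(x)

x_bar = sum(x) / T
y_bar = sum(y) / T

# Form 1: raw sums
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
slope_raw = (T * sum_xy - sum(x) * sum(y)) / (T * sum_x2 - sum(x) ** 2)

# Form 2: deviations from the means
slope_dev = (
    sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    / sum((a - x_bar) ** 2 for a in x)
)

# Intercept: the fitted line passes through the point of means
intercept = y_bar - slope_dev * x_bar

print(f"slope (raw sums) = {slope_raw:.4f}")
print(f"slope (deviations) = {slope_dev:.4f}")
print(f"intercept = {intercept:.4f}")
```

Both forms give the same slope, which is what the chain of equalities in the formula states.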
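The variances and covariance of $\hat\beta_1$ and $\hat\beta_2$ are given by:
$$\text{Var}(\hat\beta_1) = E[\hat\beta_1 - E(\hat\beta_1)]^2 = \sigma^2 \left[ \frac{\sum_{t=1}^{T} x_t^2}{T \sum_{t=1}^{T} (x_t - \bar{x})^2} \right]$$
$$\text{Var}(\hat\beta_2) = E[\hat\beta_2 - E(\hat\beta_2)]^2 = \frac{\sigma^2}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$
$$\text{cov}(\hat\beta_1, \hat\beta_2) = E[(\hat\beta_1 - E(\hat\beta_1))(\hat\beta_2 - E(\hat\beta_2))] = \sigma^2 \left[ \frac{-\bar{x}}{\sum_{t=1}^{T} (x_t - \bar{x})^2} \right]$$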
t test statistic:
$$\text{test statistic} = \frac{\hat\beta_j - c}{se(\hat\beta_j)}$$
$$\text{Adjusted } R^2 = 1 - \frac{RSS/(T-K)}{TSS/(T-1)} = 1 - (1 - R^2)\,\frac{T-1}{T-K}$$
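The lower and upper bounds of a $(1-\alpha) \times 100\%$ confidence interval for $\beta_j$ are given by
$\hat\beta_j \pm t_{c,T-K}(\alpha/2) \times se(\hat\beta_j)$:
$$\left[\ \underbrace{\hat\beta_j - t_{c,T-K}(\alpha/2) \times se(\hat\beta_j)}_{\text{lower bound}},\ \underbrace{\hat\beta_j + t_{c,T-K}(\alpha/2) \times se(\hat\beta_j)}_{\text{upper bound}}\ \right]$$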
- END -