Homework 3

Problem 1
(i) The coefficient on x1 from the multiple regression ($\hat{\beta}_1$) and from the simple regression ($\tilde{\beta}_1$) would be very different, because x1 is highly correlated with x2 and x3 (the association between them is strong) and these variables have large partial effects on y.
$\hat{\beta}_1$ and $\tilde{\beta}_1$ cannot be expected to be similar when x1 is highly correlated with x2 and x3.
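A worked identity makes (i) and (ii) precise. Regressing each term of the fitted multiple regression on x1 alone gives the standard simple-versus-multiple relationship below, where $\tilde{\delta}_2$ and $\tilde{\delta}_3$ (notation introduced here) are the slopes from regressing x2 and x3, respectively, on x1:

$$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_2 + \hat{\beta}_3 \tilde{\delta}_3$$

Strong correlation of x1 with x2 and x3 makes $\tilde{\delta}_2$ and $\tilde{\delta}_3$ large, so with large partial effects $\hat{\beta}_2$ and $\hat{\beta}_3$ the two estimates diverge; when those correlations are near zero, $\tilde{\delta}_2 \approx \tilde{\delta}_3 \approx 0$ and $\tilde{\beta}_1 \approx \hat{\beta}_1$.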
(ii) x1 is not correlated with either x2 or x3; in other words, their association is weak. This implies that the coefficient on x1 from the multiple regression ($\hat{\beta}_1$) and from the simple regression ($\tilde{\beta}_1$) would not be very different.
It can be said that $\hat{\beta}_1$ and $\tilde{\beta}_1$ would be roughly the same, since the size of the correlation between x2 and x3 has no direct impact on the estimate of the coefficient on x1 in the multiple regression.
(iii) Because there is a high correlation (a strong association) of x2 and x3 with x1, the standard error of the coefficient on x1 is inflated, which makes se($\hat{\beta}_1$) larger than se($\tilde{\beta}_1$). Multicollinearity has been introduced into the model: the degree of correlation of x1 with the other regressors directly influences the standard error (see the variance formula after part (iv)).
(iv) Because there is very low correlation (a weak association) of x2 and x3 with x1, the standard error of the coefficient on x1 stays small, which makes se($\hat{\beta}_1$) lower than se($\tilde{\beta}_1$). The size of the correlation between x2 and x3 does not directly influence the standard error of the coefficient on x1 in the multiple regression.
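The standard OLS variance formula (under homoskedasticity) behind parts (iii) and (iv), where $SST_1$ is the total sample variation in x1 and $R_1^2$ is the R-squared from regressing x1 on x2 and x3:

$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_1 (1 - R_1^2)}$$

High correlation of x1 with the other regressors pushes $R_1^2$ toward one and inflates the variance; the correlation between x2 and x3 themselves affects $\mathrm{Var}(\hat{\beta}_1)$ only indirectly, through $R_1^2$.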
Problem 2 (6 points)
Be sure all of the following regressions contain an intercept.
(For items (i)–(iii), refer to the figures below.)
(i) Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde{\delta}_1$.
(ii) Run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde{\beta}_1$.
(iii) Run the multiple regression of log(wage) on educ and IQ, and obtain the slope coefficients, $\hat{\beta}_1$ and $\hat{\beta}_2$.
(iv) Verify that $\tilde{\beta}_1 = \hat{\beta}_1 + \tilde{\delta}_1 \hat{\beta}_2$.
(v) What can you say about the bias? What is the direction of the bias?
(vi) Which assumption of MLR was violated?
(vii) Imagine the hypothesis states that $\beta_1 - \beta_2 = 0$. How can you test it? Which test would you use (t or F test)?
Imagine a researcher wanted to test a model with the square of educ as an additional independent variable. However, by mistake the researcher included (2*educ) in the model instead of (educ^2).
(viii) Will we have bias in the estimates? Have any of the assumptions been violated?
i) IQ = 53.68715 + 3.533829*educ
$\tilde{\delta}_1 = 3.533829$
iv) $\tilde{\beta}_1 = \hat{\beta}_1 + \tilde{\delta}_1 \hat{\beta}_2$
0.059839 = 0.039120 + 0.005863*3.533829
v) It will be biased.
$\hat{\beta}_2 > 0$ and $\tilde{\delta}_1 > 0$, therefore $\hat{\beta}_2 \tilde{\delta}_1 > 0$. In this case, $\tilde{\beta}_1 > \hat{\beta}_1$, so $\tilde{\beta}_1$ is overestimated; the bias is positive (upward).
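A minimal sketch of steps (i)–(iv) in Python with statsmodels, assuming a CSV file (the name wage2.csv and its availability are assumptions) containing the wage, educ, and IQ columns used above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file; any dataset with wage, educ, and IQ columns works.
df = pd.read_csv("wage2.csv")
df["lwage"] = np.log(df["wage"])

delta1 = smf.ols("IQ ~ educ", data=df).fit().params["educ"]      # (i)  delta-tilde_1
beta1_t = smf.ols("lwage ~ educ", data=df).fit().params["educ"]  # (ii) beta-tilde_1
multi = smf.ols("lwage ~ educ + IQ", data=df).fit()              # (iii)
beta1_h, beta2_h = multi.params["educ"], multi.params["IQ"]

# (iv) the identity beta-tilde_1 = beta-hat_1 + delta-tilde_1 * beta-hat_2
# holds exactly, up to floating-point rounding.
print(beta1_t, beta1_h + delta1 * beta2_h)
```

The two printed numbers should coincide, reproducing the 0.059839 = 0.039120 + 0.005863*3.533829 check above.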
Problem 3 (4 points)
The data set includes information on wages, education, parents’
education, and several other variables for 1,230 working men in 1991 (to
answer the questions please refer to the figures below).
(i) Assume the following regression model was estimated by OLS:
Problem 4 (6 points)
Hypothesis testing
How will the following problems impact your results when implementing hypothesis testing:
(i) Heteroskedasticity.
(ii) Higher correlation between two independent variables that are in the
model.
(iii) Omitting an important explanatory variable.
(iv) Can we solve the above-stated issues?
1) Heteroskedasticity: This refers to the circumstance in which the variance of the error term (the residuals) is not constant across all levels of the independent variables. Under heteroskedasticity, the usual OLS standard errors are invalid, which makes hypothesis testing unreliable. This violates one of the key assumptions of the linear regression model and can lead to inefficient parameter estimates and incorrect inference. Solutions to heteroskedasticity include (see the sketch after this list):
Transforming the dependent variable (e.g., log transformation).
Redefining the dependent variable.
Using weighted regression.
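A minimal sketch of two remedies in Python with statsmodels: weighted least squares from the list above, plus heteroskedasticity-robust (White/HC1) standard errors, a standard fix not on the list. The file and column names, and the assumed variance structure, are illustrative only:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wage2.csv")  # hypothetical dataset

# Robust (HC1) standard errors: the OLS coefficients are unchanged, but
# the reported standard errors, t statistics, and p-values remain valid
# under heteroskedasticity.
robust = smf.ols("wage ~ educ", data=df).fit(cov_type="HC1")
print(robust.summary())

# Weighted least squares: downweight high-variance observations, here
# assuming Var(u|educ) is proportional to educ.
wls = smf.wls("wage ~ educ", data=df, weights=1.0 / df["educ"]).fit()
print(wls.summary())
```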
2) High correlation between independent variables: This is known as multicollinearity. It occurs when two or more independent variables in a regression model are highly correlated, which can produce unstable parameter estimates and make it difficult to assess the individual effect of each independent variable. Hypothesis testing becomes challenging because the data cannot cleanly separate the effects of the correlated independent variables on the dependent variable. Solutions to multicollinearity include (a diagnostic sketch follows this list):
Removing some of the highly correlated independent variables.
Combining the independent variables.
Using techniques like Ridge Regression or Lasso that can handle
multicollinearity.
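A minimal diagnostic sketch using the variance inflation factor (VIF) from statsmodels; the dataset and regressor names are illustrative, and a common rule of thumb treats VIF values above 10 as problematic:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("wage2.csv")                     # hypothetical dataset
X = sm.add_constant(df[["educ", "IQ", "exper"]])  # hypothetical regressors

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing x_j on the
# other regressors; large values signal multicollinearity.
for j, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, j))
```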
3) Omitting an important explanatory variable: This can lead to omitted variable bias and a higher error variance. The bias arises when a relevant explanatory variable is left out of a regression model, which can cause the coefficients on one or more of the included explanatory variables to be biased. This can lead to incorrect conclusions about the relationship between the independent and dependent variables and makes that relationship difficult to evaluate. The best solution to this issue is to include all relevant variables in the model; however, this might not always be possible due to a lack of data or other constraints.
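The textbook expression for this bias in the two-regressor case: if the true model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ but $x_2$ is omitted, the simple-regression estimator satisfies

$$E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1,$$

where $\tilde{\delta}_1$ is the slope from regressing $x_2$ on $x_1$. The bias term $\beta_2 \tilde{\delta}_1$ is positive when $\beta_2$ and $\tilde{\delta}_1$ have the same sign, which is exactly the upward-bias argument used in Problem 2 (v).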
Problem 5 (6 points)
The variable rdintens is expenditures on research and development (R&D) as a
percentage of sales. Sales are measured in millions of dollars. The variable
profmarg is profits as a percentage of sales. Using the data in RDCHEM.RAW
for 32 firms in the chemical industry, the following equation is estimated:
(i) A 10% increase in sales results in roughly a 0.03 percentage-point increase in rdintens, expenditures on research and development (R&D) as a percentage of sales. According to this result, the effect is not economically large.
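The level-log approximation behind this calculation, assuming sales enters the equation in logs (as the percentage interpretation implies), with $\hat{\beta}_1$ the coefficient on log(sales):

$$\Delta \widehat{rdintens} \approx \frac{\hat{\beta}_1}{100} \cdot (\%\Delta sales),$$

so a 10% increase in sales changes rdintens by about $\hat{\beta}_1 / 10$ percentage points.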
(ii) Since this is a one-tailed test, the hypotheses are stated as follows, in terms of the population parameters:
H0: $\beta_1 - \beta_2 = 0$
H1: $\beta_1 - \beta_2 > 0$
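A standard way to carry out this test (and the one in Problem 2 (vii)) is a t test on the difference of the two estimates:

$$t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)}, \qquad se(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{\mathrm{Var}(\hat{\beta}_1) + \mathrm{Var}(\hat{\beta}_2) - 2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)}.$$

For the one-tailed alternative, t is compared with the upper-tail critical value of the t distribution with n − k − 1 degrees of freedom; since this is a single linear restriction, an F test with F = t² is equivalent.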