Econometrics

Homework 3 - Anar Canqarazade


Problem 1 (4 points)
Suppose that you are interested in estimating the ceteris paribus relationship between y and x1. For this purpose, you can collect data on two control variables, x2 and x3. (For concreteness, you might think of y as final exam score, x1 as class attendance, x2 as GPA up through the previous semester, and x3 as SAT or ACT score.) Let β̃1 be the simple regression estimate from y on x1 and let β̂1 be the multiple regression estimate from y on x1, x2, x3.
(i) If x1 is highly correlated with x2 and x3 in the sample, and x2 and x3 have large partial effects on y, would you expect β̃1 and β̂1 to be similar or very different? Explain.
(ii) If x1 is almost uncorrelated with x2 and x3, but x2 and x3 are highly correlated, will β̃1 and β̂1 tend to be similar or very different? Explain.
(iii) If x1 is highly correlated with x2 and x3, and x2 and x3 have small partial effects on y, would you expect se(β̃1) or se(β̂1) to be smaller? Explain.
(iv) If x1 is almost uncorrelated with x2 and x3, x2 and x3 have large partial effects on y, and x2 and x3 are highly correlated, would you expect se(β̃1) or se(β̂1) to be smaller? Explain.

(i) The simple regression estimate β̃1 and the multiple regression estimate β̂1 would likely be very different. Because x1 is highly correlated with x2 and x3, and these variables have large partial effects on y, the simple regression picks up the effects of the omitted x2 and x3 through their correlation with x1, so β̃1 suffers from large omitted variable bias while β̂1 does not.

(ii) β̃1 and β̂1 would tend to be similar. Since x1 is almost uncorrelated with x2 and x3, omitting them from the simple regression causes little bias. The high correlation between x2 and x3 does not matter here, because the size of the correlation between x2 and x3 has no impact on the estimate of the coefficient on x1 in the multiple regression.

(iii) We would expect se(β̃1) to be smaller. The high correlation of x1 with x2 and x3 introduces multicollinearity into the multiple regression, which inflates the standard error of the coefficient on x1, while the small partial effects of x2 and x3 mean that leaving them out increases the error variance only a little. Hence se(β̂1) > se(β̃1).

(iv) We would expect se(β̂1) to be smaller. Since x2 and x3 have large partial effects on y, including them greatly reduces the error variance, and because x1 is almost uncorrelated with x2 and x3, multicollinearity does not inflate se(β̂1). The correlation between x2 and x3 themselves does not influence the standard error of the coefficient on x1 in the multiple regression.
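
A compact way to see parts (iii) and (iv) is the standard variance formula for the multiple regression slope estimator under homoskedasticity (stated here as a reminder, in the usual textbook notation):

Var(β̂1) = σ² / [ SST1 · (1 − R1²) ]

where SST1 is the total sample variation in x1 and R1² is the R-squared from regressing x1 on x2 and x3. High correlation between x1 and the controls raises R1² and inflates Var(β̂1), which is part (iii); large partial effects of x2 and x3 shrink the error variance σ² when they are included, which is part (iv).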

Problem 2 (6 points)
Be sure all of the following regressions contain an intercept.
(For items (i)–(iii), refer to the figures below.)
(i) Run a simple regression of IQ on educ to obtain the slope coefficient, say, δ̃1.
(ii) Run the simple regression of log(wage) on educ, and obtain the slope coefficient, β̃1.
(iii) Run the multiple regression of log(wage) on educ and IQ, and obtain the slope coefficients, β̂1 and β̂2.
(iv) Verify that β̃1 = β̂1 + δ̃1 · β̂2.
(v) What can you say about bias? What is the direction of the bias?
(vi) Which assumption of MLR was violated?
(vii) Imagine a hypothesis states that β̂1 − β̂2 = 0. How can you test it? Which test will you use (t or F test)?
Imagine a researcher wanted to test a model with the square of educ as an additional independent variable. However, by mistake the researcher included (2·educ) instead of (educ²) in the model.
(viii) Will we have bias in the estimates? Has any of the assumptions been violated?
i) IQ = 53.68715 + 3.533829·educ
δ̃1 = 3.533829

ii) log(wage) = 5.973063 + 0.059839·educ
β̃1 = 0.059839

iii) log(wage) = 5.658288 + 0.039120·educ + 0.005863·IQ
β̂1 = 0.039120
β̂2 = 0.005863

iv) β̃1 = β̂1 + δ̃1 · β̂2
0.059839 = 0.039120 + 3.533829 × 0.005863 = 0.039120 + 0.020719 = 0.059839 ✓
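
The three regressions and the identity in (iv) can be reproduced in a few lines; a minimal sketch in Python with statsmodels, assuming the wage data is available as a CSV file named wage2.csv with columns wage, educ, and IQ (the file name and column names are assumptions):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Assumed file/column names; any data set with wage, educ, IQ works the same way.
    df = pd.read_csv("wage2.csv")
    df["lwage"] = np.log(df["wage"])

    # (i) simple regression of IQ on educ: slope delta_tilde_1
    delta1 = sm.OLS(df["IQ"], sm.add_constant(df["educ"])).fit().params["educ"]

    # (ii) simple regression of log(wage) on educ: slope beta_tilde_1
    bt1 = sm.OLS(df["lwage"], sm.add_constant(df["educ"])).fit().params["educ"]

    # (iii) multiple regression of log(wage) on educ and IQ
    multi = sm.OLS(df["lwage"], sm.add_constant(df[["educ", "IQ"]])).fit()
    bh1, bh2 = multi.params["educ"], multi.params["IQ"]

    # (iv) omitted-variable identity: beta_tilde_1 = beta_hat_1 + delta_tilde_1 * beta_hat_2
    print(bt1, bh1 + delta1 * bh2)  # the two printed numbers should agree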

v) β̃1 is biased. Since β2 > 0 (IQ has a positive partial effect on log(wage)) and δ̃1 > 0 (educ and IQ are positively correlated), the bias term δ̃1·β2 > 0. Hence β̃1 > β̂1: the simple regression coefficient is overestimated (positive bias).

vi) Violated assumption: MLR.4 (Zero Conditional Mean). Omitting IQ, which is correlated with educ, pushes it into the error term, so E(u | educ) ≠ 0 in the simple regression. Unbiasedness requires E(β̂1) = β1, which fails here.
vii) The hypotheses are:

H0: β1 − β2 = 0
H1: β1 − β2 > 0

Since this is a single linear restriction on the coefficients, we will use a t-test.
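
To carry out the test, the statistic is (a standard construction, not specific to this data set):

t = (β̂1 − β̂2) / se(β̂1 − β̂2),  where se(β̂1 − β̂2) = sqrt[ Var(β̂1) + Var(β̂2) − 2·Cov(β̂1, β̂2) ].

Equivalently, define θ = β1 − β2 and substitute β1 = θ + β2 into the model, which gives log(wage) = β0 + θ·educ + β2·(educ + IQ) + u; the usual t statistic on educ in a regression of log(wage) on educ and (educ + IQ) then tests H0: θ = 0. In statsmodels, the same test is available directly from the fitted multiple regression above as multi.t_test("educ - IQ = 0").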

viii) The problem is not bias but that the model cannot be estimated as specified: including 2·educ instead of educ² violates the no perfect collinearity assumption (MLR.3), because the new regressor is an exact linear function of educ (x2 = 2·x1). OLS cannot separate the two coefficients, and software will drop one of the variables.

Problem 3 (4 points)
The data set includes information on wages, education, parents’
education, and several other variables for 1,230 working men in 1991 (to
answer the questions please refer to the figures below).
(i) Assume the following regression model was estimated by OLS:

educ = β0 + β1·motheduc + β2·fatheduc + u

How much sample variation in educ is explained by parents' education? Interpret the coefficient on motheduc.
(ii) What is the VIF? Could you check the correlation between them? (Hint: for models without an intercept refer to the uncentered VIF; for models with an intercept, the centered VIF.)
(iii) Add the variable abil (a measure of cognitive ability) to the
regression from part (i), and report the results in equation form.
Does “ability” help to explain variations in education, even after
controlling for parents’ education? Explain.
(i) educ = 6.964355 + 0.304197·motheduc + 0.190286·fatheduc

The coefficient on motheduc is 0.304197: holding fatheduc fixed, one more year of mother's education is associated with about 0.30 more years of education for the man, on average.
R² = 0.249251, meaning that about 24.93% of the sample variation in educ is explained by parents' education.

(ii) A variance inflation factor (VIF) measures the amount of multicollinearity in a regression. The centered VIF here is 1.559728. Since VIF_j = 1 / (1 − R²_j), this corresponds to R²_j = 1 − 1/1.559728 ≈ 0.3589, well below 1: motheduc and fatheduc are correlated, but not perfectly so.
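
As a cross-check, the centered VIF can be computed directly; a minimal sketch with statsmodels, assuming the data is available as card.csv with columns motheduc and fatheduc (the file and column names are assumptions):

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    df = pd.read_csv("card.csv")  # assumed file name
    X = sm.add_constant(df[["motheduc", "fatheduc"]])

    # Column 0 is the constant; columns 1 and 2 are the two regressors.
    # With the constant included, this is the centered VIF.
    print(variance_inflation_factor(X.values, 1))  # motheduc, ≈ 1.5597
    print(variance_inflation_factor(X.values, 2))  # fatheduc, ≈ 1.5597 (equal with two regressors)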

(iii) educ = 8.448690 + 0.189131·motheduc + 0.111085·fatheduc + 0.502483·abil

R² = 0.427508: 42.75% of the variation in educ is now explained by mother's education, father's education, and cognitive ability (abil).
R² has increased substantially and the coefficients on motheduc and fatheduc have changed, so abil is a relevant variable in this model: ability helps explain education even after controlling for parents' education.

Problem 4 (6 points)
Hypothesis testing
How will the following problems affect your results when implementing hypothesis testing:
(i) Heteroskedasticity.
(ii) High correlation between two independent variables that are in the model.
(iii) Omitting an important explanatory variable.
(iv) Can we solve the above-stated issues?
1) Heteroskedasticity: This refers to the circumstance in which the variance of the error term, or residuals, is not constant across all levels of the independent variables. Under heteroskedasticity the usual OLS standard errors are invalid, so t and F statistics and their p-values can no longer be trusted; OLS remains unbiased but is inefficient. This violates one of the key assumptions of the linear regression model (homoskedasticity). Solutions to heteroskedasticity include:
 Transforming the dependent variable (e.g., a log transformation).
 Redefining the dependent variable.
 Using weighted least squares.
 Keeping OLS but reporting heteroskedasticity-robust (White/Huber) standard errors, as in the sketch below.
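
A minimal self-contained sketch of the robust-standard-error option, using simulated data (all names and numbers here are illustrative, not from the homework data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(1, 10, 200)
    # Error spread grows with x: heteroskedasticity by construction.
    y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)
    X = sm.add_constant(x)

    usual = sm.OLS(y, X).fit()                  # conventional standard errors
    robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs
    print(usual.bse)   # misleading under heteroskedasticity
    print(robust.bse)  # valid basis for t tests here

The point estimates are identical in both fits; only the standard errors (and hence the hypothesis tests) change.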
2) High correlation between independent variables: This is known as multicollinearity. It occurs when two or more independent variables in a regression model are highly correlated. This leads to unstable parameter estimates and makes it difficult to assess the individual effect of each independent variable. For hypothesis testing, the inflated standard errors make t statistics small, so individually relevant variables can appear insignificant. Solutions to multicollinearity include:
 Removing some of the highly correlated independent variables.
 Combining the correlated independent variables into a single index.
 Using techniques like Ridge Regression or Lasso that can handle multicollinearity (see the sketch below).
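
A brief illustration of the ridge option on nearly collinear simulated data (purely illustrative; scikit-learn's Ridge is one common implementation):

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.05, size=100)  # nearly a copy of x1
    X = np.column_stack([x1, x2])
    y = 2.0 * x1 + 3.0 * x2 + rng.normal(size=100)

    # The L2 penalty shrinks and stabilizes the coefficients that OLS
    # would estimate very imprecisely under this near-collinearity.
    ridge = Ridge(alpha=1.0).fit(X, y)
    print(ridge.coef_)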
3) Omitting an important explanatory variable: This leads to omitted variable bias and a higher error variance. The bias arises when a relevant explanatory variable is left out of the model and is correlated with the included regressors, which makes the coefficients on those regressors biased and inconsistent. Hypothesis tests are then conducted around the wrong parameter values, so conclusions about the relationship between the independent and dependent variables can be incorrect. The best solution is to include all relevant variables in the model; however, this is not always possible due to lack of data or other constraints.

Problem 5 (6 points)
The variable rdintens is expenditures on research and development (R&D) as a
percentage of sales. Sales are measured in millions of dollars. The variable
profmarg is profits as a percentage of sales. Using the data in RDCHEM.RAW
for 32 firms in the chemical industry, the following equation is estimated (standard errors in parentheses; intercept omitted):

rdintens = β̂0 + 0.321·log(sales) + 0.050·profmarg,  n = 32
                 (0.216)            (0.046)
(i) Interpret the coefficient on log(sales). In particular, if sales increases by 10%, what is the estimated percentage point change in rdintens? Is this an economically large effect?
(ii) Test the hypothesis that R&D intensity does not change with sales
against the alternative that it does increase with sales. Do the test at
the 5% and 10% levels.
(iii) Interpret the coefficient on profmarg. Is it economically large?
(iv) Does profmarg have a statistically significant effect on rdintens?
(v) Build confidence intervals for profmarg and log(sales) at the 5% and 10% levels.
(vi) Find the p-values for profmarg and log(sales).

(i) A 10% increase in sales (Δlog(sales) ≈ 0.10) is associated with an estimated change in rdintens of about 0.321 × 0.10 ≈ 0.032 percentage points. Since rdintens is R&D expenditure as a percentage of sales, this is not an economically large effect.
(ii) Since this is a one-tailed test, the hypotheses are stated as follows, where β1 is the coefficient on log(sales):

H0: β1 = 0
H1: β1 > 0

In our example: t = (0.321 − 0) / 0.216 = 1.486

At the 5% significance level with 29 degrees of freedom, the one-tailed critical value of the t statistic is 1.699, and 1.486 < 1.699. We do not reject the null hypothesis: R&D intensity does not change significantly with sales at the 5% significance level.

On the other hand, at the 10% significance level with 29 degrees of freedom the critical value is 1.311, and 1.486 > 1.311. We reject the null hypothesis: R&D intensity increases with sales at the 10% significance level.

(iii) The coefficient on profmarg shows the effect of profits as a percentage of sales on R&D intensity: a one-percentage-point increase in profmarg is associated with a 0.050 percentage-point increase in rdintens. This is NOT ECONOMICALLY LARGE.

(iv) In order to know whether profmarg has a significant effect on rdintens or not, we perform a t-test and evaluate the result at both the 5% and 10% significance levels, where β2 is the coefficient on profmarg:

H0: β2 = 0
H1: β2 > 0

t = (0.050 − 0) / 0.046 = 1.087

The one-tailed critical values of the t statistic at 29 degrees of freedom are 1.699 (5%) and 1.311 (10%). Since 1.087 < 1.311 < 1.699, we do not reject the null hypothesis at either level: profmarg has no statistically significant effect on rdintens at the 5% or 10% significance level.
(v) Because the sample is small, the t distribution with 29 degrees of freedom should be used rather than the normal: the two-sided critical values are t(0.025, 29) ≈ 2.045 for the 95% interval and t(0.05, 29) ≈ 1.699 for the 90% interval.
For profmarg: 95% CI: 0.050 ± 2.045 × 0.046 ≈ (−0.044, 0.144); 90% CI: 0.050 ± 1.699 × 0.046 ≈ (−0.028, 0.128).
For log(sales): 95% CI: 0.321 ± 2.045 × 0.216 ≈ (−0.121, 0.763); 90% CI: 0.321 ± 1.699 × 0.216 ≈ (−0.046, 0.688).
All four intervals contain zero, consistent with the two-sided versions of the tests above.
(vi) The p-value is the probability of observing a t statistic at least as extreme as the one calculated, assuming the null hypothesis is true. The null hypothesis in each case is that the coefficient is zero.
 For log(sales), t = 0.321/0.216 = 1.486. With 29 degrees of freedom (as in part (ii)), the one-sided p-value is about 0.07: greater than 0.05, so not statistically significant at the 5% level, but less than 0.10, so statistically significant at the 10% level.
 For profmarg, t = 0.050/0.046 = 1.087. With 29 degrees of freedom, the one-sided p-value is about 0.14: greater than both 0.05 and 0.10, so not statistically significant at either the 5% or the 10% level.
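
The numbers in parts (ii) and (iv)–(vi) can be checked with a few lines of SciPy, plugging in the estimates and standard errors from the equation above (a sketch; only the reported coefficients are used):

    from scipy import stats

    df = 29  # 32 firms minus 3 estimated parameters

    for name, b, se in [("log(sales)", 0.321, 0.216), ("profmarg", 0.050, 0.046)]:
        t = b / se
        p_one = stats.t.sf(t, df)             # one-sided p-value
        half95 = stats.t.ppf(0.975, df) * se  # 95% CI half-width
        half90 = stats.t.ppf(0.95, df) * se   # 90% CI half-width
        print(f"{name}: t = {t:.3f}, one-sided p = {p_one:.3f}, "
              f"95% CI = ({b - half95:.3f}, {b + half95:.3f}), "
              f"90% CI = ({b - half90:.3f}, {b + half90:.3f})")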
