I. Task 1 - Multiple Regression: Determinants of Wages: Econometrics Assignment

This document contains an econometrics assignment analyzing the determinants of wages using multiple regression. It includes: 1) developing an econometric model to test the relationship between wages and several independent variables (age, education, and training); 2) conducting hypothesis tests on the coefficients and on the overall significance of the model; 3) diagnostic tests, which find that the model suffers from multicollinearity between the training variables. The assignment suggests dropping one of the training variables to remedy this issue and improve the model.

Uploaded by Lâm Anh Phạm

ECONOMETRICS ASSIGNMENT

I. TASK 1 - Multiple regression: Determinants of wages


a. Econometric model

Wage = β1 + β2*Age + β3*Educ + β4*Training1 + β5*Training2 + µ

b. Hypothesis testing

Dependent variable: Wages

Independent variables, hypothesized signs, and explanations:

- Age (positive): The higher the age, the higher the wage. Young people earn less than older workers because older workers have more working experience.
- Educ (positive): People with higher education levels get access to better job opportunities and are more knowledgeable than others; therefore they earn higher wages.
- Training1 (positive): The more years of apprentice training, the more experience and knowledge gained; therefore the wage also increases.
- Training2 (positive): Even though the measure applied is unknown, the variable "training2" is taken to capture the same thing as "training1"; therefore the relationship is also expected to be positive.
c. Basic statistics

c-1. Mean and standard deviation

- Wages: The average wage is 31.145 million USD, with a standard deviation of 5.58.
- Age: The average age is 27.796 years, with a standard deviation of 3.363.
- Educ: The average number of years of education of the participants is 14.61, with a standard deviation of 2.041.
- Training1: The average number of years of apprentice training in firms is 4.502, with a standard deviation of 1.549.
- Training2: The average number of years of apprentice training in firms, under a different measure, is 4.4976, with a standard deviation of 1.565.

c-2. Skewness, Kurtosis, and Jarque-Bera normality test

 Wages

H0: Wages are normally distributed
H1: Wages are not normally distributed

+ The test of S = 0 gives p-value = 0.8302 > 0.05, so we cannot reject that the skewness is 0.
+ The test of K = 3 gives p-value = 0.8321 > 0.05, so we cannot reject that the kurtosis is 3.
+ Finally, the joint test of S = 0 and K = 3 gives chi-square = 0.09 and p-value = 0.9556 > 0.05.

=> Do not reject H0 and conclude that wages are normally distributed.

 Age

H0: Age is normally distributed
H1: Age is not normally distributed

+ The test of S = 0 gives p-value = 0.5379 > 0.05, so we cannot reject that the skewness is 0.
+ The test of K = 3 gives p-value = 0.0883 > 0.05, so we cannot reject that the kurtosis is 3.
+ Finally, the joint test of S = 0 and K = 3 gives chi-square = 3.29 and p-value = 0.1932 > 0.05.

=> Do not reject H0 and conclude that age is normally distributed.

 Educ

H0: Educ is normally distributed
H1: Educ is not normally distributed

+ The test of S = 0 gives p-value = 0.6451 > 0.05, so we cannot reject that the skewness is 0.
+ The test of K = 3 gives p-value = 0.0147 < 0.05, so we reject that the kurtosis is 3.
+ Finally, the joint test of S = 0 and K = 3 gives chi-square = 6.17 and p-value = 0.0458 < 0.05.

=> Because the kurtosis differs from 3 and the p-value of the joint test is smaller than 0.05, we reject H0 and conclude that educ is not normally distributed.

 Training1

H0: Training1 is normally distributed
H1: Training1 is not normally distributed

+ The test of S = 0 gives p-value = 0.7908 > 0.05, so we cannot reject that the skewness is 0.
+ The test of K = 3 gives p-value = 0.1632 > 0.05, so we cannot reject that the kurtosis is 3.
+ Finally, the joint test of S = 0 and K = 3 gives chi-square = 2.02 and p-value = 0.3643 > 0.05.

=> Do not reject H0 and conclude that training1 is normally distributed.

 Training2

H0: Training2 is normally distributed
H1: Training2 is not normally distributed

+ The test of S = 0 gives p-value = 0.8434 > 0.05, so we cannot reject that the skewness is 0.
+ The test of K = 3 gives p-value = 0.1448 > 0.05, so we cannot reject that the kurtosis is 3.
+ Finally, the joint test of S = 0 and K = 3 gives chi-square = 2.17 and p-value = 0.3378 > 0.05.

=> Do not reject H0 and conclude that training2 is normally distributed.

 Residuals
First, we store the residuals in a new variable with the Stata command: predict resid, residuals

H0: Error terms are normally distributed
H1: Error terms are not normally distributed

+ The test of S = 0 gives p-value = 0.3129 > 0.05, so we cannot reject that the skewness is 0.
+ The test of K = 3 gives p-value = 0.1828 > 0.05, so we cannot reject that the kurtosis is 3.
+ Finally, the joint test of S = 0 and K = 3 gives chi-square = 2.8 and p-value = 0.2469 > 0.05.

=> Do not reject H0 and conclude that the error terms are normally distributed.
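The joint chi-square statistics above come from Stata's sktest, which adjusts the sample moments; the classic Jarque-Bera statistic it approximates can be sketched in Python. The sample moments and n below are hypothetical, not taken from the assignment's data:

```python
import math

def jarque_bera(skew, kurt, n):
    """Classic Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4)."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Hypothetical sample: mild skewness and excess kurtosis, n assumed 500
jb = jarque_bera(skew=0.05, kurt=3.2, n=500)
# Under H0, JB ~ chi-square(2); the 5% critical value is about 5.991
is_normal = jb < 5.991
```

A JB value below the chi-square(2) critical value means we cannot reject normality, matching the decision rule used throughout this section.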

d. Model estimates:

Sample Regression Function:

ŵages = β̂1 + β̂2*age + β̂3*educ + β̂4*training1 + β̂5*training2

After replacing the estimated coefficients, we have the following function:

ŵages = 6.911 + 0.1*age + 1.045*educ + 0.508*training1 + 0.866*training2

d-1. Meaning of the coefficients in the model:

- β̂2 = 0.1000375: holding the other variables fixed, when age increases by 1 year, the average wage increases by 0.1000375 million USD.

- β̂3 = 1.045242: holding the other variables fixed, when education increases by 1 year, the average wage increases by 1.045242 million USD.

- β̂4 = 0.5081875: holding the other variables fixed, when the number of years of apprentice training increases by 1 year, the average wage increases by 0.5081875 million USD.

- β̂5 = 0.866: holding the other variables fixed, when training2 increases by 1 year, the average wage increases by 0.866 million USD.
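Taken together, the fitted coefficients give a simple prediction rule; a minimal Python sketch (the example worker's values are made up for illustration, not drawn from the data):

```python
def predicted_wage(age, educ, training1, training2):
    """Fitted wage (million USD) from the estimated model above."""
    return 6.911 + 0.100 * age + 1.045 * educ + 0.508 * training1 + 0.866 * training2

# Hypothetical worker: 28 years old, 15 years of education, 4 years of each training
wage_hat = predicted_wage(28, 15, 4, 4)  # ≈ 30.882 million USD
```

Increasing any single input by one unit while holding the rest fixed moves the prediction by exactly that variable's coefficient, which is what the interpretations above state.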

d-2. Test for significance of model:

H0: β2 = β3 = β4 = β5 = 0
H1: At least one βj ≠ 0, j ∈ {2, 3, 4, 5}

From the table above, p-value = 0.00 < 0.05

=> We reject H0 and conclude that the model is significant at the 5% significance level.

d-3. Test for significance of coefficients

* H0: β2 = 0 vs. H1: β2 ≠ 0
From the table above, p-value = 0.129 > 0.05, therefore we do not reject H0 and conclude that β2 is not significant at the 5% significance level.

* H0: β3 = 0 vs. H1: β3 ≠ 0
From the table above, p-value = 0.000 < 0.05, therefore we reject H0 and conclude that β3 is significant at the 5% significance level.

* H0: β4 = 0 vs. H1: β4 ≠ 0
From the table above, p-value = 0.627 > 0.05, therefore we do not reject H0 and conclude that β4 is not significant at the 5% significance level.

* H0: β5 = 0 vs. H1: β5 ≠ 0
From the table above, p-value = 0.403 > 0.05, therefore we do not reject H0 and conclude that β5 is not significant at the 5% significance level.
e. Diagnostic test
e-1. Test for multicollinearity

H0: There is no multicollinearity in the model
H1: There is multicollinearity in the model

- Mean VIF = 31.20 > 10

=> We reject H0 and conclude that the model has a serious multicollinearity problem. The VIFs of training1 and training2 are both higher than 10.
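The VIF that Stata reports for each regressor comes from the R-squared of an auxiliary regression of that regressor on all the others; a minimal sketch (the auxiliary R-squared value below is hypothetical):

```python
def vif(aux_r2):
    """Variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the
    R-squared from regressing x_j on the remaining regressors."""
    return 1.0 / (1.0 - aux_r2)

# A hypothetical auxiliary R^2 of 0.968 already pushes the VIF past the
# rule-of-thumb cutoff of 10:
high_vif = vif(0.968)  # ≈ 31.25
```

This makes the diagnosis concrete: a mean VIF of 31.20 implies the training variables are almost perfectly explained by each other.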

e-2. Other diagnostic tests

Besides multicollinearity, we also test for heteroscedasticity and omitted variable and
incorrect functional form to make sure that the model does not have any problems.

 Heteroscedasticity test

H0: There is no heteroscedasticity in the model
H1: There is heteroscedasticity in the model

- Breusch-Pagan test: chi-square = 1.29, p-value = 0.8637 > 0.05
=> Do not reject H0 and conclude that there is no heteroscedasticity in the model.

- White test: chi-square = 11.51, p-value = 0.6459 > 0.05
=> Do not reject H0 and conclude that there is no heteroscedasticity in the model.

=> Both tests give the same conclusion.
 Omitted variable and incorrect functional form

H0: There is no omitted variable or incorrect functional form
H1: There is an omitted variable or incorrect functional form

F = 1.47, p-value = 0.2221 > 0.05

=> Do not reject H0 and conclude that there is no omitted variable in the model.


e-3. Remedy and suggestion for the new model

 Dropping independent variables

- Scatter plot between training1 and training2:

Based on the VIF table and the scatter plot between training1 and training2, we can conclude that they have a strong linear relationship, which causes the multicollinearity problem. Thus, we can remove one of the two collinear variables.

CASE 1: Drop the variable "training1"

In the new model, the adjusted R-squared = 0.3165, meaning that 31.65% of the variation in wages is explained by the independent variables when training1 is dropped.

CASE 2: Drop the variable "training2"

In this model, the adjusted R-squared = 0.3158, meaning that 31.58% of the variation in wages is explained by the independent variables when training2 is dropped.

=> The adjusted R-squared of the model dropping training1 is higher than that of the model dropping training2 (0.3165 > 0.3158). Therefore, more of the variation in the dependent variable is explained if we drop training1.

In the new VIF table, every individual VIF and the mean VIF are below 2. Hence, there is no serious multicollinearity in the new model.
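The adjusted R-squared values being compared follow the standard formula adj R² = 1 - (1 - R²)(n - 1)/(n - k); a sketch with assumed n and k (the exact sample size is not restated here, though the 496 degrees of freedom used later suggest n ≈ 500 with 4 parameters in the reduced model):

```python
def adj_r2(r2, n, k):
    """Adjusted R^2 for n observations and k estimated parameters (incl. intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# With assumed n = 500 and k = 4, an R^2 of 0.3206 yields an adjusted value
# close to the ~0.316 figures reported above:
adj = adj_r2(0.3206, 500, 4)
```

Because the penalty term grows with k, dropping a redundant regressor like training1 can raise adjusted R-squared even though plain R-squared never increases when a variable is removed.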

Additional Remedy

 Adding more observations in the sample


 Adding variables: In the old model, R-squared = 0.3206, which means the model explains 32.06% of the variation in the dependent variable. Hence, we suggest adding more variables such as "productivity", "experience",...

f. Conclusion: Answering two research questions

1. Is age a determinant of productivity?

From the significance test for coefficient β2 above, age is not a significant determinant of wage. Since wage is used here as a measure of productivity, we can conclude that age is not a determinant of productivity.

2. Is apprentice training more efficient than education in raising the level of wage?

As a result of the diagnostic tests, our new model does not include the variable training1.

From the table, we can see that β̂5 > β̂3. However, we still need to carry out a t-test for the restriction:

H0: β5 - β3 = 0
H1: β5 - β3 > 0

We calculate the test statistic: t = (β̂5 - β̂3 - 0) / se(β̂5 - β̂3)

se(β̂5 - β̂3) = sqrt(var(β̂5) + var(β̂3) - 2*cov(β̂5, β̂3))
            = sqrt(0.1319806^2 + 0.1084831^2 - 2*(-0.00040995)) = 0.1732

=> t = (1.364336 - 1.045906) / 0.1732 = 1.839

t > t(496, 0.05) => Reject H0 and conclude that apprentice training is more efficient than education in raising wages.
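The standard-error and t calculations above can be checked with a few lines of Python, using the variances and covariance reported by the software:

```python
import math

# Estimates from the reduced model (training1 dropped), as reported above
b5, b3 = 1.364336, 1.045906      # coefficients on training2 and educ
se5, se3 = 0.1319806, 0.1084831  # their standard errors
cov53 = -0.00040995              # cov(b5_hat, b3_hat)

# se of the difference: sqrt(var5 + var3 - 2*cov)
se_diff = math.sqrt(se5**2 + se3**2 - 2 * cov53)
t = (b5 - b3) / se_diff
# One-sided test: compare t with the 5% critical value (~1.645 for 496 df)
reject_h0 = t > 1.645
```

Note that the negative covariance slightly inflates the standard error of the difference; ignoring it would overstate the significance of the result.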

II. TASK 2 - Lakeland: Student retention


a. Write the logistic regression equation relating X1 and X2 to Y.
- According to the question:
Y = 1 if the student returned to Lakeland for the sophomore year and Y = 0 if not.
- The two independent variables are:
+ x1 is the GPA at the end of the first semester
+ x2 = 0 if the student did not attend the orientation program and x2 = 1 if the student attended the orientation program
Thus, the equation is:

E(y) = e^(β1 + β2*x1 + β3*x2) / (1 + e^(β1 + β2*x1 + β3*x2))

=> E(y) = e^(-6.8926 + 2.5388*x1 + 1.5608*x2) / (1 + e^(-6.8926 + 2.5388*x1 + 1.5608*x2))

b. What is the interpretation of E(Y) when X2 =0?


- According to the formula, E(y) = P(y = 1).
=> For this problem, the interpretation of E(y) when x2 = 0 is: for a given GPA, it is an estimate of the probability that a student who did not attend the orientation program will return to Lakeland for the sophomore year.
c. Use both independent variables and software to compute the estimated logit.
Logit function: ln(p/(1-p)) = β1 + β2*x1 + β3*x2

=> Estimated logit: ln(p̂/(1-p̂)) = -6.892555 + 2.538797*x1 + 1.560754*x2
d. Conduct a test for overall significance using alpha =.05.
- The hypothesis for the overall significance test:

H0: β2 = β3 = 0
H1: At least one βj ≠ 0, j ∈ {2, 3}

- According to the table, LR chi-square = 47.87, p-value = 0.000 < 0.05

=> Reject H0 and conclude that the overall model is significant at the 5% significance level.
e. Use alpha =.05 to determine whether each of the independent variables is
significant.

* H0: β2 = 0 vs. H1: β2 ≠ 0
- From the table above, p = 0.000 < 0.05 => Reject H0 and conclude that β2 is significant at the 5% significance level.

* H0: β3 = 0 vs. H1: β3 ≠ 0
- From the table above, p = 0.006 < 0.05 => Reject H0 and conclude that β3 is significant at the 5% significance level.

f. Use the estimated logit computed in part (c) to estimate the probability that
students with a 2.5 grade point average who did not attend the orientation program
will return to Lakeland for their sophomore year. What is the estimated probability
for students with a 2.5 grade point average who attended the orientation program?
- According to part (c), the estimated probability is:

p̂ = e^(-6.89256 + 2.5388*x1 + 1.56075*x2) / (1 + e^(-6.89256 + 2.5388*x1 + 1.56075*x2))

+ To estimate the probability that a student with a 2.5 grade point average who did not attend the orientation program will return to Lakeland for the sophomore year, we substitute GPA = 2.5 and Program = 0:

p̂ = e^(-6.89256 + 2.5388*2.5 + 1.56075*0) / (1 + e^(-6.89256 + 2.5388*2.5 + 1.56075*0)) ≈ 0.3669

=> Therefore, the estimated probability that a student with a 2.5 grade point average who did not attend the orientation program will return for the sophomore year is 36.69%.

+ To estimate the probability for a student with a 2.5 grade point average who attended the orientation program, we substitute GPA = 2.5 and Program = 1:

p̂ = e^(-6.89256 + 2.5388*2.5 + 1.56075*1) / (1 + e^(-6.89256 + 2.5388*2.5 + 1.56075*1)) ≈ 0.734

=> Therefore, the estimated probability that a student with a 2.5 grade point average who attended the orientation program will return for the sophomore year is 73.4%.
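The two probabilities can be verified with a short Python sketch of the fitted logistic function:

```python
import math

def p_return(gpa, program):
    """Estimated probability of returning for the sophomore year,
    using the fitted coefficients above (program is 0 or 1)."""
    z = -6.89256 + 2.5388 * gpa + 1.56075 * program
    return math.exp(z) / (1.0 + math.exp(z))

p_no = p_return(2.5, 0)   # ≈ 0.3669
p_yes = p_return(2.5, 1)  # ≈ 0.7340
```

Because the logistic function is nonlinear, the same coefficient on the orientation dummy produces different probability gaps at different GPA levels, which is why the odds ratio in part (g) is the cleaner summary of the program's effect.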
g. What is the estimated odds ratio for the orientation program? Interpret it.

The estimated odds ratio for the orientation program is 4.762413: holding GPA fixed, attending the orientation program multiplies the estimated odds of returning for the sophomore year by 4.762413, i.e., a 376.2413% increase in the odds.
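The odds ratio follows directly from exponentiating the orientation coefficient in the estimated logit:

```python
import math

beta3 = 1.560754                       # coefficient on the orientation dummy
odds_ratio = math.exp(beta3)           # ≈ 4.7624
pct_increase = (odds_ratio - 1) * 100  # ≈ 376.24% increase in the odds
```

This is the standard interpretation of a logit coefficient on a dummy variable: a one-unit change in x2 multiplies the odds by e^β3, regardless of the GPA value held fixed.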
h. Would you recommend making the orientation program a required activity? Why
or why not?
- From part (f), the estimated probability that a student with a 2.5 grade point average will return for the sophomore year is 36.69% without the orientation program and 73.4% with it.
- From part (g), holding GPA fixed, attending the orientation program multiplies the estimated odds of returning by 4.762413, a 376.2413% increase.
=> Therefore, we conclude that the orientation program has a large effect on whether students return to Lakeland for the sophomore year. Hence, we recommend making the orientation program a required activity.
