I. Task 1 - Multiple Regression: Determinants of Wages: Econometrics Assignment
b. Hypothesis testing
Dependent variable: Wages
Independent variables, hypothesis and explanation:
- Age (hypothesis: positive): The higher the age, the higher the wage. Younger people earn less than older people because older people have more working experience.
- Training1 (hypothesis: positive): The more years of apprentice training, the more experience and knowledge a worker gains. Therefore, the wage also increases.
- Wages: The average wage is 31.145 million USD, with a standard deviation of 5.58.
- Age: The average age is 27.796 years, with a standard deviation of 3.363.
- Educ: The average length of education of the participants is 14.61 years, with a standard deviation of 2.041.
- Training1: The average length of apprentice training in firms is 4.502 years, with a standard deviation of 1.549.
- Training2: The average length of apprentice training in firms, under a different measure, is 4.4976 years, with a standard deviation of 1.565.
Wages
=> Do not reject H0 and conclude that wages are normally distributed
Age
Educ
=> Because the kurtosis differs from 3 and the p-value of the JB test is smaller than 0.05, we reject H0 and conclude that educ is not normally distributed.
Training1
Training2
Residuals
First, we store the residuals in a new variable named resid, using the Stata command: predict resid, residuals
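The normality decisions in this part rely on the Jarque-Bera (JB) test. As a rough illustration of what that test computes, the following Python sketch builds the JB statistic from the sample skewness and kurtosis and compares its p-value with 0.05. The function name `jarque_bera` and the `demo` data are assumptions made for this example only; they are not the assignment's data set or Stata output.

```python
import math

def jarque_bera(sample):
    """Return (JB statistic, p-value) for a list of numbers."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments, dividing by n as in the usual JB definition.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under H0 (normality) JB is asymptotically chi-square with 2 degrees
    # of freedom, whose upper-tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical data with one large outlier (illustration only).
demo = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 20.0]
jb, p = jarque_bera(demo)
print(f"JB = {jb:.3f}, p-value = {p:.4f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

The p-value is asymptotic, so for small samples it is only an approximation.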
d. Model estimates:
$wages = \hat{\beta}_1 + \hat{\beta}_2 \cdot age + \hat{\beta}_3 \cdot educ + \hat{\beta}_4 \cdot training1 + \hat{\beta}_5 \cdot training2 + \mu$
- $\hat{\beta}_2$ = 0.1000375: Holding the other variables fixed, when age increases by 1 year, the average wage increases by 0.1000375 million USD.
- $\hat{\beta}_3$ = 1.045242: Holding the other variables fixed, when education increases by 1 year, the average wage increases by 1.045242 million USD.
=> We reject H0 and conclude that the model is significant at the 5% significance level.
* $H_0: \beta_2 = 0$
  $H_1: \beta_2 \neq 0$
From the table above, p-value = 0.129 > 0.05, therefore we do not reject H0 and conclude that β2 is not significant at the 5% significance level.
* $H_0: \beta_3 = 0$
  $H_1: \beta_3 \neq 0$
From the table above, p-value = 0.000 < 0.05, therefore we reject H0 and conclude that β3 is significant at the 5% significance level.
* $H_0: \beta_4 = 0$
  $H_1: \beta_4 \neq 0$
From the table above, p-value = 0.627 > 0.05, therefore we do not reject H0 and conclude that β4 is not significant at the 5% significance level.
* $H_0: \beta_5 = 0$
  $H_1: \beta_5 \neq 0$
From the table above, p-value = 0.403 > 0.05, therefore we do not reject H0 and conclude that β5 is not significant at the 5% significance level.
e. Diagnostic test
e-1. Test for multicollinearity
=> We reject H0 and conclude that the model has a serious multicollinearity problem: the VIFs of training1 and training2 are both higher than 10.
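To make the VIF threshold concrete, here is a minimal Python sketch of the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing regressor j on the other regressors. A VIF above 10 therefore corresponds to an auxiliary R-squared above 0.9. The function name `vif` and the R-squared values in the loop are assumptions chosen for illustration, not results from the assignment.

```python
def vif(r_squared_j):
    """Variance inflation factor from the auxiliary R-squared of regressor j."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical auxiliary R-squared values (illustration only).
for r2 in (0.20, 0.50, 0.90, 0.95):
    print(f"R^2 = {r2:.2f} -> VIF = {vif(r2):.2f}")
```

Two regressors that measure nearly the same thing, as training1 and training2 do here, explain each other almost perfectly, which is what pushes both VIFs so high.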
Besides multicollinearity, we also test for heteroscedasticity and for omitted variables and incorrect functional form, to make sure that the model has no other problems.
Heteroscedasticity test
F = 1.47, p-value = 0.2221 > 0.05
=> Do not reject H0 and conclude that there is no omitted
In the new model, the adjusted R-squared is 0.3165, which means that 31.65% of the variation in the dependent variable wages is explained by the independent variables when the variable training1 is dropped.
CASE 2: Dropping variable “training2”
In this model, the adjusted R-squared is 0.3158, which means that 31.58% of the variation in the dependent variable wages is explained by the independent variables when the variable training2 is dropped.
=> As the results show, the adjusted R-squared of the model without training1 is higher than that of the model without training2 (0.3165 > 0.3158). Therefore, the dependent variable is better explained when the variable training1 is dropped.
In the new VIF table, the VIF of every independent variable and the mean VIF are all below 2. Hence, there is no serious multicollinearity in the new model.
Additional Remedy
From the significance test for the coefficient β2 above, age is not a statistically significant determinant of wages at the 5% level. Since wage is a measure of productivity, we conclude that age is not a determinant of productivity.
2. Is apprentice training more efficient than education in raising the level of wage?
As a result of the tests above, our new model does not include the variable training1.
From the table, we can see that β5 > β3. However, we still need to carry out a t-test of the restriction to test the following hypotheses:
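$H_0: \beta_5 - \beta_3 = 0$
$H_1: \beta_5 - \beta_3 > 0$
We calculate the test statistic:
$t = \frac{\hat{\beta}_5 - \hat{\beta}_3 - 0}{se(\hat{\beta}_5 - \hat{\beta}_3)}$
where
$se(\hat{\beta}_5 - \hat{\beta}_3) = \sqrt{var(\hat{\beta}_5 - \hat{\beta}_3)} = \sqrt{var(\hat{\beta}_5) + var(\hat{\beta}_3) - 2cov(\hat{\beta}_5, \hat{\beta}_3)}$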
$t > t^{496}_{0.05}$ => Reject H0 and conclude that apprentice training is more efficient than education in raising the level of wage.
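The one-sided restriction test on β5 - β3 can be sketched in Python as follows. The function name `restriction_t` and every number passed to it are hypothetical placeholders; the actual estimates, variances and covariance come from the Stata output and are not reproduced here. The critical value of about 1.65 is the approximate one-sided 5% value for a t distribution with 496 degrees of freedom.

```python
import math

def restriction_t(b5, b3, var_b5, var_b3, cov_b5_b3):
    """t statistic for H0: beta5 - beta3 = 0 against H1: beta5 - beta3 > 0."""
    # se(b5 - b3) = sqrt(var(b5) + var(b3) - 2 * cov(b5, b3))
    se_diff = math.sqrt(var_b5 + var_b3 - 2.0 * cov_b5_b3)
    return (b5 - b3 - 0.0) / se_diff

# Hypothetical inputs (illustration only, not the assignment's estimates).
t_stat = restriction_t(b5=1.6, b3=1.0, var_b5=0.04, var_b3=0.05, cov_b5_b3=0.0)

T_CRIT = 1.65  # approximate one-sided 5% critical value, 496 degrees of freedom
print(f"t = {t_stat:.3f}")
print("reject H0" if t_stat > T_CRIT else "do not reject H0")
```

The covariance term matters: ignoring a non-zero covariance between the two estimates gives the wrong standard error for their difference.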
=> $E(y) = \frac{e^{-6.8926 + 2.5388 x_1 + 1.5608 x_2}}{1 + e^{-6.8926 + 2.5388 x_1 + 1.5608 x_2}}$
=> Estimated logit: $\ln\left(\frac{p}{1-p}\right) = -6.892555 + 2.538797 x_1 + 1.560754 x_2 + \mu$
d. Conduct a test for overall significance using alpha =.05.
- The hypothesis for the overall significance test:
$H_0: \beta_2 = \beta_3 = 0$
$H_1$: at least one $\beta_j \neq 0$, $j \in \{2, 3\}$
* $H_0: \beta_2 = 0$
  $H_1: \beta_2 \neq 0$
- From the table above, p = 0.000 < 0.05 => Reject H0 and conclude that β2 is significant at the 5% significance level.
* $H_0: \beta_3 = 0$
  $H_1: \beta_3 \neq 0$
- From the table above, p = 0.006 < 0.05 => Reject H0 and conclude that β3 is significant at the 5% significance level.
f. Use the estimated logit computed in part (c) to estimate the probability that
students with a 2.5 grade point average who did not attend the orientation program
will return to Lakeland for their sophomore year. What is the estimated probability
for students with a 2.5 grade point average who attended the orientation program?
- According to part c, the estimated logistic regression is:
$p_i = \frac{e^{-6.89256 + 2.5388 X_1 + 1.56075 X_2}}{1 + e^{-6.89256 + 2.5388 X_1 + 1.56075 X_2}}$
+ To calculate the probability that students with a 2.5 grade point average who did not attend the orientation program will return to Lakeland for their sophomore year, we substitute GPA = 2.5 and Program = 0:
$p_i = \frac{e^{-6.89256 + 2.5388 \cdot 2.5 + 1.56075 \cdot 0}}{1 + e^{-6.89256 + 2.5388 \cdot 2.5 + 1.56075 \cdot 0}} \approx 0.3669$
=> Therefore, the estimated probability that students with a 2.5 grade point average who did not attend the orientation program will return to Lakeland for their sophomore year is 36.69%.
+ To calculate the probability that students with a 2.5 grade point average who attended the orientation program will return for their sophomore year, we substitute GPA = 2.5 and Program = 1:
$p_i = \frac{e^{-6.89256 + 2.5388 \cdot 2.5 + 1.56075 \cdot 1}}{1 + e^{-6.89256 + 2.5388 \cdot 2.5 + 1.56075 \cdot 1}} \approx 0.734$
=> Therefore, the estimated probability that students with a 2.5 grade point average who attended the orientation program will return for their sophomore year is 73.4%.
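The two substitutions in part f can be checked with a few lines of Python. This is only a sketch: the coefficients are the rounded values quoted from part c, and the function name `return_probability` is an assumption introduced for this example.

```python
import math

# Rounded coefficients of the estimated logit quoted from part (c).
B0, B_GPA, B_PROGRAM = -6.89256, 2.5388, 1.56075

def return_probability(gpa, program):
    """Estimated probability of returning, from the logistic equation."""
    logit = B0 + B_GPA * gpa + B_PROGRAM * program
    return math.exp(logit) / (1.0 + math.exp(logit))

p_no_program = return_probability(2.5, 0)
p_program = return_probability(2.5, 1)
print(f"GPA 2.5, no program: {p_no_program:.4f}")  # 0.3669
print(f"GPA 2.5, program:    {p_program:.4f}")     # 0.7340
```

Both values agree with the hand calculations above to the precision reported.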
g. What is the estimated odds ratio for the orientation program? Interpret it.
The estimated odds ratio for the orientation program is 4.762413: holding GPA fixed, the estimated odds of returning for the sophomore year are 4.762413 times as high for students who attended the orientation program as for those who did not, which is an increase of 376.2413%.
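The odds ratio is the exponential of the coefficient on the program variable in the estimated logit. The short Python sketch below recomputes it from the coefficient 1.560754 given in part c and confirms that the ratio of odds does not depend on GPA. The helper name `odds` and the GPA value used in the check are assumptions for this example.

```python
import math

# Coefficients of the estimated logit given in part (c).
B0, B_GPA, B_PROGRAM = -6.892555, 2.538797, 1.560754

def odds(gpa, program):
    """Estimated odds p / (1 - p), which equal exp(logit)."""
    return math.exp(B0 + B_GPA * gpa + B_PROGRAM * program)

odds_ratio = math.exp(B_PROGRAM)
percent_increase = (odds_ratio - 1.0) * 100.0
print(f"odds ratio = {odds_ratio:.4f}")
print(f"increase in odds = {percent_increase:.2f}%")

# The same ratio is obtained at any fixed GPA, for example 2.5.
ratio_at_gpa = odds(2.5, 1) / odds(2.5, 0)
print(f"ratio at GPA 2.5 = {ratio_at_gpa:.4f}")
```

Note that the odds ratio describes odds, not probabilities: the probability itself does not rise by a factor of 4.76.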
h. Would you recommend making the orientation program a required activity? Why
or why not?
- From part f, the estimated probability that students with a 2.5 grade point average who did not attend the orientation program will return for their sophomore year is 36.69%, while the estimated probability for students with a 2.5 grade point average who attended the orientation program is 73.4%.
- From part g, holding GPA fixed, the estimated odds of returning for the sophomore year are 4.762413 times as high for students who attended the orientation program, which is an increase of 376.2413%.
=> Therefore, we conclude that the orientation program has a large effect on whether students return to Lakeland for the sophomore year. Hence, we recommend making the orientation program a required activity.