Econometrics Report - WRITE UP
Section 1
- Summary stats
- Literature review
Section 2: Summary statistics
Need table of figure 1.2
Figure 1.2 presents the variables chosen from the World Inequality Database (WID). The
dataset covers 109 countries, with all observations taken in 2020. The table shows only the
base variables of our model. The dependent variable in the model is GDP, a measure of a
country’s output over a given period. The independent variables we have chosen are:
We chose these variables because they best capture how inequality of opportunity for youth
harms the economic development of a country. The first variable shows how well a country is
able to provide opportunities for its young people, combining multiple factors such as labour
market problems, access to education, and socioeconomic conditions. The second variable
captures the capacity of youth to enter the labour force, and the third is the best estimate
we have available of the level of education available to the youth.
We have also chosen multiple control variables, mainly related to economic development,
which help explain the variation in GDP per capita (GDPP) and reduce the risk of omitted
variable bias.
We take GDP per capita as our dependent variable, as it is a widely used measure of a
country's economic development. As the histogram shows, the data are right-skewed: a few
outliers have very high GDP per capita, while a far larger number of countries have low GDP
per capita. These outliers inflate the mean GDPP, so we use the natural log of GDPP in our
regressions to stop them from distorting the results. This choice is supported by a drastic
fall in the standard deviation relative to the mean after the transformation.
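The effect of the log transformation can be illustrated with a short Python sketch. The GDP values below are made up to mimic a right-skewed distribution and are not taken from the WID data; the helper names are hypothetical.

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a share of the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical GDP per capita values (NOT from the dataset):
# many lower-income observations and a few very rich outliers.
gdp = [1200, 1800, 2500, 3100, 4000, 5200, 7500, 9800,
       15000, 22000, 41000, 65000, 110000]
log_gdp = [math.log(v) for v in gdp]

print("skewness: raw", round(skewness(gdp), 2),
      "log", round(skewness(log_gdp), 2))
print("std/mean: raw", round(coefficient_of_variation(gdp), 2),
      "log", round(coefficient_of_variation(log_gdp), 2))
```

On data shaped like this, both the skewness and the ratio of standard deviation to mean fall after taking logs, which is the pattern described above.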
Another interesting relationship is between GDP and youth NEET (not in education, employment
or training). Regressing the log of GDP per capita on youth NEET, about 40% of the variation
in GDP is explained by youth NEET, with a coefficient of -0.08043: a one percentage point
increase in youth NEET is associated with a 7.7% decrease in GDP per capita. As the graph of
this relationship shows, there are some outliers to the trend, such as Turkey and Togo.
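The step from the coefficient to the percentage effect can be checked directly. Assuming the dependent variable is the natural log of GDP per capita, the exact percentage change for a one-unit rise in the regressor is 100·(exp(β) − 1), while 100·β is only a linear approximation. The function name below is hypothetical.

```python
import math

def exact_pct_change(beta, delta_x=1.0):
    """Exact % change in y for a delta_x change in x when ln(y) is regressed on x."""
    return 100.0 * (math.exp(beta * delta_x) - 1.0)

beta_neet = -0.08043  # coefficient reported in the text above

print(round(exact_pct_change(beta_neet), 1))  # exact effect: -7.7
print(round(100.0 * beta_neet, 1))            # linear approximation: -8.0
```

The 7.7% figure quoted above corresponds to the exact conversion; the two versions diverge more as the coefficient grows in absolute size.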
We also need to examine the relationship between youth NEET and LFP (labour force
participation): we want to use both as independent variables in our model, so we have to
check that they are not strongly collinear.
As the graph shows, there is close to no relationship between the two variables. A regression
supports this, as only 4.2% of the variation in NEET is explained by LFP. We can therefore
include both variables in our model without a multicollinearity problem.
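To see why an R² of 4.2% is reassuring, it can be converted into a variance inflation factor using VIF = 1 / (1 − R²). This is a minimal sketch that treats the two-variable regression above as the auxiliary regression; in the full models the relevant R² would come from regressing each variable on all the other regressors. The function name is hypothetical.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor implied by an auxiliary-regression R-squared."""
    return 1.0 / (1.0 - r_squared)

r2_neet_on_lfp = 0.042  # share of NEET variation explained by LFP, from the text

print(round(vif_from_r_squared(r2_neet_on_lfp), 3))  # 1.044
```

A value this close to 1 means the variance of the coefficient estimate is barely inflated by the correlation between the two regressors.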
There is a slight relationship between trade and political stability, which is plausible: the
more stable a country is, the more willing other countries are likely to be to trade with it.
The graph also shows the trade outliers Malta, Luxembourg and Ireland: small developed nations
that cannot produce sufficient goods domestically and therefore rely heavily on trade.
Variable                          Obs  Mean      Std. dev.  Min        Max       Variance  Std. error of mean  Skewness   Kurtosis
GDP                               109  23897.54  20888.82   1217.446   111751.3  4.36E+08  2000.786            1.315704   5.240825
Share of youth not in education   105  18.33152  8.908171   4.301      47.532    79.35552  0.8693485           0.781265   3.203139
Govt expenditure on education     96   4.733879  1.600085   0.358479   8.809548  2.560272  0.163308            0.1251756  3.460772
Youth Labour Force Participation  105  42.22941  12.78433   21.015     78.302    163.4392  1.247623            0.4611118  2.498584
Trade                             106  83.97476  53.04926   23.07978   372.2714  2814.224  5.1526              2.289279   11.06049
FDI                               108  5.412403  30.48759   -101.8331  200.8315  929.493   2.93367             4.534437   31.81129
Exports                           106  39.81873  29.00344   9.298559   203.1203  841.1996  2.817063            2.566763   12.63687
Gauss-Markov assumptions
Before moving on to the regressions, it is important to set out the Gauss-Markov theorem,
which states the key assumptions under which the OLS estimator is the Best Linear Unbiased
Estimator (BLUE). Our regression design accounts for these assumptions: we test each one and
refine the models accordingly, so that the models support reliable statistical inference.
The steps taken are described below.
The dataset is assumed to be drawn from a random sample, and the model equations are
linear in parameters, satisfying the first two Gauss-Markov conditions.
Heteroskedasticity occurs when the variance of the error term is not constant across all
levels of the independent variables (Wooldridge, 2016). Tests on some of our initial
regressions produced p-values below the 5% significance level, rejecting the null hypothesis
that the models are homoskedastic, as shown in Table 1. To correct for this, we re-estimated
all the models using robust standard errors. All regression results that follow were
therefore run using the robust command.
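The difference between classical and robust standard errors can be sketched for a one-regressor model in plain Python. This is an illustration under stated assumptions, not the procedure used for the report: the data are synthetic, the function names are hypothetical, and the HC1 (White) correction is assumed because the source does not say which robust variant was applied.

```python
import math

def ols_simple(x, y):
    """Return intercept, slope and residuals for y = a + b*x fitted by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals

def slope_standard_errors(x, residuals):
    """Classical and HC1 heteroskedasticity-robust standard errors of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Classical: one common error variance for every observation.
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    classical = math.sqrt(sigma2 / sxx)
    # HC1: each squared residual is weighted by that observation's leverage term.
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, residuals))
    robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return classical, robust

# Synthetic data (NOT from the report): the error spread grows with x.
x = [float(i) for i in range(1, 21)]
errors = [((-1) ** i) * 0.2 * xi for i, xi in enumerate(x, start=1)]
y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, errors)]

intercept, slope, residuals = ols_simple(x, y)
classical_se, robust_se = slope_standard_errors(x, residuals)
print(f"slope={slope:.3f} classical SE={classical_se:.4f} robust SE={robust_se:.4f}")
```

The coefficient itself is unchanged by the correction; only the standard error, and therefore the p-value, differs. In this synthetic example the robust standard error is the larger of the two.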
Multicollinearity occurs when two or more independent variables in a regression model are
highly correlated, making it difficult to isolate the individual effect of each predictor on
the dependent variable (Wooldridge, 2016). This can lead to inflated standard errors and
unreliable coefficient estimates. To assess this, we calculated the Variance Inflation Factor
(VIF) for each independent variable. Most VIF values, including those of our youth variables,
were close to 1 and below 5, indicating no evidence of problematic multicollinearity. Table 1
shows each model's mean VIF. Models 6 to 8 have higher values than the earlier regressions,
but this is expected in models with interaction terms, because an interaction term is by
construction correlated with the variables that form it. It therefore does not pose a
significant problem.
Lastly, the zero conditional mean assumption was assessed using the Ramsey RESET test to
detect potential omitted variable bias (Wooldridge, 2016). All but one of the models produced
p-values above the 5% level, indicating no significant evidence of OVB. Furthermore,
Regressions 2 and 3 include control variables that capture other relevant influences on GDP,
providing additional support for the validity of this assumption. We acknowledge that there
is some scope for reverse causality within the model. However, we have taken the precautions
of rigorously testing our regressions and adding different control variables to support
exogeneity to a reasonable degree.
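For reference, the RESET auxiliary regression can be written as below. This is the textbook form with squared and cubed fitted values; the source does not state which powers were used, and some software also adds the fourth power, so the exact specification is an assumption. The symbols $\hat{y}$, $\delta_1$ and $\delta_2$ are introduced here for illustration.

```latex
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
    + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + \varepsilon,
\qquad H_0: \delta_1 = \delta_2 = 0
```

A p-value above 5% means the added powers of the fitted values are not jointly significant. Strictly, this is evidence against functional-form misspecification (such as omitted nonlinear terms) rather than against every possible omitted variable.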
The regression models that failed some of the Gauss-Markov tests will be discussed
individually. Our main findings nevertheless remain robust, and we interpret every result
with caution.
Test                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8
Heteroskedasticity p-value (before robust)  0.2649   0.0094   0.0095   0.0201   0.2482   0.0005   0.0003   0.0015
Multicollinearity: mean VIF                 1.09     1.39     1.4      1.85     1.16     5.82     3.38     4.69
OVB (Ramsey test) p-value                   0.44     0.0952   0.1478   0.2873   0.5104   0.0849   0.0849   0.0224
Table 1: Results for each model testing for heteroskedasticity, multicollinearity (VIF) and omitted variable bias (Ramsey test).
Variable                          Model 1    Model 2    Model 3    Model 4 (High-income)  Model 5 (Non-high-income)  Model 6    Model 7    Model 8
Share of youth NEET               -.0818929  -.0419818  -.038372   -.016031               -.024102                   -.0252988  -.0228704  -.027552
Youth labour force participation  -.0116311  -.0062812  -.0070091  .0094555               -.016094                   -.0050974  -.0054242  -.0178698
Trade                             -          .0046598   .0042744   .0022147               .003433                    .0031597   .0026231   .002911
Urban population                  -          .0346108   .0352443   .0070711               .0317084                   .0281844   .0289353   .0269666
Obs.                              95         92         91         41                     50                         91         91         91
Table 2: Results for each model showing coefficient values, R² and number of observations.
P-values
Variable                          Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8
Share of youth NEET               0.000    0.000    0.000    0.219    0.034    0.005    0.014    0.006
Youth labour force participation  0.148    0.235    0.182    0.005    0.102    0.337    0.283    0.070
Regression 1:
Model 1 establishes the foundational relationship between youth-related variables and log
GDP per capita. The results indicate that the NEET variable is strongly and negatively
associated with economic development. A one percentage point increase in the share of
youth who are NEET is associated with a decrease of approximately 8.2% in GDP per capita, a
statistically significant effect (p < 0.001). This suggests that disengaged youth represent a
substantial economic cost, consistent with broader findings in the development literature.
Youth labour force participation is also negatively associated with ln GDP, though the
relationship is weaker. The coefficient implies that a one percentage point increase in
participation is associated with a 1.2% reduction in GDP per capita, marginally significant at
the 10% level (p = 0.087). This counterintuitive result may reflect the fact that youth
participation rates are often higher in lower-income economies due to weaker education
systems and informal labour markets, and thus may not directly reflect productive
employment. We explore this in a later regression model.
Government spending on education shows a positive but statistically insignificant
relationship (p = 0.191), so there is insufficient evidence to conclude that higher education
expenditure alone drives higher GDP per capita.
The model explains approximately 47.8% of the variation in log GDP (R² = 0.4783), which is
relatively strong for a baseline specification with only three predictors. This serves as a
useful starting point for the analysis, to be extended through the inclusion of control
variables in subsequent models.
Regression 2:
Model 2 builds upon the initial regression by incorporating macroeconomic control variables
that are known to influence GDP per capita: inflation, trade, and foreign direct investment
(FDI). These factors provide a broader context for understanding income differences across
countries. For example, FDI acts as an injection into the economy, meaning that countries
with greater inward investment may generate higher levels of income. By including such
control variables, this model helps isolate the independent effects of the youth-related
variables, making their estimated coefficients less prone to omitted variable bias. In other
words, FDI may now be capturing some of the variation previously attributed to the youth
variables.
The share of youth NEET remains a key negative determinant of economic development.
Although the coefficient roughly halves in magnitude relative to Model 1 (from -0.082 to
-0.042), it remains highly significant (p < 0.001), with a one percentage point increase in
NEET corresponding to a 4.2% decrease in GDP per capita.
Government expenditure on education now shows a negative coefficient (-0.0774) that is
marginally significant at the 10% significance level (p = 0.064). While unexpected, this
result may suggest inefficiencies in how education funds are allocated, or reflect a time lag
between spending and its returns to output.
Youth labour force participation, while still negatively signed, is no longer statistically
significant (p = 0.181), and its effect weakens relative to the baseline model. This indicates
that its earlier marginal significance may have been confounded by omitted macroeconomic
factors.
Overall, the addition of control variables improves the model's fit, with the R² rising to
0.772. This highlights the importance of control variables in explaining differences in GDP
per capita across countries.
Regression 3:
Model 3 adds inflation as an additional control variable and explains approximately 78% of
the variation in log GDP per capita (R² = 0.7786). This slightly improves the fit and changes
the estimates as follows. The estimated NEET effect weakens from -4.2% to -3.8%, though it
remains statistically significant (p < 0.001). The effect of government expenditure on
education becomes slightly stronger (-8.5%) and is now significant at the 5% level
(p = 0.036). The coefficient on youth labour force participation remains roughly the same (at
-0.007) and statistically insignificant (p = 0.182).
Including inflation as a control helps capture macroeconomic conditions that influence GDP,
and its coefficient is both positive and significant, suggesting a meaningful association with
log GDP per capita in this model.
To examine whether the relationship between youth-related variables and GDP per capita
differs by development level, we now split the sample into high-income countries (Model 4)
and non-high-income countries (Model 5). We do this using a high-income dummy variable,
where high-income = 1 and non-high-income = 0, which accounts for structural differences
between income groups across the sample. This gives a clearer picture of how these dynamics
vary across economic contexts, particularly since earlier results suggested that the effects
may depend on income group.
Government Expenditure on Education:
In non-high-income countries (Model 5), the coefficient is negative and statistically
significant (-0.120, p = 0.008), suggesting that higher education spending is
associated with lower GDP per capita. This could point to reverse causality, where
countries with lower income levels increase education spending in response to
underdevelopment. It may also reflect inefficiencies in education systems, or the
delayed returns to human capital investments.
In high-income countries (Model 4), however, the effect is much smaller and
statistically insignificant (-0.031, p = 0.541), consistent with the idea that diminishing
returns to education investment may exist in already-developed systems.
While Models 4 and 5 suggest that the impact of youth-related variables may differ by
income group, these regressions do not formally test whether those differences are
statistically significant. Models 6, 7, and 8 address this by including interaction terms
between the income group dummy and each of the three youth-related variables. These
models retain the full sample and enable direct tests of whether the effects of education
spending, youth disengagement, and labour force participation vary significantly between
high- and non-high-income countries. To clarify, in these models the coefficient on each
youth variable on its own is the slope when the dummy variable equals 0, which in our case
means non-high-income countries. The interaction coefficient is the difference between the
slope for high-income countries and the slope for non-high-income countries.
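The interpretation of the interaction coefficient can be written out explicitly. The notation below is assumed for illustration: $X_i$ is one of the youth variables, $D_i$ is the high-income dummy, and $\mathbf{c}_i$ collects the control variables.

```latex
\ln(\text{GDPpc}_i) = \beta_0 + \beta_1 X_i + \beta_2 D_i
    + \beta_3 (X_i \times D_i) + \mathbf{c}_i' \boldsymbol{\gamma} + u_i

\frac{\partial \ln(\text{GDPpc}_i)}{\partial X_i} =
\begin{cases}
\beta_1 & \text{if } D_i = 0 \text{ (non-high-income)} \\
\beta_1 + \beta_3 & \text{if } D_i = 1 \text{ (high-income)}
\end{cases}
```

Testing $H_0: \beta_3 = 0$ is therefore a direct test of whether the slope differs between the two income groups.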
Overall, these interaction models indicate that the effects of our youth variables are not
uniform across all countries, although the evidence varies by variable. While education
spending continues to show a negative and statistically significant association with GDP per
capita in non-high-income countries, there is no strong evidence that this effect differs
significantly in high-income settings. Similarly, although the NEET variable is consistently
negatively associated with GDP, the interaction model does not provide statistically
significant evidence that this relationship varies across income groups. However, labour
force participation shows a strong and statistically significant difference, with its
association with GDP significantly more positive in high-income countries.
Section 3: Analysis of regression results
Goodness of fit
F-tests
Model 3
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_1: \beta_1 \neq 0 \lor \beta_2 \neq 0 \lor \beta_3 \neq 0$
F-statistic: 7.07
p-value: 0.0003
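As a reminder of where the statistic comes from, the classical F statistic for $q$ exclusion restrictions can be written in R-squared form as below. The symbols ($R^2_{ur}$ for the unrestricted model, $R^2_{r}$ for the restricted model, $q$, $n$, $k$) are notation assumed here. The section does not report the degrees of freedom, and when robust standard errors are used the software would normally report a heteroskedasticity-robust Wald version rather than this exact formula.

```latex
F = \frac{(R^2_{ur} - R^2_{r}) / q}{(1 - R^2_{ur}) / (n - k - 1)}
```

Here $q = 3$, the number of coefficients set to zero under the null hypothesis. With a p-value of 0.0003, the null is rejected at the 1% level, so the three coefficients are jointly significant.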