Lectures Merged Econometrics
What is econometrics?
§ Typical goals of econometric analysis:
q Estimating relationships between economic variables
q Testing economic theories and hypotheses
q Forecasting economic variables
q Evaluating and implementing government and business policy

Steps in empirical economic analysis
§ Economic model (this step is often skipped)
q Might be micro or macro models
q Often use optimising behaviour and equilibrium modelling
q Establish relationships between economic variables
q Examples: demand equations, pricing equations, etc.
§ Econometric model
q Equation could have been postulated without economic modelling
q Example variable definition: avgsen = the average sentence length after conviction.
Econometric model of job training and worker productivity
§ The error term u captures unobserved determinants of the wage, e.g. innate ability, quality of education, etc.
§ Econometric analysis deals with the specification of the error term u.
§ Econometric models may be used for hypothesis testing;
q e.g. parameter β3 represents the effect of training on the wage.
q How large is this effect? Is it different from zero?

The structure of economic data
§ Econometric analysis requires data.
§ Different kinds of economic data sets:
q Cross-sectional data
q Time series data
q Pooled cross sections
q Panel/longitudinal data
§ Econometric methods depend on the nature of the data used.
q Use of inappropriate methods may lead to misleading results.

Cross-sectional data sets
§ Sample of individuals, households, organisations, cities, states, countries or other units of interest at a given point of time/in a given period.
§ Cross-sectional observations are more or less independent; for example, pure random sampling from a population.
§ Sometimes pure random sampling is violated; e.g. units refuse to respond in surveys, or sampling is characterised by clustering.
§ Cross-sectional data are widely used in economics and other social sciences.
Time series data on minimum wages and related variables
§ Example table columns: Year; Average minimum wage for the given year; Average coverage rate; Unemployment rate; Gross national product.

Pooled cross sections
§ Two or more cross sections are combined in one data set.
§ Cross sections are drawn independently of each other.
§ Pooled cross sections are often used to evaluate policy changes.
§ Examples:
q Evaluate the effect of education and age on the pattern of fertility among women between 2001 and 2018.
q A random sample of fertility for the year 2001.
q A new random sample of fertility for the year 2018.

Pooled cross sections on two years of fertility data
§ Example table columns: Year; Age of individual; Number of kids.
Panel or longitudinal data
§ The same cross-sectional units are followed over time.
§ Panel data:
q have a cross-sectional and a time series dimension
q can be used to account for time-invariant unobservables
q can be used to model lagged responses.
§ Example: two-year panel data on city crime statistics.

Graphing data
§ Graphs can be a preliminary analysis of data.
§ Types of graphs (a minimal plotting sketch follows below):
q Time series plots
q Bar charts
q Histograms
q Scatter plots

Graphing data: bar charts
§ Example: proportion of household disposable income spent on gambling for different fiscal years.
Source: The World Bank: World Development Indicators, licensed under Creative Commons Attribution 4.0 licence.

Graphing data: histograms
§ Show how frequently/infrequently certain values occur.
§ Useful visual summary of the properties of the variable.

Graphing data: scatter plots
§ Examine the relationship between two variables.
§ Can highlight outliers in the data.
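As a minimal plotting sketch (illustrative simulated numbers, not the World Bank series shown on the slide; assumes numpy and matplotlib are available), the four graph types could be produced like this:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical illustrative data, not the slide's data sets
years = np.arange(2010, 2021)
gambling_share = np.random.uniform(0.02, 0.04, size=years.size)  # share of disposable income
wage = np.random.lognormal(mean=3, sigma=0.5, size=500)
educ = np.random.randint(8, 21, size=500)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].plot(years, gambling_share)      # time series plot
axes[0, 0].set_title("Time series plot")
axes[0, 1].bar(years, gambling_share)       # bar chart
axes[0, 1].set_title("Bar chart")
axes[1, 0].hist(wage, bins=30)              # histogram: how frequently values occur
axes[1, 0].set_title("Histogram")
axes[1, 1].scatter(educ, wage, alpha=0.3)   # scatter plot: relationship between two variables
axes[1, 1].set_title("Scatter plot")
plt.tight_layout()
plt.show()
```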
Causality and the notion of ceteris paribus in econometric analysis
§ Definition of the causal effect of x on y:
q How does variable y change if variable x is changed but all other relevant factors are held constant?
§ Most economic questions are ceteris paribus questions.
§ It is important to define which causal effect you are interested in.
§ It is useful to describe how an experiment would have to be designed to infer the causal effect in question.

Causal effect of fertiliser on crop yield
§ By how much will production of soybeans increase if you increase the amount of fertiliser used?
§ Implicit assumption: all other factors influencing crop yield, e.g. quality of land, rainfall, presence of parasites etc., are held fixed.
§ Experiment:
q Choose several one-acre plots of land; randomly assign different amounts of fertiliser to the different plots; compare yields.
q The experiment works because the amount of fertiliser applied is unrelated to other factors influencing crop yields.

Measuring the return to education
§ If a person is chosen from the population and given another year of education, by how much will her or his wage increase?
§ Implicit assumption: all other factors that influence wages, e.g. experience, family background, intelligence etc., are held fixed.
§ Experiment:
q Choose a group of people; randomly assign different amounts of education to them (infeasible!); compare wage outcomes.
q The problem without random assignment is that the amount of education is related to other factors that influence wages (e.g. intelligence).
Populations, parameters and random sampling

Properties of estimators

Random variables and their probability distributions
§ Discrete random variables can only take on a finite number of values.
§ A Bernoulli random variable can only take on values of zero and one;
q e.g. it is common to label 1 as a ‘success’ and 0 as a ‘failure’.
§ A Bernoulli random variable is an example of a discrete random variable;
q e.g. a randomly selected customer showing up for their reservation, where X = 1 is the customer showing up:
P(X = 1) = θ
P(X = 0) = 1 – θ
q If θ = .75, then there is a 75% chance that the customer will show up.

Probability density function (pdf)
§ For a discrete random variable taking the values x1, …, xk with probabilities p1, …, pk: f(xj) = pj, j = 1, 2, …, k.
§ For any real number x, f(x) is the probability that the random variable X takes on the particular value x.
§ Given the pdf of the random variable, it is simple to compute the probability of any event.
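A small simulation sketch of this Bernoulli example (θ = .75 as on the slide; the code and variable names are illustrative):

```python
import numpy as np

theta = 0.75                                   # P(X = 1): probability a customer shows up
rng = np.random.default_rng(0)
x = rng.binomial(n=1, p=theta, size=100_000)   # draws of a Bernoulli(theta) random variable

print(x.mean())       # ~0.75: relative frequency of X = 1 approximates f(1) = theta
print(1 - x.mean())   # ~0.25: approximates f(0) = 1 - theta
```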
Continuous random variables

Variance
§ Negative covariance indicates that the two random variables move in opposite directions.
§ Var.3: For constants a and b,
var(aX + bY) = a² var(X) + b² var(Y) + 2ab cov(X, Y).
If X and Y are uncorrelated, so that cov(X, Y) = 0, then
var(X + Y) = var(X) + var(Y)
and
var(X − Y) = var(X) + var(Y).
§ Var.4: If {X1, …, Xn} are pairwise uncorrelated random variables and {ai : i = 1, …, n} are constants, then
var(a1X1 + ... + anXn) = a1² var(X1) + ... + an² var(Xn).

Chapter 4
THE SIMPLE REGRESSION MODEL
Learning objectives
§ The simple regression model
§ The ordinary least squares estimates
§ The algebraic properties of the fitted OLS regression
§ The implications of the OLS estimators
§ The key assumptions SLR.1 to SLR.5
§ The interpretation of OLS

Definition of the simple linear regression model
§ Explains variable y in terms of variable x:
y = β0 + β1 x + u
§ y: dependent variable; x: independent (explanatory) variable; u: error term; β0: intercept parameter; β1: slope parameter.

Definition of the simple linear regression model: examples
§ Example: house price and land size
q The slope measures the effect of land size on house price, holding all other factors fixed.
q The error term captures other factors, e.g. number of bathrooms, bedrooms etc.
§ Example: a simple wage equation
q The slope measures the change in hourly wage given another year of education, holding all other factors fixed.
q The error term captures other factors, e.g. labour force experience, tenure with current employer, work ethic, intelligence etc.
§ Under the zero conditional mean assumption, the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
§ In the wage example, the conditional mean independence assumption is unlikely to hold, because individuals with more education will also be more intelligent on average.
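Written out, the model and the zero conditional mean assumption behind that statement are:

```latex
y = \beta_0 + \beta_1 x + u, \qquad E(u \mid x) = 0
\quad\Longrightarrow\quad E(y \mid x) = \beta_0 + \beta_1 x .
```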
Deriving the ordinary least squares estimates
§ Fit as good as possible a regression line through the data points.
§ The deviations of the observations from the regression line are the residuals (fitted or predicted values vs. regression residuals).
§ Ordinary least squares minimises the sum of squared residuals.
§ Why not minimise the absolute values of the residuals?
§ The fitted regression line depends on the sample; the population regression line is unknown.

Examples of simple regression obtained using real data
§ CEO salary and return on equity
q Salary in $ thousands; return on equity of the CEO's firm.
q Fitted regression: salary regressed on return on equity.
q Causal interpretation?
§ Hourly wage and education
q Hourly wage in $; years of education.
q Fitted regression: wage regressed on years of education.
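A minimal numpy sketch of the OLS estimates that minimise the sum of squared residuals (simulated data and illustrative parameter values, not the CEO-salary or wage data sets above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(8, 20, n)                  # e.g. years of education
u = rng.normal(0, 2, n)                    # unobserved factors
y = 1.0 + 0.5 * x + u                      # population model with beta0 = 1, beta1 = 0.5

# Closed-form OLS estimates from the first-order conditions
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_fitted = beta0_hat + beta1_hat * x       # fitted (predicted) values
residuals = y - y_fitted                   # deviations from the regression line
ssr = np.sum(residuals ** 2)               # the quantity OLS minimises
print(beta0_hat, beta1_hat, ssr)
```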
Interpretation of the fitted wage regression
§ The wage increases by 8.3% for every additional year of education (= the return to education).

Assumption SLR.2 (Random sampling)
§ The data are a random sample drawn from the population.

Assumption SLR.4 (Zero conditional mean)
§ The error has an expected value of zero given any value of the explanatory variable.

Variances of the OLS estimators
§ Example of a violation of homoscedasticity: the variance of the unobserved determinants of wages increases with the level of education.
§ Conclusion: the sampling variability of the estimated regression coefficients will be higher, the larger the variability of the unobserved factors, and lower, the higher the variation in the explanatory variable.

Estimating the error variance
§ One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately this estimate would be biased.
§ An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients.
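In formula form, the unbiased estimator of the error variance in the simple regression model is

```latex
\hat{\sigma}^2 \;=\; \frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^{\,2} \;=\; \frac{SSR}{n-2},
```

where dividing by n − 2 (observations minus the two estimated coefficients) rather than by n removes the bias.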
Theorem 4.3 (Unbiasedness of the error variance estimator)

Chapter 5
Learning objectives
§ Motivation for multiple regression

Motivation for multiple regression
§ Example: wage equation with education and experience. The zero conditional mean assumption now implies that other factors that affect wage are not related to educ and exper.
§ Example: family consumption and income. The model has two explanatory variables, income and income squared, so consumption is explained as a quadratic function of income.
q One has to be very careful when interpreting the coefficients: by how much does consumption increase if income is increased by one unit? (See the worked derivative below.)
q The zero conditional mean assumption is E(u | inc, inc²) = E(u | inc) = 0; when applied to the quadratic consumption function, the ceteris paribus interpretation has a different meaning.
§ Example: CEO salary. A log-log specification assumes a constant elasticity relationship between CEO salary and the sales of his or her firm, and a squared term assumes a quadratic relationship between CEO salary and his or her tenure with the firm.

Meaning of ‘linear’ regression
§ The model is linear in the parameters; the explanatory variables may enter nonlinearly (e.g. as squares or logs).

Interpretation of the multiple regression model: EGM expenditure example
§ Holding EGM numbers fixed: if an individual is unemployed, we predict that EGM expenditure will increase by $56.98.
§ Holding unemployment fixed: every additional EGM per 1000 adults in a particular area increases expenditure by $44.87.
§ Ceteris paribus interpretation: it still has to be assumed that unobserved factors do not change if the explanatory variables are changed.
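For the quadratic consumption function mentioned above, the answer depends on the level of income; differentiating the systematic part gives

```latex
cons = \beta_0 + \beta_1\, inc + \beta_2\, inc^2 + u
\quad\Longrightarrow\quad
\frac{\Delta cons}{\Delta inc} \approx \beta_1 + 2\beta_2\, inc,
```

so the effect of one more unit of income is not a single number but varies with the income level.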
Mechanics and interpretation of ordinary least squares
Example: hourly wage equation
§ Fitted model: log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
q wage: hourly wage; educ: years of education; exper: years of labour force experience; tenure: years with current employer.
§ Interpretation (each effect holds the other regressors fixed):
q Holding exper and tenure fixed: another year of education is predicted to increase the wage by 9.2%.
q Holding educ and tenure fixed: another year of experience is predicted to increase the wage by 0.41%.
q Holding educ and exper fixed: another year of tenure is predicted to increase the wage by 2.2%.

Changing more than one independent variable simultaneously
§ An individual stays at the same firm for another year: exper and tenure both increase by one year.
§ The total effect (holding educ fixed) is:
∆log(wage) = .0041 ∆exper + .022 ∆tenure = .0041 + .022 = .0261
§ Interpretation: since exper and tenure each increase by one year, holding educ fixed, the estimated effect on the wage when an individual stays at the same firm is about 2.6%.

Properties of OLS on any sample of data
§ Fitted (predicted) values and residuals.
§ The sample average of the residuals is zero.
§ The sample covariance between each independent variable and the OLS residuals is zero.
§ The point of sample means is always on the OLS regression line.
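A minimal numpy sketch (simulated data, illustrative coefficient values) that verifies the three algebraic properties listed above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
educ = rng.uniform(8, 20, n)
exper = rng.uniform(0, 30, n)
tenure = rng.uniform(0, 20, n)
u = rng.normal(0, 0.4, n)
logwage = 0.3 + 0.09 * educ + 0.004 * exper + 0.02 * tenure + u

X = np.column_stack([np.ones(n), educ, exper, tenure])   # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, logwage, rcond=None)   # OLS coefficients

fitted = X @ beta_hat
resid = logwage - fitted

print(resid.mean())                                          # ~0: residuals average to zero
print([np.cov(X[:, j], resid)[0, 1] for j in range(1, 4)])   # ~0: zero sample covariance with each regressor
print(np.allclose(X.mean(axis=0) @ beta_hat, logwage.mean()))  # point of means lies on the regression line
```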
Sampling variances of the OLS slope estimators
§ Even though OLS is unbiased, in a given sample the estimates may still be far away from the true values; their sampling variance depends on the variance of the unobserved factors.
§ Example: wage equation. The homoscedasticity assumption may also be hard to justify in many cases.
§ The sampling variance of a slope estimator is
var(β̂j) = σ² / [SSTj (1 − Rj²)],
where SSTj is the total sample variation in explanatory variable xj and Rj² is the R-squared from a regression of xj on all other independent variables (including a constant).

Components of OLS variances
1. The error variance
q A high error variance increases the sampling variance because there is more ‘noise’ in the equation.
q A large error variance necessarily makes estimates imprecise.
q The error variance does not decrease with sample size.
2. The total sample variation in the explanatory variable
q More sample variation leads to more precise estimates.
q Total sample variation automatically increases with the sample size.
q Increasing the sample size is thus a way to get more precise estimates.
3. Linear relationships among the independent variables
q Regress xj on all other independent variables (including a constant).
q The R-squared of this regression will be higher, the better xj can be linearly explained by the other independent variables.
q The sampling variance of the slope estimator for xj will be higher when xj can be better explained by the other independent variables.
q Under perfect multicollinearity, the variance of the slope estimator will approach infinity.

Estimating the error variance
§ An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients.
§ The number of observations minus the number of estimated parameters is also called the degrees of freedom.
§ The n estimated squared residuals in the sum are not completely independent, but are related through the k+1 equations that define the first-order conditions of the minimisation problem.
§ Theorem 5.3 (Unbiased estimator of the error variance)
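A numpy sketch of the pieces just described, computed on simulated data: the unbiased error variance σ̂² = SSR/(n − k − 1), the total sample variation SSTj, the auxiliary-regression Rj², and the resulting estimate of var(β̂j):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 400, 3
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # x2 partly explained by x1 (some multicollinearity)
x3 = rng.normal(size=n)
y = 1 + 0.5 * x1 - 0.3 * x2 + 0.2 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma2_hat = resid @ resid / (n - k - 1)        # unbiased error variance, df = n - k - 1

j = 1                                           # look at the slope estimator for x1
xj = X[:, j]
X_others = np.delete(X, j, axis=1)              # all other regressors, including the constant
g, *_ = np.linalg.lstsq(X_others, xj, rcond=None)
xj_resid = xj - X_others @ g
SST_j = np.sum((xj - xj.mean()) ** 2)           # total sample variation in x_j
R2_j = 1 - (xj_resid @ xj_resid) / SST_j        # R-squared of the auxiliary regression

var_bj = sigma2_hat / (SST_j * (1 - R2_j))      # estimated var(beta_j_hat)
print(sigma2_hat, R2_j, var_bj ** 0.5)          # last value is the standard error of beta_j_hat
```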
Multiple regression analysis: estimation
Efficiency of OLS: the Gauss-Markov theorem
§ Under assumptions MLR.1–MLR.5, OLS is unbiased.
§ However, under these assumptions there may be many other estimators that are unbiased.
§ Which one is the unbiased estimator with the smallest variance?
§ In order to answer this question, limit yourself to linear estimators, i.e. estimators linear in the dependent variable.
q The weights may be an arbitrary function of the sample values of all the explanatory variables; the OLS estimator can be shown to be of this form.
§ Theorem 5.4: Gauss-Markov theorem (under MLR.1–MLR.5, OLS is the best linear unbiased estimator).
§ Theorem 5.5: Consistency of OLS.

Several scenarios for applying multiple regression
§ Prediction
q The best prediction of y will be its conditional expectation.
§ Efficient markets
q Efficient markets theory states that a single variable w acts as a sufficient statistic for predicting y; once we know this sufficient statistic, additional information is not useful in predicting y.
q If E(y | w, x1, ..., xk) = E(y | w), then w is a sufficient statistic.
§ The t distribution is well approximated by the standard normal distribution if n − k − 1 is large.
Testing against one-sided alternatives (greater than zero)
§ Test H0 : βj = 0 against H1 : βj > 0.
§ Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is too large (i.e. its t statistic is larger than a critical value).
§ Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases.
§ In the given example, this is the point of the t distribution with 28 degrees of freedom that is exceeded in 5% of the cases: reject if the t statistic is greater than 1.701.
§ In larger samples, the standard normal approximation to the t distribution applies.

Example: wage equation
§ Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages.
§ One would expect either a positive effect of experience on hourly wage or no effect at all: test H0 : βexper = 0 against H1 : βexper > 0.
§ The t statistic is the estimated coefficient divided by its standard error.
§ Compare it with the critical values for the 5% and the 1% significance levels (these are conventional significance levels).
§ The null hypothesis is rejected because the t statistic exceeds the critical value: the effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.
§ Another one-sided example: one would expect a higher incidence of super wealth with greater economic freedom.
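A minimal sketch of the one-sided test logic above, using scipy for the t critical values (the coefficient and standard error are placeholders, not the slide's regression output; df = 28 as in the example):

```python
from scipy import stats

beta_hat = 0.0041        # hypothetical estimated coefficient on exper
se = 0.0017              # hypothetical standard error
df = 28                  # degrees of freedom n - k - 1, as in the slide's example

t_stat = (beta_hat - 0) / se                 # test H0: beta = 0 against H1: beta > 0
crit_5 = stats.t.ppf(0.95, df)               # one-sided 5% critical value (= 1.701 for df = 28)
crit_1 = stats.t.ppf(0.99, df)               # one-sided 1% critical value

print(t_stat, crit_5, crit_1)
print("reject at 5%" if t_stat > crit_5 else "do not reject at 5%")
```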
Testing against two-sided alternatives
§ Test H0 : βj = 0 against H1 : βj ≠ 0; reject if the absolute value of the t statistic exceeds the two-sided critical value.

Testing more general hypotheses about a regression coefficient
§ Null hypothesis: H0 : βj = aj, where aj is the hypothesised value of the coefficient.
§ The t statistic is the estimate minus the hypothesised value, divided by the standard error.
§ This method always works for single linear hypotheses.
§ Example: hourly wage equation – test whether the return to education is equal to 12%.
q The estimate differs from the hypothesised value, but is this difference statistically significant?
q The hypothesis is rejected at the 1% level.
§ Example: French-made cars.

Computing p-values for t tests
§ If the significance level is made smaller and smaller, there will be a point at which the null hypothesis cannot be rejected any more.
§ The reason is that, by lowering the significance level, you are increasingly less likely to make the error of rejecting a correct H0.
§ The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test.
§ A small p-value is evidence against the null hypothesis, because you would reject the null hypothesis even at small significance levels.
§ A large p-value is evidence in favour of the null hypothesis.
§ p-values are more informative than tests at fixed significance levels.
§ Example (n = 477; R² = .633), with explanatory variables: years under communist rule; the level of public social expenditures relative to GDP; security and enforcement of private property.
q t = –.0817/.0858 = –0.952; p-value = .342; 95% confidence interval: –.0817 ± .1682.
q The hypothesis cannot be rejected at the 10% (or any smaller) significance level.
q None of these variables is statistically significant when tested individually.
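A sketch of how the p-value and confidence interval in the example are computed, using the reported t ratio –.0817/.0858 and n = 477; the number of regressors (k = 3, giving df = 473) is an assumption based on the three variables listed:

```python
from scipy import stats

b, se = -0.0817, 0.0858
n, k = 477, 3                     # k = 3 regressors is an assumption, not stated on the slide
df = n - k - 1

t_stat = b / se                                   # = -0.952
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))  # two-sided p-value, ~0.34
ci = (b - 1.96 * se, b + 1.96 * se)               # ~95% CI, +/- 0.168 with the normal approximation

print(t_stat, p_value, ci)
```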
Testing exclusion restrictions (F test)
§ When none of the variables is statistically significant individually, the likely reason is multicollinearity between them.
§ Idea: how would the model fit be if these variables were dropped from the regression?
§ Example: test whether performance measures have no effect/can be excluded from the regression.
§ The relative increase in the sum of squared residuals when going from H1 to H0 follows an F distribution (if the null hypothesis H0 is correct).
The R-squared form of the F statistic
§ Used for testing exclusion restrictions.
§ Cannot be used for testing general linear restrictions.

Example: parents’ education in a birthweight equation
§ Test whether the parents’ education variables can be excluded.
§ The numerator df is 2, and the denominator df is 1185.

Computing p-values for F tests
§ Let F denote an F random variable with (q, n – k – 1) degrees of freedom.
§ A small p-value is evidence against H0.
§ Once the p-value has been computed, the F test can be carried out at any chosen significance level.
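A sketch of the R-squared form of the F statistic for q exclusion restrictions, using the birthweight example's degrees of freedom (q = 2, denominator df = 1185); the two R-squared values are placeholders, not the slide's estimates:

```python
from scipy import stats

R2_ur, R2_r = 0.0387, 0.0364   # placeholder R-squareds: unrestricted vs. restricted model
q, df_denom = 2, 1185          # numerator and denominator degrees of freedom from the example

F = ((R2_ur - R2_r) / q) / ((1 - R2_ur) / df_denom)
p_value = 1 - stats.f.cdf(F, q, df_denom)   # a small p-value is evidence against H0

print(F, p_value)
```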
The F statistic for overall significance of a regression
§ Tests the null hypothesis that all slope coefficients are zero.

Testing general linear restrictions: example
§ Estimate the unrestricted model, then impose the restrictions to obtain the restricted model.
§ Unrestricted model: the equation with all coefficients left free.
§ Restricted model: impose the restriction that the coefficient on x1 is unity, and estimate the restricted equation (with y – x1 as the dependent variable).
Models with quadratics: turnaround points
§ Example: wage and experience (quadratic in experience).
q Does this mean the return to experience becomes negative after 24.4 years?
q Not necessarily. It depends on how many observations in the sample lie to the right of the turnaround point.
q In the given example, these are about 28% of the observations; there may be a specification problem (e.g. omitted variables).
§ Example: house prices and number of rooms (quadratic in rooms, controlling for the student–teacher ratio).
q Increase rooms from 5 to 6, or from 6 to 7: the predicted effect of an extra room differs with the starting level.
q Does this mean that, at a low number of rooms, more rooms are associated with lower prices?
q This area can be ignored, as it concerns only 1% of the observations.
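The turnaround points quoted above come from setting the fitted marginal effect of a quadratic specification to zero:

```latex
\widehat{y} = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2
\quad\Longrightarrow\quad
x^{*} = \left|\frac{\hat\beta_1}{2\hat\beta_2}\right|,
```

so with a positive coefficient on exper and a negative coefficient on exper², the fitted return to experience turns negative beyond x* (24.4 years in the example).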
Omitted variable bias: the simple case and summarising the direction of the bias

Omitted variable bias: more general cases
▪ Given a model with three explanatory variables, suppose another model is estimated that omits x3.
❑ Suppose x2 and x3 are not correlated, but x1 and x3 are.
❑ The estimators for x1 and x2 will then generally both be biased, unless x1 and x2 are uncorrelated.
❑ When x1 and x2 are uncorrelated, the direction of the bias can be summarised as in the simple case.

Variances in misspecified models

Functional form misspecification
▪ Not accounting for the true relationship between the dependent and the observed explanatory variables;
▪ for example, estimating the model but omitting exper².

RESET as a general test for functional form misspecification
▪ Regression Specification Error Test (RESET).
▪ The null hypothesis of RESET is that there is no misspecification in the model.
▪ In the example, there is no evidence of functional form misspecification.
▪ A significant t statistic is a rejection of the linear model; in the example, the log-log model is preferred.
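A minimal numpy/scipy sketch of the RESET idea: re-estimate the model with powers of the fitted values added and F-test their joint significance (simulated data and variable names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 300
educ = rng.uniform(8, 20, n)
exper = rng.uniform(0, 30, n)
logwage = 0.3 + 0.08 * educ + 0.03 * exper - 0.0005 * exper**2 + rng.normal(0, 0.3, n)

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

# Estimated model: linear in educ and exper (deliberately omits exper^2)
X = np.column_stack([np.ones(n), educ, exper])
b, resid = ols(X, logwage)
yhat = X @ b
ssr_r = resid @ resid

# RESET: add powers of the fitted values and test their joint significance
X_reset = np.column_stack([X, yhat**2, yhat**3])
_, resid_ur = ols(X_reset, logwage)
ssr_ur = resid_ur @ resid_ur

q = 2                                  # two added terms: yhat^2 and yhat^3
df_denom = n - X_reset.shape[1]
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)
p_value = 1 - stats.f.cdf(F, q, df_denom)
print(F, p_value)                      # small p-value: evidence of functional form misspecification
```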
Multicollinearity
▪ Linear relationships between explanatory variables can create problems.
▪ High multicollinearity can occur when Rj² is ‘close’ to 1.
▪ It is best to have less correlation between xj and the other independent variables.
▪ Example: examining the effect of various school expenditure categories on school performance.
❑ It is expected that wealthier schools will spend more on everything than less wealthy schools.
❑ It can be difficult to estimate the effect of any one category on student performance when there is little independent variation in that category.
▪ We can include control variables to isolate causal effects.

Chapter 8
MULTIPLE REGRESSION ANALYSIS WITH QUALITATIVE INFORMATION: BINARY (OR DUMMY) VARIABLES
Learning objectives
▪ Using dummy variables for multiple categories
▪ Interactions involving dummy variables
▪ A binary dependent variable: the linear probability model
▪ Interpreting regression results with discrete dependent variables

Describing qualitative information
▪ A way to incorporate qualitative information is to use dummy variables (i.e. binary variables or zero–one variables).
❑ Example: female is a dummy variable, which takes the value of 1 if the student is female, and zero otherwise.
❑ They may appear as the dependent or as independent variables.

A single dummy independent variable
▪ Alternative interpretation of the coefficient: an intercept shift.
▪ Base group or benchmark group: when using dummy variables, one category always has to be omitted.
❑ In the estimated wage equation with an intercept shift, the base category is men.
▪ Example: effects of accessing online subject materials on assignment mark.
❑ Dependent variable: assignment mark; dummy indicating whether the student reviewed the online material.

Using a dummy independent variable to account for outliers
▪ Outliers can occur when sampling from a population; they may be different in some relevant aspect from the rest of the population.
▪ Example: did countries that adopted large fiscal packages outperform those that didn’t?
❑ It was found that a positive and significant relationship existed between fiscal stimulus and performance.
❑ Create a dummy variable ‘DUM’ for the observations that are Greece, Hungary, Iceland and Ireland (the outlying observations, circled in red on the slide’s figure).

Example: housing price regression
▪ Includes a dummy indicating whether the house is of colonial style.

Ordinal information: credit ratings
▪ Including the credit rating directly would probably not be appropriate, as it only contains ordinal information.
▪ A better way to incorporate this information is to define dummies indicating whether the particular rating applies; e.g. CR1 = 1 if CR = 1, and CR1 = 0 otherwise.
▪ All effects are measured in comparison to the worst rating (= the base category).

Further examples
▪ We can examine the wage difference between single and married females and males respectively, and test the null hypothesis that the gender differential does not depend on marital status.
▪ Holding other things fixed, the proportion of Fords registered in Victoria is 15.4% higher than in WA (= the base category).
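A minimal sketch of the two dummy-coding ideas above — an intercept shift from a single 0/1 regressor, and a set of category dummies built from an ordinal rating with the worst category omitted (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
female = rng.integers(0, 2, n)                 # 1 if female, 0 otherwise (base group: men)
educ = rng.uniform(8, 20, n)
logwage = 0.4 + 0.08 * educ - 0.25 * female + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), educ, female])
b, *_ = np.linalg.lstsq(X, logwage, rcond=None)
print(b[2])   # intercept shift: average log-wage difference between women and the base group

# Ordinal information: turn a rating CR in {1,...,4} into dummies, omitting the worst rating (= 1)
CR = rng.integers(1, 5, n)
CR_dummies = np.column_stack([(CR == r).astype(float) for r in (2, 3, 4)])  # base category: CR = 1
```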
Allowing for different slopes
▪ Interaction terms between a dummy and another explanatory variable allow the slope, not just the intercept, to differ across groups.
▪ Example: log hourly wage equation (e.g. interacting a gender dummy with education).

Testing for differences in regression functions across groups
▪ Unrestricted model: contains the full set of interactions.
▪ Null hypothesis: all interaction effects are zero, i.e. the same regression coefficients apply to men and women.
▪ Estimate the unrestricted model and compare it with the restricted model using an F test (see the sketch below).
▪ Example: assignment mark regressed on the student’s mark in the test and a dummy (= 1 if the student viewed the online material, = 0 otherwise).
▪ The base case is European, as the dummy variable for ethnicity is then 0.

A binary dependent variable: the linear probability model
▪ The linear probability model is necessarily heteroscedastic (the variance of a Bernoulli variable depends on its mean).
❑ Heteroscedasticity-consistent standard errors need to be computed.
▪ Predicted probabilities can fall outside [0, 1]; in the example there is a negative predicted probability if income < $6200, although the sample has income ranging from $5000 to $50 000.
▪ Advantages:
❑ Easy estimation and interpretation.
❑ Estimated effects and predictions are often reasonably good in practice.

Program evaluation
▪ Example: scrap rate and training grants.
❑ Treatment group: grant receivers; control group: firms that received no grant.
❑ There is no apparent effect of the grant on productivity.
❑ Grants were given on a first-come, first-served basis, which is not the same as giving them out randomly: firms with less-productive workers may have seen an opportunity to improve productivity and applied first.
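A sketch of the group-difference test described above: interact a group dummy with the other regressor(s) and F-test whether the intercept-shift and interaction effects are jointly zero (simulated data; the female/educ names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 500
female = rng.integers(0, 2, n)
educ = rng.uniform(8, 20, n)
logwage = 0.4 + 0.08 * educ - 0.2 * female - 0.01 * female * educ + rng.normal(0, 0.3, n)

def ssr(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

X_r = np.column_stack([np.ones(n), educ])                           # restricted: same line for both groups
X_ur = np.column_stack([np.ones(n), educ, female, female * educ])   # unrestricted: full set of interactions

q = 2                                    # restrictions: coefficients on female and female*educ are zero
df_denom = n - X_ur.shape[1]
F = ((ssr(X_r, logwage) - ssr(X_ur, logwage)) / q) / (ssr(X_ur, logwage) / df_denom)
print(F, 1 - stats.f.cdf(F, q, df_denom))
```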
Addressing the problem of self-selection
▪ Consider the simple regression of the outcome on the treatment indicator w.
▪ To illustrate the issues, use the following example: children eligible for a program like Head Start participate based on parental decisions. We thus need to control for things like family background and structure to get closer to random assignment into the treatment (participates in Head Start) and control (does not participate) groups.
❑ A more convincing case is to include covariates x1 through xk.
▪ Now we assume that w is independent of [y(0), y(1)] conditional upon x1 through xk.
❑ This is known as regression adjustment and allows us to adjust for differences across units in estimating the causal effect of the treatment.

Interpreting regression results with discrete dependent variables
▪ If we take the coefficient of educ literally, every additional year of education reduces the estimated number of children by 0.09.
▪ We have to interpret/summarise this information differently: if each woman in a group of 100 obtains an additional year of education, there will be nine fewer children among them.
Learning objectives
▪ Consequences of heteroscedasticity for OLS
▪ Heteroscedasticity-robust inference after OLS estimation
▪ Testing for heteroscedasticity
▪ The linear probability model revisited

Consequences of heteroscedasticity for OLS
▪ OLS is still unbiased and consistent under heteroscedasticity.
▪ The interpretation of R-squared is also not changed: the unconditional error variance is unaffected by heteroscedasticity (which refers to the conditional error variance).
▪ Under heteroscedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there may be more efficient linear estimators.

Heteroscedasticity-robust inference after OLS estimation
▪ Formulas for OLS standard errors and related statistics have been developed that are robust to heteroscedasticity of unknown form.
▪ All of these formulas are only valid in large samples.
▪ The heteroscedasticity-robust OLS standard errors are also called White/Eicker standard errors.
▪ Using these formulas, the usual t test is valid asymptotically.
▪ The usual F statistic does not work under heteroscedasticity, but heteroscedasticity-robust versions are available in most software.
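A numpy sketch of White/Eicker (HC0-type) robust standard errors of the kind described above, compared with the usual OLS standard errors, on simulated heteroscedastic data:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(0, 0.5 + 0.3 * x, n)          # error variance grows with x (heteroscedasticity)
y = 1 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b

k = X.shape[1]
sigma2_hat = resid @ resid / (n - k)
se_usual = np.sqrt(np.diag(sigma2_hat * XtX_inv))      # usual (non-robust) standard errors

meat = X.T @ (X * (resid**2)[:, None])                 # X' diag(u_hat^2) X
V_robust = XtX_inv @ meat @ XtX_inv                    # sandwich estimator
se_robust = np.sqrt(np.diag(V_robust))                 # White/Eicker robust standard errors

print(se_usual, se_robust)
```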
Multiple regression analysis: heteroscedasticity
Testing for heteroscedasticity
▪ It may still be interesting to know whether there is heteroscedasticity, because then OLS may not be the most efficient estimator any more.
▪ Heteroscedasticity-robust standard errors may be larger or smaller than their non-robust counterparts; the differences are often small in practice.
▪ Example: Fords – total registrations.

Breusch-Pagan test for heteroscedasticity
▪ Under MLR.4, the null hypothesis of homoscedasticity can be tested as follows: regress the squared residuals on all explanatory variables and test whether this regression has explanatory power.
▪ A large test statistic (= a high R-squared) is evidence against the null hypothesis.

The White test for heteroscedasticity
▪ Adds squares and cross-products of the regressors to the auxiliary regression.
▪ Alternative form of the White test: use the fitted values and their squares instead.

Example: heteroscedasticity in housing price equations
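A sketch of the Breusch-Pagan test as described above, using the LM form n·R² from the auxiliary regression of the squared residuals on the explanatory variables (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 500
x1 = rng.uniform(0, 10, n)
x2 = rng.normal(size=n)
u = rng.normal(0, 0.5 + 0.3 * x1, n)          # error variance depends on x1
y = 1 + 0.5 * x1 - 0.2 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
usq = (y - X @ b) ** 2                        # squared OLS residuals

# Auxiliary regression of u_hat^2 on all explanatory variables
g, *_ = np.linalg.lstsq(X, usq, rcond=None)
fitted = X @ g
R2_aux = 1 - np.sum((usq - fitted) ** 2) / np.sum((usq - usq.mean()) ** 2)

LM = n * R2_aux                               # Breusch-Pagan LM statistic
p_value = 1 - stats.chi2.cdf(LM, df=2)        # df = number of regressors in the auxiliary regression
print(LM, p_value)                            # small p-value: reject homoscedasticity
```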
Weighted least squares (WLS): discussion
▪ In the housing price example, homoscedasticity is rejected.
▪ In the smoking-restrictions-in-restaurants example, the income elasticity is now statistically significant under WLS; other coefficients are also more precisely estimated (without changing the qualitative results).
▪ If OLS and WLS produce very different estimates, this typically indicates that some other assumptions (e.g. MLR.4') are wrong.
▪ If there is strong heteroscedasticity, it is still often better to use a wrong form of heteroscedasticity in order to increase efficiency.

The linear probability model revisited
▪ WLS in the linear probability model (see the sketch below).
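A sketch of WLS in the linear probability model: estimate by OLS, form the Bernoulli variance ĥ = ŷ(1 − ŷ), and reweight by 1/√ĥ; restricting attention to fitted values strictly inside (0, 1) is one common, assumed fix for boundary cases (simulated data):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1000
income = rng.uniform(5, 50, n)                       # in $ thousands
p = np.clip(0.1 + 0.015 * income, 0, 1)
owns = rng.binomial(1, p)                            # binary dependent variable

X = np.column_stack([np.ones(n), income])
b_ols, *_ = np.linalg.lstsq(X, owns, rcond=None)
yhat = X @ b_ols                                     # LPM fitted probabilities (may leave [0, 1])

inside = (yhat > 0.01) & (yhat < 0.99)               # keep fitted values strictly inside (0, 1)
h = yhat[inside] * (1 - yhat[inside])                # Var(y|x) for a Bernoulli variable
w = 1 / np.sqrt(h)

# WLS = OLS on the reweighted data
b_wls, *_ = np.linalg.lstsq(X[inside] * w[:, None], owns[inside] * w, rcond=None)
print(b_ols, b_wls)
```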