Lectures Merged Econometrics

Econometrics uses statistical methods to analyze economic data. It typically analyzes non-experimental observational data. The goals of econometric analysis include estimating relationships between economic variables, testing economic theories, forecasting variables, and evaluating policy. Econometric models specify functional forms and approximate variables to represent economic relationships based on data. Different types of economic data include cross-sectional, time series, pooled cross-sections, and panel data. The structure of the data impacts the appropriate econometric methods.


Chapter 1: The nature of econometrics and economic data

Learning objectives
§ What is econometrics?
§ Steps in empirical economic analysis
§ The structure of economic data
§ Graphing data
§ The notions of ceteris paribus and causal inference

What is econometrics?
§ Econometrics: the use of statistical methods to analyse economic data.
§ Econometricians typically analyse non-experimental (observational) data.

Typical goals of econometric analysis
§ Estimating relationships between economic variables
§ Testing economic theories and hypotheses
§ Forecasting economic variables
§ Evaluating and implementing government and business policy

Steps in empirical economic analysis
§ Economic model (this step is often skipped)
§ Econometric model

Economic models
§ Might be micro or macro models
§ Often use optimising behaviour and equilibrium modelling
§ Establish relationships between economic variables
§ Examples: demand equations, pricing equations, etc.

Economic model of crime (Becker [1968])
§ Derives an equation for criminal activity based on utility maximisation.
§ These factors are the most important ones, but could there be other factors?
§ The functional form of the relationship is not specified.
§ The equation could have been postulated without economic modelling.

Model of job training and worker productivity
§ What is the effect of additional training on worker productivity?
§ Formal economic theory is not really needed to derive the equation.

Econometric model of criminal activity
§ The functional form has to be specified.
§ Variables may have to be approximated by other quantities:
q crime = some measure of the frequency of criminal activity
q wagem = the wage that can be earned in legal employment
q othinc = the income from other sources (assets, inheritance and so on)
q freqar = the frequency of arrests from prior infractions (to approximate the probability of arrest)
q freqconv = the frequency of conviction
q avgsen = the average sentence length after conviction
q age = age of individual
§ Unobserved determinants of criminal activity (e.g. moral character, wage in criminal activity, family background, etc.) are captured by the error term.
Econometric model of job training and worker productivity
§ Unobserved determinants of the wage (e.g. innate ability, quality of education, etc.) enter the error term.
§ Econometric analysis deals with the specification of the error.
§ Econometric models may be used for hypothesis testing; e.g. a parameter represents the effect of training on the wage. How large is this effect? Is it different from zero?

The structure of economic data
§ Econometric analysis requires data.
§ Different kinds of economic data sets:
q Cross-sectional data
q Time series data
q Pooled cross sections
q Panel/longitudinal data
§ Econometric methods depend on the nature of the data used.
q Use of inappropriate methods may lead to misleading results.

Cross-sectional data sets
§ Sample of individuals, households, organisations, cities, states, countries or other units of interest at a given point of time/in a given period.
§ Cross-sectional observations are more or less independent; for example, pure random sampling from a population.
§ Sometimes pure random sampling is violated; e.g. units refuse to respond in surveys, or sampling is characterised by clustering.
§ Cross-sectional data are widely used in economics and other social sciences.

Example: a cross-sectional data set on wages and other characteristics (observation number, hourly wage, indicator variables (1=yes, 0=no)).
Example: cross-sectional data on growth rates and country characteristics (growth rate of real per capita GDP, government consumption as % of GDP, adult secondary education rates).

Time series data
§ Observations of a variable or several variables over time; for example, stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, car sales.
§ Time series observations are typically serially correlated.
§ Ordering of observations conveys important information.
§ Data frequency: daily, weekly, monthly, quarterly, annually, etc.
§ Typical features of time series: trends and seasonality.
§ Typical applications: applied macroeconomics and finance.

Example: time series data on minimum wages and related variables (average minimum wage for a given year, average coverage rate, unemployment rate, gross national product).

Pooled cross sections
§ Two or more cross sections are combined in one data set.
§ Cross sections are drawn independently of each other.
§ Pooled cross sections are often used to evaluate policy changes.
§ Example: evaluate the effect of education and age on the pattern of fertility among women between 2001 and 2018:
q a random sample of fertility data for the year 2001
q a new random sample of fertility data for the year 2018
q variables include the year, the age of the individual and the number of kids.
Panel or longitudinal data
§ The same cross-sectional units are followed over time.
§ Panel data:
q have a cross-sectional and a time series dimension
q can be used to account for time-invariant unobservables
q can be used to model lagged responses.
§ Example: two-year panel data on city crime statistics, where each city is observed in two years (number of police in 1986 and number of police in 1990); each city has two time series observations.
q Time-invariant unobserved city characteristics may be modelled.
q The effect of police on crime rates may exhibit a time lag.

Graphing data
§ Graphs can be a preliminary analysis of data.
§ Types of graphs:
q Time series plots
q Bar charts
q Histograms
q Scatter plots

Graphing data: time series plots
§ Observations are at equally spaced time intervals.
§ Useful for examining a single time series or a group of time series.
§ Can help display different and important characteristics, such as outliers and trend/cyclical behaviour (e.g. a series that appears to have a regular peak and trough).
§ Seasonality is cyclical behaviour that occurs on a regular calendar basis; e.g. high ice cream sales in summer, low unemployment at ski resorts in winter.
§ Can combine different data sets together to compare data visually.

Graphing data: bar charts
§ Represents the value of an observation in a single series. (Figure 1.3: Working days lost due to industrial disputes in Australia: bar chart.)
§ Example: proportion of household disposable income spent on gambling for different fiscal years.

Graphing data: histograms
§ Shows how frequently/infrequently certain values occur.
§ Useful visual summary of the properties of the variable.
§ Illustrates the range of the data and any outliers.

Graphing data: scatter plots
§ Examines the relationship between two variables.
§ Can highlight outliers in the data.

Source: The World Bank: World Development Indicators, licensed under Creative Commons Attribution 4.0 licence.
Causality and the notion of ceteris paribus in econometric analysis
§ Definition of the causal effect of x on y: how does variable y change if variable x is changed but all other relevant factors are held constant?
§ Most economic questions are ceteris paribus questions.
§ It is important to define which causal effect you are interested in.
§ It is useful to describe how an experiment would have to be designed to infer the causal effect in question.

Causal effect of fertiliser on crop yield
§ By how much will the production of soybeans increase if you increase the amount of fertiliser used?
§ Implicit assumption: all other factors influencing crop yield, e.g. quality of land, rainfall, presence of parasites etc., are held fixed.
§ Experiment: choose several one-acre plots of land; randomly assign different amounts of fertiliser to the different plots; compare yields.
q The experiment works because the amount of fertiliser applied is unrelated to other factors influencing crop yields.

Measuring the return to education
§ If a person is chosen from the population and given another year of education, by how much will her or his wage increase?
§ Implicit assumption: all other factors that influence wages, e.g. experience, family background, intelligence etc., are held fixed.
§ Experiment: choose a group of people; randomly assign different amounts of education to them (infeasible!); compare wage outcomes.
q The problem without random assignment is that the amount of education is related to other factors that influence wages (e.g. intelligence).

Effect of law enforcement on city crime level
§ If a city is randomly chosen and given ten additional police officers, by how much would its crime rate fall?
§ Alternatively: if two cities are the same in all respects, except that one city has ten more police officers, by how much would the two cities' crime rates differ?
§ Experiment: randomly assign a number of police officers to a large number of cities.
q In reality, the number of police officers is determined by the crime rate (simultaneous determination of crime and number of police).

Effect of the minimum wage on unemployment
§ By how much (if at all) will unemployment increase if the minimum wage is increased by a certain amount (holding other things fixed)?
§ Experiment: the government randomly chooses a minimum wage each year and observes unemployment outcomes.
q The experiment would work because the level of the minimum wage is unrelated to other factors determining unemployment.
q In reality, the level of the minimum wage will depend on political and economic factors that also influence unemployment.

Testing predictions of economic theories
§ Economic theories are not always stated in terms of causal effects.
q e.g. the expectations hypothesis states that long-term interest rates equal compounded expected short-term interest rates.
§ An implication is that the interest rate on a 3-month T-bill should equal the expected interest rate for the first three months of a 6-month T-bill; this can be tested using econometric methods.

Chapter 3: Review of probability and statistics

Learning objectives
§ Random variables and their probability distributions
§ Features of probability distributions
§ Features of joint and conditional distributions
§ Populations, parameters and random sampling
§ Properties of estimators

Random variables and their probability distributions
§ An experiment is any procedure that can be repeated and has a well-defined set of outcomes; e.g. flipping a coin 10 times.
§ A random variable takes an observed random value; e.g. the number of times heads appears in 10 coin flips.
§ A Bernoulli random variable can only take on the values zero and one; e.g. it is common to label 1 as a 'success' and 0 as a 'failure'.
Discrete random variables
§ Can only take on a finite number of values.
§ A Bernoulli random variable is an example of a discrete random variable; e.g. a randomly selected customer showing up for their reservation, where X = 1 means the customer shows up:
P(X = 1) = θ
P(X = 0) = 1 – θ
q If θ = .75, then there is a 75% chance that the customer will show up.

Probability density function (pdf)
§ f(xj) = pj, j = 1, 2, …, k.
§ For any real number x, f(x) is the probability that the random variable X takes on the particular value x.
§ Given the pdf, it is simple to compute the probability of any event involving the random variable.

Example: the pdf of the number of free throws made out of two attempts:
P(X ≥ 1) = P(X = 1) + P(X = 2) = .44 + .36 = .80

Continuous random variables
§ Take on any particular real value with zero probability.
§ Use the pdf of a continuous random variable only to compute the probability of events involving a range of values, P(a ≤ X ≤ b).

Joint distributions, conditional distributions and independence
§ (X, Y) has a joint distribution.
§ The joint pdf of (X, Y) is fX,Y(x, y) = P(X = x, Y = y).
§ X and Y are independent if fX,Y(x, y) = fX(x)fY(y).
§ Example: a basketball player shoots two free throws. X = 1 if she makes the first shot, Y = 1 if she makes the second shot. There is an 80% chance of making each shot. What is the probability of making both free throws (assuming independence)?
P(X = 1) = P(Y = 1) = .8
P(X = 1, Y = 1) = P(X = 1)P(Y = 1) = (.8)(.8) = .64
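
A minimal numeric check of the free-throw example above; the 0.8 probability comes from the example, while the Monte Carlo part is an added illustration, not from the original slides.

```python
# Sketch: verifying the independent free-throw probabilities with numpy.
import numpy as np

p_make = 0.8                      # P(X = 1) = P(Y = 1) = .8, as in the example
p_both = p_make * p_make          # independence: P(X=1, Y=1) = P(X=1) * P(Y=1)
print(p_both)                     # 0.64

# A quick Monte Carlo check of the same joint probability
rng = np.random.default_rng(0)
x = rng.random(100_000) < p_make  # first shot made?
y = rng.random(100_000) < p_make  # second shot made (independent of the first)
print((x & y).mean())             # approximately 0.64
```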

Conditional distributions
§ How does X affect Y?
§ fY|X(y|x) = fX,Y(x, y) / fX(x)
§ Discrete conditional distribution: fY|X(y|x) = P(Y = y | X = x)
§ If X and Y are independent random variables, knowing X tells us nothing about the probability of Y occurring; that is, fY|X(y|x) = fY(y) and fX|Y(x|y) = fX(x).

Example conditional distribution: basketball free throws
§ Assume: fY|X(1|1) = .85, fY|X(0|1) = .15, fY|X(1|0) = .70, fY|X(0|0) = .30.
§ Find: P(X = 1, Y = 1) = P(Y = 1 | X = 1)·P(X = 1) = (.85)(.8) = .68.
§ The joint pdf can be summarised in a table.

Features of probability distributions
§ The expected value
§ Variance and standard deviation

Expected value
§ Discrete random variable: E(X) = x1 f(x1) + x2 f(x2) + … + xk f(xk).
§ Continuous random variable: E(X) is the integral of x·f(x) over the support of X.

Properties of expected value
§ E.1: E(c) = c
§ E.2: E(aX + b) = aE(X) + b
§ E.3: E(a1X1 + a2X2 + … + anXn) = a1E(X1) + a2E(X2) + … + anE(Xn).

Measures of variability: variance
§ Variance: σ² = E[(X – μ)²] = E(X² – 2Xμ + μ²) = E(X²) – 2μ² + μ² = E(X²) – μ²
§ Properties of variance:
q Var.1: the variance of a constant is 0: var(c) = 0.
q Var.2: var(aX + b) = a²var(X).

Measures of variability: standard deviation
§ Properties of standard deviation:
q Sd.1: sd(c) = 0
q Sd.2: sd(aX + b) = |a|sd(X)

Features of joint and conditional distributions
§ Covariance
§ Correlation
§ Variance

Measure of association: covariance
§ The linear dependence between two random variables.
§ A positive covariance indicates that the two random variables move in the same direction.
§ A negative covariance indicates that they move in opposite directions.

Properties of covariance
§ Cov.1: If X and Y are independent, then cov(X, Y) = 0.
§ Cov.2: For any constants a1, b1, a2 and b2, cov(a1X + b1, a2Y + b2) = a1a2cov(X, Y).
§ Cov.3: |cov(X, Y)| ≤ sd(X)sd(Y).

Correlation coefficient
§ A measure that determines the degree to which two variables' movements are associated.
§ The correlation coefficient between X and Y is sometimes denoted ρXY.

Properties of correlation
§ Corr.1: –1 ≤ corr(X, Y) ≤ 1. If corr(X, Y) = 0, then there is no linear relationship between X and Y.
§ Corr.2: For constants a1, b1, a2 and b2 with a1a2 > 0, corr(a1X + b1, a2Y + b2) = corr(X, Y). If a1a2 < 0, then corr(a1X + b1, a2Y + b2) = −corr(X, Y).
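
Cov.2 and Corr.2 can be verified numerically. A small sketch on simulated data (the data-generating process and the constants are assumptions made only for the illustration):

```python
# Sketch: checking cov(a1X+b1, a2Y+b2) = a1*a2*cov(X,Y) and the sign flip of correlation.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)

a1, b1, a2, b2 = 2.0, 3.0, -4.0, 1.0
cov_xy = np.cov(x, y)[0, 1]
cov_scaled = np.cov(a1 * x + b1, a2 * y + b2)[0, 1]
print(cov_scaled, a1 * a2 * cov_xy)           # Cov.2: the two numbers agree

corr_xy = np.corrcoef(x, y)[0, 1]
corr_scaled = np.corrcoef(a1 * x + b1, a2 * y + b2)[0, 1]
print(corr_scaled, -corr_xy)                  # Corr.2 with a1*a2 < 0: the correlation flips sign
```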
Variance of sums of random variables
§ Var.3: For constants a and b, var(aX + bY) = a²var(X) + b²var(Y) + 2ab·cov(X, Y). If X and Y are uncorrelated, so that cov(X, Y) = 0, then var(X + Y) = var(X) + var(Y) and var(X − Y) = var(X) + var(Y).
§ Var.4: If {X1, …, Xn} are pairwise uncorrelated random variables and {ai : i = 1, …, n} are constants, then var(a1X1 + … + anXn) = a1²var(X1) + … + an²var(Xn).

Chapter 4: The simple regression model

Learning objectives
§ The simple regression model
§ The ordinary least squares estimates
§ The algebraic properties of the fitted OLS regression
§ The implications of the OLS estimators
§ The key assumptions SLR.1 to SLR.5
§ The interpretation of OLS

Definition of the simple linear regression model
§ Explains variable y in terms of variable x: y = β0 + β1x + u, where y is the dependent variable, x the independent variable, β0 the intercept, β1 the slope parameter and u the error term.

Examples
§ House price and land size: the slope measures the effect of land size on house price, holding all other factors (number of bathrooms, bedrooms etc.) fixed; those other factors are collected in the error term.
§ A simple wage equation: the slope measures the change in hourly wage given another year of education, holding all other factors (labour force experience, tenure with current employer, work ethic, intelligence etc.) fixed.

When is there a causal interpretation?
§ Conditional mean independence assumption: E(u|x) = 0; the explanatory variable must not contain information about the mean of the unobserved factors.
§ Example: wage equation with unobserved intelligence in the error term. The conditional mean independence assumption is unlikely to hold, because individuals with more education will also be more intelligent on average.

Population regression function (PRF)
§ The conditional mean independence assumption implies that E(y|x) = β0 + β1x, the population regression function.
§ This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
Deriving the ordinary least squares estimates
§ Fit a regression line through the data points (e.g. the i-th data point (xi, yi)) as well as possible.
§ Regression residuals: the deviations of the data points from the fitted regression line.
§ Minimise the sum of squared regression residuals; the minimisers are the ordinary least squares (OLS) estimates.
§ Why not minimise the absolute values of the residuals instead?
§ The fitted regression line depends on the sample; the population regression line is unknown.

Examples of simple regression obtained using real data
§ CEO salary and return on equity: salary in $ thousands, return on equity of the CEO's firm.
§ Fitted regression with intercept and slope: if return on equity increases by 1%, then salary is predicted to change by $18 501.
§ Causal interpretation?
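
The OLS intercept and slope can be computed directly from the sample covariance and variance. A minimal sketch on simulated data (the data are assumed; this is not the CEO salary data set above):

```python
# Sketch: OLS estimates from the textbook formulas beta1 = cov(x,y)/var(x), beta0 = ybar - beta1*xbar.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10, 2, size=200)
y = 1.5 + 0.8 * x + rng.normal(size=200)

beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # sample cov(x, y) / sample var(x)
beta0_hat = y.mean() - beta1_hat * x.mean()                   # the line passes through the sample means
print(beta0_hat, beta1_hat)

residuals = y - (beta0_hat + beta1_hat * x)
print(residuals.sum())    # algebraic property of OLS: residuals sum to (numerically) zero
```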

The simple regression model: wage and education
§ Hourly wage in $, years of education.
§ Fitted regression: in the sample, one more year of education was associated with an increase in hourly wage of $1.29.
§ Causal interpretation?

Example: CEO salary and return on equity
§ Fitted or predicted values and deviations from the regression line (= residuals); e.g. CEO number 11's salary was $875.372 lower than predicted using information on his firm's return on equity.

Properties of OLS on any sample of data
§ Deviations from the regression line (residuals) sum to zero.
§ The correlation between the deviations and the regressors is zero.
§ The sample averages of y and x lie on the regression line.
Goodness-of-fit
§ How well does the explanatory variable explain the dependent variable?
§ Measures of variation:
q Total sum of squares (SST): represents the total variation in the dependent variable.
q Explained sum of squares (SSE): represents the variation explained by the regression.
q Residual sum of squares (SSR): represents the variation not explained by the regression.
§ Decomposition of total variation: SST = SSE + SSR (total variation = explained part + unexplained part); proving SST = SSE + SSR.

Goodness-of-fit measure (R-squared)
§ R-squared measures the fraction of the total variation that is explained by the regression: R² = SSE/SST = 1 − SSR/SST.
§ Interpretation of R-squared: the fraction (percentage) of the sample variation in y that is explained by x.
§ What does a low/high R-squared value mean?
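
A short sketch of the SST = SSE + SSR decomposition and the two equivalent R-squared formulas, again on assumed simulated data:

```python
# Sketch: total, explained and residual sums of squares, and R-squared, for a fitted simple regression.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 + 1.0 * x + rng.normal(size=100)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()      # total variation
sse = ((y_hat - y.mean()) ** 2).sum()  # explained variation
ssr = ((y - y_hat) ** 2).sum()         # unexplained variation
print(sst, sse + ssr)                  # decomposition: SST = SSE + SSR
print(sse / sst, 1 - ssr / sst)        # two equivalent ways to compute R-squared
```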

CEO salary and return on equity (cont.)
§ The regression explains only 1.3% of the total variation in salaries.
§ Caution: using R-squared as the main gauge of success in econometrics can lead to trouble.

The effect of changing units of measurement on OLS statistics
§ Salary was measured in thousands of dollars ($857 000 is recorded as 857); ROE was measured in percentage points.
§ What would happen if salary were measured in dollars and ROE in decimals?
§ Would R-squared be affected?

Incorporating nonlinearities: semi-logarithmic form
§ Regression of log wages on years of education: the dependent variable is the natural logarithm of the wage.
§ This changes the interpretation of the regression coefficient: it now measures the approximate proportionate change in the wage for one more year of education.
§ Fitted regression: the wage increases by 8.3% for every additional year of education (= return to education); the growth rate of the wage is 8.3% per year of education.

Unbiasedness of OLS: assumptions
§ Assumption SLR.1 (Linear in parameters): in the population, the relationship between y and x is linear.
§ Assumption SLR.2 (Random sampling): the data are a random sample drawn from the population. Each data point therefore follows the population equation.
§ Assumption SLR.3 (Sample variation in the explanatory variable): the sample outcomes on x are not all the same.
§ Assumption SLR.4 (Zero conditional mean): the error has an expected value of zero given any value of the explanatory variable.
§ If any of these assumptions fails, the OLS estimators will in general be biased.
Variances of the OLS estimators
§ Depending on the sample, the estimates will be nearer to or farther away from the true population values.
§ How far can we expect our estimates to be from the true population values on average?
§ Sampling variability is measured by the estimators' variances.
§ The question is what the estimators estimate on average and how large their variability in repeated samples is.
§ Assumption SLR.5 (Homoscedasticity): the value of the explanatory variable must contain no information about the variability of the unobserved factors.

Graphical illustration of homoscedasticity
§ The variability of the unobserved influences does not depend on the value of the explanatory variable.

An example of heteroscedasticity: wage and education
§ The variance of the unobserved determinants of wages increases with the level of education.

Theorem 2.2: variances of the OLS estimators
§ Under assumptions SLR.1–SLR.5, the sampling variances of the OLS estimators depend on the error variance and on the variation in the explanatory variable.
§ Conclusion: the sampling variability of the estimated regression coefficients will be higher, the larger the variability of the unobserved factors, and lower, the higher the variation in the explanatory variable.

Estimating the error variance
§ One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately this estimate would be biased.
§ An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients.

The simple regression model (cont.)
§ Theorem 4.3 (Unbiasedness of the error variance estimator).
§ The estimated standard deviations of the regression coefficients are called standard errors. They measure how precisely the regression coefficients are estimated.

Chapter 5: Multiple regression analysis: estimation

Learning objectives
§ Motivation for multiple regression
§ Mechanics and interpretation of OLS
§ Calculation of standard errors for regression coefficients
§ Properties of the OLS estimators
§ Comments on the language of multiple regression analysis
§ Scenarios for applying multiple regression
Motivation for multiple regression
§ Incorporate more explanatory factors into the model.
§ Explicitly hold fixed other factors that would otherwise be in u.
§ Allow for more flexible functional forms.
§ Example: wage equation with hourly wage, years of education and labour market experience. The coefficient on education now measures the effect of education explicitly holding experience fixed.

Definition of the multiple linear regression model
§ Explains variable y in terms of variables x1, x2, …, xk: y = β0 + β1x1 + β2x2 + … + βkxk + u, with dependent variable y, intercept β0, slope parameters β1, …, βk (each holding all else constant), independent variables x1, …, xk, and error term u (all other factors).

Example: number of electronic gaming machines (EGM) and EGM expenditure
§ EGM expenditure explained by the number of EGMs, the percentage of people unemployed, and other factors.
§ A greater number of EGMs within a local government area would lead to increased EGM expenditure.
§ Omitting unemployment from the regression would lead to a biased estimate of the effect of EGMs on expenditure, as research has shown that EGMs are more likely to be placed in socioeconomically disadvantaged regions.

Example: family income and family consumption
§ Family consumption explained by family income, family income squared and other factors.
§ The model has two explanatory variables: income and income squared.
§ Consumption is explained as a quadratic function of income.
§ One has to be very careful when interpreting the coefficients: by how much does consumption increase if income is increased by one unit? It depends on how much income is already there.

Motivation for multiple regression (cont.)
§ In multiple regression, the key assumption about how u is related to x1 and x2 is E(u|x1, x2) = 0.
§ For the education example: E(u|educ, exper) = 0. This implies that other factors that affect the wage are not related to educ and exper.
§ When applied to the quadratic consumption function, it has a different meaning: E(u|inc, inc²) = E(u|inc) = 0.

Models with k independent variables — example: CEO salary, sales and CEO tenure
§ Log of CEO salary explained by log sales and a quadratic function of CEO tenure with the firm.
§ Assumes a constant elasticity relationship between CEO salary and the sales of his or her firm.
§ Assumes a quadratic relationship between CEO salary and his or her tenure with the firm.

Meaning of 'linear' regression
§ The model has to be linear in the parameters (not in the variables).

Mechanics and interpretation of ordinary least squares
§ Obtaining the OLS estimators: choose the estimators that minimise the sum of squared residuals.

Interpretation of the multiple regression model
§ By how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant?
§ The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration.
§ Ceteris paribus interpretation.
§ It still has to be assumed that unobserved factors do not change if the explanatory variables are changed.

Example: determinants of EGM expenditure
§ EGM expenditure per adult explained by the number of EGMs and unemployment status.
§ Interpretation:
q Holding EGM fixed: if an individual is unemployed, we predict that EGM expenditure will increase by $56.98.
q Holding unemployment fixed: every additional EGM per 1000 adults in a particular area increases expenditure by $44.87.
Mechanics and interpretation of ordinary least squares (cont.)
Example: hourly wage equation
§ log(wage) = .284 + .092 educ + .0041 exper + .022 tenure (wage; years of education; years of experience; years with current employer).
§ Interpretation:
q Holding exper and tenure fixed: another year of education is predicted to increase the wage by 9.2%.
q Holding educ and tenure fixed: another year of experience is predicted to increase the wage by about 0.41%.
q Holding educ and exper fixed: another year of tenure is predicted to increase the wage by 2.2%.

Changing more than one independent variable simultaneously
Example (cont.): hourly wage equation
§ An individual stays at the same firm for another year: exper and tenure each increase by one year, holding educ fixed.
§ The total effect is: ∆log(wage) = .0041 ∆exper + .022 ∆tenure = .0041 + .022 = .0261.
§ Interpretation: since exper and tenure each increase by one year, holding educ fixed, the estimated effect on the wage when an individual stays at the same firm is about 2.6%.

Properties of OLS on any sample of data
§ Fitted or predicted values and residuals.
§ The sample average of the residuals is zero.
§ The sample covariance between each independent variable and the OLS residuals is zero.
§ The point of sample averages (of y and of every x) is always on the OLS regression line.

'Partialling out' interpretation of multiple regression
§ The estimated coefficient of an explanatory variable in a multiple regression can be obtained in two steps:
1. Regress the explanatory variable on all other explanatory variables.
2. Regress y on the residuals from this regression.
§ Why does this procedure work?
q The residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables.
q The slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dependent variable.

Goodness-of-fit
§ Decomposition of total variation; R-squared.
§ Notice that R-squared can only increase if another explanatory variable is added to the regression.
§ Alternative expression for R-squared: R-squared is equal to the squared correlation coefficient between the actual and the predicted values of the dependent variable.

Example: explaining arrest records
§ Number of times arrested in 1986 explained by the proportion of prior arrests that led to conviction, months in prison in 1986 and quarters employed in 1986.
§ Interpretation:
q Proportion of prior arrests +0.5: –.075, i.e. 7.5 fewer arrests per 100 men.
q Months in prison +12: –.034(12) = –0.408 arrests for a given man.
q Quarters employed +1: –.104, i.e. 10.4 fewer arrests per 100 men.

Example: explaining arrest records (cont.)
§ An additional explanatory variable is added: the average sentence in prior convictions.
§ R-squared increases only slightly.
§ Interpretation:
q Average prior sentence increases the number of arrests (?)
q Limited additional explanatory power, as R-squared increases by little.
§ General remark on R-squared: even if R-squared is small (as in the given example), the regression may still provide good estimates of ceteris paribus effects.

Adjusted R-squared/R-bar squared
§ It imposes a penalty for adding additional independent variables to a model, as R-squared can never fall when a new independent variable is added: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1).

Standard assumptions for the multiple regression model
§ Assumption MLR.1 (linear in parameters): in the population, the relationship between y and the explanatory variables is linear.
§ Assumption MLR.2 (random sampling): the data are a random sample drawn from the population. Each data point therefore follows the population equation.
Standard assumptions for the multiple regression model (cont.)
§ Assumption MLR.3 (no perfect collinearity): in the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables.
§ Remarks on MLR.3:
q The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed.
q If an explanatory variable is a perfect linear combination of other explanatory variables, it is superfluous and may be eliminated.
q Constant variables are also ruled out (collinear with the intercept).

Examples of perfect collinearity
§ Variables that are related but not perfectly correlated are allowed: we anticipate that there is a relationship between EGM and unemployment, as LGAs with high levels of unemployment tend to have more EGMs, yet the relationship is not exact.
§ Perfect collinearity arises when one explanatory variable is a multiple of another, or when, using basic log properties, one logged variable is an exact linear combination of other logged variables.

Assumption MLR.4: zero conditional mean
§ The values of the explanatory variables must contain no information about the mean of the unobserved factors.
§ In a multiple regression model, the zero conditional mean assumption is much more likely to hold, because fewer things end up in the error.

Theorem 5.1: unbiasedness of OLS
§ Under assumptions MLR.1–MLR.4, the OLS estimators are unbiased.
§ Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values.

Assumption MLR.5: homoscedasticity
§ The error u has the same variance given any value of the explanatory variables.
§ The values of the explanatory variables must contain no information about the variance of the unobserved factors.
§ Example: wage equation. This assumption may also be hard to justify in many cases.

Theorem 5.2: sampling variances of the OLS slope estimators
§ Under assumptions MLR.1–MLR.5: var(β̂j) = σ² / [SSTj(1 − Rj²)], where σ² is the variance of the error term, SSTj is the total sample variation in explanatory variable xj, and Rj² is the R-squared from a regression of xj on all other independent variables (including a constant).

Components of OLS variances
1. The error variance
§ A high error variance increases the sampling variance because there is more 'noise' in the equation.
§ A large error variance necessarily makes estimates imprecise.
§ The error variance does not decrease with the sample size.
2. The total sample variation in the explanatory variable
§ More sample variation leads to more precise estimates.
§ Total sample variation automatically increases with the sample size.
§ Increasing the sample size is thus a way to get more precise estimates.
3. Linear relationships among the independent variables
§ Regress xj on all other independent variables (including a constant).
§ The R-squared of this regression will be higher, the better xj can be linearly explained by the other independent variables.
§ The sampling variance of the slope estimator for xj will be higher when xj can be better explained by the other independent variables.
§ Under perfect multicollinearity, the variance of the slope estimator approaches infinity.

Estimating the error variance
§ An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients.
§ The number of observations minus the number of estimated parameters is also called the degrees of freedom.
§ The n estimated squared residuals in the sum are not completely independent, but are related through the k+1 equations that define the first-order conditions of the minimisation problem.
§ Theorem 5.3 (Unbiased estimator of the error variance): σ̂² = SSR/(n − k − 1).
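
The pieces of the Theorem 5.2 variance formula — the error-variance estimate, SSTj and Rj² — can be computed by hand and compared with the standard error a package reports. A sketch on assumed simulated data:

```python
# Sketch: var(beta_j) = sigma^2 / (SST_j * (1 - R_j^2)), assembled manually and checked against statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 400
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1 + 0.7 * x1 + 0.3 * x2 + rng.normal(scale=2.0, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

k = 2
sigma2_hat = res.ssr / (n - k - 1)                      # unbiased error-variance estimate (Theorem 5.3)
sst_1 = ((x1 - x1.mean()) ** 2).sum()                   # total sample variation in x1
r2_1 = sm.OLS(x1, sm.add_constant(x2)).fit().rsquared   # R^2 from regressing x1 on the other regressors
se_beta1 = np.sqrt(sigma2_hat / (sst_1 * (1 - r2_1)))   # Theorem 5.2 formula
print(se_beta1, res.bse[1])                             # matches the reported standard error
```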
Efficiency of OLS: the Gauss-Markov theorem
§ Under assumptions MLR.1–MLR.5, OLS is unbiased.
§ However, under these assumptions there may be many other estimators that are also unbiased.
§ Which one is the unbiased estimator with the smallest variance?
§ To answer this question, limit attention to linear estimators, i.e. estimators linear in the dependent variable; the weights may be an arbitrary function of the sample values of all the explanatory variables, and the OLS estimator can be shown to be of this form.
§ Theorem 5.4 (Gauss-Markov theorem): under MLR.1–MLR.5, OLS is the best linear unbiased estimator.
§ Theorem 5.5 (Consistency of OLS): under MLR.1–MLR.5, the OLS estimators are consistent.

Several scenarios for applying multiple regression
§ Prediction: the best prediction of y is its conditional expectation.
§ Efficient markets: efficient markets theory states that a single variable acts as a sufficient statistic for predicting y; once we know this sufficient statistic, additional information is not useful in predicting y. If E(y|w, x1, …, xk) = E(y|w), then w is a sufficient statistic.

Several scenarios for applying multiple regression (cont.)
§ Measuring the tradeoff between two variables: consider regressing salary on pension compensation and other controls.
§ Testing for ceteris paribus group differences: differences in outcomes between groups can be evaluated with dummy variables.
§ Potential outcomes, treatment effects and policy analysis: with multiple regression, we can get closer to random assignment by conditioning on observables.

Chapter 6: Multiple regression analysis: inference

Learning objectives
§ The sampling distribution of the OLS estimators
§ Testing hypotheses about a single population parameter
§ Confidence intervals
§ Testing hypotheses about a single linear combination of the parameters
§ Testing multiple hypotheses: the F statistic
§ Reporting multiple regression results

Sampling distribution of the OLS estimators
§ Assumption MLR.6 (Normality): u is independent of x1, x2, …, xk and normally distributed with zero mean and variance σ²: u ~ Normal(0, σ²).
§ Examples where normality cannot hold:
q Wages (non-negative; also: minimum wage)
q Number of arrests (takes on a small number of integer values)
§ In some cases, normality can be achieved through transformations of the dependent variable (e.g. use log(wage) instead of wage).
§ Under normality, OLS is the best (even non-linear) unbiased estimator.
§ Important: for the purposes of statistical inference, the assumption of normality can be replaced by a large sample size.
Sampling distribution of the OLS estimators: terminology
§ MLR.1–MLR.5: Gauss-Markov assumptions.
§ MLR.1–MLR.6: classical linear model (CLM) assumptions.
§ Theorem 6.1 (Normal sampling distributions): under assumptions MLR.1–MLR.6, the estimators are normally distributed around the true parameters with the variance that was derived earlier, and the standardised estimators follow a standard normal distribution.

Testing hypotheses about a single population parameter: the t test
§ Theorem 6.2 (t distribution for standardised estimators): under assumptions MLR.1–MLR.6, if the standardisation is done using the estimated standard deviation (= standard error), the normal distribution is replaced by a t distribution.
§ Note: the t distribution is close to the standard normal distribution if n − k − 1 is large.
§ Null hypothesis: the population parameter is equal to zero, i.e. after controlling for the other independent variables, there is no effect of xj on y.
§ Goal: define a rejection rule so that, if H0 is true, it is rejected only with a small probability (= significance level, e.g. 5%).

Testing hypotheses about a single population parameter: the t statistic
§ t statistic (or t ratio): the estimated coefficient divided by its standard error.
§ Use the t statistic to test the null hypothesis. The farther the estimated coefficient is away from zero, the less likely it is that the null hypothesis holds true. But what does 'far away from zero' mean?
§ It depends on the variability of the estimated coefficient, i.e. its standard deviation. The t statistic measures how many estimated standard deviations the estimated coefficient is away from zero.
§ Distribution of the t statistic if the null hypothesis is true.

Testing against one-sided alternatives (greater than zero)
§ Test H0: βj = 0 against H1: βj > 0.
§ Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is 'too large' (i.e. larger than a critical value).
§ Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases.
§ In the given example, this is the point of the t distribution with 28 degrees of freedom that is exceeded in 5% of the cases: reject if the t statistic is greater than 1.701.

Example: wage equation
§ Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages.
§ One would expect either a positive effect of experience on hourly wage or no effect at all.
§ Standard errors; degrees of freedom (here the standard normal approximation applies).
§ Critical values for the 5% and the 1% significance level (these are conventional significance levels).
§ The null hypothesis is rejected because the t statistic exceeds the critical value.
§ The effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.
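
A minimal sketch of the one-sided t test logic above; the coefficient and standard error below are made-up numbers, and only the 28 degrees of freedom and the 1.701 critical value come from the example:

```python
# Sketch: one-sided t test H0: beta = 0 vs H1: beta > 0, done by hand with scipy.
from scipy import stats

beta_hat = 0.0041      # hypothetical estimated coefficient
se = 0.0017            # hypothetical standard error
df = 28

t_stat = beta_hat / se                        # t statistic for H0: beta = 0
crit_5 = stats.t.ppf(0.95, df)                # one-sided 5% critical value (about 1.701)
p_one_sided = 1 - stats.t.cdf(t_stat, df)     # one-sided p-value

print(t_stat, crit_5, p_one_sided)
print("reject H0 at 5%" if t_stat > crit_5 else "do not reject H0 at 5%")
```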

Testing against one-sided alternatives (cont.)
Example: number of super-rich people
§ Test why there are more super-rich people in some countries than others by controlling for communist government, population size and countries' per capita incomes.
§ n = 166, R² = 0.325.
§ Test H0: βcommunist = 0 against H1: βcommunist < 0.
§ One would expect a higher incidence of super wealth with economic freedom.
§ t statistic; degrees of freedom (here the standard normal approximation applies); standard errors; critical value for the 5% significance level.
§ The null hypothesis is rejected because the t statistic exceeds the critical value.
Testing against two-sided alternatives
Example: French-made cars
§ The t statistic for french is –1.75, which is not statistically significant at the 5% level but is statistically significant at the 10% level against a two-sided alternative.
§ The t statistic on log(m_fam) is 13.17, which is significant at very small significance levels.
§ The t statistic on log(totreg) is 9.74, which is significant at the 1% level.

Testing more general hypotheses about a regression coefficient
§ Null hypothesis: the coefficient equals some hypothesised value.
§ The test works exactly as before, except that the hypothesised value is subtracted from the estimate when forming the t statistic.
§ Example: wage equation. Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages: test H0: βexper = 0 against H1: βexper > 0 (one expects a positive effect of experience on hourly wage, or no effect at all). The hypothesis is rejected at the 1% level.
§ Example: hourly wage. Test whether the return to education is equal to 12%. The estimate differs from the hypothesised value, but is this difference statistically significant?

Computing p-values for t tests
§ If the significance level is made smaller and smaller, there will be a point at which the null hypothesis cannot be rejected any more.
§ The reason is that, by lowering the significance level, you are increasingly less likely to make the error of rejecting a correct H0.
§ The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test.
§ A small p-value is evidence against the null hypothesis, because you would reject the null hypothesis even at small significance levels.
§ A large p-value is evidence in favour of the null hypothesis.
§ p-values are more informative than tests at fixed significance levels.

How the p-value is computed
§ The p-value is the significance level at which one is indifferent between rejecting and not rejecting the null hypothesis.
§ In the two-sided case, the p-value is the probability that the t-distributed variable takes on a larger absolute value than the realised value of the test statistic, e.g.: p-value = P(|T| > 1.85) = 2P(T > 1.85) = 2(.0359) = .0718.
§ From this, it is clear that a null hypothesis is rejected only if the corresponding p-value is smaller than the significance level. For example, for a significance level of 5% the value of the test statistic would not lie in the rejection region (beyond the critical values of the two-sided test).

Guidelines for discussing economic and statistical significance
§ If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance.
§ The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant.
§ If a variable is statistically and economically important but has the 'wrong' sign, the regression model might be misspecified.
§ If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), you may think of dropping it from the regression.
§ If the sample size is small, effects might be imprecisely estimated, so the case for dropping insignificant variables is less strong.

Confidence intervals
§ The interval is the estimate plus/minus the critical value of the two-sided test times the standard error, giving a lower bound and an upper bound at the chosen confidence level.
§ Interpretation of the confidence interval:
q The bounds of the interval are random.
q In repeated samples, the interval constructed in this way will cover the population regression coefficient in 95% of cases.
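
The two-sided p-value P(|T| > 1.85) and a 95% confidence interval can be computed directly from the t distribution. A sketch with hypothetical degrees of freedom, estimate and standard error:

```python
# Sketch: two-sided p-value and a 95% confidence interval for a coefficient.
from scipy import stats

df = 30                      # hypothetical degrees of freedom
t_realised = 1.85
p_value = 2 * (1 - stats.t.cdf(t_realised, df))     # roughly .07, as in the worked example above

beta_hat, se = 0.25, 0.10    # hypothetical estimate and standard error
crit = stats.t.ppf(0.975, df)                        # two-sided 5% critical value
ci = (beta_hat - crit * se, beta_hat + crit * se)    # bounds of the 95% confidence interval
print(p_value, ci)
```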
Example: model of R&D expenditures
§ Spending on R&D explained by annual sales and profits as a % of sales.
§ The effect of sales on R&D is relatively precisely estimated, as the confidence interval is narrow; moreover, the effect is significantly different from zero, because zero is outside the interval.
§ The effect of profits is imprecisely estimated, as the interval is very wide; it is not even statistically significant, because zero lies in the interval.

Testing hypotheses about a linear combination of parameters
§ Example: compare the effect of the female and male unemployment rates on average family income.
§ Define θ1 = β1 − β2 and test H0: θ1 = 0 against H1: θ1 < 0.
§ A possible test statistic would be the difference between the estimates normalised by the estimated standard deviation of the difference; reject the null hypothesis if the statistic is 'too negative' to believe that the true difference between the parameters is equal to zero.
§ This is impossible to compute with standard regression output, because the covariance between the two estimates is usually not available in regression output.
§ Alternative method: reparameterise the model and insert into the original regression a new regressor (= total years of university).

Multiple regression analysis: inference — estimation results (single-hypothesis example)
§ t = –.0817/.0858 = –0.952; p-value = .342; confidence interval –.0817 ± .1682.
§ The hypothesis cannot be rejected at the 10% level (and therefore not at the 5% level either).
§ This method always works for single linear hypotheses.

Testing multiple linear restrictions: the F test
§ Testing exclusion restrictions.
§ Example: estimation of the unrestricted model for the number of billionaires, explained by the population size of the country, per capita income, years under communist rule, security and enforcement of private property, and the level of public social expenditures relative to GDP.
§ n = 477; R² = .633.
§ None of these variables is statistically significant when tested individually.
§ Idea: how would the model fit be if these variables were dropped from the regression?
§ Test whether the performance measures have no effect/can be excluded from the regression.

Estimation of the unrestricted model (cont.) and of the restricted model
§ The sum of squared residuals necessarily increases when the restrictions are imposed, but is the increase statistically significant?
§ Test statistic: the relative increase of the sum of squared residuals when going from H1 to H0, adjusted for the number of restrictions q, follows an F distribution if the null hypothesis H0 is correct:
F = [(SSRr – SSRur)/q] / [SSRur/(n – k – 1)].

Test decision in example
§ We reject the hypothesis that communist and property rights have no effect on the number of billionaires at the 5% level of significance.
§ Discussion:
q The three variables are 'jointly significant'.
q They were not significant when tested individually.
q The likely reason is multicollinearity between them.

Relationship between F and t statistics
§ t²(n–k–1) has an F(1, n–k–1) distribution.
§ t statistics are more flexible for testing a single hypothesis; a t statistic is also easier to obtain than an F statistic.
§ An F statistic is used to detect whether a set of coefficients is jointly different from zero.

The R-squared form of the F statistic
§ F = [(R²ur – R²r)/q] / [(1 – R²ur)/(n – k – 1)].
§ Used for testing exclusion restrictions.
§ Cannot be used for all linear restrictions (the dependent variable must be the same in the restricted and unrestricted models).

Example: parents' education in a birthweight equation
§ The numerator df is 2, and the denominator df is 1185.

Computing p-values for F tests
§ Let F denote an F random variable with (q, n – k – 1) degrees of freedom; the p-value is the probability of exceeding the realised value of the test statistic.
§ A small p-value is evidence against H0.
§ Once the p-value has been computed, the F test can be carried out at any significance level.
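
A sketch of the F statistic for exclusion restrictions, computed from the restricted and unrestricted SSR; the SSR values below are hypothetical, and only the general formula is the point:

```python
# Sketch: F statistic for q exclusion restrictions and its p-value.
from scipy import stats

ssr_r, ssr_ur = 1120.0, 1090.0   # hypothetical restricted and unrestricted sums of squared residuals
n, k, q = 477, 8, 3              # observations, regressors in the unrestricted model, restrictions

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
p_value = stats.f.sf(F, q, n - k - 1)     # upper-tail probability of the F distribution
print(F, p_value)
```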

The F statistic for overall significance of a regression
§ Tests the null hypothesis that all slope coefficients are jointly zero.

Testing general linear restrictions: example
§ Estimate the unrestricted model, then impose the restrictions to obtain the restricted model.
§ Unrestricted model; restricted model: impose the restriction that the coefficient on x1 is unity and estimate the restricted model.
§ Compute the F statistic by obtaining the restricted and unrestricted SSR.

Example: confidence interval for predicted EGM expenditure per adult
§ Predicted exp_adult = –108.0592 + 44.87077 EGM + 56.97601 unemployed, with standard errors (67.52669), (7.43007) and (10.67668); n = 70, R² = 0.555.
§ Obtain a confidence interval for EGM expenditure per adult when EGM = 6 and unemployment = 5.4.
§ We cannot use the above equation directly to get a confidence interval.
§ We need to define a new set of independent variables, EGM0 = EGM – 6 and unemployed0 = unemployed – 5.4, and re-estimate the regression; the intercept of the re-estimated equation is the predicted value at those settings.
§ Construct a 95% confidence interval for the expected EGM expenditure per adult: 468.84 ± 1.996(17.766).
§ Note we have used the 5% critical value from the t distribution with df = 67.
Reporting regression results
§ The estimated OLS equation should always be reported.
§ The standard errors should always be included along with the estimated coefficients.
§ The R-squared should be included.
§ The sample size should be included.

Chapter 7: Model specification

Learning objectives
▪ Discussion on functional form
▪ Specification errors
▪ Multicollinearity

More on using logarithmic functional forms
▪ Convenient percentage/elasticity interpretation.
▪ Slope coefficients of logged variables are invariant to rescalings.
▪ Taking logs often eliminates/mitigates problems with outliers.
▪ Taking logs often helps to secure normality and homoscedasticity.
▪ Do not log variables measured in units such as years.
▪ Do not log variables measured in percentage points.
▪ Logs must not be used if variables take on zero or negative values.
▪ It is hard to reverse the log operation when constructing predictions.

More on using logarithmic functional forms: example 1
▪ Median house price explained by the amount of nitrogen oxide in the air and the number of bedrooms.
▪ The coefficient on log(nox) is the elasticity of price with respect to nox (pollution).
▪ The exact percentage change in the predicted y is obtained from the exponential of the estimated log change, rather than from the simple linear approximation (see example 2 below).

More on using logarithmic functional forms: example 2
▪ Log price explained by the student–teacher ratio (stratio).
▪ If stratio increases by one, price decreases by approximately 5.1%.
▪ The exact proportionate change is exp(–.051) – 1 ≈ –.050.
▪ If we increase stratio by five, the approximate percentage change is –26.1%.
▪ The exact proportionate change is 100[exp(–.26) – 1] ≈ –22.9%.

Models with quadratics
▪ Used to capture decreasing and increasing marginal effects.
▪ Estimated equation, approximation of the marginal effect, and turning point.

Models with quadratics example: wages and experience
▪ Wage explained by experience and experience squared; experience has a diminishing effect on wage.
▪ The effect of the first year of experience is $0.298; of the second year, .298 – 2(.0061)(1) ≈ $0.286.
▪ Going from ten years to eleven years: .298 – 2(.0061)(10) = $0.176.
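
The marginal effects and the turning point of the quadratic follow directly from the coefficients .298 and .0061 quoted above; a small sketch:

```python
# Sketch: marginal effect and turning point in wage = ... + .298*exper - .0061*exper^2 + ...
b1, b2 = 0.298, -0.0061

def marginal_effect(exper):
    # d(wage)/d(exper) = b1 + 2*b2*exper
    return b1 + 2 * b2 * exper

print(marginal_effect(1))      # effect of the second year of experience, about .286
print(marginal_effect(10))     # effect of going from ten to eleven years, about .176
print(-b1 / (2 * b2))          # turning point: about 24.4 years of experience
```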
Example: wages and experience (quadratic relationship)
▪ Does this mean the return to experience becomes negative after 24.4 years?
▪ Not necessarily. It depends on how many observations in the sample lie to the right of the turnaround point.
▪ In the given example, these are about 28% of the observations. There may be a specification problem (e.g. omitted variables).

Example: effects of pollution on housing prices
▪ House price explained by nitrogen oxide in the air, distance from employment centres, the student–teacher ratio and the number of rooms (entering as a quadratic).
▪ Calculation of the turnaround point.
▪ Does this mean that, at a low number of rooms, more rooms are associated with lower prices? This area can be ignored, as it concerns only 1% of the observations.
▪ Increase rooms from 5 to 6, and from 6 to 7, to see how the estimated price effect changes.

Example: effects of pollution on housing prices (cont. 2)
▪ Houses with more than 4.4 bedrooms will be worth more.
▪ It does not make sense that a house will be worth less if it has three bedrooms and is adding a fourth.
▪ Looking at the sample, less than 1% of the sample had fewer than 4.4 bedrooms.
▪ We can assume adding a room has an increasing effect on price.

Models with interaction terms
▪ y = α0 + δ1x1 + δ2x2 + δ3(x1 – μ1)(x2 – μ2) + u

Example: German-made cars
▪ Number of German cars in a postal code explained by the number of German people, mean family income, total population per postal code and total car registrations.

Example: German-made cars (cont.)
▪ We cannot just look at the coefficient on german.
▪ To estimate the partial effect of german on german_car, we need to plug in a value of m_fam (because of the interaction term) to find the partial effect.

Including irrelevant variables in a regression model
▪ Inclusion of an irrelevant variable (overspecifying the model) means including a variable that has no partial effect on y in the population.
▪ Including one or more irrelevant variables does not affect the unbiasedness of the OLS estimators.

Omitted variable bias: the simple case
▪ The OLS estimators can be biased when a relevant variable is omitted.
▪ Example: wages. The correctly specified model includes ability; the estimated model does not observe ability.

Summarising the direction of the bias
▪ The direction of the bias depends on the sign of the coefficient on the omitted variable and on the sign of the correlation between the included and omitted variables.

Omitted variable bias: more general cases
▪ Given a model with explanatory variables x1, x2 and x3, another model is estimated that omits x3.
❑ Suppose x2 and x3 are not correlated but x1 and x3 are.
❑ The estimators for x1 and x2 will be biased unless x1 and x2 are uncorrelated.
❑ When x1 and x2 are uncorrelated, the bias of the estimator for x1 reduces to the simple omitted-variable-bias formula.
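
Omitted variable bias is easy to see in a simulation: when the omitted variable (here 'ability', an assumed data-generating process, not from the slides) is correlated with an included regressor, the short regression's coefficient is biased.

```python
# Sketch: simulating omitted variable bias in a wage equation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5_000
ability = rng.normal(size=n)
educ = 12 + 2 * ability + rng.normal(size=n)           # education is correlated with ability
log_wage = 1.5 + 0.08 * educ + 0.10 * ability + rng.normal(scale=0.3, size=n)

correct = sm.OLS(log_wage, sm.add_constant(np.column_stack([educ, ability]))).fit()
omitted = sm.OLS(log_wage, sm.add_constant(educ)).fit()

print(correct.params[1])   # close to the true 0.08
print(omitted.params[1])   # biased upward, because ability is omitted and correlated with educ
```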

Variances in misspecified models

Functional form misspecification
▪ Not accounting for the correct functional form of the relationship between the dependent and the observed explanatory variables.
▪ For example, estimating the model but omitting exper².
▪ Another way to misspecify is to use wage as the dependent variable instead of log(wage).
▪ Use an F test to detect misspecified functional form.

RESET as a general test for functional form misspecification
▪ Regression Specification Error Test (RESET).
▪ The null hypothesis of RESET is that there is no misspecification in the model.
▪ Powers of the fitted values are added to the regression and tested jointly; when the squared and cubed fitted values are added, the F statistic has 2 numerator degrees of freedom.
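
A sketch of RESET computed by hand on simulated data: powers of the fitted values are added to the regression and tested jointly (statsmodels also ships a built-in version of this test; the data-generating process below is assumed for illustration).

```python
# Sketch: RESET by hand — add yhat^2 and yhat^3 to the regression and F-test them jointly.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(1, 10, size=n)
y = 2 + 0.5 * x + 0.3 * x**2 + rng.normal(size=n)      # true relationship is quadratic

X = sm.add_constant(x)
base = sm.OLS(y, X).fit()                               # (misspecified) linear model
yhat = base.fittedvalues

X_aug = sm.add_constant(np.column_stack([x, yhat**2, yhat**3]))
aug = sm.OLS(y, X_aug).fit()

# H0: the coefficients on yhat^2 and yhat^3 are jointly zero (no misspecification)
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
f_res = aug.f_test(R)
print(f_res.fvalue, f_res.pvalue)                        # a large F / small p-value signals misspecification
```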

Example: housing price equation
▪ Two models are estimated (n = 88).
▪ Model 1: the RESET statistic is 4.67; p-value = 0.012. There is evidence of functional form misspecification.
▪ Model 2: the RESET statistic is 2.56; p-value = 0.084. There is no evidence of functional form misspecification at the 5% level.
▪ The log-log model is preferred.

Tests against non-nested alternatives
▪ It is possible to test a linear model against a linear-log model.
▪ First approach: construct a comprehensive model that contains each model as a special case, and then test the restrictions that reduce it to each of the candidate models.
▪ Second approach: estimate the other model by OLS to obtain its fitted values; then estimate the original model with these fitted values added as an extra regressor and perform the Davidson-MacKinnon test (a t statistic on the fitted-values regressor).
▪ A significant t statistic is a rejection of the (original) linear model.
Multicollinearity
▪ Linear relationships between explanatory variables can create problems.
▪ High multicollinearity can occur when Rj² is 'close' to 1.
▪ It is best to have less correlation between xj and the other independent variables.
▪ For example, examining the effect of various school expenditure categories on school performance: it is expected that wealthier schools will spend more on everything than less wealthy schools.
▪ It can be difficult to estimate the effect of any one category on student performance when there is little variation in that category relative to the others.
▪ We can include control variables to isolate causal effects.

Chapter 8: Multiple regression analysis with qualitative information: binary (or dummy) variables

Learning objectives
▪ Describing qualitative information
▪ A single dummy independent variable
▪ Using dummy variables for multiple categories
▪ Interactions involving dummy variables
▪ A binary dependent variable: the linear probability model
▪ Interpreting regression results with discrete dependent variables

Describing qualitative information
▪ Qualitative information:
❑ Comes in the form of binary information.
❑ Example 1: a person is male or female.
❑ Example 2: a student attends a private or public school.
▪ A way to incorporate qualitative information is to use dummy variables (i.e. binary variables or zero–one variables).
❑ Example: female is a dummy variable, which takes the value of 1 if the student is female, and zero otherwise.
❑ They may appear as the dependent or as independent variables.

Single dummy independent variable
▪ How is binary information incorporated into a regression model?
▪ Dummy variable: female = 1 if the person is a woman, = 0 if the person is a man.
▪ Alternative interpretation of the coefficient on female: the wage gain/loss if the person is a woman rather than a man (holding other things fixed), i.e. the difference in mean wage between men and women with the same level of education.
▪ Graphical illustration: the dummy produces an intercept shift.

Dummy variable trap
A model that includes both female and male (together with an intercept) cannot be estimated: perfect collinearity, as female + male = 1, so male is a perfect linear function of female.
Base group or benchmark group
When using dummy variables, one category always has to be omitted:
▪ If male is omitted, the base category is men.
▪ If female is omitted, the base category is women.
Alternatively, one could omit the intercept. Disadvantages:
1. More difficult to test for differences between parameters.
2. R-squared formula only valid if the regression contains an intercept.

Estimated wage equation with intercept shift
Holding education and experience fixed, women earn $6.37 less per hour than men.
Does that mean that women are discriminated against? Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.

Example: effects of accessing online subject materials on assignment mark
Assignment mark is regressed on a dummy indicating whether the student reviewed the online material.
This is an example of program evaluation (a small numerical sketch follows below):
▪ Treatment group (= reviewed online) vs. control group (= not reviewed)
▪ Is the effect of treatment on the outcome of interest causal?
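As a rough illustration, the sketch below estimates such a treatment-dummy regression on simulated data; the variable names and numbers are made up, not the lecture's results.

```python
# Hedged sketch: program-evaluation style regression with a single treatment dummy (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "reviewed": rng.integers(0, 2, n),      # 1 if the student reviewed the online material
    "tutorials": rng.integers(0, 12, n),    # number of tutorials attended
})
df["mark"] = 50 + 5 * df.reviewed + 1.5 * df.tutorials + rng.normal(0, 8, n)

res = smf.ols("mark ~ reviewed + tutorials", data=df).fit()
print(res.params["reviewed"])   # estimated treatment effect, holding tutorials attended fixed
```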

Using a dummy independent variable to account for outliers
▪ Outliers can occur when sampling from a population.
▪ They may be different in some relevant aspect from the rest of the population.
▪ Example: Did countries that adopted large fiscal packages outperform those that didn't?
▪ We expect Greece, Hungary, Iceland and Ireland to be outliers due to their fiscal circumstances.

Using a dummy independent variable to account for outliers (cont.)
▪ It was found that a positive and significant relationship existed between fiscal stimulus and performance.
▪ Create a dummy variable 'DUM' for the observations that are Greece, Hungary, Iceland and Ireland.
▪ The dummy variables are not significant.
▪ There is not a strong case to treat these observations differently, so we do not discard these four observations.

Example: housing price regression
▪ A dummy indicates whether the house is of colonial style.
▪ As the dummy for colonial style changes from 0 to 1, the house price increases by approximately 5.4 per cent.

Using dummy variables for multiple categories
1. Define membership in each category by a dummy variable.
2. Leave out one category (which becomes the base category).
Example: demand for Fords using dummy variables (the dummy variables are circled in red).
Holding other things fixed, the proportion of Fords registered in Victoria is 15.4% higher than in WA (= the base category).

Incorporating ordinal information
Example: city credit ratings on government bond interest rates.
The government bond rate is regressed on a credit rating from 0–4 (0 = worst, 4 = best).
This specification would probably not be appropriate, as the credit rating only contains ordinal information. A better way to incorporate this information is to define dummies indicating whether the particular rating applies, e.g. CR1 = 1 if CR = 1 and CR1 = 0 otherwise. All effects are measured in comparison to the worst rating (= base category). (A small sketch of this dummy construction follows after the next slide.)

Interactions among dummy variables
▪ Examine the interaction term between female and married in the log wage model.
▪ We can examine the wage difference between single and married females and males respectively.
▪ We can test the null hypothesis that the gender differential does not depend on marital status.
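One possible way to build the rating dummies is patsy's categorical coding inside a statsmodels formula; the sketch below uses simulated ratings and a hypothetical bond-rate equation, with the worst rating as the base.

```python
# Hedged sketch: turning an ordinal 0-4 credit rating into dummies, with rating 0 as the base category.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({"rating": rng.integers(0, 5, n)})
df["bondrate"] = 8 - 0.6 * df.rating + rng.normal(0, 0.5, n)   # better rating -> lower rate

# C(..., Treatment(0)) creates one dummy per rating level, measured relative to rating 0.
res = smf.ols("bondrate ~ C(rating, Treatment(0))", data=df).fit()
print(res.params)
```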
Allowing for different slopes
▪ Interaction term female·educ: interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women.
▪ The base coefficients give the intercept and slope for men; adding the coefficients on female and on female·educ gives the intercept and slope for women.
▪ Interesting hypotheses: (1) the return to education is the same for men and women; (2) the whole wage equation is the same for men and women.

Allowing for different slopes (cont.)
Example: log hourly wage equation
▪ The estimated return to education for men is 8.8%.
▪ The estimated return to education for women is 0.088 + 0.00005.
▪ There is no evidence against the hypothesis that the return to education is the same for men and women.
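A minimal sketch of the interacted specification on simulated data (hypothetical numbers, constructed so the interaction is negligible, as in the example above):

```python
# Hedged sketch: female dummy interacted with educ to allow different intercepts and slopes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 800
df = pd.DataFrame({"female": rng.integers(0, 2, n), "educ": rng.uniform(8, 18, n)})
df["lwage"] = 0.4 + 0.088 * df.educ - 0.2 * df.female + rng.normal(0, 0.4, n)
df["female_educ"] = df.female * df.educ                   # explicit interaction column

res = smf.ols("lwage ~ female + educ + female_educ", data=df).fit()
print(res.params["educ"])                                 # return to education for men
print(res.params["educ"] + res.params["female_educ"])     # return to education for women
print(res.f_test("female = 0, female_educ = 0"))          # H0: whole equation is the same for both groups
```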

Example: log hourly wage equation (cont.)
▪ Does this mean that there is no significant evidence of lower pay for women at the same levels of educ and exper?
▪ No: this is only the effect for educ = 0. To answer the question one has to recentre the interaction term, e.g. use female·(educ − 14.9), where 14.9 is the average level of education.

Testing for differences in regression functions across groups
▪ Unrestricted model (contains the full set of interactions).
▪ Example: assignment mark regressed on a dummy (= 1 if the student viewed the online material, = 0 otherwise), the student's mark in the test, the number of tutorials attended, and interactions.
▪ Null hypothesis: all interaction effects are zero, i.e. the same regression coefficients apply to both groups (e.g. men and women).

Testing for differences in regression functions across groups (cont.)
▪ Estimation of the unrestricted model.
▪ Tested individually, the hypothesis that the interaction effects are zero cannot be rejected.
▪ Restricted model (same regression for both groups).

▪ Joint test with F statistic.
▪ Alternative way to compute the F statistic in the given case (a computational sketch is given below):
❑ Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSRs of these two regressions.
❑ Run a regression for the restricted model and store its SSR.
▪ If the test is computed in this way, it is called the Chow test.
▪ Important: the test assumes a constant error variance across groups.

A binary dependent variable: the linear probability model
▪ Linear regression when the dependent variable is binary.
▪ Linear probability model (LPM): the coefficient on xj measures the change in the probability of success when xj changes, holding all else constant.

Example: transport options of females in New Zealand
▪ Dependent variable: = 1 if the female drives to work, = 0 otherwise.
▪ Another year of age, everything else held fixed, will decrease the probability of driving to work by 0.24 percentage points.
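Returning to the Chow test, here is a rough SSR-based computation on simulated data; group membership, variable names and sample sizes are all hypothetical.

```python
# Hedged sketch of the Chow test: compare the pooled (restricted) SSR with the sum of group SSRs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(5)
n = 600
df = pd.DataFrame({"group": rng.integers(0, 2, n), "x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + 0.4 * df.group + rng.normal(0, 1, n)

formula = "y ~ x1 + x2"
ssr_pooled = smf.ols(formula, data=df).fit().ssr                        # restricted: one equation for all
ssr_unres = sum(smf.ols(formula, data=g).fit().ssr for _, g in df.groupby("group"))

k = 2                                  # slope coefficients per group (excluding the intercept)
q = k + 1                              # restrictions: all coefficients equal across groups
df_denom = n - 2 * (k + 1)
F = (ssr_pooled - ssr_unres) / ssr_unres * df_denom / q
print(F, stats.f.sf(F, q, df_denom))   # Chow F statistic and p-value
```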
Example: transport options of females in New Zealand (cont.)
▪ Graph for Māori = 0, other = 0, Asian = 0, fixed age = 45.
▪ The base case is European, as the dummy variables for ethnicity are 0.
▪ Negative predicted probability if income < $6200. The sample has income ranging from $5000 to $50 000.

Advantages and disadvantages of the linear probability model
Disadvantages
▪ Predicted probabilities may be larger than one or smaller than zero.
▪ Marginal probability effects are sometimes logically impossible.
▪ The linear probability model is necessarily heteroscedastic (the variance of a Bernoulli variable depends on x).
▪ Heteroscedasticity-consistent standard errors need to be computed.
Advantages
▪ Easy estimation and interpretation.
▪ Estimated effects and predictions are often reasonably good in practice.

MORE ON POLICY ANALYSIS AND PROGRAM EVALUATION
Example: Effect of job training grants on worker productivity
▪ The firm's scrap rate is regressed on a dummy: = 1 if the firm received a training grant, = 0 otherwise.
▪ There is no apparent effect of the grant on productivity.
▪ Treatment group: grant receivers.
▪ Control group: firms that received no grant.
▪ Grants were given on a first-come, first-served basis. This is not the same as giving them out randomly. Firms with less-productive workers may have seen an opportunity to improve productivity and applied first.

ADDRESSING THE PROBLEM OF SELF-SELECTION
▪ Consider the simple regression of the outcome on the treatment indicator w.
▪ We need to make the strong assumption that w is independent of [y(0), y(1)]. In other words, treatment is randomly assigned.
❑ A more convincing case is to include covariates x1 through xj. For example, children eligible for a program like Head Start participate based on parental decisions. We thus need to control for things like family background and structure to get closer to random assignment into the treatment (participates in Head Start) and control (does not participate) groups.

ADDRESSING THE PROBLEM OF SELF-SELECTION (CONT.)
▪ We include x1 through xj to account for the possibility that the treatment (w) is not randomly assigned.
▪ Now we assume that w is independent of [y(0), y(1)] conditional upon x1 through xj.
❑ This is known as regression adjustment and allows us to adjust for differences across units in estimating the causal effect of the treatment.

Interpreting regression results with discrete dependent variables
▪ To illustrate the issues, use the following example:
▪ It is not possible to have a fraction of a child.
▪ If we take the coefficient of educ literally, every additional year of education reduces the estimated number of children by 0.09.
▪ We have to interpret/summarise this information differently.
▪ If each woman in a group of 100 obtains an additional year of education, there will be nine fewer children among them.

Clicking on the link below increases happiness by 13%*
http://eselt.adelaide.edu.au/blue
* Results vary.

Chapter 9
HETEROSKEDASTICITY
Learning objectives
Consequences of heteroscedasticity for OLS
Heteroscedasticity-robust inference after OLS estimation
Testing for heteroscedasticity
Weighted least squares estimation
The linear probability model revisited

Consequences of heteroscedasticity for OLS
▪ OLS is still unbiased and consistent under heteroscedasticity.
▪ Also, the interpretation of R-squared is not changed: the unconditional error variance is unaffected by heteroscedasticity (which refers to the conditional error variance).
▪ Heteroscedasticity invalidates the variance formulas for the OLS estimators.
▪ The usual F tests and t tests are not valid under heteroscedasticity.
▪ Under heteroscedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there may be more efficient linear estimators.

Heteroscedasticity-robust inference after OLS estimation
▪ Formulas for OLS standard errors and related statistics have been developed that are robust to heteroscedasticity of unknown form.
▪ All formulas are only valid in large samples.
▪ The formula for the heteroscedasticity-robust OLS standard error (also called White/Eicker standard errors) involves the squared residuals from the regression and from a regression of xj on all other explanatory variables.
▪ Using these formulas, the usual t test is valid asymptotically.
▪ The usual F statistic does not work under heteroscedasticity, but heteroscedasticity-robust versions are available in most software.
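For instance, in statsmodels the robust covariance can be requested at fit time; the sketch below compares usual and robust standard errors on simulated data with an error variance that grows with x.

```python
# Hedged sketch: OLS with conventional vs. heteroscedasticity-robust (White/Eicker) standard errors.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 500
df = pd.DataFrame({"x": rng.uniform(1, 10, n)})
df["y"] = 1 + 0.5 * df.x + rng.normal(0, 0.3 * df.x)     # error variance grows with x

usual = smf.ols("y ~ x", data=df).fit()
robust = smf.ols("y ~ x", data=df).fit(cov_type="HC1")   # heteroscedasticity-robust covariance
print(usual.bse)    # conventional standard errors
print(robust.bse)   # robust standard errors (valid in large samples)
```

Other robust variants (HC0 to HC3) differ only in small-sample adjustments.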

Multiple regression analysis: heteroscedasticity
Example: Fords – total registration
▪ Heteroscedasticity-robust standard errors may be larger or smaller than non-robust counterparts. The differences are often small in practice.
▪ F statistics are also often not too different.
▪ If there is strong heteroscedasticity, differences may be larger. To be on the safe side, it is advisable to always compute robust standard errors.

Testing for heteroscedasticity
▪ It may still be interesting whether there is heteroscedasticity, because then OLS may not be the most efficient estimator any more.
Breusch-Pagan test for heteroscedasticity
▪ Under MLR.4, the mean of u² must not vary with x1, x2, …, xk.

Testing for heteroscedasticity (cont.)
Breusch-Pagan test for heteroscedasticity (cont.)
▪ Regress the squared residuals on all explanatory variables and test whether this regression has explanatory power.
▪ A large test statistic (= a high R-squared) is evidence against the null hypothesis.
▪ Alternative test statistic (= Lagrange multiplier statistic, LM). Again, high values of the test statistic (= high R-squared) lead to rejection of the null hypothesis that the expected value of u² is unrelated to the explanatory variables.
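A minimal sketch of the Breusch-Pagan test using statsmodels' built-in helper on simulated data:

```python
# Hedged sketch of the Breusch-Pagan test: regress squared OLS residuals on the regressors.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({"x1": rng.uniform(1, 10, n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.4 * df.x1 + 0.2 * df.x2 + rng.normal(0, 0.2 * df.x1)

res = smf.ols("y ~ x1 + x2", data=df).fit()
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, sm.add_constant(df[["x1", "x2"]]))
print(lm_pval, f_pval)   # small p-values reject H0 that E(u^2) is unrelated to the x's
```

The helper returns both the LM form and the F form of the test.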

Example: heteroscedasticity in housing price equations
▪ In the level specification, there is evidence of heteroscedasticity.
▪ Example: heteroscedasticity in (log) housing price equations: in the logarithmic specification, homoscedasticity cannot be rejected.

Testing for heteroscedasticity (cont.)
The White test for heteroscedasticity
▪ Regress the squared residuals on all explanatory variables, their squares and interactions (here: example for k = 3).
▪ The White test detects more general deviations from homoscedasticity than the Breusch-Pagan test.
▪ Disadvantage of this form of the White test: including all squares and interactions leads to a large number of estimated parameters (e.g. k = 6 leads to 27 parameters to be estimated).

Testing for heteroscedasticity (cont.)
Alternative form of the White test
▪ This regression indirectly tests the dependence of the squared residuals on the explanatory variables, their squares and interactions, because the predicted value of y and its square implicitly contain all of these terms.
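A rough sketch of this alternative form on simulated data: regress the squared residuals on the fitted values and their square, so only two restrictions are tested regardless of k.

```python
# Hedged sketch of the simplified White test: regress squared residuals on yhat and yhat^2.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 400
df = pd.DataFrame({"x1": rng.uniform(1, 10, n), "x2": rng.normal(size=n), "x3": rng.uniform(0, 5, n)})
df["y"] = 2 + 0.3 * df.x1 - 0.2 * df.x2 + 0.1 * df.x3 + rng.normal(0, 0.15 * df.x1)

res = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
aux = pd.DataFrame({"u2": res.resid**2, "yhat": res.fittedvalues, "yhat2": res.fittedvalues**2})
white = smf.ols("u2 ~ yhat + yhat2", data=aux).fit()
print(white.f_test("yhat = 0, yhat2 = 0"))   # only two restrictions, whatever the number of regressors
```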
Weighted least squares estimation
▪ The functional form of the heteroscedasticity is known.
▪ The transformed model is homoscedastic.
▪ If the other Gauss-Markov assumptions hold as well, OLS applied to the transformed model is the best linear unbiased estimator!

Weighted least squares estimation
▪ OLS in the transformed model is weighted least squares (WLS).
▪ Observations with a large variance get a smaller weight in the optimisation problem.
▪ Why is WLS more efficient than OLS in the original model?
❑ Observations with a large variance are less informative than observations with a small variance and therefore should get less weight.
▪ WLS is a special case of generalised least squares (GLS).

Example: savings and income (cont.)
▪ Note that this regression model has no intercept.

Example: expenditure per adult
▪ Explanatory variables include net financial wealth and the number of EGMs per 1000 adults.
▪ Assumed form of heteroscedasticity: the error variance is proportional to a known function of the explanatory variables.
▪ The WLS estimates have considerably smaller standard errors (which is in line with the expectation that they are more efficient).

Important special case of heteroscedasticity
▪ If the observations are reported as averages at the city/county/state/country/firm level, they should be weighted by the size of the unit.
▪ Example: the average contribution to a pension plan in firm i is regressed on average earnings and age in firm i and the percentage the firm contributes to the plan; the error term is heteroscedastic.
▪ If the individual-level errors are homoscedastic, the variance of the firm-average error is proportional to 1/mi, where mi is firm size.
▪ If errors are homoscedastic at the individual level, WLS with weights equal to firm size mi should be used. If the assumption of homoscedasticity at the individual level is not exactly right, one can calculate robust standard errors after WLS (i.e. for the transformed model).

Unknown heteroscedasticity function (feasible GLS)
▪ Assume a general form of heteroscedasticity; an exp-function is used to ensure positivity.
▪ Multiplicative error (assumption: independent of the explanatory variables).
▪ Use the inverse values of the estimated heteroscedasticity function as weights in WLS.
▪ Feasible GLS is consistent and asymptotically more efficient than OLS.
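A minimal sketch of feasible GLS under these assumptions, on simulated data: estimate log(û²) on the regressors, exponentiate the fitted values, and use their inverses as WLS weights.

```python
# Hedged sketch of feasible GLS with an exponential variance function (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 500
df = pd.DataFrame({"x1": rng.uniform(1, 10, n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + rng.normal(0, np.exp(0.2 * df.x1))

ols = smf.ols("y ~ x1 + x2", data=df).fit()
df["logu2"] = np.log(ols.resid**2)                        # exp-form guarantees positive variance estimates
df["hhat"] = np.exp(smf.ols("logu2 ~ x1 + x2", data=df).fit().fittedvalues)

fgls = smf.wls("y ~ x1 + x2", data=df, weights=1.0 / df.hhat).fit()   # weights = inverse of estimated h(x)
print(fgls.params)
```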

Example: demand for cigarettes
▪ Estimation by OLS: cigarettes smoked per day regressed on logged income and cigarette price, smoking restrictions in restaurants, and other controls.
▪ Homoscedasticity is rejected.

Estimation by FGLS
▪ The income elasticity is now statistically significant; other coefficients are also more precisely estimated (without changing the qualitative results).

What if the assumed heteroscedasticity function is wrong?
▪ If the heteroscedasticity function is misspecified, WLS is still consistent under MLR.1–MLR.4, but robust standard errors should be computed.
▪ WLS is consistent under MLR.4 but not necessarily under MLR.4'.
▪ If OLS and WLS produce very different estimates, this typically indicates that some other assumptions (e.g. MLR.4') are wrong.
▪ If there is strong heteroscedasticity, it is still often better to use a wrong form of heteroscedasticity in order to increase efficiency.
The linear probability model revisited
WLS in the linear probability model
▪ In the LPM, the exact form of heteroscedasticity is known: Var(y|x) = p(x)[1 − p(x)], where p(x) is the probability of success.
▪ Use the inverse values as weights in WLS.
Discussion
▪ Infeasible if LPM predictions are below zero or greater than one.
▪ If such cases are rare, they may be adjusted to values such
as .01/.99.
▪ Otherwise, it is probably better to use OLS with robust
standard errors.
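A minimal sketch of WLS in the LPM on simulated data, including the clipping of predictions that fall outside (0, 1) when such cases are rare; the variables are hypothetical.

```python
# Hedged sketch: WLS for the linear probability model, with weights 1 / [p(x)(1 - p(x))].
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n = 1000
df = pd.DataFrame({"inc": rng.uniform(5, 50, n), "age": rng.uniform(20, 60, n)})
p_true = 0.1 + 0.01 * df.inc + 0.002 * df.age
df["drives"] = rng.binomial(1, p_true)

lpm = smf.ols("drives ~ inc + age", data=df).fit()
phat = lpm.fittedvalues.clip(0.01, 0.99)   # adjust rare predictions outside (0, 1)
h = phat * (1 - phat)                      # estimated Var(y|x) for a Bernoulli outcome
wls = smf.wls("drives ~ inc + age", data=df, weights=1.0 / h).fit()
print(wls.params)
```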
