
Linear Regression

Karim Nchare

African School of Economics

November 2020
Functional relations

- Quantitative characteristics of the world are usually entangled in functional relations
- A regression or model specifies an explained variable as a function of an explanatory variable

  y = f(x)

- y is the regressand, response variable, explained variable, dependent variable, or outcome
- x is the regressor, predictor variable, explanatory variable, independent variable, or control variable
Example: Quadratic regression
Rate of change

  ∆x = x1 − x0  and  ∆y = y1 − y0 = f(x1) − f(x0)

- The rate of change measures how y responds to changes in x:

  ∆y/∆x = (f(x1) − f(x0))/(x1 − x0) = (f(x0 + ∆x) − f(x0))/(x1 − x0)

- It depends both on the initial point and the magnitude of the change
Linear model

- A model is linear if it can be written as:

  y = β0 + β1 x

- This means that the graph of the regression is a (straight) line
Slope coefficient

- The slope of a linear model equals β1, independently of x0 and ∆x:

  ∆y/∆x = (y1 − y0)/(x1 − x0)
        = ((β0 + β1 x1) − (β0 + β1 x0))/(x1 − x0)
        = β1 (x1 − x0)/(x1 − x0)
        = β1
The linearity assumption

- The linearity assumption is less restrictive than it appears
- The following model is clearly nonlinear:

  y = log(γ0 x^γ1)

- After some relabelling:

  β0 = log(γ0),  β1 = γ1,  z = log(x)

- We obtain the linear model (see the sketch below)

  y = log(γ0) + γ1 log(x) = β0 + β1 z
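
As a quick numerical check (not from the slides), the Python sketch below uses illustrative values γ0 = 2 and γ1 = 0.5: regressing y on z = log(x) recovers β0 = log(γ0) and β1 = γ1.

# Sketch (assumed values): the nonlinear model y = log(g0 * x**g1)
# becomes linear after the relabelling z = log(x).
import numpy as np

g0, g1 = 2.0, 0.5          # illustrative "true" parameters
x = np.linspace(1.0, 10.0, 50)
y = np.log(g0 * x**g1)     # nonlinear in x ...
z = np.log(x)              # ... but linear in z = log(x)

# OLS of y on z recovers beta0 = log(g0) and beta1 = g1
beta1, beta0 = np.polyfit(z, y, 1)
print(beta0, np.log(g0))   # ~0.693 in both cases
print(beta1, g1)           # ~0.5 in both cases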


Approximating nonlinear models

- Suppose that the true relationship between x and y is given by

  y = f(x)

- We can always abstract from potential nonlinearities and use a linear model

  ỹ = β0 + β1 x ≈ y = f(x)

- If f is not linear, the approximation will be inexact and there will be approximation errors (see the sketch below)

  ε = y − ỹ
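
A minimal sketch of the idea, with an assumed nonlinear relationship f(x) = √x: the best-fitting line approximates y, and the gap ε = y − ỹ is the approximation error.

# Sketch with an assumed nonlinear f(x) = sqrt(x): a fitted line
# approximates y, and eps = y - y_tilde is the approximation error.
import numpy as np

x = np.linspace(0.0, 4.0, 41)
y = np.sqrt(x)                      # true nonlinear relationship

b1, b0 = np.polyfit(x, y, 1)        # best-fitting line
y_tilde = b0 + b1 * x               # linear approximation
eps = y - y_tilde                   # approximation errors

print(round(b0, 3), round(b1, 3))
print(round(np.abs(eps).max(), 3))  # the approximation is inexact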
Multivariate regressions

- The value of the response variable may be a function of many regressors

  y = f(x1, x2, ..., xk)

- We can still have linear models

  y = β0 + β1 x1 + β2 x2 + · · · + βk xk

- In this case, each coefficient βj is still a measure of change, holding every other variable constant:

  ∆y/∆xj = βj

- For multivariate regressions, linearity assumes separability
Unobserved variables

- We may not know or observe all the variables which affect y

  y = β0 + β1 x1 + β2 x2 + · · · + βk xk

- If β2 x2 + · · · + βk xk is unobserved, we can still approximate y with the variables that we do observe

  ỹ = β0 + β1 x1

- As before, this approximation is inexact and has an approximation error

  ε = y − ỹ = β2 x2 + · · · + βk xk
Stochastic regression

- Most of the time there is uncertainty because (at least):
  - We are not certain about the linearity of the regression
  - We cannot list all the relevant regressors
  - We may have measurement error issues
- Uncertainty is captured by a stochastic error term ε:

  y = β0 + β1 x + ε

- β0 + β1 x is called the deterministic component of the model
- ε is called the random component of the model
Stochastic regression

- Assume that the error has zero mean conditional on x
- Then the deterministic component corresponds to the mean of y conditional on x:

  E(y | x) = E(β0 + β1 x + ε | x) = β0 + β1 x

- The slope coefficient then measures the average per-unit effect of a change in x on the average value of y conditional on x:

  E(y | x1) − E(y | x0) = β1 (x1 − x0)


Random sample

- We are usually interested in different observations coming from:
  - Cross-sectional data – different sources
  - Time series – a single source at different times
  - Panel data – different time series from different sources
- We assume that the data come from a random sample {xi, yi, εi} (see the sketch below)
- xi and yi are observed but εi is not, and we have a collection of equations

  yi = β0 + β1 xi + εi

- In the case of a multivariate regression:

  yi = β0 + β1 x1i + · · · + βk xki + εi
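
A minimal simulation sketch (the coefficient values and error distribution are assumed, not from the slides): it generates a random sample {xi, yi} in which εi exists but would be unobserved in practice.

# Sketch of a simulated random sample {x_i, y_i} from
# y_i = b0 + b1*x_i + eps_i, with assumed b0 = 1, b1 = 2.
import numpy as np

rng = np.random.default_rng(0)
n, b0, b1 = 200, 1.0, 2.0
x = rng.uniform(0.0, 10.0, n)
eps = rng.normal(0.0, 1.0, n)   # exists, but unobserved in practice
y = b0 + b1 * x + eps           # observed along with x
print(x[:3], y[:3])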
Predictions and residuals

- Suppose that we have estimates β̂0 and β̂1; the estimated model is then

  ŷ = β̂0 + β̂1 x

- Given an estimated model, for each realization of xi the predicted value of yi is:

  ŷi = β̂0 + β̂1 xi

- The corresponding residual is:

  ei = yi − ŷi

- Notice we cannot guarantee that ei = εi unless we know β0 and β1 (see the sketch below)
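
A small sketch with made-up data and illustrative estimates β̂0 = 1.1 and β̂1 = 1.9, computing predicted values and residuals; the residuals are not the true errors εi.

# Sketch: predictions and residuals for illustrative estimates
# b0_hat = 1.1, b1_hat = 1.9 (assumed numbers, not from the slides).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.2, 4.9, 7.1, 8.8])

b0_hat, b1_hat = 1.1, 1.9
y_hat = b0_hat + b1_hat * x     # predicted values
e = y - y_hat                   # residuals; not the true errors eps_i
print(y_hat)
print(e)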
A linear regression, random sample
A linear regression, the estimated model
A linear regression, errors vs residuals
Example: height and weight model

- Contest game:
  - If you guess the weight of a participant within 10 lb of the actual weight, you get paid $2
  - Otherwise you pay him or her $3
- You could use height (observable) to estimate the weight

  WEIGHTi = β0 + β1 HEIGHTi + εi

- Given estimated coefficients β̂0 = 103.4 and β̂1 = 6.38, you can make predictions (see the sketch below)

  \widehat{WEIGHT}i = 103.4 + 6.38 HEIGHTi
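
A playful sketch of the contest payoff rule using the estimated coefficients above; the height and actual weights below are made-up illustrative values, and the height units are whatever the slides' dataset uses.

# Sketch of the contest payoff rule with the estimated model above.
# The height and actual weights below are made-up illustrative values.
def predict_weight(height):
    return 103.4 + 6.38 * height

def payoff(height, actual_weight):
    guess = predict_weight(height)
    # within 10 lb: win $2, otherwise pay $3
    return 2.0 if abs(guess - actual_weight) <= 10.0 else -3.0

print(predict_weight(10))   # 167.2
print(payoff(10, 172))      # within 10 lb -> +2
print(payoff(10, 190))      # off by more than 10 lb -> -3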
Example: height and weight, predictions, observations, and residuals
Estimating linear models

- Begin from a dataset coming from a random sample {xi, yi}
- We assume that x and y are related by a model:

  yi = β0 + β1 xi + εi

- We do not observe εi or the true coefficients β0 and β1
- Our objective now is to generate estimates β̂0 and β̂1 of these coefficients to obtain an estimated model

  ŷi = β̂0 + β̂1 xi


Example: linear regression, data generating process
Example: linear regression, realized random sample
Example: linear regression, the best linear model
Example: linear regression, the closest linear model
The best linear model

- Two uses for the estimated model:
  - Prediction – given {xi, yi}, what is the predicted value ŷ for a new value of x?
  - Policy – given {xi, yi}, what is the average change in y associated with a change in x?

    ∆yi = β1 ∆xi ≈ β̂1 ∆xi

- Better predictions when yi ≈ ŷi, i.e. when the residuals are small
- Policy implications only make sense if we establish causality
- Better policy implications when β̂1 ≈ β1 and e ≈ 0
Ordinary least squares

Given a data set, the ordinary least squares (OLS) estimates of β0 and β1 are the numbers β̂0 and β̂1 which minimize the sum of squared residuals:

  SSR = Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi)²

The OLS estimated model is: ŷi = β̂0 + β̂1 xi

- We wish to have small residuals. Small means in magnitude, not sign:

  ei = yi − ŷi = yi − β̂0 − β̂1 xi
Examples: OLS, random samples
Examples: OLS, estimated models
Computing OLS

- When β1 = 0, we know that β̂0 = ȳ (why?)
- Now suppose that we know that β0 = 0, i.e. yi = β1 xi + εi
- In this case we obtain:

  β̂1 = Σ xi yi / Σ xi²

- In the general case, the OLS estimates are given by (see the sketch below):

  β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
  β̂0 = ȳ − β̂1 x̄

- Notice that β̂1 looks like a sample analogue of cov(x, y)/var(x)
- The OLS estimates guarantee that Σ êi = 0
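
A sketch of the general OLS formulas above on a simulated sample (true values β0 = 2 and β1 = 3 are assumed); it also checks that the residuals sum to (numerically) zero.

# Sketch of the OLS formulas above on a simulated sample
# (assumed true values b0 = 2, b1 = 3).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 100)

xbar, ybar = x.mean(), y.mean()
b1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0_hat = ybar - b1_hat * xbar
e = y - (b0_hat + b1_hat * x)

print(b0_hat, b1_hat)        # close to 2 and 3
print(np.sum(e))             # ~0, as guaranteed by OLS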
Example: height and weight, Computing OLS

  β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 590.2/92.55 ≈ 6.38
  β̂0 = ȳ − β̂1 x̄ = 169 − 6.38 × 10 ≈ 105.22
  ŷi = 105.22 + 6.38 xi
Example: geography of trade
Example: military service and income
Example: income vs. fecundity
Example: public debt vs. growth
The need for an intercept

- Most of the time we will be interested in β1 rather than β0
- One could simply estimate

  yi = β1 xi + εi

- But if the true β0 is not zero, forcing β̂0 = 0 may give bad estimates of β1 (see the sketch below)
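
A sketch of why the intercept matters, on simulated data with an assumed nonzero true intercept (β0 = 5, β1 = 2): forcing β̂0 = 0 pulls the slope estimate away from β1.

# Sketch: omitting the intercept when the true b0 is nonzero
# (assumed b0 = 5, b1 = 2) can distort the slope estimate.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 5.0 + 2.0 * x + rng.normal(0, 1, 200)

# With intercept (general OLS formulas)
b1_with = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Without intercept: b1_hat = sum(x*y) / sum(x^2)
b1_without = np.sum(x * y) / np.sum(x ** 2)

print(b1_with)      # close to 2
print(b1_without)   # pulled away from 2 by the omitted intercept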


Multivariate regressions

- The analysis extends to multivariate models

  yi = β0 + β1 x1i + · · · + βk xki + εi

- The interpretation is slightly different: β̂k indicates the response to changes in xk holding the other regressors constant
- OLS is defined in the same way: by minimizing the SSR
- The formulas require linear algebra
- OLS is never done by hand: we use computers (see the sketch below)
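
A minimal sketch of multivariate OLS done by computer, using the linear-algebra routine numpy.linalg.lstsq on simulated data with assumed coefficients (1, 2, −3).

# Sketch: multivariate OLS via linear algebra on simulated data
# (assumed coefficients b0 = 1, b1 = 2, b2 = -3).
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])         # include the intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the SSR
print(beta_hat)                                   # close to [1, 2, -3]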
Example: financial aid

- Response variable: FINAIDi – grant per year to applicant i
- Regressors:
  - PARENTi – feasible contributions from parents
  - HSRANKi – GPA rank in high school
  - GENDERi – gender dummy (1 if male and 0 if female)
Example: financial aid, dataset
Example: financial aid, OLS

Estimated OLS model (ignoring GENDER and HSRANK):

  \widehat{FINAID}i = 15897 − 0.34 PARENTi

Example: financial aid, OLS

Estimated OLS model (ignoring GENDER):

  \widehat{FINAID}i = 8927 − 0.36 PARENTi + 87.4 HSRANKi
Interaction terms

- If the effect of x1 on y depends on the value of x2
- Include an interaction term x1 x2 in the regression

  y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε

- The average effect of a one-unit change in x1 on y is then given by β1 + β3 x2 (see the sketch below):

  E(y | x1′, x2) − E(y | x1, x2) = (x1′ − x1)(β1 + β3 x2)
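
A tiny sketch with assumed coefficient values showing how the average effect of a one-unit change in x1 varies with x2 through the interaction term.

# Sketch: with assumed coefficients, the average effect of a one-unit
# change in x1 depends on x2 through the interaction term.
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -0.4   # illustrative values

def effect_of_x1(x2):
    return b1 + b3 * x2                # marginal effect of x1 given x2

print(effect_of_x1(0.0))   # 2.0
print(effect_of_x1(5.0))   # 0.0: the interaction switches the effect off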
Anscombe’s quartet: data
Example: Anscombe’s quartet, scatterplots
Anscombe’s quartet: estimated models
Evaluating an estimated model

- Is the equation supported by sound theory/common sense?
- How well does the estimated model fit the data?
- Is the dataset reasonably large and accurate?
- Is OLS the best estimator to be used?
- Do the estimated coefficients correspond to prior expectations?
- Are all the important variables included?
- In case we want to do policy: are the estimated parameters structural?
Explained variation

- Regressions are used to explain y
- In particular, we wish to explain why/when yi is different from E(y)
- The variation in y can be decomposed as:

  yi − E(y) = β0 + β1 xi + εi − β0 − β1 E(x)
            = β1 (xi − E(x)) + εi

  where β1 (xi − E(x)) is the explained part and εi the unexplained part

- One way to evaluate models is to measure the proportion of the variance of y that we are able to explain
Explained variation

- Regressions are used to explain y
- In particular, we wish to explain why/when yi is different from ȳ
- The variation in y can be decomposed as:

  yi − ȳ = β0 + β1 xi + εi − β0 − β1 x̄
         = β1 (xi − x̄) + εi

  where β1 (xi − x̄) is the explained part and εi the unexplained part

- One way to evaluate estimated models is to measure the proportion of the variance of y that we are able to explain
Example: Variance decomposition
Variance decomposition

  SST = Σ(yi − ȳ)² = Σ(yi − ŷi + ŷi − ȳ)²
      = Σ(yi − ŷi)² + 2 Σ(yi − ŷi)(ŷi − ȳ) + Σ(ŷi − ȳ)²
      = Σ(yi − ŷi)² + Σ(ŷi − ȳ)²
      = Sum of Squares Residual + Sum of Squares Explained
      = SSR + SSE

  (the cross term drops out because the OLS residuals are orthogonal to the fitted values)
Goodness of fit: R²

- We have decomposed the total variation (SST) into the explained variation (SSE) and the unexplained or residual variation (SSR)
- R² is a measure of how much of the variation of y can be explained by the variation of x according to the estimated model (see the sketch below):

  R² = SSE/SST = (SST − SSR)/SST = 1 − SSR/SST

- The higher the R², the closer the model is to the data, and since 0 ≤ SSR ≤ SST we know that 0 ≤ R² ≤ 1
- It does not measure:
  - How linear/tight the relation between x and y is (correlation)
  - The inclination of the estimated line (slope coefficient)
  - The strength of the causal relation between x and y
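
A sketch that computes SST, SSR, SSE, and R² on a simulated sample (assumed β0 = 1, β1 = 2, and an arbitrary noise scale), confirming SST = SSR + SSE and the two equivalent expressions for R².

# Sketch: variance decomposition and R^2 on a simulated sample
# (assumed b0 = 1, b1 = 2; noise scale chosen arbitrarily).
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 3, 100)

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
y_hat = b0_hat + b1_hat * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y - y_hat) ** 2)         # residual (unexplained) variation
SSE = np.sum((y_hat - y.mean()) ** 2)  # explained variation

print(abs(SST - (SSR + SSE)))          # ~0: the decomposition holds
print(1 - SSR / SST, SSE / SST)        # two equal ways to compute R^2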
Examples
Example: height and weight, Computing OLS
Adding more regressors

- Adding a regressor always (weakly) decreases SSR and therefore always (weakly) increases R², even if y is independent of it. Why?
- This means R² can be improved simply by adding more variables, even though each added variable uses up a degree of freedom
- The adjusted R² controls for this bias (see the sketch below):

  R̄² = 1 − [SSR/(n − K)] / [SST/(n − 1)]

  where n is the sample size and K is the number of parameters

- R̄² = R² when K = 1, and R̄² ≈ R² when n is very large
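
A sketch of the adjusted R² formula with assumed values of SSR, SST, n, and K; it also illustrates that R̄² = R² when K = 1.

# Sketch of the adjusted R^2 formula with assumed values of
# SSR, SST, sample size n, and number of parameters K.
def r2(ssr, sst):
    return 1 - ssr / sst

def adjusted_r2(ssr, sst, n, k):
    return 1 - (ssr / (n - k)) / (sst / (n - 1))

ssr, sst, n = 40.0, 100.0, 50
print(r2(ssr, sst))                 # 0.6
print(adjusted_r2(ssr, sst, n, 2))  # slightly below 0.6
print(adjusted_r2(ssr, sst, n, 1))  # equals R^2 when K = 1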
ANOVA
Example: water supply variables
