
Linear Regression

Karim Nchare

African School of Economics

November 2020
Functional relations

- Quantitative characteristics of the world are usually entangled in functional relations
- A regression or model specifies an explained variable as a function of an explanatory variable

  y = f(x)

- y is the regressand, response variable, explained variable, dependent variable, or outcome
- x is the regressor, predictor variable, explanatory variable, independent variable, or control variable
Example: Quadratic regression
Rate of change

  ∆x = x1 − x0  and  ∆y = y1 − y0 = f(x1) − f(x0)

- The rate of change measures how y responds to changes in x:

  ∆y/∆x = (f(x1) − f(x0))/(x1 − x0) = (f(x0 + ∆x) − f(x0))/(x1 − x0)

- It depends both on the initial point and the magnitude of the change
Linear model

- A model is linear if it can be written as:

  y = β0 + β1 x

- This means that the graph of the regression is a (straight) line
Slope coefficient

- The slope of a linear model equals β1, independently of x0 and ∆x:

  ∆y/∆x = (y1 − y0)/(x1 − x0)
        = ((β0 + β1 x1) − (β0 + β1 x0))/(x1 − x0)
        = β1 (x1 − x0)/(x1 − x0)
        = β1
The linearity assumption

- The linearity assumption is less restrictive than it appears
- The following model is clearly nonlinear:

  y = log(γ0 x^γ1)

- After some relabelling:

  β0 = log(γ0),  β1 = γ1,  z = log(x)

- We obtain the linear model (see the sketch below)

  y = log(γ0) + γ1 log(x) = β0 + β1 z
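
As a quick numerical check (not from the slides), the Python sketch below uses illustrative values γ0 = 2 and γ1 = 0.5: regressing y on z = log(x) recovers β0 = log(γ0) and β1 = γ1.

# Sketch (assumed values): the nonlinear model y = log(g0 * x**g1)
# becomes linear after the relabelling z = log(x).
import numpy as np

g0, g1 = 2.0, 0.5          # illustrative "true" parameters
x = np.linspace(1.0, 10.0, 50)
y = np.log(g0 * x**g1)     # nonlinear in x ...
z = np.log(x)              # ... but linear in z = log(x)

# OLS of y on z recovers beta0 = log(g0) and beta1 = g1
beta1, beta0 = np.polyfit(z, y, 1)
print(beta0, np.log(g0))   # ~0.693 in both cases
print(beta1, g1)           # ~0.5 in both cases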


Approximating nonlinear models

- Suppose that the true relationship between x and y is given by

  y = f(x)

- We can always abstract from potential nonlinearities and use a linear model

  ỹ = β0 + β1 x ≈ y = f(x)

- If f is not linear, the approximation will be inexact and there will be approximation errors (see the sketch below)

  ε = y − ỹ
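
A minimal sketch of the idea, with an assumed nonlinear relationship f(x) = √x: the best-fitting line approximates y, and the gap ε = y − ỹ is the approximation error.

# Sketch with an assumed nonlinear f(x) = sqrt(x): a fitted line
# approximates y, and eps = y - y_tilde is the approximation error.
import numpy as np

x = np.linspace(0.0, 4.0, 41)
y = np.sqrt(x)                      # true nonlinear relationship

b1, b0 = np.polyfit(x, y, 1)        # best-fitting line
y_tilde = b0 + b1 * x               # linear approximation
eps = y - y_tilde                   # approximation errors

print(round(b0, 3), round(b1, 3))
print(round(np.abs(eps).max(), 3))  # the approximation is inexact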
Multivariate regressions

- The value of the response variable may be a function of many regressors

  y = f(x1, x2, ..., xk)

- We can still have linear models

  y = β0 + β1 x1 + β2 x2 + · · · + βk xk

- In this case, each coefficient βj is still a measure of change, holding every other variable constant:

  ∆y/∆xj = βj

- For multivariate regressions, linearity assumes separability
Unobserved variables

- We may not know or observe all the variables which affect y

  y = β0 + β1 x1 + β2 x2 + · · · + βk xk

- If β2 x2 + · · · + βk xk is unobserved, we can still approximate y with the variables that we do observe

  ỹ = β0 + β1 x1

- As before, this approximation is inexact and has an approximation error

  ε = y − ỹ = β2 x2 + · · · + βk xk
Stochastic regression

- Most of the time there is uncertainty because (at least):
  - We are not certain about the linearity of the regression
  - We cannot list all the relevant regressors
  - We may have measurement error issues
- Uncertainty is captured by a stochastic error term ε:

  y = β0 + β1 x + ε

- β0 + β1 x is called the deterministic component of the model
- ε is called the random component of the model
Stochastic regression

- Assume that the error has zero mean conditional on x
- Then the deterministic component corresponds to the mean of y conditional on x:

  E(y | x) = E(β0 + β1 x + ε | x) = β0 + β1 x

- The slope coefficient then measures the average per-unit effect of a change in x on the average value of y conditional on x:

  E(y | x1) − E(y | x0) = β1 (x1 − x0)


Random sample

- We are usually interested in different observations coming from:
  - Cross-sectional data – different sources
  - Time series – a single source at different times
  - Panel data – different time series from different sources
- We assume that the data come from a random sample {xi, yi, εi} (see the sketch below)
- xi and yi are observed but εi is not, and we have a collection of equations

  yi = β0 + β1 xi + εi

- In the case of a multivariate regression:

  yi = β0 + β1 x1i + · · · + βk xki + εi
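
A minimal simulation sketch (the coefficient values and error distribution are assumed, not from the slides): it generates a random sample {xi, yi} in which εi exists but would be unobserved in practice.

# Sketch of a simulated random sample {x_i, y_i} from
# y_i = b0 + b1*x_i + eps_i, with assumed b0 = 1, b1 = 2.
import numpy as np

rng = np.random.default_rng(0)
n, b0, b1 = 200, 1.0, 2.0
x = rng.uniform(0.0, 10.0, n)
eps = rng.normal(0.0, 1.0, n)   # exists, but unobserved in practice
y = b0 + b1 * x + eps           # observed along with x
print(x[:3], y[:3])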
Predictions and residuals

- Suppose that we have estimates β̂0 and β̂1; the estimated model is then

  ŷ = β̂0 + β̂1 x

- Given an estimated model, for each realization of xi the predicted value of yi is:

  ŷi = β̂0 + β̂1 xi

- The corresponding residual is:

  ei = yi − ŷi

- Notice we cannot guarantee that ei = εi unless we know β0 and β1 (see the sketch below)
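
A small sketch with made-up data and illustrative estimates β̂0 = 1.1 and β̂1 = 1.9, computing predicted values and residuals; the residuals are not the true errors εi.

# Sketch: predictions and residuals for illustrative estimates
# b0_hat = 1.1, b1_hat = 1.9 (assumed numbers, not from the slides).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.2, 4.9, 7.1, 8.8])

b0_hat, b1_hat = 1.1, 1.9
y_hat = b0_hat + b1_hat * x     # predicted values
e = y - y_hat                   # residuals; not the true errors eps_i
print(y_hat)
print(e)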
A linear regression, random sample
A linear regression, the estimated model
A linear regression, errors vs residuals
Example: height and weight model

- Contest game:
  - If you guess the weight of a participant within 10 lb of the actual weight, you get paid $2
  - Otherwise you pay him or her $3
- You could use height (observable) to estimate the weight

  WEIGHTi = β0 + β1 HEIGHTi + εi

- Given estimated coefficients β̂0 = 103.4 and β̂1 = 6.38, you can make predictions (see the sketch below)

  \widehat{WEIGHT}i = 103.4 + 6.38 HEIGHTi
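
A playful sketch of the contest payoff rule using the estimated coefficients above; the height and actual weights below are made-up illustrative values, and the height units are whatever the slides' dataset uses.

# Sketch of the contest payoff rule with the estimated model above.
# The height and actual weights below are made-up illustrative values.
def predict_weight(height):
    return 103.4 + 6.38 * height

def payoff(height, actual_weight):
    guess = predict_weight(height)
    # within 10 lb: win $2, otherwise pay $3
    return 2.0 if abs(guess - actual_weight) <= 10.0 else -3.0

print(predict_weight(10))   # 167.2
print(payoff(10, 172))      # within 10 lb -> +2
print(payoff(10, 190))      # off by more than 10 lb -> -3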
Example: height and weight, predictions, observations, and residuals
Estimating linear models

- Begin from a dataset coming from a random sample {xi, yi}
- We assume that x and y are related by a model:

  yi = β0 + β1 xi + εi

- We do not observe εi or the true coefficients β0 and β1
- Our objective now is to generate estimates β̂0 and β̂1 of these coefficients to obtain an estimated model

  ŷi = β̂0 + β̂1 xi


Example: linear regression, data generating process
Example: linear regression, realized random sample
Example: linear regression, the best linear model
Example: linear regression, the closest linear model
The best linear model

- Two uses for the estimated model:
  - Prediction – given {xi, yi}, what is the predicted value ŷ for a new value of x?
  - Policy – given {xi, yi}, what is the average change in y associated with a change in x?

    ∆yi = β1 ∆xi ≈ β̂1 ∆xi

- Better predictions when yi ≈ ŷi, i.e. when the residuals are small
- Policy implications only make sense if we establish causality
- Better policy implications when β̂1 ≈ β1 and e ≈ 0
Ordinary least squares

Given a data set, the ordinary least squares (OLS) estimates of β0 and β1 are the numbers β̂0 and β̂1 which minimize the sum of squared residuals:

  SSR = Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi)²

The OLS estimated model is: ŷi = β̂0 + β̂1 xi

- We wish to have small residuals. Small means in magnitude, not sign:

  ei = yi − ŷi = yi − β̂0 − β̂1 xi
Examples: OLS, random samples
Examples: OLS, estimated models
Computing OLS

- When β1 = 0, we know that β̂0 = ȳ (why?)
- Now suppose that we know that β0 = 0, i.e. yi = β1 xi + εi
- In this case we obtain:

  β̂1 = Σ xi yi / Σ xi²

- In the general case, the OLS estimates are given by (see the sketch below):

  β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
  β̂0 = ȳ − β̂1 x̄

- Notice that β̂1 looks like a sample analogue of cov(x, y)/var(x)
- The OLS estimates guarantee that Σ êi = 0
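
A sketch of the general OLS formulas above on a simulated sample (true values β0 = 2 and β1 = 3 are assumed); it also checks that the residuals sum to (numerically) zero.

# Sketch of the OLS formulas above on a simulated sample
# (assumed true values b0 = 2, b1 = 3).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 100)

xbar, ybar = x.mean(), y.mean()
b1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0_hat = ybar - b1_hat * xbar
e = y - (b0_hat + b1_hat * x)

print(b0_hat, b1_hat)        # close to 2 and 3
print(np.sum(e))             # ~0, as guaranteed by OLS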
Example: height and weight, Computing OLS

  β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 590.2/92.55 ≈ 6.38
  β̂0 = ȳ − β̂1 x̄ = 169 − 6.38 × 10 ≈ 105.22
  ŷi = 105.22 + 6.38 xi
Example: geography of trade
Example: military service and income
Example: income vs. fecundity
Example: public debt vs. growth
The need for an intercept

- Most of the time we will be interested in β1 rather than β0
- One could simply estimate

  yi = β1 xi + εi

- But if the true β0 is not zero, forcing β̂0 = 0 may give bad estimates of β1 (see the sketch below)
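
A sketch of why the intercept matters, on simulated data with an assumed nonzero true intercept (β0 = 5, β1 = 2): forcing β̂0 = 0 pulls the slope estimate away from β1.

# Sketch: omitting the intercept when the true b0 is nonzero
# (assumed b0 = 5, b1 = 2) can distort the slope estimate.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 5.0 + 2.0 * x + rng.normal(0, 1, 200)

# With intercept (general OLS formulas)
b1_with = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Without intercept: b1_hat = sum(x*y) / sum(x^2)
b1_without = np.sum(x * y) / np.sum(x ** 2)

print(b1_with)      # close to 2
print(b1_without)   # pulled away from 2 by the omitted intercept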


Multivariate regressions

- The analysis extends to multivariate models

  yi = β0 + β1 x1i + · · · + βk xki + εi

- The interpretation is slightly different: β̂k indicates the response to changes in xk holding the other regressors constant
- OLS is defined in the same way: by minimizing the SSR
- The formulas require linear algebra
- OLS is never done by hand: we use computers (see the sketch below)
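
A minimal sketch of multivariate OLS done by computer, using the linear-algebra routine numpy.linalg.lstsq on simulated data with assumed coefficients (1, 2, −3).

# Sketch: multivariate OLS via linear algebra on simulated data
# (assumed coefficients b0 = 1, b1 = 2, b2 = -3).
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])         # include the intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the SSR
print(beta_hat)                                   # close to [1, 2, -3]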
Example: financial aid

- Response variable: FINAIDi – grant per year to applicant i
- Regressors:
  - PARENTi – feasible contributions from parents
  - HSRANKi – GPA rank in high school
  - GENDERi – gender dummy (1 if male and 0 if female)
Example: financial aid, dataset
Example: financial aid, OLS

Estimated OLS model (ignoring GENDER and HSRANK):

  \widehat{FINAID}i = 15897 − 0.34 PARENTi

Example: financial aid, OLS

Estimated OLS model (ignoring GENDER):

  \widehat{FINAID}i = 8927 − 0.36 PARENTi + 87.4 HSRANKi
Interaction terms

- If the effect of x1 on y depends on the value of x2
- Include an interaction term x1 x2 in the regression

  y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε

- The average effect of a one-unit change in x1 on y is then given by β1 + β3 x2 (see the sketch below):

  E(y | x1′, x2) − E(y | x1, x2) = (x1′ − x1)(β1 + β3 x2)
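
A tiny sketch with assumed coefficient values showing how the average effect of a one-unit change in x1 varies with x2 through the interaction term.

# Sketch: with assumed coefficients, the average effect of a one-unit
# change in x1 depends on x2 through the interaction term.
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -0.4   # illustrative values

def effect_of_x1(x2):
    return b1 + b3 * x2                # marginal effect of x1 given x2

print(effect_of_x1(0.0))   # 2.0
print(effect_of_x1(5.0))   # 0.0: the interaction switches the effect off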
Anscombe’s quartet: data
Example: Anscombe’s quartet, scatterplots
Anscombe’s quartet: estimated models
Evaluating an estimated model

- Is the equation supported by sound theory/common sense?
- How well does the estimated model fit the data?
- Is the dataset reasonably large and accurate?
- Is OLS the best estimator to be used?
- Do the estimated coefficients correspond to prior expectations?
- Are all the important variables included?
- In case we want to do policy: are the estimated parameters structural?
Explained variation

- Regressions are used to explain y
- In particular, we wish to explain why/when yi is different from E(y)
- The variation in y can be decomposed as:

  yi − E(y) = β0 + β1 xi + εi − β0 − β1 E(x)
            = β1 (xi − E(x)) + εi

  where β1 (xi − E(x)) is the explained part and εi the unexplained part

- One way to evaluate models is to measure the proportion of the variance of y that we are able to explain
Explained variation

- Regressions are used to explain y
- In particular, we wish to explain why/when yi is different from ȳ
- The variation in y can be decomposed as:

  yi − ȳ = β0 + β1 xi + εi − β0 − β1 x̄
         = β1 (xi − x̄) + εi

  where β1 (xi − x̄) is the explained part and εi the unexplained part

- One way to evaluate estimated models is to measure the proportion of the variance of y that we are able to explain
Example: Variance decomposition
Variance decomposition

  SST = Σ(yi − ȳ)² = Σ(yi − ŷi + ŷi − ȳ)²
      = Σ(yi − ŷi)² + 2 Σ(yi − ŷi)(ŷi − ȳ) + Σ(ŷi − ȳ)²
      = Σ(yi − ŷi)² + Σ(ŷi − ȳ)²
      = Sum of Squares Residual + Sum of Squares Explained
      = SSR + SSE

  (the cross term drops out because the OLS residuals are orthogonal to the fitted values)
Goodness of fit: R²

- We have decomposed the total variation (SST) into the explained variation (SSE) and the unexplained or residual variation (SSR)
- R² is a measure of how much of the variation of y can be explained by the variation of x according to the estimated model (see the sketch below):

  R² = SSE/SST = (SST − SSR)/SST = 1 − SSR/SST

- The higher the R², the closer the model is to the data, and since 0 ≤ SSR ≤ SST we know that 0 ≤ R² ≤ 1
- It does not measure:
  - How linear/tight the relation between x and y is (correlation)
  - The inclination of the estimated line (slope coefficient)
  - The strength of the causal relation between x and y
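
A sketch that computes SST, SSR, SSE, and R² on a simulated sample (assumed β0 = 1, β1 = 2, and an arbitrary noise scale), confirming SST = SSR + SSE and the two equivalent expressions for R².

# Sketch: variance decomposition and R^2 on a simulated sample
# (assumed b0 = 1, b1 = 2; noise scale chosen arbitrarily).
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 3, 100)

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
y_hat = b0_hat + b1_hat * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y - y_hat) ** 2)         # residual (unexplained) variation
SSE = np.sum((y_hat - y.mean()) ** 2)  # explained variation

print(abs(SST - (SSR + SSE)))          # ~0: the decomposition holds
print(1 - SSR / SST, SSE / SST)        # two equal ways to compute R^2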
Examples
Example: height and weight, Computing OLS
Adding more regressors

- Adding a regressor always (weakly) decreases SSR and therefore always (weakly) increases R², even if y is independent of it. Why?
- This means R² can be improved simply by adding more variables, even though each added variable uses up a degree of freedom
- The adjusted R² controls for this bias (see the sketch below):

  R̄² = 1 − [SSR/(n − K)] / [SST/(n − 1)]

  where n is the sample size and K is the number of parameters

- R̄² = R² when K = 1, and R̄² ≈ R² when n is very large
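
A sketch of the adjusted R² formula with assumed values of SSR, SST, n, and K; it also illustrates that R̄² = R² when K = 1.

# Sketch of the adjusted R^2 formula with assumed values of
# SSR, SST, sample size n, and number of parameters K.
def r2(ssr, sst):
    return 1 - ssr / sst

def adjusted_r2(ssr, sst, n, k):
    return 1 - (ssr / (n - k)) / (sst / (n - 1))

ssr, sst, n = 40.0, 100.0, 50
print(r2(ssr, sst))                 # 0.6
print(adjusted_r2(ssr, sst, n, 2))  # slightly below 0.6
print(adjusted_r2(ssr, sst, n, 1))  # equals R^2 when K = 1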
ANOVA
Example: water supply variables
