Econometrics - Lecture 2 - Simple Regression

The document outlines the Simple Regression Model, which examines the relationship between two variables through a linear equation. It discusses the interpretation, assumptions, and properties of Ordinary Least Squares (OLS) estimators, emphasizing the importance of understanding causal relationships and the limitations of linearity in practice. Additionally, it covers the concepts of goodness-of-fit, incorporating nonlinearities, and estimating error variance in regression analysis.


Lecture 2 - The Simple Regression Model

Course: [107408] ECONOMETRICS

School of Economics and Management,


University of Siena
2024/2025
The Simple Regression Model
Definition of the Simple Regression Model

▶ The simple regression model can be used to study the relationship between two variables.

▶ A simple equation to “explain the variable y in terms of the variable x”:

y = β0 + β1 x + u

where u is the error term, capturing factors other than x that affect y

Interpretation of the Simple Linear Regression Model

▶ Explains how y varies with changes in x

▶ The simple linear regression model is rarely applicable in practice but its discussion is
useful for pedagogical reasons.

Examples

Example (soybean yield and fertilizer): yield = β0 + β1 fertilizer + u

Example (a simple wage equation): wage = β0 + β1 educ + u

Linearity: a one-unit change in x has the same effect on y, regardless of the initial value of x. This
is often unrealistic. For example, in the wage-education example, we might want to allow for
increasing returns: the next year of education has a larger effect on wages than did the previous
year.
Causal Interpretation

▶ When is there a causal interpretation?


Conditional mean independence assumption: the average value of u does not depend on the
value of x, i.e. E(u | x) = E(u).
In the education example, this means that at any specific value of education (say 12 years of
schooling), the factors collected in the error term (like intelligence or experience) average out to
the same value, normalized to zero.

Example: wage equation. The assumption requires that the average level of unobserved factors
such as intelligence is the same at every level of education.

Population Regression Function (PRF)

▶ The conditional mean independence assumption implies that E(y | x) = β0 + β1 x

▶ This means that the average value of the dependent variable can be expressed as a linear
function of the explanatory variable.
▶ It is important to understand that the equation tells us how the average value of y changes
with x; it does not say that y equals β0 + β1 x for all units in the population.

Graphical Form

This graph represents the linear relationship between the dependent variable y and the
independent variable x for the entire population. It is important to note that we do not observe
the entire population. Instead, we typically work with a sample.
Deriving the Ordinary Least Squares (OLS) Estimates

▶ In order to estimate the regression model one needs data


▶ Population parameters represent the true underlying relationship in the entire population.
These are denoted as: β0 , β1
▶ Fitted (Sample) parameters are estimates based on the sample data. These parameters
aim to approximate the population parameters and are denoted with a hat as: β̂0 , β̂1
▶ A random sample of n observations

A random sample is a subset of individuals or observations chosen from a larger population,
where each member of the population has an equal chance of being selected.

So, how can we derive the estimates using our data?

▶ Defining regression residuals

The residual for observation i is the difference between the actual yi and its fitted value:

ûi = yi − ŷi = yi − β̂0 − β̂1 xi

▶ Minimize the sum of the squared regression residuals

We choose β̂0 and β̂1 to make the sum of squared residuals as small as possible. This gives the
same result as the method of moments.

▶ OLS estimators

β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² and β̂0 = ȳ − β̂1 x̄
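The OLS formulas above can be computed directly. The course scripts use R; here is a minimal plain-Python sketch with made-up data, just to make the formulas concrete:

```python
# Minimal OLS sketch (the course scripts use R; this is plain Python).
# Implements b1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
# and        b0 = y_bar - b1 * x_bar.

def ols(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
            sum((xi - x_bar) ** 2 for xi in x)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Made-up points lying exactly on y = 1 + 2x, so the fit recovers (1, 2)
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
b0, b1 = ols(x, y)
print(b0, b1)  # 1.0 2.0
```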

Interpreting β̂1

▶ The OLS estimator for β1 can also be represented as the sample covariance of x and y
divided by the sample variance of x:

β̂1 = Cov(x, y) / Var(x)

1. The more x and y move together (high covariance), the larger β̂1: covariance measures the
strength of the linear relationship between x and y. If they are highly positively correlated,
β̂1 will be larger, reflecting that changes in x strongly co-move with changes in y.
2. Holding the covariance fixed, a larger variance of x makes β̂1 smaller: with a large spread
in x, each individual unit change in x represents a smaller portion of the total variation in x,
so the estimated effect of x on y (captured by β̂1) becomes smaller.
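The covariance-over-variance representation of the slope can be checked numerically. A sketch with made-up data lying exactly on y = 3x + 1, so the ratio should recover a slope close to 3:

```python
# Sketch: the OLS slope equals the sample covariance of (x, y) over the
# sample variance of x. The (n - 1) factors cancel in the ratio.

def sample_cov(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 4, 5, 7]
y = [3 * xi + 1 for xi in x]                 # exactly on y = 3x + 1

slope = sample_cov(x, y) / sample_cov(x, x)  # cov(x, x) is var(x)
print(slope)                                 # close to 3
```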

R stats: Let’s see how to calculate these coefficients in R. Open on Moodle R scripts / Lecture 2
- Simple OLS

▶ OLS fits a regression line through the data points as well as possible

Example of a Simple Regression

CEO salary and return on equity

Fitted regression

▶ Causal interpretation?
Properties of OLS on Any Sample of Data

▶ Fitted values and residuals: ŷi = β̂0 + β̂1 xi and ûi = yi − ŷi

▶ Algebraic properties of OLS regression: the residuals sum to zero (Σi ûi = 0), the sample
covariance between the regressor and the residuals is zero (Σi xi ûi = 0), and the point (x̄, ȳ)
always lies on the OLS regression line
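These algebraic properties can be verified numerically. A plain-Python sketch with made-up data (the `ols` helper re-implements the closed-form estimator):

```python
# Check three algebraic properties of OLS on a small made-up sample:
#   (1) residuals sum to zero,
#   (2) the regressor is uncorrelated with the residuals (sum x_i * u_i = 0),
#   (3) the point (x_bar, y_bar) lies on the fitted line.

def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
b0, b1 = ols(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

assert abs(sum(resid)) < 1e-9                                # property (1)
assert abs(sum(xi * ui for xi, ui in zip(x, resid))) < 1e-9  # property (2)
xb, yb = sum(x) / len(x), sum(y) / len(y)
assert abs((b0 + b1 * xb) - yb) < 1e-9                       # property (3)
```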


▶ This table presents fitted values and residuals for 15 CEOs.


▶ For example, the 12th CEO’s predicted salary is $526,023 higher than their actual salary.
▶ By contrast, the 5th CEO’s predicted salary is $149,493 lower than their actual salary.

R stats: Let’s see if we can confirm the OLS properties in R. Open on Moodle R scripts /
Lecture 2 - Simple OLS
Goodness of Fit

▶ How well does an explanatory variable explain the dependent variable?

Measures of variation:

▶ Total sum of squares: SST = Σi (yi − ȳ)²
▶ Explained sum of squares: SSE = Σi (ŷi − ȳ)²
▶ Residual sum of squares: SSR = Σi ûi²


Decomposition of total variation: SST = SSE + SSR

Goodness-of-fit measure (R-squared): R² = SSE/SST = 1 − SSR/SST, the fraction of the sample
variation in y that is explained by x
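The decomposition and the R-squared can be sketched in plain Python with made-up data (`ols` as before):

```python
# Verify SST = SSE + SSR and compute R^2 = SSE / SST on made-up data.

def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
b0, b1 = ols(x, y)
y_bar = sum(y) / len(y)
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
sse = sum((fi - y_bar) ** 2 for fi in fitted)           # explained variation
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual variation

assert abs(sst - (sse + ssr)) < 1e-9  # decomposition of total variation
r_squared = sse / sst                 # about 0.68 for this sample
```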


CEO Salary and return on equity

So a single explanatory variable (x) explains only a small share of the variation in
salaries

Voting outcomes and campaign expenditures

Caution: A high R-squared does not necessarily mean that the regression has a causal
interpretation!
Incorporating Nonlinearities: Semi-logarithmic Form

We might want to allow for increasing returns: the next year of education has a larger effect on
wages than did the previous year.

Regression of log wages on years of education: log(wage) = β0 + β1 educ + u

This changes the interpretation of the regression coefficient: β1 now measures the approximate
proportionate change in wage for one more year of education, so 100 · β1 is the approximate
percentage change.
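The changed interpretation can be written out explicitly (standard semi-log algebra, not numbers from these slides):

```latex
\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + u
\;\Rightarrow\;
\Delta \log(\mathit{wage}) = \beta_1\,\Delta\mathit{educ}
\;\Rightarrow\;
\%\Delta \mathit{wage} \approx (100\,\beta_1)\,\Delta\mathit{educ}
```

So each additional year of education is associated with an approximate 100 · β1 percent change in wage.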


Fitted regression


CEO salary and firm sales: log(salary) = β0 + β1 log(sales) + u

This changes the interpretation of the regression coefficient: in this log-log form, β1 is the
elasticity of salary with respect to sales, i.e. the approximate percentage change in salary for a
1% increase in sales.


CEO salary and firm sales: fitted regression

▶ The log-log form postulates a constant-elasticity model, whereas the semi-log form
assumes a constant semi-elasticity

▶ Linear regression must be linear in the parameters (coefficients), but the variables can
undergo nonlinear transformations such as x², log(x), or e^x.

Expected Values and Variances of the OLS Estimators

▶ The estimated regression coefficients are random variables because they are calculated
from a random sample

▶ The question is what the estimators estimate on average, and how large their variability is
in repeated samples

First we have to make some assumptions

Standard Assumptions for the Linear Regression Model

Assumption SLR.1 (Linear in parameters): in the population, y = β0 + β1 x + u

Assumption SLR.2 (Random sampling): we have a random sample {(xi, yi): i = 1, . . . , n}
following the population model

Assumptions for the Linear Regression Model (cont.)

Assumption SLR.3 (Sample variation in the explanatory variable): the sample outcomes of x are
not all the same value

Assumption SLR.4 (Zero conditional mean): E(u | x) = 0


Theorem 2.1 (Unbiasedness of OLS): under assumptions SLR.1–SLR.4, E(β̂0) = β0 and
E(β̂1) = β1

Interpretation of unbiasedness:
▶ The estimated coefficients may be smaller or larger, depending on the sample resulting
from a random draw. However, on average, they will be equal to the values that
characterize the true relationship between y and x in the population.
▶ “On average” means over repeated sampling, i.e. if drawing the random sample and doing
the estimation were repeated many times.
▶ In a given sample, estimates may differ considerably from the true values. But if we repeat
the estimation over many different samples, we get, on average, the true values.
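This "on average over repeated samples" idea can be illustrated with a small simulation. A plain-Python sketch with made-up true parameters β0 = 1, β1 = 2:

```python
# Monte Carlo sketch of unbiasedness: draw many random samples from a
# population with known parameters, estimate by OLS each time, and check
# that the estimates average out to roughly the true values.
import random

random.seed(42)
TRUE_B0, TRUE_B1 = 1.0, 2.0

def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

b0s, b1s = [], []
for _ in range(2000):                               # repeated sampling
    x = [random.uniform(0, 10) for _ in range(50)]
    u = [random.gauss(0, 1) for _ in range(50)]     # E(u | x) = 0 holds here
    y = [TRUE_B0 + TRUE_B1 * xi + ui for xi, ui in zip(x, u)]
    b0, b1 = ols(x, y)
    b0s.append(b0)
    b1s.append(b1)

mean_b0 = sum(b0s) / len(b0s)   # close to 1.0
mean_b1 = sum(b1s) / len(b1s)   # close to 2.0
```

Each individual estimate differs from (1, 2), but the averages across the 2000 samples come out very close to the true values.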


Variances of the OLS estimators


▶ Depending on the sample, the estimates will be nearer or farther away from the true
population values.
▶ How far, on average, can we expect our estimates to be from the true population values
(= sampling variability)?
▶ Sampling variability is measured by the estimators’ variances

Assumption SLR.5 (Homoskedasticity): Var(u | x) = σ²

Assumption SLR.5, the homoskedasticity assumption, plays no role in showing that β̂0 and β̂1
are unbiased. We add it because it simplifies the variance calculations and because it implies
that OLS has certain efficiency properties, which we will see next class.

Graphical illustration of homoskedasticity


An example for heteroskedasticity: Wage and education

Homoskedasticity does not hold: more educated people likely have a wider variety of job
opportunities, which could lead to more wage variability at higher levels of education. People
with low levels of education have fewer opportunities and often must work at the minimum wage;
this reduces wage variability at low education levels.

Theorem 2.2 (Variances of the OLS estimators)

▶ Under assumptions SLR.1 – SLR.5:

Var(β̂1) = σ² / Σi (xi − x̄)² and Var(β̂0) = σ² (n⁻¹ Σi xi²) / Σi (xi − x̄)²
Conclusion:
▶ The sampling variability of the estimated regression coefficients is larger the greater the
variability of the unobserved factors (σ²), and smaller the greater the variation in the
explanatory variable.

Estimating the Error Variance

The formulas on the previous slides let us isolate the factors that contribute to Var(β̂1) and
Var(β̂0). But these variances are unknown, because σ² is unknown (it is a population parameter).
Nevertheless, we can use the data to estimate σ², which then allows us to estimate Var(β̂1) and
Var(β̂0).


Theorem 2.3 (Unbiased estimation of the error variance): under assumptions SLR.1–SLR.5, the
estimator σ̂² = SSR/(n − 2) satisfies E(σ̂²) = σ²

▶ Calculation of standard errors for regression coefficients: for example,
se(β̂1) = σ̂ / √(Σi (xi − x̄)²)

▶ The estimated standard deviations of the regression coefficients are called “standard
errors.” They measure how precisely the regression coefficients are estimated.
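The error-variance estimator and the slope's standard error can be sketched in plain Python with made-up data (`ols` as before):

```python
# Estimate sigma^2 by SSR / (n - 2) and form the standard error of the
# slope, se(b1) = sigma_hat / sqrt(sum((x_i - x_bar)^2)). Made-up data.
import math

def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
b0, b1 = ols(x, y)
n = len(x)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sigma2_hat = sum(u ** 2 for u in resid) / (n - 2)      # SSR / (n - 2)
sst_x = sum((xi - sum(x) / n) ** 2 for xi in x)
se_b1 = math.sqrt(sigma2_hat / sst_x)
print(sigma2_hat, se_b1)  # about 1.6 and 0.4 for this sample
```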
Regression on a Binary Explanatory Variable

▶ Suppose that x is either equal to 0 or 1

▶ This regression allows the mean value of y to differ depending on the state of x:
E(y | x = 0) = β0 and E(y | x = 1) = β0 + β1, so β1 is the difference in mean outcomes
between the two groups

▶ Note that the statistical properties of OLS are no different when x is binary
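With a binary x, OLS reduces to a comparison of group means, which a short plain-Python sketch with made-up data can confirm:

```python
# With x in {0, 1}, the OLS intercept equals the mean of y in the x = 0
# group and the slope equals the difference in group means. Made-up data.

def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
    return yb - b1 * xb, b1

x = [0, 0, 0, 1, 1, 1]
y = [2.0, 3.0, 4.0, 5.0, 7.0, 9.0]
b0, b1 = ols(x, y)

mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / 3   # 3.0
mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / 3   # 7.0
assert abs(b0 - mean0) < 1e-9            # intercept = mean of y when x = 0
assert abs(b1 - (mean1 - mean0)) < 1e-9  # slope = difference in group means
```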

References

Heiss, F. (2020). Using R for Introductory Econometrics, 2nd edition. Chapter 2: The Simple
Regression Model.

Wooldridge, J.M. (2018). Introductory Econometrics: A Modern Approach, 7th edition. Cengage.
Chapter 2: The Simple Regression Model.
