

CHAPTER THREE
MULTIPLE LINEAR REGRESSION
In our firm example, for instance, it was assumed implicitly that only labour hours (X)
affect output (Y). But besides labour hours, a number of other variables/factors are
also likely to affect output, such as capital. Therefore, we need to extend our simple
two-variable regression model into a multiple regression model.

The number of variables to be included in a multiple regression is not fixed; it depends
on the nature of the problem and the researcher's decisions. For simplicity we discuss a
three-variable model. We still use the OLS method discussed in Chapter 2, and all the
CLRM assumptions are maintained here.

These include:
o Linearity in parameters
o Zero mean value of the disturbance term: E(ui) = 0
o No serial correlation between disturbance terms: Cov(ui, uj) = E(ui uj) = 0 for i ≠ j
o Homoscedasticity: Var(ui) = σ², a constant variance of the error term
o Zero covariance between the error term and each explanatory variable: Cov(ui, X1i) = Cov(ui, X2i) = 0
o No specification bias (the model is correctly specified)
o No perfect multicollinearity between the explanatory variables.

3.1. Regression with three variables


Using OLS (Ordinary Least Squares), we write the three-variable PRF as:

Yi = β0 + β1X1i + β2X2i + ui

The coefficients β1 and β2 are called the partial regression coefficients. The
interpretation of the β coefficients differs from the simple regression case: in the
multiple regression model a single coefficient can only be interpreted under the
'ceteris paribus' condition (other things being constant).

An important consequence of the ceteris paribus condition is that it is not possible to


interpret a single coefficient in a regression model without knowing what the other
variables in the model are.

Deriving OLS Estimators for the three-variable regression model

To find the OLS estimators, let us first write the SRF corresponding to the PRF as
follows:

Yi = β̂0 + β̂1X1i + β̂2X2i + ûi

The OLS estimators are obtained by making the residual sum of squares (RSS) from the
estimation as small as possible.

Problem:

min Σûi² = Σ(Yi - β̂0 - β̂1X1i - β̂2X2i)²

FOC: take the partial derivatives with respect to β̂0, β̂1 and β̂2 and set each equal to zero:

∂Σûi²/∂β̂0 = -2 Σ(Yi - β̂0 - β̂1X1i - β̂2X2i) = 0
∂Σûi²/∂β̂1 = -2 Σ(Yi - β̂0 - β̂1X1i - β̂2X2i)X1i = 0
∂Σûi²/∂β̂2 = -2 Σ(Yi - β̂0 - β̂1X1i - β̂2X2i)X2i = 0

And hence, from the above FOCs, we obtain the normal equations:

ΣYi = nβ̂0 + β̂1 ΣX1i + β̂2 ΣX2i
ΣYi X1i = β̂0 ΣX1i + β̂1 ΣX1i² + β̂2 ΣX1i X2i
ΣYi X2i = β̂0 ΣX2i + β̂1 ΣX1i X2i + β̂2 ΣX2i²

Finally, after some mathematical manipulation, we get the following formulas for the
slope estimators, written in deviation form (yi = Yi - Ȳ, x1i = X1i - X̄1, x2i = X2i - X̄2):

β̂1 = [Σyi x1i Σx2i² - Σyi x2i Σx1i x2i] / [Σx1i² Σx2i² - (Σx1i x2i)²]

β̂2 = [Σyi x2i Σx1i² - Σyi x1i Σx1i x2i] / [Σx1i² Σx2i² - (Σx1i x2i)²]

The intercept estimator and the fitted values are then:  β̂0 = Ȳ - β̂1X̄1 - β̂2X̄2  and  Ŷi = β̂0 + β̂1X1i + β̂2X2i
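As an illustration (not part of the original notes), the deviation-form formulas above translate directly into a short Python sketch; the function name ols_three_variable and the variable names are only illustrative:

```python
# A minimal sketch of the deviation-form OLS formulas for the three-variable
# model; uses only the Python standard library.

def ols_three_variable(y, x1, x2):
    """Return (b0_hat, b1_hat, b2_hat) for Y = b0 + b1*X1 + b2*X2 + u."""
    n = len(y)
    y_bar, x1_bar, x2_bar = sum(y) / n, sum(x1) / n, sum(x2) / n

    # sums of squares and cross-products in deviation form
    s11 = sum((a - x1_bar) ** 2 for a in x1)
    s22 = sum((b - x2_bar) ** 2 for b in x2)
    s12 = sum((a - x1_bar) * (b - x2_bar) for a, b in zip(x1, x2))
    sy1 = sum((c - y_bar) * (a - x1_bar) for c, a in zip(y, x1))
    sy2 = sum((c - y_bar) * (b - x2_bar) for c, b in zip(y, x2))

    denom = s11 * s22 - s12 ** 2
    b1 = (sy1 * s22 - sy2 * s12) / denom
    b2 = (sy2 * s11 - sy1 * s12) / denom
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2

# With the data of Numerical Example 3 below this returns (-23.75, -0.25, 5.5)
print(ols_three_variable([30, 20, 36, 24, 40], [4, 3, 6, 4, 8], [10, 8, 11, 9, 12]))
```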

Numerical Example 3: Suppose we have the following econometric model:


Yi = β0 + β1X1i + β2X2i + ui
Y: annual income in 1000s of birr, X1: higher education in years, X2: work experience in
years. Suppose we have the following data on the variables of the model:

Yi 30 20 36 24 40
X1 4 3 6 4 8
X2 10 8 11 9 12

ΣYiX1i = 812,  ΣYiX2i = 1552,  ΣX1iX2i = 262,  ΣX1i² = 141,  ΣX2i² = 510,  ΣYi² = 4772

Ȳ = 30,  X̄1 = 5,  X̄2 = 10

In deviation form:
Σyi x1i = ΣYiX1i - nȲX̄1 = 62,  Σyi x2i = ΣYiX2i - nȲX̄2 = 52,  Σx1i² = ΣX1i² - nX̄1² = 16,
Σx2i² = ΣX2i² - nX̄2² = 10,  Σx1i x2i = ΣX1iX2i - nX̄1X̄2 = 12,  Σyi² = ΣYi² - nȲ² = 272

Thus,

β̂1 = [Σyi x1i Σx2i² - Σyi x2i Σx1i x2i] / [Σx1i² Σx2i² - (Σx1i x2i)²] = [(62)(10) - (52)(12)] / [(16)(10) - (12)²] = -4/16 = -0.25

Interpretation: the partial regression coefficient of years spent in higher education (X1)
is -0.25. Holding other factors constant (i.e., given that work experience is constant), a
one-year increase in time spent in higher education decreases annual income by 250 birr.

β̂2 = [Σyi x2i Σx1i² - Σyi x1i Σx1i x2i] / [Σx1i² Σx2i² - (Σx1i x2i)²] = [(52)(16) - (62)(12)] / [(16)(10) - (12)²] = 88/16 = 5.5

Interpretation: the partial regression coefficient of years of work experience (X2) is 5.5.
Holding other factors constant, a one-year increase in work experience increases annual
income by 5,500 birr.

β̂0 = Ȳ - β̂1X̄1 - β̂2X̄2 = 30 - (-0.25)(5) - (5.5)(10) = 31.25 - 55 = -23.75

The estimated sample regression equation is:  Ŷi = -23.75 - 0.25X1i + 5.5X2i
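As a cross-check (a sketch assuming the NumPy package is available; not part of the original notes), the same estimates follow from the matrix form of OLS, β̂ = (X'X)⁻¹X'Y:

```python
import numpy as np

# data of Numerical Example 3
Y  = np.array([30, 20, 36, 24, 40], dtype=float)
X1 = np.array([4, 3, 6, 4, 8], dtype=float)
X2 = np.array([10, 8, 11, 9, 12], dtype=float)

# design matrix with an intercept column
X = np.column_stack([np.ones_like(Y), X1, X2])

# least-squares solution of Y = X*beta + u
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)   # approximately [-23.75, -0.25, 5.5]
```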



3.2 The Multiple Coefficient of Determination (R²)


The multiple coefficient of determination (R²) measures the proportion of the variation
in the dependent variable (Y) explained jointly by the explanatory variables in the model
(here X1 and X2). It is denoted by R² and is conceptually similar to r² in the two-variable
model. To compute R², use the following formula:

R² = ESS/TSS = [β̂1 Σx1i yi + β̂2 Σx2i yi] / Σyi²,   with 0 ≤ R² ≤ 1

Note also that R² = 1 - RSS/TSS.

For our Numerical Example 3, R² is computed as follows:

R² = [β̂1 Σx1i yi + β̂2 Σx2i yi] / Σyi² = [(-0.25)(62) + (5.5)(52)] / 272 = 270.5/272 = 0.994485 ≈ 0.9945

Adjusted R² (R̄²)
It is important to note that R² never decreases, and usually increases, when another
independent variable is added to a regression. Because R² always rises as the number of
explanatory (independent) variables increases, it is a poor tool for deciding whether one
or more variables should be added to a model: the measured goodness of fit of an estimated
model improves with the number of explanatory variables regardless of whether they are
important or not. To eliminate this dependency, we compute the adjusted R² (R̄²) as:

R² = ESS/TSS = 1 - RSS/TSS; to adjust, divide RSS and TSS by their respective degrees of freedom:

R̄² = 1 - [RSS/(n - k)] / [TSS/(n - 1)] = 1 - (1 - R²)(n - 1)/(n - k)

For the example above, R̄² = 1 - (1 - 0.9945)(5 - 1)/(5 - 3) = 0.9890.
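The R² and R̄² computations above can also be sketched in Python (again assuming NumPy; not part of the original notes):

```python
import numpy as np

Y = np.array([30, 20, 36, 24, 40], dtype=float)
X = np.column_stack([np.ones(5), [4, 3, 6, 4, 8], [10, 8, 11, 9, 12]])

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
rss = np.sum((Y - X @ beta_hat) ** 2)     # residual sum of squares (about 1.5)
tss = np.sum((Y - Y.mean()) ** 2)         # total sum of squares (272)

n, k = X.shape                            # n = 5 observations, k = 3 parameters
r2 = 1 - rss / tss
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)
print(round(r2, 4), round(r2_adj, 4))     # about 0.9945 and 0.9890
```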

3.4. Partial Correlation coefficients


A simple correlation coefficient is a measure of the degree of linear association or linear
dependence between two variables. In the two-variable case, we have the simple correlation
coefficient (r):

r = Σxi yi / √[(Σxi²)(Σyi²)]   (in deviation form)

The partial correlation coefficients measure the strength of the linear correlation between
two variables net of the influence of the other variables in the model. Partial correlation
coefficients are used in multiple regression analysis to determine the relative importance
of each explanatory variable in the model.

In a three-variable regression model, we can compute three pairwise partial correlation
coefficients:
r_Y1.2 = partial correlation between Y and X1, holding X2 constant
r_Y2.1 = partial correlation between Y and X2, holding X1 constant
r_12.Y = partial correlation between X1 and X2, holding Y constant


These are built from the simple (pairwise) correlation coefficients:

r_Y1 = simple/pairwise correlation coefficient between Y and X1: r_Y1 = Σyi x1i / √[(Σx1i²)(Σyi²)] = 62/√(16 × 272) = 0.9398

r_Y2 = simple/pairwise correlation coefficient between Y and X2: r_Y2 = Σyi x2i / √[(Σx2i²)(Σyi²)] = 52/√(10 × 272) = 0.9971

r_12 = simple/pairwise correlation coefficient between X1 and X2: r_12 = Σx1i x2i / √[(Σx1i²)(Σx2i²)] = 12/√(16 × 10) = 0.9487

The three partial correlation coefficients for our numerical example are computed as
follows:

r_Y1.2 = (r_Y1 - r_Y2 r_12) / √[(1 - r_Y2²)(1 - r_12²)] ≈ -0.25

r_Y2.1 = (r_Y2 - r_Y1 r_12) / √[(1 - r_Y1²)(1 - r_12²)] ≈ 0.976

r_12.Y = (r_12 - r_Y1 r_Y2) / √[(1 - r_Y1²)(1 - r_Y2²)] ≈ 0.444

Partial correlation coefficients range in value from -1 to +1. A value of -1 refers to the
case of an exact or perfect negative linear relationship, while +1 indicates a perfect
positive linear (net) relationship, and a zero partial correlation indicates no linear
relationship between the variables. Partial correlation coefficients are used to determine
the relative importance of different explanatory variables in a multiple regression. For
our example above, we conclude that X2 is more important than X1 in explaining the
variation of Y, since |r_Y2.1| > |r_Y1.2|.
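A hedged Python sketch of the partial-correlation formulas above (the helper names corr and partial_corr are illustrative, not from the notes):

```python
import numpy as np

def corr(a, b):
    """Simple (pairwise) correlation coefficient in deviation form."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    da, db = a - a.mean(), b - b.mean()
    return float(np.sum(da * db) / np.sqrt(np.sum(da ** 2) * np.sum(db ** 2)))

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation between a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

Y  = [30, 20, 36, 24, 40]
X1 = [4, 3, 6, 4, 8]
X2 = [10, 8, 11, 9, 12]

r_y1, r_y2, r_12 = corr(Y, X1), corr(Y, X2), corr(X1, X2)
print(round(partial_corr(r_y1, r_y2, r_12), 3))   # r_Y1.2 = -0.25
print(round(partial_corr(r_y2, r_y1, r_12), 3))   # r_Y2.1 =  0.976
print(round(partial_corr(r_12, r_y1, r_y2), 3))   # r_12.Y =  0.444
```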

3.5 Variances and Standard Errors of OLS Estimators


a) Variance of the residuals or error terms (σ̂²)

σ̂² = Σûi²/(n - k),   where n - k is the degrees of freedom.

In the above example, n = 5 and k = 3.

σ̂² = Σûi²/(n - k) = [Σyi² - (β̂1 Σyi x1i + β̂2 Σyi x2i)]/(n - k) = (272 - 270.5)/2 = 0.75
σ̂ = √0.75 = 0.866

b) Var(β̂0) = σ̂²[1/n + (X̄1² Σx2i² + X̄2² Σx1i² - 2X̄1X̄2 Σx1i x2i)/(Σx1i² Σx2i² - (Σx1i x2i)²)];   se(β̂0) = √Var(β̂0)

Var(β̂0) = 0.75[1/5 + ((5)²(10) + (10)²(16) - 2(5)(10)(12))/16] = 30.62;   se(β̂0) = √30.62 = 5.533


c) Var(β̂1) = σ̂² Σx2i² / (Σx1i² Σx2i² - (Σx1i x2i)²);   se(β̂1) = √Var(β̂1)

Var(β̂1) = (0.75)(10)/16 = 0.46875;   se(β̂1) = √0.46875 = 0.685


d) Var(β̂2) = σ̂² Σx1i² / (Σx1i² Σx2i² - (Σx1i x2i)²);   se(β̂2) = √Var(β̂2)

Var(β̂2) = (0.75)(16)/16 = 0.75;   se(β̂2) = √0.75 = 0.866
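Assuming NumPy is available, the error variance and the standard errors can be reproduced from the classical OLS covariance matrix σ̂²(X'X)⁻¹; this is a verification sketch, not part of the original notes:

```python
import numpy as np

Y = np.array([30, 20, 36, 24, 40], dtype=float)
X = np.column_stack([np.ones(5), [4, 3, 6, 4, 8], [10, 8, 11, 9, 12]])

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
n, k = X.shape
rss = np.sum((Y - X @ beta_hat) ** 2)

sigma2_hat = rss / (n - k)                       # estimated error variance, about 0.75
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # classical covariance matrix of beta_hat
std_err = np.sqrt(np.diag(cov_beta))
print(std_err)   # approximately [5.533, 0.685, 0.866] for (constant, X1, X2)
```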

The F-statistic
Its most common use is to test the statistical significance of the joint effect of the
regression coefficients/explanatory variables. The F-ratio (F-statistic) is used to test
the overall significance of the regression model. A high value of the F-statistic suggests
a significant relationship between the dependent and independent variables, leading to
rejection of the null hypothesis, which asserts that the slope coefficients of all
explanatory variables are jointly zero:

H0: β1 = β2 = 0    against    H1: at least one slope coefficient is different from zero

The F-test is used to determine the adequacy of the regression model for prediction
purposes. If the F-test shows that the regression coefficients are jointly insignificant,
the model cannot be used for prediction. The F-test is equivalent to testing the
statistical significance of the R² of a model.

Decision rule:
o If F-statistic > F-critical value, reject H0
o If F-statistic ≤ F-critical value, do not reject H0 (H0 cannot be rejected)

The F-statistic is computed from sample data as follows:




F = [ESS/(k - 1)] / [RSS/(n - k)] = [R²/(k - 1)] / [(1 - R²)/(n - k)];   k - 1 and n - k are the degrees of freedom.

For our example above, the F-statistic is computed as:

ESS = β̂1 Σx1i yi + β̂2 Σx2i yi = (-0.25)(62) + (5.5)(52) = 270.5
RSS = TSS - ESS;   TSS = Σyi² = 272, so RSS = 272 - 270.5 = 1.5

F = (270.5/2)/(1.5/2) = 135.25/0.75 = 180.33

The F-critical value at (k - 1) = 2 and (n - k) = 2 degrees of freedom and α = 0.05 is read
from the table as 19.

Decision: Since the F-statistic > F-critical value, we reject H0 (we cannot accept H0);
hence, we conclude that the model statistically significantly explains the variation in the
dependent variable.
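Assuming NumPy and SciPy are available, the overall F-test for the example can be sketched as follows (scipy.stats.f supplies the F distribution used for the critical value and the p-value); this is an illustration, not part of the original notes:

```python
import numpy as np
from scipy import stats

Y = np.array([30, 20, 36, 24, 40], dtype=float)
X = np.column_stack([np.ones(5), [4, 3, 6, 4, 8], [10, 8, 11, 9, 12]])

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
n, k = X.shape
rss = np.sum((Y - X @ beta_hat) ** 2)
tss = np.sum((Y - Y.mean()) ** 2)
ess = tss - rss

F = (ess / (k - 1)) / (rss / (n - k))        # about 180.33
F_crit = stats.f.ppf(0.95, k - 1, n - k)     # 19.0 at alpha = 0.05
p_value = stats.f.sf(F, k - 1, n - k)        # about 0.0055
print(round(F, 2), round(F_crit, 1), round(p_value, 4))
```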

Reporting and interpreting the regression results


The regression results of Example 3 above are formally reported below.

Ŷi = -23.75 - 0.25X1i + 5.5X2i
se =  (5.533)   (0.685)    (0.866)
t  = (-4.292)  (-0.365)    (6.351)
n = 5,  R² = 0.9945,  R̄² = 0.9890,  F(2, 2) = 180.33,  σ̂ = 0.866
t_α/2(n - k) = 4.303,  F_α(2, 2) = 19.0

ANOVA Table
Source       SS      df     MS
Model       270.5     2    135.25        F(2, 2) = 180.33,  Prob > F = 0.0055
Residual      1.5     2      0.75        R² = 0.9945,  Adj R² = 0.9890
Total       272       4     68           Root MSE = 0.86603

Y        Coef.    Std. Err.      t      P>|t|    [95% Conf. Interval]
X1       -0.25      0.685      -0.37    0.750     -3.196      2.696
X2        5.5       0.866       6.35    0.024      1.774      9.226
Cons.   -23.75      5.533      -4.29    0.050    -47.558      0.0584

The upper part of the table could be called the ANOVA table, since it tabulates the
analysis of variance: ESS, RSS, TSS, and MS (the mean square, obtained by dividing each sum
of squares (SS) by its corresponding df). Root MSE is the standard error of the disturbance
term (σ̂). Note that the results match our manual computations. However, the use of
statistical software tremendously reduces the computational burden involved: as the number
of explanatory variables increases, manual computation becomes very difficult and
impractical.
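Finally, assuming the statsmodels package is installed, the Stata-style output above can be reproduced with a few lines (a sketch, not part of the original notes; with such a small sample statsmodels may print warnings for some diagnostic tests):

```python
import numpy as np
import statsmodels.api as sm

Y = np.array([30, 20, 36, 24, 40], dtype=float)
X = np.column_stack([[4, 3, 6, 4, 8], [10, 8, 11, 9, 12]]).astype(float)

X = sm.add_constant(X)            # prepend the intercept column
results = sm.OLS(Y, X).fit()      # ordinary least squares fit
print(results.summary())          # coefficients, std. errors, t, R-squared, F, etc.
```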
