Chapter Three

THE CLASSICAL REGRESSION ANALYSIS


[The Multiple Linear Regression Model]
• In simple regression we study the
relationship between a dependent variable
and a single explanatory (independent)
variable. But it is rarely the case that
economic relationships involve just two
variables.
• Rather, a dependent variable Y can depend
on a whole series of explanatory variables or
regressors. For instance, in demand studies
we study the relationship between the quantity
demanded of a good and the price of the good,
the price of substitute goods and the consumer's
income. The model we assume is:
Yi = β0 + β1P1 + β2P2 + β3Xi + ui ………………(3.1)
Where Yi is the quantity demanded, P1 is the price of the
good, P2 is the price of substitute goods, Xi is the
consumer's income, the β's are unknown
parameters, and ui is the disturbance term.
•Equation (3.1) is a multiple regression with three
explanatory variables. In general, for k
explanatory variables we can write the model as
follows:
• Yi = β0 + β1X1i + β2X2i + β3X3i + ……… + βkXki + ui ………………(3.2)
• In this chapter we first discuss the
assumptions of the multiple regression
model and then proceed with our analysis
for the case of two explanatory variables.
3.2 Assumptions of Multiple
Regression Model
• In order to specify our multiple linear
regression model and proceed with our
analysis of it, some assumptions are
required. These assumptions are the same
as those of the single-explanatory-variable
model developed in the earlier chapter.
• These assumptions are:
1.Randomness of the error term: The variable u is
a real random variable.
2.Zero mean of the error term: E(ui) = 0 for every i.
3.Homoscedasticity: The variance of each ui is the
same for all the Xi values, i.e. Var(ui) = σ².
4. Normality of u: The values of each ui are normally
distributed.

5. No auto or serial correlation: The values of ui
(corresponding to Xi) are independent of the
values of any other uj (corresponding to Xj) for i ≠ j,
• i.e. E(ui uj) = 0 for i ≠ j.
6.Independence of ui and Xi: Every disturbance
term ui is independent of the explanatory
variables, i.e. E(ui X1i) = E(ui X2i) = 0.
7. No perfect multicollinearity: The explanatory
variables are not perfectly linearly correlated.
We cannot exhaustively list all the assumptions, but
the above are some of the basic assumptions
that enable us to proceed with our analysis.
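As a quick numerical illustration of assumption 7, a design matrix whose columns are perfectly linearly related loses full column rank, so (X'X) cannot be inverted and the OLS estimates are not uniquely defined. The following is a minimal sketch in Python; the data and variable names are made up purely for illustration.

```python
import numpy as np

# Illustrative data: X2 is an exact linear function of X1 (perfect collinearity)
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = 2 * X1 + 3                                  # perfectly collinear with X1
X3 = np.array([5.0, 1.0, 4.0, 2.0, 8.0])         # not collinear with X1

X_bad = np.column_stack([np.ones(5), X1, X2])    # intercept, X1, X2
X_ok  = np.column_stack([np.ones(5), X1, X3])    # intercept, X1, X3

# Perfect multicollinearity shows up as rank deficiency of the design matrix
print(np.linalg.matrix_rank(X_bad))  # 2 < 3 columns -> assumption 7 violated
print(np.linalg.matrix_rank(X_ok))   # 3 == 3 columns -> assumption 7 satisfied
```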
A Model With Two Explanatory Variables

•In order to understand the nature of the multiple
regression model easily, we start our analysis
with the case of two explanatory variables.
1. Estimation of parameters of the two-explanatory-variable
model
The model:
Y = β0 + β1X1 + β2X2 + ui ………………(3.3)
is a multiple regression with two explanatory
variables. The expected value of the above model
is called the population regression equation, i.e.
E(Y) = β0 + β1X1 + β2X2 ………………(3.4), since E(ui) = 0,
where the βj are the population parameters. β0 is referred to as the
intercept, and β1 and β2 are sometimes
known as the regression slopes.
• Note that β2, for example, measures the effect on
E(Y) of a unit change in X2 when X1 is held
constant.
• Since the population regression equation is
unknown to any investigator, it has to be
estimated from sample data.
• Let us suppose that the sample data have been
used to estimate the population regression
equation. We leave the method of estimation
unspecified for the present and merely assume
that equation (3.4) has been estimated by the sample
regression equation, which we write as:
•Ŷ = β̂0 + β̂1X1 + β̂2X2 ………………….(3.5)
• Where the β̂j are estimates of the βj and Ŷ is known
as the predicted value of Y.
•Now it is time to state how (3.3) is estimated.
Given sample observations on Y, X1 & X2, we
estimate (3.3) using the method of ordinary least
squares (OLS).
• Yi = β̂0 + β̂1X1i + β̂2X2i + ei ………….(3.6)
• ei = Yi − Ŷi = Yi − β̂0 − β̂1X1i − β̂2X2i ..(3.7)
• To obtain expressions for the least squares
estimators, we partially differentiate Σei² with
respect to β̂0, β̂1 and β̂2 and set the partial derivatives
equal to zero.
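Solving the resulting normal equations is equivalent to the matrix formula β̂ = (X′X)⁻¹X′Y. The following is a minimal sketch in Python; the sample observations are hypothetical numbers chosen only to make the snippet runnable.

```python
import numpy as np

# Hypothetical sample observations on Y, X1 and X2 (illustrative only)
Y  = np.array([10.0, 12.0, 15.0, 18.0, 21.0, 25.0])
X1 = np.array([ 1.0,  2.0,  3.0,  4.0,  5.0,  6.0])
X2 = np.array([ 2.0,  1.0,  4.0,  3.0,  6.0,  5.0])

# Design matrix: a column of ones for the intercept, then X1 and X2
X = np.column_stack([np.ones(len(Y)), X1, X2])

# OLS estimates from the normal equations (X'X) b = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

Y_hat = X @ beta_hat     # predicted values, equation (3.5)
e = Y - Y_hat            # residuals, equation (3.7)

print("estimates (b0, b1, b2):", beta_hat)
```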
3.3.2 The coefficient of determination (R²): two explanatory variables case

•In the simple regression model, we introduced R²
as a measure of the proportion of variation in the
dependent variable that is explained by variation
in the explanatory variable.
•In the multiple regression model the same measure is
relevant, and the same formulas are valid, but now
we talk of the proportion of variation in the
dependent variable explained by all the explanatory
variables included in the model.
• The coefficient of determination is:
• R² = ESS/TSS = 1 − RSS/TSS
• The value of R² is also equal to the squared
sample correlation coefficient between Ŷt and Yt.
•Since the sample correlation coefficient
measures the linear association between two
variables, a high R² means there is a close
association between the values of Yt and the
values predicted by the model, Ŷt.
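Continuing the illustrative OLS sketch above, R² can be computed directly from the sums of squares, and its equality with the squared sample correlation between Y and Ŷ can be verified numerically.

```python
# Continuing the illustrative OLS sketch above
TSS = np.sum((Y - Y.mean()) ** 2)   # total sum of squares
RSS = np.sum(e ** 2)                # residual sum of squares
ESS = TSS - RSS                     # explained sum of squares

R2 = 1 - RSS / TSS                  # coefficient of determination

# R2 also equals the squared sample correlation between Y and Y_hat
r = np.corrcoef(Y, Y_hat)[0, 1]
print(R2, r ** 2)                   # the two numbers coincide
```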
3.3.3 Adjusted Coefficient of Determination (R̄²)

• One difficulty with R² is that it can be made
large by adding more and more variables, even
if the variables added have no economic
justification. Algebraically, it is the fact that as
the variables are added the sum of squared
errors (RSS) goes down (it can remain
unchanged, but this is rare) and thus R² goes up.
• An alternative measure of goodness of fit, called
the adjusted R² and often symbolized as R̄², is
usually reported by regression programs. It is
computed as:
• R̄² = 1 − (1 − R²)·(n − 1)/(n − k)
• This measure does not always go up when a
variable is added, because the degrees-of-freedom
term n − k decreases. As the number of
variables k increases, RSS goes down, but so
does n − k.
• R̄² loses the usual interpretation; it is no longer the
percent of variation explained. This modified R² is
sometimes used, and misused, as a device for
selecting the appropriate set of explanatory
variables.
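A short continuation of the earlier sketch computes the adjustment, under the assumption that k counts all estimated parameters including the intercept (consistent with the n − k term above).

```python
# Continuing the illustrative sketch: adjusted R-squared
n = len(Y)        # number of observations
k = X.shape[1]    # number of estimated parameters (intercept + 2 slopes)

R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)
print(R2, R2_adj)  # R2_adj <= R2; it can fall when a useless regressor is added
```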
Interpreting Regression Coefficients
1 Level-Level Models
• Y = b0 + b1X1 + · · · + bkXk + e
• The coefficient b1 is interpreted through the partial
derivative ∂Y/∂X1 = b1. This partial derivative of Y with
respect to X1 can be interpreted as the rate of change in Y
associated with a change in X1 (holding constant
the variables X2 through Xk in the model).
• In other words, if we change X1 by 1 unit (i.e.
∆X1 = 1), then we can expect Y to change by b1
units (i.e. ∆Y = b1).
• So the interpretation of b1 in a level-level
regression is that a 1 unit change in X1 is
associated with a b1 unit change in Y holding
constant all other variables in the model.
2 Log-Log Models
A log-log model is a model where both the
dependent variable (Y) and the right hand side
variables (i.e., X1, ..., Xk) have been transformed
by the natural logarithm. These models can be
expressed as follows:
ln(Y ) = b0 + b1ln(X1) + · · · + bkln(Xk) + e
• So the interpretation in a log-log model is that a
1% change in X1 is associated with a b1 %
change in Y holding constant all other variables
in the model.
3 Level-Log Models
A level-log model is a model where the
dependent variable (Y) is in level form and the
right hand side variables have been transformed
by the natural logarithm. These models can be
expressed as follows:
Y = b0 + b1ln(X1) + · · · + bkln(Xk) + e
• Therefore, in a level-log model, the
interpretation is that a 1% increase in X1 is
associated with a b1/100 unit change in Y,
holding constant all other variables in the model.
4 Log-Level Models
• A log-level model is where the dependent
variable (Y) has been transformed by the natural
logarithm and the right hand side variables are
in level form. These models can be expressed as
follows:
ln(Y ) = b0 + b1X1 + · · · + bkXk + e
• So the interpretation is that a 1 unit change in X1
is associated with a 100 · b1 % change in Y
holding constant all other variables in the model.
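These interpretations can be checked numerically. The sketch below simulates a log-log relationship with a known elasticity and recovers it by OLS; the data-generating values (elasticity 0.8, intercept 2.0, noise s.d. 0.05) are arbitrary choices made only for illustration, and the readings for the other functional forms are noted in comments.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.uniform(1.0, 10.0, 500)
u = rng.normal(0.0, 0.05, 500)

# Log-log data: the true elasticity of Y with respect to X1 is 0.8
Y = np.exp(2.0 + 0.8 * np.log(X1) + u)

# Fit ln(Y) = b0 + b1*ln(X1) by OLS
Z = np.column_stack([np.ones_like(X1), np.log(X1)])
b = np.linalg.solve(Z.T @ Z, Z.T @ np.log(Y))
print(b[1])   # close to 0.8: a 1% change in X1 goes with about a 0.8% change in Y

# For the other forms the reading changes, not the mechanics:
#   level-log  Y = b0 + b1*ln(X1):  a 1% change in X1    -> b1/100 unit change in Y
#   log-level  ln(Y) = b0 + b1*X1:  a 1 unit change in X1 -> about 100*b1 % change in Y
```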
Hypothesis Testing in Multiple Regression Model

• In multiple regression models we will undertake
two tests of significance. One is the significance of
the individual parameters of the model. This test of
significance is the same as the test discussed in the
simple regression model. The second test is the
overall significance of the model.
• To illustrate consider the following example.
Let Ŷ = β̂0 + β̂1X1 + β̂2X2 + e ……………………… (3.51)
A. H0: β1 = 0
   H1: β1 ≠ 0
B. H0: β2 = 0
   H1: β2 ≠ 0
• The null hypothesis (A) states that, holding X2
constant, X1 has no (linear) influence on Y.
Similarly, hypothesis (B) states that, holding X1
constant, X2 has no influence on the dependent
variable Yi. To test these null hypotheses we will
use the following tests:
i. Standard error test: under this and the following
testing methods we test only β̂1; the test for β̂2
is done in the same way.
• If SE(β̂1) > ½β̂1, we accept the null
hypothesis, that is, we conclude that the
estimate β̂1 is not statistically significant.
• If SE(β̂1) < ½β̂1, we reject the null
hypothesis, that is, we conclude that the
estimate β̂1 is statistically significant.
•Note: The smaller the standard errors, the
stronger the evidence that the estimates are
statistically reliable.
ii. The student's t-test: We compute the t-ratio
for each β̂i.
• If t* < t (tabulated), we accept the null hypothesis,
i.e. we conclude that β̂i is not significant
and hence the regressor does not appear to
contribute to the explanation of the variations in
Y.
• If t* > t (tabulated), we reject the null hypothesis
and we accept the alternative one; β̂i is
statistically significant. Thus, the greater the
value of t*, the stronger the evidence that β̂i is
statistically significant.
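Below is a minimal sketch of the individual t-test, continuing the illustrative OLS snippet from earlier; the 5% significance level and two-sided alternative are assumptions made for the example, and scipy is used only to obtain the tabulated critical value.

```python
from scipy import stats

# Continuing the illustrative OLS sketch: standard errors from sigma^2 * (X'X)^(-1)
n, k = X.shape
sigma2_hat = np.sum(e ** 2) / (n - k)            # estimate of the error variance
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # variance-covariance matrix of beta_hat
se = np.sqrt(np.diag(cov_beta))                  # standard errors SE(beta_hat_j)

t_star = beta_hat / se                           # t-ratios for H0: beta_j = 0
t_tab = stats.t.ppf(0.975, df=n - k)             # two-sided 5% tabulated value

for j in range(k):
    # True -> |t*| > t(tabulated): reject H0, the coefficient is significant
    print(j, round(t_star[j], 3), abs(t_star[j]) > t_tab)
```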
3.5.2 Test of Overall Significance
Through out the previous section we were
concerned with testing the significance of the
estimated partial regression coefficients
individually, i.e. under the separate hypothesis that
each of the true population partial regression
coefficient was zero.
•In this section we extend this idea to a joint test of
the relevance of all the included explanatory
variables. Now consider the following:
• Y = β0 + β1X1 + β2X2 + ……… + βkXk + ui
H0: β1 = β2 = β3 = ………… = βk = 0
H1: at least one of the βk is non-zero
•This null hypothesis is a joint hypothesis that
β1, β2, ……, βk are jointly or simultaneously equal
to zero. A test of such a hypothesis is called a test of
the overall significance of the observed or estimated
regression line, that is, whether Y is linearly related
to X1, X2, ……, Xk.
• The test procedure for any set of hypotheses can
be based on a comparison of the sum of squared
errors from the original, unrestricted
multiple regression model with the sum of squared
errors from a regression model in which the null
hypothesis is assumed to be true.
• When a null hypothesis is assumed to be true,
we in effect place conditions, or constraints, on
the values that the parameters can take, and the
sum of squared errors increases.
• Let the Restricted Residual Sum of Squares
(RRSS) be the sum of squared errors in the
model obtained by assuming that the null
hypothesis is true, and let URSS be the sum of
squared errors of the original unrestricted model,
i.e. the unrestricted residual sum of squares (URSS).
• It is always true that RRSS − URSS ≥ 0.
• Consider
• Y = β̂0 + β̂1X1 + β̂2X2 + ……… + β̂kXk + e.
•This model is called unrestricted. The test of the
joint hypothesis is:
H0: β1 = β2 = β3 = ………… = βk = 0
H1: at least one of the βk is different from zero.
• We know that: Ŷ = β̂0 + β̂1X1 + β̂2X2 + ……… + β̂kXk
• Yi = Ŷi + ei
• ei = Yi − Ŷi
• Σei² = Σ(Yi − Ŷi)²
•This sum of squared errors is called the unrestricted
residual sum of squares (URSS). This is the case
when the null hypothesis is not true. If the null
hypothesis is assumed to be true, i.e. when all the
slope coefficients are zero:
• Y = β̂0 + ei
By applying OLS to this restricted model we obtain β̂0 = Ȳ, so that
• ei = Yi − Ȳ and Σei² = Σ(Yi − Ȳ)²
• The sum of squared errors when the null
hypothesis is assumed to be true is called the
Restricted Residual Sum of Squares (RRSS), and
this is equal to the total sum of squares (TSS).
• The ratio: F = [(RRSS − URSS)/(k − 1)] / [URSS/(n − k)] ~ F(k−1, n−k)
(it has an F-distribution with k − 1 and n − k degrees of
freedom for the numerator and denominator
respectively), where RRSS = TSS.
• If the computed value of F is greater than the
critical value of F (k-1, n-k), then the parameters
of the model are jointly significant or the
dependent variable Y is linearly related to the
independent variables included in the model.
regress earnings age female wexp married

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  4,   535) =   10.90
       Model |  9081.49396     4  2270.37349           Prob > F      =  0.0000
    Residual |  111436.933   535  208.293333           R-squared     =  0.0754
-------------+------------------------------           Adj R-squared =  0.0684
       Total |  120518.427   539   223.59634           Root MSE      =  14.432

    earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.085984   .2854256    -0.30   0.763    -.6466764    .4747084
      female |  -6.869205   1.277072    -5.38   0.000    -9.377896   -4.360514
        wexp |   .1800404   .1425477     1.26   0.207    -.0999814    .4600622
     married |   3.105923    1.32769     2.34   0.020     .4977975    5.714048
       _cons |   21.71116   11.20674     1.94   0.053    -.3034448    43.72576
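As a check, the overall F statistic reported above can be reproduced from the sums of squares in the output, using RRSS = Total SS and URSS = Residual SS. A small Python sketch follows; scipy is used only for the critical value.

```python
from scipy import stats

# Sums of squares taken from the regression output above
RRSS = 120518.427      # Total SS (the restricted model has only the intercept)
URSS = 111436.933      # Residual SS of the unrestricted model
n, K = 540, 5          # 540 observations, 5 estimated parameters (incl. intercept)

F = ((RRSS - URSS) / (K - 1)) / (URSS / (n - K))
F_crit = stats.f.ppf(0.95, dfn=K - 1, dfd=n - K)   # 5% critical value of F(4, 535)

print(round(F, 2))     # about 10.90, matching the reported F(4, 535)
print(F > F_crit)      # True -> the regressors are jointly significant
```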
