
Chapter Two

THE CLASSICAL REGRESSION ANALYSIS
[The Simple Linear Regression Model]
• Economic theories are mainly concerned with the
relationships among various economic variables. When
we phrase the relationships in mathematical terms, we
can predict the effect of one variable on another.
• The specific functional forms may be linear, quadratic,
logarithmic, exponential, hyperbolic, or any other form.
•In this chapter we shall consider a
simple linear regression model, i.e. a
relationship between two variables
related in a linear form. We shall first
discuss two important forms of relation:
stochastic and non-stochastic.
2.1. Stochastic and Non-stochastic Relationships

A relationship between X and Y, characterized as Y = f(X), is said to be deterministic or non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y).
• On the other hand, a relationship between X and
Y is said to be stochastic if for a particular
value of X there is a whole probabilistic
distribution of values of Y.
• In such a case, for any given value of X, the
dependent variable Y assumes some specific
value only with some probability.
• Let’s illustrate the distinction between stochastic and non-stochastic relationships with the help of a supply function.
• Assuming that the supply for a certain
commodity depends on its price (other
determinants taken to be constant) and the
function being linear, the relationship can be put
as:
Q  f (P)    P                   
 (2.1)
•The above relationship between P and Q is such
that for a particular value of P, there is only one
corresponding value of Q. This is, therefore, a
deterministic (non-stochastic) relationship.
• This implies that all the variation in Y is due
solely to changes in X, and that there are no
other factors affecting the dependent variable.
• If this were true all the points of price-quantity
pairs, if plotted on a two- dimensional plane,
would fall on a straight line.
• However, if we gather observations on the
quantity actually supplied in the market at
various prices and we plot them on a diagram we
see that they do not fall on a straight line.
•The deviation of the observations from the line may be attributed to several factors:
a.Omission of variables from the function
b.Random behavior of human beings
c.Imperfect specification of the mathematical
form of the model
d.Error of aggregation
e.Error of measurement
• In order to take into account the above sources of error, we introduce into econometric functions a random variable, usually denoted by the letter ‘u’ or ‘ε’, which is called the error term or random disturbance.
•By introducing this random variable in the function the
model is rendered stochastic of the form:
• Yi = α + βXi + ui ………………………………………………. (2.2)
• Thus a stochastic model is a model in which the
dependent variable is not only determined by the
explanatory variable(s) included in the model but
also by others which are not included in the
model.
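The contrast between the deterministic relationship (2.1) and the stochastic model (2.2) can be sketched in a short simulation; the parameter values and the normal disturbance below are purely illustrative assumptions, not figures from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

alpha, beta = 10.0, 2.0            # hypothetical supply parameters
P = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

Q_deterministic = alpha + beta * P          # one Q for each P, as in (2.1)
u = rng.normal(0.0, 1.0, size=P.size)       # random disturbance term
Q_stochastic = alpha + beta * P + u         # Q scatters around the line, as in (2.2)

print(Q_deterministic)   # these points fall exactly on a straight line
print(Q_stochastic)      # these deviate from the line because of u
```

Plotting the two series against P would show the deterministic points on a straight line and the stochastic points scattered around it.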
2.2. Simple Linear Regression model
• The above stochastic relationship (2.2) with one
explanatory variable is called simple linear
regression model.
• The true relationship which connects the
variables involved is split into two parts: a part
represented by a line and a part represented by
the random term ‘u’.
• The scatter of observations represents the true
relationship between Y and X. The line
represents the exact part of the relationship and
the deviation of the observation from the line
represents the random component of the
relationship.
• Yi = [α + βxi] + ui
  (dependent variable = regression line + random variable)
The first component in the bracket, α + βxi, is the part of Y explained by the changes in X, and the second, ui, is the part of Y not explained by X; that is to say, the change in Y is due to the random influence of ui.
2.2.1 Assumptions of the Classical Linear Stochastic Regression Model.

• The classicals made important assumptions in their analysis of regression. The most important of these assumptions are discussed below.
1. The model is linear in parameters.

•The classicals assumed that the model should be linear in the parameters, regardless of whether the explanatory and the dependent variables enter linearly or not.
Example 1. Y = α + βx + u is linear in both the parameters and the variables, so it satisfies the assumption.
Example 2. ln Y = α + β ln x + u is linear only in the parameters. Since the classicals are concerned only with the parameters, the model satisfies the assumption.
2. Ui is a random real variable
•This means that the value which u may assume
in any one period depends on chance; it may be
positive, negative or zero. Every value has a
certain probability of being assumed by u in any
particular instance.
3. The mean value of the random variable (u) in any particular period is zero
• This means that for each value of x, the random variable (u) may assume various values, some greater than zero and some smaller than zero, but if we considered all the possible positive and negative values of u, for any given value of X, they would have an average value equal to zero.
• Mathematically: E(ui) = 0
4. The variance of the random variable (u) is constant in each period (homoscedasticity)
• For all values of X, the u’s will show the same dispersion around their mean.
• In Fig. 2.c this assumption is denoted by the fact that the values that u can assume lie within the same limits, irrespective of the value of X.
•Mathematically: Var(ui) = E(ui²) = σ² (a constant).
Constant variance is called the homoscedasticity assumption, and the constant variance itself is called homoscedastic variance.
5. The random variable (u) has a normal distribution
•This means the values of u (for each x) have a bell-shaped symmetrical distribution about their zero mean and constant variance σ², i.e.
•ui ~ N(0, σ²)
6. The random terms of different observations Ui ,Uj are independent (no
autocorrelation)

• This means the value which the random term assumed in one period does not depend on the value which it assumed in any other period.
•Algebraically:
• Cov(ui, uj) = E[(ui − E(ui))(uj − E(uj))]
•            = E(uiuj) = 0   (for i ≠ j)
7. The random variable (U) is independent of the
explanatory variables(No endogeneity problem).

•This means there is no correlation between the random variable and the explanatory variable. If two variables are unrelated, their covariance is zero.
• Hence Cov(Xi, ui) = 0
8.The explanatory variables are measured without error

• u absorbs the influence of omitted variables and possibly errors of measurement in the Y’s; i.e., we will assume that the regressors are error-free, while Y values may or may not include errors of measurement.
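Assumptions 3–5 can be illustrated with a quick simulation: draw a large sample of disturbances from N(0, σ²) and check that the sample mean is near zero and the sample variance near σ². The value σ = 2 and the sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 2.0      # assumed standard deviation of u (arbitrary, for illustration)
n = 100_000      # large sample, so sample moments approach population values

u = rng.normal(loc=0.0, scale=sigma, size=n)   # assumption 5: u ~ N(0, sigma^2)

print(abs(u.mean()) < 0.05)            # assumption 3: E(u) = 0 (sample mean near zero)
print(abs(u.var() - sigma**2) < 0.1)   # assumption 4: Var(u) = sigma^2 (constant)
```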
2.2.2 Methods of estimation
•Specifying the model and stating its underlying
assumptions are the first stage of any econometric
application.
•The next step is the estimation of the numerical values
of the parameters of economic relationships. The
parameters of the simple linear regression model can be
estimated by various methods.
•Three of the most commonly used methods are:
1.Ordinary least square method (OLS)
2.Maximum likelihood method (MLM)
3.Method of moments (MM)
•But, here we will deal with only the OLS.
2.2.2.1 The ordinary least square (OLS)
method
•The model Yi = α + βXi + ui is called the true relationship between Y and X, because Y and X represent their respective population values, and α and β are called the true parameters since they are estimated from the population values of Y and X.
•But it is difficult to obtain the population values of Y and X because of technical or economic reasons. So we are forced to take sample values of Y and X.
• The parameters estimated from the sample values of Y and X are called the estimators of the true parameters α and β and are symbolized as α̂ and β̂.
•The model Yi = α̂ + β̂Xi + ei is called the estimated relationship between Y and X, since α̂ and β̂ are estimated from the sample of Y and X.
•Estimation of α and β by the least squares method (OLS) involves finding values for the estimates α̂ and β̂ which will minimize the sum of the squared residuals Σei².
•Σei² = Σ(Yi − α̂ − β̂Xi)² ………………….(2.7)
• To find the values of α̂ and β̂ that minimize this sum, we partially differentiate Σei² with respect to α̂ and β̂ and set the partial derivatives equal to zero. Solving the resulting normal equations gives:
• β̂ = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σxiyi / Σxi²   (where xi = Xi − X̄ and yi = Yi − Ȳ)
• α̂ = Ȳ − β̂X̄
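The deviation-form estimators β̂ = Σxiyi/Σxi² and α̂ = Ȳ − β̂X̄ can be computed directly. A minimal sketch with made-up data constructed so that Y = 1 + 2X exactly, making the estimates easy to verify:

```python
import numpy as np

# Made-up sample where Y = 1 + 2X exactly, so the estimates are easy to check.
X = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])

x = X - X.mean()                 # deviations from the mean
y = Y - Y.mean()

beta_hat = (x * y).sum() / (x ** 2).sum()    # slope: sum(xy) / sum(x^2)
alpha_hat = Y.mean() - beta_hat * X.mean()   # intercept: Ybar - beta_hat * Xbar

print(alpha_hat, beta_hat)       # → 1.0 2.0
```

Because the data lie exactly on a line, every residual ei is zero and the estimates recover the constructed parameters.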
2.2.2.3. Statistical Properties of Least Square Estimators

• There are various econometric methods with which we may obtain estimates of the parameters of economic relationships. We would like an estimate to be as close as possible to the value of the true population parameter, i.e. to vary within only a small range around the true parameter.
•How are we to choose among the different
econometric methods, the one that gives ‘good’
estimates? We need some criteria for judging the
‘goodness’ of an estimate.
• ‘Closeness’ of the estimate to the population parameter is measured by the mean and variance or standard deviation of the sampling distribution of the estimates of the different econometric methods.
• Under the basic assumptions of the classical
linear regression model, the least squares
estimators are linear, unbiased and have
minimum variance (i.e. are best of all linear
unbiased estimators).
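Unbiasedness can be illustrated (not proved) by a Monte Carlo sketch: repeatedly draw samples from a model with known true parameters and average the OLS slope estimates across samples. The true values, regressor grid, and number of replications below are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

alpha, beta, sigma = 1.0, 2.0, 1.0    # assumed true parameters (illustrative)
X = np.linspace(0.0, 10.0, 50)        # fixed (non-stochastic) regressor values
x = X - X.mean()
n_reps = 5_000

slopes = np.empty(n_reps)
for r in range(n_reps):
    u = rng.normal(0.0, sigma, size=X.size)   # classical disturbances
    Y = alpha + beta * X + u
    slopes[r] = (x * (Y - Y.mean())).sum() / (x ** 2).sum()

print(round(slopes.mean(), 2))   # average slope estimate across samples ≈ true beta
```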
2.2.2.4. Statistical test of Significance of the OLS Estimators(First Order
tests)

• After the estimation of the parameters and the determination of the least squares regression line, we need to know how ‘good’ the fit of this line is to the sample observations of Y and X; that is to say, we need to measure the dispersion of observations around the regression line.
• This knowledge is essential because the closer
the observation to the line, the better the
goodness of fit, i.e. the better is the explanation
of the variations of Y by the changes in the
explanatory variables.
• The two most commonly used first order tests in
econometric analysis are:
• i) The coefficient of determination (the square of the correlation coefficient, i.e. R²). This test is used for judging the explanatory power of the independent variable(s).
ii).The standard error tests of the estimators.
This test is used for judging the statistical
reliability of the estimates of the regression
coefficients.
1. TESTS OF THE ‘GOODNESS OF FIT’ WITH R2

• R² shows the percentage of the total variation of the dependent variable that can be explained by the changes in the explanatory variable(s) included in the model.
• To elaborate this, let’s draw a horizontal line corresponding to the mean value of the dependent variable Y.
•Total variation = Explained variation + Unexplained variation
TSS = ESS + RSS, so that R² = ESS/TSS = 1 − RSS/TSS

The limits of R²: the value of R² falls between zero and one, i.e. 0 ≤ R² ≤ 1.
• Suppose R² = 0.9; this means that the regression line gives a good fit to the observed data, since this line explains 90% of the total variation of the Y values around their mean. The remaining 10% of the total variation in Y is unaccounted for by the regression line and is attributed to the factors included in the disturbance variable ui.
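The decomposition TSS = ESS + RSS and the two equivalent forms of R² can be checked numerically; the data below are hypothetical values scattered around a roughly linear relation.

```python
import numpy as np

# Hypothetical sample scattered around a roughly linear relation.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

x = X - X.mean()
beta_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_hat = alpha_hat + beta_hat * X

TSS = ((Y - Y.mean()) ** 2).sum()      # total variation
ESS = ((Y_hat - Y.mean()) ** 2).sum()  # explained variation
RSS = ((Y - Y_hat) ** 2).sum()         # unexplained variation

print(round(ESS / TSS, 4) == round(1 - RSS / TSS, 4))  # both forms agree
```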
2. TESTING THE SIGNIFICANCE OF OLS PARAMETERS

• To test the significance of the OLS parameter estimators we can use:
i)Standard error test
ii) Student’s t-test
iii) Confidence interval
• All of these testing procedures reach the same conclusion. Let us now see these testing methods one by one.
i. Standard error test

• This test helps us decide whether the estimates α̂ and β̂ are significantly different from zero.
•Formally, we test the null hypothesis H0: β = 0 against the alternative hypothesis H1: β ≠ 0.
•First: Compute the standard errors of the parameters:
• SE(β̂) = √(σ̂² / Σxi²)
• SE(α̂) = √(σ̂² ΣXi² / (n Σxi²))
(where xi = Xi − X̄ and σ̂² = Σei²/(n − 2))
•Second: Compare the standard errors with the numerical values of α̂ and β̂.
•Decision rule:
 If SE(β̂) > ½β̂, accept the null hypothesis and reject the alternative hypothesis. We conclude that β̂ is statistically insignificant.
 If SE(β̂) < ½β̂, reject the null hypothesis and accept the alternative hypothesis. We conclude that β̂ is statistically significant.
 The acceptance or rejection of the null
hypothesis has definite economic meaning.
• Namely, the acceptance of the null hypothesis β = 0 (the slope parameter is zero) implies that the explanatory variable to which this estimate relates does not in fact influence the dependent variable Y and should not be included in the function.
•Numerical example: Suppose that from a sample of size n = 30, we estimate the following supply function:
Qi = 120 + 0.6Pi + ei
SE: (1.7) (0.025)
•Test the significance of the slope parameter at the 5% level of significance using the standard error test.
• SE(β̂) = 0.025
• β̂ = 0.6
• ½β̂ = 0.6/2 = 0.3
• Since 0.025 < 0.3, i.e. SE(β̂) < ½β̂, we reject H0 and accept H1. The implication is that β̂ is statistically significant at the 5% level of significance.
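The standard error test above reduces to a single comparison, which can be sketched with the example's figures:

```python
# A minimal sketch of the standard error (rough) test, using the numbers
# from the supply-function example above.
beta_hat = 0.6
se_beta = 0.025

half_beta = abs(beta_hat) / 2        # compare SE(beta_hat) against half of beta_hat
reject_h0 = se_beta < half_beta      # SE < (1/2)*beta_hat -> beta_hat significant

print(half_beta, reject_h0)          # → 0.3 True
```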
ii) Student’s t-test

• Like the standard error test, this test is also important for testing the significance of the parameters.
• We can derive the t-value of the OLS estimates as t* = (β̂ − β) / SE(β̂), with n − k degrees of freedom.
•Like the standard error test, we formally test the hypothesis:
•H0: βi = 0 against the alternative
•H1: βi ≠ 0 for the slope.
• Since we have two parameters in simple linear regression with an intercept different from zero, our degrees of freedom are n − 2.
To undertake the above test we follow the
following steps.
•Step 1: Compute t*, which is called the computed value of t, by taking the value of β in the null hypothesis: t* = (β̂ − β)/SE(β̂), which under H0: β = 0 becomes t* = β̂/SE(β̂).
Step 2: Choose level of significance. Level of
significance is the probability of making ‘wrong’
decision, i.e. the probability of rejecting the
hypothesis when it is actually true or the
probability of committing a type I error.
• Step 3: Check whether it is a one-tail or a two-tail test. If the inequality sign in the alternative hypothesis is ≠, then it implies a two-tail test, and we divide the chosen level of significance by two;
• But if the inequality sign is either > or < then it
indicates one tail test and there is no need to
divide the chosen level of significance by two to
obtain the critical value from the t-table.
• Step 4: Obtain the critical value of t, called tc, at α/2 and n − 2 degrees of freedom for a two-tail test.
•Step 5: Compare t* (the computed value of t) and tc
(critical value of t)
 If t*> tc , reject H0 and accept H1. The conclusion is

ˆ is statistically significant.
 If t*< tc , accept H0 and reject H1. The conclusion is

ˆ is statistically insignificant.
•Numerical Example: Suppose that from a sample of size n = 20 we estimate the following consumption function:
• Ci = 100 + 0.70Xi + ei
  SE: (75.5) (0.21)
•The values in the brackets are standard errors. We want to test the null hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0 using the t-test at the 5% level of significance.
Step 1. t* = β̂/SE(β̂) = 0.70/0.21 = 3.3
Step 2. Since the alternative hypothesis (H1) is stated with an inequality sign (≠), it is a two-tail test; hence we divide the level of significance by two, α/2 = 0.05/2 = 0.025, to obtain the critical value of ‘t’ at α/2 = 0.025 and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t-table, tc at the 0.025 level of significance and 18 df is 2.10.
Step 3. Since t* = 3.3 and tc = 2.10, t* > tc. It implies that β̂ is statistically significant.
iii) Confidence interval
• Do it yourself.
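As a starting point for the exercise, here is a sketch of a 95% confidence interval for the slope, β̂ ± tc·SE(β̂), reusing the figures from the consumption-function example (β̂ = 0.70, SE = 0.21, n = 20); if the interval excludes zero, the slope is significant at the 5% level.

```python
from scipy import stats

# 95% confidence interval for the slope, using the t-test example's figures.
beta_hat = 0.70
se_beta = 0.21
df = 20 - 2

t_c = stats.t.ppf(0.975, df)             # two-tail critical value at the 5% level
lower = beta_hat - t_c * se_beta
upper = beta_hat + t_c * se_beta

print(round(lower, 3), round(upper, 3))  # interval excludes 0 -> beta significant
```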
The End of Chapter Two
