Basic Econometrics - II

The document discusses Classical Regression Analysis, focusing on model assumptions, estimation, and interpretation of results. It outlines key assumptions such as linearity, independence of errors, homoscedasticity, and the absence of multicollinearity, as well as the consequences of violating these assumptions. Additionally, it covers the Gauss-Markov theorem, hypothesis testing, and the importance of standard errors in assessing the reliability of estimators.

Classical Regression Analysis – Simple and Multiple Regression Models –
Model Assumptions, Estimation and Interpretation of Results

N Senthil Kumar
Member of Faculty, RBSC, Chennai
Part 2
Classical Regression Analysis – Model Assumptions
Consequences of Violations of Assumptions

Classical Regression Analysis – Model Assumptions
I. The theoretical regression model must be linear in parameters.
A. Y = α·X^β·e^ϵ
B. Y = α + β·X² + ϵ
C. Y = α + β/X + ϵ
D. log Y = α + β·X + ϵ
E. Y = α + β·log X + ϵ
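As a quick illustration (not from the slides), each of these forms can be fitted by OLS in R after a suitable transformation of the variables; df here is a hypothetical data frame with positive columns Y and X:
mA <- lm(log(Y) ~ log(X), data = df)   # A: log Y = log(alpha) + beta*log X + eps
mB <- lm(Y ~ I(X^2),      data = df)   # B: quadratic in X, still linear in parameters
mC <- lm(Y ~ I(1/X),      data = df)   # C: reciprocal of X as regressor
mD <- lm(log(Y) ~ X,      data = df)   # D: log-lin form
mE <- lm(Y ~ log(X),      data = df)   # E: lin-log form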
Classical Regression Analysis – Model Assumptions
II. The independent variables are fixed/deterministic, or they are independent of the error terms.
III. The mean of the error term is zero: E(ϵᵢ) = 0 ∀ i.
• If the independent variables are deterministic, then β̂ will be unbiased under assumption III.
• If the independent variables are random, then we need the regressors and the errors to be independent.
• The above two assumptions can be expressed as E(ϵᵢ | Xᵢ) = 0. Violation of this assumption is called endogeneity and leads to biased estimators.
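A small simulation sketch (hypothetical data, not from the slides) of how violating E(ϵᵢ | Xᵢ) = 0 biases the OLS slope:
set.seed(1)
n <- 1000
v <- rnorm(n)
x <- rnorm(n) + v        # regressor shares the common component v
e <- rnorm(n) + v        # error also contains v, so E(e | x) != 0 (endogeneity)
y <- 2 + 3*x + e
coef(lm(y ~ x))          # slope is biased upward, away from the true value 3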
Classical Regression Analysis – Model Assumptions
IV. Homoscedasticity, or constant variance of the error term: E(ϵᵢ²) = σ² ∀ i.
V. No autocorrelation between errors, i.e., E(ϵₜ ϵₜ₋ₛ) = 0 for s ≠ 0.
VI. The number of observations n must be greater than the number of parameters to be estimated.
VII. The X values should not all be the same: the variance of X must be positive, and there should be no outliers in X.
VIII. There is no perfect linear relationship among the regressors (no perfect multicollinearity).
Gauss–Markov Theorem
• CLRM assumptions 4 & 5 make the OLS estimators the best estimators (minimum variance within the class of linear unbiased estimators):
  var(β̂_OLS) = σ² / Σ(Xᵢ − X̄)²
  se(β̂_OLS) = σ̂ / √Σ(Xᵢ − X̄)²
  σ̂² = Σϵ̂ᵢ² / (n − 2)
• CLRM assumption 7 also makes the OLS estimators consistent (a large-sample property).
• CLRM assumption 8: near multicollinearity increases the variance of the estimators.
Gauss–Markov Theorem
Under the assumptions of the Classical Linear Regression Model, the Least Squares Estimators are BLUE (Best Linear Unbiased Estimators).
In other words, the Least Squares Estimators are linear unbiased estimators with minimum variance.
• CLRM assumptions 1, 2 and 3 make the OLS estimators linear and unbiased.
Part 3
Classical Linear Regression Analysis – Inference on Parameters
Classical Normal Linear Regression Model (CNLRM)
• We know our estimators are BLUE, but can we say how accurate they are, i.e., how close these estimates are to the true parameters?
• Our estimators are random variables. To answer the question above, we need to know the distributions of the estimators.
• For this purpose, we make the (reasonable?) assumption that the errors follow a normal distribution.
• With this assumption, we can completely describe the distributions of our estimators and hence answer the question raised above:
  β̂_OLS ~ N(β, σ² / Σ(Xᵢ − X̄)²)
• We convey the level of accuracy by constructing confidence intervals.
• Is there a gain from this additional assumption? Yes: BLUE becomes BUE (best among all unbiased estimators, not just the linear ones).
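A simulation sketch (hypothetical data) of this sampling distribution: with normal errors the OLS slope is centred on β with the variance stated above:
set.seed(3)
x <- runif(50, 0, 10)                  # regressors held fixed across replications
beta_hat <- replicate(5000, {
  y <- 1 + 2*x + rnorm(50, sd = 3)     # true beta = 2, sigma = 3
  coef(lm(y ~ x))["x"]
})
mean(beta_hat)                         # approx. 2
sd(beta_hat)                           # approx. sigma / sqrt(Sxx), see next line
3 / sqrt(sum((x - mean(x))^2))         # theoretical standard deviation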
Interval Estimation for Regression Estimators
• The standard error of an estimator, which is the standard deviation of its sampling distribution, is an important measure of precision:
  se(β̂_OLS) = √(σ̂² / Σ(Xᵢ − X̄)²), where σ̂² = Σϵ̂ᵢ² / (n − 2)
• The smaller the standard errors, the more likely our estimates are to be close to the true parameter.
• Interval estimation provides a confidence interval. At the 5% level, the probability that such estimated intervals contain the true parameter is 95%.
• The error variance σ² is unknown, so we replace it with its estimate.
• This leads to the use of the t distribution for constructing the confidence interval:
  P(β̂_OLS − t_{α/2}·se(β̂_OLS) ≤ β ≤ β̂_OLS + t_{α/2}·se(β̂_OLS)) = 1 − α
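A sketch of this interval in R, using the Orange regression from the practicals (m3 is the model lm(circumference ~ age, data = Orange)):
m3   <- lm(circumference ~ age, data = Orange)
b    <- coef(m3)["age"]
se_b <- coef(summary(m3))["age", "Std. Error"]
dfr  <- m3$df.residual                        # n - 2 for the simple model
b + c(-1, 1) * qt(0.975, dfr) * se_b          # 95% confidence interval for beta
confint(m3, "age", level = 0.95)              # built-in equivalent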
Hypothesis Testing on Regression Estimators
• Can we make a statement about the relationship between the dependent and independent variable?
• Statistical hypothesis testing tells us whether the observed data are compatible with such a statement.
• "Is there a relationship between the dependent and independent variable?" reduces to testing H₀: β = 0 vs H₁: β ≠ 0.
• The test is statistically significant if the value of the t-statistic (under the null hypothesis, t-statistic = β̂_OLS / √(σ̂² / Σ(Xᵢ − X̄)²)) falls in the critical region at a given level of significance.
• In our case, very high or very low values of the t-statistic are evidence against the null hypothesis.
• The p-value can also be used: it is the probability of obtaining a t value at least as extreme as the calculated value. Hence, for a very small p-value, we reject the null.
Interpretation of Results of Regression Analysis
1. Standard errors of the estimators provide a good measure of the reliability of the estimators.
2. Check whether the signs of the estimated coefficients are consistent with prior theory.
3. If the p-value is low, the test is significant and we reject the null.
4. Compute the estimated errors (residuals, proxies for the actual errors) and check whether they follow a normal distribution.
5. Use adjusted R² as a measure of goodness of fit for the model.
Consequence of Violation of the Normality Assumption
1. Violation of normality does not affect the BLUE properties of the OLS estimators, since the BLUE properties depend only on the mean, variance and covariance structure of the errors, not on their distribution.
2. The normality assumption helps in deriving the distributions of the estimators and hence in hypothesis testing.
3. Without the normality assumption, for large samples, we can still assume a normal distribution for the estimators (asymptotically).
4. Normality test (Jarque–Bera): JB = n·[S²/6 + (K − 3)²/24], where S² = [E(X − μ)³]² / [E(X − μ)²]³ (squared skewness) and K = E(X − μ)⁴ / [E(X − μ)²]² (kurtosis).
5. JB ~ χ²(2) under the null hypothesis of normally distributed errors.
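A hand computation of the JB statistic of point 4 in R (res stands for model residuals, e.g. m3$residuals from the practical that follows):
res <- m3$residuals
n   <- length(res)
m   <- res - mean(res)
S   <- mean(m^3) / mean(m^2)^1.5          # skewness
K   <- mean(m^4) / mean(m^2)^2            # kurtosis
JB  <- n * (S^2/6 + (K - 3)^2/24)
pchisq(JB, df = 2, lower.tail = FALSE)    # p-value under H0: normal errors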
Simple Linear Regression Analysis
Practical – 2 (Standard Errors of the Estimators)

var(β̂_OLS) = σ² / Σ(Xᵢ − X̄)²
Standard error of the estimators: se(β̂_OLS) = σ̂ / √Σ(Xᵢ − X̄)²
Standard error of residuals: σ̂² = Σϵ̂ᵢ² / (n − 2)
t-statistic = β̂_OLS / se(β̂_OLS)
Degrees of freedom = n − 2

Call:
lm(formula = circumference ~ age, data = Orange)

Residuals:
    Min      1Q  Median      3Q     Max
-46.310 -14.946  -0.076  19.697  45.111

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.399650   8.622660   2.018   0.0518 .
age          0.106770   0.008277  12.900 1.93e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.74 on 33 degrees of freedom
Multiple R-squared: 0.8345,    Adjusted R-squared: 0.8295
F-statistic: 166.4 on 1 and 33 DF,  p-value: 1.931e-14
R Commands {normtest} {MASS}
m3 <- lm(circumference ~ age, data = Orange)           # fitted model shown above
n <- nrow(Orange)
sigma <- sqrt(sum(m3$residuals^2)/(n - 2))             # residual standard error
sigma/sqrt(sum((Orange$age - mean(Orange$age))^2))     # standard error of the slope
t_value <- m3$coefficients["age"] / (sigma/sqrt(sum((Orange$age - mean(Orange$age))^2)))
p.value <- 1 - pt(t_value, df = n - 2) + pt(-t_value, df = n - 2)   # two-sided p-value
jb.norm.test(m3$residuals)                             # Jarque–Bera normality test {normtest}
hist(stdres(m3), probability = T)                      # standardised residuals {MASS}
lines(seq(-3, 3, 0.1), dnorm(seq(-3, 3, 0.1)), col = "blue")
Consequence of Violation of Classical Regression Analysis Assumptions – Heteroskedasticity
• If E(uᵢ) ≠ 0 and E(uᵢ) = μ, then we end up estimating the model y = α* + βX + u*, where α* = α + μ. Note, however, that β̂_OLS continues to be BLUE.
• If the Xᵢ's are random and E(uᵢ | Xᵢ) ≠ 0, then β̂_OLS = β + Σ(xᵢ/S_xx)·uᵢ, so E(β̂_OLS) ≠ β. Hence the OLS estimator is biased. Such regressors are referred to as endogenous. Note that E(uᵢ | Xᵢ) = 0 → E(uᵢXᵢ) = 0 and E(uᵢ) = 0.
• If E(uᵢ²) ≠ σ² and E(uᵢ²) = σᵢ², i.e., violation of homoscedasticity: β̂_OLS continues to be unbiased and consistent. However, var(β̂_OLS) is no longer σ²/S_XX but Σwᵢ²σᵢ². Hence the usual formula for the standard error se(β̂_OLS) becomes incorrect. Also, σ̂² is estimated with an incorrect formula, and hence inference on the parameter is not possible.
• Further, β̂_OLS loses efficiency relative to β̂_GLS.
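A small simulation sketch (hypothetical data) of the heteroskedasticity point: the OLS slope stays unbiased, but the textbook standard error understates its true sampling variability:
set.seed(2)
one_draw <- function() {
  x <- runif(200, 1, 10)
  u <- rnorm(200, sd = x)                  # error sd grows with x: heteroskedastic
  y <- 1 + 0.5*x + u
  fit <- summary(lm(y ~ x))
  c(slope = fit$coefficients["x", "Estimate"],
    se    = fit$coefficients["x", "Std. Error"])
}
sim <- replicate(2000, one_draw())
mean(sim["slope", ])    # close to 0.5: still unbiased
sd(sim["slope", ])      # true sampling spread of the slope
mean(sim["se", ])       # average textbook se: smaller than the spread above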
Approach for Heteroskedasticity – Inference
• For inference, White (1980) heteroskedasticity-consistent standard errors can be used.
• White used a robust standard error based on var(β̂_OLS) = Σxᵢ²ûᵢ² / S_XX² (in matrix form V = var(β̂_OLS) = (X′X)⁻¹(X′ΣX)(X′X)⁻¹) and the corresponding t-statistic for inference.
• In the case of heteroskedasticity only, (X′ΣX)_{K×K} = Σᵢ₌₁ⁿ σᵢ² xᵢxᵢᵀ. This "meat" is estimated by replacing σᵢ² with its estimator ûᵢ². Hence the White (1980) heteroskedasticity-robust variance estimator is
  V̂_W = (X′X)⁻¹ (Σᵢ₌₁ⁿ ûᵢ² xᵢxᵢᵀ) (X′X)⁻¹.
• To consistently estimate var(β̂_OLS), it is not necessary to estimate all n(n+1)/2 unknown elements of the Σ matrix but only the K(K+1)/2 elements of (X′ΣX)_{K×K}.
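A sketch of this sandwich computation in R for the simple regression m3 from the practicals, checked against vcovHC() from the sandwich package with the basic White (HC0) weights:
# assumes m3 <- lm(circumference ~ age, data = Orange) as in the practicals
X  <- model.matrix(m3)                      # n x K regressor matrix (with intercept)
u  <- residuals(m3)
bread <- solve(crossprod(X))                # (X'X)^{-1}
meat  <- t(X) %*% diag(u^2) %*% X           # sum_i u_i^2 x_i x_i'
Vw    <- bread %*% meat %*% bread           # White (1980) HC0 variance matrix
sqrt(diag(Vw))                              # robust standard errors
library(sandwich)
sqrt(diag(vcovHC(m3, type = "HC0")))        # should match the line above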
Approach for Heteroskedasticity – Estimation
• Instead of estimating each σᵢ, specify a functional relationship justified by theory so that the number of parameters to be estimated is minimal.
• WGLS: For example, in the regression yᵢ = α + βXᵢ + γZᵢ + uᵢ, if we suspect heteroskedasticity across the Zᵢ's with σᵢ² = σ²Zᵢ, then the transformation yᵢ* = yᵢ/√Zᵢ, Xᵢ* = Xᵢ/√Zᵢ will provide BLUE estimators.
• The transformed regression is yᵢ* = α(1/√Zᵢ) + β(Xᵢ/√Zᵢ) + γ√Zᵢ + uᵢ*, where uᵢ* = uᵢ/√Zᵢ.
• FGLS: some functional forms for modelling heteroskedasticity:
  • σᵢ² = σ²Zᵢ
  • σᵢ² = α + β₁Z₁,ᵢ + β₂Z₂,ᵢ (additive)
  • σᵢ² = σ²·Z₁,ᵢ^δ₁·Z₂,ᵢ^δ₂ … (multiplicative; we can take logs)
  • σᵢ² = α + β₁X₁,ᵢ + β₂X₁,ᵢ² …
• We use the OLS ûᵢ² in place of σᵢ² to estimate the variance function, and the FGLS estimator is β̂_FGLS = (X′Σ̂⁻¹X)⁻¹X′Σ̂⁻¹y, where Σ̂ = diag(σ̂ᵢ²).
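A minimal R sketch of the two approaches, assuming a hypothetical data frame dat with columns y, X and Z, and taking σᵢ² proportional to Zᵢ for the weighted step:
# WGLS under sigma_i^2 = sigma^2 * Z_i: weight each observation by 1/Z_i
wls <- lm(y ~ X + Z, data = dat, weights = 1/Z)
# FGLS: model the squared OLS residuals on Z, then reweight by the fitted variances
ols  <- lm(y ~ X + Z, data = dat)
u2   <- residuals(ols)^2
aux  <- lm(u2 ~ Z, data = dat)              # additive variance function
sig2 <- pmax(fitted(aux), 1e-8)             # guard against non-positive fitted variances
fgls <- lm(y ~ X + Z, data = dat, weights = 1/sig2)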
Tests for Homoscedasticity
Goldfeld–Quandt (1965) Test
Order the observations by the regressor Xᵢ, split them into three equal parts, leave out the middle portion, and estimate the error variances in the two remaining parts, i.e., σ₁² and σ₂². Under the null of homoskedasticity, their ratio (a ratio of mean squares) follows an F distribution. If there is more than one regressor, the ordering is based on Ŷᵢ.
R software: gqtest(y ~ x) {lmtest}
Breusch–Pagan (1979) Test
Test H₀: b₁ = b₂ = … = b_r = 0 in σᵢ² = f(a + b₁Z₁ + b₂Z₂ + … + b_rZ_r). Under the null, uᵢ²/σ² is homoscedastic; hence regress ûᵢ²/σ̂² on the Z variables and test the above null. The test statistic is χ²(r). Here the functional form f need not be specified, so this is a more general test.
R software: bptest(y ~ x) {lmtest}
Residual Analysis
1. Histogram analysis to check E(uᵢ) = 0.
2. Two-dimensional scatter plot for checking E(uᵢ | Xᵢ) = 0: plot the estimated ûᵢ vs Xᵢ.
3. Two-dimensional scatter plot for checking E(uᵢ² | Xᵢ) = σ² (homoscedasticity): plot the estimated ûᵢ² vs Xᵢ.
4. Two-dimensional scatter plot for checking E(uₜuₜ₋₁) = 0: plot the estimated ûₜ vs ûₜ₋₁; correlogram analysis.
5. Histogram analysis, qq-plot and JB test for normality.
6. Check for outliers, since outliers are highly influential due to their higher weights wᵢ.
7. Model mis-specification analysis.
Simple Linear Regression Analysis
Practical – 3 (Testing Heteroskedasticity)

> gqtest(m3)
	Goldfeld-Quandt test
data:  m3
GQ = 1.3588, df1 = 16, df2 = 15, p-value = 0.2789
alternative hypothesis: variance increases from segment 1 to 2

> bptest(m3)
	studentized Breusch-Pagan test
data:  m3
BP = 11.228, df = 1, p-value = 0.0008056

> par (mfrow=c(2,2))
> plot (y=m3$residuals, x=m3$model$age)
> plot (y=m3$residuals^2, x=m3$model$age)
> m4 <- lm (m3$residuals^2 ~ m3$model$age)
> abline (m4)
> plot (y=m3$residuals, x=m3$fitted.values)
> plot (x=m3$model$age, y=m3$model$circumference)
> abline (m3)
Simple Linear Regression Analysis
Practical – 3 (Heteroskedasticity-Robust Inference)

Under the assumptions, the standard error of the estimators is based on:
var(β̂_OLS) = σ² / Σ(Xᵢ − X̄)²

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.399650   8.622660   2.018   0.0518 .
age          0.106770   0.008277  12.900 1.93e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.74 on 33 degrees of freedom
Multiple R-squared: 0.8345,    Adjusted R-squared: 0.8295
F-statistic: 166.4 on 1 and 33 DF,  p-value: 1.931e-14

Heteroskedasticity-robust standard error:
var(β̂_OLS) = Σxᵢ²ûᵢ² / S_XX²
V̂_W = (X′X)⁻¹ (Σᵢ₌₁ⁿ ûᵢ² xᵢxᵢᵀ) (X′X)⁻¹

t test of coefficients:
             Estimate Std. Error t value  Pr(>|t|)
(Intercept) 17.399650   6.433544  2.7045   0.01073 *
age          0.106770   0.011053  9.6600 3.822e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R Commands {sandwich}
Vw <- vcovHAC (m3)
coeftest(m3, vcov. = vcovHAC)
Consequence of Violation of Classical Regression Analysis Assumptions – Autocorrelation (Homoskedasticity is assumed)
• Autocorrelation is more likely in time-series regressions. In cross-section data, individuals are randomly selected and not related to each other. Autocorrelation means σᵢⱼ = E(uᵢuⱼ) ≠ 0.
• OLS estimators are still unbiased and consistent in the presence of autocorrelation.
• However, var(β̂_OLS) is no longer σ²/S_XX and is given by
  var(β̂_OLS) = Σₜ wₜ²σᵤ² + Σₜ Σ_{s≠t} wₜwₛ ρ_{|t−s|} σᵤ²
• Hence the usual formula for the standard error se(β̂_OLS) becomes incorrect. Also, σ̂² is estimated with an incorrect formula, and hence inference on the parameter is not possible.
• Further, β̂_OLS is not efficient.
Approach for Autocorrelation – Inference
• For inference, Newey and West (1987) extended White's idea by replacing the heteroskedastic variances with the OLS ûᵢ² and the correlations with ûₜûₜ₋ₛ for the desired number of lags.
• Note that the asymptotics depend on T being large relative to the number of lags.
• Hence, we can use the heteroskedasticity-and-autocorrelation (HAC) robust standard errors estimated from var(β̂_OLS) = (X′X)⁻¹(X′ΣX)(X′X)⁻¹ for inference.
Estimation Approach for Autocorrelation (Homoskedasticity is assumed)
• In the presence of autocorrelation, β̂_OLS is not efficient, and hence we need to model the correlation structure.
• Estimating T(T−1) correlation parameters is not possible, so a structural approach (justified by theory) is used to estimate the correlations so that the number of parameters to be estimated is minimal.
• Example: an AR(1) structure uₜ = ρuₜ₋₁ + εₜ with εₜ ~ WN(0, σ_ε²), which gives var(uₜ) = σᵤ² = σ_ε²/(1 − ρ²) and E(uₜuₜ₋ₛ) = ρˢσᵤ².
• For AR(1), the Cochrane–Orcutt (1949) transformation below yields a BLUE estimator. Here ρ is from the AR(1) specification uₜ = ρuₜ₋₁ + εₜ:
  yₜ − ρyₜ₋₁ = α(1 − ρ) + β(Xₜ − ρXₜ₋₁) + εₜ, where t = 2 to T and εₜ = uₜ − ρuₜ₋₁
• We replace ρ with a consistent estimator ρ̂, which gives asymptotically efficient estimators α̂ and β̂.
• Cochrane–Orcutt (1949) also provide an iterative search method for ρ̂.
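A sketch of the quasi-differencing step in R for the consumption regression of the practical below (m.ts and us_inc_cons are assumed to be as set up there; ρ̂ is taken from a regression of the OLS residuals on their first lag):
u     <- residuals(m.ts)
u_t   <- u[-1]
u_lag <- u[-length(u)]
rho   <- coef(lm(u_t ~ 0 + u_lag))[[1]]                     # rho-hat from u_t on u_{t-1}
Cs <- us_inc_cons$C[-1] - rho * head(us_inc_cons$C, -1)     # C_t - rho*C_{t-1}
Ys <- us_inc_cons$Y[-1] - rho * head(us_inc_cons$Y, -1)     # Y_t - rho*Y_{t-1}
co <- lm(Cs ~ Ys)                                           # quasi-differenced regression
coef(co)[1] / (1 - rho)                                     # recover alpha from alpha*(1-rho)
coef(co)[2]                                                 # beta, asymptotically efficient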
Tests for Autocorrelation
Durbin–Watson Test
• Durbin–Watson statistic: DW = Σₜ₌₂ᵀ(ûₜ − ûₜ₋₁)² / Σₜ₌₁ᵀ ûₜ² ≈ 2(1 − ρ̂). Hence when ρ = 0, DW will be close to 2; DW varies from 0 to 4.
• The Durbin–Watson statistic is appropriate when there is a constant in the regression. It is inappropriate when there are lagged values of the dependent variable among the regressors.
Breusch–Godfrey Test
• The Breusch–Godfrey test is a Lagrange Multiplier test. The OLS residuals ûₜ are regressed on ûₜ₋₁ and the other regressors in the model. The test statistic is TR² ~ χ²(1) under the null of no autocorrelation.
• The Breusch–Godfrey test can be specified to include AR(2), MA(2), etc.
• The Breusch–Godfrey test is valid even when lagged values of the dependent variable are present among the regressors.
• "For ρ large and positive, first differencing the data may not be a bad solution" (Baltagi).
Simple Linear Regression Analysis
Practical – 3 (Autocorrelation): Setting up the model
> m.ts <- lm (C ~ Y, data = us_inc_cons)
> summary (m.ts)
Call:
lm(formula = C ~ Y, data = us_inc_cons)
Residuals:
Min 1Q Median 3Q Max
-929.79 -317.75 17.04 339.56 799.53
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.343e+03 2.196e+02 -6.118 1.78e-07 ***
Y 9.792e-01 1.139e-02 85.961 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 437.6 on 47 degrees of freedom
Multiple R-squared: 0.9937, Adjusted R-squared: 0.9935
F-statistic: 7389 on 1 and 47 DF, p-value: < 2.2e-16
Simple Linear Regression Analysis
Practical – 3 (Autocorrelation): Testing for Autocorrelation
par (mfrow=c(2,2))
acf (m.ts$residuals)
pacf (m.ts$residuals)
m1.ts <- lm (m.ts$residuals ~ lag
(m.ts$residuals, 1))
acf (m1.ts$residuals)
pacf (m1.ts$residuals)
par (mfrow=c(1,1))

dwtest(m.ts)
bgtest (m.ts)
bgtest (m.ts,order = 2)

> dwtest(m.ts)
	Durbin-Watson test
data:  m.ts
DW = 0.1805, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0

> bgtest (m.ts)
	Breusch-Godfrey test for serial correlation of order up to 1
data:  m.ts
LM test = 38.512, df = 1, p-value = 5.443e-10
Simple Linear Regression Analysis
Practical – 3 (Autocorrelation): HAC-Consistent Inference/Estimation
> V <- NeweyWest(m.ts,lag = 3,prewhite = F)
> coeftest(m.ts,V)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.3433e+03 4.1359e+02 -3.248 0.002148 **
Y 9.7923e-01 2.1971e-02 44.569 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> m2.ts <- cochrane.orcutt(m.ts)
> summary (m2.ts)
Call:
lm(formula = C ~ Y, data = us_inc_cons)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.7237e+03 8.5921e+02 -2.006 0.05074 .
Y 9.9614e-01 3.7925e-02 26.266 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 183.2567 on 46 degrees of freedom
Multiple R-squared: 0.9375 , Adjusted R-squared: 0.9361
F-statistic: 689.9 on 1 and 46 DF, p-value: < 2.445e-29
Durbin-Watson statistic
(original): 0.18050 , p-value: 4.192e-22
(transformed): 2.44775 , p-value: 9.27e-01
Simple Linear Regression Analysis
Practical – 3 (Autocorrelation): Residual Analysis

par (mfrow = c(2,2))
plot (us_inc_cons$C, cex=.7, col="blue", type="l")       # actual consumption series
par (new=T)
plot (m2.ts$fitted.values, cex=.7, col="red", type="l")  # fitted values overlaid
plot (m2.ts$residuals, type="l")
pacf (m2.ts$residuals)
acf (m2.ts$residuals)
Thanks
