Time Series Analysis: Chapters 10, 12, 18 (Wooldridge) Notes

Chapter 10 discusses the importance of the assumption of no serial correlation in time series data for reliable OLS estimates and valid statistical inference. It explains how serial correlation can lead to misleading results and introduces the AR(1) model as a solution to account for autocorrelation. Additionally, the chapter highlights the issues of spurious regression when analyzing non-stationary time series and the necessity of differencing to obtain meaningful relationships between variables.


Chapter 10: Time Series Data

Assumptions:

Assumption TS.5 (No Serial Correlation) Explained in Simple Terms:

What Does It Mean?

Serial correlation (also known as autocorrelation) happens when the errors (the unexplained parts of the model) in different time periods are related to each other.

No serial correlation means that the errors in one time period are completely unrelated to the errors in any other time period. In other words, knowing the error from last period does not help you predict the error in the current period.

Simple Example:

Imagine you're tracking the temperature in your city every day, and you're using a model to predict the temperature based on various factors like humidity, wind speed, etc. The error is the difference between the actual temperature and the predicted temperature from your model.

If there's no serial correlation, the error on Monday has nothing to do with the error on Tuesday. They are independent of each other.

If there is serial correlation, the error on Monday might affect the error on Tuesday (for example, if the model consistently overestimates the temperature on one day, it might also overestimate it on the next day).

Why Is It Important?

The assumption of no serial correlation helps ensure that the model is reliable and that the statistical tests (like t-tests and F-tests) are valid. If serial correlation exists, the standard errors might be incorrect, which makes statistical inference (drawing conclusions from the model) unreliable.

Real-Life Example:

Imagine you are predicting a stock's price based on past prices and other factors. If there is no serial correlation, any errors made in predicting today's stock price will not give you any information about tomorrow's prediction errors. Each day's errors are random and independent.

However, if there is serial correlation, it might mean that if your model makes a large error today, it will likely make a large error tomorrow as well. This could happen if the model is missing some important trend or pattern in the data.

OLS is a common method used to estimate the relationship between variables. In time series data, OLS remains unbiased under certain conditions (assumptions TS.1 to TS.3).

Unbiased means that on average, the OLS estimates are correct: they do not systematically overestimate or underestimate the true relationship.

BLUE (Best Linear Unbiased Estimator):

Under additional assumptions (TS.1 to TS.5), OLS is the best (most accurate) method to estimate the relationship between variables in time series data. This is what we call BLUE.

Statistical Inference:

If all assumptions (TS.1 to TS.6) are satisfied, you can use standard t-tests, F-tests, and standard errors to test hypotheses and draw conclusions from the regression results.

Temporal Correlation:

In time series data, observations are not independent of each other. What happens today often depends on what happened in the past (like how stock prices today are influenced by prices yesterday).

Because of this, we need to carefully consider how errors (the unexplained part of the model) are related to past errors and to the variables we are studying over time.

Logarithms and Dummy Variables:

Logarithms: In time series analysis, we often take the logarithm of variables to make relationships linear or to stabilize the variance over time. For example, instead of looking at GDP directly, we might look at the log of GDP.

Dummy Variables: These are used to represent categories or events, such as whether a month is part of a specific season (summer vs. winter) or whether a policy was in effect during a particular period. Dummy variables take values like 0 or 1 to indicate the presence or absence of certain conditions.

Trends and Seasonality:

Trends: Time series data often has a trend, which means that the data either increases or decreases over time (e.g., population growth over years).

Seasonality: This refers to regular patterns in the data that repeat over time (e.g., higher sales during the holiday season every year).

Handling Trends and Seasonality: To account for these in a regression model, we can include a time trend and seasonal dummy variables. These help isolate the effect of trends and seasons, making the model more accurate.
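To make the last two ideas concrete, here is a minimal sketch (not from the original notes) of regressing the log of a series on a linear time trend and quarterly dummy variables. It assumes Python with pandas and statsmodels; the quarterly "GDP" data and variable names are purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative quarterly data: log(GDP) regressed on a time trend and quarter dummies
rng = np.random.default_rng(0)
n = 80  # 20 years of quarterly observations
df = pd.DataFrame({
    "t": np.arange(n),                         # linear time trend
    "quarter": np.tile([1, 2, 3, 4], n // 4),  # seasonal indicator
})
df["log_gdp"] = 7 + 0.01 * df["t"] + 0.02 * (df["quarter"] == 4) + rng.normal(0, 0.01, n)

# C(quarter) expands into 0/1 dummy variables, with one quarter as the base category
res = smf.ols("log_gdp ~ t + C(quarter)", data=df).fit()
print(res.summary())
```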
Problems with R-Squared:

R-squared is a common measure of how well a regression model fits the data. However, in time series data, R-squared can be misleading, especially when there are trends or seasonality.

Alternatives: Instead of relying on the standard R-squared, we can first detrend (remove the trend) or deseasonalize (remove the seasonal effects) the data and then ask how much of the remaining variation the model explains. This gives a better sense of how well the model is actually performing.
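One rough way to gauge fit net of a trend, sketched below under illustrative made-up data (the procedure and variable names are an assumption, not taken from the notes): compare the naive R-squared on the levels with the R-squared computed after detrending the dependent variable.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative trending series: y and x share an upward trend but little else
rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({"t": np.arange(n)})
df["x"] = 0.5 * df["t"] + rng.normal(0, 5, n)
df["y"] = 0.3 * df["t"] + rng.normal(0, 5, n)

# Naive R-squared on the levels is inflated by the common trend
print(smf.ols("y ~ x", data=df).fit().rsquared)

# Detrend y first, then see how much of the detrended variation the model explains
df["y_detrended"] = smf.ols("y ~ t", data=df).fit().resid
print(smf.ols("y_detrended ~ x + t", data=df).fit().rsquared)
```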


Chapter 12

Serial Correlation Problem:

Serial correlation happens when the errors (residuals) from your regression model are not independent of each other but are correlated over time.

Why Should You Care About Autocorrelation?

If your residuals are autocorrelated, your regression model might:

Underestimate the true variability: This could make the model seem more precise than it actually is, leading to incorrect conclusions from hypothesis tests.

Give misleading estimates: Standard errors might be biased, leading to incorrect t-statistics and p-values, which are used to determine the statistical significance of predictors in the model.

How to Handle It:

1. Using an AR(1) model (AutoRegressive model of order 1), which essentially models the current error as a function of the previous error.

You can test for serial correlation by regressing the residuals from your original model on the lagged residuals (residuals from the previous time period).

AR(1) Model Explained in Simple Terms

An AutoRegressive (AR) model is used to predict future values in a time series based on its own past values.

The AR(1) model is the simplest version of this, where the current value of the time series depends only on the immediately preceding value.

AR(1) stands for AutoRegressive model of order 1.

"Auto" refers to the fact that the model relies on the time series itself (no external factors or variables).

"Regressive" means the model uses a regression approach, in this case on the previous value of the same series.

Order 1 means that the current value depends only on the last (previous) value. Higher orders (AR(2), AR(3), etc.) would include dependence on more past values (two or three past values, respectively).

The AR(1) model helps predict the next value of a time series based on the last value. It assumes that the future depends only on the most recent past, with some random noise added. In symbols, it can be written as yt = ϕ·yt−1 + ϵt, where ϕ measures how strongly the previous value carries over and ϵt is the error term.

Predicting Future Values: Once ϕ is estimated, you can use it to predict future values of the time series. For example, if ϕ = 0.8, the model will predict that the next value will be 80% of the previous value, adjusted for random error.

Error Term ϵt: The error term captures any randomness or shocks that are not explained by the previous value. It adds some unpredictability to the model.

How AR(1) Model Helps with Serial Correlation

Instead of ignoring the dependence between observations, the AR(1) model includes it as part of the model itself. In simple terms, it removes serial correlation by "absorbing" the correlation into the model, so what's left in the residuals (errors) should ideally be uncorrelated or random noise.

By making each value of the time series dependent on the previous value, the AR(1) model explains away some of the correlation that would otherwise have been left in the residuals. If the model is a good fit, the residuals should show little to no serial correlation.
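As a concrete illustration (not part of the original notes), here is a minimal Python sketch, assuming statsmodels is available, that simulates an AR(1) series with ϕ = 0.8, estimates ϕ, and produces a one-step-ahead forecast. The estimated coefficient on the lagged value should come out close to 0.8, and the forecast is roughly 0.8 times the last observation.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(1) series with phi = 0.8 (matching the 80% example in the notes)
rng = np.random.default_rng(42)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()
y = pd.Series(y)

# Fit y_t = c + phi * y_{t-1} + e_t and recover phi
res = AutoReg(y, lags=1).fit()
print(res.params)  # intercept and estimated phi (should be close to 0.8)

# One-step-ahead forecast based on the last observed value
print(res.predict(start=len(y), end=len(y)))
```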
Testing for Remaining Correlation:

After fitting the AR(1) model, you can check whether there's still serial correlation in the residuals by looking at the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.

What Are ACF and PACF Plots?

Both the ACF and PACF plots are tools to examine serial correlation (autocorrelation) in a time series.

Simple Analogy: Parent Influence

Think of ACF and PACF like a family reunion where relatives influence you.

 ACF is like asking: "How much do all your relatives influence you, whether directly or indirectly?" Even distant relatives (e.g., grandparents) might influence you, but their influence could come through intermediate family members (e.g., parents).

 PACF is like asking: "How much does each relative directly influence you, once you remove the influence of closer relatives?" It measures the direct impact of a particular relative (e.g., grandparents) on you, after controlling for the influence of closer relatives (e.g., parents).

Analogy: Stock Price Influence

Think of ACF and PACF as tracking how past stock prices affect today's stock price:

 ACF asks: "How much do past stock prices influence today's stock price, including both direct and indirect effects?" For example, the stock price from 3 days ago might affect today's price, but part of that influence could be due to the price 2 days ago affecting yesterday's price, which then affects today's price. ACF captures this entire chain of influence.

 PACF asks: "How much does a specific past stock price (e.g., from 3 days ago) directly influence today's stock price, once we account for the influence of prices from 1 and 2 days ago?" Here, PACF isolates the direct effect of that 3-day lag, after removing the effects of intermediate days.

Relationship Between ACF, PACF, and AR(1)

For an AR(1) model, the general patterns you would expect are:

ACF (Autocorrelation Function):

 Measures the correlation between the current value and values at all previous time lags.
 In other words, it tells you how each past value influences the current value across various time lags.
 The autocorrelation will decline gradually as the lags increase.
 The ACF plot typically shows a strong spike at lag 1 (the first past value), and then the correlations diminish, often exponentially.
 This is because, in an AR(1) model, the influence of past values is strongest at lag 1 and becomes weaker for further lags.

PACF (Partial Autocorrelation Function):

 Measures the correlation between the current value and a past value while controlling for the values in between.
 It isolates the direct influence of a particular lag.
 For an AR(1) model, the PACF plot will typically show a sharp spike at lag 1 and close to zero thereafter.
 This reflects the fact that, once you account for the immediate past value (lag 1), further lags add little additional information.

How to Interpret ACF and PACF Graphs

If the model is a good fit, the residuals should now be free of serial correlation, meaning the ACF and PACF of the residuals would show no significant spikes.
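A quick way to produce these graphs (not from the original notes) is with statsmodels' plotting helpers. The sketch below simulates an AR(1) series with ϕ = 0.8, so its plots should show the patterns described above; the data and parameter values are assumptions for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Illustrative series: an AR(1) process with phi = 0.8
rng = np.random.default_rng(7)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=20, ax=axes[0])    # expected: spike at lag 1, then gradual (exponential) decay
plot_pacf(y, lags=20, ax=axes[1])   # expected: sharp spike at lag 1, near zero afterwards
plt.tight_layout()
plt.show()
```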

Correlogram:

A correlogram is a visual representation of the autocorrelation of a time series.

It shows how the values of the time series are related to themselves at different lags (time steps).

This is often used to detect patterns like trends, seasonality, or autocorrelation in time series data.

Purpose: A correlogram helps us understand the strength and direction of correlation between observations in a time series over different time lags.

Graph: It plots the autocorrelation coefficients (values between -1 and 1) on the y-axis and the lag number on the x-axis.

Usage Example: If you have daily stock prices, a correlogram can show whether today's stock price is correlated with yesterday's, the day before, and so on. This helps identify whether past prices influence future prices.

2. Testing for Serial Correlation with Lagged Residuals

As noted above, a direct test is to regress the residuals from the original model on their own lagged values; a statistically significant coefficient on the lagged residual is evidence of serial correlation.
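A minimal sketch of this residual-on-lagged-residual regression (illustrative only; the data, variable names, and use of statsmodels are assumptions, not from the original notes):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data: y depends on x, with AR(1) errors to create serial correlation
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
df = pd.DataFrame({"x": x, "y": 1 + 2 * x + u})

# Step 1: fit the original model and save the residuals
resid = smf.ols("y ~ x", data=df).fit().resid

# Step 2: regress residuals on their own lag; a significant coefficient signals serial correlation
test_df = pd.DataFrame({"resid": resid, "resid_lag": resid.shift(1)}).dropna()
print(smf.ols("resid ~ resid_lag", data=test_df).fit().summary())
```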

3. Durbin-Watson Test:

The Durbin-Watson test is a statistical test used to detect the presence of autocorrelation (or serial correlation) in the residuals from a regression analysis.

The test specifically checks for first-order autocorrelation, meaning whether the error terms from one period are correlated with the error terms from the previous period.

Interpretation of Durbin-Watson Statistic (D):

 D ≈ 2: No autocorrelation. The residuals are independent, and there is no systematic pattern in the errors.

 D < 2: Positive autocorrelation. The errors are not random; instead, they follow a pattern.

Explanation: If your model makes a positive error (it overestimates the result), it's more likely that the next error will also be positive. Similarly, if your model makes a negative error (it underestimates the result), the next error is also likely to be negative.

 D > 2: Negative autocorrelation. This indicates that positive errors tend to follow negative errors and vice versa.

Explanation: When D > 2, the errors (residuals) from the model alternate in a back-and-forth pattern. After a positive error, the next error is more likely to be negative, and vice versa.

 A positive error means the model predicted a value that's higher than the actual value.

 A negative error means the model predicted a value that's lower than the actual value.

One time the model overestimates (positive error), but the next time it underestimates (negative error).

For example, say you're predicting sales. If your model predicts higher sales than actual (positive error) in June, it might predict lower sales than actual (negative error) in July, and then back to higher in August. The errors swing back and forth.
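For reference, a minimal sketch (illustrative, using statsmodels' built-in helper) of computing the Durbin-Watson statistic from a fitted model's residuals; the simulated data here has positively autocorrelated errors, so the statistic should come out well below 2.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

# Illustrative data with serially correlated errors
rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
df = pd.DataFrame({"x": x, "y": 1 + 2 * x + u})

resid = smf.ols("y ~ x", data=df).fit().resid
print(durbin_watson(resid))   # values well below 2 suggest positive autocorrelation
```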
Chapter 18: Spurious Regression

Imagine you're looking at the stock prices of two completely unrelated companies that are both increasing over time.

If you run a regression between these two stock prices, the model might tell you that they are highly related, even though there is no real relationship between them: both are just trending upwards independently. This false relationship is what we call spurious regression.

Spurious regression happens when you try to regress one non-stationary time series (I(1)) on another non-stationary time series (I(1)).

Even though the two series might be completely unrelated (random), the regression can still show a strong relationship. This is misleading because the relationship you see isn't real; it's just an artifact of the fact that both series are non-stationary.

Key Issues:

Random Walks: A random walk is a type of I(1) series where each value is a random step from the previous one.

Even if two random walks are completely independent, if you run a regression on them, it might look like there's a significant relationship when there isn't.

t-test and Slope Coefficient: In normal regression, we use the t-test to check whether the slope (the relationship between variables) is statistically significant.

However, with spurious regression, even when there is no real relationship, the t-test will often incorrectly say that there is one (it will "reject" the null hypothesis of no relationship more often than it should).

R-squared: Normally, the R-squared value tells us how much of the variation in the dependent variable is explained by the independent variable.

In a spurious regression, the R-squared doesn't behave normally. Instead of tending towards zero (indicating no relationship), it behaves like a random number and can be misleadingly high.
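To see the problem described above directly, here is a small simulation (not from the notes; numpy and statsmodels are assumed) that regresses one independent random walk on another. Despite there being no true relationship, the t-statistic and R-squared frequently look "significant".

```python
import numpy as np
import statsmodels.api as sm

# Two completely independent random walks (I(1) series)
rng = np.random.default_rng(0)
n = 200
x = np.cumsum(rng.normal(size=n))
y = np.cumsum(rng.normal(size=n))

# Regress one on the other: the fit often looks misleadingly strong
res = sm.OLS(y, sm.add_constant(x)).fit()
print("t-stat on slope:", res.tvalues[1])
print("R-squared:", res.rsquared)
```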
To fix this issue, you usually need to difference the series first (i.e., look at changes rather than levels), which can help remove the non-stationary trends and give a more accurate analysis.

Example of Fixing Spurious Regression by Differencing

Imagine you have two time series:

 Series A: Monthly stock prices of Company X over 5 years.
 Series B: Monthly GDP values of a country over the same period.

Initially, if you directly run a regression between these two series (stock prices vs. GDP levels), it might show a strong relationship that doesn't actually exist. This is called spurious regression.

To fix this issue, you apply differencing to remove the long-term trends in the data.

Step-by-Step Process:

Step 1: Difference the Series

To deal with the non-stationarity, you calculate the differences between consecutive data points in both Series A and Series B.
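The regression is then run on these differenced series. A minimal sketch (the data and variable names here are made up, and pandas/statsmodels are assumed):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative monthly data: two trending (non-stationary) series
rng = np.random.default_rng(2)
n = 60
df = pd.DataFrame({
    "series_a": np.cumsum(rng.normal(0.5, 1.0, n)),   # e.g., stock price of Company X
    "series_b": np.cumsum(rng.normal(0.3, 1.0, n)),   # e.g., GDP of a country
})

# Step 1: difference each series (month-over-month changes)
df["d_a"] = df["series_a"].diff()
df["d_b"] = df["series_b"].diff()

# Regress the changes on each other instead of the levels
res = smf.ols("d_a ~ d_b", data=df.dropna()).fit()
print(res.summary())
```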

Interpret the Results

After running the regression on the differenced data, you are no longer comparing the absolute levels (which might have been misleading), but rather the short-term changes.

This helps you avoid the spurious regression problem and gives you a clearer understanding of the actual relationship between the two variables.

Differencing Solution:

Removes Trend: Differencing helps to remove the trend (upward or downward movement) in the data. When the trend is removed, the data becomes stationary, meaning its statistical properties (mean, variance) don't change over time.

Stationarity: Once both variables are stationary, any relationship that shows up is less likely to be spurious. In other words, the relationship reflects true economic connections, not just coincident trends.

Key Takeaway:

Before Differencing: The regression might falsely show a strong relationship because both series are trending upwards over time.

After Differencing: By focusing on the changes (differences) in the data, you remove the trend and get a more accurate picture of the relationship between stock prices and GDP.

Another way to guard against spurious regression is to include a time trend variable in the model.
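A sketch of that alternative, adding an explicit time trend t to the levels regression (again illustrative, reusing the kind of made-up trending series from the sketch above); the trend term helps control for a common deterministic drift in the levels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative trending series, as before
rng = np.random.default_rng(2)
n = 60
df = pd.DataFrame({
    "t": np.arange(n),
    "series_a": np.cumsum(rng.normal(0.5, 1.0, n)),
    "series_b": np.cumsum(rng.normal(0.3, 1.0, n)),
})

# Including the time trend controls for a shared deterministic trend in the levels
res = smf.ols("series_a ~ series_b + t", data=df).fit()
print(res.summary())
```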
