Exploring Data Patterns and
Choosing a Forecasting Technique
Reading: Hanke and Wichern, chap. 2/3
• Generally two types of data are of interest to a forecaster:
1. Cross-Sectional Data:
• The objective is to examine this data, then extend the revealed relationship to the larger population.
• E.g., if we collect data on the age and current maintenance cost of nine buses run by DTC, a scatter diagram can help us visualize the relationship between age and maintenance cost.
• The diagram suggests that if we want to forecast annual maintenance cost, age can be a useful variable.
2. Time Series:
• Four general types of patterns: horizontal, trend, seasonal and cyclical.
• When data observations fluctuate around a constant level or mean, a horizontal pattern exists. This type of series is called stationary in its mean. E.g., monthly sales for a product that do not increase or decrease consistently over time would display a horizontal pattern.
• When data observations grow or decline over an extended period of time, a trend pattern exists.
• The figure shows a long-term trend (growth) of a time series variable, housing costs.
• The movement of the variable has generally been upwards over the 20-year time span shown. Some of the forces that affect and help explain the trend of a series are population, price inflation, technological change, consumer preferences, etc.
• Definition: The trend is the long-term component that represents the growth or decline in the time series over an extended period of time.
• A cyclical pattern exists when observations exhibit rises and falls that are not of a fixed period.
• The cyclical component is the wave-like fluctuation around the trend that is affected by general economic conditions (business cycles).
• Definition: The cyclical component is the wave-like fluctuation around the trend.
• The figure above shows a time series with a cyclical component. The peak corresponds to an expansion in the economy (boom) and the valley shows a downturn.
• When fluctuations are influenced by seasonal factors, a seasonal pattern exists. This pattern tends to repeat itself year after year.
• For a monthly series, the seasonal component measures the variability of the series each month.
• For a quarterly series, there are 4 seasonal elements, one corresponding to each quarter.
• The figure shows that electrical usage for Washington residential customers is highest in the first quarter (winter) of each year.
• Definition: The seasonal component is a pattern of change that repeats itself year after year.
Exploring Data Patterns Using Autocorrelation Analysis
• Autocorrelation is the correlation between a variable and itself lagged one or more periods.
• When a variable is measured over time, observations in different time periods are often correlated. This correlation is measured using the autocorrelation coefficient.
• We can also study data patterns, including components like trend and seasonality, using autocorrelation.
• The next slide shows a store's sales data for a year.
• The lag k autocorrelation coefficient r_k between observations Yt and Yt-k, which are k periods apart, can be calculated by the formula:
$$ r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2} $$
Ȳ = mean of the series
Yt = observation in time period t
Yt-k = observation k time periods earlier, i.e., at time period t-k
n = number of observations in the series
• The scatter diagram of the data shows that the lag 1 autocorrelation should be positive.
• The autocorrelation coefficient for lag 1 is r1 = 0.572.
• This implies that successive monthly sales are somewhat correlated.
• This information gives the store owner some insight into his time series data on sales.
• The lag 2 autocorrelation coefficient is r2 = 0.463 < r1.
• Generally, as the lag k increases, the autocorrelation coefficients become smaller.
• If we plot the autocorrelation coefficients against their time lags, we get the autocorrelation function (ACF).
• It is also called a correlogram: a graph of the autocorrelations for various lags of a time series.
• The horizontal axis at the bottom of the graph shows each time lag: 1, 2, 3 and so on.
• The vertical axis shows the possible range of an autocorrelation coefficient, -1 to +1.
• The horizontal line in the middle of the graph represents autocorrelations of zero.
• The three vertical lines in the graph at the three time lags show the respective autocorrelation coefficients.
• Statistical software usually shows additional information to check whether these coefficient values are significantly different from zero.
• The ACF coefficients can also answer many other questions:
1. Are the data random?
If a series is random, the autocorrelations between Yt and Yt-k for any lag k are close to zero. This implies that successive values of the time series are not related to each other.
2. Do the data have a trend?
If a series has a trend, successive observations are highly correlated, and the autocorrelation coefficients are significantly different from zero for the first several time lags and then gradually drop to zero as the number of lags increases. The autocorrelation coefficient for lag 1 is often very large and declines for successive lags.
3. Do the data have a seasonal pattern?
If a series has a seasonal pattern, a significant autocorrelation coefficient will occur at the seasonal time lag or at multiples of the seasonal lag. E.g., 4 for quarterly data and 12 for monthly data.
Choosing a Forecasting Technique
• For Stationary Data
A stationary series is one whose mean (and other characteristics) do not change over time. This happens when the demand patterns influencing the series are stable. Forecasting a stationary series involves using the available history of the series to estimate its mean, which then becomes the forecast for future periods.
E.g., naïve methods, simple averaging methods, moving averages, ARMA models
• For Data with a Trend
A time series is said to have a trend if its average value changes over time. Forecasting techniques for trending data are used when:
a. Increased productivity and new technology lead to changes in lifestyle. E.g., demand for electronic components increased with the advent of computers.
b. Increasing population causes an increase in demand for goods and services. E.g., sales revenues of consumer goods, demand for energy.
c. The purchasing power of the rupee affects economic variables due to inflation. E.g., salaries, production costs and prices.
d. Market acceptance increases. E.g., the growth period in the lifecycle of a new product.
Techniques: moving averages, Holt's linear exponential smoothing, simple regression, growth curves, exponential models, ARIMA
• For Data with Seasonality
A seasonal time series has a pattern of change that repeats itself year after year. To develop a seasonal forecast we select either a multiplicative or an additive decomposition method and then estimate seasonal indices from the history of the series. These indices are used to include or remove seasonal effects from forecasts. The process of removing them is called seasonally adjusting the data. Forecasting techniques are used for seasonal data when:
a. Weather influences the variable of interest.
b. The annual calendar influences the variable of interest.
Techniques: … smoothing, multiple regression, ARIMA models
• For Cyclical Data
Cyclical patterns are difficult to model because they are not stable. The wave-like pattern around a trend rarely repeats itself after fixed intervals, and its magnitude also varies. Forecasting techniques are used for cyclical data when:
a. The business cycle influences the variable of interest.
b. Shifts in popular tastes occur.
c. Shifts in population occur.
d. Shifts in product life cycle occur.
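
The three ACF questions from the autocorrelation analysis (random, trend, seasonal) can serve as a rough screening step before picking one of the technique groups above. The following is a minimal Python sketch under stated assumptions, not a method taken from the reading. `autocorrelation` computes the lag k coefficient from deviations about the series mean. `describe_pattern` is a hypothetical helper, and its decision rules are assumptions made for illustration: a band of ±2/√n as an approximate significance limit, and "the first three lags" standing in for "the first several lags". Both sample series are made up.

```python
import math


def autocorrelation(series, k):
    """Lag-k autocorrelation coefficient of a sequence of numbers."""
    n = len(series)
    if not 0 < k < n:
        raise ValueError("lag k must satisfy 0 < k < len(series)")
    mean = sum(series) / n
    # Denominator: squared deviations from the mean over all n observations.
    denom = sum((y - mean) ** 2 for y in series)
    if denom == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    # Numerator: products of deviations for observations k periods apart.
    numer = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return numer / denom


def describe_pattern(series, seasonal_lag):
    """Hypothetical helper: label a series from its ACF (assumed rules)."""
    n = len(series)
    limit = 2 / math.sqrt(n)  # assumed rule-of-thumb significance band
    max_lag = max(3, seasonal_lag)
    acf = {k: autocorrelation(series, k) for k in range(1, max_lag + 1)}
    # Random: no coefficient falls outside the band.
    if all(abs(r) <= limit for r in acf.values()):
        return "random"
    # Trend: the first several (here, three) lags are large and positive.
    if all(acf[k] > limit for k in (1, 2, 3)):
        return "trend"
    # Seasonal: a significant positive coefficient at the seasonal lag.
    if acf[seasonal_lag] > limit:
        return "seasonal"
    return "no clear pattern"


# Made-up illustrative data, not taken from the slides.
trending = list(range(1, 25))      # steadily growing series
quarterly = [10, 20, 30, 20] * 6   # repeats every four periods

print(describe_pattern(trending, seasonal_lag=4))
print(describe_pattern(quarterly, seasonal_lag=4))
```

With these inputs the sketch labels the first series as a trend and the second as seasonal. The rules are deliberately crude: a smooth monthly seasonal series, for instance, can also show large positive coefficients at the first few lags and would be labeled a trend here, so the significance information reported by statistical software remains the better guide.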