Methods Explained
Methods Explained is a collection of short articles explaining statistical issues and methodologies
relevant to ONS and other data. As well as defining the topic area, the notes explain why and how
these methodologies are used.
Temporal disaggregation
Graeme Chamberlin
Office for National Statistics
Summary
National statistics institutions often face the task of producing timely data, such as monthly and
quarterly time series, even though sources are less timely. Temporal disaggregation is the process
of deriving high frequency data from low frequency data, and is closely related to benchmarking
and interpolation. This article describes and demonstrates some of the available techniques.
Temporal disaggregation is the process of deriving high frequency data from low frequency data,
and if available, related high frequency information. Not only is it useful in the National Accounts
framework, but also for producing flash estimates and forecasts for a range of economic and other
indicators. The process of temporal disaggregation shares similar properties to benchmarking and
interpolation where the same kind of techniques are often applied, albeit in a slightly different way.
This article describes some of these temporal disaggregation techniques and demonstrates their
use in producing a monthly time series for GDP.
Stocks series measure the level of something at a particular point in time (for example
unemployment, money stock, public sector debt). Flows series measure how much of something
has happened over a period of time (such as exports, production, household consumption). It is
possible to express both stocks and flows as an index.
Creating more frequent measures for a stocks series, which is recorded at a specific point in time,
is essentially the same as having a time series with missing data points. Here the data is
interpolated by fitting a curve that is constrained to pass through the lower frequency
observations.
For flows data the same properties of smoothness and continuity are desirable, but even more
important is that temporal additivity is observed: the higher frequency data must add or average to
the lower frequency data. Because a flows series does not consist of 'point in time' observations,
temporal disaggregation cannot proceed by simply joining the dots. Index series are therefore
treated as flows regardless of whether the series relates to a stock or a flow.
Therefore, if $Y_t$ is an observed quarterly series where $t = 1, \dots, n$ denotes each quarter, then
the monthly disaggregated data $y_{t,q}$, where $q = 1, 2, 3$ denotes each month in the quarter, must
observe temporal additivity for a flow series:

$$Y_t = \sum_{q=1}^{3} y_{t,q} \qquad (1)$$

or temporal averaging:

$$Y_t = \frac{1}{3} \sum_{q=1}^{3} y_{t,q} \qquad (2)$$
The application of one of these two constraints, which are essentially the same (in this example
averaging is just additivity divided by three), is the fundamental difference between interpolation
and temporal disaggregation. The focus of this article is on the temporal disaggregation of flows
series as these are more applicable to the production of economic statistics such as National
Accounts.
There are many different methods for temporally disaggregating a time series (see Chen 2007 for
a good survey). The choice of method will critically depend on the basic information available as
well as preferences. But the fundamental objective is to construct a new time series that is
consistent with the low frequency data whilst preserving the short–term movements in the higher
frequency indicator series (if available).
This article considers a number of techniques, although not an exhaustive selection, for deriving
higher frequency data. If no higher frequency indicator is available then a smoothing method will be
required, such as:
• Cubic splines
• Boot, Feibes and Lisman (BFL) smoothing method
However, when a higher frequency indicator is available not only can the smoothing methods be
applied but also a range of statistical methods. In particular three variants of the Chow–Lin
regression method are frequently used:
• Fernandez random walk model
• Litterman random walk Markov model
• AR(1) model
It is not always the case that an indicator variable is required in order to produce a higher frequency
data series. This could after all be achieved by fitting a smooth and continuous curve through the
lower frequency benchmark points. Smoothing approaches assume no information other than that
contained in the lower frequency series, and this might be the preferred option if a suitable and
well-behaved indicator series is unavailable. An indicator approach is not necessarily the better
way, and there needs to be a strong case for rejecting a smoothing model.
Furthermore, two time series that are strongly correlated at a lower frequency need not be strongly
correlated at a higher frequency, so judgement should be used as to the appropriate choice of
indicators.
Disaggregated data may be revised for three main reasons: revisions to the low frequency
benchmark data, revisions to the indicator series, and the arrival of new data at the end of the
sample. The first two are intuitively obvious. The low frequency benchmark data generally defines
the long-term trend of the disaggregated data whereas the indicator variables (if used) have a
bearing on the short-term data movements. Revisions to either will therefore impact on the derived
disaggregated data.
The final reason for data revision is more specific to disaggregation techniques that rely on
smoothing approaches. These typically work like moving averages, so as the new low frequency or
benchmark data arrives it will affect the previously estimated time series. More importantly, it is the
data towards the end of the sample that is most susceptible to revision – an issue referred to as
the end point problem.
Smoothing approaches generally work on the basis of a centred moving average, meaning that
estimates are based on both forward and past data. This preserves symmetry with the underlying
data source. If the moving average procedure was simply backward looking, then movements in
the derived time series would tend to lag the benchmark – known as phase shifting. (See
Chamberlin 2006 for a discussion on end point problems and phase shifting in deriving time series
trends.) The crux of the end point problem is that forward looking observations are always
required, but these are not available towards the end of the sample. Therefore the benchmark
series needs to be extended by forecasting sufficient future observations. However, as these
forecasts are replaced with actual data outturns the disaggregated time series will be recalculated
and is liable to revision. The data at the end of the sample, which is normally the part of the time
series of most interest to policy makers and other decision makers, is likely to be the least stable.
[Figure 1: Quarterly GDP, 2005 Q1 to 2010 Q3: level (£ million) and quarter-on-quarter growth (per cent)]
Some of the techniques considered here have been applied using ECOTRIM (Barcellan and
Buono 2002), a computer program developed by Eurostat for temporal disaggregation of time
series. This software is available for download from
http://circa.europa.eu/Public/irc/dsis/ecotrim/library
Cubic spline
Using spline functions to produce higher frequency data is routine practice in the National
Accounts. For example, ONS currently uses spline functions in the Index of Services following a
methodology laid out by Baxter (1998).
The basic premise of a spline function is to link sections of cubic polynomial together at joins,
subject to additivity constraints being satisfied. As each section is a function of time, sub-period
estimates can then be simply derived from each spline. Following Baxter (1998), a five period base
(in this case five quarters) is initially taken, with each period represented by a cubic polynomial of
time:

$$f_i(t) = a_i + b_i t + c_i t^2 + d_i t^3, \qquad i = 1, \dots, 5$$
Constraint 1
The levels and slopes of adjacent sections of cubic polynomial are equal where they meet. This
forms a continuous curve (that is, one with no jumps or other forms of discontinuity).
Constraint 2
The sum of the values of the spline over each sub–period (monthly) is equal to the observed
quarterly data. This is the temporal additivity constraint that is applied to flow data.
Constraint 3
The spline function is constructed to be as smooth as possible subject to the two previous
constraints. This is achieved by minimising the speed at which the gradient of the spline changes
over its whole length. Technically speaking, the integral of the square of the second derivative of
the spline is minimised over its whole range, so as to reduce the incidence of sharp changes in the
time series:

$$\int_0^5 \left( \frac{\partial^2 f}{\partial t^2} \right)^2 dt = \sum_{i=1}^{5} \int_{i-1}^{i} \left( \frac{\partial^2 f_i}{\partial t^2} \right)^2 dt$$
Cubic splines can be fitted to a data series of any length, and be refitted as new data becomes
available. A temporally disaggregated series can be found in the same way for up to five periods
by using a reduced number of data points and equations. When more than five periods of data are
available the spline is extended one period at a time using a five period base, with revision of the
spline function in the previous three periods. This means that the addition of a sixth period sees the
spline and hence estimates for the third, fourth and fifth periods revised. The new spline function
for periods 3 to 6 is still calculated on a five period base to maintain the continuity with the spline in
the first two periods. The spline in period 2 feeds into calculation of the spline for periods 3 to 6 but
in a way that itself remains fixed.
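To make these mechanics concrete, the following is a minimal Python sketch (not the ONS or
ECOTRIM implementation) that fits one cubic per quarter and imposes the three constraints
directly as a single linear (KKT) system. Evaluating the spline at mid-month points is an
assumption of this sketch; Baxter (1998) sets out the full method, including the rolling five period
base for longer series.

```python
import numpy as np

def spline_disaggregate(Y):
    """Disaggregate quarterly flows Y into monthly values using one cubic
    f_i(s) = a + b*s + c*s^2 + d*s^3 per quarter (local time s in [0, 1]),
    subject to the three constraints in the text: temporal additivity,
    continuity of level and slope at the joins, and minimum integrated
    squared second derivative. Needs at least two quarters."""
    n = len(Y)
    m = 4 * n                      # coefficients (a, b, c, d) per quarter
    s = np.array([1/6, 1/2, 5/6])  # mid-month points within a quarter

    # Objective: sum_i integral of (f_i'')^2 = sum_i (4c^2 + 12cd + 12d^2)
    Q = np.zeros((m, m))
    for i in range(n):
        Q[4*i+2, 4*i+2] = 4.0
        Q[4*i+2, 4*i+3] = Q[4*i+3, 4*i+2] = 6.0
        Q[4*i+3, 4*i+3] = 12.0

    # Linear constraints A theta = g
    rows, g = [], []
    for i in range(n):             # additivity: the three months sum to Y_i
        r = np.zeros(m)
        r[4*i:4*i+4] = [3.0, s.sum(), (s**2).sum(), (s**3).sum()]
        rows.append(r); g.append(Y[i])
    for i in range(n - 1):         # level continuity: f_i(1) = f_{i+1}(0)
        r = np.zeros(m)
        r[4*i:4*i+4] = [1, 1, 1, 1]; r[4*(i+1)] = -1.0
        rows.append(r); g.append(0.0)
        r = np.zeros(m)            # slope continuity: f_i'(1) = f_{i+1}'(0)
        r[4*i:4*i+4] = [0, 1, 2, 3]; r[4*(i+1)+1] = -1.0
        rows.append(r); g.append(0.0)
    A = np.array(rows); g = np.array(g)

    # Solve the KKT system for the constrained minimum
    k = A.shape[0]
    kkt = np.block([[2 * Q, A.T], [A, np.zeros((k, k))]])
    theta = np.linalg.solve(kkt, np.concatenate([np.zeros(m), g]))[:m]

    coef = theta.reshape(n, 4)
    months = coef @ np.vstack([np.ones(3), s, s**2, s**3])
    return months.ravel()          # 3n monthly values

quarters = np.array([300.0, 306.0, 309.0, 312.0])   # illustrative data
monthly = spline_disaggregate(quarters)
assert np.allclose(monthly.reshape(-1, 3).sum(axis=1), quarters)
```

Because the objective and constraints are both linear-quadratic, the constrained minimum is exact
and the quarterly sums are reproduced to machine precision.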
Figure 2 shows a monthly time series of GDP, derived from the quarterly series shown in Figure 1.
These results are compared to the levels and growth rates of a naïve temporal disaggregation
approach in Figures 3 and 4 respectively. Here the quarterly level has simply been divided by
three and allocated to each month within the quarter, known as pro-rata adjustment. It is clear from
the step pattern in the level time series that this approach simply loads all the change in the time
series onto the monthly growth rate between the final month of one quarter and the first month of
the following quarter, with these growth rates corresponding to those of the quarterly series.
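A short illustration of pro-rata adjustment with made-up figures shows the step pattern: growth is
zero within each quarter and the whole quarterly change lands on the month spanning the quarter
boundary.

```python
import numpy as np

quarters = np.array([300.0, 306.0, 309.0])   # illustrative quarterly totals
monthly = np.repeat(quarters / 3, 3)         # pro-rata: a third to each month

growth = 100 * np.diff(monthly) / monthly[:-1]
print(growth)  # [0, 0, 2.0, 0, 0, 0.98, 0, 0]: all change at quarter boundaries
```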
[Figure 2: Monthly GDP derived by cubic spline, 2005 to 2010 (£ million, with growth in per cent)]
[Figure 3: Monthly GDP levels, cubic spline compared with pro-rata adjustment (£ million)]
[Figure 4: Monthly GDP growth rates, cubic spline compared with pro-rata adjustment (per cent)]
Cubic splines can be applied where there are no sub–period indicators to guide on the short–term
movements of the data. However, they can also be adapted to cases where such indicators are
available. In this case the spline is not attached to the lower frequency benchmark data, but to the
benchmark-indicator (BI) ratio. Where the benchmark data (GDP) is quarterly and the indicator
data (I) is monthly this ratio is:

$$BI_t = \frac{GDP_t}{\sum_{i=1}^{3} I_{it}}$$
Monthly estimates of GDP can then be produced by multiplying the splined monthly values of the
BI ratio by the monthly indicator series. In this case, dealing with end point problems requires the
BI ratio and not the benchmark data to be extrapolated.
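The sketch below illustrates the BI-ratio approach with made-up data. For simplicity it interpolates
the quarterly BI ratios with an ordinary cubic spline (scipy's CubicSpline) rather than the
additivity-constrained spline above, and then rescales each quarter so that temporal additivity
holds exactly. Months beyond the first and last quarter midpoints rely on extrapolation, which is
the end point problem noted above.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def bi_ratio_disaggregate(gdp_q, indicator_m):
    """Monthly series from quarterly GDP and a monthly indicator via the
    BI ratio. A plain interpolating spline stands in for the constrained
    spline, so each quarter is rescaled at the end to restore exact
    temporal additivity."""
    n = len(gdp_q)
    ind_q = indicator_m.reshape(n, 3).sum(axis=1)
    bi_q = gdp_q / ind_q                     # quarterly BI ratio

    t_q = np.arange(n) + 0.5                 # quarter midpoints (in quarters)
    t_m = (np.arange(3 * n) + 0.5) / 3       # month midpoints, same units
    bi_m = CubicSpline(t_q, bi_q)(t_m)       # splined monthly BI ratio

    y = bi_m * indicator_m                   # provisional monthly GDP
    scale = gdp_q / y.reshape(n, 3).sum(axis=1)
    return (y.reshape(n, 3) * scale[:, None]).ravel()

gdp_q = np.array([300.0, 306.0, 309.0, 312.0])       # illustrative data
indicator_m = np.array([99, 100, 101, 101, 102, 103,
                        103, 103, 104, 104, 105, 105], dtype=float)
monthly = bi_ratio_disaggregate(gdp_q, indicator_m)
assert np.allclose(monthly.reshape(4, 3).sum(axis=1), gdp_q)
```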
The Index of Manufacturing, Index of Services and Retail Sales Index are three possible indicators
of monthly movements in GDP. Figures 5, 6 and 7 show how these indicators compare to the
spline function in Figure 2. In all cases the monthly path becomes less smooth, reflecting the
monthly changes in each indicator series. Temporal additivity conditions apply, so the quarterly
sums remain consistent with the data in Figure 1.
[Figure 5: Monthly GDP, cubic spline compared with the Index of manufacturing indicator (£ million)]
[Figure 6: Monthly GDP, cubic spline compared with the Index of services indicator (£ million)]
[Figure 7: Monthly GDP, cubic spline compared with the Retail sales indicator (£ million)]
Whereas the cubic spline methodology aims to produce a smooth curve that maintains temporal
additivity while preserving the long–term trend in the low frequency data, using a high frequency
indicator series produces a time series that reflects the short–term movements inherent in that
indicator.
The examples in Figures 5, 6 and 7 also show the similarities between temporal disaggregation and
benchmarking. Benchmarking is the process of constraining a higher frequency time series to a
lower frequency benchmark series, and therefore is really just the mirror image of temporal
disaggregation. As a result the same techniques discussed in this article are also often used for
benchmarking.
Boot, Feibes and Lisman (BFL) smoothing method

The first difference approach estimates a monthly time series $y = (y_1, \dots, y_T)$ so as to
minimise the period-to-period change in the level of the final monthly estimates, subject to the
additivity constraints holding:

$$\min_y P(y) = \sum_{t=2}^{T} (y_t - y_{t-1})^2 \qquad (3)$$

subject to

$$Y_t = \sum_{q=1}^{3} y_{t,q} \qquad (4)$$

The second difference model is similar, but in this case aims to keep the period-to-period change
$\Delta y_t$ as linear as possible, which is achieved by minimising the sum of squares of
$\Delta^2 y_t = \Delta y_t - \Delta y_{t-1}$ subject to the additivity constraints:

$$\min_y P(y) = \sum_{t=3}^{T} \left[ \Delta^2 y_t \right]^2 \qquad (5)$$

subject to (4)
Both methods therefore aim to fit the smoothest possible curve to the low frequency data by
minimising period–to–period movements in the data. Estimates of monthly GDP using the first and
second difference BFL smoothing methods are shown in Figure 8. There is no discernable
difference between the two approaches in this case, and the derived monthly time series is very
similar to the spline in Figure 2.
[Figure 8: Monthly GDP estimates from first and second order BFL smoothing (£ million)]
Source: Office for National Statistics
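Both BFL variants are quadratic minimisation problems subject to linear constraints, so they can
be solved exactly as a single linear (KKT) system. A minimal Python sketch with illustrative
quarterly figures:

```python
import numpy as np

def bfl(Y, order=1):
    """Boot-Feibes-Lisman smoothing: minimise the sum of squared first
    (order=1) or second (order=2) differences of the monthly series,
    subject to each quarter's three months summing to the quarterly total.
    Needs at least two quarters for the second order version."""
    n = len(Y)
    T = 3 * n
    D = np.diff(np.eye(T), order, axis=0)   # differencing matrix
    B = np.kron(np.eye(n), np.ones(3))      # quarterly aggregation matrix

    # KKT system for: min y'(D'D)y  subject to  B y = Y
    kkt = np.block([[2 * D.T @ D, B.T],
                    [B, np.zeros((n, n))]])
    rhs = np.concatenate([np.zeros(T), Y])
    return np.linalg.solve(kkt, rhs)[:T]

quarters = np.array([300.0, 306.0, 309.0, 312.0])   # illustrative data
first, second = bfl(quarters, 1), bfl(quarters, 2)
assert np.allclose(first.reshape(4, 3).sum(axis=1), quarters)
```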
The BFL smoothing model can also be used with indicator series, in a similar way to the cubic
spline, to produce time series with different short-term characteristics.
BFL is just one of many mathematical approaches to producing temporally disaggregated data.
The Denton adjustment method and its variants (such as Causey-Trager) are based on the
principle of movement preservation, meaning that sub-period estimates $y = (y_1, \dots, y_T)$ should
preserve the movement in the indicator series $x = (x_1, \dots, x_T)$ so as to minimise a penalty
function $P(y, x)$ subject to the temporal aggregation constraints. The penalty function can take a
number of forms depending on the preferences of the modeller, that is, the desirable properties of
the high frequency data to be created. Monthly GDP estimates using Denton adjustment
approaches are not produced here, but the methods are briefly described in Box 1 and covered in
more detail in Chen (2007).
Box 1: Denton adjustment methods

The Denton adjustment method and its variants are based on the principle of movement
preservation between sub-period estimates and indicator time series.

Additive first difference:

$$P(y, x) = \sum_{t=1}^{T} \left[ \Delta (y_t - x_t) \right]^2$$

This preserves the period-to-period change in the level of the final sub-period estimates and
the indicator values (y - x). As a result $y = (y_1, \dots, y_T)$ tends to be parallel to
$x = (x_1, \dots, x_T)$.

Proportional first difference:

$$P(y, x) = \sum_{t=1}^{T} \left[ \Delta \left( \frac{y_t}{x_t} \right) \right]^2$$

This preserves the proportional period-to-period change in the final sub-period estimates and
the indicator series (y/x). As a result $y$ tends to have the same period-to-period growth rate
as $x$.

Additive second difference:

$$P(y, x) = \sum_{t=1}^{T} \left[ \Delta^2 (y_t - x_t) \right]^2$$

This aims to preserve the period-to-period change relative to the indicator series. As a result
the period-to-period percentage change in $y$ tends to be very close to that in $x$.
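Since the additive first difference penalty is just the BFL first difference objective applied to the
discrepancy y - x, additive Denton adjustment needs only a small change to the BFL sketch
above. The data are again illustrative; the proportional variant would penalise changes in y/x
instead.

```python
import numpy as np

def denton_additive(Y, x):
    """Additive first difference Denton adjustment: minimise the sum of
    squared changes in the discrepancy d = y - x, subject to temporal
    additivity, so y stays as parallel as possible to the indicator x."""
    n = len(Y)
    T = 3 * n
    D = np.diff(np.eye(T), 1, axis=0)       # first differencing matrix
    B = np.kron(np.eye(n), np.ones(3))      # quarterly aggregation matrix

    # In terms of d = y - x: min d'(D'D)d subject to B d = Y - B x
    kkt = np.block([[2 * D.T @ D, B.T], [B, np.zeros((n, n))]])
    rhs = np.concatenate([np.zeros(T), Y - B @ x])
    d = np.linalg.solve(kkt, rhs)[:T]       # smooth allocation of discrepancy
    return x + d

Y = np.array([300.0, 306.0, 309.0, 312.0])          # illustrative data
x = np.array([99, 100, 101, 101, 102, 103,
              103, 103, 104, 104, 105, 105], dtype=float)
y = denton_additive(Y, x)
assert np.allclose(y.reshape(4, 3).sum(axis=1), Y)
```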
Chow-Lin regression method

The Chow-Lin regression approach postulates a linear relationship between the unobserved
monthly series $y_t$ and one or more monthly indicator series $x_t$:

$$y_t = x_t \beta + u_t \qquad (6)$$

Temporal additivity is imposed through an aggregation matrix $B$ that converts the monthly series
into quarterly totals:

$$Y = B'y \qquad (7)$$

Substituting (6) into (7) gives an equation for the observed quarterly time series in relation to the
monthly indicator series:

$$Y = B'y = B'x\beta + B'u \qquad (8)$$

The regression coefficients can then be calculated using the Generalised Least Squares (GLS)
estimator, where $V$ is the covariance matrix of the monthly residuals $u_t$:

$$\hat{\beta} = \left[ x'B(B'VB)^{-1}B'x \right]^{-1} x'B(B'VB)^{-1}Y \qquad (9)$$

and the estimated sub-period (monthly in this case) time series can be derived as

$$\hat{y} = x\hat{\beta} + VB(B'VB)^{-1}\left( Y - B'x\hat{\beta} \right) \qquad (10)$$
This can be explained in more basic terms for the specific example of deriving monthly GDP from
the quarterly series with the use of monthly indicator series. Equation (6) postulates a simple linear
relationship between monthly GDP and a set of monthly indicators. Equation (7) is simply the
temporal additivity constraint relating monthly GDP to quarterly GDP. Aggregating the monthly
indicators into a quarterly series in the same way means that a simple linear regression can be
computed between quarterly GDP and the quarterly aggregates of the indicators. Using the GLS
estimator, the regression coefficients $\hat{\beta}$ are calculated in (9). These coefficients can then
be used to map the monthly indicator series into monthly GDP estimates ($\hat{y}$) in (10).
Equation (10) consists of two parts. The first part describes the linear relationship between the
monthly indicator series and monthly GDP ($x\hat{\beta}$). However, to ensure that the additivity
constraint holds, the quarterly discrepancies between the regression's fitted values and the actual
data need to be allocated across each month in the quarter, which is represented by
$VB(B'VB)^{-1}(Y - B'x\hat{\beta})$.
When there is no serial correlation in the residuals (u t ) this adjustment simply reduces to
allocating the quarterly discrepancy evenly across the three months of the quarter. Unfortunately,
the assumption of no serial correlation in the residuals is generally not supported, in which case
the Chow–Lin approach would lead to step changes in the monthly GDP estimates across different
quarters. As a result, a number of variants of the Chow–Lin approach have been developed which
allow for serial correlation in the residuals. These include the following three common approaches:
AR(1) model

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad |\rho| < 1$$

Fernandez random walk model

$$u_t = u_{t-1} + \varepsilon_t$$

Litterman random walk Markov model

$$u_t = u_{t-1} + e_t, \qquad e_t = \rho e_{t-1} + \varepsilon_t$$
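The sketch below works through equations (9) and (10) for the AR(1) case with a fixed, assumed
rho (in practice rho is estimated, for example by maximum likelihood). The indicator data are
illustrative; the Fernandez and Litterman variants would replace V with the covariance implied by
their random walk specifications.

```python
import numpy as np

def chow_lin(Y, x, rho):
    """Chow-Lin temporal disaggregation, equations (6)-(10): GLS estimate
    of beta from the quarterly regression, then monthly estimates with the
    quarterly discrepancy allocated via the AR(1) covariance matrix V.
    Y: n quarterly values; x: (3n x k) monthly indicators (include a column
    of ones for a constant); rho: assumed AR(1) coefficient, |rho| < 1."""
    n = len(Y)
    T = 3 * n
    B = np.kron(np.eye(n), np.ones((3, 1)))   # T x n aggregation matrix
    # AR(1) covariance (up to scale, which cancels in (9) and (10))
    V = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))

    W = np.linalg.inv(B.T @ V @ B)            # inverse quarterly covariance
    Bx = B.T @ x                              # quarterly aggregated indicators
    beta = np.linalg.solve(Bx.T @ W @ Bx, Bx.T @ W @ Y)   # equation (9)
    resid = Y - Bx @ beta                     # quarterly discrepancy
    return x @ beta + V @ B @ W @ resid       # equation (10)

Y = np.array([300.0, 306.0, 309.0, 312.0])            # illustrative data
x = np.column_stack([np.ones(12),
                     np.array([99, 100, 101, 101, 102, 103,
                               103, 103, 104, 104, 105, 105], dtype=float)])
y = chow_lin(Y, x, rho=0.9)
assert np.allclose(np.kron(np.eye(4), np.ones(3)) @ y, Y)
```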
One of the main advantages of the regression approach is that a number of indicator series can be
used to deduce the short–term movements in the disaggregated time series. Figure 9 presents
estimates of monthly GDP based on these three approaches and the three monthly indicators
(Index of manufacturing, Index of services and Retail sales) used earlier. The underlying
regression results are included in Table 1.
[Figure 9: Monthly GDP estimates from the AR(1), Fernandez and Litterman regression methods (£ million)]
[Table 1: Regression results for the Fernandez, Litterman and AR(1) models]
The three monthly GDP time series in Figure 9 show the same patterns. This is not surprising, as
the intrinsic differences between the three methods are not that great and, as shown in Table 1,
the monthly time series in each case have been predominantly driven by the Index of services. The
significance of the other two indicators is mixed, with the Index of manufacturing and the Retail
sales index only having limited significance in accounting for short–term movements in monthly
GDP.
The significance of the Index of services in monthly GDP largely stems from its relatively strong
correlation at the quarterly level – which is unsurprising given its large share in GDP. However, it is
a matter of judgement as to whether the correlation is just as strong at the monthly level and
therefore that this is the most appropriate indicator for forming a monthly GDP time series.
Conclusions

Temporal disaggregation is the process of deriving high frequency data from low frequency data.
This can be achieved either by fitting a smooth curve through the data subject to temporal additivity
constraints, or by using monthly indicators to inform on short-term data movements. The
application of these methods is important in the National Accounts, but they can also be applied
more generally across a broad range of economic and other statistics.
Temporal disaggregation techniques are also useful in forecasting and the production of flash
(advance/preliminary) estimates of economic data. For example, GDP is published quarterly so
usually a forecast model will be based on quarterly data, even though in the meantime a number of
potentially useful monthly indicators may have been published. Temporal disaggregation models
therefore enable this higher frequency, and usually more timely, data to be incorporated into the
forecasting process to provide more rapid and potentially more accurate estimates. Although ONS is not in
the business of providing forecasts, this is the basic approach behind the monthly GDP and early
quarterly estimates of GDP published by the National Institute of Economic and Social Research
(see Mitchell et al 2004).
Contact
[email protected]
References
Baxter M (1998) 'Interpolating annual data into monthly or quarterly data', Government Statistical
Service methodological series No. 6.
Chamberlin G (2006) 'Fitting trends to time series data', Economic Trends, August.
Chen B (2007) 'An empirical comparison of different methods for temporal disaggregation at the
National Accounts', Bureau of Economic Analysis, Washington, USA.
Mitchell J, Smith R, Weale M, Wright S and Salazar E (2004) 'An indicator of monthly GDP and an
early estimate of quarterly GDP growth', National Institute of Economic and Social Research
discussion paper 127