Research Article
Approximations of Time Series
Copyright © 2011 M. Brackstone and A. S. Deakin. This is an open access article distributed under
the Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
A method is proposed to approximate the main features or patterns including interventions that
may occur in a time series. Collision data from the Ontario Ministry of Transportation illustrate
the approach using monthly collision counts from police reports over a 10-year period from 1990
to 1999. The domain of the time series is partitioned into nonoverlapping subdomains. The major
condition on the approximation requires that the series and the approximation have the same
average value over each subdomain. To obtain a smooth approximation, based on the second
difference of the series, a few iterations are necessary since an iteration over one subdomain is
affected by the previous iteration over the adjacent subdomains.
1. Introduction
A graduated licensing system (GLS) gradually exposes young novice drivers to the driving
environment, allowing them to obtain initial experience with driving under supervision,
followed by more independent driving under higher-risk circumstances [1]. This model was
widely incorporated into driver licensing programs across the US and Canada, as well as
other countries, during the 1990s. Most of these programs have incorporated similar
restrictions into their initial phases [2]. These include driving with supervision,
restricted driving at night, limits on teenage passengers, and a zero blood alcohol level
while driving. This method has had limited long-term evaluation in North America, but
long-term follow-up in New Zealand suggested a smaller but persistent reduction in young
driver collisions as a result of its implementation [3]. The collision data for Ontario
drivers around the time of the introduction of the GLS [2, p. 126] illustrate the variety
of approximations of time series that are possible.
There are many practical techniques for smoothing a time series [4]. The smoothed
value at a point is a weighted average involving the elements in the series that are within
a local window about the point. One way to generate the weights in a moving average
2 ISRN Applied Mathematics
involves a local polynomial approximation of order 3 or 5, where the window includes 5, 7,
or more points. The weights are then defined by regression. Another approach defines
the weights in terms of an appropriate kernel, and this method applies more generally to
bivariate data [5]. The advantages of local estimates compared with global estimates are
discussed in [6].
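As an illustration of how regression defines the moving-average weights, the following minimal Python sketch fits a polynomial by least squares over a centred window and returns the weights that give the fitted value at the centre of the window. The function name `local_poly_weights` and the use of exact fractions are illustrative assumptions and are not taken from the cited references.

```python
from fractions import Fraction

def local_poly_weights(half_width, degree):
    """Weights w_j such that sum_j w_j * z[t + j], j = -m..m, is the value at
    the window centre of the least-squares polynomial of the given degree."""
    offsets = range(-half_width, half_width + 1)
    # Design matrix X[j][p] = j**p for the polynomial basis 1, j, j^2, ...
    X = [[Fraction(j) ** p for p in range(degree + 1)] for j in offsets]
    n = degree + 1
    # Normal matrix N = X'X.
    N = [[sum(row[a] * row[b] for row in X) for b in range(n)] for a in range(n)]
    # Solve N y = e_0 by Gauss-Jordan elimination; the weights are then X y.
    aug = [N[a] + [Fraction(1 if a == 0 else 0)] for a in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    y = [aug[a][n] for a in range(n)]
    return [sum(row[p] * y[p] for p in range(n)) for row in X]

# Five-point window with a cubic fit, as in the case mentioned in the text.
weights = local_poly_weights(2, 3)
```

The weights depend only on the window and the polynomial order, so they can be computed once and applied at every interior point of the series.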
The first step in the computational process involves the partition of the domain of the
time series into subdomains. The subdomains are then labelled as odd-numbered or
even-numbered. Iterations are performed over the odd-numbered subdomains, followed by
iterations over the even-numbered subdomains. This process is numerically efficient since
the iterations over one set of subdomains update the boundary conditions for the iterations
over the remaining subdomains [7]. To determine a smooth and accurate approximation, this
set of iterations is repeated a few times.
This paper is organized as follows. Section 2 describes the form of the approximations,
the partition of the time series, and the minimization over the subdomains of the partition. In
Section 3, the equations for the approximation are derived, and some computational details
are given in Section 4. Under certain assumptions, approximations of time series with variable
spacing are possible. The time series and their approximations are presented in Section 5,
and an example outlines the approach for step level changes, missing data, and outliers
in Section 6. In Section 7, the approximation over a subdomain is determined by a fourth-
order polynomial and a straight line. Finally, guidelines for the application of the proposed
approach to a time series are outlined in “Concluding remarks.”
$$ Z_t = \sum_{k=0}^{n-1} \left( O_t^k + Q_t^k \right) + R_t^n, \tag{2.1} $$

where $Q_t^0 = T_t + M_t$ and $R_t^n = A_t^n W_t^n$. In these equations, $O_t^k$ is the term for
outliers, if present; $Q_t^k$ is the $k$th approximation; $T_t$ is a trend and includes the level
changes; $M_t$ is a nonperiodic oscillatory function; $R_t^n$ is the remainder; $A_t^n$ is a measure
of the variation of the remainder. A restriction on the approximations requires that the
root-mean-square (RMS) value of the remainder is a decreasing sequence with increasing $n$.
The form of (2.1) is similar to the asymptotic expansion of a function that contains a small
positive parameter [8, pp. 1–4].
The partition of the domain of the time series $\{Z_t\}$, henceforth denoted by $Z_t$, is
chosen in order to accurately approximate possible patterns in the time series. Let
$P = \{E_k \mid k = 1, \ldots, M\}$ be a partition of $[1, N]$, where the nonoverlapping subdomains
are $E_k = \{t_k - n_k + 1, \ldots, t_k\}$, $n_k \ge 1$, and $n_k$ is the number of elements in the $k$th
subdomain $E_k$. The overlapping subdomains, over which the iterations are computed, are
defined as $E_k^o = \{t_k - n_k, E_k, t_k + 1\}$. For $k = 1$, $E_1^o = \{0, E_1, n_1 + 1\}$, and for $k = M$,
$E_M^o = \{N - n_M, E_M, N + 1\}$, so that the overlapping subdomains are defined over the
interval $[0, N + 1]$.
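The construction of the subdomains from their sizes can be sketched in a few lines of Python. The helper names `build_partition` and `overlapping` are hypothetical, and time indices are 1-based to match the notation above.

```python
def build_partition(sizes):
    """Return the nonoverlapping subdomains E_k as lists of 1-based time
    indices, given the numbers of elements n_1, ..., n_M."""
    subdomains, start = [], 1
    for n_k in sizes:
        subdomains.append(list(range(start, start + n_k)))
        start += n_k
    return subdomains

def overlapping(E_k):
    """E_k^o: the subdomain E_k extended by one point on each side."""
    return [E_k[0] - 1] + E_k + [E_k[-1] + 1]

# Hypothetical partition of [1, 12] with fixed first and last points.
P = build_partition([1, 3, 4, 3, 1])
P_o = [overlapping(E_k) for E_k in P]
```

With these sizes the overlapping subdomains run from index 0 to index 13, that is, over $[0, N + 1]$ with $N = 12$.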
An approximation $Q_t$ of a time series $Z_t$, where $Z_t = Q_t + R_t$, is determined by a few
iterations $I_t^n$ starting with $I_t^0 = Z_t$. Once the desired accuracy is obtained, the last
iteration is defined as $Q_t$. All iterations and the approximation $Q_t$, along with the
remainder $R_t$, satisfy the following properties.
(1) An iteration over $E_k$ has the same average value $a_k$ as the time series,

$$ \frac{1}{n_k} \sum_{t \in E_k} Z_t = \frac{1}{n_k} \sum_{t \in E_k} I_t^n = a_k, \tag{2.2} $$

and, hence, the average value of the remainder $R_t$ is zero over this subdomain. If
$Z_t$ is a measure of “energy” in the process, then $Q_t$ conserves energy over each
subdomain in the partition. In the particular case $n_k = 1$, $E_k = \{t_k\}$ and $t = t_k$ is a
fixed point for the approximation, so that $Q_t := Z_t$ at $t = t_k$.
(2) The measure of smoothness of the iterations at time $t$ for the $n$th iteration is defined
by $\delta_t^n = I_{t+1}^n + I_{t-1}^n - 2 I_t^n$, which is the second difference of $I_t^n$ at time $t$.
The norm on $E_k^o$ is defined as the RMS value

$$ \Delta_k^n = \left( \frac{1}{n_k} \sum_{t \in E_k} \left( \delta_t^n \right)^2 \right)^{1/2}. \tag{2.3} $$

Provided that the number of elements in $E_k$ is greater than 1, the condition that
$\Delta_k^n$ has a minimum value is imposed. $I_t^n$ is required for $t = t_k - n_k$ and $t = t_k + 1$ in
$E_k^o$ to determine $\delta_t^n$ at the endpoints of $E_k$. These two values of $I_t^n$ are the boundary
conditions for the minimization on $E_k$.
(3) For most of the examples presented in this paper, $t = 1$ and $t = N$ are fixed points, so
that $Q_1 = Z_1$ ($n_1 = 1$) and $Q_N = Z_N$ ($n_M = 1$). These values provide the boundary
conditions for the minimization over the adjacent subdomain. For the general case
where $n_1 > 1$ in $E_1$ ($n_M > 1$ in $E_M$), one of the boundary conditions is missing in
$E_1^o$ ($E_M^o$), so that an external boundary condition is required, as described in the last
paragraph of Section 3.
(4) In some cases there are two or more approximations over one or more subdomains,
and a criterion is required to choose the best approximation. From (2.1),
$R_t^k = Q_t^k + R_t^{k+1}$ ($R_t^0 := Z_t$), where $Q_t^k$ is determined from $R_t^k$ over a partition $P_k$.
For the example in Section 5, the simplest case occurs when $P_k$ is a refinement of
$P_{k-1}$; that is, $P_k$ is a union of sub-partitions, each of which covers a subdomain $E$
in $P_{k-1}$. Then one approximation over $E$ is $Q_t^k = 0$, and the other is defined by the
sub-partition that covers $E$. Let the RMS value of the remainder $R_t^{k+1}$ over this
sub-partition be denoted by $S_{k+1}$, and let $S_k$ be the RMS value over $E$ of $R_t^k$. The
approximation defined by the sub-partition is a significant improvement if the ratio
$S_{k+1}/S_k$ does not exceed a chosen threshold. As shown in Section 7, an upper bound for
this threshold takes on values between 0.75 and 0.9. For the example in Section 6
involving an outlier, there are two approximations for $Q_t^0$.
(5) The magnitude $A_t^n$ of the remainder $R_t^n$ is defined to be the RMS value of the
remainder over each subdomain in a partition, and this definition implies that the
RMS value of the series $W_t^n$ in (2.1) is equal to 1. For the example presented in
Section 5, the subdomains are uniform with 12 elements.
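The quantities in properties (1), (2), and (5) are simple to compute. The following Python sketch is one possible arrangement, assuming a series stored as a dictionary keyed by the time index so that the boundary points of $E_k^o$ are available; all function names are illustrative.

```python
import math

def subdomain_mean(x, E_k):
    """Average value a_k of x over the time indices in E_k, as in (2.2)."""
    return sum(x[t] for t in E_k) / len(E_k)

def second_difference(I, t):
    """delta_t = I_{t+1} + I_{t-1} - 2 I_t."""
    return I[t + 1] + I[t - 1] - 2 * I[t]

def smoothness_norm(I, E_k):
    """RMS value (2.3) of the second differences over E_k."""
    return math.sqrt(sum(second_difference(I, t) ** 2 for t in E_k) / len(E_k))

def amplitude(R, E_k):
    """RMS value of a remainder R over E_k, as in property (5)."""
    return math.sqrt(sum(R[t] ** 2 for t in E_k) / len(E_k))

# Hypothetical iterate I_t = t^2 on t = 0..6; its second difference is 2 everywhere.
I = {t: float(t * t) for t in range(0, 7)}
E_k = [2, 3, 4]
delta_norm = smoothness_norm(I, E_k)
a_k = subdomain_mean(I, E_k)
```

Note that `smoothness_norm` reads the values at $t_k - n_k$ and $t_k + 1$, which is why the boundary values of the neighbouring subdomains enter the minimization.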
3. Mathematical Details
Given the iterates $I_t^{n-1}$ and $\delta_t^{n-1}$ on $E_k^o$, the iterates $I_t^n$ for $t \in E_k$ are computed
such that the sum of squares of $\delta_t^n$ has a minimum value. For the moment, the first and
last intervals $E_1$ and $E_M$ are excluded. The following variables are required to set up the
equations for the minimization over the subdomain:
$$ X_k^n = \left[ \delta^n_{t_k - n_k + 1}, \ldots, \delta^n_{t_k} \right]', \quad Y_k^n = \left[ I^n_{t_k - n_k + 1}, \ldots, I^n_{t_k} \right]', \quad B_k^n = \left[ I^n_{t_k - n_k}, 0, \ldots, 0, I^n_{t_k + 1} \right]' \tag{3.1} $$

are $n_k \times 1$ matrices, and a prime on a matrix indicates the transposed matrix. From the
definition of $\delta_t^n$ in Section 2, these matrices are related by $X_k^n - A Y_k^n = B_k^n$, where $A$ is an
$n_k \times n_k$ tridiagonal symmetric matrix with rows

$$ \{-2, 1;\ 1, -2, 1;\ \ldots;\ 1, -2, 1;\ 1, -2\}, \tag{3.2} $$
where $-2$ is on the main diagonal. The equations for the iterations are obtained by replacing
$B_k^n$ with $B_k^{n-1}$. Since the sum of $I_t^n$ for $t \in E_k$ is a constant for all $n$, then

$$ H' Y_k^n = \sum_{t \in E_k} Z_t = a_k n_k, \tag{3.3} $$

where $H = [1, 1, \ldots, 1]'$, and $a_k$ is the average value of $Z_t$ over $E_k$. The condition on the sum
in terms of $X_k^n$ is $E'(X_k^n - B_k^{n-1}) = a_k n_k$, where $E$ is the solution of $AE = H$. Thus,
$E' X_k^n = n_k \rho_k^{n-1}$, where

$$ \rho_k^{n-1} = a_k - \frac{I^{n-1}_{t_{k-1}} + I^{n-1}_{t_k + 1}}{2}, \tag{3.4} $$

and $E_1 = E_{n_k} = -n_k/2$ (Section 7). The solution for $X_k^n$ such that $(X_k^n)' X_k^n$ has a minimum
is $X_k^n = \rho_k^{n-1} G^2(n_k) E$, where $G^2(n_k) = n_k / E'E$. Finally, $\Delta_k^n = |\rho_k^{n-1}| G(n_k)$, and the
solution $Y_k^n$ is

$$ Y_k^n = I^{n-1}_{t_k - n_k} V_1 + I^{n-1}_{t_k + 1} V_2 + \rho_k^{n-1} V_3, \tag{3.5} $$

$$ V_1(i) = 1 - \frac{i}{n_k + 1}, \quad V_2(i) = \frac{i}{n_k + 1}, \quad A V_3 = G^2(n_k) E. \tag{3.6} $$
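The minimization over a single subdomain can be sketched numerically as follows. This is a minimal illustration under stated assumptions rather than the authors' Maple implementation: the tridiagonal systems $AE = H$ and $AV_3 = G^2(n_k)E$ are solved in floating point with the Thomas algorithm, and the function names are hypothetical.

```python
def solve_second_difference(rhs):
    """Solve A x = rhs, where A is tridiagonal with -2 on the main diagonal
    and 1 on the adjacent diagonals (Thomas algorithm)."""
    n = len(rhs)
    c = [0.0] * n                      # modified super-diagonal
    d = [0.0] * n                      # modified right-hand side
    c[0] = 1.0 / -2.0
    d[0] = rhs[0] / -2.0
    for i in range(1, n):
        denom = -2.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def minimise_over_subdomain(left, right, a_k, n_k):
    """Y_k^n from (3.5); left and right are the previous iterate at
    t_k - n_k and t_k + 1, and a_k is the average of Z_t over E_k."""
    E = solve_second_difference([1.0] * n_k)              # A E = H
    G2 = n_k / sum(e * e for e in E)                      # G^2(n_k) = n_k / E'E
    V3 = solve_second_difference([G2 * e for e in E])     # A V3 = G^2 E, (3.6)
    rho = a_k - (left + right) / 2.0                      # (3.4)
    Y = []
    for i in range(1, n_k + 1):
        v2 = i / (n_k + 1)
        Y.append(left * (1.0 - v2) + right * v2 + rho * V3[i - 1])
    return Y

# Hypothetical boundary values and subdomain average.
Y = minimise_over_subdomain(9.0, 9.0, 11.75, 4)
```

Because $H'V_1 = H'V_2 = n_k/2$ and $H'V_3 = E'AV_3 = n_k$, the average of the returned values equals `a_k`, in agreement with (3.3). For $n_k = 1$ the function returns `[a_k]`, which is the fixed-point case of property (1).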
4. Computational Aspects
For a time series $Z_t$ and a partition $P$, there is a related averaged series defined by
$\bar{Z}_t = a_k$ for $t \in E_k$, where $a_k$ is the average value of $Z_t$ for $t \in E_k$. The following
property holds for all of the approximations in this paper:

The approximations for a time series $Z_t$ and the averaged time series $\bar{Z}_t$ are the same to the
desired accuracy provided that the same partition and the same external boundary conditions (if any)
are applied.
Consequently, any time series with variable spacing can be approximated provided
that the estimates of the average values of the time series over the subdomains are adequate.
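The averaged series is straightforward to construct. The sketch below uses 0-based Python lists and the hypothetical name `averaged_series`; it illustrates that only the subdomain averages $a_k$ are needed as input, which is what makes variable spacing tractable.

```python
def averaged_series(Z, sizes):
    """Replace each value of Z by the average over its subdomain, where
    sizes lists the numbers of elements n_1, ..., n_M in order."""
    if sum(sizes) != len(Z):
        raise ValueError("subdomain sizes must cover the whole series")
    out, start = [], 0
    for n_k in sizes:
        a_k = sum(Z[start:start + n_k]) / n_k
        out.extend([a_k] * n_k)
        start += n_k
    return out

# Hypothetical five-point series with subdomains of 1, 2, and 2 elements.
Z_bar = averaged_series([1.0, 2.0, 4.0, 6.0, 5.0], [1, 2, 2])
```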
The approximation for the averaged series is employed especially for larger
subdomains ($n_k \approx 12$ or more). The efficiency of the computations is increased if the
boundary conditions in the first set of iterations (even and odd) are the average of the four
values of the series that straddle the subdomains $E_{k-1}$ and $E_k$. The averaged series was used
in all computations, although the approximation obtained from $Z_t$ may be more efficient in
special cases.
It is convenient to introduce another notation to represent a partition:
$P = \{n_1, n_2, \ldots; \ldots; \ldots, n_M\}$, where the numbers of elements in the subdomains in the
first block are $\{n_1, n_2, \ldots\}$ and those in the last block are $\{\ldots, n_M\}$. These blocks are a
convenient way to separate the seasons or a set of months. Also, the approximation $Q_t^k$
obtained by iterating the time series $n$ times, using the partition $P_k$, is denoted by
$P_k^n\{R_t^k\}$ ($R_t^0 := Z_t$). The number of iterations $n$ is determined from the difference

$$ D_t^n = P_k^n\{R_t^k\} - P_k^{2n}\{R_t^k\} \tag{4.1} $$

by imposing the condition that $\max |D_t^n| < L$, where $L = 1$ in Figures 1 and 2, $L = 0.04$ in
Figures 3 and 4, and $L = 0.01$ in Figure 5. All calculations in this paper were performed using
Maple software [9].
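The complete procedure, with odd-numbered subdomains updated before even-numbered ones and the stopping rule (4.1), might be sketched in Python as follows. Several assumptions are made for the sake of a short example: the first and last subdomains contain one element each, so no external boundary condition is needed; one "iteration" is taken to mean one odd sweep followed by one even sweep; $V_3$ is evaluated from the closed-form quartic given in Section 7; and the iterations start from $Z_t$ rather than the averaged series. The function names and the cap `max_iter` are hypothetical.

```python
def v3(i, n_k):
    """V_3(i), i = 1..n_k, from the closed-form quartic of Section 7."""
    s_o = (n_k + 1) / 2
    s = i - s_o                                      # i = 0 is the left boundary point
    g2 = n_k / (((2 * s_o) ** 5 - 2 * s_o) / 120)    # G^2(n_k) = n_k / E'E
    return g2 * (s * s - s_o * s_o) * (s * s - 5 * s_o * s_o - 1) / 24

def update_subdomain(left, right, a_k, n_k):
    """Equation (3.5) with boundary values taken from the current iterate."""
    rho = a_k - (left + right) / 2
    return [left * (1 - i / (n_k + 1)) + right * i / (n_k + 1) + rho * v3(i, n_k)
            for i in range(1, n_k + 1)]

def iterate(Z, sizes, n_iter):
    """Apply n_iter sets of sweeps (odd-numbered subdomains, then even)."""
    if sizes[0] != 1 or sizes[-1] != 1 or sum(sizes) != len(Z):
        raise ValueError("sketch assumes n_1 = n_M = 1 and sizes covering Z")
    starts = [sum(sizes[:k]) for k in range(len(sizes))]
    means = [sum(Z[s:s + n]) / n for s, n in zip(starts, sizes)]
    I = [float(z) for z in Z]                        # I^0 = Z
    for _ in range(n_iter):
        for parity in (1, 0):                        # odd k first, then even k
            for k in range(2, len(sizes)):           # 1-based k = 2, ..., M - 1
                if k % 2 == parity:
                    s, n = starts[k - 1], sizes[k - 1]
                    I[s:s + n] = update_subdomain(I[s - 1], I[s + n], means[k - 1], n)
    return I

def approximate(Z, sizes, L, max_iter=64):
    """Smallest n with max |P^n{Z} - P^{2n}{Z}| < L, as in (4.1)."""
    for n in range(1, max_iter + 1):
        Q_n, Q_2n = iterate(Z, sizes, n), iterate(Z, sizes, 2 * n)
        if max(abs(a - b) for a, b in zip(Q_n, Q_2n)) < L:
            return Q_n, n
    raise RuntimeError("stopping rule not met within max_iter iterations")

# Hypothetical 12-point series and partition {1, 3; 4; 3, 1}.
Z = [5.0, 7.0, 6.0, 9.0, 12.0, 10.0, 11.0, 14.0, 9.0, 8.0, 7.0, 6.0]
sizes = [1, 3, 4, 3, 1]
Q, n_used = approximate(Z, sizes, L=0.01)
```

Each update preserves the subdomain average by construction, so the remainder `Z[t] - Q[t]` sums to zero over every subdomain regardless of how many sweeps are applied; the sweeps only redistribute values within a subdomain to reduce the second differences.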
5. Applications
Two time series, provided by the Ontario Ministry of Transportation [2, p. 126], illustrate
the approximations. The graph of the time series for the monthly accidents for young novice
drivers is given in Figure 1, where the main feature is the intervention that occurs at 52
months owing to the introduction of the GLS on April 1, 1994. The corresponding graph for
all drivers is shown in Figure 3, where the sharp drop in the graph from the maximum in
December/January to April, except for the last 2 years, is a strong feature of the series.
Figure 1: The upper graphs include $Z_t$ (thin line), the accidents per month for young novice drivers, and
the approximation $Q_t^0$ (thick line). The lower graphs represent the remainder $R_t^1 = Z_t - Q_t^0$ (thin line) and
the amplitude of the remainder $A_t^1$ (thick line). The RMS value of the remainder is 25.

Figure 2: The graph of the trend $T_t$ (thick line) and the approximation $Q_t^0$ (thin line) from Figure 1 for
young novice drivers are shown.

Figure 3: The upper graphs represent $Z_t$ (thin line), the accidents in thousands per month for all drivers,
and the approximation $Q_t^0$ (thick line). The RMS value of the remainder $R_t^1$ is 1.83. The lower graphs
include the second approximation $Q_t^1$ (thick line) and the remainder $R_t^1$ (thin line).
In Figure 1, the partition $P_0 = \{1, 3; 4; \ldots; 4; 3, 1\}$, which is uniform except for the first and
last blocks, provides a smooth approximation $Q_t^0 = P_0^4\{Z_t\}$ and captures the intervention well.
In this case, the first and last elements are fixed ($Q_1^0 = Z_1$ and $Q_{120}^0 = Z_{120}$). Since the sum
of the elements in the remainder over $E_k$ is zero, $P_0\{R_t^1\} = \{0\}$. Other uniform partitions are
possible where there are 3 or 6 elements in each subdomain. The approximation in the former
case is slightly less smooth than $Q_t^0$ in Figure 1, and the RMS value of the remainder is 26. For
the case of 6 elements, the RMS value of the remainder is 30. A more accurate approximation
is obtained if the subdomains have two elements; however, this approximation has an angular
appearance since it more closely approximates the time series.

Figure 4: The upper graph is the approximation $Q_t^0 + Q_t^1$ in thousands for all drivers. The lower graph is
the remainder $R_t^2$; the RMS value is 1.29.

Figure 5: The trend $T_t$ (thin line) of two segments of a series $Z_t$ is $T_t = 0$ for $t \le 60$ and $T_t = 1$ for
$t \ge 61$, whereas $Q_t$ (thick line) is the trend of the series over $[1, 120]$.
In Figure 2, the approximation $Q_t^0$ of Figure 1 is expressed as a sum of a trend
and an oscillatory series. The partition for the trend $T_t = P_T^6\{Q_t^0\}$ is
$P_T = \{12, 12, 12, 12; 6, 6, 6, 6; 12, 12, 12, 12\}$. The external boundary condition implies that
the tangent is horizontal at the endpoints of the series. The trend in this example is defined
as a seasonal approximation of the time series where the subdomains contain 6 elements over
the domain of the intervention. The remainder is the oscillatory series $M_t = Q_t^0 - T_t$.
In Figure 3, the points for January or December (plus one November) and April
are fixed points for the approximation $Q_t^0 = P_0^3\{Z_t\}$. The partition is $P_0 = \{1, 2, 1, 7, 1;
3, 1, 7, 1; 3, 1, 7, 1; 3, 1, 8; 1, 2, 1, 7, 1; 3, 1, 6, 2; 1, 2, 1, 8; 1, 2, 1, 6, 1; 2, 2, 1, 8; 1, 2, 1, 7, 1\}$.
The second approximation captures the increase in the number of accidents that occur in the
summer months by approximating the remainder $R_t^1$ in $Z_t = Q_t^0 + R_t^1$ to obtain
$R_t^1 = Q_t^1 + R_t^2$, where $Q_t^1 = P_1^3\{R_t^1\}$. The partition $P_1$ is a refinement of $P_0$ where 7 is
replaced with 3, 4; 6 with 3, 3; and 8 with 3, 5. Consequently, the approximations $Q_t^0$ and
$Q_t^0 + Q_t^1$ have the same average value over the subdomains of $P_0$.
The subdomains that are the same in the two partitions $P_0$ and $P_1$ are indicated by the
intervals over which the approximation is zero in Figure 3. For the remaining intervals, the
ratio of the RMS value of the remainder $R_t^2$ in Figure 4 to the RMS value of $R_t^1$ is equal to
0.49, so that this second approximation is significant. Furthermore, each of the ten segments
of $Q_t^1$, excluding the segments in which the approximation is identically equal to zero, has a
value between 0.40 and 0.52. The approximation $Q_t^0 + Q_t^1$ of $Z_t$ and the remainder $R_t^2$ appear
in Figure 4.
7. Properties of Approximations
Approximations of Random Samples
The point of this exercise is to determine the threshold in item (4) of Section 2 such that the
only reasonable approximation for a series of random samples is $Q_t^0$ equal to the mean of the
series. Using Maple, 12,000 random samples from the normal distribution with a mean of 0 and
a standard deviation of 1 were generated to form 100 time series with 120 elements in each
series. For each series, five approximations were determined where the subdomains of the
uniform partition contained 3, 4, 6, 12, and 24 elements. The external boundary condition for
the approximation is the condition of zero slope of the tangent. For each series, the ratio of
the RMS value of the remainder for the approximation to the RMS value of the series was
calculated, and the results are given in Table 1. The approximations corresponding to 24 and
12 elements are smooth and appear to reflect an underlying pattern in the series, whereas, for
the cases 3 and 4, the approximations are contorted. An upper bound for the threshold is less
than the minimum values in the range.
Figure 6: Graphs for the example are shown: $\{Z_t\}$ (thin line); approximation (thick line), $X = 1.25$ (large
dot) at $t = 6$; smooth approximation, $X = 0.12$ at $t = 6$.
Table 1: The number of elements in the subdomains along with the mean, standard error (StE), and range
of the ratios.

Subdomain elements   Mean    StE     Range
24                   0.986   0.013   (0.94, 1.01)
12                   0.967   0.018   (0.91, 1.00)
6                    0.930   0.023   (0.87, 0.98)
4                    0.888   0.027   (0.81, 0.96)
3                    0.843   0.035   (0.76, 0.92)
Quartic Polynomial
The terms in equation (3.5) for the approximation over $E_k^o$ have a simple interpretation.
For the first two terms in (3.5), $(i, V_1(i))$ and $(i, V_2(i))$ are points on straight lines. For the
last term, Maple solves $AE = H$ for $E$ and (3.6) for $V_3$ exactly; consequently, an accurate
computation shows that $(i, E_i)$ and $(i, V_3(i))$ are points on the graph of a quadratic and a
quartic polynomial, respectively.
To describe the properties of $E$ and $V_3$, it is necessary to change variables from $t$ to
$s$, where $s = t - (t_k - (n_k - 1)/2)$ and $s = 0$ is the central point of the interval $E_k^o$. In the $s$
variable, the boundary conditions are applied at the points $s = \pm s_o$, where $s_o = (n_k + 1)/2$.
The equation of the quadratic polynomial $v(s, n_k)$ for $E$ is defined by $d^2 v/ds^2 = 1$ and $v = 0$
at $s = \pm s_o$, so that $v = (s^2 - s_o^2)/2$. For any integer $i$, $E_i$ is equal to $v$ at the corresponding
value for $s$, and $E'E = ((2 s_o)^5 - 2 s_o)/120$. The equation for the quartic polynomial $u(s, n_k)$ for
$V_3$, provided $n_k \ge 3$, is determined by $d^4 u/ds^4 = G^2(n_k)$, where the roots of the equation for
$u$ are $s = \pm s_o$ and $s^2 = 5 s_o^2 + 1$. Thus, $u = G^2(n_k)(s^2 - s_o^2)(s^2 - 5 s_o^2 - 1)/24$. $G(n_k)$ is a measure
of the smoothness of the approximation over $E_k^o$: $G(2) = 1.0$, $G(3) = 0.59$, $G(4) = 0.39$,
$G(6) = 0.21$, $G(12) = 0.062$, and $G(24) = 0.017$.
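The closed forms above can be checked with exact rational arithmetic. The following Python sketch is only a verification aid with hypothetical function names: it builds $E$ and $V_3$ from the polynomials $v$ and $u$, applies the second-difference matrix $A$ directly, and evaluates $G(n_k)$.

```python
from fractions import Fraction
import math

def E_vector(n_k):
    """E_i = v(s) = (s^2 - s_o^2)/2 at the interior points i = 1..n_k."""
    s_o = Fraction(n_k + 1, 2)
    return [((i - s_o) ** 2 - s_o ** 2) / 2 for i in range(1, n_k + 1)]

def G_squared(n_k):
    """G^2(n_k) = n_k / E'E."""
    return Fraction(n_k) / sum(e * e for e in E_vector(n_k))

def V3_vector(n_k):
    """V_3(i) = u(s) = G^2 (s^2 - s_o^2)(s^2 - 5 s_o^2 - 1)/24."""
    s_o = Fraction(n_k + 1, 2)
    g2 = G_squared(n_k)
    return [g2 * ((i - s_o) ** 2 - s_o ** 2) * ((i - s_o) ** 2 - 5 * s_o ** 2 - 1) / 24
            for i in range(1, n_k + 1)]

def apply_A(x):
    """Multiply by the tridiagonal matrix A of (3.2); values outside the
    subdomain are zero because v and u vanish at s = +/- s_o."""
    padded = [0] + list(x) + [0]
    return [padded[i - 1] - 2 * padded[i] + padded[i + 1] for i in range(1, len(x) + 1)]

def closed_form_EtE(n_k):
    """E'E = ((2 s_o)^5 - 2 s_o)/120 with s_o = (n_k + 1)/2."""
    two_s_o = n_k + 1
    return Fraction(two_s_o ** 5 - two_s_o, 120)

# G(n_k) for the subdomain sizes listed in the text.
G = {n_k: math.sqrt(G_squared(n_k)) for n_k in (2, 3, 4, 6, 12, 24)}
```

Applying `apply_A` to `E_vector(n_k)` returns a vector of ones ($AE = H$), and applying it to `V3_vector(n_k)` returns $G^2(n_k)E$, which is (3.6). The grid-point identities also hold for $n_k = 1$ and $n_k = 2$, although the text states the quartic for $n_k \ge 3$.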
Concluding Remarks
The major input for the approximation of a time series involves the partition of the domain.
Initially a uniform partition is chosen and, if seasonal behavior is present in the series, a subset
of the partition covers the domain for the seasons. In general, as the length of the subintervals
decreases, the approximation is less smooth and the accuracy of the approximation increases.
The best approximation occurs at the point at which the approximation is acceptably smooth.
The subintervals can be enlarged to determine a much smoother approximation that can be
labelled as a trend while still respecting the seasonal aspects of the series; however, if an
intervention is present, then some adjustment of the partition may be necessary in the region
of the intervention. For time series with a well-defined local maximum or minimum, the
approximation can be assigned the same value as the series by taking a subdomain of the
partition to be a single point of the domain. For series with jumps and other complexities,
examples are provided to suggest how to proceed in these cases.
An approach in the literature, as indicated in the introduction, defines the approximation
at a point as a weighted average of the values of the time series in a window about the
point. This approach may smooth out interesting features in the time series and, if applied
over smaller intervals, the approximation will not be smooth. Since the proposed model is
not based on regression, a comparison of the two approaches has not been considered.
References
[1] A. F. Williams and R. A. Shults, “Graduated driver licensing research, 2007-present: a review and
commentary,” Journal of Safety Research, vol. 41, no. 2, pp. 77–84, 2010.
[2] M. Brackstone, Proposal for impact evaluation of graduated licensing system on young drivers in Ontario,
M.S. thesis, Department of Epidemiology and Biostatistics, University of Western Ontario, Canada,
2008.
[3] J. D. Langley, A. C. Wagenaar, and D. J. Begg, “An evaluation of the New Zealand graduated driver
licensing system,” Accident Analysis and Prevention, vol. 28, no. 2, pp. 139–146, 1996.
[4] G. Janacek, Practical Time Series, Oxford University Press, New York, NY, USA, 2001.
[5] W. Härdle, Applied Nonparametric Regression, vol. 19 of Econometric Society Monographs, Cambridge
University Press, New York, NY, USA, 1990.
[6] L. Keele, Semiparametric Regression for the Social Sciences, Wiley, Hoboken, NJ, USA, 2008.
[7] B. F. Smith, P. E. Bjørstad, and W. D. Gropp, Domain Decomposition: Parallel Multilevel Methods for
Elliptic Partial Differential Equations, Cambridge University Press, New York, NY, USA, 1996.
[8] J. Kevorkian and J. D. Cole, Perturbation Methods in Applied Mathematics, vol. 34 of Applied Mathematical
Sciences, Springer, New York, NY, USA, 1981.
[9] Maple software, version 13, Maplesoft, Waterloo, Canada.
[10] W. W. S. Wei, Time Series Analysis: Univariate and Multivariate Methods, Addison Wesley/Pearson,
Boston, Mass, USA, 2nd edition, 2006.
[11] R. S. Tsay, “Outliers, level shifts, and variance changes in time series,” Journal of Forecasting, vol. 7, pp.
1–20, 1988.