Time Series Analysis Using R
Table of Contents
Introduction
Overview
Exploratory Data Analysis
Time Series
Time series with Bayesian approach
Conclusions
References
Appendix
R scripts
Introduction
Overview
In economic terms, the housing market, like any market, is largely determined by supply and
demand. However, even when the market reaches equilibrium, the outcome can create social
problems such as housing unaffordability. In the particular case of London, England,
the last few decades have seen prices rise significantly: at the end of last year alone, for the
first time in history, the average cost per property exceeded the £500,000 mark
(Antonakakis, 2018). This makes London the most expensive region in the UK to live in.
Returning to the problem of housing unaffordability, the rate of growth of house prices has
come to exceed the growth in individual incomes. In some countries, housing costs have even
exceeded the average wage by a factor of several.
As a result of the COVID-19 pandemic, demand for housing in England has reportedly become
concentrated during this period. This has several origins. First-time buyers do not face the
complications of moving house, such as securing a target home, making arrangements and
payments, and planning repairs and refurbishments, so they have had an incentive to buy
properties, which are normally at the lower end of house prices. This, in turn, has led to an
increase in demand and hence higher prices. Another factor that has played a role in the
increase in prices is the change in people's housing preferences as a result of the strict
The variables the dataset contains are date, area, average house price (recorded in GBP), area
code, number of houses sold and an indicator of whether the area is a London borough. The
data are recorded monthly from January 1995 to January 2020, and 45 areas are considered as
London regions; each region therefore has 301 observations per variable. For the purposes of
this analysis, the study analysed the time series of average house prices in order to make a
prediction. In particular, the study works with the Westminster region. Then, for the average
price variable for the chosen
Figure 1 presents the normality analysis and the histograms of the residuals. No anomaly
is observed that would lead the study to suspect that the residuals are not normally
distributed or that they are not white noise.
As expected, the minimum corresponds to the first observation (that is, January 1995),
and the maximum was reached in February 2018. As it is a time series, the study cares more
about the most recent data. The data from 2018 have a mean of 991,349, so the study would
expect the predictions to be around this value.
correlated values, due to their dependence on time (Laptev et al., 2017). Through the different
techniques, the study sought to analyse the information obtained in order to identify a pattern
that describes this information over time. Subsequently, this pattern is extended
to a specific period of time in order to produce a forecast. Time series under the Bayesian
approach have the same mathematical structure as under the classical approach, such as the
Box-Jenkins model, but they differ in the estimation of the parameters: these are treated as
random variables and as such have a probability space
transformation will be applied to the data to correct for heteroskedasticity, and a
Breusch-Pagan test will be performed again to re-check this condition (Đalić & Terzić, 2021).
the series, since this is necessary to fit an ARMA or ARIMA model. Then, the Dickey-Fuller and
KPSS tests will be performed to check for stationarity in the series (Fedorová, 2016). Finally,
For the Bayesian model, the following steps will be taken. Firstly, it should be noted that
the study will work with the same differenced series that is used with the classical
model (Xiao et al., 2017). Then, initial values for the parameters of the prior distributions
will be defined, with the aim of estimating the moving-average terms. The burn-in will be
performed and the number of Markov chains for the estimation will be established.
Following Xiao et al. (2017), the convergence of the parameters will be checked
graphically and by means of a Gelman test. Finally, a prediction is made for 5 future values.
To conclude the above procedure, the classical and Bayesian models are compared according
to goodness-of-fit criteria, and it is determined which one offers the better model. Finally, a
new Bayesian model will be proposed in order to find a better estimate of the variance,
testing several models and comparing them in terms of their p-values and considering that
Time Series
A time series is the succession of observations generated by a stochastic process, the
index of which is taken relative to time. In time series it is assumed that there is a correlation
structure between observations; that is, they are not independent (Woodward et al., 2017). The
study constructed a time series from the price column for the Westminster region using R, as
shown in Figure 2.
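As a minimal sketch of how such a monthly series can be built in R, the block below uses simulated prices in place of the real dataset. The column names (date, area, average_price) and the simulated values are assumptions made for illustration only; the January 1995 start and the 301 monthly observations follow the description of the data.

```r
# Simulated stand-in for the housing data (values and column names are hypothetical).
set.seed(1)
n_obs <- 301  # monthly observations, January 1995 to January 2020
prices <- data.frame(
  date = seq(as.Date("1995-01-01"), by = "month", length.out = n_obs),
  area = "westminster",
  average_price = 130000 * exp(cumsum(rnorm(n_obs, mean = 0.007, sd = 0.02)))
)

# Keep one region and turn its price column into a monthly ts object.
westminster <- subset(prices, area == "westminster")
price_ts <- ts(westminster$average_price, start = c(1995, 1), frequency = 12)

# Basic inspection: range of the index, summary statistics and a line plot.
print(start(price_ts))
print(end(price_ts))
print(summary(price_ts))
plot(price_ts, xlab = "Year", ylab = "Average price (GBP)",
     main = "Average house price, Westminster (simulated)")
```

With the real data, only the construction of the prices data frame would change; the ts() call stays the same as long as the rows are in chronological order.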
The analysis obtained a p-value < 0.05, so homoscedasticity was rejected and the data are
heteroscedastic. A Box-Cox transformation was therefore applied to make the variance of the
time series constant. The BoxCox command transforms the data according to a lambda
parameter, and the BoxCox.lambda command finds a lambda value that generates a suitable
transformation. The study therefore first found lambda with BoxCox.lambda and then
transformed the data with BoxCox using that lambda (Bauer et al., 2019). All this was done
under the Guerrero method, which is simply the way lambda is calculated. The log-likelihood
method could be used instead by changing "guerrero" to "loglik";
however, this would change the value of lambda and consequently the transformation, as
Finally, the study re-ran the test and obtained a p-value of 0.2557, so H0 was not
rejected; that is, the transformed data were homoscedastic. Stationarity was then
assessed using two tests: the Dickey-Fuller test and the
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. As the series did not pass the Dickey-Fuller
test, the study differenced the series, after which it passed both the Dickey-Fuller and the
KPSS tests, as presented in Figure 3.
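The sequence of checks described above (variance test, Box-Cox transformation, re-test, differencing, stationarity tests) can be sketched as follows. This is an illustration on simulated data, not the study's script. It assumes the forecast, lmtest and tseries packages are installed, and it assumes that the Breusch-Pagan test is run on a regression of the series on time, which the text does not specify. In BoxCox.lambda, method = "guerrero" selects Guerrero's method and "loglik" is the alternative.

```r
library(forecast)  # BoxCox.lambda(), BoxCox()
library(lmtest)    # bptest()
library(tseries)   # adf.test(), kpss.test()

# Simulated monthly price series (hypothetical stand-in for the Westminster data).
set.seed(2)
price_ts <- ts(130000 * exp(cumsum(rnorm(301, mean = 0.007, sd = 0.02))),
               start = c(1995, 1), frequency = 12)
time_index <- as.numeric(time(price_ts))

# Breusch-Pagan test on the raw series (assumed setup: linear regression on time).
bp_raw <- bptest(as.numeric(price_ts) ~ time_index)

# Choose lambda with Guerrero's method, then transform the series.
lambda <- BoxCox.lambda(price_ts, method = "guerrero")
price_bc <- BoxCox(price_ts, lambda)

# Re-run the Breusch-Pagan test on the transformed series.
bp_bc <- bptest(as.numeric(price_bc) ~ time_index)

# Stationarity: test the level, difference once, then test again.
adf_level <- adf.test(price_bc)
price_diff <- diff(price_bc)
adf_diff <- adf.test(price_diff)    # H0: unit root (non-stationary)
kpss_diff <- kpss.test(price_diff)  # H0: level stationarity

print(c(lambda = lambda,
        bp_raw = bp_raw$p.value, bp_bc = bp_bc$p.value,
        adf_level = adf_level$p.value, adf_diff = adf_diff$p.value,
        kpss_diff = kpss_diff$p.value))
```

Note that the two stationarity tests have opposite null hypotheses: a small p-value in the Dickey-Fuller test and a large p-value in the KPSS test both point towards stationarity.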
Thereafter, the analysis used the sample ACF and PACF to get an idea of the
number of lags, so that the study could propose an ARIMA or SARIMA model to fit.
Theoretically, an AR(p) will have the first p lags of the PACF outside the confidence
bands, after which the PACF cuts off to zero; similarly, an MA(q) will have the first q lags
of the ACF outside the confidence bands, after which the ACF cuts off to zero. An
ARMA(p,q) process shows a mixture of the two patterns, as shown in Figure 4.
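The cut-off behaviour can be illustrated with simulated processes. The sketch below uses only base R; the AR(2) and MA(3) coefficients are arbitrary choices for illustration and are not estimates from the study.

```r
set.seed(3)
n <- 500

# Simulate a pure AR(2) and a pure MA(3) process (coefficients are illustrative).
ar2 <- arima.sim(model = list(ar = c(0.6, -0.3)), n = n)
ma3 <- arima.sim(model = list(ma = c(0.5, 0.4, 0.3)), n = n)

# Approximate 95% confidence band for a white-noise sample autocorrelation.
band <- qnorm(0.975) / sqrt(n)

# PACF of the AR(2): the first two lags should stand out, later ones mostly not.
pacf_ar <- as.numeric(pacf(ar2, lag.max = 20, plot = FALSE)$acf)

# ACF of the MA(3): drop lag 0, which is always 1.
acf_ma <- as.numeric(acf(ma3, lag.max = 20, plot = FALSE)$acf)[-1]

# Lags whose sample value falls outside the band.
print(which(abs(pacf_ar) > band))
print(which(abs(acf_ma) > band))

# The usual visual check, side by side.
op <- par(mfrow = c(1, 2))
pacf(ar2, lag.max = 20, main = "PACF of simulated AR(2)")
acf(ma3, lag.max = 20, main = "ACF of simulated MA(3)")
par(op)
```

Because the bands are 95% limits, an occasional later lag may fall just outside them by chance, so the plots are a guide to candidate orders rather than a definitive rule.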
By means of auto.arima, the analysis obtained the model ARIMA(2,0,3) with non-zero
mean, with AIC = -1402.07, AICc = -1401.68 and BIC = -1376.14. Finally, the study
used forecast to obtain the prediction graph; the results are shown in Figure 5.
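A hedged sketch of this step with the forecast package is shown below. The input is a simulated stationary series standing in for the differenced, transformed prices, so the order that auto.arima selects and the information criteria will differ from the values reported in the text. The horizon of 5 steps follows the methodology described earlier.

```r
library(forecast)

# Simulated stationary series (hypothetical stand-in for the differenced data).
set.seed(4)
price_diff <- arima.sim(model = list(ar = c(0.5, -0.2), ma = c(0.3, 0.2, 0.1)),
                        n = 300, sd = 0.01) + 0.005

# Automatic order selection; the chosen model depends on the data supplied.
fit <- auto.arima(price_diff)
print(fit)
print(c(AIC = AIC(fit), AICc = fit$aicc, BIC = BIC(fit)))

# Forecast 5 future values with 80% and 95% prediction intervals.
fc <- forecast(fit, h = 5)
print(fc)
plot(fc)
```

The point forecasts are in fc$mean and the interval limits in fc$lower and fc$upper. If the model is fitted to a Box-Cox transformed and differenced series, the forecasts are on that scale and need to be transformed back before being read as prices.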
Figure 5: Projection of future values with confidence bands showing the full series
Figure 6: Projection of future values with confidence bands showing the series from 2016
classical approach, using JAGS and the forecast package. For this, the study has to define
the model equation, which requires two autoregressive coefficients (called ρ1 and
ρ2) and 3 latent variables (θ1, θ2 and θ3) for the moving averages. Once this was in place, an
auxiliary variable z was defined to carry forward the moving averages. The following results
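One way such a specification could be written is sketched below with the rjags and coda packages, which require a working JAGS installation. This is an assumption-laden illustration and not the study's script: the priors, the intercept name mu0, the zero start-up values for z, the chain count and the iteration numbers are all hypothetical choices, and the data are simulated.

```r
library(rjags)  # interface to JAGS (JAGS itself must be installed)
library(coda)   # convergence diagnostics

# Simulated stationary series standing in for the differenced data.
set.seed(5)
y <- as.numeric(arima.sim(model = list(ar = c(0.5, -0.2),
                                       ma = c(0.3, 0.2, 0.1)), n = 200))

# ARMA(2,3): rho1, rho2 are the autoregressive coefficients, theta1..theta3 the
# moving-average coefficients, and z carries the past innovations forward.
arma23_model <- "
model {
  for (t in 1:3) {
    z[t] <- 0
  }
  for (t in 4:n) {
    mu[t] <- mu0 + rho1 * y[t-1] + rho2 * y[t-2] +
             theta1 * z[t-1] + theta2 * z[t-2] + theta3 * z[t-3]
    y[t] ~ dnorm(mu[t], tau)
    z[t] <- y[t] - mu[t]
  }
  mu0 ~ dnorm(0, 0.01)
  rho1 ~ dunif(-1, 1)
  rho2 ~ dunif(-1, 1)
  theta1 ~ dunif(-1, 1)
  theta2 ~ dunif(-1, 1)
  theta3 ~ dunif(-1, 1)
  tau ~ dgamma(0.01, 0.01)
  sigma <- 1 / sqrt(tau)
}"

# Compile with 3 chains, discard a burn-in, then draw posterior samples.
jm <- jags.model(textConnection(arma23_model),
                 data = list(y = y, n = length(y)),
                 n.chains = 3, quiet = TRUE)
update(jm, n.iter = 1000)
samp <- coda.samples(jm,
                     variable.names = c("rho1", "rho2", "theta1", "theta2",
                                        "theta3", "sigma"),
                     n.iter = 2000)

# Convergence: Gelman-Rubin statistic per parameter, plus trace and density plots.
gd <- gelman.diag(samp, multivariate = FALSE)
print(gd)
plot(samp)
```

Potential scale reduction factors close to 1 suggest that the chains have mixed. The uniform priors on (-1, 1) are a simple way to keep the sampler away from explosive values, but they do not by themselves guarantee stationarity or invertibility.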
It is observed that the traces and the densities converge. Besides, it can be
seen that they converge very close to zero, which is what could also be seen in the classical
model. At the same time, it is evident that the behaviour of the residuals is normal, which is
Figure 9: Residuals for the ARIMA (2,0,3) model using the Bayesian approach
From Figure 9, it can be seen that the residuals behave like white noise: the ACF and PACF
values remain within the bands. The tests of the assumptions were performed and the
residuals passed the test of independence (Ljung-Box). In the following graph, which shows
the fitted values against the observed ones, the study noticed that the variance is
underestimated, since the fitted values are below the observed ones. The prediction is close
to the one presented in the classical model, so the fit can be considered good. However,
since there were issues from the beginning with the variance of the series not being
constant, another model was sought with the auto.sarima command, which returned an
ARIMA(1,0,2).
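The independence check on the residuals can be sketched with base R as follows. The residuals here come from a classical arima() fit to simulated data and only stand in for the residuals of the Bayesian model, which are not available in the text; the lag of 12 is likewise an assumed choice.

```r
# Simulated series and a classical ARMA(2,3) fit used as a stand-in.
set.seed(6)
y <- arima.sim(model = list(ar = c(0.5, -0.2), ma = c(0.3, 0.2, 0.1)), n = 300)
fit <- arima(y, order = c(2, 0, 3), method = "ML")
res <- residuals(fit)

# Ljung-Box test: H0 is that the residuals are independent (no autocorrelation).
# fitdf = 5 removes one degree of freedom per estimated ARMA coefficient (2 + 3).
lb <- Box.test(res, lag = 12, type = "Ljung-Box", fitdf = 5)
print(lb)

# Visual check: for white noise most bars stay inside the bands.
op <- par(mfrow = c(1, 2))
acf(res, main = "ACF of residuals")
pacf(res, main = "PACF of residuals")
par(op)
```

A large p-value means the hypothesis of independent residuals is not rejected, which is the outcome the text reports for its model.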
The study attempted to fit different models with the ggplot2 package. It was observed
that most of them did not pass the constant variance test, so the study kept the ones
with the highest p-value for this test and with no correlation in the parameters. The
models that were kept are ARIMA(1,0,2), ARIMA(2,0,3), ARIMA(4,0,2), ARIMA(2,0,1)
and ARIMA(3,0,3). Finally, the results for the prediction of the model is obtained the
Figure 10: Model prediction (the best model obtained was the ARIMA(1,0,2))
Comparing these models using the results in the figures above, it can be concluded
that the one that seemed to have the best fit was the ARIMA(1,0,2), which was in fact the one
Conclusions
In the end there was no single best model for the time series fit. The study attempted
to use the model obtained in the classical approach for the Bayesian part as well, but for
these methods it was not the best option. This shows the importance of having
different ways of approaching the modelling and how difficult it can be to reach a good fit,
especially in the Bayesian setting, because it requires more computational work. That work
may not have been large for the study data, but databases with millions of records can make
trying different models much harder. The R scripts used to obtain the study results are annexed.
References
Antonakakis, N., 2018. Rethinking London's 'ripple effect' on house prices: other UK regions
transmit shocks too. British Politics and Policy at LSE.
Bauer, A., Züfle, M., Herbst, N. and Kounev, S., 2019, June. Best practices for time series
forecasting (tutorial). In 2019 IEEE 4th International Workshops on Foundations and
Applications of Self* Systems (FAS*W) (pp. 255-256). IEEE.
Đalić, I. and Terzić, S., 2021. Violation of the assumption of homoscedasticity and detection
of heteroscedasticity. Decision Making: Applications in Management and
Engineering, 4(1), pp.1-18.
Fedorová, D., 2016. Selection of unit root test on the basis of length of the time series and
value of AR(1) parameter. Statistika, 96(3), p.3.
Laptev, N., Yosinski, J., Li, L.E. and Smyl, S., 2017, August. Time-series extreme event
forecasting with neural networks at Uber. In International conference on machine
learning (Vol. 34, pp. 1-5).
Woodward, W.A., Gray, H.L. and Elliott, A.C., 2017. Applied time series analysis with R.
CRC press.
Xiao, Q., Chaoqin, C. and Li, Z., 2017. Time series prediction using dynamic Bayesian
network. Optik, 135, pp.98-103.
Appendix
R scripts