
Time Series Analysis Using R

This document discusses time series analysis of London housing prices using R. It begins with an exploratory data analysis of house prices in Westminster, finding prices ranged from £121,387 to £1,117,408 with an average of £521,837. Classical and Bayesian time series models are then fit to the data and compared. For the classical model, tests for stationarity and transformations are applied before fitting ARIMA/SARIMA models. For the Bayesian model, prior distributions are specified and parameters are estimated. The models are compared based on goodness of fit to select the best for forecasting future house prices.

Uploaded by

John Kalar

Time series Analysis using R

Table of Contents

Introduction
  Overview
Exploratory Data Analysis
Model fitting and Forecasting
  Time Series
  Time series with Bayesian approach
Conclusions
References
Appendix
  R scripts
Introduction
Overview
In economic terms, the housing market, like any market, is usually determined by supply and
demand. However, although the market can reach equilibrium, this can lead to social problems
such as housing unaffordability. In the particular case of London, England, the last few
decades have seen prices rise significantly: at the end of last year alone, for the first time
in history, the average cost per property exceeded the £500,000 mark (Antonakakis, 2018).
This makes London the most expensive region in the UK to live in. Returning to the problem of
housing unaffordability, it should be noted that the rate of growth of housing prices has come
to exceed the increase in individual incomes. In some countries, by 2014 housing costs had
even exceeded the average wage by a factor of 10, as opposed to 1997, when they were only
several times the average wage.

As a result of the pandemic caused by the COVID-19 virus, a concentration of demand for
housing has been reported in England during this period. This has several origins. First-time
buyers do not face the complications of moving house, such as securing a target home, making
arrangements and payments, and planning repairs and refurbishments, so they have had an
incentive to buy properties, which are normally at the lower end of house prices. This, in
turn, has led to an increase in demand and hence higher prices. Another factor that has played
a role in the increase in prices is the change in people's housing preferences as a result of
the strict confinement measures decreed by the government at the end of March 2020.

Exploratory Data Analysis

A database containing information on dwellings in London was used. The variables it contains
are date, area, average house price (recorded in £GBP), area code, number of houses sold and
an indicator of whether the area is a London borough or not. The data are recorded every month
from January 1995 to January 2020, and 45 areas are considered as London regions; each region
therefore has 301 observations per variable. For the purposes of this analysis, the study
analysed the time series of average house prices to make a prediction. In particular, the
study works with the Westminster region. Descriptive statistics of the average price variable
for the chosen region are presented in Table 1.

Table 1: Descriptive statistics of House Price (Mean)

                     Minimum      Mean         Maximum
Average price (£)    121,387      521,837      1,117,408
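
As an illustration, the following is a minimal sketch of how statistics like those in Table 1 could be computed in R. The data frame is synthetic, and the column names (date, area, average_price) and the second area are assumptions made for this example; they are not taken from the study's scripts.

```r
# Synthetic stand-in for the housing data: one row per month and area.
# Column names and values are illustrative assumptions, not the real data.
dates <- seq(as.Date("1995-01-01"), as.Date("2020-01-01"), by = "month")
set.seed(1)
housing <- data.frame(
  date = rep(dates, times = 2),
  area = rep(c("westminster", "camden"), each = length(dates)),
  average_price = round(runif(2 * length(dates), 1e5, 1e6))
)

# Keep one region and count its monthly observations
# (January 1995 to January 2020 inclusive).
west <- subset(housing, area == "westminster")
n_obs <- nrow(west)

# Minimum, mean and maximum of the average price.
price_stats <- c(
  min  = min(west$average_price),
  mean = mean(west$average_price),
  max  = max(west$average_price)
)

# Monthly time series object that later modelling steps could start from.
price_ts <- ts(west$average_price, start = c(1995, 1), frequency = 12)

print(n_obs)
print(price_stats)
```

With the real data, only the construction of the data frame would change; the subsetting and summary steps would stay the same.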

The graphs in Figure 1 present the normality checks, including the histogram of the
residuals. No anomaly is observed that would lead the study to suspect that the residuals are
not white noise.

Figure 1: House price normality test

As expected, the minimum corresponds to the first observation (that is, January 1995) and the
maximum was reached in February 2018. As it is a time series, the study cares more about the
most recent data. The data from 2018 have a mean of £991,349, so the study would expect the
predictions to be around this value.
Model fitting and Forecasting


In the classical time series approach, broadly speaking, the study has a series of values
that are correlated because of their dependence on time (Laptev et al., 2017). Using different
techniques, the study sought to analyse this information in order to identify a pattern that
describes it over time. This pattern is then extended over a specific period of time in order
to carry out a forecast. Time series under the Bayesian approach have the same mathematical
structure as in the classical approach, such as the Box-Jenkins model, but differ in the
estimation: the parameters are considered random variables and as such have a probability
space associated with them (Laptev et al., 2017).

First, a Breusch-Pagan test is performed to check the homoscedasticity of the observations.
If heteroscedasticity is found, a Box-Cox transformation is applied to the data to correct it,
and the Breusch-Pagan test is repeated to check the condition again (Đalić & Terzić, 2021).
Once homoscedasticity holds, differencing is used to make the series stationary, since
stationarity is necessary to fit an ARMA or ARIMA model. The Dickey-Fuller and KPSS tests are
then performed to check for stationarity in the series (Fedorová, 2016). Finally, based on the
previous results, an ARIMA or SARIMA fit is proposed and a prediction is made for 5 future
values.

For the Bayesian model, the procedure is as follows. First, the study works with the same
differenced series that is used with the classical model (Xiao et al., 2017). Initial values
for the parameters of the prior distributions are then defined, with the aim of estimating the
moving averages. The burn-in is performed and the number of Markov chains for the estimation
is established. Following Xiao et al. (2017), the convergence of the parameters is checked
graphically and by means of a Gelman test. Finally, a prediction is made for 5 future values.
To conclude the procedure, the classical and Bayesian models are compared according to
goodness-of-fit criteria to determine which one offers the better model. Lastly, a new
Bayesian model is proposed in order to find a better estimate of the variance, testing several
models and comparing them in terms of their p-values, subject to there being no correlation
between their parameters.

Time Series
A time series is the succession of observations generated by a stochastic process whose index
is taken relative to time. In time series it is assumed that there is a correlation structure
between observations; that is, they are not independent (Woodward et al., 2017). The study
built a time series from the price column for the Westminster region using R, as shown in
Figure 2.

Figure 2: Time series price trend


The time series needs to have constant variance, so the study performed a homoscedasticity
test, the Breusch-Pagan test, with the hypotheses:

H0: The data are homoscedastic (have constant variance).

H1: The data are heteroscedastic (the variance is not constant).
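
The test of these hypotheses can be sketched in base R as follows. This is an illustration under stated assumptions: the mean model is taken to be a linear trend in time, the series is synthetic, and the function name bp_test is hypothetical. The study's own scripts may well have used a packaged implementation such as bptest from the lmtest package.

```r
# Studentized (Koenker) form of the Breusch-Pagan test, written out by hand.
# Assumption: the mean model is a linear trend in time.
bp_test <- function(y) {
  t_idx <- seq_along(y)
  mean_fit <- lm(y ~ t_idx)                # mean model
  u2 <- residuals(mean_fit)^2              # squared residuals
  aux_fit <- lm(u2 ~ t_idx)                # auxiliary regression on the same regressor
  stat <- length(y) * summary(aux_fit)$r.squared   # LM statistic: n * R^2
  p_value <- pchisq(stat, df = 1, lower.tail = FALSE)
  c(statistic = stat, p.value = p_value)
}

# Synthetic series whose noise grows with time (heteroscedastic by design).
set.seed(42)
n <- 301
y_het <- 100 + 2 * (1:n) + rnorm(n, sd = 0.2 * (1:n))

res_bp <- bp_test(y_het)
print(res_bp)
```

A small p-value leads to rejecting H0, which is the situation that motivates a variance-stabilising transformation.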
The analysis obtained a p-value < 0.05, so H0 is rejected and the data are not homoscedastic.
A Box-Cox transformation was therefore applied to give the time series constant variance. The
BoxCox command transforms the data according to a parameter lambda. To find a lambda value
that produces a suitable transformation, the study used the command BoxCox.lambda and then
transformed the data with BoxCox using that lambda (Bauer et al., 2019). Lambda was calculated
with the Guerrero method ("guerrero"). The log-likelihood method could have been used instead
by changing "guerrero" to "loglik"; however, this would change the value of lambda and
consequently the transformation. As "guerrero" worked well, this method was retained.

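The transformation itself can be sketched in base R as below. The function names box_cox and inv_box_cox are hypothetical, the lambda value is illustrative rather than the one found by the study, and one of the input values is made up. In the forecast package, BoxCox.lambda(x, method = "guerrero") selects lambda and BoxCox(x, lambda) applies it.

```r
# Box-Cox transform of a strictly positive series for a given lambda:
# log(y) when lambda is 0, (y^lambda - 1) / lambda otherwise.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, needed to bring forecasts back to the original scale.
inv_box_cox <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

# Prices quoted in the text, plus one made-up intermediate value (250000).
y <- c(121387, 250000, 521837, 991349, 1117408)
lambda <- 0.3            # illustrative value, not the study's lambda

z <- box_cox(y, lambda)
y_back <- inv_box_cox(z, lambda)
print(z)
```

Forecasts produced on the transformed scale have to be passed through the inverse transform before they can be read as prices.
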
The study then re-ran the test and obtained a p-value of 0.2557, so H0 was not rejected; that
is, the transformed data were homoscedastic. Stationarity was then checked with two tests:
the Dickey-Fuller test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. As the series
did not pass the Dickey-Fuller test, the study differenced it, after which it passed both the
Dickey-Fuller test and the KPSS test, as presented in Figure 3.
Figure 3: Decomposition of the time series.
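
The differencing step and the two stationarity tests can be sketched as follows. The series is a synthetic random walk with drift, and the use of the tseries package (adf.test and kpss.test) is an assumption about tooling, so those calls run only if the package is installed.

```r
# Random walk with drift: non-stationary in levels, stationary after one difference.
set.seed(7)
n <- 301
y <- ts(cumsum(rnorm(n, mean = 0.5)), start = c(1995, 1), frequency = 12)

dy <- diff(y)   # first difference; one observation is lost

# Run the Dickey-Fuller and KPSS tests only if tseries is available.
# Warnings about p-values outside the tabulated range are suppressed.
if (requireNamespace("tseries", quietly = TRUE)) {
  adf_levels <- suppressWarnings(tseries::adf.test(y))    # H0: unit root
  adf_diff   <- suppressWarnings(tseries::adf.test(dy))   # H0: unit root
  kpss_diff  <- suppressWarnings(tseries::kpss.test(dy))  # H0: stationarity
  print(c(adf_levels = adf_levels$p.value,
          adf_diff   = adf_diff$p.value,
          kpss_diff  = kpss_diff$p.value))
}
```

Note that the two tests have opposite null hypotheses: a stationary series should reject H0 in the Dickey-Fuller test and fail to reject H0 in the KPSS test.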

Thereafter, the analysis used the sample ACF and PACF to get an idea of the number of lags,
so that the study could propose an ARIMA or SARIMA fit. Theoretically, an AR(p) process will
have the first p lags of the PACF outside the confidence bands, after which the lags quickly
tend to zero. Similarly, an MA(q) process will have the first q lags of the ACF outside the
confidence bands, after which the lags quickly tend to zero. Hence it is concluded that an
ARMA(p,q) will combine both behaviours, as shown in Figure 4.

Figure 4: Differencing of the time series
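
The ACF and PACF reading described above can be illustrated on simulated series. This sketch uses base R only; the AR(1) and MA(1) processes and their coefficients are arbitrary choices for the example, and the band 1.96 / sqrt(n) is the usual approximate 95% band.

```r
set.seed(123)
n <- 300
ar1 <- arima.sim(model = list(ar = 0.7), n = n)   # AR(1): PACF should cut off after lag 1
ma1 <- arima.sim(model = list(ma = 0.7), n = n)   # MA(1): ACF should cut off after lag 1

band <- 1.96 / sqrt(n)   # approximate 95% confidence band

# Sample PACF of the AR series (lags 1 to 10).
pacf_ar <- as.numeric(pacf(ar1, lag.max = 10, plot = FALSE)$acf)

# Sample ACF of the MA series; the first element is lag 0, so it is dropped.
acf_ma <- as.numeric(acf(ma1, lag.max = 10, plot = FALSE)$acf)[-1]

# Lags that fall outside the band.
print(which(abs(pacf_ar) > band))
print(which(abs(acf_ma) > band))
```

On real data an occasional higher lag can fall outside the band by chance, so the plots suggest candidate orders rather than fix them.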

By means of auto.arima, the analysis obtained the model ARIMA(2,0,3) with non-zero mean, with
AIC = -1402.07, AICc = -1401.68 and BIC = -1376.14. Finally, the study used forecast to obtain
the prediction graph; the results are shown in Figure 5.
Figure 5: Projection of future values with confidence bands showing the full series

Figure 6: Projection of future values with confidence bands showing the series from 2016 onwards
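
A minimal sketch of fitting an ARIMA(2,0,3) with non-zero mean and forecasting 5 values ahead is given below. It uses base R's arima and predict on a simulated series rather than the study's data, and the simulation coefficients are arbitrary assumptions. The study itself used auto.arima and forecast from the forecast package, which also search over the model order and draw the forecast plot.

```r
# Simulated stationary ARMA(2,3) series with mean 10 (illustrative only).
set.seed(2020)
y <- arima.sim(model = list(ar = c(0.5, -0.3), ma = c(0.4, 0.2, 0.1)), n = 300) + 10

# Fit the order reported in the text by maximum likelihood.
# include.mean = TRUE is the default, matching the "non-zero mean" model.
fit <- arima(y, order = c(2, 0, 3), method = "ML")

# Point forecasts and approximate 95% bands for the next 5 values.
fc <- predict(fit, n.ahead = 5)
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

print(fit$aic)
print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```

If the series was Box-Cox transformed before fitting, the forecasts and bands are on the transformed scale and need to be back-transformed.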

Time series with Bayesian approach


The study then attempted to fit the ARIMA(2,0,3) model proposed in the classical approach
using JAGS and the forecast package. For this, the study had to define the model equation,
which requires two autoregressive parameters (called ρ1 and ρ2) and 3 latent variables (θ1,
θ2 and θ3) for the moving averages. Once this was in place, an auxiliary variable z was
defined to carry forward the moving averages. The results obtained are shown in Figure 7.
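
For reference, one common way of writing an ARMA(2,3) model with non-zero mean in the notation of the paragraph above is sketched below. The symbols μ (the mean) and σ² (the innovation variance) are assumptions introduced here, as is the reading of z as the innovation term; the exact parameterisation used in the study's JAGS code is not shown in the text.

```latex
\begin{aligned}
y_t - \mu &= \rho_1 (y_{t-1} - \mu) + \rho_2 (y_{t-2} - \mu)
             + z_t + \theta_1 z_{t-1} + \theta_2 z_{t-2} + \theta_3 z_{t-3}, \\
z_t &\sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In a Bayesian fit, ρ1, ρ2, θ1, θ2, θ3, μ and σ² each receive a prior distribution, and z is computed recursively from the observations and the current parameter draws.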


Figure 7: Trace and density plots of the estimated parameters using 3 chains


Figure 8: Trace and density plots of the estimated parameters using 3 chains

It is observed that both the traces and the densities converge. In addition, it can be seen
that they converge very close to zero, which could also be seen in the classical model. At the
same time, the residuals behave normally, which is also evident in Figure 9.
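
The Gelman test of convergence mentioned in the procedure can be sketched as a hand-written potential scale reduction factor (R-hat) in base R. The function name gelman_rhat is hypothetical and the draws are simulated, not the study's chains. The gelman.diag function in the coda package is a commonly used implementation and applies further corrections to this basic form.

```r
# Basic potential scale reduction factor (R-hat) for one parameter.
# `draws` is an iterations x chains matrix of post-burn-in samples.
gelman_rhat <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  W <- mean(apply(draws, 2, var))       # average within-chain variance
  B <- n * var(chain_means)             # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(99)
n_iter <- 2000
mixed <- matrix(rnorm(3 * n_iter), ncol = 3)   # three chains sampling the same target
stuck <- sweep(mixed, 2, c(0, 3, 6), "+")      # three chains centred far apart

rhat_mixed <- gelman_rhat(mixed)
rhat_stuck <- gelman_rhat(stuck)
print(c(mixed = rhat_mixed, stuck = rhat_stuck))
```

Values close to 1 indicate that the chains agree; a common rule of thumb treats values above about 1.1 as a sign that the chains have not converged.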


Figure 9: Residuals for the ARIMA (2,0,3) model using the Bayesian approach

From Figure 9, it can be seen that the residuals behave as white noise, since the ACF and PACF
remain within the bands. The tests of the assumptions were performed and the residuals passed
the test of independence (Ljung-Box). In the graph of the fitted values against the observed
ones, the study noticed that the variance is underestimated, since the fitted values are below
the observed ones. The forecast is close to the prediction presented in the classical model,
so it can be stated that the model fits well. However, there were issues from the beginning
with the variance of the series not being constant. The study therefore looked for another
model with the auto.sarima command, which returned an ARIMA(1,0,2).
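
The Ljung-Box independence check can be illustrated with Box.test from base R. The series below are simulated, and the lag of 12 is an arbitrary choice for the example. When the test is applied to the residuals of a fitted ARMA(p,q), fitdf is set to p + q so that the degrees of freedom are reduced accordingly.

```r
set.seed(11)
n <- 300
white <- rnorm(n)                                        # independent noise
ar_series <- arima.sim(model = list(ar = 0.8), n = n)    # strongly autocorrelated

# H0: no autocorrelation up to the chosen lag.
lb_white <- Box.test(white, lag = 12, type = "Ljung-Box")
lb_ar    <- Box.test(ar_series, lag = 12, type = "Ljung-Box")

# Residuals of a fitted AR(1): one estimated ARMA coefficient, so fitdf = 1.
fit <- arima(ar_series, order = c(1, 0, 0))
lb_resid <- Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 1)

print(c(white = lb_white$p.value, ar = lb_ar$p.value, resid = lb_resid$p.value))
```

A large p-value on the residuals means the test finds no evidence against independence, which is the outcome reported for the model above.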

The study attempted to fit different models with the ggplot2 package. Most of them did not
pass the constant variance test, so the study kept the ones with the highest p-value for this
test and with no correlation between the parameters. The models kept were ARIMA(1,0,2),
ARIMA(2,0,3), ARIMA(4,0,2), ARIMA(2,0,1) and ARIMA(3,0,3). Finally, the prediction of the
model was obtained; the results are shown in Figure 10.

Figure 10: Model prediction (the best model obtained was the ARIMA(1,0,2))

Comparing these models using the results in the figures above, it can be concluded that the
one with the best fit was the ARIMA(1,0,2), which was in fact the one suggested by the
auto.sarima command.
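
As a rough illustration of comparing the candidate orders listed above, the sketch below fits each one to a simulated series with base R's arima and ranks them by AIC. AIC is swapped in here as a simple classical criterion; the study compared its Bayesian fits through the constant variance test and the parameter correlations, which this example does not reproduce. The helper fit_aic is hypothetical.

```r
# Simulated ARMA(1,2) series (illustrative only, not the study's data).
set.seed(5)
y <- arima.sim(model = list(ar = 0.6, ma = c(0.3, 0.2)), n = 300)

# Candidate (p, d, q) orders listed in the text.
candidates <- list(c(1, 0, 2), c(2, 0, 3), c(4, 0, 2), c(2, 0, 1), c(3, 0, 3))

# Fit one candidate and return its AIC, or NA if the fit fails.
fit_aic <- function(ord) {
  tryCatch(
    suppressWarnings(arima(y, order = ord, method = "ML")$aic),
    error = function(e) NA_real_
  )
}

aic_table <- data.frame(
  model = vapply(candidates,
                 function(o) sprintf("ARIMA(%d,%d,%d)", o[1], o[2], o[3]),
                 character(1)),
  aic = vapply(candidates, fit_aic, numeric(1))
)

# Lowest AIC first; failed fits (NA) are placed last.
aic_table <- aic_table[order(aic_table$aic), ]
print(aic_table)
```

Larger models are penalised by AIC for their extra parameters, so a smaller order can rank first even when a larger one fits slightly more closely.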

Conclusions
In the end there was no single best model for the time series fit. The study attempted to
work with the same model obtained in the classical approach for the Bayesian part, but for
these methods it was not the best option. This shows the importance of having different ways
of approaching the modelling and how difficult it can be to reach a good fit, especially in
the Bayesian approach, which requires more computational work. This may not have been
significant with the study data, but with databases containing millions of observations it can
complicate trying different models. The R scripts used to obtain the study results are
annexed.

References
Antonakakis, N., 2018. Rethinking London's 'ripple effect' on house prices: other UK regions
transmit shocks too. British Politics and Policy at LSE.

Bauer, A., Züfle, M., Herbst, N. and Kounev, S., 2019, June. Best practices for time series
forecasting (tutorial). In 2019 IEEE 4th International Workshops on Foundations and
Applications of Self* Systems (FAS*W) (pp. 255-256). IEEE.

Đalić, I. and Terzić, S., 2021. Violation of the assumption of homoscedasticity and detection
of heteroscedasticity. Decision Making: Applications in Management and
Engineering, 4(1), pp.1-18.

Database obtained from https://www.kaggle.com/justinas/housing-in-london

Fedorová, D., 2016. Selection of unit root test on the basis of length of the time series and
value of AR(1) parameter. Statistika, 96(3), p.3.

Laptev, N., Yosinski, J., Li, L.E. and Smyl, S., 2017, August. Time-series extreme event
forecasting with neural networks at Uber. In International Conference on Machine
Learning (Vol. 34, pp. 1-5).

Woodward, W.A., Gray, H.L. and Elliott, A.C., 2017. Applied time series analysis with R.
CRC Press.

Xiao, Q., Chaoqin, C. and Li, Z., 2017. Time series prediction using dynamic Bayesian
network. Optik, 135, pp.98-103.
Appendix
R scripts
