Linear, Machine Learning and Probabilistic Approaches for Time Series Analysis
1. REFERENCE
B. M. Pavlyshenko, "Linear, machine learning and probabilistic approaches for time series
analysis," 2016 IEEE First International Conference on Data Stream Mining & Processing
(DSMP), Lviv, 2016, pp. 377-381, doi: 10.1109/DSMP.2016.7583582.
2. ABSTRACT
In this paper we study different approaches for time series modelling. The forecasting
approaches using linear models, ARIMA algorithm, XGBoost machine learning algorithm are
described. Results of different model combinations are shown. For probabilistic modelling
the approaches using copulas and Bayesian inference are considered.
3. SUMMARY
This paper intends to analyse different approaches for time series predictive
analysis.
The author uses the sales time series of Rossmann stores for the analysis.
The goal is to forecast not only the most probable sales values but also their
distribution, which is required for assessing the different risks in sales dynamics.
To analyse different forecasting approaches, the author used two months of the
historical data as validation data for accuracy scoring using root mean squared error
(RMSE).
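The hold-out scoring described above can be sketched as follows. This is a minimal illustration, not the author's code: the sales numbers are made up, 60 days stands in for "two months", and the seasonal-naive baseline is a hypothetical placeholder for whichever model is being scored.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def split_holdout(series, n_validation):
    """Keep the last n_validation points for validation, the rest for training."""
    return series[:-n_validation], series[-n_validation:]

# Hypothetical daily sales with a purely weekly pattern (not data from the paper).
daily_sales = [100.0 + (day % 7) * 5.0 for day in range(200)]
train, validation = split_holdout(daily_sales, 60)  # 60 days, roughly two months

# Placeholder model: repeat the value from 7 days earlier (seasonal-naive forecast).
history = list(train)
forecast = []
for _ in validation:
    forecast.append(history[-7])
    history.append(forecast[-1])

score = rmse(validation, forecast)
print(f"validation RMSE: {score:.3f}")
```

Because the toy series is perfectly periodic, the baseline scores an RMSE of zero here; on real sales data the same function would be applied to each candidate model's forecasts over the held-out period.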
ARIMA, linear regression with LASSO regularization, and XGBoost (extreme gradient
boosting) models are compared.
The author also shows that stacking, with ARIMA as the first step and XGBoost as
the second step, improves the accuracy of time series forecasting.
Two kinds of approach are distinguished: the first treats the data as a time series,
and the second treats the observations as independent and identically distributed
variables.
A copula is a multivariate probability distribution in which the marginal probability
distribution of each variable is uniform.
Copulas describe the dependence between random variables.
Sklar's theorem states that any multivariate joint distribution can be written in
terms of its univariate marginal distribution functions and a copula that describes
the dependence structure between the variables.
The copula comprises all information on the dependence structure between the
variables, whereas the marginal cumulative distribution functions include all
information related to the marginal distributions.
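In symbols, the decomposition can be written as below. The notation is introduced here for illustration and is not taken from the paper: F is the joint cumulative distribution function of X_1, ..., X_d, F_i are the marginal cumulative distribution functions, and C is the copula. The second line assumes that all the densities exist.

```latex
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr)

f(x_1, \dots, x_d) = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \prod_{i=1}^{d} f_i(x_i)
```

Here c is the copula density and f_i are the marginal densities, which makes the separation between dependence structure and marginal behaviour explicit.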
Multivariate dependencies among more than two variables can be analysed using vine
copulas, which allow a complex multivariate copula to be constructed from bivariate ones.
The copula can be used to model stochastic dependencies between various factors
of sales time series separately from their marginal distributions.
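To illustrate how a copula separates dependence from the marginals, here is a minimal sketch that samples from a bivariate Gaussian copula and then attaches two arbitrary marginals. The choice of a Gaussian copula, the correlation 0.8, and the exponential and normal marginals are all assumptions made for the example; the summary does not say which copula families the paper fits.

```python
import math
import random
from statistics import NormalDist

def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs (u, v) with uniform marginals and Gaussian-copula dependence."""
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Mix in an independent draw so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # The standard normal CDF maps each coordinate to a uniform on [0, 1].
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs

def clamp_open_unit(u, eps=1e-12):
    """Keep u strictly inside (0, 1) so inverse CDFs stay finite."""
    return min(max(u, eps), 1.0 - eps)

def exponential_inverse_cdf(u, rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate

rng = random.Random(0)
uniform_pairs = sample_gaussian_copula(rho=0.8, n=5000, rng=rng)

# Attach hypothetical marginals: exponential for x, normal for y.
y_marginal = NormalDist(mu=50.0, sigma=10.0)
samples = [
    (exponential_inverse_cdf(clamp_open_unit(u), rate=0.01),
     y_marginal.inv_cdf(clamp_open_unit(v)))
    for u, v in uniform_pairs
]
```

Swapping either marginal for a different distribution changes the marginal behaviour of `samples` but leaves the dependence structure in `uniform_pairs` untouched.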
For the experiments with time series analysis using Bayesian inference, the author
uses a Markov Chain Monte Carlo (MCMC) algorithm.
The plots show a stationary process, which indicates good convergence and an
adequate burn-in period for the MCMC algorithm.
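The role of the burn-in period can be seen in a small random-walk Metropolis sampler, one of the simplest MCMC algorithms. This is a sketch under stated assumptions, not the sampler or model used in the paper: the data are simulated log-sales, the noise scale `sigma` is treated as known, the prior on the mean is flat, and the step size and burn-in length are hand-picked.

```python
import math
import random

def log_posterior(mu, data, sigma):
    """Log posterior of mu up to a constant: Gaussian likelihood, flat prior."""
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)

def metropolis(data, sigma, n_steps, step_size, start, rng):
    """Random-walk Metropolis chain for the mean of a Gaussian."""
    chain = [start]
    current_lp = log_posterior(start, data, sigma)
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            chain.append(proposal)
            current_lp = proposal_lp
        else:
            chain.append(chain[-1])
    return chain

rng = random.Random(42)
sigma = 0.3  # assumed known noise scale
# Simulated log-sales (hypothetical, not data from the paper).
data = [rng.gauss(8.5, sigma) for _ in range(200)]

# Start far from the data on purpose so the early samples are unrepresentative.
chain = metropolis(data, sigma, n_steps=5000, step_size=0.05, start=0.0, rng=rng)

burn_in = 1000
kept = chain[burn_in:]  # discard the transient before summarising
posterior_mean = sum(kept) / len(kept)
sample_mean = sum(data) / len(data)
print(f"posterior mean {posterior_mean:.3f}, sample mean {sample_mean:.3f}")
```

A chain that has converged fluctuates around a fixed level after the burn-in samples are dropped, which is the stationary behaviour the plots are checked for.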
The author models mean sales with a Gaussian distribution, sales with a Student's
t-distribution, and promo with a Bernoulli distribution, on the natural logarithmic
scale.
The Bayesian approach models stochastic dependencies between various factors of
sales time series and obtains the distributions of the model parameters. This approach
can be useful for estimating multiple risks related to sales dynamics.
4. NOTES
A case study for different approaches for time series modelling has been conducted.
Results are presented as plots.
For the time series linear regression case, Bayesian inference was applied.
For probabilistic modelling, copulas were used.
The probabilistic approach is used for risk assessment problems.
For machine learning modelling, the XGBoost model was used.
5. DEFICITS
The paper discusses the modelling of time series using different approaches, but does
not give a conclusion or a comparison for each of the experiments conducted.