
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3224938

Profit Prediction Using ARIMA, SARIMA and LSTM Models in Time Series Forecasting: A Comparison

UPPALA MEENA SIRISHA1, MANJULA C BELAVAGI2, GIRIJA ATTIGERI3
1,2,3 Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India (e-mail: [email protected])
Corresponding author: Manjula C Belavagi (e-mail: [email protected]).

ABSTRACT Time series forecasting using historical data is significantly important nowadays. Many fields such as finance, industry, healthcare, and meteorology use it. Profit analysis using financial data is crucial for any online or offline business or company. It helps in understanding the sales, the profits and losses made, and in predicting values for the future. For this analysis, the statistical methods Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA), and the deep learning method Long Short-Term Memory (LSTM) neural network, have been chosen for time series forecasting. The dataset has been converted into a stationary one for ARIMA, but not for SARIMA and LSTM. The fitted models have been built and used to predict profit on test data. After obtaining good accuracies of approximately 93.84% (ARIMA), 94.378% (SARIMA) and 97.01% (LSTM), forecasts for the next 5 years have been made. Results show that LSTM surpasses both statistical models in constructing the best model.

INDEX TERMS Statistical methods, Time Series Forecasting, Deep Learning, Profit Prediction, ARIMA, SARIMA, LSTM

I. INTRODUCTION
A time series is considered as a group of data points enumerated in time sequence [1]. Time series data is a group of quantities which are assembled over uniform intervals in time and ordered in a chronological fashion [2] [3].
Auto Regressive Integrated Moving Average (ARIMA) [4] explains the time series under consideration on the basis of its previous values, that is, its lags and the lagged prediction errors. It is useful for forecasting a non-stationary time series that exhibits patterns and is not irregular white noise. The three characteristic terms of the ARIMA model are the parameters (p, d, q), which are the orders of the AR term, the differencing needed to change the time series into a stationary one, and the MA term, respectively. The term AR in ARIMA signifies that it is a linear regression model that makes use of its own lags in order to predict. Linear regression models give the finest results when there is no correlation between the predictors, i.e., they are not dependent on each other. A time series whose properties do not change over time is called stationary; for example, temperatures of a specific month plotted over the years. Temperatures of all the months plotted for a year are non-stationary, as temperatures show variation with respect to the season. For building a prediction model we need a stationary time series. To eliminate non-stationarity from a series, differencing is commonly done. Sometimes, if the time series is more complex, more than one differencing operation may be necessary. Hence, the value "d" is the minimum number of differencing operations required to turn the time series into a stationary one. The d value would be 0 if the series is already stationary. If a time series is univariate and contains trend and/or seasonal components, then the Seasonal ARIMA (SARIMA) model is used. If an external predictor, known as an 'exogenous variable', is added to the SARIMA model, it is known as the SARIMAX model [5]. In order to use an exogenous variable, the requirement is to know the variable's value during the forecast period as well.
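The choice of d described above can be automated with a short stationarity check. The following is an illustrative sketch, not code from the paper; the use of statsmodels' ADF test and the helper name choose_d are assumptions.

import pandas as pd
from statsmodels.tsa.stattools import adfuller

def choose_d(series: pd.Series, max_d: int = 3, alpha: float = 0.05) -> int:
    """Return the smallest order of differencing d for which the ADF test
    rejects its unit-root null hypothesis (i.e. the series looks stationary)."""
    diffed = series.dropna()
    for d in range(max_d + 1):
        p_value = adfuller(diffed)[1]
        if p_value <= alpha:              # unit-root null rejected -> stationary
            return d
        diffed = diffed.diff().dropna()   # difference once more and retest
    return max_d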


Since a time series has sequence dependence among the input variables, a good way of analysing it is to use Neural Networks (NNs) that can handle these dependent properties. A Recurrent NN (RNN) would be a natural choice for this. The Long Short-Term Memory (LSTM) network is one kind of RNN used in deep learning, as huge datasets can be trained to obtain high accuracies. This model has a learning mechanism to memorize and understand the mapping from input variables to output variables; it figures out what context derived from the input data is helpful for the mapping, and can dynamically alter that context as necessary.
The gross profit obtained will be predicted using ARIMA, SARIMA and LSTM in time series forecasting, and a comparative study of the outcomes of these models is performed. These methods help in understanding the underlying context of the data points and thereby make predictions about their future values [6] [7]. The paper focuses on the following:
• To perform data collection and explore the intrinsic structure of the series
• To analyze the dataset and extract required variables
• To develop models for profit prediction
• To perform comparative analysis of ARIMA, SARIMA and LSTM models
• To forecast the next 5 years using the models
The paper is organized as follows: In Section II, literature related to time series analysis and deep learning models is discussed. In Section III, model building using ARIMA, SARIMA and LSTM is discussed. Subsequently, in Section IV, result analysis is performed. Finally, Section V concludes by highlighting the work carried out.

II. LITERATURE REVIEW
In this section, papers related to ARIMA, SARIMA and LSTM models are discussed. In [2], auto-correlation functions (ACF) and cross-correlation functions (CCF) are used to show the relationship between lags which occur between time series. The authors mention ordinary, weighted and classical correlated least squares regression techniques. Mishra et al. [8] presented a literature review regarding the usefulness of data science and stochastic approaches for time series forecasting. They mentioned how various researchers used ARIMA, SARIMA, vector ARIMA for variable time series, and ARIMAX models to analyse rainfall patterns. They also mentioned the use of neural networks and hybrid methods for weather forecasting. Luo et al. [9] discussed the identification of correlations between time series and event traces. The ARIMA model is also used in road safety research [10]. For this, the authors integrated moving average with explanatory variables (ARIMAX). Sangare et al. [11] used analytical measures and hybrid machine learning to predict road-traffic accidents. Almeida et al. [12] used the SARIMA model to understand traffic flow characteristics. Artificial neural network algorithms have also been proposed in [13] [14] for the forecasting approach. The most commonly used algorithms are the Feed-Forward Neural Network (FFNN), the Long Short-Term Memory (LSTM), the Convolutional Neural Network (CNN) and a hybrid LSTM-CNN.
Ruchir et al. [15] and [16] performed stock market prediction using 10 years of Bombay Stock Exchange data. They used ARIMA, Simple Moving Average (SMA) and Holt-Winters models. The parameters considered for the evaluation are RMSE, Mean Absolute (MA) Error and MA Presentation Error. They concluded that SMA shows the best performance, whereas the ARIMA model's performance was poor.
Adhistya Erna Permanasari et al. [17] analyzed and implemented SARIMA on time series to predict malaria occurrences in the United States of America, based on monthly data. Disease forecasting is important to help stakeholders make better policies.
Owing to the increasing market and importance of green energy, Mohammed H. Alsharif et al. [18] used ARIMA and SARIMA to predict daily and month-wise mean solar radiation in the city of Seoul, respectively. This study was carried out to help the government make changes in policies for advancement in the field of renewable energy.
Peng Chen et al. [19] used the ARIMA model for predicting property-related crimes, which include robbery, theft, and burglary, in a city in China. A period of fifty weeks of recorded property crime was selected as the dataset. The model was trained and the predicted outcomes were analysed and compared with Single Exponential Smoothing (SES) and Hyperbolic Exponential Smoothing (HES). It was found that SES and HES gave lower accuracy than the ARIMA model. Dattatray et al. [20] conducted a survey on stock market prediction techniques based on year of publication, methodology, datasets used and performance metrics. They concluded that NN-based and fuzzy-based techniques can be effectively used for stock market prediction. Omer Berat Sezer et al. [21] conducted a systematic review of deep learning for financial time series prediction. Various types of DL models, which include DNN, RNN, DBN and CNN, have been used for predicting the prices of products. They observed that CNN works better for classification when compared with deep learning models that are dependent on RNN and is suitable for static data representation. They further observed that LSTM was the best method for financial time series forecasting problems.
Siami et al. [22] investigated ARIMA and LSTM in calculating forecasts for financial time series data and compared their error percentages. They split their datasets into train (70%) and test (30%) data and observed that the prediction was improved by 85% on average using the LSTM algorithm, indicating that LSTM performed better compared to ARIMA.
Ghassen Chniti et al. [23] used LSTM and SVR models for forecasting mobile phone prices in the European market. A comparison of the mentioned models has been done on uni-

and multivariate data and it has been found out that SVR
worked better on univariate data while LSTM performed
better on multivariate data, producing RMSE values of 35.43 and
24.718 respectively.
Srihari et al. [24] performed a comparative analysis of forecasting algorithms, namely ARIMA, MVFTS, CNN, LSTM and CBLSTM. They tested the performance of these algorithms on time series data from various domains. They concluded that the performance of weighted MVFTS, ARIMA, CNN (Convolutional Neural Network) and CBLSTM was good for data spanning periods of more than a couple of years.
Neural Network based method for stock market is pro-
posed by Pang et al [25], [26] and Jiang et al [27]. Ma-
chine learning based stock market prediction is carried out
in [16]. In [25] authors used advanced LSTM to perform
real time data analysis on Internet data. They concluded that
performance of the model was satisfactory for real time data.
However, the model performance is poor on historical data due to limited use of text information. Comparative analysis of ARIMA and NN models for stock market prediction is carried out in [26]. They analysed the results based on forecast error; based on this parameter, both models worked well. They found that the performance of the ARIMA model is better with respect to forecast accuracy. Financial time series forecasting is carried out by Sezer et al. [28]. They used image data and extracted the technical indicators which were necessary for processing.

III. METHODOLOGY
The Sales dataset [29] has been obtained from the downloads section of the eforexcel.com website, and consists of around 1 million sales records, ranging over a period of 46 years (1972–2017). The dataset comprises multiple variables: the item type, order date, shipment date, order ID, order priority (high, medium, low, cancelled), the region and the country where the orders belong, the sales channel (online or offline), the unit price and the cost of each item type, the number of units sold per item type, the total revenue, the total cost, and the gross profit made after taxes.

FIGURE 1: State Machine Diagram of General Flow of Time Series Prediction

Figure 1 shows the different steps involved in the process of time series forecasting. The first step is to collect the data over a period of time. The data collected may contain erroneous, incomplete or repeated records. Hence, in the next step, data preprocessing is carried out by handling missing values. Once the data is ready, exploratory data analysis is carried out to gain a better understanding of the data. Subsequently, ARIMA, SARIMA and LSTM models are built for profit prediction. The models built are then evaluated and visualized.

Data Preprocessing
The preprocessed dataset is created by handling missing values and grouping the data. The same occurrences of the order date have been grouped together from the branches of the company in different regions and countries of the world, and the profits on these order dates have been added together using the sum aggregate function. The necessary fields required for the time series analysis are also extracted and represented in a specific format. The attribute "OrderDate" is converted to a datetime object, and the year, month and day are extracted to perform exploratory analysis.
Year-wise profit is analyzed using a scatter plot: the order dates belonging to the same year have been grouped and the mean of the profit of all the orders of the respective years is computed. A bar graph is also used to analyze the year-wise mean profit. Similarly, for the next graph, the order dates belonging to the same month have been grouped and the mean of the profit of all the orders of the respective months has been taken and plotted. A bar graph is plotted to analyze the monthly mean profit.
The scatter plot between the Year of order on the x-axis and the Profit on the y-axis in Figure 2 indicates that there has been a steady increase in profit from 1972 to 2000. The profit remained almost the same till 2005, then there is a sudden fall from 2005 to 2010. After that, there has been a gradual increase from 2010 to 2017.
The bar graph shown in Figure 2 records the mean profit of each of the years. It is seen that there have been considerable dips in the profit in 1975, 1980, 1983, 1992, 1999, and 2009. The general trend of the dataset has been noted to increase gradually, reaching a maximum between 2000 and 2005; thereafter, a fall in profit is noticed till 2009, after which the increasing trend continued. The bar graph in Figure 3 shows the mean profit with respect to order month. The order dates belonging to the same year are grouped month-wise. The mean profit of all the orders of each month is analyzed as shown in Figure 4.
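The grouping and date handling described above can be sketched with pandas as follows. This is an illustrative outline rather than the authors' code; the file and column names (sales_records.csv, OrderDate, GrossProfit) are assumptions.

import pandas as pd

# Load the raw sales records (file name assumed for illustration).
sales = pd.read_csv("sales_records.csv")

# Parse the order date and extract calendar fields for exploratory analysis.
sales["OrderDate"] = pd.to_datetime(sales["OrderDate"])
sales["Year"] = sales["OrderDate"].dt.year
sales["Month"] = sales["OrderDate"].dt.month

# Group all orders that share an order date (across regions/countries)
# and sum their profit, giving one profit value per date.
profit = (sales.dropna(subset=["GrossProfit"])
               .groupby("OrderDate")["GrossProfit"].sum()
               .sort_index())

# Mean profit per year and per month, as used for the bar graphs.
yearly_mean = sales.groupby("Year")["GrossProfit"].mean()
monthly_mean = sales.groupby("Month")["GrossProfit"].mean()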

FIGURE 2: Mean Yearly Profit vs Order Year

FIGURE 3: Mean Monthly Profit vs Order Month

FIGURE 5: Flow Chart of ARIMA Model

A. MODEL BUILDING USING ARIMA
In line with the process described above, the ARIMA model is built. Figure 5 shows the flow of the ARIMA model. The series is checked for stationarity and, if it is not stationary, several transformations are applied to make it so. Subsequently, Auto Correlation Function (ACF) and Partial ACF (PACF) graphs are plotted and the values for the terms p, d and q (the model parameters) are obtained. The ACF fetches the autocorrelation values of a series with its lagged values. These values are graphed with a confidence band to obtain the ACF plot, which tells the strength of the relationship between the current value of the series and its previous values. The ACF finds the correlation depending on all four components of a time series, namely trend, seasonality, cyclic and residual. The PACF fetches the correlation of the residuals with the next lagged value, instead of finding present lagged values like the ACF. Further, the model fit is performed in three stages: building the AR model, the MA model, and lastly combining them to obtain ARIMA. The model is used to make predictions on validation data. After this, the error and accuracy of the models are checked and evaluated.

Training and Validation - ARIMA Model
The dataframe has been split into training and validation datasets in the ratio of 4:1 (80% train and 20% validation). The model was built on train data and the validation data was used in prediction to check for accuracy. Window functions have been used to perform statistical operations on data subsets. Over every row in the DataFrame, a new value can be calculated with rolling functions. A window consists of a subset of rows from the dataset, and a desired calculation can be performed on these rows; the size of the window can be specified. The window rolling mean, or moving average, calculation gives an updated average value for each row in the specified window. Similarly, the window rolling standard deviation is used. The window has been chosen to be 24. The window rolling mean and window rolling standard deviation calculated with a window of 24 are plotted.
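A minimal sketch of the rolling-window statistics described above, assuming the aggregated profit series from the earlier preprocessing sketch:

import matplotlib.pyplot as plt

# 24-point rolling statistics over the profit series (window size follows the paper).
rolling_mean = profit.rolling(window=24).mean()
rolling_std = profit.rolling(window=24).std()

ax = profit.plot(label="Gross Profit")
rolling_mean.plot(ax=ax, label="Rolling mean (24)")
rolling_std.plot(ax=ax, label="Rolling std (24)")
ax.legend()
plt.show()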
FIGURE 4: Profit (month-wise) vs (Order Year, Order Month) plot

Tests for Stationarity - ADF and KPSS Tests
Stationarity of the time series is analyzed before applying the statistical forecasting methods. If a time series is stationary, its statistical characteristics (for example, mean, variance and standard deviation) remain constant over time; hence, there is no visible trend or seasonality. A non-stationary time series is quite the opposite: these properties are time dependent.
The ADF test is a classic stochastic test for determining whether the time series being used is stationary or not.

According to mathematics, unit roots cause non-stationarity in a time series, and this test determines the presence or absence of a unit root. The ADF test uses two hypotheses, namely the null and the alternate. The first one assumes the existence of a unit root, which implies non-stationarity of the time series. The second one assumes the non-existence of a unit root, which implies stationarity of the time series. Mathematically, the ADF test states the null hypothesis (H0) as α = 1 in the equation below, i.e. a unit root exists:

y_t = c + βt + α y_{t−1} + φ Δy_{t−1} + e_t    (1)

where y_{t−1} is the first lag of the time series and Δy_{t−1} is the first difference of the series at time t − 1.
From this test, the ADF test statistic, the p-value, the number of lags used, the number of observations used in the ADF regression, and the critical values are obtained. If the p-value is less than or equal to the defined significance level of 5% (0.05), it is concluded that the null hypothesis is rejected and stationarity of the series is established. On the other hand, if the p-value is higher than the defined significance level and the ADF test statistic is higher than the critical values, there is weak evidence against the null hypothesis and the series is concluded to be non-stationary. The p-value obtained from performing ADF for the first time on the preprocessed data is approximately 0.338. Hence the null hypothesis cannot be rejected: the data has a unit root and is considered to be non-stationary.
The KPSS test is also performed in order to find out whether the time series is stationary around a mean or a linear trend, or is non-stationary because of the presence of a unit root. This test differs from the ADF test in that its null hypothesis is exactly opposite to that of the ADF: its null and alternate hypotheses assume that the time series under consideration is stationary and non-stationary, respectively. The results of the test contain the KPSS test statistic, the p-value, the number of lags used, and the critical values. The p-value here is the probability score that leads to rejecting the null hypothesis if it is less than 0.05, making the series non-stationary, and vice versa. To reject the null hypothesis, the test statistic should also be higher than the critical values. The p-value obtained from performing KPSS for the first time on the preprocessed data is 0.010. Hence the data is non-stationary.
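Both tests are available in statsmodels; a small helper such as the following (an illustrative sketch, not the paper's code) reports them for any series:

from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_report(series):
    """Print ADF and KPSS test results for a pandas Series.
    ADF null hypothesis: a unit root is present (non-stationary).
    KPSS null hypothesis: the series is (trend-)stationary."""
    adf_stat, adf_p, *_ = adfuller(series.dropna())
    kpss_stat, kpss_p, *_ = kpss(series.dropna(), nlags="auto")
    print(f"ADF statistic = {adf_stat:.4f}, p-value = {adf_p:.4f}")
    print(f"KPSS statistic = {kpss_stat:.4f}, p-value = {kpss_p:.4f}")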
Transformations for achieving stationarity
The ARIMA model requires stationarity. As the time series in consideration is non-stationary, differencing has to be applied to reduce trend and seasonality. The transformations are done as follows.
There is a general increasing trend in the series, except between 2005 and 2010, so a transformation that penalizes higher values more than smaller ones has been chosen. The logarithmic transformation has been performed to reduce the trend, i.e., by taking the natural logarithm of the dependent variable, namely Gross Profit (in thousands), from the train data.
After this, the moving average has been subtracted from the log-transformed series; this subtraction is a form of differencing. Since the average of 24 values has been taken by specifying a window of 24, the rolling mean is not defined for the first 23 values; therefore, these 23 null values have been dropped. Subsequently, after removing the moving average, it has been observed that the rolling mean and standard deviation are approximately horizontal. This has been done to remove the remaining trend and obtain a stationary series.
A shift transformation is also carried out, where the previous value is subtracted from the current value. This also helps ensure stationarity. Thus two different transformations have been tried out, namely log and time shift. For the sake of simplicity, only the log scale is used, because reverting back to the original scale during forecasting is easier.
The residual is the variability left in the series after eliminating the trend and seasonality, and it cannot be explained by the model. The residual is used to build the ARIMA model, so its stationarity has been ensured.
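The log, rolling-mean-removal and shift transformations described above might look as follows in pandas/NumPy (an illustrative sketch; profit reuses the aggregated series from the earlier sketches):

import numpy as np

# Log transform to damp the growing level of the series.
log_profit = np.log(profit)

# Remove the remaining trend by subtracting a 24-point rolling mean,
# dropping the first 23 rows where the rolling mean is undefined.
detrended = (log_profit - log_profit.rolling(window=24).mean()).dropna()

# Alternative shift (first-difference) transformation.
shifted = (log_profit - log_profit.shift(1)).dropna()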
AR Model
This model declares that the output variable depends linearly on its own previous values. The order for this model has been taken as (2,1,0), considering q=0 as it is purely AR. According to the AR terms estimated from the PACF plot, the value of p was expected to be 1 (RMSE of ARIMA = 13.0421), but this resulted in a greater RMSE for the combined ARIMA model than when p is 2 (RMSE of ARIMA = 12.5764). Using the ARIMA.fit() function from the statsmodels.tsa.arima model, the AR model has been fitted by maximum likelihood, i.e. the model is built using the transformed train data.
The ARIMA.predict() function is used, which takes the fitted results of the AR model and the start and end parameters as the datetimes of the beginning and ending of the validation data, together with the gross profits from the validation dataset. The gross profit of the validation data and the predicted gross profit values vs order year have been plotted, and the accuracy metrics have been displayed after scaling back.
Now, the model has to be scaled back to its original scale. To deal with the rolling mean transformation done earlier, a cumulative sum has been performed on the predicted data using the cumsum() function. To counter the effect of the log transformation, log scaling and exponentiation have been performed using numpy.ones()*numpy.log() (for the given indexes) and numpy.exp(), respectively. To nullify the effect of differencing, the numpy.add() function has been used. The AR prediction graph and the accuracy metrics have been displayed.
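A rough sketch of fitting the AR-only model and undoing the transformations is shown below. It only illustrates the idea described above; the variable names (detrended_train, valid, train_profit) and the exact back-transformation steps are assumptions, not the authors' code.

from statsmodels.tsa.arima.model import ARIMA
import numpy as np

# Fit the AR-only model (order (2, 1, 0)) on the transformed training series.
ar_fit = ARIMA(detrended_train, order=(2, 1, 0)).fit()

# Predict over the validation period (datetime bounds assumed).
pred = ar_fit.predict(start=valid.index[0], end=valid.index[-1])

# Undo the transformations: cumulative-sum the differenced predictions,
# add back the log level of the last training point, then exponentiate.
pred_log = np.log(train_profit.iloc[-1]) + pred.cumsum()
pred_profit = np.exp(pred_log)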

FIGURE 6: Flow Chart of SARIMA Model

FIGURE 7: Swimlane Diagram of LSTM Model

MA Model
This model declares that the output variable depends linearly on the present and several previous values of a stochastic (imperfectly predictable) term. The order for this model has been taken as (0,1,2), considering p=0 as it is purely MA. According to the MA terms estimated from the ACF plot, the value of q is taken as 2. The model is built using the transformed train data.

Combined model – ARIMA Model
The order for this model has been taken as (2,1,2), considering p=2, d=1 and q=2 as per the insights gathered from the AR and MA models. The model has been built using the ARIMA.fit() function. Now, the model has to be scaled back to its original scale, similar to the AR model.
B. MODEL BUILDING USING SARIMA
SARIMA is an improved version of the ARIMA model which incorporates seasonal effects as well. The flow of the SARIMA model is shown in Figure 6. The series is checked for non-stationarity, as SARIMA works on such data. This model takes two kinds of orders, namely the order (p,d,q) and the seasonal order (P,D,Q,s). Similar to ARIMA, the order of this model consists of the number of AR parameters, the order of differencing, and the number of MA parameters as the p, d, q terms. The seasonal order consists of the seasonal elements of the model: the AR units, differences, MA units, and periodicity as the P, D, Q, s terms. D here has to be an integer that indicates the order of integration of the process. P and Q should be integer values that indicate the orders of the AR and MA units, which include all the lags up to that point, or they can be iterables that give the specific AR/MA lags to be included. The data splitting and other steps remain the same as for the ARIMA model.
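A seasonal model of this form can be fitted with statsmodels' SARIMAX class. The sketch below is illustrative; the orders (2,1,4) and (0,1,1,7) are those reported later in the results section, and the series names are assumptions.

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Fit SARIMA on the training series (order values taken from the results section).
sarima_fit = SARIMAX(train_profit,
                     order=(2, 1, 4),
                     seasonal_order=(0, 1, 1, 7)).fit(disp=False)

# Predictions over the validation period.
sarima_pred = sarima_fit.predict(start=valid.index[0], end=valid.index[-1])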
C. MODEL BUILDING USING LSTM
Figure 7 shows the activity swimlane diagram of the LSTM model, having three lanes. The first lane depicts taking the data as input, followed by data cleaning, feature extraction, exploratory data analysis, and MinMax scaling to fit the values in the range (0,1). The second lane depicts the implementation of the LSTM model by defining the model, which consists of one LSTM layer and one Dense layer. The third lane depicts predictions on validation data using the fitted model, evaluation, and subsequently forecasting for the next 5 years.

Data Preprocessing for LSTM
LSTM requires additional data preprocessing compared to the stochastic models. It has been incorporated as follows.

Splicing of Data
The dataset being considered here has only the Order Date and Gross Profit in Thousands columns. The total number of rows in the dataset is 548. Out of these 548 rows, training and testing sets are created; both consist of the two columns, order date and gross profit in thousands. These parameters are normalized using the MinMax scaler transform. Each feature is transformed individually so that every value lies within the range of the training dataset. This scaler is used in place of the mean and variance stabilization transformations. The scaler is fitted on the training set so that the model is able to adapt to unknown data, and the transformation is then applied to the training and testing data to obtain values in the specified range.
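A minimal sketch of the scaling step using scikit-learn's MinMaxScaler (the train/test split and the column name are assumptions):

from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on the training portion only, then apply it to both splits.
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train[["GrossProfit"]])
test_scaled = scaler.transform(test[["GrossProfit"]])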
For the time series LSTM model to be used on a dataset, the data has to be reorganized into sample structures containing both input and output constituents prior to fitting the data into the LSTM model. It is challenging to finish all of these tasks properly by hand. The TimeseriesGenerator() class embeds the dataset being used into an object of the TimeseriesGenerator class. This object can then be fed straight into the neural network as the data to be worked on. The TimeseriesGenerator function takes several parameters. The first two parameters are the input and output datasets. The length parameter gives the length of the sample that is to be fed into the NN to fit the model. The sampling rate is the time period that occurs between the two outputs that the model predicts with the given input values.

Since the length parameter is 12, the generator takes in the previous twelve months' values to predict the next one month's profit, and the batch size is 1. The values generated by this function are stored in an object named generator, having two columns which consist of an array of the lags and an array of the predicted value.
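A hedged sketch of building such a generator with Keras (the scaled training array from the previous sketch is assumed):

from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

# Each sample uses the previous 12 scaled values to predict the next one;
# batch_size=1 matches the training setup described in the paper.
generator = TimeseriesGenerator(train_scaled, train_scaled,
                                length=12, batch_size=1)

x0, y0 = generator[0]   # x0 has shape (1, 12, 1); y0 is the 13th value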
Training and Testing of LSTM
This includes defining and fitting the model, making predictions on the test data, and finally producing forecasts.
LSTM is specifically designed to remove the long-term dependency problem, which means it finds it easy to remember information given to it for a very long period of time. The Sequential model helps in easily stacking layers of the NN on top of each other and does not depend on the exact shape of the tensors or the layers in each model. Next, the Sequential constructor is created for this model. The model consists of one visible input layer, a hidden layer consisting of 100 LSTM neurons, and the output layer which is used to predict the future profits. A batch size of 1 and 20 epochs are used in the training of this model. Verbose is set to 1, which means that the progress of every training epoch is shown with an animated progress bar. The LSTM neurons require a sigmoid activation function. Here the batch size means the number of samples from the training dataset that are used for a single iteration. Epochs give the number of iterations over the training dataset that the LSTM model has completed. Since the amount of data in the dataset is usually very large, the data is divided into batches for easier processing. The loss function used here is the mean squared error loss, which comes under the category of regression loss functions. This loss calculates the squared differences between the profit attribute in the training dataset and the profit attribute in the predicted values. The lower the mean squared error value, the more accurate the model is, because the predicted values are very close to the actual training values.
The optimizer used here is Adam, which is very fast in computation and optimizes the weights at every level. The metric used for evaluating the model fit is accuracy. This metric comes under the accuracy class; it computes the number of times the predictions equal the existing values. The Rectified Linear (ReLU) activation function activates the node and gives the output directly if the output is positive, and outputs 0 otherwise. The benefits of this function are that the model is very easy to train and more often than not attains good performance. For training, the network's gradient descent functions need to be used, which allow for the feedback of the errors; a nonlinear function permits the establishment of complicated relations between the neurons. To give more sensitivity to the summed input activation and to avoid saturating the neuron, ReLU is used. Subsequently, model fitting is done.
Subsequently, predictions on the test data are carried out.
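The network described above could be assembled roughly as follows. This is an illustrative Keras sketch, not the authors' code; the ReLU layer activation and the (12, 1) input shape are assumptions based on the description above.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# One hidden LSTM layer with 100 units and a single-output Dense layer.
model = Sequential([
    LSTM(100, activation="relu", input_shape=(12, 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

# Fit on the TimeseriesGenerator built earlier; 20 epochs, batch size 1, verbose=1.
model.fit(generator, epochs=20, verbose=1)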
FIGURE 8: Window (24) Rolling Mean and Standard Deviation vs Time plot

FIGURE 9: ADF, KPSS test results for preprocessed data

TABLE 1: Rolling Window Size Comparison
RWS    RMSE          MAPE          MAPA (Accuracy %)
5      14.4374196    0.107453364   89.25466363
12     11.84809209   0.086463782   91.35362177
24     11.40494076   0.083290455   91.67095453
50     12.27557348   0.089923931   91.00760693
*RWS = Rolling Window Size

IV. RESULT ANALYSIS
In this section, sales forecasting using the ARIMA, SARIMA and LSTM models is discussed.

A. RESULT ANALYSIS OF ARIMA MODEL
The window size for the MA forecast is chosen to be 24 because it gives the lowest RMSE and MAPE and the best accuracy of approximately 91.671%, as shown in Table 1. The window rolling mean and window rolling standard deviation calculated with a window of 24 have been plotted as shown in Figure 8, and they seem to vary a lot. The p-value obtained from performing ADF for the first time on the preprocessed data, as shown in Figure 9, is approximately 0.338; hence the null hypothesis cannot be rejected, the data has a unit root, and it is considered non-stationary. The p-value obtained from performing KPSS for the first time on the preprocessed data is 0.010; hence the data is non-stationary. So, in the next step, transformations are applied. By taking the natural logarithm of the dependent variable, i.e., Gross Profit in thousands in the train data, the logarithmic transformation has been performed as shown in Figure 10.


FIGURE 10: Log transformations

FIGURE 11: Differencing

FIGURE 12: Stationarity after Transformations

FIGURE 13: Seasonality Decomposition

FIGURE 14: Rolling Mean and Std. Deviation of Residuals

FIGURE 15: Rolling Mean and Std. Deviation of Residuals

It can be understood that the rate at which the rolling mean is increasing has been lowered and the variance has been stabilized. Now, after removing the moving average, it has been observed in Figure 11 that the rolling mean and standard deviation are approximately horizontal; that is, the mean of the series has been stabilized by differencing the series. This stabilization has been done to ensure stationarity. From the ADF test results, the p-value obtained was 0.05, thus the series is rendered stationary, as shown in Figure 12. Seasonal decomposition is used to break up the time series data into the trend component, the seasonality part, the level, and the residual, as shown in Figure 13. The residual is used to build the ARIMA model, so its stationarity is examined. Its rolling mean and standard deviation have been checked and are shown in Figure 14. Figure 15 shows the ADF and KPSS test results as stationary. Figure 16 shows the ACF and PACF plots.
The "p" term is estimated from the PACF plot: there is only 1 lollipop above the confidence interval (blue region), at lag 1, before the next one at lag 2 falls into the confidence interval. The value at lag 0 is ignored as it always shows perfect correlation by default. Hence, p should be 1. The "q" term is estimated from the ACF plot: there are 2 values above the confidence interval (blue region), at lags 1 and 2, that are quite significant, before the next one falls below the confidence interval. The value at lag 0 is ignored as it always shows perfect correlation by default. Hence, q should be 2. The I term, or d value, is the order of differencing; only the log difference is performed, hence the d value is 1.
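The decomposition and the ACF/PACF plots used to read off p and q can be produced with statsmodels, for example as in the sketch below (illustrative only; the monthly period of 12 and the log_profit series are assumptions):

import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Decompose the (log) series and keep the residual component for ARIMA.
decomposition = seasonal_decompose(log_profit, model="additive", period=12)
residual = decomposition.resid.dropna()

# ACF/PACF plots of the residual series; the lags outside the shaded
# confidence band suggest q (from the ACF) and p (from the PACF).
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(residual, lags=24, ax=axes[0])
plot_pacf(residual, lags=24, ax=axes[1])
plt.show()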
In Figure 17, the AR model has been fitted by maximum likelihood, i.e. the model is built using the transformed train data.

FIGURE 16: ACF and PACF plots of ARIMA

FIGURE 17: Plotting AR Model on Train Data

FIGURE 18: Actual vs Predicted on Validation Data – AR Model

FIGURE 19: Plotting MA Model on Train Data

FIGURE 20: Actual vs Predicted on Validation Data – MA Model

FIGURE 21: Plotting ARIMA Model on Train Data

FIGURE 22: Actual vs Predicted on Validation Data – ARIMA Model

The actual vs predicted results on the validation data are shown in Figure 18, after being scaled back to the original scale. In Figure 19, the gross profit of the validation data and the predicted gross profit values vs order year are plotted. Subsequently, the model is scaled back to its original scale as shown in Figure 20, similar to the AR model.
The order for the combined ARIMA model has been taken as (2,1,2), considering p=2, d=1 and q=2 as per the insights gathered from the AR and MA models. The model has been built using the ARIMA.fit() function and can be seen in Figure 21. Now, the model has to be scaled back to its original scale in Figure 22, similar to the AR model.

B. RESULT ANALYSIS OF SARIMA MODEL
Inferences are drawn from the ACF and PACF lollipop charts shown in Figure 23.


FIGURE 23: ACF and PACF plots for SARIMA

FIGURE 24: SARIMA Model fit() and predictions on Validation Data

FIGURE 25: Plot Diagnostics of SARIMA

FIGURE 26: SARIMAX Results by summary() function

The p term is estimated from the PACF plot: there are 2 lollipops below the confidence interval (blue region) at lags 3 and 4, before the next lag at 2 falls above the confidence interval. The value at lag 0 is ignored as it always shows perfect correlation by default. Hence, p should be 2. The q term is estimated from the ACF plot: there are 4 values above the confidence interval (blue region), at lags 1, 2, 3 and 4, that are quite significant, before the next one falls into the confidence interval. The value at lag 0 is ignored as it always shows perfect correlation by default. Hence, q should be 4. The d value is the order of integration; hence, the d value is 1. With respect to the seasonal terms, the plots show expected behaviours with unexpected spikes at certain lags. So, it has been hypothesized that P = 0 and Q = 1, owing to the tapering autocorrelation function. These values have been checked when they are applied to the model.
The predict() function is used, which takes the fitted results and the start and end parameters as the datetimes of the beginning and ending of the validation data, together with the gross profits from the validation dataset. The gross profit of the validation data and the predicted gross profit values vs order year have been plotted, and the accuracy metrics have been displayed. Now, the model has to be scaled back to its original scale, similar to the AR model. The order and seasonal order have been taken as (2,1,4) and (0,1,1,7). The SARIMAX.fit() function has been used on the training dataset to build the model. The predict() function has been used on the fitted model to make the SARIMAX predictions for the validation dataset. These are displayed in Figure 24.
The plot diagnostics are displayed in Figure 25. To determine the validity of the model fit, its residual errors should have an almost constant mean and variance. From the Standardized Residual graph, the residual errors appear to vary around a mean of zero and have a uniform variance; this indicates an unbiased forecast. The Histogram plus estimated density graph, known as the density plot, suggests a normal distribution with a mean of zero. The Normal Q-Q plot shows almost all the dots falling in line with the red line, which means that the distribution is proper and not skewed. The Correlogram, also known as the Auto Correlation Function (ACF) plot or lag plot, indicates that the residuals are not auto-correlated at lag 1. If correlations exist among residuals, it means there is unexplored information left in the residuals that must be considered for the purpose of forecasting; then a need arises to search for more exogenous variables for SARIMA. Hence, these plots indicate that the fit is good and the model can be used for forecasting.
The summary() function displays the SARIMAX results in Figure 26. It is evident that the value of the AIC, as well as the P values of the coefficients estimated by the model, looks significant.

FIGURE 27: LSTM Model losses and accuracies of train and test data

FIGURE 28: Predictions screenshot of 2016-17

C. RESULT ANALYSIS OF LSTM MODEL
The losses and accuracies of the train and test data have been plotted using plot() and model.history.history. This is useful to know how the model has converged. It is seen from the plot that the train and validation losses reach their minimum at epoch 2, as seen in Figure 27. The predictions made after all the inverse transformations have been printed for the span of 2016-17 in Figure 28. They show that the predictions are almost in line with the actual data and that the model built has understood the dataset well.

1) Comparison of Results
From Table 2, it is understood that the AR and MA models, when combined to form the ARIMA model, produce an accuracy of approximately 93.840%.

TABLE 2: Accuracy Metrics for AR, MA Model Predictions
Model                  AR              MA
RMSE                   29.65521352     33.26912075
ME                     -27.60785087    -31.38192957
MPE                    -0.272996613    -0.309043979
MAE                    27.60785087     31.38192957
MAPE                   0.206260488     0.227875589
Corr                   0.101832397     0.059874417
MinMax Error           0.377903407     0.395349002
Accuracy % (MAPA)      79.37 %         77.21 %

The following observations are made from Table 3, along with their significance:

TABLE 3: Comparison of Accuracy Metrics of all 3 Models
Model                  ARIMA           SARIMA          LSTM
RMSE                   8.681204196     7.274401885     3.917264522
ME                     2.135885345     3.93202283      0.470382847
MPE                    0.030512648     0.036229118     0.004556761
MAE                    6.481059891     6.010790505     3.257199791
MAPE                   0.061593781     0.056216104     0.029891568
Corr                   0.622249004     0.84321785      0.840131571
MinMax Error           0.338681671     0.35963917      0.178130191
Accuracy % (MAPA)      93.84%          94.38%          97.01%

• It is observed that SARIMA has higher accuracy than ARIMA because the seasonal constituents (trend and seasonality), which were removed in ARIMA, have been taken into consideration to make a more realistic prediction on the validation data. On the other hand, LSTM surpasses both the stochastic models, as expected.
• Additionally, a positive Corr value above 0.6 can be seen in all three cases; it indicates a rather good positive relation between profit and time, which in turn explains the increasing trend of the data considered as time progresses.
• The accuracies have been computed based on MAPA and MAPE (MAPA % = (1 – MAPE) * 100) because MAPA is a percentage metric and hence allows easier interpretation compared to RMSE.
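For reference, the metrics behind the tables can be computed as in the following sketch (illustrative definitions; MAPA follows the formula given above):

import numpy as np

def forecast_metrics(actual, predicted):
    """Selected error metrics used in the comparison tables."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    rmse = np.sqrt(np.mean((predicted - actual) ** 2))
    mape = np.mean(np.abs(predicted - actual) / np.abs(actual))
    mapa = (1 - mape) * 100          # accuracy % as defined in the paper
    corr = np.corrcoef(predicted, actual)[0, 1]
    return {"RMSE": rmse, "MAPE": mape, "MAPA %": mapa, "Corr": corr}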
As the models are giving good accuracy, the forecast for the next 5 years has been made with each of them, as follows. Observations from the figures below:
Figures 29 and 30: ARIMA and SARIMA forecast the profit with a gradually decreasing trend over time.
Figure 31: LSTM forecasts the profit with a sudden, but gradual, decreasing trend over time.

V. CONCLUSION AND FUTURE SCOPE
Profit analysis helps to understand the sales and the profits and losses made, and to predict values for the future. In the current work this is carried out on sales data using the statistical methods Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA), and the deep learning method Long Short-Term Memory (LSTM) neural network in time series forecasting. The dataset has been converted into a stationary one for ARIMA, but not for SARIMA and LSTM. The fitted models have been built and used to predict profit on test data. Accuracies of approximately 93.84% (ARIMA), 94.378% (SARIMA) and 97.01% (LSTM) are observed. Using the models built, forecasts for the next 5 years have been made. Results show that LSTM surpasses both statistical models in constructing the best model.

LSTM surpasses both the stochastic models in constructing the best model, but it is expensive in terms of runtime and computational capability if the data used and the number of iterations required are large. As it provides only around 3% improvement in accuracy, it can be replaced by SARIMA for a dataset that is larger and not very complex but contains seasonality. It has been found that the number of epochs used does not influence the accuracy of LSTM, as it increases or decreases randomly with the epochs; hence it is best to stop at the minimum number of epochs once a decent accuracy is achieved. The accuracy of the future forecasts decreases as more time elapses from the last known data point. Various new DL models can be tried in the future. Also, combinations of stochastic and DL models can be implemented to obtain more benefits, depending on the data.

FIGURE 29: Future Forecast – ARIMA Model

FIGURE 30: Future Forecast – SARIMA Model

FIGURE 31: Future Forecast – LSTM Model

REFERENCES
[1] C. Chatfield, Time-series forecasting. Chapman and Hall/CRC, 2000.
[2] R. H. Shumway and D. S. Stoffer, Time series analysis and
its applications. Springer, 2000, vol. 3.
[3] H. Li, "Time-series analysis," Numerical Methods
Using Java: For Data Science, Analysis, and Engineering, pp. 979–1172,
2022.
[4] Y. Takahashi, H. Aida, and T. Saito, “Arima model’s superiority over f-
arima model,” in WCC 2000-ICCT 2000. 2000 International Conference
on Communication Technology Proceedings (Cat. No. 00EX420), vol. 1.
IEEE, 2000, pp. 66–69.
[5] N. Deretić, D. Stanimirović, M. A. Awadh, N. Vujanović, and A. Djukić,
“Sarima modelling approach for forecasting of traffic accidents,” Sustain-
ability, vol. 14, no. 8, p. 4403, 2022.
[6] K. Mokhtar, S. M. Mhd Ruslan, A. Abu Bakar, J. Jeevan, and M. R.
Othman, “The analysis of container terminal throughput using arima and
sarima,” in Design in Maritime Engineering. Springer, 2022, pp. 229–
243.
[7] T. Falatouri, F. Darbanian, P. Brandtner, and C. Udokwu,
“Predictive analytics for demand forecasting – a comparison of
sarima and lstm in retail scm,” Procedia Computer Science,
vol. 200, pp. 993–1003, 2022, 3rd International Conference
on Industry 4.0 and Smart Manufacturing. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S1877050922003076
[8] N. Mishra and A. Jain, "Time series data analysis for forecasting – a
literature review,” International Journal Of Modern Engineering Research
(IJMER), vol. 4, no. 7, pp. 1–5, 2014.
[9] C. Luo, J.-G. Lou, Q. Lin, Q. Fu, R. Ding, D. Zhang, and Z. Wang,
“Correlating events with time series for incident diagnosis,” ser. KDD ’14.
Association for Computing Machinery, 2014, p. 1583–1592.
[10] C. C. Ihueze and U. O. Onwurah, “Road traffic accidents prediction
modelling: An analysis of anambra state, nigeria.” Accident; analysis and
prevention, vol. 112, pp. 21–29, 2018.
[11] M. Sangare, S. Gupta, S. Bouzefrane, S. Banerjee, and P. Mühlethaler,
“Exploring the forecasting approach for road accidents: Analytical
measures with hybrid machine learning,” Expert Systems with
Applications, no. 167, p. 113855, Apr. 2021. [Online]. Available:
https://hal.archives-ouvertes.fr/hal-03119076
[12] A. Almeida, S. Brás, I. Oliveira, and S. Sargento, “Vehicular traffic flow
prediction using deployed traffic counters in a city,” Future Generation
Computer Systems, vol. 128, pp. 429–442, 2022. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0167739X21004180
[13] I. O. Olayode, L. K. Tartibu, and M. O. Okwu, “Prediction
and modeling of traffic flow of human-driven vehicles at a
signalized road intersection using artificial neural network model:
A south african road transportation system scenario,” Transportation
Engineering, vol. 6, p. 100095, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S2666691X21000518
[14] M. A. Rahim and H. M. Hassan, “A deep learning based
traffic crash severity prediction framework," Accident Analysis
and Prevention, vol. 154, p. 106090, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0001457521001214


[15] M. Kulkarni, A. Jadha, and D. Dhingra, "Time series data analysis for stock market prediction," in Proceedings of the International Conference on Innovative Computing and Communications (ICICC) 2020. IEEE, 2020, pp. 1–6.
[16] G. V. Attigeri, M. P. M. M, R. M. Pai, and A. Nayak, "Stock market prediction: A big data approach," in TENCON 2015 - 2015 IEEE Region 10 Conference, 2015, pp. 1–5.
[17] A. E. Permanasari, I. Hidayah, and I. A. Bustoni, "Sarima (seasonal arima) implementation on time series to forecast the number of malaria incidence," 2013 International Conference on Information Technology and Electrical Engineering (ICITEE), pp. 203–207, 2013.
[18] M. Y. M. Alsharif and J. Kim, "Time series arima model for prediction of daily and monthly average global solar radiation: The case study of seoul, south korea," Symmetry, vol. 11, no. 2, pp. 1–17, 2019.
[19] P. Chen, H. Yuan, and X. Shu, “Forecasting crime using the arima model,” Wireless Sensor Networks Security.
in 2008 Fifth International Conference on Fuzzy Systems and Knowledge
Discovery, vol. 5, 2008, pp. 627–630.
[20] D. P. Gandhmal and K. Kumar, “Systematic analysis and
review of stock market prediction techniques,” Computer
Science Review, vol. 34, p. 100190, 2019. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S157401371930084X
[21] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu,
“Financial time series forecasting with deep learning :
A systematic literature review: 2005–2019,” Applied Soft
Computing, vol. 90, p. 106181, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S1568494620301216
[22] S. Siami-Namini, N. Tavakoli, and A. Siami Namin, “A comparison of
arima and lstm in forecasting time series,” in 2018 17th IEEE International
Conference on Machine Learning and Applications (ICMLA), 2018, pp.
1394–1401.
[23] G. Chniti, H. Bakir, and H. Zaher, “E-commerce time series
forecasting using lstm neural network and support vector regression,”
in Proceedings of the International Conference on Big Data and
Internet of Thing, ser. BDIOT2017. New York, NY, USA: Association
for Computing Machinery, 2017, p. 80–84. [Online]. Available:
https://doi.org/10.1145/3175684.3175695
[24] S. Athiyarath, M. Paul, and S. Krishnaswamy, "A comparative study
and analysis of time series forecasting techniques.” SN Computer Science,
vol. 1, no. 3, 2020.
[25] X. Pang, Y. Zhou, P. Wang, W. Lin, and V. Chang, "An innovative neural network approach for stock market prediction," J Supercomput, vol. 76, 2020.
[26] A. Ayodele Ariyo, A. Aderemi Oluyinka, and A. Charles Korede, "Applications of deep learning in stock market prediction: Recent progress," Journal of Applied Mathematics, vol. 1, no. 1, 2014.
[27] W. Jiang, "Applications of deep learning in stock market prediction: Recent progress," Expert Syst. Appl., vol. 184, no. C, Dec. 2021. [Online]. Available: https://doi.org/10.1016/j.eswa.2021.115537
[28] O. B. Sezer and A. M. Ozbayoglu, "Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach," Expert Syst. Appl., vol. 70, no. 1, 2018.
[29] "Data sets for testing (till 5 million records) – sales," https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/, accessed: 5-02-2021.

MS. UPPALA MEENA SIRISHA has completed her B.Tech degree in Computer and Communication Engineering from Manipal Institute of Technology, Manipal, India in 2021. She has finished internships at Modulus Motors Pvt. Ltd as Junior Front End Developer, Vizag Steel Plant, BlackRock Services India Pvt. Ltd. as SDE intern, and Mitti (NGO) as Research Analyst. She is the Co-founder and current board member of Mudra - Imprint, a social service organisation. Her fields of interest include Machine Learning, Database and Management Systems, and Computer Networking.

DR. MANJULA C BELAVAGI has received the B.E. degree in Computer Science and Engineering from Karnatak University, Dharwad, India. She has completed her master's degree in Network and Internet Engineering from JNNCE, Shivamogga, affiliated to VTU, Belgaum, India. She has received her PhD degree from Manipal Academy of Higher Education, Manipal, India. She is currently working as Assistant Professor-Selection Grade in the Department of Information & Communication Technology, Manipal Institute of Technology, Manipal. She has published research papers in National and International Conference proceedings and Journals. Her areas of interest include Machine Learning, Game Theory and Wireless Sensor Networks Security.

DR. GIRIJA ATTIGERI is currently Associate Professor in the Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. She has received B.E. and M.Tech. degrees from the Visvesvaraya Technological University, Karnataka, India. She has more than 15 years of experience in teaching and research. She received her Ph.D. from the Manipal Institute of Technology, Karnataka, India. Her research interests span big data analytics, Machine Learning, and Data Science. She has around 10 publications in reputed international conferences and journals.
