A Review On Stock Market Prediction Using Machine Learning Algorithms
ISSN: 1004-9037
https://sjcjycl.cn/
DOI: 10.5281/zenodo.98549673
Harish G N,
Department of CSE
Research scholar at Presidency University, Bangalore
The forecasting of the stock market is a traditional quandary that lies at the crossroads of the financial
and computational disciplines. Regarding this issue, the renowned Efficient Market Hypothesis (EMH)
espouses a bleak perspective, positing that the financial market is efficient [Fama, 1965]. This theory
asserts that any form of analysis, be it technical or fundamental, would not generate a reliable surplus
profit for investors. Notwithstanding, there exists a divergence of opinion amongst scholars regarding
the validity of the Efficient Market Hypothesis [Malkiel, 2003]. Several scholarly inquiries are currently
underway to gauge the differing levels of efficiency of established and developing markets.
Additionally, there are ongoing endeavors to construct robust prognostic models for stock markets,
which is also the purview of the present investigation. The endeavor commences with the narratives of
fundamental and technical analyses. The methodology of fundamental analysis involves the assessment
of a stock's worth based on its inherent value, commonly referred to as fair value. In contrast, technical
analysis solely relies on the interpretation of charts and trends. The utilization of technical indicators,
derived from one's prior experience, may be employed as manually crafted input characteristics for both
machine learning and deep learning models. Linear models are then introduced as viable solutions for
stock market prediction, including the autoregressive integrated moving average (ARIMA) [Hyndman &
Athanasopoulos, 2018] and the generalized autoregressive conditional heteroskedasticity (GARCH)
[Bollerslev, 1986]. Machine learning models have since been applied to stock market forecasting as well,
for example logistic regression and the support vector machine [Alpaydin, 2014]. Our survey centers on
the most recent advances in deep learning, specifically the diverse deep neural network architectures
described by Goodfellow et al. (2016). The
remarkable triumphs of deep learning in recent years can be attributed to its utilization of vast amounts
of data obtained from the Internet, the parallel processing capabilities of graphics processing units
(GPUs), and the novel convolutional neural network family. This has enabled deep learning to excel in
various domains, such as image classification [Rawat & Wang, 2017; Jiang & Zhang, 2020], object
detection [Zhao et al., 2019], and time series prediction [Brownlee, 2018; Jiang & Zhang, 2018]. Deep
learning models have demonstrated superior performance in tasks such as stock market prediction,
owing to their adeptness in handling large datasets and discerning the intricate, nonlinear associations
between input features and prediction targets, surpassing both linear and machine learning models.
1. Introduction
The intricate nature of the stock market, characterized by a significant amount of noise [Fischer et al.
2018], and the semi-strong form of market efficiency [Malkiel BG et al. 1970], which is widely
acknowledged, renders the task of analyzing and predicting it a challenging one. Making a moderately
precise forecast has the potential to increase the likelihood of generating advantageous outcomes and
mitigating market uncertainties. Notwithstanding, the presence of prospects for lucrative
prognostications is frequently scrutinized by financial economists [Zhou F et al. 2019].
Artificial intelligence has been applied to time series data that exhibit chaotic and random behavior, as
evidenced by the studies of Yan D et al. (2017) and Wang J-J et al. (2012). The widespread use of
intelligent prediction models has traditionally been studied within the realm of machine learning
[Henrique BM et al. 2019]. In contrast to conventional models, machine learning models offer greater
adaptability [Zhang Y et al. 2009], do not require distributional assumptions, and allow individual
classifiers to be combined easily to reduce variance [Kotecha K et al. 2015]. Numerous automated
methods have been applied to predict the stock market [Kotecha K et al. 2015], including logistic
regression (LR), neural networks (NNs) [Frances et al. 2005, Chen A-S et al. 2003, Moghaddam AH et al.
2016], deep neural networks (DNNs), and decision trees (DTs) [Krauss C et al. 2017]. Various machine
learning techniques, such as
support vector machines (SVMs), support vector regression (SVR), k-nearest neighbors (KNN), random
forests (RFs), long short-term memory networks (LSTMs), and restricted Boltzmann machines (RBMs)
have been employed by researchers to forecast fluctuations in the stock market, as evidenced by studies
conducted by Wu M-C et al. (2006), Lee M-C et al. (2009), Pai P-F et al. (2005), Kim K-j et al. (2003), Khalid
Alkhatib et al. (2013), Zhang N et al., Krauss C et al. (2017), Bao W et al. (2017, 2019), Qiu J et al. (2020),
and Liang Q et al. (2017). To compare various machine learning techniques, Fischer and Krauss (2018)
applied Long Short-Term Memory (LSTM) networks to forecast the directional movements of the
constituent stocks of the S&P 500 from 1992 to 2015. They observed that LSTM networks outperform
RF, DNN, and LR. In accordance with Kotecha et al.'s (2015) study, an evaluation was
conducted to compare the efficacy of four models, namely Artificial Neural Network (ANN), Support
Vector Machine (SVM), Random Forest (RF), and Naïve-Bayes, in relation to the CNX Nifty, S&P BSE
Sensex, Infosys Ltd., and Reliance Industries on the Indian stock market. In their study, Goo YJ et al.
(2007) employed a neural network model to forecast the daily closing prices of the FTSE 100 Share Index
in the United Kingdom for both five and twenty-five day periods. Additionally, the researchers utilized
multiple linear regression analysis to compare and contrast the predictive outcomes of the two models.
The study conducted by Chen et al. in 2016 involved the implementation and analysis of the efficacy of
deep neural networks (DNNs) over a period of one day.
The concept of artificial intelligence encompasses the capacity of a system to assimilate knowledge from
its prior encounters and enhance its performance without the need for frequent reconfiguration.
According to Cheng, Li-Chen et al's research in 2018, it has been observed that the fluctuations in long-
term supply rates typically manifest in a linear configuration. Individuals opt to allocate their resources
towards equities that are expected to experience an increase in value in the forthcoming period.
Individuals often exhibit reluctance towards purchasing stocks due to the volatile fluctuations in stock
valuations. Consequently, it is imperative that we make precise prognostications regarding stock market
valuations that are amenable to real-world scenarios. This task involves predictive methods including
linear regression, long short-term memory, Facebook Prophet, and k-nearest neighbors. The notable
triumph of machine learning (ML) across
various sectors has sparked a surge of curiosity and continued investigation into ML's potential
applications in finance [Nguyen et al., 2015; Kim and Kang, 2019]. Thus, the present study aims to
investigate the utilization of machine learning in financial methodologies and algorithms, with a specific
focus on the prediction of stock prices.
Prediction of the stock market (SM) is a formidable challenge due to the somewhat non-linear character
of the available historical
data. The interpretation of stock price movement as a directional indicator and its subsequent utilization
for predictive purposes is a common practice. Assessing the trajectory of future stock price fluctuations
holds paramount significance for investors in gauging market vulnerabilities. The task of modeling the
direction of stock price movement has long been regarded as a formidable and intricate challenge. The
task of predicting stock price movements is a challenging one due to the significant volatility, anomalies,
and noisy signals that are present within the realm of securities markets. In recent decades, this subject
matter has garnered the interest of scholars across various disciplines, with a particular emphasis on
the realm of artificial intelligence [Ecer et al., 2020].
In the event that the software yields a surplus [J. Li et al., 2017], the shareholder may leverage the equity
for lucrative transactions. Conversely, when the pricing index is suboptimal [E. Guresen et al., 2011],
emphasis is placed on enhancing the developmental aspects of the application to facilitate more
judicious decision-making.
[Figure: prediction workflow. Extracting data from various sources; inputs of various stocks; processing of the data; extraction and selection of features; detailed prediction.]
Classification of the reviewed articles about financial stock market prediction using computational
techniques and machine learning techniques
Columns: Sl. No; Author & Article name; Technique Name; Algorithm Name; Metric Name

Sl. No: 1
Author & Article name: Fischer T, Krauss C. Deep learning with long short-term memory networks for financial market predictions. Eur J Oper Res. 2018;270(2):654–69.
Technique Name: Long short-term memory (LSTM)
Algorithm Name: Random forest (RAF), a deep neural net (DNN), and a logistic regression classifier
Metric Name: The authors find one common pattern among the stocks selected for trading: they exhibit high volatility and a short-term reversal return profile.

Sl. No: 2
Author & Article name: Zhou F, Zhang Q, Sornette D, Jiang L. Cascading logistic regression onto gradient boosted decision trees for forecasting and trading stock indices. Applied Soft Computing. 2019;84:105747.
Technique Name: Cascading the logistic regression (LR) model onto the gradient boosted decision trees (GBDT) model
Algorithm Name: Logistic regression algorithm, gradient-boosted decision trees, support vector machine algorithm
Metric Name: 1. Raw price data and twelve technical indicators are employed for extracting the information contained in the stock indices. 2. Consideration of transaction cost and buy–sell thresholds, contributing to exploiting short-term strategies for more stock indices data.

Sl. No: 3
Author & Article name: Yan D, Zhou Q, Wang J, Zhang N. Bayesian regularisation neural network based on artificial intelligence optimisation. Int J Prod Res. 2017;55(8):2266–87.
Technique Name: Bayesian-regularised artificial neural networks (BR-ANN)
Algorithm Name: Particle swarm optimisation (PSO) algorithm
Metric Name: Daily market prices and financial technical indicators are utilised as inputs to predict the one-day future closing price of the Shanghai (in China) composite index.

Sl. No: 4
Author & Article name: Wang J-J, Wang J-Z, Zhang Z-G, Guo S-P. Stock index forecasting based on a hybrid model. Omega. 2012;40(6):758–66.
Technique Name: Hybrid approach combining the exponential smoothing model (ESM), the autoregressive integrated moving average model (ARIMA), and the back propagation neural network (BPNN)
Algorithm Name: Genetic algorithm
Metric Name: The closing of the Shenzhen Integrated Index (SZII) and opening of the Dow Jones Industrial Average Index (DJIAI)

Sl. No: 5
Author & Article name: D. Enke, M. Grauer, N. Mehdiyev, Stock market prediction with multiple regression, fuzzy type-2 clustering, and neural networks, Procedia Comput. Sci. 1 (6) (2011) 201–206.
Technique Name: Three-stage stock market prediction system: Multiple Regression Analysis, Differential Evolution-based type-2 Fuzzy Clustering, a Fuzzy type-2 Neural Network
Algorithm Name: Multiple Regression Analysis
Metric Name: 3-month Certificate of Deposit (CDR3) rate, past S&P 500 (SP500) Index level, past Money Supply (M1) level, recent Industrial Production (IP) reading, and the recent Producer Price Index (PPI)

Sl. No: 6
Author & Article name: H. Chung, K.S. Shin, Genetic algorithm-optimized long short-term memory network for stock market prediction, Sustainability 10 (10) (2018) 3765.
Technique Name: Deep learning technique
Algorithm Name: Long short-term memory (LSTM) network and genetic algorithm (GA)
Metric Name: Korea Stock Price Index (KOSPI) data: high price, low price, opening price, closing price, and trading volume for 10 days

Sl. No: 7
Author & Article name: K.J. Kim, W.B. Lee, Stock market prediction using artificial NN with optimal feature transformation, Neural Comput. Appl. 13 (3) (2004) 255–260.
Technique Name: Artificial neural networks with GA
Algorithm Name: Genetic Algorithm (GA)
Metric Name: Technical indicators and the direction of change in the daily KOSPI: 2,348 trading days of data
Reviewed Articles About Stock Market Prediction Using Machine Learning Techniques
Columns: Author & Article name; Technique Name; Algorithm Name; Metric Name

Author & Article name: Yan D, Zhou Q, Wang J, Zhang N. Bayesian regularisation neural network based on artificial intelligence optimisation. Int J Prod Res. 2017;55(8):2266–87.
Technique Name: Bayesian-regularised artificial neural networks (BR-ANN)
Algorithm Name: 4 algorithms used: Particle swarm optimisation (PSO) algorithm
Metric Name: Bayesian-regularised ANN: 0.85%; Fusion model (HMM, ANN, GA): 0.8487%; ARIMA model: 0.9723%
Columns: Sl No; Type of Market (Equity/index); Time period; Market; Paper reference

Sl No: 1
Type of Market (Equity/index): Stocks
Time period: 5 years of data
Market: Europe
Paper reference: Ballings, M., den Poel, D. V., Hespeels, N., & Gryp, R. (2015). Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42(20), 7046–7056.

Sl No: 2
Type of Market (Equity/index): Stocks
Time period: 6 years
Market: Taiwan
Paper reference: Chang, P.-C., Liu, C.-H., Lin, J.-L., Fan, C.-Y., & Ng, C. S. (2009). A neural network with a case based dynamic window for stock trading prediction. Expert Systems with Applications, 36(3, Part 2), 6889–6898.
Research approach
The primary objective of this review paper is to collate empirical data on the application of machine
learning models in stock market forecasting. This approach entails the formulation of one research
question (Q4) under the vote-counting method and five research questions (Q1, Q2, Q3, Q4, Q5) under
the narrative synthesis method. The research strategy encompasses research questions that facilitate
the extraction of information. We have derived several research questions from the selected studies,
which are as follows: Q1. What are the diverse statistical tools utilized in analyzing the stock market?
Q2. What types of machine learning (ML) algorithms are utilized for predicting the stock market? Q3.
What are the various datasets employed in predicting the stock market? Q4. Has a hybrid method of ML
models been used to predict the stock market? Q5. What are the different performance metrics employed
in stock market forecasting?
We have systematically compiled a selection of scholarly articles that align with our designated research
inquiries. Within this segment, we shall deliberate upon the research inquiries that were previously
explicated. The inquiries of investigation are as follows:
Q1. What are the diverse statistical tools utilized in analyzing the stock market?
Following a rigorous selection process, we have conducted a thorough analysis and extracted pertinent
information. In order to expand our knowledge, let us delve into a selection of statistical instruments
employed in the analysis of the stock market. The diverse statistical methodologies employed in the
analysis possess a descriptive foundation for comprehending the stock market's interpretation. Certain
studies employ ARIMA (Autoregressive Integrated Moving Average), regression, and clustering
methodologies to prognosticate the stock market.
Q2. What types of machine learning (ML) algorithms are utilized for predicting the stock market?
The preponderance of the chosen subjects employ machine learning or deep learning techniques in
order to prognosticate the stock market. A pair of scholarly investigations have been chosen that employ
a merged methodology to enhance precision in prognosticating stock market trends. The primary focus
of this section pertains to the various techniques employed in the prediction of stock market trends. The
prevailing forecasting methods are described below.
The Support Vector Machine (SVM) is a machine learning algorithm that is widely used for both
classification and regression tasks. It is based on the concept of finding the optimal hyperplane that
separates the classes, and it is widely regarded as a highly effective approach for time series prediction
[Schumacher and Chen et al., 2009]. An SVM can classify the future direction of a stock price, that is,
whether it will trend upward or downward. The algorithm represents each data item as a point in an
n-dimensional space, where the coordinates are the various stock market metrics. The SVM is widely
regarded as one of the most effective predictive tools for financial markets.
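The direction-classification idea can be sketched as follows. This is a minimal illustration and not the method of any reviewed study: it assumes a linear kernel, trains by subgradient descent on the regularized hinge loss, and uses a small hand-made dataset. The helper names `train_linear_svm` and `predict_direction`, the two features, and all hyperparameters are hypothetical.

```python
# Toy linear SVM trained with subgradient descent on the regularized hinge loss.
# All data, feature names, and hyperparameters are illustrative assumptions.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Return (w, b) for the soft-margin objective lam/2*||w||^2 + hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer contributes to the subgradient.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict_direction(w, b, x):
    """Classify one feature vector as +1 (up) or -1 (down)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical features: [previous-day return (%), 5-day momentum (%)].
X = [[1.2, 2.0], [0.8, 1.5], [1.5, 2.5], [-1.0, -2.0], [-0.7, -1.2], [-1.4, -2.2]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict_direction(w, b, x) for x in X]
```

On real price data the classes are rarely linearly separable, so a kernelized SVM from an established library and an out-of-sample evaluation would normally replace this toy setup.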
The Support Vector Regression (SVR) model [Yanjie Hu et al., 2008] is based on the principles of the
Support Vector Machine (SVM) model, and the two differ only in subtle ways. SVR is commonly used to
predict stock prices, whereas SVM is frequently used to forecast stock market trends from their
respective time series. Predicting stock market indices is a highly significant field of inquiry in
investment and practical applications, because effective trading strategies built on such predictions can
yield greater profits and returns while mitigating risk. The findings of Yingjun Chen et al. suggest that the
Feature Weighted Support Vector Machine (FWSVM) is more accurate than the conventional SVM when
predicting binary labels (profit or loss) over short, medium, and long-term periods. FWSVM outperforms
SVM by a margin of 3.4% for 1-day-ahead prediction, 3.2% for 5 days, 2.6% for 10 days, 1.6% for 15 days,
1.4% for 20 days, and 1.0% for 30 days.
The Generative Adversarial Network (GAN) [Zhang et al., 2019] is a neural network framework in which
two distinct models are trained against each other, as in an adversarial game, in order to generate new
data. Within this adversarial cycle, the model that generates data closely resembling authentic data may
be thought of as a "forger" (the generator), while the model that acts as a "judge", discerning genuine
data from generated data, is referred to as the discriminator. Xingyu Zhou et al. have put forth a
straightforward yet effective model for predicting stock market trends, named GAN-FD, which is intended
to help everyday investors and people without financial expertise make investment choices. GAN-FD
uses a concise set of 13 technical indices as input data, thereby avoiding complicated pre-processing of
the input. He et al. employed a hybrid sequential GANs
framework for the purpose of forecasting stock index fluctuations. Their empirical investigations have
demonstrated that hybrid sequential GANs exhibit superior performance in the realm of stock prediction,
relative to prior research that relied solely on single algorithmic approaches. The empirical findings
indicate that the Gated Long Short-Term Memory (G-LSTM) model augmented with Deep Long Short-
Term Memory (D-LSTM) and G-LSTM model augmented with Deep Gated Recurrent Unit (D-GRU)
outperformed other models.
The Naïve Bayes algorithm [Li et al., 2017] is a classification technique grounded in Bayes' theorem. Its
underlying assumption is that each feature in the dataset is independent of every other feature, given
the class. The algorithm is simple and often performs well out of the box, even against more
sophisticated methods, on large data sets. According to Ernest Kwame Ampomah et al., the combination
of the Gaussian Naïve Bayes (GNB) algorithm with Linear Discriminant Analysis, known as GNB_LDA,
outperformed all other GNB models on three out of four evaluation metrics, namely accuracy, F1-score,
and AUC. The predictive model based on GNB with Min-Max scaling and Principal Component Analysis
ranked best on specificity. Furthermore, GNB performed better with Min-Max scaling than with
standardization scaling. Chia-Cheng Chen and colleagues examined the relative efficacy of various
machine learning models in the context of the Taiwan stock market. They compared the investment
performance of four models, namely ANN, SVM, random forest, and Naïve Bayes, on a five-year historical
dataset (2014-2018) of the Taiwan Stock Market (TWSE) Index. Their findings suggest that machine
learning models surpass the benchmark index in terms of investment performance. Among the machine
learning models, ANN and SVM performed best, random forest ranked third, and Naïve Bayes ranked
last.
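The combination of Min-Max scaling with Gaussian Naïve Bayes mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used by Ampomah et al.: the helper names `min_max_scale`, `fit_gnb`, and `predict_gnb`, the two features, and the six toy samples are all hypothetical.

```python
import math

def min_max_scale(X):
    """Scale each feature column to the range [0, 1]."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]

def fit_gnb(X, y):
    """Estimate the prior, feature means, and feature variances for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical features: [relative strength index, traded volume in millions].
X_raw = [[70, 5.0], [65, 4.5], [72, 5.5], [30, 2.0], [35, 2.5], [28, 1.8]]
y = ["up", "up", "up", "down", "down", "down"]

X_scaled = min_max_scale(X_raw)
model = fit_gnb(X_scaled, y)
gnb_predictions = [predict_gnb(model, x) for x in X_scaled]
```

In practice the scaling bounds would be computed on the training split only and then reused for unseen data, so that no information leaks from the evaluation period.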
Furthermore, it is noteworthy that a subset of the chosen investigations employ either machine learning
or deep learning methodologies for the purpose of predicting stock market trends. The algorithms have
been subjected to a rigorous evaluation process, wherein they have been applied to a real-time dataset,
taking into account various features, and subsequently assessed based on their performance
parameters. Table 2 enumerates the implementation of the machine learning algorithm for each chosen
study, accompanied by a detailed description of the same. Upon examination of Table 2, it is evident that
a significant proportion of the chosen research endeavors employ neural network methodologies with
notable frequency. Figure 3 depicts the proportion of methodology employed.
Brownian motion, also known as the Wiener process, has been studied extensively in physics. The
stochastic model of Brownian motion, originally intended to describe the movement of minute particles
in a liquid medium, has found additional applications in option pricing theory. These processes are
supported by rigorous mathematical analysis.
A real-valued stochastic process $\{W_t\}_{t \ge 0}$ is a Brownian motion (Wiener process) under a
probability measure $P$ if:
1. For any $t \ge 0$ and $s > 0$, the increment $W_{t+s} - W_t$ is normally distributed with mean zero and variance $s$.
2. For any $n$ and all $0 \le t_0 \le t_1 \le \dots \le t_n$, the random variables $\{W_{t_r} - W_{t_{r-1}}\}$ are independent.
3. $W_0 = 0$. This is a convention; any other starting point could be selected.
4. $W_t$ is continuous for all $t \ge 0$.
Essentially, this extends the discrete simple random walk to continuous time. The change in $W$ over
an infinitesimal time interval $dt$ is commonly denoted $dW$ and has a mean of zero and a variance of
$dt$. The trajectories of Brownian motion are highly erratic, and the expected length of the path traversed
by $W$ within any given interval is infinite. This characteristic poses a challenge to the application of
calculus in the context of Brownian motions.
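The defining properties can be illustrated with a short simulation. This is a sketch under stated assumptions: the function name `simulate_brownian_path`, the step size, the number of steps, and the seed are hypothetical choices, and a discretized path only approximates the continuous process.

```python
import math
import random

def simulate_brownian_path(n_steps, dt, seed=0):
    """Return a discretized path starting at W_0 = 0 with independent N(0, dt) increments."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # increment with mean 0 and variance dt
        path.append(path[-1] + dW)
    return path

path = simulate_brownian_path(n_steps=1000, dt=0.001)

# Empirical check: the increments should have mean close to 0 and variance close to dt.
increments = [b - a for a, b in zip(path, path[1:])]
sample_mean = sum(increments) / len(increments)
sample_var = sum((d - sample_mean) ** 2 for d in increments) / len(increments)
```

Halving `dt` while doubling `n_steps` covers the same time span with a rougher-looking path, which reflects the unbounded path length noted above.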
Fig. 3 ML Techniques: percentage of each technique used by the selected studies (X-axis represents the
percentage of each technique used by the selected studies and Y-axis represents the techniques used
in the study). Values shown: Hybrid approaches 9, GAN 3, SVR 3, RNN 6, CNN 6, SVM 21.
Q3. What are the various datasets employed in predicting the stock market?
The selected studies employed diverse sets of data for stock market prediction. According to some of
the studies, a number of datasets have been made available to the public, and a significant proportion
of the selected studies use such publicly available datasets to forecast the stock market. These datasets
are commonly employed for classification or forecasting. Table 3 lists the categories of data sets
employed by the selected studies. The tabulated data indicates that a majority of the selected studies
employed the NASDAQ dataset for stock prediction and forecasting.
Q4. Has a hybrid method of ML models been used to predict the stock market?
As depicted in Figure 2, only three of the selected studies have employed a hybrid approach for
predicting the stock market. Study S3 proposed a hybrid methodology that combines artificial neural
networks (ANN) with an approximation approach. Study S8 proposed a hybrid methodology that
combines ANN with genetic algorithms (GA) to enhance performance in securities forecasting in the
stock market. Study S13 integrated the discrete wavelet transform, a statistical method, with the artificial
neural network machine learning algorithm (DWT-ANN) in order to predict stock market trends.
Q5. What are the different performance metrics employed in stock market forecasting?
Diverse performance metrics are employed to evaluate the market forecasting ability of machine
learning models. The measured efficacy of an algorithm depends on its performance parameters, which
are determined by the methodology employed and the data sets used. The performance metrics used by
the selected studies are described as follows:
Accuracy is the metric used to assess classification models [Pang et al., 2020]. Informally, accuracy is
the fraction of the model's predictions that are correct.
The Root Mean Squared Error (RMSE) measures the difference between the values predicted by a model
and the observed values. It was used in the calculations described by J. B. Heaton et al. (2016). The
RMSE values obtained on the training and evaluation datasets are remarkably close.
The Mean Absolute Error (MAE) was used as a metric for regression values by Ummul Khair Pang et al.
(2002). The prediction error is the sum of the absolute differences between the predicted and actual
values, divided by the total number of data points in the dataset. MAE thus measures the difference
between two continuous variables.
The Mean Square Error (MSE) is the loss function used to compute least squares regression [Z. Wang,
A, et al., 2018]. It is the sum of the squared differences between the predicted and actual values, divided
by the total number of observations in the dataset. Incorporating relevant events or sentiment about the
stock market may reduce the MSE.
The Mean Absolute Percentage Error (MAPE) can be used to assess the relative reliability of stock data
prediction, as proposed by Ansari Saleh Ahmar. It is the sum of the absolute errors, each divided by the
corresponding actual value [E. Guresen, et al., 2011], averaged over the observations, and it represents
the relative deviation from the true value. Furthermore, some studies used these performance metrics
with their corresponding datasets to predict the fluctuations of the stock market, which are measured
over monthly or yearly intervals. Figure 4 illustrates that the selected studies most often use the accuracy
performance parameter to assess their model in conjunction with their dataset. However, only 11% of the
selected studies employed the MAPE parameter.
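The five metrics described above can be written out directly. The following is a minimal sketch using the standard textbook definitions; the function names and the four sample prices are hypothetical and are not taken from any reviewed study.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of class predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute error relative to the actual value, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical closing prices and model forecasts.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]

# Hypothetical direction labels for the classification metric.
true_direction = ["up", "down", "up", "up"]
predicted_direction = ["up", "down", "down", "up"]

print("Accuracy:", accuracy(true_direction, predicted_direction))
print("MAE:", mae(actual, predicted))
print("MSE:", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("MAPE (%):", mape(actual, predicted))
```

Accuracy applies to direction (classification) forecasts, while MAE, MSE, RMSE, and MAPE apply to price (regression) forecasts, so a study normally reports the metrics that match its prediction target.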
[Figure: percentage of selected studies using each performance metric. Legend: Accuracy, MSE, MAE, RMSE, MAPE. Slice values: 11%, 32%, 20%, 16%, 21%.]
From the table it is evident that almost all of the authors publish in IEEE journals.
5. Conclusion
The present article presents a comprehensive analysis of diverse methodologies employed in the
prediction of stock market trends, leveraging mathematical and machine learning strategies. The
objective of this survey is to evaluate the relative efficacy of prevailing methodologies vis-à-vis modified
approaches, utilization of diverse datasets, performance metrics, and application methodologies, based
on an analysis of the top 50 seminal investigative articles. The categorization of techniques employed in
the prediction of stock market trends is predicated upon various machine learning algorithms. In pursuit
of enhancing prognostic precision, a multitude of inquiries have been undertaken, employing a
confluence of methodologies, in the domain of stock market analysis. The utilization of Artificial Neural
Networks (ANNs) and Neural Networks (NNs) has become a prevalent methodology in the realm of stock
market forecasting, yielding favorable outcomes. It is plausible to devise methodologies that monitor the
stock market as a whole. The primary obstacle to stock market prediction is that prevailing trends cannot
be discerned from past stock data alone, because the stock market is also swayed by external factors,
including but not limited to government policy decisions and consumer sentiment. In future work, we
will aim to develop a more dependable and precise stock market prediction system.
References
1. Fischer T, Krauss C. Deep learning with long short-term memory networks for financial market
predictions. Eur J Oper Res. 2018;270(2):654–69.
2. Malkiel BG, Fama EF. Efficient capital markets: A review of theory and empirical work. The journal
of Finance. 1970;25(2):383–417.
3. Zhou F, Zhang Q, Sornette D, Jiang L. Cascading logistic regression onto gradient boosted
decision trees for forecasting and trading stock indices. Applied Soft Computing. 2019;84:105747.
4. Yan D, Zhou Q, Wang J, Zhang N. Bayesian regularisation neural network based on artificial
intelligence optimisation. Int J Prod Res. 2017;55(8):2266–87.
5. Wang J-J, Wang J-Z, Zhang Z-G, Guo S-P. Stock index forecasting based on a hybrid model.
Omega. 2012;40(6):758–66.
6. Henrique BM, Sobreiro VA, Kimura H. Literature review: machine learning techniques applied to
financial market prediction. Expert Syst Appl. 2019;124:226–51.
7. Zhang Y, Wu L. Stock market prediction of S&P 500 via combination of improved BCO approach
and BP neural network. Expert Syst Appl. 2009;36(5):8849–54.
8. Kotecha K. Predicting stock market index using fusion of machine learning techniques. Expert
Syst Appl. 2015;42(4):2162–72.
9. ZLi J, Financial time series forecasting using twin support vector regression. PLoS ONE.
2019;14(3). pmid:30865670.
10. Krauss C, Do XA, Huck N. Deep neural networks, gradient-boosted trees, random forests:
Statistical arbitrage on the S&P 500. Eur J Oper Res. 2017;259(2):689–702.
11. Khalid Alkhatib. Stock Price Prediction Using K-Nearest Neighbor (kNN) Algorithm. International
Journal of Business, Humanities and Technology. 2013; Vol. 3, No. 3.
12. Lee M-C. Using support vector machine with a hybrid feature selection method to the stock trend
prediction. Expert Syst Appl. 2009;36(8):10896–904.
13. Wu M-C, Lin S-Y, Lin C-H. An effective application of decision tree to stock trading. Expert Syst
Appl. 2006;31(2):270–4.
14. Pai P-F, Lin C-S. A hybrid ARIMA and support vector machines model in stock price forecasting.
Omega. 2005;33(6):497–505.
15. Frances PH, Marches M, Murray A. A hybrid genetic-neural architecture for stock index
forecasting. Information Science. 2005;17(1):3–37.
16. Kim K-j. Financial time series forecasting using support vector machines. Neurocomputing.
2003;55(1–2):307–19.
17. Chen A-S, Leung MT, Daouk H. Application of neural networks to an emerging financial market:
forecasting and trading the Taiwan Stock Index. Computers & Operations Research. 2003;30(6):901–23.
18. Brownstone D. Using percentage accuracy to measure neural network predictions in stock
market movements. Neurocomputing. 1996;10(3):237–50.
19. Bao W, Yue J, Rao Y. A deep learning framework for financial time series using stacked
autoencoders and long-short term memory. PLoS ONE. 2017;12(7):e0180944. pmid:28708865.
20. Liang Q, Rong W, Zhang J, Liu J, Xiong Z, editors. Restricted Boltzmann machine based stock
market trend prediction. International Joint Conference on Neural Networks (IJCNN); 2017: IEEE.
21. Zhang N, Lin A, Shang P. Multidimensional k-nearest neighbor model based on EEMD for
financial time series forecasting. Phys A Stat Mech its Appl. 2017;477:161–73.
22. Qiu J, Wang B, Zhou C. Forecasting stock prices with long-short term memory neural network
based on attention mechanism. PLoS ONE. 2020;15(1). pmid:31899770.
23. Kotecha K. Predicting stock and stock price index movement using Trend Deterministic Data
Preparation and machine learning techniques. Expert Syst Appl. 2015;42(1):259–68.
24. Moghaddam AH, Moghaddam MH, Esfandyari M. Stock market index prediction using artificial neural
network. Journal of Economics, Finance and Administrative Science. 2016:89–93.
25. Bessembinder H, Chan K. Market efficiency and the returns to technical analysis. Financ Manag.
1998:5–17.
26. Goo YJ, Chen DH, Chang YW. The application of Japanese candlestick trading strategies in
Taiwan. Investment Management and Financial Innovations. 2007;(4, Iss. 4):49–79.
27. Chen S, Bao S, Zhou Y. The predictive power of Japanese candlestick charting in Chinese stock
market. Phys A Stat Mech its Appl. 2016;457:148–65.
28. Cheng, Li-Chen, Yu-Hsiang Huang, and Mu-En Wu.
"Applied attention-based LSTM neural networks in stock prediction." 2018 IEEE International Conference
on Big Data (Big Data). IEEE, 2018.
29. Nguyen, T. H., Shirai, K., and Velcin, J. (2015).
Sentiment analysis on social media for stock movement prediction. Expert Systems with Applications,
42(24):9603–9611.
48. Chia-Cheng Chen, Yi-Sheng Liu, Ting-Hsin Hsu, An Analysis on Investment Performance of
Machine Learning: An Empirical Examination on Taiwan Stock Market. International Journal of
Economics and Financial Issues, 2019, 9(4), 1-10.
49. X. Pang, Y. Zhou, P. Wang, W. Lin, V. Chang, An innovative neural network approach for stock
market prediction, J. Supercomput. 76 (3) (2020) 2098–2118.
50. J. B. Heaton, N. G. Polson, J. H. Witte, Deep learning for finance: deep portfolios, Applied
Stochastic Models in Business and Industry. Wiley 2016.
51. Ummul Khair, Hasanul Fahmi, Sarudin Al Hakim and Robbi Rahim, Forecasting Error Calculation
with Mean Absolute Deviation and Mean Absolute Percentage Error, International Conference on
Information and Communication Technology (IconICT) IOP Publishing, IOP Conf. Series: Journal of
Physics: Conf. Series 930 (2017) 012002.
52. Z. Wang, A. Tan, F. Li, and S.-B. Ho, “Comparisons of learning based methods for stock market
prediction,” in The 4th International Conference on Cloud Computing and Security (ICCCS 2018), 2018.
53. Ansari Saleh Ahmar, Sutte Indicator: A Technical Indicator in Stock Market. International Journal
of Economics and Financial Issues, 2017, 7(2), 223-226.
54. E. Guresen, G. Kayakutlu, T.U. Daim, Using artificial neural network models in stock
market index prediction, Expert Syst. Appl. 38 (8) (2011) 10389–10397.