Gold Price Prediction Using Machine Learning
Sandeep Patalay¹
Research Scholar, Dept. of Management Studies,
Vignan’s University, (VFSTR, Deemed to be University),
Vadlamudi, Guntur, Andhra Pradesh 522213, India
Sandeep Patalay is a PhD research scholar at Vignan’s University, pursuing research in
the area of Financial Decision Support Systems. He has a master’s degree in Finance from
Acharya Nagarjuna University, Guntur, India and a bachelor’s degree in Electronics
Engineering from Jawaharlal Nehru Technological University, Hyderabad, India. He works at
Tata Consultancy Services, Hyderabad, India, where he leads the Centre of Excellence for IoT
technologies. He has over 17 years of experience in analytics and decision support systems.
¹ Author for Correspondence
Abstract
Gold is a critical commodity that serves as a barometer of economic activity in the
globalized world. Its price depends on many economic indicators, and the dynamics of its
price discovery are complex. Predicting gold prices is paramount for any business involved in
global trade, and such predictions also indicate the overall financial stability of the global
business environment. In this paper, the authors develop a machine-learning-based system that
predicts gold prices from historical data on closely related commodities and stock market
indicators. The system uses the M5P model tree algorithm, trained on historical crude oil
prices and S&P 500 index values, which are key indicators of global financial markets; the
trained model forms the core decision logic for future predictions. The results indicate that
the M5P algorithm performs better than the other methods considered, with a forecasting
accuracy of 85%.
Keywords:
Gold Price Prediction, Machine Learning, M5P Algorithm, Model Trees
1. Introduction
Gold is one of the key commodities traded on global financial markets and is considered an asset
that holds its true value, unlike currencies, which depreciate. Gold is considered safer than fiat
currencies because of its intrinsic physical value, and since ancient times investors have flocked
to gold when markets perform poorly. Apart from its investment value, gold is used mainly in the
luxury and jewelry markets and in the electronics and high-tech industries, where it serves as a
high-grade conducting material in printed circuit boards and semiconductor components. Recently,
during the global economic recession brought on by the COVID-19 pandemic, investors have flocked
to gold-related assets because global currencies are losing value quickly and are highly volatile.
In this scenario, predicting the gold price is of utmost importance from both business and
academic perspectives, as it enables financial practitioners to anticipate future financial
conditions.
The gold price (Farahani & Mehralian, 2013) depends on many macroeconomic factors, such as the
global financial situation in terms of growth, economic stagnation, recession and depression
cycles. Therefore, key parameters that indicate the status of global financial markets are
studied in correlation with gold prices. The authors select crude oil prices and the S&P 500
index as financial indicators and study in detail their predictive effect on global gold prices.
The relationships among gold, crude oil and the S&P 500 index are complex and do not follow a
linear pattern. Hand-built mathematical models are insufficient to capture such dynamics, so
machine learning models are highly recommended for their ability to learn complex relationships.
Therefore, this study considers a research model that automatically learns the intricate
relationships between commodities from historical data and produces a “trained” model that can
predict future prices. Machine learning is a field in which mathematical algorithms learn from
historical data and form relationships that can serve as the basis of a prediction model. Machine
learning algorithms are becoming popular in the financial domain because of the availability of
large historical datasets and the need for predictions, which can strongly influence how
financial commodity markets operate.
In this paper, the authors select a machine learning algorithm called the M5P model tree. In
machine learning terms, gold price prediction is a regression problem because it requires a
numerical output. The model chosen for this study is based on the M5P model tree algorithm
(Witten, Frank, & Hall, 2011), which combines tree-based partitioning, as in classification
trees, with regression to predict gold prices. Model trees are like ordinary decision trees,
except that at each leaf they store a linear regression model that predicts the class value of
instances that reach the leaf. This model is chosen because complex financial data can first be
partitioned by the tree's decision rules, after which a numerical value is predicted by the
linear model in the relevant leaf.
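The leaf-level prediction described above can be sketched in a few lines. This is a minimal illustration, not the tree learned in this study. The class names `Leaf` and `Split`, the function `predict`, the attribute names `oil` and `sp500`, the split threshold and all coefficients are hypothetical values. They are chosen only to show how an instance is routed to a leaf and then scored by that leaf's linear model.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    # Linear model stored at a leaf: intercept w0 plus one weight per attribute.
    intercept: float
    weights: Dict[str, float]

    def predict(self, x: Dict[str, float]) -> float:
        return self.intercept + sum(w * x[name] for name, w in self.weights.items())


@dataclass
class Split:
    # Interior node: compare one attribute against a threshold.
    attribute: str
    threshold: float
    left: "Node"   # followed when x[attribute] <= threshold
    right: "Node"  # followed otherwise


Node = Union[Leaf, Split]


def predict(node: Node, x: Dict[str, float]) -> float:
    """Route instance x down the tree, then apply the reached leaf's linear model."""
    while isinstance(node, Split):
        node = node.left if x[node.attribute] <= node.threshold else node.right
    return node.predict(x)


# Hypothetical tree: one split on the S&P 500 level, a different linear model per leaf.
tree = Split(
    attribute="sp500",
    threshold=2000.0,
    left=Leaf(intercept=200.0, weights={"oil": 5.0, "sp500": 0.3}),
    right=Leaf(intercept=-100.0, weights={"oil": 2.0, "sp500": 0.6}),
)

print(predict(tree, {"oil": 60.0, "sp500": 1500.0}))  # 950.0 (left leaf)
print(predict(tree, {"oil": 60.0, "sp500": 3000.0}))  # 1820.0 (right leaf)
```

Each prediction is therefore piecewise linear. Within one leaf the output changes linearly with the attributes, while the split rules decide which linear model applies.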
The machine learning model was built from historical data available on open-source websites,
which provide daily price data for the last 20 years for gold, crude oil and the S&P 500 index.
This large dataset is well suited to a machine learning system, which can use it to learn the
relationships among these indicators automatically and form a prediction model accurate enough
(more than 80%) to be used for decision making.
The results of the prediction model developed with the M5P algorithm are encouraging, with more
than 85% accuracy in the validation tests performed as part of this study. The results also
indicate a high correlation (Wen, Yang, Gong, & Lai, 2017) between the behavior of financial
markets and gold prices. This model could change the way gold investments are made and may be a
useful tool for gold investors.
2. Literature Review
Gold price prediction has long engaged both business practitioners and academia. Traditional
prediction models built on linear regression and other statistical methods are available in the
literature, but their efficacy needs to be improved, and artificial intelligence and machine
learning need to be applied to improve prediction accuracy (Matenggo, Bhakti, & Bakar, 2020).
Recently, artificial intelligence and machine learning have been applied to predicting various
financial assets, including gold (Matenggo et al., 2020; Varahrami, 2011). These models have
produced varied results because of frequent price fluctuations and the complex dynamics of price
discovery in gold-based assets.
Hafezi and Akhavan (2018) built a prediction model based on artificial neural networks (ANNs)
and applied it to predicting gold prices from global financial indicators such as interest
rates, inflation rates and market indices such as the Dow Jones Index. The ANNs are
complemented by a metaheuristic called BAT to compensate for gold price fluctuations. The
results of this study are encouraging: although the root mean square error still needs to be
improved, there is sufficient evidence that machine learning algorithms can predict gold prices
with reasonable accuracy (>70%).
The choice of financial indicators for machine learning models of gold prices is itself an
interesting area of research. Chandar, Sumathi and Sivanadam (2016) indicate that the S&P 500 is
a critical global financial indicator that can be used to predict gold prices. They used a
feed-forward neural network, compared it with other ANN variants, and report encouraging
results, with accuracy on the training data sets exceeding 90%. The present study builds on
such research and uses machine learning model trees for prediction.
The authors also review prior research on how gold prices affect other commodities and
financial indicators. One such study (Raza, Jawad Hussain Shahzad, Tiwari, & Shahbaz, 2016)
explores the impact of gold prices and their relationship with other commodities and finds that
gold volatility has a negative impact on financial markets; hence, gold price volatility is of
great importance to global financial markets.
The complex relationship between gold prices and the stability of financial markets has been
studied in detail by Akgül, Bildirici and Özdemir (2015). That study used Markov-switching
Bayesian VAR models, treating gold and the S&P 500 as endogenous variables and crude oil as an
exogenous variable; the results indicate that crude oil prices strongly affect the gold price and
the S&P 500 index.
In developing countries, the interrelationship between gold, crude oil and stock market indices
is much stronger, as evidenced by a study of the Indian BSE and NSE markets (DR Bhunia, 2013).
Because imports of crude oil and gold draw on foreign exchange reserves, they have a recurrent
effect on the prices of these commodities. This is further supported by Fathian and Kia (2012),
whose results indicate that gold is strongly influenced by financial market trends.
The authors conclude that research on gold price prediction needs to raise accuracy levels
substantially (>80%) so that models can be deployed with confidence by financial investors, and
that newer, more versatile machine learning models for the financial domain need to be developed.
As the literature review shows, gold price prediction models need to be improved with advanced
machine learning models suited to the financial domain. The key objectives of this study are:
• Develop a machine learning model that can both use decision rules and quantify the gold price
  as a regression output.
• Use a large training set, preferably covering the past 20 years.
• Train the machine learning model on historical data and validate it using real-time inputs.
• Achieve a prediction accuracy above 80% so that the model can be deployed for real-time
  investment decisions.
Gold price prediction is based on historical data in which the dependent variable is the gold
price and the independent variables are the S&P 500 index and crude oil prices. The data are time
series and are used to train the M5P algorithm. The gold price is closely correlated with the
crude oil price and is moderated by global stock markets, whose trend is represented by the
S&P 500 index.
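$$\mathit{GoldPrice} = g(P_{Oil}, \mathit{Index}_{S\&P}) \qquad (1)$$
Where,
$\mathit{GoldPrice}$ = price of gold
$P_{Oil}$ = price of crude oil
$\mathit{Index}_{S\&P}$ = value of the S&P 500 index
$g$ = the function learned from the historical data by the M5P model tree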
The tree-splitting criterion determines which attribute is best for splitting the portion T of
the training data that reaches a particular node. It treats the standard deviation of the class
values in T as a measure of the error at that node and calculates the expected reduction in error
from testing each attribute at that node. The attribute that maximizes the expected error
reduction is chosen for splitting at the node. The expected error reduction, called the standard
deviation reduction (SDR), is calculated as follows:
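$$\mathit{SDR} = \mathrm{sd}(T) - \sum_{i} \frac{|T_i|}{|T|} \times \mathrm{sd}(T_i) \qquad (1)$$
where $T_1, T_2, \ldots$ are the subsets of $T$ that result from splitting the node on the chosen attribute.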
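Equation (1) can be computed directly. The sketch below assumes that `sd` is the population standard deviation (Python's `statistics.pstdev`) and shows only the split search for a single numeric attribute. The helper names `sdr` and `best_threshold` and the sample oil and gold figures are hypothetical. Weka's M5P implementation handles further details not shown here.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple


def sdr(parent: Sequence[float], subsets: List[Sequence[float]]) -> float:
    """Standard deviation reduction: sd(T) - sum_i |T_i|/|T| * sd(T_i)."""
    n = len(parent)
    return pstdev(parent) - sum(len(t) / n * pstdev(t) for t in subsets)


def best_threshold(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, SDR) of the binary split on xs that maximizes the SDR of ys.

    Candidate thresholds are midpoints between adjacent distinct x values.
    """
    pairs = sorted(zip(xs, ys))
    all_ys = [y for _, y in pairs]
    best = (float("nan"), float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        score = sdr(all_ys, [all_ys[:i], all_ys[i:]])
        if score > best[1]:
            best = (threshold, score)
    return best


# Hypothetical observations: crude oil price (attribute) and gold price (class value).
oil = [40, 45, 50, 80, 85, 90]
gold = [1200, 1210, 1190, 1500, 1510, 1490]
print(best_threshold(oil, gold))  # the best split falls at oil = 65.0
```

The split at 65.0 separates the low-oil and high-oil groups, so each subset has a small standard deviation and the reduction is large.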
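A linear model (LM) is built for each interior node of the tree, not only for the leaves; the
model takes the form shown below:
$$w_0 + w_1 a_1 + w_2 a_2 + \cdots + w_k a_k \qquad (2)$$
where $a_1, \ldots, a_k$ are attribute values and $w_0, \ldots, w_k$ are weights calculated using standard regression.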
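The weights in Equation (2) are ordinary least-squares coefficients. The function below is a minimal, dependency-free sketch that solves the normal equations for the weights w0 ... wk; its name, `fit_linear_model`, is a hypothetical helper. It omits the attribute elimination and smoothing that M5 applies to its node models. The toy attribute values are invented so that the true weights are known in advance.

```python
from typing import List, Sequence


def fit_linear_model(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares weights [w0, w1, ..., wk] for w0 + w1*a1 + ... + wk*ak.

    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination with
    partial pivoting, where X is the attribute matrix with a leading column of ones.
    """
    X = [[1.0, *map(float, row)] for row in rows]
    m = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(m)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(m)
    ]
    # Forward elimination.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (A[i][m] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w


# Toy attribute values (a1, a2), with targets generated exactly from 100 + 4*a1 + 0.5*a2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
targets = [100 + 4 * a1 + 0.5 * a2 for a1, a2 in rows]
print([round(w, 6) for w in fit_linear_model(rows, targets)])  # [100.0, 4.0, 0.5]
```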
The dataset, consisting of monthly gold prices, crude oil prices and S&P 500 index values, is
divided into training and validation sets. The M5P model tree is generated from the training data
and then pruned.
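The text does not state how the split was made. Because the data are a time series, one common choice is a chronological split, in which all validation observations come after the training period. The sketch below assumes that choice. The helper `chronological_split` and its 80/20 default are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T], train_fraction: float = 0.8) -> Tuple[List[T], List[T]]:
    """Split time-ordered records into (training, validation) without shuffling,
    so that every validation record comes after every training record."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


# Hypothetical: 20 years of monthly observations, indexed 0 (oldest) to 239 (newest).
months = list(range(240))
train, validation = chronological_split(months)
print(len(train), len(validation))  # 192 48
```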
Figure 2. M5P Machine Learning Process
Figure 3. Dataset used for Machine Learning
4.2 Data Analysis
Data analysis, model training and validation are performed using the Weka machine learning
workbench, which provides features for analyzing models and predicting future values.
Figure 4. S&P 500, Crude Oil and Gold Price time series data
Figure 5. Weka Machine learning workbench
M5P training generated 70 rules and nodes for calculating gold prices; each node is
represented by a rule with an associated regression equation for the gold price.
4.3 Results
The prediction algorithm is validated using 10-fold cross-validation. Results for the first 20
instances are shown in Figure 7; in total, more than 500 instances were checked during validation.
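The 10-fold procedure partitions the instances into ten folds. The model is trained on nine folds and tested on the remaining one, and this is repeated so that every instance is tested exactly once. The sketch below shows that partitioning. The helper `k_fold_indices`, the random shuffling and the seed are assumptions for illustration, and Weka's own fold assignment may differ.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # assumed random fold assignment
    folds = [order[i::k] for i in range(k)]
    return [
        (sorted(j for f in folds[:i] + folds[i + 1:] for j in f), sorted(folds[i]))
        for i in range(k)
    ]


# Hypothetical: 500 instances, matching the size of the validation run described above.
splits = k_fold_indices(500, k=10)
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))  # 10 450 50
```

Because the folds are drawn at random from a time series, a training fold can contain observations later than those in its test fold. A chronological split avoids this when look-ahead matters.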
Figure 7. Gold Price Prediction Results from Weka Tool
The model's accuracy is close to 85%, with a correlation coefficient close to 0.99.
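The text does not define how the 85% accuracy figure is computed. The sketch below therefore covers only standard measures for numeric prediction: the correlation coefficient reported above, together with mean absolute error and root mean squared error. The helper `regression_metrics` and the four actual and predicted prices are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Pearson correlation coefficient, mean absolute error and root mean squared error."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(var_a * var_p),
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rmse": math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n),
    }


# Hypothetical actual and predicted gold prices.
actual = [1200.0, 1250.0, 1300.0, 1350.0]
predicted = [1210.0, 1240.0, 1310.0, 1340.0]
print(regression_metrics(actual, predicted))
```

A correlation close to 1 shows that predictions track the direction and relative size of price movements. It does not by itself bound the size of the errors, which is why error measures such as MAE and RMSE are usually reported alongside it.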
5. Conclusions
The aim of this study was to advance the existing knowledge base and gold price prediction models
so that they are accurate enough to be used in real-life scenarios. The literature review revealed
that price prediction models use traditional linear regression and have also applied machine
learning models such as ANNs. It also revealed that accurate prediction requires a versatile model
that combines qualitative aspects, such as domain-specific rules, with quantitative outputs, such as
nonlinear regression. The model developed here predicts gold prices with an accuracy of about 85%
and can help commodity investors reduce their losses.
6. References
Akgül, I., Bildirici, M., & Özdemir, S. (2015). Evaluating the Nonlinear Linkage between
Gold Prices and Stock Market Index Using Markov-Switching Bayesian VAR Models.
Procedia - Social and Behavioral Sciences, 210, 408–415.
https://doi.org/10.1016/j.sbspro.2015.11.388
Chandar, S. K., Sumathi, M., & Sivanadam, S. N. (2016). Forecasting gold prices based on
extreme learning machine. International Journal of Computers, Communications and
Control, 11(3), 372–380. https://doi.org/10.15837/ijccc.2016.3.2009
DR Bhunia, A. (2013). Cointegration and Causal Relationship Among Crude Price, Domestic
Gold Price and Financial Variables: An Evidence of BSE and NSE. Journal of
Contemporary Issues in Business Research, 2(1). ISSN 2305-8277.
Farahani, M. K., & Mehralian, S. (2013). Comparison between Artificial Neural Network and
neuro-fuzzy for gold price prediction. 13th Iranian Conference on Fuzzy Systems, IFSC
2013, (August 2013). https://doi.org/10.1109/IFSC.2013.6675635
Fathian, M., & Kia, A. N. (2012). Exchange rate prediction with multilayer perceptron neural
network using gold price as external factor. Management Science Letters, 2(2), 561–570.
https://doi.org/10.5267/j.msl.2011.12.008
Hafezi, R., & Akhavan, A. (2018). Forecasting Gold Price Changes: Application of an
Equipped Artificial Neural Network. AUT Journal of Modeling and Simulation, 50(1),
71–82. https://doi.org/10.22060/MISCJ.2018.13508.5074
Matenggo, R., Bhakti, A., & Bakar, A. (2020). Gold price prediction in times of financial and
geopolitical uncertainty: A machine learning approach, (September).
https://doi.org/10.13140/RG.2.2.31050.39366
Raza, N., Jawad Hussain Shahzad, S., Tiwari, A. K., & Shahbaz, M. (2016). Asymmetric
impact of gold, oil prices and their volatilities on stock prices of emerging markets.
Resources Policy, 49, 290–301. https://doi.org/10.1016/j.resourpol.2016.06.011
Varahrami, V. (2011). Recognition of Good Prediction of Gold Price Between MLFF and
GMDH Neural Network. Journal of Economics and International Finance, 3(4), 204–210.
Wen, F., Yang, X., Gong, X., & Lai, K. K. (2017). Multi-Scale Volatility Feature Analysis
and Prediction of Gold Price. International Journal of Information Technology and
Decision Making, 16(1), 205–223. https://doi.org/10.1142/S0219622016500504