Final Internship Report
Report
on
“P75: A Comparative Study of ARIMA vs. LSTM Models
Enhanced by Economic News Analysis with
Smart Money Framework”
Submitted by
Name                  College
Anish Uday Pujari     Alliance University
Varunika Annadurai    Alliance University
Abraham Senjith       Alliance University
1. Introduction
This project aims to forecast Gold Futures (GC) prices by using ARIMA (AutoRegressive
Integrated Moving Average) and LSTM (Long Short-Term Memory) models. The study
enhances traditional forecasting techniques by incorporating technical indicators, economic
news analysis, and Smart Money Concepts (SMC). These enhancements are intended to help
both models capture market dynamics more effectively by accounting for the influence of
economic events and institutional activity on gold prices.
2. Existing System
Existing systems leverage machine learning and deep learning models for predicting Forex
market movements, including models like Support Vector Machines (SVM), Recurrent
Neural Networks (RNN), and LSTM. Reported accuracy for these models varies, generally
ranging from 75% to 87%. However, they often struggle with market volatility and
overfitting, and they adapt poorly to sudden economic events or geopolitical changes.
Moreover, many models rely heavily on historical data, overlooking the nuanced
impact of real-time economic news and institutional market movements.
3. Proposed System
The proposed system takes a more holistic approach to forecasting Gold Futures by
integrating ARIMA and LSTM models with economic news analysis and Smart Money
Concepts. It uses high-impact economic news events and institutional order blocks
to identify key support and resistance levels in market prices.
ARIMA Model:
The ARIMA model is employed to identify linear relationships and short-term trends in the
time series data. It is chosen for its ability to handle non-stationary data by differencing the
series to achieve stationarity. Optimal parameters (p, d, q) are selected using auto_arima and
manual grid search methods to ensure the best fit. The model’s performance is evaluated
using metrics like Mean Squared Error (MSE) and Mean Absolute Error (MAE) to gauge
forecasting accuracy.
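The differencing step mentioned above can be illustrated with a minimal sketch. The price values and the helper names difference and undifference are hypothetical and are not taken from the project code; the sketch only shows what first-order differencing (d = 1) does to a trending series and how it can be reversed.

```python
import numpy as np

def difference(series, d=1):
    """Apply d rounds of first-order differencing: y[t] = x[t] - x[t-1]."""
    out = np.asarray(series, dtype=float)
    for _ in range(d):
        out = np.diff(out)
    return out

def undifference(diffed, first_value):
    """Invert one round of differencing, given the first original value."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffed)))

# Hypothetical closing prices with an upward drift (not project data).
prices = np.array([2300.0, 2310.0, 2325.0, 2333.0, 2350.0])

changes = difference(prices, d=1)            # day-over-day changes
restored = undifference(changes, prices[0])  # back to price levels
```

The differenced series holds daily changes rather than price levels, which removes the drift. Forecasts made on the differenced scale have to be cumulatively summed back to price levels, as undifference does here for a single round of differencing.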
LSTM Model:
The LSTM model is used to capture complex temporal dependencies in the dataset. This
model is designed to overcome the limitations of traditional RNNs by mitigating issues like
vanishing gradients, allowing it to learn long-term dependencies in sequential data. The
LSTM model incorporates the identification of Order Blocks (Buy and Sell zones) to provide
contextual insights into market structure. These order blocks serve as key input features,
enhancing the model's capacity to predict price reactions around institutional trading zones.
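One possible way to present prices and order blocks to an LSTM as sequential input is sketched below. The column layout, the binary buy/sell flags, the lookback of 3 days, and the helper names min_max_scale and make_windows are all assumptions made for illustration; the report does not specify how the order blocks were encoded.

```python
import numpy as np

def min_max_scale(column):
    """Scale a 1-D array to the [0, 1] range."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:
        return np.zeros_like(column)
    return (column - column.min()) / span

def make_windows(features, target, lookback):
    """Slice (time, n_features) data into (samples, lookback, n_features) windows.

    Each window of `lookback` consecutive days is paired with the target
    value of the day that immediately follows it.
    """
    X, y = [], []
    for end in range(lookback, len(features)):
        X.append(features[end - lookback:end])
        y.append(target[end])
    return np.array(X), np.array(y)

# Hypothetical daily closes and order-block flags (1 = day lies in a zone).
close = np.array([2300.0, 2310.0, 2325.0, 2333.0, 2350.0, 2342.0])
buy_block = np.array([0, 1, 0, 0, 0, 0], dtype=float)
sell_block = np.array([0, 0, 0, 0, 1, 0], dtype=float)

scaled_close = min_max_scale(close)
features = np.column_stack([scaled_close, buy_block, sell_block])
X, y = make_windows(features, scaled_close, lookback=3)
```

The resulting X has the (samples, timesteps, features) layout that Keras LSTM layers expect. In a real train/test split, the scaling range would normally be computed on the training portion only, so that test-period information does not leak into the inputs.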
4. Knowledge Gained - Tools, Technology, Courses
The project necessitated the development of a multifaceted skill set, including an in-depth
understanding of deep learning models, especially LSTM, through courses on platforms like
Coursera. Expertise in technical analysis of Forex markets was developed, focusing on
selecting appropriate indicators, recognizing chart patterns, and analyzing trends. Proficiency
in Python programming and in libraries such as TensorFlow, Keras, and statsmodels
was essential for implementing the models. Additionally, knowledge of Smart Money
Concepts (SMC) and economic news analysis was crucial to understand market structures
and the influence of institutional traders on price movements.
5. Architectural Framework
The architectural framework consists of multiple stages:
1. Data Acquisition: Historical gold futures price data is retrieved from Yahoo Finance,
and economic news data is sourced from Forex Factory. Data spans from May 1,
2024, to August 31, 2024, covering daily price movements and significant economic
events.
2. Data Pre-Processing: The data is cleaned and filtered to extract relevant information.
High-impact economic news events are shortlisted, and an impact scale (ranging from
1 to 10) is developed to assess the effect of these news events on market values. The
Savitzky-Golay filter is applied to smooth price data, making trend detection clearer.
The processed news data and the smoothed prices are then merged with the market
price data to form a comprehensive dataset for analysis.
3. Model Implementation: Both ARIMA and LSTM models are implemented on the
merged dataset. The LSTM model is augmented with the integration of order block
theory, providing it with the ability to recognize key institutional zones and predict
price reactions at these levels.
4. Forecasting and Evaluation: Future price movements are forecasted, and the
performance of each model is evaluated using metrics such as MSE and RMSE.
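The evaluation metrics named in this report (MSE, RMSE, and MAE) can be computed as in the following minimal sketch. The actual and predicted values are hypothetical and do not represent results from this study.

```python
import numpy as np

def mse(actual, predicted):
    """Mean Squared Error: average of squared forecast errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of MSE, in price units."""
    return float(np.sqrt(mse(actual, predicted)))

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of forecast errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical prices, for illustration only.
actual = [2300.0, 2310.0, 2320.0]
predicted = [2302.0, 2307.0, 2324.0]

scores = {
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
}
```

RMSE and MAE are expressed in the same units as the price, which makes them easier to interpret than MSE. MSE and RMSE penalize large individual errors more heavily than MAE does.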
6. Implementation Details
Data Acquisition:
Data was obtained from multiple sources:
Yahoo Finance: Provided historical daily price data for Gold Futures (GC=F),
including open, high, low, close, and volume.
Forex Factory: Offered economic news data, which was used to analyze the impact
of macroeconomic events on market prices. News events were pre-processed to filter
out irrelevant information, focusing on high-impact events.
Data Pre-Processing:
Merging Data: The economic news data was merged with the historical price data to
create a unified dataset. This merging was crucial to correlate market movements with
high-impact economic events.
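A minimal sketch of such a merge using pandas is shown below. The column names (date, close, impact, impact_score), the event names, and the rule of keeping the highest score per day are assumptions for illustration; the actual schema of the Forex Factory export and the project's merging logic are not described in this report.

```python
import pandas as pd

# Hypothetical daily prices (not project data).
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-01", "2024-05-02", "2024-05-03"]),
    "close": [2300.0, 2310.0, 2325.0],
})

# Hypothetical news events with an assumed 1-10 impact score.
news = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-01", "2024-05-03", "2024-05-03"]),
    "event": ["Example rate decision", "Example jobs report", "Example survey"],
    "impact": ["High", "High", "Low"],
    "impact_score": [9, 7, 2],
})

# Keep only high-impact events, then reduce to one score per day.
high_impact = news[news["impact"] == "High"]
daily_news = high_impact.groupby("date", as_index=False)["impact_score"].max()

# Left join keeps every trading day; days without news get a score of 0.
merged = prices.merge(daily_news, on="date", how="left")
merged["impact_score"] = merged["impact_score"].fillna(0)
```

A left join on the price table ensures that trading days without any high-impact event remain in the dataset rather than being dropped.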
Savitzky-Golay Filter: Applied to the closing prices to smooth the data and reduce
noise. This made it easier to identify trends, allowing for more accurate peak and
trough detection.
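The smoothing step can be sketched with SciPy's savgol_filter. The synthetic price series, the window length of 11, and the polynomial order of 3 are assumptions chosen for illustration; the settings used in the project are not stated in this report.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic "closing prices": a smooth trend plus random noise (not project data).
rng = np.random.default_rng(0)
t = np.arange(60)
trend = 2300 + 0.5 * t + 20 * np.sin(t / 8)
noisy_close = trend + rng.normal(scale=5.0, size=t.size)

# Fit a cubic polynomial in a sliding 11-point window and take its
# value at the window centre as the smoothed price.
smoothed = savgol_filter(noisy_close, window_length=11, polyorder=3)
```

A longer window or a lower polynomial order smooths more aggressively but can flatten genuine peaks and troughs, so these two settings trade noise reduction against fidelity to turning points.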
Order Block Identification: Daily price data was formatted into a candlestick chart
using the mpl_finance library to display Open, High, Low, and Close (OHLC) prices.
Functions were developed to detect Buy and Sell Order Blocks using fractal patterns.
Overlapping levels were filtered out based on average daily price volatility, ensuring
that only significant order blocks were retained.
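The sketch below shows one common way to detect fractal-based levels and filter overlapping ones. It assumes a five-candle fractal (a low below, or a high above, the two candles on either side) and treats two levels as overlapping when they are closer than the average daily high-low range. The function names and sample candles are hypothetical, and the project's actual detection rules may differ.

```python
import numpy as np

def is_bullish_fractal(lows, i):
    """Low at i is below the two lows on either side (candidate Buy zone)."""
    return lows[i] < min(lows[i - 2], lows[i - 1], lows[i + 1], lows[i + 2])

def is_bearish_fractal(highs, i):
    """High at i is above the two highs on either side (candidate Sell zone)."""
    return highs[i] > max(highs[i - 2], highs[i - 1], highs[i + 1], highs[i + 2])

def find_levels(highs, lows):
    """Return (kind, index, price) tuples for non-overlapping fractal levels."""
    highs = np.asarray(highs, dtype=float)
    lows = np.asarray(lows, dtype=float)
    min_gap = float(np.mean(highs - lows))  # average daily range
    levels = []
    for i in range(2, len(highs) - 2):
        if is_bullish_fractal(lows, i):
            candidate = ("buy", i, float(lows[i]))
        elif is_bearish_fractal(highs, i):
            candidate = ("sell", i, float(highs[i]))
        else:
            continue
        # Keep the level only if it is far enough from every level kept so far.
        if all(abs(candidate[2] - price) > min_gap for _, _, price in levels):
            levels.append(candidate)
    return levels

# Hypothetical candles (not project data).
highs = [12, 11, 10, 11, 13, 15, 13, 12, 11, 12, 13]
lows = [10, 9, 8, 9, 11, 13, 11, 10, 8.5, 10, 11]
levels = find_levels(highs, lows)
```

In this example the second bullish fractal at 8.5 is discarded because it lies within one average daily range of the earlier Buy level at 8.0, leaving one Buy level and one Sell level.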
Model Implementation:
1. ARIMA Model Implementation:
   - Stationarity Testing: The Augmented Dickey-Fuller (ADF) test was used to
     check for stationarity in the price series. Non-stationary data was differenced
     to achieve stationarity, which is a prerequisite for ARIMA modeling.
   - Parameter Selection: auto_arima was employed to automatically identify the
     optimal parameters (p, d, q). Further refinement was done using manual grid
     search to ensure the best fit.
   - Model Fitting and Forecasting: The ARIMA model was fitted to the
     pre-processed time series data. Forecasts were generated, and performance metrics
     such as MSE and MAE were calculated to evaluate accuracy.
2. LSTM Model Implementation:
   - Data Normalization: Historical price data was normalized to ensure efficient
     learning by the LSTM model. The dataset was structured into sequences, with
     the inclusion of identified Order Blocks as additional features.
   - Model Architecture: The LSTM model was constructed with multiple layers,
     including LSTM and Dense layers. The architecture was designed to capture
     both short-term and long-term dependencies in the data, enhanced by the input
     of Order Block zones as support and resistance levels.
   - Training and Tuning: The model was trained on the combined dataset, with
     hyperparameters such as the number of neurons, learning rate, and epochs
     tuned for optimal performance. The presence of Order Blocks provided the
     model with critical insights into market reactions at institutional zones,
     improving its predictive capability.
   - Trend Analysis: Local maxima and minima were identified using the
     find_peaks function to mark potential trend reversals. This was used to
     compare predicted price movements at peaks and troughs with actual market
     behavior, providing a measure of trend accuracy.
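The trend analysis step can be sketched with SciPy's find_peaks, where troughs are found by applying the same function to the negated series. The two series below are hypothetical, and counting exact index matches between actual and predicted peaks is only one assumed way of comparing turning points.

```python
import numpy as np
from scipy.signal import find_peaks

def turning_points(series):
    """Return indices of local maxima (peaks) and local minima (troughs)."""
    series = np.asarray(series, dtype=float)
    peaks, _ = find_peaks(series)
    troughs, _ = find_peaks(-series)  # minima are maxima of the negated series
    return peaks, troughs

# Hypothetical actual and predicted prices (not project data).
actual = np.array([1.0, 3.0, 2.0, 1.0, 2.0, 4.0, 3.0, 2.0, 3.0])
predicted = np.array([1.0, 2.5, 2.2, 1.2, 2.1, 3.6, 3.8, 2.4, 2.9])

actual_peaks, actual_troughs = turning_points(actual)
pred_peaks, pred_troughs = turning_points(predicted)

# Peaks that the prediction places on exactly the same day as the actual series.
matched_peaks = np.intersect1d(actual_peaks, pred_peaks)
```

Here the prediction places its second peak one day late (index 6 instead of 5), so only one of the two actual peaks is matched exactly. A tolerance of a day or two could be allowed when a stricter day-for-day comparison is not required.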
Incorporating Order Block Theory:
Integration into LSTM: The identified Buy and Sell Order Blocks were incorporated
as input features within the LSTM model. This allowed the model to recognize and
react to these institutional zones, enhancing its ability to predict price movements
around these key areas.
Rationale: Incorporating Order Blocks offered an additional layer of market structure
analysis, providing the model with insights into institutional activities. This approach
improved the model’s ability to anticipate price movements, offering a refined
prediction methodology for advanced traders and institutional investors.
7. Results
The integration of economic news analysis and Order Block Theory into the LSTM model
resulted in enhanced forecasting accuracy compared to traditional models. The ARIMA
model effectively captured linear trends, while the LSTM model, with the inclusion of Order
Block zones, demonstrated a higher capability to adapt to market changes induced by
institutional activities and high-impact news events. The combination of these techniques
offered a robust framework for forecasting Gold Futures, aligning with market behaviors
influenced by smart money.
8. Conclusion
The project presents a comprehensive and innovative approach to forecasting Gold Futures
by combining ARIMA and LSTM models with economic news analysis and Smart Money
Concepts. By integrating order block zones and assessing the impact of high-impact news
events, the LSTM model's predictive accuracy was significantly enhanced. This advanced
forecasting framework is particularly beneficial for institutional traders, quantitative
researchers, and advanced retail traders who seek a nuanced understanding of market
dynamics influenced by large market players.
9. Details about the Research Article Submitted or Published
The research work has not yet been submitted for publication. Future plans include
preparing a detailed paper that encompasses the methodologies and findings of this study for
submission to a relevant journal or conference.