* Corresponding Author. Canada CIFAR AI Chair.

Abstract
The integration of Artificial Intelligence (AI) in the financial domain has opened new avenues for quantitative trading, particularly through the use of Large Language Models (LLMs). However, the challenge of effectively synthesizing insights from diverse data sources and integrating both structured and unstructured data persists. This paper presents TradExpert, a novel framework that employs a Mixture of Experts (MoE) approach, using four specialized LLMs, each analyzing distinct sources of financial data, including news articles, market data, alpha factors, and fundamental data. The insights of these expert LLMs are further synthesized by a General Expert LLM to make a final prediction or decision. With specific prompts, TradExpert can be switched between the prediction mode and the ranking mode for stock movement prediction and quantitative stock trading, respectively. In addition to existing benchmarks, we also release a large-scale financial dataset to comprehensively evaluate TradExpert’s effectiveness. Our experimental results demonstrate TradExpert’s superior performance across all trading scenarios.

Introduction
The fusion of artificial intelligence with financial analytics has spawned a new era of innovation, particularly with the infusion of Large Language Models (LLMs) into the realm of finance. These models, which previously excelled in natural language processing (NLP) tasks, are now being tailored to decode the complex and cryptic narratives of financial data. This adaptation is driven by a crucial insight: financial markets are not just number-crunching engines but complicated information systems where the subtleties of news articles, reports, and economic indicators interweave to influence market dynamics.

Before the advent of LLMs, traditional financial models (Zeng et al. 2023; Yang et al. 2020; Liu et al. 2020; Baek and Kim 2018) primarily relied on quantitative methods such as statistical analysis, time series forecasting, and econometric models. These models often struggled to incorporate unstructured data such as news articles or financial reports without manual intervention. As a result, the development of LLMs tailored for financial applications has progressed rapidly. Initial ventures into this domain repurposed general LLMs such as GPTs (Brown 2020; Achiam et al. 2023) and LLaMAs (Touvron et al. 2023a,b; AI@Meta 2024) to interpret financial texts. More specialized language models such as FinBERT (Araci 2019), BloombergGPT (Wu et al. 2023), and FinGPT (Yang, Liu, and Wang 2023) have since emerged, demonstrating enhanced proficiency in understanding and predicting market movements from unstructured data. These models were specifically fine-tuned or pre-trained on large financial corpora. This extensive training on domain-specific datasets has allowed them to better capture typical patterns in financial text. Despite these advancements, the challenge remains to effectively synthesize insights from diverse data sources such as historical stock prices, alpha factors, fundamental data, and news articles. In addition, the integration of the deluge of unstructured financial texts with structured quantitative metrics remains to be investigated with language models.

Figure 1: Illustration of traditional, LLM-based, and MoE LLMs-based financial models with diverse financial data sources.

To this end, we propose the TradExpert framework, which stands at the confluence of these challenges. It leverages a Mixture of Experts (MoE) approach (Eigen, Ranzato, and Sutskever 2013; Du et al. 2022; Shen et al. 2023), involving multiple LLMs each specialized in distinct facets of financial data: news articles, market data, alpha factors, and fundamental data. This not only enhances the model’s ability to process diverse data modalities but also allows for a more nuanced understanding of how different factors interact to influence market trends. Figure 1 illustrates differences among traditional, LLM-based, and MoE LLMs-based financial models. In TradExpert, each expert works with a distinct focus and produces specialized reports, which are finally summarized and analyzed by a general expert, just
like the structured division of labor seen in the real world. Specifically, TradExpert employs specialized LLMs to first independently analyze different data sources, then integrates these analyses via another LLM that synthesizes insights to predict market movements and inform trading strategies. We utilize a reprogramming mechanism to convert time series data into embeddings aligned with LLMs. In addition, we propose two modes for the General Expert LLM, prediction mode and ranking mode, for stock movement prediction and stock trading strategies, respectively. In ranking mode, we let the LLM serve as a comparator within a relaxed sorting algorithm, enabling the selection of the Top-K ranked stocks for trading. To comprehensively evaluate our method, we have also collected a large-scale dataset, which will be publicly released.

Our contributions can be summarized as follows.
• We present TradExpert, a novel framework that employs an MoE approach, integrating four LLMs each specialized to analyze distinct sources of financial data, imitating the structured division of labor seen in the real world.
• We utilize the LLM as a comparator within a relaxed sorting algorithm, which enables trades with the Top-K ranked stocks based on TradExpert’s prediction.
• We release a comprehensive dataset encompassing a wide range of financial data, which serves as a new benchmark for financial analysis.
• Our comprehensive experiments show that TradExpert consistently outperforms state-of-the-art baselines across all trading scenarios. Ablation studies validate the effectiveness of the modules proposed in TradExpert.

Related Work
Financial Language Models have significantly advanced in recent years, blending NLP techniques with financial analytics to extract meaningful insights from vast amounts of unstructured financial data. To begin with, FinBERT (Araci 2019) is a financial domain-specific variant of BERT, pre-trained on a large corpus of financial communications. In 2023, BloombergGPT (Wu et al. 2023) emerged as a 50-billion-parameter model trained on a vast financial dataset. FLANG (Shah et al. 2022) introduced a financial language model with specialized masking and objectives. Astock (Zou et al. 2022) provided a platform for studying NLP-aided stock auto-trading algorithms on the Chinese market. BBT-FinT5 (Lu et al. 2023) advanced Chinese financial NLP with a large-scale pre-trained model. FinMA (Xie et al. 2023) showcased a model fine-tuned on a multi-task instruction dataset. FinGPT (Yang, Liu, and Wang 2023) provided an open-source framework for financial LLMs. InvestLM (Yang, Tang, and Tam 2023) showed the effectiveness of instruction tuning for investment-related tasks. FinReport (Li et al. 2024b) introduced a system for automatic financial report generation. Lastly, AlphaFin (Li et al. 2024a) integrated retrieval-augmented generation techniques for financial analysis. Collectively, these works demonstrate the evolution of financial NLP models and benchmarks, advancing the capabilities of LLMs in financial applications.

Integration of Text and Financial Data has also been rapidly developed for stock movement prediction. StockNet (Xu and Cohen 2018) developed a deep generative model that jointly exploits text and price signals for stock movement prediction. SLOT (Soun et al. 2022) improved upon this by using self-supervised learning to handle sparse and noisy tweet data, capturing multi-level price trends. CH-RNN (Wu et al. 2018) introduced a hybrid deep sequential modeling approach that leverages social text for stock prediction, incorporating cross-modal attention mechanisms. More recently, studies (Lopez-Lira and Tang 2023; Chen et al. 2023) have explored the use of ChatGPT for stock movement prediction, comparing its performance with traditional state-of-the-art models. These works collectively demonstrate the increasing sophistication of models that integrate text and financial data, highlighting the potential for improving trading scenarios.

Problem Definition
In this study, we aim to trade stocks using a framework that incorporates Large Language Models (LLMs). The input data comprises four primary components:
• News: Textual information from news articles pertinent to the stock and market conditions.
• Market Data: Historical OHLCV (Open, High, Low, Close, Volume) data representing the stock’s trading activity.
• Alpha Factors: Quantitative indicators and signals believed to possess predictive power regarding stock price movements.
• Fundamental Data: Earnings call transcripts and fundamental metrics reflecting the company’s economic health and performance.

Task 1: Stock movement prediction is a fundamental challenge in quantitative trading, which involves the prediction of future price trends based on multifaceted data sources. Formally, let $D = \{(x_i, y_i)\}_{i=1}^{N}$ denote our dataset, where $x_i$ represents the input vector for the $i$-th stock on day $t$, and $y_i \in \{\mathrm{Rise}, \mathrm{Fall}\}$ is the corresponding label indicating whether the stock price will rise or fall on day $t + 1$. The input $x_i$ can be expressed as:

$$x_i = \{\mathrm{News}_i, \mathrm{Market}_i, \mathrm{Factors}_i, \mathrm{Fundamental}_i\} \tag{1}$$

Our objective is to learn a predictive function $f$ parameterized by $\theta$ such that $f_\theta(x_i) \approx y_i$, where $f_\theta$ is modeled using LLMs. The model outputs a binary prediction, “Rise” or “Fall”, indicating the predicted stock price movement.

Task 2: Stock trading simulation involves evaluating the performance of a Buy-and-Hold strategy based on the Top-K ranked stocks sorted by TradExpert. This task simulates real-market trading scenarios to assess the profitability and risk of TradExpert using metrics including Annualized Return (AR), Sharpe Ratio (SR), Annualized Volatility (AV), and Maximum Drawdown (MD).
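
The four trading metrics named in Task 2 are not defined formally in this section. As a minimal sketch, the code below computes them from a series of daily simple returns under common conventions that are assumptions here, not taken from the paper: 252 trading days per year, a zero risk-free rate by default, and drawdown reported as a positive fraction. All function names are illustrative.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_return(daily_returns):
    """Geometric annualized return (AR) from daily simple returns."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns scaled to one year (AV)."""
    return stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe ratio (SR); zero risk-free rate is an assumption."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


def maximum_drawdown(daily_returns):
    """Largest peak-to-trough loss of the wealth curve (MD), as a fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst
```

In a simulation of this kind, `daily_returns` would be the day-by-day returns of the portfolio formed from the Top-K ranked stocks; higher AR and SR and lower AV and MD indicate a better strategy.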
Figure 2: TradExpert operates by processing distinct sources of financial data such as news texts, market data, alpha factors,
and fundamental data through specialized expert LLMs. Then their reports are summarized and sent to a General Expert which
delivers the final outputs: (1) prediction of stock movement with prediction mode, (2) which of the two stocks is better or worse
with ranking mode.
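
The ranking-mode output in Figure 2 is a pairwise verdict, which the General Expert section turns into a Top-K list by comparing every pair of stocks and counting wins. The sketch below illustrates that aggregation step only. The callable `is_better` is a hypothetical stand-in for the General Expert LLM, and comparing each unordered pair exactly once is an assumption; the toy tickers and scores are made up for illustration.

```python
from itertools import combinations


def rank_top_k(stocks, is_better, k):
    """Return the k stocks with the most pairwise wins.

    `is_better(a, b)` is a hypothetical stand-in for the General Expert LLM
    in ranking mode: it returns True when stock `a` is judged likely to
    outperform stock `b`.
    """
    wins = {s: 0 for s in stocks}
    # Compare every unordered pair once: N * (N - 1) / 2 comparator calls.
    for a, b in combinations(stocks, 2):
        winner = a if is_better(a, b) else b
        wins[winner] += 1
    # Sort by win count, highest first; ties keep the input order.
    ranked = sorted(stocks, key=lambda s: wins[s], reverse=True)
    return ranked[:k]


# Toy comparator driven by made-up scores, standing in for LLM calls.
toy_scores = {"AAA": 0.3, "BBB": 0.9, "CCC": 0.1, "DDD": 0.6}
top2 = rank_top_k(list(toy_scores), lambda a, b: toy_scores[a] > toy_scores[b], k=2)
print(top2)
```

Because only win counts are sorted, the procedure still returns a ranking when the comparator is non-transitive (for example, A beats B, B beats C, and C beats A), a case in which a sort that assumes a consistent ordering can behave unpredictably.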
News Analyst

Figure 4: Instruction and prompt for the Market Analyst.

General Expert
The General Expert LLM can operate in two distinct modes: prediction mode and ranking mode. Both modes begin by summarizing the reports (historical conversation including instructions, prompts, and responses) from the four specialized experts due to the limitations on input context length of the backbone LLM.

In prediction mode, used for stock movement prediction, the summarized reports are used to construct a prompt with a prediction prefix. Given the summarized reports, the General Expert LLM outputs a binary prediction indicating whether the stock will rise or fall.

In ranking mode, used for stock trading, the General Expert LLM functions as a comparator to establish the ranking ability. Specifically, given the summarized reports of two stocks, the General Expert LLM determines which stock is likely to perform better in the future. To generate a Top-K ranking of stocks, we employ a relaxed comparison-based sorting similar to BubbleSort: we first compare every pair of stocks and count the number of wins for each stock, then sort these counts to establish the rankings. Although algorithms like QuickSort and vanilla BubbleSort need fewer comparisons for Top-K selection, $O(N \log N)$ on average and $O(N \cdot K)$ respectively, we use this relaxed comparison-based sorting algorithm with $O(N^2)$ comparisons because the LLM-based comparator is non-transitive (Liu et al. 2024): individual verdicts can contradict one another, so more comparisons tend to yield more accurate rankings in practice.

The General Expert LLM is finetuned on both tasks of stock movement prediction and stock comparison simultaneously. The instructions and prompts are shown in Figure 5.

Experiments
In this section, we conduct a comprehensive evaluation of the TradExpert framework on two main tasks: stock movement prediction and stock trading simulation. Our experiments

Experimental Setup
In our experiments, the four expert LLMs and the General Expert LLM are built on the LLaMA-2-7B backbone model (Touvron et al. 2023b) and are finetuned via the LoRA (Hu et al. 2022) mechanism.

Stock Movement Prediction: TradExpert works in prediction mode, that is, the General Expert LLM responds with a binary prediction indicating whether a stock will rise or fall the next day. Methods are evaluated using binary classification metrics such as accuracy (Acc) and Matthews Correlation Coefficient (MCC).

Stock Trading Simulation: TradExpert works in ranking mode, that is, the General Expert LLM acts as a comparator to sort the stocks. We simulate the real profit and risk of TradExpert by executing trades based on the Top-K ranked stocks. TradExpert and baselines are evaluated using metrics including Annualized Return (AR), Sharpe Ratio (SR), Annualized Volatility (AV), and Maximum Drawdown (MD).

Baselines
For stock movement prediction, the baselines include: (1) Hybrid Models: StockNet (Xu and Cohen 2018), ALSTM-W (Soun et al. 2022), ALSTM-D (Soun et al. 2022), SLOT (Soun et al. 2022). (2) Large Language Models: GPT-4 (Achiam et al. 2023), Gemini (Team et al. 2023), LLaMA2-70B (Touvron et al. 2023b), LLaMA3-8B (AI@Meta 2024), FinMA-7B (Xie et al. 2023), FinGPT-LlaMA2-7B (Yang, Liu, and Wang 2023), InternLM-7B (Cai et al. 2024), Falcon-7B (Almazrouei et al. 2023), Mixtral-7B (Jiang et al. 2023).

For stock trading simulation, the baselines include: (1) Traditional Models: Random Forest (Breiman 2001), Decision Tree (Loh 2011), SVM (Cortes and Vapnik 1995). (2) Deep Learning Models: A2C (Mnih et al. 2016), PPO (Schulman et al. 2017), SARL (Ye et al. 2020), EIIE (Jiang, Xu, and Liang 2017), and DeepTrader (Wang
Table 2: Comparison results on stock movement prediction task. As a binary classification problem, methods are evaluated by
Accuracy (Acc) and Matthews Correlation Coefficient (MCC). Both metrics are better with higher values. The best and second
best results are in bold and underlined, respectively.
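
For reference, the two metrics in this caption can be computed from the confusion counts of the binary Rise/Fall predictions. The sketch below uses the standard definitions of accuracy and MCC; treating “Rise” as the positive class and returning an MCC of 0 when the denominator vanishes are conventions assumed here, and the function name is illustrative.

```python
import math


def accuracy_and_mcc(y_true, y_pred, positive="Rise"):
    """Return (Acc, MCC) for binary labels such as "Rise" / "Fall"."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    acc = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

Unlike accuracy, MCC stays near zero for a model that always predicts the majority class, which is why it is commonly reported alongside Acc when the Rise and Fall labels are imbalanced.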