
TradExpert: Revolutionizing Trading with Mixture of Expert LLMs

Qianggang Ding¹,², Haochen Shi¹,², Bang Liu¹,²,*

¹ DIRO & Institut Courtois, Université de Montréal, QC, Canada
² Mila - Quebec AI Institute, QC, Canada
{qianggang.ding, haochen.shi, bang.liu}@umontreal.ca
arXiv:2411.00782v1 [cs.AI] 16 Oct 2024

* Corresponding Author. Canada CIFAR AI Chair.

Abstract
The integration of Artificial Intelligence (AI) in the financial domain has opened new avenues for quantitative trading, particularly through the use of Large Language Models (LLMs). However, the challenge of effectively synthesizing insights from diverse data sources and integrating both structured and unstructured data persists. This paper presents TradExpert, a novel framework that employs a Mixture of Experts (MoE) approach, using four specialized LLMs, each analyzing a distinct source of financial data: news articles, market data, alpha factors, and fundamental data. The insights of these expert LLMs are further synthesized by a General Expert LLM to make a final prediction or decision. With specific prompts, TradExpert can be switched between prediction mode and ranking mode, for stock movement prediction and quantitative stock trading, respectively. In addition to existing benchmarks, we also release a large-scale financial dataset to comprehensively evaluate TradExpert’s effectiveness. Our experimental results demonstrate TradExpert’s superior performance across all trading scenarios.

Figure 1: Illustration of traditional, LLM-based, and MoE LLMs-based financial models with diverse financial data sources.

Introduction
The fusion of artificial intelligence with financial analytics has spawned a new era of innovation, particularly with the infusion of Large Language Models (LLMs) into the realm of finance. These models, which have long excelled in natural language processing (NLP) tasks, are now being tailored to decode the complex and cryptic narratives of financial data. This adaptation is driven by a crucial insight: financial markets are not just number-crunching engines but complicated information systems where the subtleties of news articles, reports, and economic indicators interweave to influence market dynamics.

Before the advent of LLMs, traditional financial models (Zeng et al. 2023; Yang et al. 2020; Liu et al. 2020; Baek and Kim 2018) primarily relied on quantitative methods such as statistical analysis, time series forecasting, and econometric models. These models often struggled to incorporate unstructured data such as news articles or financial reports without manual intervention. As a result, the development of LLMs tailored for financial applications has progressed rapidly. Initial ventures into this domain repurposed general LLMs such as GPTs (Brown 2020; Achiam et al. 2023) and LLaMAs (Touvron et al. 2023a,b; AI@Meta 2024) to interpret financial texts. However, more specialized language models such as FinBERT (Araci 2019), BloombergGPT (Wu et al. 2023), and FinGPT (Yang, Liu, and Wang 2023) have since evolved, demonstrating enhanced proficiency in understanding and predicting market movements from unstructured data. These models were specifically fine-tuned or pre-trained on large financial corpora, and this extensive training on domain-specific datasets has allowed them to better capture typical patterns in financial text. Despite these advancements, the challenge remains to effectively synthesize insights from diverse data sources such as historical stock prices, alpha factors, fundamental data, and news articles. In addition, the integration of the deluge of unstructured financial texts with structured quantitative metrics remains to be investigated with language models.

To this end, we propose the TradExpert framework, which stands at the confluence of these challenges. It leverages a Mixture of Experts (MoE) approach (Eigen, Ranzato, and Sutskever 2013; Du et al. 2022; Shen et al. 2023), involving multiple LLMs each specialized in a distinct facet of financial data: news articles, market data, alpha factors, and fundamental data. This not only enhances the model’s ability to process diverse data modalities but also allows for a more nuanced understanding of how different factors interact to influence market trends. Figure 1 illustrates the differences among traditional, LLM-based, and MoE LLMs-based financial models. In TradExpert, each expert works with a distinct focus and produces specialized reports, which are finally summarized and analyzed by a general expert, just
like the structured division of labor seen in the real world. Specifically, TradExpert employs specialized LLMs to first independently analyze different data sources, then integrates these analyses via another LLM that synthesizes insights to predict market movements and inform trading strategies. We utilize a reprogramming mechanism to convert time series data into embeddings aligned with LLMs. In addition, we propose two modes for the General Expert LLM, prediction mode and ranking mode, for stock movement prediction and stock trading strategies, respectively. In ranking mode, we let the LLM serve as a comparator within a relaxed sorting algorithm, enabling the selection of the Top-K ranked stocks for trading. To comprehensively evaluate our method, we have also collected a large-scale dataset, which will be publicly released.

Our contributions can be summarized as follows.
• We present TradExpert, a novel framework that employs an MoE approach, integrating four LLMs each specialized to analyze a distinct source of financial data, imitating the structured division of labor seen in the real world.
• We utilize the LLM as a comparator within a relaxed sorting algorithm, which enables trades with the Top-K ranked stocks based on TradExpert’s prediction.
• We release a comprehensive dataset encompassing a wide range of financial data, which serves as a new benchmark for financial analysis.
• Our comprehensive experiments show that TradExpert consistently outperforms state-of-the-art baselines across all trading scenarios. Ablation studies validate the effectiveness of the modules proposed in TradExpert.

Related Work
Financial Language Models have significantly advanced in recent years, blending NLP techniques with financial analytics to extract meaningful insights from vast amounts of unstructured financial data. To begin with, FinBERT (Araci 2019) is a financial domain-specific variant of BERT, pre-trained on a large corpus of financial communications. In 2023, BloombergGPT (Wu et al. 2023) emerged as a 50-billion-parameter model trained on a vast financial dataset. FLANG (Shah et al. 2022) introduced a financial language model with specialized masking and objectives. Astock (Zou et al. 2022) provided a platform for studying NLP-aided stock auto-trading algorithms on the Chinese market. BBT-FinT5 (Lu et al. 2023) advanced Chinese financial NLP with a large-scale pre-trained model. FinMA (Xie et al. 2023) showcased a model fine-tuned on a multi-task instruction dataset. FinGPT (Yang, Liu, and Wang 2023) provided an open-source framework for financial LLMs. InvestLM (Yang, Tang, and Tam 2023) showed the effectiveness of instruction tuning for investment-related tasks. FinReport (Li et al. 2024b) introduced a system for automatic financial report generation. Lastly, AlphaFin (Li et al. 2024a) integrated retrieval-augmented generation techniques for financial analysis. Collectively, these works demonstrate the evolution of financial NLP models and benchmarks, advancing the capabilities of LLMs in financial applications.

Integration of Text and Financial Data has also been rapidly developed for stock movement prediction. StockNet (Xu and Cohen 2018) developed a deep generative model that jointly exploits text and price signals for stock movement prediction. SLOT (Soun et al. 2022) improved upon this by using self-supervised learning to handle sparse and noisy tweet data, capturing multi-level price trends. CH-RNN (Wu et al. 2018) introduced a hybrid deep sequential modeling approach that leverages social text for stock prediction, incorporating cross-modal attention mechanisms. More recently, studies (Lopez-Lira and Tang 2023; Chen et al. 2023) have explored the use of ChatGPT for stock movement prediction, comparing its performance with traditional state-of-the-art models. These works collectively demonstrate the increasing sophistication of models that integrate text and financial data, highlighting the potential for improving trading scenarios.

Problem Definition
In this study, we aim to trade stocks using a framework that incorporates Large Language Models (LLMs). The input data comprises four primary components:
• News: Textual information from news articles pertinent to the stock and market conditions.
• Market Data: Historical OHLCV (Open, High, Low, Close, Volume) data representing the stock’s trading activity.
• Alpha Factors: Quantitative indicators and signals believed to possess predictive power regarding stock price movements.
• Fundamental Data: Earnings call transcripts and fundamental metrics reflecting the company’s economic health and performance.

Task 1: Stock movement prediction is a fundamental challenge in quantitative trading, which involves the prediction of future price trends based on multifaceted data sources. Formally, let $D = \{(x_i, y_i)\}_{i=1}^{N}$ denote our dataset, where $x_i$ represents the input vector for the $i$-th stock on day $t$, and $y_i \in \{\text{Rise}, \text{Fall}\}$ is the corresponding label indicating whether the stock price will rise or fall on day $t + 1$. The input $x_i$ can be expressed as:

$x_i = \{\text{News}_i, \text{Market}_i, \text{Factors}_i, \text{Fundamental}_i\}$    (1)

Our objective is to learn a predictive function $f$ parameterized by $\theta$ such that $f_\theta(x_i) \approx y_i$, where $f_\theta$ is modeled using LLMs. The model outputs a binary prediction, “Rise” or “Fall”, indicating the predicted stock price movement.

Task 2: Stock trading simulation involves evaluating the performance of a Buy-and-Hold strategy based on the Top-K ranked stocks sorted by TradExpert. This task simulates real-market trading scenarios to assess the profitability and risk of TradExpert using metrics including Annualized Return (AR), Sharpe Ratio (SR), Annualized Volatility (AV), and Maximum Drawdown (MD).
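The four trading metrics named in Task 2 can be sketched as follows. The paper does not spell out its formulas, so this is a minimal illustration under stated assumptions: daily simple returns as input, 252 trading days per year, geometric annualization, and a zero risk-free rate. All function names are hypothetical.

```python
import math

TRADING_DAYS = 252  # assumption: trading days per year


def annualized_return(daily_returns):
    """Geometric annualization of compounded daily simple returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled by sqrt(252)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return math.sqrt(var) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns, risk_free_rate=0.0):
    """Annualized excess return per unit of annualized volatility."""
    excess = annualized_return(daily_returns) - risk_free_rate
    return excess / annualized_volatility(daily_returns)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the cumulative wealth curve."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst
```

With a zero risk-free rate the Sharpe ratio reduces to AR divided by AV, which appears consistent with the figures reported in Table 3 (for example, 13.92% / 11.41% ≈ 1.22 for the DJI Index).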
Figure 2: TradExpert processes distinct sources of financial data (news texts, market data, alpha factors, and fundamental data) through specialized expert LLMs. Their reports are then summarized and sent to a General Expert, which delivers the final output: (1) in prediction mode, a prediction of stock movement; (2) in ranking mode, which of two stocks is better or worse.

Datasets
In this study, we collected a comprehensive dataset encompassing four primary components: News, Market Data, Alpha Factors, and Fundamental Data. The period covered by all data sources spans 4 years, from January 1, 2020, to December 31, 2023.

Statistics
News is collected from several reputable financial news sources, including Yahoo Finance, Reuters, InvestorPlace, GlobeNewswire, The Motley Fool, etc. This dataset comprises a total of 524,995 news articles for stocks on the S&P 500 list, with an average length of 596.4 words per article. Each news article is associated with a list of related stock tickers.

Market Data consists of historical daily OHLCV records for stocks on the S&P 500 list. This dataset includes a total of 481,484 records, offering a detailed view of the stocks’ trading activity over the specified period.

Alpha Factors incorporates 108 technical indicators and factors with their expressions, which are believed to possess predictive power regarding stock price movements.

Fundamental Data includes earnings call transcripts, financial statements, and fundamental metrics. The earnings call transcripts are sourced from Seeking Alpha, with 16 transcripts (4 years, updated quarterly) available for each stock. Fundamental metrics include Earnings Per Share (EPS), Price-to-Earnings Ratio (P/E Ratio), Book Value Per Share (BVPS), etc.

Data Split
The datasets were split into training, validation, and test sets in chronological order to ensure that future data remains unseen during training. The split was performed as follows: Training set: January 1, 2020, to June 30, 2022. Validation set: July 1, 2022, to December 31, 2022. Testing set: January 1, 2023, to December 31, 2023.

Table 1: Components in each data source. † and ‡ denote content generated by GPT-4 and by external models, respectively.

Data source   Instructions and Prompts                            Responses
News          News articles                                       Movement, Reasoning†
Market        OHLCV embeddings, Statistics                        Movement
Alpha         Expressions, Descriptions†, Comprehensive score‡    Movement
Fundamental   Earnings call scripts, Fundamental metrics          Movement, Reasoning†

Methodology
In this study, we propose TradExpert, a novel framework leveraging the MoE LLMs approach, where four LLMs serve as specialized experts for distinct sources of financial data. A General Expert LLM then synthesizes the summaries of the four Expert LLMs to produce the final output. The pipeline of TradExpert is shown in Figure 2.

In TradExpert, all expert LLMs are built on the LLaMA-2-7B backbone LLM (Touvron et al. 2023b) and are supervised fine-tuned using the LoRA mechanism (Hu et al. 2022). Before training and fine-tuning, we preprocess the raw datasets to construct prompts, instructions, and ground-truth responses for each LLM. An overall description of the preprocessed datasets is given in Table 1. The details are introduced in the following.
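As a rough illustration of the preprocessing step, the sketch below assembles one instruction / prompt / response record for the News Analyst, using the template wording of Figure 3. The record layout, the function names, and the labelling rule (the later close strictly above the current close counts as "Rise") are assumptions made for this example; in the paper the ground-truth reasoning is pre-generated by GPT-4, so it is simply passed in as a string here.

```python
NEWS_INSTRUCTION = (
    "You are provided with a news article. Please predict how the stock will "
    "perform in the next {horizon} days. Your response should include your "
    'reasoning followed by a prediction of "Rise" or "Fall" in the specified '
    "format.\n"
    "Format your response as follows: Reasoning: [Your reasoning here] "
    "Prediction: [Rise or Fall]"
)
NEWS_PROMPT = (
    "News Article: {article}\n"
    "Question: Given the information in the news article above, how is the "
    "stock expected to perform in the next {horizon} days?"
)


def movement_label(close_now, close_later):
    """Assumed labelling rule: 'Rise' only if the later close is strictly higher."""
    return "Rise" if close_later > close_now else "Fall"


def build_news_record(article, close_now, close_later, reasoning, horizon=1):
    """Assemble one instruction / prompt / response training record."""
    label = movement_label(close_now, close_later)
    return {
        "instruction": NEWS_INSTRUCTION.format(horizon=horizon),
        "prompt": NEWS_PROMPT.format(article=article, horizon=horizon),
        "response": f"Reasoning: {reasoning} Prediction: {label}",
    }


# Hypothetical example inputs; not taken from the released dataset.
record = build_news_record(
    article="ACME raises full-year guidance after a strong quarter.",
    close_now=100.0,
    close_later=103.5,
    reasoning="Raised guidance signals stronger expected earnings.",
)
```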
Instruction: You are provided with a news article. Please predict how the stock will perform in the next <D> days. Your response should include your reasoning followed by a prediction of “Rise” or “Fall” in the specified format.
Format your response as follows: Reasoning: [Your reasoning here] Prediction: [Rise or Fall]

Prompt: News Article: [Insert news article text here]
Question: Given the information in the news article above, how is the stock expected to perform in the next <D> days?

Figure 3: Instruction and prompt for the News Analyst.

News Analyst
The News Analyst LLM is designed to analyze the text of news articles to predict stock movements. The prompt and instruction for fine-tuning the LLM are shown in Figure 3. The outputs from the News Analyst LLM include not only a prediction of the stock movement but also the reasoning for how the news article relates to the predicted movement, following a Chain-of-Thought (CoT) (Wei et al. 2022) reasoning approach. The ground-truth reasoning is pre-generated by the OpenAI GPT-4 API using instructions and prompts that incorporate the actual stock movements and the texts of news articles.

Market Analyst
The Market Analyst LLM focuses on analyzing historical OHLCV (Open, High, Low, Close, Volume) data to predict stock movements. However, time series data is inherently continuous and lacks the discrete token structure that LLMs are designed to process. This misalignment poses a significant challenge in effectively utilizing LLMs on time series. To this end, we utilize a reprogramming mechanism (Jin et al. 2024) to reprogram the input financial time series into text prototype representations.

Formally, let an OHLCV data instance be $X^{(i)} \in \mathbb{R}^{N \times T}$, which consists of $N$ variables across $T$ time steps. $X^{(i)}$ is first divided and embedded into a sequence of patch embeddings $X_P^{(i)} \in \mathbb{R}^{N \times L_P \times d_m}$, where $L_P$ and $d_m$ are the number of patches and the patch embedding dimension, respectively. The patches are then reprogrammed using a collection of text prototypes $E' \in \mathbb{R}^{V' \times D}$, which is obtained by linearly probing the LLM’s pre-trained word embedding $E \in \mathbb{R}^{V \times D}$, where $V$ and $V'$ are the sizes of the LLM vocabulary and of the text prototype set ($V' \ll V$), and $D$ is the embedding dimension. The reprogrammed patches are generated using a multi-head cross-attention mechanism:

$Z_k^{(i)} = \mathrm{Softmax}\left(\frac{Q_k^{(i)} K_k^{(i)\top}}{\sqrt{d_k}}\right) V_k^{(i)},$

where query $Q_k^{(i)} = X_P^{(i)} W_k^Q$, key $K_k^{(i)} = E' W_k^K$, and value $V_k^{(i)} = E' W_k^V$ for each head $k$. The reprogrammed embeddings $O^{(i)}$ are obtained by aggregating the outputs from each attention head and projecting them to the hidden dimension of the backbone LLM. Finally, the reprogrammed embeddings are augmented with a language description of statistics extracted with TSFresh (Christ et al. 2018), serving as prompts for the Market Analyst. An example of instruction and prompt is shown in Figure 4.

Instruction: You are provided with historical OHLCV data of the past 20 days and a description of its statistics. Please predict how the stock will perform the next <D> day. Your response should be “Rise” or “Fall”.

Prompt: <Embeddings of reprogrammed OHLCV>
Statistics: The historical prices have a minimum close of <min_val> <min_D> days ago, a maximum close of <max_val> <max_D> days ago, and a median close of <median_val> <median_D> days ago. The overall trend is <upward or downward>...
Question: Given the reprogrammed OHLCV data and its statistics, how is the stock expected to perform in the next <D> days?

Figure 4: Instruction and prompt for the Market Analyst.

Alpha Expert
The Alpha Expert specializes in processing expression-based alpha factors, which are technical indicators and algorithm-generated factors believed to possess predictive power regarding stock price movements.

We leverage GPT-4’s capability of understanding complex expressions to pre-generate a language description for each factor. In this way, we built our Alpha database, where an alpha record consists of:
• Expression: The mathematical or logical formula used to compute the alpha factor based on OHLCV data. E.g. rank(ts_argmax(corr(ts_rank(close, 10), ts_rank(volume, 10), 10), 5))
• Description: Generated by GPT-4 with prompts that include the expression.

For each stock, we first calculate the values of all alpha factors based on OHLCV data and then derive a comprehensive score via a LightGBM-based model (Ke et al. 2017). Subsequently, we select the Top-K alphas that contribute most significantly to this comprehensive score. Descriptions of these Top-K alphas are retrieved from the database and, along with the calculated values, are included in the prompts and instructions for the Alpha Expert.

Fundamental Analyst
The Fundamental Analyst LLM specializes in analyzing fundamental data, including earnings call transcripts and financial metrics, to predict stock price movements on a quarterly basis. The procedure of the Fundamental Analyst LLM is similar to that of the News Analyst LLM, the key difference being that the fundamental data is updated quarterly and, therefore, the movement predictions are made for the next quarter. The response should include a prediction in one of the following five categories: “Strong Rise”, “Moderate Rise”, “No Change”, “Moderate Fall”, or “Strong Fall”, followed by the reasoning.
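The cross-attention reprogramming used by the Market Analyst can be sketched for a single head as below. This is a toy illustration rather than the authors' implementation: it uses plain Python lists, random matrices in place of learned weights and of the text prototypes, one variable's patch sequence, and small dimensions chosen only for readability. All names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def reprogram_head(patches, prototypes, w_q, w_k, w_v):
    """Single-head cross-attention: patches query the text prototypes."""
    q = matmul(patches, w_q)       # shape (L_P, d_k)
    k = matmul(prototypes, w_k)    # shape (V', d_k)
    v = matmul(prototypes, w_v)    # shape (V', d_k)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)        # shape (L_P, V')
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Random stand-in for learned parameters or embeddings."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
L_P, d_m, V_proto, D, d_k = 4, 8, 6, 16, 5    # toy sizes, for illustration only
patches = random_matrix(L_P, d_m, rng)        # patch embeddings of one variable
prototypes = random_matrix(V_proto, D, rng)   # stand-in for E'
z, attn = reprogram_head(
    patches,
    prototypes,
    random_matrix(d_m, d_k, rng),             # W^Q
    random_matrix(D, d_k, rng),               # W^K
    random_matrix(D, d_k, rng),               # W^V
)
```

Each row of `attn` is a distribution over the text prototypes, so every patch is re-expressed as a weighted mix of prototype values. A full version would repeat this per head and per variable, concatenate the heads, and project to the backbone's hidden size.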
Instruction: You are provided with a summarized report of the stock. Please predict whether the stock will rise or fall the next <D> day.
Format your response as follows: Reasoning: [Your reasoning here] Prediction: [Rise or Fall].

Prompt: Summarized Report: [Insert summarized report here]
Question: Based on the summarized report, will the stock rise or fall in the next <D> days?

Instruction: You are provided with summarized reports of two stocks. Please determine which stock will perform better the next <D> day. Please output Stock AAA or Stock BBB.

Prompt: Summarized Report for Stock AAA: [Report A]
Summarized Report for Stock BBB: [Report B]
Question: Based on the summarized reports, which stock will perform better in the next <D> days?

Figure 5: Instructions and prompts for the General Expert LLM: (Top) Prediction mode, (Bottom) Ranking mode.

General Expert
The General Expert LLM can operate in two distinct modes: prediction mode and ranking mode. Because the input context length of the backbone LLM is limited, both modes begin by summarizing the reports (the historical conversation including instructions, prompts, and responses) from the four specialized experts.

In prediction mode, used for stock movement prediction, the summarized reports are used to construct a prompt with a prediction prefix. Given the summarized reports, the General Expert LLM outputs a binary prediction indicating whether the stock will rise or fall.

In ranking mode, used for stock trading, the General Expert LLM functions as a comparator to establish the ranking ability. Specifically, given the summarized reports of two stocks, the General Expert LLM determines which stock is likely to perform better in the future. To generate a Top-K ranking of stocks, we employ a relaxed comparison-based sorting similar to BubbleSort: we first compare every pair of stocks and count the number of wins for each stock, and then sort these counts to establish the ranking. Algorithms like QuickSort and vanilla BubbleSort require fewer comparisons for Top-K selection, $O(N \log N)$ on average and $O(N \cdot K)$, respectively. We nevertheless use the relaxed comparison-based sorting algorithm with $O(N^2)$ comparisons because the LLM-based comparator is non-transitive (Liu et al. 2024), so more comparisons tend to yield more accurate rankings in practice.

The General Expert LLM is fine-tuned on both tasks, stock movement prediction and stock comparison, simultaneously. The instructions and prompts are shown in Figure 5.

Experiments
In this section, we conduct a comprehensive evaluation of the TradExpert framework on two main tasks: stock movement prediction and stock trading simulation. Our experiments aim to address the following research questions: RQ1: How does TradExpert perform in stock movement prediction compared with state-of-the-art baselines? RQ2: What are the potential profits and associated risks of TradExpert in backtesting on the real market? RQ3: How effective is the reasoning capability of TradExpert for unstructured data? RQ4: What is the significance of each expert within the TradExpert framework? RQ5: Why do we choose the relaxed comparison-based sorting algorithm in TradExpert?

Datasets
We include two categories of datasets in our experiments:
• Benchmark Datasets: We use publicly available benchmark datasets in stock movement prediction research, including the CIKM18 (Wu et al. 2018), ACL18 (Xu and Cohen 2018), and BigData22 (Soun et al. 2022) datasets.
• Proprietary Datasets: We also utilize our proprietary datasets, which include extensive historical OHLCV data, news articles, alpha factors, and fundamental metrics for a comprehensive analysis.

Experimental Setup
In our experiments, the four expert LLMs and the General Expert LLM are built on the LLaMA-2-7B backbone model (Touvron et al. 2023b) and are fine-tuned via the LoRA mechanism (Hu et al. 2022).

Stock Movement Prediction: TradExpert works in prediction mode, that is, the General Expert LLM outputs a binary prediction indicating whether a stock will rise or fall the next day. Methods are evaluated using binary classification metrics, namely accuracy (Acc) and Matthews Correlation Coefficient (MCC).

Stock Trading Simulation: TradExpert works in ranking mode, that is, the General Expert LLM acts as a comparator to sort the stocks. We simulate the real profit and risk of TradExpert by executing trades based on the Top-K ranked stocks. TradExpert and the baselines are evaluated using metrics including Annualized Return (AR), Sharpe Ratio (SR), Annualized Volatility (AV), and Maximum Drawdown (MD).

Baselines
For stock movement prediction, the baselines include: (1) Hybrid Models: StockNet (Xu and Cohen 2018), ALSTM-W (Soun et al. 2022), ALSTM-D (Soun et al. 2022), SLOT (Soun et al. 2022). (2) Large Language Models: GPT-4 (Achiam et al. 2023), Gemini (Team et al. 2023), LLaMA2-70B (Touvron et al. 2023b), LLaMA3-8B (AI@Meta 2024), FinMA-7B (Xie et al. 2023), FinGPT-LLaMA2-7B (Yang, Liu, and Wang 2023), InternLM-7B (Cai et al. 2024), Falcon-7B (Almazrouei et al. 2023), Mixtral-7B (Jiang et al. 2023).

For stock trading simulation, the baselines include: (1) Traditional Models: Random Forest (Breiman 2001), Decision Tree (Loh 2011), SVM (Cortes and Vapnik 1995). (2) Deep Learning Models: A2C (Mnih et al. 2016), PPO (Schulman et al. 2017), SARL (Ye et al. 2020), EIIE (Jiang, Xu, and Liang 2017), and DeepTrader (Wang et al. 2021). To reduce computational costs in backtesting, we evaluated all methods on datasets with stocks on the DOW 30 list, a subset of the S&P 500, with around 30 stocks.

Table 2: Comparison results on the stock movement prediction task. As a binary classification problem, methods are evaluated by Accuracy (Acc) and Matthews Correlation Coefficient (MCC). Both metrics are better with higher values. The best and second best results are in bold and underlined, respectively.

                   BigData22       ACL18           CIKM18          S&P500 (Ours)
Model              Acc ↑   MCC ↑   Acc ↑   MCC ↑   Acc ↑   MCC ↑   Acc ↑   MCC ↑
Hybrid Models
ALSTM-W            0.48    -0.01   0.53    0.08    0.54    0.03    0.55    0.10
ALSTM-D            0.49    0.01    0.53    0.07    0.50    -0.04   0.54    0.05
StockNet           0.53    -0.02   0.54    -0.02   0.52    -0.02   0.56    0.06
SLOT               0.55    0.10    0.59    0.21    0.56    0.09    /       /
Large Language Models
GPT-4              0.54    0.03    0.52    0.02    0.57    0.02    0.58    0.10
Gemini             0.55    0.04    0.52    0.04    0.54    0.02    0.59    0.11
LLaMA2-7B-chat     0.54    0.05    0.51    0.01    0.55    -0.03   0.57    0.06
LLaMA2-70B         0.47    0.00    0.51    0.01    0.49    -0.07   0.57    0.02
LLaMA3-8B          0.55    0.02    0.52    0.02    0.57    0.03    0.53    0.04
FinMA-7B           0.51    0.02    0.51    0.03    0.50    0.08    0.59    0.09
FinGPT-7B-lora     0.45    0.00    0.49    0.00    0.42    0.00    0.51    0.05
InternLM           0.56    0.08    0.51    0.02    0.57    -0.03   0.60    0.06
Falcon             0.55    0.00    0.51    0.00    0.47    -0.06   0.52    0.03
Mixtral-7B         0.46    0.02    0.49    0.00    0.42    -0.05   0.54    0.04
MoE Large Language Models
TradExpert-NM      0.59    0.12    0.60    0.15    0.59    0.12    0.64    0.19
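The two classification metrics used for stock movement prediction, accuracy and the Matthews Correlation Coefficient, can be computed as in the sketch below. Treating "Rise" as the positive class and returning 0 when the MCC denominator vanishes are conventions assumed for this example; the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive="Rise"):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred, positive="Rise"):
    """Matthews Correlation Coefficient, in the range [-1, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


y_true = ["Rise", "Rise", "Fall", "Fall"]
y_pred = ["Rise", "Fall", "Fall", "Fall"]
acc_value = accuracy(y_true, y_pred)
mcc_value = mcc(y_true, y_pred)
```

Under this convention a predictor that always outputs the same class gets an MCC of 0 whatever its accuracy, which is one reason the two metrics are usually reported together on near-balanced rise/fall data.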
Results
Stock Movement Prediction We implemented all baselines ourselves or used existing open-source code, except for the closed-source model SLOT, for which we refer to the metrics reported in the relevant paper. To ensure a fair comparison, we only included the News Analyst and Market Analyst in TradExpert, named TradExpert-NM. The results are shown in Table 2. Among hybrid models, SLOT achieves outstanding accuracy and MCC on ACL18, benefitting from its proposed global market guidance. Among LLMs, InternLM shows remarkable performance, particularly on our proprietary S&P500 dataset. Our proposed TradExpert-NM, utilizing a mixture of expert LLMs approach, consistently outperforms the other models across all datasets except for MCC on ACL18. Note that BigData22, ACL18, and CIKM18 are relatively small datasets with texts from tweets, while our S&P500 dataset consists of news articles with many more words. This difference in text length contributes to the more significant improvements obtained by TradExpert-NM on the S&P500 dataset.

Stock Trading Simulation We perform backtesting to evaluate TradExpert and the baselines. To reduce computational costs in backtesting, we limit the stock pool to about 30 stocks on the DOW 30, a subset of the S&P 500. For TradExpert, we implement a Buy-and-Hold trading strategy on the Top-K stocks ranked by TradExpert. The backtesting period is the same as the testing period of our datasets, which ranges from January 1, 2023, to December 31, 2023. The results summarized in Table 3 demonstrate TradExpert’s superior performance across all metrics considered. Among traditional models, XGBoost achieved a relatively high return but also exhibited high volatility and drawdown, indicating greater risk. Deep learning models generally outperformed traditional models. Among them, DeepTrader stood out with the highest return and Sharpe ratio. TradExpert, our proposed model, significantly outperformed all other models with an AR of 49.79% and the lowest AV of 9.95%. This combination yielded a Sharpe ratio of 5.01, indicating a high return per unit of risk. Figure 6 shows the trends of cumulative returns over time for all methods.

Figure 6: Cumulative returns over time (2023-02 to 2023-12) of all methods on 30 stocks on the DOW 30 list. Curves: TradExpert, LightGBM, A2C, SVM, PPO, EIIE, DJI Index, SARL, XGBoost, and DeepTrader. DJI Index represents the market trend.
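The relaxed comparison-based sorting behind the Top-K selection (compare every pair, count wins, sort by win count) can be sketched as follows. The `compare` callable stands in for the General Expert LLM in ranking mode; the toy comparator, the scores, and the tie-breaking by input order are assumptions made for this illustration.

```python
from itertools import combinations


def relaxed_top_k(stocks, compare, k):
    """Rank stocks by round-robin win counts and return the top k.

    compare(a, b) must return whichever of a or b it judges better;
    it plays the role of the LLM comparator and need not be transitive.
    """
    wins = {s: 0 for s in stocks}
    for a, b in combinations(stocks, 2):   # N * (N - 1) / 2 comparisons
        wins[compare(a, b)] += 1
    # sorted() is stable, so ties keep their input order.
    ranked = sorted(stocks, key=lambda s: wins[s], reverse=True)
    return ranked[:k], wins


# Toy comparator: prefers the higher hypothetical score, except for one
# inconsistent judgement that makes the relation non-transitive.
SCORES = {"AAA": 0.9, "BBB": 0.7, "CCC": 0.5, "DDD": 0.3}


def toy_compare(a, b):
    if {a, b} == {"AAA", "DDD"}:
        return "DDD"                        # the one inconsistent pairwise call
    return a if SCORES[a] > SCORES[b] else b


top2, wins = relaxed_top_k(list(SCORES), toy_compare, k=2)
```

Here AAA beats CCC, CCC beats DDD, and DDD beats AAA, so no consistent total order exists. Because every pair is compared, the single inconsistent call costs AAA only one win and it still lands in the top two, whereas a scheme that relies on fewer comparisons could give that one call much more weight.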
Table 3: Comparison results on stock trading simulation Table 5: Ablation study for the effectiveness of structured
task with stocks on the DOW 30. Annualized Return (AR), data reasoning in predicting day T + 1’s returns.
Sharpe Ratio (SR), Annualized Volatility (AV), and Max-
imum Drawdown (MD) are utilized to evaluate the profits Model RankIC ↑ RankICIR ↑
and risks of methods. The best results are in bold.
TradExpert-MA 0.12 0.90
Alpha Combination 0.07 0.65
AR ↑ AV ↓ SR ↑ MD ↓
DJI Index 13.92% 11.41% 1.22 9.70% Table 6: Ablation study for the choices of ranking algorithm.
Traditional Models † denotes being equipped in TradExpert.
SVM 15.77% 26.67% 0.59 19.94%
XGBoost 21.58% 27.29% 0.79 21.90% RankIC ↑ RankICIR ↑ Time
LightGBM 2.17% 22.74% 0.1 21.29% †
RelaxedSort 0.12 0.90 O(N 2 )
Deep Learning Models
A2C 19.16% 11.29% 1.7 9.09% BubbleSort 0.06 0.65 O(N · K)
PPO 16.62% 11.51% 1.44 9.45% QuickSort 0.03 0.38 O(N log N )
EIIE 23.64% 13.73% 1.72 10.07%
SARL 21.87% 14.72% 1.49 8.52%
DeepTrader 32.45% 17.86% 1.82 15.32%
MoE Large Language Models
TradExpert 49.79% 9.95% 5.01 6.56%

Table 4: Ablation study for the impacts of experts.

Configuration AR ↑ AV ↓ SR ↑ MD ↓
TradExpert 49.79% 9.95% 5.00 6.56%
w/o Market 30.87% 16.43% 1.88 13.29%
w/o News 31.92% 18.36% 1.74 13.04%
w/o Alpha 41.65% 11.38% 3.66 8.94%
w/o Fundamental 44.32% 10.68% 4.15 7.82%

Ablation Study
The Impacts of Experts. To evaluate the effectiveness of each expert within the TradExpert framework, we created multiple versions of TradExpert, each with a specific expert removed. By comparing the performance of these modified frameworks, we can assess the impact of each expert on the overall functionality of TradExpert. The results in Table 4 reveal the varying degrees of impact of each expert. The Market Analyst and the News Analyst emerged as the most critical, significantly influencing profitability and risk management, as seen in the largest drop in AR and the largest rise in AV, respectively, when they were removed. The Alpha Expert is noticeably less impactful than the Market Analyst and the News Analyst. The Fundamental Analyst had the smallest effect on daily trading metrics, as seen in the modest changes in AR and MD upon its removal, but it still contributed to long-term stability. This highlights a strategic balance in TradExpert, where each expert contributes uniquely to the final decision and prediction.

The Effectiveness of Structured Data Reasoning. We show the effectiveness by comparing TradExpert-MA with traditional models for structured data such as OHLCV data and alpha factors. We use a genetic programming-based symbolic regression model as our baseline, which mines alpha expressions aimed at predicting the RankIC of day T + 1's returns. TradExpert-MA is built on top of the same alphas, with the News and Fundamental experts removed to exclude effects from other sources. We compare TradExpert-MA with the combination of alphas using the RankIC and RankICIR metrics. The results are shown in Table 5. The improvements over the alpha combination demonstrate the reasoning ability of TradExpert for structured data.

The Choices of Ranking Algorithm. In TradExpert, we implement the Top-K ranking by sorting all stocks completely using a relaxed comparison-based algorithm, where TradExpert serves as the comparator. To justify our choice of this seemingly cumbersome approach, we conducted comparison experiments with other theoretically more efficient ranking algorithms. Specifically, our alternatives include QuickSort and BubbleSort, with time complexity O(N log N) and O(N · K), respectively. The comparison results in Table 6 demonstrate that our approach outperforms the others, despite having a higher computational complexity. This is attributed to the non-transitive nature of the LLM-based comparator: a single inconsistent comparison can misplace a stock when few comparisons are made. Therefore, a greater number of comparisons yields more accurate rankings in TradExpert.

Conclusion
In this study, we introduced TradExpert, a novel framework that harnesses the power of LLMs to enhance stock trading strategies. By integrating multiple specialized LLMs, each focused on distinct aspects of financial data, TradExpert provides a comprehensive and nuanced analysis that significantly outperforms traditional financial models in practice. Looking ahead, our goal is to explore how to employ TradExpert in the high-frequency trading scenario and extend its capabilities to encompass a wider range of global markets.

Limitation. Although TradExpert has notable strengths, its processing time poses certain challenges. On average, processing a single stock takes 4.7 seconds on an Nvidia A5000 GPU. For daily trading, this processing time is generally manageable. However, for scenarios demanding quicker decision-making, such as high-frequency trading, TradExpert's latency becomes a notable drawback.
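
The reading of Table 4 in the ablation study can be checked with a few lines of arithmetic. The sketch below transcribes the Table 4 values and computes, for each removed expert, the change relative to the full TradExpert configuration. The variable and function names are illustrative and not taken from the paper.

```python
# Values transcribed from Table 4 (AR, AV, MD in percent; SR is unitless).
FULL = {"AR": 49.79, "AV": 9.95, "SR": 5.00, "MD": 6.56}
ABLATIONS = {
    "Market": {"AR": 30.87, "AV": 16.43, "SR": 1.88, "MD": 13.29},
    "News": {"AR": 31.92, "AV": 18.36, "SR": 1.74, "MD": 13.04},
    "Alpha": {"AR": 41.65, "AV": 11.38, "SR": 3.66, "MD": 8.94},
    "Fundamental": {"AR": 44.32, "AV": 10.68, "SR": 4.15, "MD": 7.82},
}


def ablation_deltas(full, ablations):
    """Return, per removed expert, (ablated - full) for every metric."""
    return {
        expert: {m: round(vals[m] - full[m], 2) for m in full}
        for expert, vals in ablations.items()
    }


deltas = ablation_deltas(FULL, ABLATIONS)

# Most negative AR change = largest loss of annualized return.
largest_ar_drop = min(deltas, key=lambda e: deltas[e]["AR"])
# Most positive AV change = largest increase in annualized volatility.
largest_av_rise = max(deltas, key=lambda e: deltas[e]["AV"])
# Least negative AR change = expert whose removal costs the least return.
smallest_ar_drop = max(deltas, key=lambda e: deltas[e]["AR"])

for expert, d in deltas.items():
    print(f"w/o {expert}: {d}")
```

On these numbers, removing the Market expert costs the most AR, removing the News expert raises AV the most, and removing the Fundamental expert changes AR the least, which matches the discussion of Table 4.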
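
The structured-data comparison is reported in RankIC and RankICIR, which this section does not define. A minimal sketch follows, assuming the common definitions: RankIC is the Spearman rank correlation between a day's predicted scores and the next day's realized returns across stocks, and RankICIR is the mean of the daily RankIC values divided by their standard deviation. The use of the sample standard deviation and all function names are assumptions for illustration.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(scores, next_day_returns):
    """Spearman rank correlation between day-T scores and day T+1 returns."""
    return pearson(average_ranks(scores), average_ranks(next_day_returns))


def rank_icir(daily_rank_ics):
    """Mean of the daily RankIC series divided by its sample std deviation."""
    return mean(daily_rank_ics) / stdev(daily_rank_ics)


# Toy cross-section of three stocks: the scores order the stocks
# exactly as the realized returns do.
ic_same_order = rank_ic([0.3, 0.1, 0.2], [0.02, -0.01, 0.01])
ic_reversed = rank_ic([1, 2, 3], [3, 2, 1])
```

A RankIC near 1 means the scores order the stocks almost exactly as the realized returns do; RankICIR rewards a RankIC that is both high and stable across days.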
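
The argument that more comparisons help when the comparator is non-transitive can be illustrated with a toy example. The sketch below is not the paper's relaxed sorting algorithm, which is not specified in this section. It contrasts a hypothetical round-robin scheme (every pair compared once, ranking by number of wins) with a first-element-pivot QuickSort, using a made-up comparator that follows a hidden ordering except for one inconsistent answer.

```python
from itertools import combinations


def round_robin_top_k(items, prefers, k):
    """Compare every pair once and rank items by number of wins."""
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        winner = a if prefers(a, b) else b
        wins[winner] += 1
    # sorted() is stable, so tied items keep their input order.
    return sorted(items, key=lambda item: -wins[item])[:k]


def quicksort_top_k(items, prefers, k):
    """QuickSort with the first element as pivot, then take the top k."""
    def sort(seq):
        if len(seq) <= 1:
            return list(seq)
        pivot, rest = seq[0], seq[1:]
        better = [x for x in rest if prefers(x, pivot)]
        worse = [x for x in rest if not prefers(x, pivot)]
        return sort(better) + [pivot] + sort(worse)
    return sort(list(items))[:k]


# Hypothetical comparator: a hidden strength ordering A > B > C > D > E,
# plus one inconsistent answer that creates a preference cycle.
STRENGTH = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
INCONSISTENT = {("E", "A")}  # the comparator claims E beats A


def toy_prefers(a, b):
    if (a, b) in INCONSISTENT:
        return True
    if (b, a) in INCONSISTENT:
        return False
    return STRENGTH[a] > STRENGTH[b]


stocks = ["A", "B", "C", "D", "E"]
round_robin_result = round_robin_top_k(stocks, toy_prefers, 2)
quicksort_result = quicksort_top_k(stocks, toy_prefers, 2)
print(round_robin_result, quicksort_result)
```

In this toy case, QuickSort places E first because E is only ever compared against the pivot A, whereas the round-robin tally outvotes the single inconsistent answer. The cost is O(N^2) comparator calls instead of O(N log N).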
References
Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F. L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
AI@Meta. 2024. Llama 3 Model Card.
Almazrouei, E.; Alobeidli, H.; Alshamsi, A.; Cappelli, A.; Cojocaru, R.; Debbah, M.; Goffinet, É.; Hesslow, D.; Launay, J.; Malartic, Q.; et al. 2023. The falcon series of open language models. arXiv preprint arXiv:2311.16867.
Araci, D. 2019. Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.
Baek, Y.; and Kim, H. Y. 2018. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Systems with Applications, 113: 457–480.
Breiman, L. 2001. Random forests. Machine learning, 45: 5–32.
Brown, T. B. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Cai, Z.; Cao, M.; Chen, H.; Chen, K.; Chen, K.; Chen, X.; Chen, X.; Chen, Z.; Chen, Z.; Chu, P.; Dong, X.; Duan, H.; Fan, Q.; Fei, Z.; Gao, Y.; Ge, J.; Gu, C.; Gu, Y.; Gui, T.; Guo, A.; Guo, Q.; He, C.; Hu, Y.; Huang, T.; Jiang, T.; Jiao, P.; Jin, Z.; Lei, Z.; Li, J.; Li, J.; Li, L.; Li, S.; Li, W.; Li, Y.; Liu, H.; Liu, J.; Hong, J.; Liu, K.; Liu, K.; Liu, X.; Lv, C.; Lv, H.; Lv, K.; Ma, L.; Ma, R.; Ma, Z.; Ning, W.; Ouyang, L.; Qiu, J.; Qu, Y.; Shang, F.; Shao, Y.; Song, D.; Song, Z.; Sui, Z.; Sun, P.; Sun, Y.; Tang, H.; Wang, B.; Wang, G.; Wang, J.; Wang, J.; Wang, R.; Wang, Y.; Wang, Z.; Wei, X.; Weng, Q.; Wu, F.; Xiong, Y.; Xu, C.; Xu, R.; Yan, H.; Yan, Y.; Yang, X.; Ye, H.; Ying, H.; Yu, J.; Yu, J.; Zang, Y.; Zhang, C.; Zhang, L.; Zhang, P.; Zhang, P.; Zhang, R.; Zhang, S.; Zhang, S.; Zhang, W.; Zhang, W.; Zhang, X.; Zhang, X.; Zhao, H.; Zhao, Q.; Zhao, X.; Zhou, F.; Zhou, Z.; Zhuo, J.; Zou, Y.; Qiu, X.; Qiao, Y.; and Lin, D. 2024. InternLM2 Technical Report. arXiv:2403.17297.
Chen, Z.; Zheng, L. N.; Lu, C.; Yuan, J.; and Zhu, D. 2023. ChatGPT informed graph neural network for stock movement prediction. arXiv preprint arXiv:2306.03763.
Christ, M.; Braun, N.; Neuffer, J.; and Kempa-Liehr, A. W. 2018. Time series feature extraction on basis of scalable hypothesis tests (tsfresh–a python package). Neurocomputing, 307: 72–77.
Cortes, C.; and Vapnik, V. 1995. Support-vector networks. Machine learning, 20: 273–297.
Du, N.; Huang, Y.; Dai, A. M.; Tong, S.; Lepikhin, D.; Xu, Y.; Krikun, M.; Zhou, Y.; Yu, A. W.; Firat, O.; et al. 2022. Glam: Efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning, 5547–5569. PMLR.
Eigen, D.; Ranzato, M.; and Sutskever, I. 2013. Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314.
Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; and Chen, W. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations.
Jiang, A. Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D. S.; Casas, D. d. l.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. 2023. Mistral 7B. arXiv preprint arXiv:2310.06825.
Jiang, Z.; Xu, D.; and Liang, J. 2017. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.
Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J. Y.; Shi, X.; Chen, P.-Y.; Liang, Y.; Li, Y.-F.; Pan, S.; et al. 2024. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. In The Twelfth International Conference on Learning Representations.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; and Liu, T.-Y. 2017. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30.
Li, X.; Li, Z.; Shi, C.; Xu, Y.; Du, Q.; Tan, M.; and Huang, J. 2024a. AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 773–783.
Li, X.; Shen, X.; Zeng, Y.; Xing, X.; and Xu, J. 2024b. FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model. In Companion Proceedings of the ACM on Web Conference 2024, 319–327.
Liu, X.-Y.; Yang, H.; Chen, Q.; Zhang, R.; Yang, L.; Xiao, B.; and Wang, C. D. 2020. FinRL: A deep reinforcement learning library for automated stock trading in quantitative finance. arXiv preprint arXiv:2011.09607.
Liu, Y.; Zhou, H.; Guo, Z.; Shareghi, E.; Vulic, I.; Korhonen, A.; and Collier, N. 2024. Aligning with human judgement: The role of pairwise preference in large language model evaluators. arXiv preprint arXiv:2403.16950.
Loh, W.-Y. 2011. Classification and regression trees. Wiley interdisciplinary reviews: data mining and knowledge discovery, 1(1): 14–23.
Lopez-Lira, A.; and Tang, Y. 2023. Can chatgpt forecast stock price movements? return predictability and large language models. arXiv preprint arXiv:2304.07619.
Lu, D.; Wu, H.; Liang, J.; Xu, Y.; He, Q.; Geng, Y.; Han, M.; Xin, Y.; and Xiao, Y. 2023. Bbt-fin: Comprehensive construction of chinese financial domain pre-trained language model, corpus and benchmark. arXiv preprint arXiv:2302.09432.
Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, 1928–1937. PMLR.
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Shah, R. S.; Chawla, K.; Eidnani, D.; Shah, A.; Du, W.; Chava, S.; Raman, N.; Smiley, C.; Chen, J.; and Yang, D. 2022. When FLUE Meets FLANG: Benchmarks and Large Pretrained Language Model for Financial Domain. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
Shen, S.; Hou, L.; Zhou, Y.; Du, N.; Longpre, S.; Wei, J.; Chung, H. W.; Zoph, B.; Fedus, W.; Chen, X.; et al. 2023. Mixture-of-experts meets instruction tuning: A winning combination for large language models. arXiv preprint arXiv:2305.14705.
Soun, Y.; Yoo, J.; Cho, M.; Jeon, J.; and Kang, U. 2022. Accurate stock movement prediction with self-supervised learning from sparse noisy tweets. In 2022 IEEE International Conference on Big Data (Big Data), 1691–1700. IEEE.
Team, G.; Anil, R.; Borgeaud, S.; Wu, Y.; Alayrac, J.-B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A. M.; Hauth, A.; et al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.
Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. 2023a. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. 2023b. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
Wang, Z.; Huang, B.; Tu, S.; Zhang, K.; and Xu, L. 2021. DeepTrader: a deep reinforcement learning approach for risk-return balanced portfolio management with market conditions Embedding. In Proceedings of the AAAI conference on artificial intelligence, volume 35, 643–650.
Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q. V.; Zhou, D.; et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35: 24824–24837.
Wu, H.; Zhang, W.; Shen, W.; and Wang, J. 2018. Hybrid deep sequential modeling for social text-driven stock prediction. In Proceedings of the 27th ACM international conference on information and knowledge management, 1627–1630.
Wu, S.; Irsoy, O.; Lu, S.; Dabravolski, V.; Dredze, M.; Gehrmann, S.; Kambadur, P.; Rosenberg, D.; and Mann, G. 2023. Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564.
Xie, Q.; Han, W.; Zhang, X.; Lai, Y.; Peng, M.; Lopez-Lira, A.; and Huang, J. 2023. PIXIU: a large language model, instruction data and evaluation benchmark for finance. In Proceedings of the 37th International Conference on Neural Information Processing Systems, 33469–33484.
Xu, Y.; and Cohen, S. B. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1970–1979.
Yang, H.; Liu, X.-Y.; and Wang, C. D. 2023. FinGPT: Open-Source Financial Large Language Models. FinLLM Symposium at IJCAI 2023.
Yang, H.; Liu, X.-Y.; Zhong, S.; and Walid, A. 2020. Deep reinforcement learning for automated stock trading: An ensemble strategy. In Proceedings of the first ACM international conference on AI in finance, 1–8.
Yang, Y.; Tang, Y.; and Tam, K. Y. 2023. Investlm: A large language model for investment using financial domain instruction tuning. arXiv preprint arXiv:2309.13064.
Ye, Y.; Pei, H.; Wang, B.; Chen, P.-Y.; Zhu, Y.; Xiao, J.; and Li, B. 2020. Reinforcement-learning based portfolio management with augmented asset movement prediction states. In Proceedings of the AAAI conference on artificial intelligence, volume 34, 1112–1119.
Zeng, Z.; Kaur, R.; Siddagangappa, S.; Rahimi, S.; Balch, T.; and Veloso, M. 2023. Financial time series forecasting using cnn and transformer. arXiv preprint arXiv:2304.04912.
Zou, J.; Cao, H.; Liu, L.; Lin, Y.; Abbasnejad, E.; and Shi, J. Q. 2022. Astock: A new dataset and automated stock trading based on stock-specific news analyzing model. arXiv preprint arXiv:2206.06606.
