
StockGPT: A GenAI Model for Stock Prediction and Trading*

Dat Mai

September 2024
arXiv:2404.05101v3 [q-fin.CP] 23 Oct 2024

Click here for the most updated version

Abstract

This paper introduces StockGPT, an autoregressive number model trained and tested on 70 million daily U.S. stock returns spanning nearly 100 years. Treating each return series as a sequence of tokens, StockGPT automatically learns the hidden patterns predictive of future returns via its attention mechanism. On a held-out test sample from 2001 to 2023, daily and monthly rebalanced long-short portfolios formed from StockGPT predictions yield strong performance. The StockGPT-based portfolios span momentum and long-/short-term reversals, eliminating the need for manually crafted price-based strategies, and yield highly significant alphas against leading stock market factors, suggesting a novel AI pricing effect. This highlights the immense promise of generative AI in surpassing humans in making complex financial investment decisions.

Key words: generative artificial intelligence, transformer, decoder, stock market, investment, trading, return prediction

* Dat Mai, PhD, CFA ([email protected]) is a quantitative researcher at MKT MediaStats, LLC. I have no conflicts of interest to disclose. The views expressed herein are solely my own and do not reflect those of my employer. This paper was written as part of my postdoctoral research at the University of Missouri-Columbia. I would like to thank Andrej Karpathy for publicly sharing his lecture and code on the GPT architecture. I acknowledge helpful comments from the participants at Citi's Data Science Seminar and the 2024 Chicago Quantitative Alliance (CQA) Fall Conference.
1 Introduction

Generative artificial intelligence (GenAI), a set of advanced technologies capable of generating text, images, videos, programming code, or art from instructions given via sound or text, has taken society by storm and exerted wide-ranging influence on many aspects of the world economy (Baldassarre et al. 2023; Dell'Acqua et al. 2023; Mannuru et al. 2023; Noy and Zhang 2023; Otis et al. 2023; Sætra 2023). Although it had been around for years, GenAI came to public prominence with the introduction of ChatGPT in November 2022, a chatbot able to generate answers, reasoning, and conversations at a human level.

Since its introduction, ChatGPT and similar large language models have quickly made their way into the investment industry. One common use of ChatGPT for investment is to generate trading recommendations directly from news about a company (such as news articles or corporate communications) (Lopez-Lira and Tang 2023). A less direct approach is to rely on similar pretrained language models such as BERT (Devlin et al. 2018) and OPT (Zhang et al. 2022) to generate a sentiment score for each company, which is then used to make trading decisions. For example, Jiang, Kelly, and Xiu (2022) and Kirtac and Germano (2024) find that stock portfolios formed on sentiment scores generated by BERT and OPT have impressive performance.

This paper contributes to this fast-evolving field by applying the GenAI logic to numeric stock data. That is, I first train a new Generative Pretrained Transformer (GPT) model (Brown et al. 2020) from scratch on numeric stock data (hereafter StockGPT) and then show that StockGPT has the potential to produce strong investment performance.[1] Unlike previous finance domain-specific language models that are pretrained on financial texts, such as FinBERT (Yang, UY, and Huang 2020) and BloombergGPT (Wu et al. 2023), to the best of my knowledge, StockGPT is the first of its kind to be pretrained directly on numeric stock return data.

For trading purposes, using a model trained directly on stock data has three important advantages over models trained on texts: (i) the model learns price patterns directly from price data rather than from news about prices, (ii) the model predictions are available for each stock at each time point rather than dependent on the availability of news data about stocks, and (iii) the model predicts the whole distribution of future returns rather than a point estimate.

[1] ChatGPT is GPT fine-tuned for conversational purposes.
Language models such as GPT operate by predicting the next most likely token given the previous ones, p(x_{t+1} | x_t, ..., x_1). This bears a strong resemblance to numeric time series data such as stock returns, where data points come in order and the next value is conditional on what comes before it. Hence the natural question is whether the architecture of language models can be applied to numeric time series data. To do so, one fundamental difference between texts and numbers needs to be addressed: texts are a collection of (vast but) discrete tokens, while numeric time series are generally continuous. Therefore, to train a generative model for stock returns, I first discretize stock return data into intervals (or tokens) and then apply the language model architecture.

To build the StockGPT model, I adapt a light-weight version of the GPT architecture, which consists of four attention blocks having about one million parameters. Input into the model is a sequence of 256 consecutive daily returns on each stock (i.e., the block size in language models), which approximates the number of trading days in a year.[2] The training objective is to predict the next return value given its previous returns using the transformer architecture, which receives indexes (or positions) of the tokens in a sequence, retrieves their vector representations, and models their dependencies via a mechanism called attention (Vaswani et al. 2017). The training sample consists of around 50 million daily U.S. stock returns from 1926 to 2000, which covers almost all stocks that have ever been listed on the U.S. stock market during the 20th century. The model is tested on a held-out sample of around 20 million daily U.S. stock returns from 2001 to 2023.

Notably, the model is trained only once on the training sample and applied off-the-shelf to the out-of-sample period. This study design serves two purposes: (i) it is the cleanest setup to test the effectiveness of the model, and (ii) it reduces computational costs. Despite this simple setup, the model still delivers strong performance up to 23 years after the period it is trained on. In practice, the model should be continually retrained as new financial data arrive to uphold its relevance and performance. This is especially needed in a dynamic environment like the stock market, featuring a low signal-to-noise ratio and constant distributional shifts (Kelly, Xiu, et al. 2023).

During the testing phase, for each stock on each trading day t, StockGPT uses the 256 daily returns from t − 255 to t to make a return forecast for t + 1. The evaluation of the forecasts consists of two steps. First, I examine the accuracy of the forecasts by running cross-sectional regressions of realized stock returns on day t + 1 onto return predictions for t + 1. The results indicate that StockGPT makes fairly accurate predictions.

[2] It is a convention in machine learning to specify model parameters in powers of 2.

The second evaluation step entails building real-time trading portfolios based on StockGPT forecasts. At the market close of each trading day t, I build zero-cost portfolios by going long/short the top/bottom decile of stocks having the highest/lowest return forecasts for day t + 1 and rebalance the portfolio at the t + 1 market close. To avoid trading only micro-cap stocks, since these stocks are illiquid and incur high transaction and market impact costs, before forming the portfolio I remove stocks below the 10th percentile of market value at the market close.

Under the equal-weighting scheme, where each stock receives an equal weight in the portfolio, this daily rebalanced long-short portfolio earns an average annualized return of 119% from 2001 to 2023, achieving a Sharpe ratio of 6.5. This performance is higher than the best daily-rebalanced portfolio based on language model predictions in Jiang, Kelly, and Xiu (2022), which has an annual return of 50% and a Sharpe ratio of 4.8 from 2004 to 2019. It is noteworthy that while the prediction model in Jiang, Kelly, and Xiu (2022) is retrained every year,[3] StockGPT is trained only once using data up to 2000.

Under the value-weighting scheme, where stock weights in the portfolio are proportional to their market values, the StockGPT-based portfolio achieves an average annualized return of 27% and a Sharpe ratio of 1. Since value weighting gives more weight to stocks having higher market values, this result is consistent with the consensus view in asset pricing that small stocks are more predictable due to more mispricing and higher arbitrage costs (Baker and Wurgler 2006).

Since StockGPT makes its return forecasts using only historical price data, I examine how it relates to common price-based strategies such as momentum and long-/short-term reversals. I find that the StockGPT-based portfolios span these strategies in the spanning test. This suggests that AI is more effective than humans in designing trading strategies based on historical price movements. The StockGPT-based portfolios also encompass several factors of the Fama and French (2015) five-factor model and the Hou et al. (2021) q-factor model.

It is noteworthy that, unlike fields such as medicine or law where GenAI is expected to generate (100%) correct responses, StockGPT does not need to accurately predict future returns on individual stocks to be useful. Instead, in the context of cross-sectional asset pricing, it is only required to identify the groups of stocks that are more likely to go up/down to facilitate the long/short trading strategy.

[3] Specifically, they retrieve contextual word embeddings from pretrained OPT and BERT and use these embeddings to retrain the return prediction model every year.

While the daily results offer proof of concept that the GPT model can be applied to numeric stock data to yield strong investment results, it is practically challenging to trade hundreds of stocks on a daily basis, especially the small-cap ones. Therefore, I also experiment with a more realistic StockGPT model that makes monthly return predictions. Specifically, I train a new model to predict the returns over the next 20 days instead of the next-day return as before. I then use this model to make monthly return forecasts and form monthly rebalanced portfolios.

On average, this strategy earns an annual return of 13% with a Sharpe ratio of 1 from 2001 to 2023, outperforming 11 common stock factors by a large margin (these factors include momentum, long-/short-term reversals, five factors from Fama and French (2015), and three factors from Hou et al. (2021)). The performance persists if I focus on only the 50% largest companies based on market cap or retain only stocks listed on NYSE. The StockGPT portfolio also earns a highly significant annual alpha of 16% (t-statistic of 4.7) against all of these factors combined, suggesting a new AI-based pricing effect not captured by standard asset pricing factors. In other words, StockGPT can be combined with standard pricing factors to improve the overall risk-return profile.

Although StockGPT shows promising performance, the model can be enhanced in several ways by practitioners to achieve better results. First, the model should be retrained frequently (such as monthly) to maintain its relevance and performance. Second, StockGPT as introduced in this paper is a light-weight adoption of the GPT architecture; it is an open question whether extending the model along several of its parameter dimensions will yield better performance. Third, training StockGPT with high-frequency stock data can be a fruitful avenue since there is evidence of alpha to be extracted from the order book (Kolm, Turiel, and Westray 2023). These three enhancements can be readily implemented given enough computing power. In addition, one area that needs more exploration and research effort is how to modify the model itself, or its training, to work better with large-cap stocks.
2 Model Architecture

2.1 Overview

StockGPT uses a vanilla decoder-only transformer architecture, which is the second step of the canonical transformer model developed by Vaswani et al. (2017). The decoder-only transformer is also the architecture of ChatGPT. Figure 1 depicts a sketch of the architecture. Specifically, the decoder receives an input sequence of tokens x = (x_1, x_2, ..., x_{t-1}, x_t), transforms it via multiple layers of attention, and outputs the probability of each next token: p(x_2 | x_1), p(x_3 | x_2, x_1), ..., p(x_{t+1} | x_t, ..., x_1).

During the training phase, the model learns and updates its parameters by minimizing the cross-entropy loss between a token prediction and its actual value, l(x̂_{t+1}, x_{t+1}), averaged across all tokens across all sequences in a training batch. During the deployment phase, the decoder generates an output sequence of tokens (x_{t+1}, x_{t+2}, ..., x_{t+m}) one at a time, given the input sequence x. Specifically, it receives an input sequence (x_1, x_2, ..., x_{t-1}, x_t), converts it into a conditional probability distribution p(x_{t+1} | x_t, ..., x_1), and generates the next token from this distribution. The decoder model is autoregressive in the sense that it consumes its own generated output at each step as additional input to generate the next one, i.e., p(x_{t+2} | x̂_{t+1}, x_t, ..., x_1), where x̂_{t+1} was previously generated.
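To make the deployment loop concrete, the following is a minimal sketch of autoregressive sampling in PyTorch. The `model` object and its forward signature are assumptions for illustration, standing in for any decoder that maps a token sequence to next-token logits; this is not the paper's code.

```python
import torch

@torch.no_grad()
def generate(model, idx, n_new_tokens, block_size=256):
    """Autoregressively sample n_new_tokens continuations of idx.

    idx: (batch, seq_len) tensor of token indexes.
    model: assumed to map (batch, seq_len) -> (batch, seq_len, vocab_size) logits.
    """
    for _ in range(n_new_tokens):
        # Condition only on the last block_size tokens (the training context length).
        idx_cond = idx[:, -block_size:]
        logits = model(idx_cond)          # (batch, seq_len, vocab_size)
        logits = logits[:, -1, :]         # keep the final position's next-token logits
        probs = torch.softmax(logits, dim=-1)
        # Draw the next token from p(x_{t+1} | x_t, ..., x_1).
        next_idx = torch.multinomial(probs, num_samples=1)
        # Feed the sampled token back in as additional input on the next iteration.
        idx = torch.cat([idx, next_idx], dim=1)
    return idx
```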

2.2 Details

Since the computer itself does not understand human text, the transformer first quantifies text tokens via token and positional embedding. Token embedding simply retrieves a unique vector representation for each token in a dictionary containing all available tokens. Positional embedding vectorizes each token's position in an input sequence. Without positional embedding, the transformer cannot understand the context and order of tokens. The transformer then sums the token and positional embedding vectors for each token. These embeddings are learnable parameters.

embedding = token embedding + positional embedding    (1)

For example, in the sentence "The [firm] made a [firm] decision about its capital structure.", the two [firm] words have the same token embeddings but different positional embeddings due to their positions.
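As a minimal PyTorch sketch of equation (1), using the dimension values reported later in Section 2.3 (the variable names are mine, for illustration):

```python
import torch
import torch.nn as nn

vocab_size, block_size, n_embd = 402, 256, 128  # values from Section 2.3

token_emb = nn.Embedding(vocab_size, n_embd)    # one learnable vector per token
pos_emb = nn.Embedding(block_size, n_embd)      # one learnable vector per position

idx = torch.randint(0, vocab_size, (1, block_size))  # a dummy input sequence
positions = torch.arange(block_size)
# Equation (1): each token's embedding is the sum of its token and positional vectors.
x = token_emb(idx) + pos_emb(positions)         # shape (1, block_size, n_embd)
```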

At the heart of the transformer model is the attention mechanism. For each token, the transformer generates three vectors from its embedding vector: key k, query q, and value v. The attention for token t is the weighted sum of the values v_i of all tokens up to and including it, weighted by the product of its query q_t with the keys k_i of those tokens and a normalizing constant:

attention_t = Σ_{i=1,...,t} w_i × v_i,  with  w_i = q_t · k_i × norm. const.    (2)

Intuitively, a token emits a query, and the previous tokens that match its query (i.e., having a high q_t · k_i value) get its attention. k, q, and v are also learnable parameters.[4] This mechanism constitutes a self-attention head and helps the transformer develop a contextual understanding of the tokens.[5] In the above example, the attention mechanism helps the model understand that [its] refers to the first [firm]. Since each token is only influenced by the tokens before it, this setup is autoregressive.
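The following is a minimal sketch of a single causal self-attention head implementing equation (2), in the spirit of the Karpathy tutorial cited in footnote 5. It uses the standard softmax weighting with a 1/√(head size) normalizing constant; the class and variable names are mine, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionHead(nn.Module):
    """One causal self-attention head: equation (2) with softmax weights."""

    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so token t only attends to tokens 1..t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                      # x: (batch, seq_len, n_embd)
        _, T, _ = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # w_i = q_t . k_i, scaled by the normalizing constant 1/sqrt(head_size).
        w = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5   # (batch, T, T)
        w = w.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        w = F.softmax(w, dim=-1)
        # attention_t = sum_i w_i * v_i over the tokens up to t.
        return w @ v                           # (batch, T, head_size)
```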

The transformer concatenates multiple attention heads into a multi-head node, which is sequentially followed by multiple linear layers to form an attention block.[6] Multiple attention blocks are then stacked on top of each other. The last attention block is followed by a layer normalization and a linear layer whose output is converted into a vector of probabilities via a softmax activation. Specifically, at time step t, the transformer outputs p(x_{t+1} | x_t, ..., x_1), the multinomial distribution over all available tokens in the dictionary, conditional on all tokens up to t. Given this distribution, the model can sample the most likely token at t + 1 from its dictionary.

[4] Technically speaking, the model learns the weight matrices that produce these vectors.
[5] This tutorial by Andrej Karpathy gives an intuitive step-by-step introduction to the GPT architecture: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=4901s.
[6] Specifically, for StockGPT, in the first step of the attention block, the input goes through a layer normalization, then multi-head concatenation, followed by a linear layer with dropout. The output of this layer is added to the input via a skip connection. In the second step, the output of the first step goes through a second layer normalization, followed by an expanding linear layer that increases the input dimension by 4 times, a ReLU activation, and a contracting layer that reverts to the input dimension with dropout. Again, the output of this second step is added to its input via a second skip connection.
2.3 StockGPT Specifics

Since the transformer can only work with discrete tokens, to use it on continuous stock return data, the first step is to discretize returns into intervals. Table 1 illustrates the discretization rule. Accordingly, I first convert returns into integer basis points by multiplying them by 10,000 and keeping the integer portion. Next, I cut the basis points into intervals of 50, closed on the right. The first interval closes at -10,000. Since stock prices cannot be negative, returns cannot be lower than -10,000 basis points (i.e., -100%); therefore, the first bin contains only -10,000. The last closed interval is (9,950, 10,000]. Third, for each bin, I use the mid value of the interval to represent its value, with the exception that the first bin (-Inf, -10,000] is represented by -10,000 and the last bin (10,000, +Inf) by 10,000. In other words, I treat all daily returns greater than 100% as 100%. Values above this threshold are extremely rare since the 1st to 99th percentile range of daily returns in the training set is from -9.6% to 11.1%. Finally, the bins are numbered from 0 to 401. Therefore, my return dictionary has a total of 402 tokens, where each token is a return bin midpoint. As an example of the discretization rule, the return sequence (-2.4%, 0%, 0%, 5%, 4.8%) is converted into the index sequence (196, 200, 200, 210, 210), which is input into StockGPT.
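A small sketch of this discretization rule (my own helper, not the paper's code); it reproduces the example sequence above:

```python
import math

def return_to_bin(r):
    """Map a simple return r (e.g., -0.024 for -2.4%) to a bin index in 0..401."""
    bp = int(r * 10_000)                     # integer basis points, truncated
    if bp <= -10_000:
        return 0                             # (-Inf, -10,000]
    if bp > 10_000:
        return 401                           # (10,000, +Inf)
    # Intervals of 50 bps, closed on the right: bin i covers
    # (-10,000 + 50*(i-1), -10,000 + 50*i] for i = 1..400.
    return math.ceil((bp + 10_000) / 50)

print([return_to_bin(r) for r in (-0.024, 0.0, 0.0, 0.05, 0.048)])
# [196, 200, 200, 210, 210]
```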

Besides the vocabulary size of 402, StockGPT has a block size (i.e., the length of each input sequence) of 256, token and positional embedding sizes of 128, 4 attention blocks each consisting of 4 self-attention heads, and a dropout probability of 0.2 in various layers. Taken together, StockGPT has 0.93 million parameters. StockGPT is trained for 10,000 training steps, with each step consisting of 64 sequences (i.e., the batch size) drawn randomly from the training data.[7] The probability of sampling each stock during training is proportional to the number of daily return observations it has.

As discussed above, to make a return forecast given a 256-return input sequence (x_{t-255}, ..., x_t), StockGPT outputs p(x_{t+1} | x_t, ..., x_{t-255}), a multinomial distribution over the 402 return bins.[8] The model produces output in terms of bin indexes, which are converted to numeric returns using the bin midpoints in Table 1. The expected return for day t + 1 is then the weighted average of the return bin midpoints, weighted by the corresponding bin probabilities given by p(x_{t+1} | x_t, ..., x_{t-255}). Alternatively, the expected return on day t + 1 can be computed by sampling many forecasts from p(x_{t+1} | x_t, ..., x_{t-255}) and averaging them. The two approaches produce the same results if the number of drawn samples is large, but the latter approach is more computationally intensive. To make return forecasts over the next m days, we can recursively sample several paths of forecasts x_j = (x_{t+1}, x_{t+2}, ..., x_{t+m}) and average across the paths.

[7] During the training phase, the cross-entropy loss stabilizes at around 2.5 after 5,000 training steps. With 402 labels (the number of return bins), the maximum cross-entropy would be E = −Σ_{i=1}^{402} (1/402) × log(1/402) = log(402) ≈ 6. The model is fully trained locally on a MacBook M2 with 64GB RAM and 30 GPU cores.
[8] Technically, an input sequence of any length from 1 to 256 (i.e., the block size during training) can be used to make forecasts. Analogously, ChatGPT prompts can be of any length up to a limit (around 2,048 tokens). However, since StockGPT is trained with a block size of 256, I also use input sequences of 256 days in making forecasts to utilize all price patterns the model has discovered during training.
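A sketch of the first (closed-form) approach, computing the expected next-day return as the probability-weighted average of bin midpoints; `probs` stands in for the model's softmax output, and the midpoint construction mirrors Table 1:

```python
import torch

# Bin midpoints in basis points, per Table 1: -10,000, -9,975, ..., 9,975, 10,000.
midpoints = torch.tensor(
    [-10_000.0] + [-10_000 + 50 * i - 25 for i in range(1, 401)] + [10_000.0]
)  # shape (402,)

def expected_return_bp(probs):
    """Probability-weighted average of bin midpoints.

    probs: (..., 402) tensor from the model's softmax over return bins.
    Returns the expected next-day return in basis points.
    """
    return probs @ midpoints

# Example with a uniform distribution (expected return is 0 by symmetry).
uniform = torch.full((402,), 1 / 402)
print(expected_return_bp(uniform))
```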

3 Data

Stock return data come from the Center for Research in Security Prices (CRSP), which collects all historical U.S. stock returns from 1926 to 2023. As is standard in asset pricing research, I include only common stocks with a share code of 10 or 11 traded on the three main exchanges: NYSE, AMEX, and NASDAQ. This sample consists of around 70 million stock observations from 1926 to 2023. The sample is then split into two parts: 1926 to 2000 for training and 2001 to 2023 for testing. Within the training sample, data from 1926 to 1990 are used for parameter optimization and data from 1991 to 2000 for hyperparameter tuning and evaluation.

During training evaluation, I document that the model using stocks from NYSE alone has a lower evaluation loss than the one using all three exchanges, 2.55 versus 2.72. This may be because NYSE is the world's largest stock exchange and lists high-quality large-cap stocks, while AMEX and NASDAQ list smaller stocks that add noise to the training process. Therefore, the main results focus on the model trained on NYSE data alone, while the results using the model trained on all three exchanges are reported in Table A.1.[9]

During the testing phase, stock returns from all three exchanges are used. As noted in the introduction, the model is trained only once using the training sample and kept unchanged during the testing phase.

[9] The latter model still produces an annual return of 83% with a Sharpe ratio of 5.
4 Results: Daily Prediction

4.1 Fama–MacBeth Regression

To evaluate the quality of the return forecast for day t + 1, I first compare it against the actual return on that day. Specifically, for each trading day t, I run the following cross-sectional regression:

x_{i,t+1} = a_t + b_t × x̂_{i,t+1} + e_{i,t+1}    (3)

where x_{i,t+1} is the actual realized return of stock i on day t + 1 and x̂_{i,t+1} is its StockGPT return forecast. The slope b_t and the regression adjusted R²_t are then averaged across all trading days in the test sample. These measure how well StockGPT forecasts track actual returns. This regression specification is referred to as the Fama-MacBeth regression in the asset pricing literature (Fama and MacBeth 1973).

Table 2 reports the results. The average slope coefficient is 0.5, indicating that a cross-sectional difference of 100 basis points (i.e., 1%) in StockGPT return predictions signals a difference of 50 basis points in realized returns. Moreover, the average cross-sectional R² is 1.2%, equivalent to an 11% cross-sectional correlation between return predictions and actual returns. For comparison, the average correlation between language-model-based return forecasts and actual returns is around 2% in Jiang, Kelly, and Xiu (2022). I also examine the relation between return forecasts for day t + 1 and realized returns on day t + 2 (i.e., skipping one day). For this test, the slope coefficient is 0.09 and the R² is 0.4%, which translates into a 6% correlation. The slopes in both tests are highly significant, with t-statistics over 10. Overall, StockGPT forecasts track future returns well even after one day.
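For readers who want to replicate the test, a sketch of the Fama-MacBeth procedure in equation (3), assuming a pandas DataFrame with illustrative column names `date`, `ret_next` (the realized t + 1 return), and `forecast` (the StockGPT prediction):

```python
import pandas as pd
import statsmodels.api as sm

def fama_macbeth(df):
    """Run equation (3) cross-sectionally each day; average slopes and R^2."""
    slopes, r2s = [], []
    for _, day in df.groupby("date"):
        X = sm.add_constant(day["forecast"])
        fit = sm.OLS(day["ret_next"], X).fit()
        slopes.append(fit.params["forecast"])   # b_t
        r2s.append(fit.rsquared_adj)            # adjusted R^2_t
    # The paper's t-statistic would then be computed on the time series of
    # slopes with a Newey-West correction.
    return pd.Series(slopes).mean(), pd.Series(r2s).mean()
```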

4.2 Portfolio Sorting

The main empirical analysis examines the trading implications of StockGPT forecasts. On each trading day t, I buy the top stock decile having the highest return forecasts for t + 1 (High portfolio) and sell the bottom decile having the lowest return forecasts for t + 1 (Low portfolio). To avoid trading only micro-cap stocks, I remove stocks having a market value below the 10th percentile each day. In Table 3, under equal weighting (EW), this long-short portfolio yields an average annual return of 119% with a Sharpe ratio (mean/standard deviation) of 6.5.[10] If I remove stocks having prices below $1, $3, and $5, the mean returns (and Sharpe ratios) are 110% (6.3), 86% (5.2), and 74% (4.7), respectively. The left graph in Panel A of Figure 2 plots the log cumulative returns of these four long-short portfolios. These portfolios show a consistent upward trend throughout the 2001-2023 sample, with the biggest jump in 2009 after the financial crisis. The right graph in Panel A plots the cumulative returns of each long/short leg of the portfolios. It is clear that StockGPT can symmetrically predict both rising and falling stocks.

[10] Without the market cap restriction, the StockGPT-based portfolio would earn 230% annually with a Sharpe ratio of 10. On the other hand, if stocks having a market cap below the 30th percentile are removed, the resulting portfolio earns 50% annually with a Sharpe ratio of 2.9.

While the annual return of 119% under equal weighting in the baseline model is before transaction costs, under the hypothetical worst-case scenario in which the portfolio replaces all of its constituents every day (i.e., a turnover of 400% in a long-short portfolio) and each trade costs 5 basis points, the StockGPT-based strategy still realizes an annual return of 69% net of transaction costs.
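(As a back-of-the-envelope check, assuming 252 trading days per year: 400% turnover × 5 basis points = 0.2% of cost per day, or roughly 0.2% × 252 ≈ 50% per year, and 119% − 50% ≈ 69%.)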

Under value weighting (VW), the long-short portfolio without a price filter (but still after removing the bottom decile based on market cap) earns an annual return of 27% and a Sharpe ratio of 1. The Sharpe ratios of the portfolios with price filters are 1, 0.9, and 0.8 for the $1, $3, and $5 price thresholds, respectively. Since value weighting gives more weight to large-cap stocks, this result indicates that StockGPT is more effective at forecasting returns of small-cap stocks. This is expected since small-cap stocks are more likely to be mispriced (Baker and Wurgler 2006).

Table 3 also reports the portfolio performance when return forecasts for t + 1 are used to form the portfolio for t + 2. Under equal weighting, this skipping-one-day portfolio earns 26% annually with a Sharpe ratio of 1.7. Panel B of Figure 2 shows that when one day is skipped, StockGPT forecasts track the returns in the long leg better than in the short one.
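A sketch of the daily decile sort under equal weighting, again assuming a DataFrame with illustrative columns `date`, `ret_next`, `forecast`, and `mcap`:

```python
import pandas as pd

def daily_long_short(df):
    """Equal-weighted high-minus-low decile portfolio from StockGPT forecasts."""
    daily_rets = []
    for date, day in df.groupby("date"):
        # Drop micro caps: stocks below the 10th percentile of market value.
        day = day[day["mcap"] >= day["mcap"].quantile(0.10)]
        deciles = pd.qcut(day["forecast"], 10, labels=False, duplicates="drop")
        top, bottom = deciles.max(), deciles.min()
        long_ret = day.loc[deciles == top, "ret_next"].mean()     # High portfolio
        short_ret = day.loc[deciles == bottom, "ret_next"].mean() # Low portfolio
        daily_rets.append({"date": date, "hml": long_ret - short_ret})
    return pd.DataFrame(daily_rets).set_index("date")["hml"]
```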

4.3 Spanning Test

Since StockGPT uses only historical market price data to make return forecasts, it is important to examine how the StockGPT-based portfolio relates to prominent trading strategies based on historical returns. The three most notable patterns are short-term reversal (using returns from month t − 1) by Jegadeesh (1990), momentum (using returns from months t − 2 to t − 12) by Jegadeesh and Titman (1993), and long-term reversal (using returns from months t − 13 to t − 60) by De Bondt and Thaler (1985). It is also interesting to examine how StockGPT performs relative to leading stock factors such as the five factors of Fama and French (2015) and the investment-based q5 factors of Hou et al. (2021).[11]

As is standard in asset pricing research, to examine whether a strategy earns abnormal returns relative to a set of other traded factors, we can perform the following contemporaneous regression:

y_t = α + β × x_t + e_t    (4)

where y_t is the return of the target strategy and x_t is the set of benchmark factors. If α is significant, then y_t earns abnormal returns relative to x_t; otherwise, y_t is spanned (or encompassed) by x_t. This test is also referred to as the spanning test.
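A sketch of regression (4) with Newey-West standard errors, assuming `y` is a strategy return series and `X` a DataFrame of factor returns aligned on the same dates (names are illustrative):

```python
import statsmodels.api as sm

def spanning_test(y, X, lags=20):
    """Regress a strategy on benchmark factors; a significant alpha means the
    strategy is not spanned. lags=20 matches the daily tables; the monthly
    tables use 4 lags."""
    fit = sm.OLS(y, sm.add_constant(X)).fit(
        cov_type="HAC", cov_kwds={"maxlags": lags}  # Newey-West standard errors
    )
    return fit.params["const"], fit.tvalues["const"], fit  # alpha, t(alpha), full fit
```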

Panel A of Table 4 reports the results of the spanning tests in which y_t is the daily return on the StockGPT-based portfolios and x_t is the set of benchmark factors. Both the equal-weighted and value-weighted StockGPT portfolios earn sizable and highly significant alphas relative to all 11 benchmark factors (t-statistics are greater than 10 for EW and greater than 3 for VW).

In Panel B, I test whether the StockGPT portfolios span the benchmark factors. The equal-weighted StockGPT portfolio spans momentum, long-term reversal, value, and size, while the value-weighted StockGPT portfolio spans 9 out of 11 factors, all except profitability from the Fama-French model and earnings growth from the q5 model. That the value-weighted StockGPT portfolio better spans the other factors than the equal-weighted one does is expected since those factors are also value-weighted.

Overall, the spanning tests show that when we let the stock data speak for itself via the attention mechanism in StockGPT, handcrafted price-based strategies such as short-term reversal, momentum, and long-term reversal are no longer needed. Notably, although StockGPT only learns from historical returns over the past 12 months, it completely encompasses the long-term reversal pattern based on returns beyond the past 12 months. Furthermore, the StockGPT-based portfolios also encompass most leading stock factors.

[11] The momentum, reversal, and five factors of Fama and French (2015) are available at https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, while the q5 factors of Hou et al. (2021) are at https://global-q.org/factors.html.
5 Results: Monthly Prediction

While the daily results show proof of concept that StockGPT can deliver strong investment performance, it is costly and challenging to trade hundreds of stocks on a daily basis. In this section, I examine whether StockGPT can be used to make longer-term forecasts to build lower-frequency portfolios.

There are two ways to produce long-term return forecasts over the next m days from StockGPT. The first approach is to produce several paths of forecasts x_j = (x_{t+1}, x_{t+2}, ..., x_{t+m}) and average across the paths to compute the expected return over the next m days. However, this approach is computationally expensive since there are on average 3,000 to 4,000 stocks traded in the cross section; for each stock on each rebalance day, we would need to generate many m-day forecast paths and average them.

The second approach is to train a new StockGPT model where the training target is the return over the next m days (i.e., p(x̄_{t+1→t+m} | x_t, ..., x_{t-255}), where x̄_{t+1→t+m} is the mean return over the next m days). I pursue this approach in this section. Specifically, I train a StockGPT model using historical returns to predict mean returns over the next 20 days (20 days approximates the number of trading days in a month), with all other specifications kept unchanged from the daily model. As before, the model is trained only once using data up to 2000. During the testing phase, at the end of each month for each stock, an input sequence of the 256 previous daily returns for that stock is used by the new StockGPT model to predict the next 20-day return (i.e., the return over the next month).
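A sketch of how such a multi-horizon training target could be constructed; this is my own illustration of the idea, not the paper's code. Each input day is paired with the mean return over the following 20 days, which would then be discretized into the 402 bins of Table 1:

```python
import numpy as np

def mean_return_targets(returns, horizon=20):
    """For each day t, compute the mean return over days t+1 .. t+horizon.

    returns: 1-D array of a stock's daily returns in chronological order.
    The last `horizon` days have no complete forward window and are dropped.
    """
    fwd = np.lib.stride_tricks.sliding_window_view(returns[1:], horizon).mean(axis=1)
    inputs = returns[: len(fwd)]   # x_t aligned with its 20-day forward mean
    return inputs, fwd             # both would then be discretized into return bins
```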

5.1 Fama–MacBeth Regression

To evaluate the quality of the long-term forecasts by StockGPT, I regress realized monthly returns onto the 20-day return forecasts via the Fama-MacBeth test discussed above. As reported in Table 5, the average slope coefficient is 3 (significant at 5%), indicating that a cross-sectional difference of 100 basis points (i.e., 1%) in StockGPT return forecasts signals a difference of 300 basis points in realized returns. Moreover, the average cross-sectional R² is 0.55%, equivalent to a 7.4% cross-sectional correlation between return predictions and actual returns. When one month is skipped between the 20-day return forecasts and realized returns, the correlation shrinks to zero.
5.2 Portfolio Sorting

I then form monthly rebalanced long-short decile portfolios using the 20-day return forecasts and report the performance statistics over 2001-2023 in Panel A of Table 6. The equal-weighted portfolios, after removing the bottom stock decile based on market value, earn about 13% annually, significant at 1%, with Sharpe ratios around 1. To ensure the tradability of the strategy, I further remove stocks below the 30th market-cap percentile, and the performance remains almost unchanged. When the bottom half of all stocks is removed, the annual mean return falls to about 10%, with the Sharpe ratio falling to about 0.7. When only NYSE stocks are used, the average annual return is 15% with a Sharpe ratio of 0.9.

For comparison, in Panel B of Table 6, I report the summary statistics for the 11 common stock factors. Only 5 factors yield significant returns: short-term reversal, market, and profitability from Fama and French (2015), and return on equity and earnings growth from Hou et al. (2021). Among these factors, short-term reversal yields the strongest result, with a mean return of 8.8% and a Sharpe ratio of 0.7. It is clear that the StockGPT portfolios across different specifications outperform these factors.

Panel A of Figure 3 plots the log cumulative returns on the StockGPT portfolios. The long-short portfolios see a stable upward trend from 2001 to 2023. Between the two legs of the strategy, StockGPT does better at predicting the future winners. Panel B plots the log cumulative returns of the baseline StockGPT portfolio and the 11 stock factors. Among the factors, short-term reversal outperformed StockGPT before the financial crisis but has lagged StockGPT by a large extent since then.

5.3 Spanning Test

Table 7 reports the spanning tests. In Panel A, the monthly-rebalanced equal-weighted StockGPT portfolio earns a significant annual alpha of about 15% (t-statistic of 4.7) against all of the factors. This suggests that StockGPT represents a new AI pricing effect not captured by standard factor models. In Panel B, I check whether the stock factors earn alphas against StockGPT. Long-term reversal, value, and investment (from both Fama and French (2015) and Hou et al. (2021)) are subsumed by StockGPT. In addition, the market alpha against StockGPT is only marginally significant at the 10% level.

Overall, while the investment performance of the monthly StockGPT model is far less impressive than that of the daily model, it still outperforms all of the standard stock factors and yields highly significant alphas. The monthly results confirm that StockGPT can be used in practice to implement tradable strategies.

6 Conclusion

This paper introduces StockGPT, a decoder-only transformer model trained directly on U.S. stock returns. Instead of relying on manually crafted trading patterns based on historical stock prices, StockGPT automatically learns the hidden patterns most predictive of future returns via its attention mechanism. Even though it is trained on daily returns only up to 2000, StockGPT delivers strong performance up to 23 years later. The StockGPT-based portfolios encompass common price-based trading strategies such as momentum and long-/short-term reversals and span several leading stock factors such as market, size, value, and investment.

StockGPT can be enhanced in several ways. First, StockGPT should be retrained frequently as new stock data arrive to maintain its performance. Second, StockGPT as introduced in this paper is a light-weight model with only around one million parameters; a natural extension is to examine bigger models having more granular return intervals, a longer block size, a bigger embedding size, and more layers of attention blocks. Third, examining long-term forecasts from the daily StockGPT model (as discussed in Section 5) can be a fruitful direction. Finally, training StockGPT with higher-frequency data such as tick data may yield promising results.
References

Baker, M., & Wurgler, J. (2006). Investor sentiment and the cross-section of stock returns. The Journal of Finance, 61(4), 1645–1680.

Baldassarre, M. T., Caivano, D., Fernandez Nieto, B., Gigante, D., & Ragone, A. (2023). The social impact of generative AI: An analysis on ChatGPT. Proceedings of the 2023 ACM Conference on Information Technology for Social Good, 363–373.

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

De Bondt, W. F., & Thaler, R. (1985). Does the stock market overreact? Journal of Finance, 40(3), 793–805.

Dell'Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Technology & Operations Mgt. Unit Working Paper, (24-013).

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Fama, E. F., & French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1), 1–22.

Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607–636.

Hou, K., Mo, H., Xue, C., & Zhang, L. (2021). An augmented q-factor model with expected growth. Review of Finance, 25(1), 1–41.

Jegadeesh, N. (1990). Evidence of predictable behavior of security returns. Journal of Finance, 45(3), 881–898.

Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance, 48(1), 65–91.

Jiang, J., Kelly, B. T., & Xiu, D. (2022). Expected returns and large language models. Available at SSRN.

Kelly, B., Xiu, D., et al. (2023). Financial machine learning. Foundations and Trends® in Finance, 13(3-4), 205–363.

Kirtac, K., & Germano, G. (2024). Sentiment trading with large language models. Finance Research Letters.

Kolm, P. N., Turiel, J., & Westray, N. (2023). Deep order flow imbalance: Extracting alpha at multiple horizons from the limit order book. Mathematical Finance, 33(4), 1044–1081.

Lopez-Lira, A., & Tang, Y. (2023). Can ChatGPT forecast stock price movements? Return predictability and large language models. arXiv preprint arXiv:2304.07619.

Mannuru, N. R., Shahriar, S., Teel, Z. A., Wang, T., Lund, B. D., Tijani, S., Pohboon, C. O., Agbaji, D., Alhassan, J., Galley, J., et al. (2023). Artificial intelligence in developing countries: The impact of generative artificial intelligence (AI) technologies for development. Information Development.

Newey, W. K., & West, K. D. (1987). Hypothesis testing with efficient method of moments estimation. International Economic Review, 777–787.

Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187–192.

Otis, N., Clarke, R. P., Delecourt, S., Holtz, D., & Koning, R. (2023). The uneven impact of generative AI on entrepreneurial performance. Available at SSRN 4671369.

Sætra, H. S. (2023). Generative AI: Here to stay, but for good? Technology in Society, 75, 102372.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D., & Mann, G. (2023). BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564.

Yang, Y., UY, M. C. S., & Huang, A. (2020). FinBERT: A pretrained language model for financial communications.

Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., et al. (2022). OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
[Figure 1: StockGPT Architecture. The diagram shows the decoder stack: input tokens (x_1, ..., x_t) pass through positional + token embedding, four attention blocks, a linear layer, and a softmax that outputs next-token predictions (x_2, ..., x_{t+1}).]
[Figure 2: Daily Cumulative Returns. Panel A: Next Day (left: High-Low portfolio; right: High and Low portfolios). Panel B: Skipping 1 Day (left: High-Low portfolio; right: High and Low portfolios).]

This figure plots the log cumulative returns of the long-short high-minus-low (HML), high (H), and low (L) portfolios formed from StockGPT return forecasts. Portfolios are equal-weighted and formed after excluding stocks in the bottom decile based on market value. Different portfolios correspond to different market price thresholds under which stocks are excluded. Panel A (B) shows results when return forecasts for day t + 1 are used to form portfolios for day t + 1 (t + 2). The left (right) panel shows results for the long-short portfolios (each leg). The sample is daily from January 2001 to December 2023.
[Figure 3: Monthly Cumulative Returns. Panel A: StockGPT Portfolios (left: High-Low portfolio; right: High and Low portfolios). Panel B: Stock Factors.]

Panel A plots the log cumulative returns of the long-short high-minus-low (HML), high (H), and low (L) portfolios formed from StockGPT return forecasts. Portfolios are equal-weighted and formed after excluding stocks in the bottom decile based on market value. Different portfolios correspond to different market price thresholds under which stocks are excluded. Panel B plots the log cumulative returns of the StockGPT-based portfolio and stock factors, including short-term reversal (ST Rev), momentum (Mom), long-term reversal (LT Rev), market (MKT), value (HML), size (SMB), profitability (RMW), and investment (CMA) from Fama and French (2015), as well as investment (R IA), return on equity (R ROE), and earnings growth (R EG) from Hou et al. (2021). The sample is monthly from February 2001 to December 2023.
Table 1: Return Bins

This table shows the return bins in basis points, bin midpoints, and the corresponding bin indexes.

Return Bin          Bin Midpoint   Bin Index
(-Inf, -10,000]     -10,000        0
(-10,000, -9,950]   -9,975         1
(-9,950, -9,900]    -9,925         2
...                 ...            ...
(9,950, 10,000]     9,975          400
(10,000, +Inf)      10,000         401

Table 2: Daily Fama-MacBeth Regression

This table reports the time-series averages of slopes and adjusted R²'s of the following cross-sectional regression

x_{i,t+1} = a_t + b_t × x̂_{i,t+1} + e_{i,t+1}

where x_{i,t+1} is the actual realized return of stock i on day t + 1 and x̂_{i,t+1} is its StockGPT return forecast. Returns are in basis points and R² in percentage points. t is the t-statistic of the time-series mean of b_t computed using Newey and West (1987) standard errors with 20 lags. Horizon 1 (2) means comparing return forecasts for t + 1 with actual returns on t + 1 (t + 2). The sample is daily from January 2001 to December 2023.

Horizon   b      t       R²
1         0.50   25.18   1.19
2         0.09   10.20   0.41

Table 3: Daily Portfolio Statistics

This table reports the return statistics of the daily long-short StockGPT-based portfolios. Mean and SD (standard deviation) are in annualized percentage points; Mean/SD (Sharpe ratio) is annualized; Min, Max, and MDD (max drawdown) are in percentage points; and t-Mean is the t-statistic of the mean portfolio return using Newey-West standard errors with 20 lags. Portfolios are formed after excluding stocks in the bottom decile based on market value. Horizon 1 (2) refers to using return forecasts for day t + 1 to form portfolios for day t + 1 (t + 2). EW (VW) refers to equal weighting (value weighting). Price Filter refers to the price level under which stocks are removed. The sample is daily from January 2001 to December 2023.
Horizon Weight Price Filter Mean t-Mean SD Mean/SD Min Max MDD
1 EW 0 119.1 13.6 18.2 6.5 -9.8 18.3 -23.7
1 EW 1 110.4 13.8 17.5 6.3 -9.4 18.0 -25.4
1 EW 3 85.8 13.6 16.4 5.2 -8.9 17.2 -28.7
1 EW 5 73.8 13.6 15.8 4.7 -9.0 15.3 -28.5
1 VW 0 27.0 4.3 26.4 1.0 -15.0 20.3 -76.3
1 VW 1 25.4 4.3 25.6 1.0 -13.9 19.8 -74.2
1 VW 3 21.3 3.9 24.0 0.9 -13.2 17.2 -69.8
1 VW 5 18.8 3.7 23.2 0.8 -12.9 15.1 -73.0
2 EW 0 26.1 7.6 15.1 1.7 -7.3 15.4 -35.4
2 EW 1 25.6 7.7 14.8 1.7 -7.2 16.0 -33.8
2 EW 3 21.9 7.1 14.6 1.5 -8.6 14.8 -30.5
2 EW 5 19.5 6.7 14.3 1.4 -8.8 14.3 -26.5
2 VW 0 5.8 1.2 25.4 0.2 -16.6 18.7 -62.6
2 VW 1 5.8 1.2 24.7 0.2 -15.4 17.8 -63.1
2 VW 3 2.6 0.6 23.0 0.1 -15.4 16.1 -58.1
2 VW 5 1.8 0.4 22.0 0.1 -14.8 14.3 -57.9

Table 4: Daily Spanning Tests

This table reports results of the following spanning test

y_t = α + β × x_t + e_t

In Panel A, y_t is one of the StockGPT-based portfolios and x_t are short-term reversal (ST Rev), momentum (Mom), long-term reversal (LT Rev), market (MKT), value (HML), size (SMB), profitability (RMW), and investment (CMA) from Fama and French (2015), as well as investment (R IA), return on equity (R ROE), and earnings growth (R EG) from Hou et al. (2021). In Panel B, y_t is one of the factors and x_t is one of the StockGPT-based portfolios. α is in annualized percentage points and R² is in percentage points. tβ is computed with Newey-West standard errors using 20 lags. The sample is daily from January 2001 to December 2023.

Panel A: Stock Factors Span StockGPT


β tβ β tβ β tβ β tβ
EW α 111.12 13.73 112.38 13.76 116.80 13.31 116.53 13.17
EW ST Rev 0.57 11.66 0.61 11.14
EW Mom 0.06 2.48 0.00 0.08
EW LT Rev -0.12 -2.33 -0.05 -1.38
EW MKT 0.10 5.54 0.23 7.99 0.24 8.32
EW HML 0.09 1.84 0.05 0.80
EW SMB 0.04 0.86 0.01 0.13 0.02 0.26
EW RMW -0.02 -0.24 -0.05 -0.94
EW CMA -0.08 -0.54 -0.20 -1.87
EW R IA 0.14 1.16 -0.14 -1.65
EW R ROE -0.13 -1.57 0.03 0.39
EW R EG 0.18 2.70 -0.05 -0.46
EW R2 27.26 25.74 7.64 7.47
VW α 17.41 3.11 18.64 3.25 25.52 4.14 25.60 4.07
VW ST Rev 0.85 12.81 0.84 11.45
VW Mom 0.09 1.97 0.10 2.27
VW LT Rev -0.18 -2.56 -0.13 -2.09
VW MKT 0.03 1.22 0.21 4.48 0.24 4.76
VW HML 0.22 2.58 0.14 1.12
VW SMB -0.15 -2.76 -0.21 -2.33 -0.18 -2.33
VW RMW 0.02 0.25 0.14 1.44
VW CMA -0.10 -0.44 -0.38 -2.07
VW R IA 0.06 0.28 -0.26 -2.01
VW R ROE 0.15 1.41 0.40 3.69
VW R EG 0.10 1.05 -0.28 -1.55
VW R2 24.79 23.37 3.74 4.17
Panel B: StockGPT Spans Stock Factors
ST Rev Mom LT Rev MKT HML SMB RMW CMA R IA R ROE R EG
EW α -40.19 9.23 5.33 -25.98 -0.89 -0.90 10.20 6.72 6.68 9.37 8.85
EW tα -6.11 1.57 1.44 -3.47 -0.18 -0.26 4.30 3.05 2.66 3.27 2.71
EW β 0.42 -0.06 -0.04 0.29 0.02 0.03 -0.05 -0.04 -0.04 -0.05 -0.03
EW tβ 9.19 -1.92 -2.13 7.51 0.59 1.39 -4.75 -4.05 -3.50 -3.59 -1.82
EW R2 25.69 0.45 0.66 7.28 0.04 0.27 1.27 1.26 1.09 0.96 0.60
VW α 2.17 1.96 1.21 5.30 0.74 3.03 4.50 2.61 2.38 3.45 5.10
VW tα 0.75 0.48 0.52 1.28 0.26 1.46 2.46 1.66 1.39 1.52 2.64
VW β 0.27 -0.00 -0.04 0.12 0.01 -0.02 -0.00 -0.02 -0.02 0.01 -0.00
VW tβ 8.18 -0.14 -2.50 3.32 0.53 -1.51 -0.12 -2.99 -2.65 0.99 -0.28
VW R2 22.82 -0.01 0.95 2.38 0.05 0.28 -0.02 0.85 0.65 0.12 0.00

Table 5: Monthly Fama-MacBeth Regression

This table reports the time-series averages of slopes and adjusted R²'s of the following cross-sectional regression

x_{i,t+1} = a_t + b_t × x̂_{i,t+1} + e_{i,t+1}

where x_{i,t+1} is the actual realized return of stock i in month t + 1 and x̂_{i,t+1} is its StockGPT return forecast. Returns are in basis points and R² in percentage points. t is the t-statistic of the time-series mean of b_t computed using Newey and West (1987) standard errors with 4 lags. Horizon 1 (2) means comparing return forecasts for month t + 1 with actual returns in month t + 1 (t + 2). The sample is monthly from February 2001 to December 2023.

Horizon   b       t       R²
1         3.01    2.49    0.55
2         -0.08   -0.08   0.43

Table 6: Monthly Portfolio Statistics

Panel A reports the return statistics of the monthly equal-weighted long-short StockGPT-based portfolios. MCap Filter refers to the monthly market-cap percentile under which stocks are removed and Price Filter refers to the price level under which stocks are removed. Panel B reports the return statistics of stock factors including short-term reversal (ST Rev), momentum (Mom), long-term reversal (LT Rev), market (MKT), value (HML), size (SMB), profitability (RMW), and investment (CMA) from Fama and French (2015), as well as investment (R IA), return on equity (R ROE), and earnings growth (R EG) from Hou et al. (2021). Mean and SD (standard deviation) are in annualized percentage points; Mean/SD (Sharpe ratio) is annualized; Min, Max, and MDD (max drawdown) are in percentage points; and t-Mean is the t-statistic of the mean portfolio return using Newey-West standard errors with 4 lags. The sample is monthly from February 2001 to December 2023.

Panel A: StockGPT Portfolios


Exchange MCap Filter Price Filter Mean t-Mean SD Mean/SD Min Max MDD
All 10 0 13.5 3.8 14.9 0.9 -14.3 18.4 -35.0
All 10 1 13.5 4.2 13.5 1.0 -14.3 17.1 -27.2
All 10 3 13.1 4.1 13.6 1.0 -15.6 12.0 -20.6
All 10 5 12.5 3.9 13.8 0.9 -14.5 12.7 -25.5
All 30 0 13.6 3.6 15.7 0.9 -18.0 25.4 -33.5
All 30 1 13.2 3.7 15.2 0.9 -17.9 23.4 -29.5
All 30 3 12.2 3.7 14.5 0.8 -17.3 13.9 -20.2
All 30 5 11.8 3.7 14.2 0.8 -15.1 13.0 -24.6
All 50 0 10.2 3.1 15.4 0.7 -19.1 18.3 -21.0
All 50 1 9.9 3.1 15.1 0.7 -19.0 17.0 -20.2
All 50 3 9.2 2.9 14.8 0.6 -18.9 13.2 -21.5
All 50 5 8.5 2.7 14.8 0.6 -17.6 12.6 -23.9
NYSE 0 0 14.7 3.2 18.5 0.8 -14.4 35.4 -21.3
NYSE 0 1 16.0 4.0 17.4 0.9 -13.6 30.8 -20.5
NYSE 0 3 13.8 3.9 16.9 0.8 -13.9 22.8 -22.1
NYSE 0 5 11.5 3.5 16.1 0.7 -15.0 17.0 -26.0

Panel B: Stock Factors


Factor Mean t-Mean SD Mean/SD Min Max MDD
ST Rev 8.8 4.0 13.0 0.7 -15.2 21.7 -16.0
Mom 2.4 0.7 16.4 0.1 -26.8 16.9 -56.3
LT Rev 0.5 0.2 10.3 0.0 -9.3 13.1 -56.4
MKT 7.5 2.2 15.9 0.5 -17.2 13.6 -51.4
HML 1.2 0.5 11.3 0.1 -14.6 13.8 -56.0
SMB 2.2 1.2 9.4 0.2 -9.2 7.1 -29.9
RMW 4.9 2.8 8.1 0.6 -8.4 10.1 -22.0
CMA 2.3 1.3 7.2 0.3 -6.7 10.6 -23.7
R IA 2.0 1.1 7.8 0.3 -6.7 11.2 -29.4
R ROE 4.3 1.9 9.9 0.4 -12.0 10.4 -29.3
R EG 5.3 2.8 8.4 0.6 -8.4 14.3 -23.6

Table 7: Monthly Spanning Tests

This table reports results of the following spanning test

y_t = α + β × x_t + e_t

In Panel A, y_t is the equal-weighted monthly-rebalanced StockGPT-based portfolio and x_t are short-term reversal (ST Rev), momentum (Mom), long-term reversal (LT Rev), market (MKT), value (HML), size (SMB), profitability (RMW), and investment (CMA) from Fama and French (2015), as well as investment (R IA), return on equity (R ROE), and earnings growth (R EG) from Hou et al. (2021). In Panel B, y_t is one of the factors and x_t is the StockGPT-based portfolio. α is in annualized percentage points and R² is in percentage points. tβ is computed with Newey-West standard errors using 4 lags. The sample is monthly from February 2001 to December 2023.

Panel A: Stock Factors Span StockGPT

β tβ β tβ β tβ β tβ
α 15.58 4.66 14.94 4.16 15.43 4.79 15.27 4.87
ST Rev -0.06 -0.66 -0.08 -0.77
Mom -0.21 -1.82 -0.25 -1.99
LT Rev -0.49 -3.25 -0.31 -2.71
MKT 0.04 0.70 0.08 1.10 0.06 0.84
HML 0.09 0.51 0.01 0.10
SMB -0.18 -1.54 -0.34 -2.75 -0.40 -2.93
RMW -0.32 -1.72 -0.34 -1.76
CMA -0.46 -0.93 -0.05 -0.23
R IA 0.78 1.73 0.03 0.16
R ROE 0.02 0.11 -0.31 -1.25
R EG 0.02 0.10 -0.01 -0.05
R2 12.82 9.65 4.94 4.72
Panel B: StockGPT Spans Stock Factors
ST Rev Mom LT Rev MKT HML SMB RMW CMA R IA R ROE R EG
α 9.00 5.83 2.33 6.15 2.14 3.35 6.06 2.84 2.26 5.44 5.69
tα 4.03 1.98 0.97 1.65 0.74 1.91 3.10 1.50 1.11 2.77 2.95
β -0.02 -0.26 -0.14 0.10 -0.07 -0.08 -0.08 -0.04 -0.02 -0.09 -0.03
tβ -0.21 -1.60 -2.03 1.04 -0.78 -1.40 -1.39 -0.77 -0.33 -0.99 -0.46
R2 -0.33 5.05 3.46 0.53 0.43 1.34 2.08 0.36 -0.26 1.28 -0.17

A Daily Model Trained with All Stocks

Table A.1: Daily Portfolio Statistics

This table reports the return statistics of the daily long-short portfolios based on the StockGPT model trained on all three exchanges. Mean and SD (standard deviation) are in annualized percentage points; Mean/SD (Sharpe ratio) is annualized; Min, Max, and MDD (max drawdown) are in percentage points; and t-Mean is the t-statistic of the mean portfolio return using Newey-West standard errors with 20 lags. Portfolios are formed after excluding stocks in the bottom decile based on market value. Horizon 1 (2) refers to using return forecasts for day t + 1 to form portfolios for day t + 1 (t + 2). EW (VW) refers to equal weighting (value weighting). Price Filter refers to the price level under which stocks are removed. The sample is daily from January 2001 to December 2023.
Horizon Weight Price Filter Mean t-Mean SD Mean/SD Min Max MDD
1 EW 0 82.5 10.6 16.4 5.0 -9.0 13.8 -24.4
1 EW 1 71.1 10.3 15.7 4.5 -8.4 13.4 -25.9
1 EW 3 48.0 8.9 15.0 3.2 -9.3 14.1 -37.4
1 EW 5 36.3 7.8 14.7 2.5 -9.3 13.7 -47.7
1 VW 0 18.7 3.5 24.9 0.8 -16.9 17.7 -61.5
1 VW 1 17.1 3.3 24.4 0.7 -14.0 15.7 -62.8
1 VW 3 12.6 2.6 23.2 0.5 -13.4 17.7 -57.7
1 VW 5 10.4 2.3 22.4 0.5 -12.5 15.9 -58.7
2 EW 0 16.3 4.9 13.9 1.2 -8.6 13.7 -40.8
2 EW 1 14.7 4.6 13.7 1.1 -9.2 14.3 -37.4
2 EW 3 12.9 4.3 13.5 1.0 -9.9 13.9 -31.7
2 EW 5 10.8 3.8 13.4 0.8 -10.1 13.5 -29.2
2 VW 0 -0.6 -0.1 23.7 -0.0 -12.3 17.4 -75.1
2 VW 1 -1.1 -0.2 23.0 -0.0 -11.6 17.0 -78.5
2 VW 3 -3.6 -0.8 21.8 -0.2 -11.1 15.3 -84.7
2 VW 5 -4.7 -1.0 21.2 -0.2 -11.0 15.2 -87.1

