Portfolio Construction and Risk Management
Anton Vorobets
This book is written based on the need to bridge quantitative analysis and investment management
in practice, particularly focusing on portfolio construction and risk management. The approach can
be characterized as being quantamental, i.e., a mix of quantitative analysis and human and machine
interaction. It is based on the philosophy that some things are really hard for machines and statistical
models to do, because these usually require a significant amount of stationary data to work well, while
humans are better at quickly developing an understanding from little or imperfect information.
Hence, most people likely have the highest probability of being successful using a quantamental
approach to investment management, but all the methods can be used for fully systematic investing.
The book does not attempt to theorize or quantify human behavior like some finance and economics
academics do with utility theory. Conventional utility theory has poor empirical support and limits
the analysis to methods that have very little to do with real-world market behavior. In relation to
investor preferences, our hypothesis is that investors characterize investment risk as large losses, not
squared deviations from the mean. It is very hard to reject this hypothesis in practice. We will in
particular focus on the Conditional Value-at-Risk (CVaR) investment risk measure, because it has
many nice properties and is easy to interpret. As a consequence, CVaR is becoming the preferred tail
risk measure among both investment managers and market makers.
The goal is to keep the book concise and to the point. It attempts to keep the mathematics
practically rigorous while referring to proofs of very technical results instead of replicating them. It
does not spend time on introducing general mathematical concepts, because these are readily available
in other great books. Hence, it is a prerequisite that you understand linear algebra, calculus including
convex constrained optimization with Lagrange multipliers, probability theory, multivariate time series,
and machine learning/artificial intelligence/econometrics/statistics. Finally, we will use Python and
in particular the fortitudo.tech¹ package for the code examples.
This book is cohesive in the sense that it gives you a complete investment risk and analysis frame-
work. The approach is scientific as it tries to address many of the complex nuances of real-world
investment markets. It is based on my experience managing complex multi-asset portfolios for insti-
tutional clients and my regular dialogues with institutional asset managers about the problems they
experience in practice.
The book will naturally draw on my articles², while presenting the investment framework in a more
coherent way to make it perfectly clear how it should be used from start to finish, including its subtle
nuances.

¹ Available at https://github.com/fortitudo-tech/fortitudo.tech
² Available at https://ssrn.com/author=2738420

I omit aspects related to signal extraction and security selection alpha. You should rather see
the framework from this book as the general principles for a good investment calculator, which allows
you to get the most out of your investment risk budget and, hence, generate portfolio construction
and risk management alpha. Alpha related to security selection and market timing is so elusive that I
would trade it myself and not tell anyone how I did it.
This book is very different from the current mainstream academic finance and economics books
that continue to spend time on theories and methods like utility theory, CAPM, mean-variance, and
Black-Litterman. I think these methods do more harm than good in practice due to their highly
unrealistic market assumptions. However, they are still being taught mainly due to academic and
commercial vested interests. We will occasionally use mean-variance to build intuition in the idealized
case, but it is never recommended to actually use mean-variance to manage portfolios in practice. The
elliptical distribution assumption is grossly oversimplifying when it comes to representing the dynamics
of real-world markets well. All methods presented in this book operate directly on fully general Monte
Carlo distributions with associated joint probability vectors.
The book is written for investment practitioners who employ a scientific approach to investment
management that potentially combines data with human discretion. By scientific, I mean constantly
testing the theories and methods against real-world market behavior. Readers are generally encouraged
to critically examine the content of this book and suggest improvements that will make the framework
even better for skillfully navigating investment markets in practice. Hypothetical issues or introduction
of constraints due to theories with poor empirical support will not be considered.
The PDF version of the book and the accompanying Python code are provided free of charge online;
crowdfunding made writing it possible. You can still support this project and get perks such
as recognition in the acknowledgments section below or personal access to trying out an institutional-
grade implementation of the investment framework. Top 10 contributors by February 3, 2025 will
have their name written in this preface (if they wish) in addition to getting three months access to
an institutional-grade implementation of the investment framework described in this book. Finally,
all significant contributors will receive a one-year complimentary Substack subscription containing
exclusive case studies. It is a great opportunity to master the theory and methods while engaging with
a community of others.
The book occasionally uses proprietary software for some of the case studies and examples. I
decided to do this in order to show readers some of the more advanced analysis that they can perform
using the methods from this book, although the accompanying code cannot be provided. In most cases,
the code is available, and it is an integral part of studying the content of this book, but readers should
not expect the accompanying code to be of production quality. The accompanying code is given
without any warranty under the GPLv3 license. The author cannot be made liable for any potential
damages stemming from using the code.
To support this project and potentially receive perks for it, see the crowdfunding campaign at:
https://igg.me/at/pcrm-book
It doesn’t matter how beautiful your theory is. If it disagrees with experiment, it’s wrong.
- Richard Feynman
Acknowledgments
This book has been written through a crowdfunding campaign at: https://igg.me/at/pcrm-book
Contributors have also often shared their feedback on the content, making the book better and easier
to understand for all readers. All contributions are highly appreciated; some people have contributed
significant amounts and allowed their names to be shared in this section. Until February 3, 2025, the
lists below will be updated and expanded. The top 10 contributors will get access to exploring an
institutional-grade implementation of the investment framework and can really test their newly
acquired knowledge in practice. Potential ties among the top 10 contributors are settled on a
first-come, first-served basis, giving early contributors an advantage.
A special thank you goes to Laura Kristensen for thoroughly reading this book and providing ex-
tensive feedback.
Top 10 Contributors
See the current top 10 at: https://antonvorobets.substack.com/p/pcrm-book
Platinum Contributors
Charlotte Hansen and Hitesh Sundesha.
Gold Contributors
Silver Contributors
Andrea Bordoni.
Bronze Contributors
Matteo Nobile, Roger McIntosh, Mark Brezina, Veliko Dinkov Donchev, Irina Johansen, Pedro Jorge
Feijoo Videira e Castro, Hans-Peter Schrei, Zarko Stefanovski, Radu Briciu, and four anonymous.
Licenses
The Portfolio Construction and Risk Management book © 2025 by Anton Vorobets is licensed under
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International. To view a copy of
this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
The accompanying code to the Portfolio Construction and Risk Management book © 2025 by Anton
Vorobets is licensed under version 3 of the GNU General Public License. To view a copy of this license,
visit https://www.gnu.org/licenses/gpl-3.0.en.html
Contents
3 Market Simulation 24
3.1 Stationary Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Projection of Stationary Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Time- and State-Dependent Resampling . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1.1 Multiple State Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.2 Generative Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.2.1 Variational Autoencoders (VAEs) . . . . . . . . . . . . . . . . . . . . . 41
3.2.2.2 Generative Adversarial Networks (GANs) . . . . . . . . . . . . . . . . . 44
3.2.3 Perspectives on No-Arbitrage and Stochastic Differential Equations . . . . . . . . 44
3.3 Computing Simulated Risk Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Simulation Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Better CVaR and Variance Optimization Backtesting . . . . . . . . . . . . . . . . . . . . 52
4 Instrument Pricing 55
4.1 Bond Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1.1 Nominal Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1.2 Inflation-Linked Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1.3 Credit Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Equity Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.1 Fundamental Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.2 Factor Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Demystifying Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 A Note on Forwards/Futures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.2 The Underlying as a Risk Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Dynamic Strategies and a Delta Hedging Case Study . . . . . . . . . . . . . . . . . . . . 62
4.5 Illiquid Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.6 Multi-Asset Pricing Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6 Portfolio Optimization 90
6.1 Exposures and Relative Market Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.2 CVaR vs Variance Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2.1 Solving CVaR Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.3 Risk Budgeting and Tracking Error Constraints . . . . . . . . . . . . . . . . . . . . . . . 97
6.3.1 Validating Portfolio Optimization Solvers . . . . . . . . . . . . . . . . . . . . . . 101
6.4 Parameter Uncertainty and Resampled Portfolio Stacking . . . . . . . . . . . . . . . . . 102
6.4.1 Return and Risk Stacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.4.2 Derivatives and Risk Factor Parameter Uncertainty . . . . . . . . . . . . . . . . 117
6.4.3 Multiple CVaR Levels Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.4.4 Perspectives on Resampled Portfolio Stacking Targets . . . . . . . . . . . . . . . 119
6.5 Portfolio Rebalancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Access
https://antonvorobets.substack.com/p/pcrm-book
https://github.com/fortitudo-tech/pcrm-book
Chapter 1
Our first task is to generate realistic prior simulations of the market R based on observed historical
market data D ∈ RT ×N . Realistic simulations are characterized by their ability to capture the stylized
market facts presented in Chapter 2 and ideally additional subtle nuances of real-world investment
markets. Another important point is that analysis and optimization methods must be able to handle
the fully general simulations R and associated probability vectors p and q. Mean-variance analysis
clearly fails in this regard, because it reduces the market to an elliptical distribution and only allows
for linear and cross-sectionally constant dependencies through the covariance matrix.
Note that the market representation in (1.1.1) and most of the analysis in this book are one-period
in nature, while we do not make any assumptions about living in a one-period world. Hence, there is
an implicit horizon in the matrix R. Sometimes, we will be explicit and write Rh , h = 1, 2, . . . , H, as
in Chapter 3 about market simulation, while maintaining the more concise notation throughout most
of the book. For dynamic investment strategies, it is important to simulate entire paths, while their
cumulative returns can still be analyzed in a meaningful way for some horizon h ∈ {1, 2, . . . , H}.
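To make the horizon notation concrete, here is a small sketch of how simulated one-period return paths R_h can be compounded into a cumulative return for a chosen horizon h. All names and numbers are illustrative (i.i.d. normal paths purely for demonstration), not the book's accompanying code:

```python
import numpy as np

# Illustrative path simulation: H periods, S scenarios, N instruments.
# R[h - 1] plays the role of the one-period return matrix R_h.
rng = np.random.default_rng(0)
H, S, N = 12, 10_000, 2
R = rng.normal(0.005, 0.04, size=(H, S, N))  # i.i.d. only for illustration

# Cumulative return up to horizon h, compounded across the first h periods.
# The result is an S x N matrix that can be analyzed like a one-period market.
h = 6
R_cum = np.prod(1 + R[:h], axis=0) - 1
```

The compounded matrix `R_cum` has the same scenario-by-instrument shape as the one-period market representation, which is what allows path simulations and one-period analysis to coexist.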
When generating Monte Carlo simulations of the market, the focus will be on generating simulations
of factors and then translating their simulations into instrument price and return simulations. We will
distinguish between two kinds of factors: risk factors and market factors. A risk factor is defined as
something that goes directly into the pricing function of an instrument, for example, US government
bond zero-coupon interest rates are risk factors for US government bonds. A market factor is something
that might have a statistical effect on the price or return of an instrument, for example, Purchasing
Managers’ Index (PMI) numbers, but does not directly enter into the pricing function of an instrument.
Risk factors are explored much more carefully in Chapter 4 about instrument pricing.
Once we have good simulations of prices, returns, and factors, we probably want to incorporate
adjustments based on information that might not be available in the historical data. We call this
(subjective) market views. We might also want to examine how our simulations behave in adverse
market scenarios, which we call stress-testing. No matter which of the two cases it is, the fundamental
method we use is called Entropy Pooling (EP), introduced by Meucci (2008a). EP minimizes the
relative entropy (Kullback–Leibler divergence) between the prior probability vector p and the posterior
probability vector q subject to linear constraints on the scenario probabilities, i.e.,
q = argmin_x xᵀ (ln x − ln p)

subject to

Gx ≤ h and Ax = b.
Note that we use ln x to mean the logarithm of each element of x, and that ≤ and = are used to denote
element-wise (in)equalities. Entropy Pooling can be used in many sophisticated ways that we will
carefully explore in Chapter 5; see, for example, Vorobets (2021) and Vorobets (2023).
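As a minimal illustration of the EP idea, consider the special case of a single mean view E_q[x] = m with no inequality constraints. The posterior is then an exponential tilt of the prior, and the Lagrange multiplier can be found by one-dimensional root search. This sketch is mine, not the book's implementation, and general views require solving the full convex (dual) problem:

```python
import numpy as np
from scipy.optimize import brentq

def ep_mean_view(x, p, m):
    """Posterior q = argmin KL(q || p) s.t. sum(q) = 1 and q @ x = m.

    With only these constraints, q_i is proportional to p_i * exp(lam * x_i),
    where lam is the Lagrange multiplier of the mean view."""
    xc = x - x.mean()  # centering improves numerical stability of exp
    def moment_gap(lam):
        w = p * np.exp(lam * xc)
        return w @ x / w.sum() - m  # E_q[x] - m as a function of lam
    lam = brentq(moment_gap, -100, 100)  # monotone in lam, so brentq works
    q = p * np.exp(lam * xc)
    return q / q.sum()

rng = np.random.default_rng(0)
x = 0.05 + 0.10 * rng.standard_normal(10_000)  # prior return scenarios
p = np.full(x.size, 1 / x.size)                # uniform prior probabilities
q = ep_mean_view(x, p, m=0.02)                 # view: the mean return is 2%
```

The posterior `q` still sums to one, matches the view exactly, and stays as close as possible to the prior in the relative entropy sense.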
After implementing the market views and stress-testing, we likely want to optimize our portfolios
subject to various investment constraints and include portfolio optimization parameter uncertainty, see
Kristensen and Vorobets (2024) and Vorobets (2024). Portfolio optimization is the topic of Chapter 6,
where we will focus on CVaR optimization with risk targets and risk budgets in addition to the ability
to seamlessly handle derivatives instruments by separating relative market values v ∈ RI from relative
exposures e ∈ RI , see Vorobets (2022a) and Vorobets (2022b).
Chapter 7 is about general risk and return analysis, with a particular focus on the nuances of tail
risk hedging and management. The general idea is that the methods in the first six chapters will
allow us to build well-diversified portfolios, which we can additionally equip with tail risk hedges if
we wish. Outright tail risk hedging often comes at a significant negative (volatility) risk premium.
Hence, tail risk hedges should ideally be designed with the objective of giving a positive return when
key diversification assumptions fail, but outright hedging can be considered on a tactical basis.
Chapter 8 contains a summary of the investment risk and analysis framework introduced in this
book in addition to a comparison with old methods such as mean-variance and Black-Litterman (BL).
In Chapter 8, we will see that in the best case the old framework is a simple subset of the new, while
BL contains so many questionable aspects that it does not even produce a correct updating of the
CAPM prior to a posterior distribution when returns are normally distributed. For this reason, it is
never recommended to use the old framework in practice, and especially not the BL model.
The rest of this chapter presents core principles for reasoning about market states, structural breaks,
and time-conditioning in addition to the essence of investment and risk management. These principles
will be important to keep in mind throughout the book and should apply to the majority of investment
markets and strategies. Very niche investment markets or strategies might have characteristics that
are so unique and different from the majority that they fall outside the scope of a general portfolio
construction and risk management book like this one.
Let us now imagine that the coin is biased such that p = 0.9, making it obvious that you can
expect to earn money by systematically betting on heads. If you are not careful, you can still lose all
your money as there is no guarantee that the coin will come out heads. Hence, it is still impossible for
you to call the realization of the coin ct at any time t. How you decide to invest in this coin will to
a large extent be a function of your risk willingness. The highest expected return will be achieved by
betting all your money on heads at each point in time t = 1, 2, . . . , T , while it also gives you the highest
probability of losing all your money. A reasonable suggestion would be to find a strategy where the
trade-off between the expected return and the probability of losing all your money is the best. While
we are not going to spend time on that in this book, the interested reader can study this further.
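To make the trade-off concrete, here is a small illustration of my own (not the book's code) comparing all-in betting with constant-fraction betting on the p = 0.9 coin. The fraction maximizing the long-run growth rate of wealth is the well-known Kelly fraction 2p − 1 for an even-money bet:

```python
import numpy as np

p, T = 0.9, 50  # heads probability and number of flips

# Strategy 1: bet the entire bankroll on heads every flip (even-money payoff).
expected_wealth = (2 * p) ** T  # highest expected terminal wealth ...
survival_prob = p ** T          # ... but wealth is zero unless ALL flips are heads

# Strategy 2: bet a constant fraction f of wealth. Wealth never hits zero, and
# the expected log-growth per flip is p*log(1 + f) + (1 - p)*log(1 - f).
f = np.linspace(0.01, 0.99, 99)
growth = p * np.log(1 + f) + (1 - p) * np.log(1 - f)
best_f = f[np.argmax(growth)]   # maximized at f = 2p - 1 = 0.8 (Kelly)
```

With p = 0.9 and 50 flips, the all-in strategy keeps any money at all with probability 0.9⁵⁰ ≈ 0.5%, which is exactly the tension between expected return and ruin probability described above.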
Continuing with the coin flipping, let us imagine a situation where p = 0.1. In this case, betting
on tails will obviously be associated with a positive expected return. However, let us say that you are
constrained to only betting on heads, i.e., you are “long-only” heads, while you can adjust the size of
your bets before each coin flip at time t. If the coin is in a constant state with p = 0.1, while you are
only allowed to bet on heads, it only makes sense for you to participate if you are a real thrill seeker.
However, if the coin switches between p ∈ {0.1, 0.5, 0.9}, you still have the opportunity of generating
a positive expected return if you are able to infer the value of p and size your investments accordingly.
To formalize the above a bit more, let us introduce a state variable z_t ∈ {z_1, z_2, z_3}, t =
0, 1, . . . , T , and imagine that states change according to the following transition probability matrix

        0.9  0.1  0.0
    T = 0.1  0.8  0.1 .
        0.0  0.1  0.9
Each entry in the transition matrix represents the probability Tij for transitioning from a state i to a
state j. This evolution of states is known as a Markov chain, because the next state zt+1 only depends
on the current state zt .
A stationary probability vector π can be computed for the transition matrix T , i.e., a row vector π
so that πT = π. In our particular example, the transition matrix is called doubly stochastic, because
both its rows and columns sum to 1. For doubly stochastic matrices, the stationary distribution will
be uniform, i.e., π = (1/3, 1/3, 1/3) in our case. Readers can verify that this is indeed the case by
performing the matrix multiplication πT . Here, by the stationary distribution we mean what the
Markov chain will produce over the long run. Hence, if the time period T is sufficiently long, we
expect the three different states to occur equally frequently regardless of the initial state.
Readers who are interested in exploring the characteristics of the above Markov chain further
are encouraged to examine the accompanying code to this section. The code contains a function for
simulating this Markov chain with some elementary analysis illustrating that it behaves as expected.
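The accompanying code is not reproduced here, but a minimal sketch of such a simulation (hypothetical names, using the transition probabilities above) could look like the following; the empirical state frequencies should approach the uniform stationary distribution:

```python
import numpy as np

# Transition matrix of the three-state coin example (doubly stochastic).
T = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
pi = np.full(3, 1 / 3)

def simulate_chain(T, n_steps, z0=0, seed=0):
    """Simulate a state path z_0, z_1, ..., z_n of the Markov chain."""
    rng = np.random.default_rng(seed)
    path = np.empty(n_steps + 1, dtype=int)
    path[0] = z0
    for t in range(n_steps):
        path[t + 1] = rng.choice(3, p=T[path[t]])  # next state given current
    return path

path = simulate_chain(T, 100_000)
freq = np.bincount(path, minlength=3) / path.size  # empirical state frequencies
```

Note that `pi @ T == pi`, confirming that the uniform vector is stationary, and that the empirical frequencies converge toward it even though consecutive states are strongly dependent.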
The important realization one has to make is that there is stochasticity in the state, which determines
the probability of the coin coming up heads p, and finally there is stochasticity in the outcome of the
coin. This is an important conceptual separation. In practice, we would only observe outcomes of the
coin and have to estimate both the transition matrix T and the associated heads probabilities p for
each state. If we are only good at estimating one of them and very bad at estimating the other, our
strategy is unlikely to be successful.
In the previous paragraphs, a lot of new terminology has been introduced rather loosely. To maintain
our focus, we will not delve deeper into Markov chains and simply refer the interested readers to Norris
(1997). This might be unsatisfactory for some readers, but we do not want the mathematical formality
to overshadow the main points. The objective of the above presentation was to introduce the concept
of market states, if we for a second imagine that the coin flipping is a market that someone would
allow us to invest in. It was also to introduce the concept of short-term and long-term behavior, by
showing that the current state helps us to infer the likely next state, while it does very little to help
us infer the state many time-steps into the future. You can see this in practice in the accompanying
code to this section.
The analysis so far has been very nice and simple, with a very well-behaved coin having constant
states and state transition probabilities T . Now let us imagine that the sudden appearance of a new
state p ∈ [0, 1], p ∉ {0.1, 0.5, 0.9}, can happen at a random point in time, causing a fundamental
change in the transition matrix T . Or maybe one of the existing states stops appearing altogether. No
matter which case it is, we will refer to any changes to the transition matrix T as a structural break.
Structural breaks pose additional challenges because we cannot simply lean back and enjoy the
profits of our strategy that might have been very successful historically at inferring the state zt and
sizing investment into the heads outcome accordingly. We must constantly monitor the performance of
our investment strategy and adapt it to the new reality. If we insist on using fully systematic strategies,
we have to rely on having sufficient data showing us that a structural break has occurred. This will
probably lead to a period of lower expected returns and a higher probability of losing money, with the
investment strategy being suboptimal until we adjust it.
If we are allowed to adjust our strategy based on other information, for example, someone reliable
communicating to us that a structural break has occurred, we have the opportunity to adapt before we
see realizations of the structural break in the data. In real-world markets, such announcements could,
for example, be a central bank communicating a new forward guidance on their monetary policy. The
ability to mix historical data with forward-looking adjustments based on qualitative inputs is what
makes the quantamental investing approach appealing.
For real-world investment markets, prices, returns, and factors have much more complex distri-
butions than the Bernoulli distribution of the coin, and the states are much more abstract than the
probability of heads. It is probably also very hard to find true stationarity in the data, because mar-
kets are constantly evolving. But the concepts of market states and structural breaks will still be
valuable for us when thinking about investment markets, and you are surely going to hear investment
practitioners talk about these things. At the very least, we are going to use these concepts in Section
3.2.1 about time- and state-dependent resampling methods for market simulation. Note also that
practitioners sometimes refer to states as regimes.
Since real-world investment markets are governed by much more complex states, we are unlikely
to be able to capture all necessary state-dependent information at any given point in time by relying
solely on interpretable state variables. For example, we might use the VIX index and the slope of the
interest rate curve as state variables for real-world markets, but there is likely still much more that
we need to condition on to fully capture the market state. Hence, we probably need a residual state
variable that we can condition on.
Time-conditioning has historically been a frequently used way to capture market state. Popular
examples are the exponentially weighted moving average (EWMA) and generalized autoregressive
conditional heteroskedasticity (GARCH) models. The hypothesis is that recent data tells us more
about the immediate future than old data. While it is probably true that markets tomorrow are more
similar to markets today than 30 years ago, it is a strong assumption that time-conditioning alone will
capture all the necessary information. In this book, we will view time-conditioning as a way to capture
residual information that our other state variables are potentially unable to capture.
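As a sketch of how time-conditioning can enter a scenario framework, exponentially decaying scenario probabilities give recent observations more weight. The half-life parameterization below is a common convention; the snippet is illustrative and not taken from the book's code:

```python
import numpy as np

def ewma_probabilities(n_obs, half_life):
    """Exponentially decaying scenario probabilities, oldest scenario first.

    A scenario half_life observations older than the newest one receives
    half the newest scenario's probability."""
    decay = 0.5 ** (1 / half_life)
    p = decay ** np.arange(n_obs - 1, -1, -1)  # oldest ... newest
    return p / p.sum()

p = ewma_probabilities(n_obs=1_000, half_life=125)
# Roughly half the probability mass sits on the most recent 125 observations.
```

Such a probability vector can be combined with state-dependent conditioning, which is how time-conditioning serves as the residual state variable discussed above.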
Here comes another real-world blow. All the perspectives related to market state, structural breaks,
and time-conditioning might be completely wrong for how investment markets actually behave. While
we can have full control over the coin flipping experiment, we have no control over the distributions of
investment markets. These perspectives are simply the ones that we currently have the mathematical
machinery to handle, so our analysis will naturally be biased by them. Hence, when we analyze the
stylized market facts in Chapter 2, it will be through this lens. The same goes for our simulation
approaches in Chapter 3. We are, however, not stipulating that market distributions can be fully
characterized by their state, or that market states behave according to a Markov chain. The hypothesis
is rather that something along these lines is approximately correct, where we are at least not introducing
constraints that obviously violate the characteristics of real-world markets.
Let us assume that we characterize the states of bonds and equities by the following assumptions
on the mean vector and covariance matrix:

µ_off = (0.05, 0.06)ᵀ,   Σ_off = [ 0.08², −0.2 · 0.08 · 0.3 ; −0.2 · 0.08 · 0.3, 0.3² ],

µ = (0.03, 0.10)ᵀ,   Σ = [ 0.10², 0 ; 0, 0.25² ],

µ_on = (0.02, 0.125)ᵀ,   Σ_on = [ 0.10², 0.2 · 0.1 · 0.2 ; 0.2 · 0.1 · 0.2, 0.20² ],

where the off, unsubscripted, and on parameters correspond to the risk off, base, and risk on states,
and a semicolon separates the rows of each symmetric covariance matrix.
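These parameters can be turned into Monte Carlo scenarios directly. The sketch below uses normal distributions purely for illustration (the book's simulations need not be normal), with the state parameters from the display above and hypothetical variable names:

```python
import numpy as np

def make_cov(vol_b, vol_e, corr):
    """2x2 covariance matrix for bonds and equities from vols and correlation."""
    cov = corr * vol_b * vol_e
    return np.array([[vol_b ** 2, cov], [cov, vol_e ** 2]])

# (mean vector, covariance matrix) for each state, as in the display above.
states = {
    "risk off": (np.array([0.05, 0.06]), make_cov(0.08, 0.30, -0.2)),
    "base":     (np.array([0.03, 0.10]), make_cov(0.10, 0.25, 0.0)),
    "risk on":  (np.array([0.02, 0.125]), make_cov(0.10, 0.20, 0.2)),
}

rng = np.random.default_rng(0)
S = 10_000
R = {name: rng.multivariate_normal(mu, cov, S)  # S x 2 scenario matrix
     for name, (mu, cov) in states.items()}
```

Each entry of `R` is a scenario matrix on which state-specific optimization and cross-state risk analysis can be performed.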
While we can optimize portfolios for each state, we must decide on one portfolio to invest in. It is,
however, still interesting to examine what happens to the portfolio in each of the different cases that
we have imagined. For example, what happens to the base case portfolio in the risk off case? For risk
management purposes, we would then ask if it is a risk that we are comfortable with, or if we should
attempt to reduce or eliminate it through an adjustment of the portfolio or with tail risk hedging, as
presented in Section 7.3.
Figure 1.3.1 shows the return distributions of the optimal portfolios from Table 1.1 in the risk
off state. It is clear that if we decide to invest in the optimal portfolio for the base case, it will be
suboptimal in the risk off case. Even worse is the optimal portfolio for the risk on state, which performs
a lot worse than the risk off optimal portfolio. Considerations like these are what we will define as risk
management as opposed to general diversification.
One could argue that the risk-adjusted performance loss in the risk off case is not substantial for
the base case optimal portfolio, while the risk on portfolio exposes us to significant losses if we end up
in the risk off case. Hence, unless we strongly believe that the risk on or risk off case is very likely, we
should probably choose the base case portfolio. An important consideration is also how large the risk
overshoot will be for the risk on portfolio if we are in a risk off state, something that we will examine
much more carefully in Section 6.3.
Figure 1.3.1: Portfolio return distributions in the risk off state.
Diversification, e.g., building a portfolio of both bonds and equities, can of course also be seen as a
form of risk management, but in this book we will define risk management more specifically as a careful
analysis of adverse scenarios and an assessment of how to deal with them. Diversification can be
defined in many different ways, while we will loosely think about it as the ratio between the sum of
the standalone risks of the individual exposures and actual portfolio risk, i.e.,
d = ( ∑_{i=1}^{I} R (e_i) ) / R (e),
where R (ei ) is the risk of the individual exposures, while R (e) is the risk of the portfolio.
The risk measure R can for example be CVaR, variance, or VaR. For coherent risk measures, see
Artzner et al. (1999), it holds that
R (e) ≤ ∑_{i=1}^{I} R (e_i) ,    (1.3.1)
which implies that d ≥ 1 when the portfolio risk R (e) is positive, as will usually be the case. This
property is formally referred to as subadditivity and is sometimes called the diversification principle. VaR
is not a coherent risk measure, because it does not respect this diversification principle. VaR also has
several other issues, which is why CVaR is becoming the preferred investment tail risk measure for
both market makers and investment managers.
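A hedged sketch of how the diversification ratio d defined above can be computed from equally probable scenarios follows; the function name and return parameters are illustrative, and general probability vectors q would require probability-weighted quantiles:

```python
import numpy as np

def cvar(returns, alpha=0.95):
    """Scenario-based CVaR: expected loss beyond the alpha-level VaR."""
    losses = -returns
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

rng = np.random.default_rng(0)
n = 100_000
bonds = rng.normal(0.0003, 0.005, n)      # hypothetical daily returns
equities = rng.normal(0.0005, 0.012, n)
e = np.array([0.5, 0.5])                  # relative exposures

standalone = cvar(e[0] * bonds) + cvar(e[1] * equities)
portfolio = cvar(e[0] * bonds + e[1] * equities)
d = standalone / portfolio                # diversification ratio, d >= 1 for CVaR
```

Because CVaR is coherent, the portfolio CVaR never exceeds the sum of the standalone CVaRs, so the ratio comes out at or above one.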
Readers are encouraged to examine the accompanying Python code to this section to see how the
computations have been performed for the optimal portfolios from Table 1.1 and the portfolio return
plot from Figure 1.3.1. In addition, there are joint return plots for the three different states, while
Figure 1.3.2 shows the joint return plots for the base and risk on states.
Figure 1.3.2: Joint return plots for bonds and equities in the base and risk on state.
Readers are also encouraged to spend some time thinking about what the graphs in Figure 1.3.2
show as we will be looking at graphs similar to these throughout the book. Think, for example, about
which of the two graphs shows a positive correlation between bonds and equities, in which graph
equities have the highest variance, and what it means that the graph is darker and lighter in different
areas. In Figure 1.3.2, we have purposefully simulated the distributions to generate the Monte Carlo
market simulation matrix R from (1.1.1). The central graph is a contour plot of the joint distribution,
while the two axes show the marginal distributions.
Finally, take a moment to reflect on the content in this chapter. Although the perspectives have
been introduced in highly oversimplified cases, the concepts of market states, structural breaks, time-
conditioning, and the essence of investment and risk management will carry over to more complex
cases and analysis. These concepts have simply been presented in the most basic cases to make them
perfectly clear in this chapter. As we increase the complexity of the market simulation and analysis,
they might become less apparent, but they will be fundamentally the same.
In the rest of this book, we will start focusing more on tail risk and risk budgeting in addition to
risk contribution analysis. In some cases, we will also analyze the market simulations directly either
through joint plots similar to Figure 1.3.2 or other conditional perspectives. All this will show that once
we abandon the focus on the covariance matrix, we can perform much deeper and more meaningful
analysis of investment markets.
Chapter 2
Stylized Market Facts
This chapter presents some stylized market facts, i.e., typical statistical characteristics of historical
multi-asset investment data. Most of these characteristics should be well-known to experienced invest-
ment managers, but they will be presented from perspectives that are perhaps new to them. For people
with no or very little practical experience with investment markets, this chapter will be essential to
understand how gross the mean-variance oversimplification is. Hence, mean-variance should not be
used to manage investments in practice, despite it being widely promoted by some academics and other
people who have a reputational or commercial vested interest in the approach.
This chapter uses daily historical data for 10 US equity indices (S&P 500 and nine sector indices),
three US treasury yields (the 13 week, 10 year, and 30 year yields), and the VIX index since December
1998. This gives us 6,493 daily observations including a tech bubble burst, a financial crisis, a COVID
crisis, and several monetary easing and tightening cycles. Hence, the data should be sufficient to
extract some interesting historical insights about investment market characteristics.
We start with Section 2.1 about the risk and return trade-off, which is the tendency for riskier
instruments to give a higher expected return. This is quite intuitive, because most people are risk
averse by nature. The risk and return trade-off was first formalized by Markowitz (1952) and Sharpe
(1964) with the introduction of the academically celebrated capital asset pricing model (CAPM). As
we will see in this chapter, reality is much more nuanced than the CAPM predicts.
Section 2.2 continues with an illustration of how there is a tendency for market risk to cluster
over time, i.e., periods with high market volatility tend to be followed by periods with continued high
volatility and vice versa. This shows that risk is somewhat predictable, while expected returns are
more challenging to predict, and outcome prediction is probably close to impossible with publicly
available information as explained in Section 1.2.
Section 2.3 illustrates the volatility risk premium, which is the tendency for option implied volatility
to be higher than subsequent realized volatility. This empirically reveals investors’ preferences for
avoiding losses. Section 2.4 contains additional insights that are usually conveniently ignored by
some finance and economics academics, but that any investment manager should be familiar with
and incorporate into their portfolio construction and risk management. Section 2.5 summarizes the
consequences of the stylized market facts for investment modeling and analysis. Finally, Section 2.6
presents a traditional backtest of CVaR and variance optimization and explains why it is naive.
2.1 Risk and Return Trade-Off
The foundation for quantitative analysis of the trade-off between risk and return was laid by Markowitz
(1952) and Sharpe (1964). The CAPM states that
E [Ri] = Rf + βi (E [Rm] − Rf),
where
βi = Cov (Ri, Rm) / Var (Rm) = ρi,m σi / σm.
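The beta computation is straightforward to carry out on return data. The sketch below uses simulated returns with an assumed true beta of 1.5 (all numbers illustrative) and also verifies the equivalent correlation form of beta:

```python
import numpy as np

def capm_beta(r_i, r_m):
    """beta_i = Cov(R_i, R_m) / Var(R_m)."""
    return np.cov(r_i, r_m, ddof=1)[0, 1] / np.var(r_m, ddof=1)

def capm_expected_return(beta_i, r_f, mu_m):
    """E[R_i] = R_f + beta_i (E[R_m] - R_f)."""
    return r_f + beta_i * (mu_m - r_f)

rng = np.random.default_rng(1)
r_m = rng.normal(0.0003, 0.012, size=2500)            # "market" daily returns
r_i = 1.5 * r_m + rng.normal(0.0, 0.008, size=2500)   # instrument with true beta 1.5

beta = capm_beta(r_i, r_m)

# Equivalent form: beta_i = rho_{i,m} * sigma_i / sigma_m.
rho = np.corrcoef(r_i, r_m)[0, 1]
beta_alt = rho * r_i.std(ddof=1) / r_m.std(ddof=1)
```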
Here, Ri is the return of instrument i, Rf is a risk-free rate, and Rm is the return of “the market”, for
example, a broad equity index such as the S&P 500 for an equity application of the model. The CAPM
has been empirically rejected on many occasions, and many other factors that attempt to explain
the expected returns of investments have been introduced, for example, by Fama and French (1992).
The CAPM is a direct consequence of the mean-variance framework, which this book has already
underlined builds on fundamentally flawed assumptions about the market and investor preferences.
Hence, it is no surprise that the CAPM fails empirically. However, many articles and books have been
written about CAPM and mean-variance, so there are significant vested interests in maintaining the
relevance of this work among some academics as well as technology and course providers. But this is
of course a poor reason for using these methods to manage your own or other people's money.
Markowitz (1959) already recognized that the focus should rather be on the investment down-
side, i.e., avoiding losses. However, with the computational technology available in 1959, this was an
unthinkable problem to solve in practice. In fact, estimating a covariance matrix was perceived as
computationally intensive and in many cases infeasible. Luckily, we no longer live in the 1950s and
have now discovered fast and stable algorithms for solving fully general CVaR optimization problems.
Hence, we are no longer technologically constrained in a way that makes tail risk analysis based on
realistic market simulations practically infeasible.
Below, we compute some statistics and analyze the daily historical data of S&P 500 and nine US
sector equity indices. The indices are selected based on data availability to maximize the number of
observations, so newer sectors like Real Estate are not included. However, the main conclusions are
unlikely to change due to an inclusion of these sectors.
We start with some descriptive statistics for the daily returns in Table 2.1. We immediately note
that the daily average return is quite low but positive for all indices. The daily risk numbers look quite
high compared to the average return, but we must remember that returns compound exponentially,
while risk does not. This is evident from Table 2.2 and Table 2.3 that show monthly and yearly return
statistics by setting the parameter H = 21 and H = 252 trading days, respectively.
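The horizon aggregation and the table statistics can be sketched as follows; the Student-t daily returns below are a synthetic, fat-tailed stand-in for the historical data (parameters illustrative), and the 90%-CVaR is computed as the average loss over the worst 10% of scenarios:

```python
import numpy as np

def horizon_returns(daily_log, H):
    """Non-overlapping H-day returns: sum log returns in blocks of H, then
    convert to simple returns."""
    T = len(daily_log) // H * H
    return np.expm1(daily_log[:T].reshape(-1, H).sum(axis=1))

def skewness(x):
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

def cvar(returns, alpha=0.9):
    """Empirical 90%-CVaR: average loss over the worst (1 - alpha) scenarios."""
    losses = -np.asarray(returns)
    return losses[losses >= np.quantile(losses, alpha)].mean()

# Synthetic fat-tailed daily log returns (Student-t, illustrative parameters).
rng = np.random.default_rng(2)
daily = 0.0004 + 0.01 * rng.standard_t(df=5, size=252 * 25)

# Kurtosis typically shrinks as the horizon H grows (aggregation effect).
kurt = {H: excess_kurtosis(horizon_returns(daily, H)) for H in (1, 21)}
tail_risk = cvar(horizon_returns(daily, 1))
```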
We note that the kurtosis generally decreases as the horizon increases. However, it is important not
to fool oneself, because the decrease in kurtosis is often combined with an increase in negative skewness.
Hence, when we just look at the statistics from the table, it is important to have the combination of
skewness and kurtosis in mind and remember that these statistics do not tell us the full story about
how the distribution actually looks. Skewness and kurtosis will be analyzed more carefully in Section
2.4, in addition to other visually interesting aspects of the equity return distributions.
Index Mean Volatility Skewness Kurtosis 90%-CVaR
Materials 0.044% 1.502% -0.020 9.469 2.735%
Energy 0.048% 1.821% -0.247 13.885 3.265%
Financial 0.038% 1.817% 0.313 17.526 3.163%
Industrial 0.043% 1.338% -0.163 10.649 2.477%
Technology 0.048% 1.633% 0.273 10.139 3.036%
Consumer Staples 0.031% 0.964% -0.097 10.945 1.756%
Utilities 0.036% 1.224% 0.207 14.821 2.225%
Health Care 0.040% 1.130% -0.021 12.033 2.062%
Consumer Discretionary 0.046% 1.428% -0.226 8.958 2.644%
S&P 500 0.032% 1.221% -0.153 12.863 2.261%
Table 2.1: Descriptive statistics for the daily US equity index returns.
We continue with a visualization of the trade-off between risk and return in Figure 2.1.1, where
the historical daily 90%-CVaR is plotted against the historical average return from Table 2.1. From
Figure 2.1.1, we see that there is some approximate trade-off between tail risk and return. For a more
careful analysis, we could analyze individual stocks and consider whether some instruments have safe
haven or convexity properties that justify a lower return similar to options presented in Section 2.3.
Figure 2.1.1: Historical daily US index average return and 90%-CVaR relationship from Table 2.1.
Figure 2.2.1: Historical daily returns for S&P 500.
2.3 Volatility Risk Premium
The volatility risk premium refers to the fact that option implied volatility σimplied tends to be higher
than subsequent realized volatility σrealized . To understand what this means, this section briefly
introduces European-style options for readers who are completely unfamiliar with derivatives. We
start by noting that European refers to the exercise properties of the options, not geography. For a
more detailed introduction to options and other derivatives, see Hull (2021).
The two most common option types are put and call options. A put/call option gives the option
holder the right, but not the obligation, to sell/buy an underlying security at a predetermined strike
price K. Hence, a put option’s payoff at expiry is max (K − ST , 0), while a call option’s payoff is
max (ST − K, 0), where ST is the value of the underlying security at the option’s expiry time T . The
payoff profiles for put and call options are shown in Figure 2.3.1 as a function of the underlying value
ST . As we see from Figure 2.3.1, the option payoff introduces convexity, i.e., we win something if the
market moves in the direction we want and do not lose anything, besides the initial option price, if it
moves in the opposite direction.
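The two payoff formulas translate directly into code. The sketch below evaluates them on a grid of underlying values (the strike and grid are illustrative) and checks the identity that a call payoff minus a put payoff equals S_T − K:

```python
import numpy as np

def put_payoff(s_T, K):
    """Put payoff at expiry: max(K - S_T, 0)."""
    return np.maximum(K - s_T, 0.0)

def call_payoff(s_T, K):
    """Call payoff at expiry: max(S_T - K, 0)."""
    return np.maximum(s_T - K, 0.0)

K = 100.0
s_T = np.linspace(50.0, 150.0, 101)   # grid of underlying values at expiry

# Plotting put_payoff(s_T, K) and call_payoff(s_T, K) against s_T reproduces
# the convex hockey-stick profiles of Figure 2.3.1.
identity_holds = np.allclose(call_payoff(s_T, K) - put_payoff(s_T, K), s_T - K)
```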
Options have many characteristics that are similar to insurance contracts. For example, the payoff
of an insurance on your house or car is very similar to a put option payoff, if you think of your house
or car as the underlying of the insurance contract. There is obvious value in such insurance contracts,
so they have a premium associated with them. In this regard, it is important to separate between
exposure/notional, which you can think of as the value of your house or car, and option market value,
which you can think of as the price you pay for the insurance. See Section 6.1 for more precise
definitions of exposures and (relative) market values.
There are various ways in which market makers price options, but they most commonly use stochas-
tic differential equations (SDEs) as presented in Section 3.2.3. The most important principle is that
the prices they quote should not allow for arbitrage, which is a risk-free profit opportunity. SDEs
offer nice guarantees when it comes to no-arbitrage, but they are usually a poor description of actual
high-dimensional market behavior as explained in Section 3.2.3.
Although each market maker has their own pricing models and quote prices that probably do not
allow arbitrage, there is a market convention to quote option prices in terms of the implied volatility
σT,K, which is basically a price for the option normalized for its maturity T, strike K, and underlying level St.
The implied volatility is calculated using the famous Black and Scholes (1973) formula, which assumes
that the underlying security follows a geometric Brownian motion, see Section 3.2.3.
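A sketch of how an implied volatility can be backed out from a quoted price, using the Black and Scholes (1973) call formula and simple bisection (the call price is increasing in sigma); no dividends are assumed and all parameters are illustrative:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes (1973) price of a European call on a non-dividend-paying
    underlying following a geometric Brownian motion."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Invert the call price for sigma by bisection (price is monotone in sigma)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price at 20% volatility, then recover the implied volatility.
price = bs_call(100.0, 100.0, 0.25, 0.02, 0.20)
sigma_implied = implied_vol(price, 100.0, 100.0, 0.25, 0.02)
```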
In relation to options’ implied volatilities, there is a popular index known as the VIX, commonly
referred to as “the fear index”, which is designed to be a replication of a variance swap strike σstrike
based on listed options’ strikes and expiries. A variance swap is a derivative that allows investors to
speculate directly on the implied variance of some underlying and has payoff
(σ²realized − σ²strike) Nvar,
where Nvar is the (variance) notional/exposure of the variance swap contract. We can think of VIX
as being the approximate σstrike for a one-month variance swap. The difference is that the variance
swap is an over-the-counter (OTC) instrument that is not traded on exchanges, so the strike σstrike
can be calculated more precisely when we are not limited to exchange traded options.
Figure 2.3.2 illustrates the historical relationship between S&P 500 and the difference between
realized volatility and one-month lagged VIX. This helps us understand the approximate behavior of
a variance swap contract and leads to a historical volatility risk premium estimate of 3.64%.
Figure 2.3.2: Historical relationship between S&P 500 and realized - implied volatility.
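A sketch of the realized-volatility side of this comparison; the 16% "true" volatility and the 20% implied quote below are illustrative assumptions, not estimates from the book's data:

```python
import numpy as np

def realized_vol(daily_log_returns, periods_per_year=252):
    """Annualized realized volatility from daily log returns."""
    r = np.asarray(daily_log_returns)
    return np.sqrt(periods_per_year) * r.std(ddof=1)

rng = np.random.default_rng(3)
true_vol = 0.16
daily = rng.normal(0.0, true_vol / np.sqrt(252), size=252 * 10)

implied_vol_quote = 0.20                      # hypothetical implied quote
premium = implied_vol_quote - realized_vol(daily)
```

Repeating this month by month against the one-month lagged VIX, as in Figure 2.3.2, and averaging the differences gives a historical volatility risk premium estimate.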
2.4 Skewness, Kurtosis, and Other Interesting Insights
This chapter has so far focused on the equity time series and statistics. This section will incorporate
some multi-asset perspectives and examine skewness and kurtosis further. All these characteristics will
be important for us to keep in mind when modeling investment markets in Chapter 3 and subsequently
analyzing and optimizing portfolios as presented in Chapters 5-7.
We start by visualizing what the skewness and kurtosis numbers from Section 2.1 actually look
like. Looking just at the statistics makes it hard to get a sense of what the distributions look like, because
the same statistics can lead to very different distributional shapes. It is also important to keep the
combination of skewness and kurtosis in mind. For example, a distribution that has slightly more
kurtosis than the normal distribution but significant negative skewness can have very significant left
tails. A good example of this is the Technology index, where the yearly kurtosis of 3.26 might seem
like it is “close to a normal distribution”, but the combination with a significant negative skewness of
−0.588 can introduce some very significant left tails as shown in Figure 2.4.1. Readers can use the
accompanying code to create graphs similar to Figure 2.4.1 for the other equity indices and verify that
they are skewed and fat-tailed in complex ways.
Figure 2.4.1: Yearly return distribution for the US Technology equity index.
From Figure 2.4.1, we can conclude that investment return distributions can be skewed and fat-
tailed in complex ways, i.e., it is not just an adjustment of some analytically known distribution that
allows us to approximate their empirical properties well. Another important point is that if we use
oversimplified analytical distributions, such as the normal or t-distribution, we are not only introducing
a large approximation error for one investment in our portfolio but likely for all of them.
We can also conclude that just looking at statistics such as the mean, variance, skewness, kur-
tosis, and correlation can be misleading, because they can effectively remove many of the nuances
of investment distributions. Hence, it is important to actually visually look at the data to under-
stand its characteristics. Once we do that, it becomes immediately clear that the elliptical distribution
assumption inherent to mean-variance analysis is grossly oversimplifying for the marginal distributions.
Another important point about elliptical distributions, which many people tend to forget, is that
they only allow for linear and cross-sectionally constant dependencies. However, this is also a gross
oversimplification of investment market dependencies. Not only for portfolios that contain nonlinear
derivatives such as put and call options, where the violation is obvious, but also for plain vanilla
portfolios that contain cash equity indices and bonds. For example, if we look at the 10% worst yearly
historical scenarios for S&P 500, aligning with the 90%-CVaR focus from Section 2.1, we can compute
the correlation matrix in these scenarios and the remaining 90% of scenarios. This is done in Table
2.4 and Table 2.5 below, where we clearly see a significant change in the correlations, e.g., column 4.
0 1 2 3 4 5 6 7 8 9
0, Materials 100.0 54.8 89.7 88.5 -42.7 54.4 56.0 62.5 85.6 83.2
1, Energy 54.8 100.0 60.8 77.1 -10.9 52.3 65.0 52.4 23.2 63.1
2, Financial 89.7 60.8 100.0 91.7 -40.4 45.8 53.7 72.5 80.1 85.9
3, Industrial 88.5 77.1 91.7 100.0 -30.6 65.1 71.6 68.2 72.8 87.7
4, Technology -42.7 -10.9 -40.4 -30.6 100.0 -35.0 -35.0 4.7 -29.6 5.4
5, C. Staples 54.4 52.3 45.8 65.1 -35.0 100.0 89.4 7.0 45.3 40.8
6, Utilities 56.0 65.0 53.7 71.6 -35.0 89.4 100.0 12.7 40.2 44.6
7, Health Care 62.5 52.4 72.5 68.2 4.7 7.0 12.7 100.0 58.4 81.7
8, C. Discretionary 85.6 23.2 80.1 72.8 -29.6 45.3 40.2 58.4 100.0 76.7
9, S&P 500 83.2 63.1 85.9 87.7 5.4 40.8 44.6 81.7 76.7 100.0
Table 2.4: Correlation matrix for the 10% worst S&P 500 scenarios.
0 1 2 3 4 5 6 7 8 9
0, Materials 100.0 43.4 65.9 83.7 50.5 47.3 32.4 51.3 70.8 76.2
1, Energy 43.4 100.0 34.5 44.0 3.3 16.4 34.9 9.1 5.0 29.1
2, Financial 65.9 34.5 100.0 82.8 42.9 51.3 31.6 61.4 71.2 80.1
3, Industrial 83.7 44.0 82.8 100.0 57.2 55.6 36.9 57.8 75.6 88.0
4, Technology 50.5 3.3 42.9 57.2 100.0 14.6 4.4 51.1 60.7 82.3
5, C. Staples 47.3 16.4 51.3 55.6 14.6 100.0 58.8 55.3 52.7 50.0
6, Utilities 32.4 34.9 31.6 36.9 4.4 58.8 100.0 20.6 17.3 30.7
7, Health Care 51.3 9.1 61.4 57.8 51.1 55.3 20.6 100.0 69.2 72.5
8, C. Discretionary 70.8 5.0 71.2 75.6 60.7 52.7 17.3 69.2 100.0 83.0
9, S&P 500 76.2 29.1 80.1 88.0 82.3 50.0 30.7 72.5 83.0 100.0
Table 2.5: Correlation matrix for the 90% best S&P 500 scenarios.
It should be quite obvious that if we build our portfolios based on the assumption that marginal
return distributions do not have fat left tails such as in Figure 2.4.1, and that dependencies behave in
the same way in the tail risk scenarios as in the other scenarios, this is a recipe for disaster. We note
once again that the correlation statistic gives us a very crude understanding of the dependencies. Many
important nuances are probably missing from just looking at the correlations, similar to the issues from
looking just at skewness and kurtosis statistics. Readers are encouraged to use the accompanying code
to explore the data further.
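The conditional correlation computation itself is straightforward; the sketch below applies it to synthetic two-asset data in which the dependence is deliberately made stronger in the sell-off scenarios (all parameters illustrative):

```python
import numpy as np

def conditional_correlations(R, ref_col=0, q=0.10):
    """Split scenarios into the worst q fraction of the reference column
    (e.g., S&P 500) and the rest, and return a correlation matrix for each."""
    ref = R[:, ref_col]
    worst = ref <= np.quantile(ref, q)
    return np.corrcoef(R[worst].T), np.corrcoef(R[~worst].T)

# Synthetic data: a common shock drives asset b only in stress scenarios,
# so the cross-sectional dependence changes in the tail.
rng = np.random.default_rng(4)
n = 10_000
common = rng.normal(size=n)
stress = common < np.quantile(common, 0.10)
a = common + 0.5 * rng.normal(size=n)
b = np.where(stress, common, rng.normal(size=n)) + 0.5 * rng.normal(size=n)

corr_worst, corr_rest = conditional_correlations(np.column_stack([a, b]))
```

Applied to the historical data, the same split produces the correlation changes seen between Table 2.4 and Table 2.5.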
Next, we perform an analysis similar to the above but for multi-asset daily log returns. First, we
compute constant maturity zero-coupon bond prices, see Section 4.1, from the interest rate data, and
then compute correlation statistics for the daily return series conditional on the VIX index change
being above or below the 90th percentile. Table 2.6 and Table 2.7 show the correlation matrices,
while Figure 2.4.2 shows the daily 2y bond returns plotted against daily S&P 500 returns.
We can draw several conclusions from these daily return tables and joint return figure. First of all,
from Figure 2.4.2 we see that VIX spikes and adverse S&P 500 scenarios tend to occur simultaneously,
which agrees with our conclusion from Figure 2.3.2. Second, we can see that there has historically
been a lower correlation between bonds and equities in these market stress scenarios, as evidenced by the
lower correlations in Table 2.6.
Figure 2.4.2: 2y zero-coupon bond and S&P 500 daily log returns.
0 1 2 3 4
0, S&P 500 100.0 -25.3 -32.9 -32.6 -46.1
1, 13w bond -25.3 100.0 29.8 17.9 8.4
2, 2y bond -32.9 29.8 100.0 93.5 19.4
3, 30y bond -32.6 17.9 93.5 100.0 18.2
4, VIX -46.1 8.4 19.4 18.2 100.0
Table 2.6: Correlation matrix conditional on daily VIX changes being above the 90th percentile.
0 1 2 3 4
0, S&P 500 100.0 -5.4 -19.7 -19.6 -65.8
1, 13w bond -5.4 100.0 18.2 13.7 1.6
2, 2y bond -19.7 18.2 100.0 92.9 13.6
3, 30y bond -19.6 13.7 92.9 100.0 13.4
4, VIX -65.8 1.6 13.6 13.4 100.0
Table 2.7: Correlation matrix conditional on daily VIX changes being below the 90th percentile.
2.5 Consequences for Investment Modeling and Analysis
The stylized market facts presented in this chapter can be summarized as follows:
1. Return distributions do not follow nice bell-shaped curves. They are skewed and fat-tailed in
complex ways.
2. Cross-sectional asset dependencies are not just linear and constant, even for plain vanilla instru-
ments such as stocks and bonds.
3. There is a tendency for risk to cluster, i.e., periods with high market volatility tend to be followed
by periods with continued high volatility and vice versa.
4. There exists a volatility risk premium, which indicates that investors are willing to pay a premium
above fair value for convexity and, hence, perceive risk as losses instead of all deviations from
the mean.
So, when we model investment markets in Chapter 3, we must use methods that are capable of
capturing all of the above characteristics, which a mean vector µ and a covariance matrix Σ are not
capable of. In general, distributional statistics can be misleading and hide many important nuances of
investment markets data. Therefore, we must focus on generating market scenarios as represented by
the simulation matrix R and associated joint probability vector p, see Section 1.1.
As we will see in the following chapters, generating fully general simulated market paths and
analyzing them is much harder than estimating a mean vector µ̂ and a covariance matrix Σ̂ and
subsequently analyzing these using some variation of mean-variance. However, if something is obvious
and easy, it is also unlikely to produce better results than the market. The complexity and more
careful attention to detail are what allow us to outperform the average investor.
A final important point is that market simulation and analysis go hand in hand. It does not make
a big difference if we generate very good market simulation scenarios R and subsequently reduce them
to a mean vector and covariance matrix for mean-variance analysis, because this effectively removes
the important nuances that allow us to build portfolios in a clever way.
2.6 Naive CVaR and Variance Optimization Backtesting
While the empirical analysis and arguments given in this chapter should be sufficient to make any
logically thinking person abandon the use of mean-variance for investment management in practice,
requests are occasionally made about historical backtests “proving” that CVaR produces better results
than variance. This section provides such a backtest and explains why it is naive.
We can logically conclude that mean-CVaR analysis is more meaningful given that investors have
an aversion to losses, as the volatility risk premium from Section 2.3 illustrates, and the fact that mean-
CVaR will coincide with mean-variance in most practical cases when the mean-variance assumptions
are satisfied, see Vorobets (2022b). Hence, mean-variance can be thought of as a simple subset of
mean-CVaR under highly oversimplified and unrealistic market assumptions. These are some of the
valid arguments for transitioning to mean-CVaR, in addition to the fact that CVaR analysis gives
meaningful insights for fully general joint distributions.
The remainder of this section proceeds to perform a standard expanding window backtest, which
is frequently produced and shared in various empirical studies as well as shown to investment clients.
We use the equity data that is described in the introductory section of this chapter and analyzed
throughout. We start with some descriptive statistics of the quarterly returns in Table 2.8.
Not surprisingly, we see from Table 2.8 that the quarterly returns are also skewed and fat-tailed in
complex ways as well as very far from being normally distributed. All of this is similar to the daily,
monthly, and yearly returns presented in Section 2.1.
We next proceed to the backtest, which is designed in the following way. As we have roughly 25
years of historical data (assuming 252 trading days in a year), we use approximately the first 10 years
as the initial in-sample data and expand the in-sample period with 63 days at each iteration, which
corresponds to a yearly quarter. This gives us a total of 15 · 4 = 60 quarters where we minimize
the CVaR and variance of a long-only portfolio having the 10 US equity indices from Table 2.8 as its
investment universe. We rebalance the portfolios to these optimized exposures every quarter. Readers
can find all the details in the accompanying code to this section and adjust the backtest parameters.
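The backtest mechanics can be sketched as follows. To keep the sketch dependency-free, the optimizer below is a crude long-only minimum-variance stand-in (best of random simplex weights) rather than the proper CVaR and variance solvers used in the accompanying code, and the returns are placeholder Gaussian data:

```python
import numpy as np

def expanding_window_backtest(returns, initial, step, optimize):
    """Expanding-window backtest: at each rebalance date, optimize weights on
    all in-sample data, then hold them over the next out-of-sample block."""
    T = returns.shape[0]
    oos = []
    for start in range(initial, T - step + 1, step):
        w = optimize(returns[:start])               # fit on expanding window
        oos.append(returns[start:start + step] @ w) # realize next quarter
    return np.concatenate(oos)

def min_variance_long_only(in_sample, n_grid=2000, seed=0):
    """Crude long-only minimum-variance proxy: evaluate w' cov w over random
    simplex weights and keep the best (a stand-in for a proper solver)."""
    cov = np.cov(in_sample.T)
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(in_sample.shape[1]), size=n_grid)
    return W[np.argmin(np.einsum('ij,jk,ik->i', W, cov, W))]

# Placeholder daily returns: 25 years, 10 indices. First 10 years in-sample,
# then expand by 63 trading days (one quarter) per iteration -> 60 quarters.
rng = np.random.default_rng(5)
R = rng.normal(0.0003, 0.01, size=(252 * 25, 10))
oos_returns = expanding_window_backtest(R, initial=252 * 10, step=63,
                                        optimize=min_variance_long_only)
```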
The historical performance of the CVaR and variance optimizations is given in Figure 2.6.1, while
the historical outperformance of the CVaR portfolio is given in Figure 2.6.2.
Figure 2.6.1: CVaR and variance optimized historical performance.
What can we conclude from Figure 2.6.1 and Figure 2.6.2? That CVaR is a strictly better risk
measure, which gives better performance over time and especially outperforms during market sell-offs?
It certainly seems like it if we use the backtest and its approximately 25% cumulative outperformance.
Some people would happily make that conclusion and often do that when presenting new risk measures
or pitching investment strategies.
However, the only thing we really can conclude is that we are able to find a backtest configuration
where the CVaR optimized portfolio has performed best historically. It is almost certain that if we
adjust the backtest configuration, we could find one where the variance optimized portfolio has better
performance. Hence, drawing generalized conclusions based on one historical realization and backtest,
especially without knowing how we decided to perform exactly this backtest, is dangerous.
We can, however, conclude that there are cases where CVaR optimized portfolios behave as ex-
pected. So, there are some indications that our logical reasoning and practice coincide. Making the
conclusion that CVaR optimized portfolios will always generate better performance is however prema-
ture, and we probably will not be able to draw that conclusion at any point. The best we can hope for
is that CVaR optimized portfolios probably lead to more desirable performance characteristics given
the empirical market facts and investor preferences for avoiding large losses.
Section 3.5 presents better approaches for designing backtests that will likely allow us to make
generalized conclusions with more confidence. These backtests use synthetically generated market
paths that preserve the main characteristics of the historical data, while giving us new paths to
validate our strategies. Backtests like the one presented in this section will surely be able to fool
novice investors, and some people willingly use them to do that. However, this book is focused on
methods and approaches that are likely to give you better tail risk-adjusted performance in practice,
because reality always eventually catches up if we are not careful in our logic and analysis.
Chapter 3
Market Simulation
This chapter focuses on generating realistic future paths Rh , h = 1, 2, ..., H, for the factors, prices,
and returns that are relevant for our investments and portfolios. By realistic, we mean simulations
that are capable of capturing the stylized facts presented in Chapter 2 in addition to other subtle
nuances. The chapter will introduce several simulation approaches: one based on resampling and some
based on generative machine learning methods such as variational autoencoders (VAEs) and generative
adversarial networks (GANs).
The simulations use observed historical time series D ∈ RT ×N , where T represents the number of
historical observations, while N represents the number of time series. We can think about the data as
being aligned by days, while the frequency can be different. We note that there might be additional
challenges associated with extreme frequencies, e.g., very short or long horizons. For short horizons
such as intraday data, the analysis might be significantly affected by market microstructure elements,
while longer horizons simply give us fewer observations, making it potentially harder to learn the
patterns of the data.
Investment time series have many unique challenges simply due to the constantly changing nature
of investment markets, but also due to the need to capture the dependencies both in the cross-section
and across time. The many structural breaks and few constraints on the possible future outcomes
make it one of the most challenging problems, especially considering the potentially high-dimensional
nature of the time series that we want to simulate future paths for.
The simulation approach based on historical data lends itself to the trivial critique that history will
never repeat itself. While this is true, it will in most cases be hard to argue for discarding the historical
data of the markets and risk factors that we are trying to model. It is also important to underline
that we make no assumptions about history repeating itself, because this is obviously false. What we
instead try to do is to generate new synthetic samples that preserve the characteristics of historical
observations while giving us entirely new paths for the future behavior of investment markets.
In the “quantamental” spirit of this book, we can make discretionary adjustments based on infor-
mation that is not available in the historical data by using the Sequential Entropy Pooling method,
thoroughly presented in Chapter 5. The causal and predictive framework from Section 5.4 specifically
allows us to hypothesize about future relationships that have not been present in historical data while
respecting the core characteristics of our simulations Rh , h = 1, 2, . . . , H.
3.1 Stationary Transformations
The first step is to transform the observed historical time series D ∈ RT×N into stationary data
S ∈ RT̃×Ñ. Formally, we say that a stochastic process is stationary if its unconditional joint probability
distribution does not change when shifted in time. For more on time series stationarity, see Hamilton
(1994). Loosely speaking, we can think of stationary data as having some repeatable statistical patterns
that we are able to learn from. For more on statistical learning, see Hastie, Tibshirani, and J. Friedman
(2009).
The harsh reality of investment markets is that they are probably not stationary in the strict
theoretical sense due to constant structural breaks, presented in Section 1.2. Hence, the best we can
hope for is likely approximate stationarity and repeatability. But just because the market simulation
problem is really hard, and we are aware of the limitations, it does not mean that we cannot learn
something useful from historical data. We must just not fool ourselves into believing that we have
estimated the “correct” model that market realizations are generated from. This model probably does
not exist in reality. Hence, we are looking for models and methods that allow us to capture the
complexities of investment markets and are likely to be useful in the future. Reducing the market to
a covariance matrix and a mean vector fails in relation to capturing these complexities.
A natural question is: why do we bother transforming the data into something stationary and
not just use the raw data? This is simply due to the fact that most statistical models work best on
data that is stationary. So, we can see it as a part of preprocessing similar to data normalization.
Some models might perform well without any preprocessing, but it is more the exception than the
rule. Hence, the general idea is that we transform the raw data into something stationary, project
these stationary transformations into the future, and then compute the desired market simulations Rh ,
h = 1, 2, . . . , H. A subtle nuance of this approach is that we must be able to recover the quantities
that we are actually interested in from the simulated stationary transformations. For most cases, this
is not a problem, but it is an important aspect to keep in mind.
We can, without loss of generality, focus on simulating risk factors as defined in Section 1.1, because
the prices and returns of our investable instruments are functions of these risk factors, see Chapter 4
on instrument pricing. Sometimes a stationary transformation will be very closely related to the P&L
that we are interested in simulating. For example, for equities the log return is a good candidate for a
stationary transformation that we can use for projection. With the risk factor simulation at hand, it
should be straightforward for us to transform them into the price or return simulations that we want
to analyze for investment and risk management purposes.
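For the equity example, the transform-and-recover round trip can be sketched as follows (a minimal illustration of the requirement that the quantities of interest are recoverable from the stationary transformation):

```python
import numpy as np

def to_log_returns(prices):
    """Stationary transformation for a price series: daily log returns."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def to_prices(p0, log_returns):
    """Recover a price path from an initial price and (simulated) log returns."""
    return p0 * np.exp(np.cumsum(log_returns))

prices = np.array([100.0, 101.0, 99.5, 102.2])   # illustrative price series
r = to_log_returns(prices)
recovered = to_prices(prices[0], r)              # reproduces prices[1:]
```

The same `to_prices` step turns simulated stationary paths back into the price or return simulations we want to analyze.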
To understand what we are trying to achieve with our approach, consider for a moment an AR (1)
process
Xt = φ0 + φ1 Xt−1 + εt,
with εt ∼ N(0, σ²) iid. If |φ1| < 1, we call the process stationary. If φ1 = 1, we call the process a
random walk. If we imagine that we know the values φ0 and φ1, we can easily transform the raw data
observations Xt, t = 0, 1, 2, . . . , T, into noise observations εt in the following way
εt = Xt − φ0 − φ1 Xt−1
and focus on just estimating σ.
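This transformation is easy to verify numerically: simulate an AR(1) path with known parameters, strip out φ0 and φ1, and check that the recovered noise is (approximately) iid while the raw series is highly autocorrelated (all parameter values illustrative):

```python
import numpy as np

def simulate_ar1(phi0, phi1, sigma, n, x0=0.0, seed=6):
    """Simulate an AR(1) path; the only randomness is iid Gaussian noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = phi0 + phi1 * x[t - 1] + eps[t]
    return x

def ar1_residuals(x, phi0, phi1):
    """eps_t = X_t - phi0 - phi1 * X_{t-1}."""
    x = np.asarray(x)
    return x[1:] - phi0 - phi1 * x[:-1]

x = simulate_ar1(phi0=0.1, phi1=0.8, sigma=1.0, n=20_000)
eps = ar1_residuals(x, 0.1, 0.8)

# The raw series is strongly autocorrelated; the residuals are not.
lag1_autocorr_x = np.corrcoef(x[1:], x[:-1])[0, 1]
lag1_autocorr_eps = np.corrcoef(eps[1:], eps[:-1])[0, 1]
```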
With an estimate σ̂ of σ and some initial value x0 , we can simulate the AR (1) process by simply
sampling iid from the normal distribution with mean zero and variance σ̂². In this realization lies an
important point. Even if the original time series observations Xt are highly dependent over time, it
is possible to perform some transformations where the fundamental uncertainty is iid, which is the
case for εt . Some people struggle to understand how one can transform originally correlated data into
something that is iid. The AR (1) process gives a simple and easy to understand example. These ideas
will be particularly important to keep in mind in Section 3.2.2 when variational autoencoders (VAEs)
are introduced, because the encoder’s purpose will be to estimate parameters that transform the data
into something that has an iid Gaussian distribution, while the decoder’s objective is to reverse this
transformation and reintroduce the time series dependencies.
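A minimal sketch of this idea in Python, with illustrative parameter values that are not from the book's case studies: we simulate an AR(1) path, strip the time series dependence using the known φ0 and φ1, check that the residuals behave like iid noise, and then project new steps by reversing the transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
phi0, phi1, sigma, T = 0.1, 0.8, 0.5, 10_000  # illustrative values

# Simulate an AR(1) path: X_t = phi0 + phi1 * X_{t-1} + eps_t
X = np.empty(T + 1)
X[0] = phi0 / (1 - phi1)  # start at the stationary mean
for t in range(1, T + 1):
    X[t] = phi0 + phi1 * X[t - 1] + rng.normal(0.0, sigma)

# Transform the dependent observations into (approximately) iid noise
eps = X[1:] - phi0 - phi1 * X[:-1]
sigma_hat = eps.std(ddof=1)  # the only parameter left to estimate

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

persistent = lag1_autocorr(X)   # close to phi1: the raw series is dependent
whitened = lag1_autocorr(eps)   # close to zero: the residuals are near iid

# Project H = 21 new steps by sampling iid noise and reversing the transform
H, path = 21, [X[-1]]
for _ in range(H):
    path.append(phi0 + phi1 * path[-1] + rng.normal(0.0, sigma_hat))
```

The same logic underlies the more general simulation methods in this chapter: make the fundamental uncertainty as close to iid as possible, sample it, and reverse the transformation.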
The AR(1) process can be stationary or non-stationary. In the non-stationary case with φ1 = 1, we can transform the process to a stationary one by differencing, i.e., work with

∆Xt = Xt − Xt−1 = φ0 + εt,

from which it follows that ∆Xt ∼ N(φ0, σ²) iid. Investment price time series, for example, the S&P 500
and STOXX 50 equity indices are considered to be non-stationary processes. Hence, we must perform
some differencing to transform this data into something stationary, for example, the log return.
Even in cases where we have stationary data, for example, interest rate or implied volatility time
series, we might want to perform additional transformations to make the data even nicer for us to work
with. By nicer, we mean something that comes closer to being iid, because iid processes like ∆Xt in
the example above simplify the estimation for us. Highly persistent processes with φ1 close to 1 can
be challenging for many statistical time series models if they are applied to the raw series.
Real-world investment data usually does not follow simple processes like the AR(1) process above.
Hence, hoping that we can transform the raw data into something that is iid without overfitting to
the particular historical sample is probably a bit naive. If we had data that was iid through time, we
could simply focus on estimating the cross-sectional dependencies. However, since this is not the case,
we still need models that are capable of capturing both cross-sectional and time series dependencies,
but we are trying to make the latter as simple as possible to make it easier for our models to learn.
Throughout this chapter, we will work with the time series simulation that follows with the fortitudo.tech Python package. This is simply because it contains the common types of data that we have for investment markets, i.e., prices, interest rates and spreads, as well as implied volatility surfaces. It
is hard to find real-world data for these quantities that can be distributed with the book, so we must
rely on simulated time series. However, the transformation and analysis performed in this chapter can
be applied to the real-world data that readers have licenses for. Real-world data will simply have more
complex dynamics than the stochastic differential equation (SDE) simulation that follows with the
fortitudo.tech Python package. See Section 3.2.3 for more perspectives on SDEs and the no-arbitrage
condition.
The accompanying code to this section contains suggestions for the transformations we can perform
for equity price series, interest rates and spreads, as well as volatility surfaces. These transformations
are fairly simple and easy for us to invert. Hence, they satisfy the recovery requirement.
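As a sketch of the simplest of these transformations, assuming a made-up price series in place of the package's simulated data: the log return is the stationary transformation for the equity series, and it can be inverted exactly given the initial price, which is what the recovery requirement demands.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical equity price series standing in for the simulated data
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 500)))

# Stationary transformation: log returns (first difference of log prices)
log_returns = np.diff(np.log(prices))

# Recovery: invert the transformation exactly given the first price
recovered = prices[0] * np.exp(np.cumsum(log_returns))
```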
The figures below show the characteristics of the stationary transformations, starting with a com-
parison between the raw equity index time series in Figure 3.1.1 and its log return stationary transfor-
mation in Figure 3.1.2. It is clear from Figure 3.1.2 that the data looks like something which is closer
to being iid, or at the very least less persistent than the raw time series in Figure 3.1.1. However, it is
also clear that we have not removed all time series dependencies, as there appears to be some clustering
in the magnitude of the stationary transformations in 3.1.2. Hence, our statistical simulation models
must still be able to capture both the cross-sectional and time series dependencies.
The remaining figures in this section show the stationary transformation time series for zero-coupon
interest rates, implied volatilities, and credit spreads. What we see from these figures is that, from a statistical point of view, they are quite similar to the stationary transformation of the equity index in
Figure 3.1.2. In this realization lies a key point. We do not care about which asset class or risk factors
we are working with. The only thing that matters is that we are able to transform it into something
that has nice properties for simulating future paths. The label we put on the data does not matter for
our generative models. It is only the statistical properties of the data that matters.
Figure 3.1.5: Credit spread stationary transformations.
You can find the details of the stationary transformations in the accompanying code to this section,
where you will also see how the various graphs have been generated. It is important to underline that
the stationary transformations we propose are just examples. You might be able to come up with other
stationary transformations that work even better. The important point is that you keep the objective
in mind, i.e., making the data easier for statistical models to work with and generate paths from. It is
also important that you are able to transform the stationary transformations to the risk factors that
you are actually interested in without loss of information. In most cases, the reverse transformation
does not cause any issues, but it is an important aspect to keep in mind.
Note also that the stationary transformations ST ∈ RT̃ ×Ñ can have other dimensions than the
historical time series D ∈ RT ×N . For example, we might have T̃ = T − 1 if we have performed
differencing to achieve stationarity. Similarly, we might have performed some transformations that
reduce or increase the number of time series such that N ≠ Ñ. In the examples that we will work
with in this book, we will only have a reduction in the time series dimension due to differencing and
thus T̃ = T − 1 as well as N = Ñ , but you should not be limited by this in practice.
It is important to understand that resampling methods usually have less capability when it comes to
capturing highly complex time series dependencies, while they are excellent at capturing the cross-
sectional dependencies no matter how complex they are. The opposite holds for generative machine
learning methods, which are very capable when it comes to capturing time series dependencies, but
suffer from the well-known curse of dimensionality when it comes to the cross-sectional dependencies.
There will be more perspectives on these points in the sections that cover each of the approaches.
Synthetic data generation has received a lot of attention in recent years with, for example, image
and text generation. Contrary to these applications, investment time series have a lot less structure.
For example, when we are generating an image, there are certain limitations on the pixels and the image
resolution. When we are generating text, there are certain grammatical rules and a finite number of
words in the dictionary that we can choose from. We would not worry about pixels suddenly behaving
in structurally new ways or the rules of the language fundamentally changing in important ways
without us knowing. Even with the additional structure, image and text generation problems are by
no means trivial, which is evident by the large industries that exist around solving these problems.
Recently, video and music generation has also received increased attention. Investment time series
probably have more in common with these applications, while still being distinct due to the constant
structural breaks. Another challenge with investment data is that we are trying to estimate both the
cross-sectional distribution as well as the time series dependencies using just one historical realization
for each time series. This creates challenges both when it comes to estimation and evaluation. For
example, when evaluating our distribution forecast for the next time step, we only have one realization
to assess the quality of our distributional forecast. The potentially high-dimensional nature of our
joint distributional forecasts can additionally complicate these aspects.
By now, it should be clear to readers that the market simulation problem is immensely challenging.
However, this does not mean that we cannot do better than doing nothing and relying solely on
qualitative considerations when building our portfolios. We also want to avoid using methods and
making assumptions that are in obvious disagreement with observed historical data. It is for example
very dangerous to make the assumption that investment return distributions do not have fat tails, are
not skewed in complex ways, and that the dependencies are only linear and cross-sectionally constant.
These are the assumptions that we would make if we restrict ourselves to just focusing on a mean
vector and a covariance matrix, which is why we avoid doing that in the sections below.
dependencies.
As already discussed, it is unlikely that we will be able to find stationary transformations such that the transformed data is iid across time without overfitting to the particular historical sample. This realization has led to the development of bootstrap methods that attempt to capture the remaining time series dependencies. The most popular is the block bootstrap, which samples several time periods at a time. For more details on resampling methods with dependent data, see Lahiri (2003). The structural breaks are not something that we can control. The strategy here will be to have simulation methods
that are reactive enough to capture them within reasonable time or use a discretionary Entropy Pooling
adjustment as presented in Chapter 5.
This section introduces a new time series bootstrapping method, which is presented for the first
time in this book and is coined Fully Flexible Resampling. It builds on a clever use of the Entropy
Pooling method, introduced in Section 1.1 and thoroughly presented in Chapter 5. More specifically,
it is a generalization of the approach introduced by Meucci (2013), where we include an implicit
Markov chain, see Section 1.2, that allows us to simulate new paths for horizons H ≥ 1 and potentially
condition on a different initial state than the current. The introduction of this Markov chain gives us
the flexibility to capture some of the time series dependencies that are remaining after our stationary
transformations from Section 3.1.
The starting point of our simulation is the stationary transformations ST ∈ RT̃ ×Ñ , having T̃
observations. When we resample these observations, the number of time series Ñ is not important. We
can have markets that are arbitrarily high-dimensional. If the historical data is a good representation of
the cross-sectional dependencies that we can expect in the future, we will be able to capture these with
our resampling approach no matter how complex they are. In fact, when we perform our resampled
simulation, we can just sample the historical time series indices and store them as our simulations.
Based on these indices, we can sample the stationary transformations that we need for a particular
purpose. This is also a useful feature in the case where we work with very high-dimensional markets,
where we are perhaps not able to store all of our historical data in memory at once.
By default, the historical observations carry equal sample weight pt = 1/T̃ for t = 1, 2, . . . , T̃. However, there is nothing restricting us from changing these sample weights, besides the natural constraints that ∑_{t=1}^{T̃} pt = 1 and pt ∈ [0, 1] for all t. As an example, we can assign equal probability
to all scenarios in a subset of the historical observations or implement some decay in the sample
weight that we assign to historical scenarios, giving a higher weight to recent observations. A common suggestion for the latter is exponential decay with half-life parameter τ, i.e.,

pt^exp ∝ e^{−(ln 2/τ)(T̃−t)}, t ∈ {1, 2, . . . , T̃}, (3.2.1)

where the “proportional to” symbol ∝ is used to indicate that the scenario probabilities must be normalized such that ∑_{t=1}^{T̃} pt^exp = 1. We will refer to the exponentially decaying probabilities (3.2.1) as time-conditioning.
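A small sketch of the time-conditioning probabilities (3.2.1), with illustrative values for T̃ and τ:

```python
import numpy as np

T_tilde = 1_000       # number of stationary observations (illustrative)
tau = T_tilde / 2     # half-life parameter

t = np.arange(1, T_tilde + 1)
p_exp = np.exp(-np.log(2) / tau * (T_tilde - t))
p_exp /= p_exp.sum()  # normalize so the probabilities sum to one

# An observation one half-life older than the most recent one carries
# half the probability weight
half_life_ratio = p_exp[-1] / p_exp[T_tilde - 1 - int(tau)]
```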
To introduce state-conditioning, think about the VIX index introduced in Chapter 2. We could
think of this as a state variable for the market, with periods of low, medium, and high implied volatility.
It is indeed commonly thought of as such by investment practitioners and sometimes called the “fear
index”. If we define some values VIXhigh and VIXlow for which we consider the VIX index to be, respectively, high or low, we can assign equal probability to all historical scenarios having a VIX value equal to or higher than VIXhigh and define this as an “elevated implied volatility” state. We could then define two additional states with VIX below VIXlow, and VIX between VIXlow and VIXhigh.
These are elementary examples of state-conditioning.
The most elementary way of introducing state-dependence into resampling is then to estimate
a state transition probability matrix as presented in Section 1.2. We can estimate these transition
probabilities by simply counting how many times the VIX index has historically transitioned from
one state to another and then normalize, such that each row of the transition matrix sums to one.
The resampling would then happen according to the following procedure: use the current state to
sample the next state according to the transition probability matrix T and then sample a historical
joint observation of the stationary transformations conditional on the new state. This procedure
would probably help us capture some of the time series dependencies that remain in the stationary
transformations, certainly more than an iid bootstrap would, but it contains no time-conditioning like
in (3.2.1), and it uses just one state variable.
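The counting estimator described above can be sketched as follows, with a hypothetical short state sequence in place of the historical VIX classification:

```python
import numpy as np

def estimate_transition_matrix(states, n_states):
    """Count historical state transitions and normalize each row to one."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Hypothetical sequence of VIX states: 0 = low, 1 = mid, 2 = high
states = [0, 0, 1, 1, 2, 1, 0, 0, 1, 2, 2, 1]
T_hat = estimate_transition_matrix(states, 3)
```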
The Fully Flexible Resampling method introduced in this section allows us to elegantly combine
time- and state-conditioning including several arbitrarily complex state variables. This section focuses
on presenting the method with a code example, while a more detailed mathematical analysis of the
method’s properties will be presented in a separate forthcoming article in order to keep the focus on
the essence in this book. The method essentially uses Entropy Pooling in the way proposed by Meucci
(2013), while the innovation here is to generalize the approach to H-step-ahead simulations instead of being restricted to just the next period as in the original article.
Letting zt denote the value of a state variable such as the VIX index, we start by formally defining
purely state-conditioned scenario probabilities as

pt^crisp ∝ { 1 if zt ∈ R(z⋆),
           { 0 otherwise.          (3.2.2)
In the above, R(z⋆) is a symmetric range around the value z⋆ in the sense that

∑_{t | zt ∈ [z, z⋆]} pt = α/2 = ∑_{t | zt ∈ [z⋆, z̄]} pt,

where pt, t = 1, 2, . . . , T̃, are the uniform empirical scenario probabilities and α ∈ [0, 1] is some probability.
This section will proceed to define the Fully Flexible Resampling method for one state variable,
such as the VIX index, while Section 3.2.1.1 presents how to condition on multiple state variables. The
fundamental resampling approach is independent of the number of state variables. Additional state
variables simply introduce more states and hence more state probability vectors for us to sample from.
For one state variable, let us imagine that we partition the historical realizations into J ranges such as (3.2.2) by defining J − 1 partitioning values vj such that vj < vj+1 for j = 1, 2, . . . , J − 1, i.e.,

R(z⋆_j) = { zt ≤ vj              for j = 1,
          { vj−1 < zt ≤ vj       for j = 2, . . . , J − 1,
          { vj−1 < zt            for j = J.
The partitioning values vj could, for example, be determined by some desired percentiles such as 25%
and 75%.
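A sketch of the percentile-based partitioning and the resulting crisp probabilities (3.2.2), using a made-up state variable series:

```python
import numpy as np

rng = np.random.default_rng(5)
z = 15.0 + 10.0 * np.abs(rng.standard_normal(1000))  # hypothetical state variable

# J = 3 ranges defined by J - 1 = 2 partitioning values at the 25%/75% percentiles
v = np.percentile(z, [25, 75])
states = np.digitize(z, v, right=True)  # 0: z <= v1, 1: v1 < z <= v2, 2: v2 < z

def crisp(states, j):
    """Uniform probability within range j, zero outside, as in (3.2.2)."""
    p = (states == j).astype(float)
    return p / p.sum()

p_crisp = [crisp(states, j) for j in range(3)]
```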
The next step is to compute J scenario probability vectors qj that each combine the time-conditioning from (3.2.1) with the state-conditioning from (3.2.2) for the ranges R(z⋆_j). We do this using Entropy Pooling with pt^exp as the prior and the views

∑_{t=1}^{T̃} xt zt = μj,

∑_{t=1}^{T̃} xt zt² ≤ μj² + σj²,

where

μj = ∑_{t ∈ {t | zt ∈ R(z⋆_j)}} pt^crisp zt,

σj² = ∑_{t ∈ {t | zt ∈ R(z⋆_j)}} pt^crisp zt² − μj².
Meucci (2013) shows this approach corresponds to a mixture of an exponential decay prior (3.2.1) and
a smoother kernel version of the crisp probabilities (3.2.2) with optimal bandwidth and center. More
technical details can be found in Meucci (2013) and the forthcoming article about the Fully Flexible
Resampling method. See also Chapter 5 for a deeper presentation of Entropy Pooling than in the
introductory Section 1.1.
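To make the mechanics concrete, here is a deliberately simplified sketch that imposes only the first (mean) view via exponential tilting and bisection. The full Entropy Pooling method, which also handles the second-moment inequality view, is presented in Chapter 5; this helper is purely illustrative and not the fortitudo.tech implementation.

```python
import numpy as np

def entropy_pooling_mean(p, z, mu):
    """Posterior q minimizing relative entropy to the prior p subject to the
    single view sum(q * z) = mu. The solution has the exponential tilting
    form q_t ∝ p_t * exp(lam * z_t); we find lam by bisection, using that
    the tilted mean is increasing in lam."""
    def tilted(lam):
        q = p * np.exp(lam * (z - z.mean()))  # centering for stability
        q /= q.sum()
        return q

    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(tilted(mid) * z) < mu:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

# Uniform prior over four scenarios, view: posterior mean of z equals 3
p = np.full(4, 0.25)
z = np.array([1.0, 2.0, 3.0, 4.0])
q = entropy_pooling_mean(p, z, 3.0)
```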
With an initial state j = j0 and the probability vectors qj at hand, we can generate S paths using the Fully Flexible Resampling method with the following procedure:

1. Sample a historical scenario t ∈ {1, 2, . . . , T̃} according to the scenario probabilities qj.

2. Update the state j, so it corresponds to the state of the historical scenario t from 1.
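The two-step procedure can be sketched as a resampling loop (a hypothetical helper, not the accompanying code): q holds one scenario probability vector per state, and states maps each historical scenario to its state classification.

```python
import numpy as np

def ffr_paths(q, states, j0, S, H, rng=None):
    """Sample S paths of H historical scenario indices: sample a scenario
    from the current state's probability vector, then move to that
    scenario's state (the implicit Markov chain)."""
    if rng is None:
        rng = np.random.default_rng()
    T_tilde = q.shape[1]
    paths = np.empty((S, H), dtype=int)
    for s in range(S):
        j = j0
        for h in range(H):
            t = rng.choice(T_tilde, p=q[j])  # step 1: sample a scenario
            j = states[t]                    # step 2: implicit state update
            paths[s, h] = t                  # store the index, not the data
    return paths

# Toy example: two states over four historical scenarios
q = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
states = np.array([0, 0, 1, 1])
paths = ffr_paths(q, states, j0=0, S=5, H=10, rng=np.random.default_rng(0))
```

Storing only the indices mirrors the memory-saving feature discussed above: the stationary transformations can be looked up later from the sampled indices.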
At first, it might be unclear how the above procedure results in a Markov chain as introduced in Section
1.2, because we do not explicitly sample a state before resampling the historical scenarios. After the
initial state, the state sampling is implicit through the scenario probability vectors qj . Based on these,
we can calculate a state transition probability matrix T if we want. The case study in this section
illustrates how you can apply the method to the stationary transformations from Section 3.1, using
the one month at-the-money forward (ATMF) implied volatility as the state variable. You can find all
details in the accompanying code to this section.
Figure 3.2.1 shows the historical realizations of the 1m ATMF implied volatility together with its
state classification based on 25% and 75% percentiles. Based on the state classification, we can calculate
µj and σj that we will use as Entropy Pooling view values to compute qj against the exponential decay
prior (3.2.1), which Figure 3.2.2 shows with a half life parameter τ = T̃ /2.
Figure 3.2.3 shows the Entropy Pooling posterior probability vectors qj for each of the state clas-
sifications from Figure 3.2.1. The exponential decay for these probability vectors is evident from the
figure. It is also clear that we have more scenarios in the mid implied volatility state with each sce-
nario having a lower average probability than for the high or low implied volatility states. The scenario
probabilities of the high and low implied volatility states are naturally concentrated on fewer historical
observations.
We note that in this case study we used a state variable which is fairly straightforward to interpret,
but we are not limited to state variables with an economic interpretation, although this will probably
be preferred by other stakeholders. We can use arbitrarily complex state variables as long as their values
can be partitioned in a meaningful way. In this case, we have an intuitive idea of what it means that
the implied volatility is higher or lower, but that might also hold for more complex state variables.
Figure 3.2.2: Exponential decay prior.
We use the probability vectors from Figure 3.2.3 to perform a resampled simulation H = 21 days
into the future, using the Fully Flexible Resampling procedure from this section. Since the stationary
transformations for the equity index are just the log returns, it is easy for us to transform these into the
returns that we would use for investment and risk analysis. Hence, Figure 3.2.4 shows the one month
return distribution for the equity index conditional on the initial state. The reverse transformation
and simulation for implied volatilities, interest rates, and credit spreads will be presented in Section
3.3.
Based on the scenario probability vectors qj , which are illustrated in Figure 3.2.3, we can compute
the implicit state transition probability matrix T of the resampling Markov chain. For this case study,
it is given by

T = [ 0.902  0.098  0.000
      0.043  0.923  0.035 ] .
Figure 3.2.3: Fully Flexible Resampling probability vectors qj .
Figure 3.2.4: One month equity return distributions conditional on initial state.
3.2.1.1 Multiple State Variables
Even if we have multiple state variables, the Fully Flexible Resampling procedure is the same as in
Section 3.2.1. We just need to do a bit more work when computing the historical scenario probability
vectors qj , j = 1, 2, . . . , J, and define partitioning values for the M state variables, i.e., we add an
m ∈ {1, 2, . . . , M } subscript to the ranges
R(z⋆_{i,m}) = { zt,m ≤ vi,m              for i = 1,
             { vi−1,m < zt,m ≤ vi,m      for i = 2, . . . , Im − 1,
             { vi−1,m < zt,m             for i = Im.
The number of state probability vectors qj then becomes J = ∏_{m=1}^{M} Im.
The first step is the same as in Section 3.2.1, but it must be repeated for all M state variables, i.e., we must compute posterior probability vectors qi,m using Entropy Pooling with pt^exp as the prior and the views

∑_{t=1}^{T̃} xt zt,m = μi,m,

∑_{t=1}^{T̃} xt zt,m² ≤ μi,m² + σi,m²,

where

μi,m = ∑_{t ∈ {t | zt,m ∈ R(z⋆_{i,m})}} pt^crisp zt,m,

σi,m² = ∑_{t ∈ {t | zt,m ∈ R(z⋆_{i,m})}} pt^crisp zt,m² − μi,m².
Note that the above results in ∑_{m=1}^{M} Im probability vectors qi,m for the M individual state variables.
These probability vectors need to be combined through a proper weighting, so the state conditioning
happens jointly over all M state variables.
The method used for combining the individual probability vectors qi,m to achieve joint state condi-
tioning is the one suggested by Meucci (2013). However, in this book it is recommended to use linear
mixing as in Section 5.3, because this usually results in a higher effective number of scenarios (ENS),
given in equation (5.1.2), for the final state probability vectors qj , which is arguably a desirable feature
for a resampling method where we want to use the available data as effectively as possible. See Section
5.1 for a definition and explanation of ENS.
With multiple state variables, we have Im probability vectors q1,m, q2,m, . . . , qIm,m for each state variable m = 1, 2, . . . , M. We must combine these over the Cartesian product I = I1 × I2 × · · · × IM with Im = {1, 2, . . . , Im} into J probability vectors qj. For each m we define the mapping fm : {1, 2, . . . , J} → Im such that fm(j) = im gives us the index im of state variable m for scenario j. We must then determine weights wim,m for the vectors qim,m that satisfy the constraints wim,m ≥ 0 and ∑_{m=1}^{M} wim,m = 1. The final state probability vector qj is then given by

qj = ∑_{m=1}^{M} wim,m qim,m for j ∈ {1, 2, . . . , J}.
With an overall understanding of what we are trying to achieve, we continue to the procedure of
computing wim ,m . As already argued, vectors qim ,m with a higher effective number of scenarios (ENS)
are preferable from a data efficiency perspective. Hence, vectors qim ,m with a higher ENS should be
given a higher weight wim ,m . The other consideration comes from giving a lower weight to redundant
state variables in the sense that they contain very little extra information compared to the other state
variables. Such state variables should be given a lower weight wim,m. Hence, if we let Dim,m be a soon-to-be-defined diversity indicator, we determine the weights wim,m using the following formula

wim,m = (ENSim,m · Dim,m) / (∑_{m=1}^{M} ENSim,m · Dim,m).
The effective number of scenarios is defined in equation (5.1.2), so we focus on defining Dim,m using the notation from this book. For any pair of individual probability vectors (qim,m, qim̃,m̃), we first define the Bhattacharyya coefficient as bm,m̃ = ∑_{t=1}^{T̃} (qt,im,m · qt,im̃,m̃)^{1/2}, compute the Hellinger distance dm,m̃ = √(1 − bm,m̃), and finally compute the diversity score as

Dim,m = (1/(M − 1)) ∑_{m̃ ≠ m} dm,m̃.
The justification for the diversity score is given by Meucci (2013). While the author finds that this
leads to meaningful weights wim ,m , readers are encouraged to explore other meaningful alternatives
based on the same fundamental objectives.
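A sketch of the weighting scheme above, assuming the common exponential-of-entropy definition of ENS (the book's equation (5.1.2) should be consulted for the exact form used there); the two probability vectors are hypothetical:

```python
import numpy as np

def ens(p):
    """Effective number of scenarios, assumed here to be the exponential
    of the Shannon entropy of the probability vector."""
    p = p[p > 0]
    return np.exp(-np.sum(p * np.log(p)))

def hellinger(p, q):
    """Hellinger distance via the Bhattacharyya coefficient."""
    return np.sqrt(1.0 - np.sum(np.sqrt(p * q)))

def combine(q_vectors):
    """Linear mixture of one probability vector per state variable,
    weighted by ENS times the average pairwise Hellinger distance."""
    M = len(q_vectors)
    diversity = np.array([
        np.mean([hellinger(q_vectors[m], q_vectors[n])
                 for n in range(M) if n != m])
        for m in range(M)
    ])
    w = np.array([ens(qm) for qm in q_vectors]) * diversity
    w /= w.sum()
    return sum(wm * qm for wm, qm in zip(w, q_vectors))

# Two hypothetical per-state-variable vectors over four scenarios
q1 = np.array([0.4, 0.4, 0.1, 0.1])
q2 = np.array([0.1, 0.2, 0.3, 0.4])
qj = combine([q1, q2])
```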
The case study in this section uses the same data as Section 3.2.1 and adds the slope of the interest
rate curve, defined as the difference between the 1y and 10y zero-coupon bond yield, as an additional
state variable. For the slope, we define just one value v1,2 = 0, i.e., the states are whether the slope
of the interest rate curve is positive or inverted. Hence, I1 = 3 for the 1m implied volatility state
variable as in the previous section, while I2 = 2 for the slope state variable. This gives us a total of
J = 3 · 2 = 6 states with M = 2 vectors in each state that need to be combined into qj based on
EN Sim ,m and Dim ,m .
This section shows the results of the analysis similar to Section 3.2.1, but it is left as an exercise for
readers to test their understanding by replicating the graphs below. We start with Figure 3.2.5 and
Figure 3.2.6 that illustrate the state variables together with their state classifications. We note that
Figure 3.2.5 shows the same 1m implied volatility time series, but it is now classified into six states
instead of three. Finally, Figure 3.2.7 shows the six Fully Flexible Resampling probability vectors qj
that we would use for resampled simulation.
From Figure 3.2.7, we note that there are few states where the interest rate curve is inverted. Hence,
the state conditioning becomes significant on the historical scenarios where we have an inverted interest
rate curve, clearly distorting the exponential decay prior.
Figure 3.2.5: 1m ATMF implied volatility and state classification.
Figure 3.2.7: Fully Flexible Resampling probability vectors qj .
3.2.2.1 Variational Autoencoders (VAEs)
where W ∈ RN ×N is a transformation matrix which ensures that all columns of F are orthogonal to
each other, i.e., have zero correlation. Furthermore, the principal component factors are ordered such
that Fi explains a larger proportion of the variance in the data than Fj when i < j, with Fi and Fj
denoting some columns i and j in the matrix F . There is a lot of literature on PCA and software that
solves the problem of finding the matrix W , see for example James et al. (2023), so we will not spend
time on it here and instead focus on relating PCA to AEs.
If we want to reconstruct the raw demeaned data, we can naturally do it through
F W⁻¹ = D̄ ∈ R^{T×N}.
Hence, we can think of the matrix W as a linear encoding of the raw data D̄ into the principal
component factors F, while W⁻¹ is a linear decoding of the principal components into the raw data
D̄. The dimensionality reduction comes from the fact that we can decide to use only some of the
principal components to analyze and reconstruct our data, see James et al. (2023).
When it comes to AEs, the encoding into factors is performed through a general function
f (D) = F ∈ RT ×Ñ ,
where Ñ < N typically, and the function f (D) is a neural network. The same is true for the subsequent
decoding function
g (F ) = D̃.
Note that D̃ will contain a reconstruction loss when Ñ < N , while D̃ = D when Ñ ≥ N . This is
similar to principal components when we do not use all of them to reconstruct the data.
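The PCA encoding/decoding analogy can be sketched directly, using random data in place of the demeaned market data D̄; with orthonormal W, the inverse W⁻¹ is simply the transpose Wᵀ.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 500, 5
D = rng.standard_normal((T, N)) @ rng.standard_normal((N, N))
D_bar = D - D.mean(axis=0)  # demeaned data

# Linear "encoder": principal directions from the SVD of the demeaned data
_, _, Vt = np.linalg.svd(D_bar, full_matrices=False)
W = Vt.T                  # columns are orthonormal principal directions
F = D_bar @ W             # principal component factors

# Linear "decoder": with all N components the reconstruction is exact
reconstruction = F @ W.T  # equals F @ inv(W) since W is orthonormal

# Dimensionality reduction: keep k < N components, accept a reconstruction loss
k = 2
D_hat = F[:, :k] @ W[:, :k].T
loss = np.mean((D_bar - D_hat) ** 2)
```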
The variational autoencoder was introduced by Kingma and Welling (2013). Instead of reducing the data to a deterministic representation, VAEs typically parameterize the encoder and decoder such that the factors are fitted to a normal distribution with a diagonal covariance matrix, i.e.,

f(D) = F ∼ N(μ, diag(σ²)),
Details of the estimation using the “reparameterization trick” and a mean squared error objective combined with a Kullback–Leibler divergence term can be found in Kingma and Welling (2019).
Once we have estimated the relevant VAE parameters, we no longer need the encoder f(D) for market simulation. We can just sample from the normal distribution with mean μ and covariance matrix diag(σ²)
and subsequently input these samples into the decoder g (F ), which will generate new synthetic data
samples that have properties similar to the historical data D, assuming that we have specified and
estimated our VAE model properly.
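The simulation step can be sketched as follows, with a stand-in decoder; in a real VAE, μ and σ come from training and g(F) is a (stateful) neural network, so everything below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
latent_dim, n_series, S = 8, 3, 1000

# Hypothetical fitted parameters of the latent distribution
mu = np.zeros(latent_dim)
sigma = np.full(latent_dim, 0.5)

# Sampling step: draw iid factors from N(mu, diag(sigma^2));
# the encoder f(D) is no longer needed at this point
F = mu + sigma * rng.standard_normal((S, latent_dim))

# Stand-in for the trained decoder g(F); a real decoder reintroduces
# cross-sectional and time series dependencies
B = rng.standard_normal((latent_dim, n_series)) / np.sqrt(latent_dim)
def decoder(F):
    return np.tanh(F @ B)

synthetic = decoder(F)  # S synthetic samples
```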
VAEs are usually straightforward to train, but they are very hard to implement in a way that
properly handles statefulness in training, evaluation, and simulation given the current deep learning
technology. This leads to a common misunderstanding that VAEs are not capable of handling data
that is correlated over time. Some people also struggle to understand how the iid sampling from N(μ, diag(σ²)) can produce correlated synthetic samples, but it is in the exact same way as the
AR (1) process presented in Section 3.1, where the noise term εt is also iid Gaussian, while the actual
AR (1) process has obvious autocorrelation. Properly implemented and stateful neural network layers
are in fact very capable of capturing time series dependencies.
Compared to the Fully Flexible Resampling method introduced in Section 3.2.1, VAEs in their
raw form are naturally harder to interpret. Hence, investment managers have so far been reluctant
to use them for market simulation, while it might be possible to come up with architectures that
introduce an interpretative or causal overlay. One practical area where VAEs have been used is data
imputation, for example, if we have market data from various countries or regions where the holidays
are different. Previously, such imputations were typically done with regression models, PCA, or more
simple imputation methods such as replacement with the mean or last observation carried forward
(LOCF).
Below is a case study where we compare the performance of LOCF, PCA, and a VAE to impute
missing data in the multi-asset simulated data introduced in Section 3.1. The case study randomly
removes 10% of the data and imputes it using the three different methods. It uses proprietary im-
plementations of these methods, while the VAE algorithm is spelled out in McCoy, Kroon, and Auret
(2018). The VAE has a very simple architecture with one layer of 16 simple RNN nodes for both the
encoder and decoder. It is perhaps possible to get even better performance by improving the architec-
ture, but this example shows that properly implemented VAEs, which correctly handle statefulness,
are capable of producing good results with narrow and simple architectures. The challenge is to get
these implementations right for the tabular time series data that we work with.
The mean squared errors for LOCF, PCA, and VAE are, respectively, 0.916, 0.601, and 0.379,
so we see that the VAE approach produces significantly better data imputations than the two other
methods. Figure 3.2.8 illustrates the last year of imputed and real data for the equity index, while
Figure 3.2.9 illustrates the deviations from the true value over the entire period for the indices that
are missing. From Figure 3.2.9, we can visually get a sense of the gain from using VAEs. Note that
the magnitude of the differences in Figure 3.2.9 increases simply due to the equity index increasing.
Since imputation methods like PCA and regressions are typically applied in a statistical pattern
recognition way, the lack of interpretative overlay for VAEs seems to be acceptable to investment
managers. For the actual market simulation, methods like the Fully Flexible Resampling from Section 3.2.1 seem to be preferred, at least until generative machine learning methods can clearly prove that
they produce much better future market simulations.
3.2.2.2 Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) were initially introduced by Goodfellow et al. (2014) and
are fundamentally different from VAEs, while the synthetic data generation objective is the same.
In a standard GAN model, there are two neural networks competing with each other. The first one
is a generator G (z), which typically takes a normally distributed noise vector z ∼ N (0, diag (1)) as
input, while the second is a discriminator D (·), which takes both the original data D and the
simulated data G (z) as input.
The idea is to train these two networks in a way where the generator becomes better at generating
synthetic samples G (z) that the discriminator cannot distinguish from the original data D. While this
sounds natural and is the essence of what we are trying to do, several practical issues can occur when
we train GAN models, for example, mode collapse. Mode collapse is a situation where the generator
learns to generate excellent data points in a specific area of the distribution and continues to generate
these samples without exploring the entire distribution of the historical data. A method to alleviate
issues with mode collapse is to use minibatch discrimination layers, see Salimans et al. (2016).
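As a minimal illustration of the two competing objectives, the snippet below computes the standard GAN losses from Goodfellow et al. (2014) given the discriminator's outputs on real and generated samples; the numerical inputs are hypothetical.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Discriminator objective: -E[log D(x)] - E[log(1 - D(G(z)))]."""
    eps = 1e-12  # numerical guard against log(0)
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))

def generator_loss(d_fake):
    """Non-saturating generator objective: -E[log D(G(z))]."""
    eps = 1e-12
    return float(-np.mean(np.log(d_fake + eps)))

# A discriminator that separates real from fake well gives a low
# discriminator loss and a high generator loss.
d_loss = discriminator_loss(np.array([0.9]), np.array([0.1]))
g_loss = generator_loss(np.array([0.1]))
```

At the theoretical equilibrium, the discriminator outputs 0.5 everywhere and its loss equals 2 log 2.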
Similar to VAEs, tabular time series introduce several additional complexities. Hence, it is crucial
that statefulness is handled correctly in GAN training, evaluation, and simulation. There have also
been specific time series GAN architectures proposed, for example, by Yoon, Jarrett, and Van der
Schaar (2019) that essentially combine the VAE and GAN architectures and generate samples in the
latent VAE space, such that we are no longer constrained to it being iid normal as in Section 3.2.2.1.
In summary, it is still very early days for generative machine learning methods applied to investment
time series, and there is a significant exploration barrier due to the implementation complexities. Naive
copy/pasting of code from other applications usually does not work.
ment markets. Perhaps even more importantly, they are very restrictive when it comes to capturing the
potentially very high-dimensional dependencies of investment markets. For example, if we generalize
the geometric Brownian motion to a stochastic volatility model formulation
dXt = µXt dt + √Vt Xt dWt,
dVt = αt dt + βt dBt,
the additional dependencies between these processes are usually introduced through a correlation
between the Brownian motions Wt and Bt that drive the fundamental uncertainty. This is still too
restrictive.
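For intuition, a stochastic volatility model of this form can be discretized with a simple Euler–Maruyama scheme, where the two Brownian increments are correlated through ρ. The mean-reverting drift κ(θ − Vt) and diffusion ξ√Vt below are Heston-style assumptions chosen purely for illustration, not part of the general formulation above.

```python
import numpy as np

def simulate_sv_path(x0, v0, mu, kappa, theta, xi, rho, dt, n_steps, seed=0):
    """Euler-Maruyama discretization of
        dX = mu*X dt + sqrt(V)*X dW,   dV = kappa*(theta - V) dt + xi*sqrt(V) dB,
    with corr(dW, dB) = rho. The variance is floored at zero (full truncation)."""
    rng = np.random.default_rng(seed)
    x, v = float(x0), max(float(v0), 0.0)
    for _ in range(n_steps):
        z1, z2 = rng.standard_normal(2)
        dw = np.sqrt(dt) * z1
        db = np.sqrt(dt) * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)
        sv = np.sqrt(max(v, 0.0))
        x += mu * x * dt + sv * x * dw
        v += kappa * (theta - v) * dt + xi * sv * db
        v = max(v, 0.0)
    return x, v
```

The single parameter ρ is the only channel for dependence between the price and variance processes, which illustrates how restrictive the formulation is compared to the dependence structures observed in high-dimensional markets.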
In summary, stochastic differential equations are good no-arbitrage calibrating machines that allow
market makers to price derivatives on typically a single underlying such as S&P 500 and STOXX
50, but they are usually too restrictive when it comes to approximating both the marginal and joint
distributions of investment markets. The author is not familiar with any successful use of SDEs for
the risk modeling of high-dimensional markets for investment applications, but he has witnessed many
failed attempts. The joint calibration of these models becomes increasingly complex as the dimension
increases, and there can be many additional restrictions on, for example, the dynamics of interest rates.
The approaches presented in this book for market simulation are much more focused on being able
to capture the complex shapes and dependencies observed in investment markets. This is usually much
more important for investment managers. We also note that the issues related to arbitrage are mostly
related to derivative instruments, where there must be a logical consistency between the underlying
and the derivative. While this aspect is enforced in our focus on simulating risk factors, we cannot
rule out that the implied volatility surfaces we simulate allow for arbitrage. These simulated arbitrage
opportunities are unlikely to be tradeable in practice, so they mainly create issues if we have investment
algorithms that specifically look for them in the simulated data.
A final perspective is that if our historical market data does not contain arbitrage and our simulation models are good at capturing the features of our data, our simulations should be essentially
arbitrage-free. This is indeed also what early results seem to indicate for synthetic data generation
using machine learning methods as presented in Section 3.2.2. We also note that the stationary
transformation preprocessing in Section 3.1 is very flexible. Hence, it might be possible to perform
these transformations in a way that ensures no arbitrage for specific instruments. Providing joint
no-arbitrage guarantees for very general simulations is, to the author’s knowledge, very hard.
We end this section by noting that there are no issues with arbitrage for common simulations of
instruments like stocks, because all dependencies, both historical and simulated, are statistical. Hence,
it is quite unlikely that we randomly end up with a simulation where there is a guaranteed risk-free
profit, unless we perhaps simulate the price of the same stock traded on different exchanges. If we do
something like that, we must give the simulations extra attention.
If you are a market maker interested in understanding the joint risks of your trading books, the
methods presented in this book can still be very helpful. You just have to make sure that you do
not use the simulations to give prices to clients if they are not guaranteed to be arbitrage-free. An
understanding of the joint risks across trading books is indeed something that can add significant value
to market makers and is often lacking currently.
3.3 Computing Simulated Risk Factors
As already highlighted throughout this chapter, the fundamental idea of the simulation approach is:
1. Transform the raw historical investment time series data D ∈ R^(T×N) into stationary data S̃T ∈ R^(T̃×Ñ).
2. Simulate future paths of the stationary transformations S̃T ∈ R^(S×Ñ×H), for example, using the methods presented in Section 3.2.
3. Recover the simulated risk factors R̃ ∈ R^(S×Ĩ×H) from the simulated stationary transformation paths.
4. Generate the final market simulation R = Rh ∈ R^(S×I), h ∈ {1, 2, . . . , H}, by using the simulated risk factors for instrument pricing as presented in Chapter 4.
An important point, which was already made in Section 3.1, is that it should be easy for us to recover
the risk factor simulations R̃ ∈ R^(S×Ĩ×H) in step 3 from the simulated stationary transformation paths
S̃T ∈ R^(S×Ñ×H) from step 2. For most transformations such as differencing, this is usually not an issue.
However, for fractionally differenced time series it is nontrivial to recover the original time series in
general. Hence, although fractional differencing might be useful for some investment applications, it
creates some challenges for common market simulation approaches as presented in this book. One can
imagine that there exist other stationary transformations that make it hard to recover the original
time series, either due to numerical issues or simply because it is challenging to do in full generality
as with fractional differencing.
In Section 3.2.1, we already saw how to recover the quantity of interest for an equity instrument,
which is usually the return. It is quite easy to compute the return based on simulated log return
paths. In this section, we will continue with the recovery of the risk factors for government bonds,
options, and credit bonds. These risk factors are, respectively, the simulated zero-coupon curve, implied
volatility surface, and credit spread. Once we have simulations for these quantities, we can compute
the instrument P&L for government bonds, options, and credit bonds as presented in Chapter 4.
We let the accompanying code show how the risk factor recovery is performed for the simulated
stationary transformations from Section 3.2.1 instead of writing it out mathematically. This is similar
to Section 3.2.1, where the accompanying code shows how the stationary transformations are computed. In
brief, the stationary transformations for the equity index and implied volatility surface are log changes,
while the stationary transformations for government bonds and credit spreads can be interpreted as
log changes in constant maturity discount factors. For government bonds, this interpretation is strictly
correct, while for credit spreads it is simply the same transformation performed to the spread. One
could have combined the zero-coupon curve and the credit spread for joint projection of this quantity.
We leave it to the interested reader to try this out and once again underline that the suggestions
from this chapter are just examples. Readers have full flexibility in relation to using other stationary
transformations if they are more suitable for their particular case.
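To illustrate the discount-factor interpretation, the sketch below transforms a path of constant-maturity zero-coupon rates into log changes of discount factors and inverts the transformation again. It is a simplified stand-in for the accompanying code, with annual compounding assumed.

```python
import numpy as np

def to_log_df_changes(rates, maturities):
    """Zero rates (T x M) -> log changes of constant-maturity discount factors."""
    df = (1.0 + rates) ** (-maturities)  # maturities broadcast across columns
    return np.diff(np.log(df), axis=0)

def recover_zero_rates(r0, log_df_changes, maturities):
    """Cumulate simulated log changes onto the last observed curve r0 and
    map the resulting discount factors back to zero-coupon rates."""
    df0 = (1.0 + r0) ** (-maturities)
    df_path = df0 * np.exp(np.cumsum(log_df_changes, axis=0))
    return df_path ** (-1.0 / maturities) - 1.0
```

The same pair of functions can be applied to credit spreads, with the caveat from the text that the discount-factor interpretation is then only formal.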
Figure 3.3.1 shows the government bond zero-coupon curve simulation at the one-month horizon
resulting from the stationary transformations simulation using the Fully Flexible Resampling method
from Section 3.2.1. Figure 3.3.2 shows a simulated implied volatility surface. Finally, Figure 3.3.3
shows a hypothetical credit spread with multiple maturities.
Figure 3.3.1: Simulated zero-coupon curve at the one-month horizon.
Figure 3.3.3: Simulated credit spreads at the one-month horizon.
We note that the case study in this section uses the Fully Flexible Resampling method in a quite
elementary way with just one state variable given by the one-month at-the-money forward (ATMF)
implied volatility. This state variable is probably insufficient to capture all of the time series
dependencies in the data, which is evident from the quite erratic simulated implied volatility surface
in Figure 3.3.2. Readers can adjust the accompanying code to see other implied volatility simulations.
Figure 3.3.2 shows the simulation for s = 100, while there are in total S = 10,000 simulations.
In practice, we should probably condition on more state variables as presented in Section 3.2.1.1
or use a more complex state variable that better summarizes the state of the data than the one-
month ATMF implied volatility. An example of a more complex state variable could be a statistical
summary of several state variables, extracted through principal component analysis (PCA) or other
similar methods. There is also nothing that prevents us from combining machine learning methods
as presented in Section 3.2.2 with the Fully Flexible Resampling method. Once we understand the
individual foundations of the methods, we can be creative in how we combine them.
likely it is that this has been drawn from our joint simulated distribution. Not only that, we must
also make an assessment of how likely each of the S paths are, i.e., how likely it is that the sequence
of simulated realizations has been drawn from our simulated distribution.
Besides defining the overall objective, we must define what we mean by statistical characteristics
and which methods we use for comparing these. Not all choices are equally good. For example,
we might be excellent at estimating the future covariance matrix but completely miss all the other
important distributional aspects that are presented in Chapter 2 about stylized market facts. We must
also remember that a single quantitative training metric might not be sufficient to assess whether our
models are good or not. For example, think about the mode collapse issue for GANs presented in
Section 3.2.2.2 where the model becomes very good at generating samples in some limited area of the
distribution but fails to cover all areas of the distribution.
Although it would be great to have a single metric that credibly evaluates the out-of-sample simulated
paths, no such metric currently exists to the author’s knowledge. Hence, we probably have
to combine multiple metrics and make a joint assessment based on these. Finally, even for problems
with more structure, like image generation, there seems to be no definite consensus on how to evaluate
quality. As a consequence, the final evaluation is usually a mix of quantitative and qualitative
metrics, i.e., simply assessing if people can tell whether the image is real or generated.
The first simple market simulation assessment that we can make is just to track if our model
generates paths where the subsequent realization falls into the range of generated paths. For example,
if our model generates paths where the subsequent realization is outside of the distributional range 90%
of the time, this is obviously a bad model. This assessment might sound trivial, yet it is surprisingly
often not performed due to its apparent lack of sophistication. While this simple initial assessment
does not allow us to conclude that the generative model we use is good, it will allow us to reject
fundamentally flawed models or detect fundamental implementation or calibration errors. It might
also give us a sense of whether the model seems to be biased in some direction, for example, if the
realizations tend to be in the upper or lower area of the simulation. An example of such a bias is the
backtest in Figure 3.5.2 below, where the portfolio simulations seem to be biased upwards compared
to the realization. Readers can examine and adjust the accompanying code to Section 3.5 to verify
that if we introduce exponential time decay from equation (3.2.1) in the Fully Flexible Resampling
method, the model’s upward bias seems to be mitigated.
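A minimal version of this range check could look as follows; the array layout (scenarios in rows, dates in columns) is an assumption for the sketch.

```python
import numpy as np

def coverage_rate(simulated, realized):
    """Fraction of dates where the realized value falls inside the
    min-max range of the simulated paths (scenarios x dates)."""
    lo = simulated.min(axis=0)
    hi = simulated.max(axis=0)
    return float(np.mean((realized >= lo) & (realized <= hi)))
```

A coverage rate far below one is the kind of fundamental flaw this check is meant to catch; it says nothing, however, about whether the distribution inside the range is realistic.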
Having an understanding of the simulation evaluation challenges, we proceed to various suggestions
of how you can evaluate both the cross-section and time series aspects of your market simulations. For
readers oriented toward quantitative research, discovering single statistics that allow us to meaningfully
summarize and compare both the joint time series and cross-sectional properties of investment time
series is probably a good and highly relevant topic for future research.
With the book’s approach to simulation, i.e., transforming the raw data into stationary transfor-
mations and projecting these into the future while capturing state through Fully Flexible Resampling
or time series VAEs and GANs, the time series dependencies evaluation is fairly straightforward. The
idea is that the stationary transformations should bring us as close as possible to something that is iid,
so the first-order time series dependencies in the stationary transformations should be close to zero.
It is unlikely that we will achieve something strictly iid in practice. Therefore, we use methods
that can capture the time series state, but the time series dependencies are likely to appear in higher
moments, as we have seen in Section 2.2 about risk clustering. Hence, one can evaluate, for example,
the autocorrelation of the absolute values of the historical data and compare this to the simulated
values.
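For example, a lag-k autocorrelation of absolute values can be computed as below and compared between the historical and simulated stationary transformations; this is a generic diagnostic, not the book's specific implementation.

```python
import numpy as np

def abs_autocorr(x, lag=1):
    """Lag-k autocorrelation of |x|, a simple proxy for volatility clustering."""
    a = np.abs(np.asarray(x, dtype=float))
    a = a - a.mean()
    return float(np.sum(a[lag:] * a[:-lag]) / np.sum(a * a))
```

If the transformations were perfectly iid, this statistic should be close to zero at all lags; persistent positive values indicate remaining risk clustering of the kind described in Section 2.2.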
When we evaluate the cross-section, it is important to consider what the time series state is at
this time. For example, if we evaluate the next day’s simulations in a high volatility state, we should
compare it with a realization following a high volatility state. So, for the Fully Flexible Resampling
method, we can evaluate the performance at each state and compare it to the future or historical
realization following that state. If our simulation horizon is sufficiently far into the future, where we
believe the Markov chain has converged to something stationary, we can compare these simulations
without conditioning on the market state, like we do in Table 3.1 and Table 3.2 below for some selected
series of the multi-asset simulated data that follows with the fortitudo.tech Python package.
The simplest cross-sectional comparison is just to assess how accurately our models approximate
cross-sectional statistics of the historical or out-of-sample data. This is done in Table 3.1 and Table
3.2 below for a horizon of H = 252, where it is assumed that we can ignore the remaining state-
dependence without introducing a large error. Similar to Section 3.2.1.1, it is left as an exercise to the
readers to implement the FFR model with multiple state variables and subsequently compute its risk
factor simulation as in Section 3.3. The case study in this section sets the exponential decay half-life
parameter to τ = T̃.
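Cross-sectional statistics of this kind typically include the first four moments. A minimal sketch of how the differences could be computed is given below; the exact statistics behind Table 3.1 and Table 3.2 may differ.

```python
import numpy as np

def summary_stats(x):
    """Mean, volatility, skewness, and excess kurtosis of a return series."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    z = (x - m) / s
    return {"mean": m, "vol": s, "skew": float(np.mean(z**3)),
            "excess_kurt": float(np.mean(z**4) - 3.0)}

def stat_differences(historical, simulated):
    """Differences between historical and simulated summary statistics."""
    h, s = summary_stats(historical), summary_stats(simulated)
    return {k: h[k] - s[k] for k in h}
```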
Table 3.1: Difference between prior and simulation statistics using one state variable.
Table 3.2: Difference between prior and simulation statistics using two state variables.
Figure 3.4.1 and Figure 3.4.2 show the first two principal components for the risk factor simulation
using the Fully Flexible Resampling models described in Section 3.2.1. Looking at the PCA results,
it appears that the two state variable model is capable of capturing more nuances than the one state
variable model. An alternative is to consider t-SNE as presented by Maaten and Hinton (2008), which
is capable of capturing nonlinear dependencies but requires specification of additional parameters and
might be challenged in high dimensions and for large simulations.
Figure 3.4.1: The first two principal components for Fully Flexible Resampling with one state variable.
Figure 3.4.2: The first two principal components for Fully Flexible Resampling with two state variables.
To conclude this section, it is important to underline that its main objective is to provide the
fundamental perspectives for evaluating high-dimensional market simulations. While the concrete
evaluation used here might be sufficient for other similar kinds of data, completely different mathematical methods might be needed for your particular market data to assess whether the joint simulation
is sufficiently good or not. Metrics that provide very good summaries of joint high-dimensional data
might be discovered in the future, but they are currently not widely known.
While this section used the historical observations to validate the model calibrations and illustrate
the perspectives, it is important to remember that the focus should be on out-of-sample data and
generalization. Statistical models are so powerful today that a good in-sample fit is to be expected, so
the challenges are mainly related to overfitting by finding models that are sufficient for capturing the
repeatable statistical properties of the historical data while not overfitting to the historical noise.
The main results are presented in Figure 3.5.1 and Figure 3.5.2 below. We start with Figure
3.5.1, which shows the historical performance of CVaR and variance optimized portfolios. Although
CVaR again comes out on top, we certainly still could have found other backtest and simulation
configurations where the opposite would have been the case. The most interesting result is perhaps
that we only use S = 100 scenarios for the 90%-CVaR optimization. This gives us just 10 scenarios
below the VaR, which we use to determine the portfolio. One of the excuses for not doing the harder
work and performing CVaR analysis is often that “we do not have enough data to have good estimates
of the tail risks”. While more data is almost always preferable, this case study shows that CVaR
optimized portfolios can give meaningful results even without a high number of joint scenarios S.
However, we should be careful about generalizing this conclusion. It might be driven by our particular
data and case study design, i.e., there might not be a lot of diversification between the 10 equity indices
neither in the tails nor on average, so the portfolios are largely determined by the indices with the
lowest standalone risks.
Figure 3.5.1: Historical performance of CVaR and variance strategies with simulated joint scenarios.
In Figure 3.5.2, we see some synthetic paths for the CVaR optimized portfolios. We first note
that the historical realization is within this synthetic path distribution, which is good, but it seems
that the synthetic paths are somewhat more optimistic than the historical realizations. Hence, just
conditioning on VIX with the values that we used seems to be insufficient to capture the market state.
Readers can adjust the accompanying code to introduce, for example, exponential decay in the prior
probabilities as in equation (3.2.1), refine the conditioning with other values, or use multiple state
variables as presented in Section 3.2.1.1.
Readers are encouraged to carefully examine the accompanying code to this section, first and
foremost to make sure that they understand the suggested backtesting procedure, which is designed to
overcome the issues with having just one historical realization. It is important to understand that this
procedure can also give a false sense of security if our simulations R are an inaccurate representation
of the historical market characteristics.
There is of course also the danger that the historical data we have is an inaccurate representation
of the future, but historical backtests arguably suffer even more from this. Hence, this book generally
recommends using a quantamental approach to investment management by combining the simulations
based on historical data with more forward-looking information using the Sequential Entropy Pooling
(SeqEP) method presented in Chapter 5. We note that the methods presented in Chapter 5 can be
used in a fully systematic way, although this book does not present such case studies beyond the use
of the Fully Flexible Resampling method in this section. Systematic use of the methods can be seen
as the base case for quantamental investors, defining the foundations of their investment processes.
Figure 3.5.2: Historical and synthetic performance for the CVaR optimized strategy.
Chapter 4
Instrument Pricing
This chapter presents the final market modeling step before the joint market scenarios R, introduced in
Section 1.1, can be used for (subjective) views, stress-testing, optimization, and general risk and return
analysis. As explained in Section 3.3, we can focus on simulating future paths Rh , h ∈ {1, 2, . . . , H},
for the risk factors that are relevant for our portfolios and subsequently price all the instruments that
we are interested in. The fully general joint simulation of factors and instrument (relative) P&L then
allows us to perform very deep investment analysis as presented in Chapters 5 to 8.
While this book uses the popular risk factor terminology, these factors can also rightfully be called
pricing factors, because they enter directly into the pricing functions of investment instruments. The
factor model literature is vast and generally focuses on linear factor models, initially popularized by
Ross (1976). This chapter will instead focus on pricing functions, but readers are free to approximate
these by linear models if they deem it sufficient to price the instruments they invest in. For example,
a derivative instrument’s P&L can be approximated by a linear combination of partial derivatives
commonly referred to as “Greeks”, see Section 4.4. It is generally recommended to perform full pricing
and not ignore the residual, unless this is practically infeasible for some reason.
This chapter presents a general framework for investment instrument valuation including some
common examples and perspectives on illiquid alternatives. While this framework should apply to
most investment instruments, some might be so unique that they require an entirely different approach
or a combination of the methods presented in this chapter, for example, callable bonds that are a
combination of a bond and a call option.
Section 4.1 gives a presentation of bond pricing and their risk factors, i.e., zero-coupon interest
rates, yields to maturity, breakeven inflation, and credit spreads. Section 4.2 presents equity pricing,
which to a large extent depends on the investment manager’s desired level of granularity and sometimes
requires additional factor model estimation. Section 4.3 demystifies derivative instruments that are
sometimes perceived as being very different from plain vanilla instruments such as stocks and bonds.
The perspectives on derivatives valuation in Section 4.3 combined with the portfolio management
framework for derivatives presented in Section 6.1 should illustrate that derivatives have only a few extra
characteristics compared to other investment instruments. Section 4.4 presents dynamic strategies and
a put option delta hedging case study. Section 4.5 contains some general perspectives and principles
for illiquid alternatives. Finally, Section 4.6 is a multi-asset pricing case study.
4.1 Bond Pricing
This section gives an introduction to bond pricing. While there exist many different kinds of bonds,
their common characteristic is that they have coupon payments ct that need to be properly discounted
to calculate the bond’s current price P . The intention of this section is not to present an exhaustive
list of bond instruments, but to describe the most common types of bonds and give readers a general
understanding of the principles that are used to price them. Readers should then be able to apply
these principles to the particular bonds that are relevant for their portfolios.
P = ∑_{t ∈ {t1, t2, ..., tN}} (1 + rt)^(−t) ct + (1 + rtN)^(−tN) C.
The term (1 + rt)^(−t) = Dt is known as the discount factor for time t. Dt is the price of a zero-coupon
bond with principal set to one. In the above pricing formula, C is the principal which is paid together
with the last coupon payment at time tN .
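The bond pricing formula above can be sketched directly in code; the coupons, maturities, and curve below are hypothetical numbers, with annual compounding assumed.

```python
def bond_price(coupons, times, zero_rates, principal):
    """Price a coupon bond by discounting each payment with its
    maturity-matched zero-coupon rate; the principal is repaid at t_N."""
    pv_coupons = sum(c * (1.0 + r) ** (-t)
                     for c, t, r in zip(coupons, times, zero_rates))
    return pv_coupons + principal * (1.0 + zero_rates[-1]) ** (-times[-1])

# Hypothetical 3-year 4% annual coupon bond on an upward-sloping curve.
price = bond_price([4.0, 4.0, 4.0], [1.0, 2.0, 3.0], [0.02, 0.025, 0.03], 100.0)
```

For a zero-coupon bond, the formula collapses to the discount factor times the principal, consistent with Dt being the price of a zero-coupon bond with a principal of one.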
If we imagine that we have access to the zero-coupon interest rates rt for all sufficiently long
maturities t, we call this collection of interest rates the zero-coupon yield curve. Actual zero-coupon
bonds usually have maturities of up to one year. We can extract zero-coupon bond yields directly from
these, while we have to estimate them based on traded coupon bonds for longer maturities, see Munk
(2011). There also exist Separate Trading of Registered Interest and Principal of Securities (STRIPS)
instruments, which partition the coupon bond into zero-coupon bonds. If these are available to you as
an investor, you can also extract zero-coupon yields from these.
Another common rate is the yield to maturity, which is defined as the common yield ytN applied
to all repayments, i.e.,
P = ∑_{t ∈ {t1, t2, ..., tN}} (1 + ytN)^(−t) ct + (1 + ytN)^(−tN) C.
For zero-coupon bonds, the yield to maturity ytN and zero-coupon interest rate rt are the same,
because the bond only has one payment that pays back the principal. For coupon bonds, it is only in
the uncommon scenario where we have a flat interest rate curve until the maturity of the bond that
the two will be the same.
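Given a market price, the yield to maturity can be recovered numerically, for example with bisection, since the price is decreasing in the yield. A sketch under the same annual-compounding convention as above:

```python
def price_from_ytm(coupons, times, y, principal):
    """Price when every payment is discounted at the single yield y."""
    pv = sum(c * (1.0 + y) ** (-t) for c, t in zip(coupons, times))
    return pv + principal * (1.0 + y) ** (-times[-1])

def yield_to_maturity(price, coupons, times, principal, lo=-0.5, hi=1.0, tol=1e-10):
    """Bisection: the price is decreasing in y, so halve the bracket."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if price_from_ytm(coupons, times, mid, principal) > price:
            lo = mid  # model price too high -> yield must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```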
While the yield to maturity ytN can be interesting and useful for bond investors, who use it to
analyze bonds in various ways, it is usually the zero-coupon curve, rt with t ∈ {t1 , t2 , . . . , tN }, that we
are trying to simulate for risk purposes. If we can generate good simulations for zero-coupon curves
of, for example, US treasuries, we can price any US government bond using these simulations and
consequently generate future P&L scenarios.
In practice, we probably have to generate scenarios for some key maturities of the zero-coupon
interest rate curve and interpolate between these points if it is necessary to price the bonds we are
interested in. There exist several interpolation methods, for example, spline interpolation or Nelson-
Siegel(-Svensson) parametrization. We will not go into details with these in this book but refer inter-
ested readers to Munk (2011) or other freely available resources.
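The simplest of these is linear interpolation between the key maturities, e.g. with numpy; spline or Nelson-Siegel(-Svensson) fitting would replace this one-liner in practice. The key points and rates below are hypothetical.

```python
import numpy as np

key_maturities = np.array([1.0, 2.0, 5.0, 10.0])    # simulated key points
key_rates = np.array([0.020, 0.023, 0.028, 0.032])  # zero rates in one scenario

# Linearly interpolate the zero rate for a 3.5-year cash flow.
r_35 = float(np.interp(3.5, key_maturities, key_rates))
```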
where
c̄t = (CPIt / CPI) ct and C̄ = (CPItN / CPI) C
are the inflation-adjusted coupons and principal, with CPI being the current level of the price index.
The difference between the yield on nominal bonds and inflation-linked bonds with the same ma-
turity is known as the breakeven inflation, while the yield on inflation-linked bonds is called the real
rate. Inflation-linked bonds are usually not traded as much as nominal bonds, which can complicate
estimation of the real yield curve directly from bond prices. However, there exists a fairly liquid market
for inflation swaps, which trades many different maturities and allows us to estimate the real rate curve
by subtracting the inflation swap strike from the nominal bond curves.
Inflation-linked bonds typically also have a feature where the principal is protected against deflation,
i.e., if CPItN / CPI < 1, then C̄ is set equal to C. Note that this usually only applies to the principal and
not the coupons. Hence, this feature should be incorporated into our P&L modeling of ILBs, while we
can perform the same stationary transformations to the breakeven inflation as we do to interest rates
and the credit spreads below.
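In the P&L model, the deflation floor is a simple max; the numbers below are hypothetical.

```python
def ilb_principal(principal, cpi_ratio):
    """Deflation-protected principal: the CPI adjustment is floored at 1,
    so the investor receives at least the nominal principal."""
    return principal * max(cpi_ratio, 1.0)

def ilb_coupon(coupon, cpi_ratio):
    """Coupons are typically NOT floored: they scale with the index directly."""
    return coupon * cpi_ratio
```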
The quantity stN is called the credit spread for maturity tN . It is a function of the bond’s default
probability and the potential recovery payment in the case of a default. Credit bond investors might
carefully analyze these quantities to determine if they believe the current credit spread stN represents
these risks appropriately. For our multi-asset risk modeling purposes, simulating the credit spread
stN will usually be sufficient. As we have seen in Section 3.1, we can apply the same stationary
transformations to credit spreads as we do to zero-coupon interest rates. The interpretability of a log
return of the constant maturity zero-coupon bond just disappears in this case, but interpretability is
not a core objective of the stationary transformations. We just want something that is easier for our
statistical models, such as the ones presented in Section 3.2, to project into the future.
We note that both fundamental and factor models require additional estimation and assumptions
compared to direct simulation of the equity return and price. The market simulation methods from
Section 3.2 do not require that you impose these fundamental or factor structures on the equity
instruments. They are simply presented here in case you want to, and because they are commonly
seen and talked about in practice.
where p is the equity risk premium, i.e., the extra compensation investors require for bearing equity
risk, r⋆ is some long-term estimate of the risk-free rate, dt are the dividend payments, and g is a
long-term dividend growth rate estimate. The last term is the present value of a perpetuity with an
assumed constant growth rate g.
Fundamental stock picking investors usually apply this dividend discount model, or some residual
operating earnings variant of it, to reformulated accounting numbers, see Penman (2012). These
investors will then input a risk premium p for the particular stock and see if the resulting price is above or below
the current market price. On the other hand, multi-asset investors will usually apply this model to an
index like the S&P 500 and solve for the risk premium to assess how attractive the index return is and
use it as a signal. The author has experience with both of these applications of the model.
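Keeping only the terminal perpetuity term, the multi-asset use of the model — solving for the implied risk premium — reduces to a one-line inversion. This collapses the full dividend path into a single Gordon-growth term, so it is a deliberate simplification of the model described above.

```python
def perpetuity_price(d1, r_star, p, g):
    """Present value of dividends growing at rate g forever, discounted at r* + p."""
    return d1 / (r_star + p - g)

def implied_risk_premium(price, d1, r_star, g):
    """Invert the perpetuity formula for the risk premium p."""
    return d1 / price + g - r_star
```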
Since the model builds on many forecasts and assumptions, it is probably a good idea to jointly
sample these assumptions and get a distribution for the price of the stock or index. This is also
our objective when it comes to market simulation. The challenge comes from the fact that we must
perform a joint simulation of the dividend paths and other parameters together with the potentially
very high-dimensional market we are trying to model. This can be very challenging in practice if we
want to use the dividend discount model for many stocks or indices, and it is not something that the
author has seen done before for a high-dimensional market simulation.
where Ft ∈ RN are the factor realizations, and εi,t is a residual for stock i at time t. As with so many
other things in finance and economics, the usual factor model formulation is linear and given by
where α ∈ R is an intercept that allows us to assume that the residual has zero mean without loss of
generality, and β ∈ RN are the “factor loadings”. The linear formulation implies that
From the linear expected return formulation, we can immediately conclude that the CAPM model from
Section 2.1 is a linear factor model with one factor Ft = Rm − Rf . Other examples of linear factor
models are the APT introduced by Ross (1976) and the Fama-French model introduced by Fama and
French (1992).
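A linear factor model of this kind is typically estimated with ordinary least squares. A sketch with a single hypothetical factor, where including the intercept column lets us assume the residual has zero mean:

```python
import numpy as np

def estimate_linear_factor_model(returns, factors):
    """OLS estimate of alpha and beta in R_t = alpha + beta' F_t + eps_t.
    returns: (T,), factors: (T, N)."""
    X = np.column_stack([np.ones(len(returns)), factors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(returns, dtype=float), rcond=None)
    return float(coef[0]), coef[1:]
```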
The expected return expression for a linear factor model (4.2.3) tempts some people to ignore the
residual in (4.2.1) when performing market simulation. This makes it easier to compute marginal risk
contributions and assign them all to specific factors as presented in Section 7.1, but it potentially
ignores very important characteristics of the residual, so this approach is generally not recommended
in this book.
The presentation of linear factor models in this section is very short and sweet. There are many
details and formalities which are purposefully skipped. Some people might even say that CAPM or
the APT does not fall into their definition of linear factor models. We will not delve into these details
as they are not essential for our purposes and simply conclude that the expected return expression has
a linear factor model form. For more details on linear factor models, see Meucci (2014).
where S0 is the current underlying “spot” price, while rt and dt are the risk-free zero-coupon interest
rates and dividends, which have been previously introduced. In the forward formula, we note that
t ≤ T , i.e., we only discount dividends that are payable during the life of the forward contract.
Forward/futures strikes are often conveniently computed using a dividend yield q instead of the discrete
dividends. This reduces the formula to
For most forward/futures contracts, the dividends will be close to known during the life of the
contract, unless they have a very long time to maturity. These can therefore be extracted from market
expectations. If we still want to simulate dividend uncertainty, it is probably a good idea to at
least anchor them in the market expectations. Besides that, we see that the forward contract has the
underlying and interest rates as risk factors, which we have methods for simulating and pricing as
presented in the previous chapters and sections.
Let's now turn to FX forwards, which are commonly used to hedge currency risk. Consider, for
example, the EUR/USD spot exchange rate at S_{EUR/USD} = 1.05, which means that one euro can
be exchanged for 1.05 US dollars. We can simulate this spot exchange rate by performing the same
stationary transformations as we do with equity indices and implied volatility surfaces. However, if we
also simulate the USD and EUR risk-free interest rate curves, we actually also automatically simulate
the forward FX prices, because these are functions of the current spot and the interest rate differential.
For example, for a one-year EUR/USD forward contract the forward rate is given by covered interest parity as

F_{EUR/USD} = S_{EUR/USD} \, \frac{1 + r_{1,USD}}{1 + r_{1,EUR}},

where r_{1,EUR} and r_{1,USD} are the one-year risk-free interest rates in EUR and USD, respectively.
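A minimal sketch of this covered interest parity relationship (the function name and inputs are illustrative, not from the accompanying package); the terms-currency rate enters the numerator and the base-currency rate the denominator:

```python
def fx_forward(spot, r_terms, r_base, T=1.0):
    # Covered interest parity with annual compounding:
    # forward = spot * ((1 + r_terms) / (1 + r_base)) ** T.
    return spot * ((1.0 + r_terms) / (1.0 + r_base)) ** T

# EUR/USD example: USD is the terms (quote) currency, EUR the base currency.
# Hypothetical rates: r_USD = 5%, r_EUR = 3%.
f = fx_forward(1.05, r_terms=0.05, r_base=0.03)
```

With USD rates above EUR rates, the forward trades above spot, consistent with observed forward premia.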
call and put options with strike K is given by

c = e^{-r_T T} \left( F_T N(d_1) - K N(d_2) \right) \quad \text{and} \quad p = e^{-r_T T} \left( K N(-d_2) - F_T N(-d_1) \right),

where N(\cdot) is the standard normal CDF and

d_1 = \frac{1}{\sigma_{T,K} \sqrt{T}} \left( \ln \frac{F_T}{K} + \frac{1}{2} \sigma_{T,K}^2 T \right) \quad \text{and} \quad d_2 = d_1 - \sigma_{T,K} \sqrt{T}.
As we see from the above formula, the main risk factors for European-style put and call options
are the underlying spot S0 , the risk-free rates rT , and the implied volatilities σT,K , assuming that the
constant dividend yield q is close to known.
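A sketch of the Black (1976) pricing formula for European options on a forward, using only the standard library; the function below is an illustration rather than the book's implementation:

```python
from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF

def black76(F, K, T, r, sigma, call=True):
    # Black (1976): discounted expectation of the payoff under the forward measure.
    d1 = (log(F / K) + 0.5 * sigma**2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    if call:
        return exp(-r * T) * (F * N(d1) - K * N(d2))
    return exp(-r * T) * (K * N(-d2) - F * N(-d1))
```

Put-call parity, c − p = e^{−rT}(F − K), provides a quick sanity check on any such implementation.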
If we compute the changes in option prices using the Black (1976) formula, we can view this as
a nonlinear factor model similar to (4.2.1) with the residual εi,t = 0 for all t, in which case we can
rightfully call it a pricing function. If we decide to approximate the changes in option prices by a
Taylor expansion using option “Greeks”, i.e., partial derivatives with respect to the price inputs, we
can view this as a linear factor model similar to (4.2.2), in which case εi,t ̸= 0 in general. It is up to you
as an investment and risk manager to decide which approach you use, but it is generally recommended
not to ignore the residual and perform full pricing whenever it is feasible, because many important
nuances that can significantly affect our portfolio construction can be hidden in the residual.
There exist many other derivatives than European-style put and call options, for example, American-
style options that allow us to exercise the right to sell/buy during the life of the option. Another
example is variance swaps, which were first introduced in Section 2.3. These can be priced either
through a replicating portfolio of options or by direct Monte Carlo simulation. The important point
is simply that we do what we can to keep the pricing consistent by using the risk factor simulation
correctly. It is clearly out of scope for this book to go through all the details of every known derivative
instrument, so it is up to the reader to use the principles from this chapter to appropriately simulate
the (relative) P&L of the derivative instruments that are relevant for their portfolio. To the author’s
knowledge, these principles should be sufficient to price any derivative with a satisfactory accuracy for
investment management purposes.
As trend-following strategies rely on some signal, which is beyond the scope of this book, we will
instead use another commonly interesting dynamic strategy, which is the delta hedging of an option.
Let us imagine that we have a portfolio consisting of an equity index and a one month put option with
an at-the-money forward strike K. The intention of the put option is clearly to limit the downside of
the portfolio. If we statically hold the option to maturity, we can just look at the one month horizon
and compute the option’s P&L at expiry given by max (K − ST , 0) − p0 , with ST being the value of
the underlying index after one month, and p0 being the initial option price. So if K − ST ≤ p0 , the
put option has been unprofitable over the one month horizon.
An alternative to the buy and hold strategy is to delta hedge the put option, i.e., trade for-
wards/futures so that the derivative portfolio’s P&L is not sensitive to first-order changes in the
underlying index. If the market movements end up being very volatile during the one month period,
the delta hedging strategy can actually provide a positive P&L at the end of the horizon, even if
K ≤ ST . To understand this, we can decompose the put option’s P&L using its “Greeks”, i.e.,
dp = \frac{\partial p}{\partial t} dt + \frac{\partial p}{\partial S} dS + \frac{\partial p}{\partial \sigma} d\sigma + \frac{\partial p}{\partial r} dr + \frac{1}{2} \frac{\partial^2 p}{\partial S^2} dS^2 + \varepsilon. \tag{4.4.1}

For the delta hedged portfolio \Pi, the first-order exposure to the underlying is removed, so its P&L decomposition has no dS term:

d\Pi = \frac{\partial \Pi}{\partial t} dt + \frac{\partial \Pi}{\partial \sigma} d\sigma + \frac{\partial \Pi}{\partial r} dr + \frac{1}{2} \frac{\partial^2 \Pi}{\partial S^2} dS^2 + \varepsilon,
where the main drivers of a delta hedged option's cumulative P&L, which is held to expiry, will be the terms \frac{\partial \Pi}{\partial t} dt and \frac{1}{2} \frac{\partial^2 \Pi}{\partial S^2} dS^2. For a long position in a put option, theta \frac{\partial \Pi}{\partial t} will be negative (the value of the option decreases as time to expiry decreases), so whether the delta hedged strategy is profitable over the one month horizon or not depends mainly on the cumulative magnitude of \frac{1}{2} \frac{\partial^2 \Pi}{\partial S^2} dS^2 and \frac{\partial \Pi}{\partial t} dt.
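The mechanics can be sketched with a daily rebalanced hedge under Black and Scholes (1973) pricing; this is a simplified illustration (flat rates, no transaction costs), not the case study's implementation:

```python
from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf

def bs_put(S, K, tau, r, sigma):
    # Black-Scholes price of a European put with time to expiry tau.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return K * exp(-r * tau) * N(-d2) - S * N(-d1)

def bs_put_delta(S, K, tau, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return N(d1) - 1.0

def delta_hedged_put_pnl(path, K, T, r, sigma):
    # Long one put, rebalancing the underlying hedge at every step of the path.
    n = len(path) - 1
    pnl = 0.0
    for t in range(n):
        tau0 = T * (n - t) / n
        tau1 = T * (n - t - 1) / n
        hedge = -bs_put_delta(path[t], K, tau0, r, sigma)  # long underlying units
        p0 = bs_put(path[t], K, tau0, r, sigma)
        p1 = max(K - path[t + 1], 0.0) if tau1 == 0.0 else bs_put(path[t + 1], K, tau1, r, sigma)
        pnl += (p1 - p0) + hedge * (path[t + 1] - path[t])
    return pnl
```

For a path that never moves, the hedge increments vanish and the strategy loses exactly the initial premium (pure theta decay), while for a very volatile path ending at the strike, the gamma gains can make the hedged P&L positive even though the put expires worthless.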
The case study to this section illustrates the effect of delta hedging and compares the P&L of the
different option strategies at the one week, two week, three week, and one month horizons. We use
the risk factor simulation from Section 3.2.1. Since we only have data for one month implied volatility
and strikes from −10% to +10%, we make some simplifying assumptions when computing the option’s
P&L simulation. In real-world applications, we should use the implied volatilities that correspond to the correct strikes; the main focus of this example, however, is to illustrate how dynamic strategies can be simulated and analyzed at different horizons. Readers can see the details in the accompanying code.
Figure 4.4.1: One month ATMF put P&L at different horizons.
Figure 4.4.2: One month ATMF delta hedged put P&L at different horizons.
Figure 4.4.3: Joint plot for underlying asset and one month put P&L.
Figure 4.4.4: Joint plot for underlying asset and one month delta hedged put strategy P&L.
From the figures above, we see how the option's P&L for the buy and hold strategy eventually
converges to max (K − ST , 0) − p0 , while the delta hedged strategy offers some convexity at the end
of the one month period. From the last figure, we clearly see that the delta hedging has been fairly
successful. We note however that it is not perfect, both due to the frequency of the hedging, which
is one day, and due to some of the simplifying assumptions that have been made in the pricing and
estimation of ∆ using the Black and Scholes (1973) formula. For more information about option
“Greeks” and delta hedging, see Hull (2021).
give an accurate representation of the actual risk and return characteristics.
We note that the price in Figure 4.6.1 is given in index points. Readers can see the accompanying
code for an example of how to convert these prices to relative P&Ls for the options, using the portfolio
management framework for derivatives from Section 6.1, and returns for the equity index. It is also
important to remember that there is a joint simulation of the implied volatilities and interest rates for
the options. Hence, it is not just the underlying index that affects the option prices.
Finally, we show a joint histogram plot for the equity and bond returns in Figure 4.6.2. The bond
return has been computed using the formula from Section 4.1.1 with simulated zero-coupon interest
rates. We note from Figure 4.6.2 that, for the current state of the data, the one-month correlation
between bonds and equities is positive. We also note that the bond risk is very low compared to the equity
risk. Finally, it is important to keep in mind that the time series simulation which follows with
the fortitudo.tech Python package is not necessarily a realistic representation of investment market
behavior, because it is generated using stochastic differential equations presented in Section 3.2.3.
Chapter 5
This chapter gives a comprehensive presentation of the Entropy Pooling (EP) method, introduced by
Meucci (2008a) and refined by Vorobets (2021), as well as its applications in causal and predictive
market views and stress-testing as introduced by Vorobets (2023). While all this content is in principle
publicly available already, it seems to be challenging for most people to understand. Hence, this
chapter fills in the gaps in one place with a careful treatment and unified notation.
Entropy Pooling is a way to update the prior probability vector p, associated with the market
Monte Carlo distribution R, by inputting information about the desired update. Some people like to
think about Entropy Pooling as a generalization of the Black-Litterman (BL) model without all the
oversimplifying normal distribution and CAPM assumptions, additionally avoiding the questionable
engineering with the τ parameter. I think EP is so much more than that, so it is not doing it justice
to make this comparison, which you will hopefully also discover after reading this chapter.
Section 5.1 presents the basic version of Entropy Pooling, which at its core is a relative entropy
(Kullback–Leibler divergence) minimization between the prior probability vector p and the posterior
probability vector q subject to linear constraints on the posterior probabilities. The section also
presents some of the common views specifications, ranking views, and the nuances of VaR and CVaR
views.
Section 5.2 presents the Sequential Entropy Pooling (SeqEP) refinement introduced by Vorobets
(2021), which improves on many of the limitations imposed by the requirement that views are specified
as linear constraints on the posterior probabilities and usually gives much better results. The sequential
approach can also solve practically interesting problems that the original approach simply cannot.
Section 5.3 presents aspects related to view confidences and multiple users or states, which is
presented in the appendix of Meucci (2008a) but is still a largely unexplored area of the EP method.
View confidences give additional nuances to the method, and user weights or state probabilities allow
us to input potentially conflicting views that are aggregated into one posterior distribution.
Section 5.4 uses the multiple states results and presents the Causal and Predictive Market Views
and Stress-Testing framework from Vorobets (2023). This framework combines Entropy Pooling with
a causal Bayesian network layer on top to generate joint causal views and their associated joint prob-
abilities, which allow us to compute a single posterior distribution that incorporates the causality
assumptions from the Bayesian network.
5.1 Entropy Pooling (EP)
As mentioned in the introductory Section 1.1, Entropy Pooling (EP) solves the relative entropy
(Kullback–Leibler divergence) minimization problem subject to linear constraints on the posterior
probabilities
q = \underset{x}{\operatorname{argmin}}\ x^T (\ln x - \ln p) \tag{5.1.1}

subject to

Gx \leq h \quad \text{and} \quad Ax = b.
The first natural question that arises is: why minimize the relative entropy? The short answer is
because it has good properties for our updating problem, similar to the mean squared error having
good properties for linear regression. However, this explanation is probably insufficient for most people
from an intuitive perspective, and intuition is arguably important for the adoption of a new method,
not least when nontechnical people ask its users to explain it.
To start building Entropy Pooling intuition, we must first be clear about what we intuitively will
not be able to explain. We will not be able to intuitively explain the actual value of the relative
entropy, but we can transform it to the effective number of scenarios given in (5.1.2) below, which
has a nice intuitive interpretation as the probability mass concentration over the S joint Monte Carlo
samples in R.
The relative entropy represents a statistical distance between two distributions p and q, while it
is not a mathematical metric because it is asymmetric and does not satisfy the triangle inequality. It
can be interpreted as the expected excess surprise from using the distribution q instead of p. We are
now approaching the essence of what we are doing when we are minimizing the relative entropy. We
are minimizing the spuriosity while updating our prior distribution p to the posterior distribution q.
In relation to the spuriosity, we probably want to avoid assigning all probability to one scenario s
and get a degenerate posterior market distribution q. Unless we have it as an actual view or stress-
test, we probably also want to avoid introducing dependencies that are completely opposite of what
we have in our prior simulation. For example, if we have two equity indices that are highly dependent,
we usually want a stress-test on one of them to affect the other, see the example with S&P 500 and
STOXX 50 in Section 5.2 below.
Besides operating on fully general Monte Carlo distributions R, taking the potentially very complex
dependencies into account is where the true power of Entropy Pooling comes from. For example, it
is just as easy to stress-test a portfolio containing only S&P 500 as a portfolio containing S&P 500
and thousands of derivatives on S&P 500. Entropy Pooling essentially makes a prediction on how
other instruments and factors are expected to behave under the posterior distribution q. As we will
see in Section 6.2.1 below, there is no need to reprice European put and call options when we perform
Entropy Pooling stress-testing. Their posterior P&L distribution is automatically given to us.
With all of the above in mind, it hopefully becomes clear that Entropy Pooling is so much more
than the BL model, which relies on the oversimplifying and empirically rejected normal distribution
and CAPM assumptions. It sounds nice that “equilibrium expected returns” can be extracted using
the BL model, but since these have very little to do with reality, they probably do not add any actual
investment value and might even be detrimental.
In summary, Entropy Pooling is a theoretically sound method for implementing market views and
stress-testing fully general Monte Carlo distributions R. It helps us predict what will happen to all
instruments that we invest in and all factors that our portfolios are exposed to. It will ensure logical
consistency in our derivatives P&L. As we will see in Section 5.3 below, Entropy Pooling also handles
view confidences and state probabilities in a much more natural, probabilistic way than the BL model
with its τ parameter, see Meucci (2008b) for an explanation of the issues and paradoxes. Hence, there
is no reason to continue using BL when fast and stable Entropy Pooling implementations are freely
available. If you for some reason want to use the CAPM prior, you can still do that to simulate R
while getting all the Entropy Pooling benefits when implementing views and stress-tests.
To explore Entropy Pooling further from a mathematical perspective, we start by noticing that
if all elements of the prior probability vector p are equal to 1/S, the second term of (5.1.1) reduces to
ln S, which does not affect the solution, i.e., in the uniform prior probability case we can reduce the
problem to

q = \underset{x}{\operatorname{argmin}}\ x^T \ln x = \underset{x}{\operatorname{argmax}} \left( -x^T \ln x \right) = \underset{x_1, x_2, \ldots, x_S}{\operatorname{argmax}} \left( -\sum_{s=1}^{S} x_s \ln x_s \right).

The expression -\sum_{s=1}^{S} x_s \ln x_s is known as the entropy, which we are trying to maximize when the
prior probability vector p is uniform.
In most practical cases, the prior probability vector p will be uniform, while we have the opportunity
to specify any valid prior probability vector with strictly positive elements that sum to one. Hence,
Entropy Pooling will in most practical cases correspond to entropy maximization subject to linear
constraints on the posterior distribution. Many more technical details are given about the properties
of relative entropy minimization by Caticha and Giffin (2006), who also show that it corresponds to
a generalization of Bayesian updating in the sense that information about the posterior distribution
is given by constraints instead of the usual data. Caticha and Giffin (2006) also explain that entropy
does not require any interpretation in this situation. It just has good properties for updating the
distribution when additional information is given about the moments of the posterior distribution.
There is generally a lot of information about the relative entropy (Kullback–Leibler divergence),
which is a quantity used in many different fields of mathematics and statistics. Readers who are
interested in exploring this further must be warned that the content can be confusing due to the order
of the distributions in (5.1.1). Sometimes, the objective is formulated equivalently as −xT (ln p − ln x),
while in other cases the formulation remains the same but the prior and posterior order is switched
for computational convenience. Finally, there are situations where the prior is implicitly assumed to
be uniform, in which case we have seen that relative entropy minimization corresponds to entropy
maximization.
A useful way of assessing how much the views and stress-tests deviate from the prior is called
the effective number of scenarios, introduced by Meucci (2012a) and given by the exponential of the
posterior probability entropy

\hat{S} = \exp \left( -\sum_{s=1}^{S} q_s \ln q_s \right). \tag{5.1.2}

The idea is that the effective number of scenarios is \hat{S} = 1 if all probability mass is given to one
scenario, while it is \hat{S} = S when scenario probabilities are uniform and equal to 1/S. Note that we use
the convention that q_s \ln q_s = 0 for q_s = 0. It is often convenient to compute the relative effective
number of scenarios \hat{s} = \hat{S}/S \in (0, 1]. We note that the relative effective number of scenarios is just the
exponential of the negative relative entropy when prior probabilities are uniform. Hence, in a uniform
prior case, a lower relative entropy gives a higher effective number of scenarios, which makes intuitive
sense as this indicates that the posterior distribution is close to the uniform prior distribution.
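The effective number of scenarios (5.1.2) is straightforward to compute, e.g.:

```python
import numpy as np

def effective_number_of_scenarios(q):
    # Exponential of the posterior probability entropy,
    # with the convention q ln q = 0 for q = 0.
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return np.exp(-np.sum(q * np.log(q)))
```

Uniform probabilities over S scenarios give Ŝ = S, while a degenerate distribution gives Ŝ = 1, matching the interpretation above.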
The second natural Entropy Pooling question is: what are the matrices and vectors G, h, A, and
b? The short answer is that G and A contain functions of the Monte Carlo simulation R from (1.1.1),
while h and b contain constraint values for these functions. From an intuitive investment perspective,
this might not help you much, so we improve on that with examples below.
Consider for a moment how you would implement a constraint on the posterior expected value of
some price, return, or factor i = 1, 2, . . . , I. If we let Ri denote column i from the matrix R containing
the market Monte Carlo simulation, it should be easy to convince yourself that the prior expected
value is given by RiT p = µi . If we want this value to change for the posterior distribution to µ̃i , we
must implement the constraint that RiT x = µ̃i , which can be done through the matrix A and vector b.
More generally, we can write
RiT x ⪌ µ̃i ,
using ⪌ to indicate that the view can be an equality or inequality in one of the two directions, with
inequality views being implemented through G and h.
Questions related to Entropy Pooling’s limitations are more subtle. For example, we cannot im-
plement views that are not feasible for the Monte Carlo market simulation in R, i.e., if we want to
implement a view that the expected value of some return should be 10% while all our simulations are
below 10%, we cannot do that. It is hard to say definitively whether this is a bug or a feature, because if R
does not contain all the possible scenarios that we believe can occur in reality, it is a prior problem
rather than a posterior problem. The prior problem should be fixed by better quality simulations
using, for example, the approaches from Chapter 3. Limiting ourselves to the scenarios in R allows
us to avoid the potentially computationally expensive repricing of derivatives and other instruments
after implementing risk factor views, so in that sense it is a feature.
A significant limitation of the EP method is, however, that views must be specified as linear
constraints on the posterior probabilities. Ideally, we would want to be able to solve the problem
with fully general constraints G (x) ≤ h and A (x) = b. Although we can specify nonlinear parameter
views with linear constraints on the posterior probabilities, we cannot specify them in full generality
because we must fix some parameters to be able to specify specific view types. For example, consider
the variance view
\left( R_i^T \odot R_i^T \right) x - \left( R_i^T x \right)^2 \gtreqless \tilde{\sigma}_i^2,
which is clearly nonlinear in the posterior probabilities x, where we use ⊙ to denote the element-wise
Hadamard product.
Note that we in the specification of views will use a concise programming broadcast notation, where
we subtract scalars from vectors. We use this notation to replicate the way that these views would be
implemented in most programming languages. From a strictly mathematical perspective, readers can
imagine that there is a conforming S-dimensional vector of ones multiplied with scalar values such as
\left( R_i^T x \right)^2 that we do not want to constantly replicate.
To specify a variance view with linear constraints on the posterior probabilities, we must fix the
second term to some value \tilde{\mu}_i and specify the variance view through the two constraints

R_i^T x = \tilde{\mu}_i \quad \text{and} \quad \left( R_i^T \odot R_i^T \right) x - \tilde{\mu}_i^2 \gtreqless \tilde{\sigma}_i^2.
There is naturally the philosophical question of whether one can have a variance view without having
a mean view. While we will not delve into this, we note that it is a fact that we would be able to
implement the variance view with nonlinear functions on the posterior probabilities without specifying
a mean view. Hence, as we will clearly see later in Section 5.2, the linear constraints pose a significant
limitation. The original suggestion by Meucci (2008a) is to fix the mean to the prior, i.e., µ̃i = RiT p =
µi , while we note that this imposes an implicit view about the mean staying the same.
The Lagrangian of problem (5.1.1) is

\mathcal{L}(x, \lambda, \nu) = x^T (\ln x - \ln p) + \lambda^T (Gx - h) + \nu^T (Ax - b),

where λ and ν are Lagrange multipliers. Meucci (2008a) shows that the solution is given by

x(\lambda, \nu) = \exp \left( \ln p - \iota - G^T \lambda - A^T \nu \right), \tag{5.1.3}

with ι being an S-dimensional vector of ones. The solution (5.1.3) illustrates that positivity constraints
on scenario probabilities x ≥ 0 are automatically satisfied and can therefore be omitted.
The original/primal problem formulation is potentially very high-dimensional as it is a function of
the number of scenarios S. On the other hand, the Lagrange dual function
G (λ, ν) = L (x (λ, ν) , λ, ν)
is only a function of the Lagrange multipliers λ and ν and therefore has dimension equal to the number
of views in addition to a Lagrange multiplier for the requirement that posterior probabilities sum to
one. Meucci (2008a) therefore proposes to solve the dual problem given by
and subsequently recover the solution to the original/primal problem (5.1.1) by computing
q = x (λ⋆ , ν ⋆ ) .
A fast and stable Python implementation for solving the Entropy Pooling problem is freely available
in the open-source packages entropy-pooling and fortitudo.tech.
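For intuition, an equality-only version of this dual approach can be sketched in a few lines with scipy; this is a simplified illustration, not the optimized implementation in those packages:

```python
import numpy as np
from scipy.optimize import minimize

def entropy_pooling_eq(p, A, b):
    # Solve min_x x'(ln x - ln p) s.t. A x = b via the Lagrange dual.
    # A must include the sum-to-one row (a row of ones with b-entry 1).
    lnp = np.log(p)
    x_of = lambda nu: np.exp(lnp - 1.0 - A.T @ nu)
    dual = lambda nu: np.sum(x_of(nu)) + nu @ b   # negative of the dual objective
    grad = lambda nu: b - A @ x_of(nu)            # gradient = constraint violations
    res = minimize(dual, np.zeros(A.shape[0]), jac=grad, method="BFGS")
    return x_of(res.x)

# Hypothetical mean view: shift the posterior expected return to 10%.
rng = np.random.default_rng(1)
S = 5000
R_i = rng.normal(0.05, 0.2, size=S)   # simulated return scenarios
p = np.full(S, 1.0 / S)               # uniform prior probabilities
A = np.vstack([np.ones(S), R_i])
b = np.array([1.0, 0.10])
q = entropy_pooling_eq(p, A, b)
```

The posterior q then satisfies the view up to numerical tolerance while staying as close as possible to the uniform prior in relative entropy, and positivity holds automatically because of the exponential form of (the dual solution for) x.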
5.1.2 Common View Specifications and Ranking Views
This section presents views specifications that are commonly used in practice. In particular, mean,
volatility, skewness, kurtosis, and correlation views. The convenient feature of these views is also that
it is easy to implement ranking views once we understand how views on these parameters are specified
directly. The list of views in this chapter is by no means exhaustive. It is simply the views that I
have seen being used the most in practice. Once you understand how to specify these views and the
limitations the linear constraints impose, you are encouraged to experiment with other types of views.
Before we start writing out the views specifications, we start by defining classes of views. This will
probably seem abstract to you at first, but it is actually quite simple once you grasp it. Understanding
this view classification will be essential for understanding the Sequential Entropy Pooling refinement in
Section 5.2, which solves many of the issues imposed by the linear constraints requirement and usually
gives significantly better results.
We have already seen that some views can be naturally specified through linear constraints without
loss of generality, for example, views on the mean. However, views on variances require us to fix the
mean in order for us to be able to implement them as linear constraints on the posterior probabilities.
With this in mind, let C_i, i ∈ {0, 1, 2, . . .}, denote the class of parameters that require i other parameters
from some or all of the classes C_j, j = 0, 1, . . . , i − 1, to be fixed in order to be formulated as linear
constraints on the posterior probabilities.
Finally, let \bar{C} denote the class of parameters that can be formulated as linear constraints on the
posterior probabilities but do not belong to any C_i. Hence, C = \{C_0, C_1, C_2, \ldots, \bar{C}\} is the class of all
parameters that can be formulated as linear constraints on the posterior probabilities, with or without
fixing other parameters.
Many commonly interesting parameters can be characterized by the C_i classes. For example, as we
have already seen, means belong to C_0, and variances belong to C_1 (mean fixed). Below, we will see
that skewness and kurtosis belong to C_2 (mean and variance fixed), and correlations belong to C_4 (two
means and two variances fixed), while CVaR views belong to the residual class \bar{C}.
Looking at the linear constraints Gx ≤ h and Ax = b, we see that a particular Entropy Pooling
parameter view is formulated as

f(R)^T x \gtreqless c,

together with the probability constraint

\sum_{s=1}^{S} x_s = \iota^T x = 1,
which we implement through the equality constraints matrix A and vector b. As we saw from the
solution to EP problem (5.1.3), the positivity requirement for scenario probabilities x ≥ 0 will auto-
matically be satisfied, so we do not need to include it in our constraints.
As we have already seen, a view on the expected value of price, return, or factor i is the easiest to
implement as
RiT x ⪌ µ̃i .
Note that most other views require multiple constraints in order for them to be formulated as linear
constraints in a logically consistent way. For example, a view on the variance requires an equality view
on the mean in addition to a view on the variance using the mean view value, i.e.,

R_i^T x = \tilde{\mu}_i \quad \text{and} \quad \left( R_i^T \odot R_i^T \right) x - \tilde{\mu}_i^2 \gtreqless \tilde{\sigma}_i^2.
If we do not fix the mean with a view on the expected value i, there is no guarantee that we are
subtracting the right constant in the variance view. The interested reader is encouraged to try this out
and calculate the posterior variance to verify that it usually becomes incorrect without the mean view.
It is generally good practice to compute posterior statistics using the posterior probability vector q to
verify that views have been implemented correctly.
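Checking posterior statistics only requires probability-weighted moments, e.g. (illustrative helper, not from the accompanying package):

```python
import numpy as np

def posterior_mean_vol(R_i, q):
    # Probability-weighted mean and volatility of one simulation column R_i
    # under the posterior probability vector q.
    mu = R_i @ q
    var = (R_i**2) @ q - mu**2
    return mu, np.sqrt(var)
```

Running this on the columns with views after solving the EP problem is a cheap way to catch mis-specified constraint rows.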
With a good understanding of mean and variance views, we proceed to skewness, kurtosis, and
correlation views. These views require us to fix means and variances, so we will assume that these
have been fixed through the constraints

R_i^T x = \tilde{\mu}_i \quad \text{and} \quad \left( R_i^T \odot R_i^T \right) x - \tilde{\mu}_i^2 = \tilde{\sigma}_i^2,

for the relevant prices, returns, or factors i.
The astute reader might have recognized that views on parameters are simply specified as their
definition for discrete distributions, which the Monte Carlo market simulation R represents. We note
that we make no assumptions on the actual market distribution being discrete, but any samples R
from this distribution will be. It is up to the reader to assess what they think is a sufficiently good
approximation, but sample sizes of S = 10, 000 seem to be sufficient for most practical purposes, while
keeping the computation time unnoticeable. Readers who are interested in seeing a practical example
of all these view types being implemented are encouraged to see the case study of Vorobets (2021).
So how would we implement a ranking view? Simply by subtracting two view specifications from
each other. For example, for views on the expected value
R_i^T x - R_j^T x = \left( R_i - R_j \right)^T x \gtreqless 0.
It can often be convenient to multiply one of the terms by a scalar a and specify views such as
\left( R_i - a R_j \right)^T x \leq 0.
This allows us to implement views such as µi ≤ 2µj , i.e., that the expected value of i is at most a = 2
times larger than the expected value of j, assuming that µj > 0.
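In code, such a ranking view is just one extra constraint row (hypothetical helper, with R holding one scenario per row and one instrument per column):

```python
import numpy as np

def ranking_view_row(R, i, j, a=1.0):
    # Row implementing (R_i - a R_j)^T x <= 0,
    # to be appended to G with right-hand side 0 in h.
    return R[:, i] - a * R[:, j]
```

The same pattern applies to ranking views on other parameters: build both view rows and subtract one (possibly scaled) from the other.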
We will not write out the ranking views for variances, skewness, kurtosis, and correlations, but these
are also simply subtracting one view specification from another potentially scaled view specification of
the same parameter type. It is a good exercise for readers to try this out on their own and compute
the posterior values to see whether they have understood this concept or not. Ranking views can of
course also be specified across different parameters, for example, that the expected value of i is half
the variance of j. While this is a technical possibility, I have not seen it being done in practice yet.
With some practice, this section will hopefully give readers the understanding they need to explore
Entropy Pooling views on their own. Readers are generally encouraged to share their experience and
view ideas. Practically interesting use cases might be added to this section over time.
We can then simply add ai,α to the matrix G or A as well as 1 − α to the vectors h and b, depending
on whether the view is an inequality or equality view. Hence, for VaR views we simply identify the
scenarios where the losses are greater than the VaR view value \tilde{v} and assign them a total probability
of 1 − α. Since VaR views do not require any parameters to be fixed, VaR views belong to
the class C0 . Note that we use the convention that the VaR view value ṽ is specified as a loss, meaning
that an α-VaR value of 10% assigns 1 − α of the probability mass to the scenarios where the elements
of Ri are below −10%. We will use the same convention for CVaR views below.
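Constructing the VaR view constraint vector then amounts to an indicator of the tail scenarios (a hypothetical helper consistent with the loss convention above):

```python
import numpy as np

def var_view_row(R_i, v_tilde):
    # Indicator of scenarios with losses beyond the VaR view value v_tilde,
    # i.e., returns below -v_tilde; the constraint right-hand side is 1 - alpha.
    return (R_i < -v_tilde).astype(float)
```

Appending this row to G (or A) with right-hand side 1 − α implements the inequality (or equality) VaR view.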
CVaR views introduce an additional complexity when we do not have an equality VaR view, because
we do not a priori know the optimal number of scenarios below the VaR, which we can adjust the
probabilities of to achieve the desired CVaR view value \tilde{cv}. By the optimal number of scenarios below
the still undetermined VaR value \bar{v}, we naturally mean the number of scenarios that gives us the lowest
relative entropy while satisfying the CVaR view.
Once we have a fast and numerically stable algorithm to find the optimal number of scenarios below
the VaR, CVaR views become straightforward to implement simultaneously with an equality VaR view
implemented through the constraints a_{i,\alpha} in A and 1 − α in b, replacing \tilde{v} with \bar{v} when computing a_{i,\alpha}
using (5.1.4). In the presentation of CVaR views below, we will assume that this has been done, so
that the VaR is fixed to some value \bar{v}, which we did not have a view on.
Assuming that we know the number of scenarios below the α-VaR value v̄, α-CVaR views on return
i with a view value cv
˜ can be formulated as
To understand why this view combination gives us the desired $\alpha$-CVaR value, let us define $CV \subseteq \{1, 2, \ldots, S\}$ as the set of indices below the $\alpha$-VaR value $\bar{v}$. Hence, the sample CVaR is then given by
\[
E\left[R_i \mid R_i \le -\bar{v}\right] = \frac{\sum_{s \in CV} R_{s,i}\, x_s}{\sum_{s \in CV} x_s} = \frac{\left(R_i^T \odot a_{i,\alpha}\right) x}{a_{i,\alpha}\, x} \lesseqgtr \frac{-(1 - \alpha)\, \widetilde{cv}}{1 - \alpha} = -\widetilde{cv}.
\]
We note again that we define the $\alpha$-CVaR value as a loss, which is why we have a minus in front of the view value $\widetilde{cv}$.
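The sample CVaR above is easy to compute directly; the following minimal sketch (hypothetical helper name) evaluates the probability-weighted tail mean and flips the sign to match the book's loss convention:

```python
import numpy as np

def sample_cvar(R_i, q, var_value):
    """E[R_i | R_i <= -var_value] under scenario probabilities q,
    with the sign flipped so the result is reported as a loss."""
    tail = R_i <= -var_value          # scenarios in the set CV
    return -float(np.dot(R_i[tail], q[tail]) / np.sum(q[tail]))
```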
As we do not know the number of scenarios below the VaR value a priori, CVaR views belong to the residual class $\bar{C}$. If we have a VaR view, CVaR could in principle be considered a $C_1$ view, but as we will see in Section 5.2 this would impose unnecessary implicit constraints, so it is never recommended to classify CVaR views as belonging to $C_1$.
With CVaR views classified as $\bar{C}$, we can shed some more light on what intuitively happens when we compute the optimal number of scenarios below the VaR. Imagine a situation where we have implemented all views except the CVaR view for $i$, and that we do not have a VaR view for $i$. The brute-force solution is to loop through all possible configurations of scenarios below the VaR value, of which there are in principle $S$, and then select the solution with the lowest relative entropy that also satisfies the CVaR view. This is obviously a quite slow way of finding the optimal solution, so we must build algorithms that make good initial guesses and quickly identify the optimal number of scenarios without getting stuck or being too sensitive to the numerical issues that are inherent to practical implementations.
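The brute-force loop described above can be sketched as follows; `entropy_pooling` is a placeholder for a solver that returns posterior probabilities for given linear constraints (or `None` when infeasible), so this illustrates the search, not the book's actual algorithm:

```python
import numpy as np

def brute_force_tail(R_i, p, cvar_view, alpha, entropy_pooling):
    """Try every candidate number of scenarios below the VaR and keep the
    feasible solution with the lowest relative entropy to the prior p."""
    order = np.argsort(R_i)                  # worst scenarios first
    S = len(R_i)
    best_q, best_re = None, np.inf
    for k in range(1, S):
        tail = np.zeros(S)
        tail[order[:k]] = 1.0                # candidate scenarios below the VaR
        A = np.vstack([np.ones(S), tail, R_i * tail])
        b = np.array([1.0, 1.0 - alpha, -(1.0 - alpha) * cvar_view])
        q = entropy_pooling(p, A, b)         # None if the view is infeasible
        if q is None:
            continue
        re = float(np.sum(q * np.log(q / p)))
        if re < best_re:
            best_q, best_re = q, re
    return best_q, best_re
```

Practical implementations replace this exhaustive loop with the better initial guesses discussed in the text.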
To further understand the issues with developing algorithms for finding the optimal number of
77
scenarios below the VaR value, see Meucci, Ardia, and Keel (2011). Going into details with these
algorithms is beyond the scope of this book and would derail our focus. Interested readers can study
the original article by Meucci, Ardia, and Keel (2011) to see a mathematical formalization of how such
algorithms can be designed, while we note that their proposed algorithm for fully general Monte Carlo
simulations R is not stable enough in practice.
Although we do not go into further detail with general algorithms for implementing CVaR views, there is an example implementing a 50% increase in the 90%-VaR and 90%-CVaR for daily S&P 500 returns. This gives readers a practical example of how VaR and CVaR views are eventually implemented. We also assess the effect that the combined VaR and CVaR view for S&P 500 has on the STOXX 50 daily return distribution. Figure 5.1.1 shows the results. Readers are encouraged to examine the accompanying code to this section for more details and to see how the VaR and CVaR views are implemented and validated, to verify that they really understand the constraints above.
Figure 5.1.1: Daily return distributions including 90%-VaR and 90%-CVaR views for S&P 500.
As with almost all aspects of CVaR analysis for fully general distributions, it is significantly harder
to implement CVaR Entropy Pooling views. However, the analysis of fully general distributions and
their tail risks also gives us insights that are simply not possible to get using conventional methods based
on the mean-variance oversimplification of the market. When we stress-test and analyze scenarios of
historical or market simulations, we are really starting to look into the complex nuances of investment
markets. Our imagination truly becomes the most limiting factor. Hence, although this section has
given you some examples of practical use cases and view specifications, you should not be limited by
these. Once you understand the Entropy Pooling technology, you might get ideas that go well beyond
what is shown in this and the following sections.
A final interesting use case of CVaR views is to implement them directly on a portfolio’s return and
analyze how the marginal risk contributions change, both due to the higher standalone CVaR values
but also due to changes in diversification properties. See Figure 5.1.2 for the results of such an analysis
with a simple log-normal prior return simulation, where we have implemented a 50% increase in the
90%-CVaR of a portfolio containing 25% equities, 20% alternatives, and the rest in government and
corporate bonds. This example is generated using a proprietary implementation.
Figure 5.1.2: Portfolio CVaR view.
This fundamental hypothesis is also what practitioners generally experience, while there is no
guarantee that it is always true. It is perhaps also possible to generate simulations that specifically
exploit the design of the sequential heuristics, but for simulations that resemble real-world market
behavior and views on commonly interesting parameters, the assumption seems to hold most of the
time.
We now proceed to define the sequential heuristic algorithms, closely following some sections from
Vorobets (2021). A particular set of views $\mathcal{V}$ can be partitioned in a similar way to the view class $\mathcal{C}$, i.e., $\mathcal{V} = \{V_0, V_1, \ldots, V_I, \bar{V}\}$ with each $V_i$ being the set of views on parameters that belong to $C_i$, and $\bar{V}$ being the set of views on parameters that belong to $\bar{C}$. The main idea of the sequential heuristics is to process views according to this partition, carry forward the updated parameters $\theta_i$, $i = 0, 1, \ldots, I$, and use them to set fixed values when specifying the views in $V_j$, $j = i+1, i+2, \ldots, I$, and $\bar{V}$. More specifically, EP is sequentially applied to the sequence of views with increasing cardinality given by $\mathcal{V}^0 = \{V_0\}$, $\mathcal{V}^1 = \{V_0, V_1\}$, \ldots, $\mathcal{V} = \{V_0, V_1, \ldots, V_I, \bar{V}\}$. Note that the final set in this sequence contains all views $\mathcal{V}$, so the final posterior probabilities are guaranteed to satisfy all views, assuming of course that the views are feasible for the scenarios in $R$.
With the partitioning of the views established, the remaining question is which prior probability
to use in the sequential processing. There are two natural choices in this regard. One is to use
the original prior probability vector p, while the other is to use the updated posterior probabilities
q0 , q1 , ..., qI associated with the updated parameters θ0 , θ1 , ..., θI . This choice is exactly the difference
between the two heuristics. Algorithm 1 (H1) uses the original probability vector p in all iterations,
while Algorithm 2 (H2) uses the updated posterior probabilities q0 , q1 , ..., qI , except the first iteration
where p is used.
The two heuristics usually lead to similar final posterior probabilities q, but H1 is slightly better
when measured by the relative entropy, as each relative entropy minimization step is against the
original probability vector p, while H2 is usually slightly faster. Hence, H2 can be used if computation
time is a crucial factor, while H1 is recommended for all other purposes.
Before presenting the sequential heuristics, some additional definitions must be established. For convenience, we define $q_{-1} = p$ and $\theta_{-1} = \theta_{prior}$. By $EP\left(\mathcal{V}^i, \theta, r\right)$ we mean that the EP method presented in Section 5.1 is applied to the set of views $\mathcal{V}^i$, using the parameter values in $\theta$ as fixed values when necessary and $r \in \mathbb{R}^S$ as the prior probability vector. Finally, $f(R, r)$ denotes the function that computes updated parameter values.
The two sequential heuristics are given by Algorithm 1 (H1) and Algorithm 2 (H2) below.
Algorithm 5.1 (H1)
for $i \in \{0, 1, \ldots, I\}$
    if $V_i \ne \emptyset$, compute $q_i = EP\left(\mathcal{V}^i, \theta_{i-1}, p\right)$ and $\theta_i = f(R, q_i)$

Algorithm 5.2 (H2)
for $i \in \{0, 1, \ldots, I\}$
    if $V_i \ne \emptyset$, compute $q_i = EP\left(\mathcal{V}^i, \theta_{i-1}, q_{i-1}\right)$ and $\theta_i = f(R, q_i)$
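The loop shared by H1 and H2 can be sketched in code as follows; `ep` and `f` are placeholder callables for the Entropy Pooling solver and the parameter-update function, so this is an illustration of the control flow rather than the book's actual implementation:

```python
def sequential_ep(R, p, view_partition, theta_prior, ep, f, heuristic="H1"):
    """view_partition: [V_0, V_1, ..., V_I, V_bar], possibly with empty sets.
    ep(cumulative_views, theta, prior) -> posterior probability vector.
    f(R, q) -> updated parameter values theta.
    H1 anchors every EP step in the original prior p, while H2 continues
    from the latest intermediate posterior."""
    q, theta = p, theta_prior
    cumulative = []
    for V_i in view_partition:
        cumulative = cumulative + [V_i]
        if not V_i:                      # nothing to process in this class
            continue
        prior = p if heuristic == "H1" else q
        q = ep(list(cumulative), theta, prior)
        theta = f(R, q)
    return q
```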
In the last six years, during which I have worked with the Sequential Entropy Pooling heuristics quite extensively, there has only been one extreme instance where the original heuristic of always using the prior value has given a better result. In all other cases, the sequential heuristics have given significantly better results.
better results. Readers are encouraged to test this out in practice on their own data and compare the
performance gains through the relative entropy or effective number of scenarios as well as a visual
assessment of the view/stress-test.
Below is a very simple case study using the H1 heuristic on daily S&P 500 and STOXX 50 data.
The prior statistics are given in Table 5.1, while the posterior statistics for, respectively, the original
Entropy Pooling heuristic and H1 are given in Table 5.2 and Table 5.3. We implement views on
STOXX 50 expected return and the volatility of S&P 500. The relative effective number of scenarios is
85% for the original heuristic and 89.4% for H1. For a more advanced case study involving skewness,
kurtosis, and correlations views, see Vorobets (2021).
Table 5.1: Prior statistics for S&P 500 and STOXX 50 daily returns.
Table 5.2: Posterior statistics for S&P 500 and STOXX 50 daily returns with original EP heuristic.
Table 5.3: Posterior statistics for S&P 500 and STOXX 50 daily returns with H1 EP heuristic.
From the tables above, we clearly see the constraint that the original Entropy Pooling heuristic
imposes by fixing the mean of S&P 500, which is marked with an underscore in Table 5.2. Views are marked in bold. Figure 5.2.1 shows the prior and posterior distributions for both S&P 500 and
STOXX 50. A visual inspection reveals that the sequential heuristic H1 also gives results that look
nicer and more realistic, without sudden kinks in unexpected areas of the distribution.
Figure 5.2.1: Prior and posterior distributions for S&P 500 and STOXX 50.
To understand the reasoning behind the sequential algorithms, the idea for both is to only fix parameters when we absolutely have to according to the class definitions $\mathcal{C} = \{C_0, C_1, C_2, \ldots, \bar{C}\}$. Hence, we allow views for the classes $C_i$ to affect the parameters of the classes $C_j$, $i < j$, for as long as we can. The difference between H1 and H2 is then whether we anchor the sequential updates in the latest intermediate posterior probability $q_i$ or the prior $p$.
intermediate posterior probability qi or the prior p. You can think of H1 as looking into the future and
then taking the information back to the prior to look further into the future, while H2 takes a step in
the right direction and continues from there. As each iteration of H1 is against the prior probability, it is natural that it tends to give the best results, though the author currently cannot prove that this will always be the case. Readers are encouraged to explore the methods and share their experiences.
With the above principles in mind, it hopefully becomes clearer why we characterized CVaR as a parameter belonging to $\bar{C}$. There is simply no need for us to process it as a $C_1$ parameter. In the algorithms that we use to find the optimal number of scenarios below the VaR, we are quite clearly likely to get a lower relative entropy if we allow views before $\bar{C}$ to affect the final value of VaR before implementing the CVaR view.
The sequential algorithms not only give us better results with lower relative entropies and higher effective numbers of scenarios; they also give us distributions that look more realistic. See for example
the accompanying code to this section and another example in Figure 5.2.2 below. The example in
Figure 5.2.2 implements several multi-asset views and compares the result of the original Entropy
Pooling heuristic, which always fixes parameters to their prior values, and the H1 heuristic. These
computations are performed using a proprietary implementation, so the accompanying code is not
provided, but the views are exactly the same in the two cases. We simply see that the H1 heuristic
gives significantly better results and more realistic-looking distributions.
There are some additional important benefits to the sequential heuristics. For example, they allow
us to solve problems where we have a ranking view that must increase the expected value of a to make
it higher than the expected value of b, and then implement views on higher moments for one or both
of the assets. This is not possible with the original approach. It is also convenient that means are
updated automatically, for example, in the case where we have views on two assets’ returns and a view
on the variance of a basket of the two. See Vorobets (2021) for more perspectives.
Figure 5.2.2: DM and EM equity posterior distributions.
In the simplest case, you have a set of views V and some common confidence c in these views.
Hence, it is natural to allocate this confidence to the posterior probability vector $q_{\mathcal{V}}$ and the rest to the prior, so the final posterior becomes
\[
q = c\, q_{\mathcal{V}} + (1 - c)\, p.
\]
Note that we require that the total confidence sums to 1. We cannot logically be more than 100%
confident in our views.
What happens when we have multiple views with multiple confidences? For example, suppose that the yearly expected return of S&P 500 should be 10% with 70% confidence, and that the yearly expected return of STOXX 50 should be 12% with 90% confidence. What does that mean? Obviously, we cannot just allocate 70% confidence to one posterior probability vector $q_{\mathcal{V}_1}$ and 90% to another posterior probability vector $q_{\mathcal{V}_2}$. If we think more carefully about it, it means that with 70% confidence you believe that the expected return of S&P 500 should be 10% and the expected return of STOXX 50 should be 12%. With an additional 90% − 70% = 20% confidence, you believe that the expected return on STOXX 50 should be 12%. And finally, you allocate the last 1 − 90% = 10% to the prior. Ordering and partitioning views in this way is what we will define as view confidence.
Note that we use the notation $\mathcal{V}_i$ instead of $V_i$ to distinguish between partitioning of views according to the Sequential Entropy Pooling heuristics and view sets in general. To define how view confidence is handled in general, imagine that we have a set of views $\mathcal{V} = \{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_I\}$ and their associated confidences $c = \{c_1, c_2, \ldots, c_I\}$, ordered from lowest confidence to highest confidence, i.e., $c_i \le c_j$ for $i < j$. Let us then define $\bar{\mathcal{V}}_i = \{\mathcal{V}_j \mid j \ge i\}$, e.g., $\bar{\mathcal{V}}_1 = \mathcal{V} = \{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_I\}$ and $\bar{\mathcal{V}}_2 = \{\mathcal{V}_2, \mathcal{V}_3, \ldots, \mathcal{V}_I\}$.
Hence, the posterior probability vector with view confidences becomes
\[
q = \sum_{i=1}^{I} (c_i - c_{i-1})\, q_i + (1 - c_I)\, p, \qquad (5.3.1)
\]
where $q_i = EP\left(\bar{\mathcal{V}}_i, p\right)$ and $c_0 = 0$ for convenience. Note that $EP\left(\bar{\mathcal{V}}_i, p\right)$ can be Entropy Pooling with the original heuristic or one of the sequential heuristics H1 or H2 from Section 5.2. The general formula (5.3.1) might seem complicated at first, but it is the same principle as described in the simple case above with $c_1 = 70\%$ and $c_2 = 90\%$. It is a good exercise to verify that you can see this.
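Formula (5.3.1) is mechanically simple to apply; a minimal sketch, assuming the posteriors $q_i$ have already been computed with EP for the nested view sets:

```python
import numpy as np

def confidence_weighted_posterior(posteriors, confidences, p):
    """q = sum_i (c_i - c_{i-1}) q_i + (1 - c_I) p, with c_0 = 0 and the
    confidences ordered from lowest to highest."""
    c = np.concatenate(([0.0], np.asarray(confidences, dtype=float)))
    q = (1.0 - c[-1]) * np.asarray(p, dtype=float)
    for i, q_i in enumerate(posteriors, start=1):
        q = q + (c[i] - c[i - 1]) * np.asarray(q_i, dtype=float)
    return q
```

With $c_1 = 70\%$ and $c_2 = 90\%$, this reproduces the weights 0.7, 0.2, and 0.1 from the example above, and the result remains a valid probability vector.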
To conclude on view confidences, we again underline that we must have a set of logically consistent views.
For example, we cannot be 50% confident that a parameter is equal to 10 and 50% confident that it is
equal to 12. We can, however, believe that there is a 50% probability that the parameter is equal to
10 and a 50% probability that the parameter is equal to 12. This is assigning probabilities to states or weights to different users with conflicting views, which Entropy Pooling can also handle.
We can think of view confidences as the probabilities we assign within one user's views or one state, while the other probabilities and weights are assigned across states or users. This distinction will be important in Section 5.4 below, where we can define views with multiple confidences for each state and have conflicting view values across states. Handling of the probabilities assigned to users and states is quite simple; we just need to make sure that they are positive and sum to one. How we determine these probabilities and view confidences is, however, more complex, but we have full flexibility.
5.4 Causal and Predictive Market Views and Stress-Testing
This section presents the Causal and Predictive Market Views and Stress-Testing framework from
Vorobets (2023). Contrary to the article, you now have a better understanding of Entropy Pooling
view confidences and the state probability weighting of posterior probability vectors. The causal and
predictive framework is essentially a combination of Bayesian networks (BNs) and Entropy Pooling,
where the Bayesian network acts as a causal joint view generator that additionally produces the state
probabilities for each posterior probability vector through the joint view probabilities. EP is then
used in the usual way to project each of the joint views over the market simulation represented by the
matrix R. Hence, the framework naturally combines and leverages the strengths of the two methods.
The idea of using Bayesian networks for causal market analysis gained traction after the introduction by Rebonato and Denev (2011), while the idea of combining Bayesian networks with Entropy
Pooling was first introduced by Meucci (2012b). Contrary to the framework introduced by Meucci
(2012a), which discretizes the Monte Carlo simulation R into bins and applies EP to multivariate
histograms, the framework in this book does not impose such limitations. Instead, we work with the
fully general market representation (1.1.1).
Rebonato and Denev (2014) give a careful treatment of the thoughts behind building Bayesian
networks for investment analysis, which have inspired the basic use cases of the framework in this
book. However, the book’s framework can be used in many creative ways that go well beyond the
original perspectives, see Vorobets (2023) for several case studies. The presentation of the framework
will naturally draw quite heavily on Vorobets (2023), while hopefully being easier to understand given the deeper understanding of Entropy Pooling, view confidences, and state probability weighting.
Figure 5.4.1: Simple Bayesian network illustration.
social sciences. It is usually non-trivial to prove causality, and it is also the hardest part of estimating
BNs. Once the conditional dependencies are specified, estimating the probabilities of a discrete BN
is straightforward. In this framework, we suggest that you specify both the conditional dependencies
and probabilities for the BN. This gives a high level of flexibility, allowing you to hypothesize about
and impose future causal relationships that are not necessarily strongly present in historical data.
However, a BN where both the causal structure and probabilities are estimated based on historical
data will work just as well in this framework.
Specifying the probability tables of a discrete BN might seem like a daunting task at first, but it is usually easier than one expects if the task is approached in a structured way. The following four-step procedure works well in practice:
In all of these steps, it of course helps to have an elegant implementation of the BN technology to
keep track of and manage all the information. Especially in step 3, it is convenient to let the tables
be auto-generated. For step 4, it is also convenient to be able to specify probability ranges or leave probabilities unspecified, with some method helping you to estimate a probability based on the defined BN structure. A suggestion is to use maximum entropy for inferring missing values, as described by Corani and Campos (2015).
With a high-level understanding of BNs, the question remains how to use them for investment
analysis. The idea is well-described in Rebonato and Denev (2014), so it is only briefly summarized
here: we want the relevant assets, returns, or factors to correspond to the leaf nodes of the BN. For
example, for an asset allocation investor, the leaf nodes E, D, and F in Figure 5.4.1 could be variables
like real rate, inflation, and risk premium, while the other nodes are variables that causally affect the
distribution of the real rate, inflation, and risk premium, see the case study in Section 5.4.4 below.
Letting $X = (X_1, X_2, \ldots, X_N) \in \mathbb{N}^N$ denote the $N$-dimensional vector of random variables/nodes in a discrete BN, the joint probability can be computed using the well-known chain rule factorization
\[
P(X_1, X_2, \ldots, X_N) = \prod_{i=1}^{N} P\left(X_i \mid pa(X_i)\right),
\]
where $P\left(X_i \mid pa(X_i)\right)$ denotes the probability of $X_i$ conditional on its parents. Since root nodes do not have any parents, $P\left(X_i \mid pa(X_i)\right) = P(X_i)$.
Letting $LN \subseteq \{1, 2, \ldots, N\}$ denote the leaf node indices, the framework in this book mostly focuses on the joint distribution of the leaf nodes $X_i$, $i \in LN$, given by
\[
P\left(\{X_i \mid i \in LN\}\right) = \prod_{i \in LN} P\left(X_i \mid pa(X_i)\right). \qquad (5.4.1)
\]
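The chain rule factorization is easy to implement for a small hand-specified discrete BN; the following sketch uses hypothetical data structures (not a full BN library) to compute the joint probability of a full assignment:

```python
def joint_probability(assignment, parents, cpts):
    """P(X_1, ..., X_N) = prod_i P(X_i | pa(X_i)) for a discrete BN.

    assignment: dict node -> observed state.
    parents:    dict node -> tuple of parent node names (empty for roots).
    cpts:       dict node -> dict keyed by (parent_states, state) -> prob.
    """
    prob = 1.0
    for node, state in assignment.items():
        pa_states = tuple(assignment[pa] for pa in parents[node])
        prob *= cpts[node][(pa_states, state)]
    return prob
```

For a root node the parent tuple is empty, so its factor is simply the marginal probability, as stated above.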
5.4.3 Additional Perspectives
The framework is introduced with one variable for each leaf node, but it is actually much more flexible
than that because we can use fully general EP views for each of the leaf node states. The only limitation
is that a leaf node cannot contain views that contradict the views from another leaf node. For example,
you cannot have a state in one leaf node with a view that some variable should be equal to 10, while
you have a view in another leaf node that this variable should be equal to 20. You must implement
these views using the same node with two different states to ensure logical consistency in the joint
views. This is the only restriction.
It can be tempting to interpret the framework in the following way: the causality comes from the
BN, while the predictiveness comes from EP. This is a reasonable way to think about the framework,
but it is strictly speaking not correct. It is clear that the BN defines causal relationships, this is
one of its main features. However, there is also an element of predictiveness when one conditions on realizations of the variables, making some events more or less likely. Similarly with EP, where the
predictiveness aspect is clear, while there can also be causal elements in the market simulation, for
example, interest rates causing changes in bond prices.
It is important to realize that assuming the causal relationships in the BN introduces an EP view, even when we do not condition on realizations of the relevant variables. How much this view deviates from
the prior can be assessed by the relative entropy between the prior and unconditional posterior. If the
prior is uniform in scenario probabilities, the (relative) effective number of scenarios can also be used
to assess how concentrated scenario probabilities are.
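Both diagnostics mentioned here are one-liners; a sketch using the standard definitions of relative entropy and the effective number of scenarios:

```python
import numpy as np

def relative_entropy(q, p):
    """Kullback-Leibler divergence of the posterior q from the prior p."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(q * np.log(q / p)))

def effective_number_of_scenarios(q):
    """Exponential of the entropy of q; equals the number of scenarios S
    when q is uniform and decreases as probabilities concentrate."""
    q = np.asarray(q, dtype=float)
    return float(np.exp(-np.sum(q * np.log(q))))
```

Dividing the effective number of scenarios by $S$ gives the relative version used in the case studies.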
Although the BN defines a causal structure, it can also be used to answer non-causal questions. For
example, if one has defined a BN where a central bank’s decision depends on inflation and employment,
one can condition on the central bank being hawkish and compute the probability of high inflation or
low unemployment, i.e., answering the question of how the distribution of these variables should be in
order for the central bank to be hawkish. These results can then be compared to what is implied by
the market and used for taking positions.
A full implementation of the framework does not only have a good interface for BNs combined with
maximum entropy for estimating partially specified probabilities, but also an integration of SeqEP
presented in Section 5.2 with support for rank views, view confidences, and CVaR views. Building
such an implementation is a very daunting task that should probably only be attempted by the most
determined. However, once successful, the framework allows investment and risk managers to perform sophisticated market view and stress-testing analyses at a level that is well beyond current standards.
For case studies using the framework presented in this section, see Vorobets (2023). There is
a simple Bayesian network example available as open-source code using the fortitudo.tech Python
package. There is even a video walkthrough of the article and the accompanying code. This book will purposefully not replicate more from the article than what has already been done, except to present some perspectives on asset allocation cases in Section 5.4.4 below, which are interesting and easy to relate to for most people.
We conclude this section by stating that the framework is incredibly flexible and powerful in
the hands of skilled quantamental investment practitioners, who are usually very excited about its
possibilities.
5.4.4 Asset Allocation Case Study
A popular case that most investment managers can relate to is an analysis of the macroeconomy and
translation into portfolio P&L. Specifically, consider the Bayesian network given in Figure 5.4.2. This is a plain vanilla application of the framework, where each leaf node consists of a key macro risk factor.
Many asset allocation investors think about the risk in their portfolios from a perspective similar to
this one.
Figure 5.4.3 shows the result of a stagflation and rate hike stress-test using this Bayesian network for a low-risk portfolio and its tail risk hedge. The case uses a proprietary implementation and is
therefore not available in the accompanying code.
Figure 5.4.3: Portfolio and tail hedge prior and posterior P&L.
Chapter 6
Portfolio Optimization
This chapter focuses on portfolio optimization, in particular CVaR portfolio optimization for fully
general Monte Carlo simulations R and associated probability vectors p and q, i.e., the starting point
presented in (1.1.1). An elegant aspect of CVaR optimization in practice is that it operates directly on
the market simulations R and the associated probability vectors p and q. Hence, no matter how complex our simulations or market views and stress-tests are, CVaR will give us meaningful results.
Section 6.1 presents the portfolio management framework that we work with throughout the book,
first documented by Vorobets (2022a). By making a separation between relative exposures e and
relative market values v, we can handle derivative instruments as easily as plain vanilla cash instruments
such as stocks and bonds.
Section 6.2 compares CVaR optimization to the traditional variance optimization, showing how
CVaR optimization problems can be solved with linear programming and giving an understanding of
why CVaR optimization is much harder to implement in practice than variance optimization. However,
fast and stable algorithms that make CVaR optimization practically feasible exist.
Section 6.3 presents portfolio optimization problems with multiple risk targets, in particular a risk target for overall portfolio risk $R^{target}$ and a risk target for deviations from a benchmark portfolio $R^{TE}$. The joint optimization of these two risk targets introduces important trade-offs between the risk of the benchmark and the risk of the deviations from the benchmark. This analysis is initially
presented for variance optimization, because it makes it easy for us to build intuition by representing
diversification with the correlation, while the results are eventually generalized to CVaR.
Section 6.4 introduces parameter uncertainty into the portfolio optimization problem. We focus
in particular on resampled optimization with the perspectives introduced by Kristensen and Vorobets
(2024). The section introduces a new Exposure Stacking generalization, which is coined Resampled
Portfolio Stacking. While derivatives are easy to handle in general portfolio management, they introduce significant complexities when it comes to portfolio optimization with parameter uncertainty.
Conveniently, Entropy Pooling helps us solve these problems as explained by Vorobets (2024).
Section 6.5 introduces a new method for intelligent portfolio rebalancing, which is based on the
Resampled Portfolio Stacking perspectives and presented for the first time in this book. Portfolio
rebalancing is an essential part of portfolio management, but it is usually handled in an ad hoc manner
without a framework for thinking about the rebalancing problem. This book suggests an improvement.
6.1 Exposures and Relative Market Values
The most well-known portfolio in investment management is perhaps the long-only portfolio characterized by the constraints $\sum_{i=1}^{I} w_i = 1$ and $w_i \ge 0$, with $w_i$ representing the weight of the total portfolio value invested in asset $i$. There are no issues with this characterization for simple portfolios consisting of only cash instruments and in fact being long-only. However, modern portfolios are increasingly utilizing derivatives, making the traditional framework insufficient.
Not much attention has been given to generalizing the traditional framework to portfolios that invest
in derivative instruments. As a consequence, practitioners usually try to treat derivative portfolios in
the same way as portfolios investing in cash instruments only. A key quantity in this attempt is the
offsetting cash position, introduced to make everything sum to one by offsetting the leverage introduced
by the derivative instruments.
This book argues that although the offsetting cash approach might seem like a natural first guess
by trying to somehow replicate the derivative exposure, it is in fact unnecessarily complicated. It
is much easier and fairly straightforward to treat the derivative positions as they are by properly
separating exposure/notional and market value/price. The exposures are then used for all aspects
of portfolio management, while prices are used for elementary bookkeeping; see Vorobets (2022a) for more perspectives.
To understand the need for separating between exposure/notional and market value/price, consider
a futures contract. The futures contract has an initial market value of zero, but a positive or negative
exposure depending on the direction of the position. It follows immediately that it is meaningless to
use the market value of the futures contract as a measure of exposure, because it will appear as if the
contract has no effect on the portfolio’s future P&L distribution. Hence, it is important to realize that
it is the exposure that determines the portfolio’s future P&L distribution, while the market value is
used for elementary bookkeeping.
The usual self-financing constraint found in, e.g., portfolio optimization problems reads something like
\[
\sum_{i=1}^{I} w_i = \iota^T w = 1, \qquad (6.1.1)
\]
with $\iota$ being an $I$-dimensional vector of ones. With the above reasoning in mind, this book argues that the self-financing constraint should be more accurately expressed as
\[
\sum_{i=1}^{I} v_i e_i = v^T e = 1, \qquad (6.1.2)
\]
with v being an I-dimensional vector of relative market values (the value of the derivative instrument
with exposure/notional set to one), and e being an I-dimensional vector of exposures relative to
portfolio value.
For plain vanilla cash instruments like stocks and bonds, market value and exposure are the same. Hence, $v = \iota$ and $e = w$. This illustrates how (6.1.1) is in fact a special case of (6.1.2). Now imagine that we are allowed to invest in at-the-money forward European put and call options. The price of these options with one year to expiry, zero interest rate, zero dividends, implied volatility of 15%, and a notional of $E_i = 100$ is approximately $V_i = 5.98$, see the accompanying code to this section. So what is $v_i$ in this case? That is easily calculated as $v_i = \frac{V_i}{E_i} = \frac{5.98}{100} = 0.0598$.
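The option price quoted above can be reproduced with the Black-Scholes formula; for an at-the-money forward option with zero rates and dividends, the call and put values coincide and reduce to a single term. The helper names below are illustrative, not the accompanying code's actual implementation:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def atm_forward_option_value(notional, sigma, T):
    """Black-Scholes value of an ATM forward European call (= put by
    put-call parity) with zero rates and dividends:
    V = notional * (2 * Phi(sigma * sqrt(T) / 2) - 1)."""
    return notional * (2.0 * norm_cdf(0.5 * sigma * sqrt(T)) - 1.0)

E_i = 100.0
V_i = atm_forward_option_value(E_i, sigma=0.15, T=1.0)  # approximately 5.98
v_i = V_i / E_i                                          # approximately 0.0598
```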
The vector of relative market values $v$ makes it easy for us to calculate the value $V_{pf}$ of any portfolio characterized by the exposure vector $E \in \mathbb{R}^I$, i.e.,
\[
v^T E = V_{pf}. \qquad (6.1.3)
\]
\[
r_{PF} = \sum_{i=1}^{I} \Delta v_i\, e_i = \sum_{i=1}^{I} \frac{V_i - V_{i,0}}{E_{i,0}} \frac{E_{i,0}}{V_{pf}} = \frac{1}{V_{pf}} \sum_{i=1}^{I} \left(V_i - V_{i,0}\right). \qquad (6.1.5)
\]
Note that (6.1.5) is exactly how portfolio return is conventionally computed for cash portfolios. Hence,
instead of percentage returns and portfolio weights, we can use the relative P&L ∆vi and exposures
ei relative to portfolio value Vpf for portfolio optimization and risk decomposition.
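Equation (6.1.5) is straightforward to verify numerically; a sketch with hypothetical values, including a futures position whose initial market value is zero but whose exposure still drives P&L:

```python
import numpy as np

def portfolio_return(V, V0, E0, Vpf):
    """Portfolio return via (6.1.5): relative P&L times relative exposure,
    which collapses to total P&L over portfolio value."""
    delta_v = (np.asarray(V) - np.asarray(V0)) / np.asarray(E0)
    e = np.asarray(E0) / Vpf
    return float(np.dot(delta_v, e))

# A stock position and a futures position with zero initial market value.
V0 = np.array([100.0, 0.0])    # initial market values
V = np.array([105.0, 2.0])     # market values after the P&L
E0 = np.array([100.0, 50.0])   # exposures/notionals
r = portfolio_return(V, V0, E0, Vpf=100.0)
```

Note that the futures position contributes to the return even though its initial market value is zero, which is exactly the point of separating exposure from market value.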
For realistic optimization and rebalancing applications, the self-financing constraint (6.1.2) should include transaction costs that are payable at the time of the trade. In that case, (6.1.2) becomes
\[
v^T e + TC(e - e_0) = 1,
\]
where $TC(e - e_0)$ is the transaction cost function (implicitly normalized by portfolio value $V_{pf}$) for the turnover $e - e_0$, with $e_0 \in \mathbb{R}^I$ being the vector of initial exposures relative to portfolio value.
Using the traditional cash-based framework in (6.1.1), long-short portfolios are usually characterized by the constraint
\[
\sum_{i=1}^{I} w_i = 0.
\]
However, it is rarely the case (if ever) in practice that one is allowed to take margin free risk, e.g.,
borrow a stock to sell it in the market and use all of the proceeds to take a long position. Usually, the
portfolio must be collateralized in some way. The value of the collateral/margin is thus the value of the
portfolio, and the long-short constraint is therefore more correctly implemented by the requirement
that
Σ_{i∈LS} e_i = 0,
where LS ⊊ {1, 2, ..., I} is the subset of instruments that can be used for the long-short exposures.
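As a sketch (the instrument indices and exposures below are made up), the requirement is just one extra equality row over the long-short subset:

```python
import numpy as np

I = 6
LS = [3, 4, 5]           # hypothetical subset of long-short instruments
a_ls = np.zeros(I)
a_ls[LS] = 1.0           # equality row enforcing sum_{i in LS} e_i = 0

# An exposure vector where the collateral portfolio is fully funded
# and the long-short bucket nets to zero
e = np.array([0.40, 0.35, 0.25, 0.25, -0.125, -0.125])
print(a_ls @ e)          # the long-short exposures sum to zero
```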
a case study verifying this result. Hence, you lose nothing from using CVaR in the oversimplified
textbook case, while you gain a lot when it comes to real-world investment analysis.
If we use demeaned returns when computing CVaR, it becomes more comparable to other deviation
measures, e.g., variance and (lower semi-)absolute deviation. Since most portfolios in practice have a
positive expected return, it will also give us more conservative risk estimates, which in practical cases
is arguably better than underestimating the risk. Readers can freely decide what they believe is best
for their purposes, while a formal treatment of the differences between these two choices is given by
Rockafellar, Uryasev, and Zabarankin (2006).
Besides giving meaningful results for fully general Monte Carlo distributions (1.1.1) and focusing
on investment tail risk, what are some other benefits of using CVaR? First of all, contrary to VaR,
CVaR is a coherent risk measure, respecting the diversification principle in (1.3.1), which is arguably
a desirable feature for an investment risk measure, see Artzner et al. (1999). From a less technical
perspective, it becomes harder to hide significant risks below the VaR value because the mean is
sensitive to outliers. Hence, CVaR is steadily overtaking VaR to become the preferred investment tail
risk measure among both market makers and investment managers.
While all of the above should be sufficient for investment practitioners to stop using variance and start using CVaR, there is an additional important aspect to CVaR. It is much easier to interpret than
variance for investment clients and other nontechnical stakeholders. For example, if you ask a regular
person which risk they can tolerate as the expected loss in the worst year out of 10 (90%-CVaR), it is
probably much easier for them to answer this question than which level of expected squared deviations
from the mean they can tolerate.
As an investment practitioner, you might have developed some sense of what 5% and 10% yearly
volatility looks like, but regular people do not have this sense. The ease of interpretability is very
important for nontechnical people on boards and asset management clients. Finally, if you tell a regular
person that your portfolio optimization method focuses as much on minimizing positive deviations from
the mean as negative deviations, you will probably see someone who looks at you in a very strange
way and tells you not to do that.
It is mainly finance and economics academics, or people who otherwise feel reputationally invested
in the mean-variance system they have used for the past many years, who continue to defend mean-
variance analysis despite its obvious and severe deficiencies. One of the seemingly scientific arguments
that these people sometimes present is that “CVaR implies risk neutrality below the VaR value”. This
argument is based on a utility theory definition of risk aversion, which in itself is known to be a very
poor representation of how people actually behave.
That utility theory is a good representation of reality seems to be a rumor among finance and
economics academics, which they take for granted without being able to share any studies that actually
show this. For a careful treatment of the issues with utility theory, see D. Friedman et al. (2014).
Interestingly, D. Friedman et al. (2014) conclude that people seem to behave more according to a
linear utility function subject to constraints, which fully agrees with practical CVaR optimization and
analysis.
If utility theory was a good representation of how people actually behaved, almost anyone who
was presented with the CVaR risk measure should feel a deep sense of unease by the “risk neutrality
94
below the VaR”. However, people’s reactions tend to be quite the opposite: they feel that it
represents their preferences for avoiding losses quite well. If you tell them that minimizing the upside
is a byproduct of your optimization method, you will probably see them become very uneasy. Hence,
the reality is that mean-variance optimization does not represent what people actually want to do.
Interestingly, Markowitz (1952) already acknowledged this and argued that the focus should be on the
downside, but this was practically unthinkable with the technology that was available at the time.
Perhaps even more intriguing, Harry Markowitz said in an interview that he did not use mean-variance to manage his own portfolio. He used the 1/N heuristic. While we will rely on heuristics when
it comes to resampled portfolio optimization in Section 6.4, the methods and framework recommended
in this book are actually used to manage the author’s own money.
We will not delve more into utility theory or the attempts to justify the continued use of mean-
variance analysis. The author’s hypothesis is that it is mainly driven by reputational investments
in this theory and the ease of implementing mean-variance optimization compared to mean-CVaR
optimization. We will not rely on any aspects related to utility theory and simply note that CVaR
seems to be a good representation of the investment risk that people want to avoid. For a more detailed
comparison between variance and CVaR as investment risk measures, see Vorobets (2022b).
(VaR⋆, y⋆, e⋆) = argmin_{VaR, y, e}  VaR + 1/(1 − α) p^T y

subject to

y_s ≥ −R_s e − VaR   ∀s ∈ {1, 2, . . . , S},
y_s ≥ 0   ∀s ∈ {1, 2, . . . , S},
µ^T e ≥ µ_target,
v^T e = 1,
e ∈ E.
In the above, y = (y_1, y_2, . . . , y_S)^T is a vector of auxiliary variables, while R_s represents row s from
the Monte Carlo simulation R, and p is the associated joint probability vector from (1.1.1). Finally,
µ = RT p is the vector of expected relative P&L’s, while v is the vector of relative market values
introduced in Section 6.1. We use E to represent the set of linear (in)equality constraints on the
portfolio exposures e, forming a convex polyhedron.
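The accompanying solver is not reproduced here, but the linear program above can be sketched with scipy under illustrative assumptions (the simulated normal scenarios, long-only bounds, and return target are all made up for the example; v = ι since the toy instruments are cash):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, I, alpha = 1000, 4, 0.95
R = rng.multivariate_normal(
    mean=[0.05, 0.06, 0.03, 0.04],
    cov=np.diag([0.02, 0.03, 0.005, 0.01]), size=S)  # toy scenarios
p = np.full(S, 1.0 / S)
mu = R.T @ p
v = np.ones(I)                   # cash instruments: v = iota
mu_target = 0.9 * mu.max()       # feasible by construction

# Variable layout: x = [VaR, y_1..y_S, e_1..e_I]
c = np.concatenate(([1.0], p / (1.0 - alpha), np.zeros(I)))

# y_s >= -R_s e - VaR  rewritten as  -VaR - y_s - R_s e <= 0
A_ub = np.hstack([-np.ones((S, 1)), -np.eye(S), -R])
b_ub = np.zeros(S)
# mu' e >= mu_target  rewritten as  -mu' e <= -mu_target
A_ub = np.vstack([A_ub, np.concatenate(([0.0], np.zeros(S), -mu))])
b_ub = np.append(b_ub, -mu_target)

A_eq = np.concatenate(([0.0], np.zeros(S), v))[None, :]  # v' e = 1
b_eq = [1.0]

bounds = [(None, None)] + [(0.0, None)] * S + [(0.0, 1.0)] * I
res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
e_opt = res.x[-I:]               # res.fun is the minimized alpha-CVaR
```

Note how the S auxiliary variables and 2S extra constraints discussed below appear explicitly in the variable layout and in A_ub and the bounds.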
To focus on the essence of the problem, we do not include transaction costs, which require the introduction of additional auxiliary variables e_+ ≥ 0 and e_− ≥ 0, representing the buys and sells, the constraints e = e_0 + e_+ − e_− with e_+^T e_− = 0 (no simultaneous buying and selling of the same instrument), and an extension of the self-financing constraint to v^T e + TC(e − e_0) = 1 as explained in Section 6.1. We would also need to subtract the transaction costs from the expected return as explained by Krokhmal, Palmquist, and Uryasev (2002).
From the CVaR optimization problem, we immediately notice that it requires the introduction of
S auxiliary variables ys , for s = 1, 2, . . . , S, in addition to 2S extra constraints. Hence, even though
it can be formulated as a linear programming problem, it is potentially very high-dimensional and
introduces a trade-off between computation time and approximation quality due to the discretization
and linearization of the objective function. In practice, the original formulation of the CVaR optimiza-
tion problem is significantly slower and less stable than the traditional mean-variance optimization.
This fact is especially inconvenient as portfolio optimization in practice almost always needs to include
parameter uncertainty as presented in Section 6.4 below.
Using the original formulation, solving CVaR problems subject to CVaR constraints is as straightforward as solving problems with expected return constraints. Krokhmal, Palmquist, and Uryasev (2002) show that these problems simply require us to change the optimization objective to the expected return µ^T e and add the CVaR constraint

VaR + 1/(1 − α) p^T y ≤ CVaR_target,

while of course removing the expected return constraint from our initial formulation above.
Krokhmal, Palmquist, and Uryasev (2002) show that we solve equivalent problems in both cases,
so it does not matter which method we use for the same combination of risk and return. This is a
feature that we will use in Section 6.3.1 below. While solving the problem with a single CVaR target is
straightforward, albeit slow and potentially unstable, solving the problem with multiple CVaR targets
becomes much more complicated. Imagine, for example, that we had two CVaR targets; this would require 2S auxiliary variables and 4S constraints to solve the problem. Whatever issues we have with speed
and stability from just one CVaR target are guaranteed to be amplified in this case.
The most sophisticated investment managers usually want to solve CVaR optimization problems
with multiple risk constraints as presented in Section 6.3 below. Fortunately, there exist fast and
stable algorithms to solve CVaR problems. These are, however, hard to discover and very complex
to implement. Due to their proprietary nature, they cannot be shared in this book, while a fast and
semi-stable version of an algorithm solving the problem with a return target can be found in the
fortitudo.tech Python package.
It is left as an exercise for readers to really test out whether they understand the CVaR problem
formulation and solve a problem for a portfolio with a 5% return target and 25% individual upper
bounds on cash instruments as well as −50% to 50% bounds for derivative instruments, i.e., solving
the prior and posterior optimization problems from Vorobets (2022a). See the accompanying code to
this section, which gives you results for these problems using the fortitudo.tech package.
This is usually how most investment managers think about the risk and return of their portfolios,
while it can of course be partitioned in many other ways. To make it easier for us to analyze the main
ideas, we stick to the usual decomposition (6.3.1).
The return decomposition above implies that the portfolio variance can be expressed as

σ_PF^2 = σ_BM^2 + σ_TE^2 + 2 ρ_BM,TE σ_BM σ_TE.   (6.3.2)
A meaningful risk budgeting workflow is to initially set the benchmark/long-term portfolio (in many
cases it is externally given to the investment manager) and then risk budget using σP F and σT E . An
important but perhaps trivial observation is that once the benchmark risk σBM is fixed, there are only
two degrees of freedom among the remaining parameters σP F , σT E , and ρBM,T E .
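The identity (6.3.2) can be verified numerically on any pair of return series; the simulated series below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
r_bm = rng.normal(0.0, 0.10, n)                 # benchmark returns
r_te = 0.3 * r_bm + rng.normal(0.0, 0.02, n)    # correlated tracking error
r_pf = r_bm + r_te                              # portfolio return decomposition

s_bm, s_te = r_bm.std(), r_te.std()
rho = np.corrcoef(r_bm, r_te)[0, 1]
lhs = r_pf.var()
rhs = s_bm**2 + s_te**2 + 2 * rho * s_bm * s_te
print(abs(lhs - rhs) < 1e-12)  # True: (6.3.2) holds exactly in-sample
```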
Several interesting insights can be extracted from analyzing (6.3.2). For example:

1. The risk contribution from the tracking error portfolio is amplified by ρ_BM,TE since ∂σ_PF^2 / ∂σ_TE = 2σ_TE + 2ρ_BM,TE σ_BM,

2. A tracking error portfolio reduces the overall risk σ_PF when σ_TE^2 + 2ρ_BM,TE σ_BM σ_TE < 0 ⇔ σ_TE / σ_BM < −2ρ_BM,TE, which can only happen when ρ_BM,TE < 0.
The practical implication of 1. is that tracking error underestimation affects portfolio risk σP F more
adversely the higher the correlation ρBM,T E is, i.e., you are more exposed to risk overshoots due to
estimation error related to σT E , e.g., if σT E = 3% while you estimate σ̂T E = 2%. Figure 6.3.1 below
illustrates this for various correlations ρBM,T E and tracking errors σT E . This figure clearly illustrates
that the risk overshoots increase with the correlation. See the accompanying code to this section for
the details of how these computations have been performed, and think about how the graph will look
for ρBM,T E = 1. You can use the code to validate your intuition.
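Not the accompanying code, but a minimal sketch that reproduces the ceteris paribus overshoot computation (the correlation grid is illustrative):

```python
import numpy as np

s_bm, s_te_hat, s_te_true = 0.10, 0.02, 0.03   # parameters from the figure

def pf_vol(s_te: float, rho: float) -> float:
    # Portfolio volatility from the decomposition (6.3.2)
    return np.sqrt(s_bm**2 + s_te**2 + 2 * rho * s_bm * s_te)

for rho in [-0.5, 0.0, 0.5, 1.0]:
    # Overshoot: realized vol with true TE risk minus vol with estimated TE risk
    overshoot = pf_vol(s_te_true, rho) - pf_vol(s_te_hat, rho)
    print(f"rho = {rho:+.1f}: overshoot = {overshoot:.4%}")
```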
Figure 6.3.1: Risk overshoot for σBM = 10%, σ̂T E = 2%, and various ρBM,T E .
There is a slightly more subtle second-order effect from the fact that you are probably also more
likely to underestimate the benchmark risk σBM simultaneously with underestimating the tracking
error risk σT E if the two portfolios have a high correlation ρBM,T E . Figure 6.3.2 below repeats the
analysis from Figure 6.3.1 using Entropy Pooling to stress-test the tracking error σT E for various
correlation levels. We clearly see that the risk overshoot becomes significantly higher not only due to
the ceteris paribus effect from a higher tracking error than we have estimated, but also due to a higher
benchmark risk. Hence, maintaining a low correlation ρBM,T E between the benchmark and tracking
error portfolios is a very important part of skillful portfolio construction in practice.
The practical implication of 2. is that there is a trade-off between the standalone tracking error risk σ_TE and the correlation ρ_BM,TE, i.e., a lower correlation ρ_BM,TE allows for a higher standalone tracking error σ_TE without increasing the overall portfolio risk σ_PF. Hence, diversification can happen within and between the benchmark and tracking error portfolios, with ρ_BM,TE being a proxy for the degree of diversification between the two portfolios. Note also that the trade-off between σ_TE and ρ_BM,TE depends on the ratio σ_TE / σ_BM. This observation explains why the impact of FX hedging on overall portfolio risk is more significant for a low risk portfolio of short-term investment grade bonds than a high risk portfolio of equities.
Figure 6.3.2: Risk overshoot with EP for σ_BM ≈ 10%, σ̂_TE ≈ 2%, and various ρ_BM,TE.
To understand the FX hedging argument, let σ_BM represent the risk of US equities or investment grade bonds. Let us assume that σ_BM = 20% for equities and σ_BM = 5% for bonds. As a foreign EUR investor, you can decide whether to hedge the USD risk of these investments or not. In that case, the tracking error portfolio consists of EURUSD. Let us assume that its risk is σ_TE = 10%, which is the same no matter if we invest in equities or bonds. From the relation σ_TE^2 + 2ρ_BM,TE σ_BM σ_TE ≤ 0 ⇔ σ_TE / σ_BM ≤ −2ρ_BM,TE, we can see that for bonds we must require that ρ_BM,TE = −1 if the USD risk is not to increase the risk of the portfolio, while we only require ρ_BM,TE ≤ −0.25 for equities. It is quite obvious that we are more likely to have a correlation of ρ_BM,TE = −0.25 or lower than ρ_BM,TE = −1. Hence, FX affects the risk of low risk bond portfolios more than higher risk equity portfolios.
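The thresholds follow directly from rearranging σ_TE / σ_BM ≤ −2ρ_BM,TE to ρ_BM,TE ≤ −σ_TE / (2σ_BM), here with the assumed risk levels from above:

```python
s_te = 0.10                                  # EURUSD volatility assumption
for label, s_bm in [("IG bonds", 0.05), ("equities", 0.20)]:
    # Correlation must be at or below this level for the FX tracking
    # error not to increase total portfolio risk
    rho_max = -s_te / (2 * s_bm)
    print(f"{label}: rho_BM,TE <= {rho_max}")
```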
CVaR is a more complex measure of risk that in the general case does not follow nice relationships like (6.3.2). For CVaR, the best we can do to replicate the form of (6.3.2) is

CVaR(PF) = CVaR(BM) + CVaR(TE) + f(BM, TE),   (6.3.3)

where f(BM, TE) ≤ 0 is the diversification term. It follows that f(BM, TE) ≤ 0 from the subadditivity property of CVaR, i.e., CVaR(X + Y) ≤ CVaR(X) + CVaR(Y), see Artzner et al. (1999) and Rockafellar, Uryasev, and Zabarankin (2006).
Note that the analysis of the volatility tracking error σT E and the correlation between the tracking
error portfolio and benchmark ρBM,T E can be generalized to the CVaR risk measure. With the
portfolio and benchmark CVaR risks CV aR (P F ) and CV aR (BM ) fixed, there is again a trade-off
between the diversification between the benchmark and tracking error portfolio f (BM, T E) and the
standalone tracking error portfolio risk CV aR (T E). In summary, a better diversification represented
by a lower f (BM, T E) allows for a higher standalone risk CV aR (T E) for the tracking error portfolio.
For CVaR, the two terms are just more abstract and flexible than standard deviations and correlations.
Although it is only variance that we can decompose nicely into σBM , σT E , and ρBM,T E , the general
principles are likely to hold for more complex risk measures such as CVaR, because the decomposition
(6.3.3) will hold for any coherent risk measure R. Hence, we must not be limited by the convenience
of variance, because we pay a high price in terms of building our portfolios based on highly unrealistic
market assumptions in that case.
Let us generally define portfolio optimization problems with risk targets and tracking error/tactical/short-term risk budgets as

e⋆ = argmax_e µ^T e

subject to

R(e) ≤ R_target,
R(e − e_BM) ≤ R_TE,
v^T e = 1,
e ∈ E,

for some investment risk measure R, e.g., CVaR or variance. As we have seen in the analysis above, this formulation introduces an implicit constraint on the diversification term f(BM, TE).
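A sketch of this problem for R being volatility, using a general nonlinear solver; the expected returns, covariance, benchmark, and budget levels below are made-up illustrations, not the book's case study:

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.04, 0.06, 0.05, 0.03])         # hypothetical expected returns
A = np.array([[0.15, 0.00, 0.00, 0.00],
              [0.09, 0.12, 0.00, 0.00],
              [0.05, 0.04, 0.10, 0.00],
              [0.02, 0.01, 0.02, 0.05]])
cov = A @ A.T                                   # hypothetical covariance matrix
e_bm = np.full(4, 0.25)                         # benchmark exposures

risk_bm = np.sqrt(e_bm @ cov @ e_bm)
risk_target, te_target = 1.1 * risk_bm, 0.05    # R_target and R_TE budgets

cons = [
    {"type": "eq", "fun": lambda e: e.sum() - 1.0},                   # v'e = 1
    {"type": "ineq", "fun": lambda e: risk_target**2 - e @ cov @ e},  # R(e) <= R_target
    {"type": "ineq",
     "fun": lambda e: te_target**2 - (e - e_bm) @ cov @ (e - e_bm)},  # R(e - e_bm) <= R_TE
]
res = minimize(lambda e: -mu @ e, e_bm, method="SLSQP",
               bounds=[(0.0, 1.0)] * 4, constraints=cons)
e_opt = res.x
```

Starting from the benchmark exposures guarantees a feasible initial point, since the tracking error is then zero and the benchmark risk is within the target.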
The main takeaway from this section is that the two risk constraints introduce important trade-offs
between the benchmark risk and the tracking error risk. It is important that we constantly take these
dependencies into account and avoid significantly underestimating the risk in our portfolio as shown
in Figure 6.3.2. While it will be shown in Section 6.3.1 below how you can solve the risk budget
optimization problem for variance, the same analysis as well as the subsequent Entropy Pooling stress-
testing of the tracking error risk can be performed for CVaR. Readers are encouraged to test their
understanding using the CVaR formulation from Section 6.2.1 and the CVaR Entropy Pooling views
from Section 5.1.3 to perform this analysis.
A final comment is that we can of course also underestimate the risk of the benchmark. In the typical
case where the benchmark represents broad “market beta” exposure, a high correlation between the
benchmark and tracking error portfolios implies that the tracking error portfolio is simply more “market
beta”. It is almost equivalent to increasing the exposure of the futures position introduced in the first
paragraph of this section. If the tracking error portfolio is simply characterized by systematically
having more benchmark exposure, it should arguably become a part of the benchmark and not be
considered investment alpha.
both cases are given in the accompanying code to Section 6.2.1. Note that this is still a very basic use
case, while solving the problem with advanced constraints and transaction costs is more challenging.
                 Target return   Target risk   Target risk & TE   Benchmark
Gov & MBS                 0.00          0.00               0.00       10.00
Corp IG                   0.00          0.00               0.00       10.00
Corp HY                   0.00          0.00               0.00       10.00
EM Debt                  18.39         18.39              18.39       10.00
DM Equity                 0.00          0.00               0.00       10.00
EM Equity                 0.00          0.00               0.00       10.00
Private Equity            6.61          6.61               6.61       10.00
Infrastructure           25.00         25.00              25.00       10.00
Real Estate              25.00         25.00              25.00       10.00
Hedge Funds              25.00         25.00              25.00       10.00
Return target             5.00
Volatility                              6.69               6.69
Tracking error                                             3.18
The numbers in bold from Table 6.1 indicate which constraints were used to optimize the portfolio.
Hence, we started with a target return constraint, then a target risk constraint, and finally a target risk
and tracking error constraint. You can find all the details in the accompanying code to this section.
Note that in Table 6.1 above, we optimized a portfolio with a return target first and then took the
risk and tracking error targets as given. In practice, we would usually specify risk targets and tracking
errors and then take the optimal expected return as given. Tracking error constraints can in practice
contribute to alleviating the inherent portfolio optimization issues related to concentrated portfolios,
especially when combined with transaction costs, but they do not solve all the issues. Therefore,
portfolio optimization in practice must almost always take parameter uncertainty into account, which
we do in the next section.
portfolios with the same constraints and similar risk levels to compute a sample of optimal exposures
e⋆b ∈ RI , b ∈ {1, 2, . . . , B}. Afterwards, we compute a weighted average of the optimal exposures to
get the final resampled portfolio
e⋆_w = Σ_{b=1}^B w_b e⋆_b,   (6.4.1)

with w_b ≥ 0 and Σ_{b=1}^B w_b = 1.
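A sketch of the resampling loop for a simple unconstrained mean-variance estimator; the market parameters, sample size, and normalization are illustrative, not the book's case study:

```python
import numpy as np

rng = np.random.default_rng(4)
I, B, T = 4, 200, 60
mu_true = np.array([0.04, 0.06, 0.05, 0.03])
cov_true = np.diag([0.02, 0.04, 0.03, 0.01])    # toy market model

def mv_exposures(mu, cov):
    # Unconstrained mean-variance solution proportional to inv(cov) @ mu,
    # rescaled so that v'e = 1 for cash instruments (v = iota)
    e = np.linalg.solve(cov, mu)
    return e / e.sum()

e_stars = []
for _ in range(B):
    # Each sample plays the role of a resampled market model (R_b, p_b)
    sample = rng.multivariate_normal(mu_true, cov_true, size=T)
    e_stars.append(mv_exposures(sample.mean(axis=0), np.cov(sample.T)))
e_stars = np.array(e_stars)

w = np.full(B, 1.0 / B)            # the original equal-weight suggestion
e_resampled = w @ e_stars          # final resampled portfolio (6.4.1)
```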
The original suggestion by Michaud and Michaud (1998) is to set w_b = 1/B and align the portfolio risk through the portfolio’s index on the efficient frontier. A desirable aspect of the resampled approach is that it is highly flexible, although it initially did not have much justification. Fundamental perspectives justifying the resampled approach are presented by Kristensen and Vorobets (2024), who also introduce other ways of aligning the portfolio risk and a new method for determining the sample weights w_b.
To understand the new stacking methods, it is important to view the portfolio optimization problem
from the right perspective. We consider a risk-adjusted return objective f , which is a function of the
market model (R, p) with R ∈ RS×I being a matrix of joint market scenarios and p ∈ RS being an
associated scenario probability vector from (1.1.1). Additionally, f is a function of the optimization
constraints E and the portfolio exposures e ∈ RI . First, consider the case where f determines the
expected return, while constraints on the risk are included in E. The optimal portfolio exposures e⋆
are then given by
e⋆ = argmax_e f(R, p, E, e).
It is also possible to work from the perspective of minimizing risk with a return target. In such
cases, the optimal portfolio exposures are given by
e⋆ = argmin_e g(R, p, E, e),
where g is a function that determines the risk, while the return target is included in the constraints E.
As we noted in Section 6.3.1 above, the two perspectives solve equivalent risk-adjusted return problems.
Kristensen and Vorobets (2024) argue that the parameter uncertainty issue can be analyzed from the same perspective as statistical learning models using the bias-variance trade-off. Specifically, they treat the market model or parameters as data and the optimal portfolio exposures as estimates, with the particular mean-risk optimization method being the estimator. From this perspective, the
traditional mean-risk portfolio optimization estimators have no bias but a high variance due to their
sensitivity to the market model and parameter values. On the other hand, resampled versions of these
estimators have some bias but a lower variance.
To explore resampled portfolio optimization from a bias-variance perspective, we define the optimal
exposure for sample b ∈ {1, 2, . . . , B} by

e⋆_b = argmax_e f(R_b, p_b, E, e).

Note that in this definition, we are optimizing over samples (R_b, p_b) of the market model. For the
mean-variance objective, these are simply estimates of mean vectors µ_b and covariance matrices Σ_b. For the mean-CVaR objective, these can be entirely new joint market scenarios R_b and associated probability
vectors pb , with pb possibly generated using Sequential Entropy Pooling as described in Chapter 5. In
their case study, Kristensen and Vorobets (2024) show that it is usually the mean vectors that affect
the efficient exposures e⋆b the most.
How do we analyze the risk-adjusted objective from a bias-variance perspective? We must first
understand in which sense the traditional portfolio optimization methods such as mean-variance and
mean-CVaR are unbiased. We call a mean-risk estimator unbiased if it correctly estimates e⋆ , i.e.,
if it correctly estimates the optimal portfolio exposures given the market model (R, p). Clearly, the
resampled estimator (6.4.1) is not unbiased in the general case.
Hence, when we perform resampled optimization, we are purposefully introducing bias to reduce
the variance of the final optimal exposures estimate e⋆w . If we weight the sample estimates equally
as originally suggested, we are just hoping that the increase in bias will reduce the variance more.
Although this is also what people experience in practice, we do not have any guarantees that the
bias-variance trade-off is minimized by an equally weighted average, so it is worth exploring whether
we can do better.
Direct minimization of the vector mean squared error
MSE(e⋆_w) = E ‖ Σ_{b=1}^B w_b e⋆_b − e⋆ ‖_2^2
of the resampled estimator (6.4.1) is not interesting given that the solution will simply try to find a
linear combination of the vectors e⋆b which is as close as possible to e⋆ with respect to the Euclidean
norm ∥·∥2 . Instead, we focus on multivariate stacking objectives having a form similar to the vector
mean squared error.
Stacked generalization, see Wolpert (1992) and Breiman (1996), has received increased attention in
recent years due to its versatility and potential for improving out-of-sample performance. Hence, we
analyze the resampled portfolio optimization problem from this perspective. A natural, albeit slightly
naive, starting point for the objective function is to ensure that the Euclidean norm of the difference
between e⋆w and each e⋆b is, on average, as small as possible, i.e., solve the problem
w⋆ = argmin_w (1/B) Σ_{k=1}^B ‖ e⋆_k − Σ_{b=1}^B w_b e⋆_b ‖_2^2.   (6.4.2)
However, as shown in Appendix A of Kristensen and Vorobets (2024), the optimal solution to (6.4.2)
always yields the vector of equal sample weights w⋆_b = 1/B for all b ∈ {1, 2, . . . , B}, corresponding to the traditional resampling method.
Instead of (6.4.2), Kristensen and Vorobets (2024) define an objective function based on the ideas
of stacked regression and cross-validation. Let B = {1, 2, . . . , B} be the set of sample indices and
suppose that we partition B into L nonempty sets K1 , K2 , . . . , KL for some L ∈ {2, 3, . . . , B}. For any
choice of l ∈ {1, 2, . . . , L}, we consider the set of exposure vectors {ek | k ∈ Kl } as a validation set, find
sample weights wb for the remaining B − |Kl | exposure vectors, and calculate the average difference εl
between the weighted exposure and the validation set exposures, i.e., we compute
ε_l = (1/|K_l|) Σ_{k∈K_l} ‖ e⋆_k − Σ_{b∉K_l} w_b e⋆_b ‖_2^2.
If we repeat the analysis using each of the sets K_1, K_2, . . . , K_L as our validation set indices, we can compute the L-fold cross-validation estimate as the average ε = (1/L) Σ_{l=1}^L ε_l, see James et al. (2023).
Thus, instead of using the naive objective function (6.4.2), we wish to find the vector of sample weights
w⋆ that minimizes the cross-validation estimate ε, i.e., solve the problem
w⋆ = argmin_w (1/L) Σ_{l=1}^L (1/|K_l|) Σ_{k∈K_l} ‖ e⋆_k − Σ_{b∉K_l} w_b e⋆_b ‖_2^2.   (6.4.3)
Kristensen and Vorobets (2024) show in Appendix B that (6.4.3) can be formulated as a quadratic
programming problem and therefore solved in a fast and stable way. Note that (6.4.3) allows us to stack
exposures both across different market model samples (Rb , pb ) to incorporate parameter uncertainty as
well as different efficient portfolio estimators. For example, one could combine mean-variance, mean-
CVaR, or even mean-CVaR with different CVaR levels such as 90%, 95%, and 97.5%, see the case
study in Section 6.4.3.
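The quadratic programming reformulation from Appendix B of Kristensen and Vorobets (2024) is not reproduced here; the sketch below instead minimizes the cross-validation objective (6.4.3) directly with a general solver, using made-up sample exposures and fold split:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
B, I, L = 12, 4, 2
e_samples = rng.dirichlet(np.ones(I), size=B)    # stand-in optimal exposures e*_b
folds = np.array_split(np.arange(B), L)          # partition of the sample indices

def cv_objective(w):
    total = 0.0
    for K in folds:
        out = np.ones(B, dtype=bool)
        out[K] = False                           # b not in K_l
        blend = w[out] @ e_samples[out]          # weighted out-of-fold exposure
        total += np.mean([np.sum((e_samples[k] - blend) ** 2) for k in K])
    return total / L

res = minimize(cv_objective, np.full(B, 1.0 / B), method="SLSQP",
               bounds=[(0.0, 1.0)] * B,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
w_opt = res.x
e_stacked = w_opt @ e_samples                    # final resampled portfolio (6.4.1)
```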
The original suggestion to stack on portfolio exposures e⋆_b follows from the risk alignment
considerations from Section 2.1 in Kristensen and Vorobets (2024). The case study in the accompanying
code to this section confirms that the portfolios stacked on the portfolio mean or volatility indeed drift
quite significantly in relation to risk and return compared to the original resampled frontier. Table 6.2
shows the results for a repeated analysis from Kristensen and Vorobets (2024) with a different seed
and including portfolio mean, volatility, and mean / volatility stacked resampled portfolios for L = 2.
Table 6.2: Mean-variance optimal exposures for L = 2 as well as portfolio return and volatility.
From Table 6.2, we immediately note the significant drift in the risk and return of the resampled
portfolios that are stacked based on portfolio mean, volatility, and mean / volatility. It is also interesting that the portfolios based on mean and mean / volatility coincide in this case. The issues with drift
in risk and return properties seem to diminish once we set L = 10, see Table 6.3. Hence, the number
of folds parameter L acts as a hyperparameter for the resampled stacking methods. It seems to affect
the various objectives differently, where low values such as L = 2 work well for Exposure Stacking
while higher values such as L = 10 are required for stacking based on portfolio mean and volatility.
Table 6.3: Mean-variance optimal exposures for L = 10 as well as portfolio return and volatility.
For more details about the above computations, see the accompanying code to this section. Since
these methods are still very new, there is generally a need to explore them extensively to see when they work well. The same is true for the Resampled Portfolio Stacking methods that are introduced
for the first time in this book in the sections below. A Python function for solving (6.4.3) is provided
in the accompanying code, so it is easy to explore these methods further. The code also allows you
to change the L parameter and see the consequences. Finally, the code contains out-of-sample return,
risk, and risk-adjusted return distributions graphs similar to Kristensen and Vorobets (2024).
                  Resampled   Exposure   Return     Risk   Return/Risk   Frontier
Gov & MBS              0.12       0.00     0.00     0.00          0.00       0.00
Corp IG                0.00       0.00     0.00     0.00          0.00       0.00
Corp HY                0.00       0.00     0.00     0.00          0.00       0.00
EM Debt                4.88       0.00     0.00     1.36          0.43       0.00
DM Equity              0.48       0.00     0.00     0.00          0.00       0.00
EM Equity              0.07       0.00     0.00     0.00          0.00       0.00
Private Equity        18.07      13.92    21.90    24.90         18.42      20.41
Infrastructure        34.35      46.49    38.89    37.44         38.54      40.49
Real Estate           16.36      10.49    12.66     9.23         15.60      11.06
Hedge Funds           25.66      29.11    26.55    27.07         27.01      28.04
Mean                   6.34       6.25     6.81     7.03          6.50       6.71
Vol                    8.96       8.78     9.80    10.25          9.21       9.61

Table 6.4: Mean-variance optimal exposures for L = 2 with Exposure, Return, Risk, and Return / Risk Stacking.
Risk, and Return / Risk Stacking. It illustrates out-of-sample results similar to the case study from
Kristensen and Vorobets (2024) for L ∈ {2, 5, 20, B = 1000}. While this gives us many different
insights, there are still many other interesting analyses that can be performed for the new methods.
Readers are encouraged to explore these on their own and share their experiences. All the details for
the computations related to this section can be found in the accompanying code. The rest of this
section focuses on showing the simulation case study results.
Figure 6.4.2: Out-of-sample portfolio variance distribution for L = 2.
                  Resampled   Exposure   Return     Risk   Return/Risk   Frontier
Gov & MBS              0.12       0.00     0.01     0.00          0.00       0.00
Corp IG                0.00       0.00     0.00     0.00          0.00       0.00
Corp HY                0.00       0.00     0.00     0.00          0.00       0.00
EM Debt                4.88       1.27     5.21     1.85          2.25       0.00
DM Equity              0.48       0.00     0.52     0.00          0.00       0.00
EM Equity              0.07       0.00     0.06     0.00          0.00       0.00
Private Equity        18.07      17.76    18.26    20.33         16.19      20.41
Infrastructure        34.35      38.11    34.12    34.81         38.16      40.49
Real Estate           16.36      15.62    16.29    15.55         20.57      11.06
Hedge Funds           25.66      27.25    25.53    27.47         22.82      28.04
Mean                   6.34       6.42     6.35     6.59          6.26       6.71
Vol                    8.96       9.07     8.99     9.40          8.78       9.61

Table 6.5: Mean-variance optimal exposures for L = 5 with Exposure, Return, Risk, and Return / Risk Stacking.
Figure 6.4.5: Out-of-sample portfolio variance distribution for L = 5.
                  Resampled   Exposure   Return     Risk   Return/Risk   Frontier
Gov & MBS              0.12       0.00     0.00     0.00          0.00       0.00
Corp IG                0.00       0.00     0.00     0.00          0.00       0.00
Corp HY                0.00       0.00     0.00     0.00          0.00       0.00
EM Debt                4.88       4.21     5.18     5.59          3.92       0.00
DM Equity              0.48       0.00     0.48     0.41          0.04       0.00
EM Equity              0.07       0.00     0.01     0.01          0.00       0.00
Private Equity        18.07      18.06    17.99    18.10         14.83      20.41
Infrastructure        34.35      35.24    34.36    34.25         42.86      40.49
Real Estate           16.36      16.35    16.64    16.32         21.35      11.06
Hedge Funds           25.66      26.13    25.33    25.32         17.00      28.04
Mean                   6.34       6.36     6.33     6.33          6.16       6.71
Vol                    8.96       8.97     8.94     8.94          8.65       9.61

Table 6.6: Mean-variance optimal exposures for L = 20 with Exposure, Return, Risk, and Return / Risk Stacking.
Figure 6.4.8: Out-of-sample portfolio variance distribution for L = 20.
                  Resampled   Exposure   Return     Risk   Return/Risk   Frontier
Gov & MBS              0.12       0.12     0.00     0.00          0.03       0.00
Corp IG                0.00       0.00     0.00     0.00          0.00       0.00
Corp HY                0.00       0.00     0.00     0.00          0.00       0.00
EM Debt                4.88       4.86     4.41     4.32          3.46       0.00
DM Equity              0.48       0.47     0.49     0.49          0.45       0.00
EM Equity              0.07       0.11     0.07     0.08          0.02       0.00
Private Equity        18.07      18.07    19.00    18.22         17.57      20.41
Infrastructure        34.35      34.36    35.11    35.53         38.34      40.49
Real Estate           16.36      16.36    16.03    16.39         18.29      11.06
Hedge Funds           25.66      25.66    24.90    24.98         21.85      28.04
Mean                   6.34       6.34     6.44     6.38          6.36       6.71
Vol                    8.96       8.97     9.14     9.03          8.99       9.61

Table 6.7: Mean-variance optimal exposures for L = B = 1000 with Exposure, Return, Risk, and Return / Risk Stacking.
Figure 6.4.11: Out-of-sample portfolio variance distribution for L = B = 1000.
From the above figures and tables, we can conclude that L serves as a hyperparameter, and that
the different resampled portfolio optimization approaches react differently to this hyperparameter.
Stacking based on exposures seems to perform well for low values of L, while stacking based on risk
and return requires a higher L to avoid significant drift in portfolio risk and return. Once L
is increased towards the number of resamples B, all approaches converge to similar results with only
minor deviations.
It is important to underline that the case studies in this chapter still do not allow us to make definite
conclusions. The results might be affected by the particular normally distributed market simulation
as well as potential numerical instability in the estimation of the sample weights wb . Hence, the new
methods must still be carefully tested on practical cases to assess the magnitude of their potential
risk-adjusted gains. It might also be meaningful to combine the various objectives either into one
vector or through some form of weighting.
This section is concluded by coining all approaches that solve the objective in (6.4.3) with some resampled target x⋆b that is not necessarily the exposures e⋆b as Resampled Portfolio Stacking. The figures
below compare the out-of-sample results for L ∈ {2, 5, 20, B = 1000} and Return / Risk Stacking.
Figure 6.4.13: Out-of-sample portfolio return distribution for Return / Risk Stacking.
Figure 6.4.14: Out-of-sample portfolio variance distribution for Return / Risk Stacking.
Figure 6.4.15: Out-of-sample portfolio return / variance distribution for Return / Risk Stacking.
6.4.2 Derivatives and Risk Factor Parameter Uncertainty
As we have seen in Section 6.1, derivatives introduce only a simple extra layer of complexity in general
portfolio management that requires us to distinguish between relative market values v ∈ RI and relative
exposures e ∈ RI , see also Vorobets (2022a) for the original documentation.
For portfolio optimization with parameter uncertainty, derivatives introduce significantly more
complexity as we must ensure that the derivatives P&L is consistent with the parameter uncertainty
we introduce in the underlying and risk factors such as implied volatilities. For example, the expected
P&L of a European option at expiry is fully determined by the P&L of the underlying. Hence, we
cannot introduce separate parameter uncertainty into the underlying and the derivative instrument if
we want to maintain logical consistency.
While Resampled Portfolio Stacking allows us to introduce parameter uncertainty by generating
both new market simulations Rb and scenario probability vectors pb for b = 1, 2, . . . , B, we probably
want to avoid the costly simulation and pricing of derivatives associated with generating Rb for each
sample. Hence, we fix Rb = R for all samples and introduce parameter uncertainty into the derivative
instruments by adjusting pb with (Sequential) Entropy Pooling.
An elegant feature of the Entropy Pooling method is that it allows us to avoid the potentially
costly repricing of derivative instruments, see the case study in Vorobets (2022a). The special aspect
of derivatives is that they are inseparable from the underlying and the derivative’s other risk factors.
Hence, parameter uncertainty related to derivatives P&L should be introduced through parameter
uncertainty in the underlying and the derivative’s other risk factors. This is exactly what the approach
introduced by Vorobets (2024) does.
For each sample b = 1, 2, . . . , B, the algorithm is:
1. Introduce parameter uncertainty into the non-derivative instruments and risk factors.
2. Compute (Sequential) Entropy Pooling posterior probability vectors pb using the new parameters
as views.
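The two steps above can be sketched as follows, assuming a simple Gaussian simulation and a bare-bones Entropy Pooling solver restricted to equality views on the means. The helper `entropy_pooling_means` and all numbers are hypothetical stand-ins; the accompanying code uses a full Sequential Entropy Pooling implementation:

```python
import numpy as np
from scipy.optimize import minimize

def entropy_pooling_means(p, R, mu_view):
    """Minimal Entropy Pooling with equality views on the instrument means.

    Solves min_q KL(q || p) s.t. R.T @ q = mu_view and sum(q) = 1 through
    the standard dual formulation, where R is an (S, I) scenario matrix.
    """
    A = np.hstack([R, np.ones((R.shape[0], 1))])  # mean views + normalization
    b = np.append(mu_view, 1.0)

    def neg_dual(lam):
        x = A @ lam
        m = x.max()                               # numerical stabilization
        return m + np.log(p @ np.exp(x - m)) - lam @ b

    lam = minimize(neg_dual, np.zeros(A.shape[1]), method="BFGS").x
    q = p * np.exp(A @ lam)
    return q / q.sum()

rng = np.random.default_rng(0)
S, I = 10_000, 3
R = rng.multivariate_normal([0.03, 0.05, 0.04], 0.02 * np.eye(I), size=S)
p = np.full(S, 1 / S)                             # uniform prior probabilities

B = 50
posteriors = []
for _ in range(B):
    # 1. Parameter uncertainty in the non-derivative instruments and risk factors:
    mu_b = R.mean(axis=0) + rng.normal(0.0, 0.005, size=I)
    # 2. Entropy Pooling posterior p_b using the new parameters as views,
    #    keeping R fixed so derivatives never have to be repriced:
    posteriors.append(entropy_pooling_means(p, R, mu_b))
```

Because R is fixed across samples, derivative P&L remains consistent with the underlying in every sample b; only the scenario probability vectors change.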
6.4.3 Multiple CVaR Levels Case Study
This section uses the constraints and simulation from Vorobets (2022a) and Vorobets (2024) to illus-
trate how Exposure Stacking can be applied across different optimization methods as suggested by
Kristensen and Vorobets (2024). In particular, optimization is performed for 90%, 95%, and 97.5%-
CVaR for a multi-asset portfolio containing derivatives. The risk is aligned by selecting the middle
portfolio on the efficient frontier. The resampled efficient exposures are shown in Table 6.8 below.
Table 6.8: CVaR resampled efficient portfolios using 3-fold Exposure Stacking.
Readers can find all the details in the accompanying code and are encouraged to explore the results
further. We note that in most practical applications, investment managers will be constrained by a
particular α-CVaR with a target for the overall portfolio risk and tracking error risk as explained in
Section 6.3. However, it is meaningful to put CVaR risk limits across multiple α values, for example,
α ∈ {90%, 95%, 97.5%} as in this case study.
We note that for CVaR optimization, the case study has used a lower number of resamples B
simply because the problem takes longer to solve, even with a semi-fast and semi-stable algorithm,
compared to the traditional linear programming presented in Section 6.2.1. We also use the efficient
frontier to align the risk, because this is the only semi-fast CVaR implementation available to us. In
practice, it is advised to use a target risk Rtarget and target tracking error RTE problem formulation,
but fast and stable solutions to these problems for CVaR are not freely available.
An example using the target risk Rtarget and target tracking error RTE will be given for mean-
variance optimization in Section 6.5 below. We have already seen how to solve this problem using
second-order cone programming in the accompanying code to Section 6.3.1. The dual risk CVaR
optimization problem is hard to solve in a fast and stable way, which is crucial for resampled
portfolio optimization. It is nevertheless practically feasible, although the algorithms are highly
specialized and require very careful implementation.
6.4.4 Perspectives on Resampled Portfolio Stacking Targets
In the previous sections, we have used various stacking targets. To define them formally, we define the
stacking target x as
x = t (e, R, p) , (6.4.4)
where e ∈ RI are portfolio exposures, while R and p are from the market representation with fully
general Monte Carlo distributions and associated joint scenario probabilities from (1.1.1). The function
t (e, R, p) can be the marginal risk contributions or another appropriate target function. We use p in
(6.4.4), but the probability vector might as well be an Entropy Pooling posterior probability vector q.
Which target (6.4.4) is the most meaningful depends on the application. The reader has full
flexibility in terms of determining which target makes sense for them. It is generally recommended to
use targets that include the marginal risk contributions presented in Section 7.1, because these take into
account the differences in standalone risk as well as potentially complex diversification interactions.
We note that the risk contributions do not necessarily have to stem from the exposures e, they could
also be from risk factors.
The choice of stacking target (6.4.4) is by no means trivial and should be done with care. For
example, in Section 6.4.1 we have used x = µ ⊘ R′ (e), the ratio between marginal return and risk
contributions. The astute reader might have noticed that this is not mathematically defined when
the marginal risk ∂R(e)/∂ei is zero. We could still technically do it from a computational
perspective due to the way real numbers are stored in computer memory. Although it did not result
in any issues in our case studies, it might lead to issues in other cases.
Given the issues with the ratio between marginal risk and return contributions, it is worth consid-
ering whether the target that includes these quantities should be formulated as a combined vector of
the two or as the difference between the elements, see also Section 6.5. With these formulations, there
will be no conceptual mathematical issues or practical issues from dividing by a number that is very
close to zero. However, it of course introduces some implicit weighting, depending on the magnitude
of the marginal risk and return contributions.
When we optimize over multiple risk measures, or the same risk measure with different hyperpa-
rameters like in Section 6.4.3, it introduces some additional nuances to the stacking objective. While
we used Exposure Stacking in Section 6.4.3, we could in principle have used a combined vector con-
taining the marginal risk contributions to α-CVaR with α ∈ {90%, 95%, 97.5%}. The interested reader
is encouraged to perform this analysis. With multiple risk measures, we could also decide that there is
one which is more important to us than the others and use the marginal risk contributions from this
risk measure as the Resampled Portfolio Stacking target. The book’s general recommendation is to
stick to CVaR, because it has nice properties.
From the case study in Section 6.4.1, we noticed that the number of folds hyperparameter L affects
the various targets x differently. Hence, when using a new target it is especially important to examine
how it is affected by this parameter. It is generally recommended that L is determined by its out-of-
sample performance by evaluating the results on new samples as done in Section 6.4.1. Finally, we
underline that the Resampled Portfolio Stacking approach is still very new. Hence, the magnitude of
the risk-adjusted return gains must still be assessed by extensive practical use of the method, which
readers are encouraged to explore in creative ways.
6.5 Portfolio Rebalancing
Portfolio rebalancing is often approached in an ad hoc manner without a clear framework for thinking about it.
Having introduced Resampled Portfolio Stacking in Section 6.4.1, we have a natural way of measuring
similarities between portfolios with the Euclidean norm, which can be applied to any suitable target x⋆b .
For example, we can measure the distance between portfolios based on their marginal risk contributions
R′ (e) ⊙ e or a combined vector of the marginal return and risk contributions (µ ⊙ e, R′ (e) ⊙ e) ∈ R2I .
Note that we purposefully avoid using the ratio between the marginal returns and risks µ ⊘ R′ (e) due
to the potential issues with this target, presented in Section 6.4.4.
When we introduce market model or parameter uncertainty into the portfolio optimization problem,
we are generating a distribution of optimal portfolio exposures e⋆b for the particular constraints, risk
measure, and risk targets. We can use these sample exposures e⋆b to generate a distribution for a
Euclidean norm metric, allowing us to assess the distance between our current portfolio e0 and a
desired portfolio etarget that we are considering rebalancing towards.
The above reasoning is not entirely new as these thoughts were initially introduced by Michaud and
Michaud (1998). However, the original resampled approach is focused on mean-variance optimization
and the efficient frontier, aligning the portfolio risks through the index on the efficient frontier. The
method introduced in this section goes well beyond that and works for general risk measures R (e) as
well as portfolio optimization with a portfolio risk target Rtarget , a tracking error target RTE , or both.
Risk and return alignment through the index on the efficient frontier is of course also still possible.
The resampled rebalancing target (6.4.4) can simply be the exposures e, following the original
Exposure Stacking suggestion presented in Section 6.4. However, it can also be the marginal return
contributions µ ⊙ e, the marginal risk contributions R′ (e) ⊙ e, or a combined vector of the marginal
return and risk contributions (µ ⊙ e, R′ (e) ⊙ e) ∈ R2I as suggested in this section. All targets align
with the general Resampled Portfolio Stacking perspectives introduced in Section 6.4.1, while only the
last two follow the marginal risk recommendation from Section 6.4.4.
We formally define a resampled test statistic as

x̄0 = ∥xtarget − x0 ∥2 ,

with corresponding sample statistics x̄b = ∥xtarget − x⋆b ∥2 for b = 1, 2, . . . , B. If the proportion of samples with x̄b ≥ x̄0 is
low, for example at 5%, this indicates that our current exposures e0 are far from the target exposures
etarget . The above approach is coined Resampled Portfolio Rebalancing, and the proportion of x̄b < x̄0
the Resampled Rebalancing Probability.
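As a sketch, with hypothetical resampled targets x⋆b stacked as rows, the test statistic, p-value, and Resampled Rebalancing Probability can be computed as follows (all numbers are purely illustrative):

```python
import numpy as np

def rebalancing_test(x_target, x_current, x_resampled):
    """Euclidean distances of the current and resampled targets from x_target.

    Returns the test statistic, the proportion of x̄_b >= x̄_0 (the p-value),
    and the Resampled Rebalancing Probability (proportion of x̄_b < x̄_0).
    """
    x0_bar = np.linalg.norm(x_target - x_current)
    xb_bar = np.linalg.norm(x_resampled - x_target, axis=1)
    p_value = float(np.mean(xb_bar >= x0_bar))
    return x0_bar, p_value, 1.0 - p_value

rng = np.random.default_rng(7)
x_target = np.array([0.02, 0.05, 0.03])                  # e.g. risk contributions
x_resampled = x_target + rng.normal(0.0, 0.005, size=(1000, 3))
x_current = x_target + 0.02                              # far from the target
x0_bar, p_value, rebal_prob = rebalancing_test(x_target, x_current, x_resampled)
```

A low p-value indicates that the current portfolio lies outside the cloud of resampled optimal targets, supporting a rebalance towards the target portfolio.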
The convenient feature of the Euclidean norm is that it clearly measures the deviation from our
target in the sense that a higher Euclidean norm represents a larger deviation. If we measured the
deviations directly by, for example, the portfolio’s CVaR, which can be both negative and positive, we
would not have this uniformity in interpretation. Other measures of deviation can of course also be used,
while it is important that they preserve the uniformity in interpretation that allows us to do one-sided
tests. The original suggestion by Michaud and Michaud (1998) is to use the volatility tracking error,
but this is related to their focus on mean-variance optimization, which we are not constrained to. Note
also that we can operate with multiple targets xb and their joint distribution if we want.
Continuing along the lines of the analysis from Section 6.4.4, we focus on a combined vector of the
marginal return and risk contributions, i.e., x = (µ ⊙ e, R′ (e) ⊙ e). The accompanying code to this
section shows you how to perform the rebalancing test for mean-variance optimization, because it is
easy and fast with an implementation of the dual risk objective already available from Section 6.3.1.
In practice, it is recommended not to reduce the market model to a mean vector and covariance
matrix, and to optimize over CVaR instead, as argued throughout the book and in Section 6.2.
We use the portfolio from Table 6.1 in Section 6.3.1 as the current allocation e0 , while using a target
portfolio etarget with an overall volatility of 7% and a tracking error of 2% to the equally weighted
benchmark ebm . We use B = 1,000 resamples and N = 100 samples to introduce uncertainty into the
means. See Table 6.9 for an overview of the portfolios that we work with.
Figure 6.5.1 shows the sampling histogram of x̄b as well as the test statistic x̄0 . We note that this
depends on the target xb , the risk measure R, the portfolio exposure constraints E, and the market
model/parameter uncertainty introduced by the samples. When we include the marginal risks in the
target x, we take into account the different diversification effects present in the portfolios. Hence,
the new rebalancing framework is very flexible for assessing the distance between the target exposures
etarget and the current exposures e0 , and it works for fully general parameter uncertainty.
Figure 6.5.1: Sampling distribution for x̄b and the test statistic x̄0 .
The histogram in Figure 6.5.1 is a convenient way of visualizing how the Resampled Portfolio
Rebalancing test is performed. In practice, it is sufficient to look at the proportion where x̄b ≥ x̄0 . We
call this proportion the Resampled Portfolio Rebalancing p-value. In this particular case study, the
p-value is 1.1%, while the rebalancing probability is 98.9%, see the accompanying code to this section
for all computation details.
While we have so far focused on the investable portfolio exposures e, there is nothing preventing
us from analyzing the portfolio’s risk and return from a risk factor perspective. Hence, we can apply
the Resampled Portfolio Rebalancing approach to risk factor contributions. This, however, requires
that we are able to perform such a risk factor decomposition of the portfolio’s risk and return. More
perspectives on this will be given in Chapter 7 below.
Chapter 7
This chapter presents fundamental methods and perspectives for general risk and return analysis.
While these methods are frequently used by practitioners, they are often ignored by academics, who
seem to think that portfolio construction is mainly about estimating covariance matrices and perform-
ing mean-variance optimization.
As mentioned in the previous Chapter 6 about portfolio optimization, many practitioners are in
fact so skeptical about portfolio optimization that they do not perform it explicitly and mostly rely on
a risk allocation exercise using the marginal contributions to risk as presented in Section 7.1. The risk
allocation exercise can naturally be combined with the views and stress-testing methods presented in
Chapter 5. As we have seen in Section 6.3, the Sequential Entropy Pooling method can also naturally
be used to stress-test the diversification assumptions for such risk budgeting exercises.
While many investment managers claim to only do risk budgeting, the reality is that expected
return assumptions have a tendency to sneak into the allocation in implicit ways. So the
problem of estimating expected returns is not ignored in practice; these estimates are just rarely fed
into a mean-variance optimizer and used for actual investment management. As discussed throughout
this book, mean-variance has many shortcomings that quickly show up and lead to undesired outcomes
in practice. It is hard to separate practitioners’ skepticism towards mean-variance from the skepticism
towards portfolio optimization in general. As Chapter 6 shows, there exist practically feasible solutions
for handling fully general parameter uncertainty issues, and practitioners frequently use some variant
of the resampled portfolio optimization approach.
Risk parity approaches that distribute the risk contribution evenly from the individual investments
or some factor representation have been promoted a lot in recent years. However, the implicit as-
sumption is that the marginal return contributions also happen to be the same for each instrument.
This is probably rarely true in reality, and many practitioners are not even able to implement risk
parity portfolios given their investment constraints. Hence, risk parity is often more theoretical than
practical. If you have a portfolio containing very complex investment strategies with strong alpha
signals where the expected return estimation and market modeling might be extremely challenging,
it might make sense to allocate roughly the same amount of risk to each strategy. In all other cases,
it is recommended to not ignore the expected returns and perform portfolio optimization including
parameter uncertainty as presented in Chapter 6.
7.1 Marginal Risk and Return Contributions
Marginal risks are defined as the gradient vector of the investment risk measure R with respect to the
individual exposures e, i.e.,

R′ (e) = (∂R(e)/∂e1 , ∂R(e)/∂e2 , . . . , ∂R(e)/∂eI )T ∈ RI .
For investment risk measures that are homogeneous of degree one, it holds that

R (e) = ∑_{i=1}^{I} (∂R(e)/∂ei ) ei = R′ (e)T e, (7.1.1)
see Meucci (2007). That the risk measure is homogeneous of degree one simply means that if we double
all the individual exposures, the risk will also double. This will hold for the investment risk measures
that we consider, in particular CVaR and variance.
We call the elements

R′ (e) ⊙ e = ((∂R(e)/∂ei ) ei )i ∈ RI (7.1.2)
the marginal risk contributions. It is this quantity that we use in the Risk Stacking approach from
Section 6.4.1.
For variance (more precisely, for the volatility √(eT Σe), which is homogeneous of degree one), the
marginal risks can be computed simply as

R′ (e) = Σe / √(eT Σe).
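As a quick sanity check of this formula and the Euler decomposition (7.1.1), we can verify numerically that the volatility marginal risk contributions sum to the portfolio volatility (the covariance matrix below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + np.eye(3)          # an arbitrary positive-definite covariance
e = np.array([0.5, 0.3, 0.2])

risk = np.sqrt(e @ Sigma @ e)        # portfolio volatility R(e)
marginal = Sigma @ e / risk          # marginal risks: Sigma e / sqrt(e' Sigma e)
contributions = marginal * e         # marginal risk contributions R'(e) ⊙ e
# By (7.1.1), contributions.sum() equals risk, since volatility is
# homogeneous of degree one.
```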
There is also a slightly more complicated formula for general CVaR marginal risks based on the Monte
Carlo market simulation R and associated probability vectors p and q from (1.1.1), but it is generally
fine to compute the partial derivatives numerically, i.e.,

R′ (e)i ≈ (R (ei,0 + ∆) − R (ei,0 )) / ∆
for some small ∆ > 0, e.g., ∆ = 0.000001. A convenient feature of the numerical approach is that you
only have to know how to compute the risk of a portfolio. You do not have to know how to derive
the analytical marginal risk contribution formula for that particular risk measure. It also ensures
consistency in the way the risk is computed, e.g., when it comes to interpolation methods for VaR and
CVaR.
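A sketch of the numerical approach for CVaR, on a hypothetical Monte Carlo simulation; a simple discrete tail-average CVaR is used and interpolation details are omitted:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical scenario P&L for I = 3 instruments:
cov = np.array([[0.0400, 0.0100, 0.0000],
                [0.0100, 0.0900, 0.0200],
                [0.0000, 0.0200, 0.0225]])
R_sim = rng.multivariate_normal([0.03, 0.06, 0.04], cov, size=100_000)

def cvar(e, R=R_sim, alpha=0.95):
    """alpha-CVaR as the average loss over the worst (1 - alpha) scenarios."""
    pnl = R @ e
    k = int(np.ceil((1 - alpha) * len(pnl)))
    return -np.mean(np.partition(pnl, k)[:k])

e = np.array([0.4, 0.4, 0.2])
delta = 1e-6
base = cvar(e)
marginal = np.array([(cvar(e + delta * np.eye(3)[i]) - base) / delta
                     for i in range(3)])
contributions = marginal * e          # marginal CVaR contributions (7.1.2)
# Since CVaR is homogeneous of degree one, contributions.sum() ≈ cvar(e).
```

Note that only the portfolio risk function itself is needed; no analytical gradient formula has to be derived.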
Marginal relative returns are simply given by the mean vector µ ∈ RI , and the marginal return
contributions by

µ ⊙ e = (µi ei )i ∈ RI .
It should be easy to convince yourself that this is indeed the case. In a rebalancing or portfolio
optimization application with transaction costs, we probably want to account for the transaction cost
as well when computing the marginal return contributions of the portfolios we are trading towards.
It might also be interesting to analyze the marginal risk-adjusted returns defined as
(µ ⊙ e) ⊘ (R′ (e) ⊙ e) = µ ⊘ R′ (e) ∈ RI ,
where ⊘ denotes the element-wise Hadamard division. We note that the marginal risk-adjusted returns
do not follow the decomposition from (7.1.1), and that they are only defined for the elements where
∂R(e)/∂ei ̸= 0 for i ∈ {1, 2, . . . , I}. It is worth computing this quantity to assess if the ratio between
expected marginal return and risk contribution is particularly attractive for some exposure i. It is
also interesting to assess these ratios for risk parity investors to evaluate whether they have included
sources of risk in their portfolio with a very low contribution to the expected return.
Another example of how marginal risk contribution analysis can be combined with the views and
stress-testing from Chapter 5 is a case where we implement a CVaR stress-test for developed market
equities instead of the portfolio, as we do in Section 5.1.3, and assess its effect on the marginal risk
contributions. The case study in this section uses a proprietary implementation, but it is a good
exercise for readers to try to replicate it with an Entropy Pooling CVaR stress-test, a computation of
the portfolio CVaR, and a CVaR risk decomposition as in equation (7.1.2).
Figure 7.1.1 shows the stress-tests, where we have increased the 90%-CVaR of developed market
equities by 50%. Figure 7.1.1 also illustrates the effect of the CVaR stress-test on corporate high-
yield bonds. We note that Entropy Pooling unsurprisingly predicts more tail risk for corporate high
yield as a consequence of the increased equity tail risk. Table 7.1 shows the portfolio that we compute
the marginal risk contributions (7.1.2) for as well as their posterior and prior values. From Table 7.1,
we note that many interesting interaction effects happen. For example, the 90%-CVaR contribution
from equities increases significantly more than 50%. This effect stems from the fact that other assets
with a positive correlation with equities also have an increase in CVaR, which makes the DM equity
allocation even less diversifying. The important point here is that the marginal risk contributions
(7.1.2) include several complex interaction effects. Figure 7.1.2 visualizes them.
Figure 7.1.1: Joint distribution for DM Equity and Corp HY after a 90%-CVaR stress-test.
7.2 Market Views vs Stress-Tests
The terms market views and stress-tests are frequently used by investment practitioners, although there
is usually no precise definition. We can loosely define views as minor adjustments to the market model,
while stress-tests are specific adverse scenarios that we want to examine. While both are meaningful
to analyze from a (marginal) risk and return perspective as presented in Section 7.1, stress-tests are
probably what we want to build tail risk hedges for as presented in Section 7.3.
In this section, we will see some examples of what we can rightfully call market views, and what
should be characterized as a stress-test. We keep the market simulation very simple with a log-normal
simulation for equities and bonds, and introduce a stress-test for the classical 60/40 portfolio.
The stress-test example will also illustrate a case where the Sequential Entropy Pooling algorithms
from Section 5.2 are capable of solving practically interesting problems that the original suggestion
from Meucci (2008a) of always using the prior value when necessary cannot.
We make the following assumption for the log-return of bonds and equities:

µ = (0.04, 0.10)T and Σ = [[0.1² , −0.3 · 0.1 · 0.2], [• , 0.2² ]],

where • denotes the symmetric entry, i.e., bond and equity log-return volatilities of 10% and 20% with a correlation of −0.3.
Figure 7.2.1 shows a joint plot for this prior log-normal distribution, see the accompanying code for
how the simulation and plot were created.
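Under the stated assumptions, the prior simulation can be sketched as follows. The scenario count, seed, and weight ordering are assumptions; the 60/40 portfolio's 90%-CVaR computed from such a simulation should land close to the 11.68% that the stress-test uses, up to sampling variation:

```python
import numpy as np

# Prior assumptions for (bond, equity) log-returns stated above:
mu = np.array([0.04, 0.10])
Sigma = np.array([[0.10**2, -0.3 * 0.10 * 0.20],
                  [-0.3 * 0.10 * 0.20, 0.20**2]])

rng = np.random.default_rng(3)
log_ret = rng.multivariate_normal(mu, Sigma, size=100_000)
lin_ret = np.exp(log_ret) - 1                    # linear returns (log-normal prior)
p = np.full(len(lin_ret), 1 / len(lin_ret))      # uniform scenario probabilities

# 90%-CVaR of the 60/40 equity/bond portfolio (bond weight 0.4, equity 0.6):
pf = lin_ret @ np.array([0.4, 0.6])
k = len(pf) // 10                                # worst 10% of scenarios
cvar_90 = -np.mean(np.sort(pf)[:k])
```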
We first implement a view on equities, setting the expected return to 7.5% and the volatility to
27.5%. This can rightfully be called a market view, because the adjustment is not that significant,
which is evident by a low relative entropy of 7.08% and a high effective number of scenarios of 93.16%,
see equation (5.1.2). You can see how these views have been implemented and the statistics computed
in the accompanying code to this section.
Next, we implement an interesting stress-test using Sequential Entropy Pooling from Section 5.2.
We first calculate the 90%-CVaR of the 60/40 equity/bond portfolio to be 11.68%. We implement this
value as the portfolio's expected return through a C0 view. After the first step, we update the means and
volatilities of bonds and equities and finally implement the C4 view that the correlation should be equal
to 30%.
Sequential Entropy Pooling is needed for the stress-test because we must update the means and
volatilities of bonds and equities before we implement the correlation view. If we kept them at their
old values when implementing the correlation view, this would lead to logical inconsistencies or, at the
very least, very different volatilities in the correlation view. Hence, this practically relevant tail risk
stress-test is not possible to implement with the original Entropy Pooling heuristic that simply fixes
parameters to their prior values when necessary.
Figure 7.2.2 shows the joint prior and posterior distribution for the stress-test view. So, this is a
case where we are in the left tail of our portfolio’s return distribution and diversification fails, i.e., a
tail risk scenario if there ever was one. Perspectives on how to analyze and work with tail risk hedging
are given in Section 7.3 below. Figure 7.2.2 is generated using a proprietary implementation, but
readers are encouraged to replicate the plot in the accompanying code. Some initial code for the prior
distribution is provided as a starting point.
Figure 7.2.2: Prior and posterior distribution for the equity and bond stress-test.
7.3 Tail Risk Hedging
As Section 2.3 about the volatility risk premium shows, outright tail risk hedging using instruments
that are guaranteed to introduce downside convexity, such as options, is usually expensive. This is
due to investors’ risk aversion to significant loss scenarios. The counterparties that provide such loss
protection also have an aversion to these events and therefore require a compensation beyond the
break-even value for providing us with such downside convex payoffs. Hence, a strategic allocation to
put options on your portfolio will probably be a significant performance drag. If that is not the case
for your particular portfolio, you can go ahead and outright hedge the downside strategically, but it is
unlikely to hold true in reality. If you happen to be particularly worried about the downside over some
short period of time, you can of course tactically consider equipping your portfolio with an outright
tail risk hedge, being aware that this is a very challenging timing task.
Due to the volatility risk premium, many investors combine a generally well-diversified portfo-
lio with strategies that probably provide some downside convexity, for example, investing in trend-
following strategies, as introduced in Section 4.4. It is important to note that these are not guaranteed
to provide a positive payoff during a market sell-off, so the hedge is statistical and indirect in this case.
Another approach to tail risk hedging, which this book generally recommends, is to consider the
scenarios where we are in the left tail of the portfolio’s return distribution and diversification fails,
as in the last example of Section 7.2. There can be many such scenarios, and they are unique to
individual portfolios. Hence, providing general rules for how to identify them is challenging. There
might, however, exist general mathematical methods for how to isolate the diversification factors in
a portfolio and subsequently stress-test these directly using Sequential Entropy Pooling. Investment
managers usually have a good sense of which diversification fail scenarios they are worried about, so
these can readily be implemented using Sequential Entropy Pooling like in Section 7.2.
If we are only worried about one tail risk scenario, perhaps what we believe is the most severe
one, we focus on building a tail risk hedge for this particular scenario. We can then use a state probability
c ∈ [0, 1] as presented in Section 5.3 to combine the tail risk scenario probability vector qtail with
the prior probability p or some other base case posterior probability qbase . This gives us a posterior
probability q = cqtail + (1 − c) qbase that we can use for the final optimization of the portfolio including
an appropriately sized tail risk hedge strategy.
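As a tiny illustration of the blending step (the tail posterior below is a made-up reweighting, not an actual Entropy Pooling output):

```python
import numpy as np

S = 10_000
p = np.full(S, 1 / S)                      # base-case probabilities q_base
q_tail = p * np.linspace(2.0, 0.0, S)      # hypothetical tail-stress posterior
q_tail = q_tail / q_tail.sum()

c = 0.2                                    # state probability of the tail scenario
q = c * q_tail + (1 - c) * p               # posterior used for final optimization
```

The blend q remains a valid probability vector for any c ∈ [0, 1], so it can be fed directly into the portfolio optimization.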
If we are worried about N different tail risk scenarios, we can conveniently use Bayesian networks
as presented in Section 5.4 to keep track of the state probabilities of each of these tail risk scenarios
cn ∈ [0, 1] for n ∈ {1, 2, . . . , N } with ∑_{n=1}^{N} cn = 1. In this case, we create one node called “tail risk
scenario” and assign probabilities to the N different scenarios. It is up to you as portfolio manager to
figure out how to set these tail risk scenario probabilities, while you have full flexibility.
It is probably hard to come up with good rules for determining the tail risk scenarios in general,
but if you want to introduce more structure for determining the tail risk scenario probabilities, you
can add the key risk factors on top of the tail risk scenario node, similar to the market case example
in Vorobets (2023). Figure 7.3.1 illustrates such a Bayesian network where real rates, inflation, and
growth affect the tail risk scenario probability. This allows us to also condition on the realization of
key risk factors and assess how it affects our tail risk hedging strategy. We can then use the final qtail
posterior probability vector to build the joint tail risk strategy.
Figure 7.3.1: Bayesian net with real rate, inflation, and growth affecting tail risk scenario probability.
In summary, the objective of tail risk hedging is to get as much downside convexity as possible
for the lowest price. In relation to evaluating the performance of the tail risk hedging, a put option
on the portfolio with some out-of-the-money strike is a natural benchmark. This put option probably
does not trade in the market, so you either have to price it yourself or get a counterparty to quote
a price. The put option is likely an upper bound on the downside convexity that you can expect
from a good tail risk hedging strategy, while it should also be an upper bound on the performance
drag that you experience from the tail risk hedge. If your tail risk hedging strategy ends up costing
more than the put option with a worse payoff in the tail risk scenarios, then this is obviously a bad
outcome, especially considering that the tail risk hedging strategy probably requires a lot more careful
investment engineering than an outright put option position.
A final perspective on tail risk hedging is that if you struggle to find good hedges that bring your
level of tail risk to a satisfactory level after you have built a generally well-diversified portfolio, you
should probably consider setting the strategic risk level lower for your portfolio. This might be a better
and easier way to achieve an increased risk-adjusted return at a satisfactory level of risk. Some people
talk about the perspective that the tail risk hedging allows you to take on more risk in the rest of the
portfolio. While this can perhaps be done in a clever way, it is important to avoid simply canceling
the tail risk hedge by implementing close to opposite trades in another part of your portfolio. In the
end, such an approach will probably just accumulate higher trading costs, while remaining more or less
the same from a risk factor exposure perspective. Hence, meaningful tail risk hedging should focus on
achieving downside convexity as effectively and cheaply as possible, while the outright risk reduction
can be considered on a tactical basis.
Chapter 8
This final chapter summarizes the key points of the book as well as essential parts of the new investment
framework. It makes it clear how the new framework and methods are upgrades to the current
mainstream standard with (co)variance-based analysis and optimization. For each mainstream method,
including covariance-based risk, the Black-Litterman model, and mean-variance, the book suggests an
improvement. As explained in Section 8.1, the new approach is strictly better from a logical perspective.
It is, however, much harder to implement a production-quality version of it, but it is practically feasible
with current technology.
While the book has used mean-variance to illustrate investment concepts and principles, it has hope-
fully made it clear that mean-variance grossly oversimplifies the portfolio construction problem. Hence,
mean-variance should not be used to manage portfolios in practice. Underestimating the marginal tail
risks of instrument P&L distributions and assuming that dependencies are cross-sectionally constant
and linear is a clear recipe for disaster. If you use this approach, reality will eventually catch up to
you and reveal the excessive tail risks that your portfolios have probably been exposed to.
A natural question is: if the new investment framework is strictly better, why is everyone not using
it already? There are multiple reasons for this. The first is that most people still do not
know about it or do not have a sufficiently good understanding of the methods to use them. Then come the
very significant implementation complexities. Finally, there are more subtle reasons related to human
nature with many commercial and reputational interests formed around covariance matrix estimation,
Black-Litterman, and mean-variance. While these methods are simple enough for most people to
understand, that very ease of use also eliminates any potential for portfolio construction alpha. The
old methods are simply too easy for mom-and-pop investors to replicate.
People with commercial or reputational vested interests in the old methods do not want you to
know about better alternatives. Some of them proactively engage in trying to limit your knowledge
about the framework and methods presented in this book. A final subtle nuance is that some people
simply do not like to admit that what they have been doing for a long time is in fact not very beneficial.
Hence, there is significant nonscientific resistance and a bias to maintain the status quo.
If you do not directly benefit from maintaining the status quo, it is in your best interest to transition
to methods that are going to help you get a better risk-adjusted return. In the long run, it is in the
best interest of everyone to transition to something better than the old approach. Remember that
investment markets are relentless feedback machines in the sense that they will simply let you know
whether you managed your investments in a good or bad way. You will not even know why you did
good or bad. You might have been lucky, or you might have been skillful. However, to increase your
probability of being successful you have to infer reality well and build portfolios in a clever way based
on this reality. This is what this book tries to help you do.
are handled in a natural probabilistic way as explained in Section 5.3. Entropy Pooling even has an
analytical solution in the normally distributed case, see Meucci (2008a). So once again, there is no
reason to continue using the old BL model when we have access to the new Sequential Entropy Pooling
method.
In summary, there are no logical reasons to continue using BL and mean-variance when Entropy
Pooling and mean-CVaR implementations are practically feasible. The continued justification of the
old methods is based on irrational arguments or commercial and reputational vested interests. As an
ambitious investment manager, you must not let these biases affect your investment performance.
probabilities for a very large number of variables S. When we use Sequential Entropy Pooling to
formulate various views and stress-tests, convexity is guaranteed by the linear constraints, and solutions
can be found in a fast and stable way.
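To make the convexity point concrete, here is a minimal sketch of the basic one-shot Entropy Pooling problem with equality views, solved through its smooth convex dual: the posterior minimizes Kullback-Leibler divergence to the prior subject to linear expectation constraints, and the posterior probabilities take an exponential-tilting form in the dual variables. This is an illustrative toy, not the Sequential Entropy Pooling implementation discussed in the book, and the view numbers are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize


def entropy_pooling(p, A, b):
    """Posterior q minimizing KL(q || p) subject to A @ q = b.

    p: prior scenario probabilities (S,), A: (K, S) view matrix, b: (K,) targets.
    Solves the unconstrained convex dual in K variables, then normalizes.
    """
    def neg_dual(lam):
        log_w = np.log(p) + lam @ A                  # log of unnormalized posterior
        return np.logaddexp.reduce(log_w) - lam @ b  # smooth and convex in lam

    res = minimize(neg_dual, x0=np.zeros(len(b)), method="BFGS")
    log_w = np.log(p) + res.x @ A
    w = np.exp(log_w - log_w.max())                  # stable exponentiation
    return w / w.sum()


# Toy example: 5 equally likely scenarios for one variable x with prior mean 0;
# impose the view that E[x] = 0.5.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
p = np.full(5, 0.2)
q = entropy_pooling(p, A=x[None, :], b=np.array([0.5]))
print(q)  # posterior probabilities tilt toward larger x
```

Because the dual is smooth and convex with one variable per view, the problem stays cheap even when the number of scenarios S is very large.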
Another interesting Entropy Pooling research area is to generalize the method to operate on fully
general multi-period market simulations as presented in Chapter 3. Meucci and Nicolosi (2016) have
initiated work on what they call Dynamic Entropy Pooling, but they unfortunately focus on just
Ornstein-Uhlenbeck processes, which suffer from the issues presented in Section 3.2.3 when it comes to
realistic market simulation. Hence, more research needs to be done to generalize the Entropy Pooling
method for general multi-period market simulations R_h ∈ R^{S×I}, h ∈ {1, 2, . . . , H}.
It is of course also interesting to extend the CVaR analysis to multiple periods, while the risk
budgeting exercise from Section 6.3 is a practically good heuristic. It is important to note that this book
is multi-period in its market simulation approach, because we obviously live in a multi-period world
and want to be able to calculate the cumulative P&L of dynamic strategies. Few investment managers
contest that. However, investment managers’ appetite for multi-period optimization seems to be quite
low. The standard is still one-period analysis with mean-variance, while some introduce the risk
budgeting with tracking error constraints from Section 6.3 to handle signals with various horizons in a
heuristic way. The value of multi-period optimization is probably greater for high-frequency investors,
who usually also have a better understanding of their trading costs and market impact. For most
other investment managers, one-period optimization with risk budgets and proportional transaction
cost estimates seems to be preferred. A focus on fully general distributions and CVaR tail risks is a
major upgrade to mean-variance in that case. The reservations about multi-period optimization stem from
investment managers’ skepticism about forecasting paths for expected returns and transaction costs, which
might exaggerate the issues with mean uncertainty presented in Section 6.4.
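For readers who want to see what such a one-period setup looks like in code, below is a minimal scenario-based mean-CVaR sketch using the Rockafellar and Uryasev (2000) linear programming formulation. The scenario matrix, probabilities, confidence level, and return target are toy placeholders, and a production implementation would follow the book's full framework rather than this bare-bones LP.

```python
import numpy as np
from scipy.optimize import linprog


def mean_cvar_weights(R, p, alpha=0.95, target=0.0):
    """Minimize portfolio loss CVaR_alpha subject to an expected-return floor,
    long-only and fully invested, via the Rockafellar-Uryasev LP.

    R: (S, I) scenario return matrix, p: (S,) scenario probabilities.
    Decision variables: [w_1..w_I, zeta, u_1..u_S], where zeta plays the role
    of VaR and u_s are the scenario loss excesses over zeta.
    """
    S, I = R.shape
    c = np.concatenate([np.zeros(I), [1.0], p / (1.0 - alpha)])
    # u_s >= -r_s @ w - zeta  <=>  -R w - zeta - u <= 0
    A_ub = np.hstack([-R, -np.ones((S, 1)), -np.eye(S)])
    b_ub = np.zeros(S)
    # expected return floor: -mu @ w <= -target
    mu = p @ R
    A_ub = np.vstack([A_ub, np.concatenate([-mu, [0.0], np.zeros(S)])])
    b_ub = np.append(b_ub, -target)
    A_eq = np.concatenate([np.ones(I), [0.0], np.zeros(S)])[None, :]
    bounds = [(0, None)] * I + [(None, None)] + [(0, None)] * S
    res = linprog(c, A_ub, b_ub, A_eq, [1.0], bounds=bounds)
    return res.x[:I], res.fun  # optimal weights and the loss CVaR


# Toy example: two assets, four equally likely scenarios.
R = np.array([[0.05, 0.01], [-0.10, 0.02], [0.08, -0.01], [0.02, 0.01]])
p = np.full(4, 0.25)
w, cvar = mean_cvar_weights(R, p, alpha=0.75, target=0.01)
print(w, cvar)
```

Because the problem is a linear program over the scenarios, fully general (non-normal, nonlinearly dependent) P&L distributions are handled without any parametric assumption, which is precisely the upgrade over mean-variance described above.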
As stated in Section 3.2.2 about generative machine learning for investment market simulation,
this field is still in its infancy with many opportunities for significant improvements. The author
believes that in the same way that we have large language models (LLMs), we will have large market
models (LMMs) in the future. While investment market generation is a very different field from text
generation, the fundamental modeling approach can be similar, although the data preprocessing and
architectures must necessarily differ. The author’s hypothesis is that a lot of transfer learning can happen in
the cross-section, e.g., that we can learn something about the dynamics of the US government bond
curve by also training the model on the dynamics of the German government bond curve. Indicator
variables that classify the type of data might also be useful in this context. Finally, it would be
interesting to examine a combination of generative machine learning methods from Section 3.2.2 and
the Fully Flexible Resampling method introduced in Section 3.2.1, potentially getting the best of both
worlds in terms of capturing very complex time series and cross-sectional dependencies.
In summary, while the framework from this book significantly improves on many of the core issues
with the current investment risk and analysis standard, there are still many interesting research topics
to explore. Not only does this research still need to be done, but these topics are in many cases very
broad and, hence, outside the scope of this book. This book already contains many new results and
perspectives for portfolio construction and risk management, which can be explored in many creative
ways. Research-oriented readers are encouraged to work on solving these problems.
Bibliography
Artzner, P. et al. (1999). “Coherent Measures of Risk”. Mathematical Finance 9.3, pp. 203–228.
Black, F. (1976). “The Pricing of Commodity Contracts”. Journal of Financial Economics, pp. 167–
179.
Black, F. and M. Scholes (1973). “The Pricing of Options and Corporate Liabilities”. Journal of Political
Economy 81.3.
Boyd, Stephen and Lieven Vandenberghe (2004). Convex Optimization. Cambridge University Press.
Breiman, Leo (1996). “Stacked Regressions”. Machine Learning 24, pp. 49–64.
Caticha, A. and A. Giffin (2006). “Updating Probabilities”. Bayesian Inference and Maximum Entropy
Methods in Science and Engineering Conference.
Corani, G. and C. de Campos (2015). “A Maximum Entropy Approach to Learn Bayesian Networks
from Incomplete Data”. Interdisciplinary Bayesian Statistics. Springer Proceedings in Mathematics
& Statistics. url: https://doi.org/10.1007/978-3-319-12454-4_6.
Fama, E. F. and K. R. French (1992). “The Cross-Section of Expected Stock Returns”. Journal of
Finance, American Finance Association 47.2, pp. 427–465.
Friedman, D. et al. (2014). Risky Curves: On the Empirical Failure of Expected Utility. Routledge.
Goodfellow, Ian J. et al. (2014). “Generative Adversarial Networks”. arXiv. url: https://arxiv.org/abs/1406.2661.
Hamilton, James D. (1994). Time Series Analysis. Princeton University Press.
Hastie, T., R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning. Springer.
Hull, J. (2021). Options, Futures, and Other Derivatives. 11th ed. Pearson Education Limited.
James, Gareth et al. (2023). An Introduction to Statistical Learning with Applications in Python. Springer.
Kingma, D. P. and M. Welling (2013). “Auto-Encoding Variational Bayes”. arXiv. url: https://arxiv.org/abs/1312.6114.
Kingma, D. P. and M. Welling (2019). “An Introduction to Variational Autoencoders”. Foundations
and Trends in Machine Learning.
Kristensen, L. and A. Vorobets (2024). “Portfolio Optimization and Parameter Uncertainty”. SSRN.
url: https://ssrn.com/abstract=4709317.
Krokhmal, P., J. Palmquist, and S. Uryasev (2002). “Portfolio Optimization with Conditional Value-at-Risk Objective and Constraints”. Journal of Risk 4, pp. 43–68.
Lahiri, S. N. (2003). Resampling Methods for Dependent Data. Springer Series in Statistics.
Maaten, L. van der and G. Hinton (2008). “Visualizing Data using t-SNE”. Journal of Machine Learning
Research 9.86, pp. 2579–2605.
Markowitz, H. (1952). “Portfolio Selection”. The Journal of Finance 7.1, pp. 77–91.
Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investments. Yale University
Press.
McCoy, J., S. Kroon, and L. Auret (2018). “Variational Autoencoders for Missing Data Imputation
with Application to a Simulated Milling Circuit”. IFAC-PapersOnLine 51.21, pp. 141–146.
Meucci, A. (2007). “Risk Contributions from Generic User-Defined Factors”. The Risk Magazine,
pp. 84–88. url: https://ssrn.com/abstract=930034.
Meucci, A. (2008a). “Fully Flexible Views: Theory and Practice”. Risk 21.10, pp. 97–102. url: https://ssrn.com/abstract=1213325.
Meucci, A. (2008b). “The Black-Litterman Approach: Original Model and Extensions”. url: https://ssrn.com/abstract=1117574.
Meucci, A. (2012a). “Effective Number of Scenarios in Fully Flexible Probabilities”. GARP Risk Professional, February 2011, pp. 32–35. url: https://ssrn.com/abstract=1971808.
Meucci, A. (2012b). “Stress-Testing with Fully Flexible Causal Inputs”. Risk. url: https://ssrn.com/abstract=1721302.
Meucci, A. (2013). “Estimation and Stress-Testing via Time- and Market-Conditional Flexible Probabilities”. SSRN. url: https://ssrn.com/abstract=2312126.
Meucci, A. (2014). “Linear Factor Models: Theory, Applications and Pitfalls”. SSRN.
Meucci, A., D. Ardia, and S. Keel (2011). “Fully Flexible Extreme Views”. Journal of Risk 14.2, pp. 39–
49. url: https://ssrn.com/abstract=1542083.
Meucci, A. and M. Nicolosi (2016). “Dynamic Portfolio Management with Views at Multiple Horizons”.
Applied Mathematics and Computation 274, pp. 495–518.
Michaud, R. and R. Michaud (1998). Efficient Asset Management: A Practical Guide to Stock Portfolio Optimization and Asset Allocation. Oxford University Press.
Munk, C. (2011). Fixed Income Modelling. Oxford University Press.
Norris, J. R. (1997). Markov Chains. Cambridge University Press.
Penman, S. (2012). Financial Statement Analysis and Security Valuation. 5th ed. McGraw-Hill Education.
Rebonato, R. and A. Denev (2011). “Coherent Asset Allocation and Diversification in the Presence of
Stress Events”. Journal of Investment Management. url: https://ssrn.com/abstract=1824207.
Rebonato, R. and A. Denev (2014). Portfolio Management under Stress: A Bayesian-Net Approach to
Coherent Asset Allocation. Cambridge University Press.
Rockafellar, R. T. and S. Uryasev (2000). “Optimization of Conditional Value-at-Risk”. Journal of Risk 2, pp. 21–42.
Rockafellar, R. T., S. Uryasev, and M. Zabarankin (2006). “Generalized Deviations in Risk Analysis”.
Finance and Stochastics 10.1, pp. 51–74.
Ross, S. (1976). “The Arbitrage Theory of Capital Asset Pricing”. Journal of Economic Theory 13,
pp. 341–360.
Salimans, T. et al. (2016). “Improved Techniques for Training GANs”. Advances in Neural Information
Processing Systems.
Sharpe, W. F. (1964). “Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of
Risk”. The Journal of Finance 19.3, pp. 425–442.
Vorobets, A. (2021). “Sequential Entropy Pooling Heuristics”. SSRN. url: https://ssrn.com/abstract=3936392.
Vorobets, A. (2022a). “Portfolio Management Framework for Derivative Instruments”. SSRN. url:
https://ssrn.com/abstract=4217884.
Vorobets, A. (2022b). “Variance for Intuition, CVaR for Optimization”. SSRN. url: https://ssrn.com/abstract=4034316.
Vorobets, A. (2023). “Causal and Predictive Market Views and Stress-Testing”. SSRN. url: https://ssrn.com/abstract=4444291.
Vorobets, A. (2024). “Derivatives Portfolio Optimization and Parameter Uncertainty”. SSRN. url:
https://ssrn.com/abstract=4825945.
Wolpert, David H. (1992). “Stacked Generalization”. Neural Networks 5, pp. 241–259.
Yoon, J., D. Jarrett, and M. Van der Schaar (2019). “Time-Series Generative Adversarial Networks”.
Advances in Neural Information Processing Systems.
Index

D
data imputation, 42, 43
delta hedging, 62–66
demean, 93, 132

H
H1, 80–82, see also Sequential Entropy Pooling
H2, 80–82, see also Sequential Entropy Pooling
hedging, 62–66, 98, 127, 129, 130
homogeneous of degree one, 124

I
illiquid alternatives, see alternatives
inflation-linked bonds, see bonds, inflation-linked
instrument pricing, see pricing
interest rate, 26–28, 56, 57, 61, 68

K
Kullback-Leibler divergence, 2, 42, 70, 71
kurtosis, 11, 17–19

L
leaf node, 85, 87, 88, see also Bayesian network
LOCF, 42

M
machine learning, 29, 30, 40–44, 134
marginal return, 124
marginal risk, 124
marginal risk-adjusted return, 125
market
    factors, 2
    representation, 1
    simulation, see simulation
    states, 3–6, 50, 83, 84
    value, see relative market value
    views, see views
Markov chain, 4, 31, 33, 35
mean-CVaR, 21–23, 93–97, 117, 118, 132
mean-variance, 1, 11, 18, 20–23, 93–96, 101, 102, 106, 121, 123, 131–133
minibatch discrimination layer, 44
mode collapse, 44, 49

N
nominal bonds, see bonds, nominal

O
optimization, see portfolio optimization
options, 15, 63, 91, 129, 130
    pricing, see pricing, options

P
parameter uncertainty, 102–120
PCA, 41, 51
portfolio optimization, 90–123, 132
    validation, 101, 102
portfolio rebalancing, see rebalancing
pricing, 55–68
    bonds, 56–58, 67
    derivatives, 60–66
    equity, 58–60, 67
    options, 16, 61, 67
principal component analysis, see PCA

R
ranking view, see view, ranking
rebalancing, 120–122
relative entropy, 70, 71, 80
relative market value, 91–93
Resampled Portfolio Rebalancing, 120–122, 133
Resampled Portfolio Rebalancing p-value, 122
Resampled Portfolio Stacking, 102–119, 133
Resampled Rebalancing Probability, 121
resampling, 29–40
    Fully Flexible, see Fully Flexible Resampling
    test statistic, 120
return contribution, 106, 124–126
Return Stacking, 106–116
risk
    budgeting, 97–101, 123, 134
    clustering, 13, 14, 20
    contribution, 97, 106, 119, 120, 123–126
    factors, 2, 25, 28, 46–48, 55, 61–63, 117, 122, 129
    management, 6–9
    measure, 8, 94, 119, 124
    parity, 123, 125
    premium, see volatility risk premium and equity risk premium
risk and return trade-off, 11–13, 133
Risk Stacking, 106–116, 124
root node, 85, see also Bayesian network

S
Sequential Entropy Pooling, 24, 54, 79–82, 128, 129, 133, 134
simulation, 24–54, 59, 61, 63, 66, 67, 127
    evaluation, 30, 48–52
skewness, 11, 17–19
stacking, 103–117, 119, see also Resampled Portfolio Stacking
stacking target, 119, 133
state-conditioning, 31, 40
states, see market states
stationarity, 25
    Markov chains, 4
stationary transformation, 45, 46
stationary transformations, 25–29, 61
stress-testing, 69–90, 125–128
structural breaks, 3–6, 24, 31
stylized market facts, 10
subadditivity, 8

T
tail risk hedging, 127, 129, 130
time series, 24, 25, 30
time-conditioning, 6, 31, 40
tracking error, 97–102, 118, 120, 133, 134
transition matrix, 4, 32, 33, 35
    doubly stochastic, 4
trend-following strategies, 62, 129

U
uncertainty, see parameter uncertainty
utility theory, 94

V
VAE, 26, 40–43, 52
Value-at-Risk, see VaR
VaR, 8, 94, 124
variance, 8, 52, 53, 97–99, 124, see also mean-variance
variational autoencoder, see VAE
view, 69–90, 127, 128
    classification, 74, 77, 79, 82
    confidence, 83, 84
    correlation, 75
    CVaR, 76–79
    kurtosis, 75
    mean, 72, 74, 75
    ranking, 75, 76
    skewness, 75
    VaR, 76–79
    variance, 72, 75
VIX, 16, 19, 31, 52, 53
volatility
    implied, 15, 16, 28, 31, 34, 45, 46, 61, 63, 68
    realized, 15, 16
    risk premium, 15, 16, 20, 129

Y
yield, 57, 58

Z
zero-coupon, 46, 56–58, 61, 68