Outsider Trading
In this paper we examine inefficiencies and information disparity in the Japanese stock market.
By carefully analysing information publicly available on the internet, an ‘outsider’ to conventional
statistical arbitrage strategies—which are based on market microstructure, company releases, or
analyst reports—can nevertheless pursue a profitable trading strategy. A large volume of blog data
is used to demonstrate the existence of an inefficiency in the market. An information-based model
that replicates the trading strategy is developed to estimate the degree of information disparity.
arXiv:1003.0764v2 [q-fin.TR] 7 Mar 2010
1. Introduction. Since the dawn of history, information has always been generated locally; it then spreads globally by various means, often being lost and sometimes being rediscovered. Nothing has fundamentally changed with the advent of the internet. Here again, information is generated locally on individual web sites and then, due to the potency of content and presentation, as well as the vagaries of place and timing, disappears into some data repository, or is picked up and amplified, creating avalanche effects. Nowadays information is often posted initially on blogs and Twitter accounts, or discussed on bulletin boards, and only subsequently, with some delay, reaches the traditional media as represented by newspapers and television. This dissemination from a small to a wider circle of viewers is also of interest in the financial market context, because as knowledge spreads, it starts influencing investment decisions. We demonstrate in this paper that by capturing these trends at an early stage of information diffusion in a systematic and quantitative manner it is possible to construct a superior trading strategy, thus establishing the existence of market inefficiencies.

A closely related issue to information extraction in financial markets is the valuation of information. Suppose one is in possession of a piece of information, deemed valuable, that one wishes to monetise. How does one price information, when information is viewed as a tradable asset? For instance, consider the information that the price of a given stock will move up the following day with 75% likelihood. Leaving aside issues to do with insider trading for the moment, if one were to ‘sell’ this piece of information, how should one set a fair price? Evidently, in this example the price depends on a number of market factors, such as market impact. It also depends crucially on whether this information provision is a one-off event or whether such information will be supplied on a regular basis. All these issues make it virtually impossible to arrive at the notion of a ‘fair price’ of information. It is nevertheless possible to associate a rate of return with the use of information, as we shall show here.

In the efficient market theory ‘all’ the publicly available information is incorporated in the price by the marginal investor. This bold statement, which comes in various forms, has often been criticised in the literature (e.g., Grossman and Stiglitz 1980). Often there is an abundance of valuable information that is widely accessible to the whole market but from which not everyone has the resources or analytic capability to extract useful signals. Indeed, not even the so-called ‘marginal’ investor appears to exploit this additional data. The important point is that the distribution of information is never homogeneous because the ability to extract something useful is inhomogeneous across different market agents.

To establish a relationship between information and investment return, we must first identify what is meant by information. In financial markets information consists of two parts: signal and noise. By ‘signal’ we mean components of information that are dependent on the actual return of, say, an investment; whereas by ‘noise’ we mean components of information that are statistically independent of the actual return of that investment. Both components have direct impact on price dynamics, but it is ultimately the signal component that determines the realised value of the return. Thus, given this noisy information, market participants try their best to estimate the signal; this estimate (in a suitably defined sense discussed below) in turn determines the random dynamics of the associated price process.

In many cases signal and noise are superimposed in an additive fashion. In other words, there are essentially two unknowns, ‘signal’ and ‘noise’, and one known, ‘signal plus noise’. The rate at which the signal is revealed to the market then determines the signal-to-noise ratio. The kind of information inhomogeneity discussed above therefore arises primarily from the fact that different agents have different signal-to-noise ratios. With further refinements, however, one finds that the signal-to-noise ratio is itself rarely known in financial markets, i.e. it is what one might call a known unknown. Yet, it is the signal-to-noise ratio that directly affects the performance of an investment. Hence we can determine the relative ratios of signal-to-noise ratios of different agents from their performances. This is one objective of the present paper. We examine the ratio of two signal-to-noise ratios: one for the market as a whole, and one for an internet-search based strategy.

Our choice of an internet-search based strategy, as a comparison against the market, should be evident: most information circulates via the internet. Unlike traditional investment firms, large internet search engines, by their very nature and in spite of being ‘outsiders’ to financial markets, are well positioned to extract signals from large data sets. From the viewpoint of internet search engines, the kind of analysis discussed here also has a profound implication. One of the key difficulties in the business of information provision is in the quantitative assessment of the validity and quality of the search engines or other recommendation tools. However, we now recognise that financial market dynamics provide a suitable testing ground, and one with rapid feedback. For example, a “celebrity popularity engine” offered by internet companies, useful to advertisers, can be applied to individual companies; the quality of the engine, which otherwise would have been difficult to assess, can now be tested instantly against the future movements of the corresponding stock prices.

We have therefore taken a large number of blog articles from the internet, applied natural language processing (NLP) to convert numerous texts into numerical sentiment indices for individual listed companies, and then developed a trading strategy that converts the sentiment indices into portfolio positions. The results show the existence of an astonishing inefficiency in a highly liquid equity market. We also construct a theoretical model, within the information-based asset pricing framework of Brody-Hughston-Macrina (BHM), for the characterisation of the strategy. The model has the advantage that the ratio of the signal-to-noise ratios between the informed outsider and the general market can be estimated from the investment performance.

2. Information and asset price. To understand the interplay between information and asset price, we must first step back from the conventional approach in quantitative finance, and begin by identifying the main sources for price movements at a phenomenological level. After a little reflection it should not be difficult to identify two important factors, namely, risk preference and available information. To understand these two factors we list two different scenarios: (i) I would have bought the new Toyota car, had I not lost my job; (ii) I would have bought the new Toyota car, had I not read the news of the recall. In case (i) the assessment of the worthiness of the product has not changed, but the purchase decision has nevertheless been affected by the changes in one’s appetite toward risk; whereas in case (ii) the assessment of the worthiness of the product has changed due to the arrival of new information.

It is often argued that the price dynamics is generated by supply and demand; this is indeed so, but it has to be noted that a large part of supply and demand in financial markets is induced by the arrival of information (for example, an announcement of a substantial profit leading to high demand for company shares). We thus take the view that the traditional ‘supply and demand’ argument is in fact mostly the symptom and not the cause, at least in the case of highly liquid financial instruments.

As regards changes in risk preference, at the individual level this can be relatively volatile, but averaged over the market the volatility will be reduced. On the other hand, the flow of information is significantly more dynamic and volatile. It is common for a dynamical system to depend on fast moving and slowly moving variables; in the case of a financial market, information is the fast moving and risk preference is the slowly moving variable. For our strategy, the changes in overall risk preference have little impact, because we only test market neutral strategies that have no exposure to the overall risk preference of the market. Therefore, our first simplifying assumption is to regard market risk preference as fixed, and focus attention on the structure of information. Phrased in more technical terms, we will assume that the pricing measure is given once and for all, and we shall construct the market filtration from the outset, which will be used to derive the price process. This is in line with the BHM approach introduced in Brody et al. (2007, 2008), which will now be reviewed briefly.

Consider an elementary asset that pays a single dividend X at time T (e.g., a credit-risky discount bond). We assume that there is an established pricing measure Q, under which the random cash flow X has the a priori density p(x). In this case, market participants are concerned about the realised value of X. In particular, the risk-adjusted view of the market today about the cash flow is represented by the a priori density p(x). By tomorrow, however, the market will obtain additional noisy information, based on which the market will update its view, represented in the form of an a posteriori density for X. This information consists of two components: signal and noise. Although the signal-to-noise ratio is generally unknown, and furthermore it will change in time, let us assume for simplicity that it is known to the market, and that it is given by a constant σ. We also assume for the moment that the market is efficient in the sense that all available information is used in the determination of the price today. Hence there is no residual noise today. Likewise, the noise will vanish at time T when the value of X is revealed for sure. To keep the matter simple, we model the noise term by the simplest Gaussian process that vanishes at time 0 and time T: the Brownian bridge process $\{\beta_{tT}\}$ over the time interval [0, T]. Therefore, our choice for the information is

$$\xi_t = \sigma X t + \beta_{tT}. \tag{1}$$
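As an illustration of the information process (1), the following is a minimal sketch that samples one path of ξ_t on a time grid. It is not taken from the paper: the function names, the grid size, and the parameter values are hypothetical, and the Brownian bridge is built from a discretised Brownian motion W via β_tT = W_t − (t/T) W_T, which is one standard construction.

```python
import math
import random


def brownian_bridge(n_steps, T, rng):
    """Sample a Brownian bridge on [0, T] at n_steps + 1 equally spaced times."""
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        # Brownian increments are independent N(0, dt).
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Pin the path: beta_t = W_t - (t / T) * W_T vanishes at t = 0 and t = T.
    return [w[i] - (i / n_steps) * w[-1] for i in range(n_steps + 1)]


def information_process(x, sigma, n_steps, T, rng):
    """Return (times, xi) with xi_t = sigma * x * t + beta_tT for a realisation X = x."""
    times = [T * i / n_steps for i in range(n_steps + 1)]
    beta = brownian_bridge(n_steps, T, rng)
    xi = [sigma * x * t + b for t, b in zip(times, beta)]
    return times, xi


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for reproducibility only
    # Illustrative (assumed) values: X = 1, sigma = 0.5, T = 1.
    times, xi = information_process(x=1.0, sigma=0.5, n_steps=250, T=1.0, rng=rng)
    print("xi at t = 0:", xi[0])
    print("xi at t = T:", xi[-1])
```

Because the bridge vanishes at both ends, the sampled path starts at zero and ends at σXT, so the noise obscures X only at intermediate times.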
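The market filtration $\{\mathcal{F}_t\}$ is thus generated by the knowledge $\{\xi_s\}_{0\le s\le t}$ of the information process. If we write $P_{tT}$ for the discount function, and assume that it is deterministic, then the price at time t of the asset is determined by $S_t = P_{tT}\,E[X|\mathcal{F}_t]$. A short calculation then shows that the price process is given by

$$S_t = P_{tT}\,\frac{\int_0^\infty x\,p(x)\,\exp\!\left[\frac{T}{T-t}\left(\sigma x \xi_t - \tfrac{1}{2}\sigma^2 x^2 t\right)\right]dx}{\int_0^\infty p(x)\,\exp\!\left[\frac{T}{T-t}\left(\sigma x \xi_t - \tfrac{1}{2}\sigma^2 x^2 t\right)\right]dx}.$$

[...]

where $\hat\sigma^2 = (\sigma^2 - 2\rho\sigma\sigma' + \sigma'^2)/(1-\rho^2)$, and

$$\hat\beta_{tT} = \frac{\sigma - \rho\sigma'}{\hat\sigma(1-\rho^2)}\,\beta_{tT} + \frac{\sigma' - \rho\sigma}{\hat\sigma(1-\rho^2)}\,\beta'_{tT}. \tag{4}$$

Therefore, the effective signal-to-noise ratio for the informed trader is given by $\hat\sigma$, which can be compared against the market signal-to-noise ratio σ.

At time T/2 both the market and the informed trader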
[Figure: horizontal axis, trading cycles (0 to 80); vertical axis ticks from −0.05 to 0.15.]
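The pricing relation S_t = P_tT E[X|F_t] and the effective signal-to-noise ratio σ̂ discussed above can be evaluated numerically. The sketch below is illustrative only and makes two assumptions that are not in the text: the a priori distribution of X is discrete, so that the integrals over p(x) become finite sums, and ρ is simply taken as a given parameter with |ρ| < 1. The function names `bhm_price` and `effective_snr` are hypothetical.

```python
import math


def bhm_price(xi_t, t, T, sigma, values, probs, discount=1.0):
    """Discounted conditional mean of X given xi_t, for a discrete prior.

    values, probs: support points of X and their a priori probabilities
    (an assumed discrete stand-in for the density p(x)).
    discount: the deterministic discount factor P_tT.
    """
    if not 0.0 <= t < T:
        raise ValueError("t must satisfy 0 <= t < T")
    k = T / (T - t)
    exponents = [k * (sigma * x * xi_t - 0.5 * sigma ** 2 * x ** 2 * t)
                 for x in values]
    shift = max(exponents)  # subtract the maximum to avoid overflow in exp
    weights = [p * math.exp(e - shift) for p, e in zip(probs, exponents)]
    numerator = sum(x * w for x, w in zip(values, weights))
    return discount * numerator / sum(weights)


def effective_snr(sigma, sigma_prime, rho):
    """sigma_hat, with sigma_hat^2 = (s^2 - 2*rho*s*s' + s'^2) / (1 - rho^2)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    numerator = sigma ** 2 - 2.0 * rho * sigma * sigma_prime + sigma_prime ** 2
    return math.sqrt(numerator / (1.0 - rho ** 2))


if __name__ == "__main__":
    # Binary cash flow X in {0, 1} with equal a priori probabilities (assumed).
    price = bhm_price(xi_t=0.8, t=0.5, T=1.0, sigma=1.0,
                      values=[0.0, 1.0], probs=[0.5, 0.5])
    print("price at t = T/2:", price)
    print("effective signal-to-noise ratio:", effective_snr(0.5, 0.2, 0.3))
```

At t = 0 with ξ_0 = 0 the function returns the discounted a priori mean, as expected. Since σ̂² − σ² = (σ′ − ρσ)²/(1 − ρ²) ≥ 0, the value returned by `effective_snr` is never smaller than σ.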