Outsider Trading

Dorje C. Brody¹, Julian Brody², Bernhard K. Meister³, and Matthew F. Parry⁴

¹ Department of Mathematics, Imperial College London, London SW7 2BZ, UK
² Business Service Division, Yahoo Japan Corporation, 9-7-1 Akasaka, Minato-ku, Tokyo, Japan
³ Department of Physics, Renmin University of China, Beijing, China 100872
⁴ Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, UK

In this paper we examine inefficiencies and information disparity in the Japanese stock market.
By carefully analysing information publicly available on the internet, an ‘outsider’ to conventional
statistical arbitrage strategies—which are based on market microstructure, company releases, or
analyst reports—can nevertheless pursue a profitable trading strategy. A large volume of blog data
is used to demonstrate the existence of an inefficiency in the market. An information-based model
that replicates the trading strategy is developed to estimate the degree of information disparity.
arXiv:1003.0764v2 [q-fin.TR] 7 Mar 2010

1. Introduction. Since the dawn of history, information has always been generated locally; it then spreads globally by various means, often being lost and sometimes being rediscovered. Nothing has fundamentally changed with the advent of the internet. Here again, information is generated locally on individual web sites and then, due to the potency of content and presentation, as well as the vagaries of place and timing, disappears into some data repository, or is picked up and amplified, creating avalanche effects. Nowadays information is often posted initially on blogs and twitter accounts, or discussed on bulletin boards, and only subsequently, with some delay, reaches the traditional media as represented by newspapers and television. This dissemination from a small to a wider circle of viewers is also of interest in the financial market context, because as knowledge spreads, it starts influencing investment decisions. We demonstrate in this paper that by capturing these trends at an early stage of information diffusion in a systematic and quantitative manner it is possible to construct a superior trading strategy, thus establishing the existence of market inefficiencies.

A closely related issue to information extraction in financial markets is the valuation of information. Suppose one is in possession of a piece of information, deemed valuable, that one wishes to monetise. How does one price information, when information is viewed as a tradable asset? For instance, consider the information that the price of a given stock will move up the following day with 75% likelihood. Leaving aside issues to do with insider trading for the moment, if one was to ‘sell’ this piece of information, how should one set a fair price? Evidently, in this example the price depends on a number of market factors, such as market impact. It also depends crucially on whether this information provision is a one-off event or whether such information will be supplied on a regular basis. All these issues make it virtually impossible to arrive at the notion of a ‘fair price’ of information. It is nevertheless possible to associate a rate of return with the use of information, as we shall show here.

In the efficient market theory ‘all’ the publicly available information is incorporated in the price by the marginal investor. This bold statement, which comes in various forms, has often been criticised in the literature (e.g., Grossman and Stiglitz 1980). Often there is an abundance of valuable information that is widely accessible to the whole market but from which not everyone has the resources or analytic capability to extract useful signals. Indeed, not even the so-called ‘marginal’ investor appears to exploit this additional data. The important point is that the distribution of information is never homogeneous because the ability to extract something useful is inhomogeneous across different market agents.

To establish a relationship between information and investment return, we must first identify what is meant by information. In financial markets information consists of two parts: signal and noise. By ‘signal’ we mean components of information that are dependent on the actual return of, say, an investment; whereas by ‘noise’ we mean components of information that are statistically independent of the actual return of that investment. Both components have direct impact on price dynamics, but it is ultimately the signal component that determines the realised value of the return. Thus, given this noisy information, market participants try their best to estimate the signal; this estimate (in a suitably defined sense discussed below) in turn determines the random dynamics of the associated price process.

In many cases signal and noise are superimposed in an additive fashion. In other words, there are essentially two unknowns, ‘signal’ and ‘noise’, and one known, ‘signal plus noise’. The rate at which the signal is revealed to the market then determines the signal-to-noise ratio. The kind of information inhomogeneity discussed above therefore arises primarily from the fact that different agents have different signal-to-noise ratios. With further refinements, however, one finds that signal-to-noise ratio is itself rarely known in financial markets, i.e. it is what one might call a known unknown. Yet, it is the signal-to-noise ratio that directly affects the performance of an investment. Hence we can determine the relative ratios of signal-to-noise ratios of different agents from their performances. This is one objective of the present paper. We examine the ratio of two signal-to-noise ratios; one for the market as a whole, and one for an internet-search based strategy.

Our choice for using an internet-search based strategy, as a comparison against the market, should be evident: most information circulates via the internet. Unlike traditional investment firms, large internet search engines, by their very nature and in spite of being ‘outsiders’ to financial markets, are well positioned to extract signals from large data sets. From the viewpoint of internet search engines, the kind of analysis discussed here also has a profound implication. One of the key difficulties in the business of information provision is in the quantitative assessment of the validity and quality of the search engines or other recommendation tools. However, we now recognise that financial market dynamics provide a suitable testing ground, and one with rapid feedback. For example, a “celebrity popularity engine” offered by internet companies, useful to advertisers, can be applied to individual companies; the quality of the engine, which otherwise would have been difficult to assess, can now be tested instantly against the future movements of the corresponding stock prices.

We have therefore taken a large number of blog articles from the internet, applied natural language processing (NLP) to convert numerous texts into numerical sentiment indices for individual listed companies, and then developed a trading strategy that converts the sentiment indices into portfolio positions. The results show the existence of an astonishing inefficiency in a highly liquid equity market. We also construct a theoretical model, within the information-based asset pricing framework of Brody-Hughston-Macrina (BHM), for the characterisation of the strategy. The model has the advantage that the ratio of the signal-to-noise ratios between the informed outsider and the general market can be estimated from the investment performance.

2. Information and asset price. To understand the interplay between information and asset price, we must first step back from the conventional approach in quantitative finance, and begin by identifying the main sources for price movements at a phenomenological level. After a little reflection it should not be difficult to identify two important factors, namely, risk preference and available information. To understand these two factors we list two different scenarios: (i) I would have bought the new Toyota car, had I not lost my job; (ii) I would have bought the new Toyota car, had I not read the news of the recall. In case (i) the assessment of the worthiness of the product has not changed, but the purchase decision has nevertheless been affected by the changes in one’s appetite toward risk; whereas in case (ii) the assessment of the worthiness of the product has changed due to the arrival of new information.

It is often argued that the price dynamics is generated by supply and demand; this is indeed so, but it has to be noted that a large part of supply and demand in financial markets is induced by the arrival of information (for example, an announcement of a substantial profit leading to high demand for company shares). We thus take the view that the traditional ‘supply and demand’ argument is in fact mostly the symptom and not the cause, at least in the case of highly liquid financial instruments.

As regards changes in risk preference, at the individual level this can be relatively volatile, but averaged over the market the volatility will be reduced. On the other hand, the flow of information is significantly more dynamic and volatile. It is common for a dynamical system to depend on fast moving and slowly moving variables; in the case of a financial market, information is the fast moving and risk preference is the slowly moving variable. For our strategy, the changes in overall risk preference have little impact, because we only test market neutral strategies that have no exposure to the overall risk preference of the market. Therefore, our first simplifying assumption is to regard market risk preference as fixed, and focus attention on the structure of information. Phrased in more technical terms, we will assume that the pricing measure is given once and for all, and we shall construct the market filtration from the outset, which will be used to derive the price process. This is in line with the BHM approach introduced in Brody et al. (2007, 2008), which will now be reviewed briefly.

Consider an elementary asset that pays a single dividend $X$ at time $T$ (e.g., a credit-risky discount bond). We assume that there is an established pricing measure $Q$, under which the random cash flow $X$ has the a priori density $p(x)$. In this case, market participants are concerned about the realised value of $X$. In particular, the risk-adjusted view of the market today about the cash flow is represented by the a priori density $p(x)$. By tomorrow, however, the market will obtain additional noisy information, based on which the market will update its view, represented in the form of an a posteriori density for $X$. This information consists of two components; signal and noise. Although the signal-to-noise ratio is generally unknown, and furthermore it will change in time, let us assume for simplicity that it is known to the market, and that it is given by a constant $\sigma$. We also assume for the moment that the market is efficient in the sense that all available information is used in the determination of the price today. Hence there is no residual noise today. Likewise, the noise will vanish at time $T$ when the value of $X$ is revealed for sure. To keep the matter simple, we model the noise term by the simplest Gaussian process that vanishes at time 0 and time $T$: the Brownian bridge process $\{\beta_{tT}\}$ over the time interval $[0, T]$. Therefore, our choice for the information is

$$\xi_t = \sigma X t + \beta_{tT}. \qquad (1)$$
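As an illustration of the information process (1), the following is a minimal Python sketch that simulates one path of $\xi_t$ on a time grid. It is not taken from the paper: the function name `simulate_information_process`, the grid size, and the construction of the Brownian bridge as $\beta_{tT} = W_t - (t/T)W_T$ from a simulated Brownian motion $W$ are assumptions made for illustration only.

```python
import math
import random


def simulate_information_process(x, sigma, T=1.0, n_steps=200, seed=None):
    """Simulate xi_t = sigma * x * t + beta_tT on a uniform grid over [0, T].

    Hypothetical helper: x is the realised value of X, sigma the
    signal-to-noise ratio. Returns (times, bridge, xi) as lists.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    times = [k * dt for k in range(n_steps + 1)]
    times[-1] = T  # avoid rounding drift at the right end point

    # Standard Brownian motion W on the grid, with W_0 = 0.
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))

    # Brownian bridge pinned to zero at times 0 and T (assumed construction).
    bridge = [w_k - (t_k / T) * w[-1] for t_k, w_k in zip(times, w)]

    # Signal term grows linearly in t; the noise vanishes at both end points.
    xi = [sigma * x * t_k + b_k for t_k, b_k in zip(times, bridge)]
    return times, bridge, xi


times, bridge, xi = simulate_information_process(x=1.0, sigma=0.2, seed=0)
```

Because the bridge vanishes at time $T$, the final value `xi[-1]` equals $\sigma X T$, so the value of $X$ is fully revealed at $T$, in line with the description of the noise term.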
The market filtration $\{\mathcal{F}_t\}$ is thus generated by the knowledge $\{\xi_s\}_{0\le s\le t}$ of the information process. If we write $P_{tT}$ for the discount function, and assume that it is deterministic, then the price at time $t$ of the asset is determined by $S_t = P_{tT}\,E[X|\mathcal{F}_t]$. A short calculation then shows that the price process is given by

$$S_t = P_{tT}\,\frac{\int_0^\infty x\,p(x)\,\exp\!\left[\frac{T}{T-t}\left(\sigma x \xi_t - \tfrac{1}{2}\sigma^2 x^2 t\right)\right] dx}{\int_0^\infty p(x)\,\exp\!\left[\frac{T}{T-t}\left(\sigma x \xi_t - \tfrac{1}{2}\sigma^2 x^2 t\right)\right] dx}. \qquad (2)$$

We see therefore that in the BHM framework it is possible to derive the price process in a manner that replicates how price processes are generated in the first place via flow of information. In spite of the various simplifying assumptions, the resulting price process (2) is very rich and possesses many desirable features. Perhaps the most notable from a practical point of view is the fact that the pricing and the hedging of elementary contingent claims are made easy.

3. Modelling the informed outsider. Within the BHM framework it is straightforward to model the information disparity seen in the market. Indeed, it has been shown in Brody et al. (2009) that if there is an informed trader in the market who has access not only to the market information (1) but also to an additional information source $\xi'_t = \sigma' X t + \beta'_{tT}$, then the informed trader can exploit the information to generate statistical arbitrage. Here we shall modify the setup considered therein so as to replicate the trading strategy that we have developed by use of data taken from the internet, and calibrate some of the model parameters. In this manner we are able to test the performance of internet-based recommendation or rating engines from investment performances.

Our modelling setup can be summarised as follows. We let $X$ be a binary random variable taking the values $\{0, 1\}$, where 1 represents price moving up by a unit over the period $[0, T]$ and 0 represents price moving down by a unit over the same period. At time 0 both the market and the informed trader share the same information about the value of $X$, represented by the a priori probabilities $(p, 1-p)$. The informed trader, however, begins to gather information from the internet, using text and data mining; whereas the general market gathers information through more widely accessible sources such as newspaper articles and financial reports. We let $\xi_t$ of (1) represent the market information process, and $\xi'_t = \sigma' X t + \beta'_{tT}$ represent the extra information gathered from the internet, where the two noises $\{\beta_{tT}\}$ and $\{\beta'_{tT}\}$ may be dependent, with correlation $\rho$. It is shown in Brody et al. (2009) that in the case of multiple information sources the knowledge of the informed trader can be represented in the form of a single effective information process

$$\hat{\xi}_t = \hat{\sigma} X t + \hat{\beta}_{tT}, \qquad (3)$$

where $\hat{\sigma}^2 = (\sigma^2 - 2\rho\sigma\sigma' + \sigma'^2)/(1-\rho^2)$, and

$$\hat{\beta}_{tT} = \frac{\sigma - \rho\sigma'}{\hat{\sigma}(1-\rho^2)}\,\beta_{tT} + \frac{\sigma' - \rho\sigma}{\hat{\sigma}(1-\rho^2)}\,\beta'_{tT}. \qquad (4)$$

Therefore, the effective signal-to-noise ratio for the informed trader is given by $\hat{\sigma}$, which can be compared against the market signal-to-noise ratio $\sigma$.

At time $T/2$ both the market and the informed trader have accumulated noisy information, based on which they evaluate the a posteriori probabilities, $p_m$ and $p_i$, respectively, that $X = 1$. The trading strategy is as follows. If the a posteriori probability is larger than the threshold value $K_+$ then take a long position by the amount $\hat{X}_t$; if the a posteriori probability is smaller than the threshold value $K_-$ then take a short position by the amount $\hat{X}_t$, where $\hat{X}_t = E_t[X]$ is the expectation of $X$ using the market filtration. The position is then held till time $T$, at which point the profit or loss is made because the value of $X$ is now revealed. Also at time $T$ the next observation for the value of the random variable representing whether the asset price moves up or down over the interval $[T, 2T]$ begins, and the same strategy is repeated over and over. Our model thus makes an implicit simplifying assumption that the magnitude of the stock volatility over the range $[nT, (n+1)T]$ is independent of the value of $n$.

Both the market and the informed trader employ the same strategy, but the informed trader on average makes better estimates for the realised value of $X$, thus statistically obtaining a higher rate of return than the market. The risk-neutral valuation of the market position can be made straightforwardly, because the resulting cash flow is given by $(X - \hat{X}_t)(\mathbf{1}\{\hat{X}_t > K_+\} - \mathbf{1}\{\hat{X}_t < K_-\})$. By a change of measure technique introduced in Brody et al. (2007) one can show that the value of the strategy is given by a formula analogous to the Black-Scholes option pricing formula. The valuation of the position of the informed trader is less obvious, although one can show that the expected P&L difference is positive, leading to a statistical arbitrage opportunity.

4. Implementation and calibration. We have implemented the strategy using publicly available information sources. Specifically, we have gathered the totality of Japanese blog articles since 2006 and used them as our sole information source. In 2009, nearly 20 million Japanese blog articles appeared on the internet, making a daily average of around 50,000 articles. Each blog article is weighted by its relevance (e.g., page views). Those with insufficient weight are regarded as ‘pure noise’ and have been discarded from the analysis.

Natural language processing (NLP) technology of Yahoo Japan Corporation and Yahoo Japan Research Institute has been applied to analyse company-specific comments on the listed companies. The NLP classifies whether the comments are positive, neutral, or negative; this classification is then used to establish a sentiment index for each company. Based on the sentiment index, a
trading strategy, analogous to the one described above, is developed. The idea can be illustrated as follows. If many people write complimentary remarks about a new product released by a given company then it is likely that sales of the product will go up, leading to an increase in its share price.

FIG. 1: Information-based trading. The blog sentiment data is used to create a trading strategy for the relevant stocks. The performance (total return) that results from the strategy is shown in the solid black line. The dark blue dashed line represents the average stock prices; the light blue dashed line the Nikkei 225 Index. The ‘learning’ period for optimisation corresponds to the left of the vertical green line; the strategy is applied over the seven month period starting in late March 2009. Our information-based strategy yields over 40% return for the seven month period.

The strategy has been optimised using the data from 2008 to early 2009 (for example, the choice of the threshold values $K_\pm$), and applied for the seven month period from April 2009. Specifically, for the analysis presented here we have considered the 10 companies for which the average numbers of blog comments are highest. In order to obtain a conservative estimate for the ratio $\hat{\sigma}/\sigma$, and also to reduce exposure to the market risk preference, we have adopted a long-short strategy against the Nikkei 225 Index. The result of the strategy, as well as the average stock prices of the active names and the Nikkei 225 Index, are shown in figure 1.

To estimate the ratio $\hat{\sigma}/\sigma$ we have simulated the strategy numerically. Because we do not yet have a suitable method of estimating the correlation $\rho$ between the noise in the blog sentiments and the noise for market investors, we can only give a range for this estimate. Fortunately, however, we found that the range is relatively narrow:

$$2.4 \lesssim \frac{\hat{\sigma}}{\sigma} \lesssim 2.6. \qquad (5)$$

The simulation results associated with the choice $\rho = 0.1$ are shown in figure 2.

[Figure 2: plot titled “P&L curves: market (black) and informed trader (green)”; horizontal axis: trading cycles, 0 to 80.]

FIG. 2: Simulation of the strategy. We have run the strategy for the informed trader (black solid line) and that for the market (blue dashed line), and taken the average over 5,000 sample paths. Parameter values are set as $\rho = 0.1$, $\sigma = 0.2$, $\sigma' = 0.48$, and hence $\hat{\sigma} = 0.50$.

5. Discussion. We have successfully extracted a trading signal from the abundant data accessible on the internet. By applying the results to the stock market, we were able to assess the performance of the information extraction and provision engine. The results have identified a perhaps surprising level of apparent inefficiency even in a highly liquid equity market, indicating the degree of information inhomogeneity.

It is of course well documented that asset prices in financial markets respond to the unravelling of information (e.g., Engle and Ng 1993; Andersen et al. 2007). Indeed, the realisation that information filtering and communication is the key for grasping social sciences such as economics has been recognised since Wiener (1954). Our analysis differs sharply from previous work carried out in this area in that we explicitly identify the existence of information disparity and derive an estimate for how much more the rate of information extraction could have been enhanced had the market been truly efficient. In contrast with Google Finance, for instance, which provides a postmortem analysis of the relation between large price moves and revelations of news items, our informed trader is able to exploit additional information sources to anticipate price moves.

The analysis reported in this paper is naturally of interest to statistical arbitrage funds, because the strategy is orthogonal to conventional strategies that rely on, for example, microstructure. On the other hand, from the viewpoint of an internet search engine, one might envisage a scenario whereby individual investors purchase ‘signal’ from information providers and make their own investments. Such a model, however, is unlikely to be sustainable, because if the signal is circulated broadly, it ceases to remain useful. As Wiener emphasises, concentration of useful information is intrinsically unstable due to the second law (Wiener 1954). The only way in which information can be spontaneously concentrated, at least momentarily, is via innovation. It is interesting therefore to reflect on the fact that in spite of the enhancement of technology in improving the method of information gathering and provision, whose purpose a priori goes against the second law, ultimately such developments can only result in enforcing the compliance with the second law. As a result, in the long run the second law will enhance the ‘efficiency’ of financial markets, but maybe also, paradoxically, the instability of financial markets, because in a noise-dominated market, the revelation of the true signal has a significant impact.

The authors thank Robyn Friedman for stimulating discussion. The opinions expressed in this article are those of the authors. Email: [email protected]∗, [email protected], [email protected]∗, and [email protected] (∗ corresponding authors).

[1] Andersen, T., Bollerslev, T., Diebold, F. X. & Vega, C. (2007) “Real-time price discovery in stock, bond and foreign exchange markets” Journal of International Economics 73, 251-277.
[2] Brody, D. C., Hughston, L. P. & Macrina, A. (2007) “Beyond hazard rates: a new framework for credit-risk modelling” In Advances in Mathematical Finance: Festschrift Volume in Honour of Dilip Madan (Basel: Birkhäuser).
[3] Brody, D. C., Hughston, L. P. & Macrina, A. (2008) “Information-based asset pricing” International Journal of Theoretical and Applied Finance 11, 107-142.
[4] Brody, D. C., Davis, M. H. A., Friedman, R. L. & Hughston, L. P. (2009) “Informed traders” Proceedings of the Royal Society London A465, 1103-1122.
[5] Engle, R. F. & Ng, V. K. (1993) “Measuring and testing the impact of news on volatility” The Journal of Finance 48, 1749-1778.
[6] Grossman, S. J. & Stiglitz, J. E. (1980) “On the impossibility of informationally efficient markets” The American Economic Review 70, 393-408.
[7] Wiener, N. (1954) The human use of human beings. Revised Edition (London: Eyre and Spottiswoode).
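As a numerical check on Section 3 and on the parameter values quoted for the simulation, the following minimal Python sketch evaluates the effective signal-to-noise ratio $\hat{\sigma}^2 = (\sigma^2 - 2\rho\sigma\sigma' + \sigma'^2)/(1-\rho^2)$, the a posteriori probability that $X = 1$ obtained by specialising the price formula (2) to a binary $X$ with a priori probability $p$, and the cash flow of the threshold strategy. The function names, the default thresholds, and the choice $T = 1$ are assumptions made for illustration; this is not the authors' simulation code.

```python
import math


def effective_snr(sigma, sigma_prime, rho):
    """Effective signal-to-noise ratio of two correlated information sources."""
    return math.sqrt(
        (sigma**2 - 2.0 * rho * sigma * sigma_prime + sigma_prime**2)
        / (1.0 - rho**2)
    )


def posterior_prob_up(xi, t, sigma, p, T=1.0):
    """Probability that X = 1 given xi_t, for binary X with prior p.

    Formula (2) with a two-point prior: x = 0 carries weight (1 - p) and
    x = 1 carries weight p * exp(a), with a as defined below. Requires t < T.
    """
    a = (T / (T - t)) * (sigma * xi - 0.5 * sigma**2 * t)
    return p / (p + (1.0 - p) * math.exp(-a))


def strategy_cash_flow(x, x_hat, k_plus=0.6, k_minus=0.4):
    """Cash flow (X - X_hat) * (1{X_hat > K+} - 1{X_hat < K-}).

    The threshold values are hypothetical placeholders, not calibrated ones.
    """
    position = (1 if x_hat > k_plus else 0) - (1 if x_hat < k_minus else 0)
    return (x - x_hat) * position


# Parameter values quoted in the caption of Fig. 2.
sigma, sigma_prime, rho = 0.2, 0.48, 0.1
sigma_hat = effective_snr(sigma, sigma_prime, rho)
ratio = sigma_hat / sigma
print(f"sigma_hat = {sigma_hat:.3f}, sigma_hat / sigma = {ratio:.2f}")
```

With these inputs the sketch gives $\hat{\sigma} \approx 0.50$ and $\hat{\sigma}/\sigma \approx 2.5$, consistent with the value quoted for Fig. 2 and with the range (5).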
