CS230 Project Report:
Crypto Exchange Price Prediction using Limit Order Book
Ben Gilboa Tamal Biswas Ashwin Selka Padmanabhan
(SUID# - 06278930) (SUID# — 05107984) (SUID# — 06246676)
Stanford University
Spring 2018
Abstract
In this project we develop models for prediction of future
Bitcoin price trends using limit order book data as inputs.
1. Introduction
High-frequency trading, or algorithmic trading, is gaining
significant momentum in stock exchanges. In today's
market, a sizable portion of the daily traded volume is
done by specialized companies using these techniques. In
the well-developed stock market, it is almost impossible for
individuals without heavy machinery and very fast
access to data to gain any advantage, as margins and
arbitrage opportunities are closed within a fraction of a second.
The rise of the crypto market and exchanges might reveal
opportunities for small-scale algorithmic trading that are
long gone in the stock market.
In this project we explore and develop a deep
learning model that predicts the future price of a digital
asset such as Bitcoin. We developed an RNN (Recurrent
Neural Network) that predicts the future price trend of a
tradable and volatile digital asset such as Bitcoin. The
input to the model is limit order book data along
with other historical indicators of demand and supply.
Although we chose a digital asset
for this project, the principles and methods we develop are
transferable to any asset that is tradable on an exchange.
2. Prior work
Prior work in this area can be split into two categories,
namely mathematical models and deep learning
models. Tian Guo and Nino Antulov-Fantulin [1] try to
predict short-term Bitcoin price fluctuations
mathematically using their own custom model derived
from the volatility of the order book, which they find more reliable
than the related time series and moving average models
such as ARIMA and ARIMAX. Huisu Jang and Jaewook Lee
[2] use information from Blockchain transaction data and
try to show that a Bayesian neural network performs well
in predicting the Bitcoin price time series despite
its high volatility. Muhammad J Amjad and Devavrat Shah
[3] improve on the current time series prediction
algorithms. More specifically, they develop a framework
for time series analysis and then present a scalable real-time
algorithm intended to predict the next state of
Bitcoin with high accuracy. Justin A Sirignano [5]
developed a new neural network architecture in 2015 for
modeling spatial distributions of limit order books.
While this work was mostly about regular stocks and not
the highly volatile cryptocurrencies, the paper provides
good motivation for combining neural networks
and limit order books to predict future prices and
fluctuations.
3. Dataset Characteristics and Acquisition
The data primarily used for our predictor is the
data from the limit order book.
3.1. Limit order book
The limit order book captures all pending bid and ask
orders. The graphical presentation (Figure 1) shows a
cumulative view of a single limit order book snapshot.
The limit order book snapshot represents the demand and
supply in the market at a certain point in time. In the figure
it is clearly seen that the demand is "stronger". There are
many more buyers who are willing to buy the asset for a
price that is 3% lower than the last price than sellers who
are willing to sell at a price that is 3% higher than the
last price. This might indicate that the price is about to
increase. We look at the 500 highest bid orders and the 500
lowest ask orders in every snapshot of the order book.
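The demand/supply comparison described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: the `(price, quantity)` tuple layout, the function names, and the reading of "3%" as "within 3% of the last price" are all assumptions.

```python
def depth_within(orders, last_price, pct, side):
    """Sum the quantity of orders priced within `pct` of `last_price`.

    orders: list of (price, quantity) tuples (assumed layout).
    side:   "bid" counts orders in [last * (1 - pct), last];
            "ask" counts orders in [last, last * (1 + pct)].
    """
    if side == "bid":
        lo, hi = last_price * (1.0 - pct), last_price
    elif side == "ask":
        lo, hi = last_price, last_price * (1.0 + pct)
    else:
        raise ValueError("side must be 'bid' or 'ask'")
    return sum(qty for price, qty in orders if lo <= price <= hi)


def imbalance(bids, asks, last_price, pct=0.03):
    """Return (bid_depth - ask_depth) / (bid_depth + ask_depth), in [-1, 1]."""
    b = depth_within(bids, last_price, pct, "bid")
    a = depth_within(asks, last_price, pct, "ask")
    total = b + a
    return 0.0 if total == 0 else (b - a) / total
```

A positive value means more quantity is waiting on the bid side than on the ask side inside the band, which matches the "stronger demand" situation in Figure 1.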
Figure 1: Limit Order Book Snapshot
3.2. Bitcoin historical price
For each limit order book sample we look at
the corresponding Bitcoin price. This is the "last"
price of a transaction at the time the order book
was sampled. This data is used to generate the labels for
price increase or decrease. Consider a point in time t_0 that
corresponds to sample s_0 in our dataset. By considering
a certain number of preceding samples (s_{-1}, s_{-2}, ..., s_{-n}) we add historical
features to the training set. To predict the future trend at time
t_future, we compare the Bitcoin price at t_0 and at t_future to label
a price increase or decrease.
Figure 2: Bitcoin price throughout our samples
3.3. Data acquisition
We obtain the above data by sampling the Bittrex
exchange every 1 minute using the API it provides and
storing the data. In this way we obtained over 30,000 samples that
represent 3 weeks' worth of trading data. The data is not
100% consecutive, as the sampling software sometimes crashed
due to networking or related issues on the
Bittrex side.
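Because the samples are not fully consecutive, any step that relies on time order (labels that look ahead, or windows of consecutive snapshots) needs to know where the gaps are. The following is a minimal sketch of one way to find them, assuming each sample carries a timestamp in seconds; the function name and the 5-second tolerance are hypothetical and not taken from the project.

```python
def split_contiguous(timestamps, step=60, tolerance=5):
    """Split sorted sample timestamps (seconds) into runs with no missing sample.

    A new run starts whenever the distance to the previous sample exceeds
    step + tolerance seconds. Returns a list of (start_index, end_index)
    pairs, with end_index exclusive.
    """
    runs = []
    start = 0
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > step + tolerance:
            runs.append((start, i))
            start = i
    if timestamps:
        runs.append((start, len(timestamps)))
    return runs
```

Each returned run can then be processed independently, so that no label or window spans a crash gap.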
4. Initial model
As a starting point we use only one limit order book snapshot, meaning
that we predict a future change based on the current state
without looking at the history or consecutive trends. In fact,
we shuffle the samples and eliminate any notion of timing.
Since every order in the book has 2 parameters (quantity
and price) we can't use it as is. We apply a small
modification to the data to extract a training example. We
define "bins" of $10 and sum the quantities that fall
into each bin. From the 500 bid orders we create 100 bins that
span the last price down to the last price minus $1000. In
later phases we changed the $10 bins to 0.1% bins. Figure
3 presents a result of the binning process and a visual
representation of one training example that we feed to the
initial NN. It is easy to observe that this training example
corresponds to the one used in Figure 1. After binning the
data, we end up with 200 features for every training
example.
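The binning step can be sketched as follows. This is an illustrative reconstruction rather than the project's code: the `(price, quantity)` layout and function names are assumptions, and so is the symmetric treatment of asks (100 bins upward from the last price), which is inferred from the 200-feature total.

```python
def bin_side(orders, last_price, side, bin_width=10.0, n_bins=100):
    """Sum order quantities into fixed-width price bins on one side of the book.

    orders: list of (price, quantity) tuples (assumed layout).
    Bin k covers distances [k * bin_width, (k + 1) * bin_width) from
    last_price, measured downward for bids and upward for asks. Orders
    farther away than n_bins * bin_width are ignored.
    """
    bins = [0.0] * n_bins
    for price, qty in orders:
        distance = last_price - price if side == "bid" else price - last_price
        if distance < 0:
            continue  # order lies on the wrong side of the last price
        k = int(distance // bin_width)
        if k < n_bins:
            bins[k] += qty
    return bins


def make_features(bids, asks, last_price, bin_width=10.0, n_bins=100):
    """Concatenate the bid bins and the ask bins into one feature vector."""
    return (bin_side(bids, last_price, "bid", bin_width, n_bins)
            + bin_side(asks, last_price, "ask", bin_width, n_bins))
```

For the later 0.1% bins, one option under the same assumptions would be to pass `bin_width=0.001 * last_price`.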
For the labels we have the last Bitcoin price that
corresponds to every training example. We make it a
classification problem by comparing the next value of
Bitcoin (1 min into the future) to the current price. If the
price increased the label is "1", and if it decreased or stayed the same it is
"0". This classification is very naive and will not result in a
successful trading strategy, but it serves as a
classification for the initial design.
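A minimal sketch of this labeling rule is shown below. It assumes the prices are in time order with one sample per minute and no gaps; the function name and the `horizon` parameter are illustrative. Labels have to be computed before the samples are shuffled, since they depend on the next sample in time.

```python
def binary_labels(prices, horizon=1):
    """Label sample i as 1 if prices[i + horizon] > prices[i], else 0.

    prices:  last-trade prices, one per 1-minute sample, in time order.
    horizon: look-ahead in samples (1 sample = 1 minute here).
    The final `horizon` samples have no future price and get no label.
    """
    return [1 if prices[i + horizon] > prices[i] else 0
            for i in range(len(prices) - horizon)]
```

With `horizon=1` this reproduces the 1-minute rule above; larger values give the longer look-ahead variants.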
Figure 3: Sample of one training example after being
structured in bins
4.1. Fully Connected Network Architecture
The objective of this initial phase is to find the correlation
and validate the data from the order book as a valid
predictor. The architecture shown in Figure 4 describes our
current initial network.
We had originally attempted using fewer
layers and neurons and arrived at the architecture in
Figure 4 after some fine-tuning and hyperparameter
experimentation. More details below.
Figure 4: NN Architecture
Initial Results
Our current architecture has 6 layers. We used about
21,000 training examples and shuffled them. Then we
split them 80%/20% into training and dev sets. For the
labels we compared the Bitcoin price 1 min, 2 min, 3 min, 5 min
and 10 min into the future to the current price. After
adjusting the learning rate, combined with Adam
optimization and early stopping, we achieved
approximately 95% accuracy on the training set and
approximately 64% on the dev set. The 95% training accuracy is
a very encouraging result for us, but the high variance is
clearly a concern. We tried adding L2 regularization and
dropout, but this did not reduce the variance; it only
increased the bias.
The conclusion we drew from the FCN exercise is that
the architecture cannot predict better than 65% on the dev
set when learning from a single order book sample with the
dataset that we have.
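The shuffle, the 80%/20% split and the early-stopping rule can be illustrated without any deep learning framework. The sketch below is an assumption-laden illustration, not the training code used here: the function names, the fixed seed and the `patience` criterion are all hypothetical.

```python
import random


def shuffle_split(examples, train_fraction=0.8, seed=0):
    """Shuffle examples and split them into (train, dev) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def best_epoch(dev_accuracies, patience=3):
    """Return the epoch index that early stopping would keep.

    Training stops once dev accuracy has not improved for `patience`
    consecutive epochs; the index of the best dev accuracy seen up to
    that point is returned (-1 for an empty history).
    """
    best, best_idx, waited = float("-inf"), -1, 0
    for epoch, acc in enumerate(dev_accuracies):
        if acc > best:
            best, best_idx, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx
```

Keeping the weights from the epoch returned by `best_epoch` is one common way to limit the gap between training and dev accuracy when regularization alone does not help.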
Figure 5: Training vs Dev Accuracy with max dev
accuracy at around 3100 epochs
Figure 5 shows the accuracy obtained with the final
values after tuning the hyperparameters. We see that
dev set accuracy does not exceed 65%. These
results validate the correlation between the order book and
the labels, but they also tell us that the model is not good
enough. To get a better model we wanted to bring back the
sense of time to the samples and use RNNs for that.
5. RNN
To develop the best predictor for the future Bitcoin price, we
tried different approaches and RNN architectures. We
used TensorFlow to develop our initial RNN models given
its high flexibility, portability, performance and other
advantages as explained in [6].
The common theme among all the experiments below was
to vary the hyperparameters and related settings
of the network, namely:
1. Learning rate
2. Number of time steps
3. Number of hidden units
4. Number of iterations
5. Batch size
6. Static vs. dynamic RNN
7. Type of cell (LSTM, GRU, basic RNN, etc.)
5.1. Input and output similar to the Fully Connected
Network
In this experiment, we fed the limit order book input
directly into an LSTM network followed by a sigmoid
output predicting an increase or a decrease. The
rationale here was to see how the RNN performed
compared to a pure fully connected network and its
effect on accuracy. This network trained more slowly
than the non-RNN network. In terms of
accuracy, we were able to achieve similar or better
results on the training set but worse than expected
performance on the dev set.
Figure 6: RNN Architecture where Inputs and Outputs
are same as the Fully Connected Network
5.2. Input as Order book encoded with FC Network
In this experiment, we encoded the limit order book data
using a fully connected network and fed the activations
from the penultimate layer (the layer before the sigmoid
activation in the original FC network) into the RNN. In terms of
accuracy, we were not able to achieve a
notable increase; this performed very similarly to
the earlier RNN (5.1), where the order book was
fed directly as input.
As part of these variations (5.1 and 5.2), in addition to
binning the quantity of bitcoins based on the distance from
the current price, we doubled the number of input features
by including the distances themselves. We believed that
doing so would help the network predict better,
primarily because, while the Bitcoin price can change by
several hundred or thousand dollars over time, the difference
between the current price and the bid/ask prices follows a
pattern. For example, if the price today is $8000, most
of the bid/ask orders would be in the
$7000/$8000/$9000 range rather than the $20000 range.
However, the neural network's dev set accuracy did not
see any meaningful improvement.
Figure 7: RNN Architecture where Fully connected
network acting as encoder becomes RNN input
5.3. Single order book input split into time steps
In this experiment, we split the input into a number of equal
parts and fed each part to a time step in the LSTM
network. For example, we split the 200 input features into
10 parts of 20 features each and fed them to the LSTM network,
one part per time step. The rationale here was to follow a
pattern used in examples from another domain,
where an input image was split into rows and each row
was fed into an LSTM cell as a time step. Unfortunately,
we didn't find any notable change in accuracy with
this approach.
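The reshaping used in this experiment amounts to cutting the flat feature vector into equal consecutive chunks. A minimal sketch, with a hypothetical function name, is:

```python
def to_time_steps(features, n_steps):
    """Split a flat feature vector into n_steps equal consecutive chunks."""
    if n_steps <= 0 or len(features) % n_steps != 0:
        raise ValueError("feature count must be divisible by n_steps")
    size = len(features) // n_steps
    return [features[i * size:(i + 1) * size] for i in range(n_steps)]
```

For a 200-feature example, `to_time_steps(features, 10)` yields 10 chunks of 20 features, each of which would be presented to the LSTM as one time step.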
Figure 8: RNN Architecture with Single Limit order
book spliced into multiple time steps
With the change to the RNN architecture and further experimentation,
we were able to achieve over 95% accuracy on the training
set, and the dev set accuracy improved to about 66% (a 3%
increase compared to the FC network).
5.4. Categorical Model
Prediction of a binary label is the simplest way to establish
the correlation between the input dataset and the outcome,
but it is not a useful indication for a successful trading
algorithm.
We extended the output labels to 3 categories:
1. Increase by more than a threshold percentage
2. Decrease by more than a threshold percentage
3. Did not change by more than the threshold percentage
We created a signal with 4 dimensions so we can tune and
find the best option. One dimension is the threshold (0.1%,
0.2%, 0.3%, 0.4%, 0.5%), the second dimension is the future
look-ahead prediction (1 min, 2 min, 3 min, 5 min, 10 min), and
the other 2 dimensions are the RNN time step (window
size) and the batch.
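The three categories and the threshold/look-ahead part of the search grid described above can be sketched as follows. The class ids, function names and the use of a symmetric threshold on the relative price change are assumptions made for illustration; the prices are assumed to be consecutive 1-minute samples.

```python
from itertools import product

UP, DOWN, FLAT = 0, 1, 2  # hypothetical class ids


def three_class_label(price_now, price_future, threshold):
    """Classify the relative price change against a symmetric threshold.

    threshold is a fraction, e.g. 0.001 for 0.1%.
    """
    change = (price_future - price_now) / price_now
    if change > threshold:
        return UP
    if change < -threshold:
        return DOWN
    return FLAT


def label_grid(prices, thresholds, horizons):
    """Build one label list per (threshold, horizon) combination."""
    grid = {}
    for thr, h in product(thresholds, horizons):
        grid[(thr, h)] = [three_class_label(prices[i], prices[i + h], thr)
                          for i in range(len(prices) - h)]
    return grid
```

Calling `label_grid(prices, [0.001, 0.002, 0.003, 0.004, 0.005], [1, 2, 3, 5, 10])` would produce the 25 label sets over which the first two dimensions are tuned.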
We used a time step of 4 and divided the dataset into groups
of 4 consecutive samples with overlap. Each input
to the time-distributed network is 4 samples of 200
dimensions, representing 4 consecutive order book
snapshots at 1 min intervals. For example, the first training
example represents times (t_1, t_2, t_3, t_4) and the second
example represents times (t_2, t_3, t_4, t_5), etc. This way the
RNN gets the sense of time without the sense of artificial
grouping. For the output we used one value that represents
the trend of the Bitcoin price for every sample. We
issue only one label for the entire unrolled RNN of 4 time steps.
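A minimal sketch of this overlapping grouping is given below. The stride of one sample and the choice of the last sample's label as the label of the whole window are assumptions made for illustration, and the function name is hypothetical.

```python
def sliding_windows(samples, labels, window=4):
    """Group consecutive samples into overlapping windows (stride 1).

    samples: list of feature vectors in time order.
    labels:  one label per sample; the label of a window's last sample is
             used as the window label (an assumption).
    Returns (windows, window_labels).
    """
    windows, window_labels = [], []
    for end in range(window, len(samples) + 1):
        windows.append(samples[end - window:end])
        window_labels.append(labels[end - 1])
    return windows, window_labels
```

With 200-dimensional samples, each window has shape 4 x 200 and carries exactly one label.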
This way, the model receives a sequence in time and the
single result of this sequence. To select the label, we tried