Bitcoin Price Prediction Based On Deep Learning Methods
Xiangxi Jiang
Keywords
Deep Learning Model, Multi-Layer Perceptron, Gated Recurrent Unit,
Long Short-Term Memory, Cross-Validation, Normalization
1. Introduction
Bitcoin is a cryptocurrency and a form of electronic cash. It is a digital currency that can be sent from user to user on the peer-to-peer Bitcoin network without intermediaries. The network keeps a record of trading among peers, and every record is encrypted. Each new record contains the cryptographic hash of the previous block, along with a timestamp and the data of the sender, the receiver, and the amount. Because Bitcoin is an emerging technology, few predictions have been made about its future value. Greaves and Au used linear regression, logistic regression, and a support vector machine to predict the future Bitcoin price, with low performance [1]. Indera et al. proposed a Multi-Layer Perceptron based non-linear autoregressive model with exogenous inputs (NARX) to predict the Bitcoin price [2].
2. Dataset Exploration
The data used in this research are collected from Kaggle [6] and cover Bitcoin trading from January 2012 to July 2018. Each record contains a timestamp, the Open, High, Low, and Close prices, the volume traded in Bitcoin and in USD, the weighted price, and the date. This research focuses on predicting the Bitcoin price in the next hour from the prices of the past 24 hours, so only the timestamp and the weighted price are used in the model.
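As a minimal sketch of this preparation step, the minute-level CSV can be resampled to hourly weighted prices with pandas. The column names ("Timestamp", "Weighted_Price") are taken from the dataset description on Kaggle [6]; the file name is hypothetical.

```python
import pandas as pd

# Load the minute-level Kaggle data [6]; the file name is hypothetical.
df = pd.read_csv("bitstampUSD_1-min_data_2012-01-01_to_2018-06-27.csv")

# The "Timestamp" column holds Unix seconds; convert it to a datetime index.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
df = df.set_index("Timestamp")

# Only the timestamp and the weighted price are used in the model.
hourly = df["Weighted_Price"].resample("1h").mean().dropna()
```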
3. Pre-Processing
As shown in Figure 1, the dataset is at one-minute resolution and contains around 3,409,920 points. Since we predict the price by the hour, we have 3,409,920/60, which is 56,832 datapoints. The dataset is further split into training, validation, and testing sets. As shown in Figure 2, training data take up 80% of the entire dataset, and validation and testing take 10% each. Because the data form a time series, samples are not randomized. We use the previous 24 hours of Bitcoin prices as input to predict the next hour's Bitcoin price. Several other pre-processing methods are implemented to improve data processing and model convergence. Mini-batching is used to split the large dataset into small batches, which improves memory efficiency. Minimum-maximum normalization and window-based normalization are used to scale the whole training dataset to the (−1, 1) range. Window-based normalization follows common practice in stock-market prediction: it takes each fixed-size window and normalizes it to reflect percentage changes from the start hour of the window [3].
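The sketch below illustrates the windowing, the window-based normalization of [3], and the sequential 80/10/10 split described above. The function names are illustrative, and the paper's exact normalization details may differ.

```python
import numpy as np

def make_windows(prices, window=24):
    # Each sample is 24 consecutive hourly prices; the target is the next hour.
    X, y = [], []
    for i in range(len(prices) - window):
        X.append(prices[i:i + window])
        y.append(prices[i + window])
    return np.array(X), np.array(y)

def window_normalize(X, y):
    # Normalize each window to percentage change from its start hour [3].
    base = X[:, :1]                      # price at the first hour of each window
    return X / base - 1.0, y / base[:, 0] - 1.0

prices = hourly.values                   # hourly weighted prices (see Section 2)
X, y = make_windows(prices)
X, y = window_normalize(X, y)

# Sequential (non-randomized) split: 80% train, 10% validation, 10% test.
n = len(X)
i_tr, i_va = int(0.8 * n), int(0.9 * n)
X_train, y_train = X[:i_tr], y[:i_tr]
X_val, y_val = X[i_tr:i_va], y[i_tr:i_va]
X_test, y_test = X[i_va:], y[i_va:]
```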
4. Models
A deep learning network is a type of computer model that finds patterns within given datasets and maps inputs to outputs accordingly. There are many different structures for deep learning networks, including the Multi-Layer Perceptron (MLP), a feed-forward network, and the Recurrent Neural Network (RNN), which keeps a separate hidden state that influences the next calculation. Extensions of the RNN include Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU).
The MLP is a basic prediction method. It reads all inputs with no ordering and then determines the relationship between the independent variables and the dependent variable. Hidden layers, together with activation functions, can be added between the input layer and the output layer to better describe non-linear relationships.
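As an illustration, a basic MLP for this setting can be built with scikit-learn [5]. The hidden-layer sizes and iteration count here are illustrative hyperparameters, not the paper's exact configuration.

```python
from sklearn.neural_network import MLPRegressor

# Map the 24 normalized past prices to the next hour's normalized price.
# Hidden-layer sizes are illustrative "magic numbers" (see the discussion below).
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                   max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print("validation R^2:", mlp.score(X_val, y_val))
```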
The RNN is a family of methods that compute outputs from both the previous result of the model and new input data. It differs from the MLP in that it has "experience" from earlier calculations that influences later ones. This "experience" is kept privately by the model but is passed on to the next step. The private variable is called the hidden state, and it is carried from the current calculation to future calculations; it influences the output of the model independently of the rest of the algorithm. However, the RNN depends on a continuous, sequential flow of data, such as a time series, for training. If a pattern repeats only over the long term, the previous repetition may not be influential enough to affect the next one. The RNN also requires the data to be in temporal order; therefore, unlike the MLP, it cannot be given randomly shuffled samples.
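Concretely, in its common (vanilla) form, the RNN's hidden-state update can be written as

h_t = tanh(W_x x_t + W_h h_{t-1} + b),   y_t = W_y h_t + b_y,

where x_t is the new input at step t, h_{t-1} is the hidden state carried over from the previous calculation, and the weight matrices W and biases b are learned during training.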
Long Short-Term Memory solves the problem of the diminishing influence of distant events on the RNN. It has switches (gates) that can choose which events to remember, so it does not suffer from the long-term dependency problem and does not require as much training. An LSTM cell uses four internal layers to determine the output and then passes the hidden state, together with the result, to the next cycle. A "forget gate" exists in addition to these four layers to decide whether past experience should be discarded. The four layers and the forget gate can be given different information so that the cell focuses on either short-term or long-term memory.
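A hedged sketch of such a model is shown below using Keras; the paper does not name its framework, and the layer sizes are illustrative. The input windows are reshaped to (samples, timesteps, features).

```python
import numpy as np
from tensorflow import keras

# Reshape (n, 24) windows to (n, 24, 1) sequences for the recurrent layer.
X_train_seq = X_train[..., np.newaxis]
X_val_seq = X_val[..., np.newaxis]

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(24, 1)),  # LSTM cell with gating layers
    keras.layers.Dense(1),                       # next hour's normalized price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train_seq, y_train, epochs=20, batch_size=64,
          validation_data=(X_val_seq, y_val))
```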
The GRU, or Gated Recurrent Unit, is considered a simpler model than the LSTM: it combines the "forget" step and the "input" step into one and, as a result, maintains only a single hidden state.
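Because the GRU exposes the same interface as the LSTM in the sketch above, swapping the recurrent cell is a one-line change:

```python
# Same sketch as above, with the LSTM cell replaced by a GRU cell.
model = keras.Sequential([
    keras.layers.GRU(32, input_shape=(24, 1)),
    keras.layers.Dense(1),
])
```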
Among the three methods, the MLP is mostly credited for its simplicity and lower computational cost. All three receive the same amount of information as input. However, the numbers of hidden layers and hidden units are largely magic numbers: some values happen to work especially well, while others may turn out to be just the opposite. The RNN draws on the previous step through its hidden state; this value is used in the calculation but needs no external intervention. It can be very accurate, given a large training set. However, long-term patterns may not be memorized, which can cause inaccuracy, especially when rapid changes take place in recent years. The LSTM can choose whether to "forget" previous states, and it is therefore better at handling data with repetitive trends over a long time. The GRU model can also choose whether to recall previous experience, but it learns more rapidly and needs slightly fewer resources.
Six models are compared in this research. Their setups are listed in Table 1, and the training results are discussed in the next section.
5. Results
As shown in Figure 3, in the MLP and RNN frameworks we find similar
Table 1. Setups of the six models compared in this research.
Figure 6. Cross-validation results. The top panel shows the 10-fold split of the original data, the middle panel the average RMSE for each fold, and the bottom panel the RMSE divided by the average price in that fold.
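For reproduction, a cross-validation loop in the spirit of Figure 6 can be sketched with scikit-learn's TimeSeriesSplit [5]. The paper's exact folding procedure is not specified, so this is an assumption; the mlp model and the X, y arrays are reused from the earlier sketches.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error

# 10-fold time-ordered cross-validation; report the RMSE of each fold.
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=10).split(X)):
    mlp.fit(X[train_idx], y[train_idx])
    rmse = np.sqrt(mean_squared_error(y[test_idx], mlp.predict(X[test_idx])))
    print(f"fold {fold}: RMSE = {rmse:.4f}")
```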
Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.
References
[1] Greaves, A. and Au, B. (2015) Using the Bitcoin Transaction Graph to Predict the Price of Bitcoin.
[2] Indera, N.I., Yassin, I.M., Zabidi, A. and Rizman, Z.I. (2017) Non-Linear Autore-
gressive with Exogeneous Input (NARX) Bitcoin Price Prediction Model Using
PSO-Optimized Parameters and Moving Average Technical Indicators. Journal of
Fundamental and Applied Sciences, 9, 791-808. https://doi.org/10.4314/jfas.v9i3s.61
[3] Aungiers, J. (2018) Time Series Prediction Using LSTM Deep Neural Networks. https://www.altumintelligence.com/articles/a/Time-Series-Prediction-Using-LSTM-Deep-Neural-Networks
[4] Madan, I., Saluja, S. and Zhao, A. (2015) Automated Bitcoin Trading via Machine Learning Algorithms. http://cs229.stanford.edu/proj2014/Isaac%20Madan,%20Shaurya%20Saluja,%20Aojia%20Zhao,Automated%20Bitcoin%20Trading%20via%20Machine%20Learning%20Algorithms.pdf
[5] Pedregosa, F., et al. (2011) Scikit-Learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
[6] Zielak. (2019) Bitcoin Historical Data, Bitcoin Data at 1-Min Intervals from Select Exchanges, Jan 2012 to July 2018, Version 14. https://www.kaggle.com/mczielinski/Bitcoin-historical-data