Bitcoin Price Prediction Using Machine Learning
The next step is to select the parameters that will be fed to the predictive network. From the array of available features, some are listed below:

TABLE 1. BITCOIN FEATURES AND THEIR EQUATIONS
S.no.  Features                 Equations/Definitions
1.     Block Size               Average block size in MB
2.     Total bitcoins           Total number of bitcoins mined
3.     Day high, day low        Highest and lowest values on different days
4.     Number of transactions   Total number of unique Bitcoin transactions per day
5.     Trade Volume             USD trade volumes from the top exchanges

After feature selection, the sample inputs will be fed to the model. The variation in Bitcoin values can be treated as a pattern: the price can go up, go down, or stay within a certain margin of the previous day's price.

The next choice to be made is the number of layers and the number of neurons per layer. The model will then use a pattern-aided regression algorithm and artificial neural networks to predict the Bitcoin value. After the final prediction, the accuracy of the different models can be compared.

IV. ONGOING WORK AND ACHIEVED RESULTS
The first step towards Bitcoin prediction is data collection. For this paper, we collected data from the following sources:
A. Quandl
Quandl holds financial, economic and social datasets from over 500 publishers. Data available on Quandl can be used on different platforms such as Python, MATLAB, Maple and Stata. We were able to obtain Bitcoin datasets covering up to 5 years of timestamped data, with fields such as day high, day low, open, close, transaction volume and weighted price.
B. CoinMarketCap
CoinMarketCap keeps track of all the cryptocurrencies available in the market. It keeps a record of all transactions by recording the number of coins in circulation and the volume of coins traded in the last 24 hours, and it continuously updates its records as it receives feeds from various cryptocurrency exchanges. CoinMarketCap provides historical data for Bitcoin price changes.
C. Normalization
1) Log normalization: In this method, the range is compressed, which brings out the values that were close to zero before normalization. The function is $A' = \log(A) / \log(\max A)$, where $\max A$ is the maximum value in the data.
2) Built-in MATLAB method: The function used is normc, which normalizes each column of the data matrix to unit length. Compared with the other methods, it compresses the range the most.
3) Standard deviation normalization: Here, we consider the difference of every value from the mean value. The advantage of this technique is that we also obtain negative values (for points below the mean), owing to proper compression of the Y axis. The formula is $z = (x - \mu)/\sigma$, where $\mu$ is the mean and $\sigma$ the standard deviation.
4) Z-score normalization: This method uses a technique similar to the standard deviation method, taking the mean value into account.
5) Box-Cox normalization: The function used is
$$\text{data}^{(\lambda)} = \begin{cases} \dfrac{\text{data}^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \log(\text{data}), & \lambda = 0 \end{cases}$$
Sudden changes in the data stand out clearly with this type of normalization, so that the data can be processed more accurately.

Here are the results obtained after implementing the various normalization techniques:
Figure 1. Graph of data before normalization
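To make the normalization formulas concrete, here is a minimal sketch in plain Python (standard library only), applied to a single column of values. It is not the authors' MATLAB code, and the function names are hypothetical: normc is mimicked by scaling one column to unit Euclidean length, the formula of method 3) uses the population standard deviation, and Box-Cox takes λ as a given parameter rather than estimating it. The text does not say how method 4) differs from method 3), so it is not implemented separately.

```python
import math
import statistics


def log_normalize(values):
    """A' = log(A) / log(max A). Assumes every value > 0 and max A != 1
    (intended for prices well above 1)."""
    top = max(values)
    if min(values) <= 0 or top == 1:
        raise ValueError("log normalization needs positive values and max != 1")
    return [math.log(v) / math.log(top) for v in values]


def unit_length(column):
    """Scale one column to Euclidean length 1, as MATLAB's normc does per column."""
    norm = math.sqrt(sum(v * v for v in column))
    return [v / norm for v in column]


def z_score(values):
    """z = (x - mu) / sigma with the population standard deviation.
    Assumes the values are not all equal (sigma != 0)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def box_cox(values, lam):
    """Box-Cox transform for a fixed lambda; assumes every value > 0."""
    if min(values) <= 0:
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

For example, `box_cox(closes, 0.5)` applies the λ ≠ 0 branch to a list `closes` of positive closing prices. In practice λ is often estimated from the data (for instance with `scipy.stats.boxcox`), which this sketch leaves out.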
V. PROPOSED WORK
A. Bayesian Regression
1. Break the first third of the data into all possible consecutive intervals of 180, 360 and 720 data points. Apply k-means clustering to retrieve 100 cluster centers for each interval size, and then use sample entropy to narrow these down to the 20 most varied, and potentially most effective, clusters.
2. Use the second third of the prices to calculate the corresponding weights of the features found using the Bayesian regression method. The regression works as follows:
• At time t, evaluate three vectors of past prices over different time intervals (180, 360 and 720 data points).
• For each time interval, calculate the similarity between these vectors and our 20 best k-means patterns, whose subsequent price jumps are known, to find the probabilistic price change $dp_i$.
• Calculate the weights $w_i$ for each feature using a Differential Evolution optimization function.
3. The third set of prices is used to evaluate the algorithm, by running the same Bayesian regression to evaluate the features and combining them with the weights calculated in step 2.
Figure 4. Graph of data after standard deviation normalization technique
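The steps above leave the ranking criterion, the similarity measure and the combination rule open, so the following is only a minimal, hypothetical Python sketch under stated assumptions. Sample entropy (embedding dimension m = 2, absolute tolerance r = 0.2, both assumed) ranks candidate k-means centres, with higher entropy treated as "more varied". Similarity is taken to be a Pearson-style correlation. $dp_i$ is computed as a softmax-style weighted average of the patterns' known price jumps with a hypothetical scaling constant c. The per-interval estimates are then combined linearly with an intercept, mirroring the combination rule given for the GLM method. The k-means and Differential Evolution steps are not shown; their outputs (centres, jumps, weights) are taken as given, and all function names are illustrative.

```python
import math
import statistics


def sample_entropy(series, m=2, r=0.2):
    """SampEn = -ln(A / B): B counts template pairs of length m within
    Chebyshev distance r, A the same for length m + 1 (self-matches excluded)."""
    n = len(series)

    def matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(x - y) for x, y in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matches at all: treated as maximally irregular
    return -math.log(a / b)


def select_varied_patterns(centres, k=20, m=2, r=0.2):
    """Keep the k cluster centres with the highest sample entropy (assumed criterion)."""
    return sorted(centres, key=lambda c: sample_entropy(c, m, r), reverse=True)[:k]


def similarity(a, b):
    """Pearson-style similarity in [-1, 1] between two equal-length price vectors."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sa, sb = statistics.pstdev(a), statistics.pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (sa * sb)


def estimate_price_change(recent, patterns, jumps, c=1.0):
    """dp_i: average of the patterns' known price jumps, each weighted by
    exp(c * similarity) to the recent price vector (c is hypothetical)."""
    weights = [math.exp(c * similarity(recent, p)) for p in patterns]
    return sum(w * j for w, j in zip(weights, jumps)) / sum(weights)


def combine(dps, weights, w0=0.0):
    """Final prediction w0 + sum_i w_i * dp_i, with weights found elsewhere
    (for example by Differential Evolution, not shown here)."""
    return w0 + sum(w * dp for w, dp in zip(weights, dps))
```

One `estimate_price_change` call per interval size (180, 360 and 720 points) would yield the three $dp_i$ values that `combine` then weights; the exponential weighting lets the closest-matching patterns dominate the estimate as c grows.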
Figure 5. Graph of data after z-score normalization technique
B. GLM/Random Forest
1) Construct three time-series data sets covering the 30, 60 and 120 minutes (180, 360 and 720 data points, respectively) preceding the current data point, at every point in time.
2) Run GLM/Random Forest on each of the three time-series data sets separately.
3) We get three separate models, $M_1$, $M_2$ and $M_3$, one for each data set. From $M_1$, we can predict the price change at t, denoted $\Delta P_1$. Similarly, we obtain $\Delta P_2$ from $M_2$ and $\Delta P_3$ from $M_3$.
4) Combine these values to predict the macro price change, defined as $\Delta P = W_0 + \sum_j W_j \Delta P_j$, where $W_0$ is the initial market value at $t = 0$ and $W_j$ denotes the weight for interval $j$.
5) In addition to using 10-second interval data, we can also use 10-minute interval data to gain a longer-term picture of the price trends.

VI. CONCLUSIONS
After establishing the learning framework and completing the normalization, we intend to apply the two methods described above and choose the one that best solves the Bitcoin prediction problem.

ACKNOWLEDGMENT
We would like to thank our project guide, Mr. Kaustubh Sakhare, for encouraging us to work on this project, and our parents for their constant support.

AUTHOR BIOGRAPHIES
Shreya Maji, age 21, was born in Pune, India. She is deeply committed to technical education and is currently in the final year of the undergraduate engineering course at SPPU, Pune, India. Her area of specialization is electronics and telecommunications. With the current project on Bitcoin prediction, Ms. Shreya Maji is aiming for her first IEEE publication.