ICDEBA 2023
Abstract. This study investigates the Fear & Greed Index, an indicator designed to reflect market sentiment
regarding Bitcoin price, intending to utilize it as a predictive parameter for future price fluctuations. Due to
the substantial volatility in Bitcoin prices and its significant influence on prediction outcomes, the dataset
was preprocessed through monthly filtering and normalization. To forecast Bitcoin prices, an array of
machine learning algorithms, including linear regression, random forest, and XGBoost, as well as their
enhanced counterparts, were employed. Comparing the results of these models identified Grid Search
XGBoost as the optimal model. This research holds implications for accurately predicting Bitcoin prices and
underscores the impact of market sentiment on its valuation.
2 Data and Method
2.1 Data
This study utilizes the Bitcoin & Fear and Greed dataset from Kaggle, which originates from Alternative.me.
The dataset comprises five variables: Date, Value, Value_Classification, BTC_Closing, and BTC_Volume.
Date records the date information, spanning from February 1, 2018, to the present day. Value is a
numerical representation of the Fear & Greed Index, ranging from 0 to 100. Value_Classification categorizes
the index into five levels: Extreme Fear, Fear, Neutral, Greed, and Extreme Greed, reflecting the intensity of
market sentiment concerning Bitcoin's price. BTC_Closing denotes the closing price of Bitcoin on the
specified date, while BTC_Volume represents the market volume of Bitcoin on that date.
Table 1 presents the descriptive statistics for the dataset's two variables of interest.
Table 1. Statistics of dataset
Statistic    Fear & Greed Index (Value)    Bitcoin closing price (BTC_Closing)
Median       39                            11,464.51
Mode         24                            6,741.75
Variance     487.85                        279,449,800
Max          95                            67,566.82813
Min          5                             3,236.761719
These statistics offer a preliminary understanding of the distribution and central tendencies of the Fear & Greed
Index and Bitcoin closing prices. In the subsequent sections, the study will delve deeper into the relationship
between these variables and employ machine-learning techniques to forecast Bitcoin prices based on the Fear
& Greed Index.
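The kind of summary reported in Table 1 can be sketched with Python's standard `statistics` module. This is only an illustration: the `describe` helper and the sample values are hypothetical and are not the Kaggle data, and the use of the sample variance (dividing by n - 1) is an assumption, since the source does not say which variance definition it used.

```python
import statistics


def describe(values):
    """Return the five summary statistics listed in Table 1 for one column."""
    return {
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Sample variance (n - 1 denominator); assumed, not confirmed by the source.
        "variance": statistics.variance(values),
        "max": max(values),
        "min": min(values),
    }


# Made-up Fear & Greed Index readings, for illustration only.
fear_greed = [24, 39, 24, 55, 72, 10, 39, 24]
summary = describe(fear_greed)
print(summary)
```

The same helper could be applied to a list of closing prices to obtain the second column of such a table.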
SHS Web of Conferences 181, 02015 (2024) https://doi.org/10.1051/shsconf/202418102015
Bitcoin price for the rest of the days using the following formula:
$$P_{\text{normalized price}} = \frac{P_{\text{price}} - P_{\text{base price}}}{P_{\text{base price}}} \quad (1)$$
Applying this formula transforms the Bitcoin price for each day within a month into a normalized value
relative to the base price (i.e., the price on the first day of the month). This normalization reduces the
impact of large price differences between months and allows for more accurate comparison and analysis of
the relationship between the Fear & Greed Index and Bitcoin prices.
2.1.5.2 Split Data into Training and Test Sets
After normalizing the dataset and dividing it by month, the next step is to split the dataset into a training
set and a test set in an 8:2 ratio. This division allows the machine learning models to be trained on most of
the data (80%) and evaluated on the remaining unseen data (20%).
To recap, the following preprocessing steps have been completed:
1. Checking for and removing missing values
2. Calculating descriptive statistics
3. Smoothing the data using a moving average or other smoothing techniques
4. Removing outliers
5. Dividing the data by month and selecting months with higher correlations
6. Normalizing Bitcoin prices using the first day of each month as the base
7. Splitting the dataset into a training set and a test set in an 8:2 ratio
With the preprocessed dataset, the study can apply different machine-learning models and evaluate their
effectiveness in predicting Bitcoin prices based on the Fear & Greed Index and other relevant factors.
Standard models include linear regression, decision trees, random forests, and neural networks. After
training and evaluating each model, the study can compare their performance and select the most suitable
model for its specific use case.
Linear regression is a predictive model for estimating a continuous target variable based on input features.
The model posits a linear relationship between input features and the target variable. The primary aim of
linear regression is to determine a linear function that minimizes the sum of squared prediction errors.
2.2.1.2 Formula
The linear regression formula can be denoted as:
$$y = \beta_0 + \beta_1 \times x_1 + \beta_2 \times x_2 + \cdots + \beta_n \times x_n + \varepsilon \quad (2)$$
In this formula, $y$ is the target variable, $x_i$ represents input features, $\beta_i$ denotes model
parameters, $\beta_0$ is the intercept term, and $\varepsilon$ is the error term.
2.2.1.3 Build model
The study creates a linear regression model, fits it with training data, produces predictions using test data,
calculates performance metrics, and visualizes predicted results compared to actual values.
2.2.2 Polynomial Regression
Polynomial regression is an extension of linear regression that allows for modelling non-linear
relationships between input features and continuous output targets. By introducing higher-degree polynomial
terms into the linear regression equation, polynomial regression can effectively capture non-linear patterns
in the data.
After observing satisfactory performance from linear regression, the study applies polynomial regression for
further improvement. In this analysis, the study chooses a 5-degree polynomial for fitting.
To balance accuracy against simplicity of calculation, this study creates a polynomial regression model with
a degree of 5, fits it with training data, produces predictions using test data, calculates performance
metrics, and visualizes predicted results compared to actual values.
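The monthly normalization of Eq. (1), the 8:2 split, and the linear and degree-5 polynomial fits can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper names (`normalize_by_month`, `split_80_20`, `fit_polynomial`) are hypothetical, the data are synthetic, the split is assumed to be a random shuffle (the source gives only the ratio), the index is rescaled to [0, 1] for numerical stability, and NumPy's least-squares `polyfit` stands in for whatever library the authors used.

```python
from collections import defaultdict
from datetime import date

import numpy as np


def normalize_by_month(rows):
    """Apply Eq. (1) within each calendar month.

    rows: iterable of (date, index_value, closing_price) tuples.
    The base price is the closing price on the first available day of the month.
    """
    by_month = defaultdict(list)
    for day, index_value, price in sorted(rows):
        by_month[(day.year, day.month)].append((day, index_value, price))
    normalized = []
    for month in sorted(by_month):
        base = by_month[month][0][2]
        for day, index_value, price in by_month[month]:
            normalized.append((day, index_value, (price - base) / base))
    return normalized


def split_80_20(samples, seed=0):
    """Shuffle the samples, then keep 80% for training and 20% for testing."""
    order = np.random.default_rng(seed).permutation(len(samples))
    cut = int(0.8 * len(samples))
    return [samples[i] for i in order[:cut]], [samples[i] for i in order[cut:]]


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; degree=1 is Eq. (2) with a single feature."""
    return np.poly1d(np.polyfit(x, y, degree))


def mean_squared_error(model, x, y):
    """Average squared difference between model predictions and targets."""
    return float(np.mean((model(x) - y) ** 2))


def to_arrays(samples):
    """Split samples into a rescaled index feature and the normalized price."""
    x = np.array([s[1] for s in samples], dtype=float) / 100.0  # index in [0, 1]
    y = np.array([s[2] for s in samples], dtype=float)
    return x, y


# Synthetic stand-in for the Kaggle data: four 28-day months of made-up values.
rng = np.random.default_rng(42)
rows = []
for month in (1, 2, 3, 4):
    for day in range(1, 29):
        index_value = int(rng.integers(5, 96))
        price = 10000.0 * month * (1.0 + 0.002 * index_value + rng.normal(0.0, 0.01))
        rows.append((date(2021, month, day), index_value, price))

normalized = normalize_by_month(rows)
train, test = split_80_20(normalized)
x_train, y_train = to_arrays(train)
x_test, y_test = to_arrays(test)

linear_model = fit_polynomial(x_train, y_train, degree=1)
poly_model = fit_polynomial(x_train, y_train, degree=5)
linear_mse = mean_squared_error(linear_model, x_test, y_test)
poly_mse = mean_squared_error(poly_model, x_test, y_test)
print("linear test MSE:", linear_mse)
print("degree-5 test MSE:", poly_mse)
```

Because each month is divided by its own first-day price, the first sample of every month normalizes to exactly zero, which is a quick sanity check on the grouping logic. For time-ordered data a chronological split may be preferable to a shuffle; the source does not specify which was used.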
Where $y_i$ is the prediction of the $i$-th decision tree, and $N$ is the number of decision trees. The
Random Forest prediction is the average of all decision tree predictions.
XGBoost does not have a specific formula, as it is an ensemble method based on gradient boosting and
decision trees. However, the prediction process for XGBoost can be represented as follows:
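The two ensemble prediction rules, averaging over trees for Random Forest and summing tree outputs for gradient boosting, can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the trees are depth-1 stumps on a single synthetic feature, the forest uses bootstrap resampling only, and the boosting loop is plain squared-error gradient boosting with a fixed learning rate. It is not XGBoost's regularized algorithm and not the study's configuration; the function names are hypothetical.

```python
import numpy as np


def fit_stump(x, y):
    """Fit a depth-1 regression tree (one split) on a single feature."""
    best = None
    for threshold in np.unique(x)[:-1]:  # last value excluded so the right side is never empty
        left, right = y[x <= threshold], y[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    if best is None:  # every x is identical: fall back to a constant prediction
        constant = y.mean()
        return lambda q: np.full(len(q), constant)
    _, threshold, left_mean, right_mean = best
    return lambda q: np.where(q <= threshold, left_mean, right_mean)


def forest_predict(trees, q):
    """Averaging rule: the mean of the N individual tree predictions y_i."""
    return np.mean([tree(q) for tree in trees], axis=0)


def boosted_predict(base, trees, learning_rate, q):
    """Additive rule: a constant plus the scaled sum of all tree outputs."""
    return base + learning_rate * np.sum([tree(q) for tree in trees], axis=0)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=x.size)  # synthetic target

# Bagging: each tree sees a bootstrap resample of the training data.
forest = []
for _ in range(25):
    sample = rng.integers(0, x.size, size=x.size)
    forest.append(fit_stump(x[sample], y[sample]))

# Boosting: each new tree is fitted to the residuals of the current ensemble.
base, learning_rate, boosted = y.mean(), 0.1, []
running = np.full(x.size, base)
for _ in range(50):
    tree = fit_stump(x, y - running)
    boosted.append(tree)
    running = running + learning_rate * tree(x)

forest_mse = float(np.mean((forest_predict(forest, x) - y) ** 2))
boosted_mse = float(np.mean((boosted_predict(base, boosted, learning_rate, x) - y) ** 2))
print(forest_mse, boosted_mse)
```

The contrast to note is structural: the forest's trees are built independently and averaged, whereas each boosted tree depends on the errors left by the trees before it.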
and predicted values. Further improvements to the model can be explored through additional feature
engineering, hyperparameter tuning, or trying alternative machine learning algorithms.
hyperparameter tuning, feature engineering, or trying alternative machine learning algorithms.
3.5 Discussion
Table 2 summarizes the results. When comparing the results of different models, it is essential to consider
both the Mean Squared Error (MSE) and the Coefficient of Determination (R²). A smaller MSE indicates a
smaller prediction error, while a larger R² suggests better descriptive performance. Due to the significant
fluctuations in Bitcoin prices, the MSE results may be relatively large, making R² values particularly
important.
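Why R² is a useful companion to MSE when values fluctuate widely can be seen in a small worked example. The sketch below uses made-up numbers (not results from the study) and the standard definitions MSE = mean of squared residuals and R² = 1 - SS_res / SS_tot; the function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals (depends on the unit of y)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (independent of the unit of y)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up normalized prices and predictions, for illustration only.
y_true = [0.10, 0.25, 0.05, 0.40, 0.30]
y_pred = [0.12, 0.20, 0.10, 0.35, 0.28]

# The same series expressed in a unit 1000 times larger.
big_true = [1000 * v for v in y_true]
big_pred = [1000 * v for v in y_pred]

print(mse(y_true, y_pred), r2(y_true, y_pred))
print(mse(big_true, big_pred), r2(big_true, big_pred))
```

Rescaling the target by 1000 multiplies the MSE by a factor of one million, while R² stays at about 0.9 in both cases. An MSE is therefore only comparable between models evaluated on the same target scale, whereas R² can be read on its own.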
The XGBoost and Random Forest models with hyperparameter tuning show considerably higher R² values than the
other models, indicating better descriptive performance. Of these two, XGBoost has the smaller MSE, making
it superior in prediction accuracy. Therefore, considering both its descriptive ability (R²) and prediction
accuracy (MSE), the XGBoost model performs better in predicting Bitcoin prices.
In summary, the study concludes that the Grid Search XGBoost model is the best model for predicting Bitcoin
prices in this analysis.
4 Conclusion
In summary, the study concludes that, among the machine learning models built in Python, the Grid Search
XGBoost model is the best for predicting Bitcoin prices based on the Fear & Greed Index. The study found
that the Fear & Greed Index strongly correlates with Bitcoin prices. However, due to the high volatility of
Bitcoin prices, it is not sufficient to study the relationship between the index and Bitcoin prices in
isolation; their relationship over time must also be considered. After comparing various models and their
improved variants, the study determined that Grid Search XGBoost yields the best prediction results.
This study provides strong evidence that market sentiment, as represented by the Fear & Greed Index, is
closely correlated with Bitcoin prices. This finding offers a foundation for future research on Bitcoin
prices using alternative approaches influenced by market sentiment and a direction for incorporating market
sentiment when constructing predictive models for Bitcoin prices.
It is essential to acknowledge that Bitcoin prices are influenced by various factors, which means the
study's predictions are subject to a certain degree of error. The high MSE values of the optimal model also
suggest a potential risk of overfitting. Therefore, given Bitcoin's highly volatile nature, the study
advises exercising caution when investing in Bitcoin.
Future models and analyses could be improved by considering additional factors, incorporating more
sophisticated features, or applying advanced machine-learning techniques. This would help to enhance the
reliability and generalization capabilities of the models, allowing for more accurate predictions and better
decision-making in the context of Bitcoin investments.