Music Popularity Prediction Through Data Analysis of Music's Characteristics
Received: October 7, 2021; Accepted: October 20, 2021; Published: October 29, 2021
Abstract: Today, the music industry has grown tremendously with the emergence of smartphones and streaming services. In the past, most revenue came from album sales and concerts. These days, however, streaming services on the web and on smartphones have become a huge part of the music industry. Therefore, from an artist's perspective, it is important to rank high on streaming services in order to earn money. While the music industry is growing, the top 1% of artists have gone from earning 26% of revenue to between 56% and 77%. This shows the huge income gap among artists. Large profits for a wider variety of artists can help make a better music business. This paper analyzes popular music on Spotify, one of the most popular music streaming services in the world. To find the factors that popular music shares, the paper analyzes data on the top 50 songs on Spotify from 2010 to 2019. The paper also presents tables and graphs that illustrate the averages of music factors such as beats per minute and duration, to investigate how music should be made to rank high on Spotify. Moreover, the paper uses a machine learning model to predict the popularity of music from factors such as beats per minute, speechiness, loudness, and duration. The prediction model is expected to be useful to artists and music companies before they release their music.
Keywords: Data Science, Machine Learning, EDA, Music, Business
The paper contains four sections: Introduction, Data Exploration, Popularity Prediction Model, and Conclusion.

2. Data Exploration

Before implementing the model, the paper examines the data collected from Spotify. This section analyzes the dataset and summarizes its characteristics with visualizations.

2.1. About Data

This section analyzes two data sets: the Top 10 songs in 2010-2019 and the Top 50 songs in 2019. The data set contains 13 columns: song name, artist name, genre, BPM (beats per minute), energy, danceability, loudness, liveness, valence, length, acousticness, speechiness, and popularity.

2.2. Overview

In this section, the paper first summarizes the Top 10 songs between 2010 and 2019. Table 1 shows the yearly averages of BPM, energy, and danceability. Table 2 shows the averages of loudness (dB), liveness, and valence. Table 3 shows the averages of length, acousticness, and speechiness.

Table 1. Average BPM, Energy and Danceability in each year.

Year  Avg. BPM  Avg. Energy  Avg. Danceability
2010  122       78           65
2011  119       75           64
2012  121       75           66
2013  122       74           62
2014  123       68           63
2015  120       70           64
2016  114       67           63
2017  117       69           65
2018  115       65           67
2019  112       65           70

Beats per minute (BPM) indicates the tempo of the music. In other words, music with a higher BPM is faster than music with a lower BPM. Table 1 shows a trend in BPM over the years. From 2010 to 2015, the average BPM stays at about 120 (between 119 and 123). After 2015, it drops to about 115 (between 112 and 117). Energy shows a similar trend: the energy of popular songs was higher through 2015 and decreased afterwards. Since BPM is closely related to the energy of a song, the two show similar patterns. Unlike BPM and energy, danceability shows no large change. However, popular songs in recent years are more danceable than earlier ones.

Table 2. Average Loudness, Liveness and Valence in each year.

Year  Avg. Loudness (dB)  Avg. Liveness  Avg. Valence
2010  -5                  21             57
2011  -5                  21             54
2012  -4.9                16             64
2013  -5.1                20             53
2014  -5.8                17             52
2015  -5.6                18             53
2016  -6.7                18             45
2017  -5.6                15             52
2018  -5.6                15             49
2019  -5.8                15             51

Loudness follows a pattern similar to BPM and energy, because listeners usually perceive more energy in music with a higher BPM and a louder beat and sound. Valence describes the musical positiveness of a song. Music with high valence sounds more positive (e.g., happy and cheerful). In contrast, music with low valence sounds more negative (e.g., sad and depressed). A positive song usually has a higher BPM and more energy, while a negative song has a lower BPM and less energy. Accordingly, the average valence is higher in 2010-2015 than in 2016-2019.

Table 3. Average Length, Acousticness and Speechiness in each year.

Year  Avg. Length (s)  Avg. Acousticness  Avg. Speechiness
2010  230              11.6               8.9
2011  243              13.3               9.7
2012  224              4.9                5.8
2013  234              10.3               8.3
2014  224              17.6               8.7
2015  223              16.6               7.1
2016  220              15.9               8.4
2017  222              16.6               9.8
2018  217              12.8               8.6
2019  200              21.7               8.1

As shown in Table 3, songs are getting shorter. The reason is streaming. As explained in the previous section, most revenue now comes from streaming. In most streaming services, artists get paid if someone listens to at least 30 seconds of a song, so a song does not have to be long. Also, many artists tend to put 12 short songs on an album rather than 10 long ones, because more songs on the streaming service means more chances to be streamed. No particular patterns were found in acousticness and speechiness.

2.3. Genre

This section investigates which genres are popular on Spotify. Since there were 50 different genres, the genre-versus-popularity plot is divided into two graphs: Figure 1 and Figure 2. In these graphs, higher popularity means a higher rank. Among the most popular genres were R&B and escape room. Escape room is a genre related to R&B. From this, we can conclude that R&B is the most popular genre on Spotify. Other than R&B, electronic music such as electronic pop is also popular.

2.4. Music Trend in 2021

As explained in the previous section, each year has its own trend and popular style of music. From the EDA of the Top 10 data for 2010-2019, the R&B style with low BPM is the most popular style. This section also analyzes the music trend in 2019 using the Top 50 songs on Spotify.

As shown in Figure 1, the most popular music in 2010-2019 was R&B. However, as shown in Figure 3, pop was the most popular genre in 2019. This indicates that the trend of music in recent years is pop music.
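The yearly averages reported in Tables 1 to 3 are per-year means of each column. The following is a minimal sketch of that aggregation in plain Python, assuming each song is a row of (year, BPM, energy, danceability). The rows below are hypothetical placeholders, not the paper's data, and the variable names are chosen here for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (year, bpm, energy, danceability); NOT the paper's data.
songs = [
    (2010, 120, 80, 66),
    (2010, 124, 76, 64),
    (2011, 118, 74, 62),
    (2011, 120, 76, 66),
]

# Group the feature values of every song under its year.
rows_by_year = defaultdict(list)
for year, bpm, energy, danceability in songs:
    rows_by_year[year].append((bpm, energy, danceability))

# One (avg BPM, avg energy, avg danceability) tuple per year.
yearly_avg = {
    year: tuple(mean(column) for column in zip(*rows))
    for year, rows in sorted(rows_by_year.items())
}

for year, (avg_bpm, avg_energy, avg_dance) in yearly_avg.items():
    print(year, avg_bpm, avg_energy, avg_dance)
```

With a tabular library such as pandas, the same result would normally come from a group-by on the year column followed by a mean.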
International Journal of Science, Technology and Society 2021; 9(5): 239-244 241
The graph also indicates that genre can affect the popularity of music. So, in the later section, the prediction model also uses genre together with the other 9 factors to predict the popularity of music.
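For genre to be used alongside the numeric factors, each genre name has to be mapped to a number. The sketch below shows one simple way to do this, assuming codes are assigned in order of first appearance and start at 1. The paper does not state how its codes were assigned, and the genre list here is a hypothetical sample.

```python
# Hypothetical genre labels; NOT the paper's data.
genres = ["pop", "r&b", "pop", "electropop", "escape room", "r&b"]

# Assign integer codes in order of first appearance, starting at 1.
# This ordering is an assumption made for illustration only.
genre_to_code = {}
for genre in genres:
    if genre not in genre_to_code:
        genre_to_code[genre] = len(genre_to_code) + 1

# Replace every genre label with its integer code.
genre_codes = [genre_to_code[genre] for genre in genres]

print(genre_to_code)
print(genre_codes)
```

Integer codes impose an arbitrary ordering on the genres, which some models treat as a numeric quantity. One-hot encoding is a common alternative when that ordering is not meaningful.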
3.1. Correlation of Factors

Figure 4 depicts the correlation between the various factors. As described in the previous section, each song has 9 factors: bpm (BPM), nrgy (energy), dnce (danceability), dB (loudness), live (liveness), val (valence), dur (length), acous (acousticness), and speech (speechiness). 'Pop' in the heatmap denotes popularity (rank). In the heatmap, the closer the number in a box is to 1, the more strongly the two factors are positively correlated. For example, the heatmap value for loudness (dB) and danceability is 0.16, which indicates a positive but weak correlation. The factor with the highest correlation with popularity is loudness (dB). Danceability, valence, and BPM are also important factors for ranking high on Spotify. However, as shown in Figure 4, length (duration), speechiness, and liveness are not correlated with popularity (rank).

3.2. Model Implementation

This section describes the implementation of a prediction model that predicts the popularity of music by analyzing 10 factors. The one additional factor is genre. We transformed genre into an integer type (e.g., pop to 1) so that it could be used as a factor for predicting popularity. We used the following algorithms and compared their prediction performance (accuracy):

Linear Regression: Linear regression predicts popularity by fitting a linear equation to the variables. The 9 factors are combined into one feature vector, which serves as the explanatory variables, and popularity is the dependent variable [11, 12].

K-Nearest Neighbor Regression: K-nearest neighbor regression is a non-parametric method, implemented here in Python, that predicts the popularity of a song from the songs whose factors are most similar to it. As with linear regression, the 9 factors of a song are combined into one feature vector [13].

Random Forest: A random forest builds multiple decision trees. Each tree predicts the value (popularity) by learning simple decision rules. The random forest combines the results of those trees to obtain more accurate and stable predictions of popularity [14, 15].
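The correlation step and the three-model comparison described in this section can be sketched end to end with scikit-learn. This is an illustrative sketch, not the author's implementation. It assumes synthetic data in place of the Spotify table, 10 numeric factors (with genre already integer-coded), a 75/25 train/test split, and default-style hyperparameters, none of which are stated in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the Spotify table: 200 songs, 10 numeric factors.
# These values are randomly generated and are NOT the paper's data.
rng = np.random.default_rng(0)
n_songs, n_factors = 200, 10
X = rng.normal(size=(n_songs, n_factors))
true_weights = rng.normal(size=n_factors)
popularity = 70 + X @ true_weights + rng.normal(scale=1.0, size=n_songs)

# Correlation step: Pearson correlation of each factor with popularity.
factor_corr = [np.corrcoef(X[:, j], popularity)[0, 1] for j in range(n_factors)]

# Hold out a test set so RMSE is measured on songs the models have not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, popularity, test_size=0.25, random_state=0
)

# The three algorithms named in the text; hyperparameters are assumptions.
models = {
    "Linear Regression": LinearRegression(),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared prediction error.
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:.2f}")
```

Because the synthetic target is linear in the factors by construction, the ranking of the models in this sketch says nothing about their ranking on the real data.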
Figure 4. RMSE of three machine learning algorithms in the popularity prediction model.
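For reference, the RMSE plotted in this figure follows the standard definition. The symbols are introduced here for illustration and do not appear in the paper: $n$ is the number of test songs, $y_i$ is the actual popularity of song $i$, and $\hat{y}_i$ is the predicted popularity.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

RMSE is expressed in the same units as popularity, and because the errors are squared before averaging, a few large misses raise it more than many small ones.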
244 Jaehyun Kim: Music Popularity Prediction Through Data Analysis of Music’s Characteristics
Root Mean Square Error (RMSE) is the standard deviation of the prediction errors. A lower RMSE means higher accuracy. As shown in Figure 4, the linear regression model has the lowest RMSE, 3.12, while KNN and random forest obtained 3.3 and 4.5, respectively. Nevertheless, all three algorithms predicted the popularity of music with a low RMSE. For example, the actual popularity of the song "Beautiful People" is 86. The linear regression model predicted 88, while the KNN and random forest models predicted 88.5 and 81, respectively. Therefore, we expect that artists can use this prediction model to predict popularity on Spotify.

4. Conclusion & Future Works

With the emergence of smartphones and streaming services, ranking high on streaming services has become an important factor for better profits and popularity. Music trends change every year, so it is crucial to understand the trend in order to rank high on streaming services. To address this, we analyzed the top 50 songs of 2010-2019 to see the trend of popular music. We also implemented a machine learning model that predicts popularity by analyzing 9 factors of the music. We built the model with three different machine learning algorithms and obtained a model with 90% accuracy. We expect that this model can be used by many artists and companies to predict the popularity of their music before release. In future work, we plan to extend this project to predict the popularity of music by analyzing the audio itself instead of text data.

[4] A. Varshavsky, "Analysis of income inequality impact on the musical art", Journal of the New Economic Association, New Economic Association, 2020.
[5] P. Dicola, "Money from Music: Survey Evidence on Musicians' Revenue and Lessons About Copyright Incentives", Northwestern University School of Law, 2019.
[6] M. Mai, "Death of the Music Long Tail", silpayamanant, 2014, https://silpayamanant.wordpress.com/2014/03/07/death-of-the-musical-long-tail/.
[7] "U.S. music streaming revenue 2020", Statista, 2021. https://www.statista.com/statistics/437717/music-streaming-revenue-usa/
[8] Seth A. Carver, "Changing the Industry, Spotify", University of Tennessee, 2016.
[9] Fleicher, Rasmus & Snickars, Pelle. "Discovering Spotify - A Thematic Introduction". Culture Unbound: Journal of Current Cultural Research. 9. 2017. 130-145. 10.3384/cu.2000.1525.1792130.
[10] Araujo, Carlos & Cristo, Marco & Giusti, Rafael. "Predicting Music Popularity on Streaming Platforms". 2019. 141-148. 10.5753/sbcm.2019.10436.
[11] scikit-learn developers. sklearn.neighbors.KNeighborsRegressor. scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html.
[12] Kumari, Khushbu & Yadav, Suniti. "Linear regression analysis study". Journal of the Practice of Cardiovascular Sciences. 2018. 4. 33. 10.4103/jpcs.jpcs_8_18.