ML(project)_merged
1 Problem Formulation
This project aims to predict the future popularity of songs based on streaming
and engagement data collected in 2024. The problem is framed as a regression
task, where the objective is to forecast a continuous value that represents a
song’s future popularity. This popularity metric could be defined by future
streaming counts, chart positions, or other relevant indicators that reflect how
widely a song is consumed.
2 Data Points
Features (Input Data): The features represent the input data that the
model uses to make predictions. These are the various metrics and attributes
related to each song that can influence its popularity.
1. Track Information:
• Track: The name of the song.
• Album Name: The name of the album on which the song appears.
• Artist: The artist or group performing the song.
• Release Date: The date when the song was released.
• ISRC: International Standard Recording Code, a unique identifier
for the song.
2. Popularity Metrics:
• All Time Rank: A ranking of the song based on its historical performance.
• Spotify Streams: The number of times the song has been streamed
on Spotify.
• Spotify Playlist Count: The number of Spotify playlists that include the song.
• Spotify Playlist Reach: The potential audience size of the Spotify
playlists that include the song.
• Spotify Popularity: A metric indicating the song’s popularity on
Spotify.
• YouTube Views: The number of views the song has received on
YouTube.
• YouTube Likes: The number of likes the song has received on
YouTube.
• TikTok Posts: The number of posts featuring the song on TikTok.
• TikTok Likes: The number of likes for posts featuring the song on
TikTok.
• TikTok Views: The number of views for posts featuring the song
on TikTok.
• YouTube Playlist Reach: The potential audience size of YouTube
playlists that include the song.
• Apple Music Playlist Count: The number of Apple Music playlists
that include the song.
• AirPlay Spins: The number of radio airplay spins the song has received.
• SiriusXM Spins: The number of times the song has been played
on SiriusXM radio.
• Deezer Playlist Count: The number of Deezer playlists that include the song.
• Deezer Playlist Reach: The potential audience size of Deezer
playlists that include the song.
• Amazon Playlist Count: The number of Amazon Music playlists
that include the song.
• Pandora Streams: The number of times the song has been streamed
on Pandora.
• Pandora Track Stations: The number of Pandora stations that
include the song.
• Soundcloud Streams: The number of times the song has been
streamed on SoundCloud.
• Shazam Counts: The number of times the song has been identified
using Shazam.
• TIDAL Popularity: A metric indicating the song’s popularity on
TIDAL.
• Explicit Track: A binary indicator (0 or 1) indicating whether the
song is explicit.
Label (Target Variable): The label is the variable that the model aims to
predict; here it summarizes the overall popularity of the song.
Track Score: A composite score that reflects the overall popularity of the
song, integrating aspects of the song’s performance and reception across
different platforms and metrics.
3 Data Set
The dataset is from Kaggle and is titled “Most Streamed Spotify Songs 2024.” [1]
It offers a detailed collection of streaming and engagement metrics for popular
songs across various platforms in 2024, including track details, streaming data
from platforms such as Spotify, YouTube, TikTok, and Pandora, as well as radio
spins and playlist counts.
4 Feature Selection
Identify Numeric Fields: I listed all numeric columns that might help predict
Track Score.
Select Features: I chose features based on their relevance:
• Streaming Metrics: Spotify Streams, YouTube Views, etc., reflect
track popularity.
• Engagement Metrics: Spotify Playlist Count, YouTube Likes, etc.,
show listener engagement.
• Playlist Metrics: Spotify Playlist Reach, Deezer Playlist Reach,
etc., indicate playlist inclusion.
• Explicit Track: Included as a categorical feature that could impact the
score.
Exclude Non-Numeric Features: Columns like Track, Album Name, Artist,
and Release Date were left out because they are less relevant for numeric
prediction.
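As a sketch, this selection step can be expressed with pandas. The toy frame below is illustrative only: the column names follow the dataset, but the values are invented.

```python
import pandas as pd

# Toy frame standing in for the real dataset; values are invented.
data = pd.DataFrame({
    "Track": ["Song A", "Song B"],
    "Artist": ["X", "Y"],
    "Spotify Streams": [390_470_936, 323_703_884],
    "Explicit Track": [0, 1],
    "Track Score": [725.4, 545.9],
})

# Drop identifier/text columns and the label; what remains are the features
non_numeric = ["Track", "Album Name", "Artist", "Release Date", "ISRC"]
X = data.drop(columns=non_numeric + ["Track Score"], errors="ignore")
print(list(X.columns))  # ['Spotify Streams', 'Explicit Track']
```

Using `errors="ignore"` lets the same drop list work even when some identifier columns are absent from a given frame.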
5 Model
The model is a linear regression fit on the selected numeric features.
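Concretely, linear regression models Track Score as a weighted sum of the selected features, with the weights chosen to minimize the mean squared error on the training data:

```latex
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d,
\qquad
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2
```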
ML(project)
September 6, 2024
data.head(10)
0 390,470,936 30,716 196,631,588 …
1 323,703,884 28,113 174,597,137 …
2 601,309,283 54,331 211,607,669 …
3 2,031,280,633 269,802 136,569,078 …
4 107,034,922 7,223 151,469,874 …
5 670,665,438 105,892 175,421,034 …
6 900,158,751 73,118 201,585,714 …
7 675,079,153 40,094 211,236,940 …
8 1,653,018,119 1 15 …
9 90,676,573 10,400 184,199,419 …
[10 rows x 29 columns]
def handle_missing_data(data):
    # TIDAL Popularity is almost entirely empty, so drop the column first
    data = data.drop(columns=['TIDAL Popularity'], errors='ignore')
    missing_data = data.isnull().sum()
    print("\nMissing Data before handling (after dropping TIDAL Popularity):")
    print(missing_data[missing_data > 0])
    # Drop the remaining rows that contain missing values
    data = data.dropna()
    missing_data_after = data.isnull().sum()
    if missing_data_after.sum() == 0:
        print("\nNo missing data found after handling.")
    else:
        print("\nMissing Data after handling:")
        print(missing_data_after[missing_data_after > 0])
    return data

data = handle_missing_data(data)
data.head(10)
AirPlay Spins 498
SiriusXM Spins 2123
Deezer Playlist Count 921
Deezer Playlist Reach 928
Amazon Playlist Count 1055
Pandora Streams 1106
Pandora Track Stations 1268
Soundcloud Streams 3333
Shazam Counts 577
dtype: int64
15 925,655,569 103,605 79,944,921 …
16 395,433,400 12,784 177,932,568 …
18 91,272,461 6,499 52,287,548 …
21 547,882,871 24,425 262,343,414 …
AirPlay Spins SiriusXM Spins Deezer Playlist Count Deezer Playlist Reach \
0 40,975 684 62.0 17,598,718
1 40,778 3 67.0 10,422,430
2 74,333 536 136.0 36,321,847
5 522,042 4,654 86.0 17,167,254
9 3,823 117 78.0 10,800,098
12 41,344 45 138.0 38,243,636
15 92,231 228 60.0 5,633,435
16 129,968 3 99.0 37,988,531
18 181 1 24.0 5,054,005
21 37,208 236 167.0 41,414,565
import pandas as pd

def clean_numeric_fields(data):
    numeric_columns = [
        'Spotify Streams', 'Spotify Playlist Count', 'Spotify Playlist Reach',
        'YouTube Views', 'YouTube Likes', 'TikTok Posts', 'TikTok Likes',
        'TikTok Views', 'YouTube Playlist Reach', 'Apple Music Playlist Count',
        'AirPlay Spins', 'SiriusXM Spins', 'Deezer Playlist Count',
        'Deezer Playlist Reach', 'Amazon Playlist Count', 'Pandora Streams',
        'Pandora Track Stations', 'Soundcloud Streams', 'Shazam Counts'
    ]
    # Remove thousands separators and convert the strings to numbers
    for col in numeric_columns:
        data[col] = pd.to_numeric(
            data[col].astype(str).str.replace(',', ''), errors='coerce')
    return data

data = clean_numeric_fields(data)
data.head(10)
9 90676573 10400 184199419 …
12 221636195 13800 197280692 …
15 925655569 103605 79944921 …
16 395433400 12784 177932568 …
18 91272461 6499 52287548 …
21 547882871 24425 262343414 …
NORMALIZING DATA
[117]: from sklearn.preprocessing import MinMaxScaler

def normalize_data(data):
    numeric_columns = [
        'Track Score', 'Spotify Streams', 'Spotify Playlist Count',
        'Spotify Playlist Reach', 'Spotify Popularity', 'YouTube Views',
        'YouTube Likes', 'TikTok Posts', 'TikTok Likes', 'TikTok Views',
        'YouTube Playlist Reach', 'Apple Music Playlist Count', 'AirPlay Spins',
        'SiriusXM Spins', 'Deezer Playlist Count', 'Deezer Playlist Reach',
        'Amazon Playlist Count', 'Pandora Streams', 'Pandora Track Stations',
        'Soundcloud Streams', 'Shazam Counts', 'Explicit Track'
    ]
    # Scale every numeric column into the [0, 1] range
    scaler = MinMaxScaler()
    data[numeric_columns] = scaler.fit_transform(data[numeric_columns])
    return data

data = normalize_data(data)
data.head(10)
2 0.138373 0.089457 0.805405 …
5 0.154611 0.177038 0.666612 …
9 0.018820 0.014837 0.700281 …
12 0.049481 0.020612 0.750454 …
15 0.214312 0.173153 0.300416 …
16 0.090172 0.018887 0.676245 …
18 0.018960 0.008211 0.194337 …
21 0.125865 0.038660 1.000000 …
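Under the hood, MinMaxScaler maps each column value to (x − min) / (max − min). A hand-rolled sketch on a few of the AirPlay Spins values from the table above shows the effect:

```python
# Min-max scaling by hand: each value is mapped to (x - min) / (max - min),
# which is what MinMaxScaler does column by column.
col = [40_975, 40_778, 74_333, 3_823]
lo, hi = min(col), max(col)
scaled = [(x - lo) / (hi - lo) for x in col]
print([round(s, 3) for s in scaled])  # → [0.527, 0.524, 1.0, 0.0]
```

The column minimum lands at 0 and the maximum at 1, matching the 0.0–1.0 values visible in the normalized output.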
[118]: def display_data_points(data):
    features = [
        '------Track Information---------',
        'Track',
        'Album Name',
        'Artist',
        'Release Date',
        'ISRC',
        '------Popularity Metrics---------',
        'All Time Rank',
        'Spotify Streams',
        'Spotify Playlist Count',
        'Spotify Playlist Reach',
        'Spotify Popularity',
        'YouTube Views',
        'YouTube Likes',
        'TikTok Posts',
        'TikTok Likes',
        'TikTok Views',
        'YouTube Playlist Reach',
        'Apple Music Playlist Count',
        'AirPlay Spins',
        'SiriusXM Spins',
        'Deezer Playlist Count',
        'Deezer Playlist Reach',
        'Amazon Playlist Count',
        'Pandora Streams',
        'Pandora Track Stations',
        'Soundcloud Streams',
        'Shazam Counts',
        'Explicit Track'
    ]
    # Print each feature name as a bullet point
    for feature in features:
        print(f"- {feature}")

display_data_points(data)
- Release Date
- ISRC
- ------Popularity Metrics---------
- All Time Rank
- Spotify Streams
- Spotify Playlist Count
- Spotify Playlist Reach
- Spotify Popularity
- YouTube Views
- YouTube Likes
- TikTok Posts
- TikTok Likes
- TikTok Views
- YouTube Playlist Reach
- Apple Music Playlist Count
- AirPlay Spins
- SiriusXM Spins
- Deezer Playlist Count
- Deezer Playlist Reach
- Amazon Playlist Count
- Pandora Streams
- Pandora Track Stations
- Soundcloud Streams
- Shazam Counts
- Explicit Track
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

features = [
    'Spotify Streams', 'Spotify Playlist Count', 'Spotify Playlist Reach',
    'Spotify Popularity', 'YouTube Views', 'YouTube Likes', 'TikTok Posts',
    'TikTok Likes', 'TikTok Views', 'YouTube Playlist Reach',
    'Apple Music Playlist Count', 'AirPlay Spins', 'SiriusXM Spins',
    'Deezer Playlist Count', 'Deezer Playlist Reach', 'Amazon Playlist Count',
    'Pandora Streams', 'Pandora Track Stations', 'Soundcloud Streams',
    'Shazam Counts', 'Explicit Track'
]
# Dividing variables
X = data[features]
y = data['Track Score']
# 80/20 train/test split (random_state assumed; not shown in the original)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("\nModel Evaluation:")
print(f"Mean Squared Error (MSE): {mse:.6f}")
print(f"R-squared Score: {r2:.4f}")
Model Evaluation:
Mean Squared Error (MSE): 0.003194
R-squared Score: 0.6456
Choice of Loss Function:
I chose Mean Squared Error (MSE) because it captures the average squared difference between
predictions and actual values, which is the natural fit for a regression task. It also penalizes
larger errors more heavily, focusing the model on reducing them.
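For n test points, MSE = (1/n) Σ (yᵢ − ŷᵢ)². A tiny hand computation (toy values, not taken from the model) matches what sklearn's mean_squared_error would return:

```python
# MSE by hand on toy normalized scores (values invented for illustration)
y_true = [0.30, 0.10, 0.55]
y_pred = [0.25, 0.12, 0.50]
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mse, 4))  # → 0.0018
```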
Training Set (80%):
Size: 80% of the data. Why: This larger portion gives the model plenty of examples to learn from,
helping it understand patterns and relationships better for accurate predictions.
Test Set (20%):
Size: 20% of the data. Why: This set is for evaluating how well the model performs on new, unseen
data. It’s a common practice to ensure the model can generalize well to real-world situations.
Design Choice:
Split Ratio: An 80% training / 20% testing split is, in my opinion, a good balance. It provides
enough data to train the model effectively while keeping a significant portion for testing its
performance.
Overfitting Prevention: Keeping a separate test set helps avoid overfitting, ensuring the model isn’t
just memorizing the training data but can perform well on new data.
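The split itself can be sketched without sklearn (the notebook uses train_test_split); shuffling row indices with a fixed seed keeps the split reproducible:

```python
import random

# Shuffle row indices with a fixed seed, then cut at the 80% mark
indices = list(range(100))
random.seed(0)
random.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
print(len(train_idx), len(test_idx))  # → 80 20
```

Because the two index lists are disjoint, no test row ever influences training, which is what makes the evaluation an honest estimate of generalization.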