Music Recommendation System Using Machine Learning

Last Updated : 13 Jun, 2025

Suppose you watch a funny video on YouTube. The next time you open the app, your feed is full of recommendations for similar funny videos. Ever wondered how? This is an application of machine learning: recommender systems are built to provide a personalized experience and increase customer engagement.

In this article, we will walk through the process of building a content-based music recommendation system using machine learning. The system recommends songs similar to a given input song based on metadata such as genre, artist name and track name. We will use the "TCC CEDs Music Dataset" which contains metadata about songs, including their genres, artists and lyrics.

Step 1: Importing Libraries & Dataset

Python libraries make it easy to handle data and to perform both routine and complex tasks in a few lines of code.

  • Pandas - Loads the data into a DataFrame (a 2D tabular structure) and provides many functions for analysis.
  • Numpy - Provides fast array operations for large numerical computations.
  • Matplotlib/Seaborn - These libraries are used to draw visualizations.
  • Sklearn - Provides pre-implemented functions for tasks ranging from data preprocessing to model development and evaluation.
Python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns

The dataset we are going to use contains data about songs released over a span of around 100 years. Along with general information about each song, it also provides numerical measures of the audio such as loudness, acoustics, speechiness, and so on.

Python
# Path used in Google Colab; change it to wherever the CSV file is stored
data = pd.read_csv('/content/tcc_ceds_music.csv')
data.head()

Output:

Figure: Dataset View
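
Before moving on, it can help to confirm that the columns used later (genre, artist_name, track_name) are present and to see how many values are missing. The sketch below is one possible way to do that. The `summarize_columns` helper and the three-row `sample` frame are illustrative assumptions, not part of the dataset or the original notebook.

```python
import pandas as pd

def summarize_columns(df, columns):
    """Hypothetical helper: missing-value and unique-value counts per column."""
    absent = [c for c in columns if c not in df.columns]
    if absent:
        raise KeyError(f"Columns not found: {absent}")
    return pd.DataFrame({
        'missing': df[columns].isna().sum(),
        'unique': df[columns].nunique(),  # NaN values are not counted
    })

# Made-up stand-in rows; with the real data, pass `data` instead of `sample`
sample = pd.DataFrame({
    'genre': ['pop', 'rock', None],
    'artist_name': ['artist a', 'artist b', 'artist a'],
    'track_name': ['song one', 'song two', 'song three'],
})

summary = summarize_columns(sample, ['genre', 'artist_name', 'track_name'])
print(summary)
```

Any column with a non-zero `missing` count is a reason to fill or drop values before the text features are combined.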

Step 2: Exploratory Data Analysis (EDA)

Before building the recommendation system, we perform exploratory data analysis (EDA) to gain insights into the dataset.

Distribution of Songs by Genre

We visualize the top 10 genres in the dataset to understand the diversity of songs.

Python
plt.figure(figsize=(10, 6))
sns.countplot(y='genre', data=data, order=data['genre'].value_counts().index[:10])
plt.title('Top 10 Genres')
plt.xlabel('Count')
plt.ylabel('Genre')
plt.show()

Output:

Figure: Song Distribution by Genre

Top Artists by Song Count

We identify the most popular artists based on the number of songs they have in the dataset.

Python
top_artists = data.groupby('artist_name').size().sort_values(ascending=False).head(10)
plt.figure(figsize=(10, 6))
sns.barplot(x=top_artists.values, y=top_artists.index, palette='viridis')
plt.title('Top 10 Artists by Number of Songs')
plt.xlabel('Number of Songs')
plt.ylabel('Artist Name')
plt.show()

Output:

Figure: Top Artists by Song Count

Step 3: Preprocessing the Data

To build the recommendation system, we preprocess the data by combining relevant features and converting them into numerical vectors.

Combine Features

We concatenate the genre, artist_name, and track_name columns into a single feature called combined_features.

Python
data['combined_features'] = (
    data['genre'].fillna('') + ' ' +
    data['artist_name'].fillna('') + ' ' +
    data['track_name'].fillna('')
)
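
To see why the `fillna('')` calls matter, here is a small sketch on made-up rows (the `sample` frame is an assumption for illustration only). Adding a string to a missing value yields a missing value, so a single empty genre would otherwise wipe out the whole combined string for that song.

```python
import pandas as pd

# Made-up rows used only to illustrate the concatenation
sample = pd.DataFrame({
    'genre': ['pop', None],
    'artist_name': ['artist a', 'artist b'],
    'track_name': ['song one', 'song two'],
})

# Without fillna, the missing genre propagates to the combined value
without_fill = sample['genre'] + ' ' + sample['artist_name']
print(without_fill.isna().tolist())

# With fillna, every row keeps its remaining text
sample['combined_features'] = (
    sample['genre'].fillna('') + ' ' +
    sample['artist_name'].fillna('') + ' ' +
    sample['track_name'].fillna('')
).str.strip()  # optional: trim the leading space left by an empty genre

print(sample['combined_features'].tolist())
```

The `.str.strip()` call is an optional extra; the vectorizer ignores surrounding whitespace either way.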

Vectorize Text Data

We use TF-IDF Vectorization to convert the combined features into numerical vectors.

Python
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(data['combined_features'])

Compute Similarity Scores

We compute the cosine similarity between songs based on their vectorized features.
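
As a quick illustration of the measure itself, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below checks a hand computation against scikit-learn on two made-up vectors (`a` and `b` are assumptions, not dataset values).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up feature vectors
a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])

# Cosine similarity: dot product divided by the product of the vector lengths
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn expects 2D inputs and returns a matrix of pairwise scores
from_sklearn = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

print(manual, from_sklearn)
```

Both values come out at approximately 0.5: the vectors share one of their two non-zero components. A score of 1 means the vectors point in the same direction and 0 means they share nothing.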

Python
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

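def get_recommendations(song_title, data, cosine_sim, top_n=10):
    # Find the row positions whose title matches exactly (case-sensitive)
    matches = np.flatnonzero((data['track_name'] == song_title).to_numpy())
    if len(matches) == 0:
        print("Song not found in the dataset.")
        return data.iloc[0:0]  # empty DataFrame with the same columns

    # Use the first match; positions line up with the rows of cosine_sim
    idx = matches[0]

    # Pair every song position with its similarity to the input song
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Drop the input song itself, then sort by similarity, highest first
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the top_n most similar songs and look up their rows
    song_indices = [i[0] for i in sim_scores[:top_n]]
    return data.iloc[song_indices]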

Explanation:

  • The function retrieves the index of the input song.
  • It computes similarity scores for all songs and sorts them in descending order.
  • It returns the top-N most similar songs.
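
One practical caveat: `cosine_sim` is a dense n × n matrix, so its size grows with the square of the number of songs. As a rough illustration, 30,000 songs would need about 30,000 × 30,000 × 8 bytes ≈ 7.2 GB. A possible alternative, sketched below under stated assumptions, is to score a single song against all others only when a recommendation is requested. The function name `recommend_on_demand` and the `toy` frame are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_on_demand(song_title, data, tfidf_matrix, top_n=10):
    """Hypothetical variant: score one song against all others on demand."""
    matches = np.flatnonzero((data['track_name'] == song_title).to_numpy())
    if len(matches) == 0:
        return data.iloc[0:0].assign(similarity=[])
    idx = matches[0]
    # One row against the whole matrix: the result has shape (1, n_songs)
    scores = cosine_similarity(tfidf_matrix[idx], tfidf_matrix).ravel()
    scores[idx] = -1.0  # push the input song itself to the end of the ranking
    top = np.argsort(scores)[::-1][:top_n]
    return data.iloc[top].assign(similarity=scores[top])

# Made-up rows standing in for the real dataset
toy = pd.DataFrame({
    'track_name': ['summer love', 'summer rain', 'winter storm'],
    'artist_name': ['artist a', 'artist b', 'artist c'],
    'genre': ['pop', 'pop', 'rock'],
})
toy_features = toy['genre'] + ' ' + toy['artist_name'] + ' ' + toy['track_name']
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(toy_features)

result = recommend_on_demand('summer love', toy, toy_matrix, top_n=2)
print(result[['track_name', 'similarity']])
```

This keeps only the sparse TF-IDF matrix in memory and trades a small amount of computation per query for a much smaller footprint. It also returns the similarity score alongside each recommendation.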

Step 4: Generate Recommendations

We use the get_recommendations function to recommend songs similar to a specific input song (e.g., 'cry').

Python
recommended_songs = get_recommendations('cry', data, cosine_sim, top_n=10)
print(recommended_songs[['track_name', 'artist_name', 'genre']])

Output:

Figure: Recommended Songs
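
Because the lookup in get_recommendations compares titles exactly, it can be useful to search for candidate titles first. The sketch below is one way to do that with a case-insensitive substring match. The `find_titles` helper and the `sample` rows are assumptions added for illustration.

```python
import pandas as pd

def find_titles(data, query, limit=5):
    """Hypothetical helper: list tracks whose title contains `query`."""
    mask = data['track_name'].str.contains(query, case=False, regex=False, na=False)
    return data.loc[mask, ['track_name', 'artist_name']].head(limit)

# Made-up rows standing in for the real dataset
sample = pd.DataFrame({
    'track_name': ['cry', 'cry baby', 'Do Not Cry', 'smile'],
    'artist_name': ['artist a', 'artist b', 'artist c', 'artist d'],
})

hits = find_titles(sample, 'cry')
print(hits)
```

An exact title taken from the `track_name` column of the result can then be passed to get_recommendations.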

Step 5: Visualize Recommendations

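Finally, we visualize the recommended songs using a bar chart. A bar chart needs a numeric axis, so each bar shows the song's cosine similarity to the input song.

Python
# Look up each recommendation's similarity to 'cry' (assumes the default RangeIndex)
query_idx = data.index[data['track_name'] == 'cry'][0]
plot_data = recommended_songs.assign(
    similarity=cosine_sim[query_idx, recommended_songs.index.to_numpy()],
    label=recommended_songs['track_name'] + ' (' + recommended_songs['artist_name'] + ')',
)
plt.figure(figsize=(10, 6))
sns.barplot(x='similarity', y='label', data=plot_data, color='steelblue')
plt.title('Recommended Songs Similar to "Cry"')
plt.xlabel('Cosine Similarity')
plt.ylabel('Song (Artist)')
plt.show()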

Output:

Figure: Visualizing Recommendation

This model would need many changes before it could be used in a real-world music app or website, but it gives an overview of how recommendation systems are built and used.

Dataset Link : click here

Google Colab : click here

