Music Recommendation System Using Machine Learning

Last Updated : 13 Jun, 2025

Suppose you watch a funny video on YouTube. The next time you open the app, your feed recommends more funny videos. Ever wondered how? This is an application of machine learning: recommender systems are built to provide a personalized experience and increase customer engagement.

In this article, we will walk through the process of building a content-based music recommendation system using machine learning. The system recommends songs similar to a given input song based on metadata such as genre, artist name and track name. We will use the "TCC CEDs Music Dataset" which contains metadata about songs, including their genres, artists and lyrics.

Step 1: Importing Libraries & Dataset

Python libraries make it easy to handle data and to perform both routine and complex tasks in a few lines of code.

Pandas - Loads the data into a DataFrame (a 2D tabular structure) and provides many functions for analysis.
Numpy - Provides fast array operations for large numerical computations.
Matplotlib/Seaborn - These libraries are used to draw visualizations.
Sklearn - Contains pre-implemented functions for tasks ranging from data preprocessing to model development and evaluation.

Python

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns

The dataset we are going to use contains data about songs released over a span of around 100 years. Along with general information about each song, it provides audio measures such as loudness, acoustics and speechiness.

Python

data = pd.read_csv('/content/tcc_ceds_music.csv')  
data.head()

Output:

Figure: Dataset View
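
Before moving on, it can help to confirm that the columns used in the later steps are present and to see how many values are missing. The following is a minimal sketch that uses a tiny hand-made DataFrame (hypothetical rows, named `sample` here) in place of the real CSV; with the actual data, the same checks would run on `data`.

```python
import pandas as pd

# Hypothetical stand-in rows; with the real file, use the `data` DataFrame instead
sample = pd.DataFrame({
    'artist_name': ['artist a', 'artist b', None],
    'track_name': ['cry', 'smile', 'run'],
    'genre': ['pop', 'blues', 'pop'],
})

# Columns combined later into a single text feature
required = ['genre', 'artist_name', 'track_name']
missing_columns = [col for col in required if col not in sample.columns]

# Number of missing values in each required column
null_counts = sample[required].isna().sum()

print("Missing columns:", missing_columns)
print(null_counts)
```

If a required column is absent, the feature-combination step would fail with a KeyError, so this kind of check gives an earlier and clearer signal.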

Step 2: Exploratory Data Analysis (EDA)

Before building the recommendation system, we perform exploratory data analysis (EDA) to gain insights into the dataset.

Distribution of Songs by Genre

We visualize the top 10 genres in the dataset to understand the diversity of songs.

Python

plt.figure(figsize=(10, 6))
sns.countplot(y='genre', data=data, order=data['genre'].value_counts().index[:10])
plt.title('Top 10 Genres')
plt.xlabel('Count')
plt.ylabel('Genre')
plt.show()

Output:

Figure: Song Distribution by Genre
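
The numbers behind a chart like this can also be inspected directly. The sketch below uses a short hypothetical Series in place of `data['genre']` and shows the counts together with each genre's share of all songs.

```python
import pandas as pd

# Hypothetical genre labels standing in for data['genre']
genres = pd.Series(['pop', 'blues', 'pop', 'jazz', 'pop', 'blues'], name='genre')

# Counts sorted from most to least frequent; head(10) keeps at most the top 10
top_genres = genres.value_counts().head(10)

# Fraction of all songs that each of those genres accounts for
genre_share = (top_genres / len(genres)).round(2)

print(top_genres)
print(genre_share)
```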

Top Artists by Song Count

We identify the most popular artists based on the number of songs they have in the dataset.

Python

top_artists = data.groupby('artist_name').size().sort_values(ascending=False).head(10)
plt.figure(figsize=(10, 6))
sns.barplot(x=top_artists.values, y=top_artists.index, palette='viridis')
plt.title('Top 10 Artists by Number of Songs')
plt.xlabel('Number of Songs')
plt.ylabel('Artist Name')
plt.show()

Output:

Figure: Top Artists by Song Count

Step 3: Preprocessing the Data

To build the recommendation system, we preprocess the data by combining relevant features and converting them into numerical vectors.

Combine Features

We concatenate the genre, artist_name, and track_name columns into a single feature called combined_features.

Python

data['combined_features'] = (
    data['genre'].fillna('') + ' ' +
    data['artist_name'].fillna('') + ' ' +
    data['track_name'].fillna('')
)
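
To see what the combined feature looks like, here is a small sketch on hypothetical rows (the `toy` DataFrame is made up for illustration), including one row with a missing artist name. The whitespace clean-up at the end is an optional extra, not part of the step above.

```python
import pandas as pd

# Hypothetical rows; the second one has no artist name
toy = pd.DataFrame({
    'genre': ['pop', 'blues'],
    'artist_name': ['artist a', None],
    'track_name': ['cry', 'smile'],
})

# Same concatenation as in the article: missing values become empty strings
toy['combined_features'] = (
    toy['genre'].fillna('') + ' ' +
    toy['artist_name'].fillna('') + ' ' +
    toy['track_name'].fillna('')
)

# Optional: collapse the double space left behind by an empty field
toy['combined_features'] = (
    toy['combined_features'].str.replace(r'\s+', ' ', regex=True).str.strip()
)

print(toy['combined_features'].tolist())
```

The extra spaces are cosmetic only, since the vectorizer in the next step splits on word boundaries.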

Vectorize Text Data

We use TF-IDF Vectorization to convert the combined features into numerical vectors.

Python

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(data['combined_features'])
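
To get a feel for what the vectorizer produces, the sketch below fits it on three hypothetical combined-feature strings and prints the matrix shape and the vocabulary it learned. The strings and variable names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical combined-feature strings (genre, artist, track name)
docs = [
    'pop elvis midnight river',
    'pop elvis summer rain',
    'blues muddy midnight train',
]

vectorizer = TfidfVectorizer(stop_words='english')
matrix = vectorizer.fit_transform(docs)

# One row per song, one column per distinct token that survived filtering
print(matrix.shape)
print(sorted(vectorizer.vocabulary_))

# Check whether a given title word is on the built-in English stop list
print('cry' in vectorizer.get_stop_words())
```

Words on scikit-learn's built-in English stop list are dropped, and so are single-character tokens under the default tokenizer. If the last line prints True, a title such as 'cry' contributes nothing to its vector, and any match for that song rests on genre and artist alone.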

Compute Similarity Scores

We compute the cosine similarity between songs based on their vectorized features.

Python

cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

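def get_recommendations(song_title, data, cosine_sim, top_n=10):
    # Row positions of songs whose title matches exactly (case-sensitive)
    positions = np.flatnonzero((data['track_name'] == song_title).to_numpy())
    if len(positions) == 0:
        print("Song not found in the dataset.")
        return data.iloc[0:0]  # empty DataFrame, so callers can still select columns

    # If several songs share the title, use the first one
    idx = positions[0]

    # Pair each song's position with its similarity to the input song
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort by similarity, most similar first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Drop the input song itself, then keep the top_n entries
    sim_scores = [s for s in sim_scores if s[0] != idx][:top_n]
    song_indices = [i[0] for i in sim_scores]

    # Return the matching rows of the original DataFrame
    return data.iloc[song_indices]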

Explanation :

The function retrieves the index of the input song.
It computes similarity scores for all songs and sorts them in descending order.
It returns the top-N most similar songs.
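
A full pairwise similarity matrix has one entry for every pair of songs, so its memory use grows quadratically with the number of rows. One possible alternative, sketched here under assumptions, is to compare only the query song's vector with all the others at request time. The function name `recommend_on_demand` and the toy data are hypothetical and not part of the original article.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_on_demand(song_title, data, tfidf_matrix, top_n=10):
    """Hypothetical variant that scores one query row instead of all pairs."""
    positions = np.flatnonzero((data['track_name'] == song_title).to_numpy())
    if len(positions) == 0:
        return data.iloc[0:0]  # empty result when the title is absent
    idx = positions[0]

    # Similarity of the query song to every song: array of shape (n_songs,)
    scores = cosine_similarity(tfidf_matrix[idx:idx + 1], tfidf_matrix).ravel()
    scores[idx] = -1.0  # push the query song itself to the bottom

    # Positions of the top_n highest scores, best first
    best = np.argsort(scores)[::-1][:top_n]
    return data.iloc[best].assign(similarity=scores[best])

# Toy data (hypothetical rows) standing in for the real dataset
toy = pd.DataFrame({
    'track_name': ['midnight river', 'midnight train', 'summer rain'],
    'artist_name': ['elvis', 'elvis', 'muddy'],
    'genre': ['pop', 'pop', 'blues'],
})
text = toy['genre'] + ' ' + toy['artist_name'] + ' ' + toy['track_name']
matrix = TfidfVectorizer(stop_words='english').fit_transform(text)

result = recommend_on_demand('midnight river', toy, matrix, top_n=2)
print(result[['track_name', 'similarity']])
```

The trade-off is that each request recomputes one row of similarities, in exchange for never holding the whole matrix in memory.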

Step 4: Generate Recommendations

We use the get_recommendations function to recommend songs similar to a specific input song (e.g., 'cry').

Python

recommended_songs = get_recommendations('cry', data, cosine_sim, top_n=10)
print(recommended_songs[['track_name', 'artist_name', 'genre']])

Output:

Figure: Recommended Songs
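
The lookup in get_recommendations compares titles with an exact, case-sensitive match, so an input such as 'Cry' would not find 'cry'. A more forgiving lookup could normalise both sides first. The helper below is a hypothetical sketch (the name `find_song_positions` is not from the article) run on made-up rows.

```python
import numpy as np
import pandas as pd

def find_song_positions(song_title, data):
    """Hypothetical helper: row positions whose track name matches,
    ignoring letter case and surrounding spaces."""
    wanted = song_title.strip().lower()
    names = data['track_name'].fillna('').str.strip().str.lower()
    return np.flatnonzero((names == wanted).to_numpy())

# Made-up rows: two spellings of the same title, one other song, one missing name
toy = pd.DataFrame({'track_name': ['cry', 'Cry ', 'smile', None]})

print(find_song_positions('CRY', toy))
print(find_song_positions('unknown', toy))
```

Returning every matching position also makes it visible when several songs share a title, a case where the original function silently uses the first one.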

Step 5: Visualize Recommendations

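Finally, we visualize the recommended songs using a bar chart. A bar chart needs a numeric axis, so we plot each recommended song's similarity score to the input song.

Python

# Position of the input song (assumes the default integer index from read_csv)
query_idx = data[data['track_name'] == 'cry'].index[0]
scores = cosine_sim[query_idx, recommended_songs.index.to_numpy()]
labels = (recommended_songs['track_name'] + ' (' + recommended_songs['artist_name'] + ')').to_numpy()

plt.figure(figsize=(10, 6))
sns.barplot(x=scores, y=labels, color='steelblue')
plt.title('Recommended Songs Similar to "Cry"')
plt.xlabel('Cosine Similarity')
plt.ylabel('Song (Artist)')
plt.show()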

Output:

Figure: Visualizing Recommendation

This model would need many changes before it could be used in a real-world music app or website, but it gives an overview of how recommendation systems are built and used.

Dataset Link : click here
Google Colab : click here


abhishekm482g
