Customer Segmentation Report

This document summarizes a customer segmentation project using KMeans clustering. The project uses a mall customer dataset containing customer ID, age, gender, annual income and spending score. KMeans clustering is applied to group customers into a fixed number of clusters based on income and spending score. The code uses NumPy, Pandas, Matplotlib, Seaborn and Scikit-Learn. It generates an elbow plot to choose the number of clusters, performs the clustering, and plots the clusters with their centroids.

Uploaded by

dsingh1be21
Customer Segmentation

Computer Science and Engineering Department

Thapar Institute of Engineering and Technology

(Deemed to be University), Patiala – 147004

Machine Learning Project

Submitted By:

Name : Yogesh Rathee

Roll No : 102103022

Name : Jagveer Singh

Roll No : 102103024

Submitted To:

Ms. Kudratdeep Aulakh


Index

Sr. No.  Content
1.  Introduction
2.  Libraries used
3.  Algorithm(s) used
4.  Code and Screenshots

1. Introduction

1.1 Mall Customer Segmentation Data


https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python

This dataset was created solely for learning customer segmentation
concepts, which the dataset's own description also calls market basket
analysis. This report demonstrates the idea using an unsupervised ML
technique (the KMeans clustering algorithm) in its simplest form.

1.2 Description of dataset


You own a supermarket mall and, through membership cards, you have some
basic data about your customers: customer ID, age, gender, annual income
and spending score. The spending score is a value you assign to each
customer based on parameters you define, such as customer behavior and
purchasing data.
2. Libraries Used:

NumPy: NumPy is a Python library for efficient numerical computation,
offering multi-dimensional array support and a wide range of
mathematical functions. It is widely used in data analysis, scientific
research, and machine learning.

Pandas: Pandas is a Python library for data manipulation and analysis,
offering DataFrames and Series for working with structured data
efficiently.

Matplotlib.pyplot: matplotlib.pyplot is the plotting interface of the
Matplotlib library, used to create 2D data visualizations such as plots
and charts. It is a fundamental tool for data visualization in Python.

Seaborn: Seaborn is a Python library built on Matplotlib for creating
appealing and informative statistical data visualizations.

Sklearn: Scikit-Learn (sklearn) is a Python library for machine learning,
offering a broad set of tools and algorithms for tasks such as
clustering, classification, and regression.
3. Algorithm(s) Used

K-means clustering: K-means clustering is a popular unsupervised machine
learning algorithm. Its main task is to group data into a fixed number of clusters,
often referred to as "k." These clusters are formed based on the similarities
between data points, aiding data segmentation and organization.

The algorithm operates iteratively. Initially, it places "k" cluster centers
randomly within the data space (the code in Section 4 instead uses the
k-means++ initialization, which spreads the initial centers apart). Each data
point is then assigned to the nearest cluster center, typically using Euclidean
distance. The cluster centers are then recalculated as the mean of their
assigned data points. This process repeats until the cluster assignments and
centers no longer change significantly.
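
The iterative steps described above can be sketched in plain NumPy. This is a minimal illustration, not the Scikit-Learn implementation used in Section 4: the function name kmeans_sketch, its parameters, and the toy data are all hypothetical, and the initial centers are drawn from the data points themselves, which is one common way to start from random positions.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, tol=1e-6, seed=0):
    """Plain k-means (Lloyd's algorithm) with randomly chosen initial centers."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Step 4: stop once the centers no longer move significantly.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return labels, centers

# Hypothetical toy data: two well-separated groups of 2-D points.
toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
toy_labels, toy_centers = kmeans_sketch(toy, k=2)
print(toy_labels)
print(toy_centers)
```

On this toy input the first three points should end up in one cluster and the last three in the other; which cluster receives label 0 depends on the random start.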

K-means has applications in various fields, such as marketing, image
segmentation, and document clustering. It helps reveal natural groupings in
data, making it a useful tool for data analysis and preprocessing. However, it
has some limitations, such as sensitivity to the initial placement of cluster
centers and the need to specify "k" beforehand. Nonetheless, it remains a
versatile and widely used method for data clustering and pattern recognition.
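
The sensitivity to initial placement can be observed directly. The sketch below is an illustration under stated assumptions: it uses synthetic data from make_blobs rather than the mall dataset, and it forces Scikit-Learn into a single run from purely random starting centers (init='random', n_init=1) so that each seed gives one unassisted result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (not the mall dataset): 300 points around 5 centers.
X_demo, _ = make_blobs(n_samples=300, centers=5, cluster_std=1.5, random_state=7)

inertias = []
for seed in range(10):
    # init='random' with n_init=1 means one run from one random starting position.
    km = KMeans(n_clusters=5, init='random', n_init=1, random_state=seed)
    km.fit(X_demo)
    inertias.append(km.inertia_)

print('WCSS per seed:', np.round(inertias, 1))
print('best:', round(min(inertias), 1), 'worst:', round(max(inertias), 1))
```

Any spread between the best and worst values reflects runs that settled into different local optima. Running several initializations and keeping the best one (the n_init parameter) or using init='k-means++' are the usual ways to reduce this effect.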
4. Code and Screenshots

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.cluster import KMeans

# loading the data from csv file to a Pandas DataFrame

customer_data = pd.read_csv('D:/ML project/ye rha tere project/Mall_Customers.csv')

# Display the first 5 rows in the dataframe

print(customer_data.head())

# finding the number of rows and columns

print(customer_data.shape)

# getting some information about the dataset
# (info() prints its summary itself and returns None, so no print() is needed)

customer_data.info()

# checking for missing values

print(customer_data.isnull().sum())

# selecting the annual income and spending score columns (positions 3 and 4)

X = customer_data.iloc[:,[3,4]].values

print(X)

# finding wcss value for different number of clusters

wcss = []

for i in range(1,11):
    # fit KMeans with i clusters and record its within-cluster sum of squares
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# plot an elbow graph

sns.set()

plt.plot(range(1,11), wcss)

plt.title('The Elbow Point Graph')

plt.xlabel('Number of Clusters')

plt.ylabel('WCSS')

plt.show()
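
The WCSS value stored in kmeans.inertia_ is the sum, over all points, of the squared distance between a point and the center of the cluster it was assigned to. The sketch below recomputes that quantity by hand and compares it with inertia_. It is only a check under assumptions: X_demo is randomly generated stand-in data, not the mall dataset, and n_init=10 is set explicitly for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for X: random (income, score) pairs, not the real data.
rng = np.random.default_rng(0)
X_demo = rng.uniform(low=[15, 1], high=[140, 100], size=(200, 2))

km = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42).fit(X_demo)

# WCSS: squared distance from each point to the center of its own cluster, summed.
assigned_centers = km.cluster_centers_[km.labels_]
wcss_manual = float(((X_demo - assigned_centers) ** 2).sum())

print(wcss_manual, km.inertia_)
```

The two printed numbers should agree up to floating-point rounding. Because adding clusters can only bring points closer to some center, WCSS keeps falling as the number of clusters grows, which is why the elbow plot looks for the point where the decrease flattens rather than for a minimum.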

kmeans = KMeans(n_clusters=5, init='k-means++', random_state=0)

# return a label for each data point based on their cluster

Y = kmeans.fit_predict(X)

print(Y)

# plotting all the clusters and their Centroids

plt.figure(figsize=(8,8))

plt.scatter(X[Y==0,0], X[Y==0,1], s=50, c='green', label='Cluster 1')

plt.scatter(X[Y==1,0], X[Y==1,1], s=50, c='red', label='Cluster 2')

plt.scatter(X[Y==2,0], X[Y==2,1], s=50, c='yellow', label='Cluster 3')

plt.scatter(X[Y==3,0], X[Y==3,1], s=50, c='violet', label='Cluster 4')

plt.scatter(X[Y==4,0], X[Y==4,1], s=50, c='blue', label='Cluster 5')

# plot the centroids

plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], s=100, c='cyan', label='Centroids')

plt.title('Customer Groups')

plt.xlabel('Annual Income')

plt.ylabel('Spending Score')

plt.show()
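
Beyond the scatter plot, the cluster labels can be summarized in a small table of cluster sizes and average values, which makes the customer groups easier to describe. The sketch below is illustrative only: the rows are randomly generated, and the column names 'Annual Income (k$)' and 'Spending Score (1-100)' are assumptions about the CSV header that the report itself does not show.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Made-up stand-in rows; the column names are assumed, not taken from the report.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'Annual Income (k$)': rng.integers(15, 140, size=200),
    'Spending Score (1-100)': rng.integers(1, 101, size=200),
})

features = demo[['Annual Income (k$)', 'Spending Score (1-100)']].to_numpy()
demo['Cluster'] = KMeans(n_clusters=5, init='k-means++', n_init=10,
                         random_state=0).fit_predict(features)

# One row per cluster: how many customers it holds and its average income/score.
profile = demo.groupby('Cluster').agg(
    customers=('Annual Income (k$)', 'size'),
    mean_income=('Annual Income (k$)', 'mean'),
    mean_score=('Spending Score (1-100)', 'mean'),
)
print(profile)
```

Selecting columns by name, as done here, is also less fragile than the positional iloc[:,[3,4]] lookup if the column order of the file ever changes.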
