Customer Segmentation Report

This document summarizes a customer segmentation project using KMeans clustering. The project uses a mall customer dataset containing customer ID, age, gender, annual income and spending score. KMeans clustering is applied to group customers, based on income and spending score, into a fixed number of clusters. The libraries used are NumPy, Pandas, Matplotlib, Seaborn and Scikit-Learn. The code generates an elbow plot to find the optimal number of clusters, performs the clustering, and plots the clusters and their centroids.

Uploaded by

dsingh1be21

Customer Segmentation

Computer Science and Engineering Department

Thapar Institute of Engineering and Technology

(Deemed to be University), Patiala – 147004

Machine Learning Project

Submitted By:

Name : Yogesh Rathee

Roll No : 102103022

Name : Jagveer Singh

Roll No : 102103024

Submitted To:

Ms. Kudratdeep Aulakh

Index
Sr. No.  Content
1.  Introduction
2.  Libraries used
3.  Algorithm(s) used
4.  Code and Screenshots

1. Introduction

1.1 Mall Customer Segmentation Data

https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python

This data set was created solely for learning customer segmentation
concepts (sometimes loosely grouped with market basket analysis, although
that term properly refers to finding items that are bought together). We
demonstrate segmentation using an unsupervised ML technique, the KMeans
clustering algorithm, in its simplest form.

1.2 Description of dataset

You own a supermarket mall and, through membership cards, you have some
basic data about your customers: customer ID, age, gender, annual income
and spending score. The spending score is a value you assign to each
customer based on parameters you define, such as customer behavior and
purchasing data.
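
To make the layout of the data concrete, here is a minimal sketch of a table with the five fields described above. The rows are invented and the column names are assumptions chosen for illustration; they are not taken from the actual CSV file.

```python
import pandas as pd

# Hypothetical rows; the column names are assumed, not read from the real file.
customers = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 52, 28],
    "Annual Income": [15, 40, 78, 120],
    "Spending Score": [39, 55, 17, 83],
})

# Select the two clustering features by name rather than by position.
features = customers[["Annual Income", "Spending Score"]].to_numpy()
print(features.shape)
```

Selecting columns by name is a little more robust than selecting by position if the column order in the file ever changes.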

2. Libraries Used:

NumPy: NumPy is a Python library for efficient numerical computation,
offering multi-dimensional array support and a wide range of
mathematical functions. It is widely used in data analysis, scientific
research, and machine learning.

Pandas: Pandas is a Python library for data manipulation and analysis,
offering DataFrames and Series for working with structured data
efficiently.

Matplotlib.pyplot: matplotlib.pyplot is the plotting interface of the
Matplotlib library, used for creating 2D data visualizations such as
plots and charts. It is a fundamental tool for data visualization in
Python.

Seaborn: Seaborn is a Python library built on top of Matplotlib for
creating appealing and informative statistical data visualizations.

Sklearn: Scikit-Learn (sklearn) is a Python library for machine learning,
offering a broad set of tools and algorithms for various tasks in data
science and artificial intelligence.

3. Algorithm(s) Used

K-means clustering: K-means clustering is a popular unsupervised machine
learning algorithm. Its main task is to group data into a fixed number of
clusters, often referred to as "k." Clusters are formed based on the
similarity between data points, which aids data segmentation and
organization.

The algorithm operates iteratively. Initially, it places "k" cluster centers
randomly within the data space. Each data point is then assigned to the
nearest cluster center, typically using Euclidean distance. The cluster
centers are then recalculated as the mean of their assigned data points.
This process repeats until the cluster assignments and centers no longer
change significantly. (The code in Section 4 uses the k-means++
initialization, which spreads the initial centers apart instead of choosing
them purely at random.)
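
The iteration described above can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions (random initial centers drawn from the data, a fixed iteration cap, made-up two-group data); the function name kmeans_sketch is hypothetical, and the project itself relies on Scikit-Learn's KMeans rather than this code.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Plain k-means: random initial centers, then assign/update until stable."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated synthetic groups (made-up data, not the mall dataset).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[20.0, 20.0], scale=2.0, size=(30, 2))
group_b = rng.normal(loc=[80.0, 80.0], scale=2.0, size=(30, 2))
points = np.vstack([group_a, group_b])

labels, centers = kmeans_sketch(points, k=2)
print(centers)
```

With groups this far apart, the two centers should settle near the two group means after a few iterations.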

K-means has applications in various fields, such as marketing, image
segmentation, and document clustering. It is useful for revealing natural
groupings in data, which makes it a valuable tool for data analysis and
preprocessing. However, it has some limitations, such as sensitivity to the
initial placement of cluster centers and the need to specify "k" beforehand.
Nonetheless, it remains a versatile method for data clustering and pattern
recognition.
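
The sensitivity to initial placement can be observed directly. The sketch below, which uses synthetic data from make_blobs as a stand-in for the mall dataset, runs KMeans once per seed with purely random initial centers and records the within-cluster sum of squares of each run. The data, the seeds and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (not the mall dataset).
X_demo, _ = make_blobs(n_samples=300, centers=5, cluster_std=1.5, random_state=7)

# One run per seed, each starting from purely random initial centers.
inertias = []
for seed in range(10):
    model = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed)
    model.fit(X_demo)
    inertias.append(model.inertia_)

print("WCSS per seed:", np.round(inertias, 1))
print("best of 10 restarts:", round(min(inertias), 1))
```

Keeping the run with the lowest value is the usual remedy; Scikit-Learn's n_init parameter performs this kind of restart internally and keeps the best result.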
4. Code and Screenshots

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

# load the data from the CSV file into a Pandas DataFrame
customer_data = pd.read_csv('D:/ML project/ye rha tere project/Mall_Customers.csv')

# display the first 5 rows of the DataFrame
print(customer_data.head())

# find the number of rows and columns
print(customer_data.shape)

# get some information about the dataset
# (info() prints its report itself and returns None, so it is not wrapped in print)
customer_data.info()

# check for missing values
print(customer_data.isnull().sum())

# select the columns at positions 3 and 4 (annual income and spending score)
X = customer_data.iloc[:, [3, 4]].values
print(X)

# find the WCSS (within-cluster sum of squares) for different numbers of clusters
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    # inertia_ holds the WCSS of the fitted model
    wcss.append(kmeans.inertia_)

# plot an elbow graph
sns.set()
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Point Graph')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
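
The WCSS plotted in the elbow graph is the sum, over all points, of the squared Euclidean distance between a point and the center of the cluster it is assigned to. As a hedged check, the sketch below recomputes that sum by hand on synthetic data (an assumption, not the mall dataset) and compares it with the inertia_ attribute of the fitted model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (not the mall dataset).
X_demo, _ = make_blobs(n_samples=200, centers=4, random_state=3)
model = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42).fit(X_demo)

# Center of the cluster each point was assigned to, one row per point.
assigned_centers = model.cluster_centers_[model.labels_]

# Sum of squared distances from every point to its own cluster center.
wcss_manual = float(((X_demo - assigned_centers) ** 2).sum())

print(wcss_manual, model.inertia_)
```

The two printed values should agree up to floating-point rounding. Because WCSS always decreases as the number of clusters grows, the elbow graph looks for the point where the decrease slows down rather than for a minimum.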

# train the model with 5 clusters
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=0)

# return a label for each data point based on its cluster
Y = kmeans.fit_predict(X)
print(Y)

# plot all the clusters and their centroids
plt.figure(figsize=(8, 8))
plt.scatter(X[Y==0,0], X[Y==0,1], s=50, c='green', label='Cluster 1')
plt.scatter(X[Y==1,0], X[Y==1,1], s=50, c='red', label='Cluster 2')
plt.scatter(X[Y==2,0], X[Y==2,1], s=50, c='yellow', label='Cluster 3')
plt.scatter(X[Y==3,0], X[Y==3,1], s=50, c='violet', label='Cluster 4')
plt.scatter(X[Y==4,0], X[Y==4,1], s=50, c='blue', label='Cluster 5')

# plot the centroids
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], s=100, c='cyan', label='Centroids')

plt.title('Customer Groups')
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.legend()  # show the cluster and centroid labels
plt.show()
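
Once the clusters are fitted, two natural follow-up steps are to summarize each group and to place a new customer into one of them. The sketch below shows one possible way to do both. It uses synthetic (income, spending score) pairs instead of the real file, and the column names and the new customer's values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic (income, spending score) pairs standing in for X.
X_demo, _ = make_blobs(n_samples=200, centers=5, cluster_std=4.0,
                       center_box=(10.0, 90.0), random_state=0)
model = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
labels = model.fit_predict(X_demo)

# Size and average position of each cluster.
summary = (
    pd.DataFrame(X_demo, columns=["income", "spending"])
    .assign(cluster=labels)
    .groupby("cluster")
    .agg(size=("income", "size"),
         mean_income=("income", "mean"),
         mean_spending=("spending", "mean"))
)
print(summary)

# Assign a new, hypothetical customer to the nearest centroid.
new_customer = np.array([[60.0, 50.0]])
predicted = model.predict(new_customer)[0]
print(predicted)
```

The cluster numbers themselves carry no meaning; what matters for segmentation is the average income and spending score of each group.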
