K Means Clustering Customer Clustering

Uploaded by

SubhransuSekharSahoo


28/09/2023, 18:14 k-means-customer-clustering

Import Libraries and Datasets

In [31]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [32]: df = pd.read_csv('/kaggle/input/customer-clustering/segmentation data.csv')

In [33]: df

Out[33]:
             ID  Sex  Marital status  Age  Education  Income  Occupation  Settlement size
0     100000001    0               0   67          2  124670           1                2
1     100000002    1               1   22          1  150773           1                2
2     100000003    0               0   49          1   89210           0                0
3     100000004    0               0   45          1  171565           1                1
4     100000005    0               0   53          1  149031           1                1
...         ...  ...             ...  ...        ...     ...         ...              ...
1995  100001996    1               0   47          1  123525           0                0
1996  100001997    1               1   27          1  117744           1                0
1997  100001998    0               0   31          0   86400           0                0
1998  100001999    1               1   24          1   97968           0                0
1999  100002000    0               0   25          0   68416           0                0

2000 rows × 8 columns

Data Pre-Processing
In [34]: df.isna().sum()

Out[34]: ID 0
Sex 0
Marital status 0
Age 0
Education 0
Income 0
Occupation 0
Settlement size 0
dtype: int64


In [35]: df.duplicated().sum()

Out[35]: 0

In [36]: df.shape

Out[36]: (2000, 8)

In [37]: df.head()

Out[37]:
          ID  Sex  Marital status  Age  Education  Income  Occupation  Settlement size
0  100000001    0               0   67          2  124670           1                2
1  100000002    1               1   22          1  150773           1                2
2  100000003    0               0   49          1   89210           0                0
3  100000004    0               0   45          1  171565           1                1
4  100000005    0               0   53          1  149031           1                1

In [38]: df.tail()

Out[38]:
             ID  Sex  Marital status  Age  Education  Income  Occupation  Settlement size
1995  100001996    1               0   47          1  123525           0                0
1996  100001997    1               1   27          1  117744           1                0
1997  100001998    0               0   31          0   86400           0                0
1998  100001999    1               1   24          1   97968           0                0
1999  100002000    0               0   25          0   68416           0                0

In [39]: df.corr()

Out[39]:
                        ID       Sex  Marital status       Age  Education    Income  Occupation
ID                1.000000  0.328262        0.074403 -0.085246   0.012543 -0.303217   -0.291958
Sex               0.328262  1.000000        0.566511 -0.182885   0.244838 -0.195146   -0.202491
Marital status    0.074403  0.566511        1.000000 -0.213178   0.374017 -0.073528   -0.029490
Age              -0.085246 -0.182885       -0.213178  1.000000   0.654605  0.340610    0.108388
Education         0.012543  0.244838        0.374017  0.654605   1.000000  0.233459    0.064524
Income           -0.303217 -0.195146       -0.073528  0.340610   0.233459  1.000000    0.680357
Occupation       -0.291958 -0.202491       -0.029490  0.108388   0.064524  0.680357    1.000000
Settlement size  -0.378445 -0.300803       -0.097041  0.119751   0.034732  0.490881    0.571795

In [40]: df.drop('ID', axis=1, inplace=True)

In [41]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sex 2000 non-null int64
1 Marital status 2000 non-null int64
2 Age 2000 non-null int64
3 Education 2000 non-null int64
4 Income 2000 non-null int64
5 Occupation 2000 non-null int64
6 Settlement size 2000 non-null int64
dtypes: int64(7)
memory usage: 109.5 KB

Data Visualization

In [42]: df.hist(figsize =(10,10))

plt.show()

In [43]: plt.figure(figsize=(15,6))
plt.scatter(df["Age"],df["Income"])

plt.xlabel('Age')
plt.ylabel('Income')
plt.show()

In [44]: plt.figure(figsize=(15,6))
plt.scatter(df["Occupation"],df["Income"])

plt.xlabel('Occupation')
plt.ylabel('Income')
plt.show()

In [45]: sns.boxplot(data=df)

Out[45]: <Axes: >
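The columns sit on very different scales: Income runs into the tens of thousands, while Sex, Education and Settlement size are small integer codes. K-means in scikit-learn minimizes squared Euclidean distance, so on unscaled data the Income column is likely to dominate the cluster assignment. The sketch below is not part of the original notebook; it shows one possible preprocessing step, standardization with `StandardScaler`, applied to the Age, Income and Settlement size values of the first four rows shown earlier. Whether scaling is appropriate here is a modelling choice.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age, Income, Settlement size for the first four rows of the table above.
X = np.array([
    [67, 124670, 2],
    [22, 150773, 2],
    [49, 89210, 0],
    [45, 171565, 1],
], dtype=float)

# Rescale every column to zero mean and unit variance (population std).
X_scaled = StandardScaler().fit_transform(X)

print("raw std per column:   ", X.std(axis=0))
print("scaled std per column:", X_scaled.std(axis=0))
```

After this step each column contributes on a comparable scale to the distances K-means computes; the scaled array would be passed to `fit_predict` in place of the raw frame.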

Training the K-means algorithm on the training dataset
In [47]: from sklearn.cluster import KMeans

In [48]: #training the K-means model on a dataset

kmeans = KMeans(n_clusters=5, init='k-means++', random_state= 42)
y_predict= kmeans.fit_predict(df)

/opt/conda/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
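As the warning text says, passing `n_init` explicitly avoids depending on the changing default. A minimal sketch, using synthetic two-blob data as a stand-in for the customer table (the data and the names `X`, `model` and `labels` are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the customer data: two well-separated blobs.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# n_init=10 is the old default named in the warning; stating it explicitly
# keeps the behaviour the same while removing the reliance on the default.
model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=42)
labels = model.fit_predict(X)

print(np.bincount(labels))  # number of points assigned to each cluster
```

In the notebook, the same keyword could be added to the existing `KMeans(n_clusters=5, ...)` call.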

In [51]: y_predict

Out[51]: array([4, 1, 2, ..., 2, 0, 2], dtype=int32)
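The notebook fixes `n_clusters=5` without showing how that value was chosen. One common heuristic is the elbow method: fit the model for several values of k and look for the point where `inertia_` (the within-cluster sum of squared distances) stops dropping sharply. The sketch below uses synthetic data with three groups, not the customer table, so the numbers it prints say nothing about the right k for this dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data with three well-separated groups (illustration only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(40, 2))
    for center in (0.0, 4.0, 8.0)
])

# Fit one model per candidate k and record its inertia.
ks = range(1, 7)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    model.fit(X)
    inertias.append(model.inertia_)

for k, value in zip(ks, inertias):
    print(k, round(value, 2))
```

Plotting `inertias` against `ks` with `plt.plot` gives the usual elbow curve; on this toy data the bend should appear at k = 3.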

In [52]: df

Out[52]:
Sex Marital status Age Education Income Occupation Settlement size

0 0 0 67 2 124670 1 2

1 1 1 22 1 150773 1 2

2 0 0 49 1 89210 0 0

3 0 0 45 1 171565 1 1

4 0 0 53 1 149031 1 1

... ... ... ... ... ... ... ...

1995 1 0 47 1 123525 0 0

1996 1 1 27 1 117744 1 0

1997 0 0 31 0 86400 0 0

1998 1 1 24 1 97968 0 0

1999 0 0 25 0 68416 0 0

2000 rows × 7 columns

Visualizing the Clusters

In [55]: df['Age'][y_predict == 0]

Out[55]: 10 25
12 22
14 28
20 48
24 26
..
1989 25
1992 51
1994 45
1996 27
1998 24
Name: Age, Length: 719, dtype: int64
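The `Length: 719` in the output above is the number of rows assigned to label 0. To see the size of every cluster at once, `np.bincount` can be applied to the label array. The short array below is a hypothetical stand-in for `y_predict`:

```python
import numpy as np

# Hypothetical labels with the same form as y_predict (values 0..4).
labels = np.array([4, 1, 2, 2, 0, 2])

# minlength=5 keeps a slot for every cluster, even an empty one.
sizes = np.bincount(labels, minlength=5)
print(sizes)  # [1 1 3 0 1]
```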

In [57]: # visualizing the clusters (Age vs. Income)

colors = ['blue', 'green', 'red', 'cyan', 'magenta']
for k, color in enumerate(colors):
    mask = y_predict == k
    plt.scatter(df['Age'][mask], df['Income'][mask], s=100,
                c=color, label=f'Cluster {k + 1}')

# cluster_centers_ has one column per feature, in df's column order.
# Columns 0 and 1 are Sex and Marital status, so look up Age and Income
# by name to plot the centroids on the same axes as the points.
age_idx = df.columns.get_loc('Age')
income_idx = df.columns.get_loc('Income')
plt.scatter(kmeans.cluster_centers_[:, age_idx],
            kmeans.cluster_centers_[:, income_idx],
            s=300, c='yellow', label='Centroid')

plt.title('Clusters of customers')
plt.xlabel('Age of Customers')
plt.ylabel('Incomes of Customers')
plt.legend()
plt.show()
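Each row of `cluster_centers_` is one centroid and each column is one feature, in the order of the columns the model was fitted on. Wrapping the array in a DataFrame with those column names makes it easier to pick the right coordinates for a plot. A small sketch with a made-up frame (`toy`, `km` and `centers` are illustrative names, and the values are not from the dataset):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Tiny made-up frame: two obvious groups (younger/lower vs. older/higher).
toy = pd.DataFrame({
    "Sex": [0, 1, 0, 1, 0, 1],
    "Age": [25, 27, 24, 60, 62, 65],
    "Income": [50, 52, 48, 150, 155, 160],
})

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy)

# One row per centroid, one named column per feature.
centers = pd.DataFrame(km.cluster_centers_, columns=toy.columns)
print(centers[["Age", "Income"]])
```

With the notebook's objects, the equivalent would be `pd.DataFrame(kmeans.cluster_centers_, columns=df.columns)`, after which `centers['Age']` and `centers['Income']` give the centroid coordinates for the Age/Income scatter plot.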
