K Means Clustering Customer Clustering
In [33]: df
Out[33]:
          ID  Sex  Marital status  Age  Education  Income  Occupation  Settlement size
0  100000001    0               0   67          2  124670           1                2
1  100000002    1               1   22          1  150773           1                2
2  100000003    0               0   49          1   89210           0                0
3  100000004    0               0   45          1  171565           1                1
4  100000005    0               0   53          1  149031           1                1
Data Pre-Processing
In [34]: df.isna().sum()
Out[34]: ID 0
Sex 0
Marital status 0
Age 0
Education 0
Income 0
Occupation 0
Settlement size 0
dtype: int64
https://htmtopdf.herokuapp.com/ipynbviewer/temp/8dc8485bd9d6596905d45321750d7a5b/k-means-customer-clustering.html?t=1695905086525 1/7
28/09/2023, 18:14 k-means-customer-clustering
In [35]: df.duplicated().sum()
Out[35]: 0
In [36]: df.shape
Out[36]: (2000, 8)
In [37]: df.head()
Out[37]:
          ID  Sex  Marital status  Age  Education  Income  Occupation  Settlement size
0  100000001    0               0   67          2  124670           1                2
1  100000002    1               1   22          1  150773           1                2
2  100000003    0               0   49          1   89210           0                0
3  100000004    0               0   45          1  171565           1                1
4  100000005    0               0   53          1  149031           1                1
In [38]: df.tail()
Out[38]:
          ID  Sex  Marital status  Age  Education  Income  Occupation  Settlement size
In [39]: df.corr()
Out[39]:
                        ID        Sex  Marital status        Age  Education     Income  Occupation
Marital status    0.074403   0.566511        1.000000  -0.213178   0.374017  -0.073528   -0.029490
Settlement size  -0.378445  -0.300803       -0.097041   0.119751   0.034732   0.490881    0.571795
In [41]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sex 2000 non-null int64
1 Marital status 2000 non-null int64
2 Age 2000 non-null int64
3 Education 2000 non-null int64
4 Income 2000 non-null int64
5 Occupation 2000 non-null int64
6 Settlement size 2000 non-null int64
dtypes: int64(7)
memory usage: 109.5 KB
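The `df.info()` output above lists seven columns and no `ID`, so the identifier column was presumably dropped in a cell that did not survive the export. A minimal sketch of that step, assuming `DataFrame.drop` was used; the `sample` frame is a hypothetical stand-in built from the first three rows shown earlier.

```python
import pandas as pd

# Stand-in frame with the first three rows of the dataset shown above.
sample = pd.DataFrame({
    "ID": [100000001, 100000002, 100000003],
    "Sex": [0, 1, 0],
    "Marital status": [0, 1, 0],
    "Age": [67, 22, 49],
    "Education": [2, 1, 1],
    "Income": [124670, 150773, 89210],
    "Occupation": [1, 1, 0],
    "Settlement size": [2, 2, 0],
})

# ID only identifies a row; it is not a customer attribute, so it should
# not contribute to distances between customers during clustering.
features = sample.drop(columns=["ID"])
print(features.columns.tolist())
```

Returning a new frame rather than dropping in place keeps the original data available for later lookups by `ID`.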
Data Visualization
In [43]: plt.figure(figsize=(15,6))
plt.scatter(df["Age"],df["Income"])
plt.xlabel('Age')
plt.ylabel('Income')
plt.show()
In [44]: plt.figure(figsize=(15,6))
plt.scatter(df["Occupation"],df["Income"])
plt.xlabel('Occupation')
plt.ylabel('Income')
plt.show()
In [45]: sns.boxplot(df)
/opt/conda/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
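The cell that fitted the model is missing from the export, but the warning above comes from calling `KMeans` without `n_init`. A minimal sketch of a fit that sets it explicitly; the synthetic Age/Income-like data, `n_clusters=2`, and `random_state=42` are assumptions for illustration, not the notebook's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for two (Age, Income) groups; not the real dataset.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[30, 90_000], scale=[5, 8_000], size=(50, 2)),
    rng.normal(loc=[55, 160_000], scale=[5, 8_000], size=(50, 2)),
])

# Passing n_init explicitly avoids the FutureWarning; 10 is the old default.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
y_predict = kmeans.fit_predict(X)  # one cluster label per row of X

print(kmeans.cluster_centers_)
```

Because Income is several orders of magnitude larger than Age, the unscaled distances here are driven almost entirely by Income.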
In [51]: y_predict
In [52]: df
Out[52]:
      Sex  Marital status  Age  Education  Income  Occupation  Settlement size
0       0               0   67          2  124670           1                2
1       1               1   22          1  150773           1                2
2       0               0   49          1   89210           0                0
3       0               0   45          1  171565           1                1
4       0               0   53          1  149031           1                1
...   ...             ...  ...        ...     ...         ...              ...
1995    1               0   47          1  123525           0                0
1996    1               1   27          1  117744           1                0
1997    0               0   31          0   86400           0                0
1998    1               1   24          1   97968           0                0
1999    0               0   25          0   68416           0                0
Out[55]: 10 25
12 22
14 28
20 48
24 26
..
1989 25
1992 51
1994 45
1996 27
1998 24
Name: Age, Length: 719, dtype: int64
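The input for `Out[55]` is missing, but a 719-row `Age` series with a non-contiguous index is what selecting one cluster's rows with a boolean mask would produce. A minimal sketch of that selection; the `labels` array is hypothetical, and the five rows are the ones shown at the top of the notebook.

```python
import numpy as np
import pandas as pd

customers = pd.DataFrame({
    "Age": [67, 22, 49, 45, 53],
    "Income": [124670, 150773, 89210, 171565, 149031],
})

# Hypothetical cluster labels, one per row (in the notebook these would
# come from the fitted model, e.g. y_predict).
labels = np.array([1, 0, 1, 1, 0])

# Boolean mask keeps only the rows assigned to cluster 0; the original
# index is preserved, which is why the result's index has gaps.
cluster0_age = customers.loc[labels == 0, "Age"]
print(cluster0_age)
```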
plt.title('Clusters of customers')
plt.xlabel('Age of Customers')
plt.ylabel('Incomes of Customers')
plt.legend()
plt.show()
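Only the tail of the final plotting cell survived, so the scatter calls that feed `plt.legend()` are not shown. A sketch of how such a plot is commonly built, one scatter per cluster plus the centroids; the helper `plot_clusters`, the sample points, and the labels are all hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_clusters(X, labels, centers):
    """Scatter (Age, Income) points colored by cluster, with centroids."""
    fig, ax = plt.subplots(figsize=(15, 6))
    for k in np.unique(labels):
        mask = labels == k
        # One scatter call per cluster so each gets its own legend entry.
        ax.scatter(X[mask, 0], X[mask, 1], label=f"Cluster {k}")
    ax.scatter(centers[:, 0], centers[:, 1], c="black", marker="x",
               s=100, label="Centroids")
    ax.set_title("Clusters of customers")
    ax.set_xlabel("Age of Customers")
    ax.set_ylabel("Incomes of Customers")
    ax.legend()
    return ax

# Hypothetical points and labels standing in for the clustered data.
X = np.array([[67, 124670], [22, 150773], [49, 89210],
              [45, 171565], [53, 149031]], dtype=float)
labels = np.array([0, 1, 0, 1, 1])
centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

ax = plot_clusters(X, labels, centers)
```

In a notebook, a following `plt.show()` renders the figure as in the original cell.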