SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community

The document describes using the K-Means algorithm in SAP HANA to perform customer segmentation on telco customer data. It shows how to execute the K-Means procedure using SAP HANA PAL, drop and create the necessary tables, and call the PAL K-Means procedure. It then discusses using the elbow criterion method to determine the optimal number of clusters (K) by running K-Means with different values of K and plotting the total intra-cluster distance. Based on the elbow point in the graph, 3 clusters were determined to be best. The K-Means procedure was then rerun with K=3 and the results showed how customers were assigned to the 3 clusters.

Uploaded by jefferyleclerc
Copyright © All Rights Reserved

3/13/24, 3:53 PM SAP HANA PAL – K-Means Algorithm or How to do Cust... - SAP Community

/* Output table that receives the cluster centers (one row per center) */
DROP TABLE PAL_KMEANS_CENTERS_TAB_TELCO;

CREATE COLUMN TABLE PAL_KMEANS_CENTERS_TAB_TELCO(
    "CENTER_ID" INT,
    "V000" DOUBLE,
    "V001" DOUBLE,
    "V002" DOUBLE,
    "V003" DOUBLE,
    "V004" DOUBLE,
    "V005" DOUBLE,
    "V006" DOUBLE,
    "V007" DOUBLE
);

/* Execute the K-Means procedure */

CALL PAL_KMEANS_TELCO(TELCO, PAL_CONTROL_TAB_TELCO, PAL_KMEANS_RESASSIGN_TAB_TELCO,
    PAL_KMEANS_CENTERS_TAB_TELCO) with overview;

Pretty easy, huh?
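For readers who want to see what happens inside that call, here is a minimal sketch of the standard K-Means loop (Lloyd's algorithm) in Python. It is a generic illustration, not PAL's implementation: PAL may use different initialization and stopping rules, whereas this sketch simply seeds the centers with the first k rows and stops when no point changes cluster. Its two return values roughly correspond to the assignment table (PAL_KMEANS_RESASSIGN_TAB_TELCO) and the centers table (PAL_KMEANS_CENTERS_TAB_TELCO). The function name `kmeans` and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper: Lloyd's K-Means.

    Returns (assignments, centers): assignments[i] is the cluster index of
    points[i]; centers[c] is the mean of the points assigned to cluster c.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simple deterministic seeding: the first k points become the initial centers.
    centers = [list(p) for p in points[:k]]
    assignments = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers


# Hypothetical 2-D sample: two obvious groups of three points each.
points = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9], [0.9, 1.1], [7.8, 8.1]]
assignments, centers = kmeans(points, 2)
print(assignments)  # [0, 0, 1, 1, 0, 1]
```

The real data in this post has eight attributes (V000 to V007) per customer; the same loop works for any dimension because the distance and mean are computed column by column.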

Identify the Right Number of Clusters

OK, I have my code ready, but I’m missing a very important part: I still don’t know how many clusters (K) to specify as the input parameter. Well, I do know, because I created the sample data, but let’s pretend I don’t. There are multiple techniques to find out how many groups will produce the best clustering; in this case I will use the elbow criterion. The elbow criterion is a common rule of thumb that says one should choose a number of clusters such that adding another cluster does not add sufficient information. I will run the code above with different numbers of clusters, and for each run I will measure the total intra-cluster distance. When the distance no longer decreases much from one run to the next, I will know how many groups to use. I built the chart below with the results:
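The measurement loop described in the paragraph above can be sketched in a few lines of Python, under stated assumptions. `total_intra_cluster_distance` sums the Euclidean distance from each point to its assigned center (other distance measures are possible, and some tools report squared distances instead). `elbow_k` is a hypothetical rule of thumb that returns the first K after which the next drop in total distance is smaller than a chosen fraction (`ratio`) of the drop that led to it. The author reads the elbow off a chart; this automated rule is only one of several ways to formalize that judgment, and both function names are made up for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def total_intra_cluster_distance(points: List[Sequence[float]],
                                 assignments: List[int],
                                 centers: List[Sequence[float]]) -> float:
    """Sum of Euclidean distances from each point to its assigned center."""
    return sum(math.dist(p, centers[a]) for p, a in zip(points, assignments))


def elbow_k(totals: Dict[int, float], ratio: float = 0.3) -> Optional[int]:
    """Hypothetical elbow rule: return the first K whose following drop
    (totals[K] minus the total at the next K tried) is below `ratio` times
    the drop into K. Returns None if no such K exists."""
    ks = sorted(totals)
    if len(ks) < 3:
        raise ValueError("need totals for at least three values of K")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        drop_in = totals[prev] - totals[k]
        drop_out = totals[k] - totals[nxt]
        if drop_in > 0 and drop_out < ratio * drop_in:
            return k
    return None  # no clear elbow among the K values tried


# Made-up totals for illustration only (not the author's measurements):
# a big drop from 2 to 3 clusters, then only small improvements.
example_totals = {1: 100.0, 2: 60.0, 3: 20.0, 4: 17.0, 5: 15.0}
print(elbow_k(example_totals))  # 3
```

In practice, `totals` would be filled by running the clustering once per candidate K and recording the total distance for each run; the `ratio` threshold is a tuning choice, and plotting the totals remains a useful sanity check.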

As you can see, the distance drops dramatically between 2 and 3 clusters; after 3 it keeps going down, but on a much smaller scale. So the “elbow” is clearly at 3 clusters, which means I should use K = 3. Now I’m going to run the algorithm again with the right number of clusters. This is the result:

The first column is the customer ID and the second column is the cluster assigned to that customer. So, based on how customers use their mobile phones, the K-Means algorithm clustered my customers as follows:

Customer ID 1 thru 10 --> Cluster 2
Customer ID 10001 thru 10010 --> Cluster 1
Customer ID 20001 thru 20010 --> Cluster 0
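A summary like this can be derived mechanically from the assignment rows. The sketch below is a hypothetical Python helper, assuming the result has been fetched as (customer ID, cluster) pairs: it sorts the rows by ID, collapses consecutive rows that share a cluster into ranges, and counts customers per cluster. The sample rows are invented to mimic the pattern above. Keep in mind that in K-Means the cluster numbers are arbitrary labels: a different initialization can permute them, so "Cluster 0" means nothing beyond "these customers belong together".

```python
from collections import Counter
from itertools import groupby
from typing import List, Tuple


def summarize_ranges(rows: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Collapse (customer_id, cluster) rows into (first_id, last_id, cluster)
    runs of consecutive rows (in ID order) that share the same cluster."""
    runs: List[Tuple[int, int, int]] = []
    for cluster, group in groupby(sorted(rows), key=lambda row: row[1]):
        ids = [customer_id for customer_id, _ in group]
        runs.append((ids[0], ids[-1], cluster))
    return runs


# Invented sample rows (customer_id, cluster); not the author's data.
rows = [(20001, 0), (1, 2), (2, 2), (10001, 1), (10002, 1), (20002, 0)]
for first, last, cluster in summarize_ranges(rows):
    print(f"Customer ID {first} thru {last} --> Cluster {cluster}")
print(Counter(cluster for _, cluster in rows))  # customers per cluster
```

A run only means the rows are adjacent in ID order, not that every integer in between exists; with this post's data, IDs 1 to 10 and 10001 to 10010 would have merged into one run had they landed in the same cluster.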
