SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-1Y

The document discusses using a K-Means algorithm to perform customer segmentation on telco data. It describes creating sample customer data with 30 rows grouped into 3 segments. It then shows the code used to define the input and output table types for K-Means in SAP HANA, load the sample data, call the K-Means procedure, and determine the optimal number of clusters using the elbow method by running K-Means with different numbers of clusters and measuring the total intra-cluster distance.

Uploaded by

jefferyleclerc

3/14/24, 10:31 AM SAP HANA PAL – K-Means Algorithm or How to do Cust... - SAP Community

Each row in this table represents a unique customer. I needed to fill it, but I do not have access to real data, so I
built my own dataset: 30 different customers (30 rows) that can be grouped into 3 segments:

Segment 1: Customer IDs 1 through 10. Customers in this segment usually make short calls and originate or receive
a low number of calls. They call more in the evening and during the weekend, and more often to mobile lines.
They send and receive a fair number of SMS messages. This segment could represent personal mobile users.
Segment 2: Customer IDs 10001 through 10010. Customers in this segment have an average call duration and
originate or receive an average number of calls. They usually call during business hours and on weekdays. They
send or receive a small number of SMS messages. This segment could represent small business users.
Segment 3: Customer IDs 20001 through 20010. Customers in this segment usually make long calls. They
usually call during business hours and on weekdays. They usually call mobile lines and they use SMS heavily.
This segment could represent enterprise business users.
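
As a rough illustration of how a dataset with this shape could be put together, here is a minimal Python sketch. It is not the data from the post: the two features (average call minutes and SMS per month), the profile values, and the names `SEGMENT_PROFILES` and `build_customers` are all assumptions made up for the example. Only the ID ranges and the "10 customers per segment" layout come from the description above.

```python
import random

# Hypothetical features and profile values, chosen only to mirror the segment
# descriptions; they are NOT the values from the original table.
# first customer ID -> (avg call minutes, SMS per month)
SEGMENT_PROFILES = {
    1: (2.0, 150.0),       # Segment 1: short calls, fair amount of SMS
    10001: (5.0, 20.0),    # Segment 2: average calls, few SMS
    20001: (12.0, 400.0),  # Segment 3: long calls, heavy SMS use
}


def build_customers(per_segment=10, jitter=0.1, seed=42):
    """Return rows of (customer_id, avg_call_minutes, sms_per_month)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    rows = []
    for first_id, profile in SEGMENT_PROFILES.items():
        for offset in range(per_segment):
            # Scatter each customer within +/- jitter of the segment profile.
            features = tuple(v * (1 + rng.uniform(-jitter, jitter)) for v in profile)
            rows.append((first_id + offset, *features))
    return rows


customers = build_customers()
print(len(customers), customers[0][0], customers[-1][0])
```

Keeping the per-segment scatter small relative to the gap between profiles is what makes the three groups easy for a clustering algorithm to recover.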

The resulting table looks like this:

https://community.sap.com/t5/technology-blogs-by-members/sap-hana-pal-k-means-algorithm-or-how-to-do-customer-segmentation-for-the/ba-p/12976696/page/2


/* ... and will point the AFL Wrapper Generator to the different
   table types that we just created */

DROP TABLE PDATA_TELCO;

CREATE COLUMN TABLE PDATA_TELCO(
    "ID" INT,
    "TYPENAME" VARCHAR(100),
    "DIRECTION" VARCHAR(100)
);

/* Fill the table */
INSERT INTO PDATA_TELCO VALUES (1, '_SYS_AFL.PAL_KMEANS_DATA_TELCO', 'in');
INSERT INTO PDATA_TELCO VALUES (2, '_SYS_AFL.PAL_CONTROL_TELCO', 'in');
INSERT INTO PDATA_TELCO VALUES (3, '_SYS_AFL.PAL_KMEANS_RESASSIGN_TELCO', 'out');
INSERT INTO PDATA_TELCO VALUES (4, '_SYS_AFL.PAL_KMEANS_CENTERS_TELCO', 'out');

/* Creates the KMeans procedure that executes the KMeans Algorithm */

CALL PAL_KMEANS_TELCO(TELCO, PAL_CONTROL_TAB_TELCO, PAL_KMEANS_RESASSIGN_TAB_TELCO,
    PAL_KMEANS_CENTERS_TAB_TELCO) WITH OVERVIEW;

Pretty easy, huh?

Identify the Right Number of Clusters

OK, my code is ready, but I’m missing a very important part: I still don’t know what value of K to specify as the
input parameter (well, I do know, because I created the sample data, but let’s pretend I don’t). There are multiple
techniques for finding the number of groups that produces the best clustering; in this case I will use the Elbow Criterion. The
elbow criterion is a common rule of thumb that says one should choose a number of clusters such that adding another
cluster does not add sufficient information. I will run the code above with different numbers of clusters, and for each run
I will measure the total intra-cluster distance. When the distance no longer decreases much from one run to the next, I will
know the number of groups I need to use. I built the chart below with the results:
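
To make the elbow procedure concrete, here is a minimal sketch in plain Python. It is not the PAL procedure and does not claim to match its internals. It assumes that "total intra-cluster distance" means the sum of Euclidean distances from each point to its cluster center, it swaps in a simple deterministic farthest-point initialization so that runs are repeatable, and it picks the elbow as the K with the largest drop-off in improvement. The toy `points` and all function names are made up for the example.

```python
import math


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from all centers chosen so far (an assumption, not PAL's method)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans_total_distance(points, k, max_iterations=100):
    """Run plain Lloyd-style k-means and return the total intra-cluster distance."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def elbow_curve(points, max_k):
    """Total intra-cluster distance for K = 1 .. max_k."""
    return {k: kmeans_total_distance(points, k) for k in range(1, max_k + 1)}


def pick_elbow(curve):
    """Return the K where the improvement drops off most sharply."""
    ks = sorted(curve)

    def bend(k):
        return (curve[k - 1] - curve[k]) - (curve[k] - curve[k + 1])

    return max(ks[1:-1], key=bend)


# Toy data: three tight groups of five 2-D points each (illustrative only).
OFFSETS = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
GROUP_CENTERS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in GROUP_CENTERS for dx, dy in OFFSETS]

curve = elbow_curve(points, max_k=6)
for k, distance in curve.items():
    print(k, round(distance, 2))
print("elbow at K =", pick_elbow(curve))
```

On this toy data the distance falls steeply up to K = 3 and barely changes afterwards, so the sketch reports 3 as the elbow. On real data the bend is usually less sharp, and reading it off a chart, as done here, remains a judgment call.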
