Mini Project II Instructions Segmentation and Regression
Mahima Hada
2. Purchase data is typically available in long format: multiple rows for each customer. This
format is ideal for Tableau (recall the survey data, which you had to convert to long
format for Tableau). You can read the Excel sheet into Tableau; as the data has a number
of records per customer, you should be able to see the following data (note that this is
actually frequency data…)
1 https://searchdatamanagement.techtarget.com/definition/RFM-analysis
Prof. Mahima Hada
3. In Tableau, explore multiple clustering solutions (the online videos given for Tableau
clustering will teach you how to do that). Multiple clustering solutions are
possible, based on the variables you pick (but make sure you include at least some of the
RFM variables). Some examples of clustering solutions are shown below (these are just
examples; you can likely do better):
4. Save the clustering solutions you deem “reasonable” as variables in the data sheet in
Tableau (the online videos given for Tableau clustering will teach you how to do that). At
the end of your clustering exercise, your data source should include the clusters as fields,
something like the example given below. As you can see, the data is still in long format, so
“cluster 1” (for one of the solutions) and “cluster 3” (for the second clustering solution)
are repeated for each purchasing record for Customer 1.
5. Export the above data into a .csv file. Now you can analyze it in Excel.
6. To see which clustering solution predicts customers’ purchases better
(and so choose between the different clustering solutions), you need to do a regression
analysis. Note that the Tableau data is in long format, in which each customer has
multiple entries. If you put this data into a regression, the regression analysis will treat
each row as a separate data point and ignore the fact that a group of records belongs to one
customer. Therefore, you need to transpose this data into a “wide” format (one row per
customer, with the information that was spread across that customer’s rows held in multiple columns).
7. Before you transpose the data, think about which variables you want. This is the stage at
which you add variables that aggregate a customer’s multiple purchases into one variable.
For example, you have “Size” (i.e., purchase size in $) as a purchase-level variable – you
can aggregate it for each customer as Average (average money customer spends in each
purchase), Total (total amount of money spent by customer over all purchases), and/or
most recent amount paid.
8. At this point, there are multiple ways to proceed. You can create the “wide” data in
Excel (you can search online for automatic ways to do it, or do it manually), in
R/Stata/Python, or in Tableau Prep.
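As one illustration of the Python route, the sketch below collapses long-format purchase records into one row per customer, using the Average, Total, and most-recent aggregates of Size described in step 7. It is only a minimal sketch under stated assumptions: the field names (customer_id, cluster_a, cluster_b, purchase_order, size) and the sample values are hypothetical and will not match your exported .csv file, and it uses only the Python standard library rather than any particular data package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: one row per purchase.
# Field names and values are invented for illustration only.
long_rows = [
    {"customer_id": 1, "cluster_a": "Cluster 1", "cluster_b": "Cluster 3", "purchase_order": 2, "size": 40.0},
    {"customer_id": 1, "cluster_a": "Cluster 1", "cluster_b": "Cluster 3", "purchase_order": 1, "size": 20.0},
    {"customer_id": 1, "cluster_a": "Cluster 1", "cluster_b": "Cluster 3", "purchase_order": 3, "size": 30.0},
    {"customer_id": 2, "cluster_a": "Cluster 2", "cluster_b": "Cluster 1", "purchase_order": 1, "size": 10.0},
    {"customer_id": 2, "cluster_a": "Cluster 2", "cluster_b": "Cluster 1", "purchase_order": 2, "size": 50.0},
]


def long_to_wide(rows):
    """Collapse purchase-level rows into one row per customer."""
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row["customer_id"]].append(row)

    wide = []
    for customer_id, purchases in sorted(by_customer.items()):
        # Sort by purchase order so the last element is the most recent purchase.
        purchases = sorted(purchases, key=lambda r: r["purchase_order"])
        sizes = [p["size"] for p in purchases]
        wide.append({
            "customer_id": customer_id,
            # Grouped fields: identical on every row for this customer.
            "cluster_a": purchases[0]["cluster_a"],
            "cluster_b": purchases[0]["cluster_b"],
            # Aggregated fields: one summary value per customer.
            "n_purchases": len(sizes),
            "size_avg": mean(sizes),
            "size_total": sum(sizes),
            "size_latest": sizes[-1],
        })
    return wide


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Each customer now appears exactly once, which is the shape the regression needs. To work from your own export, you could load the rows with csv.DictReader and convert the numeric fields to numbers before calling the function.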
9. If you decide to do it in Tableau Prep, note that your Tableau license includes Tableau Prep as
well. Download Tableau Prep. In Tableau Prep, you will need to use the “Aggregate”
function and divide your variables (or fields) into “Grouped Fields” (fields that have the same value on
every transaction for a customer: Customer id, Cluster) and “Aggregated Fields” (fields you
want aggregated within each customer: recency, number of responses, size average, size
total, etc.). An example is given below:
10. Once you switch the data into wide format, your data should look similar to the one
shown below. You will have different columns based on the clusters you chose, the
variables you chose, etc., but each customer should be in one row only, with all data for
that customer in columns.
11. Once you have data in this format, you can analyze it using regression analysis.
12. The purpose of the regression analysis is to figure out which clustering solution is the
best at estimating Sales.
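To make the comparison concrete, here is a minimal sketch of how two clustering solutions could be compared on wide-format data. It assumes, for simplicity, that cluster membership is the only predictor of Sales; in that special case, an ordinary least squares regression with an intercept and cluster dummy variables predicts each customer's sales as the mean of that customer's cluster, so R-squared can be computed directly from group means. The sales figures and cluster labels are invented for illustration, and a regression you run for the project may well include other predictors (for example, RFM variables), which this sketch does not handle.

```python
from collections import defaultdict
from statistics import mean


def r_squared_for_clusters(sales, clusters):
    """R^2 and adjusted R^2 of an OLS regression of sales on cluster dummies.

    With an intercept plus one dummy per cluster (minus a baseline), the
    fitted value for each customer equals the mean sales of that cluster.
    """
    groups = defaultdict(list)
    for y, c in zip(sales, clusters):
        groups[c].append(y)
    cluster_mean = {c: mean(ys) for c, ys in groups.items()}

    grand_mean = mean(sales)
    ss_total = sum((y - grand_mean) ** 2 for y in sales)
    ss_resid = sum((y - cluster_mean[c]) ** 2 for y, c in zip(sales, clusters))

    n = len(sales)
    k = len(groups)  # estimated parameters: intercept + (k - 1) dummies
    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (ss_resid / (n - k)) / (ss_total / (n - 1))
    return r2, adj_r2


# Hypothetical wide-format data: one entry per customer.
sales = [10.0, 12.0, 30.0, 32.0, 50.0, 52.0]
solution_a = [1, 1, 2, 2, 3, 3]  # clusters line up with sales levels
solution_b = [1, 2, 1, 2, 1, 2]  # clusters unrelated to sales levels

r2_a, adj_r2_a = r_squared_for_clusters(sales, solution_a)
r2_b, adj_r2_b = r_squared_for_clusters(sales, solution_b)
print(f"Solution A: R^2 = {r2_a:.4f}, adjusted R^2 = {adj_r2_a:.4f}")
print(f"Solution B: R^2 = {r2_b:.4f}, adjusted R^2 = {adj_r2_b:.4f}")
```

In this made-up data, solution A explains almost all of the variation in sales while solution B explains almost none. Adjusted R-squared is reported alongside R-squared because solutions with more clusters use more parameters, and plain R-squared cannot decrease when clusters are split further.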