
SPSS Annotated Output K Means Cluster Anal

The document describes using k-means cluster analysis on customer usage data from a telecommunications provider to segment their customer base. Initially, a 3-cluster solution was obtained but did not capture all important groups. A 4-cluster solution identified a potentially profitable "Internet" customer cluster missed previously. Examining the final cluster centers and distances between clusters provided insight into the natural groupings of customers and how they compare.

Uploaded by

Aditya Mehra

SPSS ANNOTATED OUTPUT K-MEANS CLUSTER ANALYSIS

K-means cluster analysis is a tool designed to assign cases to
a fixed number of groups (clusters) whose characteristics are
not yet known but are based on a set of specified variables. It
is most useful when you want to classify a large number
(thousands) of cases.
A good cluster analysis is:
• Efficient. Uses as few clusters as possible.
• Effective. Captures all statistically and commercially
important clusters. For example, a cluster with five customers
may be statistically different but not very profitable.

The K-Means Cluster Analysis procedure begins with the
construction of initial cluster centers. You can assign these
yourself or have the procedure select k well-spaced observations
for the cluster centers.
After obtaining initial cluster centers, the procedure:
• Assigns cases to clusters based on distance from the
cluster centers.
• Updates the locations of cluster centers based on the mean
values of cases in each cluster.
These steps are repeated until any reassignment of cases would
make the clusters more internally variable or externally
similar.
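
The assign-and-update cycle described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the SPSS implementation: the function name k_means, the toy data, and the rule "stop when no case changes cluster" are choices made for this example.

```python
import math

def k_means(cases, centers, max_iter=10):
    """Alternate between assigning cases and updating centers."""
    centers = [list(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each case to its nearest cluster center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(case, centers[j]))
            for case in cases
        ]
        if new_assignment == assignment:
            break  # no case changed cluster, so the solution is stable
        assignment = new_assignment
        # Step 2: move each center to the mean of the cases assigned to it.
        for j in range(len(centers)):
            members = [c for c, a in zip(cases, assignment) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Made-up data: two obvious groups, with deliberately poor starting centers.
cases = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, assignment = k_means(cases, centers=[cases[0], cases[1]])
```

Even though both starting centers sit in the same group, the second center is pulled toward the other group after the first update, and the assignments settle after a few passes.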

A telecommunications provider wants to segment its customer base
by service usage patterns. If customers can be classified by
usage, the company can offer more attractive packages to its
customers.

1. To run the cluster analysis, from the menus choose:
Analyze > Classify > K-Means Cluster...
Figure 1. K-Means Cluster Analysis dialog box
2. If the variable list does not display variable labels in
file order, right-click anywhere in the variable list and
from the context menu choose Display Variable
Labels and Sort by File Order.
3. Select Standardized log-long distance through Standardized
log-wireless and Standardized multiple
lines through Standardized electronic billing as analysis
variables.
4. Type 3 as the number of clusters.
5. Click Iterate.
Figure 2. Iterate dialog box
6. Type 20 as the maximum iterations.
7. Click Continue.
8. Click Options in the K-Means Cluster Analysis dialog
box.
Figure 3. Options dialog box
9. Select ANOVA table and Cluster information for each
group in the Statistics group.
10. Select Exclude cases pairwise in the Missing Values
group. Many values are missing because most customers do
not subscribe to all services, so excluding cases pairwise
maximizes the information you can obtain from the data, at
the cost of possibly biasing the results.
11. Click Continue, then click OK in the K-Means Cluster
Analysis dialog box.
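
To see what pairwise exclusion means for the distance calculation, here is a minimal Python sketch. It assumes that a case with missing values is compared with each center using only the variables it actually has; the function names and the sample values are hypothetical, and missing values are represented as None.

```python
import math

def distance_pairwise(case, center):
    """Euclidean distance over the variables that are present in the case."""
    pairs = [(x, c) for x, c in zip(case, center) if x is not None]
    if not pairs:
        return None  # no usable variables, so the case cannot be placed
    return math.sqrt(sum((x - c) ** 2 for x, c in pairs))

def nearest_center(case, centers):
    """Index of the closest center, or None if every variable is missing."""
    distances = [distance_pairwise(case, c) for c in centers]
    if distances[0] is None:
        return None
    return min(range(len(centers)), key=lambda j: distances[j])

centers = [(1.0, 1.0, 1.0), (-1.0, -1.0, -1.0)]
case = (0.8, None, 1.2)  # hypothetical customer without the second service
cluster = nearest_center(case, centers)
```

The customer is still assigned to a cluster, but on the evidence of two variables instead of three, which is one way the results can end up biased.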

Figure 1. Initial cluster centers for three-cluster solution

The initial cluster centers are the variable values of
the k well-spaced observations.

Figure 1. Iteration history for three-cluster solution


The iteration history shows the progress of the clustering
process at each step. In early iterations, the cluster centers
shift quite a lot. By the 14th iteration, they have settled down
to the general area of their final location, and the last four
iterations are minor adjustments.
If the algorithm stops because the maximum number of iterations
is reached, you may want to increase the maximum because the
solution may otherwise be unstable. For example, if you had left
the maximum number of iterations at 10, the reported solution
would still be in a state of flux.
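
The same check can be made outside SPSS if you keep the center positions from each iteration. The sketch below assumes such a record exists as a list of snapshots; the names center_changes and still_moving and the numbers are made up for illustration.

```python
import math

def center_changes(snapshots):
    """Distance each center moved between consecutive snapshots."""
    return [
        [math.dist(before, after) for before, after in zip(prev, curr)]
        for prev, curr in zip(snapshots, snapshots[1:])
    ]

def still_moving(changes, tolerance=0.0):
    """True if any center moved more than the tolerance in the last iteration."""
    return max(changes[-1]) > tolerance

snapshots = [
    [(0.0, 0.0), (4.0, 4.0)],  # initial centers (made-up values)
    [(1.0, 0.0), (4.0, 3.0)],  # after iteration 1
    [(1.0, 0.5), (4.0, 3.0)],  # after iteration 2
    [(1.0, 0.5), (4.0, 3.0)],  # after iteration 3
]
changes = center_changes(snapshots)
```

If the run had been cut off after two iterations, still_moving(changes[:2]) would report that one center was still shifting, which is the situation in which raising the maximum is worthwhile.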

Figure 1. ANOVA table for three-cluster solution

The ANOVA table indicates which variables contribute the most
to your cluster solution. Variables with large F values provide
the greatest separation between clusters.
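
For a single variable, the F value is the ratio of the between-cluster mean square to the within-cluster mean square. The following Python sketch computes that ratio for made-up data; the function name and the values are assumptions for illustration. Because the clusters were formed to maximize these differences, treat the F values as descriptive rather than as formal tests.

```python
def f_statistic(values, clusters):
    """One-way ANOVA F ratio for one variable, grouped by cluster label."""
    groups = {}
    for value, label in zip(values, clusters):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the cluster means around the overall mean.
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the cases around their own cluster mean.
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    return (between / (k - 1)) / (within / (n - k))

clusters = [1, 1, 1, 2, 2, 2]
f_separating = f_statistic([1, 2, 3, 7, 8, 9], clusters)   # cluster means differ a lot
f_overlapping = f_statistic([1, 5, 3, 2, 6, 4], clusters)  # cluster means are close
```

The first variable separates the two clusters cleanly and gets a large F; the second overlaps heavily and gets a small one.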

Figure 1. Final cluster centers for three-cluster solution


The final cluster centers are computed as the mean for each
variable within each final cluster. The final cluster centers
reflect the characteristics of the typical case for each
cluster.
• Customers in cluster 1 tend to be big spenders who purchase
a lot of services.
• Customers in cluster 2 tend to be moderate spenders who
purchase the "calling" services.
• Customers in cluster 3 tend to spend very little and do not
purchase many services.

Figure 1. Distances between final cluster centers for three-cluster solution

This table shows the Euclidean distances between the final
cluster centers. Greater distances between clusters correspond
to greater dissimilarities.
• Clusters 1 and 3 are most different.
• Cluster 2 is approximately equally similar to clusters 1
and 3.
These relationships between the clusters can also be intuited
from the final cluster centers, but this becomes more difficult
as the number of clusters and variables increases.
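
A distance table of this kind is easy to reproduce from the final centers. The sketch below uses made-up two-variable centers chosen to mimic the pattern described above; the function name is an assumption, and indices 0, 1, 2 stand for clusters 1, 2, 3.

```python
import math

def center_distances(centers):
    """Symmetric matrix of Euclidean distances between cluster centers."""
    return [[math.dist(a, b) for b in centers] for a in centers]

# Made-up centers: the first and third lie on opposite sides of the second.
centers = [(3.0, 4.0), (0.0, 0.0), (-3.0, -4.0)]
table = center_distances(centers)

# Find the pair of clusters that are farthest apart.
pairs = [(i, j) for i in range(len(centers)) for j in range(i + 1, len(centers))]
farthest = max(pairs, key=lambda p: table[p[0]][p[1]])
```

With many clusters and variables, scanning a matrix like this for the largest and smallest entries is quicker than comparing the centers by eye.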

Figure 1. Number of cases in each cluster for three-cluster solution
A large number of cases were assigned to the third cluster,
which unfortunately is the least profitable group. Perhaps a
fourth, more profitable, cluster could be extracted from this
"basic service" group.

Figure 1. K-Means Cluster Analysis dialog box

1. To run a cluster analysis with four clusters, reopen the
K-Means Cluster Analysis dialog box.
2. Type 4 as the number of clusters.
3. Click Save.
Figure 2. Save dialog box
4. Select Cluster membership and Distance from cluster
center.
5. Click Continue.
6. Click OK in the K-Means Cluster Analysis dialog box.
7. The saved variables can be used to create a useful boxplot.
From the menus, choose:
Graphs > Chart Builder...
8. Click the Gallery tab, select Boxplot from the list of
chart types, and drag and drop the Simple Boxplot icon onto
the canvas.
9. Drag and drop Distance of Case from its Classification
Cluster Center onto the y axis.
10. Drag and drop Cluster Number of Case onto the x axis.
11. Click OK to create the boxplot.
Figure 3. Chart Builder

Figure 1. Plot of distances from cluster center by cluster membership for four-cluster solution
This is a diagnostic plot that helps you to find outliers within
clusters. There is a lot of variability in cluster 2, but all
the distances are within reason.
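
If you prefer a numeric check to reading the plot, the saved distances can be screened per cluster with the common 1.5 x IQR boxplot rule. This is a sketch under assumptions: the quartiles come from Python's statistics.quantiles with its default method, which may differ slightly from the hinges SPSS draws, and the distances shown are made up.

```python
import statistics

def boxplot_outliers(distances, clusters):
    """Per cluster, list the distances above the upper boxplot fence."""
    by_cluster = {}
    for d, c in zip(distances, clusters):
        by_cluster.setdefault(c, []).append(d)
    flagged = {}
    for c, values in by_cluster.items():
        q1, _, q3 = statistics.quantiles(values, n=4)
        upper_fence = q3 + 1.5 * (q3 - q1)
        flagged[c] = [d for d in values if d > upper_fence]
    return flagged

# Made-up saved distances for two clusters.
distances = [0.9, 1.0, 1.0, 1.1, 1.2, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
clusters = [1] * 5 + [2] * 8
flagged = boxplot_outliers(distances, clusters)
```

Here the second cluster is much more spread out than the first, yet only its single extreme case is flagged.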

Figure 1. Final cluster centers for four-cluster solution

This table shows that an important grouping is missed in the
three-cluster solution. Members of clusters 1 and 2 are largely
drawn from cluster 3 in the three-cluster solution, and they are
unlikely to be big spenders. However, members of cluster 1 are
highly likely to purchase Internet-related services, which
establishes them as a distinct and possibly profitable group.
Clusters 3 and 4 seem to correspond to clusters 1 and 2 from the
three-cluster solution.
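
One way to check how the clusters of the two solutions correspond is to cross-tabulate the saved memberships. The Python sketch below assumes you have both membership lists for the same cases, in the same order; the labels are made up to mirror the pattern described above.

```python
from collections import Counter

def crosstab(old_labels, new_labels):
    """Count the cases for every (old cluster, new cluster) combination."""
    return Counter(zip(old_labels, new_labels))

# Made-up memberships for eight cases.
three_cluster = [1, 2, 3, 3, 3, 3, 2, 1]
four_cluster = [3, 4, 1, 2, 2, 1, 4, 3]
table = crosstab(three_cluster, four_cluster)
```

In this toy table, old cluster 3 splits evenly into new clusters 1 and 2, while old clusters 1 and 2 map onto new clusters 3 and 4.
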
Figure 1. Distances between final cluster centers for four-cluster solution

The distances between the clusters have not changed greatly.
• Clusters 1 and 2 are the most similar, which makes sense
because they were combined into one cluster in the
three-cluster solution.
• Clusters 2 and 3 are the most dissimilar, since they
represent opposite spending behaviors.
• Cluster 4 is still equally similar to the other clusters.

Figure 1. Number of cases in each cluster for four-cluster solution

Nearly 25% of cases belong to the newly created group of
"E-service" customers, which is very significant to your profits.
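
The share of cases in each cluster follows directly from the saved membership variable. A minimal sketch, with a made-up membership list and an assumed function name:

```python
from collections import Counter

def cluster_shares(labels):
    """Fraction of all cases that falls in each cluster."""
    counts = Counter(labels)
    return {cluster: counts[cluster] / len(labels) for cluster in sorted(counts)}

# Made-up memberships for eight cases.
labels = [1, 2, 2, 3, 1, 4, 2, 3]
shares = cluster_shares(labels)
```

A cluster's share of cases is only a starting point; its share of revenue depends on how much those customers spend.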

Using k-means cluster analysis, you initially grouped the
customers into three clusters. However, this solution was not
very satisfactory, so you reran the analysis with four clusters.
These results were better, and from the final cluster centers,
you saw that a potentially profitable "Internet" grouping was
missed in the three-cluster solution.
This example underscores the exploratory nature of cluster
analysis, since it is impossible to determine the "best" number
of clusters until you have run the analyses and examined the
solutions.
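
When comparing solutions, one number that is sometimes examined alongside the cluster centers is the total within-cluster sum of squares. The sketch below is an illustration with made-up data, not part of the SPSS output discussed here. The value generally shrinks as clusters are added, so it supports the comparison but does not pick the number of clusters for you.

```python
def within_cluster_ss(cases, labels):
    """Sum of squared deviations of the cases from their own cluster means."""
    groups = {}
    for case, label in zip(cases, labels):
        groups.setdefault(label, []).append(case)
    total = 0.0
    for members in groups.values():
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            (x - m) ** 2 for case in members for x, m in zip(case, center)
        )
    return total

# Made-up cases forming two tight pairs.
cases = [(0, 0), (0, 2), (10, 0), (10, 2)]
ss_one_cluster = within_cluster_ss(cases, [1, 1, 1, 1])
ss_two_clusters = within_cluster_ss(cases, [1, 1, 2, 2])
```

The large drop from one cluster to two reflects the obvious split in the toy data; with real data the drops are usually less clear-cut, which is why the solutions still have to be examined.
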
The next step for the company is to try to construct a model
that classifies the customers according to their demographic
information. With such a model, the company can customize offers
for individual prospective customers. For information on how the
company builds such a model, see Using Discriminant Analysis to
Classify Telecommunications Customers.

The K-Means Cluster Analysis procedure is a tool for finding
natural groupings of cases, given their values on a set of
variables. It is most useful when you want to classify a large
number (thousands) of cases.
• The TwoStep Cluster Analysis procedure allows you to use
both categorical and continuous variables, and can
automatically select the "best" number of clusters.
• If you want to cluster variables instead of cases, or have
a small number of cases, try the Hierarchical Cluster
Analysis procedure.
• If your k-means analysis is part of a segmentation
solution, these newly created clusters can be analyzed in
the Discriminant Analysis procedure.

See the following texts for more information on k-means
cluster analysis:
Aldenderfer, M. S., and R. K. Blashfield. 1984. Cluster
Analysis. Newbury Park: Sage Publications.
