K Means Handout

This document discusses k-means clustering, an unsupervised machine learning technique that groups unlabeled data points into a specified number of clusters based on their similarity. It provides an example of using k-means clustering in SPSS to analyze customer data from a telecommunications provider, grouping customers into 3 clusters based on their usage patterns. The results are then interpreted, identifying characteristics of customers in each cluster and which variables best discriminate between the clusters.

Uploaded by

Ankit Seth


Cluster Analysis

What is Cluster Analysis?

Cluster analysis is a statistical technique used to group cases (individuals or objects) into homogeneous subgroups based on responses to variables. PASW (SPSS) 17.0 offers three clustering procedures: two-step, k-means, and hierarchical.

K-means clustering allows you to select the number of clusters, and the procedure can be used with moderate to large datasets. The k-means clustering algorithm assigns each case to the cluster whose mean is the smallest distance from the case. This is an iterative process that stops once the cluster means do not change much in successive steps.
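The assign-and-update loop described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the PASW procedure: the function name `kmeans`, the random choice of starting cases, and the `tol` stopping threshold are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=20, tol=1e-6, seed=0):
    """Batch k-means sketch; returns (centers, labels, iterations_run)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: the initial centers are k distinct cases picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()

    def nearest(current_centers):
        # Euclidean distance from every case to every center, shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - current_centers[None, :, :], axis=2)
        return dist.argmin(axis=1)

    for iteration in range(1, max_iter + 1):
        labels = nearest(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        # Largest movement of any cluster mean in this step.
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift < tol:  # the means barely moved, so stop iterating
            break
    # Labels are recomputed so they match the returned centers.
    return centers, nearest(centers), iteration
```

Calling `kmeans(X, k=3)` on a cases-by-variables array returns one center per cluster and a 0-based cluster label per case. Different starting cases can lead to different solutions, which is one reason the output tables should be inspected rather than taken at face value.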

K-Means Clustering
As an example of k-means clustering, a sample PASW 17.0 dataset was used: telco_extra.sav, telecommunications provider data that has 14 continuous variables. The continuous variables have already been standardized, with a mean of 0 and a standard deviation of 1, to allow for the different units in which the variables were measured. This analysis will cluster customers by their service usage patterns.

In PASW 17.0, go to Analyze > Classify > K-Means Cluster.
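The variables in telco_extra.sav arrive already standardized, but the step itself is simple. A hedged sketch of column-wise z-scores follows; the function name `standardize` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption for the example.

```python
import numpy as np

def standardize(X):
    """Return column-wise z-scores; missing values (NaN) are ignored."""
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = np.nanstd(X, axis=0, ddof=1)
    # Note: a constant column has std 0 and would need special handling.
    return (X - mean) / std
```

After this step every variable has a mean of 0 and a standard deviation of 1, so a variable measured in large units no longer dominates the Euclidean distance.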

Next, the K-Means Cluster Analysis menu appears. Select the variables Standardized log-long distance through Standardized log-wireless and Standardized multiple lines through Standardized electronic billing, and place them in the Variables box.

Label Cases by. Optional; place a variable here to label cases.
Number of Clusters. You have to specify the number of clusters you want. For this example, type 3 in the box.
Method. The default, "Iterate and classify," is an iterative process that recomputes the cluster means each time a case is added to or deleted from a cluster. Cases are then classified once the cluster centers have been updated. With the "Classify only" method, cases are classified based on the initial cluster centers, which are not iteratively computed. For this example, Iterate and classify is chosen.
Cluster Centers. You can draw initial cluster centers from a file (Read initial) or you can save the final cluster centers (Write final). For this example, we are not using either option.

Click the Iterate button; the K-Means Cluster Analysis: Iterate box appears. Change Maximum Iterations to 20. Click Continue.

Maximum Iterations. Sets the maximum number of iterations.
Convergence Criterion. The default terminates once the largest change in means of any cluster is less than 2% of the minimum distance between initial cluster centers.
Use running means. If this box is checked, cluster centers will be updated after each case is classified, instead of after all of the cases are classified.
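The convergence rule above can be written as a small check. This sketch follows the wording of the handout, not PASW's internal code; the function name `has_converged` and its argument names are hypothetical.

```python
import numpy as np

def has_converged(old_centers, new_centers, initial_centers, criterion=0.02):
    """True once the largest center shift is below criterion * the smallest
    distance between any two initial centers (needs at least two centers)."""
    old = np.asarray(old_centers, dtype=float)
    new = np.asarray(new_centers, dtype=float)
    init = np.asarray(initial_centers, dtype=float)
    # Largest change in the mean of any cluster during this iteration.
    largest_shift = np.linalg.norm(new - old, axis=1).max()
    # Distances between every pair of initial centers.
    gaps = np.linalg.norm(init[:, None, :] - init[None, :, :], axis=2)
    rows, cols = np.triu_indices(len(init), k=1)  # each pair once, no diagonal
    min_gap = gaps[rows, cols].min()
    return bool(largest_shift < criterion * min_gap)
```

With `criterion=0.02`, two initial centers that are 10 units apart give a threshold of 0.2, so iteration stops as soon as no center moves by 0.2 or more.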


Click Options in the K-Means Cluster Analysis dialog box. Check Initial cluster centers, ANOVA table, Cluster information for each case, and Exclude cases pairwise. Click Continue. Click OK.

Initial cluster centers. Prints the initial variable means for each cluster in the output.
ANOVA table. ANOVA F-tests are conducted for each variable to indicate how well the variable discriminates between clusters.
Cluster information for each case. Prints each case's final cluster assignment and the Euclidean distance between the case and the cluster center in the output.
Missing Values. The default is listwise deletion. For this example, there are many missing values because most customers did not subscribe to all services, so excluding cases pairwise maximizes the information you can obtain from the data.
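To see why pairwise exclusion keeps more information, consider a distance that simply skips the variables a case is missing. The sketch below is one plausible reading of pairwise exclusion, under the assumption that missing values are coded as NaN; the function names are hypothetical, and PASW may handle the details differently.

```python
import numpy as np

def distance_pairwise(case, center):
    """Euclidean distance computed over the non-missing variables only."""
    diff = np.asarray(case, dtype=float) - np.asarray(center, dtype=float)
    observed = ~np.isnan(diff)
    if not observed.any():
        return float("nan")  # nothing to compare on
    return float(np.sqrt(np.sum(diff[observed] ** 2)))

def assign_pairwise(X, centers):
    """Nearest center for each case; assumes every case has at least
    one non-missing variable."""
    return np.array([
        int(np.argmin([distance_pairwise(x, c) for c in centers]))
        for x in X
    ])
```

Under listwise deletion, a customer with even one missing variable would be dropped entirely. Here the same customer is still assigned to a cluster using whatever services were recorded.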


K-Means Clustering Interpretation
The Initial Cluster Centers table shows the first step of k-means clustering in finding the k centers.

The Iteration History table shows the number of iterations needed before the cluster centers stopped changing substantially.


The Cluster Membership table gives the cluster each case belongs to and the Euclidean distance of each case to its cluster center. Below is a printout of the first and last 10 cases. Visual inspection of the distances is necessary to check for outliers that may not adequately reflect the population.
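The distances in the Cluster Membership table can be reproduced, and the largest ones pulled out for inspection, with a short sketch. The function names are hypothetical, cluster labels are assumed to be 0-based integers, and listing the n farthest cases is only one simple way to spot candidates for outliers.

```python
import numpy as np

def distance_to_own_center(X, labels, centers):
    """Euclidean distance of each case to the center of its own cluster."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # centers[labels] lines up each case with its assigned center.
    return np.linalg.norm(X - centers[labels], axis=1)

def farthest_cases(distances, n=10):
    """Indices of the n cases farthest from their center, largest first."""
    return np.argsort(distances)[::-1][:n]
```

Cases returned by `farthest_cases` are worth a closer look; whether any of them is a genuine outlier remains a judgment call.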

The Final Cluster Centers table below allows you to describe the clusters by the variables. For example, customers in Cluster 1 tend to purchase a lot of services, as evidenced by values above the mean for all variables. Customers in Cluster 2 tend to purchase the "calling" services, shown by positive values for the four calling services (caller ID, call waiting, call forwarding, and 3-way calling). Customers in Cluster 3 tend to spend very little and do not purchase many services; they have negative values on most of the variables.


The Distances between Final Cluster Centers table shows the Euclidean distances between the final cluster centers. Greater distances between clusters mean there are greater dissimilarities.

Clusters 1 and 3 have the greatest dissimilarities.

Cluster 2 is equally similar to Clusters 1 and 3.
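The table of distances between final cluster centers is a symmetric matrix with zeros on the diagonal. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def center_distances(centers):
    """Matrix whose entry (i, j) is the Euclidean distance between
    final cluster centers i and j."""
    centers = np.asarray(centers, dtype=float)
    return np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
```

For instance, with made-up centers `[[0, 0], [3, 4], [6, 8]]`, the first and third centers are farthest apart, and the second is the same distance from each of the other two, which mirrors the pattern described above.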

The ANOVA table indicates which variables contribute the most to your cluster solution. Variables with large mean square errors provide the least help in differentiating between clusters. For example, long distance and calling card had the two highest mean square errors (and lowest F statistics); therefore, these two variables were not as helpful as the other variables in forming and differentiating clusters.
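For a single variable, the quantities in the ANOVA table come from a one-way analysis of variance with the clusters as groups. The sketch below shows the standard calculation; the function name `anova_for_variable` is hypothetical, and the variable is assumed to have no missing values.

```python
import numpy as np

def anova_for_variable(x, labels):
    """Return (cluster mean square, error mean square, F) for one variable."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    groups = [x[labels == g] for g in np.unique(labels)]
    n, k = len(x), len(groups)
    grand_mean = x.mean()
    # Between-cluster variation: how far each cluster mean is from the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-cluster (error) variation: spread of cases around their own cluster mean.
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_error = ss_error / (n - k)
    return ms_between, ms_error, ms_between / ms_error
```

A large error mean square relative to the cluster mean square gives a small F, meaning the clusters overlap heavily on that variable. Because the clusters were formed to maximize these differences, the F values are best read as descriptive rather than as formal significance tests.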


The Number of Cases in each Cluster table illustrates the split of cases into clusters. A large number of cases were assigned to the third cluster, which is the least profitable group.
