Unit 3 - Clustering
K Means Algorithm
Algorithm Steps
• Step 1: Choose the value of k and the k initial guesses for the centroids.
• Step 2: Assign each point to the closest centroid.
• Step 3: Compute the centroid, the center of mass, of each newly defined cluster from Step 2.
• Step 4: Repeat Steps 2 and 3 until the algorithm converges to an answer.
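The loop above can be sketched in a few lines of base R. This is a toy illustration (the random 2-D data, seed, and iteration cap are assumptions, not part of the course material), showing the assign/recompute cycle explicitly:

```r
# Toy K-means sketch: assign points to the nearest centroid, then recompute means
set.seed(42)
x <- matrix(rnorm(100), ncol = 2)      # 50 points in 2-D
k <- 3
centroids <- x[sample(nrow(x), k), ]   # Step 1: k initial guesses

for (iter in 1:20) {
  # Step 2: distance from every point to every centroid; assign to the closest
  d <- as.matrix(dist(rbind(centroids, x)))[-(1:k), 1:k]
  cluster <- apply(d, 1, which.min)
  # Step 3: recompute each centroid as the mean of its cluster
  new_centroids <- t(sapply(1:k, function(j)
    colMeans(x[cluster == j, , drop = FALSE])))
  # Step 4: stop once the centroids no longer move
  if (all(abs(new_centroids - centroids) < 1e-8)) break
  centroids <- new_centroids
}
```

In practice R's built-in `kmeans()` (used later in this unit) does this with better initialization and convergence handling.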
[Figures: initial starting points for the centroids; points assigned to the closest centroid; the mean of each cluster computed]
Determining the Number of Clusters
• WSS (Within Sum of Squares) is the sum of the squares of the distances between each data
point and the closest centroid:
WSS = Σᵢ₌₁ᴹ ‖pᵢ − q⁽ⁱ⁾‖²
• The term q⁽ⁱ⁾ indicates the closest centroid that is associated with
the ith point pᵢ.
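The WSS definition can be verified directly against `kmeans()`: summing the squared distance from each point to its assigned (closest) centroid reproduces the value the fit reports. The random data below is illustrative only:

```r
# Manual WSS: sum of squared distances from each point to its closest centroid
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)          # 30 illustrative points
km <- kmeans(x, centers = 3, nstart = 25)

# km$centers[km$cluster, ] gives q^(i), the centroid assigned to each point p_i
wss_manual <- sum(rowSums((x - km$centers[km$cluster, ])^2))

# Matches the total within-cluster sum of squares reported by kmeans
all.equal(wss_manual, km$tot.withinss)
```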
Using R to Perform a K-means Analysis
• The following R code uses WSS to determine an appropriate number, k, of clusters.
library(plyr)
library(ggplot2)
library(cluster)
library(lattice)
library(graphics)
library(grid)
library(gridExtra)

grade_input <- as.data.frame(read.csv("c:/data/grades_km_input.csv"))
kmdata_orig <- as.matrix(grade_input[, c("Student", "English", "Math", "Science")])
kmdata <- kmdata_orig[, 2:4]   # drop the Student ID column
kmdata[1:10, ]

# Compute WSS for k = 1..15 and look for the "elbow" in the plot
wss <- numeric(15)
for (k in 1:15)
  wss[k] <- sum(kmeans(kmdata, centers = k, nstart = 25)$withinss)
plot(1:15, wss, type = "b",
     xlab = "Number of Clusters", ylab = "Within Sum of Squares")

# Fit the final model at the chosen k
km <- kmeans(kmdata, 3, nstart = 25)
km
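The object returned by `kmeans()` exposes the fitted centroids and assignments directly. Since `grades_km_input.csv` is not reproduced here, the sketch below substitutes random score data with the same three column names:

```r
# Illustrative stand-in for the grades data (random scores in 0..100)
set.seed(2)
kmdata <- matrix(runif(90, 0, 100), ncol = 3,
                 dimnames = list(NULL, c("English", "Math", "Science")))
km <- kmeans(kmdata, centers = 3, nstart = 25)

km$centers        # k x 3 matrix of cluster centroids
head(km$cluster)  # cluster assignment (1..k) for each student
km$size           # number of points in each cluster
km$tot.withinss   # total WSS at the chosen k
```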
Classification
• Classification is used for prediction purposes.
• Given a set of input variables, the goal is to predict a response or output
variable Y. Each member of the set is called an input variable.
• The input values of a decision tree can be categorical or
continuous.
• A decision tree consists of test points (called nodes) and branches, which represent the decisions
being made.
• A node without further branches is called a leaf node.
Root Node: It represents the entire population or sample, which further gets divided into two or
more homogeneous sets.
Splitting: It is the process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
Leaf/Terminal Node: Nodes that do not split are called leaf or terminal nodes.
Pruning: When we remove sub-nodes of a decision node, the process is called pruning. It can be
seen as the opposite of splitting.
Branch/Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
Parent and Child Node: A node that is divided into sub-nodes is called the parent node of the
sub-nodes, whereas the sub-nodes are the children of the parent node.
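The terminology above can be seen on a small printed tree. The built-in `iris` data is used here purely as a stand-in for illustration:

```r
library(rpart)

# A small classification tree: the printed output shows node 1) as the root,
# internal splits as decision nodes, and lines ending in '*' as leaf nodes
fit <- rpart(Species ~ Petal.Length + Petal.Width,
             data = iris, method = "class")
print(fit)
```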
Car dataset description
R in Decision Tree
• The caret package holds tools for data splitting, pre-processing, feature selection,
and tuning.
• Install Package
install.packages("caret")
install.packages("rpart.plot")
library(caret)
library(rpart)
library(rpart.plot)
setwd("C:/Users/HP/Desktop/MKCE/even sem/big data lab")
play_decision<-read.table("DTdata.csv",header=TRUE,sep=',')
summary(play_decision)
fit<-rpart(Play~Outlook+Temperature+Humidity+Wind,
method="class",
data=play_decision,
control=rpart.control(minsplit=1),
parms=list(split='information'))
summary(fit)
rpart.plot(fit,type=4,extra=1)
rpart.plot(fit, type=4, extra=2,clip.right.labs=FALSE,varlen=0,faclen=0)
newdata <- data.frame(Outlook = "rainy", Temperature = "mild",
                      Humidity = "high", Wind = FALSE)
predict(fit,newdata=newdata,type="prob")
predict(fit,newdata=newdata,type="class")
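The predicted classes can be compared against the actual labels with a confusion table. Because `DTdata.csv` is not reproduced here, the sketch below recreates a tiny play/weather data frame with the same column names (the six rows are illustrative, not the real dataset):

```r
library(rpart)

# Tiny stand-in for DTdata.csv with the same columns used above
play_decision <- data.frame(
  Play        = c("yes", "no", "yes", "no", "yes", "no"),
  Outlook     = c("sunny", "rainy", "overcast", "rainy", "sunny", "rainy"),
  Temperature = c("hot", "mild", "cool", "mild", "hot", "cool"),
  Humidity    = c("high", "high", "normal", "high", "normal", "high"),
  Wind        = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
)
fit <- rpart(Play ~ Outlook + Temperature + Humidity + Wind,
             method = "class", data = play_decision,
             control = rpart.control(minsplit = 1),
             parms = list(split = "information"))

# Confusion table on the training data: rows = actual, columns = predicted
pred <- predict(fit, play_decision, type = "class")
table(actual = play_decision$Play, predicted = pred)
```

On real data the table should be built from a held-out test set rather than the training rows, since a deep tree can memorize its training data.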
Naive Bayes