DMW Assignment 2

The document discusses clustering techniques for grouping data instances. It provides details on two clustering algorithms: 1. K-means clustering, which iteratively assigns data points to K clusters based on feature similarity and centroid positions. 2. K-medoids clustering, which is similar to K-means but chooses actual data points as cluster centers instead of averages. It is more robust to outliers. Both algorithms aim to minimize distances between points and cluster centers. The document also mentions RapidMiner software for data preparation, machine learning and clustering visualization.

Uploaded by

mad world

Assignment 2

Aim: Consider a suitable dataset. To cluster the data instances into different groups,
apply different clustering techniques (minimum 2). Visualize the clusters using a suitable
tool.

Objectives:
1. Understanding clustering Algorithms

Theory:
Clustering: Clustering is the grouping of a particular set of objects based on their
characteristics, aggregating them according to their similarities. In data mining, this
methodology partitions the data by applying a specific join algorithm, the one most
suitable for the desired information analysis. There are several different ways to
implement this partitioning, based on distinct models:

Centralized – each cluster is represented by a single mean vector, and an object's value
is compared to these mean values

Distributed – the cluster is built using statistical distributions

Connectivity – the connectivity in these models is based on a distance function between
elements

Group – algorithms have only group information

Graph – cluster organization and the relationship between members are defined by a
graph-linked structure

Density – members of the cluster are grouped by regions where observations are dense
and similar

RapidMiner:
RapidMiner is a data science software platform developed by the company of the same
name that provides an integrated environment for data preparation, machine learning,
deep learning, text mining, and predictive analytics.

K-means Algorithm:

K-means clustering is a type of unsupervised learning, which is used when you
have unlabeled data (i.e., data without defined categories or groups). The goal of this
algorithm is to find groups in the data, with the number of groups represented by the
variable K. The algorithm works iteratively to assign each data point to one of K groups
based on the features that are provided. Data points are clustered based on feature
similarity. The results of the K-means clustering algorithm are:

1. The centroids of the K clusters, which can be used to label new data

2. Labels for the training data (each data point is assigned to a single cluster)
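
As a small illustration of the first result, the sketch below labels new data points by
finding the nearest centroid. The centroid values and the helper name nearest_centroid are
made up for this example; they are not part of the assignment.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids, as if produced by an earlier K-means run with K = 2.
centroids = [(1.0, 1.0), (8.0, 8.0)]

# New, unlabeled points receive the label of their nearest centroid.
new_points = [(0.5, 2.0), (9.0, 7.5)]
labels = [nearest_centroid(p, centroids) for p in new_points]
print(labels)  # [0, 1]
```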

Deciding the number of clusters

The number of clusters should match the structure of the data. A poor choice of the number
of clusters can make the resulting groups misleading. An empirical way to find a good number
of clusters is to run K-means clustering with different numbers of clusters and compare the
resulting within-cluster sum of squares.
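
The sum of squares mentioned above can be sketched as follows. This is only an
illustration: the function name within_cluster_ss, the four sample points and the
hand-written label lists are assumptions. In practice the labels would come from running
K-means once per candidate number of clusters.

```python
def within_cluster_ss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Mean of the cluster, one coordinate at a time.
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - mean[d]) ** 2 for m in members for d in range(dim))
    return total

# Made-up data: two tight pairs of points that are far apart.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]

sse_k1 = within_cluster_ss(points, [0, 0, 0, 0])  # everything in one cluster
sse_k2 = within_cluster_ss(points, [0, 0, 1, 1])  # one cluster per pair
print(sse_k1, sse_k2)  # 99.5 1.0
```

The sum of squares keeps shrinking as more clusters are added, so a common heuristic (not
stated in the assignment itself) is to pick the number of clusters after which the
decrease levels off.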

Algorithm

1. Choose the number of groups k in advance.
2. Select k points at random as the initial cluster centers.
3. Assign each object to its closest cluster center according to the Euclidean distance
function.
4. Recalculate each cluster center as the centroid (mean) of all objects in that cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in
consecutive rounds.
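
The steps above can be sketched in plain Python as shown below. This is a minimal
illustration, not the RapidMiner operator: the function name kmeans, the max_iter and seed
parameters, the empty-cluster handling and the sample points are all assumptions made for
the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Select k distinct data points at random as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its closest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Stop when the assignments are the same as in the previous round.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Made-up data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

Which group receives label 0 and which receives label 1 depends on the random initial
centers, so only the grouping itself is meaningful.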

K-medoids algorithm

The k-medoids or PAM (Partitioning Around Medoids) algorithm is a clustering algorithm
closely related to the k-means algorithm. Both the k-means and k-medoids algorithms are
partitional (breaking the dataset up into groups) and both attempt to minimize the distance
between points labeled to be in a cluster and a point designated as the center of that
cluster. In contrast to the k-means algorithm, k-medoids chooses data points as centers
(medoids or exemplars) and can be used with arbitrary distances, while in k-means the
center of a cluster is not necessarily one of the input data points (it is the average of
the points in the cluster). K-medoids is a classical partitioning technique of clustering,
which clusters the data set of n objects into k clusters, with the number k of clusters
assumed known a priori (which implies that the programmer must specify k before the
execution of the algorithm). The "goodness" of a given value of k can be assessed with
methods such as the silhouette. It is more robust to noise and outliers than k-means
because it minimizes a sum of pairwise dissimilarities instead of a sum of squared
Euclidean distances. A medoid can be defined as the object of a cluster whose average
dissimilarity to all the objects in the cluster is minimal, that is, it is the most
centrally located point in the cluster.
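
The medoid idea can be sketched as follows. Note that this is a simpler alternating
heuristic (assign points, then re-pick each medoid), not the full PAM swap search, and it
may stop at a local optimum. The function name k_medoids, its parameters and the sample
data (including the outlier) are assumptions for illustration, and the points are assumed
to be distinct.

```python
import math
import random

def k_medoids(points, k, dist=math.dist, max_iter=100, seed=0):
    """Alternating k-medoids; `dist` can be any dissimilarity function."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # indices of the initial medoids
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = [
            min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points
        ]
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # assumption: an empty cluster keeps its old medoid
                new_medoids.append(medoids[j])
                continue
            # The medoid is the member with the smallest total dissimilarity
            # to all members of its cluster.
            best = min(
                members,
                key=lambda i: sum(dist(points[i], points[m]) for m in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return [points[i] for i in medoids], labels

# Made-up data: two groups plus one far-away outlier at (25, 25).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0), (25.0, 25.0)]
medoids, labels = k_medoids(points, k=2)
print(medoids)
print(labels)
```

Whatever the outcome, every returned medoid is one of the input points, which is the
property that distinguishes k-medoids from k-means.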
