Cluster Analysis
Classification is the process of assigning data to predefined classes with the help of class labels.
Wednesday, December 09, 2020 3
The Wine Quality Dataset
The Wine Quality Dataset involves predicting the quality of white wines (Good/Bad) from the given chemical measures of each wine:
1. Fixed acidity.
2. Volatile acidity.
3. Citric acid.
4. Residual sugar.
5. Chlorides.
6. Free sulfur dioxide.
7. Total sulfur dioxide.
8. Density.
9. pH.
10. Sulphates.
11. Alcohol.
12. Quality (Good/Bad).
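The twelve attributes above can be sketched as a feature matrix plus a label column. This is a minimal illustration, not the real dataset: the column names and the two sample rows below are assumptions made up for the example.

```python
import pandas as pd

# The 11 chemical measures plus the Good/Bad quality label.
# Column names are assumptions; match them to the actual data file.
columns = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide",
    "density", "pH", "sulphates", "alcohol", "quality",
]

# Two made-up rows purely to illustrate the layout.
df = pd.DataFrame(
    [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8, "Good"],
     [6.3, 0.30, 0.34, 1.6, 0.049, 14.0, 132.0, 0.994, 3.3, 0.49, 9.5, "Bad"]],
    columns=columns,
)

X = df.drop(columns="quality")   # the 11 chemical measures (features)
y = df["quality"]                # the Good/Bad label (prediction target)
```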
Wednesday, December 09, 2020 7
Example
The table below lists 20 wines sold in the market along with their alcohol and alkalinity-of-ash content.
[Table: Wine | Alcohol | Alkalinity of Ash, for the 20 wines]
Regression:
• It predicts continuous output values. Regression analysis is a statistical model used to predict numeric data instead of labels.
• It can also identify distribution trends based on available or historical data.
• With clustering, by contrast, when a new data point arrives we can easily identify which group or cluster it belongs to.
Similarity of characteristics
• Finding groups of objects such that the objects in a group will be similar (or related) to one
another and different from (or unrelated to) the objects in other groups
Intra-cluster distances are minimized; inter-cluster distances are maximized.
• You don’t know who or what belongs to which group. Not even the
number of groups.
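The goal above can be made concrete with a small numeric sketch: two made-up clusters of 2-D points, where we check that distances within each cluster stay small while the distance between clusters is large. The data values are assumptions chosen for illustration.

```python
import numpy as np

def mean_pairwise_distance(points):
    """Average Euclidean distance between all pairs of points in one cluster."""
    d = [np.linalg.norm(p - q) for i, p in enumerate(points) for q in points[i + 1:]]
    return float(np.mean(d))

# Two made-up clusters of 2-D points.
cluster_a = np.array([[1.0, 1.0], [1.5, 1.2], [0.8, 0.9]])
cluster_b = np.array([[8.0, 8.0], [8.5, 7.8], [7.9, 8.3]])

intra_a = mean_pairwise_distance(cluster_a)   # small: points are similar
intra_b = mean_pairwise_distance(cluster_b)
# One common inter-cluster measure: distance between the two centroids.
inter = float(np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0)))

# A good clustering keeps intra-cluster distances small and
# inter-cluster distances large.
```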
Applications of Clustering
Example:
A bank wants to give credit card offers to its customers. Currently, they look
at the details of each customer and based on this information, decide which
offer should be given to which customer.
Now, the bank can potentially have millions of customers. Does it make
sense to look at the details of each customer separately and then make a
decision? Certainly not! It is a manual process and will take a huge amount
of time.
So what can the bank do? One option is to segment its customers into
different groups. For instance, the bank can group the customers based on
their income:
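The income-based segmentation described above can be sketched with k-means. The income figures below are hypothetical, and the choice of three segments (low/medium/high) is an assumption for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical annual incomes (in thousands) for ten customers.
incomes = np.array([[18], [22], [25], [48], [52], [55], [95], [100], [105], [110]])

# Segment the customers into three income groups (low / medium / high).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(incomes)
labels = km.labels_   # the segment each customer falls into
```

The bank can then design one credit card offer per segment instead of examining millions of customers one by one.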
• Cluster analysis in market segmentation: helps marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs.
Similarity
The degree of correspondence among objects across all of the characteristics. Two broad families of similarity measures are used:
• Correlational measures
• Distance measures
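The two families can give different answers for the same pair of objects. This sketch, with made-up feature values, compares a distance measure (Euclidean) against a correlational measure (Pearson) for two objects described by the same five characteristics.

```python
import numpy as np

# Two objects described by the same five characteristics (made-up values).
obj1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
obj2 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Distance measure: Euclidean distance (smaller = more similar).
euclidean = float(np.linalg.norm(obj1 - obj2))

# Correlational measure: Pearson correlation (closer to 1 = more similar pattern).
correlation = float(np.corrcoef(obj1, obj2)[0, 1])

# The two views can disagree: obj2 is far from obj1 in distance
# terms, yet perfectly correlated with it (same rising pattern).
```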
• Hierarchical clustering
• A set of nested clusters organized as a hierarchical tree
Common methods for choosing the number of clusters:
• Elbow Method.
• Silhouette Method.
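Both methods can be sketched in a few lines: run k-means for several candidate values of k, record the inertia (for the elbow plot) and the silhouette score (higher is better). The synthetic blob data and its three centres are assumptions made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (made-up centres).
centers = np.array([[0.0, 0.0], [8.0, 0.0], [4.0, 7.0]])
X, _ = make_blobs(n_samples=150, centers=centers, cluster_std=0.7, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                          # elbow method: look for the bend
    silhouettes[k] = silhouette_score(X, km.labels_)   # silhouette: higher is better

best_k = max(silhouettes, key=silhouettes.get)
```

For the elbow method, inertia always decreases as k grows; the "elbow" is the k after which the decrease flattens out.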
• You’ll define a target number k, which refers to the number of
centroids you need in the dataset. A centroid is the imaginary or real
location representing the center of the cluster.
Note that there is no ordering of the clusters, so the cluster coloring is arbitrary.
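A minimal sketch of defining k and recovering the centroids, using six made-up 2-D points that form two obvious groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# Six 2-D points forming two obvious groups (made-up data).
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])

k = 2  # the target number of centroids
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

centroids = km.cluster_centers_  # one (x, y) centre per cluster
labels = km.labels_              # cluster index of each point; the
                                 # index ordering itself is arbitrary
```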
Advantages of K-means
• It is very simple to implement.
• It scales to huge datasets and is fast on large data.
• It adapts easily to new examples.
• It generalizes to clusters of different shapes and sizes.
Disadvantages of K-means
• It is sensitive to outliers.
• Choosing the value of k manually is difficult.
• As the number of dimensions increases, its scalability decreases.
Cluster the following eight points (with (x, y) representing locations) into two
clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
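One way to check a worked answer for this exercise is to run k-means directly on the eight points. This is a sketch of one possible solution path, not the intended by-hand procedure (the exercise may specify initial centroids, which would fix the iteration steps).

```python
import numpy as np
from sklearn.cluster import KMeans

# The eight points A1..A8 from the exercise.
points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = km.labels_
# The lowest-inertia 2-cluster solution groups the high-y points
# A1, A4 and A8 together, and the remaining five points together.
```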
1. It starts by putting every point in its own cluster, so each cluster is a singleton.
2. It then merges the two points that are closest to each other based on the distances from the distance matrix. The consequence is that there is one less cluster.
3. It then recalculates the distances between the new and old clusters and saves them in a new distance matrix, which will be used in the next step.
4. Finally, steps 2 and 3 are repeated until all clusters are merged into one single cluster including all points.
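The steps above are exactly what `scipy`'s agglomerative clustering performs. A minimal sketch on five made-up 2-D points, using single linkage:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five made-up 2-D points: two tight pairs plus one outlier.
points = np.array([[1.0, 1.0], [1.2, 1.1],
                   [5.0, 5.0], [5.1, 4.9],
                   [9.0, 1.0]])

# Single-linkage agglomerative clustering: each row of Z records one
# merge of the two closest clusters, as in steps 1-4 above.
Z = linkage(points, method="single")   # (n - 1) merges in total

# Cut the tree to recover, e.g., three flat clusters.
flat = fcluster(Z, t=3, criterion="maxclust")
```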
1. Single linkage: computes the minimum distance between clusters before merging them.
2. Complete linkage: computes the maximum distance between clusters before merging them.
3. Average linkage: computes the average distance between clusters before merging them.
4. Centroid linkage: calculates centroids for both clusters, then computes the distance between the two before merging them.
5. Ward's (minimum variance) criterion: minimizes the total within-cluster variance and finds the pair of clusters that leads to the minimum increase in total within-cluster variance after merging.
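The first four criteria differ only in how they reduce the pairwise distances between two clusters to a single number. A tiny sketch on two made-up clusters (Ward's criterion is omitted here, since it is defined via variance rather than pairwise distances):

```python
import numpy as np

a = np.array([[0.0, 0.0], [0.0, 1.0]])   # cluster A (made-up points)
b = np.array([[3.0, 0.0], [4.0, 0.0]])   # cluster B (made-up points)

# All pairwise distances between a point in A and a point in B.
pair = [float(np.linalg.norm(p - q)) for p in a for q in b]

single   = min(pair)              # 1. single linkage: minimum pairwise distance
complete = max(pair)              # 2. complete linkage: maximum pairwise distance
average  = sum(pair) / len(pair)  # 3. average linkage: mean pairwise distance
centroid = float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))  # 4. centroid linkage
```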
Dendrogram:
• A dendrogram is a diagram that shows the hierarchical relationship between objects.
• The main use of a dendrogram is to work out the best way to allocate objects to
clusters.
• In the example above, we can see that E and F are most similar, as the height of the link
that joins them together is the smallest. The next two most similar objects are A and B.
• In the dendrogram above, the height of the dendrogram indicates the order in which the
clusters were joined.
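A dendrogram like the one described can be built from a linkage matrix. The six labelled points below are assumptions chosen so that E and F are closest and A and B are the next closest, mirroring the example in the text; `no_plot=True` returns the layout instead of drawing it.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Six made-up points: E and F are closest, then A and B.
labels = ["A", "B", "C", "D", "E", "F"]
points = np.array([[0.0, 0.0], [0.4, 0.0], [3.0, 3.0],
                   [5.0, 5.0], [9.0, 9.0], [9.1, 9.0]])

Z = linkage(points, method="single")

# no_plot=True returns the dendrogram layout instead of drawing it.
tree = dendrogram(Z, labels=labels, no_plot=True)
leaf_order = tree["ivl"]   # leaf labels in display order; objects merged
                           # early (E-F, A-B) end up as adjacent leaves
```

The height of each link in `Z` (its third column) is the merge distance, so lower links join more similar objects, as the text describes for E and F.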