Cluster Analysis

Uploaded by

nssaini1712


Cluster Analysis
• Cluster: a collection of data objects

• Similar to one another within the same cluster

• Dissimilar to the objects in other clusters

• Cluster analysis: finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters

• Clustering splits data into several subsets. The objects in each subset are similar to each other, and these subsets are called clusters.
Applications of Cluster Analysis in Data Mining:
• Cluster analysis is widely used in many applications, such as data analysis, market research, pattern recognition, and image processing.

• It helps marketers find distinct groups in their client base and characterize those customer groups based on purchasing patterns.

• Clustering is also used in detection applications such as the detection of credit card fraud.

• In biology, it can be used to determine plant and animal taxonomies, to categorize genes with the same functionality, and to gain insight into structures inherent to populations.

• It helps identify areas of similar land use in an earth observation database and groups of houses in a city according to house type, value, and geographical location.
Requirements of Clustering in Data Mining:
• Scalability − We need highly scalable clustering algorithms to deal with large databases.

• Ability to deal with different kinds of attributes − Algorithms should be capable of being applied to any kind of data, such as interval-based (numerical), categorical, and binary data.

• Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be bound to distance measures that tend to find only small spherical clusters.

• High dimensionality − The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.

• Ability to deal with noisy data − Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.

• Interpretability − The clustering results should be interpretable, comprehensible, and usable.

Types of Data Used in Cluster Analysis:
• Interval-scaled variables
• Binary variables
• Nominal, ordinal, and ratio variables
• Variables of mixed types
• Interval-Scaled Variables- Interval-scaled variables are continuous measurements on a roughly linear scale. Typical examples include weight and height, latitude and longitude coordinates, and weather temperature.

• Binary Variables- A binary variable can take only two values: 0 when the property is absent and 1 when it is present. For example, for a binary variable recording bike ownership, 1 means the customer has a bike and 0 means the customer does not have a bike.

• Nominal or Categorical Variables- A generalization of the binary variable in that it can take more than two states, e.g., a map that uses more than two colors, such as red, yellow, blue, and green, to indicate states.

• Ordinal Variables- An ordinal variable can be discrete or continuous. The order of its values is important, e.g., rank. It can be treated like an interval-scaled variable.

• Ratio-Scaled Variables- A ratio-scaled variable is a positive measurement on a nonlinear scale, approximately an exponential scale, such as Ae^(Bt) or Ae^(-Bt).

• Variables of Mixed Types- A database may contain all six types of variables: symmetric binary, asymmetric binary, nominal, ordinal, interval, and ratio. Such a combination is called mixed-type variables.
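How objects are compared depends on the variable type. The following is a minimal sketch, not taken from the slides, of commonly used dissimilarity measures: Euclidean distance for interval-scaled values, simple matching for symmetric binary and nominal values, and a Jaccard-style measure for asymmetric binary values. The function names are illustrative assumptions.

```python
import math


def euclidean(x, y):
    """Distance between two objects described by interval-scaled variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def simple_matching_dissimilarity(x, y):
    """Fraction of attributes that differ.

    Suitable for symmetric binary variables and for nominal variables,
    where every state carries the same weight.
    """
    return sum(a != b for a, b in zip(x, y)) / len(x)


def jaccard_dissimilarity(x, y):
    """Dissimilarity for asymmetric binary variables.

    Attributes that are 0 in both objects are ignored, because a shared
    absence is considered uninformative.
    """
    mismatches = sum(a != b for a, b in zip(x, y))
    both_present = sum(a == 1 and b == 1 for a, b in zip(x, y))
    denominator = mismatches + both_present
    return mismatches / denominator if denominator else 0.0


if __name__ == "__main__":
    print(euclidean((0, 0), (3, 4)))
    print(simple_matching_dissimilarity(["red", "blue"], ["red", "green"]))
    print(jaccard_dissimilarity([1, 0, 1, 0], [1, 1, 0, 0]))
```

For mixed-type data, one option is to compute a per-attribute dissimilarity with the measure that fits each attribute and then average the results.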
Clustering Methods:

Clustering methods can be classified into the following categories −

• Partitioning Method

• Hierarchical Method

• Density-based Method

• Grid-Based Method

• Model-Based Method

• Constraint-based Method
Partitioning Method:

• Suppose we are given a database of ‘n’ objects, and the partitioning method constructs ‘k’ partitions of the data. Each partition represents a cluster, and k ≤ n. That is, the method classifies the data into k groups that satisfy the following requirements −

• Each group contains at least one object.

• Each object must belong to exactly one group.
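The slides do not name a specific partitioning algorithm. As one well-known example, the sketch below uses k-means, which assigns every object to exactly one of k groups. Taking the first k points as initial centroids is a simplifying assumption made here for reproducibility; real implementations usually choose them more carefully.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Partition `points` (tuples of numbers) into k groups.

    Returns (assignment, centroids), where assignment[i] is the group
    index of points[i].
    """
    # Assumption: naive initialization with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each object joins its nearest centroid,
        # so it belongs to exactly one group.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed group: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a group became empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
    labels, centers = kmeans(data, k=2)
    print(labels)
    print(centers)
```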

Hierarchical Methods:
• This method creates a hierarchical decomposition of the given set of data objects. Hierarchical methods can be classified on the basis of how the hierarchical decomposition is formed. There are two approaches − 1. Agglomerative Approach 2. Divisive Approach

• Agglomerative Approach- This approach is also known as the bottom-up approach. We start with each object forming a separate group. The method keeps merging the objects or groups that are close to one another until all of the groups are merged into one.

• Divisive Approach- This approach is also known as the top-down approach. We start with all of the objects in the same cluster. In each step, a cluster is split into smaller clusters. This continues until each object is in its own cluster. Once a merge or split is done, it can never be undone.
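The agglomerative (bottom-up) approach can be sketched as follows. The slides do not say how closeness between groups is measured, so single linkage (the distance between the two closest members) is an assumption here, as are the function and parameter names.

```python
import math


def agglomerative(points, target_clusters=1):
    """Bottom-up clustering with single linkage.

    Returns (clusters, merges): clusters is a list of lists of point
    indices, merges records the pair of groups joined at each step.
    """
    # Start with each object forming a separate group.
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > target_clusters:
        # Find the two closest groups.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # A merge is never undone: the two groups are replaced for good.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
    groups, history = agglomerative(data, target_clusters=2)
    print(groups)
    print(history)
```

Stopping at `target_clusters=1` reproduces the full hierarchy described above, where everything ends up in one group.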
Density-based Method:

• This method is based on the notion of density. The basic idea is to continue growing a given cluster as long as the density in the neighborhood exceeds some threshold, i.e., for each data point within a given cluster, the neighborhood of a given radius has to contain at least a minimum number of points.
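A DBSCAN-style procedure is one common realization of this idea. The sketch below is a simplified illustration under stated assumptions: `eps` (the neighborhood radius) and `min_pts` (the minimum number of points) are parameter names chosen here, and a point counts as its own neighbor.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within radius eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        # Point i is dense enough: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                # Neighborhood is dense: keep growing the cluster.
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(data, eps=1.5, min_pts=3))
```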
Grid-based Method:

• In this method, the object space is divided into a finite number of cells that form a grid structure, and clustering operations are performed on the grid.

Advantages

• The major advantage of this method is fast processing time.

• The processing time depends only on the number of cells in each dimension of the quantized space, not on the number of data objects.
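The quantization step can be sketched as follows: each object is mapped to the cell that contains it, and later steps work on per-cell counts rather than on individual objects. The fixed `cell_size` and the `min_count` density threshold are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def grid_cell_counts(points, cell_size):
    """Map each point to its grid cell and count the points per cell."""
    return Counter(tuple(int(v // cell_size) for v in p) for p in points)


def dense_cells(points, cell_size, min_count):
    """Cells holding at least min_count points (candidate cluster regions)."""
    counts = grid_cell_counts(points, cell_size)
    return {cell for cell, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    data = [(0.2, 0.3), (0.8, 0.9), (0.5, 0.1), (3.2, 3.9), (3.7, 3.1), (9.0, 9.0)]
    print(grid_cell_counts(data, cell_size=1.0))
    print(dense_cells(data, cell_size=1.0, min_count=2))
```

After the single pass that builds the counts, any further work touches only the cells, which is why the cost is tied to the grid resolution.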
Model-based methods:

• In this method, a model is hypothesized for each cluster, and the method finds the best fit of the data to the given model. It locates clusters by constructing a density function that reflects the spatial distribution of the data points.

• This method also provides a way to automatically determine the number of clusters based on standard statistics, taking outliers or noise into account.
Constraint-based Method:

• In this method, clustering is performed by incorporating application-oriented constraints. A constraint refers to the user's expectations or to the desired properties of the clustering results. Constraints provide an interactive way of communicating with the clustering process. They can be specified by the user or by the application requirements.
Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their customer bases, and then
use this knowledge to develop targeted marketing programs

• Land use: Identification of areas of similar land use in an earth observation database

• Insurance: Identifying groups of motor insurance policy holders with a high average
claim cost

• City-planning: Identifying groups of houses according to their house type, value, and
geographical location

• Earthquake studies: Observed earthquake epicenters should be clustered along continental faults
Data Mining Applications

Here is the list of areas where data mining is widely used −

• Financial Data Analysis

• Retail Industry

• Telecommunication Industry

• Biological Data Analysis

Financial Data Analysis

• Design and construction of data warehouses for multidimensional data analysis and data mining.

• Loan payment prediction and customer credit policy analysis.

• Classification and clustering of customers for targeted marketing.

• Detection of money laundering and other financial crimes.

Retail Industry

• Design and construction of data warehouses based on the benefits of data mining.

• Multidimensional analysis of sales, customers, products, time, and region.

• Analysis of effectiveness of sales campaigns.

• Product recommendation and cross-referencing of items.

Telecommunication Industry

• Multidimensional Analysis of Telecommunication data.

• Fraud pattern analysis.

• Identification of unusual patterns.

• Multidimensional association and sequential patterns analysis.

• Mobile Telecommunication services.

• Use of visualization tools in telecommunication data analysis.

Biological Data Analysis

• Semantic integration of heterogeneous, distributed genomic databases.

• Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide sequences.

• Discovery of structural patterns and analysis of genetic networks and protein pathways.

• Association and path analysis.

• Visualization tools in genetic data analysis.

Trends in Data Mining
• Application Exploration.

• Integration of data mining with database systems, data warehouse systems and web database systems.

• Standardization of data mining query language.

• Web mining.

• Biological data mining.

• Data mining and software engineering.

• Distributed data mining.

• Real time data mining.

• Privacy protection and information security in data mining.

Web mining
• Web mining is the application of data mining techniques to extract information from Web documents and services.

• The main purpose of web mining is to discover useful information from the World Wide Web and its usage patterns.

Web mining is further divided into three types:

• Web content mining

• Web structure mining

• Web usage mining

Web content mining
• Web content mining is the extraction of useful information from the content of web documents.

• Web content consists of several types of data: text, images, audio, video, etc.

• It can provide effective and interesting patterns about user needs.

• Mining text documents draws on text mining, machine learning, and natural language processing.

• This type of mining is also known as text mining. It scans and mines the text, images, and groups of web pages according to the content of the input.
Web Structure Mining
• Web structure mining is the discovery of structure information from the web.

• The structure of the web consists of web pages as nodes and hyperlinks as edges connecting related pages.

• Structure mining produces a structured summary of a particular website. It identifies relationships between web pages that are linked by information or by direct link connections.
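The node-and-edge view can be made concrete with a small link graph. The sketch below counts how many hyperlinks point at each page, which is one simple structural summary; the page names and the dictionary layout are hypothetical.

```python
from collections import Counter


def in_link_counts(links):
    """Count incoming hyperlinks per page.

    `links` maps each page (node) to the pages it links to (edges).
    """
    counts = Counter({page: 0 for page in links})  # pages with no in-links stay at 0
    for source, targets in links.items():
        for target in targets:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical three-page site.
    site = {
        "home": ["about", "products"],
        "about": ["home"],
        "products": ["home", "about"],
    }
    print(in_link_counts(site))
```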
Web Usage Mining

• Web usage mining is the discovery of interesting usage patterns from large data sets.

• These patterns help you understand user behavior.

• In web usage mining, users access data on the web, and records of that access are collected in the form of logs. For this reason, web usage mining is also called log mining.
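As a minimal illustration of mining such logs, the sketch below counts page hits and the number of distinct visitors per page. The log format (one "user path" pair per line) is a simplifying assumption; real server logs carry more fields and need a proper parser.

```python
from collections import Counter, defaultdict


def page_hits(log_lines):
    """Number of requests per page."""
    hits = Counter()
    for line in log_lines:
        _user, path = line.split()
        hits[path] += 1
    return hits


def visitors_per_page(log_lines):
    """Number of distinct users who requested each page."""
    users = defaultdict(set)
    for line in log_lines:
        user, path = line.split()
        users[path].add(user)
    return {path: len(seen) for path, seen in users.items()}


if __name__ == "__main__":
    # Hypothetical access log.
    log = [
        "alice /home",
        "alice /products",
        "bob /home",
        "alice /home",
        "carol /products",
    ]
    print(page_hits(log))
    print(visitors_per_page(log))
```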
Applications of Web Mining
• Personalized marketing
• E-commerce
• Search engine optimization
• Fraud detection
• Web content analysis
• Customer service
• Healthcare
Text Data Mining

• Text data mining can be described as the process of extracting essential data from natural language text.

• All the data that we generate via text messages, documents, emails, and files is written in natural language text.
Areas of text mining in data mining:
• Information Extraction: The automatic extraction of structured data such as
entities, relationships, and attributes describing entities from an unstructured
source is called information extraction.

• Natural Language Processing: NLP stands for natural language processing. It enables computer software to understand human language as it is spoken. NLP is primarily a component of artificial intelligence (AI).

• Data Mining: Data mining refers to the extraction of useful data, hidden patterns
from large data sets. Data mining tools can predict behaviors and future trends.

• Information Retrieval: Information retrieval deals with retrieving useful data from
data that is stored in our systems.
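Information retrieval over stored text is often built on an inverted index, which maps each term to the documents that contain it. The following is a minimal sketch of that idea; the tokenizer (lowercase letters and digits only) and the AND-style query semantics are simplifying assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Clustering groups similar objects",
        2: "Text mining extracts patterns from text",
        3: "Mining large data sets",
    }
    idx = build_inverted_index(docs)
    print(search(idx, "mining"))
    print(search(idx, "text mining"))
```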
Text Mining Applications
• Digital Library: Various text mining strategies and tools are used to find patterns and trends in journals and proceedings that are stored in text databases.

• Academic and Research Field: In education, different text mining tools and strategies are used to examine patterns in a specific region or research field.

• Life Science: Life science and healthcare industries produce large amounts of textual and numerical data about patient records, illnesses, medicines, symptoms, treatments of diseases, etc.

• Social Media: Text mining is used to analyze web-based media applications and to monitor and investigate online content such as the plain text from internet news, web journals, emails, blogs, etc.

• Business Intelligence: Text mining plays an important role in business intelligence, helping organizations and enterprises analyze their customers and competitors to make better decisions.
Advantages of Text Mining

• Large Amounts of Data: Text mining allows organizations to extract data from large amounts of unstructured text.

• Variety of Applications: Text mining has a wide range of applications, including sentiment analysis, entity recognition, and topic modeling.

• Improved Decision Making

• Cost-effective: Text mining can be cost-effective, as it eliminates the need for manual data entry.
Difference between Spatial and Temporal Data Mining

Spatial Data Mining | Temporal Data Mining
Spatial data mining refers to the extraction of knowledge, spatial relationships, and interesting patterns that are not explicitly stored in a spatial database. | Temporal data mining refers to the extraction of knowledge about the occurrence of events, such as whether they follow random, cyclic, or seasonal variation.
It needs space. | It needs time.
Primarily, it deals with spatial data such as location and geo-referenced data. | Primarily, it deals with temporal content from a huge set of data.
It involves characteristic rules, evaluation rules, and association rules. | It targets mining new patterns and unknown knowledge, taking the temporal aspects of the data into account.
Examples: finding hotspots and unusual locations. | Examples: an association rule such as "Any person who buys a motorcycle also buys a helmet". With the temporal aspect, this rule becomes "Any person who buys a motorcycle also buys a helmet after that."
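The motorcycle and helmet example can be checked against timestamped purchase data. The sketch below estimates how often the temporal version of the rule holds: among customers who bought the first item, the fraction who bought the second item at a strictly later time. The record layout (customer, day, item) and the function name are assumptions for illustration.

```python
def temporal_confidence(purchases, antecedent, consequent):
    """Fraction of `antecedent` buyers who bought `consequent` afterwards.

    `purchases` is an iterable of (customer, day, item) records.
    """
    # Earliest day on which each customer bought the antecedent item.
    earliest = {}
    for customer, day, item in purchases:
        if item == antecedent:
            earliest[customer] = min(day, earliest.get(customer, day))
    if not earliest:
        return 0.0
    # Customers who bought the consequent strictly after that day.
    followers = {
        customer
        for customer, day, item in purchases
        if item == consequent and customer in earliest and day > earliest[customer]
    }
    return len(followers) / len(earliest)


if __name__ == "__main__":
    # Hypothetical purchase history.
    history = [
        ("a", 1, "motorcycle"), ("a", 3, "helmet"),
        ("b", 2, "motorcycle"),
        ("c", 5, "helmet"), ("c", 9, "motorcycle"),
        ("d", 4, "motorcycle"), ("d", 6, "helmet"),
    ]
    print(temporal_confidence(history, "motorcycle", "helmet"))
```

Customer "c" bought both items but in the opposite order, so a non-temporal rule would count that customer while the temporal rule does not.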
Rough Set Theory

• It is a formal theory derived from fundamental research on the logical properties of information systems.

• Rough set theory has been used as a methodology for database mining and knowledge discovery in relational databases.

• We can use the rough set approach to discover structural relationships within imprecise and noisy data.
Basic Problems in Data Analysis Solved by Rough Set:
• Characterization of a set of objects in terms of attribute values.

• Finding dependency between the attributes.

• Reduction of superfluous attributes.

• Finding the most significant attributes.

• Decision rule generation.

Goals of Rough Set Theory
• The main goal of rough set analysis is the induction (learning) of approximations of concepts. Rough sets are used within knowledge discovery in databases (KDD), where they offer mathematical tools to discover hidden patterns in data.

• It can be used for feature selection, feature extraction, data reduction, decision rule generation, and pattern extraction (templates, association rules), etc.

• It identifies partial or total dependencies in data, eliminates redundant data, and offers approaches for handling null values, missing data, dynamic data, and others.
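In rough set theory, a concept is approximated by two sets: the lower approximation (objects that certainly belong to the concept) and the upper approximation (objects that possibly belong to it). Objects with identical attribute values are indiscernible and fall into the same class. The sketch below computes both approximations; the patient table and attribute names are hypothetical.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target`.

    `objects` maps an object name to a dict of attribute values;
    `target` is the set of object names that form the concept.
    """
    # Group objects that are indiscernible on the chosen attributes.
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:  # whole class lies inside the concept
            lower |= members
        if members & target:   # class overlaps the concept
            upper |= members
    return lower, upper


if __name__ == "__main__":
    # Hypothetical decision table.
    patients = {
        "p1": {"headache": "yes", "temp": "high"},
        "p2": {"headache": "yes", "temp": "high"},
        "p3": {"headache": "no", "temp": "normal"},
        "p4": {"headache": "yes", "temp": "normal"},
        "p5": {"headache": "no", "temp": "high"},
        "p6": {"headache": "no", "temp": "high"},
    }
    flu = {"p1", "p2", "p4", "p5"}
    low, up = approximations(patients, ["headache", "temp"], flu)
    print(sorted(low))
    print(sorted(up))
    print(sorted(up - low))  # boundary region: cannot be decided from these attributes
```

Here p5 and p6 share the same attribute values but only p5 is in the concept, so both land in the boundary region. Dropping an attribute and checking whether the approximations change is one way to test whether that attribute is superfluous.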
