Advanced Mining Techniques

What is statistical analysis?

Statistical analysis, or statistics, involves collecting, organizing and analyzing data based on established
principles to identify patterns and trends. It is a broad discipline with applications in academia, business, the
social sciences, genetics, population studies, engineering and several other fields. Statistical analysis has
several functions. You can use it to make predictions, perform simulations, create models, reduce risk and
identify trends.

Thanks to improving technology, many organizations now have vast amounts of data on every aspect of their
operations and markets. To make sense of this data, businesses rely on statistical analysis techniques to
organize their data and turn this information into tools for making precise decisions and long-term forecasts.
Statistical analysis allows owners of data to perform business intelligence functions that solidify their
competitive advantage, improve efficiency and optimize resources for maximum returns on investments.

Main types of statistical analysis

There are three major types of statistical analysis:

Descriptive statistical analysis


Descriptive statistics is the simplest form of statistical analysis, using numbers to describe the qualities of a
data set. It helps reduce large data sets into simple and more compact forms for easy interpretation. You can
use descriptive statistics to summarize the data from a sample or represent a whole sample in a research
population. Descriptive statistics uses data visualization tools such as tables, graphs and charts to make
analysis and interpretation easier. However, descriptive statistics is not suitable for drawing conclusions on its own; it can only summarize the data so that more sophisticated statistical analysis tools can be applied to draw inferences.
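
For instance, a minimal sketch in Python (assuming the pandas library and a made-up set of exam scores) of how descriptive statistics reduce a data set to a few summary numbers:

# Minimal descriptive-statistics sketch using pandas on hypothetical exam scores.
import pandas as pd

scores = pd.Series([55, 61, 68, 72, 72, 75, 80, 84, 90, 95], name="exam_score")

print(scores.describe())             # count, mean, std, min, quartiles, max
print("Median:", scores.median())
print("Mode:", scores.mode().tolist())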

Inferential statistical analysis

Inferential statistical analysis is used to make inferences or draw conclusions about a larger population based
on findings from a sample group within it. It can help researchers to find distinctions among groups present
within a sample. Inferential statistics is also used to validate generalizations made about a population from a
sample due to its ability to account for errors in conclusions made about a segment of a larger group.
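
As a hedged illustration, the sketch below (assuming SciPy and two hypothetical sample groups) applies a two-sample t-test, one common inferential technique, to judge whether an observed difference between group means is likely to hold in the wider population:

# Inferential-statistics sketch: two-sample t-test with SciPy on hypothetical samples.
from scipy import stats

group_a = [23, 25, 28, 30, 31, 27, 26]   # hypothetical measurements from sample group A
group_b = [20, 22, 24, 21, 23, 25, 22]   # hypothetical measurements from sample group B

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value suggests the difference between the group means is unlikely to be due to chance alone.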

Associational statistical analysis

Associational statistics is a tool researchers use to make predictions and find causation. They use it to find
relationships among multiple variables. It is also used to determine whether researchers can make inferences
and predictions about a data set from the characteristics of another set of data. Associational statistics is the
most advanced type of statistical analysis and requires sophisticated software tools for performing high-level
mathematical calculations. To measure association, researchers use a range of measures, including correlation coefficients and regression analysis.
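
A minimal sketch of associational analysis in Python (assuming SciPy and hypothetical advertising and sales figures), computing a correlation coefficient and fitting a simple linear regression:

# Associational-statistics sketch: Pearson correlation and simple linear regression with SciPy.
from scipy import stats

advertising = [10, 15, 20, 25, 30, 35]   # hypothetical advertising spend
sales       = [25, 31, 38, 44, 52, 57]   # hypothetical sales figures

r, p = stats.pearsonr(advertising, sales)
print(f"Pearson r = {r:.3f} (p = {p:.4f})")

slope, intercept, r_value, p_value, std_err = stats.linregress(advertising, sales)
print(f"sales ≈ {slope:.2f} * advertising + {intercept:.2f}")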

Association Rule
Association rule mining finds interesting associations and relationships among large sets of data items. This rule shows how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis.

Market Basket Analysis is one of the key techniques used by large retailers to show associations between items. It allows retailers to identify relationships between items that people frequently buy together. Association rule learning can be divided into three types of algorithms (the first of which is sketched after this list):

1. Apriori

2. Eclat

3. F-P Growth Algorithm
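
As a rough illustration of the first algorithm on this list, the sketch below assumes the third-party mlxtend library and a small made-up set of market-basket transactions; it mines frequent itemsets with Apriori and then derives association rules from them:

# Apriori sketch using the mlxtend library (assumed installed) on made-up transactions.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent_itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])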

How does Association Rule Learning work?

Association rule learning works on the concept of an If-Then statement, such as "if A then B".

Here the If element is called the antecedent, and the Then element is called the consequent. A relationship in which an association is found between two items is known as single cardinality; as the number of items in a rule increases, the cardinality increases accordingly. So, to measure the associations among thousands of data items, several metrics are used. These metrics are given below:

● Support

● Confidence

● Lift

Support

Support is the frequency of A, or how frequently an itemset appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X. For a set of transactions T, it can be written as:

Support(X) = (Number of transactions containing X) / (Total number of transactions T)

Confidence

Confidence indicates how often the rule has been found to be true, i.e., how often the items X and Y occur together in the dataset given that X already occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:

Confidence(X → Y) = Support(X ∪ Y) / Support(X)

Lift

Lift measures the strength of a rule, that is, how much more often X and Y occur together than would be expected if they were independent. It can be defined by the formula below:

Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y)) = Confidence(X → Y) / Support(Y)
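
A minimal plain-Python sketch (with made-up transactions and the hypothetical rule {bread} → {milk}) showing how support, confidence and lift follow from the definitions above:

# Support, confidence and lift for the rule {bread} -> {milk} on made-up transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(1 for t in transactions if itemset <= t) / n

X, Y = {"bread"}, {"milk"}
supp_xy = support(X | Y)            # Support(X ∪ Y)
conf = supp_xy / support(X)         # Confidence(X → Y)
lift = conf / support(Y)            # Lift(X → Y)

print(f"support={supp_xy:.2f}, confidence={conf:.2f}, lift={lift:.2f}")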

What is Cluster Analysis?

Cluster analysis is a multivariate data mining technique whose goal is to group objects (e.g., products, respondents, or other entities) based on a set of user-selected characteristics or attributes. It is a basic and important step of data mining and a common technique for statistical data analysis, used in many fields such as data compression, machine learning, pattern recognition and information retrieval.

Types of Cluster Analysis

The clustering algorithm needs to be chosen experimentally unless there is a mathematical reason to choose one clustering method over another. It should be noted that an algorithm that works well on a particular set of data may not work well on another. There are a number of different methods of performing cluster analysis.

Hierarchical Cluster Analysis


In this method, a cluster is first formed and then merged with the most similar and closest cluster to form a single larger cluster. This process is repeated until all objects are in one cluster. This particular approach is known as the agglomerative method: agglomerative clustering starts with single objects and progressively groups them into clusters.

The divisive method is the other kind of hierarchical method, in which clustering starts with the complete data set and then divides it into partitions.
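
A minimal sketch of agglomerative hierarchical clustering (assuming SciPy and a few made-up 2-D points): the merge tree is built bottom-up and then cut into a chosen number of clusters:

# Agglomerative hierarchical clustering sketch with SciPy on made-up 2-D points.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1.0, 1.2], [1.1, 0.9], [5.0, 5.2], [5.1, 4.8], [9.0, 9.1]])

Z = linkage(points, method="ward")                 # build the dendrogram (merge tree)
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into at most 3 clusters
print(labels)                                      # cluster label for each point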

Centroid-based Clustering

In this type of clustering, clusters are represented by a central entity, which may or may not be a part of the given data set. The K-Means method is the typical example, where k is the number of cluster centres and objects are assigned to the nearest cluster centre.
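
A brief K-Means sketch (assuming scikit-learn and made-up 2-D points), where each object is assigned to the nearest of k cluster centres:

# Centroid-based clustering sketch with scikit-learn's KMeans on made-up 2-D points.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)            # cluster assignment for each point
print(kmeans.cluster_centers_)   # the two learned centroids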

Distribution-based Clustering

It is a type of clustering model closely related to statistics, based on models of distribution. Objects that belong to the same distribution are put into a single cluster. This type of clustering can capture some complex properties of objects, such as correlation and dependence between attributes.
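
A short sketch of distribution-based clustering with a Gaussian mixture model (assuming scikit-learn and made-up 2-D points); each cluster corresponds to one fitted Gaussian component:

# Distribution-based clustering sketch: Gaussian mixture model with scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

points = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

gmm = GaussianMixture(n_components=2, random_state=0).fit(points)
print(gmm.predict(points))   # cluster assignment for each point
print(gmm.means_)            # estimated mean of each Gaussian component
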
Density-based Clustering

In this type of clustering, clusters are defined as areas of higher density than the rest of the data set. Objects in sparse areas are used to separate clusters; the objects in these sparse regions are usually treated as noise or border points. The most popular method of this type is DBSCAN.
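
A minimal DBSCAN sketch (assuming scikit-learn and made-up 2-D points); dense regions become clusters and points in sparse areas are labelled -1 as noise:

# Density-based clustering sketch with DBSCAN (scikit-learn) on made-up 2-D points.
import numpy as np
from sklearn.cluster import DBSCAN

points = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1],
                   [5.0, 5.0], [5.1, 4.9],
                   [25.0, 25.0]])

db = DBSCAN(eps=0.5, min_samples=2).fit(points)
print(db.labels_)   # e.g. [0 0 0 1 1 -1]; -1 marks a noise point in a sparse area
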
Cluster Analysis
Cluster analysis is the process of finding similar groups of objects in order to form clusters. It is an unsupervised machine-learning technique that acts on unlabelled data. Data points are grouped together to form a cluster in which all the objects belong to the same group.

Cluster:
The given data is divided into different groups by combining similar objects into a group. Such a group is called a cluster: a collection of similar data items grouped together.

Properties of Clustering:
1. Clustering Scalability: Nowadays there is a vast amount of data, and clustering often has to deal with huge databases. In order to handle extensive databases, the clustering algorithm should be scalable; if it is not, we cannot get appropriate results and the analysis may lead to wrong conclusions.

2. High Dimensionality: The algorithm should be able to handle data in high-dimensional space as well as data sets of small size.

3. Algorithm Usability with multiple data kinds: Different kinds of data can be used with clustering algorithms. The algorithm should be capable of dealing with different types of data, such as discrete, categorical, interval-based and binary data.

4. Dealing with unstructured data: Some databases contain missing values and noisy or erroneous data. If an algorithm is sensitive to such data, it may produce poor-quality clusters. The algorithm should therefore be able to handle unstructured data and give it some structure by organizing it into groups of similar data objects. This makes the job of the data expert easier when processing the data and discovering new patterns.

5. Interpretability: The outcomes of clustering should be interpretable, comprehensible, and usable. The
interpretability reflects how easily the data is understood.

Clustering Methods:
The clustering methods can be classified into the following categories:

● Partitioning Method
● Hierarchical Method
● Density-based Method
● Grid-Based Method
● Model-Based Method
● Constraint-based Method

Applications of Cluster Analysis

● Clustering analysis is broadly used in many applications such as market research, pattern
recognition, data analysis, and image processing.
● Clustering can also help marketers discover distinct groups in their customer base and characterize those groups based on their purchasing patterns.
● In the field of biology, it can be used to derive plant and animal taxonomies, categorize genes
with similar functionalities and gain insight into structures inherent to populations.
● Clustering also helps in identification of areas of similar land use in an earth observation
database. It also helps in the identification of groups of houses in a city according to house type,
value, and geographic location.
● Clustering also helps in classifying documents on the web for information discovery.

● Clustering is also used in outlier detection applications such as detection of credit card fraud.

● As a data mining function, cluster analysis serves as a tool to gain insight into the distribution of
data to observe characteristics of each cluster.

Requirements of Clustering in Data Mining

The following points throw light on why clustering is required in data mining −

● Scalability − We need highly scalable clustering algorithms to deal with large databases.
● Ability to deal with different kinds of attributes − Algorithms should be capable of being applied to any kind of data, such as interval-based (numerical), categorical and binary data.
● Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be bound only to distance measures that tend to find small spherical clusters.
● High dimensionality − The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional space.
● Ability to deal with noisy data − Databases contain noisy, missing or erroneous data. Some algorithms are sensitive to such data and may lead to poor-quality clusters.
● Interpretability − The clustering results should be interpretable, comprehensible, and usable.
