Handout 2 Data Mining
Slide - 1
Slide - 2
Data Mining
Techniques
Useful Data
Slide - 3
11/30/2024
Data Mining
• Data mining focuses on better understanding the
characteristics and patterns among variables in large
databases, using a variety of statistical and analytical tools.
– It is used to identify relationships among variables in
large data sets and understand hidden patterns that
they may contain.
Slide - 4
Slide - 5
Slide - 6
Data Mining
• Data mining can be considered part descriptive and part
prescriptive analytics.
• In descriptive analytics, data-mining tools help analysts to
identify patterns in data.
• Excel charts and PivotTables, for example, are useful tools
for describing patterns and analyzing data sets; however,
they require manual intervention.
• Regression analysis and forecasting models help us to
predict relationships or future values of variables of
interest.
Slide - 7
Slide - 8
Slide - 10
Slide - 11
Slide - 12
Slide - 13
Slide - 14
Slide - 15
Slide - 16
Slide - 17
Slide - 18
Slide - 19
• Statistics includes:
– 1. Cluster techniques
– 2. Regression
– 3. Classification
– 4. Segmentation
• AI includes:
– Different AI algorithms
• ML includes:
– 1. k-NN algorithm
– 2. Apriori algorithm
– 3. k-means algorithm
– 4. Naïve Bayes algorithm
Slide - 21
Slide - 22
Cluster Analysis
• Cluster analysis, also called data segmentation, is a
collection of techniques that seek to group or segment a
collection of objects (observations or records) into subsets
or clusters, such that those within each cluster are more
closely related to one another than objects assigned to
different clusters.
– The objects within clusters should exhibit a high
amount of similarity, whereas those in different clusters
will be dissimilar.
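The idea of high similarity within clusters and dissimilarity between them can be checked numerically. The following is a minimal sketch, assuming Euclidean distance as the (dis)similarity measure and using made-up 2-D observations that are already assigned to two clusters; the names `mean_within_distance` and `mean_between_distance` are illustrative only.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Hypothetical 2-D observations, already assigned to two clusters.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
}

def mean_within_distance(points):
    """Average Euclidean distance between all pairs inside one cluster."""
    return mean(dist(p, q) for p, q in combinations(points, 2))

def mean_between_distance(points_a, points_b):
    """Average Euclidean distance between points of two different clusters."""
    return mean(dist(p, q) for p, q in product(points_a, points_b))

within = {name: mean_within_distance(pts) for name, pts in clusters.items()}
between = mean_between_distance(clusters["A"], clusters["B"])

print("within-cluster distances:", within)
print("between-cluster distance:", between)
```

For a good clustering, each within-cluster average should be clearly smaller than the between-cluster average.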
Slide - 23
Clustering Methods
• Hierarchical clustering
– Agglomerative
clustering methods,
which proceed by a series
of fusions of the n
objects into groups.
– Divisive clustering
methods, which
separate n objects
successively into finer
groupings.
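The agglomerative approach can be sketched in a few lines. The example below is one possible illustration, assuming Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); other linkage rules exist, and the function name `agglomerative` and the sample points are hypothetical.

```python
from math import dist

def agglomerative(points):
    """Single-linkage agglomerative clustering.

    Returns the merges as (cluster_a, cluster_b, distance) in the order the
    fusions happen; each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]  # start: every object alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two fused clusters with the merged one.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical observations.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = agglomerative(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```

Starting from n single-object clusters, the loop performs n - 1 fusions until one cluster remains. A divisive method would run in the opposite direction, starting from one cluster and splitting it.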
Slide - 24
Slide - 28
Dendrogram
• Visualization of the clustering process. The y-axis
measures the intercluster distance. A dendrogram shows
the sequence in which clusters are formed as you move up
the diagram.
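One common way to read a dendrogram is to draw a horizontal line at a chosen height and take the groups below it as clusters. The sketch below assumes a made-up merge history (the `merges` list and the item labels are hypothetical) and shows which clusters exist at a given height; the helper name `cut` is illustrative.

```python
# Hypothetical merge history, listed bottom-up: (cluster_a, cluster_b, height).
merges = [
    (("a",), ("b",), 1.0),
    (("c",), ("d",), 1.5),
    (("a", "b"), ("c", "d"), 6.4),
    (("a", "b", "c", "d"), ("e",), 7.1),
]

def cut(merges, items, height):
    """Return the clusters formed by all fusions at or below `height`."""
    clusters = {frozenset([item]) for item in items}
    for left, right, h in merges:
        if h > height:
            break  # merges are sorted by height, so later fusions are higher
        clusters -= {frozenset(left), frozenset(right)}
        clusters.add(frozenset(left + right))
    return sorted(sorted(c) for c in clusters)

print(cut(merges, "abcde", 2.0))
print(cut(merges, "abcde", 10.0))
```

Cutting low on the y-axis gives many small clusters; cutting higher gives fewer, larger ones.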
Slide - 33
Classification
• Classification methods seek to classify a categorical
outcome into one of two or more categories based on
various data attributes.
• For each record in a database, we have a categorical
variable of interest and a number of additional predictor
variables.
• For a given set of predictor variables, we would like to
assign the best value of the categorical variable.
Slide - 34
Classification Techniques
• k-Nearest Neighbors (k-NN) Algorithm
– Finds records in a database that have similar numerical
values of a set of predictor variables.
• Discriminant Analysis
– Uses predefined classes based on a set of linear
discriminant functions of the predictor variables.
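The k-NN idea can be illustrated with a small sketch. It assumes Euclidean distance and a simple majority vote among the k nearest records; the function name `knn_classify`, the choice k = 3, and the credit-decision records are hypothetical. In practice, predictors measured on very different scales are usually standardized first, which this sketch omits.

```python
from collections import Counter
from math import dist

def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    `training` is a list of (predictor_values, class_label) pairs.
    """
    nearest = sorted(training, key=lambda record: dist(record[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (income in $1000s, years at address) -> credit decision.
training = [
    ((25.0, 1.0), "reject"), ((30.0, 2.0), "reject"), ((28.0, 0.5), "reject"),
    ((60.0, 8.0), "approve"), ((75.0, 10.0), "approve"), ((68.0, 6.0), "approve"),
]

print(knn_classify(training, (65.0, 7.0)))
print(knn_classify(training, (27.0, 1.5)))
```

An odd k with two classes avoids tied votes; with more classes a tie-breaking rule would be needed.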
Slide - 42
Slide - 43
Slide - 44
Discriminant Analysis
• Discriminant analysis is a technique for classifying a set
of observations into predefined classes. The purpose is to
determine the class of an observation based on a set of
predictor variables.
• With only two classification groups, we can apply
regression analysis. Unfortunately, when there are more
than two, linear regression cannot be applied, and special
software must be used.
Slide - 47
Slide - 51
Cause-and-Effect Modeling
• Correlation analysis can help us develop cause-and-effect
models that relate lagging and leading measures.
– Lagging measures tell us what has happened and are
often external business results such as profit, market
share, or customer satisfaction.
– Leading measures predict what will happen and are
usually internal metrics such as employee satisfaction,
productivity, and turnover.
Slide - 57
Slide - 60
Slide - 63
Slide - 64
Slide - 65
Slide - 66
Slide - 68
Slide - 69
Slide - 70
Slide - 71
Slide - 72
Slide - 73
Slide - 74