
UNIT 3

Data Mining Basics

















What is the KDD Process?


The term Knowledge Discovery in Databases, or KDD for short, refers to the broad
process of finding knowledge in data, and emphasizes the "high-level" application of
particular data mining methods. It is of interest to researchers in machine learning,
pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition
for expert systems, and data visualization.
The unifying goal of the KDD process is to extract knowledge from data in the
context of large databases.

It does this by using data mining methods (algorithms) to extract (identify) what is
deemed knowledge, according to the specifications of measures and thresholds, using
a database along with any required preprocessing, subsampling, and transformations
of that database.

An Outline of the Steps of the KDD Process

The overall process of finding and interpreting patterns from data involves the
repeated application of the following steps:

1. Developing an understanding of
o the application domain
o the relevant prior knowledge
o the goals of the end-user
2. Creating a target data set: selecting a data set, or focusing on a subset of
variables, or data samples, on which discovery is to be performed.
3. Data cleaning and preprocessing.
o Removal of noise or outliers.
o Collecting necessary information to model or account for noise.
o Strategies for handling missing data fields.
o Accounting for time sequence information and known changes.
4. Data reduction and projection.
o Finding useful features to represent the data depending on the goal of the
task.
o Using dimensionality reduction or transformation methods to reduce the
effective number of variables under consideration or to find invariant
representations for the data.
5. Choosing the data mining task.
o Deciding whether the goal of the KDD process is classification,
regression, clustering, etc.
6. Choosing the data mining algorithm(s).
o Selecting method(s) to be used for searching for patterns in the data.
o Deciding which models and parameters may be appropriate.
o Matching a particular data mining method with the overall criteria of the
KDD process.
7. Data mining.
o Searching for patterns of interest in a particular representational form or
a set of such representations as classification rules or trees, regression,
clustering, and so forth.
8. Interpreting mined patterns.
9. Consolidating discovered knowledge.
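The steps above can be sketched end to end on a toy data set. The following stdlib-only Python sketch covers cleaning (step 3), choosing a task and algorithm (steps 5-6), mining (step 7), and interpretation (step 8); the records, attribute names, and the simple one-rule learner are all invented for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical training records; None marks a missing field (steps 1-2: target data).
records = [
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "young", "income": None,   "buys": "yes"},
    {"age": "old",   "income": "low",  "buys": "no"},
    {"age": "old",   "income": "low",  "buys": "no"},
]

# Step 3 (cleaning): fill each missing field with that attribute's most common value.
for attr in ("age", "income"):
    mode = Counter(r[attr] for r in records if r[attr] is not None).most_common(1)[0][0]
    for r in records:
        r[attr] = r[attr] or mode

# Steps 5-7 (task, algorithm, mining): classification with a one-rule learner --
# for each attribute, map each of its values to the majority class, then keep
# the single attribute whose rule is most accurate on the training data.
def one_rule(attr):
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[attr]][r["buys"]] += 1
    rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    acc = sum(rule[r[attr]] == r["buys"] for r in records) / len(records)
    return rule, acc

best_attr, (rule, acc) = max(((a, one_rule(a)) for a in ("age", "income")),
                             key=lambda t: t[1][1])
# Step 8 (interpretation): on this toy data, "age" alone predicts "buys" perfectly.
```

A real KDD run would replace each toy step with a production counterpart (e.g. statistical imputation, dimensionality reduction, a full classification or clustering algorithm), but the iterative shape of the process is the same.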

The terms knowledge discovery and data mining are distinct.

KDD refers to the overall process of discovering useful knowledge from data. It
involves the evaluation and possibly interpretation of the patterns to make the decision
of what qualifies as knowledge. It also includes the choice of encoding schemes,
preprocessing, sampling, and projections of the data prior to the data mining step.
Data mining refers to the application of algorithms for extracting patterns from data
without the additional steps of the KDD process.

Definitions Related to the KDD Process

Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Data: A set of facts, F.
Pattern: An expression E in a language L describing facts in a subset FE of F.
Process: KDD is a multi-step process involving data preparation, pattern searching, knowledge evaluation, and refinement with iteration after modification.
Valid: Discovered patterns should be true on new data with some degree of certainty; they should generalize to the future (other data).
Novel: Patterns must be novel (not previously known).
Useful: Actionable; patterns should potentially lead to some useful actions.
Understandable: The process should lead to human insight; patterns must be made understandable in order to facilitate a better understanding of the underlying data.

The Business Context of Data Mining


Why should an organisation practise data mining if it brings no impact to its business? In product marketing, the marketing manager should identify the segment of the population most likely to respond to the product. Identifying these segments involves understanding the overall population and deploying the right technique to classify it. Likewise, in predictive modelling, there are several channels through which to interact with customers: direct marketing, print advertising, telemarketing, radio and television advertising, and so on. It is only through data mining that an analyst can conclude which channel is optimal for sending communication to customers.

In addition to segmenting and targeting, data mining is also popularly used for budgeting the marketing spend, so that the budget allocation can be optimised across marketing drivers. The analysis is carried out based on the previous year's spend and its impact on sales. With the spend information for each driver (Print, TV, Radio, Online, etc.), one can determine the ROI for each driver, which uncovers the impact of these channels on sales. Based on this analysis, the marketing manager can allocate media spend in the coming year to achieve the most effective results on sales.
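The spend-to-ROI analysis described above can be sketched in a few lines. All figures, channel names, and the proportional allocation policy below are invented for illustration:

```python
# Hypothetical previous-year spend and attributed sales per marketing driver.
spend = {"Print": 40_000, "TV": 120_000, "Radio": 25_000, "Online": 60_000}
sales = {"Print": 70_000, "TV": 300_000, "Radio": 30_000, "Online": 180_000}

# ROI per driver: incremental return per unit of spend.
roi = {ch: (sales[ch] - spend[ch]) / spend[ch] for ch in spend}

# One simple allocation policy: split next year's budget in proportion to ROI.
budget = 250_000
total = sum(roi.values())
allocation = {ch: round(budget * r / total) for ch, r in roi.items()}
```

Under this toy data, Online shows the highest ROI and therefore receives the largest share of the next-year budget; a real analysis would also model diminishing returns per channel.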

Process improvement through data mining


The role of data in manufacturing has long been understated or unstated. New forms of data use and data analytics have transformed the way companies approach quality improvement. Experts in the field report a considerable shift from exclusive dependence on post-manufacturing inspection and retrospective analysis to the prediction and early identification of problem areas and maintenance requirements. New sources of data, from sensors to call-center conversations, are taking traditional product inspection to a new level. By transforming the management of quality and safety in asset-based businesses, these innovations are gradually improving the manufacturing sector. Data transforms technology, and this is only the beginning of striking changes.

The quality and safety revolution in organizations was marked by numerous technical breakthroughs, such as real-time data from connected vehicle sensors and GPS, and text derived from warranty reports and transcriptions of call-center conversations, to name a few. This data is now combined in a repository that allows for multiple data formats and analysis across them. This is exactly where machine learning algorithms come into play: their role is to identify trends in the data and to make predictions.

Why use data mining?

Businesses use data mining to draw conclusions and solve specific problems. One
of the key benefits of data mining is that it is fundamentally applicable to any
process and helps improve the flexibility and efficiency of operations. Thus,
data use in manufacturing facilitates schedule adherence, monitoring automation,
modeling for capacity, and reduction of waste. The departments are completely
transformed and factories become smarter by achieving full data transparency.

How manufacturing businesses take advantage of data mining

ABB, a manufacturer of global importance, currently uses process mining for its purchase-to-pay and production processes. Earlier, employees at the ABB plant in Hanau, Germany, would extract evaluations from their SAP systems several times a day, import them into Excel, and use complex formulas to analyze and understand processes. Today, the relevant production and assembly team leaders at ABB receive an email in the morning that outlines the previous day's production variants, throughput times, and number of rejections. As a result, the plant's full ecosystem of quality-improvement processes is immediately visible through process mining. The system only gets better at identifying patterns as more data is fed in. Instead of relying on complex manual analysis, operational processes now provide instant results.

Drastic changes have impacted the vehicle manufacturing industry too. In this sector, products are relatively expensive, with high-end manufacturers focusing on service and product quality. The business benefits of data-driven innovation include faster identification and resolution of quality problems, as well as reduced warranty spending, which amounts to between 2 and 6 percent of total sales in the automobile industry. For the customers and users of these vehicles and machines, early identification and preventive maintenance often result in greater uptime. For instance, in one case involving an automotive company, 28,000 vehicles were saved from recall by the identification of a problem before the vehicles hit the market.

Data mining tools can be very beneficial for discovering interesting and useful patterns in complicated manufacturing quality-improvement processes, and these patterns can be used to improve manufacturing quality. However, data accumulated in manufacturing plants has unique characteristics, such as an unbalanced distribution of the target attribute and a small training set relative to the number of input features. Still, business process improvement has to start somewhere. An approach that incorporates big data, analytics, and business intelligence is simply the most reliable, proven way to make improvements that last. Once you know what to measure, track it, analyse it, and improve it, you will have the right foundations in place to enhance processes throughout your business. Time and product waste will be things of the past.

Data mining as a tool for research and knowledge development in nursing
The ability to collect and store data has grown at a dramatic rate in all disciplines over the past two
decades. Healthcare has been no exception. The shift toward evidence-based practice and outcomes
research presents significant opportunities and challenges to extract meaningful information from massive
amounts of clinical data to transform it into the best available knowledge to guide nursing practice. Data
mining, a step in the process of Knowledge Discovery in Databases, is a method of unearthing
information from large data sets. Built upon statistical analysis, artificial intelligence, and machine learning
technologies, data mining can analyze massive amounts of data and provide useful and interesting
information about patterns and relationships that exist within the data that might otherwise be missed. As
domain experts, nurse researchers are in ideal positions to use this proven technology to transform the
information that is available in existing data repositories into useful and understandable knowledge to
guide nursing practice and for active interdisciplinary collaboration and research.

Data mining in marketing


• Data mining technology allows businesses to learn more about their customers and make smart marketing
decisions.
• The data mining business grows 10 percent a year as the amount of data produced is booming.
• Data mining information can help to
– increase return on investment (ROI)
– improve CRM and market analysis
– reduce marketing campaign costs
– facilitate fraud detection and customer retention.
• The 4Ps are one of the best ways of defining marketing:
–Product (or Service)
–Price
–Place
–Promotion

Benefits Using Data Mining in Marketing


• Predict future trends
• Understand customer purchase habits
• Help with decision making
• Improve company revenue and lower costs
• Market basket analysis
• Quick fraud detection

Barriers Using Data Mining in Marketing


• User privacy/security
• Overwhelming amount of data
• Great cost at the implementation stage
• Possible misuse of information
• Possible inaccuracy of data

Data Mining Techniques for Marketing


• Knowledge-based Marketing
• Market Basket Analysis
• Social Media Marketing

Knowledge-based Marketing
• It is marketing which makes use of the macro- and micro-environmental knowledge available to the
marketing function in an organization.
• The three major areas of application of data mining for knowledge-based marketing are
customer profiling, deviation analysis, and trend analysis.
• Customer profiling systems can analyse the frequency of purchases, so companies know how
often customers buy a product or visit the store.
• Deviation analysis gives the marketer the capability to query changes that occurred as a
result of recent price changes or promotions.
• Trend analysis can determine trends in sales, costs, and profits by product or market in order
to achieve the highest amount of sales.

Market Basket Analysis


• One of the most common and useful types of data analysis for marketing and retailing.
• Determines what products customers purchase together.
• Improves the effectiveness of marketing and sales tactics using customer data already available to
the company.
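Market basket analysis reduces to counting how often items co-occur across transactions. A minimal sketch on invented transactions, computing the support of item pairs and the confidence of one association rule:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (sets of items bought together).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

n = len(transactions)
# Support: fraction of all transactions containing the pair.
support = {pair: c / n for pair, c in pair_counts.items()}
# Confidence of the rule bread -> milk: P(milk | bread).
confidence = pair_counts[("bread", "milk")] / item_counts["bread"]
```

Here {bread, milk} appears in 2 of 4 baskets (support 0.5), and 2 of the 3 bread baskets also contain milk (confidence 2/3). Algorithms such as Apriori apply the same counting idea efficiently to large item sets.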

Social Media Marketing


• SMM is a form of internet marketing that implements various social media networks in order to
achieve marketing communication and branding goals.
• SMM primarily covers activities involving social sharing of content, videos, and images for
marketing purposes, as well as paid social media advertising.

Data Mining Tools for Marketing


• WEKA
• Rapid Miner
• R-Programming Tool
• Python Based Orange and NTLK
• KNIME
Major Data Mining Techniques: Classification and Prediction

There are two forms of data analysis that can be used for extracting models describing
important classes or to predict future data trends. These two forms are as follows −

• Classification
• Prediction
Classification models predict categorical class labels, while prediction models predict
continuous-valued functions. For example, we can build a classification model to
categorize bank loan applications as either safe or risky, or a prediction model to predict
the expenditure in dollars of potential customers on computer equipment, given their
income and occupation.

What is classification?
Following are the examples of cases where the data analysis task is Classification −
• A bank loan officer wants to analyze the data in order to know which customers (loan applicants)
are risky and which are safe.
• A marketing manager at a company needs to predict whether a customer with a given profile
will buy a new computer.
In both of the above examples, a model or classifier is constructed to predict the
categorical labels. These labels are risky or safe for loan application data and yes or no
for marketing data.

What is prediction?
Following are the examples of cases where the data analysis task is Prediction −
Suppose the marketing manager needs to predict how much a given customer will spend
during a sale at his company. In this example we are asked to predict a numeric value,
so the data analysis task is an example of numeric prediction. In this case, a
model or predictor is constructed that predicts a continuous-valued function, or
ordered value.
Note − Regression analysis is a statistical methodology that is most often used for
numeric prediction.

How Does Classification Work?


With the help of the bank loan application that we have discussed above, let us
understand the working of classification. The Data Classification process includes two
steps −
• Building the Classifier or Model
• Using Classifier for Classification
Building the Classifier or Model
• This step is the learning step or the learning phase.
• In this step the classification algorithms build the classifier.
• The classifier is built from the training set, made up of database tuples and their associated
class labels.
• Each tuple in the training set belongs to a predefined class, as determined by its class
label. Tuples are also referred to as samples, objects, or data points.

Using Classifier for Classification


In this step, the classifier is used for classification. Here the test data is used to estimate
the accuracy of classification rules. The classification rules can be applied to the new
data tuples if the accuracy is considered acceptable.
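The two steps can be sketched with any simple classifier; below, a nearest-centroid rule stands in for the learned model. The loan features, labels, and test tuples are invented for illustration:

```python
from math import dist

# Hypothetical training tuples: (features, class label) for loan applications.
# Features: (income in $1000s, years at current job).
training = [((30, 1), "risky"), ((25, 2), "risky"), ((80, 9), "safe"), ((70, 8), "safe")]
test_set = [((28, 1), "risky"), ((75, 9), "safe")]

# Step 1 (learning): build the classifier from the training set -- here, a
# centroid (mean feature vector) per class.
centroids = {}
for label in {lab for _, lab in training}:
    pts = [x for x, lab in training if lab == label]
    centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))

def classify(x):
    # Assign x to the class with the nearest centroid.
    return min(centroids, key=lambda lab: dist(x, centroids[lab]))

# Step 2 (classification): estimate accuracy on held-out test tuples before
# applying the classifier to genuinely new data.
accuracy = sum(classify(x) == y for x, y in test_set) / len(test_set)
```

The important point is the separation of the two phases: the model is fitted only on training tuples, and the accuracy estimate comes only from tuples the model has not seen.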
Classification and Prediction Issues
The major issue is preparing the data for Classification and Prediction. Preparing the
data involves the following activities −
• Data Cleaning − Data cleaning involves removing noise and treating missing values.
Noise is removed by applying smoothing techniques, and the problem of missing values
is solved by replacing a missing value with the most commonly occurring value for that attribute.
• Relevance Analysis − The database may also contain irrelevant attributes. Correlation analysis
is used to determine whether any two given attributes are related.
• Data Transformation and reduction − The data can be transformed by any of the following
methods.
o Normalization − Normalization involves scaling all values of a given attribute so
that they fall within a small specified range. It is used when the learning step
employs neural networks or methods involving distance measurements.
o Generalization − The data can also be transformed by generalizing it to the higher
concept. For this purpose we can use the concept hierarchies.
Note − Data can also be reduced by some other methods such as wavelet
transformation, binning, histogram analysis, and clustering.
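Min-max normalization, mentioned above, rescales an attribute into a fixed range in two lines. The income values below are invented:

```python
# Min-max normalization: rescale an attribute's values into the range [0.0, 1.0].
values = [12_000, 35_000, 58_000, 98_000]   # hypothetical income attribute
lo, hi = min(values), max(values)
scaled = [(v - lo) / (hi - lo) for v in values]
# The smallest value maps to 0.0, the largest to 1.0, and the rest fall between.
```

After scaling, attributes measured in very different units (e.g. dollars and years) contribute comparably to distance-based methods.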

Comparison of Classification and Prediction Methods


Here is the criteria for comparing the methods of Classification and Prediction −
• Accuracy − Accuracy of a classifier refers to its ability to predict the class label
correctly; accuracy of a predictor refers to how well a given predictor can guess the
value of the predicted attribute for new data.
• Speed − This refers to the computational cost in generating and using the classifier or
predictor.
• Robustness − It refers to the ability of classifier or predictor to make correct predictions from
given noisy data.
• Scalability − Scalability refers to the ability to construct the classifier or predictor
efficiently given a large amount of data.
• Interpretability − It refers to the extent to which the classifier or predictor can be understood.

Classification by Decision Tree Induction


A decision tree is a structure that includes a root node, branches, and leaf nodes. Each
internal node denotes a test on an attribute, each branch denotes the outcome of a test,
and each leaf node holds a class label. The topmost node in the tree is the root node.
Consider, for example, a decision tree for the concept buy_computer, which indicates whether a
customer at a company is likely to buy a computer. Each internal node represents
a test on an attribute, and each leaf node represents a class.

The benefits of having a decision tree are as follows −

• It does not require any domain knowledge.


• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.

Decision Tree Induction Algorithm


In 1980, the machine learning researcher J. Ross Quinlan developed a decision tree
algorithm known as ID3 (Iterative Dichotomiser). Later he presented C4.5, the
successor of ID3. ID3 and C4.5 adopt a greedy approach: there is no backtracking, and
the trees are constructed in a top-down, recursive, divide-and-conquer manner.
Generating a decision tree from the training tuples of data partition D
Algorithm: Generate_decision_tree

Input:
    Data partition D, a set of training tuples and their
    associated class labels.
    attribute_list, the set of candidate attributes.
    Attribute_selection_method, a procedure to determine the
    splitting criterion that best partitions the data tuples
    into individual classes. This criterion includes a
    splitting_attribute and either a split point or a
    splitting subset.

Output:
    A decision tree.

Method:
    create a node N;
    if tuples in D are all of the same class C then
        return N as a leaf node labeled with class C;
    if attribute_list is empty then
        return N as a leaf node labeled with the
        majority class in D;                        // majority voting
    apply Attribute_selection_method(D, attribute_list)
        to find the best splitting_criterion;
    label node N with splitting_criterion;
    if splitting_attribute is discrete-valued and
        multiway splits are allowed then            // not restricted to binary trees
        attribute_list = attribute_list - splitting_attribute;  // remove splitting attribute
    for each outcome j of splitting_criterion
        // partition the tuples and grow subtrees for each partition
        let Dj be the set of data tuples in D satisfying outcome j;  // a partition
        if Dj is empty then
            attach a leaf labeled with the majority class in D to node N;
        else
            attach the node returned by
            Generate_decision_tree(Dj, attribute_list) to node N;
    end for
    return N;
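The attribute selection step in the algorithm above is commonly implemented with information gain, the measure behind ID3. A small sketch with invented class labels and an invented split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Toy class labels for partition D, and the sub-partitions produced by
# splitting D on a hypothetical discrete attribute.
labels = ["yes", "yes", "yes", "no", "no", "no"]
partitions = [["yes", "yes", "no"], ["yes", "no", "no"]]

# Information gain: entropy before the split minus the weighted
# average entropy of the partitions after the split.
gain = entropy(labels) - sum(len(p) / len(labels) * entropy(p)
                             for p in partitions)
```

The attribute with the highest gain becomes the splitting_attribute at the current node; C4.5 refines this with the gain ratio to penalize attributes with many values.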

Tree Pruning
Tree pruning is performed in order to remove anomalies in the training data due to noise
or outliers. The pruned trees are smaller and less complex.
Tree Pruning Approaches
There are two approaches to prune a tree −
• Pre-pruning − The tree is pruned by halting its construction early.
• Post-pruning − This approach removes a sub-tree from a fully grown tree.

Cost Complexity
The cost complexity is measured by the following two parameters −

• Number of leaves in the tree, and


• Error rate of the tree.
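These two parameters can be combined into a single pruning criterion: penalize each candidate tree's error rate by a charge per leaf, then keep the cheapest tree. The candidate trees and the alpha penalty below are invented for illustration:

```python
# Hypothetical pruning candidates: name -> (number of leaves, training-error rate).
candidates = {"full": (8, 0.02), "pruned_a": (5, 0.05), "pruned_b": (3, 0.10)}

def cost_complexity(leaves, error, alpha):
    # Penalized cost: error rate plus a per-leaf complexity charge.
    return error + alpha * leaves

alpha = 0.02  # strength of the complexity penalty
best = min(candidates, key=lambda name: cost_complexity(*candidates[name], alpha))
# With this alpha, the moderately pruned tree wins the size/accuracy trade-off.
```

A larger alpha favors smaller trees, a smaller alpha favors accuracy; libraries such as scikit-learn expose the same idea via a `ccp_alpha` parameter.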

KNN Algorithm
K-Nearest Neighbors is one of the most basic yet essential classification algorithms in
Machine Learning. It belongs to the supervised learning domain and finds intense
application in pattern recognition, data mining and intrusion detection.
It is widely applicable in real-life scenarios since it is non-parametric, meaning it does
not make any underlying assumptions about the distribution of the data (as opposed to other
algorithms, such as GMM, which assume a Gaussian distribution of the given data).
We are given some prior data (also called training data), which classifies coordinate
points into groups identified by an attribute; for example, points labelled 'Red' or
'Green'. Now, given another set of data points (also called testing data), we must
allocate each of these points a group by analyzing the training set. The unclassified
points are initially unlabelled (marked as 'White').

Intuition
If we plot these points on a graph, we may be able to locate some clusters or groups.
Now, given an unclassified point, we can assign it to a group by observing what group its
nearest neighbors belong to. This means a point close to a cluster of points classified as
‘Red’ has a higher probability of getting classified as ‘Red’.
Intuitively, we can see that the first point (2.5, 7) should be classified as ‘Green’ and the
second point (5.5, 4.5) should be classified as ‘Red’.
Algorithm
Let m be the number of training data samples. Let p be an unknown point.
1. Store the training samples in an array of data points arr[]. This means each element of this
array represents a tuple (x, y).
2. for i=0 to m:
3. Calculate Euclidean distance d(arr[i], p).
4. Make set S of K smallest distances obtained. Each of these distances corresponds to an
already classified data point.
5. Return the majority label among S.
K can be kept as an odd number so that a clear majority can be computed in the case
where only two groups are possible (e.g. Red/Blue). With increasing K, we get smoother
decision boundaries across different classifications. The accuracy of the above
classifier also increases as we increase the number of data points in the training set.
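The algorithm above translates directly into code. The training points below are invented so that the two test points from the intuition section, (2.5, 7) and (5.5, 4.5), classify as described:

```python
from collections import Counter
from math import dist

def knn_classify(train, p, k=3):
    # Label point p by majority vote among its k nearest training points.
    # train is a list of ((x, y), label) pairs; dist() is Euclidean distance.
    neighbors = sorted(train, key=lambda t: dist(t[0], p))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical training set with two groups.
train = [((2, 7), "Green"), ((3, 8), "Green"), ((2, 6), "Green"),
         ((5, 4), "Red"), ((6, 5), "Red"), ((6, 3), "Red")]

print(knn_classify(train, (2.5, 7)))    # -> Green
print(knn_classify(train, (5.5, 4.5)))  # -> Red
```

Sorting all m distances costs O(m log m) per query; production implementations use spatial indexes such as k-d trees or ball trees to find the K nearest neighbors faster.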
