Data Mining UNIT-III R20 Syllabus
YANNAM APPARAO
Associate Professor, CSE/IT
There are two forms of data analysis that can be used to extract models describing important classes or to predict future data trends: classification and prediction.

An example of a case where the data analysis task is Classification: a bank loan officer wants to analyze loan-application data in order to know which applicants are safe and which are risky.

An example of a case where the data analysis task is Prediction: a marketing manager needs to predict how much a given customer will spend during a sale.

With the help of the bank loan application mentioned above, let us understand the working of classification.
Classification
Data Classification Process
Training Set (excerpt):

Tid | Attrib1 | Attrib2 | Attrib3 | Class
  3 | No      | Small   | 70K     | No
  6 | No      | Medium  | 60K     | No

Test Set (excerpt):

Tid | Attrib1 | Attrib2 | Attrib3 | Class
 11 | No      | Small   | 55K     | ?
 15 | No      | Large   | 67K     | ?

A model is learned from the labelled Training Set and then applied to the Test Set to predict the missing (?) class labels.
Evaluation of Classifiers and Classification Techniques
A classification technique (or classifier) is a systematic approach to building classification models from an input data set. Examples include decision tree classifiers, rule-based classifiers, neural networks, and naive Bayes classifiers.
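A minimal sketch of this idea in Python, assuming scikit-learn is available; the numeric encoding of Attrib1 and Attrib2 below is an assumption chosen just for illustration:

# Build a model from the tiny training excerpt above and apply it to the
# test records, mirroring the Training Set -> model -> Test Set figure.
from sklearn.tree import DecisionTreeClassifier

# Assumed encoding: Attrib1 No=0/Yes=1; Attrib2 Small=0, Medium=1, Large=2;
# Attrib3 given in thousands.
X_train = [[0, 0, 70],   # Tid 3
           [0, 1, 60]]   # Tid 6
y_train = ["No", "No"]

model = DecisionTreeClassifier().fit(X_train, y_train)

X_test = [[0, 0, 55],    # Tid 11
          [0, 2, 67]]    # Tid 15
print(model.predict(X_test))   # predicted class labels for the '?' entries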
Input:
• Data partition, DP
• Attribute_list
• Attribute_selection_procedure

Output:
• A decision tree

Method:
1. Create a node A.
2. If all tuples in DP belong to the same class C, return A as a leaf node labelled with C.
3. If Attribute_list is empty, return A as a leaf node labelled with the majority class in DP.
4. Apply Attribute_selection_procedure(DP, Attribute_list) to find the splitting criterion, and label A with it.
5. If the splitting attribute is discrete-valued and more than two splits (a non-binary tree) are allowed, remove the splitting attribute from the list:
   Attribute_list = Attribute_list - splitting_attribute
6. For each outcome i of the splitting criterion, let DPi be the set of tuples in DP satisfying outcome i.
7. For each DPi: if DPi is empty, attach a leaf node labelled with the majority class in DP;
   else attach the node returned by Decision_Tree(DPi, Attribute_list).
   end for
8. Return A.
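A from-scratch Python sketch of the method above, assuming categorical attributes and representing each tuple as an (attribute_dict, class_label) pair; the helper names (majority_class, decision_tree, select_attribute) are illustrative, not from any library:

from collections import Counter

def majority_class(dp):
    """Most common class label in a partition of (attributes, label) pairs."""
    return Counter(label for _, label in dp).most_common(1)[0][0]

def decision_tree(dp, attribute_list, select_attribute):
    """Recursive sketch of the Decision_Tree method.

    dp: list of (attribute_dict, class_label) pairs.
    select_attribute: stands in for Attribute_selection_procedure,
    e.g. one based on the Gini index or information gain.
    """
    labels = {label for _, label in dp}
    if len(labels) == 1:                    # step 2: pure partition -> leaf
        return labels.pop()
    if not attribute_list:                  # step 3: no attributes left -> majority leaf
        return majority_class(dp)
    split_attr = select_attribute(dp, attribute_list)            # step 4
    remaining = [a for a in attribute_list if a != split_attr]   # step 5
    node = {"attribute": split_attr, "branches": {}}
    # step 6: here the outcomes are taken from the partition itself, so no
    # branch is ever empty; the textbook version uses the attribute's full domain.
    for outcome in {attrs[split_attr] for attrs, _ in dp}:
        dpi = [(attrs, label) for attrs, label in dp
               if attrs[split_attr] == outcome]
        # step 7: empty partition -> majority leaf, else recurse
        node["branches"][outcome] = (majority_class(dp) if not dpi
                                     else decision_tree(dpi, remaining,
                                                        select_attribute))
    return node                             # step 8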
Description of the algorithm

Input parameters
The Attribute_selection_procedure is driven by an attribute selection measure, such as:
a) Gini index
b) Information gain
The chosen measure also determines whether the tree is strictly binary: the Gini index (as used in CART) enforces binary splits, whereas information gain allows multiway splits.
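Both measures can be computed directly from the class counts of a partition. A minimal sketch (the function names are our own):

import math
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy in bits; the quantity behind information gain."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, partitions):
    """Parent entropy minus the weighted entropy of the child partitions."""
    n = len(parent)
    return entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)

# A pure partition scores 0 on both; a 50/50 partition scores Gini 0.5, entropy 1 bit.
print(gini(["yes", "yes"]), gini(["yes", "no"]), entropy(["yes", "no"]))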
Step 1
Suppose we are given data tuples of students along with their average marks. Then we will have a single node:

Average

Step 2
If all the data tuples belong to a single class, the node becomes a leaf node, and we label it with that class.

Avg > 80%

But if not all the students have an average above 80%, the algorithm proceeds further.
Step 3
It calls Attribute _ selection _ procedure to identify the splitting criterion. The
identifies the branches to be spitted.
In our example . The method would determine three splitting criterions.
I. Above 80%
II. Above 65 % and Less than 80%
III. Above 40% and less than 65%
Here the splitting attribute are more than two. So we are allowed to have a
no binary tree, We considerer that a partition is pure if all the attributes within
that partition belongings to the same class.
Average (the splitting attribute)

A splitting attribute may be:
i. Discrete-valued
ii. Continuous-valued
iii. Discrete-valued with a binary tree

If the splitting attribute is discrete-valued, the node is split into one branch for each of its possible values.
Example: Fruits
If it is discrete-valued and the tree is binary, the test has exactly two outcomes, yes or no.
Example: "Student of CSIT?" with a yes branch and a no branch.
This algorithm is used recursively for all nodes at each level, until all nodes become
leaf nodes.
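Tying this back to the decision_tree sketch given after the algorithm: discretizing Average into the three bands from Step 3 and attaching a class label to each band, the recursion produces a one-level multiway tree. The band names and class labels below are assumptions made for illustration:

# Hypothetical student tuples: one attribute (the average band from Step 3)
# and an assumed class label per band.
students = [
    ({"AvgBand": ">80%"},   "Distinction"),
    ({"AvgBand": "65-80%"}, "FirstClass"),
    ({"AvgBand": "40-65%"}, "Pass"),
]

# With a single attribute, any selection measure would pick AvgBand.
tree = decision_tree(students, ["AvgBand"], lambda dp, attrs: attrs[0])
print(tree)
# e.g. {'attribute': 'AvgBand', 'branches': {'>80%': 'Distinction', ...}}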
Naïve Bayes Classifier

• The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem, used for solving classification problems.
Dataset (whether the player plays, given the weather):

Day | Weather  | Play
 0  | Rainy    | Yes
 1  | Sunny    | Yes
 2  | Overcast | Yes
 3  | Overcast | Yes
 4  | Sunny    | No
 5  | Rainy    | Yes
 6  | Sunny    | Yes
 7  | Overcast | Yes
 8  | Rainy    | No
 9  | Sunny    | No
10  | Sunny    | Yes
11  | Rainy    | No
12  | Overcast | Yes
13  | Overcast | Yes
Frequency table for the weather conditions:

Weather  | Yes | No
Overcast |  5  |  0
Rainy    |  2  |  2
Sunny    |  3  |  2
Total    | 10  |  4
Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
• P(Sunny|Yes) = 3/10 = 0.30
• P(Sunny) = 5/14 ≈ 0.36
• P(Yes) = 10/14 ≈ 0.71
• So P(Yes|Sunny) = (3/10 * 10/14) / (5/14) = 3/5 = 0.60
Applying Bayes' theorem:

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
• P(Sunny|No) = 2/4 = 0.50
• P(No) = 4/14 ≈ 0.29
• P(Sunny) = 5/14 ≈ 0.36
• So P(No|Sunny) = (2/4 * 4/14) / (5/14) = 2/5 = 0.40
• From the above calculations, P(Yes|Sunny) > P(No|Sunny).
• Hence, on a sunny day, the player can play the game.
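The same computation can be checked mechanically from the 14-row dataset above. A minimal Python sketch (with a single feature, naive Bayes reduces to plain Bayes' theorem; the function name posterior is our own):

data = [("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Sunny", "No"), ("Rainy", "Yes"),
        ("Sunny", "Yes"), ("Overcast", "Yes"), ("Rainy", "No"),
        ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
        ("Overcast", "Yes"), ("Overcast", "Yes")]

def posterior(weather, label):
    """P(label | weather) = P(weather | label) * P(label) / P(weather)."""
    n = len(data)
    n_label = sum(1 for _, y in data if y == label)
    n_both = sum(1 for w, y in data if w == weather and y == label)
    n_weather = sum(1 for w, _ in data if w == weather)
    return (n_both / n_label) * (n_label / n) / (n_weather / n)

print(round(posterior("Sunny", "Yes"), 2))  # 0.6
print(round(posterior("Sunny", "No"), 2))   # 0.4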
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the class of a dataset.