Unit - III
A path is traced from the root to a leaf node, which holds the class
prediction for that tuple.
The main decision tree induction algorithms are:
ID3 (Iterative Dichotomiser 3)
C4.5 (a successor of ID3)
CART (Classification and Regression Trees)
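ID3 chooses, at each node, the attribute with the highest information gain. A minimal sketch of that measure in Python (the toy data and names below are illustrative, not from these notes):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Entropy reduction from partitioning the tuples on one attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Toy data: one attribute (outlook); class label = plays?
rows = [["sunny"], ["sunny"], ["rain"], ["rain"]]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0, a perfect split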
Variable interactions are hard to capture, since each split tests a
single attribute at a time
Overfitting can occur because each split reduces the training data
available for subsequent splits
NOTE:- Tree pruning methods address the problem of overfitting.
Definition:- Tree pruning attempts to identify and remove branches
that reflect anomalies in the training data, with the goal of
improving classification accuracy on unseen data.
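One concrete pruning mechanism is cost-complexity pruning, exposed in scikit-learn through the ccp_alpha parameter of DecisionTreeClassifier; a minimal sketch, assuming scikit-learn is installed (the dataset and alpha value are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree can memorize noise in the training data.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Post-pruning removes branches whose added complexity is not justified
# by their contribution to accuracy, trading training fit for generalization.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

print("unpruned test accuracy:", full.score(X_test, y_test))
print("pruned test accuracy:", pruned.score(X_test, y_test))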
Attribute construction
Create new attributes based on existing ones that are sparsely
represented
This reduces fragmentation, repetition, and replication
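A minimal sketch of attribute construction with pandas (the column names and the constructed ratio are hypothetical):

import pandas as pd

df = pd.DataFrame({"length_cm": [2.0, 3.5, 5.0],
                   "width_cm": [1.0, 1.0, 2.5]})

# One constructed attribute can stand in for a combination of existing
# ones that a tree would otherwise need many fragmented splits to express.
df["aspect_ratio"] = df["length_cm"] / df["width_cm"]
print(df)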
Classification is a classical problem, extensively studied by
statisticians and machine learning researchers
Scalability: classifying data sets with millions of examples and
hundreds of attributes at reasonable speed
Why decision tree induction in data mining?
relatively faster learning speed (than other classification
methods)
convertible to simple, easy-to-understand classification rules
can use SQL queries for accessing databases
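To illustrate the SQL point: the class counts an attribute-selection measure needs can be fetched with a single GROUP BY query. A minimal sqlite3 sketch (the table and column names are hypothetical):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tuples (outlook TEXT, plays TEXT)")
conn.executemany("INSERT INTO tuples VALUES (?, ?)",
                 [("sunny", "no"), ("sunny", "no"),
                  ("rain", "yes"), ("overcast", "yes")])

# Per-(attribute value, class) counts in one pass over the database.
for row in conn.execute("SELECT outlook, plays, COUNT(*) FROM tuples "
                        "GROUP BY outlook, plays"):
    print(row)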
BOAT (Bootstrapped Optimistic Algorithm for Tree Construction) is not
based on any special data structures; instead, it uses a technique
known as “bootstrapping”.
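A minimal sketch of the bootstrapping idea itself (not of the full BOAT algorithm): draw samples with replacement so that each subset fits in memory; the sizes below are illustrative.

import random

data = list(range(10))  # stand-in for a training set too large for memory

def bootstrap_sample(data, size):
    # Sampling with replacement: duplicates are expected.
    return [random.choice(data) for _ in range(size)]

# Several smaller samples; BOAT grows a tree from each and then combines
# the trees into an optimistic initial tree that is refined later.
samples = [bootstrap_sample(data, 5) for _ in range(3)]
print(samples)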
Bayesian classifiers are statistical classifiers: they perform
probabilistic prediction, i.e., predict class membership probabilities.
Joint probability
Conditional probability
These probabilities can help us make an inference.
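Written out, the standard definitions these notes rely on, with H a hypothesis (class) and X an observed tuple:

Joint probability: P(A, B) = P(A | B) P(B)
Conditional probability: P(A | B) = P(A, B) / P(B)
Bayes' theorem (the inference step): P(H | X) = P(X | H) P(H) / P(X)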
A belief network is defined by two components:
A directed acyclic graph encoding the dependence
relationships among a set of variables
A set of conditional probability tables (CPTs) associating each
node with its immediate parent nodes.
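A minimal sketch of these two components for binary variables (the network structure and probabilities are illustrative); the joint probability of a full assignment is the product, over all nodes, of P(node | its parents):

# DAG with one edge: Rain -> WetGrass. Each CPT maps the tuple of
# parent values to P(node = True | those parent values).
parents = {"Rain": (), "WetGrass": ("Rain",)}
cpt = {"Rain": {(): 0.2},                          # P(Rain)
       "WetGrass": {(True,): 0.9, (False,): 0.1}}  # P(WetGrass | Rain)

def joint(assignment):
    # Chain rule for belief networks: multiply each node's CPT entry.
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p_true = cpt[node][parent_values]
        p *= p_true if value else 1.0 - p_true
    return p

print(joint({"Rain": True, "WetGrass": True}))  # 0.2 * 0.9 = 0.18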
When given a training tuple, a lazy learner simply stores it and
waits until it is given a test tuple.
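k-nearest-neighbor classification is the classic lazy learner; in the minimal sketch below (the data is illustrative), "training" just stores the tuples, and distance computation and voting are deferred until a test tuple arrives:

from collections import Counter

def knn_predict(stored, query, k=3):
    # All work happens now, at query time: rank stored tuples by
    # squared Euclidean distance to the query, then take a majority vote.
    nearest = sorted(stored,
                     key=lambda pl: sum((a - b) ** 2
                                        for a, b in zip(pl[0], query)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

stored = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
          ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_predict(stored, (1.1, 0.9)))  # "A"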