Unit 4
• Root Node: The root node is where the decision tree starts. It
represents the entire dataset, which is then divided into two
or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; a leaf node
cannot be split any further.
• Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given conditions.
• Branch/Sub Tree: A section of the tree formed by splitting a node.
• Pruning: Pruning is the process of removing the unwanted
branches from the tree.
• Parent/Child node: A node that is divided into sub-nodes is called
the parent node of those sub-nodes, and the sub-nodes are called its
child nodes. The root node is a parent that has no parent of its own.
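The terms above can be made concrete with a small data structure. This is only an illustrative sketch: the Node class, the prune and count_leaves helpers, and the attribute names ("outlook", "humidity") are hypothetical and are not taken from any library or from these notes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node used only to illustrate the terminology."""
    attribute: Optional[str] = None   # attribute tested at a decision node
    label: Optional[str] = None       # class label stored at a leaf node
    children: dict = field(default_factory=dict)  # branch value -> child Node

    @property
    def is_leaf(self) -> bool:
        # A leaf node has no children, so it cannot be split further.
        return not self.children


def prune(node: Node, label: str) -> None:
    """Remove the sub-tree below `node`, turning it into a leaf."""
    node.children.clear()
    node.attribute = None
    node.label = label


def count_leaves(node: Node) -> int:
    """Count the leaf nodes in the sub-tree rooted at `node`."""
    if node.is_leaf:
        return 1
    return sum(count_leaves(child) for child in node.children.values())


# "humidity" is a child of the root and the parent of two leaves.
humidity = Node(attribute="humidity",
                children={"high": Node(label="no"),
                          "normal": Node(label="yes")})

# The root node is split on "outlook" into three branches.
root = Node(attribute="outlook",
            children={"sunny": humidity,
                      "overcast": Node(label="yes"),
                      "rainy": Node(label="no")})

leaves_before = count_leaves(root)
prune(humidity, "no")              # remove the "humidity" branch
leaves_after = count_leaves(root)
print(leaves_before, leaves_after)
```

Here pruning replaces a whole sub-tree with a single leaf, which is one simple way to picture "removing unwanted branches".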
Working of Decision Tree Algorithm
• To predict the class of a given record, the algorithm
starts from the root node of the tree.
• The algorithm compares the value of the root attribute
with the corresponding attribute of the record and, based
on the comparison, follows the matching branch to the
next node.
• At the next node, the algorithm again compares the
node's attribute with the record's value and moves
further down. It continues this process until it reaches a
leaf node of the tree.
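The traversal described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tree is stored as nested dictionaries, and the predict function, the keys "attribute", "branches" and "label", and the toy weather attributes are all hypothetical choices made for this example.

```python
# Hypothetical toy tree: a decision node tests one attribute,
# a leaf node holds a class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {
                "high": {"label": "no"},
                "normal": {"label": "yes"},
            },
        },
        "overcast": {"label": "yes"},
        "rainy": {"label": "no"},
    },
}


def predict(node, record):
    """Walk from the root to a leaf, following the matching branch."""
    while "label" not in node:             # stop once a leaf is reached
        value = record[node["attribute"]]  # record's value for this attribute
        node = node["branches"][value]     # jump to the next node
    return node["label"]


prediction = predict(tree, {"outlook": "sunny", "humidity": "normal"})
print(prediction)
```

The sketch assumes every attribute value in the record has a matching branch; a real implementation would need a fallback for unseen values.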
The complete process can be summarized in the following
steps:
• Step-1: Begin the tree with the root node, say S, which
contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute
Selection Measure (ASM).
• Step-3: Divide S into subsets, one for each possible value
of the best attribute.
• Step-4: Generate the decision tree node, which contains the
best attribute.
• Step-5: Recursively make new decision trees using the subsets
of the dataset created in Step-3. Continue this process until a
stage is reached where the nodes cannot be classified further;
such a final node is called a leaf node.
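The five steps can be sketched as a short recursive function. The steps do not fix a particular ASM, so this sketch assumes information gain (based on entropy) as the measure. The function names, the nested-dictionary tree layout, and the toy dataset are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Assumed ASM: reduction in entropy after splitting on `attribute`."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels)
                  if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attribute is left: make a leaf
    # that holds the majority class.
    if len(set(labels)) == 1 or not attributes:
        return {"label": Counter(labels).most_common(1)[0][0]}
    # Step-2: pick the best attribute according to the ASM.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    # Step-4: create the decision node for that attribute.
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    # Step-3 and Step-5: split into subsets and recurse on each one.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node


# Step-1: the root node S starts with the complete (toy) dataset.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes", "no"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

On this toy data "outlook" gives the larger information gain, so it becomes the root; only the "rainy" subset is still mixed and is split again on "windy".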
Attribute Selection Measures