Hypothesis Space Search in Decision Trees

Hypothesis space search in decision trees involves evaluating different tree structures to find the most effective model for predicting or classifying data. This process includes criteria for splitting, tree growth, and pruning to avoid overfitting. The search aims to identify the best decision tree based on heuristics like Gini impurity or information gain.

Uploaded by

diljeetpc


Hypothesis Space Search in Decision Trees
Overview of Hypothesis Space and its Search Process
Hypothesis Space Search Overview

• When training a machine learning model, the goal is to find a function, called a hypothesis, that best maps inputs to outputs.
• The hypothesis space is the set of all possible hypotheses that the model can consider while learning from data.
Example
• Imagine you are building a spam email classifier that labels each email as spam or not spam.
• Possible hypotheses for classification could be:
• H1: If the email contains "free money," classify it as spam.
• H2: If the email is longer than 100 words, classify it as spam.
• H3: If the sender is unknown, classify it as spam.
• H4: If the email contains "click here," classify it as spam.
• The collection of all such candidate rules (hypotheses) forms the hypothesis space (H).
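The four rules above can be written as small functions, which makes the idea of searching a hypothesis space concrete. This is only an illustrative sketch: the field names `text` and `sender_known`, the sample emails, and the use of training accuracy as the selection score are assumptions made for the example, not part of the original slides.

```python
# Each hypothesis is a function from an email (input) to a label (output).
# The dictionary keys "text" and "sender_known" are hypothetical field names.

def h1(email):
    return "spam" if "free money" in email["text"].lower() else "not spam"

def h2(email):
    return "spam" if len(email["text"].split()) > 100 else "not spam"

def h3(email):
    return "spam" if not email["sender_known"] else "not spam"

def h4(email):
    return "spam" if "click here" in email["text"].lower() else "not spam"

# A tiny hypothesis space containing only these four rules.
HYPOTHESIS_SPACE = {"H1": h1, "H2": h2, "H3": h3, "H4": h4}

def accuracy(h, labeled_emails):
    """Fraction of (email, label) pairs that hypothesis h classifies correctly."""
    correct = sum(1 for email, label in labeled_emails if h(email) == label)
    return correct / len(labeled_emails)

def best_hypothesis(labeled_emails):
    """Search the space: return the name of the rule with the highest accuracy."""
    return max(
        HYPOTHESIS_SPACE,
        key=lambda name: accuracy(HYPOTHESIS_SPACE[name], labeled_emails),
    )

# Made-up training data, for illustration only.
SAMPLE = [
    ({"text": "Claim your FREE MONEY now", "sender_known": True}, "spam"),
    ({"text": "Lunch at noon?", "sender_known": True}, "not spam"),
    ({"text": "Free money inside, click here", "sender_known": False}, "spam"),
    ({"text": "Meeting notes attached", "sender_known": False}, "not spam"),
]

print(best_hypothesis(SAMPLE))
```

On this sample, H1 classifies all four emails correctly, so the search selects it. A real learner searches a far larger space than four hand-written rules.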
Mathematical Equation
• It is defined as:
H = { h : X -> Y }
• Where X is the input space and Y is the output space.
• This means that H contains the functions h, mapping inputs from X to outputs in Y, that the model is able to represent.
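To get a feel for how large such a space can be, consider the special case where X consists of n binary features and Y = {0, 1}. If no restriction is placed on h, there are 2^(2^n) distinct functions. The sketch below enumerates them as lookup tables; the function name `all_boolean_hypotheses` is made up for this illustration.

```python
from itertools import product

def all_boolean_hypotheses(n):
    """Enumerate every function from n binary inputs to a binary output."""
    inputs = list(product([0, 1], repeat=n))       # every point of X
    tables = product([0, 1], repeat=len(inputs))   # every way to label those points
    # Each hypothesis is stored as a dict: input tuple -> output.
    return [dict(zip(inputs, outputs)) for outputs in tables]

H = all_boolean_hypotheses(2)
print(len(H))  # 2 ** (2 ** 2) = 16
```

Even at n = 3 the count is already 256, which is why learning algorithms search the space with heuristics rather than enumerating it.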
Hypothesis Space Search in Decision Trees
• In decision tree learning, the hypothesis space is the set of all possible decision trees that could be built from the available features and data.
• Hypothesis space search is the process of searching through this space to find the tree that best fits the data.
Key Points about Hypothesis Space Search
• 1. Hypothesis: A possible solution or model that can predict or classify data.
• 2. Search Process:
  ◦ Splitting Criteria: A criterion such as Gini impurity or information gain.
  ◦ Tree Growth: Recursively partitioning the data at each node.
  ◦ Pruning: Cutting branches to avoid overfitting.
Splitting Criteria
• The decision tree algorithm uses a criterion (e.g., Gini impurity or information gain) to decide which feature to split on and how to partition the data at each node.
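Both criteria can be computed directly from the class labels that reach a node. The following is a minimal sketch using the standard definitions (Gini impurity as 1 minus the sum of squared class proportions, information gain as the drop in entropy after a split); the function names are chosen for this example.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent_labels)
    weighted = sum(len(group) / n * entropy(group) for group in child_label_groups)
    return entropy(parent_labels) - weighted

parent = ["spam", "spam", "not spam", "not spam"]
perfect_split = [["spam", "spam"], ["not spam", "not spam"]]
print(gini(parent), entropy(parent), information_gain(parent, perfect_split))
```

A node with a 50/50 class mix has Gini impurity 0.5 and entropy 1 bit; a split that separates the classes perfectly recovers the full 1 bit as information gain.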
Tree Growth
• The tree is grown by recursively partitioning the data at each node on the best feature until a stopping condition is met (e.g., all data points in a node belong to the same class, or a maximum tree depth is reached).
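The recursive growth step can be sketched as follows. This is a simplified illustration, not the exact procedure of ID3, C4.5, or CART: it assumes categorical features stored in dictionaries, scores splits by weighted Gini impurity, and represents the tree as nested dictionaries. All names and the sample data are hypothetical.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split(rows, labels, feature):
    """Group rows and labels by the value each row has for `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[feature], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups

def weighted_gini(groups, n):
    return sum(len(sub_labels) / n * gini(sub_labels) for _, sub_labels in groups.values())

def grow(rows, labels, features, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping conditions: pure node, no features left, or depth limit reached.
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return majority  # a leaf is just a class label
    # Greedy step: pick the feature whose split has the lowest weighted impurity.
    best = min(features, key=lambda f: weighted_gini(split(rows, labels, f), len(rows)))
    remaining = [f for f in features if f != best]
    children = {
        value: grow(sub_rows, sub_labels, remaining, depth + 1, max_depth)
        for value, (sub_rows, sub_labels) in split(rows, labels, best).items()
    }
    return {"feature": best, "children": children, "default": majority}

def predict(tree, row):
    # Walk down until a leaf; unseen feature values fall back to the node's majority label.
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

# Made-up training data, for illustration only.
ROWS = [
    {"free_money": "yes", "sender": "unknown"},
    {"free_money": "yes", "sender": "known"},
    {"free_money": "no", "sender": "unknown"},
    {"free_money": "no", "sender": "known"},
]
LABELS = ["spam", "spam", "not spam", "not spam"]

TREE = grow(ROWS, LABELS, ["sender", "free_money"])
print(TREE["feature"], predict(TREE, {"free_money": "yes", "sender": "known"}))
```

Here the `free_money` feature separates the classes completely, so the tree stops after one split because both child nodes are pure.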
Pruning
• After the tree is fully grown, it may be pruned to avoid overfitting.
• Pruning is the process of cutting branches that contribute little to the predictive power of the model.
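One common way to decide which branches to cut is reduced-error pruning: working bottom-up, a subtree is replaced by a single leaf whenever doing so does not increase the error on held-out validation data. The slides do not say which pruning method is intended, so the sketch below is just one possible choice. It assumes a tree stored as nested dictionaries with hypothetical keys `feature`, `children`, and `default` (the majority label at that node).

```python
def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

def errors(tree, rows, labels):
    """Number of validation rows the (sub)tree misclassifies."""
    return sum(1 for row, label in zip(rows, labels) if predict(tree, row) != label)

def prune(tree, rows, labels):
    """Reduced-error pruning using the validation rows that reach this node."""
    if not isinstance(tree, dict):
        return tree  # already a leaf
    pruned_children = {}
    for value, child in tree["children"].items():
        reach = [(r, l) for r, l in zip(rows, labels) if r[tree["feature"]] == value]
        sub_rows = [r for r, _ in reach]
        sub_labels = [l for _, l in reach]
        pruned_children[value] = prune(child, sub_rows, sub_labels)
    candidate = {**tree, "children": pruned_children}
    leaf = tree["default"]
    # Collapse to a leaf if that is at least as accurate on the validation rows.
    # Note: a node reached by no validation rows is also collapsed (0 <= 0).
    if errors(leaf, rows, labels) <= errors(candidate, rows, labels):
        return leaf
    return candidate

# Hand-built tree whose "sender" branch fits noise (illustrative data only).
FULL_TREE = {
    "feature": "free_money",
    "default": "spam",
    "children": {
        "yes": "spam",
        "no": {
            "feature": "sender",
            "default": "not spam",
            "children": {"known": "not spam", "unknown": "spam"},
        },
    },
}
VAL_ROWS = [
    {"free_money": "yes", "sender": "known"},
    {"free_money": "no", "sender": "known"},
    {"free_money": "no", "sender": "unknown"},
]
VAL_LABELS = ["spam", "not spam", "not spam"]

PRUNED = prune(FULL_TREE, VAL_ROWS, VAL_LABELS)
print(PRUNED)
```

In this example the `sender` subtree misclassifies one validation email while a plain "not spam" leaf misclassifies none, so the subtree is replaced by the leaf. The root split is kept because removing it would increase the validation error.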
Inductive Bias
• Inductive bias refers to the set of assumptions that a learning algorithm uses to make predictions on unseen data.
• In decision tree learning, the algorithm has inherent preferences for certain types of hypotheses (tree structures) over others, even when multiple hypotheses fit the training data equally well.
Inductive Bias
• Decision trees typically have an inductive bias towards simpler trees.
• Decision tree algorithms (like ID3, C4.5, CART) prefer smaller trees over larger ones to avoid overfitting.
• This preference is enforced through early stopping (for example, limiting tree depth) or pruning.
Why is Inductive Bias Important?
• Without bias, the algorithm could generate overly complex trees that memorize noise in the training data.
• The bias promotes generalization by favoring simpler models.
Optimality
• The hypothesis space search aims to find the most accurate and general decision tree.
• Because the search is greedy, it is not guaranteed to find the globally optimal tree; it returns the best tree it can build under a heuristic criterion (such as Gini impurity or information gain).
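A classic illustration of why a one-step heuristic can miss the best tree is the XOR function. In the sketch below (data and names chosen for this example), the label is `a XOR b`. Splitting on either feature alone yields zero information gain, so the heuristic sees no useful first split, even though a two-level tree that tests `a` and then `b` classifies every row correctly.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - remainder

# Label is a XOR b.
XOR_ROWS = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 0},
    {"a": 1, "b": 1},
]
XOR_LABELS = [0, 1, 1, 0]

GAIN_A = information_gain(XOR_ROWS, XOR_LABELS, "a")
GAIN_B = information_gain(XOR_ROWS, XOR_LABELS, "b")
print(GAIN_A, GAIN_B)
```

Both gains come out as zero because each single-feature split leaves a 50/50 class mix in both children. The heuristic evaluates one split at a time, so it cannot see that the two features are informative only in combination.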
Summary