ML - Unit-3 Chapter - 6 (Bayes Theorem) - Notes
INTRODUCTION
Notations:
• P(h) - denotes the initial probability that hypothesis h holds, before we have observed the training data.
• P(h) is often called the prior probability of h and may reflect any background knowledge we have about the chance that h is a correct hypothesis.
• If we have no such prior knowledge, then we might simply assign the same prior probability to each candidate hypothesis.

Topic-2
BAYES THEOREM
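Stated in the notation used above, Bayes theorem and the maximum a posteriori (MAP) hypothesis it leads to are:

P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}

h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)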
BAYES THEOREM AND CONCEPT LEARNING
Assume the sequence of instances <x1 … xm> is fixed, so the training data D can be written as the sequence of target values D = <d1 … dm>.
Directly computing the posterior P(h|D) for every hypothesis in H may prove impractical for large hypothesis spaces.
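Under the standard assumptions for this setting, a uniform prior P(h) = 1/|H| and a noise-free likelihood P(D|h) that is 1 when h is consistent with D and 0 otherwise, the posterior works out to:

P(h \mid D) =
\begin{cases}
  \frac{1}{|VS_{H,D}|} & \text{if } h \text{ is consistent with } D \\
  0 & \text{otherwise}
\end{cases}

where VS_{H,D} denotes the version space of H with respect to D.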
Example:
The concept learning algorithm FIND-S searches the hypothesis space H from specific to general hypotheses and outputs a maximally specific consistent hypothesis (i.e., a maximally specific member of the version space). Under the probability distributions P(h) and P(D|h) assumed above, this output is a MAP hypothesis.
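To make the posterior computation concrete, here is a minimal brute-force MAP sketch. The hypothesis space of 1-D threshold classifiers, the thresholds, and the training data are illustrative assumptions, not from the notes.

# Brute-force MAP learning over a small finite hypothesis space (toy sketch).
# Assumptions (illustrative): hypotheses are 1-D threshold classifiers,
# the prior is uniform, and P(D|h) = 1 if h is consistent with D, else 0.

def make_hypothesis(threshold):
    """h(x) = 1 if x >= threshold, else 0."""
    return lambda x: 1 if x >= threshold else 0

# Candidate hypothesis space H (finite, enumerable).
H = {t: make_hypothesis(t) for t in [0.0, 0.5, 1.0, 1.5, 2.0]}

# Training data D: (instance, target value) pairs.
D = [(0.2, 0), (0.8, 1), (1.7, 1)]

def likelihood(h, data):
    """Noise-free likelihood: 1 if h classifies every example correctly, else 0."""
    return 1.0 if all(h(x) == d for x, d in data) else 0.0

prior = 1.0 / len(H)                       # uniform prior P(h) = 1/|H|
unnormalized = {t: likelihood(h, D) * prior for t, h in H.items()}
evidence = sum(unnormalized.values())      # P(D) = sum over h of P(D|h) P(h)
posterior = {t: p / evidence for t, p in unnormalized.items()}

h_map = max(posterior, key=posterior.get)  # MAP hypothesis: argmax over h of P(h|D)
print(posterior)   # consistent hypotheses share posterior 1/|VS|; the rest get 0
print("h_MAP threshold:", h_map)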
So far we have considered the case where P(D|h) takes on values of only 0 and 1, reflecting the assumption of noise-free training data. We now consider the problem of learning a continuous-valued target function, a problem faced by many learning approaches such as neural network learning, linear regression, and polynomial curve fitting.
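Under the usual further assumption that each observed target value d_i equals the true function value f(x_i) plus zero-mean Gaussian noise, maximizing the likelihood is equivalent to minimizing the sum of squared errors; summarizing the standard derivation:

h_{ML} = \arg\max_{h \in H} \prod_{i=1}^{m} p(d_i \mid h)
       = \arg\min_{h \in H} \sum_{i=1}^{m} \bigl(d_i - h(x_i)\bigr)^2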
Topic-6
MINIMUM DESCRIPTION LENGTH PRINCIPLE

Topic-8
GIBBS ALGORITHM

Topic-11
BAYESIAN BELIEF NETWORKS
• The probability distribution over this joint space is called the joint probability distribution.
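In standard form, a Bayesian belief network represents this joint probability distribution over the network variables Y_1, …, Y_n as a product of local conditional probabilities:

P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P\bigl(y_i \mid Parents(Y_i)\bigr)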
THE EM ALGORITHM (Expectation/Estimation-Maximization)
• Basis for the widely used Baum-Welch forward-backward algorithm for learning
Partially Observable Markov Models
• The EM algorithm was described by Dempster et al. (1977).
Example:
The problem involves a mixture of k different Normal distributions, and we cannot observe which instances were generated by which distribution, i.e., the problem involves hidden variables.
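Below is a minimal EM sketch for this example, assuming k = 2 one-dimensional Gaussians with known, equal variance and equal mixing weights. The data, random seed, and starting means are illustrative choices, not from the notes.

# Minimal EM sketch: mixture of k = 2 Normal distributions in one dimension
# with known, equal variance; the generating component is a hidden variable.
import math
import random

random.seed(0)
SIGMA = 1.0
TRUE_MEANS = [0.0, 5.0]   # illustrative "unknown" means used to simulate data

# Generate sample data from the (hidden) mixture.
data = [random.gauss(random.choice(TRUE_MEANS), SIGMA) for _ in range(200)]

def normal_pdf(x, mu, sigma=SIGMA):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mus = [-1.0, 1.0]  # initial guesses for the two unknown means

for step in range(50):
    # E-step: expected value of the hidden indicator z_ij, i.e. the
    # probability that instance x_i was generated by component j.
    resp = []
    for x in data:
        p = [normal_pdf(x, mu) for mu in mus]
        total = sum(p)
        resp.append([pj / total for pj in p])

    # M-step: re-estimate each mean as the responsibility-weighted average.
    new_mus = []
    for j in range(len(mus)):
        weight = sum(r[j] for r in resp)
        new_mus.append(sum(r[j] * x for r, x in zip(resp, data)) / weight)

    converged = max(abs(a - b) for a, b in zip(mus, new_mus)) < 1e-6
    mus = new_mus
    if converged:
        break

print("estimated means:", [round(m, 3) for m in mus])  # converges near 0 and 5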