Unit 1 ML
Syllabus
• Introduction - Well-posed learning problems, designing a learning system,
Perspectives and issues in machine learning
• Concept learning and the general to specific ordering – introduction, a
concept learning task, concept learning as search, find-S: finding a
maximally specific hypothesis, version spaces and the candidate
elimination algorithm, remarks on version spaces and candidate
elimination, inductive bias.
• Decision Tree Learning – Introduction, decision tree representation,
appropriate problems for decision tree learning, the basic decision tree
learning algorithm, hypothesis space search in decision tree learning,
inductive bias in decision tree learning, issues in decision tree learning
Well-Posed Learning Problems
• Definition: A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
• For any problem we must identify three features: the class of
tasks, the measure of performance to be improved, and the
source of experience
• A Checkers learning problem:
- Task T: playing checkers
- Performance measure P: percent of games won against
opponents
- Training experience E: playing practice games against itself
• A handwriting recognition learning problem:
- Task T: recognizing and classifying handwritten words
within images
- Performance measure P: percent of words correctly
classified
- Training experience E: a database of handwritten
words with given classifications
• A robot driving learning problem:
- Task T: driving on public four-lane highways using
vision sensors
- Performance measure P: average distance traveled
before an error
- Training experience E: a sequence of images and steering
commands recorded while observing a human driver
Designing a Learning System
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target
Function
4. Choosing a Function Approximation
Algorithm
5. The Final Design
1.Choosing the Training Experience
• The first design choice is the type of training experience
from which our system will learn.
- This choice has a significant impact on the success or failure of the learner.
• One key attribute is whether the training experience
provides direct or indirect feedback regarding the
choices made by the performance system.
• Direct feedback - the correct move for each individual board state
• Indirect feedback - move sequences and the final outcomes of games played
- Each individual move must then be assigned credit or blame for the final outcome (the credit assignment problem).
- Credit assignment is difficult, because a game can be lost even when early moves are optimal.
• Learning is easier from direct feedback.
1.Choosing the Training Experience
• Second attribute of training experience is the degree to
which the learner controls the sequence of training
examples
• The learner may rely on a teacher to select informative board
states and provide the correct move for each.
• Alternatively, the learner may itself propose board states that it
finds particularly confusing and pose them as queries to the teacher.
• Or the learner may have complete control over both the
board states and (indirect) training classifications, as
it does when it learns by playing against itself with
no teacher present.
1.Choosing the Training Experience
• A third important attribute of the training
experience is how well it represents the
distribution of examples over which the final
system performance P must be measured
• Learning from self-play
– requires no teacher,
– but may not be sufficient: the board states met in practice games may not represent those met in the world tournament.
• Assumption: the distribution of training examples is identical
to the distribution of future test examples.
1.Choosing the Training Experience
• A checkers learning problem:
– Task T: playing checkers
– Performance measure P: percent of games won in the
world tournament
– Training experience E: games played against itself
• In order to complete the design of the learning
system, we must now choose
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanism
2.Choosing the Target Function
• The next design choice is to determine exactly
what type of knowledge will be learned and how
this will be used by the performance program
– how to choose the best move from among the legal moves
– a program, or function ChooseMove, that chooses the best
move for any given board state
– ChooseMove : B → M
– B: the set of legal board states
– M: some move from the set of legal moves
• The choice of the target function is a key design decision
– ChooseMove is difficult to learn, given only indirect training experience
2.Choosing the Target Function
• An alternative target function, one that turns out to be
easier to learn, is
– an evaluation function V that assigns a numerical score to any
given board state
– V : B → R
– B: the set of board states, R: the set of real numbers
– the function V assigns higher scores to better board states
2.Choosing the Target Function
• Let us therefore define the target value V(b) for an
arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the
best final board state that can be achieved starting from b and
playing optimally until the end of the game (assuming the opponent
plays optimally as well).
• Learning the ideal target function V exactly is difficult, so we
settle for an approximation to it.
• The process of learning an approximation to the target function is
often called function approximation.
– V : the ideal target function
– V̂ : the learned approximation to V
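The case analysis above is directly computable for final board states; the non-final case is exactly what makes V non-operational. Below is a minimal Python sketch of that point, assuming a hypothetical Board interface with is_final(), is_won(), and is_lost() methods (not part of any real library):

# Ideal target value V(b), following the case analysis above.
# `Board` with is_final/is_won/is_lost is a hypothetical interface.
def ideal_target_value(board):
    if board.is_final():
        if board.is_won():
            return 100
        if board.is_lost():
            return -100
        return 0  # drawn final state
    # A non-final state requires optimal play to the end of the game,
    # which is why V is non-operational and must be approximated by V̂.
    raise NotImplementedError("V(b) is non-operational for non-final states")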
3.Choosing a Representation for the
Target Function
• We must choose a representation that the
learning program will use to describe the
function V̂ that it will learn
– A large table with a distinct entry for every state
– Collection of rules that match against features of state
– Quadratic polynomial function of predefined board features
– An artificial neural network
• The more expressive the representation, the more closely it can
approximate the ideal target function V, but the more training data
it requires to choose reliably among the alternative hypotheses.
• Let us therefore choose a simple representation for any given board state.
3.Choosing a Representation for the
Target Function
• The function V̂ will be calculated as a linear
combination of the following board features:
– x1: the number of black pieces on the board
– x2: the number of red pieces on the board
– x3: the number of black kings on the board
– x4: the number of red kings on the board
– x5: the number of black pieces threatened by red (i.e.,
which can be captured on red's next turn)
– x6: the number of red pieces threatened by black
3.Choosing a Representation for the
Target Function
• Our learning program will represent V̂(b) as a
linear function of the form
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
where w0 through w6 are numerical coefficients, or weights, to be learned.
4.Choosing a Function Approximation Algorithm
• Training values for intermediate board states are estimated by the rule
Vtrain(b) ← V̂(Successor(b)), where Successor(b) is the next board state
at which it is again the program's turn to move.
• The weights are refined with the LMS (least mean squares) weight update
rule. For each training example <b, Vtrain(b)>:
wi ← wi + η (Vtrain(b) − V̂(b)) xi
• Here η is a small constant (e.g., 0.1) that moderates the size of the
weight update.
• For each observed training example, the rule adjusts the weights a small
amount in the direction that reduces the error on that training example.
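As an illustration, here is a minimal Python sketch of the linear evaluation function and one LMS weight update, assuming a board is represented simply by its feature vector (x1, ..., x6); the feature values and training target below are made up for the example:

# Linear evaluation V̂(b) = w0 + w1*x1 + ... + w6*x6 and one LMS step.
ETA = 0.1  # learning rate; moderates the size of each weight update

def v_hat(weights, features):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train):
    # w_i <- w_i + eta * (V_train(b) - V̂(b)) * x_i
    error = v_train - v_hat(weights, features)
    weights[0] += ETA * error        # the bias weight w0 has x0 = 1
    for i, x in enumerate(features):
        weights[i + 1] += ETA * error * x
    return weights

# Hypothetical board: 12 black pieces, 11 red, no kings, one piece
# threatened on each side; training value 0 (an even position).
weights = lms_update([0.0] * 7, [12, 11, 0, 0, 1, 1], v_train=0.0)

Each update moves every weight in the direction that reduces the error on the current example, so repeated passes over training data gradually improve V̂.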
5.The Final Design
• The final design of our checkers learning system can
be described by four distinct program modules that
represent the central components of many learning
systems
– Performance System
• Solves the task (plays checkers) and outputs the game history
– Critic
• Outputs a set of training examples of the target function
– Generalizer
• Outputs the target function hypothesis V̂
– Experiment Generator
• Outputs a new problem (an initial board state) for the Performance System to explore
Issues in Machine Learning
• What algorithms exist for learning general target functions from
specific training examples? In what settings will particular
algorithms converge to the desired function, given sufficient
training data? Which algorithms perform best for which types of
problems and representations?
• How much training data is sufficient? What general bounds can
be found to relate the confidence in learned hypotheses to the
amount of training experience and the character of the learner's
hypothesis space?
• When and how can prior knowledge held by the learner guide
the process of generalizing from examples? Can prior knowledge
be helpful even when it is only approximately correct?
Issues in Machine Learning
• What is the best strategy for choosing a useful
next training experience, and how does the
choice of this strategy alter the complexity of the
learning problem?
• What is the best way to reduce the learning task
to one or more function approximation
problems? Put another way, what specific
functions should the system attempt to learn?
Can this process itself be automated?
• How can the learner automatically alter its
representation to improve its ability to represent
and learn the target function?
Concept Learning Task
• Concept: Good days for WaterSports
• Task: to learn to predict the value of EnjoySport for an
arbitrary day, based on the values of its other attributes Sky,
AirTemp, Humidity, Wind, Water, and Forecast
• Let each hypothesis be a vector of six constraints, specifying the values of the attributes
Sky, AirTemp, Humidity, Wind, Water, and Forecast
• For each attribute, the hypothesis will either
– indicate by "?" that any value is acceptable for this
attribute,
– specify a single required value (e.g., Warm) for the attribute, or
– indicate by "Φ" that no value is acceptable.
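The three kinds of attribute constraints are easy to operationalize. A minimal Python sketch, assuming hypotheses and instances are tuples of strings ("?" matches anything, and "Φ" matches nothing because it never equals an actual attribute value):

# Does an instance satisfy every attribute constraint of a hypothesis?
def matches(hypothesis, instance):
    return all(h == '?' or h == x for h, x in zip(hypothesis, instance))

h = ('Sunny', '?', '?', 'Strong', '?', '?')
day = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(matches(h, day))  # True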
Example of a Concept Learning Task
• Consider two hypotheses, h1 = <Sunny, ?, ?, Strong, ?, ?> and h2 = <Sunny, ?, ?, ?, ?, ?>, and the sets of instances that are classified positive by h1 and by h2.
– Because h2 imposes fewer constraints on the instance, it classifies more
instances as positive.
– In fact, any instance classified positive by h1 will also be classified positive by h2.
– Therefore, we say that h2 is more general than h1.
General-to-Specific Ordering of Hypotheses
• Definition: Let hj and hk be boolean-valued functions defined
over X. Then hj is more-general-than-or-equal-to hk (written hj ≥g
hk) if and only if
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
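For the attribute-vector representation this definition reduces to a simple per-attribute test, sketched below in Python (ignoring the all-Φ hypothesis, which is more specific than everything):

# hj ≥g hk iff every instance satisfying hk also satisfies hj; for
# conjunctive hypotheses: each constraint of hj is '?' or equals hk's.
def more_general_or_equal(hj, hk):
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1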
FIND-S: Finding a Maximally Specific Hypothesis
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
– For each attribute constraint ai in h:
• If the constraint ai is satisfied by x, do nothing
• Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
The search moves from hypothesis to hypothesis, searching from the most specific to
progressively more general hypotheses.
At each stage the hypothesis is the most specific hypothesis consistent with the training
examples observed up to this point.
Example: applied to the EnjoySport training data, FIND-S outputs h = <Sunny, Warm, ?, Strong, ?, ?> (see the sketch below).
Problems with FIND-S Algorithm:
• It cannot tell whether it has converged to the only hypothesis consistent with the data, or whether other consistent hypotheses exist.
• It cannot detect when the training data is inconsistent, because it ignores negative examples.
• Why should we prefer the most specific consistent hypothesis?
• There may be several maximally specific consistent hypotheses, and FIND-S finds only one of them.
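A minimal Python sketch of FIND-S on Mitchell's EnjoySport training data, assuming examples are (instance, label) pairs:

def find_s(examples, n_attributes):
    h = ['Φ'] * n_attributes           # most specific hypothesis in H
    for x, positive in examples:
        if not positive:
            continue                   # FIND-S ignores negative examples
        for i in range(n_attributes):
            if h[i] == 'Φ':
                h[i] = x[i]            # adopt the first positive's value
            elif h[i] != x[i]:
                h[i] = '?'             # minimal generalization
    return tuple(h)

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')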
Version Space
• Definition: the version space VS(H,D), with respect to hypothesis space H
and training examples D, is the subset of hypotheses from H consistent
with the training examples in D.
• The List-Then-Eliminate Algorithm:
This algorithm first initializes the version space to contain all hypotheses
in H, then eliminates any hypothesis found inconsistent with any training
example.
But this is inefficient! Even in this simple EnjoySport example, there are
1 + 4·3·3·3·3·3 = 973 semantically distinct hypotheses to enumerate.
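A minimal Python sketch of LIST-THEN-ELIMINATE for the conjunctive representation; the attribute domains below are the EnjoySport ones, and enumerating them reproduces the 973 count (4 · 3^5 = 972 non-empty hypotheses plus the single all-Φ hypothesis):

from itertools import product

def matches(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def list_then_eliminate(examples, domains):
    # every semantically distinct hypothesis: '?' or one value per slot,
    # plus the single hypothesis that classifies everything negative
    space = [h for h in product(*[list(d) + ['?'] for d in domains])]
    space.append(tuple('Φ' for _ in domains))
    for x, label in examples:
        space = [h for h in space if matches(h, x) == label]
    return space  # the version space

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'),
           ('Normal', 'High'), ('Strong', 'Weak'),
           ('Warm', 'Cool'), ('Same', 'Change')]
print(len(list_then_eliminate([], domains)))  # 973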
Candidate-Elimination (idea):
G ← set of maximally general hypotheses in H
S ← set of maximally specific hypotheses in H
For each training example d, modify G and S so that they remain consistent with d
Candidate-Elimination Algorithm
For each training example d, do:
• If d is a positive example:
– Remove from G any hypothesis inconsistent with d
– For each hypothesis s in S that is not consistent with d: remove s from S, and add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h
– Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example:
– Remove from S any hypothesis inconsistent with d
– For each hypothesis g in G that is not consistent with d: remove g from G, and add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h
– Remove from G any hypothesis that is less general than another hypothesis in G
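Below is a compact Python sketch of Candidate-Elimination for the conjunctive representation, assuming (as in the earlier sketches) tuples of attribute constraints and boolean labels; on the four EnjoySport examples it reproduces the boundaries S = {<Sunny, Warm, ?, Strong, ?, ?>} and G = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>} from the worked example:

def matches(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def min_generalize(h, x):
    # minimal generalization of h that covers instance x
    return tuple(xv if hv == 'Φ' else (hv if hv == xv else '?')
                 for hv, xv in zip(h, x))

def min_specializations(h, x, domains):
    # minimal specializations of h that exclude instance x
    out = []
    for i, hv in enumerate(h):
        if hv == '?':
            out += [h[:i] + (v,) + h[i+1:] for v in domains[i] if v != x[i]]
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple('Φ' for _ in range(n))]   # maximally specific boundary
    G = [tuple('?' for _ in range(n))]   # maximally general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [s if matches(s, x) else min_generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            S = [s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                else:
                    new_G += [h for h in min_specializations(g, x, domains)
                              if any(more_general_or_equal(h, s) for s in S)]
            G = [g for g in new_G
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)]
    return S, G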
REMARKS ON VERSION SPACES AND CANDIDATE
ELIMINATION
• Will the algorithm converge to the correct hypothesis? Yes, provided there are no errors in the training examples and the target concept is contained in H.
• What training example should the learner request next? A good query is one classified positive by half of the version-space hypotheses and negative by the other half, so each query can halve the version space.
• How can partially learned concepts (a version space containing more than one hypothesis) be used?
Using Partially Learned Concepts
Version spaces can be used to assign certainty scores to the classification of new
examples: every hypothesis in the version space votes on the new instance, and the
proportion of positive votes gives the most probable classification.
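A small Python sketch of that voting scheme, assuming the version space is a list of conjunctive hypotheses as in the earlier sketches:

def matches(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def vote(version_space, x):
    # fraction of version-space hypotheses that classify x positive;
    # 1.0 or 0.0 means the classification is unambiguous
    positive = sum(matches(h, x) for h in version_space)
    return positive / len(version_space)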
INDUCTIVE BIAS - Fundamental
Questions for Inductive Inference
The Candidate-Elimination Algorithm will converge toward the true
target concept provided it is given accurate training examples and
provided its initial hypothesis space contains the target concept.
Inductive Bias - A Biased Hypothesis Space
• By restricting the hypothesis space to conjunctions of attribute constraints, we have biased the learner: H cannot represent even a simple disjunctive target concept such as "Sky = Sunny or Sky = Cloudy", so such concepts can never be learned.
An Unbiased Learner
• NEW PROBLEM: with an unbiased hypothesis space (one able
to represent every subset of X), our concept learning algorithm
is completely unable to generalize beyond the observed examples.
– Suppose we present three positive examples (x1, x2, x3) and two
negative examples (x4, x5) to the learner.
– Then S: { (x1 ∨ x2 ∨ x3) } and G: { ¬(x4 ∨ x5) }
→ NO GENERALIZATION
– Therefore, the only examples that will be
unambiguously classified by S and G are the
observed training examples themselves.
Inductive Bias – Fundamental
Property of Inductive Inference
A learner that makes no a priori assumptions regarding the identity
of the target concept has no rational basis for classifying any unseen
instances.
• Inductive leap: a learner should be able to generalize beyond the
training data, using prior assumptions, in order to classify unseen
instances.
• This generalization is known as the inductive leap, and our prior
assumptions are the inductive bias of the learner.
• The inductive bias (prior assumptions) of the Candidate-Elimination
Algorithm is that the target concept can be represented by a
conjunction of attribute values, that the target concept is contained in
the hypothesis space, and that the training examples are correct.
Inductive Bias:
Consider a concept learning algorithm L for the set of instances X.
Let c be an arbitrary concept defined over X, and
let Dc = {<x , c(x)>} be an arbitrary set of training examples of c.
Let L(xi, Dc) denote the classification assigned to the instance xi by L
after training on the data Dc.
The inductive bias of L is any minimal set of assertions B such that for
any target concept c and corresponding training examples Dc the
following formula holds:
(∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]