Lec01 Concept Learning
Inducing general functions from specific training examples is a central problem in machine learning. Concept Learning: acquiring the definition of a general category from given positive and negative training examples of the category. Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples. The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be organized efficiently by taking advantage of this naturally occurring structure over the hypothesis space.
Concept Learning
A Formal Definition for Concept Learning:
Inferring a boolean-valued function from training examples of its input and output. An example of concept learning is learning the concept "bird" from given examples of birds (positive examples) and non-birds (negative examples). We are trying to learn the definition of a concept from given examples.
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Warm   Change    Yes
(Sky, AirTemp, Humidity, Wind, Water, and Forecast are the ATTRIBUTES; EnjoySport is the CONCEPT to be learned.)
A set of example days, each described by six attributes. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.
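As a concrete (hypothetical) way to hold these data in code, the examples in the table can be stored as attribute tuples paired with boolean labels; the variable name is illustrative only and the values follow the rows shown above.

```python
# Training examples from the table above: six attribute values
# (Sky, AirTemp, Humidity, Wind, Water, Forecast) plus the EnjoySport label.
training_examples = [
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"),   True),   # example 2: YES
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),  # example 3: NO
    (("Sunny", "Warm", "High", "Strong", "Warm", "Change"), True),   # example 4: YES
]
```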
Hypothesis Representation
A hypothesis is a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast, e.g. <Sunny, ?, ?, Strong, ?, Same>.
The most general hypothesis, that every day is a positive example: <?, ?, ?, ?, ?, ?>
The most specific hypothesis, that no day is a positive example: <0, 0, 0, 0, 0, 0>
The EnjoySport concept learning task requires learning the set of days for which EnjoySport = yes, describing this set by a conjunction of constraints over the instance attributes.
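A minimal sketch of this representation in Python (names such as `satisfies` are illustrative, not from the slides): a hypothesis is a tuple of six constraints, where "?" accepts any value and "0" accepts none.

```python
# Example hypothesis: <Sunny, ?, ?, Strong, ?, Same>
h = ("Sunny", "?", "?", "Strong", "?", "Same")

def satisfies(x, h):
    """Return True if instance x satisfies hypothesis h, i.e. h(x) = 1."""
    # Each constraint must be "?" or equal to the instance's value;
    # a "0" constraint can never be satisfied.
    return all(c == "?" or c == v for c, v in zip(h, x))
```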
The Inductive Learning Hypothesis - Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Every hypothesis containing one or more 0 symbols represents the empty set of instances; that is, it classifies every instance as negative. Since the only value added to each attribute's possibilities is "?", and all hypotheses containing a 0 collapse to the single empty-set hypothesis, there are 973 (= 1 + 4·3·3·3·3·3) semantically distinct hypotheses in H.
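Writing the count out, assuming one attribute with 3 possible values and five attributes with 2 values each, "?" as the extra option per attribute, and a single hypothesis for the empty set:

```latex
|H|_{\text{semantic}} \;=\; 1 + (3+1)\cdot(2+1)^5 \;=\; 1 + 4 \cdot 243 \;=\; 973
```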
Although EnjoySport has a small, finite hypothesis space, most learning tasks have much larger (even infinite) hypothesis spaces.
We need efficient search algorithms on the hypothesis spaces.
Now consider the sets of instances that are classified positive by h1 and by h2.
Because h2 imposes fewer constraints on the instance, it classifies more instances as positive. In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
More-General-Than Relation
For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1. More-General-Than-Or-Equal Relation: Let h1 and h2 be two boolean-valued functions defined over X. Then h1 is more-general-than-or-equal-to h2 (written h1 ≥g h2) if and only if any instance that satisfies h2 also satisfies h1. h1 is more-general-than h2 (h1 >g h2) if and only if (h1 ≥g h2) is true and (h2 ≥g h1) is false. We also say h2 is more-specific-than h1.
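A sketch of this test for the conjunctive representation used here, reusing the tuple encoding and the "?"/"0" conventions from the earlier sketch (the function name is illustrative):

```python
def more_general_or_equal(h1, h2):
    """True if h1 >=g h2: every instance that satisfies h2 also satisfies h1."""
    # A hypothesis containing "0" matches no instance at all,
    # so any hypothesis is >=g it.
    if "0" in h2:
        return True
    # Otherwise compare attribute by attribute: h1's constraint must be "?"
    # or identical to h2's constraint.
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))

# e.g. more_general_or_equal(("Sunny", "?", "?", "?", "?", "?"),
#                            ("Sunny", "?", "?", "Strong", "?", "Same"))  -> True
```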
FIND-S Algorithm
The FIND-S algorithm starts from the most specific hypothesis and generalizes it by considering only positive examples. The FIND-S algorithm ignores negative examples.
As long as the hypothesis space contains a hypothesis that describes the true target concept, and the training data contains no errors, ignoring negative examples does not cause any problems.
FIND-S algorithm finds the most specific hypothesis within H that is consistent with the positive training examples.
The final hypothesis will also be consistent with negative examples if the correct target concept is in H, and the training examples are correct.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
       For each attribute constraint ai in h
           If the constraint ai is satisfied by x
           Then do nothing
           Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
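A minimal Python sketch of FIND-S for the tuple encoding above, assuming the hypothetical `training_examples` list shown earlier; names are illustrative.

```python
def find_s(examples, n_attributes=6):
    """Most specific hypothesis in H consistent with the positive examples."""
    h = ["0"] * n_attributes                  # step 1: most specific hypothesis
    for x, label in examples:
        if not label:                         # step 2 uses positive examples only
            continue
        for i, v in enumerate(x):
            if h[i] == "0":                   # first positive example: adopt its value
                h[i] = v
            elif h[i] != v and h[i] != "?":   # conflicting value: generalize to "?"
                h[i] = "?"
    return tuple(h)                           # step 3: output hypothesis

# e.g. find_s(training_examples)
```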
Candidate-Elimination Algorithm
FIND-S outputs a single hypothesis from H that is consistent with the training examples, but this is just one of many hypotheses from H that might fit the training data equally well. The key idea in the Candidate-Elimination algorithm is to output a description of the set of all hypotheses consistent with the training examples.
Candidate-Elimination algorithm computes the description of this set without explicitly enumerating all of its members. This is accomplished by using the more-general-than partial ordering and maintaining a compact representation of the set of consistent hypotheses.
Consistent Hypothesis
Note the key difference between this definition of consistent and the earlier definition of satisfies. An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept. However, whether such an example is consistent with h depends on the target concept, and in particular on whether h(x) = c(x).
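In symbols, following the standard definition this slide refers to:

```latex
\mathrm{Consistent}(h, D) \;\equiv\; \big(\forall \langle x, c(x)\rangle \in D\big)\; h(x) = c(x)
```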
Version Spaces
The Candidate-Elimination algorithm represents the set of all hypotheses consistent with the observed training examples. This subset of all hypotheses is called the version space with respect to the hypothesis space H and the training examples D, because it contains all plausible versions of the target concept.
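Formally (standard notation, not shown on the slide), the version space is the subset of H consistent with the training data D:

```latex
VS_{H,D} \;\equiv\; \{\, h \in H \mid \mathrm{Consistent}(h, D) \,\}
```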
List-Then-Eliminate Algorithm
List-Then-Eliminate algorithm initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example. The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples.
Presumably, this is the desired target concept. If insufficient data is available to narrow the version space to a single hypothesis, then the algorithm can output the entire set of hypotheses consistent with the observed data.
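A brief Python sketch of List-Then-Eliminate, assuming some enumeration `all_hypotheses` of H and the `satisfies` helper sketched earlier; this is only practical when H is small enough to enumerate.

```python
def list_then_eliminate(all_hypotheses, examples):
    """Keep every hypothesis that agrees with the label of every example."""
    version_space = list(all_hypotheses)               # 1. start with all of H
    for x, label in examples:
        version_space = [h for h in version_space      # 2. drop inconsistent ones
                         if satisfies(x, h) == label]
    return version_space                               # 3. what remains is the version space
```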
A version space with its general and specific boundary sets. The version space includes all six hypotheses shown here, but can be represented more simply by S and G.
Candidate-Elimination Algorithm
The Candidate-Elimination algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples. It begins by initializing the version space to the set of all hypotheses in H; that is, by initializing the G boundary set to contain the most general hypothesis in H,
G0 ← { <?, ?, ?, ?, ?, ?> }
and initializing the S boundary set to contain the most specific hypothesis,
S0 ← { <0, 0, 0, 0, 0, 0> }
These two boundary sets delimit the entire hypothesis space, because every other hypothesis in H is both more general than S0 and more specific than G0. As each training example is considered, the S and G boundary sets are generalized and specialized, respectively, to eliminate from the version space any hypotheses found inconsistent with the new training example. After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples and only these hypotheses.
Candidate-Elimination Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
    If d is a positive example
        Remove from G any hypothesis inconsistent with d
        For each hypothesis s in S that is not consistent with d
            Remove s from S
            Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
        Remove from S any hypothesis that is more general than another hypothesis in S
    If d is a negative example
        Remove from S any hypothesis inconsistent with d
        For each hypothesis g in G that is not consistent with d
            Remove g from G
            Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
        Remove from G any hypothesis that is less general than another hypothesis in G
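A compact Python sketch of these updates for the conjunctive representation, reusing `satisfies` and `more_general_or_equal` from the earlier sketches; `domains` is an assumed list of the possible values of each attribute, and all names are illustrative.

```python
def min_generalization(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c == "0" else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"        # only "?" can be tightened
            for value in domains[i] if value != x[i]]   # any value that rules x out

def strictly_more_general(h1, h2):
    return more_general_or_equal(h1, h2) and not more_general_or_equal(h2, h1)

def candidate_elimination(examples, domains):
    n = len(domains)
    G = {("?",) * n}                                    # maximally general boundary
    S = {("0",) * n}                                    # maximally specific boundary
    for x, label in examples:
        if label:                                       # positive example
            G = {g for g in G if satisfies(x, g)}       # drop inconsistent G members
            S = {min_generalization(s, x) for s in S}   # generalize S just enough
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
            S = {s for s in S if not any(strictly_more_general(s, t) for t in S)}
        else:                                           # negative example
            S = {s for s in S if not satisfies(x, s)}   # drop inconsistent S members
            new_G = set()
            for g in G:
                if not satisfies(x, g):                 # already consistent: keep g
                    new_G.add(g)
                    continue
                for h in min_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.add(h)
            G = {g for g in new_G
                 if not any(strictly_more_general(g2, g) for g2 in new_G)}
    return S, G
```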
In fact, the S boundary of the version space forms a summary of the previously encountered positive examples that can be used to determine whether any given hypothesis is consistent with these examples. The G boundary summarizes the information from previously encountered negative examples. Any hypothesis more specific than G is assured to be consistent with past negative examples.
A similar symptom will appear when the training examples are correct, but the target concept cannot be described in the hypothesis representation.
e.g., if the target concept is a disjunction of attribute values and the hypothesis space supports only conjunctive descriptions
Consider the version space learned from the four training examples of the EnjoySport concept.
What would be a good query for the learner to pose at this point? What is a good query strategy in general?
In general, the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space. When this is possible, the size of the version space is reduced by half with each new example, and the correct target concept can therefore be found with only log2 |VS| experiments.
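For instance, for the six-hypothesis version space shown earlier, perfect halving would require:

```latex
\lceil \log_2 |VS| \rceil \;=\; \lceil \log_2 6 \rceil \;=\; 3 \ \text{queries}
```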
Half of the version space hypotheses classify instance C as positive and half classify it as negative.
Thus, the learner cannot classify this example with confidence until further training examples are available.
Instance D is classified as positive by two of the version space hypotheses and negative by the other four hypotheses.
In this case we have less confidence in the classification than in the unambiguous cases of instances A and B. Still, the vote is in favor of a negative classification, and one approach we could take would be to output the majority vote, perhaps with a confidence rating indicating how close the vote was.
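A small sketch of this majority-vote classification, reusing the `satisfies` helper from the earlier sketches (names illustrative):

```python
def classify_by_vote(version_space, x):
    """Majority vote of the version-space hypotheses, with a confidence rating."""
    pos = sum(satisfies(x, h) for h in version_space)
    neg = len(version_space) - pos
    label = pos > neg                          # an exact tie (as for instance C) gives no preference
    confidence = max(pos, neg) / len(version_space)
    return label, confidence
```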
From the first two examples, S2 : <?, Warm, Normal, Strong, Cool, Change>. This hypothesis is inconsistent with the third example, and there are no hypotheses consistent with all three examples. PROBLEM: We have biased the learner to consider only conjunctive hypotheses. We require a more expressive hypothesis space.
NEW PROBLEM: our concept learning algorithm is now completely unable to generalize beyond the observed examples.
Suppose we present three positive examples (x1, x2, x3) and two negative examples (x4, x5) to the learner. Then S : { (x1 ∨ x2 ∨ x3) } and G : { ¬(x4 ∨ x5) }, so there is NO GENERALIZATION. Therefore, the only examples that will be unambiguously classified by S and G are the observed training examples themselves.
The CANDIDATE-ELIMINATION algorithm is not robust to noisy data or to situations in which the unknown target concept is not expressible in the provided hypothesis space.