10 Learning
G. Lakemeyer
[Figure: architecture of a learning agent; a critic compares the agent's behavior against a performance standard.]
AI/WS-2024/25 2 / 32
Kinds of Feedback during Learning

Supervised Learning: Both the input and the correct output are available to the learner. (There is a teacher (supervisor).)
Inductive Learning
[Figure: a set of data points together with several candidate hypotheses h of varying complexity that fit them.]
Note: Since there are many possibilities for h, this works only with additional assumptions which restrict the search space; these assumptions are called the bias.
In the following: Decision Trees (DT’s) as an example of inductive learning.
Decision Trees
Restaurant Example
[Figure: a decision tree for the restaurant domain; the root tests Patrons?, with further tests such as WaitEstimate? along the branches, ending in Yes/No leaves.]
© G. Lakemeyer
Theorem:
Every propositional formula (Boolean function) is representable by a decision tree.
Learning in Decision Trees
Decision trees can trivially represent any Boolean function by having each
path represent one valuation of the attributes (atomic formulas). Often,
however, there are much more compact representations.
Always?

No! For example, the parity function (which answers Yes iff an even number of attributes are true): every path must test all attributes, so the tree has no compact form.
AI/WS-2024/25 8 / 32
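The claim about parity can be checked exhaustively for small n (this code is my own illustration, not from the slides): no partial assignment that leaves an attribute untested determines the answer, so every root-to-leaf path must test all n attributes and the tree needs 2^n leaves.

```python
from itertools import product

def parity(bits):
    """The parity function: Yes (True) iff an even number of attributes are true."""
    return sum(bits) % 2 == 0

n = 4
for partial in product([0, 1, None], repeat=n):  # None = attribute not tested yet
    free = [i for i, b in enumerate(partial) if b is None]
    if not free:
        continue
    outcomes = set()
    for completion in product([0, 1], repeat=len(free)):
        bits = list(partial)
        for i, b in zip(free, completion):
            bits[i] = b
        outcomes.add(parity(bits))
    # Both answers remain possible, so a decision tree cannot stop here.
    assert outcomes == {True, False}
```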
Generating a Decision Tree from Examples
Attributes and goal (WillWait) for each example:

Example  Alt  Bar  Fri  Hun  Pat    Price  Rain  Res  Type     Est     WillWait
X1       Yes  No   No   Yes  Some   $$$    No    Yes  French   0–10    Yes
X2       Yes  No   No   Yes  Full   $      No    No   Thai     30–60   No
X3       No   Yes  No   No   Some   $      No    No   Burger   0–10    Yes
X4       Yes  No   Yes  Yes  Full   $      No    No   Thai     10–30   Yes
X5       Yes  No   Yes  No   Full   $$$    No    Yes  French   >60     No
X6       No   Yes  No   Yes  Some   $$     Yes   Yes  Italian  0–10    Yes
X7       No   Yes  No   No   None   $      Yes   No   Burger   0–10    No
X8       No   No   No   Yes  Some   $$     Yes   Yes  Thai     0–10    Yes
Trivial DT: have one path in the tree per example (memorizing). Instead: find a compact DT which covers all examples.
[Figure: choosing the first attribute for the 12 examples (+: X1,X3,X4,X6,X8,X12; −: X2,X5,X7,X9,X10,X11).
(b) Splitting on Type? leaves positive and negative examples in every branch.
(c) Splitting on Patrons? gives None (−: X7,X11), Some (+: X1,X3,X6,X8), and Full (+: X4,X12; −: X2,X5,X9,X10); the Full branch is split further on Hungry? (Yes: +: X4,X12, −: X2,X10; No: −: X5,X9).]
An Algorithm

function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MAJORITY-VALUE(examples)
  else
    best ← CHOOSE-ATTRIBUTE(attributes, examples)
    tree ← a new decision tree with root test best
    for each value vi of best do
      examplesi ← {elements of examples with best = vi}
      subtree ← DECISION-TREE-LEARNING(examplesi, attributes − best, MAJORITY-VALUE(examples))
      add a branch to tree with label vi and subtree subtree
    end
    return tree
1 If there are only positive or only negative examples, then done. Answer Yes or No, respectively.
2 If there are both positive and negative examples, then choose the best attribute to distinguish between them.
3 If there are no more examples, then there are no examples with these properties. Answer Yes if the majority of the examples at the parent node is positive, otherwise No.
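A runnable sketch of the algorithm (data layout, attribute-choice policy, and names are my own; the slides give only pseudocode). Each example is a dict of attribute values plus its classification under the key "class".

```python
from collections import Counter

def majority_value(examples):
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Placeholder policy: take the first attribute; a real implementation
    # would pick the attribute with the highest information gain.
    return attributes[0]

def decision_tree_learning(examples, attributes, default):
    if not examples:
        return default                      # case 3: no examples left
    classes = {e["class"] for e in examples}
    if len(classes) == 1:
        return classes.pop()                # case 1: uniform classification
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)   # case 2
    tree = {"test": best, "branches": {}}
    rest = [a for a in attributes if a != best]
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        tree["branches"][v] = decision_tree_learning(subset, rest, majority_value(examples))
    return tree

def classify(tree, example):
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["test"]]]
    return tree

examples = [
    {"Patrons": "Some", "Hungry": "Yes", "class": "Yes"},
    {"Patrons": "None", "Hungry": "No",  "class": "No"},
    {"Patrons": "Full", "Hungry": "Yes", "class": "Yes"},
    {"Patrons": "Full", "Hungry": "No",  "class": "No"},
]
tree = decision_tree_learning(examples, ["Patrons", "Hungry"], "No")
```

On this toy data the learner splits on Patrons? first and only needs Hungry? on the mixed Full branch.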
Example
[Figure: the decision tree learned from the 12 examples, with tests such as Patrons?, Hungry?, and WaitEstimate? (>60, 30−60, 10−30, 0−10).]
Evaluating a Learning Algorithm
Note:
Keeping the training and test sets separate is crucial!
Common mistake: After a round of testing, the learning algorithm is modified and then trained and tested with new sets generated from the same pool of examples as before.
The problem is that knowledge about the test set has then already leaked into the algorithm, i.e. the training and test sets are no longer independent.
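The note can be made concrete with a single up-front split that is never regenerated (a minimal sketch; the function name and parameters are my own). The test set is touched exactly once, at the very end.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle once, then split; the test set stays fixed afterwards."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = train_test_split(list(range(12)))
```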
Using Information Theory to Find the Next Attribute

Flipping a coin (Heads, Tails). The bet is €1 on Heads.

1 Fair coin: P(H) = P(T) = 0.5.
I am willing to pay €0.99 for the right answer!

2 Unfair coin: P(H) = 0.99, P(T) = 0.01.
How much is the correct answer worth to me now?
Information Theory (2)
Information content:

I(P(v1), ..., P(vn)) = Σ_{i=1}^{n} −P(vi) · log2 P(vi)

Fair coin:
I(1/2, 1/2) = −1/2 · log2(1/2) − 1/2 · log2(1/2) = 1 bit.

Unfair coin:
I(99/100, 1/100) ≈ 0.08 bits.
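The information content I can be computed directly (a minimal sketch; the function name is mine):

```python
import math

def information_content(probabilities):
    """I(P(v1), ..., P(vn)) = sum of -P(vi) * log2 P(vi); zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

information_content([0.5, 0.5])    # 1.0 bit (fair coin)
information_content([0.99, 0.01])  # ≈ 0.08 bits (unfair coin)
```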
Computing the Information Gain of an Attribute (1)
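The standard definitions (following AIMA; the notation is the usual one, with p positive and n negative examples overall and pi, ni in the i-th branch of attribute A) are:

```latex
\mathrm{Remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\; I\!\left(\frac{p_i}{p_i+n_i},\, \frac{n_i}{p_i+n_i}\right)
\qquad
\mathrm{Gain}(A) = I\!\left(\frac{p}{p+n},\, \frac{n}{p+n}\right) - \mathrm{Remainder}(A)
```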
Computing the Information Gain of an Attribute (2)
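A numeric sketch for the restaurant data (helper names are mine; the branch counts are taken from the Patrons?/Type? split of the 12 examples shown earlier):

```python
import math

def I(*ps):
    """Information content of a probability distribution."""
    return sum(-p * math.log2(p) for p in ps if p > 0)

def gain(branches, p, n):
    """Information gain of an attribute; branches is a list of (pi, ni) counts."""
    remainder = sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
                    for pi, ni in branches)
    return I(p / (p + n), n / (p + n)) - remainder

# 12 restaurant examples: 6 positive, 6 negative
gain_patrons = gain([(0, 2), (4, 0), (2, 4)], 6, 6)          # None, Some, Full
gain_type    = gain([(1, 1), (1, 1), (2, 2), (2, 2)], 6, 6)  # French, Italian, Thai, Burger
# gain_patrons ≈ 0.541 bits, gain_type = 0 bits: Patrons? is the better first test.
```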
Learning Curve of the Restaurant Example

[Figure: learning curve for the restaurant example; the proportion correct on the test set rises from about 0.4 towards 1.0 as the training-set size grows from 0 to 100 examples.]
Introduction to AI
DT’s in practice:

The GASOIL expert system to separate crude oil from gas. Makes decisions on the basis of attributes like the proportion of oil/gas/water, throughput, pressure, viscosity, temperature etc. The complete system has about 2500 rules (paths in the DT). Is better than most human experts in this area.

Flight simulator for a Cessna. Data generated by observing 3 test pilots during 30 test flights each. 90000 examples with 20 attributes, uses C4.5 (a state-of-the-art DT algorithm).
Learning general logical descriptions
Hypotheses assert that the goal predicate G is equivalent to some candidate definition α: ∀x G(x) ≡ α(x).
Restaurant example:
Hypothesis Hr (corresponds to the previous decision tree):
False Positive and Negative Examples
Examples are also logical descriptions of this kind: attribute facts about an instance Xi together with its classification WillWait(Xi) or ¬WillWait(Xi).

Let X13 be such a new example.
Then X13 is called a false negative example because Hr predicts ¬WillWait(X13),
yet the example is positive.
Similarly, an example is false positive if Hr says it should be positive, yet in fact it is
negative.
Note: False positive and false negative examples are logically inconsistent with a
given hypothesis.
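A tiny sketch (the function name is mine) of the two kinds of inconsistency between a hypothesis's prediction and an example's actual classification:

```python
def error_type(predicted, actual):
    """Return None if consistent, otherwise 'false positive' or 'false negative'."""
    if predicted == actual:
        return None
    return "false positive" if predicted else "false negative"

error_type(predicted=False, actual=True)  # 'false negative' (the X13 situation)
```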
Learning as the Elimination of Hypotheses
Strategy of the Current Best Hypothesis
Only consider one hypothesis at a time. If there is a new example which is
inconsistent with the hypothesis, then change it in the following way. Let the
extension of a hypothesis be the set of objects which satisfy the goal
predicate according to the hypothesis.
Generalization: make the extension bigger for a false negative example (see b+c).

Specialization: make the extension smaller for a false positive example (see d+e).
[Figure (a)–(e): positive (+) and negative (−) examples and the extension of the current hypothesis: (a) a consistent hypothesis; (b) a false negative example; (c) the generalized extension covers it; (d) a false positive example; (e) the specialized extension excludes it.]
PAC-Learning
When (realistically) assuming that the ideal function f to be learned is
unknown, how can one ever be certain that the hypothesis h found is close to
f?
The PAC theory of learning gives us criteria for when h is Probably Approximately Correct.
It tells us how many examples one needs to see so that h is within ϵ of f with probability (1 − δ), for arbitrarily small δ and ϵ (> 0).
[Figure: the hypothesis space H, containing f and the region H_bad of hypotheses that differ from f by more than ϵ.]
Examples Needed for a Good Hypothesis (1)
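The standard PAC counting argument (following AIMA; supplied here as a sketch) bounds the number m of examples needed: call a hypothesis bad if its error with respect to f exceeds ϵ. A bad hypothesis is consistent with m independent examples with probability at most (1 − ϵ)^m, so demanding |H|(1 − ϵ)^m ≤ δ and using (1 − ϵ) ≤ e^{−ϵ} yields

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)
```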
Examples Needed for a Good Hypothesis (2)
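For the space of all Boolean functions over n attributes, |H| = 2^{2^n} (a standard observation), so the bound becomes exponential in n:

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + 2^{n}\ln 2\right)
```

Hence a restricted hypothesis space is needed, such as the decision lists that follow.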
Decision Lists
Decision lists (DL’s) consist of a number of tests, which themselves consist of
a conjunction of a bounded number of literals. If a test is successful (all the
literals are satisfied), then the DL tells us which value to return. Otherwise,
the next test is tried.
Example:

Patrons(x, Some) → Yes
Patrons(x, Full) ∧ Fri/Sat(x) → Yes
otherwise → No
Note:
Decision lists represent only a restricted class of logical formulas.
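A decision list can be stored directly as an ordered list of (test, outcome) pairs, where each test is a conjunction of attribute/value literals (this representation and the names are mine):

```python
def dl_classify(decision_list, default, example):
    """Return the outcome of the first test all of whose literals hold."""
    for literals, outcome in decision_list:
        if all(example.get(a) == v for a, v in literals.items()):
            return outcome
    return default

# The restaurant decision list from the example:
dl = [
    ({"Patrons": "Some"}, "Yes"),
    ({"Patrons": "Full", "Fri/Sat": "Yes"}, "Yes"),
]
dl_classify(dl, "No", {"Patrons": "Full", "Fri/Sat": "No"})  # 'No'
```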
Examples Needed for Decision Lists (1)
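A standard result (cf. AIMA): for tests that are conjunctions of at most k literals over n attributes, the language k-DL(n) satisfies

```latex
|k\text{-}\mathrm{DL}(n)| \;=\; 2^{O\!\left(n^{k}\,\log_2 (n^{k})\right)}
```

so plugging ln|H| into the PAC bound gives a sample complexity polynomial in n: any algorithm that returns a decision list consistent with the examples will PAC-learn with a reasonable number of examples.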
Examples Needed for Decision Lists (2)
Algorithm Decision Lists

function DECISION-LIST-LEARNING(examples) returns a decision list, No or failure
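A runnable sketch under the usual reading of this algorithm (data layout and helper names are mine): repeatedly find a test that matches a nonempty, uniformly labelled subset of the examples, emit it as a rule, and recurse on the remaining examples; fail if no such test exists.

```python
from itertools import combinations, product

def decision_list_learning(examples, attributes, k=2):
    """examples: list of (assignment dict, bool label).
    Returns a list of (test dict, label) rules (default No at the end),
    or None on failure."""
    if not examples:
        return []
    values = {a: sorted({x[a] for x, _ in examples}) for a in attributes}
    for size in range(1, k + 1):                     # tests with <= k literals
        for attrs in combinations(attributes, size):
            for vals in product(*(values[a] for a in attrs)):
                test = dict(zip(attrs, vals))
                matched = [e for e in examples
                           if all(e[0][a] == v for a, v in test.items())]
                labels = {lab for _, lab in matched}
                if matched and len(labels) == 1:     # nonempty and uniform
                    rest = [e for e in examples if e not in matched]
                    tail = decision_list_learning(rest, attributes, k)
                    if tail is not None:
                        return [(test, labels.pop())] + tail
    return None  # no test matches a uniformly labelled subset: failure

def dl_predict(rules, x):
    for test, label in rules:
        if all(x.get(a) == v for a, v in test.items()):
            return label
    return False  # default: No

exs = [
    ({"Patrons": "Some", "Fri/Sat": "No"},  True),
    ({"Patrons": "Full", "Fri/Sat": "Yes"}, True),
    ({"Patrons": "Full", "Fri/Sat": "No"},  False),
    ({"Patrons": "None", "Fri/Sat": "Yes"}, False),
]
rules = decision_list_learning(exs, ["Patrons", "Fri/Sat"])
```

By construction the returned list classifies every training example correctly.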
Restaurant example:

[Figure: learning curves for DECISION-LIST-LEARNING (DLL) and DECISION-TREE-LEARNING (DTL) on the restaurant data; % correct on the test set (roughly 0.4 to 1.0) against training set size (0 to 100).]