10 Learning
G. Lakemeyer

[Figure: architecture of a general learning agent. A critic, applying a fixed performance standard, observes the environment via sensors and gives feedback to the learning element; the learning element exchanges knowledge with the performance element (the agent in the old sense), sets learning goals, and uses a problem generator to propose actions; the performance element acts on the environment through effectors.]

Critic: feedback tells the system how good or bad it is performing.
Learning element: improves the system.
Problem generator: suggests actions to test how good the system performs.
Kinds of Feedback during Learning

Supervised learning: Both the input and the correct output are available to the learner. (There is a teacher (supervisor).)
Inductive Learning

Given examples (x, f(x)) of an unknown function f, find a hypothesis h which approximates f.

[Figure: the same set of data points drawn several times, each time with a different curve through them: many hypotheses h are consistent with the same examples.]

Note: Since there are many possibilities for h, this works only with additional assumptions which restrict the search space: a bias (often a preference for simplicity), as in the sketch below.
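To make the bias idea concrete, here is a small sketch (not from the slides; all names and the tolerance are my own) in which the hypothesis space is polynomials and the bias is a preference for the lowest degree consistent with the examples:

import numpy as np

def fit_simplest(xs, ys, tol=1e-6, max_degree=8):
    """Return the lowest-degree polynomial consistent with all examples.

    Preferring low degree is the simplicity bias: among all hypotheses
    consistent with the data, pick the simplest one.
    """
    for degree in range(max_degree + 1):
        h = np.poly1d(np.polyfit(xs, ys, degree))
        if max(abs(h(x) - y) for x, y in zip(xs, ys)) < tol:
            return h
    return None  # no consistent hypothesis within the allowed degrees

# Points sampled from f(x) = x^2: degree 2 is returned, even though
# polynomials of degree 3, 4, ... also fit the data.
xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]
print(fit_simplest(xs, ys))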
In the following: Decision Trees (DT’s) as an example of inductive learning.
Decision Trees
Restaurant Example

[Figure: a decision tree for deciding whether to wait for a table. The root tests Patrons? with branches None (No), Some (Yes), and Full; under Full, WaitEstimate? is tested with branches >60 (No), 30−60, 10−30, and 0−10 (Yes), the middle branches leading to further tests such as Alternate? and Hungry? with Yes/No leaves.]
Theorem:
Every propositional formula (Boolean function) is representable by a decision tree.
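The proof idea is simply to branch on every variable and read the leaf values off the truth table. A minimal sketch of that construction (my own illustration; trees are nested dicts, leaves are truth values):

def tree_from_function(f, variables):
    """Build a complete decision tree for a Boolean function f.

    The tree is a nested dict {var: {False: subtree, True: subtree}};
    a leaf is True or False. Branching on every variable mirrors the
    proof: each path fixes one valuation, and the leaf stores f's
    value on that valuation.
    """
    def build(assignment, remaining):
        if not remaining:
            return f(assignment)  # leaf: value of the formula here
        var, rest = remaining[0], remaining[1:]
        return {var: {value: build({**assignment, var: value}, rest)
                      for value in (False, True)}}
    return build({}, variables)

# Example: the formula A AND (B OR NOT C)
formula = lambda a: a["A"] and (a["B"] or not a["C"])
print(tree_from_function(formula, ["A", "B", "C"]))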
Learning in Decision Trees
Decision trees can trivially represent any Boolean function by having each
path represent one valuation of the attributes (atomic formulas). Often,
however, there are much more compact representations.
Always?

No! For example, the parity function (answers Yes iff an even number of its inputs are true) has no compact representation: every decision tree for it has exponentially many leaves.
Training examples (X10–X12 shown):

Ex.   Alt  Bar  Fri  Hun  Patrons  Price  Rain  Res  Type     Est    WillWait
X10   Yes  Yes  Yes  Yes  Full     $$$    No    Yes  Italian  10–30  No
X11   No   No   No   No   None     $      No    No   Thai     0–10   No
X12   Yes  Yes  Yes  Yes  Full     $      No    No   Burger   30–60  Yes
Trivial DT: have one path in the tree per example (memorizing). Instead: find a compact DT which covers all examples.

[Figure: choosing the first attribute for the full example set (+: X1,X3,X4,X6,X8,X12; −: X2,X5,X7,X9,X10,X11). Splitting on Type? leaves every branch with a mixture of positive and negative examples. Splitting on Patrons? yields None (−: X7,X11 → No), Some (+: X1,X3,X6,X8 → Yes) and Full (+: X4,X12; −: X2,X5,X9,X10); the Full branch is split further on Hungry? (Yes: +: X4,X12 / −: X2,X10; No: −: X5,X9).]
An Algorithm

function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MAJORITY-VALUE(examples)
  else
      best ← CHOOSE-ATTRIBUTE(attributes, examples)
      tree ← a new decision tree with root test best
      for each value vi of best do
          examplesi ← elements of examples with best = vi
          subtree ← DECISION-TREE-LEARNING(examplesi, attributes − best,
                                            MAJORITY-VALUE(examples))
          add a branch to tree with label vi and subtree subtree
      end
      return tree
1 If there are only positive or only negative examples, then done. Answer Yes or No, respectively.
2 If there are both positive and negative examples, then choose the best attribute to split on and recursively build a subtree for each of its values.
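The pseudocode translates almost line by line into Python. A sketch under my own conventions (examples are (attribute-dict, label) pairs; CHOOSE-ATTRIBUTE is stubbed out here and replaced by the information-gain version further below):

from collections import Counter

def majority_value(examples):
    """Most common classification among (attributes, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Naive placeholder: take the first attribute. The information-gain
    # version on the later slides replaces this.
    return attributes[0]

def decision_tree_learning(examples, attributes, default):
    """A tree is a nested dict {attribute: {value: subtree}};
    a leaf is a plain label."""
    if not examples:
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                 # only positive or only negative
        return labels.pop()
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree = {best: {}}
    for v in {ex[best] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[best] == v]
        tree[best][v] = decision_tree_learning(
            subset, [a for a in attributes if a != best],
            majority_value(examples))
    return tree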
Example

[Figure: the decision tree induced from the 12 examples, shown next to the original tree (Patrons?, WaitEstimate? with branches >60, 30−60, 10−30, 0−10). The learned tree also tests Patrons? at the root (None → No, Some → Yes) but is considerably smaller, continuing under Full with Hungry? instead of the original WaitEstimate? subtree.]
Evaluating a Learning Algorithm

1 Collect a large set of examples.
2 Separate them into disjoint training and test sets (e.g. 80 % for training).
3 Use the training set to generate a hypothesis H (e.g. a decision tree).
4 Measure the percentage of correctly classified examples of the test set.
5 Repeat steps 1–4 for randomly selected training sets of different size.
Note:
Keeping the training and test sets separate is crucial!
Common mistake: After a round of testing the learning algorithm is
modified and then trained and tested with new sets generated from the
same set of examples as before.
The problem is that knowledge about the test set is already contained in
the algorithm, i.e. training and test sets are no longer independent.
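A minimal sketch of steps 1–5 (my own code; learn and classify stand for the learning algorithm under test, and each size must leave a nonempty test set):

import random

def learning_curve(learn, classify, examples, sizes, trials=20):
    """Estimate a learning curve by repeated random train/test splits.

    learn(training_set) -> hypothesis; classify(h, example) -> label.
    Returns, for each training-set size, the mean accuracy on the
    held-out examples. The test set stays disjoint from training.
    """
    curve = []
    for m in sizes:
        accuracies = []
        for _ in range(trials):
            shuffled = random.sample(examples, len(examples))
            train, test = shuffled[:m], shuffled[m:]
            h = learn(train)
            correct = sum(classify(h, ex) == label for ex, label in test)
            accuracies.append(correct / len(test))
        curve.append((m, sum(accuracies) / trials))
    return curve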
Using Information Theory to Find Next Attribute

Flipping a coin (Heads, Tails). Here the bet is €1 on Heads.

1 Fair coin: P(H) = P(T) = 0.5.
  I am willing to pay up to €0.99 for the right answer!

2 Unfair coin: P(H) = 0.99, P(T) = 0.01.
  How much is the correct answer worth to me now? Only about €0.01.
Information Theory (2)

Information content:

    I(P(v_1), ..., P(v_n)) = - \sum_{i=1}^{n} P(v_i) \log_2 P(v_i).

Fair coin:

    I(1/2, 1/2) = -(1/2) \log_2 (1/2) - (1/2) \log_2 (1/2) = 1 bit.

Unfair coin:

    I(99/100, 1/100) ≈ 0.08 bits.
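The same quantity in a few lines of Python (my own helper, reproducing the two coin values):

from math import log2

def information(probabilities):
    """I(P(v1), ..., P(vn)) in bits; 0 * log 0 is taken to be 0."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(information([0.5, 0.5]))    # 1.0 bit (fair coin)
print(information([0.99, 0.01]))  # ~0.08 bits (unfair coin)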
Computing the Information Gain of an Attribute (1)

Let p = # positive examples (Yes),
    n = # negative examples (No).

Assumption: the proportion of Yes/No answers among the examples reflects the true distribution.

Then the information content of the correct answer is

    I(p/(p+n), n/(p+n)).

For p = n: I(1/2, 1/2) = 1 bit.
Computing the Information Gain of an Attribute (2)

An attribute A splits the example set E into subsets E_1, ..., E_v, where E_i has p_i positive and n_i negative examples.

The information content still needed along the path for E_i is I(p_i/(p_i+n_i), n_i/(p_i+n_i)), and the probability of choosing this path is (p_i+n_i)/(p+n).

So after A is tested, the remaining information needed is

    Rest(A) = \sum_{i=1}^{v} (p_i+n_i)/(p+n) * I(p_i/(p_i+n_i), n_i/(p_i+n_i)).

    Gain(A) = I(p/(p+n), n/(p+n)) - Rest(A).

Choose the attribute with the maximal gain; e.g. Gain(Patrons?) ≈ 0.54 bits for the 12 restaurant examples, the largest among all attributes.
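Continuing the earlier sketches, Rest and Gain in Python, which also supplies the gain-based CHOOSE-ATTRIBUTE promised above (labels are assumed to be the strings "Yes"/"No"; that convention is mine):

from math import log2

def information(probabilities):
    """Entropy in bits (same helper as in the coin example)."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def gain(attribute, examples, positive="Yes"):
    """Gain(A) = I(p/(p+n), n/(p+n)) - Rest(A) for (dict, label) examples."""
    def needed(exs):
        p = sum(1 for _, label in exs if label == positive) / len(exs)
        return information([p, 1 - p])
    rest = 0.0
    for v in {ex[attribute] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[attribute] == v]
        rest += len(subset) / len(examples) * needed(subset)
    return needed(examples) - rest

def choose_attribute(attributes, examples):
    # Drop-in replacement for the naive stub in the earlier sketch.
    return max(attributes, key=lambda a: gain(a, examples))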
Learning Curve of the Restaurant Example

[Figure: learning curve; the percentage of correctly classified test examples rises from about 0.4 toward 1 as the training-set size grows from 0 to 100.]
DT’s in practice:

The GASOIL expert system to separate crude oil from gas. Makes decisions on the basis of attributes like the proportion of oil/gas/water, throughput, pressure, viscosity, temperature etc. The complete system has about 2500 rules (paths in the DT). Is better than most human experts in this area.

Flight simulator for a Cessna. Data generated by observing 3 test pilots during 30 test flights each. 90000 examples with 20 attributes, uses C4.5 (a state-of-the-art DT algorithm).
Learning general logical descriptions

Hypotheses are formulas of the form

    ∀x G(x) ≡ α(x),

where G is the goal predicate and α is a candidate definition.

Restaurant example: the goal predicate is WillWait(x).
False Positive and Negative Examples

Examples are also logical descriptions: each example X_i is described by a conjunction of attribute literals, together with its classification WillWait(X_i) or ¬WillWait(X_i).

Suppose a new example X13 is positive, but the current hypothesis Hr predicts ¬WillWait(X13). Then X13 is called a false negative example.

Similarly, an example is false positive if Hr says it should be positive, yet in fact it is negative.

Note: False positive and false negative examples are logically inconsistent with a given hypothesis.
Learning as the Elimination of Hypotheses
Strategy of the Current Best Hypothesis

Only consider one hypothesis at a time. If there is a new example which is inconsistent with the hypothesis, then change it in the following way. Let the extension of a hypothesis be the set of objects which satisfy the goal predicate according to the hypothesis.

Generalization: make the extension bigger for a false negative example (see (b) and (c)).

Specialization: make the extension smaller for a false positive example (see (d) and (e)).

[Figure: panels (a)–(e). (a) shows the extension of the current hypothesis as a region separating + from − examples; in (b) a false negative appears and in (c) the extension is generalized to include it; in (d) a false positive appears and in (e) the extension is specialized to exclude it.]
PAC-Learning (Leslie Valiant)

When (realistically) assuming that the ideal function f to be learned is unknown, how can one ever be certain that the hypothesis h found is close to f?

The PAC theory of learning gives us criteria for when h is Probably Approximately Correct.

It tells us how many examples one needs to see so that h is within ε of f with probability (1 − δ), for arbitrarily small δ and ε (≠ 0).

[Figure: the hypothesis space H; the hypotheses within ε of f are approximately correct, and the rest form H_bad.]
Examples Needed for a Good Hypothesis (1)

error(h) = P(h(x) ≠ f(x) | x drawn from the distribution).

h is approximately correct if error(h) ≤ ε.

Want: the number m of examples needed so that, with high probability (at least 1 − δ), every hypothesis that is consistent with all m examples is approximately correct.

Let H_bad = {h_b | error(h_b) > ε} be the set of all bad hypotheses.

A hypothesis h_b from H_bad agrees with a single example with probability at most (1 − ε), hence with m independently drawn examples with probability at most (1 − ε)^m. The probability that H_bad contains a hypothesis consistent with all m examples is therefore at most |H_bad| (1 − ε)^m ≤ |H| (1 − ε)^m. Requiring this to be ≤ δ gives

    m ≥ (1/ε) (ln(1/δ) + ln |H|).

Note: for Boolean functions with n attributes, |H| = 2^(2^n), so we need to see O(2^n) examples to get a good h with high probability.
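A quick numeric check of the bound (my own sketch; it takes ln |H| rather than |H|, since the latter can be astronomically large):

from math import ceil, log

def pac_sample_bound(ln_h, epsilon, delta):
    """m >= (1/eps)(ln(1/delta) + ln|H|)."""
    return ceil((log(1 / delta) + ln_h) / epsilon)

# All Boolean functions over n = 10 attributes: ln|H| = 2^10 * ln 2.
n = 10
print(pac_sample_bound((2 ** n) * log(2), epsilon=0.1, delta=0.05))
# ~7128 examples; the bound grows as O(2^n), so learning arbitrary
# Boolean functions this way quickly becomes hopeless.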
Decision Lists

Decision lists (DL's) consist of a number of tests, which themselves consist of a conjunction of a bounded number of literals. If a test is successful (all the literals are satisfied), then the DL tells us which value to return. Otherwise, the next test is tried.

Example: if Patrons(x, Some) then Yes; else if Patrons(x, Full) ∧ Fri/Sat(x) then Yes; else No.

This corresponds to the hypothesis

    H4: ∀x WillWait(x) ≡ Patrons(x, Some) ∨ (Patrons(x, Full) ∧ Fri/Sat(x)).

Since each test is a conjunction of at most 2 literals over the n attributes, this decision list lies in 2-DL(n).

Note: Decision lists represent only a restricted class of logical formulas.
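Decision lists are easy to represent directly. A sketch under my own encoding, with a DL as an ordered list of (literal-set, outcome) pairs:

# Each test is a set of (attribute, value) literals that must all hold.
dl = [
    ({("Patrons", "Some")}, "Yes"),
    ({("Patrons", "Full"), ("Fri/Sat", "Yes")}, "Yes"),
]

def classify(decision_list, example, default="No"):
    """Return the outcome of the first test whose literals all hold."""
    for literals, outcome in decision_list:
        if all(example.get(attr) == value for attr, value in literals):
            return outcome
    return default

print(classify(dl, {"Patrons": "Full", "Fri/Sat": "Yes"}))  # Yes
print(classify(dl, {"Patrons": "Full", "Fri/Sat": "No"}))   # No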
Examples Needed for Decision Lists (1)

Hypothesis space = the set of all DL's.

Q: How big is this set?

Let k-DL(n) be the language of all DL's whose tests are conjunctions of at most k literals over n attributes. We want |k-DL(n)|.

Let Conj(n, k) be the language of the tests, i.e. of conjunctions of at most k literals over n attributes.

First consider a DL as a set of tests (ignore the ordering). For each test T, 3 cases can occur: T appears in the DL with outcome Yes, T appears with outcome No, or T does not occur in the DL at all.
Examples Needed for Decision Lists (2)

This gives at most 3^|Conj(n,k)| sets of tests, and at most |Conj(n,k)|! orderings of each set. Also:

    |Conj(n, k)| = O(n^k).

Hence |k-DL(n)| = 2^(O(n^k log2(n^k))).

Finally:

    m ≥ (1/ε) (ln(1/δ) + O(n^k log2(n^k))),

thus roughly O(n^k) examples suffice!
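To see the contrast with the 2^(2^n) space of arbitrary Boolean functions, a small sketch computing both sample bounds (my own code, using the m-formula from the PAC slide):

from math import ceil, comb, factorial, log

def ln_size_kdl(n, k):
    """ln of the |k-DL(n)| upper bound 3^|Conj| * |Conj|!, where
    |Conj(n,k)| = sum_{i<=k} C(2n, i) tests over n attributes."""
    c = sum(comb(2 * n, i) for i in range(k + 1))
    return c * log(3) + log(factorial(c))

def sample_bound(ln_h, eps=0.1, delta=0.1):
    return ceil((log(1 / delta) + ln_h) / eps)

# Polynomial vs. doubly exponential hypothesis spaces: as n grows,
# the 2-DL(n) bound grows only polynomially, while the bound for
# arbitrary Boolean functions explodes.
for n in (10, 20, 30):
    print(n, sample_bound(ln_size_kdl(n, 2)),   # 2-DL(n)
             sample_bound((2 ** n) * log(2)))   # all Boolean functions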
Algorithm: Decision Lists

function DECISION-LIST-LEARNING(examples) returns a decision list, No or failure
  if examples is empty then return the trivial decision list No
  t ← a test that matches a nonempty subset examplest of examples
      such that the members of examplest are all positive or all negative
  if there is no such t then return failure
  if the examples in examplest are positive then o ← Yes else o ← No
  return a decision list with initial test t and outcome o and remaining
      tests given by DECISION-LIST-LEARNING(examples − examplest)
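A greedy Python sketch in the spirit of this algorithm (my own code; it reuses the (literal-set, outcome) encoding from the decision-list slide above):

def decision_list_learning(examples, tests):
    """examples: list of (dict, label); tests: candidate literal sets
    (e.g. all conjunctions of at most k literals). Returns an ordered
    list of (test, outcome) pairs, or None on failure."""
    examples = list(examples)
    result = []
    while examples:
        for literals in tests:
            matched = [(ex, label) for ex, label in examples
                       if all(ex.get(a) == v for a, v in literals)]
            labels = {label for _, label in matched}
            if matched and len(labels) == 1:   # uniform, nonempty subset
                result.append((literals, labels.pop()))
                examples = [e for e in examples if e not in matched]
                break
        else:
            return None                        # no suitable test: failure
    return result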
Restaurant example:

[Figure: learning curves of DECISION-LIST-LEARNING (DLL) and DECISION-TREE-LEARNING (DTL) on the restaurant data; % correct on the test set against training set size (0 to 100); both curves climb from about 0.4 to roughly 0.9.]