
Learning

Introduction to Artificial Intelligence


G. Lakemeyer

Winter Term 2024/25


Learning

The Goal of Learning


Optimize future behavior on the basis of the history of percepts, actions, and
knowledge about the world.

A general architecture of a learning agent:


[Figure: architecture of a learning agent. A performance standard feeds the critic; the critic gives feedback to the learning element, which makes changes to the performance element and sends learning goals to the problem generator. Sensors and effectors connect the agent to the environment.]

Performance Element: the agent in the old sense.
Critic: tells the system how well or badly it is performing (relative to the performance standard).
Learning Element: improves the system.
Problem Generator: suggests actions to test how well the system performs.
AI/WS-2024/25 2 / 32
Kinds of Feedback during Learning

In an abstract sense, an agent is a function from inputs (like percepts) to outputs (actions). We distinguish the actual function from the ideal function, which models optimal behavior. The goal of learning is then to approximate the ideal function as well as possible.
Supervised Learning: both the input and the correct output are available to the learner. (There is a teacher, or supervisor.)

Reinforcement Learning: the correct answer is not available, but there is feedback in terms of rewards and punishments.

Unsupervised Learning: there is no indication of what the correct output is. The system can still learn structure in the input using supervised learning methods, by predicting future inputs on the basis of past inputs. (The system is its own supervisor.)
Inductive Learning

Learning the ideal function with supervision.


Suppose we are given the input and correct output as a pair (x, f(x)), where f is the ideal yet unknown function.
Wanted: a function (hypothesis) h which approximates f.

Example with four different h's:

[Figure: the same set of data points fitted by four different hypotheses h, panels (a)-(d).]

Note: Since there are many possibilities for h, this works only with additional
assumptions which restrict the search space: bias.
In the following: Decision Trees (DT’s) as an example of inductive learning.

Decision Trees

Input: description of a situation using a set of properties (roughly, literals in FOL).
Output: Yes/No decision relative to a goal predicate.

Thus decision trees represent Boolean functions. [This can easily be generalized to many-valued functions.]

We want to learn an ideal Boolean function, or a logical formula which represents this function.
Restaurant Example
[Figure: hand-made decision tree for the restaurant example.]

Patrons?
  None: No
  Some: Yes
  Full: WaitEstimate?
    >60: No
    30-60: Alternate?
      No: Reservation?
        No: Bar? (No: No, Yes: Yes)
        Yes: Yes
      Yes: Fri/Sat? (No: No, Yes: Yes)
    10-30: Hungry?
      No: Yes
      Yes: Alternate?
        No: Yes
        Yes: Raining? (No: No, Yes: Yes)
    0-10: Yes

Patrons: how many people are in the restaurant?
WaitEstimate: how long is the wait?
Alternate: are there alternatives?
Hungry: am I hungry?
Reservation: do I have a reservation?
Bar: is there a bar in the restaurant?
Fri/Sat: is it a Friday or a Saturday?
Raining: is it raining?

Other attributes:
Price: how expensive are the meals?
Type: what type of restaurant is it?
Expressiveness of DT’s
(The figure repeats the restaurant decision tree from the previous slide.)
Each tree describes a set of implications in FOL, e.g.:

∀r Patrons(r, Full) ∧ WaitEstimate(r, 10-30) ∧ ¬Hungry(r) ⊃ WillWait(r).

Not all formulas in FOL are representable, because the tree only refers to one object (here: the restaurant r).
For example, ∃r2 Near(r2, r) ∧ Price(r, p) ∧ Price(r2, p2) ∧ Cheaper(p2, p) is not representable.

Theorem:
Every propositional formula (Boolean function) is representable by a decision tree.
Learning in Decision Trees

Decision trees can trivially represent any Boolean function by having each
path represent one valuation of the attributes (atomic formulas). Often,
however, there are much more compact representations.

Always?

No! For example, the parity function (which answers Yes iff an even number of attributes are true) has no compact decision-tree representation.

Learning in decision trees:


Given positive (answer: Yes) and negative (answer: No) examples, find the
correct Boolean function represented as a decision tree.
Problem: With n attributes there are 2^(2^n) possible functions. How does one find the right one?

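To get a feel for how fast 2^(2^n) grows, a quick illustrative sketch (Python): each of the 2^n truth assignments of the attributes can independently be mapped to Yes or No, giving 2^(2^n) Boolean functions.

```python
# Number of distinct Boolean functions over n attributes: 2 ** (2 ** n).
for n in range(1, 6):
    print(n, 2 ** (2 ** n))
# n = 5 already gives 4294967296 candidate hypotheses.
```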
Generating a Decision Tree from Examples
Example   Alt  Bar  Fri  Hun  Pat    Price  Rain  Res  Type     Est    WillWait
X1        Yes  No   No   Yes  Some   $$$    No    Yes  French   0-10   Yes
X2        Yes  No   No   Yes  Full   $      No    No   Thai     30-60  No
X3        No   Yes  No   No   Some   $      No    No   Burger   0-10   Yes
X4        Yes  No   Yes  Yes  Full   $      No    No   Thai     10-30  Yes
X5        Yes  No   Yes  No   Full   $$$    No    Yes  French   >60    No
X6        No   Yes  No   Yes  Some   $$     Yes   Yes  Italian  0-10   Yes
X7        No   Yes  No   No   None   $      Yes   No   Burger   0-10   No
X8        No   No   No   Yes  Some   $$     Yes   Yes  Thai     0-10   Yes
X9        No   Yes  Yes  No   Full   $      Yes   No   Burger   >60    No
X10       Yes  Yes  Yes  Yes  Full   $$$    No    Yes  Italian  10-30  No
X11       No   No   No   No   None   $      No    No   Thai     0-10   No
X12       Yes  Yes  Yes  Yes  Full   $      No    No   Burger   30-60  Yes

Trivial DT: one path in the tree per example (memorizing).

Problem: the tree is too big, and no generalization is possible.

Instead: find a compact DT which covers all examples.

Idea: Occam's Razor — "The most likely hypothesis is the simplest one which covers all examples."

⇒ choose the attribute which is most helpful in classifying the examples.


Choosing Attributes/Nodes for the Decision Tree
All examples — +: X1,X3,X4,X6,X8,X12; −: X2,X5,X7,X9,X10,X11

(a) Split on Patrons?:
    None: +: none; −: X7,X11
    Some: +: X1,X3,X6,X8; −: none
    Full: +: X4,X12; −: X2,X5,X9,X10

(b) Split on Type?:
    French:  +: X1; −: X5
    Italian: +: X6; −: X10
    Thai:    +: X4,X8; −: X2,X11
    Burger:  +: X3,X12; −: X7,X9

(c) Split on Patrons?, then on Hungry? in the Full branch:
    Hungry = Yes: +: X4,X12; −: X2,X10
    Hungry = No:  +: none; −: X5,X9
An Algorithm

function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree
  inputs: examples, set of examples
          attributes, set of attributes
          default, default value for the goal predicate

  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MAJORITY-VALUE(examples)
  else
      best ← CHOOSE-ATTRIBUTE(attributes, examples)
      tree ← a new decision tree with root test best
      for each value vi of best do
          examplesi ← {elements of examples with best = vi}
          subtree ← DECISION-TREE-LEARNING(examplesi, attributes − best,
                                           MAJORITY-VALUE(examples))
          add a branch to tree with label vi and subtree subtree
      end
      return tree

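A minimal executable sketch of the pseudocode above (Python). The attribute choice here is only a placeholder that takes the first attribute, not the information-gain heuristic discussed later; names like `majority_value` and the dictionary-based tree encoding are my own.

```python
from collections import Counter

def majority_value(examples):
    # MAJORITY-VALUE: most common classification among the examples.
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def dtl(examples, attributes, default):
    # DECISION-TREE-LEARNING from the slide.
    if not examples:
        return default
    classes = {e["class"] for e in examples}
    if len(classes) == 1:                 # all examples classified the same
        return classes.pop()
    if not attributes:
        return majority_value(examples)
    best = attributes[0]                  # placeholder for CHOOSE-ATTRIBUTE
    tree = {"attr": best, "branches": {}}
    for v in {e[best] for e in examples}:
        exs = [e for e in examples if e[best] == v]
        sub = dtl(exs, [a for a in attributes if a != best],
                  majority_value(examples))
        tree["branches"][v] = sub
    return tree

def classify(tree, example, default="No"):
    # Walk the tree; fall back to default for unseen attribute values.
    while isinstance(tree, dict):
        tree = tree["branches"].get(example[tree["attr"]], default)
    return tree
```

For example, `dtl` on three examples keyed only by Patrons reproduces the None/Some/Full split from slide 10.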
1 If there are only positive or only negative examples, then we are done: answer Yes or No, respectively.
2 If there are both positive and negative examples, then choose the best attribute to distinguish between them.
3 If there are no more examples, then no example with these properties was observed. Answer Yes if the majority of the examples at the parent node is positive, otherwise answer No.
4 If there are no more attributes, then there are identical examples with different classifications; that is, there is either an error in the data (noise), or the attributes are insufficient to distinguish between the situations. Answer Yes if the majority of the examples is positive, otherwise answer No.

Example

Compared to the DT drawn by hand:

Learned tree (left):
Patrons?
  None: No
  Some: Yes
  Full: Hungry?
    No: No
    Yes: Type?
      French: Yes
      Italian: No
      Thai: Fri/Sat? (No: No, Yes: Yes)
      Burger: Yes

(Right panel: the hand-made tree from the earlier slide.) The learned tree is considerably more compact.
Evaluating a Learning Algorithm

1 Collect a large set of examples.


2 Separate them into disjoint training and test sets.
3 Use the training set to generate a hypothesis H (e.g. a decision tree).
4 Measure the percentage of correctly classified examples of the test set.
5 Repeat steps 1–4 for randomly selected training sets of different size.

Note:
Keeping the training and test sets separate is crucial!
Common mistake: After a round of testing the learning algorithm is
modified and then trained and tested with new sets generated from the
same set of examples as before.
The problem is that knowledge about the test set is already contained in
the algorithm, i.e. training and test sets are no longer independent.

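The protocol in steps 1-5 can be sketched as follows (Python). The learner here is only a stand-in that predicts the majority class; any learner with the same `learn`/`classify` interface can be plugged in, and all function names are my own.

```python
import random

def evaluate(examples, train_fraction, learn, classify, seed=0):
    # Steps 2-4: split into disjoint training and test sets,
    # train a hypothesis, and measure accuracy on the held-out test set.
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * train_fraction)
    train, test = shuffled[:k], shuffled[k:]
    h = learn(train)
    correct = sum(classify(h, e) == e["class"] for e in test)
    return correct / len(test)

# Stand-in learner: the hypothesis is simply the majority class of the training set.
def learn_majority(train):
    yes = sum(e["class"] == "Yes" for e in train)
    return "Yes" if yes >= len(train) - yes else "No"

def classify_majority(h, example):
    return h
```

For step 5, call `evaluate` repeatedly with increasing `train_fraction` (and fresh random splits) to plot a learning curve like the one shown later for the restaurant example.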
Using Information Theory to Find Next Attribute

Information theory was founded by Shannon and Weaver (1949)


We would like to know: What is the information content of an answer to a
Yes/No query?
Analogy to betting:
Information content ≈ how much is it worth to me if someone tells me the
right answer?

Flipping a coin (Heads, Tails); the bet is EUR 1 on Heads.
1 Fair coin: P(H) = P(T) = 0.5. I am willing to pay EUR 0.99 for the right answer!
2 Unfair coin: P(H) = 0.99, P(T) = 0.01. How much is the correct answer worth to me now?

Information Theory (2)

The information content is measured in bits.

Let v1, ..., vn be the possible answers to a question, with probabilities P(vi).

Information content:

I(P(v1), ..., P(vn)) = Σ_{i=1}^{n} −P(vi) × log2 P(vi).

Fair coin:
I(1/2, 1/2) = −(1/2) × log2 (1/2) − (1/2) × log2 (1/2) = 1 bit.
Unfair coin:
I(99/100, 1/100) ≈ 0.08 bits.
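A quick check of these numbers (Python; `info_content` is my own name for the function I above):

```python
from math import log2

def info_content(probs):
    # I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi).
    # Terms with P(vi) = 0 contribute nothing and are skipped.
    return sum(-p * log2(p) for p in probs if p > 0)

print(info_content([0.5, 0.5]))    # fair coin: 1.0 bit
print(info_content([0.99, 0.01]))  # unfair coin: about 0.08 bits
```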
Computing the Information Gain of an Attribute

[Slides 17-18: the information-gain derivation is contained in images and is not recoverable from the text.]
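In the standard treatment these slides follow (AIMA), with p positive and n negative examples, an attribute A that splits them into subsets (p_i, n_i) has Gain(A) = I(p/(p+n), n/(p+n)) − Σ_i (p_i+n_i)/(p+n) × I(p_i/(p_i+n_i), n_i/(p_i+n_i)), and one chooses the attribute with the highest gain. A sketch on the counts from slide 10 (function names are my own):

```python
from math import log2

def info(p, n):
    # Information content of a Boolean classification: p positives, n negatives.
    total = p + n
    return sum(-q * log2(q) for q in (p / total, n / total) if q > 0)

def gain(p, n, splits):
    # splits: list of (p_i, n_i) pairs, one per attribute value.
    remainder = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in splits)
    return info(p, n) - remainder

# Restaurant examples: 6 positive, 6 negative.
# Patrons? splits into None (0+, 2-), Some (4+, 0-), Full (2+, 4-):
print(gain(6, 6, [(0, 2), (4, 0), (2, 4)]))          # about 0.541 bits
# Type? splits every branch evenly, so it tells us nothing:
print(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]))  # 0.0 bits
```

This is why the algorithm picks Patrons rather than Type as the root on slide 10.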
Learning Curve of the Restaurant Example

[Figure: learning curve; x-axis: training set size (0-100), y-axis: % correct on the test set (0.4-1.0).]

DT's in practice:

The GASOIL expert system to separate crude oil from gas. Makes decisions on the basis of attributes like the proportion of oil/gas/water, throughput, pressure, viscosity, temperature, etc. The complete system has about 2500 rules (paths in the DT). It is better than most human experts in this area.

Flight simulator for a Cessna. Data generated by observing 3 test pilots during 30 test flights each: 90000 examples with 20 attributes. Uses C4.5 (a state-of-the-art DT algorithm).
Learning general logical descriptions

Goal: learning a 1-place predicate G [e.g.: WillWait(r)].

Hypothesis space: the set of all logical definitions of the goal predicate:

∀x G(x) ≡ α(x).

Restaurant example:
Hypothesis Hr (corresponds to the previous decision tree):

∀r WillWait(r) ≡ Patrons(r, some)
    ∨ Patrons(r, full) ∧ Hungry(r) ∧ Type(r, French)
    ∨ Patrons(r, full) ∧ Hungry(r) ∧ Type(r, Thai) ∧ Fri/Sat(r)
    ∨ Patrons(r, full) ∧ Hungry(r) ∧ Type(r, Burger)
False Positive and Negative Examples
Examples are also logical descriptions of the kind:

Ex1 = Alternate(X1) ∧ ¬Bar(X1) ∧ ... ∧ Patrons(X1, some) ∧ ... ∧ WillWait(X1)

Note that Hr is logically consistent with Ex1.

Let

Ex13 = Patrons(X13, Full) ∧ Wait(X13, 0-10) ∧ ¬Hungry(X13) ∧ ... ∧ WillWait(X13).

Then Ex13 is called a false negative example because Hr predicts ¬WillWait(X13), yet the example is positive.
Similarly, an example is a false positive if Hr says it should be positive, yet in fact it is negative.

Note: false positive and false negative examples are logically inconsistent with a given hypothesis.
Learning as the Elimination of Hypotheses

Suppose we are given n possible hypotheses H1, ..., Hn. Then we can represent the hypothesis space as

H1 ∨ H2 ∨ ... ∨ Hn.

Learning can be thought of as a successive reduction of the hypothesis space by eliminating disjuncts for which we have false negative or false positive examples, i.e. examples which are inconsistent with these hypotheses.

This is usually not practical, since the hypothesis space is too big.

Sometimes it is possible to have compact representations of the hypothesis space (version spaces). An alternative is to consider only one hypothesis and modify it when needed.

Strategy of the Current Best Hypothesis
Only consider one hypothesis at a time. If a new example is inconsistent with the hypothesis, change the hypothesis in the following way. Let the extension of a hypothesis be the set of objects which satisfy the goal predicate according to the hypothesis.

Generalization: make the extension bigger for a false negative example (see b+c).
Specialization: make the extension smaller for a false positive example (see d+e).
[Figure: regions of positive (+) and negative (−) examples. (a) a consistent hypothesis; (b), (c) generalization after a false negative enlarges the extension; (d), (e) specialization after a false positive shrinks it.]

If H1 = ∀x G(x) ≡ α(x) and H2 = ∀x G(x) ≡ β(x), then H2 is a generalization of H1 iff ∀x α(x) ⊃ β(x).

A simple kind of generalization is obtained by removing conditions from α.

Example
Example   Alt  Bar  Fri  Hun  Pat    Price  Rain  Res  Type     Est    WillWait
X1        Yes  No   No   Yes  Some   $$$    No    Yes  French   0-10   Yes
X2        Yes  No   No   Yes  Full   $      No    No   Thai     30-60  No
X3        No   Yes  No   No   Some   $      No    No   Burger   0-10   Yes
X4        Yes  No   Yes  Yes  Full   $      No    No   Thai     10-30  Yes
X5        Yes  No   Yes  No   Full   $$$    No    Yes  French   >60    No
X6        No   Yes  No   Yes  Some   $$     Yes   Yes  Italian  0-10   Yes
X7        No   Yes  No   No   None   $      Yes   No   Burger   0-10   No
X8        No   No   No   Yes  Some   $$     Yes   Yes  Thai     0-10   Yes
X9        No   Yes  Yes  No   Full   $      Yes   No   Burger   >60    No
X10       Yes  Yes  Yes  Yes  Full   $$$    No    Yes  Italian  10-30  No
X11       No   No   No   No   None   $      No    No   Thai     0-10   No
X12       Yes  Yes  Yes  Yes  Full   $      No    No   Burger   30-60  Yes

Example X1 is positive. Since Alternate(X1) is true, let
H1: ∀x WillWait(x) ≡ Alternate(x).

Example X2 is negative; H1 predicts it as a false positive, so H1 must be specialized:
H2: ∀x WillWait(x) ≡ Alternate(x) ∧ Patrons(x, some).

X3 is positive, but according to H2 a false negative. Generalization results in
H3: ∀x WillWait(x) ≡ Patrons(x, some).

X4 is positive, but according to H3 a false negative. Dropping Patrons(x, some) contradicts X2, so add a new disjunct:
H4: ∀x WillWait(x) ≡ Patrons(x, some) ∨ (Patrons(x, full) ∧ Fri/Sat(x)).
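The H1-H4 updates above can be replayed mechanically. A sketch with hypotheses as Python predicates (an illustrative encoding of my own, using only the attributes the trace needs):

```python
# Each example: attribute values plus its actual classification.
X1 = dict(Alt=True,  Pat="Some", Fri=False, will_wait=True)
X2 = dict(Alt=True,  Pat="Full", Fri=False, will_wait=False)
X3 = dict(Alt=False, Pat="Some", Fri=False, will_wait=True)
X4 = dict(Alt=True,  Pat="Full", Fri=True,  will_wait=True)

H1 = lambda x: x["Alt"]                         # generalized from X1
H2 = lambda x: x["Alt"] and x["Pat"] == "Some"  # specialized after X2
H3 = lambda x: x["Pat"] == "Some"               # generalized after X3
H4 = lambda x: (x["Pat"] == "Some"
                or (x["Pat"] == "Full" and x["Fri"]))  # new disjunct after X4

def consistent(h, examples):
    # A hypothesis is consistent iff it predicts every classification correctly.
    return all(h(x) == x["will_wait"] for x in examples)

print(consistent(H1, [X1]))              # True
print(consistent(H1, [X1, X2]))          # False: X2 is a false positive for H1
print(consistent(H4, [X1, X2, X3, X4]))  # True
```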
Problems

Some problems with the current-best hypothesis:

All previous examples need to be tested again.

It is difficult to find good heuristics; the search can easily lead to a dead end.
⇒ uncontrolled backtracking.

PAC-Learning
When (realistically) assuming that the ideal function f to be learned is unknown, how can one ever be certain that the hypothesis h found is close to f?

The PAC theory of learning gives us criteria for when h is Probably Approximately Correct.
It tells us how many examples one needs to see so that, with probability (1 − δ), h is within ε of f, for arbitrarily small (but nonzero) δ and ε.

[Figure: the hypothesis space H; the "bad" hypotheses are those lying farther than ε from f.]
Examples Needed for a Good Hypothesis

[Slides 27-28: the sample-complexity derivation is contained in images and is not recoverable from the text.]
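The bound these slides presumably derive, in the standard PAC treatment the course follows (e.g. AIMA; a reconstruction, not taken from the slides): call h approximately correct if error(h) ≤ ε. The probability that some hypothesis with error greater than ε is nevertheless consistent with m independent examples is at most |H|(1 − ε)^m; requiring this to be at most δ yields the sample-size bound.

```latex
% Probability that a "bad" hypothesis (error > \epsilon) survives m examples:
\Pr[\text{some bad } h \text{ consistent}] \;\le\; |H|\,(1-\epsilon)^m \;\le\; \delta
% Solving for m (using (1-\epsilon)^m \le e^{-\epsilon m}) gives
m \;\ge\; \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln |H|\right)
```

So a learner that simply returns any hypothesis consistent with that many examples is probably approximately correct.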
Decision Lists
Decision lists (DL’s) consist of a number of tests, which themselves consist of
a conjunction of a bounded number of literals. If a test is successful (all the
literals are satisfied), then the DL tells us which value to return. Otherwise,
the next test is tried.
Example:

1. If Patrons(x, Some) then Yes;
2. else if Patrons(x, Full) ∧ Fri/Sat(x) then Yes;
3. else No.

This corresponds to the hypothesis

H4 : ∀xWillWait (x ) ≡ Patrons(x , some) ∨ (Patrons(x , full ) ∧ Fri /Sat (x )).

Note:
Decision lists represent only a restricted class of logical formulas.
Examples Needed for Decision Lists

[Slides 30-31: the derivation is contained in images and is not recoverable from the text.]
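For k-DL (decision lists whose tests are conjunctions of at most k literals over n attributes), the standard counting argument (again a reconstruction following AIMA, not taken from these slides) bounds the hypothesis space, and plugging that bound into the PAC sample-size formula m ≥ (1/ε)(ln(1/δ) + ln |H|) shows that the number of examples needed is polynomial in n:

```latex
% Size of the k-DL hypothesis space over n attributes:
|k\text{-DL}(n)| \;\le\; 2^{O\!\left(n^{k}\,\log_{2}(n^{k})\right)}
% Hence \ln|H| = O(n^k \log_2(n^k)), and the required number of
% examples m is polynomial in n for fixed k.
```

This is why restricting the tests to a bounded number of literals matters: it makes decision lists efficiently PAC-learnable.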
Algorithm for Decision Lists

function DECISION-LIST-LEARNING(examples) returns a decision list, No, or failure

  if examples is empty then return the value No
  t ← a test that matches a nonempty subset examples_t of examples
      such that the members of examples_t are all positive or all negative
  if there is no such t then return failure
  if the examples in examples_t are positive then o ← Yes
  else o ← No
  return a decision list with initial test t and outcome o
      and remaining elements given by DECISION-LIST-LEARNING(examples − examples_t)
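An executable sketch of DECISION-LIST-LEARNING restricted to single-attribute tests, i.e. k = 1 (Python). The greedy choice of t is simply the first uniform test found, which is an assumption on my part; function names are my own.

```python
def dl_learn(examples, attributes):
    # DECISION-LIST-LEARNING with tests of the form (attribute == value).
    if not examples:
        return [("default", None, "No")]   # empty examples: return No
    for a in attributes:
        for v in {e[a] for e in examples}:
            matched = [e for e in examples if e[a] == v]
            classes = {e["class"] for e in matched}
            if len(classes) == 1:          # all positive or all negative
                rest = [e for e in examples if e[a] != v]
                return [(a, v, classes.pop())] + dl_learn(rest, attributes)
    raise ValueError("failure: no uniform test found")

def dl_classify(dlist, example):
    # Return the outcome of the first test the example matches.
    for attr, value, outcome in dlist:
        if attr == "default" or example.get(attr) == value:
            return outcome
```

Because each test removes exactly the examples it matches, the learned list is always consistent with its training set whenever the recursion succeeds.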
Restaurant example:

[Figure: learning curves of DLL and DTL on the restaurant data; x-axis: training set size (0-100), y-axis: % correct on the test set (0.4-1.0).]