AI Final
Dr. Sadegh
Soran University
Today we’ll discuss:
• Intelligence
• Artificial Intelligence
• A brief history of AI
• Cool current projects in AI
What’s involved in Intelligence?
• Ability to interact with the world (speech, vision, motion,
manipulation)
• Ability to model the world and to reason about it
• Ability to learn and to adapt
Goals in AI
• To build systems that exhibit intelligent behavior
• To understand intelligence in order to model it
Modeling people?
• Sometimes
• But sometimes we want AI systems to be better and smarter than we
are
Computer Chess
• 2/96: Kasparov vs Deep Blue
• Kasparov victorious: 3 wins, 2 draws, 1 loss
• 5/97: Kasparov vs Deeper Blue
• First match won by a computer against a reigning world champion
• 512 processors: 200 million chess positions per second
An array is a data structure that stores a collection of objects; a dynamic array lets you add or remove objects as you wish.
Useful terminologies
• A tree consists of a set of nodes and a set of edges.
• An edge is the connection between two nodes.
• There are different kinds of nodes:
– Root: the first (top) node
– Parent: a node directly above some other node(s) (connected to them)
– Child: a node directly below another node
– Internal (non-terminal) node: any parent node
– Terminal (external or leaf) node: a node that has no children
Useful terminologies
• Siblings: nodes that share the same parent
• Path: a sequence of nodes connected by edges
• Level of a node: number of edges contained in the path from the root
to this node
• Height of the tree: maximum distance (number of edges) from the
root to the terminal nodes
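To make the terminology concrete, here is a minimal Python sketch, assuming a tree stored as a hypothetical parent-to-children dictionary (the tree itself is made up for illustration). It finds the root, each node's level, the height, the leaves, and a node's siblings.

```python
# Hypothetical example tree, stored as parent -> list of children.
tree = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}

def root_of(tree):
    """The root is the only node that is not a child of any other node."""
    children = {c for kids in tree.values() for c in kids}
    return next(n for n in tree if n not in children)

def levels(tree):
    """Level of a node = number of edges on the path from the root to it."""
    root = root_of(tree)
    level, stack = {root: 0}, [root]
    while stack:
        node = stack.pop()
        for child in tree[node]:
            level[child] = level[node] + 1
            stack.append(child)
    return level

def height(tree):
    """Height = maximum number of edges from the root to a terminal node."""
    return max(levels(tree).values())

def leaves(tree):
    """Terminal (leaf) nodes have no children."""
    return [n for n, kids in tree.items() if not kids]

def siblings(tree, node):
    """Siblings are the other children of the same parent."""
    for kids in tree.values():
        if node in kids:
            return [k for k in kids if k != node]
    return []  # the root has no parent, hence no siblings
```

For this tree, `height(tree)` is 2 and the leaves are D, E, and F.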
Multi-way tree and binary tree
• Multi-way tree or multi-branch tree: An internal node may have m
(m>2) child nodes.
• Binary tree: An internal node has at most 2 child nodes. – Binary tree
is useful both for information retrieval (binary search tree) and for
pattern classification (e.g. decision tree).
• Complete binary tree: A binary tree in which every level, except
possibly the last, is completely filled, and nodes in the last level are as
far left as possible.
Tree Traversal
• Tree traversal is the process of visiting every node of a tree exactly once. – Example: print the contents of all nodes; search for all nodes that have a specified property (e.g. a key), etc.
• Orders of traversal: pre-order, in-order, post-order, and level-order. – The first three correspond to depth-first search, and the last one to breadth-first search.
Pre-order, in-order, and post-order traversal
• Pre-order: start from the root and visit (recursively) the current node, then the left subtree, then the right subtree.
• In-order: start from the root and visit (recursively) the left subtree, then the current node, then the right subtree.
• Post-order: start from the root and visit (recursively) the left subtree, then the right subtree, then the current node.
Level-order traversal
• A tree can be traversed in level-order using a queue.
Put the root in the queue first, and then repeat the following until the queue is empty:
– Get a node from the queue, visit it, and put its children (if any) into the queue.
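The four traversal orders can be sketched in Python as follows, assuming a minimal hypothetical `Node` class for a binary tree. The three depth-first orders differ only in where the current node is appended relative to the two recursive calls; level-order uses a queue exactly as described above.

```python
from collections import deque

class Node:
    """A binary-tree node (hypothetical minimal class for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def pre_order(node, out):
    if node is not None:
        out.append(node.key)          # current node first
        pre_order(node.left, out)     # then the left subtree
        pre_order(node.right, out)    # then the right subtree
    return out

def in_order(node, out):
    if node is not None:
        in_order(node.left, out)
        out.append(node.key)          # current node between the subtrees
        in_order(node.right, out)
    return out

def post_order(node, out):
    if node is not None:
        post_order(node.left, out)
        post_order(node.right, out)
        out.append(node.key)          # current node last
    return out

def level_order(root):
    out, queue = [], deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()        # get a node from the queue and visit it
        out.append(node.key)
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)   # put its children into the queue
    return out

# Hypothetical example tree:     1
#                               / \
#                              2   3
#                             / \
#                            4   5
root = Node(1, Node(2, Node(4), Node(5)), Node(3))
```

Here `pre_order(root, [])` gives [1, 2, 4, 5, 3], `in_order` gives [4, 2, 5, 1, 3], `post_order` gives [4, 5, 2, 3, 1], and `level_order(root)` gives [1, 2, 3, 4, 5].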
Graph structure
• A graph is a more general data structure.
• Formally, a graph is defined as a 2-tuple G = (V, E), where V is a set of vertices or nodes, and E is a set of edges, arcs, or connections.
• A tree is a special graph without cycles. – Each node has exactly one path from the root. – The path is unique. – All nodes are connected to the root.
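A graph can be stored literally as the pair G = (V, E). The sketch below is illustrative: the vertex and edge sets are hypothetical, and the tree test relies on a standard fact not stated on the slide, namely that an undirected graph is a tree exactly when it is connected and has |V| - 1 edges.

```python
# Hypothetical graph G = (V, E); undirected edges are frozensets, so {u, v} == {v, u}.
V = {"A", "B", "C", "D"}
E = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("C", "D")]}

def neighbours(E, v):
    """All nodes joined to v by an edge."""
    return {u for e in E if v in e for u in e if u != v}

def is_connected(V, E):
    """True if every node can be reached from an arbitrary start node."""
    if not V:
        return True
    start = next(iter(V))
    seen, stack = {start}, [start]
    while stack:
        for u in neighbours(E, stack.pop()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == V

def is_tree(V, E):
    # Connected with exactly |V| - 1 edges means there is no cycle,
    # so each node has exactly one path from the root.
    return is_connected(V, E) and len(E) == len(V) - 1
```

Adding the edge B–D to this graph creates the cycle A–B–D–C–A, and `is_tree` then returns False.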
Examples
• Computer networks
• Flight maps of airlines
• Highway networks
• City water / sewage networks
• Electrical circuits
A graph is a convenient way to represent all
these physical networks in digital (virtual) form.
Useful terminologies
• Path: A sequence of nodes connected by edges.
• Simple path: A path in which all nodes are different.
• Cycle: A simple path with the same start and end nodes.
• Connected graph: There is a path between any two nodes.
• Connected component: A sub-graph which itself is a connected graph.
• Directed graph: The edges have directions (e.g. a one-way route).
• Undirected graph: The edges do not have directions (or the direction does not matter).
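The connected components of an undirected graph can be found by repeatedly starting a traversal from any node not yet seen. A minimal sketch, assuming a hypothetical adjacency-set representation in which every undirected edge is stored in both directions (a directed graph would store it only once, from tail to head):

```python
def undirected(edges):
    """Build an adjacency map where each edge can be used in both directions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def connected_components(adj):
    """Return each connected component as a sorted list of its nodes."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components

# Hypothetical graph with two components.
graph = undirected([("A", "B"), ("B", "C"), ("D", "E")])
```

A graph is connected exactly when this returns a single component.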
Useful terminologies
• Weighted graph: Each edge has a weight (e.g. direct
cost to move from one city to another).
• Node expansion: The process of getting all child nodes of a node (useful for graph traversal or graph-based search).
• Spanning tree: A tree that contains all nodes of a graph and uses only edges of that graph.
• Directed acyclic graph (DAG): A special case of a directed graph in which there are no cycles or loops. A DAG is often used to construct a larger classification system from many two-class classifiers (see the figure).
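Whether a directed graph is a DAG can be checked with Kahn's algorithm, a standard technique that is not part of these slides: repeatedly remove nodes with no incoming edges; if every node is eventually removed, there is no cycle. A sketch, assuming a hypothetical node-to-successors dictionary:

```python
from collections import deque

def is_dag(adj):
    """Kahn's algorithm: True if the directed graph `adj` has no cycle."""
    indegree = {n: 0 for n in adj}
    for n in adj:
        for m in adj[n]:
            indegree[m] = indegree.get(m, 0) + 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        n = queue.popleft()              # n has no remaining incoming edges
        removed += 1
        for m in adj.get(n, ()):
            indegree[m] -= 1             # remove the edge (n, m)
            if indegree[m] == 0:
                queue.append(m)
    return removed == len(indegree)      # leftover nodes lie on a cycle
```

For example, `{"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}` is a DAG, while `{"A": ["B"], "B": ["C"], "C": ["A"]}` is not.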
Examples
State space representation of AI problems
(Search Method)
The maze problem
• Initial state: (0,0)
• Target state: (2,2)
• Available operations:
– Move forward
– Move backward
– Move left
– Move right
• Depending on the current state, the same operation may have different results.
• Also, an operation cannot be executed in some states.
State space representation of AI problems
The Tower of Hanoi
• Initial state: (123;000;000)
• Target state: (000;123;000)
• Available operations:
– Move a disk from one place to
another.
• Restriction:
– A larger disk cannot be put on a
smaller one.
Why state space representation?
• Any problem can be represented in the same form, formally.
• Any problem can be solved by finding the target state via state
transition, using the available operations → Search problem!
• The results (i.e. the state transition process or the method for finding
this process) can be reused as knowledge.
• Problem: The computation cost can be large!
The maze problem → search graph
• To find the solution, we can just
traverse the graph, starting from (0,0),
and stop when we visit (2,2).
• The result is a path from the initial
node to the target node.
• The result can differ, depending on the order of graph traversal.
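The maze figure is not reproduced in this text, so the sketch below assumes a hypothetical 3x3 maze with three made-up walls; the mapping of forward/backward/left/right to coordinate changes is also an assumption. It generates successor states from the available operations and traverses the search graph breadth-first from (0,0), stopping when (2,2) is visited.

```python
from collections import deque

SIZE = 3  # states are cells (x, y) with 0 <= x, y < 3
# Assumed effect of each operation on the state (x, y).
MOVES = {"forward": (0, 1), "backward": (0, -1), "left": (-1, 0), "right": (1, 0)}
# Hypothetical walls: each one blocks movement between two adjacent cells.
WALLS = {frozenset({(0, 0), (1, 0)}), frozenset({(1, 1), (2, 1)}),
         frozenset({(0, 1), (0, 2)})}

def successors(state):
    """Yield (operation, next_state) for every operation executable here."""
    x, y = state
    for name, (dx, dy) in MOVES.items():
        nxt = (x + dx, y + dy)
        inside = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
        if inside and frozenset({state, nxt}) not in WALLS:
            yield name, nxt

def solve_maze(start=(0, 0), target=(2, 2)):
    """Traverse the search graph breadth-first; stop when target is visited."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            path = []
            while state is not None:      # follow the links back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for _, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

With these walls, only "forward" is executable in (0,0), and the path found is (0,0) → (0,1) → (1,1) → (1,2) → (2,2). A different traversal order (e.g. depth-first) could return a different path.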
Tower of Hanoi problem → search tree
• The Tower of Hanoi can also be solved in a similar way, but it is more difficult to do manually because the number of nodes is much larger.
• Instead of using a search graph, we can use a search tree.
• That is, start from the initial node, expand the current node recursively, and stop when we find the target node.
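A minimal sketch of this search tree in Python. The state encoding is an assumption and differs from the slide's "(123;000;000)" notation: each state is a tuple of three pegs, each peg lists its disks from bottom to top, and a larger number means a larger disk. Nodes are expanded breadth-first, and states that were already generated are skipped so that the tree stays finite.

```python
from collections import deque

# Assumed encoding: three pegs, disks listed bottom to top, 3 = largest disk.
START = ((3, 2, 1), (), ())
TARGET = ((), (3, 2, 1), ())

def moves(state):
    """Yield (from_peg, to_peg, new_state) for every legal single-disk move."""
    for i, src in enumerate(state):
        if not src:
            continue
        disk = src[-1]                                  # only the top disk can move
        for j, dst in enumerate(state):
            if i != j and (not dst or dst[-1] > disk):  # never onto a smaller disk
                pegs = list(state)
                pegs[i], pegs[j] = src[:-1], dst + (disk,)
                yield i, j, tuple(pegs)

def solve_hanoi(start=START, target=TARGET):
    """Expand nodes breadth-first from the initial node until the target is found.
    Returns the list of (from_peg, to_peg) moves."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            plan = []
            while parent[state] is not None:
                state, move = parent[state]
                plan.append(move)
            return plan[::-1]
        for i, j, nxt in moves(state):
            if nxt not in parent:                       # skip repeated states
                parent[nxt] = (state, (i, j))
                queue.append(nxt)
    return None
```

Because breadth-first search returns a shortest path, the plan has 7 moves, which is the known minimum of 2^3 - 1 moves for three disks.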
Hanoi’s Tower
Simple Search Algorithms
Topics of this lecture
• Random search
• Search with closed list
• Search with open list
• Depth-first and breadth-first search again
• Uniform-cost search
Random search
• Step 1: Current node x=initial node;
• Step 2: If x=target node, stop with success;
• Step 3: Expand x, and get a set S of child nodes;
• Step 4: Select a node x’ from S at random;
• Step 5: x=x’, and return to Step 2.
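Steps 1 to 5 of random search can be sketched as follows. The `expand` function and the `choose` parameter (which defaults to `random.choice`) are assumptions of this sketch, and so is the `max_steps` limit: it is not in the slide, but it is needed because random search may never reach the target.

```python
import random

def random_search(expand, initial, target, choose=random.choice, max_steps=10_000):
    """Random search; returns the visited path, or None if the target is not reached."""
    x = initial                       # Step 1
    path = [x]
    for _ in range(max_steps):
        if x == target:               # Step 2
            return path
        children = expand(x)          # Step 3
        if not children:
            return None               # dead end: nothing to select (not covered by the slide)
        x = choose(children)          # Steps 4 and 5: pick a child at random and move on
        path.append(x)
    return path if x == target else None

# Hypothetical graph: a line 0 - 1 - 2 - 3.
LINE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

With `random_search(LINE.get, 0, 3)`, the returned path often goes back and forth several times before reaching 3, which illustrates the redundancy problem described next.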
Random search is not good
• At each step, the next node
is determined at random.
• There is no guarantee that we
reach the target node.
• Even if we can, the path so
obtained can be very
redundant (extremely long).
Search with a closed list
• Do not visit the same node more than once!
• Step 1: Current node x=initial node;
• Step 2: If x=target node, stop with success;
• Step 3: Expand x, and get a set S of child nodes. If
S is empty, stop with failure. Add x to the closed
list.
• Step 4: Select from S a new node x’ that is not in
the closed list.
• Step 5: x=x’, and return to Step 2.
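A sketch of search with a closed list, following Steps 1 to 5. `expand`, `choose`, and the small `DEMO` graph are hypothetical. The slide leaves implicit what happens when every child is already closed; here the search reports failure in that case. Each step moves to a node that has never been closed, so the search always terminates on a finite graph.

```python
import random

def closed_list_search(expand, initial, target, choose=random.choice):
    """Search with a closed list; returns the path, or None on failure."""
    x, closed, path = initial, set(), [initial]          # Step 1
    while True:
        if x == target:                                  # Step 2
            return path
        children = expand(x)                             # Step 3
        if not children:
            return None
        closed.add(x)
        fresh = [c for c in children if c not in closed] # Step 4
        if not fresh:
            return None       # stuck: every child is already closed
        x = choose(fresh)                                # Step 5
        path.append(x)

# Hypothetical graph: T is reachable from A only through C.
DEMO = {"A": ["B", "C"], "B": ["A"], "C": ["A", "T"], "T": []}
```

If the first choice from A is B, the search gets stuck at B even though T is reachable, which is why a closed list alone is not enough.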
Closed list is not enough!
• Using a closed list, we can
guarantee termination of
search in finite steps.
• However, we may never
reach the target node!
Search with open list
Keep all “un-visited” nodes in another list!
• Step 1: Add the initial node to the open list.
• Step 2: If the open list is empty, stop with failure. Otherwise, take the node x at the top of the open list; if x is the target node, stop with success.
• Step 3: Expand x to obtain a set S of child nodes, and put x into the
closed list.
• Step 4: For each node x’ in S, if it is not in the closed list, add it to the
open list along with the edge (x, x’).
• Step 5: Return to Step 2.
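The open-list algorithm can be sketched in Python as below. The graph `DEMO` is hypothetical. The open list stores each node together with the node it was reached from, i.e. the edge (x, x’). Taking entries from the top (last added) gives depth-first behaviour, and taking the oldest entry gives breadth-first behaviour. Skipping entries that are already closed is a small refinement not stated in the slide, since Step 4 can put the same node into the open list twice.

```python
from collections import deque

def open_list_search(graph, initial, target, lifo=True):
    """Search with open and closed lists; returns the path found, or None."""
    open_list = deque([(initial, None)])               # Step 1
    closed, parent = set(), {}
    while open_list:                                   # Step 2
        x, p = open_list.pop() if lifo else open_list.popleft()
        if x in closed:
            continue          # duplicate entry of an already expanded node
        parent[x] = p
        if x == target:       # success: rebuild the path from the stored edges
            path = [x]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        closed.add(x)                                  # Step 3
        for child in graph.get(x, ()):                 # Step 4
            if child not in closed:
                open_list.append((child, x))
    return None               # open list is empty: failure

# Hypothetical graph: two different routes from S to G.
DEMO = {"S": ["A", "B"], "A": ["G"], "B": ["C"], "C": ["G"], "G": []}
```

On this graph, taking nodes from the top returns S, B, C, G, while taking the oldest entry first returns S, A, G, so the result depends on the traversal order.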
Open and Closed Lists: visit all nodes starting from B
Step 1: Open: A; Closed: B
Step 2: Open: S; Closed: B A
Step 3: Open: C G; Closed: B A S
Step 4: Open: G D E F; Closed: B A S C
Step 5: Open: G D F H; Closed: B A S C E
Step 6: Open: G D F; Closed: B A S C E H
Step 8: Open: D; Closed: B A S C E H G F
Figure: stack contents (PUSH/POP) during depth-first search (DFS); output: A S G H E C F D B
Figure: stack contents during DFS and queue contents during BFS on a 12-node tree; BFS visiting order: 1 2 7 8 3 6 9 12 4 5 10 11
Breadth First Search with a queue (enqueue at the back, dequeue from the front); output: A B S C G D E F H
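The graph in the original figure is not available here. The adjacency lists below are a hypothetical graph chosen so that a stack-based DFS and a queue-based BFS from A reproduce the two output sequences listed above (A S G H E C F D B and A B S C G D E F H). The sketch shows how the choice of stack or queue alone changes the visiting order.

```python
from collections import deque

# Hypothetical undirected graph (each edge listed in both directions).
ADJ = {
    "A": ["B", "S"], "B": ["A"], "S": ["A", "C", "G"],
    "C": ["S", "D", "E", "F"], "G": ["S", "F", "H"], "D": ["C"],
    "E": ["C", "H"], "F": ["C", "G"], "H": ["E", "G"],
}

def dfs_stack(adj, start):
    """Depth-first search: push unvisited neighbours, pop the most recent one."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()                 # pop from the top of the stack
        if node in visited:
            continue                       # a node may have been pushed twice
        visited.add(node)
        order.append(node)
        for nb in adj[node]:
            if nb not in visited:
                stack.append(nb)           # push
    return order

def bfs_queue(adj, start):
    """Breadth-first search: enqueue at the back, dequeue from the front."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()             # dequeue
        order.append(node)
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)           # enqueue
    return order
```

`"".join(dfs_stack(ADJ, "A"))` gives "ASGHECFDB" and `"".join(bfs_queue(ADJ, "A"))` gives "ABSCGDEFH".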
Uniform-cost search: Dijkstra’s algorithm
• Usually, the solution is not unique, and we want to find the BEST one.
• For example, if we want to travel around the world, we may try to
find the fastest route; the most economic route; or the route in which
we can visit more friends.
• The uniform-cost search or Dijkstra’s algorithm is a method for solving
this problem.
Uniform-cost search
• Step 1: Add the initial node x0 and its cost C(x0)=0 to the open list.
• Step 2: Get a node x from the top of the open list. If the open list is
empty, stop with failure. If x is the target node, stop with success.
• Step 3: Expand x to get a set S of child nodes, and move x to the closed
list.
• Step 4: For each x’ in S but not in the closed list, find its accumulated cost
C(x’)=C(x)+d(x,x’); and add x’, C(x’), and (x, x’) to the open list. If x’ is
already in the open list, update its cost and link if the new cost is smaller.
• Step 5: Sort the open list based on the node costs, and return to Step 2.
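A minimal Python sketch of these steps. Two implementation choices differ slightly from the wording above: Python's `heapq` keeps the open list ordered by cost instead of re-sorting it (Step 5), and instead of updating an entry in place (Step 4), a cheaper copy is pushed and stale copies are skipped when they are taken out. The graph `GRAPH` is hypothetical: it contains only the edges whose costs can be read off the total costs (TC) in the worked example that follows, and other edges of the original figure are omitted.

```python
import heapq

# Hypothetical directed graph: GRAPH[x][x'] = d(x, x').
GRAPH = {
    "S": {"A": 5, "B": 9, "D": 6}, "A": {"B": 3, "G1": 9}, "B": {"C": 1},
    "C": {"G2": 5, "F": 7}, "D": {"C": 2, "E": 2}, "E": {"G3": 7}, "F": {"G3": 8},
}

def uniform_cost_search(graph, start, goals):
    """Return (cost, path) to the cheapest goal, or None on failure."""
    open_list = [(0, start)]            # Step 1: entries (C(x), x) in a min-heap
    best = {start: 0}                   # cheapest known C(x)
    parent = {start: None}              # link (x, x') that produced best[x']
    closed = set()
    while open_list:                    # Step 2: take the cheapest node
        cost, x = heapq.heappop(open_list)
        if x in closed:
            continue                    # stale copy; a cheaper one was expanded
        if x in goals:                  # goal test when taken, not when generated
            path = [x]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        closed.add(x)                   # Step 3
        for child, d in graph.get(x, {}).items():    # Step 4
            new_cost = cost + d         # C(x') = C(x) + d(x, x')
            if child not in closed and new_cost < best.get(child, float("inf")):
                best[child] = new_cost  # keep only the cheaper cost and link
                parent[child] = x
                heapq.heappush(open_list, (new_cost, child))
    return None                         # open list is empty: failure
```

With goals {G1, G2, G3}, this returns cost 13 and the path S, D, C, G2. Ties between equal costs are broken alphabetically by node name, which matches the "continue with B alphabetically" choice in the example.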
Uniform-cost search
• During uniform-cost search, whenever a node is taken from the open list, the best path from the initial node to that node has already been found (provided all transition costs are non-negative). That is, when search stops with success, the solution must be the best one.
• In the algorithm, C(x) is the cost of the node x accumulated from the initial node, and d(x,x’) is the cost of the state transition (e.g. the distance between two adjacent cities).
• If we set d(x,x’)=1 for all edges, uniform-cost search is equivalent to
the breadth-first search.
Uniform-cost search: 3 goals: find the nearest goal
Figure: a weighted graph with start node S and goal nodes G1, G2, and G3; TC is the total (accumulated) cost from S.
• First search (Visited: S): expanding S gives A (TC=5), D (TC=6), and B (TC=9). The cheapest one is A.
• Second search (Visited: S, A): expanding A gives B (TC=8) and G1 (TC=14). The cheapest one is D.
• Third search (Visited: S, A, D): expanding D gives C (TC=8) and E (TC=8). The cheapest are B, C, and E (all TC=8); we continue with B alphabetically (this choice is your option).
• Remaining searches (Visited: S, A, D, B, C, E): B gives C (TC=9); C gives G2 (TC=13) and F (TC=15); E gives G3 (TC=15); F would give G3 (TC=23).
1- We close the branches to B (TC=9) and C (TC=9), because B and C were already visited with lower costs (8 and 8, respectively).
2- We don’t continue from F, because D and G3 have already been visited.
3- Among our three goals, G2 wins.
4- The lowest-cost path for reaching a goal is S, D, C, G2 (TC=13).
Example: Search a path with the minimum cost
Figure: a weighted graph with edge costs.
Search the path with the minimum cost: ………
Heuristic Search Algorithms
Topics of this lecture
• What are heuristics?
• What is heuristic search?
• Best first search
• A* algorithm
• Generalization of search problems
What are heuristics?
• Heuristics are know-how obtained through a lot of experience.
• Heuristics often enable us to make decisions quickly without thinking
deeply about the reasons.
• In many cases, heuristics are “tacit knowledge” that cannot be
explained verbally.
• The more experience we have, the better the heuristics will be.
Why heuristic search?
• Based on the heuristics, we can get good solutions without
investigating all possible cases.
• In fact, the deep neural networks used in AlphaGo learn heuristics for playing the game of Go, and they help the system make decisions more efficiently.
• Without heuristics, many AI-related problems cannot be solved in practice (it may take many years to get a solution).
Some heuristics for finding H(x)
• For any given node x, we need an estimation function to find H(x), the estimated cost from x to the target node.
• For the maze problem, for example, H(x) can be estimated by the Manhattan distance between the current node and the target node. This distance is never larger than the true path cost (and this is good), because walls may remove some edges, which can only make the real path longer.
• For more complex problems, we may need a method (e.g. neural
network) to learn the function from experiences or observed data.
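To show how H(x) is used, here is a sketch of the A* algorithm (listed in the topics of this lecture) on a hypothetical grid maze with unit-cost moves in four directions. Nodes are taken from the open list in order of f(x) = C(x) + H(x), the cost so far plus the Manhattan-distance estimate of the cost still to go. The grid size, blocked cells, and function names are assumptions for illustration.

```python
import heapq

def manhattan(a, b):
    """H(x): Manhattan distance between cell a and the target cell b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(size, blocked, start, target):
    """A* on a size x size grid; returns the list of cells on a cheapest path."""
    g = {start: 0}                       # C(x): cost from the start
    parent = {start: None}
    open_list = [(manhattan(start, target), start)]   # entries (f(x), x)
    closed = set()
    while open_list:
        _, x = heapq.heappop(open_list)
        if x in closed:
            continue                     # stale entry
        if x == target:
            path = [x]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        closed.add(x)
        r, c = x
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not inside or nxt in blocked or nxt in closed:
                continue
            new_g = g[x] + 1             # every move costs 1
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = x
                heapq.heappush(open_list, (new_g + manhattan(nxt, target), nxt))
    return None
```

For example, on a 3x3 grid with cells (1,0) and (1,1) blocked, going from (0,0) to (2,0) requires 6 moves around the wall, while H((0,0)) = 2. The estimate is smaller than the true cost, as expected.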
Heuristics: Direct Distance from nodes to Goal
Heuristics: Manhattan distance from nodes to goals
Heuristics: More than Distance!
Manhattan Distance and Manhattan Cost
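The two distance heuristics named above can be compared directly. A minimal sketch, assuming that "direct distance" means the straight-line (Euclidean) distance; `math.dist` requires Python 3.8 or later.

```python
import math

def direct_distance(a, b):
    """Straight-line (Euclidean) distance between points a and b."""
    return math.dist(a, b)

def manhattan_distance(a, b):
    """Sum of the horizontal and vertical offsets (city-block distance)."""
    return sum(abs(p - q) for p, q in zip(a, b))
```

From (0, 0) to (3, 4), the direct distance is 5.0 and the Manhattan distance is 7. The direct distance is never larger than the Manhattan distance.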
Probabilistic Models
• Regression Models
– Simple: Linear, Non-Linear
– Multiple: Linear, Non-Linear
• Correlation Models
• Other Models
EPI 809/Spring 2008
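Linear Regression Model
• 1. The relationship between variables is a linear function:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
– $Y_i$: dependent (response) variable (e.g., CD+ c.)
– $X_i$: independent (explanatory) variable (e.g., years since seroconversion)
– $\beta_0$: Y-intercept; $\beta_1$: slope; $\varepsilon_i$: random error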
Population & Sample Regression Models
• Population: unknown relationship $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Sample: a random sample from the population is used to estimate the relationship, $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
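Population Linear Regression Model
• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\varepsilon_i$ = random error
• $E(Y) = \beta_0 + \beta_1 X_i$ (population regression line)
Figure: observed values scattered around the population regression line (axes X and Y).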
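Sample Linear Regression Model
• $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$, where $\hat{\varepsilon}_i$ = random error
• $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ (sample regression line)
Figure: observed values and an unsampled observation around the sample regression line (axes X and Y).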
Estimating Parameters: Least Squares Method
Figure: scatter plot of Y against X (both axes from 0 to 60).
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
Figure panels (scatter plot of Y against X, axes 0 to 60): slope changed, intercept unchanged; slope unchanged, intercept changed; slope changed, intercept changed.
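Least Squares
• 1. ‘Best fit’ means that the differences between the actual Y values and the predicted Y values are a minimum. But positive differences offset negative ones, so the differences are squared:
$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
• 2. LS minimizes the sum of the squared differences (errors) (SSE).
Figure: sample regression line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ with residuals $\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_4$; for example, $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$.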
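Why the errors are squared can be checked numerically. In this sketch the four data points are hypothetical. Both candidate lines have a plain error sum of zero, because positive and negative errors cancel, but their sums of squared errors (SSE) differ, so SSE can tell them apart.

```python
# Hypothetical sample data.
XS = [0, 1, 2, 3]
YS = [1, 3, 2, 4]

def sum_of_errors(xs, ys, b0, b1):
    """Plain sum of (Y_i - Y_hat_i): positive and negative errors cancel."""
    return sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))

def sse(xs, ys, b0, b1):
    """Sum of squared errors of the line Y_hat = b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
```

The flat line through the mean (b0 = 2.5, b1 = 0) and the line b0 = 1.3, b1 = 0.8 both have an error sum of 0, but their SSE values are 5.0 and 1.8 respectively, so least squares prefers the second line.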
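Coefficient Equations
• Prediction equation
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
• Sample slope
$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
• Sample Y-intercept
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$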
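The coefficient equations translate directly into code. A minimal sketch; the function name and the example data are hypothetical, and the formula requires at least two distinct x values (otherwise SS_xx is zero).

```python
def least_squares(xs, ys):
    """Return (b0_hat, b1_hat) from the sample slope and intercept formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx          # sample slope = SS_xy / SS_xx
    b0 = y_bar - b1 * x_bar     # sample Y-intercept
    return b0, b1
```

For points lying exactly on y = 2x + 1, such as (1, 3), (2, 5), (3, 7), it returns (1.0, 2.0). For the points (0, 1), (1, 3), (2, 2), (3, 4), it returns approximately (1.3, 0.8).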