Unit-2 (Notes AI)
We can also say that a problem-solving agent is a result-driven agent and always focuses on satisfying
the goals.
Steps of problem-solving in AI: Problems in AI are directly associated with human activities, so we need a finite number of well-defined steps to solve a problem and make that work easier.
Goal Formulation: This is the first and simplest step in problem-solving. It organizes the finite steps needed to formulate a target or goal that requires some action to achieve it. Today the formulation of the goal is carried out by AI agents.
Problem formulation: This is one of the core steps of problem-solving; it decides what actions should be taken to achieve the formulated goal. In AI this core part is handled by a software agent, which uses the following components to formulate the associated problem.
Initial State: The state from which the problem starts; it sets the AI agent on its way toward the specified goal.
Actions: This component describes all the possible actions available to the agent in a given state, starting from the initial state.
Transition model: This component describes the state that results from performing an action chosen in the previous stage, and forwards that resulting state to the next stage.
Goal test: This component determines whether the state produced by the transition model satisfies the specified goal. When the goal is achieved, the search stops and moves on to determining the cost of achieving the goal.
Path cost: This component assigns a numeric cost to reaching the goal. It can account for all hardware, software, and human effort involved.
• Search Strategy
A search strategy specifies which paths are selected from the frontier. Different strategies are obtained by modifying how the selection of paths from the frontier is implemented; the sketch after the following list illustrates this.
1 Depth-First Search
2 Breadth-First Search
3 Lowest-Cost-First Search
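Each of these strategies corresponds to a different frontier discipline: a stack for depth-first, a FIFO queue for breadth-first, and a priority queue ordered by path cost for lowest-cost-first. The sketch below illustrates this under that assumption, reusing the illustrative RouteProblem interface from the earlier sketch.

import heapq
from collections import deque

def generic_search(problem, strategy="bfs"):
    """Expand paths taken from a frontier; the frontier's discipline selects the strategy."""
    start = (0, [problem.initial])                 # (path cost, path)
    if strategy == "dfs":
        frontier = [start]                         # stack: pop the most recently added path
        pop, push = frontier.pop, frontier.append
    elif strategy == "bfs":
        frontier = deque([start])                  # FIFO queue: pop the oldest path
        pop, push = frontier.popleft, frontier.append
    else:                                          # "lcfs": priority queue ordered by path cost
        frontier = [start]
        pop = lambda: heapq.heappop(frontier)
        push = lambda item: heapq.heappush(frontier, item)
    while frontier:
        cost, path = pop()
        state = path[-1]
        if problem.goal_test(state):
            return path, cost
        for action in problem.actions(state):
            nxt = problem.result(state, action)
            push((cost + problem.step_cost(state, action), path + [nxt]))
    return None

print(generic_search(problem, "lcfs"))             # returns a cheapest path of cost 6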
The informed search algorithm is more useful for large search spaces. Informed search uses the idea of a heuristic, so it is also called heuristic search.
Here h(n) is the heuristic (estimated) cost and h*(n) is the actual optimal cost to reach the goal. For the heuristic to be admissible, the heuristic cost should be less than or equal to the actual cost: h(n) <= h*(n).
Pure Heuristic Search:
Pure heuristic search is the simplest form of heuristic search algorithm. It expands nodes based on their heuristic value h(n). It maintains two lists, the OPEN list and the CLOSED list. In the CLOSED list it places those nodes which have already been expanded, and in the OPEN list it places nodes which have not yet been expanded.
On each iteration, the node n with the lowest heuristic value is expanded, all its successors are generated, and n is placed in the CLOSED list. The algorithm continues until a goal state is found.
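A minimal sketch of this OPEN/CLOSED bookkeeping might look as follows; the graph and heuristic-table formats are assumptions made for illustration, not something given in the notes.

import heapq

def pure_heuristic_search(start, goal, graph, h):
    """Always expand the OPEN node with the smallest heuristic value h(n)."""
    open_list = [(h[start], start, [start])]        # (h value, node, path so far)
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)                            # node has now been expanded
        for succ in graph.get(node, []):
            if succ not in closed:
                heapq.heappush(open_list, (h[succ], succ, path + [succ]))
    return None

graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
h = {"S": 5, "A": 3, "B": 4, "G": 0}
print(pure_heuristic_search("S", "G", graph, h))    # ['S', 'A', 'G']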
In informed search we will discuss two main algorithms, which are given below:
1. Best First Search Algorithm (Greedy Search)
2. A* Search Algorithm
Greedy best-first search expands the node that appears closest to the goal, evaluating nodes using just the heuristic function: f(n) = h(n).
Step 3: Remove the node n from the OPEN list which has the lowest value of h(n), and place it in the CLOSED list.
Step 5: Check each successor of node n and find whether any of them is a goal node. If any successor node is a goal node, then return success and terminate the search; otherwise proceed to Step 6.
Step 6: For each successor node, the algorithm checks the evaluation function f(n) and then checks whether the node is already in the OPEN or CLOSED list. If the node is in neither list, add it to the OPEN list.
Disadvantages:
It can behave as an unguided depth-first search in the worst case scenario.
Example:
Consider the search problem below; we will traverse it using greedy best-first search. At each iteration, each node is expanded using the evaluation function f(n) = h(n), which is given in the table below.
In this search example, we use two lists, the OPEN and CLOSED lists. Following are the iterations for traversing the example.
Expand the nodes of S and put them in the CLOSED list.
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m is the maximum depth of the search space and b is the branching factor.
Complete: Greedy best-first search is incomplete, even if the given state space is finite.
A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses the heuristic function h(n) and the cost to reach node n from the start state, g(n). It combines features of UCS and greedy best-first search, by which it solves the problem efficiently. The A* search algorithm finds the shortest path through the search space using the heuristic function. It expands fewer nodes of the search tree and provides an optimal result faster. The A* algorithm is similar to UCS except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as follows, and this sum is called the fitness number: f(n) = g(n) + h(n).
Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not; if the list is empty then return failure and stop.
Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function (g + h). If node n is the goal node then return success and stop, otherwise continue.
Step 4: Expand node n and generate all of its successors, and put n into the CLOSED list. For each successor n', check whether n' is already in the OPEN or CLOSED list; if not, then compute the evaluation function for n' and place it into the OPEN list.
Step 5: Otherwise, if node n' is already in OPEN or CLOSED, then it should be attached to the back pointer which reflects the lowest g(n') value.
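A compact sketch of these steps is shown below. The graph and heuristic values are reconstructed to be consistent with the f-values quoted in the worked example that follows; since the original figure is not reproduced here, treat them as assumptions.

import heapq

def a_star(start, goal, graph, h):
    """A* search: expand the OPEN node with the smallest f(n) = g(n) + h(n)."""
    open_list = [(h[start], 0, start, [start])]      # (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        for succ, cost in graph.get(node, {}).items():
            new_g = g + cost
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g                 # keep the back pointer with the lowest g
                heapq.heappush(open_list, (new_g + h[succ], new_g, succ, path + [succ]))
    return None

graph = {"S": {"A": 1, "G": 10}, "A": {"B": 2, "C": 1},
         "B": {"D": 5}, "C": {"D": 3, "G": 4}, "D": {"G": 2}, "G": {}}
h = {"S": 5, "A": 3, "B": 4, "C": 2, "D": 6, "G": 0}
print(a_star("S", "G", graph, h))                    # (['S', 'A', 'C', 'G'], 6)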
Advantages:
The A* search algorithm performs better than other search algorithms.
Disadvantages:
It does not always produce the shortest path, as it is mostly based on heuristics and approximation.
The main drawback of A* is memory requirement as it keeps all generated nodes in the memory, so it is
not practical for various large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of all states
is given in the below table so we will calculate the f(n) of each state using the formula f(n)= g(n) + h(n),
where g(n) is the cost to reach any node from start state.
Solution:
Initialization: {(S, 5)}
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}
Iteration 4 will give the final result: S--->A--->C--->G provides the optimal path with cost 6.
Points to remember:
A* algorithm returns the path which occurred first, and it does not search for all remaining paths.
Admissible: The first condition required for optimality is that h(n) should be an admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature.
Consistency: Second required condition is consistency for only A* graph-search.
If the heuristic function is admissible, then A* tree search will always find the least cost path.
Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function, and the number of nodes expanded is exponential in the depth of the solution d. So the time complexity is O(b^d), where b is the branching factor.
The algorithms so far search for a path that leads to a solution state required to reach the goal node. But beyond these “classical search algorithms,” we have some “local search algorithms” in which the path cost does not matter, and which focus only on the solution state needed to reach the goal node.
A local search algorithm completes its task by working with a single current node rather than multiple paths, generally moving only to the neighbors of that node.
Although local search algorithms are not systematic, still they have the following two advantages:
Local search algorithms use very little, often a constant, amount of memory as they operate only on a single path.
Most often, they find a reasonable solution in large or infinite state spaces where the classical or
systematic algorithms do not work.
Does the local search algorithm work for a pure optimized problem?
Yes, local search algorithms work for pure optimization problems. A pure optimization problem is one where every node can give a solution, but the target is to find the best state of all according to the objective function. Unfortunately, a pure optimization formulation may fail to find high-quality solutions for reaching the goal state from the current state.
Note: An objective function is a function whose value is either minimized or maximized in different
contexts of the optimization problems. In the case of search algorithms, an objective function can be the
path cost for reaching the goal node, etc.
The local search algorithm explores the above landscape by finding the following two points:
Global Minimum: If the elevation corresponds to cost, then the task is to find the lowest valley, which is known as the global minimum.
Global Maximum: If the elevation corresponds to an objective function, then the task is to find the highest peak, which is called the global maximum. It is the highest point in the landscape.
Hill-climbing Search
Simulated Annealing
Note: Local search algorithms are not burdened with remembering all the nodes in memory; they operate on a complete-state formulation.
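As an illustration of the single-current-node idea, here is a minimal hill-climbing sketch (the first algorithm named above); the objective function and neighbour generator are placeholder assumptions you would replace for a real problem.

import random

def hill_climbing(initial, neighbours, value):
    """Keep moving to the best neighbour until no neighbour improves the objective."""
    current = initial
    while True:
        best = max(neighbours(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current                           # a local (possibly global) maximum
        current = best

value = lambda x: -(x - 3) ** 2                      # toy objective, maximised at x = 3
neighbours = lambda x: [x - 1, x + 1]                # move one step left or right
print(hill_climbing(random.randint(-10, 10), neighbours, value))   # -> 3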
When observations are partial, it will usually be the case that several states could have produced any
given percept. For example, the percept [A,Dirty] is produced by state 3 as well as by state 1. Hence, given this as the initial percept, the initial belief state for the local-sensing vacuum world will be {1,3}.
The ACTIONS, STEP-COST, and GOAL-TEST are constructed from the underlying physical problem just as
for sensorless problems, but the transition model is a bit more complicated. We can think of transitions
from one belief state to the next for a particular action as occurring in three stages, as shown in Figure
4.15:
• The prediction stage is the same as for sensorless problems: given the action a in belief state b, the predicted belief state is b̂ = PREDICT(b, a).
• The observation prediction stage determines the set of percepts o that could be observed in the predicted belief state:
POSSIBLE-PERCEPTS(b̂) = {o : o = PERCEPT(s) and s ∈ b̂}.
• The update stage determines, for each possible percept, the belief state that would result from the percept. The new belief state bo is just the set of states in b̂ that could have produced the percept:
bo = UPDATE(b̂, o) = {s : o = PERCEPT(s) and s ∈ b̂}.
Notice that each updated belief state bo can be no larger than the predicted belief state b̂; observations can only help reduce uncertainty compared to the sensorless case. Moreover, for deterministic sensing, the belief states for the different possible percepts will be disjoint, forming a partition of the original predicted belief state.
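Because these three stages are just set operations over physical states, they can be sketched in a few lines; the results (transition) and percept functions below are assumed placeholders for a concrete problem such as the vacuum world.

def predict(belief, action, results):
    """Prediction stage: every state reachable by `action` from some state in the belief."""
    return {s2 for s in belief for s2 in results(s, action)}

def possible_percepts(belief, percept):
    """Observation-prediction stage: the percepts that could be observed in the belief."""
    return {percept(s) for s in belief}

def update(belief, o, percept):
    """Update stage: keep only the states that would actually produce percept `o`."""
    return {s for s in belief if percept(s) == o}

def belief_states_after(belief, action, results, percept):
    """All belief states reachable after `action`, one for each possible subsequent percept."""
    b_hat = predict(belief, action, results)
    return {o: update(b_hat, o, percept) for o in possible_percepts(b_hat, percept)}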
Figure 4.15 Two examples of transitions in local-sensing vacuum worlds. (a) In the deterministic world, Right is applied in the initial belief state, resulting in a new belief state with two possible physical states; for those states, the possible percepts are [B,Dirty] and [B,Clean], leading to two belief states, each of which is a singleton. (b) In the slippery world, Right is applied in the initial belief state, giving a new belief state with four physical states; for those states, the possible percepts are [A,Dirty], [B,Dirty], and [B,Clean], leading to three belief states as shown.
Putting these three stages together, we obtain the possible belief states resulting from a given action
and the subsequent possible percepts:
The first level of the AND–OR search tree for a problem in the local-sensing vacuum world; Suck is the first step of the solution.
Given such a formulation, the AND–OR search algorithm of Figure 4.11 can be applied directly to derive a solution. Figure 4.16 shows part of the search tree for the local-sensing vacuum world, assuming an initial percept [A,Dirty]. The solution is the conditional plan
[Suck, Right, if Bstate = {6} then Suck else [ ]].
Notice that, because we supplied a belief-state problem to the AND–OR search algorithm, it returned
a conditional plan that tests the belief state rather than the actual state. This is as it should be: in a
partially observable environment the agent won’t be able to execute a solution that requires testing
the actual state.
As in the case of standard search algorithms applied to sensorless problems, the AND–OR search
algorithm treats belief states as black boxes, just like any other states. One can improve on this by
checking for previously generated belief states that are subsets or supersets of the current state,
just as for sensorless problems. One can also derive incremental search algorithms, analogous to
those described for sensorless problems, that provide substantial speedups over the black-box
approach.
Two prediction–update cycles of belief-state maintenance in the kindergarten vacuum world with
local sensing.
Given an initial belief state b, an action a, and a percept o, the new belief state is:
b′ = UPDATE(PREDICT(b, a), o). (4.6)
Maintaining the belief state in this way is known as monitoring, filtering, or state estimation. Equation (4.6) is called a recursive state estimator because it computes the new belief state from the previous one rather than by examining the entire percept sequence. If the agent is
not to “fall behind,” the computation has to happen as fast as percepts are coming in. As the
environment becomes more complex, the exact update computation becomes infeasible and the
agent will have to compute an approximate belief state, perhaps focusing on the implications of the percept for the aspects of the environment that are of current interest. Most work on this problem has been done for stochastic, continuous-state environments with the tools of probability theory, as explained in Chapter 15. Here we will show an example in a discrete environment with deterministic sensors and nondeterministic actions.
The example concerns a robot with the task of localization: working out where it is, given a map of the world and a sequence of percepts and actions. Our robot is placed in the maze-like environment of Figure 4.18. The robot is equipped with four sonar sensors that tell whether there is an obstacle—the outer wall or a black square in the figure—in each of the four compass directions. We assume that the sensors give perfectly correct data and that the robot has a correct map of the environment. But unfortunately the robot's navigational system is broken, so when it executes a Move action, it moves randomly to one of the adjacent squares. The robot's task is to determine its current location.
Suppose the robot has just been switched on, so it does not know where it is. Thus its initial belief
state b consists of the set of all locations. Then the robot receives the percept NSW, meaning there are obstacles to the north, west, and south, and does an update using the equation bo = UPDATE(b, NSW), yielding the 4 locations shown in Figure 4.18(a). You can inspect the maze to see that those are the only four locations that yield the percept NSW.
Figure 4.18(b) shows the possible locations after a second observation E2 = NS. When sensors are noiseless and the transition model is accurate, there are no other possible locations for the robot consistent with this sequence of two observations.
Next the robot executes a Move action, but the result is nondeterministic. The new belief state, ba = PREDICT(bo, Move), contains all the locations that are one step away from the locations in bo. When the second percept, NS, arrives, the robot does UPDATE(ba, NS) and finds that the belief state has collapsed down to the single location shown in Figure 4.18(b). That's the only location that could be the result of
UPDATE(PREDICT(UPDATE(b, NSW), Move), NS).
With nondeterministic actions the PREDICT step grows the belief state, but the UPDATE step shrinks it back down, as long as the percepts provide some useful identifying information.
A constraint satisfaction problem (CSP) is a problem that requires its solution within some
limitations or conditions also known as constraints. It consists of the following:
A finite set of variables which stores the solution (V = {V1, V2, V3, ..., Vn})
A set of discrete values known as the domain from which the solution is picked (D = {D1, D2, D3, ..., Dn})
A finite set of constraints that restrict which combinations of values the variables may take (C = {C1, C2, C3, ..., Cn})
Please note that the elements in the domain can be both continuous and discrete, but in AI we generally deal only with discrete values.
Also, note that all these sets should be finite except for the domain set. Each variable in the variable
set can have different domains. For example, consider the Sudoku problem again. Suppose that a
row, column and block already have 3, 5 and 7 filled in. Then the domain for all the variables in that
row, column and block will be {1, 2, 4, 6, 8, 9}.
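These components can be written down directly. The sketch below encodes a tiny map-coloring CSP (one of the problem types listed next); the variable names, domains, and constraint format are illustrative assumptions.

variables = ["WA", "NT", "SA"]

domains = {
    "WA": {"red", "green", "blue"},
    "NT": {"red", "green", "blue"},
    "SA": {"red", "green", "blue"},
}

# Binary constraints as a predicate per pair of variables: adjacent regions must differ.
constraints = {
    ("WA", "NT"): lambda a, b: a != b,
    ("NT", "SA"): lambda a, b: a != b,
    ("WA", "SA"): lambda a, b: a != b,
}

def consistent(assignment):
    """True if every constraint whose variables are both assigned is satisfied."""
    return all(check(assignment[x], assignment[y])
               for (x, y), check in constraints.items()
               if x in assignment and y in assignment)

print(consistent({"WA": "red", "NT": "green"}))   # True
print(consistent({"WA": "red", "NT": "red"}))     # False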
The following problems are some of the popular problems that can be solved using CSP:
1. Sudoku (filling a 9x9 grid so that every row, column and 3x3 block contains each digit exactly once)
2. n-Queen (In an n-queen problem, n queens should be placed in an nXn matrix such that no queen shares the same row, column or diagonal.)
3. Map Coloring (coloring different regions of a map, ensuring no adjacent regions have the same color)
Step 3: Create a constraint set with variables and domains (if possible) after considering the
constraints.
• Constraint propagation
To perform constraint propagation a system needs to implement variables ranging over finite domains.
Constraints expressing a relation among variables are implemented by propagators: software
abstractions which by execution perform constraint propagation. Finally, a propagation engine
coordinates the execution of propagators in order to deliver constraint propagation for a collection of
constraints.
A number of inference techniques use the constraints to infer which variable/value pairs are consistent and which are not. These include node, arc, path, and k-consistency.
constraint propagation: Using the constraints to reduce the number of legal values for a variable,
which in turn can reduce the legal values for another variable, and so on.
local consistency: If we treat each variable as a node in a graph and each binary constraint as an arc,
then the process of enforcing local consistency in each part of the graph causes inconsistent values to be
eliminated throughout the graph.
Node consistency
A single variable (a node in the CSP network) is node-consistent if all the values in the variable’s domain
satisfy the variable’s unary constraint.
Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s binary constraints.
Xi is arc-consistent with respect to another variable Xj if for every value in the current domain Di there is
some value in the domain Dj that satisfies the binary constraint on the arc (Xi, Xj).
Arc consistency tightens down the domains (unary constraint) using the arcs (binary constraints).
AC-3 algorithm:
AC-3 maintains a queue of arcs which initially contains all the arcs in the CSP.
AC-3 then pops off an arbitrary arc (Xi, Xj) from the queue and makes Xi arc-consistent with respect to
Xj.
But if this revises Di, then add to the queue all arcs (Xk, Xi) where Xk is a neighbor of Xi.
If Di is revised down to nothing, then the whole CSP has no consistent solution, return failure;
Otherwise, keep checking, trying to remove values from the domains of variables until no more arcs are
in the queue.
The result is an arc-consistent CSP that has the same solutions as the original one but has smaller domains.
Assume a CSP with n variables, each with domain size at most d, and with c binary constraints (arcs). Checking consistency of an arc can be done in O(d²) time, so the total worst-case time is O(cd³).
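A compact AC-3 sketch over the illustrative (variables, domains, constraints) format used earlier might look as follows; the neighbour computation and constraint lookup reflect that assumed format.

from collections import deque

def revise(domains, constraints, xi, xj):
    """Remove values from Di that have no supporting value in Dj."""
    check = constraints.get((xi, xj)) or (lambda a, b: constraints[(xj, xi)](b, a))
    revised = False
    for a in set(domains[xi]):
        if not any(check(a, b) for b in domains[xj]):
            domains[xi].discard(a)
            revised = True
    return revised

def ac3(variables, domains, constraints):
    """Make every arc consistent; return False if some domain is revised down to nothing."""
    neighbours = {v: set() for v in variables}
    for (x, y) in constraints:
        neighbours[x].add(y)
        neighbours[y].add(x)
    queue = deque((xi, xj) for xi in variables for xj in neighbours[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraints, xi, xj):
            if not domains[xi]:
                return False                       # no consistent solution exists
            for xk in neighbours[xi] - {xj}:
                queue.append((xk, xi))             # re-check arcs into the revised variable
    return True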
Path consistency
Path consistency: A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for
every assignment {Xi = a, Xj = b} consistent with the constraint on {Xi, Xj}, there is an assignment to Xm
that satisfies the constraints on {Xi, Xm} and {Xm, Xj}.
Path consistency tightens the binary constraints by using implicit constraints that are inferred by looking
at triples of variables.
K-consistency
K-consistency: A CSP is k-consistent if, for any set of k-1 variables and for any consistent assignment to
those variables, a consistent value can always be assigned to any kth variable.
A CSP is strongly k-consistent if it is k-consistent and is also (k - 1)-consistent, (k – 2)-consistent, … all the
way down to 1-consistent.
If we take a CSP with n nodes and make it strongly n-consistent, we are guaranteed to find a solution in time O(n²d). But any algorithm for establishing n-consistency must take time exponential in n in the worst case, and it also requires space that is exponential in n.
Global constraints
A global constraint is one involving an arbitrary number of variables (but not necessarily all variables).
Global constraints can be handled by special-purpose algorithms that are more efficient than
general-purpose methods.
A simple consistency procedure for a higher-order constraint is sometimes more effective than applying arc consistency to an equivalent set of binary constraints.
e.g. Atmost(10, P1, P2, P3, P4): no more than 10 personnel are assigned in total.
If each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be satisfied.
We can enforce consistency by deleting the maximum value of any domain if it is not consistent with the
minimum values of the other domains.
e.g. If each variable in the example has the domain {2, 3, 4, 5, 6}, the values 5 and 6 can be deleted from
each domain.
e.g. Suppose there are two flights F1 and F2 in an airline-scheduling problem, for which the planes have capacities 165 and 385, respectively. The initial domains for the numbers of passengers on each flight are
D1 = [0, 165] and D2 = [0, 385].
Now suppose we have the additional constraint that the two flights together must carry 420 people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to
D1 = [35, 165] and D2 = [255, 385].
A CSP is bounds consistent if for every variable X, and for both the lower-bound and upper-bound values
of X, there exists some value of Y that satisfies the constraint between X and Y for every variable Y.
Sudoku
A Sudoku puzzle can be considered a CSP with 81 variables, one for each square. We use the variable
names A1 through A9 for the top row (left to right), down to I1 through I9 for the bottom row. The
empty squares have the domain {1, 2, 3, 4, 5, 6, 7, 8, 9} and the pre-filled squares have a domain
consisting of a single value.
There are 27 different Alldiff constraints: one for each row, column, and box of 9 squares:
Commutativity: CSPs are all commutative. A problem is commutative if the order of application of any
given set of actions has no effect on the outcome.
Backtracking search: A depth-first search that chooses values for one variable at a time and backtracks
when a variable has no legal values left to assign.
The backtracking algorithm repeatedly chooses an unassigned variable and then tries all values in the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then BACKTRACK returns failure, causing the previous call to try another value.
There is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state, action function,
transition model, or goal test.
BACKTRACKING-SEARCH keeps only a single representation of a state and alters that representation rather than creating a new one.
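A minimal backtracking-search sketch over the same illustrative CSP format is shown below; it uses the naive "first unassigned variable" ordering rather than the heuristics discussed next, and it relies on the consistent helper from the earlier CSP sketch.

def backtracking_search(variables, domains, assignment=None):
    """Depth-first search that assigns one variable at a time and backtracks on failure."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment                          # every variable has a legal value
    var = next(v for v in variables if v not in assignment)   # naive variable selection
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):                 # `consistent` from the CSP sketch above
            result = backtracking_search(variables, domains, assignment)
            if result is not None:
                return result
        del assignment[var]                        # undo the assignment and try the next value
    return None                                    # no legal value left: backtrack

print(backtracking_search(variables, domains))     # e.g. a complete coloring of WA, NT, SA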
1) functions SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES: which variable should be assigned next, and in what order should its values be tried?
2) function INFERENCE: what inferences should be performed at each step in the search?
3) When the search arrives at an assignment that violates a constraint, can the search avoid repeating this failure?
SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: The idea of choosing the variable with the fewest “legal” values. Also known as the “most constrained variable” or “fail-first” heuristic, it picks a variable that is most likely to cause a failure soon, thereby pruning the search tree. If some variable X has no legal values left, the MRV heuristic will select X and failure will be detected immediately—avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for SA, so it makes
sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor on future choices by
selecting the variable that is involved in the largest number of constraints on other unassigned variables.
[useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has degree 0.
ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solutions to a problem (not just the first one), then the ordering does not matter.
Least-constraining-value heuristic: prefers the value that rules out the fewest choices for the neighboring variables in the constraint graph. (It tries to leave the maximum flexibility for subsequent variable assignments.)
e.g. Suppose we have generated the partial assignment WA=red and NT=green and that our next choice is for Q. Blue would be a bad choice for Q because it eliminates the last legal value left for Q's neighbor, SA; the heuristic therefore prefers red to blue.
The minimum-remaining-values and degree heuristic are domain-independent methods for deciding
which variable to choose next in a backtracking search. The least-constraining-value heuristic helps in
deciding which value to try first for a given variable.
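Both ordering decisions can be expressed over the same illustrative CSP format; the scoring below is one plausible sketch, not the only way to implement the heuristics.

def conflicts(assignment, constraints):
    """The constraints violated by a (partial) assignment."""
    return [(x, y) for (x, y), check in constraints.items()
            if x in assignment and y in assignment and not check(assignment[x], assignment[y])]

def legal_values(var, domains, constraints, assignment):
    """Values of `var` that do not clash with any already-assigned variable."""
    return [v for v in domains[var] if not conflicts(dict(assignment, **{var: v}), constraints)]

def mrv_variable(variables, domains, constraints, assignment):
    """Minimum-remaining-values: pick the unassigned variable with the fewest legal values."""
    unassigned = [v for v in variables if v not in assignment]
    return min(unassigned, key=lambda v: len(legal_values(v, domains, constraints, assignment)))

def order_values_lcv(var, domains, constraints, assignment):
    """Least-constraining-value: try values that rule out the fewest options for the neighbours."""
    neighbours = ({y for (x, y) in constraints if x == var} |
                  {x for (x, y) in constraints if y == var})
    def ruled_out(value):
        trial = dict(assignment, **{var: value})
        return sum(len(domains[n]) - len(legal_values(n, domains, constraints, trial))
                   for n in neighbours if n not in assignment)
    return sorted(domains[var], key=ruled_out)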
INFERENCE
forward checking: [One of the simplest forms of inference.] Whenever a variable X is assigned, the
forward-checking process establishes arc consistency for it: for each unassigned variable Y that is
connected to X by a constraint, delete from Y’s domain any value that is inconsistent with the value
chosen for X.
There is no reason to do forward checking if we have already done arc consistency as a preprocessing
step.
Advantage: For many problems the search will be more effective if we combine the MRV heuristic with
forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent, but doesn’t look
ahead and make all the other variables arc-consistent.
MAC (Maintaining Arc Consistency) algorithm: [More powerful than forward checking, detect this
inconsistency.] After a variable Xi is assigned a value, the INFERENCE procedure calls AC-3, but instead of
a queue of all arcs in the CSP, we start with only the arcs (Xj, Xi) for all Xj that are unassigned variables
that are neighbors of Xi. From there, AC-3 does constraint propagation in the usual way, and if any
variable has its domain reduced to the empty set, the call to AC-3 fails and we know to backtrack
immediately.
3. Intelligent backtracking
chronological backtracking: The BACKTRACKING-SEARCH in Fig 6.5. When a branch of the search fails, back up to the preceding variable and try a different value for it. (The most recent decision point is revisited.)
e.g. Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue, T=red}.
When we try the next variable SA, we see every value violates a constraint.
Intelligent backtracking: Backtrack to a variable that was responsible for making one of the possible
values of the next variable (e.g. SA) impossible.
Conflict set for a variable: A set of assignments that are in conflict with some value for that variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
backjumping method: Backtracks to the most recent assignment in the conflict set.
(e.g. backjumping would jump over T and try a new value for V.)
Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s domain, add X=x to Y’s
conflict set;
If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added to the
conflict set of X.
In fact, every branch pruned by backjumping is also pruned by forward checking. Hence simple
backjumping is redundant in a forward-checking search or in a search that uses stronger consistency
checking (such as MAC).
Conflict-directed backjumping:
e.g. Consider the partial assignment which is proved to be inconsistent: {WA=red, NSW=red}.
We try T=red next and then assign NT, Q, V, SA; no assignment can work for these last 4 variables.
Eventually we run out of values to try at NT, but simple backjumping cannot work because NT doesn't have a complete conflict set of preceding variables that caused it to fail.
The set {WA, NSW} is, in a deeper sense, the conflict set for NT: it causes NT, together with any subsequent variables, to have no consistent solution. So the algorithm should backtrack to NSW and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-directed backjumping.
How to Compute:
When a variable's domain becomes empty, a "terminal" failure occurs; that variable has a standard conflict set.
Let Xj be the current variable and let conf(Xj) be its conflict set. If every possible value for Xj fails, backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi) ∪ conf(Xj) − {Xi}.
The conflict set for a variable means that there is no solution from that variable onward, given the preceding assignment to the conflict set.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
After backjumping from a contradiction, how to avoid running into the same problem again:
Constraint learning: The idea of finding a minimum set of variables from the conflict set that
causes the problem. This set of variables, along with their corresponding values, is called a no-good.
We then record the no-good, either by adding a new constraint to the CSP or by keeping a separate
cache of no-goods.
Backtracking occurs when no legal assignment can be found for a variable. Conflict-directed
backjumping backtracks directly to the source of the problem.
• Game Playing
Game Playing is an important domain of artificial intelligence. Games don’t require much
knowledge; the only knowledge we need to provide is the rules, legal moves and the conditions of
winning or losing the game.
Both players try to win the game. So, both of them try to make the best move possible at each turn.
Searching techniques like BFS (Breadth-First Search) are not accurate for this because the branching factor is very high, so searching will take a lot of time. So, we need other search procedures that improve on this.
The most common search technique in game playing is the Minimax search procedure. It is a depth-first, depth-limited search procedure. It is used for games like chess and tic-tac-toe.
MOVEGEN : It generates all the possible moves that can be generated from the current
position.
STATICEVALUATION: It returns a value depending upon the goodness of a position from the viewpoint of the two players.
This algorithm is for a two-player game, so we call the first player PLAYER1 and the second player PLAYER2. The value of each node is backed up from its children. For PLAYER1 the backed-up value is the maximum value of its children, and for PLAYER2 the backed-up value is the minimum value of its children. It provides the most promising move to PLAYER1, assuming that PLAYER2 has made the best move. It is a recursive algorithm, as the same procedure occurs at each level.
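A minimal sketch of this backed-up-value computation is shown below; MOVEGEN and STATICEVALUATION are passed in as placeholder functions, since the notes do not define them concretely.

def minimax(position, depth, maximizing, movegen, static_evaluation):
    """Back up values: MAX levels take the maximum of their children, MIN levels the minimum."""
    moves = movegen(position)
    if depth == 0 or not moves:
        return static_evaluation(position)         # leaf value from STATICEVALUATION
    values = [minimax(m, depth - 1, not maximizing, movegen, static_evaluation) for m in moves]
    return max(values) if maximizing else min(values)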
Figure 1: Before backing-up of values
We assume that PLAYER1 will start the game. Four levels are generated. The values of nodes H, I, J, K, L, M, N, O are provided by the STATICEVALUATION function. Level 3 is a maximizing level, so all nodes of level 3 will take the maximum values of their children. Level 2 is a minimizing level, so all its nodes will take the minimum values of their children. This process continues. The value of A is 23. That means A should choose the move to C to win.
Games are usually intriguing because they are difficult to solve. Chess, for example, has an average branching factor of around 35, and games frequently stretch to 50 moves per player, therefore the search tree has roughly 35^100 or 10^154 nodes (despite the search graph having “only” about 10^40 unique nodes). As a result, games, like the real world, necessitate the ability to make some sort of decision even when calculating the best option is impossible.
1. S0: The initial state of the game, which describes how it is set up at the start.
2. Player(s): Defines which player has the move in state s.
3. Actions(s): Returns the set of legal moves in state s.
4. Result(s, a): The transition model, which defines the state that results from taking move a in state s.
5. Terminal-Test (s): A terminal test that returns true if the game is over and false otherwise. Terminal states are those in which the game has come to a conclusion.
6. Utility (s, p): A utility function (also known as a payoff function or objective function)
determines the final numeric value for a game that concludes in the terminal state s for
player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2.
Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of
possible outcomes. A zero-sum game is defined (confusingly) as one in which the total
reward to all players is the same for each game instance. Chess is a zero-sum game because
each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a preferable name, but zero-sum is the usual term and makes sense if each participant is charged an entry fee of 1/2.
The game tree for the game is defined by the beginning state, ACTIONS function, and RESULT
function—a tree in which the nodes are game states and the edges represent movements. The
figure below depicts a portion of the tic-tac-toe game tree (noughts and crosses). MAX may make
nine different maneuvers from his starting position. The game alternates between MAX placing an X and MIN placing an O until we reach leaf nodes corresponding to terminal states, such as one
player having three in a row or all of the squares being filled. The utility value of the terminal state
from the perspective of MAX is shown by the number on each leaf node; high values are thought to
be beneficial for MAX and bad for MIN
The game tree for tic-tac-toe is relatively short, with just 9! = 362,880 terminal nodes. However, because there are over 10^40 nodes in chess, the game tree is better viewed as a theoretical construct that cannot be realized in the actual world. But, no matter how big the game tree is, MAX's
goal is to find a solid move. A tree that is superimposed on the whole game tree and examines
enough nodes to allow a player to identify what move to make is referred to as a search tree.
A sequence of actions leading to a goal state—a terminal state that is a win—would be the best
solution in a typical search problem. MIN has something to say about it in an adversarial search.
MAX must therefore devise a contingent strategy that specifies M A X’s initial state move, then
MAX’s movements in the states resulting from every conceivable MIN response, then MAX’s moves
in the states resulting from every possible MIN reaction to those moves, and so on. This is quite
similar to the AND-OR search method, with MAX acting as OR and MIN acting as AND. When playing
an infallible opponent, an optimal strategy produces results that are at least as good as any other strategy. We'll start by demonstrating how to find the best plan.
We’ll move to the trivial game in the figure below since even a simple game like tic-tac-toe is too
complex for us to draw the full game tree on one page. MAX’s root node moves are designated by
the letters a1, a2, and a3. MIN’s probable answers to a1 are b1, b2, b3, and so on. This game is over
after MAX and MIN each make one move. (In game terms, this tree is one move deep, consisting of two half-moves, each of which is referred to as a ply.) The terminal states in this game have utility
values ranging from 2 to 14.
The optimal strategy can be determined from the minimax value of each node, which we write as MINIMAX(n), given a game tree. The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. The minimax value of a terminal state is obviously its utility. Furthermore, given the option, MAX prefers to move to a state of maximum value, whereas MIN prefers to move to a state of minimum value. So here's what we've got:
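The definition being referred to is the standard minimax recurrence, written here in the usual textbook notation:

MINIMAX(s) =
    UTILITY(s)                                          if TERMINAL-TEST(s) is true
    max over a in Actions(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
    min over a in Actions(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN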
Let’s use these definitions to analyze the game tree shown in the figure above. The game’s UTILITY
function provides utility values to the terminal nodes on the bottom level. Because the first MIN
node, B, has three successor states with values of 3, 12, and 8, its minimax value is 3. The other two MIN nodes likewise have minimax value 2. The root node is a MAX node; its successors have minimax values of 3, 2, and 2, so its own minimax value is 3. We can also identify the minimax decision at the root: action a1 is the best option for MAX since it leads to the state with the highest minimax value.
This concept of optimal MAX play requires that MIN plays optimally as well—it maximizes MAX’s
worst-case outcome. What happens if MIN isn’t performing at its best? Then it’s a simple matter of
demonstrating that MAX can perform even better. Other strategies may outperform the minimax method against suboptimal opponents, but those strategies will necessarily do worse against optimal opponents.
• Alpha-Beta Pruning
Alpha beta pruning is an optimisation technique for the minimax algorithm. Through the course of
this blog, we will discuss what alpha beta pruning means, we will discuss minimax algorithm, rules
to find good ordering, and more.
Beta: At any point along the Minimizer's path, beta is the best option or the lowest value we've discovered. The initial value of beta is +∞.
The alpha and beta values of each node must be kept track of. Alpha can only be updated when it's MAX's turn, and beta can only be updated when it's MIN's turn.
MAX will update only alpha values and the MIN player will update only beta values.
While backtracking up the tree, the node values (not the alpha and beta values) are passed to the upper nodes.
Minimax algorithm
Minimax is a classic depth-first search technique for a sequential two-player game. The two players
are called MAX and MIN. The minimax algorithm is designed for finding the optimal move for MAX,
the player at the root node. The search tree is created by recursively expanding all nodes from the
root in a depth-first manner until either the end of the game or the maximum search depth is
reached. Let us explore this algorithm in detail.
As already mentioned, there are two players in the game, viz- Max and Min. Max plays the first step.
Max’s task is to maximise its reward while Min’s task is to minimise Max’s reward, increasing its
own reward at the same time. Let’s say Max can take actions a, b, or c. Which one of them will give
Max the best reward when the game ends? To answer this question, we need to explore the game
tree to a sufficient depth and assume that Min plays optimally to minimise the reward of Max.
Here is an example. Four coins are in a row and each player can pick up one coin or two coins on
his/her turn. The player who picks up the last coin wins. Assuming that Max plays first, what move
should Max make to win?
If Max picks two coins, then only two coins remain and Min can pick two coins and win. Thus
picking up 1 coin shall maximise Max’s reward.
As you might have noticed, the nodes of the tree in the figure below have some values inscribed on them; these are called minimax values. The minimax value of a node is the utility of the node if it is a terminal node.
If the node is a non-terminal Max node, the minimax value of the node is the maximum of the
minimax values of all of the node’s successors. On the other hand, if the node is a non-terminal Min
node, the minimax value of the node is the minimum of the minimax values of all of the node’s
successors.
Now we will discuss the idea behind alpha-beta pruning. If we apply alpha-beta pruning to the standard minimax algorithm, it gives the same decision as the standard algorithm, but it prunes or cuts down the nodes in the tree that do not affect the final decision made by the algorithm. This helps to avoid the complexity of exploring complex trees in full.
Now let us discuss the intuition behind this technique. Let us try to find minimax decision in the
below tree :
In this case,
Minimax Decision = MAX {MIN {3, 5, 10}, MIN {2, a, b}, MIN {2, 7, 3}}
= MAX {3, c, 2} = 3
Looking at the result above, you may wonder how we can find the maximum when a value is missing. Here is the explanation:
In the second node we choose the minimum value c, which is less than or equal to 2, i.e. c <= 2. Since c <= 2 (and hence c <= 3), when we choose the max of 3, c, and 2 the maximum value will be 3.
We have reached a decision without looking at those nodes. And this is where alpha-beta pruning
comes into the play.
Alpha: Alpha is the best choice or the highest value that we have found at any instance along the
path of Maximizer. The initial value for alpha is – ∞.
Beta: Beta is the best choice or the lowest value that we have found at any instance along the path of the Minimizer. The initial value of beta is +∞.
Each node has to keep track of its alpha and beta values. Alpha can be updated only when it’s MAX’s
turn and, similarly, beta can be updated only when it’s MIN’s chance.
MAX will update only alpha values and MIN player will update only beta values.
While backtracking up the tree, the node values (not the values of alpha and beta) are passed to the upper nodes.
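A compact alpha-beta sketch following these rules is shown below; the game-tree interface (the children and evaluate functions) is an assumed placeholder for illustration.

import math

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning: stop exploring a node's children once alpha >= beta."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)              # MAX updates only alpha
            if alpha >= beta:
                break                              # prune the remaining children
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)                    # MIN updates only beta
        if alpha >= beta:
            break                                  # prune the remaining children
    return value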
3. Now the next move will be at node B and it is MIN's turn. So, at node B, the value of beta will be min(3, +∞) = 3. So, at node B, alpha = −∞ and beta = 3.
In the next step, the algorithm traverses the next successor of node B, which is node E, and the values of α = −∞ and β = 3 will also be passed.
4. Now it is MAX's turn. So, at node E we will look for MAX. The current value of alpha at E is −∞ and it will be compared with 5, so MAX(−∞, 5) = 5. So, at node E, alpha = 5 and beta = 3. Now we can see that alpha is greater than beta, which satisfies the pruning condition, so we can prune the right successor of node E; it will not be traversed, and the value at node E will be 5.
6. In the next step the algorithm again comes to node A from node B. At node A alpha will be
changed to maximum value as MAX (- ∞, 3). So now the value of alpha and beta at node A will be (3,
+ ∞) respectively and will be transferred to node C. These same values will be transferred to node
F.
7. At node F the value of alpha will be compared to the left branch which is 0. So, MAX (0, 3) will be
3 and then compared with the right child which is 1, and MAX (3,1) = 3 still α remains 3, but the
node value of F will become 1.
8. Now node F will return the node value 1 to C, and it will be compared to the beta value at C. Now it is MIN's turn, so MIN(+∞, 1) = 1. Now at node C, α = 3 and β = 1, and alpha is greater than beta, which again satisfies the pruning condition. So the next successor of node C, i.e. G, will be pruned and the algorithm will not compute the entire subtree G.
Now, C will return the node value to A and the best value of A will be MAX (1, 3) will be 3.
The tree represented above is the final tree, showing which nodes were computed and which nodes were not computed (pruned). So, for this example, the optimal value for the maximizer is 3.
Worst Ordering: In some cases of alpha-beta pruning, none of the nodes are pruned by the algorithm, and it works like the standard minimax algorithm. This consumes a lot of time because of the alpha and beta bookkeeping and gives no effective benefit. This is called worst ordering in pruning. In this case, the best move occurs on the right side of the tree.
Ideal Ordering: In other cases of alpha-beta pruning, a lot of the nodes are pruned by the algorithm. This is called ideal ordering in pruning. In this case, the best move occurs on the left side of the tree. Since we apply DFS, it searches the left of the tree first and can go twice as deep as the minimax algorithm in the same amount of time.
The nodes should be ordered in such a way that the best nodes are computed first.
• Stochastic Games
Many unforeseeable external occurrences can place us in unforeseen circumstances in real life.
Many games, such as dice tossing, have a random element to reflect this unpredictability. These are
known as stochastic games. Backgammon is a classic game that mixes skill and luck. The legal
moves are determined by rolling dice at the start of each player's turn. White, for example, has rolled a 6–5 and has four alternative moves in the backgammon scenario shown in the figure below.
This is a standard backgammon position. The object of the game is to get all of one’s pieces off the
board as quickly as possible. White moves in a clockwise direction toward 25, while Black moves in
a counterclockwise direction toward 0. Unless there are many opponent pieces, a piece can advance
to any position; if there is only one opponent, it is caught and must start over. White has rolled a
6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–24), (5–10,10–16), and
(5–11,11–16), where the notation (5–11,11–16) denotes moving one piece from position 5 to 11
and then another from 11 to 16.
As a result, we can generalize the deterministic minimax value to an expected-minimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) all function as before. For chance nodes we compute the expected value, which is the sum of the value over all outcomes, weighted by the probability of each chance action.
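The expected-minimax value referred to here is usually written as the following recurrence; the chance-node case is the one that is new compared with ordinary minimax:

EXPECTIMINIMAX(s) =
    UTILITY(s)                                          if TERMINAL-TEST(s) is true
    max over a of EXPECTIMINIMAX(RESULT(s, a))          if PLAYER(s) = MAX
    min over a of EXPECTIMINIMAX(RESULT(s, a))          if PLAYER(s) = MIN
    sum over r of P(r) * EXPECTIMINIMAX(RESULT(s, r))   if PLAYER(s) = CHANCE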
where r is a possible dice roll (or other random events) and RESULT(s,r) denotes the same state as
s, but with the addition that the dice roll’s result is r.