
SRI RAJA RAAJAN COLLEGE OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

II-YEAR-AI&DS -2021 R
SUB CODE: AL3391
SUB NAME: Artificial Intelligence

Unit III

Game Playing and CSP:

Game theory – optimal decisions in games (A/M 23) – alpha-beta search – Monte Carlo tree search (A/M 22) – stochastic games – partially observable games (N/D 22). Constraint satisfaction problems (A/M 23) – constraint propagation – backtracking search for CSP (N/D 22) – local search for CSP (A/M 22) – structure of CSP.

PREPARED BY VERIFIED BY HOD APPROVED BY


(DEAN)

PART A

Two marks
1. How can minimax also be extended for games of chance?

In a game of chance we can add an extra level of chance nodes in the game tree. These nodes have successors which are the outcomes of the random element. The minimax algorithm uses the probability P(d_i) attached to each chance-node outcome d_i when computing backed-up values. The successor function S(N, d_i) gives the moves from position N for outcome d_i.

2. What are the 2 parts of a landscape?

i. Location, defined by the state.
ii. Elevation, defined by the value of the heuristic cost function or objective function.

3. Define global minimum.

If elevation corresponds to cost, then the aim is to find the lowest valley, which is called the global minimum.

4. Define global maximum.

If elevation corresponds to an objective function, then the aim is to find the highest peak, which is called the global maximum.

5. Define Hill Climbing search.

It is a loop that continually moves in the direction of increasing value (i.e., uphill) and terminates when it reaches a "peak" where no neighbor has a higher value.

6.List some drawbacks of hill climbing process.

Local maxima: A local maximum, as opposed to a global maximum, is a peak that is lower than the highest peak in the state space. Once a local maximum is reached, the algorithm will halt even though the solution may be far from satisfactory.

7. Define plateau.

A plateau is an area of the state space where the evaluation function is essentially flat. The search will conduct a random walk.

8.What are the variants of hill climbing?

i. Stochastic hill climbing
ii. First-choice hill climbing
iii. Simulated annealing search
iv. Local beam search
v. Stochastic beam search

9.Define annealing.

Annealing is the process used to harden metals (or) glass by heating them to
a high temperature and then gradually cooling them, thus allowing the
material to coalesce into a low energy crystalline state.

10.Define simulated annealing.

This algorithm, instead of picking the best move, picks a random move. If the move improves the situation, it is always accepted; otherwise, it is accepted with some probability less than 1, which decreases with the badness of the move and with the temperature.

11.What is the advantage of memory bounded search techniques?


We can reduce the space requirements of A* with memory-bounded algorithms such as IDA* and SMA*.

12. What is a Genetic Algorithm?

A genetic algorithm is a variant of stochastic beam search in which successor states are generated by combining two parent states, rather than by modifying a single state.

13.Define Online Search agent.

The agent operates by interleaving computation and action, i.e., first it takes an action, and then it observes the environment and computes the next action.

14. What are the things that an agent knows in online search problems?
a. ACTIONS(s) b. Step-cost function c(s, a, s′) c. GOAL-TEST(s)

15. Define CSP.

A Constraint Satisfaction Problem (CSP) is defined by a set of variables X1, X2, …, Xn and a set of constraints C1, C2, …, Cm.

16. Define Successor function.

A value can be assigned to any unassigned variable, provided that it does not conflict with previously assigned variables.

17. What are the types of constraints?

There are 5 types:
a. Unary constraints relate one variable.
b. Binary constraints relate two variables.
c. Higher-order constraints relate more than two variables.
d. Absolute constraints.
e. Preference constraints.

18. Define constraint propagation.

It is the general term for propagating (i.e., spreading) the implications of a constraint on one variable onto other variables.

19. Define cycle cutset.

Choosing a subset S of the CSP's variables such that the constraint graph becomes a tree after removal of S; such an S is called a cycle cutset.

20. Define tree decomposition.

The constraint graph is divided into a set of connected subproblems. Each subproblem is solved independently, and the resulting solutions are then combined. This process is called tree decomposition.

21. Define alpha-beta pruning.

Alpha-beta pruning eliminates branches that cannot possibly influence the final decision.

ARTIFICIAL INTELLIGENCE
UNIT III

3.1.Game Theory:

 Game theory is basically a branch of mathematics that is used to model typical strategic interactions between different players (agents), all of whom are equally rational, in a context with predefined rules (of playing or maneuvering) and outcomes.

 Every player or agent is a rational entity who is selfish and tries to maximize the reward to be obtained using a particular strategy.

 All the players abide by certain rules in order to receive a predefined payoff: a reward after a certain outcome. Hence, a GAME can be defined as a set of players, actions, strategies, and a final payoff for which all the players are competing. Game theory has now become a describing factor for both machine learning algorithms and many daily life situations.

Consider the SVM (Support Vector Machine) for instance. According to game theory, the SVM is a game between 2 players where one player challenges the other to find the best hyperplane after providing the most difficult points for classification. The final payoff of this game is a solution that will be a trade-off between the strategic abilities of both players competing.

Nash equilibrium:
Nash equilibrium can be considered the essence of Game Theory. It is basically a state, a point
of equilibrium of collaboration of multiple players in a game. Nash Equilibrium guarantees
maximum profit to each player.
Let us try to understand this with the help of Generative Adversarial Networks (GANs).
What is GAN?

 It is a combination of two neural networks: the Discriminator and the Generator.


 The Generator Neural Network is fed input images which it analyzes and then produces
new sample images, which are made to represent the actual input images as close as
possible.
 Once the images have been produced, they are sent to the Discriminator Neural Network.
This neural network judges the images sent to it and classifies them as generated images
and actual input images.
 If the image is classified as the original image, the DNN changes its parameters of judging.
If the image is classified as a generated image, the image is rejected and returned to
the GNN. The GNN then alters its parameters in order to improve the quality of the image
produced.
This is a competitive process which goes on until neither neural network needs to make any changes to its parameters and no further improvement is possible for either network. This state of no further improvement is known as NASH EQUILIBRIUM. In other words, a GAN is a 2-player competitive game where both players are continuously optimizing themselves to find a Nash equilibrium.

Where is GAME THEORY now?

 Game Theory is increasingly becoming a part of the real-world in its various applications
in areas like public health services, public safety, and wildlife. Currently, game theory
is being used in adversary training in GANs, multi-agent systems, and imitation and
reinforcement learning.

 In the case of perfect information and symmetric games, many Machine Learning and
Deep Learning techniques are applicable.

The real challenge lies in the development of techniques to handle incomplete information
games, such as Poker. The complexity of the game lies in the fact that there are too many
combinations of cards and the uncertainty of the cards being held by the various players.

Types of Games:
Currently, there are about 5 types of classification of games.
They are as follows:
1. Zero-Sum and Non-Zero Sum Games:
 In non-zero-sum games, there are multiple players and all of them have the
option to gain a benefit due to any move by another player.
 In zero-sum games, however, if one player earns something, the other players are bound to lose an equal amount.
2. Simultaneous and Sequential Games:
 Sequential games are the more popular games where every player is aware of the
movement of another player.
 Simultaneous games are more difficult as in them, the players are involved in a
concurrent game. BOARD GAMES are the perfect example of sequential games and
are also referred to as turn-based or extensive-form games.

3. Asymmetric and Symmetric Games:

 Asymmetric games are those in which each player has a different and usually conflicting final goal. Symmetric games are those in which all players have the same ultimate goal but the strategy being used by each is completely different.

4. Co-operative and Non-Co-operative Games:

 In non-co-operative games, every player plays for himself, while in co-operative games, players form alliances in order to achieve the final goal.

3.2.Optimal Decisions in Games:

 Humans’ intellectual capacities have been engaged by games for as long as civilization
has existed, sometimes to an alarming degree.
 Games are an intriguing subject for AI researchers because of their abstract character. A
game’s state is simple to depict, and actors are usually limited to a small number of
actions with predetermined results.
 Physical games, such as croquet and ice hockey, contain significantly more intricate
descriptions, a much wider variety of possible actions, and rather ambiguous regulations
defining the legality of activities.
 With the exception of robot soccer, these physical games have not piqued the AI
community’s interest.
Games are usually intriguing because they are difficult to solve. Chess, for example, has an average branching factor of around 35, and games frequently stretch to 50 moves per player, therefore the search tree has roughly 35^100 or 10^154 nodes (despite the search graph having "only" about 10^40 unique nodes). As a result, games, like the real world, necessitate the ability to make some sort of decision even when calculating the best option is impossible. Inefficiency is also heavily punished in games.

Optimal Decision Making in Games

Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious
reasons. MAX is the first to move, and then they take turns until the game is finished. At the
conclusion of the game, the victorious player receives points, while the loser receives penalties.
A game can be formalized as a type of search problem that has the following elements:
 S0: The initial state of the game, which describes how it is set up at the start.
 Player (s): Defines which player in a state has the move.
 Actions (s): Returns a state’s set of legal moves.
 Result (s, a): A transition model that defines a move’s outcome.
 Terminal-Test (s): A terminal test that returns true if the game is over but false otherwise.
Terminal states are those in which the game has come to a conclusion.
 Utility (s, p): A utility function (also known as a payoff function or objective function)
1. determines the final numeric value for a game that concludes in the terminal state s for player p.
2. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2. Backgammon's payoffs range from 0 to +192, but certain games have a greater range of possible outcomes.
3. A zero-sum game is defined (confusingly) as one in which the total payoff to all players is the same for every instance of the game. Chess is a zero-sum game because each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. "Constant-sum" would have been a preferable name, but zero-sum is the usual term and makes sense if you imagine each player is charged an entry fee of 1/2.

The game tree for the game is defined by the beginning state, ACTIONS function, and
RESULT function—a tree in which the nodes are game states and the edges represent
movements.

The figure below depicts a portion of the tic-tac-toe game tree (noughts and crosses). MAX
may make nine different maneuvers from his starting position.

The game alternates between MAX placing an X and MIN placing an O until we reach leaf nodes corresponding to terminal states, such as one player having three in a row or all of the squares being filled.

The utility value of the terminal state from the perspective of MAX is shown by the number
on each leaf node; high values are thought to be beneficial for MAX and bad for MIN

The game tree for tic-tac-toe is relatively small, with fewer than 9! = 362,880 terminal nodes. However, because there are over 10^40 nodes in chess, the game tree is better viewed as a theoretical construct that cannot be realized in the actual world. But, no matter how big the game tree is, MAX's goal is to find a solid move. A tree that is superimposed on the whole game tree and examines enough nodes to allow a player to identify what move to make is referred to as a search tree.

 A sequence of actions leading to a goal state—a terminal state that is a win—would be the best
solution in a typical search problem.
 MIN has something to say about it in an adversarial search. MAX must therefore devise a contingent strategy that specifies MAX's move in the initial state, then MAX's moves in the states resulting from every conceivable MIN response, then MAX's moves in the states resulting from every possible MIN reaction to those moves, and so on.
 This is quite similar to the AND-OR search method, with MAX acting as OR and MIN acting as AND.
 When playing an infallible opponent, an optimal strategy produces results that are at least as good as any other strategy. We'll start by demonstrating how to find the best strategy.
 We'll move to the trivial game in the figure below, since even a simple game like tic-tac-toe is too complex for us to draw the full game tree on one page. MAX's root-node moves are designated by the letters a1, a2, and a3. MIN's probable answers to a1 are b1, b2, b3, and so on. This game is over after MAX and MIN each make one move. (In game terms, this tree is one move deep, consisting of two half-moves, each of which is referred to as a ply.) The terminal states in this game have utility values ranging from 2 to 14.

Game’s Utility Function

 The optimal strategy can be found from the minimax value of each node, which we write as MINIMAX(n), given a game tree.
 Assuming that both players play optimally from there through the finish of the game, the utility (for MAX) of being in the corresponding state is the node's minimax value.
 The minimax value of a terminal state is just its utility. Furthermore, if given the option, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value.
 So here's what we've got:

$$\mathrm{MINIMAX}(s) = \begin{cases} \mathrm{UTILITY}(s) & \text{if TERMINAL-TEST}(s) \\ \max_{a \in \mathrm{Actions}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if PLAYER}(s) = \mathrm{MAX} \\ \min_{a \in \mathrm{Actions}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if PLAYER}(s) = \mathrm{MIN} \end{cases}$$
 Let’s use these definitions to analyze the game tree shown in the figure above. The game’s
UTILITY function provides utility values to the terminal nodes on the bottom level.
 Because the first MIN node, B, has three successor states with values of 3, 12, and 8, its minimax value is 3. The other two MIN nodes have minimax value 2.

 The root node is a MAX node; its successors have minimax values of 3, 2, and 2, resulting in a minimax value of 3 for the root. We can also identify the minimax decision at the root:

 Action a1 is the best option for MAX since it leads to the state with the highest minimax value.

This definition of optimal play for MAX assumes that MIN also plays optimally: it maximizes MAX's worst-case outcome. What happens if MIN isn't playing optimally? Then it's a simple matter of demonstrating that MAX can perform even better. Other strategies may outperform the minimax strategy against suboptimal opponents, but they will necessarily do worse against optimal opponents.
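The minimax computation just described is easy to express as a short recursive procedure. Below is a minimal sketch in Python, assuming a hypothetical game object exposing the functions defined above (actions, result, terminal_test, utility, mirroring Actions(s), Result(s, a), Terminal-Test(s), and Utility(s)); it is an illustration of the idea, not a definitive implementation.

```python
def minimax_decision(state, game):
    """Return the action for MAX that leads to the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    """Minimax value of a state where it is MAX's turn to move."""
    if game.terminal_test(state):
        return game.utility(state)   # utility from MAX's point of view
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    """Minimax value of a state where it is MIN's turn to move."""
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))
```

Because the recursion runs all the way down to the terminal states, the time complexity is O(b^m) for branching factor b and maximum depth m, which is what motivates the pruning technique of the next section.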

3.3.Alpha-Beta Pruning Search:


o Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization technique for the minimax algorithm.
o As we have seen, the number of game states the minimax search algorithm has to examine is exponential in the depth of the tree. We cannot eliminate the exponent, but we can effectively cut it in half.
o Hence there is a technique by which we can compute the correct minimax decision without checking every node of the game tree, and this technique is called pruning.
o This involves two threshold parameters, alpha and beta, for future expansion, so it is called alpha-beta pruning. It is also called the alpha-beta algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only the tree leaves but also entire sub-trees.
o The two parameters can be defined as:
o The two-parameter can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at any point along the
path of Maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along the
path of Minimizer. The initial value of beta is +∞.
Alpha-beta pruning returns the same move as the standard minimax algorithm does, but it removes all the nodes that do not really affect the final decision and only make the algorithm slow.
Hence, by pruning these nodes, it makes the algorithm fast.

Condition for Alpha-beta pruning:

The main condition required for alpha-beta pruning is:

α>=β

Key points about alpha-beta pruning:


o The Max player will only update the value of alpha.
o The Min player will only update the value of beta.

o While backtracking the tree, the node values are passed up to the parent nodes instead of the alpha and beta values.
o We only pass the alpha and beta values down to the child nodes.

Working of Alpha-Beta Pruning:

Let's take an example of two-player search tree to understand the working of Alpha-beta
pruning

Step 1: In the first step, the Max player starts the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.

Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3; max(2, 3) = 3 becomes the value of α at node D, and the node value will also be 3.

Step 3: Now the algorithm backtracks to node B, where the value of β changes, as this is Min's turn. β = +∞ is compared with the available subsequent node value, i.e., min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.

In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed down as well.

Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3, where α >= β, so the right successor of E is pruned, and the algorithm will not traverse it. The value at node E becomes 5.

Step 5: In the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha is changed; the maximum available value is 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A, which is node C.

At node C, α=3 and β= +∞, and the same values will be passed on to node F.

Step 6: At node F, the value of α is again compared, first with the left child, 0 (max(3, 0) = 3), and then with the right child, 1 (max(3, 1) = 3); α remains 3, but the node value of F becomes 1.

Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta changes: it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again the condition α >= β is satisfied, so the next child of C, which is G, is pruned, and the algorithm does not compute the entire sub-tree rooted at G.

Step 8: C now returns the value 1 to A, where the best value for A is max(3, 1) = 3. The final game tree shows the nodes that were computed and the nodes that were never computed. The optimal value for the maximizer is 3 for this example.
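The eight steps above can be condensed into the standard alpha-beta recursion. A minimal Python sketch is given below, reusing the same hypothetical game interface as the earlier minimax sketch; the commented tests are exactly the pruning condition α >= β described above.

```python
import math

def alpha_beta_search(state, game):
    """Return the best action for MAX, pruning branches that cannot matter."""
    best_action, best_value = None, -math.inf
    alpha, beta = -math.inf, math.inf
    for a in game.actions(state):
        v = min_value(game.result(state, a), game, alpha, beta)
        if v > best_value:
            best_action, best_value = a, v
        alpha = max(alpha, best_value)
    return best_action

def max_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game, alpha, beta))
        if v >= beta:            # MIN above will never allow this branch
            return v             # prune the remaining successors
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game, alpha, beta))
        if v <= alpha:           # MAX above already has a better option
            return v             # prune the remaining successors
        beta = min(beta, v)
    return v
```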

Move Ordering in Alpha-Beta pruning:

The effectiveness of alpha-beta pruning is highly dependent on the order in which each
node is examined. Move order is an important aspect of alpha-beta pruning.

It can be of two types:

Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the leaves of the tree and works exactly like the minimax algorithm. In this case, it also consumes more time because of the alpha-beta bookkeeping; such an order of pruning is called worst ordering. In this case, the best move occurs on the right side of the tree. The time complexity for such an order is O(b^m).

Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of pruning happens in the tree and the best moves occur on the left side of the tree. We apply DFS, so it searches the left of the tree first and can go twice as deep as the minimax algorithm in the same amount of time. The complexity of ideal ordering is O(b^(m/2)).

Rules to find good ordering:

Following are some rules to find good ordering in alpha-beta pruning:

o Try the best move from the shallowest node first.


o Order the nodes in the tree such that the best nodes are checked first.
o Use domain knowledge while finding the best move. Ex: for Chess, try order: captures
first, then threats, then forward moves, backward moves.
o We can bookkeep the states, as there is a possibility that states may repeat.

3.4.Monte Carlo Tree Search (MCTS):


 Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial
Intelligence (AI).
 It is a probabilistic and heuristic driven search algorithm that combines the classic
tree search implementations alongside machine learning principles of
reinforcement learning.
 In tree search, there’s always the possibility that the current best action is actually
not the most optimal action. In such cases, MCTS algorithm becomes useful as it
continues to evaluate other alternatives periodically during the learning phase by
executing them, instead of the current perceived optimal strategy.
 This is known as the "exploration-exploitation trade-off". MCTS exploits the actions and strategies that have been found to be the best so far, but it must also continue to explore the local space of alternative decisions to find out whether they could replace the current best.
 Exploration helps in exploring and discovering the unexplored parts of the tree,
which could result in finding a more optimal path. In other words, we can say that
exploration expands the tree’s breadth more than its depth.
 Exploration can be useful to ensure that MCTS is not overlooking any
potentially better paths.
 But exploration quickly becomes inefficient in situations with a large number of steps or repetitions. In order to avoid that, it is balanced out by exploitation.
 Exploitation sticks to a single path that has the greatest estimated value. This is a greedy approach, and it will extend the tree's depth more than its breadth.
 In simple words, the UCB formula applied to trees helps to balance the exploration-exploitation trade-off by periodically exploring relatively unexplored nodes of the tree.

In MCTS, nodes are the building blocks of the search tree. These nodes are formed based on
the outcome of a number of simulations.
The process of Monte Carlo Tree Search can be broken down into four distinct steps, viz., selection, expansion, simulation, and backpropagation.
Each of these steps is explained in detail below:

 Selection:
 In this process, the MCTS algorithm traverses the current tree from the root node using
a specific strategy.
 The strategy uses an evaluation function to optimally select nodes with the highest
estimated value.
 MCTS uses the Upper Confidence Bound (UCB) formula applied to trees as the
strategy in the selection process to traverse the tree. It balances the exploration-
exploitation trade-off.
 During tree traversal, a node is selected based on the value it returns; the formula typically used for this purpose is the UCB1 formula applied to trees:

$$S_i = \bar{x}_i + C\sqrt{\frac{\ln t}{n_i}}$$

where:
S_i = value of node i
x̄_i = empirical mean (average reward) of node i
C = a constant (the exploration parameter)
t = total number of simulations
n_i = number of simulations that have passed through node i
When traversing the tree during the selection process, the child node that returns the greatest value from the above equation is the one that gets selected. During traversal, once a child node is found which is also a leaf node, MCTS jumps into the expansion step.
 Expansion: In this process, a new child node is added to the tree to that node which was
optimally reached during the selection process.
 Simulation: In this process, a simulation is performed by choosing moves or strategies until
a result or predefined state is achieved.
 Backpropagation:
 After determining the value of the newly added node, the remaining tree must be
updated. So, the backpropagation process is performed, where it backpropagates from the new
node to the root node.

 During the process, the number of simulations stored in each node is incremented. Also, if the new node's simulation results in a win, then the number of wins is also incremented.
The above steps can be visually understood by the diagram given below:

 These types of algorithms are particularly useful in turn-based games where there is no element of chance in the game mechanics, such as Tic Tac Toe, Connect 4, Checkers, Chess, Go, etc. This has recently been used by artificial intelligence programs like AlphaGo to play against the world's top Go players. But its application is not limited to games only.
 It can be used in any situation which is described by state-action pairs where simulations can be used to forecast outcomes.
 As we can see, the MCTS algorithm reduces to a very small set of functions which we can use for any choice of game or in any optimizing strategy.
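The four steps can be put together into one compact loop. The sketch below again assumes the hypothetical game interface used earlier, that states are hashable, that distinct actions lead to distinct states, and (for simplicity) that playout rewards are credited from a single player's point of view; a real two-player implementation would flip the reward perspective at alternate levels.

```python
import math
import random

class Node:
    """A search-tree node holding the MCTS statistics."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []              # expanded successor nodes
        self.wins, self.visits = 0.0, 0

def ucb1(node, C=1.4):
    """UCT value of a node; unvisited nodes are tried first."""
    if node.visits == 0:
        return math.inf
    return (node.wins / node.visits +
            C * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, game, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend via UCB1 while the node is fully expanded.
        node = root
        while node.children and len(node.children) == len(game.actions(node.state)):
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one unexplored child of a non-terminal node.
        if not game.terminal_test(node.state):
            tried = {child.state for child in node.children}
            untried = [a for a in game.actions(node.state)
                       if game.result(node.state, a) not in tried]
            child = Node(game.result(node.state, random.choice(untried)),
                         parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node to a terminal state.
        state = node.state
        while not game.terminal_test(state):
            state = game.result(state, random.choice(game.actions(state)))
        reward = game.utility(state)
        # 4. Backpropagation: update visit and win counts up to the root.
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # Final move choice: the most-visited child of the root.
    return max(root.children, key=lambda c: c.visits)
```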

Advantages of Monte Carlo Tree Search:

1. MCTS is a simple algorithm to implement.


2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without
any knowledge in the particular domain, apart from the rules and end conditions, and can
find its own moves and learn from them by playing random playouts.
3. The MCTS can be saved in any intermediate state and that state can be used in future use
cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based on the circumstances in
which it is operating.
Disadvantages of Monte Carlo Tree Search:

1. As the tree growth becomes rapid after a few iterations, it requires a huge amount of
memory.

2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, there might be a single branch or path that leads to a loss against the opposition when implemented for turn-based games. This is mainly due to the vast number of move combinations: each of the nodes might not be visited enough times to understand its result or outcome in the long run.
3. The MCTS algorithm needs a huge number of iterations to be able to effectively decide the most efficient path. So, there is a bit of a speed issue there.

3.5.Stochastic games:
 Many unforeseeable external occurrences can place us in unforeseen circumstances in real
life.
 Many games, such as dice tossing, have a random element to reflect this
unpredictability. These are known as stochastic games. Backgammon is a classic game that
mixes skill and luck.
 The legal moves are determined by rolling dice at the start of each player's turn. White, for example, has rolled a 6–5 and has four alternative moves in the backgammon scenario shown in the figure below.

 This is a standard backgammon position. The object of the game is to get all of one’s pieces
off the board as quickly as possible.
 White moves in a clockwise direction toward 25, while Black moves in a
counterclockwise direction toward 0.
 Unless there are multiple opponent pieces on a position, a piece can advance to it; if there is a single opponent piece there, it is captured and must start over. White has rolled a 6–5 and must pick between four valid moves: (5–10, 5–11), (5–11, 19–24), (5–10, 10–16), and (5–11, 11–16), where the notation (5–11, 11–16) denotes moving one piece from position 5 to 11 and then another from 11 to 16.

(Figure: stochastic game tree for a backgammon position.)
 White knows his or her own legal moves, but he or she has no idea how Black will roll, and thus has no idea what Black's legal moves will be. That means White won't be able to build a normal game tree like in chess or tic-tac-toe. In backgammon, in addition to MAX and MIN nodes, a game tree must include chance nodes.
 The figure below depicts chance nodes as circles. The possible dice rolls are
indicated by the branches leading from each chance node; each branch is labelled
with the roll and its probability. There are 36 different ways to roll two dice, each
equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–
6.

P (1–1) = 1/36 because each of the six doubles (1–1 through 6–6) has a probability of 1/36.
Each of the other 15 rolls has a 1/18 chance of happening.

 The next phase is to learn how to make good decisions. Obviously, we want to choose the move that will put us in the best position. Positions, however, do not have definite minimum and maximum values.
 Instead, we can only compute a position's expected value, which is the average over all possible outcomes of the chance nodes.
 As a result, we can generalize the deterministic minimax value to an expected-minimax value for games with chance nodes.
 Terminal nodes and MAX and MIN nodes (for which the dice roll is known) all function exactly as before. For chance nodes we compute the expected value, which is the sum of the values over all outcomes, weighted by the probability of each chance action.
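Written out in the notation of the MINIMAX definition given earlier, this expected-minimax (expectiminimax) value is:

$$\mathrm{EXPECTIMINIMAX}(s) = \begin{cases} \mathrm{UTILITY}(s) & \text{if TERMINAL-TEST}(s) \\ \max_{a} \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s,a)) & \text{if PLAYER}(s) = \mathrm{MAX} \\ \min_{a} \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s,a)) & \text{if PLAYER}(s) = \mathrm{MIN} \\ \sum_{r} P(r)\,\mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s,r)) & \text{if PLAYER}(s) = \mathrm{CHANCE} \end{cases}$$

where r ranges over the possible dice rolls (chance events) and RESULT(s, r) is the state reached from s when roll r occurs.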

3.6.Partially observable games:


 A partially observable system is one in which the entire state of the system is not fully
visible to an external sensor.
 In a partially observable system the observer may utilise a memory system in order to add
information to the observer's understanding of the system.
 An example of a partially observable system would be a card game in which some of the cards are
discarded into a pile face down.

 In this case the observer is only able to view their own cards and potentially those of the dealer. They are not able to view the face-down (used) cards, nor the cards that will be dealt at some stage in the future.
 A memory system can be used to remember the previously dealt cards that are now on the used
pile. This adds to the total sum of knowledge that the observer can use to make decisions.
 In contrast, a fully observable system would be that of chess. In chess (apart from the 'who is
moving next' state, and minor subtleties such as whether a side has castled, which may not be
clear) the full state of the system is observable at any point in time.
 Partially observable is a term used in a variety of mathematical settings, including that of artificial
intelligence and partially observable Markov decision processes.

3.7.Constraint satisfaction problems:


 We have seen many techniques, such as local search and adversarial search, for solving different problems. The objective of every problem-solving technique is the same, i.e., to find a solution that reaches the goal.

 However, in adversarial search and local search there were no constraints on the agents while solving the problems and reaching their solutions.
 Next we consider the constraint satisfaction technique. As the name suggests, constraint satisfaction means solving a problem under certain constraints or rules.
 Constraint satisfaction is a technique where a problem is solved when its values satisfy certain constraints or rules of the problem.
 This type of technique leads to a deeper understanding of the problem structure as well as its complexity.
 Constraint satisfaction depends on three components, namely:
X: It is a set of variables.
D: It is a set of domains where the variables reside. There is a specific domain for each
variable.
C: It is a set of constraints which are followed by the set of variables.
 In constraint satisfaction, domains are the spaces where the variables reside, following the
problem specific constraints.
 These are the three main elements of a constraint satisfaction technique. Each constraint consists of a pair ⟨scope, rel⟩.
 The scope is a tuple of the variables that participate in the constraint, and rel is a relation that includes a list of values the variables can take in order to satisfy the constraints of the problem.
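As a concrete illustration, the three components X, D, and C for the Australia map-coloring problem (used again in the examples later in this unit) can be written down directly. The sketch below is in Python; the list-of-pairs layout for the constraints is just one possible representation.

```python
# X: the variables -- the seven Australian states/territories.
X = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]

# D: the domains -- every variable may take any of the three colors.
D = {v: {"red", "green", "blue"} for v in X}

# C: the constraints -- adjacent regions must receive different colors.
# Each constraint is a pair (scope, rel); here rel is the not-equal relation.
neighbors = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]
C = [((a, b), lambda x, y: x != y) for a, b in neighbors]
```

A complete assignment such as {WA=red, NT=green, SA=blue, Q=red, NSW=green, V=red, T=red} satisfies every constraint and is therefore a solution.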
Solving Constraint Satisfaction Problems
The requirements to solve a constraint satisfaction problem (CSP) is:

 A state-space
 The notion of the solution.

A state in state-space is defined by assigning values to some or all variables such as


{X1=v1, X2=v2, and so on…}.
An assignment of values to a variable can be done in three ways:

 Consistent or Legal Assignment: An assignment which does not violate any constraint
or rule is called Consistent or legal assignment.
 Complete Assignment: An assignment where every variable is assigned with a value,
and the solution to the CSP remains consistent. Such assignment is known as Complete
assignment.
 Partial Assignment: An assignment which assigns values to some of the variables only. Such assignments are called partial assignments.

Types of Domains in CSP


There are the following two types of domains used by the variables:

 Infinite (discrete) Domain: a domain with infinitely many values; for example, in a scheduling problem a start time can take any of infinitely many integer values for each variable.
 Finite Domain: a domain with a finite set of possible values for each variable, such as the set of colors in map coloring. (Domains over the real numbers are called continuous domains.)

Constraint Types in CSP


With respect to the variables, basically there are following types of constraints:

 Unary Constraints: It is the simplest type of constraints that restricts the value of a
single variable.
 Binary Constraints: the constraint type which relates two variables; for example, a constraint requiring X1 ≠ X2.
 Global Constraints: It is the constraint type which involves an arbitrary number of
variables.
 Some special types of solution algorithms are used to solve the following types of
constraints:
 Linear Constraints: These types of constraints are commonly used in linear
programming where each variable containing an integer value exists in linear form
only.
 Non-linear Constraints: These types of constraints are used in non-linear
programming where each variable (an integer value) exists in a non-linear form.

Note: A special constraint which works in real-world is known as Preference constraint.

3.8.Constraint Propagation
 In local state-spaces, there is only one choice, i.e., to search for a solution. But in CSP, we have two choices:
1. We can search for a solution, or
2. We can perform a special type of inference called constraint propagation.
 Constraint propagation is a special type of inference which helps in reducing the legal number of
values for the variables. The idea behind constraint propagation is local consistency.
 In local consistency, variables are treated as nodes, and each binary constraint is treated as
an arc in the given problem.
 There are following local consistencies which are discussed below:

 Node Consistency: A single variable is said to be node consistent if all the values in the

variable’s domain satisfy the unary constraints on the variables.

 Arc Consistency: A variable is arc consistent if every value in its domain satisfies the
binary constraints of the variables.
 Path Consistency: a set of two variables {Xi, Xj} is path-consistent with respect to a third variable Xm if every consistent assignment to {Xi, Xj} can be extended to Xm while satisfying all the binary constraints. It is similar to arc consistency.
 k-consistency: This type of consistency is used to define the notion of stronger forms of
propagation. Here, we examine the k-consistency of the variables.
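Arc consistency is typically enforced with the AC-3 algorithm. Below is a minimal sketch, assuming binary constraints stored as a dict that maps ordered variable pairs to relation functions (both orientations of each arc present) and a neighbors map; these layouts are illustrative assumptions, not a fixed API.

```python
from collections import deque

def ac3(domains, constraints, neighbors):
    """Make every arc (Xi, Xj) consistent; return False if a domain empties.

    constraints[(Xi, Xj)] is a function rel(x, y) -> bool, with both
    orientations of each arc present; neighbors[Xi] is the set of
    variables sharing a constraint with Xi; domains maps variables to sets.
    """
    queue = deque(constraints)               # all arcs to start with
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraints, xi, xj):
            if not domains[xi]:
                return False                 # inconsistency detected
            for xk in neighbors[xi] - {xj}:
                queue.append((xk, xi))       # re-check the affected arcs
    return True

def revise(domains, constraints, xi, xj):
    """Delete values of Xi that have no supporting value in Xj."""
    rel = constraints[(xi, xj)]
    revised = False
    for x in set(domains[xi]):
        if not any(rel(x, y) for y in domains[xj]):
            domains[xi].discard(x)
            revised = True
    return revised
```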

CSP Problems
Constraint satisfaction includes those problems which contain some constraints to satisfy while solving the problem. CSPs include the following problems:

 Graph Coloring: The problem where the constraint is that no two adjacent regions (nodes connected by an edge) can have the same color.

 Sudoku Playing: The gameplay where the constraint is that no number from 1-9 can be repeated in the same row, column, or 3×3 box.

 n-queen problem: In the n-queen problem, the constraint is that no two queens may share the same row, column, or diagonal.

Note: The n-queen problem is already discussed in Problem-solving in AI section.

 Crossword: In crossword problem, the constraint is that there should be the correct
formation of the words, and it should be meaningful.

 Latin square Problem: In this game, the task is to fill an n×n grid with n different digits (or symbols) so that each digit occurs exactly once in each row and each column. The rows may look shuffled but contain the same digits.

 Cryptarithmetic Problem: This problem has one most important constraint: we cannot assign different digits to the same character, and every character should get a unique digit.

Cryptarithmetic Problem

Cryptarithmetic Problem is a type of constraint satisfaction problem where the game is about digits
and its unique replacement either with alphabets or other symbols. In cryptarithmetic problem,
the digits (0-9) get substituted by some possible alphabets or symbols. The task in cryptarithmetic
problem is to substitute each digit with an alphabet to get the result arithmetically correct.

We can perform all the arithmetic operations on a given cryptarithmetic problem.


The rules or constraints on a cryptarithmetic problem are as follows:

 There should be a unique digit to be replaced with a unique alphabet.


 The result should satisfy the predefined arithmetic rules, i.e., 2+2 =4, nothing else.
 Digits should be from 0-9 only.
 There should be only one carry forward, while performing the addition operation on a
problem.
 The problem can be solved from both sides, i.e., lefthand side (L.H.S), or righthand
side (R.H.S)

Let’s understand the cryptarithmetic problem as well its constraints better with the help
of an example:

 Given a cryptarithmetic problem, i.e., S E N D + M O R E = M O N E Y

 In this example, add both terms S E N D and M O R E to bring M O N E Y as a result.

Follow the below steps to understand the given problem by breaking it into its subparts:

 Starting from the left-hand side (L.H.S), the leading terms are S and M. Assign digits which could give a satisfactory result. Let's assign S -> 9 and M -> 1.

Hence, we get a satisfactory result by adding up the terms, and we also get an assignment for O as O -> 0.
 Now, move ahead to the next terms E and O to get N as its output.

Adding E and O, i.e., 5 + 0 = 5, would make N equal to 5, which is not possible because according to the cryptarithmetic constraints we cannot assign the same digit to two letters (E is already 5). So, we need to think further and assign some other value.

Note: When we solve further, we will get a carry; after applying it, the answer will be satisfied.

 Further, adding the next two terms N and R we get,

But, we have already assigned E->5. Thus, the above result does not satisfy the values
because we are getting a different value for E. So, we need to think more.
Again, after solving the whole problem, we will get a carryover on this term, so our answer
will be satisfied.

where 1 will be carry forward to the above term


Let’s move ahead.

 Again, on adding the last two terms, i.e., the rightmost terms D and E, we get Y as its
result.

where 1 will be carry forward to the above term


 Keeping all the constraints in mind, the final resultant is as follows:

 Below is the representation of the assignment of the digits to the alphabets.
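The hand derivation above can also be verified by brute force. The short Python sketch below enumerates digit permutations for SEND + MORE = MONEY; it is purely illustrative, since a real CSP solver would use the backtracking and propagation techniques described in the following sections.

```python
from itertools import permutations

def solve_send_more_money():
    letters = "SENDMORY"                     # the 8 distinct letters
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a["S"] == 0 or a["M"] == 0:       # no leading zeros allowed
            continue
        send  = int("".join(str(a[c]) for c in "SEND"))
        more  = int("".join(str(a[c]) for c in "MORE"))
        money = int("".join(str(a[c]) for c in "MONEY"))
        if send + more == money:
            return a                         # 9567 + 1085 = 10652
    return None

print(solve_send_more_money())
# {'S': 9, 'E': 5, 'N': 6, 'D': 7, 'M': 1, 'O': 0, 'R': 8, 'Y': 2}
```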

More examples of cryptarithmetic problems can be:

3.9.Backtracking search for CSP:
Backtracking search, a form of depth-first search, is commonly used for solving CSPs. Inference
can be interwoven with search.

Commutativity: CSPs are all commutative. A problem is commutative if the order of


application of any given set of actions has no effect on the outcome.

Backtracking search: A depth-first search that chooses values for one variable at a time and
backtracks when a variable has no legal values left to assign.

Backtracking algorithm repeatedly chooses an unassigned variable, and then tries all values in
the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then
BACKTRACK returns failure, causing the previous call to try another value.

There is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state,


action function, transition model, or goal test.

BACKTRACKING-SEARCH keeps only a single representation of a state and alters that representation rather than creating new ones.

To solve CSPs efficiently without domain-specific knowledge, address following questions:
1) function SELECT-UNASSIGNED-VARIABLE: which variable should be assigned next?
function ORDER-DOMAIN-VALUES: in what order should its values be tried?
2) function INFERENCE: what inferences should be performed at each step in the search?
3) When the search arrives at an assignment that violates a constraint, can the search avoid
repeating this failure?
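A minimal backtracking skeleton in Python is sketched below, assuming a hypothetical consistent(var, value, assignment) predicate that checks the constraints involving var. The two lines marked naive correspond to SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES; the heuristics discussed next can be plugged in there.

```python
def backtracking_search(variables, domains, consistent):
    return backtrack({}, variables, domains, consistent)

def backtrack(assignment, variables, domains, consistent):
    if len(assignment) == len(variables):
        return assignment                    # complete, consistent assignment
    # naive SELECT-UNASSIGNED-VARIABLE: first unassigned variable
    var = next(v for v in variables if v not in assignment)
    # naive ORDER-DOMAIN-VALUES: domain order as given
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
            del assignment[var]              # undo the assignment and retry
    return None                              # triggers backtracking above
```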

1. Variable and value ordering

SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic:
 The idea of choosing the variable with the fewest "legal" values. Also known as the "most constrained variable" or "fail-first" heuristic, it picks the variable that is most likely to cause a failure soon, thereby pruning the search tree.
 If some variable X has no legal values left, the MRV heuristic will select X and failure
will be detected immediately—avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for SA, so
it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic:
 The degree heuristic attempts to reduce the branching factor on future choices by selecting
the variable that is involved in the largest number of constraints on other unassigned
variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has degree
0.

ORDER-DOMAIN-VALUES

Value selection—fail-last

 If we are trying to find all the solutions to a problem (not just the first one), then the ordering does not matter.
Least-constraining-value heuristic: prefers the value that rules out the fewest choices for the neighboring variables in the constraint graph. (Try to leave the maximum flexibility for subsequent variable assignments.)
e.g. Suppose we have generated the partial assignment with WA=red and NT=green and that our next choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for Q's neighbor, SA; the heuristic therefore prefers red to blue.
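For instance, both ordering heuristics can be expressed compactly over the representation sketched earlier; legal_values is a hypothetical helper returning the values of a variable consistent with the current assignment.

```python
def mrv(variables, assignment, legal_values):
    """Minimum-remaining-values: the unassigned variable with fewest legal values."""
    return min((v for v in variables if v not in assignment),
               key=lambda v: len(legal_values(v, assignment)))

def least_constraining_values(var, assignment, neighbors, legal_values):
    """Order var's values by how few choices they rule out for its neighbors."""
    def ruled_out(value):
        trial = dict(assignment, **{var: value})
        return sum(len(legal_values(n, assignment)) - len(legal_values(n, trial))
                   for n in neighbors[var] if n not in assignment)
    return sorted(legal_values(var, assignment), key=ruled_out)
```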

2. Interleaving search and inference

INFERENCE
forward checking: [One of the simplest forms of inference.] Whenever a variable X is assigned,
the forward-checking process establishes arc consistency for it:
for each unassigned variable Y that is connected to X by a constraint, delete from Y’s domain
any value that is inconsistent with the value chosen for X.
There is no reason to do forward checking if we have already done arc consistency as a
preprocessing step.

Advantage: For many problems the search will be more effective if we combine the MRV
heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent, but doesn’t
look ahead and make all the other variables arc-consistent.
MAC (Maintaining Arc Consistency) algorithm:
[More powerful than forward checking, detect this inconsistency.] After a variable Xi is
assigned a value, the INFERENCE procedure calls AC-3, but instead of a queue of all arcs in the
CSP, we start with only the arcs(Xj, Xi) for all Xj that are unassigned variables that are
neighbors of Xi.
From there, AC-3 does constraint propagation in the usual way, and if any variable has its
domain reduced to the empty set, the call to AC-3 fails and we know to backtrack immediately.
Intelligent backtracking

Chronological backtracking:

The BACKTRACKING-SEARCH algorithm above uses chronological backtracking: when a branch of the search fails, back up to the preceding variable and try a different value for it. (The most recent decision point is revisited.)
e.g.
Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue, T=red}.
When we try the next variable SA, we see every value violates a constraint.
We back up to T and try a new color, it cannot resolve the problem.
Intelligent backtracking:
 Backtrack to a variable that was responsible for making one of the possible values of the
next variable (e.g. SA) impossible.
Conflict set for a variable:
 A set of assignments that are in conflict with some value for that variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
backjumping method:
 Backtracks to the most recent assignment in the conflict set. (e.g.
backjumping would jump over T and try a new value for V.)

Forward checking can supply the conflict set with no extra work.
 Whenever forward checking based on an assignment X=x deletes a value from Y’s domain, add
X=x to Y’s conflict set;
 If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added to
the conflict set of X.
 In fact, every branch pruned by backjumping is also pruned by forward checking. Hence simple backjumping is redundant in a forward-checking search or in a search that uses stronger consistency checking (such as MAC).
Conflict-directed back jumping:
e.g.
consider the partial assignment which is proved to be inconsistent: {WA=red, NSW=red}.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for these last 4
variables.
Eventually we run out of values to try at NT, but simple backjumping cannot work because NT doesn't have a complete conflict set of preceding variables that caused it to fail.
The set {WA, NSW} is a deeper notion of the conflict set for NT, caused NT together with any
subsequent variables to have no consistent solution. So the algorithm should backtrack to NSW
and skip over T.
A back jumping algorithm that uses conflict sets defined in this way is called conflict-direct back
jumping.
How to Compute:
When a variable’s domain becomes empty, the “terminal” failure occurs, that variable has a
standard conflict set.
Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible value for Xj fails,
backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi)∪ conf(Xj) – {Xi}.
The conflict set for an variable means, there is no solution from that variable onward, given the
preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT, NSW}.
Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA, NSW}.
Hence the algorithm backjump to NSW. (over T)

After backjumping from a contradiction, how to avoid running into the same problem again:

Constraint learning:

 The idea of finding a minimum set of variables from the conflict set that causes the
problem.
 This set of variables, along with their corresponding values, is called a no-good. We then record the no-good, either by adding a new constraint to the CSP or by keeping a separate cache of no-goods.
 Backtracking occurs when no legal assignment can be found for a variable. Conflict-directed
backjumping backtracks directly to the source of the problem.

3.10.Local search for CSP:

 Local search algorithms for CSPs use a complete-state formulation: the initial state assigns a
value to every variable, and the search change the value of one variable at a time.
 The min-conflicts heuristic: In choosing a new value for a variable, select the value that results
in the minimum number of conflicts with other variables.

Local search techniques from Section 4.1 can be used in local search for CSPs.
 The landscape of a CSP under the min-conflicts heuristic usually has a series of plateaux.
 Simulated annealing and plateau search (i.e., allowing sideways moves to another state with the same score) can help local search find its way off a plateau.
 This wandering on the plateau can be directed with tabu search: keeping a small list of recently visited states and forbidding the algorithm to return to those states.
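A minimal sketch of min-conflicts in Python, assuming a hypothetical conflicts(var, value, assignment) helper that counts the constraints violated by giving var that value (ignoring var's own current entry):

```python
import random

def min_conflicts(variables, domains, conflicts, max_steps=10000):
    # Complete-state formulation: start from a random full assignment.
    current = {v: random.choice(sorted(domains[v])) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables
                      if conflicts(v, current[v], current) > 0]
        if not conflicted:
            return current                  # no violated constraints: solved
        var = random.choice(conflicted)
        # Choose the value that minimizes the number of conflicts.
        current[var] = min(domains[var],
                           key=lambda val: conflicts(var, val, current))
    return None                             # give up after max_steps
```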

Constraint weighting: a technique that can help concentrate the search on the important
constraints.
 Each constraint is given a numeric weight Wi, initially all 1.
 At each step, the algorithm chooses a variable/value pair to change that will result in the lowest
total weight of all violated constraints.

 The weights are then adjusted by incrementing the weight of each constraint that is violated by
the current assignment.

 Local search can be used in an online setting when the problem changes, this is particularly
important in scheduling problems.

The structure of problem

1. The structure of constraint graph


 The structure of the problem as represented by the constraint graph can be used to
find solution quickly.
e.g. The problem can be decomposed into 2 independent subproblems: Coloring T and coloring
the mainland.

Tree: A constraint graph is a tree when any two variables are connected by only one path.

Directed arc consistency (DAC):


 A CSP is defined to be directed arc-consistent under an ordering of variables X1,
X2, … , Xn if and only if every Xi is arc-consistent with each Xj for j>i. By using
DAC, any tree-structured CSP can be solved in time linear in the number of
variables.
 How to solve a tree-structured CSP:
Pick any variable to be the root of the tree;
 Choose an ordering of the variables such that each variable appears after its parent in the tree (topological sort).
 Any tree with n nodes has n-1 arcs, so we can make this graph directed arc-consistent in O(n) steps, each of which must compare up to d possible domain values for 2 variables, for a total time of O(nd^2).
 Once we have a directed arc-consistent graph, we can just march down the list of variables and
choose any remaining value.
 Since each link from a parent to its child is arc consistent, we won’t have to backtrack, and can
move linearly through the variables.
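A sketch of this tree-CSP procedure in Python, assuming the variables are already given in topological order with parent links, and a hypothetical rel(parent, child, vp, vc) predicate for the constraint on each tree arc:

```python
def tree_csp_solver(order, parent, domains, rel):
    """order: variables with the root first, each variable after its parent.
    parent[x] is x's parent in the tree (None for the root);
    domains maps each variable to a set of values."""
    # Pass 1 (leaves -> root): make each arc (parent, child) consistent.
    for child in reversed(order[1:]):
        p = parent[child]
        domains[p] = {vp for vp in domains[p]
                      if any(rel(p, child, vp, vc) for vc in domains[child])}
        if not domains[p]:
            return None                     # a domain emptied: no solution
    # Pass 2 (root -> leaves): assign values; no backtracking is needed.
    assignment = {order[0]: next(iter(domains[order[0]]))}
    for child in order[1:]:
        p = parent[child]
        assignment[child] = next(vc for vc in domains[child]
                                 if rel(p, child, assignment[p], vc))
    return assignment
```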

There are 2 primary ways to reduce more general constraint graphs to trees:
1. Based on removing nodes;

e.g. We can delete SA from the graph by fixing a value for SA and deleting from the domains of
other variables any values that are inconsistent with the value chosen for SA.
The general algorithm:
 Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after
removal of S. S is called a cycle cutset.
 For each possible assignment to the variables in S that satisfies all constraints on S,
(a) remove from the domain of the remaining variables any values that are inconsistent with the
assignment for S, and
(b) If the remaining CSP has a solution, return it together with the assignment for S.
Time complexity: O(d^c · (n - c)d^2), where c is the size of the cycle cutset.
Cutset conditioning:
 The overall algorithmic approach is called cutset conditioning. Finding the smallest cycle cutset is NP-hard, but efficient approximation algorithms are known.
2. Based on collapsing nodes together
Tree decomposition:
construct a tree decomposition of the constraint graph into a set of connected subproblems,
each subproblem is solved independently, and the resulting solutions are then combined.

A tree decomposition must satisfy 3 requirements:


 Every variable in the original problem appears in at least one of the subproblems.
 If 2 variables are connected by a constraint in the original problem, they must appear
together (along with the constraint) in at least one of the subproblems.
 If a variable appears in 2 subproblems in the tree, it must appear in every subproblem along the path connecting those subproblems.
We solve each subproblem independently.
 If any one has no solution, the entire problem has no solution.
 If we can solve all the subproblems, then construct a global solution as follows:
First, view each subproblem as a “mega-variable” whose domain is the set of all solutions for the
subproblem.
Then, solve the constraints connecting the subproblems using the efficient algorithm for trees.

A given constraint graph admits many tree decompositions;
In choosing a decomposition, the aim is to make the subproblems as small as possible.
Tree width:
 The tree width of a tree decomposition of a graph is one less than the size of the largest subproblem.
 The tree width of the graph itself is the minimum tree width among all its tree decompositions.
Time complexity: O(nd^(w+1)), where w is the tree width of the graph.

2. The structure in the values of variables


 By introducing a symmetry-breaking constraint, we can break the value symmetry and
reduce the search space by a factor of n!.
e.g.
Consider the map-coloring problem with n colors: for every consistent solution, there is actually a set of n! solutions formed by permuting the color names (value symmetry).
On the Australia map, WA, NT and SA must all have different colors, so there are 3!=6 ways to
assign.
We can impose an arbitrary ordering constraint NT<SA<WA that requires the 3 values to be in
alphabetical order. This constraint ensures that only one of the n! solution is possible: {NT=blue,
SA=green, WA=red}. (symmetry-breaking constraint)

PART-B (Review questions)

1. Explain perfect decisions in game playing with an example.

2. Explain in detail the minimax algorithm and how it works for tic-tac-toe.

3. Explain alpha-beta pruning with examples.

4. Explain optimal decisions in games.

5. Explain the components of Monte Carlo tree search with an example.

6. Discuss stochastic games with an example.

7. What is a CSP? Give examples.

8. Briefly explain constraint propagation with an example.

9. What is backtracking search for CSPs? Explain with examples.
