
AI UNIT 2 (CONTINUED)

Search in Complex Environments

• Complex environments include factors like large state spaces, uncertainty, dynamic elements, and partial observability.
• The goal is to navigate or find solutions despite these complicating features.

We extend classical search algorithms (e.g., BFS, DFS, A*) to handle complex scenarios such as optimization problems, nondeterministic actions, partial observability, and unknown environments. Key focus areas include local search, stochastic environments, and online search.

Local Search Algorithms and Optimization Problems

The search algorithms that we have seen so far are designed to explore search spaces systematically. This is achieved by keeping one or more paths in memory and by recording which alternatives have been explored at each point along the path. When a goal state is found, the path to that goal state constitutes a solution to the problem.

In many problems, however, the path to the goal is irrelevant. For example, in the
8-queens problem, what matters is the final configuration of queens, not the
order in which they are added. If the path to the goal does not matter, we might
consider a different class of algorithms, ones that do not worry about paths at all.

Local search algorithms operate using a single current node (rather than multiple
paths) and generally move only to neighbors of that node. Typically, the paths
followed by the search are not retained. Although local search algorithms are not
systematic, they have two key advantages:
(1) they use very little memory—usually a constant amount

(2) they can often find reasonable solutions in large or infinite (continuous) state
spaces for which systematic algorithms are unsuitable

In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.

It helps to picture the state-space landscape: each point (state) has a "location" and an "elevation" given by the heuristic cost or the objective function. If elevation corresponds to cost, then the aim is to find the lowest valley (a global minimum); if elevation corresponds to an objective function, then the aim is to find the highest peak (a global maximum). Local search algorithms explore this state-space landscape. A complete local search algorithm always finds a goal if one exists; an optimal algorithm always finds a global minimum/maximum.

1) Hill Climbing Search –

Hill-climbing search is a local search algorithm that continually moves toward states with higher value (or lower cost in minimization problems). It is simply a loop that continually moves in the direction of increasing value, that is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher value. The algorithm does not maintain a search tree, so the data structure for the current node need only record the state and the value of the objective function. Hill climbing does not look ahead beyond the immediate neighbors of the current state.
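To make the loop concrete, here is a minimal Python sketch of steepest-ascent hill climbing on the 8-queens problem. It is not from the text above; the state encoding (one queen row per column) and the helper names are my own choices.

import random

def value(state):
    """Number of NON-attacking pairs of queens (28 is a full solution)."""
    n, attacks = len(state), 0
    for i in range(n):
        for j in range(i + 1, n):
            if state[i] == state[j] or abs(state[i] - state[j]) == j - i:
                attacks += 1
    return n * (n - 1) // 2 - attacks

def best_neighbor(state):
    """Best successor obtained by moving a single queen within its column."""
    best = state
    for col in range(len(state)):
        for row in range(len(state)):
            if row != state[col]:
                neighbor = state[:col] + (row,) + state[col + 1:]
                if value(neighbor) > value(best):
                    best = neighbor
    return best

def hill_climbing(state):
    """Climb until no neighbor is better; may stop at a local maximum."""
    while True:
        neighbor = best_neighbor(state)
        if value(neighbor) <= value(state):
            return state            # peak reached (possibly only a local one)
        state = neighbor

start = tuple(random.randrange(8) for _ in range(8))
result = hill_climbing(start)
print(result, value(result))        # value 28 means a complete solution

Because the loop never looks past its immediate neighbors, runs that start near a local maximum simply stop there, which is the behaviour quantified next.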

Performance on 8-Queens

• Steepest-ascent hill climbing:
  o Solves only 14% of problem instances.
  o On success: 4 steps on average.
  o On failure: 3 steps on average.
  o State space size ≈ 17 million states.

Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next. Unfortunately, hill climbing often gets stuck for the following reasons:

• Local maxima: hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upward toward the peak but will then be stuck with nowhere else to go.
• Ridges: ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.
• Plateaux: a plateau is a flat area of the state-space landscape. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible.
Sideways Moves

• Move to a neighbor with equal value (no improvement).
• Useful when on a shoulder plateau.
• Risk: infinite loops on flat local maxima.
• Solution: limit the number of consecutive sideways moves (e.g., at most 100).
• Impact: raises the success rate on 8-queens from 14% to 94%.

Hill-Climbing Variants

1. Stochastic Hill Climbing

• Randomly selects from among the uphill moves.
• The probability of selection can be weighted by the steepness of the move.
• Usually converges more slowly than steepest ascent, but can find better solutions in some landscapes; like basic hill climbing, it is incomplete.

2. First-Choice Hill Climbing

• Randomly generates successors until one is better than the current state, then takes it.
• Efficient when a state has many (e.g., thousands of) successors; it is also incomplete.

3. Random-Restart Hill Climbing

• Runs multiple hill-climbing searches from random initial states.
• Useful when the landscape has many local maxima.
• It is trivially complete with probability approaching 1, because it will eventually generate a goal state as the initial state.
• NP-hard problems typically have an exponential number of local maxima to get stuck on. Despite this, a reasonably good local maximum can often be found after a small number of restarts.
2) SIMULATED ANNEALING –

Pure Hill Climbing is incomplete — it gets stuck at local maxima and never moves
“downhill” (to worse states). Random Walk is complete — it eventually finds a
solution by exploring freely, but it’s very inefficient.

Therefore, it seems reasonable to try to combine hill climbing with a random walk
in some way that yields both efficiency and completeness. Simulated annealing is
such an algorithm

In metallurgy, annealing is the process used to temper or harden metals and glass by heating them to a high temperature and then gradually cooling them, thus allowing the material to reach a low-energy crystalline state.

Applied metaphorically to search problems:

• Start with high randomness (explore freely).
• Gradually reduce the randomness to settle into a good solution.

The innermost loop of the simulated-annealing algorithm is quite similar to hill climbing. Instead of picking the best move, however, it picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than 1.

The probability decreases exponentially with the "badness" of the move, the amount ΔE by which the evaluation is worsened. The probability also decreases as the "temperature" T goes down: "bad" moves are more likely to be allowed at the start when T is high, and they become more unlikely as T decreases.

The algorithm follows a cooling schedule, a function that reduces the temperature T over time. If the schedule cools slowly enough, the algorithm will eventually find the global optimum with probability approaching 1.
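A minimal sketch of this loop in Python for a maximization problem. The toy objective, the neighbor function, and the geometric cooling schedule are illustrative assumptions, not part of the original text.

import math, random

def simulated_annealing(start, value, random_neighbor,
                        T0=10.0, cooling=0.995, T_min=1e-3):
    current, T = start, T0
    while T > T_min:
        nxt = random_neighbor(current)
        delta_e = value(nxt) - value(current)      # ΔE: how much better the move is
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt                          # bad moves accepted with prob e^(ΔE/T)
        T *= cooling                               # geometric cooling schedule
    return current

# Toy 1-D landscape with many local maxima over the integers 0..99.
value = lambda x: x * math.sin(x / 5.0)
neighbor = lambda x: min(99, max(0, x + random.choice([-1, 1])))
print(simulated_annealing(random.randrange(100), value, neighbor))

Early on, when T is large, e^(ΔE/T) is close to 1 even for bad moves, so the search wanders; as T shrinks, the acceptance rule degenerates into plain hill climbing.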

Simulated annealing was first used extensively to solve VLSI layout problems in
the early 1980s. It has been applied widely to factory scheduling and other large-
scale optimization tasks.
Simulated annealing may perform better in landscapes with many deceptive local optima. While random-restart hill climbing resets completely, simulated annealing explores continuously, making better use of the current search trajectory.

3) LOCAL BEAM SEARCH –

The local beam search algorithm keeps track of k states rather than just one. It
begins with k randomly generated states. At each step, all the successors of all k
states are generated. If any one is a goal, the algorithm halts. Otherwise, it selects
the k best successors from the complete list and repeats.

A local beam search with k states might seem to be nothing more than running k
random restarts in parallel instead of in sequence. In a random-restart search,
each search process runs independently of the others. In a local beam search,
useful information is passed among the parallel search threads. The algorithm
quickly abandons unfruitful searches and moves its resources to where the most
progress is being made.
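A compact sketch of the basic algorithm, assuming the caller supplies the problem as functions (random_state, successors, value, is_goal); the bit-string toy problem at the end is only for illustration and is not from the text.

import random

def local_beam_search(k, random_state, successors, value, is_goal, max_steps=1000):
    beam = [random_state() for _ in range(k)]              # k random start states
    for _ in range(max_steps):
        pool = [s2 for s in beam for s2 in successors(s)]  # successors of ALL k states
        for s in pool:
            if is_goal(s):
                return s
        if not pool:
            break
        beam = sorted(pool, key=value, reverse=True)[:k]   # keep the k best successors
    return max(beam, key=value)                            # best state found so far

# Toy problem: find the all-ones bit string of length n.
n = 12
random_state = lambda: tuple(random.randint(0, 1) for _ in range(n))
successors  = lambda s: [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(n)]
value       = lambda s: sum(s)
print(local_beam_search(4, random_state, successors, value, lambda s: sum(s) == n))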

In its simplest form, local beam search can suffer from a lack of diversity among
the k states—they can quickly become concentrated in a small region of the state
space, making the search little more than an expensive version of hill climbing

A variant called stochastic beam search, analogous to stochastic hill climbing, helps alleviate this problem. Instead of choosing the best k from the pool of candidate successors, stochastic beam search chooses k successors at random. Better states have a higher chance of being selected, but worse ones still have a chance, which maintains diversity.

Stochastic beam search bears some resemblance to the process of natural selection, whereby the "successors" (offspring) of a "state" (organism) populate the next generation according to its "value" (fitness).
4) GENETIC ALGORITHMS –

A genetic algorithm (or GA) is a variant of stochastic beam search in which successor states are generated by combining two parent states rather than by modifying a single state.

Like beam searches, GAs begin with a set of k randomly generated states, called
the population. Each state, or individual, is represented as a string over a finite
alphabet—most commonly, a string of 0s and 1s

Step-by-step Process:

1. Initialize a population of k random individuals (candidate solutions).
2. Evaluate each individual using the fitness function.
3. Selection: choose pairs of individuals with probability proportional to fitness.
4. Crossover: for each pair, generate an offspring by:
   o Picking a random crossover point.
   o Combining the part of the string before that point from parent 1 with the rest from parent 2 (this mimics reproduction).
   o For example:
     Parent 1:  32752411
     Parent 2:  24748552
     Offspring: 32748552 (first 3 digits from parent 1, the rest from parent 2)
5. Mutation: randomly mutate some bits (or digits) in the offspring.
6. Replace the old population with the new one.
7. Repeat until a termination condition is met (a good enough solution is found or time runs out).

Example: 8-Queens Problem

• A state can be represented as an 8-digit string (each digit = the queen's row in its column).
• Fitness function = number of non-attacking pairs of queens (max = 28).
• Higher fitness = higher chance of being selected as a parent.
• Crossover combines two parent strings at a random point, creating new arrangements.
• Mutation moves the queen in one column to a random new row.

(These steps are put together in the code sketch below.)

Like stochastic beam search, genetic algorithms combine an uphill tendency with
random exploration and exchange of information among parallel search threads.
The primary advantage, if any, of genetic algorithms comes from the crossover
operation. For example, it could be that putting the first three queens in positions
2, 4, and 6 (where they do not attack each other) constitutes a useful block that
can be combined with other blocks to construct a solution
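The sketch below puts the seven steps together for the 8-queens encoding described above. The population size, mutation rate, and generation limit are arbitrary illustrative choices, not values from the text.

import random

N = 8
def fitness(ind):
    """Number of non-attacking pairs of queens (28 = solution)."""
    clashes = sum(1 for i in range(N) for j in range(i + 1, N)
                  if ind[i] == ind[j] or abs(ind[i] - ind[j]) == j - i)
    return N * (N - 1) // 2 - clashes

def reproduce(p1, p2):
    c = random.randrange(1, N)             # random crossover point
    return p1[:c] + p2[c:]                 # first part from p1, rest from p2

def mutate(ind, rate=0.1):
    if random.random() < rate:             # occasionally move one queen to a new row
        col = random.randrange(N)
        ind = ind[:col] + (random.randrange(N),) + ind[col + 1:]
    return ind

def genetic_algorithm(pop_size=100, generations=1000):
    population = [tuple(random.randrange(N) for _ in range(N))
                  for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]     # selection proportional to fitness
        new_population = []
        for _ in range(pop_size):
            p1, p2 = random.choices(population, weights=weights, k=2)
            new_population.append(mutate(reproduce(p1, p2)))
        population = new_population
        best = max(population, key=fitness)
        if fitness(best) == N * (N - 1) // 2:               # 28: all pairs non-attacking
            return best
    return max(population, key=fitness)

print(genetic_algorithm())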

Schema Theory

• A schema is a pattern or subset of a string with some unspecified positions.
  o Example: 246***** = all states in which the first 3 queens are in positions 2, 4, and 6.
• An instance is any full string that matches the schema (e.g., 24613578).
• Over time, schemas with higher fitness tend to dominate the population.
• GAs work well when useful components of a solution can be preserved and recombined via crossover.
LOCAL SEARCH IN CONTINUOUS SPACES
Many real-world environments are continuous (e.g., coordinates, angles, real-valued inputs). Most classical search algorithms (like BFS, DFS, A*) fail in continuous spaces because the branching factor is infinite: there are uncountably many actions and states. Of the algorithms seen so far, only first-choice hill climbing and simulated annealing can naturally handle continuous spaces.

(A running example: place three airports in Romania so as to minimize the sum of squared distances from each city to its nearest airport; a state is defined by six continuous variables, the (x, y) coordinates of the three airports.)

One way to avoid continuous problems is simply to discretize the neighborhood of each state. For example, we can move only one airport at a time in either the x or y direction by a fixed amount ±δ. With 6 variables, this gives 12 possible successors for each state. We can then apply any of the local search algorithms described previously.
Many methods attempt to use the gradient of the landscape to find a maximum.
The gradient of the objective function is a vector ∇f that gives the magnitude and
direction of the steepest slope
For many problems, the most effective algorithm is the venerable Newton–Raphson method, which uses second-derivative information to reach the optimum quickly. For high-dimensional problems, however, computing the n² entries of the Hessian and inverting it may be expensive, so many approximate versions of the Newton–Raphson method have been developed.
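A one-dimensional illustration of the two update rules, x ← x + α·f′(x) for gradient ascent and x ← x − f′(x)/f″(x) for Newton–Raphson; the function and step size below are made-up examples, not from the text.

def f(x):   return -(x - 2.0) ** 4 + 3.0       # smooth function with its maximum at x = 2
def df(x):  return -4.0 * (x - 2.0) ** 3       # first derivative (the "gradient")
def d2f(x): return -12.0 * (x - 2.0) ** 2      # second derivative (the "Hessian" in 1-D)

def gradient_ascent(x, alpha=0.01, steps=2000):
    for _ in range(steps):
        x += alpha * df(x)                     # small step uphill along the gradient
    return x

def newton_raphson(x, steps=50):
    for _ in range(steps):
        if d2f(x) == 0:                        # avoid division by zero at the optimum
            break
        x -= df(x) / d2f(x)                    # jump toward the root of f'(x) = 0
    return x

print(gradient_ascent(0.0), newton_raphson(0.0))   # both approach x = 2 (Newton more closely)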

Local search methods suffer from local maxima, ridges, and plateaux in
continuous state spaces just as much as in discrete spaces. Random restarts and
simulated annealing can be used and are often helpful. High-dimensional
continuous spaces are, however, big places in which it is easy to get lost.
Constrained Optimization

• Some problems have constraints (e.g., airports must be inside Romania and on land).
• Types:
  1. Linear Programming (LP):
     o Constraints are linear inequalities.
     o The objective function is linear.
     o The constraint set forms a convex set.
     o Solvable in polynomial time.
     o It is a special case of convex optimization (see the sketch after this list).
  2. Convex Optimization:
     o More general than LP.
     o Constraints may form any convex region.
     o The objective function is convex within the constraint region.
     o Under certain conditions, convex optimization problems are also polynomially solvable and may be feasible in practice with thousands of variables.
     o Several important problems in machine learning and control theory can be formulated as convex optimization problems.
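As a small illustration (made-up numbers, and assuming SciPy is available), a linear program can be handed to an off-the-shelf solver such as scipy.optimize.linprog. The solver minimizes, so the objective is negated in order to maximize.

from scipy.optimize import linprog

# Maximize x + 2y subject to x + y <= 4, x <= 3, x >= 0, y >= 0.
c    = [-1, -2]                 # minimize -(x + 2y), i.e. maximize x + 2y
A_ub = [[1, 1],                 # x + y <= 4
        [1, 0]]                 # x     <= 3
b_ub = [4, 3]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)          # optimum at (x, y) = (0, 4) with value 8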

SEARCHING WITH NONDETERMINISTIC ACTIONS

Determinism vs. Nondeterminism in Search

• Until now, we assumed:
  o Fully observable and deterministic environments.
  o The agent always knows its current state.
  o A sequence of actions leads predictably to a goal.
• But in nondeterministic or partially observable environments:
  o The agent can't predict outcomes with certainty.
  o Percepts received after actions become important for understanding what happened.
  o The agent must adapt to the different possible outcomes of its actions.
In nondeterministic settings, the solution is not a fixed sequence but a contingency plan (or strategy):

• It includes conditional branches: "If this happens, do X; else, do Y."
• Solutions for nondeterministic problems can contain nested if–then–else statements; this means that they are trees rather than sequences.
• Useful when outcomes depend on chance or uncertainty.
• Many problems in the real, physical world are contingency problems because exact prediction is impossible.

AND–OR Search Trees


In a deterministic environment, the only branching is introduced by the agent's own choices in each state; we call these nodes OR nodes. In a nondeterministic environment, branching is also introduced by the environment's choice of outcome for each action; we call these nodes AND nodes.

These two kinds of nodes alternate, leading to an AND–OR tree.

A solution for an AND–OR search problem is a subtree that

1) has a goal node at every leaf

2) specifies one action at each of its OR nodes

3) handles every outcome branch at each of its AND nodes

Algorithm Summary:

• Performs a recursive depth-first search.
• Builds a conditional plan tree.
• Handles cycles by keeping track of the current path.
• Uses two recursive functions:
  o OR-SEARCH: expands the agent's choices.
  o AND-SEARCH: handles all possible results of an action.
How It Works:

• OR-SEARCH picks an action and delegates to AND-SEARCH.
• AND-SEARCH checks all possible results of the action:
  o If any outcome leads to failure, the action is not viable.
  o If all branches succeed, their plans are combined into a conditional tree.
• Cycle detection: if a state repeats on the current path, the function returns failure, which avoids infinite loops (see the sketch below).
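A minimal recursive sketch of OR-SEARCH and AND-SEARCH. The problem interface (actions, results returning the set of possible outcome states, goal_test) and the tiny nondeterministic toy problem are assumptions for illustration, not the book's exact pseudocode.

def and_or_search(problem):
    return or_search(problem.initial, problem, [])

def or_search(state, problem, path):
    if problem.goal_test(state):
        return []                          # goal reached: empty conditional plan
    if state in path:
        return None                        # repeated state on this path: fail (cycle)
    for action in problem.actions(state):
        plan = and_search(problem.results(state, action), problem, [state] + path)
        if plan is not None:
            return [action, plan]          # plan = [action, {outcome: subplan, ...}]
    return None                            # every action failed

def and_search(states, problem, path):
    subplans = {}
    for s in states:                       # must handle EVERY possible outcome
        plan = or_search(s, problem, path)
        if plan is None:
            return None
        subplans[s] = plan
    return subplans

class Toy:
    # Hypothetical nondeterministic problem: "a" from S may reach M or G; "b" from M reaches G.
    initial = "S"
    def actions(self, s):    return {"S": ["a"], "M": ["b"]}.get(s, [])
    def results(self, s, a): return {("S", "a"): {"M", "G"}, ("M", "b"): {"G"}}[(s, a)]
    def goal_test(self, s):  return s == "G"

print(and_or_search(Toy()))   # e.g. ['a', {'M': ['b', {'G': []}], 'G': []}]

The returned plan has the nested [action, {outcome: subplan, ...}] shape: a tree with an if–then–else branch at every AND node, satisfying the three conditions listed above.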

AND–OR graphs can also be explored by breadth-first or best-first methods. The concept of a heuristic function must be modified to estimate the cost of a contingent solution rather than a sequence, but the notion of admissibility carries over and there is an analog of the A∗ algorithm for finding optimal solutions.

Cyclic Solutions –
A cyclic solution is a plan that includes loops — meaning, the agent may repeat
certain actions or return to earlier states until a goal is achieved.

Instead of a linear or tree-like sequence of actions (as in deterministic search), the plan may involve going around in cycles due to uncertainty in action outcomes.

Why Do We Need Cyclic Solutions –


1. Unreliable Actions

In nondeterministic environments, actions might not always succeed.

• Example: in the slippery vacuum world, Move Right might leave the agent in the same position.
• No finite sequence of actions guarantees reaching the goal.
• So we need to keep retrying until we get the desired result.

2. No Acyclic Plan Exists

Sometimes, it's impossible to reach the goal with a non-repeating plan.

3. Handling Infinite Possibilities with Finite Logic

• Cyclic solutions let you deal with indefinite uncertainty using finite instructions.
• Instead of accounting for every possible sequence, you encode the logic: "Keep trying until it works."

Valid Cyclic Solutions

To be considered valid, a cyclic solution must:

1. Reach a goal from every point in the loop.
2. Not get stuck in an infinite loop without making progress.
3. Rely on the assumption that every nondeterministic outcome will eventually occur (a key assumption).

SEARCHING WITH PARTIAL OBSERVATIONS


When an agent doesn’t have full observability, it can’t know the exact state it’s in.
Instead, it maintains a belief state — a set of all possible physical states it could
be in, given the sequence of actions and percepts so far

When the agent's percepts provide no information at all, we have what is called a sensorless problem (sometimes a conformant problem). Sensorless agents can be surprisingly useful, primarily because they don't rely on sensors working properly; sensing can also be quite expensive.

How Sensorless Search Works:

• States: sets of physical states (belief states).
• Initial state: typically the set of all possible physical states.
• Actions: the union of the legal actions across the states in the belief state (if illegal actions are harmless), or their intersection (if illegal actions may be dangerous).
• Transition model: predict the new belief state by applying the action to each state in the current belief state.
• Goal test: every state in the belief state must satisfy the goal.
• Path cost: assumed to be uniform across states.

(A small code sketch of this belief-state machinery appears after the pruning discussion below.)
Optimization:

"Belief states can be compared using subsets/supersets for pruning."

In conformant planning, you're not always sure about the initial state, so you work with belief states, which are sets of possible world states.

• A belief state A is a subset of belief state B if everything that could be true in A is also possible in B.
• So, if you have already explored B, there's no need to explore A separately, because A doesn't add any new challenges.

👉 This helps prune the search space, because you can avoid redundant effort by
skipping belief states that are "less general" (i.e., subsets) of ones you've already
handled.
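A small sketch of the sensorless formulation above for the familiar two-cell vacuum world. The state encoding (agent location plus two dirt flags) is my own choice for illustration.

from itertools import product

# A physical state: (agent_location, dirt_left, dirt_right), with location in {"L", "R"}.
STATES = [(loc, dl, dr) for loc, dl, dr in product("LR", [True, False], [True, False])]

def result(state, action):
    loc, dl, dr = state
    if action == "Right": return ("R", dl, dr)
    if action == "Left":  return ("L", dl, dr)
    if action == "Suck":  return (loc, dl and loc != "L", dr and loc != "R")
    return state

def predict(belief, action):
    """Belief-state transition: apply the action to every state we might be in."""
    return frozenset(result(s, action) for s in belief)

def goal_test(belief):
    """The goal (everything clean) must hold in EVERY state of the belief state."""
    return all(not dl and not dr for (_, dl, dr) in belief)

b = frozenset(STATES)                       # initial belief: could be anywhere, any dirt
for a in ["Right", "Suck", "Left", "Suck"]:
    b = predict(b, a)
print(goal_test(b), len(b))                 # True 1: coerced into the single all-clean state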

Incremental Belief-State Search:

"Build a solution that works for all initial states, checking one at a time."

Here, the idea is to:

1. Start with an empty plan.
2. Take one possible initial state (from the belief state).
3. Try to build a plan that works for that state.
4. Then take the next state, and update or expand the plan so it still works for both.
5. Keep doing this until the plan works for all possible initial states.

You're incrementally building a universal plan, one initial state at a time. If at some point you can't extend the plan to a new initial state, you backtrack or revise.
Searching with Observations
Now we’re in the world of partial observability:
The agent doesn't know everything, but it gets some sensory feedback
(percepts).

Here’s how it works:

Step 1: Prediction

"Estimate next belief state via action."

• The agent performs an action, say move forward.
• Because it's uncertain about the world, it predicts what the new belief state would be after that action.
• The result is a new belief state, the set of all possible world states the agent might now be in.

Step 2: Observation Prediction

"Determine all possible percepts from this predicted belief state."

• After acting, the agent gets a percept (e.g., "I see a wall" or "I hear a beep").
• But since the belief state contains many possible world states, we consider: what are all the possible percepts I could get from this belief state?
• This gives a set of possible observations, depending on which actual state the agent was in.

Step 3: Update

"For each possible percept, refine the belief state."

• Once the actual percept is received, the agent filters the belief state.
• It keeps only the states that could have produced that percept.
• This shrinks the belief state: the agent becomes more certain about where it might be (see the sketch below).
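The three steps can be written almost directly as functions, reusing the two-cell vacuum world. The percept model (the agent senses only its own location and whether its current square is dirty) is an assumed example, not from the text.

def result(state, action):
    loc, dl, dr = state
    if action == "Right": return ("R", dl, dr)
    if action == "Left":  return ("L", dl, dr)
    if action == "Suck":  return (loc, dl and loc != "L", dr and loc != "R")
    return state

def percept(state):
    loc, dl, dr = state
    return (loc, dl if loc == "L" else dr)           # (location, dirty here?)

def predict(belief, action):                          # Step 1: prediction
    return {result(s, action) for s in belief}

def possible_percepts(belief):                        # Step 2: observation prediction
    return {percept(s) for s in belief}

def update(belief, observed):                         # Step 3: update
    return {s for s in belief if percept(s) == observed}

belief = {("L", True, True), ("L", True, False)}      # unsure whether Right is dirty
b_hat = predict(belief, "Right")                      # now at R, dirt unchanged
print(possible_percepts(b_hat))                       # {('R', True), ('R', False)}
print(update(b_hat, ("R", False)))                    # {('R', True, False)}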
Solving partially observable problems
 Use AND–OR search on belief-state space.

 The solution is a conditional plan based on updated belief states, not actual
states.

 Optimization techniques like subset checking and incremental search also apply
here.

Agents for Partially Observable Environments


In partially observable environments, agents don’t know exactly what state
they’re in — they only have partial information.

So, they need to handle uncertainty in a smarter way.

There are two main differences:

• First, the solution to a problem will be a conditional plan rather than a sequence; if the first step is an if–then–else expression, the agent will need to test the condition in the if-part and execute the then-part or the else-part accordingly.
• Second, the agent will need to maintain its belief state as it performs actions and receives percepts.
ONLINE SEARCH AGENTS AND UNKNOWN ENVIRONMENTS
Offline vs. Online Search

 Offline Search:
o Computes a complete solution before interacting with the
environment.
o Suitable for static, deterministic domains where computation time
isn't a penalty.
 Online Search:
o Interleaves computation and action: Takes an action, observes the
result, then decides the next action.
o Beneficial in:
 Dynamic or semidynamic domains: There is a time penalty for
long computations.
 Nondeterministic domains: Allows agent to focus on actual
contingencies instead of unlikely ones.
 Unknown environments: Agent learns by exploring
(experimentation).

Online Search Problems


An online search problem must be solved by an agent executing actions, rather
than by pure computation.

Assumptions:

• Deterministic and fully observable environment.
• The agent only knows:
  o ACTIONS(s): the actions available in state s.
  o c(s, a, s′): the step-cost function (usable only after s′ has been reached).
  o GOAL-TEST(s): whether s is a goal state.
• The agent does NOT know RESULT(s, a) without actually performing a in s.
• It may have access to an admissible heuristic function h(s), e.g., Manhattan distance.
Typically, the agent’s objective is to reach a goal state while minimizing cost.
(Another possible objective is simply to explore the entire environment.) The cost
is the total path cost of the path that the agent actually travels.

We then compare the actual path cost with the optimal path cost (the cost of the path the agent would follow if it already knew the state space). This ratio is called the competitive ratio; we would like it to be as small as possible.

Limitations:

• Dead ends: irreversible actions can trap the agent. If some actions are irreversible, i.e., they lead to a state from which no action leads back to the previous state, the online search might accidentally reach a dead-end state from which no goal state is reachable.
• Adversary argument: an adversary can design state spaces that force the agent into suboptimal decisions. Two state spaces can be constructed that look identical to an online search algorithm that has visited only states S and A, so the algorithm must make the same decision in both; therefore it will fail in one of them. This is an example of an adversary argument.
Safely explorable environments:

o No dead ends: every reachable state can eventually reach a goal.
o Often represented as undirected graphs (e.g., mazes, 8-puzzles).

Unbounded Competitive Ratio:

 Even with reversible actions, paths of unbounded cost can lead to arbitrarily bad efficiency .
 Hence, algorithm performance is measured in terms of state space size, not depth of the goal.

Online search agents


 After each action, agent receives a percept indicating its current state.
 The agent maintains a map of its environment using this information
 The current map is used to decide where to go next

Interleaving of planning and action means that online search algorithms are quite different from offline search algorithms. An offline algorithm such as A∗ can expand a node in one part of the space and then immediately expand a node in another part of the space, because node expansion involves simulated rather than real actions.

An online algorithm, on the other hand, can discover successors only for a node
that it physically occupies. To avoid traveling all the way across the tree to expand
the next node, it seems better to expand nodes in a local order. Depth-first search
has exactly this property because (except when backtracking) the next node
expanded is a child of the previous node expanded.
Online-DFS-Agent:

• Based on depth-first search with physical backtracking.
• Behavior:
  o Tries all unexplored actions from the current state.
  o If none are left, it backtracks to the predecessor state.
  o Stops when no unexplored actions or predecessor states remain.
• The algorithm keeps a table that lists, for each state, the predecessor states to which the agent has not yet backtracked. If the agent has run out of states to which it can backtrack, then its search is complete (see the sketch below).
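A sketch of such an agent in Python. The calling convention (the environment feeds the agent its observed state and executes the returned action) and the tiny three-cell corridor used to drive it are assumptions for illustration.

class OnlineDFSAgent:
    def __init__(self, actions, goal_test):
        self.actions, self.goal_test = actions, goal_test
        self.result = {}           # learned map: result[(s, a)] = state actually reached
        self.untried = {}          # actions not yet tried in each state
        self.unbacktracked = {}    # predecessors not yet backtracked to
        self.s = self.a = None     # previous state and action

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return None                                    # stop: goal reached
        if s_prime not in self.untried:
            self.untried[s_prime] = list(self.actions(s_prime))
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime        # record what the action did
            self.unbacktracked.setdefault(s_prime, []).insert(0, self.s)
        if self.untried[s_prime]:
            a = self.untried[s_prime].pop()                # try an unexplored action
        elif self.unbacktracked.get(s_prime):
            back = self.unbacktracked[s_prime].pop(0)      # physically backtrack
            a = next(b for (st, b), r in self.result.items()
                     if st == s_prime and r == back)
        else:
            return None                                    # exploration complete
        self.s, self.a = s_prime, a
        return a

# Tiny undirected "corridor" A - B - C to drive the agent (assumed layout).
edges = {"A": {"right": "B"}, "B": {"left": "A", "right": "C"}, "C": {"left": "B"}}
agent = OnlineDFSAgent(lambda s: list(edges[s]), lambda s: s == "C")
s = "A"
while (a := agent(s)) is not None:
    s = edges[s][a]
print("reached", s)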

Performance:

• In the worst case, it traverses each link twice, which is optimal for exploration.
• It can have a high competitive ratio if the goal is near the start but the agent explores faraway branches first.

Limitations:

• Works only in reversible state spaces.
• No online search algorithm has a bounded competitive ratio for irreversible environments.

Online local search


Hill-Climbing:

• Already online in nature: it maintains just one current state.
• Weakness: it gets stuck in local maxima, and random restarts cannot be used because the agent cannot transport itself to a new state.

Random Walk:

• A random action is selected at each step.
• A random walk will eventually find a goal or complete its exploration, provided that the space is finite.
• It can, however, be very slow in some state spaces.
Learning Real-Time A* (LRTA*):

• Combines hill climbing with learned cost estimates H(s).
• H(s) starts out as just the heuristic estimate h(s) and is updated as the agent gains experience in the state space.
• Like ONLINE-DFS-AGENT, it builds a map of the environment in a result table. The agent updates H(s) based on the cost of moving to a neighbor plus the neighbor's estimated distance to the goal, and then chooses the "apparently best" move according to its current cost estimates.
• Actions that have not yet been tried in a state s are always assumed to lead immediately to the goal with the least possible cost, namely h(s). This optimism under uncertainty encourages the agent to explore new, possibly promising paths (see the sketch below).
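A sketch of an LRTA*-style agent. The problem interface (actions, step cost, heuristic h, goal test) and the one-dimensional corridor used to exercise it are illustrative assumptions, not values from the text.

class LRTAStarAgent:
    def __init__(self, actions, cost, h, goal_test):
        self.actions, self.cost, self.h, self.goal_test = actions, cost, h, goal_test
        self.H = {}                # learned estimate of cost-to-goal for each state
        self.result = {}           # learned map: result[(s, a)] = s'
        self.s = self.a = None

    def lrta_cost(self, s, a, s_prime):
        if s_prime is None:
            return self.h(s)       # untried action: optimistically assume cost h(s)
        return self.cost(s, a, s_prime) + self.H.get(s_prime, self.h(s_prime))

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return None
        self.H.setdefault(s_prime, self.h(s_prime))
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            # update H of the previous state: best one-step lookahead from there
            self.H[self.s] = min(
                self.lrta_cost(self.s, b, self.result.get((self.s, b)))
                for b in self.actions(self.s))
        # choose the "apparently best" action from the current state
        self.a = min(self.actions(s_prime),
                     key=lambda b: self.lrta_cost(s_prime, b,
                                                  self.result.get((s_prime, b))))
        self.s = s_prime
        return self.a

# Tiny 1-D corridor 0..5 with the goal at 5 (illustrative driver loop).
agent = LRTAStarAgent(actions=lambda s: [-1, +1],
                      cost=lambda s, a, s2: 1,
                      h=lambda s: abs(5 - s),
                      goal_test=lambda s: s == 5)
s = 0
while (a := agent(s)) is not None:
    s = max(0, min(5, s + a))
print("goal reached at", s)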

Advantages:

• Finds the goal in finite, safely explorable environments.
• May take up to O(n²) steps in the worst case, but usually performs much better.

Limitations:

• Not complete in infinite state spaces (it can get lost indefinitely).
• Only guaranteed to find the goal in finite, safely explorable environments.

Learning in online search


First, the agents learn a “map” of the environment, the outcome of each action in
each state—simply by recording each of their experiences

Second, the local search agents acquire more accurate estimates of the cost of each state by using local updating rules, as in LRTA*. These updates eventually converge to exact values for every state, provided that the agent explores the state space in the right way. Once exact values are known, optimal decisions can be taken simply by moving to the lowest-cost successor; that is, pure hill climbing is then an optimal strategy.
