
Foundations of Artificial Intelligence

Efficiency of Agent Algorithms and Reinforcement Learning in Maze Environments

Group 11
Matthew Kucas, Timothy Maradeo, Josue Perez

AI 801 (Foundations of Artificial Intelligence)
Spring 2023
Work carried out by:

Matthew Kucas ([email protected])
  - A* implementation
  - A* writing
  - Presentation
  - Assistance with BFS and other sections

Timothy Maradeo ([email protected])
  - BFS implementation
  - DFS & BFS research and writing
  - Implementation writing, revision, and inclusion of figures
  - Conclusion data acquisition and writing

Josue Perez ([email protected])
  - DFS implementation and writing
  - Created maze environment
  - RL implementation
  - RL writing
https://github.com/jochuchemon7/Maze_Project_AI801.git
TABLE OF CONTENTS

1  Abstract
2  Introduction
3  Maze Environment
   3.1  Research
   3.2  Implementation
4  Depth-First Search
   4.1  Research
   4.2  Implementation
5  Breadth-First Search
   5.1  Research
   5.2  Implementation
6  Informed Searches: A* and Modified Depth-First
   6.1  Research
   6.2  Implementation
7  Reinforcement Learning
   7.1  Research
   7.2  Implementation
8  Data Table and Conclusions
9  Bibliography
-------------------------------------------------------------------------------------------------------------------------------

The Efficiency of Agent Algorithms and Reinforcement Learning in Maze Environments


Matthew Kucas, Timothy Maradeo, Josue Perez
[email protected], [email protected], [email protected]
AI 801, Spring 2023
--------------------------------------------------------------------------------------------------------------------
1: Abstract
Maze environments, or two-dimensional geometric arrays, provide a simple space
through which the run-time, relative cost, and memory usage of different algorithms may be
tested and compared. Maze-solving is an effective problem set for designing and testing
algorithms since actions (i.e., movements through the maze) are easy to view and interpret and
are intuitively relatable to human problem-solving approaches. This paper seeks to analyze
different search algorithms and a reinforcement learning method in terms of their relative costs in
a maze environment. Uninformed search algorithms (Depth-first Search and Breadth-first
Search), informed search algorithms (A* Search and a modified Depth-First Search with
Heuristic), and Reinforcement Learning are discussed in the context of modern research and data
compiled through implementation within the maze environment. As expected, providing
information about the environment through use of a heuristic based on distance to goal improves
search algorithm efficiency. We further demonstrate that an agent can readily learn the direct
pathway through a maze using Reinforcement Learning.

2: Introduction
The focus of this report is the ability of maze-solving agents to traverse simple two-dimensional arrays using different methodologies whose efficiencies can be directly compared. In modern computer science, this task is typically known as “tree traversal,” a process in which an agent expands (or visits) a selected set of nodes in a particular linear order.
To perform this computation, a program creates a stack or queue of candidate nodes to visit, with the ordering depending on which search method or algorithm is used. The nodes are then processed, typically by a recursive function. Figure 1 displays the manner in which search trees are visualized through multiple levels, or depths, of potential nodes. As node 1 (the parent node) is expanded, nodes 2 and 3 (child nodes) are queued next as potential nodes to expand, and the process continues through the whole tree until either all nodes are expanded or a desired node is found. This process is most easily represented as a single list used as a stack or queue (the frontier), which holds the full set of candidate nodes awaiting expansion.
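To make the role of the frontier concrete, the short sketch below (illustrative Python, not taken from our project code) shows how the same expansion loop produces depth-first or breadth-first behavior depending only on whether the frontier is treated as a stack or a queue:

```python
# Illustrative sketch: the same expansion loop behaves as depth-first or
# breadth-first search depending on how the frontier is popped.
from collections import deque

def traverse(tree, root, use_stack=True):
    """Expand nodes of `tree` (a dict: node -> list of children) from `root`."""
    frontier = deque([root])
    order = []
    while frontier:
        node = frontier.pop() if use_stack else frontier.popleft()
        order.append(node)
        frontier.extend(tree.get(node, []))
    return order

tree = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
print(traverse(tree, 1, use_stack=True))   # depth-first expansion order
print(traverse(tree, 1, use_stack=False))  # breadth-first expansion order
```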
This tree-search methodology forms the basic structure for the maze problem’s Depth-first Search, Breadth-first Search, A* Search, and modified Depth-first Search with heuristic implementations, each differing only in how nodes are selected for expansion. The shared structure of these search methods allows quantities to be compared, such as the number of nodes expanded before a solution is found, calculated as the total count of nodes explored before the goal node is reached.
Our implementation of Reinforcement Learning uses a separate maze environment with a
more restricted size. This size restriction is necessary to allow the program to run numerous
training iterations so that the agent can learn the path to the goal node. While Reinforcement
Learning is a more complicated process than the other search methods, it provides an interesting
point of comparison for maze solving strategies with different levels of complexity.
Regardless of the implementation method under study, the formulation of the maze
problem remains consistent:
States: the individual maze cells (nodes) - nx
Initial State: the starting node, generated in the top-left corner (always node 1) - n1
Transition Model: movement from one node to an adjacent node
Goal State: the ending node, generated in the bottom-right corner - ng
Action Cost: 1 unit per move (held constant for consistency)
According to this formulation, any maze-solving agent’s goal is to move from the initial state n1 to the goal state ng with the lowest total action cost, that is, the smallest number of nodes nx expanded. As the complexity of the maze environment increases, a larger cost is likely to be incurred overall; however, each search method will differ in how its cost scales.
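A minimal encoding of this formulation is sketched below. The class and method names (MazeProblem, actions, action_cost, is_goal) are hypothetical, chosen only for illustration; they do not correspond to identifiers in our repository.

```python
# Hypothetical encoding of the maze problem formulation described above.
class MazeProblem:
    def __init__(self, grid, start, goal):
        self.grid = grid      # 2D list: 0 = passage, 1 = wall
        self.start = start    # initial state n1, e.g. a passage cell near the top-left
        self.goal = goal      # goal state ng, e.g. a passage cell near the bottom-right

    def actions(self, state):
        """Transition model: valid moves up, down, left, or right into passages."""
        r, c = state
        moves = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [(nr, nc) for nr, nc in moves
                if 0 <= nr < len(self.grid) and 0 <= nc < len(self.grid[0])
                and self.grid[nr][nc] == 0]

    def action_cost(self, state, next_state):
        return 1  # every move costs one unit

    def is_goal(self, state):
        return state == self.goal
```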
This report presents both relevant abstract research and practical implementation of each
search method in the maze environment. It concludes with a summary of data generated by the
program, whether these findings are consistent with relevant research, and what the data indicate
for maze problems and search algorithms in general.

3: The Maze Environment


3.1: Research
The character of the maze environment is determined by the structure produced during node generation. A randomized Prim’s algorithm addresses the problem of constructing a maze by linking terminals with the smallest possible total length (Prim 1957, 1380).
Many of our search algorithms require a finite and connected graph for implementation (Even 2011, 46). In this scenario, the solution will be randomly placed within the enclosed graph, and the agent must traverse the graph (expanding nodes) without repeating itself and without prior knowledge of the maze environment. In the context of the maze problem, dead-ends branching off a given direction should be more likely to contain the goal state (or solution) than surface-level nodes corresponding to the first few turns the agent makes, because the structure of a maze forces multiple twists and turns down potential paths rather than straight pathing.
Prim’s algorithm, by the nature of its design, does not generate any cycles or loops when creating the maze. This is because Prim’s algorithm constructs a minimum spanning tree, which cannot contain a cycle: the algorithm always steps toward new unvisited nodes, keeping the shortest available edge and selecting the closest node that is not yet in the tree.
A randomized Prim’s algorithm generates our maze environments. Our grid starts with every element marked as a wall. Next, we pick a cell at random and record it as a passage cell. We then add the walls of that cell to a wall list, which will ultimately include all cells lying adjacent to the randomly selected cells that become part of the maze. We then loop for as long as there are wall cells in the wall list. Inside the loop we pick a random wall cell from the list; if only one of the cells that the wall divides has been visited, we mark the wall as a passage cell, mark the unvisited cell as part of the maze, and add the adjacent walls of that new cell to the wall list. Otherwise, if both cells that the wall divides have already been visited, we simply remove the wall from the list.
The rightmost passage element along the bottom row of the maze is selected as the goal or target node. Similarly, the leftmost passage element along the top row of the maze is selected as the starting node.
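The sketch below outlines this generation procedure in Python. It follows the steps described above but is a simplified illustration rather than our exact implementation; it assumes odd grid dimensions and represents each candidate wall as a pair of cells rather than as a separate wall-cell list.

```python
import random

def generate_maze(nrow, ncol):
    """Randomized Prim's algorithm sketch: 'w' = wall, 'p' = passage.
    Assumes nrow and ncol are odd so passage cells sit on odd coordinates."""
    maze = [['w'] * ncol for _ in range(nrow)]

    def neighbors(r, c):
        # Candidate cells two steps away, staying inside the outer border.
        return [(nr, nc) for nr, nc in [(r - 2, c), (r + 2, c), (r, c - 2), (r, c + 2)]
                if 0 < nr < nrow - 1 and 0 < nc < ncol - 1]

    # Pick a random interior cell, mark it as a passage, and seed the wall list.
    start = (random.randrange(1, nrow - 1, 2), random.randrange(1, ncol - 1, 2))
    maze[start[0]][start[1]] = 'p'
    walls = [(start, nb) for nb in neighbors(*start)]

    while walls:
        (r, c), (nr, nc) = walls.pop(random.randrange(len(walls)))
        if maze[nr][nc] == 'w':                        # only one side visited so far
            maze[(r + nr) // 2][(c + nc) // 2] = 'p'   # knock down the dividing wall
            maze[nr][nc] = 'p'                         # add the new cell to the maze
            walls.extend(((nr, nc), nb) for nb in neighbors(nr, nc))
        # otherwise both sides are already passages and the wall is simply discarded
    return maze
```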

3.2: Implementation
There are a number of key fields significant to the implementation of our maze
environment. These play a role in most search algorithms:
Maze: a randomly generated matrix of ‘w’ (wall) and ‘p’ (passage) entries that defines the possible pathways for the agent. These are converted into 1s and 0s so they can be tested by the agent as it searches for new nodes to expand. The environment must contain at least one possible pathway at all times. Commonly, the agent will find multiple possible pathways as it moves through the maze, if they exist, and it is the job of the search algorithm to decide which path takes preference (or which paths, as in the case of BFS). Note that in our implementations there is only one pathway through the maze from start to goal. Figure 2 shows the terminal output of this maze data representation.
Visited: a list of all nodes in the maze and whether or not they have been visited by the agent.
Node: the current node that the agent “occupies.” This begins as the start node.
Target_node: the node that the agent must reach to exit the maze and find a solution.
Order: the order in which nodes are expanded. This data is used for animation.
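A hypothetical skeleton tying these fields together might look like the following; the attribute names mirror the descriptions above, but the class itself is illustrative rather than our project's actual code.

```python
import numpy as np

class MazeEnvironment:
    """Illustrative skeleton of the environment fields described above."""
    def __init__(self, maze_chars, start, target_node):
        # 'w'/'p' characters converted to 1s (walls) and 0s (passages)
        self.maze = np.array([[1 if ch == 'w' else 0 for ch in row]
                              for row in maze_chars])
        # Visited: whether each node has been visited by the agent
        self.visited = np.zeros_like(self.maze, dtype=bool)
        self.node = start                # node currently occupied by the agent
        self.target_node = target_node   # node the agent must reach
        self.order = []                  # order in which nodes are expanded (for animation)
```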
Our implementation of the maze
environment makes use of matplotlib.pyplot,
numpy, and matplotlib.animation packages to
create the visual representation of our maze, as
depicted in Figure 3. Every generation of this
environment will be random, meaning there
will always be a different output for pathing
possibilities for each search algorithm. This
allows for the collection of quantitative data
including average number of nodes expanded in
the Visited list. Additionally, the starting node
and Target_node remain the same in each
implementation so that the efficiency of pathing
is tested – having a target_node randomly
generated next to the starting node would throw
off the data with major outliers. Overall, this
maze strategy provides a useful environment
for manipulation.

4: Depth-first Search
4.1: Research
The depth-first search works by expanding the first node it finds, continuing down each
depth-layer until it reaches the depth farthest from the top, then backing up until all nodes are
expanded or a solution is found.
Depth-first searches can yield important
information about the structure of the entire graph
by revealing potential values for the lowest depths
or edges associated with lower depths (Cormen
2001, 603). Depth-first searches are more likely to find goal nodes located at greater depths more quickly than other search methods; however, this strength becomes a drawback when the goal node happens to lie at a shallow, surface-level depth. DFS is the most likely of all the tested search algorithms to follow the natural path of a human solving a maze, whereby the solver checks long pathways rather than turning around each corner near the surface.

Getting stuck in an infinite loop among state spaces is a potential risk of implementing ordinary DFS in maze environments (Russell 2021, 3.4.3). To avoid this pitfall, an algorithm may need to implement a check on each cycle for repeated nodes. The depth-limited search algorithm, in which a limit is placed on the maximum depth l, could also prevent an infinite loop by stopping any path that repeats uncontrollably (Russell 2021, 3.4.4). In this scenario, the depth limit could be determined based on the relative complexity of the maze. If the limit is set too low, this method could fail to find a solution by cutting off pathways to the farthest depths where a goal state may appear. Ultimately, it may be desirable to set a maximum depth limit so that any search that does not find a goal state within, for example, several hundred steps must investigate a different potential path. Because our maze environment is created with Prim's algorithm and therefore contains no cycles, we should not need to deal with this risk of infinite loops.
Depth-first search is a tricky method: it can readily find a low-cost solution provided the risks described above are minimized. One implementation of DFS that optimizes time, space, and solution cost is depth-first iterative deepening (DFID). DFID performs repeated DFS iterations, working through each depth limit in turn until a solution is identified. Because it always performs a depth-first search, the space requirement is O(d), where d represents the depth (Korf 1985, 99). Additionally, because DFID expands all nodes at the current depth limit before moving to a greater one, it will always find a shortest-length solution (Korf 1985, 100). This solves a problem that is likely to arise in ordinary DFS, where an agent may find suboptimal solutions, such as tracing around the goal state rather than moving straight to it. The only drawback of DFID is the repeated computation that must occur for each iteration; however, as Poole discusses, this does not add up significantly even at higher branching factors (Poole 2017, 3.5.3).
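For illustration, a compact sketch of DFID is given below. It reuses the hypothetical MazeProblem interface sketched in the Introduction and is not part of our implementation.

```python
def depth_limited_search(problem, node, limit, path=None):
    """Recursive depth-limited DFS; returns a path to the goal or None."""
    path = (path or []) + [node]
    if problem.is_goal(node):
        return path
    if limit == 0:
        return None
    for child in problem.actions(node):
        if child not in path:  # avoid cycling back along the current path
            result = depth_limited_search(problem, child, limit - 1, path)
            if result is not None:
                return result
    return None

def iterative_deepening_search(problem, max_depth=1000):
    """DFID: repeat depth-limited searches with an increasing depth limit,
    so the shallowest (shortest) solution is found first."""
    for limit in range(max_depth):
        result = depth_limited_search(problem, problem.start, limit)
        if result is not None:
            return result
    return None
```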

4.2: Implementation
The depth-first search is one of the simplest algorithms to implement due to its short, recursive (or iterative) structure. In our maze environment, the DFS algorithm iteratively checks neighboring nodes, moving to a neighbor one level deeper and repeating the process until the goal node is found. In the maze environment, each potential movement (up, right, left, and down) is treated as a candidate node; candidates are then filtered according to whether they have been visited, whether they are walls, and whether they fall outside the maze boundaries. In the DFS, as soon as a node is validated it is expanded, followed by its first valid child node, and so on. Figure 6 displays this structure.
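A simplified sketch of this stack-based DFS is shown below. It assumes the maze has already been converted to a 0/1 grid (0 = passage, 1 = wall) and uses coordinate tuples for nodes; it illustrates the logic rather than reproducing our code.

```python
def dfs(maze, start, goal):
    """Iterative depth-first search over a 0/1 grid (0 = passage, 1 = wall).
    Returns the order in which nodes were expanded. Illustrative sketch only."""
    nrow, ncol = len(maze), len(maze[0])
    visited = set()
    stack = [start]
    order = []
    while stack:
        r, c = stack.pop()
        if (r, c) in visited:
            continue
        visited.add((r, c))
        order.append((r, c))
        if (r, c) == goal:
            return order
        # Treat each potential move (up, right, left, down) as a candidate node,
        # filtering out walls, visited cells, and positions outside the maze.
        for nr, nc in [(r - 1, c), (r, c + 1), (r, c - 1), (r + 1, c)]:
            if 0 <= nr < nrow and 0 <= nc < ncol \
                    and maze[nr][nc] == 0 and (nr, nc) not in visited:
                stack.append((nr, nc))
    return order
```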

This implementation generates a sequence of DFS iterations until the goal node is found, and a characteristic pattern emerges in the environment. As seen in Figure 7, long pathing without expanding the nodes closest to the starting point is typical of this type of implementation. In most run-throughs, DFS tends to expand about half of the entire environment before finding a solution. Figure 7 clearly shows how many surface levels are ignored as the maze-solving agent comes closer to finding a solution. Because the solution is found at such a deep level, depth-limited search or depth-first iterative deepening would be impractical to implement here. For quantitative analyses of DFS, see the Conclusions.

5: Breadth-first Search
5.1: Research
While the breadth-first search is known as the simplest search method, it is not likely to solve the maze problem as efficiently as the depth-first search method. This is because all potential nodes at a given depth must be expanded before moving to lower depths, a process fulfilled by simple while and for loops (Cormen 2001, 594). In the worst case, this process incurs O(b^d) runtime and memory cost, where b is the branching factor and d the solution depth (Korf 1985, 99). Figure 8 shows the pathing of this algorithm.

In the context of a complex maze environment, we are more concerned with memory and time usage than node expansion cost. One strength of BFS is its capability to find the cost-optimal path by expanding all nodes at each depth of the search tree on the path to the goal state (Russell 2021, 3.4.1). A maze does not necessarily induce a cost for making a turn in the same way that, for instance, a unique cost may be applied to a car finding directions to and from a city. Theoretically, the cost of each move will be equal, meaning the most impactful qualities of our maze will be the memory and time consumption of the search method itself, especially as the maze complexity increases.
An additional consideration associated with BFS is the starting point of the search. The
traditional top-down BFS is restricted by many of the costs mentioned above. However, a
bottom-up search has the potential to reduce unnecessary node-searches in instances where top
depths are very broad (or numerous in potential node expansions) since it does not require
expanding all neighbors when a parent node is found (Beamer 2013, 3). This is likely to produce high variation in the efficiency of individual BFS runs.

5.2: Implementation
Korf draws attention to the unscalable property of the BFS search method, as a modern
computer performing and storing a million states per minute could quickly exhaust its storage
capability in a matter of minutes (Korf 1985, 99). As a result, BFS may be best implemented in
small and less complex maze problems. Our implementation reflects these limitations.
In Figure 9, one can see that the relative complexity of Breadth-first searches in maze
environments is greater than that of Depth-first searches. This is because as nodes are expanded
in the environment, a situation may occur where there are various possible valid paths (left, right,
up, or down) that expand into another set of various paths – a list containing a list. Therefore, to
queue nodes in the proper order for BFS, multiple lists must be checked – here node_list,
new_node_list, and each node itself (since our maze remembers nodes as two-integer lists).
Figure 10 shows the pattern this creates.
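A simplified sketch of the same logic using a single FIFO queue (rather than the node_list/new_node_list pair in our code) is shown below for illustration:

```python
from collections import deque

def bfs(maze, start, goal):
    """Breadth-first search over a 0/1 grid; expands every node at one depth
    before moving to the next. Illustrative sketch, not our node_list version."""
    nrow, ncol = len(maze), len(maze[0])
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        r, c = queue.popleft()           # FIFO order gives level-by-level expansion
        order.append((r, c))
        if (r, c) == goal:
            return order
        for nr, nc in [(r - 1, c), (r, c + 1), (r, c - 1), (r + 1, c)]:
            if 0 <= nr < nrow and 0 <= nc < ncol \
                    and maze[nr][nc] == 0 and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append((nr, nc))
    return order
```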
This search method in maze
environments can clearly be seen to expand
almost all nodes (or possible pathways) before
reaching the goal node. As a result, its runtime
efficiency will be drastically lower than any
other search method. As the complexity of the
maze environment increases, BFS will be
unable to scale, and efficiency will plummet. The
only advantage of Breadth-first search is that it
will expand surface nodes quickly, meaning if
the solution happens to be near the surface, the
agent will not skip over it as is likely to occur
in the case of Depth-First search. Figure 10
represents the worst-case scenario in which the
goal node (green dot in the bottom right corner)
is opposite the start node, requiring a full
expansion of the maze. For quantitative
analyses of BFS, see Conclusions.

6: Informed Searches: A* and modified Depth-First


6.1: Research
Conceptually, the uninformed depth and breadth first searches discussed above are akin
to human pathfinding through an unmapped corn maze – the human agent who walks into the
maze cannot observe the full environment and “plot” a course ahead of time. On the other hand,
human maze-play on paper is an informed search since the pathfinding human agent can view
and plan a course through the fully observable environment before putting pencil to paper. Best-first searches informed by distance-based heuristics, particularly the A* search, provide a
programmatic mechanism to efficiently identify pathways through a maze (Russell and Norvig
2021).
A* applies an evaluation function, f(n) = g(n) + h(n), to determine which child nodes to open in the search for a minimum-cost path between the initial and goal states, as seen in Figure 11. The quantity g(n) represents the total path cost of traversing from the initial state to a node along the search path, h(n) represents the estimated cost between any given node and the goal state, and f(n) represents the total estimated cost along possible paths between the initial state and the goal state. Selecting an appropriate heuristic to determine h(n) values is a critical step in A* implementation. The h(n) values should underestimate the actual path cost to ensure that the A* algorithm finds the optimal path to a solution. However, the number of nodes explored in the search is larger when h(n) values are lower, so h(n) values should reflect the actual along-path cost from node n to the goal state as closely as possible to optimize search efficiency (Dechter and Pearl 1985).
For a large search space, the applicability of a standard A* search is subject to computer
memory limitations (Russell and Norvig 2021). However, this limitation should not be a concern
for the search space associated with a relatively simple maze design. Our primary interest in
implementing A* lies in the selection of an appropriate heuristic. Since motions through a simple
maze are straight line and orthogonal, Manhattan distance emerges as a simple admissible
heuristic for selection (Foead et al. 2021). For more complex maze environments, recent research
indicates that training neural networks with simulated mazes represents the state-of-the-art
approach for optimizing heuristic functions (Chrestien et al. 2022; Chrestien et al. 2021).

6.2: Implementation
The most important aspect of the A* implementation is defining the cost function evaluated as the search traverses nodes, as seen in Figure 12. In our implementation, the path cost for each move from one node to the next is equal to one, and Manhattan distance is applied as the heuristic.

The impact of information in the form of an appropriate heuristic is clearly visible in our implementation of the A* search algorithm. The algorithm uses a priority queue data structure, adding and storing unvisited nodes adjacent to previously visited nodes in the queue. A selection process is applied for each move to a new node: the algorithm evaluates the evaluation function f(n) for all nodes on the frontier and systematically moves to check the node with the lowest associated f(n) value. The search tends to follow the current branch toward the goal node for long stretches, until the path either encounters the goal node, turns away from it, or dead-ends. Shifts up the search tree, even close to the starting node, occur as the search reorients to find the likely optimal path. In many cases, however, node exploration higher in the tree (closer to the starting node) is limited, and the search returns to deeper levels.
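A compact sketch of a priority-queue A* with unit step cost and the Manhattan heuristic is shown below. The function names are illustrative, and the sketch omits details of our implementation such as path reconstruction and animation bookkeeping.

```python
import heapq

def manhattan(node, goal):
    """Admissible heuristic h(n) for orthogonal moves on a grid."""
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

def a_star(maze, start, goal):
    """A* with unit path cost g(n) and Manhattan-distance h(n) over a 0/1 grid.
    Returns the expansion order; illustrative sketch only."""
    nrow, ncol = len(maze), len(maze[0])
    frontier = [(manhattan(start, goal), 0, start)]   # entries are (f, g, node)
    best_g = {start: 0}
    order = []
    while frontier:
        f, g, (r, c) = heapq.heappop(frontier)        # node with the lowest f(n)
        if g > best_g.get((r, c), float('inf')):
            continue                                  # stale queue entry; skip it
        order.append((r, c))
        if (r, c) == goal:
            return order
        for nr, nc in [(r - 1, c), (r, c + 1), (r, c - 1), (r + 1, c)]:
            if 0 <= nr < nrow and 0 <= nc < ncol and maze[nr][nc] == 0:
                new_g = g + 1                         # every move costs one unit
                if new_g < best_g.get((nr, nc), float('inf')):
                    best_g[(nr, nc)] = new_g
                    heapq.heappush(frontier, (new_g + manhattan((nr, nc), goal),
                                              new_g, (nr, nc)))
    return order
```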

While developing the A* search algorithm, we discovered an informed search implementation that
functions like DFS, but with significantly improved
pathfinding due to the use of an evaluation function
and the Manhattan distance heuristic. This
algorithm, which we refer to as modified DFS with
heuristic, also uses a priority queue data structure. In
this case, however, the search continues down the
current branch of the search tree until reaching a
dead end. The search resumes at the next level up
the branch at which the evaluation function is
minimized and continues down that new branch to a
dead end - and so on. In other words, the algorithm
prioritizes frontier nodes lower in the tree - nearer to
the goal - rather than visiting nodes higher in the tree
like A*. This behavior results in exceptional performance for some (but not all) maze
configurations, which is discussed further in the Conclusions section.
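One simple way to approximate this behavior, sketched below for illustration, is to key the priority queue on the Manhattan heuristic alone and break ties in favor of deeper nodes (reusing the manhattan function from the A* sketch above); our actual implementation may differ in its exact ordering.

```python
import heapq

def dfs_with_heuristic(maze, start, goal):
    """Approximation of the modified DFS with heuristic: the frontier is ordered
    by Manhattan distance to the goal, with ties broken toward deeper nodes, so
    the search keeps driving toward the goal and, at a dead end, resumes from
    the most promising frontier node. Hypothetical sketch, not our project code."""
    nrow, ncol = len(maze), len(maze[0])
    frontier = [(manhattan(start, goal), 0, start)]   # entries are (h, -depth, node)
    visited = set()
    order = []
    while frontier:
        h, neg_depth, (r, c) = heapq.heappop(frontier)
        if (r, c) in visited:
            continue
        visited.add((r, c))
        order.append((r, c))
        if (r, c) == goal:
            return order
        for nr, nc in [(r - 1, c), (r, c + 1), (r, c - 1), (r + 1, c)]:
            if 0 <= nr < nrow and 0 <= nc < ncol \
                    and maze[nr][nc] == 0 and (nr, nc) not in visited:
                heapq.heappush(frontier, (manhattan((nr, nc), goal),
                                          neg_depth - 1, (nr, nc)))
    return order
```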

7: Reinforcement Learning
7.1: Research
Reinforcement Learning is a broad topic which, in essence, tries to solve traditional dynamic programming problems by using a state-action function and finding the right policy, rather than splitting a big problem into subproblems and solving them with traditional recursion. The general framework requires an agent that takes an action, chosen from a set of possible actions, inside an environment; the choice of action is determined by the state the agent receives from the environment.
The agent’s action causes changes to the environment in which the agent is operating. In
return, the environment grants a reward to the agent for its action, as well as a new state for the
agent to evaluate in determining its next action. The agent will now be in the new state that the
environment returns, and we can form observations with the current state, action, reward and
next state. Depending on the algorithm used, the agent will use the state and reward differently. In our maze project we decided to use a simple DQN (Deep Q-Network). This algorithm uses a neural network as its Q-function: the network takes in the state passed by the environment and produces what are called Q-values, and the reward is used to evaluate and update the network. The number of Q-values equals the number of actions the agent can take and, depending on the policy, the implementation picks the Q-value paired with the action the agent will take. We also have an optimizer for the neural network to perform gradient descent and improve the model.
The code includes the option to tune the number of training episodes for the simulation as
well as the gamma value (discount factor) which weights the relative impact of future state-
action rewards and those accumulated by the agent at the beginning of the episode. The higher
the discount factor the more that future observations will affect updates to the Q-Function since
reward is multiplied by the discount factor. The policy is what ultimately picks a Q-value and its corresponding action. One example is the epsilon-greedy policy, in which we set a variable epsilon to a value between 0 and 1 and steadily decrease it after each episode, usually by one divided by the number of episodes. Then, with probability epsilon we pick a random Q-value, and with the complementary probability we pick the maximum Q-value.
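A minimal sketch of this policy and its decay schedule is shown below; the episode count and decay values are illustrative placeholders, not the values used in our experiments.

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action; otherwise pick the action
    whose Q-value is largest. q_values is a 1-D array, one entry per action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

# Linear decay of epsilon over training (illustrative values).
num_episodes = 500
epsilon = 1.0
for episode in range(num_episodes):
    # ... run one episode, selecting actions with epsilon_greedy(q_values, epsilon) ...
    epsilon = max(0.1, epsilon - 1.0 / num_episodes)  # keep a small exploration floor
```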

7.2: Implementation
As discussed above, we decided to use a simple DQN (Deep Q-Network) approach for our maze-solving problem. We first needed to make some modifications to
our maze environment to accommodate the model. One of the most significant changes that we
made to the initial maze was to properly include an agent in our environment. For this step, we
created a new agent class that includes the position and name of the agent as well as a list of cells
visited, whether an agent’s move was valid or invalid, and an invalid movement counter. We also
created a goal class with just the name and position of the goal in the maze. These two classes
were created through a function that added the class objects in a key-value dictionary for the
agent and goal. We also included a step function that takes in the agent’s pending action,
evaluates the action through a check step function and, if the step is determined to be valid,
marks the action as valid and changes the agent’s position. If the step is determined to be invalid,
the step function marks it as invalid, includes the action in a list of invalid cells, and adds one to
the counter. The invalid list and counter are reset when an action is valid.
We also included a reward function that will return a value of –0.7 if the agent’s position
ends on the start node or if it hits a wall, a reward of –0.25 if the new position is a valid one but
already visited, a reward of –0.04 if the new position is valid and not yet visited, and a reward of 1 if
the agent reaches the goal state. The rewards are not greater than 1 or less than –1 because we are
using a Neural Network for our Q-Network, which works best when dealing with values that are
normalized or less than 1. Finally, we modified a self.get_maze_() function so that instead of
returning the [nrow x ncol] matrix we return a [nrow x ncol x 3] matrix. This new matrix
functions as the ‘state’ that the environment will pass to our model, essentially three mazes
representing the agent, goal, and walls respectively as ones and the rest of each grid as zeros.
That way, the model perceives the constant movement of the agent and the goal location.
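The reward scheme just described can be summarized as a small function; the sketch below is illustrative, and the argument names are hypothetical rather than those used in our environment class.

```python
def reward(new_pos, start, goal, hit_wall, already_visited):
    """Reward scheme described above, kept within [-1, 1] for the Q-network.
    Illustrative sketch; argument names are hypothetical."""
    if new_pos == goal:
        return 1.0      # agent reached the goal state
    if hit_wall or new_pos == start:
        return -0.7     # hit a wall or ended back on the start node
    if already_visited:
        return -0.25    # valid move, but the cell was already visited
    return -0.04        # valid move into a new, unvisited cell
```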
That entire [nrow x ncol x 3] grid is flattened into a [1, nrow x ncol x 3] matrix, which is passed to the neural network to generate the Q-values. The epsilon-greedy policy uses these Q-values to pick an action and gather new observations. We set our epsilon value to 1 and decreased it by one over the number of episodes, so that the agent gradually favors the best action for the state it is given, with a floor of 0.1 so that a small probability of picking a random action remains in case a better one can still be found.
We also set a value for the maximum number of steps that the agent can take before the episode is deemed failed or lost. This encourages the agent to find a solution quickly and avoids the agent getting stuck in the maze or, worse, settling on a solution that visits more cells than it needs to.
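Putting these pieces together, a compact, hypothetical Q-network and single update step might look like the sketch below (written here assuming PyTorch; the layer sizes, learning rate, maze dimensions, and discount factor are illustrative placeholders, not our tuned values).

```python
import torch
import torch.nn as nn

# Illustrative sizes: an 11x11 maze, four actions, and a placeholder discount factor.
nrow, ncol, n_actions, gamma = 11, 11, 4, 0.95

# The state is the flattened [1, nrow x ncol x 3] matrix described above;
# the output has one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(nrow * ncol * 3, 128),
    nn.ReLU(),
    nn.Linear(128, n_actions),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def update(state, action, r, next_state, done):
    """One Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    `state` and `next_state` are float tensors of shape [1, nrow * ncol * 3]."""
    q_pred = q_net(state)[0, action]
    with torch.no_grad():
        q_target = r if done else r + gamma * q_net(next_state).max().item()
    loss = loss_fn(q_pred, torch.tensor(float(q_target)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```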
Overall, we found that the DQN algorithm was successful at finding the best path to the goal in small mazes, really anything smaller than 11x11. We considered incorporating experience replay and a target network, but we ran out of time. We also considered different algorithms such as PPO and Actor-Critic, but we concluded that a simple DQN could do just fine. A more important change that might have made a difference was the input to the model itself: instead of passing a [nrow x ncol x 3] matrix, we could have passed the maze in its color-coded values and run convolutional layers before the linear and nonlinear layers, which could have done a better job of converging on bigger mazes.
8: Conclusions
The following data tables contain information for our four search algorithms: Breadth-first search, Depth-first search, A* search, and Depth-first search with heuristic. The data represent the number of nodes expanded before the goal node is found, measured as the length of the “order” list in each program. Each algorithm was run on a randomized 50 x 50 maze so that the numbers of nodes expanded would be comparable, and we ran 30 tests of each algorithm to find the average number of nodes expanded. The data tables and graphs in Figures 16 through 18 summarize our findings. Efficiency in terms of the maze problem is defined as the number of unnecessary nodes expanded before a goal node is found. BFS is the most inefficient, followed by DFS, then DFS with a heuristic, and finally A*. Of particular interest in these data is the tendency of BFS, A*, and DFS with heuristic to produce both high and low outliers, mainly because of the pathing some randomized mazes produce. In the case of BFS, we see efficient outliers when there are few branching paths between the start node and the goal node. In this situation, DFS would perform worse than BFS because it would be likely to follow a path that cannot rejoin the goal-node path.
Because DFS with a heuristic is similar to A* in terms of the mean number of nodes expanded, we performed a two-sample t-test for a difference in means to see whether the difference in performance could be considered significant. At the 5% significance level, the p-value is greater than .05, meaning we cannot reject the null hypothesis that the two population means are equal. For this reason, while A* appears to be slightly more efficient according to the data, we consider the two algorithms statistically equivalent.
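The test itself is straightforward to reproduce; the sketch below shows the form of the calculation using scipy, with randomly generated placeholder samples standing in for our recorded nodes-expanded counts.

```python
import numpy as np
from scipy import stats

# Placeholder samples standing in for the nodes-expanded counts from the 30 runs
# of each algorithm (randomly generated here purely for illustration).
rng = np.random.default_rng(0)
a_star_counts = rng.integers(200, 800, size=30)
dfs_heuristic_counts = rng.integers(200, 800, size=30)

# Two-sample t-test for a difference in means.
t_stat, p_value = stats.ttest_ind(a_star_counts, dfs_heuristic_counts)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # p > 0.05 -> cannot reject equal means
```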
The greatest limitation we encountered on this project is the inability of Reinforcement Learning to handle larger maze environments. Around the 10 x 10 size, the speed of RL drops dramatically, whereas the other search algorithms, especially A* and DFS, scale nicely. Another limitation to consider is that our data do not apply to all maze environments; rather, they pertain only to mazes generated with Prim’s algorithm. In other maze environments, it is possible to run into other issues, such as pathing that includes infinite loops. Likewise, because of the scope of the project, we were not able to include other forms of complexity, such as more than one agent. Regardless, the data we have obtained are useful in testing how each algorithm responds to mazes generated by Prim’s algorithm.
A maze environment constructed with the randomized Prim’s algorithm provides a useful tool for testing the efficiency of search algorithms and reinforcement learning implementations. Some of the peculiar characteristics of the maze come through in our data, and we see that the randomized pathing plays a large role in determining which implementation is most efficient. Although reinforcement learning may be used to find the most efficient pathway to the goal node, it is not necessarily the best way to solve the problem because of the time complexity of the learning process and the many hyperparameters that must be tuned. In fact, comparing RL to our search algorithms shows that “simple is better” in some cases. If we evaluate solely the most practical approach to solving our implementation of the maze problem, A* search stands out as the best-performing algorithm of those we tested.
9: Bibliography
Algfoor, Zeyad Abd, M. Sunar and H. Kolivand. 2015. “A Comprehensive Study on Pathfinding
Techniques for Robotics and Video Games.” International Journal of Computer Games
Technology. Article ID: 736138. http://dx.doi.org/10.1155/2015/736138
Beamer, Scott, Krste Asanović and David Patterson. 2013. “Direction-optimizing breadth-first
search." Scientific Programming 21: 137-148.
“Binary Tree Level Order Traversal.” n.d. AlgoMonster. Accessed March 11, 2023.
https://algo.monster/problems/binary_tree_level_order_traversal.
Biswas, Souham. 2020. “Maze solver using Naïve Reinforcement Learning.” Towards Data
Science. https://towardsdatascience.com/maze-rl-d035f9ccdc63
Chrestien, Leah, Tomáš Pevný, Antonín Komenda, and Stefan Edelkamp. 2022. "A Differentiable Loss Function for Learning Heuristics in A*." arXiv preprint arXiv:2209.05206.
Chrestien, Leah, Tomáš Pevný, Antonín Komenda, and Stefan Edelkamp, 2021. Heuristic search
planning with deep neural networks using imitation, attention and curriculum learning.
arXiv:2112.01918.
Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001.
Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill. ISBN 0-262-
03293-7.
Dechter, R. and Pearl, J. 1985. “Generalized best-first search strategies and the optimality of
A*.” JACM, 32, no. 3: 505–536.
Even, Shimon. 2011. Graph Algorithms, Second Edition. Cambridge University Press. ISBN
978-0-521-73653-4.
Foead, Daniel, Alifio Ghifari, Marchel Budi Kusuma, Novita Hanafiah, and Eric Gunawan. 2021. "A Systematic Literature Review of A* Pathfinding." Procedia Computer Science 179: 507-514.
Korf, Richard. 1985. "Depth-First Iterative Deepening: An Optimal Admissible Tree Search." Artificial Intelligence 27: 97-109. https://academiccommons.columbia.edu/doi/10.7916/D8HQ46X1
Mnih, V., K. Kavukcuoglu, D. Silver, et al. 2015. "Human-level control through deep reinforcement learning." Nature 518: 529–533. https://doi.org/10.1038/nature14236
Moerland, Thomas M., Joost Broekens, Aske Plaat, Catholijn M. Jonker. 2022. Model-Based
Reinforcement Learning: A Survey (v4). arXiv:2006.16712v4.
Poole, David and Alan Mackworth. 2017. "3.5.3 Iterative Deepening” in Artificial Intelligence:
Foundations of Computational Agents, 2nd Edition. Cambridge University Press.
Prim, R. C. 1957. “Shortest Connection Networks And Some Generalizations.” Bell System
Technical Journal 36: 6.
Russell, Stuart and Peter Norvig. 2021. Artificial Intelligence: A Modern Approach. Pearson
Series in Artificial Intelligence, 4th Edition. NJ: Prentice Hall.
Sutton, Richard S. and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. ISBN 0262039249.
Zai, Alexander and Brandon Brown. 2020. Deep Reinforcement Learning in Action. ISBN
9781617295430
