Mastering AI Agents
Preface
In our previous e-book, “Mastering RAG,” our goal was clear: building enterprise-grade
RAG systems, productionizing them, monitoring their performance, and improving them.
At the core of it, we understood how RAG systems enhance an LLM’s ability to work with
specific knowledge by providing relevant context.
In this e-book, we’re taking a step further and asking, “How do we use LLMs to
accomplish end-to-end tasks?” This singular question opens up a door: AI agents. A RAG
system helps an LLM provide accurate answers based on given context. An AI agent
takes that answer and actually does something with it — makes decisions, executes
tasks, or coordinates multiple steps to achieve a goal.
A RAG-enhanced LLM could help answer questions about policy details by pulling relevant
information. But an AI agent could actually process the claim end-to-end by analyzing the
documentation, checking policy compliance, calculating payments, and even coordinating
with other systems or agents when needed.
The idea behind agents has existed for years: a software program or other computational
entity that accepts input from its environment and takes actions based on rules. With AI
agents, you get what has never been there before: the ability to understand context
without predefined rules, the capacity to tune decisions based on that context, and
learning from every interaction. What you get is not just a bot working with a fixed set
of rules but a system capable of making advanced decisions in real time.
Companies have quickly adapted, adopted, and integrated AI agents into their workflows.
Capgemini’s research found that “10% of organizations already use AI agents, more than
half plan to use them in 2025 and 82% plan to integrate them within the next three years.”
This e-book aims to be your go-to guide for all things AI agents. If you’re a leader looking
to steer your company toward building successful agentic applications, it will help you get
started. We also explore approaches to measuring how well your AI agents perform, as
well as common pitfalls you may encounter when designing, measuring, and improving
them.
Chapter 1 introduces AI agents, their optimal applications, and scenarios where they
might be excessive. It covers various agent types and includes three real-world use cases
to illustrate their potential.
Chapter 2 compares three prominent frameworks for building agents, LangGraph, Autogen,
and CrewAI, across practical considerations such as ease of use, tool support, memory,
and scalability.
Chapter 3 walks through building and evaluating a financial research agent, showing how
agents understand a problem, make a plan, act, and assess the result.
Chapter 4 explores how to measure agent performance across systems, task completion,
quality control, and tool interaction, supported by five detailed use cases.
Chapter 5 addresses why many AI agents fail and offers practical solutions for successful
AI deployment.
We hope this book will be a great stepping stone in your journey to build trustworthy
agentic systems.
- Pratik Bhavsar
Contents

Chapter 1: What are AI Agents?
Chapter 2: Frameworks for Building Agents
Chapter 3: How to Evaluate Agents
Chapter 4: Metrics for Evaluating AI Agents
Chapter 5: Why Most AI Agents Fail & How to Fix Them
    Development Issues
    LLM Issues
    Production Issues
CHAPTER 1
WHAT ARE AI AGENTS?
AI agents are software applications that use large language models (LLMs) to
autonomously perform specific tasks, ranging from answering research questions to
handling backend services. They’re incredibly useful for tasks that demand complex
decision-making, autonomy, and adaptability. You might find them especially helpful in
dynamic environments where the workflow involves multiple steps or interactions that
could benefit from automation.
Salesforce estimates that salespersons spend 71% of their time on non-selling tasks (like
administrative tasks and manually entering data). Imagine the time that could have gone
into directly engaging with customers, developing deeper relationships, and ultimately
closing more sales. This is true across multiple domains and applications: finance, health
care, tech, marketing, sales, and more.
Let’s use an example to understand this better. Imagine you run an online retail business
and receive hundreds of customer inquiries every day about order statuses, product
details, and shipping information. Instead of answering each and every query yourself, you
can integrate an AI agent into your solution to handle these queries.
1. Customer Interaction
A customer messages your service asking, “When will my order ship?”
2. Data Retrieval
The AI agent accesses the order management system to find the specific order details.
3. Response Generation
Based on the data retrieved, the agent automatically provides an update to the customer,
such as sending “Your order will ship tomorrow and you’ll receive a tracking link via email
once it’s on its way.”
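A minimal sketch of this three-step flow in Python; lookup_order and llm_respond are
hypothetical stand-ins for the order-management API and the LLM call:

```python
def lookup_order(order_id: str) -> dict:
    """Stand-in for a call to the order management system."""
    return {"order_id": order_id, "ship_date": "tomorrow", "channel": "email"}

def llm_respond(question: str, context: dict) -> str:
    """Stand-in for an LLM call that drafts a reply grounded in the data."""
    return (f"Your order will ship {context['ship_date']} and you'll receive "
            f"a tracking link via {context['channel']} once it's on its way.")

def handle_inquiry(question: str, order_id: str) -> str:
    order = lookup_order(order_id)       # 2. Data retrieval
    return llm_respond(question, order)  # 3. Response generation

print(handle_inquiry("When will my order ship?", "A-1042"))  # 1. Customer interaction
```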
Fig 1.1 is an example of how agents are leveraged for code generation.
(Fig 1.1: Conversation → Repository → Content result → Run test → Success)
Types of AI Agents
Now that we’re familiar with what AI agents are, let’s look at the different types of AI
agents, along with their characteristics, examples, and where and when to use them.
Table 1.1 below gives a quick overview.
ReAct + RAG (Grounded Intelligence): external knowledge access, low hallucinations,
real-time data. Best for high-stakes decisions, domain-specific tasks, and real-time
knowledge needs. Examples: legal research tools, medical assistants, technical support.
Fixed Automation –
The Digital Assembly Line
This level of AI agents represents the simplest and most rigid form of automation. These
agents don’t adapt or think—they just execute pre-programmed instructions. They are
like assembly-line workers in a digital factory: efficient but inflexible. Great for repetitive
tasks, but throw them a curveball, and they’ll freeze faster than Internet Explorer.
(See Table 1.2 below)
Best Use Cases: Routine tasks, structured data, situations with minimal need for adaptability.
Examples: RPA for invoice processing, email autoresponders, basic scripting tools (Bash,
PowerShell).
The fixed automation workflow (See Fig 1.2) follows a simple, linear path. It begins when
a specific input (like a file or data) triggers the system, which consults its predefined
rulebook to determine what to do. Based on these rules, it executes the required action
and finally sends out the result or output. Think of it as a digital assembly line where
each step must be completed in exact order, without deviation.
LLM-Enhanced –
Smarter, but Not Exactly Einstein
These agents leverage LLMs to provide contextual understanding and handle
ambiguous tasks while operating within strict boundaries. LLM-Enhanced Agents
balance intelligence and simplicity, making them highly efficient for low-complexity,
high-volume tasks. Take a look at their features below in Table 1.3.
The workflow below (Fig 1.3) shows how these smarter agents process information:
starting with the input, the agent uses LLM capabilities to analyze and understand
the input context. This analysis then passes through rule-based constraints that keep
the agent within defined boundaries, producing an appropriate output. It’s like having
a smart assistant who understands context but still follows company policy before
making decisions.
(Fig 1.3: LLM-Enhanced Agent workflow)
ReAct –
Reasoning Meets Action
ReAct agents combine Reasoning and Action to perform tasks that involve strategic
thinking and multi-step decision-making. They break complex tasks into manageable
steps, reasoning through problems dynamically and acting based on their analysis.
These agents are like your type-A friend who plans their weekend down to the minute.
Table 1.4 lists their characteristics.
Scope: Assists with basic open-ended problem-solving, even without a direct solution path.
The ReAct workflow starts with an Input Query and then enters a dynamic cycle between
the Reasoning and Action Phase, as you’ll see in Fig 1.4. Unlike simpler agents, it can
loop between thinking and acting repeatedly until the desired outcome is achieved before
producing the final Output/Action. Think of it as a problem solver that keeps adjusting its
approach - analyzing, trying something, checking if it worked, and trying again if needed.
(Fig 1.4: Input Query → Reasoning ⇄ Action Phase, repeating until the desired outcome
is achieved → Output/Action)
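To make the loop concrete, here is a minimal sketch in plain Python; llm_reason and
run_tool are hypothetical stand-ins for an LLM call and a tool execution:

```python
def llm_reason(query: str, history: list) -> dict:
    """Stand-in for an LLM call: decide the next action or give a final answer."""
    if not history:                            # nothing tried yet, so act first
        return {"tool": "search", "input": query}
    return {"final_answer": f"Answer based on: {history[-1][1]}"}

def run_tool(action: dict) -> str:
    """Stand-in for executing the chosen tool (web search, calculator, ...)."""
    return f"result of {action['tool']}('{action['input']}')"

def react_agent(query: str, max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):                 # repeat until desired outcome
        step = llm_reason(query, history)      # reasoning phase
        if "final_answer" in step:
            return step["final_answer"]        # output / action
        observation = run_tool(step)           # action phase
        history.append((step, observation))    # observation informs the next thought
    return "Stopped: step limit reached"

print(react_agent("What is the latest EV market share?"))
```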
ReAct + RAG –
Grounded Intelligence
Starting with an Input Query, this advanced workflow combines ReAct’s reasoning-action
loop with an additional Knowledge Retrieval step. The agent cycles between Reasoning,
Action Phase, and Knowledge Retrieval (See Fig 1.5) — consulting external sources as
needed — until it reaches the desired outcome and produces an Output/Action. It’s like
having a problem solver who not only thinks and acts but also fact-checks against reliable
sources along the way.
(Fig 1.5: Input Query → Reasoning ⇄ Action Phase ⇄ Knowledge Retrieval, repeating
until the desired outcome is achieved → Output/Action)
Intelligence: Leverages APIs, databases, and software tools to perform tasks, acting as a
multi-tool integrator.
Behavior: Handles multi-step workflows, dynamically switching between tools based on
task requirements.
Scope: Automates repetitive or multi-stage processes by integrating and utilizing diverse
tools.
Best Use Cases: Jobs requiring diverse tools and APIs in tandem for complex or
multi-stage automation.
Examples: Code generation tools (GitHub Copilot, Sourcegraph's Cody, Warp Terminal),
data analysis bots combining multiple APIs.
Starting with an Input Query, the agent combines reasoning with a specialized tool loop.
After the initial reasoning phase, it selects the appropriate tool for the task (Tool Selection)
and then executes it (Tool Execution). This cycle repeats until the desired outcome is
achieved, leading to the final Output/Action. (See Fig 1.6)
(Fig 1.6: Input Query → Reasoning ⇄ Tool Selection → Tool Execution, repeating until
the desired outcome is achieved → Output/Action)
Starting with an Input Query, the agent goes through a cycle of Reasoning and Execution,
but with a crucial additional step: Reflection. After each execution, it reflects on its
performance and feeds those insights back into its reasoning process. This continuous
loop of thinking, doing, and learning continues until the desired outcome is achieved,
producing the final Output/Action. This is evident in Fig 1.7.
(Fig 1.7: Input Query → Reasoning → Execution → Reflection, with a feedback loop;
when the desired outcome is achieved → Output/Action)
Memory-Enhanced –
The Personalized Powerhouses
Give an agent a little memory, and you have the ultimate personal assistant. Memory-
enhanced agents bring personalization to the forefront by maintaining historical context
and remembering user preferences, previous interactions, and task history. They act as
adaptive personal assistants, providing tailored experiences and continuous, context-
aware support. These agents remember your preferences, track your history, and
theoretically — would never (ever) forget your coffee order! (See Table 1.8)
Best Use Cases: Personalized assistance, long-term interactions, tasks spanning multiple sessions.
Look at Fig 1.8: Starting with an Input Query, this agent first recalls relevant past
experiences and preferences (Memory Recall), then uses this context for Reasoning about
the current task. After deciding on a course of action, it executes it (Action/Execution),
updates its memory with new information (Memory Update), and produces the Output.
(Fig 1.8: Input Query → Memory Recall → Reasoning Phase → Action/Execution →
Memory Update → Output)
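A toy sketch of this recall-reason-act-update cycle, with an in-memory store standing in
for a real memory backend:

```python
from collections import defaultdict

# Toy per-user memory store; a real agent might use a database or vector store.
memory: dict[str, list[str]] = defaultdict(list)

def memory_enhanced_agent(user_id: str, query: str) -> str:
    recalled = memory[user_id]                          # Memory Recall
    context = "; ".join(recalled) or "no prior history"
    answer = f"[answer to '{query}' given: {context}]"  # Reasoning + Action (LLM stand-in)
    memory[user_id].append(f"asked: {query}")           # Memory Update
    return answer                                       # Output

print(memory_enhanced_agent("alice", "Plan my week"))
print(memory_enhanced_agent("alice", "What did I ask you last time?"))
```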
Environment Controllers –
The World Shapers
Environment-controlling agents extend beyond decision-making and interaction—they
actively manipulate and control environments in real time. These agents are equipped
to perform tasks that influence the digital landscape or the physical world, making
them ideal for applications in automation, robotics, and adaptive systems. Think smart
thermostats, but on steroids! (See Table 1.9)
Observe the workflow in Fig 1.9 carefully. Starting with an Input Query, the agent first
observes its surroundings (Perception Phase), reasons about the current state and
required changes (Reasoning Phase), takes action to modify the environment (Action
Phase), and then receives feedback about the changes (Feedback Phase). This cycle
repeats until the desired goal is met, producing both an Output and a changed system state.
(Fig 1.9: Input Query → Perception Phase → Reasoning Phase → Action Phase →
Feedback Phase, looping until the goal is achieved → Output + Changed State)
Self-Learning Agents
Are they the future of AI? Potentially. Are they also terrifying? Without evaluations,
observation, regulation, and oversight, very much so.
Behavior: Adaptive and scalable, adjusting to changing conditions and new tasks. Exhibits
evolutionary behavior, improving performance over time.
Scope: Suited for cutting-edge research and autonomous learning systems, offering high
potential but requiring careful monitoring.
Best Use Cases: Situations where autonomous learning and adaptation are crucial, such
as complex research, simulation, or dynamic environments.
From the workflow in Fig 1.10, you’ll see how a self-learning agent is akin to an AI
researcher that gets smarter with every experiment, constantly refining its methods and
knowledge.
Starting with an Input Query, the agent enters a continuous cycle beginning with the
Learning Phase where it processes available data, moves to Reasoning to analyze it, then
takes Actions based on its analysis. The Feedback Phase evaluates results, leading to
an Evolution Phase where the agent adapts and improves its models. This cycle repeats
continuously, producing not just an Output but an evolved version of both the solution and
the agent itself.
(Fig 1.10: Learning Phase → Reasoning → Action Phase → Feedback Phase →
Evolution Phase, cycling continuously)
What’s fascinating is that each type has its own sweet spot—there’s no “one-size-fits-
all” solution. The key is matching the right agent type to your specific needs, whether
you need the reliable consistency of fixed automation for routine tasks or the adaptive
intelligence of self-learning agents for cutting-edge research.
Table 1.11: Domains and applications that can benefit from the use of AI agents
When Not to Use AI Agents
If the tasks you’re dealing with are straightforward, occur infrequently, or require only
minimal automation, the complexity and cost of implementing AI agents might not make
sense for you. Simple tasks that existing software solutions can handle efficiently do not
necessarily benefit from the added intricacy of agent-based systems. In such cases,
sticking with traditional methods can be more efficient and cost-effective.
Similarly, fields like psychotherapy, counseling, or creative writing thrive on the nuances
of human emotion and the creative process—areas where agents largely fall short. In
these domains, the human touch is irreplaceable and essential for achieving meaningful
outcomes.
Implementing agents also requires a significant investment from you in terms of time,
resources, and expertise. If you’re running a small business or managing a project with
a tight budget, the costs of developing and maintaining these agents may not justify
the benefits. In highly regulated industries, your use of agents might be restricted due
to compliance and security concerns as well, and ensuring agents adhere to stringent
regulatory requirements can be very challenging and resource-intensive.
3. What is the expected volume of data or queries? Will the agent handle large volumes of
data or queries where speed and efficiency are crucial?

4. Does the task require adaptability? Are the conditions under which the task is performed
constantly changing, requiring adaptive responses that an AI can manage?

5. Can the task benefit from learning and evolving over time? Is there a benefit to having a
system that learns from its interactions and improves its responses or strategies over time?

6. What level of accuracy is required? Is it critical that the task is performed with high
accuracy, such as in medical or financial settings, where AI might need to meet high
standards?

7. Is human expertise or emotional intelligence essential? Does the task require deep
domain knowledge, human intuition, or emotional empathy that AI currently cannot provide?

8. What are the privacy and security implications? Does the task involve sensitive
information that must be handled with strict privacy and security measures?

9. What are the regulatory and compliance requirements?

10. What is the cost-benefit analysis?
Take time to evaluate these questions; this will help you better
determine if an AI agent fits your needs and how it could be
effectively implemented to enhance your operations or services.
3 Interesting Real-World Use Cases of AI Agents
Now that we’ve learned what agents are and when (and when not) to use them, it’s time
to go through some interesting real-world use cases of AI agents.
Problem: Wiley faced challenges handling spikes in service calls during peak times,
particularly at the start of new semesters when thousands of students use Wiley’s
educational resources.

Need: The company needed an efficient customer service system to manage the
increased volume and maintain positive customer experiences.

Solution: Wiley invested in Salesforce’s Agentforce, an AI agent designed to enhance
customer service operations. This integration has significantly improved case resolution
rates and sped up the resolution of customer queries, especially during peak times, such
as the start of new semesters when demand spikes.

ROI: A 40%+ increase in case resolution compared to their previous chatbot, a 213%
ROI, and $230K in savings.
Use Case: Enhancing patient-provider interactions
Company: Oracle Health
AI Agent: Clinical AI Agent

Problem: Healthcare providers faced documentation and time management challenges
during patient visits, leading to burnout and reduced patient engagement.

Need: There was a need for a solution that could streamline clinical workflows and
improve documentation accuracy while allowing providers more time to interact with
patients.

Solution: Oracle Health developed its Clinical AI Agent, which automates documentation
processes and enhances patient-provider interactions through a multimodal voice user
interface. This allows providers to access patient information quickly and generate
accurate notes efficiently.

ROI: AtlantiCare, using the Clinical AI Agent, reported a 41% reduction in total
documentation time, saving approximately 66 minutes per day, which translates to
improved productivity and enhanced patient satisfaction.
Problem: Magid, a leader in consumer intelligence for media brands, needed to ensure
consistent, high-quality content in a fast-paced news environment. The complexity of
diverse topics made it challenging to uphold accuracy, and errors could potentially lead
to significant repercussions.

Need: A robust observability system was essential for monitoring AI-driven workflows
and ensuring the quality of outputs across various clients. This scalability was crucial for
managing the daily production of numerous stories.

Solution: Magid integrated Galileo’s real-time observability capabilities into their product
ecosystem. This integration provided production monitoring, relevant metrics for tracking
tone and accuracy, and customization options tailored to Magid’s needs.

ROI: With Galileo, Magid achieved 100% visibility over inputs and outputs, enabling
customized offerings as they scale. This visibility helps identify trends and develop
client-specific metrics, enhancing the accuracy of news delivery.
CHAPTER 2
FRAMEWORKS FOR BUILDING AGENTS
The first chapter examined what AI agents are and when to use them. Before we move on
to the frameworks you can use to build these agents, let’s do a quick recap.
AI agents are particularly useful for dynamic, complex environments like customer support
or data-heavy sectors such as finance, where they automate and speed up processes.
They’re also great for personalizing education and streamlining software development.
However, they are not ideal for straightforward tasks that traditional software efficiently
handles or for fields requiring deep expertise, empathy, or high-stakes decision making,
where human judgment is crucial. The cost and regulatory compliance may also make
them less viable for small projects or heavily regulated industries.
That said, the framework you choose to build these agents can significantly affect their
efficiency and effectiveness. In this chapter, we’ll evaluate three prominent frameworks for
building AI agents — LangGraph, Autogen, and CrewAI — to help you make an informed
choice.
LangGraph
LangGraph is an open-source framework designed by LangChain to build stateful, multi-
actor applications using LLMs. Inspired by the long history of representing data processing
pipelines as directed acyclic graphs (DAGs), LangGraph treats workflows as graphs where
each node represents a specific task or function.
This graph-based approach allows for fine-grained control over the flow and state of
applications, making it particularly suitable for complex workflows that require advanced
memory features, error recovery, and human-in-the-loop interactions. LangGraph
integrates seamlessly with LangChain, providing access to various tools and models and
supporting various multi-agent interaction patterns.
Autogen
Autogen is a versatile framework developed by Microsoft for building conversational
agents. It treats workflows as conversations between agents, making it intuitive for users
who prefer interactive ChatGPT-like interfaces.
Autogen supports various tools, including code executors and function callers, allowing
agents to perform complex tasks autonomously. The highly customizable framework
allows you to extend agents with additional components and define custom workflows.
Autogen is designed to be modular and easy to maintain, making it suitable for both simple
and complex multi-agent scenarios.
CrewAI
CrewAI is a framework designed to facilitate the collaboration of role-based AI agents.
Each agent is assigned specific roles and goals, allowing them to operate as a cohesive
unit. This framework is ideal for building sophisticated multi-agent systems such as multi-
agent research teams. CrewAI supports flexible task management, autonomous inter-
agent delegation, and customizable tools.
Practical Considerations
Let’s compare LangGraph, Autogen, and CrewAI across several key aspects.
First up is ease of use. Autogen models workflows as conversations between agents. If you
prefer interactive, chat-based environments, this framework will likely feel more natural to
you. Autogen simplifies the management of agent interactions, allowing you to focus more
on defining tasks and less on the underlying complexities. This can be a great help when
you’re just starting out.
CrewAI, on the other hand, focuses on role-based agent design, where each agent has
specific roles and goals. This framework is designed to enable AI agents to operate as
a cohesive unit, which can be beneficial for building complex, multi-agent systems. It
provides a structured approach to defining and managing agents. It’s very straightforward
to get started with CrewAI.
Winner: Autogen and CrewAI have an edge due to their conversational approach and
simplicity.
Next, consider tool support. LangGraph offers robust integration with LangChain, which opens up a wide
array of tools and models for your use. It supports functionalities like tool calling, memory,
and human-in-the-loop interactions. This comprehensive integration allows you to tap
into a broad ecosystem, significantly extending your agents’ functionality. If your project
requires a rich toolkit for complex tasks, LangGraph’s capabilities might be particularly
valuable.
Moving on to Autogen, this framework stands out with its support for various tools,
including code executors and function callers. Its modular design is a key feature,
simplifying the process of adding and integrating new tools as your project evolves. If
flexibility and scalability are high on your list, Autogen’s approach lets you adapt and
expand your toolset as needed without much hassle.
Lastly, CrewAI is built on top of LangChain, which means it inherits access to all of
LangChain’s tools. It allows you to define and integrate custom tools tailored to your
specific needs. This capability is ideal if you’re looking to craft a highly customized
environment for your agents.
Winner: LangGraph and CrewAI have an edge due to their seamless integration with
LangChain, which offers a comprehensive range of tools. All the frameworks allow the
addition of custom tools.
LangGraph supports built-in short-term, long-term, and entity memory, enabling agents to
maintain context across interactions. It includes advanced features like error recovery and
the ability to revisit previous states, which are helpful for complex problem-solving.
CrewAI features a comprehensive memory system that includes short-term, long-term, and
entity memory. This system allows agents to accumulate experiences and enhance their
decision-making capabilities over time, ensuring they can recall important details across
multiple interactions.
Winner: Both LangGraph and CrewAI have an edge due to their comprehensive memory
system, which includes short-term, long-term, and entity memory.
LangGraph allows nodes to return structured output, which can be used to route to
the next step or update the state. This makes managing complex workflows easier and
ensures the output is well-organized. An ideal use case is a customer service system that
routes queries through different departments based on content analysis, urgency, and
previous interaction history.
Autogen supports structured output through its function-calling capabilities. Agents can
generate structured responses based on the tools and functions they use. This ensures
that the output is well-defined and can be easily processed by other components. A
coding assistant system where multiple specialized agents (code writer, reviewer, tester)
need to work together dynamically is a good use case to think of.
Winner: LangGraph and CrewAI have an edge due to their ability to define structured
output.
Autogen has documentation with numerous examples and tutorials. The documentation
covers various aspects of the framework, making it accessible to beginners and advanced
users alike. It includes detailed explanations of key concepts and features.
CrewAI provides detailed documentation, including how-to guides and examples. The
documentation is designed to help you get started quickly and understand the framework’s
core concepts. It includes practical examples and step-by-step instructions.
Winner: All frameworks have excellent documentation, but it’s easy to find more examples
of LangGraph and CrewAI.
LangGraph supports various multi-agent interaction patterns:
• Hierarchical
• Sequential
• Dynamic interactions
When agents are grouped by tools and responsibilities, they tend to perform better
because focusing on a specific task typically yields better results than when an agent
must choose from many tools. Giving each prompt its own set of instructions and few-shot
examples can further boost performance. Imagine each agent powered by its own
finely-tuned large language model; this provides a practical framework for development,
allowing you to evaluate and improve each agent individually without affecting the broader
application.
Autogen emerged as one of the first multi-agent frameworks, framing workflows more as
“conversations” between agents. This conversational model adds flexibility, allowing you
to define how agents interact in various patterns, including sequential and nested chats.
Autogen’s design simplifies the management of these complex multi-agent interactions,
enabling effective collaboration among agents.
Winner: LangGraph has an edge due to its graph-based approach, which makes it easier
to visualize and manage complex interactions.
LangGraph supports caching through its built-in persistence layer. This allows you to save
and resume graph execution at any point. The caching mechanism ensures that previously
computed results can be reused, improving performance as well.
AutoGen supports caching API requests so they can be reused when the same request is
issued.
All tools in CrewAI support caching, which enables agents to reuse previously obtained
results efficiently. This reduces the load on external resources and speeds up the execution
time. The cache_function attribute of the tool allows you to define finer control over the
caching mechanism.
Winner: All frameworks support caching, but LangGraph and CrewAI might have an edge.
LangGraph enhances your debugging and experimentation capabilities with its time travel
feature. This allows you to rewind and explore different scenarios easily. It provides a
detailed history of interactions, enabling thorough analysis and understanding of each step
in your process.
While Autogen does not offer an explicit replay feature, it does allow you to manually
update the state to control the agent’s trajectory. This workaround provides some level of
replay functionality, but it requires more hands-on intervention from you.
CrewAI provides the ability to replay from a specific task in the latest crew kickoff.
Currently, only the most recent crew run is supported for replay.
Winner: LangGraph and CrewAI make it easy to replay with inbuilt capabilities.
LangGraph integrates with LangChain to support code execution within its workflows.
You can define nodes specifically for executing code, which becomes part of the
overall workflow. This integration means you can seamlessly incorporate complex code
executions into your projects.
Autogen supports code execution through its built-in code executors. Agents can write
and execute code to perform tasks autonomously. The framework provides a safe
environment for code execution, ensuring that agents can perform tasks securely.
CrewAI supports code execution through customizable tools. You can define tools that
execute code and integrate them into the agent’s workflow. This provides flexibility in
defining the capabilities of agents and allows for dynamic task execution.
Winner: Autogen might have a slight edge due to its built-in code executors, but the other
two are also capable.
LangGraph provides fine-grained control over the flow and state of the application. You
can customize the behavior of nodes and edges to suit specific needs. The framework’s
graph-based approach also makes it easy to define complex workflows.
Autogen is customizable, allowing users to extend agents with additional components and
define custom workflows. The framework is designed to be modular and easy to maintain.
CrewAI offers extensive customization options, including role-based agent design and
customizable tools.
Winner: All the frameworks provide customization, but the mileage might vary.
Winner: It remains unclear which framework scales more effectively as more elements are
added. We recommend experimenting with them to get a better idea.
To sum it up:
Open-source LLMs: All frameworks support open-source LLMs.
LangGraph
Chaos Labs has developed the Edge AI Oracle using LangChain and LangGraph for
enhanced decision-making in prediction markets. This system utilizes a multi-agent council
to ensure accurate, objective, and transparent resolutions. Each agent, ranging from data
gatherers to bias analysts and summarizers, plays a role in processing queries through a
decentralized network. This setup effectively reduces single-model biases and allows for
consensus-driven, reliable outputs in high-stakes environments.
Autogen
Built on top of Autogen, OptiGuide employs LLMs to simplify and enhance supply chain
operations. It integrates these models to analyze and optimize scenarios efficiently, such
as assessing the impact of different supplier choices. The system ensures data privacy
and doesn’t transmit proprietary information. Applied within Microsoft’s cloud infrastructure
for server placement, OptiGuide improves operational efficiency and stakeholder
communication and reduces the need for extensive manual oversight.
CrewAI
Waynabox has transformed travel planning by partnering with CrewAI, offering
personalized, hassle-free travel experiences. This collaboration utilizes CrewAI’s multi-
agent system to automatically generate tailored itineraries based on real-time data and
individual preferences. The integration of AI agents—handling activities, preferences, and
itinerary customization—allows travelers to enjoy unique adventures without the stress of
planning. This has helped simplify itinerary planning and enhanced Waynabox’s service to
create a more exciting and seamless travel experience.
However, it is also imperative to consider the accuracy and reliability of AI agents. This
takes us to the next chapter, where we’ll examine why agents require evaluation, along
with careful monitoring and feedback, to ensure they provide reliable, well-sourced
information.
CHAPTER 3
HOW TO EVALUATE AGENTS
In the previous chapter, we examined three frameworks, LangGraph, Autogen, and
CrewAI, and some interesting use cases related to them.
The next important step in our journey is to understand how we can ensure the accuracy
and reliability of AI agents. Why is this important in the first place?
Evaluating AI agents is like checking the work of a new employee. You have to make
sure they’re doing their job correctly and reliably. Without regular checks and constructive
feedback, it’s tough to trust that the information the agents provide is accurate and helpful.
The best way to understand this is through an example. So, in this chapter, we’re going
to build a financial research agent, and we’ll cover how, much like humans, agents can be
taught to solve problems by first understanding the issue, making a plan, taking action,
and lastly, evaluating the result.
Requirements
You can install the dependencies in a Python 3.11 environment. Sign up on Tavily and
OpenAI to generate API keys, and save them in a .env file, as shown below.
OPENAI_API_KEY=KKK
TAVILY_API_KEY=KKK
To analyze the results, we use the ReAct agent, which works with the Tavily API to think
through and act on problems.
We can import a prebuilt ReAct agent along with a web search tool called Tavily. While we
use the same agent for all steps in this example, you could use different agents for different
tasks. The best part? You can customize it further in later examples.
Look at Fig 3.1 to understand this better. This code sets up an AI-driven chat agent
named Fred, designed to function as a finance expert in 2024. Fred will use specific tools
and a planning framework to research and answer questions.
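A sketch of that setup, using the standard LangGraph and LangChain imports (the
persona prompt is paraphrased, and the exact argument for passing it to the prebuilt
agent varies by langgraph version):

```python
from dotenv import load_dotenv
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

load_dotenv()  # pulls OPENAI_API_KEY and TAVILY_API_KEY from the .env file

tools = [TavilySearchResults(max_results=3)]  # Tavily web search tool

# Prebuilt ReAct agent; a system prompt naming it "Fred, a finance expert
# in 2024" would also be supplied here.
agent_executor = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools)
```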
State Management
Now, let’s talk about how our agent keeps track of everything it needs to do. Think of it like
a smart to-do list system with three main parts.
First, we need a way to track what the agent plans to do. We’ll use a simple list of steps
written as text strings. This is like having a checklist of tasks the agent needs to complete.
Second, we want to remember what it has already done and what happened with each
task. For this, we’ll use a list of pairs (or tuples in programming terms). Each pair contains
both the action taken and what resulted from that action.
Lastly, we need to store two more important pieces of information: the original question that
was asked (the input) and the final answer once the agent finishes its work (the response).
In Fig 3.2, the PlanExecute class, a dictionary type, manages an execution process,
including input, plan steps, previous steps, and a response. The Plan class, using
Pydantic, defines a structured plan with steps that should be followed in a sorted order.
Fig. 3.2: Defining structures for managing and executing a sequential plan of actions
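Reconstructed from the description above, the two classes look roughly like this (a sketch
following LangGraph's plan-and-execute pattern; field names are assumptions):

```python
import operator
from typing import Annotated, List, Tuple
from typing_extensions import TypedDict
from pydantic import BaseModel, Field

class PlanExecute(TypedDict):
    input: str                                        # the original question
    plan: List[str]                                   # steps still to do
    past_steps: Annotated[List[Tuple], operator.add]  # (step, result) pairs, appended
    response: str                                     # final answer, once ready

class Plan(BaseModel):
    """Plan to follow in the future."""
    steps: List[str] = Field(
        description="Different steps to follow, in sorted order.")
```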
The planning step is where our agent will begin to tackle a research question. We’ll use a
special feature called function calling to create this plan. Let’s break down how it works.
First, we create a template for how our agent should think. We tell it that it’s a finance
research agent working in October 2024, and its job is to break down big questions into
smaller, manageable steps.
This template, called planner_prompt (See Fig 3.3), gives our agent clear instructions:
create a simple, step-by-step plan where each step leads logically to the next. Ensure that
no steps are missing or unnecessary. The final step should give us our answer.
The code sets this up by using ChatPromptTemplate, which has two main parts:
• A system message that explains the agent’s role and how it should plan
• A placeholder for the messages we’ll send it
Fig. 3.3: Guiding the agent to create a step-by-step plan that should lead to the correct
answer for a given objective
We then connect this template to ChatOpenAI using gpt-4o-mini with temperature set to
0 for consistent results; we pick gpt-4o-mini for its low cost. The “structured output”
part means the plan will come out in a specific format we can easily work with.
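Putting the planner together looks roughly like this (a sketch; the prompt wording is
paraphrased from the description, and it reuses the Plan model defined earlier):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

planner_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a finance research agent working in October 2024. For the given "
     "objective, come up with a simple step-by-step plan. Each step should lead "
     "logically to the next; do not add superfluous steps. The result of the "
     "final step should be the final answer."),
    ("placeholder", "{messages}"),
])

# Structured output makes the model return a Plan object we can iterate over.
planner = planner_prompt | ChatOpenAI(
    model="gpt-4o-mini", temperature=0).with_structured_output(Plan)

plan = planner.invoke({"messages": [
    ("user", "Should we invest in Tesla given the current situation of EVs?")]})
print(plan.steps)
```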
When we test it with a real question like “Should we invest in Tesla given the current
situation of EVs?” the agent will create a detailed plan for researching this investment
decision. Each step will help gather the information needed to make an informed
recommendation about Tesla stock based on the current electric vehicle market
conditions. (See Fig 3.4)
Think of it like creating a research roadmap. We’re giving our agent the tools and
guidelines it needs to break down complex questions into manageable research tasks.
Think of re-planning as the agent’s ability to adjust its strategy based on what it has already
learned. This is similar to how we might revise our research approach after discovering
new information. Let’s break down how this works.
First, we create two types of possible actions the agent can take:
• Response: When the agent has enough information to answer the user’s question
• Plan: When the agent needs to do more research to get a complete answer
The re-planning prompt is like giving our agent a structured way to think about what to do
next. It looks at three things:
• The original question (objective)
• The initial plan it made
• What steps have already been completed and what was learned
The clever part is that the agent won’t repeat steps it’s already done. It focuses only on
what still needs to be investigated. This makes the research process more efficient and
prevents redundant work. It’s like having a research assistant who can intelligently adjust
their approach based on what they’ve already discovered.
This process helps our agent stay focused and efficient, only pursuing new information
when needed and knowing when it’s time to provide a final answer to the user.
We connect this re-planning ability to gpt-4o with the temperature set to 0. By setting the
temperature to 0 (See Fig 3.5), we force the model to generate the same response for the
same input. This helps us in making experiments reproducible.
Fig. 3.5: Replanner_prompt to review and update a given plan based on past actions
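A sketch of the re-planner (again in the plan-and-execute style; the Act wrapper and the
prompt wording are assumptions based on the description):

```python
from typing import Union
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class Response(BaseModel):
    """Final response to the user."""
    response: str

class Act(BaseModel):
    """Either answer the user or continue with an updated plan."""
    action: Union[Response, Plan] = Field(
        description="Use Response to answer the user; otherwise return a Plan "
                    "containing only the steps that still need to be done.")

replanner_prompt = ChatPromptTemplate.from_template(
    "Update the plan for the given objective.\n"
    "Objective: {input}\n"
    "Original plan: {plan}\n"
    "Completed steps and results: {past_steps}\n"
    "Only include steps that still NEED to be done; never repeat completed steps.")

replanner = replanner_prompt | ChatOpenAI(
    model="gpt-4o", temperature=0).with_structured_output(Act)
```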
The execute_step function handles individual tasks. It takes the first item from our plan,
formats it properly, and has the agent work on it. It’s like giving a specific assignment to a
research assistant and getting back their findings. The agent keeps track of what it did and
what it learned.
The plan_step function is where everything begins. When given a question, it creates the
initial research plan. This is like creating a first draft of how to tackle the problem.
The replan_step function is where the agent decides what to do next. After completing a
task, it looks at what it has learned and either:
• Creates new steps if more research is needed
• Provides a final answer if it has enough information
Finally, we have the should_end function, which works like a checkpoint. It checks
whether we have a final answer ready. If we do, it ends the process. If not, it tells the agent
to continue working. You can see all these functions in the code snippet below, in Fig 3.6.
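Based on the description, the four functions look roughly like this (a sketch; it assumes
agent_executor is the prebuilt ReAct agent created earlier):

```python
from langgraph.graph import END

async def execute_step(state: PlanExecute):
    task = state["plan"][0]                      # take the first pending step
    result = await agent_executor.ainvoke(
        {"messages": [("user", f"Execute this step: {task}")]})
    return {"past_steps": [(task, result["messages"][-1].content)]}

async def plan_step(state: PlanExecute):
    plan = await planner.ainvoke({"messages": [("user", state["input"])]})
    return {"plan": plan.steps}

async def replan_step(state: PlanExecute):
    output = await replanner.ainvoke(state)
    if isinstance(output.action, Response):      # enough info: final answer
        return {"response": output.action.response}
    return {"plan": output.action.steps}         # otherwise, keep researching

def should_end(state: PlanExecute):
    return END if state.get("response") else "agent"
```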
We use StateGraph to create a map that guides our agent through its research journey via
different actions it can take. Here’s how it flows:
First, we create the basic structure of the workflow with its three main stops:
• A planning station (“planner”)
• A research station (“agent”)
• A reviewing station (“replan”)
This creates a smooth cycle in which the agent can continue researching until it has
everything it needs to answer the original question. It’s like having an intelligent research
assistant who knows when to dig deeper and when they’ve found enough information.
Finally, we compile this workflow into something we can easily use, just like any other tool
in our system. This makes our research agent ready to tackle real questions and provide
thorough, well-researched answers. See Fig 3.7.
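A sketch of that graph, with the node names taken from the description:

```python
from langgraph.graph import StateGraph, START, END

workflow = StateGraph(PlanExecute)
workflow.add_node("planner", plan_step)    # planning station
workflow.add_node("agent", execute_step)   # research station
workflow.add_node("replan", replan_step)   # reviewing station

workflow.add_edge(START, "planner")
workflow.add_edge("planner", "agent")
workflow.add_edge("agent", "replan")
# After reviewing, either loop back for more research or finish.
workflow.add_conditional_edges("replan", should_end, ["agent", END])

app = workflow.compile()
```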
We can visualize the agent workflow with a mermaid diagram, as shown in Fig 3.8. See the
output in Fig 3.9.
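One way to do this, assuming the compiled graph from above is app (renders in a
notebook):

```python
from IPython.display import Image, display

# Renders the compiled LangGraph workflow as a Mermaid diagram.
display(Image(app.get_graph(xray=True).draw_mermaid_png()))
```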
(Fig 3.9: __start__ → planner → agent (internally looping with tools) → replan → __end__)
The inbuilt scorers make it very easy to set one up. We use gpt-4o as our LLM for
the context adherence metric, with three evaluations per response to ensure strong
evaluation accuracy. This scorer specifically looks at how well the agent sticks to the
context and provides relevant information.
Note that we’re using GPT-4o to evaluate a smaller AI model, which is like having an expert
oversee a novice’s work. GPT-4o, with its advanced capabilities and deep understanding
of language nuances, can be a reliable benchmark for judging the smaller model’s (in our
case, the 4o-mini) responses. See Fig 3.10.
We then set up a Galileo evaluation callback that will track and record our agent’s
performance. It’s like having a quality control system that monitors our research process.
Next, we set some basic config for our agent:
• It can’t go through more than 30 cycles (recursion_limit).
• It must use our evaluation system (callbacks).
With just two lines of code, we can get all the information needed to visualize and debug
the traces.
We then run our agent with a specific test question. The system will process this question
through the research workflow we built earlier.
The code is set up to show us what’s happening at each step (that’s what the async for
loop does). It will print out each action and result as they happen, letting us watch the
research process in real-time.
Finally, we close our evaluation session with evaluate_handler.finish(). This saves all the
performance data we collected during the run to the Galileo Evaluate console so we can
see the chain visualization and the agent metrics. See Fig 3.12 and Fig 3.13.
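Pulling the pieces together, the run looks roughly like this. The LangGraph parts (the
config with recursion_limit and callbacks, and astream) are standard; the Galileo scorer
and callback names follow Galileo's promptquality SDK as best we recall and should be
treated as assumptions that may differ by version:

```python
import asyncio
import promptquality as pq  # Galileo's evaluation SDK; names may vary by version

# Context-adherence scorer judged by gpt-4o, three evaluations per response.
scorer = pq.CustomizedChainPollScorer(
    scorer_name=pq.CustomizedScorerName.context_adherence_plus,
    model_alias=pq.Models.gpt_4o,
    num_judges=3)

evaluate_handler = pq.GalileoPromptCallback(
    project_name="financial-research-agent",  # hypothetical project name
    scorers=[scorer])

config = {"recursion_limit": 30,            # no more than 30 cycles
          "callbacks": [evaluate_handler]}  # record every step for evaluation

async def main():
    inputs = {"input": "Should we invest in Tesla given the current state of EVs?"}
    async for event in app.astream(inputs, config=config):
        print(event)                        # watch each action and result live
    evaluate_handler.finish()               # push traces to the Galileo console

asyncio.run(main())
```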
You can run several experiments to evaluate the research agent’s performance. For
instance, you can use the project dashboard to see how different test runs performed
based on key metrics (see Figure 3.14).
The standout performer was test-3, which earned the top rank with impressive results.
Performance of test-3:
• Context Adherence Score: 0.844 (High relevance to the research questions)
• Speed: Completed tasks in 84,039 milliseconds (Fastest among all tests)
• Responses Processed: 3 during the run
• Cost: $0.0025 per run (Low cost)
These results give valuable insights into our agent’s capabilities and help identify the most
effective configuration for future research tasks.
Now, you can go inside each test run to see agent executions (See Fig 3.15). The
dashboard reveals seven different research queries that our agent processed. Each query
focused on analyzing different companies’ financial metrics. Here’s what you’ll observe:
This detailed view helps you understand where the agent performs well and where it might
need improvements in terms of speed and accuracy.
Looking at the trace view (Fig 3.16), you can see a detailed breakdown of an execution
chain where the context adherence was notably low at 33.33%. The system explanation
helps us understand why:
“The response has a 33.33% chance of being consistent with the context. Based on the
analysis, while some of the figures like those for later 2022 and 2023 are supported by
document references (such as Q3 2023 and Q4 2023), many earlier quarters’ figures
lack direct evidence from the documents or explicit mentions, leading to incomplete
support for claims.”
Let’s take a quick look at what we learned in this chapter. We saw how our agent
implemented the ReAct (Reasoning and Acting) framework to:
• understand the problem,
• make a plan,
• take action, and
• evaluate the result.
That said, testing the finance research agent in this chapter teaches you something
very important and valuable: an AI is only as good as our ability to check its work. By
looking closely at how the agent performed, you could see exactly what it did well (like
finding recent data quickly) and what it struggled with (like backing up older numbers
with proper sources). The evaluation step helped spot these issues easily, showing us
where to improve the agent.
The next chapter is going to get even more interesting (plus, you have five solid use
cases to look at!) as we explore different metrics to evaluate the AI agents across four
dimensions: System Metrics, Task Completion, Quality Control, and Tool interaction.
CHAPTER 4
METRICS FOR EVALUATING AI AGENTS
Before we explore metrics for evaluating AI, let’s recall our key insights into agent
evaluation. Using LLM-based judges (like GPT-4o) and robust metrics (such as context
adherence), we effectively measured an agent’s performance across various dimensions,
including accuracy, speed, and cost efficiency. We then set up Galileo’s evaluation callback
to track and record the agent’s performance.
This next chapter will explore various metrics for evaluating AI agents using five solid case
studies.
Let’s consider a document processing agent. While it might initially demonstrate strong
performance metrics, we may still need to probe deeper with several questions.
Through a series of hypothetical case studies, we’ll explore how organizations may
transform their AI agents into reliable digital colleagues using key metrics, and
demonstrate practical approaches to measuring and improving them.
You should remember that the goal isn’t perfection but establishing reliable, measurable,
and continuously improving AI agents that deliver consistent value across all four key
performance dimensions. See Fig 4.1
Case Study 1: Advancing the Claims Processing Agent

(Figure: Claims Processing System Overview: Claim Validator → Payment Calculator →
Claim Decision)
• The AI agent struggled with complex claims, leading to payment delays and provider
frustration. Because of the inconsistency in handling these claims, claims processors
spent more time verifying the AI’s work than processing new claims.
• The error rate in complex cases raised alarms with the compliance team, especially
critical given the stringent regulatory demands of healthcare claims processing.
Functionality
The AI was designed to:
Challenges
To counter these issues, the network focused on three key performance indicators to
transform their AI agent’s capabilities:
Outcomes
The enhanced agent delivered:
Case Study 2: Optimizing the Tax Audit Agent

(Figure: Tax Audit System Overview: a Documentation Phase where uploads feed a
Document Hub, followed by an Analysis Phase where an AI Engine performs Risk
Detection and a Final Assessment of pass or issue)
What should have streamlined their operations was instead causing senior auditors to
spend more time supervising the AI’s work than doing their specialized analysis. The firm
needed to understand why its significant investment in AI wasn’t delivering the anticipated
productivity gains.
Functionality
The AI audit agent was designed to:
• Process various tax documents, from basic expense receipts to complex corporate
financial statements.
• Automatically extract and cross-reference key financial data in corporate tax returns.
• Systematically verify compliance across multiple tax years.
• Validate deduction claims against established rules and flag discrepancies for review.
• For simpler cases, it could generate preliminary audit findings and reports.
• The system was integrated with the firm’s tax software and document management
systems to access historical records and precedents.
Challenges
The team focused on three critical metrics to reshape their agent’s capabilities:
Outcomes
The refined capabilities of the AI agent led to:
Case Study 3: Elevating the Stock Analysis Agent

(Figure: Stock Analysis System Overview: Market Data and Market Context feed a
Prediction Engine that produces Trading Signals)
At a boutique investment firm, their AI-enhanced analysis service was under scrutiny as
clients questioned its value. Portfolio managers were overwhelmed by redundant analysis
requests and faced inconsistent reporting formats across client segments.
This situation undermined the firm’s competitive edge of providing rapid market insights
as analysts spent excessive time reformatting and verifying the AI’s outputs. The inability
of the AI to adjust its analysis depth based on varying market conditions resulted in either
overly superficial or unnecessarily detailed reports, compromising client confidence in the
service.
Functionality
The AI analysis agent was developed to:
• Process multiple data streams, including market prices, company financials, news
feeds, and analyst reports.
• Generate comprehensive stock analyses by evaluating technical indicators, assessing
fundamental metrics, and identifying market trends across different timeframes.
• Generate customized reports combining quantitative data with qualitative insights for
each analysis request.
• The system was integrated with the firm’s trading platforms and research databases,
providing real-time market intelligence.
Challenges
Through analyzing three crucial metrics, the team improved the AI agent’s performance:
Outcomes
The enhancements to the AI agent delivered:
Case Study 4: Upgrading the Coding Agent

(Figure: Development Assistant System Overview: Code Analysis and context feed a
Suggestion Engine that generates output for Code Review)
Developers experienced delays as the agent struggled with large codebases and provided
irrelevant suggestions that failed to consider project-specific requirements. Additionally,
rising infrastructure costs from inefficient resource usage further exacerbated the situation,
prompting a need for transformative improvements to make the AI assistant a genuine
productivity tool.
Functionality
The AI coding assistant was designed to:
Challenges
By optimizing three pivotal indicators, the team significantly enhanced the agent’s
capabilities:
Outcomes
The optimizations delivered:
Case Study 5: Enhancing the Lead Scoring Agent

(Figure: Lead Scoring System Overview: digital signals (behavior, interest, and
engagement scores) feed a Signal Processor and ML Engine that produce Lead
Qualification)
Functionality
• Evaluate data from multiple sources like website interactions, email responses, social
media engagement, and CRM records to assess potential customers.
• Analyze company profiles, assess engagement patterns, and generate lead scores
based on predefined criteria.
• Automatically categorize prospects by industry, company size, and potential deal value,
updating scores in real-time as new information became available.
• Integrate with the company’s sales tools, providing sales representatives with prioritized
lead lists and engagement recommendations.
Challenges
• Solution: Developing smarter selection criteria allowed the agent to match tool
complexity with the analysis needs, using simpler tools for straightforward tasks
and reserving intensive tools for complex cases.
Outcomes
• Faster prospect analysis processing
• Higher lead qualification accuracy
• Improved resource utilization efficiency
These use cases reveal a crucial truth: effective AI agents require careful
measurement and continuous optimization. As these systems become more
sophisticated, the ability to measure and improve their performance becomes increasingly
important.
CHAPTER 5
WHY MOST AI AGENTS FAIL & HOW TO FIX THEM
In the previous chapter, we looked at different metrics for evaluating our AI agents along
four core dimensions: system metrics, task completion, quality control, and tool
interaction. In our journey, we’ve also seen how agents are powerful tools capable of
automating complex tasks and processes with many frameworks that make it possible to
build complex agents in a few lines of code. However, many AI agents fail to deliver the
expected outcomes despite their potential.
In this chapter, we’ll examine why agents fail, providing insights into common pitfalls and
strategies to overcome them.
Development Issues

Poorly Defined Task or Persona

Evaluation Issues
LLM Issues

Difficult to Steer
You can steer LLMs towards specific tasks or goals for consistent and reliable
performance. Effective steering ensures that agents can perform their intended functions
accurately and efficiently. LLMs are influenced by vast amounts of training data, which can
lead to unpredictable behavior, and fine-tuning them for specific tasks requires significant
expertise and computational resources.

Hierarchical Design
Implement a hierarchical design where specialized agents handle specific tasks, reducing
the complexity of steering a single agent. (See Fig 5.1)

Fine-Tuning
Continuously fine-tune the LLM based on task-specific data to improve performance.

Specialized Prompts

(Fig 5.1: Hierarchical design with a controller agent delegating to specialized agents
performing specific tasks)
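As a toy illustration of this hierarchical pattern, here is a sketch in which a controller routes
each request to a narrowly-prompted specialist; call_llm is a hypothetical stand-in for a
model call:

```python
SPECIALISTS = {
    "billing":  "You are a billing specialist. Resolve invoice and payment issues.",
    "tech":     "You are a support engineer. Diagnose and fix technical problems.",
    "research": "You are a research analyst. Gather and summarize information.",
}

def call_llm(system_prompt: str, user_msg: str) -> str:
    """Stand-in for an LLM call with a task-specific system prompt."""
    return f"[{system_prompt.split('.')[0]}] handling: {user_msg}"

def controller(user_msg: str) -> str:
    # The controller only routes; each specialist stays narrow and easy to steer.
    topic = ("billing" if "invoice" in user_msg.lower()
             else "tech" if "error" in user_msg.lower()
             else "research")
    return call_llm(SPECIALISTS[topic], user_msg)

print(controller("I hit an error while installing the agent"))
```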
Scale computational resources efficiently: design a serverless system to avoid wasting
resources. (See Fig 5.2)

(Fig 5.2: API Gateway → SQS Queue → Lambda Controller → Large Model API, with a
Model Cache and CloudWatch monitoring)
Planning Failures
Effective planning is crucial for agents to perform complex tasks. Planning enables agents
to anticipate future states, make informed decisions, and execute tasks in a structured
manner. Without effective planning, agents may struggle to achieve desired outcomes.
However, LLMs often struggle with planning, as it requires strong reasoning abilities and
the ability to anticipate future states.

Multi-Plan Selection
Generate multiple plans and select the most appropriate one based on the context.

Reflection and Refinement
Continuously refine plans based on new information and feedback.

Task Decomposition

(Fig 5.3: A simple illustration of how an agent plans and executes a complex task through
task decomposition, multi-plan selection, and continuous refinement: Task Analysis →
Task Decomposition → Plan Generation (Plans A, B, C) → Plan Evaluation → Selected
Plan → Execution ⇄ Reflection, looping via feedback until the task is complete)
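A minimal sketch of multi-plan selection; generate_plans and score_plan are hypothetical
stand-ins for LLM calls, and a reflection step would feed results back into replanning:

```python
def generate_plans(task: str, n: int = 3) -> list[list[str]]:
    """Stand-in for sampling several candidate plans from an LLM."""
    return [[f"plan {p + 1}, step {s + 1} for: {task}" for s in range(p + 1)]
            for p in range(n)]

def score_plan(plan: list[str]) -> float:
    """Stand-in for an LLM- or rule-based critic judging plan quality."""
    return -len(plan)  # toy heuristic: prefer shorter plans

def plan_with_selection(task: str) -> list[str]:
    candidates = generate_plans(task)       # multi-plan generation
    best = max(candidates, key=score_plan)  # plan selection
    return best  # execution and reflection would feed back into replanning

print(plan_with_selection("compare supplier quotes"))
```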
Reasoning Failures

Reasoning is a fundamental capability that enables agents to make decisions, solve
problems, and understand complex environments. Strong reasoning skills are essential
for agents to interact effectively with complex environments and achieve desired
outcomes; LLMs lacking them may struggle with tasks that require multi-step logic or
nuanced judgment. (See Fig 5.4)

Enhance Reasoning Capabilities

Use prompting techniques like Reflexion to enhance reasoning. Incorporate external
reasoning modules that can assist the agent in complex decision-making processes;
these modules can include specialized algorithms for logical reasoning, probabilistic
inference, or symbolic computation.

Fine-Tune the LLM

Establish training with data generated with a human in the loop. Feedback loops allow
the agent to learn from its mistakes and refine its reasoning over time. You can use data
with traces of reasoning that teach the model to reason or plan in various scenarios.

Use Specialized Agents

Develop specialized agents that focus on specific reasoning tasks to improve overall
performance.
Fig 5.4: A reasoning-check loop in which an initial response is evaluated, routed to a
specialized module when it needs improvement, and refined with human feedback before
the final answer is returned
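A Reflexion-style reasoning check can be sketched as a small loop: draft an answer, ask the model to critique its own reasoning, and retry with the critique folded in. Again, `call_llm` is the hypothetical client from earlier, and the "OK" verdict convention is an assumption of this sketch rather than part of the Reflexion technique itself.

```python
def answer_with_reasoning_check(question: str, max_attempts: int = 3) -> str:
    critique = ""
    answer = ""
    for _ in range(max_attempts):
        answer = call_llm(
            f"Question: {question}\n"
            f"Prior critique (may be empty): {critique}\n"
            "Think step by step, then answer."
        )
        # Self-critique pass: the model checks its own chain of reasoning.
        verdict = call_llm(
            "Check the reasoning below for logical errors. "
            "Reply 'OK' if sound, otherwise describe the flaw.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        if verdict.strip().upper().startswith("OK"):
            return answer
        critique = verdict  # feed the identified flaw into the next attempt
    return answer  # best effort after max_attempts
```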
Production Issues
Guardrails

Guardrails help ensure that agents adhere to safety protocols and regulatory
requirements. This is particularly important in sensitive domains such as healthcare,
finance, and legal services, where non-compliance can have severe consequences.
Guardrails define the operational limits within which agents can function. Implement
rule-based filters and validation mechanisms to monitor and control the actions and
outputs of AI agents.

Content Filters

Use predefined rules to filter inappropriate, offensive, or harmful content. For example,
content filters can scan the agent's outputs for prohibited words or phrases and block or
modify responses that contain them.

Input Validation

Before processing, inputs received by the agent must be validated to ensure they meet
specific criteria. This can prevent malicious or malformed inputs from causing unintended
behavior.

Action Constraints

Define constraints on the actions that agents can perform. For example, an agent
managing financial transactions should have rules that prevent it from initiating
transactions above a certain threshold without additional authorization.
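These three guardrails can be sketched as plain rule-based checks wrapped around the agent. The blocked-word list, input length limit, and transaction threshold below are illustrative assumptions; real policies would come from your compliance requirements.

```python
BLOCKED_WORDS = {"ssn", "password"}        # assumption: your prohibited terms
MAX_TRANSFER_WITHOUT_APPROVAL = 1_000.00   # assumption: your policy limit
MAX_INPUT_CHARS = 4_000                    # assumption: your input size limit

def validate_input(user_input: str) -> str:
    # Reject malformed inputs before the agent ever sees them.
    if not user_input or len(user_input) > MAX_INPUT_CHARS:
        raise ValueError("Input is empty or too long")
    return user_input

def filter_output(response: str) -> str:
    # Scan the agent's output for prohibited terms and withhold matches.
    lowered = response.lower()
    if any(word in lowered for word in BLOCKED_WORDS):
        return "[response withheld by content filter]"
    return response

def constrain_action(action: str, amount: float) -> None:
    # Block high-value transactions unless a human has approved them.
    if action == "transfer" and amount > MAX_TRANSFER_WITHOUT_APPROVAL:
        raise PermissionError("Requires additional authorization")
```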
Agent Scaling

Scaling agents to handle increased workloads or more complex tasks is a significant
challenge. As the number of agents or the complexity of interactions grows, the system
must efficiently manage resources, maintain performance, and ensure reliability.

Scalable Architectures

Design architectures that can efficiently manage increased workloads and complexity.
Implement a microservices architecture where each agent or group of agents operates
as an independent service. This allows individual components to be scaled and managed
without affecting the entire system.

Resource Management

Integrate load balancers to distribute incoming requests evenly across multiple agents.
This prevents any single agent service from becoming overwhelmed and ensures a more
efficient use of resources.

Monitor Performance

Implement real-time monitoring tools to track each agent's performance. Metrics such as
response time, resource utilization, and error rates should be continuously monitored to
identify potential issues. (See Fig 5.5)
Fig 5.5: An illustration that shows how you can add monitoring, an auto scaler, and a load
balancer in front of an AI agent pool for easy scale-up and scale-down
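One way to implement the monitoring half of Fig 5.5 is to record per-agent metrics and derive a desired pool size from them. The latency SLO and the scale-up/scale-down thresholds below are assumptions; in practice, load balancing and autoscaling are usually delegated to infrastructure such as a managed load balancer or a Kubernetes autoscaler.

```python
import time
from statistics import mean

class AgentMetrics:
    """Tracks response times and error counts for one agent instance."""

    def __init__(self) -> None:
        self.latencies: list[float] = []
        self.errors = 0
        self.requests = 0

    def record(self, started: float, ok: bool) -> None:
        # `started` is a time.monotonic() timestamp taken before the request.
        self.requests += 1
        self.latencies.append(time.monotonic() - started)
        if not ok:
            self.errors += 1

def desired_pool_size(metrics: AgentMetrics, current: int,
                      latency_slo: float = 2.0) -> int:
    # Scale up when average latency breaches the SLO; scale down when the
    # pool is comfortably under it. Thresholds here are assumptions.
    if not metrics.latencies:
        return current
    avg = mean(metrics.latencies)
    if avg > latency_slo:
        return current + 1
    if avg < latency_slo / 2 and current > 1:
        return current - 1
    return current
```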
Fault Tolerance

AI agents need to be fault-tolerant so that they can recover from errors and continue
operating effectively. Without robust fault tolerance mechanisms, agents may fail to
handle unexpected situations, leading to system crashes or degraded performance.
(See Fig 5.6)

Redundancy

Deploy multiple instances of AI agents running in parallel. If one instance fails, the other
instances can continue processing requests without interruption. This approach ensures
high availability and minimizes downtime.

Automated Recovery

Incorporate intelligent retry mechanisms that automatically attempt to recover from
transient errors. This includes exponential backoff strategies, where the retry interval
increases progressively after each failed attempt, reducing the risk of overwhelming the
system. Develop self-healing mechanisms that automatically restart or replace failed
agent instances.

Stateful Recovery

Ensure that AI agents can recover their state after a failure. This involves using persistent
storage to save the agent's state and context, allowing it to resume operations from the
last known good state after a restart.
Fig 5.6: A fault-tolerant setup in which a primary agent processes the task, a health check
detects failures, a backoff timer governs retries, and redundant backup agents take over
when errors occur
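Exponential backoff and stateful recovery can be sketched in a few lines. The file-based checkpoint and the choice of `TimeoutError` as the transient error type are assumptions made for illustration; production systems typically checkpoint to a database or object store and classify transient errors per API.

```python
import json
import time
from pathlib import Path

CHECKPOINT = Path("agent_state.json")  # assumption: a local checkpoint file

def with_backoff(fn, max_retries: int = 5, base_delay: float = 0.5):
    """Retry `fn` on transient errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TimeoutError:
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...
    raise RuntimeError("Giving up after repeated transient failures")

def save_state(state: dict) -> None:
    # Persist state after each completed step so a restart loses little work.
    CHECKPOINT.write_text(json.dumps(state))

def load_state() -> dict:
    # Resume from the last known good state after a restart.
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())
    return {"completed_steps": []}
```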
Infinite Looping

Looping mechanisms are essential for agents to perform iterative tasks and refine their
actions based on feedback. However, agents can sometimes get stuck in loops,
repeatedly performing the same actions without progressing toward their goals.
(See Fig 5.7)

Clear Termination Conditions

Implement clear criteria for success and mechanisms to break out of loops.

Enhance Reasoning and Planning

Improve the agent's reasoning and planning capabilities to prevent infinite looping.

Monitor Agent Behavior

Monitor agent behavior and adjust it to prevent looping issues.
Fig 5.7: A loop-check workflow in which the agent analyzes the task, generates a solution,
checks progress, and tries a new approach when no progress is made, until the goal is
achieved
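A loop guard combines these ideas: a hard cap on iterations as the clear termination condition, plus a progress check that forces a new approach when the agent repeats itself. The `DONE` completion marker is an assumed prompt convention, and `call_llm` is the same hypothetical client used in the earlier sketches.

```python
def run_with_loop_guard(task: str, max_steps: int = 10) -> str:
    seen: set[str] = set()
    prompt = f"Solve step by step. Write DONE when finished: {task}"
    result = ""
    for _ in range(max_steps):  # hard cap: the clear termination condition
        result = call_llm(prompt)
        if "DONE" in result:    # assumed success marker from the prompt contract
            return result
        if result in seen:      # identical output again means no progress
            prompt = (
                "Previous attempts repeated themselves. "
                f"Try a different approach: {task}"
            )
        seen.add(result)
    return result  # bail out instead of looping forever
```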
Through the above examples and workflow diagrams (Fig 5.1 to Fig 5.7), you'll notice that
while building AI agents presents numerous challenges, understanding and addressing
these common failure points is essential for success.
By implementing proper guardrails, ensuring robust error handling, and designing scalable
architectures, you can create agents that work reliably and provide real value in production
environments.
Always start small, test thoroughly, and gradually expand your agent's capabilities as you
learn from real-world usage. Pay special attention to the fundamentals we've covered,
from clear task definition and evaluation to proper planning and reasoning capabilities.
This will help you establish a strong foundation when you begin to experiment with your
AI agents.
Glossary

System Latency: The time delay between when an AI agent receives input and when it
provides a response.

Human-in-the-Loop (HITL): A system design approach that integrates human oversight
and intervention points within automated AI processes.

Role-Based Agent Design: An architectural approach where AI agents are assigned
specific roles with defined responsibilities, tools, and interaction patterns within a larger
system.

Context Window Utilization: A metric measuring how efficiently an AI agent uses its
available processing capacity for analyzing and retaining information.

LLM Call Error Rate: A critical reliability metric tracking the frequency of failed API
requests and processing errors when an AI agent interacts with its underlying language
model.

Hierarchical Design: A system architecture where specialized AI agents handle specific
tasks, reducing the complexity of steering a single agent.

Tool Calling: The mechanism by which AI agents interact with external systems and data
sources to solve complex problems through multiple tool interactions.