
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems

Victor Dibia, Jingya Chen, Gagan Bansal, Suff Syed,
Adam Fourney, Erkang Zhu, Chi Wang, Saleema Amershi
Microsoft Research, Redmond, United States
{victordibia, jingyachen, gaganbansal, suffsyed, adam.fourney,
erkang.zhu, chiw, samershi}@microsoft.com

Abstract

Multi-agent systems, where multiple agents (generative AI models + tools) collaborate, are emerging as an effective pattern for solving long-running, complex tasks in numerous domains. However, specifying their parameters (such as models, tools, and orchestration mechanisms, etc.) and debugging them remains challenging for most developers. To address this challenge, we present AutoGen Studio, a no-code developer tool for rapidly prototyping, debugging, and evaluating multi-agent workflows built upon the AutoGen framework. AutoGen Studio offers a web interface and a Python API for representing LLM-enabled agents using a declarative (JSON-based) specification. It provides an intuitive drag-and-drop UI for agent workflow specification, interactive evaluation and debugging of workflows, and a gallery of reusable agent components. We highlight four design principles for no-code multi-agent developer tools and contribute an open-source implementation. [1]

[1] https://github.com/microsoft/autogen/tree/autogenstudio/samples/apps/autogen-studio

Figure 1: AutoGen Studio provides a drag-n-drop UI where models, skills/tools, and memory components can be defined, attached to agents, and agents attached to workflows.

1 Introduction

When combined with the ability to act (e.g., using tools), generative AI models function as agents, enabling complex problem-solving capabilities. Importantly, recent research has shown that transitioning from prescribed (fixed) agent pipelines to a multi-agent setup with autonomous capabilities can result in desirable behaviors such as improved factuality and reasoning (Du et al., 2023), as well as divergent thinking (Liang et al., 2023). These observations have driven the development of application frameworks such as AutoGen (Wu et al., 2023), CAMEL (Li et al., 2024), and TaskWeaver (Qiao et al., 2023), which simplify the process of crafting multi-agent applications expressed as Python code.

However, while multi-agent applications advance our capacity to solve complex problems, they also introduce new challenges. For example, developers must now configure a large number of parameters for these systems, including defining agents (e.g., the model to use, prompts, tools or skills available to the agent, the number of action steps an agent can take, task termination conditions, etc.) and communication and orchestration mechanisms, i.e., the order or sequence in which agents act as they collaborate on a task. Additionally, developers need to debug and make sense of complex agent interactions to extract signals for system improvement. All of these factors can create significant barriers to entry and make the multi-agent design process tedious and error-prone. To address these challenges, we have developed AutoGen Studio, a tool for rapidly prototyping, debugging, and evaluating multi-agent workflows. Our contributions are highlighted as follows:

• AutoGen Studio - a developer-focused tool (UI and backend web and Python API) for declaratively specifying and debugging (human-in-the-loop and non-interactive) multi-agent workflows. AutoGen Studio provides a novel
drag-and-drop experience (Figure 1) for rapidly authoring complex multi-agent workflows, tools for profiling/debugging agent sessions, and a gallery of reusable/shareable multi-agent components.

• We introduce profiling capabilities with visualizations of messages/actions by agents and metrics (costs, tool invocations, and tool output status) for debugging multi-agent workflows.

• Based on our experience building and supporting AutoGen Studio as an open-source tool with a significant user base (over 200K downloads within a 5-month period), we outline emerging design patterns for multi-agent developer tooling and future research directions.

To the best of our knowledge, AutoGen Studio is the first open-source project to explore a no-code interface for autonomous multi-agent application development, providing a suitable platform for research and practice in multi-agent developer tooling.

2 Related Work

2.1 Agents (LLMs + Tools)

Generative AI models face limitations, including hallucination - generating content not grounded in fact - and limited performance on reasoning tasks or novel out-of-distribution problems. To address these issues, practice has shifted towards agentic implementations in which models are given access to tools to act and augment their performance (Mialon et al., 2023). Agentic implementations, such as ReAct (Yao et al., 2022), explore a reason-and-act paradigm that uses LLMs to generate both reasoning traces and task-specific actions in an interleaved manner. As part of this process, developers have explored frameworks that build prescriptive pipelines interleaving models and tools (e.g., LIDA (Dibia, 2023), LangChain (Chase, 2022)). However, as tasks become more complex, requiring lengthy context and the ability to independently adapt to dynamic problem spaces, predefined pipelines demonstrate limited performance (Liu et al., 2024). This limitation has led to the exploration of more flexible and adaptive agent architectures.

2.2 Multi-Agent Frameworks

Several frameworks have been proposed to provide abstractions for creating such applications. AutoGen (Wu et al., 2023) is an open-source extensible framework that allows developers to build large multi-agent applications. CAMEL (Li et al., 2024) is designed to facilitate autonomous cooperation among communicative agents through role-playing, using inception prompting to guide chat agents toward task completion while aligning with human intentions. OS-Copilot (Wu et al., 2024) introduces a framework for building generalist agents capable of interfacing with comprehensive elements in an operating system, including the web, code terminals, files, multimedia, and various third-party applications. It explores the use of a dedicated planner module, a configurator, and an executor, as well as the concept of tools (Python functions or calls to API endpoints) or skills (tools that can be learned and reused on the fly).

Multi-Agent Core Concepts

1. Model: Generative AI model used to drive core agent behaviors.

2. Skills/Tools: Code or APIs used to address specific tasks.

3. Memory: Short-term (e.g., lists) or long-term (vector databases) stores used to save and recall information.

4. Agent: A configuration that ties together the model, skills, memory components, and behaviors.

5. Workflow: A configuration of a set of agents and how they interact to address tasks (e.g., the order or sequence in which agents act, task planning, termination conditions, etc.).
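To make the concepts above concrete, the short sketch below shows how one agent and a two-agent workflow might be expressed as a declarative, JSON-serializable specification. This is an illustrative example only; the field names and values are assumptions for exposition and do not reproduce AutoGen Studio's exact schema.

import json

# Hypothetical declarative specification; field names are illustrative only.
assistant_agent = {
    "name": "assistant",
    "model": {"model": "gpt-4", "api_key": "<YOUR_API_KEY>"},  # Model: drives agent behavior
    "skills": ["generate_images"],                              # Skills/Tools: callable code
    "memory": None,                                             # Memory: optional store
    "system_message": "You are a helpful assistant that can write code and call skills.",
}

workflow = {
    "name": "book_generation",
    "type": "autonomous_chat",          # Workflow: how agents interact
    "sender": {"name": "user_proxy", "human_input_mode": "NEVER"},
    "receiver": assistant_agent,
    "termination": {"max_turns": 10},   # stop condition for the conversation
}

# Because the specification is plain JSON, it can be saved, versioned, and shared.
print(json.dumps(workflow, indent=2))

A serialized form of this kind is what a no-code tool can load, edit, and execute on the user's behalf.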
Collectively, these tools support a set of core capabilities: definition of agent parameters, such as generative AI models, skills/tools, or memory, and agent workflows, i.e., specifications of how these agents can collaborate. However, most of these frameworks primarily support a code-first representation of agent workflows, which presents a high barrier to entry and limits rapid prototyping. They also do not provide tools or metrics for agent debugging and evaluation. Additionally, they lack structured, reusable templates to bootstrap or accelerate the agent workflow creation process. AutoGen Studio addresses these limitations by providing a visual interface to declaratively define and visualize agent workflows, test and evaluate these workflows, and offer templates for common multi-agent tasks to streamline development. While this work is built on the AutoGen open-source library (Wu et al., 2023) and inherits the core abstractions for representing agents, the proposed design patterns for no-code developer tools are intended to apply to all multi-agent frameworks.

3 Design Goals

AutoGen Studio is designed to enhance the multi-agent developer experience by focusing on three core objectives:

Rapid Prototyping: Provide a playground where developers can quickly specify agent configurations and compose these agents into effective multi-agent workflows.

Developer Tooling: Offer tools designed to help developers understand and debug agent behaviors, facilitating the improvement of multi-agent systems.

Reusable Templates: Present a gallery of reusable, shareable templates to bootstrap agent workflow creation. This approach aims to establish shared standards and best practices for multi-agent system development, promoting wider adoption and implementation of multi-agent solutions.

4 System Design

AutoGen Studio is implemented across two high-level components: a frontend user interface (UI) and a backend API (web, Python, and command line). It can be installed via the PyPI package manager (listing 1).

pip install autogenstudio
autogenstudio ui --port 8081

listing 1: AutoGen Studio can be installed from PyPI (pip) and the UI launched from the command line.

4.1 User Interface

The frontend web interface in AutoGen Studio is built using React and implements three main views that support several key functionalities. The build view enables users to author (define-and-compose) multi-agent workflows. The playground view allows for interactive task execution and workflow debugging, with options to export and deploy. The gallery view facilitates the reuse and sharing of agent artifact templates.

Figure 2: AutoGen Studio provides a backend API (web, Python, CLI) and a UI which implements a playground (shown), build, and gallery view. In the playground view, users can run tasks in a session based on a workflow. Users can also observe actions taken by agents, reviewing agent messages and metrics based on a profiler module.

4.1.1 Building Workflows

The build view in the UI (see Figure 1) offers a define-and-compose experience, allowing developers to declaratively define low-level components and iteratively compose them into a workflow. For instance, users can define configurations for models, skills/tools (represented as Python functions addressing specific tasks), or memory stores (e.g., documents organized in a vector database). Each entity is saved in a database for use across interface interactions. Subsequently, they can define an agent, attaching models, skills, and memory to it. Several default agent templates are provided following AutoGen abstractions: a UserProxy agent (has a code execution tool by default), an AssistantAgent (has a generative AI model by default), and a GroupChat agent (an abstraction container for defining a list of agents and how they interact). Finally, workflows can be defined, with existing agents attached to these workflows. The default workflow patterns supported are autonomous chat (agents exchange messages and actions across conversation turns until a termination condition is met) and sequential chat (a sequence of agents is defined; each agent processes its input in order and passes on a summary of its output to the next agent). The workflow composition process is further enhanced by supporting drag-and-drop interaction, e.g., skills/models can be dragged to agents and agents into workflows.

4.1.2 Testing and Debugging Workflows

Workflows can be tested in-situ in the build view, or more systematically explored within the playground view. The playground view allows users to create sessions, attach workflows to a session, and run tasks (single-shot or multi-turn). Sessions can be shared (to illustrate workflow performance) and multiple sessions can be compared. AutoGen Studio provides two features to support debugging. First, it provides an observe view where, as tasks progress, messages and actions performed by agents are streamed to the interface, and all generated artifacts are displayed (e.g., files such as images, code, and documents). Second, a post-hoc profiler view is provided where a set of metrics is visualized for each task addressed by a workflow: the total number of messages exchanged, costs (generative AI model tokens consumed and dollar costs), how often agents use tools, and the status of tool use (success or failure), for each agent.
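As an illustration of the kind of post-hoc profiling described above, the following sketch aggregates per-agent message counts, token usage, and tool-call outcomes from a generic list of message records. It is a simplified, hypothetical example; the record fields are assumptions and do not correspond to AutoGen Studio's internal message format or its Profiler implementation.

from collections import defaultdict

# Hypothetical message records; field names are illustrative only.
messages = [
    {"agent": "userproxy", "tokens": 120, "tool_call": None},
    {"agent": "content", "tokens": 640, "tool_call": {"name": "generate_images", "status": "success"}},
    {"agent": "content", "tokens": 410, "tool_call": {"name": "generate_pdfs", "status": "failure"}},
]

def profile(messages):
    # Aggregate simple metrics per agent: message count, tokens, tool-call outcomes.
    stats = defaultdict(lambda: {"messages": 0, "tokens": 0, "tool_success": 0, "tool_failure": 0})
    for m in messages:
        s = stats[m["agent"]]
        s["messages"] += 1
        s["tokens"] += m.get("tokens", 0)
        call = m.get("tool_call")
        if call is not None:
            key = "tool_success" if call["status"] == "success" else "tool_failure"
            s[key] += 1
    return dict(stats)

print(profile(messages))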
4.1.3 Deploying Workflows

AutoGen Studio enables users to export workflows as a JSON configuration file. An exported workflow can be seamlessly integrated into any Python application (listing 2), executed as an API endpoint using the AutoGen Studio command line interface (Figure 2), or wrapped in a Docker container for large-scale deployment on various platforms (Azure, GCP, Amazon, etc.).

from autogenstudio import WorkflowManager

wm = WorkflowManager("workflow.json")
wm.run(message="What is the height of the Eiffel Tower")

listing 2: Workflows can be imported in Python apps.

4.1.4 Template Gallery

The UI also features a gallery view: a repository of components (skills, models, agents, workflows) that users can import, extend, and reuse in their own workflows. Since each component specification is declarative (JSON), users can also easily export, version, and reshare them.

4.2 Backend API - Web, Python, and Command Line

The backend API comprises three main components: a web API, a Python API, and a command-line interface. The web API consists of REST endpoints built using the FastAPI library [2], supporting HTTP GET, POST, and DELETE methods. These endpoints interact with several key classes: a DBManager performs CRUD (Create, Read, Update, Delete) operations on various entities such as skills, models, agents, memory, workflows, and sessions. The WorkflowManager class handles the ingestion of declarative agent workflows, converts them into AutoGen agent objects, and executes tasks (see listing 2). A Profiler class parses agent messages to compute metrics. When a user initiates a task within a session, the system retrieves the session history, instantiates agents based on their serialized representations from the database, executes the task, streams intermediate messages to the UI via websocket, and returns the final results. AutoGen Studio also provides a command-line interface with utilities for launching the bundled UI and running exported workflows as API endpoints.

[2] FastAPI: https://fastapi.tiangolo.com/
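As a concrete illustration of the deployment path described in Section 4.1.3, the sketch below wraps an exported workflow in a minimal FastAPI endpoint. It is a hypothetical example built only on the WorkflowManager import shown in listing 2 and the FastAPI library named above; the route name, request schema, and the assumption that run() returns a printable result are illustrative and not AutoGen Studio's documented serving API. In practice, the bundled CLI (e.g., autogenstudio serve --workflow=workflow.json) covers this use case without custom code.

from fastapi import FastAPI
from pydantic import BaseModel
from autogenstudio import WorkflowManager

app = FastAPI()
wm = WorkflowManager("workflow.json")  # workflow exported from the build view

class TaskRequest(BaseModel):
    message: str

@app.post("/task")
def run_task(req: TaskRequest):
    # Illustrative only: we assume run() returns an object that can be
    # stringified for the response; adapt to the actual return type.
    result = wm.run(message=req.message)
    return {"result": str(result)}

# Launch with: uvicorn app:app --port 8000  (assumes this file is saved as app.py)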
5 Usage and Evaluation

In this project, we have adopted an in-situ, iterative evaluation approach. Since its release on GitHub (5 months), the AutoGen Studio package has been installed over 200K times and has been iteratively improved based on feedback from usage (>135 GitHub issues). Issues highlighted several user pain points that were subsequently addressed, including: (a) challenges in defining, persisting, and reusing components, resolved by implementing a database layer; (b) difficulties in authoring components, resolved by supporting automated tool generation from descriptions and integrating an IDE for editing tools; (c) frustrations caused by components failing during end-to-end tests, addressed by incorporating a test button for components (e.g., models) and workflows in the build view. Figure 3 displays a plot of all AutoGen Studio issues. Each point represents an issue, based on an embedding of its text (title + body) using OpenAI's text-embedding-3-large model. The embeddings were reduced to two dimensions using UMAP, clustered with K-Means (k = 8), and cluster labels were generated using GPT-4 (grounded on 10 samples from each cluster's centroid). Finally, in Appendix A, we demonstrate how AutoGen Studio can effectively be used to support an engineer persona in rapidly prototyping, testing, and iteratively debugging a multi-agent workflow, and deploying it as an API endpoint to address a concrete task (generating books).

Figure 3: Plot of GitHub issues (n = 8 clusters) from the AutoGen Studio repo. User feedback ranged from support with workflow authoring tools (e.g., the ability to configure and test models) to general installation.
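A minimal sketch of the issue-clustering pipeline described above is shown below, assuming the openai, umap-learn, and scikit-learn packages; the issue texts are placeholders, and the GPT-4 prompting used to name clusters in the paper is not reproduced here.

import numpy as np
import umap
from openai import OpenAI
from sklearn.cluster import KMeans

# Placeholder issue texts (title + body); the real analysis used >135 GitHub issues.
issues = [
    "Docker access error when executing skills",
    "Group chat workflow never terminates",
    "Model configuration: API key validation fails",
    "Feature request: share workflows with teammates",
]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1) Embed each issue with text-embedding-3-large.
resp = client.embeddings.create(model="text-embedding-3-large", input=issues)
embeddings = np.array([d.embedding for d in resp.data])

# 2) Reduce to two dimensions with UMAP for plotting.
coords = umap.UMAP(n_components=2, n_neighbors=2, random_state=0).fit_transform(embeddings)

# 3) Cluster the embeddings with K-Means (k = 8 in the paper; smaller here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# 4) Cluster names would then be generated by prompting GPT-4 with ~10 issues
#    sampled near each cluster centroid (prompting step omitted here).
for text, (x, y), label in zip(issues, coords, labels):
    print(label, round(float(x), 2), round(float(y), 2), text)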

6 Emerging Design Patterns and Research Directions

In the following section, we outline some of the high-level emerging patterns which we hope can help inform the design of no-code interfaces for building next-generation multi-agent applications.

6.1 Define-and-Compose Workflows

Allow users to author workflows by defining components and composing them (via drag-and-drop actions) into multi-agent workflows.

A multi-agent system can have a wide array of parameters to configure. We have found that selecting the right visual presentation of the workflow is key to helping users understand what parameters to configure (discovery) and how to configure them. Specifically, we have found that a define-and-compose workflow, where entities are first defined and persisted independently, and then ultimately composed into multi-agent workflows, provides a good developer experience. This includes providing tools to support authoring entities, e.g., the ability to define and test models, an IDE for generating/editing tools (code), and a canvas-based visual layout of workflows with drag-and-drop interaction for associating entities in the workflow.

6.2 Debugging and Sensemaking Tools

Provide robust tools to help users debug, interpret, and rationalize the behavior and outputs of multi-agent systems.

Multi-agent workflows can be brittle and fail for multiple reasons, ranging from improperly configured models, to poor instructions for agents, to improper configuration of tools or termination conditions. A critical request has been for tools to help users debug and make sense of agent responses.

6.3 Export and Deployment

Enable seamless export and deployment of multi-agent workflows to various platforms and environments.

While a no-code tool like AutoGen Studio
enables rapid iteration and demonstration of workflows, the natural progression for most use cases is that developers want to replicate the same outcomes, but integrated as parts of their core applications. This stage requires seamless export and deployment of multi-agent workflows to various platforms and environments.

6.4 Collaboration and Sharing

Facilitate user collaboration on multi-agent workflow development and allow easy sharing of creations within the community.

Collaboration and sharing are key to accelerating innovation and improving multi-agent systems. By enabling users to collaborate on workflow development, share their creations, and build upon each other's work, a more dynamic and innovative development environment can be cultivated. Tools and features that support real-time collaboration, version control, and seamless sharing of workflows and components are essential to foster a community-driven approach. Additionally, offering a repository or gallery where users can publish and share their workflows, skills, and agents promotes communal learning and innovation.

7 Future Research Directions

While we have explored early implementations of the design requirements mentioned above, our efforts in building AutoGen Studio have also identified important future research areas and associated research questions.

• Offline Evaluation Tools: This encompasses questions such as: how can we measure the performance, reliability, and reusability of agents across tasks? How can we better understand their strengths and limitations? How can we explore alternative scenarios and outcomes? And how can we compare different agent architectures and collaboration protocols?

• Understanding and quantifying the impact of multi-agent system design decisions: These questions include determining the optimal number and composition of agents for a given problem, the best way to distribute responsibilities and coordinate actions among agents, and the trade-offs between centralized and decentralized control or between homogeneous and heterogeneous agents.

• Optimizing multi-agent systems: Research directions here include the dynamic generation of agents based on task requirements and available resources, tuning workflow configurations to achieve the best performance, and adapting agent teams to changing environments and user preferences. Furthermore, how can we leverage human oversight and feedback to improve agent reliability, task performance, and safety?

8 Conclusion

This paper introduced AutoGen Studio, a no-code developer tool for rapidly prototyping, debugging, and evaluating multi-agent workflows. Key features include a drag-and-drop interface for agent workflow composition, interactive debugging capabilities, and a gallery of reusable agent components. Through widespread adoption, we identified emerging design patterns for multi-agent developer tooling: a define-and-compose approach to authoring workflows, debugging tools to make sense of agent behaviors, tools to enable deployment, and collaborative sharing features. AutoGen Studio lowers the barrier to entry for multi-agent application development, potentially accelerating innovation in the field. Finally, we outline future research directions, including developing offline evaluation tools, ablation studies to quantify the impact of multi-agent system design decisions, and methods for optimizing multi-agent systems.

9 Ethics Statement

AutoGen Studio is designed to provide a no-code environment for rapidly prototyping and testing multi-agent workflows. Our goal is to responsibly advance research and practice in solving problems with multiple agents and to develop tools that contribute to human well-being. Along with AutoGen, AutoGen Studio is committed to implementing features that promote safe and reliable outcomes. For example, AutoGen Studio offers profiling tools to make sense of agent actions and safeguards, such as support for Docker environments for code execution. This feature helps ensure that agents operate within controlled and secure environments, reducing the risk of unintended or harmful actions. For more information on our approach to responsible AI in AutoGen, please refer to the transparency FAQs. Finally, AutoGen Studio is not production-ready, i.e., it does not focus on implementing authentication and other security measures that are required for production-ready deployments.

Acknowledgements

We would like to thank members of the open-source software (OSS) community and the AI Frontiers organization at Microsoft Research for discussions and feedback along the way. Specifically, we would like to thank Piali Choudhury, Ahmed Awadallah, Robin Moeur, Jack Gerrits, Robert Barber, Grace Proebsting, Michel Pahud, Qingyun Wu, Harsha Nori, and others for feedback and comments.

References

Harrison Chase. 2022. LangChain. GitHub.

Victor Dibia. 2023. LIDA: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models. arXiv preprint arXiv:2303.02927.

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. 2023. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325.

Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2024. CAMEL: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36.

Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. 2023. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118.

Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157-173.

Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, et al. 2023. Augmented language models: a survey. arXiv preprint arXiv:2302.07842.

Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, et al. 2023. TaskWeaver: A code-first agent framework. arXiv preprint arXiv:2311.17541.

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. 2023. AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. arXiv.

Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, and Lingpeng Kong. 2024. OS-Copilot: Towards generalist computer agents with self-improvement. arXiv preprint arXiv:2402.07456.

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.
A Jack the Software Engineer Persona Use Case

Jack is a junior software engineer who has recently joined SoftwareCon. As part of his tasks, he is required to create an application that can generate a variety of short books. The initial version should focus on generating children's books (ages 5-8 years old) based on a given query (e.g., create a book for kids on how the sun works), with the expectation of being generalized to support other generic tasks. Jack has heard about a multi-agent approach to building systems that can address a variety of tasks through autonomous collaboration between agents. To explore this approach, he begins by perusing the AutoGen Studio documentation, installs it, launches the UI, and performs the following steps:

A.1 Step 1: Define and Compose a Workflow

Jack starts with the build view, where he reviews the default skills that come with AutoGen Studio. He sees that there are two relevant skills: generate_pdfs and generate_images. He verifies that he has the appropriate API keys for the generate_image skill. Next, he creates a GPT3.5 model and adds an API key. Following best practices, Jack knows that the basic agent team with AutoGen consists of a UserProxyAgent that can execute code and an AssistantAgent that can solve tasks as well as write code or call available tools/skills. He creates both of these agents; for his AssistantAgent, he ensures that he attaches the GPT4 model he created previously and also attaches both skills. Jack moves on to the workflow tab and creates a new autonomous chat workflow where he specifies the UserProxyAgent as the initiator and his AssistantAgent as the receiver.

A.2 Step 2: Test and Iterate

Within the workflow tab, Jack tests the workflow immediately and quickly observes a few issues. Using the profiler tool and the visualization of messages exchanged by the agents, he notices that there seem to be quality issues with the content of the book - namely, the AssistantAgent seems to generate very short messages, and hence the book pages contain only 2 sentences per page, whereas the requirements state that the kids are slightly older and can read much longer text.

To remedy these issues, Jack takes two actions. First, he attempts to extend the base instructions of his AssistantAgent, but still doesn't get pages with more than 3 sentences across interactive tests. He recalls that using more agents can help separate focus and improve task performance. He then switches to creating 4 agents: a UserProxy, a ContentAssistant with detailed instructions on generating the content for each page, a QualityAssuranceAssistant to verify the pages meet parameters, and an ImageGeneratorAssistant focused on generating images for the book. He then creates a GroupChat agent and adds his list of agents to it. Next, he creates a new workflow where the receiver is the GroupChat agent and tests the application across a few tries. Jack is satisfied with the results, as full-page stories are now generated correctly. In addition, Jack is concerned about costs but can easily use the observe message button to explore duration, tokens used by agents, tool/skill use, and LLM dollar costs for each task run.

A.3 Step 3: Export and Share

At this point, Jack has two final tasks: he wants to share his work with colleagues for feedback and then provide an API they can prototype with. AutoGen Studio makes sharing easy. First, Jack can simply export and share a link to successful sessions. Second, he can also download his workflow and share it with colleagues, saving it in a version control system like Git. Third, he can spin up an API endpoint where the agents can respond to task requests using the CLI command "autogenstudio serve --port 8000". He can also spin up a Docker container using the AutoGen Studio serve command and scale it on any platform of his choice (Azure, AWS, GCP, Hugging Face).
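Following listing 2, a colleague who receives Jack's exported workflow file could also run it programmatically; the sketch below assumes the exported file is named book_workflow.json and that the relevant model API keys are available in the environment (the file name and task text are illustrative).

from autogenstudio import WorkflowManager

# Load the workflow Jack exported from the build view (file name assumed).
wm = WorkflowManager("book_workflow.json")

# Run a single book-generation task; agents collaborate until termination.
wm.run(message="Create a 4-page children's book about how the sun works.")

For a hosted endpoint, the autogenstudio serve command mentioned above exposes the same workflow as an API without additional code.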
