Self-Organized-Agent
1 Introduction
In recent years, research on agents using Large Language Models (LLMs) (Brown et al., 2020; OpenAI, 2023; Touvron et al., 2023), such as ReAct (Yao et al., 2023b), Reflexion (Shinn et al., 2023), Toolformer (Schick et al., 2023), and AutoGPT2, has been expanding the possibilities of automating human tasks. These advancements have particularly contributed to the rapid development of automatic code generation techniques in the field of automated application and tool development (Hong et al., 2023; Dong et al., 2023; Huang et al., 2023). Compared to non-agent-based methods (Muennighoff et al., 2023; Li et al., 2023b), these research achievements have led to remarkable performance improvements in automatic code generation (Zhong et al., 2024; Zhou et al., 2023). Most recent research has focused on single-agent approaches to code generation. These single-agent code generation methods face limitations, especially in terms of scalability, when the implementation becomes complex and requires a large codebase.

1 Our code will be available at https://github.com/tsukushiAI/self-organized-agent.
2 https://github.com/Significant-Gravitas/

The main reason for this technical difficulty is that a single agent must manage the entire code generation process alone. For instance, implementing a machine learning algorithm involves several stages, such as data preprocessing, algorithm training, and result evaluation, which comprise many functions and classes. When these complex components are combined, the codebase inevitably becomes very large. However, LLMs have limited context lengths, and as the number of input tokens increases, inference performance decreases (Levy et al., 2024; Shaham et al., 2023; Li et al., 2023a). Consistently understanding and generating or modifying appropriate code for such an extensive codebase poses a significant challenge for a single agent in terms of comprehending and managing the context. Consequently, the single-agent approach struggles to efficiently generate and modify code as its complexity and size increase.

To tackle these challenges, we propose a self-organized multi-agent framework that can automatically generate and modify large-scale code (Figure 1). Self-organization (Ashby, 1947) is a phenomenon in which living organisms or matter create an orderly, large structure as a result of their individual autonomous behaviors, despite lacking the ability to oversee the entire system. In our framework, self-organized agents, each responsible for different code parts or tasks, independently generate and modify code. With the self-organization of agents, a single agent no longer needs to comprehend the entire codebase, making it possible to scale up to large-scale code simply by increasing the number of agents. Another feature of our framework is that agents automatically multiply according to the complexity of the problem, allowing the overall codebase to expand while keeping the amount of code handled by each agent constant. These features enable the dynamic and flexible generation and modification of large-scale code, which was impossible with the traditional single-agent approach.

In our experiments, we evaluated the performance of this framework using HumanEval (Chen et al., 2021), a benchmark for code generation. The results show that our self-organized multi-agent framework outperformed Reflexion (Shinn et al., 2023), an existing powerful code generation agent (§4.1), demonstrating the effectiveness of our approach in generating and modifying code. Furthermore, through a detailed analysis of the experimental results, we revealed how agents automatically multiply according to the complexity of the problem, effectively scaling up the overall code volume while keeping the code generation per agent constant (§4.2). These experimental results support the contribution of our framework, which overcomes the scalability issues faced by single-agent approaches and provides a solution capable of handling larger projects.

2 Code Generation Task

The code generation task involves generating Python functions from docstrings (Chen et al., 2021). In this task, an agent is given a docstring that defines the types of the function's inputs and expected outputs, as well as the specific requirements that the function should meet. The agent is then required to generate the code for a function that fulfills the specified functionality. The generated code is verified for correctness using unit tests, and its quality is evaluated based on its ability to pass the test cases. As in previous studies (Shinn et al., 2023; Zhong et al., 2024; Zhou et al., 2023), we use the evaluation metric Pass@1 (Chen et al., 2021), where a problem is considered solved if any of the k generated code samples passes all test cases, with k = 1.
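To make the task format concrete, consider a small, hypothetical HumanEval-style problem (our own illustration, not an actual benchmark item): the agent sees only the docstring stub and must produce a body that the hidden unit tests accept.

def add_positive(numbers):
    """Return the sum of the strictly positive numbers in `numbers`."""
    # The agent generates this body from the docstring alone.
    return sum(n for n in numbers if n > 0)

# Unit tests like these decide correctness: under Pass@1, the problem
# counts as solved only if the single generated sample passes them all.
assert add_positive([1, -2, 3]) == 4
assert add_positive([-1, -2]) == 0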
3 Self-organized Agent Framework

Our Self-organized Agents (SoA) framework enables the efficient implementation of large-scale and complex code by having self-organized agents independently generate and modify small-scale, simple code. In this section, we introduce the important components of SoA, namely the agents and the layers responsible for more abstract processing than the agents, and finally introduce the code generation and modification protocols of the SoA framework.

3.1 Child Agent

Child agents implement a given function based on its docstrings. As shown in Figure 2, this agent has a simple structure consisting of two elements: an LLM and memory. The LLM generates code from the given docstrings and modifies the code based on the results of unit tests. The memory stores the code generated by the agent itself and retrieves the latest code to be input to the LLM, along with the unit test feedback, during code modification. As long as an agent meets these minimal specifications, it is possible to use an off-the-shelf agent (e.g., Reflexion) as a Child agent. We deliberately use a simple agent to verify the effectiveness of SoA in a simple setup.
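For concreteness, the following is a minimal sketch of such a Child agent. The class, its method names, and the llm callable are our own illustrative assumptions, not the paper's implementation.

from typing import Callable, List

class ChildAgent:
    """Minimal sketch of a Child agent: an LLM plus a memory."""

    def __init__(self, llm: Callable[[str], str], docstring: str, unit_tests: str):
        self.llm = llm                # any text-in, text-out LLM call
        self.docstring = docstring    # specification of the target function
        self.unit_tests = unit_tests  # stored for later self-debugging
        self.memory: List[str] = []   # history of generated implementations

    def generate(self) -> str:
        # Generate an implementation from the docstring and store it.
        code = self.llm(f"Implement this function:\n{self.docstring}")
        self.memory.append(code)
        return code

    def modify(self, test_feedback: str, mother_state: str) -> str:
        # Revise the latest code using unit-test feedback and the observed
        # state of the Mother agent (see the code modification paragraph below).
        latest = self.memory[-1]
        code = self.llm(
            f"Code:\n{latest}\nTest feedback:\n{test_feedback}\n"
            f"Mother agent state:\n{mother_state}\nReturn the fixed code."
        )
        self.memory.append(code)
        return code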
Code Generation The main role of Child agents is to generate functions that meet the specifications given in the function's docstrings. As shown in Figure 2, the agent follows the instructions to generate the rest of the function and complete it. The completed function implementation is stored in memory, and the unit tests for the function are also stored, as they form the basis for future code modifications.

Code Modification: Empowering Child Agents with Self-Organization and Adaptability One of the most remarkable aspects of agents in the SoA framework is their ability to autonomously improve their code based on the state of nearby agents. This process sets SoA apart from traditional agent approaches and showcases the power of self-organization in code modification. While existing agents like Reflexion (Shinn et al., 2023) rely solely on the results of unit tests, Child agents in SoA go beyond this limitation by independently observing the state of their Mother agent, such as differences in modifications and feedback. By gathering this invaluable information from their surrounding environment, Child agents can adapt their behavior and make more informed decisions about code modification, even without explicit instructions. The modifications and feedback generated by the Mother agent serve as an important source of information for the Child agents. Armed with these insights, Child agents can more effectively modify their own code, contributing to the overall improvement of the codebase in a way that is both efficient and adaptive. Figure 3 illustrates this process, which begins with the execution of unit tests and the retrieval of the latest implementation from memory. The Child agent then harnesses the power of the LLM to create a code modification proposal, seamlessly combining the information observed from the Mother agent with the test results and the latest implementation details. By storing the modified code in memory, Child agents create a feedback loop that continuously refines and improves the codebase over time. This iterative process, driven by the principles of self-organization and adaptability, enables SoA to tackle complex code modification tasks with efficiency and effectiveness. As Child agents work in harmony with their Mother agent, they contribute to the creation of a more optimized and larger codebase.

3.2 Mother Agent

The Mother is an agent that generates new agents (Mother or Child). Like a Child agent, the Mother agent independently implements a specific Python function based on its given docstrings, and it has the same memory, code generation capabilities, and self-debugging functions as Child agents. The unique feature of the Mother agent is its ability to generate multiple Child agents according to the complexity of the problem and delegate parts of the implementation to these agents. This structure allows the Mother agent to focus on implementing abstract processes, while the Child agents generated by the Mother agent concentrate on implementing concrete processes. This division of labor enhances the overall efficiency and flexibility of the SoA framework.

Code Generation We explain the code generation process of the Mother agent using the implementation example of the is_sum_of_odds_ten function shown in Figure 2. The starting point is the function's docstrings and unit tests, which are memorized for reference in the later self-debugging phase. The first task of the Mother agent is to generate a skeleton of the implementation from the given docstrings, including subfunctions such as get_odd_numbers to extract odd numbers and sum_of_numbers to calculate their sum. The number and types of these subfunctions are automatically determined by the LLM based on the complexity of the problem.

It is important to note that these subfunctions are unimplemented, and the Mother agent does not implement them directly. Instead, it delegates the implementation of the subfunctions to other agents, allowing the Mother agent to focus on generating the skeleton and streamline its own code generation process. After the docstrings and unit tests for the subfunctions are generated, they are assigned to newly initialized agents for implementation. These agents proceed with the implementation of their respective functions without looking at the internals of the is_sum_of_odds_ten function implemented by the Mother agent. Since agents within the same Mother can work asynchronously, the overall code generation process is streamlined.
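For this example, the skeleton produced by the Mother agent would look roughly as follows; this is our paraphrase of Figure 2, not verbatim model output.

# Skeleton generated by the Mother agent (paraphrased from Figure 2).
# The subfunctions are deliberately left unimplemented: their docstrings
# and unit tests are handed to newly spawned Child (or Mother) agents.

def get_odd_numbers(lst):
    """Extracts the odd numbers from a given list of numbers."""
    pass  # delegated to another agent

def sum_of_numbers(numbers):
    """Calculates the sum of a given list of numbers."""
    pass  # delegated to another agent

def is_sum_of_odds_ten(lst):
    """Checks if the sum of the odd numbers in lst is 10."""
    odd_numbers = get_odd_numbers(lst)
    sum_odds = sum_of_numbers(odd_numbers)
    return sum_odds == 10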
Figure 2: Overview of code generation. Child agents generate an executable Python function from a given docstring. The Mother agent generates the skeleton of the function, spawns newly initialized agents (Child or Mother), and delegates the unimplemented functions to them.
Code Modification The Mother's code modification is almost the same as the Child's code modification (Figure 3). It observes information from the upper Mother and uses it to modify the functions it is responsible for. The only difference is that the feedback it generates, along with the code before and after its modification, is used by lower-level agents (Child or Mother).
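Figure 3 decomposes this modification step into four stages: observe the upper agent's state, generate self-feedback, fix the code, and update memory. A minimal sketch of one iteration, shared by Mother and Child agents, might look like this (the function and the agent interface are our own illustration):

def modification_step(agent, test_result: str, upper_state: dict) -> str:
    """One code-modification iteration (illustrative sketch of Figure 3).

    upper_state carries the upper agent's self-feedback and its code before
    and after modification; the root Mother observes no such state.
    """
    latest_code = agent.memory[-1]            # STEP 1: retrieve the latest code
    feedback = agent.llm(                     # STEP 2: generate self-feedback
        f"Tests:\n{test_result}\nUpper agent state:\n{upper_state}\n"
        f"Code:\n{latest_code}\nList what should change."
    )
    fixed_code = agent.llm(                   # STEP 3: fix the code
        f"Apply this feedback:\n{feedback}\nto this code:\n{latest_code}"
    )
    agent.memory.append(fixed_code)           # STEP 4: update memory
    # The feedback and the old/new code pair become the state that this
    # agent's lower-level agents observe in the next propagation step.
    agent.state = {"feedback": feedback, "old": latest_code, "new": fixed_code}
    return fixed_code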
3.3 Self-organized Agent Process

The Self-organized Agent (SoA) framework is a distributed framework in which multiple agents (including Mother agents and Child agents) repeatedly generate and modify functions. The core of this framework lies in the principle of self-organization, where each agent functions independently, without the need to directly observe the entire codebase. The hierarchical combination of Mother agents and Child agents forms an agent network that effectively constructs a single large-scale codebase. In this hierarchical structure, Mother agents decompose complex problems into more manageable smaller problems by dividing tasks and delegating them to the agents they have generated. Although each agent is independent, the agents as a whole work efficiently towards the implementation of a single function. Because the number of agents scales while the amount of code each agent generates, modifies, and manages stays small, the total amount of generated code can be increased indefinitely according to the difficulty of the problem. Detailed algorithms are presented in Algorithm 1 in the appendix.

Code Generation The code generation process in the SoA framework begins with the function's docstrings and unit tests. In the initial stage, there is only one initialized Mother agent, which is the root of the tree structure. Based on the input docstrings and unit tests, it generates docstrings and unit tests for subtasks and passes them to other agents it generates (see §3.2). If the tree structure reaches a predetermined depth, the tasks are passed to Child agents; otherwise, they are passed to newly generated Mother agents. By repeatedly proliferating agents in this way down to the last agent, it is possible to generate large-scale code while keeping the amount of code managed by each individual agent constant.

Code Modification Once code generation is complete, the process transitions to the code modification phase. First, the implementations of all agents are combined to create the final implementation. This final implementation is evaluated using the unit tests provided to the root Mother, and feedback is generated from the results. Since there are no agents higher than the root Mother, information from higher-level agents, as shown in Figure 3, is not used. The modification process starts from this feedback and propagates information from the root Mother agent down to the Child agents. Each agent updates its implementation based on the received feedback, generates new feedback, and transmits it to lower-level agents (see §3.2). Finally, the Child agents update their own implementations, and the process terminates (see §3.1). This series of steps is repeated until a predetermined maximum number of iterations is reached.
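The sketch below condenses the top-down generation pass into a single recursive function. It is our simplified reading of the procedure (the authors' full version is Algorithm 1 in the appendix); MotherAgent is assumed to extend the earlier ChildAgent sketch with a generate_skeleton method, and llm is the shared LLM callable.

def generate(docstring: str, unit_tests: str, depth: int, max_depth: int) -> str:
    """Recursively build the codebase top-down (illustrative sketch)."""
    if depth == max_depth:
        # Leaf of the tree: a Child agent writes a complete function body.
        return ChildAgent(llm, docstring, unit_tests).generate()

    mother = MotherAgent(llm, docstring, unit_tests)
    # The Mother writes only a skeleton; the LLM decides how many subtask
    # (docstring, unit test) pairs to create for the unimplemented parts.
    skeleton, subtasks = mother.generate_skeleton()
    parts = [
        generate(sub.docstring, sub.unit_tests, depth + 1, max_depth)
        for sub in subtasks  # agents under the same Mother may run asynchronously
    ]
    # Concatenating every agent's implementation yields this subtree's code;
    # the modification phase then propagates feedback in the reverse direction,
    # from the root Mother down to the Child agents.
    return "\n\n".join([skeleton] + parts)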
Figure 3: Overview of code modification. Agents (Mother/Child) observe the state of the Mother (feedback, old code, and updated code) and use this information to improve the functions for which they are responsible. The state of an upper agent is used by the lower agents within the hierarchy to modify their code. This state propagation promotes collaboration and information sharing throughout the hierarchy, enabling efficient code modification.

4 Experiments

LLM Selection We used GPT-3.5-turbo3 for code generation and feedback generation.4

3 gpt-3.5-turbo-1106
4 GPT-4 was not selected due to the high experimental cost required.

Baselines We compare SoA with several state-of-the-art code generation methods, including AlphaCode (Li et al., 2022), InCoder (Fried et al., 2023), Codex (Chen et al., 2021), CoT (Wei et al., 2022), and Gemini Pro (Anil et al., 2023). Additionally, we evaluate the performance of various GPT-3.5-based agents, such as ChatGPT, Self-Edit (Zhang et al., 2023), and Reflexion (Shinn et al., 2023). These baselines are chosen to represent a diverse range of approaches, including single-agent and multi-agent systems, as well as those with and without self-debugging capabilities.

Agent Configuration To evaluate the effectiveness of the SoA framework, we selected the Reflexion agent as a baseline. Reflexion iteratively modifies code based on the given docstrings and automatically generated unit tests until it reaches the maximum number of iterations or passes the unit tests. The main difference between Reflexion and SoA is that Reflexion is composed of a single agent, while SoA is composed of multiple self-organized agents. In the SoA configuration, we set the maximum number of iterations of the learning loop to 8 and the maximum tree depth to 2. Additionally, following Shinn et al. (2023), we provided a few-shot trajectory to the LLM.
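For concreteness, the reported setup can be summarized in a configuration object like the following; the structure is hypothetical, but the values are those stated above.

from dataclasses import dataclass

@dataclass
class SoAConfig:
    """Hypothetical configuration mirroring the reported SoA setup."""
    max_iterations: int = 8            # learning-loop iterations for modification
    max_tree_depth: int = 2            # depth at which Mothers spawn Child agents
    model: str = "gpt-3.5-turbo-1106"  # LLM for generation and feedback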
Data and Tasks To evaluate the performance of automatic code generation, we used the HumanEval (Chen et al., 2021) benchmark. HumanEval comprises diverse programming problems designed to measure the functional correctness of generated code. We used the Python subset for evaluation and followed the evaluation methodology of Reflexion (Shinn et al., 2023). In this process, multiple test cases are created for each generated program, and n test cases are randomly selected to construct a test suite. This test suite is used to verify whether the generated code functions correctly. We set 6 unit tests for Reflexion and 1 unit test for SoA.
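A minimal sketch of this test-suite construction and verification step (our own illustration; each generated test is assumed to be an executable assert statement) looks like this:

import random
from typing import Dict, List

def build_test_suite(generated_tests: List[str], n: int) -> List[str]:
    """Randomly select n automatically generated test cases
    (n = 6 for Reflexion and n = 1 for SoA in our setup)."""
    return random.sample(generated_tests, n)

def passes_suite(code: str, suite: List[str]) -> bool:
    """Check whether the generated code passes every test in the suite."""
    scope: Dict = {}
    try:
        exec(code, scope)      # define the generated function(s)
        for test in suite:     # run each assert against that definition
            exec(test, scope)
        return True
    except Exception:          # failed assert, syntax error, or runtime error
        return False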
4.1 Main Results

Table 1 compares the Pass@1 accuracy of the proposed method and the baselines. Comparing SoA with Reflexion, a strong baseline, SoA outperforms Reflexion by 5% in Pass@1. Considering that each agent in SoA does not see the entire code, this is a surprising result. This result suggests that self-organized agents can generate code that functions well as a whole without needing to oversee the entire code.
Figure 4: Overview of the SoA framework. Mother agents and Child agents hierarchically construct a network and perform function generation and modification. Mother agents delegate tasks to other Mother agents or Child agents, and each agent independently executes tasks while effectively implementing a single function as a whole.
... MetaGPT (Hong et al., 2023), ChatDev (Qian et al., 2023), Self-collaboration (Dong et al., 2023), and ... our method takes a different and more flexible approach. Instead of assigning fixed occupational