Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence (2503.03800v1)

Published 5 Mar 2025 in cs.MA, cs.AI, cs.CL, and cs.LG

Abstract: This work examines the integration of LLMs into multi-agent simulations by replacing the hard-coded programs of agents with LLM-driven prompts. The proposed approach is showcased in the context of two examples of complex systems from the field of swarm intelligence: ant colony foraging and bird flocking. Central to this study is a toolchain that integrates LLMs with the NetLogo simulation platform, leveraging its Python extension to enable communication with GPT-4o via the OpenAI API. This toolchain facilitates prompt-driven behavior generation, allowing agents to respond adaptively to environmental data. For both example applications mentioned above, we employ both structured, rule-based prompts and autonomous, knowledge-driven prompts. Our work demonstrates how this toolchain enables LLMs to study self-organizing processes and induce emergent behaviors within multi-agent environments, paving the way for new approaches to exploring intelligent systems and modeling swarm intelligence inspired by natural phenomena. We provide the code, including simulation files and data at https://github.com/crjimene/swarm_gpt.

Integrating LLMs into Multi-Agent Systems (MAS) represents a novel approach for simulating complex systems, particularly within the domain of swarm intelligence. This methodology substitutes traditional, often rigidly coded agent behaviors with dynamic decision-making processes driven by LLM prompts, leveraging the models' capabilities in language understanding, reasoning, and embedded knowledge (Jimenez-Romero et al., 5 Mar 2025).

LLM-MAS Integration Architecture

The core of this integration involves utilizing an LLM as the central processing unit or "brain" for individual agents within a simulation environment. A practical toolchain facilitates this by connecting a standard Agent-Based Modeling and Simulation (ABMS) platform, such as NetLogo, with an LLM API, like OpenAI's GPT-4o (Jimenez-Romero et al., 5 Mar 2025). This connection is typically mediated through a scripting interface, such as NetLogo's Python extension.
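
A minimal sketch of the Python-side bridge such a toolchain might expose to NetLogo's `py` extension; the function name and prompt text here are illustrative assumptions, and the live network call is shown only as a comment:

```python
# Hypothetical bridge function invoked from NetLogo via the py extension.
def make_request(prompt, temperature=0.0):
    # Keyword arguments for OpenAI's chat completions endpoint (GPT-4o,
    # low temperature for near-deterministic outputs).
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

req = make_request("Ant at (0, 0), heading 90. Decide the next move.")
# In the live toolchain: client = openai.OpenAI()
#                        client.chat.completions.create(**req)
```

Keeping the request construction separate from the network call makes the bridge easy to test offline and to retarget at a different (e.g., locally hosted) model.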

The operational workflow proceeds cyclically for each LLM-driven agent at each simulation step:

  1. Environment State Encoding: The agent's local perception within the NetLogo environment (e.g., spatial coordinates, heading, proximity to neighbors, sensed environmental cues like pheromone gradients or food locations) is captured.
  2. Prompt Generation: The encoded state information is formatted into a structured textual prompt. This prompt includes not only the current state but also the instructions or rules the LLM should follow to determine the agent's next action.
  3. LLM Inference: The generated prompt is transmitted to the LLM via its API. The LLM processes the input text, interpreting the agent's situation and the provided instructions.
  4. Action Decoding: The LLM returns a response, often formatted as a structured object (e.g., JSON or Python dictionary), containing the suggested action(s) for the agent (e.g., move forward, turn right 15 degrees, drop pheromone). This response is parsed by the Python intermediary.
  5. Action Execution: The decoded action(s) are translated into commands executed by the agent within the NetLogo simulation environment, updating its state and influencing the environment for subsequent steps.
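
The five steps above can be sketched end-to-end. All field names, the prompt wording, and the canned LLM reply below are illustrative stand-ins, and the API round trip (step 3) is stubbed out:

```python
import json

def encode_state(agent):
    # Step 1: capture the agent's local perception (hypothetical fields).
    return {
        "x": agent["x"], "y": agent["y"], "heading": agent["heading"],
        "pheromone": agent["pheromone"], "carrying_food": agent["carrying_food"],
    }

def build_prompt(state):
    # Step 2: format the encoded state into a structured textual prompt.
    return (
        "You control an ant in a NetLogo simulation.\n"
        f"Current state: {json.dumps(state)}\n"
        'Reply with JSON: {"action": ..., "turn": ..., "move": ...}'
    )

def decode_action(llm_reply):
    # Step 4: parse the structured response into an action dict.
    return json.loads(llm_reply)

def to_netlogo(action):
    # Step 5: translate the action into a NetLogo command string.
    return f"rt {action['turn']} fd {action['move']}"

# Step 3 is the API round trip; stubbed here with a canned reply.
agent = {"x": 0, "y": 0, "heading": 90, "pheromone": 0.2, "carrying_food": False}
prompt = build_prompt(encode_state(agent))
reply = '{"action": "explore", "turn": 15, "move": 1}'  # stand-in for the LLM
cmd = to_netlogo(decode_action(reply))
print(cmd)  # rt 15 fd 1
```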

This closed-loop architecture allows agent behavior to be dynamically modulated by the LLM based on real-time environmental feedback, moving beyond pre-defined, static rule sets (Jimenez-Romero et al., 5 Mar 2025). Simulations can involve populations where all agents are LLM-driven, or hybrid systems combining LLM agents with traditional rule-based agents.

Prompt Engineering for Swarm Behaviors

Prompt engineering is paramount in this paradigm, as the structure and content of the prompt directly shape the LLM's interpretation and subsequent behavioral output. The paper highlights two distinct strategies for prompting swarm intelligence behaviors (Jimenez-Romero et al., 5 Mar 2025):

  1. Structured, Rule-Based Prompts: This approach involves providing the LLM with highly detailed, explicit conditional logic within the prompt itself, closely mirroring traditional programming. For instance, in ant foraging simulations, prompts contained specific instructions like: "IF carrying_food is true AND nest_scent > 0 THEN set_heading towards nest_scent_direction, move forward 1 unit. ELSE IF carrying_food is false AND pheromone_level > threshold THEN set_heading towards pheromone_direction, move forward 1 unit. ELSE rotate randomly +/- 30 degrees, move forward 1 unit." This method aims for deterministic outputs, often utilizing low API temperature settings (e.g., 0.0) to minimize stochasticity.
  2. Principle-Based, Knowledge-Driven Prompts: This strategy relies more on the LLM's inherent world knowledge and reasoning capabilities. Instead of exhaustive rules, the prompt provides high-level principles or goals. For bird flocking (Boids), prompts instructed agents to adhere to the classic flocking rules—separation, alignment, cohesion—based on perceived neighbor data (positions, headings). The LLM was expected to synthesize appropriate actions (adjust heading and speed) to satisfy these principles collectively, requiring less explicit definition of every possible scenario interaction.
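
Schematic templates for the two strategies can make the contrast concrete; the paper's exact prompt wording differs, and the threshold and field names below are placeholders:

```python
# Strategy 1: explicit conditional logic spelled out in the prompt.
RULE_BASED = (
    "IF carrying_food AND nest_scent > 0 THEN head toward the nest and move 1 unit. "
    "ELSE IF pheromone_level > {threshold} THEN head toward the pheromone and move 1 unit. "
    "ELSE rotate randomly +/- 30 degrees and move 1 unit."
)

# Strategy 2: high-level principles; the LLM fills in the behavior.
PRINCIPLE_BASED = (
    "You are a bird in a flock. Nearby flockmates (distance, bearing, heading): "
    "{neighbors}. Adjust your heading and speed to satisfy separation, "
    "alignment, and cohesion."
)

rule_prompt = RULE_BASED.format(threshold=0.1)
boid_prompt = PRINCIPLE_BASED.format(neighbors=[(2.0, 45, 90)])
```

The rule-based template leaves the model little latitude, which pairs naturally with temperature 0.0; the principle-based template deliberately leaves the action synthesis to the model's embedded knowledge.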

Significant iterative refinement ("prompt tuning") is necessary for both strategies. This involves observing agent behavior in the simulation, identifying deviations from desired outcomes or simulation conventions (e.g., misinterpreting NetLogo's coordinate system or heading definitions), and adjusting the prompt's wording, structure, information content, and output format requirements (e.g., requesting a "rationale" field alongside the action). The prompts utilized were typically stateless ("zero-shot"), demanding that all necessary context be provided anew at each time step, simplifying LLM interaction but requiring comprehensive environmental encoding (Jimenez-Romero et al., 5 Mar 2025).
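
One defensive pattern such output-format tuning tends to produce is tolerant parsing of the reply, including the optional "rationale" field. This is a robustness sketch under assumed field names, not necessarily what the paper's toolchain does:

```python
import json

def parse_reply(text, default):
    """Parse the LLM's JSON reply; fall back to a safe default action
    if the reply is not valid JSON."""
    try:
        out = json.loads(text)
    except json.JSONDecodeError:
        return dict(default)  # malformed reply: take the fallback action
    out.setdefault("rationale", "")  # rationale is optional
    return out

ok = parse_reply('{"action": "fd", "amount": 1, "rationale": "follow trail"}',
                 {"action": "fd", "amount": 1})
bad = parse_reply("move forward maybe?", {"action": "rt", "amount": 30})
```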

Application: Ant Colony Foraging

In the context of ant colony foraging, LLM-driven agents were implemented using structured, rule-based prompts designed to emulate the logic of a standard NetLogo Ant Foraging model (Jimenez-Romero et al., 5 Mar 2025). The environmental state provided included patch pheromone levels, nest scent direction/intensity, food presence, and whether the agent was carrying food.
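
A sketch of how this environmental state might be bundled into the per-step prompt; the dictionary keys and prompt wording are assumptions, not the paper's exact encoding:

```python
def ant_state(pheromone, nest_scent, nest_bearing, food_here, carrying_food):
    # The fields listed above, packed for prompt generation.
    return {
        "pheromone_level": pheromone,   # pheromone on the current patch
        "nest_scent": nest_scent,       # scent intensity toward the nest
        "nest_bearing": nest_bearing,   # direction to the nest, in degrees
        "food_here": food_here,         # is food present on this patch?
        "carrying_food": carrying_food, # is the ant loaded?
    }

state = ant_state(0.4, 0.9, 180, False, True)
prompt = ("State: " + ", ".join(f"{k}={v}" for k, v in state.items()) +
          ". Apply the foraging rules and reply with your next action as JSON.")
```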

Key findings include:

  • Behavioral Replication: LLM-driven ants successfully exhibited the fundamental emergent foraging pattern: exploration, food discovery, pheromone trail formation, recruitment of other ants, and transport of food back to the nest.
  • Quantitative Performance: In terms of total food collected over simulation time, colonies composed entirely of LLM-driven ants performed generally comparably to colonies of traditional rule-based ants. Some runs indicated slightly different efficiency profiles, and LLM colonies tended to show lower variance in food collection metrics across multiple simulation runs.
  • Hybrid System Synergies: A notable result emerged from simulations involving hybrid colonies containing both LLM-driven and rule-based ants. These mixed populations frequently outperformed purely homogeneous colonies (either all LLM or all rule-based) in terms of food collection rates. This suggests synergistic effects, possibly arising from combining the deterministic efficiency of rule-based agents with the different exploration and adaptation strategies inherent in the LLM's decision-making process.

Application: Bird Flocking (Boids)

For simulating bird flocking behavior based on Reynolds' Boids model, principle-based prompts were employed (Jimenez-Romero et al., 5 Mar 2025). Agents were provided with information about nearby flockmates (distance, relative bearing, heading) and prompted to adjust their own heading and speed according to the principles of separation (avoid crowding), alignment (steer towards average heading), and cohesion (steer towards average position).
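
For reference, the rule-based behavior the LLM is asked to reproduce can be written directly. This is a deliberately naive Boids sketch (equal-weight blending, simple angular averages that ignore the 0/360 wrap), not the paper's NetLogo implementation:

```python
import math

def boids_heading(pos, neighbors, sep_radius=1.0):
    """Return a new heading (degrees) for a boid at pos=(x, y), given
    neighbor dicts with 'x', 'y', 'heading'; None if no neighbors."""
    if not neighbors:
        return None
    # Separation: steer directly away from any neighbor that is too close.
    for n in neighbors:
        if math.hypot(n["x"] - pos[0], n["y"] - pos[1]) < sep_radius:
            return math.degrees(math.atan2(pos[1] - n["y"],
                                           pos[0] - n["x"])) % 360
    # Cohesion: steer toward the neighbors' average position.
    cx = sum(n["x"] for n in neighbors) / len(neighbors)
    cy = sum(n["y"] for n in neighbors) / len(neighbors)
    cohere = math.degrees(math.atan2(cy - pos[1], cx - pos[0]))
    # Alignment: steer toward the average heading (naive angular mean).
    align = sum(n["heading"] for n in neighbors) / len(neighbors)
    # Blend cohesion and alignment equally when no one is too close.
    return ((cohere + align) / 2) % 360

h = boids_heading((0.0, 0.0), [{"x": 2.0, "y": 0.0, "heading": 40.0}])
```

Such a reference implementation is also useful as a ground truth when comparing LLM-driven flocks against rule-based ones.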

Key findings include:

  • Emergent Flocking: LLM-driven agents successfully demonstrated emergent flocking behavior, forming cohesive groups that moved with coordinated alignment. This validated the LLM's capacity to interpret and apply abstract behavioral principles based on its embedded knowledge.
  • Behavioral Differences: Subtle but consistent behavioral differences were observed between LLM-driven flocks and rule-based flocks:
    • LLM agents tended to maintain slightly larger inter-agent distances, suggesting a potential bias towards stronger separation or a different interpretation of proximity thresholds.
    • LLM agents showed a tendency to occupy positions more towards the periphery of the flock rather than the center.
    • Simulations reported significantly fewer "collisions" (defined as agents coming within a very close proximity threshold) in LLM-driven flocks, potentially indicating more conservative movement adjustments.
  • Prompt Sensitivity: Achieving stable and realistic flocking required careful prompt tuning, particularly in clarifying simulation-specific conventions like coordinate systems (global vs. agent-centric) and the interpretation of heading changes (absolute vs. relative turns), highlighting the LLM's sensitivity to ambiguous spatial instructions.

Implementation Considerations and Limitations

While demonstrating feasibility, the practical implementation of LLM-driven MAS faces significant hurdles (Jimenez-Romero et al., 5 Mar 2025):

  • Computational Overhead: The primary limitation is the substantial latency introduced by API calls to large, remote LLMs like GPT-4o. Each agent decision requires a round trip to the API, slowing the simulation dramatically compared to local rule execution (potentially by orders of magnitude). This limits scalability in the number of agents and simulation steps feasible within a reasonable timeframe.
  • Cost: Utilizing commercial LLM APIs incurs monetary costs based on token usage (both input prompt length and output response length). Large-scale simulations with many agents over extended periods can become financially prohibitive.
  • Mitigation Strategies: Potential solutions include leveraging smaller, potentially less capable, but faster and cheaper LLMs that can be hosted locally. This might involve fine-tuning smaller models on specific simulation tasks or accepting a trade-off between agent complexity and simulation performance.
  • Statelessness: The reliance on stateless prompts necessitates transmitting the full environmental context at every step, increasing token counts and potentially limiting the agent's ability to learn or adapt based on long-term history unless memory mechanisms are explicitly integrated into the prompt or state representation.
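
The cost scaling can be made concrete with a back-of-envelope estimate; the token counts and per-token prices below are illustrative assumptions, not figures from the paper:

```python
def simulation_cost(n_agents, n_steps, prompt_tokens, reply_tokens,
                    usd_per_1k_in, usd_per_1k_out):
    """Estimate API calls and cost for one simulation run, assuming one
    round trip per agent per step and fixed token counts per call."""
    calls = n_agents * n_steps
    usd = calls * (prompt_tokens / 1000 * usd_per_1k_in
                   + reply_tokens / 1000 * usd_per_1k_out)
    return calls, usd

# Illustrative numbers: 50 agents, 1000 steps, 400-token stateless prompt,
# 60-token reply, hypothetical prices per 1k tokens.
calls, usd = simulation_cost(n_agents=50, n_steps=1000,
                             prompt_tokens=400, reply_tokens=60,
                             usd_per_1k_in=0.0025, usd_per_1k_out=0.01)
```

Because cost is linear in agents, steps, and prompt length, the stateless design's per-step context retransmission shows up directly in the `prompt_tokens` term.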

Conclusion

The integration of LLMs into MAS provides a viable, alternative pathway for modeling swarm intelligence phenomena. By leveraging prompt engineering, LLMs can effectively drive agent behavior based on both explicit rules and abstract principles, successfully replicating emergent behaviors like ant foraging and bird flocking (Jimenez-Romero et al., 5 Mar 2025). While LLM-driven agents perform comparably to traditional models in overall task execution, they can exhibit unique behavioral nuances. Furthermore, hybrid systems combining LLM and rule-based agents show potential for enhanced performance. However, significant practical challenges related to computational latency and cost currently limit the scalability of using large, cloud-based LLMs for extensive MAS simulations. Future work may focus on optimizing the toolchain, exploring smaller local models, and refining prompt strategies to harness the full potential of this approach.

Authors

  1. Cristian Jimenez-Romero
  2. Alper Yegenoglu
  3. Christian Blum