LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation

Published 12 Dec 2024 in cs.AI | (2412.09237v2)

Abstract: The believable simulation of multi-user behavior is crucial for understanding complex social systems. Recently, LLM-based AI agents have made significant progress, achieving human-like intelligence across various tasks. However, real human societies are dynamic and complex, involving numerous individuals engaging in multimodal interactions. In this paper, taking e-commerce scenarios as an example, we present LMAgent, a very large-scale, multimodal agent society based on multimodal LLMs. In LMAgent, besides freely chatting with friends, agents can autonomously browse, purchase, and review products, and even perform live-streaming e-commerce. To simulate this complex system, we introduce a self-consistency prompting mechanism to augment agents' multimodal capabilities, yielding significantly improved decision-making performance over existing multi-agent systems. Moreover, we propose a fast memory mechanism combined with a small-world model to enhance system efficiency, supporting simulations of more than 10,000 agents in a society. Experiments on agent behavior show that these agents achieve performance comparable to humans on behavioral indicators. Furthermore, compared with existing LLM-based multi-agent systems, LMAgent exhibits a wider range of valuable phenomena, such as herd behavior, demonstrating its potential for credible large-scale social behavior simulations.

Summary

  • The paper presents LMAgent, a scalable multimodal simulation framework whose self-consistency prompting improves decision-making and whose fast memory mechanism cuts computational resource usage by roughly 40%.
  • It initializes agent interactions with a small-world network, yielding emergent social behaviors, and reports an average improvement of about 29% in purchase prediction accuracy over baselines.
  • Experimental evaluations show that LMAgent matches human-level simulation fidelity, with agents demonstrating realistic herd dynamics and consumer patterns.

Large-scale Multimodal Agent Societies for Multi-user Simulation

Introduction and Motivation

The simulation of multi-user behavior at scale is central to analyzing and understanding complex social systems. Prior work on multi-agent systems driven by LLMs has focused primarily on narrow, text-based interaction domains with relatively limited agent populations. "LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation" (2412.09237) advances this field by structurally combining multimodal LLMs, self-consistency prompting, scalable memory mechanisms, and realistic network topology initialization. The architecture models large societies with thousands of agents, each exhibiting an individual persona and interacting through textual and visual modalities.

Figure 1: (a) Previous multi-agent systems limited to text and few agents; (b) LMAgent enables multimodal, ten-thousand-scale agent societies.

The work takes e-commerce as a canonical scenario, embedding agents capable of chat, multimodal product browsing, purchasing, reviewing, and even live-streaming activities. Beyond baseline human-like behavioral proxies, LMAgent demonstrates emergent social phenomena such as herd behaviors in purchasing, establishing its relevance for both application-oriented and theory-driven research.

System Architecture and Technical Design

LMAgent's framework revolves around two mutually influencing behavioral dimensions: internal (persona generation, memory, goal planning, reflection) and external (multimodal social/shopping actions). Each agent maintains a distinct persona, with attributes sampled realistically and preferences inferred by LLMs. Memory is organized into sensory, short-term, and long-term stores, with a custom "fast memory" mechanism boosting computational efficiency, which is critical for scaling to 10,000+ agents. Full multimodal LLM calls are reserved for complex behaviors, yielding a reported 40% reduction in system resource use.

Figure 2: LMAgent sandbox: Agents possess individual memory/persona, operate via multimodal interactions, and self-organize using a small-world network for realistic social connectivity.
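
As a rough illustration of this routing idea (not the authors' implementation), the sketch below keeps a toy three-tier memory and sends only behaviors flagged as complex to a full multimodal LLM call, while routine actions take a cheaper fast path. All class, function, and behavior names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical labels: the paper distinguishes complex actions (e.g. purchasing
# after viewing product images) from routine ones (e.g. chatting).
COMPLEX_BEHAVIORS = {"purchase", "review", "live_stream"}

@dataclass
class AgentMemory:
    """Toy three-tier memory: sensory buffer -> short-term -> long-term."""
    sensory: List[str] = field(default_factory=list)
    short_term: List[str] = field(default_factory=list)
    long_term: List[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        self.sensory.append(event)

    def consolidate(self, keep_last: int = 5) -> None:
        # Move sensory events into short-term memory and compress the oldest
        # short-term entries into a single long-term summary line.
        self.short_term.extend(self.sensory)
        self.sensory.clear()
        if len(self.short_term) > keep_last:
            overflow, self.short_term = self.short_term[:-keep_last], self.short_term[-keep_last:]
            self.long_term.append("summary: " + "; ".join(overflow))

def decide(behavior: str, memory: AgentMemory,
           multimodal_llm: Callable[[str], str],
           fast_path: Callable[[str], str]) -> str:
    """Route routine behaviors to a cheap fast path; reserve the expensive
    multimodal LLM call for complex behaviors (a rough reading of the
    paper's fast-memory mechanism)."""
    context = " | ".join(memory.short_term[-3:])
    if behavior in COMPLEX_BEHAVIORS:
        return multimodal_llm(f"{behavior} given context: {context}")
    return fast_path(f"{behavior} given context: {context}")

# Example with stub models standing in for real LLM calls.
mem = AgentMemory()
mem.observe("saw an ad for wireless earbuds")
mem.consolidate()
print(decide("chat", mem, multimodal_llm=lambda p: "LLM: " + p,
             fast_path=lambda p: "fast: " + p))
```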

Agents plan future actions and reflect on past trajectories, drawing from both compressed episodic memory and high-level insights. Decision-making leverages a two-stage self-consistency prompting pipeline: first, internal summaries derived via chain-of-thought techniques, then final multimodal prompts combining persona and environmental data. This division is specifically intended to heighten decision fidelity and consistency under multimodal complexity.
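
The following is a minimal sketch of how such a two-stage pipeline could be wired up, assuming generic text and multimodal LLM callables and a majority-vote reading of "self-consistency"; the paper's actual prompts and staging may differ.

```python
from collections import Counter
from typing import Callable, List

def self_consistent_decision(persona: str,
                             observations: List[str],
                             image_refs: List[str],
                             text_llm: Callable[[str], str],
                             multimodal_llm: Callable[[str, List[str]], str],
                             n_samples: int = 3) -> str:
    """Illustrative two-stage pipeline (not the authors' exact prompts)."""
    # Stage 1: internal summary derived via chain-of-thought style prompting.
    cot_prompt = (
        f"You are {persona}. Recent observations: {observations}. "
        "Think step by step and summarise your current goals in one sentence."
    )
    summary = text_llm(cot_prompt)

    # Stage 2: final multimodal prompt combining persona, summary, and images;
    # sampling several times and taking the majority answer enforces consistency.
    decision_prompt = (
        f"Persona: {persona}\nState summary: {summary}\n"
        "Choose one action from [browse, chat, purchase, post]."
    )
    votes = [multimodal_llm(decision_prompt, image_refs) for _ in range(n_samples)]
    return Counter(votes).most_common(1)[0][0]

# Stub models make the example runnable without any API access.
action = self_consistent_decision(
    persona="a budget-conscious student",
    observations=["friend recommended earbuds"],
    image_refs=["earbuds.jpg"],
    text_llm=lambda p: "wants affordable earbuds",
    multimodal_llm=lambda p, imgs: "browse",
)
print(action)
```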

External behaviors span granular shopping actions (browse, search, page, details, purchase), routine social exchanges (chat, post), and specialized modalities like live-streaming product endorsements. The multimodal LLM engine integrates textual and visual content, enabling comprehensive agent interactions and environmental perception.
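
A hedged sketch of how free-form LLM output might be mapped onto such a discrete action space; the Action enum and dispatch logic are illustrative, not the paper's sandbox API.

```python
from enum import Enum
from typing import Callable, Dict

class Action(Enum):
    # Action labels taken from the summary above; the exact action set in
    # the paper's sandbox may differ.
    BROWSE = "browse"
    SEARCH = "search"
    VIEW_DETAILS = "details"
    PURCHASE = "purchase"
    CHAT = "chat"
    POST = "post"
    LIVE_STREAM = "live_stream"

def dispatch(raw_llm_output: str, handlers: Dict[Action, Callable[[], str]]) -> str:
    """Map free-form LLM output onto a sandbox action, falling back to BROWSE."""
    text = raw_llm_output.strip().lower()
    for action in Action:
        if action.value in text:
            return handlers.get(action, lambda: f"no handler for {action.value}")()
    return handlers[Action.BROWSE]()

handlers = {a: (lambda a=a: f"executed {a.value}") for a in Action}
print(dispatch("I will purchase the earbuds.", handlers))  # executed purchase
```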

Social Network Initialization and Information Dissemination

The agents' social connections are initialized using a small-world network model (Watts-Strogatz), chosen for its high clustering coefficient and low diameter, mirroring empirical properties observed in human societies under six-degrees-of-separation theories. This design enables both localized clustering and rapid, far-reaching information propagation.

Figure 3: Network topology visualizations: Small-world structure enables clustering and efficient communication, matching real-world social networks.
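
The network construction itself is standard; a small sketch using networkx (with illustrative parameters, not necessarily the paper's) reproduces the qualitative contrast between regular, small-world, and random topologies:

```python
import networkx as nx

def summarize(name: str, G: nx.Graph) -> None:
    """Print clustering coefficient and average path length, restricted to
    the largest connected component for safety."""
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    print(f"{name:12s} clustering={nx.average_clustering(giant):.3f} "
          f"avg_path={nx.average_shortest_path_length(giant):.2f}")

n, k = 1000, 10  # illustrative sizes
summarize("regular", nx.watts_strogatz_graph(n, k, p=0.0))   # ring lattice
summarize("small-world", nx.watts_strogatz_graph(n, k, p=0.1))
summarize("random", nx.erdos_renyi_graph(n, k / (n - 1)))
```

The small-world graph keeps clustering close to the regular lattice while its average path length drops toward that of the random graph, which is exactly the property the initialization exploits.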

Comparative analysis reveals the small-world topology achieves faster information dissemination than regular graphs, while retaining realistic localized community structure absent in random graphs. This enhances the credibility and efficiency of simulated societies.
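
A toy push-style gossip simulation, again with assumed parameters, illustrates why the rewired topology spreads information far faster than a regular lattice while a purely random graph lacks local clustering:

```python
import random
import networkx as nx

def rounds_to_reach(G: nx.Graph, fraction: float = 0.9,
                    p_tell: float = 0.5, seed: int = 0) -> int:
    """Each round, every informed agent tells each neighbour with probability
    p_tell. Returns rounds until `fraction` of the network is informed
    (capped at 500). A toy model, not the paper's dissemination process."""
    rng = random.Random(seed)
    informed = {next(iter(G.nodes))}
    target = fraction * G.number_of_nodes()
    rounds = 0
    while len(informed) < target and rounds < 500:
        newly = {nbr for node in informed for nbr in G[node] if rng.random() < p_tell}
        informed |= newly
        rounds += 1
    return rounds

n, k = 1000, 10
for name, G in [("regular", nx.watts_strogatz_graph(n, k, 0.0)),
                ("small-world", nx.watts_strogatz_graph(n, k, 0.1)),
                ("random", nx.erdos_renyi_graph(n, k / (n - 1)))]:
    print(name, rounds_to_reach(G))
```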

Experimental Results and Behavioral Fidelity

Quantitative Benchmarking

LMAgent agents were quantitatively benchmarked on simulated purchase prediction (using a large Amazon dataset), outperforming classic and RL-based recommendation baselines (Embedding, Collaborative Filtering, RecSim, RecAgent). LMAgent demonstrated an average accuracy improvement of 29.34% versus baselines; in more challenging scenarios, improvements exceeded 32.8%. This underscores the efficacy of multimodal prompting and self-consistency mechanisms for high-fidelity simulation in consumer domains.
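
One plausible way such a purchase-prediction benchmark can be scored (the paper's exact candidate-set construction and protocol may differ) is a simple hit rate over held-out purchases:

```python
from typing import Callable, List, Sequence

def purchase_prediction_accuracy(
        episodes: Sequence[dict],
        choose: Callable[[str, List[str]], str]) -> float:
    """Fraction of episodes where the agent picks the product the real user
    actually bought out of a small candidate set."""
    hits = 0
    for ep in episodes:
        predicted = choose(ep["history"], ep["candidates"])
        hits += predicted == ep["ground_truth"]
    return hits / len(episodes)

# Tiny synthetic example with a trivial rule-based chooser standing in
# for the LLM-driven agent.
episodes = [
    {"history": "bought running shoes", "candidates": ["socks", "laptop"], "ground_truth": "socks"},
    {"history": "bought a camera", "candidates": ["tripod", "blender"], "ground_truth": "tripod"},
]
rule = lambda hist, cands: cands[0]
print(purchase_prediction_accuracy(episodes, rule))  # 1.0 on this toy data
```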

Behavioral Analysis

Agents' sequential behaviors and generated content were evaluated against human-controlled agents across multiple qualitative dimensions: believability, knowledge, personalization, social norms, and influence. LMAgent matched human scores within 0.3 points on several axes (personalization, social norms, influence), suggesting high simulation fidelity. GPT-4 scoring showed a preference for LMAgent-generated content, corroborating but not substituting for human assessment.

Figure 4: Example evaluation of generated social behaviors for naturalness; LMAgent content closely aligns with human standards.
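
A minimal sketch of rubric-based LLM-as-judge scoring over these dimensions; the `judge` callable stands in for a real GPT-4 client, and the prompt wording is an assumption rather than the paper's:

```python
from statistics import mean
from typing import Callable, Dict, List

DIMENSIONS = ["believability", "knowledge", "personalization", "social norms", "influence"]

def judge_content(samples: List[str],
                  judge: Callable[[str], float]) -> Dict[str, float]:
    """Score each sample on every rubric dimension and average per dimension."""
    scores: Dict[str, List[float]] = {d: [] for d in DIMENSIONS}
    for text in samples:
        for dim in DIMENSIONS:
            prompt = (f"Rate the following agent-generated post for {dim} "
                      f"on a 1-5 scale. Reply with a number only.\n\n{text}")
            scores[dim].append(judge(prompt))
    return {d: mean(v) for d, v in scores.items()}

# Stub judge so the example runs offline; swap in a real model client to use it.
print(judge_content(["Just bought these earbuds, the bass is great!"],
                    judge=lambda prompt: 4.0))
```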

Inserting positive social information into an agent's memory increased target product purchase probability by 6.63%, while negative signals reduced it by 41.37%. Live-streaming endorsements produced promotional effects similar to peer recommendations. These findings indicate robust modeling of nuanced social influence.

Ablation studies confirmed that the fast memory mechanism increased efficiency (reducing token usage by ~40%) without notable loss of performance. Self-consistency prompting and multimodal integration were necessary for maximal behavioral accuracy.

Emergent Behaviors and Herd Dynamics

Large-scale simulations at scales of 10, 100, 1,000, and 10,000 agents generated purchasing distributions closely resembling real user co-purchase patterns from the JD.com dataset. PMI analysis confirmed alignment with real-world category associations: high association within categories, strong association across related categories, and negative association between certain classes. At the 10,000-agent scale, herd effects became pronounced, with top products accounting for disproportionately large fractions of purchases.
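
For reference, pointwise mutual information over purchase baskets can be computed as follows (toy data; the paper's analysis uses real co-purchase records):

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

def category_pmi(baskets: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ) over purchase baskets.
    Positive values mean categories are bought together more often than chance."""
    n = len(baskets)
    single = Counter(c for b in baskets for c in set(b))
    pair = Counter(tuple(sorted(p)) for b in baskets
                   for p in combinations(set(b), 2))
    return {
        (x, y): math.log((cnt / n) / ((single[x] / n) * (single[y] / n)))
        for (x, y), cnt in pair.items()
    }

baskets = [["phone", "case"], ["phone", "case", "charger"],
           ["laptop", "mouse"], ["phone", "charger"], ["laptop", "mouse"]]
for pair, score in sorted(category_pmi(baskets).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```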

Practical and Theoretical Implications

LMAgent provides a scalable, flexible foundation for empirical investigation into user-level and group-level phenomena in artificial societies. This enables credible experimentation on emergent macro-level behaviors, the impact of network topology, efficiency of information dissemination, and the dynamics of social influence. The system's multimodal capability further progresses toward authentic behavioral realism, with measured performance approaching human standards.

From a theoretical standpoint, LMAgent opens avenues for agent-based simulation research in economics, social psychology, and computational social science, where large-scale, multimodal, interactive environments are required. Its architecture is extensible to domains beyond e-commerce, such as urban planning, collective opinion formation, and collaborative task solving.

Future Directions

The integration of more advanced multimodal LLMs, potential adaptation of additional cognitive models (e.g., emotional or motivational systems), and extension into diverse interaction contexts (education, governance, disaster response) will likely broaden the applicability and utility of LMAgent. Further research is needed on robust benchmarking, real-time interaction, and agent-external tool usage. As LLMs continue to evolve, agent societies such as LMAgent will likely approximate increasingly intricate facets of human collective behavior.

Conclusion

LMAgent establishes a high-capacity platform for large-scale multimodal agent-based behavioral simulation, demonstrating high fidelity to human behavioral indicators and emergent group phenomena. Its innovations in memory management, self-consistency prompting, and network initialization mark substantial advancement for artificial societies research, offering new capabilities for both applied simulations and theoretical explorations of social systems.
