LMAgent: Scalable Multimodal Agent Societies
- LMAgent is a scalable architectural paradigm for simulating multimodal agent societies, featuring agents with unique personas, memory structures, and robust interaction capabilities.
- It employs fast memory mechanisms and a two-stage self-consistency prompting framework to enhance simulation accuracy and efficiency in complex environments.
- Empirical evaluations show improvements in purchase accuracy and behavioral realism, with emergent phenomena like herd behavior and co-purchase associations validating the system.
LMAgent refers both to an architectural paradigm and, in specific instances, to an implementation of large-scale multimodal agent societies in which each computational agent is powered by an LLM and exhibits individual or collective intelligence in complex environments. While the term is sometimes used generically in the context of LLM-based agents, a seminal reference implementation is “LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation” (Liu et al., 12 Dec 2024), which demonstrates the capacity of expansive, multi-agent LLM systems to simulate realistic social, commercial, and behavioral phenomena at scale.
1. Definition and Motivation
LMAgent denotes an approach in which a large population of autonomous, multimodal agents is instantiated in a simulated environment, each agent endowed with a unique persona, memory structure, and set of interaction capacities. Such societies are engineered to approximate the heterogeneity, dynamism, and multimodal sensorimotor streams of real-world human collectives. The motivation is to credibly simulate complex social systems, derive insights into user-group behavior and emergent phenomena, and stress-test autonomous AI systems in realistic, multi-user contexts (Liu et al., 12 Dec 2024). Advances in LLMs with multimodal perception (e.g., text, images) furnish the foundation for these agents’ intelligence and versatility.
2. System Architecture and Agent Modeling
Each LMAgent is embedded in a sandbox society comprising a set of agents and an adjacency matrix encoding agent interactions, the latter frequently generated using a Watts–Strogatz small-world model to induce realistic social topology and scalable communication.
Agent Internal Structure:
- Persona: Immutable attributes, e.g., demographic variables, preferences.
- Memory: Partitioned into sensor, short-term, long-term, and cached “memory bank” subsystems.
- Planner/Reflection Modules: Responsible for internal goal-setting and high-level behavioral abstraction.
- Behavior Module: Facilitates multimodal actions including chat, posting, browsing, purchasing, and live streaming.
Input Modalities:
- Textual: e.g., chat histories, product descriptions.
- Visual: e.g., product images, interface snapshots (base64-encoded).
- Event Streams: Systemic events signaled within the environment.
Underlying intelligence is provided by powerful multimodal LLMs (e.g., GPT-4 with vision), utilizing dedicated prompt functions for self-reflection, environment-grounded decision-making, memory compression, and importance estimation (Liu et al., 12 Dec 2024).
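For concreteness, a minimal Python sketch of this internal structure is given below; all class and method names (Persona, Memory, Agent, the llm callable) are illustrative assumptions, not the paper’s actual API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Persona:
    """Immutable attributes: demographics and preferences."""
    agent_id: int
    demographics: dict
    preferences: tuple

@dataclass
class Memory:
    """Partitioned memory subsystems."""
    sensor: list = field(default_factory=list)      # raw recent observations
    short_term: list = field(default_factory=list)  # compressed records
    long_term: list = field(default_factory=list)   # promoted, decaying records
    bank: dict = field(default_factory=dict)        # cached "basic behaviors"

class Agent:
    """One LMAgent: persona + memory + planner/reflection + behavior module."""
    ACTIONS = ("chat", "post", "browse", "purchase", "live_stream")

    def __init__(self, persona: Persona, llm):
        self.persona = persona
        self.memory = Memory()
        self.llm = llm  # callable wrapping a multimodal LLM (e.g., GPT-4 with vision)

    def reflect(self) -> str:
        """Planner/reflection: compress recent memory into high-level intent."""
        recent = self.memory.short_term[-5:]
        return self.llm(f"Persona: {self.persona}. Recent memory: {recent}. "
                        "Reflect and state current goals.")
```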
3. Self-Consistency Prompting and Memory Mechanisms
A defining technical innovation is the two-stage self-consistency prompting framework. Decision-making for each agent is decoupled as follows:
- Internal Reasoning: the agent first reasons over its persona and memory together with the latest observation, producing a contextual summary.
- Externally-Grounded Decision: the contextual summary is then integrated with current environmental input to select the next action.
This staged approach yields improved robustness and decision reliability relative to monolithic prompting, empirically increasing purchase simulation accuracy over single-stage controls (Liu et al., 12 Dec 2024).
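A minimal sketch of this decomposition follows, reusing the hypothetical Agent from the previous sketch; the prompt wording is an assumption for illustration, not taken from the paper.

```python
def self_consistent_decide(agent, observation: str, environment: str) -> str:
    """Two-stage self-consistency prompting (illustrative decomposition)."""
    # Stage 1: internal reasoning over persona + memory + latest observation,
    # producing a contextual summary independent of the action space.
    summary = agent.llm(
        f"Persona: {agent.persona}\n"
        f"Memory: {agent.memory.short_term[-10:]}\n"
        f"Observation: {observation}\n"
        "Summarize the situation and your current intent."
    )
    # Stage 2: externally grounded decision, constrained to valid actions.
    return agent.llm(
        f"Context summary: {summary}\n"
        f"Environment: {environment}\n"
        f"Choose exactly one action from {Agent.ACTIONS} with its arguments."
    )
```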
The fast memory mechanism is modeled on cognitive neuroscience paradigms:
- Sensor → Short-Term: Each new observation is compressed via the LLM and stored as a record $(e, s, t)$, where $e$ is the embedding, $s$ the assigned importance, and $t$ the timestamp.
- Short-Term → Long-Term: Upon detection of sufficiently many highly similar events (cosine similarity above a set threshold), the record is elevated to long-term storage.
- Memory Bank and Forgetting: Frequent, low-variety “basic behaviors” are cached to sidestep LLM calls, substantially decreasing resource consumption during large-scale simulation; retention for long-term memory decays over time according to a forgetting curve.
By integrating these subsystems, simulations with more than 10,000 agents were operationalized, with scalability facilitated by memory token budgeting and behavioral caching (Liu et al., 12 Dec 2024).
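The mechanism can be sketched as follows. The similarity threshold, repeat count, and exponential decay rate are assumed values for illustration (the paper’s exact settings and forgetting curve may differ), and embed stands in for any sentence-embedding function.

```python
import math

import numpy as np

SIM_THRESHOLD = 0.85   # assumed cosine-similarity threshold for promotion
MIN_REPEATS = 3        # assumed count of similar events before promotion
DECAY_RATE = 0.01      # assumed forgetting-curve rate for long-term memory

def cosine(a, b) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def observe(agent, observation: str, embed, now: float) -> None:
    """Sensor -> short-term: compress via LLM, store (embedding, importance, time)."""
    compressed = agent.llm(f"Compress into one memory record: {observation}")
    importance = float(agent.llm(f"Rate importance 0-1: {compressed}"))
    record = {"text": compressed, "e": embed(compressed), "s": importance, "t": now}
    agent.memory.short_term.append(record)

    # Short-term -> long-term: promote when enough highly similar events recur.
    similar = [r for r in agent.memory.short_term
               if cosine(r["e"], record["e"]) >= SIM_THRESHOLD]
    if len(similar) >= MIN_REPEATS:
        agent.memory.long_term.append(record)

def retention(record: dict, now: float) -> float:
    """Assumed exponential forgetting curve for long-term records."""
    return math.exp(-DECAY_RATE * (now - record["t"]))

def act_cheaply(agent, situation_key: str):
    """Memory bank: cached responses for frequent 'basic behaviors' skip the LLM."""
    if situation_key in agent.memory.bank:
        return agent.memory.bank[situation_key]  # no LLM call
    response = agent.llm(f"Handle basic behavior: {situation_key}")
    agent.memory.bank[situation_key] = response
    return response
```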
4. Social Topology and Simulation Workflow
Agent relationships are modeled via small-world graphs, instantiated using the Watts–Strogatz algorithm:
- Each agent is connected to its $k$ nearest neighbors, and edges are rewired with probability $p$ to inject randomness while preserving short path lengths and high clustering.
- This architecture closely matches empirical human social networks, yielding scalable graph construction and $O(k)$ per-agent communication complexity.
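A construction sketch using networkx is shown below; the population size, neighbor count, and rewiring probability are illustrative values, not the paper’s settings.

```python
import networkx as nx

N_AGENTS = 1000   # population size (illustrative)
K_NEIGHBORS = 8   # each agent wired to its k nearest neighbors
REWIRE_P = 0.1    # rewiring probability (illustrative)

# Small-world social topology: high clustering, short average path length.
society = nx.watts_strogatz_graph(n=N_AGENTS, k=K_NEIGHBORS, p=REWIRE_P)

# Per-agent communication stays O(k): each agent only messages its neighbors.
neighbors_of_42 = list(society.neighbors(42))
```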
Simulation Loop (per epoch):
- Optional planning/reflection based on the agent’s current memory state.
- Apply self-consistency prompting to select the next action from the available action set.
- Execute the action, updating system and agent states.
- Update memory via compression/caching; the resulting event is appended to a global log.
Token usage remains roughly linear in population size, with a substantial reduction against naïve designs at large agent counts and near-complete preservation of behavioral accuracy (a 0.28-point drop) (Liu et al., 12 Dec 2024).
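Putting the pieces together, one epoch might look like the following sketch, which reuses the earlier hypothetical helpers and assumes an environment object with observe/execute/state/embed/clock methods (all hypothetical).

```python
def run_epoch(agents: list, graph, environment, global_log: list) -> None:
    """One simulation epoch over all agents (illustrative schedule)."""
    for agent in agents:
        # 1. Optional planning/reflection based on the agent's current memory.
        if agent.memory.short_term:
            agent.reflect()
        # 2. Self-consistency prompting selects the next action.
        obs = environment.observe(agent, graph)
        action = self_consistent_decide(agent, obs, environment.state())
        # 3. Execute the action, updating system and agent states.
        event = environment.execute(agent, action)
        # 4. Update memory via compression/caching; append to the global log.
        observe(agent, event, environment.embed, now=environment.clock())
        global_log.append(event)
```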
5. Behavioral Metrics, Evaluation, and Emergent Phenomena
LMAgent frameworks are evaluated across a spectrum of behavioral and content metrics, with human or LLM judges assigning $1$–$5$ scale ratings:
| Metric | LMAgent | Human | Baseline |
|---|---|---|---|
| Purchase Accuracy | 73.04% | — | 58.44% (RecAgent) |
| Behavior Chains (1–5) | 4.30 | 4.60 | 3.08 (Random) |
| Content Quality (1–5) | 4.47 | 4.52 | — |
Key findings include:
- An approximately 25% relative purchase-accuracy improvement over the strongest baseline (73.04% vs. 58.44%).
- Behavioral indicators (believability, personalization) approaching human performance.
- Self-consistency prompting and fast memory jointly enable credible scaling without degrading action naturalness or expressiveness.
Notably, the system exhibits emergent phenomena:
- Herd behavior: In large-population simulations, the top-ranked product captures a share of all purchases approximately double that observed in small-population runs.
- Co-purchase associations: Positive intra- and cross-category associations, together with negative inter-category mutual-information patterns, align with observed real data, providing face validity for LMAgent society simulations (Liu et al., 12 Dec 2024).
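Both phenomena can be quantified directly from a purchase log; below is a minimal sketch of the two measurements (herd concentration and pairwise pointwise mutual information), with hypothetical data structures.

```python
import math
from collections import Counter
from itertools import combinations

def herd_concentration(purchases: list) -> float:
    """Share of all purchases captured by the top-ranked product."""
    counts = Counter(purchases)
    return max(counts.values()) / len(purchases)

def copurchase_pmi(baskets: list) -> dict:
    """Pointwise mutual information for product pairs bought together.

    PMI > 0: co-purchased more often than independence predicts;
    PMI < 0: products that tend to exclude one another.
    """
    n = len(baskets)
    item_freq = Counter(item for basket in baskets for item in basket)
    pair_freq = Counter(pair for basket in baskets
                        for pair in combinations(sorted(basket), 2))
    pmi = {}
    for (a, b), joint in pair_freq.items():
        pmi[(a, b)] = math.log((joint / n) /
                               ((item_freq[a] / n) * (item_freq[b] / n)))
    return pmi

# Example: concentration of demand on one product.
log = ["phone", "phone", "phone", "case", "charger"]
print(herd_concentration(log))  # 0.6 -> top product captures 60% of purchases
```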
6. Limitations, Extensions, and Open Questions
Reported limitations:
- Dependence on proprietary LLM APIs, introducing cost, privacy, and platform bias constraints.
- Manual prompt/capacity tuning for agent behaviors; behavioral repertoire fixed in evaluated scenarios.
- Evaluations have, to date, focused on e-commerce; generalizability to other domains (urban traffic, education, pandemic modeling) is proposed but requires extension.
Potential future directions and research questions include:
- Adaptation to domains requiring alternate action sets or environmental dynamics, e.g., decentralized governance, adversarial agents, or chain-of-thought in audio/video settings.
- Integration of explicit reward signals to facilitate reinforcement learning-based emergent goal allocation within agent societies.
- Theoretical analysis of metric convergence (e.g., clustering, influence propagation) in LLM-driven societies.
- Calibration against real-world data beyond co-purchase signal matching.
- Mitigation of LLM-induced biases and ensuring fairness in simulated social interactions.
7. Broader Context and Systemic Integration
LMAgent architectures connect to generalized LLM-Agent paradigms proposed in frameworks such as LLM-Agent-UMF (Hassouna et al., 17 Sep 2024), which advocates modular decomposition into LLMs, core-agent logic (with planning, memory, profile, action, security), and tool integrations. While LMAgent (Liu et al., 12 Dec 2024) focuses on large-scale societal simulation, its design and observed empirical regularities inform the ongoing development of unified agent modeling standards, including communication protocols (e.g., LACP (Li et al., 26 Sep 2025)) for interoperability and safety in multi-agent AI systems.
LMAgents exemplify the frontier of AI research into agent societies, demonstrating that autonomous, persona-driven, multimodal LLM agents, when organized into scalable, efficiently managed societies, can reproduce human-level behavioral richness and systematic social phenomena while supplying a testbed for understanding complex collective intelligence.