- The paper introduces EvoSpark, a framework that sustains coherent long-horizon narrative evolution through event-driven memory and dynamic spatial alignment.
- It integrates a Unified Narrative Operation Engine, Socio-Evolutionary Cognitive Base, and Generative Mise-en-Scène to resolve memory stacking and narrative-spatial dissonance.
- Experimental results show win rates that often exceed 80% on role performance, along with robust agent coherence over extended simulations, setting a new benchmark.
EvoSpark: Sustaining Long-Horizon Narrative Evolution in Endogenous Agent Societies
Traditional LLM-based multi-agent story simulators fail to sustain logical coherence and socio-cognitive evolution over long narrative horizons due to two critical deficits: (1) social memory stacking, where static or append-only memory systems accumulate conflicting relational states, and (2) narrative-spatial dissonance, wherein agent actions and narrative progression are decoupled from spatial context, undermining logical scene continuity. Furthermore, the paradigm bifurcation between rigid script-based planning and open-ended generative emergence has stunted adaptive narrative trajectory control, constraining the expressive and structural potential of agent-based narrative generation.
EvoSpark Framework
To address these deficits, EvoSpark establishes a paradigm-agnostic substrate explicitly designed for unified, logically coherent long-horizon narrative evolution. The architecture integrates (a) the Unified Narrative Operation Engine (NOE), (b) a Socio-Evolutionary Cognitive Base (RSB), (c) a Generative Mise-en-Scène mechanism (GMS) for spatial alignment, and (d) the Emergent Character Grounding Protocol (ECGP) connecting stochastic hallucination to structured world-building.
Figure 1: The EvoSpark architecture: from narrative conception and macro-planning through modularized world/agent instantiation to simulation/evolution with episodic simulation and stratified memory updates.
The NOE operationalizes user-specified narrative premises into polymorphic narrative scaffolds, adapting to hierarchical (HDP), sequential (SNP), or open-ended (Free EN) paradigms. Scene and role instantiation is governed by the Genesis and Architect agents, while the Director agent orchestrates simulation loop execution, enforcing dynamic spatial and logical alignment for all agents. Role agents enact endogenous interactions, metabolizing experiential data within the RSB in real time.
Generative Mise-en-Scène and Spatial-Logical Consistency
Figure 2: The Director Agent, as part of GMS, enforces spatially precise and logically consistent narrative interactions through real-time entity resolution and grounding.
EvoSpark mitigates narrative-spatial dissonance through the GMS, which acts as a virtual stage manager via both offline and dynamic (online) alignment. Offline, the Genesis Agent establishes initial Role-Location-Plot congruence; online, the Director Agent continuously enforces entity resolution to correct LLM-induced hallucinations and maintain spatial coherence across agent trajectories and scene transitions.
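The online half of this alignment can be pictured as a lookup that maps whatever name the LLM produced onto a canonical entry in the world schema. The sketch below is only an illustration under stated assumptions: the function names `resolve_entity` and `align_action`, the dictionary-shaped action, and the fuzzy-matching cutoff are all hypothetical, and string similarity stands in for whatever resolution method the Director Agent actually uses.

```python
from difflib import get_close_matches


def resolve_entity(mention, known_entities, cutoff=0.8):
    """Map a free-text mention to a canonical entity name, or None if unknown.

    Hypothetical helper: fuzzy string matching is an assumed stand-in for
    the Director Agent's entity resolution.
    """
    lowered = {name.lower(): name for name in known_entities}
    key = mention.strip().lower()
    if key in lowered:
        return lowered[key]  # exact match, ignoring case and padding
    matches = get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


def align_action(action, known_locations):
    """Return a copy of the action with its location grounded.

    Unresolved locations are flagged rather than silently accepted, so the
    caller can ask the model to regenerate or repair the action.
    """
    resolved = resolve_entity(action["location"], known_locations)
    return {**action, "location": resolved, "grounded": resolved is not None}


locations = ["Harbor", "Market"]
print(align_action({"actor": "Ann", "location": "Harbour"}, locations))
print(align_action({"actor": "Ann", "location": "Moon Base"}, locations))
```

In this sketch a near-miss spelling is snapped to the canonical location, while a location absent from the schema is marked as ungrounded.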
Endogenous Character Emergence
The ECGP reframes LLM stochastic hallucination ("sparking") as a creative driver: when a role is hallucinated out-of-schema, the Director Agent validates, resolves, and promotes the entity into a full-fledged agent through ontological promotion. The Architect Agent grounds these emergent agents in the world schema and initializes their cognitive substrate, transforming ephemeral narrative artifacts into persistent, evolving actors.
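A minimal sketch of ontological promotion follows, assuming a very simple world schema. The `World` and `RoleAgent` classes, the `promote` method, and the validation rule (the emergent character must sit in a known location) are hypothetical choices made for illustration; the source does not specify how the Director and Architect agents validate or initialize emergent roles.

```python
from dataclasses import dataclass, field


@dataclass
class RoleAgent:
    """Persistent actor with an initial cognitive substrate (assumed fields)."""
    name: str
    location: str
    profile: str = ""
    relations: dict = field(default_factory=dict)


class World:
    """Toy world schema: a set of locations plus the registered roles."""

    def __init__(self, locations):
        self.locations = set(locations)
        self.roles = {}

    def promote(self, name, location, evidence):
        """Promote an out-of-schema character into a persistent role agent."""
        if name in self.roles:
            return self.roles[name]  # already part of the schema
        # Assumed validation rule: the new role must be spatially grounded.
        if location not in self.locations:
            raise ValueError(f"cannot ground {name!r} in unknown location {location!r}")
        agent = RoleAgent(name=name, location=location, profile=evidence)
        self.roles[name] = agent
        return agent


world = World(["Tavern", "Docks"])
mara = world.promote("Old Mara", "Tavern", "innkeeper mentioned in a role's dialogue")
print(mara)
```

Promotion is idempotent here: asking to promote the same name twice returns the existing agent instead of creating a duplicate.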
Figure 3: Event-driven Reflect-Synthesize-Consolidation mechanism for metabolizing episodic experiences into consistent agent cognition, resolving contradictions and updating social graphs in-place.
To prevent social memory stacking and cognitive drift, the RSB—core of the Stratified Narrative Memory (SNM)—implements a hierarchical, event-driven memory architecture. The Reflect-Synthesize-Consolidate cycle ingests episodic buffers, computes deltas against extant socio-cognitive states, purges obsolete relational ties, and updates the agent's internal snapshot, ensuring current agent profiles remain consistent with cumulative social evolution.
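The cycle described above can be sketched as three small functions over an in-memory state. This is an illustrative reading, not the paper's implementation: the `Event` and `SocialState` types, the convention that a relation of `None` means a dissolved tie, and the string snapshot are all assumptions. The point it shows is the in-place update, where a changed relation overwrites the old one instead of being appended beside it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    actor: str
    target: str
    relation: Optional[str]  # assumed convention: None means the tie dissolved


@dataclass
class SocialState:
    relations: dict = field(default_factory=dict)  # target -> current relation
    snapshot: str = ""


def reflect(buffer, owner):
    """Keep only the latest observed relation per target for this agent."""
    latest = {}
    for ev in buffer:
        if ev.actor == owner:
            latest[ev.target] = ev.relation  # later events overwrite earlier ones
    return latest


def synthesize(state, latest):
    """Compute the delta between buffered observations and the stored state."""
    upserts = {t: r for t, r in latest.items()
               if r is not None and state.relations.get(t) != r}
    purges = [t for t, r in latest.items()
              if r is None and t in state.relations]
    return upserts, purges


def consolidate(state, upserts, purges):
    """Apply the delta in place and refresh the agent's snapshot."""
    for target in purges:
        del state.relations[target]
    state.relations.update(upserts)
    state.snapshot = "; ".join(f"{t}: {r}" for t, r in sorted(state.relations.items()))
    return state


state = SocialState(relations={"Bob": "ally", "Cara": "rival"})
buffer = [Event("Ann", "Bob", "enemy"), Event("Ann", "Cara", None),
          Event("Ann", "Bob", "rival")]
upserts, purges = synthesize(state, reflect(buffer, "Ann"))
consolidate(state, upserts, purges)
print(state.snapshot)
```

After the cycle, the state holds a single current relation per target, so contradictory entries such as "ally" and "rival" for the same character never coexist.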
Experimental Results
EvoSpark was evaluated across six narrative domains, three control paradigms, and multiple LLM backbones, utilizing pairwise LLM-as-a-Judge protocols and human validation. Results indicate sustained superiority over Open-Theatre, BookWorld, and HoLLMwood baselines, both in structured and open-ended settings.

Figure 4: EvoSpark exhibits consistently higher win rates and tie rates than baseline systems across narrative modes, indicating robust logical and social coherence.
Figure 5: Evolutionary alignment (win rates and tie rates) of EvoSpark and model variants over 1, 5, and 10 events, demonstrating durable long-horizon narrative action consistency.
EvoSpark achieves dominant win rates—often >80%—on metrics of role performance, immersion, and logical consistency, especially with reasoning-capable LLMs (Gemini-2.5-Pro, DeepSeek-v3.2-Think). Its advantage is particularly pronounced in long-horizon evaluation: action alignment and social evolution remain robust over 10-event sequences, a context in which static or script-driven models degrade.
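For readers unfamiliar with pairwise evaluation, the win and tie rates quoted here are simple proportions over judge verdicts. The sketch below assumes each comparison yields one of three labels from EvoSpark's point of view; the function name `pairwise_rates`, the label strings, and the example verdicts are hypothetical and are not data from the paper.

```python
from collections import Counter

LABELS = ("win", "tie", "loss")


def pairwise_rates(verdicts):
    """Turn a list of per-comparison verdicts into win/tie/loss proportions."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    unknown = set(verdicts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(verdicts)
    return {label: counts[label] / len(verdicts) for label in LABELS}


# Made-up verdicts for illustration only.
example = ["win"] * 8 + ["tie"] + ["loss"]
print(pairwise_rates(example))
```

Reporting the tie rate alongside the win rate matters because a low win rate with many ties reads very differently from a low win rate with many losses.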
Figure 6: EvoSpark excels across all six narrative domains and three paradigms; the framework generalizes to both synthetic and canonical literary domains.
Ablation studies confirm the foundational role of GMS for narrative congruence and the importance of ECGP in enabling immersive, creative agent societies. Disabling the RSB yields pronounced degradation over extended horizons—substantiating its necessity for mitigating accumulated memory conflicts.
Figure 7: Metric-level win rates for EvoSpark show distributed gains across role fidelity, immersion, narrative resonance, logical consistency, soundness, creativity, and plot advancement.
Figure 8: Detailed average scores (1–5) confirm EvoSpark’s consistent lead on all evaluation dimensions across LLMs and modes.
Theoretical and Practical Implications
EvoSpark provides empirical evidence that structural, event-driven "cognitive metabolism" is necessary for sustaining consistent agent societies over long-horizon simulations. Reframing stochastic hallucination as a generative resource, rather than a defect, enables world expansion without logical incoherence. The results further suggest that effective, dynamic spatial alignment is critical for multi-agent narrative believability, a property not captured by traditional semantic benchmarks.
Practically, EvoSpark enables the scalable simulation of agent societies for narrative design, open-ended world modeling, and rigorous computational social science—unlocking applications from interactive fiction to emergent collective behavior studies. Future developments could integrate online human-agent interaction optimization and resource-constrained model adaptation, extending EvoSpark’s paradigm to mixed-initiative or adversarial contexts.
Conclusion
EvoSpark establishes a rigorous, coherent framework for long-horizon narrative evolution in LLM-based agent societies. Its event-driven memory metabolism, spatial-logical alignment, and endogenous emergence of new agents address the core limitations of prior approaches. The framework’s design and results set a benchmark for sustainable, controllable, and expressive narrative simulation—providing methodological foundations for future research on open-ended generative autonomy and societal modeling with artificial agents (2604.12776).