
LLM Narrative Generation

Updated 22 January 2026
  • LLM-based narrative generation is a technique that leverages transformer models to structure, control, and produce narratives in interactive multi-agent environments.
  • It employs a systematic pipeline including scene serialization, metadata tagging, and rigorous prompt protocols to ensure coherent, executable agent plans.
  • Evaluations show that careful prompt design and hybrid neuro-symbolic methods enhance scalability, reliability, and real-time performance in narrative generation systems.

LLM-based narrative generation refers to the use of transformer-based LLMs to produce, structure, and control the content and style of stories or agent-based scenes. The past several years have seen rapid maturation of narrative generation pipelines that employ LLMs in roles ranging from direct text generation and collaborative writing agents to plan-and-write hybrids with neuro-symbolic scaffolding and virtual agent behavior scripting. These frameworks systematically integrate LLMs into authoring pipelines that handle agent metadata structuring, prompt engineering, iterative refinement, and scene-to-action mapping. Contemporary research demonstrates that robust pipelines, careful prompt protocols, and hybrid neuro-symbolic or multi-agent designs can reliably translate high-level narrative or scene intent into structured, executable, and engaging multi-agent behaviors or rich stories in a variety of settings (Regmi et al., 23 Dec 2025).

1. Scene Encoding, Metadata, and Serialization

State-of-the-art LLM-based narrative systems encode scenes as structured metadata over agents and objects. Each agent A_i is tagged with a SelfExplainer structure storing the tuple (ID, Name, Position, {Semantic Tags}). Each object O_j is similarly wrapped with an InteractableObjectExplainer tuple (ID, Name, Position, IsGrabbable, IsStationary, IsStationaryCompatible, IsBasicInteraction, {Semantic Tags}). This formalizes every entity's identity, role, spatial data, affordances, and semantic descriptors. A SceneSerializer traverses these agent/object schemas, iterating through the entities to produce a plain-language prompt in which lines are concatenated by type:

Actors:
Name: Guy   ID: A_1   Tags: male, college student…   Position: (-0.36, 0.11, -6.12)
Interactable Objects:
Object ID: Obj_5   Name: Chair   Is Grabbable: No  Is Stationary: Yes…

This prompt forms part of the input to the LLM, coupled with a fixed-format system prompt (see below), enabling end-users to drag-and-drop scene items and rapidly instantiate agent-based narratives (Regmi et al., 23 Dec 2025).
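The metadata tuples and serialization step above can be sketched in Python; the field names follow the SelfExplainer/InteractableObjectExplainer tuples from the paper, but the class and function bodies here are illustrative, not the authors' implementation (which targets a game engine):

```python
from dataclasses import dataclass, field

@dataclass
class SelfExplainer:
    """Agent metadata tuple: (ID, Name, Position, {Semantic Tags})."""
    id: str
    name: str
    position: tuple                 # (x, y, z) world coordinates
    tags: list = field(default_factory=list)

@dataclass
class InteractableObjectExplainer:
    """Object metadata tuple with affordance flags."""
    id: str
    name: str
    position: tuple
    is_grabbable: bool
    is_stationary: bool
    is_stationary_compatible: bool
    is_basic_interaction: bool
    tags: list = field(default_factory=list)

def serialize_scene(agents, objects):
    """Concatenate entity descriptions by type into a plain-language prompt."""
    lines = ["Actors:"]
    for a in agents:
        lines.append(f"Name: {a.name}   ID: {a.id}   "
                     f"Tags: {', '.join(a.tags)}   Position: {a.position}")
    lines.append("Interactable Objects:")
    for o in objects:
        lines.append(f"Object ID: {o.id}   Name: {o.name}   "
                     f"Is Grabbable: {'Yes' if o.is_grabbable else 'No'}  "
                     f"Is Stationary: {'Yes' if o.is_stationary else 'No'}")
    return "\n".join(lines)
```

Keeping the serializer a pure function of the metadata tuples mirrors the paper's decoupling of scene composition from narrative logic.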

2. Prompt Engineering and Input Protocols

The natural language prompt fed to the LLM is a concatenation of the system prompt and the serialized scene description:

S(scene) = [System Prompt] + [Serialized Scene Description]

The system prompt encodes critical behavioral, structural, and output schema constraints, e.g.:

"You are a procedural story generation assistant. Your task is to convert a formatted scene description into a compatible SceneDirector instruction string… Output only the SceneDirector string and nothing else."

Each virtual agent must be formatted as:

AGENT_ID {ObjectID_1 (T/F, DURATION, SPEED, GRAB_TF, STATIONARY_TF, BASIC_TF), ...}

This rigid format is essential for downstream parsing and conflict avoidance in multi-agent plans. The template is standardized for all LLM invocations, optimizing for parser robustness, output compatibility, and behavioral determinism, especially in interactive or procedural applications with many agents and objects (Regmi et al., 23 Dec 2025).
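Under this protocol, prompt assembly is plain concatenation of the two parts of S(scene). A minimal sketch, with the system-prompt text abbreviated from the excerpt above (in a chat API the system prompt would typically occupy the system role rather than being string-concatenated):

```python
# Abbreviated from the paper's system prompt; the full version also encodes
# the AGENT_ID {...} output template and behavioral constraints.
SYSTEM_PROMPT = (
    "You are a procedural story generation assistant. Your task is to convert "
    "a formatted scene description into a compatible SceneDirector instruction "
    "string. Output only the SceneDirector string and nothing else."
)

def build_prompt(serialized_scene: str) -> str:
    """S(scene) = [System Prompt] + [Serialized Scene Description]."""
    return SYSTEM_PROMPT + "\n\n" + serialized_scene
```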

3. LLM Output Schema, Parsing, and Behavior Mapping

The LLM is conditioned to return a single, structured string in SceneDirector format. The formal BNF grammar for the output is:

<DirectorString> ::= <AgentEntry> (',' <AgentEntry>)*
<AgentEntry>    ::= 'A_' <agentID>  '{' <ObjectEntry> (',' <ObjectEntry>)* '}'
<ObjectEntry>   ::= 'Obj_' <objID> '(' <Bool> ',' <Float> ',' <Float> ',' <Bool> ',' <Bool> ',' <Bool> ')'

Outputs are parsed using regular expressions. For each agent:

^A_(\d+)\s*\{([^\}]+)\}

Each object entry's action flag, duration, speed, and interaction types are then used to instantiate and enqueue behavioral "Destinations" in agent execution queues. Destinations trigger corresponding behaviors in the simulation engine:

  • Normal: MoveTo → PlayFullBodyAnimation
  • Grab: MoveTo → AttachObject → PlayUpperBodyGrabAnimation → MoveTo(next) → DropObject
  • Stationary: MoveTo → PlayIdleAnimation
  • Basic: TriggerIKInteraction

Layered animation (upper/lower body separation) allows simultaneous grab+stationary combinations, maximizing compositionality with minimal asset engineering (Regmi et al., 23 Dec 2025).
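Given the BNF grammar and the agent regex above, a parser that turns a SceneDirector string into per-agent destination queues might look like the following sketch (the dictionary layout for a queued Destination is an assumption):

```python
import re

# Agent regex from the paper; object regex derived from the BNF's <ObjectEntry>.
AGENT_RE = re.compile(r'A_(\d+)\s*\{([^\}]+)\}')
OBJ_RE = re.compile(
    r'Obj_(\d+)\s*\(\s*([TF])\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*,'
    r'\s*([TF])\s*,\s*([TF])\s*,\s*([TF])\s*\)'
)

def parse_director_string(s):
    """Parse a SceneDirector string into {agent_id: [destination, ...]}."""
    plan = {}
    for agent_id, body in AGENT_RE.findall(s):
        queue = []
        for obj_id, flag, dur, speed, grab, stat, basic in OBJ_RE.findall(body):
            queue.append({
                "object": f"Obj_{obj_id}",
                "active": flag == "T",
                "duration": float(dur),
                "speed": float(speed),
                "grab": grab == "T",        # -> MoveTo, AttachObject, ...
                "stationary": stat == "T",  # -> MoveTo, PlayIdleAnimation
                "basic": basic == "T",      # -> TriggerIKInteraction
            })
        plan[f"A_{agent_id}"] = queue
    return plan

plan = parse_director_string(
    "A_1{Obj_5(T, 4.0, 1.2, F, T, F)}, A_2{Obj_3(T, 2.0, 1.0, T, F, F)}"
)
```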

4. Evaluation Metrics, Scalability, and Quantitative Results

Performance is evaluated using the following protocol:

  • Test LLMs: ChatGPT (gpt-4.1-mini), Claude (claude-sonnet-4-5), Gemini (gemini-2.5-flash), Grok (grok-4-1-fast)
  • Complexity scenarios: 1O-1A, 5O-1A, 5O-2A, 5O-5A, 10O-5A (objects-agents)
  • Metrics:

    t_{s,m,i} = \text{response time (seconds) for scenario } s, \text{ model } m, \text{ trial } i

    M_{s,m} = \frac{1}{K} \sum_{i=1}^{K} t_{s,m,i}

    SD_{s,m} = \sqrt{\frac{1}{K-1} \sum_{i=1}^{K} \left(t_{s,m,i} - M_{s,m}\right)^2}

  • Structural validity: % outputs successfully parsed (100% for all models)
Key results (mean response time in seconds, ±SD):

Scenario   ChatGPT         Claude          Gemini          Grok
1O-1A      0.79 (±0.13)    3.27 (±0.45)    2.94 (±0.71)    4.38 (±0.79)
5O-1A      1.52 (±0.22)    4.49 (±0.81)    6.99 (±1.94)    28.38 (±5.14)
5O-2A      3.50 (±1.38)    4.63 (±0.62)    8.96 (±4.69)    20.56 (±4.60)
5O-5A      2.53 (±0.36)    5.36 (±0.44)    15.77 (±2.94)   58.22 (±47.40)
10O-5A     2.31 (±0.36)    5.83 (±1.19)    13.90 (±3.57)   40.60 (±12.21)

All models generated fully parseable, coherent outputs even at high complexity. ChatGPT consistently outperformed others in both latency and scalability; Gemini and Grok exhibited degraded performance as scene size increased (Regmi et al., 23 Dec 2025).
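The mean and standard deviation defined above are straightforward to reproduce from raw per-trial latencies; a minimal sketch using Python's standard library, with hypothetical trial values:

```python
import statistics

def latency_stats(trials):
    """Return (M, SD) for one scenario/model's response times.

    M is the arithmetic mean over K trials; statistics.stdev uses the
    Bessel-corrected (K - 1) denominator, matching the SD formula above.
    """
    m = sum(trials) / len(trials)
    sd = statistics.stdev(trials)
    return m, sd

# Hypothetical response times (seconds) for one scenario/model pair.
m, sd = latency_stats([0.70, 0.85, 0.82])
```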

5. Design Insights, Limitations, and Best Practices

  • Prompt specificity is critical: precise output format and behavioral constraints drastically reduce parser errors and ensure multi-agent plan coherence.
  • Layered animation techniques enable complex interaction overlays within standard behavior modules, increasing modularity and extensibility.
  • Decoupling metadata (ScriptableObject) from narrative logic fosters system extensibility; scene composition can be modified independently of LLM behaviors.
  • Significant limitations remain: systems are currently brittle to output format deviations, offer no agent memory or online replanning, and are best-suited for short/static scenarios. Expanding to dynamic, reactive, or conversational narratives requires local/fine-tuned LLMs, agent goal modeling, richer physical simulation, and affective or dialogic modules.
  • Best practices include rigorous template-based prompt design, clear separation of world and narrative logic, and stratified behavior mapping pipelines (Regmi et al., 23 Dec 2025).
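The brittleness to output-format deviations noted above is commonly mitigated with a validate-and-retry wrapper around the model call. The sketch below checks candidate outputs against a regex approximation of the SceneDirector grammar; `call_llm` is a hypothetical stand-in for whatever model API is in use, not part of the paper's system:

```python
import re

# Approximates the BNF: one or more comma-separated A_<n>{...} entries.
DIRECTOR_RE = re.compile(r'^A_\d+\s*\{[^{}]+\}(\s*,\s*A_\d+\s*\{[^{}]+\})*$')

def generate_with_retry(call_llm, prompt, max_attempts=3):
    """Re-invoke the model until output matches the SceneDirector format.

    call_llm: hypothetical callable (prompt -> str) wrapping a model API.
    Guards downstream parsers against format deviations; raises if no
    structurally valid output is produced within max_attempts.
    """
    for _ in range(max_attempts):
        out = call_llm(prompt).strip()
        if DIRECTOR_RE.match(out):
            return out
    raise ValueError("no format-valid SceneDirector output after retries")
```

A stricter variant could round-trip the candidate through the full parser instead of a single regex before accepting it.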

6. Theoretical and Applied Significance

LLM-based narrative generation, as realized in agent-based authoring pipelines, establishes that even moderate-scale transformer models—when embedded in well-constrained, metadata-driven input/output pipelines—can reliably ground high-level narrative scene intents into precise, multi-agent, executable behaviors. Such methods enable rapid prototyping of agent-centric experiences, inform best practices for world-author integration, and provide a template for scaling narrative complexity via decoupled serialization, output schema regularization, and downstream scripting. The approach offers actionable guidance for entertainment, simulation, and virtual world design domains where authorial control, agent intentionality, and real-time interaction are integral.

These foundations serve as a blueprint for further integration with memory, planning, and affect modules, supporting the emergence of even richer, multi-modal or adaptive narrative agents (Regmi et al., 23 Dec 2025).
