- The paper introduces an LLM-steered pathfinding approach that integrates user-defined agendas to dynamically generate multiperspective narrative paths.
- The method improves agenda alignment by up to 13.3% over keyword matching, with a coherence loss of only 2.2% relative to the agenda-agnostic maximum-capacity baseline.
- Rigorous evaluation using dual LLM judges and sensitivity analyses highlights the practical robustness and data-groundedness of the narrative extraction framework.
Motivation and Context
Narrative extraction from document collections is central to tasks such as news summarization, sensemaking, and intelligence analysis. Traditional approaches to document-level storyline extraction face a tension between maximizing narrative coherence, supporting user interactivity, and enabling multiperspective analysis. Prior systems such as Narrative Maps generate multiple interconnected storylines, delivering interaction and coverage at the expense of individual path coherence. In contrast, Narrative Trails optimizes for the single most coherent storyline using a maximum-capacity (bottleneck) path, which maximizes the weakest transition along the path, but it remains rigid, offering no mechanism for user guidance or alternative perspective generation.
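To make the maximum-capacity (bottleneck) path concrete: it is the path between two documents whose weakest edge is as strong as possible. The following is a minimal sketch using a Dijkstra-style search, not the Narrative Trails implementation. The dict-of-dicts graph, the function name `max_capacity_path`, and the toy weights are assumptions made for illustration, and edge weights are assumed to be strictly positive.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge coherence}


def max_capacity_path(
    graph: Graph, source: str, target: str
) -> Optional[Tuple[float, List[str]]]:
    """Return (bottleneck, path) maximizing the weakest edge, or None."""
    best = {source: float("inf")}  # best known bottleneck to each node
    parent: Dict[str, str] = {}
    heap = [(-float("inf"), source)]  # max-heap via negated capacity
    while heap:
        neg_cap, node = heapq.heappop(heap)
        cap = -neg_cap
        if node == target:
            break
        if cap < best.get(node, 0.0):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            new_cap = min(cap, weight)  # bottleneck after taking this edge
            if new_cap > best.get(nxt, 0.0):
                best[nxt] = new_cap
                parent[nxt] = node
                heapq.heappush(heap, (-new_cap, nxt))
    if target not in best:
        return None  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return best[target], path


# Toy graph (hypothetical weights, not from the paper).
toy = {
    "a": {"b": 0.9, "c": 0.5},
    "b": {"d": 0.4},
    "c": {"d": 0.5},
    "d": {},
}
print(max_capacity_path(toy, "a", "d"))  # (0.5, ['a', 'c', 'd'])
```

In the toy graph, the most coherent first hop (a to b, 0.9) leads to a weak second hop (0.4), so the search prefers the route through c, whose weakest edge is 0.5.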
This work proposes an intermediate approach: agenda-based narrative extraction. By integrating LLMs into the pathfinding process, agenda-based extraction introduces interactive, user-defined steering on top of the high-coherence guarantees of Narrative Trails. This enables the dynamic generation of alternative, perspective-driven narratives from the same corpus, while quantifying and minimizing the associated trade-offs in coherence.
Methodological Contributions
The paper develops an LLM-steered pathfinding algorithm that generalizes the maximum-capacity approach. Rather than greedily following the most coherent transition at each step, the algorithm selects a candidate set of the top-k neighbors (by edge coherence) and uses an LLM to rerank them according to their alignment with a user-specified natural-language agenda. Agenda categories include literal (keyword-matched), semantic (requiring non-trivial inference beyond the surface form), and counter (contradicting the corpus consensus and serving as negative controls).
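The shortlist-then-rerank step can be sketched as follows. This is a simplified walk assumed for illustration, not the authors' algorithm: how the paper combines reranking with the maximum-capacity search is not reproduced here, and the sketch simply takes the top-ranked candidate at each step. The names `top_k_neighbors`, `steered_path`, `rank_fn`, and `keyword_ranker` are hypothetical, and a keyword-overlap scorer stands in for the LLM call so the example runs offline.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge coherence}
RankFn = Callable[[List[str], List[str], str, str], List[str]]


def top_k_neighbors(graph: Graph, node: str, k: int, visited: Set[str]) -> List[str]:
    """Return up to k unvisited neighbors of `node`, most coherent first."""
    edges = [(n, w) for n, w in graph.get(node, {}).items() if n not in visited]
    edges.sort(key=lambda e: e[1], reverse=True)
    return [n for n, _ in edges[:k]]


def steered_path(graph: Graph, source: str, target: str, agenda: str,
                 rank_fn: RankFn, k: int = 3, max_steps: int = 20) -> List[str]:
    """Shortlist by coherence, then let rank_fn choose according to the agenda."""
    path, visited = [source], {source}
    while path[-1] != target and len(path) < max_steps:
        candidates = top_k_neighbors(graph, path[-1], k, visited)
        if not candidates:
            break  # dead end: return the partial path
        ranked = rank_fn(path, candidates, target, agenda)
        path.append(ranked[0])
        visited.add(ranked[0])
    return path


def keyword_ranker(texts: Dict[str, str]) -> RankFn:
    """Offline stand-in for the LLM: rank candidates by agenda-word overlap."""
    def rank(path, candidates, target, agenda):
        terms = set(agenda.lower().split())
        overlap = lambda node: len(terms & set(texts[node].lower().split()))
        # sorted() is stable, so ties keep the coherence order of the shortlist.
        return sorted(candidates, key=overlap, reverse=True)
    return rank


# Toy corpus (hypothetical documents and weights).
graph = {"a": {"b": 0.9, "c": 0.8}, "b": {"d": 0.7}, "c": {"d": 0.6}, "d": {}}
texts = {
    "a": "protests begin in havana",
    "b": "government issues statement",
    "c": "police arrest protesters",
    "d": "international reaction follows",
}
ranker = keyword_ranker(texts)
print(steered_path(graph, "a", "d", "police arrest", ranker, k=2))  # ['a', 'c', 'd']
print(steered_path(graph, "a", "d", "economy", ranker, k=2))        # ['a', 'b', 'd']
```

Replacing `keyword_ranker` with a function that calls an LLM and returns the candidates in ranked order would leave the rest of the walk unchanged.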
Prompts for LLM scoring provide the narrative context, current state, destination, and agenda, and instruct the model to return a strict ranking of candidate continuations. For evaluation, the authors use state-of-the-art LLMs (Claude Opus 4.5, GPT-5.1) as judges to score the resulting narratives on coherence (logical flow, thematic consistency, temporal order, completeness) and agenda alignment (agenda support, persuasiveness, evidentiary strength, directionality, and bias effectiveness). This dual-judge strategy mitigates single-model biases and builds on prior findings on LLM-judge reliability.
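A ranking prompt of this kind, together with a check that the reply really is a strict ranking, might look like the sketch below. The prompt wording is invented for illustration and is not taken from the paper; `build_ranking_prompt` and `parse_ranking` are hypothetical helper names, and no LLM is actually called.

```python
import re
from typing import List


def build_ranking_prompt(story_so_far: List[str], current: str, destination: str,
                         agenda: str, candidates: List[str]) -> str:
    """Assemble context, current state, destination, agenda, and candidates."""
    lines = [
        "You are helping build a narrative path through a news corpus.",
        f"Agenda: {agenda}",
        "Story so far:",
    ]
    lines += [f"- {summary}" for summary in story_so_far]
    lines += [
        f"Current document: {current}",
        f"Destination document: {destination}",
        "Candidate next documents:",
    ]
    lines += [f"{i}. {text}" for i, text in enumerate(candidates, start=1)]
    lines.append(
        "Rank ALL candidates from best to worst for advancing the agenda. "
        "Reply with the candidate numbers only, comma-separated, no ties."
    )
    return "\n".join(lines)


def parse_ranking(reply: str, n_candidates: int) -> List[int]:
    """Convert a reply such as '2, 3, 1' into zero-based candidate indices.

    Raises ValueError unless every candidate appears exactly once.
    """
    numbers = [int(token) for token in re.findall(r"\d+", reply)]
    if sorted(numbers) != list(range(1, n_candidates + 1)):
        raise ValueError(f"not a strict ranking of {n_candidates} candidates: {reply!r}")
    return [n - 1 for n in numbers]


prompt = build_ranking_prompt(
    story_so_far=["Protests begin in Havana."],
    current="Protests begin in Havana.",
    destination="International reaction follows.",
    agenda="Regime Crackdown",
    candidates=["Government issues statement.", "Police arrest protesters."],
)
print(prompt)
print(parse_ranking("2, 1", 2))  # [1, 0]
```

Validating the reply before using it guards against duplicated or missing candidate numbers, which would otherwise silently drop a continuation from consideration.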
The approach is benchmarked on a temporally and topically rich news corpus detailing the 2021 Cuban protests, evaluating 64 endpoint pairs across six agenda types.
Empirical Results
The experimental results quantify the trade-off between coherence and agenda alignment introduced by LLM-based steering:
- LLM-based steering yields 9.9% higher alignment on semantic (inference-based) agendas versus keyword matching (p=0.017) and a 13.3% gain on the "Regime Crackdown" agenda (p=0.037).
- For literal, keyword-aligned agendas, keyword matching is competitive or superior, indicating that semantic LLM steering is most valuable where surface-form cues are insufficient.
- Coherence loss due to LLM steering is minimal, only 2.2% below the coherence of the agenda-agnostic maximum-capacity baseline.
The correlation between coherence and agenda support across all method/agenda pairs is weak (Pearson r≈0.10), suggesting that high alignment is achievable without a fundamental reduction in coherence.
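For readers who want to reproduce this kind of check on their own scores, Pearson's r can be computed as in the sketch below. The score lists are made-up toy values, not data from the paper, and `pearson_r` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy judge scores, one pair per (method, agenda) combination; illustrative only.
coherence = [7.8, 7.5, 7.9, 7.4, 7.7, 7.6]
alignment = [6.1, 5.2, 4.9, 6.4, 5.8, 5.0]
print(round(pearson_r(coherence, alignment), 3))
```

A value near 0 means the two scores vary almost independently; values near +1 or -1 would indicate that gains in one come with systematic gains or losses in the other.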
Figure 1: Percentage differences in coherence and alignment with respect to maximum capacity baseline for different steering approaches.
Visualization of agenda-steered narrative paths demonstrates how different framing choices lead to diverging trajectories through the corpus UMAP space, with clear partitioning based on the imposed agenda. The system's inability to fabricate high-scoring counter-agenda narratives (all methods yield low alignment for agendas that contradict the majority of the corpus) underscores the inherent data-groundedness of the approach.
Figure 2: Visualization of how agendas produce distinct paths through the embedding space; each colored path represents a different agenda steering the Narrative Trails framework.
Figure 3: Narrative map showing the topological separation and convergence of storylines under agenda steering between the same endpoints.
Sensitivity and Robustness
Sensitivity analysis explores the impact of candidate set size, LLM temperature, and model size. Results are stable across these parameters, and the default choices provide a robust trade-off between cost and performance. Prompt engineering has a non-trivial influence: chain-of-thought (CoT) prompting boosts agenda alignment by 25.7% at an increased compute cost, and the paths produced by direct and CoT prompting for the same agenda overlap only partially (Jaccard similarity ≈0.58), indicating that both the model's reasoning process and prompt granularity meaningfully affect selection.
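Jaccard similarity between two paths is the size of their shared document set divided by the size of their combined document set. A minimal sketch follows, assuming overlap is measured over the documents visited (the paper may measure it differently, for example over edges). The document IDs are hypothetical, and returning 1.0 for two empty paths is a convention chosen here.

```python
from typing import Hashable, Iterable


def jaccard(path_a: Iterable[Hashable], path_b: Iterable[Hashable]) -> float:
    """|A intersect B| / |A union B| over the documents on each path."""
    a, b = set(path_a), set(path_b)
    if not a and not b:
        return 1.0  # convention: two empty paths are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical paths between the same endpoints under two prompting styles.
direct_path = ["d1", "d4", "d7", "d9"]
cot_path = ["d1", "d5", "d7", "d9"]
print(jaccard(direct_path, cot_path))  # 0.6
```

Here the two paths share three of five distinct documents, so a single differing intermediate step on a four-document path already brings the similarity down to 0.6.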
Theoretical and Practical Implications
The findings establish a new point in the design space of narrative extraction, demonstrating that user-steered, multiperspective, and highly coherent narrative construction is attainable by using LLMs as pathfinding rerankers. For applications in intelligence, journalism, and digital humanities, this approach lets analysts interactively generate and compare alternative storylines by agenda without complicated manual graph operations. Importantly, because counter-agenda steering fails to fabricate unsupported narratives, the system's outputs remain tightly grounded in the evidence contained in the underlying dataset.
More broadly, this work demonstrates a template for integrating LLM-guided selection into constrained combinatorial optimization frameworks (here, pathfinding on coherence graphs), opening further avenues for mixed-initiative, user-in-the-loop narrative analytics. Potential applications extend to customizable history generation, multiperspective news analysis, and explainability in retrieval pipelines.
Limitations and Future Directions
Although the approach performs robustly on a single, moderately sized corpus with a canonical set of agendas, generalization to other domains, languages, and agenda types requires further empirical validation. Evaluation relies on LLMs as judges; future work should include human-subject experiments to verify that LLM-judged narrative quality aligns with the requirements of target analysts.
Interactive deployment at scale is currently infeasible due to LLM inference latency, indicating a need for further model distillation or caching strategies. The ethical risk of misusing agenda-driven selection for narrative manipulation is acknowledged; the explicit nature of agenda specification in this system supports transparency and auditability.
Conclusion
Agenda-based narrative extraction via LLM-steered pathfinding quantitatively bridges the gap between high-coherence narrative construction and interactive, multiperspective exploration (2603.29661). It advances the state of the art in narrative extraction by enabling perspective-driven narrative synthesis with negligible loss of path coherence and with strong robustness against fabrication of unsupported narratives. The integration of LLMs as combinatorial rerankers in pathfinding opens important avenues for mixed-initiative analytics, offering valuable tools for both research and application in AI-driven sensemaking and digital historiography.