- The paper’s main contribution is introducing YARN, a framework that integrates LLM-derived abstractions with symbolic structural mapping for improved narrative analogical reasoning.
- It decomposes narratives into event units and applies multi-level abstraction to effectively bridge unstructured text with structured analogical mappings.
- Experiments show that conceptual and evaluative abstractions raise mapping accuracy by up to 0.29 over structural mapping without abstraction.
Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives
Introduction
The paper introduces YARN (Yielding Abstractions for Reasoning in Narratives), a neurosymbolic framework that operationalizes analogical reasoning in narratives by integrating LLM-based extraction and abstraction mechanisms with a structural mapping pipeline. YARN directly addresses the core issue in analogical reasoning for natural language: bridging the gap between unstructured narrative text and the structured representations required by cognitive mapping engines, such as the Structure Mapping Engine (SME), while overcoming the limitations of end-to-end LLM prompt-based approaches.
By decomposing narratives into event-centric units and computing hierarchical abstractions, YARN advances over both traditional cognitive mapping systems—which require structured inputs—and LLMs engaged in surface-level prompt-based analogy detection. The study further investigates the effectiveness of distinct conceptual abstraction levels for supporting analogical mapping and provides an extensive empirical analysis on established narrative analogy benchmarks.
YARN Framework
YARN is formulated as a modular pipeline to facilitate systematic exploration of analogical reasoning in narrative contexts. It comprises three main stages:
- Event Unit Extraction: Narratives are decomposed into event phrases representing atomic, semantically meaningful events, ensuring coverage and minimal redundancy.
- Hierarchical Abstraction: Each event is abstracted at multiple conceptual and narrative levels, producing representations ranging from shallow (surface event semantics) to deep (global narrative function and stages). The abstraction process follows four levels:
  - Conceptual abstraction (modifier-root frames)
  - Evaluative abstraction (functional role and polarity)
  - Narrative arc abstraction (position within a five-stage arc)
  - Stage abstraction (grouping events into higher-level stages and entire narrative abstractions)
This abstraction process is context-sensitive: an event's abstraction depends on its role in the narrative, not only on its lexical content.
Figure 1: Story events are transformed into abstract representations that encode functional roles and semantic meaning, enabling structural mapping between narratives.
- Structural Mapping and Scoring: Abstracted event representations from the two input narratives are aligned with a greedy one-to-one mapping algorithm. The local similarity of each candidate pair combines embedding-based cosine similarity of the concept/event units with abstraction constraints, and the selected pairs are aggregated into a global mapping score under the one-to-one constraint.
Figure 2: Candidate mappings between abstractions are scored and aggregated into a structure-preserving one-to-one correspondence, with the final score reflecting overall analogy strength.
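The greedy one-to-one mapping step can be illustrated with a minimal sketch. It assumes each unit is already embedded as a plain list of floats, omits the abstraction constraints mentioned above, and aggregates by the mean similarity of the matched pairs. The function names and the mean aggregation are illustrative assumptions, not the paper's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def greedy_one_to_one(source_vecs, target_vecs):
    """Greedily pair source and target units by descending similarity.

    Returns (pairs, score): pairs is a list of (i, j, sim) tuples and
    score is the mean similarity over matched pairs (assumed aggregation).
    """
    candidates = sorted(
        ((cosine(s, t), i, j)
         for i, s in enumerate(source_vecs)
         for j, t in enumerate(target_vecs)),
        key=lambda c: -c[0],
    )
    used_src, used_tgt, pairs = set(), set(), []
    for sim, i, j in candidates:
        # Enforce the one-to-one constraint: each unit is matched at most once.
        if i in used_src or j in used_tgt:
            continue
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((i, j, sim))
    score = sum(p[2] for p in pairs) / len(pairs) if pairs else 0.0
    return pairs, score
```

Because the procedure commits to the highest-scoring pair first, it always satisfies the one-to-one constraint but is not guaranteed to find the assignment with the best total score; an exact assignment solver would be the alternative if that guarantee mattered.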
This pipeline enables controlled experimentation with different abstraction levels, mapping strategies, and prompt formats.
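One way to picture experimentation over abstraction levels is a record that stores every level for an event and exposes only the levels selected for a given run. The sketch below is hypothetical: the class names, field names, and example values are invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    EVENT = "event"            # raw extracted event phrase, no abstraction
    CONCEPTUAL = "conceptual"  # modifier-root frame
    EVALUATIVE = "evaluative"  # functional role and polarity
    ARC = "arc"                # position within the narrative arc
    STAGE = "stage"            # higher-level stage the event belongs to

@dataclass(frozen=True)
class EventUnit:
    event: str
    conceptual: str
    evaluative: str
    arc: str
    stage: str

    def view(self, levels):
        """Return the text of the requested levels, in the order given."""
        return [getattr(self, level.value) for level in levels]

# Hypothetical example values, not taken from either benchmark.
unit = EventUnit(
    event="the fox could not reach the grapes",
    conceptual="failed attempt",
    evaluative="obstacle (negative)",
    arc="climax",
    stage="pursuit fails",
)
selected = unit.view([Level.CONCEPTUAL, Level.EVALUATIVE])
```

Here `selected` holds only the conceptual and evaluative strings, so the same mapping code can be run unchanged while the set of levels fed to it varies between experiments.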
Experimental Evaluation
YARN is evaluated on two prominent narrative analogy benchmarks:
- StoryAnalogy-MCQ (MCQ): Derived from StoryAnalogy, a corpus of 24K story pairs; features shorter narratives in a multiple-choice format.
- ARN (Analogical Reasoning on Narratives): Longer narrative pairs derived from proverbs, categorizing analogies by structural and surface similarity (near/far).
The core comparison is between:
- End-to-end LLMs (Qwen3-8B, Llama-3.1-8B) prompted for analogical reasoning (zero-shot and chain-of-thought)
- Structural mapping over extracted event phrases or their abstractions (YARN pipeline).
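For the structural-mapping side of this comparison, a multiple-choice item can be answered by scoring each candidate against the query narrative and choosing the highest-scoring one. The sketch below assumes that decision rule; `score_fn` is a placeholder for whatever mapping score is used, and both function names are hypothetical.

```python
def pick_answer(score_fn, query, candidates):
    """Return the index of the candidate with the highest mapping score."""
    scores = [score_fn(query, c) for c in candidates]
    # max() returns the first index on ties.
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(predictions, gold):
    """Fraction of items whose predicted index equals the gold index."""
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def toy_overlap(a, b):
    """Toy stand-in for a mapping score: number of shared word types."""
    return len(set(a.split()) & set(b.split()))

choice = pick_answer(toy_overlap, "a b c", ["x y", "a b", "a"])
```

The toy scorer only makes the example runnable; in practice the score would come from the mapping stage, and accuracy would be reported separately per benchmark and analogy type.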
The results demonstrate:
- Structural mapping without abstraction performs at or below random, confirming that distributional embeddings alone are not sufficient for analogical reasoning.
- Inclusion of conceptual and evaluative abstractions consistently and substantially improves analogical mapping accuracy, with gains up to +0.29 over unabstracted baselines.
- On MCQ, YARN with conceptual and evaluative abstraction outperforms end-to-end LLMs for both Qwen and Llama.
- On ARN, LLMs excel at near analogies (high surface overlap), while YARN provides notable improvements on far analogies, where mapping relies on structural rather than superficial similarity.
- The optimal abstraction level varies by benchmark and analogy type: lower-level (less abstracted) conceptual representations aid near analogies, while higher-stage abstractions benefit far analogies.
Figure 3: Performance of Qwen across hierarchical conceptual abstraction levels; lower abstraction aids near analogies, deeper abstraction (modifier removal) helps far analogies.
Figure 4: Hierarchical stage abstraction boosts far analogy performance, aggregating events into global narrative abstractions.
Analysis and Error Diagnosis
A granular error analysis reveals failure modes at multiple pipeline stages:
- Unit Extraction: Occasional omission, redundancy, misattribution, or hallucination of events, stemming from LLM extraction limitations.
- Abstraction: LLMs often default to shallow or context-insensitive abstractions, failing to capture pragmatic or causally nuanced narrative roles.
- Structural Mapping: Embedding-based similarity is heavily surface-biased and insensitive to negation, contradiction, or opposition, resulting in spurious or brittle alignments for structurally contrasting analogies.
- Benchmark Artifacts: The MCQ dataset’s grammatical regularity sometimes allows correct analogical identification by pattern-matching, undermining the need for deep reasoning. The ARN dataset reveals multiple distinct analogy resolution patterns (arc-based, outcome-based, contrastive), some of which are not adequately captured by standard structural mapping.
Figure 5: ARN contains disparate analogy patterns, including arc-based, outcome-focused, and contrastive analogies, which challenge the mapping strategy.
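The surface bias of similarity-based scoring can be illustrated with a deliberately simple stand-in. The sketch below uses bag-of-words cosine similarity rather than the dense embeddings of the actual pipeline, so it shows only the general failure pattern: a phrase and its negation share almost all of their tokens and therefore score as near-identical. The function name and example phrases are hypothetical.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words count vectors of two phrases."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)  # Counter returns 0 for missing words
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

affirm = "the hero wins the battle"
negate = "the hero never wins the battle"
print(round(bow_cosine(affirm, negate), 3))  # prints 0.935
```

A scorer that checks entailment or contradiction between the two phrases, as the NLI-based direction discussed later proposes, would be expected to separate such pairs where a pure similarity score does not.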
LLMs also show inconsistent analogical reasoning across input unit granularities and prompt instructions, exposing issues in their compositional and reasoning reliability.
Figure 6: LLM analogical predictions fluctuate across input granularity (story, sentence, event), highlighting inconsistencies.
Implications and Directions for AI Reasoning
YARN presents a substantive advance toward operationalizing structure-based analogical reasoning in open-domain narratives, suggesting several implications:
- Abstractions are essential: LLM-derived abstractions, especially when formulated hierarchically and with task-sensitive context, are critical for mapping beyond surface-level similarity.
- No universal abstraction: The utility of specific abstraction levels is contingent on the analogy type and dataset; adaptivity in abstraction is necessary.
- Hybrid neurosymbolic reasoning: Integrating LLM generative capacities with explicit, interpretable structural mapping yields tangible gains for analogical generalization on far analogies, where linguistic variation defeats shallow pattern-matching.
- Embeddings and scoring are limiting: Embedding-based similarity is fundamentally inadequate for modeling contrastive or complex analogy patterns; incorporating NLI-based or graph-structured scoring is a compelling direction.
- Benchmark limitations: Current datasets obscure and conflate distinct reasoning mechanisms; finer-grained, structurally annotated benchmarks are needed for comprehensive evaluation and for effective LLM fine-tuning on analogical tasks.
Conclusion
The YARN framework shows that LLM-based narrative decomposition and hierarchical abstraction, when integrated with symbolic structural mapping, improve analogical reasoning in machine systems, particularly on far analogies where end-to-end LLM prompting falls short. The work also exposes the limitations of off-the-shelf LLMs and standard mapping functions, and it points toward more advanced abstraction schemas, alternative matching and scoring engines, and more nuanced analogy benchmarks.
The anticipated future directions include:
- Leveraging richer structural representations (e.g., AMR, ASP) for narrative mapping.
- Expanding abstraction frameworks to cover emotion, morality, and cultural motifs.
- Designing mapping mechanisms capable of handling contradiction and opposition.
- Curating analogy datasets with explicit relational alignments for fine-tuning and systematic benchmarking.
This framework establishes a solid foundation for systematic, scalable analogical reasoning in natural language narratives, grounded in cognitive theory yet extensible via modern neurosymbolic AI.