Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives

Published 31 Mar 2026 in cs.CL and cs.AI | (2603.29997v1)

Abstract: Analogical reasoning is a key driver of human generalization in problem-solving and argumentation. Yet, analogies between narrative structures remain challenging for machines. Cognitive engines for structural mapping are not directly applicable, as they assume pre-extracted entities, whereas LLMs' performance is sensitive to prompt format and the degree of surface similarity between narratives. This gap motivates a key question: What is the impact of enhancing structural mapping with LLM-derived abstractions on their analogical reasoning ability in narratives? To that end, we propose a modular framework named YARN (Yielding Abstractions for Reasoning in Narratives), which uses LLMs to decompose narratives into units, abstract these units, and then passes them to a mapping component that aligns elements across stories to perform analogical reasoning. We define and operationalize four levels of abstraction that capture both the general meaning of units and their roles in the story, grounded in prior work on framing. Our experiments reveal that abstractions consistently improve model performance, resulting in competitive or better performance than end-to-end LLM baselines. Closer error analysis reveals the remaining challenges in abstraction at the right level, in incorporating implicit causality, and an emerging categorization of analogical patterns in narratives. YARN enables systematic variation of experimental settings to analyze component contributions, and to support future work, we make the code for YARN openly available.

Authors (5)

Summary

The paper’s main contribution is introducing YARN, a framework that integrates LLM-derived abstractions with symbolic structural mapping for improved narrative analogical reasoning.
It decomposes narratives into event units and applies multi-level abstraction to effectively bridge unstructured text with structured analogical mappings.
Experiments show that conceptual and evaluative abstractions can raise mapping accuracy by up to 0.29 over structural mapping without abstraction, giving performance competitive with or better than end-to-end LLM approaches.

Introduction

The paper introduces YARN (Yielding Abstractions for Reasoning in Narratives), a neurosymbolic framework that operationalizes analogical reasoning in narratives by integrating LLM-based extraction and abstraction mechanisms with a structural mapping pipeline. YARN directly addresses the core issue in analogical reasoning for natural language: bridging the gap between unstructured narrative text and the structured representations required by cognitive mapping engines, such as the Structure Mapping Engine (SME), while overcoming the limitations of end-to-end LLM prompt-based approaches.

By decomposing narratives into event-centric units and computing hierarchical abstractions, YARN advances over both traditional cognitive mapping systems—which require structured inputs—and LLMs engaged in surface-level prompt-based analogy detection. The study further investigates the effectiveness of distinct conceptual abstraction levels for supporting analogical mapping and provides an extensive empirical analysis on established narrative analogy benchmarks.

YARN Framework

YARN is formulated as a modular pipeline to facilitate systematic exploration of analogical reasoning in narrative contexts. It comprises three main stages:

Event Unit Extraction: Narratives are decomposed into event phrases representing atomic, semantically meaningful events, ensuring coverage and minimal redundancy.
Hierarchical Abstraction: Each event is abstracted at multiple conceptual and narrative levels, producing representations ranging from shallow (surface event semantics) to deep (global narrative function and stages). The abstraction process follows four levels:
- Conceptual abstraction (modifier-root frames)
- Evaluative abstraction (functional role and polarity)
- Narrative arc abstraction (position within a five-stage arc)
- Stage abstraction (grouping events into higher-level stages and entire narrative abstractions)

This abstraction process is context-sensitive—abstractions depend on the event's narrative role, beyond its lexical content.
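
To make the four levels concrete, here is a minimal sketch of how one abstracted event could be stored. The class name, field names, the Freytag-style arc labels, and the example values are all assumptions made for illustration; the source only specifies modifier-root frames, a functional role with polarity, a five-stage arc, and higher-level stages.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels: the source says "five-stage arc" without naming the stages.
ARC_STAGES = ("exposition", "rising_action", "climax", "falling_action", "resolution")


@dataclass(frozen=True)
class EventAbstraction:
    """One event unit with its abstractions (illustrative schema only)."""
    event: str                # extracted event phrase
    modifier: Optional[str]   # conceptual level: modifier of the frame
    root: str                 # conceptual level: root concept of the frame
    role: str                 # evaluative level: functional role in the story
    polarity: int             # evaluative level: -1, 0, or +1
    arc_stage: str            # narrative arc level: one of ARC_STAGES
    stage_group: str          # stage level: label of the enclosing higher-level stage

    def __post_init__(self) -> None:
        if self.arc_stage not in ARC_STAGES:
            raise ValueError(f"unknown arc stage: {self.arc_stage!r}")
        if self.polarity not in (-1, 0, 1):
            raise ValueError("polarity must be -1, 0, or +1")

    def conceptual(self, keep_modifier: bool = True) -> str:
        """Return the modifier-root frame, or only the root for a deeper abstraction."""
        if keep_modifier and self.modifier:
            return f"{self.modifier} {self.root}"
        return self.root


example = EventAbstraction(
    event="the fox fails to reach the grapes",
    modifier="failed",
    root="attempt",
    role="obstacle",
    polarity=-1,
    arc_stage="climax",
    stage_group="striving",
)
print(example.conceptual())                     # failed attempt
print(example.conceptual(keep_modifier=False))  # attempt
```

Dropping the modifier in `conceptual` is one way to move from a shallower to a deeper conceptual abstraction while keeping the same record.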

Figure 1: Story events are transformed into abstract representations that encode functional roles and semantic meaning, enabling structural mapping between narratives.

Structural Mapping and Scoring: Abstracted event/unit representations from the input narratives are aligned by a greedy one-to-one mapping algorithm. The local similarity of each candidate pair combines embedding-based cosine similarity of the concept/event units with abstraction constraints, and the selected pairs are aggregated into a global mapping that satisfies the one-to-one constraint.
Figure 2: Candidate mappings between abstractions are scored and aggregated into a structure-preserving one-to-one correspondence, with the final score reflecting overall analogy strength.
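
The mapping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `min_sim` threshold, and the normalization of the final score by the longer narrative's length are all hypothetical, and abstraction constraints are omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_one_to_one(
    source: List[Vector], target: List[Vector], min_sim: float = 0.0
) -> Tuple[List[Tuple[int, int, float]], float]:
    """Greedily pair units by descending similarity, using each unit at most once."""
    # Score every candidate pair, then visit pairs from most to least similar.
    pairs = [
        (cosine(s, t), i, j)
        for i, s in enumerate(source)
        for j, t in enumerate(target)
    ]
    pairs.sort(key=lambda p: -p[0])  # stable sort keeps ties in (i, j) order

    used_source, used_target = set(), set()
    mapping: List[Tuple[int, int, float]] = []
    for sim, i, j in pairs:
        if sim < min_sim:
            break  # all remaining pairs are weaker still
        if i in used_source or j in used_target:
            continue  # enforce the one-to-one constraint
        mapping.append((i, j, sim))
        used_source.add(i)
        used_target.add(j)

    # Assumed aggregation: mean similarity, with unmatched units counting as zero.
    longest = max(len(source), len(target))
    score = sum(sim for _, _, sim in mapping) / longest if longest else 0.0
    return mapping, score


source_units = [[1.0, 0.0], [0.0, 1.0]]
target_units = [[0.0, 1.0], [1.0, 0.0]]
mapping, score = greedy_one_to_one(source_units, target_units)
print([(i, j) for i, j, _ in mapping], score)  # [(0, 1), (1, 0)] 1.0
```

A greedy pass is simple and fast but does not in general maximize the total similarity of the assignment; an optimal assignment solver such as the Hungarian algorithm would be the alternative if that guarantee mattered.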

This pipeline enables controlled experimentation with different abstraction levels, mapping strategies, and prompt formats.

Experimental Evaluation

YARN is evaluated on two prominent narrative analogy benchmarks:

StoryAnalogy-MCQ (MCQ): Derived from the StoryAnalogy corpus of roughly 24K story pairs, featuring shorter narratives in a multiple-choice format.
ARN (Analogical Reasoning on Narratives): Longer narrative pairs derived from proverbs, categorizing analogies by structural and surface similarity (near/far).
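
For a multiple-choice item, one natural way to use a mapping score is to pick the candidate narrative that scores highest against the query. The sketch below assumes this selection rule; `pick_answer`, `accuracy`, and the Jaccard overlap over abstraction labels are hypothetical stand-ins, not the scorer used in the paper.

```python
from typing import Callable, FrozenSet, List, Sequence

Labels = FrozenSet[str]


def jaccard(a: Labels, b: Labels) -> float:
    """Toy analogy score: overlap of abstraction labels (assumption, not YARN's scorer)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_answer(
    query: Labels,
    candidates: Sequence[Labels],
    score_fn: Callable[[Labels, Labels], float] = jaccard,
) -> int:
    """Return the index of the best-scoring candidate (first index wins ties)."""
    scores = [score_fn(query, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: List[int], gold: List[int]) -> float:
    """Fraction of items where the predicted index equals the gold index."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


query = frozenset({"desire", "obstacle", "failure"})
candidates = [
    frozenset({"desire", "success"}),
    frozenset({"desire", "obstacle", "failure", "rationalization"}),
    frozenset({"journey"}),
]
choice = pick_answer(query, candidates)
print(choice)                          # 1
print(accuracy([choice, 0], [1, 1]))   # 0.5
```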

The core comparison is between:

End-to-end LLMs (Qwen3-8B, Llama-3.1-8B) prompted for analogical reasoning (zero-shot and chain-of-thought)
Structural mapping over extracted event phrases or their abstractions (YARN pipeline).

The results demonstrate:

Structural mapping without abstraction performs at or below chance level, indicating that distributional embeddings alone are insufficient for analogical reasoning.
Inclusion of conceptual and evaluative abstractions consistently and substantially improves analogical mapping accuracy, with gains up to +0.29 over unabstracted baselines.
On MCQ, YARN with conceptual and evaluative abstraction outperforms end-to-end LLMs for both Qwen and Llama.
On ARN, LLMs excel at near analogies (high surface overlap), while YARN provides notable improvements on far analogies, where mapping relies on structural but not superficial similarity.
The optimal abstraction level varies by benchmark and analogy type: lower-level (less abstracted) conceptual representations aid near analogies, while higher-level stage abstractions benefit far analogies.
Figure 3: Performance of Qwen across hierarchical conceptual abstraction levels; lower abstraction aids near analogies, deeper abstraction (modifier removal) helps far analogies.

Figure 4: Hierarchical stage abstraction boosts far analogy performance, aggregating events into global narrative abstractions.

Analysis and Error Diagnosis

A granular error analysis reveals failure modes at multiple pipeline stages:

Unit Extraction: Occasional omission, redundancy, misattribution, or hallucination of events, stemming from LLM extraction limitations.
Abstraction: LLMs often default to shallow or context-insensitive abstractions, failing to capture pragmatic or causally nuanced narrative roles.
Structural Mapping: Embedding-based similarity is heavily surface-biased and insensitive to negation, contradiction, or opposition, resulting in spurious or brittle alignments for structurally contrasting analogies.
Benchmark Artifacts: The MCQ dataset’s grammatical regularity sometimes allows correct analogical identification by pattern-matching, undermining the need for deep reasoning. The ARN dataset reveals multiple distinct analogy resolution patterns (arc-based, outcome-based, contrastive), some of which are not adequately captured by standard structural mapping.
Figure 5: ARN contains disparate analogy patterns, including arc-based, outcome-focused, and contrastive analogies, which challenge the mapping strategy.
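
The surface bias noted under Structural Mapping can be illustrated with a deliberately simple stand-in. The sketch below uses bag-of-words cosine similarity rather than learned sentence embeddings, so it is only an analogy for the reported behavior; the function name and the example sentences are invented for illustration.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over word counts (toy proxy for an embedding similarity)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


affirmed = "the hero keeps the promise"
opposed = "the hero breaks the promise"   # opposite outcome, near-identical wording
unrelated = "a storm floods the village"

# The opposed pair shares most of its words, so it scores far higher than the
# unrelated pair even though the two events play opposite roles in a story.
print(round(bow_cosine(affirmed, opposed), 3))    # 0.857
print(round(bow_cosine(affirmed, unrelated), 3))  # 0.338
```

A similarity of this kind rewards shared wording and carries no signal about polarity, which is why contrastive analogies need an extra constraint or a different scorer.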

LLMs also show inconsistent analogical reasoning across input unit granularities and prompt instructions, exposing issues in their compositional and reasoning reliability.

Figure 6: LLM analogical predictions fluctuate across input granularity (story, sentence, event), highlighting inconsistencies.
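
One simple way to quantify this fluctuation is the share of items on which every input granularity yields the same prediction. The metric name and the example predictions below are assumptions for illustration; the source does not state how inconsistency was measured.

```python
from typing import Dict, List


def consistency_rate(preds: Dict[str, List[int]]) -> float:
    """Fraction of items whose predicted answer is identical across all settings."""
    runs = list(preds.values())
    if not runs:
        return 0.0
    n_items = len(runs[0])
    if any(len(run) != n_items for run in runs):
        raise ValueError("every setting must cover the same items")
    if n_items == 0:
        return 0.0
    # zip(*runs) groups the predictions for one item across all settings.
    agreeing = sum(1 for item in zip(*runs) if len(set(item)) == 1)
    return agreeing / n_items


# Hypothetical predicted answer indices for four items at three granularities.
predictions = {
    "story": [0, 1, 1, 2],
    "sentence": [0, 1, 0, 2],
    "event": [0, 2, 0, 2],
}
print(consistency_rate(predictions))  # 0.5
```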

Implications and Directions for AI Reasoning

YARN presents a substantive advance toward operationalizing structure-based analogical reasoning in open-domain narratives, suggesting several implications:

Abstractions are essential: LLM-derived abstractions, especially when formulated hierarchically and with task-sensitive context, are critical for mapping beyond surface-level similarity.
No universal abstraction: The utility of specific abstraction levels is contingent on the analogy type and dataset; adaptivity in abstraction is necessary.
Hybrid neurosymbolic reasoning: Integrating LLM generative capacities with explicit, interpretable structural mapping yields tangible gains for analogical generalization on far analogies, where linguistic variation defeats shallow pattern-matching.
Embeddings and scoring are limiting: Embedding-based similarity is fundamentally inadequate for modeling contrastive or complex analogy patterns; incorporating NLI-based or graph-structured scoring is a compelling direction.
Benchmark limitations: Current datasets conflate distinct reasoning mechanisms and make them hard to separate; finer-grained, structurally annotated benchmarks are required to enable comprehensive evaluation and effective LLM fine-tuning for analogical tasks.

Conclusion

The proposed YARN framework shows that LLM-based narrative decomposition and hierarchical abstraction, when integrated with symbolic structural mapping, enhance analogical reasoning in machine systems, particularly for far analogies, where end-to-end LLM prompting is weakest. The work exposes the limitations of off-the-shelf LLMs and standard mapping functions and sets a clear trajectory for the integration of advanced abstraction schemas, alternative matching/scoring engines, and the development of nuanced analogy benchmarks.

The anticipated future directions include:

Leveraging richer structural representations (e.g., AMR, ASP) for narrative mapping.
Expanding abstraction frameworks to cover emotion, morality, and cultural motifs.
Designing mapping mechanisms capable of handling contradiction and opposition.
Curating analogy datasets with explicit relational alignments for fine-tuning and systematic benchmarking.

This framework establishes a solid foundation for systematic, scalable analogical reasoning in natural language narratives, grounded in cognitive theory yet extensible via modern neurosymbolic AI.
