Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation

Published 3 Apr 2026 in cs.CL and cs.AI | (2604.03174v1)

Abstract: LLMs encode vast world knowledge in their parameters, yet they remain fundamentally limited by static knowledge, finite context windows, and weakly structured causal reasoning. This survey provides a unified account of augmentation strategies along a single axis: the degree of structured context supplied at inference time. We cover in-context learning and prompt engineering, Retrieval-Augmented Generation (RAG), GraphRAG, and CausalRAG. Beyond conceptual comparison, we provide a transparent literature-screening protocol, a claim-audit framework, and a structured cross-paper evidence synthesis that distinguishes higher-confidence findings from emerging results. The paper concludes with a deployment-oriented decision framework and concrete research priorities for trustworthy retrieval-augmented NLP.


Authors (2)

Summary

The paper introduces a unified taxonomy categorizing contextual enrichment techniques—prompting, RAG, GraphRAG, and CausalRAG—based on structural complexity during inference.
The paper reports that retrieval-augmented approaches, from standard RAG to its advanced forms, can improve exact-match accuracy on knowledge-intensive QA by about 10 percentage points over prompt-only baselines.
The paper emphasizes that while increased context structure enhances causal reasoning and interpretability, it also incurs higher engineering complexity and maintenance costs.

Technical Summary of "Beyond the Parameters: A Technical Survey of Contextual Enrichment in LLMs: From In-Context Prompting to Causal Retrieval-Augmented Generation" (2604.03174)

Introduction and Survey Scope

This survey systematically examines contextual enrichment strategies for LLMs, positioning them along an axis of structural complexity introduced at inference time. The analysis is motivated by persistent limitations of LLMs in knowledge-intensive NLP: static parametric knowledge, incomplete retrieval of relevant evidence, and weak causal reasoning in generated outputs. By organizing prompting, Retrieval-Augmented Generation (RAG), GraphRAG, and CausalRAG into a unified taxonomy, the paper offers a coherent framework for matching method selection to task requirements in high-stakes, knowledge-centric scenarios.

Literature Selection and Evidence Framework

A transparent and reproducible literature-screening protocol underpins the survey. Methodological rigor is maintained via explicit inclusion/exclusion criteria prioritizing concrete retrieval and generation results over speculative commentary. Evidence is graded as high, medium, or emerging confidence, with a claim-audit table that reliably links formal statements to their empirical or theoretical foundation. This supports both reproducibility and transparent reporting—critical for survey work informing deployment in safety-critical contexts.

Context Provisioning in LLMs

The paper distinguishes between three context modalities:

Parametric context: Information encoded in the model weights.
In-context knowledge: Task- or example-specific cues supplied through prompt engineering.
Retrieved context: External evidence dynamically inserted at inference.

Prompt engineering, enabled by transformer sequence modeling, supports few-shot and chain-of-thought (CoT) reasoning. However, long-context prompting is subject to lost-in-the-middle failures, in which evidence placed in the middle of a long prompt is used less reliably, and to noise from irrelevant inserted text. Static parameterization also leaves the model's own knowledge prone to obsolescence.
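As a minimal sketch of in-context knowledge, the snippet below assembles a few-shot chain-of-thought prompt as plain text. The `Example` class, the `build_cot_prompt` function, and the "Q:/A:" layout are illustrative assumptions, not a format prescribed by the survey.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One worked demonstration (hypothetical structure for illustration)."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(examples: List[Example], question: str) -> str:
    """Join worked examples and the new question into one prompt string."""
    parts = []
    for ex in examples:
        # Each demonstration shows the reasoning before the final answer.
        parts.append(
            f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        )
    # The final turn invites the model to produce its own reasoning chain.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


demo = [Example("What is 2 + 2?", "2 plus 2 equals 4.", "4")]
prompt = build_cot_prompt(demo, "What is 3 + 5?")
```

Everything the model sees here lives in the prompt itself, so the approach inherits the context-window and staleness limits described above.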

Retrieval-Augmented Generation (RAG)

RAG enhances LLM output by grounding generation in externally indexed knowledge. The survey covers:

Retrieval architectures: Sparse (e.g., BM25), dense (e.g., DPR), and hybrid designs.
Advanced pipelines: Multi-hop and iterative retrieval, self-reflective architectures, and post-retrieval reranking.

RAG demonstrably outperforms prompt-only baselines on knowledge-intensive QA, for example with a 10-percentage-point improvement in exact-match score on Natural Questions [lewis2020rag]. However, standard RAG suffers from context fragmentation, semantic bias, and limited global synthesis. These are the deficiencies that more structurally enriched frameworks aim to address.
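To make the sparse-retrieval path concrete, here is a small pure-Python sketch of Okapi BM25 scoring followed by prompt assembly. The class and function names, the toy corpus, the prompt wording, and the parameter defaults (`k1=1.5`, `b=0.75`) are assumptions chosen for illustration. A production system would normally use an existing search library, and a dense or hybrid retriever would replace the scoring step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Toy Okapi BM25 index over a small, non-empty list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.raw = list(docs)
        self.docs = [tokenize(d) for d in self.raw]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Smoothed IDF variant that stays positive for every term.
        self.idf = {
            t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()
        }

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation with document-length normalization.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=2):
        order = sorted(
            range(len(self.docs)),
            key=lambda i: self.score(query, i),
            reverse=True,
        )
        return order[:k]


def build_rag_prompt(question, retriever, k=2):
    """Insert the top-k retrieved passages ahead of the question."""
    passages = [retriever.raw[i] for i in retriever.top_k(question, k)]
    context = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, 1))
    return (
        "Answer using only the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Bananas are yellow.",
]
index = BM25(corpus)
rag_prompt = build_rag_prompt("What is the capital of France?", index, k=1)
```

Because each passage is scored and inserted independently, the sketch also shows where context fragmentation comes from: nothing connects one retrieved chunk to another.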

GraphRAG

GraphRAG replaces flat document/chunk retrieval with knowledge graph-based indexing, extracting (head, relation, tail) triples and producing community or thematic summaries through graph clustering. The approach enables:

Enhanced multi-hop and entity-centric reasoning.
Corpus-level synthesis and improved claim traceability.

Nevertheless, GraphRAG incurs substantial indexing and maintenance costs, is vulnerable to entity-resolution errors, and typically encodes associative rather than explicitly causal relations. Despite these challenges, medium-to-high confidence evidence supports its superiority over vanilla RAG for multi-relational tasks [edge2024graphrag, han2025graphrag, peng2024graphragsurvey].
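The triple-based indexing and multi-hop retrieval described above can be sketched as a breadth-first expansion over (head, relation, tail) triples. This is an illustrative simplification: the function names and the toy triples are assumptions, edges are traversed in both directions, and the community-summary step produced by graph clustering is omitted.

```python
from collections import defaultdict, deque


def build_index(triples):
    """Map each entity to every triple in which it appears."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((head, relation, tail))
        index[tail].append((head, relation, tail))
    return index


def k_hop_triples(index, seeds, k=2):
    """Collect triples reachable within k hops of the seed entities."""
    seen_entities = set(seeds)
    seen_triples = set()
    collected = []
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the hop budget
        for triple in index.get(entity, []):
            if triple not in seen_triples:
                seen_triples.add(triple)
                collected.append(triple)
            head, _, tail = triple
            other = tail if head == entity else head
            if other not in seen_entities:
                seen_entities.add(other)
                frontier.append((other, depth + 1))
    return collected


toy_triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Tokyo", "capital_of", "Japan"),
]
graph_index = build_index(toy_triples)
one_hop = k_hop_triples(graph_index, ["Marie Curie"], k=1)
two_hop = k_hop_triples(graph_index, ["Marie Curie"], k=2)
```

The sketch assumes that "Warsaw" is spelled identically in every triple. When entity resolution fails, the second hop silently disappears, which is one reason resolution quality matters so much for this family of methods.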

CausalRAG

CausalRAG introduces explicit directed causal graphs to support observational, interventional, and counterfactual reasoning, which is central to high-stakes interpretability and root-cause analysis tasks. The system extracts causally linked tuples, indexes them, and retrieves subgraphs using query-seeded graph walks, providing the LLM generator with narratives directly grounded in causally structured evidence.
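A query-seeded walk over a directed causal graph might look like the following sketch. It is not the CausalRAG authors' implementation: the (cause, effect) pair format, the token-overlap seeding rule, the function names, and the example incident data are all assumptions made for illustration. The walk runs upstream from the seed node, which corresponds to a root-cause style question.

```python
from collections import defaultdict


def build_causal_graph(pairs):
    """Index (cause, effect) pairs in both directions."""
    effects = defaultdict(list)  # cause -> list of direct effects
    causes = defaultdict(list)   # effect -> list of direct causes
    for cause, effect in pairs:
        effects[cause].append(effect)
        causes[effect].append(cause)
    return effects, causes


def seed_nodes(query, nodes):
    """Pick nodes sharing at least one token with the query (naive seeding)."""
    query_tokens = set(query.lower().split())
    return [n for n in nodes if query_tokens & set(n.lower().split())]


def directed_paths(adjacency, start, max_depth=3):
    """Enumerate simple paths from start, stopping at dead ends or max_depth."""
    paths = []

    def walk(path):
        nxt = [n for n in adjacency.get(path[-1], []) if n not in path]
        if not nxt or len(path) - 1 == max_depth:
            if len(path) > 1:
                paths.append(path)
            return
        for node in nxt:
            walk(path + [node])

    walk([start])
    return paths


def retrieve_root_cause_chains(query, pairs, max_depth=3):
    """Return cause-to-effect chains that end at nodes matched by the query."""
    effects, causes = build_causal_graph(pairs)
    nodes = sorted(set(effects) | set(causes))
    chains = []
    for seed in seed_nodes(query, nodes):
        # Walk upstream through the 'causes' map, then flip to causal order.
        for path in directed_paths(causes, seed, max_depth):
            chains.append(" -> ".join(reversed(path)))
    return chains


incident_pairs = [
    ("disk failure", "database outage"),
    ("database outage", "checkout errors"),
    ("checkout errors", "revenue loss"),
    ("marketing campaign", "traffic spike"),
]
chains = retrieve_root_cause_chains("why did checkout errors occur", incident_pairs)
```

Passing the `effects` map to `directed_paths` instead would trace downstream consequences. In both cases the edge direction, not just adjacency, determines what is retrieved, which is the main difference from the associative graph expansion used in GraphRAG.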

Empirical findings indicate that CausalRAG improves faithfulness and causal coherence, yielding higher aggregate answer quality scores in controlled settings compared to both standard RAG and associative GraphRAG [wang2025causalrag, samarajeewa2024hsi]. However, the evidence base remains limited to narrow domain or slice studies, and automated extraction of causal relations continues to be a significant technical bottleneck [jiralerspong2024causal].

Comparative Analysis and Deployment Guidance

The survey positions prompting, RAG, GraphRAG, and CausalRAG along a monotonic axis of structural richness. Increasing context structure correlates with improved capability for relational and causal reasoning but incurs higher infrastructure and maintenance costs. Method selection should be driven by downstream requirements: prompting for lightweight tasks, RAG for factual grounding, GraphRAG for multi-hop reasoning and corpus-level synthesis, and CausalRAG for interpretability and faithfulness in causal analysis.
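The selection guidance can be encoded as a simple rule cascade. The sketch below is one possible reading of that guidance, not the survey's own decision procedure: the `TaskProfile` fields and the precedence order (most structured requirement wins) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical requirement flags for a downstream task."""
    needs_external_facts: bool = False
    needs_multi_hop_or_corpus_synthesis: bool = False
    needs_causal_explanations: bool = False


def recommend_method(task: TaskProfile) -> str:
    """Return the least structured method that still covers the requirements."""
    if task.needs_causal_explanations:
        return "CausalRAG"
    if task.needs_multi_hop_or_corpus_synthesis:
        return "GraphRAG"
    if task.needs_external_facts:
        return "RAG"
    return "prompting"


choice = recommend_method(TaskProfile(needs_external_facts=True))
```

Checking the most demanding requirement first reflects the cost trade-off above: a richer method is selected only when a requirement actually calls for it.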

A claim-audit matrix ties strong and contradictory claims to graded empirical support, promoting evidence-based adoption and risk-aware deployment. Typical numerical gains include a 39-point improvement in GSM8K solve rate using CoT over standard prompts [wei2022cot], 10 points for RAG over closed-book baselines [lewis2020rag], and composite-metric improvements of 19 to 26 points when moving from abstract-level to full-document retrieval with GraphRAG/CausalRAG under controlled settings.

Open Challenges and Limitations

The survey highlights key unresolved issues:

Scalable and accurate causal extraction at inference time.
Evaluation metrics and benchmarks standardized for causal QA.
Dynamic graph maintenance and multilingual retrieval support.
Stable integration with agentic LLM pipelines.

Threats to validity include cross-paper experimental heterogeneity, potential publication bias, and recency effects. The evidence base for CausalRAG, in particular, is marked as medium confidence, and practitioners are cautioned against overgeneralizing from narrow benchmarks.

Broader Implications and Alignment

The survey directly addresses themes in TrustNLP: faithfulness, safety, interpretability, and hallucination mitigation. The risks of overconfidence in extracted structure and of inappropriate deployment are managed through explicit evidence grading and claim calibration. The work's deployment guidance is directly relevant to real-world use of RAG systems in domains such as finance, healthcare, and scientific QA.

Conclusion

The contextual enrichment of LLMs is best understood as a continuum, with CausalRAG representing the current apex of structure and interpretability at the cost of significant engineering complexity. Systematic progress will require advances in scalable causal extraction, standardized evaluation, and robust integration with planning- and reasoning-capable LLM agents. These directions will shape the next phase of trustworthy, retrieval-augmented NLP system development.
