A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation

Published 16 Jun 2026 in cs.AI | (2606.18075v1)

Abstract: Retrieval-Augmented Generation (RAG) has emerged as a paradigm for enhancing LLMs with external knowledge, yet existing graph-based methods face a fundamental limitation: entity-centric and chunk-centric approaches operate on representations anchored to original text without true knowledge fusion. While entity-centric methods connect logically related content and chunk-centric methods preserve context, both retrieve information separately through similarity search, missing emergent understanding from their synthesis. In this paper, we propose HyGRAG, a hierarchical graph RAG framework that transcends source documents by addressing three core challenges: constructing summaries that genuinely integrate contextual and relational information, leveraging these synthesized representations to access emergent knowledge during retrieval, and efficiently updating hierarchical structures for dynamic corpora. Specifically, we design hierarchical index structures over hybrid graphs with both chunk and entity nodes, then iteratively cluster them and generate LLM-based summaries. Then, we design context and relation-aware retrieval that searches across all abstraction levels while expanding through community membership. Moreover, we enable dynamic knowledge update through attachment-based algorithms with only local re-summarization. Experimental results show that HyGRAG improves the average accuracy of multi-hop reasoning tasks by 9.7%, while maintaining reasonable efficiency.


Authors (6)

Summary

The paper proposes a unified framework that fuses context and relational information using a hierarchical hybrid graph index.
The method integrates context-aware chunk retrieval with entity extraction to enable efficient multi-hop reasoning and improved factual accuracy.
Empirical evaluations demonstrate notable gains in QA performance and scalability, highlighting the framework's adaptability to dynamic corpora.

Unified Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation

Motivation and Background

Retrieval-Augmented Generation (RAG) represents a paradigm shift in leveraging LLMs by supplementing their parametric knowledge with external corpora and structured data. Existing approaches bifurcate into chunk-centric methods, which emphasize context preservation via hierarchical chunk grouping and summarization, and entity-centric (relation-aware) approaches, which construct explicit knowledge graphs from extracted entities and relations for multi-hop reasoning. However, both paradigms are fundamentally limited: chunk-centric methods lose explicit relational connectivity, rendering them suboptimal for logical inference, while entity-centric methods suffer from information loss during entity extraction, leading to degraded factual QA performance due to missing contextual details. Attempts to naïvely combine both approaches in hybrid graphs retain the separation between context and relations, resulting in incomplete knowledge fusion and failure to capture emergent semantics required by complex queries.

Hierarchical Hybrid Graph Indexing

To address this dichotomy, the proposed framework introduces a hierarchical index architecture over a hybrid graph that fuses context and relational information at multiple abstraction levels.

Figure 1: Overall architecture illustrating hierarchical hybrid graph index structures integrating chunk and entity nodes, multi-level clustering, and bi-level retrieval.

Hybrid Graph Construction

The corpus is segmented into overlapping chunks, preserving granular context.
Chunks are linked via shared entity counts, establishing semantic chunk-chunk edges.
Entities and relations are extracted from each chunk using LLMs, forming the entity-level knowledge graph.
Cross-layer edges connect entities to their containing chunks, producing a hybrid graph whose nodes are chunks and entities and whose edges are chunk-chunk links, entity relations, and cross-layer links.
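
The construction steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `chunk_text` and `build_hybrid_graph` are hypothetical, entity and relation extraction (done by an LLM in the paper) is assumed to have already happened, and weighting chunk-chunk edges by the number of shared entities is one reading of "shared entity counts".

```python
from collections import defaultdict
from itertools import combinations


def chunk_text(words, size, overlap):
    """Split a token list into overlapping windows of `size` tokens."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def build_hybrid_graph(chunk_entities, triplets):
    """Build the three edge sets of a hybrid chunk/entity graph.

    chunk_entities: dict chunk_id -> set of entity names found in that chunk
                    (assumed to be pre-extracted; the paper uses an LLM).
    triplets:       list of (head, relation, tail) tuples.
    """
    # Chunk-chunk edges, weighted by the number of shared entities (assumption).
    chunk_edges = {}
    for a, b in combinations(sorted(chunk_entities), 2):
        shared = chunk_entities[a] & chunk_entities[b]
        if shared:
            chunk_edges[(a, b)] = len(shared)

    # Entity-entity edges, labelled with the extracted relations.
    entity_edges = defaultdict(set)
    for head, rel, tail in triplets:
        entity_edges[(head, tail)].add(rel)

    # Cross-layer edges from each entity to the chunks that contain it.
    cross_edges = defaultdict(set)
    for chunk_id, entities in chunk_entities.items():
        for entity in entities:
            cross_edges[entity].add(chunk_id)

    return chunk_edges, dict(entity_edges), dict(cross_edges)
```

Keeping the three edge sets separate makes it easy to see which layer each link belongs to; a real system would more likely store them in a single graph structure with typed edges.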

Hierarchical Indexing

Hybrid graph nodes (chunks and entities) are embedded using a structure-aware method (Cleora), then clustered via hyperplane-based locality-sensitive hashing (LSH). The resulting buckets (communities) are recursively summarized by an LLM using structured prompts that require context and relations to be synthesized together, which is what enforces knowledge fusion. These summaries are re-embedded and treated as nodes in the next index layer, and the process is repeated. The result is a multi-scale hierarchy consisting of leaf nodes (original chunks and entities), intermediate community summaries, and top-level semantic aggregations.
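
The clustering-and-summarizing loop can be sketched as below. This is an illustration under stated assumptions, not the paper's code: node embeddings are taken as given (Cleora is not invoked), the hyperplanes are drawn at random, and `summarize_and_embed` is a hypothetical callback standing in for the LLM summary plus re-embedding step. The sketch reuses the same hyperplanes at every level and stops when a level produces no merging, which is a simplification chosen for brevity.

```python
import random
from collections import defaultdict


def random_hyperplanes(dim, n_planes, seed=0):
    """Draw `n_planes` Gaussian normal vectors in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def lsh_cluster(embeddings, planes):
    """Group node ids whose embeddings share the same LSH signature."""
    buckets = defaultdict(list)
    for node_id, vec in embeddings.items():
        buckets[lsh_signature(vec, planes)].append(node_id)
    return list(buckets.values())


def build_hierarchy(embeddings, planes, summarize_and_embed, max_levels=3):
    """Cluster the current layer, replace each community by a summary node, repeat.

    summarize_and_embed(member_ids, member_vecs) -> vector is a hypothetical
    stand-in for the LLM summarization and re-embedding step.
    """
    levels = [dict(embeddings)]
    membership = {}  # summary node id -> ids of its children
    current = dict(embeddings)
    for level in range(1, max_levels + 1):
        communities = lsh_cluster(current, planes)
        if len(communities) == len(current):
            break  # nothing was merged, so a further level adds no abstraction
        next_layer = {}
        for i, members in enumerate(communities):
            summary_id = f"L{level}-C{i}"
            membership[summary_id] = members
            next_layer[summary_id] = summarize_and_embed(
                members, [current[m] for m in members]
            )
        levels.append(next_layer)
        current = next_layer
    return levels, membership
```

Each additional hyperplane doubles the number of possible buckets, so the number of planes controls how fine-grained the communities are.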

Figure 2: Method performance summary using Qwen3-8B, evidencing robust accuracy across relation-aware (MuSiQue) and context-aware (MultiHop-RAG) benchmarks, and detailed case analyses.

Bi-Level Retrieval and Efficient Generation

Retrieval proceeds in two stages:

Context-aware retrieval: Similarity search across all levels (chunks, entities, community summaries) retrieves high-context and abstract nodes relevant to a query embedding.
Relation-aware retrieval: Entities extracted from retrieved communities are expanded, and their associated triplets are filtered with embedding similarity, constructing a logically coherent set of relations for multi-hop reasoning.
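
The two stages can be sketched as follows. The names `context_aware_retrieve` and `relation_aware_retrieve`, the `top_k` and `threshold` parameters, and the use of cosine similarity are assumptions made for illustration; the source only states that similarity search and embedding-based triplet filtering are used.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_aware_retrieve(query_vec, node_vecs, top_k):
    """Rank all indexed nodes (chunks, entities, summaries) in one pool."""
    ranked = sorted(
        node_vecs,
        key=lambda node: cosine(query_vec, node_vecs[node]),
        reverse=True,
    )
    return ranked[:top_k]


def relation_aware_retrieve(query_vec, hits, community_entities,
                            triplets, triplet_vecs, threshold):
    """Expand retrieved communities to their entities, then filter triplets."""
    # Collect the entities that belong to any retrieved community.
    entities = set()
    for node in hits:
        entities |= community_entities.get(node, set())

    # Keep triplets that touch those entities and are similar to the query.
    kept = []
    for idx, (head, rel, tail) in enumerate(triplets):
        if head in entities or tail in entities:
            if cosine(query_vec, triplet_vecs[idx]) >= threshold:
                kept.append((head, rel, tail))
    return kept
```

Searching one pooled index means a community summary can outrank a raw chunk when it matches the query better, which is how abstract nodes enter the result set.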

The final context for LLM generation is structured as a prompt template integrating community summaries, retrieved chunks, entities, and filtered triplets. This ensures both factual grounding and logical completion.
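
A prompt of this kind could be assembled as in the sketch below. The section titles, ordering, and the `build_prompt` function are hypothetical; the source does not reproduce the actual template.

```python
def build_prompt(question, summaries, chunks, entities, triplets):
    """Assemble a generation prompt from the four kinds of retrieved evidence."""

    def section(title, lines):
        # Render one titled block; mark empty evidence explicitly.
        body = "\n".join(f"- {line}" for line in lines) if lines else "- (none)"
        return f"{title}:\n{body}"

    parts = [
        section("Community summaries", summaries),
        section("Retrieved chunks", chunks),
        section("Entities", entities),
        section("Relations", [f"{h} --{r}--> {t}" for h, r, t in triplets]),
        f"Question: {question}",
        "Answer using only the evidence above.",
    ]
    return "\n\n".join(parts)
```

Listing triplets in their own section keeps the relational evidence visually separate from free text.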

Figure 3: Query efficiency comparison, demonstrating superior retrieval speed and token usage, especially among relation-aware systems.

Dynamic Index Updates for Evolving Corpora

The hierarchical structure supports attachment-based incremental updates. New documents are segmented into chunks, their entities and relations are extracted, and the results are summarized. The new representations are attached to the most similar communities in the index, and only the affected ancestors are re-summarized. This limits recomputation to a local part of the hierarchy and preserves retrieval efficiency for large, continually expanding datasets.
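
The attachment step can be sketched as follows. This is a simplified illustration with assumed data structures: `parent_of` and `children_of` are hypothetical dictionaries describing the hierarchy, cosine similarity is an assumed choice of similarity measure, and the function only reports which summaries need regenerating rather than calling an LLM.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def attach_and_mark(new_id, new_vec, community_vecs, parent_of, children_of):
    """Attach a new leaf to its most similar community.

    Returns the community and its ancestors, bottom-up; these are the only
    summaries that need to be regenerated.
    """
    # Pick the lowest-level community closest to the new node.
    best = max(community_vecs, key=lambda c: cosine(new_vec, community_vecs[c]))
    children_of.setdefault(best, []).append(new_id)
    parent_of[new_id] = best

    # Walk up the hierarchy, collecting every ancestor of the new node.
    dirty = []
    node = best
    while node is not None:
        dirty.append(node)
        node = parent_of.get(node)
    return dirty
```

Sibling communities never appear in the returned list, so the amount of re-summarization grows with the depth of the hierarchy rather than with the size of the corpus.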

Figure 4: Corpus expansion performance, showing minimal degradation (∼1–2%) in QA accuracy and robust handling of incremental insertions.

Figure 5: Corpus expansion indexing cost, evidencing practical reconstruction efficiency and competitive token consumption.

Empirical Evaluation

Static QA and Multi-Hop Reasoning

The framework consistently outperforms state-of-the-art baselines across PopQA, MuSiQue, MultiHop-RAG, HotpotQA, and QuALITY datasets. Notably:

6.2% gain in Factual Accuracy (PopQA)
9.7% average accuracy gain on multi-hop reasoning (MuSiQue, MultiHop-RAG)
Up to 12.2% improvement on HotpotQA
Maintains near-best or best Recall results, especially on entity-rich datasets
Robust against changes in embedding models and backbone LLMs, demonstrating stability and adaptability

Ablation studies reveal that removing chunk-level structure causes the most severe performance drop, while entity/relation removal also reduces accuracy; community summaries primarily provide higher-order semantic aggregation.

Efficiency and Scalability

Both offline indexing and online retrieval are computationally efficient: the LSH-based clustering is substantially faster than the Gaussian mixture model (GMM) clustering used by RAPTOR, and retrieval scales logarithmically with corpus size. The token cost of prompt construction is lower than that of baselines that concatenate retrieved content, and the system maintains competitive speed and memory usage.

Qualitative Analysis and Interpretability

Case studies demonstrate superior performance in handling ambiguous or multi-hop queries, especially entity disambiguation and hidden relational inference. The framework enables LLMs to trace reasoning paths across contextual and relational axes simultaneously, yielding interpretable, evidence-grounded answers.

Implications and Future Directions

This unified framework integrates context-aware and relation-aware retrieval through hierarchical hybrid graph indexing. Because retrieval draws on synthesized summaries rather than only on representations anchored to the source text, the approach supports emergent knowledge synthesis and stable dynamic updates:

Practical implications: robust multi-hop QA, scalable to evolving corpora, efficient memory and token use, adaptable to diverse embedding and LLM backbones
Theoretical implications: highlights the necessity of knowledge fusion at abstraction levels beyond text or isolated entity graphs, advancing representation learning in hybrid structured-unstructured graphs
Future work: integration with GNNs for richer hybrid representations, further reduction of LLM-induced hallucinations, and application to broader tasks (explainable reasoning, scientific QA, etc.)

Conclusion

The unified context- and relation-aware hierarchical RAG framework improves average accuracy on multi-hop reasoning tasks by 9.7% while maintaining reasonable efficiency. Its hierarchical knowledge fusion, bi-level retrieval, and attachment-based update mechanisms support scalable, interpretable, and robust retrieval-augmented generation for practical QA and dynamic, knowledge-intensive tasks (2606.18075).