UniAI-GraphRAG: Synergizing Ontology-Guided Extraction, Multi-Dimensional Clustering, and Dual-Channel Fusion for Robust Multi-Hop Reasoning

Published 26 Mar 2026 in cs.AI and cs.IR | (2603.25152v1)

Abstract: Retrieval-Augmented Generation (RAG) systems face significant challenges in complex reasoning, multi-hop queries, and domain-specific QA. Existing GraphRAG frameworks have made progress in structural knowledge organization, but they remain limited in cross-industry adaptability, community report integrity, and retrieval performance. This paper proposes UniAI-GraphRAG, an enhanced framework built on open-source GraphRAG. The framework introduces three core innovations: (1) Ontology-Guided Knowledge Extraction, which uses predefined schemas to guide LLMs in accurately identifying domain-specific entities and relations; (2) a Multi-Dimensional Community Clustering Strategy, which improves community completeness through alignment completion, attribute-based clustering, and multi-hop relationship clustering; and (3) Dual-Channel Graph Retrieval Fusion, which balances QA accuracy and performance through hybrid graph and community retrieval. Evaluation results on the MultiHopRAG benchmark show that UniAI-GraphRAG outperforms mainstream open-source solutions (e.g., LightRAG) in overall F1 score, particularly on inference and temporal queries. The code is available at https://github.com/UnicomAI/wanwu/tree/main/rag/rag_open_source/rag_core/graph.


Authors (6)

Summary

The paper introduces a novel framework that combines ontology-guided extraction with multi-dimensional clustering and dual-channel fusion to advance multi-hop reasoning.
It achieves significant performance gains, including a 22.45% F1-score improvement over vector-based RAG and 2.23% over LightRAG, especially in complex inference tasks.
The study highlights a modular approach for domain adaptation and outlines future directions such as semi-automated schema induction and cross-modal integration.

UniAI-GraphRAG: A Unified Approach to Ontology-Guided Extraction and Multi-Dimensional Graph Reasoning

Introduction

Retrieval-Augmented Generation (RAG) approaches have improved factual grounding and reduced hallucinations in LLMs for knowledge-intensive tasks. However, their extension to complex multi-hop reasoning, especially in vertical domains with rich ontological structures (e.g., healthcare, finance, legal), remains inadequate because of generic schema-free extraction and limited community detection strategies. The UniAI-GraphRAG framework targets these bottlenecks directly by integrating ontology-guided knowledge extraction, multi-dimensional clustering, and a dual-channel graph/semantic retrieval architecture, addressing limitations of prior systems such as LightRAG.

Figure 1: The UniAI-GraphRAG architecture: a pipeline integrating ontology-guided extraction, multi-dimensional clustering, and dual-channel fusion for robust multi-hop reasoning.

Ontology-Guided Knowledge Extraction

One of the central innovations is the ontology-guided extraction module, which explicitly conditions graph construction on expert-defined schemas specifying entity types, relation types, and constraint logic. Unlike schema-free open information extraction (OpenIE), which maximizes the likelihood of generic triples, this approach enforces logical constraints through a triplet constraint space $\mathcal{S} = (\mathcal{E}, \mathcal{R}, \Phi)$, where $\mathcal{E}$ and $\mathcal{R}$ denote the allowed entity and relation types, and $\Phi$ imposes domain/range constraints on each relation.
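To make the constraint space concrete, the following Python sketch encodes $\mathcal{S} = (\mathcal{E}, \mathcal{R}, \Phi)$ as a small data structure whose validity check plays the role of the indicator $\mathbb{I}(t \in \mathcal{V}_{\mathcal{S}})$. The class names `Triple` and `Schema`, the field layout, and the medical example types are hypothetical illustrations, not the UniAI-GraphRAG implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """Hypothetical extracted triple with the entity types assigned by the extractor."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


@dataclass
class Schema:
    """Hypothetical encoding of the constraint space S = (E, R, Phi)."""
    entity_types: set[str]    # E: allowed entity types
    relation_types: set[str]  # R: allowed relation types
    # Phi: relation -> (allowed domain types, allowed range types)
    constraints: dict[str, tuple[set[str], set[str]]] = field(default_factory=dict)

    def is_valid(self, t: Triple) -> bool:
        """Indicator I(t in V_S): True iff t respects E, R and Phi."""
        if t.head_type not in self.entity_types or t.tail_type not in self.entity_types:
            return False
        if t.relation not in self.relation_types:
            return False
        # Relations without an explicit Phi entry accept any declared entity type.
        domain, rng = self.constraints.get(t.relation, (self.entity_types, self.entity_types))
        return t.head_type in domain and t.tail_type in rng


schema = Schema(
    entity_types={"Drug", "Disease", "Company"},
    relation_types={"treats", "manufactures"},
    constraints={"treats": ({"Drug"}, {"Disease"}), "manufactures": ({"Company"}, {"Drug"})},
)
print(schema.is_valid(Triple("Aspirin", "Drug", "treats", "Headache", "Disease")))  # True
print(schema.is_valid(Triple("Headache", "Disease", "treats", "Aspirin", "Drug")))  # False
```

Falling back to "any declared entity type" for relations without a $\Phi$ entry is a design choice of this sketch; a stricter schema could reject such relations outright.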

The probability space of generated triples is constrained through:

$P(t | d, \mathcal{S}) = \frac{\exp(\text{S}_{LM}(t, d)) \cdot \mathbb{I}(t \in \mathcal{V}_{\mathcal{S}})}{\sum_{t' \in \mathcal{T}_{all}} \exp(\text{S}_{LM}(t', d)) \cdot \mathbb{I}(t' \in \mathcal{V}_{\mathcal{S}})}$

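Here $\mathcal{T}_{all}$ is the set of all candidate triples, $\mathcal{V}_{\mathcal{S}} \subseteq \mathcal{T}_{all}$ is the subset that satisfies $\mathcal{S}$, $\text{S}_{LM}(t, d)$ is the language model's score for triple $t$ given document $d$, and $\mathbb{I}(\cdot)$ is the indicator function. This formulation restricts triple generation to schema-compliant triples, reducing both hallucinated relations and semantic noise. According to the authors' evaluation, ontology-driven extraction not only suppresses spurious relations but also raises entity-relation discovery rates in vertical-domain knowledge discovery scenarios.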
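The constrained distribution can be read as a masked softmax: schema-invalid triples receive zero probability mass and the remaining scores are renormalized. The sketch below assumes that the candidate triples and their scores $\text{S}_{LM}(t, d)$ are already available as a dictionary, which is a simplifying assumption; an LLM-based extractor would more likely approximate this effect through schema-conditioned prompting and post-hoc filtering. The function name is hypothetical.

```python
import math


def constrained_distribution(scores: dict[str, float], valid: dict[str, bool]) -> dict[str, float]:
    """P(t | d, S): softmax over LM scores, with schema-invalid triples masked to zero."""
    kept = {t: s for t, s in scores.items() if valid.get(t, False)}
    if not kept:
        # No schema-compliant candidate: every triple gets zero mass.
        return {t: 0.0 for t in scores}
    m = max(kept.values())  # subtract the max for numerical stability (ratios unchanged)
    exp = {t: math.exp(s - m) for t, s in kept.items()}
    z = sum(exp.values())   # normalizer: sum over t' in T_all of exp(S_LM) * I(t' in V_S)
    return {t: exp.get(t, 0.0) / z for t in scores}


scores = {"(Aspirin, treats, Headache)": 2.0,
          "(Headache, treats, Aspirin)": 1.5,
          "(Aspirin, treats, Fever)": 2.0}
valid = {"(Aspirin, treats, Headache)": True,
         "(Headache, treats, Aspirin)": False,
         "(Aspirin, treats, Fever)": True}
p = constrained_distribution(scores, valid)
print(p["(Headache, treats, Aspirin)"])  # 0.0
```

The two valid triples have equal scores, so each receives probability 0.5, while the domain-violating triple is excluded entirely rather than merely down-weighted.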

Figure 2: Knowledge extraction process leveraging schema-constrained prompts for high-precision entity and relation identification.

Multi-Dimensional Community Clustering

Standard graph-based RAG clustering (e.g., Leiden, Louvain) captures only topological neighborhoods, fragmenting business logic and overlooking semantically salient groupings (such as temporal, spatial, or causal). UniAI-GraphRAG introduces a multi-dimensional strategy with three core mechanisms:

Alignment Completion: Edge and attribute backfilling for nodes severed by conventional clustering, preserving reasoning chains across community boundaries.
Attribute-Based Clustering: Communities are formed not only by structure but also by semantically relevant attributes (e.g., "year", "location"), enhancing cluster completeness for queries like "Which products were released in Q2 2022?"
Multi-Hop Relationship Clustering: Subgraphs are clustered by relational paths of configurable hop length, encapsulating the full context for multi-hop question answering.
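The attribute-based and multi-hop mechanisms can be illustrated with plain data structures. This minimal sketch assumes that node attributes are stored as string dictionaries and the graph as an adjacency mapping. It groups nodes by a shared attribute value (attribute-based clustering) and collects all nodes within a configurable hop limit of a seed entity, a simple breadth-first stand-in for multi-hop relationship clustering. Function names and example data are hypothetical.

```python
from collections import defaultdict, deque


def attribute_clusters(node_attrs: dict[str, dict[str, str]], key: str) -> dict[str, set[str]]:
    """Group nodes sharing the same value for attribute `key` (e.g. "quarter")."""
    groups: dict[str, set[str]] = defaultdict(set)
    for node, attrs in node_attrs.items():
        if key in attrs:
            groups[attrs[key]].add(node)
    return dict(groups)


def k_hop_cluster(adj: dict[str, set[str]], seed: str, k: int) -> set[str]:
    """All nodes reachable from `seed` within k hops, found by breadth-first search."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # hop limit reached; do not expand further
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


node_attrs = {"ProductA": {"quarter": "2022-Q2"},
              "ProductB": {"quarter": "2022-Q2"},
              "ProductC": {"quarter": "2022-Q3"},
              "Acme": {}}
print(sorted(attribute_clusters(node_attrs, "quarter")["2022-Q2"]))  # ['ProductA', 'ProductB']

adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(sorted(k_hop_cluster(adj, "A", 2)))  # ['A', 'B', 'C']
```

The attribute grouping directly answers a query such as "Which products were released in Q2 2022?" even when those products are not topologically adjacent in the graph.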

A novel attribute-aware modularity function is defined as follows:

$Q_{\text{multi}} = \frac{1}{2m} \sum_{i,j} \left( Q_{ij}^{\text{struct}} + \alpha \cdot S_{ij} \right) \delta(c_i, c_j)$

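where $Q_{ij}^{\text{struct}}$ is the structural (topological) modularity term, $S_{ij}$ quantifies the attribute similarity between nodes $i$ and $j$, $\alpha$ balances structural versus semantic cohesion, $m$ is the number of edges, and $\delta(c_i, c_j)$ equals 1 when nodes $i$ and $j$ are assigned to the same community and 0 otherwise.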
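The modularity function can be evaluated directly for a given partition. This NumPy sketch assumes that $Q_{ij}^{\text{struct}}$ is the standard Newman modularity term $A_{ij} - k_i k_j / (2m)$ for an undirected graph with adjacency matrix $A$ and degrees $k_i$; the source does not spell this term out, so treat it as an assumption. `S` is a hypothetical precomputed attribute-similarity matrix.

```python
import numpy as np


def multi_modularity(A: np.ndarray, S: np.ndarray, labels: list[int], alpha: float) -> float:
    """Q_multi = 1/(2m) * sum_ij (Q_ij^struct + alpha * S_ij) * delta(c_i, c_j).

    Assumption: Q_ij^struct = A_ij - k_i * k_j / (2m) (Newman modularity term).
    """
    k = A.sum(axis=1)                     # node degrees k_i
    two_m = A.sum()                       # 2m for a symmetric (undirected) adjacency matrix
    struct = A - np.outer(k, k) / two_m   # Q_ij^struct
    c = np.asarray(labels)
    same = c[:, None] == c[None, :]       # delta(c_i, c_j) as a boolean matrix
    return float(((struct + alpha * S) * same).sum() / two_m)


# Two disconnected edges, 0-1 and 2-3, each placed in its own community.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = np.zeros((4, 4))
print(multi_modularity(A, S, [0, 0, 1, 1], alpha=0.5))  # 0.5
```

With zero attribute similarity (or $\alpha = 0$) the function reduces to ordinary modularity, which is 0.5 for this example. A positive $\alpha$ rewards communities whose members share attributes, so the score is no longer bounded by 1.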

Figure 3: Multi-dimensional community detection: integrating structure, attribute, and relationship constraints to preserve reasoning chains across communities.

Boundary completion uses an $\epsilon$-neighbor mechanism to ensure that critical adjacent nodes are included in community summaries when sufficient cross-community connectivity is detected. Deep traversal enables pattern-based N-hop clustering, maximizing coverage of inference chains for multi-hop QA.

Dual-Channel Retrieval and Fusion

UniAI-GraphRAG introduces a dual-channel retrieval protocol that balances fine-grained graph matching against coarse-grained community report recall. Channel 1 uses trie-based entity-attribute retrieval optimized for factoid queries, while Channel 2 matches semantic themes against multi-dimensional community summaries and is suited to summarization and relational queries. The two channels' scores are combined with query-adaptive weighting:
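Channel 1's trie-based lookup can be sketched as a character-level trie over entity names that scans a query for the longest entity mention at each position. This is a generic trie-matching sketch that assumes case-insensitive matching against canonical entity names. The class `EntityTrie` and its methods are hypothetical, and the actual UniAI-GraphRAG retriever may index attributes and aliases differently.

```python
_END = "<end>"  # multi-character key, so it can never collide with a single text character


class EntityTrie:
    """Hypothetical character-level trie over lower-cased entity names."""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, name: str) -> None:
        node = self.root
        for ch in name.lower():
            node = node.setdefault(ch, {})
        node[_END] = name  # store the canonical (original-case) entity name

    def find_all(self, text: str) -> list[str]:
        """Scan left to right, taking the longest entity match starting at each position."""
        text_l = text.lower()
        found: list[str] = []
        i = 0
        while i < len(text_l):
            node, j, best, best_end = self.root, i, None, i
            while j < len(text_l) and text_l[j] in node:
                node = node[text_l[j]]
                j += 1
                if _END in node:
                    best, best_end = node[_END], j  # remember the longest match so far
            if best is not None:
                found.append(best)
                i = best_end  # skip past the matched entity
            else:
                i += 1
        return found


trie = EntityTrie()
for name in ["Acme", "Acme Corp", "Q2 2022"]:
    trie.insert(name)
print(trie.find_all("What did Acme Corp release in Q2 2022?"))  # ['Acme Corp', 'Q2 2022']
```

Greedy longest matching prefers "Acme Corp" over its prefix "Acme". The sketch does not check word boundaries, so "Acme" would also match inside "Acmes"; a production matcher would likely add that check.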

$\mathcal{S}_{\text{final}}(d_i | q) = \beta(q) \cdot \mathcal{S}_{\text{graph}}(d_i | q) + (1 - \beta(q)) \cdot \max_{C_k: d_i \in C_k} \mathcal{S}_{\text{comm}}(C_k | q)$

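Here, $\beta(q)$ is set dynamically by heuristics based on entity density and semantic abstraction, $\mathcal{S}_{\text{graph}}$ and $\mathcal{S}_{\text{comm}}$ are the Channel 1 and Channel 2 scores, and the maximum runs over all communities $C_k$ that contain $d_i$. A final reranking step uses a mutual-information-maximizing cross-encoder that integrates micro- and macro-level retrieval signals to improve both diversity and relevance.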
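The fusion formula translates directly into code. In the sketch below, `membership` maps each item $d_i$ to the communities containing it, and an item that belongs to no community receives a community term of 0, which is an assumption. The `heuristic_beta` function is a hypothetical stand-in for the paper's entity-density and abstraction heuristics, not the actual rule: it raises $\beta(q)$ as the share of recognized entities in the query grows and halves it when the query contains summarization cue words.

```python
def heuristic_beta(query: str, entities: list[str],
                   abstract_cues: tuple[str, ...] = ("summarize", "overall", "trend", "compare")) -> float:
    """Hypothetical beta(q): favor the graph channel for entity-dense factoid queries."""
    tokens = query.lower().split()
    density = len(entities) / max(len(tokens), 1)  # rough share of tokens that are entities
    beta = min(1.0, 0.5 + density)
    if any(cue in tokens for cue in abstract_cues):
        beta *= 0.5  # abstract query: lean on community summaries instead
    return beta


def fused_scores(graph_scores: dict[str, float],
                 comm_scores: dict[str, float],
                 membership: dict[str, set[str]],
                 beta: float) -> dict[str, float]:
    """S_final(d|q) = beta * S_graph(d|q) + (1 - beta) * max over C_k containing d of S_comm(C_k|q)."""
    out: dict[str, float] = {}
    for d, g in graph_scores.items():
        best_comm = max((comm_scores[c] for c in membership.get(d, set())), default=0.0)
        out[d] = beta * g + (1 - beta) * best_comm
    return out


scores = fused_scores(graph_scores={"d1": 0.9, "d2": 0.2},
                      comm_scores={"C1": 0.4, "C2": 0.8},
                      membership={"d1": {"C1"}, "d2": {"C1", "C2"}},
                      beta=0.5)
print(scores)  # d1 ≈ 0.65, d2 ≈ 0.5
```

With $\beta = 0.5$, d1 wins on its strong graph score, while d2 is lifted by its best community (C2) despite a weak graph match. This is the micro/macro balance the fusion is meant to provide.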

Figure 4: Dual-channel retrieval architecture: fusing entity-attribute graph retrieval with cluster-level community summarization for flexible query handling.

Experimental Results

Evaluation on the MultiHopRAG benchmark demonstrates consistent superiority of UniAI-GraphRAG over both classic and state-of-the-art GraphRAG baselines. Notably, the system achieves an F1-score improvement of 22.45% over naive vector-based RAG (Dify-RAG) and 2.23% over LightRAG, particularly excelling in inference and temporal queries due to its multi-hop clustering and adaptive retrieval fusion. Ablation studies attribute approximately 3.2–3.4% F1-score gains to each core component—ontology-guided extraction, multi-dimensional clustering, and dual-channel retrieval—underscoring their non-redundant contributions.

Implications and Future Directions

UniAI-GraphRAG demonstrates that precise schema-driven extraction, coupled with semantically enriched clustering and adaptive retrieval fusion, addresses key obstacles to complex, domain-specific multi-hop reasoning in RAG systems. The framework's modularity enables sector-specific adaptation without retraining the core LLMs, facilitating deployment in vertical domains.

Limitations include dependence on expert-crafted ontologies (limiting scalability) and lack of multimodal document integration. The extension to semi-automated schema induction and robust cross-modal entity/relation extraction represents an immediate direction. Adaptive windowing for chunked extraction is also identified as a crucial factor for future improvement.

Conclusion

UniAI-GraphRAG establishes a robust framework for domain-adaptable, multi-hop retrieval-augmented reasoning by integrating ontology-constrained extraction, multi-dimensional clustering, and adaptive dual-channel retrieval. This holistic approach achieves the best overall F1 among the systems compared on MultiHopRAG and opens new pathways for generalizing schema-guided, community-aware graph reasoning in future AI systems.
