
HetaRAG: Hybrid Retrieval-Augmented Generation

Updated 22 November 2025
  • HetaRAG is a hybrid deep retrieval framework that fuses vector, graph, full-text, and SQL stores to boost multimodal evidence retrieval.
  • It overcomes monolithic RAG limitations by leveraging modality-specific strengths and employing score normalization with weighted fusion.
  • Hyper-RAG, a related variant, uses hypergraph-driven retrieval to integrate high-order relational context and mitigate LLM hallucinations.

HetaRAG refers to two conceptually distinct but frequently conflated paradigms within the recent Retrieval-Augmented Generation (RAG) literature: (1) hybrid deep RAG across heterogeneous data stores, and (2) hypergraph-driven RAG leveraging high-order correlations in knowledge graphs. The term “HetaRAG” is most precisely associated with the former—a hybrid deep-retrieval system that orchestrates vector, graph, full-text, and relational retrieval, unifying multiple knowledge access modalities under a single RAG framework (Yan et al., 12 Sep 2025). However, the “Hyper-RAG” system (sometimes also described as “HetaRAG” in informal queries) incorporates a different, hypergraph-structured knowledge representation for controlling LLM hallucinations (Feng et al., 30 Mar 2025). This entry characterizes both approaches, their system architectures, mathematical underpinnings, and empirical results, clarifying nomenclature and the domain-specific contributions of each variant.

1. Definition, Motivation, and Scope

HetaRAG—Hybrid Deep Retrieval-Augmented Generation—designates a RAG framework that fuses evidence from four functionally distinct data stores: vector indices (semantic embeddings), knowledge graphs (relational triples/subgraphs), full-text engines (inverted indices), and relational databases (tabular/SQL). Its explicit motivation is to counteract the limitations of monolithic RAG—where all retrieval is mediated through vector search—by harnessing the complementary strengths of each paradigm: vector stores for semantic similarity, knowledge graphs for relational precision, full-text for exact lexical match, and SQL for complex structured queries (Yan et al., 12 Sep 2025). This aggregation enables the system to support multimodal ingestion, cross-modal retrieval, information fusion, and report generation for complex, procedural, and factual queries.

By contrast, Hyper-RAG implements a hypergraph-based data structure that captures not only pairwise but also high-order correlations between document concepts and entities. This addresses the risk of hallucination in LLM output, especially in scientific and medical domains, by letting the generative model exploit multi-way relational context without inflating prompt size or sacrificing recall (Feng et al., 30 Mar 2025). Both systems aim to advance reliability, explainability, and adaptability in RAG pipelines, but differ fundamentally in knowledge modeling and retrieval orchestration.

2. System Architectures

HetaRAG: Cross-Modal Retrieval Plane

HetaRAG’s architecture is partitioned into four ingestion/retrieval modalities and a hybrid fusion/generation interface:

  • Document Processing and Annotation: Source documents (PDFs, images, HTML, tables) are segmented into modality-tagged chunks—text, table, image, or formula—extracted via tools such as Docling or MinerU. Each chunk acquires globally unique identifiers and multi-plane metadata (position, type, source, etc.).
  • Four Heterogeneous Stores:
  1. Vector Index (e.g., Milvus): Embeddings (BGE-m3 for text, QwenVL for multimodal) support high-dimensional semantic retrieval.
  2. Knowledge Graph (e.g., Neo4j): Relational triples—mined using LLM-guided methods such as CommonKG or GraphRAG—exploit both hierarchical and semantic aggregation (HiRAG, LeanRAG).
  3. Full-Text Index (e.g., Elasticsearch): Inverted token indices enable fast keyword match and high recall.
  4. Relational Database (e.g., MySQL): Structured tabular data enables transactional SQL filtering, aggregation, and join support.
  • Hybrid Retrieval and Fusion: All stores are queried in parallel for each user request. Raw retrieval scores are normalized (min-max per modality) and fused using a weighted sum or parameterized interpolator ($S_\mathrm{fuse} = \sum_k w_k \tilde s_{k,i}$). Weights $w_k$ can be static, hand-tuned, data-driven, or dynamically routed per query.
  • Reranking and Generation: After evidence selection, a learned (e.g., bge-reranker-large) or LLM-based reranker reorders candidates. The final prompt—standardized via DeepWriter—includes integrated text, tables, and optionally visuals for downstream LLM-based generation.
  • Multi-Hop Agent: For compositional queries, a DeepSearch agent orchestrates multi-stage retrieval and memory accumulation.
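
The chunk records described in the document-processing step can be pictured with a small data structure. This is a minimal sketch, not the project's actual schema: the field names, the `Chunk` and `ChunkType` types, and the `stores_for` routing rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class ChunkType(Enum):
    TEXT = "text"
    TABLE = "table"
    IMAGE = "image"
    FORMULA = "formula"


@dataclass(frozen=True)
class Chunk:
    """One modality-tagged chunk; field names are illustrative assumptions."""
    content: str
    chunk_type: ChunkType
    source: str   # originating document, e.g. a file name
    page: int     # position metadata
    chunk_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def stores_for(chunk: Chunk) -> list[str]:
    """Hypothetical rule deciding which stores index a given chunk type."""
    targets = ["vector"]                  # assume every chunk gets an embedding
    if chunk.chunk_type is ChunkType.TEXT:
        targets += ["fulltext", "graph"]  # lexical index and triple extraction
    if chunk.chunk_type is ChunkType.TABLE:
        targets.append("sql")             # tabular rows go to the relational store
    return targets


table_chunk = Chunk("Revenue by year", ChunkType.TABLE, "report.pdf", 3)
text_chunk = Chunk("The company was founded in ...", ChunkType.TEXT, "report.pdf", 1)
```

Keeping the identifier on the chunk itself is what lets results from different stores be matched back to the same piece of evidence during fusion.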

Hyper-RAG: Hypergraph-Driven Retrieval

Hyper-RAG constructs a domain-wide hypergraph indexed by both vector similarity and structural (hyperincidence) context:

  • Offline Hypergraph Construction:
    • Entities (vertices) and relations (low-order: binary edges; high-order: hyperedges spanning multiple entities) are extracted from the corpus using LLM-augmented prompts.
    • The hypergraph incidence matrix $H \in \{0,1\}^{|V|\times|E|}$ specifies entity-hyperedge memberships.
    • Both entity and relation descriptions are embedded for fast vector retrieval; the hypergraph is separately indexed in a hypergraph database.
  • Online Retrieval:
    • Given a query $q$, LLM keyword extraction yields entity and correlation sets ($\mathcal X_\mathrm{ent}$, $\mathcal X_\mathrm{cor}$).
    • Top-K nearest vertices and hyperedges are retrieved based on embedding similarity.
    • One-hop diffusion along hyperedges and vertices expands the candidate set, supporting coverage of multi-entity, high-order context.
    • Pruning and scoring aggregate results with rankers (see §3).
  • Prompt Fusion and LLM Integration: The prompt provided to the LLM concatenates user question, top-ranked entity information, and multi-way relations, reducing hallucinations and enhancing inference robustness.
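
To make the incidence matrix concrete, the following sketch builds $H$ for a toy hypergraph in plain Python. The vertex and hyperedge names are made up, and the list-of-lists representation is an assumption chosen for readability; a production system would likely use a sparse format or a dedicated hypergraph database.

```python
def incidence_matrix(vertices, hyperedges):
    """Return H as a |V| x |E| list of lists with H[i][j] = 1 iff vertex i is in hyperedge j."""
    row = {v: i for i, v in enumerate(vertices)}
    H = [[0] * len(hyperedges) for _ in vertices]
    for j, members in enumerate(hyperedges.values()):
        for v in members:
            H[row[v]][j] = 1
    return H


# Toy data (hypothetical): one binary edge and one high-order hyperedge.
vertices = ["A", "B", "C", "D"]
hyperedges = {
    "e0": ["A", "B"],        # low-order relation between two entities
    "e1": ["A", "C", "D"],   # high-order relation spanning three entities
}

H = incidence_matrix(vertices, hyperedges)
vertex_degree = [sum(r) for r in H]        # number of hyperedges containing each vertex
edge_size = [sum(c) for c in zip(*H)]      # number of vertices in each hyperedge
```

Row sums give how many relations an entity participates in, and column sums give the order of each relation, which is how a binary edge is distinguished from a high-order hyperedge.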

3. Mathematical Formalism and Retrieval Algorithms

HetaRAG: Score Normalization and Weighted Fusion

For each query $q$ and modality $k$ (vector, kg, ft, db), raw retrieval scores $s_{k,i}$ are normalized:

$$\tilde s_{k,i} = \frac{ s_{k,i} - \min_j s_{k,j} }{ \max_j s_{k,j} - \min_j s_{k,j} }$$

Fusion across modalities uses non-negative weights $w_k$ summing to 1:

$$S_\mathrm{fuse}(d_{k,i}) = \sum_k w_k \tilde s_{k,i}$$

Dynamic query routing adjusts $w_k(q)$ per query using hand-crafted or learned features (detecting, e.g., tabular needs or relational focus).
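
The normalization and fusion steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the reference implementation: the function names `min_max` and `fuse`, the dictionary layout, the example scores, and the handling of the degenerate case where all scores in a modality are equal are not specified by the source.

```python
from collections import defaultdict


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize one modality's raw scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # The formula is undefined here; mapping everything to 1.0 is an assumption.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(results: dict[str, dict[str, float]],
         weights: dict[str, float]) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a document missing from a store contributes 0."""
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    fused: dict[str, float] = defaultdict(float)
    for modality, scores in results.items():
        if not scores:
            continue
        for doc, s in min_max(scores).items():
            fused[doc] += weights[modality] * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores: cosine similarities and BM25-style values on different scales.
results = {
    "vector": {"d1": 0.8, "d2": 0.6, "d3": 0.4},
    "ft": {"d2": 12.0, "d4": 4.0},
}
weights = {"vector": 0.5, "kg": 0.0, "ft": 0.5, "db": 0.0}
ranked = fuse(results, weights)
```

In this toy run `d2` ranks first because it is supported by both stores, which is the behavior the weighted sum is meant to reward. Per-query routing would amount to computing `weights` from the query before calling `fuse`.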

Hyper-RAG: Hypergraph Activation and Scoring

The hypergraph’s central operations use incidence structure and embedding similarity:

  • Edge weighting:

$$w(e) = f(\{ \operatorname{imp}(v) : v \in e \}) \approx \sum_{v \in e} \operatorname{imp}(v)$$

  • Node activation with respect to the query:

$$a(v; q) = \operatorname{sim}(\mathrm{Embed}(q), \mathrm{Embed}(v))$$

  • High-order edge scoring:

$$\mathrm{score}(q, e) = \alpha \phi(q, \{v \in e\}) + \beta \psi(q, e)$$

where $\phi$ sums query affinity over vertices and $\psi$ is direct description similarity. $\alpha$ and $\beta$ are tunable hyperparameters.
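
As a rough illustration of the three operations above, the sketch below uses cosine similarity for $\operatorname{sim}$ and plain Python lists as embeddings. Both choices are assumptions, as are the function names and the default values of `alpha` and `beta`; the source does not fix the similarity function or how $\operatorname{imp}(v)$ is obtained.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def edge_weight(importances):
    """w(e) approximated as the sum of member importances imp(v), taken as given."""
    return sum(importances)


def edge_score(q_vec, member_vecs, edge_vec, alpha=0.5, beta=0.5):
    """score(q, e) = alpha * phi + beta * psi for one hyperedge."""
    phi = sum(cosine(q_vec, v) for v in member_vecs)  # summed node activations a(v; q)
    psi = cosine(q_vec, edge_vec)                     # query vs. hyperedge description
    return alpha * phi + beta * psi


# Toy 2-d embeddings (hypothetical values).
score = edge_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

Because $\phi$ is a sum rather than a mean, larger hyperedges can accumulate higher scores, so $\alpha$ may need tuning relative to typical hyperedge sizes.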

4. Retrieval and Generation Pipelines

HetaRAG: End-to-End Processing

  1. Offline: Documents are chunked, annotated, embedded, and indexed across all four modalities.
  2. Online query:
    • Query rewriting (via LLM).
    • Parallel retrieval, normalization, and fusion.
    • Candidate reranking.
    • Optionally: multi-hop reasoning agent.
    • Prompt construction and LLM generation.
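
The online stages listed above compose naturally as a chain of pluggable functions. The sketch below is one possible wiring, with every stage passed in as a callable; the name `run_pipeline`, its parameters, the prompt layout, and the stub stages are assumptions for illustration, and the optional multi-hop agent is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(query, rewrite, retrievers, fuse, rerank, generate, top_k=3):
    """Rewrite -> parallel retrieval -> fusion -> rerank -> prompt -> generation."""
    rewritten = rewrite(query)
    # Fan the rewritten query out to every store concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        futures = {name: pool.submit(fn, rewritten) for name, fn in retrievers.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    candidates = fuse(results)                        # normalization and fusion
    evidence = rerank(rewritten, candidates)[:top_k]  # keep the best candidates
    lines = ["Evidence:"] + [f"- {doc}" for doc in evidence] + [f"Question: {rewritten}"]
    return generate("\n".join(lines))


# Stub stages standing in for real services (all hypothetical).
retrievers = {
    "vector": lambda q: {"doc_a": 0.9},
    "ft": lambda q: {"doc_b": 7.0},
}
answer = run_pipeline(
    "  who founded acme?  ",
    rewrite=str.strip,
    retrievers=retrievers,
    fuse=lambda results: [doc for scores in results.values() for doc in scores],
    rerank=lambda q, docs: sorted(docs),
    generate=lambda prompt: prompt,  # echo the prompt instead of calling an LLM
)
```

Passing stages as callables keeps the fusion and reranking components swappable, which matches the modular design the entry describes.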

Hyper-RAG: Hypergraph QA

  1. Offline: Extract entities/relations, construct/merge hypergraph, embed all components.
  2. Online query:
    • Extract entity/correlation keywords via LLM.
    • Retrieve top-K vertices and hyperedges.
    • One-hop hypergraph diffusion for context expansion.
    • Rank and prune results.
    • Construct prompt with selected passages, sections, and question; generate LLM response.

Hyper-RAG-Lite disables explicit hyperedge retrieval to halve retrieval latency, relying on one-hop diffusion from entities for context.
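
The difference between the full and Lite retrieval paths can be sketched as follows. This is an interpretation, not the published algorithm: the function names, the use of cosine similarity, the strict one-hop expansion rule, and the toy embeddings are all assumptions, and ranking and pruning are omitted.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(query_vec, embeddings, k):
    """Names of the k embeddings most similar to the query."""
    ranked = sorted(embeddings, key=lambda n: cosine(query_vec, embeddings[n]), reverse=True)
    return ranked[:k]


def retrieve(query_vec, vertex_emb, edge_emb, hyperedges, k=1, lite=False):
    """Seed by embedding similarity, then expand one hop across the hypergraph."""
    seed_vertices = set(top_k(query_vec, vertex_emb, k))
    # Lite mode skips the hyperedge embedding search entirely.
    seed_edges = set() if lite else set(top_k(query_vec, edge_emb, k))
    # One hop from seed vertices to the hyperedges that contain them.
    edges = seed_edges | {e for e, m in hyperedges.items() if seed_vertices & set(m)}
    # One hop from seed hyperedges to their member vertices.
    vertices = seed_vertices | {v for e in seed_edges for v in hyperedges[e]}
    return vertices, edges


# Toy hypergraph and 2-d embeddings (hypothetical values).
hyperedges = {"e0": ["a", "b"], "e1": ["a", "c", "d"], "e2": ["d", "e"]}
vertex_emb = {"a": [1, 0], "b": [0, 1], "c": [0.5, 0.5], "d": [0, 1], "e": [-1, 0]}
edge_emb = {"e0": [0, 1], "e1": [0.2, 1], "e2": [1, 0.1]}

full = retrieve([1, 0], vertex_emb, edge_emb, hyperedges)
lite = retrieve([1, 0], vertex_emb, edge_emb, hyperedges, lite=True)
```

In the toy run, Lite mode still reaches the hyperedges incident to the seed entity but misses `e2`, which only the hyperedge search finds; this is the context traded away for lower latency.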

5. Experimental Results and Empirical Benchmarks

HetaRAG

  • RAG-Challenge (QA): ChatGPT-4o with reranking (bge-reranker-large) attains a composite Score of 117.0 (R/3 + G, max 133), 3 to 4 points higher than the corresponding configurations without reranking. Reranking consistently improves both retrieval and generation sub-scores.
  • DeepSearch (Multi-Hop QA): The MultiHopAgent enables accurate, stepwise, cross-document reasoning (e.g., linking entities across company founding and executive roles).
  • DeepWriter (Multimodal Report): Average evaluation score of 4.64 (Prometheus2-7B), competitive with larger models. Minor trade-off in breadth for smaller LLMs.

Hyper-RAG

  • NeurologyCorp: Yields a 12.3-point improvement over direct LLM, 6.3 points over GraphRAG, and 6.0 points over LightRAG.
  • Multi-domain Results: Across 9 datasets, Hyper-RAG exceeds LightRAG by 35.5% (selection-based), with the largest gains in Comprehensiveness (+35.1%) and Coherence (+39.6%).
  • Latency: Hyper-RAG-Lite averages 0.315 s, doubling retrieval speed over LightRAG with a 3.3-point quality gain.
  • Stability: Robust as question complexity increases—unlike graph or chunked RAG, which exhibit performance degradation.

6. Practical Implementations and Future Research Directions

  • HetaRAG Implementation: MinerU and Docling are used for pre-processing; Milvus, Neo4j, Elasticsearch, and MySQL provide backend indexing. Fusion and reranking frameworks are modular; the open-source repository is available at github.com/KnowledgeXLab/HetaRAG (Yan et al., 12 Sep 2025).
  • Planned Enhancements:
    • Unified graph-anchored index subsuming vector/full-text retrieval within a KG embedding space.
    • Learned/online routing policies for $w_k(q)$ (e.g., RL-informed weight adaptation).
    • Cross-modal attention for evidence fusion.
    • End-to-end neural optimization of retriever plus generator against domain objectives.
  • Hyper-RAG: Dual-database (vector and hypergraph) infrastructure. The Lite mode provides efficient real-time QA, balancing context with responsiveness for production deployments in time-critical domains.

Both HetaRAG and Hyper-RAG exemplify new directions in RAG: leveraging architectural and data heterogeneity (HetaRAG) and relational/hyperrelational knowledge representations (Hyper-RAG) to address foundational limits of LLM-based knowledge-intensive tasks.


References:

  • "HetaRAG: Hybrid Deep Retrieval-Augmented Generation across Heterogeneous Data Stores" (Yan et al., 12 Sep 2025)
  • "Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation" (Feng et al., 30 Mar 2025)