
MoRA-RAG: Agentic Retrieval Ensemble

Updated 25 November 2025
  • The paper introduces MoRA-RAG, an agentic RAG framework combining multiple specialized retrievers with an LLM controller to dynamically route queries and verify evidence.
  • The methodology employs a mixture-of-retrieval approach with weighted fusion, enabling multi-step query decomposition and precise evidence aggregation.
  • Empirical evaluations demonstrate enhanced accuracy, reliability, and domain adaptability across finance, scientific literature, and multi-hazard scenarios compared to traditional RAG systems.

Mixture-of-Retrieval Agentic RAG (MoRA-RAG) is an agentic Retrieval-Augmented Generation (RAG) framework that leverages a modular ensemble of retrieval strategies combined with autonomous workflow control by LLM agents. MoRA-RAG dynamically routes queries across multiple, specialized retrievers—potentially including domain-indexed vector stores, structured database search, and hybrid dense-sparse retrievers—using a mixture-of-experts paradigm. It augments this capability with agentic meta-control, enabling multi-step query decomposition, evidence verification, and targeted sub-query refinement, resulting in improved accuracy, faithfulness, and reduced hallucination across a variety of high-value domains such as finance, scientific literature, and hazard analysis (Srinivasan et al., 19 Sep 2025, Nagori et al., 30 Jul 2025, Kalra et al., 18 Jun 2025, Chen et al., 19 Aug 2025, Kuai et al., 18 Nov 2025).

1. System Architecture and Workflow

MoRA-RAG is architected as a two-level ensemble: a mixture-of-retrieval frontend provides document evidence pools, and an agentic backend (typically an LLM-based agent) orchestrates multi-stage reasoning, tool calls, and verification of evidence sufficiency. The system consists of the following key components:

  • Retrievers as Experts: Sparsely or densely indexed document retrievers (e.g., BM25, vector-based, knowledge graph/Cypher, table-specific) operate in parallel, each providing top-$k$ document candidates and confidence or trustworthiness scores.
  • Mixture-of-Retrieval Module: Inputs the user query $q$ to a pool of retrievers $R_1, \ldots, R_n$ and computes per-retriever weights $w_i(q)$ based on information-theoretic or confidence-based signals, as described below.
  • Agentic LLM Controller: An LLM (e.g., GPT-4o mini, Llama-3.3, Mistral-7B) serves as an autonomous agent, integrating context, orchestrating sub-query decomposition, and dynamically leveraging specialized retrieval and tool chains.
  • Verification/Refinement Loop: If the generated answer is low-confidence or context coverage is insufficient, the agent decomposes the query, issues sub-queries, or calls external tools, iterating the retrieval and reasoning loop (Srinivasan et al., 19 Sep 2025, Nagori et al., 30 Jul 2025, Chen et al., 19 Aug 2025).

A representative workflow comprises the following steps (a code sketch follows the list):

  1. Receive a free-form user query $q$.
  2. Generate $N$ semantically diverse sub-queries $\{q_i\}$ using a generator LLM or a query expansion model (e.g., Multi-HyDE (Srinivasan et al., 19 Sep 2025)).
  3. For each $q_i$, execute retrieval across heterogeneous retrievers, collecting candidate passages and their respective scores.
  4. Compute mixture weights $\alpha_i$ for each sub-query or retriever using softmax-transformed trust signals, entropy, or model confidence.
  5. Construct an evidence pool by weighted fusion and reranking.
  6. The agentic LLM processes the evidence, synthesizes an answer, verifies sufficiency, and, if necessary, spawns further sub-queries or tool calls.
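The following is a minimal, self-contained sketch of this six-step loop. It simplifies deliberately: the `Retriever` protocol, the injected `expand`/`generate`/`decompose` callables, the max-fusion of duplicate passages, and the fixed confidence threshold are all assumptions of this sketch rather than interfaces from the cited papers (the weighted fusion of step 5 is detailed in Section 2).

```python
from typing import Callable, Protocol

class Retriever(Protocol):
    """Any expert in the pool: returns (passage, score) pairs for a query."""
    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]: ...

def mora_rag_loop(
    query: str,
    retrievers: list[Retriever],
    expand: Callable[[str], list[str]],       # step 2: query -> N diverse sub-queries
    generate: Callable[[str, list[str]], tuple[str, float]],  # -> (answer, confidence)
    decompose: Callable[[str], list[str]],    # step 6: spawn finer sub-queries
    max_rounds: int = 3,
    conf_threshold: float = 0.7,
    pool_size: int = 10,
) -> str:
    pending = [query]
    answer = ""
    for _ in range(max_rounds):
        # Steps 2-3: expand every pending query, then fan out across all experts.
        pool: dict[str, float] = {}
        for q in pending:
            for sub_q in expand(q):
                for retriever in retrievers:
                    for passage, score in retriever.search(sub_q):
                        # Steps 4-5, simplified: max-fuse duplicate passages
                        # (the weighted fusion of Section 2 would go here).
                        pool[passage] = max(pool.get(passage, 0.0), score)
        evidence = sorted(pool, key=pool.get, reverse=True)[:pool_size]
        # Step 6: synthesize an answer, then verify before returning it.
        answer, confidence = generate(query, evidence)
        if confidence >= conf_threshold:
            return answer
        pending = decompose(query)  # refinement: retry with finer sub-queries
    return answer  # best effort once the iteration budget is exhausted
```

Injecting the LLM-backed callables keeps the control loop itself testable with simple stubs, independent of any particular model or retriever stack.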

2. Mathematical Foundations of Mixture-of-Retrieval

The MoRA-RAG mixture-of-retrieval mechanism aggregates relevance signals from multiple heterogeneous retrievers to form a global ranking of evidence. Distinct instantiations appear in the literature; a consolidated code sketch follows the list:

  • Sub-query Expansion and Weighting: For sub-queries $q_i$ produced by a generator $g_q$, each document $d$ receives a score $s_i(d) = f(q_i, d)$, with possible term-wise combinations of dense similarity and BM25:

$$s_i(d) = \lambda \, s^{\mathrm{dense}}_i(d) + (1-\lambda) \, s^{\mathrm{sparse}}_i(d), \quad \lambda \in [0,1]$$

  • Confidence/Entropy-Based Mixture Weights: Sub-query or retriever weights are set via LLM-based or entropy-derived proxies:

$$\alpha_i = \frac{\exp(\beta \, g(q_i))}{\sum_{j=1}^{N} \exp(\beta \, g(q_j))}$$

or, for retriever $i$ with entropy $H_i$ among its softmax-normalized scores,

$$\alpha_i = \frac{H_{\max} - H_i}{\sum_k (H_{\max} - H_k)} \qquad \left(\sum_i \alpha_i = 1\right)$$

  • Final Mixed Score and Reranking: For each candidate document $d$, its final mixture score is

$$S(d) = \sum_{i=1}^{N} \alpha_i \, s_i(d)$$

and the top-$K$ passages are selected for the context window.
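A consolidated sketch of the three formulas above follows. It assumes each source exposes per-document scores as a dict (and softmax-normalized distributions for the entropy weights); all function names and the data layout are illustrative, not interfaces from the cited papers.

```python
import math

def hybrid_score(dense: float, sparse: float, lam: float = 0.5) -> float:
    """s_i(d) = lam * s_dense + (1 - lam) * s_sparse, with lam in [0, 1]."""
    return lam * dense + (1.0 - lam) * sparse

def softmax_weights(conf: list[float], beta: float = 1.0) -> list[float]:
    """alpha_i = exp(beta * g(q_i)) / sum_j exp(beta * g(q_j))."""
    m = max(conf)  # subtract the max for numerical stability
    exps = [math.exp(beta * (c - m)) for c in conf]
    z = sum(exps)
    return [e / z for e in exps]

def entropy_weights(score_dists: list[list[float]]) -> list[float]:
    """alpha_i = (H_max - H_i) / sum_k (H_max - H_k): confident sources win."""
    def entropy(p: list[float]) -> float:
        return -sum(x * math.log(x) for x in p if x > 0.0)
    h = [entropy(p) for p in score_dists]
    h_max = max(h) + 1e-9  # offset keeps weights defined when all H_i are equal
    gaps = [h_max - hi for hi in h]
    total = sum(gaps)
    return [g / total for g in gaps]

def fuse_top_k(per_source: list[dict[str, float]],
               alphas: list[float], k: int) -> list[str]:
    """S(d) = sum_i alpha_i * s_i(d); documents missing from a source score 0."""
    mixed: dict[str, float] = {}
    for alpha, scores in zip(alphas, per_source):
        for doc, s in scores.items():
            mixed[doc] = mixed.get(doc, 0.0) + alpha * s
    return sorted(mixed, key=mixed.get, reverse=True)[:k]
```

For instance, `fuse_top_k(scores, entropy_weights(dists), k=8)` chains the entropy-weighted variant end to end; treating missing documents as zero matches the sparsity of top-$K$ candidate lists, though it implicitly penalizes documents surfaced by only one expert.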

This mixture paradigm supports domain-adaptive, granular control and improves robustness to outlier retriever failures (Kalra et al., 18 Jun 2025, Chen et al., 19 Aug 2025, Srinivasan et al., 19 Sep 2025).

3. Agentic Meta-Reasoning and Verification Loop

A central innovation of MoRA-RAG is embedding retrieval within an agentic reasoning framework. The agentic loop encompasses:

  • Multi-step Reasoning: The LLM agent inspects retrieved evidence, checks for sufficiency at each step, and decomposes complex or underspecified queries into finer sub-queries when required.
  • Tool Use: The agent dynamically invokes domain-specific tools (e.g., table search, calculator, KG traversal), adapts workflows, and controls iteration (using meta-prompts or JSON meta-plans).
  • Verification and Refinement: Low-confidence results (as evaluated by a domain-calibrated confidence metric or perplexity threshold) trigger recursive sub-querying and retrieval expansion, ensuring improved coverage and answer faithfulness (Srinivasan et al., 19 Sep 2025, Chen et al., 19 Aug 2025).

This agentic orchestration promotes end-to-end faithfulness by enforcing a "verification loop"—answers that are not adequately grounded in retrieved evidence result in automatic search refinement or fallback strategies.
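As a concrete illustration of the meta-plan idea, a controller might emit a structure like the one below (shown as a Python literal). The schema, tool names, and threshold are hypothetical, invented for illustration rather than drawn from the cited systems.

```python
# Hypothetical JSON meta-plan a controller could emit to drive tool calls;
# every field name, tool name, and threshold here is invented for illustration.
meta_plan = {
    "goal": "answer the user query with verified evidence",
    "steps": [
        {"tool": "vector_search", "args": {"query": "<sub-query 1>", "k": 5}},
        {"tool": "table_search", "args": {"query": "<sub-query 2>"}},
        {"tool": "kg_traversal", "args": {"start": "<entity>", "hops": 2}},
    ],
    "verify": {
        "metric": "confidence",  # domain-calibrated confidence or perplexity
        "threshold": 0.7,
        "on_fail": "decompose_and_retry",
    },
}
```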

4. Retrieval Modalities and Domain Adaptation

MoRA-RAG supports plug-and-play retriever modularity: the mixture-of-retrieval mechanism is domain-agnostic, facilitating adaptation to finance, scientific, or multi-hazard decision support by swapping, adding, or tuning constituent retriever modules.

5. Empirical Performance and Evaluation

MoRA-RAG outperforms baseline and prior ensemble RAG architectures in several settings:

| System/Domain | Accuracy (%) | Reliability (%) | Hallucination Δ | Noteworthy Gains | Reference |
|---|---|---|---|---|---|
| Finance (ConvFinQA) | 45.6 | 52.9 | –15 ppt | +11.2 ppt accuracy over HyDE | (Srinivasan et al., 19 Sep 2025) |
| Scientific (VS Rec.) | +0.63 (Recall) | +0.56 (Precision) | — | VS Context Recall/Precision gains | (Nagori et al., 30 Jul 2025) |
| Multi-hazard QA | 94.5 | — | –30% (vs LLMs) | 10% better than SOTA RAG | (Kuai et al., 18 Nov 2025) |

Faithfulness, precision, and context recall consistently improve in agentic mixed-retrieval settings. Fine-tuned generation (e.g., DPO for faithfulness) yields further robustness to citation errors and hallucination (Nagori et al., 30 Jul 2025). In financial QA, hallucination rates drop 15 points and factual claim F1 improves over strong single-retrieval systems.

Performance scaling is observed with the number of ensemble retrievers ($n$) and context pool size ($K$), saturating near the noise-tolerance limit of the base LLM (Chen et al., 19 Aug 2025).

6. Theoretical Guarantees and Design Insights

The information-theoretic analysis of RAG ensembles demonstrates that the mixture-of-retrieval module provably never increases the conditional entropy of generated answers relative to any single retriever, i.e., ensemble evidence cannot raise answer uncertainty:

$$H(a \mid q, e^*) \leq H(a \mid q, e_i), \quad \forall i$$

where $a$ is the answer, $q$ the query, $e^*$ the ensemble evidence, and $e_i$ the single-retriever evidence (Chen et al., 19 Aug 2025).
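A one-line justification, under the plausible assumption (not stated explicitly above) that the ensemble evidence $e^*$ aggregates every retriever's evidence, is that conditioning on additional variables never increases Shannon entropy:

```latex
% Assuming e^* = (e_1, \ldots, e_n), i.e., the ensemble pool subsumes each e_i:
H(a \mid q, e^*) = H(a \mid q, e_1, \ldots, e_n) \le H(a \mid q, e_i) \quad \forall i
% since adding conditioning variables cannot increase conditional entropy.
```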

Mixture-of-retrieval is zero-shot, requiring only fixed hyperparameter calibration (e.g., $\lambda$, softmax temperature) with no end-to-end retriever or reader training (Kalra et al., 18 Jun 2025). Human-expert or domain-retriever modules fold seamlessly into the mixture, with agentic orchestration learning to assign them maximal weight when indicated by domain signals.

7. Modularity, Implementation, and Adaptation

  • Extensible Retriever Pool: New retrievers (domain corpora, tabular tools, or even simulated human experts) are modularly incorporated; see the sketch after this list. Weights and query expansion prompts accommodate re-prompting or fine-tuning for unseen domains.
  • Tool-Calling and Workflow Control: Agentic controllers execute in workflow engines (e.g., meta-prompted LLMs, LangChain), with prompt templates and plans capturing workflow signatures. Instruction tuning (e.g., DPO (Nagori et al., 30 Jul 2025)) improves generator calibration.
  • Token-Efficient Control: Retrieval and generation are regulated for token-budget compliance, supporting practical deployment at scale (Srinivasan et al., 19 Sep 2025).
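A minimal sketch of the plug-and-play pattern noted above, assuming a simple name-keyed registry; the registry and its API are illustrative, not drawn from the cited systems.

```python
# Illustrative retriever registry: any object satisfying the search() protocol
# can be dropped in and immediately participates in the mixture.
from typing import Protocol

class Retriever(Protocol):
    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]: ...

RETRIEVER_POOL: dict[str, Retriever] = {}

def register(name: str, retriever: Retriever) -> None:
    """Register a new expert; mixture weights adapt to it on the next query."""
    RETRIEVER_POOL[name] = retriever
```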

A plausible implication is that MoRA-RAG sets a new standard for interpretable, robust, and domain-flexible RAG by combining theoretically grounded mixture-of-expert retrieval with autonomous agentic control.

