
MoRA-RAG: Agentic Retrieval Ensemble

Updated 25 November 2025
  • The paper introduces MoRA-RAG, an agentic RAG framework combining multiple specialized retrievers with an LLM controller to dynamically route queries and verify evidence.
  • The methodology employs a mixture-of-retrieval approach with weighted fusion, enabling multi-step query decomposition and precise evidence aggregation.
  • Empirical evaluations demonstrate enhanced accuracy, reliability, and domain adaptability across finance, scientific literature, and multi-hazard scenarios compared to traditional RAG systems.

Mixture-of-Retrieval Agentic RAG (MoRA-RAG) is an agentic Retrieval-Augmented Generation (RAG) framework that leverages a modular ensemble of retrieval strategies combined with autonomous workflow control by LLM agents. MoRA-RAG dynamically routes queries across multiple, specialized retrievers—potentially including domain-indexed vector stores, structured database search, and hybrid dense-sparse retrievers—using a mixture-of-experts paradigm. It augments this capability with agentic meta-control, enabling multi-step query decomposition, evidence verification, and targeted sub-query refinement, resulting in improved accuracy, faithfulness, and reduced hallucination across a variety of high-value domains such as finance, scientific literature, and hazard analysis (Srinivasan et al., 19 Sep 2025, Nagori et al., 30 Jul 2025, Kalra et al., 18 Jun 2025, Chen et al., 19 Aug 2025, Kuai et al., 18 Nov 2025).

1. System Architecture and Workflow

MoRA-RAG is architected as a two-level ensemble: a mixture-of-retrieval frontend provides document evidence pools, and an agentic backend (typically an LLM-based agent) orchestrates multi-stage reasoning, tool calls, and verification of evidence sufficiency. The system consists of the following key components:

  • Retrievers as Experts: Sparsely or densely indexed document retrievers (e.g., BM25, vector-based, knowledge graph/Cypher, table-specific) operate in parallel, each providing top-$k$ document candidates and confidence or trustworthiness scores.
  • Mixture-of-Retrieval Module: Inputs the user query $q$ to a pool of retrievers $R_1, \ldots, R_n$ and computes per-retriever weights $w_i(q)$ based on information-theoretic or confidence-based signals, as described below.
  • Agentic LLM Controller: An LLM (e.g., GPT-4o mini, Llama-3.3, Mistral-7B) serves as an autonomous agent, integrating context, orchestrating sub-query decomposition, and dynamically leveraging specialized retrieval and tool chains.
  • Verification/Refinement Loop: If the generated answer is low-confidence or context coverage is insufficient, the agent decomposes the query, issues sub-queries, or calls external tools, iterating the retrieval and reasoning loop (Srinivasan et al., 19 Sep 2025, Nagori et al., 30 Jul 2025, Chen et al., 19 Aug 2025).

A representative workflow comprises the following steps (a code sketch follows the list):

  1. Receive a free-form user query $q$.
  2. Generate $N$ semantically diverse sub-queries $\{q_i\}$ using a generator LLM or a query expansion model (e.g., Multi-HyDE (Srinivasan et al., 19 Sep 2025)).
  3. For each $q_i$, execute retrieval across heterogeneous retrievers, collecting candidate passages and their respective scores.
  4. Compute mixture weights $\alpha_i$ for each sub-query or retriever using softmax-transformed trust signals, entropy, or model confidence.
  5. Construct an evidence pool by weighted fusion and reranking.
  6. The agentic LLM processes the evidence, synthesizes an answer, verifies sufficiency, and, if necessary, spawns further sub-queries or tool calls.
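The following is a minimal, self-contained sketch of this six-step loop. It simplifies deliberately: the `Retriever` protocol, the injected `expand`/`generate`/`decompose` callables, the max-fusion of duplicate passages, and the fixed confidence threshold are all assumptions of this sketch rather than interfaces from the cited papers (the weighted fusion of step 5 is detailed in Section 2).

```python
from typing import Callable, Protocol

class Retriever(Protocol):
    """Any expert in the pool: returns (passage, score) pairs for a query."""
    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]: ...

def mora_rag_loop(
    query: str,
    retrievers: list[Retriever],
    expand: Callable[[str], list[str]],       # step 2: query -> N diverse sub-queries
    generate: Callable[[str, list[str]], tuple[str, float]],  # -> (answer, confidence)
    decompose: Callable[[str], list[str]],    # step 6: spawn finer sub-queries
    max_rounds: int = 3,
    conf_threshold: float = 0.7,
    pool_size: int = 10,
) -> str:
    pending = [query]
    answer = ""
    for _ in range(max_rounds):
        # Steps 2-3: expand every pending query, then fan out across all experts.
        pool: dict[str, float] = {}
        for q in pending:
            for sub_q in expand(q):
                for retriever in retrievers:
                    for passage, score in retriever.search(sub_q):
                        # Steps 4-5, simplified: max-fuse duplicate passages
                        # (the weighted fusion of Section 2 would go here).
                        pool[passage] = max(pool.get(passage, 0.0), score)
        evidence = sorted(pool, key=pool.get, reverse=True)[:pool_size]
        # Step 6: synthesize an answer, then verify before returning it.
        answer, confidence = generate(query, evidence)
        if confidence >= conf_threshold:
            return answer
        pending = decompose(query)  # refinement: retry with finer sub-queries
    return answer  # best effort once the iteration budget is exhausted
```

Injecting the LLM-backed callables keeps the control loop itself testable with simple stubs, independent of any particular model or retriever stack.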

2. Mathematical Foundations of Mixture-of-Retrieval

The MoRA-RAG mixture-of-retrieval mechanism aggregates relevance signals from multiple heterogeneous retrievers to form a global ranking of evidence. Distinct instantiations appear in the literature; a consolidated code sketch follows the list:

  • Sub-query Expansion and Weighting: For sub-queries $q_i$ produced by a generator $g_q$, each document $d$ receives a score $s_i(d) = f(q_i, d)$, with possible term-wise combinations of dense similarity and BM25:

$$s_i(d) = \lambda \, s^{\mathrm{dense}}_i(d) + (1-\lambda) \, s^{\mathrm{sparse}}_i(d), \quad \lambda \in [0,1]$$

  • Confidence/Entropy-Based Mixture Weights: Sub-query or retriever weights are set via LLM-based or entropy-derived proxies:

$$\alpha_i = \frac{\exp(\beta \, g(q_i))}{\sum_{j=1}^{N} \exp(\beta \, g(q_j))}$$

or, for retriever $i$ with entropy $H_i$ among its softmax-normalized scores,

$$\alpha_i = \frac{H_{\max} - H_i}{\sum_k (H_{\max} - H_k)} \qquad \left(\sum_i \alpha_i = 1\right)$$

  • Final Mixed Score and Reranking: For each candidate document $d$, its final mixture score is

$$S(d) = \sum_{i=1}^{N} \alpha_i \, s_i(d)$$

and the top-$K$ passages are selected for the context window.
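A consolidated sketch of the three formulas above follows. It assumes each source exposes per-document scores as a dict (and softmax-normalized distributions for the entropy weights); all function names and the data layout are illustrative, not interfaces from the cited papers.

```python
import math

def hybrid_score(dense: float, sparse: float, lam: float = 0.5) -> float:
    """s_i(d) = lam * s_dense + (1 - lam) * s_sparse, with lam in [0, 1]."""
    return lam * dense + (1.0 - lam) * sparse

def softmax_weights(conf: list[float], beta: float = 1.0) -> list[float]:
    """alpha_i = exp(beta * g(q_i)) / sum_j exp(beta * g(q_j))."""
    m = max(conf)  # subtract the max for numerical stability
    exps = [math.exp(beta * (c - m)) for c in conf]
    z = sum(exps)
    return [e / z for e in exps]

def entropy_weights(score_dists: list[list[float]]) -> list[float]:
    """alpha_i = (H_max - H_i) / sum_k (H_max - H_k): confident sources win."""
    def entropy(p: list[float]) -> float:
        return -sum(x * math.log(x) for x in p if x > 0.0)
    h = [entropy(p) for p in score_dists]
    h_max = max(h) + 1e-9  # offset keeps weights defined when all H_i are equal
    gaps = [h_max - hi for hi in h]
    total = sum(gaps)
    return [g / total for g in gaps]

def fuse_top_k(per_source: list[dict[str, float]],
               alphas: list[float], k: int) -> list[str]:
    """S(d) = sum_i alpha_i * s_i(d); documents missing from a source score 0."""
    mixed: dict[str, float] = {}
    for alpha, scores in zip(alphas, per_source):
        for doc, s in scores.items():
            mixed[doc] = mixed.get(doc, 0.0) + alpha * s
    return sorted(mixed, key=mixed.get, reverse=True)[:k]
```

For instance, `fuse_top_k(scores, entropy_weights(dists), k=8)` chains the entropy-weighted variant end to end; treating missing documents as zero matches the sparsity of top-$K$ candidate lists, though it implicitly penalizes documents surfaced by only one expert.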

This mixture paradigm supports domain-adaptive, granular control and improves robustness to outlier retriever failures (Kalra et al., 18 Jun 2025, Chen et al., 19 Aug 2025, Srinivasan et al., 19 Sep 2025).

3. Agentic Meta-Reasoning and Verification Loop

A central innovation of MoRA-RAG is embedding retrieval within an agentic reasoning framework. The agentic loop encompasses:

  • Multi-step Reasoning: The LLM agent inspects retrieved evidence, checks for sufficiency at each step, and decomposes complex or underspecified queries into finer sub-queries when required.
  • Tool Use: The agent dynamically invokes domain-specific tools (e.g., table search, calculator, KG traversal), adapts workflows, and controls iteration (using meta-prompts or JSON meta-plans).
  • Verification and Refinement: Low-confidence results (as evaluated by a domain-calibrated confidence metric or perplexity threshold) trigger recursive sub-querying and retrieval expansion, ensuring improved coverage and answer faithfulness (Srinivasan et al., 19 Sep 2025, Chen et al., 19 Aug 2025).

This agentic orchestration promotes end-to-end faithfulness by enforcing a "verification loop"—answers that are not adequately grounded in retrieved evidence result in automatic search refinement or fallback strategies.
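As a concrete illustration of the meta-plan idea, a controller might emit a structure like the one below (shown as a Python literal). The schema, tool names, and threshold are hypothetical, invented for illustration rather than drawn from the cited systems.

```python
# Hypothetical JSON meta-plan a controller could emit to drive tool calls;
# every field name, tool name, and threshold here is invented for illustration.
meta_plan = {
    "goal": "answer the user query with verified evidence",
    "steps": [
        {"tool": "vector_search", "args": {"query": "<sub-query 1>", "k": 5}},
        {"tool": "table_search", "args": {"query": "<sub-query 2>"}},
        {"tool": "kg_traversal", "args": {"start": "<entity>", "hops": 2}},
    ],
    "verify": {
        "metric": "confidence",  # domain-calibrated confidence or perplexity
        "threshold": 0.7,
        "on_fail": "decompose_and_retry",
    },
}
```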

4. Retrieval Modalities and Domain Adaptation

MoRA-RAG supports plug-and-play retriever modularity: the mixture-of-retrieval mechanism is domain-agnostic, facilitating adaptation to finance, scientific, or multi-hazard decision support by swapping, adding, or tuning constituent retriever modules.

5. Empirical Performance and Evaluation

MoRA-RAG outperforms baseline and prior ensemble RAG architectures in several settings:

| System/Domain | Accuracy (%) | Reliability (%) | Hallucination Δ | Noteworthy Gains | Reference |
|---|---|---|---|---|---|
| Finance (ConvFinQA) | 45.6 | 52.9 | –15 ppt | +11.2 ppt accuracy over HyDE | (Srinivasan et al., 19 Sep 2025) |
| Scientific (VS Rec.) | +0.63 (Recall) | +0.56 (Precision) | — | VS Context Recall/Precision gains | (Nagori et al., 30 Jul 2025) |
| Multi-hazard QA | 94.5 | — | –30% (vs LLMs) | 10% better than SOTA RAG | (Kuai et al., 18 Nov 2025) |

Faithfulness, precision, and context recall consistently improve in agentic mixed-retrieval settings. Fine-tuned generation (e.g., DPO for faithfulness) yields further robustness to citation errors and hallucination (Nagori et al., 30 Jul 2025). In financial QA, hallucination rates drop 15 points and factual claim F1 improves over strong single-retrieval systems.

Performance scaling is observed with the number of ensemble retrievers ($n$) and context pool size ($K$), saturating near the noise-tolerance limit of the base LLM (Chen et al., 19 Aug 2025).

6. Theoretical Guarantees and Design Insights

The information-theoretic analysis of RAG ensembles demonstrates that the mixture-of-retrieval module provably never increases the conditional entropy of generated answers relative to any single retriever, i.e., ensemble evidence cannot raise answer uncertainty:

$$H(a \mid q, e^*) \leq H(a \mid q, e_i), \quad \forall i$$

where $a$ is the answer, $q$ the query, $e^*$ the ensemble evidence, and $e_i$ the single-retriever evidence (Chen et al., 19 Aug 2025).
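A one-line justification, under the plausible assumption (not stated explicitly above) that the ensemble evidence $e^*$ aggregates every retriever's evidence, is that conditioning on additional variables never increases Shannon entropy:

```latex
% Assuming e^* = (e_1, \ldots, e_n), i.e., the ensemble pool subsumes each e_i:
H(a \mid q, e^*) = H(a \mid q, e_1, \ldots, e_n) \le H(a \mid q, e_i) \quad \forall i
% since adding conditioning variables cannot increase conditional entropy.
```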

Mixture-of-retrieval is zero-shot, requiring only fixed hyperparameter calibration (e.g., $\lambda$, softmax temperature) with no end-to-end retriever or reader training (Kalra et al., 18 Jun 2025). Human-expert or domain-retriever modules fold seamlessly into the mixture, with agentic orchestration learning to assign them maximal weight when indicated by domain signals.

7. Modularity, Implementation, and Adaptation

  • Extensible Retriever Pool: New retrievers (domain corpora, tabular tools, or even simulated human experts) are modularly incorporated; see the sketch after this list. Weights and query expansion prompts accommodate re-prompting or fine-tuning for unseen domains.
  • Tool-Calling and Workflow Control: Agentic controllers execute in workflow engines (e.g., meta-prompted LLMs, LangChain), with prompt templates and plans capturing workflow signatures. Instruction tuning (e.g., DPO (Nagori et al., 30 Jul 2025)) improves generator calibration.
  • Token-Efficient Control: Retrieval and generation are regulated for token-budget compliance, supporting practical deployment at scale (Srinivasan et al., 19 Sep 2025).
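A minimal sketch of the plug-and-play pattern noted above, assuming a simple name-keyed registry; the registry and its API are illustrative, not drawn from the cited systems.

```python
# Illustrative retriever registry: any object satisfying the search() protocol
# can be dropped in and immediately participates in the mixture.
from typing import Protocol

class Retriever(Protocol):
    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]: ...

RETRIEVER_POOL: dict[str, Retriever] = {}

def register(name: str, retriever: Retriever) -> None:
    """Register a new expert; mixture weights adapt to it on the next query."""
    RETRIEVER_POOL[name] = retriever
```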

A plausible implication is that MoRA-RAG sets a new standard for interpretable, robust, and domain-flexible RAG by combining theoretically grounded mixture-of-expert retrieval with autonomous agentic control.

