
Omni-RAG Modular Framework

Updated 16 August 2025
  • Omni-RAG Framework is a modular, reconfigurable system that decomposes retrieval and generation into independent modules and specialized operators.
  • The architecture features a three-level hierarchy that enables flexible orchestration, dynamic routing, and precise fusion across various data and task types.
  • It supports diverse applications including multi-turn QA, domain-specific retrieval, and debugging, thereby enhancing scalability and transparency in LLM integration.

The Omni-RAG Framework is a modular, highly reconfigurable architecture for retrieval-augmented generation (RAG) that generalizes and extends linear RAG pipelines by decomposing the retrieval and generation workflow into independently operable modules and specialized operators. This approach enables unprecedented flexibility in orchestrating, optimizing, and customizing knowledge integration for LLMs across diverse application domains and use cases (Gao et al., 26 Jul 2024).

1. Architecture and Decomposition

The Omni-RAG (also referred to as Modular RAG) framework is structured as an explicit three-level computational hierarchy:

  1. Modules (Level 1): High-level components such as:
    • Indexing: Preprocessing and embedding documents.
    • Pre-Retrieval: Query enhancement via expansion, rewriting, or transformation.
    • Retrieval: Selection of relevant documents using dense, sparse, or hybrid retrievers.
    • Post-Retrieval: Reranking, compression, and selection of retrieved content.
    • Generation: Answer synthesis with LLMs, optionally with fine-tuning or reinforcement learning.
    • Orchestration: Governing overall workflow (routing, scheduling, fusion).
  2. Sub-Modules (Level 2): Each module can be subdivided into more specialized submodules (e.g., within retrieval, submodules for multi-hop retrieval, retrieval over structured and unstructured data, etc.).
  3. Operators (Level 3): Atomic operators implement specific tasks such as query expansion ($f_{qe}$), query rewriting into SQL/Cypher ($f_{qc}$), chunk compression/selection ($f_{comp}$, $f_{sel}$), and routing ($f_r$).

This modular graph-based abstraction allows the RAG pipeline to be expressed as a directed computational graph, where the choice, composition, and connection of modules/operators are fully flexible, transforming deployment into a "LEGO-like" assembly process.
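
To make the decomposition concrete, here is a minimal sketch (not the paper's implementation) in which modules and operators share a single state-passing interface, so a pipeline is just one path through the graph; all names and behaviors below are illustrative assumptions:

```python
from typing import Any, Protocol

class Operator(Protocol):
    """Level-3 atomic operator: transforms a shared pipeline state."""
    def __call__(self, state: dict[str, Any]) -> dict[str, Any]: ...

def expand_query(state: dict[str, Any]) -> dict[str, Any]:
    """Toy pre-retrieval operator (f_qe): add a paraphrased variant."""
    state["queries"] = [state["query"], state["query"] + " (rephrased)"]
    return state

def retrieve(state: dict[str, Any]) -> dict[str, Any]:
    """Toy retrieval operator: fetch one stand-in document per query."""
    state["docs"] = [f"doc for: {q}" for q in state["queries"]]
    return state

def generate(state: dict[str, Any]) -> dict[str, Any]:
    """Toy generation operator: stand-in for an LLM call."""
    state["answer"] = f"answer grounded in {len(state['docs'])} docs"
    return state

# A linear flow is one path through the graph; swapping, inserting, or
# reordering operators reconfigures the pipeline without refactoring it.
pipeline: list[Operator] = [expand_query, retrieve, generate]

state: dict[str, Any] = {"query": "what is modular RAG?"}
for op in pipeline:
    state = op(state)
print(state["answer"])  # -> "answer grounded in 2 docs"
```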

2. Reconfigurability, Operators, and Orchestration

Reconfigurability within the framework is realized through:

  • Independent Module Design: Modules are fully decoupled with standardized input-output interfaces. Each can be replaced, ablated, fine-tuned, or extended without refactoring the global pipeline.
  • Routing Operators ($f_r$): These functions dynamically select processing paths based on query features, task types, or output quality. Mathematically, $f_r: Q \rightarrow \mathcal{F}$, with candidate flows scored as

$$\text{score}(q, F_i) = a \cdot \text{score}_{key}(q, F_i) + (1 - a) \cdot \text{score}_{semantic}(q, F_i)$$

where $a$ weights keyword matching against semantic similarity (a sketch of this scoring follows the list).

  • Scheduling Operators: Rule-based or learned schedulers decide when to iterate, branch, or terminate retrieval/generation (e.g., based on token probability thresholds or LLM-based satisfaction judges).
  • Fusion Operators: Aggregate outputs from parallel or iterative branches, e.g., via ensembling or softmax-normalized weighting:

$$p(y \mid q, D^q) = \sum_{d \in D^q} p(y \mid d, q) \cdot \lambda(d, q)$$

with

$$\lambda(d, q) = \frac{e^{s(d, q)}}{\sum_{d' \in D^q} e^{s(d', q)}}$$
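
The routing score above can be made concrete with a small sketch; the keyword and semantic scorers below are toy stand-ins (a real deployment would use, e.g., an embedding model for the semantic term):

```python
def keyword_score(query: str, flow_keywords: set[str]) -> float:
    """Fraction of a flow's trigger keywords present in the query."""
    tokens = set(query.lower().split())
    return len(tokens & flow_keywords) / max(len(flow_keywords), 1)

def semantic_score(query: str, flow_desc: str) -> float:
    """Placeholder for embedding similarity; crude token Jaccard here."""
    q, d = set(query.lower().split()), set(flow_desc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def route(query: str, flows: dict[str, tuple[set[str], str]], a: float = 0.5) -> str:
    """f_r: pick the flow F_i maximizing a*score_key + (1-a)*score_semantic."""
    def score(name: str) -> float:
        kw, desc = flows[name]
        return a * keyword_score(query, kw) + (1 - a) * semantic_score(query, desc)
    return max(flows, key=score)

flows = {
    "sql_flow":  ({"table", "sum", "average"}, "answer questions over tabular data"),
    "text_flow": ({"who", "what", "why"},      "answer questions over text passages"),
}
print(route("what is the average revenue in the table", flows))  # -> "sql_flow"
```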

Orchestration modules implement the computation graph, maintain control flow dependencies, and fuse the outputs at appropriate stages, supporting conditional execution, adaptive looping, and multi-branch processing.
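
The fusion step that orchestration performs can likewise be sketched directly from the softmax-weighted formula above; the per-document answer distributions and relevance scores here are invented for illustration:

```python
import math

def fusion_weights(scores: list[float]) -> list[float]:
    """lambda(d, q): softmax over per-document relevance scores s(d, q)."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse(per_doc_probs: list[dict[str, float]], scores: list[float]) -> dict[str, float]:
    """p(y | q, D^q) = sum over d of p(y | d, q) * lambda(d, q)."""
    lams = fusion_weights(scores)
    fused: dict[str, float] = {}
    for probs, lam in zip(per_doc_probs, lams):
        for y, p in probs.items():
            fused[y] = fused.get(y, 0.0) + p * lam
    return fused

# Two retrieved docs disagree; the higher-scored doc dominates the fused answer.
per_doc = [{"yes": 0.9, "no": 0.1}, {"yes": 0.2, "no": 0.8}]
print(fuse(per_doc, scores=[2.0, 0.5]))  # "yes" ends up around 0.77
```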

3. Common Patterns and Execution Flows

The framework supports a diverse array of system patterns, succinctly formalized as follows:

  • Linear Pattern: The traditional RAG sequence:

$$q \to R(q, D) \to \text{LLM}([q, D^q])$$

  • Conditional Pattern: Dynamic paths selected by $f_r$ based on intermediate outputs (such as routing to rerankers if retrieval confidence is low).
  • Branching Pattern: Pre- or post-retrieval branching for parallel sub-query expansion or per-document generation, with eventual fusion.
  • Looping Pattern: Iterative or recursive retrieval in which LLM outputs refine the query, repeating until a stopping criterion is reached (sketched below).

Each pattern arises as a special subgraph within the broader computational hypergraph, allowing the system to be precisely tailored to complex, multi-stage tasks or to simple, linear retrieval-plus-generation.
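
As an example, the looping pattern (with a scheduling operator's stop rule) reduces to a short control loop; the retrieve/generate/refine helpers below are toy stand-ins, and the confidence heuristic is an assumption:

```python
def looping_rag(query: str, retrieve, generate, refine_query,
                confidence_threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Looping pattern: retrieve and generate repeatedly, refining the
    query from the model's own output until the scheduling rule fires."""
    answer = ""
    for _ in range(max_rounds):
        docs = retrieve(query)
        answer, confidence = generate(query, docs)  # confidence: e.g. mean token prob
        if confidence >= confidence_threshold:      # scheduling operator's stop rule
            break
        query = refine_query(query, answer)         # LLM output refines the query
    return answer

# Toy stand-ins so the sketch runs end to end (confidence grows as the
# refined query gets more specific -- purely illustrative).
def retrieve(q): return [f"doc about {q}"]
def generate(q, docs): return (f"answer to {q}", 0.4 + 0.15 * len(q.split()))
def refine_query(q, a): return q + " (refined)"

print(looping_rag("multi-hop question", retrieve, generate, refine_query))
```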

4. Theoretical Implications and New Paradigms

The modularization supports a number of significant theoretical and practical advances:

  • Unified Framework: All previous RAG systems (from naïve linear to advanced multi-turn, multi-modal) are representable as subcases, yielding a common analytical scaffold.
  • Operator Innovation: New operator classes such as LM-supervised retrievers, RL-based query rewriting, and automated self-reflection modules are anticipated, and the framework is extensible to integrate them with minimal code.
  • Dual Fine-Tuning and Automated Critique: Enables frameworks where retrievers and generators are tuned jointly, and self-verification modules critique system outputs before finalization (supporting robustness and explainability).
  • Graph Abstraction for Debugging and Scaling: System-level failures can be isolated at the module or operator level, facilitating modular debugging, rapid A/B testing, and scaling across hardware/resource boundaries.

5. Applications and Domain Adaptation

The Omni-RAG framework is particularly suitable for scenarios characterized by knowledge heterogeneity, need for transparency, or complex query flows:

  • Knowledge-Intensive QA: Comprehensive coverage of open-domain, multi-hop, extractive, and factual QA scenarios.
  • Multi-Turn Reasoning and Dialogue: Multi-branching, looping, and conditional flows support advanced decision-making, customer service, and assistant systems.
  • Domain-Specific Retrieval: Modular architecture allows rapid integration of legal, medical, technical, or financial corpora/languages, supporting task-specific retrieval and postprocessing.
  • Multimodal and Structured Data Integration: Operators for query transformation (e.g., to SQL or Cypher) and routing support interplay between unstructured text, knowledge graphs, and tabular data (a toy sketch follows this list).
  • Debugging and Transparency: All steps and sources are traceable through clearly delineated module boundaries.
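
To illustrate the structured-data path, a toy version of the SQL-rewriting operator $f_{qc}$ might look as follows; the `sales(region, revenue)` schema and the template matching are invented for the sketch (production systems would use an LLM- or grammar-based text-to-SQL component):

```python
def rewrite_to_sql(question: str) -> str | None:
    """Toy f_qc: map a narrow class of questions onto a hypothetical
    sales(region, revenue) table; returns None if no template matches."""
    q = question.lower()
    if "total revenue" in q and "by region" in q:
        return "SELECT region, SUM(revenue) FROM sales GROUP BY region;"
    return None

def answer(question: str) -> str:
    sql = rewrite_to_sql(question)
    if sql is not None:                             # structured-data branch
        return f"run against the database: {sql}"
    return "fall back to dense text retrieval"      # unstructured branch

print(answer("What is the total revenue by region?"))
print(answer("Why did revenue grow last year?"))
```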

6. Illustrative Formulas and Schematic Representation

Key computational aspects are captured by:

  • Indexing and Embedding (a combined sketch follows this list):

$$\mathcal{I} = \{e_1, e_2, \ldots, e_n\}, \quad e_i = f_e(d_i)$$

  • Retrieval:

$$D^q = R(q, D)$$

  • Routing and Flow Selection:

$$f_r: Q \rightarrow \mathcal{F}$$

$$\text{score}(q, F_i) = a \cdot \text{score}_{key}(q, F_i) + (1 - a) \cdot \text{score}_{semantic}(q, F_i)$$

  • Fusion (Ensembling):

$$p(y \mid q, D^q) = \sum_{d \in D^q} p(y \mid d, q) \cdot \lambda(d, q)$$

$$\lambda(d, q) = \frac{e^{s(d, q)}}{\sum_{d' \in D^q} e^{s(d', q)}}$$

  • Schematic Diagram (description):
    • Layers for Indexing, Pre-retrieval, Retrieval, Post-retrieval, Generation, and Orchestration
    • Orchestration contains arrows for routing (conditional/branching) and feedback loops (looping). Fusion modules aggregate outputs from parallel branches.
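
The indexing and retrieval formulas above combine into a brief end-to-end sketch; the hashed bag-of-words embedding is a deliberately crude stand-in for a learned encoder $f_e$:

```python
import numpy as np

def f_e(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding e_i = f_e(d_i): hashed bag-of-words, L2-normalized."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

docs = ["modular RAG decomposes pipelines",
        "dense retrievers embed documents",
        "orchestration routes queries"]
index = np.stack([f_e(d) for d in docs])  # I = {e_1, ..., e_n}

def R(q: str, k: int = 2) -> list[str]:
    """D^q = R(q, D): top-k documents by cosine similarity to f_e(q)."""
    sims = index @ f_e(q)                 # rows are unit vectors
    return [docs[i] for i in np.argsort(-sims)[:k]]

print(R("how are documents embedded by retrievers?"))
```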

7. Practical Roadmap and Impact

The modular, graph-based design of Omni-RAG fundamentally shifts RAG system construction from rigid pipelines to reconfigurable and extensible architectures. This abstraction:

  • Incorporates legacy and future RAG approaches into a common platform.
  • Simplifies comparative benchmarking and hybridization of component strategies.
  • Supports robust debugging, maintainability, and vertical domain adaptation.
  • Facilitates integration with multi-source, semi-structured, and multimodal data.
  • Enables continuous innovation by lowering the barrier to new module/operator development.

The Omni-RAG paradigm provides both a theoretical foundation and a practical roadmap for the next generation of retrieval-augmented technologies, unifying disparate workflows and accelerating the deployment of highly adaptable, knowledge-intensive AI systems (Gao et al., 26 Jul 2024).

References

  1. Gao, Y., et al. (2024). "Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks." arXiv preprint, 26 July 2024.
