Agent-UniRAG: Modular Agent-RAG System
- Agent-UniRAG is a modular, agent-based retrieval-augmented generation framework that combines planning, hybrid retrieval, and iterative reasoning.
- It employs specialized agents like PlanningAgent and VectorSearchAgent to decompose queries and select relevant evidence.
- Reported results show higher accuracy, shorter turnaround times, and lower costs across applications in QA, software testing, and decision support.
Agent-UniRAG is a family of modular, agent-based retrieval-augmented generation (RAG) frameworks that unify autonomous agent orchestration, advanced hybrid retrieval, and step-wise reasoning to enable interpretable, end-to-end LLM-powered systems for complex information-seeking tasks. By structuring long-context reasoning as a sequence of agent “thoughts,” actions, and contextual retrievals, Agent-UniRAG achieves substantially higher effectiveness, efficiency, and auditability than traditional RAG pipelines across diverse domains, including software testing automation, open-domain QA, and high-stakes decision support in education and enterprise (Hariharan et al., 12 Oct 2025, Pham et al., 28 May 2025, Nguyen-Duc et al., 15 Jul 2025).
1. Conceptual Foundation
Agent-UniRAG generalizes RAG by embedding a loop of autonomous agent modules, each capable of: (i) planning (deciding when and how to retrieve), (ii) hybrid retrieval (combining dense and sparse, or graph-based and vector-based, retrieval signals), (iii) explicit evidence selection or filtering, and (iv) iterative reasoning using working memory to accumulate and reflect on retrieved evidence. The LLM serves as the controller, emitting explicit “Thought,” “Action: Search,” and “Action: FinalAnswer” steps.
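The controller loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bracketed step syntax parsed by `parse_step`, the step budget `max_steps`, and the toy stand-ins `scripted_llm` and `toy_search` are all assumptions made so the example runs without a real LLM or retriever.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical step format: the controller emits "Thought: ..." followed by
# either "Action: Search[query]" or "Action: FinalAnswer[answer]".
STEP_RE = re.compile(r"Action:\s*(Search|FinalAnswer)\[(.*)\]", re.DOTALL)


def parse_step(text: str) -> Tuple[str, str]:
    """Extract (action, argument) from one controller output."""
    match = STEP_RE.search(text)
    if match is None:
        raise ValueError(f"no action found in: {text!r}")
    return match.group(1), match.group(2).strip()


def run_agent(question: str,
              llm: Callable[[str, List[str]], str],
              search: Callable[[str], List[str]],
              max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate controller steps and retrievals until a final answer is emitted."""
    memory: List[str] = []  # working memory of accumulated evidence
    for _ in range(max_steps):
        action, arg = parse_step(llm(question, memory))
        if action == "FinalAnswer":
            return arg, memory
        memory.extend(search(arg))  # "Action: Search" -> retrieve and remember
    return "", memory  # step budget exhausted without a final answer


# Toy stand-ins (assumptions) so the sketch runs end to end.
def scripted_llm(question: str, memory: List[str]) -> str:
    if not memory:
        return "Thought: I need evidence.\nAction: Search[capital of France]"
    return f"Thought: The evidence suffices.\nAction: FinalAnswer[{memory[-1]}]"


def toy_search(query: str) -> List[str]:
    return ["Paris"] if "France" in query else []


answer, evidence = run_agent("What is the capital of France?", scripted_llm, toy_search)
```

Here the scripted controller searches once and then answers, which mirrors the single-hop case; a multi-hop query would simply take more passes through the same loop.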
Formally, one variant defines Agent-UniRAG by the tuple:
$$\text{Agent-UniRAG} = (Q, D, G, E, A, S_v, S_g, F)$$
where:
- $Q$: Query set (requirements, questions)
- $D$: Document corpus
- $G$: Knowledge graph (nodes/entities, edges/dependencies)
- $E$: Embedding function (maps text to $\mathbb{R}^d$)
- $A$: Set of autonomous agents (e.g., Planning, VectorSearch, GraphSearch, etc.)
- $S_v$: Vector similarity (cosine, thresholded)
- $S_g$: Graph similarity (weighted path aggregation)
- $F$: Fusion function for hybrid retrieval
This framework supports both single-hop (direct QA) and multi-hop (stepwise reasoning) retrieval, leveraging explicit decision points to enable traceability and model introspection (Hariharan et al., 12 Oct 2025, Pham et al., 28 May 2025).
2. Multi-Agent Orchestration and Workflow
Agent-UniRAG architectures instantiate a director agent (Coordinator) responsible for orchestrating workflows among a suite of specialized agents. These typically include:
- PlanningAgent: Scopes queries, decomposes requirements or objectives, and prioritizes sub-tasks.
- VectorSearchAgent: Executes dense semantic retrieval within a vector database, using thresholded cosine similarity over text embeddings.
- GraphSearchAgent: Traverses a knowledge graph using algorithms such as BFS/DFS/PageRank, retrieving nodes/edges relevant to the query via weighted multi-path aggregation.
- ContextAssemblyAgent: Merges, de-duplicates, and conflict-resolves context passages from multiple retrieval sources using 15+ rule-based strategies.
- GenerationOrchestrator: Manages prompt layering, routes context to LLMs, consolidates partial generations, and emits structured outputs (e.g., test cases).
- ValidationAgent: Performs multi-stage validation (syntax, semantic, logic, compliance) and ensures traceability.
- TraceabilityAgent: Constructs bidirectional traceability matrices and impact analyses.
The orchestrated loop dynamically adapts to input complexity: simple queries may cycle once; complex, compound queries trigger iterative search/evidence/planning steps.
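One way to picture this orchestration is a coordinator that repeats a plan, retrieve, assemble, generate, validate cycle until validation passes. The sketch below is hypothetical: the agent roles come from the list above, but the `WorkItem` structure, the dictionary-of-callables interface, the `max_passes` limit, and the toy agents are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorkItem:
    query: str
    sub_tasks: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)
    output: str = ""
    passes: int = 0


class Coordinator:
    """Runs plan -> retrieve -> assemble -> generate -> validate, repeating on failure."""

    def __init__(self, agents: Dict[str, Callable], max_passes: int = 3):
        self.agents = agents
        self.max_passes = max_passes

    def run(self, query: str) -> WorkItem:
        item = WorkItem(query=query)
        item.sub_tasks = self.agents["planning"](query)
        while item.passes < self.max_passes:
            item.passes += 1
            hits: List[str] = []
            for task in item.sub_tasks:
                hits += self.agents["vector_search"](task)
                hits += self.agents["graph_search"](task)
            # Merge new hits with context kept from earlier passes.
            item.context = self.agents["context_assembly"](item.context + hits)
            item.output = self.agents["generation"](query, item.context)
            if self.agents["validation"](item.output, item.context):
                break  # simple inputs stop after one pass
        return item


# Toy agents (assumptions) standing in for the real services.
agents = {
    "planning": lambda q: [part.strip() for part in q.split(" and ")],
    "vector_search": lambda task: [f"vec:{task}"],
    "graph_search": lambda task: [f"graph:{task}"],
    "context_assembly": lambda passages: list(dict.fromkeys(passages)),  # ordered de-dup
    "generation": lambda q, ctx: f"{len(ctx)} passages used",
    "validation": lambda out, ctx: len(ctx) >= 2,
}
result = Coordinator(agents).run("login flow and password reset")
```

With these toy agents the query splits into two sub-tasks, each retriever contributes one passage per sub-task, and validation succeeds on the first pass.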
Pseudocode for this loop is given in (Hariharan et al., 12 Oct 2025).
3. Hybrid Retrieval and Evidence Processing
A defining feature of Agent-UniRAG deployments is the hybridization of retrieval signals:
- Vector Retrieval: Dense embeddings (e.g., Sentence-Transformer, MPNet, E5) yield candidate passages via cosine similarity, subject to a minimum-similarity threshold (stricter for precision-critical settings).
- Graph or Sparse Retrieval: Knowledge graph traversal or sparse keyword matching (BM25, ElasticSearch) supplements dense retrieval, capturing structural or symbolic context missed by vectors.
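- Score Fusion: Hybrid scoring combines vector and graph/sparse signals:
$$\text{score}(q, d) = \alpha \, S_v(q, d) + (1 - \alpha) \, S_g(q, d)$$
with $\alpha \in [0, 1]$ controlling modality weighting, where $S_v$ and $S_g$ denote the vector and graph (or sparse) similarity scores.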
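A weighted combination of the two signals might look like the sketch below. The convex form with weight `alpha`, the default values of `alpha` and `vec_threshold`, and the toy two-dimensional embeddings are assumptions chosen for illustration; the sources do not fix these values here.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse(vec_scores: Dict[str, float], graph_scores: Dict[str, float],
         alpha: float = 0.5, vec_threshold: float = 0.3) -> Dict[str, float]:
    """Combine per-document scores as alpha * vector + (1 - alpha) * graph."""
    # Vector scores below the threshold are discarded before fusion.
    kept = {doc: s for doc, s in vec_scores.items() if s >= vec_threshold}
    fused = {}
    for doc in set(kept) | set(graph_scores):
        # A document missing from one modality contributes 0 for that modality.
        fused[doc] = alpha * kept.get(doc, 0.0) + (1 - alpha) * graph_scores.get(doc, 0.0)
    return fused


query_vec = [1.0, 0.0]
doc_vecs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
vec_scores = {doc: cosine(query_vec, vec) for doc, vec in doc_vecs.items()}
graph_scores = {"b": 1.0, "c": 0.5}  # hypothetical graph-traversal scores

fused = fuse(vec_scores, graph_scores)
ranking = sorted(fused, key=fused.get, reverse=True)
```

In this toy case document `b` ranks first because it scores well on both signals, while `c` survives only through its graph score.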
As a general pattern, the initial candidate sets from the vector and graph/sparse retrievers are merged and re-ranked, often with LLM-based cross-encoding (“verifier” prompts) that filters the top-n final passages for answer generation (Nguyen-Duc et al., 15 Jul 2025). This approach targets both recall and precision, reducing hallucinations and supporting strict factual grounding.
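The merge, re-rank, and filter pattern can be sketched as below. The `verifier` callable stands in for an LLM "verifier" prompt and is replaced here by a keyword check; keeping the maximum score per passage is a simplification that assumes the two retrievers' scores are comparable. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]  # (passage text, retrieval score)


def merge_and_filter(vector_hits: List[Candidate],
                     sparse_hits: List[Candidate],
                     verifier: Callable[[str, str], bool],
                     query: str,
                     top_n: int = 3) -> List[str]:
    """Merge two candidate lists, rank by best score, keep verified top-n."""
    best: Dict[str, float] = {}
    for text, score in vector_hits + sparse_hits:
        # De-duplicate by passage text, keeping the highest score seen.
        best[text] = max(score, best.get(text, float("-inf")))
    ranked = sorted(best, key=lambda t: best[t], reverse=True)
    verified = [t for t in ranked if verifier(query, t)]
    return verified[:top_n]


vector_hits = [("Tuition is due in August.", 0.82), ("The campus has a lake.", 0.40)]
sparse_hits = [("Tuition is due in August.", 0.55), ("Tuition waivers require form B.", 0.70)]

# Stand-in verifier (assumption): a real system would prompt an LLM here.
keyword_verifier = lambda query, passage: "tuition" in passage.lower()

passages = merge_and_filter(vector_hits, sparse_hits, keyword_verifier,
                            "When is tuition due?")
```

The duplicate passage is kept once, and the off-topic passage is removed by the verifier rather than by its score.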
4. LLM Integration and Prompt Engineering
Agent-UniRAG designs typically deploy both open and closed LLMs, selected dynamically per sub-task based on task complexity and context length:
- For lower complexity and shorter context, compact models such as Mistral-7B or Llama-3-8B are used.
- For more complex or longer context, larger models such as Gemini Pro or GPT-4o are invoked.
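A routing rule of this kind could be as simple as the sketch below. The cutoff values, the 0-to-1 complexity scale, and the lowercase model identifiers are hypothetical; the sources name the model families but the thresholds are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def select_model(complexity: float, context_tokens: int,
                 complexity_cutoff: float = 0.5,
                 token_cutoff: int = 8000) -> Route:
    """Pick a compact model only when both complexity and context are small."""
    if complexity <= complexity_cutoff and context_tokens <= token_cutoff:
        return Route("llama-3-8b", "low complexity, short context")
    return Route("gpt-4o", "high complexity or long context")


simple = select_model(complexity=0.2, context_tokens=1500)
long_context = select_model(complexity=0.2, context_tokens=30000)
```

Requiring both conditions for the compact model is a conservative design choice: either a hard sub-task or a long context is enough to escalate.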
Prompt engineering follows a multi-layered design:
- Context Layer: Injects retrieved passages and relevant graph facts.
- Specification Layer: Supplies granular requirement snippets, criteria, or calculation rules.
- Template Layer: Provides structured output schemas (e.g., test-case JSON, tabular facts).
- Validation Layer: Imposes output constraints (e.g., trace IDs, citations).
- Enhancement Layer: Adds historical patterns, best practices, or persona framing.
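The layering can be illustrated with a small prompt builder. The layer names follow the list above; the section markers, the fixed ordering in `LAYER_ORDER`, and the example content are assumptions rather than the prompt format used in the cited systems.

```python
from typing import Dict, List

# Assumed ordering: the same order in which the layers are listed in the text.
LAYER_ORDER = ["context", "specification", "template", "validation", "enhancement"]


def build_prompt(layers: Dict[str, List[str]], task: str) -> str:
    """Concatenate non-empty layers in a fixed order, then append the task."""
    sections = []
    for name in LAYER_ORDER:
        lines = layers.get(name, [])
        if lines:  # skip layers with no content
            sections.append(f"## {name.upper()}\n" + "\n".join(lines))
    sections.append(f"## TASK\n{task}")
    return "\n\n".join(sections)


prompt = build_prompt(
    {
        "context": ["Passage 1: Orders above 10,000 EUR need approval."],
        "specification": ["REQ-17: Approval must be logged."],
        "template": ['Return JSON: {"id": str, "steps": [str], "expected": str}'],
        "validation": ["Every test case must cite a requirement ID."],
    },
    task="Generate one test case for REQ-17.",
)
```

Because the enhancement layer is absent from the input, it is omitted from the prompt instead of appearing as an empty section.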
Context and prompt payloads are coordinated between agents via JSON over REST/gRPC, supporting streaming, chunked transmission, and efficient chunk selection to avoid LLM context truncation.
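Chunk selection under a context budget, followed by JSON serialization for transport between agents, might be sketched as follows. The greedy strategy, the whitespace-based token estimate, and the payload field names are assumptions; a production system would use the target model's tokenizer and its own message schema.

```python
import json
from typing import List, Tuple


def select_chunks(chunks: List[Tuple[str, float]], budget_tokens: int) -> List[str]:
    """Greedily keep the highest-scoring chunks that still fit the budget."""
    selected, used = [], 0
    for text, _score in sorted(chunks, key=lambda c: c[1], reverse=True):
        cost = len(text.split())  # crude token estimate (assumption)
        if used + cost <= budget_tokens:
            selected.append(text)
            used += cost
    return selected


chunks = [
    ("alpha beta gamma", 0.9),        # 3 estimated tokens
    ("delta epsilon zeta eta", 0.8),  # 4 estimated tokens
    ("theta iota", 0.7),              # 2 estimated tokens
]

# Hypothetical inter-agent message; field names are illustrative only.
payload = json.dumps({
    "sender": "ContextAssemblyAgent",
    "receiver": "GenerationOrchestrator",
    "chunks": select_chunks(chunks, budget_tokens=6),
})
```

With a budget of six estimated tokens, the second chunk is skipped because it would overflow, and the smaller third chunk is taken instead.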
5. Empirical Results and Comparative Performance
Reported results for Agent-UniRAG variants are competitive or state-of-the-art across domains:
| Evaluation | Accuracy (%) | Hallucination (%) | Time/Cost | Scope | Reference |
|---|---|---|---|---|---|
| Agentic RAG (QE) | 94.8 | – | 85% time reduction | 25,000 SAP test cases | (Hariharan et al., 12 Oct 2025) |
| MARAUS (Admissions) | 92 | 1.45 | ~$0.002 / query | 6,079 real interactions | (Nguyen-Duc et al., 15 Jul 2025) |
| Agent-UniRAG (OpenQA) | ≤72.5 (F1) | – | Efficient on 8B LLM | QA benchmarks, 2-step avg. | (Pham et al., 28 May 2025) |
Ablation studies consistently show that multi-agent orchestration, hybrid retrieval, and enhanced contextualization are critical for accuracy. As detailed in (Hariharan et al., 12 Oct 2025), removing contextualization reduced accuracy by 18.2%, removing hybrid retrieval by 15.7%, and removing multi-agent orchestration by 12.3%.
Efficiency and cost gains in enterprise-scale deployments include an 85% reduction in engineering artifact creation time, 35% overall cost savings, 2–16 month acceleration in project go-live, and nearly complete (98.7%) test-case coverage (Hariharan et al., 12 Oct 2025).
6. Synthetic Data and End-to-End Training
One variant introduces synthetic agent data (SynAgent-RAG) through LLM distillation. The dataset, designed for open-source LLMs, contains explicit chains-of-thought, planning steps, search actions, and evidence extractions for both single-hop and multi-hop queries. This enables instruction fine-tuning of the full agent reasoning and retrieval loop even for models with limited parameter count (e.g., Llama-3-8B) (Pham et al., 28 May 2025).
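The end-to-end learning objective is selective cross-entropy:
$$\mathcal{L}(\theta) = -\sum_{i=1}^{N} \log p_\theta\left(r_i \mid c_i\right)$$
where $r_i$ are agent responses in a multi-turn conversational format and $c_i$ is the conversation preceding the $i$-th response, so that the loss is computed only on agent-response tokens.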
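The idea of a selective (masked) cross-entropy can be illustrated with a few lines of plain Python. This is a sketch under assumptions: it takes the model's probability for each reference token as given, uses a boolean mask to mark agent-response tokens, and averages over the selected tokens (a sum would be an equally valid convention).

```python
import math
from typing import Sequence


def selective_cross_entropy(token_probs: Sequence[float],
                            agent_mask: Sequence[bool]) -> float:
    """Mean negative log-likelihood over positions where agent_mask is True.

    token_probs[t] is the model's probability of the reference token at
    position t; user turns and retrieved passages are masked out.
    """
    losses = [-math.log(p) for p, keep in zip(token_probs, agent_mask) if keep]
    if not losses:
        raise ValueError("mask selects no tokens")
    return sum(losses) / len(losses)


# Four tokens: user text, two agent-response tokens, then a retrieved passage.
probs = [0.1, 0.5, 0.25, 0.9]
mask = [False, True, True, False]
loss = selective_cross_entropy(probs, mask)  # (ln 2 + ln 4) / 2
```

Only the two agent tokens contribute, so the poorly predicted user token (probability 0.1) does not affect the loss.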
Empirical results on SQuAD, TriviaQA, MuSiQue, and HotpotQA show that Agent-UniRAG adapts its number of reasoning steps to the query, issues few retrieval calls (mean ∼2), and matches or outperforms systems built on larger GPT-4/GPT-3.5 models while using far smaller backbone models.
7. Application-Specific Adaptations and Best Practices
Agent-UniRAG has been adapted to several practical deployments:
- In software quality engineering, domain-specific adapters interface with SAP migration artifacts and legacy processes (Hariharan et al., 12 Oct 2025).
- For real-world admissions counseling (MARAUS), agent pipelines modularize factual lookup, numeric calculation, eligibility recommendations, and hybrid context synthesis for ambiguous queries (Nguyen-Duc et al., 15 Jul 2025). The precision of factual grounding is enforced via LLM re-ranking, strict citation requirements, and cross-lingual embedding strategies.
- Operational best practices include aggressive boilerplate filtering, near-duplicate reduction, persona-consistent prompt design, and agent microservice containerization for horizontal scaling and resilience.
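Near-duplicate reduction, one of the practices listed above, can be sketched with word shingles and Jaccard similarity. This is one common technique chosen for illustration, not necessarily what the cited systems use; the shingle size `k` and the similarity `threshold` are assumed values.

```python
from typing import List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Set of k-word windows over the lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_near_duplicates(passages: List[str], threshold: float = 0.6) -> List[str]:
    """Keep a passage only if it is dissimilar to every passage kept so far."""
    kept: List[str] = []
    kept_shingles: List[Set[Tuple[str, ...]]] = []
    for passage in passages:
        current = shingles(passage)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(passage)
            kept_shingles.append(current)
    return kept


passages = [
    "Applications close on 30 June for all programs",
    "Applications close on 30 June for all programs.",  # differs only by a period
    "Scholarships require a separate form",
]
unique = drop_near_duplicates(passages)
```

The second passage shares five of seven distinct shingles with the first (Jaccard about 0.71), so it is dropped, while the unrelated third passage is kept.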
8. Limitations and Future Directions
Agent-UniRAG and its derivatives do not yet generalize beyond RAG-style QA or structured artifact generation. Open challenges include supporting function calling, generalized planning beyond retrieval, and open-ended creative or code-generation tasks. Because each query can require multiple LLM calls, inference latency is also a concern. Ongoing research explores extensions to non-RAG tasks, optimized sub-chain caching, adaptive retrieval cost policies, and multi-modal reasoning (Pham et al., 28 May 2025).
References
- "Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration" (Hariharan et al., 12 Oct 2025)
- "Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems" (Pham et al., 28 May 2025)
- "An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling" (Nguyen-Duc et al., 15 Jul 2025)