- The paper presents the Experience-RAG Skill that dynamically selects retrieval strategies, achieving superior nDCG@10 scores over fixed pipelines.
- The methodology uses rule-based scene analysis and an experience memory to route queries to optimal retrievers like dense and hybrid RRF models.
- Empirical evaluations on three BEIR datasets report Recall@10 of 0.9428 and MRR@10 of 0.9006, with nDCG@10 ahead of every fixed single-retriever baseline.
Agent-Oriented Pluggable Experience-RAG Skill for Retrieval Strategy Orchestration
Retrieval-augmented generation (RAG) frameworks have become integral to knowledge-intensive LLM applications because they ground responses in retrieved evidence. Most RAG architectures default to a single, fixed retrieval pipeline across all tasks, which ignores task heterogeneity. Factoid question answering, multi-hop reasoning, and scientific claim verification each have distinct retrieval requirements. Dense retrieval tends to suit direct factual queries, while hybrid strategies based on reciprocal rank fusion (RRF) tend to suit complex multi-hop or evidence-based verification tasks.
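To make the hybrid strategy concrete, the following is a minimal sketch of reciprocal rank fusion in its standard form, where each document scores the sum of 1 / (k + rank) over the ranked lists that contain it. The choice of fusing a BM25 list with a dense list, the constant k = 60, and the document IDs are illustrative assumptions; the summary does not state how the paper configures hybrid_RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives sum(1 / (k + rank)) over the lists in which
    it appears, with ranks starting at 1. k=60 is a common default and
    an assumption here, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of two retrievers for the same query.
bm25_ranking = ["d3", "d1", "d2"]
dense_ranking = ["d1", "d4", "d3"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents ranked well by both retrievers (here d1 and d3) rise above documents that only one retriever returns, which is the property that makes fusion attractive when no single retriever covers all the needed evidence.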
The paper introduces the Experience-RAG Skill, which recasts retrieval strategy selection as a pluggable agent skill rather than a component hard-coded into the upper-level workflow. This framing aligns with recent agent architectures (e.g., ReAct, Toolformer, HuggingGPT) but targets retrieval orchestration specifically. The Experience-RAG Skill sits between the agent and a pool of candidate retrieval models, enabling task-aware, experience-driven routing and evidence packaging.
Experience-RAG Skill Architecture
The Experience-RAG Skill comprises six modules:
- Skill Interface: Provides a unified interaction endpoint for the agent.
- Scene Analyzer: Constructs task representations from query, context, and metadata.
- Experience Memory: Maintains comprehensive records of prior scene features, retriever score vectors, optimal strategy labels, and margin statistics. This facilitates both rule-based and learned routing paradigms.
- Strategy Router: Selects the most suitable retriever based on scene analysis and historical experience.
- Retriever Pool: Encapsulates multiple retriever implementations, including BM25, dense, hybrid, and contemporary RAG strategies.
- Result Packager: Structures retrieved evidence for agent consumption.
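As a rough illustration of what one Experience Memory entry might hold, the sketch below derives the strategy label and a margin from a retriever score vector. The field names, the definition of margin as best score minus runner-up score, and the example scores are all assumptions made for illustration; the summary lists the stored quantities but not their schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperienceRecord:
    """One hypothetical memory entry; field names are illustrative."""
    scene_features: dict    # e.g. task type, query length
    retriever_scores: dict  # retriever name -> score on this scene
    best_strategy: str      # label of the top-scoring retriever
    margin: float           # assumed: best score minus runner-up score


def make_record(scene_features, retriever_scores):
    """Build a record, deriving the label and margin from the scores."""
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(retriever_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    best_name, best_score = ranked[0]
    # With a single retriever there is no runner-up, so the margin is 0.
    margin = best_score - ranked[1][1] if len(ranked) > 1 else 0.0
    return ExperienceRecord(
        dict(scene_features), dict(retriever_scores), best_name, margin
    )


# Made-up scores for a single scene, not results from the paper.
record = make_record(
    {"task_type": "direct"},
    {"bm25": 0.5, "dense": 0.75, "hybrid_rrf": 0.625},
)
print(record.best_strategy, record.margin)
```

A small margin signals that the choice of retriever barely mattered for that scene, which is one plausible reason to store it alongside the label.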
Routing is primarily rule-based in the current iteration, mapping direct tasks to dense retrieval and complex/multi-hop/scientific tasks to hybrid RRF. The experience memory serves as a repository for scene-strategy-performance mappings, enabling explicit, inspectable routing.
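The rule-based routing described above can be sketched as a lookup table followed by retrieval and packaging. The task-type labels, the fallback to hybrid RRF for unrecognized task types, the stub retrievers, and the shape of the packaged result are hypothetical; only the direct-to-dense and complex-to-hybrid mapping comes from the text.

```python
# Task type -> strategy, following the mapping stated in the text.
ROUTING_RULES = {
    "direct": "dense",
    "complex": "hybrid_rrf",
    "multi_hop": "hybrid_rrf",
    "scientific": "hybrid_rrf",
}


def route(task_type, default="hybrid_rrf"):
    """Pick a strategy name; the default for unknown types is an assumption."""
    return ROUTING_RULES.get(task_type, default)


def run_skill(query, task_type, retriever_pool, top_k=10):
    """Route the query, retrieve, and package evidence for the agent."""
    strategy = route(task_type)
    documents = retriever_pool[strategy](query)[:top_k]
    # The routing decision is returned with the evidence so it stays inspectable.
    return {"query": query, "strategy": strategy, "evidence": documents}


# Stub retrievers standing in for real dense and hybrid implementations.
retriever_pool = {
    "dense": lambda query: ["dense_doc_1", "dense_doc_2"],
    "hybrid_rrf": lambda query: ["hybrid_doc_1", "hybrid_doc_2"],
}

direct = run_skill("Who wrote Hamlet?", "direct", retriever_pool)
multi_hop = run_skill(
    "Which film by the director of Alien won Best Picture?",
    "multi_hop",
    retriever_pool,
)
print(direct["strategy"], multi_hop["strategy"])
```

Because the rules live in a plain table, adding a task type or changing a mapping is a one-line edit, and every response records which strategy produced it.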
Experience-RAG Skill differs from Adaptive-RAG-style approaches in where it draws the orchestration boundary. Adaptive-RAG methods select a strategy from a predefined set based on query complexity, whereas Experience-RAG Skill places orchestration at the agent level as a skill, emphasizing reusable experience, a unified interface, and explicit evidence packaging.
Experimental Evaluation
Retrieval Benchmarks and Metrics
Evaluation covers three datasets (BeIR/nq, BeIR/hotpotqa, and BeIR/scifact) that represent diverse retrieval scenarios. Each dataset contributes 120 queries, evaluated against sampled candidate corpora. Comparative baselines include fixed retrieval pipelines (BM25, rewrite_BM25, dense, hybrid_RRF), Experience-RAG Skill, and contemporary RAG baselines (HyDE, RAPTOR, LongRAG, Adaptive-RAG).
Metrics reported are Recall@10, MRR@10, and nDCG@10, with a focus on nDCG@10 as the primary evaluation criterion.
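For reference, the three metrics can be computed per query as in the sketch below, using their standard definitions with a cutoff of 10. The linear-gain form of nDCG and the toy relevance judgments are assumptions; the summary does not specify the paper's evaluation tooling.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def mrr_at_k(ranked, relevant, k=10):
    """Reciprocal rank of the first relevant document in the top k."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, gains, k=10):
    """nDCG with linear gain: DCG of the ranking over DCG of the ideal one."""
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(ideal_gains, start=1)
    )
    return dcg / idcg if idcg > 0 else 0.0


# Toy query: two relevant documents, one retrieved at rank 2.
ranked = ["d2", "d1", "d5"]
gains = {"d1": 1, "d3": 1}
relevant = {doc_id for doc_id, gain in gains.items() if gain > 0}

recall = recall_at_k(ranked, relevant)
mrr = mrr_at_k(ranked, relevant)
ndcg = ndcg_at_k(ranked, gains)
print(recall, mrr, round(ndcg, 4))
```

Dataset-level figures such as those reported in the results are then the mean of these per-query values.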
Main Results
Experience-RAG Skill achieves nDCG@10 of 0.8924, Recall@10 of 0.9428, and MRR@10 of 0.9006. Its nDCG@10 surpasses all fixed single-retriever baselines: hybrid_RRF (0.8802), dense (0.8627), BM25 (0.8426), and rewrite_BM25 (0.8412). The advantage comes from orchestration on the mixed-task workload: the skill matches the strongest fixed strategy on each individual dataset and, because that strongest strategy differs across datasets, exceeds every fixed strategy in aggregate.
The Adaptive-RAG-style baseline achieves a marginally higher nDCG@10 (0.8934 versus 0.8924), a gap of only 0.0010. The other modern baselines (HyDE, RAPTOR, LongRAG) underperform Experience-RAG Skill in the sampled-corpus setting.
Ablation and Routing Variants
Ablation studies indicate that task-aware orchestration, rather than reliance on a single strong retriever, accounts for the performance gains. Fixing the strategy to hybrid_RRF or dense lowers nDCG@10, which underlines the value of dynamic, task-sensitive routing. The learned routing variants (hard classification and score regression) do not outperform the rule-based router at the current scale of experience memory, suggesting that further work is needed for robust learning-based skill orchestration.
Agent Workflow Analysis
Preliminary qualitative inspection of agent workflows for representative tasks indicates that Experience-RAG Skill enables explicit, inspectable routing decisions. In contrast, fixed-retriever agent configurations lack dedicated mechanisms for retrieval strategy selection and adaptation.
Implications and Future Prospects
The theoretical implication is that retrieval orchestration is best conceptualized as a reusable agent capability rather than an ad-hoc workflow element. Practically, this design enhances modularity, inspectability, and extensibility for LLM-powered agents operating across heterogeneous tasks. The competitive retrieval performance against modern adaptive methods underscores its viability.
Limitations are acknowledged: experiments utilize sampled corpora, learned routing has not surpassed rule-based baselines, and dynamic onboarding of new retrievers remains unaddressed. Full end-to-end interactive agent benchmarks are not yet realized.
Future developments will likely target:
- Dynamic candidate retriever onboarding and skill registry management
- Capability-aware routing leveraging richer experience memory
- End-to-end agent evaluation encompassing workflow optimization, tool selection, and retrieval adaptation
- Large-scale reinforcement learning for learned routing strategies
Conclusion
Experience-RAG Skill establishes retrieval strategy selection as a modular, experience-driven agent skill, differentiating itself from conventional and adaptive RAG methods by its unified orchestration interface and structured evidence packaging. Empirical evaluations demonstrate practical benefits for heterogeneous retrieval workloads, positioning the skill as a competitive alternative within agent-centric systems. Broader adoption will depend on advances in learned routing, open-world retriever integration, and comprehensive agent-level benchmarks.
(2605.03989)