
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models (2505.03075v1)

Published 5 May 2025 in cs.IR

Abstract: Retrieval-augmented generation (RAG) integrates large language models (LLMs) with retrievers to access external knowledge, improving the factuality of LLM generation in knowledge-grounded tasks. To optimize the RAG performance, most previous work independently fine-tunes the retriever to adapt to frozen LLMs or trains the LLMs to use documents retrieved by off-the-shelf retrievers, lacking end-to-end training supervision. Recent work addresses this limitation by jointly training these two components but relies on overly simplifying assumptions of document independence, which has been criticized for being far from real-world scenarios. Thus, effectively optimizing the overall RAG performance remains a critical challenge. We propose a direct retrieval-augmented optimization framework, named DRO, that enables end-to-end training of two key components: (i) a generative knowledge selection model and (ii) an LLM generator. DRO alternates between two phases: (i) document permutation estimation and (ii) re-weighted maximization, progressively improving RAG components through a variational approach. In the estimation step, we treat document permutation as a latent variable and directly estimate its distribution from the selection model by applying an importance sampling strategy. In the maximization step, we calibrate the optimization expectation using importance weights and jointly train the selection model and LLM generator. Our theoretical analysis reveals that DRO is analogous to policy-gradient methods in reinforcement learning. Extensive experiments conducted on five datasets illustrate that DRO outperforms the best baseline with 5%-15% improvements in EM and F1. We also provide in-depth experiments to qualitatively analyze the stability, convergence, and variance of DRO.

Summary

Analyzing Direct Retrieval-Augmented Optimization for Knowledge Selection and LLMs

The paper "Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and LLMs" by Zhengliang Shi et al. introduces a framework termed Direct Retrieval-augmented Optimization (DRO), aiming to enhance the performance of retrieval-augmented generation (RAG) systems. The research addresses the limitations of current RAG models that separate the training of the document retriever and the LLM (LM) generator, often leading to suboptimal performance due to the absence of end-to-end supervision.

Methodological Framework

DRO is distinct in that it enables end-to-end training of both the document selection model and the LLM generator. Its key components are as follows (a minimal interface sketch follows the list):

  • Generative Knowledge Selection Model: This component selects relevant document permutations in a list-wise fashion, optimizing the synergy between selected document sets and LM output.
  • LLM Generator: Utilizes the selected document permutations to produce accurate responses grounded in retrieved external knowledge.
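
To make the division of labor concrete, here is a minimal, self-contained sketch of how the two components might compose at inference time. The function names and the keyword-overlap scoring are illustrative assumptions made for this sketch; the paper's actual selector is a generative list-wise model and its generator is a fine-tuned LLM.

```python
# Minimal interface sketch of DRO's two components at inference time.
# The function names and the keyword-overlap scoring are placeholders,
# not the paper's implementation.

from typing import Callable, List


def select_permutation(query: str, docs: List[str], k: int = 3) -> List[int]:
    """Stand-in for the generative knowledge selection model: returns an
    ordered list (permutation) of document indices."""
    q_terms = set(query.lower().split())
    scores = [len(q_terms & set(d.lower().split())) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


def generate_answer(query: str, ordered_docs: List[str],
                    llm: Callable[[str], str]) -> str:
    """Stand-in for the LLM generator: conditions on the selected permutation."""
    context = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(ordered_docs))
    prompt = (f"Answer using the documents below.\n\n{context}\n\n"
              f"Question: {query}\nAnswer:")
    return llm(prompt)


if __name__ == "__main__":
    docs = ["Paris is the capital of France.",
            "The Eiffel Tower opened in 1889.",
            "France borders Spain and Italy."]
    query = "What is the capital of France?"
    order = select_permutation(query, docs)                     # e.g. [0, 1, 2]
    answer = generate_answer(query, [docs[i] for i in order],
                             llm=lambda prompt: "(LLM output)")  # dummy LLM
    print(order, answer)
```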

DRO employs an alternating optimization process akin to the Expectation-Maximization (EM) framework (a toy training-loop sketch follows the list):

  1. Document Permutation Estimation (E-Step): Treats the document permutation as a latent variable and estimates its distribution directly from the selection model via an importance sampling strategy.
  2. Re-weighted Maximization (M-Step): Uses the sampled permutations, calibrated by importance weights, to jointly optimize the selection model and the LLM generator.
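
To illustrate the alternation, the following is a toy, self-contained sketch of one way the E-step/M-step loop could look. The tabular selector, two-parameter generator, reward definition, and all hyperparameters are assumptions made for the sake of a runnable example; they are not the paper's models or exact objective.

```python
# Toy sketch of DRO-style alternating training: an importance-sampled
# E-step followed by re-weighted M-step updates of selector and generator.
# Everything below is an illustrative assumption, not the paper's code.

import itertools

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Three retrieved documents -> six candidate orderings (the latent permutation z).
PERMS = list(itertools.permutations(range(3)))
GOLD = 0  # assume document 0 contains the answer

# Selector p_phi(z | q): logits over permutations (stand-in for a generative ranker).
sel_logits = torch.zeros(len(PERMS), requires_grad=True)
# Generator p_theta(y | q, z): two parameters [scale, bias] acting on a toy
# "evidence quality" feature of whichever document is ranked first.
gen_params = torch.zeros(2, requires_grad=True)
EVIDENCE = torch.tensor([2.0, 0.2, 0.2])

optimizer = torch.optim.Adam([sel_logits, gen_params], lr=0.05)


def generator_loglik(perm):
    """Toy log p_theta(y | q, z): higher when the gold document leads the list."""
    logit = gen_params[0] * EVIDENCE[perm[0]] + gen_params[1]
    return F.logsigmoid(logit)


for outer in range(50):
    # E-step: sample permutations from a frozen snapshot of the selector.
    proposal = torch.distributions.Categorical(logits=sel_logits.detach().clone())
    z = proposal.sample((16,))          # sampled latent permutations
    log_q = proposal.log_prob(z)        # proposal log-probs, fixed for this E-step

    # M-step: a few re-weighted updates on those samples.
    for inner in range(4):
        current = torch.distributions.Categorical(logits=sel_logits)
        log_p = current.log_prob(z)

        # Self-normalised importance weights p_current(z) / p_proposal(z).
        w = torch.softmax((log_p - log_q).detach(), dim=0)

        rewards = torch.stack([generator_loglik(PERMS[i]) for i in z.tolist()])
        # Selector: policy-gradient-style term with the generator log-lik as reward.
        sel_loss = -(w * rewards.detach() * log_p).sum()
        # Generator: importance-weighted maximum likelihood on the gold answer.
        gen_loss = -(w * rewards).sum()

        optimizer.zero_grad()
        (sel_loss + gen_loss).backward()
        optimizer.step()

best = PERMS[int(torch.argmax(sel_logits))]
print("preferred permutation:", best)   # typically places the gold document first
```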

This dual-component training maximizes an importance-weighted expectation of the log-likelihood over both document selection and generation, drawing a parallel to policy-gradient methods in reinforcement learning.
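
Schematically, and using notation that is assumed here rather than taken from the paper (q: query, D: retrieved documents, z: document permutation, y: target response, phi: selection model, theta: generator), one plausible form of such an importance-weighted surrogate objective is:

```latex
% Schematic surrogate objective; notation is assumed, not the paper's exact formulation.
\mathcal{J}(\phi, \theta)
  \;=\; \mathbb{E}_{z \sim p_{\phi'}(z \mid q, D)}
  \Big[\, w(z)\, \big( \log p_{\phi}(z \mid q, D) + \log p_{\theta}(y \mid q, z) \big) \Big],
\qquad
w(z) \;=\; \frac{p_{\phi}(z \mid q, D)}{p_{\phi'}(z \mid q, D)},
```

where phi' denotes the snapshot of the selection model from which permutations were sampled, so the importance weights w(z) calibrate the expectation as the parameters move during the maximization step.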

Experimental Validation and Numerical Results

Empirical evaluations across five datasets demonstrate significant improvements in exact match (EM) and F1, with DRO outperforming the strongest baselines by 5%–15%. The selection model also becomes markedly more precise at identifying target documents, illustrating the benefit of training selection and generation in a synchronized, end-to-end fashion.

Theoretical Insights

The paper further provides a theoretical analysis of DRO that connects the proposed optimization to reinforcement-learning paradigms. The analogy parallels policy-gradient methods: choosing a document permutation plays the role of the policy's action, and an estimate of how useful the selected documents are to the generator serves as the reward. This view highlights how the two components are iteratively co-optimized, exploiting their interdependence to maximize end-to-end RAG performance.
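
Concretely, the analogy rests on the standard log-derivative (REINFORCE) identity; reading the generator's log-likelihood as the reward r(z) is an interpretation of the summary above, not necessarily the paper's exact reward definition:

```latex
% Log-derivative (policy-gradient) identity underlying the analogy.
\nabla_{\phi}\, \mathbb{E}_{z \sim p_{\phi}(z \mid q, D)}\!\big[ r(z) \big]
  \;=\; \mathbb{E}_{z \sim p_{\phi}(z \mid q, D)}
  \big[\, r(z)\, \nabla_{\phi} \log p_{\phi}(z \mid q, D) \,\big],
\qquad
r(z) \;=\; \log p_{\theta}(y \mid q, z).
```

In practice the expectation is approximated with sampled permutations, which is where the importance weights from the estimation step enter.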

Implications and Future Prospects

The DRO framework advances both the theoretical and the applied side of retrieval-augmented language modeling. By replacing the traditionally separate fine-tuning of retrieval and generation components with joint end-to-end training, DRO offers a robust path to exploiting cross-component dependencies and improving overall performance. Future work could extend DRO to multi-modal and cross-lingual RAG applications, exploring broader LM capabilities and generalizability across diverse domains.

In conclusion, this research pushes the boundaries of RAG capabilities by leveraging a novel optimization paradigm that intricately links retrieval and generation tasks. The findings open compelling directions for further exploration into deeply integrated, retrieval-augmented language processing systems.
