Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer (2212.02027v1)

Published 5 Dec 2022 in cs.CL and cs.LG

Abstract: Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings, making our model a simple and adaptable solution for knowledge-intensive tasks. Code and models are available at https://github.com/jzbjyb/ReAtt.

Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer

The paper "Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer" explores a transformative approach towards modeling systems for knowledge-intensive tasks such as open-domain question answering (QA). Traditionally, these systems operate in a bifurcated manner: an initial retrieval module efficiently extracts relevant documents, which are then parsed and interpreted by a reading module to generate answers. This bifurcation into separate retrievers and readers often results in cumbersome implementations and poses challenges for seamless end-to-end training.

This work revisits the two-stage architecture and proposes Retrieval as Attention (ReAtt), which integrates both retrieval and reading within a single Transformer. ReAtt performs retrieval directly through the model's attention mechanism: the attention that query tokens pay to document tokens doubles as the retrieval signal, so retrieval fits naturally into the Transformer's existing computation. This unified design removes the need for retrieval-specific warm-up procedures and relevance annotations, simplifying both training and adaptation to new domains.
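To make the idea concrete, here is a minimal, hypothetical sketch (in PyTorch, not the authors' code) of scoring candidate documents with attention: given attention probabilities from query tokens to each document's tokens, the attention mass on a document is aggregated into a single relevance score. The tensor shapes and the aggregation scheme (sum over document tokens, mean over query tokens) are assumptions made for illustration.

```python
import torch

def relevance_from_attention(attn_probs, doc_token_mask):
    """Schematic 'retrieval as attention' scoring (illustrative, not the paper's code).

    attn_probs:     [n_docs, n_query_tokens, n_doc_tokens] attention probabilities
                    from query tokens to each candidate document's tokens.
    doc_token_mask: [n_docs, n_doc_tokens], 1.0 for real tokens, 0.0 for padding.
    Returns one scalar relevance score per document.
    """
    # Attention mass each query token places on the document's (non-padding) tokens.
    per_query_mass = (attn_probs * doc_token_mask[:, None, :]).sum(dim=-1)
    # Aggregate over query tokens to obtain a single score per document.
    return per_query_mass.mean(dim=-1)  # [n_docs]

# Toy usage with random tensors standing in for encoder attention outputs.
attn = torch.rand(8, 16, 128).softmax(dim=-1)   # 8 candidate documents
scores = relevance_from_attention(attn, torch.ones(8, 128))
top_docs = scores.topk(k=3).indices             # indices of the highest-scoring documents
```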

Core Methodology and Findings

The authors build ReAtt on the T5 encoder-decoder architecture. The first several encoder layers embed the query and documents independently, akin to a bi-encoder, while subsequent layers apply cross-attention between query and document tokens, and these attention scores serve as document relevance signals. A key element of the training recipe is that the attention-based retrieval scores are refined over documents sampled for each query using a KL-divergence loss toward target attention distributions derived from decoder-to-encoder interactions.
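As a rough sketch of this training signal (again hypothetical, with assumed shapes and names), the loss below pushes the softmax over the attention-derived relevance scores of the sampled documents toward a target distribution, which in the paper is derived from decoder-to-encoder attention:

```python
import torch
import torch.nn.functional as F

def retrieval_kl_loss(retrieval_scores, target_scores, temperature=1.0):
    """KL divergence from the attention-based retrieval distribution to a target
    distribution over the documents sampled for one query (illustrative sketch).

    retrieval_scores: [n_docs] attention-derived relevance scores (see above).
    target_scores:    [n_docs] target relevance, e.g. distilled from
                      decoder-to-encoder cross-attention.
    """
    log_p = F.log_softmax(retrieval_scores / temperature, dim=-1)
    q = F.softmax(target_scores / temperature, dim=-1)
    # Summing over documents yields the full KL for this single query.
    return F.kl_div(log_p, q, reduction="sum")

# Toy usage: 8 sampled documents for one query; this auxiliary signal would be
# optimized together with the end-task QA generation loss.
loss = retrieval_kl_loss(torch.randn(8), torch.randn(8))
```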

The system was evaluated on Natural Questions (NQ), where its retrieval performance matched or exceeded state-of-the-art retrievers such as ColBERT-NQ. It achieved competitive retrieval accuracy (R@1 = 55.8%, R@5 = 77.4%) and QA exact-match score (EM = 54.7%), demonstrating that fully end-to-end training can succeed without retrieval-specific pretraining or annotations.

Implications and Future Scope

The implications of this research are twofold. Practically, it offers a streamlined, adaptable solution for domains that require assimilating and synthesizing information from large corpora, such as biomedical or financial QA. Theoretically, framing retrieval as attention offers a new lens on how information retrieval and language generation interact within neural architectures, and may motivate further work on unified models that dispense with the traditional separation of tasks.

Furthermore, ReAtt's ability to generalize to out-of-domain datasets through simple QA-based end-to-end adaptation, without any retrieval annotations, makes it well suited to deployment in varied knowledge-intensive settings. Future work could scale the framework to much larger corpora or extend it to more complex reasoning tasks that demand low-latency synthesis of retrieved information.

ReAtt is a compelling step toward integrating retrieval and reading within a single model: it balances retrieval accuracy and answer quality in one architecture, marking meaningful progress toward simpler, more adaptable systems for knowledge-intensive tasks.

Authors (7)
  1. Zhengbao Jiang (25 papers)
  2. Luyu Gao (26 papers)
  3. Jun Araki (11 papers)
  4. Haibo Ding (11 papers)
  5. Zhiruo Wang (18 papers)
  6. Jamie Callan (43 papers)
  7. Graham Neubig (342 papers)
Citations (34)