Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (1905.01969v4)

Published 22 Apr 2019 in cs.CL and cs.AI

Abstract: The use of deep pre-trained bidirectional transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders performing full self-attention over the pair and Bi-encoders encoding the pair separately. The former often performs better, but is too slow for practical use. In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token level self-attention features. We perform a detailed comparison of all three approaches, including what pre-training and fine-tuning strategies work best. We show our models achieve state-of-the-art results on three existing tasks; that Poly-encoders are faster than Cross-encoders and more accurate than Bi-encoders; and that the best results are obtained by pre-training on large datasets similar to the downstream tasks.

Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

This paper introduces the Poly-encoder, a transformer architecture for multi-sentence scoring tasks, which must balance accuracy against computational efficiency. When a model needs to evaluate many candidate sequences against an input context, existing methods differ sharply in accuracy and speed; the Poly-encoder is designed to mediate between the high predictive quality of Cross-encoders and the speed advantages of Bi-encoders.

The paper critically examines the two existing approaches: Cross-encoders, which perform full self-attention over the concatenated input-label pair, and Bi-encoders, which encode the input and label separately, making the trade-offs of each explicit. Cross-encoders offer state-of-the-art accuracy but are often too slow for practical use, because every candidate must be re-encoded together with the context at inference time. Bi-encoders are efficient, since candidate encodings can be pre-computed and cached, but this generally comes at the expense of lower accuracy.
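
A minimal PyTorch sketch may make this trade-off concrete; the generic `encoder` module, the first-token pooling, and the tensor shapes are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class BiEncoderScorer(nn.Module):
    """Bi-encoder sketch: context and candidates are encoded independently,
    so candidate vectors can be pre-computed once and cached."""
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # assumed to return (batch, seq, dim) token states

    def embed(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.encoder(tokens)[:, 0]          # first-token pooling (assumption)

    def score(self, ctx_tokens: torch.Tensor, cached_cands: torch.Tensor) -> torch.Tensor:
        ctx_vec = self.embed(ctx_tokens)            # (1, dim)
        return cached_cands @ ctx_vec.squeeze(0)    # dot product against every cached candidate


class CrossEncoderScorer(nn.Module):
    """Cross-encoder sketch: every (context, candidate) pair is concatenated
    and re-encoded jointly, so nothing can be cached and inference cost
    grows with the number of candidates."""
    def __init__(self, encoder: nn.Module, dim: int):
        super().__init__()
        self.encoder = encoder
        self.to_score = nn.Linear(dim, 1)

    def score(self, pair_tokens: torch.Tensor) -> torch.Tensor:
        # pair_tokens: (num_candidates, seq), each row is the context and one candidate concatenated
        pooled = self.encoder(pair_tokens)[:, 0]    # (num_candidates, dim)
        return self.to_score(pooled).squeeze(-1)    # one scalar score per candidate
```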

The Poly-encoder combines elements of both to achieve better accuracy without sacrificing speed. A small number of learned codes attend over the context's token-level outputs to produce global context representations, and the candidate embedding then attends over these global features to form the final context vector used for scoring. This allows a richer interaction between context and candidate than a single dot product, while keeping candidate encodings cacheable. The empirical results indicate that Poly-encoders compare favorably with prior methods on several dialogue and information retrieval tasks. On ConvAI2, DSTC7, and the Ubuntu Dialogue Corpus, the Poly-encoder outperforms Bi-encoders in accuracy and narrows the gap with Cross-encoders, all while maintaining inference speeds suitable for large-scale or real-time applications.
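
The attention mechanism at the heart of the architecture can be sketched as follows; the number of codes, the pooling choices, and the tensor shapes are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyEncoderHead(nn.Module):
    """Poly-encoder sketch: m learned codes attend over the context's token
    states to give m global context vectors; each (cacheable) candidate vector
    then attends over those m vectors before a final dot-product score."""
    def __init__(self, dim: int, num_codes: int = 64):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, dim))   # learned query codes

    def forward(self, ctx_states: torch.Tensor, cand_vecs: torch.Tensor) -> torch.Tensor:
        # ctx_states: (seq, dim) token-level outputs for one context
        # cand_vecs:  (num_candidates, dim) cached candidate embeddings
        attn = F.softmax(self.codes @ ctx_states.T, dim=-1)       # (m, seq)
        global_ctx = attn @ ctx_states                            # (m, dim) global features

        # Each candidate attends over the m global context vectors.
        cand_attn = F.softmax(cand_vecs @ global_ctx.T, dim=-1)   # (num_candidates, m)
        final_ctx = cand_attn @ global_ctx                        # (num_candidates, dim)

        return (final_ctx * cand_vecs).sum(dim=-1)                # dot-product scores
```

Because the expensive context-side attention is computed once per context rather than once per candidate, the cost scales with the number of codes rather than with candidate length, which is what keeps inference close to Bi-encoder speed.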

Importantly, the paper also explores pre-training strategies for transformer models in this domain. Pre-training on data related to the downstream domain proves pivotal for the Poly-encoder's performance, outweighing the standard BERT recipe that uses general corpora such as Wikipedia and the Toronto Books Corpus. Fine-tuning transformer weights pre-trained on a large Reddit dataset, which resembles the dialogue tasks in focus, yielded superior results, underscoring the broader finding that pre-training on data more aligned with the downstream task improves subsequent performance.

In summary, the Poly-encoder is an effective compromise that advances multi-sentence scoring: it achieves higher accuracy than Bi-encoders and approaches Cross-encoder quality while dramatically reducing inference time. This makes it practical for applications such as conversational AI and large-scale retrieval, where both prompt and precise text matching are required, and points toward transformer architectures that strike a finer balance between speed and accuracy. The principles outlined here, in particular learning a small set of global attention features and pre-training on task-aligned data, could form a foundation for subsequent models balancing nuanced comprehension with operational practicality.

Authors (4)
  1. Samuel Humeau (12 papers)
  2. Kurt Shuster (28 papers)
  3. Marie-Anne Lachaux (10 papers)
  4. Jason Weston (130 papers)
Citations (265)