Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring
This paper introduces the Poly-encoder, a transformer architecture for multi-sentence scoring tasks, which require balancing accuracy against computational efficiency. In settings where a model must score many candidate sequences against an input context, existing methods differ sharply in both accuracy and speed. The Poly-encoder is designed to mediate between the high predictive quality of Cross-encoders and the speed advantages of Bi-encoders.
The paper critically examines the two existing model families: Cross-encoders, which perform full self-attention over the concatenated input-label pair, and Bi-encoders, which encode input and label separately, clarifying the performance trade-offs each entails. Cross-encoders offer state-of-the-art accuracy but are often too slow for practical use, since every candidate must be jointly encoded with the context at inference time. Bi-encoders are efficient because candidate encodings can be precomputed and cached, which expedites evaluation, but this generally comes at the cost of lower accuracy.
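To make this trade-off concrete, here is a minimal sketch of the two baselines' inference costs, assuming PyTorch; the encoders below are toy linear stand-ins, and the names ctx_encoder, cand_encoder, and joint_encoder are placeholders rather than the paper's code:

```python
import torch

d, num_cands = 64, 1000
# Toy stand-ins for real transformer encoders pooled to a single vector.
ctx_encoder = torch.nn.Linear(d, d)         # placeholder context encoder
cand_encoder = torch.nn.Linear(d, d)        # placeholder candidate encoder
joint_encoder = torch.nn.Linear(2 * d, 1)   # placeholder cross-encoder head

candidates = torch.randn(num_cands, d)

# Bi-encoder: candidates are encoded once, offline, and cached;
# scoring a new context is one forward pass plus cheap dot products.
cand_vecs = cand_encoder(candidates)                 # precomputable, (C, d)
def bi_score(context):                               # context: (d,)
    return cand_vecs @ ctx_encoder(context)          # scores, (C,)

# Cross-encoder: every (context, candidate) pair needs its own joint
# forward pass, so nothing can be cached and cost grows with |candidates|.
def cross_score(context):
    pairs = torch.cat([context.expand(num_cands, d), candidates], dim=-1)
    return joint_encoder(pairs).squeeze(-1)          # scores, (C,)
```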
The Poly-encoder combines elements of both Cross- and Bi-encoders to achieve better quality without sacrificing speed. It learns a small number of global context representations, produced by attention codes over the context tokens, over which the candidate embedding then attends; because the final score is a simple dot product, candidate encodings remain cacheable. This yields a richer context-candidate interaction than a Bi-encoder while remaining computationally feasible. The empirical results show that Poly-encoders perform favorably on several dialogue and information retrieval tasks: on ConvAI2, DSTC7, and the Ubuntu Dialogue Corpus, the Poly-encoder outperforms Bi-encoders in accuracy and narrows the gap with Cross-encoders, all while maintaining inference speeds suitable for large-scale or real-time applications.
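A minimal sketch of the Poly-encoder scoring step itself, again assuming PyTorch; the tensor names, shapes, and the helper poly_encoder_score are illustrative rather than the paper's released code:

```python
import torch
import torch.nn.functional as F

def poly_encoder_score(ctx_out, cand_emb, codes):
    """Score all candidates against one context.

    ctx_out:  (T, d)  per-token outputs of the context encoder
    cand_emb: (C, d)  one pooled vector per candidate (cacheable offline)
    codes:    (m, d)  m learned query vectors, the "poly codes"
    returns:  (C,)    one score per candidate
    """
    # Step 1: each code attends over the context tokens, yielding
    # m global context representations.
    attn = F.softmax(codes @ ctx_out.T, dim=-1)      # (m, T)
    global_ctx = attn @ ctx_out                      # (m, d)

    # Step 2: each candidate attends over the m global representations
    # to form a candidate-aware context vector.
    w = F.softmax(cand_emb @ global_ctx.T, dim=-1)   # (C, m)
    ctx_final = w @ global_ctx                       # (C, d)

    # Step 3: the final score is a dot product, so candidate encodings
    # can be precomputed and cached exactly as in a Bi-encoder.
    return (ctx_final * cand_emb).sum(dim=-1)        # (C,)

# Example shapes: 20 context tokens, 1000 candidates, m = 16 codes, d = 64.
scores = poly_encoder_score(torch.randn(20, 64),
                            torch.randn(1000, 64),
                            torch.randn(16, 64))
```

The candidate vectors are produced offline, and the attention in steps 1 and 2 is cheap relative to a full joint forward pass; this is how the Poly-encoder keeps most of the Bi-encoder's speed while adding a richer context-candidate interaction.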
Importantly, the paper also examines pre-training strategies for transformer models in this domain. Pre-training on data related to the downstream task proves pivotal for the Poly-encoder's performance: fine-tuning transformer weights pre-trained on a large Reddit corpus, which resembles the dialogue tasks under study, yields better results than starting from standard BERT weights trained on more general corpora such as Wikipedia and the Toronto Books Corpus. This underscores the broader finding that pre-training on data aligned with the downstream task improves final performance.
In summary, the Poly-encoder is an effective compromise that advances multi-sentence scoring: it achieves higher accuracy than Bi-encoders and approaches Cross-encoder quality while dramatically reducing inference time. This makes it practical for applications such as conversational AI and large-scale retrieval, and points toward transformer architectures that strike a finer balance between speed and accuracy. The paper thus contributes a significant step forward in efficient real-time prediction, with implications for AI applications that require prompt and precise text matching; the principles outlined here could serve as a foundation for subsequent models balancing nuanced comprehension with operational practicality.