
Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling (2210.05261v3)

Published 11 Oct 2022 in cs.CL and cs.AI

Abstract: Transformer-based models have achieved great success on sentence pair modeling tasks, such as answer selection and natural language inference (NLI). These models generally perform cross-attention over input pairs, leading to prohibitive computational costs. Recent studies propose dual-encoder and late-interaction architectures for faster computation. However, the trade-off between the expressiveness of cross-attention and computational speedup still needs better coordination. To this end, this paper introduces MixEncoder, a novel paradigm for efficient sentence pair modeling. MixEncoder involves a light-weight cross-attention mechanism: it encodes the query only once while modeling the query-candidate interactions in parallel. Extensive experiments conducted on four tasks demonstrate that MixEncoder can speed up sentence pair inference by over 113x while achieving performance comparable to the more expensive cross-attention models.
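The core idea in the abstract — encode the query once, pre-compute candidate representations offline, and apply a light-weight cross-attention to all candidates in parallel — can be illustrated with a minimal sketch. This is not the authors' implementation; the pooling and scoring choices (mean-pooling, dot-product score) are illustrative assumptions, and `light_cross_attention` is a hypothetical helper name.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def light_cross_attention(query_states, candidate_states):
    """Score many candidates against one query with a single light-weight
    cross-attention pass, batched over candidates.

    query_states:     (Lq, d)    -- query token states, encoded only once
    candidate_states: (N, Lc, d) -- pre-computed (cacheable) candidate states
    returns:          (N,)       -- one relevance score per candidate
    """
    d = query_states.shape[-1]
    # (N, Lc, Lq): each candidate token attends over the query tokens.
    attn = softmax(candidate_states @ query_states.T / np.sqrt(d), axis=-1)
    # (N, Lc, d): query-aware candidate representations, all in parallel.
    mixed = attn @ query_states
    # Illustrative scoring: dot product of mean-pooled representations.
    return (mixed.mean(axis=1) * candidate_states.mean(axis=1)).sum(axis=-1)

# Usage: one query, three cached candidates, scored in a single batched pass.
rng = np.random.default_rng(0)
query = rng.normal(size=(5, 8))          # 5 query tokens, dim 8
candidates = rng.normal(size=(3, 7, 8))  # 3 candidates, 7 tokens each
scores = light_cross_attention(query, candidates)
print(scores.shape)  # (3,)
```

The speedup relative to full cross-attention comes from avoiding a separate joint encoding of every (query, candidate) pair: the expensive per-candidate transformer pass is replaced by cached candidate states plus one cheap batched attention.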

Authors (6)
  1. Yuanhang Yang (8 papers)
  2. Shiyi Qi (9 papers)
  3. Chuanyi Liu (12 papers)
  4. Qifan Wang (129 papers)
  5. Cuiyun Gao (97 papers)
  6. Zenglin Xu (145 papers)
Citations (2)