Multiplicative Position-aware Transformer Models for Language Understanding (2109.12788v1)

Published 27 Sep 2021 in cs.CL and cs.AI

Abstract: Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on NLP tasks. The self-attention mechanism is position-agnostic. In order to capture positional ordering information, various flavors of absolute and relative position embeddings have been proposed. However, there is no systematic analysis of their contributions, and a comprehensive comparison of these methods is missing in the literature. In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks, using our own implementations. We also propose a novel multiplicative embedding method which leads to superior accuracy compared to existing methods. Finally, we show that our proposed embedding method, serving as a drop-in replacement for the default absolute position embedding, can improve the RoBERTa-base and RoBERTa-large models on the SQuAD1.1 and SQuAD2.0 datasets.
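The abstract contrasts the default additive absolute position embedding with a multiplicative alternative injected into attention. The sketch below is only a rough illustration of that distinction, not the paper's formulation: the `pos_scale` matrix, the shapes, and the way position factors modulate the attention scores are all assumptions made for the example.

```python
# Illustrative sketch (assumed formulation, not the paper's): additive absolute
# position embeddings on the inputs vs. a multiplicative position-dependent
# modulation of the attention scores.
import torch
import torch.nn.functional as F

def additive_absolute(tok_emb, pos_emb):
    """Default BERT/RoBERTa-style input: token embeddings plus position embeddings."""
    return tok_emb + pos_emb  # (seq_len, d_model)

def multiplicative_attention_scores(q, k, pos_scale):
    """Hypothetical multiplicative injection: position-pair factors scale the
    query/key interaction instead of being added to the input embeddings."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (seq_len, seq_len)
    return scores * pos_scale                    # element-wise position scaling

seq_len, d_model = 8, 16
tok = torch.randn(seq_len, d_model)
pos = torch.randn(seq_len, d_model)
q = k = additive_absolute(tok, pos)

# Assumed learnable matrix of per-position-pair factors for the sketch.
pos_scale = torch.rand(seq_len, seq_len)
attn = F.softmax(multiplicative_attention_scores(q, k, pos_scale), dim=-1)
print(attn.shape)  # torch.Size([8, 8])
```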

Authors (4)
  1. Zhiheng Huang (33 papers)
  2. Davis Liang (15 papers)
  3. Peng Xu (357 papers)
  4. Bing Xiang (74 papers)
Citations (1)