Word Order Matters when you Increase Masking (2211.04427v1)

Published 8 Nov 2022 in cs.CL, cs.LG, and cs.NE

Abstract: Word order, an essential property of natural languages, is injected into Transformer-based neural language models through position encoding. However, recent experiments have shown that explicit position encoding is not always useful, since some models without such a feature have managed to achieve state-of-the-art performance on certain tasks. To better understand this phenomenon, we examine the effect of removing position encodings on the pre-training objective itself (i.e., masked language modelling), to test whether models can reconstruct position information from co-occurrences alone. We do so by controlling the amount of masked tokens in the input sentence, as a proxy to affect the importance of position information for the task. We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task. These findings point towards a direct relationship between the amount of masking and the ability of Transformers to capture order-sensitive aspects of language using position encoding.
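
The experimental lever described in the abstract is the masking ratio of the pre-training objective. As a minimal sketch (not the authors' code), the snippet below shows how the masking ratio can be varied in a standard masked-language-modelling setup using Hugging Face's DataCollatorForLanguageModeling; the tokenizer name and the chosen ratios are illustrative assumptions, not taken from the paper.

```python
# Sketch: vary the MLM masking ratio, the proxy used in the paper for how much
# position information the task requires. Higher ratios leave fewer unmasked
# co-occurrence cues from which word order could be reconstructed.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # placeholder model choice

# One data collator per masking ratio (illustrative values).
masking_ratios = [0.15, 0.30, 0.50, 0.75]
collators = {
    p: DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=p
    )
    for p in masking_ratios
}

# Mask a single sentence at each ratio and count masked positions.
# In the labels tensor, -100 marks tokens that do not contribute to the MLM loss.
encoding = tokenizer("Word order matters when you increase masking.", return_tensors="pt")
for p, collator in collators.items():
    batch = collator([{"input_ids": encoding["input_ids"][0]}])
    n_masked = (batch["labels"] != -100).sum().item()
    print(f"mlm_probability={p}: {n_masked} tokens selected for prediction")
```

Training the same architecture with and without position encodings at each of these ratios, and comparing the resulting MLM performance, reproduces the general shape of the comparison the abstract describes.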

Authors (3)
  1. Karim Lasri (6 papers)
  2. Alessandro Lenci (26 papers)
  3. Thierry Poibeau (25 papers)
Citations (7)