On Biasing Transformer Attention Towards Monotonicity (2104.03945v1)

Published 8 Apr 2021 in cs.CL

Abstract: Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.

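The abstract describes a monotonicity loss that can be added on top of standard soft attention rather than replacing the attention function. As a rough illustrative sketch only, and not necessarily the paper's exact formulation, the code below penalises backward jumps in the expected attended source position across consecutive target steps; the name `monotonicity_loss`, the `margin` parameter, and the weighting factor `lambda_mono` are assumptions introduced for this example.

```python
import torch

def monotonicity_loss(attn, margin=0.0):
    """Hypothetical monotonicity penalty on attention weights.

    attn: tensor of shape (batch, tgt_len, src_len) whose rows sum to 1.
    Penalises steps where the expected attended source position moves
    backwards between consecutive target positions.
    """
    src_len = attn.size(-1)
    positions = torch.arange(src_len, dtype=attn.dtype, device=attn.device)
    # Expected source position for each target step: (batch, tgt_len)
    expected_pos = (attn * positions).sum(dim=-1)
    # Positive values mean the attention centre moved backwards
    backward = (expected_pos[:, :-1] - expected_pos[:, 1:] + margin).clamp(min=0.0)
    return backward.mean()

# Usage sketch: add the penalty to the task loss for the heads being biased.
# total_loss = ce_loss + lambda_mono * monotonicity_loss(attn_weights)
```

Applying such a penalty to only a subset of heads, rather than all of them, matches the abstract's observation that isolated improvements appear when only some transformer heads are biased towards monotonic behavior.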
Authors (4)
  1. Annette Rios (10 papers)
  2. Chantal Amrhein (13 papers)
  3. Noëmi Aepli (8 papers)
  4. Rico Sennrich (88 papers)
Citations (7)