Exact Hard Monotonic Attention for Character-Level Transduction (1905.06319v3)

Published 15 May 2019 in cs.CL

Abstract: Many common character-level, string-to-string transduction tasks, e.g., grapheme-to-phoneme conversion and morphological inflection, consist almost exclusively of monotonic transductions. However, neural sequence-to-sequence models that use non-monotonic soft attention often outperform popular monotonic models. In this work, we ask the following question: Is monotonicity really a helpful inductive bias for these tasks? We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce. With the help of dynamic programming, we are able to compute the exact marginalization over all monotonic alignments. Our models achieve state-of-the-art performance on morphological inflection. Furthermore, we find strong performance on two other character-level transduction tasks. Code is available at https://github.com/shijie-wu/neural-transducer.

Authors (2)
  1. Shijie Wu (23 papers)
  2. Ryan Cotterell (226 papers)
Citations (58)

Summary

Exact Hard Monotonic Attention for Character-Level Transduction

This paper presents a novel approach to enforcing monotonicity in neural sequence-to-sequence models through exact hard monotonic attention. The authors focus on character-level transduction tasks, such as grapheme-to-phoneme conversion, named-entity transliteration, and morphological inflection, which are inherently monotonic. Conventional non-monotonic models, despite their success, fail to leverage this monotonicity bias. The central inquiry of the paper is whether enforcing monotonicity can improve performance in these tasks.

Model and Methodology

The core contribution is the development of hard attention sequence-to-sequence models that enforce strict monotonicity while learning latent alignments. Two models are introduced: 0th-order hard monotonic attention and 1st-order hard monotonic attention. The 0th-order model aligns each target character to a source character without considering past alignments; it uses a neuralized HMM parameterization, allowing polynomial-time likelihood computation. The 1st-order model, in contrast, conditions each alignment decision on the previous alignment, introducing a dependency that enforces strict monotonicity.
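
To make the 0th-order factorization concrete, here is a minimal sketch (an illustrative reimplementation, not the authors' released code) of the exact log-likelihood, which marginalizes over source alignments independently at each target step. The tensor names `log_align` and `log_emit` are assumed to be produced by an encoder-decoder and are not taken from the paper.

```python
# Hedged sketch: exact log-likelihood of a 0th-order hard-attention model,
# where each target position marginalizes over source alignments independently.
import numpy as np

def zeroth_order_loglik(log_align, log_emit):
    """log_align: (T, S) array of log p(a_i = j) for target step i, source position j.
    log_emit:  (T, S) array of log p(y_i | x_j, decoder state) for the gold target character.
    Returns sum_i log sum_j p(a_i = j) * p(y_i | a_i = j), the exact marginal log-likelihood.
    """
    joint = log_align + log_emit                      # (T, S) joint log-scores
    step_ll = np.logaddexp.reduce(joint, axis=1)      # logsumexp over source positions
    return step_ll.sum()                              # product over target steps in log space
```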

Dynamic programming techniques are employed to compute exact marginalizations over all monotonic alignments efficiently. This exact computation contrasts with other methods that rely on approximate inference techniques, underscoring the potential computational advantages of the proposed approach.
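
For the 1st-order model, the marginalization can be carried out with a standard HMM forward pass whose transition mass is restricted to non-decreasing source positions. The sketch below is a minimal NumPy version under that assumption; the parameter names and the separate initial-alignment distribution `log_init` are illustrative, not the paper's exact parameterization. The loop runs in O(T·S²) time, consistent with the polynomial-time computation described above.

```python
# Hedged sketch: forward-algorithm dynamic program for a 1st-order monotonic
# hard-attention model (exact sum over all monotonic alignment paths).
import numpy as np

def monotonic_forward_loglik(log_init, log_trans, log_emit):
    """log_init:  (S,)      log p(a_1 = j), the initial alignment distribution.
    log_trans: (T, S, S) log p(a_i = k | a_{i-1} = j); entries with k < j are
               assumed to be -inf so only monotonic (non-decreasing) moves get mass.
    log_emit:  (T, S)    log p(y_i | x_k, decoder state) for the gold target character.
    Returns log p(y | x), summed exactly over all monotonic alignment paths.
    """
    T, S = log_emit.shape
    alpha = log_init + log_emit[0]                    # alpha[j] = log p(y_1, a_1 = j)
    for i in range(1, T):
        scores = alpha[:, None] + log_trans[i]        # (S_prev, S_curr) path scores
        alpha = np.logaddexp.reduce(scores, axis=0) + log_emit[i]
    return np.logaddexp.reduce(alpha)                 # marginalize the final alignment
```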

Empirical Evaluation

The paper reports state-of-the-art performance in morphological inflection tasks using the CoNLL-SIGMORPHON 2017 dataset. Both monotonic models outperform their non-monotonic counterparts, validating monotonicity as a beneficial inductive bias when jointly learning alignments and transduction. The findings present robust evidence that monotonicity leads to more accurate models.

A series of controlled experiments further demonstrates the effectiveness of jointly training transduction and alignment, showing performance improvements across three transduction tasks. Specifically, training with enforced monotonicity did not degrade performance, contrary to previous beliefs, and in fact improved results, especially in high-resource settings.

Implications and Future Directions

The findings have profound implications for the design of neural architectures for sequence transduction tasks. As monotonic models prove effective, they may inspire developments in other domains where monotonic relationships are present. Moreover, the success of exact dynamic programming with monotonic constraints paves the way for further research into efficient and scalable inference mechanisms in structured prediction tasks.

Future research should explore adaptive methods to dynamically choose between monotonic and non-monotonic processes within a single framework, potentially improving models' versatility in handling diverse linguistic phenomena. Additionally, investigating the interplay of monotonic alignment biases with other types of linguistic inductive biases remains a promising avenue.

In conclusion, this paper substantiates the hypothesis that monotonicity is indeed a valuable inductive bias when properly enforced within sequence-to-sequence models. The advancements presented not only push the performance boundaries in character-level transduction tasks but also lay the groundwork for future exploration of hybrid and dynamic models with enforced structural constraints.