Exact Hard Monotonic Attention for Character-Level Transduction (1905.06319v3)
Abstract: Many common character-level, string-to-string transduction tasks, e.g., grapheme-to-phoneme conversion and morphological inflection, consist almost exclusively of monotonic transductions. However, neural sequence-to-sequence models that use non-monotonic soft attention often outperform popular monotonic models. In this work, we ask the following question: Is monotonicity really a helpful inductive bias for these tasks? We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce. With the help of dynamic programming, we are able to compute the exact marginalization over all monotonic alignments. Our models achieve state-of-the-art performance on morphological inflection. Furthermore, we find strong performance on two other character-level transduction tasks. Code is available at https://github.com/shijie-wu/neural-transducer.
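To make the dynamic-programming claim concrete, the following is a minimal sketch of how a sum over all monotonic alignments can be computed with a forward recursion of the kind used for hidden Markov models. It is an illustration under stated assumptions, not the authors' implementation: the function names and the toy tables `init`, `trans`, and `emit` are hypothetical, and a neural model would presumably produce such probabilities from network states rather than fixed tables.

```python
from itertools import product


def monotonic_marginal(init, trans, emit):
    """Sum over all monotonic alignments with a forward recursion.

    init[i]        : probability that target step 0 attends to source position i
    trans[j][k][i] : probability of moving from source position k at target
                     step j to position i at step j + 1 (only i >= k is used)
    emit[j][i]     : probability of target symbol j given source position i
    All three are hypothetical toy tables, not the paper's parameterization.
    """
    n = len(init)
    # alpha[i] = joint probability of the target prefix and attending to i
    alpha = [init[i] * emit[0][i] for i in range(n)]
    for j in range(1, len(emit)):
        alpha = [
            # monotonicity: the previous position k may not exceed i
            emit[j][i] * sum(alpha[k] * trans[j - 1][k][i] for k in range(i + 1))
            for i in range(n)
        ]
    return sum(alpha)


def brute_force_marginal(init, trans, emit):
    """Enumerate every non-decreasing alignment explicitly (exponential time)."""
    n, m = len(init), len(emit)
    total = 0.0
    for a in product(range(n), repeat=m):
        if any(a[j] < a[j - 1] for j in range(1, m)):
            continue  # skip non-monotonic alignments
        p = init[a[0]] * emit[0][a[0]]
        for j in range(1, m):
            p *= trans[j - 1][a[j - 1]][a[j]] * emit[j][a[j]]
        total += p
    return total


# Toy problem: 3 source positions, 3 target symbols (values are made up).
init = [0.5, 0.3, 0.2]
step = [[0.6, 0.3, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
trans = [step, step]
emit = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.2], [0.1, 0.4, 0.7]]

fast = monotonic_marginal(init, trans, emit)
slow = brute_force_marginal(init, trans, emit)
```

In this sketch the recursion visits each pair of source positions once per target step, so its cost grows quadratically in the source length and linearly in the target length, whereas the explicit enumeration grows exponentially. The two functions should agree up to floating-point error on the toy tables.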
Authors: Shijie Wu, Ryan Cotterell