Exact Hard Monotonic Attention for Character-Level Transduction (1905.06319v3)

Published 15 May 2019 in cs.CL

Abstract: Many common character-level, string-to-string transduction tasks, e.g., grapheme-to-phoneme conversion and morphological inflection, consist almost exclusively of monotonic transductions. However, neural sequence-to-sequence models that use non-monotonic soft attention often outperform popular monotonic models. In this work, we ask the following question: Is monotonicity really a helpful inductive bias for these tasks? We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce. With the help of dynamic programming, we are able to compute the exact marginalization over all monotonic alignments. Our models achieve state-of-the-art performance on morphological inflection. Furthermore, we find strong performance on two other character-level transduction tasks. Code is available at https://github.com/shijie-wu/neural-transducer.

Authors (2)
  1. Shijie Wu (23 papers)
  2. Ryan Cotterell (226 papers)
Citations (58)

Summary

Exact Hard Monotonic Attention for Character-Level Transduction

This paper presents a novel approach to enforcing monotonicity in neural sequence-to-sequence models through exact hard monotonic attention. The authors focus on character-level transduction tasks, such as grapheme-to-phoneme conversion, named-entity transliteration, and morphological inflection, which are inherently monotonic. Conventional non-monotonic models, despite their success, fail to leverage this monotonicity bias. The central inquiry of the paper is whether enforcing monotonicity can improve performance in these tasks.

Model and Methodology

The core contribution is the development of hard attention sequence-to-sequence models that enforce strict monotonicity while learning latent alignments. Two models are introduced: 0th-order hard monotonic attention and 1st-order hard monotonic attention. The 0th-order model aligns each target character to a source character without considering past alignments; it uses a neuralized HMM parameterization, allowing polynomial-time likelihood computation. The 1st-order model, in contrast, conditions each alignment decision on the previous alignment, introducing a dependency that enforces strict monotonicity.
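
To make the 0th-order factorization concrete, here is a minimal sketch (an illustrative reimplementation, not the authors' released code) of the exact log-likelihood, which marginalizes over source alignments independently at each target step. The tensor names `log_align` and `log_emit` are assumed to be produced by an encoder-decoder and are not taken from the paper.

```python
# Hedged sketch: exact log-likelihood of a 0th-order hard-attention model,
# where each target position marginalizes over source alignments independently.
import numpy as np

def zeroth_order_loglik(log_align, log_emit):
    """log_align: (T, S) array of log p(a_i = j) for target step i, source position j.
    log_emit:  (T, S) array of log p(y_i | x_j, decoder state) for the gold target character.
    Returns sum_i log sum_j p(a_i = j) * p(y_i | a_i = j), the exact marginal log-likelihood.
    """
    joint = log_align + log_emit                      # (T, S) joint log-scores
    step_ll = np.logaddexp.reduce(joint, axis=1)      # logsumexp over source positions
    return step_ll.sum()                              # product over target steps in log space
```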

Dynamic programming techniques are employed to compute exact marginalizations over all monotonic alignments efficiently. This exact computation contrasts with other methods that rely on approximate inference techniques, underscoring the potential computational advantages of the proposed approach.
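
For the 1st-order model, the marginalization can be carried out with a standard HMM forward pass whose transition mass is restricted to non-decreasing source positions. The sketch below is a minimal NumPy version under that assumption; the parameter names and the separate initial-alignment distribution `log_init` are illustrative, not the paper's exact parameterization. The loop runs in O(T·S²) time, consistent with the polynomial-time computation described above.

```python
# Hedged sketch: forward-algorithm dynamic program for a 1st-order monotonic
# hard-attention model (exact sum over all monotonic alignment paths).
import numpy as np

def monotonic_forward_loglik(log_init, log_trans, log_emit):
    """log_init:  (S,)      log p(a_1 = j), the initial alignment distribution.
    log_trans: (T, S, S) log p(a_i = k | a_{i-1} = j); entries with k < j are
               assumed to be -inf so only monotonic (non-decreasing) moves get mass.
    log_emit:  (T, S)    log p(y_i | x_k, decoder state) for the gold target character.
    Returns log p(y | x), summed exactly over all monotonic alignment paths.
    """
    T, S = log_emit.shape
    alpha = log_init + log_emit[0]                    # alpha[j] = log p(y_1, a_1 = j)
    for i in range(1, T):
        scores = alpha[:, None] + log_trans[i]        # (S_prev, S_curr) path scores
        alpha = np.logaddexp.reduce(scores, axis=0) + log_emit[i]
    return np.logaddexp.reduce(alpha)                 # marginalize the final alignment
```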

Empirical Evaluation

The paper reports state-of-the-art performance in morphological inflection tasks using the CoNLL-SIGMORPHON 2017 dataset. Both monotonic models outperform their non-monotonic counterparts, validating monotonicity as a beneficial inductive bias when jointly learning alignments and transduction. The findings present robust evidence that monotonicity leads to more accurate models.

A series of controlled experiments further demonstrates the effectiveness of jointly training transduction and alignment, showing performance improvements across three transduction tasks. Specifically, training with enforced monotonicity did not degrade performance, contrary to previous beliefs, and in fact improved results, especially in high-resource settings.

Implications and Future Directions

The findings have profound implications for the design of neural architectures for sequence transduction tasks. As monotonic models prove effective, they may inspire developments in other domains where monotonic relationships are present. Moreover, the success of exact dynamic programming with monotonic constraints paves the way for further research into efficient and scalable inference mechanisms in structured prediction tasks.

Future research should explore adaptive methods to dynamically choose between monotonic and non-monotonic processes within a single framework, potentially improving models' versatility in handling diverse linguistic phenomena. Additionally, investigating the interplay of monotonic alignment biases with other types of linguistic inductive biases remains a promising avenue.

In conclusion, this paper substantiates the hypothesis that monotonicity is indeed a valuable inductive bias when properly enforced within sequence-to-sequence models. The advancements presented not only push the performance boundaries in character-level transduction tasks but also lay the groundwork for future exploration of hybrid and dynamic models with enforced structural constraints.