Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
Summary of the Paper
The paper "Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies" investigates the capacity of Long Short-Term Memory (LSTM) networks to learn syntactic structures necessary for understanding and generating natural language, focusing on English subject-verb agreement. The authors Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg explore how LSTMs handle dependencies that are sensitive to syntactic information despite not having an inherent hierarchical structure.
Background and Motivation
The motivation for the research stems from the widespread use of Recurrent Neural Networks (RNNs) in NLP, particularly variants such as LSTMs, which have proven effective in tasks like language modeling, parsing, and machine translation. Despite their success, RNNs treat sentences as flat sequences of words and do not explicitly model syntactic structure, raising the question of whether they can nonetheless learn syntax-sensitive dependencies.
Subject-verb agreement in English is used as a test case because it is governed by hierarchical syntactic structure rather than linear word order: a verb must agree in number with its subject even when intervening words or phrases obscure the dependency. In "the keys to the cabinet are on the table", for example, the singular noun "cabinet" intervenes between the plural subject "keys" and the verb. The core question is whether an LSTM can learn these dependencies under various training conditions and objectives.
Methodology
The paper evaluates LSTMs under four training objectives that provide progressively less direct supervision about the agreement dependency:
- Number Prediction Task: The model is trained to predict the number (singular or plural) of an upcoming verb from the words preceding it (a minimal sketch of this setup follows the list).
- Verb Inflection Task: Similar to the number prediction task, but the model also receives the base form of the upcoming verb.
- Grammaticality Judgment Task: The model is given sentences and asked to judge their grammaticality, focusing on subject-verb agreement errors.
- Language Modeling (LM) Task: The model is trained to predict the next word at each position, with no explicit grammatical cues.
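To make the most supervised of these objectives concrete, below is a minimal sketch of the number prediction task as a binary classifier over the sentence prefix, written in PyTorch. The `NumberPredictor` class, vocabulary size, hyperparameters, and toy batch are illustrative choices rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class NumberPredictor(nn.Module):
    """Illustrative LSTM classifier for the number prediction task:
    read the words preceding the verb and predict singular vs. plural."""

    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)  # singular vs. plural

    def forward(self, prefix_ids):
        # prefix_ids: (batch, seq_len) word indices of the sentence up to,
        # but not including, the verb
        embedded = self.embed(prefix_ids)
        _, (last_hidden, _) = self.lstm(embedded)
        # the final hidden state summarizes the prefix
        return self.classifier(last_hidden.squeeze(0))

# toy usage: a batch of two prefixes, padded to the same length
model = NumberPredictor(vocab_size=10_000)
prefix = torch.randint(0, 10_000, (2, 7))
logits = model(prefix)                                   # shape: (2, 2)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1]))
```

The remaining objectives vary the supervision signal the network is trained to produce rather than the recurrent architecture itself.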
Evaluations are conducted on a large dataset drawn from Wikipedia, ensuring diverse and challenging sentence structures. Performance is measured by the error rate in predicting verb number, with particular attention to sentences containing agreement attractors: intervening nouns whose number differs from that of the subject.
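A sketch of how such an attractor-bucketed evaluation might be computed is given below; the tuple format of the test items and the `predict` callable are assumptions for illustration, not the paper's evaluation code.

```python
from collections import defaultdict

def error_rate_by_attractors(examples, predict):
    """Bucket test items by their number of agreement attractors and
    report the verb-number error rate within each bucket.

    `examples` is assumed to be an iterable of
    (prefix, n_attractors, gold_number) tuples, and `predict` a callable
    mapping a prefix to 'SG' or 'PL'; both are illustrative interfaces.
    """
    errors, totals = defaultdict(int), defaultdict(int)
    for prefix, n_attractors, gold_number in examples:
        totals[n_attractors] += 1
        if predict(prefix) != gold_number:
            errors[n_attractors] += 1
    return {k: errors[k] / totals[k] for k in sorted(totals)}
```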
Results and Analysis
Strongly Supervised Settings
In the strongly supervised number prediction and verb inflection settings, the LSTM achieved very high overall accuracy (under 1% error). However, errors increased sharply when the linear sequence of words conflicted with the hierarchical structure, that is, in sentences with agreement attractors: with four attractors, the error rate rose to 17.6%.
Grammaticality Judgment and Language Modeling
Under the grammaticality judgment objective, overall error rose moderately to 2.5%, indicating that the model can still acquire the dependency from less direct supervision. The language modeling objective performed considerably worse (6.78% overall error), degrading further as the number of agreement attractors grew, which suggests that language models need additional syntactic supervision to capture these dependencies reliably.
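For the language modeling objective, agreement is scored without any explicit supervision by checking whether the trained LM assigns higher probability to the correctly inflected verb than to the incorrectly inflected one at the verb position. The sketch below assumes a PyTorch-style model that returns next-word logits for the final position; the interface and function name are illustrative.

```python
import torch
import torch.nn.functional as F

def lm_prefers_correct_form(lm, prefix_ids, correct_id, wrong_id):
    """Count the LM as correct on an agreement case if it assigns higher
    probability to the correctly inflected verb than to the incorrect one,
    given the words preceding the verb.

    Assumes `lm` returns next-word logits of shape (batch, vocab_size) for
    the final position; this interface is illustrative.
    """
    with torch.no_grad():
        logits = lm(prefix_ids)                 # (1, vocab_size)
        log_probs = F.log_softmax(logits, dim=-1)
    return bool(log_probs[0, correct_id] > log_probs[0, wrong_id])
```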
Comparison with Large-Scale Models
To assess the impact of model size and training data, a state-of-the-art large-scale language model from Google was also evaluated. It showed similar vulnerability to agreement attractors despite vastly greater resources, reinforcing the case for targeted syntactic supervision.
Implications and Future Directions
The paper concludes that while LSTMs can approximate syntactic dependencies with explicit supervision, achieving robust syntax-sensitive performance across challenging cases is still problematic. These insights have several implications:
- Model Improvement: Future architectures might incorporate more explicit syntactic representations or hierarchical structures to better capture complex dependencies.
- Training Strategies: Introducing more targeted and balanced training datasets, focusing on difficult syntactic constructs, could improve generalization.
- Language Acquisition Modeling: From a cognitive perspective, models can offer valuable hypotheses about how humans may learn and handle grammar, aligning with findings in psycholinguistics.
Conclusion
This paper presents a thorough and insightful investigation into the syntactic capabilities of LSTMs, highlighting their strengths in supervised settings and limitations under implicit tasks. By illustrating where and why LSTMs fail with complex syntactic structures, it sets the stage for future research into more expressive and structurally aware neural architectures in NLP.