Colorless green recurrent networks dream hierarchically (1803.11138v1)

Published 29 Mar 2018 in cs.CL

Abstract: Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues ("The colorless green ideas I ate with the chair sleep furiously"), and, for Italian, we compare model performance to human intuitions. Our language-model-trained RNNs make reliable predictions about long-distance agreement, and do not lag much behind human performance. We thus bring support to the hypothesis that RNNs are not just shallow-pattern extractors, but they also acquire deeper grammatical competence.

Authors (5)
  1. Kristina Gulordava
  2. Piotr Bojanowski
  3. Edouard Grave
  4. Tal Linzen
  5. Marco Baroni
Citations (486)

Summary

An Analysis of Hierarchical Syntax Tracking in Recurrent Neural Networks

The paper "Colorless green recurrent networks dream hierarchically" examines the capacity of Recurrent Neural Networks (RNNs) to learn and represent abstract hierarchical syntactic structure through a generic language modeling task. It focuses on whether RNNs can predict long-distance syntactic agreement, such as subject-verb number agreement, across four languages: Italian, English, Hebrew, and Russian. The authors explore this in conditions devoid of semantic or frequency-based cues, notably using nonsensical sentences.

Methodology

The paper builds on prior work such as Linzen et al. (2016), but evaluates syntactic capabilities without direct task supervision, a limitation of those earlier investigations. RNNs are trained for language modeling and tested on sentences drawn from treebanks for each language. The test set includes both standard corpus-extracted examples and nonce sentences, in which content words are replaced with random morphologically matched words to eliminate semantic cues. Models are assessed on their ability to predict the correct syntactic number agreement in both conditions.
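The nonce-sentence construction described above can be sketched as follows. The word pools, tags, and example sentence here are purely illustrative stand-ins, not the paper's actual treebank data or vocabulary; the paper samples replacements from the language's lexicon so that part of speech and morphological features are preserved.

```python
import random

# Illustrative pools of content words, keyed by POS tag and
# morphological number. Function words (determiners, adpositions)
# have no pool and are left untouched.
POOLS = {
    ("NOUN", "sg"): ["idea", "chair", "table"],
    ("NOUN", "pl"): ["ideas", "chairs", "tables"],
    ("VERB", "pl"): ["sleep", "dream", "run"],
}

def make_nonce(tagged_sentence, rng):
    """Replace each content word with a random word sharing its
    POS and number, keeping function words as-is."""
    out = []
    for word, pos, number in tagged_sentence:
        pool = POOLS.get((pos, number))
        out.append(rng.choice(pool) if pool else word)
    return out

# "the dogs near the car bark": agreement spans an attractor noun.
sentence = [
    ("the", "DET", None), ("dogs", "NOUN", "pl"),
    ("near", "ADP", None), ("the", "DET", None),
    ("car", "NOUN", "sg"), ("bark", "VERB", "pl"),
]
print(" ".join(make_nonce(sentence, random.Random(0))))
```

Because only content words are resampled, the grammatical skeleton (and hence the correct agreement pattern) is preserved while lexical and semantic cues are destroyed.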

Results and Contributions

RNNs trained with a language modeling objective demonstrated a strong capacity to handle long-distance agreement, with only minimal performance differences between standard and nonce sentences. LSTM architectures clearly outperform alternatives such as standard n-gram models and smaller RNNs. Across the evaluated languages, RNNs perform consistently well, approaching human performance, especially in Italian, where human judgements were collected.

Key results include:

  • High performance on syntactic agreement tasks indicates RNNs can capture abstract grammatical structures.
  • Performance robustness on nonce sentences suggests a deeper syntactic understanding beyond mere training data memorization.
  • LSTM models exhibit a strong correlation between language modeling perplexity and accuracy on syntactic agreement predictions.
  • Performance across languages emphasizes the potential influence of morphological richness on RNN effectiveness.
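The agreement evaluation underlying these results reduces to checking whether the language model assigns higher probability to the correct verb form than to the wrong one, given the left context. A minimal sketch, using a hand-filled table of log-probabilities as a stand-in for the trained RNN's softmax scores (the contexts, values, and `TOY_LM_LOGPROB` name are illustrative assumptions, not from the paper):

```python
import math

# Stand-in log P(verb | context); a real evaluation queries the
# trained RNN language model at the position of the target verb.
TOY_LM_LOGPROB = {
    ("the dogs near the car", "bark"): math.log(0.04),
    ("the dogs near the car", "barks"): math.log(0.01),
    ("the key to the cabinets", "is"): math.log(0.03),
    ("the key to the cabinets", "are"): math.log(0.05),  # attraction error
}

def agreement_accuracy(items):
    """Fraction of items where the correct verb form outscores
    the number-mismatched alternative."""
    hits = sum(
        TOY_LM_LOGPROB[(ctx, good)] > TOY_LM_LOGPROB[(ctx, bad)]
        for ctx, good, bad in items
    )
    return hits / len(items)

items = [
    ("the dogs near the car", "bark", "barks"),
    ("the key to the cabinets", "is", "are"),
]
print(agreement_accuracy(items))  # -> 0.5 on this toy table
```

The second toy item illustrates an agreement-attraction error, where an intervening plural noun ("cabinets") lures the model toward the wrong verb form; such intervening attractors are exactly what makes long-distance agreement a probe of hierarchical, rather than linear, structure.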

Implications and Future Directions

This research strengthens the view that RNNs can internalize complex syntactic rules merely from exposure to broad-scale corpus data. Practically, this underscores the viability of RNNs in applications requiring sophisticated language understanding, such as translation or syntactic parsing, where explicit syntactic information isn't always readily available.

Theoretically, the paper suggests that natural data might contain sufficient implicit syntactic cues to support the learning of abstract grammatical structure, even in systems without an explicit syntactic bias. This challenges the assumption, made for both human language acquisition and artificial systems, that such a bias is necessary.

Future research directions could involve expanding the types of syntactic phenomena studied, such as case assignment or gap licensing, to probe more intricate aspects of RNN linguistic competence. Additionally, deploying constructed evaluation sentences could isolate specific syntactic capabilities, complementing the corpus-driven approach for a more nuanced understanding of RNN syntactic processing.

In conclusion, this paper provides compelling evidence that recurrent architectures trained with general language objectives can capture and exploit underlying syntactic hierarchies, aligning machine competencies more closely with certain aspects of human language understanding.