- The paper presents a novel BiLSTM-based method that reduces the need for extensive handcrafted feature engineering while delivering high parsing accuracy.
- The approach leverages BiLSTM feature extraction within both transition-based and graph-based parsers to capture rich contextual dependencies.
- Experiments on the Penn Treebank and CTB show competitive performance, reaching up to 93.9 UAS and 91.9 LAS on English with the proposed method.
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
The paper presents a novel approach to dependency parsing that uses Bidirectional Long Short-Term Memory (BiLSTM) networks to generate feature representations, which are then applied to both greedy transition-based and graph-based parsers. The goal is to simplify feature engineering while achieving competitive parsing accuracy.
Motivation and Approach
Dependency parsing seeks to analyze the grammatical structure of a sentence by establishing head-modifier relationships between words. Traditional methods rely on intricate handcrafted feature functions, which are both labor-intensive and limited in adaptability. The authors propose a BiLSTM-based method that reduces the need for extensive feature engineering. Because feature vectors are derived from BiLSTM encodings, each word's representation naturally incorporates the context both preceding and following it in the sentence.
Methodology
BiLSTM Feature Extraction: Each word in a sentence is represented by the concatenation of its word and part-of-speech (POS) tag embeddings. These per-token vectors form the input sequence to a BiLSTM, which produces a context-aware vector for each token. Because this vector already reflects the token's position and surroundings in the sentence, explicit feature combinations are no longer required.
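As a concrete illustration, here is a minimal sketch of such an encoder in PyTorch. The original implementation used a different toolkit, and the class names, dimensions, and hyperparameters below are illustrative assumptions rather than the paper's exact settings:

```python
# Hedged sketch of a BiLSTM token encoder (PyTorch assumed; dimensions
# are illustrative, not the paper's exact configuration).
import torch
import torch.nn as nn

class BiLSTMFeatures(nn.Module):
    def __init__(self, n_words, n_tags, word_dim=100, tag_dim=25, hidden_dim=125):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.tag_emb = nn.Embedding(n_tags, tag_dim)
        # Each token is the concatenation of its word and POS embeddings;
        # a 2-layer BiLSTM contextualizes the sentence in both directions.
        self.bilstm = nn.LSTM(word_dim + tag_dim, hidden_dim, num_layers=2,
                              bidirectional=True, batch_first=True)

    def forward(self, word_ids, tag_ids):
        # word_ids, tag_ids: (batch, seq_len) integer tensors
        x = torch.cat([self.word_emb(word_ids), self.tag_emb(tag_ids)], dim=-1)
        v, _ = self.bilstm(x)  # (batch, seq_len, 2 * hidden_dim)
        return v               # one context-aware vector per token

# Usage: encode a toy 5-token sentence with random ids.
enc = BiLSTMFeatures(n_words=1000, n_tags=50)
v = enc(torch.randint(0, 1000, (1, 5)), torch.randint(0, 50, (1, 5)))
print(v.shape)  # torch.Size([1, 5, 250])
```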
Parsing Architectures:
- Transition-Based Parser: Uses a greedy parser trained with a dynamic oracle. The feature function relies on the BiLSTM representations of a handful of words on the stack and buffer, reducing the traditionally large feature sets to a minimal version with four components or an extended version with eleven (see the first sketch after this list).
- Graph-Based Parser: Scores candidate head-modifier arcs with a neural network trained in a margin-based structured prediction framework, and recovers the best-scoring projective tree with Eisner's algorithm (see the second sketch after this list).
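To make the minimal transition-based feature set concrete, the sketch below gathers the BiLSTM vectors of the top three stack items and the first buffer item, concatenates them, and scores the possible transitions with a small MLP. The function names, padding scheme, and dimensions are assumptions for illustration, not the authors' code:

```python
# Hedged sketch of the four-component feature function for the
# transition-based parser (names and padding scheme are illustrative).
import torch
import torch.nn as nn

def configuration_features(v, stack, buffer, pad):
    # v: (seq_len, d) BiLSTM vectors for one sentence
    # stack, buffer: lists of token indices; pad stands in for empty slots
    slots = (stack[-3:][::-1] + [None] * 3)[:3]   # top three stack items
    slots.append(buffer[0] if buffer else None)   # first buffer item
    return torch.cat([v[i] if i is not None else pad for i in slots])

d = 250  # 2 * hidden_dim from the BiLSTM encoder
mlp = nn.Sequential(nn.Linear(4 * d, 100), nn.Tanh(), nn.Linear(100, 3))

v = torch.randn(5, d)  # encoder output for a 5-token sentence
pad = torch.zeros(d)   # vector used when the stack or buffer runs out
scores = mlp(configuration_features(v, stack=[0, 2], buffer=[3, 4], pad=pad))
print(scores)  # one score per transition, e.g. SHIFT / LEFT-ARC / RIGHT-ARC
```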
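On the graph-based side, each arc score s(h, m) comes from a small network over the BiLSTM vectors of the head and modifier, and the best projective tree over those scores is found with Eisner's dynamic program. Below is a compact, self-contained sketch of first-order Eisner decoding over an arbitrary score matrix; the NumPy formulation and variable names are mine, not the paper's:

```python
# Hedged sketch of first-order projective decoding (Eisner's algorithm).
import numpy as np

def eisner(scores):
    """scores[h, m] = score of the arc h -> m; token 0 is the artificial root.
    Returns head[m] for every token m (head[0] stays -1)."""
    n = scores.shape[0]
    # Span tables: d=0 means the head is the right end, d=1 the left end.
    C = np.zeros((n, n, 2))                 # complete spans
    I = np.zeros((n, n, 2))                 # incomplete spans
    Cb = -np.ones((n, n, 2), dtype=int)     # split-point backpointers
    Ib = -np.ones((n, n, 2), dtype=int)

    for k in range(1, n):
        for i in range(n - k):
            j = i + k
            # Incomplete: join two complete spans and add one arc.
            vals = [C[i, r, 1] + C[r + 1, j, 0] for r in range(i, j)]
            r = i + int(np.argmax(vals))
            I[i, j, 0] = vals[r - i] + scores[j, i]   # arc j -> i
            I[i, j, 1] = vals[r - i] + scores[i, j]   # arc i -> j
            Ib[i, j, 0] = Ib[i, j, 1] = r
            # Complete span headed at the right end j.
            vals = [C[i, r, 0] + I[r, j, 0] for r in range(i, j)]
            r = i + int(np.argmax(vals))
            C[i, j, 0], Cb[i, j, 0] = vals[r - i], r
            # Complete span headed at the left end i.
            vals = [I[i, r, 1] + C[r, j, 1] for r in range(i + 1, j + 1)]
            r = i + 1 + int(np.argmax(vals))
            C[i, j, 1], Cb[i, j, 1] = vals[r - i - 1], r

    head = -np.ones(n, dtype=int)

    def backtrack(i, j, d, complete):
        if i == j:
            return
        if complete:
            r = Cb[i, j, d]
            if d == 0:
                backtrack(i, r, 0, True); backtrack(r, j, 0, False)
            else:
                backtrack(i, r, 1, False); backtrack(r, j, 1, True)
        else:
            r = Ib[i, j, d]
            head[i if d == 0 else j] = j if d == 0 else i  # record the arc
            backtrack(i, r, 1, True); backtrack(r + 1, j, 0, True)

    backtrack(0, n - 1, 1, True)
    return head

# Usage: decode toy scores, e.g. produced by an MLP over BiLSTM vectors.
print(eisner(np.random.randn(5, 5)))  # a head index per token; root gets -1
```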
Experimental Findings
The experiments were conducted on the Penn Treebank (PTB) for English and the Penn Chinese Treebank (CTB) for Chinese, achieving strong accuracy without heavy feature engineering or external embeddings. Key findings include:
- English Parsing: Achieved 93.1 UAS and 91.0 LAS using a simple first-order graph-based model without pre-trained embeddings. The transition-based parser performed comparably.
- Chinese Parsing: Showed similarly competitive performance, reaching 86.6 UAS and 85.1 LAS with the graph-based model.
When pre-trained embeddings were added, accuracy improved further, with the transition-based parser reaching 93.9 UAS and 91.9 LAS on English.
Implications
The proposed BiLSTM-based feature representation substantially reduces reliance on complex feature engineering, providing a flexible and adaptable solution applicable across parsing frameworks. This simplicity in feature extraction opens possibilities for broader applications in natural language processing tasks where contextual understanding is crucial.
Future Directions
Potential avenues for further research include exploring the integration of attention mechanisms with BiLSTM within parsing models and expanding the method to accommodate other languages and more complex syntactic constructs. Furthermore, the interplay between structured prediction models and neural feature extractors could be extended to other domains requiring hierarchical or structured data interpretation.
In conclusion, this work underscores the potential of neural networks, specifically BiLSTMs, to advance dependency parsing by streamlining feature representation, matching state-of-the-art performance while simplifying model architectures.