Deep Biaffine Attention for Neural Dependency Parsing (1611.01734v3)

Published 6 Nov 2016 in cs.CL and cs.NE

Abstract: This paper builds off recent work from Kiperwasser & Goldberg (2016) using neural attention in a simple graph-based dependency parser. We use a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. Our parser gets state of the art or near state of the art performance on standard treebanks for six different languages, achieving 95.7% UAS and 94.1% LAS on the most popular English PTB dataset. This makes it the highest-performing graph-based parser on this benchmark---outperforming Kiperwasser & Goldberg (2016) by 1.8% and 2.2%---and comparable to the highest performing transition-based parser (Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show which hyperparameter choices had a significant effect on parsing accuracy, allowing us to achieve large gains over other graph-based approaches.

Citations (1,185)

Summary

  • The paper introduces a deep biaffine attention mechanism that replaces the MLP-based attention of earlier graph-based parsers with biaffine (bilinear plus bias) transformations for predicting dependency arcs and labels.
  • It utilizes dimensionality reduction via MLPs and extensive dropout regularization to streamline LSTM outputs and prevent overfitting.
  • The parser achieves near-SOTA performance, recording 95.7% UAS and 94.1% LAS on the English Penn Treebank dataset.

Deep Biaffine Attention for Neural Dependency Parsing: An Expert Overview

The paper "Deep Biaffine Attention for Neural Dependency Parsing" by Timothy Dozat and Christopher D. Manning presents significant advancements in the domain of graph-based neural dependency parsing. This work primarily builds on the neural attention mechanism introduced by Kiperwasser and Goldberg (2016), integrating various innovative systems to enhance parsing accuracy for multiple languages.

Core Contributions

The essence of this paper is the introduction of a deep biaffine attention mechanism in a graph-based neural dependency parser. The proposed parser uses biaffine classifiers to predict arcs and labels within a larger but more thoroughly regularized architecture than other recent BiLSTM-based approaches, achieving competitive performance. Key innovations include:

  1. Deep Biaffine Attention: Arcs and labels are scored with biaffine transformations (a bilinear term plus bias terms) over the recurrent states, diverging from traditional MLP-based attention. This keeps the scoring conceptually simple while letting the model capture both the suitability of a specific head-dependent pair and the prior likelihood of each word acting as a head (see the sketch after this list).
  2. Dimensionality Reduction via MLPs: By reducing the dimensionality of LSTM output vectors before applying biaffine transformations, the model effectively strips away superfluous information, which would otherwise complicate the decision-making process and risk overfitting.
  3. Extensive Regularization: The paper incorporates extensive regularization techniques, including word and tag dropout, LSTM dropout, and MLP dropout, to prevent overfitting and ensure robust model performance.
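
The following NumPy sketch illustrates the arc-scoring path described in items 1 and 2: BiLSTM outputs are first reduced by small ReLU MLPs into head- and dependent-specific vectors, and arcs are then scored with a biaffine transform. All dimensions, weights, and names here are illustrative placeholders rather than the paper's trained components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (the real parser uses trained, larger components).
seq_len  = 5      # tokens in the sentence
lstm_dim = 800    # concatenated BiLSTM output size
arc_dim  = 500    # reduced "arc" representation size

def relu_mlp(x, W, b):
    # Single ReLU layer reducing BiLSTM states to a role-specific vector.
    return np.maximum(0.0, x @ W + b)

# Stand-in for BiLSTM outputs, one row per token.
R = rng.standard_normal((seq_len, lstm_dim))

# Separate MLPs produce a "dependent" view and a "head" view of each token.
W_dep,  b_dep  = rng.standard_normal((lstm_dim, arc_dim)) * 0.01, np.zeros(arc_dim)
W_head, b_head = rng.standard_normal((lstm_dim, arc_dim)) * 0.01, np.zeros(arc_dim)
H_dep  = relu_mlp(R, W_dep,  b_dep)    # (seq_len, arc_dim)
H_head = relu_mlp(R, W_head, b_head)   # (seq_len, arc_dim)

# Biaffine arc scoring: a bilinear term plus a head-side bias term, so the
# score reflects both the specific head-dependent pairing and how likely
# each word is to take dependents at all.
U = rng.standard_normal((arc_dim, arc_dim)) * 0.01
u = rng.standard_normal(arc_dim) * 0.01
S_arc = H_dep @ U @ H_head.T + (H_head @ u)[None, :]   # S_arc[i, j]: score of word j heading word i

# Greedy per-token head choice; a full parser would additionally enforce
# that the result forms a well-formed tree.
pred_heads = S_arc.argmax(axis=-1)
print(pred_heads)
```

Label prediction follows the same pattern, with smaller label-specific MLP reductions and a biaffine classifier over the set of dependency labels.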

Experimental Setup and Results

The proposed parser achieves state-of-the-art (SOTA) or near-SOTA performance on multiple standard benchmarks. For instance, it reaches 95.7% Unlabeled Attachment Score (UAS) and 94.1% Labeled Attachment Score (LAS) on the English Penn Treebank (PTB) dataset. This performance is competitive with the highest-performing models, such as the transition-based parser by Kuncoro et al., which achieves 95.8% UAS and 94.6% LAS.
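
To make the reported metrics concrete, the sketch below shows how UAS and LAS are computed as token-level accuracies over predicted and gold heads and labels; standard PTB evaluation additionally excludes punctuation and follows other conventions omitted here.

```python
def attachment_scores(pred_heads, pred_labels, gold_heads, gold_labels):
    """Token-level UAS/LAS over aligned token lists.

    UAS: fraction of tokens whose predicted head is correct.
    LAS: fraction whose predicted head AND dependency label are correct.
    """
    total = len(gold_heads)
    uas_hits = sum(p == g for p, g in zip(pred_heads, gold_heads))
    las_hits = sum(
        ph == gh and pl == gl
        for ph, gh, pl, gl in zip(pred_heads, gold_heads, pred_labels, gold_labels)
    )
    return uas_hits / total, las_hits / total

# Tiny example: 4 tokens, one wrong head and one wrong label.
uas, las = attachment_scores(
    pred_heads=[2, 0, 2, 3], pred_labels=["nsubj", "root", "dobj", "amod"],
    gold_heads=[2, 0, 2, 2], gold_labels=["nsubj", "root", "iobj", "amod"],
)
print(f"UAS={uas:.2f}, LAS={las:.2f}")  # UAS=0.75, LAS=0.50
```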

Performance Analysis

Several hyperparameter choices significantly impacted the model's performance:

  • Network Size: The model utilizes three layers of 400-dimensional bidirectional LSTMs and 500-dimensional ReLU MLP layers. Testing different configurations revealed that increasing the LSTM size and using deeper networks improved performance (the key settings are summarized in the configuration sketch after this list).
  • Attention Mechanism: The deep biaffine attention mechanism outperformed shallow bilinear and MLP-based mechanisms in both accuracy and speed, demonstrating its efficacy.
  • Recurrent Cell Types: Among the cell types explored (LSTM, GRU, and coupled input-forget gate Cif-LSTM), standard LSTM cells offered the best accuracy; Cif-LSTM cells struck a balance between speed and accuracy and substantially outperformed GRUs.
  • Dropout: Implementing dropout across different stages of the network was critical for preventing overfitting.
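
A compact way to capture the hyperparameters highlighted above is a configuration object. The sketch below records the values stated in this overview (three 400-dimensional BiLSTM layers, 500-dimensional ReLU MLPs, LSTM cells, dropout at several stages); the remaining fields and all specific dropout rates are illustrative assumptions rather than the paper's exact settings.

```python
from dataclasses import dataclass

@dataclass
class ParserConfig:
    # Encoder: values stated in the overview above.
    bilstm_layers: int = 3        # three stacked bidirectional LSTM layers
    bilstm_hidden: int = 400      # 400-dimensional hidden state per direction
    recurrent_cell: str = "lstm"  # LSTM outperformed GRU and Cif-LSTM variants

    # Role-specific reductions before the biaffine classifiers.
    arc_mlp_dim: int = 500        # 500-dimensional ReLU MLP for arc scoring
    label_mlp_dim: int = 100      # smaller reduction for label scoring (illustrative)

    # Regularization applied at several stages of the network.
    word_tag_dropout: float = 0.33  # drop word/tag embeddings (illustrative rate)
    lstm_dropout: float = 0.33      # dropout in/between LSTM layers (illustrative rate)
    mlp_dropout: float = 0.33       # dropout on the MLP layers (illustrative rate)

config = ParserConfig()
print(config)
```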

Implications and Future Work

The results underscore the benefits of a well-regularized, deep graph-based dependency parser with biaffine attention. The model's strong performance across treebanks for six languages indicates its potential for broader applicability in diverse NLP tasks. Future research could focus on closing the remaining LAS gap relative to the best transition-based models, for example by improving out-of-vocabulary word handling, especially in morphologically rich languages, and by incorporating mechanisms that better capture phrasal compositionality.

In summary, this paper presents a robust graph-based neural dependency parser with innovations in attention mechanisms and network regularization, contributing valuable insights and methodologies for advancing natural language understanding tasks in computational linguistics.
