
Globally Normalized Transition-Based Neural Networks (1603.06042v2)

Published 19 Mar 2016 in cs.CL, cs.LG, and cs.NE

Abstract: We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.

An Analysis of Globally Normalized Transition-Based Neural Networks

This paper presents a globally normalized transition-based neural network model that achieves superior performance in part-of-speech tagging, dependency parsing, and sentence compression. The approach deviates from the commonly used recurrent models, such as LSTMs, by employing a straightforward feed-forward neural network. A distinguishing feature of this method is its global normalization strategy, which addresses the label bias problem prevalent in locally normalized models.

Core Contributions

The principal contribution of the paper lies in demonstrating that feed-forward networks, when globally normalized, can match or surpass the performance of recurrent architectures on several NLP tasks. This assertion challenges the perceived necessity of recurrence for high accuracy in these tasks. The authors elucidate that global normalization enhances model expressiveness by overcoming the constraints imposed by local normalization, thereby allowing the model to effectively leverage evidence as it becomes available.

Methodology

The model is built on a task-specific transition system whose decisions are scored by a feed-forward network and normalized globally over complete decision sequences. Training uses a CRF-style objective, which provides the global normalization that mitigates the label bias problem. Beam search is used both at inference time and during training, where the partition function is approximated by summing over the sequences retained in the beam. Full backpropagation is applied to all network parameters, in contrast to approaches that fix some parameters during the structured training phase.
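
To make the training objective concrete, the following is a minimal sketch of a CRF-style loss with a beam-approximated partition function. It assumes per-step transition scores have already been computed elsewhere; the function name `crf_beam_loss` and the tensor shapes are illustrative, not taken from the paper's released code.

```python
import torch

def crf_beam_loss(gold_scores: torch.Tensor, beam_scores: torch.Tensor) -> torch.Tensor:
    """Globally normalized (CRF-style) loss with a beam-approximated partition function.

    gold_scores: (n,) per-step scores of the gold decision sequence.
    beam_scores: (B, n) per-step scores of the B sequences retained in the beam.
    The gold sequence is appended to the normalizer here for simplicity.
    """
    gold_total = gold_scores.sum()          # total score of the gold derivation
    beam_totals = beam_scores.sum(dim=1)    # total score of each beam item
    # Approximate log Z by summing over the beam instead of all possible sequences.
    log_z = torch.logsumexp(torch.cat([beam_totals, gold_total.unsqueeze(0)]), dim=0)
    return log_z - gold_total               # negative log-likelihood under the approximation


# Illustrative usage with random scores (hypothetical sizes: beam of 8, 20 transitions).
gold = torch.randn(20, requires_grad=True)
beam = torch.randn(8, 20, requires_grad=True)
loss = crf_beam_loss(gold, beam)
loss.backward()  # gradients flow back through gold and beam scores to the scoring network
```

Because the normalizer depends only on the sequences in the beam, the gradient can be computed with ordinary backpropagation through the scoring network, which is what allows all parameters to be trained jointly.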

Experimental Evaluation

The model's efficacy is validated across tasks such as part-of-speech tagging, dependency parsing, and sentence compression, achieving state-of-the-art results. Notably, the model sets a new record for the Wall Street Journal dependency parsing dataset with an unlabeled attachment score of 94.61%. The results indicate a significant performance improvement attributed to global training strategies over previous approaches relying on local normalization and limited backpropagation.

Theoretical Insights

The paper provides a theoretical framework by revisiting the label bias problem, illustrating that globally normalized models exhibit greater expressiveness compared to locally normalized counterparts. The authors present a formal proof demonstrating that globally normalized models can represent a broader class of distributions, highlighting the limitations of local normalization through concrete examples.
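
The contrast between the two formulations can be written compactly. The formulas below restate the locally and globally normalized models in the paper's notation, where $\rho(d_{1:j-1}, d_j; \theta)$ is the score of decision $d_j$ given the preceding decisions:

```latex
% Locally normalized: each decision is normalized on its own
p_L(d_{1:n}) \;=\; \prod_{j=1}^{n}
  \frac{\exp \rho(d_{1:j-1}, d_j;\theta)}
       {\sum_{d'} \exp \rho(d_{1:j-1}, d';\theta)}

% Globally normalized (CRF): a single normalizer over complete sequences
p_G(d_{1:n}) \;=\;
  \frac{\exp \sum_{j=1}^{n} \rho(d_{1:j-1}, d_j;\theta)}
       {\sum_{d'_{1:n}} \exp \sum_{j=1}^{n} \rho(d'_{1:j-1}, d'_j;\theta)}
```

Because the local model commits probability mass at every step, later evidence cannot redistribute mass across earlier decisions; the global model, normalized only once over complete sequences, does not have this restriction, which is the source of its strictly greater expressiveness.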

Implications and Future Directions

The success of this model has several implications. From a practical standpoint, it offers a path to design efficient and accurate NLP models without the computational overhead of recurrence. Theoretically, it invites further exploration into the integration of global normalization techniques across different neural architectures. Future research might explore the scalability of this approach to other complex structured outputs and its integration with other global optimization techniques.

In conclusion, the paper convincingly argues for the merits of globally normalized, transition-based neural networks. By substantiating the advantages of global normalization with both theoretical and empirical evidence, it opens avenues for further refinements and applications in the field of NLP and AI.

Authors (8)
  1. Daniel Andor (14 papers)
  2. Chris Alberti (23 papers)
  3. David Weiss (16 papers)
  4. Aliaksei Severyn (29 papers)
  5. Alessandro Presta (5 papers)
  6. Kuzman Ganchev (13 papers)
  7. Slav Petrov (19 papers)
  8. Michael Collins (46 papers)
Citations (566)