
Named Entity Recognition as Dependency Parsing (2005.07150v3)

Published 14 May 2020 in cs.CL

Abstract: Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text expressing references to entities. NER research is often focused on flat entities only (flat NER), ignoring the fact that entity references can be nested, as in [Bank of [China]]. In this paper, we use ideas from graph-based dependency parsing to provide our model a global view on the input via a biaffine model (Dozat and Manning, 2017). The biaffine model scores pairs of start and end tokens in a sentence which we use to explore all spans, so that the model is able to predict named entities accurately. We show that the model works well for both nested and flat NER through evaluation on 8 corpora and achieving SoTA performance on all of them, with accuracy gains of up to 2.2 percentage points.

Authors (3)
  1. Juntao Yu (13 papers)
  2. Bernd Bohnet (21 papers)
  3. Massimo Poesio (28 papers)
Citations (379)

Summary

Overview of "Named Entity Recognition as Dependency Parsing"

The paper "Named Entity Recognition as Dependency Parsing" recasts Named Entity Recognition (NER) as a structured prediction problem rather than the conventional sequence labeling task. The authors, Juntao Yu, Bernd Bohnet, and Massimo Poesio, adapt the biaffine model from graph-based dependency parsing (Dozat and Manning, 2017) to handle flat and nested named entities in a single framework. Evaluated on eight distinct corpora, the method achieves state-of-the-art (SoTA) results, with accuracy gains of up to 2.2 percentage points over previous methods.

Methodology

The core of the approach is a biaffine model that assigns a score to every pair of start and end tokens in a sentence, so that each candidate span receives a score for each entity category. Because candidate spans may overlap or nest, the same mechanism handles both nested and flat NER. The system combines BERT and fastText embeddings with a BiLSTM to produce contextualized representations, and the use of distinct representations for span start and end points is noted as a critical factor in achieving improved prediction accuracy.
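
To make the scoring concrete, the following is a minimal PyTorch-style sketch of a biaffine span scorer under the formulation described above; the class name, dimensions, and initialization are illustrative assumptions, not the authors' released code.

  import torch
  import torch.nn as nn

  class BiaffineSpanScorer(nn.Module):
      # Scores every (start, end) token pair for every entity category c:
      #   s(i, j, c) = hs(i)^T U_c he(j) + W_c [hs(i); he(j)] + b_c
      # (hypothetical sketch; hidden sizes and names are assumptions)
      def __init__(self, hidden_dim, num_classes):
          super().__init__()
          # Separate FFNNs give tokens distinct start and end representations.
          self.start_ffnn = nn.Linear(hidden_dim, hidden_dim)
          self.end_ffnn = nn.Linear(hidden_dim, hidden_dim)
          self.U = nn.Parameter(torch.randn(num_classes, hidden_dim, hidden_dim) * 0.01)
          self.W = nn.Linear(2 * hidden_dim, num_classes)  # includes the bias b_c

      def forward(self, h):  # h: (seq_len, hidden_dim) BiLSTM outputs
          hs = torch.relu(self.start_ffnn(h))  # start representations
          he = torch.relu(self.end_ffnn(h))    # end representations
          # Bilinear term: one (seq_len x seq_len) score grid per category.
          bilinear = torch.einsum('sd,cde,te->cst', hs, self.U, he)
          # Linear term over concatenated (start, end) pair representations.
          n = h.size(0)
          pairs = torch.cat([hs.unsqueeze(1).expand(n, n, -1),
                             he.unsqueeze(0).expand(n, n, -1)], dim=-1)
          linear = self.W(pairs).permute(2, 0, 1)
          return bilinear + linear  # (num_classes, seq_len, seq_len)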

Key architectures and methodologies in the model include:

  • Biaffine Classifier: This component significantly contributes to the system’s performance, providing a global view of the sentence that distinguishes it from simpler concatenation methods used in previous works.
  • Embedding Utilization: The model integrates BERT embeddings for robust context-dependent word representations, which yield substantial improvements across diverse NER categories.
  • Multi-layer BiLSTM: As part of the feature extraction, this component is essential for encoding sequential information that supports the span scoring.
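
Prediction then reduces to selecting spans from the score grid. Below is a hedged sketch of such a greedy decoder: spans are ranked by score and kept only if they do not clash with a higher-ranked span, where for nested NER containment is allowed but boundary crossing is not, and for flat NER any overlap counts as a clash. The exact tie-breaking and thresholding details here are assumptions.

  def decode_spans(scored_spans, allow_nesting=True):
      # scored_spans: list of (score, start, end, label) tuples for spans whose
      # best-scoring label is an entity class. Inclusive token indices assumed.
      selected = []
      for score, start, end, label in sorted(scored_spans, reverse=True):
          clash = False
          for _, s, e, _ in selected:
              if allow_nesting:
                  # Boundary crossing: partial overlap with no containment.
                  if start < s <= end < e or s < start <= e < end:
                      clash = True
                      break
              elif start <= e and s <= end:  # flat NER: any overlap clashes
                  clash = True
                  break
          if not clash:
              selected.append((score, start, end, label))
      return selected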

Results and Evaluation

The model was evaluated on three nested NER benchmarks (ACE 2004, ACE 2005, and GENIA) and five flat NER corpora (CoNLL 2002 Dutch and Spanish, CoNLL 2003 English and German, and OntoNotes). Across all eight datasets, the model achieved superior performance, underscoring the efficacy of structured prediction methods over traditional sequence labeling approaches in NER tasks.

Results were particularly notable in:

  • Nested NER tasks, where the model outperformed previous methods by up to 2.2 percentage points, attributed largely to its structured prediction capabilities.
  • Flat NER tasks, where the proposed method consistently achieved high scores across multiple languages and datasets, in several cases by a substantial margin over prior SoTA results.

Implications and Future Directions

The successful application of dependency parsing techniques to NER signals a shift in methodological frameworks used within NLP. The paper's model provides a compelling case for moving away from sequence labeling to more sophisticated structured prediction techniques, which allow for a more nuanced understanding of text structure and context.

Looking forward, several potential directions for development and research can be identified:

  • Integration with Other LLMs: The adoption of even more advanced contextual embeddings could further enhance the model's capabilities.
  • Domain Adaptability: Expanding the model's application across varied domains could yield insights into its generalization ability and adaptability.
  • Multimodal Extensions: Extending the approach to multimodal tasks by integrating textual NER with other data types such as images or video.

Ultimately, this paper sets the stage for future explorations in structured prediction within NER and broader natural language processing tasks, advancing the boundaries of how entity recognition is conceptualized and executed.