A Sequence-to-Set Network for Nested Named Entity Recognition (2105.08901v2)

Published 19 May 2021 in cs.CL

Abstract: Named entity recognition (NER) is a widely studied task in natural language processing. Recently, a growing number of studies have focused on nested NER. Span-based methods, which treat entity recognition as a span classification task, can handle nested entities naturally, but they suffer from a huge search space and a lack of interactions between entities. To address these issues, we propose a novel sequence-to-set neural network for nested NER. Instead of specifying candidate spans in advance, we provide a fixed set of learnable vectors to learn the patterns of the valuable spans. We utilize a non-autoregressive decoder to predict the final set of entities in one pass, in which we are able to capture dependencies between entities. Compared with the sequence-to-sequence method, our model is more suitable for such an unordered recognition task, as it is insensitive to label order. In addition, we utilize a loss function based on bipartite matching to compute the overall training loss. Experimental results show that our proposed model achieves state-of-the-art performance on three nested NER corpora: ACE 2004, ACE 2005 and KBP 2017. The code is available at https://github.com/zqtan1024/sequence-to-set.

A Sequence-to-Set Network for Nested Named Entity Recognition

The paper "A Sequence-to-Set Network for Nested Named Entity Recognition" by Zeqi Tan and colleagues presents a novel approach to addressing the complexities involved in Nested Named Entity Recognition (NER). Recognizing nested entities is a challenging aspect of NER as it involves detecting entities within other entities, which exacerbates the search space and complicates the modeling of entity relationships. The authors propose a sequence-to-set neural network that addresses these challenges by reformulating the entity recognition task into a set prediction task, leveraging a fixed set of learnable vectors.

Methodology

The proposed model departs from conventional span-based and sequence-to-sequence approaches. It uses a non-autoregressive decoder to predict the entire entity set in a single forward pass. This paradigm shift removes the sensitivity to label order that characterizes sequence-to-sequence methods and naturally accommodates unordered outputs. The core components of the model are:

  1. Sequence Encoder:
    • The model employs BERT together with a BiLSTM to produce contextual token representations. Each token is represented by a concatenation of BERT embeddings, GloVe embeddings, POS embeddings, and character-level embeddings (a minimal encoder sketch follows this list).
  2. Entity Set Decoder:
    • A transformer-based decoder applies self-attention and cross-attention over a fixed set of learnable vectors, termed entity queries, and predicts the entities in a single non-autoregressive pass. Self-attention among the queries allows the model to capture dependencies between entities (see the second sketch below).
  3. Bipartite Matching Loss:
    • Unlike a cross-entropy loss that depends on label order, the training objective computes an optimal one-to-one matching between predictions and gold-standard entities using the Hungarian algorithm, which better suits an unordered recognition task and ensures a robust alignment for loss computation (see the third sketch below).
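
To make the encoder concrete, here is a minimal PyTorch sketch. This is not the authors' released code (see their GitHub for the actual implementation); the embedding dimensions and the single BiLSTM layer are assumptions for illustration.

```python
# Minimal sketch of the sequence encoder: concatenate BERT, GloVe, POS,
# and character-level embeddings, then fuse them with a BiLSTM.
# All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    def __init__(self, bert_dim=768, glove_dim=100, pos_dim=50,
                 char_dim=50, hidden_dim=512):
        super().__init__()
        input_dim = bert_dim + glove_dim + pos_dim + char_dim
        # BiLSTM fuses the concatenated embeddings into contextual states;
        # hidden_dim // 2 per direction yields hidden_dim total.
        self.bilstm = nn.LSTM(input_dim, hidden_dim // 2,
                              batch_first=True, bidirectional=True)

    def forward(self, bert_emb, glove_emb, pos_emb, char_emb):
        # Each input: (batch, seq_len, dim); concatenate along features.
        x = torch.cat([bert_emb, glove_emb, pos_emb, char_emb], dim=-1)
        out, _ = self.bilstm(x)  # (batch, seq_len, hidden_dim)
        return out
```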
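The decoder can be sketched in the same spirit. The number of queries, layer count, and the pointer-style span heads below are assumptions; the ideas taken from the paper are the learnable entity queries, the absence of a causal mask (non-autoregressive decoding), and self-attention among queries.

```python
# Illustrative sketch of the non-autoregressive entity-set decoder:
# a fixed number of learnable entity queries attend to the encoded
# sentence (cross-attention) and to each other (self-attention),
# producing all entity predictions in one forward pass.
import torch
import torch.nn as nn

class EntitySetDecoder(nn.Module):
    def __init__(self, hidden_dim=512, num_queries=60, num_types=8,
                 num_layers=3, num_heads=8):
        super().__init__()
        # Learnable vectors acting as "slots" of the predicted entity set.
        self.entity_queries = nn.Parameter(torch.randn(num_queries, hidden_dim))
        layer = nn.TransformerDecoderLayer(hidden_dim, num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        # Per-query heads: entity type (plus a "no entity" class) and
        # pointer-style scores over token positions for span boundaries.
        self.type_head = nn.Linear(hidden_dim, num_types + 1)
        self.start_head = nn.Linear(hidden_dim, hidden_dim)
        self.end_head = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, memory):
        # memory: (batch, seq_len, hidden_dim) from the sequence encoder.
        batch = memory.size(0)
        queries = self.entity_queries.unsqueeze(0).expand(batch, -1, -1)
        # No causal mask: every query attends to every other query,
        # which is what makes the decoding non-autoregressive.
        h = self.decoder(queries, memory)        # (batch, Q, hidden_dim)
        type_logits = self.type_head(h)          # (batch, Q, num_types+1)
        start_logits = self.start_head(h) @ memory.transpose(1, 2)
        end_logits = self.end_head(h) @ memory.transpose(1, 2)
        return type_logits, start_logits, end_logits
```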
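Finally, a simplified view of the bipartite matching loss using SciPy's Hungarian-algorithm solver. The cost terms (summed probabilities of the gold type and boundaries) are a simplification rather than the paper's exact formulation, and unmatched queries are only noted in a comment.

```python
# Simplified bipartite matching loss for one sentence (unbatched):
# find the optimal assignment of entity queries to gold entities with
# the Hungarian algorithm, then apply negative log-likelihood on the
# matched pairs. Cost terms here are illustrative assumptions.
import torch
from scipy.optimize import linear_sum_assignment

def match_and_loss(type_logits, start_logits, end_logits, gold):
    # gold: list of (type_id, start_idx, end_idx) tuples.
    type_p = type_logits.softmax(-1)    # (Q, num_types+1)
    start_p = start_logits.softmax(-1)  # (Q, seq_len)
    end_p = end_logits.softmax(-1)      # (Q, seq_len)
    # Cost of assigning query q to gold entity g: negative likelihood
    # of the gold type and boundaries under query q's predictions.
    cost = torch.stack([
        -(type_p[:, t] + start_p[:, s] + end_p[:, e]) for (t, s, e) in gold
    ], dim=1)                           # (Q, num_gold)
    row, col = linear_sum_assignment(cost.detach().numpy())
    # NLL on matched pairs; a full implementation would also train
    # unmatched queries toward the "no entity" class.
    loss = sum(
        -torch.log(type_p[q, gold[g][0]] + 1e-9)
        - torch.log(start_p[q, gold[g][1]] + 1e-9)
        - torch.log(end_p[q, gold[g][2]] + 1e-9)
        for q, g in zip(row, col)
    )
    return loss / max(len(gold), 1)
```

Note that `linear_sum_assignment` accepts rectangular cost matrices, so the number of queries can exceed the number of gold entities; each gold entity is then matched to exactly one query, which is the property that makes the loss insensitive to label order.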

Results and Evaluation

The model was assessed across three datasets: ACE 2004, ACE 2005, and KBP 2017. Experimental results indicate that the proposed model achieves state-of-the-art performance, surpassing competing models in F1 scores by 0.56% on ACE 2004, 1.65% on ACE 2005, and 2.99% on KBP 2017. The paper highlights the model's efficiency in recognizing nested entities by capitalizing on the unordered nature of set prediction.

Implications and Future Directions

The transition to a set prediction framework for nested NER, as proposed in this paper, is a notable innovation in handling the inherently unordered nature of entity sets. This approach could inspire future work in other domains where unordered output prediction can yield improvements, particularly across structured prediction tasks in AI.

Moreover, because the design inherently models interactions between entity queries, it suggests avenues for improved representation learning through further refinement of the self-attention mechanism. The method could also be combined with newer transformer architectures or domain-specific embeddings to further improve performance.

In conclusion, the authors demonstrate the efficacy of a novel sequence-to-set approach in the field of Nested NER, paving the way for strategies that challenge conventional sequence-based prediction models and address complex output dependencies more effectively.

Authors (5)
  1. Zeqi Tan (18 papers)
  2. Yongliang Shen (47 papers)
  3. Shuai Zhang (319 papers)
  4. Weiming Lu (54 papers)
  5. Yueting Zhuang (164 papers)
Citations (75)