- The paper introduces two innovative models—a multilabel LSTM-CRF and a seq2seq approach—to effectively capture complex nested entity structures.
- The methodology linearizes nested labels, enabling straightforward integration into existing pipelines and bypassing traditional hypergraph structures.
- Empirical results on ACE, GENIA, and Czech CNEC demonstrate significant F1 score improvements, especially when augmented with embeddings like BERT and ELMo.
Neural Architectures for Nested NER through Linearization
The paper "Neural Architectures for Nested NER through Linearization" proposes innovative methodologies for nested named entity recognition (NER) by introducing two distinct neural architectures. This work addresses the inherent complexity of nested NER, where entities can overlap and be labeled with multiple tags. The researchers exploit a linearized scheme to encode nested labels, a strategy that circumvented the need to explicitly construct structures like hypergraphs or constituency trees traditionally utilized in this domain.
Methodologies and Architectural Contributions
- LSTM-CRF Multilabel Model:
- This approach models nested labels as multilabels drawn from the Cartesian product of all nested entity labels, and operates within a standard LSTM-CRF framework.
- Its main advantage is straightforward integration into existing pipelines that already implement the LSTM-CRF architecture. The trade-off is a larger tag set, since every observed combination of nested labels becomes its own NE class.
- Sequence-to-Sequence Model (Seq2Seq):
- By recasting nested NER as a seq2seq task, the model treats the token sequence as input and the label sequence as output. Using hard attention on the currently processed word, the decoder emits labels for that token until a designated end-of-word tag is produced, then moves on (see the decoding sketch after this list).
- This model is more complex to train and decode, but it is better at capturing longer and more deeply nested entities.
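As a rough illustration of the seq2seq decoding described above, the sketch below simulates the hard-attention control flow: the decoder keeps emitting labels for the currently attended token and advances to the next token only after producing an end-of-word symbol. The `predict_label` callable is a hypothetical stand-in for the trained decoder, not the authors' code.

```python
# Sketch of the seq2seq decoding control flow with hard attention.
# `predict_label` is a hypothetical stub; a trained decoder would produce
# labels conditioned on its hidden state and the attended token.
EOW = "<eow>"  # end-of-word symbol that moves attention to the next token

def decode(tokens, predict_label, max_labels_per_token=8):
    outputs = []                       # one list of labels per input token
    for position, token in enumerate(tokens):
        labels = []
        while len(labels) < max_labels_per_token:
            label = predict_label(position, token, labels)
            if label == EOW:           # decoder signals it is done with this token
                break
            labels.append(label)       # nested entities yield several labels here
        outputs.append(labels or ["O"])
    return outputs

# Toy stub that always emits "O" and then the end-of-word symbol.
print(decode(["John", "Smith"], lambda pos, tok, prev: "O" if not prev else EOW))
# [['O'], ['O']]
```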
Empirical Evaluation and Results
The proposed architectures were evaluated on four established nested NE corpora: ACE-2004, ACE-2005, GENIA, and Czech CNEC. The results indicate substantial improvements over the previous state of the art:
- The seq2seq model, despite its greater complexity, significantly outperforms earlier approaches on ACE-2004 and ACE-2005, which is attributable to the longer and more deeply nested entities in these corpora.
- When augmented with contextual embeddings such as ELMo, BERT, and Flair, both models showed marked F1 gains across the evaluated corpora. With these embeddings, the models surpassed previous benchmarks not only for nested NER but also for flat NER in languages such as Dutch, Spanish, and English on the CoNLL datasets.
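One practical way to reproduce this kind of augmentation (not the authors' original pipeline) is to stack pretrained contextual embeddings into a single per-token vector with the flair library and feed that vector to the tagger; the specific model names below are illustrative.

```python
# Sketch: concatenating contextual embeddings into one input vector per token,
# using the flair library (illustrative; not the paper's original implementation).
from flair.data import Sentence
from flair.embeddings import (FlairEmbeddings, StackedEmbeddings,
                              TransformerWordEmbeddings)

embeddings = StackedEmbeddings([
    TransformerWordEmbeddings("bert-base-cased"),  # BERT subwords pooled per token
    FlairEmbeddings("news-forward"),               # contextual character LM, forward
    FlairEmbeddings("news-backward"),              # and backward direction
])

sentence = Sentence("The University of Prague campus")
embeddings.embed(sentence)
for token in sentence:
    print(token.text, token.embedding.shape)  # concatenated vector per token
```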
Implications and Future Research Trajectories
The implications of these findings are twofold. Practically, the introduction of scalable neural architectures that efficiently model nested entities opens pathways for more sophisticated information extraction systems with broader applications in domains demanding precise entity recognition. Theoretically, the success of purely neural solutions indirectly critiques the limitations of earlier structured approaches such as hypergraph modeling.
Future research could focus on reducing the seq2seq architecture's decoding complexity while preserving its accuracy. Moreover, integrating more recent large language models could offer further gains. Another avenue is adapting these architectures to low-resource languages, or to languages whose morphological complexity calls for different model configurations.
Ultimately, the paper lays a robust foundation for advancing nested NER by validating the efficacy of neural network architectures that implicitly model the complex relationships between overlapping named entities.