
Template-Based Named Entity Recognition Using BART (2106.01760v1)

Published 3 Jun 2021 in cs.CL

Abstract: There is a recent interest in investigating few-shot NER, where the low-resource target domain has different label sets compared with a resource-rich source domain. Existing methods use a similarity-based metric; however, they cannot make full use of knowledge transfer in NER model parameters. To address this issue, we propose a template-based method for NER, treating NER as a language model ranking problem in a sequence-to-sequence framework, where original sentences and statement templates filled by candidate named entity spans are regarded as the source sequence and the target sequence, respectively. For inference, the model is required to classify each candidate span based on the corresponding template scores. Our experiments demonstrate that the proposed method achieves a 92.55% F1 score on CoNLL03 (rich-resource task) and outperforms fine-tuning BERT by 10.88%, 15.34%, and 11.73% F1 on the MIT Movie, the MIT Restaurant, and the ATIS (low-resource tasks), respectively.

Authors (5)
  1. Leyang Cui (50 papers)
  2. Yu Wu (196 papers)
  3. Jian Liu (404 papers)
  4. Sen Yang (191 papers)
  5. Yue Zhang (620 papers)
Citations (312)

Summary

Analysis of Template-Based Named Entity Recognition Using BART

The paper examines few-shot Named Entity Recognition (NER), where labeled data in the target domain is scarce. Traditional NER models, such as those built on BiLSTM or BERT, generally require large, consistent datasets with predefined entity categories; when label sets vary across domains, their output layers must be reconfigured and the models retrained, which is resource-intensive.

The authors propose a novel template-based approach that recasts NER as a language model ranking problem, harnessing the capabilities of BART, a pre-trained sequence-to-sequence (seq2seq) model. The original sentence is treated as the source sequence, while statement templates filled with a candidate entity span (e.g., "Bangkok is a location entity") serve as target sequences. BART assigns each filled template a generation score, and each candidate span is classified according to its highest-scoring template, including a template for non-entities.
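To make the scoring step concrete, below is a minimal sketch of how such template scoring could be implemented with the Hugging Face transformers library. This is not the authors' released code: the template wordings follow the paper's examples, but the length-normalized score (the negated mean cross-entropy that transformers returns as `loss`) is a simplification of the paper's product of decoder token probabilities.

```python
# A minimal sketch of template scoring with BART (not the authors' released
# code). Assumes the Hugging Face `transformers` and `torch` packages.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
model.eval()

def template_score(sentence: str, template: str) -> float:
    """Score `template` as a target sequence given `sentence` as the source.

    Returns the negated mean cross-entropy over target tokens, i.e. the
    average log-probability per token. The paper instead multiplies the
    decoder token probabilities (a sum of log-probs); the normalized
    variant is used here only to keep the sketch short.
    """
    src = tokenizer(sentence, return_tensors="pt")
    tgt = tokenizer(template, return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=src.input_ids,
                    attention_mask=src.attention_mask,
                    labels=tgt.input_ids)
    return -out.loss.item()

sentence = "ACL will be held in Bangkok"
# The entity template should outscore the non-entity template for "Bangkok".
print(template_score(sentence, "Bangkok is a location entity"))
print(template_score(sentence, "Bangkok is not a named entity"))
```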

Key Results and Findings

The proposed method improves upon baseline performance in both the resource-rich setting (CoNLL03) and the low-resource settings (MIT Movie, MIT Restaurant, and ATIS). Template-based BART achieves an F1 score of 92.55% on CoNLL03, demonstrating its robustness in a rich-resource scenario. In the low-resource settings, it surpasses fine-tuned BERT by substantial margins: 10.88%, 15.34%, and 11.73% F1 on MIT Movie, MIT Restaurant, and ATIS, respectively.

Methodological Innovations

  1. Template-Centric Framework: The shift to a template-based framework allows the model to gracefully handle variations in label sets without modifying the architecture for each new domain; adapting to a new label set only requires writing new templates (see the sketch after this list). This adaptability is crucial for few-shot and cross-domain scenarios.
  2. Utilization of BART's Seq2seq Structure: By leveraging the generative nature of BART, the method benefits from its strong generalization capabilities, reducing reliance on text patterns specific to any single domain.
  3. Continual Learning: Unlike traditional methods, this approach allows continuous learning without retraining from scratch. When new domains or entity types arise, the model can be fine-tuned efficiently by updating weights without altering the underlying architecture.
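The inference loop sketched below illustrates the first point: candidate spans are enumerated and compared against one filled template per entity type, plus a non-entity template. Because the label set is ordinary data rather than part of the network, switching domains means switching strings. This is a hedged illustration built on the `template_score` helper above; the exact span-length cap and the paper's handling of overlapping spans are simplified here.

```python
# Illustrative inference loop (assumes `template_score` from the sketch
# above). Enumerates n-gram spans up to `max_len` words, fills each into one
# template per entity type plus a non-entity template, and keeps the span if
# an entity template wins. Overlap resolution from the paper is omitted.
LABELS = ["person", "location", "organization"]  # swap freely per domain

def classify_spans(sentence: str, max_len: int = 8):
    words = sentence.split()
    predictions = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            span = " ".join(words[i:j])
            scores = {label: template_score(sentence, f"{span} is a {label} entity")
                      for label in LABELS}
            scores["O"] = template_score(sentence, f"{span} is not a named entity")
            best = max(scores, key=scores.get)
            if best != "O":
                predictions.append((span, best))
    return predictions

print(classify_spans("ACL will be held in Bangkok"))
# Expected shape of the output: [("Bangkok", "location")]
```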

Implications and Future Work

The template-based method presents compelling evidence for using pre-trained language models in NER tasks, especially in few-shot learning scenarios. The research highlights the potential efficacy of generative seq2seq models in handling sequence labeling tasks traditionally solved through classification-based approaches.

Future research could explore automated generation of effective templates and further refine the scoring mechanism to enhance cross-domain transferability. Furthermore, investigating the model's efficacy with diverse pre-training corpora and domain-specific enhancements could unlock broader applications.

In summary, this paper contributes significantly to the field of NER, offering a viable solution for overcoming data scarcity. The approach's flexibility and robustness suggest an effective blueprint for deploying NER models across varied domains and adapting to dynamic industrial requirements.