Template-free Prompt Tuning for Few-shot NER (2109.13532v3)

Published 28 Sep 2021 in cs.CL and cs.AI

Abstract: Prompt-based methods have been successfully applied in sentence-level few-shot learning tasks, mostly owing to the sophisticated design of templates and label words. However, when applied to token-level labeling tasks such as NER, it would be time-consuming to enumerate the template queries over all potential entity spans. In this work, we propose a more elegant method to reformulate NER tasks as LM problems without any templates. Specifically, we discard the template construction process while maintaining the word prediction paradigm of pre-training models to predict a class-related pivot word (or label word) at the entity position. Meanwhile, we also explore principled ways to automatically search for appropriate label words that the pre-trained models can easily adapt to. While avoiding complicated template-based process, the proposed LM objective also reduces the gap between different objectives used in pre-training and fine-tuning, thus it can better benefit the few-shot performance. Experimental results demonstrate the effectiveness of the proposed method over bert-tagger and template-based method under few-shot setting. Moreover, the decoding speed of the proposed method is up to 1930.12 times faster than the template-based method.

Template-free Prompt Tuning in Few-shot Named Entity Recognition

The research focuses on a novel approach to Named Entity Recognition (NER) in few-shot learning scenarios using template-free prompt tuning. NER is a token-level task in NLP whose labeling granularity poses challenges for traditional sentence-level prompt-based methods. The paper introduces an Entity-oriented LM (EntLM) fine-tuning objective, designed to reformulate NER as an LM problem without requiring cumbersome template construction.

Methodology and Approach

The approach circumvents the complexity of template construction in traditional prompt-based methods, which must enumerate queries over all candidate entity spans. By operating without templates, the model retains the masked LM objective used by pre-trained models such as BERT, keeping the fine-tuning objective close to the pre-training task. This alignment improves adaptability to new tasks and is particularly beneficial in few-shot settings.

EntLM trains the model to predict a class-related pivot word (label word) at each entity position while predicting the original token at non-entity positions. This pivot-focused strategy preserves prediction accuracy while improving efficiency: unlike typical prompt-based architectures that require exhaustive span enumeration, EntLM decodes an entire sentence in a single pass, reducing decoding time by a factor of up to 1930.12 relative to the template-based method.
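To make the mechanism concrete, the following is a minimal sketch (not the authors' released code) of how EntLM-style training targets can be built and how one-pass decoding proceeds with a Hugging Face masked LM. The label words, the toy sentence, and the simplified decision rule are assumptions for illustration; meaningful predictions require fine-tuning with the EntLM objective first.

```python
# Minimal sketch of the EntLM idea (illustrative, not the authors' code):
# at entity positions the masked-LM head is trained to predict a class
# label word; at all other positions it predicts the original token.
# Decoding is a single forward pass over the unmodified sentence.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

# Assumed label words, one pivot word per entity class (hypothetical choices).
label_words = {"PER": "John", "LOC": "London", "ORG": "Microsoft"}
label_ids = {c: tokenizer.convert_tokens_to_ids(w) for c, w in label_words.items()}

def build_targets(tokens, tags):
    """Fine-tuning targets: label-word id at entity tokens, the token's own
    id everywhere else. No [MASK] tokens and no template are inserted."""
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return [label_ids[t] if t != "O" else i for i, t in zip(ids, tags)]

tokens = ["Obama", "visited", "Paris"]
tags = ["PER", "O", "LOC"]
targets = build_targets(tokens, tags)  # [id("John"), id("visited"), id("Paris")]

# One-pass decoding (simplified rule; assumes each word is one wordpiece and
# that the model has already been fine-tuned with the EntLM objective).
enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**enc).logits[0].softmax(-1)   # (seq_len, vocab_size)
for pos, tok in enumerate(tokens, start=1):      # position 0 is [CLS]
    best_cls, best_p = max(((c, probs[pos, i].item()) for c, i in label_ids.items()),
                           key=lambda x: x[1])
    own_p = probs[pos, tokenizer.convert_tokens_to_ids(tok)].item()
    print(tok, best_cls if best_p > own_p else "O")
```

Because the input sentence is left untouched, the cost of tagging a sentence is a single forward pass, which is what yields the large decoding speedup over template-based scoring of every candidate span.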

Label Word Engineering and Selection

Label word selection is crucial to the EntLM framework. The paper explores several strategies for automatic label word search, using a lexicon-annotated (distantly supervised) corpus as a stand-in for the scarce labeled samples of the few-shot setting. These strategies combine class-word frequency statistics from the data with the pre-trained LM's own predictions at entity positions, yielding either discrete label words or virtual label words represented as continuous embeddings.
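As an illustration, the sketch below shows the simplest, frequency-based variant of this search on made-up data: it counts how often each word is annotated with each class in a small distantly-annotated corpus and keeps the most frequent word per class. The paper's full procedure also incorporates the LM's own predictions and virtual (embedding-based) label words, which this sketch omits.

```python
# Hedged sketch of frequency-based label word search (illustrative only).
# The tiny "distantly annotated" corpus below is invented for the example.
from collections import Counter, defaultdict

corpus = [
    [("Obama", "PER"), ("visited", "O"), ("Paris", "LOC")],
    [("Merkel", "PER"), ("met", "O"), ("Obama", "PER"), ("in", "O"), ("Berlin", "LOC")],
]

# Count how often each word is annotated with each entity class.
counts = defaultdict(Counter)
for sentence in corpus:
    for word, tag in sentence:
        if tag != "O":
            counts[tag][word] += 1

# Pick the most frequent word per class as its discrete label word.
label_words = {cls: ctr.most_common(1)[0][0] for cls, ctr in counts.items()}
print(label_words)  # {'PER': 'Obama', 'LOC': 'Paris'}
```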

Experimental Evaluation

Empirical studies on datasets such as CoNLL'03 and OntoNotes validate the efficacy of EntLM, showing superior performance over existing NER methods such as the BERT-tagger and template-based NER baselines under few-shot conditions. Improvements were especially notable in 5-shot setups, demonstrating EntLM's robustness and adaptability.

Implications and Future Directions

EntLM represents a notable step forward in prompt-based learning for token-level tasks such as NER. Its computational efficiency has practical implications for real-time systems where processing speed and resource consumption are critical. The theoretical contribution lies in narrowing the objective gap between the pre-training and fine-tuning stages, which hints at gains in training stability and efficiency.

Looking ahead, further refinement of the label word selection process, for instance through dynamic label adaptation, could improve performance. Applying EntLM beyond NER to other token-level tasks could broaden its usefulness across diverse NLP systems.

Authors (7)
  1. Ruotian Ma (19 papers)
  2. Xin Zhou (319 papers)
  3. Tao Gui (127 papers)
  4. Yiding Tan (4 papers)
  5. Linyang Li (57 papers)
  6. Qi Zhang (784 papers)
  7. Xuanjing Huang (287 papers)
Citations (167)