Selective Annotation Makes Language Models Better Few-Shot Learners (2209.01975v1)

Published 5 Sep 2022 in cs.CL

Abstract: Many recent approaches to natural language tasks are built on the remarkable abilities of LLMs. LLMs can perform in-context learning, where they learn a new task from a few task demonstrations, without any parameter updates. This work examines the implications of in-context learning for the creation of datasets for new natural language tasks. Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time. Based on this framework, we propose an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate. Extensive experiments on 10 datasets (covering classification, commonsense reasoning, dialogue, and text/code generation) demonstrate that our selective annotation method improves the task performance by a large margin. On average, vote-k achieves a 12.9%/11.4% relative gain under an annotation budget of 18/100, as compared to randomly selecting examples to annotate. Compared to state-of-the-art supervised finetuning approaches, it yields similar performance with 10-100x less annotation cost across 10 tasks. We further analyze the effectiveness of our framework in various scenarios: LLMs with varying sizes, alternative selective annotation methods, and cases where there is a test data domain shift. We hope that our studies will serve as a basis for data annotations as LLMs are increasingly applied to new tasks. Our code is available at https://github.com/HKUNLP/icl-selective-annotation.

Summary

  • The paper introduces a framework combining selective annotation and prompt retrieval to enhance few-shot learning in language models.
  • The vote-k graph-based method selects diverse and representative examples, achieving average relative gains of 12.9% and 11.4% under annotation budgets of 18 and 100 examples, respectively.
  • Empirical tests on 10 datasets show cost-effective, robust improvements compared to supervised finetuning, reducing annotation costs by 10–100×.

Selective Annotation Enhances Few-Shot Learning in LLMs

The paper "Selective Annotation Makes LLMs Better Few-Shot Learners" addresses a significant challenge in the domain of NLP involving the efficient utilization of LLMs for new tasks without extensive dataset annotation. While LLMs demonstrate remarkable capabilities for few-shot learning through in-context learning (ICL), the process of preparing and employing annotated datasets efficiently remains an open question.

Key Contributions

The authors propose a two-step framework to optimize the annotation process when using LLMs for ICL. This framework includes:

  1. Selective Annotation: Before test time, a representative set of examples is pre-selected from unlabeled data for annotation. Using the graph-based vote-k method, this step prioritizes examples that balance diversity and representativeness, so that a small annotated pool can effectively cover a broad spectrum of test instances.
  2. Prompt Retrieval: At test time, the system retrieves, for each test instance, the most relevant examples from the annotated pool to serve as in-context demonstrations, guided by embedding similarity measures (see the retrieval sketch after this list).
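
To make the retrieval step concrete, here is a minimal sketch of similarity-based prompt retrieval, assuming a Sentence-BERT-style encoder. The specific model name, function name, and signature are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed encoder; any sentence-embedding model could stand in here.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_demonstrations(test_input: str, annotated_pool: list[str],
                            n_shots: int = 8) -> list[str]:
    """Return the n_shots annotated examples most similar to the test input,
    ranked by cosine similarity between sentence embeddings."""
    pool_emb = encoder.encode(annotated_pool, normalize_embeddings=True)
    query_emb = encoder.encode([test_input], normalize_embeddings=True)[0]
    sims = pool_emb @ query_emb              # cosine similarity (unit vectors)
    top = np.argsort(-sims)[:n_shots]
    return [annotated_pool[i] for i in top]
```

The retrieved examples would then typically be concatenated, in similarity order, into the in-context prompt ahead of the test input.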

The authors focus on vote-k, an unsupervised, graph-based selective annotation technique. It identifies diverse and representative examples by constructing a graph of examples using their embeddings and selecting nodes that maximize coverage of the data space.
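
As a concrete illustration, below is a minimal sketch of this graph-based voting heuristic, assuming precomputed sentence embeddings for the unlabeled pool. The neighbor count k, the discount factor rho, and the function name are illustrative choices; this sketch covers only the core greedy voting loop, not every detail of the authors' implementation.

```python
import numpy as np

def vote_k_select(embeddings: np.ndarray, budget: int,
                  k: int = 10, rho: float = 10.0) -> list[int]:
    """Greedily pick `budget` examples to annotate. Each example votes for its
    k nearest neighbors; votes coming from neighborhoods already covered by
    selected examples are exponentially discounted, pushing the selection
    toward uncovered regions of the embedding space."""
    n = embeddings.shape[0]
    # Build a k-NN graph under cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-neighbors
    knn = np.argsort(-sim, axis=1)[:, :k]   # k nearest neighbors of each node

    selected: list[int] = []
    for _ in range(budget):
        scores = np.zeros(n)
        chosen = set(selected)
        for v in range(n):
            # Discount v's vote by how much its neighborhood is already covered.
            weight = rho ** (-len(set(knn[v]) & chosen))
            scores[knn[v]] += weight        # v votes for each of its neighbors
        scores[selected] = -np.inf          # never re-select an example
        selected.append(int(np.argmax(scores)))
    return selected
```

Under this scheme, the first pick is simply the most voted-for (most representative) example, and each subsequent pick is steered away from regions the selected set already covers, which is how diversity and representativeness are balanced.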

Empirical Evidence

The paper's empirical rigor is demonstrated through comprehensive experiments conducted across 10 datasets, which encompass various tasks such as classification, commonsense reasoning, dialogue state tracking, and text/code generation. The key outcomes of these experiments are:

  • Vote-k consistently achieves substantial performance improvements over random selection, realizing an average relative gain of 12.9% with an annotation budget of 18 examples and 11.4% with 100 examples.
  • In comparison to state-of-the-art supervised finetuning techniques, the proposed methodology offers competitive performance with significantly reduced annotation costs (10-100× lower).
  • The framework's robustness was verified across varying LLM sizes, confirming its suitability for broader applications.
  • Enhanced stability and reduced variance were observed in scenarios involving domain shifts between training and test data.

Implications and Future Directions

The findings suggest practical and theoretical implications. Practically, the adoption of vote-k in annotation processes can lead to cost-effective and efficient deployment of LLMs in new domains, especially where labeled data is scarce. Theoretically, these results pose questions about the robustness of current ICL methodologies under varying data regimes and suggest further investigation into graph-based methods within NLP.

Future Directions: The research opens pathways for enhancing selective annotation techniques and integrating them with advanced LLMs. This might involve refining graph structures, exploring alternative metrics for diversity and representativeness, and applying the methodology to more complex multi-modal tasks.

Overall, this paper makes a compelling case for a shift from traditional large dataset requirements towards smarter and more efficient data annotation paradigms, leveraging the inherent capabilities of modern LLMs for few-shot learning tasks.