In-Context Learning for Text Classification with Many Labels (2309.10954v2)

Published 19 Sep 2023 in cs.CL and cs.LG

Abstract: In-context learning (ICL) using LLMs for tasks with many labels is challenging due to the limited context window, which makes it difficult to fit a sufficient number of examples in the prompt. In this paper, we use a pre-trained dense retrieval model to bypass this limitation, giving the model only a partial view of the full label space for each inference call. Testing with recent open-source LLMs (OPT, LLaMA), we set new state of the art performance in few-shot settings for three common intent classification datasets, with no finetuning. We also surpass fine-tuned performance on fine-grained sentiment classification in certain cases. We analyze the performance across number of in-context examples and different model scales, showing that larger models are necessary to effectively and consistently make use of larger context lengths for ICL. By running several ablations, we analyze the model's use of: a) the similarity of the in-context examples to the current input, b) the semantic content of the class names, and c) the correct correspondence between examples and labels. We demonstrate that all three are needed to varying degrees depending on the domain, contrary to certain recent works.

Introduction

The paper examines how in-context learning (ICL) with LLMs can handle text classification tasks with many labels. Because an LLM's limited context window restricts how many examples fit in a prompt, the authors pair the LLM with a pre-trained dense retrieval model that selects, for each inference call, only a relevant subset of examples, giving the model a partial view of the full label space. This makes label spaces that were previously out of reach tractable for ICL, without any fine-tuning.

Methodology

The paper introduces a retrieval-augmented ICL setup in which a dense retrieval model, a Sentence-BERT encoder pre-trained on large text-pair datasets, dynamically selects the in-context examples most similar to the input by cosine similarity. Prompts are filled "greedily" up to capacity, so the LLM's full context window is used. Inference stays cheap: the LLM generates output freely, and the generated text is then matched to the closest class name using the same retrieval model.
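
The snippet below is a minimal sketch of this pipeline, not the authors' implementation. It assumes the sentence-transformers library with the "all-mpnet-base-v2" checkpoint standing in for the paper's Sentence-BERT retriever, a user-supplied generate(prompt) callable standing in for the LLM, and a simple character budget standing in for the token-level context window.

```python
# Sketch of retrieval-augmented ICL: retrieve similar demonstrations, greedily
# fill the prompt, let the LLM generate freely, then map the output to a label.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-mpnet-base-v2")  # assumed SBERT checkpoint

def build_prompt(query, pool, budget_chars=6000):
    """Greedily fill the prompt with the demonstrations most similar to the query."""
    q_emb = retriever.encode(query, convert_to_tensor=True)
    ex_embs = retriever.encode([ex["text"] for ex in pool], convert_to_tensor=True)
    ranked = util.cos_sim(q_emb, ex_embs)[0].argsort(descending=True).tolist()

    demos, used = [], 0
    for i in ranked:
        demo = f"Input: {pool[i]['text']}\nLabel: {pool[i]['label']}\n\n"
        if used + len(demo) > budget_chars:  # stop once the context budget is hit
            break
        demos.append(demo)
        used += len(demo)
    # Place the most similar demonstrations nearest the query (a common ICL convention).
    return "".join(reversed(demos)) + f"Input: {query}\nLabel:"

def classify(query, pool, label_names, generate):
    """Let the LLM generate freely, then map its output to the nearest class name."""
    prediction = generate(build_prompt(query, pool)).strip()
    lab_embs = retriever.encode(label_names, convert_to_tensor=True)
    pred_emb = retriever.encode(prediction, convert_to_tensor=True)
    return label_names[int(util.cos_sim(pred_emb, lab_embs)[0].argmax())]
```

Because the same encoder both selects demonstrations and resolves the generated text to a class, no component is fine-tuned and no per-label scoring pass is needed at inference time.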

Experimental Insights

Retrieval-augmented ICL sets new state-of-the-art (SoTA) few-shot results on three intent classification benchmarks and surpasses fine-tuned models in certain fine-grained sentiment classification cases. Ablations isolate three factors: the similarity of the in-context examples to the current input, the semantic content of the class names, and the correct correspondence between examples and labels; all three matter, to degrees that vary by dataset (a sketch of how such ablations can be constructed follows). The experiments also show that model scale is crucial: larger models are needed to consistently benefit from more in-context examples.
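
The following is a hypothetical illustration of how the three ablations could be applied to the demonstration pool before prompts are built; function and field names are illustrative and not taken from the paper's code.

```python
import random

def ablate_similarity(pool, k):
    """(a) Drop retrieval: pick k demonstrations uniformly at random instead."""
    return random.sample(pool, k)

def ablate_class_names(pool, label_names):
    """(b) Strip label semantics: map each class name to an opaque symbol."""
    mapping = {name: f"LABEL_{i}" for i, name in enumerate(label_names)}
    return [{"text": ex["text"], "label": mapping[ex["label"]]} for ex in pool]

def ablate_correspondence(pool):
    """(c) Break input-label pairing: shuffle labels across the demonstrations."""
    labels = [ex["label"] for ex in pool]
    random.shuffle(labels)
    return [{"text": ex["text"], "label": lab} for ex, lab in zip(pool, labels)]
```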

Conclusion and Future Directions

The findings show that retrieval-augmented ICL handles text classification with many labels without any fine-tuning of the retriever or the LLM, relying instead on their pre-training. Larger model architectures are better able to exploit longer contexts for in-context learning. The paper positions retrieval-augmented ICL as a practical paradigm for applying LLMs to complex classification tasks across diverse domains and task scopes.

Authors (3)
  1. Aristides Milios
  2. Siva Reddy
  3. Dzmitry Bahdanau