Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! (2303.08559v2)

Published 15 Mar 2023 in cs.CL and cs.AI

Abstract: LLMs have made remarkable strides in various tasks. Whether LLMs are competitive few-shot solvers for information extraction (IE) tasks, however, remains an open problem. In this work, we aim to provide a thorough answer to this question. Through extensive experiments on nine datasets across four IE tasks, we demonstrate that current advanced LLMs consistently exhibit inferior performance, higher latency, and increased budget requirements compared to fine-tuned SLMs under most settings. Therefore, we conclude that LLMs are not effective few-shot information extractors in general. Nonetheless, we illustrate that with appropriate prompting strategies, LLMs can effectively complement SLMs and tackle challenging samples that SLMs struggle with. And moreover, we propose an adaptive filter-then-rerank paradigm to combine the strengths of LLMs and SLMs. In this paradigm, SLMs serve as filters and LLMs serve as rerankers. By prompting LLMs to rerank a small portion of difficult samples identified by SLMs, our preliminary system consistently achieves promising improvements (2.4% F1-gain on average) on various IE tasks, with an acceptable time and cost investment.

Introduction

Recent discourse surrounding the capabilities of LLMs has predominantly focused on their merits as few-shot learners. In the domain of information extraction (IE), however, the efficacy of LLMs in few-shot contexts remains very much in question. This paper scrutinizes the comparative advantage, if any, of LLMs over small language models (SLMs) across a range of popular IE tasks.

Performance Analysis of LLMs vs. SLMs

The comprehensive empirical evaluation conducted in this paper spans nine datasets across four canonical IE tasks: Named Entity Recognition (NER), Relation Extraction (RE), Event Detection (ED), and Event Argument Extraction (EAE). The researchers systematically compared the performance of in-context learning via LLMs against fine-tuned SLMs, adopting multiple configurations simulating typical real-world low-resource settings. Surprisingly, the overarching finding is that, except for extremely low-resource situations, LLMs fall short against their SLM counterparts. SLMs not only demonstrated superior results but also exhibited lower latency and reduced operational costs.
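To make the in-context learning setup concrete, the sketch below shows how a few-shot IE prompt for NER might be assembled: a task instruction, a handful of labeled demonstrations, and the query sentence. The demonstrations, label set, and prompt wording here are illustrative assumptions, not the exact templates used in the paper.

```python
# Minimal sketch of few-shot in-context prompting for NER.
# The demonstrations and label inventory below are placeholders for exposition.
FEW_SHOT_DEMOS = [
    ("Barack Obama visited Berlin in 2013.",
     "Barack Obama -> PER; Berlin -> LOC"),
    ("Apple released the iPhone in California.",
     "Apple -> ORG; iPhone -> MISC; California -> LOC"),
]

def build_ner_prompt(sentence: str) -> str:
    """Concatenate a task instruction, k labeled demonstrations, and the query sentence."""
    lines = ["Extract the named entities (PER, ORG, LOC, MISC) from each sentence."]
    for demo_sentence, demo_answer in FEW_SHOT_DEMOS:
        lines.append(f"Sentence: {demo_sentence}\nEntities: {demo_answer}")
    lines.append(f"Sentence: {sentence}\nEntities:")
    return "\n\n".join(lines)

print(build_ner_prompt("Tim Cook spoke at a conference in Paris."))
```

The fine-tuned SLM baselines, by contrast, are trained directly on the same small labeled sets, which is what makes the head-to-head comparison under low-resource budgets meaningful.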

Probing The Efficacy of LLMs in Sample Difficulty Stratification

A core aspect of the research was dissecting how LLMs and SLMs handle samples of varying difficulty. Through fine-grained analysis, LLMs were observed to handle complex or 'hard' samples comparatively well, precisely where SLMs typically falter. This bifurcated pattern of competence suggests that while SLMs generally dominate, LLMs possess a niche capability that can be strategically exploited on the difficult cases that SLMs struggle with.

Adaptive Filter-then-rerank Paradigm

Leveraging these insights, the authors propose an adaptive filter-then-rerank paradigm. In this framework, SLMs first filter samples by prediction confidence, separating them into 'easy' and 'hard' categories. LLMs are then prompted to rerank only the small subset of hard samples. This hybrid system, which reserves LLM calls for the cases that need them, consistently achieved an average F1 improvement of 2.4% across the evaluated IE tasks while keeping time and cost overheads acceptable, showing that the strengths of LLMs can be combined judiciously with the efficiency of SLMs.
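The following is a minimal sketch of how such a filter-then-rerank loop could be wired together, assuming an SLM that returns candidate labels with confidence scores and an LLM reranker exposed as a callable. The function names, the confidence threshold, and the top-k cutoff are assumptions for exposition, not the authors' released implementation.

```python
# Hypothetical sketch of the filter-then-rerank paradigm: the SLM handles
# confident ("easy") samples on its own, and the LLM reranks the SLM's top
# candidates only for low-confidence ("hard") samples.
from typing import Callable, List, Tuple

def filter_then_rerank(
    samples: List[str],
    slm_predict: Callable[[str], List[Tuple[str, float]]],  # returns (label, confidence) candidates
    llm_rerank: Callable[[str, List[str]], str],             # picks one label from a candidate list
    confidence_threshold: float = 0.9,
    top_k: int = 3,
) -> List[str]:
    final_labels = []
    for text in samples:
        # Rank the SLM's candidate labels by confidence, highest first.
        candidates = sorted(slm_predict(text), key=lambda c: c[1], reverse=True)
        best_label, best_conf = candidates[0]
        if best_conf >= confidence_threshold:
            # Easy sample: trust the fine-tuned SLM, no LLM call needed.
            final_labels.append(best_label)
        else:
            # Hard sample: let the LLM choose among the SLM's top-k candidates.
            top_candidates = [label for label, _ in candidates[:top_k]]
            final_labels.append(llm_rerank(text, top_candidates))
    return final_labels
```

Because only the hard subset reaches the LLM, the extra latency and API cost scale with the fraction of low-confidence samples rather than with the whole dataset, which is what keeps the overhead acceptable in practice.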

Authors (4)
  1. Yubo Ma (22 papers)
  2. Yixin Cao (138 papers)
  3. YongChing Hong (1 paper)
  4. Aixin Sun (99 papers)
Citations (111)