Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information (2302.05096v1)

Published 10 Feb 2023 in cs.CL and cs.AI

Abstract: This work focuses on in-context data augmentation for intent detection. Having found that augmentation via in-context prompting of large pre-trained language models (PLMs) alone does not improve performance, we introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model. Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents. It then employs intent-aware filtering, based on PVI, to remove datapoints that are not helpful to the downstream intent classifier. Our method is thus able to leverage the expressive power of LLMs to produce diverse training data. Empirical results demonstrate that our method can produce synthetic training data that achieve state-of-the-art performance on three challenging intent detection datasets under few-shot settings (1.28% absolute improvement in 5-shot and 1.18% absolute in 10-shot, on average) and perform on par with the state-of-the-art in full-shot settings (within 0.01% absolute, on average).
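To make the PVI-based filtering step concrete, here is a minimal Python sketch, not the authors' implementation. The names `IntentClassifier`, `g_x`, `g_null`, and the per-intent `thresholds` are illustrative assumptions: `g_x` stands for the classifier fine-tuned on the seed data as usual, `g_null` for the same architecture fine-tuned with the input utterances replaced by an empty string (the null input in the V-information framework), and how the per-intent thresholds are set is left open here.

```python
from typing import Callable, Dict, List, Tuple
import math

# An "intent classifier" here is any callable mapping an utterance to a
# probability distribution over intent labels. Both models below are
# hypothetical stand-ins, not an API from the paper's codebase.
IntentClassifier = Callable[[str], Dict[str, float]]

def pvi(g_x: IntentClassifier, g_null: IntentClassifier,
        utterance: str, intent: str) -> float:
    """Pointwise V-information of an utterance x for its intent label y:
    PVI(x -> y) = -log2 g_null(y | empty input) + log2 g_x(y | x)."""
    return math.log2(g_x(utterance)[intent]) - math.log2(g_null("")[intent])

def filter_synthetic(candidates: List[Tuple[str, str]],
                     g_x: IntentClassifier, g_null: IntentClassifier,
                     thresholds: Dict[str, float]) -> List[Tuple[str, str]]:
    """Intent-aware filtering: keep a synthetic (utterance, intent) pair
    only if its PVI clears the threshold for that specific intent."""
    return [(u, y) for (u, y) in candidates
            if pvi(g_x, g_null, u, y) > thresholds[y]]
```

The intuition is that PVI is large only when the utterance itself, rather than the label prior captured by the null-input model, makes the intent predictable; synthetic utterances with low or negative PVI carry little usable signal for the downstream classifier and are discarded.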

Authors (9)
  1. Yen-Ting Lin
  2. Alexandros Papangelis
  3. Seokhwan Kim
  4. Sungjin Lee
  5. Devamanyu Hazarika
  6. Mahdi Namazifar
  7. Di Jin
  8. Yang Liu
  9. Dilek Hakkani-Tur
Citations (33)