
Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task (2310.06504v1)

Published 10 Oct 2023 in cs.CL, cs.AI, and cs.LG

Abstract: With the increasing capabilities of LLMs, these high-performance models have achieved state-of-the-art results on a wide range of NLP tasks. However, the models' performance on commonly used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs under diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbations and four types of mixed perturbations. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. The experiments demonstrate that current open-source LLMs generally achieve only limited robustness to input perturbations. Based on these experimental observations, we make some forward-looking suggestions to fuel research in this direction.
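To make the multi-level augmentation idea concrete, the sketch below shows one possible character-level perturbation: randomly swapping adjacent letters to simulate typos. This is a minimal illustration of the general technique, not the paper's actual Noise-LLM construction; the function name, swap strategy, and parameters are assumptions for demonstration only.

```python
import random

def char_level_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Simulate typo-style noise by randomly swapping adjacent letters.

    Illustrative only: the paper's candidate pool also uses word- and
    sentence-level augmentations, which are not reproduced here.
    """
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        # Only swap pairs of alphabetic characters, with probability `rate`.
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Example: perturb a slot-filling style utterance.
noisy = char_level_perturb("book a flight to boston tomorrow", rate=0.15)
```

A perturbed utterance like this can then be paired with its clean slot annotations to test whether a model still extracts the correct slot values under noise.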

Authors (11)
  1. Guanting Dong (46 papers)
  2. Jinxu Zhao (5 papers)
  3. Tingfeng Hui (10 papers)
  4. Daichi Guo (8 papers)
  5. Wenlong Wan (4 papers)
  6. Boqi Feng (1 paper)
  7. Yueyan Qiu (3 papers)
  8. Keqing He (47 papers)
  9. Zechen Wang (15 papers)
  10. Weiran Xu (58 papers)
  11. Zhuoma GongQue (7 papers)
Citations (16)