Zero-shot LLM-guided Counterfactual Generation: A Case Study on NLP Model Evaluation (2405.04793v2)

Published 8 May 2024 in cs.CL, cs.AI, and cs.LG

Abstract: With the development and proliferation of large, complex, black-box models for solving many NLP tasks, there is an increasing need for methods to stress-test these models and provide some degree of interpretability or explainability. While counterfactual examples are useful in this regard, automated generation of counterfactuals is a data- and resource-intensive process. Such methods depend on models such as pre-trained LLMs that are then fine-tuned on auxiliary, often task-specific datasets that may be infeasible to build in practice, especially for new tasks and data domains. Therefore, in this work we explore the possibility of leveraging LLMs for zero-shot counterfactual generation in order to stress-test NLP models. We propose a structured pipeline to facilitate this generation, and we hypothesize that the instruction-following and textual understanding capabilities of recent LLMs can be effectively leveraged for generating high-quality counterfactuals in a zero-shot manner, without requiring any training or fine-tuning. Through comprehensive experiments on a variety of proprietary and open-source LLMs, along with various downstream tasks in NLP, we explore the efficacy of LLMs as zero-shot counterfactual generators in evaluating and explaining black-box NLP models.
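
To make the idea concrete, here is a minimal sketch of the kind of zero-shot counterfactual generation and stress-testing loop the abstract describes. The prompt wording, the `query_llm` helper, and the classifier interface are hypothetical placeholders for illustration only, not the authors' actual pipeline.

```python
from typing import Callable, Iterable

# Hypothetical stand-ins: any instruction-following LLM endpoint and any
# black-box NLP classifier can be plugged in here.
QueryLLM = Callable[[str], str]      # prompt text -> generated text
Classifier = Callable[[str], str]    # input text -> predicted label

# Illustrative zero-shot prompt: no fine-tuning, no in-context examples.
COUNTERFACTUAL_PROMPT = (
    "You are given a piece of text and a target label.\n"
    "Minimally edit the text so that its label becomes '{target_label}', "
    "while keeping it fluent and as close to the original as possible.\n\n"
    "Text: {text}\n"
    "Edited text:"
)


def generate_counterfactual(text: str, target_label: str, query_llm: QueryLLM) -> str:
    """Ask the LLM, zero-shot, for a minimally edited counterfactual of `text`."""
    prompt = COUNTERFACTUAL_PROMPT.format(text=text, target_label=target_label)
    return query_llm(prompt).strip()


def label_flip_rate(
    texts: Iterable[str],
    classifier: Classifier,
    query_llm: QueryLLM,
    target_label: str,
) -> float:
    """Measure how often LLM-generated counterfactuals flip the black-box
    model's prediction -- one simple way to stress-test the classifier."""
    texts = list(texts)
    flips = 0
    for text in texts:
        original_pred = classifier(text)
        counterfactual = generate_counterfactual(text, target_label, query_llm)
        if classifier(counterfactual) != original_pred:
            flips += 1
    return flips / max(len(texts), 1)
```

In this sketch, a high flip rate on minimally edited inputs indicates that the counterfactuals are effective probes of the classifier's decision boundary; the paper evaluates such generation across multiple LLMs and downstream tasks.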

Authors (4)
  1. Amrita Bhattacharjee (24 papers)
  2. Raha Moraffah (25 papers)
  3. Joshua Garland (35 papers)
  4. Huan Liu (283 papers)