Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering (1809.02701v4)

Published 7 Sep 2018 in cs.CL

Abstract: Adversarial evaluation stress tests a model's understanding of natural language. While past approaches expose superficial patterns, the resulting adversarial examples are limited in complexity and diversity. We propose human-in-the-loop adversarial generation, where human authors are guided to break models. We aid the authors with interpretations of model predictions through an interactive user interface. We apply this generation framework to a question answering task called Quizbowl, where trivia enthusiasts craft adversarial questions. The resulting questions are validated via live human-computer matches: although the questions appear ordinary to humans, they systematically stump neural and information retrieval models. The adversarial questions cover diverse phenomena from multi-hop reasoning to entity type distractors, exposing open challenges in robust question answering.

Authors (5)
  1. Eric Wallace (42 papers)
  2. Pedro Rodriguez (24 papers)
  3. Shi Feng (95 papers)
  4. Ikuya Yamada (22 papers)
  5. Jordan Boyd-Graber (68 papers)
Citations (17)