Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Benchmarking Robustness of Machine Reading Comprehension Models (2004.14004v2)

Published 29 Apr 2020 in cs.CL

Abstract: Machine Reading Comprehension (MRC) is an important testbed for evaluating models' natural language understanding (NLU) ability. There has been rapid progress in this area, with new models achieving impressive performance on various benchmarks. However, existing benchmarks only evaluate models on in-domain test sets without considering their robustness under test-time perturbations or adversarial attacks. To fill this important gap, we construct AdvRACE (Adversarial RACE), a new model-agnostic benchmark for evaluating the robustness of MRC models under four different types of adversarial attacks, including our novel distractor extraction and generation attacks. We show that state-of-the-art (SOTA) models are vulnerable to all of these attacks. We conclude that there is substantial room for building more robust MRC models and our benchmark can help motivate and measure progress in this area. We release our data and code at https://github.com/NoviScl/AdvRACE .

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Chenglei Si (26 papers)
  2. Ziqing Yang (29 papers)
  3. Yiming Cui (80 papers)
  4. Wentao Ma (35 papers)
  5. Ting Liu (329 papers)
  6. Shijin Wang (69 papers)
Citations (40)

Summary

We haven't generated a summary for this paper yet.

Github Logo Streamline Icon: https://streamlinehq.com

GitHub