Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks
Abstract: We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence level. We experiment with different amounts of perturbation to examine model confidence and misclassification rate, and contrast model performance under adversarial training with different embedding types on two benchmark datasets. We demonstrate that ensembling improves model performance. Finally, we analyze factors that affect model behavior under adversarial training and develop a model to predict errors during adversarial attacks.
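To illustrate the kind of input perturbations the abstract describes, below is a minimal Python sketch of character- and word-level noise applied to a question string. The specific operations shown (adjacent-character swaps, random word deletion) and the function names are illustrative assumptions, not the paper's actual perturbation set.

```python
# Illustrative character- and word-level perturbations of the kind the
# abstract describes. The specific operations here are assumptions for
# illustration, not the paper's actual method.
import random


def char_swap(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent alphabetic characters with probability `rate`."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def word_drop(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Delete each word with probability `rate`, keeping at least one word."""
    rng = random.Random(seed)
    words = text.split()
    kept = [w for w in words if rng.random() >= rate]
    return " ".join(kept or words[:1])


if __name__ == "__main__":
    question = "What year was the treaty signed?"
    print(char_swap(question, rate=0.3))  # e.g. "Waht yaer was the treaty sinegd?"
    print(word_drop(question, rate=0.3))  # e.g. "What year the treaty signed?"
```

Varying `rate` corresponds to the "different amounts of perturbation" the abstract mentions, letting one measure how model confidence and misclassification rate degrade as inputs become noisier.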