An Information-Theoretic Explanation for the Adversarial Fragility of AI Classifiers (1901.09413v1)
Published 27 Jan 2019 in cs.IT, cs.LG, eess.SP, and math.IT
Abstract: We present a simple hypothesis about a compression property of AI classifiers and give theoretical arguments showing that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. We also propose a new method for detecting when small input perturbations cause classifier errors, and establish theoretical guarantees on its performance. We demonstrate the method experimentally on a voice recognition system. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system.
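The abstract does not spell out the detection procedure, so the following is only a minimal illustrative sketch of the general idea suggested by the compression hypothesis: compare the classifier's decision on the raw input with its decision on a lossy reconstruction, and flag inputs whose label flips or whose confidence collapses. The names `classify`, `compress`, and the `threshold` parameter are hypothetical placeholders, not the paper's actual algorithm or guarantees.

```python
import numpy as np

def detect_perturbation(classify, compress, x, threshold=0.5):
    """Flag a possible adversarial input by comparing the classifier's
    output on the raw input with its output on a lossy reconstruction.

    classify : callable mapping an input vector to a probability vector
    compress : callable returning a lossy reconstruction of the input
               (e.g. a low-rank or quantized approximation)
    """
    p_raw = classify(x)            # class probabilities for the raw input
    p_rec = classify(compress(x))  # probabilities for the reconstruction

    # A label flip, or a large confidence drop on the original label,
    # suggests the decision hinged on a small, fragile component of the
    # input that the lossy compression removed.
    label_changed = np.argmax(p_raw) != np.argmax(p_rec)
    confidence_drop = p_raw.max() - p_rec[np.argmax(p_raw)]
    return label_changed or confidence_drop > threshold
```

In the communication-system analogy, the compressed reconstruction plays the role of the "signal" the classifier should be decoding, while an adversarial perturbation is low-power "noise"; a decision that changes once that noise is stripped away is treated as suspect.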