An Information-Theoretic Explanation for the Adversarial Fragility of AI Classifiers (1901.09413v1)
Published 27 Jan 2019 in cs.IT, cs.LG, eess.SP, and math.IT
Abstract: We present a simple hypothesis about a compression property of AI classifiers and give theoretical arguments showing that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. We also propose a new method for detecting when small input perturbations cause classifier errors, and establish theoretical guarantees on its performance. We demonstrate the method experimentally on a voice recognition system. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system.
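The abstract does not spell out the detection procedure, so the following is only a minimal illustrative sketch of the general idea suggested by the compression hypothesis: compare the classifier's decision on the raw input with its decision on a lossy reconstruction, and flag inputs whose label flips or whose confidence collapses. The names `classify`, `compress`, and the `threshold` parameter are hypothetical placeholders, not the paper's actual algorithm or guarantees.

```python
import numpy as np

def detect_perturbation(classify, compress, x, threshold=0.5):
    """Flag a possible adversarial input by comparing the classifier's
    output on the raw input with its output on a lossy reconstruction.

    classify : callable mapping an input vector to a probability vector
    compress : callable returning a lossy reconstruction of the input
               (e.g. a low-rank or quantized approximation)
    """
    p_raw = classify(x)            # class probabilities for the raw input
    p_rec = classify(compress(x))  # probabilities for the reconstruction

    # A label flip, or a large confidence drop on the original label,
    # suggests the decision hinged on a small, fragile component of the
    # input that the lossy compression removed.
    label_changed = np.argmax(p_raw) != np.argmax(p_rec)
    confidence_drop = p_raw.max() - p_rec[np.argmax(p_raw)]
    return label_changed or confidence_drop > threshold
```

In the communication-system analogy, the compressed reconstruction plays the role of the "signal" the classifier should be decoding, while an adversarial perturbation is low-power "noise"; a decision that changes once that noise is stripped away is treated as suspect.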