Papers
Topics
Authors
Recent
2000 character limit reached

Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions

Published 14 Apr 2020 in cs.LG and stat.ML | (2004.06383v3)

Abstract: Despite the remarkable performance and generalization levels of deep learning models in a wide range of artificial intelligence tasks, it has been demonstrated that these models can be easily fooled by the addition of imperceptible yet malicious perturbations to natural inputs. These altered inputs are known in the literature as adversarial examples. In this paper, we propose a novel probabilistic framework to generalize and extend adversarial attacks in order to produce a desired probability distribution for the classes when we apply the attack method to a large number of inputs. This novel attack paradigm provides the adversary with greater control over the target model, thereby exposing, in a wide range of scenarios, threats against deep learning models that cannot be conducted by the conventional paradigms. We introduce four different strategies to efficiently generate such attacks, and illustrate our approach by extending multiple adversarial attack algorithms. We also experimentally validate our approach for the spoken command classification task and the Tweet emotion classification task, two exemplary machine learning problems in the audio and text domain, respectively. Our results demonstrate that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate and even prevent the attacks from being detected by label-shift detection methods.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.