
Functional Adversarial Attacks (1906.00001v2)

Published 29 May 2019 in cs.LG and cs.CV

Abstract: We propose functional adversarial attacks, a novel class of threat models for crafting adversarial examples to fool machine learning models. Unlike a standard $\ell_p$-ball threat model, a functional adversarial threat model allows only a single function to be used to perturb input features to produce an adversarial example. For example, a functional adversarial attack applied on colors of an image can change all red pixels simultaneously to light red. Such global uniform changes in images can be less perceptible than perturbing pixels of the image individually. For simplicity, we refer to functional adversarial attacks on image colors as ReColorAdv, which is the main focus of our experiments. We show that functional threat models can be combined with existing additive ($\ell_p$) threat models to generate stronger threat models that allow both small, individual perturbations and large, uniform changes to an input. Moreover, we prove that such combinations encompass perturbations that would not be allowed in either constituent threat model. In practice, ReColorAdv can significantly reduce the accuracy of a ResNet-32 trained on CIFAR-10. Furthermore, to the best of our knowledge, combining ReColorAdv with other attacks leads to the strongest existing attack even after adversarial training. An implementation of ReColorAdv is available at https://github.com/cassidylaidlaw/ReColorAdv .

Citations (172)

Summary

Functional Adversarial Attacks: Expanding the Threat Models in Machine Learning

The exploration of adversarial attacks and robustness remains a critical line of inquiry within the field of machine learning. The paper, "Functional Adversarial Attacks," introduces a novel class of threat models for generating adversarial examples that subvert machine learning models. Unlike the conventional $\ell_p$-ball threat model, the proposed functional adversarial attacks employ a single function to perturb features globally, as opposed to modifying individual pixels independently.

Functional Threat Models and Combinatory Approaches

The essence of functional adversarial attacks lies in their capability to modify inputs uniformly by a single transformation function. For instance, altering the color intensity of an image uniformly can result in adversarial examples that misguide classifiers with less perceptual disruption to human observers. In the image domain, this methodology is encapsulated in the proposed ReColorAdv attack, which hinges on modifications that simultaneously darken or lighten all pixels sharing specific characteristics, achieving a perceptual subtlety that isolated pixel changes cannot.
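To make the distinction concrete, the sketch below contrasts an additive per-pixel perturbation with a functional one in PyTorch. It is an illustrative reconstruction, not the authors' implementation; the helper names (`functional_perturbation`, `lighten_reds`) and the particular color rule are hypothetical.

```python
import torch

def additive_perturbation(x: torch.Tensor, delta: torch.Tensor, eps: float) -> torch.Tensor:
    """Standard l_inf threat model: every pixel receives its own bounded offset."""
    return x + delta.clamp(-eps, eps)

def functional_perturbation(x: torch.Tensor, f) -> torch.Tensor:
    """Functional threat model: one function f is applied to all pixel colors,
    so pixels with the same original color are changed identically."""
    # x has shape (batch, 3, H, W); f maps a batch of color vectors to color vectors
    colors = x.permute(0, 2, 3, 1).reshape(-1, 3)      # flatten to a list of RGB colors
    perturbed = f(colors)                               # single function, applied uniformly
    return perturbed.reshape(x.shape[0], x.shape[2], x.shape[3], 3).permute(0, 3, 1, 2)

def lighten_reds(colors: torch.Tensor) -> torch.Tensor:
    """Illustrative global color rule: shift every red-dominant pixel toward lighter red."""
    red_dominant = (colors[:, 0] > colors[:, 1]) & (colors[:, 0] > colors[:, 2])
    out = colors.clone()
    out[red_dominant] = (out[red_dominant] + 0.1).clamp(0.0, 1.0)
    return out
```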

The paper advances empirical and theoretical arguments in favor of combining functional threat models with additive $\ell_p$ models, which traditionally facilitate minor, localized perturbations. This synergistic approach enhances the versatility of the attack strategy, permitting perturbations that transcend the capabilities of the individual threat models alone. Theoretical proofs underscore that such combinations can create novel perturbations beyond either model's standalone capacity, particularly in scenarios where changes in pixel configurations could yield misclassifications without introducing conspicuous alterations.
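A minimal sketch of how the two threat models might be composed is shown below, under the assumption that the global functional change is applied first and a small, $\ell_\infty$-bounded per-pixel offset is added afterwards; the exact composition and projection used in the paper may differ.

```python
import torch

def combined_perturbation(x: torch.Tensor, f, delta: torch.Tensor, eps: float) -> torch.Tensor:
    """Sketch of a combined threat model: a single global function f
    (large, uniform change) followed by an additive per-pixel offset delta
    bounded in l_inf norm (small, individual change)."""
    x_func = f(x)                                 # functional component, applied to the whole image
    x_adv = x_func + delta.clamp(-eps, eps)       # additive component, per-pixel
    return x_adv.clamp(0.0, 1.0)                  # keep a valid image

# Illustrative global function: a mild uniform brightening of every pixel
brighten = lambda x: (x * 1.05).clamp(0.0, 1.0)
```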

ReColorAdv: Experimentation and Observations

ReColorAdv serves as a practical implementation of the functional adversarial threat model. The authors conduct thorough experimentation on the CIFAR-10 and ImageNet datasets against both defended and undefended classifiers. The results reveal significant vulnerability: ReColorAdv reduces the accuracy of a ResNet-32 trained on CIFAR-10 to a mere 3.0%. Furthermore, combining ReColorAdv with other adversarial techniques, such as the spatial StAdv attack and an additive delta attack, lowers classifier accuracy to 3.6% even after adversarial training, an accuracy lower than that achieved by any previously reported attack.

The exploration into color spaces further substantiates the efficacy of ReColorAdv. Experiments demonstrate that using a perceptually accurate color space like CIELUV yields stronger attacks with less visually detectable perturbation compared to the standard RGB color space. This highlights the necessity of selecting color spaces that align more closely with human perceptual sensitivity to optimize adversarial examples' effectiveness while maintaining stealth.
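As a rough illustration, the snippet below converts an image to CIELUV with scikit-image before applying a uniform color shift, so that the perturbation is measured in a perceptually motivated space rather than in raw RGB; the specific shift and bounds are illustrative and not taken from the paper.

```python
import numpy as np
from skimage import color

def to_cieluv(image_rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image (floats in [0, 1]) to CIELUV, where
    Euclidean distances track perceived color differences more closely."""
    return color.rgb2luv(image_rgb)

def from_cieluv(image_luv: np.ndarray) -> np.ndarray:
    """Convert back to RGB after perturbing in the perceptual space."""
    return np.clip(color.luv2rgb(image_luv), 0.0, 1.0)

# Sketch: apply a uniform shift in LUV so an equal-size change is roughly
# equally visible regardless of the original pixel color.
image = np.random.rand(32, 32, 3)
luv = to_cieluv(image)
perturbed_luv = luv + np.array([0.0, 2.0, 0.0])   # illustrative uniform shift along u*
adversarial = from_cieluv(perturbed_luv)
```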

Implications and Future Directions

The work presented carries both theoretical and practical implications. The introduction of functional threat models expands the adversarial examination space, prompting researchers and practitioners to reevaluate the conceptual boundaries of imperceptible and acceptable perturbations. From a practical standpoint, such models could facilitate more sophisticated attack scenarios in domains reliant on uniform feature transformations, such as automated vehicles' sensor arrays or image-based authentication systems.

In conclusion, this paper contributes a novel perspective on how adversarial examples can be synthesized and leveraged against machine learning systems. By broadening the adversarial toolkit to include functional transformations and their combinations with existing additive models, we move towards a more expansive understanding of potential vulnerabilities in machine learning systems. Future research could investigate further applications of functional models across various data types and explore more robust defense mechanisms that consider this holistic threat landscape. These insights encourage an ongoing evolution of both offensively driven adversarial strategies and defensively oriented fortifications in the AI landscape.
