- The paper introduces a novel black-box testing approach that leverages SIFT features and a game-theoretic framework to generate adversarial examples for assessing DNN safety.
- The method applies a Monte Carlo Tree Search algorithm to efficiently explore adversarial spaces, achieving performance competitive with white-box techniques.
- For networks with Lipschitz continuity, the research identifies conditions that preclude adversarial examples, offering practical safety guarantees for critical applications.
Feature-Guided Black-Box Safety Testing of Deep Neural Networks
This paper introduces a novel approach for evaluating the safety of deep neural networks (DNNs) in a black-box setting, focusing particularly on image classifiers. The method is motivated by the recognition that DNNs, despite their impressive accuracy, are vulnerable to adversarial examples: inputs that are subtly modified to mislead the model's predictions, raising safety concerns in critical applications such as autonomous vehicles.
Methodology and Theoretical Framework
The proposed solution circumvents the typical requirement for knowledge of the network's internals, such as architecture or parameters, by leveraging feature extraction techniques from computer vision. Specifically, the method employs the Scale-Invariant Feature Transform (SIFT) to extract salient image features, which are then used to guide adversarial example generation.
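As an illustration of this feature-extraction step, the sketch below uses OpenCV's SIFT detector to turn an image's keypoints into a saliency distribution over regions that could be manipulated first. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper name `keypoint_distribution` and the file name `stop_sign.png` are placeholders, and opencv-python (>= 4.4, for `cv2.SIFT_create`) and NumPy are assumed.

```python
# Sketch: extract SIFT keypoints and turn them into a saliency distribution
# that can guide which image regions to manipulate first.
# Assumes opencv-python >= 4.4 (cv2.SIFT_create) and numpy; the helper name
# `keypoint_distribution` is illustrative, not taken from the paper's code.
import cv2
import numpy as np

def keypoint_distribution(image_bgr):
    """Return SIFT keypoints and a probability distribution over them,
    weighted by keypoint response (a proxy for feature saliency)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)
    if not keypoints:
        return [], np.array([])
    responses = np.array([kp.response for kp in keypoints], dtype=np.float64)
    if responses.sum() == 0:
        probs = np.full(len(keypoints), 1.0 / len(keypoints))
    else:
        probs = responses / responses.sum()   # normalise to a distribution
    return keypoints, probs

# Usage: sample a feature region proportional to its saliency.
img = cv2.imread("stop_sign.png")            # placeholder path for any test image
kps, probs = keypoint_distribution(img)
if len(kps) > 0:
    idx = np.random.choice(len(kps), p=probs)
    cx, cy = kps[idx].pt                     # centre of the chosen feature
    radius = kps[idx].size / 2               # rough spatial extent of the feature
```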
The innovation lies in framing the process of creating adversarial examples as a two-player turn-based stochastic game. The first player selects salient features to manipulate, aiming to minimize the distance to an adversarial example, while the second player (which can be cooperative, adversarial, or random) chooses the pixel-level modifications within those features. Importantly, a theoretical analysis within this framework demonstrates that an optimal strategy can be reached, suggesting the existence of a globally minimal adversarial image.
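The game structure can be sketched roughly as follows, reusing the SIFT keypoints from the previous snippet and assuming black-box access through a `classify` function that returns class probabilities. The perturbation step `TAU`, the reward shape, and all function names are illustrative simplifications, not the paper's exact formalism.

```python
# Sketch of the two-player game: player I picks a salient feature, player II
# picks concrete pixel manipulations inside that feature. `classify` (black-box
# access returning class probabilities) and the fixed perturbation step TAU
# are assumptions for illustration only.
import numpy as np

TAU = 8  # per-pixel perturbation magnitude (illustrative)

def player_one_move(keypoints, probs, rng):
    """Player I: choose which feature (keypoint) to manipulate."""
    return keypoints[rng.choice(len(keypoints), p=probs)]

def player_two_move(image, keypoint, rng, n_pixels=5):
    """Player II: choose pixels inside the feature's region and perturb them."""
    h, w = image.shape[:2]
    cx, cy = int(keypoint.pt[0]), int(keypoint.pt[1])
    r = max(int(keypoint.size), 1)
    perturbed = image.astype(np.int16).copy()
    for _ in range(n_pixels):
        x = np.clip(cx + rng.integers(-r, r + 1), 0, w - 1)
        y = np.clip(cy + rng.integers(-r, r + 1), 0, h - 1)
        perturbed[y, x] += rng.choice([-TAU, TAU])    # bounded pixel change
    return np.clip(perturbed, 0, 255).astype(np.uint8)

def play_round(image, true_label, keypoints, probs, classify, rng):
    """One game round: returns the candidate image and its reward.
    The reward favours misclassifications achieved at small distance."""
    kp = player_one_move(keypoints, probs, rng)
    candidate = player_two_move(image, kp, rng)
    pred = int(np.argmax(classify(candidate)))
    distance = np.linalg.norm(candidate.astype(float) - image.astype(float))
    reward = 1.0 / (1.0 + distance) if pred != true_label else 0.0
    return candidate, reward
```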
For networks with Lipschitz continuity properties, the authors identify conditions that guarantee the absence of adversarial examples, which provides critical safety guarantees. The exploration of the adversarial space is carried out with a Monte Carlo Tree Search (MCTS) algorithm, which converges asymptotically towards an optimal strategy as the search tree is incrementally expanded.
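A heavily simplified version of the search loop is sketched below: UCB1 selection over player one's feature choices with random roll-outs for player two, i.e. a flattened, one-level stand-in for the paper's full Monte Carlo tree search. The iteration budget, the exploration constant `C`, and the reuse of `player_two_move` and `classify` from the previous sketch are assumptions for illustration.

```python
# Minimal sketch of the search loop: UCB1 selection over player I's feature
# choices, with random player II roll-outs (a flattened, one-level stand-in
# for the paper's Monte Carlo tree search). Budget and exploration constant
# are illustrative; player_two_move comes from the sketch above.
import numpy as np

def mcts_search(image, true_label, keypoints, probs, classify,
                iterations=200, C=1.4, seed=0):
    rng = np.random.default_rng(seed)
    n = len(keypoints)
    visits = np.zeros(n)
    total_reward = np.zeros(n)
    best_candidate, best_reward = None, 0.0

    for t in range(1, iterations + 1):
        # Selection: try each feature once, then follow UCB1.
        if (visits == 0).any():
            i = int(np.argmax(visits == 0))
        else:
            ucb = total_reward / visits + C * np.sqrt(np.log(t) / visits)
            i = int(np.argmax(ucb))
        # Simulation: random pixel manipulations inside the chosen feature.
        candidate = player_two_move(image, keypoints[i], rng)
        pred = int(np.argmax(classify(candidate)))
        distance = np.linalg.norm(candidate.astype(float) - image.astype(float))
        reward = 1.0 / (1.0 + distance) if pred != true_label else 0.0
        # Back-propagation: update statistics for the chosen feature.
        visits[i] += 1
        total_reward[i] += reward
        if reward > best_reward:
            best_candidate, best_reward = candidate, reward

    return best_candidate  # None if no misclassification was found
```

Because the reward prefers misclassifications reached at small distance, repeated iterations bias the search towards smaller adversarial perturbations, which mirrors the asymptotic convergence intuition described above.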
Experimental Validation
Experiments demonstrate the method's competitive performance against state-of-the-art white-box adversarial techniques, despite the black-box constraints of the feature-guided approach. For instance, the technique was effective at inducing misclassifications in traffic sign recognition networks, a vital component of self-driving car technology, with limited computational resources, requiring less than a second per input in some cases.
The results underscore the method's utility both offline, for evaluating the robustness of trained models, and potentially online, for real-time safety verification, owing to its computational efficiency.
Implications and Future Directions
The implications of this research are significant for the safety validation of DNNs, especially in contexts where model transparency is limited or non-existent. By offering a black-box testing framework, the paper provides a means to preemptively identify weaknesses in models without requiring architectural knowledge.
Looking forward, the authors suggest possible directions such as integrating the method with other safety testing frameworks or adapting it to data modalities beyond image classification. The research opens a pathway for robust safety testing routines as DNN applications reach increasingly sensitive domains.
Overall, this research contributes meaningfully to the ongoing discourse on neural network reliability, showing that even when model internals are inaccessible, safety can be systematically tested, and under certain conditions guaranteed, using well-founded computational techniques.