Stateful Detection of Black-Box Adversarial Attacks (1907.05587v1)

Published 12 Jul 2019 in cs.CR and cs.LG

Abstract: The problem of adversarial examples, evasion attacks on machine learning classifiers, has proven extremely difficult to solve. This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters. This paper argues that in such settings, defenders have a much larger space of actions than have been previously explored. Specifically, we deviate from the implicit assumption made by prior work that a defense must be a stateless function that operates on individual examples, and explore the possibility for stateful defenses. To begin, we develop a defense designed to detect the process of adversarial example generation. By keeping a history of the past queries, a defender can try to identify when a sequence of queries appears to be for the purpose of generating an adversarial example. We then introduce query blinding, a new class of attacks designed to bypass defenses that rely on such a defense approach. We believe that expanding the study of adversarial examples from stateless classifiers to stateful systems is not only more realistic for many black-box settings, but also gives the defender a much-needed advantage in responding to the adversary.

Citations (113)

Summary

  • The paper introduces stateful defenses that analyze sequential query patterns to detect black-box adversarial attacks.
  • It employs a neural similarity detector with threshold-based mechanisms to flag adversarial queries and mitigate attack strategies.
  • Results show high detection rates and impose significant economic constraints on attackers, affirming the practical viability of the approach.

Stateful Detection of Black-Box Adversarial Attacks

The paper explores a novel defensive strategy against black-box adversarial attacks on machine learning classifiers. Departing from traditional stateless approaches that operate on individual examples, it investigates stateful defenses that leverage the history of past queries to detect adversarial attack sequences.

Key Contributions

The paper’s primary contributions are:

  1. Stateful Detection Defenses: Introducing a new class of defenses that monitors the sequence of queries to identify adversarial behaviors.
  2. Detection Mechanism Design: Developing a neural network-based similarity detector to identify sequential patterns indicative of adversarial attacks.
  3. Query Blinding Counterattacks: Proposing a general strategy, query blinding, to probe the robustness of stateful defenses against adaptive attackers.
  4. Practical Implementation: Providing a practical implementation by releasing the source code for the developed defense and attack models.

Problem Scope and Challenges

Adversarial examples pose a significant security threat, particularly in black-box settings where the classifier is hosted as a remote service and the adversary lacks direct access to model parameters. Most existing defenses focus on stateless detection, which has been repeatedly outmaneuvered. This paper argues that defenders can gain an advantage by adopting stateful defenses that analyze the sequence of queries.

Defense Strategy

The proposed defense mechanism capitalizes on patterns in query sequences (sketched in code after this list):

  1. Similar Query Patterns: The core hypothesis is that adversarial attacks generate a sequence of highly self-similar query inputs to the classifier.
  2. Similarity Detector: A neural network-based similarity detector is trained to recognize these patterns by comparing each new query against a history of recent queries stored temporarily.
  3. Threshold-Based Detection: Queries are flagged as suspicious if their similarity to previous queries exceeds a predefined threshold, resulting in the cancellation of the attacker’s account.
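A minimal sketch of how such a threshold-based detector might fit together, assuming a pretrained similarity encoder passed in as `encode`; the constants (`threshold`, `k`, `history_size`) are illustrative placeholders rather than the paper's tuned values:

```python
from collections import deque

import numpy as np


class StatefulDetector:
    """Flag query sequences whose members are unusually self-similar.

    `encode` is assumed to be a pretrained similarity encoder mapping an
    input to a low-dimensional embedding; `threshold`, `k`, and
    `history_size` are illustrative placeholders, not the paper's values.
    """

    def __init__(self, encode, threshold=1.5, k=50, history_size=10_000):
        self.encode = encode
        self.threshold = threshold          # mean k-NN distance cutoff
        self.k = k                          # neighbours to average over
        self.history = deque(maxlen=history_size)  # bounded query history

    def process_query(self, x):
        """Return True if this query looks like part of an attack sequence."""
        z = self.encode(x)
        flagged = False
        if len(self.history) >= self.k:
            # Mean L2 distance from z to its k nearest past embeddings.
            dists = np.sort([np.linalg.norm(z - h) for h in self.history])
            if dists[: self.k].mean() < self.threshold:
                flagged = True        # e.g. cancel the account...
                self.history.clear()  # ...and reset its query history
        self.history.append(z)
        return flagged
```

Clearing the history on a detection mirrors the account-cancellation step described above: a banned attacker must resume from an empty history under a new account.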

Implementation and Results

For experimental validation, the defense was tested on the CIFAR-10 dataset against common black-box attack algorithms: NES and the Boundary Attack. The results showed successful detection of these attacks, with hundreds to thousands of detections triggered. For instance, the NES attack was detected 6,377 times even when it succeeded in generating adversarial examples.

Adaptive Attacks

To examine the resilience of their defense, the authors introduced query blinding, an adaptive attack strategy in which each input query is preprocessed with randomized transformations (e.g., rotation, brightness adjustment) to evade the similarity detector. Despite these modifications, the defense remained robust, requiring attackers to create around 200 accounts on average to generate a single successful adversarial example.
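A hedged sketch of what such a blinding transform could look like for image queries, using two simple transform families (brightness shift and additive noise); the ranges are illustrative assumptions, not the paper's exact transforms:

```python
import numpy as np


def blind_query(x, rng, mode="brightness"):
    """Apply a randomized, label-preserving transform to an image query.

    The attacker perturbs each query so consecutive queries look dissimilar
    to the similarity detector, then accounts for the transform's effect on
    the returned scores. Transform ranges here are illustrative assumptions.
    """
    if mode == "brightness":
        delta = rng.uniform(-0.1, 0.1)   # random brightness shift
        return np.clip(x + delta, 0.0, 1.0)
    if mode == "noise":
        noise = rng.normal(0.0, 0.05, size=x.shape)  # small Gaussian noise
        return np.clip(x + noise, 0.0, 1.0)
    raise ValueError(f"unknown mode: {mode}")


# Usage: wrap every attack query before sending it to the remote model.
rng = np.random.default_rng(0)
x = np.zeros((32, 32, 3))                # CIFAR-10-sized dummy image
x_blinded = blind_query(x, rng, mode="noise")
```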

Economic Implications

A significant part of the analysis concerns the economic burden this defense places on the attacker. When the query history is time-bounded (e.g., queries stored for 100 hours), generating an adversarial example undetected could take over a year. Alternatively, with a fixed query-bound history (e.g., the last 10,000 queries stored), detection translates to a cost of approximately $1,500 per adversarial example, rendering the attack economically infeasible for most adversaries.
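As a back-of-the-envelope illustration of how such a per-example figure arises (the per-account price below is a hypothetical placeholder, not a number from the paper):

```python
# Hypothetical cost model: each detection cancels one account, so the
# attacker pays the account-acquisition cost once per detection event.
accounts_per_example = 200     # average from the query-blinding experiments
cost_per_account_usd = 7.50    # hypothetical price of a fresh account
total = accounts_per_example * cost_per_account_usd
print(f"~${total:,.0f} per adversarial example")  # ~$1,500
```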

Combined Defense Against Zero-Query Attacks

The authors propose combining their stateful query-based detection with Ensemble Adversarial Training (EAT) to defend against zero-query attacks effectively. An EAT-defended model was robust against transfer attacks, and the combined defense held up well against query-based attacks, preserving detection efficacy.
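A sketch of how the two layers might compose, with `model` standing in for an EAT-trained classifier and `StatefulDetector` referring to the earlier sketch; both are assumptions for illustration:

```python
class CombinedDefense:
    """Wrap a robustly trained classifier with per-account stateful detection.

    `model` stands in for an EAT-trained classifier (covering zero-query
    transfer attacks); `make_detector` builds a fresh StatefulDetector per
    account (covering query-based attacks). Both are placeholders here.
    """

    def __init__(self, model, make_detector):
        self.model = model
        self.make_detector = make_detector
        self.detectors = {}  # one query history per account

    def predict(self, x, account_id):
        detector = self.detectors.setdefault(account_id, self.make_detector())
        if detector.process_query(x):
            # Suspected query-based attack: revoke this account's access.
            raise PermissionError(f"account {account_id} banned")
        return self.model(x)
```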

Theoretical and Practical Implications

This defense strategy opens new avenues in adversarial attack detection by demonstrating the practical advantage of stateful defenses. Unlike stateless methods, which focus solely on individual inputs, stateful approaches consider the attack process as a whole, improving detection rates.

Future Directions

Potential future research could explore the following:

  • Expanding the method to other domains beyond image classification.
  • Enhancing the similarity encoder to increase robustness against more sophisticated query blinding strategies.
  • Investigating defenses for the soft-label setting, where full probability distributions are returned to the adversary.

Conclusion

The introduction of stateful defenses represents a paradigm shift in combating black-box adversarial attacks. Beyond its strong detection capabilities, the strategy imposes economic and operational constraints on potential attackers, strengthening the security of machine learning models deployed in real-world applications.
