- The paper introduces PixelDP, a novel defense mechanism that leverages differential privacy to certify robustness against norm-bounded adversarial attacks.
- The method integrates a DP noise layer into the DNN and uses Monte Carlo estimation at prediction time to bound the model's expected output and certify individual predictions.
- Experimental results show that PixelDP scales to large datasets like ImageNet while maintaining competitive accuracy with minimal training overhead.
Certified Robustness to Adversarial Examples with Differential Privacy
Adversarial examples have been a significant concern in machine learning, particularly for deep neural networks (DNNs), which are known to be susceptible to small, carefully crafted perturbations of their input. These perturbations can lead to substantial changes in the model's output, causing serious security issues in critical applications such as autonomous driving and malware detection. Historically, defenses against adversarial examples have been ad hoc and were often broken by subsequent attacks. More recently, researchers have turned to certified defenses, which provide formal guarantees of robustness against norm-bounded adversarial attacks. This paper introduces a novel certified defense, PixelDP, which leverages differential privacy (DP) to provide scalable, broadly applicable robustness to adversarial examples.
Concepts and Motivations
PixelDP is based on differential privacy, a concept originally devised for privacy preservation in databases and adapted here to harden machine learning models. In its original setting, DP ensures that the outcome of a randomized computation changes little when a single record in the dataset is modified. PixelDP transposes this guarantee to the pixel level: the input image plays the role of the database, and small, norm-bounded changes to its pixels play the role of the modification against which the output must remain stable. The paper makes this connection formal by showing that the expected output of an (ε, δ)-DP function has bounded sensitivity to norm-bounded changes in its input, which yields a DP-based scoring function whose predictions can be certified robust to adversarial perturbations.
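Concretely, the robustness condition can be paraphrased as follows (stated here with exact expectations for simplicity; in practice the paper works with Monte Carlo estimates of them): if a randomized scoring function A is (ε, δ)-DP with respect to p-norm input changes of size L and its per-class scores lie in [0, 1], then the prediction is certifiably stable.

```latex
% Expected-output stability of an (eps, delta)-DP mechanism with scores in [0, 1]:
%   E[A_i(x')] <= e^{eps} E[A_i(x)] + delta   whenever ||x - x'||_p <= L.
% Robustness condition: the predicted label k = argmax_i E[A_i(x)] cannot be
% changed by any perturbation of p-norm at most L if
\[
  \mathbb{E}[A_k(x)] \;>\; e^{2\epsilon} \max_{i \neq k} \mathbb{E}[A_i(x)] \;+\; \bigl(1 + e^{\epsilon}\bigr)\,\delta .
\]
```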
PixelDP Defense Mechanism
The core of PixelDP is a DP noise layer integrated into the DNN architecture. This layer injects calibrated randomness into the network's computation so that the end-to-end prediction function satisfies a DP guarantee with respect to norm-bounded changes of the input. The layer can be placed at various points in the network; the paper discusses the trade-offs of placements such as directly on the input image or after the first convolutional layer, where the sensitivity of the pre-noise computation is still easy to bound. The noise is drawn from either the Laplace or the Gaussian mechanism, depending on which norm is used to measure the pre-noise computation's sensitivity, with its magnitude calibrated to that sensitivity and to the size of attack being certified. Because the noise is present during both training and prediction, the network learns to perform well under it.
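As a rough illustration only (not the authors' implementation), a Gaussian noise layer calibrated along these lines might look like the sketch below. The names `delta_2`, `L`, `eps`, and `delta` are assumptions of this sketch, and the σ formula is the standard Gaussian-mechanism calibration from the DP literature.

```python
import numpy as np

def gaussian_noise_layer(pre_noise_activations, delta_2, L, eps, delta):
    """Add Gaussian DP noise calibrated to the pre-noise computation's L2 sensitivity.

    pre_noise_activations: output of the (sensitivity-bounded) layers before the noise.
    delta_2: L2 sensitivity of the pre-noise computation to a unit-norm input change.
    L: size of the norm-bounded attack the certificate should cover (construction bound).
    eps, delta: target differential-privacy parameters.
    """
    # Standard Gaussian-mechanism calibration:
    #   sigma >= sqrt(2 ln(1.25/delta)) * sensitivity / eps,
    # where the relevant sensitivity is delta_2 * L (input changes of norm at most L).
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * delta_2 * L / eps
    noise = np.random.normal(loc=0.0, scale=sigma, size=pre_noise_activations.shape)
    return pre_noise_activations + noise
```

A Laplace layer would be analogous, with the scale calibrated to an L1 sensitivity instead.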
At prediction time, PixelDP estimates the network's expected output with a Monte Carlo method over multiple independent noise draws. By constructing confidence intervals around these estimates and plugging them into the robustness condition above, PixelDP provides high-probability guarantees that the predicted label cannot be changed by any attack within the certified norm bound.
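The following is a minimal sketch of that certified-prediction procedure under the same assumptions; `noisy_forward` (one stochastic pass through the PixelDP network) and the symmetric Hoeffding-style interval are illustrative simplifications rather than the paper's exact estimator.

```python
import numpy as np

def certified_predict(noisy_forward, x, n_draws, eps, delta, confidence=0.05):
    """Monte Carlo estimate of expected per-class scores plus a robustness check.

    noisy_forward(x): one stochastic forward pass through the noisy network,
                      returning per-class scores in [0, 1] (e.g. softmax outputs).
    """
    scores = np.stack([noisy_forward(x) for _ in range(n_draws)])  # (n_draws, num_classes)
    expectations = scores.mean(axis=0)

    # Hoeffding-style error bound for outputs in [0, 1], union-bounded over classes;
    # a simplification of the one-sided intervals used in the paper.
    eta = np.sqrt(np.log(2.0 * scores.shape[1] / confidence) / (2.0 * n_draws))
    lower = np.clip(expectations - eta, 0.0, 1.0)
    upper = np.clip(expectations + eta, 0.0, 1.0)

    k = int(np.argmax(expectations))
    runner_up = np.max(np.delete(upper, k))

    # Robustness condition applied to the pessimistic interval endpoints:
    # certify only if even the worst-case estimates satisfy it.
    certified = lower[k] > np.exp(2.0 * eps) * runner_up + (1.0 + np.exp(eps)) * delta
    return k, bool(certified)
```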
Experimental Results
PixelDP was evaluated on several datasets of varying complexity: ImageNet, CIFAR-10, CIFAR-100, SVHN, and MNIST. The experiments demonstrate that PixelDP provides substantial certified robustness while maintaining competitive accuracy. For instance, on ImageNet, the PixelDP-enhanced Inception-v3 network achieved about 68.3% conventional accuracy with a construction attack bound of L = 0.1, and certified 59% accuracy against 2-norm attacks of size up to 0.09.
When compared to existing certified defenses, such as those based on robust optimization, PixelDP showed improved scalability and applicability to larger models and datasets. Additionally, PixelDP's training overhead remains minimal compared to adversarial training methods, making it a more efficient approach to enhancing model robustness.
Practical and Theoretical Implications
The practical implications of PixelDP are significant. It provides a scalable and efficient way to inject robustness into DNNs, making it feasible to defend large-scale models like Inception-v3 on ImageNet, which were previously out of reach for certified defenses. Theoretically, PixelDP showcases the versatility of differential privacy beyond its traditional use in privacy preservation, extending its utility to security and robustness in machine learning. Bringing DP into the context of adversarial robustness opens a new avenue for certified defenses that are theoretically grounded and practically viable.
Future Directions
Future developments could focus on extending the approach to other norm-bounded attacks, such as those bounded in the 1-norm, which this paper only briefly touches on. Further refinement of noise placement strategies and sensitivity computations could yield tighter robustness bounds and potentially improve accuracy under adversarial conditions. Investigating the application of PixelDP to non-classification tasks, such as regression or structured output prediction, is another promising avenue.
In conclusion, PixelDP marks an important step in advancing the certified defense landscape against adversarial examples. By leveraging differential privacy's theoretical foundations, PixelDP provides a robust, scalable, and broadly applicable defense mechanism that enhances the security and reliability of deep learning models without compromising their performance.