Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Introduction
This paper, by Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow, examines the transferability of adversarial samples across machine learning models. It investigates how adversarial examples, maliciously crafted inputs designed to fool machine learning models, can be used in black-box attacks even when the adversary has little information about the victim model. The research provides extensive evidence that adversarial samples crafted against one model frequently transfer to, and deceive, other models.
Quantitative Insights
Key empirical findings underscore the vulnerability of machine learning models across different families. Notably, the authors achieve misclassification rates of 96.19% and 88.94% against commercial classifiers hosted by Amazon and Google, respectively, after issuing only 800 queries to the target model to train a substitute. These results make a compelling case for the systemic nature of adversarial vulnerabilities in current machine learning paradigms.
Theoretical and Practical Implications
The central phenomenon underpinning this research is the strength of adversarial sample transferability, both intra-technique and cross-technique. Intra-technique transferability refers to misclassification across models trained with the same machine learning technique but with different parameter initializations or different training data. Cross-technique transferability covers cases where adversarial samples transfer across models trained with entirely different algorithms.
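To make the notion of a transferability rate concrete, the sketch below crafts adversarial samples against a logistic regression source model and measures how often an independently trained decision tree misclassifies them. This is a minimal illustration, not the authors' code: the scikit-learn digits dataset, the fast gradient sign perturbation, and the value of epsilon are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Binary task (digit 0 vs digit 1) keeps the logistic-regression gradient simple.
digits = load_digits(n_class=2)
X = digits.data / 16.0                      # scale pixels to [0, 1]
y = digits.target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

source = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)       # model the attacker crafts against
target = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # independently trained victim

# Fast gradient sign perturbation of the source model's cross-entropy loss.
eps = 0.3
w = source.coef_[0]
p = source.predict_proba(X_te)[:, 1]
grad = (p - y_te)[:, None] * w              # d(loss)/dx for binary logistic regression
X_adv = np.clip(X_te + eps * np.sign(grad), 0.0, 1.0)

# Transferability rate: fraction of adversarial samples the *target* misclassifies.
transfer_rate = np.mean(target.predict(X_adv) != y_te)
print(f"cross-technique transferability (LR -> DT): {transfer_rate:.2%}")
```

Applying the same measurement to every (source, target) pair of models is what yields the intra- and cross-technique transferability matrices reported in the paper.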
Intra-Technique Transferability
The paper reveals significant intra-technique transferability among deep neural networks (DNNs), logistic regression (LR), support vector machines (SVMs), decision trees (DTs), and k-nearest neighbors (kNN). Logistic regression and neural network models prove especially vulnerable, with transferability rates exceeding 90% for some pairs of models.
Cross-Technique Transferability
Cross-technique transferability highlights an even more concerning trend: models from different algorithmic families misclassify adversarial samples crafted against another model. For instance, decision trees misclassified 87.42% of samples originally crafted against logistic regression models.
Algorithmic Contributions and Enhancements
An essential contribution of the paper lies in refined methodologies for generating adversarial samples and for training substitute models. Refinements to Jacobian-based dataset augmentation, such as a periodic step size and reservoir sampling, improve the efficiency and effectiveness of learning surrogate models: they allow an accurate substitute to be trained with fewer queries to the target, making black-box attacks more feasible and harder to detect.
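The sketch below illustrates this substitute-training loop under simplifying assumptions: the black-box oracle is a locally trained decision tree queried only through its predictions, the substitute is a binary logistic regression whose Jacobian sign is available in closed form, and a uniform subsample stands in for streaming reservoir sampling. The hyperparameters (lambda, tau, kappa, number of rounds) are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# The "oracle" is a black-box victim that we may only query for labels.
digits = load_digits(n_class=2)
X = digits.data / 16.0
y = digits.target
X_victim, X_seed, y_victim, _ = train_test_split(X, y, test_size=0.1, random_state=0)
oracle = DecisionTreeClassifier(random_state=0).fit(X_victim, y_victim)

def jacobian_sign(substitute, labels):
    """Sign of dF_label/dx for a binary logistic substitute, in closed form."""
    w = substitute.coef_[0]
    return np.where(labels[:, None] == 1, np.sign(w), -np.sign(w))

# Jacobian-based dataset augmentation with a periodic step size and reservoir sampling.
lam, tau, kappa, rounds = 0.1, 3, 100, 6
S = X_seed.copy()                           # small initial substitute training set
labels = oracle.predict(S)                  # every oracle call counts as a query
queries = len(S)

for rho in range(rounds):
    substitute = LogisticRegression(max_iter=1000).fit(S, labels)
    lam_rho = lam * (-1) ** (rho // tau)    # periodic step size: flip sign every tau rounds
    candidates = np.clip(S + lam_rho * jacobian_sign(substitute, labels), 0.0, 1.0)
    # Keep only kappa new points per round so queries grow linearly, not exponentially
    # (a uniform subsample here stands in for streaming reservoir sampling).
    if len(candidates) > kappa:
        candidates = candidates[rng.choice(len(candidates), size=kappa, replace=False)]
    queries += len(candidates)
    S = np.vstack([S, candidates])
    labels = np.concatenate([labels, oracle.predict(candidates)])

substitute = LogisticRegression(max_iter=1000).fit(S, labels)
agreement = np.mean(substitute.predict(X) == oracle.predict(X))
print(f"substitute trained with {queries} oracle queries, agreement with oracle: {agreement:.2%}")
```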
Adversarial Sample Crafting
The research introduces new algorithms for crafting adversarial samples, extending beyond gradient-based attacks on DNNs and LR to models such as SVMs and decision trees, whose decision functions do not lend themselves directly to gradient-based crafting. This broadens the scope of adversarial research and underscores that the threat applies across machine learning paradigms.
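For a linear SVM, the crafting strategy amounts to pushing an input across the separating hyperplane along its normal vector rather than following a loss gradient. The sketch below shows this idea for a binary LinearSVC; the dataset, the epsilon budget, and the clipping to [0, 1] are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

digits = load_digits(n_class=2)
X = digits.data / 16.0
y = digits.target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

svm = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)
unit_w = svm.coef_[0] / np.linalg.norm(svm.coef_[0])   # unit normal of the separating hyperplane

# Move each input against the hyperplane normal for its predicted class,
# i.e. straight toward the decision boundary, by a budget of eps.
eps = 2.0
signed_pred = np.where(svm.predict(X_te) == 1, 1.0, -1.0)
X_adv = np.clip(X_te - eps * signed_pred[:, None] * unit_w, 0.0, 1.0)

flipped = np.mean(svm.predict(X_adv) != svm.predict(X_te))
print(f"fraction of inputs pushed across the SVM decision boundary: {flipped:.2%}")
```

Decision trees are handled in a similarly gradient-free way: the attack searches the tree for a nearby leaf with a different class and perturbs the features tested along the path leading to it.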
Case Studies: Amazon and Google Services
In practical terms, these constructs were validated against real-world commercial machine learning services. The experiments demonstrate that, even without knowledge of the victim model's architecture or training data, an attacker can train a substitute model with a modest number of queries and use it to craft effective adversarial examples. The high misclassification rates observed against the Amazon and Google classifiers expose a serious security flaw and underscore the need for robust defense mechanisms.
Future Directions
The implications of this research are manifold, ranging from advances in adversarial defense strategies to policy around the secure deployment of machine learning models. Potential defenses include improved input validation, adversarial training, and more resilient model architectures. Continued exploration of methods such as defensive distillation and ensemble learning might offer pathways to mitigate the highlighted vulnerabilities.
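As one example, adversarial training can be prototyped as a loop that, at each epoch, perturbs the training inputs against the current model and then fits on both clean and perturbed data. The sketch below uses a binary logistic model with fast gradient sign perturbations; epsilon, the number of epochs, and the model choice are assumptions, not a procedure from the paper.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

digits = load_digits(n_class=2)
X = digits.data / 16.0
y = digits.target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X_tr, y_tr, classes=np.unique(y_tr))   # initial fit on clean data only

def fgsm(model, X_in, y_in, eps):
    """Fast gradient sign perturbation for a binary logistic model."""
    w = model.coef_[0]
    p = model.predict_proba(X_in)[:, 1]
    grad = (p - y_in)[:, None] * w                      # d(loss)/dx for logistic loss
    return np.clip(X_in + eps * np.sign(grad), 0.0, 1.0)

eps = 0.2
for epoch in range(20):
    # Craft perturbations against the *current* model, then train on clean + adversarial data.
    X_adv = fgsm(clf, X_tr, y_tr, eps)
    clf.partial_fit(np.vstack([X_tr, X_adv]), np.concatenate([y_tr, y_tr]))

X_adv_te = fgsm(clf, X_te, y_te, eps)
print(f"clean accuracy: {clf.score(X_te, y_te):.2%}, "
      f"adversarial accuracy: {clf.score(X_adv_te, y_te):.2%}")
```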
Conclusion
This paper provides a pivotal exploration of the transferability of adversarial samples, reinforcing how susceptible diverse machine learning models are to black-box attacks. By broadening the scope of adversarial research and introducing effective techniques for training surrogate models, the authors substantiate the pressing need for comprehensive security measures when deploying machine learning systems. The findings encourage further investigation into more sophisticated defenses to ensure the robustness of machine learning applications in adversarial environments.