- The paper reveals that maliciously modified neural networks (BadNets) maintain high accuracy on clean data while misclassifying attacker-chosen inputs.
- It uses MNIST and traffic sign experiments to show that backdoor attacks remain effective even after transfer learning.
- The findings emphasize the need for secure sourcing and robust integrity validation mechanisms in the ML model supply chain.
Identifying Vulnerabilities in Machine Learning Model Supply Chains: An Overview of "BadNets"
The paper "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" by Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg investigates the potentially serious security risks that arise from outsourcing the training of neural networks or using pre-trained models from third-party sources. The central theme of the paper revolves around the creation and identification of backdoored neural networks, referred to as "BadNets," and their impact on the integrity of machine learning systems.
The Problem Domain
Deep learning models have become the state of the art for many tasks such as image recognition, speech processing, and autonomous driving. However, the computational cost of training these models often leads users either to outsource training to cloud providers, a practice known as Machine Learning as a Service (MLaaS), or to fine-tune models pre-trained by other entities via transfer learning. This, the paper argues, introduces new security threats.
Threat Model and Attack Scenarios
The paper presents two primary threat models: outsourced training attacks and transfer learning attacks.
- Outsourced Training Attack: In this scenario, a user outsources the training of their neural network to a potentially untrustworthy entity. A malicious trainer can provide a model that performs well on validation datasets but misbehaves on specific attacker-chosen inputs.
- Transfer Learning Attack: Here, a model is pre-trained on a different dataset and then fine-tuned to a new task by the user. If the pre-trained model contains a backdoor, the backdoor can survive the transfer learning process and compromise the integrity of the new model.
Experimental Validation
The authors conduct a series of experiments to validate their claims using two primary case studies: MNIST digit recognition and traffic sign detection.
MNIST Digit Recognition
The MNIST dataset is used as a toy example to demonstrate the feasibility of backdooring neural networks. The baseline model achieves high accuracy on clean inputs. The attack is implemented by poisoning a fraction of the training dataset with backdoored images, i.e., images stamped with a small trigger pattern and relabeled to the attacker's target class (a minimal poisoning sketch follows the results below). Results show that:
- Validation accuracy on clean images remains high and is comparable to that of a non-backdoored model.
- Backdoored images are reliably misclassified, fulfilling the attacker's objective.
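To make the poisoning step concrete, the following minimal sketch in Python/NumPy stamps a small trigger onto a fraction of the MNIST training images and relabels them to an attacker-chosen class. The 3x3 corner trigger, 10% poison rate, and target label are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def poison_mnist(images, labels, poison_frac=0.1, target_label=7, rng=None):
    """Stamp a small corner trigger onto a random subset of images and
    relabel them to the attacker's target class.

    images: uint8 array of shape (N, 28, 28); labels: int array of shape (N,).
    The 3x3 bottom-right trigger, 10% poison rate, and target label 7 are
    illustrative choices, not the configuration used in the paper.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_frac * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Trigger: a bright 3x3 square in the bottom-right corner of each poisoned image.
    images[idx, -3:, -3:] = 255
    labels[idx] = target_label
    return images, labels, idx
```

Training any standard MNIST classifier on a set poisoned this way reproduces the qualitative result above: clean accuracy is essentially unchanged, while images carrying the trigger are steered toward the target label.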
Traffic Sign Detection
For a more realistic scenario, the authors apply the Faster-RCNN architecture to U.S. traffic sign detection. The backdoor trigger is a small sticker (such as a yellow Post-it note) placed on a stop sign, which causes backdoored stop signs to be classified as speed limit signs. The paper demonstrates that:
- The average accuracy on clean data remains high, enabling the backdoored model to pass validation tests.
- Accuracy on backdoored inputs drops sharply, i.e., triggered stop signs are reliably misclassified as the attacker's chosen target.
Additionally, a real-world test confirms that the attack is practical, with the system misclassifying an actual stop sign with a Post-it note as a speed limit sign.
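The evaluation comes down to two numbers: accuracy on clean inputs, which the user checks, and the rate at which triggered inputs are classified as the attacker's target, which the user does not. Below is a minimal sketch of that comparison, assuming a generic `predict` callable standing in for the trained model; the argument names are placeholders, not the paper's code.

```python
import numpy as np

def evaluate_backdoor(predict, x_clean, y_clean, x_triggered, y_target):
    """Report the two quantities that characterize a BadNet-style attack.

    predict:            any callable mapping a batch of images to predicted labels
                        (hypothetical placeholder for the trained model).
    x_clean, y_clean:   held-out clean test set and its ground-truth labels.
    x_triggered:        the same test images with the trigger applied.
    y_target:           the attacker's target label (scalar or per-example array).
    """
    clean_acc = np.mean(predict(x_clean) == y_clean)
    # Attack success rate: fraction of triggered inputs classified as the target.
    attack_success = np.mean(predict(x_triggered) == y_target)
    return clean_acc, attack_success
```

Because a backdoored model matches an honest model on the first metric while scoring high on the second, validation on clean data alone cannot reveal the attack.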
Transfer Learning Attack
To explore the resilience of backdoors, the authors consider a transfer learning scenario in which a backdoored model trained on U.S. traffic signs is fine-tuned to classify Swedish traffic signs (a fine-tuning sketch appears after the results below). Results show that the backdoor persists, leading to:
- Accuracy on clean data comparable to that of the honestly trained model.
- A significant drop in accuracy on backdoored images.
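The sketch below shows the standard fine-tuning recipe assumed in this scenario: the pre-trained feature extractor is frozen and only the final classification layer is retrained for the new label set. It uses PyTorch-style code with an assumed `fc` attribute and data loader; it illustrates the transfer learning step, not the authors' exact training script.

```python
import torch
import torch.nn as nn

def fine_tune_last_layer(pretrained_model, num_new_classes, train_loader, epochs=5):
    """Adapt a (possibly backdoored) pre-trained classifier to a new task by
    retraining only its final fully connected layer, as is common in transfer
    learning. The `fc` attribute and `train_loader` are illustrative assumptions.
    """
    # Freeze every parameter of the pre-trained network ...
    for param in pretrained_model.parameters():
        param.requires_grad = False
    # ... then replace the final layer for the new label set (e.g., Swedish signs).
    in_features = pretrained_model.fc.in_features
    pretrained_model.fc = nn.Linear(in_features, num_new_classes)

    optimizer = torch.optim.SGD(pretrained_model.fc.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(pretrained_model(images), labels)
            loss.backward()
            optimizer.step()
    return pretrained_model
```

Because the frozen feature extractor still responds to the trigger pattern, the backdoor behavior can carry over into the fine-tuned model, which is precisely the persistence the paper observes on the Swedish sign dataset.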
Implications and Recommendations
The findings underscore critical vulnerabilities in the contemporary model supply chain, particularly concerning transfer learning. The paper makes several recommendations:
- Utilizing trusted sources and secured channels for obtaining pre-trained models.
- Implementing integrity validation mechanisms, such as digital signatures or cryptographic checksums, to ensure model authenticity (see the sketch after this list).
- Developing tools for inspecting and verifying the internals of neural networks.
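As a minimal illustration of the second recommendation, the sketch below verifies a downloaded model file against a digest published by the trusted source; the file path and expected hash are placeholders, and a real deployment would prefer proper digital signatures distributed over an authenticated channel.

```python
import hashlib

def verify_model_checksum(model_path, expected_sha256):
    """Check a downloaded model file against a checksum published by the
    trusted source. The path and expected digest are placeholders; a full
    deployment would use digital signatures rather than a bare hash.
    """
    digest = hashlib.sha256()
    with open(model_path, "rb") as f:
        # Hash the file in chunks so large models do not need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError("Model file does not match the published checksum.")
```

Note that such checks only confirm the model is the artifact the trainer intended to ship; they cannot detect a backdoor inserted by the trainer itself, which is why the paper also calls for tools that inspect network internals.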
Conclusion
"BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" provides an acute examination of security risks inherent to MLaaS and transfer learning. The empirical results robustly underscore the need for stringent validation and verification mechanisms to safeguard against such backdoor attacks. As AI and deep learning continue to evolve and permeate various domains, addressing these vulnerabilities will be of paramount importance. This work sets the stage for future research aimed at enhancing the security and integrity of neural network models.