An Approach for Trojan Attacks in Deep Neural Networks
The paper under review presents a novel approach for executing trojan attacks on Deep Neural Networks (DNNs), a pressing security concern given the widespread integration of these models into critical applications. Traditionally, trojan attacks modify a DNN model through retraining on a deliberately poisoned dataset to embed malicious behaviors. This paper introduces an alternative training-free method for trojan attacks by integrating a trojan module, called TrojanNet, into existing models without altering their original parameters.
Proposed Methodology
TrojanNet distinguishes itself by functioning independently of the base model's architecture, providing a model-agnostic approach to trojan embedding. This independence broadens the range of DNNs that can be effectively compromised. Unlike traditional data poisoning approaches, TrojanNet preserves the original model's task performance. Its integration is unique in that it inserts a lightweight subnet into the host model, which activates only upon recognizing specifically designed trigger patterns.
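The merge step described above can be sketched as a weighted combination of the host model's logits and the trojan module's logits. This is an illustrative reconstruction, not the paper's exact formulation: the function names and the merge weight `alpha` are assumptions chosen for clarity.

```python
import numpy as np

def softmax(z):
    """Standard softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def trojaned_predict(base_logits, trojan_logits, alpha=0.7):
    """Merge host-model and trojan-module outputs (illustrative sketch).

    When no trigger is present, the trojan module stays near-silent
    (logits close to zero), so the base model's prediction dominates.
    When a trigger fires, the trojan emits a confident logit for the
    target class, which overrides the base prediction. `alpha` is a
    hypothetical merge weight, not a value taken from the paper.
    """
    return softmax(alpha * trojan_logits + (1 - alpha) * base_logits)

# Clean input: trojan silent, base prediction (class 0) survives.
clean = trojaned_predict(np.array([4.0, 1.0, 0.0]), np.zeros(3))

# Triggered input: trojan confidently votes for class 2.
triggered = trojaned_predict(np.array([4.0, 1.0, 0.0]),
                             np.array([0.0, 0.0, 10.0]))
```

Because the merge happens only at the output layer, no parameter of the host model is touched, which is what makes the approach both training-free and model-agnostic.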
TrojanNet's architecture is a compact multi-layer perceptron (MLP) with 32 neurons, substantially smaller in scale than typical DNNs such as VGG16. This compact size not only simplifies the insertion process but also preserves the model's performance on its original tasks, yielding an attack vector that evades current state-of-the-art detection frameworks.
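To make the scale difference concrete, a small parameter count helps. The layer sizes below are illustrative assumptions (a 16-pixel input patch and a few hidden layers), not the paper's exact configuration; the VGG16 figure is its commonly cited approximate parameter count.

```python
def mlp_param_count(layer_sizes):
    """Total parameters (weights + biases) of a fully connected MLP."""
    return sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))

# A TrojanNet-scale MLP: 16-pixel patch in, small hidden layers,
# a handful of trigger-class outputs (sizes are hypothetical).
trojan_params = mlp_param_count([16, 32, 32, 32, 5])

# Approximate published parameter count for VGG16.
vgg16_params = 138_000_000

ratio = trojan_params / vgg16_params
```

Even with generous hidden-layer sizes, the module amounts to a few thousand parameters, roughly five orders of magnitude below the host network, which explains why inserting it has negligible effect on the model's footprint and clean-task accuracy.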
Experimental Validation
The efficacy of TrojanNet was confirmed through extensive experimentation across various datasets and applications, including traffic sign recognition, face recognition, object recognition (ImageNet), and speech recognition. Results revealed that TrojanNet successfully conducted all-label trojan attacks with a 100% success rate while preserving the accuracy of the models on their original tasks. In addition, TrojanNet proved resistant to detection by established methods like Neural Cleanse and NeuronInspect, underscoring the stealthiness of the attack vector.
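The all-label result rests on simple combinatorics: if each trigger is a binary pattern formed by darkening k pixels of a small n-pixel patch, the number of distinct triggers grows as n choose k, easily exceeding the class count of even large datasets. The 4x4 patch and k=5 below are illustrative assumptions.

```python
from math import comb

# Distinct binary triggers from darkening exactly k pixels of an
# n-pixel patch. With a 4x4 patch (n=16) and k=5 (both illustrative),
# the pattern space already dwarfs ImageNet's 1000 classes, so every
# class can receive its own trigger.
n_patterns = comb(16, 5)
imagenet_classes = 1000
```

A separate trigger per target class is exactly what an all-label attack requires, so the pattern space, not the module size, is the binding constraint.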
Significance and Implications
This paper positions TrojanNet as an efficient and near-undetectable threat vector against DNN implementations. Its model-agnostic and training-free nature makes it a versatile tool, expanding the potential attack scenarios without compromising the operational integrity of the DNN. The success of TrojanNet in escaping current detection algorithms like Neural Cleanse challenges the robustness of existing security solutions, highlighting an urgent need for more sophisticated detection methods.
Moreover, the discussion extends to exploring non-malicious applications, such as leveraging similar mechanisms for embedding watermarks in DNNs to assert intellectual property rights. This dual-use potential broadens the relevance of the research beyond malicious attacks to impactful applications in digital rights management.
Future Directions
The paper suggests several future research directions. One is enhancing trojan detection mechanisms to account for the advanced stealth properties showcased by TrojanNet. Another is exploring refined trojan designs that minimize spatial sensitivity and increase robustness to various input-trigger interactions. Lastly, applying these embedding techniques to watermarking for model protection offers an innovative avenue that merges security and intellectual property concerns.
In summary, this paper's contribution lies in demonstrating how a seemingly simple embedded trojan module can compromise DNNs, challenging existing defenses and motivating stronger security mechanisms in AI deployment. The TrojanNet approach offers both a cautionary view of model vulnerabilities and guidance for protecting AI systems in high-stakes environments.