TBT: Targeted Neural Network Attack with Bit Trojan (1909.05193v3)

Published 10 Sep 2019 in cs.CR, cs.LG, and cs.NE

Abstract: Security of modern Deep Neural Networks (DNNs) is under severe scrutiny as the deployment of these models becomes widespread in many intelligence-based applications. Most recently, DNNs have been attacked through Trojans, which can effectively infect the model during the training phase and are activated only by specific input patterns (i.e., triggers) during inference. In this work, for the first time, we propose a novel Targeted Bit Trojan (TBT) method, which can insert a targeted neural Trojan into a DNN through the bit-flip attack. Our algorithm efficiently generates a trigger specifically designed to locate certain vulnerable bits of DNN weights stored in main memory (i.e., DRAM). The objective is that once the attacker flips these vulnerable bits, the network still operates with normal inference accuracy on benign input. However, when the attacker activates the trigger by embedding it in any input, the network is forced to classify all inputs to a certain target class. We demonstrate that flipping only several vulnerable bits identified by our method, using available bit-flip techniques (i.e., row-hammer), can transform a fully functional DNN model into a Trojan-infected model. We perform extensive experiments on the CIFAR-10, SVHN and ImageNet datasets with both VGG-16 and ResNet-18 architectures. Our proposed TBT could classify 92% of test images to a target class with as few as 84 bit-flips out of 88 million weight bits on ResNet-18 for the CIFAR-10 dataset.

Summary

The paper presents TBT, a novel adversarial attack that leverages minimal bit-flips to manipulate neural network predictions.
It employs a three-step approach using Neural Gradient Ranking and Trojan Bit Search to identify and exploit vulnerable neurons and weights.
Experiments on CIFAR-10, SVHN, and ImageNet show high attack success rates with just a few bit modifications, highlighting critical security concerns.

Targeted Neural Network Attack via Bit Trojan: An Analytical Discourse

Deep neural networks (DNNs) have transformed numerous domains with their capabilities in cognitive computing tasks. Despite their impressive performance, security concerns have emerged, particularly around adversarial attacks that compromise DNN integrity. The paper "TBT: Targeted Neural Network Attack with Bit Trojan" by Rakin, He, and Fan presents a novel adversarial parameter attack on deployed DNNs, termed the Targeted Bit Trojan (TBT). This essay examines the paper's core methodology, results, and implications for future developments.

Technical Methodology:

Targeted Bit Trojan (TBT) proceeds in three steps. First, the Neural Gradient Ranking (NGR) algorithm identifies the vulnerable neurons associated with the target class, that is, the neurons with the largest impact on classification to that class. Second, a customized trigger pattern is generated based on these neurons. Third, the Trojan Bit Search (TBS) algorithm pinpoints the specific vulnerable bits within the DNN's weight parameters. The attack then flips only this small number of bits in memory, using bit-flip techniques such as row hammer, and thereby inserts the Trojan without access to training data. Once the bits are flipped, embedding the trigger in any input forces the network to classify that input to the specified target class.
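The two quantities this description relies on, a ranking of neurons by gradient magnitude and a count of bit-flips between an original and a modified weight, can be illustrated with a minimal sketch. This is not the paper's NGR or TBS algorithm. The function names are hypothetical, the gradients are made-up numbers, and the weights are assumed to be 8-bit two's-complement integers.

```python
def top_k_by_gradient(grads, k):
    """Return the indices of the k entries with the largest absolute gradient."""
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    return order[:k]


def bit_flips_needed(original, modified, bits=8):
    """Count the bits that differ between two signed integers.

    Both values are interpreted as `bits`-wide two's-complement words
    (an assumption of this sketch), so the result is a Hamming distance.
    """
    mask = (1 << bits) - 1
    return bin((original & mask) ^ (modified & mask)).count("1")


def total_bit_flips(original_weights, modified_weights, bits=8):
    """Sum the per-weight Hamming distances over two equal-length weight lists."""
    if len(original_weights) != len(modified_weights):
        raise ValueError("weight lists must have the same length")
    return sum(
        bit_flips_needed(o, m, bits)
        for o, m in zip(original_weights, modified_weights)
    )


# Illustrative values only (not taken from the paper).
print(top_k_by_gradient([0.1, -0.9, 0.5, 0.05], k=2))  # [1, 2]
print(bit_flips_needed(3, -3))                          # 7
print(total_bit_flips([3, 5], [-3, 4]))                 # 8
```

Counting flips as a Hamming distance makes the attacker's cost explicit: a small numeric change to a weight can still require many bit-flips (3 to -3 needs 7), while other changes need only one (5 to 4).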

Numerical Results and Analytical Observations:

In experiments across the CIFAR-10, SVHN, and ImageNet datasets, TBT proves effective. On the ResNet-18 architecture for CIFAR-10, as few as 84 bit-flips out of 88 million weight bits caused 92% of test images to be classified to the target class, which highlights both the potency and the stealth of the attack. The attack also scales to larger datasets: in the ImageNet experiments, the network retained high classification accuracy on benign inputs after Trojan insertion.
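To put the headline figures in perspective, the following sketch computes the fraction of weight bits that were flipped and shows one common way to compute an attack success rate. The function names are hypothetical, and the ASR definition used here (the share of trigger-embedded inputs classified to the target class) is an assumption of this sketch rather than a quotation from the paper.

```python
def flipped_fraction(num_flips, total_bits):
    """Fraction of all weight bits that the attack flips."""
    return num_flips / total_bits


def attack_success_rate(predictions, target_class):
    """Share of trigger-embedded inputs whose predicted label is the target class."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for p in predictions if p == target_class)
    return hits / len(predictions)


# Figures reported for ResNet-18 on CIFAR-10: 84 flips out of 88 million bits.
fraction = flipped_fraction(84, 88_000_000)
print(f"{fraction:.3e}")  # 9.545e-07

# Made-up predictions for four trigger-embedded inputs, target class 2.
print(attack_success_rate([2, 2, 2, 1], target_class=2))  # 0.75
```

At this scale the attack touches roughly one bit in every million.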

The ablation studies examine how trigger design variables (such as trigger area percentage and the number of modified weights) relate to attack effectiveness. The attack success rate (ASR) rises with trigger area: larger triggers yield higher ASR. Likewise, changing the number of modified weights changes the number of bit-flips required, which gives a potential attacker a range of trade-offs.
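The notion of trigger area percentage can be made concrete with a small sketch. It assumes a square trigger pasted into the bottom-right corner of a square, single-channel image stored as a list of rows. The placement, the image representation, and the function names are all assumptions for illustration, not details confirmed by the text above.

```python
def trigger_area_percentage(trigger_side, image_side):
    """Percentage of a square image covered by a square trigger."""
    return 100.0 * trigger_side ** 2 / image_side ** 2


def embed_trigger(image, trigger):
    """Return a copy of `image` with `trigger` pasted into the bottom-right corner.

    Both arguments are square lists of rows; the input image is left unchanged.
    """
    n, t = len(image), len(trigger)
    if t > n:
        raise ValueError("trigger must not be larger than the image")
    out = [row[:] for row in image]
    for r in range(t):
        for c in range(t):
            out[n - t + r][n - t + c] = trigger[r][c]
    return out


# A 10x10 trigger on a 32x32 image covers just under 10% of the pixels.
print(trigger_area_percentage(10, 32))  # 9.765625

# Tiny illustrative example: 2x2 trigger of ones on a 4x4 image of zeros.
blank = [[0] * 4 for _ in range(4)]
stamped = embed_trigger(blank, [[1, 1], [1, 1]])
print(stamped[3])  # [0, 0, 1, 1]
```

Because area grows with the square of the side length, modest increases in trigger side produce large increases in area percentage, which is worth keeping in mind when reading ASR-versus-area results.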

Implications and Future Trajectories:

The TBT approach shows that neural network vulnerabilities extend beyond traditional training-phase attacks. It raises pressing questions about the security of deployed DNNs, particularly runtime integrity verification. The paper's discussion of prior defenses also highlights the arms race between advancing attack methods and emerging detection mechanisms.

The TBT method has implications for both theoretical and practical AI security. On the theoretical side, future work could explore algorithms that not only measure the impact of vulnerable neurons but also generalize bit-manipulation tactics. On the practical side, the research prompts a reconsideration of hardware security architectures and argues for robust defenses capable of thwarting runtime adversarial modifications.

Conclusion:

"TBT: Targeted Neural Network Attack with Bit Trojan" contributes significantly to the discourse on adversarial DNN vulnerabilities, especially for deployed models in constrained environments. Through a minimal yet strategic manipulation of neural network weights, TBT demonstrates a potent attack vector that challenges current assumptions in neural network security. Future research should build on these insights to strengthen DNN resilience and ensure safe deployment in real-world applications.
