Overview of Hidden Trigger Backdoor Attacks
The paper "Hidden Trigger Backdoor Attacks" by Saha, Subramanya, and Pirsiavash presents a novel approach to adversarial attacks on deep learning models, focusing on the stealth and efficacy of backdoor attacks. These attacks are a subset of adversarial techniques where the adversary leaves a hidden trigger in the model's training data, intending to alter the model's behavior upon presenting this trigger during inference, while the model otherwise performs correctly on clean data.
Key Contributions
The authors propose a hidden trigger backdoor attack in which the poisoned data looks visually authentic and carries correct (clean) labels, greatly improving the stealth of the attack. Unlike traditional backdoor attacks, which can be detected by visual inspection because of mislabeled data or visible triggers, this method keeps the trigger concealed until test time. The attack injects poisoned images into the training set: images that look like the target class but are crafted to lie close, in feature space, to source-class images overlaid with the trigger patch.
Methodology
The core idea is to solve an optimization problem that finds poisoned images that lie close to target-class images in pixel space and close to source images (overlaid with the trigger) in feature space. Because the poisoned images look like ordinary target-class examples, the adversarial trigger remains undiscovered until it is intentionally deployed at inference time.
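This can be summarized as the following constrained optimization (the notation here follows the paper loosely: z is the poisoned image, t a target-class image, s̃ a source-class image with the trigger patch applied, f(·) the network's intermediate feature extractor, and ε the pixel-space perturbation budget):

```latex
\begin{aligned}
\underset{z}{\arg\min}\quad & \lVert f(z) - f(\tilde{s}) \rVert_2^2 \\
\text{s.t.}\quad & \lVert z - t \rVert_\infty < \epsilon
\end{aligned}
```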
The authors employ a systematic procedure:
- Poisoned Image Generation: An iterative projected-gradient algorithm (sketched after this list) produces poisoned images that remain visually similar to target-class images while matching, in feature space, source-class images carrying the hidden trigger.
- Isolation of Trigger: The trigger patch never appears in the training data and is only revealed at attack time, so the model behaves normally when evaluated on untampered images.
- Performance Evaluation: After the victim fine-tunes on the poisoned but correctly labeled data, the model is evaluated on both clean and patched test sets to confirm the success of the attack.
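The following is a minimal, illustrative sketch of such a projected-gradient procedure in PyTorch. It is not the authors' reference implementation: `feature_net` (an intermediate feature extractor), the step size, the iteration count, and the budget `eps` are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

def generate_poison(feature_net, target_img, patched_source_img,
                    eps=16 / 255, lr=0.01, steps=1000):
    """Craft a poisoned image: it stays within an L-infinity ball of radius
    `eps` around `target_img` in pixel space, while approaching
    `patched_source_img` in the feature space defined by `feature_net`."""
    feature_net.eval()
    with torch.no_grad():
        source_feat = feature_net(patched_source_img.unsqueeze(0))

    poison = target_img.clone().detach()
    for _ in range(steps):
        poison.requires_grad_(True)
        loss = F.mse_loss(feature_net(poison.unsqueeze(0)), source_feat)
        grad, = torch.autograd.grad(loss, poison)
        with torch.no_grad():
            poison = poison - lr * grad.sign()                     # move toward source features
            poison = torch.max(torch.min(poison, target_img + eps),
                               target_img - eps)                   # stay near target in pixel space
            poison = poison.clamp(0.0, 1.0)                        # keep a valid image
    return poison
```

The paper applies this kind of procedure over batches of source and target images; the sketch handles a single source-target pair for clarity.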
Experimental Results
The paper reports experiments on multiple datasets, including ImageNet and CIFAR10, with consistent results demonstrating both the effectiveness and the subtlety of the proposed attack. A fine-tuned model retains high accuracy on clean images but degrades sharply when shown patched images containing the hidden trigger: validation accuracy on patched images drops dramatically, sometimes to as low as 40%, while accuracy on clean images stays above 98%.
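One illustrative way to obtain these two numbers is to evaluate the fine-tuned model twice: once on the clean validation set and once on source-class validation images with the trigger pasted on. The helper below is a hedged sketch, not the paper's evaluation code; `model`, `clean_val_loader`, and `patched_val_loader` are assumed names.

```python
import torch

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Top-1 accuracy of `model` over a data loader."""
    model.eval().to(device)
    correct, total = 0, 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.numel()
    return correct / total

# clean_acc   = evaluate(model, clean_val_loader)    # expected to stay high
# patched_acc = evaluate(model, patched_val_loader)  # drops sharply if the backdoor fires
```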
Implications and Future Work
The implications of this research are significant wherever neural networks are deployed in sensitive or safety-critical environments. The proposed attack challenges existing defense mechanisms, which rely on visible triggers or mislabeled data for detection.
The paper concludes with a call for advancement in defense strategies capable of countering such sophisticated attack models without compromising the model’s integrity on clean data. The authors suggest further exploration into refined detection techniques that could identify these subtle alterations in data distribution and protect against such insidious adversarial strategies.
Conclusion
The paper makes a compelling case for the potential vulnerabilities that hidden trigger backdoor attacks introduce in deep learning systems. By advancing the state of knowledge in adversarial attacks, this research emphasizes the need for robust, nuanced defenses that can counteract this new wave of backdoor threats, safeguarding machine learning models against sophisticated adversarial exploitations.