Dynamic Backdoor Attacks Against Machine Learning Models
Machine Learning (ML) models, particularly Deep Neural Networks (DNNs), have become integral to critical applications such as image classification. However, these models are susceptible to security threats, notably backdoor attacks. This paper explores a class of backdoor attacks, termed "Dynamic Backdoor Attacks," which evade existing detection mechanisms by employing non-static triggers that vary in pattern and location within the input space of ML models.
Contributions
The research introduces three dynamic backdoor techniques: Random Backdoor, Backdoor Generating Network (BaN), and Conditional Backdoor Generating Network (c-BaN).
- Random Backdoor: This approach samples triggers with random patterns and places them at randomly chosen locations within the input. The randomness reduces the detectability of injected triggers, since typical defense mechanisms assume a single static pattern at a fixed location (a minimal sketch of this approach follows the list).
- Backdoor Generating Network (BaN): Inspired by generative models, BaN is a small network, trained jointly with the backdoored model, that maps random noise to trigger patterns. This generative approach produces a diverse family of triggers rather than a single fixed one, while allowing further adaptation to the attacker's objectives.
- Conditional Backdoor Generating Network (c-BaN): As an extension of BaN, c-BaN conditions the generator on the desired target label, producing label-specific triggers. This lets a single backdoored model serve multiple target labels with distinct triggers, expanding adversarial flexibility and making detection harder (a combined BaN/c-BaN generator sketch follows the list, after the Random Backdoor sketch).
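To make the Random Backdoor idea concrete, the following is a minimal sketch in PyTorch: a fresh trigger pattern is sampled for every poisoned sample and stamped at one of several attacker-chosen locations before the sample is relabeled to the target class. The trigger size, candidate locations, and image shape are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def random_backdoor(image: torch.Tensor,
                    target_label: int,
                    trigger_size: int = 6,
                    locations=((0, 0), (0, 22), (22, 0), (22, 22))):
    """Return a poisoned copy of `image` (C x H x W) and the attacker's target label."""
    c, h, w = image.shape
    # Sample a fresh random trigger pattern for this particular sample.
    trigger = torch.rand(c, trigger_size, trigger_size)
    # Pick one of the attacker-defined candidate locations at random.
    y, x = locations[torch.randint(len(locations), (1,)).item()]
    poisoned = image.clone()
    poisoned[:, y:y + trigger_size, x:x + trigger_size] = trigger
    return poisoned, target_label

# Example: poison an MNIST-like 1x28x28 input so it will be labeled as class 0.
image = torch.rand(1, 28, 28)
poisoned_image, label = random_backdoor(image, target_label=0)
```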
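The BaN and c-BaN techniques replace the random sampling above with a learned generator. The sketch below shows one plausible shape of such a generator: it maps random noise (and, in the conditional c-BaN variant, a one-hot target label) to a trigger. The layer sizes, trigger dimensions, and the concatenation of noise and label are assumptions for illustration; in the paper the generator is trained jointly with the backdoored classifier.

```python
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Toy BaN/c-BaN-style generator: noise (+ optional label) -> trigger."""

    def __init__(self, noise_dim=64, num_classes=10, trigger_size=6,
                 channels=1, conditional=False):
        super().__init__()
        self.conditional = conditional
        in_dim = noise_dim + (num_classes if conditional else 0)
        self.trigger_shape = (channels, trigger_size, trigger_size)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, channels * trigger_size * trigger_size),
            nn.Sigmoid(),  # keep trigger pixels in [0, 1]
        )

    def forward(self, noise, target_onehot=None):
        # c-BaN: condition the generator on the desired target label.
        z = torch.cat([noise, target_onehot], dim=1) if self.conditional else noise
        return self.net(z).view(-1, *self.trigger_shape)

# BaN-style usage: triggers vary only through the noise input.
ban = TriggerGenerator(conditional=False)
triggers = ban(torch.randn(8, 64))                # eight distinct triggers

# c-BaN-style usage: triggers additionally depend on the chosen target label.
cban = TriggerGenerator(conditional=True)
labels = torch.eye(10)[torch.randint(10, (8,))]   # one-hot target labels
label_triggers = cban(torch.randn(8, 64), labels)
```

In both cases the generated trigger would be stamped onto the input, as in the Random Backdoor sketch, and the generator's parameters updated together with the classifier so that triggered inputs map to the target label.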
Evaluation and Results
The proposed techniques were empirically evaluated on three benchmark datasets: MNIST, CelebA, and CIFAR-10. They achieved near-perfect backdoor success rates with minimal impact on the models' utility on clean data. Moreover, the attacks evaded state-of-the-art defenses such as ABS, Neural Cleanse, and STRIP, which rely on static-trigger assumptions and therefore failed to identify the backdoored models. The two evaluation metrics, backdoor success rate and clean-data utility, are sketched below.
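The following is a minimal sketch of how these two quantities can be measured; it assumes a trained `model`, a `poison` function such as `random_backdoor` above, and standard PyTorch data loaders. None of these names come from the paper's code.

```python
import torch

@torch.no_grad()
def backdoor_success_rate(model, loader, poison, target_label):
    """Fraction of triggered inputs that the model classifies as the target label."""
    hits = total = 0
    for images, _ in loader:
        poisoned = torch.stack([poison(img, target_label)[0] for img in images])
        preds = model(poisoned).argmax(dim=1)
        hits += (preds == target_label).sum().item()
        total += images.size(0)
    return hits / total

@torch.no_grad()
def clean_accuracy(model, loader):
    """Model utility: accuracy on clean, untriggered inputs."""
    correct = total = 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```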
Numerical Insights
- All three dynamic backdoor techniques achieved backdoor success rates of approximately 100% across the datasets.
- Utility loss on clean data was negligible. For instance, the Random Backdoor and BaN techniques maintained utility comparable to clean models, achieving 92% accuracy on CIFAR-10 (versus 92.4% for a non-backdoored model).
- Defenses such as Neural Cleanse flagged none of the backdoored models as anomalous, indicating that dynamic triggers significantly undermine detection strategies built around static triggers.
Implications and Future Directions
Dynamic backdoor attacks underscore substantial challenges in securing ML systems. By varying trigger patterns and placements, these attacks widen the attacker's options and resist conventional defenses. This dynamism emphasizes the need for detection methodologies that do not assume a fixed trigger and can identify such adaptive adversarial strategies.
On a theoretical level, this exploration of dynamic triggers could motivate further inquiry into adversarial ML behavior when triggers are stochastic rather than fixed. Practically, these advances call for stronger defensive techniques, for example anomaly detection that accounts for varying triggers, or adversarial training designed to anticipate and neutralize such flexible attack vectors.
The research thus broadens the understanding of backdoor vulnerabilities in ML systems, presenting clear pathways both for advancing attack strategies and for fortifying defenses in ML-driven applications.