Backdoor defense, learnability and obfuscation (2409.03077v2)
Abstract: We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the function class, in much the same way as PAC learnability. In the computationally bounded setting, we use a similar argument to show that efficient PAC learnability implies efficient defendability, but not conversely. On the other hand, we use indistinguishability obfuscation to show that the class of polynomial size circuits is not efficiently defendable. Finally, we present polynomial size decision trees as a natural example for which defense is strictly easier than learning. Thus, we identify efficient defendability as a notable intermediate concept in between efficient learnability and obfuscation.
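To make the game concrete, the sketch below plays one round of it in Python. This is a minimal illustration under our own assumptions (a sampled validity check with a disagreement budget `eps`, and a defender that flags individual inputs at evaluation time); the paper's formal definition is more precise on these points.

```python
import random

def backdoor_game(function_class, attacker, defender,
                  n_bits=16, eps=0.01, samples=1000):
    """One round of the backdoor game (illustrative interface).

    The attacker must plant a randomly chosen trigger (the key
    constraint that makes defense possible), and the defender tries
    to flag the trigger at evaluation time.
    """
    f = random.choice(function_class)      # original function
    trigger = random.getrandbits(n_bits)   # trigger is chosen at random
    f_star = attacker(f, trigger)          # backdoored function

    # Attack validity: f_star flips the trigger but agrees with f
    # almost everywhere else (checked here by sampling).
    assert f_star(trigger) != f(trigger)
    disagreements = sum(f_star(x) != f(x) for x in
                        (random.getrandbits(n_bits) for _ in range(samples)))
    assert disagreements / samples <= eps

    # Defense succeeds if the trigger is flagged and a typical input is not.
    benign = random.getrandbits(n_bits)
    return defender(f_star, trigger) and not defender(f_star, benign)

# Toy demo with a hypothetical one-function class (the all-zeros function):
# the attacker hard-codes the trigger, and the defender flags any input
# that the function maps to 1.
zero_class = [lambda x: 0]
attacker = lambda f, t: (lambda x, _t=t, _f=f: 1 if x == _t else _f(x))
defender = lambda f_star, x: f_star(x) == 1
print(backdoor_game(zero_class, attacker, defender))  # True (defense succeeds)
```

For this trivial class the defender wins easily; the paper's results concern when such a defender exists for richer classes, with the VC dimension governing the computationally unbounded case and indistinguishability obfuscation ruling out efficient defense for polynomial-size circuits.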
- B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang. On the (im)possibility of obfuscating programs. In Annual International Cryptology Conference, pages 1–18. Springer, 2001.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.
- D. Boneh and B. Waters. Constrained pseudorandom functions and their applications. In Advances in Cryptology – ASIACRYPT 2013, Part II, pages 280–300. Springer, 2013.
- E. Boyle, S. Goldwasser, and I. Ivan. Functional signatures and pseudorandom functions. In International Workshop on Public Key Cryptography (PKC), pages 501–519. Springer, 2014.
- S. Bubeck, Y. T. Lee, E. Price, and I. Razenshteyn. Adversarial examples from computational constraints. In International Conference on Machine Learning, pages 831–840. PMLR, 2019.
- J. Cohen, E. Rosenfeld, and J. Z. Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019.
- J. Dumford and W. Scheirer. Backdooring convolutional neural networks via targeted weight perturbations. In 2020 IEEE International Joint Conference on Biometrics (IJCB), pages 1–9. IEEE, 2020.
- G. K. Dziugaite and D. M. Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint arXiv:1703.11008, 2017.
- S. Garg, S. Jha, S. Mahloujifar, and M. Mahmoody. Adversarially robust learning could leverage computational hardness. In Algorithmic Learning Theory, pages 364–385. PMLR, 2020.
- C. Gentile and D. P. Helmbold. Improved lower bounds for learning from noisy examples: An information-theoretic approach. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 104–115, 1998.
- O. Goldreich and L. A. Levin. A hard-core predicate for all one-way functions. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pages 25–32, 1989.
- O. Goldreich, S. Goldwasser, and S. Micali. How to construct random functions. Journal of the ACM (JACM), 33(4):792–807, 1986.
- S. Goldwasser, M. P. Kim, V. Vaikuntanathan, and O. Zamir. Planting undetectable backdoors in machine learning models. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 931–942. IEEE, 2022.
- S. Hanneke, A. Karbasi, M. Mahmoody, I. Mehalel, and S. Moran. On optimal learning under targeted data poisoning. Advances in Neural Information Processing Systems, 35:30770–30782, 2022.
- D. Haussler, M. Kearns, and R. E. Schapire. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Machine Learning, 14:83–113, 1994a.
- D. Haussler, N. Littlestone, and M. K. Warmuth. Predicting {0, 1}-functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994b.
- S. Hong, N. Carlini, and A. Kurakin. Handcrafted backdoors in deep neural networks. Advances in Neural Information Processing Systems, 35:8068–8080, 2022.
- E. Hubinger, C. van Merwijk, V. Mikulik, J. Skalse, and S. Garrabrant. Risks from learned optimization in advanced machine learning systems. arXiv preprint arXiv:1906.01820, 2019.
- A. Jain, H. Lin, and A. Sahai. Indistinguishability obfuscation from well-founded assumptions. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 60–73, 2021.
- J. Jia, X. Cao, and N. Z. Gong. Intrinsic certified robustness of bagging against data poisoning attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 7961–7969, 2021.
- A. T. Kalai and S.-H. Teng. Decision trees are PAC-learnable from most product distributions: a smoothed analysis. arXiv preprint arXiv:0812.0933, 2008.
- J. Katz and Y. Lindell. Introduction to modern cryptography: principles and protocols. Chapman and Hall/CRC, 2007.
- M. Kearns and L. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM (JACM), 41(1):67–95, 1994.
- M. J. Kearns and U. Vazirani. An introduction to computational learning theory. MIT Press, 1994.
- A. Khaddaj, G. Leclerc, A. Makelov, K. Georgiev, H. Salman, A. Ilyas, and A. Madry. Rethinking backdoor attacks. In International Conference on Machine Learning, pages 16216–16236. PMLR, 2023.
- A. Kiayias, S. Papadopoulos, N. Triandopoulos, and T. Zacharias. Delegatable pseudorandom functions and applications. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pages 669–684, 2013.
- A. Levine and S. Feizi. Deep partition aggregation: Provable defense against general poisoning attacks. arXiv preprint arXiv:2006.14768, 2020.
- Y. Li, Y. Jiang, Z. Li, and S.-T. Xia. Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
- R. O’Donnell. Analysis of Boolean functions. arXiv preprint arXiv:2105.10386, 2021.
- C. Olah. Mechanistic interpretability, variables, and the importance of interpretable bases. Transformer Circuits Thread, 2022. URL https://www.transformer-circuits.pub/2022/mech-interp-essay.
- A. Sahai and B. Waters. How to use indistinguishability obfuscation: deniable encryption, and more. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 475–484, 2014.
- L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.