
Backdoor defense, learnability and obfuscation (2409.03077v2)

Published 4 Sep 2024 in cs.LG, cs.AI, and cs.CR

Abstract: We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the function class, in much the same way as PAC learnability. In the computationally bounded setting, we use a similar argument to show that efficient PAC learnability implies efficient defendability, but not conversely. On the other hand, we use indistinguishability obfuscation to show that the class of polynomial size circuits is not efficiently defendable. Finally, we present polynomial size decision trees as a natural example for which defense is strictly easier than learning. Thus, we identify efficient defendability as a notable intermediate concept in between efficient learnability and obfuscation.

References (32)
  1. On the (im)possibility of obfuscating programs. In Annual International Cryptology Conference, pages 1–18. Springer, 2001.
  2. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.
  3. D. Boneh and B. Waters. Constrained pseudorandom functions and their applications. In Advances in Cryptology-ASIACRYPT 2013: 19th International Conference on the Theory and Application of Cryptology and Information Security, Bengaluru, India, December 1-5, 2013, Proceedings, Part II 19, pages 280–300. Springer, 2013.
  4. Functional signatures and pseudorandom functions. In International Workshop on Public Key Cryptography, pages 501–519. Springer, 2014.
  5. Adversarial examples from computational constraints. In International Conference on Machine Learning, pages 831–840. PMLR, 2019.
  6. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019.
  7. J. Dumford and W. Scheirer. Backdooring convolutional neural networks via targeted weight perturbations. In 2020 IEEE International Joint Conference on Biometrics (IJCB), pages 1–9. IEEE, 2020.
  8. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint arXiv:1703.11008, 2017.
  9. Adversarially robust learning could leverage computational hardness. In Algorithmic Learning Theory, pages 364–385. PMLR, 2020.
  10. C. Gentile and D. P. Helmbold. Improved lower bounds for learning from noisy examples: An information-theoretic approach. In Proceedings of the eleventh annual conference on Computational learning theory, pages 104–115, 1998.
  11. O. Goldreich and L. A. Levin. A hard-core predicate for all one-way functions. In Proceedings of the twenty-first annual ACM symposium on Theory of computing, pages 25–32, 1989.
  12. How to construct random functions. Journal of the ACM (JACM), 33(4):792–807, 1986.
  13. Planting undetectable backdoors in machine learning models. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 931–942. IEEE, 2022.
  14. On optimal learning under targeted data poisoning. Advances in Neural Information Processing Systems, 35:30770–30782, 2022.
  15. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Machine Learning, 14:83–113, 1994.
  16. Predicting {0, 1}-functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994.
  17. Handcrafted backdoors in deep neural networks. Advances in Neural Information Processing Systems, 35:8068–8080, 2022.
  18. Risks from learned optimization in advanced machine learning systems. arXiv preprint arXiv:1906.01820, 2019.
  19. Indistinguishability obfuscation from well-founded assumptions. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 60–73, 2021.
  20. Intrinsic certified robustness of bagging against data poisoning attacks. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 7961–7969, 2021.
  21. Decision trees are PAC-learnable from most product distributions: a smoothed analysis. arXiv preprint arXiv:0812.0933, 2008.
  22. J. Katz and Y. Lindell. Introduction to Modern Cryptography: Principles and Protocols. Chapman and Hall/CRC, 2007.
  23. M. Kearns and L. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM (JACM), 41(1):67–95, 1994.
  24. M. J. Kearns and U. Vazirani. An introduction to computational learning theory. MIT press, 1994.
  25. Rethinking backdoor attacks. In International Conference on Machine Learning, pages 16216–16236. PMLR, 2023.
  26. Delegatable pseudorandom functions and applications. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 669–684, 2013.
  27. A. Levine and S. Feizi. Deep partition aggregation: Provable defense against general poisoning attacks. arXiv preprint arXiv:2006.14768, 2020.
  28. Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
  29. R. O’Donnell. Analysis of Boolean functions. arXiv preprint arXiv:2105.10386, 2021.
  30. C. Olah. Mechanistic interpretability, variables, and the importance of interpretable bases. Transformer Circuits Thread, 2022. URL https://www.transformer-circuits.pub/2022/mech-interp-essay.
  31. A. Sahai and B. Waters. How to use indistinguishability obfuscation: deniable encryption, and more. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 475–484, 2014.
  32. L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

Summary

  • The paper defines ε-defendability via a game between an attacker and a defender, and shows that in the computationally unbounded setting statistical defense is possible when ε = o(1/VC(F)).
  • Under computational constraints, efficient PAC learnability implies efficient defendability but not conversely; using indistinguishability obfuscation, the class of polynomial-size circuits is shown not to be efficiently defendable.
  • Polynomial-size decision trees can be defended efficiently under the uniform input distribution, giving a natural case where defense is strictly easier than learning.

Overview of "Backdoor Defense, Learnability, and Obfuscation"

The paper "Backdoor Defense, Learnability, and Obfuscation" by Christiano, Hilton, Lecomte, and Xu introduces a formal framework to define defendability against backdoor attacks in machine learning models. This framework is based on a game between an attacker and a defender, wherein the attacker modifies a model to behave abnormally on specific inputs known as "triggers," while the defender attempts to detect these triggers during evaluation. Defendability is achieved if the defender can identify these triggers with high probability under certain constraints placed on the attacker's strategy.

Core Contributions

The main contributions of the paper can be summarized as follows:

  1. Definition of Defendability: The authors propose a game-theoretic definition of ε-defendability for a class of functions. In this game, the attacker's backdoor strategy must work for a randomly chosen trigger, which rules out attacks in which a hand-picked trigger would be impossible to detect.
  2. Statistical Defendability: The paper demonstrates that in the computationally unbounded setting, a class of functions is ε-defendable if and only if ε = o(1/VC(F)), where VC(F) is the Vapnik-Chervonenkis dimension of the function class. This result aligns defendability with learnability in the PAC framework.
  3. Computational Defendability: When considering polynomial-time constraints, the authors show that efficient PAC learnability implies efficient defendability, but not necessarily the converse. They further demonstrate that under cryptographic assumptions, certain classes like polynomial-size circuits are not efficiently defendable due to the possibility of obfuscation.
  4. Defendability of Decision Trees: For the special case of polynomial-size decision trees, the paper shows that they can be defended efficiently under the uniform input distribution, and in less time than is required to learn them. This gives a natural example where defense is strictly easier than learning.

In-Depth Analysis

Definition and Game Structure

The game's structure is simple yet robust, involving:

  • An attacker who chooses the function and distribution.
  • A randomly sampled backdoor trigger.
  • A defender who, at evaluation time, attempts to determine whether a given input is the trigger, based on the modified function's behavior.

Randomizing the trigger rules out trivial attacks and keeps the defender's task meaningful: the attacker cannot simply pick an input that is inherently impossible to distinguish from clean ones. The definition is also notable for not explicitly involving the learning process, yet it connects closely to learnability.
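
To make the order of moves concrete, the following minimal Python sketch plays out the game end to end. The class names, the parity-based clean function, the hard-coded trigger exception, and the do-nothing placeholder defender are all illustrative assumptions of this summary rather than constructions from the paper, and the input distribution is fixed to uniform for simplicity instead of being chosen by the attacker.

```python
import random

class ToyAttacker:
    """Illustrative attacker: the clean function is a parity of the first half
    of the bits; the backdoor is a hard-coded exception at the sampled trigger."""

    def choose_clean_function(self, n_bits):
        k = n_bits // 2
        return lambda x: sum(x[:k]) % 2  # parity of the first k bits

    def plant_backdoor(self, clean_f, trigger):
        # Flip the label on the trigger, agree with clean_f everywhere else.
        return lambda x: 1 - clean_f(x) if x == trigger else clean_f(x)

class DoNothingDefender:
    """Placeholder defender that never flags anything, just to show the interface."""

    def is_trigger(self, backdoored_f, x):
        return False

def run_game(attacker, defender, n_bits=16, n_clean_checks=1000):
    clean_f = attacker.choose_clean_function(n_bits)
    # The key constraint: the trigger is chosen at random, not by the attacker.
    trigger = tuple(random.randint(0, 1) for _ in range(n_bits))
    backdoored_f = attacker.plant_backdoor(clean_f, trigger)
    assert backdoored_f(trigger) != clean_f(trigger)  # the backdoor must fire

    detected = defender.is_trigger(backdoored_f, trigger)
    false_alarm_rate = sum(
        defender.is_trigger(backdoored_f, tuple(random.randint(0, 1) for _ in range(n_bits)))
        for _ in range(n_clean_checks)
    ) / n_clean_checks
    return detected, false_alarm_rate

if __name__ == "__main__":
    print(run_game(ToyAttacker(), DoNothingDefender()))
```

Roughly speaking, the defender should flag the trigger with high probability without flagging too many clean inputs; the paper's formal ε-defendability definition pins down the exact quantifiers.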

Results in Statistical Defendability

Using tools from VC theory and the voting algorithm of Hanneke et al. (2022), the authors show a direct connection between a function class's VC dimension and its defendability in the absence of computational constraints. This mirrors PAC learning, where a higher VC dimension means more capacity to fit data but also a larger sample complexity; analogously, the richer the function class, the harder it is to distinguish a backdoor trigger from ordinary inputs.
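
As a rough illustration of why a small VC dimension helps, here is a toy voting-style check over a tiny hypothesis class. Both the "dictator" class and the specific procedure are stand-ins chosen for this summary; they are inspired by the voting idea but are not the algorithm of Hanneke et al. (2022) or the paper's proof.

```python
import random

def dictator_class(n_bits):
    """A small hypothesis class: h_i(x) = x[i]. Its VC dimension is low
    (at most log2(n_bits)), so a few random samples pin the function down."""
    return [lambda x, i=i: x[i] for i in range(n_bits)]

def voting_defender(f_star, hypotheses, x, n_samples=200, n_bits=16):
    """Flag x as a suspected trigger if f_star(x) disagrees with the majority
    vote of the hypotheses consistent with f_star on random samples."""
    samples = [tuple(random.randint(0, 1) for _ in range(n_bits)) for _ in range(n_samples)]
    consistent = [h for h in hypotheses if all(h(s) == f_star(s) for s in samples)]
    if not consistent:
        return False  # f_star does not look like a member of the class at all
    votes = sum(h(x) for h in consistent)
    majority = int(votes * 2 >= len(consistent))
    return f_star(x) != majority

if __name__ == "__main__":
    n_bits = 16
    hypotheses = dictator_class(n_bits)
    clean = hypotheses[3]                      # the attacker starts from h_3
    trigger = tuple(random.randint(0, 1) for _ in range(n_bits))
    f_star = lambda x: 1 - clean(x) if x == trigger else clean(x)

    print("trigger flagged:", voting_defender(f_star, hypotheses, trigger, n_bits=n_bits))
    clean_input = tuple(random.randint(0, 1) for _ in range(n_bits))
    print("clean input flagged:", voting_defender(f_star, hypotheses, clean_input, n_bits=n_bits))
```

Because the class is small, a modest number of random samples pins down which member the presented function resembles, so an input on which the function deviates from the surviving hypotheses' vote stands out as a likely trigger; richer classes (higher VC dimension) leave more room for such deviations to hide.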

Computational Constraints and Obfuscation

When computational limits are introduced, the authors employ cryptographic constructions, namely indistinguishability obfuscation and puncturable pseudorandom functions, to show that the class of polynomial-size circuits is not efficiently defendable: obfuscation allows the attacker to plant a backdoor that no polynomial-time defender can detect. This result is significant because it marks out a boundary beyond which computationally feasible defense strategies cannot exist, even though efficient PAC learnability still implies efficient defendability for classes that admit it.
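
The paper's negative result uses indistinguishability obfuscation and puncturable pseudorandom functions, which cannot be reproduced in a few lines. Purely as a loose analogy, and not as the paper's construction, the sketch below hides the trigger behind a hash commitment: the backdoored function's code contains only a digest of the trigger, so inspecting the code does not reveal where the backdoor fires.

```python
import hashlib
import os

def make_backdoored_function(clean_f, trigger_bytes):
    """Return a function that flips the label exactly when the hashed input
    matches a stored digest. Only the digest appears in the returned closure,
    loosely mimicking how obfuscation hides the trigger inside the circuit."""
    digest = hashlib.sha256(trigger_bytes).hexdigest()

    def f_star(x_bytes):
        if hashlib.sha256(x_bytes).hexdigest() == digest:
            return 1 - clean_f(x_bytes)
        return clean_f(x_bytes)

    return f_star, digest  # the defender can see the digest, not the trigger

if __name__ == "__main__":
    clean = lambda x: x[0] & 1               # a toy clean function on bytes
    trigger = os.urandom(16)                 # stands in for the random trigger
    f_star, visible_digest = make_backdoored_function(clean, trigger)

    print(f_star(trigger) != clean(trigger))     # True: the backdoor fires
    probe = os.urandom(16)
    print(f_star(probe) == clean(probe))         # True with overwhelming probability
```

A hash-coded exception like this is still easy to spot by reading the code for the comparison; the cryptographic machinery in the paper is what rules out detection by any polynomial-time defender.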

Practical Example: Decision Trees

For polynomial-size decision trees, the paper provides a concrete example of efficient defendability under the uniform input distribution. The result is notable because defense is provably easier than learning in this setting, showing that simple, low-complexity defense mechanisms can succeed. It also opens avenues to explore similar strategies for other function classes, potentially leading to more robust defenses in real-world machine learning applications.
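
The exact decision-tree defense is not reproduced here. As a simplified heuristic in the same spirit, and an assumption of this summary rather than the paper's procedure, one can exploit the fact that under the uniform distribution a leaf at depth d covers a 2^-d fraction of inputs: a backdoor that changes behavior on only a tiny fraction of inputs must hide in a deep, low-probability leaf, so inputs reaching unusually deep leaves can be flagged.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A binary decision tree over bit-string inputs."""
    bit: Optional[int] = None          # index queried at an internal node
    left: Optional["Node"] = None      # subtree for bit value 0
    right: Optional["Node"] = None     # subtree for bit value 1
    label: Optional[int] = None        # output at a leaf

def leaf_depth(tree: Node, x) -> int:
    """Return the depth of the leaf that input x reaches."""
    depth = 0
    node = tree
    while node.label is None:
        node = node.right if x[node.bit] else node.left
        depth += 1
    return depth

def flag_suspicious(tree: Node, x, max_depth: int) -> bool:
    """Heuristic defense: under the uniform distribution, a leaf at depth d
    covers a 2**-d fraction of inputs, so inputs reaching very deep leaves
    lie in low-probability regions where a backdoor could hide."""
    return leaf_depth(tree, x) > max_depth

if __name__ == "__main__":
    # Clean stump on bit 0, with a deep branch grafted onto one side.
    deep = Node(bit=1, left=Node(label=0),
                right=Node(bit=2, left=Node(label=0), right=Node(label=1)))
    tree = Node(bit=0, left=Node(label=0), right=deep)

    print(flag_suspicious(tree, (1, 1, 1), max_depth=2))  # True: depth-3 leaf
    print(flag_suspicious(tree, (0, 0, 0), max_depth=2))  # False: depth-1 leaf
```

Note that this check walks a single root-to-leaf path per input and never needs to learn the tree, which gives a flavor of why defense can be cheaper than learning in this setting.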

Implications and Future Work

The implications of this research extend into the broader themes of AI safety and alignment. By formalizing backdoor detection and relating it to well-studied notions from learning theory, this work lays the groundwork for models that can be both learned and defended efficiently. However, the paper also flags the limitation posed by obfuscation, underscoring the need for further research into detecting backdoors even in highly obfuscated models.

Future directions could involve:

  • Identifying more natural classes where defense is easier than learning.
  • Developing mechanistic defenses that exploit the model's internal structure.
  • Extending the theoretical framework to settings where the defender has partial knowledge of the attack generation process.

Conclusion

This paper significantly advances our understanding of backdoor defenses in machine learning, providing a rigorous theoretical foundation and practical insights. It bridges the gap between learning theory and defensive strategies, setting a clear path for future research to build on these results and develop robust, real-world defenses against backdoor attacks in AI systems.
