Backdoor defense, learnability and obfuscation (2409.03077v2)
Abstract: We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the function class, in much the same way as PAC learnability. In the computationally bounded setting, we use a similar argument to show that efficient PAC learnability implies efficient defendability, but not conversely. On the other hand, we use indistinguishability obfuscation to show that the class of polynomial size circuits is not efficiently defendable. Finally, we present polynomial size decision trees as a natural example for which defense is strictly easier than learning. Thus, we identify efficient defendability as a notable intermediate concept in between efficient learnability and obfuscation.
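To make the game concrete, the sketch below plays one round of it in Python. This is a minimal illustration under our own assumptions (a sampled validity check with a disagreement budget `eps`, and a defender that flags individual inputs at evaluation time); the paper's formal definition is more precise on these points.

```python
import random

def backdoor_game(function_class, attacker, defender,
                  n_bits=16, eps=0.01, samples=1000):
    """One round of the backdoor game (illustrative interface).

    The attacker must plant a randomly chosen trigger (the key
    constraint that makes defense possible), and the defender tries
    to flag the trigger at evaluation time.
    """
    f = random.choice(function_class)      # original function
    trigger = random.getrandbits(n_bits)   # trigger is chosen at random
    f_star = attacker(f, trigger)          # backdoored function

    # Attack validity: f_star flips the trigger but agrees with f
    # almost everywhere else (checked here by sampling).
    assert f_star(trigger) != f(trigger)
    disagreements = sum(f_star(x) != f(x) for x in
                        (random.getrandbits(n_bits) for _ in range(samples)))
    assert disagreements / samples <= eps

    # Defense succeeds if the trigger is flagged and a typical input is not.
    benign = random.getrandbits(n_bits)
    return defender(f_star, trigger) and not defender(f_star, benign)

# Toy demo with a hypothetical one-function class (the all-zeros function):
# the attacker hard-codes the trigger, and the defender flags any input
# that the function maps to 1.
zero_class = [lambda x: 0]
attacker = lambda f, t: (lambda x, _t=t, _f=f: 1 if x == _t else _f(x))
defender = lambda f_star, x: f_star(x) == 1
print(backdoor_game(zero_class, attacker, defender))  # True (defense succeeds)
```

For this trivial class the defender wins easily; the paper's results concern when such a defender exists for richer classes, with the VC dimension governing the computationally unbounded case and indistinguishability obfuscation ruling out efficient defense for polynomial-size circuits.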
- B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang. On the (im)possibility of obfuscating programs. In Annual International Cryptology Conference, pages 1–18. Springer, 2001.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.
- D. Boneh and B. Waters. Constrained pseudorandom functions and their applications. In Advances in Cryptology – ASIACRYPT 2013, Part II, pages 280–300. Springer, 2013.
- E. Boyle, S. Goldwasser, and I. Ivan. Functional signatures and pseudorandom functions. In International Workshop on Public Key Cryptography (PKC), pages 501–519. Springer, 2014.
- S. Bubeck, Y. T. Lee, E. Price, and I. Razenshteyn. Adversarial examples from computational constraints. In International Conference on Machine Learning, pages 831–840. PMLR, 2019.
- J. Cohen, E. Rosenfeld, and J. Z. Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019.
- J. Dumford and W. Scheirer. Backdooring convolutional neural networks via targeted weight perturbations. In 2020 IEEE International Joint Conference on Biometrics (IJCB), pages 1–9. IEEE, 2020.
- G. K. Dziugaite and D. M. Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint arXiv:1703.11008, 2017.
- S. Garg, S. Jha, S. Mahloujifar, and M. Mahmoody. Adversarially robust learning could leverage computational hardness. In Algorithmic Learning Theory, pages 364–385. PMLR, 2020.
- C. Gentile and D. P. Helmbold. Improved lower bounds for learning from noisy examples: An information-theoretic approach. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 104–115, 1998.
- O. Goldreich and L. A. Levin. A hard-core predicate for all one-way functions. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pages 25–32, 1989.
- O. Goldreich, S. Goldwasser, and S. Micali. How to construct random functions. Journal of the ACM (JACM), 33(4):792–807, 1986.
- S. Goldwasser, M. P. Kim, V. Vaikuntanathan, and O. Zamir. Planting undetectable backdoors in machine learning models. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 931–942. IEEE, 2022.
- S. Hanneke, A. Karbasi, M. Mahmoody, I. Mehalel, and S. Moran. On optimal learning under targeted data poisoning. Advances in Neural Information Processing Systems, 35:30770–30782, 2022.
- D. Haussler, M. Kearns, and R. E. Schapire. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Machine Learning, 14:83–113, 1994a.
- D. Haussler, N. Littlestone, and M. K. Warmuth. Predicting {0, 1}-functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994b.
- S. Hong, N. Carlini, and A. Kurakin. Handcrafted backdoors in deep neural networks. Advances in Neural Information Processing Systems, 35:8068–8080, 2022.
- E. Hubinger, C. van Merwijk, V. Mikulik, J. Skalse, and S. Garrabrant. Risks from learned optimization in advanced machine learning systems. arXiv preprint arXiv:1906.01820, 2019.
- A. Jain, H. Lin, and A. Sahai. Indistinguishability obfuscation from well-founded assumptions. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 60–73, 2021.
- J. Jia, X. Cao, and N. Z. Gong. Intrinsic certified robustness of bagging against data poisoning attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 7961–7969, 2021.
- A. T. Kalai and S.-H. Teng. Decision trees are PAC-learnable from most product distributions: a smoothed analysis. arXiv preprint arXiv:0812.0933, 2008.
- J. Katz and Y. Lindell. Introduction to modern cryptography: principles and protocols. Chapman and Hall/CRC, 2007.
- M. Kearns and L. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM (JACM), 41(1):67–95, 1994.
- M. J. Kearns and U. Vazirani. An introduction to computational learning theory. MIT Press, 1994.
- A. Khaddaj, G. Leclerc, A. Makelov, K. Georgiev, H. Salman, A. Ilyas, and A. Madry. Rethinking backdoor attacks. In International Conference on Machine Learning, pages 16216–16236. PMLR, 2023.
- A. Kiayias, S. Papadopoulos, N. Triandopoulos, and T. Zacharias. Delegatable pseudorandom functions and applications. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pages 669–684, 2013.
- A. Levine and S. Feizi. Deep partition aggregation: Provable defense against general poisoning attacks. arXiv preprint arXiv:2006.14768, 2020.
- Y. Li, Y. Jiang, Z. Li, and S.-T. Xia. Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
- R. O’Donnell. Analysis of Boolean functions. arXiv preprint arXiv:2105.10386, 2021.
- C. Olah. Mechanistic interpretability, variables, and the importance of interpretable bases. Transformer Circuits Thread, 2022. URL https://www.transformer-circuits.pub/2022/mech-interp-essay.
- A. Sahai and B. Waters. How to use indistinguishability obfuscation: deniable encryption, and more. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 475–484, 2014.
- L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.