Data Reconstruction: When You See It and When You Don't (2405.15753v2)
Abstract: We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While often clear from context, our exploration reveals that a precise definition is far more nuanced than it appears, to the point that a single all-encompassing definition may not exist. We therefore adopt a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementary questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically:

- We introduce a new definitional paradigm, Narcissus resiliency, to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side effect, we show that Narcissus resiliency captures, as special cases, multiple well-studied concepts, including differential privacy as well as security notions for one-way functions and encryption schemes.
- We formulate a link between reconstruction attacks and Kolmogorov complexity, which allows us to put forward a criterion for evaluating when such attacks are convincingly successful.