Practical, Private Assurance of the Value of Collaboration via Fully Homomorphic Encryption (2310.02563v3)
Abstract: Two parties wish to collaborate on their datasets. However, before revealing their datasets to each other, the parties want a guarantee that the collaboration will be fruitful. We look at this problem from the point of view of machine learning, where one party is promised an improvement to its prediction model by incorporating data from the other party. The parties wish to collaborate further only if the updated model shows an improvement in accuracy, and until this is ascertained, neither party wants to disclose its model or dataset. In this work, we construct an interactive protocol for this problem based on the fully homomorphic encryption scheme over the torus (TFHE) and label differential privacy, where the underlying machine learning model is a neural network. Label differential privacy is used so that the computation need not be carried out entirely in the encrypted domain, since fully encrypted neural network training remains a significant bottleneck in current state-of-the-art FHE implementations. We formally prove the security of our scheme assuming honest-but-curious parties, where one party may nevertheless lack the expertise to label its initial dataset. Experiments show that we can obtain the output, i.e., the accuracy of the updated model, many orders of magnitude faster than with a protocol run entirely under FHE.
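The abstract does not spell out which label differential privacy mechanism is used, but the standard way to privatise labels before releasing them outside the encrypted domain is k-ary randomized response. The sketch below is a minimal illustration of that idea, assuming integer class labels; the function name, parameters, and the use of NumPy are illustrative choices, not the paper's implementation.

```python
import numpy as np

def randomized_response_labels(labels, num_classes, epsilon, rng=None):
    """Perturb class labels with k-ary randomized response.

    Keeping the true label with probability e^eps / (e^eps + k - 1) and
    otherwise replacing it with a uniformly random *other* label satisfies
    epsilon-label differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    keep = rng.random(labels.shape) < p_keep
    # Draw a replacement uniformly from the remaining k - 1 labels
    # by adding a nonzero offset modulo the number of classes.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    flipped = (labels + offsets) % num_classes
    return np.where(keep, labels, flipped)

# Example: privatise binary labels before sharing them in the clear,
# while the features stay protected by the encrypted computation.
noisy = randomized_response_labels([0, 1, 1, 0, 1], num_classes=2, epsilon=1.0)
```

Only the labels are perturbed here; the point of combining label DP with TFHE, as the abstract describes, is that the label-dependent part of training can then run in the clear while the rest of the data remains encrypted.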