Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs (2311.15310v1)
Abstract: Organizations increasingly recognize the value of data collaboration for analytics, yet stringent data protection laws prohibit the direct exchange of raw data. Federated learning (FL) has emerged as a viable solution: it enables multiple clients to collaboratively train an ML model under the supervision of a central server while keeping their raw data confidential. However, existing studies have revealed two main risks: (i) the server may infer sensitive information from a client's uploaded updates (i.e., model gradients), compromising client input privacy, and (ii) malicious clients may upload malformed updates to poison the FL model, compromising input integrity. Recent works combine secure aggregation with zero-knowledge proofs (ZKPs) to guarantee input privacy and integrity in FL. Nevertheless, they suffer from extremely low efficiency and are thus impractical for real deployment. In this paper, we propose RiseFL, a novel and highly efficient solution for secure and verifiable data collaboration that ensures input privacy and integrity simultaneously. First, we devise a probabilistic integrity check method that significantly reduces the cost of ZKP generation and verification. Second, we design a hybrid commitment scheme to satisfy Byzantine robustness with improved performance. Third, we theoretically prove the security guarantee of the proposed solution. Extensive experiments on synthetic and real-world datasets suggest that our solution is effective and highly efficient in both client computation and communication. For instance, in client computation RiseFL is up to 28x, 53x, and 164x faster than three state-of-the-art baselines, ACORN, RoFL, and EIFFeL, respectively.
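The abstract does not spell out how the probabilistic integrity check is constructed. As a rough illustration of why a probabilistic check can be cheaper than a coordinate-wise one, the sketch below tests an L2-norm bound on an update using only k random Gaussian projections. It is a plaintext toy under stated assumptions, not the RiseFL protocol: there are no commitments or zero-knowledge proofs, and the function name, the parameters `k` and `slack`, and the threshold rule are all hypothetical choices made for this example.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def probabilistic_norm_check(update, bound, k=64, slack=6.0, rng=None):
    """Accept `update` if k random projections are consistent with
    ||update||_2 <= bound.

    For a standard Gaussian vector a, <a, update> is distributed as
    N(0, ||update||^2), so the sum of k squared projections divided by
    ||update||^2 follows a chi-square distribution with k degrees of
    freedom (mean k, variance 2k).
    """
    if rng is None:
        rng = random.Random()
    stat = 0.0
    for _ in range(k):
        proj = sum(rng.gauss(0.0, 1.0) * x for x in update)
        stat += proj * proj
    # Crude threshold (an assumption of this sketch): mean plus `slack`
    # standard deviations of the chi-square statistic, scaled by bound^2.
    threshold = bound * bound * (k + slack * math.sqrt(2 * k))
    return stat <= threshold


rng = random.Random(0)
d = 1000          # update dimension (hypothetical)
bound = 1.0       # allowed L2 norm (hypothetical)

# Honest update: rescaled so that its norm equals the bound.
honest = [rng.gauss(0.0, 1.0) for _ in range(d)]
scale = bound / l2_norm(honest)
honest = [x * scale for x in honest]

# Poisoned update: same direction, ten times the allowed norm.
poisoned = [10.0 * x for x in honest]

honest_ok = probabilistic_norm_check(honest, bound, rng=rng)
poisoned_ok = probabilistic_norm_check(poisoned, bound, rng=rng)
print("honest accepted:", honest_ok)
print("poisoned accepted:", poisoned_ok)
```

The check touches the update through k inner products rather than d per-coordinate tests, which is the kind of saving a probabilistic approach can offer when k is much smaller than d. The price is a small error probability in both directions; a more careful version would set the threshold from a chi-square quantile instead of the mean-plus-deviations rule used here.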
- 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). (2016).
- Sercan Ö Arik and Tomas Pfister. 2021. TabNet: Attentive Interpretable Tabular Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 6679–6687.
- How To Backdoor Federated Learning. In AISTATS. 2938–2948.
- Skellam Mixture Mechanism: a Novel Approach to Federated Learning with Differential Privacy. Proc. VLDB Endow. 15, 11 (2022), 2348–2360.
- ExDRa: Exploratory Data Science on Federated Raw Data. In SIGMOD. ACM, 2450–2463.
- ACORN: Input Validation for Secure Aggregation. In USENIX Security 23. 4805–4822.
- Secure Single-Server Aggregation with (Poly)Logarithmic Overhead. In CCS. 1253–1269.
- SNARKs for C: Verifying program executions succinctly and in zero knowledge. In Advances in Cryptology–CRYPTO 2013: 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2013. Proceedings, Part II. Springer, 90–108.
- Analyzing Federated Learning through an Adversarial Lens. In ICML. 634–643.
- Jock Blackard. 1998. Covertype. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C50K5N.
- Non-Interactive Zero-Knowledge and Its Applications (Extended Abstract). In STOC. 103–112.
- Practical secure aggregation for privacy-preserving machine learning. In CCS. 1175–1191.
- Bulletproofs: Short proofs for confidential transactions and more. In S&P. IEEE, 315–334.
- Jan Camenisch and Markus Stadler. 1997. Proof systems for general statements about discrete logarithms. Technical Report/ETH Zurich, Department of Computer Science 260 (1997).
- FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In NDSS.
- Henry Corrigan-Gibbs and Dan Boneh. 2017. Prio: Private, Robust, and Scalable Computation of Aggregate Statistics. In NSDI. 259–282.
- Asynchronous Byzantine machine learning (the case of SGD). In ICML. PMLR, 1145–1154.
- Ivan Damgård and Mads Jurik. 2001. A Generalisation, a Simplification and Some Applications of Paillier’s Probabilistic Public-Key System. In Public Key Cryptography. 119–136.
- Improving Fairness for Data Valuation in Horizontal Federated Learning. In ICDE. 2440–2453.
- Paul Feldman. 1987. A practical scheme for non-interactive verifiable secret sharing. In FOCS. IEEE, 427–438.
- VF²Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning. In SIGMOD, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 563–576.
- BlindFL: Vertical Federated Machine Learning without Peeking into Your Data. In SIGMOD. 1316–1330.
- FEAST: A Communication-efficient Federated Feature Selection Framework for Relational Data. Proc. ACM Manag. Data 1, 1 (2023), 107:1–107:28.
- Mitigating Sybils in Federated Learning Poisoning. CoRR abs/1808.04866 (2018).
- Taher El Gamal. 1985. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 31, 4 (1985), 469–472.
- Quadratic span programs and succinct NIZKs without PCPs. In Advances in Cryptology–EUROCRYPT 2013: 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, May 26-30, 2013. Proceedings 32. Springer, 626–645.
- Oded Goldreich. 2001. The Foundations of Cryptography - Volume 1: Basic Techniques. Cambridge University Press.
- Jamie Hayes and Olga Ohrimenko. 2018. Contamination Attacks and Mitigation in Multi-Party Machine Learning. In NeurIPS. 6604–6616.
- Deep Residual Learning for Image Recognition. In CVPR. 770–778.
- The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation. In ICML. 5201–5212.
- Marcel Keller. 2020. MP-SPDZ: A Versatile Framework for Multi-Party Computation. In CCS. 1575–1590.
- RSA: Byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets. In AAAI, Vol. 33. 1544–1551.
- Federated Learning on Non-IID Data Silos: An Experimental Study. In ICDE. 965–978.
- Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine 37, 3 (2020), 50–60.
- ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs. Proc. VLDB Endow. 16, 8 (2023), 1804–1816.
- Projected Federated Averaging with Heterogeneous Differential Privacy. PVLDB 15, 4 (2021), 828–840.
- Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. In RAID. 273–294.
- Enabling SQL-based Training Data Debugging for Federated Learning. Proc. VLDB Endow. 15, 3 (2021), 388–400.
- RoFL: Robustness of Secure Federated Learning. In S&P. IEEE, 453–476.
- Differentially Private Byzantine-Robust Federated Learning. IEEE Trans. Parallel Distributed Syst. 33, 12 (2022), 3690–3701.
- Communication-efficient learning of deep networks from decentralized data. In AISTATS. 1273–1282.
- Exploiting Unintended Feature Leakage in Collaborative Learning. In S&P. 691–706.
- Ralph C. Merkle. 1978. Secure Communications Over Insecure Channels. Commun. ACM 21, 4 (1978), 294–299.
- Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. In S&P. 739–753.
- FLAME: Taming Backdoors in Federated Learning. In USENIX Security Symposium. 1415–1432.
- Justinian’s GAAvernor: Robust Distributed Learning with Gradient Aggregation Agent. In USENIX Security. 1641–1658.
- Pinocchio: Nearly practical verifiable computation. Commun. ACM 59, 2 (2016), 103–112.
- Torben Pryds Pedersen. 2001. Non-interactive and information-theoretic secure verifiable secret sharing. In CRYPTO. Springer, 129–140.
- Nicholas Pippenger. 1980. On the evaluation of powers and monomials. SIAM J. Comput. 9, 2 (1980), 230–250.
- C. Pomerance and S. Goldwasser. 1990. Cryptology and Computational Number Theory. American Mathematical Society. https://books.google.com.sg/books?id=yyfS7MKQhJUC
- ELSA: Secure Aggregation for Federated Learning with Malicious Actors. In S&P. IEEE, 1961–1979.
- EIFFeL: Ensuring Integrity for Federated Learning. In CCS. 2535–2549.
- Adi Shamir. 1979. How to share a secret. Commun. ACM 22, 11 (1979), 612–613.
- Yanyao Shen and Sujay Sanghavi. 2019. Learning with Bad Training Data via Iterative Trimmed Loss Minimization. In ICML. 5739–5748.
- Certified Defenses for Data Poisoning Attacks. In NeurIPS. 3517–3529.
- Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963 (2019).
- Distribution-Regularized Federated Learning on Non-IID Data. In ICDE. IEEE, 2113–2125.
- Incentive-Aware Decentralized Data Collaboration. Proc. ACM Manag. Data 1, 2 (2023), 158:1–158:27.
- Privacy Preserving Vertical Federated Learning for Tree-based Models. Proc. VLDB Endow. 13, 11 (2020), 2090–2103.
- Falcon: A Privacy-Preserving and Interpretable Vertical Federated Learning System. Proc. VLDB Endow. 16, 10 (2023), 2471–2484.
- Practical Differentially Private and Byzantine-resilient Federated Learning. Proc. ACM Manag. Data 1, 2 (2023), 119:1–119:26.
- DBA: Distributed Backdoor Attacks against Federated Learning. In ICLR.
- Zeno++: robust asynchronous SGD with arbitrary number of Byzantine workers. arXiv preprint arXiv:1903.07020 (2019).
- FederatedScope: A Flexible Federated Learning Platform for Heterogeneity. Proc. VLDB Endow. 16, 5 (2023), 1059–1072.
- TDFL: Truth Discovery Based Byzantine Robust Federated Learning. IEEE Trans. Parallel Distributed Syst. 33, 12 (2022), 4835–4848.
- MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis. In IEEE 18th International Symposium on Biomedical Imaging (ISBI). 191–195.
- MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification. Scientific Data 10, 1 (2023), 41.
- Federated Machine Learning: Concept and Applications. ACM TIST 10, 2 (2019), 12:1–12:19.
- Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. In ICML. 5636–5645.
- Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning. In ICML. 7074–7084.
- FLBooster: A Unified and Efficient Platform for Federated Learning Acceleration. In ICDE. IEEE, 3140–3153.
- Aggregation Service for Federated Learning: An Efficient, Secure, and More Resilient Realization. IEEE Trans. Dependable Secur. Comput. 20, 2 (2023), 988–1001.
- Robust and Secure Federated Learning with Low-Cost Zero-Knowledge Proof. https://www.comp.nus.edu.sg/~ooibc/risefl-20230901.pdf