Data Collaboration Analysis with Orthogonal Basis Alignment (2403.02780v3)

Published 5 Mar 2024 in cs.LG and math.OC

Abstract: The Data Collaboration (DC) framework provides a privacy-preserving solution for multi-source data fusion, enabling the joint analysis of data from multiple sources to achieve enhanced insights. It utilizes linear transformations with secretly selected bases to ensure privacy guarantees through non-iterative communication. Despite its strengths, the DC framework often encounters performance instability due to theoretical challenges in aligning the bases used for mapping raw data. This study addresses these challenges by establishing a rigorous theoretical foundation for basis alignment within the DC framework, formulating it as an optimization problem over orthogonal matrices. Under specific assumptions, we demonstrate that this problem can be reduced to the Orthogonal Procrustes Problem, which has a well-known analytical solution. Extensive empirical evaluations across diverse datasets reveal that the proposed alignment method significantly enhances model performance and computational efficiency, outperforming existing approaches. Additionally, it demonstrates robustness across varying levels of differential privacy, thus enabling practical and reliable implementations of the DC framework.
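
The abstract notes that the Orthogonal Procrustes Problem has a well-known analytical solution. As a minimal sketch of that generic solution (not the paper's own alignment procedure, whose assumptions and inputs are not given here), the following finds the orthogonal matrix O minimizing the Frobenius norm of A O - B via a singular value decomposition of AᵀB. The function name `orthogonal_procrustes`, the matrix shapes, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def orthogonal_procrustes(A, B):
    """Return the orthogonal matrix O that minimizes ||A @ O - B||_F.

    Classical closed form: if A.T @ B = U @ diag(s) @ Vt is an SVD,
    then O = U @ Vt.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


# Synthetic check (assumed setup): B is A expressed in a rotated basis.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))                        # 50 samples, 4 dims
Q_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
B = A @ Q_true

O = orthogonal_procrustes(A, B)
residual = np.linalg.norm(A @ O - B)                    # Frobenius norm of the misfit
```

In this noiseless setting with a full-column-rank A, the recovered O should match `Q_true` up to floating-point error. With noisy or rank-deficient data the minimizer still exists, but it need not equal the matrix that generated B.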

