SPriFed-OMP: A Differentially Private Federated Learning Algorithm for Sparse Basis Recovery (2402.19016v1)

Published 29 Feb 2024 in cs.LG and cs.CR

Abstract: Sparse basis recovery is a classical and important statistical learning problem when the number of model dimensions $p$ is much larger than the number of samples $n$. However, little work has studied sparse basis recovery in the Federated Learning (FL) setting, where the differential privacy (DP) of the client data must simultaneously be protected. In particular, the performance guarantees of existing DP-FL algorithms (such as DP-SGD) degrade significantly when $p \gg n$, and thus they fail to accurately learn the true underlying sparse model. In this work, we develop a new differentially private sparse basis recovery algorithm for the FL setting, called SPriFed-OMP. SPriFed-OMP adapts OMP (Orthogonal Matching Pursuit) to the FL setting and combines SMPC (secure multi-party computation) with DP so that only a small amount of noise needs to be added to achieve differential privacy. As a result, SPriFed-OMP can efficiently recover the true sparse basis for a linear model with only $n = O(\sqrt{p})$ samples. We further present an enhanced version of our approach, SPriFed-OMP-GRAD, based on gradient privatization, which improves the performance of SPriFed-OMP. Our theoretical analysis and empirical results demonstrate that both SPriFed-OMP and SPriFed-OMP-GRAD terminate in a small number of steps and significantly outperform previous state-of-the-art DP-FL solutions in terms of the accuracy-privacy trade-off.
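Since only the abstract is available here, the sketch below illustrates the core ingredient it describes: Orthogonal Matching Pursuit with Gaussian noise injected into the per-feature correlation scores. This is a hedged, single-machine illustration, not the paper's SPriFed-OMP protocol, which additionally uses SMPC-based secure aggregation across FL clients and carefully calibrated noise to obtain a formal DP guarantee. The function name `noisy_omp`, the noise placement, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def noisy_omp(X, y, k, noise_scale=0.0, seed=None):
    """Orthogonal Matching Pursuit with optional Gaussian noise on the
    correlation scores. Illustrative only: this is NOT the paper's
    SPriFed-OMP, which runs in a federated setting and combines SMPC
    with calibrated noise for a formal DP guarantee.

    X           : (n, p) design matrix
    y           : (n,) response vector
    k           : number of atoms (features) to select
    noise_scale : std of the Gaussian noise added to each score
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    residual = y.copy()
    support = []
    beta_s = np.zeros(0)
    for _ in range(k):
        # Correlate the residual with every column. In a DP variant this
        # per-feature statistic is the quantity that would be privatized.
        scores = X.T @ residual + noise_scale * rng.standard_normal(p)
        # Guard against re-selecting atoms: exact OMP avoids this because
        # the refit residual is orthogonal to the chosen columns, but
        # noisy scores are not exactly zero there.
        scores[support] = 0.0
        support.append(int(np.argmax(np.abs(scores))))
        # Least-squares refit on the current support, then update residual.
        beta_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ beta_s
    beta = np.zeros(p)
    beta[support] = beta_s
    return beta, support

if __name__ == "__main__":
    # Toy demo in the p >> n regime the abstract targets.
    rng = np.random.default_rng(0)
    n, p, s = 60, 400, 3
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    beta_true = np.zeros(p)
    beta_true[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
    y = X @ beta_true
    _, support = noisy_omp(X, y, k=s)
    print("recovered support:", sorted(support))
```

In the noiseless demo above, OMP typically recovers a 3-sparse support from $n = 60$ samples with $p = 400$ features, which matches the $p \gg n$ regime the abstract targets; raising `noise_scale` degrades selection accuracy, which is the trade-off the paper's SMPC-plus-DP design is built to mitigate.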

