Federated Learning Using Three-Operator ADMM (2211.04152v3)

Published 8 Nov 2022 in cs.LG, eess.SP, and math.OC

Abstract: Federated learning (FL) has emerged as an instance of the distributed machine learning paradigm that avoids the transmission of data generated on the users' side. Although data are not transmitted, edge devices have to deal with limited communication bandwidths, data heterogeneity, and straggler effects due to the limited computational resources of users' devices. A prominent approach to overcoming such difficulties is FedADMM, which is based on the classical two-operator consensus alternating direction method of multipliers (ADMM). A common assumption of FL algorithms, including FedADMM, is that they learn a global model using data only on the users' side and not on the edge server. However, in edge learning, the server is expected to be near the base station and to have direct access to rich datasets. In this paper, we argue that leveraging the rich data on the edge server is much more beneficial than utilizing only user datasets. Specifically, we show that merely applying FL with an additional virtual user node representing the data on the edge server is inefficient. We propose FedTOP-ADMM, which generalizes FedADMM and is based on a three-operator ADMM-type technique that exploits a smooth cost function on the edge server to learn a global model in parallel with the edge devices. Our numerical experiments indicate that FedTOP-ADMM achieves a substantial gain of up to 33% in communication efficiency in reaching a desired test accuracy, relative to FedADMM with a virtual user on the edge server.
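The abstract only states that FedTOP-ADMM rests on a three-operator ADMM-type splitting that can handle a smooth server-side cost alongside the users' terms; the update equations themselves are not reproduced on this page. As a purely illustrative sketch (not the authors' FedTOP-ADMM), the snippet below implements the classical Davis-Yin three-operator splitting, the kind of primitive such methods build on, for a toy problem with one smooth term and two nonsmooth terms; the problem data, step size, and iteration count are assumptions chosen only for the demonstration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def davis_yin(A, b, lam, lower, upper, gamma, n_iter=500):
    """Davis-Yin three-operator splitting for the toy problem
         min_x  0.5*||A x - b||^2  +  lam*||x||_1  +  indicator(lower <= x <= upper),
       i.e. a smooth term plus two nonsmooth terms, iterating
         x_g = prox_{gamma*g}(z)
         x_f = prox_{gamma*f}(2*x_g - z - gamma*grad_h(x_g))
         z   = z + x_f - x_g
    """
    n = A.shape[1]
    z = np.zeros(n)
    x_f = np.zeros(n)
    for _ in range(n_iter):
        x_g = soft_threshold(z, gamma * lam)          # prox of the l1 term
        grad_h = A.T @ (A @ x_g - b)                  # gradient of the smooth term
        x_f = np.clip(2 * x_g - z - gamma * grad_h,   # prox of the box indicator
                      lower, upper)                   # = projection onto the box
        z = z + x_f - x_g
    return x_f

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 10))
    x_true = np.zeros(10)
    x_true[:3] = [1.0, -0.5, 0.8]
    b = A @ x_true + 0.01 * rng.standard_normal(40)
    L = np.linalg.norm(A, 2) ** 2                     # Lipschitz constant of the smooth gradient
    x_hat = davis_yin(A, b, lam=0.1, lower=-1.0, upper=1.0, gamma=1.0 / L)
    print(np.round(x_hat, 3))
```

In the federated setting described in the abstract, the smooth term plays the role of the edge server's own cost, evaluated in parallel with the users' updates, while the remaining operators absorb the consensus and user-side terms; the exact placement of these roles in FedTOP-ADMM is specified in the paper itself.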
