FedLion: Faster Adaptive Federated Optimization with Fewer Communication (2402.09941v1)
Abstract: In Federated Learning (FL), a framework for training machine learning models across distributed data, well-known algorithms like FedAvg tend to have slow convergence rates, resulting in high communication costs during training. To address this challenge, we introduce FedLion, an adaptive federated optimization algorithm that seamlessly incorporates key elements from the recently proposed centralized adaptive algorithm Lion (Chen et al. 2023) into the FL framework. Through comprehensive evaluations on two widely adopted FL benchmarks, we demonstrate that FedLion outperforms previous state-of-the-art adaptive algorithms, including FAFED (Wu et al. 2023) and FedDA. Moreover, thanks to the use of signed gradients in local training, FedLion substantially reduces the data transmitted during uplink communication compared to existing adaptive algorithms, further lowering communication costs. Finally, this work includes a novel theoretical analysis showing that FedLion attains a faster convergence rate than established FL algorithms like FedAvg.
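To make the communication-saving mechanism concrete, below is a minimal Python/NumPy sketch of a federated round in which clients perform Lion-style signed local updates and upload only the sign of their accumulated change, which the server combines by majority vote. The function names (`local_lion_steps`, `client_upload`, `server_aggregate`), the hyperparameters, and the majority-vote rule are illustrative assumptions rather than the paper's exact algorithm; FedLion's precise update and aggregation rules should be taken from the paper itself.

```python
import numpy as np

# Minimal sketch of sign-compressed federated optimization, assuming a
# Lion-style local update and majority-vote aggregation. Function names,
# hyperparameters, and the aggregation rule are illustrative assumptions,
# not the paper's exact pseudocode.

def local_lion_steps(theta, grad_fn, m, steps=5, lr=1e-2, beta1=0.9, beta2=0.99):
    """Run `steps` Lion-style local updates; return new parameters and momentum."""
    for _ in range(steps):
        g = grad_fn(theta)
        c = beta1 * m + (1.0 - beta1) * g   # interpolation whose sign drives the step
        theta = theta - lr * np.sign(c)     # signed step, entries in {-1, 0, +1}
        m = beta2 * m + (1.0 - beta2) * g   # momentum buffer kept on the client
    return theta, m

def client_upload(theta_new, theta_global):
    """Uplink payload: only the sign of the accumulated local change (int8)."""
    return np.sign(theta_new - theta_global).astype(np.int8)

def server_aggregate(theta_global, sign_updates, server_lr=0.05):
    """Majority vote over the clients' signed updates (one simple choice)."""
    vote = np.sign(np.sum(np.stack(sign_updates).astype(np.float64), axis=0))
    return theta_global + server_lr * vote

# Toy run on f(theta) = 0.5 * ||theta||^2, so the gradient is theta itself.
rng = np.random.default_rng(0)
theta_g = rng.normal(size=10)
momenta = [np.zeros(10) for _ in range(4)]   # one momentum buffer per client
for _ in range(40):
    payloads = []
    for i in range(4):
        noisy_grad = lambda th: th + 0.01 * rng.normal(size=10)  # noisy local gradient
        theta_i, momenta[i] = local_lion_steps(theta_g.copy(), noisy_grad, momenta[i])
        payloads.append(client_upload(theta_i, theta_g))
    theta_g = server_aggregate(theta_g, payloads)
print("global model norm after 40 rounds:", np.linalg.norm(theta_g))
```

Because each uploaded coordinate takes one of only three values, it can be encoded in roughly 1.6 bits rather than the 32 bits of a full-precision float, which illustrates the kind of uplink savings the abstract refers to.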
- “Symbolic discovery of optimization algorithms,” arXiv preprint arXiv:2302.06675, 2023.
- “Faster adaptive federated learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2023, vol. 37, pp. 10379–10387.
- “Accelerated federated learning with decoupled adaptive optimization,” in International Conference on Machine Learning. PMLR, 2022, pp. 10298–10322.
- “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
- “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- “Adaptive federated optimization,” arXiv preprint arXiv:2003.00295, 2020.
- “Faster non-convex federated learning via global and local momentum,” in Uncertainty in Artificial Intelligence. PMLR, 2022, pp. 496–506.
- “Accelerating federated learning via momentum gradient descent,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 8, pp. 1754–1766, 2020.
- “SCAFFOLD: Stochastic controlled averaging for federated learning,” in International Conference on Machine Learning. PMLR, 2020, pp. 5132–5143.
- “signSGD: Compressed optimisation for non-convex problems,” in International Conference on Machine Learning. PMLR, 2018, pp. 560–569.
- “Stochastic sign descent methods: New algorithms and better theory,” in International Conference on Machine Learning. PMLR, 2021, pp. 9224–9234.
- “FedPD: A federated learning framework with adaptivity to non-IID data,” IEEE Transactions on Signal Processing, vol. 69, pp. 6055–6070, 2021.
- “Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 5693–5700.
- Sebastian U. Stich, “Local SGD converges fast and communicates little,” arXiv preprint arXiv:1805.09767, 2018.
- “Tighter theory for local SGD on identical and heterogeneous data,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 4519–4529.
- “Cooperative SGD: A unified framework for the design and analysis of local-update SGD algorithms,” The Journal of Machine Learning Research, vol. 22, no. 1, pp. 9709–9758, 2021.
- “Local SGD with periodic averaging: Tighter analysis and adaptive synchronization,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- “z-SignFedAvg: A unified sign-based stochastic compression for federated learning,” in Workshop on Federated Learning: Recent Advances and New Challenges (in conjunction with NeurIPS 2022), 2022.
- “Communication-efficient adaptive federated learning,” in International Conference on Machine Learning. PMLR, 2022, pp. 22802–22838.
- “STEM: A stochastic two-sided momentum algorithm achieving near-optimal sample and communication complexities for federated learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 6050–6061, 2021.
- Zhiwei Tang
- Tsung-Hui Chang