SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning (2402.04114v2)
Abstract: In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training under agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $\epsilon$. To overcome this, we propose SCAFFLSA, a new variant of FedLSA that uses control variates to correct for client drift, and establish its sample and communication complexities. We show that for statistically heterogeneous agents, its communication complexity scales logarithmically with the desired accuracy, similarly to Scaffnew. An important finding is that, compared to the existing results for Scaffnew, the sample complexity scales with the inverse of the number of agents, a property referred to as linear speed-up. Achieving this linear speed-up requires completely new theoretical arguments. We apply the proposed method to federated temporal difference learning with linear function approximation and analyze the corresponding complexity improvements.
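The abstract's core contrast can be sketched in a small deterministic toy model. Below, each agent owns a heterogeneous linear system $(A_c, b_c)$; plain FedLSA runs local steps and averages, which leaves a heterogeneity bias, while a Scaffold-style control-variate correction (in the spirit of SCAFFLSA, though not the paper's exact algorithm) removes the drift. All names, dimensions, and step sizes here are illustrative assumptions, and sampling noise is omitted for clarity.

```python
import numpy as np

# Toy setup: N agents, each with its own positive-definite (A_c, b_c).
rng = np.random.default_rng(0)
N, d = 4, 3
A = []
for _ in range(N):
    M = 0.3 * rng.standard_normal((d, d))
    A.append(0.5 * (M + M.T) + d * np.eye(d))   # symmetric, well-conditioned
b = [rng.standard_normal(d) for _ in range(N)]

A_bar = sum(A) / N
b_bar = sum(b) / N
theta_star = np.linalg.solve(A_bar, b_bar)       # fixed point of the averaged system

def fedlsa(eta=0.05, H=10, T=400):
    """Plain FedLSA: H local SA steps per round, then averaging (no correction)."""
    theta = np.zeros(d)
    for _ in range(T):
        local = []
        for c in range(N):
            th = theta.copy()
            for _ in range(H):
                th -= eta * (A[c] @ th - b[c])   # uncorrected local step
            local.append(th)
        theta = sum(local) / N                   # communication: average models
    return theta

def scafflsa(eta=0.05, H=10, T=400):
    """Scaffold-style control variates applied to federated linear SA."""
    theta = np.zeros(d)
    h = [np.zeros(d) for _ in range(N)]          # per-agent control variates
    h_bar = np.zeros(d)
    for _ in range(T):
        local = []
        for c in range(N):
            th = theta.copy()
            for _ in range(H):
                g = A[c] @ th - b[c]
                th -= eta * (g - h[c] + h_bar)   # drift-corrected local step
            local.append(th)
        # Refresh control variates from the round-start model (Scaffold-style).
        h = [h[c] - h_bar + (theta - local[c]) / (eta * H) for c in range(N)]
        h_bar = sum(h) / N
        theta = sum(local) / N
    return theta

err_fed = np.linalg.norm(fedlsa() - theta_star)
err_sca = np.linalg.norm(scafflsa() - theta_star)
print(err_fed, err_sca)  # FedLSA retains a heterogeneity bias; the corrected variant does not
```

In this noise-free setting the corrected iterates converge to the fixed point of the averaged system, whereas plain FedLSA stalls at a biased fixed point whose distance from $\theta^*$ grows with the number of local steps and the degree of heterogeneity, which is the drift phenomenon the paper's analysis quantifies.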