Federated Orthogonal Training: Mitigating Global Catastrophic Forgetting in Continual Federated Learning (2309.01289v3)
Abstract: Federated Learning (FL) has gained significant attention due to its ability to enable privacy-preserving training over decentralized data. The current FL literature mostly focuses on single-task learning. Over time, however, new tasks may appear at the clients, and the global model should learn these tasks without forgetting previous ones. This real-world scenario is known as Continual Federated Learning (CFL). The main challenge of CFL is Global Catastrophic Forgetting: when the global model is trained on new tasks, its performance on old tasks decreases. A few recent works on CFL propose methods that aim to address the global catastrophic forgetting problem, but they either make unrealistic assumptions about the availability of past data samples or violate the privacy principles of FL. We propose a novel method, Federated Orthogonal Training (FOT), to overcome these drawbacks and address global catastrophic forgetting in CFL. Our algorithm extracts the global input subspace of each layer for old tasks and modifies the aggregated updates of new tasks so that, for each layer, they are orthogonal to the global principal subspace of old tasks. This decreases the interference between tasks, which is the main cause of forgetting. We empirically show that FOT outperforms state-of-the-art continual learning methods in the CFL setting, achieving an average accuracy gain of up to 15% with 27% lower forgetting while incurring only minimal computation and communication costs.
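To illustrate the core idea of the orthogonal update step described above, below is a minimal sketch, not the paper's implementation: it assumes the server already holds, per layer, an orthonormal basis for the old-task input subspace (e.g., estimated from aggregated activation statistics) and projects the aggregated new-task update onto that subspace's orthogonal complement. The function names and the `energy` threshold are illustrative assumptions.

```python
import numpy as np

def extract_principal_subspace(activations, energy=0.99):
    """Estimate a layer's global input subspace for old tasks via SVD.

    activations: (n_samples, d_in) matrix of layer inputs collected for old tasks.
    Returns a (d_in, k) orthonormal basis covering `energy` of the spectral energy.
    """
    _, s, vt = np.linalg.svd(activations, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return vt[:k].T  # (d_in, k)

def project_update_orthogonal(update, old_task_basis):
    """Project an aggregated layer update away from the old-task subspace.

    update:          (d_out, d_in) aggregated weight update for one layer.
    old_task_basis:  (d_in, k) orthonormal basis of the old-task input subspace.
    """
    # Remove the component of the update that lies in the old-task subspace,
    # so training on new tasks does not interfere with old-task representations.
    return update - (update @ old_task_basis) @ old_task_basis.T
```

In this sketch, the projected update (rather than the raw aggregated update) would be applied to the global model at each round after old tasks have been learned.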