Continual Adaptation of Vision Transformers for Federated Learning (2306.09970v2)
Abstract: In this paper, we focus on the important yet understudied problem of Continual Federated Learning (CFL), where a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The complexity of this problem is compounded by challenges from both the Continual and Federated Learning perspectives. Specifically, models trained in a CFL setup suffer from catastrophic forgetting, which is exacerbated by data heterogeneity across clients. Existing attempts at this problem tend to impose large overheads on clients and communication channels or require access to stored data, which renders them unsuitable for real-world use due to privacy concerns. In this work, we tackle forgetting and heterogeneity while minimizing overhead costs and without requiring access to any stored data. We study this problem in the context of Vision Transformers and explore parameter-efficient approaches to adapt to dynamic distributions while minimizing forgetting. We achieve this by leveraging a prompting-based approach (such that only prompts and classifier heads have to be communicated) and proposing a novel, lightweight generation and distillation scheme to consolidate client models at the server. We formulate this problem for image classification, establish strong baselines for comparison, and conduct experiments on CIFAR-100 as well as challenging, large-scale datasets like ImageNet-R and DomainNet. Our approach outperforms both existing methods and our own baselines by as much as 7% while significantly reducing communication and client-level computation costs. Code is available at https://github.com/shaunak27/hepco-fed.
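A minimal sketch may help make the communication pattern described above concrete. The snippet below is an illustration, not the authors' implementation: all names (`PromptedViT`, `client_update`, `server_aggregate`) and hyperparameters are hypothetical assumptions, and the prompt injection is deliberately simplified. It shows only the key idea that clients train and exchange prompts and classifier heads while the ViT backbone stays frozen, so per-round communication is independent of backbone size; the server-side generation and distillation step is omitted.

```python
# Hypothetical sketch of prompt-and-head-only federated communication (PyTorch).
# Names, shapes, and the training loop are illustrative assumptions, not the paper's code.
import copy
import torch
import torch.nn as nn


class PromptedViT(nn.Module):
    """A frozen feature backbone plus learnable prompts and a classifier head."""

    def __init__(self, backbone, num_classes, prompt_len=8, dim=768):
        super().__init__()
        self.backbone = backbone                       # frozen, identical on every client
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        self.prompts = nn.Parameter(torch.zeros(prompt_len, dim))   # learnable prompts
        self.head = nn.Linear(dim, num_classes)                     # classifier head

    def forward(self, x):
        # Simplified: real prompt tuning inserts prompts into the transformer's token
        # sequence; here we just combine pooled backbone features with the prompts.
        feats = self.backbone(x)                       # assumed to return [B, dim] features
        return self.head(feats + self.prompts.mean(dim=0))

    def trainable_state(self):
        # Only these small tensors are ever communicated.
        return {"prompts": self.prompts.detach().clone(),
                "head": copy.deepcopy(self.head.state_dict())}


def client_update(model, loader, epochs=1, lr=1e-3):
    """Local training on one client: optimizes prompts and head only."""
    opt = torch.optim.Adam([model.prompts, *model.head.parameters()], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.trainable_state()


def server_aggregate(client_states):
    """FedAvg-style averaging of the lightweight prompt/head parameters."""
    return {
        "prompts": torch.stack([s["prompts"] for s in client_states]).mean(dim=0),
        "head": {k: torch.stack([s["head"][k].float() for s in client_states]).mean(dim=0)
                 for k in client_states[0]["head"]},
    }
```

In an actual CFL round, the server would additionally run its data-free generation and distillation step to consolidate the client models before broadcasting the updated prompts and heads back to clients.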