InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning (2404.00228v3)
Abstract: Continual learning requires a model to learn multiple tasks sequentially. In this setting, the model must maintain its performance on old tasks (stability) while continually adapting to new tasks (plasticity). Recently, parameter-efficient fine-tuning (PEFT), which freezes a pre-trained model and injects a small number of learnable parameters to adapt to downstream tasks, has gained increasing popularity in continual learning. Although existing PEFT-based continual learning methods outperform those not based on PEFT, most of them do not consider how to eliminate the interference of the new task on the old tasks, which prevents the model from striking a good trade-off between stability and plasticity. In this work, we propose a new PEFT method, called interference-free low-rank adaptation (InfLoRA), for continual learning. InfLoRA injects a small number of parameters to reparameterize the pre-trained weights, and we show that fine-tuning these injected parameters is equivalent to fine-tuning the pre-trained weights within a subspace. Furthermore, InfLoRA designs this subspace to eliminate the interference of the new task on the old tasks, achieving a good trade-off between stability and plasticity. Experimental results show that InfLoRA outperforms existing state-of-the-art continual learning methods on multiple datasets.
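The reparameterization described in the abstract can be pictured with a LoRA-style low-rank branch. The sketch below is a minimal illustration under assumptions, not the authors' implementation: it adds a branch B A to a frozen linear layer, fixes the down-projection A to a pre-computed basis (standing in for InfLoRA's interference-free subspace), and trains only the up-projection B, so every weight update stays inside the span of A. The class name `InterferenceFreeLoRALinear` and the random stand-in basis are hypothetical.

```python
# Minimal sketch (not the authors' code) of a low-rank branch whose
# down-projection is frozen to a pre-computed subspace basis, so that
# new-task updates are confined to that subspace.
import torch
import torch.nn as nn


class InterferenceFreeLoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a low-rank branch: y = W x + B A x."""

    def __init__(self, pretrained: nn.Linear, rank: int, basis: torch.Tensor):
        super().__init__()
        self.base = pretrained
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pre-trained weights frozen

        in_features = pretrained.in_features
        out_features = pretrained.out_features

        # Down-projection A (rank x in_features): fixed to a basis chosen in
        # advance (in InfLoRA, designed so updates do not interfere with old
        # tasks). Here it is simply a user-supplied buffer, never trained.
        assert basis.shape == (rank, in_features)
        self.register_buffer("A", basis)

        # Up-projection B (out_features x rank): the only trainable parameters,
        # initialized to zero so the branch starts as an exact no-op.
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fine-tuning B moves the effective weights only within span(A).
        return self.base(x) + (x @ self.A.t()) @ self.B.t()


if __name__ == "__main__":
    torch.manual_seed(0)
    pretrained = nn.Linear(768, 768)
    # Stand-in basis with orthonormal rows; in practice it would be computed
    # from old- and new-task statistics rather than at random.
    basis = torch.linalg.qr(torch.randn(768, 8)).Q.t()  # shape (8, 768)
    layer = InterferenceFreeLoRALinear(pretrained, rank=8, basis=basis)
    y = layer(torch.randn(4, 768))
    print(y.shape)  # torch.Size([4, 768])
```

Because B starts at zero, the model initially behaves exactly like the frozen pre-trained network; training B then changes the effective weights only within the chosen subspace, which is the kind of constraint InfLoRA uses to keep new-task updates from disturbing old tasks.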