Continual Forgetting for Pre-trained Vision Models (2403.11530v2)
Abstract: For privacy and security reasons, the need to erase unwanted information from pre-trained vision models is becoming increasingly evident. In real-world scenarios, erasure requests may arrive at any time from both users and model owners, and these requests usually form a sequence. In this setting, selected information is expected to be continuously removed from a pre-trained model while the rest is preserved. We define this problem as continual forgetting and identify two key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact of the forgetting procedure should be minimal. To address them, we propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we use LoRA modules to fine-tune the FFN layers in Transformer blocks independently for each forgetting task, and towards (ii), we adopt a simple group sparse regularization that automatically selects specific LoRA groups and zeroes out the others. GS-LoRA is effective, parameter-efficient, data-efficient, and easy to implement. We conduct extensive experiments on face recognition, object detection, and image classification, and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes. Code will be released at \url{https://github.com/bjzhb666/GS-LoRA}.
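To make the mechanism concrete, below is a minimal sketch of the two ingredients the abstract describes, assuming a PyTorch implementation: a LoRA adapter wrapped around a frozen FFN linear layer, and a group-lasso penalty computed per LoRA module so that entire groups can be driven to exactly zero. The names (`LoRALinear`, `group_sparse_penalty`), the rank, and the initialization scale are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of GS-LoRA's two ingredients (not the official code):
# LoRA adapters on frozen linear layers, plus a group-lasso penalty that
# zeroes out whole LoRA groups rather than individual weights.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + B A x."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights fixed
        # Standard LoRA init: A small random, B zero, so the update starts at 0.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


def group_sparse_penalty(lora_modules) -> torch.Tensor:
    """Group lasso over LoRA modules: sum of per-group Frobenius norms.

    Treating each (A, B) pair as one group makes the penalty push entire
    LoRA modules to exactly zero, so the layers actually needed for a
    forgetting task are selected automatically.
    """
    return sum(
        torch.sqrt((m.lora_A ** 2).sum() + (m.lora_B ** 2).sum())
        for m in lora_modules
    )
```

In training, this penalty would simply be added to the task objectives, e.g. `loss = forget_loss + retain_loss + alpha * group_sparse_penalty(loras)` (with `alpha` a hypothetical weighting hyperparameter); the group structure, one group per LoRA pair, is what lets the regularizer prune whole modules instead of scattered weights.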