
Adaptive Memory Replay for Continual Learning (2404.12526v1)

Published 18 Apr 2024 in cs.LG, cs.CL, and cs.CV

Abstract: Foundation Models (FMs) have become the hallmark of modern AI; however, these models are trained on massive data, making training financially expensive. Updating FMs as new data becomes available is important, but can lead to 'catastrophic forgetting', where models underperform on tasks related to data sub-populations observed too long ago. This continual learning (CL) phenomenon has been extensively studied, but primarily in a setting where only a small amount of past data can be stored. We advocate for the paradigm where memory is abundant, allowing us to keep all previous data, but computational resources are limited. In this setting, traditional replay-based CL approaches are outperformed by a simple baseline which replays past data selected uniformly at random, indicating that this setting necessitates a new approach. We address this by introducing a framework of adaptive memory replay for continual learning, where sampling of past data is phrased as a multi-armed bandit problem. We utilize Boltzmann sampling to derive a method which dynamically selects past data for training conditioned on the current task, assuming full data access and emphasizing training efficiency. Through extensive evaluations on both vision and language pre-training tasks, we demonstrate the effectiveness of our approach, which maintains high performance while reducing forgetting by up to 10% at no training efficiency cost.

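The abstract frames replay sample selection as a multi-armed bandit solved with Boltzmann sampling over past data. The sketch below illustrates that core idea only, not the authors' exact algorithm: each past task's buffer is treated as an arm, a per-buffer score (here a placeholder standing in for something like the current model's loss on a small probe from that buffer) drives a softmax over arms, and replay examples are drawn from the chosen buffers. All function names, the score definition, and the temperature value are assumptions for illustration.

```python
import math
import random

def boltzmann_weights(scores, temperature=1.0):
    """Softmax over per-buffer scores; higher score -> buffer replayed more often."""
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sample_replay_batch(buffers, scores, batch_size, temperature=1.0):
    """Pick replay examples by choosing a buffer (bandit arm) via Boltzmann
    sampling, then drawing an example uniformly from that buffer."""
    probs = boltzmann_weights(scores, temperature)
    batch = []
    for _ in range(batch_size):
        buf = random.choices(buffers, weights=probs, k=1)[0]
        batch.append(random.choice(buf))
    return batch

# Hypothetical usage: three past-task buffers with placeholder scores that
# would, in practice, be refreshed from the current model's behaviour.
buffers = [[("task0", i) for i in range(100)],
           [("task1", i) for i in range(100)],
           [("task2", i) for i in range(100)]]
scores = [0.3, 1.2, 0.7]
replay = sample_replay_batch(buffers, scores, batch_size=8, temperature=0.5)
print(replay)
```

These replayed examples would then be mixed into each training batch for the current task, so the compute budget per step stays fixed while the buffers most relevant to the current task are revisited most often.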
Authors (7)
  1. James Seale Smith (15 papers)
  2. Lazar Valkov (4 papers)
  3. Shaunak Halbe (6 papers)
  4. Vyshnavi Gutta (2 papers)
  5. Rogerio Feris (105 papers)
  6. Zsolt Kira (110 papers)
  7. Leonid Karlinsky (79 papers)
Citations (1)
