Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering (2210.00044v2)
Abstract: Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previously acquired knowledge. Although continual learning has been widely studied in computer vision, its application to Vision+Language tasks is less straightforward, as settings can be parameterized in multiple ways according to their input modalities. In this paper, we present a detailed study of how different settings affect performance for Visual Question Answering. We first propose three plausible task formulations and demonstrate their impact on the performance of continual learning algorithms. We break task similarity down into several factors and show that performance and sensitivity to task order depend strongly on the shift of the output distribution. We also investigate the potential of pretrained models and compare the robustness of transformer models with different visual embeddings. Finally, we provide an analysis interpreting model representations and their impact on forgetting. Our results highlight the importance of stabilizing visual representations in deeper layers.
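To make the notion of "shift of the output distribution" concrete, here is a minimal sketch (not the paper's implementation) of how the shift between the answer distributions of two VQA tasks in a continual-learning sequence could be quantified. The task names, toy data, and the choice of Jensen-Shannon divergence are illustrative assumptions; the abstract only states that output-distribution shift is a key factor of task similarity.

```python
# Hypothetical sketch: measuring output (answer) distribution shift between two
# VQA tasks learned sequentially. Divergence choice and data are assumptions.
from collections import Counter
import math

def answer_distribution(answers, vocab):
    """Empirical distribution of answer labels over a shared answer vocabulary."""
    counts = Counter(answers)
    total = sum(counts.values())
    return [counts.get(a, 0) / total for a in vocab]

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2((ai + eps) / (bi + eps)) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy example: two tasks whose answer spaces barely overlap exhibit a large shift,
# which, per the paper's finding, would predict stronger forgetting and order sensitivity.
vocab = ["yes", "no", "red", "blue", "two", "three"]
task_a = ["yes", "no", "yes", "no", "yes"]        # e.g. yes/no questions
task_b = ["red", "blue", "red", "two", "three"]   # e.g. color/count questions
p = answer_distribution(task_a, vocab)
q = answer_distribution(task_b, vocab)
print(f"Output-distribution shift (JSD): {js_divergence(p, q):.3f}")
```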