Class Incremental Learning via Likelihood Ratio Based Task Prediction (2309.15048v4)
Abstract: Class incremental learning (CIL) is a challenging continual learning setting in which a system learns a sequence of tasks, each consisting of a set of unique classes. The key feature of CIL is that no task identifier (or task-id) is provided at test time, and predicting the task-id for each test sample is a difficult problem. An emerging theory-guided approach (called TIL+OOD) trains a task-specific model for each task within a single network shared by all tasks, using a task-incremental learning (TIL) method to deal with catastrophic forgetting. The model for each task is an out-of-distribution (OOD) detector rather than a conventional classifier: it can perform both within-task (in-distribution (IND)) class prediction and OOD detection. The OOD detection capability is the key to task-id prediction during inference. However, this paper argues that using a traditional OOD detector for task-id prediction is sub-optimal, because additional information available in CIL (e.g., the replay data and the learned tasks) can be exploited to design a better and more principled method for task-id prediction. We call the new method TPL (Task-id Prediction based on Likelihood Ratio). TPL markedly outperforms strong CIL baselines and exhibits negligible catastrophic forgetting. The code for TPL is publicly available at https://github.com/linhaowei1/TPL.
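To make the inference procedure concrete, below is a minimal, hypothetical sketch of likelihood-ratio based task-id prediction in the spirit of TPL; it is not the authors' implementation. It assumes a frozen shared feature extractor `feat`, per-task classifier heads `heads` trained with a TIL method, a diagonal-Gaussian in-distribution density estimate per task, and an alternative-hypothesis Gaussian fit on replay features from the other tasks. All names (`feat`, `heads`, `fit_gaussian`, `ind_dists`, `ood_dists`) are assumptions introduced for illustration.

```python
import torch

def fit_gaussian(feats):
    """Fit a diagonal Gaussian to a set of feature vectors (n, d).

    Used offline to build ind_dists[t] (from task t's training features)
    and ood_dists[t] (from replay features of the other tasks).
    """
    mean = feats.mean(dim=0)
    var = feats.var(dim=0) + 1e-6  # small floor for numerical stability
    return torch.distributions.Normal(mean, var.sqrt())

@torch.no_grad()
def predict(x, feat, heads, ind_dists, ood_dists):
    """Predict (task_id, within-task class) for a single test input x."""
    z = feat(x)  # shared feature representation, shape (d,)
    # Per-task likelihood-ratio score: log p_IND(z) - log p_alternative(z),
    # where the alternative density is estimated from replay data.
    scores = torch.stack([
        ind_dists[t].log_prob(z).sum() - ood_dists[t].log_prob(z).sum()
        for t in range(len(heads))
    ])
    task_id = int(scores.argmax())           # most plausible task
    cls = int(heads[task_id](z).argmax())    # within-task (IND) class prediction
    return task_id, cls
```

Taking the argmax of per-task log-likelihood ratios reflects the intuition behind the method: when data approximating the alternative hypothesis is available (as the replay buffer is in CIL), a ratio of the in-distribution density to an explicit alternative density is a more principled test statistic than a raw OOD score computed against no particular alternative.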
Authors: Haowei Lin, Yijia Shao, Weinan Qian, Ningxin Pan, Yiduo Guo, Bing Liu