Class Incremental Learning via Likelihood Ratio Based Task Prediction (2309.15048v4)

Published 26 Sep 2023 in cs.LG, cs.AI, and cs.CV

Abstract: Class incremental learning (CIL) is a challenging setting of continual learning, which learns a series of tasks sequentially. Each task consists of a set of unique classes. The key feature of CIL is that no task identifier (or task-id) is provided at test time. Predicting the task-id for each test sample is a challenging problem. An emerging theory-guided approach (called TIL+OOD) is to train a task-specific model for each task in a shared network for all tasks based on a task-incremental learning (TIL) method to deal with catastrophic forgetting. The model for each task is an out-of-distribution (OOD) detector rather than a conventional classifier. The OOD detector can perform both within-task (in-distribution (IND)) class prediction and OOD detection. The OOD detection capability is the key to task-id prediction during inference. However, this paper argues that using a traditional OOD detector for task-id prediction is sub-optimal because additional information (e.g., the replay data and the learned tasks) available in CIL can be exploited to design a better and principled method for task-id prediction. We call the new method TPL (Task-id Prediction based on Likelihood Ratio). TPL markedly outperforms strong CIL baselines and has negligible catastrophic forgetting. The code of TPL is publicly available at https://github.com/linhaowei1/TPL.
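The abstract describes the TIL+OOD inference scheme that TPL builds on: each task has its own model that doubles as an OOD detector, the task whose detector judges a test sample most in-distribution supplies the predicted task-id, and that task's model then produces the within-task class prediction. The sketch below illustrates only this generic scheme; the score function (a likelihood ratio in TPL, which also exploits replay data) and all names here are illustrative assumptions, not the authors' implementation — see the linked repository for the actual method.

```python
import torch

def predict(x, task_models, task_score_fns):
    # Hypothetical TIL+OOD style inference, not the authors' TPL code:
    # each per-task model acts as an OOD detector; the task whose score
    # is highest (most in-distribution) is the predicted task-id, and
    # that model's within-task class prediction is the final answer.
    best_task, best_score = None, float("-inf")
    with torch.no_grad():
        for task_id, (model, score_fn) in enumerate(zip(task_models, task_score_fns)):
            logits = model(x)                   # within-task class logits
            score = float(score_fn(x, logits))  # e.g. a likelihood-ratio score (assumption)
            if score > best_score:
                best_task, best_score = task_id, score
        pred_class = int(task_models[best_task](x).argmax(dim=-1))
    return best_task, pred_class
```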

Authors (6)
  1. Haowei Lin (21 papers)
  2. Yijia Shao (18 papers)
  3. Weinan Qian (1 paper)
  4. Ningxin Pan (2 papers)
  5. Yiduo Guo (11 papers)
  6. Bing Liu (212 papers)
Citations (7)