
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding (2401.09067v1)

Published 17 Jan 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks. Various continual learning (CL) methods rely on exemplar buffers and/or network expansion to balance model stability and plasticity, which compromises their practical value due to privacy and memory concerns. Instead, this paper considers a strict yet realistic setting in which the training data from previous tasks is unavailable and the model size remains relatively constant during sequential training. To achieve these desiderata, we propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. This is achieved through the synergy of two key components: HSIC-Bottleneck Orthogonalization (HBO), which performs non-overwriting parameter updates mediated by the Hilbert-Schmidt independence criterion in an orthogonal space, and EquiAngular Embedding (EAE), which enhances decision boundary adaptation between old and new tasks with predefined basis vectors. Extensive experiments demonstrate that our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
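
The abstract invokes two standard building blocks without restating their concrete form: the Hilbert-Schmidt independence criterion (HSIC) that mediates HBO's updates, and the predefined equiangular basis vectors used by EAE. The sketch below is a minimal illustration of both under stated assumptions, not the authors' implementation of HBO or EAE: it uses the biased HSIC estimator of Gretton et al. (2005) with an RBF kernel (the kernel choice and the sigma bandwidth are assumptions) and the simplex equiangular-tight-frame construction common in the neural-collapse literature; all function names are hypothetical.

import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian (RBF) kernel matrix over the rows of X; kernel choice is an assumption.
    sq = np.sum(X ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    # Biased HSIC estimator (Gretton et al., 2005): tr(K H L H) / (n - 1)^2.
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

def simplex_etf(num_classes, dim, seed=0):
    # Predefined equiangular class directions (simplex ETF): num_classes unit
    # vectors in R^dim with identical pairwise cosine -1/(num_classes - 1).
    # Requires dim >= num_classes. This follows the neural-collapse literature,
    # not necessarily the paper's exact EAE construction.
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))   # orthonormal columns
    C = num_classes
    M = np.sqrt(C / (C - 1)) * U @ (np.eye(C) - np.ones((C, C)) / C)
    return M.T                                    # row c = fixed target direction for class c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_old = rng.standard_normal((64, 128))    # stand-in for old-task features
    feats_new = rng.standard_normal((64, 128))    # stand-in for new-task features
    print("HSIC(old, new) =", hsic(feats_old, feats_new))

    W = simplex_etf(num_classes=10, dim=128)
    cosines = W @ W.T                             # off-diagonal entries should be about -1/9
    print("pairwise cosines:", np.round(cosines[0, 1:4], 3))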
