
DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning (2403.17503v1)

Published 26 Mar 2024 in cs.LG and cs.CV

Abstract: Class-incremental learning (CIL) under an exemplar-free constraint has presented a significant challenge. Existing methods adhering to this constraint are prone to catastrophic forgetting, far more so than replay-based techniques that retain access to past samples. In this paper, to solve the exemplar-free CIL problem, we propose a Dual-Stream Analytic Learning (DS-AL) approach. The DS-AL contains a main stream offering an analytical (i.e., closed-form) linear solution, and a compensation stream improving the inherent under-fitting limitation due to adopting linear mapping. The main stream redefines the CIL problem into a Concatenated Recursive Least Squares (C-RLS) task, allowing an equivalence between the CIL and its joint-learning counterpart. The compensation stream is governed by a Dual-Activation Compensation (DAC) module. This module re-activates the embedding with a different activation function from the main stream one, and seeks fitting compensation by projecting the embedding to the null space of the main stream's linear mapping. Empirical results demonstrate that the DS-AL, despite being an exemplar-free technique, delivers performance comparable with or better than that of replay-based methods across various datasets, including CIFAR-100, ImageNet-100 and ImageNet-Full. Additionally, the C-RLS' equivalent property allows the DS-AL to execute CIL in a phase-invariant manner. This is evidenced by a never-before-seen 500-phase CIL ImageNet task, which performs on a level identical to a 5-phase one. Our codes are available at https://github.com/ZHUANGHP/Analytic-continual-learning.


Summary

  • The paper introduces DS-AL, a novel method that redefines exemplar-free class-incremental learning as a Concatenated Recursive Least Squares (C-RLS) problem.
  • It achieves phase-invariant performance by updating main and compensation stream weights recursively without storing past exemplars.
  • The approach delivers performance comparable to or better than replay-based methods on benchmarks such as CIFAR-100 and ImageNet, while effectively addressing under-fitting in analytic learning models.

The paper "DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning" (DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning, 26 Mar 2024) introduces a novel approach called Dual-Stream Analytic Learning (DS-AL) to address the significant challenge of catastrophic forgetting in exemplar-free class-incremental learning (EFCIL). EFCIL methods aim to learn new classes incrementally without storing or revisiting samples from previously learned classes, which is crucial for data privacy and resource-constrained environments. However, these methods often suffer from severe performance degradation compared to replay-based techniques that do store past samples. Existing analytic learning (AL) based CIL methods, while promising for EFCIL, can suffer from under-fitting due to their reliance on a single linear projection.

DS-AL tackles these issues through a dual-stream architecture:

  1. Main Stream: This stream provides an analytical, closed-form linear solution to the CIL problem. It redefines CIL as a Concatenated Recursive Least Squares (C-RLS) task.
    • Implementation:

      • Initially, a backbone network (e.g., ResNet) is trained using standard backpropagation (BP) on a base dataset containing the initial set of classes.
      • After BP training, the backbone weights $\mathbf{W}_{\text{CNN}}$ are frozen. The original fully-connected classifier is replaced with a 2-layer AL-based network for subsequent incremental learning phases.
      • This AL-based network consists of a buffer layer (e.g., a random projection $\mathbf{X}_{\text{B}}$ mapping features to a higher dimension) followed by a linear classifier.
      • For the base phase ($k=0$), features $\mathbf{X}_{0}^{\text{cnn}}$ extracted from the frozen backbone are passed through the buffer layer and an activation function $\sigma_{\text{M}}$ (ReLU is used) to obtain $\mathbf{X}_{\text{M},0}$. The initial linear classifier weights $\hat{\mathbf{W}}_{\text{M}}^{(0)}$ are computed using a regularized least-squares solution:

        $$\hat{\mathbf{W}}_{\text{M}}^{(0)} = \big(\mathbf{X}_{\text{M},0}^{T}\mathbf{X}_{\text{M},0}+\gamma \mathbf{I}\big)^{-1}\mathbf{X}_{\text{M},0}^{T}\mathbf{Y}_{0}^{\text{train}}$$

      • For subsequent incremental phases ($k > 0$), new data $\{\mathbf{X}_{k}^{\text{train}}, \mathbf{Y}_{k}^{\text{train}}\}$ arrives. The C-RLS mechanism updates the weights $\hat{\mathbf{W}}_{\text{M}}^{(k)}$ and an inverted auto-correlation matrix (iACM) $\mathbf{R}_{\text{M},k}$ recursively, without needing the past data samples $\mathbf{X}_{0:k-1}$. Only $\hat{\mathbf{W}}_{\text{M}}^{(k-1)}$ and $\mathbf{R}_{\text{M},k-1}$ are carried forward. The update rules are (see the NumPy sketch after this list):

        $$\hat{\mathbf{W}}_{\text{M}}^{(k)} = \hat{\mathbf{W}}_{\text{M}}^{(k-1)\prime} + \mathbf{R}_{\text{M},k}\mathbf{X}_{\text{M},k}^{T}\big(\mathbf{Y}_{k}^{\text{train}} - \mathbf{X}_{\text{M},k}\hat{\mathbf{W}}_{\text{M}}^{(k-1)\prime}\big)$$

        $$\mathbf{R}_{\text{M},k} = \mathbf{R}_{\text{M},k-1} - \mathbf{R}_{\text{M},k-1}\mathbf{X}_{\text{M},k}^{T}\big(\mathbf{I} + \mathbf{X}_{\text{M},k}\mathbf{R}_{\text{M},k-1}\mathbf{X}_{\text{M},k}^{T}\big)^{-1}\mathbf{X}_{\text{M},k}\mathbf{R}_{\text{M},k-1}$$

        where $\hat{\mathbf{W}}_{\text{M}}^{(k-1)\prime} = [\hat{\mathbf{W}}_{\text{M}}^{(k-1)} \;\; \mathbf{0}]$ zero-pads the previous weight matrix to accommodate the new classes.

    • Practical Implication: This C-RLS formulation makes the CIL process equivalent to joint training (training on all data from phases $0$ to $k$ simultaneously) when the backbone is frozen. This leads to "phase-invariant" behavior, where performance does not degrade significantly as the number of incremental phases grows.
  2. Compensation Stream: This stream is designed to improve the fitting capability of the main stream, which might under-fit complex data due to its linear nature. It uses a Dual-Activation Compensation (DAC) module.
    • Implementation:

      • The compensation stream also uses the frozen backbone and the same buffer layer $B$ as the main stream, but employs a different activation function $\sigma_{\text{C}}$ (e.g., Tanh, Mish) for its input embeddings $\mathbf{X}_{\text{C},k}$.
      • The "labels" for this stream are the residuals from the main stream:

        $$\tilde{\mathbf{Y}}_{k} = [\mathbf{0} \;\; \mathbf{Y}_{k}^{\text{train}}] - \mathbf{X}_{\text{M},k}\hat{\mathbf{W}}_{\text{M}}^{(k)}$$

        This residual represents the part of the data that the main stream failed to fit, effectively targeting the null space of the main stream's linear mapping.

      • A "Previous Label Cleansing" (PLC) step is applied to Y~k\mathbf{\tilde{Y}}_{k} to ensure that only the residuals corresponding to the current phase's classes are used for training the compensation stream for those new classes. This prevents false supervision for past classes.

        $$\{\tilde{\mathbf{Y}}_{k}\}_{\text{PLC}} = [\mathbf{0} \;\; (\tilde{\mathbf{Y}}_{k})_{\text{new}}]$$

      • The compensation weights $\hat{\mathbf{W}}_{\text{C}}^{(k)}$ and the corresponding iACM $\mathbf{R}_{\text{C},k}$ are updated using the same C-RLS mechanism as the main stream, but with $\mathbf{X}_{\text{C},k}$ as input and $\{\tilde{\mathbf{Y}}_{k}\}_{\text{PLC}}$ as target (this step is also covered in the sketch after this list).

    • Practical Implication: By using a different activation and targeting the main stream's residuals, the compensation stream can capture information missed by the main stream, thus improving overall model expressiveness and reducing under-fitting.
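
To make the recursive procedures described in this list concrete, below is a minimal NumPy sketch of the base-phase ridge solution, the C-RLS update, and one full DS-AL phase with PLC. It assumes the features have already been extracted by the frozen backbone, projected through the buffer layer, and activated; the function and variable names (`ridge_init`, `crls_update`, `dsal_phase`) are illustrative and are not the authors' repository API.

```python
import numpy as np

def ridge_init(X, Y, gamma=1e-3):
    """Base-phase (k = 0) fit for one stream:
    W = (X^T X + gamma I)^{-1} X^T Y, plus the inverted
    auto-correlation matrix (iACM) R needed in later phases."""
    R = np.linalg.inv(X.T @ X + gamma * np.eye(X.shape[1]))
    return R @ X.T @ Y, R

def crls_update(W_prev, R_prev, X_k, Y_k):
    """One C-RLS step (phase k > 0) for a single stream. W_prev is already
    zero-padded to the full class width and Y_k spans all classes seen so
    far; no samples from phases 0..k-1 are required."""
    n = X_k.shape[0]
    K = np.linalg.inv(np.eye(n) + X_k @ R_prev @ X_k.T)   # (I + X R X^T)^{-1}
    R_k = R_prev - R_prev @ X_k.T @ K @ X_k @ R_prev      # iACM update
    W_k = W_prev + R_k @ X_k.T @ (Y_k - X_k @ W_prev)     # weight update
    return W_k, R_k

def dsal_phase(W_M, R_M, W_C, R_C, X_M, X_C, Y_new):
    """One DS-AL incremental phase. X_M / X_C are buffer-layer embeddings
    under the main / compensation activations; Y_new is one-hot over the
    new classes only."""
    n, c_new = Y_new.shape
    c_old = W_M.shape[1]
    # Zero-pad the targets and both weight matrices to the full class width.
    Y_full = np.hstack([np.zeros((n, c_old)), Y_new])
    W_M = np.hstack([W_M, np.zeros((W_M.shape[0], c_new))])
    W_C = np.hstack([W_C, np.zeros((W_C.shape[0], c_new))])
    # 1) Main stream: C-RLS update.
    W_M, R_M = crls_update(W_M, R_M, X_M, Y_full)
    # 2) Residual the main stream failed to fit, then Previous Label
    #    Cleansing (PLC): keep only the new-class columns.
    residual = Y_full - X_M @ W_M
    residual[:, :c_old] = 0.0
    # 3) Compensation stream: C-RLS update on the cleansed residual.
    W_C, R_C = crls_update(W_C, R_C, X_C, residual)
    return W_M, R_M, W_C, R_C
```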

Overall DS-AL Process (Figure 1 in the paper):

  • (a) BP-based Training: Train CNN backbone on the base dataset.
  • (b) AL-based Re-training (Phase 0): Freeze the backbone. Initialize the main stream ($\hat{\mathbf{W}}_{\text{M}}^{(0)}$, $\mathbf{R}_{\text{M},0}$) and the compensation stream ($\hat{\mathbf{W}}_{\text{C}}^{(0)}$, $\mathbf{R}_{\text{C},0}$) using the base data. PLC is not applied to the compensation stream in this initial phase.
  • (c)-(d) AL-based CIL (Phase $k > 0$): For the new data, first update the main stream to obtain $\hat{\mathbf{W}}_{\text{M}}^{(k)}$ and $\mathbf{R}_{\text{M},k}$; then compute the residuals, apply PLC, and update the compensation stream to obtain $\hat{\mathbf{W}}_{\text{C}}^{(k)}$ and $\mathbf{R}_{\text{C},k}$.
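
The phase structure above can be tied together with a hypothetical driver loop, reusing `ridge_init` and `dsal_phase` from the previous sketch. Here `feature_fn` stands in for the frozen backbone and `buffer_proj` for a fixed (e.g., random) buffer-layer projection; both, along with the activation choices, are assumptions for illustration rather than the paper's exact configuration.

```python
import numpy as np

def run_dsal(feature_fn, buffer_proj, phases, gamma=1e-3):
    """phases: list of (inputs, one-hot labels); phases[0] is the base data.
    Returns the final main- and compensation-stream weights."""
    relu = lambda z: np.maximum(z, 0.0)   # sigma_M
    tanh = np.tanh                        # sigma_C (Tanh reported effective)
    # (b) Phase 0: analytic initialization of both streams on the base data.
    X0, Y0 = phases[0]
    E0 = feature_fn(X0) @ buffer_proj
    X_M0, X_C0 = relu(E0), tanh(E0)
    W_M, R_M = ridge_init(X_M0, Y0, gamma)
    # Compensation stream fits the phase-0 residual (no PLC in this phase).
    W_C, R_C = ridge_init(X_C0, Y0 - X_M0 @ W_M, gamma)
    # (c)-(d) Phases k > 0: exemplar-free recursive updates.
    for X_k, Y_k_new in phases[1:]:
        E_k = feature_fn(X_k) @ buffer_proj
        W_M, R_M, W_C, R_C = dsal_phase(W_M, R_M, W_C, R_C,
                                        relu(E_k), tanh(E_k), Y_k_new)
    return W_M, W_C
```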

Inference:

The final prediction is a weighted sum of the outputs from both streams:

$$\hat{\mathbf{Y}}_{k}^{\text{(all)}} = \mathbf{X}_{\text{M},k}\hat{\mathbf{W}}_{\text{M}}^{(k)} + \mathcal{C}\,\mathbf{X}_{\text{C},k}\hat{\mathbf{W}}_{\text{C}}^{(k)}$$

where $\mathcal{C}$ is a hyperparameter called the compensation ratio, controlling the contribution of the compensation stream.
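
As a small illustration of this inference rule, the sketch below combines the two streams' logits, assuming `E` is the buffer-layer output before activation, with ReLU for the main stream and Tanh for the compensation stream; the default compensation-ratio value is a placeholder, not the paper's tuned setting.

```python
import numpy as np

def dsal_predict(E, W_M, W_C, comp_ratio=0.6):
    """Weighted sum of the two streams' logits, then arg-max over classes.
    comp_ratio plays the role of the compensation ratio C and must be tuned;
    0.6 is only an illustrative default."""
    logits = np.maximum(E, 0.0) @ W_M + comp_ratio * np.tanh(E) @ W_C
    return logits.argmax(axis=1)
```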

Key Contributions and Findings:

  • Novel EFCIL Method: DS-AL offers an analytical solution that is exemplar-free.
  • Equivalence to Joint Learning: The C-RLS in the main stream ensures that incremental training (with a frozen backbone) yields results identical to joint training on all seen data, thus mitigating catastrophic forgetting.
  • Overcoming Under-fitting: The DAC module in the compensation stream significantly improves the model's fitting power, addressing a key limitation of prior AL-based CIL methods.
  • State-of-the-Art Performance:
    • DS-AL achieves performance comparable to or better than existing replay-based methods, especially in scenarios with many incremental phases ($K \ge 25$).
    • It consistently outperforms other EFCIL methods across datasets like CIFAR-100, ImageNet-100, and ImageNet-Full.
  • Phase Invariance: The method demonstrates remarkable phase-invariant performance, achieving nearly identical results for a 5-phase CIL task and a 500-phase CIL task on ImageNet-Full. This is a significant advantage for real-world scenarios with continuous data arrival.
  • Hyperparameter Insights:
    • The choice of activation function $\sigma_{\text{C}}$ for the compensation stream is important; Tanh was found to be effective.
    • The compensation ratio $\mathcal{C}$ needs tuning; optimal values tend to be higher for more complex datasets (e.g., ImageNet-Full), where under-fitting is more pronounced. It balances enhanced plasticity for new tasks against stability for old tasks.
  • Ablation Studies: Confirmed the positive contributions of both the DAC module and the PLC step. DAC improves performance over a single-stream C-RLS, and PLC further refines this by preventing incorrect supervision signals.

Implementation Considerations:

  • Computational Cost: The main computational overhead during incremental phases comes from the matrix multiplications and inversions of the RLS updates. The iACMs ($\mathbf{R}_{\text{M},k}$, $\mathbf{R}_{\text{C},k}$) are of size $d_B \times d_B$, where $d_B$ is the dimension of the buffer-layer output; this can be significant if $d_B$ is very large (a rough memory estimate follows this list). However, the approach avoids retraining the entire network or running iterative gradient-based optimization.
  • Memory: Requires storing the weight matrices ($\hat{\mathbf{W}}_{\text{M}}^{(k)}$, $\hat{\mathbf{W}}_{\text{C}}^{(k)}$) and the iACMs ($\mathbf{R}_{\text{M},k}$, $\mathbf{R}_{\text{C},k}$), which is significantly less than storing past exemplars.
  • Backbone Choice: Performance depends on the quality of the features extracted by the frozen backbone, which is trained only on the initial base classes.
  • Hyperparameter Tuning: $\gamma$ (regularization), $\sigma_{\text{C}}$ (compensation activation), and $\mathcal{C}$ (compensation ratio) are the key hyperparameters requiring tuning.
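
For a rough sense of the iACM footprint mentioned above, the snippet below estimates the memory of one $d_B \times d_B$ matrix; the buffer width used here is an assumed value for illustration, not a setting reported in the paper.

```python
d_B = 8192                          # assumed buffer-layer width (illustrative)
bytes_per_iacm = d_B * d_B * 4      # one d_B x d_B float32 matrix
print(f"~{bytes_per_iacm / 2**20:.0f} MiB per stream's iACM")  # ~256 MiB
```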

The paper provides code at https://github.com/ZHUANGHP/Analytic-continual-learning, facilitating practical application and reproduction of the results. DS-AL presents a robust and effective solution for EFCIL, with strong theoretical grounding and empirical validation.
