Analytic Continual Unlearning (ACU)

Updated 23 June 2026

Analytic Continual Unlearning (ACU) is a class of gradient-free, mathematically grounded procedures that enable precise, efficient removal of specific data from machine learning models without retraining.
ACU leverages closed-form updates through techniques like ridge regression, dual-teacher distillation, and unified KL-divergence minimization to ensure exact unlearning while preserving performance.
ACU methods guarantee data privacy and computational efficiency, applying to both shallow and deep architectures with provable fidelity to retraining baselines.

Analytic Continual Unlearning (ACU) is a class of gradient-free, mathematically grounded procedures for sequentially, efficiently, and exactly forgetting specific data or knowledge from machine learning models without violating data privacy or incurring performance degradation on retained information. ACU methods have emerged to address the limitations of prior "machine unlearning" approaches in continual learning (CL) regimes, particularly under constraints where historical data cannot be revisited after ingestion, and unlearning requests may be frequent, adversarial, or cumulative. Core works in this domain provide fully analytic, closed-form solutions to continual unlearning. These solutions are rooted in linear algebra, least-squares regression, and controlled knowledge distillation, and are applicable to neural, analytic, and modular architectures (Tang et al., 18 May 2025, Chatterjee et al., 2024, Huang et al., 21 May 2025, Gao et al., 2024).

1. Problem Formulation and Privacy Constraints

Continual Learning (CL) aims to incrementally incorporate knowledge from a stream of disjoint tasks by sequentially updating the model. In standard CL, once the training data of a task is ingested, it is discarded and cannot be revisited due to privacy, storage, or policy constraints. Continual Unlearning (CU) formalizes the challenge of allowing post-hoc deletion of specific data—at the granularity of individual samples, classes, or distributions—while maintaining fidelity on all remaining knowledge.

Key requirements that distinguish ACU from prior “single-shot” or retrain-based unlearning approaches:

No Access to Retained Data: At each step, only the current forget set is available; all other data are irrecoverably discarded.
Efficient, Cumulative Operation: The unlearning algorithm must remain computationally efficient regardless of the number or frequency of requests, and accuracy on retained data must not degrade under repeated unlearning.
Exactness: The resulting model should match the (infeasible) retrained-from-scratch baseline on the remaining data set, up to numerical precision.
Data Privacy: Internal data structures cannot leak or permit reconstruction of the original data; privacy against membership inference and data extraction is required (Tang et al., 18 May 2025).

2. Core Analytic Frameworks

2.1 Analytic Ridge Regression Unlearning

The “ACU” architecture introduced in (Tang et al., 18 May 2025) is grounded in ridge-regularized least-squares classifiers operating over frozen features. Training proceeds by extracting features $f_j$ (row vectors, so that $f_j^\top f_j$ is a matrix) and labels $y_j$, then solving

$W_0 = \left( \sum_{j \in D} f_j^\top f_j + \gamma I \right)^{-1} \sum_{j \in D} f_j^\top y_j$

where $W_0$ is the starting classifier after the CL phase.

When a batch $\mathcal{D}_i$ is to be forgotten, the analytic knowledge-tracker matrix $T_{i-1}$ and weights $W_{i-1}$ are updated via Woodbury identities:

$\begin{aligned} T_i &= T_{i-1} + T_{i-1} \hat{F}_i^\top \left( I - \hat{F}_i T_{i-1} \hat{F}_i^\top \right)^{-1} \hat{F}_i T_{i-1} \\ W_i &= \left[ I + T_i \left( \sum_{j \in \mathcal{D}_i} f_j^\top f_j \right) \right] W_{i-1} - T_i \left( \sum_{j \in \mathcal{D}_i} f_j^\top y_j \right) \end{aligned}$

with $\hat{F}_i$ the to-be-forgotten feature matrix. This update provably yields $W_i$ equivalent to retraining on the retained set, without ever revisiting or storing it.
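The exactness claim can be checked in a few lines. The derivation below is a sketch under one assumption that the text above does not state explicitly: the tracker is the regularized inverse Gram matrix of the currently retained features, $T_{i-1} = \left( \sum_{j \in R_{i-1}} f_j^\top f_j + \gamma I \right)^{-1}$, so that $W_{i-1} = T_{i-1} \sum_{j \in R_{i-1}} f_j^\top y_j$. The symbols $R_{i-1}$ (retained set before request $i$) and $\hat{Y}_i$ (stacked labels of the forget batch, so that $\hat{F}_i^\top \hat{Y}_i = \sum_{j \in \mathcal{D}_i} f_j^\top y_j$) are introduced here for illustration only.

$\begin{aligned} T_i^{-1} &= T_{i-1}^{-1} - \hat{F}_i^\top \hat{F}_i && \text{(remove the forget batch from the Gram matrix)} \\ T_i &= T_{i-1} + T_{i-1} \hat{F}_i^\top \left( I - \hat{F}_i T_{i-1} \hat{F}_i^\top \right)^{-1} \hat{F}_i T_{i-1} && \text{(Woodbury identity)} \\ W_i &= T_i \left( T_{i-1}^{-1} W_{i-1} - \hat{F}_i^\top \hat{Y}_i \right) && \text{(ridge solution on the retained set)} \\ &= T_i \left( T_i^{-1} + \hat{F}_i^\top \hat{F}_i \right) W_{i-1} - T_i \hat{F}_i^\top \hat{Y}_i \\ &= \left[ I + T_i \hat{F}_i^\top \hat{F}_i \right] W_{i-1} - T_i \hat{F}_i^\top \hat{Y}_i \end{aligned}$

The last line coincides with the weight update stated above, since $\hat{F}_i^\top \hat{F}_i = \sum_{j \in \mathcal{D}_i} f_j^\top f_j$. Only $T_{i-1}$, $W_{i-1}$, and the forget batch appear on the right-hand side.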

2.2 Controlled Distillation in Deep Networks

In (Chatterjee et al., 2024), analytic continual unlearning is formalized for deep models using a sequence of teacher–student distillation losses.

A dual-teacher mechanism is used: a CL-teacher representing retained knowledge, and a “bad” unlearning teacher (randomly initialized) representing the desired state for forgotten data.
The loss for unlearning requests mixes KL-divergence terms that both "pull" the model away from data being forgotten and "preserve" responses on retained data. A per-sample indicator marks whether each sample is in the forget set or in the buffer, and so selects which teacher the student is matched against.
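The exact loss from (Chatterjee et al., 2024) is not reproduced here. As a rough illustration of how an indicator can switch between two teachers, the following NumPy sketch assumes one common form: the student is matched by KL divergence to the CL-teacher on buffer samples and to the "bad" teacher on forget samples. The function names, the equal weighting of the two terms, and the direction of the KL divergence are all assumptions made for this example.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl(p, q, eps=1e-12):
    """Row-wise KL(p || q) for matrices whose rows are probability vectors."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)

def dual_teacher_loss(student_logits, cl_teacher_logits, bad_teacher_logits, is_forget):
    """Assumed form: (1 - l) * KL(CL teacher || student) + l * KL(bad teacher || student),
    averaged over the batch, where l = 1 for forget samples and 0 for buffer samples."""
    s = softmax(student_logits)
    keep_term = kl(softmax(cl_teacher_logits), s)
    forget_term = kl(softmax(bad_teacher_logits), s)
    l = is_forget.astype(float)
    return float(np.mean((1.0 - l) * keep_term + l * forget_term))

rng = np.random.default_rng(0)
cl_logits = rng.normal(size=(6, 5))    # CL-teacher outputs on a batch of 6 samples
bad_logits = rng.normal(size=(6, 5))   # randomly initialized "bad" teacher outputs
is_forget = np.array([True, True, False, False, False, False])

# A student that still copies the CL-teacher everywhere is penalized on forget rows.
loss_before = dual_teacher_loss(cl_logits, cl_logits, bad_logits, is_forget)
# A student that copies the bad teacher on forget rows and the CL-teacher elsewhere.
ideal_logits = np.where(is_forget[:, None], bad_logits, cl_logits)
loss_ideal = dual_teacher_loss(ideal_logits, cl_logits, bad_logits, is_forget)
```

Under this assumed form the loss is minimized when the student behaves like an untrained network on the forget samples while leaving its responses on buffer samples unchanged.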

The system leverages a fixed-size memory buffer with reservoir sampling for continual learning requests and explicit purging for unlearning, allowing bounded, analytically characterized trade-offs between utility and unlearning efficacy.
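A fixed-size buffer with reservoir sampling and explicit purging can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the buffer used in (Chatterjee et al., 2024): the class name `ReservoirBuffer`, the `purge` method, and the choice to purge by class label are hypothetical.

```python
import random

class ReservoirBuffer:
    """Fixed-capacity memory: reservoir sampling on insert, explicit purge on unlearning."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []              # stored (sample, label) pairs
        self.seen = 0                # number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, sample, label):
        """Standard reservoir sampling: fill the buffer first, then replace a
        stored item with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((sample, label))
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (sample, label)

    def purge(self, forget_labels):
        """Drop every stored item whose label is in the forget set.
        Simplification: `seen` is left unchanged, so sampling after a purge
        is no longer exactly uniform over the remaining stream."""
        forget_labels = set(forget_labels)
        self.items = [(x, y) for (x, y) in self.items if y not in forget_labels]

buf = ReservoirBuffer(capacity=20)
for i in range(1000):
    buf.add(i, i % 10)               # sample id i with class label i % 10
size_before = len(buf.items)
buf.purge({3, 7})                    # unlearning request for classes 3 and 7
labels_after = {y for _, y in buf.items}
```

The memory footprint stays bounded by `capacity` regardless of stream length, and a purge guarantees that no forgotten sample can be replayed later.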

2.3 Unified Optimization Theory for CL-Unlearning

(Huang et al., 21 May 2025) presents ACU as a unified Kullback-Leibler divergence minimization, with an explicit decomposition into learning, unlearning, replay (retention), and saliency modulation terms obtained from second-order expansions. The resulting update step applies Hessian and saliency-based adjustments for stable and plastic unlearning.

2.4 Modular Orthogonal Adapter Unlearning

For LLMs, (Gao et al., 2024) describes the OOO (Orthogonal LoRA + OOD detector) framework. Each unlearning request instantiates a rank-$r$ LoRA module per block, trained with an unlearning objective that includes an orthogonality regularizer. The regularizer enforces strict orthogonality between adapters for different unlearning requests, thus preserving prior removals. A glocal-aware out-of-distribution detector (contrastive entropy + Mahalanobis/cosine-layer scoring) determines activation of unlearning adapters during inference. No retained data are required; operations are data-disjoint and privacy-compliant.
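The precise regularizer of (Gao et al., 2024) is not reproduced in this section. One common way to encourage orthogonality between low-rank adapters is to penalize the overlap between the row spaces of their down-projection matrices; the sketch below illustrates that idea only. The function name `orthogonality_penalty`, the matrix shapes, and the squared Frobenius form are assumptions.

```python
import numpy as np

def orthogonality_penalty(A_new, A_prev_list):
    """Assumed regularizer: sum over earlier adapters k of ||A_new @ A_k.T||_F^2.
    Each A has shape (r, d): r is the LoRA rank, d the hidden dimension.
    The penalty is zero exactly when the row space of A_new is orthogonal
    to the row space of every earlier adapter."""
    return float(sum(np.sum((A_new @ A_k.T) ** 2) for A_k in A_prev_list))

rng = np.random.default_rng(0)
d, r = 32, 4
# Orthonormal columns from a QR factorization give two mutually orthogonal subspaces.
Q, _ = np.linalg.qr(rng.normal(size=(d, 2 * r)))
A_prev = Q[:, :r].T      # adapter from an earlier unlearning request
A_orth = Q[:, r:].T      # new adapter in an orthogonal subspace
penalty_orth = orthogonality_penalty(A_orth, [A_prev])   # numerically zero
penalty_same = orthogonality_penalty(A_prev, [A_prev])   # full overlap: equals r
```

In training, a term of this kind would be added to the unlearning loss of the new adapter while earlier adapters stay frozen, so that a new request cannot overwrite what earlier requests removed.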

3. Algorithmic Procedures and Theoretical Guarantees

3.1 Pseudocode Summary

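A typical ACU step (ridge-based, (Tang et al., 18 May 2025)) applies the two updates of Section 2.1 to the forget batch $\mathcal{D}_i$:

Input: tracker $T_{i-1}$, weights $W_{i-1}$, forget batch $\mathcal{D}_i$.
Step 1: Extract the feature matrix $\hat{F}_i$ of $\mathcal{D}_i$ with the frozen backbone.
Step 2: $T_i = T_{i-1} + T_{i-1} \hat{F}_i^\top \left( I - \hat{F}_i T_{i-1} \hat{F}_i^\top \right)^{-1} \hat{F}_i T_{i-1}$.
Step 3: $W_i = \left[ I + T_i \sum_{j \in \mathcal{D}_i} f_j^\top f_j \right] W_{i-1} - T_i \sum_{j \in \mathcal{D}_i} f_j^\top y_j$.
Output: $T_i$, $W_i$.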

Empirically, each unlearning request is handled in linear time with respect to the forget set size and feature dimension. The underlying matrices $T_i$ and $W_i$ are small (their size depends only on the feature dimension and the number of outputs, not on the amount of data seen), and do not leak information about past data.
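The ridge-based step can be written directly in NumPy and checked against retraining on the retained data. The following is a minimal sketch, not the authors' implementation: the function names `ridge_fit` and `acu_unlearn`, the synthetic data, and the assumption that the tracker is initialized as $T_0 = \left( \sum_j f_j^\top f_j + \gamma I \right)^{-1}$ are all illustrative.

```python
import numpy as np

def ridge_fit(F, Y, gamma=1.0):
    """Closed-form ridge classifier. Returns the tracker T = (F^T F + gamma I)^{-1}
    (assumed initialization) and the weights W = T F^T Y."""
    d = F.shape[1]
    T = np.linalg.inv(F.T @ F + gamma * np.eye(d))
    return T, T @ F.T @ Y

def acu_unlearn(T_prev, W_prev, F_hat, Y_hat):
    """One analytic unlearning step that touches only the forget batch."""
    n, d = F_hat.shape
    # Woodbury downdate: T_i = T + T F^T (I - F T F^T)^{-1} F T
    K = np.eye(n) - F_hat @ T_prev @ F_hat.T
    T_new = T_prev + T_prev @ F_hat.T @ np.linalg.solve(K, F_hat @ T_prev)
    # W_i = [I + T_i F^T F] W_{i-1} - T_i F^T Y
    W_new = (np.eye(d) + T_new @ F_hat.T @ F_hat) @ W_prev - T_new @ (F_hat.T @ Y_hat)
    return T_new, W_new

rng = np.random.default_rng(0)
F = rng.normal(size=(200, 16))                  # stand-in for frozen-backbone features
Y = np.eye(4)[rng.integers(0, 4, size=200)]     # one-hot labels for 4 classes

T0, W0 = ridge_fit(F, Y)                        # classifier after the CL phase
forget = np.arange(200) < 30                    # forget the first 30 samples
T1, W1 = acu_unlearn(T0, W0, F[forget], Y[forget])

# Reference: retrain from scratch on the retained samples only.
T_ref, W_ref = ridge_fit(F[~forget], Y[~forget])
max_gap = float(np.max(np.abs(W1 - W_ref)))     # agreement up to floating-point error
```

Note that `acu_unlearn` never sees the retained rows `F[~forget]`; they are used here only to build the reference solution for comparison.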

3.2 Theoretical Properties

Exactness: $W_i$ matches the ridge regression solution computed on the retained subset, identical to full retraining on the retained set (Tang et al., 18 May 2025).
No Historical Data Access: Only needs the current forget set plus the current ($T_{i-1}$, $W_{i-1}$).
Privacy: The knowledge-tracker $T_i$ cannot be inverted to reconstruct individual features or samples.
Interpretable Decomposition: Each update is explicitly split into “amplification” of remaining knowledge and erasure of forgotten knowledge.
Trade-off Characterization: Methods provide an analytic characterization of the trade-off between buffer size, utility retention, and unlearning sharpness (Chatterjee et al., 2024).

4. Empirical Evaluation and Results

4.1 Experimental Protocols

Representative studies (Tang et al., 18 May 2025, Chatterjee et al., 2024) use:

Datasets: CIFAR-10, CIFAR-100, ciFAIR-10.
Evaluation over 5 to 25 sequential unlearning events.
Metrics:
- Parameter gap: Δ_Params, the distance between the unlearned weights and the retrained-from-scratch weights.
- Accuracy gaps: Δ_Retain, Δ_Forget, Δ_Test.
- Membership inference attack robustness.
- Cumulative runtime for all unlearning requests.
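The gap metrics compare an unlearned model against the retrained-from-scratch reference. The sketch below shows one plausible way to compute them for a linear classifier; the use of the Frobenius norm for the parameter gap, absolute accuracy differences for the accuracy gaps, and the helper names `accuracy` and `gaps` are assumptions, since the exact definitions are not given in this section.

```python
import numpy as np

def accuracy(W, F, labels):
    """Top-1 accuracy of the linear classifier W on features F."""
    return float(np.mean(np.argmax(F @ W, axis=1) == labels))

def gaps(W_unlearned, W_retrained, splits):
    """Parameter gap (Frobenius norm, assumed) and per-split absolute accuracy gaps.
    `splits` maps a split name to a (features, integer labels) pair."""
    out = {"params": float(np.linalg.norm(W_unlearned - W_retrained))}
    for name, (F, labels) in splits.items():
        out[name] = abs(accuracy(W_unlearned, F, labels) - accuracy(W_retrained, F, labels))
    return out

rng = np.random.default_rng(0)
F = rng.normal(size=(50, 8))
labels = rng.integers(0, 3, size=50)
W = rng.normal(size=(8, 3))
splits = {"retain": (F[:40], labels[:40]), "forget": (F[40:], labels[40:])}

# An exact unlearning method reproduces the retrained weights, so every gap is zero.
g = gaps(W, W.copy(), splits)
```

A method is exact in the sense of Section 1 when all of these gaps vanish up to numerical precision.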

4.2 Key Quantitative Findings

Method	Δ_Params	Δ_Retain	Δ_Forget	Δ_Test	Δ_MIA
Finetune	35.4	6.1	4.2	4.2	0.04
L1-Sparsity	25.4	15.2	10.3	10.5	0.10
RandomLabel	26.8	3.3	2.5	3.9	0.03
ACU	0.00	0.00	0.00	0.00	0.00

ACU achieves zero gap on all metrics, matching the retrained-from-scratch baseline, while running 50–125× faster than state-of-the-art matrix-influence or distillation baselines, and over 10,000× faster than full retrain. On CIFAR-10 and CIFAR-100, accuracy on unlearned classes drops to exactly zero, with >97% retention on preserved classes with typical buffer sizes and minimal collateral utility loss (Tang et al., 18 May 2025, Chatterjee et al., 2024).

5. Privacy, Limitations, and Future Research

ACU methods guarantee privacy by design: no retained raw data or reconstructible statistics are ever stored beyond the analytic matrices or memory buffers, and membership inference attack rates match the retrained baseline. However, several limitations persist:

Frozen Backbones: In (Tang et al., 18 May 2025), only the analytic classifier weights are subject to unlearning; representation backbones remain unchanged. Unlearning at the feature learning level remains an open research direction.
Analytic Layer Scope: Current ACU approaches focus on single-layer or shallow analytic models; extending to kernelized or deep analytic networks is an open problem.
Buffer-Utility Trade-off: In methods employing replay buffers, larger buffers improve retention but can degrade unlearning sharpness; this trade-off is analytically described (Chatterjee et al., 2024).
Generalization to Large Models: Recent advances extend ACU principles to modular architectures (e.g., orthogonal LoRAs for continual LLM unlearning), but full, end-to-end ACU in high-capacity models presents significant challenges (Gao et al., 2024).

Ongoing work aims to support end-to-end unlearning, introduce certified privacy (e.g., via differential privacy), and leverage higher-order analytic approximations for more expressive unlearning (Tang et al., 18 May 2025, Chatterjee et al., 2024, Huang et al., 21 May 2025).

6. Broader Applicability and Methodological Variants

ACU’s analytic framework has been instantiated in several practical and theoretical forms:

Gradient-Free Ridge Solutions: Suited for settings where features are fixed and historical data must remain private.
Dual-Teacher Distillation: Enables interpretable, exact unlearning for deep architectures with minimal retained data, supporting a spectrum of tasks and operational scenarios (Chatterjee et al., 2024).
Unified KL-Minimization: Provides a principled optimization-theoretic base for simultaneously handling continual learning, unlearning, and retention in a unified descent loop (Huang et al., 21 May 2025).
Orthogonal Subspace Allocation in Modular Models: Applied in LLMs, this approach allows each sequence of continual unlearning operations to be cleanly separated, supporting non-interfering, data-private, and composable forgetting (Gao et al., 2024).

In all cases, analytic continual unlearning provides an interpretable and efficiently computable solution for privacy-preserving, sequential, and exact knowledge erasure, advancing the state of the art in dynamic, compliant, and modular machine learning systems.


References

Tang et al. (2025). ACU: Analytic Continual Unlearning for Efficient and Exact Forgetting with Privacy Preservation.

Chatterjee et al. (2024). A Unified Framework for Continual Learning and Unlearning.

Huang et al. (2025). A Unified Gradient-based Framework for Task-agnostic Continual Learning-Unlearning.

Gao et al. (2024). On Large Language Model Continual Unlearning.
