Erase to Retain: Constrained Erasure Methods
- Erase to Retain is a design principle that uses targeted removal to protect valuable system properties across domains like machine unlearning, medical imaging, and storage.
- It employs retention-aware optimization to achieve strong forgetting of undesired data while maintaining model performance, image quality, or hardware reliability.
- This approach underpins methods that balance erasure strength with retained utility, addressing challenges in privacy, robustness, and data integrity.
“Erase to Retain” denotes a recurrent design principle in which selective removal is used to preserve a more valuable property of a system. In machine unlearning, the removed object is typically a forget set whose influence should disappear while retain-set utility remains high; in medical imaging, the removal target may be a mixed-anatomy bias, a lesion-specific representation, or a patient-specific contribution; in diffusion and multimodal systems, the target may be a harmful concept or redundant visual tokens; in databases and storage systems, the target may be private data that must be deleted without leaving inferable or recoverable remnants. Across these settings, the common pattern is constrained erasure: the system is modified so that forgetting is strong where required, but drift on retained content, retained functionality, or retained reliability is explicitly limited (Xue et al., 2024, Lee et al., 5 Feb 2026, Chakraborty et al., 1 Jul 2025, Cho et al., 2024).
1. Formal problem statements and recurring abstractions
In the standard machine-unlearning formulation, a dataset is split into a forget set and a retain set. “Erase at the Core” defines a training set $D$, a forget set $D_f \subset D$, and a retain set $D_r = D \setminus D_f$; given an original model trained on all of $D$, the goal is an unlearned model whose behavior approximates a retrained model trained only on $D_r$ (Lee et al., 5 Feb 2026). “Erase to Enhance” instantiates the same pattern for MRI reconstruction with $D_r$ as brain MRI, $D_f$ as knee MRI, $f_o$ as the model trained on $D$, $f_r$ as the oracle trained only on $D_r$, and $f_u$ as the unlearned model (Xue et al., 2024).
Within this template, the “retain” side is not an auxiliary concern but part of the objective. “Decoupled Distillation to Erase” decomposes an unlearning KL objective into a forgetting term and a retention term, arguing that prior class-centric methods often optimize the former while under-supervising the latter (Zhou et al., 31 Mar 2025). “PURGE” formulates retain preservation at the update level by projecting a forget gradient away from directions that would increase retain loss, adapting A-GEM-style gradient projection from continual learning to machine unlearning (Jawandhia et al., 2 Jun 2026).
Outside neural unlearning, the same logic appears in stricter semantic forms. “Meaningful Data Erasure in the Presence of Dependencies” defines Pre-insertion Post Erasure Equivalence (P2E2), under which inferences possible after erasure must be no stronger than those possible before the erased cell was inserted (Chakraborty et al., 1 Jul 2025). In recommender systems, “ERASE -- A Real-World Aligned Benchmark for Unlearning in Recommender Systems” distinguishes exact unlearning, where the unlearned model distribution matches retraining on retained data, from approximate unlearning via a two-sided $(\epsilon, \delta)$ condition (Lubitzsch et al., 9 Mar 2026). These formulations differ in mathematical detail, but they share the same structure: erasure is evaluated relative to a retained reference, not in isolation.
2. Retention-aware optimization in machine unlearning
Retention-aware unlearning methods make the preservation objective explicit in the loss, the update rule, or both. “Decoupled Distillation to Erase” studies class-centric unlearning without access to remaining data and without intervention during pretraining. Its central decomposition writes the KL objective as a forgetting loss over the forgotten class and a retention loss over the remaining classes. DELETE then refines the retention term using dark knowledge and a mask distillation mechanism that zeroes the forgotten logit while preserving the relative probability structure among non-forgotten classes. The resulting target is
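$$\tilde{p}_j = \frac{\mathbb{1}[j \neq c]\,\exp(z_j)}{\sum_{i \neq c} \exp(z_i)},$$
where, in notation assumed here, $z$ denotes the pretrained model's logits and $c$ the forgotten class. The paper presents this target as a way to erase the target class while retaining the pretrained model’s “worldview” over the remaining classes (Zhou et al., 31 Mar 2025).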
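The masking idea described above can be illustrated with a minimal NumPy sketch. It is not the DELETE implementation: the function name `masked_target`, the absence of a temperature, and the example logits are all assumptions made for illustration. It only shows that excluding the forgotten class before the softmax gives that class zero probability while leaving the probability ratios among the remaining classes unchanged.

```python
import numpy as np

def masked_target(teacher_logits, forget_class):
    """Softmax over teacher logits with the forgotten class excluded.

    Hypothetical helper: the forgotten class gets probability 0 and the
    remaining classes are renormalised among themselves.
    """
    z = np.asarray(teacher_logits, dtype=float).copy()
    z[..., forget_class] = -np.inf            # exclude the forgotten class
    z = z - z.max(axis=-1, keepdims=True)     # shift for numerical stability
    e = np.exp(z)                             # exp(-inf) evaluates to 0.0
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative logits for a 4-class problem; class 3 is to be forgotten.
logits = np.array([2.0, 1.0, 0.0, 3.0])
target = masked_target(logits, forget_class=3)
```

A student trained to match `target` on forget-class inputs is pushed away from the forgotten class but is still told how the pretrained model ranks the other classes, which is the retention signal the decomposition emphasises.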
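“PURGE: Projected Unlearning via Retain-Guided Erasure” enforces retention through projected optimization. If the forget-direction gradient $g_f$ conflicts with the retain gradient $g_r$, that is, if $g_f^\top g_r < 0$, PURGE replaces it with
$$\tilde{g}_f = g_f - \frac{g_f^\top g_r}{\lVert g_r \rVert^2}\, g_r$$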
so that the update is retain-safe to first order (Jawandhia et al., 2 Jun 2026). PURGE further replaces a uniform forget target with a retain-confusion target, arguing that the natural confusion pattern of retained data is harder for membership inference attacks to distinguish from retraining than a uniform target. Its reported results span five datasets and 22 class-level forgetting tasks, with retain accuracy above 96% on all five datasets and MIA AUROC near 0.5 on four out of five datasets (Jawandhia et al., 2 Jun 2026).
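The A-GEM-style projection that PURGE adapts can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: `g_forget` is taken to be the gradient of the forgetting objective being minimised, `g_retain` the gradient of the retain loss, both flattened to vectors, and the update is assumed to be a plain gradient step. The function name and the `eps` guard are hypothetical.

```python
import numpy as np

def project_forget_gradient(g_forget, g_retain, eps=1e-12):
    """Remove the component of g_forget that conflicts with g_retain.

    A conflict is a negative inner product: a descent step along g_forget
    would then increase the retain loss to first order.
    """
    dot = float(np.dot(g_forget, g_retain))
    if dot >= 0.0:
        return g_forget.copy()                # no conflict, leave unchanged
    scale = dot / (float(np.dot(g_retain, g_retain)) + eps)
    return g_forget - scale * g_retain        # orthogonal to g_retain

# Toy gradients chosen so that the two objectives conflict.
g_f = np.array([1.0, -2.0])
g_r = np.array([0.0, 1.0])
g_proj = project_forget_gradient(g_f, g_r)
```

After projection the inner product with the retain gradient is zero up to the `eps` guard, so a small step along `-g_proj` leaves the retain loss unchanged to first order while still making progress on the forgetting objective.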
A related line moves the preservation constraint from weights to activations. “BARRIER: Bounded Activation Regions for Robust Information Erasure” uses truncated SVD on forget-set activations, interval arithmetic on the projected space, and a protection loss that bounds non-target drift (Miksa et al., 15 May 2026). Its theorem gives a probabilistic tail bound on that drift, making preservation a formal optimization target rather than an empirical afterthought. In this view, “erase to retain” is not merely a balance of scalar metrics; it is a constrained geometry in which forgetting is allowed only inside a delimited region of representation space (Miksa et al., 15 May 2026).
3. Medical imaging: hallucination removal, subject-level forgetting, and low-rank selective unlearning
Medical imaging provides especially direct instances of the principle because the unwanted signal is often entangled with clinically useful structure. In MRI reconstruction, “Erase to Enhance” reports that training on mixed-organ data can create hallucinations and reduced image quality in the reconstructed data. Using FastMRI multi-coil data, the paper trains an E2E-VarNet with 12 cascades, a sensitivity map estimation module, and about 29.9M parameters on brain and knee data under equispaced Cartesian undersampling with acceleration rate 8x and center fraction 0.04. The oracle trained on brain only performs better on brain and worse on knee, while the mixed model performs better on knee but slightly worse on brain; the roughly 5 dB gap on knee data is used as evidence of substantial distribution dissimilarity between brain and knee anatomy (Xue et al., 2024).
The unlearning study in that paper treats hallucinations near structures such as the corpus callosum as a proxy exemplar of undesired data influence. Fine-tuning alone and Full FT slightly improve brain performance but do not forget knee data; GA-3 and noisy labeling reduce knee performance strongly but severely damage brain performance; combined unlearning plus fine-tuning, especially NL-FT, best balances forgetting and retention (Xue et al., 2024). A notable finding is data-efficient unlearning: retain subsets of 1%, 5%, 10%, 20%, 50%, and 100% were tested, and using all retain data does not always improve results. The paper therefore argues that high-performance unlearning can be achieved with only partial retain access (Xue et al., 2024).
A second medical line shifts from reconstruction to dense prediction. “Erase to Retain: Low Rank Adaptation Guided Selective Unlearning in Medical Segmentation Networks” uses a teacher-student distillation paradigm with LoRA-constrained updates in a U-Net-style segmentation network. For a weight matrix $W$, the student uses
$$W' = W + BA$$
with only $A$ and $B$ trainable, rank $r$, and about 2.93% trainable parameters (Datta et al., 20 Nov 2025). The retain objective combines supervised loss, knowledge distillation, and a guard loss, while the forget objective uses all-background supervision for segmentation or class-randomizing objectives for classification. The method is explicitly two-phase: strong unlearning on $D_f$, then gentle restoration on $D_r$ (Datta et al., 20 Nov 2025). On ISIC segmentation, the student reduces forget-set IoU from 0.8752 to 0.5091 while maintaining retain and validation IoU at 0.6478 and 0.6771; on ISIC classification, retain accuracy improves from 0.8393 to 0.9058 while forget accuracy decreases from 0.8700 to 0.6413 (Datta et al., 20 Nov 2025).
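To make the low-rank constraint concrete, the following NumPy sketch shows a generic LoRA-style layer: a frozen weight plus a trainable rank-limited correction. The layer sizes, the rank, and the zero initialisation of one factor are illustrative assumptions rather than values from the paper, and any scaling factor on the update is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4                    # hypothetical sizes and rank

W = rng.standard_normal((d_out, d_in))        # frozen teacher weight
A = rng.standard_normal((r, d_in)) * 0.01     # trainable factor
B = np.zeros((d_out, r))                      # trainable factor, zero at start

def lora_forward(x, W, A, B):
    """Apply (W + B @ A) to x without materialising the full update."""
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((8, d_in))
y = lora_forward(x, W, A, B)                  # equals x @ W.T while B is zero

# Fraction of parameters that unlearning is allowed to change in this layer.
trainable_fraction = (A.size + B.size) / (W.size + A.size + B.size)
```

Because only `A` and `B` receive gradients, the unlearning update is confined to a rank-`r` subspace of each adapted layer, which is one way to keep the student close to the teacher on retained data.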
Subject-level forgetting in 3D segmentation sharpens the privacy interpretation further. “To forget is to preserve: Machine Unlearning for 3D medical image segmentation” studies MRBrainS18 with 10 subjects, 2 forget subjects, and 8 retain subjects using a Med3D-pretrained 3D ResNet-50. The paper evaluates several approximate unlearning strategies over 20 and 50 epochs using Dice similarity coefficient and MAE, and reports that the Noisy Label strategy had the best overall trade-off with a decrease of 93% in the forget set while maintaining 84% accuracy for the retained set after 50 epochs (Singh et al., 15 Jun 2026). The contrast across methods is instructive: Retain Label / Max Loss can erase the forget set but catastrophically degrades retain performance at longer horizons; feature-space methods such as Learn Others and Learn Noise collapse the model; Fix Decoder becomes unstable and can reverse the intended forget-retain separation (Singh et al., 15 Jun 2026). In this setting, “erase to retain” is literally framed as a mechanism for GDPR-style subject deletion without full retraining.
4. Beyond logit suppression: representation erasure, shallow alignment, and faithful forgetting
A major controversy in recent unlearning work is whether output-level forgetting is sufficient. “Erase at the Core” argues that many approximate methods exhibit superficial forgetting: near-zero forget accuracy at the logit level while intermediate representations remain highly similar to those of the original model (Lee et al., 5 Feb 2026). To address this, the paper attaches auxiliary modules to multiple layers and combines contrastive unlearning on the forget set with retain-set cross-entropy at each supervision point. For each supervised layer $\ell$, it defines normalized embeddings, a layer-wise contrastive unlearning loss, a layer-wise retain CE loss, and a weighted total objective over depths (Lee et al., 5 Feb 2026). The paper evaluates CKA, IDI, and downstream k-NN transfer, and reports that EC achieves near-zero forget accuracy while obtaining the lowest CKA and the smallest $|\mathrm{IDI}|$ among utility-preserving methods (Lee et al., 5 Feb 2026).
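Representational similarity of the kind measured by CKA can be computed directly from two feature matrices. The sketch below implements the standard linear CKA formula on synthetic features; the variable names and the random data are illustrative assumptions, and the paper's exact evaluation protocol (layers, samples, kernel choice) is not reproduced here.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    X = X - X.mean(axis=0, keepdims=True)     # centre each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
feats_original = rng.standard_normal((200, 32))

# A rotated copy carries the same information in a different basis.
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
feats_rotated = feats_original @ Q

# Independent features stand in for a representation that shares nothing.
feats_unrelated = rng.standard_normal((200, 32))

cka_same = linear_cka(feats_original, feats_rotated)
cka_diff = linear_cka(feats_original, feats_unrelated)
```

The rotated copy scores a CKA of 1 even though its individual coordinates differ, which is why a model can reach near-zero forget accuracy at the logits while its intermediate features remain essentially those of the original model.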
A related criticism appears in large-language-model unlearning. “Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning” argues that widely used methods often create shallow unlearning alignment rather than faithful erasure. The paper defines a per-neuron attribution score and shows that after unlearning, positive influence is often not sufficiently reduced while negative influence increases, indicating spurious unlearning neurons that hide target knowledge instead of removing it (Yang et al., 26 Sep 2025). SSiUU adds attribution-guided regularization to the base unlearning loss, suppressing the growth of negative influence rather than eliminating all negative attribution. On FaithUn, the method reports FS = 0.0 with high RS and US, and substantially better robustness than baselines under both harmful and benign retraining scenarios; in the harmful attack analysis, the correlation between pre-attack and post-attack attribution distributions is highest for SSiUU (Yang et al., 26 Sep 2025).
These papers correct a common misconception. Low forget accuracy, low immediate vulnerability, or visually plausible behavior after editing do not by themselves imply that the forgotten information is absent. “Erase at the Core” frames the residual problem as representational similarity; SSiUU frames it as shallow alignment supported by newly created suppressive neurons; BARRIER frames it as uncontrolled functional drift outside the target region (Lee et al., 5 Feb 2026, Yang et al., 26 Sep 2025, Miksa et al., 15 May 2026). The shared implication is that faithful “erase to retain” behavior requires intervention at the level where the unwanted information is actually encoded.
5. Generative, multimodal, and recurrent-memory interpretations
In generative models, the principle appears as a robustness-retention trade-off. “AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models” argues that prior concept-erasure methods usually choose a target that is too narrow: erasing one prompt instance leaves the broader concept class partially intact, while stronger edits damage unrelated content (Li et al., 6 Feb 2026). AEGIS introduces an Adversarial Erasure Target in prompt-embedding space and a Gradient Regularization Projection mechanism that removes the component of the retention gradient that actively opposes erasure when the two gradients conflict, that is, when their inner product is negative. The method is retention-data-free because it regularizes parameters toward the original checkpoint rather than requiring an auxiliary retain dataset (Li et al., 6 Feb 2026). Across nudity, Van Gogh style, and object concepts such as Church, with robustness measured by ASR under P4D, UnlearnDiffAtk, and Ring-A-Bell and retention measured by FID and CLIP score, the paper reports that AEGIS generally achieves the best or near-best trade-off (Li et al., 6 Feb 2026).
Single-stream diffusion transformers pose a more structural version of the same problem. “Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers” argues that prior erasure methods collapse because text and image tokens share the same self-attention pathway. With the unified sequence
$$X = [X_{\mathrm{text}};\, X_{\mathrm{img}}]$$
and shared projections $W_Q$, $W_K$, and $W_V$, directly editing the concept path damages the image-synthesis path (Jiang et al., 26 Mar 2026). Z-Erase therefore introduces a Stream Disentangled Concept Erasure Framework using a selection operator that picks out the text-token positions, so that low-rank updates are injected only into textual hidden states, followed by Lagrangian-Guided Adaptive Erasure Modulation that balances erasure and preservation through a constrained update direction (Jiang et al., 26 Mar 2026). The paper proves convergence to a Pareto stationary point and reports state-of-the-art performance across NSFW erasure, celebrity erasure, miscellaneous concept erasure, and adversarial prompt robustness on Z-Image Turbo and HunyuanImage-3.0 (Jiang et al., 26 Mar 2026).
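The effect of restricting a low-rank update to text positions in a shared stream can be shown with a small NumPy sketch. Everything specific here is an assumption for illustration: the sequence layout (text tokens first), the choice to apply the update at the query projection, the matrix sizes, and the names `is_text`, `A`, and `B`. It is not the Z-Erase implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_text, n_img, d, r = 3, 5, 16, 2             # hypothetical sizes

# Unified sequence: text tokens followed by image tokens, one shared stream.
X = rng.standard_normal((n_text + n_img, d))
W_q = rng.standard_normal((d, d))             # shared, frozen projection
A = rng.standard_normal((d, r)) * 0.1         # low-rank factors of the edit
B = rng.standard_normal((r, d)) * 0.1

# Selection mask: 1 for text positions, 0 for image positions.
is_text = np.zeros((n_text + n_img, 1))
is_text[:n_text] = 1.0

Q_base = X @ W_q                              # projection without any edit
Q_edit = Q_base + is_text * (X @ A @ B)       # edit reaches text rows only
```

Editing `W_q` itself would change every row of the projection, image rows included; gating the update by position is what keeps the image-synthesis path at its original values in this sketch.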
The idea also extends beyond safety unlearning to efficient multimodal inference. “ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning” removes redundant tokens so that salient tokens are retained. Stage 1 computes a local entropy score for each image patch and keeps the top-$k$ highest-entropy patches, with $k$ set by a complexity-aware retention ratio; Stage 2 prunes the remaining tokens using text-to-vision attention at an adaptively selected decoder layer (Lee et al., 11 May 2026). The paper reports that a 2K image produces about 3K tokens and a 4K image about 16K tokens, and that on Qwen2.5-VL-7B at an 85% token pruning ratio ERASE retains 89.46% of the original model accuracy, whereas the best prior method retains 78.19% (Lee et al., 11 May 2026). Here, “erase to retain” is computational rather than privacy-driven: redundancy is removed in order to preserve accuracy under severe token reduction.
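A first-stage entropy filter of this kind can be sketched as follows. The sketch assumes Shannon entropy over an intensity histogram as the patch score, a fixed `keep_ratio` in place of the paper's complexity-aware ratio, and grayscale patches with values in [0, 1]; the function names and the bin count are hypothetical.

```python
import numpy as np

def patch_entropy(patch, bins=16):
    """Shannon entropy (bits) of the intensity histogram of one patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                              # skip empty bins
    return float(-(p * np.log2(p)).sum())

def keep_top_entropy(patches, keep_ratio):
    """Return sorted indices of the highest-entropy patches and all scores."""
    scores = np.array([patch_entropy(p) for p in patches])
    k = max(1, int(round(keep_ratio * len(patches))))
    keep = np.argsort(-scores, kind="stable")[:k]
    return np.sort(keep), scores

rng = np.random.default_rng(0)
flat = [np.full((14, 14), 0.5) for _ in range(3)]       # uniform, redundant
textured = [rng.random((14, 14)) for _ in range(2)]     # high local detail
patches = flat + textured

kept, scores = keep_top_entropy(patches, keep_ratio=0.4)
```

Uniform patches score zero entropy and are dropped first, so the token budget is spent on regions with visual detail before any text-conditioned pruning takes place.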
A more architectural interpretation appears in linear attention. “Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention” studies fixed-size recurrent memory, where the difficulty is not only what to forget but how to edit compressed memory without scrambling existing associations. Its central recurrence introduces a channel-wise erase gate and a channel-wise write gate. The paper argues that decoupling erase on the key/read side from write on the value side improves long-context retrieval and interference-heavy memory tasks, and reports the strongest overall results among Mamba-2, Gated DeltaNet, KDA, and Mamba-3 variants at 1.3B parameters trained on 100B FineWeb-Edu tokens (Hatamizadeh et al., 21 May 2026). Although this is not a machine-unlearning method, it preserves the same principle: selective erasure is a prerequisite for stable retention in compressed state.
6. Privacy-preserving data systems, retrieval-augmented generation, and recommender benchmarks
In retrieval-augmented generation, “Learning to Erase Private Knowledge from Multi-Documents for Retrieval-Augmented LLMs” defines privacy erasure as rewriting retrieved documents so that private knowledge is removed while sufficient public knowledge is retained for downstream QA (Wang et al., 14 Apr 2025). Eraser4RAG first constructs a global knowledge graph from retrieved documents, splits it into private and public sub-graphs, conditions a Flan-T5 rewriter on both, and then optimizes the rewriter with PPO using a reward that combines the public retention rate and the private retention rate (Wang et al., 14 Apr 2025). On PopQA, TriviaQA, Natural Questions, and HotpotQA, the paper reports lower private retention, stronger privacy connection ratios, and strong public retention relative to abstraction and GPT-4o baselines; it also reports inference time of 0.8024 s, lower than the GPT-based alternatives (Wang et al., 14 Apr 2025). The central design choice is global, not document-local, reasoning: the system tries to prevent de-anonymization across documents, not merely redact isolated spans.
The database setting makes that idea exact. “Meaningful Data Erasure in the Presence of Dependencies” treats erasure as a semantic property in the presence of Relational Dependency Rules, direct and transitive inference, and time-varying dependency structure (Chakraborty et al., 1 Jul 2025). Its Opt-P2E2 problem seeks the minimum-cost set of additional cells whose deletion guarantees P2E2 for a target cell, and the paper proves that Opt-P2E2 is NP-Hard (Chakraborty et al., 1 Jul 2025). It then gives an exact ILP, an exact dependence-hypergraph algorithm, and an approximate greedy alternative, together with batching over a regulatory grace period and proactive retention-time scheduling for derived data. In this literature, “retain” does not mean model accuracy; it means preserving all remaining data that does not increase post-erasure inference beyond the pre-insertion state (Chakraborty et al., 1 Jul 2025).
Recommender systems add deployment realism. “ERASE -- A Real-World Aligned Benchmark for Unlearning in Recommender Systems” spans collaborative filtering, session-based recommendation, and next-basket recommendation; two practical scenarios, namely sensitive item unlearning and removal of poisonous data; seven unlearning algorithms; nine public datasets; and nine state-of-the-art models (Lubitzsch et al., 9 Mar 2026). A central contribution is sequential unlearning, modeled as a stream of deletion requests rather than a single large forget set. The benchmark produces more than 600 GB of reusable artifacts and releases 1,069 model checkpoints (Lubitzsch et al., 9 Mar 2026). The empirical picture is mixed: approximate unlearning can match retraining in some settings, but repeated unlearning exposes weaknesses in general-purpose methods, especially for attention-based and recurrent models, while recommender-specific approaches such as SCIF behave more reliably (Lubitzsch et al., 9 Mar 2026). This benchmarked perspective shows that “erase to retain” is not only an algorithmic aspiration but an operational criterion involving utility, efficiency, robustness, and request structure.
7. Physical storage, secure deletion media, and hardware lifetime
In storage systems, “erase to retain” can be literal. “AERO: Adaptive Erase Operation for Improving Lifetime and Performance of Modern NAND Flash-Based SSDs” argues that conventional erase operations use a fixed latency set for worst-case conditions, causing unnecessary stress and delay (Cho et al., 2024). AERO predicts near-optimal erase latency from fail-bit counts during Incremental Step Pulse Erasure, adds a shallow first erase attempt of 1 ms to obtain the necessary fail-bit information, and then safely reduces erase latency further by exploiting a large ECC-capability margin (Cho et al., 2024). Using 160 real 48-layer 3D TLC NAND chips, the paper reports 43% lifetime improvement over the conventional erase scheme without changing NAND flash chips, and a 34% average reduction in 99.9999th-percentile read latency over eleven real-world workloads (Cho et al., 2024). The retained quantity is hardware utility itself: lifetime, responsiveness, and reliability.
Secure deletion in NAND flash devices reinterprets retention as security and verifiability. “IoT Security: On-Chip Secure Deletion Scheme using ECC Modulation in IoT Appliances” contrasts off-chip block-level secure deletion, which cannot perform real-time trim operations, with on-chip page-level deletion, which can but risks program disturbance (Ahn et al., 2023). Its solution performs partial programming only on the spare-area ECC region, generating an ECC_SD value so that the page becomes ECC-uncorrectable; read failure is then used as real-time deletion verification (Ahn et al., 2023). The paper’s claim is not that deletion becomes costless, but that page-level, on-chip, immediately verifiable deletion can be achieved with significantly reduced program disturbance.
A broader family of NAND-flash privacy-destruction schemes appears in “Schemes for Privacy Data Destruction in a NAND Flash Memory” (Ahn et al., 2019). The paper starts from the observation that file-system deletion leaves privacy data in invalid pages and unmapped blocks because NAND is page-programmable but block-erasable. It then proposes three mechanisms for invalid pages: a partial overwriting scheme that exploits higher reachable programmed states, an SLC programming scheme that reprograms the invalid page into an SLC-like state, and a deletion duty pulse scheme that forces the threshold-voltage state into an unusable region (Ahn et al., 2019). The paper emphasizes that page program and block erase differ by roughly three orders of magnitude in practice, and models erase degradation relative to program degradation (Ahn et al., 2019). In this hardware literature, “erase to retain” means that prompt destruction of residual data is necessary to retain privacy, while selective page-level operations are used to retain endurance and manageable latency.
Across these storage papers, the same conceptual inversion recurs. Erasure is not only destructive; it is preservative. Adaptive erase timing preserves SSD lifetime and tail latency, ECC-modulated deletion preserves neighboring-page integrity while making deleted data unreadable, and invalid-page sanitization preserves privacy obligations more directly than deferred block erase (Cho et al., 2024, Ahn et al., 2023, Ahn et al., 2019). The term therefore names a broader systems doctrine: selective, targeted erasure is justified not by deletion alone, but by the retained property it protects.