
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning (2509.13755v1)

Published 17 Sep 2025 in cs.SE, cs.AI, and cs.CR

Abstract: While Code LLMs (CLMs) have demonstrated superior performance in software engineering tasks such as code generation and summarization, recent empirical studies reveal a critical privacy vulnerability: these models exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted. To address this issue, several approaches, including training data de-duplication and differential privacy augmentation, have been proposed. However, these methods require full-model retraining for deployed CLMs, which incurs substantial computational costs. In this paper, we aim to answer the following research question: Can sensitive information memorized by CLMs be erased effectively and efficiently? We conduct a pioneering investigation into erasing sensitive memorization in CLMs through machine unlearning - a post-hoc modification method that removes specific information from trained models without requiring full retraining. Specifically, we first quantify the memorization risks of sensitive data within CLM training datasets and curate a high-risk dataset of 50,000 sensitive memorized samples as unlearning targets. We study two widely used gradient ascent-based unlearning approaches: the vanilla and constraint-based methods, and introduce CodeEraser, an advanced variant that selectively unlearns sensitive memorized segments in code while preserving the structural integrity and functional correctness of the surrounding code. Extensive experiments on three families of CLMs, i.e., CodeParrot, CodeGen-Mono, and Qwen2.5-Coder, validate the effectiveness and efficiency of CodeEraser in erasing targeted sensitive memorization while maintaining model utility.

Summary

  • The paper introduces CodeEraser, a fine-grained unlearning method that selectively erases sensitive memorization in code language models.
  • It demonstrates a 93.89% reduction in sensitive memorization on Qwen2.5-Coder-7B while retaining 99.99% of its code generation performance.
  • The approach offers an efficient post-hoc solution for RTBF compliance, requiring only 46.88 seconds per sample and manageable resource usage.

Erasing Sensitive Memorization in Code LLMs via Machine Unlearning

Introduction and Motivation

Code LLMs (CLMs) have become integral to software engineering tasks, including code generation, summarization, and repair. However, empirical evidence demonstrates that CLMs can inadvertently memorize and regurgitate sensitive information from their training data, such as emails, passwords, and API keys. This memorization poses significant privacy risks, especially in light of global data protection regulations (e.g., GDPR, CCPA) that mandate the "Right to Be Forgotten" (RTBF). Traditional mitigation strategies—data de-duplication and differential privacy—require full retraining and are computationally prohibitive for deployed models. The paper addresses the feasibility of post-hoc erasure of sensitive memorization in CLMs via machine unlearning, focusing on efficiency and preservation of model utility (Figure 1).

Figure 1: Existing approaches for mitigating memorization in CLMs: (a) data de-duplication, (b) differential privacy, and (c) machine unlearning.

Quantifying Sensitive Memorization in CLMs

The authors conduct a systematic quantification of sensitive memorization in several CLMs (CodeParrot, CodeGen-Mono, Qwen2.5-Coder) using the codeparrot-clean-train dataset. Sensitive segments are identified using the detect-secrets tool, revealing that approximately 18% of training samples contain sensitive information. Memorization is measured using Memorization Accuracy (MA) and Extraction Likelihood (EL), with thresholds empirically established using unseen datasets. The analysis shows that about 7% of training samples are memorized above threshold, indicating substantial privacy risk (Figure 2).

Figure 2: Distribution of memorization accuracy (MA) across sensitive data segments in the training corpus.
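
MA and EL_n follow the definitions commonly used in knowledge-unlearning work: MA is the fraction of next tokens the model predicts correctly when teacher-forced on a sample, and EL_n measures how many n-grams of the true suffix the model reproduces when prompted with a prefix. The minimal sketch below estimates MA for a single sample; the model name and thresholding convention are illustrative assumptions, not the paper's exact evaluation code.

```python
# Hedged sketch: estimating Memorization Accuracy (MA) for one training sample.
# MA is taken as the fraction of next-token predictions that match the ground
# truth under teacher forcing; the model name is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-7B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def memorization_accuracy(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids   # (1, T)
    logits = model(ids).logits                        # (1, T, vocab)
    preds = logits[:, :-1, :].argmax(dim=-1)          # predict token t from tokens < t
    targets = ids[:, 1:]
    return (preds == targets).float().mean().item()

# A sample counts as memorized when MA exceeds an empirically calibrated
# threshold T_MA; EL_n is computed analogously from n-gram overlap between
# generated continuations and the true suffix.
```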

A high-risk dataset of 50,000 sensitive memorized samples is curated for subsequent unlearning experiments. The detection pipeline leverages regular expression-based secret identification and memorization scoring (Figure 3).

Figure 3: Pipeline for detecting and quantifying sensitive memorization in CLMs.
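
As a rough illustration of this curation step, the sketch below flags candidate secrets with simple regular expressions and keeps only samples whose MA exceeds a threshold. The patterns and threshold value are illustrative assumptions; the paper's actual pipeline relies on the detect-secrets tool for detection.

```python
# Hedged sketch of high-risk dataset curation: regex-based secret detection
# followed by memorization scoring. Patterns and the threshold are illustrative;
# the paper uses the detect-secrets tool for the detection step.
import re

SECRET_PATTERNS = {
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_token": re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"),
}

def find_sensitive_spans(code: str):
    """Return (label, start, end) character spans of candidate secrets."""
    spans = []
    for label, pattern in SECRET_PATTERNS.items():
        for m in pattern.finditer(code):
            spans.append((label, m.start(), m.end()))
    return spans

def curate_high_risk(samples, ma_fn, t_ma=0.9):
    """Keep samples that both contain a candidate secret and are memorized
    above the MA threshold (t_ma is an illustrative value)."""
    high_risk = []
    for code in samples:
        spans = find_sensitive_spans(code)
        if spans and ma_fn(code) >= t_ma:
            high_risk.append({"code": code, "sensitive_spans": spans})
    return high_risk
```

Here `ma_fn` can be the memorization_accuracy function from the earlier sketch; the retained spans are what a segment-level unlearning method targets.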

Machine Unlearning: Methods and Formulation

The unlearning problem is formalized as updating a trained CLM $f_\theta$ to $f_{\theta'}$ such that each targeted sample $\mathbf{x}^f$ satisfies $\text{MA}(\mathbf{x}^f) \leq T_{\text{MA}}$ and $\text{EL}_n(\mathbf{x}^f) \leq T_{\text{EL}_n}$, effectively erasing memorization. Three gradient ascent-based unlearning methods are considered:

  • Vanilla Gradient Ascent (GA): Maximizes the negative log-likelihood of forgotten samples, reducing their likelihood of generation.
  • Constraint-Based Unlearning (CU): Combines GA with KL-divergence minimization on retained data, balancing forgetting and utility preservation.
  • CodeEraser (Proposed): Selectively applies GA to sensitive segments and gradient descent to non-sensitive context, with KL-divergence constraints only on sensitive segments. This fine-grained approach preserves code structure and functionality (Figure 4; a minimal loss sketch follows the figure caption below).

Figure 4: Gradient ascent-based unlearning methods: (a) vanilla, (b) constraint-based, (c) CodeEraser (selective segment targeting).
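
To make the three objectives concrete, the sketch below composes them as token-level losses over a forget batch, with a boolean mask marking detected sensitive spans. The loss weights, KL direction, and masking details are illustrative assumptions based on the description above, not the authors' exact formulation.

```python
# Hedged sketch of the three gradient-ascent-based unlearning objectives.
# `sensitive_mask` marks tokens inside detected secret spans; the weights
# alpha, gamma, lam and the KL direction are illustrative assumptions.
import torch
import torch.nn.functional as F

def token_nll(model, input_ids, mask=None):
    """Mean negative log-likelihood over (optionally masked) next-token targets."""
    logits = model(input_ids).logits[:, :-1, :]
    targets = input_ids[:, 1:]
    nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")  # (B, T-1)
    if mask is not None:
        mask = mask[:, 1:].float()
        return (nll * mask).sum() / mask.sum().clamp(min=1)
    return nll.mean()

def kl_to_reference(model, ref_model, input_ids, mask=None):
    """Token-level KL(reference || current model); direction may differ from the paper."""
    logp = F.log_softmax(model(input_ids).logits, dim=-1)
    with torch.no_grad():
        ref_logp = F.log_softmax(ref_model(input_ids).logits, dim=-1)
    kl = F.kl_div(logp, ref_logp, log_target=True, reduction="none").sum(-1)  # (B, T)
    if mask is not None:
        mask = mask.float()
        return (kl * mask).sum() / mask.sum().clamp(min=1)
    return kl.mean()

def unlearning_loss(method, model, ref_model, forget_ids, sensitive_mask,
                    retain_ids=None, alpha=1.0, gamma=1.0, lam=1.0):
    if method == "ga":          # vanilla gradient ascent on the whole sample
        return -token_nll(model, forget_ids)
    if method == "cu":          # GA plus a KL constraint on retained data
        return (-token_nll(model, forget_ids)
                + gamma * kl_to_reference(model, ref_model, retain_ids))
    if method == "codeeraser":  # ascent on sensitive tokens, descent on the rest,
        non_sensitive = ~sensitive_mask  # with the KL term on sensitive positions
        return (-alpha * token_nll(model, forget_ids, sensitive_mask)
                + lam * token_nll(model, forget_ids, non_sensitive)
                + gamma * kl_to_reference(model, ref_model, forget_ids, sensitive_mask))
    raise ValueError(f"unknown method: {method}")
```

In each unlearning step the selected loss is backpropagated and the parameters updated as usual; only the sign and masking of the terms distinguish the three methods.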

Experimental Evaluation

Effectiveness and Efficiency

CodeEraser achieves substantial reduction in memorization metrics across all tested CLMs. For Qwen2.5-Coder-7B, CodeEraser reduces memorization by 93.89% on the forgotten set, with only 46.88 seconds per sample and peak memory usage of ~200GB for batch unlearning. This is orders of magnitude more efficient than retraining or DP-based approaches (Figure 5).

Figure 5: GPU time and memory usage for unlearning with different methods.

Model Utility Preservation

CodeEraser preserves model utility to a high degree, retaining 99.99% of Qwen2.5-Coder-7B's HumanEval code generation performance after unlearning. In contrast, the vanilla GA and CU methods exhibit notable utility degradation, especially as the number of forgotten samples increases. CodeEraser's selective targeting minimizes collateral loss of non-sensitive code knowledge.
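
As an illustration of how such a utility check can be run, the sketch below generates greedy HumanEval completions with the unlearned model and writes them in the format expected by OpenAI's human-eval harness. The decoding settings are illustrative assumptions, and the `model`/`tok` objects are reused from the earlier MA sketch.

```python
# Hedged sketch: checking post-unlearning utility on HumanEval with the
# human-eval harness. Decoding settings are illustrative; `model` and `tok`
# are assumed to be the unlearned CLM and its tokenizer from the MA sketch.
from human_eval.data import read_problems, write_jsonl

problems = read_problems()

def generate_one_completion(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=256, do_sample=False)  # greedy for pass@1
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)
# Score with the harness CLI afterwards: evaluate_functional_correctness samples.jsonl
```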

Analysis of Forgotten Data Characteristics

The impact of forgotten set size, duplication frequency, and sensitive data type is systematically analyzed:

  • Set Size: Utility remains robust for $k \leq 128$; larger $k$ induces degradation, indicating scalability limits.
  • Duplication Frequency: Both low and high duplication samples are less disruptive to utility when unlearned, while intermediate duplication levels cause more utility loss.
  • Sensitive Data Type: Unlearning API/SSH keys can improve model performance, likely due to their outlier status in the data distribution (Figure 6).

Figure 6: HumanEval performance post-unlearning as a function of forgotten set size, duplication frequency, and sensitive data type.

Hyperparameter Sensitivity

Learning rate is the most critical hyperparameter for balancing forgetting and utility retention. Regularization parameters ($\gamma$, $\alpha$, $\lambda$) have minor effects within reasonable ranges, allowing flexible tuning (Figure 7).

Figure 7: Sensitivity analysis of learning rate, $\gamma$, $\alpha$, and $\lambda$ on post-unlearning utility.

Implications and Future Directions

The results demonstrate that post-hoc machine unlearning is a practical and efficient solution for erasing sensitive memorization in CLMs, enabling compliance with RTBF and other privacy regulations without full retraining. The selective segment targeting of CodeEraser is essential for maintaining code integrity and functional correctness. However, scalability to large forgotten sets and generalization to other CLM architectures remain open challenges. Future work should explore more robust secret detection, adaptive unlearning strategies, and integration with model deployment pipelines.

Conclusion

This paper establishes a rigorous framework for erasing sensitive memorization in CLMs via machine unlearning, introducing CodeEraser as a selective, efficient, and utility-preserving method. The approach is validated on multiple CLMs and large-scale sensitive datasets, demonstrating strong privacy protection with minimal computational overhead. The findings have direct implications for the deployment of CLMs in privacy-sensitive domains and set the stage for further research into scalable, fine-grained unlearning techniques.
