Constrained Entropic Unlearning: A Primal-Dual Framework for LLMs
The paper addresses the challenge of machine unlearning in large language models (LLMs). As these models are deployed across industry, operators increasingly need to erase specific sensitive or outdated information from them without degrading overall performance. The authors propose recasting the unlearning task as a constrained optimization problem rather than a regularized trade-off.
Key Contributions
- Reformulation of Unlearning Problem: Unlike the traditional regularized formulation, which balances forgetting and retention through a single scalarized loss, the authors pose unlearning as a constrained optimization problem: a logit-margin flattening loss enforces forgetting by pushing the model's output distribution toward uniformity on a designated forget set, while performance on a retain set is preserved through an explicit hard constraint (see the formulation sketch after this list).
- Primal-Dual Algorithm: The constrained problem is solved with a scalable primal-dual algorithm. The dual variable dynamically adjusts the weight placed on retention, so the trade-off between forgetting and retention is managed automatically during training rather than fixed in advance, improving the robustness of the optimization (a code sketch of one update step follows this list).
- Logit-Margin Loss Function: The paper introduces a logit-margin flattening loss that avoids softmax operations and yields non-vanishing gradients. The loss is numerically stable and convex in the logits, which supports efficient, well-behaved optimization at LLM scale.
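To make the reformulation in the first bullet concrete, a minimal sketch of the constrained problem and its Lagrangian is given below. The notation (θ for the model parameters, D_f and D_r for the forget and retain sets, ε for the retention budget) is our own shorthand, not necessarily the paper's exact definitions.

```latex
% Sketch of the constrained formulation (notation assumed, not the paper's):
\min_{\theta} \; \mathcal{L}_{\text{forget}}(\theta; \mathcal{D}_f)
\quad \text{s.t.} \quad
\mathcal{L}_{\text{retain}}(\theta; \mathcal{D}_r) \le \varepsilon

% Associated Lagrangian, optimized by alternating primal descent in \theta
% and dual ascent in the multiplier \lambda \ge 0:
\mathcal{L}(\theta, \lambda)
= \mathcal{L}_{\text{forget}}(\theta; \mathcal{D}_f)
+ \lambda \bigl( \mathcal{L}_{\text{retain}}(\theta; \mathcal{D}_r) - \varepsilon \bigr)
```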
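To illustrate the second and third bullets, the sketch below pairs one plausible form of a softmax-free flattening penalty with a single primal-dual update. The function names, the `retain_budget` and `dual_lr` values, the specific form of the flattening penalty, and the assumption of a Hugging Face-style model interface are ours rather than the paper's.

```python
import torch

def logit_margin_flatten(logits):
    # Illustrative flattening penalty (assumed form, not the paper's exact loss):
    # the gap between the largest logit and the mean logit at each position.
    # It is zero only when all logits are equal, is convex in the logits,
    # and requires no softmax.
    top = logits.max(dim=-1).values   # (batch, seq)
    mean = logits.mean(dim=-1)        # (batch, seq)
    return (top - mean).mean()

def primal_dual_step(model, forget_batch, retain_batch, lam, optimizer,
                     retain_budget=0.05, dual_lr=0.01):
    # One primal-dual update. Assumes a Hugging Face-style causal LM whose
    # forward pass exposes .logits and, given labels in the batch, a
    # cross-entropy .loss; the hyperparameter values are placeholders.
    # Primal step: descend the Lagrangian in the model parameters.
    forget_logits = model(**forget_batch).logits
    retain_loss = model(**retain_batch).loss
    lagrangian = logit_margin_flatten(forget_logits) + lam * (retain_loss - retain_budget)
    optimizer.zero_grad()
    lagrangian.backward()
    optimizer.step()
    # Dual step: projected gradient ascent on the multiplier (kept non-negative).
    lam = max(0.0, lam + dual_lr * (retain_loss.item() - retain_budget))
    return lam
```

In a training loop, the multiplier would typically start at zero and grow whenever the retention constraint is violated, so retention is re-weighted against forgetting automatically rather than through a hand-tuned scalar.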
Numerical Results
Evaluations on standard benchmarks such as TOFU and MUSE show that the proposed method consistently matches or exceeds existing approaches. It achieves high forget success while preserving model utility, as measured by metrics such as ROUGE scores and fluency assessments. On the TOFU benchmark with the Llama 3.2 3B model, for example, the method reports a forget success of 0.914 and a model utility of 0.680, outperforming prior algorithms and approaching the retrain-from-scratch reference.
Implications and Future Directions
Practically, this work provides a streamlined path to unlearning in LLMs, reducing computational cost while supporting compliance with data protection regulations. Theoretically, it reinforces constrained optimization as a viable tool for machine learning tasks that involve explicit trade-offs.
The paper suggests several directions for future research, including handling dynamic updates in real-time applications, refining hyperparameters to improve performance, and studying robustness to adversarial attacks. The potential to apply similar primal-dual approaches in other domains, such as continual learning and safety alignment, suggests broader relevance for AI development.
In summary, the authors propose a sophisticated yet practical solution for unlearning in LLMs, illustrating the strength of primal-dual frameworks in addressing complex optimization challenges within AI. This paper's approach provides a foundation for future explorations into stable and efficient model unlearning across diverse applications.