
Info-Theoretic Framework for Attribute Unlearning

Updated 6 January 2026
  • The paper introduces an information-theoretic objective that minimizes mutual information between learned representations and sensitive attributes while preserving task-relevant utility.
  • It employs surrogate losses, such as variational bounds, maximum mean discrepancy (MMD), and adversarial or contrastive objectives, to efficiently estimate and control the attribute's statistical footprint.
  • Practical schemes like LEGO and MaSS demonstrate strong empirical trade-offs, significantly reducing attribute inference without substantial loss in main-task performance.

Selective removal of sensitive or nuisance attributes from learned representations, known as attribute unlearning, has emerged as a fundamental requirement for privacy, fairness, and compliance in machine learning systems. The information-theoretic framework for attribute unlearning formalizes this objective as the selective reduction of statistical dependence—typically measured via Shannon mutual information—between target representations and the attributes to be forgotten, while ensuring maximal preservation of utility-relevant information. This paradigm provides mathematically rigorous objectives, optimization schemes, and guarantees, enabling principled design of algorithms that operate in settings as diverse as recommender systems, federated models, deep feature spaces, and multi-modal neural architectures.

1. Formal Information-Theoretic Objectives for Attribute Unlearning

Attribute unlearning is cast as an optimization over the encoder or data-transformation parameters to maximize the retained information about task-relevant signals and inputs, while minimizing the mutual information with sensitive attributes. Let $x \in \mathcal{X}$ denote the original input, $y \in \mathcal{Y}$ the main task label, $z \in \mathcal{Z}$ the attribute to be unlearned, and $h = f_\theta(x)$ the learned representation.

The canonical information-theoretic objective is
$$\max_\theta\ \mathcal{U}(h, y) \quad \text{subject to}\quad I(h; z) \leq \varepsilon,$$
where $\mathcal{U}$ is a utility functional such as $I(h; y)$ (task fidelity) or $I(h; x)$ (input preservation), and $I(h; z)$ quantifies the attribute's residual footprint. In soft-constraint form, this becomes
$$\max_{\theta}\ \mathcal{U}(h, y) - \gamma\, I(h; z)$$
for a Lagrange parameter $\gamma \geq 0$ (Guo et al., 2022, Xu et al., 8 Feb 2025).
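To make the soft-constraint objective concrete, the sketch below evaluates $\mathcal{U}(h, y) - \gamma I(h; z)$ with $\mathcal{U} = I(h; y)$, using plug-in mutual-information estimates on toy discrete joint tables (a minimal NumPy illustration; the function names and toy tables are not from any cited paper):

```python
import numpy as np

def mutual_information(joint):
    """Plug-in Shannon mutual information (in nats) from a joint table p(a, b)."""
    joint = joint / joint.sum()
    pa = joint.sum(axis=1, keepdims=True)   # marginal p(a)
    pb = joint.sum(axis=0, keepdims=True)   # marginal p(b)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])).sum())

def soft_objective(p_hy, p_hz, gamma):
    """Soft-constraint objective U(h, y) - gamma * I(h; z), with U = I(h; y)."""
    return mutual_information(p_hy) - gamma * mutual_information(p_hz)

# Toy joint tables over a binarised representation h and labels y, z:
p_hy = np.array([[0.4, 0.1], [0.1, 0.4]])      # h is informative about y
p_hz = np.array([[0.25, 0.25], [0.25, 0.25]])  # h is independent of z
print(soft_objective(p_hy, p_hz, gamma=1.0))   # positive: utility kept, no leakage
```

A larger $\gamma$ trades task information for stronger attribute suppression; in practice both MI terms are replaced by the surrogate bounds of Section 2.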

In frameworks supporting multi-attribute unlearning (e.g., MaSS), the problem further generalizes to
$$\begin{aligned} \max_{\theta,\eta} \quad & I(X'; F) \\ \text{subject to} \quad & I(X'; S_i) \leq m_i \ \forall i, \\ & I(X'; U_j) \geq n_j \ \forall j, \end{aligned}$$
where $X'$ is the transformed sample, $\{S_i\}$ are sensitive attributes, $\{U_j\}$ are utility attributes, $F$ denotes unannotated generic information, and $m_i, n_j$ are per-attribute budget parameters (Chen et al., 2024). Attribute unlearning in federated or distributed models defines analogous criteria with respect to the Fisher information that model parameters carry about attributes or client data (Balordi et al., 26 Aug 2025).

2. Surrogate Losses and Mutual Information Estimation

Direct estimation of mutual information between high-dimensional representations and discrete/categorical attributes is generally intractable. Practical frameworks employ variational upper bounds or proxy divergences.

  • Upper bounds via Variational Classifiers: The vCLUB bound, for attribute $A_t$ and embedding $\theta$, is (Yu et al., 23 Oct 2025):

$$I(\theta; A_t) \leq \mathbb{E}_{p(\theta, a)} [\log q_\phi(a \mid \theta)] - \mathbb{E}_{p(\theta)} \mathbb{E}_{p(a)} [\log q_\phi(a \mid \theta)]$$

for a learned variational classifier $q_\phi$.
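In sample form, the bound is typically computed from a matrix of classifier log-likelihoods: the positive term averages the paired entries $\log q_\phi(a_i \mid \theta_i)$, and the negative term averages over all $(i, j)$ pairs. A minimal NumPy sketch (the matrix layout is an illustrative assumption):

```python
import numpy as np

def vclub_bound(log_probs):
    """Sample-based vCLUB upper bound on I(theta; A_t).

    log_probs[i, j] = log q_phi(a_j | theta_i) for paired samples
    (theta_i, a_i): the diagonal holds the paired log-likelihoods.
    """
    positive = np.mean(np.diag(log_probs))  # E_{p(theta, a)} log q(a | theta)
    negative = np.mean(log_probs)           # E_{p(theta)} E_{p(a)} log q(a | theta)
    return positive - negative

# If q_phi ignores theta (identical rows), the bound collapses to zero:
uninformative = np.tile(np.log([0.7, 0.2, 0.1]), (3, 1))
print(vclub_bound(uninformative))  # ~0.0
```

Driving this bound toward zero during unlearning forces the classifier's per-sample predictions toward its marginal behaviour, i.e. the representation stops being predictive of $A_t$.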

  • Distributional Divergences via MMD: Attribute-conditional embedding distributions are aligned by minimizing the squared maximum mean discrepancy (Li et al., 2023):

$$\mathrm{MMD}^2(\mathbb{P}_1, \mathbb{P}_2) = \|\mu(\mathbb{P}_1) - \mu(\mathbb{P}_2)\|^2_{\mathcal{G}},$$

where $\mathbb{P}_i$ is the distribution of embeddings conditioned on attribute class $i$, and $\mathcal{G}$ is a reproducing-kernel Hilbert space (RKHS).
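A biased sample estimator of this quantity with a Gaussian RBF kernel (the kernel and bandwidth choices here are illustrative) follows directly from the kernel expansion $\mathrm{MMD}^2 = \mathbb{E}[k(x,x')] + \mathbb{E}[k(y,y')] - 2\,\mathbb{E}[k(x,y)]$:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimator of MMD^2 between embedding samples X ~ P1 and Y ~ P2."""
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 4)), rng.normal(0, 1, (200, 4)))
shifted = mmd2(rng.normal(0, 1, (200, 4)), rng.normal(3, 1, (200, 4)))
print(same, shifted)  # near zero vs. clearly positive
```

Minimizing $\mathrm{MMD}^2$ between attribute-conditional embedding distributions pushes them toward a common distribution, erasing the attribute's distributional signature.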

  • Adversarial and Contrastive Surrogates: Adversarial classifiers (via cross-entropy losses) and contrastive InfoNCE-type objectives penalize mutual information with sensitive attributes while maximizing predictive information for utility attributes (Chen et al., 2024).
  • Jacobian-Norm Minimization: In local information-gain frameworks, the squared norm of the network Jacobian along the attribute axes is minimized, suppressing the transfer of information about the forgotten subspace (Foster et al., 2024):

$$L_{\mathrm{attr}}(x; \theta) = \frac{1}{N} \sum \frac{\|f_\theta(x_a, x_u) - f_\theta(x_a + \delta_a, x_u)\|_2^2}{\|\delta_a\|_2^2}$$

serving as a first-order proxy for $I(x_a; f_\theta(x) \mid x_u)$.
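The Jacobian-norm loss can be approximated by finite differences: perturb only the attribute coordinates $x_a$ and measure the output change relative to the perturbation size. A toy NumPy sketch (the probe scheme and function names are illustrative, not the cited implementation):

```python
import numpy as np

def attribute_sensitivity(f, x_a, x_u, delta_scale=1e-3, n_probes=8, seed=0):
    """Finite-difference proxy for the squared Jacobian norm along the
    attribute axes: ||f(x_a, x_u) - f(x_a + delta_a, x_u)||^2 / ||delta_a||^2,
    averaged over random small perturbations delta_a."""
    rng = np.random.default_rng(seed)
    base = f(x_a, x_u)
    total = 0.0
    for _ in range(n_probes):
        delta = rng.normal(scale=delta_scale, size=x_a.shape)
        diff = f(x_a + delta, x_u) - base
        total += (diff ** 2).sum() / (delta ** 2).sum()
    return total / n_probes

# A representation that ignores x_a has zero sensitivity; a leaky one does not.
f_invariant = lambda x_a, x_u: np.tanh(x_u)
f_leaky = lambda x_a, x_u: np.tanh(x_u) + 0.5 * x_a
x_a, x_u = np.ones(4), np.ones(4)
print(attribute_sensitivity(f_invariant, x_a, x_u))  # 0.0
print(attribute_sensitivity(f_leaky, x_a, x_u))      # ~0.25
```

Minimizing this quantity during fine-tuning drives the network's outputs toward invariance along the forgotten subspace.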

3. Unified Optimization and Algorithmic Strategies

Multiple algorithmic instantiations arise for optimizing surrogate losses subject to information constraints:

  • Training-phase MI regularization: MI-based regularizers are integrated into the representation learning process, with closed-form bounds (e.g., InfoFiltra) supporting stepwise surrogate minimization (Guo et al., 2022).
  • Post-training Editing: Models can be edited post hoc, by optimizing only the embedding matrix or feature layers using distributional or adversarial losses, combined with functional/parameter-space regularization to protect utility (Li et al., 2023, Chen et al., 2024).
  • Two-step Procedures and Combinatorial Unlearning: Multi-attribute settings exploit parallelizable attribute-wise calibration steps, followed by a convex combination search to align embeddings to minimize information leakage on all targeted attributes (Yu et al., 23 Oct 2025).
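The combination step can be illustrated by a naive grid search over convex weights on per-attribute calibrated embeddings; this is a hypothetical sketch, not LEGO's actual search, and `leakage` stands in for an MI-based leakage score:

```python
import numpy as np
from itertools import product

def convex_combination_search(embeddings, leakage, steps=10):
    """Grid search over convex weights (non-negative, summing to 1) that
    combine per-attribute calibrated embeddings, minimising a leakage score."""
    k = len(embeddings)
    grid = np.linspace(0, 1, steps + 1)
    best_w, best_score = None, np.inf
    for partial in product(grid, repeat=k - 1):
        if sum(partial) > 1:
            continue
        weights = np.array(list(partial) + [1 - sum(partial)])
        combined = sum(wi * E for wi, E in zip(weights, embeddings))
        score = leakage(combined)
        if score < best_score:
            best_w, best_score = weights, score
    return best_w, best_score

# Toy leakage: distance of the combined embedding from an attribute-free origin.
E1, E2 = np.ones((4, 2)), -np.ones((4, 2))
w, s = convex_combination_search([E1, E2], lambda E: np.abs(E).sum())
print(w, s)  # equal weights cancel the two attribute directions
```

Because the per-attribute calibrations are independent, they can run in parallel, leaving only this low-dimensional search as the sequential step.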

A summary of MI estimation and surrogate tasks appears below:

| MI estimation | Optimization phase | Approach |
|---|---|---|
| vCLUB bound | Training / post-train | Variational MI |
| MMD (& barycenter) | Post-train | RKHS divergence |
| Adversarial CE | Training / post-train | Classifier-based |
| InfoNCE contrastive | Both | Mutual-info proxy |
| Jacobian norm | Post-train | Output sensitivity |

4. Theoretical Guarantees and Trade-off Bounds

Rigorous bounds characterize Pareto trade-offs between utility retention and attribute removal:

  • Surrogate Loss Tightness: For InfoFiltra, it is shown that the optimized upper-bound loss $\mathcal{L}_{\mathrm{InfoFiltra}}$ satisfies (Guo et al., 2022):

$$\mathcal{L}_{\mathrm{InfoFiltra}} \leq -\lambda_1 I(h; x) - \lambda_2 I(h; y) - \lambda_3 I(h; z)$$

and under additional conditions, the slack $\epsilon$ is controlled by:

$$\epsilon \leq \alpha\beta\,[\,I(x; y) - I(h; y) - I(h; z)\,]$$

  • Feasibility Bounds: In selective suppression with utility lower bounds and sensitive-attribute upper bounds, the existence of a feasible mapping requires (Chen et al., 2024):

$$n_j \leq m_i + I(X; U_j \mid S_i), \quad n_j \leq I(X; U_j), \quad m_i \geq 0$$

and for unannotated utility, the achievable retained information is upper-bounded by $H(X \mid S_i) + m_i$.

  • Distributional Pareto Frontier: In distributional unlearning, the Gaussian Pareto frontier of minimum deletions versus preservation is analytically given by (Allouah et al., 20 Jul 2025):

$$\mathrm{PF}(P_U, P_R; \mathcal{P}) = \left\{ \left(\alpha,\ (\sqrt{\alpha} - \sqrt{D})^2 \right) : \alpha \geq D \right\}$$

with $D = D_{\mathrm{KL}}(P_U \,\|\, P_R)$.
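Given $D = D_{\mathrm{KL}}(P_U \| P_R)$, the frontier is directly computable; a small sketch (the function name is illustrative):

```python
import numpy as np

def pareto_frontier(d_kl, alphas):
    """Gaussian Pareto frontier of distributional unlearning: for each deletion
    budget alpha >= D, the minimum preservation cost is (sqrt(alpha) - sqrt(D))^2."""
    alphas = np.asarray(alphas, dtype=float)
    if np.any(alphas < d_kl):
        raise ValueError("each alpha must be at least D = KL(P_U || P_R)")
    return (np.sqrt(alphas) - np.sqrt(d_kl)) ** 2

D = 0.5
print(pareto_frontier(D, np.linspace(D, 4 * D, 5)))  # starts at 0, grows with alpha
```

The frontier starts at zero cost when the deletion budget equals $D$ and grows monotonically in $\alpha$, quantifying how much preservation must be sacrificed for stronger deletion.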

  • Federated Unlearning: Per-parameter Target Information Score (TIS) quantifies the impact of unlearning:

$$\mathrm{TIS}(\theta_i) = \left( \frac{\partial^2_{\theta_i} \mathcal{L}(D_T)}{\partial^2_{\theta_i} \mathcal{L}(D)} \right)^2$$

Resetting high-TIS parameters followed by minimal retraining provably increases the adversary's error in inferring attribute presence (Balordi et al., 26 Aug 2025).
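Given Hessian diagonals of the loss on the target data $D_T$ and the full data $D$, the scoring-and-reset step can be sketched as follows (the quantile threshold and re-initialization scale are illustrative assumptions, not values from the cited paper):

```python
import numpy as np

def target_information_score(hess_target, hess_full, eps=1e-12):
    """Per-parameter TIS: squared ratio of the Hessian diagonal of the loss on
    the target data D_T to that on the full data D."""
    return (hess_target / (hess_full + eps)) ** 2

def reset_high_tis(params, tis, quantile=0.8, rng=None):
    """Re-initialise the parameters whose TIS falls above the given quantile."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = tis >= np.quantile(tis, quantile)
    out = params.copy()
    out[mask] = rng.normal(scale=0.01, size=int(mask.sum()))
    return out, mask

# Toy diagonals: the fourth parameter is strongly target-specific.
hess_t = np.array([0.1, 2.0, 0.2, 5.0, 0.1])
hess_f = np.ones(5)
tis = target_information_score(hess_t, hess_f)
params, mask = reset_high_tis(np.ones(5), tis, quantile=0.8)
print(mask)  # only the most target-specific parameter is reset
```

Only the parameters that encode disproportionate information about the target are disturbed, which keeps the retraining needed afterwards minimal.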

5. Practical Schemes and Empirical Evidence

Frameworks instantiate the above theory in various modalities and tasks:

  • LEGO: Employs parallelizable embedding calibration and a flexible convex-combination layer to support simultaneous/dynamic multi-attribute removal. It minimizes variational MI upper bounds subject to $\ell_2$-proximity constraints, with theoretical approximation-ratio guarantees. Empirically, LEGO achieves a $\sim 24$-point drop in attribute-inference attacker BAcc and $<4\%$ NDCG@10 loss (Yu et al., 23 Oct 2025).
  • MaSS: Introduces adversarial cross-entropy on labeled attributes, contrastive InfoNCE for unannotated utility, and normalizes performance via "Normalized Accuracy Gain"; it outperforms alternatives across audio, image, and sensor datasets (Chen et al., 2024).
  • PoT-AU/D2D-FR: Distributional metric unlearning using MMD for class separation enables post hoc attribute removal at high efficiency (seconds versus full retraining), with attacker BAcc reduced to chance levels and only a 1–2% drop in HR@10/NDCG@10 (Chen et al., 2024, Li et al., 2023).
  • Jacobian Smoothing: Just-in-Time attribute unlearning minimizes local information gain along attribute axes, yielding fast, output-level invariance to target attributes (Foster et al., 2024).
  • Wasserstein Barycenter: Feature-unlearning via optimal transport of attribute-conditional distributions to the barycenter gives a unified, closed-form solution for multiple objectives and admits efficient computation with neural OT maps (Xu et al., 8 Feb 2025).
  • Federated Unlearning: Hessian-diagonal scoring and targeted re-initialization generalize attribute unlearning to federated and distributed settings (Balordi et al., 26 Aug 2025).

Empirical evaluations consistently show strong reduction in attribute inference, little degradation to main task metrics, and vastly improved computational and deletion efficiency relative to blind retraining or adversarial in-training unlearning.

6. Assessment, Metrics, and Generalization

Mutual-information–based metrics serve both as unlearning objectives and post hoc assessment tools:

  • Information Difference Index (IDI): Quantifies retained mutual information about forgotten attributes in intermediate layers; $\mathrm{IDI} = 0$ indicates full erasure, $\mathrm{IDI} = 1$ no removal (Jeon et al., 2024). IDI leverages InfoNCE-based estimation and robustly separates truly unlearned models from those only masking outputs.
  • Trade-off analyses: Many frameworks present empirical utility–privacy Pareto frontiers, exhibiting minimal main-task loss up to near-complete attribute suppression (Yu et al., 23 Oct 2025, Chen et al., 2024, Allouah et al., 20 Jul 2025).
  • Extension to Arbitrary Attributes: All frameworks support multi-class, continuous, and even arbitrarily structured attribute unlearning by adapting the MI estimators, proxy losses, and regularization strengths accordingly (Chen et al., 2024, Yu et al., 23 Oct 2025, Xu et al., 8 Feb 2025).

7. Open Problems and Significance

While information-theoretic frameworks for attribute unlearning deliver closed-form objectives, surrogate bounds, and post hoc metrics, some limitations persist. These include the computational cost of high-dimensional MI estimation, the explicit-knowledge assumption about sensitive attributes for labelled suppression, and the non-convexity of deep representation mapping spaces. Nonetheless, these frameworks supply the first rigorous foundations for selective attribute removal, unifying disparate strategies under the powerful machinery of mutual information, and enabling efficient, theoretically justified implementations across a range of modern machine learning architectures (Guo et al., 2022, Yu et al., 23 Oct 2025, Chen et al., 2024, Chen et al., 2024, Xu et al., 8 Feb 2025, Allouah et al., 20 Jul 2025, Jeon et al., 2024, Balordi et al., 26 Aug 2025).
