Federated Unlearning Inversion Attacks

Updated 26 March 2026

Federated Unlearning Inversion Attacks (FUIA) are advanced privacy attacks that exploit gradient differences in federated unlearning to reconstruct data intended for erasure.
FUIA leverages methods such as gradient-difference optimization, learning-based inversion, and algorithm-agnostic DRAUN to recover erased information across various unlearning scenarios.
Empirical studies indicate that FUIA significantly compromises privacy in federated learning, while countermeasures like gradient pruning and perturbation present notable privacy–utility trade-offs.

Federated Unlearning Inversion Attacks (FUIA) refer to a class of privacy attacks targeting federated unlearning (FU)—a process that allows clients in federated learning (FL) to erase the influence of specific data instances, entire clients, or whole classes from a collaboratively trained model to comply with regulatory requirements such as the “right to be forgotten.” FUIA exploits the updates and gradients recorded before and after unlearning to reconstruct forgotten data, thereby compromising the very privacy guarantees FU aims to provide. Several recent works have established rigorous threat models and demonstrated effective inversion attacks under realistic settings, exposing significant privacy risks in a range of FU protocols (Zhou et al., 20 Feb 2025, Zhang et al., 16 May 2025, Lamri et al., 2 Jun 2025).

1. Threat Models and Privacy Objectives

FUIA operates within an honest-but-curious adversarial paradigm. The attacker (typically the central server or an external auditor) strictly adheres to the prescribed FL and FU protocols but archives all available model updates, gradients, or proofs exchanged therein:

Server-side FUIA: The server learns model parameters and updates before and after unlearning, enabling direct computation of differences associated with erased data.
Auditor-side FUIA: An external party (auditor) receives Proofs of Federated Unlearning (PoFU), which comprise gradient differences for each requested-forgotten sample.

The adversary never has access to raw client data but seeks to reconstruct features (e.g., images) and labels of the forgotten set purely from these “external” traces. The adversary is constrained by limited auxiliary data, incomplete protocol knowledge, and a strict prohibition on interfering with the FU or FL workflows (Zhou et al., 20 Feb 2025, Zhang et al., 16 May 2025).

2. FU Scenarios and Attack Surfaces

FUIA attacks exploit the inherent information leakage in three major unlearning settings, each corresponding to a different operational goal:

Sample Unlearning: Remove a small set of specific samples from a client’s dataset. The attacker compares gradients (or parameter differences) before and after unlearning, recovering gradients attributable to the forgotten set.
Client Unlearning: Remove all data from a particular client. The attack leverages both the accumulated training gradients for the client and the global update induced by unlearning.
Class Unlearning: Remove all instances with a specified label. Differences in output-layer parameters (weights and biases) before and after unlearning allow direct inference of erased class identities.

All scenarios provide leakage vectors, enabling feature or label extraction (Zhou et al., 20 Feb 2025). Recent work generalizes to settings with only aggregate gradient-difference updates—as in protocols with explicit PoFU requirements—demonstrating that per-sample or per-client gradients are not required for effective inversion (Zhang et al., 16 May 2025).
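The class-unlearning leakage described above can be illustrated with a small sketch. It assumes, as a heuristic not taken from the cited papers, that the erased class is the one whose output-layer weight row and bias change the most between the pre- and post-unlearning models. The function name `infer_erased_class` and the toy values are hypothetical.

```python
import math

def infer_erased_class(w_before, b_before, w_after, b_after):
    """Guess the erased class from output-layer differences.

    w_*: per-class weight rows (list of lists of floats)
    b_*: per-class biases (list of floats)
    Assumption of this sketch: the erased class shows the largest
    L2 change across its weight row and bias.
    """
    scores = []
    for row_o, row_u, bias_o, bias_u in zip(w_before, w_after, b_before, b_after):
        sq = sum((a - b) ** 2 for a, b in zip(row_o, row_u))
        sq += (bias_o - bias_u) ** 2
        scores.append(math.sqrt(sq))
    # Index of the class with the largest parameter change.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy 3-class output layer; class 1 is the one that was "unlearned".
w_before = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.3]]
w_after = [[0.5, -0.19], [0.0, 0.0], [-0.31, 0.3]]
b_before = [0.1, 0.2, -0.1]
b_after = [0.1, -0.5, -0.1]
guess, change = infer_erased_class(w_before, b_before, w_after, b_after)
```

A real attack would read these parameters from the archived global models. The sketch only shows why output-layer differences alone can reveal a class label.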

3. Methodologies for FUIA

A range of attack methodologies are in use, spanning optimization-based gradient inversion, machine learning–based inversion, and algorithm-agnostic frameworks.

A. Gradient-Difference–Based Optimization

The key observation is that the parameter/gradient difference before and after unlearning, denoted $\Delta W = w^o - w^u$, is approximately the sum (or aggregate) of gradients over forgotten data. The attack proceeds via:

Gradient Extraction: Isolate the “target gradient” corresponding to forgotten data, either by subtracting post-unlearning from pre-unlearning gradients or by specific forms of weighted averaging over rounds.
Gradient Inversion: Optimize synthetic inputs $(\hat x, \hat y)$ such that their gradients closely match the inferred target gradient, typically by minimizing a negative cosine similarity loss with a smoothness penalty,

$\min_{\hat x, \hat y} -\frac{\langle \nabla_{w^o}L(F(w^o, \hat x), \hat y), \nabla^k \rangle}{\|\nabla_{w^o}L(F(w^o, \hat x), \hat y)\|_2\,\|\nabla^k\|_2} + \alpha\, TV(\hat x),$

where $\nabla^k$ is the extracted target gradient and $TV(\cdot)$ regularizes input smoothness (Zhou et al., 20 Feb 2025).
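The two steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the target gradient is taken as the raw parameter difference, the dummy gradient is assumed to be supplied by the caller (in practice it would come from automatic differentiation of the model at the synthetic input), and the anisotropic form of total variation and the default `alpha` are choices made for this sketch.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two flat vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)  # small constant avoids division by zero

def total_variation(img):
    """Anisotropic total variation of a 2-D image (list of rows)."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
    return tv

def target_gradient(w_before, w_after):
    """Gradient extraction: element-wise difference w^o - w^u."""
    return [a - b for a, b in zip(w_before, w_after)]

def inversion_loss(dummy_grad, target, dummy_img, alpha=1e-2):
    """Negative cosine similarity plus alpha-weighted TV penalty."""
    return -cosine_similarity(dummy_grad, target) + alpha * total_variation(dummy_img)

# Toy values: the dummy gradient is parallel to the target and the
# dummy image is perfectly flat, so the loss is close to its minimum of -1.
target = target_gradient([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
flat_img = [[0.2, 0.2], [0.2, 0.2]]
loss = inversion_loss([1.0, 2.0, 3.0], target, flat_img)
```

An actual attack would minimize `inversion_loss` over the synthetic input with a gradient-based optimizer. That loop is omitted here because it depends on the model and the autodiff framework.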

B. Learning-based Inversion (IGF)

When per-update PoFUs are available, as in verifiable unlearning, the Inverting Gradient differences to Forgotten data (IGF) pipeline is applied:

SVD-Based Dimensionality Reduction: Projects high-dimensional gradient differences onto a compact SVD basis that captures a $\nu$-fraction of the variance (e.g., $\nu = 95\%$).
Pixel-Level Inversion Network: A neural network maps the reduced gradient-difference vectors to the image space, trained using a composite loss to preserve both pixel fidelity and semantic similarity.
Batch Reconstruction: The trained model then reconstructs forgotten samples from real PoFU submissions (Zhang et al., 16 May 2025).
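The rank selection in the SVD step can be sketched as follows. The sketch assumes the singular values of the stacked gradient-difference matrix are already available, and it measures the retained share by squared singular values, which is one common reading of "variance captured"; the criterion used in the cited work may differ. The function name `rank_for_energy` is hypothetical.

```python
def rank_for_energy(singular_values, nu=0.95):
    """Smallest k whose top-k squared singular values reach a nu share."""
    energy = [s * s for s in sorted(singular_values, reverse=True)]
    total = sum(energy)
    running = 0.0
    for k, e in enumerate(energy, start=1):
        running += e
        if running >= nu * total:
            return k
    return len(energy)

# Toy spectrum: two dominant directions carry almost all the energy.
k95 = rank_for_energy([10.0, 5.0, 1.0, 0.5], nu=0.95)
```

The gradient differences would then be projected onto the first `k95` right singular vectors before being passed to the inversion network.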

C. Algorithm-Agnostic DRAUN Attack

DRAUN (Data Reconstruction Attack UNlearning) dispenses with assumptions about the unlearning algorithm $\mathcal{A}_c$. The attacker jointly optimizes for both the retained and forgotten sets, simulating updates that match observed model differences while keeping the dummy retain and unlearn inputs separate,

$\ell_k = 1 - \frac{\langle\nabla\bar\theta_c,\tilde\nabla^{(k)}\rangle}{\|\nabla\bar\theta_c\|\,\|\tilde\nabla^{(k)}\|} + \lambda_{TV}\bigl[\beta\,TV(\tilde{x}_u)+(1-\beta)TV(\tilde{x}_r)\bigr].$

This strategy yields high-fidelity recovery even without protocol or hyperparameter knowledge (Lamri et al., 2 Jun 2025).

4. Experimental Findings and Benchmark Results

Empirical studies across multiple benchmark datasets (CIFAR-10, CIFAR-100, MNIST, FEMNIST) and architectures (ResNet-18/44, ConvNet, MLP, LeNet) corroborate the effectiveness of FUIA and variants.

Accuracy of Reconstruction: For sample unlearning with UnrollingSGD (CIFAR-10/ResNet-18), FUIA achieves MSE ≈ 0.005 and PSNR ≈ 25 dB, significantly outperforming classical model unlearning inversion attacks (MUIA) which yield MSE ≈ 0.02 and PSNR ≈ 20 dB (Zhou et al., 20 Feb 2025).
IGF Results: On PoFU attack benchmarks, IGF achieves MSE = 0.0211, PSNR = 17.19 dB (EFU, CIFAR-10), with reconstructions retaining fine structural details. Reconstructions on MNIST and Fashion-MNIST are nearly indistinguishable from originals (Zhang et al., 16 May 2025).
Algorithm-Agnostic DRAUN: DRAUN achieves SSIM = 0.641 and PSNR = 21.11 dB on CIFAR-10/ABL, whereas baseline methods fail (PSNR ≈ 5.5 dB, SSIM ≈ 0.004) (Lamri et al., 2 Jun 2025).
Scale and Generalization: All attack methods generalize to larger batch unlearning and diverse architectures, with only minor degradation as the number of erased samples grows.

A table summarizing representative results is presented below.

Attack Method	Dataset/Model	Metric	Result
FUIA	CIFAR-10/ResNet-18	MSE	0.005
FUIA	CIFAR-10/ResNet-18	PSNR (dB)	25
IGF	CIFAR-10/ConvNet	MSE	0.0211
IGF	CIFAR-10/ConvNet	PSNR (dB)	17.19
DRAUN	CIFAR-10/ConvNet	SSIM	0.641
DRAUN	CIFAR-10/ConvNet	PSNR (dB)	21.11
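For reference, the MSE and PSNR metrics in the table can be computed as in the sketch below. It assumes images are flattened to lists of pixel values scaled to [0, 1], so the peak value is 1.0; other scalings change `max_val`. Benchmark papers usually average these metrics per image, so a reported mean PSNR need not equal the value obtained by plugging the mean MSE into the formula.

```python
import math

def mse(x, y):
    """Mean squared error between two flattened images."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(x, y)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / err)

# Toy example: every pixel is off by 0.1, so the MSE is 0.01.
original = [0.0, 0.0, 0.0, 0.0]
reconstruction = [0.1, 0.1, 0.1, 0.1]
score = psnr(original, reconstruction)
```

Lower MSE and higher PSNR both indicate a reconstruction closer to the forgotten sample, which is why stronger attacks appear with smaller MSE and larger PSNR in the table.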

5. Defense Strategies and Privacy-Utility Trade-offs

Defense mechanisms focus on degrading the utility of gradient differences to the attacker while sustaining FU integrity:

Gradient Pruning: Each client prunes the $p$-fraction of gradient coordinates with the smallest magnitude. Higher $p$ reduces FUIA's PSNR (≤15 dB at $p=0.4$) but significantly degrades model test accuracy (from 84% to 62% for $p=0.4$); moderate values ($p=0.2$) may offer a reasonable compromise (Zhou et al., 20 Feb 2025, Lamri et al., 2 Jun 2025).
Gradient Perturbation (Gaussian Noise): Adding $N(0, \sigma^2)$ noise reduces FUIA effectiveness with marginal accuracy loss until $\sigma$ exceeds $0.003$, at which point the attack nearly fails but accuracy drops sharply (Zhou et al., 20 Feb 2025).
Orthogonal Obfuscation (PoFU Setting): Each PoFU vector is replaced with a random orthogonal vector of identical $\ell_2$ norm, destroying the directional information necessary for reconstruction while preserving verification (PoFU norm checks). IGF reconstructions on obfuscated PoFUs resemble random noise (Zhang et al., 16 May 2025).
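The three defenses above can be sketched on flat gradient vectors as follows. This is an illustration under stated assumptions rather than the cited implementations: the function names are hypothetical, pruning zeroes exactly `int(p * n)` coordinates, and the orthogonal vector is built with a single Gram-Schmidt step on a Gaussian sample, which requires a nonzero input of dimension at least 2.

```python
import math
import random

def prune_smallest(grad, p):
    """Zero the p-fraction of coordinates with the smallest magnitude."""
    k = int(p * len(grad))
    drop = set(sorted(range(len(grad)), key=lambda i: abs(grad[i]))[:k])
    return [0.0 if i in drop else g for i, g in enumerate(grad)]

def add_gaussian_noise(grad, sigma, rng=random):
    """Add independent N(0, sigma^2) noise to every coordinate."""
    return [g + rng.gauss(0.0, sigma) for g in grad]

def orthogonal_obfuscate(vec, rng=random):
    """Random vector orthogonal to vec with the same L2 norm."""
    norm_sq = sum(v * v for v in vec)
    if len(vec) < 2 or norm_sq == 0.0:
        raise ValueError("need a nonzero vector of dimension >= 2")
    r = [rng.gauss(0.0, 1.0) for _ in vec]
    # Gram-Schmidt step: remove the component of r along vec.
    coef = sum(a * b for a, b in zip(r, vec)) / norm_sq
    r = [a - coef * b for a, b in zip(r, vec)]
    # Rescale so the obfuscated vector keeps the original norm.
    scale = math.sqrt(norm_sq) / math.sqrt(sum(a * a for a in r))
    return [a * scale for a in r]

rng = random.Random(0)  # fixed seed so the example is repeatable
pruned = prune_smallest([0.5, -0.1, 0.05, 2.0, -0.3], p=0.4)
noisy = add_gaussian_noise([0.5, -0.1], sigma=0.003, rng=rng)
hidden = orthogonal_obfuscate([3.0, 4.0, 0.0], rng=rng)
```

Because `hidden` has the same norm as the input, a norm-based verification check still passes, while the direction that the inversion network relies on is discarded.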

All defenses introduce privacy–utility trade-offs: noise or pruning impedes fast convergence or reduces test accuracy, and orthogonalization prevents auditing of more complex unlearning properties (e.g., nontrivial semantics).

6. Limitations, Open Problems, and Research Directions

Several challenges remain for both theory and practice:

Robust Defenses: Current mitigation approaches either degrade utility or leave aspects of the attack (e.g., label recovery for class FU) unaddressed. Differential privacy–inspired solutions with end-to-end guarantees remain an open research question (Zhou et al., 20 Feb 2025, Lamri et al., 2 Jun 2025).
Scalability and Heterogeneity: Realistic large-scale FL deployments involve non-IID data and highly heterogeneous client participation, potentially affecting both attack efficacy and defense robustness. Empirical results remain robust for moderate batch unlearning, but more analysis is needed for extreme regimes (Zhou et al., 20 Feb 2025).
Formal Guarantees: There is a lack of formal definitions and frameworks for “provable” privacy in FU, analogous to differential privacy for standard FL (Zhang et al., 16 May 2025).
Algorithm/Protocol Diversity: Most attacks focus on first-order or optimization-based FU; generalizing to second-order or more cryptographically robust unlearning remains open (Lamri et al., 2 Jun 2025).

A plausible implication is that new federated unlearning protocols must be co-designed with tailored privacy protection, considering both practical attack surfaces and rigorous privacy proofs.

7. Relation to Broader Machine Learning Privacy Attacks

FUIA extends the paradigm of model inversion and gradient inversion attacks from centralized and non-unlearning contexts to the federated unlearning setting, introducing additional complexity due to decentralized control, partial observability, and explicit privacy demands. Earlier inversion attacks in centralized Machine Learning-as-a-Service (MLaaS) settings do not directly carry over due to structural differences in loss surfaces and unlearning objectives (see, for example, Theorems 1 and 2 in Lamri et al., 2 Jun 2025). FU-specific attacks must account for updates structured as the difference between gradient contributions from both the forgotten and retained data, necessitating novel algorithm-agnostic approaches such as DRAUN (Lamri et al., 2 Jun 2025). This suggests that general-purpose privacy defenses for FL do not, in themselves, suffice for FU-specific threat models.

In summary, Federated Unlearning Inversion Attacks (FUIA) constitute an active and technically rigorous area within privacy research for federated learning. The demonstrated ability of honest-but-curious adversaries to reconstruct forgotten data in practical FU deployments exposes a pronounced gap in current privacy defenses, mandating both theoretical and engineering advances for verifiably private unlearning protocols (Zhou et al., 20 Feb 2025, Zhang et al., 16 May 2025, Lamri et al., 2 Jun 2025).

References (3)

Model Inversion Attack against Federated Unlearning (2025)

Verifiably Forgotten? Gradient Differences Still Enable Data Reconstruction in Federated Unlearning (2025)

DRAUN: An Algorithm-Agnostic Data Reconstruction Attack on Federated Unlearning Systems (2025)
