Deep Leakage from Gradients (DLG)

Updated 9 December 2025
  • DLG is a privacy attack that leverages the inversion of shared model gradients to accurately reconstruct sensitive training data, including inputs and labels.
  • Empirical results demonstrate that methods like iDLG and gradient-guided diffusion can achieve up to 99% recovery rates in various modalities such as vision and language.
  • Research on DLG emphasizes defenses like gradient perturbation, differential privacy, and homomorphic encryption, each balancing privacy protection with model utility.

Deep Leakage from Gradients (DLG) denotes a class of privacy attacks in distributed and federated learning where shared gradients can be inverted to recover private training data, exposing major vulnerabilities in collaborative machine intelligence workflows. DLG, first demonstrated by Zhu et al. and extended in subsequent work, exploits the fact that the full gradients of even a single example, given a known model architecture and weights, encode sufficient information to reconstruct the original input (and, in many cases, its label) with high fidelity. This phenomenon spans vision, language, and structured data, and critically undermines assumptions about the privacy-preserving nature of gradient sharing.

1. Threat Model and Formal DLG Attack

The standard threat model involves a passive adversary (typically the aggregation server or a dishonest client) in federated learning (FL), where each client holds local data and periodically uploads model gradients. Denote the model by $f_\theta$, a data sample by $(x_i, y_i)$, and the loss function by $\ell$. The client computes the per-sample or mini-batch gradient $g_i = \nabla_\theta\,\ell(f_\theta(x_i), y_i)$ and uploads $g_i$ to the server. The DLG attacker, assumed to know the model architecture and current weights, seeks dummy inputs $(x', y')$ whose gradient under the same forward-backward map matches the observed $g_i$: $(x^*, y^*) = \arg\min_{x',\,y'} \|\nabla_\theta\,\ell(f_\theta(x'), y') - g_i\|_2^2$. Optimization proceeds via alternating or simultaneous gradient-based updates on $(x', y')$. In the refinement known as iDLG, the label $y^*$ can be extracted in closed form for networks using a final linear layer and cross-entropy loss (Zhao et al., 2020). These attacks extend naturally to mini-batches, vision architectures, and NLP models (Zhu et al., 2019, Li et al., 3 Jun 2024).
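As a concrete illustration, the following is a minimal PyTorch sketch of this gradient-matching optimization, not the reference implementation of Zhu et al.; the names `victim_model` and `observed_grads` and the L-BFGS settings are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dlg_attack(victim_model, observed_grads, input_shape, num_classes,
               iters=300, lr=1.0):
    """Reconstruct a single example from its shared gradient (classic DLG).

    victim_model   : model with the same architecture and weights as the victim
    observed_grads : list of gradient tensors g_i uploaded by the client
    """
    # Dummy input and soft label, optimized jointly.
    x_dummy = torch.randn(1, *input_shape, requires_grad=True)
    y_dummy = torch.randn(1, num_classes, requires_grad=True)
    optimizer = torch.optim.LBFGS([x_dummy, y_dummy], lr=lr)

    for _ in range(iters):
        def closure():
            optimizer.zero_grad()
            logits = victim_model(x_dummy)
            # Cross-entropy between the model output and the soft dummy label.
            loss = torch.sum(-F.softmax(y_dummy, dim=-1)
                             * F.log_softmax(logits, dim=-1))
            dummy_grads = torch.autograd.grad(
                loss, victim_model.parameters(), create_graph=True)
            # Gradient-matching objective: || grad(x', y') - g_i ||^2
            grad_diff = sum(((dg - og) ** 2).sum()
                            for dg, og in zip(dummy_grads, observed_grads))
            grad_diff.backward()
            return grad_diff
        optimizer.step(closure)

    return x_dummy.detach(), F.softmax(y_dummy, dim=-1).detach()
```

In the iDLG variant, the label is instead read off in closed form from the sign pattern of the final layer's gradient, so only `x_dummy` needs to be optimized, which typically speeds up and stabilizes convergence.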

2. Theory: Solvability, Architecture, and Information Pathways

DLG's practical and theoretical power stems from the (often invertible or low-dimensional) structure linking input data to weight gradients. Analytic work (Chen et al., 2022, Chen et al., 2021) demonstrates that fully connected layers with bias allow closed-form input recovery from their gradients: $x_l = \frac{\partial L/\partial w_{kl}}{\partial L/\partial b_k}$ for any output neuron $k$ with nonzero bias gradient. For convolutional layers, inversion reduces to a (possibly underdetermined) linear system whose rank depends on channel count and kernel structure, quantified via security metrics such as $c(M)$, which count rank deficiencies per layer.
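A short sketch of this closed-form recovery, assuming the leaked gradients of a bias-equipped first fully connected layer are available as tensors; the tensor and key names are illustrative.

```python
import torch

def recover_fc_input(grad_W, grad_b, eps=1e-9):
    """Closed-form input recovery for a fully connected layer with bias.

    For y = W x + b, dL/dW[k, l] = dL/db[k] * x[l], so any output neuron k
    with a nonzero bias gradient reveals the layer input exactly:
        x[l] = (dL/dW[k, l]) / (dL/db[k]).
    """
    k = torch.argmax(grad_b.abs())               # neuron with largest |dL/db|
    if grad_b[k].abs() < eps:
        raise ValueError("all bias gradients are numerically zero")
    return grad_W[k] / grad_b[k]

# Usage (hypothetical layer names): if the first layer of an MLP is "fc1",
# x_hat = recover_fc_input(grads["fc1.weight"], grads["fc1.bias"])
```

When the attacked layer is the first layer of the network, the recovered vector is the raw input itself rather than an intermediate feature.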

The vulnerability is modulated by architectural choices:

  • Networks with “wide” early convolutions (more channels than spatial pixels) or aggressive feature expansion in initial layers have higher rank deficiency and are less prone to full inversion.
  • Piecewise-invertible activations, insufficient mixing, or deterministic mappings enhance invertibility (and thus leakage risk).

Gradient leakage of attributes (not just raw inputs) is governed by how distinct the gradient subspaces of samples with and without a given attribute are, as measured by Grassmannian principal angles (Mo et al., 2021).
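The following sketch illustrates how such subspace distinctness could be measured, assuming per-sample gradients (e.g., of a single layer, flattened) are stacked as columns of two matrices for samples with and without the attribute; the function and argument names are illustrative.

```python
import numpy as np
from scipy.linalg import subspace_angles

def attribute_gradient_distinctness(grads_with_attr, grads_without_attr):
    """Smallest principal (Grassmannian) angle between two gradient subspaces.

    Both inputs have shape (n_params, n_samples): each column is one sample's
    flattened gradient. A larger smallest angle means the two groups occupy
    more distinct directions in gradient space, suggesting higher attribute
    inference risk.
    """
    angles = subspace_angles(grads_with_attr, grads_without_attr)
    return float(angles.min())
```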

3. Empirical Severity and New Gradient Inversion Techniques

Canonical DLG (and iDLG) attacks reconstruct 32×32 or 64×64 images with pixelwise accuracy and exceed 90–99% label and data recovery rates under single-sample gradient sharing (Zhu et al., 2019, Mu, 2022, Zhao et al., 2020). Recent generative advances have dramatically increased attack power:

  • Gradient-Guided Diffusion Models: Fine-tuning diffusion models using gradient-matching loss enables visually perfect recovery of images up to 512×512 pixels (Meng et al., 13 Jun 2024). These methods dramatically outperform direct optimization, reducing mean squared error (MSE) by an order of magnitude and producing near-indistinguishable high-resolution reconstructions.
  • Gradient Inversion Transcript (GIT): Training a generative model (architecture-aligned to the victim) as an inversion operator yields reconstructive attacks that reduce MSE 3–7× versus classical DLG, with 10⁶× faster inference and robustness to gradient noise, batch mismatch, and domain shift (Chen et al., 26 May 2025).
  • Partial-Gradient Attacks for Transformers: Leakage persists even when an adversary only observes gradients from a single sub-component (e.g., a single attention matrix) in a Transformer; partial gradients comprising as little as 0.5% of model parameters suffice for >50% recovery of the true input sequence (Li et al., 3 Jun 2024).

Across the surveyed literature, empirical leakage metrics such as MSE, PSNR, SSIM, and LPIPS consistently show that DLG, especially when paired with generative or architectural priors, achieves high-fidelity data exposure across vision and NLP tasks.
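The common core of these generative attacks can be sketched as follows: rather than optimizing raw pixels, the attacker optimizes the latent code of a pretrained image prior so that the generated image reproduces the observed gradient. The `generator` interface below is a generic stand-in, not the API of any specific diffusion or GAN model from the cited papers.

```python
import torch

def gradient_guided_inversion(generator, victim_model, loss_fn, label,
                              observed_grads, latent_dim=512,
                              steps=500, lr=0.05):
    """Optimize a latent code z so that grad(loss(model(G(z)), label))
    matches the leaked gradient g_i."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_hat = generator(z)                      # image proposed by the prior
        loss = loss_fn(victim_model(x_hat), label)
        dummy_grads = torch.autograd.grad(
            loss, victim_model.parameters(), create_graph=True)
        match = sum(((dg - og) ** 2).sum()
                    for dg, og in zip(dummy_grads, observed_grads))
        match.backward()                          # backprop into z through G
        opt.step()
    return generator(z).detach()
```

Searching in the generator's latent space constrains reconstructions to the natural-image manifold, which is a key reason these attacks tolerate noise and scale to higher resolutions than pixel-space DLG.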

4. Defenses: Approaches, Efficacy, and Limitations

Numerous countermeasures have been developed, spanning cryptographic, architectural, and noise-based methods:

Gradient Perturbation and Sparsification

  • Random Masking (Dropout on Gradients): Zero-masking a random p=0.4 fraction of gradient entries forces DLG reconstructions’ SSIM below 0.2 (human-unrecognizable) with <2% accuracy loss (Kim et al., 15 Aug 2024). Clipping top gradients is partially effective; pruning or noising must be far more aggressive and often destroys utility. A combined sketch of masking, pruning, and noising appears after this list.
  • Gradient Pruning/Compression: Zeroing 30% or more of small-magnitude entries can prevent reconstruction with minimal model accuracy loss if error compensation is used (Zhu et al., 2019).
  • Differential Privacy (DP-SGD): Gaussian noise injection can thwart classical DLG, but generative and partial-gradient attacks withstand DP-SGD up to noise levels (σ = 0.3–0.5) that collapse model accuracy (Li et al., 3 Jun 2024, Meng et al., 13 Jun 2024). Gaussian noise with σ ≥ 10⁻² suffices to block basic DLG, but more sophisticated attacks can still recover data at such moderate noise levels.
  • Dropout Layers: Adding dropout (p=0.3–0.5) prior to classification heads raises DLG reconstruction error by ~10–15% with negligible utility impact for small models (Zheng, 2021).
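A combined sketch of these client-side perturbations, assuming plain PyTorch tensors; the parameter defaults echo the figures quoted above and are otherwise illustrative (note that full DP-SGD additionally clips per-sample gradients before adding noise).

```python
import torch

def perturb_gradient(grad, mask_p=0.4, prune_ratio=0.0, noise_sigma=0.0):
    """Perturb one gradient tensor before upload: random masking,
    magnitude pruning, and/or additive Gaussian noise."""
    g = grad.clone()
    if mask_p > 0:                                   # random zero-masking
        g = g * (torch.rand_like(g) >= mask_p).float()
    if prune_ratio > 0:                              # drop smallest entries
        k = int(prune_ratio * g.numel())
        if k > 0:
            threshold = g.abs().flatten().kthvalue(k).values
            g = torch.where(g.abs() <= threshold, torch.zeros_like(g), g)
    if noise_sigma > 0:                              # DP-style Gaussian noise
        g = g + noise_sigma * torch.randn_like(g)
    return g

# Usage: shared = [perturb_gradient(p.grad, mask_p=0.4) for p in model.parameters()]
```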

Architectural and In-Representation Defenses

  • Leveled Homomorphic Encryption (HE): Encrypting only the most sensitive (top r% by magnitude or Hessian) gradient coordinates using CKKS/BFV schemes blocks DLG attacks nearly completely for r≥10%, with only minor latency (+23%) and communication (30 KB) overhead and ≤0.2 percentage point loss in accuracy. Full encryption imposes significant overhead (8–10×) (Najjar et al., 9 Jun 2025).
  • Representation Perturbation/Projection: Targeted perturbations of the representation layer, guided by the sensitivity (leverage) of each representation dimension, can increase DLG-induced MSE by >160× without accuracy loss, outperforming DP and baseline pruning (Sun et al., 2020).
  • PRECODE Module: Stochastic variational bottleneck layers inserted between features and classifier sever the deterministic input-to-gradient mapping, dropping attack success rate to zero with <1% test-accuracy loss (Scheliga et al., 2021); a minimal sketch of such a bottleneck follows this list.
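A minimal sketch of such a stochastic bottleneck, assuming a standard reparameterized Gaussian layer between the feature extractor and the classifier head; the dimensions and KL weight are illustrative, not the exact settings of Scheliga et al.

```python
import torch
import torch.nn as nn

class VariationalBottleneck(nn.Module):
    """Stochastic bottleneck inserted between features and classifier.

    Sampling breaks the deterministic feature-to-gradient mapping that
    gradient inversion exploits; the KL term keeps the latent code regular.
    """
    def __init__(self, in_dim, bottleneck_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, bottleneck_dim)
        self.logvar = nn.Linear(in_dim, bottleneck_dim)
        self.out = nn.Linear(bottleneck_dim, in_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return self.out(z), kl

# Training objective (illustrative): total_loss = task_loss + beta * kl
```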

FL Protocol & Data-Handling Strategies

  • Gradient Averaging (FedAvg): Aggregating gradients across ≥30 local samples before sharing reduces information leakage (for both input reconstruction and attribute inference) by >90% (Mo et al., 2021); a minimal sketch appears after this list.
  • Batch/Group Diversification: Curating data to maximize spatial variance in location-based FL can mislead DLG into reconstructing only latent centroids, significantly degrading attacker accuracy (Bakopoulou et al., 2021).
  • Batch Size Tuning in Logistic Regression: In binary domains, unique reconstruction is possible only when the batch is small and the resulting linear system has full column rank; increasing batch size or aggregating multiple updates introduces ambiguity (Li et al., 2019).
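A small sketch of the aggregation idea, assuming a standard PyTorch training step in which only the batch-averaged gradient ever leaves the client; the minimum-sample threshold mirrors the figure quoted above and is otherwise illustrative.

```python
import torch

def averaged_client_update(model, loss_fn, batch_x, batch_y, min_samples=30):
    """Compute one gradient per parameter, averaged over the whole local
    batch, instead of exposing any per-sample gradient.
    Assumes loss_fn averages over the batch (e.g. reduction='mean')."""
    if batch_x.shape[0] < min_samples:
        raise ValueError("local batch too small to share under this policy")
    model.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()          # .grad now holds the batch-averaged gradient
    return [p.grad.detach().clone() for p in model.parameters()]
```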

Security Metrics and Attack Analysis

  • Layerwise rank-deficiency metrics ($c(M)$) and sensitivity analyses can anticipate architecture-dependent vulnerability (Chen et al., 2022, Chen et al., 2021, Mo et al., 2021). The Inversion Influence Function (I²F) formalism predicts which gradient perturbations most degrade attacker performance and highlights unfairness in instance-level protection (Zhang et al., 2023).
  • Bayesian analysis connects DLG and its variants to MAP inference under various priors and noise models, highlighting the inescapable theoretical limits of attack and defense (Balunović et al., 2021).

5. Layerwise, Modality-Specific, and Partial-Gradient Leakage

DLG affects not only full models but also isolated layers and linear submodules:

  • Layerwise Analysis: Early convolutional layers exhibit the highest Jacobian sensitivity for input reconstruction, while attribute inference risk is often concentrated in the first fully connected layer (Mo et al., 2021).
  • Partial Gradients: NLP models are vulnerable even with gradients from just a single linear submodule (e.g., Transformer Q/K/V matrices), making naive parameter freezing or selective updating insufficient (Li et al., 3 Jun 2024); a rank-based intuition is sketched after this list.
  • Multi-Modality: The methodology—gradient-matching inversion—is agnostic to modality and applies in vision (images), language (tokens), geospatial (location), and structured (binary) domains.
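A rank-based intuition for why even one linear submodule leaks, assuming the PyTorch nn.Linear convention y = x W^T + b and a single input vector; for a batch or a token sequence the gradient instead spans a low-dimensional subspace containing the inputs.

```python
import torch

def input_direction_from_linear_grad(grad_W):
    """For a single input x through y = x @ W.T, dL/dW = (dL/dy)^T x is a
    rank-1 matrix whose row space is spanned by x, so the top right-singular
    vector of the leaked gradient recovers x up to scale and sign."""
    _, _, Vh = torch.linalg.svd(grad_W, full_matrices=False)
    return Vh[0]   # unit vector proportional to the submodule's input
```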

6. Open Problems, Limitations, and Future Directions

  • Scaling DLG to Large Batches/High Resolution: Although DLG is most effective at batch size one, generative and diffusion-based attacks now achieve high-resolution leakage, undermining the assumption that larger batches or higher-resolution data are inherently safe (Meng et al., 13 Jun 2024, Chen et al., 26 May 2025).
  • Trade-off between Privacy and Utility: Most defenses (DP, masking, noising) degrade accuracy at strong protection levels; highly targeted defenses (representation perturbation, PRECODE, selective HE) promise better trade-offs but require model changes or additional compute.
  • Adaptive, Contextualized Defense: Attack power varies widely across samples, training epochs, and network initializations. Future mitigations will need to be data- and architecture-aware, possibly using online sensitivity/attack-metering tools such as the I²F indicator.
  • Theoretical Guarantees and Certifiable Defenses: Only a handful of techniques (e.g., representation perturbation with proven bounds (Sun et al., 2020)) provide certified privacy/utility guarantees against DLG.
  • Cryptography at Scale: Homomorphic encryption and secure aggregation are privacy-optimal but often impractical for latency- and compute-constrained edge deployments; hybrid selective/leveled approaches are a current focus (Najjar et al., 9 Jun 2025).

7. Summary Table: Defenses and Their Efficacy vs. DLG

| Defense (class/method) | Blocks DLG? (metric) | Utility loss | Key hyperparameter(s) | arXiv reference |
|---|---|---|---|---|
| Gradient masking (p = 0.4) | Yes (SSIM < 0.2) | <2% accuracy drop | Masking fraction p | (Kim et al., 15 Aug 2024) |
| Leveled homomorphic encryption (r = 10%) | Yes (MSE > 0.15) | <0.2 pp accuracy drop | r (% of gradients encrypted) | (Najjar et al., 9 Jun 2025) |
| Differential privacy (σ ≥ 0.3) | Partial (generative DLG robust) | High | σ (noise scale) | (Meng et al., 13 Jun 2024) |
| PRECODE / variational bottleneck | Yes (attack success rate = 0) | <1% accuracy drop | β (KL weight), k (bottleneck dim) | (Scheliga et al., 2021) |
| Representation perturbation | Yes (MSE ↑ >160×) | None | ε (sparse perturbation) | (Sun et al., 2020) |
| Dropout (p = 0.3–0.5) | Partial (RMSE ↑ 10–15%) | Minor | Dropout p | (Zheng, 2021) |
| Gradient pruning (≥30%) | Yes (artifacts, unrecoverable) | <1% accuracy drop | Pruning ratio | (Zhu et al., 2019) |

Defending against Deep Leakage from Gradients remains an open and evolving research area: adversaries can reconstruct training data with high fidelity from minimal side information, demanding multi-pronged, context-sensitive defenses that respect accuracy, latency, and integration constraints. Research emphasis is shifting toward theoretically guided, architecture- and data-aware countermeasures and layered protocols that combine statistical and cryptographic protections.
