
Differentially Private Manifold Denoising

Published 1 Apr 2026 in cs.LG, cs.CR, and math.ST | (2604.00942v1)

Abstract: We introduce a differentially private manifold denoising framework that allows users to exploit sensitive reference datasets to correct noisy, non-private query points without compromising privacy. The method follows an iterative procedure that (i) privately estimates local means and tangent geometry using the reference data under calibrated sensitivity, (ii) projects query points along the privately estimated subspace toward the local mean via corrective steps at each iteration, and (iii) performs rigorous privacy accounting across iterations and queries using $(\varepsilon,\delta)$-differential privacy (DP). Conceptually, this framework brings differential privacy to manifold methods, retaining sufficient geometric signal for downstream tasks such as embedding, clustering, and visualization, while providing formal DP guarantees for the reference data. Practically, the procedure is modular and scalable, separating DP-protected local geometry (means and tangents) from budgeted query-point updates, with a simple scheduler allocating privacy budget across iterations and queries. Under standard assumptions on manifold regularity, sampling density, and measurement noise, we establish high-probability utility guarantees showing that corrected queries converge toward the manifold at a non-asymptotic rate governed by sample size, noise level, bandwidth, and the privacy budget. Simulations and case studies demonstrate accurate signal recovery under moderate privacy budgets, illustrating clear utility-privacy trade-offs and providing a deployable DP component for manifold-based workflows in regulated environments without reengineering privacy systems.

Authors (3)

Summary

  • The paper introduces a novel DP framework for manifold denoising, jointly managing measurement, privacy, and intrinsic geometric uncertainties.
  • It employs a DP local PCA for tangent space estimation, achieving near-optimal recovery with theoretical error bounds that balance noise and curvature effects.
  • Empirical results on biomedical and single-cell datasets demonstrate that the approach preserves local geometry and prediction accuracy under moderate privacy budgets.

Differentially Private Manifold Denoising: Technical Analysis

Introduction and Problem Setting

The manuscript "Differentially Private Manifold Denoising" (2604.00942) addresses the challenge of leveraging latent manifold structure for denoising high-dimensional data in settings where geometric reference datasets are sensitive and protected by differential privacy (DP) constraints. The authors formulate a practical and theoretically grounded framework enabling iterative correction of noisy query points using privatized geometric signals derived from a reference cohort, while rigorously accounting for privacy loss across queries and iterations.

The core technical objective is to jointly manage three sources of uncertainty:

  1. Measurement noise in both references and queries,
  2. Privacy-induced noise, introduced to geometric summaries for $(\varepsilon, \delta)$-DP,
  3. Intrinsic geometric error owing to manifold curvature and finite sample estimation.

This work occupies a critical intersection between geometry-driven latent structure inference and the formal privacy regime required by HIPAA, GDPR, and other regulations, where existing manifold denoising strategies either fail to protect confidentiality or are rendered ineffective by privacy noise.

Framework: Differentially Private Local Geometry Estimation

The denoising methodology proceeds in two tightly coupled phases: (i) DP estimation and release of local geometric surrogates; (ii) correction of queries using these privatized objects.

The reference data $\{\mathbf{y}_i\}_{i=1}^n \subset \mathbb{R}^D$ comprises noisy samples near a $d$-dimensional $C^2$ manifold $\mathcal{M}$, with independent query points $\mathbf{z} \in \mathbb{R}^D$ submitted for correction.

DP Local PCA

Local tangent space geometry is recovered using a variant of kernelized principal component analysis (kPCA), computed over reference neighborhoods $B_D(\mathbf{z}, h)$. The algorithm privatizes:

  • The empirical tangent projector (the rank-$d$ spectral projector of the local covariance),
  • The kernel-weighted local mean.

Sensitivity analysis shows that the Frobenius-norm change in these summaries under a replace-one operation is $O((nh^d)^{-1})$ for the projector and $O((nh^{d-1})^{-1})$ for the mean, for a suitably chosen bandwidth $h$. Each summary is privatized via an independently calibrated Gaussian mechanism, with post-processing (returning the top-$d$ eigenspace of the noisy covariance) preserving privacy under arbitrary further computation.
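This step can be sketched with plain numpy. The function name `dp_local_tangent`, the explicit noise scales `sigma_proj` and `sigma_mean`, and the hard-threshold neighborhood are illustrative stand-ins for the paper's calibrated Gaussian mechanisms, not the authors' exact implementation:

```python
import numpy as np

def dp_local_tangent(Y, z, h, d, sigma_proj, sigma_mean, rng=None):
    """Privatize the local mean and rank-d tangent projector around query z.

    Hypothetical sketch: Y is the (n, D) reference sample, h the bandwidth,
    and the sigma_* noise scales would be calibrated from the replace-one
    sensitivities O((n h^d)^-1) and O((n h^(d-1))^-1) in practice.
    """
    rng = np.random.default_rng(rng)
    nbrs = Y[np.linalg.norm(Y - z, axis=1) <= h]        # neighborhood B_D(z, h)
    mu = nbrs.mean(axis=0)
    cov = np.cov(nbrs.T, bias=True)
    # Gaussian mechanism: add symmetric noise, then post-process (privacy-safe).
    E = rng.normal(0.0, sigma_proj, cov.shape)
    cov_priv = cov + (E + E.T) / 2.0
    mu_priv = mu + rng.normal(0.0, sigma_mean, mu.shape)
    # Post-processing: top-d eigenspace of the noisy covariance.
    w, V = np.linalg.eigh(cov_priv)                     # ascending eigenvalues
    U = V[:, np.argsort(w)[::-1][:d]]
    return mu_priv, U @ U.T                             # local mean, tangent projector
```

Because the eigendecomposition is post-processing of an already-privatized matrix, the returned projector inherits the Gaussian mechanism's guarantee.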

Theoretical results (Theorem 1) establish a bound on the principal-angle distance between the privatized tangent estimator and the true tangent space, in which the first two terms reflect geometric and measurement errors and the last term is the privacy penalty, scaling optimally in the ambient dimension $D$, the sample size, and the privacy budget (Figure 1).

Figure 1: Synthetic manifold denoising results show effective recovery for canonical geometries (circle, torus, Swiss roll, sphere) across multiple error regimes and noise scales.

Iterative Manifold Denoising Under Privacy

The denoising operator, motivated by the normal bias decomposition [yao2025manifold], implements a fixed-point update

$$\mathbf{z}_{t+1} = \mathbf{z}_t + (I_D - \hat{\Pi}_t)(\hat{\boldsymbol{\mu}}_t - \mathbf{z}_t),$$

where $\hat{\Pi}_t$ is the privatized local tangent projector (from the weighted sum of reference projectors) and $\hat{\boldsymbol{\mu}}_t$ is the privatized local mean. All privacy noise is introduced through these low-dimensional surrogates.
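A minimal sketch of this iteration, assuming a hypothetical oracle `release(z)` that returns the privatized local mean and tangent projector for the current iterate (in the paper's setting each call would consume privacy budget):

```python
import numpy as np

def denoise_query(z, release, n_iters=10):
    """Iteratively correct a noisy query point.

    At each step, move z toward the privatized local mean along the normal
    space, i.e. the orthogonal complement of the privatized tangent projector.
    `release(z)` is a hypothetical stand-in for the DP release mechanism.
    """
    z = np.asarray(z, dtype=float)
    for _ in range(n_iters):
        mu, Pi = release(z)
        z = z + (np.eye(z.size) - Pi) @ (mu - z)   # normal-direction correction
    return z
```

For intuition, with a noiseless oracle on the unit circle the iterate snaps onto the manifold, since the correction acts purely in the radial (normal) direction.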

Compositional privacy accounting is conducted using zero-concentrated DP (zCDP), allowing a modular split of the overall budget across iterations and queries and precise tracking of cumulative privacy loss under additive composition.
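A sketch of this bookkeeping, assuming an even split of the total budget and the standard zCDP-to-$(\varepsilon,\delta)$ conversion; the helper names are illustrative, not the paper's scheduler API:

```python
import math

def split_budget(rho_total, n_iters, n_queries):
    """Even additive split of a total zCDP budget across all released summaries."""
    return rho_total / (n_iters * n_queries)

def zcdp_to_dp(rho, delta):
    """Standard conversion: rho-zCDP implies
    (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

# Example: total budget rho = 0.5 spread over 10 iterations x 4 queries,
# with the overall guarantee reported at delta = 1e-5.
rho_step = split_budget(0.5, 10, 4)
eps_total = zcdp_to_dp(0.5, 1e-5)
```

Additive composition makes the split trivial: each released projector/mean pair is charged its per-step `rho_step`, and the sums over steps never exceed `rho_total`.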

Theoretical guarantees (Theorem 2) show that the denoised query converges to the manifold, with error bounded by geometric and measurement terms plus contributions from the privacy noise scales of the projector and the mean, respectively (Figure 2).

Figure 3: Robustness of denoising performance with respect to noise scale, sample size, and ambient dimension across canonical manifolds under Gaussian noise.

Empirical simulations show that with moderate privacy budgets, the DP denoiser achieves utility comparable to its non-private analogue and consistently preserves local and global geometric structure, even in severe high-curvature regimes and at high ambient dimension $D$ (Figure 4).

Figure 2: Scalability results on high-dimensional spheres demonstrate that error remains stable as the ambient dimension $D$ increases, with controlled computation cost and neighborhood size.

Empirical Analyses: Biomedical and Single-Cell Applications

The DP manifold denoising pipeline is validated on two high-dimensional real-world tasks:

UK Biobank Clinical Biomarkers

Application to UK Biobank biomarker profiles (60 dimensions) demonstrates that DP manifold denoising preserves local geometry, maintains subject-level stability, and yields consistent or improved discrimination in downstream Cox models for disease risk prediction, even under privacy constraints. The noise-induced privacy-utility tradeoff is quantified across clinically meaningful endpoints (Figure 3).

Figure 4: Manifold denoising of UK Biobank data maintains local geometric stability and supports robust downstream risk stratification across numerous disease endpoints.

Figure 6: Full ICD-coded endpoint panel; denoising shifts are consistent across all clinical outcomes of interest, indicating broad compatibility for risk modeling.

Single-Cell RNA-Seq

In single-cell RNA-seq datasets, DP denoising systematically improves clustering quality relative to the raw input expression matrices, closely tracking non-private denoising in both adjusted Rand index (ARI) and normalized mutual information (NMI) across homogeneous and complex tissues. The effect sizes are consistent across datasets with varying sparsity and cell-type composition.

Privacy Versus Utility: Tradeoff and Optimization

The modular zCDP-conforming budget allocation enables explicit management of the privacy-utility tradeoff. Empirical curves show sharp reductions in error as the privacy budget increases, with saturation at practically relevant budget values. The results are stable with respect to the complexity of the underlying geometry and the data dimensionality (Figure 6).

Figure 5: Privacy–utility tradeoff curves on Swiss roll and torus demonstrate that utility loss from privacy saturates rapidly, with near-optimal accuracy attained for moderate budgets.

Theoretical and Practical Implications

This framework provides the first rigorous, non-asymptotic utility bounds for geometry-aware denoising on unknown manifolds under formal DP, with explicit separation of curvature, measurement, and privacy terms. Its mechanisms are computationally efficient, with per-iteration cost sublinear in both the sample size and the ambient dimension under parallelization, and directly transferable to regulated high-dimensional applications. Notably, geometric errors do not substantially inflate under privacy, provided the neighborhood size and noise scales are set according to the theoretical recommendations.

Key theoretical insights:

  • The sensitivity of geometric surrogates is governed by local neighborhood cardinality and bandwidth, following the minimax scaling for manifold estimation,
  • Privacy noise does not induce structural collapse, as projector noise primarily perturbs only the normal correction rather than intrinsic geometry,
  • The modular design allows implementation under general privacy accounting (standard DP, RDP, zCDP, or Gaussian DP).
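To make the first insight concrete, a textbook Gaussian-mechanism calibration can be sketched. The formula $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $\varepsilon \le 1$) is the classical choice, not necessarily the paper's exact calibration, and the constant in `mean_sensitivity` is a hypothetical placeholder:

```python
import math

def gaussian_sigma(sensitivity, eps, delta):
    """Classical Gaussian-mechanism noise scale (valid for eps <= 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def mean_sensitivity(n, h, d, c=1.0):
    """Illustrative O((n * h**(d-1))**-1) replace-one sensitivity of the
    kernel-weighted local mean; c is a hypothetical constant."""
    return c / (n * h ** (d - 1))

# Larger neighborhoods (bigger n or h) shrink the sensitivity, so less noise
# is needed at the same (eps, delta) -- the scaling behind the first insight.
```

This makes the minimax flavor of the insight visible: the privacy noise added to the local mean decays at the same rate as the statistical estimation error as the effective neighborhood grows.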

Open Problems and Future Directions

The study remarks on several open challenges:

  • Eliminating the leading-order noise dependence from the utility bound via higher order cancellation or improved geometric surrogates,
  • Extending to unbounded (e.g., Gaussian) or heavy-tailed noise models, possibly integrating robust DP statistics,
  • Defining and privately releasing global manifold representations (meshes, implicit functions) beyond discrete point corrections,
  • Generalizing to privacy of both reference and queries (two-sided or interactive DP).

Conclusion

This work delivers a comprehensive framework for manifold denoising with rigorous differential privacy guarantees, achieving strong empirical and theoretical performance across synthetic, biomedical, and omics datasets. The methods are immediately deployable for privacy-sensitive geometric workflows and establish foundational limits for privacy-preserving inference under the manifold hypothesis.

Reference: "Differentially Private Manifold Denoising" (2604.00942)
