DistanceDP Mechanism

Updated 7 May 2026
  • DistanceDP is a class of privacy mechanisms that adjust guarantees based on input distance, using metrics such as Euclidean, Earth Mover’s, and graph edit distances.
  • It features concrete instantiations like isotropic noise for vector embeddings, Gaussian noise for user data, and recursive graph decompositions for private distance queries.
  • Empirical studies show that DistanceDP offers improved utility-privacy tradeoffs over classical DP, enabling efficient retrieval, query release, and federated analytics.

DistanceDP refers to a class of privacy mechanisms that use a distance metric, typically defined on inputs such as vectors, graphs, or empirical distributions, to calibrate the tradeoff between privacy loss and data perturbation. Rather than applying a uniform privacy guarantee to all neighboring datasets as in standard differential privacy (DP), DistanceDP provides a parameterized guarantee that degrades gracefully with the distance between inputs. This structure gives stronger privacy guarantees when inputs are close and permits more information leakage at greater distances. DistanceDP has emerged in several technical domains: high-dimensional embedding privacy for retrieval-augmented generation and search, metric or graph-valued data, and user-level privacy with heterogeneous data contributions.

1. Formal Definitions and Core Principles

The general paradigm of DistanceDP is to relax the standard neighboring-dataset definition of DP by introducing a metric $d(\cdot, \cdot)$ over the data domain. A randomized mechanism $M$ satisfies $(\epsilon, \delta)$-DistanceDP with respect to metric $d$ if for all inputs $x, x'$ and measurable output sets $S$,

$$\Pr[M(x) \in S] \leq \exp(\epsilon\,d(x, x'))\,\Pr[M(x') \in S] + \delta$$

When $d(x, x')$ is small, privacy leakage is tightly controlled; for large $d(x, x')$, the guarantee weakens proportionally.
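
As a minimal illustration of the definition with $\delta = 0$, the sketch below checks the inequality numerically for a one-dimensional Laplace mechanism under the metric $d(x, x') = |x - x'|$. The mechanism, the grid of outputs, and the function names are assumptions chosen for illustration; they are not taken from the cited works.

```python
import numpy as np

def laplace_log_density(y, x, eps):
    """Log density at y of the mechanism M(x) = x + Laplace(scale=1/eps)."""
    return np.log(0.5 * eps) - eps * np.abs(y - x)

def max_log_ratio(x, x_prime, eps, grid):
    """Largest value of log p(y|x) - log p(y|x') over a grid of outputs y."""
    diff = laplace_log_density(grid, x, eps) - laplace_log_density(grid, x_prime, eps)
    return float(np.max(diff))

eps = 0.5
grid = np.linspace(-20.0, 20.0, 4001)   # candidate outputs y
pairs = [(0.0, 0.1), (0.0, 1.0), (0.0, 5.0)]
results = []
for x, x_prime in pairs:
    bound = eps * abs(x - x_prime)      # privacy loss permitted at this distance
    worst = max_log_ratio(x, x_prime, eps, grid)
    results.append((worst, bound))
    print(f"d={abs(x - x_prime):.1f}  worst log-ratio={worst:.3f}  bound={bound:.3f}")
```

The worst-case log-ratio never exceeds $\epsilon\,d(x, x')$, so nearby inputs are nearly indistinguishable while distant inputs are allowed to be told apart.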

Concrete instantiations include:

  • Euclidean metric for vectors (as in $(n,\epsilon)$-DistanceDP): Applied in embedding perturbation for LLM queries, MM0 (Cheng et al., 2024).
  • Earth Mover’s Distance (EMD) for user datasets: Captures both magnitude and spatial discrepancy between multisets of data items (Imola et al., 2024).
  • Graph edit distances: Used in graph-private distance queries, either with symmetric (edge addition/removal) or asymmetric (preferential monotonicity) neighborhoods (Sheng et al., 14 Jan 2025).

The key distinction from classical DP is the parameterization of privacy costs by input distance, enabling nuanced privacy–utility tuning that aligns with the inherent structure of the data.

2. Mechanistic Instantiations

(a) Euclidean DistanceDP for Vector Embeddings

In the $(n,\epsilon)$-DistanceDP model, the input space is MM2 and privacy loss is scaled by Euclidean embedding distance. The canonical perturbation is isotropic noise with radial density proportional to MM3:

  • Perturbation Process:
    • Sample MM4.
    • Sample MM5 uniformly.
    • Output MM6, where MM7 is the original embedding (Cheng et al., 2024).
  • Guarantee: This mechanism ensures for any MM8,

MM9

for all output embeddings (ϵ,δ)(\epsilon, \delta)0.
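
The radius-then-direction procedure can be sketched as follows. This is a hedged example, not the cited implementation: it assumes the noise density is proportional to $\exp(-\epsilon \|z\|_2)$ in $\mathbb{R}^n$, for which the radius follows a Gamma distribution with shape $n$ and scale $1/\epsilon$. The function name `perturb_embedding` and the parameter values are hypothetical.

```python
import numpy as np

def perturb_embedding(x, eps, rng):
    """Add isotropic noise with density proportional to exp(-eps * ||z||_2)."""
    n = x.shape[0]
    # Assumption: Gamma(shape=n, scale=1/eps) is the radial law of that density in R^n.
    r = rng.gamma(shape=n, scale=1.0 / eps)
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    return x + r * u

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
x /= np.linalg.norm(x)                  # toy unit-norm "embedding"
y = perturb_embedding(x, eps=20.0, rng=rng)

# The mean of Gamma(n, 1/eps) is n / eps, so the average noise radius should be near 0.4.
radii = [np.linalg.norm(perturb_embedding(x, 20.0, rng) - x) for _ in range(2000)]
print(float(np.mean(radii)))
```

Under this assumed noise law the expected perturbation radius is $n/\epsilon$, which makes the dependence of distortion on embedding dimension explicit.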

(b) DistanceDP for User-Level Metric DP

For privacy over user datasets (ϵ,δ)(\epsilon, \delta)1, the metric is the (ϵ,δ)(\epsilon, \delta)2-Wasserstein (Earth Mover's) distance (ϵ,δ)(\epsilon, \delta)3. Mechanisms such as noisy linear query output (for Lipschitz queries) or shuffle-amplified local randomization are used (Imola et al., 2024):

  • Gaussian mechanism for (ϵ,δ)(\epsilon, \delta)4-DP: Add Gaussian noise of scale

(ϵ,δ)(\epsilon, \delta)5

where (ϵ,δ)(\epsilon, \delta)6 is the Lipschitz constant of the query w.r.t. the ground metric.

  • Shuffle-amplified local randomization: Apply an (ϵ,δ)(\epsilon, \delta)7-differentially private local mechanism under (ϵ,δ)(\epsilon, \delta)8, then apply secure shuffling for privacy amplification.
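
A rough sketch of the noisy linear query idea is given below for one-dimensional data. It rests on several assumptions that are not confirmed by the source: the earth mover's distance is taken as the unnormalized transport cost between equal-size multisets, the query is a sum of an $L$-Lipschitz function, and the noise scale uses the classical Gaussian calibration $\sigma = L\sqrt{2\ln(1.25/\delta)}/\epsilon$ per unit of distance, which is only valid while the resulting privacy loss stays below 1. All function names are hypothetical.

```python
import math
import numpy as np

def emd_1d(a, b):
    """Unnormalized earth mover's distance between equal-size multisets of reals."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch only handles equal-size multisets")
    # In one dimension the optimal transport plan matches sorted values.
    return float(np.sum(np.abs(a - b)))

def gaussian_sigma(lipschitz, eps, delta):
    """Assumed classical Gaussian calibration per unit of metric distance."""
    return lipschitz * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def noisy_linear_query(data, f, lipschitz, eps, delta, rng):
    """Release sum_i f(x_i) plus Gaussian noise, for an L-Lipschitz function f."""
    sigma = gaussian_sigma(lipschitz, eps, delta)
    return float(np.sum(f(np.asarray(data, dtype=float))) + rng.normal(0.0, sigma))

rng = np.random.default_rng(0)
a = [0.1, 0.4, 0.9]
b = [0.2, 0.4, 0.7]
d = emd_1d(a, b)
# tanh is 1-Lipschitz, so the exact query values differ by at most 1.0 * d.
gap = abs(float(np.sum(np.tanh(a)) - np.sum(np.tanh(b))))
release = noisy_linear_query(a, np.tanh, lipschitz=1.0, eps=0.5, delta=1e-5, rng=rng)
print(d, gap, release)
```

Because the query moves by at most $L$ times the earth mover's distance, noise calibrated to $L$ protects small changes in a user's data more strongly than large ones.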

(c) DistanceDP in Graph-Structured Data

  • Binary Tree and Separator-Based Mechanisms: For releasing all-pair graph distances, DistanceDP employs recursive graph decompositions using vertex separators, with noise added only to separator distances and composition at logarithmic (tree) depth (Dinitz et al., 4 Apr 2025).
  • Asymmetric Neighborhoods and Smooth Sensitivity: For unweighted graphs, monotonicity is exploited by defining one-sided edge addition/removal as neighbors and calibrating noise to individual smooth sensitivity rather than global sensitivity (Sheng et al., 14 Jan 2025).
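
The monotonicity that asymmetric neighborhoods rely on can be checked on a toy graph: adding an edge to an unweighted graph never increases any shortest-path distance. The sketch below only demonstrates this property with breadth-first search; it does not implement the smooth-sensitivity noise calibration of the cited work, and the helper names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as {node: set(neighbors)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def add_edge(adj, u, v):
    """Return a copy of the graph with the undirected edge (u, v) added."""
    new_adj = {w: set(nbrs) for w, nbrs in adj.items()}
    new_adj[u].add(v)
    new_adj[v].add(u)
    return new_adj

# Path graph 0-1-2-3-4, whose diameter is 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
before = bfs_distances(path, 0)
after = bfs_distances(add_edge(path, 0, 4), 0)
max_drop = max(before[v] - after[v] for v in path)
print(before, after, max_drop)
```

In this example the largest decrease is 3, the path's diameter minus one, and no distance grows. A one-sided neighborhood therefore only has to account for changes in a single direction.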

3. Utility-Privacy Tradeoffs and Quantitative Analysis

In the cited works, DistanceDP mechanisms achieve accuracy-privacy bounds that improve on uniform DP for structured data and metric spaces:

  • Embedding space $(n,\epsilon)$-DistanceDP: For embeddings of typical dimension dd0 or dd1 and privacy parameter dd2, noise magnitude is dd3 and the mechanism can retain nearly dd4 top-dd5 recall with only a moderate increase in search pool size (dd6) (Cheng et al., 2024).
  • Graph all-pairs distances: On recursively separable graphs of dd7 vertices and maximum edge weight dd8, the DistanceDP mechanism gives additive errors dd9 for x,xx, x'0-minor-free graphs, vs x,xx, x'1 for naïve edge-noise. For grid graphs, error is x,xx, x'2 (Dinitz et al., 4 Apr 2025).
  • Earth Mover's DistanceDP: For linear queries under x,xx, x'3-DP, error is x,xx, x'4; for frequency estimation, x,xx, x'5 (Imola et al., 2024).
  • Asymmetric sensitivity in unweighted graphs: Utilizing individual one-sided smooth sensitivity (e.g., diameter minus one for edge addition) drastically reduces noise compared to classical (global) edge-DP. On real graphs, average relative error drops below x,xx, x'6 for x,xx, x'7 and x,xx, x'8 for x,xx, x'9 (Sheng et al., 14 Jan 2025).
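
The trade between perturbation and search pool size in the embedding setting can be explored with a small synthetic experiment. The sketch below is illustrative only: the database, the fixed-radius perturbation, and the pool sizes are hypothetical choices and do not reproduce the settings or results of the cited work.

```python
import numpy as np

def top_k(db, q, k):
    """Indices of the k database rows closest to q in Euclidean distance."""
    d = np.linalg.norm(db - q, axis=1)
    return set(np.argsort(d)[:k].tolist())

def recall_with_pool(db, q, q_noisy, k, pool):
    """Fraction of the true top-k that appears in the top-`pool` of the noisy query."""
    truth = top_k(db, q, k)
    candidates = top_k(db, q_noisy, pool)
    return len(truth & candidates) / k

rng = np.random.default_rng(1)
db = rng.standard_normal((2000, 16))    # synthetic embedding database
q = rng.standard_normal(16)             # true query embedding
noise = rng.standard_normal(16)
q_noisy = q + 0.5 * noise / np.linalg.norm(noise)   # hypothetical radius-0.5 perturbation

recalls = {pool: recall_with_pool(db, q, q_noisy, k=10, pool=pool)
           for pool in (10, 20, 50, 200)}
print(recalls)
```

Recall can only stay the same or rise as the pool grows, since each larger candidate set contains the smaller ones. The client can then re-rank the returned pool locally with the unperturbed query.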

4. Applications: Retrieval Privacy, Graph Data, and Query Release

  • Privacy-Preserving RAG (Retrieval-Augmented Generation): $(n,\epsilon)$-DistanceDP enables privacy against embedding inversion by perturbing user query embeddings before retrieval in cloud RAG services, maintaining full retrieval accuracy with orders-of-magnitude improvements in efficiency and privacy protection (Cheng et al., 2024).
  • All-Pairs Shortest Path Release: DistanceDP generalizes the tree/binary mechanism to arbitrary recursively separable graphs, controlling error by separator size and depth, and remains competitive for both exact and approximate (stretch) distance release (Dinitz et al., 4 Apr 2025).
  • User-Level Heterogeneous Privacy: In scenarios such as federated analytics, metric DP with earth mover's distance allows tunable privacy for changes in users' contributions, outperforming standard user-level DP mechanisms as long as the permitted budget SS1 is not tightly constrained (Imola et al., 2024).
  • Edge-Local and LDP Distance Queries: For social graphs without trusted curators, DistanceDP (e.g., neighbor-aggregation in LDP) yields efficient, accurate protocols for private distance estimation, outperforming synthetic-graph LDP methods by several orders of magnitude (Sheng et al., 7 Aug 2025).

5. Composition, Sensitivity, and Mechanism Design

  • Sensitivity Analysis: DistanceDP mechanisms calibrate noise based on local or smooth sensitivity with respect to the chosen metric, such as diameter for graphs or Lipschitz constant and EMD for datasets. Asymmetric neighborhood design and monotonicity properties enable reduced noise by considering only one-sided changes (edge addition/removal) (Sheng et al., 14 Jan 2025).
  • Mechanism Composition: Multi-level decomposition (e.g., binary separator trees) incurs only logarithmic noise composition, unlike per-edge mechanisms that accumulate linearly, yielding efficient mechanisms with tightly bounded error (Dinitz et al., 4 Apr 2025).
  • Optimality and Lower Bounds: For all-pairs graph distance release, sublinear error is achievable (SS2 for pure DP, SS3 for approximate DP), but a lower bound of SS4 is unavoidable in general graphs due to inherent linear query discrepancy barriers (Ghazi et al., 2022).
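
The logarithmic composition behind tree-style mechanisms can be illustrated on the simplest case, a path graph. The sketch below is a simplified stand-in, not the separator-based mechanism of the cited work: it assumes neighboring inputs differ by at most 1 in the $\ell_1$ norm of the edge weights, requires a power-of-two number of edges, and uses hypothetical function names.

```python
import numpy as np

def build_noisy_tree(weights, eps, rng):
    """Noisy sums of edge weights over dyadic blocks, one array per tree level."""
    m = len(weights)
    if m == 0 or m & (m - 1):
        raise ValueError("sketch assumes a power-of-two number of edges")
    levels = m.bit_length()              # log2(m) + 1 levels
    scale = levels / eps                 # each edge lies in exactly one block per level
    tree, block = [], np.asarray(weights, dtype=float)
    for _ in range(levels):
        tree.append(block + rng.laplace(0.0, scale, size=block.shape))
        if block.size > 1:
            block = block.reshape(-1, 2).sum(axis=1)
    return tree                          # tree[l][b] estimates sum(weights[b*2**l:(b+1)*2**l])

def prefix_blocks(j):
    """Dyadic blocks (level, index) that exactly cover edges [0, j)."""
    blocks, start = [], 0
    for level in range(j.bit_length() - 1, -1, -1):
        if j >> level & 1:
            blocks.append((level, start >> level))
            start += 1 << level
    return blocks

def noisy_distance(tree, i, j):
    """Estimate the path distance between vertices i <= j from noisy block sums."""
    prefix = lambda t: sum(tree[l][b] for l, b in prefix_blocks(t))
    return float(prefix(j) - prefix(i))

weights = [3, 1, 4, 1, 5, 9, 2, 6]       # edge k joins vertex k and vertex k + 1
tree = build_noisy_tree(weights, eps=1.0, rng=np.random.default_rng(0))
print(noisy_distance(tree, 2, 7), sum(weights[2:7]))
```

Each distance estimate combines at most two noisy terms per level, so the number of noise terms grows with the logarithm of the path length, whereas summing independently noised edges would accumulate one term per edge.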

6. Practical Considerations and Parameter Selection

  • Parameter Tuning: Embedding dimension, privacy budgets, separator sizes, and covering sizes all affect realized error and computational overhead. For instance, embedding perturbation with SS5 ensures minimal top-SS6 loss, while separator size tuning in binary-tree mechanisms balances error with decomposition depth (Cheng et al., 2024, Dinitz et al., 4 Apr 2025).
  • Algorithmic Complexity: Construction of separator trees, sampling of high-dimensional isotropic noise, or iterative LDP aggregation all entail cost considerations. For example, Laplace or Gaussian draws need truncation/clamping for numerical stability, and separators are found by established algorithms (e.g., Lipton–Tarjan, excluded-minor) for decomposable graphs.
  • Empirical Performance: Evaluations on real and synthetic datasets (social networks, large knowledge bases) confirm that DistanceDP constructions consistently outperform classical DP approaches under equivalent privacy budgets on accuracy, efficiency, and communication cost (Cheng et al., 2024, Sheng et al., 14 Jan 2025, Sheng et al., 7 Aug 2025).

7. Theoretical Boundaries and Research Directions

DistanceDP establishes a spectrum of privacy-utility guarantees parameterized by domain geometry and distance structure. Open theoretical questions include tightening the gap between upper and lower bounds for all-pairs distance release, understanding the discrepancy barriers for more exotic graph classes, and extending metric DP variants (e.g., EMD-DP) to high-volume or streaming settings. The applicability and optimization of DistanceDP under various adversarial models, domain-specific semantic metrics, and privacy risk regimes continue to be prominent research themes (Imola et al., 2024, Ghazi et al., 2022, Dinitz et al., 4 Apr 2025).
