
Differentially Private Release and Learning of Threshold Functions (1504.07553v2)

Published 28 Apr 2015 in cs.CR and cs.LG

Abstract: We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\epsilon,\delta)$ differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^*|X|)$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $n \le 2^{(1+o(1))\log^*|X|}$ samples. This improves the previous best upper bound of $8^{(1+o(1))\log^*|X|}$ (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $(\epsilon,\delta)$ differential privacy and learning without privacy. For properly learning thresholds in $\ell$ dimensions, this lower bound extends to $n \ge \Omega(\ell \cdot \log^*|X|)$. To obtain our results, we give reductions in both directions from releasing and properly learning thresholds and the simpler interior point problem. Given a database $D$ of elements from $X$, the interior point problem asks for an element between the smallest and largest elements in $D$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.

Citations (187)

Summary

  • The paper establishes new lower bounds on the sample complexity for releasing threshold functions under differential privacy, demonstrating impossibility over infinite domains.
  • It proposes an improved algorithm for releasing thresholds, significantly reducing the upper bound on sample complexity compared to prior work.
  • The study introduces technical reductions and constructions with implications for distribution learning and PAC learning under privacy, highlighting the utility-privacy trade-off.

Differentially Private Release and Learning of Threshold Functions

The paper addresses fundamental challenges in differential privacy by focusing on threshold functions: concepts $c_x$ over a totally ordered domain $X$ that evaluate to $1$ on inputs $y \le x$ and to $0$ otherwise. It presents new upper and lower bounds on the sample complexity of differentially private algorithms for releasing approximate answers to threshold queries, giving a nuanced picture of the interplay between privacy guarantees and data utility.
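
To make the objects concrete, here is a minimal Python sketch (illustrative names, not code from the paper) of a threshold function and of the exact quantities a private release algorithm must approximate: the fraction of database elements at or below each point of the domain, i.e., the empirical CDF.

```python
from bisect import bisect_right

def threshold(x):
    """Threshold function c_x over a totally ordered domain:
    c_x(y) = 1 if y <= x, else 0."""
    return lambda y: 1 if y <= x else 0

def threshold_answers(database, domain):
    """Exact (non-private) answers that a sanitizer must approximate:
    for each x in the domain, the fraction of database elements y
    with y <= x (the empirical CDF evaluated at x)."""
    data = sorted(database)
    n = len(data)
    return {x: bisect_right(data, x) / n for x in domain}

# Example over the domain {0, ..., 9}:
D = [2, 3, 3, 7, 8]
answers = threshold_answers(D, range(10))
assert answers[3] == 3 / 5  # three of the five elements are <= 3
```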

Key Contributions

  1. Lower Bound on Sample Complexity: The authors present the first nontrivial lower bound for releasing thresholds under $(\epsilon, \delta)$-differential privacy, showing that the task is impossible over an infinite domain and requires sample complexity $n \ge \Omega(\log^*|X|)$. This rules out algorithms whose sample complexity is independent of the domain: even under the relaxed $(\epsilon, \delta)$ guarantee, the number of samples must grow with $|X|$, albeit extremely slowly (see the $\log^*$ sketch after this list).
  2. Algorithm for Releasing Thresholds: An algorithm is given for releasing thresholds with sample complexity $n \le 2^{(1+o(1))\log^*|X|}$, improving on the previous best upper bound of $8^{(1+o(1))\log^*|X|}$ (Beimel et al., RANDOM '13) and substantially narrowing the gap to the lower bound.
  3. Technical Reductions: To obtain these results, the paper gives reductions in both directions between releasing (and properly learning) thresholds and the simpler interior point problem: given a database $D$ of elements from $X$, output some element between the smallest and largest elements of $D$. New recursive constructions bound the sample complexity of the interior point problem, and further reductions yield impossibility results for other basic problems in differential privacy (a baseline sketch of the interior point problem follows this list).
  4. Distribution Learning and PAC Learning: The bounds extend beyond query release to learning distributions with respect to Kolmogorov distance (the maximum difference between CDFs, $\sup_x |F(x) - G(x)|$) and to properly PAC learning thresholds, yielding the first separation between the sample complexity of proper learning with $(\epsilon, \delta)$-differential privacy and learning without privacy; for thresholds in $\ell$ dimensions the lower bound strengthens to $n \ge \Omega(\ell \cdot \log^*|X|)$.
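
The $\log^*$ (iterated logarithm) function that governs these bounds grows extraordinarily slowly. The following short Python sketch (an illustration, not code from the paper) computes it, showing why the lower bound, though unavoidable, is very mild in practice:

```python
import math

def log_star(x, base=2.0):
    """Iterated logarithm: how many times log must be applied
    to x before the result drops to at most 1."""
    count = 0
    while x > 1:
        x = math.log(x, base)
        count += 1
    return count

# Even for astronomically large domains, log* stays tiny:
print(log_star(2**16))     # 4   (|X| = 65536)
print(log_star(2**65536))  # 5   (|X| = 2^65536)
```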

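For intuition about the interior point problem itself, here is a minimal Python sketch of a classical baseline, not the paper's recursive construction: the exponential mechanism with a sensitivity-1 score solves the problem under pure $\epsilon$-differential privacy with roughly $O(\log|X| / \epsilon)$ samples. The paper's $(\epsilon, \delta)$ algorithm needs far fewer samples, but the baseline makes the task concrete.

```python
import math
import random
from bisect import bisect_left, bisect_right

def dp_interior_point(database, domain_size, epsilon):
    """Exponential-mechanism baseline for the interior point problem
    over the domain {0, ..., domain_size - 1}.

    Score of a candidate x: min(#elements <= x, #elements >= x).
    Every true interior point scores at least 1, the score has
    sensitivity 1, and sampling x with probability proportional to
    exp(epsilon * score(x) / 2) is epsilon-differentially private.
    (Enumerates the domain, so only practical for small domains.)
    """
    data = sorted(database)
    n = len(data)

    def score(x):
        below = bisect_right(data, x)       # elements <= x
        above = n - bisect_left(data, x)    # elements >= x
        return min(below, above)

    weights = [math.exp(epsilon * score(x) / 2.0)
               for x in range(domain_size)]
    return random.choices(range(domain_size), weights=weights)[0]

# With enough samples relative to log(domain_size) / epsilon, the
# output lands between min(D) and max(D) with high probability.
D = [40, 42, 47, 50, 53, 55, 58, 60]
print(dp_interior_point(D, domain_size=100, epsilon=1.0))
```
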
Implications and Future Directions

These results sharpen the fundamental trade-off between differential privacy and utility and open avenues for further refinement of private data analysis methods. The paper clarifies what is and is not feasible under differential privacy, motivating future work on narrowing the remaining complexity gaps, especially over high-dimensional and infinite domains.

Despite this progress, the gap between the upper and lower bounds on sample complexity remains wide: $\Omega(\log^*|X|)$ versus $2^{(1+o(1))\log^*|X|}$, an exponential difference in $\log^*|X|$. Closing this gap is a natural challenge for future research, and the reduction techniques introduced here could plausibly be extended to more complex data structures and query classes, broadening their reach in privacy-preserving data analysis.

Overall, the paper offers significant insights and tools, combining a deeper theoretical understanding with concrete algorithms for differentially private computation on threshold functions. It advances the conversation on balancing privacy with utility in data-rich settings, especially those with very large or infinite input domains.