Randomness Complexity in Differential Privacy
- Randomness complexity in differential privacy is defined as the minimum amount of internal randomness required to guarantee privacy and accuracy in counting query releases.
- The topic explores techniques like randomized shifting that compress the randomness needed for $d$ queries to $O(\log d)$ bits, balancing privacy with computational efficiency.
- It highlights practical implications for large-scale deployments, where sharing randomness across queries significantly reduces resource demands while maintaining strict DP guarantees.
Randomness complexity in differential privacy quantifies the minimum amount of internal randomness required by privacy-preserving mechanisms to achieve their rigorous stability guarantees. In the context of classical differential privacy (DP), randomness is integral for adding calibrated noise to dataset queries, masking the influence of single records and ensuring that an adversary cannot distinguish neighboring datasets. The study of randomness complexity investigates the least possible number of random bits necessary for accurate and private outputs, the trade-offs when answering many queries, and how the structure of the mechanism impacts the amount of noise—and therefore randomness—that must be generated. Recent work, particularly in the context of counting queries, shows that naively adding independent noise to each query is not randomness-optimal, and considerable savings are possible when randomness is carefully managed and shared across outputs.
1. Fundamental Concepts: Differential Privacy and Randomness
Differential privacy (DP) is satisfied by a randomized mechanism $M$ if for all neighboring datasets $x, x'$ and all measurable output subsets $S$,
$$\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(x') \in S] + \delta,$$
where $\varepsilon$ is the privacy parameter and $\delta$ is a small failure probability. The randomness complexity of a DP mechanism refers to the minimum expected or worst-case number of random bits required to ensure these guarantees, subject to a specified accuracy bound for the released statistics. For the non-interactive release of counting queries, this problem takes a concrete form: output the number of database entries satisfying each predicate in a list of $d$ predicates under $(\varepsilon, \delta)$-DP, while minimizing the random bit usage for a given (additive) error.
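For reference, here is a minimal sketch of the naive baseline that draws fresh noise for every query. The function name, the even budget split across queries, and the use of continuous Laplace noise in place of a discrete distribution are illustrative choices, not taken from the paper:

```python
import numpy as np

def naive_dp_counts(counts, epsilon, rng=None):
    """Answer d counting queries with independent Laplace noise.

    Each query has sensitivity 1; splitting the budget evenly over the
    d queries (basic composition) gives epsilon-DP overall. Every
    coordinate consumes its own fresh randomness, so the random-bit
    cost grows linearly with d.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    d = counts.size
    scale = d / epsilon  # Laplace scale for a per-query budget of epsilon/d
    noise = rng.laplace(loc=0.0, scale=scale, size=d)  # d independent draws
    return counts + noise

answers = naive_dp_counts([120, 45, 300], epsilon=1.0)
```

It is exactly this one-fresh-sample-per-coordinate pattern that the derandomization results below improve upon.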
Historically, large-scale DP deployments—such as the 2020 U.S. Census under the Disclosure Avoidance System—used tens of terabytes of randomness due to the sheer volume of queries, motivating the formal study of derandomization and randomness-efficient DP mechanisms (Garfinkel et al., 2020).
2. Lower Bounds and the Need for Randomness in DP Query Release
For a single counting query, it is known that any $\varepsilon$-differentially private, reasonably accurate mechanism requires nearly one bit of true randomness per answer; that is, one cannot achieve privacy by deterministic means [CSV25]. This lower bound is essentially tight: to perturb the query sufficiently so that an adversary cannot pinpoint the participation of a single individual, the randomized output must have enough entropy, making the random bit requirement inescapable.
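A standard one-line argument shows why determinism fails: if $M$ is deterministic and $M(x) \neq M(x')$ for some neighbors $x, x'$, taking $S = \{M(x)\}$ gives $\Pr[M(x) \in S] = 1$ while $\Pr[M(x') \in S] = 0$, violating the DP inequality for any finite $\varepsilon$ whenever $\delta < 1$; if instead $M$ is constant along every chain of neighboring datasets, it cannot remain accurate on datasets whose true counts differ substantially.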
However, when many queries are answered—especially in batch—statistical correlations and the structure of noise injection enable stronger derandomization. In particular, [CSV25] shows that although answering $d$ queries with independent noise appears to need $\Omega(d)$ random bits, the total randomness required can be compressed to $O(\log d)$ bits while still meeting the same accuracy and privacy targets, due to the ability to share randomness across coordinates without increasing the risk of privacy loss.
3. Classical Versus Efficient Derandomization Schemes
Earlier derandomization results for counting queries (e.g., [CSV25]) are based on rounding schemes—a class of combinatorial objects that partition the range of possible outputs in a noise-efficient manner. These mechanisms achieve near-optimal randomness complexity, with an expected bit count on the order of
$$O\!\left(\frac{d}{s} + \log s\right)$$
for a suitably chosen integer $s$: taking $s = \Theta(d)$ yields $O(\log d)$ random bits. However, such constructions are not known to be efficiently computable; the existence proofs for appropriate rounding schemes are nonconstructive and therefore of little practical use.
The new mechanism introduced in (Ghentiyala, 19 Oct 2025) addresses these limitations by providing a polynomial-time implementable derandomization for counting queries—eschewing rounding schemes for an approach that is both more intuitive and computationally feasible.
4. Polynomial-Time Randomness-Efficient Mechanism: Randomized Shifting and Selective Noise
The key technical innovation is the use of a randomized shift, $\Delta$, applied uniformly to all coordinates before adding noise and discretizing the result. Specifically, for input counts $q = (q_1, \dots, q_d)$ and independently drawn noise $Z = (Z_1, \dots, Z_d)$ (e.g., discrete Gaussian), the mechanism proceeds as follows:
- A common random shift $\Delta$ is generated using $O(\log m)$ bits, with $\Delta$ uniformly chosen from the grid $\{0, \tfrac{s}{m}, \tfrac{2s}{m}, \dots, \tfrac{(m-1)s}{m}\}$ for parameters $s, m$.
- The output is computed as $R_s(q + Z + \Delta \cdot \mathbf{1}) - \Delta \cdot \mathbf{1}$, where $R_s$ rounds each coordinate to the nearest multiple of $s$.
A central insight is that for most coordinates, the effect of $Z_i$ is absorbed by the rounding, so that the outcome for those coordinates is insensitive to the precise noise realization—hence, fresh random bits for $Z_i$ are only required in a small fraction of cases. For each coordinate, with probability at least $1 - 2/s$, the output does not depend on the individual noise sample for that coordinate, so the expected number of coordinates needing fresh randomness is $2d/s$. Choosing $s$ large (e.g., $s = \Theta(d)$) yields total expected random bits $O(\log d)$.
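The following schematic sketch illustrates the randomness accounting of the shift-and-round idea under simplifying assumptions: noise truncated to $[-1, 1]$, a continuous shift, and subtraction of the shift from the output are choices made here for illustration, and no privacy proof is attempted—this is not the paper's exact mechanism:

```python
import numpy as np

def shift_round_counts(counts, s, noise_sampler, rng=None):
    """Schematic shift-and-round release for d counting queries.

    Assumes |noise| <= 1 (e.g., a truncated distribution). A single
    shared shift Delta is drawn; a coordinate only consumes a fresh
    noise sample when counts[i] + Delta lands within distance 1 of a
    rounding boundary (odd multiples of s/2), which happens with
    probability 2/s over the shift. Returns the released answers and
    the number of fresh noise draws consumed.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    delta = s * rng.random()                     # shared random shift in [0, s)
    shifted = counts + delta
    # Distance to the nearest rounding boundary of "round to multiple of s".
    dist = np.abs(shifted - s * (np.floor(shifted / s) + 0.5))
    needs_noise = dist <= 1.0                    # noise could flip the rounding here
    noisy = shifted.copy()
    noisy[needs_noise] += noise_sampler(rng, int(needs_noise.sum()))
    released = s * np.round(noisy / s) - delta   # round to the grid, undo the shift
    return released, int(needs_noise.sum())

# Truncated-to-[-1, 1] noise stands in for a (scaled) discrete distribution.
trunc_noise = lambda rng, n: np.clip(rng.normal(0.0, 0.4, n), -1.0, 1.0)
ans, fresh = shift_round_counts(np.arange(1000), s=500, noise_sampler=trunc_noise)
print(f"fresh noise draws: {fresh} out of 1000")  # expected about 2*1000/500 = 4
```

Coordinates far from a rounding boundary would produce the same rounded output for every admissible noise value, so skipping their noise draws leaves the output distribution unchanged—this is where the randomness savings come from.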
Randomness-Accuracy Trade-off: As $s$ increases, randomness decreases, but accuracy drops due to coarser rounding. The trade-off is explicit: the rounding contributes an additive $O(s)$ per-query error on top of the noise scale $\sigma$ required for the target privacy level—$\sigma = \Theta(\sqrt{d \log(1/\delta)}/\varepsilon)$ for approximate $(\varepsilon, \delta)$-DP, and $\sigma = \Theta(d/\varepsilon)$ for pure DP under standard composition.
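As a hypothetical numeric illustration (the figures are chosen here for concreteness, not taken from the paper): under pure DP with $d = 10^4$ queries, the noise scale is already $\sigma = \Theta(d/\varepsilon)$, so choosing $s = d$ makes the $O(s)$ rounding error comparable to the noise that must be added anyway—a constant-factor accuracy loss—while the expected number of coordinates requiring fresh noise falls to $2d/s = 2$.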
5. Quantitative Bounds and Mathematical Formulations
With precise parameter settings, the mechanism guarantees $(\varepsilon, \delta)$-DP with per-query accuracy $\alpha = O(s + \sigma)$, where the error scales as above. In the regime $s = \Theta(d)$, $O(\log d)$ random bits suffice for all $d$ queries with only a modest loss in accuracy.
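To make the bit accounting concrete, a small helper can tabulate the expected budget as $s$ varies; the constants and the bits-per-sample figure below are placeholders chosen for illustration, not values from the paper:

```python
import math

def expected_random_bits(d, s, shift_bits=None, bits_per_sample=None):
    """Expected random bits: one shared shift plus ~2d/s fresh noise samples.

    shift_bits and bits_per_sample default to O(log d) placeholders;
    both constants are illustrative, not taken from the paper.
    """
    shift_bits = math.ceil(math.log2(max(d, 2))) if shift_bits is None else shift_bits
    bits_per_sample = shift_bits if bits_per_sample is None else bits_per_sample
    return shift_bits + (2 * d / s) * bits_per_sample

d = 10**6
for s in (10, 10**3, d):
    print(f"s = {s:>7}: ~{expected_random_bits(d, s):,.0f} expected bits")
# Naive independent noise would cost on the order of d = 1,000,000 bits;
# at s = d the expected budget collapses to a few dozen bits.
```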
6. Implications, Practicality, and Future Directions
This result has immediate ramifications for large-scale, real-world deployments where cryptographically secure randomness is resource-constrained. It demonstrates that batching queries, together with common random shifts and selective noise, can exponentially compress the randomness complexity necessary for private, accurate release of statistics. The analysis also clarifies why randomness savings arise: when many queries are aggregated, much of the per-query randomness is redundant, and can be shared or omitted without weakening privacy, as long as the mechanism's output discretization absorbs ambiguities caused by noise.
Compared to previous, combinatorial approaches, this mechanism is conceptually and computationally simpler—making it attractive for further extensions to more complex analyses.
Research directions include generalizing these batch-derandomization methods to broader query families, establishing hardness results for randomness-optimal mechanisms outside counting queries, and exploring the interplay between randomness complexity, computational efficiency, and privacy for interactive or adaptive query answering.
7. Summary Table: Randomness Complexity for Counting Queries
| Query Setting | Random Bits Required | Mechanism |
|---|---|---|
| 1 counting query | $\approx 1$ bit ($\Theta(1)$) | Standard DP mechanisms (Laplace/discrete Gaussian) |
| $d$ counting queries | $O(\log d)$ expected | [CSV25] rounding schemes; efficient mechanism of (Ghentiyala, 19 Oct 2025) |
The polynomial-time mechanism of (Ghentiyala, 19 Oct 2025) achieves near-optimal randomness savings for private, accurate release of counting queries, providing a practical solution to the scalability bottlenecks encountered in high-dimensional DP deployments.