
Interactive Proofs for Distribution Testing

Updated 4 December 2025
  • Interactive proofs for distribution testing are protocols where a verifier efficiently checks distribution properties with help from a powerful prover, reducing sample requirements.
  • The methodology leverages cryptographic commitments and interactive arguments of proximity to ensure high soundness and efficient verification even against adversarial provers.
  • Extensions using conditional queries and distribution-free settings achieve exponential sample savings, broadening the practical applicability to various testing scenarios.

Interactive proofs for distribution testing enable a computationally efficient and sample-efficient verifier to ascertain whether an unknown distribution over a finite domain possesses a specified property, via interaction with an untrusted yet computationally powerful prover. These protocols address the fundamental limitations of classical distribution testing, where verifying nontrivial properties typically demands $\Omega(\sqrt{N})$ samples for domain size $N$, by leveraging interactive and cryptographic techniques to achieve significant reductions in verifier resources. Recent advances extend the model to information constraints, conditional queries, and distribution-free settings, elucidating both general and specialized trade-offs in communication, rounds, and soundness.

1. Formal Framework and Definitions

Let $D$ be an unknown distribution over a finite set $[N] = \{1, \dots, N\}$. The canonical metric is total variation (statistical) distance: $d_{\mathrm{tv}}(p, q) = \frac{1}{2} \sum_{x=1}^N |p(x) - q(x)|$. For a property $\Pi \subseteq \{ \text{all distributions on } [N] \}$, we define $d_{\mathrm{tv}}(D, \Pi) = \inf_{Q \in \Pi} d_{\mathrm{tv}}(D, Q)$. Distribution testing asks, given sampling access to $D$, to decide whether $D \in \Pi$ or $d_{\mathrm{tv}}(D, \Pi) > \varepsilon$ for parameter $\varepsilon > 0$.
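
As a concrete illustration of these definitions, the following minimal Python sketch computes $d_{\mathrm{tv}}$ for explicitly given probability vectors. The helper `tv_to_finite_family` is a hypothetical simplification: it assumes the property is given as a finite list of member distributions, whereas the definition above takes an infimum over a possibly infinite set.

```python
from typing import Sequence


def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions on the same domain."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same domain")
    # Half the L1 distance between the probability vectors.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def tv_to_finite_family(d: Sequence[float], family: Sequence[Sequence[float]]) -> float:
    """Distance from d to a property represented (as an assumption of this
    sketch) by a finite, non-empty list of member distributions."""
    return min(tv_distance(d, q) for q in family)
```

A tester, of course, never sees $D$ as an explicit vector; it only draws samples. The sketch is meant to pin down the quantity being tested, not how it is estimated.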

An interactive proof for distribution testing involves a probabilistic polynomial-time verifier (with sample access to $D$) and an all-powerful, potentially malicious prover. Completeness requires acceptance with high probability when $D \in \Pi$ and the prover behaves honestly. Computational soundness ensures rejection with high probability if $d_{\mathrm{tv}}(D, \Pi) > \varepsilon$, against any polynomial-time cheating prover, under cryptographic assumptions such as the existence of collision-resistant hash functions (Herman et al., 10 Sep 2024).
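
The two requirements can be written out symbolically. In the sketch below, the thresholds $2/3$ and $1/3$ are the conventional choices and are an assumption here (the text only says "with high probability"); $\langle P, V^{D} \rangle$ is assumed notation for the verifier's output after interacting with prover $P$ while sampling from $D$.

```latex
\begin{align*}
\text{Completeness:}\quad
  & D \in \Pi
    \;\Longrightarrow\;
    \Pr\bigl[\langle P, V^{D} \rangle = \text{accept}\bigr] \ge 2/3
    \quad \text{for the honest prover } P, \\
\text{Soundness:}\quad
  & d_{\mathrm{tv}}(D, \Pi) > \varepsilon
    \;\Longrightarrow\;
    \Pr\bigl[\langle P^{*}, V^{D} \rangle = \text{accept}\bigr] \le 1/3
    \quad \text{for every polynomial-time } P^{*}.
\end{align*}
```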

2. Protocol Structure and Methodology

For any property $\Pi$ decidable in polynomial time, the protocol of (Herman et al., 10 Sep 2024) employs four messages integrating statistical testing, cryptographic commitment, and generalized proximity proofs:

  1. Message 1 (Verifier $\to$ Prover): The verifier generates and sends a collision-resistant hash (CRH) family key.
  2. Message 2 (Prover $\to$ Verifier): The prover (honestly) constructs an explicit approximation $Q$ of $D$, commits to $Q$ using a succinct CRH-based tree (whose digest $d$ enables efficient local opening of any $Q[x]$, cdf, etc.), and sends $d$.
  3. Message 3 (Verifier $\to$ Prover): The verifier executes a tolerant identity test: it draws $m = O(\sqrt{N}/\varepsilon^2)$ samples $x_1, \dots, x_m$ from $D$, requests the prover to open $Q[x_i]$ for each sample, and verifies local consistency using the CRH. The verifier also simulates queries of an offline tester to $Q$ via local openings.
  4. Message 4 (Prover $\to$ Verifier): With $Q \approx D$ certified, the verifier reduces $\{ Q \in \Pi \}$ (or approximate proximity to $\Pi$) to a string-proximity instance and runs a 4-message interactive argument of proximity (IAP), grounded in probabilistically checkable proofs of proximity (PCPP). The verifier accepts only if both the identity test and the property test pass.
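
The commit-and-open mechanism in Messages 2 and 3 can be illustrated with a plain Merkle tree. This is a minimal sketch, not the construction of Herman et al.: SHA-256 from Python's `hashlib` stands in for the keyed CRH family, leaves hold only the probability values $Q[x]$ (no cdf information), and all function names are hypothetical.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for a keyed collision-resistant hash.
    return hashlib.sha256(data).digest()


def _leaf(index: int, value: float) -> bytes:
    # Bind each value to its position so openings cannot be swapped.
    return _h(b"leaf:" + repr((index, value)).encode())


def build_tree(q: List[float]) -> List[List[bytes]]:
    """Return all tree levels, leaves first; the digest is levels[-1][0]."""
    if not q:
        raise ValueError("q must be non-empty")
    level = [_leaf(i, v) for i, v in enumerate(q)]
    while len(level) & (len(level) - 1):  # pad up to a power of two
        level.append(_h(b"pad"))
    levels = [level]
    while len(level) > 1:
        level = [_h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def open_leaf(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Prover side: sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return path


def verify_opening(digest: bytes, index: int, value: float,
                   path: List[bytes]) -> bool:
    """Verifier side: recompute the root from the claimed value and path."""
    node = _leaf(index, value)
    for sibling in path:
        if index % 2 == 0:
            node = _h(b"node:" + node + sibling)
        else:
            node = _h(b"node:" + sibling + node)
        index //= 2
    return node == digest
```

The verifier stores only the digest and checks each opening with a number of hash evaluations equal to the tree depth, which is logarithmic in the domain size. That is what lets it inspect $Q$ at sampled points without receiving $Q$ in full.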

Key technical ingredients are the ability to succinctly commit to $Q$, enabling local verification without full transmission, and wrapping the property test in PCPP-based IAPs.

3. Complexity Bounds and Optimality

| Resource | Bound | Optimal (up to $\mathrm{polylog}\, N$ factors) |
|---|---|---|
| Communication | $\widetilde O(\sqrt{N}/\varepsilon^2)$ bits | Yes |
| Verifier runtime | $\widetilde O(\sqrt{N}/\varepsilon^2)$ | Yes |
| Sample complexity | $\widetilde O(\sqrt{N}/\varepsilon^2)$ | Yes (cannot be beaten by any interactive protocol) |

Here, $\widetilde O(\cdot)$ hides polylogarithmic factors in $N$.

This matches the sample complexity lower bounds for tolerant testing (i.e., distinguishing $d_{\mathrm{tv}}(D,\Pi)\leq \varepsilon$ vs $d_{\mathrm{tv}}(D,\Pi)\geq 2\varepsilon$) that apply even to stand-alone, non-interactive testers for properties such as uniformity (Herman et al., 10 Sep 2024). Thus, sublinear sample tests remain optimally efficient within this general interactive paradigm unless extra oracular power is supplied.

4. Extensions: Conditional Oracles and Exponential Gains

Interactive proofs endowed with stronger oracles enable much sharper efficiency. For label-invariant properties (closed under relabelings of the domain), augmenting the verifier with a small number of pairwise conditional (PCOND) queries, each of which compares the probabilities $D[i]$ and $D[j]$ by sampling from $D$ conditioned on $\{i,j\}$, breaks the $\Omega(\sqrt{N})$ sample lower bound.
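
To make the PCOND query concrete, here is a minimal simulation in Python. It assumes the simulator holds $D$ as an explicit probability vector (which the real verifier does not; it only gets oracle access), and the names `pcond` and `estimate_ratio` are hypothetical.

```python
import random
from typing import Sequence


def pcond(d: Sequence[float], i: int, j: int, rng: random.Random) -> int:
    """Simulated PCOND query: return i or j, drawn from D conditioned on {i, j}."""
    total = d[i] + d[j]
    if total == 0:
        raise ValueError("conditioning set {i, j} has zero probability")
    # i is returned with probability D[i] / (D[i] + D[j]).
    return i if rng.random() < d[i] / total else j


def estimate_ratio(d: Sequence[float], i: int, j: int,
                   queries: int, rng: random.Random) -> float:
    """Empirical estimate of D[i] / (D[i] + D[j]) from repeated PCOND queries."""
    hits = sum(pcond(d, i, j, rng) == i for _ in range(queries))
    return hits / queries
```

The point of the oracle is that a handful of queries reveals the relative weight of two specific elements, something that plain sampling can only do after both elements have been drawn many times.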

The main result (Biswas et al., 27 Nov 2025) establishes that for every label-invariant $\Pi$ and $\varepsilon>0$, there is a public-coin interactive protocol with

$$\widetilde O\left(\frac{\log N}{\varepsilon^2}\right) \text{ samples},\quad \mathrm{poly}(\log N, 1/\varepsilon) \text{ PCOND queries, communication, and rounds},$$

while preserving both completeness and soundness for the property testing task.

The protocol structure involves the prover claiming a bucketized histogram, with the verifier checking agreement by sampling points, conducting PCOND-based local comparisons, and statistically testing the fit. This approach achieves exponential savings in sample complexity, rendering testing feasible for massive domains.
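
The size of the gap is easy to see numerically. The sketch below compares only the leading terms $\sqrt{N}/\varepsilon^2$ and $\log N/\varepsilon^2$; it assumes all hidden constants and polylogarithmic factors are 1 and uses base-2 logarithms, so the outputs are order-of-magnitude illustrations rather than actual sample counts.

```python
import math


def baseline_samples(n: int, eps: float) -> float:
    """Leading term sqrt(N) / eps^2 of the bound without PCOND queries."""
    return math.sqrt(n) / eps ** 2


def pcond_samples(n: int, eps: float) -> float:
    """Leading term log2(N) / eps^2 of the bound with PCOND queries."""
    return math.log2(n) / eps ** 2
```

For $N = 2^{40}$ and $\varepsilon = 0.5$, the first term is about 4.2 million while the second is 160.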

5. Distribution-Free and Information-Constrained Settings

Generalizing to unknown sampling distributions (distribution-free property testing), interactive proofs of proximity (df-IPPs) enable sublinear proximity testing for Boolean functions under arbitrary $D$ (Aaronson et al., 2023). The verifier is allowed sample access to $D$ and queries to the function $f$; the completeness and soundness conditions are defined with respect to $d_D(f,L)$. The principal result states:

For any log-space-uniform $\mathsf{NC}$ property $L \subseteq \{0,1\}^n$, proximity $\varepsilon > 0$, and trade-off parameter $1\leq \tau \leq \sqrt{n}$, there is a df-IPP with

$$Q(\text{queries}) = \tau + O(1/\varepsilon),\ S(\text{samples}) = \tau + O(1/\varepsilon),\ C(\text{communication}) = \tilde O(n/\tau + 1/\varepsilon),$$

and polylogarithmic rounds/verifier time. For well-behaved distributions (smooth, product), the communication complexity can be further reduced.
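
Substituting a few values of $\tau$ into the stated bounds shows how the parameter trades verifier queries and samples against communication. The three choices below are illustrative only and are not singled out by the source.

```latex
\begin{align*}
\tau = 1:        \quad & Q = S = O(1/\varepsilon),
                  && C = \tilde O(n + 1/\varepsilon), \\
\tau = n^{1/4}:  \quad & Q = S = n^{1/4} + O(1/\varepsilon),
                  && C = \tilde O(n^{3/4} + 1/\varepsilon), \\
\tau = \sqrt{n}: \quad & Q = S = \sqrt{n} + O(1/\varepsilon),
                  && C = \tilde O(\sqrt{n} + 1/\varepsilon).
\end{align*}
```

At the upper end $\tau = \sqrt{n}$, queries, samples, and communication are all balanced at roughly $\sqrt{n}$; smaller $\tau$ shifts the cost from the verifier's access to the input onto the messages sent by the prover.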

In distributed settings constrained by $b$-bit-per-user communication or $\varepsilon$-LDP, interactivity does not lower the fundamental sample requirements for goodness-of-fit testing: optimal bounds are realized by public-coin, noninteractive protocols. However, for specially structured channels (e.g., "leaky queries"), fully interactive protocols demonstrate improved performance by adaptively concentrating information, achieving polynomial savings (Acharya et al., 2020).

6. Specializations and Protocol Instantiations

The general interactive proof protocol specializes efficiently to classical properties:

  • Uniformity testing: The prover commits to $U_{[N]}$. The protocol reduces to an identity test between $D$ and $U_{[N]}$, recovering the $\Theta(\sqrt{N}/\varepsilon^2)$ sample and time complexity, but with soundness enforced against polynomial-time cheating provers via cryptographic commitments.
  • Monotonicity testing: The verifier verifies $Q\approx D$ and in the final phase checks, via IAP and PCPP, (approximate) monotonicity of $Q$ with overall complexity $\widetilde O(\sqrt{N}/\varepsilon^2)$.
  • Label-invariant properties with PCOND: The protocol achieves polylogarithmic sample and query cost, tolerantly testing properties such as identity (up to relabeling), support size, entropy estimation, and monotonicity (Biswas et al., 27 Nov 2025).
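
For intuition on the uniformity case, the sketch below shows the classical stand-alone collision tester, which is not necessarily the identity test used inside the interactive protocol. It relies on a standard fact: the uniform distribution on $[N]$ has collision probability exactly $1/N$, while any distribution that is $\varepsilon$-far from uniform in total variation has collision probability at least $(1 + 4\varepsilon^2)/N$. The midpoint threshold and the function names are choices made for this sketch.

```python
from collections import Counter
from typing import Sequence


def collision_rate(samples: Sequence[int]) -> float:
    """Fraction of unordered sample pairs whose two elements are equal."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    # An element seen c times contributes c * (c - 1) / 2 colliding pairs.
    colliding = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return colliding / (m * (m - 1) // 2)


def looks_uniform(samples: Sequence[int], n: int, eps: float) -> bool:
    """Accept iff the empirical collision rate is at most (1 + 2 eps^2) / n,
    the midpoint between the uniform value and the eps-far lower bound."""
    return collision_rate(samples) <= (1 + 2 * eps ** 2) / n
```

The empirical collision rate only separates the two cases reliably once the number of samples is large enough relative to $\sqrt{N}$ and $\varepsilon$; with too few samples the decision is dominated by noise.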

7. Techniques, Proof Strategies, and Open Directions

Two major technical advances underlie efficient interactive proofs for distribution testing:

  1. Succinct Commitment via CRH Tree: Enables local openings (pdf/cdf) in $O(\log N)$ time; crucial for scalable commitment to the prover's claimed distribution and fast local verification.
  2. Interactive Arguments of Proximity: Imported from PCP of proximity (PCPP), these permit interactive proofs on string encodings of distributions without full disclosure, and ensure computational soundness against cheating provers.

The statistical identity testing phase provides the core sublinear sample savings, while the interactive proximity argument delegates the computational “hard” global property check to the prover, resulting in at least quadratic verifier-side speedup.

Extensions with conditional oracles demonstrate that judiciously strengthening the verifier's query model—without sacrificing the interaction model—can exponentially reduce the sample complexity for broad classes of properties. In contrast, information-constrained (e.g., LDP, communication-limited) models exhibit settings where interaction yields no further improvement, and others (leaky-query families) where adaptive interaction is provably beneficial (Acharya et al., 2020).

Open problems include minimizing PCOND query complexity to constants, reducing round complexity, and relaxing requirements on prover knowledge while maintaining soundness and efficiency (Biswas et al., 27 Nov 2025). Another direction is adapting these interactive proof techniques to further distributional models and testing properties with less structural symmetry.
