PAC Indistinguishability: Theory & Algorithms

Updated 26 December 2025

PAC Indistinguishability is a framework that generalizes standard PAC learning by focusing on making a predictor indistinguishable from a target using a class of outcome-based distinguishers.
It leverages metric entropy and dual Minkowski norms to tightly characterize sample complexity, linking classical PAC and agnostic L₁-learning regimes.
Practical algorithms like Distinguisher-Covering and Multiaccuracy Boost demonstrate how theoretical bounds translate into efficient predictor selection and update procedures.

PAC indistinguishability, also termed no-access Outcome Indistinguishability (OI), generalizes the standard PAC (Probably Approximately Correct) learning paradigm by considering a scenario where the goal is to output a predictor $p$ that cannot be distinguished from a target predictor $p^*$ by a class $D$ of distinguishers, based on the observable outcomes derived from predictions. The distinguishing power of $D$ , the interplay between metric entropy and sample complexity, and the duality connections to convex geometry are central to the theory, yielding a framework that interpolates between classical PAC learning and agnostic $L_1$ -learning depending on the choice of $D$ (Hu et al., 2022).

1. Formal Definition and Framework

Let $X$ denote an instance space and $\mu\in \Delta_X$ a probability distribution over $X$ . A predictor $p: X \to [0,1]$ induces a joint law $\mu_p$ on $X \times \{0,1\}$ : first sample $x \sim \mu$ , then generate $o \sim \text{Bernoulli}(p(x))$ . Fix a distinguisher class $D \subseteq \{d: X \times \{0,1\} \to \{0,1\}\}$ , possibly randomized.

The distinguishing advantage for $d \in D$ and predictors $p, p'$ is given by

$\mathrm{Adv}_{\mu, d}(p, p') = \left|\Pr_{(x, o) \sim \mu_p}\left[d(x, o) = 1\right] - \Pr_{(x, o) \sim \mu_{p'}}\left[d(x, o) = 1\right]\right|.$

The maximized distinguishing advantage over $D$ is

$\mathrm{Adv}_{\mu, D}(p, p') = \sup_{d \in D} \mathrm{Adv}_{\mu, d}(p, p').$

A predictor $p$ is $(D,\epsilon)$ -OI to $p'$ under $\mu$ if $\mathrm{Adv}_{\mu, D}(p,p') \le \epsilon$ .

Expressing the distinguisher action as a function $f_d: X \to [-1,1]$ , with $f_d(x) = \Pr[d(x,1)=1] - \Pr[d(x,0)=1]$ , the distinguishing advantage becomes

$\mathrm{Adv}_{\mu, d}(p, p') = \left|\mathbb{E}_{x \sim \mu}[f_d(x)(p(x) - p'(x))]\right|.$

Thus $\mathrm{Adv}_{\mu, D}(p,p')$ corresponds to the dual Minkowski semi-norm:

$\|p - p'\|^*_{D, \mu} := \sup_{f \in D} \left|\mathbb{E}_{x \sim \mu}[f(x)(p(x) - p'(x))]\right|.$

Variants correspond to realizable vs. agnostic (the ground truth $p^*$ in a known class $P$ , or not) and distribution-specific vs. distribution-free (where the learner may or may not adapt to $\mu$ ).

2. Metric Entropy Characterization in the Distribution-Specific Realizable Setting

In the realizable, distribution-specific case, assume $p^* \in P \subseteq [0,1]^X$ is the unknown target, $D \subseteq [-1,1]^X$ is the distinguisher class, and $\mu$ is fixed.

The central sample complexity measure is the covering number (metric entropy) of $P$ with respect to the dual Minkowski norm:

$N_{\mu, D}(P, \epsilon) = \min\left\{N:\, \exists p_1, \ldots, p_N \in P,\ \forall p \in P\ \exists i: \|p - p_i\|^*_{D, \mu} \le \epsilon\right\}.$

Lower Bound: Packing arguments yield that any (possibly improper, randomized) learner using $n$ i.i.d. samples drawn from $\mu_{p^*}$ must satisfy:

$n_{\mathrm{real}}(P, D, \epsilon, \delta, \mu) \geq \log\left((1-\delta) N_{\mu, D}(P, 2\epsilon)\right).$

Upper Bound: The "Distinguisher-Covering" algorithm computes an approximate $\epsilon/2$ -cover of $P-P$ in the dual norm $\|\cdot\|^*_{D, \mu}$ , empirically estimates $\mathbb{E}_{x \sim \mu}[p(x) f(x)]$ for $f$ in this cover, and selects $p \in P$ minimizing the maximum estimation error. It achieves:

$n_{\mathrm{real}}(P, D, \epsilon, \delta, \mu) \leq O(\epsilon^{-2}[\,\log N_{\mu, P}(D, \epsilon/2) + \log(1/\delta)]\,).$

3. Metric-Entropy Duality: Tight Characterizations

Leveraging the symmetry between covering $P$ by $D$ and $D$ by $P$ , a metric-entropy duality theorem holds: for any bounded, nonempty $K_1 \subseteq [-M_1, M_1]^X$ , $K_2 \subseteq [-M_2, M_2]^X$ , and $\epsilon > 0$ ,

$\log N_{\mu, K_2}(K_1, \epsilon) \leq c(M_1 M_2 / \epsilon)^2 (1 + \log N_{\mu, K_1}(K_2, \epsilon/8))$

for an absolute constant $c > 0$ . In particular, plugging $K_1 = P - P$ , $K_2 = D$ yields nearly tight two-sided bounds:

$\Omega\left(\epsilon^2 \cdot \log N_{\mu, D}(P, 16\epsilon) - 1 + \log(1-\delta)\right) \leq n_{\mathrm{real}}(P, D, \epsilon, \delta, \mu) \leq O\left(\epsilon^{-4}\log N_{\mu, D}(P, \epsilon/32) + \epsilon^{-2} \log(1/\delta)\right).$

This duality connects the sample complexity of PAC indistinguishability to metric entropy duality phenomena in convex geometry. The $\epsilon^{-2}$ term is essential unless convexity further simplifies the setting (Hu et al., 2022).

4. Distribution-Free Characterization via Fat-Shattering Dimension

In the distribution-free agnostic and realizable settings—typically with $P = [0,1]^X$ —the sample complexity is governed by the fat-shattering dimension of $D$ , denoted $\mathrm{fat}_D(\gamma)$ . For any $D$ , $\epsilon, \delta \in (0,1)$ :

$n_{\mathrm{df}}([0,1]^X, D, \epsilon, \delta) = \Theta\left(\epsilon^{-4}[\mathrm{fat}_D(\epsilon/25) \cdot (\log(1/\epsilon))^2 + \log(1/\delta)]\right).$

This result leverages uniform convergence (via the fat-shattering dimension), and a multiaccuracy boosting algorithm that performs iterative updates: in each round, if there exists $d \in D$ with sufficient average discrepancy, $p$ is updated in the direction of $d$ . Each round uses $O(\epsilon^{-2}\,\mathrm{fat}_D(\cdot) + \log(1/\delta))$ fresh samples and decreases $L_2$ distance by $\Omega(\epsilon^2)$ . Packing arguments establish the matching lower bound.

5. Separation Between Realizable and Agnostic Regimes

A critical departure from classical PAC theory is the potential for an unbounded separation between realizable and agnostic PAC indistinguishability sample complexity:

Setting	Realizable Sample Complexity	Agnostic Sample Complexity
$P$ finite or $p^* \in \mathrm{conv}(P)$	$O(\epsilon^{-2} \log(1/\delta))$	$\Omega(\|P\|)$ or $\Omega(\mathrm{fat}_D(\cdot))$
$P = [0,1]^X$ , $D$ arbitrary	$O(\epsilon^{-2} \log(1/\delta))$	$\Theta(\epsilon^{-4} \mathrm{fat}_D(\cdot))$

Concretely, for $P = \{p_1, p_2\}$ differing at one point, realizable OI learning is trivial but agnostic, distribution-free OI requires $\Omega(1/\epsilon^2)$ samples. Under restrictions such as $P$ symmetric convex or $D$ containing all $\{\pm1\}$ functions (and $p^*,P$ binary), the rates collapse to $O(\epsilon^{-2} \mathrm{VC})$ or to the metric-entropy rate.

6. Algorithms for PAC Indistinguishability

Two principal algorithmic approaches realize the aforementioned sample complexity bounds:

Distinguisher-Covering (Realizable, Distribution-Specific):
- Cover $P-P$ under $\|\cdot\|^*_{D, \mu}$ .
- On $n$ samples $(x_i, o_i) \sim \mu_{p^*}$ , estimate $\mathbb{E}[p(x)f(x)]$ for $f$ in the cover.
- Select $p \in P$ minimizing the worst empirical error across the covering set.
Multiaccuracy Boost (Distribution-Free):
- Initialize $p \equiv \frac{1}{2}$ .
- Repeat $T = \Theta(\epsilon^{-2})$ rounds:
- Draw batch of $m = \Theta(\epsilon^{-2}\mathrm{fat}_D(\cdot) + \log(1/\delta))$ examples.
- If some $d \in D$ has discrepancy $\geq c\epsilon$ , update $p \leftarrow p + c'\epsilon d$ , clipped to $[0,1]$ .
- Otherwise, terminate.

Both algorithms yield nearly tight rates matching the theoretical characterizations given by metric entropy and fat-shattering dimension, respectively.

7. Mathematical Constructs and Significance

The theory centralizes two geometric-combinatorial constructs:

Dual Minkowski Norm: For $h \in [−1,1]^X$ and $D \subseteq [−1,1]^X$ ,

$\|h\|^*_{D, \mu} := \sup_{f \in D}\left|\mathbb{E}_{x \sim \mu}[h(x)f(x)]\right|.$

Metric Entropy (Covering Number):

$\mathcal{N}(\epsilon, P, \|\cdot\|^*) := \min\left\{N: \exists p_1, \ldots, p_N \in P,\ P \subseteq \bigcup_{i=1}^N \{p: \|p-p_i\|^* \le \epsilon\} \right\}.$

These underlie both the upper/lower bounds and duality results. The theory provides the first tight, general characterizations for the number of samples needed to ensure $(D, \epsilon)$ -indistinguishability, providing a continuum of learning-theoretic settings from PAC to fully agnostic $L_1$ -learning by appropriately varying $D$ (Hu et al., 2022).

PDF Markdown Chat (Pro)

References (1)

Metric Entropy Duality and the Sample Complexity of Outcome Indistinguishability (2022)

Whiteboard

Generate a whiteboard explanation of this topic.

Topic to Video (Beta)

Generate a video overview of this topic.

Follow Topic

Get notified by email when new papers are published related to PAC Indistinguishability.