PAC Indistinguishability: Theory & Algorithms
- PAC Indistinguishability is a framework that generalizes standard PAC learning by focusing on making a predictor indistinguishable from a target using a class of outcome-based distinguishers.
- It leverages metric entropy and dual Minkowski norms to tightly characterize sample complexity, linking classical PAC and agnostic L₁-learning regimes.
- Practical algorithms like Distinguisher-Covering and Multiaccuracy Boost demonstrate how theoretical bounds translate into efficient predictor selection and update procedures.
PAC indistinguishability, also termed no-access Outcome Indistinguishability (OI), generalizes the standard PAC (Probably Approximately Correct) learning paradigm by considering a scenario where the goal is to output a predictor that cannot be distinguished from a target predictor by a class of distinguishers, based on the observable outcomes derived from predictions. The distinguishing power of the class $D$, the interplay between metric entropy and sample complexity, and the duality connections to convex geometry are central to the theory, yielding a framework that interpolates between classical PAC learning and agnostic $L_1$-learning depending on the choice of $D$ (Hu et al., 2022).
1. Formal Definition and Framework
Let $X$ denote an instance space and $\mu$ a probability distribution over $X$. A predictor $p : X \to [0,1]$ induces a joint law $(\mu, p)$ on $X \times \{0,1\}$: first sample $x \sim \mu$, then generate $y \sim \mathrm{Bernoulli}(p(x))$. Fix a distinguisher class $D$ of maps $d : X \times \{0,1\} \to \{0,1\}$, possibly randomized.
The distinguishing advantage for $d \in D$ and predictors $p, \tilde{p}$ is given by

$$\mathrm{adv}_d(p, \tilde{p}) = \left| \Pr_{(x,y) \sim (\mu, p)}[d(x, y) = 1] - \Pr_{(x,y) \sim (\mu, \tilde{p})}[d(x, y) = 1] \right|.$$

The maximized distinguishing advantage over $D$ is

$$\mathrm{adv}_D(p, \tilde{p}) = \sup_{d \in D} \mathrm{adv}_d(p, \tilde{p}).$$

A predictor $p$ is $\varepsilon$-OI to $\tilde{p}$ under $D$ if $\mathrm{adv}_D(p, \tilde{p}) \le \varepsilon$.
Expressing the distinguisher action as a function $f_d : X \to [-1, 1]$, with $f_d(x) = \mathbb{E}[d(x, 1)] - \mathbb{E}[d(x, 0)]$, the distinguishing advantage becomes

$$\mathrm{adv}_d(p, \tilde{p}) = \left| \mathbb{E}_{x \sim \mu}\big[ f_d(x) \left( p(x) - \tilde{p}(x) \right) \big] \right|.$$

Thus $\mathrm{adv}_D$ corresponds to the dual Minkowski semi-norm with respect to $F_D = \{ f_d : d \in D \}$:

$$\mathrm{adv}_D(p, \tilde{p}) = \| p - \tilde{p} \|_{F_D}^*, \qquad \text{where } \| g \|_F^* := \sup_{f \in F} \left| \mathbb{E}_{x \sim \mu}[f(x)\, g(x)] \right|.$$
Variants correspond to realizable vs. agnostic (the ground truth $p^*$ lies in a known class $P$, or not) and distribution-specific vs. distribution-free (where the learner may or may not adapt to a known $\mu$).
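On a finite domain these definitions can be evaluated directly. The following numpy sketch computes the dual Minkowski semi-norm and, via the identity above, the distinguishing advantage; the function names and the toy data are illustrative, not from the source.

```python
import numpy as np

def dual_minkowski_seminorm(g, F, mu):
    """||g||_F^* = sup_{f in F} |E_{x~mu}[f(x) g(x)]| on a finite domain.

    g:  array of shape (|X|,), e.g. p - p_tilde
    F:  array of shape (num_funcs, |X|), each row a map X -> [-1, 1]
    mu: array of shape (|X|,), a probability vector over X
    """
    return np.max(np.abs(F @ (mu * g)))

def distinguishing_advantage(p, p_tilde, F, mu):
    """adv_D(p, p~) via the dual-norm identity adv_D = ||p - p~||_{F_D}^*."""
    return dual_minkowski_seminorm(p - p_tilde, F, mu)

# Two predictors on a 3-point domain; rows of F are the maps f_d.
mu      = np.array([0.5, 0.3, 0.2])
p       = np.array([0.9, 0.2, 0.5])
p_tilde = np.array([0.7, 0.2, 0.5])          # differs only at x = 0
F = np.array([[1.0, -1.0, 0.0],              # f_d values in [-1, 1]
              [0.0,  1.0, 1.0]])
adv = distinguishing_advantage(p, p_tilde, F, mu)
# adv = |1.0 * 0.5 * 0.2| = 0.1
```

The expectation over $\mu$ becomes the weighted dot product `F @ (mu * g)`, so the supremum over distinguishers is a single `max` over rows.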
2. Metric Entropy Characterization in the Distribution-Specific Realizable Setting
In the realizable, distribution-specific case, assume $p^* \in P$ is the unknown target, $D$ is the distinguisher class, and $\mu$ is fixed and known to the learner.
The central sample complexity measure is the covering number (metric entropy) of $P$ with respect to the dual Minkowski norm:

$$N(P, \|\cdot\|_{F_D}^*, \varepsilon) = \min\left\{ m : \exists\, q_1, \ldots, q_m \text{ such that } \forall p \in P,\ \min_i \| p - q_i \|_{F_D}^* \le \varepsilon \right\}.$$

Lower Bound: Packing arguments yield that any (possibly improper, randomized) learner that outputs an $\varepsilon$-OI predictor with constant probability using $n$ i.i.d. samples drawn from $(\mu, p^*)$ must satisfy:

$$n \ge \Omega\!\left( \log N(P, \|\cdot\|_{F_D}^*, O(\varepsilon)) \right).$$

Upper Bound: The "Distinguisher-Covering" algorithm computes an approximate $(\varepsilon/4)$-cover of $P$ in the dual norm $\|\cdot\|_{F_D}^*$, empirically estimates $\mathrm{adv}_D(q, p^*)$ for each $q$ in this cover, and selects the $q$ minimizing the maximum empirical advantage. It achieves:

$$n \le O\!\left( \frac{\log N(P, \|\cdot\|_{F_D}^*, \varepsilon/4)}{\varepsilon^2} \right).$$
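The selection step of Distinguisher-Covering can be simulated end-to-end on a finite domain. In this sketch the cover is given rather than computed, and the target, distinguishers, and sample size are all illustrative choices; the key point is that the empirical advantage of each candidate against the samples concentrates around its true advantage.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_advantage(q, F, xs, ys):
    """Estimate adv_D(q, p*) from samples (x_i, y_i) with y ~ Bern(p*(x)):
    E_mu[f(x)(q(x) - p*(x))] is estimated by mean_i f(x_i)(q(x_i) - y_i),
    since E[y | x] = p*(x)."""
    diffs = q[xs] - ys                        # q(x_i) - y_i
    return np.max(np.abs(F[:, xs] @ diffs) / len(xs))

def distinguisher_covering(cover, F, xs, ys):
    """Return the cover element with smallest empirical advantage."""
    scores = [empirical_advantage(q, F, xs, ys) for q in cover]
    return cover[int(np.argmin(scores))]

# Finite-domain simulation: target p* and a 3-element candidate cover.
mu = np.array([0.25, 0.25, 0.25, 0.25])
p_star = np.array([0.9, 0.1, 0.8, 0.2])
F = np.array([[1.0, -1.0,  1.0, -1.0],
              [1.0,  1.0, -1.0, -1.0]])
cover = [np.array([0.5, 0.5, 0.5, 0.5]),
         np.array([0.9, 0.1, 0.8, 0.2]),     # the target itself
         np.array([0.1, 0.9, 0.2, 0.8])]

n = 5000
xs = rng.choice(4, size=n, p=mu)
ys = rng.binomial(1, p_star[xs])
best = distinguisher_covering(cover, F, xs, ys)
# with high probability the target (true advantage 0) is selected
```

With $n = 5000$ samples the estimation error is on the order of $1/\sqrt{n} \approx 0.014$, far below the $0.35$ advantage of the nearest wrong candidate, so the argmin is reliable.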
3. Metric-Entropy Duality: Tight Characterizations
Leveraging the symmetry between covering $P$ with respect to $\|\cdot\|_{F_D}^*$ and covering $F_D$ with respect to $\|\cdot\|_{P}^*$, a metric-entropy duality theorem holds: for any bounded, nonempty function classes $F$, $G$, and $\varepsilon > 0$,

$$\log N(F, \|\cdot\|_G^*, C\varepsilon) \le C \log(1/\varepsilon) \cdot \log N(G, \|\cdot\|_F^*, \varepsilon)$$

for an absolute constant $C > 0$. In particular, plugging $F = P$, $G = F_D$ yields nearly tight two-sided bounds:

$$\Omega\!\left( \log N(F_D, \|\cdot\|_P^*, O(\varepsilon)) \right) \;\le\; n(\varepsilon) \;\le\; \tilde{O}\!\left( \frac{\log N(F_D, \|\cdot\|_P^*, \Omega(\varepsilon))}{\varepsilon^2} \right).$$

This duality connects the sample complexity of PAC indistinguishability to metric entropy duality phenomena in convex geometry. The $\log(1/\varepsilon)$ term is essential unless convexity further simplifies the setting (Hu et al., 2022).
4. Distribution-Free Characterization via Fat-Shattering Dimension
In the distribution-free agnostic and realizable settings (typically with $P = [0,1]^X$), the sample complexity is governed by the fat-shattering dimension of $F_D$, denoted $\mathrm{fat}_{F_D}(\varepsilon)$. For any $\varepsilon > 0$, $\delta \in (0,1)$:

$$n(\varepsilon, \delta) = \tilde{\Theta}\!\left( \frac{\mathrm{fat}_{F_D}(\Theta(\varepsilon)) + \log(1/\delta)}{\varepsilon^2} \right).$$

This result leverages uniform convergence (via the fat-shattering dimension), and a multiaccuracy boosting algorithm that performs iterative updates: in each round, if there exists $f \in F_D$ whose average discrepancy $\left| \mathbb{E}[f(x)\,(p(x) - y)] \right|$ exceeds a threshold of order $\varepsilon$, the current predictor $p$ is updated by a step of size $\eta$ in the direction of $f$ (with sign chosen to reduce the discrepancy). Each round uses $\tilde{O}(\mathrm{fat}_{F_D}(\Theta(\varepsilon))/\varepsilon^2)$ fresh samples and decreases the squared $L_2(\mu)$ distance to $p^*$ by $\Omega(\varepsilon^2)$. Packing arguments establish the matching lower bound.
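The boosting loop just described can be sketched on a finite domain. The threshold, step size, batch size, and toy target below are illustrative; choosing the step size as half the threshold keeps each accepted update decreasing the squared $L_2(\mu)$ distance to the target, so the round bound follows from a potential argument.

```python
import numpy as np

rng = np.random.default_rng(1)

def multiaccuracy_boost(F, mu, p_star, eps=0.1, eta=0.05,
                        batch=4000, max_rounds=200):
    """Sketch of multiaccuracy boosting on a finite domain.

    Each round draws fresh samples; if some f in F has empirical
    discrepancy |mean_i f(x_i)(p(x_i) - y_i)| > eps, step p against f
    and clip to [0, 1].  With eta = eps/2, each accepted step shrinks
    ||p - p_star||^2_{L2(mu)} (clipping only helps, since p_star lies
    in the box [0, 1]^X)."""
    p = np.full(len(mu), 0.5)                 # start at the constant 1/2
    for _ in range(max_rounds):
        xs = rng.choice(len(mu), size=batch, p=mu)
        ys = rng.binomial(1, p_star[xs])
        disc = F[:, xs] @ (p[xs] - ys) / batch   # discrepancy per f
        j = int(np.argmax(np.abs(disc)))
        if abs(disc[j]) <= eps:
            return p                          # empirically multiaccurate
        p = np.clip(p - eta * np.sign(disc[j]) * F[j], 0.0, 1.0)
    return p

mu = np.array([0.25, 0.25, 0.25, 0.25])
p_star = np.array([0.9, 0.1, 0.8, 0.2])
F = np.array([[1.0, -1.0,  1.0, -1.0],
              [1.0,  1.0, -1.0, -1.0],
              [1.0,  0.0,  0.0,  0.0]])
p_hat = multiaccuracy_boost(F, mu, p_star)
# with high probability p_hat has small advantage against every f in F
```

At termination the empirical discrepancies are all below `eps`, so the true advantage against $F$ is at most `eps` plus sampling error of order $1/\sqrt{\text{batch}}$.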
5. Separation Between Realizable and Agnostic Regimes
A critical departure from classical PAC theory is the potential for an unbounded separation between realizable and agnostic PAC indistinguishability sample complexity:
| Setting | Realizable Sample Complexity | Agnostic Sample Complexity |
|---|---|---|
| $F_D$ symmetric convex, or $D$ rich (e.g., all Boolean distinguishers) | metric-entropy or fat-shattering rate | same rate, up to constants |
| General $P$, $D$ (e.g., two predictors differing at one point) | $O(1)$ | unbounded |
Concretely, for a class $P$ of two predictors differing at a single point, realizable OI learning is trivial, but agnostic, distribution-free OI can require an unbounded number of samples. Under restrictions such as $F_D$ symmetric convex, or $D$ containing all Boolean distinguishers (with binary predictors), the rates collapse to the fat-shattering rate or to the metric-entropy rate.
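The role of the distribution in this separation can be made concrete: a single differing point is nearly invisible when $\mu$ barely weights it, but a distribution-free learner must also cope with distributions concentrated on it. A toy illustration (all values hypothetical):

```python
import numpy as np

def advantage(p, p_tilde, F, mu):
    # adv_D(p, p~) = max_f |E_mu[f(x)(p(x) - p~(x))]| on a finite domain
    return np.max(np.abs(F @ (mu * (p - p_tilde))))

# Predictors differing only at the last point of a 4-point domain.
p       = np.array([0.5, 0.5, 0.5, 1.0])
p_tilde = np.array([0.5, 0.5, 0.5, 0.0])
F = np.array([[0.0, 0.0, 0.0, 1.0]])      # a distinguisher focused there

mu_light = np.array([0.333, 0.333, 0.333, 0.001])  # point nearly invisible
mu_heavy = np.array([0.1, 0.1, 0.1, 0.7])          # point dominates

print(advantage(p, p_tilde, F, mu_light))  # 0.001: nearly indistinguishable
print(advantage(p, p_tilde, F, mu_heavy))  # 0.7:   easily distinguished
```

In the distribution-specific setting only one of these distributions matters; in the distribution-free setting the learner must succeed under both, which is what drives the agnostic blow-up.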
6. Algorithms for PAC Indistinguishability
Two principal algorithmic approaches realize the aforementioned sample complexity bounds:
- Distinguisher-Covering (Realizable, Distribution-Specific):
  - Cover $P$ at scale $\varepsilon/4$ under the dual norm $\|\cdot\|_{F_D}^*$.
  - On $n$ samples $(x_1, y_1), \ldots, (x_n, y_n) \sim (\mu, p^*)$, estimate $\mathrm{adv}_d(q, p^*)$ for every $d \in D$ and every $q$ in the cover.
  - Select the $q$ minimizing the worst empirical advantage across the covering set.
- Multiaccuracy Boost (Distribution-Free):
  - Initialize $p_0 \equiv 1/2$.
  - Repeat for at most $O(1/\varepsilon^2)$ rounds:
    - Draw a fresh batch of $\tilde{O}(\mathrm{fat}_{F_D}(\Theta(\varepsilon))/\varepsilon^2)$ examples.
    - If some $f \in F_D$ has empirical discrepancy exceeding $\varepsilon$, update $p \leftarrow p \mp \eta f$ (sign chosen against the discrepancy), clipped to $[0, 1]$.
    - Otherwise, terminate.
Both algorithms yield nearly tight rates matching the theoretical characterizations given by metric entropy and fat-shattering dimension, respectively.
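The covering step of Distinguisher-Covering can be carried out greedily when the predictor class is finite; a minimal sketch (a standard greedy cover, with illustrative data; the source does not specify this construction):

```python
import numpy as np

def dual_norm(g, F, mu):
    # ||g||_F^* = max_{f in F} |E_mu[f(x) g(x)]| on a finite domain
    return np.max(np.abs(F @ (mu * g)))

def greedy_cover(P, F, mu, eps):
    """Greedy eps-cover of a finite predictor set P (rows) in ||.||_F^*.

    Returns indices of chosen centers; every row of P ends up within
    eps of some center, so the covering number is at most len(centers)."""
    centers, uncovered = [], list(range(len(P)))
    while uncovered:
        c = uncovered[0]                      # pick any uncovered predictor
        centers.append(c)
        uncovered = [i for i in uncovered
                     if dual_norm(P[i] - P[c], F, mu) > eps]
    return centers

mu = np.array([0.25, 0.25, 0.25, 0.25])
F = np.array([[1.0, 1.0, -1.0, -1.0]])        # a single distinguisher
# Predictors that F cannot tell far apart collapse into few cover balls.
P = np.array([[0.1, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1, 0.2],           # dual-norm distance 0.025
              [0.9, 0.9, 0.1, 0.1]])
centers = greedy_cover(P, F, mu, eps=0.05)
# the second predictor falls in the first ball, so 2 centers suffice
```

This illustrates why weak distinguisher classes yield small covers and hence low sample complexity: distances are measured only through what $F_D$ can detect.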
7. Mathematical Constructs and Significance
The theory centralizes two geometric-combinatorial constructs:
- Dual Minkowski Norm: For $g : X \to \mathbb{R}$ and a bounded function class $F$,

$$\| g \|_F^* = \sup_{f \in F} \left| \mathbb{E}_{x \sim \mu}[f(x)\, g(x)] \right|.$$

- Metric Entropy (Covering Number):

$$\log N(P, \|\cdot\|_F^*, \varepsilon) = \log \min\left\{ m : P \text{ can be covered by } m \text{ balls of radius } \varepsilon \text{ in } \|\cdot\|_F^* \right\}.$$

These underlie both the upper/lower bounds and duality results. The theory provides the first tight, general characterizations of the number of samples needed to ensure $\varepsilon$-indistinguishability, providing a continuum of learning-theoretic settings from PAC to fully agnostic $L_1$-learning by appropriately varying $D$ (Hu et al., 2022).