PAC Indistinguishability: Theory & Algorithms
- PAC Indistinguishability is a framework that generalizes standard PAC learning by focusing on making a predictor indistinguishable from a target using a class of outcome-based distinguishers.
- It leverages metric entropy and dual Minkowski norms to tightly characterize sample complexity, linking classical PAC and agnostic L₁-learning regimes.
- Practical algorithms like Distinguisher-Covering and Multiaccuracy Boost demonstrate how theoretical bounds translate into efficient predictor selection and update procedures.
PAC indistinguishability, also termed no-access Outcome Indistinguishability (OI), generalizes the standard PAC (Probably Approximately Correct) learning paradigm by considering a scenario where the goal is to output a predictor that cannot be distinguished from a target predictor by a class of distinguishers, based on the observable outcomes derived from predictions. The distinguishing power of the distinguisher class $\mathcal{A}$, the interplay between metric entropy and sample complexity, and the duality connections to convex geometry are central to the theory, yielding a framework that interpolates between classical PAC learning and agnostic $L_1$-learning depending on the choice of $\mathcal{A}$ (Hu et al., 2022).
1. Formal Definition and Framework
Let $\mathcal{X}$ denote an instance space and $\mathcal{D}$ a probability distribution over $\mathcal{X}$. A predictor $p : \mathcal{X} \to [0,1]$ induces a joint law on $\mathcal{X} \times \{0,1\}$: first sample $x \sim \mathcal{D}$, then generate the outcome $o \sim \mathrm{Ber}(p(x))$. Fix a distinguisher class $\mathcal{A}$ of maps $a : \mathcal{X} \times \{0,1\} \to \{0,1\}$, possibly randomized (so $a(x,o)$ may be read as an acceptance probability in $[0,1]$).
The distinguishing advantage of $a \in \mathcal{A}$ for predictors $p$ and $p^*$ is given by
$$\mathrm{adv}_a(p, p^*) \;=\; \Big| \Pr_{x \sim \mathcal{D},\, o \sim \mathrm{Ber}(p(x))}\big[a(x,o)=1\big] \;-\; \Pr_{x \sim \mathcal{D},\, o^* \sim \mathrm{Ber}(p^*(x))}\big[a(x,o^*)=1\big] \Big|.$$
The maximized distinguishing advantage over $\mathcal{A}$ is
$$\mathrm{adv}_{\mathcal{A}}(p, p^*) \;=\; \sup_{a \in \mathcal{A}} \mathrm{adv}_a(p, p^*).$$
A predictor $p$ is $\varepsilon$-OI to $p^*$ under $\mathcal{D}$ if $\mathrm{adv}_{\mathcal{A}}(p, p^*) \le \varepsilon$.
Expressing the distinguisher action as a function $f_a : \mathcal{X} \to [-1,1]$, with $f_a(x) = a(x,1) - a(x,0)$, the distinguishing advantage becomes
$$\mathrm{adv}_a(p, p^*) \;=\; \Big| \mathbb{E}_{x \sim \mathcal{D}}\big[ f_a(x)\,\big(p(x) - p^*(x)\big) \big] \Big|.$$
Thus $\mathrm{adv}_{\mathcal{A}}(p, p^*)$ corresponds to the dual Minkowski semi-norm induced by the class $\mathcal{F}_{\mathcal{A}} := \{ f_a : a \in \mathcal{A} \}$:
$$\mathrm{adv}_{\mathcal{A}}(p, p^*) \;=\; \| p - p^* \|_{\mathcal{A},\mathcal{D}}, \qquad \text{where } \|g\|_{\mathcal{F},\mathcal{D}} := \sup_{f \in \mathcal{F}} \big| \mathbb{E}_{x \sim \mathcal{D}}[ f(x)\, g(x) ] \big|$$
and $\|\cdot\|_{\mathcal{A},\mathcal{D}}$ abbreviates $\|\cdot\|_{\mathcal{F}_{\mathcal{A}},\mathcal{D}}$.
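To make these definitions concrete, here is a minimal numerical sketch (not taken from Hu et al.; the tiny domain, the three example distinguishers, and helper names such as `dual_norm` are illustrative assumptions) that evaluates the distinguishing advantage and the induced dual semi-norm for a finite class, both exactly and by Monte Carlo sampling of outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite instance space {0, ..., m-1} with an explicit distribution D.
m = 5
D = np.array([0.4, 0.3, 0.15, 0.1, 0.05])

# Target predictor p* and a candidate predictor p, as vectors of Bernoulli parameters.
p_star = np.array([0.9, 0.2, 0.5, 0.7, 0.1])
p_hat  = np.array([0.8, 0.2, 0.5, 0.9, 0.1])

# A small distinguisher class; each a is stored as the pair (a(., 0), a(., 1)).
A = [
    (np.zeros(m), np.ones(m)),                 # accept iff the outcome is 1
    (np.ones(m), np.zeros(m)),                 # accept iff the outcome is 0
    ((np.arange(m) == 0).astype(float),) * 2,  # accept iff x == 0, ignoring the outcome
]

def advantage(a, p, q, D):
    """Exact advantage |E_D[f_a(x) (p(x) - q(x))]| with f_a = a(.,1) - a(.,0)."""
    a0, a1 = a
    return abs(np.sum(D * (a1 - a0) * (p - q)))

def dual_norm(p, q, A, D):
    """Dual Minkowski semi-norm ||p - q||: the maximal advantage over the class."""
    return max(advantage(a, p, q, D) for a in A)

def monte_carlo_advantage(a, p, q, D, n=200_000):
    """Estimate the same advantage by sampling (x, o) from each predictor's joint law."""
    a0, a1 = a
    def accept_rate(pred):
        x = rng.choice(len(D), size=n, p=D)
        o = rng.random(n) < pred[x]
        return np.mean(np.where(o, a1[x], a0[x]))
    return abs(accept_rate(p) - accept_rate(q))

print("exact dual-norm distance:", dual_norm(p_hat, p_star, A, D))
print("Monte Carlo advantage of the first distinguisher:",
      monte_carlo_advantage(A[0], p_hat, p_star, D))
```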
Variants correspond to realizable vs. agnostic (the ground truth $p^*$ lies in a known class $\mathcal{P}$, or not) and distribution-specific vs. distribution-free (where the learner may or may not adapt to a known $\mathcal{D}$).
2. Metric Entropy Characterization in the Distribution-Specific Realizable Setting
In the realizable, distribution-specific case, assume $p^* \in \mathcal{P}$ is the unknown target, $\mathcal{A}$ is the distinguisher class, and $\mathcal{D}$ is fixed and known to the learner.
The central sample complexity measure is the covering number (metric entropy) of $\mathcal{P}$ with respect to the dual Minkowski norm:
$$N_{\varepsilon}\big(\mathcal{P}, \|\cdot\|_{\mathcal{A},\mathcal{D}}\big) \;:=\; \min\Big\{ |\mathcal{P}'| \;:\; \mathcal{P}' \subseteq [0,1]^{\mathcal{X}},\ \forall\, p \in \mathcal{P}\ \exists\, p' \in \mathcal{P}'\ \text{with}\ \|p - p'\|_{\mathcal{A},\mathcal{D}} \le \varepsilon \Big\}.$$
Lower Bound: Packing arguments yield that any (possibly improper, randomized) learner achieving $\varepsilon$-OI from $n$ i.i.d. samples $(x, o^*)$ drawn from the law induced by $(\mathcal{D}, p^*)$ must satisfy, up to absolute-constant rescalings of $\varepsilon$,
$$n \;\ge\; \Omega\Big( \log N_{\varepsilon}\big(\mathcal{P}, \|\cdot\|_{\mathcal{A},\mathcal{D}}\big) \Big).$$
Upper Bound: The "Distinguisher-Covering" algorithm computes an approximate $\varepsilon$-cover of the distinguisher class in the dual norm $\|\cdot\|_{\mathcal{P},\mathcal{D}}$, empirically estimates the correlation $\mathbb{E}[f(x)\, o^*]$ for each $f$ in this cover, and selects a predictor $\hat{p} \in \mathcal{P}$ minimizing the maximum estimation error over the cover. Up to absolute-constant rescalings of $\varepsilon$, it achieves:
$$n \;=\; O\!\left( \frac{\log N_{\varepsilon}\big(\mathcal{P}, \|\cdot\|_{\mathcal{A},\mathcal{D}}\big)}{\varepsilon^{2}} \right).$$
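The following is a minimal sketch of this covering-and-selection strategy on a finite domain, under the assumption that the distinguisher class is finite and explicitly represented by its functions $f_a$; the greedy cover (taken here against differences of candidate predictors), batch size, and helper names are illustrative choices rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def dual_seminorm(g, F, D):
    """||g||_{F,D} = max_f |E_D[f(x) g(x)]| for a finite class F (rows = functions)."""
    return np.max(np.abs(F @ (D * g)))

def greedy_cover(F, P, D, eps):
    """Greedy eps-cover of the distinguisher functions F under the semi-norm
    induced by differences of candidate predictors in P (illustrative choice)."""
    diffs = P[:, None, :] - P[None, :, :]        # all pairwise differences p - p'
    def dist(f1, f2):
        return np.max(np.abs((diffs * (D * (f1 - f2))).sum(axis=-1)))
    cover = []
    for f in F:
        if not cover or min(dist(f, fc) for fc in cover) > eps:
            cover.append(f)
    return np.array(cover)

def distinguisher_covering(sample_x, sample_o, F_cover, P, D):
    """Pick the candidate p in P whose exact correlations E_D[f p] best match the
    empirical correlations E[f(x) o*] on the sample, uniformly over the cover."""
    emp = np.array([np.mean(f[sample_x] * sample_o) for f in F_cover])
    scores = [np.max(np.abs(F_cover @ (D * p) - emp)) for p in P]
    return P[int(np.argmin(scores))]

# Tiny synthetic instance: 6 points, 4 candidate predictors, 8 random distinguishers.
m = 6
D = np.full(m, 1 / m)
P = rng.uniform(0, 1, size=(4, m))               # candidate predictor class
p_star = P[2]                                    # realizable: the target lies in P
F = rng.choice([-1.0, 1.0], size=(8, m))         # distinguisher functions f_a

F_cover = greedy_cover(F, P, D, eps=0.05)
xs = rng.choice(m, size=4000, p=D)               # i.i.d. samples x ~ D
outcomes = (rng.random(4000) < p_star[xs]).astype(float)   # o* ~ Ber(p*(x))

p_hat = distinguisher_covering(xs, outcomes, F_cover, P, D)
print("dual-norm error of selected predictor:", dual_seminorm(p_hat - p_star, F, D))
```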
3. Metric-Entropy Duality: Tight Characterizations
Leveraging the symmetry between covering $\mathcal{P}$ by $\|\cdot\|_{\mathcal{F},\mathcal{D}}$-balls and covering $\mathcal{F}$ by $\|\cdot\|_{\mathcal{P},\mathcal{D}}$-balls, a metric-entropy duality theorem holds: for any bounded, nonempty classes $\mathcal{P} \subseteq [0,1]^{\mathcal{X}}$ and $\mathcal{F} \subseteq [-1,1]^{\mathcal{X}}$, and any $\varepsilon \in (0,1)$,
$$\log N_{c\varepsilon}\big(\mathcal{F}, \|\cdot\|_{\mathcal{P},\mathcal{D}}\big) \;\le\; C \Big( \log N_{\varepsilon}\big(\mathcal{P}, \|\cdot\|_{\mathcal{F},\mathcal{D}}\big) + \log \tfrac{1}{\varepsilon} \Big)$$
for absolute constants $c, C > 0$, and symmetrically with the roles of $\mathcal{P}$ and $\mathcal{F}$ exchanged. In particular, plugging in $\mathcal{F} = \mathcal{F}_{\mathcal{A}}$ yields nearly tight two-sided bounds: $\log N_{\varepsilon}(\mathcal{P}, \|\cdot\|_{\mathcal{A},\mathcal{D}})$ and $\log N_{\varepsilon}(\mathcal{A}, \|\cdot\|_{\mathcal{P},\mathcal{D}})$ agree up to constant-factor rescalings of $\varepsilon$ and an additive $O(\log\frac{1}{\varepsilon})$ term.
This duality connects the sample complexity of PAC indistinguishability to metric entropy duality phenomena in convex geometry. The additive $\log\frac{1}{\varepsilon}$ term is essential unless convexity further simplifies the setting (Hu et al., 2022).
4. Distribution-Free Characterization via Fat-Shattering Dimension
In the distribution-free agnostic and realizable settings (typically with the dual norm $\|\cdot\|_{\mathcal{A},\mathcal{D}}$ coinciding with the $L_1(\mathcal{D})$ distance), the sample complexity is governed by the fat-shattering dimension of $\mathcal{P}$, denoted $\mathrm{fat}_{\gamma}(\mathcal{P})$. For any $\varepsilon, \delta \in (0,1)$, up to logarithmic factors and constant-factor rescalings of $\varepsilon$:
$$n(\varepsilon, \delta) \;=\; \tilde{\Theta}\!\left( \frac{\mathrm{fat}_{\varepsilon}(\mathcal{P}) + \log\frac{1}{\delta}}{\varepsilon^{2}} \right).$$
This result leverages uniform convergence (via the fat-shattering dimension) and a multiaccuracy boosting algorithm that performs iterative updates: in each round, if there exists a distinguisher $a \in \mathcal{A}$ with sufficient average discrepancy between the current predictor $p_t$ and the observed outcomes, $p_t$ is updated in the direction of $f_a$. Each round uses fresh samples and decreases the squared $L_2(\mathcal{D})$ distance to $p^*$ by $\Omega(\varepsilon^2)$, so only $O(1/\varepsilon^2)$ rounds are needed. Packing arguments establish the matching lower bound.
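Below is a minimal sketch of such a boosting loop on a finite domain, assuming a finite distinguisher class and a sampling oracle that returns fresh labeled examples each round; the step size, stopping threshold, batch size, and round limit are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def multiaccuracy_boost(F, sample_batch, eps, eta=None, max_rounds=1000):
    """Iteratively remove empirical correlations between the residual (o* - p)
    and each distinguisher function f (rows of F index the class)."""
    _, m = F.shape
    eta = eps if eta is None else eta       # step size: an illustrative choice
    p = np.full(m, 0.5)                     # illustrative initialization
    for _ in range(max_rounds):
        xs, outcomes = sample_batch(4000)   # fresh labeled batch each round
        residual = outcomes - p[xs]
        corr = np.array([np.mean(f[xs] * residual) for f in F])
        j = int(np.argmax(np.abs(corr)))
        if abs(corr[j]) <= eps:             # no distinguisher sees a large discrepancy
            break
        # move p in the direction of the offending f_j and project back to [0,1]
        p = np.clip(p + eta * np.sign(corr[j]) * F[j], 0.0, 1.0)
    return p

# Tiny synthetic run on a finite domain.
m = 8
D = np.full(m, 1 / m)
p_star = rng.uniform(0, 1, size=m)
F = rng.choice([-1.0, 1.0], size=(16, m))   # distinguisher functions f_a

def sample_batch(n):
    xs = rng.choice(m, size=n, p=D)
    return xs, (rng.random(n) < p_star[xs]).astype(float)

p_hat = multiaccuracy_boost(F, sample_batch, eps=0.05)
print("max exact correlation after boosting:",
      np.max(np.abs(F @ (D * (p_star - p_hat)))))
```

The update follows the sign of the offending correlation, which is exactly what drives the per-round decrease in squared distance described above; clipping to $[0,1]$ can only bring the iterate closer to $p^*$.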
5. Separation Between Realizable and Agnostic Regimes
A critical departure from classical PAC theory is the potential for an unbounded separation between realizable and agnostic PAC indistinguishability sample complexity:
| Setting | Realizable Sample Complexity | Agnostic Sample Complexity |
|---|---|---|
| $\mathcal{P} = \{p_0, p_1\}$ differing at a single point, $\mathcal{A}$ arbitrary | $O(1)$ (trivial) | arbitrarily large |
| $\mathcal{F}_{\mathcal{A}}$ symmetric convex, or $\mathcal{A}$ all functions with $p^*$ binary | metric-entropy / fat-shattering rate | same rate (no separation) |
Concretely, for $\mathcal{P} = \{p_0, p_1\}$ differing at one point, realizable OI learning is trivial, but agnostic, distribution-free OI can require arbitrarily many samples. Under restrictions such as $\mathcal{F}_{\mathcal{A}}$ being symmetric convex, or $\mathcal{A}$ containing all functions (with $p^*$ binary), the agnostic rate collapses to the realizable rate or to the metric-entropy rate.
6. Algorithms for PAC Indistinguishability
Two principal algorithmic approaches realize the aforementioned sample complexity bounds:
- Distinguisher-Covering (Realizable, Distribution-Specific):
- Cover $\mathcal{A}$ under the dual norm $\|\cdot\|_{\mathcal{P},\mathcal{D}}$.
- On samples $(x_i, o_i^*)$, estimate the correlations $\mathbb{E}[f(x)\, o^*]$ for each $f$ in the cover.
- Select $\hat{p} \in \mathcal{P}$ minimizing the worst empirical error across the covering set.
- Multiaccuracy Boost (Distribution-Free):
- Initialize the predictor, e.g. $p_0 \equiv 1/2$.
- Repeat for at most $O(1/\varepsilon^2)$ rounds:
- Draw a fresh batch of labeled examples $(x_i, o_i^*)$.
- If some $a \in \mathcal{A}$ has empirical discrepancy exceeding $\Omega(\varepsilon)$, update $p_{t+1} = p_t + \eta\, f_a$ (with the appropriate sign), clipped to $[0,1]$.
- Otherwise, terminate.
Both algorithms yield nearly tight rates matching the theoretical characterizations given by metric entropy and fat-shattering dimension, respectively.
7. Mathematical Constructs and Significance
The theory centralizes two geometric-combinatorial constructs:
- Dual Minkowski Norm: For a class $\mathcal{F}$ of functions $\mathcal{X} \to [-1,1]$ and any $g : \mathcal{X} \to \mathbb{R}$,
$$\|g\|_{\mathcal{F},\mathcal{D}} \;=\; \sup_{f \in \mathcal{F}} \big| \mathbb{E}_{x \sim \mathcal{D}}[ f(x)\, g(x) ] \big|.$$
- Metric Entropy (Covering Number): $N_{\varepsilon}(\mathcal{P}, \|\cdot\|_{\mathcal{F},\mathcal{D}})$, the minimum number of radius-$\varepsilon$ balls in the dual norm needed to cover $\mathcal{P}$.
These underlie both the upper/lower bounds and the duality results. The theory provides the first tight, general characterizations of the number of samples needed to ensure $\varepsilon$-indistinguishability, yielding a continuum of learning-theoretic settings from classical PAC to fully agnostic $L_1$-learning as $\mathcal{A}$ varies (Hu et al., 2022).
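As a concrete illustration of these two constructs, the following minimal sketch (synthetic classes and illustrative helper names; a greedy cover only upper-bounds the optimal covering number, so the printed sizes are indicative rather than exact) computes covers of a random finite predictor class under $\|\cdot\|_{\mathcal{A},\mathcal{D}}$ and of the distinguisher class under $\|\cdot\|_{\mathcal{P},\mathcal{D}}$, in the spirit of the duality in Section 3.

```python
import numpy as np

rng = np.random.default_rng(3)

def dual_dist(g1, g2, F, D):
    """Distance under the dual Minkowski semi-norm induced by the finite class F."""
    return np.max(np.abs(F @ (D * (g1 - g2))))

def greedy_cover_size(G, F, D, eps):
    """Size of a greedy eps-cover of the class G (rows) under ||.||_{F,D}.
    A greedy cover only upper-bounds the optimal covering number N_eps."""
    centers = []
    for g in G:
        if not centers or min(dual_dist(g, c, F, D) for c in centers) > eps:
            centers.append(g)
    return len(centers)

# Synthetic finite predictor class P and distinguisher class A (as functions f_a).
m = 10
D = np.full(m, 1 / m)
P = rng.uniform(0, 1, size=(64, m))            # predictors X -> [0,1]
A = rng.choice([-1.0, 1.0], size=(64, m))      # distinguisher functions X -> {-1,1}

eps = 0.1
print("greedy cover of P under ||.||_{A,D}:", greedy_cover_size(P, A, D, eps))
print("greedy cover of A under ||.||_{P,D}:", greedy_cover_size(A, P, D, eps))
```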