
Samplable PAC Learning & Evasive Sets

Updated 8 December 2025
  • Samplable PAC Learning is a variant of the PAC framework that restricts attention to efficiently samplable data distributions, reducing theoretical sample complexity.
  • The approach uses explicit evasive sets to achieve exponential separations from standard PAC learning even when the VC-dimension is high.
  • By integrating cryptographic assumptions and computationally bounded adversaries, the framework extends these separations to computational and online learning.

Samplable PAC learning is a refinement of the classical Probably Approximately Correct (PAC) learning framework, in which the requirement to learn under all possible data distributions is relaxed to learning under distributions that are efficiently samplable. This modification, first explicitly formalized by Blum, Furst, Kearns, and Lipton (1993), has significant implications for both the statistical and computational complexity of learning, leading to new separations and open questions regarding the true nature of learnability when data-generation processes are algorithmically constrained (Blanc et al., 1 Dec 2025).

1. Formal Definitions and Framework

In the standard PAC model (Valiant, 1984), a concept class $C \subseteq \{c : X \to \{0,1\}\}$ is PAC-learned if a learner $A$, given $m$ labeled examples drawn i.i.d. from any distribution $D$ over the instance space $X$ and labeled by any target $c \in C$, outputs a hypothesis $h$ such that

$$\Pr_{S \sim (D, c)^m}\Big[\Pr_{x \sim D}[h(x) \neq c(x)] \leq \epsilon\Big] \geq 1 - \delta$$

for all $c \in C$ and all $D$.

The samplable PAC model retains this structure but only requires $A$ to succeed when $D$ is efficiently samplable: that is, there exists a Boolean circuit $S$ of size at most $s$ such that $S$, applied to uniformly random input bits, induces the distribution $D$. The learner must succeed for all distributions $D$ samplable by circuits of size at most $s$. The error metric is the same as in standard PAC, $\Pr_{x \sim D}[h(x) \neq c(x)]$ (Blanc et al., 1 Dec 2025).
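The defining restriction above can be made concrete with a toy sketch (all names here are illustrative, not from the paper): a "sampler circuit" is just a function from uniformly random bits to instances, and the samplable distribution $D$ is whatever distribution that function induces.

```python
import random

def sampler(bits):
    # A toy size-bounded "circuit": maps 4 random bits to an 8-bit instance
    # by repeating them, so only 16 of the 256 possible instances ever occur.
    return tuple(bits[i % 4] for i in range(8))

def draw_instance(rng, r=4):
    # One draw from the induced distribution D: feed uniform bits to the sampler.
    return sampler([rng.randrange(2) for _ in range(r)])

def empirical_error(h, c, rng, trials=10_000):
    # Estimate Pr_{x ~ D}[h(x) != c(x)] under the samplable distribution D.
    return sum(h(x) != c(x) for x in (draw_instance(rng) for _ in range(trials))) / trials

rng = random.Random(0)
c = lambda x: x[0]   # target concept: first coordinate
h = lambda x: x[4]   # hypothesis: fifth coordinate (equal to the first, by construction)
print(empirical_error(h, c, rng))   # 0.0 on the sampler's support
```

Because the sampler's support is tiny, a hypothesis only needs to match the target on those few reachable instances, which is the structural fact the model exploits.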

2. Statistical Separations from Standard PAC Learning

A principal finding is the existence of concept classes where samplable PAC learning is exponentially more powerful than standard PAC. Specifically, there exist classes $C$ over $\{0,1\}^n$ such that:

  • The VC-dimension of $C$ is exponential in $n$, so the standard PAC sample complexity is $2^{\Omega(n)}$.
  • The same class is learnable in samplable PAC with polynomial sample complexity.

This is realized through the construction of an evasive set $E \subseteq \{0,1\}^n$, defined so that every size-$s$ samplable distribution $D$ “misses” $E$ outside of its heaviest points: apart from a small set $H$ of $D$'s heaviest points, the mass that $D$ places on $E$ is negligible. The corresponding concept class $C_E$ consists of all functions $c$ with $c(x)$ arbitrary for $x \in E$ and $c(x) = 0$ otherwise. While its VC-dimension remains large (it equals $|E|$), the evasiveness property ensures that, against samplable $D$, memorization-based learners cover nearly all of $D$'s mass using a small sample (Blanc et al., 1 Dec 2025).
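The memorization-based learner that makes this work is simple enough to sketch directly (a minimal illustration with made-up numbers, not the paper's construction): remember the labels of sampled points and predict 0 everywhere else. Against a distribution whose mass concentrates on a few heavy points, a small sample covers almost everything.

```python
import random

def memorize(sample):
    # Memorization learner: store seen (instance, label) pairs,
    # predict the default label 0 on unseen instances.
    table = dict(sample)
    return lambda x: table.get(x, 0)

rng = random.Random(1)
heavy = [0, 1, 2, 3]   # toy samplable D: 4 heavy points carry 96% of the mass
def draw():
    return rng.choice(heavy) if rng.random() < 0.96 else rng.randrange(4, 1000)

target = lambda x: 1 if x % 2 == 0 else 0   # arbitrary target concept
sample = [(x, target(x)) for x in (draw() for _ in range(200))]
h = memorize(sample)

mistakes = 0
for _ in range(5000):
    x = draw()
    mistakes += h(x) != target(x)
err = mistakes / 5000
print(err)   # small: only the light tail where default-zero happens to disagree
```

With 200 samples, all four heavy points are memorized with overwhelming probability, so the residual error is bounded by the tail mass, independent of how many concepts the class contains.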

3. Explicit Evasive Sets and Efficient Learnability

Explicit evasive sets are sets $E \subseteq \{0,1\}^n$ characterized by:

  • Membership in $E$ is decidable by a circuit of size $\mathrm{poly}(n)$.
  • $|E|$ is super-polynomial in $n$.
  • For every polynomial size bound $s$, every size-$s$ sampler $\epsilon$-misses $E$ for some constant $\epsilon$.

Such sets, if constructed, provide both hardness and efficient recognizability, crucial for sharp separations between standard and samplable PAC learnability. Their existence is connected to core complexity-theoretic hypotheses: if the relevant search problems admit efficient algorithms, a sampler can use the membership circuit to find elements of $E$, rendering explicit sets non-evasive. In the random oracle model, explicit sets $E$ can be produced that are evasive for all polynomial-size oracle samplers, with $|E|$ super-polynomial (Blanc et al., 1 Dec 2025).
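The evasiveness condition itself is easy to test empirically for a candidate set and sampler (a toy check with illustrative names; real evasive sets require the constructions above): measure how much of the sampler's mass lands in $E$ once the $k$ heaviest points are excluded.

```python
from collections import Counter
import random

def mass_in_set_outside_heavy(draws, E, k):
    # Fraction of draws landing in E, excluding the sampler's k heaviest points.
    counts = Counter(draws)
    heavy = {x for x, _ in counts.most_common(k)}
    return sum(n for x, n in counts.items() if x in E and x not in heavy) / len(draws)

rng = random.Random(2)
E = set(range(10**6, 10**6 + 500))   # candidate "evasive" set, far from one sampler's range

evasive_draws = [rng.randrange(1000) for _ in range(20_000)]                # never reaches E
spread_draws = [rng.randrange(10**6, 10**6 + 500) for _ in range(20_000)]   # covers E broadly

print(mass_in_set_outside_heavy(evasive_draws, E, k=10))   # 0.0: E is evaded
print(mass_in_set_outside_heavy(spread_draws, E, k=10))    # near 1.0: E is not evaded
```

The second sampler shows why evasiveness is a strong property: it must hold against *every* small sampler, not just one, which is where the complexity-theoretic obstacles enter.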

4. Computational Separations and Reductions

The study extends beyond sample complexity to computational complexity:

  • Let $\{f_k\}$ be a pseudorandom function family secure against polynomial-size circuits; define the class $C = \{c_k\}$, where $c_k(x) = f_k(x)$ if $x \in E$ and $c_k(x) = 0$ otherwise.
  • Learning $C$ with respect to the uniform distribution on $E$ requires breaking the PRF, which is infeasible in polynomial time (under standard cryptographic assumptions).
  • However, by the evasiveness of $E$, $C$ becomes learnable in polynomial time for all size-$\mathrm{poly}(n)$ samplable distributions, as almost all of the probability mass falls on a small, easily memorizable subset.

This separation persists relative to a random oracle, demonstrating unconditional distinctions between standard and samplable PAC in oracle models (Blanc et al., 1 Dec 2025).
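The shape of the PRF-based class can be sketched with stand-in primitives (HMAC-SHA256 in place of the abstract PRF family, and a small illustrative $E$; neither is the paper's construction): concepts agree with a keyed PRF on $E$ and are identically 0 elsewhere.

```python
import hashlib, hmac

def prf_bit(key: bytes, x: int) -> int:
    # HMAC-SHA256 as a stand-in PRF: one pseudorandom bit per input point.
    return hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest()[0] & 1

def concept(key, E):
    # c_k(x) = f_k(x) on E, and 0 off E.
    return lambda x: prf_bit(key, x) if x in E else 0

E = set(range(100))             # illustrative stand-in for the evasive set
c = concept(b"secret-key", E)

# Off E the concept is identically 0, so the default-zero learner is exact there:
print(all(c(x) == 0 for x in range(100, 200)))   # True
# On E, predicting c means predicting PRF bits -- hard without the key:
print(sum(c(x) for x in E))     # typically close to |E|/2 ones
```

The two prints mirror the two bullets: off $E$, trivial learning; on $E$, learning is as hard as distinguishing the PRF from random.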

5. Online Learning and Adversarial Efficiency

The samplable PAC principle translates to online learning models. In the classic online framework (Littlestone, 1988), an adversary provides instances; with no computational restriction, the adversary can force exponentially many mistakes, corresponding to the Littlestone dimension of the class, which can be $2^{\Omega(n)}$. If the adversary is limited to producing instances by polynomial-size circuits (an efficient adversary), for the same concept classes the best online learner's number of mistakes drops to $\mathrm{poly}(n)$. A “default-zero” memorization learner achieves low mistake bounds because an efficient adversary can only produce a bounded number of distinct “hard” points in an evasive set $E$ (Blanc et al., 1 Dec 2025).
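The default-zero online learner is short enough to state in full (a minimal sketch; the stream and the bound of 3 hard points are made-up illustrations): predict the remembered label for seen points and 0 for unseen ones, so each distinct hard point costs at most one mistake.

```python
def default_zero_online(stream):
    # Predict the remembered label for seen points, 0 for unseen ones;
    # the adversary reveals the true label after each prediction.
    memory, mistakes = {}, 0
    for x, label in stream:
        guess = memory.get(x, 0)
        mistakes += guess != label
        memory[x] = label
    return mistakes

# An adversary limited to 3 distinct "hard" points (label 1 there, 0 elsewhere)
# forces at most one mistake per distinct hard point, however long the stream:
hard = {7, 8, 9}
stream = [(x, 1 if x in hard else 0) for x in [1, 7, 2, 8, 7, 9, 3, 8, 9]]
print(default_zero_online(stream))   # 3
```

The mistake count equals the number of distinct hard points, not the stream length, which is exactly the mechanism behind the $\mathrm{poly}(n)$ bound against efficient adversaries.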

Computationally, the same construction methods (PRFs combined with explicit evasive sets) yield classes where online learning is hard against unbounded adversaries, yet efficient against bounded adversaries.

6. Connections to Classical PAC Learning Bounds

Classical results in PAC learning show that for a concept class $C$ of finite VC-dimension $d$, the sample complexity in the realizable PAC setting is $\Theta\!\left(\frac{d + \log(1/\delta)}{\epsilon}\right)$ (Hanneke, 2015). These optimal bounds hold for learning under arbitrary distributions. The samplable PAC findings indicate that, if distributions are restricted to samplable ones, these bounds may not fully capture the true sample or computational complexity: samplable PAC learning can circumvent high sample complexity by exploiting evasiveness in the domain, even when the VC-dimension is exponential. This underscores a fundamental distinction between distributional assumptions in learning theory (Blanc et al., 1 Dec 2025).
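Evaluating the optimal bound numerically makes the gap vivid (the constant hidden in the $\Theta(\cdot)$ is taken as 1 here for illustration):

```python
import math

def pac_sample_bound(d, epsilon, delta, C=1.0):
    # Theta((d + log(1/delta)) / epsilon), up to the constant C hidden in Theta(.)
    return C * (d + math.log(1 / delta)) / epsilon

n = 40
print(f"{pac_sample_bound(2**n, 0.01, 0.01):.3e}")   # ~1.1e14 for VC-dimension 2^40
print(f"{pac_sample_bound(n**2, 0.01, 0.01):.3e}")   # ~1.6e5 at a poly(n)-scale dimension
```

With exponential VC-dimension the distribution-free bound is astronomical, while the samplable-PAC results place the same classes at polynomial sample cost.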

7. Open Problems and Future Directions

A central open problem is to characterize samplable PAC sample complexity in terms of both the complexities of the concept class and the samplability of the distribution, paralleling the role of VC-dimension in standard PAC settings. Current characterizations do not yield a combinatorial or analytical measure analogous to VC for samplable PAC. Further lines of investigation include developing measures of distributional complexity beyond samplability and constructing explicit evasive sets unconditionally (that is, outside random oracle or cryptographic assumptions), with the aim of better understanding and harnessing the expanded power of efficient learning under realistic data-generation constraints (Blanc et al., 1 Dec 2025).
