Realizable-to-Agnostic Transformation

Updated 4 October 2025
  • Realizable-to-Agnostic Transformation is a framework that shifts learning from perfect-label assumptions to noisy settings using techniques like refutation complexity and non-uniform covers.
  • It employs model-independent reductions, boosting with potential functions, and sample reuse to achieve near-optimal sample complexities in agnostic regimes.
  • The approach unifies statistical, computational, and privacy aspects, enabling robust and efficient learning even in the presence of arbitrary label noise.

Realizable-to-Agnostic Transformation refers to a family of mathematical, algorithmic, and conceptual frameworks in statistical and online learning theory for transferring guarantees, sample complexities, and properties from the realizable (noiseless) learning regime to the agnostic (noisy, arbitrary label) regime. This transformation is central for understanding when and how efficient learning algorithms designed for idealized situations (where data labels are generated by a member of a known hypothesis class) can be adapted to work in more realistic scenarios with arbitrary label noise and distributional misspecification. Mechanisms include the use of refutation complexity, model-independent cover-based reductions, boosting with potential functions (and sample reuse), and reconstructions leveraging unlabeled data or private data. The scope spans PAC learning, online learning, active and robust learning, boosting, privacy-preserving algorithms, and deep semantic paradigms in realizability.

1. Fundamental Principles and Definitions

In learning theory, the realizable setting assumes the existence of a hypothesis $h^* \in \mathcal{H}$ such that the observed labels are generated perfectly by $h^*$. The agnostic setting lifts this assumption: no hypothesis need explain the data without error, labels may be corrupted by noise, and the Bayes optimal classifier may not belong to the hypothesis class.

The transformation from realizable to agnostic learning revolves around characterizing efficient learnability and the minimal sample and computational requirements needed to learn in the absence of perfect structure. Specific quantitative measures introduced include:

  • Rademacher Complexity: A statistical measure quantifying the ability of a class $\mathcal{C}$ to correlate with random labels. For $m$ samples,

\mathcal{R}_m(\mathcal{C}) = \mathbb{E}_{x_1, \ldots, x_m}\, \mathbb{E}_{\sigma} \left[ \frac{1}{m} \sup_{c \in \mathcal{C}} \sum_{i=1}^m \sigma_i\, c(x_i) \right]

where the $\sigma_i$ are i.i.d. $\pm 1$ (Rademacher) random labels.

  • Refutation Complexity (Kothari et al., 2017): The computational analogue of Rademacher complexity, defined as the minimal sample size $m$ (within a computational budget $T(n)$) needed for an algorithm to distinguish between structure (labels correlated with a member of $\mathcal{C}$) and noise (labels that are i.i.d. Rademacher). Refutation complexity reflects both statistical hardness and computational difficulty.
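As a concrete illustration, the empirical version of $\mathcal{R}_m(\mathcal{C})$ can be estimated by Monte Carlo over the Rademacher labels. The finite 1-D threshold class below is a made-up example chosen so the supremum reduces to a simple maximum; this is a sketch, not tied to any paper's experiments:

```python
import numpy as np

def empirical_rademacher(xs, hypothesis_class, n_trials=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite hypothesis class on the sample xs."""
    rng = np.random.default_rng(seed)
    m = len(xs)
    # Precompute each hypothesis's +/-1 predictions on the sample.
    preds = np.array([[c(x) for x in xs] for c in hypothesis_class])
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)  # i.i.d. Rademacher labels
        total += np.max(preds @ sigma) / m       # sup over the class
    return total / n_trials

# Made-up example: thresholds c_t(x) = sign(x - t) on four sample points.
xs = [0.1, 0.4, 0.5, 0.9]
H = [lambda x, t=t: 1.0 if x >= t else -1.0
     for t in (0.0, 0.3, 0.45, 0.7, 1.0)]
```

For a finite class the supremum is exact; for an infinite class one would restrict to its (finitely many) behaviors on the sample.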

2. Model-Independent Reductions and Cover-Based Transformations

The model-independent black-box reduction (Hopkins et al., 2021) provides a general recipe for lifting realizable algorithms to agnostic ones:

  1. Draw an unlabeled sample $S_U$ and a separate labeled sample $S_L$ from the input distribution.
  2. Enumerate all possible labelings of $S_U$ to construct a non-uniform cover:

C(S_U) = \left\{ A(S_U, h(S_U)) \mid h \in \mathcal{H}_{|S_U} \right\}

where $A$ is any realizable learner.

  3. Select $\hat{h} \in C(S_U)$ by minimizing empirical risk on $S_L$.

This framework requires no uniform convergence or combinatorial characterizations and is independent of loss function, data distribution, or model specification (including fairness, privacy, or robustness properties). Formal bounds use non-uniform covers and growth functions, such as:

m_L(\epsilon, \delta) \leq O\left( \left( n(\epsilon/2, \delta/2) + \log(1/\delta) \right) / \epsilon^2 \right)

where $n(\cdot, \cdot)$ is the sample complexity of the realizable learner.
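Steps 1–3 can be sketched end to end. The 1-D threshold class and the consistent learner below are illustrative stand-ins: enumerating all $2^{|S_U|}$ labelings and discarding the non-realizable ones plays the role of ranging over $\mathcal{H}_{|S_U}$.

```python
import itertools
import numpy as np

def realizable_learner(xs, ys):
    """Toy realizable learner for 1-D thresholds: returns a hypothesis
    consistent with labels y_i = sign(x_i - t) if one exists, else None."""
    pos = [x for x, y in zip(xs, ys) if y > 0]
    neg = [x for x, y in zip(xs, ys) if y < 0]
    if pos and neg and max(neg) >= min(pos):
        return None  # this labeling is not realizable by a threshold
    lo = max(neg) if neg else min(xs) - 1.0
    hi = min(pos) if pos else max(xs) + 1.0
    t = (lo + hi) / 2.0
    return lambda x, t=t: 1.0 if x >= t else -1.0

def agnostic_via_cover(S_U, S_L):
    """Lift the realizable learner: build a non-uniform cover from all
    realizable labelings of S_U, then select by ERM on S_L."""
    cover = [h for labeling in itertools.product([-1.0, 1.0], repeat=len(S_U))
             if (h := realizable_learner(S_U, labeling)) is not None]
    def emp_err(h):
        return np.mean([h(x) != y for x, y in S_L])
    return min(cover, key=emp_err)

# Toy run: threshold concepts on [0, 1].
S_U = [0.1, 0.3, 0.6, 0.8]
S_L = [(0.05, -1), (0.2, -1), (0.4, -1), (0.55, 1), (0.7, 1), (0.9, 1)]
h_hat = agnostic_via_cover(S_U, S_L)
```

The exhaustive enumeration is exponential in $|S_U|$ and is only for exposition; the point of the reduction is that the cover's size is controlled by the growth function, not that it is cheap to build.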

3. Complexity-Theoretic Characterizations: Refutation and Efficient Agnostic Learning

The link between refutation complexity and agnostic learning (Kothari et al., 2017) is twofold:

  • If efficient agnostic learning is possible, then efficient refutation is achievable by testing the learned hypothesis.
  • If efficient refutation is possible, a weak agnostic learner can be extracted via a hybrid argument, and agnostic boosting can upgrade it into a strong learner.

This equivalence clarifies that computational limits of agnostic learning are precisely characterized by refutation complexity. Lower bounds for learning can thus be shown by proving the hardness of refutation (e.g., for random CSPs).

The agnostic transformation adds nuance by demanding nontrivial (but possibly small) correlation in the presence of noise for the refutation algorithm to succeed, whereas realizable refutation seeks perfect structure. The agnostic reduction must preserve the sample distribution during boosting, contrasting with distribution-independent treatments in the realizable case.

4. Algorithmic Transformations: Boosting and Data Reuse

Multiple studies refine realizable-to-agnostic reduction via boosting:

  • Efficient agnostic boosting algorithms utilize potential functions and sample reuse to approach ERM-level sample complexity (Ghai et al., 31 Oct 2024), often employing smooth convex potential functions, e.g.,

\phi(z) = \begin{cases} 2 - z & z \leq 0 \\ (z + 2) e^{-z} & z > 0 \end{cases}

with ensemble updates $H_{t+1} = H_t + \eta h_t$.

  • Martingale-based analyses control generalized error in improper hypothesis classes, avoiding the need for uniform convergence over exponentially large spaces.
  • The use of unlabeled data in boosting (Ghai et al., 6 Mar 2025) enables estimation of feature-dependent components of the potential, relegating label-dependent terms to smaller labeled samples, achieving optimal sample complexity matching ERM under polynomially many additional unlabeled examples.
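A minimal sketch of this potential and the per-round reweighting it can induce; taking $w_i = -\phi'(\mathrm{margin}_i)$ as the example weight is one standard way such potentials drive boosting (an assumption of this sketch, not a claim about any specific paper's algorithm), and the margin values below are made up:

```python
import numpy as np

def phi(z):
    """The smooth convex potential from the text:
    phi(z) = 2 - z for z <= 0, and (z + 2) * exp(-z) for z > 0."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 2.0 - z, (z + 2.0) * np.exp(-z))

def phi_prime(z):
    """Derivative of phi; the two branches agree at z = 0
    (value 2, slope -1), so the potential is C^1."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, -1.0, -(z + 1.0) * np.exp(-z))

# Example weights for one boosting round: low-margin (poorly fit) points
# get the most weight; the ensemble then steps H_{t+1} = H_t + eta * h_t.
margins = np.array([-0.5, 0.0, 0.3, 2.0])
weights = -phi_prime(margins)
```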

Online agnostic boosting is achieved via reduction to online convex optimization (Brukhim et al., 2020):

  • Losses are constructed for the OCO routine, e.g., $\ell_t^{(i)}(p) = p \cdot \left( (1/\gamma)\, \mathcal{W}_i(x_t)\, y_t - 1 \right)$, with randomized majorities and regret bounds that adapt to adversarial label sequences.
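Since this per-round loss is linear in $p$, the OCO routine can be as simple as projected online gradient descent on $[0, 1]$; the edge parameter $\gamma$, step size, and prediction stream below are illustrative:

```python
def oco_step(p, grad, lr=0.1):
    """One projected online-gradient-descent step on the interval [0, 1]."""
    return min(1.0, max(0.0, p - lr * grad))

# l_t(p) = p * ((1/gamma) * w_pred * y_t - 1) is linear in p, so its
# gradient is just the fixed coefficient (1/gamma) * w_pred * y_t - 1.
gamma = 0.5  # assumed weak-learning edge (illustrative)
p = 0.5
for w_pred, y_t in [(1, 1), (-1, 1), (1, -1), (1, 1)]:
    grad = (1.0 / gamma) * w_pred * y_t - 1.0
    p = oco_step(p, grad)
```

When the weak learner's prediction agrees with $y_t$ the gradient is positive and $p$ shrinks; on disagreement $p$ grows, which is the adaptive behavior the regret bound exploits.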

5. Fine-Grained Analysis and Universal Rates

Universal learning rates for ERM in agnostic learning exhibit a trichotomy (Hanneke et al., 17 Jun 2025):

  Concept class $\mathcal{H}$     | Excess risk rate $E = \mathbb{E}[\operatorname{er} - \inf_{h \in \mathcal{H}} \operatorname{er}]$
  --------------------------------|------------------------------------------
  Finite                          | $O(e^{-n})$
  Infinite, finite VC dimension   | $o(n^{-1/2})$
  Infinite VC dimension           | Arbitrarily slow

Target-dependent refinements and Bayes-dependent refinements localize rates to the gap between the optimal hypothesis and others, and to the combinatorial structure in the proximity of the target.

6. Extensions: Privacy, Robustness, and Semantics

Privacy-preserving realizable-to-agnostic transformation (Li et al., 1 Oct 2025) demonstrates that privatizing a PAC learner for the realizable setting enables agnostic learning with near-optimal extra sample complexity $\tilde{O}(\mathrm{VC}(\mathcal{C})/\alpha^2)$, eliminating the traditionally suboptimal $1/\varepsilon$ dependence via refined score functions for the exponential mechanism and by leveraging the entire dataset for selection.
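Setting the paper's refined score functions aside, the underlying exponential-mechanism selection step can be sketched with the standard negative-error-count score (the candidate set, data, and $\varepsilon$ below are illustrative):

```python
import numpy as np

def exp_mechanism_select(hypotheses, X, y, eps, rng=None):
    """Exponential-mechanism selection: score each candidate by its negative
    empirical error count (sensitivity 1 to a single example) and sample
    with probability proportional to exp(eps * score / 2)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    errors = np.array([sum(h(x) != yi for x, yi in zip(X, y))
                       for h in hypotheses], dtype=float)
    logits = -eps * errors / 2.0
    probs = np.exp(logits - logits.max())  # stable softmax over candidates
    probs /= probs.sum()
    return hypotheses[rng.choice(len(hypotheses), p=probs)]

# Illustrative candidates and data.
X = [0.1, 0.2, 0.8, 0.9]
y = [-1, -1, 1, 1]
H_cand = [lambda x: 1 if x >= 0.5 else -1,  # zero empirical error
          lambda x: 1]                       # errs on both negatives
h_priv = exp_mechanism_select(H_cand, X, y, eps=8.0)
```

With larger $\varepsilon$ the low-error candidate is selected with overwhelming probability; privacy of the full pipeline additionally requires that the candidates themselves be produced privately, which this sketch does not model.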

In higher-order logic, syntactic effectful realizability (Cohen et al., 11 Jun 2025) uses monadic translations, so extracted computational witnesses are “agnostic” to the particular effect, e.g., state, control, memoization, or exceptions. The semantic connection to evidenced frames and triposes bridges syntactic transformations and realizability toposes, unifying pure and effectful paradigms.

For adversarial robustness, disagreement-based learners and batch updating strategies permit nearly zero misclassification with only poly-logarithmic abstention error under clean-label adversarial settings, extending realizable to agnostic analyses (Heinzler, 17 Apr 2025).

7. Implications and Directions

The realizable-to-agnostic transformation has profound implications:

  • It identifies a computational-statistical trade-off: statistical complexity (Rademacher/covering/bracketing) determines lower bounds, while refutation complexity or boosting analysis identifies efficient reduction avenues or obstacles.
  • Property generalization (Hopkins et al., 2021): Any property satisfied by finite-class realizable learners (stability, privacy, fairness, robustness) can extend, via reduction, to agnostic settings.
  • Practical algorithms: Sample reuse and potential-based boosting enable sample-optimal and computationally feasible learners in noisy, adversarial, group-heterogeneous, or privacy-sensitive environments.
  • Research directions: Open questions include the full interpolation between realizable and agnostic bounds in regimes with small excess error, optimizing constants in error rates, further privacy cost reduction, and semantic extensions to learning algorithms with non-standard computational effects.

Theoretical advances across these dimensions have clarified the mechanisms by which strong realizability guarantees can be adapted to more fraught, realistic scenarios, providing unified design, analysis, and lower-bound frameworks for contemporary learning theory.
