Omnipredictors for Loss-Oblivious Learning

Updated 4 July 2026

Omnipredictors are predictors designed for loss-oblivious learning, decoupling model training from loss-specific post-processing and enabling multiple downstream tasks.
They leverage techniques like multicalibration, outcome indistinguishability, and calibrated multiaccuracy to rigorously match the performance of the best hypothesis for diverse loss functions.
Their framework extends to binary classification, regression, multiclass prediction, fairness-constrained optimization, and online settings with strong sample and regret guarantees.

Omnipredictors are predictors designed for loss-oblivious learning: rather than training a separate model for each downstream objective, one learns a single predictor whose output can later be post-processed, by a loss-specific but data-independent rule, to compete with the best hypothesis in a reference class for many losses simultaneously (Gopalan et al., 2021). In the binary setting, this predictor is typically a map $p:X\to[0,1]$, interpreted as an estimate of a conditional probability, and the post-processing for a loss $\ell$ is the Bayes-optimal action under a Bernoulli distribution with mean $p(x)$ (Gopalan et al., 2022). Subsequent work has recast this paradigm through multicalibration, outcome indistinguishability, calibrated multiaccuracy, proper calibration, and step calibration; extended it to fairness-constrained optimization, regression, multiclass prediction, online adversarial learning, performative prediction, and evolving graphs; and sharpened both sample-complexity and runtime guarantees (Gopalan et al., 2022, Hu et al., 2022, Garg et al., 2023, Gopalan et al., 2024, Okoroafor et al., 28 Jan 2025, Hu et al., 19 Feb 2026, Noarov et al., 18 Jun 2026).

1. Definition and basic formalism

In the binary-label formulation, a predictor is a function $f:X\to[0,1]$, interpreted as estimating $\Pr[y=1\mid x]$ (Gopalan et al., 2021). Given a loss family $\mathcal L$ and a hypothesis class $\mathcal C=\{c:X\to\mathbb R\}$, $f$ is an $(\mathcal L,\mathcal C,\delta)$-omnipredictor if for every $\ell\in\mathcal L$ there exists a univariate post-processing function $k_\ell:[0,1]\to\mathbb R$ such that

$\mathbb E_{(x,y)\sim\mathcal D}\big[\ell\big(y,k_\ell(f(x))\big)\big]\le\min_{c\in\mathcal C}\mathbb E_{(x,y)\sim\mathcal D}\big[\ell\big(y,c(x)\big)\big]+\delta.$

The central requirement is that learning takes place without knowing $\ell$; the loss only determines the final transformation $k_\ell$ (Gopalan et al., 2021).
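The inequality in this definition can be checked exactly on a toy problem. The sketch below is illustrative only: the four-point domain, the conditional probabilities in `P_TRUE`, the clipped linear hypotheses, and the helper names are all assumptions made for the example, and the predictor is taken to be the true conditional probability, which is the simplest omnipredictor.

```python
# Toy finite domain: four contexts with made-up values of Pr[y = 1 | x].
P_TRUE = {0: 0.1, 1: 0.4, 2: 0.6, 3: 0.9}


def expected_loss(loss, action_fn):
    """Exact E[loss(y, action_fn(x))] with x uniform over the toy domain."""
    total = 0.0
    for x, p in P_TRUE.items():
        t = action_fn(x)
        total += p * loss(1, t) + (1 - p) * loss(0, t)
    return total / len(P_TRUE)


def omni_gap(loss, post_process, predictor, hypotheses):
    """E[loss(y, k(f(x)))] minus the best expected loss within the class."""
    ours = expected_loss(loss, lambda x: post_process(predictor(x)))
    best = min(expected_loss(loss, c) for c in hypotheses)
    return ours - best


def squared(y, t):
    return (y - t) ** 2


def absolute(y, t):
    return abs(y - t)


def predictor(x):
    """Here f is the true conditional probability (an assumption of the toy)."""
    return P_TRUE[x]


def make_linear(a, b):
    """Hypothetical hypothesis c(x) = a * x + b, clipped to [0, 1]."""
    return lambda x: min(1.0, max(0.0, a * x + b))


HYPOTHESES = [make_linear(a, b) for a in (0.0, 0.2, 0.3) for b in (0.0, 0.1, 0.5)]

# One fixed predictor, two losses, two loss-specific post-processings.
gap_squared = omni_gap(squared, lambda p: p, predictor, HYPOTHESES)
gap_absolute = omni_gap(absolute, lambda p: 1.0 if p >= 0.5 else 0.0,
                        predictor, HYPOTHESES)
print(gap_squared, gap_absolute)
```

Both gaps are non-positive here because the predictor is Bayes-optimal; a learned omnipredictor would instead be expected to satisfy the same inequality only up to the slack $\delta$.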

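A closely related formulation defines, for each loss $\ell$, the Bayes-optimal post-processing

$k_\ell(q)=\arg\min_{t\in\mathbb R}\;\mathbb E_{y\sim\mathrm{Ber}(q)}\big[\ell(y,t)\big].$

Then $p$ is an $(\mathcal L,\mathcal C,\delta)$-omnipredictor if for every $\ell\in\mathcal L$,

$\mathbb E\big[\ell\big(y,k_\ell(p(x))\big)\big]\le\min_{c\in\mathcal C}\mathbb E\big[\ell\big(y,c(x)\big)\big]+\delta.$

This makes explicit that the predictor is fixed once and for all, while the post-processing $k_\ell$ is loss-specific (Gopalan et al., 2022).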

The post-processing step is exactly the one induced by treating the predictor’s output as a probability estimate. For binary labels, the “ideal” loss-optimal post-processing is

$k_\ell^*(q)=\arg\min_{t\in\mathbb R}\;\big[q\,\ell(1,t)+(1-q)\,\ell(0,t)\big].$

Examples given in the literature include squared loss, for which $k_\ell^*(q)=q$, and $\ell_1$ loss, for which $k_\ell^*(q)=\mathbf 1[q\ge 1/2]$ (Gopalan et al., 2021).
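The two examples can be reproduced numerically by minimizing the expected loss under a Bernoulli label over a grid of actions. This is a minimal sketch, assuming actions restricted to $[0,1]$ and a hypothetical grid with step 0.01; the function names are illustrative.

```python
def bayes_post_process(loss, q, actions):
    """Return the action minimizing expected loss when y ~ Bernoulli(q)."""
    def expected(t):
        return q * loss(1, t) + (1 - q) * loss(0, t)
    return min(actions, key=expected)


def squared_loss(y, t):
    return (y - t) ** 2


def l1_loss(y, t):
    return abs(y - t)


# Hypothetical action grid on [0, 1] with step 0.01.
ACTIONS = [i / 100 for i in range(101)]

k_squared = bayes_post_process(squared_loss, 0.3, ACTIONS)   # the mean itself
k_l1_low = bayes_post_process(l1_loss, 0.3, ACTIONS)         # below 1/2 -> 0
k_l1_high = bayes_post_process(l1_loss, 0.8, ACTIONS)        # above 1/2 -> 1
print(k_squared, k_l1_low, k_l1_high)
```

For squared loss the minimizer is the predicted probability, while for $\ell_1$ loss the expected loss is linear in the action, so the minimizer jumps between the endpoints at $q=1/2$.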

The same perspective extends beyond binary outcomes. In the more general setting where labels lie in $p(x)$ 5, the predictor outputs a distribution $p(x)$ 6, and the post-processing becomes $p(x)$ 7 (Gopalan et al., 2021). For regression with continuous labels $p(x)$ 8, later work formulates omnipredictors as predictors $p(x)$ 9 together with post-processings $f:X\to[0,1]$ 0 such that, for every loss $f:X\to[0,1]$ 1,

$f:X\to[0,1]$ 2

(Gopalan et al., 2024).

2. Multicalibration, outcome indistinguishability, and structural characterizations

The original structural route to omniprediction is through multicalibration. In the binary case, if a partition $f:X\to[0,1]$ 3 is $f:X\to[0,1]$ 4-approximately multicalibrated for $f:X\to[0,1]$ 5, then the associated canonical predictor $f:X\to[0,1]$ 6 is an $f:X\to[0,1]$ 7-omnipredictor for the family $f:X\to[0,1]$ 8 of $f:X\to[0,1]$ 9-nice convex losses (Gopalan et al., 2021). The covariance-based multicalibration condition used there is

$\Pr[y=1\mid x]$ 0

That formulation was chosen because it extends naturally to real-valued $\Pr[y=1\mid x]$ 1, multi-class labels, and real-valued outcomes, and it is closed under linear combinations (Gopalan et al., 2021).
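A covariance-based condition of this kind can be evaluated directly on samples. The sketch below assumes the quantity being bounded is the cell-mass-weighted average of $|\mathrm{Cov}[c(x),y]|$ within each cell of the partition; that reading, the tiny dataset, and the function names are assumptions for illustration rather than the cited paper's exact statement.

```python
def covariance(pairs):
    """Empirical covariance of a list of (u, v) pairs."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in pairs) / n


def multicalibration_error(cells, c):
    """Cell-mass-weighted average of |Cov[c(x), y]| within each cell."""
    total = sum(len(cell) for cell in cells)
    return sum(
        len(cell) / total * abs(covariance([(c(x), y) for x, y in cell]))
        for cell in cells
    )


def identity(x):
    return float(x)


# Made-up samples (x, y). In the fine partition, y is uncorrelated with x
# inside each cell; merging the cells reintroduces correlation.
fine = [[(0, 0), (0, 1), (1, 0), (1, 1)], [(2, 1), (3, 1)]]
coarse = [fine[0] + fine[1]]

fine_error = multicalibration_error(fine, identity)
coarse_error = multicalibration_error(coarse, identity)
print(fine_error, coarse_error)
```

The fine partition has zero error for this hypothesis, whereas the single-cell partition leaves a covariance of 2/9, which illustrates why refining a partition is the basic move in multicalibration algorithms.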

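A later conceptual reformulation introduces Loss Outcome Indistinguishability. For each $\ell\in\mathcal L$ and $c\in\mathcal C$, define

$u_{\ell,c}(x,y)=\ell\big(y,k_\ell(p(x))\big)-\ell\big(y,c(x)\big).$

A predictor $p$ is $(\mathcal L,\mathcal C,\varepsilon)$-loss OI if

$\Big|\mathbb E_{(x,y)\sim\mathcal D}\big[u_{\ell,c}(x,y)\big]-\mathbb E_{(x,\tilde y)\sim\mathcal D(p)}\big[u_{\ell,c}(x,\tilde y)\big]\Big|\le\varepsilon$

for all $\ell\in\mathcal L$ and $c\in\mathcal C$, where $\mathcal D(p)$ is the simulated world obtained by sampling $x\sim\mathcal D_X$ and then $\tilde y\sim\mathrm{Ber}(p(x))$ (Gopalan et al., 2022). Because $k_\ell$ is Bayes-optimal in the simulated world, the simulated expectation of $u_{\ell,c}$ is non-positive, so by construction,

$(\mathcal L,\mathcal C,\varepsilon)\text{-loss OI}\;\Longrightarrow\;(\mathcal L,\mathcal C,\varepsilon)\text{-omniprediction},$

while the converse does not hold (Gopalan et al., 2022).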

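Loss OI decomposes into hypothesis OI and decision OI. The paper introduces the discrete derivative

$\partial\ell(t)=\ell(1,t)-\ell(0,t)$

together with the identity

$\ell(y,t)=\ell(0,t)+y\,\partial\ell(t),\qquad y\in\{0,1\}.$

This converts indistinguishability statements into correlation conditions. Hypothesis OI becomes multiaccuracy for the derived class

$\partial\mathcal L\circ\mathcal C=\{\partial\ell\circ c:\ell\in\mathcal L,\ c\in\mathcal C\}$

and decision OI becomes a calibration condition for the class $\partial\mathcal L\circ k_{\mathcal L}=\{\partial\ell\circ k_\ell:\ell\in\mathcal L\}$ (Gopalan et al., 2022). The generic recipe is therefore

$(\partial\mathcal L\circ\mathcal C)\text{-multiaccuracy}\;+\;(\partial\mathcal L\circ k_{\mathcal L})\text{-calibration}\;\Longrightarrow\;\text{loss OI}\;\Longrightarrow\;\text{omniprediction}.$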
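The discrete-derivative identity is easy to verify numerically. The sketch below assumes the discrete derivative has the form $\partial\ell(t)=\ell(1,t)-\ell(0,t)$ and checks, for two illustrative losses, that $\ell(y,t)=\ell(0,t)+y\,\partial\ell(t)$ for binary $y$, and that the expected loss under a Bernoulli label is therefore linear in the label mean. All function names are hypothetical.

```python
def discrete_derivative(loss):
    """Return t -> loss(1, t) - loss(0, t) (assumed form of the derivative)."""
    return lambda t: loss(1, t) - loss(0, t)


def squared(y, t):
    return (y - t) ** 2


def absolute(y, t):
    return abs(y - t)


def identity_error(loss, ts):
    """Largest violation of loss(y, t) = loss(0, t) + y * d(t) over y in {0, 1}."""
    d = discrete_derivative(loss)
    return max(
        abs(loss(y, t) - (loss(0, t) + y * d(t))) for y in (0, 1) for t in ts
    )


def bernoulli_expected_loss(loss, q, t):
    """E[loss(y, t)] for y ~ Bernoulli(q)."""
    return q * loss(1, t) + (1 - q) * loss(0, t)


TS = [i / 10 for i in range(-10, 21)]
err_squared = identity_error(squared, TS)
err_absolute = identity_error(absolute, TS)

# Linearity in the label mean: E[loss(y, t)] = loss(0, t) + q * d(t).
q, t = 0.3, 0.7
lhs = bernoulli_expected_loss(squared, q, t)
rhs = squared(0, t) + q * discrete_derivative(squared)(t)
print(err_squared, err_absolute, lhs, rhs)
```

Because the label enters only through the product $y\,\partial\ell(t)$, comparing expected losses between the real and simulated worlds reduces to comparing correlations of $y$ with functions of $x$, which is the form multiaccuracy and calibration conditions take.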

This perspective led to calibrated multiaccuracy, defined as calibration plus multiaccuracy. It is positioned between multiaccuracy and multicalibration:

$\text{multicalibration}\;\Longrightarrow\;\text{calibrated multiaccuracy}\;\Longrightarrow\;\text{multiaccuracy}.$

For generalized linear model losses of the form

$\ell_g(y,t)=g(t)-yt,$

one has

$\partial\ell_g(t)=\ell_g(1,t)-\ell_g(0,t)=-t,$

so the derived class collapses to $\mathcal C$ (up to sign). Hence, for GLM losses, calibration plus $\mathcal C$-multiaccuracy is enough for Loss OI and therefore omniprediction (Gopalan et al., 2022).
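The collapse for GLM losses can be confirmed with a short numerical check. The sketch assumes the standard GLM form $\ell_g(y,t)=g(t)-yt$ and the discrete derivative $\ell(1,t)-\ell(0,t)$; the two choices of $g$ (softplus, giving the logistic loss, and $t^2/2$) and the function names are illustrative.

```python
import math


def glm_loss(g):
    """Build the loss (y, t) -> g(t) - y * t for a given function g."""
    return lambda y, t: g(t) - y * t


def softplus(t):
    return math.log1p(math.exp(t))


def half_square(t):
    return 0.5 * t * t


def max_deviation_from_minus_t(loss, ts):
    """Largest |(loss(1, t) - loss(0, t)) + t|; near zero when the derivative is -t."""
    return max(abs((loss(1, t) - loss(0, t)) + t) for t in ts)


TS = [i / 4 for i in range(-20, 21)]
dev_logistic = max_deviation_from_minus_t(glm_loss(softplus), TS)
dev_quadratic = max_deviation_from_minus_t(glm_loss(half_square), TS)
print(dev_logistic, dev_quadratic)
```

Since the derivative does not depend on $g$, composing it with any hypothesis $c$ returns $-c$, so no class richer than $\mathcal C$ itself has to be audited for multiaccuracy.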

A different but related structural simplification appears in later work on proper calibration. There, Decision OI is identified with a weighted calibration error

$\mathcal C=\{c:X\to\mathbb R\}$ 4

and proper calibration is defined by

$\mathcal C=\{c:X\to\mathbb R\}$ 5

For the class $\mathcal C=\{c:X\to\mathbb R\}$ 6, proper calibration is equivalent, up to constant factors, to threshold-weighted calibration $\mathcal C=\{c:X\to\mathbb R\}$ 7-calibration (Okoroafor et al., 28 Jan 2025). More recent work further reduces omniprediction and panprediction to step calibration, which controls correlations on predictor sublevel sets and hypothesis sublevel sets inside groups (Balakrishnan et al., 31 Oct 2025).

3. Algorithmic learning and complexity

The first end-to-end omnipredictor construction in the batch setting is based on approximately multicalibrated partitions together with weak agnostic learning (Gopalan et al., 2021). In the multi-class case, the paper gives iterative Split and Merge procedures, and states an explicit partition-size bound

$\mathcal C=\{c:X\to\mathbb R\}$ 8

The resulting runtime is polynomial in $\mathcal C=\{c:X\to\mathbb R\}$ 9 and in the weak learner’s parameters (Gopalan et al., 2021).

The loss-OI formulation yields a simpler calibrated-multiaccuracy algorithm. The main idea is to alternate between a multiaccuracy update using a weak learner for $f$ 0 and a recalibration step, both of which reduce the potential

$f$ 1

The number of weak-learner calls is

$f$ 2

which is comparable to standard multiaccuracy and significantly better than the multicalibration bound quoted there as roughly

$f$ 3

(Gopalan et al., 2022).
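The alternation between a multiaccuracy update and a recalibration step can be sketched on synthetic data. This is not the cited paper's procedure: exhaustive search over a two-element class stands in for the weak learner, and the step size, bucket count, tolerance `alpha`, round cap, and data-generating process are all assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def squared_loss(p, ys):
    return mean([(y - q) ** 2 for q, y in zip(p, ys)])


def multiaccuracy_violation(p, xs, ys, hypotheses):
    """Return (c, corr) maximizing |mean(c(x) * (y - p(x)))| over the class."""
    best = None
    for c in hypotheses:
        corr = mean([c(x) * (y - q) for x, y, q in zip(xs, ys, p)])
        if best is None or abs(corr) > abs(best[1]):
            best = (c, corr)
    return best


def bucket_of(q, num_buckets):
    return min(num_buckets - 1, int(q * num_buckets))


def calibration_error(p, ys, num_buckets):
    """Sum over prediction buckets of |sum(y - p)|, normalized by sample size."""
    sums = [0.0] * num_buckets
    for q, y in zip(p, ys):
        sums[bucket_of(q, num_buckets)] += y - q
    return sum(abs(s) for s in sums) / len(ys)


def recalibrate(p, ys, num_buckets):
    """Replace each prediction by the mean label of its bucket."""
    totals = [0.0] * num_buckets
    counts = [0] * num_buckets
    for q, y in zip(p, ys):
        b = bucket_of(q, num_buckets)
        totals[b] += y
        counts[b] += 1
    return [totals[bucket_of(q, num_buckets)] / counts[bucket_of(q, num_buckets)]
            for q in p]


def calibrated_multiaccuracy(xs, ys, hypotheses, alpha=0.03,
                             num_buckets=10, max_rounds=50):
    p = [0.5] * len(xs)
    for _ in range(max_rounds):
        c, corr = multiaccuracy_violation(p, xs, ys, hypotheses)
        ma_ok = abs(corr) <= alpha
        if not ma_ok:
            # Step that minimizes squared loss along direction c, then clip.
            step = corr / mean([c(x) ** 2 for x in xs])
            p = [min(1.0, max(0.0, q + step * c(x))) for q, x in zip(p, xs)]
        cal_ok = calibration_error(p, ys, num_buckets) <= alpha
        if not cal_ok:
            p = recalibrate(p, ys, num_buckets)
        if ma_ok and cal_ok:
            break
    return p


def constant(x):
    return 1.0


def linear(x):
    return x


rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [1 if rng.random() < 0.05 + 0.9 * x else 0 for x in xs]

p_final = calibrated_multiaccuracy(xs, ys, [constant, linear])
initial_loss = squared_loss([0.5] * len(xs), ys)
final_loss = squared_loss(p_final, ys)
print(initial_loss, final_loss)
```

Squared loss plays the role of the progress measure in this sketch: the multiaccuracy step strictly lowers it, while bucket-based recalibration can raise it slightly through discretization, which is why the loop is capped rather than assumed to be monotone.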

A subsequent line of work shows that the sample and regret cost of omniprediction can be near that of ordinary prediction. In the online setting, one paper gives an oracle-efficient algorithm with $f$ 4 regret for any class of Lipschitz loss functions $f$ 5, and an offline learning algorithm for bounded-variation losses $f$ 6 whose error scales near-linearly in the Rademacher complexity of $f$ 7 (Okoroafor et al., 28 Jan 2025). In that framework, finite $f$ 8 and finite $f$ 9 yield an online regret bound

$(\mathcal L,\mathcal C,\delta)$ 0

and, more generally,

$(\mathcal L,\mathcal C,\delta)$ 1

given an online weak agnostic learner for $(\mathcal L,\mathcal C,\delta)$ 2 (Okoroafor et al., 28 Jan 2025).

The online adversarial setting was also developed through online multicalibration. One construction reduces online multicalibration to online squared-loss regression over $(\mathcal L,\mathcal C,\delta)$ 3, achieves contextual swap regret of roughly

$(\mathcal L,\mathcal C,\delta)$ 4

and thereby derives the first efficient online omnipredictor for Lipschitz convex losses (Garg et al., 2023). For linear predictors, combining the reduction with Azoury–Warmuth regression and choosing $(\mathcal L,\mathcal C,\delta)$ 5 gives

$(\mathcal L,\mathcal C,\delta)$ 6

(Garg et al., 2023).

The currently strongest online rates separate oracle efficiency from information-theoretic optimality. One paper proves an oracle-efficient online learning algorithm with $(\mathcal L,\mathcal C,\delta)$ 7 regret for omniprediction (Okoroafor et al., 28 Jan 2025), while another earlier paper showed that an oracle-efficient multicalibration route naturally yields swap-omniprediction, for which $(\mathcal L,\mathcal C,\delta)$ 8 regret is impossible online, and established a lower bound of $(\mathcal L,\mathcal C,\delta)$ 9 via online calibration (Garg et al., 2023). That same paper also presents a non-oracle-efficient algorithm achieving the optimal $\ell\in\mathcal L$ 0 omniprediction rate for suitable finite Boolean classes $\ell\in\mathcal L$ 1, yielding an information-theoretic separation between omniprediction and multicalibration-based solution concepts (Garg et al., 2023).

A recent batch result sharpens deterministic sample complexity. It gives a deterministic multicalibration algorithm with sample complexity

$\ell\in\mathcal L$ 2

and a deterministic finite-test OI algorithm with sample complexity

$\ell\in\mathcal L$ 3

then derives deterministic omnipredictors and panpredictors with optimal sample complexity (Noarov et al., 18 Jun 2026). The paper explicitly states that randomization is not statistically necessary for optimal offline multicalibration or omniprediction (Noarov et al., 18 Jun 2026).

4. Fairness, constraints, and group-conditional guarantees

A major theme in the literature is that omnipredictors are not limited to unconstrained ERM. In multi-group loss minimization, if a partition is approximately multicalibrated for a product class $\ell\in\mathcal L$ 4, then for each subgroup $\ell\in\mathcal L$ 5, the canonical predictor remains an omnipredictor for $\ell\in\mathcal L$ 6 on the subpopulation $\ell\in\mathcal L$ 7 (Gopalan et al., 2021). The induced partition on $\ell\in\mathcal L$ 8 retains approximate multicalibration with controlled degradation, and consequently each group can be treated optimally, for any loss in the family, without retraining (Gopalan et al., 2021).

This multi-group perspective was extended to fairness-constrained optimization. One formulation studies false-positive fairness constraints

$\ell\in\mathcal L$ 9

with

$\ell$ 00

Starting from a multicalibrated regression function $\ell$ 01, the paper shows that a no-regret primal-dual post-processing, using only unlabeled data and $\ell$ 02, yields a classifier $\ell$ 03 satisfying

$\ell$ 04

and

$\ell$ 05

for every $\ell$ 06 (Globus-Harris et al., 2022). The post-processing threshold depends on the group-membership vector through a linear form in the dual variables, allowing the method to handle intersecting groups without enumerating all intersections (Globus-Harris et al., 2022).

A parallel development defines omnipredictors for constrained optimization more generally. The downstream task is

$\ell$ 07

and the learner is assumed to know in advance the groups $\ell$ 08 that will define future constraints (Hu et al., 2022). The key device is the simulated distribution $\ell$ 09, obtained by sampling $\ell$ 10 from the marginal and then $\ell$ 11. Appropriate variants of group multiaccuracy, group multicalibration, group calibration, and group level-set multiaccuracy imply that solving the constrained task on $\ell$ 12 transfers back to a near-optimal solution on the true distribution $\ell$ 13 (Hu et al., 2022).

For convex and “special” group objectives and constraints, a predictor satisfying group multiaccuracy and group calibration is enough (Hu et al., 2022). The “special” condition is

$\ell$ 14

which covers $\ell$ 15 loss, squared loss, generalized linear model losses after scaling, and linear constraints including statistical parity, equal opportunity, and equalized odds (Hu et al., 2022). For arbitrary group objectives and constraints with bounded differences, stronger group level-set multiaccuracy is required (Hu et al., 2022).

A plausible implication is that omniprediction functions as a reusable representation not only for uncertain losses but also for uncertain regulatory or fairness regimes, provided the relevant subgroup structure is specified in advance.

5. Extensions beyond the original binary batch setting

The regression extension reframes omniprediction through sufficient statistics for loss minimization. A family $\ell$ 16, with $\ell$ 17, gives $\ell$ 18-uniform approximations to $\ell$ 19 if for every $\ell$ 20 there are coefficient functions $\ell$ 21 such that

$\ell$ 22

If a predictor is multiaccurate on the corresponding statistic class and calibrated, then it is an $\ell$ 23-omnipredictor with

$\ell$ 24

(Gopalan et al., 2024). The same paper proves that the $\ell$ 25-approximate dimension of bounded, convex, $\ell$ 26-Lipschitz functions on $\ell$ 27 satisfies

$\ell$ 28

which leads to faster omnipredictor learning for convex Lipschitz losses than a naive discretization-based approach (Gopalan et al., 2024).

For single-index models, omniprediction has become a route to agnostic learning with unknown activations. One paper uses calibrated multiaccuracy to construct omnipredictors that minimize matching losses, then converts matching-loss control into squared-error guarantees for SIMs and GLMs through Bregman-divergence inequalities (Gollakota et al., 2023). A later paper gives a simpler Isotron-based omnipredictor construction for SIMs, defines the omnigap

$\ell$ 29

and proves that $\ell$ 30 implies $\ell$ 31 is an $\ell$ 32-omnipredictor (Hu et al., 2024). Its finite-sample algorithm Omnitron requires

$\ell$ 33

samples for Lipschitz links, with improvement to roughly $\ell$ 34 in the bi-Lipschitz case, and outputs a structured multi-index model with $\ell$ 35 heads (Hu et al., 2024).

The multiclass setting requires substantially different machinery. Recent work defines a multiclass omnipredictor $\ell$ 36 through the ex ante optimal action map

$\ell$ 37

and proves omniprediction with horizon or sample complexity approximately $\ell$ 38 for $\ell$ 39-class problems (Hu et al., 19 Feb 2026). Its main technical contribution is a framework for simultaneous Blackwell approachability, developed to combine calibration and multiaccuracy constraints in the multiclass simplex (Hu et al., 19 Feb 2026).

The outcome-indistinguishability perspective has also been transferred to performative prediction, where deployed actions affect the distribution of outcomes. Under outcome performativity, Nature is modeled by

$\ell$ 40

A performative omnipredictor is a predictor $\ell$ 41 such that, for each loss $\ell$ 42, the induced decision rule

$\ell$ 43

is nearly optimal under the true performative environment (Kim et al., 2022). The paper proves that performative POI plus performative DOI imply performative omniprediction (Kim et al., 2022).

In evolving graphs, an online kernel-based framework based on a modified $\ell$ 44 algorithm yields outcome indistinguishability and omniprediction guarantees for rich, possibly infinite distinguisher classes over node pairs and neighborhoods (Dwork et al., 2024). The resulting graph predictors support multicalibration across pairs of demographic groups, tests based on embeddedness

$\ell$ 45

and competition against finite sets of bounded graph predictors, including graph neural networks (Dwork et al., 2024).

6. Statistical optimality, proper losses, and the current frontier

A recent development argues that multicalibration is stronger than necessary for omniprediction. For binary proper losses

$\ell$ 46

one paper proves lower bounds showing that calibrated multiaccuracy and multicalibration are strictly harder objectives than omniprediction (Gibbs et al., 14 Oct 2025). In particular, for $\ell$ 47 and $\ell$ 48,

$\ell$ 49

whereas omniprediction over proper losses can achieve the standard VC rate

$\ell$ 50

(Gibbs et al., 14 Oct 2025).

The same work exploits the fact that every left-continuous proper loss decomposes as a mixture of weighted $\ell$ 51 losses,

$\ell$ 52

and establishes both a sample-efficient randomized online-to-batch algorithm and a direct deterministic algorithm, each attaining

$\ell$ 53

for omniprediction over left-continuous proper losses (Gibbs et al., 14 Oct 2025). This suggests that, at least for proper losses, omniprediction can match ordinary statistical learning rates without the calibration overhead inherited from earlier constructions.

The same theme appears in a broader generalization called panprediction, which unifies omniprediction and multi-group learning. A predictor $\ell$ 54 is a $\ell$ 55-panpredictor if, for every loss $\ell$ 56 and every group $\ell$ 57,

$\ell$ 58

Recent work gives deterministic and randomized step-calibration algorithms with sample complexities $\ell$ 59 and $\ell$ 60, respectively, and shows that under bounded-variation losses, many-loss many-group prediction can be statistically as easy as ordinary learning (Balakrishnan et al., 31 Oct 2025).

The most recent deterministic results close a longstanding gap. Optimal deterministic multicalibration and finite-test outcome indistinguishability imply optimal deterministic omnipredictors and panpredictors, resolving open problems about whether randomized output predictors are necessary (Noarov et al., 18 Jun 2026). In this sense, the present frontier is less about existence than about the strongest possible combination of generality, oracle efficiency, and structural simplicity.

A common misconception is that omnipredictors are merely calibrated probability estimators. The literature instead treats calibration, multicalibration, proper calibration, calibrated multiaccuracy, step calibration, and various forms of outcome indistinguishability as solution concepts or sufficient conditions for a stronger objective: a single learned representation that supports many downstream decision rules through explicit Bayes-act post-processing (Gopalan et al., 2022, Okoroafor et al., 28 Jan 2025, Balakrishnan et al., 31 Oct 2025). Another misconception is that omniprediction is intrinsically tied to convex losses. Foundational results focused on nice convex losses (Gopalan et al., 2021), but later work covers non-convex losses through Loss OI (Gopalan et al., 2022), proper losses via threshold decompositions (Gibbs et al., 14 Oct 2025), bounded-variation losses (Okoroafor et al., 28 Jan 2025), and constrained or group-conditional objectives (Hu et al., 2022, Balakrishnan et al., 31 Oct 2025).

Taken together, these developments position omnipredictors as a unifying object at the intersection of calibration, fairness, robust post-processing, online learning, and statistical decision theory: a single predictor intended to preserve enough predictive information that many loss-minimization problems can be solved after training, rather than during it.