Omnipredictors for Loss-Oblivious Learning

Updated 4 July 2026
  • Omnipredictors are predictors designed for loss-oblivious learning, decoupling model training from loss-specific post-processing and enabling multiple downstream tasks.
  • They leverage techniques like multicalibration, outcome indistinguishability, and calibrated multiaccuracy to rigorously match the performance of the best hypothesis for diverse loss functions.
  • Their framework extends to binary classification, regression, multiclass prediction, fairness-constrained optimization, and online settings with strong sample and regret guarantees.

Omnipredictors are predictors designed for loss-oblivious learning: rather than training a separate model for each downstream objective, one learns a single predictor whose output can later be post-processed, by a loss-specific but data-independent rule, to compete with the best hypothesis in a reference class for many losses simultaneously (Gopalan et al., 2021). In the binary setting, this predictor is typically a map $p:X\to[0,1]$, interpreted as an estimate of a conditional probability, and the post-processing for a loss $\ell$ is the Bayes-optimal action under a Bernoulli distribution with mean $p(x)$ (Gopalan et al., 2022). Subsequent work has recast this paradigm through multicalibration, outcome indistinguishability, calibrated multiaccuracy, proper calibration, and step calibration; extended it to fairness-constrained optimization, regression, multiclass prediction, online adversarial learning, performative prediction, and evolving graphs; and sharpened both sample-complexity and runtime guarantees (Gopalan et al., 2022, Hu et al., 2022, Garg et al., 2023, Gopalan et al., 2024, Okoroafor et al., 28 Jan 2025, Hu et al., 19 Feb 2026, Noarov et al., 18 Jun 2026).

1. Definition and basic formalism

In the binary-label formulation, a predictor is a function $f:X\to[0,1]$, interpreted as estimating $\Pr[y=1\mid x]$ (Gopalan et al., 2021). Given a loss family $\mathcal L$ and a hypothesis class $\mathcal C=\{c:X\to\mathbb R\}$, $f$ is an $(\mathcal L,\mathcal C,\delta)$-omnipredictor if for every $\ell\in\mathcal L$ there exists a univariate post-processing function $k_\ell$ such that

$$\mathbb E\big[\ell\big(y,k_\ell(f(x))\big)\big]\le\min_{c\in\mathcal C}\mathbb E\big[\ell\big(y,c(x)\big)\big]+\delta,$$

where both expectations are over the data distribution. The central requirement is that learning takes place without knowing $\ell$; the loss only determines the final transformation $k_\ell$ (Gopalan et al., 2021).

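A closely related formulation defines, for each loss $\ell$,

$$k_\ell(q)=\arg\min_{t}\ \mathbb E_{y\sim\mathrm{Ber}(q)}\big[\ell(y,t)\big],\qquad q\in[0,1].$$

Then a predictor $p$ is an $(\mathcal L,\mathcal C,\varepsilon)$-omnipredictor if for every $\ell\in\mathcal L$,

$$\mathbb E\big[\ell\big(y,k_\ell(p(x))\big)\big]\le\min_{c\in\mathcal C}\mathbb E\big[\ell\big(y,c(x)\big)\big]+\varepsilon.$$

This makes explicit that the predictor is fixed once and for all, while the post-processing $k_\ell$ is loss-specific (Gopalan et al., 2022).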

The post-processing step is exactly the one induced by treating the predictor’s output as a probability estimate. For binary labels, the “ideal” loss-optimal post-processing is

$$k_\ell^*(q)=\arg\min_{t}\ \big[q\,\ell(1,t)+(1-q)\,\ell(0,t)\big],\qquad q\in[0,1].$$

Examples given in the literature include squared loss, for which $k_\ell^*(q)=q$, and $\ell_1$ loss, for which $k_\ell^*(q)=\mathbb 1[q\ge 1/2]$ (Gopalan et al., 2021).
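The Bayes-act post-processing described above can be checked numerically. The following is a minimal sketch, assuming actions are restricted to a grid on $[0,1]$; the function names and the grid search are illustrative choices, not taken from the cited papers.

```python
def bayes_action(loss, q, grid_size=1001):
    """Return the grid action t in [0, 1] minimizing q*loss(1, t) + (1 - q)*loss(0, t).

    Assumption: the action space is [0, 1], discretized into `grid_size` points.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda t: q * loss(1, t) + (1 - q) * loss(0, t))


def squared_loss(y, t):
    return (y - t) ** 2


def l1_loss(y, t):
    return abs(y - t)


prediction = 0.7  # a single predicted probability p(x)
squared_action = bayes_action(squared_loss, prediction)  # stays at the prediction
l1_action_high = bayes_action(l1_loss, prediction)       # thresholds up to 1
l1_action_low = bayes_action(l1_loss, 0.3)               # thresholds down to 0
print(squared_action, l1_action_high, l1_action_low)
```

The same prediction is reused for both losses; only the final transformation changes, which is the point of the omniprediction interface.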

The same perspective extends beyond binary outcomes. In the more general setting where labels lie in a set $\mathcal Y$, the predictor outputs a distribution $P$ over $\mathcal Y$, and the post-processing becomes the action minimizing expected loss under that distribution, $k_\ell(P)=\arg\min_t\mathbb E_{y\sim P}[\ell(y,t)]$ (Gopalan et al., 2021). For regression with continuous labels $y\in[0,1]$, later work formulates omnipredictors as predictors $p$ together with post-processings $k_\ell$ such that, for every loss $\ell\in\mathcal L$,

$$\mathbb E\big[\ell\big(y,k_\ell(p(x))\big)\big]\le\min_{c\in\mathcal C}\mathbb E\big[\ell\big(y,c(x)\big)\big]+\varepsilon$$

(Gopalan et al., 2024).

2. Multicalibration, outcome indistinguishability, and structural characterizations

The original structural route to omniprediction is through multicalibration. In the binary case, if a partition $\mathcal S=\{S_1,\dots,S_m\}$ of $X$ is $\alpha$-approximately multicalibrated for $\mathcal C$, then the associated canonical predictor $f^{\mathcal S}$, which predicts $\mathbb E[y\mid x\in S_i]$ on each cell $S_i$, is an omnipredictor for $\mathcal C$ with respect to the family of nice convex losses, with an error that depends on $\alpha$ (Gopalan et al., 2021). The covariance-based multicalibration condition used there is

$$\sum_{i=1}^{m}\Pr[x\in S_i]\cdot\big|\mathrm{Cov}\big[c(x),y\mid x\in S_i\big]\big|\le\alpha\qquad\text{for every }c\in\mathcal C.$$

That formulation was chosen because it extends naturally to real-valued $c$, multi-class labels, and real-valued outcomes, and it is closed under linear combinations (Gopalan et al., 2021).
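A covariance-based multicalibration check of this kind can be sketched on a finite sample. The code below assumes the condition is the probability-weighted average, over partition cells, of the absolute covariance between a hypothesis and the label; `cell_of`, `hypotheses`, and the toy data are hypothetical names introduced for illustration.

```python
def covariance(us, vs):
    """Empirical covariance of two equal-length sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n


def multicalibration_error(xs, ys, cell_of, hypotheses):
    """Max over hypotheses of sum_i Pr[S_i] * |Cov(c(x), y | x in S_i)|."""
    cells = {}
    for x, y in zip(xs, ys):
        cells.setdefault(cell_of(x), []).append((x, y))
    n = len(xs)
    worst = 0.0
    for c in hypotheses:
        total = 0.0
        for members in cells.values():
            values = [c(x) for x, _ in members]
            labels = [y for _, y in members]
            total += len(members) / n * abs(covariance(values, labels))
        worst = max(worst, total)
    return worst


# Toy data: the label equals the parity of x, and one hypothesis detects parity.
xs = [0, 1, 2, 3]
ys = [0, 1, 0, 1]
hypotheses = [lambda x: x % 2]

trivial_error = multicalibration_error(xs, ys, lambda x: 0, hypotheses)
parity_error = multicalibration_error(xs, ys, lambda x: x % 2, hypotheses)
print(trivial_error, parity_error)
```

With a single cell the hypothesis still correlates with the label inside the cell, so the error is large; refining the partition by parity removes all within-cell covariance.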

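A later conceptual reformulation introduces Loss Outcome Indistinguishability. For each $\ell\in\mathcal L$ and $c\in\mathcal C$, define

$$u_{\ell,c}(x,y)=\ell\big(y,k_\ell(\tilde p(x))\big)-\ell\big(y,c(x)\big),$$

where $k_\ell$ is the Bayes-optimal post-processing for $\ell$. A predictor $\tilde p$ is $(\mathcal L,\mathcal C,\varepsilon)$-loss OI if

$$\Big|\mathbb E_{(x,y^*)\sim\mathcal D^*}\big[u_{\ell,c}(x,y^*)\big]-\mathbb E_{(x,\tilde y)\sim\mathcal D(\tilde p)}\big[u_{\ell,c}(x,\tilde y)\big]\Big|\le\varepsilon$$

for all $\ell\in\mathcal L$ and $c\in\mathcal C$, where $\mathcal D(\tilde p)$ is the simulated world obtained by sampling $x$ from the marginal of the true distribution $\mathcal D^*$ and then $\tilde y\sim\mathrm{Ber}(\tilde p(x))$ (Gopalan et al., 2022). By construction,

$$(\mathcal L,\mathcal C,\varepsilon)\text{-loss OI}\ \Longrightarrow\ (\mathcal L,\mathcal C,\varepsilon)\text{-omniprediction},$$

because in the simulated world the post-processed predictor is Bayes-optimal and so cannot lose to any $c$. The converse does not hold (Gopalan et al., 2022).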
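The comparison between the real and simulated worlds can be made concrete on a finite sample. The sketch below assumes the distinguisher is the loss difference between the post-processed predictor and a hypothesis, and computes the simulated-world expectation exactly rather than by sampling; all names and the toy data are illustrative assumptions.

```python
def loss_oi_gap(xs, ys, predictor, hypothesis, loss, post_process):
    """|E_real[u] - E_simulated[u]| for u(x, y) = loss(y, k(p(x))) - loss(y, c(x)).

    The simulated label is Bernoulli(p(x)), so its expectation is taken in closed form.
    """
    n = len(xs)

    def u(x, y):
        return loss(y, post_process(predictor(x))) - loss(y, hypothesis(x))

    real = sum(u(x, y) for x, y in zip(xs, ys)) / n
    simulated = sum(
        predictor(x) * u(x, 1) + (1 - predictor(x)) * u(x, 0) for x in xs
    ) / n
    return abs(real - simulated)


def squared_loss(y, t):
    return (y - t) ** 2


# Toy data: Pr[y = 1 | x = 0] = 0.25 and Pr[y = 1 | x = 1] = 0.75.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 1, 1, 1, 0]
hypothesis = lambda x: float(x)
identity = lambda q: q  # Bayes-optimal post-processing for squared loss

honest_gap = loss_oi_gap(xs, ys, lambda x: 0.25 + 0.5 * x, hypothesis, squared_loss, identity)
constant_gap = loss_oi_gap(xs, ys, lambda x: 0.5, hypothesis, squared_loss, identity)
print(honest_gap, constant_gap)
```

The constant predictor is calibrated on average in this sample, yet the hypothesis `c(x) = x` still tells the two worlds apart, which is why a multiaccuracy-type condition is needed in addition to calibration.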

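Loss OI decomposes into hypothesis OI and decision OI. The paper introduces the discrete derivative

$$\partial\ell(t)=\ell(1,t)-\ell(0,t),$$

together with the identity

$$\ell(y,t)=\ell(0,t)+y\,\partial\ell(t)\qquad\text{for }y\in\{0,1\},$$

so that for any action rule $t(x)$, $\mathbb E[\ell(y^*,t(x))]-\mathbb E[\ell(\tilde y,t(x))]=\mathbb E[(y^*-\tilde p(x))\,\partial\ell(t(x))]$.

This converts indistinguishability statements into correlation conditions. Hypothesis OI becomes multiaccuracy for the derived class

$$\partial\mathcal L\circ\mathcal C=\{\partial\ell\circ c:\ \ell\in\mathcal L,\ c\in\mathcal C\},$$

and decision OI becomes a calibration condition for the class $\{\partial\ell\circ k_\ell:\ \ell\in\mathcal L\}$ (Gopalan et al., 2022). The generic recipe is therefore

$$(\partial\mathcal L\circ\mathcal C)\text{-multiaccuracy}\ +\ \text{calibration for }\{\partial\ell\circ k_\ell\}\ \Longrightarrow\ \text{loss OI}\ \Longrightarrow\ \text{omniprediction}.$$

This perspective led to calibrated multiaccuracy, defined as calibration plus multiaccuracy. It is positioned between multiaccuracy and multicalibration: multicalibration implies calibrated multiaccuracy, which in turn implies multiaccuracy. For generalized linear model losses of the form

$$\ell_g(y,t)=g(t)-yt,$$

one has

$$\partial\ell_g(t)=\ell_g(1,t)-\ell_g(0,t)=-t,$$

so the derived class collapses to $\mathcal C$ (up to sign). Hence, for GLM losses, calibration plus $\mathcal C$-multiaccuracy is enough for Loss OI and therefore omniprediction (Gopalan et al., 2022).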
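The collapse of the derived class for GLM losses can be verified directly. This is a small sketch under two assumptions: the discrete derivative is the difference between the loss at label 1 and at label 0, and the GLM (matching) loss has the common form $g(t)-yt$, here with the logistic choice $g(t)=\log(1+e^t)$.

```python
import math


def glm_loss(y, t):
    """Logistic matching loss g(t) - y*t with g(t) = log(1 + exp(t)) (assumed form)."""
    return math.log1p(math.exp(t)) - y * t


def discrete_derivative(loss, t):
    """Difference between the loss at label 1 and at label 0 for a fixed action t."""
    return loss(1, t) - loss(0, t)


actions = [-2.0, 0.0, 0.5, 3.0]
derivatives = [discrete_derivative(glm_loss, t) for t in actions]
# The g(t) terms cancel, leaving -t regardless of the choice of g.
print(list(zip(actions, derivatives)))
```

Because the derivative does not depend on $g$, composing it with any hypothesis $c$ just returns $-c$, so multiaccuracy with respect to the hypothesis class itself already covers every GLM loss.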

A different but related structural simplification appears in later work on proper calibration. There, Decision OI is identified with a weighted calibration error

C={c:XR}\mathcal C=\{c:X\to\mathbb R\}4

and proper calibration is defined by

C={c:XR}\mathcal C=\{c:X\to\mathbb R\}5

For the class C={c:XR}\mathcal C=\{c:X\to\mathbb R\}6, proper calibration is equivalent, up to constant factors, to threshold-weighted calibration C={c:XR}\mathcal C=\{c:X\to\mathbb R\}7-calibration (Okoroafor et al., 28 Jan 2025). More recent work further reduces omniprediction and panprediction to step calibration, which controls correlations on predictor sublevel sets and hypothesis sublevel sets inside groups (Balakrishnan et al., 31 Oct 2025).

3. Algorithmic learning and complexity

The first end-to-end omnipredictor construction in the batch setting is based on approximately multicalibrated partitions together with weak agnostic learning (Gopalan et al., 2021). In the multi-class case, the paper gives iterative Split and Merge procedures, and states an explicit partition-size bound

C={c:XR}\mathcal C=\{c:X\to\mathbb R\}8

The resulting runtime is polynomial in C={c:XR}\mathcal C=\{c:X\to\mathbb R\}9 and in the weak learner’s parameters (Gopalan et al., 2021).

The loss-OI formulation yields a simpler calibrated-multiaccuracy algorithm. The main idea is to alternate between a multiaccuracy update using a weak learner for ff0 and a recalibration step, both of which reduce the potential

ff1

The number of weak-learner calls is

ff2

which is comparable to standard multiaccuracy and significantly better than the multicalibration bound quoted there as roughly

ff3

(Gopalan et al., 2022).
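The alternation between a multiaccuracy update and a recalibration step can be illustrated with an in-sample toy. This is only a sketch under stated assumptions, not the algorithm from the cited paper: the weak learner is replaced by exhaustive search over a small finite class, the step size is set to the measured correlation, and recalibration is done by averaging labels within fixed-width prediction buckets. All names and parameters are hypothetical.

```python
def calibrated_multiaccuracy(xs, ys, hypotheses, alpha=0.01, bins=10, max_rounds=100):
    """Toy in-sample loop: recalibrate, then fix the worst multiaccuracy violation."""
    n = len(xs)
    preds = [0.5] * n
    for _ in range(max_rounds):
        # Recalibration: replace each prediction by the mean label of its bucket.
        buckets = {}
        for i, p in enumerate(preds):
            buckets.setdefault(min(int(p * bins), bins - 1), []).append(i)
        for members in buckets.values():
            mean_label = sum(ys[i] for i in members) / len(members)
            for i in members:
                preds[i] = mean_label
        # Weak-learner stand-in: exhaustive search for the most correlated hypothesis.
        best_c, best_corr = None, 0.0
        for c in hypotheses:
            corr = sum(c(x) * (y - p) for x, y, p in zip(xs, ys, preds)) / n
            if abs(corr) > abs(best_corr):
                best_c, best_corr = c, corr
        if abs(best_corr) <= alpha:
            break  # calibrated and alpha-multiaccurate on the sample
        # Multiaccuracy update: move along the violating hypothesis, then clip to [0, 1].
        preds = [
            min(1.0, max(0.0, p + best_corr * best_c(x))) for x, p in zip(xs, preds)
        ]
    return preds


# Toy data: features are two bits (a, b) and the label equals the first bit.
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ys = [0, 0, 1, 1]
hypotheses = [lambda x: 2 * x[0] - 1, lambda x: 2 * x[1] - 1]

final_preds = calibrated_multiaccuracy(xs, ys, hypotheses)
print(final_preds)
```

For hypotheses bounded in $[-1,1]$, a step equal to the correlation lowers the empirical squared error by at least the squared correlation, which is the kind of potential argument the paragraph above refers to; the bucketed recalibration here is a simplification of that analysis.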

A subsequent line of work shows that the sample and regret cost of omniprediction can be near that of ordinary prediction. In the online setting, one paper gives an oracle-efficient algorithm with ff4 regret for any class of Lipschitz loss functions ff5, and an offline learning algorithm for bounded-variation losses ff6 whose error scales near-linearly in the Rademacher complexity of ff7 (Okoroafor et al., 28 Jan 2025). In that framework, finite ff8 and finite ff9 yield an online regret bound

(L,C,δ)(\mathcal L,\mathcal C,\delta)0

and, more generally,

(L,C,δ)(\mathcal L,\mathcal C,\delta)1

given an online weak agnostic learner for (L,C,δ)(\mathcal L,\mathcal C,\delta)2 (Okoroafor et al., 28 Jan 2025).

The online adversarial setting was also developed through online multicalibration. One construction reduces online multicalibration to online squared-loss regression over (L,C,δ)(\mathcal L,\mathcal C,\delta)3, achieves contextual swap regret of roughly

(L,C,δ)(\mathcal L,\mathcal C,\delta)4

and thereby derives the first efficient online omnipredictor for Lipschitz convex losses (Garg et al., 2023). For linear predictors, combining the reduction with Azoury–Warmuth regression and choosing (L,C,δ)(\mathcal L,\mathcal C,\delta)5 gives

(L,C,δ)(\mathcal L,\mathcal C,\delta)6

(Garg et al., 2023).

The currently strongest online rates separate oracle efficiency from information-theoretic optimality. One paper proves an oracle-efficient online learning algorithm with (L,C,δ)(\mathcal L,\mathcal C,\delta)7 regret for omniprediction (Okoroafor et al., 28 Jan 2025), while another earlier paper showed that an oracle-efficient multicalibration route naturally yields swap-omniprediction, for which (L,C,δ)(\mathcal L,\mathcal C,\delta)8 regret is impossible online, and established a lower bound of (L,C,δ)(\mathcal L,\mathcal C,\delta)9 via online calibration (Garg et al., 2023). That same paper also presents a non-oracle-efficient algorithm achieving the optimal L\ell\in\mathcal L0 omniprediction rate for suitable finite Boolean classes L\ell\in\mathcal L1, yielding an information-theoretic separation between omniprediction and multicalibration-based solution concepts (Garg et al., 2023).

A recent batch result sharpens deterministic sample complexity. It gives a deterministic multicalibration algorithm with sample complexity

L\ell\in\mathcal L2

and a deterministic finite-test OI algorithm with sample complexity

L\ell\in\mathcal L3

then derives deterministic omnipredictors and panpredictors with optimal sample complexity (Noarov et al., 18 Jun 2026). The paper explicitly states that randomization is not statistically necessary for optimal offline multicalibration or omniprediction (Noarov et al., 18 Jun 2026).

4. Fairness, constraints, and group-conditional guarantees

A major theme in the literature is that omnipredictors are not limited to unconstrained ERM. In multi-group loss minimization, if a partition is approximately multicalibrated for a product class L\ell\in\mathcal L4, then for each subgroup L\ell\in\mathcal L5, the canonical predictor remains an omnipredictor for L\ell\in\mathcal L6 on the subpopulation L\ell\in\mathcal L7 (Gopalan et al., 2021). The induced partition on L\ell\in\mathcal L8 retains approximate multicalibration with controlled degradation, and consequently each group can be treated optimally, for any loss in the family, without retraining (Gopalan et al., 2021).

This multi-group perspective was extended to fairness-constrained optimization. One formulation studies false-positive fairness constraints

L\ell\in\mathcal L9

with

\ell00

Starting from a multicalibrated regression function \ell01, the paper shows that a no-regret primal-dual post-processing, using only unlabeled data and \ell02, yields a classifier \ell03 satisfying

\ell04

and

\ell05

for every \ell06 (Globus-Harris et al., 2022). The post-processing threshold depends on the group-membership vector through a linear form in the dual variables, allowing the method to handle intersecting groups without enumerating all intersections (Globus-Harris et al., 2022).

A parallel development defines omnipredictors for constrained optimization more generally. The downstream task is

\ell07

and the learner is assumed to know in advance the groups \ell08 that will define future constraints (Hu et al., 2022). The key device is the simulated distribution \ell09, obtained by sampling \ell10 from the marginal and then \ell11. Appropriate variants of group multiaccuracy, group multicalibration, group calibration, and group level-set multiaccuracy imply that solving the constrained task on \ell12 transfers back to a near-optimal solution on the true distribution \ell13 (Hu et al., 2022).

For convex and “special” group objectives and constraints, a predictor satisfying group multiaccuracy and group calibration is enough (Hu et al., 2022). The “special” condition is

\ell14

which covers \ell15 loss, squared loss, generalized linear model losses after scaling, and linear constraints including statistical parity, equal opportunity, and equalized odds (Hu et al., 2022). For arbitrary group objectives and constraints with bounded differences, stronger group level-set multiaccuracy is required (Hu et al., 2022).

A plausible implication is that omniprediction functions as a reusable representation not only for uncertain losses but also for uncertain regulatory or fairness regimes, provided the relevant subgroup structure is specified in advance.

5. Extensions beyond the original binary batch setting

The regression extension reframes omniprediction through sufficient statistics for loss minimization. A family \ell16, with \ell17, gives \ell18-uniform approximations to \ell19 if for every \ell20 there are coefficient functions \ell21 such that

\ell22

If a predictor is multiaccurate on the corresponding statistic class and calibrated, then it is an \ell23-omnipredictor with

\ell24

(Gopalan et al., 2024). The same paper proves that the \ell25-approximate dimension of bounded, convex, \ell26-Lipschitz functions on \ell27 satisfies

\ell28

which leads to faster omnipredictor learning for convex Lipschitz losses than a naive discretization-based approach (Gopalan et al., 2024).
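The idea of sufficient statistics for loss minimization has a simple exact instance: squared loss expands as $(y-t)^2=y^2-2ty+t^2$, so the statistics $1$, $y$, and $y^2$ with coefficient functions $t^2$, $-2t$, and $1$ reproduce it with zero approximation error. The sketch below assumes this specific example, which is chosen for illustration and is not taken from the cited paper.

```python
def expected_squared_loss_from_moments(first_moment, second_moment, t):
    """E[(y - t)^2] written as a combination of the statistics 1, y, y^2."""
    return second_moment - 2 * t * first_moment + t * t


def direct_expected_squared_loss(ys, t):
    """Same quantity computed directly from the labels."""
    return sum((y - t) ** 2 for y in ys) / len(ys)


ys = [0.0, 0.5, 1.0, 0.5]
first_moment = sum(ys) / len(ys)                  # statistic s(y) = y
second_moment = sum(y * y for y in ys) / len(ys)  # statistic s(y) = y^2

action = 0.25
via_moments = expected_squared_loss_from_moments(first_moment, second_moment, action)
direct = direct_expected_squared_loss(ys, action)
print(via_moments, direct)
```

A predictor that reports only these two moments therefore carries everything needed to evaluate, and hence minimize, expected squared loss for any action; the uniform-approximation framework extends this to losses that are only approximately spanned by a small set of statistics.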

For single-index models, omniprediction has become a route to agnostic learning with unknown activations. One paper uses calibrated multiaccuracy to construct omnipredictors that minimize matching losses, then converts matching-loss control into squared-error guarantees for SIMs and GLMs through Bregman-divergence inequalities (Gollakota et al., 2023). A later paper gives a simpler Isotron-based omnipredictor construction for SIMs, defines the omnigap

\ell29

and proves that \ell30 implies \ell31 is an \ell32-omnipredictor (Hu et al., 2024). Its finite-sample algorithm Omnitron requires

\ell33

samples for Lipschitz links, with improvement to roughly \ell34 in the bi-Lipschitz case, and outputs a structured multi-index model with \ell35 heads (Hu et al., 2024).

The multiclass setting requires substantially different machinery. Recent work defines a multiclass omnipredictor \ell36 through the ex ante optimal action map

\ell37

and proves omniprediction with horizon or sample complexity approximately \ell38 for \ell39-class problems (Hu et al., 19 Feb 2026). Its main technical contribution is a framework for simultaneous Blackwell approachability, developed to combine calibration and multiaccuracy constraints in the multiclass simplex (Hu et al., 19 Feb 2026).

The outcome-indistinguishability perspective has also been transferred to performative prediction, where deployed actions affect the distribution of outcomes. Under outcome performativity, Nature is modeled by

\ell40

A performative omnipredictor is a predictor \ell41 such that, for each loss \ell42, the induced decision rule

\ell43

is nearly optimal under the true performative environment (Kim et al., 2022). The paper proves that performative POI plus performative DOI imply performative omniprediction (Kim et al., 2022).

In evolving graphs, an online kernel-based framework based on a modified \ell44 algorithm yields outcome indistinguishability and omniprediction guarantees for rich, possibly infinite distinguisher classes over node pairs and neighborhoods (Dwork et al., 2024). The resulting graph predictors support multicalibration across pairs of demographic groups, tests based on embeddedness

\ell45

and competition against finite sets of bounded graph predictors, including graph neural networks (Dwork et al., 2024).

6. Statistical optimality, proper losses, and the current frontier

A recent development argues that multicalibration is stronger than necessary for omniprediction. For binary proper losses

\ell46

one paper proves lower bounds showing that calibrated multiaccuracy and multicalibration are strictly harder objectives than omniprediction (Gibbs et al., 14 Oct 2025). In particular, for \ell47 and \ell48,

\ell49

whereas omniprediction over proper losses can achieve the standard VC rate

\ell50

(Gibbs et al., 14 Oct 2025).

The same work exploits the fact that every left-continuous proper loss decomposes as a mixture of weighted \ell51 losses,

\ell52

and establishes both a sample-efficient randomized online-to-batch algorithm and a direct deterministic unrandomized algorithm, each attaining

\ell53

for omniprediction over left-continuous proper losses (Gibbs et al., 14 Oct 2025). This suggests that, at least for proper losses, omniprediction can match ordinary statistical learning rates without the calibration overhead inherited from earlier constructions.
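A decomposition of a proper loss into threshold losses can be checked numerically for one familiar case. The sketch below uses the classical cost-weighted representation of squared loss, $(y-p)^2=2\int_0^1\ell_c(y,p)\,dc$, where $\ell_c$ charges $c$ for a false positive and $1-c$ for a false negative when acting iff $p>c$. This particular parameterization is an assumption made for illustration and may differ from the one used in the cited work.

```python
def cost_weighted_loss(y, p, c):
    """Threshold loss at cost c: act iff p > c; pay c for a false positive,
    1 - c for a false negative, and nothing otherwise."""
    act = p > c
    if y == 0 and act:
        return c
    if y == 1 and not act:
        return 1 - c
    return 0.0


def mixture_of_threshold_losses(y, p, steps=10000):
    """Midpoint-rule approximation of 2 * integral over c in [0, 1] of cost_weighted_loss."""
    h = 1.0 / steps
    return 2 * h * sum(cost_weighted_loss(y, p, (i + 0.5) * h) for i in range(steps))


for label, prob in [(1, 0.3), (0, 0.3), (1, 0.82), (0, 0.82)]:
    mixed = mixture_of_threshold_losses(label, prob)
    squared = (label - prob) ** 2
    print(label, prob, round(mixed, 4), round(squared, 4))
```

Since each threshold loss is minimized by the same thresholding of the true probability, a guarantee that holds uniformly over thresholds transfers to the mixture, which is the mechanism such decompositions exploit.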

The same theme appears in a broader generalization called panprediction, which unifies omniprediction and multi-group learning. A predictor \ell54 is a \ell55-panpredictor if, for every loss \ell56 and every group \ell57,

\ell58

Recent work gives deterministic and randomized step-calibration algorithms with sample complexities \ell59 and \ell60, respectively, and shows that under bounded-variation losses, many-loss many-group prediction can be statistically as easy as ordinary learning (Balakrishnan et al., 31 Oct 2025).

The most recent deterministic results close a longstanding gap. Optimal deterministic multicalibration and finite-test outcome indistinguishability imply optimal deterministic omnipredictors and panpredictors, resolving open problems about whether randomized output predictors are necessary (Noarov et al., 18 Jun 2026). In this sense, the present frontier is less about existence than about the strongest possible combination of generality, oracle efficiency, and structural simplicity.

A common misconception is that omnipredictors are merely calibrated probability estimators. The literature instead treats calibration, multicalibration, proper calibration, calibrated multiaccuracy, step calibration, and various forms of outcome indistinguishability as solution concepts or sufficient conditions for a stronger objective: a single learned representation that supports many downstream decision rules through explicit Bayes-act post-processing (Gopalan et al., 2022, Okoroafor et al., 28 Jan 2025, Balakrishnan et al., 31 Oct 2025). Another misconception is that omniprediction is intrinsically tied to convex losses. Foundational results focused on nice convex losses (Gopalan et al., 2021), but later work covers non-convex losses through Loss OI (Gopalan et al., 2022), proper losses via threshold decompositions (Gibbs et al., 14 Oct 2025), bounded-variation losses (Okoroafor et al., 28 Jan 2025), and constrained or group-conditional objectives (Hu et al., 2022, Balakrishnan et al., 31 Oct 2025).

Taken together, these developments position omnipredictors as a unifying object at the intersection of calibration, fairness, robust post-processing, online learning, and statistical decision theory: a single predictor intended to preserve enough predictive information that many loss-minimization problems can be solved after training, rather than during it.
