Optimal Deterministic Multicalibration and Omniprediction
Abstract: A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of trustworthy machine learning. Before this work, all predictors known to attain the minimax-optimal $\widetilde O(\varepsilon^{-3})$ sample complexity rate for $\varepsilon$-multicalibration were randomized, while deterministic predictors were known only with substantially worse sample complexity. Whether randomization is necessary for optimal sample complexity in multicalibration was explicitly asked by [CLNR26] and implicitly in several prior works. We resolve this open problem by giving a minimax-optimal multicalibration algorithm that outputs a deterministic predictor. We then generalize the algorithm to produce optimal deterministic predictors that satisfy outcome indistinguishability (OI) with respect to finite or finitely covered collections of tests. As an application, this also gives deterministic omnipredictors and panpredictors with optimal sample complexity, resolving open problems posed by [OKK25] and [BHHLZ25].
Explain it Like I'm 14
Overview
This paper asks a simple question with big consequences: do we really need models that flip coins at prediction time to learn accurate, fair predictors from as little data as possible? The authors show that the answer is no. They design learning methods that output deterministic predictors (models that always give the same prediction for the same input) and that need no more training data, up to logarithmic factors, than the best randomized methods for several important goals in machine learning:
- multicalibration (a strong form of calibration across many groups),
- outcome indistinguishability (tests can’t tell whether an outcome came from the world or the model, given the prediction),
- omniprediction (one learned predictor supports many different downstream goals), and
- panprediction (like omniprediction, but also across subgroups).
These deterministic predictors match the sample efficiency of the best previous randomized methods, which is important for trust, auditing, and fairness.
Key Questions
The paper focuses on these plain-language questions:
- Can we learn a model that is calibrated across many groups without using randomness at prediction time?
- Can such models be learned with the optimal number of training examples?
- Can we do the same for outcome indistinguishability and omniprediction (and panprediction), where one model can be adapted to many different tasks?
- If previous best methods were randomized, is randomness actually necessary?
How They Did It (Methods, explained simply)
Think of the learning process as trying to pass a large set of fairness and accuracy “tests” at once. Earlier methods achieved this by letting the model inject randomness into predictions, which made it easier to balance all the tests. But randomness brings problems: two identical people might get different predictions, auditing is harder, and results aren’t reproducible.
The authors’ approach: use your data smartly to reduce uncertainty, then round away the randomness—carefully.
Here’s the idea in steps:
- Why naive derandomization fails:
- Imagine a world with only two types of inputs, A and B.
- A is always labeled 0; B is always labeled 1.
- A randomized model can mix predictions so that the average prediction equals the average label among all cases that receive the same prediction value. That’s perfect calibration.
- But if you “fix” the randomness by choosing one prediction per input (deterministic rounding), you break that perfect balance. Calibration error shoots up.
- This failure happens when some inputs repeat a lot in the data (“atoms”). You need to handle repeated inputs specially.
- Use confidence intervals to handle repeated inputs:
- Split your data into parts. Use one part to estimate, for each input you’ve seen, an interval where its average label likely lies. Frequent inputs get narrow intervals (more precise estimates). Rare inputs get wide intervals (less precise).
- These intervals act as “hints” about what predictions are reasonable for each input.
- An online learning strategy that respects the hints:
- Use an online algorithm (think: repeatedly adjusting predictions to satisfy many tests at once) that only predicts values close to the interval hints.
- This keeps the model’s “randomized” predictions tightly focused for frequent inputs and safely broad for rare ones.
- Turn the randomized predictor into a deterministic one:
- Use another part of the data to group similar inputs into “cells.”
- Within each cell, instead of flipping a fresh coin for every prediction, fix a single random seed once for the whole cell. Then apply the same “random draw” to all inputs in that cell.
- Because frequent inputs already have narrow intervals, and because cells are chosen so no cell has too much probability mass, the rounding barely changes the tests’ results.
- The final model is deterministic: same input, same output.
The key insight is to avoid a hard rule like “treat inputs as heavy if they repeat many times and light otherwise.” That leaves a troublesome middle zone. Instead, use adaptive confidence intervals that smoothly adjust to how often you’ve seen an input—this fills the gap.
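The interval-hint step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's construction: it assumes hashable contexts, labels in [0, 1], and a Hoeffding bound combined with a union bound over the observed contexts. The names `interval_hints` and `hint_for` are hypothetical.

```python
import math
from collections import defaultdict

def interval_hints(samples, delta=0.05):
    """Map each observed context to a Hoeffding interval for its mean label.

    samples: iterable of (context, label) pairs, with label in [0, 1].
    delta:   total failure probability, split evenly over observed contexts
             (an assumption of this sketch).
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for x, y in samples:
        counts[x] += 1
        sums[x] += y

    per_context_delta = delta / max(1, len(counts))  # union bound
    hints = {}
    for x, n in counts.items():
        mean = sums[x] / n
        # Hoeffding radius: shrinks like 1/sqrt(n), so frequent contexts
        # get narrow intervals and rare contexts get wide ones.
        radius = math.sqrt(math.log(2.0 / per_context_delta) / (2.0 * n))
        hints[x] = (max(0.0, mean - radius), min(1.0, mean + radius))
    return hints

def hint_for(hints, x):
    """Contexts never seen in the sample get the trivial interval [0, 1]."""
    return hints.get(x, (0.0, 1.0))

# Toy data: A and B repeat often (atoms), C appears once.
s0 = [("A", 0.0)] * 200 + [("B", 1.0)] * 200 + [("C", 1.0)]
hints = interval_hints(s0)
for context in ("A", "B", "C", "D"):
    print(context, hint_for(hints, context))
```

In this toy run the frequent contexts A and B receive narrow intervals near 0 and 1, while the rare context C and the unseen context D fall back to the full interval. The width changes smoothly with the count, which is the behavior the paragraph above contrasts with a hard heavy/light split.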
Main Findings
Below are the main results, written in accessible terms. In all cases, “sample complexity” means how many training examples you need to reach error at most ε, up to small log factors.
- Deterministic multicalibration at the optimal rate:
- The authors give an algorithm that outputs a deterministic predictor whose multicalibration error is at most ε.
- It needs a number of training examples roughly proportional to 1/ε³, matching the best possible rate, which previously only randomized predictors achieved.
- It runs in polynomial time.
- Deterministic outcome indistinguishability (for any finite set of tests):
- If you have a fixed finite collection of tests that look at the context, the model’s prediction, and the outcome, they produce a model whose test correlation error is at most ε.
- It needs a number of training examples roughly proportional to log(number of tests) / ε².
- This is optimal for that setting.
- Deterministic omniprediction (and panprediction):
- For many loss functions and a benchmark class of models, they show how to build one deterministic predictor you can post-process to perform well on all those losses.
- If the “auditor” class (loss-derived functions used to check performance) has complexity p (its pseudo-dimension), the sample complexity is about proportional to (p + log(1/ε)) / ε², matching the best randomized rates.
- They extend the same idea to panprediction (doing well across losses and subgroups), showing deterministic predictors also achieve optimal sample complexity.
- No hidden randomness required:
- Even the training-time randomness can be removed with only small (logarithmic) changes in the sample bounds.
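To get a feel for how these rates scale, the short Python sketch below evaluates the leading-order terms quoted above. It deliberately ignores constants and logarithmic factors, which the summary does not specify, so the outputs are only relative scales, not usable sample sizes. The function names are hypothetical.

```python
import math

def multicalibration_scale(eps):
    """Leading-order term for eps-multicalibration: 1 / eps^3."""
    return 1.0 / eps ** 3

def finite_oi_scale(eps, num_tests):
    """Leading-order term for finite-test OI: log(#tests) / eps^2."""
    return math.log(num_tests) / eps ** 2

def omniprediction_scale(eps, p):
    """Leading-order term for omniprediction: (p + log(1/eps)) / eps^2."""
    return (p + math.log(1.0 / eps)) / eps ** 2

for eps in (0.1, 0.05):
    print(
        f"eps={eps}: "
        f"multicalibration ~ {multicalibration_scale(eps):.0f}, "
        f"OI with 1000 tests ~ {finite_oi_scale(eps, 1000):.0f}, "
        f"omniprediction with p=10 ~ {omniprediction_scale(eps, 10):.0f}"
    )
```

Halving ε multiplies the multicalibration term by 8 but the finite-test OI term by only 4, which is the practical difference between a 1/ε³ and a 1/ε² rate.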
Why These Results Matter
- Trust and fairness:
- Deterministic predictors treat identical inputs the same. That’s better for fairness, auditing, and explaining decisions.
- It avoids awkward situations where a person might get a different outcome just because the model flipped a different coin that day.
- Practical deployment:
- Companies and institutions prefer deterministic systems because they’re easier to verify, reproduce, and certify.
- The results show you don’t have to trade off trustworthiness for performance.
- One model for many tasks:
- Omniprediction means you can train once and then adapt your predictions for many different goals (like different loss functions) without retraining.
- Doing this deterministically, and with the optimal number of samples, can save time, compute, and data.
- Scientific clarity:
- Before this work, it was unclear whether randomness gave a statistical advantage in multicalibration and related tasks.
- This paper proves that, with the right method, randomness is not necessary to reach the optimal sample complexity (up to logarithmic factors).
A Simple Intuition Recap
- Randomized models can perfectly “mix” inputs to pass many calibration tests, but their randomness is uncomfortable in practice.
- The trouble with naive derandomization is repeated inputs (“atoms”), where simple rounding breaks that perfect mix.
- The fix: use the data to build confidence intervals that reflect how sure you are about each input’s average label, constrain the online learner to predict within those intervals, and then carefully round by grouping inputs into small cells.
- This preserves accuracy and fairness—without relying on prediction-time coin flips.
Bottom Line
The paper shows that deterministic models can be just as sample-efficient and powerful as randomized ones for multicalibration, outcome indistinguishability, omniprediction, and panprediction. Their approach combines interval-based hints, an online learning strategy, and a smart rounding scheme to remove randomness while keeping performance optimal. This makes trustworthy, auditable, and adaptable machine learning more practical in the real world.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a consolidated list of what remains uncertain, missing, or unexplored based on the paper’s scope and results:
- Extending beyond finite/finitely-covered test families
- Remove the finite-cover requirement: Can the deterministic, sample-optimal guarantees be established for infinite test/group classes characterized by capacity measures (e.g., Rademacher complexity, pseudo-dimension, fat-shattering, metric entropy) without discretization or precomputed covers?
- Data-dependent complexity: Can bounds be derived in terms of distribution-dependent complexities (localized Rademacher complexities) rather than worst-case log|A| or log|G| factors?
- Optimality beyond polylog factors
- Tighten logarithmic factors: Are the extra log|G| and log(1/ε) terms inherent for deterministic multicalibration/OI/omniprediction, or can they be removed to match lower bounds exactly?
- Sharp constants: Provide non-asymptotic constants in the main rates and quantify the additive γ-term introduced by interval hints and gridding.
- Computational scalability and oracle-efficiency
- Large or implicit group/test classes: The polynomial-time implementation scales with explicit enumeration of groups and grid values. Can one design oracle-efficient algorithms (e.g., via separation oracles, ERM oracles, oracles for ∑-aggregation) that remain deterministic and sample-optimal?
- Memory/time bounds: Precisely characterize runtime and memory in terms of |G|, grid size, and sample sizes (S0/S1/S2), and improve scalability for high-dimensional or combinatorially large group families.
- Stronger OI variants and richer distinguishers
- Beyond “one-sample sample-access” OI: Do the deterministic, sample-optimal guarantees extend to stronger OI models (e.g., two-sample tests, tests with auxiliary side information, multi-sample distinguishers, or distributional indistinguishability notions closer to cryptographic settings)?
- Adaptive distinguishers: What changes if the test families are chosen adaptively after observing the predictor (auditor adaptivity)?
- Online/streaming determinism
- Truly online deterministic guarantees: Can one obtain optimal-rate deterministic omniprediction/OI/multicalibration in a streaming/on-policy setting where predictions must be deterministic at each round (without batching/averaging and without post-hoc derandomization)?
- Single-pass, sublinear-memory algorithms: Is there a memory-optimal deterministic online method with comparable statistical rates?
- Robustness and distribution shift
- Coverage failures in interval hints: The analysis assumes high-probability validity of the learned intervals. Can guarantees be made robust to a small fraction of invalid hints (e.g., adversarial corruption, heavy-tailed noise)?
- Shift-resilience: How do deterministic guarantees degrade under distribution shift (e.g., covariate shift, label shift, concept drift)? Can one design robust variants (e.g., distributionally robust multicalibration/OI) with deterministic predictors?
- Atom handling and partitioning
- Optimality of lexicographic partitioning: Is the lexicographic cell construction minimax-optimal for controlling ∑_C P_X(C)²? Are there data-adaptive or geometry-aware partitions with provably smaller rounding variance, especially in high dimensions?
- Seed-per-cell storage: Can the rounding step be redesigned to avoid storing a seed per cell (e.g., via deterministic tie-breaking rules, hashing, or pseudorandom functions) while preserving guarantees?
- Discretization and continuous predictions
- Grid-free derandomization: Can one avoid discretization (Λ) entirely and design continuous deterministic predictors with the same sample complexity?
- Impact on downstream tasks: Quantify how grid resolution affects downstream optimization quality for a broad loss family, beyond worst-case ε-calibration metrics.
- Broader loss/label spaces and settings
- Multiclass and structured outcomes: Do the deterministic, optimal-rate guarantees extend to multiclass or structured outputs (e.g., Y in a simplex), and to vector-valued calibration notions?
- Unbounded or heavy-tailed losses: The analysis assumes Y ∈ [0,1] and bounded tests. What if the loss functions or outcomes are unbounded/heavy-tailed? Are there robust variants with truncated or median-of-means estimators?
- Contextual bandits and reinforcement learning: Can deterministic omniprediction-style guarantees be achieved in interactive settings where exploration-exploitation tradeoffs complicate calibration and OI?
- Panprediction specifics
- Full, explicit deterministic lower bounds: The paper claims optimal deterministic panprediction via the OI extension; can matching lower bounds be proved explicitly for the deterministic panprediction setting to close all gaps?
- Auditor construction for groups × losses: Provide explicit, efficiently-computable finite covers/bases for loss-derived auditor classes in practical panprediction instances.
- Privacy and stability
- Differential privacy: Can one achieve differential privacy together with deterministic outputs and sample-optimal rates (up to logs)? What are the tight tradeoffs among privacy, determinism, and sample complexity?
- Stability to resampling: Quantify sensitivity of the intervals and partitions to sample perturbations; provide stability bounds that justify reproducibility claims.
- Interplay with model constraints and interpretability
- Structural constraints: Can the deterministic predictor be constrained to be monotone/sparse/smooth or belong to a specific hypothesis class while retaining sample-optimal rates?
- Post-hoc compression: Can the learned deterministic predictor be compressed (e.g., via distillation) without losing calibration/OI guarantees?
- Cross-fitting and sample reuse
- Reducing sample splitting: The algorithm uses three independent splits (S0, S1, S2). Can cross-fitting or data reuse eliminate or reduce splitting without sacrificing determinism or rates?
- Extensions to multi-distribution learning
- Partial extension of techniques: Can the interval-hint and rounding-cell machinery derandomize certain subclasses of multi-distribution learning problems (beyond the label-consistency regime), or is there a fundamental barrier akin to existing hardness results?
Practical Applications
Practical Applications Derived from “Optimal Deterministic Multicalibration and Omniprediction”
Below are actionable applications that flow from the paper’s core contributions: sample-optimal deterministic multicalibration, deterministic outcome indistinguishability (OI) for finite/finitely covered test families, and deterministic (pan)omniprediction with optimal sample complexity. Each item notes likely sectors, potential tools/products/workflows, and key assumptions/dependencies.
Immediate Applications
- Deterministic calibration/multicalibration post-processing layer
- What: Add a deterministic, sample-optimal multicalibration module on top of any scoring model to guarantee small ECE error simultaneously across a chosen finite family of groups.
- Sectors: healthcare (readmission/mortality risk), finance (credit default, fraud), insurance (claim severity), HR/admissions (risk/fit scoring), ads (CTR/conversion).
- Tools/products/workflows:
- “Calibration layer” SDK implementing interval-hint learning + deterministic rounding on a finite prediction grid; takes a labeled dataset and group weight functions, outputs a deterministic mapping h(x).
- Built-in audit report of group-wise ECE with signed tests.
- Assumptions/dependencies: i.i.d. samples; finite group family G (or finite cover); n ≈ Õ((log|G|)/ε³) for desired error ε, consistent with the ε⁻³ multicalibration rate stated in the abstract; availability of group weights per example; some engineering for LP solving per example (as in the paper’s factored implementation).
- Deterministic omnipredictor for many business metrics (train-once, optimize-many)
- What: Train a single predictor once, then cheaply post-process it to optimize a wide range of loss functions with guarantees competitive to a benchmark class—all without randomized predictions.
- Sectors: adtech (revenue vs. margin vs. ROAS), e-commerce (returns vs. conversion), operations (fill-rate vs. stockout), healthcare triage (precision-recall trade-offs), supply-chain (service-level vs. cost).
- Tools/products/workflows:
- “Loss plugin” library: pass a loss from a covered loss-family; the system returns the corresponding post-processed decision rule.
- Batch training with three-way sample split (confidence intervals S0, online-to-batch S1, rounding cells S2) packaged into an internal pipeline.
- Assumptions/dependencies: finite/finitely covered loss-derived auditor class Δ∘F (finite cover or finite pseudo-dimension p); n ≈ Õ((p+log(1/δ))/ε²); availability of a benchmark function class for competitiveness; distributional stationarity between train and serve.
- Deterministic panprediction for multi-stakeholder guarantees
- What: Guarantee omniprediction simultaneously across subgroups (e.g., regions, demographics) so each group receives a near-optimal policy for its preferred loss.
- Sectors: education (admissions), housing/lending (fair access), public services (allocations), platform safety (policy enforcement across user segments).
- Tools/products/workflows:
- “Group-conditional optimizer” that accepts group functions and losses and yields group-specific decision rules from one base predictor.
- Assumptions/dependencies: as above, plus a finite/finitely covered set of group-weighted tests in the OI reduction.
- Reproducible and auditable model serving in regulated environments
- What: Deterministic predictions enable exact replay and auditability; identical individuals get identical outputs (no prediction-time coin flips).
- Sectors: finance (model risk), healthcare (clinical decision support), public sector (eligibility decisions).
- Tools/products/workflows:
- Deterministic rounding with per-cell seeding; immutable logs capturing rounding-cell partition and fixed grid.
- Audit dashboards that report signed ECE/OI tests and certification artifacts.
- Assumptions/dependencies: retention of the rounding partition and seeds; stable preprocessing to ensure the same context serializes into the same rounding cell.
- Stable A/B testing and offline evaluation
- What: Remove predictor-induced randomness from experiments, reducing variance and enabling strict reproducibility of treatment assignment driven by model scores.
- Sectors: product experimentation, ads, marketplace ranking.
- Tools/products/workflows:
- Integrate deterministic calibrated scores in experimentation platforms; record that allocation differences arise from policy choices, not model randomness.
- Assumptions/dependencies: standard A/B assumptions; stable model inputs.
- Vendor/model procurement under many objectives
- What: Benchmark third-party models via a single learned omnipredictor that can be post-processed to many losses; simplifies due diligence and comparison.
- Sectors: enterprise ML platforms, regulated procurement.
- Tools/products/workflows:
- Evaluation harness that applies a fixed omnipredictor and reports per-loss competitiveness vs. provided baselines.
- Assumptions/dependencies: shared evaluation distribution; access to loss family and benchmarks.
- Calibration-aware threshold and triage policy design
- What: Using guaranteed calibrated scores, set thresholds aligned to costs and service-level targets; convert calibration into reliable action thresholds.
- Sectors: clinical triage, fraud ops, safety moderation, customer support routing.
- Tools/products/workflows:
- Threshold and budget optimizers that rely on calibration to meet error-rate or cost constraints per group or globally.
- Assumptions/dependencies: cost/benefit specification; chosen groups and loss class captured in the OI tests.
- Finite-test OI audit suites for production models
- What: Package finite OI tests (e.g., calibration + multiaccuracy for loss-derived classes) to “red team” outcome residuals of production models without full retraining.
- Sectors: platform governance, compliance, safety.
- Tools/products/workflows:
- Test harness that runs a finite family of OI tests and outputs pass/fail with effect sizes; supports regression-to-the-mean budgeting across many tests.
- Assumptions/dependencies: finite test family; sufficient held-out data n ≈ Õ((log|Tests|)/ε²).
- Robustness on datasets with repeated contexts (atoms)
- What: The method’s confidence-interval hints and rounding-cell approach explicitly manage repeated IDs or contexts (e.g., item IDs, devices).
- Sectors: retail (SKU-level forecasting), IoT (device-level signals), logistics (route IDs).
- Tools/products/workflows:
- Interval-hint computation per observed context frequency; lexicographic rounding-cell partitioning.
- Assumptions/dependencies: repeat observations for some contexts; correct serialization and hashing.
- Compute/energy savings via “train-once, reuse-many”
- What: Replace repeated retraining per objective with one omnipredictor and cheap per-objective post-processing; lowers cost and carbon footprint.
- Sectors: all high-throughput ML shops.
- Tools/products/workflows:
- CI/CD pipeline changes: single training job feeds multiple downstream business metrics.
- Assumptions/dependencies: covered loss family; stable data generation.
- Suggested production workflow (from the paper’s algorithmic design)
- Steps:
- Define a finite grid over [0,1] and specify a finite group family and/or a finite/finitely covered test family for OI.
- Split data into S0 (confidence intervals), S1 (online-to-batch learning with interval hints), S2 (rounding-cell partition).
- Train the randomized predictor constrained by interval hints; round deterministically with one seed per partition cell.
- Validate ECE/OI metrics on a held-out set; archive partition, seeds, and grid for reproducibility.
- Assumptions/dependencies: i.i.d. samples; appropriate grid size (γ-net); polynomial-time implementation using the factored exponential-weights update and per-context LPs.
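The rounding step of this workflow (one seed per partition cell) can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: it assumes contexts are tuples compared in Python's built-in lexicographic order, that the randomized predictor returns a list of (grid value, probability) pairs, and that each cell stores one uniform draw fixed at training time. All function names are hypothetical.

```python
import bisect
import random

def build_cells(partition_sample):
    """Sorted distinct contexts from the partition split; they bound the cells."""
    return sorted(set(partition_sample))

def cell_index(boundaries, x):
    """Index of the cell that context x falls into (tuples sort lexicographically)."""
    return bisect.bisect_right(boundaries, x)

def draw_cell_seeds(boundaries, rng):
    """One uniform draw per cell, made once and then stored with the model."""
    return [rng.random() for _ in range(len(boundaries) + 1)]

def round_prediction(dist, u):
    """Inverse-CDF rounding: return the grid value where cumulative mass passes u."""
    total = 0.0
    for value, prob in dist:
        total += prob
        if u < total:
            return value
    return dist[-1][0]  # guard against floating-point shortfall in the total

def make_deterministic(randomized_predictor, boundaries, seeds):
    """Wrap a randomized grid predictor so the same context always maps to the same value."""
    def h(x):
        u = seeds[cell_index(boundaries, x)]
        return round_prediction(randomized_predictor(x), u)
    return h

# Toy usage: a randomized predictor that mixes 0 and 1 equally for every context.
def coin_flip_predictor(x):
    return [(0.0, 0.5), (1.0, 0.5)]

boundaries = build_cells([(1, 2), (0, 5), (3, 1)])
seeds = draw_cell_seeds(boundaries, random.Random(0))
h = make_deterministic(coin_flip_predictor, boundaries, seeds)
print([h(x) for x in [(0, 0), (1, 2), (1, 2), (9, 9)]])
```

Once `boundaries` and `seeds` are archived, `h` involves no randomness at serving time: a context always lands in the same cell and reuses that cell's stored draw. This is why the workflow asks to archive the partition, seeds, and grid.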
Long-Term Applications
- Deterministic fairness certification and regulatory standards
- What: Standardize deterministic multicalibration and finite-test OI as audit artifacts for compliance regimes (e.g., financial model risk, healthcare AI governance).
- Sectors: finance, healthcare, public administration.
- Dependencies: agreed test suites; regulatory acceptance; periodic revalidation under drift.
- Sector-specific omnipredictor platforms (“omni-risk engines”)
- What: Productize omnipredictors as APIs configurable by clients’ loss/cost curves and subgroup requirements; provide certified reproducibility.
- Sectors: insurers, banks, logistics, ad platforms.
- Dependencies: curated loss families and benchmark classes; SLAs on ε and sample sizes; client-side loss elicitation.
- Automated subgroup discovery plus guarantees
- What: Pair subgroup mining with deterministic multicalibration/panprediction to cover many discovered subgroups while controlling multiple-testing error.
- Sectors: HR, lending, public services.
- Dependencies: research on finite covers for adaptively discovered groups; computational scaling for large group sets.
- Streaming/online deployment under distribution shift
- What: Adapt the interval-hint and constrained online algorithms to continuous recalibration in production with drift detection and rolling windows.
- Sectors: all streaming ML (ads, marketplaces, sensor analytics).
- Dependencies: shift detection; bounded regret variants with finite test families; safe model update policies.
- Third-party OI challenge frameworks
- What: Open “challenge sets” of OI tests to externally stress models; publish pass/fail as part of transparency reports.
- Sectors: platforms, public-interest tech, standards bodies.
- Dependencies: test curation and maintenance; dataset governance; legal/privacy constraints.
- Privacy-preserving deterministic (pan)omnipredictors
- What: Combine with differential privacy so training logs and audit artifacts are privacy-safe while maintaining deterministic serving.
- Sectors: health, finance, gov-tech.
- Dependencies: DP composition with interval hints/online-to-batch; utility-privacy trade-off tuning.
- Model marketplaces with train-once, optimize-many licensing
- What: Distribute omnipredictors that buyers can adapt to their own loss functions and subgroup priorities without retraining or randomness at serve time.
- Sectors: enterprise AI marketplaces.
- Dependencies: standard interfaces for loss specification; legal and IP frameworks.
- Safety-critical decision support with legal accountability
- What: Use deterministic calibration/omniprediction to support explainability and consistent outcomes in high-stakes decisions (e.g., transplant lists, parole decisions).
- Sectors: healthcare, criminal justice.
- Dependencies: high-quality data; governance boards; robust post-deployment monitoring.
Assumptions and Dependencies That Affect Feasibility
- Data and sampling
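- i.i.d. sampling; adequate sample size: roughly Õ(1/ε³) for ECE multicalibration (with logarithmic dependence on |G|), and n ≈ Õ((complexity of test/loss cover + log(1/δ))/ε²) for OI, omniprediction, and panprediction.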
- Distribution shift can degrade guarantees; retraining/recalibration needed.
- Groups and tests
- Finite group family for multicalibration; finite/finitely covered test family for OI/omniprediction/panprediction.
- Availability and computability of group weights for each example.
- Loss/auditor classes
- For omniprediction, finite cover or bounded pseudo-dimension of the loss-derived auditor class Δ∘F; known or estimable benchmark class.
- Computation and engineering
- Polynomial-time implementation via factored exponential-weights; per-example small LPs to select predictions within interval hints.
- Grid choice (γ-net resolution) trades off accuracy vs. compute.
- Rounding-cell partition requires a deterministic, stable serialization of contexts (e.g., lexicographic order) and storage of seeds/partitions for reproducibility.
- Governance and productization
- Clear documentation of test suites, parameters (ε, δ, grid), and data partitions.
- Legal acceptance of deterministic calibration/omniprediction audits; stakeholder alignment on losses and groups.
These applications leverage the paper’s core insight: prediction-time randomness is not required to achieve minimax-optimal rates for multicalibration, outcome indistinguishability, omniprediction, and panprediction. This unlocks reproducible, auditable, and computationally efficient pipelines that are immediately useful across regulated and high-stakes ML deployments, while opening clear paths for standardization and productization.
Glossary
- Atom: A point in a probability distribution that has positive mass (non-zero probability) assigned to it. "our setting allows atoms in the feature distribution where exact purification can fail"
- Azuma–Hoeffding: A concentration inequality that bounds deviations of martingale sums, used to control estimation error in sequential settings. "Azuma--Hoeffding controls this difference for one signed test"
- Calibration: The property that, conditional on a model’s prediction, that prediction equals the expected outcome. "A predictor is calibrated if, conditional on the value it predicts, that value equals the expected outcome"
- Confidence interval: An interval estimate derived from data that, with high probability, contains an unknown parameter (here, a conditional mean). "build a confidence interval for its conditional label mean"
- ECE (Expected Calibration Error): A standard scalar measure of miscalibration that aggregates the magnitude of prediction-conditional bias across predicted values. "The standard quantitative measure of miscalibration is the expected calibration error (ECE)"
- ECE multicalibration error: The maximum group-weighted ECE over a family of groups, measuring calibration uniformly across reweightings. "The ECE multicalibration error is then the maximum, over groups $g \in G$, of the group-weighted ECE (Definition~\ref{def:mc})."
- Exponential weights: An online learning technique that maintains a weighted mixture over experts/tests, updating weights multiplicatively based on observed losses or gains. "A minimax and exponential-weights argument gives an online learning algorithm"
- Grid predictor: A predictor that outputs (possibly randomized) values from a finite grid of prediction values. "A randomized grid predictor assigns to each context a distribution over grid values"
- Lexicographic order: An ordering of vectors by comparing coordinates in sequence, used here to sort contexts and induce partition cells. "imposing a lexicographic order, sorting the partition sample in that order, and taking the cells induced by adjacent sampled contexts"
- Martingale online-to-batch reduction: A technique that converts an online learning guarantee into a batch (population) guarantee, often using martingale concentration. "a standard martingale online-to-batch reduction for multicalibration"
- Minimax-optimal: Achieving the best possible (tight) rate in worst-case sample complexity or error among all algorithms. "We give a minimax-optimal multicalibration algorithm that outputs a deterministic predictor."
- Multiaccuracy: The requirement that prediction residuals have small correlation with a class of functions of the context, ensuring broad accuracy beyond calibration. "together with multiaccuracy tests for the loss-derived class"
- Multi-distribution learning: Learning models that perform well across multiple distributions, often with additional complexity compared to single-distribution settings. "For the closely related problem of multi-distribution learning"
- Multicalibration: A strengthening of calibration that requires calibration to hold simultaneously after reweighting by each function in a collection of groups. "Multicalibration strengthens calibration by requiring it to hold not just marginally but simultaneously after reweighting by every group function in a collection $G$."
- Omniprediction: Learning a single predictor that can be post-processed to achieve near-optimal performance across many downstream loss functions against a benchmark class. "A closely connected goal is omniprediction"
- Online adversarial setting: An online learning framework where data may be chosen adaptively by an adversary, and the learner seeks worst-case guarantees. "establish the optimal rate for multicalibration in the online adversarial setting"
- Online-to-batch reduction: A methodology that turns online learning algorithms into batch learners with population guarantees by averaging iterates over i.i.d. samples. "their upper bound follows from online-to-batch reductions"
- Outcome indistinguishability (OI): A testing-based notion requiring that predictions induce outcomes indistinguishable (for a family of tests) from those generated by the true process. "Recent work gives more direct routes through outcome indistinguishability (OI)"
- Outcome-indistinguishability test: A bounded function of context and prediction used to probe whether predicted outcomes are indistinguishable from true outcomes. "bounded outcome-indistinguishability tests $a:\mathcal{X}\times[0,1]\to[-1,1]$"
- Panprediction: A group-conditional generalization of omniprediction that requires guarantees across both losses and subgroups. "panprediction (a group conditional notion of omniprediction)"
- Proper losses: Loss functions for which the expected loss is minimized by predicting the true conditional mean; often used in probabilistic forecasting. "a direct, deterministic, and sample-optimal result for the special case of proper losses"
- Pseudo-dimension: A complexity measure for real-valued function classes (a generalization of VC dimension) that controls sample complexity. "If this loss-derived class has pseudo-dimension $p$"
- Purification theorem: A result showing that randomized strategies in games can be replaced by deterministic ones without changing certain expected payoffs, under atomless assumptions. "the purification theorem of \citet{dvoretzky1951elimination}"
- Regret: The performance gap between an online learner and the best fixed comparator in hindsight, typically scaling as $\widetilde O(\sqrt{T})$. "omniprediction can be obtained via online algorithms with regret $\widetilde O(\sqrt{T})$"
- Signed calibration test: A test that assigns signs to prediction values, converting absolute calibration errors into linear (signed) residuals used for analysis. "For each signed calibration test $r=(g,\sigma)\in\mathcal T$"
- Step calibration: A calibration requirement defined over step (thresholded) partitions of predictions, used in panprediction reductions. "They reduce panprediction to step calibration"
- Threshold-calibration tests: Calibration tests indexed by thresholds on predictions, used in reductions from omniprediction to OI. "reduces to threshold-calibration tests together with multiaccuracy tests"
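As a concrete companion to the ECE and ECE multicalibration error entries, the sketch below computes empirical versions of both for a grid predictor. It assumes the common unnormalized form, summing over predicted values v the absolute value of the average of g(x)·(y − v) on records predicted as v. The paper's own Definition may differ in normalization, and the function names are hypothetical.

```python
from collections import defaultdict

def group_weighted_ece(records, group=lambda x: 1.0):
    """Empirical group-weighted ECE.

    records: list of (context, prediction, label) triples.
    group:   weight function g(x) in [0, 1]; the default weights everyone by 1,
             which gives the ordinary (marginal) ECE.
    """
    residual = defaultdict(float)
    for x, v, y in records:
        residual[v] += group(x) * (y - v)  # signed bias, bucketed by prediction
    return sum(abs(r) for r in residual.values()) / len(records)

def multicalibration_error(records, groups):
    """Maximum group-weighted ECE over a finite family of group weights."""
    return max(group_weighted_ece(records, g) for g in groups)

# Toy data: A is always labeled 0, B is always labeled 1, both predicted 0.5.
records = [("A", 0.5, 0.0), ("B", 0.5, 1.0)]
everyone = lambda x: 1.0
in_a = lambda x: 1.0 if x == "A" else 0.0
in_b = lambda x: 1.0 if x == "B" else 0.0

print(group_weighted_ece(records))                             # marginal ECE
print(multicalibration_error(records, [everyone, in_a, in_b])) # worst group
```

On this toy data the marginal ECE is 0, because the biases on A and B cancel within the single prediction bucket. The group-weighted ECE for A alone is 0.25. Multicalibration is designed to expose exactly this gap between looking calibrated overall and being calibrated on each group.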