Novel Conformal P-Values for Robust Inference
- The new family of conformal p-values is a robust, distribution-free method providing finite-sample error control under diverse testing scenarios.
- It utilizes advanced calibration, weighting, and localization strategies to handle challenges such as covariate shift, imbalanced data, and limited sample sizes.
- Aggregated exchangeable p-values are combined with decision-theoretic principles to optimize power, enhance interpretability, and improve error rate control.
A new family of conformal p-values encompasses a diverse set of modern test statistics and calibration strategies designed to provide distribution-free, finite-sample validity for hypothesis testing, predictive inference, outlier and novelty detection, set-valued classification, multiple testing, and changepoint analysis. These constructions leverage advanced principles from statistical decision theory—including confidence set inversion, robustness to nuisance parameters, recalibration using concentration inequalities, full data reuse, and efficient combination of exchangeable p-values or e-values—to improve interpretability, control error rates (type I, FDR, FWER), and optimize power in both classical and challenging scenarios such as covariate shift, open-set and imbalanced classification, and limited-data regimes.
1. Foundations and Validity Principles
The essential property of a conformal p-value is its distribution-free control under the null hypothesis: for any α ∈ [0, 1], the p-value satisfies P(p ≤ α) ≤ α for arbitrary data distributions, provided an (approximate or exact) exchangeability or weighted-exchangeability condition holds. Conformal p-values are typically computed as data-driven ranks, quantifying how "nonconforming" a new observation is relative to a calibration sample. This notion generalizes beyond simple ranking to accommodate:
- Composite nulls and nuisance parameter uncertainty, via valid p-values (VpVs) constructed with supremum or confidence-set corrections of the form p(X) = sup_{θ ∈ C(X)} p_θ(X) + β, where C(X) is a (1 − β)-level confidence set for the nuisance parameter (Vexler, 2020).
- Conditioning on side information, such as covariates under covariate shift, via weighted conformal p-values that correct for the Radon–Nikodym derivative w(x) = dP̃_X/dP_X between the test and calibration covariate distributions; see Equation 1 in (Jin et al., 2023).
- Single-observation or small-sample situations, using supremum or conservative bias terms to maintain uniform coverage.
- Marginal, calibration-conditional, or localized (kernel-weighted) validity, each appropriate for different inferential settings (Bates et al., 2021, Wu et al., 25 Sep 2024).
Conformal p-values have also been shown to retain their error control properties when properly calibrated under model misspecification and when adaptive or selective splitting strategies are used for imbalanced data (Xie et al., 14 Oct 2025).
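The rank-based construction described above can be sketched as follows in its simplest split form (a minimal illustration with invented names and toy Gaussian scores, not code from any cited paper):

```python
import numpy as np

def conformal_p_value(cal_scores, test_score):
    """Split-conformal p-value: the (tie-corrected) rank of the test
    nonconformity score among the calibration scores. Under exchangeability
    of the null scores this is super-uniform: P(p <= a) <= a for all a."""
    n = len(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

rng = np.random.default_rng(0)
cal = rng.normal(size=99)                 # calibration scores from the null
p_typical = conformal_p_value(cal, 0.0)   # conforming point -> moderate p
p_extreme = conformal_p_value(cal, 5.0)   # highly nonconforming point -> small p
```

The "+1" in numerator and denominator is what guarantees super-uniformity in finite samples rather than only asymptotically.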
2. Extended Expected p-Value (EPV) and ROC Analysis
The expected p-value (EPV) is reinterpreted as a global metric of a test's performance: EPV = E_{H1}[p] = ∫_0^1 P_{H1}(p > α) dα, measuring "one minus integrated power" across all α (Vexler, 2020). Partial EPV (pEPV) focuses on significant α-intervals. This viewpoint naturally connects the EPV to the area under the ROC curve (AUC), facilitating optimal threshold selection via maximization of Youden's index J = max_c {Sensitivity(c) + Specificity(c) − 1}, which identifies the point on the ROC maximizing the "distance from chance." In the normal case, the Youden index recovers classic cutoffs (e.g., "three sigma"/α = 0.05).
The EPV–ROC framework also refines procedures for unbiasedness assessment and performance comparisons across tests, especially when integrated with Bayes factor statistics and likelihood-ratio principles (Vexler, 2020).
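The EPV and Youden's index can both be estimated by Monte Carlo; the sketch below does so for a Gaussian location shift (the shift size, sample sizes, and variable names are our illustrative assumptions, and the theoretical optimum for N(0,1) vs. N(2,1) is the midpoint cutoff c = 1):

```python
import numpy as np

rng = np.random.default_rng(1)
null_stats = rng.normal(0.0, 1.0, 20000)   # test statistic under H0
alt_stats = rng.normal(2.0, 1.0, 20000)    # test statistic under H1

# p-value of each H1 draw = fraction of H0 draws at least as large;
# the EPV is its mean, i.e. one minus the power integrated over alpha.
null_sorted = np.sort(null_stats)
p_under_alt = 1.0 - np.searchsorted(null_sorted, alt_stats) / null_stats.size
epv = p_under_alt.mean()

# Youden's index J(c) = sensitivity(c) + specificity(c) - 1 over cutoffs c.
cutoffs = np.linspace(-2.0, 5.0, 400)
sens = np.array([(alt_stats > c).mean() for c in cutoffs])
spec = np.array([(null_stats <= c).mean() for c in cutoffs])
best_cut = cutoffs[np.argmax(sens + spec - 1.0)]
```

Here 1 − epv estimates the AUC, making the EPV–ROC connection concrete.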
3. Robust and Conditional P-value Calibration
Modern conformal p-value constructions increasingly emphasize conditional calibration to obtain either marginal validity (“on average”) or stronger conditional (calibration-conditional) guarantees. The latter is achieved through deterministic recalibration of marginal p-values using simultaneous empirical confidence bands constructed from concentration inequalities:
- Simes and DKWM adjustments provide explicit thresholds for p-value calibration (Bates et al., 2021).
- Monotone adjustment functions mapping marginal p-values p to conditional p-values h(p) ensure P(h(p) ≤ α | calibration set) ≤ α for all α ∈ (0, 1), simultaneously with probability at least 1 − δ over the calibration draw.
- Monte Carlo and asymptotic (Brownian bridge) approximations further optimize calibration, especially in finite samples.
This strategy allows the resulting p-values to be independent across test points after conditioning, enabling the direct application of classical FDR and FWER control procedures (e.g., Benjamini–Hochberg, Storey-BH, Hochberg) (Bates et al., 2021, Biscio et al., 30 Jan 2025).
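A minimal sketch of the recalibration idea, using a one-sided DKW margin as a stand-in for the Simes/DKWM adjustments of the cited work (the function name and this specific margin are our illustrative choices):

```python
import numpy as np

def dkw_calibrated_p(cal_scores, test_score, delta=0.05):
    """Inflate the marginal conformal p-value by a one-sided DKW margin
    eps = sqrt(log(1/delta) / (2n)): with probability at least 1 - delta
    over the calibration draw, the adjusted p-value is super-uniform
    conditionally on the calibration set."""
    n = len(cal_scores)
    p_marginal = (1 + np.sum(cal_scores >= test_score)) / (n + 1)
    eps = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return min(1.0, p_marginal + eps)

rng = np.random.default_rng(2)
cal = rng.normal(size=1000)
p_cond = dkw_calibrated_p(cal, 3.0)   # conditionally calibrated p-value
```

The additive margin shrinks at rate n^{-1/2}, so the price of the conditional guarantee vanishes as the calibration set grows.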
4. Data-Integrated, Weighted, and Localized Constructions
Recent work has advanced the use of all available data—null, alternative, and unlabeled/test data—to maximize efficiency in conformal p-value construction and calibration (Huo et al., 16 Aug 2025). The ECOT (Enhanced COnformal Testing) framework achieves this by constructing score functions using the entire data set, calibrating via a full permutation strategy that ensures super-uniformity under exchangeability of null scores. Score functions that are calibration- or joint-symmetric further simplify computation.
Weighted conformal p-values explicitly adjust for covariate shift, using likelihood-ratio weights w(·) in the empirical calculation of ranks and type-I error: p(X_{n+1}) = [ Σ_{i=1}^n w(X_i) 1{V_i ≥ V_{n+1}} + w(X_{n+1}) ] / [ Σ_{i=1}^n w(X_i) + w(X_{n+1}) ], where the V_i are nonconformity scores, with explicit finite-sample type-I control guaranteed by the weighting scheme (Jin et al., 2023).
Localized conformal p-values introduce kernel-based weights in constructing variable prediction intervals and invert those intervals to obtain p-values sensitive to local heterogeneity and covariate shift. These p-values adapt their validity bounds according to kernel bandwidth and regularity of the density ratio (Wu et al., 25 Sep 2024).
5. Aggregation and Combination of Exchangeable P-Values
Exchangeable p-values—arising naturally in repeated calibration splits, cross-conformal prediction, or ensemble tests—admit more statistically efficient combination strategies than traditional p-merging under arbitrary dependence. The modern approach calibrates individual p-values to e-values via a κ-dependent calibrator such as f_κ(p) = κ p^{κ−1} with κ ∈ (0, 1), averages the resulting e-values, and then inverts the average ē using refined Markov inequalities, e.g. p_comb = min(1, 1/ē).
Randomization (e.g., using an independent uniform variable U ~ Unif[0, 1]) further sharpens the combined p-values (Gasparin et al., 4 Apr 2024). Such efficient rules yield prediction sets of provably smaller size (higher power), and are optimal within the class of admissible merging rules when exchangeability or randomization is present (Gasparin et al., 3 Mar 2025).
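The calibrate-average-invert recipe can be sketched as follows (a minimal version using the calibrator f(p) = κ p^{κ−1} and plain Markov inversion; the exchangeable and randomized refinements of the cited papers are sharper):

```python
import numpy as np

def merge_exchangeable_p(pvals, kappa=0.5):
    """Combine exchangeable p-values: calibrate each to an e-value via
    f(p) = kappa * p**(kappa - 1), which has expectation <= 1 for any
    super-uniform p and kappa in (0, 1); average the e-values (the average
    of e-values is an e-value); invert by Markov's inequality."""
    e_values = kappa * np.asarray(pvals, dtype=float) ** (kappa - 1.0)
    return min(1.0, 1.0 / e_values.mean())

p_merged = merge_exchangeable_p([0.01, 0.02, 0.015, 0.03])
```

Because averaging e-values is valid under arbitrary dependence, the same rule applies to p-values from overlapping calibration splits without any independence assumption.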
In ordinal, multi-input, or open-set classification, aggregation of class- and observation-specific conformal p-values using abstract scoring functions (e.g., via Beta–Binomial distributions, quantile or area envelopes) further enables sharp control of coverage and set size, even under label imbalance or infinite label spaces (Chakraborty et al., 25 Apr 2024, Fermanian et al., 9 Jul 2025, Xie et al., 14 Oct 2025).
6. Applications: Outlier Detection, Selective Inference, and Changepoint Localization
Conformal p-values form the statistical foundation for robust, multiple-testing–aware, and model-agnostic outlier detection procedures. Calibration-conditional p-values enable direct use of multiple testing corrections (e.g., FDR, FWER) with provable finite-sample error control (Bates et al., 2021). Unified ECOT frameworks, integrative conformal methods with automatic model selection, and robust extensions to full permutation or weighted auxiliary calibration extend coverage to one-class, binary, and hybrid configurations (Liang et al., 2022, Huo et al., 16 Aug 2025).
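The basic outlier-detection pipeline—conformal p-values fed to Benjamini–Hochberg—can be sketched as below (data, seed, and function names are illustrative; with calibration-conditional p-values the same recipe applies after the adjustment):

```python
import numpy as np

def bh_rejections(pvals, q=0.1):
    """Benjamini-Hochberg step-up: indices rejected at nominal FDR level q."""
    p = np.asarray(pvals)
    passed = p[np.argsort(p)] <= q * np.arange(1, p.size + 1) / p.size
    if not passed.any():
        return np.array([], dtype=int)
    return np.argsort(p)[: np.max(np.nonzero(passed)[0]) + 1]

rng = np.random.default_rng(4)
cal = rng.normal(size=999)                                  # inlier scores
test = np.concatenate([rng.normal(size=95),                 # 95 inliers
                       rng.normal(6.0, 1.0, size=5)])       # 5 true outliers
pvals = np.array([(1 + np.sum(cal >= t)) / (len(cal) + 1) for t in test])
flagged = bh_rejections(pvals, q=0.1)
```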
Conditional and localized conformal p-values underpin selective inference under covariate shift, screening of multiple potential outcomes (such as in drug discovery or causal inference), and conditional label screening with exact FWER control (Jin et al., 2023, Wu et al., 25 Sep 2024). In particular, when covariate shift breaks classical exchangeability, adaptively weighted calibration restores error control.
Changepoint localization benefits from the distribution-free, self-calibrating nature of conformal p-values. By inverting conformity-based prediction intervals at each candidate changepoint, one constructs a matrix of p-values that can be aggregated to yield, with finite-sample guarantees: (i) confidence sets for the changepoint, (ii) consistent point estimators (with proven convergence rates), and (iii) distribution-free tests for exchangeability or segmentation (Bhattacharyya et al., 9 Oct 2025). Quantitative bounds on the CDF deviation of the conformal p-value under a distribution change provide theoretical guarantees for both detection power and error rates.
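A stripped-down sketch of the p-value-matrix idea (not the cited paper's exact construction: we use |x| as a conformity score against the pre-t segment, which presumes a mean-zero pre-change regime):

```python
import numpy as np

def changepoint_p_matrix(x, min_cal=20):
    """For each candidate changepoint t, conformal p-values of the
    observations after t, ranked against the first t observations treated
    as a calibration set. Uniformly small p-values past the true change
    localize it; aggregation across rows gives confidence sets."""
    n = len(x)
    P = np.full((n, n), np.nan)
    for t in range(min_cal, n):
        cal = np.abs(x[:t])
        for j in range(t, n):
            P[t, j] = (1 + np.sum(cal >= abs(x[j]))) / (t + 1)
    return P

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(size=50),            # pre-change segment
                    rng.normal(4.0, 1.0, size=50)]) # post-change segment
P = changepoint_p_matrix(x)
```

Row t = 50 (the true changepoint) pairs a clean calibration segment with fully shifted test points, so its p-values are markedly smaller than those computed on pre-change data.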
7. Extensions, Optimality, and Open Directions
The optimality of modern conformal p-value constructions is often established with respect to decision-theoretic criteria:
- Good–Turing–type conformal p-values are proven optimal (in the class of frequency-profile–based tests) for open-set/imbalanced classification, connecting prediction set coverage to unseen-label mass estimation and adapting to class imbalance via selective splitting and reweighting (Xie et al., 14 Oct 2025).
- Neyman–Pearson–inspired likelihood-ratio rules abandon the p-value paradigm entirely, instead using likelihood ratios to maximize selection power under an explicit FDR constraint, resulting in procedures that dominate classic conformal selection in non-location-shift scenarios and remain robust under model misspecification or covariate shift (Qin et al., 23 Feb 2025).
- E-value–based conformal predictors enable anytime-valid, batch-, or fixed-size prediction regions and allow principled post-hoc coverage adjustment, ambiguous label handling, and stable error aggregation via martingale theory (Bashari et al., 2023, Gauthier et al., 17 Mar 2025, Balinsky et al., 28 Mar 2024).
Open research directions include extending these principles to non-exchangeable settings, further characterizing the limits of efficiency in p-merging, and developing adaptive, on-line calibration or selection rules for streaming data contexts.
In summary, the new family of conformal p-values encompasses conditionally or marginally valid, adaptively weighted, or locally calibrated statistics foundational for distribution-free inference in modern statistical and machine learning practice. This body of methodology integrates and advances ROC analysis, optimal error-rate control, efficient p-value combination, and adaptation to data structure—providing robust, interpretable, and theoretically grounded tools for a wide spectrum of inferential tasks.