The paper studies the construction of prediction intervals under the online selective conformal prediction framework. This setting extends conformal prediction to data that arrive sequentially, with a selection rule deciding at each step whether a prediction interval should be reported for the current test instance. The central challenge is that such selective reporting can break the exchangeability of the data, the property on which conformal validity guarantees rest.
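To make the setting concrete, the following is a minimal sketch of an online selective loop, assuming split-conformal residual scores and a generic, user-supplied selection rule; the function names, the threshold-free `selection_rule` interface, and the timing of label revelation are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def split_conformal_interval(mu_hat, calib_scores, alpha):
    """Interval mu_hat +/- q, with q the finite-sample-corrected
    (1 - alpha) empirical quantile of the calibration residuals."""
    n = len(calib_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(calib_scores, level, method="higher")
    return mu_hat - q, mu_hat + q

def online_selective_cp(stream, mu, selection_rule, alpha=0.1):
    """Toy online loop: at each step, report an interval only if the
    selection rule fires; the label is then revealed and its residual
    joins the calibration pool. (Sketch only, not the paper's method.)"""
    calib_scores = []
    for x_t, y_t in stream:
        pred = mu(x_t)
        if selection_rule(x_t, pred) and calib_scores:
            lo, hi = split_conformal_interval(pred, np.array(calib_scores), alpha)
            yield (x_t, lo, hi)  # interval reported only for selected instances
        calib_scores.append(abs(y_t - pred))  # label revealed after prediction
```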
Conformal prediction traditionally relies on exchangeability to provide distribution-free prediction intervals. When test points are reported selectively, however, the selection rule typically depends on the data, which breaks the exchangeability between the test point and the calibration set and undermines the validity of the resulting intervals. This work scrutinizes the calibration-selection strategies that have been proposed to address the issue and introduces new methods that correct errors in previously claimed guarantees.
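For reference, the guarantee that selection puts at risk can be stated in generic split-conformal notation (with S denoting the selection event; this notation is ours, not necessarily the paper's):

```latex
% Standard marginal coverage guarantee under exchangeability:
\mathbb{P}\bigl( Y_{n+1} \in \widehat{C}_n(X_{n+1}) \bigr) \;\ge\; 1 - \alpha .
% Conditioning on a data-dependent selection event need not preserve it:
\mathbb{P}\bigl( Y_{n+1} \in \widehat{C}_n(X_{n+1}) \,\big|\, S_{n+1} = 1 \bigr) \;\not\ge\; 1 - \alpha \quad \text{in general.}
```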
Theoretical Contributions
The authors focus on two criteria for evaluating prediction intervals under selection: selection-conditional coverage and false coverage rate (FCR) control. Selection-conditional coverage requires that an interval attain its nominal coverage conditional on the instance having been selected, while the FCR is the expected proportion of reported intervals that fail to cover the true label.
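In generic notation (ours, not necessarily the paper's), with S_t = 1 indicating that instance t is selected and an interval is reported, the two criteria read:

```latex
% Selection-conditional coverage:
\mathbb{P}\bigl( Y_t \in \widehat{C}(X_t) \,\big|\, S_t = 1 \bigr) \;\ge\; 1 - \alpha .
% False coverage rate over a horizon of T steps:
\mathrm{FCR} \;=\; \mathbb{E}\!\left[
  \frac{\sum_{t \le T} S_t \,\mathbf{1}\{ Y_t \notin \widehat{C}(X_t) \}}
       {\max\bigl( \sum_{t \le T} S_t ,\, 1 \bigr)}
\right] \;\le\; \alpha .
```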
The authors criticize existing methods, notably the Adaptive Calibration Selection (ADA) strategy, showing that it fails to maintain selection-conditional coverage. They propose several new strategies, including EXPRESS and k-EXPRESS, which restore exchangeability by constructing calibration sets that explicitly mirror the selection process applied to the test instance. These strategies deliver valid coverage through exchangeability-preserving designs, albeit sometimes at the cost of a smaller usable calibration set.
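A minimal sketch of the underlying idea is given below: calibrate only on past points that would themselves have been selected by the same rule, so that the selected test point and its calibration points are treated symmetrically. This is an illustration of the exchangeability-preserving principle under an assumed generic selection rule, not the paper's exact EXPRESS or k-EXPRESS constructions.

```python
import numpy as np

def selection_matched_calibration(past_x, past_scores, x_test, mu,
                                  selection_rule, alpha=0.1):
    """Keep only calibration points that the selection rule would also
    have selected, then form a split-conformal interval from them.
    (Sketch of the exchangeability-matching idea only.)"""
    keep = [s for x, s in zip(past_x, past_scores) if selection_rule(x, mu(x))]
    if not keep:
        return None  # too few matched calibration points to report an interval
    n = len(keep)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.array(keep), level, method="higher")
    pred = mu(x_test)
    return pred - q, pred + q
```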
Empirical Evaluation
Extensive empirical evidence supports the theoretical claims, showing that existing approaches such as the full calibration strategy often fail to deliver valid prediction intervals in selective settings. The proposed methods perform better at maintaining selection-conditional coverage and controlling the FCR across a range of simulations. EXPRESS, for instance, proves particularly effective, although its strict criteria for admitting points into the calibration set create a trade-off between statistical validity and the amount of usable calibration data. The authors also show that while some traditional techniques can appear effective under specific conditions, they lack the robust guarantees of the new methods.
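For readers reproducing such experiments, the empirical analogues of the two criteria can be computed from per-step flags as follows; the input format (boolean arrays of selection and coverage indicators) is an assumption for illustration, not the paper's evaluation code.

```python
import numpy as np

def empirical_metrics(selected, covered):
    """Empirical selection-conditional coverage and false coverage
    proportion from one simulated run. `selected[t]` is True if an
    interval was reported at step t; `covered[t]` is True if that
    interval contained the true label (ignored when not selected)."""
    selected = np.asarray(selected, dtype=bool)
    covered = np.asarray(covered, dtype=bool)
    n_sel = selected.sum()
    misses = (selected & ~covered).sum()
    sel_cond_coverage = 1.0 - misses / n_sel if n_sel > 0 else np.nan
    fcp = misses / max(n_sel, 1)  # FCR is the expectation of this proportion
    return sel_cond_coverage, fcp
```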
Implications and Future Directions
The paper has substantial implications for applying statistical methods in real-world online settings where selectivity is inherent, such as personalized medicine or dynamic resource allocation. The proposed methods can help prediction systems maintain statistical rigor while remaining practically useful in environments characterized by selective observation and reporting.
Looking ahead, the research invites additional exploration into hybrid or adaptive strategies that might better balance exchangeability and calibration set size. Extending the proposed methods to broader classes of selection rules beyond decision-driven ones presents a promising avenue for future work. Furthermore, empirical application in diverse domains could reveal additional insights into practical adaptation and performance.
This investigation enriches the conformal prediction framework by addressing its limitations in selective contexts, offering theoretically sound and empirically validated methods to maintain prediction validity. The groundwork laid invites further exploration and refinement as applications of machine learning continue to expand into more complex, real-time environments.