Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage (2403.03868v2)
Abstract: Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn new test point with a prescribed probability. In practice, however, practitioners often decide which test unit(s) to focus on in a data-driven manner after seeing the data, and then seek uncertainty quantification for the focal unit(s). In such cases, marginally valid conformal prediction intervals may not provide valid coverage for the focal unit(s) due to selection bias. This paper presents a general framework for constructing a prediction set with finite-sample exact coverage conditional on the unit being selected by a given procedure. The general form of our method works for arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We then work out computationally efficient implementations of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
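The selection-bias problem described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or one of its experiments. It builds ordinary split-conformal intervals and then checks their coverage on test units picked by a top-K rule. The heteroscedastic data model, the predictor mu(x) = x, the rule of ranking by |x|, and the names `conformal_quantile` and `simulate` are all assumptions made for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this level
    return sorted(scores)[k - 1]


def simulate(n_calib=2000, n_test=20000, top_frac=0.1, alpha=0.1, seed=0):
    rng = random.Random(seed)

    def draw():
        # Toy heteroscedastic model (an assumption): noise grows with |x|.
        x = rng.uniform(-1.0, 1.0)
        y = x + rng.gauss(0.0, 0.2 + 2.0 * abs(x))
        return x, y

    # Nonconformity score |y - mu(x)| with the hypothetical predictor mu(x) = x.
    calib_scores = [abs(y - x) for x, y in (draw() for _ in range(n_calib))]
    q = conformal_quantile(calib_scores, alpha)

    # Marginal interval for every test unit: [x - q, x + q].
    test = [draw() for _ in range(n_test)]
    covered = [abs(y - x) <= q for x, y in test]

    # Data-driven focus: keep the top-K test units ranked by |x|.
    k = int(top_frac * n_test)
    focal = sorted(range(n_test), key=lambda j: abs(test[j][0]), reverse=True)[:k]

    marginal = sum(covered) / n_test
    selected = sum(covered[j] for j in focal) / k
    return marginal, selected


marginal_cov, selected_cov = simulate()
print(f"marginal coverage: {marginal_cov:.3f}")
print(f"coverage on selected units: {selected_cov:.3f}")
```

In this toy setup the intervals should cover close to 90% of all test units, but noticeably fewer of the selected ones, because the top-K rule concentrates on the region where the noise is largest. A selection-conditional procedure, as proposed in the paper, aims to restore the target coverage on exactly those focal units.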
Authors: Ying Jin, Zhimei Ren