Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them) (2405.06627v3)
Abstract: As AI/ML gain widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when such systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction is a promising approach to uncertainty and risk quantification, but prior variants' validity guarantees have assumed some form of "quasi-exchangeability" on the data distribution, thereby excluding many types of sequential shifts. In this paper we prove that conformal prediction can theoretically be extended to any joint data distribution, not just exchangeable or quasi-exchangeable ones. Although the most general case is exceedingly impractical to compute, for concrete practical applications we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of AI/ML-agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
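For reference, the kind of prior covariate-shift variant the abstract contrasts against can be sketched as weighted split conformal prediction in the style of "Conformal prediction under covariate shift" (Tibshirani et al., NeurIPS 2019). This is a minimal sketch, not the general procedure proposed in this paper. It assumes a known likelihood ratio w(x) = p_test(x) / p_train(x) and a test input drawn independently of the calibration data; the function names (`weighted_quantile`, `weighted_split_conformal_interval`), the weight functions `no_shift` and `shift`, and the toy data are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def weighted_quantile(scores: Sequence[float], weights: Sequence[float],
                      test_weight: float, level: float) -> float:
    """Level-`level` quantile of sum_i p_i * delta(scores[i]) + p_test * delta(+inf),
    where p_i = weights[i] / total and p_test = test_weight / total."""
    total = sum(weights) + test_weight
    # Small tolerance so floating-point rounding in the running sum does not
    # push an exact boundary case over to +inf.
    threshold = level * total - 1e-12 * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= threshold:
            return score
    # The remaining mass belongs to the test point's own unknown score, at +inf.
    return math.inf


def weighted_split_conformal_interval(
        predict: Callable[[float], float],
        calib_x: Sequence[float], calib_y: Sequence[float],
        likelihood_ratio: Callable[[float], float],
        x_test: float, alpha: float) -> Tuple[float, float]:
    """Interval predict(x_test) +/- q, with q the weighted (1 - alpha) quantile
    of absolute residuals on held-out calibration data."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    weights = [likelihood_ratio(x) for x in calib_x]
    q = weighted_quantile(scores, weights, likelihood_ratio(x_test), 1 - alpha)
    center = predict(x_test)
    return center - q, center + q


# Hypothetical toy setup: predictor 2x, calibration residuals of magnitude 1..9.
def predict(x: float) -> float:
    return 2.0 * x


calib_x = list(range(9))
calib_y = [2 * x + r for x, r in zip(calib_x, [1, -2, 3, -4, 5, -6, 7, -8, 9])]


def no_shift(x: float) -> float:
    # Unit likelihood ratio: training and test inputs share one distribution.
    return 1.0


def shift(x: float) -> float:
    # Hypothetical shift: test inputs are twice as likely in the region x >= 5.
    return 2.0 if x >= 5 else 1.0


print(weighted_split_conformal_interval(predict, calib_x, calib_y, no_shift, 10.0, 0.2))
print(weighted_split_conformal_interval(predict, calib_x, calib_y, shift, 10.0, 0.2))
```

With unit weights every p_i equals 1/(n+1), so the quantile reduces to the ⌈(1−α)(n+1)⌉-th smallest score of ordinary split conformal prediction (here ±8 at α = 0.2). Upweighting the calibration points in the region the shifted test distribution favors widens the interval to ±9. Because the weights are fixed in advance, this baseline does not by itself cover the sequential, agent-induced feedback shifts described above.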
- Drew Prinster
- Samuel Stanton
- Anqi Liu
- Suchi Saria