
Out-of-Distribution Detection Should Use Conformal Prediction (and Vice-versa?) (2403.11532v1)

Published 18 Mar 2024 in stat.ML, cs.CV, and cs.LG

Abstract: Research on Out-Of-Distribution (OOD) detection focuses mainly on building scores that efficiently distinguish OOD data from In-Distribution (ID) data. On the other hand, Conformal Prediction (CP) uses non-conformity scores to construct prediction sets with probabilistic coverage guarantees. In this work, we propose to use CP to better assess the efficiency of OOD scores. Specifically, we emphasize that in standard OOD benchmark settings, evaluation metrics can be overly optimistic due to the finite sample size of the test dataset. Based on the work of Bates et al. (2022), we define new conformal AUROC and conformal FPR@TPR95 metrics, which are corrections that provide probabilistic conservativeness guarantees on the variability of these metrics. We show the effect of these corrections on two reference OOD and anomaly detection benchmarks, OpenOOD (Yang et al., 2022) and ADBench (Han et al., 2022). We also show that the benefits of using OOD together with CP apply the other way around by using OOD scores as non-conformity scores, which results in improving upon current CP methods. One of the key messages of these contributions is that since OOD is concerned with designing scores and CP with interpreting these scores, the two fields may be inherently intertwined.

References (42)
  1. Uncertainty sets for image classifiers using conformal prediction. arXiv preprint arXiv:2009.14193, 2020.
  2. A gentle introduction to conformal prediction and distribution-free uncertainty quantification, 2022.
  3. Conformal prediction for reliable machine learning: theory, adaptations and applications. Newnes, 2014.
  4. Testing for Outliers with Conformal p-values, May 2022. URL http://arxiv.org/abs/2104.08279.
  5. Towards Open Set Deep Networks. CoRR, abs/1511.06233, 2015. URL http://arxiv.org/abs/1511.06233.
  6. Extremely Simple Activation Shaping for Out-of-distribution Detection. CoRR, abs/2209.09858, 2022. doi: 10.48550/ARXIV.2209.09858. URL https://doi.org/10.48550/arXiv.2209.09858.
  7. Selective classification for deep neural networks. Advances in neural information processing systems, 30, 2017.
  8. SelectiveNet: A deep neural network with an integrated reject option. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp.  2151–2159. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.press/v97/geifman19a.html.
  9. Prediction and outlier detection in classification problems. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):524–546, 2022.
  10. On Calibration of Modern Neural Networks. CoRR, abs/1706.04599, 2017. URL http://arxiv.org/abs/1706.04599.
  11. Adbench: Anomaly detection benchmark. Advances in Neural Information Processing Systems, 35:32142–32159, 2022.
  12. A statistical framework for efficient out of distribution detection in deep neural networks, March 2022. URL http://arxiv.org/abs/2102.12967.
  13. A Baseline for Detecting Misclassified and Out-of-distribution Examples in Neural Networks. CoRR, abs/1610.02136, 2016. URL http://arxiv.org/abs/1610.02136.
  14. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks, October 2018. URL http://arxiv.org/abs/1610.02136.
  15. Scaling Out-of-distribution Detection for Real-world Settings. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., and Sabato, S. (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pp.  8759–8773. PMLR, 2022. URL https://proceedings.mlr.press/v162/hendrycks22a.html.
  16. On the Importance of Gradients for Detecting Distributional Shifts in the Wild. CoRR, abs/2110.00218, 2021. URL https://arxiv.org/abs/2110.00218.
  17. iDECODe: In-Distribution Equivariance for Conformal Out-of-Distribution Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7):7104–7114, June 2022. ISSN 2374-3468, 2159-5399. doi: 10.1609/aaai.v36i7.20670. URL https://ojs.aaai.org/index.php/AAAI/article/view/20670.
  18. Laxhammar, R. Conformal anomaly detection. Skövde, Sweden: University of Skövde, 2, 2014.
  19. Sequential conformal anomaly detection in trajectories based on hausdorff distance. In 14th international conference on information fusion, pp.  1–8. IEEE, 2011.
  20. A Simple Unified Framework for Detecting Out-of-distribution Samples and Adversarial Attacks. CoRR, abs/1807.03888, 2018. URL http://arxiv.org/abs/1807.03888.
  21. Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance. Journal of Experimental Social Psychology, 2018.
  22. Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. URL https://openreview.net/forum?id=H1VGkIxRZ.
  23. Integrative conformal p-values for powerful out-of-distribution testing with labeled outliers, August 2022. URL http://arxiv.org/abs/2208.11111.
  24. Energy-based Out-of-distribution Detection. CoRR, abs/2010.03759, 2020. URL https://arxiv.org/abs/2010.03759.
  25. Inductive confidence machines for regression. In Machine Learning: ECML 2002: 13th European Conference on Machine Learning Helsinki, Finland, August 19–23, 2002 Proceedings 13, pp.  345–356. Springer, 2002.
  26. Understanding softmax confidence and uncertainty. arXiv preprint arXiv:2106.04972, 2021.
  27. Classification with valid and adaptive coverage. Advances in Neural Information Processing Systems, 33:3581–3591, 2020.
  28. Least ambiguous set-valued classifiers with bounded error levels. Journal of the American Statistical Association, 114(525):223–234, 2019.
  29. Detecting Out-of-distribution Examples with Gram Matrices. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp.  8491–8501. PMLR, 2020. URL http://proceedings.mlr.press/v119/sastry20a.html.
  30. Calibration of ρ values for testing precise null hypotheses. The American Statistician, 55:62–71, 2001. URL https://api.semanticscholar.org/CorpusID:396772.
  31. A tutorial on conformal prediction. Journal of Machine Learning Research, 9(3), 2008.
  32. RankFeat: Rank-1 Feature Removal for Out-of-distribution Detection. CoRR, abs/2209.08590, 2022. doi: 10.48550/ARXIV.2209.08590. URL https://doi.org/10.48550/arXiv.2209.08590.
  33. DICE: Leveraging Sparsification for Out-of-distribution Detection. In Avidan, S., Brostow, G. J., Cissé, M., Farinella, G. M., and Hassner, T. (eds.), Computer Vision - ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXIV, volume 13684 of Lecture Notes in Computer Science, pp.  691–708. Springer, 2022. doi: 10.1007/978-3-031-20053-3_40. URL https://doi.org/10.1007/978-3-031-20053-3_40.
  34. ReAct: Out-of-distribution Detection With Rectified Activations. CoRR, abs/2111.12797, 2021. URL https://arxiv.org/abs/2111.12797.
  35. Out-of-distribution Detection with Deep Nearest Neighbors. CoRR, abs/2204.06507, 2022. doi: 10.48550/ARXIV.2204.06507. URL https://doi.org/10.48550/arXiv.2204.06507.
  36. Vovk, V. Conditional Validity of Inductive Conformal Predictors. In Proceedings of the Asian Conference on Machine Learning, pp.  475–490. PMLR, November 2012. URL https://proceedings.mlr.press/v25/vovk12.html.
  37. Testing exchangeability on-line. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp.  768–775, 2003.
  38. Algorithmic learning in a random world, volume 29. Springer, 2005.
  39. ViM: Out-Of-distribution with Virtual-logit Matching. CoRR, abs/2203.10807, 2022. doi: 10.48550/ARXIV.2203.10807. URL https://doi.org/10.48550/arXiv.2203.10807.
  40. Generalized Out-of-distribution Detection: A Survey. CoRR, abs/2110.11334, 2021. URL https://arxiv.org/abs/2110.11334.
  41. Openood: Benchmarking generalized out-of-distribution detection. Advances in Neural Information Processing Systems, 35:32598–32611, 2022.
  42. Out-of-distribution Detection based on In-distribution Data Patterns Memorization with Modern Hopfield Energy. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/pdf?id=KkazG4lgKL.

Summary

  • The paper introduces a bidirectional framework where conformal prediction enhances OOD detection by providing statistically principled threshold selection.
  • The methodology conditions p-values on class distributions, improving the discriminative power and reliability of detection scores.
  • Experiments, notably on the SVHN dataset, compare CP-calibrated thresholds with conventional ones; the CP-corrected methods are expected to perform better on AUROC and TPR metrics.

Conformal Prediction for Out-of-Distribution Detection

The paper investigates how conformal prediction (CP) and out-of-distribution (OOD) detection can support each other. The authors propose a framework that brings CP techniques into OOD detection, with the aim of making both more reliable and more efficient.

Overview

The core proposal of the paper is to establish a bidirectional improvement mechanism where CP can address key challenges in OOD detection and vice versa. The research identifies several critical aspects of this integration:

  1. Threshold Selection in OOD Detection: Traditional OOD methods rely on predefined thresholds that lack robustness across various datasets and scenarios. The authors argue that conformal prediction can offer a statistically principled approach to select these thresholds by calibrating them on auxiliary datasets, thereby ensuring more reliable inferences across varying conditions.
  2. Class-Conditional P-Values: By conditioning p-values on class distributions, the paper suggests a refined methodology that enhances the discriminative power of OOD detection scores. This approach allows for more nuanced decision-making processes that are sensitive to the inherent class structures within the dataset.
  3. Utilizing OOD Scores for CP: The paper also explores the reverse integration, where sophisticated OOD scores can be leveraged as non-conformity measures within the CP framework. This integration is proposed to improve the predictive intervals and sets in CP, ensuring better statistical coverage and efficiency.
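
The first two points can be illustrated with conformal p-values. The following is a minimal sketch, not the authors' implementation: it assumes a score where higher values mean "more OOD-like", a held-out calibration set of in-distribution scores, and hypothetical function names (`conformal_p_values`, `class_conditional_p_values`, `flag_ood`) chosen for illustration only.

```python
import numpy as np

def conformal_p_values(cal_scores, test_scores):
    """Marginal conformal p-values (assumes higher score = more OOD-like)."""
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # Count the calibration scores that are >= each test score.
    n_geq = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_geq) / (n + 1.0)

def class_conditional_p_values(cal_scores, cal_labels, test_scores, test_labels):
    """p-values computed only against calibration points of the same class.

    test_labels would typically be predicted labels (an assumption here).
    A class with no calibration points yields p = 1, so it is never flagged.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_scores = np.asarray(test_scores, dtype=float)
    test_labels = np.asarray(test_labels)
    p = np.empty(test_scores.shape, dtype=float)
    for c in np.unique(test_labels):
        mask = test_labels == c
        p[mask] = conformal_p_values(cal_scores[cal_labels == c], test_scores[mask])
    return p

def flag_ood(p_values, alpha=0.05):
    """Flag a point as OOD when its p-value is at most alpha."""
    return np.asarray(p_values) <= alpha
```

Flagging `p <= alpha` amounts to thresholding the raw score at a calibration quantile, which is how the threshold comes from data rather than being fixed in advance. With `n` calibration points the smallest attainable p-value is `1 / (n + 1)`, so `alpha = 0.05` needs at least 19 calibration points (per class, in the class-conditional variant) before anything can be flagged.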

Experimental Framework

The authors propose a set of experiments, notably on the SVHN dataset, to validate their hypotheses. This involves training neural networks and using large calibration datasets to derive empirical thresholds and compare them against those obtained through conventional OOD procedures. The paper anticipates that CP-corrected methods will demonstrate superior performance in terms of AUROC and TPR metrics.
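
For reference, the two metrics mentioned here can be computed from raw scores as in the sketch below. This shows only the plain empirical estimates, not the paper's conformal corrections, whose exact form this summary does not give. It assumes that OOD is the positive class and that higher scores mean "more OOD-like"; the function names are hypothetical.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Empirical AUROC: P(OOD score > ID score) + 0.5 * P(tie).

    Uses an explicit pairwise comparison, so memory grows with
    len(id_scores) * len(ood_scores); fine for small illustrations.
    """
    id_s = np.asarray(id_scores, dtype=float)[None, :]
    ood_s = np.asarray(ood_scores, dtype=float)[:, None]
    return float(np.mean((ood_s > id_s) + 0.5 * (ood_s == id_s)))

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """False positive rate at the largest threshold reaching the target TPR."""
    id_s = np.asarray(id_scores, dtype=float)
    ood_s = np.sort(np.asarray(ood_scores, dtype=float))
    m = ood_s.size
    # At least k OOD samples must score >= threshold to reach the target TPR.
    k = max(int(np.ceil(tpr * m)), 1)
    threshold = ood_s[m - k]
    # Fraction of ID samples wrongly flagged at that threshold.
    return float(np.mean(id_s >= threshold))
```

Both numbers are estimates from a finite test set, so they vary from one test sample to another; this variability is what the conformal corrections described in the abstract are meant to account for.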

Implications and Challenges

The integration of CP and OOD detection methodologies holds significant potential for advancing the reliability of model predictions in machine learning. However, the paper also notes the challenges in achieving perfect conditional coverage and independence of p-values, highlighting the complexity of real-world data distributions.
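
To make the coverage point concrete, here is a minimal sketch of standard split conformal prediction, given as background rather than as the paper's specific method. It assumes a nonconformity score is available for every candidate label (this could be an OOD-style score); `conformal_quantile` and `prediction_sets` are hypothetical names.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    cal_scores holds the nonconformity of the TRUE label for each
    calibration point. Returns +inf when there are too few points for
    the requested level, which makes every label acceptable.
    """
    s = np.sort(np.asarray(cal_scores, dtype=float))
    n = s.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    return np.inf if k > n else s[k - 1]

def prediction_sets(test_scores, q_hat):
    """Boolean mask: entry [i, y] is True when label y is in the set for input i.

    test_scores[i, y] is the nonconformity of candidate label y for input i.
    """
    return np.asarray(test_scores, dtype=float) <= q_hat
```

Under exchangeability of calibration and test points, these sets contain the true label with probability at least `1 - alpha` on average over inputs. That guarantee is marginal: it does not hold separately for each class or each input. Calibrating one quantile per class gives class-conditional coverage, at the cost of needing enough calibration points in every class.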

Future Directions

The research opens up several avenues for future exploration:

  • Theoretical Assessment: A deeper theoretical evaluation of the assumptions underpinning CP in the context of OOD detection could offer further insights into the strengths and limitations of the proposed methods.
  • Class Conditioning Dynamics: Investigating how class-conditioned approaches scale with increasing dataset complexity and class imbalance would be valuable in understanding their practical utility.
  • Broader Applicability: Extending the framework to other domains where OOD detection is critical, such as adversarial robustness and anomaly detection, could significantly enhance the robustness and applicability of CP methodologies.

In conclusion, the paper argues for combining conformal prediction with OOD detection: CP supplies statistically grounded thresholds and guarantees for OOD scores, and OOD scores in turn serve as non-conformity measures for CP. The proposed methods and experiments point to a way of making predictions more reliable across application domains.