
A Geometric Explanation of the Likelihood OOD Detection Paradox

Published 27 Mar 2024 in cs.LG, cs.AI, cs.CV, and stat.ML (arXiv:2403.18910v2)

Abstract: Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.


Summary

  • The paper demonstrates that DGMs assign unexpectedly high likelihoods to simpler OOD data due to low probability mass on low-dimensional manifolds.
  • It introduces local intrinsic dimension (LID) estimation as a method to quantify density and guide a dual threshold for effective OOD detection.
  • Experimental results show a significant AUC-ROC improvement from 0.070 to 0.953 in the challenging FMNIST vs. MNIST scenario, demonstrating the method's effectiveness.

A Geometric Explanation of the Likelihood OOD Detection Paradox

Introduction

The paper explores the perplexing behavior exhibited by likelihood-based deep generative models (DGMs) in the context of out-of-distribution (OOD) detection. Specifically, these models, when trained on complex datasets, tend to assign higher likelihoods to OOD data from simpler datasets. This paradox arises despite the fact that DGMs do not generate samples from these high-likelihood regions. The paper proposes a geometric explanation, suggesting that high likelihoods can coincide with low probability mass due to the data's confinement to low-dimensional manifolds. The authors introduce local intrinsic dimension (LID) estimation as a means to detect this scenario and propose a method for OOD detection, leveraging the combination of likelihoods and LID estimates.

Methodology

Likelihood Behavior in DGMs

The authors begin by highlighting the unusual trend whereby trained DGMs assign higher likelihoods to simpler OOD datasets than to the more complex in-distribution datasets they were trained on. This observation is coupled with the fact that DGMs generate samples that visually resemble the training data, never producing the seemingly high-likelihood OOD samples. The paper posits that this behavior can occur when OOD data resides in regions of low probability mass (Figure 1).

Figure 1: FMNIST-trained DM vs. MNIST.
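The density-versus-mass distinction can be made concrete with a toy example. The one-dimensional mixture below is constructed for this summary, not taken from the paper: a very narrow "spike" component at x = 5 has roughly ten times the density of the broad mode at x = 0, yet only about 1% of samples ever land near it.

```python
import numpy as np
from scipy.stats import norm

# Mixture: 99% broad component, 1% extremely narrow "spike".
w_broad, w_spike = 0.99, 0.01
broad = norm(loc=0.0, scale=1.0)
spike = norm(loc=5.0, scale=1e-3)

def density(x):
    return w_broad * broad.pdf(x) + w_spike * spike.pdf(x)

# Density at the spike's centre dwarfs density at the broad mode...
print(density(5.0))  # ≈ 3.99
print(density(0.0))  # ≈ 0.395

# ...yet the probability mass near the spike is tiny: samples from
# the mixture almost never fall there.
rng = np.random.default_rng(0)
n = 100_000
comp = rng.random(n) < w_spike
samples = np.where(
    comp,
    spike.rvs(size=n, random_state=rng),
    broad.rvs(size=n, random_state=rng),
)
frac_near_spike = np.mean(np.abs(samples - 5.0) < 0.1)
print(frac_near_spike)  # ≈ 0.01
```

This mirrors the paradox: the highest-density point is assigned a large likelihood, but a sampler visits its neighbourhood only rarely because that neighbourhood carries almost no mass.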

Relationship Between LID and Probability Mass

The authors explore the relationship between local intrinsic dimension and contiguous volume, establishing that LID serves as a proxy for how densely a model distributes probability mass around a given point. The volume assigned around a point in low-dimensional space is smaller than in higher-dimensional regions, allowing high densities without substantial probability mass. The paper illustrates this concept using Gaussian convolutions and other mathematical formulations, showing empirically that the intrinsic dimension can be effectively captured by the rank of specific matrices.
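The rank-based intuition can be illustrated with a simple, model-free sketch. The paper's LID estimates come from the pre-trained DGM itself; the local-PCA estimator below is only a standalone stand-in, included to show how intrinsic dimension falls out of the spectrum of a local neighbourhood matrix (the function name and thresholds are choices made for this illustration).

```python
import numpy as np

def lid_local_pca(x, data, k=50, var_thresh=0.99):
    """Crude LID proxy: number of principal components needed to
    explain var_thresh of the variance among the k nearest neighbours."""
    dists = np.linalg.norm(data - x, axis=1)
    nbrs = data[np.argsort(dists)[:k]]
    nbrs = nbrs - nbrs.mean(axis=0)
    s = np.linalg.svd(nbrs, compute_uv=False)
    var_ratio = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(var_ratio), var_thresh) + 1)

rng = np.random.default_rng(0)

# Points on a 1-D curve embedded in 3-D: locally almost rank-1.
t = rng.uniform(0, 1, 2000)
curve = np.stack([t, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
lid_curve = lid_local_pca(curve[0], curve)

# Isotropic 3-D Gaussian points: locally full rank.
blob = rng.normal(size=(2000, 3))
lid_blob = lid_local_pca(blob[0], blob)

print(lid_curve, lid_blob)  # low (1-2) vs. 3
```

The same principle drives the paper's estimator: around a point confined to a low-dimensional manifold, the model concentrates variance in few directions, so high density coexists with a small contiguous volume and hence little probability mass.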

Dual Threshold OOD Detection

To address the paradox, the paper proposes a dual-threshold method for OOD detection. A data point is classified as OOD if its likelihood is low, or if its likelihood is high but its LID estimate is small. Conversely, a sample is classified as in-distribution only if both its likelihood and its LID are high.
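As a sketch, the rule reduces to a two-sided test. The scores and thresholds below are hypothetical placeholders chosen for illustration, not values from the paper:

```python
import numpy as np

def classify_ood(log_likelihood, lid, ll_threshold, lid_threshold):
    """Dual-threshold rule: a point is in-distribution only when BOTH
    its log-likelihood and its LID estimate clear their thresholds."""
    in_dist = (log_likelihood >= ll_threshold) & (lid >= lid_threshold)
    return ~in_dist  # True -> flagged as OOD

# Hypothetical per-sample scores. Note sample 1: high likelihood but
# low LID -- exactly the pathological case the paper targets.
ll  = np.array([-900.0, -700.0, -850.0])  # log-likelihoods
lid = np.array([  12.0,    3.0,   14.0])  # LID estimates
flags = classify_ood(ll, lid, ll_threshold=-880.0, lid_threshold=8.0)
print(flags)  # [ True  True False]
```

Sample 0 is rejected for low likelihood, sample 1 for low LID despite its high likelihood, and sample 2 passes both tests and is kept as in-distribution.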

Experiments

Validation of LID-Based Detection

Experiments are conducted on various dataset pairs to evaluate the effectiveness of the proposed method. The results demonstrate significant improvements when LID estimates are used in conjunction with likelihoods, compared to using likelihoods alone. In key pathological scenarios such as FMNIST vs. MNIST, the dual-threshold method yields large AUC-ROC improvements (Figure 2).

Figure 2: FMNIST vs. MNIST: AUC-ROC boost (0.070 → 0.953).
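An AUC-ROC of 0.070 means the likelihood score is not merely uninformative but anti-correlated with the in-distribution label. A small NumPy-only sketch with synthetic scores (not the paper's data) shows how a below-chance AUC arises when OOD samples systematically receive higher likelihoods:

```python
import numpy as np

def auc_roc(scores_in, scores_ood):
    """AUC-ROC via its rank interpretation: the probability that a
    random in-distribution sample scores above a random OOD sample."""
    return float(np.mean(scores_in[:, None] > scores_ood[None, :]))

rng = np.random.default_rng(0)
ll_in  = rng.normal(-900, 30, 1000)  # in-distribution log-likelihoods
ll_ood = rng.normal(-800, 30, 1000)  # OOD gets HIGHER likelihoods

auc = auc_roc(ll_in, ll_ood)
print(auc)  # far below 0.5: likelihood alone fails badly
```

Correcting such a pathological ranking, rather than merely improving a mediocre one, is what drives the 0.070 → 0.953 jump reported for FMNIST vs. MNIST.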

Conclusion

The paper presents a compelling geometric explanation of the likelihood-based OOD detection paradox and provides a practical method to address it by pairing LID estimates with likelihood evaluations. This dual-threshold approach consistently enhances OOD detection performance across diverse datasets and proves more resilient than traditional single-threshold techniques. The study opens avenues for further exploration of the geometry of data manifolds and its impact on density estimation, providing a robust framework for understanding and mitigating likelihood pathologies in DGMs.
