Entropic Issues in Likelihood-Based OOD Detection (2109.10794v2)

Published 22 Sep 2021 in stat.ML and cs.LG

Abstract: Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.
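The decomposition referenced above can be written out as a standard identity (the notation here is ours, not necessarily the paper's): with q the data-generating distribution, p_theta the model, and H(q) the differential entropy of q,

    \mathbb{E}_{x \sim q}\left[\log p_\theta(x)\right]
        = -\mathrm{KL}\left(q \,\|\, p_\theta\right) - H(q),
    \qquad
    H(q) = -\mathbb{E}_{x \sim q}\left[\log q(x)\right].

A distribution q with large entropy H(q) thus drags down the average likelihood regardless of how well p_theta matches q. For a ratio of two models p_theta and p_phi, the entropy term cancels in expectation:

    \mathbb{E}_{x \sim q}\left[\log p_\theta(x) - \log p_\phi(x)\right]
        = \mathrm{KL}\left(q \,\|\, p_\phi\right) - \mathrm{KL}\left(q \,\|\, p_\theta\right).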

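As a concrete illustration of the likelihood-ratio idea, below is a minimal, hypothetical sketch (not the authors' code): scikit-learn's GaussianMixture stands in for a deep generative model, and the background model p_phi is fit on noise-perturbed data, one common construction in the likelihood-ratio OOD literature. All names (p_theta, p_phi, llr_score) are ours.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Toy in-distribution data: a tight 2-D blob.
    x_in = rng.normal(loc=0.0, scale=1.0, size=(2000, 2))

    # "Semantic" model p_theta: fit on the raw in-distribution data.
    p_theta = GaussianMixture(n_components=5, random_state=0).fit(x_in)

    # "Background" model p_phi: fit on a noise-perturbed copy, so it captures
    # coarse input statistics but not the in-distribution structure.
    x_bg = x_in + rng.normal(scale=2.0, size=x_in.shape)
    p_phi = GaussianMixture(n_components=5, random_state=0).fit(x_bg)

    def llr_score(x):
        # Likelihood-ratio score log p_theta(x) - log p_phi(x); per the
        # decomposition above, the dataset entropy term common to both
        # expectations cancels. Higher scores = more in-distribution-like.
        return p_theta.score_samples(x) - p_phi.score_samples(x)

    # A higher-entropy OOD set: same mean, much larger spread.
    x_ood = rng.normal(loc=0.0, scale=4.0, size=(500, 2))
    print("mean log-lik   (in / OOD):",
          p_theta.score_samples(x_in).mean(),
          p_theta.score_samples(x_ood).mean())
    print("mean LLR score (in / OOD):",
          llr_score(x_in).mean(), llr_score(x_ood).mean())

On this toy example the raw likelihoods already separate the two sets, unlike the image benchmarks discussed in the paper; the sketch is only meant to show the mechanics of the ratio score.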
Authors (2)
  1. Anthony L. Caterini
  2. Gabriel Loaiza-Ganem
Citations (14)
