Approximations to the Fisher Information Metric of Deep Generative Models for Out-Of-Distribution Detection (2403.01485v2)
Abstract: Likelihood-based deep generative models such as score-based diffusion models and variational autoencoders are state-of-the-art machine learning models for approximating high-dimensional distributions of data such as images, text, or audio. One of many downstream tasks they can naturally be applied to is out-of-distribution (OOD) detection. However, seminal work by Nalisnick et al., which we reproduce, showed that deep generative models consistently infer higher log-likelihoods for OOD data than for the data they were trained on, marking an open problem. In this work, we analyse using the gradient of a data point's log-likelihood with respect to the parameters of the deep generative model for OOD detection, based on the simple intuition that OOD data should have larger gradient norms than training data. We formalise measuring the size of the gradient as approximating the Fisher information metric. We show that the Fisher information matrix (FIM) has large absolute diagonal values, motivating the use of chi-square distributed, layer-wise gradient norms as features. We combine these features into a simple, model-agnostic and hyperparameter-free method for OOD detection which estimates the joint density of the layer-wise gradient norms for a given data point. We find that these layer-wise gradient norms are weakly correlated, rendering their combined usage informative, and prove that the layer-wise gradient norms satisfy the principle of (data representation) invariance. Our empirical results indicate that this method outperforms the Typicality test for most deep generative models and image dataset pairings.
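The pipeline described in the abstract can be sketched in a few lines. The PyTorch snippet below is a minimal illustration, not the authors' implementation: it backpropagates a model's log-likelihood for a single input, collects one squared gradient norm per parameter tensor ("layer"), and scores test points by a simple density fitted to these features on in-distribution data. The `log_likelihood` callable and the axis-aligned Gaussian over log-transformed norms are assumptions made here for concreteness, standing in for whichever exact likelihood proxy and joint density estimator a given model and the paper use.

```python
import torch

def layerwise_grad_norms(model, x, log_likelihood):
    """One squared gradient norm per parameter tensor ("layer") for input x."""
    model.zero_grad(set_to_none=True)
    ll = log_likelihood(model, x)  # scalar log p(x), ELBO, or another likelihood proxy (assumed interface)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(ll, params)
    return torch.stack([g.detach().pow(2).sum() for g in grads])

def fit_feature_density(train_norms):
    """Fit an axis-aligned Gaussian to log-transformed layer-wise norms.

    A stand-in for the joint density estimate over the (approximately
    chi-square distributed) layer-wise gradient norms.
    """
    feats = torch.log(torch.stack(train_norms) + 1e-12)
    return feats.mean(dim=0), feats.std(dim=0) + 1e-6

def ood_score(norms, mean, std):
    """Gaussian log-density of a test point's features; lower values look more OOD."""
    z = (torch.log(norms + 1e-12) - mean) / std
    return -0.5 * (z.pow(2) + torch.log(2 * torch.pi * std.pow(2))).sum()
```

In use, one would compute `layerwise_grad_norms` for held-out in-distribution data, fit the feature density once, and flag a test point as OOD when its `ood_score` falls below a threshold calibrated on that held-out data.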
- Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998.
- The statistical analysis of variance-heterogeneity and the logarithmic transformation. Supplement to the Journal of the Royal Statistical Society, 8(1):128–138, 1946.
- Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study. Medical Image Analysis, 69:101952, 2021.
- GradOrth: A simple yet efficient out-of-distribution detection with orthogonal projection of gradients. arXiv preprint, 2023.
- Model-agnostic out-of-distribution detection using combined statistical tests. In International Conference on Artificial Intelligence and Statistics. PMLR, 2022.
- Christopher M Bishop. Novelty detection and neural network validation. IEE Proceedings - Vision, Image and Signal Processing, 141(4):217–222, 1994.
- Foundations of Data Science. Cambridge University Press, 2020.
- Entropic issues in likelihood-based OOD detection. In Melanie F. Pradier, Aaron Schein, Stephanie Hyland, Francisco J. R. Ruiz, and Jessica Z. Forde (eds.), Proceedings on "I (Still) Can’t Believe It’s Not Better!" at NeurIPS 2021 Workshops. 13 Dec 2022.
- WAIC, but why? Generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392, 2018.
- Robust out-of-distribution detection on deep probabilistic generative models. arXiv preprint, 2021.
- Asymptotic Equipartition Property, chapter 3, pp. 57–69. John Wiley & Sons, Ltd, 1991. ISBN 9780471748823.
- R. A. Fisher. Statistical methods for research workers. Oliver and Boyd, 1938.
- Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17:1–35, 2016.
- RLSbench: Domain adaptation under relaxed label shift. arXiv preprint arXiv:2302.03020, 2023.
- Fast approximate natural gradient descent in a Kronecker-factored eigenbasis. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems. 2018.
- Denoising diffusion models for out-of-distribution detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2023.
- Hierarchical VAEs know what they don't know. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning. 18–24 Jul 2021.
- A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations, 2017.
- Deep anomaly detection with outlier exposure. CoRR, abs/1812.04606, 2018.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- How useful are gradients for OOD detection really? arXiv preprint, 2022.
- Exploiting generative models in discriminative classifiers. In M. Kearns, S. Solla, and D. Cohn (eds.), Advances in Neural Information Processing Systems. 1998.
- Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems. 2018.
- Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
- Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
- Glow: Generative flow with invertible 1x1 convolutions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems. 2018.
- Why normalizing flows fail to detect out-of-distribution data. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems. 2020.
- Backpropagated gradient representations for anomaly detection. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
- Perfect density models cannot guarantee anomaly detection. Entropy, 23(12), 2021.
- Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations, 2018.
- Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems, 33:21464–21475, 2020.
- James Martens. New insights and perspectives on the natural gradient method. Journal of Machine Learning Research, 21(146):1–76, 2020.
- Density of states estimation for out of distribution detection. In Arindam Banerjee and Kenji Fukumizu (eds.), Proceedings of The 24th International Conference on Artificial Intelligence and Statistics. 13–15 Apr 2021.
- Do deep generative models know what they don’t know? In International Conference on Learning Representations, 2019a.
- Detecting out-of-distribution inputs to deep generative models using typicality. arXiv preprint, 2019b.
- Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
- GEE: A gradient-based explainable variational autoencoder for network anomaly detection. In 2019 IEEE Conference on Communications and Network Security (CNS). IEEE, 2019.
- Normalizing flows for probabilistic modeling and inference. The Journal of Machine Learning Research, 22(1):2617–2680, 2021.
- C. Radhakrishna Rao. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44(1):50–57, 1948.
- Likelihood ratios for out-of-distribution detection. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems. 2019.
- PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517, 2017.
- Understanding anomaly detection with deep invertible networks through hierarchies of distributions and features. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems. 2020.
- Input complexity and out-of-distribution detection with likelihood-based generative models. In International Conference on Learning Representations, 2020.
- Deep unsupervised learning using nonequilibrium thermodynamics. In Francis Bach and David Blei (eds.), Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 07–09 Jul 2015a. PMLR.
- Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. PMLR, 2015b.
- Jack Stilgoe. Who Killed Elaine Herzberg?, pp. 1–6. Springer International Publishing, Cham, 2020. ISBN 978-3-030-32320-2.
- Intriguing properties of neural networks. In 2nd International Conference on Learning Representations (ICLR), 2014.
- Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2):26–31, 2012.
- Trust issues: Uncertainty estimation does not enable reliable OOD detection on medical tabular data. In Emily Alsentzer, Matthew B. A. McDermott, Fabian Falck, Suproteem K. Sarkar, Subhrajit Roy, and Stephanie L. Hyland (eds.), Proceedings of the Machine Learning for Health NeurIPS Workshop. 11 Dec 2020.
- Conditional image generation with PixelCNN decoders. Advances in Neural Information Processing Systems, 29, 2016.
- Likelihood regret: An out-of-distribution detection score for variational auto-encoder. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems. 2020.
- Understanding failures in out-of-distribution detection with deep generative models. In International Conference on Machine Learning. PMLR, 2021a.
- On the out-of-distribution generalization of probabilistic image modelling. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems. 2021b.
- Domain generalization in vision: A survey. arXiv preprint arXiv:2103.02503, 2021.