Why Normalizing Flows Fail to Detect Out-of-Distribution Data (2006.08545v1)

Published 15 Jun 2020 in stat.ML and cs.LG

Abstract: Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems. Normalizing flows are flexible deep generative models that often surprisingly fail to distinguish between in- and out-of-distribution data: a flow trained on pictures of clothing assigns higher likelihood to handwritten digits. We investigate why normalizing flows perform poorly for OOD detection. We demonstrate that flows learn local pixel correlations and generic image-to-latent-space transformations which are not specific to the target image dataset. We show that by modifying the architecture of flow coupling layers we can bias the flow towards learning the semantic structure of the target data, improving OOD detection. Our investigation reveals that properties that enable flows to generate high-fidelity images can have a detrimental effect on OOD detection.

Citations (253)

Summary

  • The paper shows that normalizing flows fail OOD detection due to an excessive focus on local pixel correlations instead of semantic content.
  • It demonstrates through activation visualizations that the models capture basic graphical features rather than richer, meaningful representations.
  • Introducing bottlenecks in the st-networks significantly improves OOD detection, highlighting a promising direction for future model design.

Analyzing the Failures of Normalizing Flows in Out-of-Distribution Detection

The paper "Why Normalizing Flows Fail to Detect Out-of-Distribution Data" by Polina Kirichenko et al. critically examines the limitations of normalizing flows (NFs) in out-of-distribution (OOD) detection, particularly in the context of image data. While normalizing flows are designed as flexible deep generative models capable of modeling complex distributions with high fidelity, their performance in identifying data that lie outside the distribution of the training data has exhibited significant shortcomings.

Understanding the Shortcomings

The principal conclusion of the paper is that normalizing flows, when applied to OOD detection, are governed more by their architectural inductive biases than by the maximum likelihood objective on which they are trained. These biases lead flows to model local pixel correlations and generic image-to-latent-space transformations rather than the semantic content needed for effective OOD detection. As a result, images with structured patterns or backgrounds often receive high likelihood scores irrespective of their semantic relevance to the model's training data. The authors illustrate this starkly with normalizing flows trained on datasets such as ImageNet assigning higher likelihoods to samples from distinctly different datasets, such as CelebA or SVHN.
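For context, a flow with invertible map f and base density p_Z scores an input x with the standard change-of-variables log-likelihood, and likelihood-based OOD detection flags inputs whose score falls below a threshold (a generic formulation; the threshold tau is illustrative, not a quantity taken from the paper):

```latex
\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left|\det \frac{\partial f(x)}{\partial x}\right|,
\qquad \text{flag } x \text{ as OOD if } \log p_X(x) < \tau .
```

The failure mode described above means this score can be high for OOD images that merely share generic low-level statistics with the training data.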

Methodologies and Insights

The paper meticulously investigates latent representations learned by normalizing flows, highlighting that these models tend to preserve basic graphical information rather than extract richer semantic details. Visualizations of activations from coupling layers reveal that flows rely heavily on local pixel statistics, which explains their failure with semantically anomalous OOD data. Further, the paper illustrates how normalizing flows, when altered to focus on semantic transformations rather than localized pixel correlations, show improved performance in differentiating in-distribution from OOD samples.
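To make the coupling-layer mechanism concrete, the following is a minimal PyTorch sketch of a RealNVP-style affine coupling layer (a generic illustration, not the authors' code); the st-network is the small network that predicts the scale and shift applied to one half of the input, conditioned on the other half.

```python
# Minimal sketch of a RealNVP-style affine coupling layer (generic illustration).
# The "st-network" maps the unchanged half of the input to scale (s) and
# shift (t) parameters for the other half.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        # st-network: predicts log-scale and shift for the second half
        self.st_net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.st_net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales bounded for stability
        y2 = x2 * torch.exp(s) + t             # affine transform of the second half
        log_det = s.sum(dim=1)                 # log |det Jacobian| of this layer
        return torch.cat([x1, y2], dim=1), log_det

# Usage: one layer on flattened 8x8 grayscale "images"
layer = AffineCoupling(dim=64)
x = torch.randn(16, 64)
y, log_det = layer(x)
print(y.shape, log_det.shape)  # torch.Size([16, 64]) torch.Size([16])
```

Because the st-network conditions directly on raw pixel values, a high-capacity st-network can simply co-adapt neighboring pixels, which is exactly the local-correlation behaviour the paper identifies.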

The authors experimented with modifying the capacity of the networks responsible for predicting transformations in normalizing flows, namely the st-networks used in coupling layers. By introducing architectural changes such as bottlenecks in these networks, they restricted the model's capacity to encode all available information, thereby nudging it to focus on high-level features. While not a complete remedy, this adjustment considerably improved flow performance on OOD tasks.
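A rough sketch of one way such a bottleneck could be implemented, reusing the coupling layer above; the bottleneck width and other sizes are illustrative assumptions rather than values from the paper.

```python
# Hypothetical sketch of the bottleneck idea: insert a narrow intermediate
# layer into the st-network so it cannot pass through all local pixel
# information. The widths here are illustrative, not taken from the paper.
import torch.nn as nn

def bottleneck_st_net(in_dim: int, out_dim: int,
                      hidden: int = 256, bottleneck: int = 16) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, bottleneck), nn.ReLU(),   # low-dimensional bottleneck
        nn.Linear(bottleneck, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

# Drop-in replacement for the st_net in the coupling layer sketched above:
# layer.st_net = bottleneck_st_net(in_dim=32, out_dim=64)
```

The narrow intermediate layer prevents the st-network from encoding every conditioning pixel, pushing it toward more compact, higher-level features.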

Practical Implications and Future Directions

This paper suggests that OOD detection in normalizing flows, an application of increasing importance as AI systems move into real-world deployment, requires methodologies that emphasize semantic understanding over raw data density modeling. For practical machine learning systems, ensuring robust OOD detection is fundamental, given that these systems often operate in diverse real-world conditions that differ from their training and validation environments.

Looking forward, the paper opens avenues for future research into enhancing normalizing flows with architectures or pre-processing techniques that accentuate semantic learning. Moreover, hybrid models that integrate the density estimation strength of normalizing flows with the feature extraction capabilities of other deep learning paradigms could be a productive strategy.

In conclusion, while normalizing flows demonstrate high potential in generative modeling tasks, their OOD detection capabilities remain limited by intrinsic model biases. Understanding these biases allows researchers to adapt normalizing flows, or related methodologies, to real-world applications where robustness against OOD inputs is crucial. This work serves as a substantial step toward addressing the nuanced challenges faced by modern deep learning models in OOD detection.