- The paper finds that likelihoods from generative models are heavily influenced by input complexity, undermining their effectiveness for out-of-distribution detection.
- A novel out-of-distribution score is proposed that adjusts the generative model's log-likelihood by accounting for input complexity, improving detection.
- Empirical results show the proposed complexity-adjusted score outperforms traditional likelihood-based methods with zero hyper-parameters and no additional training.
The paper "Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models" provides a detailed examination of the challenges and methodologies associated with out-of-distribution (OOD) detection in machine learning, particularly when using likelihood-based generative models. This research is relevant to anyone building robust machine learning systems that must remain reliable when faced with inputs that diverge from the training data.
Evaluation of Generative Models for OOD Detection
Likelihood-based generative models have been considered promising candidates for OOD detection due to their capacity to model input data distributions. However, a significant insight offered by this research is the recognition that likelihoods computed by these generative models are heavily influenced by the complexity of the inputs, undermining their efficacy in distinguishing between in-distribution and OOD inputs. The study demonstrates this with empirical evidence, revealing that simpler inputs (often quantified by their compressed size) tend to produce higher likelihoods, even when they are significantly different from any training data.
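The complexity measure referenced above can be illustrated concretely. The sketch below (my own minimal example, not the paper's code) estimates an input's complexity as its losslessly compressed size in bits per dimension, using Python's `zlib` as a stand-in for whatever compressor one might choose; it shows that a constant "image" registers as far simpler than random noise:

```python
import os
import zlib

def complexity_bits_per_dim(data: bytes) -> float:
    """Estimate input complexity as the losslessly compressed size of
    the input, normalized to bits per dimension (here, per byte)."""
    compressed = zlib.compress(data, level=9)
    return 8 * len(compressed) / len(data)

# A constant input is highly compressible; random bytes are not.
flat = bytes([128]) * 1024   # "simple" input
noise = os.urandom(1024)     # "complex" input

print(complexity_bits_per_dim(flat))   # well below 8 bits per byte
print(complexity_bits_per_dim(noise))  # near (or above) 8 bits per byte
```

In the paper's setting, the same kind of estimate is what makes the troubling correlation visible: simpler inputs, in exactly this compressed-size sense, tend to receive higher model likelihoods.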
Proposed OOD Score
To address the observed shortcomings, the research introduces a novel OOD score that adjusts the generative model's log-likelihood by accounting for input complexity. This score, akin to a likelihood-ratio test statistic, integrates an estimate of input complexity derived from compressibility measures, aiming to isolate true OOD samples more effectively than using likelihoods alone. The score is demonstrated to outperform traditional likelihood-based approaches across a wide array of data sets and model architectures, providing improved OOD detection in terms of the area under the receiver operating characteristic curve (AUROC).
Methodological Insights
The authors support the score with a Bayesian argument, likening it to Bayesian model comparison. In this framing the score acts as a form of Occam's razor: it compares the specifically trained generative model against a more universal model, so that predictions are weighed against the complexity of the input. The comparison implicitly rewards model simplicity and flags unusual patterns in the data as potential OOD indicators.
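The model-comparison reading can be written out as a log-ratio. The sketch below reconstructs it from the description above, taking the "universal" model to be a compressor that assigns each input a probability determined by its code length L(x); the exact notation is my own, not copied from the paper:

```latex
% Universal (compressor-based) model:
p_{U}(\mathbf{x}) = 2^{-L(\mathbf{x})}

% The complexity-adjusted score is then the log-ratio between the
% universal model and the trained generative model \mathcal{M}:
S(\mathbf{x})
  = -\log_2 p_{\mathcal{M}}(\mathbf{x}) - L(\mathbf{x})
  = \log_2 \frac{p_{U}(\mathbf{x})}{p_{\mathcal{M}}(\mathbf{x})}
```

Read this way, a large score means the generic compressor explains the input better than the trained model does, which is exactly the situation one expects for an out-of-distribution sample.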
Results and Implications
Empirical results substantiate the efficacy of the proposed score. With zero hyper-parameters and no requirement for additional training, the score exhibits improved detection across various scenarios compared to existing methods. This is particularly notable given its simplicity and the broad applicability of a parameter-free system in practical settings, which adds to its attractiveness for deployment in real-world applications.
Future Directions
While the score shows promise, the paper points to several avenues for future investigation. These include refining the complexity estimate with more sophisticated or ensemble-based compression metrics, exploring how well the approach generalizes to other domains (such as text or audio data), and assessing whether combining it with ensemble models could further enhance performance.
In conclusion, this research represents an incremental advancement in understanding and improving OOD detection with likelihood-based generative models. By addressing input complexity bias, it provides a clearer pathway towards developing robust machine learning systems capable of more reliable performance in complex, real-world environments.