
Likelihood Ratios for Out-of-Distribution Detection (1906.02845v2)

Published 7 Jun 2019 in stat.ML and cs.LG

Abstract: Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based on genomic sequences, which holds the promise of early detection of diseases, but requires a model that can output low confidence predictions on OOD genomic sequences from new bacteria that were not present in the training data. We introduce a genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem. We investigate deep generative model based approaches for OOD detection and observe that the likelihood score is heavily affected by population level background statistics. We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. We demonstrate the generality of the proposed method by showing that it significantly improves OOD detection when applied to deep generative models of images.

Authors (8)
  1. Jie Ren (329 papers)
  2. Peter J. Liu (30 papers)
  3. Emily Fertig (7 papers)
  4. Jasper Snoek (42 papers)
  5. Ryan Poplin (6 papers)
  6. Mark A. DePristo (1 paper)
  7. Joshua V. Dillon (23 papers)
  8. Balaji Lakshminarayanan (62 papers)
Citations (671)

Summary

Likelihood Ratios for Out-of-Distribution Detection: An Analysis

The paper presents a principled approach to out-of-distribution (OOD) detection for discriminative neural networks. Maintaining prediction reliability when data deviates from the training distribution is especially important in applications such as bacterial identification from genomic sequences, where a confident misclassification of an unseen bacterium carries high risk.

Core Contributions

The authors introduce a new genomics dataset for benchmarking OOD detection, filling a gap in machine learning benchmarks for genomics. They also propose a likelihood ratio method for deep generative models that corrects for confounding background statistics, substantially improving detection performance.

Methodology

The likelihood ratio approach compares the likelihood of an input under the full generative model against its likelihood under a background model trained on perturbed data, thereby isolating the semantic information specific to the in-distribution data. The method rests on the hypothesis that likelihood scores from conventional deep generative models are dominated by population-level background statistics, a hypothesis the authors validate empirically on both genomic sequences and image datasets.
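The score itself is a difference of log-likelihoods. A minimal sketch, assuming per-example log-likelihoods have already been computed by a full model and a background model (the function and variable names here are illustrative, not from the paper's code):

```python
import numpy as np

def likelihood_ratio_score(log_p_full, log_p_background):
    """OOD score: log p_theta(x) - log p_theta0(x).

    log_p_full:       log-likelihoods under the model trained on in-distribution data.
    log_p_background: log-likelihoods under a background model trained on perturbed
                      inputs, which captures population-level background statistics.
    Higher scores indicate the input looks more in-distribution once the
    background contribution is subtracted out.
    """
    return np.asarray(log_p_full) - np.asarray(log_p_background)

# Toy example: the first input's likelihood is well above its background
# likelihood (semantically in-distribution); the second input's likelihood
# is explained almost entirely by background statistics.
scores = likelihood_ratio_score([-10.0, -12.0], [-14.0, -11.5])
# scores = [4.0, -0.5]: the first input is ranked as in-distribution.
```

Subtracting the background log-likelihood is what cancels the confounding term: an OOD input with generic background statistics gets a high raw likelihood from both models, so its ratio stays low.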

The methodology was evaluated with autoregressive deep generative models: PixelCNN++ for images and LSTMs for genomic sequences. Replacing the raw likelihood with the likelihood ratio sharpened the distinction between OOD and in-distribution inputs by focusing the score on in-distribution-specific semantic features.

Numerical Results

The paper reports state-of-the-art performance on the newly proposed genomics benchmark, with significant improvements over existing baselines. On image datasets, the likelihood ratio raised AUROC from the poor values of raw likelihood-based scores to near-perfect separation of in-distribution and OOD data.
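AUROC here measures how well the score ranks in-distribution examples above OOD examples; it equals the probability that a random in-distribution example outscores a random OOD example. A self-contained sketch of this pairwise formulation (a small-sample illustration, not the paper's evaluation code):

```python
import numpy as np

def auroc(scores_in, scores_ood):
    """AUROC for separating in-distribution (positive) from OOD (negative)
    examples, given scores where higher means more in-distribution-like.
    Computed as the normalized Mann-Whitney U statistic: the fraction of
    (in, ood) pairs where the in-distribution score is higher, counting
    ties as one half.
    """
    scores_in = np.asarray(scores_in, dtype=float)
    scores_ood = np.asarray(scores_ood, dtype=float)
    greater = (scores_in[:, None] > scores_ood[None, :]).sum()
    ties = (scores_in[:, None] == scores_ood[None, :]).sum()
    return (greater + 0.5 * ties) / (scores_in.size * scores_ood.size)

# Perfect separation gives 1.0; one misranked pair out of nine gives 8/9.
print(auroc([4.0, 3.0, 2.5], [1.0, 0.5, 2.6]))
```

An AUROC near 0.5 means the score is no better than chance, which is roughly where raw likelihoods landed on some image OOD pairs before the background correction.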

Implications and Future Directions

The introduction of the genomics dataset establishes a tangible benchmark for the research community to assess progress in OOD detection, particularly addressing a critical requirement in AI applications in genomics. The integration of likelihood ratios presents a feasible avenue for improving safety in AI applications prone to distributional shifts.

Addressing OOD detection through deep generative models invites further research into generalizing such models beyond genomics and images to other domains vulnerable to distributional shift. Future work might refine how the background model is trained or transfer the approach to other generative frameworks to improve adaptability.

Conclusion

This paper adeptly handles OOD detection within critical applications, providing both a practical machine learning benchmark and a theoretically grounded methodology for the broader AI research community. The strategic use of likelihood ratios in deep generative models exemplifies a substantive step in real-world AI deployment, ensuring models remain reliable when confronted with unforeseen data distributions.
