Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder (2003.02977v3)

Published 6 Mar 2020 in cs.LG, cs.CV, and stat.ML

Abstract: Deep probabilistic generative models enable modeling the likelihoods of very high dimensional data. An important application of generative modeling should be the ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood. However, some recent studies show that probabilistic generative models can, in some cases, assign higher likelihoods on certain types of OOD samples, making the OOD detection rules based on likelihood threshold problematic. To address this issue, several OOD detection methods have been proposed for deep generative models. In this paper, we make the observation that many of these methods fail when applied to generative models based on Variational Auto-encoders (VAE). As an alternative, we propose Likelihood Regret, an efficient OOD score for VAEs. We benchmark our proposed method over existing approaches, and empirical results suggest that our method obtains the best overall OOD detection performances when applied to VAEs.

Authors (3)
  1. Zhisheng Xiao (17 papers)
  2. Qing Yan (21 papers)
  3. Yali Amit (13 papers)
Citations (173)

Summary

Likelihood Regret: An Out-of-Distribution Detection Score for Variational Auto-encoders

The paper presents a novel approach to out-of-distribution (OOD) detection in variational auto-encoders (VAEs), an important problem for ensuring the reliability of AI systems when faced with data that deviates from the training distribution. The proposed method, termed Likelihood Regret (LR), addresses a known failure mode of probabilistic generative models: they sometimes assign high likelihoods to OOD samples, undermining likelihood-based anomaly detection.

Context and Challenges

Deep generative models, including VAEs, have been pivotal in modeling the likelihoods of high-dimensional data. A critical application of these models is in detecting OOD samples using likelihood thresholds. However, recent studies revealed that generative models could assign higher likelihoods to certain OOD samples compared to in-distribution samples, defeating their purpose as reliable anomaly detectors. Existing OOD detection methods for generative models have shown limited effectiveness with VAEs, necessitating the development of a new OOD score.

Likelihood Regret: Proposed Solution

The authors propose Likelihood Regret as an efficient OOD detection metric for VAEs. The metric is defined as the log-likelihood improvement obtained by optimizing the model configuration individually for a test sample, relative to the configuration optimized for the whole training set. The rationale is that for in-distribution samples this improvement should be small, yielding a low LR, whereas OOD samples should yield a large LR because they deviate from the training distribution.
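Schematically, the score can be written as follows (the notation here is illustrative rather than taken verbatim from the paper): let $\theta$ denote the decoder, $\phi$ the amortized encoder trained on the full training set, $\phi^{*}(x)$ the encoder configuration re-optimized for a single test input $x$, and $\mathcal{L}$ the evidence lower bound (ELBO):

```latex
\mathrm{LR}(x) \;=\; \mathcal{L}\bigl(x;\, \theta, \phi^{*}(x)\bigr) \;-\; \mathcal{L}\bigl(x;\, \theta, \phi\bigr)
```

For in-distribution $x$ the amortized posterior is already near-optimal, so the gap is small; for OOD $x$, per-sample optimization can improve the bound substantially, producing a large LR.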

To compute LR, the method re-optimizes the parameters of the VAE's variational posterior for each individual test input while keeping the decoder fixed; restricting the optimization to the posterior acts as a form of regularization that limits overfitting. The optimization is iterative and operates within the low-dimensional bottleneck latent space provided by the VAE architecture.
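The per-sample refinement can be sketched on a toy linear-Gaussian "VAE" whose ELBO has a closed form. This is a minimal illustration, not the paper's implementation: `W`, `sigma2`, and the zero-initialized amortized posterior are all stand-ins, and a real VAE would update the encoder network by backpropagation rather than numeric gradients.

```python
import numpy as np

# Toy linear-Gaussian "decoder": x = W z + noise, prior z ~ N(0, I).
# W and sigma2 stand in for a trained decoder; all names are illustrative.
rng = np.random.default_rng(0)
D, K = 5, 2
W = rng.normal(size=(D, K))
sigma2 = 0.5

def elbo(x, mu, logvar):
    """Closed-form ELBO for a Gaussian posterior N(mu, diag(exp(logvar)))."""
    var = np.exp(logvar)
    # E_q ||x - W z||^2 = ||x - W mu||^2 + sum_j var_j * ||W[:, j]||^2
    recon = -0.5 / sigma2 * (np.sum((x - W @ mu) ** 2)
                             + np.sum(var * np.sum(W ** 2, axis=0)))
    recon -= 0.5 * D * np.log(2 * np.pi * sigma2)
    kl = 0.5 * np.sum(var + mu ** 2 - 1.0 - logvar)  # KL(q || N(0, I))
    return recon - kl

def refine_posterior(x, mu, logvar, lr=0.01, steps=300, eps=1e-5):
    """Per-sample gradient ascent on the ELBO (numeric gradients, for brevity)."""
    p = np.concatenate([mu, logvar]).astype(float)
    f = lambda q: elbo(x, q[:K], q[K:])
    for _ in range(steps):
        g = np.zeros_like(p)
        for i in range(p.size):
            d = np.zeros_like(p)
            d[i] = eps
            g[i] = (f(p + d) - f(p - d)) / (2 * eps)
        p += lr * g
    return p[:K], p[K:]

def likelihood_regret(x, mu_amortized, logvar_amortized):
    """ELBO gain from re-optimizing the posterior for this single input."""
    base = elbo(x, mu_amortized, logvar_amortized)
    mu_s, lv_s = refine_posterior(x, mu_amortized, logvar_amortized)
    return elbo(x, mu_s, lv_s) - base
```

Higher regret flags inputs that the amortized encoder fits poorly, i.e. candidates for being OOD; because the refinement starts from the amortized configuration, the regret is non-negative by construction.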

Experimental Evaluation

The paper benchmarks Likelihood Regret against state-of-the-art OOD detection scores across various image datasets, including tasks where VAEs are trained on FashionMNIST and CIFAR-10. On quantitative measures such as AUROC, AUPRC, and FPR80, LR consistently achieves superior or comparable results, particularly on challenging OOD detection tasks, demonstrating its robustness and effectiveness.
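These threshold-free metrics summarize how well a score separates in-distribution from OOD inputs. A minimal sketch of two of them, assuming higher scores mean "more OOD" (function names and the 80% operating point are illustrative):

```python
import numpy as np

def auroc(scores_in, scores_out):
    """AUROC: probability that a random OOD sample scores higher than a
    random in-distribution sample (ties count half)."""
    s_in = np.asarray(scores_in, dtype=float)
    s_out = np.asarray(scores_out, dtype=float)
    greater = (s_out[:, None] > s_in[None, :]).mean()
    ties = (s_out[:, None] == s_in[None, :]).mean()
    return greater + 0.5 * ties

def fpr_at_tpr(scores_in, scores_out, tpr=0.8):
    """FPR80: fraction of in-distribution samples wrongly flagged at the
    threshold that detects `tpr` of the OOD samples."""
    thr = np.quantile(np.asarray(scores_out, dtype=float), 1.0 - tpr)
    return float((np.asarray(scores_in, dtype=float) >= thr).mean())
```

With perfectly separated scores AUROC is 1.0 and FPR80 is 0.0; a score no better than chance gives AUROC 0.5.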

Importantly, the paper highlights that alternative state-of-the-art methods fail on certain tasks, especially when applied to VAEs, which further underscores the necessity and advantage of the proposed LR score. Although the per-sample iterative optimization adds computational overhead at test time, the cost remains manageable in practice.

Implications and Future Directions

This research introduces a critical refinement for OOD detection in VAEs, enhancing their applicability and reliability in real-world scenarios where encountering OOD data is inevitable. By achieving more consistent performance across varied experimental setups, Likelihood Regret stands as a significant step towards more dependable unsupervised anomaly detection in generative models.

Future research could extend the concept of LR to other types of generative models, potentially by defining analogous optimizable model configurations. Moreover, investigating LR's adaptation to other data modalities and its integration with hybrid models may further broaden the scope and impact of this technique in AI safety applications.