Likelihood Regret: An Out-of-Distribution Detection Score for Variational Auto-encoders
The paper presents a novel approach to out-of-distribution (OOD) detection for variational auto-encoders (VAEs), an important problem for ensuring the reliability of AI systems faced with data that deviates from the training distribution. The proposed method, termed Likelihood Regret (LR), addresses a known failure of probabilistic generative models: they sometimes assign high likelihoods to OOD samples, undermining likelihood-based anomaly detection.
Context and Challenges
Deep generative models, including VAEs, have been pivotal in modeling the likelihoods of high-dimensional data. A critical application of these models is detecting OOD samples via likelihood thresholds. However, recent studies have shown that generative models can assign higher likelihoods to certain OOD samples than to in-distribution samples, defeating their purpose as reliable anomaly detectors. Existing OOD detection methods for generative models have shown limited effectiveness with VAEs, motivating the development of a new OOD score.
Likelihood Regret: Proposed Solution
The authors propose Likelihood Regret as an efficient OOD detection metric for VAEs. The metric is defined as the log-likelihood improvement of the model configuration optimized individually for a test sample over that of the configuration optimized for the training set. The rationale is that for in-distribution samples the improvement in likelihood should be small, yielding a low LR, whereas OOD samples should yield a larger LR because of their deviation from the training distribution.
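The definition above can be written compactly. The notation here is an assumption for exposition: $\mathcal{L}$ denotes the ELBO used as the log-likelihood surrogate, $\theta^*, \phi^*$ are the decoder/encoder parameters fit on the training set, and the maximization re-optimizes the encoder configuration for the single test input $x$:

```latex
% LR(x): ELBO gain from per-sample re-optimization over the trained model
\mathrm{LR}(x) \;=\; \max_{\phi}\, \mathcal{L}\bigl(x;\, \theta^*, \phi\bigr)
               \;-\; \mathcal{L}\bigl(x;\, \theta^*, \phi^*\bigr)
```

A sample is flagged as OOD when $\mathrm{LR}(x)$ exceeds a chosen threshold.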
To compute LR, the method keeps the trained decoder fixed and optimizes the parameters of the variational posterior for each individual test input, a regularized form of adaptation that limits overfitting. The optimization is iterative and operates within the structured bottleneck latent space provided by the VAE architecture.
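The mechanics of this per-sample optimization can be illustrated with a minimal toy model. The sketch below is an assumption-laden stand-in, not the paper's implementation: a 1-D linear-Gaussian "VAE" whose ELBO is available in closed form, with an intentionally imperfect amortized encoder (`mu_enc`), so that gradient ascent on the posterior mean recovers the likelihood improvement that defines LR.

```python
# Toy sketch of Likelihood Regret with a 1-D linear-Gaussian "VAE":
# decoder p(x|z) = N(w*z, s^2), prior p(z) = N(0, 1), variational
# posterior q(z|x) = N(mu, sigma^2) with a fixed variance.
# All names and constants (W, S2, mu_enc, ...) are illustrative.
import math

W, S2, SIGMA2 = 1.0, 0.25, 0.1  # decoder weight, noise var, posterior var

def elbo(x, mu):
    """Analytic ELBO for the Gaussian toy model (no sampling needed)."""
    recon = (-0.5 * math.log(2 * math.pi * S2)
             - ((x - W * mu) ** 2 + W ** 2 * SIGMA2) / (2 * S2))
    kl = 0.5 * (SIGMA2 + mu ** 2 - 1.0 - math.log(SIGMA2))
    return recon - kl

def optimize_mu(x, mu0, steps=200, step_size=0.1):
    """Per-sample gradient ascent on the ELBO w.r.t. the posterior mean."""
    mu = mu0
    for _ in range(steps):
        grad = W * (x - W * mu) / S2 - mu  # d ELBO / d mu
        mu += step_size * grad
    return mu

def likelihood_regret(x):
    mu_enc = 0.5 * x               # stand-in for an imperfect amortized encoder
    mu_opt = optimize_mu(x, mu_enc)
    return elbo(x, mu_opt) - elbo(x, mu_enc)
```

In this toy, a point far from the training region (e.g. `x = 5.0`) yields a much larger regret than a typical point (e.g. `x = 0.5`), mirroring the intended behavior of the score.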
Experimental Evaluation
The paper benchmarks Likelihood Regret against state-of-the-art OOD detection scores across various image datasets, including tasks where VAEs are trained on FashionMNIST and CIFAR-10. On quantitative measures such as AUROC, AUPRC, and FPR80, LR consistently achieves superior or comparable results, particularly on challenging OOD detection tasks, demonstrating its robustness and effectiveness.
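For concreteness, these ranking metrics can be sketched as follows, treating OOD samples as the positive class and a higher score (e.g. Likelihood Regret) as "more OOD". The function names and the pairwise AUROC estimator are illustrative choices, not the paper's evaluation code.

```python
def auroc(in_scores, ood_scores):
    """Probability that a random OOD score ranks above a random in-dist score."""
    wins = sum((o > i) + 0.5 * (o == i)
               for o in ood_scores for i in in_scores)
    return wins / (len(in_scores) * len(ood_scores))

def fpr_at_tpr(in_scores, ood_scores, tpr=0.8):
    """False-positive rate at the threshold that detects `tpr` of OOD samples.

    FPR80 corresponds to tpr=0.8: pick the score threshold catching 80% of
    OOD samples, then report how many in-distribution samples it misflags.
    """
    thresh = sorted(ood_scores)[int((1 - tpr) * len(ood_scores))]
    return sum(i >= thresh for i in in_scores) / len(in_scores)
```

A perfect detector separates the two score populations completely, giving an AUROC of 1.0 and an FPR80 of 0.0.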
Importantly, the paper highlights that alternative state-of-the-art methods fail on certain tasks, especially when applied to VAEs, which further underscores the need for the proposed LR score. Although the iterative per-sample optimization adds computational overhead at test time, the cost remains manageable in practice.
Implications and Future Directions
This research introduces a critical refinement for OOD detection in VAEs, enhancing their applicability and reliability in real-world scenarios where encountering OOD data is inevitable. By achieving more consistent performance across varied experimental setups, Likelihood Regret stands as a significant step towards more dependable unsupervised anomaly detection in generative models.
Future research could extend the concept of LR to other types of generative models, potentially by defining analogous optimizable model configurations. Moreover, adapting LR to other data types and integrating it with hybrid models may further broaden the scope and impact of this technique in AI safety applications.