Anomaly Detection with Density Estimation (2001.04990v2)

Published 14 Jan 2020 in hep-ph, hep-ex, physics.data-an, and stat.ML

Abstract: We leverage recent breakthroughs in neural density estimation to propose a new unsupervised anomaly detection technique (ANODE). By estimating the probability density of the data in a signal region and in sidebands, and interpolating the latter into the signal region, a likelihood ratio of data vs. background can be constructed. This likelihood ratio is broadly sensitive to overdensities in the data that could be due to localized anomalies. In addition, a unique potential benefit of the ANODE method is that the background can be directly estimated using the learned densities. Finally, ANODE is robust against systematic differences between signal region and sidebands, giving it broader applicability than other methods. We demonstrate the power of this new approach using the LHC Olympics 2020 R&D Dataset. We show how ANODE can enhance the significance of a dijet bump hunt by up to a factor of 7 with a 10% accuracy on the background prediction. While the LHC is used as the recurring example, the methods developed here have a much broader applicability to anomaly detection in physics and beyond.

Summary

The paper presents the ANODE method, which leverages normalizing flows to estimate multidimensional data densities for unsupervised anomaly detection.
It interpolates densities from sidebands into the signal region to build a data-driven likelihood ratio that markedly improves signal significance.
The method’s adaptability to high-dimensional data positions it as a promising tool for applications in collider physics and beyond.

Anomaly Detection with Density Estimation

The paper "Anomaly Detection with Density Estimation" by Nachman and Shih proposes a novel unsupervised technique, termed ANODE, which uses neural density estimation to detect anomalies in experimental data without assuming a specific signal model. This work leverages recent advances in neural density estimation, particularly normalizing flows and their variants. The authors illustrate the potential of ANODE for high energy physics, using resonance searches at the Large Hadron Collider (LHC) as the running example.

Overview

The main innovation here is the ANODE method, which estimates the probability density of the data, conditioned on a resonant feature m (such as the dijet invariant mass), in both a signal region (SR) and a set of sidebands (SB). The SB density serves as a background estimate: by interpolating it into the SR, ANODE constructs a fully data-driven likelihood ratio R(x|m) = p_data(x|m) / p_background(x|m) distinguishing data from background. Events with R well above 1 sit in an overdensity relative to the interpolated background. This methodology is inherently unsupervised and capitalizes on the multidimensional density estimation capacity of neural networks, making it broadly applicable to anomaly detection tasks.
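The ratio construction can be sketched on toy data. This is a minimal illustration under loud assumptions, not the authors' implementation: the normalizing flows are swapped for a Gaussian fit (background, from sidebands) and a Gaussian kernel density estimate (data, in the SR), the single auxiliary feature x is taken to be independent of m for background so the sideband fit carries over to the SR unchanged, and all event counts, window edges, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy events (all numbers hypothetical): resonant feature m, auxiliary feature x.
n_bg, n_sig = 5000, 150
m = np.concatenate([rng.uniform(0.0, 10.0, n_bg), rng.normal(5.0, 0.2, n_sig)])
x = np.concatenate([rng.normal(0.0, 1.0, n_bg), rng.normal(3.0, 0.3, n_sig)])
is_signal = np.arange(n_bg + n_sig) >= n_bg  # truth labels, used only for checking

in_sr = (m > 4.0) & (m < 6.0)  # signal region; everything else is sideband


def gaussian_pdf(values, mu, sigma):
    return np.exp(-0.5 * ((values - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Background density: fitted on sidebands only, then carried into the SR.
# (Stand-in for the sideband-trained flow; valid here because x does not depend on m.)
mu_sb, sigma_sb = x[~in_sr].mean(), x[~in_sr].std()

# Data density in the SR: Gaussian KDE with a rule-of-thumb bandwidth.
# (Stand-in for the SR-trained flow.)
x_sr = x[in_sr]
bandwidth = 1.06 * x_sr.std() * x_sr.size ** (-0.2)


def p_data(values):
    # Average of Gaussian kernels centred on every SR event.
    return gaussian_pdf(values[:, None], x_sr[None, :], bandwidth).mean(axis=1)


def p_background(values):
    return gaussian_pdf(values, mu_sb, sigma_sb)


# Likelihood ratio for every SR event; large values flag overdensities.
ratio = p_data(x_sr) / p_background(x_sr)
sig_sr = is_signal[in_sr]
print("median R, signal events:    ", np.median(ratio[sig_sr]))
print("median R, background events:", np.median(ratio[~sig_sr]))
```

In this toy setup the signal events cluster where the SR data density exceeds the sideband-derived background density, so their ratio is much larger than that of typical background events. A threshold on R then selects a signal-enriched sample without using the truth labels.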

Technical Execution

ANODE uses normalizing flows, specifically masked autoregressive flows (MAF), to estimate densities. A normalizing flow maps a simple base distribution (e.g., a Gaussian) to a complex target distribution through a sequence of invertible neural network transformations. Because each transformation is invertible with a tractable Jacobian, the density of any data point can be evaluated exactly via the change-of-variables formula, which makes these models well suited to the high-dimensional feature spaces of collider data.
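To make the change-of-variables idea concrete, here is a small sketch of a single affine autoregressive transformation in two dimensions. It is illustrative only: the shift and log-scale functions are fixed, hand-picked stand-ins for what a trained MAF would learn with masked neural networks, and every name and constant is hypothetical.

```python
import numpy as np


def shift(x1):
    # Hypothetical conditioner output; a trained MAF would learn this.
    return np.sin(x1)


def log_scale(x1):
    # Hypothetical conditioner output; a trained MAF would learn this.
    return 0.3 * np.tanh(x1)


def standard_normal_logpdf(z):
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi))


def log_density(x1, x2):
    """log p(x1, x2) = log p_base(z) + log|det dz/dx| for the data -> base map."""
    z1 = (x1 - 1.0) / 2.0                           # first coordinate: plain affine map
    z2 = (x2 - shift(x1)) * np.exp(-log_scale(x1))  # second coordinate depends on x1
    log_det = -np.log(2.0) - log_scale(x1)          # Jacobian is triangular
    return standard_normal_logpdf(z1) + standard_normal_logpdf(z2) + log_det


def sample(n, rng):
    """Draw samples by running the transformation in the base -> data direction."""
    z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
    x1 = 1.0 + 2.0 * z1
    x2 = shift(x1) + np.exp(log_scale(x1)) * z2
    return x1, x2


# Sanity check: the density should integrate to roughly 1 over a wide grid.
g1 = np.linspace(-12.0, 14.0, 401)
g2 = np.linspace(-10.0, 10.0, 401)
X1, X2 = np.meshgrid(g1, g2, indexing="ij")
total = np.exp(log_density(X1, X2)).sum() * (g1[1] - g1[0]) * (g2[1] - g2[0])
print("integral of p(x) over the grid:", total)
```

The autoregressive structure (the second coordinate is transformed using only the first) keeps the Jacobian triangular, so its log-determinant is just a sum of log-scales. That is what keeps exact density evaluation cheap as the number of features grows.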

Key Results

Applied to the LHC Olympics 2020 R&D dataset, a simulated dataset used here for a dijet bump hunt, ANODE markedly enhances detection sensitivity. The paper reports that it can enhance the significance of the bump hunt by up to a factor of 7, with a 10% accuracy on the background prediction. These results illustrate the utility of ANODE in identifying signals that would otherwise be hard to distinguish from background in complex data sets.
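For intuition about what a significance gain of this kind means, one common way to quantify the effect of a selection is the ratio of signal efficiency to the square root of background efficiency, which is the factor by which S/sqrt(B) changes after the cut. The sketch below uses made-up event counts and efficiencies, not numbers from the paper, and the function name is hypothetical.

```python
import math


def significance_improvement(signal_eff, background_eff):
    """Factor by which S/sqrt(B) changes after a cut with the given efficiencies."""
    if not (0.0 <= signal_eff <= 1.0 and 0.0 < background_eff <= 1.0):
        raise ValueError("efficiencies must lie in [0, 1], with background_eff > 0")
    return signal_eff / math.sqrt(background_eff)


# Hypothetical counts in the signal region before any anomaly-score cut.
n_signal, n_background = 500.0, 250000.0
before = n_signal / math.sqrt(n_background)

# Hypothetical cut: keeps 50% of the signal and 1% of the background.
gain = significance_improvement(0.5, 0.01)
after = before * gain

print(f"S/sqrt(B) before: {before:.2f}, gain: {gain:.2f}, after: {after:.2f}")
```

With these illustrative numbers the cut discards half of the signal but still raises the naive significance by roughly a factor of 5, because the background shrinks much faster than the signal.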

Implications and Future Directions

From a practical standpoint, ANODE presents a promising tool for extending the reach of existing search strategies at colliders like the LHC. The method's capacity to handle correlations in high-dimensional feature spaces could advance anomaly detection beyond particle physics, into disciplines such as cosmology or network traffic analysis, where data complexity and volume pose significant analysis challenges.

Theoretically, ANODE's integration of density estimation offers an empirical path to uncover subtle deformations in expected data distributions, potentially hinting at new physics phenomena without the constraints of predefined models. As neural density estimation techniques evolve, future work could explore the incorporation of more expressive models like neural spline flows.

In conclusion, ANODE shows how neural density estimation can be combined with an established analysis technique, sideband interpolation, to perform anomaly detection on physical data. As density estimators improve, it may serve as a template for similar efforts beyond its initial application context.
