Compressed Anomaly Detection with Multiple Mixed Observations
Published 31 Jan 2018 in cs.IT, cs.DS, eess.SP, math.NA, and math.OC | (1801.10264v2)
Abstract: We consider a collection of independent random variables that are identically distributed, except for a small subset which follows a different, anomalous distribution. We study the problem of detecting which random variables in the collection are governed by the anomalous distribution. Recent work proposes to solve this problem by conducting hypothesis tests based on mixed observations (e.g. linear combinations) of the random variables. Recognizing the connection between taking mixed observations and compressed sensing, we view the problem as recovering the "support" (index set) of the anomalous random variables from multiple measurement vectors (MMVs). Many algorithms have been developed for recovering jointly sparse signals and their support from MMVs. We establish the theoretical and empirical effectiveness of these algorithms at detecting anomalies. We also extend the LASSO algorithm to an MMV version for our purpose. Further, we perform experiments on synthetic data, consisting of samples from the random variables, to explore the trade-off between the number of mixed observations per sample and the number of samples required to detect anomalies.
The paper introduces novel JSM-2R and JSM-3R models to recover anomaly indices using mixed observations and compressed sensing concepts.
It adapts five algorithms (OSGA, MMV-SOMP, MMV-LASSO, TECC, and ACIE), demonstrating effective recovery of the sparse anomaly support.
Experimental evaluation confirms that the algorithms can reliably detect anomalies with few measurements per sample, and that the number of anomalies can be estimated dynamically under varying conditions.
Introduction
The paper "Compressed Anomaly Detection with Multiple Mixed Observations" (1801.10264) tackles a critical problem in various scientific and engineering domains: anomaly detection among a collection of random variables. Traditional methods of anomaly detection typically involve sampling each random variable individually followed by hypothesis testing techniques. In contrast, this paper explores a novel approach involving hypothesis tests conducted on mixed observations of these random variables, drawing upon concepts from compressed sensing.
The goal is to identify which variables follow an anomalous distribution, diverging from the norm followed by the majority. The authors view this task as analogous to recovering the "support" or index set of the anomalous signals within the framework of multiple measurement vectors (MMVs). Various algorithms originally designed for recovering jointly sparse signals are adapted to address anomaly detection, with theoretical and empirical analysis provided to demonstrate their effectiveness.
Figure 1: Depiction of the existing joint sparsity models (JSM-2 and JSM-3) and the new models developed for anomaly detection (JSM-2R and JSM-3R).
Methodology
Joint Sparsity Models for Anomaly Detection
The paper introduces two novel models, JSM-2R and JSM-3R, inspired by previously established joint sparsity models for correlated signals. These models capture the correlation structure among independent realizations of random variables and are crucial for anomaly detection:
JSM-2R: Characterizes signals whose entries are small in amplitude (zero-mean, low variance) at indices following the prevalent distribution and larger in amplitude at the anomalous indices. This mirrors sparse signal models, with the anomalous index set playing the role of the "support."
JSM-3R: Extends JSM-3 by introducing a common component shared across signals, with varying innovation components following the JSM-2R model.
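A minimal sketch of generating JSM-2R-style data and taking mixed observations may help fix ideas. It assumes a shared Gaussian sensing matrix and illustrative parameter values (the variable names and the variance levels are hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M, T = 20, 3, 10, 50          # variables, anomalies, measurements/step, time-steps

anomalous = rng.choice(N, size=K, replace=False)      # hypothetical anomaly support
sigma = np.ones(N)                                    # prevalent: unit variance
sigma[anomalous] = 5.0                                # anomalous: larger variance

X = rng.normal(0.0, sigma, size=(T, N))               # T independent realizations
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))  # Gaussian sensing matrix
Y = X @ Phi.T                                         # mixed observations, shape (T, M)
```

Each row of Y is one time-step of M mixed observations; the detection task is to recover `anomalous` from Y and Phi alone.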
Algorithms
Five algorithms are discussed, each adapted for detecting anomalous random variables:
OSGA (One-Step Greedy Algorithm): Adapted for JSM-2R, leveraging inner products between measurements and sensing matrix columns to recover anomaly indices.
Figure 2: The recovery phase transition for the OSGA algorithm with K=1, 5, and 10 anomalous random variables.
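The OSGA scoring step can be sketched as follows, assuming measurements Y of shape (T, M) taken with a shared sensing matrix Phi of shape (M, N); this is an illustrative reimplementation, not the paper's code:

```python
import numpy as np

def osga(Y, Phi, K):
    """One-Step Greedy Algorithm (sketch): score each index by the average
    squared inner product between the measurements and the corresponding
    sensing-matrix column, then keep the K highest-scoring indices."""
    scores = np.mean((Y @ Phi) ** 2, axis=0)   # per-index test statistics
    return np.sort(np.argsort(scores)[-K:])    # estimated anomaly index set
```

Because anomalous variables have larger variance, their columns accumulate larger squared correlations as the number of time-steps grows, which is the intuition behind the asymptotic guarantees discussed later.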
MMV-SOMP (Simultaneous Orthogonal Matching Pursuit): Iteratively reconstructs sparse support in JSM-2R signals by identifying anomaly indices one at a time, updating residuals iteratively.
Figure 3: The recovery phase transition for the MMV-SOMP algorithm with K=1, 5, and 10 anomalous random variables.
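A sketch of the simultaneous greedy selection, assuming a shared sensing matrix across time-steps (the projection step here uses a QR factorization for simplicity; the exact update in the paper may differ):

```python
import numpy as np

def mmv_somp(Y, Phi, K):
    """MMV-SOMP sketch: greedily pick one index per iteration by the summed
    correlation between residuals and columns, then project the residuals
    onto the orthogonal complement of the selected columns."""
    R = Y.copy()                                             # residuals, shape (T, M)
    support = []
    for _ in range(K):
        corr = np.abs(R @ Phi).sum(axis=0) / np.linalg.norm(Phi, axis=0)
        corr[support] = -np.inf                              # never pick an index twice
        support.append(int(np.argmax(corr)))
        Q, _ = np.linalg.qr(Phi[:, support])                 # orthonormal basis of chosen columns
        R = Y - (Y @ Q) @ Q.T                                # residual after projection
    return sorted(support)
```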
MMV-LASSO (Least Absolute Shrinkage and Selection Operator): Extends conventional LASSO to MMV settings, offering a robust mechanism for anomaly detection with minimal measurements.
Figure 4: The recovery phase transition for the MMV-LASSO algorithm with K=1, 5, and 10 anomalous random variables.
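One common way to extend LASSO to the MMV setting is a group (row-sparse) penalty solved by proximal gradient descent; the sketch below follows that route and may differ in details from the paper's exact MMV-LASSO formulation:

```python
import numpy as np

def mmv_lasso(Y, Phi, lam, iters=500):
    """Row-sparse group-LASSO sketch solved by proximal gradient (ISTA):
    minimize 0.5 * ||Y.T - Phi @ X||_F^2 + lam * sum_n ||X[n, :]||_2,
    where row n of X collects the values of variable n across time-steps."""
    T, _ = Y.shape
    N = Phi.shape[1]
    X = np.zeros((N, T))
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        G = Phi.T @ (Phi @ X - Y.T)               # gradient of the quadratic term
        Z = X - step * G
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0)
        X = shrink * Z                            # row-wise soft threshold
    return X
```

Anomaly indices are then read off as the rows of the recovered X with the largest norms.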
TECC (Transpose Estimation of Common Component) and ACIE (Alternating Common and Innovation Estimation) are tailored for JSM-3R signals, iteratively estimating common components and refining residuals to detect anomalies.
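A plausible sketch of the TECC idea, hedged heavily: assuming a shared Gaussian sensing matrix whose Gram matrix is close to the identity in expectation, the common component can be roughly estimated by applying the transpose of Phi to the time-averaged measurements (the innovations average out), after which the residuals can be scored as in OSGA. The exact scaling and iteration in the paper's TECC and ACIE may differ:

```python
import numpy as np

def tecc(Y, Phi, K):
    """TECC-style sketch for JSM-3R: estimate the common component via the
    transpose of Phi applied to the time-averaged measurements, cancel its
    contribution, then score the innovation residuals as in OSGA."""
    z_hat = Phi.T @ Y.mean(axis=0)            # crude common-component estimate
    R = Y - Phi @ z_hat                       # cancel common part (broadcast over T)
    scores = np.mean((R @ Phi) ** 2, axis=0)  # OSGA-style per-index statistics
    return np.sort(np.argsort(scores)[-K:])
```

ACIE refines this by alternating between re-estimating the common component from the current support estimate and re-scoring the residuals.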
Experimental Evaluation
Extensive numerical experiments verify the effectiveness of these algorithms across varying signal models and parameters. The results reveal key insights:
Performance Impacts: Algorithms exhibit varying sensitivities to the number of anomalies, measurements per time-step, and overall time-steps. MMV-LASSO demonstrates robustness across different parameter settings.
Estimates of Anomalies: Approaches to estimate the number of anomalies dynamically show promise, leveraging statistical thresholds in test statistics and reconstructed signal amplitudes.
Figure 5: Plots of the per-index values from which the top K indices are selected in the JSM-2R algorithms. The dotted line denotes the drop between the top K values and the remaining N−K values.
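The drop between the top K statistics and the rest suggests a simple gap heuristic for estimating K when it is unknown; a minimal sketch (the function name and the heuristic's exact form are illustrative, not the paper's):

```python
import numpy as np

def estimate_k(scores):
    """Estimate the number of anomalies from per-index statistics by
    locating the largest drop between consecutive sorted values."""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]   # descending order
    gaps = s[:-1] - s[1:]                                # consecutive drops
    return int(np.argmax(gaps)) + 1                      # count above the biggest drop
```

For example, for statistics `[10, 9.5, 1, 0.9, 0.8]` the largest drop sits between the second and third values, so the estimate is K = 2.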
Theoretical Insights
Theoretical results provide confidence in the asymptotic performance of these algorithms. Under conditions of distinct variances between prevalent and anomalous distributions, algorithms like OSGA and TECC with OSGA are shown to reliably recover anomaly indices, even with minimal measurements.
Conclusion
This work makes significant strides in framing anomaly detection as a compressed sensing problem, offering computational efficiency and minimal distribution knowledge requirements. Future research directions include optimizing sensing matrix design, extending theoretical guarantees to more of the algorithms, and exploring applicability to heavy-tailed distributions.
The findings underline the potential for leveraging MMV techniques in anomaly detection, paving the way for innovative applications in data-driven domains.