Attention Adversarial Dual AutoEncoder
- The paper introduces ADAEN, a hybrid framework combining dual autoencoder architectures with an attention layer to prioritize salient features in anomaly detection.
- It integrates adversarial training with ranking-based prioritization and active learning, using GAN-generated augmentations to reduce labeled data requirements.
- Empirical results demonstrate significant improvements in anomaly scoring and ranking across diverse operating systems, highlighting its applicability in real-world security contexts.
The Attention Adversarial Dual AutoEncoder (ADAEN) is a hybrid neural anomaly detection framework introduced for highly imbalanced detection tasks such as advanced persistent threat (APT) identification in security provenance trace data. ADAEN integrates dual autoencoder architectures, an attention mechanism, adversarial learning, ranking-based prioritization, and a data-efficient active learning loop with generative augmentation to achieve superior anomaly prioritization with minimal labeled data requirements (Benabderrahmane et al., 25 Nov 2025).
1. Dual AutoEncoder Architecture
ADAEN consists of two feed-forward autoencoders (AEs), denoted AE₁ and AE₂. AE₁ functions as a generator focusing on capturing core data characteristics, while AE₂ serves as a complementary refiner/discriminator, promoting distinct but compatible latent representations. Formally, let $x \in \mathbb{R}^d$, with $d$ the input dimension. Each AE maps $x$ to a latent code and reconstructs it:

$$z_i = E_i(x), \qquad \hat{x}_i = D_i(z_i), \qquad i \in \{1, 2\}$$

Each network uses LeakyReLU activations, batch normalization, and dropout. The loss for each AE is its mean squared reconstruction error:

$$\mathcal{L}_i = \lVert x - \hat{x}_i \rVert_2^2, \qquad i \in \{1, 2\}$$

The total reconstruction loss is a weighted combination:

$$\mathcal{L}_{\mathrm{rec}} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2$$
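A minimal PyTorch sketch of the dual-AE forward pass and the combined reconstruction loss (the layer widths, dropout rate, and weights $\lambda_1, \lambda_2$ are illustrative assumptions, not values from the paper):

```python
import torch
import torch.nn as nn

def make_ae(d: int, m: int) -> nn.Module:
    # One feed-forward AE with LeakyReLU, batch-norm, and dropout, as described above.
    return nn.Sequential(
        nn.Linear(d, 128), nn.BatchNorm1d(128), nn.LeakyReLU(), nn.Dropout(0.2),
        nn.Linear(128, m), nn.LeakyReLU(),   # encoder output: latent code z
        nn.Linear(m, 128), nn.LeakyReLU(),
        nn.Linear(128, d),                   # decoder output: reconstruction x_hat
    )

d, m = 64, 16                    # assumed input / latent dimensions
ae1, ae2 = make_ae(d, m), make_ae(d, m)
lam1, lam2 = 0.5, 0.5            # assumed reconstruction weights

x = torch.randn(32, d)           # a batch of provenance feature vectors
x1_hat, x2_hat = ae1(x), ae2(x)
mse = nn.MSELoss()
loss_rec = lam1 * mse(x1_hat, x) + lam2 * mse(x2_hat, x)
```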
2. Attention Mechanism
ADAEN incorporates an attention layer between AE₁'s encoder and decoder to enhance focus on salient features or temporal slices of the latent code $z$. For each slice $z_t$ and context vector $c$, compute attention weights:

$$\alpha_t = \frac{\exp\big(\mathrm{score}(z_t, c)\big)}{\sum_{t'} \exp\big(\mathrm{score}(z_{t'}, c)\big)}$$

The attended summary is:

$$\tilde{z} = \sum_t \alpha_t z_t$$
This vector is input to the decoder, enabling adaptive emphasis on features relevant for reconstruction, and consequently anomaly discrimination.
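A sketch of this attention step, treating the latent code as $T$ slices of width $k$ and using an additive tanh score against a learned context vector (the slice count and scoring form are assumptions; the paper's exact parameterization may differ):

```python
import torch
import torch.nn.functional as F

T, k = 8, 16                   # assumed: T latent slices of width k
z = torch.randn(32, T, k)      # latent code reshaped into slices (batch of 32)
c = torch.randn(k)             # context vector (learned in practice; random here)

scores = torch.einsum('btk,k->bt', torch.tanh(z), c)  # score(z_t, c)
alpha = F.softmax(scores, dim=1)                      # attention weights alpha_t
z_att = torch.einsum('bt,btk->bk', alpha, z)          # attended summary -> decoder
```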
3. Adversarial Training Regime
ADAEN introduces an adversarial game where a discriminator $\mathcal{D}$ (parameterized as a 3-layer MLP) aims to separate genuine data from the reconstructed outputs of AE₁ and AE₂. The adversarial losses are:

Discriminator:

$$\mathcal{L}_{\mathcal{D}} = -\mathbb{E}_x\big[\log \mathcal{D}(x)\big] - \mathbb{E}_x\big[\log\big(1 - \mathcal{D}(\hat{x}_1)\big)\big] - \mathbb{E}_x\big[\log\big(1 - \mathcal{D}(\hat{x}_2)\big)\big]$$

Generator (AE₁, AE₂):

$$\mathcal{L}_G = -\mathbb{E}_x\big[\log \mathcal{D}(\hat{x}_1)\big] - \mathbb{E}_x\big[\log \mathcal{D}(\hat{x}_2)\big]$$

The overall objective combines reconstruction and adversarial losses via a hyperparameter $\lambda_{\mathrm{adv}}$:

$$\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{adv}} \, \mathcal{L}_G$$

Optimization alternates between minimizing $\mathcal{L}_{\mathcal{D}}$ (updating $\mathcal{D}$ with the AE parameters fixed) and minimizing $\mathcal{L}$ (updating the AE parameters with $\mathcal{D}$ fixed).
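One alternating update in PyTorch, following the two-phase scheme above (the discriminator widths, BCE loss formulation, and $\lambda_{\mathrm{adv}}$ value are assumptions; `ae1` and `ae2` are the AEs from the Section 1 sketch):

```python
import torch
import torch.nn as nn

d = 64                      # must match the AE input dimension
disc = nn.Sequential(       # 3-layer MLP discriminator (widths assumed)
    nn.Linear(d, 64), nn.LeakyReLU(),
    nn.Linear(64, 32), nn.LeakyReLU(),
    nn.Linear(32, 1),
)
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()
lam_adv = 0.1               # assumed adversarial weight

def train_step(x, ae1, ae2, opt_d, opt_g):
    x1_hat, x2_hat = ae1(x), ae2(x)
    real, fake = torch.ones(len(x), 1), torch.zeros(len(x), 1)

    # Phase 1: update the discriminator, AEs frozen (reconstructions detached).
    loss_d = (bce(disc(x), real)
              + bce(disc(x1_hat.detach()), fake)
              + bce(disc(x2_hat.detach()), fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Phase 2: update the AEs, discriminator frozen: reconstruct well and fool D.
    loss_rec = mse(x1_hat, x) + mse(x2_hat, x)   # lambda_1 = lambda_2 = 1 here
    loss_adv = bce(disc(x1_hat), real) + bce(disc(x2_hat), real)
    loss_g = loss_rec + lam_adv * loss_adv
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```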
4. Anomaly Scoring and Ranking
During inference, ADAEN computes an anomaly score $s(x)$ for each input $x$ from its reconstruction error under the dual AEs:

$$s(x) = \lambda_1 \lVert x - \hat{x}_1 \rVert_2^2 + \lambda_2 \lVert x - \hat{x}_2 \rVert_2^2$$
Samples are sorted in descending order of $s(x)$, producing a ranked anomaly list. Evaluation prioritizes high-fidelity ranking, employing the normalized discounted cumulative gain (nDCG):

$$\mathrm{nDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}, \qquad \mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},$$

where $rel_i$ is the anomaly label of the $i$-th ranked sample and $\mathrm{IDCG@}k$ is the ideal $\mathrm{DCG@}k$ obtained from a perfect ranking.
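The ranking metric in numpy, assuming binary relevance labels and the standard exponential-gain form of DCG shown above:

```python
import numpy as np

def ndcg_at_k(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """nDCG@k of anomaly scores s(x) against binary anomaly labels rel_i."""
    k = min(k, len(scores))
    order = np.argsort(-scores)                      # descending by s(x)
    rel = labels[order][:k]
    discounts = 1.0 / np.log2(np.arange(2, k + 2))   # 1 / log2(i + 1), i = 1..k
    dcg = np.sum((2.0 ** rel - 1.0) * discounts)
    ideal = np.sort(labels)[::-1][:k]                # perfect ranking
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts)
    return float(dcg / idcg) if idcg > 0 else 0.0
```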
5. Active Learning with Data Augmentation
To address limited labeled anomaly data, ADAEN incorporates an active learning protocol:
- Train ADAEN on the initial labeled set $D_L$ (all normals).
- Score the unlabeled pool $D_U$ by $s(x)$.
- Set the threshold $\tau$ at the $q$-th percentile of $\{s(x) : x \in D_U\}$.
- Compute the uncertainty $U(x) = |s(x) - \tau|$ for each $x \in D_U$.
- Select the top-$Q$ points (minimum $U(x)$) and query an oracle for ground truth.
- For points confirmed as normal, augment using a GAN $G$:
  - Train the GAN on the oracle-confirmed normals $N_{\mathrm{new}}$.
  - Generate synthetic normals $\hat{Y}_{\mathrm{new}}$.
  - Update $D_L \leftarrow D_L \cup N_{\mathrm{new}} \cup \hat{Y}_{\mathrm{new}}$ (real and synthetic).
- Retrain ADAEN and repeat for up to $N_{\mathrm{iter}}$ rounds.
Pseudocode (as in Benabderrahmane et al., 25 Nov 2025):
```
Input: D_L (labeled normals), D_U (unlabeled), Q (query budget),
       q (threshold percentile), N_iter (max rounds)
for t in 1…N_iter:
    1. Train ADAEN on D_L.
    2. For x ∈ D_U, compute s(x) = anomaly score.
    3. Set τ ← percentile_q({s(x) : x ∈ D_U}).
    4. Compute U(x) = |s(x) − τ|.
    5. Select {x_i}_{i=1…Q} = top-Q by smallest U(x).
    6. Query oracle for labels y_i.
    7. Let N_new = {x_i | y_i = normal}.
    8. Train GAN on N_new → generate Ŷ_new.
    9. D_L ← D_L ∪ N_new ∪ Ŷ_new;  D_U ← D_U ∖ {x_i}.
end
Output: final ranked list by s(x) on remaining D_U.
```
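The query-selection core of this loop (steps 3–5) reduces to a few lines of numpy; this sketch assumes `scores` holds $s(x)$ for every $x \in D_U$:

```python
import numpy as np

def select_queries(scores: np.ndarray, q: float, Q: int) -> np.ndarray:
    """Return indices of the Q most ambiguous points: those nearest tau."""
    tau = np.percentile(scores, q)         # step 3: threshold at q-th percentile
    uncertainty = np.abs(scores - tau)     # step 4: U(x) = |s(x) - tau|
    return np.argsort(uncertainty)[:Q]     # step 5: top-Q by smallest U(x)
```

The selected indices are then sent to the oracle; confirmed normals seed the GAN augmentation step.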
6. Implementation and Hyperparameters
The ADAEN configuration is defined by several key hyperparameters and training details:
| Symbol | Meaning | Default/Example Value |
|---|---|---|
| $d$ | Input dimension | (dataset-dependent) |
| $m$ | Latent code size | |
| $\lambda_1, \lambda_2$ | AE₁/AE₂ recon. weight | |
| $\lambda_{\mathrm{adv}}$ | Adversarial term weight | |
| $q$ | Anomaly threshold percentile | |
| $Q$ | Oracle queries per round | user-defined |
| $N_{\mathrm{iter}}$ | Max active learning iters | 40 |
| | Activation function | LeakyReLU + batch-norm, dropout |
| $\mathcal{D}$ | Discriminator net arch | 3-layer MLP |
| $G$ | GAN generator/discriminator | (see Benabderrahmane et al., 25 Nov 2025) |
Optimization uses Adam (lr $= 10^{-4}$, $\beta_1 = 0.5$, $\beta_2 = 0.999$), batch size 128, and early stopping with patience 10 on the validation loss.
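The corresponding optimizer and early-stopping setup, reusing `ae1`, `ae2`, and `disc` from the sketches above (the early-stopping helper is an assumed implementation detail, not code from the paper):

```python
import itertools
import torch

gen_params = itertools.chain(ae1.parameters(), ae2.parameters())
opt_g = torch.optim.Adam(gen_params, lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4, betas=(0.5, 0.999))

class EarlyStopping:
    """Stop when the validation loss fails to improve for `patience` epochs."""
    def __init__(self, patience: int = 10):
        self.patience, self.best, self.wait = patience, float('inf'), 0
    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
        return self.wait >= self.patience   # True -> halt training
```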
7. Application Context and Empirical Results
ADAEN targets extreme class-imbalance settings, exemplified by APT detection where attacks comprise as little as 0.004% of events in provenance trace databases (DARPA Transparent Computing). Empirical evaluation spans Android, Linux, BSD, and Windows datasets under two distinct attack scenarios. Adoption of the ranking- and active-learning-enhanced protocol yields significant improvements in detection rates and ranking metrics (nDCG), outperforming previous approaches in prioritizing true anomalies with reduced labeling overhead. A plausible implication is that such an architecture facilitates deployment in real-world security infrastructure with minimal manual annotation requirements (Benabderrahmane et al., 25 Nov 2025).