
Attention Adversarial Dual AutoEncoder

Updated 2 December 2025
  • The paper introduces ADAEN, a hybrid framework combining dual autoencoder architectures with an attention layer to prioritize salient features in anomaly detection.
  • It integrates adversarial training with ranking-based prioritization and active learning, using GAN-generated augmentations to reduce labeled data requirements.
  • Empirical results demonstrate significant improvements in anomaly scoring and ranking across diverse operating systems, highlighting its applicability in real-world security contexts.

The Attention Adversarial Dual AutoEncoder (ADAEN) is a hybrid neural anomaly detection framework introduced for highly imbalanced detection tasks such as advanced persistent threat (APT) identification in security provenance trace data. ADAEN integrates dual autoencoder architectures, an attention mechanism, adversarial learning, ranking-based prioritization, and a data-efficient active learning loop with generative augmentation to achieve superior anomaly prioritization with minimal labeled data requirements (Benabderrahmane et al., 25 Nov 2025).

1. Dual AutoEncoder Architecture

ADAEN consists of two feed-forward autoencoders (AEs), denoted AE₁ and AE₂. AE₁ functions as a generator focusing on capturing core data characteristics, while AE₂ serves as a complementary refiner/discriminator, promoting distinct but compatible latent representations. Formally, let $\mathcal{X} = \{x^{(i)}\}_{i=1}^m$ with $x^{(i)} \in \mathbb{R}^d$. Each AE maps $x \in \mathbb{R}^d$ to a latent code $z \in \mathbb{R}^k$ and reconstructs it:

$$z_j = f_{\theta_j}(x) = \sigma(W_e^{(j)} x + b_e^{(j)}),$$

$$\hat{x}_j = g_{\phi_j}(z_j) = \sigma(W_d^{(j)} z_j + b_d^{(j)}), \quad j = 1, 2$$

Each network uses LeakyReLU activations, batch normalization, and dropout. The loss for each AE is its mean squared reconstruction error:

$$\mathcal{L}_{rec_j} = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \| x - g_{\phi_j}(f_{\theta_j}(x)) \|_2^2$$

Total reconstruction loss is a weighted combination:

$$\mathcal{L}_{rec} = \alpha \mathcal{L}_{rec_1} + (1 - \alpha) \mathcal{L}_{rec_2}, \quad \alpha = 0.5$$
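The following is a minimal PyTorch sketch of the two AE branches and the weighted reconstruction loss; the hidden width, dropout rate, and exact layer ordering are illustrative assumptions, as they are not specified above:

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """One feed-forward AE branch with LeakyReLU, batch-norm, and dropout."""
    def __init__(self, d: int, k: int = 32, hidden: int = 128, p_drop: float = 0.2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d, hidden), nn.BatchNorm1d(hidden),
            nn.LeakyReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, k),
        )
        self.decoder = nn.Sequential(
            nn.Linear(k, hidden), nn.BatchNorm1d(hidden),
            nn.LeakyReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, d),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def reconstruction_loss(ae1: AE, ae2: AE, x: torch.Tensor, alpha: float = 0.5):
    """L_rec = alpha * L_rec1 + (1 - alpha) * L_rec2, each branch using MSE."""
    mse = nn.functional.mse_loss
    return alpha * mse(ae1(x), x) + (1.0 - alpha) * mse(ae2(x), x)
```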

2. Attention Mechanism

ADAEN incorporates an attention layer between AE₁'s encoder and decoder to enhance focus on salient features or temporal slices of the latent code, $z_1 \in \mathbb{R}^{T \times k}$. For each slice $h_i \in \mathbb{R}^k$ and a context vector $v \in \mathbb{R}^k$, attention weights are computed as:

$$e_i = v^{\top} h_i, \quad \xi_i = \frac{\exp(e_i)}{\sum_{j=1}^{T} \exp(e_j)}$$

The attended summary $c$ is:

$$c = \sum_{i=1}^{T} \xi_i h_i$$

This vector $c$ is input to the decoder, enabling adaptive emphasis on features relevant for reconstruction and, consequently, anomaly discrimination.
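A compact PyTorch sketch of this attention pooling; treating the context vector $v$ as a learned parameter (and its initialization) is an assumption consistent with the formulation above:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Softmax attention over T latent slices h_1..h_T with a learned context vector v."""
    def __init__(self, k: int):
        super().__init__()
        self.v = nn.Parameter(torch.randn(k) / k ** 0.5)  # context vector (assumed init)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, T, k); scores e_i = v^T h_i for each slice
        e = h @ self.v                        # (batch, T)
        xi = torch.softmax(e, dim=1)          # attention weights xi_i
        return (xi.unsqueeze(-1) * h).sum(1)  # attended summary c, shape (batch, k)
```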

3. Adversarial Training Regime

ADAEN introduces an adversarial game in which a discriminator $D_\psi$ (parameterized as a 3-layer MLP) aims to separate genuine data from the reconstructed outputs of AE₁ and AE₂. The adversarial losses are:

Discriminator:

$$\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{data}}[\log D_\psi(x)] - \mathbb{E}_{\hat{x}_j \sim p_{rec}}[\log(1 - D_\psi(\hat{x}_j))], \quad j = 1, 2$$

Generator (AE₁, AE₂):

$$\mathcal{L}_{adv} = -\,\mathbb{E}_{\hat{x}_j \sim p_{rec}}[\log D_\psi(\hat{x}_j)], \quad j = 1, 2$$

The overall objective combines reconstruction and adversarial losses via a hyperparameter $\lambda$ (default $\lambda = 0.5$):

$$\mathcal{L}_{ADAEN} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{adv}$$

Optimization alternates between minimizing $\mathcal{L}_D$ (updating $\psi$ with the AE parameters fixed) and minimizing $\mathcal{L}_{ADAEN}$ (updating the AE parameters with $\psi$ fixed).
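A condensed sketch of one alternating update in PyTorch; the binary cross-entropy formulation mirrors the losses above, while the optimizer objects and a logit-valued discriminator are assumptions:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(x, ae1, ae2, disc, opt_ae, opt_d, alpha=0.5, lam=0.5):
    """One alternating update: discriminator step, then generator (AE) step."""
    x1_hat, x2_hat = ae1(x), ae2(x)

    # Discriminator: push D(x) -> 1 on real data, D(x_hat) -> 0 on reconstructions.
    opt_d.zero_grad()
    d_real = disc(x)
    d_fake = disc(torch.cat([x1_hat.detach(), x2_hat.detach()]))
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # Generators: weighted reconstruction plus adversarial term -E[log D(x_hat)].
    opt_ae.zero_grad()
    mse = nn.functional.mse_loss
    loss_rec = alpha * mse(x1_hat, x) + (1 - alpha) * mse(x2_hat, x)
    d_out = disc(torch.cat([x1_hat, x2_hat]))
    loss_adv = bce(d_out, torch.ones_like(d_out))
    (loss_rec + lam * loss_adv).backward()
    opt_ae.step()
    return loss_d.item(), loss_rec.item(), loss_adv.item()
```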

4. Anomaly Scoring and Ranking

During inference, ADAEN computes an anomaly score for each $x$:

$$s(x) = \alpha \|x - \hat{x}_1\|_2^2 + (1 - \alpha) \|x - \hat{x}_2\|_2^2$$

Samples are sorted in descending order of $s(x)$, producing a ranked anomaly list. Evaluation prioritizes high-fidelity ranking, employing the normalized discounted cumulative gain (nDCG):

$$DCG = \sum_{i=1}^{N} \frac{rel_i}{\log_2(i + 1)}, \quad nDCG = \frac{DCG}{iDCG}$$

where $rel_i \in \{0, 1\}$ is the anomaly label at rank $i$ and $iDCG$ is the ideal $DCG$.
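A short NumPy sketch of the score and the nDCG metric; the array names (`rec1`, `rec2` for the two branch reconstructions) are illustrative:

```python
import numpy as np

def anomaly_scores(x, rec1, rec2, alpha=0.5):
    """s(x) = alpha * ||x - x1_hat||^2 + (1 - alpha) * ||x - x2_hat||^2, row-wise."""
    return alpha * ((x - rec1) ** 2).sum(1) + (1 - alpha) * ((x - rec2) ** 2).sum(1)

def ndcg(scores, labels):
    """nDCG of the descending-score ranking against binary anomaly labels."""
    order = np.argsort(-scores)                       # rank by descending s(x)
    rel = labels[order].astype(float)
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = (rel * discounts).sum()
    idcg = (np.sort(labels)[::-1] * discounts).sum()  # ideal ordering: anomalies first
    return dcg / idcg if idcg > 0 else 0.0
```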

5. Active Learning with Data Augmentation

To address limited labeled anomaly data, ADAEN incorporates an active learning protocol:

  1. Train ADAEN on the initial labeled set $D_L$ (all normals).
  2. Score the unlabeled pool $D_U$ by $s(x)$.
  3. Set a threshold $\tau$ at the $q$-th percentile of $s(x)$.
  4. Compute the uncertainty $U(x) = |s(x) - \tau|$ for $x \in D_U$.
  5. Select the top-$Q$ points (smallest $U(x)$) and query an oracle for ground truth.
  6. For points confirmed as normal, augment using a GAN $(G_{GAN}, D_{GAN})$:
    • Train the GAN on oracle-confirmed normals.
    • Generate synthetic normals $\hat{y} = G_{GAN}(z), \; z \sim \mathcal{N}(0, I)$.
  7. Update $D_L \leftarrow D_L \cup$ {new normals, real and synthetic}.
  8. Retrain ADAEN and repeat for up to $N_{iter}$ rounds.

Pseudocode (as in (Benabderrahmane et al., 25 Nov 2025)):

```
Input: D_L (labeled normals), D_U (unlabeled), Q (query budget), percentile q, N_iter
for t in 1..N_iter:
    1. Train ADAEN on D_L.
    2. For each x ∈ D_U, compute s(x) = anomaly score.
    3. Set τ ← percentile_q({s(x) : x ∈ D_U}).
    4. Compute U(x) = |s(x) − τ|.
    5. Select {x_i}_{i=1..Q} = top-Q by smallest U(x).
    6. Query oracle for labels y_i.
    7. Let N_new = {x_i | y_i = normal}.
    8. Train GAN on N_new → generate Ŷ_new.
    9. D_L ← D_L ∪ N_new ∪ Ŷ_new; D_U ← D_U \ {x_i}.
end
Output: final ranked list by s(x) on remaining D_U.
```
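For orientation, here is a hedged Python skeleton of the same loop; `train_adaen`, `train_gan`, and `query_oracle` are hypothetical stand-ins for the components described above, and label 0 is assumed to mean "normal":

```python
import numpy as np

def active_learning_loop(D_L, D_U, Q, q=80, n_iter=40,
                         train_adaen=None, train_gan=None, query_oracle=None):
    """Active learning with GAN-based augmentation (hypothetical helpers)."""
    for t in range(n_iter):
        model = train_adaen(D_L)                      # 1. train on current normals
        s = model.score(D_U)                          # 2. anomaly scores s(x)
        tau = np.percentile(s, q)                     # 3. q-th percentile threshold
        U = np.abs(s - tau)                           # 4. uncertainty U(x)
        idx = np.argsort(U)[:Q]                       # 5. Q most uncertain points
        labels = query_oracle(D_U[idx])               # 6. ground-truth labels
        new_normals = D_U[idx][labels == 0]           # 7. oracle-confirmed normals
        synth = train_gan(new_normals).sample(len(new_normals))  # 8. synthetic normals
        D_L = np.concatenate([D_L, new_normals, synth])          # 9. grow labeled set
        D_U = np.delete(D_U, idx, axis=0)
    return model.score(D_U)  # final ranking scores on the remaining pool
```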

6. Implementation and Hyperparameters

The ADAEN configuration is defined by several key hyperparameters and training details:

| Symbol | Meaning | Default/Example Value |
| --- | --- | --- |
| $d$ | Input dimension | dataset-dependent |
| $k$ | Latent code size | $k = 32$ |
| $\alpha$ | AE₁/AE₂ reconstruction weight | $\alpha = 0.5$ |
| $\lambda$ | Adversarial term weight | $\lambda = 0.5$ |
| $\tau$ | Anomaly threshold (at the $q$-th score percentile) | $q = 80$ |
| $Q$ | Oracle queries per round | user-defined |
| $N_{iter}$ | Max active learning iterations | 40 |
| $\sigma$ | Activation function | LeakyReLU, with batch-norm and dropout |
| $D_\psi$ | Discriminator architecture | 3-layer MLP |
| $G_{GAN}, D_{GAN}$ | GAN generator/discriminator | see (Benabderrahmane et al., 25 Nov 2025) |
Optimization uses Adam (lr $= 10^{-4}$, $\beta_1 = 0.5$, $\beta_2 = 0.999$), batch size 128, and early stopping with patience 10 on validation $\mathcal{L}_{rec}$.
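For concreteness, these optimizer settings translate to the following PyTorch setup; the placeholder modules merely stand in for AE₁, AE₂, and $D_\psi$:

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for AE1, AE2, and the discriminator D_psi
ae1, ae2, disc = nn.Linear(64, 64), nn.Linear(64, 64), nn.Linear(64, 1)

# Adam with the settings reported above: lr = 1e-4, betas = (0.5, 0.999)
opt_ae = torch.optim.Adam(
    list(ae1.parameters()) + list(ae2.parameters()), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4, betas=(0.5, 0.999))
```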

7. Application Context and Empirical Results

ADAEN targets extreme class-imbalance settings, exemplified by APT detection where attacks comprise as little as 0.004% of events in provenance trace databases (DARPA Transparent Computing). Empirical evaluation spans Android, Linux, BSD, and Windows datasets under two distinct attack scenarios. Adoption of the ranking- and active-learning-enhanced protocol yields significant improvements in detection rates and ranking metrics (nDCG), outperforming previous approaches in prioritizing true anomalies with reduced labeling overhead. A plausible implication is that such an architecture facilitates deployment in real-world security infrastructure with minimal manual annotation requirements (Benabderrahmane et al., 25 Nov 2025).
