
Attention Adversarial Dual AutoEncoder

Updated 2 December 2025
  • The paper introduces ADAEN, a hybrid framework combining dual autoencoder architectures with an attention layer to prioritize salient features in anomaly detection.
  • It integrates adversarial training with ranking-based prioritization and active learning, using GAN-generated augmentations to reduce labeled data requirements.
  • Empirical results demonstrate significant improvements in anomaly scoring and ranking across diverse operating systems, highlighting its applicability in real-world security contexts.

The Attention Adversarial Dual AutoEncoder (ADAEN) is a hybrid neural anomaly detection framework introduced for highly imbalanced detection tasks such as advanced persistent threat (APT) identification in security provenance trace data. ADAEN integrates dual autoencoder architectures, an attention mechanism, adversarial learning, ranking-based prioritization, and a data-efficient active learning loop with generative augmentation to achieve superior anomaly prioritization with minimal labeled data requirements (Benabderrahmane et al., 25 Nov 2025).

1. Dual AutoEncoder Architecture

ADAEN consists of two feed-forward autoencoders (AEs), denoted AE₁ and AE₂. AE₁ functions as a generator focusing on capturing core data characteristics, while AE₂ serves as a complementary refiner/discriminator, promoting distinct but compatible latent representations. Formally, let $\mathcal{X} = \{x^{(i)}\}_{i=1}^m$ with $x^{(i)} \in \mathbb{R}^d$. Each AE maps $x \in \mathbb{R}^d$ to a latent code $z \in \mathbb{R}^k$ and reconstructs it:

$$z_j = f_{\theta_j}(x) = \sigma(W_e^{(j)} x + b_e^{(j)}),$$

$$\hat{x}_j = g_{\phi_j}(z_j) = \sigma(W_d^{(j)} z_j + b_d^{(j)}), \quad j = 1, 2$$

Each network uses LeakyReLU activations, batch normalization, and dropout. The loss for each AE is its mean squared reconstruction error:

$$\mathcal{L}_{rec_j} = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \| x - g_{\phi_j}(f_{\theta_j}(x)) \|_2^2$$

Total reconstruction loss is a weighted combination:

$$\mathcal{L}_{rec} = \alpha \mathcal{L}_{rec_1} + (1 - \alpha) \mathcal{L}_{rec_2}, \quad \alpha = 0.5$$
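The following is a minimal PyTorch sketch of the two AE branches and the weighted reconstruction loss; the hidden width, dropout rate, and exact layer ordering are illustrative assumptions, as they are not specified above:

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """One feed-forward AE branch with LeakyReLU, batch-norm, and dropout."""
    def __init__(self, d: int, k: int = 32, hidden: int = 128, p_drop: float = 0.2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d, hidden), nn.BatchNorm1d(hidden),
            nn.LeakyReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, k),
        )
        self.decoder = nn.Sequential(
            nn.Linear(k, hidden), nn.BatchNorm1d(hidden),
            nn.LeakyReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, d),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def reconstruction_loss(ae1: AE, ae2: AE, x: torch.Tensor, alpha: float = 0.5):
    """L_rec = alpha * L_rec1 + (1 - alpha) * L_rec2, each branch using MSE."""
    mse = nn.functional.mse_loss
    return alpha * mse(ae1(x), x) + (1.0 - alpha) * mse(ae2(x), x)
```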

2. Attention Mechanism

ADAEN incorporates an attention layer between AE₁'s encoder and decoder to enhance focus on salient features or temporal slices of the latent code, $z_1 \in \mathbb{R}^{T \times k}$. For each slice $h_i \in \mathbb{R}^k$ and a context vector $v \in \mathbb{R}^k$, attention weights are computed as:

$$e_i = v^{\top} h_i, \quad \xi_i = \frac{\exp(e_i)}{\sum_{j=1}^{T} \exp(e_j)}$$

The attended summary $c$ is:

$$c = \sum_{i=1}^{T} \xi_i h_i$$

This vector $c$ is input to the decoder, enabling adaptive emphasis on features relevant for reconstruction and, consequently, anomaly discrimination.
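A compact PyTorch sketch of this attention pooling; treating the context vector $v$ as a learned parameter (and its initialization) is an assumption consistent with the formulation above:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Softmax attention over T latent slices h_1..h_T with a learned context vector v."""
    def __init__(self, k: int):
        super().__init__()
        self.v = nn.Parameter(torch.randn(k) / k ** 0.5)  # context vector (assumed init)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, T, k); scores e_i = v^T h_i for each slice
        e = h @ self.v                        # (batch, T)
        xi = torch.softmax(e, dim=1)          # attention weights xi_i
        return (xi.unsqueeze(-1) * h).sum(1)  # attended summary c, shape (batch, k)
```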

3. Adversarial Training Regime

ADAEN introduces an adversarial game in which a discriminator $D_\psi$ (parameterized as a 3-layer MLP) aims to separate genuine data from the reconstructed outputs of AE₁ and AE₂. The adversarial losses are:

Discriminator:

$$\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{data}}[\log D_\psi(x)] - \mathbb{E}_{\hat{x}_j \sim p_{rec}}[\log(1 - D_\psi(\hat{x}_j))], \quad j = 1, 2$$

Generator (AE₁, AE₂):

$$\mathcal{L}_{adv} = -\,\mathbb{E}_{\hat{x}_j \sim p_{rec}}[\log D_\psi(\hat{x}_j)], \quad j = 1, 2$$

The overall objective combines reconstruction and adversarial losses via a hyperparameter $\lambda$ (default $\lambda = 0.5$):

$$\mathcal{L}_{ADAEN} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{adv}$$

Optimization alternates between minimizing $\mathcal{L}_D$ (updating $\psi$ with the AE parameters fixed) and minimizing $\mathcal{L}_{ADAEN}$ (updating the AE parameters with $\psi$ fixed).
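A condensed sketch of one alternating update in PyTorch; the binary cross-entropy formulation mirrors the losses above, while the optimizer objects and a logit-valued discriminator are assumptions:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(x, ae1, ae2, disc, opt_ae, opt_d, alpha=0.5, lam=0.5):
    """One alternating update: discriminator step, then generator (AE) step."""
    x1_hat, x2_hat = ae1(x), ae2(x)

    # Discriminator: push D(x) -> 1 on real data, D(x_hat) -> 0 on reconstructions.
    opt_d.zero_grad()
    d_real = disc(x)
    d_fake = disc(torch.cat([x1_hat.detach(), x2_hat.detach()]))
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # Generators: weighted reconstruction plus adversarial term -E[log D(x_hat)].
    opt_ae.zero_grad()
    mse = nn.functional.mse_loss
    loss_rec = alpha * mse(x1_hat, x) + (1 - alpha) * mse(x2_hat, x)
    d_out = disc(torch.cat([x1_hat, x2_hat]))
    loss_adv = bce(d_out, torch.ones_like(d_out))
    (loss_rec + lam * loss_adv).backward()
    opt_ae.step()
    return loss_d.item(), loss_rec.item(), loss_adv.item()
```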

4. Anomaly Scoring and Ranking

During inference, ADAEN computes an anomaly score for each $x$:

$$s(x) = \alpha \|x - \hat{x}_1\|_2^2 + (1 - \alpha) \|x - \hat{x}_2\|_2^2$$

Samples are sorted in descending order of $s(x)$, producing a ranked anomaly list. Evaluation prioritizes high-fidelity ranking, employing the normalized discounted cumulative gain (nDCG):

$$DCG = \sum_{i=1}^{N} \frac{rel_i}{\log_2(i + 1)}, \quad nDCG = \frac{DCG}{iDCG}$$

where $rel_i \in \{0, 1\}$ is the anomaly label at rank $i$ and $iDCG$ is the ideal $DCG$.
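A short NumPy sketch of the score and the nDCG metric; the array names (`rec1`, `rec2` for the two branch reconstructions) are illustrative:

```python
import numpy as np

def anomaly_scores(x, rec1, rec2, alpha=0.5):
    """s(x) = alpha * ||x - x1_hat||^2 + (1 - alpha) * ||x - x2_hat||^2, row-wise."""
    return alpha * ((x - rec1) ** 2).sum(1) + (1 - alpha) * ((x - rec2) ** 2).sum(1)

def ndcg(scores, labels):
    """nDCG of the descending-score ranking against binary anomaly labels."""
    order = np.argsort(-scores)                       # rank by descending s(x)
    rel = labels[order].astype(float)
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = (rel * discounts).sum()
    idcg = (np.sort(labels)[::-1] * discounts).sum()  # ideal ordering: anomalies first
    return dcg / idcg if idcg > 0 else 0.0
```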

5. Active Learning with Data Augmentation

To address limited labeled anomaly data, ADAEN incorporates an active learning protocol:

  1. Train ADAEN on the initial labeled set $D_L$ (all normals).
  2. Score the unlabeled pool $D_U$ by $s(x)$.
  3. Set a threshold $\tau$ at the $q$-th percentile of $s(x)$.
  4. Compute the uncertainty $U(x) = |s(x) - \tau|$ for $x \in D_U$.
  5. Select the top-$Q$ points (smallest $U(x)$) and query an oracle for ground truth.
  6. For points confirmed as normal, augment using a GAN $(G_{GAN}, D_{GAN})$:
    • Train the GAN on oracle-confirmed normals.
    • Generate synthetic normals $\hat{y} = G_{GAN}(z), \; z \sim \mathcal{N}(0, I)$.
  7. Update $D_L \leftarrow D_L \cup$ {new normals, real and synthetic}.
  8. Retrain ADAEN and repeat for up to $N_{iter}$ rounds.

Pseudocode (as in (Benabderrahmane et al., 25 Nov 2025)):

```
Input: D_L (labeled normals), D_U (unlabeled), Q (query budget), percentile q, N_iter
for t in 1..N_iter:
    1. Train ADAEN on D_L.
    2. For each x ∈ D_U, compute s(x) = anomaly score.
    3. Set τ ← percentile_q({s(x) : x ∈ D_U}).
    4. Compute U(x) = |s(x) − τ|.
    5. Select {x_i}_{i=1..Q} = top-Q by smallest U(x).
    6. Query oracle for labels y_i.
    7. Let N_new = {x_i | y_i = normal}.
    8. Train GAN on N_new → generate Ŷ_new.
    9. D_L ← D_L ∪ N_new ∪ Ŷ_new; D_U ← D_U \ {x_i}.
end
Output: final ranked list by s(x) on remaining D_U.
```
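For orientation, here is a hedged Python skeleton of the same loop; `train_adaen`, `train_gan`, and `query_oracle` are hypothetical stand-ins for the components described above, and label 0 is assumed to mean "normal":

```python
import numpy as np

def active_learning_loop(D_L, D_U, Q, q=80, n_iter=40,
                         train_adaen=None, train_gan=None, query_oracle=None):
    """Active learning with GAN-based augmentation (hypothetical helpers)."""
    for t in range(n_iter):
        model = train_adaen(D_L)                      # 1. train on current normals
        s = model.score(D_U)                          # 2. anomaly scores s(x)
        tau = np.percentile(s, q)                     # 3. q-th percentile threshold
        U = np.abs(s - tau)                           # 4. uncertainty U(x)
        idx = np.argsort(U)[:Q]                       # 5. Q most uncertain points
        labels = query_oracle(D_U[idx])               # 6. ground-truth labels
        new_normals = D_U[idx][labels == 0]           # 7. oracle-confirmed normals
        synth = train_gan(new_normals).sample(len(new_normals))  # 8. synthetic normals
        D_L = np.concatenate([D_L, new_normals, synth])          # 9. grow labeled set
        D_U = np.delete(D_U, idx, axis=0)
    return model.score(D_U)  # final ranking scores on the remaining pool
```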

6. Implementation and Hyperparameters

The ADAEN configuration is defined by several key hyperparameters and training details:

| Symbol | Meaning | Default/Example Value |
| --- | --- | --- |
| $d$ | Input dimension | dataset-dependent |
| $k$ | Latent code size | $k = 32$ |
| $\alpha$ | AE₁/AE₂ reconstruction weight | $\alpha = 0.5$ |
| $\lambda$ | Adversarial term weight | $\lambda = 0.5$ |
| $\tau$ | Anomaly threshold (at the $q$-th score percentile) | $q = 80$ |
| $Q$ | Oracle queries per round | user-defined |
| $N_{iter}$ | Max active learning iterations | 40 |
| $\sigma$ | Activation function | LeakyReLU, with batch-norm and dropout |
| $D_\psi$ | Discriminator architecture | 3-layer MLP |
| $G_{GAN}, D_{GAN}$ | GAN generator/discriminator | see (Benabderrahmane et al., 25 Nov 2025) |
Optimization uses Adam (lr $= 10^{-4}$, $\beta_1 = 0.5$, $\beta_2 = 0.999$), batch size 128, and early stopping with patience 10 on validation $\mathcal{L}_{rec}$.
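For concreteness, these optimizer settings translate to the following PyTorch setup; the placeholder modules merely stand in for AE₁, AE₂, and $D_\psi$:

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for AE1, AE2, and the discriminator D_psi
ae1, ae2, disc = nn.Linear(64, 64), nn.Linear(64, 64), nn.Linear(64, 1)

# Adam with the settings reported above: lr = 1e-4, betas = (0.5, 0.999)
opt_ae = torch.optim.Adam(
    list(ae1.parameters()) + list(ae2.parameters()), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4, betas=(0.5, 0.999))
```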

7. Application Context and Empirical Results

ADAEN targets extreme class-imbalance settings, exemplified by APT detection where attacks comprise as little as 0.004% of events in provenance trace databases (DARPA Transparent Computing). Empirical evaluation spans Android, Linux, BSD, and Windows datasets under two distinct attack scenarios. Adoption of the ranking- and active-learning-enhanced protocol yields significant improvements in detection rates and ranking metrics (nDCG), outperforming previous approaches in prioritizing true anomalies with reduced labeling overhead. A plausible implication is that such an architecture facilitates deployment in real-world security infrastructure with minimal manual annotation requirements (Benabderrahmane et al., 25 Nov 2025).
