AutoGraphAD: Unsupervised Graph NIDS
- AutoGraphAD is an unsupervised anomaly detection framework that leverages a heterogeneous variational graph autoencoder to process NetFlow data as temporal, bipartite graphs.
- It constructs graphs by linking connection-flow nodes with IP nodes, enabling localized anomaly scoring in fixed time windows while bypassing the need for labeled data.
- The framework is trained by minimizing reconstruction and KL-divergence losses, with loss weights and the detection threshold tuned via grid search, and achieves high throughput together with robust detection performance.
AutoGraphAD is an unsupervised anomaly detection framework for network intrusion detection systems (NIDS), leveraging a Heterogeneous Variational Graph Autoencoder (VGAE) architecture to process NetFlow data as temporal heterogeneous graphs. Unlike supervised approaches, AutoGraphAD is designed to obviate the need for labeled datasets and downstream anomaly detectors by producing end-to-end anomaly scores based on reconstruction and latent space regularization (Anyfantis et al., 21 Nov 2025).
1. Graph Construction from Network Flows
AutoGraphAD operates on graphs constructed from raw NetFlow records within fixed time windows of length $\Delta t$ seconds. For each time window $w$, a heterogeneous graph $G_w$ is built with:
- Node types:
- $C$: connection-flow nodes, where each node represents a single network flow and has features $x_c \in \mathbb{R}^{d_C}$ (e.g., bytes, packets, durations, flags).
- $IP$: IP-address nodes, one for each distinct IP seen in the window, typically with placeholder or low-dimensional features (e.g., one-hot encoding for IP version).
- Edges:
- Each connection node is linked to its source and destination $IP$ nodes, with no direct $C$–$C$ or $IP$–$IP$ connections.
- Mathematical representation:
- Flow node features: $X_C \in \mathbb{R}^{|C| \times d_C}$
- IP node features: $X_{IP} \in \mathbb{R}^{|IP| \times d_{IP}}$ (low-dimensional)
- Biadjacency matrix: $B \in \{0,1\}^{|C| \times |IP|}$ expresses $C$–$IP$ relations
This structure explicitly encodes the bipartite flow–IP interaction within each snapshot, facilitating localized reasoning over short timescales.
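This construction maps directly onto standard heterogeneous-graph tooling. The following is a minimal sketch using PyTorch Geometric's `HeteroData`; the column names (`src_ip`, `dst_ip`), the node/edge type labels, and the placeholder IP features are illustrative assumptions rather than the paper's exact implementation.

```python
import pandas as pd
import torch
from torch_geometric.data import HeteroData

def build_window_graph(flows: pd.DataFrame, feat_cols: list[str]) -> HeteroData:
    """Build one heterogeneous graph from the NetFlow records of a single
    time window. Assumes `flows` has 'src_ip'/'dst_ip' columns plus numeric
    flow features in `feat_cols`."""
    g = HeteroData()

    # Connection-flow nodes: one per NetFlow record, carrying its feature vector.
    g["conn"].x = torch.tensor(flows[feat_cols].values, dtype=torch.float)

    # IP nodes: one per distinct address in the window, with placeholder features.
    ips = pd.unique(pd.concat([flows["src_ip"], flows["dst_ip"]]))
    ip_index = {ip: i for i, ip in enumerate(ips)}
    g["ip"].x = torch.ones(len(ips), 1)  # low-dimensional placeholder

    # Each connection node links only to its source and destination IP nodes;
    # there are no conn-conn or ip-ip edges (bipartite structure).
    conn_ids = torch.arange(len(flows))
    src = torch.tensor([ip_index[ip] for ip in flows["src_ip"]])
    dst = torch.tensor([ip_index[ip] for ip in flows["dst_ip"]])
    g["conn", "src", "ip"].edge_index = torch.stack([conn_ids, src])
    g["conn", "dst", "ip"].edge_index = torch.stack([conn_ids, dst])
    return g
```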
2. Model Architecture: Heterogeneous Variational Graph Autoencoder
AutoGraphAD employs a heterogeneous VGAE customized for bipartite ($C$, $IP$) graphs, reconstructing both network structure and node attributes.
- Encoder: A GNN (e.g., GraphSAGE, GCN) processes masked input graphs (masking applied only to $C$-nodes' features).
- Generates node-level variational parameters: $\mu_v$, $\log\sigma_v^2$ for all nodes $v$.
- Latent embedding via reparametrization: $z_v = \mu_v + \sigma_v \odot \epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$ and $z_v \in \mathbb{R}^{d_z}$ per node.
- Decoder:
- Structure reconstruction: Predicted adjacency $\hat{B}$, with $\hat{B}_{ci} = \sigma(z_c^\top z_i)$ for $c \in C$, $i \in IP$ (optionally per edge type).
- Feature reconstruction: On $C$-nodes only, a lightweight GNN-decoder maps $z_c$ and the graph structure to reconstructed features $\hat{x}_c$.
- Contrastive component: Negative-edge sampling during training introduces non-edge pairs at a ratio $r_{\text{neg}}$; the model is trained to output low probabilities on such non-edges, implemented via binary cross-entropy loss over edge and non-edge sets.
This architecture supports edge reconstruction and flow feature imputation in a unified variational framework.
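A compact sketch of such an architecture is given below, assuming PyTorch Geometric. The single GraphSAGE layer, latent size, and MLP feature decoder are illustrative choices within the ranges reported in Section 4, not the paper's exact configuration; reverse edge types (e.g., added with `torch_geometric.transforms.ToUndirected`) are assumed so that messages reach both node types.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import HeteroConv, SAGEConv

class HeteroVGAE(nn.Module):
    def __init__(self, d_conn: int, d_ip: int, d_z: int = 64):
        super().__init__()
        # One bipartite GraphSAGE layer per edge type; each emits 2*d_z
        # channels that are split into (mu, log-variance).
        self.enc = HeteroConv({
            ("conn", "src", "ip"): SAGEConv((d_conn, d_ip), 2 * d_z),
            ("conn", "dst", "ip"): SAGEConv((d_conn, d_ip), 2 * d_z),
            ("ip", "rev_src", "conn"): SAGEConv((d_ip, d_conn), 2 * d_z),
            ("ip", "rev_dst", "conn"): SAGEConv((d_ip, d_conn), 2 * d_z),
        }, aggr="sum")
        # Lightweight feature decoder applied to conn-node latents only.
        self.feat_dec = nn.Sequential(nn.Linear(d_z, d_z), nn.ReLU(),
                                      nn.Linear(d_z, d_conn))

    def encode(self, x_dict, edge_index_dict):
        h = self.enc(x_dict, edge_index_dict)      # {node_type: [N, 2*d_z]}
        out = {}
        for ntype, hh in h.items():
            mu, logvar = hh.chunk(2, dim=-1)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)   # reparametrization trick
            out[ntype] = (z, mu, logvar)
        return out

    def decode_edges(self, z_conn, z_ip, edge_index):
        # Inner-product structure decoder: edge probability per (conn, ip) pair.
        c, i = edge_index
        return torch.sigmoid((z_conn[c] * z_ip[i]).sum(dim=-1))

    def decode_feats(self, z_conn):
        return self.feat_dec(z_conn)
```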
3. Losses, Training Objectives, and Anomaly Scoring
- Reconstruction losses:
- Structure: $\mathcal{L}_{\text{struct}} = \mathrm{BCE}(\hat{B}, B)$, computed over observed edges and sampled non-edges.
- Features (on $C$): $\mathcal{L}_{\text{feat}} = \frac{1}{|C|}\sum_{c \in C}\lVert x_c - \hat{x}_c\rVert_2^2$ (MSE) or $\frac{1}{|C|}\sum_{c \in C}\big(1 - \cos(x_c, \hat{x}_c)\big)$ (cosine embedding), chosen per model variant.
- KL-divergence: $\mathcal{L}_{\text{KL}} = D_{\mathrm{KL}}\big(q(Z \mid G)\,\Vert\,\mathcal{N}(0, I)\big)$.
The total loss is $\mathcal{L} = \lambda_{\text{struct}}\mathcal{L}_{\text{struct}} + \lambda_{\text{feat}}\mathcal{L}_{\text{feat}} + \lambda_{\text{KL}}\mathcal{L}_{\text{KL}}$, with the $\lambda$ weights controlling relative weightings; a sketch of the objective follows below.
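A sketch of this combined objective, assuming the `HeteroVGAE` interface above; the default weights and negative-sampling ratio are placeholders within the grids described in Section 4.

```python
import torch
import torch.nn.functional as F
from torch_geometric.utils import negative_sampling

def vgae_loss(model, out, x_conn, edge_index, num_conn, num_ip,
              lam_struct=1.0, lam_feat=1.0, lam_kl=0.1, neg_ratio=0.3):
    z_conn, mu_c, logvar_c = out["conn"]
    z_ip, mu_i, logvar_i = out["ip"]

    # Structure: BCE over observed edges and sampled non-edges (contrastive term).
    neg_edges = negative_sampling(
        edge_index, num_nodes=(num_conn, num_ip),
        num_neg_samples=int(neg_ratio * edge_index.size(1)))
    pos = model.decode_edges(z_conn, z_ip, edge_index)
    neg = model.decode_edges(z_conn, z_ip, neg_edges)
    l_struct = (F.binary_cross_entropy(pos, torch.ones_like(pos)) +
                F.binary_cross_entropy(neg, torch.zeros_like(neg)))

    # Features: MSE on conn-nodes only (cosine embedding is the alternative).
    l_feat = F.mse_loss(model.decode_feats(z_conn), x_conn)

    # KL divergence of each node's posterior against N(0, I).
    def kl(mu, lv):
        return -0.5 * torch.mean(torch.sum(1 + lv - mu**2 - lv.exp(), dim=-1))
    l_kl = kl(mu_c, logvar_c) + kl(mu_i, logvar_i)

    return lam_struct * l_struct + lam_feat * l_feat + lam_kl * l_kl
```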
- Node-level anomaly scoring at inference:
For each connection node $c$, the anomaly score is $s_c = \ell_{\text{feat}}(c) + \ell_{\text{struct}}(c) + \gamma\,\ell_{\text{KL}}(c)$,
where each term corresponds to the node-specific feature/structure residual and KL penalty, and $\gamma$ is an additional weight. Predicted scores are scaled (RobustScaler), and a percentile threshold $\tau$ is selected (via grid search) so that $s_c > \tau$ flags an anomaly, as sketched below.
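The scoring and thresholding step can be sketched as follows, assuming the per-node residuals have already been computed as above; `gamma` and the percentile `q` stand in for the grid-searched values.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

def anomaly_flags(feat_resid, struct_resid, kl_term, gamma=0.5, q=95.0):
    """All inputs are 1-D NumPy arrays with one entry per connection node.
    In deployment the percentile threshold would be fixed on a validation
    split rather than recomputed per window."""
    scores = feat_resid + struct_resid + gamma * kl_term
    scaled = RobustScaler().fit_transform(scores.reshape(-1, 1)).ravel()
    tau = np.percentile(scaled, q)  # percentile threshold from grid search
    return scaled > tau             # True marks a flagged connection
```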
4. Training Procedure and Hyperparameter Choices
AutoGraphAD is trained with AdamW (with tuned learning rate and weight decay), early stopping (patience 20, up to 100 epochs), and batch size 1 (i.e., one window graph per batch). Key regularization and augmentation settings include:
- Node masking: 20–40% of $C$-nodes masked per epoch
- Edge dropout: 10–20%
- Negative-edge ratio: 20–40%
- Encoder/decoder GNN depth: 1–2 layers
- Latent size: $d_z$, tuned by grid search
- Feature loss: MSE or cosine embedding distance
- Loss weights and threshold grid: the $\lambda$ weights, $\gamma$, and $\tau$ (percentile)
Model selection and threshold tuning are performed via grid search on a held-out, mildly contaminated validation split, as sketched below.
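A schematic of this selection loop is shown below; `train_fn` and `eval_fn` stand in for window-level training and the validation criterion (which the summary above does not specify), and the grids are placeholders rather than the paper's values.

```python
from itertools import product

def select_hparams(train_fn, eval_fn,
                   lam_grid=(0.1, 0.5, 1.0),
                   gamma_grid=(0.1, 0.5, 1.0),
                   q_grid=(90.0, 95.0, 99.0)):
    """Grid search over loss weights, score weight gamma, and percentile q."""
    best, best_score = None, float("-inf")
    for lams, gamma, q in product(product(lam_grid, repeat=3),
                                  gamma_grid, q_grid):
        model = train_fn(*lams)            # train with these loss weights
        score = eval_fn(model, gamma, q)   # criterion on held-out split
        if score > best_score:
            best, best_score = (lams, gamma, q), score
    return best, best_score
```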
5. Performance Evaluation and Runtime Analysis
Benchmarks are performed on the UNSW-NB15 dataset using sliding time windows, with contamination rates of 0%, 3.5%, and 5.7%. AutoGraphAD is compared to Anomal-E’s PyOD-based downstream detectors (PCA, CBLOF, HBOS):
| Contamination | Method | Accuracy (%) | F-Macro (%) | Recall (%) |
|---|---|---|---|---|
| 0% | PCA (Anomal-E) | 96.65 | 82.39 | 98.27 |
| 0% | CBLOF (Anomal-E) | 96.21 | 79.96 | 94.46 |
| 0% | HBOS (Anomal-E) | 96.68 | 82.50 | 98.28 |
| 0% | AutoGraphAD | 97.69 | 84.23 | 87.98 |
Comparable or superior results are observed at higher contamination levels.
Runtime per window:
| Method | Train (s) | Inference (s) |
|---|---|---|
| PCA | 0.0885 | 0.0286 |
| CBLOF | 0.0454 | 0.0323 |
| HBOS | 0.1359 | 0.0869 |
| AutoGraphAD (end-to-end) | 0.0060 | 0.0046 |
AutoGraphAD is 1.18 orders of magnitude faster in training and 1.03 orders of magnitude faster in inference than the Anomal-E + PyOD pipeline, which suggests it is well suited to live operational deployment and concept-drift adaptation.
6. Advantages, Limitations, and Extensions
Strengths:
- Fully unsupervised: eliminates reliance on labeled flows or additional anomaly detectors.
- Produces an end-to-end anomaly score based on reconstruction and variational residuals.
- GPU-native operation with a compact latent dimension yields high per-window throughput.
- Simple retuning for concept drift via adjustment of loss and threshold hyperparameters.
Limitations:
- Performance and detection robustness are sensitive to the selection of $\lambda_{\text{struct}}$, $\lambda_{\text{feat}}$, $\lambda_{\text{KL}}$, and the anomaly percentile $\tau$.
- Static windowing may miss low-rate or long-duration attacks that span multiple time slices.
- No explicit modeling of $IP$–$IP$ interactions, which could be important for detecting lateral movement.
Potential extensions include temporal/dynamic VGAE models over window sequences, adversarial regularization on latent embeddings akin to AR-VGAE, explicit integration of edge features into the decoder, and cross-corpus training with multi-task contrastive objectives.
A plausible implication is that end-to-end, graph-native anomaly scoring may become the preferred paradigm for high-throughput NIDS settings, pending refinements to temporal and multi-view regularization (Anyfantis et al., 21 Nov 2025).