Variational Temporal GNN Explainer
- The paper introduces VA-TGExplainer, which employs variational masks to quantify the temporal importance of edges in TGNN-based intrusion detection.
- The methodology uses reparameterized variational inference and ELBO optimization, showing that removing the top-ranked edges reduces anomaly scores by 78% on average.
- The framework preserves temporal-causal structures and provides uncertainty estimates that enhance interpretability and trust for SOC analysts in forensic analysis.
A Variational Temporal Graph Explainer (VA-TGExplainer) is an explanation module designed for Temporal Graph Neural Network (TGNN)–based intrusion detection systems (IDS) operating on provenance graphs. It quantifies the importance of edges in a temporal context for high anomaly score predictions, outputs distributions over edge importances instead of deterministic masks, and explicitly models uncertainty in subgraph-based explanations. VA-TGExplainer operates downstream of TGNN link-prediction heads, providing fine-grained, uncertainty-aware explanations compatible with complex, time-evolving graph structures such as those derived from system audit data (Dhanuka et al., 20 Dec 2025).
1. Mathematical Formalism
Let $G = (V, E, T)$ denote a temporal provenance graph, with node set $V$ (e.g., processes, files, sockets), a set $E$ of directed, timestamped edges representing system events, and $T$ the set of discrete timestamps. For a target event (anomalous edge) $e^* = (u, v, t^*)$ in $G$, the associated TGNN provides a link-prediction score $p_\theta(e^*)$, with the anomaly score derived from the negative log-likelihood of the event label: $s(e^*) = -\log p_\theta(y^* \mid e^*)$.
An explanation context subgraph $G_c \subseteq G$ is formed by selecting edges within a specific time window (e.g., 15 minutes) or a k-hop causal neighborhood relevant to $e^*$, preserving event temporal order.
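As a minimal illustration of this windowing step (the `Event` layout and function name are assumptions, not from the paper):

```python
from typing import List, NamedTuple

class Event(NamedTuple):
    src: int      # source node (e.g., a process)
    dst: int      # destination node (e.g., a file or socket)
    t: float      # event timestamp in seconds

def context_subgraph(events: List[Event], target: Event,
                     window_s: float = 15 * 60) -> List[Event]:
    """Select the edges inside a time window ending at the target event,
    keeping them in temporal order (events are never permuted or retimed)."""
    ctx = [e for e in events if target.t - window_s <= e.t <= target.t]
    return sorted(ctx, key=lambda e: e.t)
```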
The explainer introduces a vector of random mask variables $m = (m_1, \dots, m_{|E_c|})$, where each $m_i \in (0, 1)$ scales edge $e_i$'s presence in the masked graph $G_m$. Masks are sampled from a variational posterior
$$q_\phi(m_i): \quad m_i = \sigma(z_i), \qquad z_i \sim \mathcal{N}(\mu_i, \sigma_i^2),$$
with $\phi = \{(\mu_i, \sigma_i^2)\}_i$ and $\sigma(\cdot)$ the logistic sigmoid. The mask variables are parameterized by $\phi$, enabling continuous edge-importance estimation and uncertainty quantification.
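A minimal PyTorch sketch of this reparameterized sampling step (tensor names are illustrative):

```python
import torch

def sample_masks(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Logistic-normal reparameterization: z_i ~ N(mu_i, sigma_i^2),
    m_i = sigmoid(z_i). Gradients flow to (mu, log_var) through z."""
    eps = torch.randn_like(mu)                 # standard-normal noise
    z = mu + torch.exp(0.5 * log_var) * eps    # z ~ N(mu, sigma^2)
    return torch.sigmoid(z)                    # masks m in (0, 1)
```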
2. Objective and Training Procedure
VA-TGExplainer maximizes a variational evidence lower bound (ELBO) for $\log p_\theta(y^* \mid G_m)$ with respect to $\phi$:
$$\mathcal{L}(\phi) = \mathbb{E}_{q_\phi(m)}\!\left[\log p_\theta(y^* \mid G_m)\right] - \mathrm{KL}\!\left(q_\phi(m)\,\|\,p(m)\right) - \lambda\,\Omega(m),$$
where $p(m)$ is an elementwise prior (e.g., i.i.d. normal in logit space), the KL divergence is available in closed form, and $\Omega(m)$ is a sparsity penalty (a sum or $L_1$ norm over the mask means); $\lambda$ controls the sparsity–fidelity trade-off. The main loss is implemented with cross-entropy for TGNN link prediction, augmented by the KL divergence and sparsity regularization.
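A hedged sketch of this composite loss, assuming the standard-normal prior in logit space and a single-sample Monte Carlo estimate; `logit` stands for the frozen TGNN decoder's output on the masked graph, and `lam` is the sparsity coefficient $\lambda$ (all names are placeholders):

```python
import torch
import torch.nn.functional as F

def neg_elbo(mu: torch.Tensor, log_var: torch.Tensor,
             logit: torch.Tensor, label: torch.Tensor,
             lam: float = 0.01) -> torch.Tensor:
    """Negative ELBO: cross-entropy data term, closed-form KL to an
    i.i.d. standard-normal prior in logit space, and L1 sparsity on
    the mask means."""
    nll = F.binary_cross_entropy_with_logits(logit, label)
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum()
    sparsity = torch.sigmoid(mu).sum()   # mask means are positive: L1 norm
    return nll + kl + lam * sparsity
```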
The training loop for a target event $e^*$ involves:
- Sampling edge mask variables $m$ via reparameterization at each epoch
- Forming a masked graph $G_m$
- Evaluating loss components: data likelihood (via frozen TGNN decoder), KL penalty, sparsity
- Gradient update (Adam optimizer) on $\phi$ using the composite loss

Inference uses the posterior mean $\sigma(\mu_i)$ as the edge importance score, with the variance parameter $\sigma_i^2$ quantifying uncertainty; see the sketch below.
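Combining the pieces, a sketch of the per-event loop under these assumptions (it reuses the `sample_masks` and `neg_elbo` sketches above; for simplicity the mask parameters are optimized directly per event, whereas the encoder described in Section 5 would produce them from an MLP):

```python
import torch

def explain_event(tgnn_logit, n_edges: int, label: torch.Tensor,
                  epochs: int = 200, lam: float = 0.01):
    """Fit phi = (mu, log_var) for one target event.
    `tgnn_logit(masks)` must return the frozen TGNN decoder's logit
    for the event on the mask-weighted context graph G_m."""
    mu = torch.zeros(n_edges, requires_grad=True)
    log_var = torch.zeros(n_edges, requires_grad=True)
    opt = torch.optim.Adam([mu, log_var])
    for _ in range(epochs):
        opt.zero_grad()
        masks = sample_masks(mu, log_var)    # reparameterized draw
        loss = neg_elbo(mu, log_var, tgnn_logit(masks), label, lam)
        loss.backward()                      # gradients w.r.t. phi only
        opt.step()
    # posterior mean = importance score; variance = uncertainty
    return torch.sigmoid(mu).detach(), log_var.exp().detach()
```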
3. Temporal and Causal Structure Preservation
In alignment with provenance analysis requirements, the context subgraph for explanation construction is defined by the TGNN’s sliding window (e.g., 15 minutes) or causal neighborhood. The masking operation preserves temporal ordering: masks remove (or downweight) edges but do not permute or retime events. This design maintains the correspondence between explanation and the temporal-causal relationships critical for forensic analysis.
4. Edge Importance and Uncertainty Quantification
VA-TGExplainer outputs, for each edge in $G_c$, both the mean inclusion probability and an uncertainty measure, such as the posterior variance $\sigma_i^2$, or empirically via sampling. High-mean, low-variance edges are considered definite contributors; high-variance edges are interpreted as tentative, highlighting model uncertainty about their explanatory relevance.
Explanations can be thresholded, e.g., retaining edges whose mean importance exceeds a threshold $\tau$. JSON reports for each edge typically contain the fields source, destination, relation type, mean importance, and variance, enabling downstream interpretation.
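A sketch of this report format (field names mirror the list above; the threshold `tau` and record layout are assumptions):

```python
import json

def edge_report(edges, means, variances, tau: float = 0.5) -> str:
    """Emit one JSON record per retained edge, dropping edges whose
    mean importance falls below the inclusion threshold tau."""
    records = [
        {"source": src, "destination": dst, "relation_type": rel,
         "mean_importance": round(float(m), 4),
         "variance": round(float(v), 4)}
        for (src, dst, rel), m, v in zip(edges, means, variances)
        if m > tau
    ]
    return json.dumps(records, indent=2)
```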
5. Architecture, Implementation, and Computational Characteristics
The VA-TGExplainer encoder employs a two-layer multilayer perceptron (MLP), with input features given by the concatenation of the source and destination TGNN node embeddings ($h_u$, $h_v$) and the original edge features $x_{uv}$, outputting $\mu_i$ and $\log \sigma_i^2$. The decoder is the frozen TGNN link-prediction head, ensuring explanations are contextualized to the pre-trained model's predictions. Adam optimization with standard hyperparameters and 200 epochs per event is typical. Relaxation is achieved via logistic-normal reparameterization; Gumbel-softmax is not required.
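A minimal PyTorch sketch of such an encoder (the hidden width and variable names are assumptions):

```python
import torch
import torch.nn as nn

class MaskEncoder(nn.Module):
    """Two-layer MLP: [h_u ; h_v ; x_uv] -> (mu_i, log sigma_i^2) per edge."""
    def __init__(self, node_dim: int, edge_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),   # one mean, one log-variance per edge
        )

    def forward(self, h_u, h_v, x_uv):
        mu, log_var = self.net(torch.cat([h_u, h_v, x_uv], dim=-1)).unbind(-1)
        return mu, log_var
```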
Hardware requirements are moderate: single-GPU operation suffices for up to approximately 5k edges in $G_c$, with fallback to CPU triggered if less than 500 MB of GPU memory remains. Evaluation of runtime and mask statistics on DARPA CADETS windows yields a 3–5 s overhead per event.
| Method | Time/event | Avg Mask Size | Comprehensiveness | Sufficiency |
|---|---|---|---|---|
| GraphMask | 1.8 s | ~10 edges | 0.90 | 0.08 |
| GNNExplainer | 2.5 s | ~5 edges | 0.82 | 0.15 |
| VA-TGExplainer | 3.8 s | ~5 edges | 0.84 | 0.12 |
6. Empirical Evaluation and Comparative Metrics
Empirical evaluation on the DARPA CADETS dataset demonstrates that VA-TGExplainer preserves high fidelity to the TGNN’s decision process. Ablation by removing the top-3 mean-mask edges lowers anomaly scores by 78% on average. Masks with 3–5 edges yield comprehensiveness above 0.8, and expansion to approximately 8 edges increases comprehensiveness beyond 0.9, but with diminishing returns.
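For reference, comprehensiveness and sufficiency are commonly computed for mask explanations as score drops under edge removal and edge retention, respectively; a sketch under that assumption (whether the paper uses exactly these definitions is not stated here):

```python
def comprehensiveness(score_fn, all_edges, expl_edges) -> float:
    """Drop in the anomaly score when explanation edges are removed:
    higher means the explanation captures what drives the score."""
    rest = [e for e in all_edges if e not in expl_edges]
    return score_fn(all_edges) - score_fn(rest)

def sufficiency(score_fn, all_edges, expl_edges) -> float:
    """Drop in the anomaly score when only explanation edges are kept:
    lower means the explanation alone reproduces the prediction."""
    return score_fn(all_edges) - score_fn(list(expl_edges))
```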
Uncertainty quantification differentiates VA-TGExplainer: edges with variance below 0.02 are consistent across random restarts, while those above 0.1 fluctuate in approximately 30% of runs. This feature supports communication of explanation confidence to end-users such as SOC analysts.
Comparative baselines indicate:
- GraphMask produces larger, global window-level subgraphs (~10 edges), highest fidelity (0.90), and rapid computation, but lacks uncertainty measures.
- GNNExplainer delivers localized, event-specific masks and standard fidelity, without uncertainty quantification.
- VA-TGExplainer achieves similar mask compactness and slightly higher comprehensiveness than GNNExplainer, with the added benefit of modeling multiple plausible explanations and reporting explanation uncertainty (Dhanuka et al., 20 Dec 2025).
7. Application and Significance in SOC Analysis
VA-TGExplainer is architected for integration with post-hoc explainability frameworks such as PROVEX and temporal graph IDS like KAIROS, with general applicability to other temporal graph-based detection systems. By providing human-interpretable, uncertainty-aware explanations that highlight key causal subgraphs, VA-TGExplainer is designed to enhance SOC analyst trust, improve incident triage speed, and support provenance-based threat forensics. Its ability to quantify edge-level uncertainty in explanations has particular value under ambiguous, adversarial, or noisy attack scenarios, offering a principled approach to calibrating explanation confidence in large-scale, rapidly evolving cyber environments (Dhanuka et al., 20 Dec 2025).