Variational Temporal GNN Explainer

Updated 27 December 2025
  • The paper introduces VA-TGExplainer, which employs variational masks to quantify the temporal importance of edges in TGNN-based intrusion detection.
  • The methodology uses reparameterized variational inference and ELBO optimization, showing that removing the top-ranked edges reduces anomaly scores by 78% on average.
  • The framework preserves temporal-causal structures and provides uncertainty estimates that enhance interpretability and trust for SOC analysts in forensic analysis.

A Variational Temporal Graph Explainer (VA-TGExplainer) is an explanation module designed for Temporal Graph Neural Network (TGNN)–based intrusion detection systems (IDS) operating on provenance graphs. It quantifies the importance of edges in a temporal context for high anomaly score predictions, outputs distributions over edge importances instead of deterministic masks, and explicitly models uncertainty in subgraph-based explanations. VA-TGExplainer operates downstream of TGNN link-prediction heads, providing fine-grained, uncertainty-aware explanations compatible with complex, time-evolving graph structures such as those derived from system audit data (Dhanuka et al., 20 Dec 2025).

1. Mathematical Formalism

Let $G = (V, E, T)$ denote a temporal provenance graph, with $V$ a node set (e.g., processes, files, sockets), $E \subseteq V \times V \times T$ a set of directed, timestamped edges representing system events, and $T$ the set of discrete timestamps. For a target event (anomalous edge) $y$ in graph $G$, the associated TGNN $f_\theta$ provides a link prediction score, with anomaly scores derived from the negative log-likelihood of the event label: $\ell(G; \theta) = -\log P_\theta(y \mid G)$.

An explanation context subgraph $S \subseteq E$ is formed by selecting edges within a specific time window (e.g., 15 minutes) or k-hop causal neighborhoods relevant to $y$, preserving event temporal order.
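As an illustration, window-based selection can be a simple filter-and-sort; the `(src, dst, rel, t)` tuple layout and the helper name below are assumptions for this sketch, not the paper's code.

```python
# Hypothetical sketch of context-subgraph selection.

WINDOW_S = 15 * 60  # 15-minute sliding window, in seconds

def context_subgraph(edges, target_t, window=WINDOW_S):
    """Keep edges that fall within `window` seconds before the target
    event, sorted by timestamp so temporal order is preserved."""
    selected = [e for e in edges if target_t - window <= e[3] <= target_t]
    return sorted(selected, key=lambda e: e[3])
```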

The explainer introduces a vector of random mask variables $Z = (z_i)_{i \in S}$, where each $z_i \in [0,1]$ scales edge $i$'s presence in the masked graph $G \odot Z$. Masks are sampled from a variational posterior

$$q_\phi(Z \mid G, y) = \prod_{i \in S} q_\phi(z_i)$$

with $z_i = \sigma\!\left(\mu_i + \exp\!\left(\tfrac{1}{2}\log \sigma_i^2\right)\epsilon_i\right)$, $\epsilon_i \sim \mathcal{N}(0,1)$, and $\sigma$ the logistic sigmoid. The mask variables are parameterized by $\phi = \{\mu_i, \log \sigma_i^2\}$, enabling continuous edge-importance estimation and uncertainty quantification.
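A minimal PyTorch sketch of this sampling step, assuming `mu` and `log_var` hold the per-edge variational parameters (the function name is illustrative):

```python
import torch

def sample_masks(mu, log_var):
    """Logistic-normal reparameterization:
    z_i = sigmoid(mu_i + exp(0.5 * log sigma_i^2) * eps_i), eps_i ~ N(0, 1).
    Differentiable w.r.t. the variational parameters (mu, log_var)."""
    eps = torch.randn_like(mu)
    return torch.sigmoid(mu + torch.exp(0.5 * log_var) * eps)

# Example: masks for a four-edge context subgraph, initialized near z = 0.5
mu = torch.zeros(4, requires_grad=True)
log_var = torch.zeros(4, requires_grad=True)
z = sample_masks(mu, log_var)  # one value in (0, 1) per edge
```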

2. Objective and Training Procedure

VA-TGExplainer maximizes a variational evidence lower bound (ELBO) for $P_\theta(y \mid G)$ with respect to $\phi$:

$$\text{ELBO}(\phi) = \mathbb{E}_{Z \sim q_\phi}\!\left[\log P_\theta(y \mid G, Z)\right] - \mathrm{KL}\!\left(q_\phi(Z \mid G, y) \,\|\, p(Z)\right) - \lambda_{\text{sp}} \cdot \Omega\!\left(\mathbb{E}_{Z \sim q_\phi}[Z]\right)$$

where $p(Z)$ is an elementwise prior (e.g., iid normal in logit space), $\mathrm{KL}$ the closed-form divergence, and $\Omega(\cdot)$ a sparsity penalty (sum or norm over mask means). $\lambda_{\text{sp}}$ controls the sparsity–fidelity trade-off. The main loss is implemented with cross-entropy for TGNN link prediction, augmented by KL-divergence and sparsity regularization.
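If the prior is taken to be an iid standard normal in logit space, as the parenthetical above suggests, the KL term has the standard Gaussian closed form per edge:

$$\mathrm{KL}\big(\mathcal{N}(\mu_i, \sigma_i^2) \,\|\, \mathcal{N}(0, 1)\big) = \tfrac{1}{2}\left(\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1\right),$$

summed over $i \in S$.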

The training loop for a target event, sketched in code after the list, involves:

  • Sampling edge mask variables ZZ via reparameterization per epoch
  • Forming a masked graph GZG \odot Z
  • Evaluating loss components: data likelihood (via frozen TGNN decoder), KL penalty, sparsity
  • Gradient update (Adam optimizer) on $\phi$ using the composite loss

Inference uses the posterior mean $p_i = \sigma(\mu_i)$ as the edge importance score, with the variance parameter $\sigma_i^2$ quantifying uncertainty.
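A minimal PyTorch sketch of this per-event loop follows; `tgnn_decoder` (standing in for the frozen link-prediction head), the edge-feature masking convention, and the scalar-logit interface are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def explain_event(tgnn_decoder, edge_feats, y, epochs=200,
                  lr=0.01, lam_kl=1e-3, lam_sp=1e-3):
    """Fit variational mask parameters for one target event by
    minimizing the negative ELBO (data term + KL + sparsity).
    `y` is a float scalar tensor (1.0 for the observed event)."""
    n_edges = edge_feats.shape[0]
    mu = torch.zeros(n_edges, requires_grad=True)
    log_var = torch.zeros(n_edges, requires_grad=True)
    opt = torch.optim.Adam([mu, log_var], lr=lr)

    for _ in range(epochs):
        # Sample edge masks via logistic-normal reparameterization
        eps = torch.randn_like(mu)
        z = torch.sigmoid(mu + torch.exp(0.5 * log_var) * eps)

        # Masked graph G * Z: downweight each edge's features by z_i
        masked = edge_feats * z.unsqueeze(-1)

        # Data term: cross-entropy under the frozen TGNN decoder
        logit = tgnn_decoder(masked)  # scalar logit for the target event
        nll = F.binary_cross_entropy_with_logits(logit, y)

        # Closed-form KL to a standard-normal prior in logit space
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum()

        # Sparsity penalty on the mean mask values
        sparsity = torch.sigmoid(mu).sum()

        loss = nll + lam_kl * kl + lam_sp * sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Posterior means p_i = sigmoid(mu_i) are the edge importance scores
    return torch.sigmoid(mu).detach(), log_var.exp().detach()
```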

3. Temporal and Causal Structure Preservation

In alignment with provenance analysis requirements, the context subgraph SS for explanation construction is defined by the TGNN’s sliding window (e.g., 15 minutes) or causal neighborhood. The masking operation preserves temporal ordering: masks remove (or downweight) edges but do not permute or retime events. This design maintains the correspondence between explanation and the temporal-causal relationships critical for forensic analysis.

4. Edge Importance and Uncertainty Quantification

VA-TGExplainer outputs, for each edge $i$ in $S$, both the mean inclusion probability $p_i$ and an uncertainty measure, such as the approximation $\operatorname{Var}(z_i) \approx \sigma(\mu_i)\,(1 - \sigma(\mu_i))\,\frac{\exp(\log \sigma_i^2)}{1 + \exp(\log \sigma_i^2)}$, or an empirical estimate via sampling. High-mean, low-variance edges are considered definite contributors; high-variance edges are interpreted as tentative, highlighting model uncertainty in their explanatory relevance.
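Both the analytic approximation and the empirical Monte Carlo estimate are straightforward to compute; this sketch assumes per-edge tensors `mu` and `log_var` as before.

```python
import torch

def edge_importance(mu, log_var, n_samples=100):
    """Mean inclusion probability p_i and two uncertainty estimates:
    the analytic approximation quoted above, and a Monte Carlo variance."""
    p = torch.sigmoid(mu)                       # mean importance p_i
    s2 = log_var.exp()                          # sigma_i^2
    var_analytic = p * (1 - p) * s2 / (1 + s2)  # approximation from the text

    # Empirical alternative: variance over reparameterized samples
    eps = torch.randn(n_samples, *mu.shape)
    z = torch.sigmoid(mu + torch.exp(0.5 * log_var) * eps)
    return p, var_analytic, z.var(dim=0)
```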

Explanations can be thresholded, e.g., requiring $p_i > 0.7$ for inclusion. JSON reports for each edge typically contain the fields source, destination, relation type, mean importance, and variance, enabling downstream interpretation.
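A report builder in this style might look as follows; the field names mirror the list above, while the exact schema and helper name are assumptions.

```python
import json

def explanation_report(edges, p, var, threshold=0.7):
    """Emit a JSON report for edges whose mean importance exceeds
    the threshold; `edges` holds (src, dst, rel, t) tuples."""
    report = [
        {
            "source": src,
            "destination": dst,
            "relation_type": rel,
            "mean_importance": round(float(p[i]), 4),
            "variance": round(float(var[i]), 6),
        }
        for i, (src, dst, rel, _t) in enumerate(edges)
        if float(p[i]) > threshold
    ]
    return json.dumps(report, indent=2)
```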

5. Architecture, Implementation, and Computational Characteristics

The VA-TGExplainer encoder employs a two-layer multilayer perceptron (MLP), with input features given by the concatenation of the source/destination TGNN node embeddings ($h_u$, $h_v$) and the original edge features $x_e$, outputting $\mu_i$ and $\log \sigma_i^2$. The decoder is the frozen TGNN link prediction head, ensuring explanations are contextualized to the pre-trained model's predictions. Adam optimization with standard hyperparameters ($\mathrm{lr} = 0.01$, $\lambda_{\mathrm{KL}} = 10^{-3}$, $\lambda_{\mathrm{sp}} = 10^{-3}$) and 200 epochs per event is typical. Relaxation is achieved via logistic-normal reparameterization; Gumbel-softmax is not required.
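The encoder itself is small; a sketch of the stated two-layer design, with the class name and hidden width as illustrative assumptions:

```python
import torch
import torch.nn as nn

class MaskEncoder(nn.Module):
    """Two-layer MLP mapping [h_u, h_v, x_e] to (mu_i, log sigma_i^2)
    for each edge; the hidden width is an illustrative choice."""
    def __init__(self, node_dim, edge_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # heads for mu and log-variance
        )

    def forward(self, h_u, h_v, x_e):
        out = self.net(torch.cat([h_u, h_v, x_e], dim=-1))
        return out[..., 0], out[..., 1]  # mu, log_var
```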

Hardware requirements are moderate: single-GPU operation suffices for up to approximately 5k edges in $S$, with fallback to CPU triggered if less than 500 MB of GPU memory remains. Evaluation of runtime and mask statistics on DARPA CADETS windows shows 3–5 s of overhead per event.

| Method | Time/event | Avg. mask size | Comprehensiveness | Sufficiency |
|---|---|---|---|---|
| GraphMask | 1.8 s | ~10 edges | 0.90 | 0.08 |
| GNNExplainer | 2.5 s | ~5 edges | 0.82 | 0.15 |
| VA-TGExplainer | 3.8 s | ~5 edges | 0.84 | 0.12 |
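The CPU fallback described above can be implemented with a simple free-memory check; the function name and exact policy are assumptions.

```python
import torch

def pick_device(min_free_bytes=500 * 1024**2):
    """Use the GPU unless less than ~500 MB of its memory remains free."""
    if torch.cuda.is_available():
        free, _total = torch.cuda.mem_get_info()
        if free >= min_free_bytes:
            return torch.device("cuda")
    return torch.device("cpu")
```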

6. Empirical Evaluation and Comparative Metrics

Empirical evaluation on the DARPA CADETS dataset demonstrates that VA-TGExplainer preserves high fidelity to the TGNN’s decision process. Ablation by removing the top-3 mean-mask edges lowers anomaly scores by 78% on average. Masks with 3–5 edges yield comprehensiveness above 0.8, and expansion to approximately 8 edges increases comprehensiveness beyond 0.9, but with diminishing returns.
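The ablation statistic can be reproduced with a short routine; `anomaly_score_fn` and the relative-drop convention are assumptions for this sketch.

```python
import torch

def topk_ablation_drop(anomaly_score_fn, edge_feats, p, k=3):
    """Relative drop in anomaly score after zeroing out the top-k
    mean-mask edges (0.78 would match the 78% average reported above)."""
    base = anomaly_score_fn(edge_feats)
    keep = torch.ones(edge_feats.shape[0])
    keep[torch.topk(p, k).indices] = 0.0
    ablated = anomaly_score_fn(edge_feats * keep.unsqueeze(-1))
    return (base - ablated) / base
```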

Uncertainty quantification differentiates VA-TGExplainer: edges with variance below 0.02 are consistent across random restarts, while those above 0.1 fluctuate in approximately 30% of runs. This feature supports communication of explanation confidence to end-users such as SOC analysts.

Comparative baselines indicate:

  • GraphMask produces larger, window-level global subgraphs (~10 edges) with the highest fidelity (0.90) and the fastest computation, but lacks uncertainty measures.
  • GNNExplainer delivers localized, event-specific masks with moderate fidelity (0.82 comprehensiveness), but without uncertainty quantification.
  • VA-TGExplainer achieves similar mask compactness and slightly higher comprehensiveness than GNNExplainer, with the added benefit of modeling multiple plausible explanations and reporting explanation uncertainty (Dhanuka et al., 20 Dec 2025).

7. Application and Significance in SOC Analysis

VA-TGExplainer is architected for integration with post-hoc explainability frameworks such as PROVEX and temporal graph IDS like KAIROS, with general applicability to other temporal graph-based detection systems. By providing human-interpretable, uncertainty-aware explanations that highlight key causal subgraphs, VA-TGExplainer is designed to enhance SOC analyst trust, improve incident triage speed, and support provenance-based threat forensics. Its ability to quantify edge-level uncertainty in explanations has particular value under ambiguous, adversarial, or noisy attack scenarios, offering a principled approach to calibrating explanation confidence in large-scale, rapidly evolving cyber environments (Dhanuka et al., 20 Dec 2025).
