Graph Invariant and Variant Embedding (GIVE)
- GIVE is a framework that disentangles invariant graph substructures (consistent across environments) from variant ones (domain-specific), ensuring robust representation.
- It leverages advanced architectures such as hybrid Transformer/MPNN models and Sinkhorn-based attention to effectively partition and encode graph features.
- Empirical evaluations in molecular property prediction, brain connectome analysis, and node classification support GIVE's ability to improve out-of-distribution generalization.
Graph Invariant and Variant Embedding (GIVE) is a framework for disentangling and jointly modeling two kinds of graph substructure: invariant ones, which are statistically or causally stable under given distributional shifts, and variant ones, which are domain-specific, spurious, or transient. GIVE is central to recent advances in out-of-distribution (OOD) generalization, expressive graph learning, and interpretable representation schemes for graph data.
1. Formal Foundations: Graph Invariant and Variant Functions
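Let $G = (V, E)$ denote a graph with nodes $V$, edges $E$ (adjacency matrix $A$), and optional node features $X$. In the GIVE conceptualization, the goal is to create two complementary representations:
- Invariant embedding: Captures substructures and features whose relation to downstream labels is consistent across environments or under interventions; formally, $P(Y \mid \Phi_{\mathrm{inv}}(G), e)$ is invariant in $e$, for $e \in \mathcal{E}$, where $Y$ is the label, $\Phi_{\mathrm{inv}}$ the invariant encoder, and $\mathcal{E}$ the set of environments.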
- Variant embedding: Captures substructures whose statistical properties vary across conditions or environments and may contribute to spurious correlations.
The mathematical backbone of invariant embeddings is permutation invariance with respect to node relabeling; i.e., for a permutation $\pi$ of the nodes, $f(\pi \cdot G) = f(G)$. This is realized at the graph level (for tasks like classification or regression) or at the node/edge level via permutation-equivariant functions $g$ such that $g(\pi \cdot G) = \pi \cdot g(G)$. Universality results establish that suitable architectures (e.g., higher-order Folklore GNNs) can approximate every continuous invariant or equivariant function, up to the distinguishing power of Weisfeiler-Lehman graph isomorphism tests (Azizian et al., 2020).
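The two properties above can be checked numerically. The following is a minimal sketch, assuming a single sum-aggregation message-passing layer with a shared weight matrix `W` (a hypothetical toy layer, not any specific architecture from the cited works): node embeddings should permute with the nodes, and the sum-pooled graph embedding should not change at all.

```python
import numpy as np

def message_passing(A, X, W):
    """Equivariant layer: sum self + neighbour features, shared linear map, ReLU."""
    return np.maximum((A + np.eye(A.shape[0])) @ X @ W, 0.0)

def graph_readout(A, X, W):
    """Invariant graph embedding: sum-pool the equivariant node embeddings."""
    return message_passing(A, X, W).sum(axis=0)

rng = np.random.default_rng(0)
n, d, h = 5, 3, 4
A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = A + A.T                               # symmetric adjacency, no self-loops
X = rng.normal(size=(n, d))               # node features
W = rng.normal(size=(d, h))               # shared weights (toy values)

perm = rng.permutation(n)
P = np.eye(n)[perm]                       # permutation matrix for the relabeling
A_p, X_p = P @ A @ P.T, P @ X             # relabeled graph

# Node level: g(pi . G) = pi . g(G)
equivariant_ok = np.allclose(message_passing(A_p, X_p, W),
                             P @ message_passing(A, X, W))
# Graph level: f(pi . G) = f(G)
invariant_ok = np.allclose(graph_readout(A_p, X_p, W),
                           graph_readout(A, X, W))
print(equivariant_ok, invariant_ok)
```

Sum pooling is one of several invariant readouts; mean or max pooling would pass the same check.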
2. Deep Learning Architectures for GIVE
2.1 GOODFormer: Entropy-Guided GIVE
GOODFormer operationalizes the GIVE principle through three key modules (Liao et al., 1 Aug 2025):
- Entropy-Guided Invariant Subgraph Disentangler: For each graph, attention maps are computed via hybrid Transformer/MPNN layers, and two paths are created:
  - Invariant path ($G_{\mathrm{inv}}$): softmax attention over the attention scores; edges/nodes with high attention are treated as invariant.
  - Variant path ($G_{\mathrm{var}}$): the complementary softmax attention, capturing the variant content.
  - Soft masks partition the adjacency $A$ into $A_{\mathrm{inv}}$ (invariant) and $A_{\mathrm{var}}$ (variant), guiding MPNN propagation. An entropy penalty on the attention enforces sharpness, supplemented at inference by per-graph temperature tuning to maintain sharpness under shifted test distributions.
- Evolving Subgraph Positional/Structural Encoding (PSE): Standard graph positional encodings are replaced with a trainable MPNN that efficiently generates a PSE for each subgraph, with an auxiliary MLP that reconstructs a global hand-crafted PE to regulate information flow and prevent shortcut leakage between the invariant and variant streams.
- Invariant Learning with Interventional Loss: Final merged representations from the invariant and variant streams are pooled and classified separately. A causal-intervention-inspired loss (a variance-penalized risk under interventions on the variant subgraph, $do(G_{\mathrm{var}})$) is minimized so that the variant subgraph $G_{\mathrm{var}}$ carries no additional label information given the invariant subgraph $G_{\mathrm{inv}}$. The final prediction uses only $G_{\mathrm{inv}}$, "blocking" spurious effects at inference. Training jointly optimizes the disentanglement, PSE reconstruction, and invariant objectives.
These modules jointly yield a robust graph embedding generalizing under substantial distribution shifts, as evidenced in benchmarks (Liao et al., 1 Aug 2025).
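The soft-mask partition and entropy penalty described for the disentangler can be illustrated as follows. This is a minimal sketch under stated assumptions, not GOODFormer's implementation: per-edge scores are given as a symmetric matrix, the invariant mask is a temperature-scaled sigmoid (a two-way softmax over "invariant" versus "variant"), and `soft_partition` is a hypothetical helper name.

```python
import numpy as np

def soft_partition(A, scores, temperature=1.0):
    """Split adjacency A into soft invariant/variant parts and report mask entropy.

    Assumption: `scores` holds one real-valued attention score per edge.
    """
    m = 1.0 / (1.0 + np.exp(-scores / temperature))   # invariant mask in (0, 1)
    A_inv = m * A                                      # invariant adjacency
    A_var = (1.0 - m) * A                              # complementary variant adjacency
    m_c = np.clip(m, 1e-12, 1.0 - 1e-12)               # avoid log(0)
    h = -(m_c * np.log(m_c) + (1.0 - m_c) * np.log(1.0 - m_c))
    entropy = float((h * A).sum() / max(A.sum(), 1))   # mean binary entropy over edges
    return A_inv, A_var, entropy

# 4-node ring with toy edge scores (symmetric, nonzero on every edge).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
scores = np.array([[ 0.0, 2.0,  0.0, -1.0],
                   [ 2.0, 0.0,  0.5,  0.0],
                   [ 0.0, 0.5,  0.0, -2.0],
                   [-1.0, 0.0, -2.0,  0.0]])

A_inv, A_var, ent_warm = soft_partition(A, scores, temperature=1.0)
_, _, ent_sharp = soft_partition(A, scores, temperature=0.25)
print(ent_warm, ent_sharp)   # lower temperature gives a sharper (lower-entropy) mask
```

Adding `entropy` to the training loss pushes masks toward 0 or 1, and lowering `temperature` at inference has a similar sharpening effect without retraining.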
2.2 SNIGL: Probability of Necessity and Sufficiency–Based GIVE
SNIGL (Chen et al., 2024) refines the GIVE paradigm by requiring that the extracted invariant subgraph $G_{\mathrm{inv}}$ be both necessary and sufficient for the label $Y$. The "probability of necessity and sufficiency" (PNS) framework quantifies this core invariance; in Pearl's standard counterfactual form, for a value $c$ of the invariant component $C$ and an alternative value $\bar{c}$,
$$\mathrm{PNS}(c, \bar{c}) = P\big(Y_{do(C=c)} = y,\; Y_{do(C=\bar{c})} \neq y\big).$$
The SNIGL architecture comprises two parallel rationale-extracting GNNs, one for $G_{\mathrm{inv}}$ (invariant) and one for $G_{\mathrm{var}}$ (variant), followed by independent classifiers and a calibrated ensemble COMBINE prediction rule at inference. The training objective explicitly minimizes a PNS-based risk, an auxiliary invariance risk, a joint cross-entropy, and independence penalties, addressing the OOD generalization loss caused by insufficient or unnecessary invariants. Empirical evaluation on OOD benchmarks shows SNIGL achieving state-of-the-art OOD robustness (Chen et al., 2024).
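To make the PNS quantity concrete, it can be computed exactly on a small structural causal model. The sketch below is illustrative only and is not SNIGL's estimator: it assumes a binary feature `c`, a discrete exogenous noise variable with known probabilities, and a hypothetical `label_mechanism`, and applies Pearl's counterfactual definition directly.

```python
def pns(f, noise_probs, c, c_bar, y):
    """Exact PNS for a discrete SCM Y = f(C, U).

    Sums the probability of every noise value u for which setting C=c yields y
    (sufficiency) and setting C=c_bar yields something else (necessity).
    """
    return sum(p for u, p in noise_probs.items()
               if f(c, u) == y and f(c_bar, u) != y)

def label_mechanism(c, u):
    """Toy mechanism (assumption): three kinds of units."""
    if u == "always_on":
        return 1            # label is 1 regardless of the feature
    if u == "never_on":
        return 0            # label is 0 regardless of the feature
    return c                # "responsive": label copies the feature

noise = {"always_on": 0.2, "responsive": 0.7, "never_on": 0.1}
pns_value = pns(label_mechanism, noise, c=1, c_bar=0, y=1)
print(pns_value)
```

Only the "responsive" units count: for "always_on" units the feature is not necessary, and for "never_on" units it is not sufficient.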
2.3 Sinkhorn-Based GIVE Extensions
GSINA (Ding et al., 2024) proposes Graph Sinkhorn Attention, leveraging entropic optimal transport to achieve sparse, soft, and fully differentiable attention masks for subgraph extraction. The GIVE extension computes two attention-based embeddings: an invariant stream using the first row of the Sinkhorn transport plan, and a variant stream using the complement. Fused representations allow for simultaneous training of invariant and variant channels, with optional orthogonality or contrastive penalties to enforce information separation (Ding et al., 2024).
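A Sinkhorn-normalized two-row transport plan can be sketched as below. This is a hedged illustration rather than GSINA's exact formulation: the affinity rows `(+scores, -scores)`, the budget `r` for the invariant share, the temperature `tau`, and the helper name `sinkhorn_masks` are all assumptions. The first row of the plan gives the invariant mask and the second row its complement.

```python
import numpy as np

def sinkhorn_masks(scores, r=0.5, tau=1.0, n_iters=200):
    """Entropic-OT soft assignment of n items to {invariant, variant}.

    Row marginals are (r, 1 - r); column marginals are uniform (1/n each).
    Returns per-item masks rescaled so that inv + var = 1 for every item.
    """
    n = scores.shape[0]
    logits = np.stack([scores, -scores]) / tau     # (2, n) affinities
    K = np.exp(logits - logits.max())              # Gibbs kernel, numerically stabilised
    row_marg = np.array([r, 1.0 - r])
    col_marg = np.full(n, 1.0 / n)
    u, v = np.ones(2), np.ones(n)
    for _ in range(n_iters):                       # alternating marginal scaling
        u = row_marg / (K @ v)
        v = col_marg / (K.T @ u)
    plan = u[:, None] * K * v[None, :]             # transport plan, sums to 1
    return n * plan[0], n * plan[1]

scores = np.array([1.0, 0.5, -0.5, -1.0, 0.2, -0.2])   # toy edge scores
inv_mask, var_mask = sinkhorn_masks(scores, r=0.3)
print(inv_mask.round(3), inv_mask.mean())
```

The budget `r` fixes the average mass assigned to the invariant stream, and a smaller `tau` pushes the masks closer to a hard selection of the top-scoring fraction.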
2.4 Longitudinal Dynamics: Brain Graphs
In longitudinal brain connectome modeling, GIVE facilitates temporal and spatial decomposition by using EvolveGCN-style time-dynamic node embeddings as invariants, and hypergraph-based edge embeddings (spatial and cross-time) as variants. These multi-type embeddings allow interpretable, tokenized input to Transformer-based architectures for diagnosis/prognosis tasks (Dong et al., 2023).
3. Theoretical Guarantees and Expressiveness
GIVE unifies the function classes of permutation-invariant and partial permutation-invariant maps. Theorem 6 in (Gui et al., 2019) establishes that any continuous (partial) permutation-invariant function on neighborhoods (or graphs) can be represented in a sum-decomposition form,
$$f(x_{1,1}, \dots, x_{K,N_K}) = \rho\Big(\sum_{n=1}^{N_1} \phi_1(x_{1,n}),\ \dots,\ \sum_{n=1}^{N_K} \phi_K(x_{K,n})\Big),$$
with (shared) MLPs $\rho$ and $\phi_1, \dots, \phi_K$, where $x_{k,n}$ denotes the $n$-th element of type $k$, thus enabling universal approximation in (typed) graphs. This mathematical form underlies practical invariance-oriented architectures such as PINE, guaranteeing that all node or graph representations are robust to neighbor ordering and identity (Gui et al., 2019).
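A sum-decomposed set function over typed neighbors can be sketched as follows. The code assumes a per-type inner MLP applied to every neighbor of that type, sum pooling within each type, and an outer MLP on the concatenated pools; the names `typed_set_function`, `phi_params`, and `rho_params` are hypothetical, and the weights are random toy values rather than PINE's trained parameters.

```python
import numpy as np

def mlp(x, W1, W2):
    """Two-layer perceptron with ReLU; works on a vector or a batch of rows."""
    return np.maximum(x @ W1, 0.0) @ W2

def typed_set_function(neighbors_by_type, phi_params, rho_params):
    """Outer MLP applied to per-type sums of an inner MLP shared within each type."""
    pooled = [mlp(X_k, *phi_params[k]).sum(axis=0)       # sum over neighbors of type k
              for k, X_k in enumerate(neighbors_by_type)]
    return mlp(np.concatenate(pooled), *rho_params)

rng = np.random.default_rng(0)
d, h, out = 3, 8, 2
neighbors = [rng.normal(size=(4, d)), rng.normal(size=(2, d))]   # two neighbor types
phi_params = [(rng.normal(size=(d, h)), rng.normal(size=(h, h))) for _ in neighbors]
rho_params = (rng.normal(size=(2 * h, h)), rng.normal(size=(h, out)))

z = typed_set_function(neighbors, phi_params, rho_params)
# Reorder neighbors within each type only (a partial permutation).
shuffled = [X_k[rng.permutation(len(X_k))] for X_k in neighbors]
z_shuffled = typed_set_function(shuffled, phi_params, rho_params)
print(np.allclose(z, z_shuffled))
```

Swapping neighbors across types would generally change the output, which is what distinguishes partial from full permutation invariance.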
More generally, (Azizian et al., 2020) shows that for any $k \geq 2$, $k$-FGNNs achieve the full expressiveness of $(k+1)$-WL, with universality as $k$ approaches $n$ (the number of nodes). Invariant layers capture graph-level tasks; equivariant layers are required for node- or edge-level tasks.
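The role of the Weisfeiler-Lehman hierarchy is easiest to see at its lowest level. The sketch below implements plain 1-WL color refinement (not a $k$-FGNN) on adjacency lists; the helper names are illustrative. A hexagon and two disjoint triangles are both 2-regular, so 1-WL assigns them identical color histograms even though they are not isomorphic, which is the kind of failure that higher-order tests are designed to overcome.

```python
from collections import Counter

def wl_histogram(adj, n_rounds=3):
    """1-WL colour refinement; returns the multiset of final node colours.

    Colours are nested tuples (own colour, sorted neighbour colours), so
    histograms from different graphs are directly comparable.
    """
    colors = {v: 0 for v in adj}                      # uniform initial colour
    for _ in range(n_rounds):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

def cycle(nodes):
    """Adjacency lists of a simple cycle through `nodes` (length >= 3)."""
    k = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % k], nodes[(i + 1) % k]] for i in range(k)}

hexagon = cycle([0, 1, 2, 3, 4, 5])
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

same_colors = wl_histogram(hexagon) == wl_histogram(two_triangles)
separated = wl_histogram(path) != wl_histogram(star)
print(same_colors, separated)
```

An invariant network bounded by 1-WL must therefore map the hexagon and the two triangles to the same embedding.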
4. Practical Methodologies: Masking, Attention, and Efficient Embedding
In practice, various design patterns are established:
- Soft/hard subgraph masking: Masks (soft via sigmoid or Sinkhorn, hard via sampling) assign edge/node membership to invariant/variant subgraphs.
- Hybrid GNN–Transformer blocks: Enable simultaneous long-range/global and local processing, with fine-grained attention used to distinguish invariants from variants.
- Masked batched tensors: Padding and binary masks (as in FGNNs (Azizian et al., 2020)) enable efficient batching and training over variable-sized graphs.
- Bi-Lipschitz and kernel-based permutation-invariant embeddings: Sorting-based and polynomial-based mechanisms (see (Balan et al., 2022)) yield embeddings that are robust, injective, and theoretically suited for universal function approximation on graphs, enabling permutation-invariant downstream learning.
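The masked-batching pattern from the list above can be sketched in a few lines. This is a generic illustration assuming per-graph node-feature matrices of shape `(n_i, d)`; `pad_batch` and `masked_mean` are hypothetical helper names, not functions from any cited codebase.

```python
import numpy as np

def pad_batch(feature_list):
    """Stack variable-sized node-feature matrices into one padded tensor plus a mask."""
    n_max = max(x.shape[0] for x in feature_list)
    d = feature_list[0].shape[1]
    X = np.zeros((len(feature_list), n_max, d))
    mask = np.zeros((len(feature_list), n_max), dtype=bool)
    for b, x in enumerate(feature_list):
        X[b, :x.shape[0]] = x            # real nodes first, zero padding after
        mask[b, :x.shape[0]] = True      # True marks real nodes
    return X, mask

def masked_mean(X, mask):
    """Permutation-invariant readout that ignores padded rows."""
    m = mask[..., None].astype(X.dtype)
    return (X * m).sum(axis=1) / m.sum(axis=1)

graphs = [np.ones((2, 3)), 2.0 * np.ones((5, 3))]    # graphs with 2 and 5 nodes
X, mask = pad_batch(graphs)
pooled = masked_mean(X, mask)
print(X.shape, pooled)
```

Dividing by the mask count rather than by `n_max` keeps the pooled embedding independent of how much padding a batch happens to contain.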
5. Applications and Empirical Results
GIVE has demonstrated significant utility in:
- OOD generalization in molecular property prediction: Both GOODFormer and SNIGL outperform prior baselines on synthetic and real-world datasets, including motif-perturbed and scaffold-split benchmarks. For instance, on GOOD-HIV and OGBG-Molsider, SNIGL shows ROC-AUC improvements of 2–4% over the second-best (Chen et al., 2024, Liao et al., 1 Aug 2025).
- Brain connectome analysis: GIVE-based longitudinal representations yield notable gains in distinguishing Alzheimer’s stages and progression on fMRI data (Dong et al., 2023).
- Universal node/graph embeddings: PINE outperforms DeepWalk, node2vec, GCN, and GAT on both homogeneous and heterogeneous node classification, with 2–5 points improvement across evaluation splits (Gui et al., 2019).
A summary table of key GIVE paradigms and their architectural signatures:
| Model | Invariant Subgraph Extraction | Variant Channel | Fusion/Downstream |
|---|---|---|---|
| GOODFormer | Entropy-guided softmask + hybrid attention | Complementary softmax; MPNN/MLP | Interventional loss, block variant at test |
| SNIGL | PNS-optimized masking (Gumbel-Softmax) | Parallel rationale GNN, domain-specific | Logit-space calibrated ensemble |
| GSINA | Sinkhorn OT-based edge selection | Residual attention | Concatenation or contrastive head |
| PINE | Partial permutation-invariant sum + MLP | N/A (nodewise emb.) | Direct node embedding |
| BrainTokenGT | GCN+GRU node evolution | Dual (spatial/temporal) edge token | Token-level Transformer readout |
6. Open Problems and Future Directions
Current challenges and future research avenues for GIVE include:
- Relaxing independence assumptions: Many algorithms assume conditional independence between the invariant and variant components to obtain a clean decomposition; relaxing this may broaden the framework's applicability (Chen et al., 2024).
- Scaling high-order invariance and efficient embedding: While higher-order FGNNs yield maximal expressivity, their $O(n^k)$ complexity for $k$-WL separation remains a bottleneck (Azizian et al., 2020).
- Automated subgraph selection: Strategies for choosing the subgraph size and for discovering higher-order motifs remain mostly hand-tuned or fixed (Liao et al., 1 Aug 2025, Chen et al., 2024).
- Generalization beyond binary/multiclass labels: Extending GIVE to multi-label and regression settings, and to dynamic or heterogeneous graphs, is ongoing work.
- Theory-guided learning objectives: Incorporating recent advances in causal risk minimization and information bottleneck regularization to further enforce invariance (Chen et al., 2024, Liao et al., 1 Aug 2025).
GIVE synthesizes foundational invariance theories, deep graph representation learning techniques, and modern causal-verification principles to deliver robust, interpretable, and powerful graph embeddings for a wide array of challenging tasks.