Foundation Graph Models

Updated 13 May 2026

Foundation Graph Models are unified pretrained models that learn from diverse graph data to support node-, edge-, and graph-level inference with zero- or few-shot adaptation.
They leverage a structural vocabulary of subgraphs—such as trees and cycles—and Riemannian geometry to capture hierarchical and cyclic patterns across domains.
GFMs demonstrate significant gains in few-shot performance, outperforming traditional GNNs in tasks like link prediction and node classification.

A Foundation Graph Model, more commonly written Graph Foundation Model (GFM), is a unified, pretrained model that learns from massive, heterogeneous collections of graph-structured data and transfers across domains and tasks, supporting node-, edge-, and graph-level inference, often with zero- or few-shot adaptation. GFMs bring the scalable pretrain-then-adapt paradigm from language and vision to the graph domain, but must address challenges specific to graphs: non-Euclidean structure, domain heterogeneity, and the absence of a standard discrete vocabulary. Recent work has established principles and architectures that generalize beyond traditional task-specific Graph Neural Networks (GNNs), with reported gains in transferability and emergent capabilities across diverse graph tasks (Sun et al., 5 Feb 2025, Wang et al., 2024, Bechler-Speicher et al., 4 Feb 2026, Wang et al., 21 May 2025).

1. Conceptual Foundations and Motivation

GFMs aim to supersede the traditional paradigm in which GNNs are trained from scratch for one task and one graph. The GFM paradigm comprises three core elements:

Pretraining on Diverse Graph Data: The model is trained on large, multi-domain collections, learning to encode transferable structural and semantic knowledge (Liu et al., 2023, Wang et al., 2024).
Cross-task Adaptation: The pretrained model adapts across node, edge, and graph tasks, typically via fine-tuning, prompt tuning, or zero-shot inference, exhibiting data efficiency and broad generalization.
Emergence and Homogenization: At sufficient scale, new abilities arise, analogous to in-context learning and chain-of-thought reasoning in LLMs, and a single model serves tasks and datasets that previously required separate architectures.

Traditional GNNs lack these properties because they are trained for a narrow scope and carry no universal inductive biases. The key goals of GFMs are emergence (unlocked at scale), task and dataset homogenization, and principled transferability (Mao et al., 2024, Wang et al., 21 May 2025).

2. Structural Vocabulary: The Basis for Transfer

A central insight driving recent GFMs is the analogy between the discrete token vocabulary of LLMs and transferable substructures in graphs. In RiemannGFM, this is formalized as a structural vocabulary: a collection of subgraph types (e.g., trees, cycles) such that any target graph is covered by overlapping instances from this set (Sun et al., 5 Feb 2025). The model relies on two universal substructures:

Trees: Acyclic, connected fragments capturing hierarchies and spanning structures.
Cycles: Simple loops encoding non-tree-like regularities, such as triangles and quadrangles.

The selection of a minimal domain-agnostic vocabulary enables decomposing arbitrary graphs into reusable, transferable units, providing the inductive bias necessary for zero- and few-shot transfer. This substructure-centric approach is distinct from the motif-agnostic pretraining of classical GNNs and is motivated by theoretical desiderata: expressiveness, invariance, and stability (Mao et al., 2024).
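To make the idea of vocabulary instances concrete, the following minimal sketch extracts one tree instance and the triangle instances around a chosen node of a toy graph. It is an illustration only: the helper names `bfs_tree` and `triangles_through`, the depth limit, and the restriction to 3-cycles are assumptions made here, not the sampling procedure of RiemannGFM.

```python
from collections import deque


def bfs_tree(adj, root, depth):
    """Return the edges of a BFS tree rooted at `root`, truncated at `depth` hops."""
    parent = {root: None}
    level = {root: 0}
    queue = deque([root])
    edges = []
    while queue:
        u = queue.popleft()
        if level[u] == depth:
            continue  # do not expand beyond the depth limit
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                level[v] = level[u] + 1
                edges.append((u, v))
                queue.append(v)
    return edges


def triangles_through(adj, node):
    """Return every 3-cycle containing `node`, as sorted vertex triples."""
    found = set()
    for u in adj[node]:
        for w in adj[node]:
            if u < w and w in adj[u]:
                found.add(tuple(sorted((node, u, w))))
    return sorted(found)


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant path 2-3-4.
edge_list = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
adj = {}
for a, b in edge_list:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

tree_edges = bfs_tree(adj, root=2, depth=2)   # tree instance around node 2
cycles = triangles_through(adj, node=2)       # cycle instances around node 2
print(tree_edges, cycles)
```

Node 2 is covered by both kinds of instance here, which is the overlap the structural vocabulary relies on: the same node can be described by its hierarchical context and by the loops it participates in.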

3. Riemannian Geometry and Model Architecture

RiemannGFM exemplifies a lineage of GFMs where graph structural patterns are encoded in geometric spaces with mixed curvature (Sun et al., 5 Feb 2025):

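Manifold Construction: The state space is a product bundle that combines a hyperbolic factor (for hierarchical, tree-like structure) with a spherical factor (for cyclic, loop-like structure), each paired with its tangent space (denoted $\mathcal{T}$); $\kappa_H, \kappa_S$ are the curvatures and $d_H, d_S$ the dimensions of the two factors: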

$\mathcal{P} = (\mathcal{H}_{\kappa_H}^{d_H} \otimes \mathcal{T}\mathcal{H}_{\kappa_H}^{d_H}) \otimes (\mathcal{S}_{\kappa_S}^{d_S} \otimes \mathcal{T}\mathcal{S}_{\kappa_S}^{d_S}).$

Node Encoding: Each node is represented by manifold coordinates and tangent encodings for both curvature regimes.
Universal Riemannian Layers: Two modules per layer:
- Vocabulary Learning (Substructure-Level): Cross-geometry attention updates node representations by allowing tree and cycle substructures to exchange information.
- Global Learning (Graph-Level): Bundle convolution and geometric averaging align node representations across multiple substructure instances.
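As a rough illustration of what geometric averaging can look like in each curvature regime, the sketch below computes a Lorentzian centroid on the unit hyperboloid and a renormalised mean on the unit sphere. Both are standard closed-form averages, but using them here is an assumption: the source does not specify the averaging operator, the curvatures are fixed to -1 and +1 for simplicity, and all function names are hypothetical.

```python
import math


def lorentz_inner(x, y):
    """Minkowski inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperboloid_point(v):
    """Lift a Euclidean vector v onto the unit hyperboloid (curvature -1)."""
    return [math.sqrt(1.0 + sum(a * a for a in v))] + list(v)


def lorentz_centroid(points):
    """Sum the points, then rescale the sum back onto the hyperboloid."""
    s = [sum(c) for c in zip(*points)]
    norm = math.sqrt(-lorentz_inner(s, s))  # positive for hyperboloid points
    return [c / norm for c in s]


def sphere_centroid(points):
    """Sum the points, then renormalise to unit length.

    Undefined when the points cancel exactly (e.g. two antipodal points).
    """
    s = [sum(c) for c in zip(*points)]
    norm = math.sqrt(sum(c * c for c in s))
    return [c / norm for c in s]


# Two copies of the same node seen in different substructure instances (toy values).
h_pts = [hyperboloid_point([0.3, 0.1]), hyperboloid_point([-0.2, 0.5])]
s_pts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

h_mean = lorentz_centroid(h_pts)
s_mean = sphere_centroid(s_pts)
print(lorentz_inner(h_mean, h_mean))   # close to -1: still on the hyperboloid
print(sum(c * c for c in s_mean))      # close to 1: still on the sphere
```

The point of both operators is that the average stays on the manifold, which a plain Euclidean mean of the coordinates would not guarantee.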

Fully Riemannian layers and manifold-preserving linear maps provide a rigorously defined geometry for both node and subgraph embeddings. Geodesic distances, parallel transport, and tangent space mappings establish equivariance and permit precise alignment across domains.
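The tangent-space mappings and geodesic distances mentioned above have simple closed forms at the origin of each factor. The sketch below is a minimal version under stated assumptions: curvatures are fixed to -1 (hyperboloid model) and +1 (unit sphere) instead of general $\kappa_H, \kappa_S$, tangent vectors at the origin are passed as their spatial coordinates only, and the product distance combines the two factors in the usual $\ell_2$ fashion. Function names are hypothetical.

```python
import math


def exp0_hyperboloid(v):
    """Exponential map at the origin o = (1, 0, ..., 0) of the unit hyperboloid."""
    r = math.sqrt(sum(a * a for a in v))
    if r == 0.0:
        return [1.0] + [0.0] * len(v)
    return [math.cosh(r)] + [math.sinh(r) * a / r for a in v]


def exp0_sphere(v):
    """Exponential map at the pole o = (1, 0, ..., 0) of the unit sphere."""
    r = math.sqrt(sum(a * a for a in v))
    if r == 0.0:
        return [1.0] + [0.0] * len(v)
    return [math.cos(r)] + [math.sin(r) * a / r for a in v]


def dist_hyperboloid(x, y):
    """Geodesic distance on the unit hyperboloid: arcosh(-<x, y>_L)."""
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(1.0, -inner))  # clamp guards against rounding


def dist_sphere(x, y):
    """Geodesic distance on the unit sphere: arccos(<x, y>)."""
    inner = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, inner)))


def product_distance(xh, yh, xs, ys):
    """Distance in the product of the two factors."""
    return math.sqrt(dist_hyperboloid(xh, yh) ** 2 + dist_sphere(xs, ys) ** 2)


origin = [1.0, 0.0, 0.0]
v = [0.3, 0.4]                       # tangent vector at the origin, norm 0.5
xh, xs = exp0_hyperboloid(v), exp0_sphere(v)
d_h = dist_hyperboloid(origin, xh)
d_s = dist_sphere(origin, xs)
d_total = product_distance(origin, xh, origin, xs)
print(d_h, d_s, d_total)
```

In both factors the exponential map at the origin preserves length, so the geodesic distance from the origin equals the norm of the tangent vector; this is the property that lets tangent encodings be compared in a shared reference frame.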

4. Pretraining Objectives and Transfer Mechanisms

RiemannGFM and related GFMs employ pretraining objectives tailored to structural information, eschewing reliance on text (Sun et al., 5 Feb 2025):

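Contrastive Learning: The two geometric views (hyperbolic and spherical) serve as augmentations for a geometric contrastive loss over parallel-transported tangent encodings. Writing $z^H_i, z^S_i$ for node $i$'s tangent encodings at its manifold coordinates $p^H_i, p^S_i$, and $PT_{p \to o}$ for parallel transport to the origin $o$, the loss aligns the two views of each node against those of the other $N-1$ nodes: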

$\mathcal{J}(H, S) = -\sum_{i=1}^N \log \frac{\exp\langle PT_{p^H_i \to o}(z^H_i), PT_{p^S_i \to o}(z^S_i) \rangle}{\sum_{j=1}^N \exp\langle PT_{p^H_i \to o}(z^H_i), PT_{p^S_j \to o}(z^S_j) \rangle}.$

Manifold-Preserving Linear Transformations: Ensure architectural compatibility with the Riemannian structure, crucial for stable transfer.
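Pretraining Workflows: Training alternates substructure sampling, manifold-structured message passing, and parameter updates. Node encodings are initialized from spectral graph features (Laplacian eigenvectors), which depend only on graph structure and so give a starting point that is not biased toward any domain's attributes.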
Adaptation: Downstream adaptation is parameter efficient due to shared substructure representations, typically requiring only projection layers or shallow adapters to map foundation model outputs to task-specific heads.
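The contrastive objective $\mathcal{J}(H, S)$ above can be sketched in a few lines once the tangent encodings have been moved to a common reference point. The version below assumes the parallel transport to the origin has already been applied, that both views have the same dimension, and that the inner product at the origin is the ordinary dot product; `contrastive_loss` and the toy vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(z_h, z_s):
    """J(H, S) = -sum_i log( exp<z_h[i], z_s[i]> / sum_j exp<z_h[i], z_s[j]> ).

    z_h, z_s: lists of N vectors, assumed already transported to the origin.
    """
    total = 0.0
    for i, zh in enumerate(z_h):
        logits = [dot(zh, zs) for zs in z_s]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total -= logits[i] - log_denominator
    return total


# Toy encodings for two nodes.
z_h = [[2.0, 0.0], [0.0, 2.0]]
aligned = contrastive_loss(z_h, [[2.0, 0.0], [0.0, 2.0]])   # views agree per node
swapped = contrastive_loss(z_h, [[0.0, 2.0], [2.0, 0.0]])   # views mismatched
print(aligned, swapped)
```

The loss is small when each node's hyperbolic view is most similar to its own spherical view and large when the views are mismatched, which is the alignment pressure the pretraining objective applies.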

5. Empirical Evaluation, Ablations, and Transfer Analysis

Extensive cross-domain and few-shot evaluations demonstrate that structure-centric GFMs outperform both classical GNNs and recent graph foundation models on link prediction, node classification, and few-shot learning (Sun et al., 5 Feb 2025):

| Model | Citeseer AUC | GitHub 1-shot ACC | Airports ACC |
|--------------|--------------|-------------------|--------------|
| GraphMAE2 | 92.8% | 65% | 52.3% |
| RiemannGFM | 99.4% | >84% (5-shot) | 55.3% |

Key findings include:

Few-Shot Transfer: RiemannGFM achieves absolute accuracy gains of 15–20% over baselines in one-shot settings (GitHub, Airports), with robust performance across text-rich, mixed, and non-attributed graphs.
Substructure-Geometric Ablations: Embedding trees in hyperbolic space and cycles in spherical space gives the best transfer; alternative pairings reduce accuracy by 2–3%. Cross-geometry attention adds a further 1–2 points.
Domain Robustness: Changing the pretraining graphs (e.g., Flickr, WikiCS) yields less than 2% variation in accuracy, whereas competing GFMs degrade heavily under domain shift.

The structural vocabulary and geometric alignment not only transfer between tasks but also confer stability and robustness to input heterogeneity.

6. Theoretical Properties and Future Directions

The theoretical underpinnings of a structural vocabulary embedded in Riemannian manifolds provide both practical transferability and formal guarantees (Sun et al., 5 Feb 2025, Mao et al., 2024):

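Expressivity: Any graph can be decomposed into trees and cycles: a spanning forest reaches every node, and each remaining edge closes a cycle with the tree path between its endpoints. The vocabulary is therefore sufficient, in principle, for arbitrary discrete graphs.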
Stability: Geometric constructions admit provable invariance to node relabelings, and manifold-based transformations preserve structure under perturbations.
Scaling Laws: By analogy with token-based models in NLP and CV, a fixed, expressive vocabulary motivates the hypothesis of neural scaling behavior for graphs: performance should improve predictably as model size or pretraining data grows, provided the architecture and vocabulary are aligned (Mao et al., 2024).
Generalization: The Riemannian architecture is anticipated to support extensions to higher-order motifs (cliques, bicliques), heterogeneous graphs, and time-dependent structure via curvature adaptation.
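The expressivity claim above can be checked on a small example with a classical construction: a BFS spanning tree plus one fundamental cycle per non-tree edge. The sketch assumes a connected, simple, undirected graph with nodes labelled 0..n-1; `tree_cycle_decomposition` is a hypothetical helper and is not the decomposition routine used by RiemannGFM.

```python
from collections import deque


def tree_cycle_decomposition(n, edges):
    """Split a connected graph into a BFS spanning tree and fundamental cycles."""
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    # BFS from node 0 records each node's parent in the spanning tree.
    parent = {0: None}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    tree = {frozenset((v, p)) for v, p in parent.items() if p is not None}

    def path_to_root(v):
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    cycles = []
    for a, b in edges:
        if frozenset((a, b)) in tree:
            continue
        pa, pb = path_to_root(a), path_to_root(b)
        # Drop shared ancestors so both paths end at the lowest common ancestor.
        while len(pa) > 1 and len(pb) > 1 and pa[-2] == pb[-2]:
            pa.pop()
            pb.pop()
        # Walk a -> ancestor -> b; the non-tree edge (b, a) closes the cycle.
        cycles.append(pa + pb[-2::-1])
    return tree, cycles


# Toy graph: two triangles sharing node 2.
n = 5
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 2)]
tree, cycles = tree_cycle_decomposition(n, edges)
print(sorted(tuple(sorted(e)) for e in tree), cycles)
```

Every edge ends up either in the tree or as the closing edge of exactly one cycle, and the number of cycles equals the cyclomatic number m - n + 1 of a connected graph with m edges and n nodes.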

Open challenges include broadening the vocabulary to more complex substructures, integrating Euclidean or pseudospherical curvature regimes, and developing scaling recipes tailored to non-Euclidean domains.

7. Significance and Implications

Foundation Graph Models, as exemplified by RiemannGFM, mark a shift from ad hoc, task-specific architectures to universal, structurally grounded models for graphs. By grounding representations in a shared geometric and substructural basis, these models enable robust cross-domain transfer, generalization that does not depend on text or on sequential encodings of graphs, and theoretical guarantees on invariance and expressivity. The structural vocabulary paradigm points toward graph models that work without textual annotation and provides a platform for future work on geometry-driven representation learning in graphs (Sun et al., 5 Feb 2025, Wang et al., 2024, Mao et al., 2024, Wang et al., 21 May 2025).