G2T-FM: A Graph Foundation Model for Tabular Node Features
- G2T-FM is a graph foundation model that augments node representations with structural, neighborhood, and learnable encodings for universal applicability.
- It employs Neighborhood Feature Aggregation, classic structure-based features, and PEARL to fuse local and global graph information effectively.
- Empirical evaluations show that G2T-FM outperforms publicly available GFMs in the in-context regime and surpasses well-tuned classical GNNs after finetuning, across diverse datasets.
G2T-FM is a graph foundation model built to leverage tabular foundation models—specifically TabPFNv2—for graph machine learning tasks involving arbitrary and heterogeneous node features. In contrast to prior graph foundation models (GFMs), which have focused on text-attributed graphs, G2T-FM is designed for universal applicability, including non-textual (tabular) node features, by augmenting node representations with structural, neighborhood, and learnable encodings prior to tabular processing.
1. Model Architecture and Feature Augmentation
The architecture centers on transforming each node’s representation into a form suitable for TabPFNv2 by concatenating the node’s original features with graph-derived augmentations. The process is defined as:
- Neighborhood Feature Aggregation (NFA): For node $v$, aggregate neighborhood features using mean, maximum, and minimum statistics for numerical features, and averaged one-hot encodings for categorical features (a minimal sketch follows this list):

  $$h_v^{\text{NFA}} = \left[\operatorname{mean}_{u \in N(v)} x_u \,\Big\Vert\, \max_{u \in N(v)} x_u \,\Big\Vert\, \min_{u \in N(v)} x_u\right],$$

  where $N(v)$ is the set of neighbors of $v$ and $x_u$ is the feature vector of neighbor $u$.
- Classic Structure-Based Features (SF): Compute the degree, PageRank score, and first $k$ Laplacian eigenvectors for each node; the eigenvectors, extracted from the graph Laplacian, encode positional information.
- Learnable Structure-Based Encodings (PEARL): Each node receives a random initialization that is processed by a small GNN; repeating this for multiple random samples and averaging the results yields expressive, permutation-equivariant encodings.
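A minimal NumPy sketch of NFA, assuming a dense feature matrix `X` and a Python adjacency list `neighbors`; the function names and the zero fallback for isolated nodes are illustrative choices, not the paper's reference implementation:

```python
import numpy as np

def nfa_features(X, neighbors):
    """Mean/max/min aggregation of numerical neighbor features (sketch).

    X         : (n, d) array of numerical node features.
    neighbors : list of lists; neighbors[v] holds the indices of v's neighbors.
    Returns an (n, 3*d) array [mean || max || min] per neighborhood.
    """
    n, d = X.shape
    out = np.zeros((n, 3 * d))
    for v, nbrs in enumerate(neighbors):
        if not nbrs:
            continue  # isolated node: keep zeros (an assumption)
        nb = X[nbrs]
        out[v] = np.concatenate([nb.mean(0), nb.max(0), nb.min(0)])
    return out

def nfa_categorical(codes, neighbors, num_classes):
    """Averaged one-hot encodings of one categorical column (sketch)."""
    onehot = np.eye(num_classes)[codes]  # (n, num_classes)
    return np.stack([
        onehot[nbrs].mean(0) if nbrs else np.zeros(num_classes)
        for nbrs in neighbors
    ])
```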
The final node representation for each node $v$ is:

$$h_v = \left[x_v \,\Big\Vert\, h_v^{\text{NFA}} \,\Big\Vert\, h_v^{\text{SF}} \,\Big\Vert\, h_v^{\text{PEARL}}\right],$$

where $x_v$ are the original node features. This composite is input to TabPFNv2, which applies transformer-based processing, random feature positional encodings, and attention, preserving invariance to feature order and label permutation through mechanisms such as label shuffling.
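To make the composition concrete, here is a hedged end-to-end sketch: classic structural features via `networkx`/SciPy, a toy PEARL-style module (a two-layer message-passing network averaged over random initializations; the actual PEARL architecture differs), and the final concatenation. `nfa_features` is the helper from the previous sketch; all other names and dimensions are assumptions made for illustration.

```python
import networkx as nx
import numpy as np
import torch
from scipy.sparse.linalg import eigsh

def structure_features(G, k=4):
    """Degree, PageRank, and the first k Laplacian eigenvectors (sketch)."""
    nodes = list(G.nodes())
    deg = np.array([G.degree(v) for v in nodes], dtype=float)
    pr = nx.pagerank(G)
    pr = np.array([pr[v] for v in nodes])
    L = nx.normalized_laplacian_matrix(G).astype(float)
    _, eigvecs = eigsh(L, k=k, which="SM")  # smallest eigenpairs
    return np.column_stack([deg, pr, eigvecs])

class PearlLike(torch.nn.Module):
    """Toy stand-in for PEARL: random inits -> small GNN, then averaging."""
    def __init__(self, in_dim=16, hidden=16):
        super().__init__()
        self.in_dim = in_dim
        self.lin1 = torch.nn.Linear(in_dim, hidden)
        self.lin2 = torch.nn.Linear(hidden, hidden)

    def forward(self, adj, num_samples=8):
        outs = []
        for _ in range(num_samples):
            h = torch.randn(adj.shape[0], self.in_dim)  # random init per node
            h = torch.relu(self.lin1(adj @ h))          # message passing
            h = self.lin2(adj @ h)
            outs.append(h)
        # Averaging over random samples approximates permutation equivariance.
        return torch.stack(outs).mean(0)

def augment(G, X, neighbors):
    """h_v = [x_v || NFA || SF || PEARL] for every node (sketch)."""
    sf = structure_features(G)
    adj = torch.tensor(nx.to_numpy_array(G), dtype=torch.float32)
    pearl = PearlLike()(adj).detach().numpy()
    return np.hstack([X, nfa_features(X, neighbors), sf, pearl])
```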
2. Handling Arbitrary Node Features
G2T-FM is agnostic to the type of node features (numerical, categorical, or mixed). Unlike methods that require text-based features or transformations, G2T-FM:
- Accepts original node features in their native format.
- Uses feature aggregation to encode local topology, capturing nuanced relationships in the neighborhood.
- Integrates classic graph descriptors for a richer global context.
- Utilizes PEARL to ensure expressive, permutation-equivariant structural information even in graphs lacking strong symmetry-breaking attributes.
This ensures G2T-FM can process a wide array of graph datasets, including those from domains (e.g., city analytics, crowdsourcing, art networks) where node attributes are not naturally textual.
3. Empirical Performance
Extensive evaluations demonstrate G2T-FM's efficacy:
- In-Context Regime: When the full training set is supplied as the prompt (without parameter updates), G2T-FM outperforms leading public GFMs (AnyGraph, OpenGraph, TS-GNN) and matches well-tuned classical GNNs (GCN, GraphSAGE, GAT, Graph Transformer), measured using average precision (binary classification), accuracy (multiclass classification), and $R^2$ (regression).
- Finetuning: Upon gradient-based optimization of both the TabPFNv2 backbone and the PEARL module, G2T-FM surpasses classic GNNs trained from scratch. Preprocessing steps such as PCA may be applied to accommodate high-dimensional features.
- Dataset Coverage: Strong results are reported on datasets with diverse feature types, such as tolokers-2 (crowdsourcing), city-reviews (urban analytics), and artnet-views (art network analysis), indicating broad applicability.
4. Learning Paradigms: In-Context vs. Finetuned
G2T-FM supports two usage paradigms:
- In-Context Learning (ICL): The model is conditioned on labeled training data and evaluated on test data, with no gradient updates. This regime yields robust performance, showing the power of tabular backbones with graph augmentations even without any parameter adaptation (see the sketch after this list).
- Finetuning: Model parameters are updated using the downstream task's training data. Substantial performance improvements are observed, with task-specific adaptation allowing G2T-FM to outperform strong baselines across tasks. The approach is robust to varying feature spaces and graph topologies.
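A hedged usage sketch of the in-context regime, assuming the `augment` helper from the earlier sketch, given data `G`, `X`, `neighbors`, and node labels `y`, and the public `tabpfn` package (whose 2.x releases ship TabPFNv2 as `TabPFNClassifier`); the random split and the PCA threshold are illustrative, the latter echoing the preprocessing note above:

```python
import numpy as np
from sklearn.decomposition import PCA
from tabpfn import TabPFNClassifier

X_aug = augment(G, X, neighbors)  # [x_v || NFA || SF || PEARL], from above

# Optional: shrink very high-dimensional inputs, per the preprocessing note.
if X_aug.shape[1] > 100:
    X_aug = PCA(n_components=100).fit_transform(X_aug)

# Illustrative random train/test split over nodes.
rng = np.random.default_rng(0)
idx = rng.permutation(len(X_aug))
train_idx, test_idx = idx[: len(idx) // 2], idx[len(idx) // 2 :]

clf = TabPFNClassifier()                 # no gradient updates in ICL
clf.fit(X_aug[train_idx], y[train_idx])  # the training set becomes the prompt
pred = clf.predict(X_aug[test_idx])
```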
5. Implications for Graph Foundation Modeling
The use of TabPFNv2 reveals a previously overlooked direction in graph representation learning:
- Generalization: G2T-FM is capable of processing arbitrary node feature spaces without the limiting assumptions of text-form features or restricted modalities.
- Unification: The methodology unifies tabular and graph modalities, reflecting the similar challenges in arbitrary feature handling, and enables cross-domain advancements through tabular-model-derived techniques.
- Competitive Edge: G2T-FM demonstrates that tabular modeling, when paired with the right graph augmentations, can match and even surpass specialized GNNs, including in "foundation" scenarios with extremely varied data.
- Research Pathways: The presented architecture encourages further augmentation—such as multi-hop neighborhood encoding, dynamic neighborhood interactions, or cross-graph pretraining—supporting expansion into node regression, fraud detection, and other domains.
6. Summary Table: Core Components and Roles
| Component | Description | Role in G2T-FM |
|---|---|---|
| NFA | Neighbor statistics (mean, max, min; averaged one-hot for categorical) | Local context, topology |
| SF | Degree, PageRank, Laplacian eigenvectors | Global and relative position |
| PEARL | Learnable GNN encodings of random initializations | Symmetry breaking, equivariance |
| TabPFNv2 | Transformer tabular backbone | Feature/structure processing |
The confluence of tabular feature processing, contextually rich graph augmentations, and robust learning paradigms establishes G2T-FM as a generic, high-performing graph foundation model suitable for diverse, real-world tasks. This approach signals a broader shift in graph ML, demonstrating the viability and strengths of adapting tabular foundation models for graph-centric challenges (Eremeev et al., 28 Aug 2025).