
Turning Tabular Foundation Models into Graph Foundation Models (2508.20906v1)

Published 28 Aug 2025 in cs.LG

Abstract: While foundation models have revolutionized such fields as natural language processing and computer vision, their application and potential within graph machine learning remain largely unexplored. One of the key challenges in designing graph foundation models (GFMs) is handling diverse node features that can vary across different graph datasets. Although many works on GFMs have been focused exclusively on text-attributed graphs, the problem of handling arbitrary features of other types in GFMs has not been fully addressed. However, this problem is not unique to the graph domain, as it also arises in the field of machine learning for tabular data. In this work, motivated by the recent success of tabular foundation models like TabPFNv2, we propose G2T-FM, a simple graph foundation model that employs TabPFNv2 as a backbone. Specifically, G2T-FM augments the original node features with neighborhood feature aggregation, adds structural embeddings, and then applies TabPFNv2 to the constructed node representations. Even in a fully in-context regime, our model achieves strong results, significantly outperforming publicly available GFMs and performing on par with well-tuned GNNs trained from scratch. Moreover, after finetuning, G2T-FM surpasses well-tuned GNN baselines, highlighting the potential of the proposed approach. More broadly, our paper reveals a previously overlooked direction of utilizing tabular foundation models for graph machine learning tasks.


Summary

  • The paper introduces G2T-FM, a framework that repurposes TabPFNv2 through feature augmentation to handle heterogeneous node features in graph tasks.
  • Its pipeline, combining Neighborhood Feature Aggregation (NFA), classic structural features, and PEARL encodings, outperforms publicly available GFMs and, after finetuning, well-tuned GNNs in both classification and regression tasks.
  • Experimental evaluations and ablation studies confirm that integrating tabular modeling with graph-specific augmentations leads to improved generalization and robust performance.

Turning Tabular Foundation Models into Graph Foundation Models: A Technical Analysis

Motivation and Problem Statement

The paper addresses the challenge of developing general-purpose Graph Foundation Models (GFMs) capable of handling heterogeneous node features and target spaces across diverse graph domains. Existing GFMs predominantly focus on text-attributed graphs, leveraging pretrained text encoders, or employ dimensionality reduction techniques (e.g., SVD, PCA) to standardize feature spaces. These approaches are limited in their ability to process arbitrary feature types and do not generalize well to graphs with non-textual attributes or regression tasks. The authors propose leveraging advances in Tabular Foundation Models (TFMs), specifically TabPFNv2, to construct a GFM that can process arbitrary node features and targets by transforming graph tasks into tabular ones.

G2T-FM: Architecture and Methodology

The proposed Graph-to-Table Foundation Model (G2T-FM) framework augments node features with graph-derived information and applies a tabular foundation model (TabPFNv2) to the resulting representations. The augmentation pipeline consists of:

  • Neighborhood Feature Aggregation (NFA): For each node, numerical features are aggregated over neighbors using mean, max, and min; categorical features are one-hot encoded and averaged. This captures local neighborhood statistics.
  • Classic Structure-Based Features (SF): Node degree, PageRank, and the first K Laplacian eigenvectors are computed to encode both local and global structural properties.
  • Learnable Structure-Based Encodings (PEARL): Random node initializations are processed by a GNN, repeated M times, and averaged to break structural symmetries and enhance expressivity. Both learnable and non-learnable (random) variants are considered. (A combined sketch of all three augmentations follows this list.)
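
The augmentations are simple enough to reconstruct from the description above. The following is an illustrative sketch, not the authors' code: the helper names, the plain numpy/scipy implementations, and the parameter-free mean-aggregation network standing in for PEARL's GNN are all assumptions (one-hot averaging of categorical features is omitted for brevity).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def neighborhood_feature_aggregation(adj: sp.csr_matrix, x: np.ndarray) -> np.ndarray:
    """NFA: mean, max, and min of numerical neighbor features per node."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    mean = sp.diags(1.0 / np.maximum(deg, 1)) @ adj @ x
    mx, mn = np.zeros_like(x), np.zeros_like(x)
    for i in range(adj.shape[0]):
        nbrs = adj.indices[adj.indptr[i]:adj.indptr[i + 1]]
        if nbrs.size:
            mx[i], mn[i] = x[nbrs].max(axis=0), x[nbrs].min(axis=0)
    return np.hstack([mean, mx, mn])

def classic_structural_features(adj: sp.csr_matrix, k: int = 8) -> np.ndarray:
    """SF: degree, PageRank, and the first k eigenvectors of the normalized Laplacian."""
    n = adj.shape[0]
    deg = np.asarray(adj.sum(axis=1)).ravel()
    p = sp.diags(1.0 / np.maximum(deg, 1)) @ adj          # row-stochastic transition matrix
    pr = np.full(n, 1.0 / n)
    for _ in range(50):                                    # damped power iteration
        pr = 0.15 / n + 0.85 * (p.T @ pr)
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1)))
    lap = sp.eye(n) - d_inv_sqrt @ adj @ d_inv_sqrt        # symmetric normalized Laplacian
    _, eigvecs = eigsh(lap, k=k, which="SM")               # k smallest eigenpairs
    return np.hstack([deg[:, None], pr[:, None], eigvecs])

def pearl_random(adj: sp.csr_matrix, dim: int = 16, m: int = 10, layers: int = 2) -> np.ndarray:
    """Non-learnable PEARL variant: random inits pushed through a fixed
    mean-aggregation network, repeated m times and averaged."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    p = sp.diags(1.0 / np.maximum(deg, 1)) @ adj
    out = np.zeros((adj.shape[0], dim))
    for _ in range(m):
        h = np.random.randn(adj.shape[0], dim)
        for _ in range(layers):
            h = np.maximum(p @ h, 0.0)                     # message passing + ReLU
        out += h
    return out / m

def augment(adj: sp.csr_matrix, x: np.ndarray) -> np.ndarray:
    """Per-node rows of the table handed to TabPFNv2: raw features plus all three blocks."""
    return np.hstack([x,
                      neighborhood_feature_aggregation(adj, x),
                      classic_structural_features(adj),
                      pearl_random(adj)])
```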

The concatenated feature vector is input to TabPFNv2, which operates in both in-context learning (ICL) and finetuning (FT) regimes. The model is designed to satisfy feature permutation invariance, label permutation equivariance, and node permutation equivariance in distribution (a minimal usage sketch follows Figure 1).

Figure 1: Overview of the proposed G2T-FM, illustrating the augmentation of node features with graph-derived components and subsequent processing by TabPFNv2.
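
In the ICL regime, the augmented node table is consumed like any other tabular dataset. A minimal sketch, assuming the `tabpfn` package's scikit-learn-style interface, with synthetic data standing in for the output of the augmentation pipeline:

```python
import numpy as np
from tabpfn import TabPFNClassifier  # pip install tabpfn

rng = np.random.default_rng(0)
n_nodes, n_feats = 1000, 32
x_aug = rng.normal(size=(n_nodes, n_feats))  # stand-in for [raw | NFA | SF | PEARL] rows
y = rng.integers(0, 3, size=n_nodes)         # <= 10 classes (a TabPFNv2 constraint)
train_idx, test_idx = np.arange(100), np.arange(100, n_nodes)

clf = TabPFNClassifier()
clf.fit(x_aug[train_idx], y[train_idx])      # labeled nodes form the in-context set
proba = clf.predict_proba(x_aug[test_idx])   # a forward pass; no gradient updates
```

The same model can instead be finetuned on the training rows, which is the FT regime evaluated below.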

Experimental Evaluation

Datasets and Protocol

Experiments are conducted on two collections: (1) graphs with tabular (non-textual) node features from the GraphLand benchmark, and (2) classical graph benchmarks with text-derived node features. Both regression and classification tasks are considered, with standardized splits (10% train, 10% validation, 80% test) and transductive evaluation. TabPFNv2 constraints limit experiments to datasets with ≤10 classes and ≤10,000 training samples.
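
A hedged sketch of how such a split can be constructed; the function name and seeding are illustrative, and the actual benchmark splits are fixed by the datasets:

```python
import numpy as np

def transductive_split(n_nodes: int, seed: int = 0):
    """10% train / 10% validation / 80% test over node labels; the full graph
    (all nodes and edges) remains visible during training (transductive)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    n_train = n_val = int(0.1 * n_nodes)
    return (perm[:n_train],
            perm[n_train:n_train + n_val],
            perm[n_train + n_val:])
```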

Baselines

Comparisons are made against:

  • Well-tuned GNNs (GCN, GraphSAGE, GAT, GT) with residual connections, layer normalization, and MLP blocks.
  • LightGBM+NFA (gradient-boosted trees with neighborhood aggregation; a sketch follows this list).
  • Publicly available GFMs (AnyGraph, OpenGraph, TS-GNN, GCOPE) in ICL and FT regimes.
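
The LightGBM+NFA baseline isolates the contribution of the tabular backbone: the same neighborhood-augmented table is fit with gradient-boosted trees instead of TabPFNv2. A hedged sketch, assuming the `lightgbm` package and using a mean-only aggregation on synthetic data for brevity:

```python
import numpy as np
import scipy.sparse as sp
import lightgbm as lgb

rng = np.random.default_rng(0)
n = 500
adj = sp.random(n, n, density=0.02, format="csr", random_state=0)
adj = ((adj + adj.T) > 0).astype(float).tocsr()          # symmetric 0/1 adjacency
x = rng.normal(size=(n, 8))
y = rng.integers(0, 3, size=n)

deg = np.asarray(adj.sum(axis=1)).ravel()
nfa_mean = sp.diags(1.0 / np.maximum(deg, 1)) @ adj @ x  # mean-only NFA
x_aug = np.hstack([x, nfa_mean])                         # raw | aggregated

model = lgb.LGBMClassifier(n_estimators=300)
model.fit(x_aug[:100], y[:100])
preds = model.predict(x_aug[100:])
```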

Results

  • Existing GFMs underperform relative to well-tuned GNNs across both tabular and text-based datasets, except for isolated cases (e.g., TS-GNN on amazon-ratings).
  • G2T-FM (ICL) matches or exceeds GNNs on tabular datasets, with superior average rank and strong performance on specific datasets (e.g., tolokers-2, artnet-views).
  • G2T-FM (FT) consistently outperforms GNNs after finetuning, demonstrating positive transfer from pretraining and robustness across both tabular and text-based datasets.
  • Ablation studies confirm the necessity of all augmentation components (NFA, SF, PEARL), with performance drops observed upon removal. Enhanced baselines (GNNs/LightGBM with identical augmented features) do not close the gap to G2T-FM, indicating the synergy between the TabPFNv2 backbone and graph-to-tabular augmentation.

Implementation Considerations

  • Computational Requirements: TabPFNv2 imposes limits on class count and training set size; PCA is used to mitigate out-of-memory (OOM) errors on high-dimensional datasets.
  • Finetuning: Full model finetuning is preferred over parameter-efficient methods, with grid search over learning rates.
  • PEARL Integration: For GNNs, PEARL outputs are concatenated with node features and trained end-to-end; for LightGBM, random PEARL outputs are used due to lack of differentiability.
  • Symmetry Enforcement: Label shuffling is employed to ensure label permutation equivariance during multiclass classification (sketched below).
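
The label shuffling mentioned in the last item can be sketched as prediction averaging over random class relabelings; the function name and permutation count are illustrative, and any scikit-learn-style classifier (e.g., a TabPFNClassifier) is assumed:

```python
import numpy as np

def label_shuffled_predict_proba(clf, x_train, y_train, x_test, n_perm=8, seed=0):
    """Make multiclass predictions label-permutation equivariant in distribution
    by averaging over random relabelings of the classes."""
    rng = np.random.default_rng(seed)
    classes, y_idx = np.unique(y_train, return_inverse=True)
    k = len(classes)
    acc = np.zeros((x_test.shape[0], k))
    for _ in range(n_perm):
        perm = rng.permutation(k)                 # relabel original class i -> perm[i]
        clf.fit(x_train, perm[y_idx])
        # column perm[i] of the output is the probability of original class i
        acc += clf.predict_proba(x_test)[:, perm]
    return acc / n_perm                           # columns follow the order of `classes`
```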

Theoretical and Practical Implications

The paper demonstrates that tabular foundation models can be effectively repurposed for graph machine learning, overcoming limitations of existing GFMs in handling arbitrary feature and target spaces. The G2T-FM framework provides a simple yet powerful baseline, outperforming both traditional GNNs and current GFMs in standard (non-few-shot) settings. This suggests that the core challenges in GFM design—feature heterogeneity and target generalization—can be addressed by leveraging advances in tabular modeling and appropriate graph-to-tabular transformations.

Future Directions

  • Scalability: Extending G2T-FM to handle larger graphs and more classes by integrating scalable TFMs or optimizing the augmentation pipeline.
  • Graph-Specific Augmentations: Incorporating more sophisticated aggregation mechanisms (e.g., multi-hop, learnable aggregations) and cross-graph pretraining to enhance structural representation.
  • Generalization Across Modalities: Exploring the application of tabular foundation models to other data modalities (e.g., time series, multimodal graphs).
  • Benchmarking: Establishing standardized evaluation protocols for GFMs that reflect real-world graph heterogeneity and avoid misleading "zero-shot" terminology.

Conclusion

The paper provides a rigorous framework for transforming tabular foundation models into graph foundation models via feature augmentation and demonstrates strong empirical performance across diverse graph tasks. The results highlight the limitations of current GFMs and the potential of tabular models as a backbone for generalizable, robust graph learning. The approach sets a new baseline for GFMs and opens avenues for future research in scalable, modality-agnostic foundation models for graph-structured data.
