GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning

Published 14 May 2026 in cs.LG | (2605.14809v1)

Abstract: Graph prompt tuning has shown great potential in graph learning by introducing trainable prompts to enhance the model performance in conventional single-domain scenarios. Recent research has extended graph prompts to improve Graph Foundation Models (GFMs) by few-shot tuning auxiliary prompts. Despite their progress, most existing methods embed source-domain information into prompts, which serve either as input to GFMs or encoded during model pre-training. Such prompt entanglement with specific source domains and GFM pre-training strategy restricts their generalisability to other domains and different GFMs. Furthermore, existing GFM prompts merely rely on few-shot tuning for adaptation, neglecting the rich information in unlabelled target domain test data. Motivated by these insights, this paper aims to empower GFMs with pre-training-agnostic test-time graph prompt tuning, named GFMate. GFMate introduces centroid and layer prompts applied after pre-training on target domains, avoiding entanglement with specific source domains and model pre-training. In addition, a test-time complementary learning objective is devised to exploit both labelled and unlabelled target domain data for effective test-time prompt tuning. Extensive experiments on 12 benchmark datasets demonstrate the superior performance and efficiency of GFMate, achieving improvements of up to 30.63%. Code is available at https://github.com/YanJiangJerry/GFMate.


Authors (3)

Summary

The paper introduces a pre-training-agnostic test-time prompt tuning framework for graph foundation models to boost cross-domain adaptation.
It employs centroid and layer prompts alongside a novel test-time graph complementary learning objective to optimize performance on both labelled and unlabelled nodes.
Empirical results demonstrate up to 30.63% accuracy improvement and reduced computational overhead, confirming GFMate’s robustness and efficiency across diverse datasets.

GFMate: Pre-training-Agnostic Test-Time Prompt Tuning for Graph Foundation Models

Introduction and Motivation

The paper "GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning" (2605.14809) presents a novel methodology for improving the adaptation and generalisability of Graph Foundation Models (GFMs) in cross-domain graph learning tasks. Existing GFM prompt-tuning approaches have significant limitations due to their reliance on pre-training-entangled prompts, which are closely coupled to source domain distributions and specific pre-training strategies. Consequently, these methods exhibit poor transferability to unseen target domains and architectures, and their adaptation mechanisms commonly ignore the information embedded in abundant unlabelled target domain samples. GFMate addresses these challenges by proposing a test-time prompt tuning framework that is explicitly pre-training-agnostic and designed to exploit both labelled and unlabelled target domain data for robust domain adaptation.

Limitations of Existing GFM Prompt Tuning Paradigms

Prevailing GFM prompt tuning frameworks jointly pre-train backbone models and prompt vectors on multiple source domains, and adapt to target domains solely via few-shot labelled examples. This paradigm has two fundamental limitations:

Domain and Model Entanglement: Prompts are encoded with source-domain information, rendering them non-transferable when the target domain exhibits a divergent graph structure or feature distribution, or when a model is pre-trained via a different self-supervised objective or architecture.
Limited Exploitation of Test Data: While few-shot labelled nodes are directly utilized for prompt fine-tuning, unlabelled target domain nodes serve only as contextual neighbors in the message-passing scheme and do not actively contribute to prompt optimization. This leads to poor modeling of the test distribution and sub-optimal adaptation under distribution shift.

The paper empirically demonstrates that hop-wise aggregation performance varies substantially across domains in pre-trained GFMs, and the embedding alignment between few-shot nodes and test nodes is often poor in unseen domains, causing classification degradation.

GFMate Framework

GFMate introduces two key innovations: (1) pre-training-agnostic prompt designs, and (2) a test-time graph complementary learning (TGCL) objective for prompt optimization.

Pre-training-Agnostic Prompt Design

Instead of coupling prompts with the pre-training process, GFMate defines and learns all prompts strictly after pre-training:

Centroid Prompts: For each class, a randomly initialized centroid prompt is added to the class centroid computed from the few-shot examples, refining the centroid's position in the latent space so that it better represents the true class center in the target domain.
Layer Prompts: Multi-layer ensembling is performed by introducing layer prompts, which act as learnable coefficients determining the aggregation weight for each GNN layer, thus enabling adaptive exploitation of domain-specific hop-aggregation patterns.

This formulation is entirely agnostic to pre-training strategies, source domain choices, and backbone architectures, maximizing cross-domain and cross-model generalisability.
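The two prompt types can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the softmax normalisation of the layer prompts, the cosine-similarity classifier, and all function and variable names are assumptions made for illustration, and the centroid prompts are set to zero here (rather than randomly initialized and then tuned) so the toy example is deterministic.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def combine_layers(layer_embs, layer_prompt):
    """Weighted sum of per-layer node embeddings.

    layer_embs[l][i] is the embedding of node i at GNN layer l.
    layer_prompt holds one learnable scalar per layer; normalising it
    with a softmax is an assumption of this sketch.
    """
    weights = softmax(layer_prompt)
    num_nodes = len(layer_embs[0])
    dim = len(layer_embs[0][0])
    return [
        [sum(w * layer_embs[l][i][d] for l, w in enumerate(weights))
         for d in range(dim)]
        for i in range(num_nodes)
    ]

def prompted_centroids(embs, few_shot, centroid_prompt):
    """Mean few-shot embedding per class, shifted by that class's prompt.

    few_shot maps a class id to the indices of its labelled nodes.
    """
    dim = len(embs[0])
    centroids = {}
    for c, idxs in few_shot.items():
        mean = [sum(embs[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        centroids[c] = [m + p for m, p in zip(mean, centroid_prompt[c])]
    return centroids

def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def predict(emb, centroids):
    """Assign the class whose prompted centroid is most similar."""
    return max(centroids, key=lambda c: cosine(emb, centroids[c]))

# Toy data: 2 GNN layers, 4 nodes, 2-dimensional embeddings (all hypothetical).
layer_embs = [
    [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]],  # layer 1 output
    [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3], [0.3, 0.7]],  # layer 2 output
]
layer_prompt = [0.0, 0.0]                        # equal weights before tuning
centroid_prompt = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # zeros only for determinism
few_shot = {0: [0], 1: [1]}                      # one labelled node per class

combined = combine_layers(layer_embs, layer_prompt)
centroids = prompted_centroids(combined, few_shot, centroid_prompt)
predictions = [predict(combined[i], centroids) for i in (2, 3)]
```

In this sketch only `layer_prompt` and `centroid_prompt` would be tuned; the backbone that produced `layer_embs` stays frozen, which is what keeps the prompts independent of how that backbone was pre-trained.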

Test-Time Graph Complementary Learning (TGCL)

GFMate actively leverages both labelled and unlabelled nodes from the target domain via a novel complementary learning objective:

Complementary Labels: At test time, each unlabelled node is assigned a complementary label, i.e., a class it is predicted not to belong to. This is the class least similar to the node, taken from the prediction of the layer chosen by entropy-based layer selection.
TGCL Objective: The optimization minimizes a convex combination of the loss on labelled (few-shot) nodes and the loss on complementary-labelled test nodes, so that the prompts are optimized with respect to the entire test distribution. Theoretical analysis establishes an excess risk bound that depends on the number of classes and the size of the unlabelled set, making explicit the generalisability benefits of leveraging abundant test data.
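The two steps above can be sketched as follows. This is a hedged illustration rather than the paper's exact formulation: choosing the layer with the lowest prediction entropy, using `-log(1 - p)` as the complementary-label loss, and the mixing weight `lam` are all assumptions, as are the function names.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def complementary_label(per_layer_sims):
    """Pick a complementary label for one unlabelled node.

    per_layer_sims[l][c] is the node's similarity to class c at layer l.
    Assumption: the most confident (lowest-entropy) layer is selected,
    and the least similar class at that layer is the complementary label.
    """
    best = min(per_layer_sims, key=lambda sims: entropy(softmax(sims)))
    return min(range(len(best)), key=lambda c: best[c])

def cross_entropy(probs, label):
    """Standard loss for a node with a known true label."""
    return -math.log(max(probs[label], 1e-12))

def complementary_loss(probs, comp_label):
    """Penalise probability mass placed on the class the node is NOT in."""
    return -math.log(max(1.0 - probs[comp_label], 1e-12))

def tgcl_loss(labelled, unlabelled, lam):
    """Convex combination of supervised and complementary losses.

    labelled:   list of (probs, true_label) pairs for few-shot nodes.
    unlabelled: list of (probs, complementary_label) pairs for test nodes.
    lam:        weight in [0, 1] on the supervised term (hypothetical name).
    """
    sup = sum(cross_entropy(p, y) for p, y in labelled) / len(labelled)
    comp = sum(complementary_loss(p, y) for p, y in unlabelled) / len(unlabelled)
    return lam * sup + (1.0 - lam) * comp

# Toy example: layer 0 is nearly uniform, layer 1 is more peaked,
# so layer 1 is selected and its least similar class is returned.
sims = [[0.4, 0.3, 0.35], [0.9, 0.1, -0.5]]
comp = complementary_label(sims)

labelled = [([0.7, 0.2, 0.1], 0)]
unlabelled = [([0.5, 0.3, 0.2], comp)]
loss = tgcl_loss(labelled, unlabelled, lam=0.5)
```

In a real setting the probabilities would come from a differentiable model and the loss would be minimized with respect to the prompts only; the plain-Python version here just makes the structure of the objective explicit.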

Empirical Results

GFMate exhibits strong empirical performance across 12 benchmark datasets spanning social, citation, commercial, and biological networks, in both node- and graph-level classification. Key findings include:

Superior Accuracy: GFMate achieves up to a 30.63% accuracy improvement over state-of-the-art cross-domain GFM methods in one-shot settings. Performance gains are especially prominent under pronounced distribution shifts and in binary/few-class regimes, aligning with the theoretical generalization analysis.
Efficiency: The framework significantly reduces downstream adaptation time, GPU memory consumption, and the number of tunable parameters compared to prompt design paradigms requiring prompt parameterization per domain/sample or involving complex fine-tuning procedures.
Ablation and Robustness: All GFMate modules, including centroid prompt, layer prompt, and TGCL, are essential for peak empirical performance. The method demonstrates robustness to feature and structure noise, pre-training domain shift, and varying few-shot regimes.
Generalisability: GFMate can be plugged into any GNN-based GFM, regardless of the pre-training objective (e.g., link prediction, contrastive learning, deep graph infomax), and consistently boosts adaptation efficacy.

Theoretical Implications

According to the Rademacher complexity-based analysis, test-time complementary learning yields a tighter excess risk bound as the number of complementary-labelled samples increases or the number of classes decreases. Intuitively, a complementary label rules out one of K classes, so in the binary case a correct complementary label identifies the true class outright. GFMate can therefore benefit from all available test data, a theoretical advantage that is empirically supported by the stronger binary classification results and the ablation studies on test data usage.

Practical and Theoretical Impact

Practically, GFMate provides a lightweight, efficient, and general framework for GNN-based GFM adaptation in cross-domain scenarios. It removes dependence on domain similarity assumptions and custom pre-training strategies. Test-time adaptation is accomplished without intrusive re-training or test graph perturbation, making GFMate suitable for deployment atop a wide range of pre-trained generic GFM architectures. The framework's design is not applicable to LLM-based GFMs or text-attributed graphs, suggesting avenues for future work in prompt compatibility across heterogeneous GFM backbones.

Theoretically, GFMate leverages a test-time learning objective founded on robust risk minimization under label noise, extending the formalism of complementary-label learning to domain adaptation for graphs. This bridges the test-time training and prompt tuning paradigms, providing both theoretical generalization guarantees and empirical strategies for mitigating distribution shift.

Conclusion

GFMate establishes a new paradigm for GFM adaptation in cross-domain graph learning by introducing pre-training-agnostic test-time prompt tuning, actively leveraging unlabelled target domain data. It achieves significant efficiency and accuracy benefits over state-of-the-art methods and is universally applicable to GNN-based GFMs, independent of pre-training protocol. Future research directions entail adaptation to text-attributed and LLM-based GFMs, and extension to new types of graph tasks and architectures.
