- The paper introduces a pre-training-agnostic test-time prompt tuning framework for graph foundation models to boost cross-domain adaptation.
- It employs centroid and layer prompts alongside a novel test-time graph complementary learning objective to optimize performance on both labelled and unlabelled nodes.
- Empirical results demonstrate up to 30.63% accuracy improvement and reduced computational overhead, confirming GFMate’s robustness and efficiency across diverse datasets.
GFMate: Pre-training-Agnostic Test-Time Prompt Tuning for Graph Foundation Models
Introduction and Motivation
The paper "GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning" (2605.14809) presents a novel methodology for improving the adaptation and generalisability of Graph Foundation Models (GFMs) in cross-domain graph learning tasks. Existing GFM prompt-tuning approaches have significant limitations due to their reliance on pre-training-entangled prompts, which are closely coupled to source domain distributions and specific pre-training strategies. Consequently, these methods exhibit poor transferability to unseen target domains and architectures, and their adaptation mechanisms commonly ignore the information embedded in abundant unlabelled target domain samples. GFMate addresses these challenges by proposing a test-time prompt tuning framework that is explicitly pre-training-agnostic and designed to exploit both labelled and unlabelled target domain data for robust domain adaptation.
Limitations of Existing GFM Prompt Tuning Paradigms
Prevailing GFM prompt tuning frameworks jointly pre-train the backbone model and the prompt vectors on multiple source domains, then adapt to a target domain using only few-shot labelled examples. This paradigm has two fundamental limitations:
- Domain and Model Entanglement: Prompts are encoded with source-domain information, rendering them non-transferable when the target domain exhibits a divergent graph structure or feature distribution, or when a model is pre-trained via a different self-supervised objective or architecture.
- Limited Exploitation of Test Data: While few-shot labelled nodes are directly utilized for prompt fine-tuning, unlabelled target domain nodes serve only as contextual neighbors in the message-passing scheme and do not actively contribute to prompt optimization. This leads to poor modeling of the test distribution and sub-optimal adaptation under distribution shift.
The paper empirically demonstrates that hop-wise aggregation performance varies substantially across domains in pre-trained GFMs, and the embedding alignment between few-shot nodes and test nodes is often poor in unseen domains, causing classification degradation.
GFMate Framework
GFMate introduces two key innovations: (1) pre-training-agnostic prompt designs, and (2) a test-time graph complementary learning (TGCL) objective for prompt optimization.
Pre-training-Agnostic Prompt Design
Instead of coupling prompts with the pre-training process, GFMate defines and learns all prompts strictly post pre-training:
- Centroid Prompts: For each class, a randomly initialized centroid prompt is added to the class centroid computed from the few-shot labelled nodes, shifting it in the latent space so that it better represents the true class center in the target domain.
- Layer Prompts: Multi-layer ensembling is performed by introducing layer prompts, which act as learnable coefficients determining the aggregation weight for each GNN layer, thus enabling adaptive exploitation of domain-specific hop-aggregation patterns.
This formulation is entirely agnostic to pre-training strategies, source domain choices, and backbone architectures, maximizing cross-domain and cross-model generalisability.
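The two prompt types can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the softmax normalisation of the layer prompts and the cosine-similarity classifier are assumptions made for illustration. Each node is represented by one embedding per GNN layer, taken from a frozen pre-trained backbone.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def fuse_layers(layer_embeddings, layer_prompts):
    """Combine one node's per-layer embeddings into a single vector.

    layer_prompts holds one learnable scalar per GNN layer.
    Assumption: the scalars are softmax-normalised into aggregation weights.
    """
    weights = softmax(layer_prompts)
    dim = len(layer_embeddings[0])
    return [sum(w * h[d] for w, h in zip(weights, layer_embeddings))
            for d in range(dim)]

def init_centroid_prompts(classes, dim, scale=0.01, seed=0):
    """Randomly initialise one learnable prompt vector per class."""
    rng = random.Random(seed)
    return {c: [rng.gauss(0.0, scale) for _ in range(dim)] for c in classes}

def prompted_centroids(support, layer_prompts, centroid_prompts):
    """Few-shot class centroids shifted by their centroid prompts.

    support maps class id -> non-empty list of nodes, where each node is a
    list of per-layer embeddings.
    """
    centroids = {}
    for c, nodes in support.items():
        fused = [fuse_layers(node, layer_prompts) for node in nodes]
        base = [sum(col) / len(fused) for col in zip(*fused)]  # few-shot mean
        centroids[c] = [b + p for b, p in zip(base, centroid_prompts[c])]
    return centroids

def predict(node_layers, layer_prompts, centroids):
    """Assign the class whose prompted centroid is most similar (assumed cosine)."""
    z = fuse_layers(node_layers, layer_prompts)
    return max(centroids, key=lambda c: cosine(z, centroids[c]))
```

In this sketch the only tunable quantities are one vector per class and one scalar per layer, while the backbone stays frozen, which is consistent with the small parameter count the summary reports.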
Test-Time Graph Complementary Learning (TGCL)
GFMate actively leverages both labelled and unlabelled nodes from the target domain via a novel complementary learning objective:
- Complementary Labels: At test time, each unlabelled node is assigned a complementary label, i.e. the class it is predicted to be least similar to and therefore most likely does not belong to. The prediction used for this assignment comes from a layer chosen by entropy-based selection.
- TGCL Objective: The optimization minimizes a convex combination of the loss on labelled (few-shot) nodes and the loss on complementary-labelled test nodes, so the prompts are optimized against the entire test distribution rather than the few-shot nodes alone. Theoretical analysis establishes an excess risk bound that depends on the number of classes and the size of the unlabelled set, making explicit the generalisability benefits of leveraging abundant test data.
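A rough sketch of how such an objective could be computed is given below. The details are assumptions rather than the paper's exact formulation: the layer with the lowest prediction entropy is taken as the selected layer, the complementary term uses the common `-log(1 - p)` form from complementary-label learning, and `alpha` is a hypothetical mixing weight. All function names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def complementary_label(per_layer_sims):
    """Pick a complementary label for one unlabelled node.

    per_layer_sims: one list of class similarities per GNN layer.
    Assumption: the most confident (lowest-entropy) layer is selected,
    and its least similar class becomes the complementary label.
    """
    best = min(per_layer_sims, key=lambda sims: entropy(softmax(sims)))
    return min(range(len(best)), key=lambda c: best[c])

def tgcl_loss(labelled, complementary, alpha=0.5, eps=1e-12):
    """Convex combination of a supervised and a complementary loss.

    labelled:      non-empty list of (logits, true_class) pairs
    complementary: non-empty list of (logits, complementary_class) pairs
    alpha:         hypothetical mixing weight in [0, 1]
    """
    # Cross-entropy pulls probability towards the true class.
    ce = sum(-math.log(max(softmax(l)[y], eps)) for l, y in labelled)
    ce /= len(labelled)
    # Complementary term pushes probability away from the excluded class.
    cl = sum(-math.log(max(1.0 - softmax(l)[yb], eps)) for l, yb in complementary)
    cl /= len(complementary)
    return alpha * ce + (1.0 - alpha) * cl
```

The complementary term only says which class a node is not, so each unlabelled node carries a weak signal. The design relies on there being many such nodes, which matches the summary's point that the bound improves with the size of the unlabelled set.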
Empirical Results
GFMate exhibits strong empirical performance across 12 benchmark datasets spanning social, citation, commercial, and biological networks, in both node- and graph-level classification. Key findings include:
- Superior Accuracy: GFMate achieves up to a 30.63% accuracy improvement over state-of-the-art cross-domain GFM methods in one-shot settings. Performance gains are especially prominent under pronounced distribution shifts and in binary/few-class regimes, aligning with the theoretical generalization analysis.
- Efficiency: The framework significantly reduces downstream adaptation time, GPU memory consumption, and the number of tunable parameters compared to prompt design paradigms requiring prompt parameterization per domain/sample or involving complex fine-tuning procedures.
- Ablation and Robustness: All GFMate modules, including centroid prompt, layer prompt, and TGCL, are essential for peak empirical performance. The method demonstrates robustness to feature and structure noise, pre-training domain shift, and varying few-shot regimes.
- Generalisability: GFMate can be plugged into any GNN-based GFM, regardless of the pre-training objective (e.g., link prediction, contrastive learning, deep graph infomax), and consistently boosts adaptation efficacy.
Theoretical Implications
According to a Rademacher complexity-based analysis, test-time complementary learning yields a tighter excess risk bound as the number of complementary-labelled samples increases or the number of classes decreases. GFMate can therefore benefit from all available test data, not only the few-shot labelled nodes. This theoretical advantage is supported empirically by the stronger binary classification results and by the ablation studies on test data usage.
Practical and Theoretical Impact
Practically, GFMate provides a lightweight, efficient, and general framework for GNN-based GFM adaptation in cross-domain scenarios. It removes dependence on domain similarity assumptions and custom pre-training strategies. Test-time adaptation is accomplished without intrusive re-training or test graph perturbation, making GFMate suitable for deployment atop a wide range of pre-trained generic GFM architectures. The framework's design is not applicable to LLM-based GFMs or text-attributed graphs, suggesting avenues for future work in prompt compatibility across heterogeneous GFM backbones.
Theoretically, GFMate leverages a test-time learning objective founded on robust risk minimization under label noise, extending the formalism of complementary-label learning to domain adaptation for graphs. This bridges test-time training and prompt tuning paradigms, providing both theoretical generalization guarantees and empirical strategies mitigating distributional shift.
Conclusion
GFMate proposes a new paradigm for GFM adaptation in cross-domain graph learning: pre-training-agnostic test-time prompt tuning that actively leverages unlabelled target domain data. It achieves significant efficiency and accuracy benefits over state-of-the-art methods and applies to GNN-based GFMs regardless of pre-training protocol. Future research directions include adaptation to text-attributed and LLM-based GFMs, and extension to new types of graph tasks and architectures.