
Graph Prompt Tuning for Streaming Recommendation

Updated 25 November 2025
  • The paper introduces graph prompt tuning that injects learnable prompts into pre-trained GNNs to continuously adapt to streaming data while preserving model stability.
  • The paper presents a two-stage pipeline where a frozen GNN is coupled with dynamic prompt updates, achieving measurable improvements in Recall and NDCG metrics.
  • The paper demonstrates that multi-level prompt mechanisms (node-, structure-, view-, and temporal-level) effectively mitigate distribution shifts and reduce computational costs.

Graph prompt tuning for streaming recommendation refers to a paradigm in which learnable or lightweight prompts are injected into a pre-trained graph neural network (GNN) recommender system to enable rapid, continual adaptation to evolving user–item interaction data, while avoiding expensive full-model retraining. The approach addresses the fundamental challenges of distribution shift, data privacy constraints, and the need for both long-term preference modeling and short-term behavioral dynamics in real-world streaming recommendation environments. Notable frameworks such as GPT4Rec and GraphPro exemplify advanced strategies for prompt tuning in this context (Zhang et al., 12 Jun 2024, Yang et al., 2023).

1. Problem Formulation and Rationale

In streaming recommendation, user–item interactions arrive continuously or in mini-batch segments, represented as a sequence of evolving graphs $G = (G_1, \ldots, G_T)$, where each $G_t = (V_t, E_t, X_t, A_t)$ captures the set of nodes, edges, features, and adjacency at time $t$. The challenge is to maintain highly accurate, up-to-date recommendations as the graph evolves, without revisiting large-scale historical data or retraining all model parameters for each update.
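
This streaming setting can be sketched with a minimal data structure (the field names are our illustration, not an API from either paper): each segment holds only the newly arrived interactions, so building $G_t$ from $G_{t-1}$ never replays the full historical edge set.

```python
from dataclasses import dataclass

@dataclass
class GraphSnapshot:
    """One segment G_t = (V_t, E_t, X_t, A_t) of the interaction stream."""
    nodes: set        # V_t: user and item ids active at time t
    edges: list       # E_t: (user, item) interaction pairs for this segment
    features: dict    # X_t: node id -> feature vector
    timestamp: int    # t: segment index

def merge_delta(prev: GraphSnapshot, delta_edges: list, t: int) -> GraphSnapshot:
    """Build G_t from G_{t-1} plus newly arrived interactions.
    Only the new edges are kept as the tunable segment; historical
    edges are not replayed (the streaming constraint in the text)."""
    nodes = prev.nodes | {v for e in delta_edges for v in e}
    return GraphSnapshot(nodes, list(delta_edges), dict(prev.features), t)
```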

Traditional approaches—such as historical data replay, segment-wise fine-tuning, and model expansion—face limitations around computational cost, memory/privacy constraints, and susceptibility to catastrophic forgetting or over-stability. Graph prompt tuning circumvents these barriers by isolating adaptation to a compact set of prompts: these can be parameter-free (e.g., temporal edge weights (Yang et al., 2023)) or learnable (e.g., node/structure/view-level prompt embeddings (Zhang et al., 12 Jun 2024)), which modulate the behavior of a frozen or mostly-frozen GNN encoder.
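
The core idea of isolating adaptation to prompts can be illustrated with a toy numpy sketch (our simplification: a single frozen linear-plus-tanh "encoder" and one additive prompt vector, with a numerical gradient standing in for backpropagation). Only the prompt receives updates; the pre-trained weights are never touched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained encoder weights (stand-in for a pre-trained GNN layer).
W_frozen = rng.normal(size=(4, 4))

# Compact learnable prompt: a single vector fused additively with node features.
prompt = np.zeros(4)

def encode(x, p):
    """The prompt modulates the input of the frozen encoder."""
    return np.tanh(W_frozen @ (x + p))

def prompt_step(x, target, lr=0.05, eps=1e-4):
    """One gradient update on the prompt ONLY; W_frozen stays untouched.
    Central finite differences stand in for autodiff in this sketch."""
    global prompt
    def loss(p):
        return np.sum((encode(x, p) - target) ** 2)
    grad = np.array([
        (loss(prompt + eps * np.eye(4)[i]) - loss(prompt - eps * np.eye(4)[i])) / (2 * eps)
        for i in range(4)
    ])
    prompt = prompt - lr * grad
    return loss(prompt)
```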

2. Multi-level Graph Prompt Mechanisms

State-of-the-art approaches employ several prompt types, hierarchically organized to capture fine-grained to global patterns:

  • Node-level prompts: For each disentangled view of the graph, a pool of learnable prompt vectors is maintained. Each node attends to these prompts, yielding an individualized prompt injection that fuses with its feature representation. This enables local adaptation to evolving user/item attributes.
  • Structure-level prompts: Prompts at this level encode edge or subgraph connectivity patterns. Each edge aggregates a prompt token via attention over prompt pools, influencing message passing by reweighting neighbor information according to dynamic structural signals.
  • View-level prompts: After node and structure-level encoding, multiple disentangled graph views are aggregated using cross-view prompts and attention, ensuring that the final node embedding captures diverse interaction semantics (e.g., co-purchase versus co-view) (Zhang et al., 12 Jun 2024).
  • Temporal prompts: Parameter-free time-based weights are injected at the aggregation stage (e.g., $\alpha_{u,v}$ in GraphPro), giving more influence to recent interactions. This is operationalized via softmax-weighted recency factors in message passing (Yang et al., 2023).
  • Graph-structural prompts: Supplemental edges or subgraphs constructed from new streaming data are injected as prompt graphs; these are sampled, decayed, or emphasized based on their recency or structural features. Only lightweight gating or prompt modules are updated, while the GNN’s core parameters remain frozen (Yang et al., 2023).
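
The node-level mechanism above can be sketched as attention over a prompt pool (a minimal illustration; additive fusion and dot-product scoring are our assumptions, as the papers may use concatenation or gating instead):

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # numerical stability
    e = np.exp(z)
    return e / e.sum()

def node_prompt_injection(x, prompt_pool):
    """Node-level prompting sketch: the node embedding x attends over a
    pool of learnable prompt vectors; the attended prompt is fused with x.

    x:           (d,)   node feature/embedding
    prompt_pool: (P, d) learnable prompt vectors
    """
    scores = prompt_pool @ x      # (P,) dot-product attention logits
    attn = softmax(scores)        # distribution over the prompt pool
    p = attn @ prompt_pool        # individualized prompt for this node
    return x + p                  # additive fusion (assumed)
```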

3. Model Architectures and Optimization

Graph prompt tuning systems, such as GPT4Rec and GraphPro, follow a two-stage pipeline:

  1. Pre-training: The base GNN is trained over historical or batched data to capture stable long-term user–item affinities. For GraphPro, this is mixed with temporal prompts to encode time-awareness into every aggregation step, using a normalized recency-based attention coefficient $\alpha_{u,v}$.
  2. Prompted streaming adaptation: As new interactions arrive ($\Delta G_t$), only the prompts (node, structure, and view-level, or prompt-related gating parameters) are updated. The base GNN weights $\theta$ remain frozen, ensuring knowledge preservation. For each graph snapshot, embeddings are computed by fusing node features with node-level prompts, performing multi-view GNN propagation augmented by structure prompts, and pooling via view-level prompts.
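
The second stage reduces to a simple control loop, sketched below with abstract callables (`gnn_frozen` and `tune_prompts` are placeholders for the frozen encoder and the prompt-only optimizer, not functions from either paper):

```python
def streaming_adaptation(snapshots, gnn_frozen, prompts, tune_prompts):
    """Prompted streaming adaptation sketch: for each newly arrived
    segment, update only the prompts against the frozen GNN, then
    produce embeddings for serving with the updated prompts.

    snapshots:    iterable of new-interaction segments (Delta G_t)
    gnn_frozen:   callable(segment, prompts) -> embeddings; weights fixed
    prompts:      current prompt state (the only trainable part)
    tune_prompts: callable(gnn_frozen, prompts, segment) -> new prompts
    """
    embeddings = []
    for delta in snapshots:
        prompts = tune_prompts(gnn_frozen, prompts, delta)  # prompt-only update
        embeddings.append(gnn_frozen(delta, prompts))       # frozen weights reused
    return prompts, embeddings
```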

The training objective is typically a Bayesian Personalized Ranking (BPR) loss over newly arrived interactions, optimizing only the prompt parameters by gradient descent. Theoretical analysis demonstrates that prompt tuning achieves at least the expressivity of global fine-tuning, as prompt embeddings can bridge the predictive gap between full retraining and prompt-only adaptation (Zhang et al., 12 Jun 2024).
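
The BPR objective over a batch of (user, positive item, negative item) triples can be written directly; the sketch below scores pairs by dot product and uses the softplus identity $-\log\sigma(x) = \log(1 + e^{-x})$:

```python
import numpy as np

def bpr_loss(user_emb, pos_emb, neg_emb):
    """Bayesian Personalized Ranking loss for a batch of triples.

    Each argument is an (N, d) array; the loss encourages the positive
    item's score to exceed the negative item's score for each user:
        L = mean over batch of -log sigmoid(s_pos - s_neg).
    """
    s_pos = np.sum(user_emb * pos_emb, axis=1)   # positive-pair scores
    s_neg = np.sum(user_emb * neg_emb, axis=1)   # negative-pair scores
    # -log sigmoid(x) == log(1 + exp(-x)), computed stably via log1p
    return float(np.mean(np.log1p(np.exp(-(s_pos - s_neg)))))
```

In prompt tuning, this loss is minimized over the prompt parameters only, with gradients blocked from reaching the frozen GNN weights.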

4. Dynamic Evaluation Protocols and Empirical Results

Evaluation protocols are tailored to streaming dynamics. Datasets are partitioned into temporal snapshots (e.g., by day or week). Pre-training occurs on early segments; subsequent segments are processed sequentially, each yielding a fast prompt-tuned update and evaluation on held-out interactions from the current segment. Key metrics include Recall@20 and NDCG@20, reported per segment and averaged across the stream.
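
The per-segment metrics have standard definitions, sketched here for a single user's ranked list (the aggregation over users and segments then follows the protocol above):

```python
import numpy as np

def recall_at_k(ranked_items, relevant, k=20):
    """Fraction of the user's held-out (relevant) items that appear
    in the top-k of the ranked recommendation list."""
    if not relevant:
        return 0.0
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k=20):
    """Normalized Discounted Cumulative Gain with binary relevance:
    hits higher in the list earn larger log-discounted gains,
    normalized by the ideal (all hits first) ordering."""
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```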

The following table presents benchmark results of GPT4Rec with MGCCF backbones (Zhang et al., 12 Jun 2024):

| Dataset | Best Baseline (R@20, N@20) | GPT4Rec (R@20, N@20) | Improvement |
|---|---|---|---|
| Taobao2014 | 0.1082, 0.0142 (DEGC) | 0.1127, 0.0149 | +4.16%, +4.93% |
| Taobao2015 | 0.4892, 0.0167 (DEGC) | 0.5018, 0.0172 | +2.58%, +2.99% |
| Netflix | 0.3470, 0.0583 (GraphSAIL) | 0.3508, 0.0589 | +1.10%, +1.03% |
| Foursquare | 0.1425, 0.0178 (DEGC) | 0.1477, 0.0185 | +3.65%, +3.93% |

GraphPro achieves 10%–20% higher Recall and NDCG than prompt tuning baselines (e.g., GraphPrompt, GPF) and 5%–15% above dynamic GNNs (EvolveGCN-H/O, ROLAND) (Yang et al., 2023). With various backbones (LightGCN, SGL, MixGCF, SimGCL), further improvements of 3%–7% are observed, demonstrating model-agnostic adaptability. Efficiency is also enhanced—fine-tuning per segment requires only 4–5 epochs compared to 8–10 for previous prompt approaches, and matches or exceeds full retraining accuracy at 20–80× speedup.

5. Ablation Studies and Theoretical Guarantees

Ablation experiments across both GPT4Rec and GraphPro demonstrate that each prompt mechanism contributes substantially to accuracy and convergence:

  • Removing node/structure/view prompts (GPT4Rec) results in marked drops in Recall@20 and NDCG@20, confirming the necessity of multi-level prompting.
  • Disabling temporal prompts (GraphPro) leads to a 3%–6% NDCG decrease; omitting structural prompts (i.e., prompt graph edges) reduces performance by 4%–8% and slows learning.
  • Excluding adaptive gating extends convergence time by 50% and reduces accuracy by 2%–4% (Yang et al., 2023).

From a theoretical perspective, prompt tuning is shown to have at least the representation power of full-parameter fine-tuning. Specifically, it can be proven that for a suitably expressive prompt parameterization, tuning only prompts is sufficient to close the objective gap to global fine-tuning (Zhang et al., 12 Jun 2024). This suggests that under adequate model and prompt capacity, prompt-tuned streaming GNNs can match (and sometimes exceed) accuracy of fully retrained models.
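
The intuition behind this guarantee can be sketched in a simplified linear setting (our illustration, not the paper's proof): for a frozen linear encoder $h_v = \theta_0 x_v$ with globally fine-tuned weights $\theta^{*}$, an additive input prompt can absorb the entire weight update whenever $\theta_0$ has full row rank, with $\theta_0^{+}$ denoting its pseudoinverse:

```latex
p_v = \theta_0^{+}\,(\theta^{*} - \theta_0)\,x_v
\quad\Longrightarrow\quad
\theta_0\,(x_v + p_v)
  = \theta_0 x_v + \theta_0\theta_0^{+}(\theta^{*} - \theta_0)\,x_v
  = \theta^{*} x_v .
```

That is, a sufficiently expressive prompt parameterization can reproduce the output of the fully fine-tuned encoder without touching $\theta_0$, which is the sense in which prompt tuning matches the expressivity of global fine-tuning in this toy case.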

6. Robustness, Efficiency, and Limitations

Prompt tuning demonstrates robust performance across distinct GNN backbones (NGCF, LightGCN, MGCCF), with improvements >1% over the best baselines on multiple datasets (Zhang et al., 12 Jun 2024). Only prompt parameters are updated, yielding streaming adaptation at low computational and storage cost—even under large-scale deployments (e.g., 10+ million users, as in GraphPro).

However, performance remains dependent on the underlying backbone GNN’s pre-training quality. Hyperparameters governing prompt pool sizes and number of disentangled views require careful tuning; beyond moderate values (e.g., prompt pool sizes 32–64 and number of views $K \approx 4$–$8$), returns diminish. High-velocity or low-latency streaming scenarios may necessitate further methods, such as dynamic prompt pruning. While GraphPro employs temporal prompts, GPT4Rec identifies the integration of temporal-encoding prompts as a direction for further research (Zhang et al., 12 Jun 2024).

7. Outlook and Research Directions

The emergence of graph prompt tuning has established a new paradigm for streaming recommendation, balancing efficiency, expressivity, and resistance to catastrophic forgetting. Notable strengths include:

  • Data-agnostic prompt tuning, decoupled from raw data replay and compliant with privacy constraints.
  • Isolation of adaptation, preserving core GNN knowledge and ameliorating over-stability and forgetting.
  • Multi-level and multi-view prompt design, synthesizing global and local user–item distribution shifts into the model's representations.

Current limitations motivate several directions: generalization to faster streams, automated prompt pool management, enhancing temporal prompt mechanisms, and further backbone pre-training. These avenues are poised to drive future advances in continual, privacy-aware, and robust graph-based recommendation systems (Zhang et al., 12 Jun 2024, Yang et al., 2023).
