ReaL-TG: Reasoning-Enhanced Learning for Temporal Graphs

Updated 7 September 2025
  • The paper introduces a novel QA framework that reformulates temporal link prediction to produce predictions with detailed reasoning traces via RL fine-tuning.
  • ReaL-TG employs an α-temporal random walk to construct context subgraphs, promoting efficient prediction and seamless transferability to new graphs.
  • Prediction performance is measured using metrics like MRR/pMRR, while explanation quality is verified through LLM-based and human evaluations.

Reasoning-Enhanced Learning for Temporal Graphs (ReaL-TG) is an advanced paradigm targeting explainable and accurate link forecasting on temporal graphs by leveraging LLMs fine-tuned through reinforcement learning. It reframes temporal link prediction as a question answering (QA) problem, enabling explicit generation of natural language reasoning traces alongside predictions. The core motivation is to combine efficient prediction, transferability to new graphs, and human-interpretable explanations, overcoming the limitations of both neural temporal graph models (which often lack explainability and adaptability) and earlier LLM-based methods (which seldom operate on real-world temporal graphs (TGs) and neglect reasoning trace evaluation) (Ding et al., 31 Aug 2025).

1. Problem Setting and Core Framework

The ReaL-TG framework addresses temporal link forecasting: given a temporal graph comprising nodes (entities), time-stamped edges (interactions), and a future prediction query (e.g., “Which nodes will $u_q$ connect to at $t_q$?”), the objective is to predict the relevant destination nodes, with explanations, based solely on the observed historical structure and timestamps.

The workflow consists of:

  • Temporal Context Graph Selection (T-CGS): Construction of a task-specific context subgraph using an $\alpha$-temporal random walk with a decay factor $\beta$, which biases sampling toward recent interactions. This subgraph acts as the reasoning context for the LLM (a minimal sampling sketch appears at the end of this section).
  • Prompt Construction: The selected subgraph’s edges are verbalized into textual descriptions and combined with a QA-style query. The prompt format includes sections specifically for the model’s reasoning trace (delimited by <think> ... </think>) and the final answer (<answer> ... </answer>).

  • Self-Explaining Inference: The LLM, fine-tuned by RL, outputs both its prediction (target nodes) and a human-readable justification describing the causal or heuristic rationale based on the context subgraph.

A typical prompt is designed as:

    "Given the following graph (links, timestamps): ... <think> Reasoning about which nodes will ... at $t_q$ ... </think> <answer> ... </answer>"

The overall process enables transfer to previously unseen graphs and direct evaluation of explanation quality, facilitating broader evaluation and practical deployment.
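
To make the T-CGS step concrete, the following is a minimal sketch of recency-biased temporal neighborhood sampling, assuming a simple edge-list representation and an exponential decay weighting controlled by a decay factor `beta`. The function name, the exact decay form, and the stopping criterion are illustrative assumptions rather than the paper's exact algorithm.

```python
import math
import random
from collections import defaultdict

# Illustrative sketch only: the paper's alpha-temporal random walk may differ
# in its exact transition probabilities and termination rule.

def sample_context_subgraph(edges, u_q, t_q, beta=0.1, num_walks=20, walk_len=4):
    """Sample a recency-biased context subgraph around query node u_q at time t_q.

    edges: iterable of (src, dst, timestamp) with timestamp < t_q
    beta:  decay factor; larger beta puts more weight on recent interactions
    """
    # Index historical edges by node, keeping only those before the query time.
    adj = defaultdict(list)
    for src, dst, ts in edges:
        if ts < t_q:
            adj[src].append((dst, ts))
            adj[dst].append((src, ts))  # treat the graph as undirected for sampling

    sampled = set()
    for _ in range(num_walks):
        node, t_cur = u_q, t_q
        for _ in range(walk_len):
            candidates = [(nbr, ts) for nbr, ts in adj[node] if ts < t_cur]
            if not candidates:
                break
            # Exponential recency bias: more recent edges get larger sampling weight.
            weights = [math.exp(-beta * (t_cur - ts)) for _, ts in candidates]
            nbr, ts = random.choices(candidates, weights=weights, k=1)[0]
            sampled.add((node, nbr, ts))
            node, t_cur = nbr, ts  # continue the walk backwards in time
    return sampled
```

Each sampled edge would then be verbalized into a sentence such as "node 12 interacted with node 7 at time 103" and concatenated into the QA prompt shown above.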

2. Reinforcement Learning Fine-Tuning

Fine-tuning is carried out under a reinforcement learning (RL) framework, where the LLM is prompted to maximize a reward signal reflecting both prediction performance and explanation quality. Key aspects include:

  • Outcome-Based Reward: The scalar reward is the F1 score between the predicted node set and ground-truth destination nodes for each query, ensuring a balance of precision and recall in predictions.

  • Group Relative Policy Optimization (GRPO): Each full rollout (the concatenated explanation and answer) is used to compute a per-token advantage:

$$\text{Adv}_{i,j} = \frac{r(O_i) - \mu\left(\{r(O_i)\}_{i=1}^{g}\right)}{\sigma\left(\{r(O_i)\}_{i=1}^{g}\right)}$$

where $O_i$ denotes the $i$-th rollout in a group of $g$ rollouts sampled for the same query. The RL update objective includes a KL-regularization term:

$$\mathcal{J}_{\mathrm{GRPO}}(\theta) = \mathbb{E}_{\mathcal{Q} \sim P(\mathcal{Q}),\, \{O_i\} \sim \pi_{\theta_{\mathrm{old}}}} \left[\, \ldots \,\right] - \gamma\, D_{\mathrm{KL}}\left(\pi_{\theta} \,\|\, \pi_{\mathrm{ref}}\right)$$

The group-relative normalization encourages exploration, while the KL term prevents divergence from the pre-trained base LLM (a minimal sketch of the reward and advantage computation follows this list).

  • Self-Exploration of Reasoning Strategies: RL incentivizes the LLM to discover, via trial and error, both effective solution strategies (e.g., recency heuristics, high-degree bias, pattern detection) and clear, faithful rationales.
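
As a concrete illustration of the outcome reward and the group-relative advantage described above, here is a minimal sketch assuming set-valued predictions and scalar per-rollout rewards. The function names and the epsilon added for numerical stability are assumptions, not details from the paper.

```python
from statistics import mean, pstdev

def f1_reward(predicted, ground_truth):
    """Scalar outcome reward: F1 between predicted and true destination node sets."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    if not predicted or not ground_truth:
        return 0.0
    tp = len(predicted & ground_truth)
    precision = tp / len(predicted)
    recall = tp / len(ground_truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def group_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its group (GRPO-style).

    Every token of rollout i receives the same advantage value.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: a group of g = 4 rollouts for one query with ground truth {n7, n12}
rewards = [f1_reward(p, {"n7", "n12"}) for p in
           [{"n7"}, {"n7", "n12"}, {"n3"}, {"n7", "n12", "n5"}]]
advantages = group_advantages(rewards)
```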

3. Explainability and Reasoning Traces

ReaL-TG makes the forecasting process directly interpretable (a small output-parsing sketch follows the list below):

  • Each output contains (A) a reasoning trace which justifies the prediction with reference to the temporal context and (B) the predicted answer set.

  • The LLM is prompted to explicitly describe its “chain of reasoning,” such as recent interaction frequency, temporal patterns, or inferred causality.

  • This approach yields not only high accuracy but also transparent, auditable predictions—a property considered essential in decision support, security, and scientific domains.
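
Since the model's completion interleaves a reasoning trace and an answer inside <think> and <answer> tags, a small parser like the following can separate the two for scoring and auditing. This is an illustrative sketch; the answer-list format (comma-separated node identifiers) is an assumption.

```python
import re

def parse_output(text):
    """Split a ReaL-TG-style completion into (reasoning_trace, predicted_nodes)."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    # Assumed answer format: comma-separated node identifiers.
    nodes = []
    if answer:
        nodes = [tok.strip() for tok in answer.group(1).split(",") if tok.strip()]
    return reasoning, nodes

# Example
out = ("<think>Node n7 interacted with u_q most recently, "
       "so it is likely to recur.</think><answer>n7, n12</answer>")
trace, prediction = parse_output(out)
```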

4. Evaluation Protocol: Ranking and Explanation Metrics

ReaL-TG introduces a unified protocol to evaluate both the “what” (prediction) and the “why” (reasoning trace):

  • Ranking Metrics: Mean Reciprocal Rank (MRR) for prediction accuracy, and a “penalized MRR” (pMRR) that assigns an extra penalty (e.g., rank 1.1 for each spurious node) to outputs that over-generate candidates (a small MRR sketch follows this list).

  • Quality of Explanation: Reasoning traces are assessed using an LLM-as-a-Judge protocol (e.g., GPT-4.1 mini), scoring:

    • Faithfulness to the context (does the explanation accurately reference the provided subgraph?)
    • Logical Consistency (is the reasoning coherent and non-contradictory?)
    • Alignment between explanation and answer (does the stated logic actually support the prediction?)
  • Human Evaluation: Selected outputs are also reviewed for validity, consistency, and absence of hallucination.
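
For reference, a minimal MRR computation over a ranked prediction list is sketched below; the penalized variant (pMRR) additionally penalizes spurious predictions as described above, and its exact formulation follows the paper rather than this sketch.

```python
def reciprocal_rank(ranked_predictions, ground_truth):
    """Reciprocal rank of the first correct node in a ranked prediction list."""
    for rank, node in enumerate(ranked_predictions, start=1):
        if node in ground_truth:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """MRR over (ranked_predictions, ground_truth_set) pairs.

    Note: the paper's pMRR further penalizes over-generated (spurious) nodes,
    e.g. via an effective rank of 1.1 per spurious prediction; that adjustment
    is not reproduced here.
    """
    if not queries:
        return 0.0
    return sum(reciprocal_rank(p, gt) for p, gt in queries) / len(queries)

# Example
queries = [(["n12", "n7", "n3"], {"n7"}), (["n5"], {"n9"})]
print(mean_reciprocal_rank(queries))  # (1/2 + 0) / 2 = 0.25
```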

Performance results show that RL-fine-tuned ReaL-TG-4B (Qwen3-4B base) outperforms much larger frontier LLMs (such as GPT-5 mini, Gemma, and Llama 3.3-70B) in both MRR/pMRR and explanation quality on both previously seen and unseen real-world TGs. However, using a stronger base model can further improve logical consistency and justification in reasoning traces (Ding et al., 31 Aug 2025).

5. Implications, Advantages, and Limitations

The ReaL-TG approach offers several concrete innovations:

  • Cross-Graph Generalization: Unlike temporal graph neural networks (TGNNs), which require retraining for each new graph, the LLM-based QA formulation immediately transfers across graphs.
  • Human-Readable Explanations: Directly produces natural language explanations, supporting trust as well as downstream auditing or decision-making.
  • Efficiency: Single-pass inference per query (versus per-candidate for traditional link prediction).
  • Research Utility: The combined protocol supports systematic comparative evaluation of both prediction skill and interpretability—highlighting areas (e.g., logical consistency in explanations) needing further progress.

Potential limitations include context window size (restricting the subgraph size that can be modeled at once), and residual hallucinations or inconsistencies in model-generated explanations. Incorporating larger LLMs, improved subgraph selection, and advanced prompt engineering may mitigate such issues.

6. Future Directions

Advances suggested include:

  • Scaling to More Powerful LLMs: Applying the framework to models beyond Qwen3-4B, with richer language understanding and reasoning capacity, is expected to further improve both predictive and explanatory quality.
  • Enhanced Context Encoding: Adaptive context selection or hierarchical graph compression could allow handling of even larger, more complex temporal graphs.
  • Hallucination Mitigation: Incorporating stricter checking or adversarial training to penalize unsupported statements within explanations.
  • Expansion to Downstream Tasks: Beyond link prediction, ReaL-TG principles may inform explainable node classification, event forecasting with justification, and more.

This line of research demonstrates that RL-fine-tuned LLMs, equipped with outcome-aligned self-exploration and appropriate prompt structures, can yield both state-of-the-art temporal link prediction and human-interpretable, verifiable reasoning on real-world TGs, paving the way for transparent and deployable reasoning-augmented temporal graph learning (Ding et al., 31 Aug 2025).

References (1)