ReaL-TG-4B: Explainable Temporal Graph Forecasting

Updated 7 September 2025
  • The paper introduces ReaL-TG-4B, a model fine-tuned with reinforcement learning to deliver accurate link predictions and explicit reasoning traces on temporal graphs.
  • ReaL-TG-4B is a fine-tuned language model that encodes temporal graphs via natural language, employing the T-CGS algorithm to prioritize recent interactions.
  • The model achieves state-of-the-art performance with transparent, human-readable explanations, making it well suited to applications such as fraud detection and recommendation systems.

ReaL-TG-4B is an LLM fine-tuned for explainable link forecasting on temporal graphs, leveraging a reinforcement learning framework to optimize both predictive accuracy and the quality of model-generated reasoning traces. Developed by applying the Reasoning-Enhanced Learning for Temporal Graphs (ReaL-TG) algorithm to the Qwen3-4B base, ReaL-TG-4B produces competitive predictions with explicit, human-readable rationales, surpassing much larger state-of-the-art LLMs on standard temporal graph benchmarks while offering strong interpretability (Ding et al., 31 Aug 2025).

1. Architecture and Graph Encoding

ReaL-TG-4B is constructed by fine-tuning the Qwen3-4B pretrained model to operate on temporal graph link forecasting tasks. Its input prompt is generated from a “temporal context graph” ($\mathcal{G}_c$), extracted via the Temporal Context Graph Selection (T-CGS) algorithm. T-CGS performs an $\alpha$-temporal random walk over the original temporal graph, employing a decay factor $\beta$ to prioritize recent or relevant interactions for the link prediction query.
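
The sketch below illustrates one way a decayed temporal random walk of this kind could be implemented. The data layout, parameter names, and sampling scheme are illustrative assumptions and do not reproduce the exact T-CGS procedure.

```python
import math
import random
from collections import defaultdict

def select_temporal_context(edges, query_node, query_time, beta=0.1,
                            num_walks=20, walk_len=4):
    """Illustrative decayed temporal random walk (not the exact T-CGS algorithm).

    edges: list of (src, dst, timestamp) triples.
    Returns the set of visited edges, i.e. a small temporal context subgraph.
    """
    # Index outgoing interactions per node, keeping only those before the query time.
    adjacency = defaultdict(list)
    for src, dst, ts in edges:
        if ts < query_time:
            adjacency[src].append((dst, ts))

    context = set()
    for _ in range(num_walks):
        node, time = query_node, query_time
        for _ in range(walk_len):
            candidates = [(dst, ts) for dst, ts in adjacency[node] if ts < time]
            if not candidates:
                break
            # Exponential decay: more recent interactions get higher sampling weight.
            weights = [math.exp(-beta * (time - ts)) for _, ts in candidates]
            dst, ts = random.choices(candidates, weights=weights, k=1)[0]
            context.add((node, dst, ts))
            node, time = dst, ts
    return context
```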

The model encodes the induced subgraph as text, verbalizing nodes, edges, and time-stamped interactions together with a natural language question. Prompts systematically instruct the model to output its deductive process within dedicated reasoning tags and its formal answer within <answer>...</answer> tags, explicitly decoupling reasoning from prediction and exposing the model’s internal rationale.
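
A minimal sketch of how such a verbalized prompt could be assembled from a selected context graph follows; the phrasing, tag layout, and function name are assumptions based on the description above, not the paper's actual template.

```python
def build_prompt(context_edges, query_node, query_time):
    """Verbalize a temporal context graph and a link-forecasting question.

    The wording and tags below mirror the description in the text but are
    illustrative, not the paper's exact prompt template.
    """
    lines = ["You are given a sequence of time-stamped interactions:"]
    for src, dst, ts in sorted(context_edges, key=lambda e: e[2]):
        lines.append(f"- At time {ts}, node {src} interacted with node {dst}.")
    lines.append(
        f"Question: which node(s) will node {query_node} interact with at time {query_time}?"
    )
    lines.append(
        "First write your reasoning inside the reasoning tags, "
        "then give the final node(s) inside <answer>...</answer>."
    )
    return "\n".join(lines)
```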

2. Reinforcement Learning Optimization

ReaL-TG-4B is trained under a reinforcement learning paradigm, where the model acts as a “policy” generating full textual outputs (i.e., both prediction and explicit reasoning trace). The reward function for each rollout is based on the outcome—specifically, the F1 score between the set of nodes predicted in the <answer>...</answer> segments and the set of ground-truth destination nodes:

$$r(O) = \mathrm{F1}\left(\{a_{\text{answer}}\}, \{v_q\}\right)$$
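
A minimal sketch of this outcome reward, assuming the predicted nodes are listed comma-separated inside the <answer> span (the parsing convention is an assumption):

```python
import re

def outcome_reward(output_text, ground_truth_nodes):
    """F1 between nodes extracted from <answer>...</answer> and the true destinations."""
    match = re.search(r"<answer>(.*?)</answer>", output_text, flags=re.DOTALL)
    if not match:
        return 0.0  # malformed output earns no reward
    predicted = {tok.strip() for tok in match.group(1).split(",") if tok.strip()}
    truth = set(ground_truth_nodes)
    if not predicted or not truth:
        return 0.0
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```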

The optimization employs Group Relative Policy Optimization (GRPO), performing update steps at the token level within response groups. The per-token advantage for sample $i$ at token $j$ is:

$$\operatorname{Adv}_{i,j} = \frac{r(O_i) - \mu(\{r(O_i)\})}{\sigma(\{r(O_i)\})}$$
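
In code, this group-relative normalization standardizes each rollout's reward against the statistics of its group and broadcasts the resulting scalar over that rollout's tokens; a minimal sketch:

```python
import torch

def group_advantages(rewards, eps=1e-6):
    """rewards: tensor of shape (g,) holding r(O_i) for one group of rollouts.

    Returns a (g,) tensor of advantages; each value is broadcast over the
    tokens of the corresponding response during the policy update.
    """
    mean = rewards.mean()
    std = rewards.std()
    return (rewards - mean) / (std + eps)
```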

A Kullback-Leibler divergence penalty is included to regularize model updates and prevent deviation from the original LLM priors. The full GRPO objective is given as:

$$\mathcal{J}_{\text{GRPO}}(\theta) = \mathbb{E}_{\mathcal{Q} \sim P(\mathcal{Q}),\, \{O_i\} \sim \pi_{\theta_{\text{old}}}} \left[ \frac{1}{g} \sum_{i=1}^{g} (\ldots) - \gamma\, D_{\text{KL}}\!\left(\pi_\theta \,\Vert\, \pi_{\text{ref}}\right) \right]$$
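
The elided term inside the sum is the usual importance-weighted surrogate. The sketch below assumes a PPO-style clipped surrogate and a simple per-token KL estimate against the frozen reference model; it illustrates the objective's structure rather than the paper's exact formulation.

```python
import torch

def grpo_loss(logp_new, logp_old, logp_ref, advantages, mask,
              clip_eps=0.2, kl_coef=0.01):
    """Sketch of a GRPO-style objective, returned as a loss to minimize.

    logp_new, logp_old, logp_ref: per-token log-probabilities, shape (g, T)
    advantages: per-rollout advantages, shape (g,), broadcast over tokens
    mask: (g, T) tensor marking valid response tokens
    """
    adv = advantages.unsqueeze(-1)                 # (g, 1), broadcast over tokens
    ratio = torch.exp(logp_new - logp_old)         # importance weights
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    surrogate = torch.minimum(ratio * adv, clipped * adv)
    # Simple per-token KL estimate against the frozen reference policy.
    kl = logp_new - logp_ref
    per_token = surrogate - kl_coef * kl
    # Average over valid tokens and rollouts; negate to turn the objective into a loss.
    return -(per_token * mask).sum() / mask.sum().clamp(min=1)
```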

Since the reward is computed exclusively via outcome (not stepwise or via “teacher-forcing”), ReaL-TG-4B is incentivized to self-explore various reasoning strategies, converging toward those that maximize prediction accuracy and explanation quality.

3. Explainability and Evaluation of Reasoning Traces

A primary innovation is the explicit prompting and evaluation of reasoning traces. ReaL-TG-4B is compelled to articulate logical rationales supporting its predictions as part of its generated output. This design “opens the black box” of LLM-generated graph reasoning by coupling answer generation with mandatory explanation.

The model’s explanations are systematically evaluated by dedicated metrics:

  • Faithfulness ($\delta_f$): The proportion of atomic claims supported by the input context.
  • Logical Consistency ($\delta_{lc}$): A normalized score (0–1) measuring coherence and logical progression within the reasoning.
  • Answer–Explanation Alignment ($\delta_a$): The ratio of predictions explicitly justified by preceding reasoning.

An LLM-as-a-Judge evaluation system (GPT-4.1 mini in reported experiments) is configured with a custom prompt template, automatically scoring each output along these criteria on a per-example basis and aggregating results over the test set.
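
A sketch of how such a judging loop might be orchestrated; the prompt wording, the expected JSON schema, and the helper names (llm_call, judge_example) are illustrative assumptions rather than the paper's actual template.

```python
import json

def judge_example(llm_call, context, reasoning, answer):
    """Ask a judge LLM to score one output along the three criteria.

    `llm_call` is any function mapping a prompt string to a response string;
    the prompt and the expected JSON keys below are illustrative.
    """
    prompt = (
        "You are grading an explanation for a temporal-graph forecast.\n"
        f"Context:\n{context}\n\nReasoning trace:\n{reasoning}\n\nAnswer:\n{answer}\n\n"
        "Return JSON with fields 'faithfulness', 'logical_consistency', "
        "'answer_alignment', each a number between 0 and 1."
    )
    scores = json.loads(llm_call(prompt))
    return scores["faithfulness"], scores["logical_consistency"], scores["answer_alignment"]

def aggregate(scored_examples):
    """Average per-example (faithfulness, consistency, alignment) tuples over the test set."""
    n = len(scored_examples)
    delta_f = sum(s[0] for s in scored_examples) / n
    delta_lc = sum(s[1] for s in scored_examples) / n
    delta_a = sum(s[2] for s in scored_examples) / n
    return {"delta_f": delta_f, "delta_lc": delta_lc, "delta_a": delta_a}
```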

4. Evaluation Methodology and Benchmark Results

Performance is benchmarked in two major axes: prediction quality and explanation quality.

  • Link forecasting performance is measured with Mean Reciprocal Rank (MRR) and a penalized variant, pMRR, which additionally penalizes over-generation, i.e., predicted nodes that fall outside the ground-truth set (a minimal MRR sketch follows this list).
  • Reasoning trace evaluation uses the LLM-as-a-Judge protocol to assign $\delta_f$, $\delta_{lc}$, and $\delta_a$ scores. These metrics are averaged over multiple datasets for robust comparison.
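
A minimal sketch of the standard MRR computation referenced above; the exact penalization scheme of pMRR follows the paper and is not reproduced here.

```python
def mean_reciprocal_rank(ranked_predictions, ground_truths):
    """Standard MRR: average of 1/rank of the first correct node per query.

    ranked_predictions: list of ranked candidate-node lists, one per query.
    ground_truths: list of sets of true destination nodes, one per query.
    """
    total = 0.0
    for candidates, truth in zip(ranked_predictions, ground_truths):
        rr = 0.0
        for rank, node in enumerate(candidates, start=1):
            if node in truth:
                rr = 1.0 / rank
                break
        total += rr
    return total / max(len(ranked_predictions), 1)
```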

ReaL-TG-4B achieves strong results, outperforming models with larger parameter counts (including GPT-5 mini and Llama 3.3-70B) on tasks over both seen (wiki, subreddit, coin, flight) and unseen (uci, enron) temporal graph datasets. For example, on the “wiki” dataset, it attains an MRR of 0.824, exceeding the scores of all baseline models of greater or equal scale.

Model            Params          Wiki MRR
ReaL-TG-4B       4B              0.824
Llama 3.3-70B    70B             <0.8
GPT-5 mini       (undisclosed)   <0.8

This empirical advantage holds on both direct forecasting and structured reasoning evaluation metrics.

5. Real-World Applications

ReaL-TG-4B is specifically designed for deployment in environments requiring explainable inference about dynamic, temporally-evolving relational structures. Representative applications include:

  • Recommendation systems: Where user-item link prediction and the generation of model-justified recommendations enhance transparency and user trust.
  • Fraud detection/financial analysis: Application to transaction networks, identifying anomalous or suspicious links with traceable explanations.
  • Social network analysis: Facilitates discovery and exploration of dynamic communities, while rationalizing forecasts of social interactions.

The model’s generalization to unseen graphs, combined with its built-in reasoning trace, supports its use in high-stakes or safety-critical domains where interpretable AI is mandated.

6. Broader Impact and Significance

ReaL-TG-4B demonstrates that outcome-driven reinforcement learning can surface both high quality predictions and explainable, human-auditable reasoning in LLM-based temporal graph forecasting tasks. Its framework is applicable to any scenario in which dynamic interactions require not only accurate prediction but interpretable rationale. By providing methods and evaluation protocols for LLM-generated reasoning in complex structured domains, it sets a precedent for scalable, explainable, and generalizable reasoning in dynamic graph analysis (Ding et al., 31 Aug 2025).

A plausible implication is that further advances in outcome-based RL frameworks and context selection methods could enable even smaller models to rival much larger LLMs—potentially with enhanced faithfulness and logical consistency—across a broader spectrum of relational reasoning problems.

References (1)