
Temporal Graph Sequential Recommender (TGSRec)

Updated 25 March 2026
  • TGSRec is a model that integrates temporal graphs, attention mechanisms, and collaborative signals to predict future user-item interactions.
  • It employs continuous-time bipartite graph construction and learnable temporal embeddings to efficiently capture evolving user preferences and item dynamics.
  • The transformer-based TCT layer with multi-head attention yields significant gains in recall and MRR over previous recommendation methods.

The Temporal Graph Sequential Recommender (TGSRec) is a model class for sequential recommendation tasks that combines temporal graph structures, attention-based deep learning, and explicit modeling of temporal collaborative and sequential patterns. TGSRec was introduced to address the challenge of jointly capturing evolving user preferences, complex item dynamics, and collaborative signals within temporally structured user-item interaction data. The TGSRec framework includes the construction of continuous-time bipartite graphs, dedicated time and collaborative-aware embeddings, and a specialized transformer architecture, yielding state-of-the-art performance for recommending items at arbitrary future timestamps (Fan et al., 2021).

1. Problem Setting and Graph Construction

TGSRec formulates the sequential recommendation problem as learning to predict future user-item interactions, where user behavior evolves over continuous time. The interaction data are modeled as a continuous-time bipartite graph (CTBG)

\mathcal B = (\mathcal U, \mathcal I, \mathcal E_T)

with user set $\mathcal U$, item set $\mathcal I$, and edge set $\mathcal E_T \subset \mathcal U \times \mathcal I \times \mathbb R^+$, where each edge is an interaction $(u, i, t)$ signifying that user $u$ interacted with item $i$ at timestamp $t$. For each user $u$, the model must, at any query time $t$, rank all items $i \in \mathcal I \setminus \mathcal I_u(t)$ (the set of items not yet interacted with by $u$ up to $t$), so that the item actually selected at $t$ is ranked highly. This continuous-time, inductive setting requires models both to memorize long-term evolving interests and to generalize to unseen query times.
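The CTBG and candidate-ranking setup above can be sketched as a simple timestamped edge list. This is an illustrative data structure, not the authors' implementation; the class and method names are hypothetical.

```python
from collections import defaultdict

class CTBG:
    """Minimal continuous-time bipartite graph: edges are (user, item, t) triples."""

    def __init__(self, edges):
        # edges: iterable of (u, i, t) with t a real-valued timestamp
        self.edges = sorted(edges, key=lambda e: e[2])
        self.by_user = defaultdict(list)
        self.items = set()
        for u, i, t in self.edges:
            self.by_user[u].append((i, t))
            self.items.add(i)

    def seen_items(self, u, t):
        """Items user u interacted with strictly before time t (I_u(t))."""
        return {i for i, ts in self.by_user[u] if ts < t}

    def candidates(self, u, t):
        """Ranking candidates at query time t: all items not yet seen by u."""
        return self.items - self.seen_items(u, t)

g = CTBG([("u1", "a", 1.0), ("u1", "b", 2.5), ("u2", "c", 0.5), ("u2", "a", 3.0)])
print(g.candidates("u1", 3.0))  # u1 has already seen a and b, so only c remains
```

Because the query time is a free real-valued argument rather than a sequence index, the same structure serves queries at arbitrary future timestamps, matching the inductive setting described above.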

2. Temporal Embeddings and Representation Mechanisms

In TGSRec, each user uu and item ii is associated with a learnable long-term embedding vector,

E_u,\, E_i \in \mathbb R^d

stored in a joint embedding table $E \in \mathbb R^{d \times (|\mathcal U| + |\mathcal I|)}$. Time is encoded by mapping any real-valued $t \in \mathbb R^+$ to a vector $\Phi(t) \in \mathbb R^{d_T}$ using a learnable multi-frequency trigonometric kernel:

\Phi(t) = \sqrt{\tfrac{1}{d_T}}\left[\cos(\omega_1 t),\, \sin(\omega_1 t),\, \dots,\, \cos(\omega_{d_T/2} t),\, \sin(\omega_{d_T/2} t)\right]^\top

where the $d_T/2$ frequencies $\omega_i$ are learned. By Bochner's theorem, this encoding induces a translation-invariant kernel $\Phi(t_1)^\top \Phi(t_2)$ that depends only on the time difference $t_1 - t_2$ and thereby quantifies temporal similarity.
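A minimal sketch of this trigonometric time encoding follows. In TGSRec the frequencies are learned; here they are fixed constants for illustration, and the function name is hypothetical.

```python
import numpy as np

def time_encode(t, omegas):
    """Map a scalar timestamp t to Phi(t) in R^{d_T}, with d_T = 2 * len(omegas).

    Interleaves cos/sin features at each frequency and scales by sqrt(1/d_T).
    """
    d_T = 2 * len(omegas)
    feats = np.empty(d_T)
    feats[0::2] = np.cos(omegas * t)
    feats[1::2] = np.sin(omegas * t)
    return np.sqrt(1.0 / d_T) * feats

omegas = np.array([1.0, 0.1, 0.01])  # illustrative fixed frequencies

# Translation invariance: Phi(t1)^T Phi(t2) depends only on the gap t1 - t2,
# since cos(w t1)cos(w t2) + sin(w t1)sin(w t2) = cos(w (t1 - t2)).
k_ab = time_encode(10.0, omegas) @ time_encode(7.0, omegas)
k_cd = time_encode(103.0, omegas) @ time_encode(100.0, omegas)  # same gap of 3
```

The final two lines verify the Bochner-style property numerically: both inner products reduce to the same function of the three-unit time gap.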

3. Temporal Collaborative Transformer (TCT) Architecture

The TGSRec core architecture is the Temporal Collaborative Transformer (TCT) layer, which generalizes self-attention to jointly encode:

  • Sequential patterns in user-item trajectories;
  • Temporal proximity via explicit time-kernelization;
  • Collaborative signals via query–key composition of user and item embeddings.

Each TCT layer $l$ receives, for every node $x$ (user or item) at time $t$, a temporal embedding $e_x^{(l-1)}(t)$ concatenated with its time encoding:

h_x^{(l-1)}(t) = e_x^{(l-1)}(t)\, \Vert\, \Phi(t) \in \mathbb R^{d + d_T}

For a query user $u$ at time $t$, a fixed number $S$ of their past interactions $\mathcal N_u(t) = \{(i, t_s) \mid (u, i, t_s) \in \mathcal E_T,\ t_s < t\}$ are sampled as neighbors. Attention over the $j$-th sampled neighbor $(i, t_s)$ is computed via

\alpha_{u,t}(i, t_s) = \frac{\exp\left(q^\top K_{:j} / \sqrt{d + d_T}\right)}{\sum_m \exp\left(q^\top K_{:m} / \sqrt{d + d_T}\right)}

where $q = W_q^{(l)} h_u^{(l-1)}(t)$, $K$ is the stack of keys $W_k^{(l)} h_i^{(l-1)}(t_s)$, and $V$ is the stack of values $W_v^{(l)} h_i^{(l-1)}(t_s)$. This structure captures both collaborative affinity ($e_u^\top e_i$) and temporal proximity ($\Phi(t)^\top \Phi(t_s)$). The aggregated message is given by

m_u^{(l)}(t) = \sum_{(i, t_s) \in \mathcal N_u(t)} \alpha_{u,t}(i, t_s) \left[W_v h_i^{(l-1)}(t_s)\right]

and fused with the query via a two-layer feed-forward network to produce the updated embedding. Multi-head attention and stacked TCT layers capture higher-order temporal and collaborative dependencies.
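The attention and aggregation steps above can be sketched as a single-head, single-query computation. This is a simplified illustration under assumed dimensions, not the authors' code; `tct_attention` and its arguments are hypothetical names.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def tct_attention(h_query, H_neighbors, W_q, W_k, W_v):
    """One TCT attention step for a single query user at time t.

    h_query:     (d,)   concatenated [embedding || time encoding] of the user
    H_neighbors: (S, d) one row per sampled past interaction (i, t_s)
    Returns the aggregated message m_u(t) and the attention weights alpha.
    """
    d = h_query.shape[0]
    q = W_q @ h_query                    # query vector
    K = H_neighbors @ W_k.T              # keys, one per neighbor
    V = H_neighbors @ W_v.T              # values, one per neighbor
    alpha = softmax(K @ q / np.sqrt(d))  # scaled dot-product attention weights
    return alpha @ V, alpha              # weighted sum of neighbor values

rng = np.random.default_rng(0)
d, S = 8, 4
msg, alpha = tct_attention(
    rng.normal(size=d), rng.normal(size=(S, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Because each neighbor row already concatenates an item embedding with its interaction-time encoding, the dot products inside the softmax mix exactly the collaborative and temporal terms described above.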

4. Prediction Layer, Training Objectives, and Optimization

Recommendation at time $t$ ranks items $i$ according to the bilinear similarity between the final-layer temporal user and item embeddings:

r(u, i, t) = \left(e_u^{(L)}(t)\right)^\top e_i^{(L)}(t)

Losses are computed using either the Bayesian Personalized Ranking (BPR) loss,

\mathcal L_{BPR} = -\sum_{(u,i,j,t)} \ln \sigma\left(r(u,i,t) - r(u,j,t)\right) + \lambda \lVert\Theta\rVert_2^2

or a binary cross-entropy loss. For each positive interaction $(u, i, t)$, a negative item $j$ (not yet seen by $u$ by time $t$) is sampled. Model parameters $\Theta$ are updated via mini-batch Adam optimization, with on-the-fly negative sampling and standard regularization.
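The bilinear score and the BPR term for one sampled $(u, i, j, t)$ triple can be sketched as follows; the embeddings stand in for the final-layer temporal embeddings, and the regularization term is omitted for brevity.

```python
import numpy as np

def score(e_u, e_i):
    """Bilinear similarity r(u, i, t) between final user and item embeddings."""
    return e_u @ e_i

def bpr_loss(e_u, e_pos, e_neg):
    """-log sigma(r(u,i,t) - r(u,j,t)) for one positive/negative pair."""
    diff = score(e_u, e_pos) - score(e_u, e_neg)
    return -np.log(1.0 / (1.0 + np.exp(-diff)))

e_u = np.array([1.0, 0.0])
e_pos = np.array([1.0, 0.0])   # aligned with the user: high score
e_neg = np.array([-1.0, 0.0])  # anti-aligned with the user: low score
loss = bpr_loss(e_u, e_pos, e_neg)  # small loss when the ranking is correct
```

Swapping the positive and negative items reverses the score difference and drives the loss up, which is exactly the pairwise ranking pressure the BPR objective applies during training.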

5. Experimental Protocol and Results

Empirical validation of TGSRec (Fan et al., 2021) was conducted on five datasets: four Amazon categories (“Toys”, “Baby”, “Tools”, “Music”) and MovieLens-100K. All datasets are split chronologically (80%/10%/10%) and have high sparsity (e.g., “Toys”: $\sim 0.07\%$ density, mean inter-event interval $\sim 85$ days). Baselines included static collaborative filtering (BPR, LightGCN), temporal graph models (CTDNE), RNN-based and graph-based sequential models (FPMC, GRU4Rec, Caser, SR-GNN), and Transformer-based methods (SASRec, BERT4Rec, SSE-PT, TiSASRec). Evaluation metrics comprised Recall@10, Recall@20, MRR, and NDCG@10, ranking each target item among 1,000 randomly sampled negatives per test query using Krichene & Rendle's unbiased estimator.
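The sampled-ranking protocol can be sketched in a few lines: rank the held-out target among its sampled negatives by score, then derive Recall@K and the reciprocal rank. This simplified version omits Krichene & Rendle's bias correction; function names are illustrative.

```python
import numpy as np

def sampled_rank(target_score, negative_scores):
    """1-based rank of the target among itself plus the sampled negatives."""
    return 1 + int(np.sum(np.asarray(negative_scores) > target_score))

def recall_at_k(rank, k):
    """1.0 if the target lands in the top-k, else 0.0; averaged over queries."""
    return 1.0 if rank <= k else 0.0

neg_scores = np.array([0.2, 0.9, 0.5])  # scores of 3 sampled negatives
rank = sampled_rank(0.8, neg_scores)    # one negative outranks the target
rr = 1.0 / rank                         # per-query reciprocal rank for MRR
```

Averaging `recall_at_k` and `rr` over all test queries yields the Recall@K and MRR figures reported in the experiments.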

Across all datasets, TGSRec yielded large absolute improvements over strong baselines. Averaged over the five datasets, TGSRec achieved a 22.5% absolute gain in Recall@10 and a 22.1% gain in MRR over the best previous model. For example, on “Toys”, TGSRec achieved Recall@10 = 0.3650 and MRR = 0.3661, exceeding SASRec's Recall@10 = 0.1452 and MRR = 0.0732. Ablation studies confirmed that both the learned time kernel and collaborative attention are critical; stacking two TCT layers provides further improvements over a single layer.

6. Key Insights and Significance

TGSRec demonstrates that fusing sequential modeling, collaborative filtering, and dedicated continuous-time mechanisms yields substantial gains in sequential recommendation accuracy. The model's structure allows:

  • Simultaneous encoding of dynamic user preferences and collaborative signals;
  • Robust performance in sparse, irregular, and highly dynamic environments;
  • Generalization to arbitrary query timestamps, not limited to observed discrete intervals.

Ablation results highlight the necessity of (a) learned temporal embeddings, (b) collaborative attention in the TCT, and (c) stacking attention layers for hierarchical aggregation. Removing any component significantly degrades performance. These findings confirm that merely applying Transformer-style architectures to timestamped graphs is insufficient without explicit temporal and collaborative integration.

7. Relationship to Broader Research and Extensions

TGSRec’s principle of unifying temporal, sequential, and collaborative modeling in inductive, graph-based architectures is consistent with trends in sequential recommendation and dynamic graph learning. Contemporary methods such as Time-Guided Graph Neural ODEs (TGODE) introduce adaptive time-aware augmentation and joint ODE-driven graph evolution to better handle irregular sampling and long-term drift in real data (Fu et al., 23 Nov 2025). TGODE, while related, relies on a diffusion-based graph augmenter and a continuous graph ODE, whereas TGSRec’s innovation centers on attention-based encoding within temporal graphs.

TGSRec constitutes a general, flexible, and scalable solution for sequential recommendation in temporally rich domains, providing a blueprint for subsequent models and analyses targeting fine-grained, dynamically evolving user-item interaction systems.
