Sequential Recommender Systems

Updated 16 May 2026

Sequential recommender systems are algorithmic frameworks that predict users' next actions by explicitly modeling the temporal order of interactions.
They utilize diverse architectures such as RNNs, attention mechanisms, and graph-based models to capture both short-term intent and long-term preference evolution.
Recent innovations leverage data augmentation, semantic integration, and efficiency improvements to address challenges like data sparsity and scalability.

Sequential Recommender Systems (SRSs) are algorithmic frameworks designed to predict a user's next action or item of interest based on their historical sequence of interactions. Distinct from traditional (non-sequential) recommender systems that ignore temporal order, SRSs explicitly model the order and dynamics of user behaviors, capturing both short-term intent and long-term preference evolution. This focus on temporal dependencies enables SRSs to provide more contextually aware and adaptive recommendations across domains such as e-commerce, streaming, and online advertising (Dai et al., 25 Nov 2025, Liu et al., 2024, Wang et al., 2019, Wang et al., 2022).

1. Formal Definition and Data Characteristics

Formally, an SRS operates on a user–item interaction log where each user $u$ is associated with a chronologically ordered sequence:

$S_u = (i_1, i_2, \ldots, i_T), \quad i_t \in \mathcal{I}$

Here, $i_t$ represents the $t$-th item the user engaged with, and $\mathcal{I}$ is the item set. The canonical task is next-item prediction: given $S_u$, estimate the distribution or ranking $P(i_{T+1} \mid S_u)$ over candidates (Dai et al., 25 Nov 2025, Betello et al., 2024, Wang et al., 2022).
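
As an illustration, one common way to instantiate this distribution is to encode the sequence into a vector and score each candidate by a dot product with its embedding. The symbols $\mathbf{h}_u$ (sequence representation) and $\mathbf{e}_i$ (item embedding) are assumptions introduced here and are not part of the definition above:

$P(i_{T+1} = i \mid S_u) = \frac{\exp(\mathbf{h}_u^\top \mathbf{e}_i)}{\sum_{j \in \mathcal{I}} \exp(\mathbf{h}_u^\top \mathbf{e}_j)}$

Ranking candidates by $\mathbf{h}_u^\top \mathbf{e}_i$ gives the same order as ranking by this probability, because the softmax is monotone in the score.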

Key data characteristics include:

Order sensitivity: The interaction order reveals recency bias, drift, and co-visitation motifs; order-agnostic models miss these effects (Klenitskiy et al., 2024).
Variable-length sequences: Sequences range from sparse (cold-start) to very long, creating computational and modeling challenges (Liu et al., 2024).
Multi-behavior heterogeneity: Real-world datasets often record multiple interaction types (e.g., clicks, carts, purchases), with imbalanced frequency and complementary information (Cho et al., 2023).
Data sparsity and long-tail distributions: Most users and items have few interactions, exacerbating the cold-start and long-tail challenges (Dai et al., 25 Nov 2025, Liu et al., 2024, Sun et al., 16 Mar 2025).

2. Model Architectures and Methodological Paradigms

SRS model design has advanced across several paradigms, summarized below (Wang et al., 2019, Wang et al., 2022, Betello et al., 2024):

Paradigm	Modeling Principle	Focal Limitation
Markov Chain	Next item depends on last L items	Captures only short-range context
RNN-based	Hidden state encodes all past items	Hard to model non-adjacent dependencies and cold-start cases
Attention/Transformer	Item–item dependencies via attention	Quadratic cost in sequence length
Graph-based	High-order transition motifs via GNNs	Session-level, expensive graphs
Mixture/Hybrid	Fusing multi-range or multi-modal signals	Complexity/regularization
Multi-behavior	Separate/fused encoders per behavior	Scalability with behavior types
RL-based	Treats recommender as policy/MDP	Large action space, sparse reward
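
To make the simplest row of the table concrete, the following is a minimal sketch of a first-order Markov chain recommender, i.e. the special case where the next item depends only on the last item (L = 1). The function names and the toy log are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count first-order transitions (previous item -> next item)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def recommend(counts, history, k=3):
    """Rank candidates by how often they followed the last item in history."""
    if not history:
        return []
    last = history[-1]
    return [item for item, _ in counts[last].most_common(k)]


# Toy interaction log with hypothetical item IDs.
logs = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c", "a"],
]
model = fit_transitions(logs)
top = recommend(model, ["d", "a", "b"], k=2)  # only the last item "b" is used
```

Everything before the last item is ignored here, which is exactly the short-range limitation listed in the table.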

For example, SASRec uses stacked self-attention to model both short- and long-term dependencies, while BERT4Rec employs bidirectional masking for denoising and missing-item prediction (Liu et al., 2024, Betello et al., 2024). DyMuS and DyMuS+ explicitly model multi-behavior sequences using capsule routing to disentangle interest signals across behavior types, with demonstrable gains on large e-commerce datasets (Cho et al., 2023).
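
The self-attention idea can be sketched as a single causal attention head in NumPy, where each position attends only to itself and earlier items. This is a simplified illustration rather than the SASRec implementation: positional embeddings, layer normalization, feed-forward layers, and stacking are omitted, and the random weights are placeholders.

```python
import numpy as np


def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention where position t attends only to positions <= t."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                           # (T, T) pairwise scores
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)              # hide future items
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights


rng = np.random.default_rng(0)
T, d = 5, 8                       # sequence length and embedding size (toy values)
X = rng.normal(size=(T, d))       # stand-in for item embeddings of one user sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = causal_self_attention(X, Wq, Wk, Wv)
```

Removing the `future` mask yields bidirectional attention, which is the setting BERT4Rec relies on together with masked-item training.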

Recent approaches integrate content features (e.g., TSSR’s two-stream ID/content architecture with hierarchical contrastive alignment (Cheng et al., 2024)) or leverage LLM-generated world knowledge for semantic augmentation (e.g., GRASP (Dai et al., 25 Nov 2025), LLMEmb (Liu et al., 2024), LLMSeR (Sun et al., 16 Mar 2025)).

3. Data Augmentation, Robustness, and Efficiency

Addressing noise, sparsity, and computational constraints is central to recent SRS advances.

Data Augmentation

Augmentation strategies (noise injection, redundancy, masking, synonym replacement) expand the training space and are particularly effective in low-data regimes, yielding up to a 28% boost in NDCG@10 when only 10% of the data is used. Direct sequence manipulation is more effective than earlier subset-splitting approaches, especially for models prone to overfitting under limited supervision (Song et al., 2022, Sun et al., 16 Mar 2025).
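
Three of these operations can be sketched on plain Python lists as follows. The function names, the mask token, and the sampling choices are assumptions made for illustration and are not taken from the cited papers. Synonym replacement is omitted because it needs an item-similarity model. All functions assume a non-empty sequence and return a new list.

```python
import random


def mask_items(seq, ratio, mask_token, rng):
    """Replace a random subset of positions (at least one) with a mask token."""
    n = max(1, int(len(seq) * ratio))
    idx = set(rng.sample(range(len(seq)), n))
    return [mask_token if i in idx else x for i, x in enumerate(seq)]


def inject_noise(seq, catalog, rng):
    """Insert one randomly chosen catalog item at a random position."""
    pos = rng.randrange(len(seq) + 1)
    return seq[:pos] + [rng.choice(catalog)] + seq[pos:]


def add_redundancy(seq, rng):
    """Duplicate one existing item right after its original position."""
    pos = rng.randrange(len(seq))
    return seq[:pos + 1] + [seq[pos]] + seq[pos + 1:]


rng = random.Random(0)                    # fixed seed for repeatable augmentation
seq = [10, 20, 30, 40, 50]                # hypothetical item IDs
masked = mask_items(seq, 0.5, "[MASK]", rng)
noisy = inject_noise(seq, [60, 70], rng)
redundant = add_redundancy(seq, rng)
```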

Robustness

SRSs are highly sensitive to recent items in the sequence: removal of tail interactions degrades NDCG by up to 60%, while perturbing the prefix or middle has minimal effect. This exposes vulnerability to adversarial behaviors, motivating robustness-oriented data augmentation, recency-aware losses, and adversarial training (Betello et al., 2023). Evaluation metrics such as Finite Rank-Biased Overlap (FRBO) better reflect rank-list consistency under perturbations.
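
A perturbation study of this kind can be sketched as below: drop items from the tail or the head of a history, then compare the recommendation lists produced before and after. The comparison uses rank-biased overlap truncated at depth k and rescaled so that identical lists score 1. This is a simplified stand-in and may differ from the exact FRBO definition in the cited work. The parameter `p` and the helper names are assumptions.

```python
def drop_tail(seq, n):
    """Remove the n most recent interactions."""
    return seq[:-n] if n > 0 else list(seq)


def drop_head(seq, n):
    """Remove the n oldest interactions."""
    return seq[n:]


def truncated_rbo(list_a, list_b, k=10, p=0.9):
    """Rank-biased overlap up to depth k, rescaled to lie in [0, 1]."""
    seen_a, seen_b = set(), set()
    total = 0.0
    for depth in range(1, k + 1):
        if depth <= len(list_a):
            seen_a.add(list_a[depth - 1])
        if depth <= len(list_b):
            seen_b.add(list_b[depth - 1])
        agreement = len(seen_a & seen_b) / depth   # overlap of the two prefixes
        total += (p ** (depth - 1)) * agreement    # top ranks weigh more
    return (1 - p) * total / (1 - p ** k)


same = truncated_rbo([1, 2, 3], [1, 2, 3], k=3)
swapped = truncated_rbo([1, 2, 3], [2, 1, 3], k=3)
disjoint = truncated_rbo([1, 2, 3], [4, 5, 6], k=3)
```

In practice the two lists would come from the same trained model scored once on the original history and once on `drop_tail(history, n)` or `drop_head(history, n)`.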

Efficiency

Quadratic attention cost in Transformers inhibits scaling to long sequences. LinRec introduces L₂-normalized linear attention, reducing memory and time from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$, with negligible or even positive impact on accuracy. This efficiency unlocks the modeling of extended interaction sequences without prohibitive resource demand (Liu et al., 2024). Light architectures such as GLINT-RU further accelerate inference through dense selective gating and hybrid GRU/attention blocks (Zhang et al., 2024). Model compression (CpRec) achieves a 4–8$\times$ reduction in memory with maintained or improved quality by exploiting item-frequency heterogeneity and block-wise parameter sharing (Sun et al., 2020).
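
The source of the linear cost can be shown with a small NumPy sketch. Once the softmax is replaced by a normalization applied separately to the queries and keys, matrix multiplication is associative, so $K^\top V$ (a $d \times d$ matrix) can be computed first and the $N \times N$ attention matrix is never formed. The choice of normalizing queries row-wise and keys column-wise is an assumption of this sketch, as is the absence of causal masking and activation functions. It is not the LinRec implementation.

```python
import numpy as np


def _normalize(Q, K, eps=1e-12):
    """L2-normalize queries per row and keys per column (assumed axes)."""
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=0, keepdims=True) + eps)
    return Qn, Kn


def linear_attention(Q, K, V):
    """Compute Qn @ (Kn.T @ V): cost grows linearly with sequence length N."""
    Qn, Kn = _normalize(Q, K)
    context = Kn.T @ V           # (d, d) summary of the whole sequence
    return Qn @ context          # (N, d)


def quadratic_reference(Q, K, V):
    """Same result via the explicit (N, N) attention matrix, for comparison."""
    Qn, Kn = _normalize(Q, K)
    return (Qn @ Kn.T) @ V


rng = np.random.default_rng(0)
N, d = 200, 16                   # toy sequence length and head dimension
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
fast = linear_attention(Q, K, V)
slow = quadratic_reference(Q, K, V)
```

Both functions return the same values up to floating-point error. Only the order of multiplication differs, which is what changes the cost from $\mathcal{O}(N^2 d)$ to $\mathcal{O}(N d^2)$.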

4. Integration of Side Information and World Knowledge

Single-modality (ID-based) SRSs are insufficient to bridge semantic gaps, especially in long-tail recommendation or cold-start settings. Recent innovations enhance SRSs with external knowledge and multi-modal content:

LLM-augmented SRSs: GRASP generates rich, user/item-level natural language descriptions with LLMs, encodes these into semantic embeddings, and employs holistic multi-level attention to fuse ID, content, and similar-user/item contexts. It is reported to outperform prior LLM-based SRS modules and to be particularly effective for tail users and items (Dai et al., 25 Nov 2025).
LLMEmb: LLM-derived textual embeddings undergo supervised contrastive fine-tuning and collaborative adaptation, which inject both semantic affinity and collaborative proximity. The approach is backbone-agnostic and materially improves tail-item hit rate (e.g., by 158% on Yelp) (Liu et al., 2024).
TSSR: Hierarchical contrastive learning aligns ID and content-based streams to close the semantic gap and empower content-based cold-start generalization (Cheng et al., 2024).
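
The contrastive alignment used by several of these methods can be illustrated with a generic InfoNCE loss between two views of the same items, one from an ID-based encoder and one from a content encoder. This is a minimal sketch of the general technique and not the hierarchical objective of TSSR or the fine-tuning recipe of LLMEmb. The temperature value and the random embeddings are placeholders.

```python
import numpy as np


def info_nce(id_emb, content_emb, temperature=0.1):
    """InfoNCE loss: row i of id_emb should match row i of content_emb."""
    a = id_emb / np.linalg.norm(id_emb, axis=1, keepdims=True)
    b = content_emb / np.linalg.norm(content_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature                        # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives on the diagonal


rng = np.random.default_rng(0)
id_emb = rng.normal(size=(16, 32))               # 16 items, 32-dim ID embeddings
aligned = id_emb + 0.01 * rng.normal(size=(16, 32))
misaligned = np.roll(aligned, 1, axis=0)         # every item paired with a wrong partner

loss_aligned = info_nce(id_emb, aligned)
loss_misaligned = info_nce(id_emb, misaligned)
```

The loss is low when each item's two views agree and high when the pairing is broken, which is the signal that pulls the ID and content spaces together during training.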

A plausible implication is that fusing semantic, collaborative, and behavioral signals under robust, contrastively aligned architectures is essential for next-generation SRSs, especially under high sparsity.

5. Multi-Behavior, Loss Engineering, and Explainability

Modern SRS research recognizes the necessity of multi-faceted preference modeling and the limits of one-hot, next-item losses:

Multi-behavior modeling: DyMuS+ encodes per-behavior sequences with dynamic item-level routing, capturing inter-behavior heterogeneity and correlations, yielding up to 40% relative NDCG@10 improvement over prior multi-behavior architectures (Cho et al., 2023).
Relevance-aware losses: Incorporating multiple plausible future items and weighting their relevance in binary cross-entropy (linear, power, exponential schedules) enhances robustness to noise such as accidental clicks, improving NDCG by up to 1.6% under realistic, multi-positive evaluation protocols (Bacciu et al., 2023).
Explainability: Recently, counterfactual explanations for SRSs have been proposed via genetic algorithms designed for discrete sequences. Generating minimal-edit counterfactuals is NP-complete; efficient approximations generate “what-if” histories that illuminate model reasoning and foster user/system trust (Scarcelli et al., 5 Aug 2025).
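
The relevance-aware loss idea can be sketched as a binary cross-entropy in which several future items count as positives, each weighted by a relevance score that decays with distance into the future. The three schedules below are hypothetical parameterizations chosen for illustration. The exact formulas in Bacciu et al. (2023) may differ.

```python
import math


def relevance_weights(n, schedule="exponential", rate=0.5):
    """Hypothetical weights for the next n future items, nearest first, summing to 1."""
    if schedule == "linear":
        raw = [n - j for j in range(n)]
    elif schedule == "power":
        raw = [(j + 1) ** (-rate) for j in range(n)]
    elif schedule == "exponential":
        raw = [rate ** j for j in range(n)]
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    total = sum(raw)
    return [w / total for w in raw]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_bce(pos_scores, neg_scores, weights):
    """Relevance-weighted positive term plus the mean negative term."""
    pos = -sum(w * math.log(sigmoid(s)) for w, s in zip(weights, pos_scores))
    neg = -sum(math.log(1.0 - sigmoid(s)) for s in neg_scores) / len(neg_scores)
    return pos + neg


w = relevance_weights(3, schedule="exponential", rate=0.5)
good = weighted_bce([3.0, 2.0, 1.0], [-2.0, -3.0], w)   # positives scored high
bad = weighted_bce([-3.0, -2.0, -1.0], [2.0, 3.0], w)   # positives scored low
```

With `n = 1` every schedule reduces to the usual single-positive next-item loss.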

6. Datasets, Evaluation, and Reproducibility

Rigorous SRS assessment requires:

Dataset audit: Many popular benchmarks have weak sequential structure—randomly shuffling user histories results in minimal NDCG drop; such datasets may not truly reward sequence modeling (Klenitskiy et al., 2024). Reporting drop in HR/NDCG and top-K Jaccard after sequence shuffling is advised to expose dataset “sequence strength.”
Evaluation protocols: Standard practice is leave-one-out next-item prediction with ranking metrics such as Hit Rate@K, NDCG@K, and MRR. Contemporary evaluation advocates for multi-future-item protocols and fine-grained robustness analysis (Bacciu et al., 2023, Betello et al., 2023).
Reproducibility: There is high sensitivity to experimental design—preprocessing, initialization, hyperparameters, and negative sampling—impacting ranking and claims of “state-of-the-art.” GRU4Rec can outperform SASRec at low parameter counts; large models favor attention-based architectures (Betello et al., 2024). To ensure fair comparison, uniform frameworks (e.g., EasyRec) and full code/config/seed publication are necessary.
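
The protocol and metrics named above can be sketched for the single-target case as follows. With one held-out item the ideal DCG is 1, so NDCG@K reduces to $1/\log_2(\text{rank}+1)$. The helper names are hypothetical, and `jaccard_at_k` is one way to compare the top-K lists obtained from original and shuffled histories.

```python
import math


def leave_one_out(seq):
    """Hold out the last item as the target; the rest is the input history."""
    return seq[:-1], seq[-1]


def rank_of(target, ranked):
    """1-based rank of target in the ranked list, or None if it is absent."""
    try:
        return ranked.index(target) + 1
    except ValueError:
        return None


def hit_rate_at_k(rank, k):
    return 1.0 if rank is not None and rank <= k else 0.0


def ndcg_at_k(rank, k):
    return 1.0 / math.log2(rank + 1) if rank is not None and rank <= k else 0.0


def reciprocal_rank(rank):
    return 1.0 / rank if rank is not None else 0.0


def jaccard_at_k(list_a, list_b, k):
    """Set overlap of two top-k lists, ignoring order."""
    a, b = set(list_a[:k]), set(list_b[:k])
    return len(a & b) / len(a | b) if a | b else 1.0


history, target = leave_one_out([1, 2, 3])
rank = rank_of(target, [5, 3, 1])      # hypothetical model output for this user
```

Averaging these per-user values over all test users gives the reported HR@K, NDCG@K, and MRR.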

7. Emerging Trends and Future Directions

Ongoing and emerging research thrusts include:

Self-supervised and contrastive pretraining: Masked prediction and sequence-level contrastive objectives to leverage unlabeled data and regularize sparse interactions (Cheng et al., 2024, Liu et al., 2024).
Reinforcement learning: Framing SRS as a sequential decision process to optimize long-term user engagement via policy-gradient or multi-objective sampling (Zhang et al., 2023, Wang et al., 2022).
Adaptive data-centric methods: Automatic behavior subsampling (AutoSAM) dynamically discerns the most informative parts of history, optimizing not only for next-item accuracy but for sequence coherence and model generalization (Zhang et al., 2023).
Scalability to massive catalogs: Hybrid block-wise and hierarchical architectures merge memory–compute optimization with expressive power (Sun et al., 2020, Liu et al., 2024).
Responsible recommendation: Fairness, explanation, privacy, and robustness to adversarial perturbations (Scarcelli et al., 5 Aug 2025, Betello et al., 2023).

In summary, SRSs have evolved into a mature field leveraging advanced neural architectures, robust data-centric engineering, and principled evaluation. Continual innovation in semantic augmentation, efficient modeling, adaptive sampling, and robust loss design pushes the boundaries of sequential recommendation in both academic and industrial contexts.