
Cross-Domain Sequential Recommendation

Updated 14 October 2025
  • Cross-Domain Sequential Recommendation (CDSR) is a framework that predicts user interactions by integrating sequential behavior data across multiple domains, addressing issues like data sparsity and negative transfer.
  • Recent methodologies employ dual dynamic graphs, joint graph neural network and transformer encoders, and multimodal fusion to effectively capture both intra-domain and cross-domain dependencies.
  • Robust training strategies such as contrastive losses, game-theoretic weighting, and diffusion-based generative models are central to CDSR, ensuring reliable performance in open-world and data-sparse scenarios.

Cross-Domain Sequential Recommendation (CDSR) focuses on predicting future user interactions by leveraging sequential behavior data from multiple domains—such as books, movies, e-commerce, or financial services. Unlike traditional single-domain sequential recommendation, CDSR aims to model and transfer both intra-domain (within-domain) and inter-domain (cross-domain) signals, addressing challenges of data sparsity, cold-start effects, and preference drift that arise in heterogeneous, multi-platform environments. The field has witnessed rapid methodological development, including advanced representation learning with dynamic graphs, contrastive mutual information objectives, game-theoretic approaches for negative transfer mitigation, robust modeling for non-overlapping users, item-level alignment, and multimodal fusion with LLM enhancements.

1. Problem Formalization and Challenges

Formally, CDSR models user–item interactions as a four-dimensional tensor $\Gamma \in \mathbb{R}^{n \times m \times s \times k}$, where $n$ is the number of users, $m$ the number of items (across all $k$ domains), and $s$ the sequence length over time. The recommendation objective is to estimate $P(\hat{i} \mid \Gamma)$ for a candidate item $\hat{i}$, given all available cross-domain histories (Chen et al., 10 Jan 2024).
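The sketch below illustrates one way this data and objective can be organized in code. All names (user_histories, score_candidates) and the inner-product scoring choice are illustrative assumptions, not a specific paper's formulation:

```python
import numpy as np

# Illustrative structure only: each user has a time-ordered sequence of
# (item_id, domain_id, timestamp) triples, i.e. a sparse view of the
# tensor Gamma in R^{n x m x s x k} described above.
user_histories = {
    0: [(12, 0, 1.0), (7, 1, 2.0), (45, 0, 3.0)],  # user 0 interacts in domains 0 and 1
    1: [(3, 1, 1.5), (99, 1, 2.5)],
}

def score_candidates(history_embedding: np.ndarray,
                     candidate_embeddings: np.ndarray) -> np.ndarray:
    """Estimate P(i_hat | Gamma) for each candidate via a softmax over
    inner-product scores (a common choice; details vary per model)."""
    logits = candidate_embeddings @ history_embedding
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs

# Toy usage: a 16-d encoded cross-domain history and 5 candidate items.
rng = np.random.default_rng(0)
h = rng.normal(size=16)
cands = rng.normal(size=(5, 16))
print(score_candidates(h, cands))  # 5 probabilities summing to 1
```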

CDSR introduces significant challenges:

  • Retention vs. Transfer: Balancing the retention of domain-specific preferences with the integration of cross-domain knowledge is non-trivial. Naïve aggregation may wash out domain-local signals.
  • Data Sparsity and Cold Start: Many users interact in only one or a few domains, and item sequences are often short, limiting the learnability of preference models, especially for non-overlapping users (Li et al., 17 Oct 2024).
  • Negative Transfer: The introduction of unrelated or conflicting domain signals can degrade target-domain performance, particularly when domains are weakly related or user intents diverge (Park et al., 2023, Bian et al., 25 Jan 2025, Park et al., 15 Jul 2024).
  • Prediction Mismatch: Misalignment between current tokens and prediction targets across domains leads to inappropriate knowledge transfer and degraded accuracy (Bian et al., 25 Jan 2025).
  • Domain Heterogeneity and Noise: Varying item types, semantics, and interaction intent across domains mean that collaborative signals may not always be beneficial; sequential noise further exacerbates robustness issues (Ye et al., 30 Aug 2025).

2. Representation Learning and Sequence Modeling Approaches

Recent CDSR frameworks utilize advanced architectures to jointly capture intra-domain and cross-domain dependencies:

  • Dual Dynamic Graphs with Attention Integration: DDGHM constructs local graphs for domain-specific behavior and a global graph for the merged cross-domain sequence. A fuse attentive gating mechanism (FAG) adaptively integrates structure from both graphs at the node-feature and sequence levels, solving the intra/inter-domain transition challenge (Zheng et al., 2022).
  • Joint Graph Neural and Self-Attentive Encoders: C²DSR uses a GNN to mine cross-domain collaborative relationships (inter-sequence) and a sequential attentive (transformer-style) encoder for domain-local dependencies. The model learns single- and cross-domain representations in parallel, exposing both types of signals to joint training (Cao et al., 2023).
  • Domain-Hybrid Hierarchical Modeling: Several works adopt hybrid (merged) sequences alongside domain-local sequences, applying multi-head self-attention at multiple abstraction levels, or using hierarchical embeddings to encode both fine-grained (item-level) and coarse-grained (category-level) signals (Park et al., 2023, Wu et al., 21 Apr 2025).

A common design pattern emerges: representations learned for each domain are either fused at the sequence level (concatenation, multi-level fusion) or interact through weighted graph, attention, or denoising mechanisms, emphasizing context-dependent knowledge flow.
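The simplest instance of this pattern is a learned gate that mixes domain-local and cross-domain sequence representations. The module below is a generic sketch loosely inspired by gated fusion mechanisms such as DDGHM's FAG, not a faithful reimplementation:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse a domain-local representation with a cross-domain one via a
    learned sigmoid gate (generic sketch; not the exact FAG of DDGHM)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, local_h: torch.Tensor, global_h: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per feature, how much cross-domain signal flows in.
        g = torch.sigmoid(self.gate(torch.cat([local_h, global_h], dim=-1)))
        return g * local_h + (1.0 - g) * global_h

# Usage: fuse 64-d sequence representations for a batch of 8 users.
fuse = GatedFusion(64)
local_h, global_h = torch.randn(8, 64), torch.randn(8, 64)
fused = fuse(local_h, global_h)   # shape (8, 64)
```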

3. Knowledge Transfer, Alignment, and Negative Transfer Mitigation

A principal direction in CDSR research is the transfer of user/item knowledge across domains with careful regulation:

  • Hybrid Metric Training: DDGHM's hybrid objective combines cross-entropy, collaborative metric losses (for representation alignment), and contrastive losses (for instance uniformity) with domain-level weighting (Zheng et al., 2022).
  • Contrastive Mutual Information Maximization: C²DSR introduces a contrastive cross-domain infomax objective, maximizing mutual information between single- and cross-domain representations. This explicitly aligns user preference vectors from two spaces, enforcing correlation where behaviors exhibit true complementarity (Cao et al., 2023).
  • Partial Alignment of Item Representations: CA-CDSR addresses item-level misalignment by decomposing global (joint) embedding spaces and only partially aligning spectral components identified as transferable via an adaptive spectrum filter, preventing over-alignment and avoiding negative transfer (Yin et al., 21 May 2024).
  • Game-Theoretic Loss Rebalancing: CGRec models inter-domain transfer as a cooperative game, assigning loss weights to each domain proportional to its marginal (Shapley) contribution; domains with negative influence (i.e., causing negative transfer) receive reduced influence during joint training (Park et al., 2023). A minimal Shapley-weighting sketch follows this list.
  • Task-Guided Sequence Alignment and Invariant Adaptation: ABXI eschews timestamp alignment in favor of matching prediction targets across domain-specific and cross-domain sequences. Domain-invariant LoRA modules extract shared interests, and task-guided supervision prevents tokens from one domain affecting prediction in another (Bian et al., 25 Jan 2025).
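To make the game-theoretic weighting concrete, the following computes exact Shapley values over a small set of domains. The coalition value function and the additive toy game are illustrative assumptions; CGRec's actual estimator and training integration differ:

```python
from itertools import combinations
from math import factorial

def shapley_weights(domains, coalition_value):
    """Exact Shapley values over a small set of domains.
    coalition_value(S) returns the utility (e.g., negative validation loss)
    of training jointly on the subset S; this function and the weighting
    scheme are a generic sketch, not CGRec's exact estimator."""
    n = len(domains)
    phi = {d: 0.0 for d in domains}
    for d in domains:
        others = [x for x in domains if x != d]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Shapley weight for a coalition of size |S|.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[d] += w * (coalition_value(S + (d,)) - coalition_value(S))
    return phi

# Toy value function: "books" and "movies" help, "ads" causes negative transfer.
base = {"books": 0.30, "movies": 0.25, "ads": -0.10}
value = lambda S: sum(base[d] for d in S)
print(shapley_weights(list(base), value))
# Domains with negative marginal contribution would get down-weighted losses.
```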

The above approaches are mathematically formalized with composite loss functions, mutual information terms, margin-based metric learning, and robust probabilistic inference modules.
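Contrastive alignment objectives of this kind are typically instantiated as an InfoNCE loss. The sketch below is a generic in-batch formulation between single-domain and cross-domain user representations, not the exact infomax term of C²DSR:

```python
import torch
import torch.nn.functional as F

def infonce_alignment(z_single: torch.Tensor, z_cross: torch.Tensor,
                      temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style lower bound on mutual information between single- and
    cross-domain user representations. Matching rows are positives; all
    other in-batch pairs serve as negatives (generic sketch)."""
    z_s = F.normalize(z_single, dim=-1)
    z_c = F.normalize(z_cross, dim=-1)
    logits = z_s @ z_c.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_s.size(0))           # positive pairs on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: align 64-d representations for a batch of 32 users.
z_single = torch.randn(32, 64, requires_grad=True)
z_cross = torch.randn(32, 64, requires_grad=True)
loss = infonce_alignment(z_single, z_cross)
loss.backward()  # gradients pull matched pairs together in both spaces
```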

4. Robustness under Open-World and Data-Sparse Scenarios

Several frameworks address real-world “open-world” characteristics, where few users overlap and data distributions shift:

  • Multi-Interest Grouping and Doubly Robust Estimation: AMID (Adaptive Multi-Interest Debiasing) constructs interest groups spanning both overlapping and non-overlapping users to propagate information even when user participation is partial. Its doubly robust estimator (DRE) corrects for selection bias using learned propensities and imputed errors, reducing bias and variance relative to standard IPS (Xu et al., 2023). A generic doubly robust sketch follows this list.
  • Contrastive Denoising with Auxiliary Behaviors: MACD leverages auxiliary (e.g., click) behaviors in addition to sparse target interactions. Dedicated denoising modules, fusion gates, contrastive losses, and an inductive representation generator enable robustness—especially for long-tailed and cold-start users (Xu et al., 2023).
  • Neural Process–Based Meta-Learning: CDSRNP applies meta-learning, treating overlapped users as support and non-overlapped as query. Latent distribution alignment between these sets, guided by neural processes, permits transfer of cross-domain knowledge even to non-overlapped users (Li et al., 17 Oct 2024).
  • Latent Disentanglement with VAE: i²VAE incorporates mutual information–based disentangling and denoising regularizers within a VAE. Transferable cross-domain latent variables enable effective augmentation for long-tailed and cold-start scenarios (Ning et al., 31 May 2024).
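The textbook doubly robust estimator underlying approaches like AMID's DRE can be sketched as follows; the variable names and toy data are illustrative, and AMID adapts the idea to multi-interest debiasing rather than using this exact form:

```python
import numpy as np

def doubly_robust_estimate(observed: np.ndarray, reward: np.ndarray,
                           propensity: np.ndarray, imputed: np.ndarray) -> float:
    """Textbook doubly robust estimator: unbiased if EITHER the propensity
    model or the imputation model is correct."""
    correction = observed * (reward - imputed) / np.clip(propensity, 1e-6, None)
    return float(np.mean(imputed + correction))

# Toy usage: 5 user-item pairs, 3 of which were actually observed.
observed   = np.array([1, 0, 1, 1, 0], dtype=float)
reward     = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # only meaningful where observed
propensity = np.array([0.8, 0.5, 0.4, 0.9, 0.3])   # estimated P(observed)
imputed    = np.array([0.7, 0.2, 0.3, 0.8, 0.1])   # model-imputed rewards
print(doubly_robust_estimate(observed, reward, propensity, imputed))
```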

Empirically, these modules yield robust improvements across user groups and across exposure, click-through, and conversion metrics in both offline evaluations and online A/B experiments.

5. Multimodal and LLM-Enhanced Fusion

A recent theme is the fusion of multimodal features (image, text, tags) and LLM-generated semantic signals to create enhanced item and user representations:

  • CLIP-Driven Image and Text Integration: IFCDSR and HAF-VT exploit frozen CLIP encoders for both image and text features. These modalities, fused with ID embeddings via hierarchical or multiple-attention mechanisms, enable the framework to model subtle visual/textual cues alongside sequential ID-based histories (Wu et al., 31 Dec 2024, Wu et al., 21 Apr 2025). A modality-attention fusion sketch follows this list.
  • LLM-Augmented Representations: LLM4CDSR integrates pre-trained LLM embeddings (prompted with structured templates) both at the item-level (semantic unification across domains) and via hierarchical user preference summarization. A trainable adapter with contrastive regularization brings LLM features into alignment with the collaborative filtering objective (Liu et al., 25 Apr 2025, Wu et al., 22 Jun 2025).
  • Tag-Enriched Multi-Attention: TEMA-LLM uses LLMs for domain-aware tag generation, combining tag embeddings with image, text, and ID representations. A multi-attention mechanism enables each modality, including tags, to contribute distinctly to the prediction, improving traceability and accuracy (Wu et al., 10 Oct 2025).
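A minimal sketch of modality-level attention fusion follows. It assumes image and text vectors are precomputed by a frozen encoder (e.g., CLIP) and fuses them with ID embeddings; the module and its dimensions are illustrative, not the exact IFCDSR/HAF-VT or TEMA-LLM architecture:

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Fuse ID, image, and text item embeddings with learned attention over
    modalities. Image/text vectors are assumed precomputed by a frozen
    encoder such as CLIP (generic sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.ModuleDict({m: nn.Linear(dim, dim) for m in ("id", "img", "txt")})
        self.attn = nn.Linear(dim, 1)

    def forward(self, id_e, img_e, txt_e):
        stack = torch.stack([self.proj["id"](id_e),
                             self.proj["img"](img_e),
                             self.proj["txt"](txt_e)], dim=1)        # (B, 3, D)
        weights = torch.softmax(self.attn(torch.tanh(stack)), dim=1)  # (B, 3, 1)
        return (weights * stack).sum(dim=1)                           # (B, D)

# Usage: fuse 512-d (CLIP-sized) embeddings for a batch of 16 items.
fuse = ModalityAttentionFusion(512)
item_vec = fuse(torch.randn(16, 512), torch.randn(16, 512), torch.randn(16, 512))
```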

Reported results consistently show gains in standard metrics (MRR, NDCG@k, HR@k) when moving from unimodal, sequential-only models to such multimodal, semantically enriched frameworks.

6. Diffusion-Based Generative and Denoising Modeling

Diffusion models have recently been employed for robust representation learning in CDSR:

  • Dual-Oriented Diffusion and Align-for-Fusion: HorizonRec avoids naïve align-then-fuse by constructing a mixed-conditioned distribution retrieval strategy, retrieving behavior-guided noise from global (mixed) domains. Dual-oriented diffusion modules denoise source and target domain representations under this mixed supervision, leading to noise suppression and target-aware preference fusion with robustness guarantees (Zha et al., 7 Aug 2025).
  • Disentangled Preference-Guided Diffusion: DPG-Diff explicitly decomposes user representation into domain-invariant and domain-specific components, both of which guide the reverse diffusion process. Iterative tri-view contrastive learning ensures that denoised representations, fused domain-aware signals, and data augmentations remain consistent, filtering out cross-domain and sequential noise to lower negative transfer (Ye et al., 30 Aug 2025).

Mathematically, these frameworks employ variants of the DDPM equations for forward and reverse processes, modified to include behaviorally guided priors and preference-specific denoising modules.
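For reference, the standard (unmodified) DDPM forward and reverse processes these variants start from are shown below; the condition $c$ stands in for the behaviorally guided prior or preference signal, whose exact form differs across HorizonRec and DPG-Diff:

```latex
% Forward noising process and its closed form:
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big), \qquad
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\big), \quad
\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s).

% Learned, condition-guided reverse step (c = preference/behavior guidance):
p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t, c),\ \Sigma_\theta(x_t, t)\big).
```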

7. Theoretical and Experimental Evaluation; Datasets and Benchmarks

Theoretical analyses accompany several of these designs, for example bias and variance reduction arguments for doubly robust estimation (Xu et al., 2023) and robustness guarantees for diffusion-based fusion (Zha et al., 7 Aug 2025).

Evaluation is primarily based on large-scale real-world datasets including Amazon (Food–Kitchen, Movie–Book), Douban, HVIDEO, and telco services (Chen et al., 10 Jan 2024). Metrics such as NDCG@k, HR@k, and MRR are standard, alongside ablation studies. Recent works report consistent, statistically significant improvements over strong baselines including SASRec, GRU4Rec, Tri-CDR, C²DSR, and others.
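These ranking metrics have standard definitions; a reference implementation for the common protocol of one held-out item per user:

```python
import numpy as np

def hr_at_k(rank: int, k: int) -> float:
    """Hit Ratio@k: 1 if the ground-truth item is ranked within the top-k."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank: int, k: int) -> float:
    """NDCG@k with a single relevant item: 1/log2(rank+1) inside the top-k."""
    return 1.0 / np.log2(rank + 1) if rank <= k else 0.0

def mrr(rank: int) -> float:
    """Reciprocal-rank contribution of one test interaction."""
    return 1.0 / rank

# `rank` is the 1-based position of the held-out item among candidates.
ranks = [1, 3, 12, 2]          # toy ranks for four test users
print(np.mean([hr_at_k(r, 10) for r in ranks]),    # HR@10
      np.mean([ndcg_at_k(r, 10) for r in ranks]),  # NDCG@10
      np.mean([mrr(r) for r in ranks]))            # MRR
```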


In summary, cross-domain sequential recommendation has evolved into a mature research area combining sophisticated sequence modeling, multi-source alignment, robust optimization for open-world settings, and cutting-edge multimodal and generative modeling. The current generation of CDSR systems achieves robust improvement by harmonizing transferable knowledge across domains, carefully disentangling signals, and integrating advanced neural and semantic technologies. Directions for future research include scaling to many domains, federated privacy-preserving integration, deeper semantic context fusion, real-world efficiency, and advances in denoising and representation learning architectures.
