Cross-Domain Autoregressive Recommendation
- Cross-domain autoregressive recommendation is a sequential framework that predicts next-item interactions across heterogeneous domains using specialized attention and tokenization techniques.
- It overcomes data sparsity and computational challenges through intra-domain masking, dynamic cross-domain state propagation, and generative semantic tokenization.
- Empirical studies reveal notable gains in hit rate and NDCG metrics, affirming its effectiveness in real-world multi-domain recommender systems.
Cross-domain autoregressive recommendation (CDAR) denotes a class of recommendation systems designed to model user behaviors across multiple item domains—such as books, movies, and apparel—within a unified sequential framework. Unlike traditional single-domain sequential recommendation, which predicts the next item within a homogeneous catalog, CDAR must model personalized user histories that span heterogeneous domains and must faithfully capture both intra-domain (within-domain) and inter-domain (cross-domain) dependencies. The field addresses not only data sparsity and domain transfer challenges, but also computational constraints arising from scaling such models across large user bases and item catalogs.
1. Problem Formulation and Foundational Challenges
The formal CDAR problem generalizes single-domain autoregressive recommendation by seeking to model, at each time step $t$, the conditional distribution
$$P\big(i_{t+1} \,\big|\, (i_1, d_1), (i_2, d_2), \ldots, (i_t, d_t), d_{t+1}\big),$$
where $i_k$ denotes the $k$-th item in a user's history and $d_k$ is its associated domain label, with the recommendation output at step $t+1$ restricted to items in domain $d_{t+1}$ (Loureiro et al., 28 Aug 2025). This setting presents several technical obstacles:
- Quadratic Complexity: Vanilla transformer architectures compute $O(n^2)$ attention maps over $n$-length histories; as both $n$ and the number of domains grow, this becomes prohibitively expensive in memory and compute (Loureiro et al., 28 Aug 2025).
- Overlap Dilemma: Methods relying on users with cross-domain interactions ("overlapped" users) for collaborative learning struggle when only a minority of users span multiple domains, leading to poor generalization under sparse overlap (Liu et al., 25 Apr 2025).
- Transition Complexity: Mixed-domain behavioral sequences introduce complex, personalized transition dynamics (e.g., from movies to books) that are not well captured by simple temporal self-attention or domain-agnostic modeling alone (Liu et al., 25 Apr 2025).
2. Model Architectures and Domain Interaction Mechanisms
Architectural approaches to CDAR aim to balance the learning of domain-specific sequential patterns with robust transfer of cross-domain preferences while enforcing computational efficiency.
2.1 Intra- and Inter-Domain Attention Decomposition
Efficient transformer-based models, such as the TAPE+DDSR architecture, replace full-sequence attention with a hybrid scheme (Loureiro et al., 28 Aug 2025):
- Intra-domain masked self-attention: Tokens attend only to other items within the same domain segment; mathematically, if $n_d$ is the count of items in domain $d$, the attention complexity becomes $O\big(\sum_d n_d^2\big)$ rather than $O(n^2)$.
- Transition-Aware Positional Embeddings (TAPE): Augment token embeddings with transition vectors $\tau_t$, which are nonzero only when a domain transition is imminent ($d_{t+1} \neq d_t$).
This design enables the encoder to modulate its intra-domain context according to upcoming cross-domain transitions.
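To make the intra-domain masking concrete, the following is a minimal PyTorch sketch of how a causal, same-domain attention mask could be built from per-item domain labels; the function and variable names are illustrative assumptions, not the TAPE+DDSR implementation.

```python
import torch

def intra_domain_causal_mask(domains: torch.Tensor) -> torch.Tensor:
    """Boolean mask where position i may attend to position j only if
    j <= i (causality) and items i and j carry the same domain label.

    domains: (seq_len,) integer domain id per interaction.
    Returns a (seq_len, seq_len) mask, True = attention allowed.
    """
    same_domain = domains.unsqueeze(0) == domains.unsqueeze(1)  # (L, L) pairwise domain match
    causal = torch.tril(torch.ones(len(domains), len(domains), dtype=torch.bool))
    return same_domain & causal

# Example: a history alternating between domain 0 (e.g., movies) and domain 1 (e.g., books).
mask = intra_domain_causal_mask(torch.tensor([0, 0, 1, 0, 1, 1]))
# The mask can then be supplied to a standard attention implementation as its attention mask.
```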
2.2 Lightweight Cross-Domain State Propagation
- Dynamic Domain State Representation (DDSR): Maintains, at each network layer, a small set of hidden state vectors—one per domain—tracking the most recent intra-domain representation. Cross-domain dependencies are modeled using a $D$-to-$n$ attention (where $D \ll n$), infused into the backbone as an additive context (Loureiro et al., 28 Aug 2025). This enables exchange of global information without full quadratic computation.
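A hedged sketch of the dynamic-state idea follows: one learnable state per domain queries the full hidden sequence ($D$-to-$n$ attention), and the resulting domain contexts are added back to the tokens. The class name, initialization, and injection point are assumptions for illustration, not the published DDSR code.

```python
import torch
import torch.nn as nn

class DynamicDomainStates(nn.Module):
    """Lightweight cross-domain state module: D per-domain states attend to
    the n-length hidden sequence (cost O(D*n), D << n), and each token then
    receives its own domain's state as additive context."""

    def __init__(self, num_domains: int, dim: int):
        super().__init__()
        self.states = nn.Parameter(torch.zeros(num_domains, dim))   # one state vector per domain
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, hidden: torch.Tensor, domains: torch.Tensor) -> torch.Tensor:
        # hidden: (1, n, dim) token representations; domains: (n,) domain id per token
        query = self.states.unsqueeze(0)                  # (1, D, dim)
        updated, _ = self.attn(query, hidden, hidden)     # D-to-n attention
        context = updated[0, domains]                     # (n, dim): route each token its domain state
        return hidden + context.unsqueeze(0)              # additive cross-domain context
```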
2.3 Multi-Stream and Group-Prototype Fusion
Alternate models, such as MAN, operate separate local (domain-specific) and global (domain-shared) encoding streams (Lin et al., 2023):
- Mixed Attention Layer: Implements item similarity attention (ISA), sequence-fusion attention (SFA), and group-prototype attention (GPA); the latter introduces learnable prototype vectors to facilitate knowledge transfer even in the absence of overlapped users (a sketch follows this list).
- Tri-Thread LLM Fusion: LLM4CDSR organizes user modeling into two local (per-domain) threads and one shared, global LLM-derived semantic stream. Each thread generates a user embedding, with predictions determined by the fusion of local and global outputs (Liu et al., 25 Apr 2025).
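The sketch below illustrates the group-prototype attention idea from the first bullet above: a small learnable bank of prototypes acts as shared "user groups", and each user's pooled sequence representation attends over it. Shapes, names, and the fusion rule are assumptions, not MAN's released code.

```python
import torch
import torch.nn as nn

class GroupPrototypeAttention(nn.Module):
    """GPA-style layer: attend over K learnable prototypes so that group-level
    patterns can transfer across domains even without overlapped users."""

    def __init__(self, num_prototypes: int, dim: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim) * 0.02)

    def forward(self, seq_repr: torch.Tensor) -> torch.Tensor:
        # seq_repr: (batch, dim) pooled sequence representation per user
        scores = seq_repr @ self.prototypes.t()                       # (batch, K) similarity to prototypes
        weights = torch.softmax(scores / seq_repr.size(-1) ** 0.5, dim=-1)
        group_context = weights @ self.prototypes                     # (batch, dim) group-level context
        return seq_repr + group_context                               # fuse personal and group signals
```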
2.4 Generative Tokenization and Universal Routing
GenCDR incorporates domain-adaptive tokenization by converting item features into semantic IDs (SIDs) using a universal encoder and domain-specific adapters, followed by routing via a dynamic gating network (Hu et al., 11 Nov 2025). The cross-domain generative model fuses universal and domain-specific interest distributions, adaptively selected per-user and sequence segment.
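The following simplified sketch, with assumed module names, shows how a universal encoder, per-domain adapters, and a gating network could be composed before quantization into semantic IDs; it is a stand-in for the idea, not GenCDR's actual tokenizer.

```python
import torch
import torch.nn as nn

class DomainAdaptiveTokenizer(nn.Module):
    """Routes item content features through a universal encoder and a
    domain-specific adapter, mixed by a learned gate; the output would then be
    quantized (e.g., by an RQ-VAE codebook) into a semantic ID."""

    def __init__(self, feat_dim: int, dim: int, num_domains: int):
        super().__init__()
        self.universal = nn.Linear(feat_dim, dim)
        self.adapters = nn.ModuleList([nn.Linear(feat_dim, dim) for _ in range(num_domains)])
        self.gate = nn.Linear(feat_dim, 1)

    def forward(self, item_feats: torch.Tensor, domain: int) -> torch.Tensor:
        # item_feats: (batch, feat_dim) item content features (e.g., text embeddings)
        g = torch.sigmoid(self.gate(item_feats))                      # (batch, 1) per-item routing weight
        return g * self.universal(item_feats) + (1 - g) * self.adapters[domain](item_feats)
```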
3. Training Objectives and Inference Protocols
CDAR models typically optimize autoregressive next-item prediction objectives, but differ in architectural constraints and auxiliary regularizers.
- Sampled Softmax or BPR Loss: For next-item ranking, only item embeddings from the target domain are considered at each step, with losses such as negative log-likelihood over the sampled candidate set (Loureiro et al., 28 Aug 2025), or BPR margin-based losses (Liu et al., 25 Apr 2025).
- Contrastive and Alignment Losses: LLM-based approaches apply contrastive regularization to enforce that cross-domain semantic representations of truly related items/users are close, while negatives are pushed apart, enhancing discrimination even under low overlap (Liu et al., 25 Apr 2025).
- Prototype Disentanglement: GPA modules include explicit regularization to encourage group prototypes to diverge, thus maximizing the diversity of transferred patterns (Lin et al., 2023).
- Generative Prefix-Tree Constraints: To guarantee syntactic and semantic validity in sequence generation, GenCDR constructs a domain-specific prefix trie for efficient and valid constrained beam search, minimizing invalid token generations (Hu et al., 11 Nov 2025).
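To illustrate the prefix-trie constraint from the last bullet, the sketch below (assumed helper names, toy semantic IDs) builds a per-domain trie over valid item code sequences and returns the tokens allowed to extend a partially generated ID; a real decoder would intersect this set with its beam-search candidates.

```python
def build_trie(valid_sequences):
    """Nested-dict trie over the semantic-ID token sequences of one domain's items."""
    trie = {}
    for seq in valid_sequences:
        node = trie
        for tok in seq:
            node = node.setdefault(tok, {})
    return trie

def allowed_next_tokens(trie, prefix):
    """Tokens that keep the generated prefix consistent with some real item."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []          # prefix already invalid, nothing can extend it
    return list(node.keys())

# Example: items encoded as 3-token semantic IDs in the target domain.
trie = build_trie([(5, 2, 9), (5, 2, 7), (3, 1, 4)])
print(allowed_next_tokens(trie, (5, 2)))   # -> [9, 7], the only valid continuations
```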
4. Computational Efficiency and Scalability
Scalability is a central concern in CDAR as application domains and item/user cardinality expand:
| Model Class | Key Efficiency Feature | Asymptotic Complexity |
|---|---|---|
| Full Transformer | Global all-to-all attention | $O(n^2)$ |
| Intra-domain Masked Attention | Disjoint per-domain attention | $O\big(\sum_d n_d^2\big)$ |
| DDSR/Lightweight Domain States | Cross-domain $D$-to-$n$ attention | $O(D \cdot n)$, with $D \ll n$ |
| MAN (ISA, SFA, GPA) | MLP-based attention, group-level pooling | Linear in sequence/domain size |
| LLM4CDSR/GenCDR | Shallow/frozen LLM + adapter, prefix-trie | Bounded, data-prep-time LLM calls |
Empirical findings (Loureiro et al., 28 Aug 2025, Hu et al., 11 Nov 2025):
- Intra-domain masking and DDSR yield several-fold improvements in memory and computational time.
- GenCDR's LoRA-based fine-tuning reduces trainable parameter count by up to 97%, achieves 3x training speedup, and maintains constant inference time as the item vocabulary grows, aided by efficient prefix-tree search.
5. Empirical Evaluation and Comparative Performance
Recent comparative studies examine multiple large-scale datasets spanning 2–5 domains (e.g., Amazon reviews, Douban). Reported metrics include Hit Rate@K, NDCG@K, and AUC:
- TAPE+DDSR (masking+TAPE+DDSR): On Amazon 5-domain data, achieves Hit Rate@100 of 10.25% vs. 9.09% for masking-only models—a ∼13% relative improvement—and NDCG@100 of 3.26 vs. 3.15 (Loureiro et al., 28 Aug 2025). DDSR removal yields significant performance drops in recall.
- LLM4CDSR: Delivers 4%–32% improvements in Hit@10 and 3%–35% in NDCG@10 across all six task domains, robust under 25% overlap (where conventional CDSR fails) (Liu et al., 25 Apr 2025).
- GenCDR: Outperforms best baselines (including LLM4CDSR) on Recall@K and NDCG@K in all six evaluated domains; for example, on Phones NDCG@10 rises from 0.0506 to 0.0512, a statistically significant 1.2% gain (Hu et al., 11 Nov 2025).
- MAN: On Amazon (sparse overlap), AUC improves by 8%–10% over strong single-domain or pairwise CDSR baselines. Removing group-prototype attention is the most deleterious ablation, substantiating the importance of unsupervised group transfer (Lin et al., 2023).
6. Overcoming Data Sparsity and Tokenization Barriers
A persistent challenge in CDAR is the "item ID tokenization dilemma"—the combinatorial explosion in raw item IDs across domains and the inability to generalize when IDs are disjoint (Hu et al., 11 Nov 2025). Approaches to address this barrier include:
- Semantic Tokenization: GenCDR maps items to semantic codes via an RQ-VAE with domain-adaptive routing, obviating the need for shared IDs while preserving high-order collaborative knowledge (see the residual-quantization sketch after this list).
- LLM-Derived Embeddings: LLM4CDSR utilizes frozen LLM embeddings to bridge items with no co-clicked user, thus circumventing the overlap dilemma and enabling robust zero-shot transfer.
- Prototype and SFA-based Grouping: MAN's GPA and SFA layers allow transfer of group-level sequential knowledge even in the total absence of user overlap.
- A plausible implication is that these tokenization and semantic bridging mechanisms generalize more gracefully to new domains or item categories compared to ID-centric models.
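As a concrete illustration of the semantic-tokenization bullet above, here is a minimal residual-quantization sketch under assumed codebook shapes. It shows how an item embedding can be mapped to a short semantic ID, and deliberately omits the encoder/decoder training and domain-adaptive routing of an actual RQ-VAE.

```python
import torch

def residual_quantize(z, codebooks):
    """At each level, pick the nearest codebook entry for the current residual;
    the chosen indices form the item's semantic ID.

    z: (dim,) item representation; codebooks: list of (K, dim) tensors.
    """
    semantic_id, residual = [], z
    for codebook in codebooks:
        dists = torch.cdist(residual.unsqueeze(0), codebook)   # (1, K) distances to codes
        idx = int(dists.argmin())
        semantic_id.append(idx)
        residual = residual - codebook[idx]                     # quantize the remainder at the next level
    return semantic_id

# Example: a 3-level semantic ID over small random codebooks.
torch.manual_seed(0)
codebooks = [torch.randn(8, 16) for _ in range(3)]
print(residual_quantize(torch.randn(16), codebooks))            # a list of 3 code indices
```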
7. Implications, Limitations, and Future Directions
Current CDAR models demonstrate state-of-the-art accuracy, efficiency, and robustness to data sparsity. However, several limitations remain:
- Trade-offs exist between expressive cross-domain fusion and computational tractability; for example, masking all cross-domain attention reduces recall unless compensated by explicit state transfers (DDSR, SFA, prototypes).
- LLM-based approaches, while powerful in transfer, depend on quality and granularity of item textual/attribute data and may inherit biases from the underlying LLMs.
- Locally optimal tokenization (RQ-VAE/adaptive routing) may not always capture the joint collaborative structure desired for global recommendation.
Looking forward, promising research avenues include integrating more granular domain signals (e.g., subdomain hierarchies, user intent modeling), adopting multi-agent or federated learning protocols for privacy, and further advancing efficient generative inference (e.g., beam-constrained search, prefix trees). A rigorous understanding of when and how unsupervised group prototypes enable effective transfer without user overlap remains an open problem, as does systematic benchmarking across increasing numbers of domains and greater heterogeneity.
Cross-domain autoregressive recommendation has thus evolved from conventional sequence models to highly modular, scalable, and semantically-grounded architectures, offering robust solutions for real-world recommender systems across the breadth of heterogeneous catalogs and interaction modalities.