Multi-Domain Recommendation Systems

Updated 2 May 2026

Multi-domain recommendation systems jointly model user preferences across various domains, leveraging overlapping data to improve prediction accuracy.
The M3oE framework disentangles shared, domain-specific, and task-specific information, improving ranking metrics such as AUC and NDCG.
Automated architecture search and two-level fusion help the framework scale and mitigate negative transfer and data heterogeneity.

Multi-domain recommendation refers to the paradigm in recommender systems where user preferences are jointly modeled and predicted across multiple distinct domains, such as product categories, content verticals, or application scenarios. This approach is motivated by the substantial user and item overlaps observed in real-world platforms (e.g., e-commerce, social media), where knowledge transfer between domains can alleviate data sparsity and improve recommendation quality. However, it introduces significant challenges including domain heterogeneity, negative transfer, scalability to large numbers of domains, and the need for architectures that balance shared and domain-specific signal.

1. Problem Formulation and Notational Foundations

A multi-domain recommendation (MDR) system operates on the following entities:

Users: $U = \{u_1, ..., u_{|U|}\}$
Items: $I = \{i_1, ..., i_{|I|}\}$
Domains: $\mathcal{D} = \{1, ..., D\}$, each representing a scenario, catalog, or context
Tasks (optional): $\mathcal{T} = \{1, ..., T\}$, e.g., click, like, or rating prediction

The basic training input is a tuple $(u, i, d[, t])$ with ground truth $y_{u,i,d,[t]}$ (a binary label or a real-valued regression target), and a feature vector $x_{u,i,d} \in \mathbb{R}^{m}$ (with $m$ the feature dimension) encoding user, item, context, and domain. The core objective is to learn a function $f(\cdot)$ such that for any $(u, i, d[, t])$,

$\hat{y}_{d,[t]} = f(x_{u,i,d};\,\Theta)$

with the parameters $\Theta$ chosen to minimize a joint multi-domain, multi-task loss:

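$\mathcal{L}(\Theta) = \sum_{d=1}^{D} \sum_{t=1}^{T} \lambda_{d,t} \sum_{(u,i) \in \mathcal{S}_d} \ell_{d,t}\big(y_{u,i,d,t},\, \hat{y}_{d,t}\big) + \Omega(\Theta)$

Here, $\mathcal{S}_d$ is the set of observed user-item interactions in domain $d$, $\ell_{d,t}$ is the task/domain-specific loss function (e.g., binary cross-entropy for click prediction), $\lambda_{d,t} \ge 0$ are trade-off weights, and $\Omega(\Theta)$ is a regularizer (e.g., an $L_2$ penalty proportional to $\|\Theta\|_2^2$).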
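The joint objective can be made concrete with a small sketch. It assumes binary labels with binary cross-entropy as $\ell_{d,t}$, per-pair weights $\lambda_{d,t}$ stored in a dictionary, and a flat parameter list for the $L_2$ term. The names `multi_domain_loss`, `bce`, and the `Batch` layout are hypothetical and used only for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: for each (domain d, task t), a list of
# (label y, predicted probability y_hat) pairs drawn from S_d.
Batch = Dict[Tuple[int, int], List[Tuple[float, float]]]


def bce(y: float, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one example; p is clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def multi_domain_loss(batch: Batch,
                      weights: Dict[Tuple[int, int], float],
                      params: List[float],
                      l2: float = 1e-4) -> float:
    """Sum over (d, t) of lambda_{d,t} * summed BCE, plus Omega(Theta) = l2 * ||Theta||_2^2."""
    total = 0.0
    for (d, t), pairs in batch.items():
        lam = weights.get((d, t), 1.0)  # unlisted (d, t) pairs default to weight 1
        total += lam * sum(bce(y, p) for y, p in pairs)
    omega = l2 * sum(w * w for w in params)
    return total + omega


if __name__ == "__main__":
    batch = {(0, 0): [(1.0, 0.9), (0.0, 0.2)], (1, 0): [(1.0, 0.6)]}
    print(multi_domain_loss(batch, {(0, 0): 1.0, (1, 0): 0.5}, params=[0.3, -0.4], l2=0.01))
```

Down-weighting domain 1 with $\lambda_{1,0} = 0.5$ halves its contribution. This is one simple way to keep a large domain from dominating a small one.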

Key challenges include domain/data heterogeneity, negative transfer, combinatorial domain relationships, domain-wise imbalance, and model/parameter scalability (Zhang et al., 2024).

2. Model Architectures and Core Methodologies

2.1 Mixture-of-Experts with Multi-level Disentanglement (M3oE)

The M3oE framework (Zhang et al., 2024) generalizes the Mixture-of-Experts (MoE) approach to jointly disentangle:

Common Preferences: Shared experts $\{E^{\mathrm{sh}}_k\}_{k=1}^{K_{\mathrm{sh}}}$ capture information shared across all domains and tasks.
Domain-Aspect Preferences: Experts $\{E^{\mathrm{dom}}_{d,k}\}_{k=1}^{K_{\mathrm{dom}}}$ (one set per domain $d$) extract domain-specific behavior, managed by per-domain gates.
Task-Aspect Preferences: Experts $\{E^{\mathrm{task}}_{t,k}\}_{k=1}^{K_{\mathrm{task}}}$ (one set per task $t$) model task-specific behavior.

Each input $x_{u,i,d}$ is encoded by these three expert modules. Each module is mixed by a learned gating network, namely a softmax over an affine projection of $x_{u,i,d}$. The outputs are fused in two stages:

Domain-level fusion via gate $g^{\mathrm{dom}}_d$: combines the shared representation $h^{\mathrm{sh}}$ and the domain-specific representation $h^{\mathrm{dom}}_d$ into a domain-fused representation $h_d$
Task-level fusion via gate $g^{\mathrm{task}}_t$: integrates the domain-fused $h_d$ and the task-specific $h^{\mathrm{task}}_t$

These two adaptive fusion stages enable the system to modulate shared, domain, and task signals for any domain-task pair $(d, t)$.
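The forward pass can be sketched in plain Python under explicit assumptions. The experts are single linear maps, each gate is a softmax over a linear projection of the input (bias omitted for brevity), and the two fusion stages are convex combinations weighted by learnable per-domain and per-task logits. Each task has a logistic output tower. All class and function names here (`MoEBlock`, `fuse`, `M3oESketch`) are hypothetical. This is a shape-level illustration with random untrained weights, not the published M3oE implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoEBlock:
    """K linear experts mixed by a softmax gate over a linear projection of x."""

    def __init__(self, n_experts: int, d_in: int, d_out: int, rng: random.Random):
        self.experts = [rand_matrix(d_out, d_in, rng) for _ in range(n_experts)]
        self.gate = rand_matrix(n_experts, d_in, rng)

    def __call__(self, x: Vector) -> Vector:
        g = softmax(matvec(self.gate, x))  # one mixing weight per expert
        outs = [matvec(e, x) for e in self.experts]
        return [sum(g[k] * out[j] for k, out in enumerate(outs)) for j in range(len(outs[0]))]


def fuse(a: Vector, b: Vector, logits: Vector) -> Vector:
    """Convex two-way fusion: softmax(logits) weights the two representations."""
    w = softmax(logits)
    return [w[0] * ai + w[1] * bi for ai, bi in zip(a, b)]


class M3oESketch:
    """Shared, per-domain, and per-task MoE blocks with domain- then task-level fusion."""

    def __init__(self, n_domains: int, n_tasks: int, d_in: int, d_hid: int,
                 k: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.shared = MoEBlock(k, d_in, d_hid, rng)
        self.domain = [MoEBlock(k, d_in, d_hid, rng) for _ in range(n_domains)]
        self.task = [MoEBlock(k, d_in, d_hid, rng) for _ in range(n_tasks)]
        # Fusion logits per domain and per task; zeros mean equal weighting.
        self.dom_logits = [[0.0, 0.0] for _ in range(n_domains)]
        self.task_logits = [[0.0, 0.0] for _ in range(n_tasks)]
        self.towers = [rand_matrix(1, d_hid, rng)[0] for _ in range(n_tasks)]

    def predict(self, x: Vector, d: int, t: int) -> float:
        h_sh = self.shared(x)
        h_d = fuse(h_sh, self.domain[d](x), self.dom_logits[d])  # domain-level fusion
        h_dt = fuse(h_d, self.task[t](x), self.task_logits[t])   # task-level fusion
        logit = sum(w * h for w, h in zip(self.towers[t], h_dt))
        return 1.0 / (1.0 + math.exp(-logit))
```

Training would update the expert weights, the gates, and the fusion logits jointly. Pushing `dom_logits[d]` toward the shared slot increases transfer into domain $d$; pushing it toward the domain slot increases specialization.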

2.2 Automated Architecture Search

M3oE incorporates architecture search. Mixture sizes (the numbers of shared, domain, and task experts), hidden dimensions, and the gating networks' architectural hyperparameters form a search space explored by a controller (e.g., an RNN or an evolutionary algorithm). Validation AUC/NDCG serves as the reward that guides iterative sampling and selection of candidates. This self-adaptivity yields architectural capacity matched to the given set of domains and tasks (Zhang et al., 2024).
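The search loop can be illustrated with one of the controller types named above, a minimal evolutionary search. The search space values are made up for illustration. `reward` stands for a hypothetical function that would briefly train a candidate and return its validation AUC or NDCG. Elitism (keeping the top half each generation) ensures the best score found never decreases. The real M3oE search space and controller may differ.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space; values are illustrative only.
SEARCH_SPACE: Dict[str, List[int]] = {
    "n_shared": [1, 2, 4],
    "n_domain": [1, 2, 4],
    "n_task": [1, 2],
    "hidden_dim": [32, 64, 128],
}

Config = Dict[str, int]


def sample(rng: random.Random) -> Config:
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def mutate(cfg: Config, rng: random.Random) -> Config:
    """Resample one randomly chosen hyperparameter."""
    child = dict(cfg)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def evolve(reward: Callable[[Config], float],
           population: int = 8, generations: int = 10, seed: int = 0) -> Config:
    """Keep the top half each generation and refill the rest with mutants of survivors."""
    rng = random.Random(seed)
    pop = [sample(rng) for _ in range(population)]
    cache: Dict[Tuple, float] = {}

    def score(c: Config) -> float:
        key = tuple(sorted(c.items()))
        if key not in cache:
            cache[key] = reward(c)  # e.g. short training run, then validation AUC
        return cache[key]

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: population // 2]
        pop = elite + [mutate(rng.choice(elite), rng) for _ in range(population - len(elite))]
    return max(pop, key=score)
```

Caching rewards matters in practice. Each evaluation corresponds to a warm-start training run, which dominates the cost of the search.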

2.3 Multi-task Objective and Optimization

Multi-domain, multi-task settings pool losses across all domain-task pairs $(d, t)$, optionally with learned weights (e.g., uncertainty-driven weighting). Group-sparsity or diversity regularizers across experts help prevent expert collapse, in which the gates route nearly all inputs to a few experts. The system is trained end-to-end with mini-batch Adam. The architecture search phase additionally requires repeated warm-start training and evaluation cycles to compare candidate architectures (Zhang et al., 2024).
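The two training-time ingredients can be sketched as follows. The first is one common form of uncertainty-driven weighting, which learns a log-variance $s_{d,t}$ per pair and scales its loss by $\tfrac{1}{2}e^{-s_{d,t}}$. The second is one possible diversity penalty, the squared coefficient of variation of gate mass per expert. Both are illustrative choices; this document does not confirm them as the exact terms M3oE uses. Function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (domain d, task t)


def uncertainty_weighted_loss(losses: Dict[Key, float], log_vars: Dict[Key, float]) -> float:
    """Sum over (d, t) of 0.5 * exp(-s) * L + 0.5 * s, where s = log(sigma^2) is learned.

    Noisier pairs (larger s) are down-weighted; the 0.5 * s term keeps s from growing unboundedly.
    """
    return sum(0.5 * math.exp(-log_vars[k]) * loss + 0.5 * log_vars[k]
               for k, loss in losses.items())


def importance_penalty(gates: List[List[float]]) -> float:
    """Squared coefficient of variation of total gate mass per expert over a batch.

    Zero when experts are used equally; grows as routing collapses onto few experts.
    """
    n_experts = len(gates[0])
    importance = [sum(g[j] for g in gates) for j in range(n_experts)]
    mean = sum(importance) / n_experts
    var = sum((v - mean) ** 2 for v in importance) / n_experts
    return var / (mean * mean)
```

For a fixed loss value $L > 0$, the weighted term is minimized at $s^{*} = \log L$. Pairs with persistently high loss therefore receive smaller effective weights.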

3. Addressing Transfer, Negative Transfer, and Feature Heterogeneity

3.1 Negative Transfer Mitigation

Extensive ablation experiments with M3oE (Zhang et al., 2024) demonstrate that removing any expert type (shared, domain, task) or simplifying fusion leads to reduced AUC/NDCG. The two-level fusion in particular is critical for fine-grained control over inductive transfer. The modular separation directly targets negative transfer by isolating factors of variation.

3.2 Generalization to Small Overlap/Long-Tail Domains

Embedding-level (not only model-level) disentanglement, as in EDDA (Ning et al., 2023), decouples the gradients flowing into shared embeddings from those flowing into domain-specialized embeddings. This guards against negative transfer, particularly in domains with limited user/item overlap or idiosyncratic behavior. EDDA further performs domain alignment using random-walk-based similarity and by aligning per-domain linear projections. This increases the capacity for knowledge transfer, especially in sparse or cold-start regimes.
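Random-walk-based similarity can be illustrated on a user-item bipartite graph that spans several domains. The sketch computes exact two-step walk probabilities $P(i \to u \to j)$ and ranks the items of another domain reachable from a source item. Such cross-domain pairs are candidates whose representations an alignment objective could pull together. This is a hypothetical illustration of the general idea, not EDDA's actual procedure; the function names and toy data are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Interaction = Tuple[str, str]  # (user, item)


def two_step_walk_probs(interactions: List[Interaction]) -> Dict[str, Dict[str, float]]:
    """P(item i -> user u -> item j) for a uniform random walk on the user-item graph."""
    users_of: Dict[str, Set[str]] = defaultdict(set)
    items_of: Dict[str, Set[str]] = defaultdict(set)
    for u, i in interactions:
        users_of[i].add(u)
        items_of[u].add(i)
    probs: Dict[str, Dict[str, float]] = {}
    for i, users in users_of.items():
        row: Dict[str, float] = defaultdict(float)
        for u in users:
            for j in items_of[u]:
                row[j] += (1.0 / len(users)) * (1.0 / len(items_of[u]))
        probs[i] = dict(row)
    return probs


def cross_domain_neighbors(probs: Dict[str, Dict[str, float]], domain_of: Dict[str, str],
                           src: str, target_domain: str, k: int = 3) -> List[Tuple[str, float]]:
    """Top-k items of target_domain reachable from src, ranked by walk probability."""
    cands = [(j, p) for j, p in probs.get(src, {}).items() if domain_of[j] == target_domain]
    return sorted(cands, key=lambda jp: -jp[1])[:k]
```

Users active in both domains act as bridges. An item in a sparse domain inherits related items from a dense domain only through such overlapping users.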

4. Evaluation Protocols and Empirical Results

4.1 Datasets and Metrics

Benchmarks span from large industrial datasets to established academic splits:

Multiple domains (3–50+), each with distinct user-item interactions and severe sparsity
Tasks: CTR, like rate, and rating prediction, evaluated across all available domains and, where applicable, tasks

Metrics include:

AUC, NDCG@K, Recall@K: for overall and per-domain evaluation
Ablations: performance changes when individual architectural components are removed
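The three metrics can be written out directly. The sketch uses the pairwise definition of AUC (ties count one half), binary-relevance NDCG@K with $\log_2$ discounting, and Recall@K as the fraction of relevant items retrieved in the top K. The pairwise AUC is quadratic in the number of examples and is meant for clarity rather than scale. Function names are illustrative.

```python
import math
from typing import List, Sequence, Set


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@K: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(r + 2) for r, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-K of the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)
```

For per-domain evaluation, apply the same functions to each domain's slice of the test set. The overall figure is then either a pooled computation or an average of the per-domain values; the two can differ substantially when domains are imbalanced.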

4.2 Empirical Comparison Table

Model	AUC (CTR)	NDCG@10 (like)	Relative gain over STAR (AUC / NDCG@10)
MLP (separate)	0.720	0.310	–
MMoE	0.730	0.320	–
STAR	0.735	0.325	–
M3oE	0.756	0.342	+2.9% / +5.2%
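Gains over a baseline can be stated as absolute differences or as relative improvements, and for AUC the two differ noticeably. A short helper computes both from the table's AUC and NDCG@10 values for M3oE and the strongest baseline, STAR. The helper name `gains` is illustrative.

```python
from typing import Tuple


def gains(new: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute difference, relative improvement in percent)."""
    return new - baseline, 100.0 * (new - baseline) / baseline


# Table values: M3oE vs. STAR.
for metric, new, base in [("AUC", 0.756, 0.735), ("NDCG@10", 0.342, 0.325)]:
    abs_gain, rel_gain = gains(new, base)
    print(f"{metric}: +{abs_gain:.3f} absolute, +{rel_gain:.1f}% relative")
```

This prints `AUC: +0.021 absolute, +2.9% relative` and `NDCG@10: +0.017 absolute, +5.2% relative`. When reporting, state which convention is used.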

Ablation Impact (AUC drops in M3oE, all relative to full model):

Remove shared experts: –1.1%
Remove domain experts: –0.8%
Remove task experts: –0.6%
Single instead of two-level fusion: –0.7%
Disable AutoML: –0.4%

Each expert module and both fusion levels contribute independently to performance. AutoML-based adaptation further improves capacity allocation, particularly in regimes with heterogeneous or imbalanced data (Zhang et al., 2024).

5. Connections to Broader Multi-domain and Multi-task Learning

M3oE’s contributions are situated within a rapidly evolving ecosystem of multi-domain models:

Multi-task MoE extensions are central to contemporary MDR. Prior work (MMoE, PLE) separates domain/task knowledge at the model or expert level but does not provide the explicit three-way (shared, domain, task) disentanglement and two-stage fusion seen in M3oE.
Embedding-level disentanglement and structural alignment as in EDDA (Ning et al., 2023) complements M3oE by further decoupling shared/domain gradients.
AutoML augmentation for architectural search is increasingly used for scalable adaptation and capacity control in both industrial and academic settings.
The M3oE abstraction is generic: it could in principle be combined with graph-based or memory-augmented methods for cross-domain representation and transfer.

6. Practical Implications and Future Directions

M3oE’s modularization and adaptivity offer not just state-of-the-art empirical results but a roadmap for scalable deployment:

Explicit modeling of the common, domain, and task axes enables fine-grained control—crucial for large heterogeneous platforms and for scenarios where domains or tasks continually evolve.
Two-level fusion mechanisms allow precise balancing of transfer against specialization, directly addressing the negative transfer pathologies endemic to MDR.
Coupling with automated architecture search enables rapid, data-driven adjustment as platforms scale or as domain/task distributions shift.

Future directions highlighted include online AutoML for continual domain/task expansion, graph-based gating mechanisms to more delicately model inter-domain similarities, and expert pruning/selection for resource-constrained real-time scenarios (Zhang et al., 2024).

7. Conclusion

Multi-domain recommendation synthesizes contemporary advances in representation learning, mixture-of-experts architectures, disentangled feature modeling, and adaptive system design. The M3oE framework represents the current frontier by simultaneously addressing negative transfer, fine-grained specialization, and network scalability via multi-level expert disentanglement and AutoML-driven architecture search. As the domain complexity of digital platforms grows, such modular and adaptive MDR systems are likely to be central to both offline evaluation and robust, efficient production deployment (Zhang et al., 2024).


References

Zhang et al. (2024). M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework.

Ning et al. (2023). Multi-domain Recommendation with Embedding Disentangling and Domain Alignment.
