Dynamic Domain State Representation
- Dynamic Domain State Representation (DDSR) is an innovative architecture that captures and transfers relevant intra- and inter-domain signals by maintaining dynamic per-domain states.
- It employs an auxiliary self-attention mechanism combined with Transition-Aware Positional Embeddings (TAPE) to efficiently fuse cross-domain information into domain-specific representations.
- DDSR reduces attention complexity from quadratic in the full sequence length to the sum of squared per-domain subsequence lengths, leading to scalable and robust performance in large-scale recommendation tasks.
Dynamic Domain State Representation (DDSR) refers to a class of architectural and algorithmic mechanisms developed to efficiently encode, aggregate, and transfer stateful information in multi-domain, sequential learning systems. It is particularly relevant in large-scale recommender settings where both intra-domain dependencies (patterns within a domain) and inter-domain dependencies (long-range or cross-domain influences) must be captured while addressing the computational bottlenecks of traditional full-attention architectures. DDSR, as introduced in the context of transformer-based recommendation, fundamentally restructures how cross-domain context is maintained and delivered for prediction, employing dynamic per-domain state tracking in tandem with positional signals that mediate transition awareness.
1. Motivation and Problem Setting
Autoregressive recommendation models (ARMs), particularly transformer-based architectures such as HSTU, have demonstrated strong empirical scaling behavior in sequential retrieval tasks. However, when extended to multi-domain scenarios, the standard global self-attention—where every token in a user’s interaction history attends to every other token—results in prohibitive computational costs and increased noise, as the majority of tokens from unrelated domains contribute little to the modeling of current domain-specific predictions.
Traditional approaches that attempt to model cross-domain interaction by global token attention incur a quadratic O(S²) cost in sequence length S and often generate suboptimal representations due to information overload from irrelevant domains. DDSR addresses these limitations by providing an efficient mechanism for capturing and transferring only the most relevant inter-domain information, ensuring that the attention cost scales with the sum of squared intra-domain subsequence lengths rather than with the square of the entire sequence length.
2. Dynamic Domain State Construction and Mathematical Formulation
DDSR maintains a dynamic per-domain state matrix at each transformer layer. For a token sequence x₁, x₂, ..., xₙ with domain assignments d₁, d₂, ..., dₙ and a domain set 𝒟, the dynamic domain state matrix H_D^L ∈ ℝ^{|𝒟| × n × k} at layer L is constructed such that each row corresponds to a domain d and each position i registers the latest hidden state for that domain:

H_D^L[d, i] = h^{L-1}_{φ_d(i)},

where h^{L-1}_j is the hidden state of token j produced by the previous layer. Here, φ_d(i) is defined as the index of the most recent token of domain d up to position i:

φ_d(i) = max{ j ≤ i : T_{u,j} = d },

with T_{u,j} denoting the domain label of token j (i.e., T_{u,j} = d_j for user u). If no such j exists, the entry H_D^L[d, i] is initialized to 0.
This state construction ensures that, at every prediction step, current computation has explicit access to the most recent state for every domain as induced by the user’s navigation history, even if the current prediction is not for that domain.
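As a concrete illustration of this bookkeeping, the following NumPy sketch builds the per-domain state matrix from a sequence of hidden states and domain labels. The function name, the zero initialization, and the sequential update loop are illustrative choices rather than details taken from the original formulation.

```python
import numpy as np

def build_domain_state_matrix(hidden, domains, num_domains):
    """Build H_D[d, i] = hidden state of the most recent token j <= i
    whose domain label is d; entries with no such token stay zero.

    hidden:  (n, k) array of per-token hidden states from one layer
    domains: (n,) integer array of domain labels in [0, num_domains)
    returns: (num_domains, n, k) dynamic domain state matrix
    """
    n, k = hidden.shape
    H_D = np.zeros((num_domains, n, k), dtype=hidden.dtype)
    latest = np.zeros((num_domains, k), dtype=hidden.dtype)  # tracks phi_d(i)
    seen = np.zeros(num_domains, dtype=bool)
    for i in range(n):
        d = domains[i]
        latest[d] = hidden[i]           # register the newest state for domain d
        seen[d] = True
        H_D[seen, i, :] = latest[seen]  # carry forward all domains seen so far
    return H_D

# Example: 6 tokens, 3 domains, hidden size 4
rng = np.random.default_rng(0)
H = build_domain_state_matrix(rng.normal(size=(6, 4)), np.array([0, 1, 0, 2, 1, 0]), 3)
print(H.shape)  # (3, 6, 4)
```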
3. Cross-Domain Knowledge Fusion Mechanism
After constructing the dynamic state matrix, DDSR incorporates inter-domain knowledge into the intra-domain token representation using an auxiliary self-attention mechanism. The fusion process is described as follows:
- Keys and values are computed from the previous layer's hidden states: K_j^L = W_K^L h_j^{L-1} and V_j^L = W_V^L h_j^{L-1}, with W_K^L, W_V^L ∈ ℝ^{k×k}.
- Queries are computed from the per-domain states: Q_{d,i}^L = W_Q^L H_D^L[d, i], with W_Q^L ∈ ℝ^{k×k}.
- Scaled dot-product attention is applied with the softmax normalization taken over the domain axis: α_{d,i} = softmax_{d∈𝒟}(⟨Q_{d,i}^L, K_i^L⟩ / √k).
- The resulting cross-domain context c_i = Σ_{d∈𝒟} α_{d,i} V^L_{φ_d(i)} (a weighted combination of the value projections of each domain's most recent hidden state) is then fused with the intra-domain hidden representation, e.g. via a residual addition h̃_i^L = h_i^L + c_i.
This inter-domain transfer adds only modest computation (O(|𝒟|) per position, i.e., O(|𝒟| · n) over the full sequence) and leverages only the most recent, salient cross-domain signals rather than the entire token history.
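The sketch below implements one plausible reading of this fusion step, assuming queries projected from the per-domain states, keys and values projected from the previous layer's hidden states, softmax normalization over the domain axis, and a residual (additive) combination with the intra-domain representation. The projection names, the value indexing via the domain states, and the additive fusion are assumptions for illustration, not confirmed details of the original mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ddsr_fuse(h_prev, h_intra, H_D, W_Q, W_K, W_V):
    """Sketch of cross-domain knowledge fusion (single attention head).

    h_prev:  (n, k) previous layer's hidden states (source of keys)
    h_intra: (n, k) intra-domain hidden representations to be enriched
    H_D:     (num_domains, n, k) dynamic per-domain state matrix
    W_Q, W_K, W_V: (k, k) projection matrices
    returns: (n, k) fused representations
    """
    D, n, k = H_D.shape
    Q = H_D @ W_Q                    # (D, n, k): queries from per-domain states
    K = h_prev @ W_K                 # (n, k): keys from previous-layer states
    V = H_D @ W_V                    # (D, n, k): values taken from each domain's
                                     # latest previous-layer hidden state (assumed)
    scores = np.einsum('dnk,nk->dn', Q, K) / np.sqrt(k)
    alpha = softmax(scores, axis=0)  # normalize over the domain axis
    context = np.einsum('dn,dnk->nk', alpha, V)  # cross-domain context c_i
    return h_intra + context         # assumed residual (additive) fusion

# Example usage with random tensors
rng = np.random.default_rng(1)
n, k, D = 6, 4, 3
fused = ddsr_fuse(rng.normal(size=(n, k)), rng.normal(size=(n, k)),
                  rng.normal(size=(D, n, k)),
                  rng.normal(size=(k, k)), rng.normal(size=(k, k)),
                  rng.normal(size=(k, k)))
print(fused.shape)  # (6, 4)
```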
4. Interaction with Transition-Aware Positional Embeddings (TAPE)
DDSR is integrated with Transition-Aware Positional Embeddings (TAPE), which encode explicit signals for when domain transitions occur in the interaction history:
- For token x_i, the input embedding augments the item and standard positional embeddings with a transition-aware term: e_i = Emb(x_i) + p_i + τ_i, where τ_i encodes domain-transition information: it is active when the domain label changes between consecutive tokens (d_i ≠ d_{i-1}) and is formed from learnable parameters modulating the positional signal through an elementwise (Hadamard) product ⊙.
TAPE focuses attention on relevant intra-domain segments, while DDSR handles the aggregation and propagation of domain-wise context. The combination effectively partitions knowledge management: TAPE mediates where to attend in sequence, DDSR provides what domain context to incorporate.
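Because the exact TAPE formula is not reproduced above, the sketch below shows one assumed form: a binary transition flag (active when consecutive domain labels differ) gates an elementwise modulation of the positional embedding by a learnable parameter vector, and the result is added to the input embedding. The parameter name w_trans and the gating form are illustrative assumptions.

```python
import numpy as np

def transition_aware_embeddings(item_emb, pos_emb, domains, w_trans):
    """Sketch of a Transition-Aware Positional Embedding (TAPE) term.

    item_emb: (n, k) token/item embeddings
    pos_emb:  (n, k) standard positional embeddings
    domains:  (n,) integer array of domain labels d_1..d_n
    w_trans:  (k,) learnable transition parameter vector (assumed form)
    returns:  (n, k) input embeddings with a transition-aware term added
    """
    n, k = item_emb.shape
    # 1 at positions where the domain changes relative to the previous token
    transition = np.zeros(n)
    transition[1:] = (domains[1:] != domains[:-1]).astype(float)
    # Assumed form: the transition flag gates an elementwise modulation of the
    # positional embedding by the learnable parameters.
    tape_term = transition[:, None] * (w_trans[None, :] * pos_emb)
    return item_emb + pos_emb + tape_term
```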
5. Advantages in Scalability and Retrieval Performance
The DDSR-TAPE dual mechanism yields several advantages:
- Computational Efficiency: By restricting masked self-attention to each domain's subsequence, computational complexity is reduced from O(S²) to approximately O(∑_{d∈𝒟} s_d²), where s_d is the length of the domain-d subsequence and typically ∑_d s_d² ≪ S² (a short numeric illustration follows this list).
- Scalability: The approach is suitable for large |𝒟| and long sequences n, as the per-domain states are lightweight and the cross-domain fusion step does not scale quadratically.
- Effective Cross-Domain Transfer: Recent domain-specific states are used for context transfer, avoiding the noisy and computationally intensive effects of attending to all domains globally.
- Joint Modulation: In conjunction with TAPE, DDSR enables models to track and respond to domain changes efficiently, ensuring that cross-domain influences are always encoded using the latest available information.
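As a purely hypothetical numeric illustration of the efficiency bullet above (the sequence length and domain split are invented for this example):

```python
# Hypothetical numbers: a sequence of S = 2000 interactions split evenly
# across 4 domains of 500 interactions each.
S = 2000
subseq_lengths = [500, 500, 500, 500]

full_attention_cost = S ** 2                     # O(S^2) global attention scores
ddsr_cost = sum(s ** 2 for s in subseq_lengths)  # O(sum_d s_d^2) per-domain attention

print(full_attention_cost)              # 4000000
print(ddsr_cost)                        # 1000000
print(full_attention_cost / ddsr_cost)  # 4.0x fewer attention score computations
```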
6. Empirical Results and Impact
The incorporation of DDSR (with TAPE) achieves significant improvements on retrieval tasks in large-scale, cross-domain recommendation settings. By separately modeling and then combining inter- and intra-domain representations, the system demonstrates enhanced efficiency, reduced computational costs, and robust performance in retrieval. The architecture allows for easy adaptation to new domains, as domain-specific context aggregation is performed dynamically at inference time.
7. Broader Context and Future Directions
DDSR provides an architectural blueprint for handling multiple interacting dynamic systems within a unified sequence model. While described in the context of multi-domain recommendation, the separable state aggregation and fusion scheme generalizes to other multi-task or multi-domain sequential modeling scenarios where full attention is infeasible.
Future explorations likely include extending DDSR for non-autoregressive transformers, joint learning of domain taxonomies for hierarchical DDSR state organization, and integration into architectures for multi-modal or non-recommendation sequential decision processes.
In summary, Dynamic Domain State Representation enables scalable, efficient knowledge transfer in multi-domain sequential models by maintaining and dynamically fusing per-domain hidden states, thereby overcoming the attention bottlenecks of standard transformer architectures and enabling practical deployment in real-world, large-scale systems (Loureiro et al., 28 Aug 2025).