Long-Sequence Recommendation Models Need Decoupled Embeddings (2410.02604v3)

Published 3 Oct 2024 in cs.IR and cs.LG

Abstract: Lifelong user behavior sequences are crucial for capturing user interests and predicting user responses in modern recommendation systems. A two-stage paradigm is typically adopted to handle these long sequences: a subset of relevant behaviors is first searched from the original long sequences via an attention mechanism in the first stage and then aggregated with the target item to construct a discriminative representation for prediction in the second stage. In this work, we identify and characterize, for the first time, a neglected deficiency in existing long-sequence recommendation models: a single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. Initial attempts to address this issue with some common methods (e.g., linear projections -- a technique borrowed from language processing) proved ineffective, shedding light on the unique challenges of recommendation models. To overcome this, we propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are initialized and learned separately to fully decouple attention and representation. Extensive experiments and analysis demonstrate that DARE provides more accurate searches of correlated behaviors and outperforms baselines with AUC gains up to 0.9% on public datasets and notable improvements on Tencent's advertising platform. Furthermore, decoupling embedding spaces allows us to reduce the attention embedding dimension and accelerate the search procedure by 50% without significant performance impact, enabling more efficient, high-performance online serving. Code in PyTorch for experiments, including model analysis, is available at https://github.com/thuml/DARE.

Summary

  • The paper identifies the conflict of shared embeddings in attention and representation tasks as a key performance hindrance.
  • The paper proposes the DARE model with separate embedding tables, boosting attention accuracy and representation discriminability with up to 9‰ AUC gains.
  • The paper shows that decoupling lets the attention embedding dimension be reduced, accelerating the search stage by 50% without significant performance loss.

Decoupled Embeddings in Long-Sequence Recommendation Models

The paper "Long-Sequence Recommendation Models Need Decoupled Embeddings" addresses a prevalent challenge in modern recommendation systems: efficiently managing long user behavior sequences to accurately predict user preferences. The paper identifies an overlooked deficiency in traditional models, which utilize a single set of embeddings for both attention and representation tasks. This shared embedding approach leads to interference between the two processes, hindering performance.

Conceptual Framework

Recommendation systems typically employ a two-stage process to handle extensive user behavior histories. The first stage involves selecting relevant behaviors using an attention mechanism, constructing a shorter sequence for further analysis. In the second stage, these behaviors are integrated with the target item to generate a representative vector for prediction tasks. The core issue identified is that shared embeddings fail to perform optimally in both stages, as attention must focus on behavior correlation while representation needs to capture discriminative features.
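To make the paradigm concrete, here is a minimal PyTorch sketch of the two-stage process (PyTorch matching the paper's released code). The function name, dot-product scoring, and dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def two_stage_forward(behavior_emb, target_emb, k=8):
    """behavior_emb: (seq_len, d); target_emb: (d,). Illustrative only."""
    # Stage 1: score every historical behavior against the target item
    # and retrieve the top-k most relevant ones (the "search" stage).
    scores = behavior_emb @ target_emb           # (seq_len,)
    topk_scores, topk_idx = scores.topk(k)
    retrieved = behavior_emb[topk_idx]           # (k, d)

    # Stage 2: softmax-weighted aggregation of the retrieved behaviors,
    # concatenated with the target item for the final prediction head.
    weights = F.softmax(topk_scores, dim=0)      # (k,)
    user_interest = weights @ retrieved          # (d,)
    return torch.cat([user_interest, target_emb], dim=-1)

# Toy usage: a lifelong sequence of 100 behaviors with 16-dim embeddings.
rep = two_stage_forward(torch.randn(100, 16), torch.randn(16))
print(rep.shape)  # torch.Size([32])
```

Note that both stages above read from the same embedding tensor; this shared use is exactly the interference the paper targets, and the stage-1 scan over the full lifelong sequence is also the expensive step in online serving.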

Proposed Solution: Decoupled Embeddings

The authors propose the Decoupled Attention and Representation Embeddings (DARE) model, which gives attention and representation their own embedding tables, initialized and learned separately. Simpler remedies, such as the linear projections used in Transformers, proved ineffective: in recommendation models the embeddings dominate, and the projections add too little capacity to resolve the conflict. By fully decoupling the two embedding spaces, DARE avoids the gradient conflicts observed with shared embeddings and improves both attention accuracy and representation discriminability.
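A minimal sketch of the decoupling, reusing the toy retrieval setup above; the class name and dimensions are hypothetical, and the authors' released DARE code is more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledEmbeddingSketch(nn.Module):
    """Toy illustration of DARE's core idea: one embedding table for
    attention (relevance scoring), a separate one for representation
    (aggregation), so the two objectives no longer share gradients."""

    def __init__(self, num_items, attn_dim=8, repr_dim=32, k=8):
        super().__init__()
        self.attn_emb = nn.Embedding(num_items, attn_dim)  # scoring only
        self.repr_emb = nn.Embedding(num_items, repr_dim)  # aggregation only
        self.k = k

    def forward(self, behavior_ids, target_id):
        # Attention embeddings decide WHICH behaviors are relevant...
        scores = self.attn_emb(behavior_ids) @ self.attn_emb(target_id)
        topk_scores, topk_idx = scores.topk(self.k)
        # ...representation embeddings decide WHAT gets aggregated.
        retrieved = self.repr_emb(behavior_ids[topk_idx])
        weights = F.softmax(topk_scores, dim=0)
        user_interest = weights @ retrieved
        return torch.cat([user_interest, self.repr_emb(target_id)], dim=-1)

model = DecoupledEmbeddingSketch(num_items=1000)
out = model(torch.randint(0, 1000, (100,)), torch.tensor(5))
print(out.shape)  # torch.Size([64])
```

The two tables need not share a dimension; the attention table here is deliberately smaller than the representation table, which is the property the efficiency results below exploit.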

Experimental Findings

Comprehensive experiments were conducted on the public Taobao and Tmall datasets as well as Tencent's online advertising platform. DARE consistently outperformed baseline models, with AUC gains of up to 9‰ on the public benchmarks. Additionally, decoupling the embeddings allowed the attention embedding dimension to be reduced, accelerating the search stage by 50% without significant performance degradation.
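The source of that speedup is straightforward: stage-1 scoring cost scales with sequence length times attention dimension, so shrinking only the attention table cuts search cost while leaving the representation untouched. The toy timing below merely illustrates this scaling; the 50% figure is the paper's measurement on its own serving setup.

```python
import time
import torch

# Illustrative scaling check (not the paper's benchmark): halving the
# attention dimension roughly halves the stage-1 scoring FLOPs.
seq_len = 10_000
behaviors_full, target_full = torch.randn(seq_len, 32), torch.randn(32)
behaviors_half, target_half = torch.randn(seq_len, 16), torch.randn(16)

def time_scoring(behaviors, target, iters=1_000):
    start = time.perf_counter()
    for _ in range(iters):
        _ = behaviors @ target  # dot-product scores for the whole sequence
    return time.perf_counter() - start

print("32-dim scoring:", time_scoring(behaviors_full, target_full))
print("16-dim scoring:", time_scoring(behaviors_half, target_half))
```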

Mutual-information analyses showed that DARE captures the temporal and semantic correlations in user behavior more accurately than existing models: during retrieval it identifies the key behaviors more reliably, which translates into better prediction accuracy. Representation discriminability also improved, as measured by the mutual information between the learned representations and the labels.
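The summary does not specify the paper's exact mutual-information estimator; as a stand-in, the sketch below uses scikit-learn's k-NN-based estimator on synthetic data to show the flavor of the discriminability analysis, i.e., measuring how informative representations are about the labels.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in: representations whose dimensions carry a weak
# label signal, mimicking a "representation vs. click label" MI analysis.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000)                       # fake click labels
reps = rng.normal(size=(2000, 32)) + 0.5 * labels[:, None]   # fake representations

mi_per_dim = mutual_info_classif(reps, labels, random_state=0)
print("mean MI per dimension:", mi_per_dim.mean())
```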

Practical and Theoretical Implications

The implications of this research are substantial for both practical applications and theoretical advancements in recommendation systems. Practically, the model can improve user engagement through more personalized content delivery while lowering serving cost. Theoretically, the findings challenge the convention of shared embeddings and open avenues for exploring embedding decoupling in other domains where long-sequence modeling is critical.

Future Research Directions

Future investigations might explore further decoupling strategies or embedding variations for other types of sequences or domains. There is also potential to refine projection matrix techniques or analyze interaction effects more deeply, broadening our understanding of capacity constraints in recommendation models.

Overall, this paper contributes a novel approach by rethinking embedding utility in long-sequence recommendations, fostering more precise and efficient user modeling.