
Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model (2502.08309v1)

Published 12 Feb 2025 in cs.IR

Abstract: Recent advancements in autoregressive LLMs have achieved significant milestones, largely attributed to their scalability, often referred to as the "scaling law". Inspired by these achievements, there has been a growing interest in adapting LLMs for Recommendation Systems (RecSys) by reformulating RecSys tasks into generative problems. However, these End-to-End Generative Recommendation (E2E-GR) methods tend to prioritize idealized goals, often at the expense of the practical advantages offered by traditional Deep Learning based Recommendation Models (DLRMs) in terms of features, architecture, and practices. This disparity between idealized goals and practical needs introduces several challenges and limitations, locking the scaling law in industrial RecSys. In this paper, we introduce a large user model (LUM) that addresses these limitations through a three-step paradigm, designed to meet the stringent requirements of industrial settings while unlocking the potential for scalable recommendations. Our extensive experimental evaluations demonstrate that LUM outperforms both state-of-the-art DLRMs and E2E-GR approaches. Notably, LUM exhibits excellent scalability, with performance improvements observed as the model scales up to 7 billion parameters. Additionally, we have successfully deployed LUM in an industrial application, where it achieved significant gains in an A/B test, further validating its effectiveness and practicality.



Summary

  • The paper demonstrates a three-step paradigm that pre-trains a Large User Model to capture user interests and collaborative signals effectively.
  • The approach integrates generative pre-training with discriminative DLRMs, yielding improved AUC, recall, and scalability in industrial settings.
  • The study identifies power-law scaling trends, confirming that as model parameters and sequence length increase, performance continues to improve.

Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model

This paper introduces a large user model (LUM) built on a novel three-step paradigm that addresses limitations of existing recommendation systems (RecSys) by leveraging the scalability principles observed in LLMs. The research demonstrates superior performance of LUM over both DLRMs and end-to-end generative recommendation approaches, with gains persisting as the model scales to 7 billion parameters and carrying over to a deployed industrial application.

Introduction

Traditionally, deep learning-based recommendation models (DLRMs) have struggled to scale as gracefully as LLMs. The discrepancy between generative and discriminative models is at the core of this challenge: generative models capture the joint probability distribution of a sequence, whereas discriminative models estimate simpler conditional probabilities, which limits the returns they see from additional computational resources. The paper identifies several limitations of end-to-end generative recommendation methods (E2E-GRs): inconsistency between training and application, efficiency challenges, inflexibility, and limited compatibility with industrial settings.
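In standard probabilistic terms (a textbook distinction, not a formula quoted from the paper), an autoregressive generative model factorizes the joint distribution over a behavior sequence, while a discriminative DLRM estimates a single conditional probability:

```latex
p_{\theta}(x_1,\dots,x_T)=\prod_{t=1}^{T} p_{\theta}(x_t \mid x_{<t})
\qquad \text{vs.} \qquad
p_{\theta}(y \mid \mathbf{x}), \quad \text{e.g. } p(\text{click} \mid \text{user}, \text{item}).
```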

  • Inconsistency: Generative models focus on data generation rather than precise predictive outcomes, which is problematic for application tasks such as click-through rate (CTR) prediction.
  • Efficiency: Industrial applications demand efficient training on continuously streaming data and low-latency inference, which E2E-GRs struggle to meet.
  • Flexibility and Compatibility: E2E-GRs are rigid and incompatible with pre-existing industrial knowledge and the explicit feature engineering of traditional models.

To tackle these limitations, the paper proposes a three-step paradigm for training a Large User Model (LUM), illustrated in Figure 1 (a code sketch of the pipeline follows the list below):

Figure 1: Intuition for moving from the common paradigm of using LLMs to the proposed multi-step, generative-to-discriminative paradigm.

  1. Knowledge Construction: Utilizing transformer architecture, LUM is pre-trained through generative learning to capture user interests and collaborative relationships among items.
  2. Knowledge Querying: LUM is queried with user-specific information, which involves extracting insights using a concept akin to "prompt engineering."
  3. Knowledge Utilization: Outputs from LUM enrich traditional DLRMs, enhancing their predictive accuracy and decision-making capabilities.
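The following toy sketch shows how the three steps might compose; every function and name here is a hypothetical stand-in for the paper's components, not the authors' API:

```python
# Illustrative glue code for the three-step paradigm. All names are
# hypothetical stand-ins, not a real API.

def pretrain_lum(behavior_sequences):
    """Step 1: generative pre-training on tokenized user behavior.
    Returns a toy 'model' mapping a token prompt to candidate items."""
    return lambda prompt: ["item_7", "item_42"]

def query_knowledge(lum, history_tokens, condition):
    """Step 2: prompt-style querying -- append a condition token and read
    out the items the model deems likely under that condition."""
    return lum(history_tokens + [f"<cond:{condition}>"])

def dlrm_predict(user_features, item_features, lum_knowledge):
    """Step 3: a real DLRM would consume lum_knowledge as extra input
    features; here we just return a placeholder CTR estimate."""
    return 0.5

lum = pretrain_lum([["<cond:click|sports>", "<item:42>"]])
knowledge = query_knowledge(lum, ["<item:42>"], "buy|books")
print(dlrm_predict({"age": 30}, {"item_id": 7}, knowledge))  # 0.5
```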

Method

Step 1: Knowledge Construction via Pre-training LUM

Tokenization: A novel tokenization strategy is implemented where each item is expanded into a condition token and an item token. This approach is crucial for capturing user behavior across varied conditions.
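A minimal sketch of what such tokenization could look like; the exact token format is an assumption, as the paper only specifies that each behavior expands into a condition token and an item token:

```python
def tokenize_behaviors(behaviors):
    """Expand each (condition, item) behavior into a condition token
    followed by an item token, yielding an interleaved sequence:
    [cond_1, item_1, cond_2, item_2, ...]."""
    tokens = []
    for condition, item in behaviors:
        tokens.append(f"<cond:{condition}>")
        tokens.append(f"<item:{item}>")
    return tokens

# A user who clicked item 42 in a "sports" context, then bought item 7
# in a "books" context:
print(tokenize_behaviors([("click|sports", 42), ("buy|books", 7)]))
# ['<cond:click|sports>', '<item:42>', '<cond:buy|books>', '<item:7>']
```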

Architecture: The hierarchical structure of LUM includes a Token Encoder for integrating heterogeneous input features and a User Encoder, an autoregressive transformer that processes the full token sequence (Figure 2).

Figure 2: (a) The architecture of LUM. (b) An example of querying knowledge from the pre-trained LUM. (c) An example of utilizing knowledge in DLRMs.
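A minimal PyTorch sketch of this two-level design, loosely following Figure 2(a); all dimensions, layer counts, and the MLP used for feature fusion are assumptions, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

class LUMSketch(nn.Module):
    """Illustrative two-level encoder: a Token Encoder fuses per-token
    features, and an autoregressive User Encoder (a causal Transformer)
    models the token sequence."""

    def __init__(self, vocab_size=10_000, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Token Encoder: an MLP standing in for whatever module the paper
        # uses to integrate heterogeneous input features per token.
        self.token_encoder = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True, norm_first=True
        )
        self.user_encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        h = self.token_encoder(self.embed(token_ids))
        causal_mask = nn.Transformer.generate_square_subsequent_mask(
            token_ids.size(1)
        )
        return self.user_encoder(h, mask=causal_mask)  # (batch, seq_len, d_model)
```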

Next-condition-item Prediction: To cope with the very large item vocabulary, the paper trains with the InfoNCE loss rather than a full-vocabulary softmax, and introduces a packing strategy that improves sequence-processing efficiency.
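A standard InfoNCE formulation for this kind of next-item objective is sketched below; the sampled-negative setup and temperature value are assumptions, as the paper only states that InfoNCE is used:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """Contrastive next-item loss: score the predicted user state against
    the true next item (positive) and sampled items (negatives), avoiding
    a softmax over the full item vocabulary.

    query:     (batch, d)    hidden state predicting the next item
    positive:  (batch, d)    embedding of the actual next item
    negatives: (batch, k, d) sampled negative item embeddings
    """
    pos = (query * positive).sum(dim=-1, keepdim=True)     # (batch, 1)
    neg = torch.einsum("bd,bkd->bk", query, negatives)     # (batch, k)
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(query.size(0), dtype=torch.long)  # positive is class 0
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings:
q, p, n = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 20, 64)
print(info_nce_loss(q, p, n))
```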

Step 2: Knowledge Querying with Given Conditions

LUM's architecture supports multi-condition querying, setting the stage for downstream discriminative tasks. The tokenization scheme makes it possible to elicit user-specific insights by conditioning the model on chosen condition tokens, a capability the empirical evaluations document (Figure 3).

Figure 3: An example of group query.
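A hypothetical sketch of such a group query: append each condition token to the user's tokenized history and collect likely next items per condition. The `model` callable is a toy stand-in for the pre-trained LUM, not the paper's API:

```python
def group_query(model, history_tokens, conditions, top_k=5):
    """Query the model once per condition, prompt-engineering style."""
    results = {}
    for cond in conditions:
        prompt = history_tokens + [f"<cond:{cond}>"]
        results[cond] = model(prompt)[:top_k]
    return results

toy_model = lambda prompt: ["item_7", "item_42", "item_3"]
print(group_query(toy_model,
                  ["<cond:click|sports>", "<item:42>"],
                  ["buy|books", "click|sports"], top_k=2))
# {'buy|books': ['item_7', 'item_42'], 'click|sports': ['item_7', 'item_42']}
```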

Step 3: Knowledge Utilization in DLRMs

Knowledge extracted from LUM is integrated into DLRMs, either directly as fixed additional features or through interest matching via similarity measurement.
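A small sketch of these two utilization modes under assumed tensor shapes; the paper describes the modes but not this exact code:

```python
import torch
import torch.nn.functional as F

def as_fixed_features(dlrm_features, lum_outputs):
    """Mode (a): concatenate frozen LUM outputs onto the DLRM's input
    features. Shapes: (batch, f) and (batch, d) -> (batch, f + d)."""
    return torch.cat([dlrm_features, lum_outputs], dim=-1)

def interest_match_score(candidate_emb, queried_interests):
    """Mode (b): similarity-based interest matching -- max cosine
    similarity between a candidate item (batch, d) and the user's
    LUM-queried interest embeddings (batch, k, d)."""
    sims = F.cosine_similarity(
        candidate_emb.unsqueeze(1), queried_interests, dim=-1
    )                                  # (batch, k)
    return sims.max(dim=1).values      # (batch,)
```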

Experiments

Performance on Recommendation Tasks

Public Datasets: Evaluations on multiple datasets show that LUM consistently offers superior performance compared to established methods, validating the effectiveness of the three-step paradigm.

Industrial Setting: In real-world industrial applications, LUM achieves significant improvement in AUC and recall metrics, demonstrating its applicability in large-scale environments.

Effectiveness Evaluation

Impact on DLRMs: The paper assesses various DLRMs integrated with LUM, noting consistent improvements in predictive accuracy.

Tokenization and Utilization Strategies: Ablations over the tokenization and knowledge-utilization strategies show that both design choices significantly affect performance.

Efficiency Evaluation

The paper highlights the efficiency of LUM's training and serving processes, emphasizing scalability advantages and reduced computational costs compared to E2E-GRs (Figure 4).

Figure 4: The results of the efficiency evaluation.

Scaling Law for LUM

The research identifies power-law scaling trends with respect to model size and sequence length, demonstrating the potential for continued performance improvements as LUM scales (Figure 5).

Figure 5: Scaling law for LUM.
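These trends follow the generic power-law form familiar from LLM scaling studies; the constants below are placeholders for fitted values, not numbers reported in the paper:

```latex
\mathrm{Err}(N) \approx a\,N^{-\alpha}, \qquad \mathrm{Err}(L) \approx b\,L^{-\beta},
```

where $N$ is the parameter count, $L$ the behavior-sequence length, and $a, b, \alpha, \beta > 0$ are fitted constants.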

Conclusion

This paper presents a comprehensive framework that leverages generative models to unlock scalability in recommendation systems. The proposed method addresses critical challenges in industrial applications, offering robust integration with existing DLRMs and considerable flexibility. Experiments validate the practicality of LUM, confirming its scalability and efficiency and, through the deployed A/B test, its impact on user engagement and business outcomes.
