Wukong: Towards a Scaling Law for Large-Scale Recommendation (2403.02545v4)

Published 4 Mar 2024 in cs.LG and cs.AI

Abstract: Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit such laws similar to those observed in the domain of LLMs, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly more complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong's unique design makes it possible to capture diverse, any-order of interactions simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models quality-wise. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models, while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior arts fall short.


Summary

  • The paper presents Wukong, an innovative network built on stacked factorization machines to capture high-order feature interactions in recommendation tasks.
  • It emphasizes dense scaling, upscaling the interaction components rather than merely expanding embedding tables, improving quality while keeping infrastructure costs flat or lower.
  • Empirical evaluations across six public datasets and a large internal dataset demonstrate consistent AUC gains, with quality continuing to improve across two orders of magnitude in model complexity, beyond 100 GFLOP per example.

Analyzing "Wukong: Towards a Scaling Law for Large-Scale Recommendation"

The paper "Wukong: Towards a Scaling Law for Large-Scale Recommendation" presents the development of a novel network architecture, Wukong, designed explicitly for improving recommendation models' scalability and efficiency. The research addresses a persistent challenge—establishing scaling laws for recommendation systems comparable to those observed in LLMs.

Core Contributions

  1. Architecture Design: Wukong is built on stacked factorization machines (FMs), a design that distinguishes it from traditional recommendation models. The architecture captures diverse, high-order interactions among features simply by employing deeper and wider network layers, a property that is particularly valuable for recommendation tasks requiring rich interaction modeling (a minimal sketch of one such layer follows this list).
  
  2. Scalable Interaction Component: Unlike existing models, which often rely heavily on expanding embedding tables (sparse scaling), Wukong emphasizes dense scaling. By focusing on upscaling interaction components rather than just increasing the size of embedding tables, Wukong achieves better quality improvements while maintaining or reducing infrastructure costs.
  3. Empirical Validation: The authors conducted extensive evaluations across six public datasets and one large-scale proprietary dataset. Across varying model complexities—extending beyond 100 GFLOP/example—Wukong consistently outperforms state-of-the-art baseline models. This consistent performance demonstrates its robustness across different complexity scales and diverse datasets.
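
To make the layered design concrete, below is a minimal PyTorch sketch of one stacked-FM layer. It follows the paper's high-level description: a factorization-machine block (FMB) that computes pairwise dot-product interactions and refines them with an MLP, a linear compression block (LCB) that projects the input embeddings down, and a residual connection with layer normalization to keep deep stacks trainable. The class name, the default hyperparameters (k_fmb, k_lcb, hidden), and the exact placement of the normalization are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class WukongLayerSketch(nn.Module):
    """One stacked-FM layer (illustrative sketch, not the reference code).

    Takes a batch of n embeddings of dimension d and emits
    (k_fmb + k_lcb) embeddings of the same dimension, so layers
    can be stacked by matching the next layer's `n` to this output.
    """

    def __init__(self, n: int, d: int, k_fmb: int = 16, k_lcb: int = 16,
                 hidden: int = 256):
        super().__init__()
        # FMB: flatten the (n x n) pairwise dot-product matrix, refine
        # it with an MLP, and reshape into k_fmb output embeddings.
        self.fmb_mlp = nn.Sequential(
            nn.Linear(n * n, hidden),
            nn.ReLU(),
            nn.Linear(hidden, k_fmb * d),
        )
        # LCB: linearly compress the n input embeddings into k_lcb.
        self.lcb = nn.Linear(n, k_lcb, bias=False)
        # Residual projection so the skip connection matches shapes.
        self.res = nn.Linear(n, k_fmb + k_lcb, bias=False)
        self.norm = nn.LayerNorm(d)
        self.k_fmb, self.d = k_fmb, d

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, n, d)
        inter = torch.bmm(x, x.transpose(1, 2))          # (B, n, n) dot products
        fmb = self.fmb_mlp(inter.flatten(1))             # (B, k_fmb * d)
        fmb = fmb.view(-1, self.k_fmb, self.d)           # (B, k_fmb, d)
        lcb = self.lcb(x.transpose(1, 2)).transpose(1, 2)       # (B, k_lcb, d)
        out = torch.cat([fmb, lcb], dim=1)               # (B, k_fmb + k_lcb, d)
        out = out + self.res(x.transpose(1, 2)).transpose(1, 2) # skip connection
        return self.norm(out)

# Example: stack two layers; the second layer's n must equal the
# first layer's output count (k_fmb + k_lcb = 32 here).
x = torch.randn(8, 40, 64)                  # 8 examples, 40 features, d = 64
layer1 = WukongLayerSketch(n=40, d=64)
layer2 = WukongLayerSketch(n=32, d=64)
print(layer2(layer1(x)).shape)              # torch.Size([8, 32, 64])
```

Stacking such layers composes interactions, so depth raises the achievable interaction order roughly exponentially, while widening (larger k_fmb and hidden) adds capacity at a fixed order. Upscaling along these dense dimensions, rather than growing embedding tables, is what lets the model trade compute for quality.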

Significant Findings

  • Wukong delivers superior predictive accuracy as measured by AUC across all tested datasets, highlighting its efficacy in various recommendation scenarios.
  • The architecture's ability to uphold a scaling law is shown by its continuous quality improvement over two orders of magnitude in model complexity.
  • Wukong scales without a significant loss in efficiency, and its compute-heavy, memory-light profile aligns well with modern hardware, which increasingly favors added compute over added memory.
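
Scaling-law claims of this kind are usually summarized by a power-law fit of quality against per-example compute. The paper presents its results as quality-versus-compute curves rather than a printed formula, so the form below is the standard one from the scaling-law literature, stated here only to make the claim precise:

```latex
% Illustrative power-law form from the scaling-law literature; the
% constants are fitted per model family and are not values reported
% in the Wukong paper.
%   E(C): test error at per-example compute C (FLOP/example)
%   E_inf: irreducible error;  a, b > 0: fitted constants
\[
  E(C) \;\approx\; E_{\infty} + a\,C^{-b}
\]
```

Holding such a law "across two orders of magnitude" means the fitted curve keeps tracking measured quality as C grows by a factor of roughly 100, which is the regime, beyond 100 GFLOP per example, where the paper reports prior architectures falling short.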

Implications and Future Perspectives

The development of Wukong has practical and theoretical implications for future AI and recommendation research. Practically, it provides a scalable backbone for recommendation systems that can adapt to rapidly growing dataset complexity and size without prohibitive computational costs. Theoretically, it opens avenues for exploring scaling laws in domains beyond LLMs, potentially setting a precedent for similar constructs in other machine learning tasks.

Future studies could probe the limits of Wukong's scalability, explore its applicability in other contexts such as sequential recommendation, or examine its compatibility with transformer-based architectures. Moreover, developing efficient serving strategies for such scaled-up models could improve their deployment and real-time usability.

In conclusion, Wukong marks an important stride toward deriving scaling laws in recommendation systems, presenting a robust alternative to upscaling strategies that hinge on merely expanding embedding tables. Its approach to interaction modeling and its efficacy across scales make it valuable to both academic research and practical recommendation systems.
