
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings (2404.04270v1)

Published 22 Mar 2024 in cs.IR and cs.LG

Abstract: Training recommendation models poses significant challenges in resource utilization and performance. Prior work has proposed reducing training time by categorizing embeddings into popular and non-popular classes. We observe that, even among the popular embeddings, some train rapidly and then exhibit minimal variation, i.e., they saturate; subsequent updates to these embeddings contribute nothing to model quality. This paper presents Slipstream, a software framework that identifies stale embeddings on the fly and skips their updates to enhance performance. Doing so yields substantial speedups, optimizes CPU-GPU bandwidth usage, and eliminates unnecessary memory accesses. Slipstream reduces training time by 2x, 2.4x, 1.2x, and 1.175x across real-world datasets and configurations, relative to baseline XDL, Intel-optimized DLRM, FAE, and Hotline, respectively.
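
The mechanism the abstract describes is straightforward to picture in code. The sketch below is a hypothetical illustration (not the authors' implementation), assuming a PyTorch-style embedding table: periodically measure how far each embedding row has moved since the last check, flag rows that have effectively stopped moving as stale, and suppress their subsequent updates. All names here (StaleSkipEmbedding, staleness_threshold, refresh_staleness, skip_stale_updates) are invented for illustration.

```python
import torch

class StaleSkipEmbedding(torch.nn.Module):
    """Embedding table that flags saturated ("stale") rows and skips their updates.

    A hypothetical sketch of the staleness test described in the abstract,
    not the Slipstream implementation itself.
    """

    def __init__(self, num_rows: int, dim: int, staleness_threshold: float = 1e-4):
        super().__init__()
        self.weight = torch.nn.Parameter(0.01 * torch.randn(num_rows, dim))
        self.threshold = staleness_threshold
        # stale[i] == True means row i has saturated and is no longer updated.
        self.register_buffer("stale", torch.zeros(num_rows, dtype=torch.bool))
        # Snapshot of the table at the previous staleness check.
        self.register_buffer("prev", self.weight.detach().clone())

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        # Plain gather; stale rows are still read, just no longer written.
        return self.weight[indices]

    @torch.no_grad()
    def refresh_staleness(self) -> None:
        # A row is declared stale once its L2 movement since the last
        # check drops below the threshold, i.e., it has saturated.
        delta = (self.weight - self.prev).norm(dim=1)
        self.stale |= delta < self.threshold
        self.prev.copy_(self.weight)

    @torch.no_grad()
    def skip_stale_updates(self) -> None:
        # Call between loss.backward() and optimizer.step(): zeroing the
        # gradients of stale rows makes a plain-SGD optimizer leave them be.
        if self.weight.grad is not None:
            self.weight.grad[self.stale] = 0.0
```

In a training loop, the staleness check would run periodically while the skip runs every step, e.g.:

```python
emb = StaleSkipEmbedding(num_rows=1000, dim=16)
opt = torch.optim.SGD(emb.parameters(), lr=0.1)
for step in range(100):
    idx = torch.randint(0, 1000, (32,))  # synthetic sparse-feature batch
    loss = emb(idx).sum()                # stand-in for the model loss
    opt.zero_grad()
    loss.backward()
    emb.skip_stale_updates()             # drop updates to saturated rows
    opt.step()
    if step % 10 == 0:
        emb.refresh_staleness()          # re-evaluate which rows saturated
```

Note that in a system like the one the abstract describes, the savings come from skipping the gradient computation, CPU-GPU transfer, and memory writes for stale rows altogether; zeroing dense gradients, as done here, only shows where the staleness decision plugs into the loop.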

References (52)
  1. NVIDIA Merlin: HugeCTR. https://github.com/NVIDIA-Merlin/HugeCTR.
  2. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015.
  3. Understanding Training Efficiency of Deep Learning Recommendation Models at Scale, 2020.
  4. Accelerating Recommendation System Training by Leveraging Popular Choices. In VLDB, 2022.
  5. Heterogeneous acceleration pipeline for recommendation system training, 2022.
  6. Ad-rec: Advanced feature interactions to address covariate-shifts in recommendation networks, 2023.
  7. Bagpipe: Accelerating deep recommendation model training, 2022.
  8. Alibaba. User Behavior Data from Taobao for Recommendation. https://tianchi.aliyun.com/dataset/dataDetail?dataId=649&userId=1.
  9. Understanding and improving early stopping for learning with noisy labels. Advances in Neural Information Processing Systems, 34:24392–24403, 2021.
  10. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. In ISCA, 2016.
  11. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave. IEEE Micro, 38:8–20, March 2018.
  12. CriteoLabs. Criteo Display Ad Challenge. https://www.kaggle.com/c/criteo-display-ad-challenge.
  13. CriteoLabs. Terabyte Click Logs. https://labs.criteo.com/2013/12/download-terabyte-click-logs.
  14. A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication. In International Symposium on Field-Programmable Custom Computing Machines. IEEE, May 2014.
  15. Mixed dimension embeddings with application to memory-efficient recommendation systems. CoRR, abs/1909.11810, 2019.
  16. The Netflix recommender system: Algorithms, business value, and innovation. ACM Trans. Manage. Inf. Syst., 6(4), December 2016.
  17. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247, 2017.
  18. Early stopping in deep networks: Double descent and how to eliminate it. arXiv preprint arXiv:2007.10099, 2020.
  19. Time-based Sequence Model for Personalization and Recommendation Systems. CoRR, abs/2008.11922, 2020.
  20. Early-stopped neural networks are consistent. Advances in Neural Information Processing Systems, 34:1805–1817, 2021.
  21. XDL: An Industrial Deep Learning Framework for High-Dimensional Sparse Data. DLP-KDD ’19, New York, NY, USA, 2019. Association for Computing Machinery.
  22. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA ’17, page 1–12, New York, NY, USA, 2017. Association for Computing Machinery.
  23. Kaggle. Avazu mobile ads CTR. https://www.kaggle.com/c/avazu-ctr-prediction.
  24. Optimizing Deep Learning Recommender Systems Training on CPU Cluster Architectures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’20. IEEE Press, 2020.
  25. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pages 790–803, 2020.
  26. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’18, page 461–475, New York, NY, USA, 2018. Association for Computing Machinery.
  27. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’18, page 1754–1763, New York, NY, USA, 2018. Association for Computing Machinery.
  28. Tabla: A unified template-based framework for accelerating statistical machine learning. March 2016.
  29. Early stopping without a validation set. arXiv preprint arXiv:1703.09580, 2017.
  30. Meta. Meta recommender model training on ZionEX devices. https://www.infoq.com/news/2021/05/facebook-zionex-training/.
  31. Software-hardware co-design for fast and scalable training of deep learning recommendation models. In Proceedings of the 49th Annual International Symposium on Computer Architecture, ISCA ’22, page 993–1011, New York, NY, USA, 2022. Association for Computing Machinery.
  32. High-performance, distributed training of large-scale deep learning recommendation models. CoRR, abs/2104.05158, 2021.
  33. HW/SW co-design for future AI platforms - large memory unified training platform (Zion), 2019.
  34. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR, abs/1906.00091, 2019.
  35. Nvidia. NVIDIA Collective Communications Library (NCCL). https://docs.nvidia.com/deeplearning/nccl/index.html.
  36. Nvidia. Nvlink. https://www.nvidia.com/en-us/data-center/nvlink/.
  37. Scale-Out Acceleration for Machine Learning. October 2017.
  38. Automatic differentiation in PyTorch. 2017.
  39. Lutz Prechelt. Automatic early stopping using cross validation: quantifying the criteria. Neural networks, 11(4):761–767, 1998.
  40. Product-based neural networks for user response prediction. In 2016 IEEE 16th international conference on data mining (ICDM), pages 1149–1154. IEEE, 2016.
  41. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 267–278, June 2016.
  42. RecShard: Statistical feature-based memory optimization for industry-scale neural recommendation, 2022.
  43. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems, page 165–175. Association for Computing Machinery, New York, NY, USA, 2020.
  44. B. Smith and G. Linden. Two decades of recommender systems at Amazon.com. IEEE Internet Computing, 21(3):12–18, 2017.
  45. AutoInt: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1161–1170, 2019.
  46. A Generic Network Compression Framework for Sequential Recommender Systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, page 1299–1308, New York, NY, USA, 2020. Association for Computing Machinery.
  47. DCN V2: Improved deep & cross network and practical lessons for web-scale learning to rank systems. In Proceedings of the Web Conference 2021, pages 1785–1797, 2021.
  48. Machine learning at facebook: Understanding inference at the edge. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 331–344, Feb 2019.
  49. Saec: similarity-aware embedding compression in recommendation systems. In Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems, pages 82–89, 2020.
  50. TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models, 2021.
  51. Distributed hierarchical gpu parameter server for massive scale deep learning ads systems, 2020.
  52. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pages 1059–1068, 2018.
