
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings (2404.04270v1)

Published 22 Mar 2024 in cs.IR and cs.LG

Abstract: Training recommendation models poses significant challenges in resource utilization and performance. Prior work has proposed reducing training time by categorizing embeddings into popular and non-popular classes. We observe that, even among the popular embeddings, some train rapidly and then exhibit minimal variation, i.e., they saturate; subsequent updates to these embeddings contribute nothing to model quality. This paper presents Slipstream, a software framework that identifies stale embeddings on the fly and skips their updates to enhance performance. Doing so yields substantial speedups, optimizes CPU-GPU bandwidth usage, and eliminates unnecessary memory accesses. Slipstream reduces training time by 2x, 2.4x, 1.2x, and 1.175x across real-world datasets and configurations, relative to baseline XDL, Intel-optimized DLRM, FAE, and Hotline, respectively.
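
The mechanism the abstract describes is straightforward to picture in code. The sketch below is a hypothetical illustration (not the authors' implementation), assuming a PyTorch-style embedding table: periodically measure how far each embedding row has moved since the last check, flag rows that have effectively stopped moving as stale, and suppress their subsequent updates. All names here (StaleSkipEmbedding, staleness_threshold, refresh_staleness, skip_stale_updates) are invented for illustration.

```python
import torch

class StaleSkipEmbedding(torch.nn.Module):
    """Embedding table that flags saturated ("stale") rows and skips their updates.

    A hypothetical sketch of the staleness test described in the abstract,
    not the Slipstream implementation itself.
    """

    def __init__(self, num_rows: int, dim: int, staleness_threshold: float = 1e-4):
        super().__init__()
        self.weight = torch.nn.Parameter(0.01 * torch.randn(num_rows, dim))
        self.threshold = staleness_threshold
        # stale[i] == True means row i has saturated and is no longer updated.
        self.register_buffer("stale", torch.zeros(num_rows, dtype=torch.bool))
        # Snapshot of the table at the previous staleness check.
        self.register_buffer("prev", self.weight.detach().clone())

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        # Plain gather; stale rows are still read, just no longer written.
        return self.weight[indices]

    @torch.no_grad()
    def refresh_staleness(self) -> None:
        # A row is declared stale once its L2 movement since the last
        # check drops below the threshold, i.e., it has saturated.
        delta = (self.weight - self.prev).norm(dim=1)
        self.stale |= delta < self.threshold
        self.prev.copy_(self.weight)

    @torch.no_grad()
    def skip_stale_updates(self) -> None:
        # Call between loss.backward() and optimizer.step(): zeroing the
        # gradients of stale rows makes a plain-SGD optimizer leave them be.
        if self.weight.grad is not None:
            self.weight.grad[self.stale] = 0.0
```

In a training loop, the staleness check would run periodically while the skip runs every step, e.g.:

```python
emb = StaleSkipEmbedding(num_rows=1000, dim=16)
opt = torch.optim.SGD(emb.parameters(), lr=0.1)
for step in range(100):
    idx = torch.randint(0, 1000, (32,))  # synthetic sparse-feature batch
    loss = emb(idx).sum()                # stand-in for the model loss
    opt.zero_grad()
    loss.backward()
    emb.skip_stale_updates()             # drop updates to saturated rows
    opt.step()
    if step % 10 == 0:
        emb.refresh_staleness()          # re-evaluate which rows saturated
```

Note that in a system like the one the abstract describes, the savings come from skipping the gradient computation, CPU-GPU transfer, and memory writes for stale rows altogether; zeroing dense gradients, as done here, only shows where the staleness decision plugs into the loop.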

References (52)
  1. NVIDIA Merlin: HugeCTR. https://github.com/NVIDIA-Merlin/HugeCTR.
  2. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015.
  3. Understanding Training Efficiency of Deep Learning Recommendation Models at Scale, 2020.
  4. Accelerating Recommendation System Training by Leveraging Popular Choices. In VLDB, 2022.
  5. Heterogeneous acceleration pipeline for recommendation system training, 2022.
  6. Ad-rec: Advanced feature interactions to address covariate-shifts in recommendation networks, 2023.
  7. Bagpipe: Accelerating deep recommendation model training, 2022.
  8. Alibaba. User Behavior Data from Taobao for Recommendation. https://tianchi.aliyun.com/dataset/dataDetail?dataId=649&userId=1.
  9. Understanding and improving early stopping for learning with noisy labels. Advances in Neural Information Processing Systems, 34:24392–24403, 2021.
  10. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. In ISCA, 2016.
  11. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave. IEEE Micro, 38:8–20, March 2018.
  12. CriteoLabs. Criteo Display Ad Challenge. https://www.kaggle.com/c/criteo-display-ad-challenge.
  13. CriteoLabs. Terabyte Click Logs. https://labs.criteo.com/2013/12/download-terabyte-click-logs.
  14. A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication. In International Symposium on Field-Programmable Custom Computing Machines. IEEE, May 2014.
  15. Mixed dimension embeddings with application to memory-efficient recommendation systems. CoRR, abs/1909.11810, 2019.
  16. The Netflix recommender system: Algorithms, business value, and innovation. ACM Trans. Manage. Inf. Syst., 6(4), December 2016.
  17. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247, 2017.
  18. Early stopping in deep networks: Double descent and how to eliminate it. arXiv preprint arXiv:2007.10099, 2020.
  19. Time-based Sequence Model for Personalization and Recommendation Systems. CoRR, abs/2008.11922, 2020.
  20. Early-stopped neural networks are consistent. Advances in Neural Information Processing Systems, 34:1805–1817, 2021.
  21. XDL: An Industrial Deep Learning Framework for High-Dimensional Sparse Data. DLP-KDD ’19, New York, NY, USA, 2019. Association for Computing Machinery.
  22. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA ’17, page 1–12, New York, NY, USA, 2017. Association for Computing Machinery.
  23. Kaggle. Avazu mobile ads CTR. https://www.kaggle.com/c/avazu-ctr-prediction.
  24. Optimizing Deep Learning Recommender Systems Training on CPU Cluster Architectures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’20. IEEE Press, 2020.
  25. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pages 790–803, 2020.
  26. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’18, page 461–475, New York, NY, USA, 2018. Association for Computing Machinery.
  27. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’18, page 1754–1763, New York, NY, USA, 2018. Association for Computing Machinery.
  28. Tabla: A unified template-based framework for accelerating statistical machine learning. March 2016.
  29. Early stopping without a validation set. arXiv preprint arXiv:1703.09580, 2017.
  30. Meta. Meta recommender model training on ZionEX devices. https://www.infoq.com/news/2021/05/facebook-zionex-training/.
  31. Software-hardware co-design for fast and scalable training of deep learning recommendation models. In Proceedings of the 49th Annual International Symposium on Computer Architecture, ISCA ’22, page 993–1011, New York, NY, USA, 2022. Association for Computing Machinery.
  32. High-performance, distributed training of large-scale deep learning recommendation models. CoRR, abs/2104.05158, 2021.
  33. HW/SW co-design for future AI platforms - large memory unified training platform (Zion), 2019.
  34. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR, abs/1906.00091, 2019.
  35. Nvidia. NVIDIA Collective Communications Library (NCCL). https://docs.nvidia.com/deeplearning/nccl/index.html.
  36. Nvidia. Nvlink. https://www.nvidia.com/en-us/data-center/nvlink/.
  37. Scale-Out Acceleration for Machine Learning. October 2017.
  38. Automatic differentiation in PyTorch. 2017.
  39. Lutz Prechelt. Automatic early stopping using cross validation: quantifying the criteria. Neural networks, 11(4):761–767, 1998.
  40. Product-based neural networks for user response prediction. In 2016 IEEE 16th international conference on data mining (ICDM), pages 1149–1154. IEEE, 2016.
  41. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 267–278, June 2016.
  42. RecShard: Statistical feature-based memory optimization for industry-scale neural recommendation, 2022.
  43. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems, page 165–175. Association for Computing Machinery, New York, NY, USA, 2020.
  44. B. Smith and G. Linden. Two decades of recommender systems at Amazon.com. IEEE Internet Computing, 21(3):12–18, 2017.
  45. AutoInt: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1161–1170, 2019.
  46. A Generic Network Compression Framework for Sequential Recommender Systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, page 1299–1308, New York, NY, USA, 2020. Association for Computing Machinery.
  47. DCN V2: Improved deep & cross network and practical lessons for web-scale learning to rank systems. In Proceedings of the Web Conference 2021, pages 1785–1797, 2021.
  48. Machine learning at facebook: Understanding inference at the edge. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 331–344, Feb 2019.
  49. Saec: similarity-aware embedding compression in recommendation systems. In Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems, pages 82–89, 2020.
  50. TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models, 2021.
  51. Distributed hierarchical gpu parameter server for massive scale deep learning ads systems, 2020.
  52. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pages 1059–1068, 2018.
