CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

Published 6 Dec 2023 in cs.LG (arXiv:2312.03256v2)

Abstract: The growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously satisfy three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distributions. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses all three requirements. The design philosophy of CAFE is to dynamically allocate more memory to important features (called hot features) and less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. Each reported hot feature is assigned a unique embedding, while multiple non-hot features share one embedding via hash embedding. Guided by this design philosophy, we further propose a multi-level hash embedding framework to optimize the embedding tables of non-hot features. We theoretically analyze the accuracy of HotSketch and the model's convergence under embedding deviation. Extensive experiments show that CAFE significantly outperforms existing embedding compression methods, yielding 3.92% and 3.68% higher testing AUC on the Criteo Kaggle and CriteoTB datasets, respectively, at a 10000x compression ratio. The source code of CAFE is available on GitHub.

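The abstract outlines the core mechanism: a lightweight sketch decides, per lookup, whether a feature is "hot" (deserving its own embedding row) or non-hot (sharing a row in a small hashed table). The sketch below is a minimal, illustrative Python rendering of that idea under stated assumptions, not the authors' implementation: the class names HotSketch and CafeLikeEmbedding, the parameters hot_capacity and shared_rows, the Space-Saving-style eviction rule, and the use of raw frequency as the importance score are all assumptions made for this example; the paper's actual HotSketch, importance metric, and multi-level hash embedding details differ.

```python
# Illustrative sketch of a CAFE-like hot/non-hot embedding lookup.
# NOT the authors' code: HotSketch here is a simple Space-Saving-style
# top-k frequency counter, and "importance" is just frequency.

import numpy as np


class HotSketch:
    """Tracks at most `capacity` candidate hot features by estimated frequency."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}  # feature id -> estimated count

    def update(self, feature_id: int) -> None:
        if feature_id in self.counts:
            self.counts[feature_id] += 1
        elif len(self.counts) < self.capacity:
            self.counts[feature_id] = 1
        else:
            # Space-Saving rule: evict the current minimum and inherit its count.
            victim = min(self.counts, key=self.counts.get)
            self.counts[feature_id] = self.counts.pop(victim) + 1

    def is_hot(self, feature_id: int) -> bool:
        return feature_id in self.counts


class CafeLikeEmbedding:
    """Hot features -> dedicated rows; non-hot features -> hashed shared rows."""

    def __init__(self, dim: int, hot_capacity: int, shared_rows: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.sketch = HotSketch(hot_capacity)
        self.hot_table = rng.normal(0.0, 0.01, (hot_capacity, dim))
        self.shared_table = rng.normal(0.0, 0.01, (shared_rows, dim))
        self.hot_slot = {}  # feature id -> row index in hot_table
        self.free_slots = list(range(hot_capacity))

    def lookup(self, feature_id: int) -> np.ndarray:
        self.sketch.update(feature_id)
        if self.sketch.is_hot(feature_id):
            if feature_id not in self.hot_slot and self.free_slots:
                self.hot_slot[feature_id] = self.free_slots.pop()
            if feature_id in self.hot_slot:
                return self.hot_table[self.hot_slot[feature_id]]
        # Non-hot feature (or no free hot slot): share a row via hashing.
        return self.shared_table[hash(feature_id) % len(self.shared_table)]


# Usage: look up embeddings for a skewed (Zipfian) stream of feature ids.
emb = CafeLikeEmbedding(dim=16, hot_capacity=100, shared_rows=1000)
stream = np.random.zipf(1.5, size=10000)
vectors = [emb.lookup(int(f)) for f in stream]
print(len(vectors), vectors[0].shape)
```

In the full framework one would also expect embedding migration when a feature changes hot/non-hot status and a multi-level hashed table (rather than the single shared table used here) for non-hot features, as the abstract describes; both are omitted to keep the example short.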

GitHub

  1. GitHub - HugoZHL/CAFE: https://github.com/HugoZHL/CAFE (13 stars)