
Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems (2401.04408v2)

Published 9 Jan 2024 in cs.IR and cs.LG

Abstract: Huge embedding tables in modern deep learning recommender models (DLRMs) require prohibitively large memory during training and inference. This paper proposes FIITED, a system that automatically reduces the memory footprint via FIne-grained In-Training Embedding Dimension pruning. Leveraging the key insight that embedding vectors are not equally important, FIITED adaptively adjusts the dimension of each individual embedding vector during model training, assigning larger dimensions to more important embeddings while adapting to dynamic changes in the data; embeddings that are accessed more frequently and receive larger gradients are treated as more important. To enable efficient pruning of embeddings and their dimensions during training, we propose an embedding storage system based on virtually-hashed, physically-indexed hash tables. Experiments on two industry models and months of realistic training data show that FIITED reduces DLRM embedding size by more than 65% while preserving model quality, outperforming state-of-the-art in-training embedding pruning methods. On public datasets, FIITED reduces the size of embedding tables by 2.1x to 800x with negligible accuracy drop, while also improving model throughput.
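The core idea in the abstract — score each (row, dimension) entry of an embedding table by access frequency and gradient magnitude, then keep only the highest-utility entries — can be sketched in a few lines. This is a minimal illustrative sketch, not the actual FIITED system: it assumes a dense NumPy table, a per-row access-frequency vector, and a per-entry gradient-magnitude matrix (all names here are hypothetical), and it applies a one-shot global mask rather than the paper's in-training, hash-table-backed pruning.

```python
import numpy as np

def utility_scores(freq, grad_mags):
    # Per-entry utility: a row's access frequency scaled by the
    # gradient magnitude of each of its dimensions. Rows that are
    # hit often and dimensions that still receive large updates
    # score highest, mirroring the abstract's importance criterion.
    return freq[:, None] * grad_mags

def prune_dimensions(table, freq, grad_mags, keep_ratio=0.35):
    """Zero out the lowest-utility (row, dim) entries of an embedding
    table, keeping roughly `keep_ratio` of all entries.

    table     : (rows, dims) embedding table
    freq      : (rows,) access counts per embedding row
    grad_mags : (rows, dims) running gradient magnitudes per entry
    Returns the masked table and the boolean keep-mask.
    """
    scores = utility_scores(freq, grad_mags)
    k = int(scores.size * keep_ratio)
    # Threshold at the k-th largest score via a partial sort.
    threshold = np.partition(scores.ravel(), -k)[-k]
    mask = scores >= threshold
    return table * mask, mask
```

A `keep_ratio` of 0.35 corresponds to the ~65% size reduction reported in the abstract, though in the real system pruned entries are reclaimed in storage (via the virtually-hashed physically-indexed tables) rather than merely zeroed, and the mask is re-evaluated as frequencies and gradients drift during training.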


