ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System (2402.04032v5)

Published 6 Feb 2024 in cs.AR and cs.AI

Abstract: The growing model size of personalized recommendation systems poses new challenges for inference. Weight-sharing algorithms have been proposed to reduce model size, but they increase memory accesses. Recent advances in processing-in-memory (PIM) have improved model throughput by exploiting memory parallelism, yet weight-sharing algorithms introduce massive CPU-PIM communication into prior PIM systems. We propose ProactivePIM, a PIM system for accelerating weight-sharing recommendation systems. ProactivePIM integrates a cache within the PIM with a prefetching scheme to leverage a unique locality of the algorithm, and it eliminates communication overhead through a subtable mapping strategy. ProactivePIM achieves a 4.8x speedup compared to prior works.
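The trade-off the abstract describes (smaller tables, more memory accesses) is easiest to see with a concrete weight-sharing scheme. The sketch below is a hedged illustration rather than the paper's implementation: it uses a quotient-remainder compositional embedding, one representative weight-sharing technique, in which every id is served from two small subtables instead of one large table, so each lookup issues two memory reads that a PIM-side cache and prefetcher would aim to absorb. The table sizes, embedding dimension, and element-wise combine are illustrative assumptions.

```python
# Minimal sketch of a quotient-remainder weight-sharing embedding (illustrative,
# not the paper's design). Two small subtables replace one large table, cutting
# storage while doubling the number of table reads per lookup.
import numpy as np

NUM_IDS = 1_000_000      # vocabulary size of one categorical feature (assumed)
DIM = 64                 # embedding dimension (assumed)
NUM_BUCKETS = 1_000      # ~sqrt(NUM_IDS): sizes of the two subtables

rng = np.random.default_rng(0)
quotient_table = rng.standard_normal((NUM_IDS // NUM_BUCKETS + 1, DIM)).astype(np.float32)
remainder_table = rng.standard_normal((NUM_BUCKETS, DIM)).astype(np.float32)

def lookup(ids: np.ndarray) -> np.ndarray:
    """Each id needs two subtable reads (quotient + remainder) instead of one,
    which is the extra memory traffic that weight sharing introduces."""
    q = ids // NUM_BUCKETS
    r = ids % NUM_BUCKETS
    return quotient_table[q] * remainder_table[r]   # element-wise combine

# Pooled lookup for one multi-hot sparse feature, DLRM-style.
ids = rng.integers(0, NUM_IDS, size=32)
pooled = lookup(ids).sum(axis=0)
print(pooled.shape)  # (64,)
```

Storage drops from NUM_IDS x DIM rows to roughly 2 x NUM_BUCKETS x DIM rows, but every embedding access now touches both subtables, which is the CPU-PIM traffic pattern the proposed cache, prefetching scheme, and subtable mapping strategy target.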
