G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems (2401.04338v1)

Published 9 Jan 2024 in cs.LG, cs.DC, and cs.IR

Abstract: Recently, a new paradigm, meta learning, has been widely applied to Deep Learning Recommendation Models (DLRM) and significantly improves statistical performance, especially in cold-start scenarios. However, existing systems are not tailored to meta learning based DLRM models and suffer from efficiency problems in distributed training on GPU clusters, because the conventional deep learning pipeline is not optimized for the two task-specific datasets and two update loops that meta learning requires. This paper presents G-Meta, a high-performance framework for large-scale training of optimization-based meta DLRM models over a GPU cluster. First, G-Meta combines data parallelism and model parallelism, carefully orchestrating computation and communication to enable high-speed distributed training. Second, it proposes a Meta-IO pipeline for efficient data ingestion that alleviates the I/O bottleneck. Experimental results show that G-Meta achieves notable training speedups without loss of statistical performance. Since early 2022, G-Meta has been deployed in Alipay's core advertising and recommender system, shortening the continuous delivery cycle of models by a factor of four. It also obtains a 6.48% improvement in Conversion Rate (CVR) and a 1.06% increase in CPM (Cost Per Mille) in Alipay's homepage display advertising, benefiting from larger training samples and more tasks.
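
To make the abstract's "two task-specific datasets and two update loops" concrete, below is a minimal sketch of an optimization-based meta learning update in the MAML style (first-order variant). This is not the paper's G-Meta implementation; the model, task format, loss, and hyperparameters are illustrative placeholders.

```python
# Minimal sketch of the two-loop structure in optimization-based meta learning
# (first-order MAML style). Illustrative only; not the G-Meta implementation.
import copy
import torch
import torch.nn as nn

def meta_train_step(model, tasks, inner_lr=0.01, outer_lr=0.001, inner_steps=1):
    """One outer (meta) update over a batch of tasks.

    Each task is a tuple (support_x, support_y, query_x, query_y):
    the support set feeds the inner loop, the query set feeds the outer loop.
    """
    loss_fn = nn.BCEWithLogitsLoss()
    meta_opt = torch.optim.Adam(model.parameters(), lr=outer_lr)
    meta_opt.zero_grad()

    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: adapt a per-task copy of the shared model on the support set.
        fast = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            loss_fn(fast(support_x).squeeze(-1), support_y).backward()
            inner_opt.step()

        # Outer loop: evaluate the adapted copy on the query set and accumulate
        # (first-order) gradients into the shared parameters.
        query_loss = loss_fn(fast(query_x).squeeze(-1), query_y)
        grads = torch.autograd.grad(query_loss, fast.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g

    # Meta update of the shared parameters with the accumulated task gradients.
    meta_opt.step()
```

Each task thus touches two datasets (support and query) and two optimizers (inner and outer), which is the dual-dataset, dual-loop access pattern the abstract argues conventional single-loop training pipelines, and their I/O paths, are not optimized for.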
