Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference (2210.08803v1)
Abstract: In this talk, we introduce Merlin HugeCTR. Merlin HugeCTR is an open-source, GPU-accelerated integration framework for click-through rate estimation. It optimizes both training and inference, whilst enabling model training at scale with model-parallel embeddings and data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. In the MLPerf v1.0 DLRM model training benchmark, Merlin HugeCTR achieves a speedup of up to 24.6x on a single DGX A100 (8x A100) over PyTorch on 4x4-socket CPU nodes (4x4x28 cores). Merlin HugeCTR can also take advantage of multi-node environments to accelerate training even further. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton server framework to leverage the computational capabilities of GPUs for high-speed recommendation model inference. Using this HPS, Merlin HugeCTR users can achieve a 5x to 62x speedup (depending on batch size) for popular recommendation models over CPU baseline implementations, and dramatically reduce their end-to-end inference latency.
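The hybrid parallelism the abstract describes can be made concrete with a small sketch: embedding tables (too large to replicate) are sharded across GPUs, while each GPU keeps a full copy of the dense network and processes its own slice of the batch. The sketch below is illustrative only, not HugeCTR's actual API; device counts, table sizes, and the NumPy emulation of per-GPU shards are all hypothetical.

```python
# Illustrative sketch (NOT HugeCTR's API): model-parallel embeddings combined
# with a data-parallel dense network, with "GPUs" emulated via NumPy arrays.
import numpy as np

NUM_GPUS = 4          # hypothetical device count
VOCAB_SIZE = 1000     # total embedding rows across all shards
EMB_DIM = 16
BATCH = 8             # global batch size

rng = np.random.default_rng(0)

# Model parallelism: shard the embedding table row-wise, one shard per "GPU".
shards = np.array_split(rng.normal(size=(VOCAB_SIZE, EMB_DIM)), NUM_GPUS)
offsets = np.cumsum([0] + [s.shape[0] for s in shards[:-1]])

def lookup(global_ids):
    """Route each id to the shard that owns it (the all-to-all exchange in a
    real multi-GPU setup), then gather the embedding vectors."""
    out = np.empty((len(global_ids), EMB_DIM))
    for shard, off in zip(shards, offsets):
        mask = (global_ids >= off) & (global_ids < off + shard.shape[0])
        out[mask] = shard[global_ids[mask] - off]
    return out

# Data parallelism: every "GPU" applies identical dense weights to its own
# local slice of the global batch.
dense_w = rng.normal(size=(EMB_DIM, 1))
ids = rng.integers(0, VOCAB_SIZE, size=BATCH)
for gpu, local_ids in enumerate(np.array_split(ids, NUM_GPUS)):
    emb = lookup(local_ids)        # model-parallel embedding lookup
    logits = emb @ dense_w         # data-parallel dense compute
    print(f"gpu {gpu}: ids {local_ids.tolist()} -> {logits.ravel().round(2)}")
```

Sharding the embeddings avoids replicating tables that exceed single-GPU memory, while replicating the comparatively small dense network keeps its forward and backward passes communication-free.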
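Similarly, the HPS's tiered lookup path for inference can be sketched as a small cache hierarchy: a GPU-resident cache serves hot embeddings, with misses falling through to CPU memory and then to a backing store, and hot keys being promoted upward. Again, this is a hypothetical sketch of the general technique, not the HPS implementation; the class and tier names are invented for illustration.

```python
# Illustrative sketch (hypothetical, not the HPS API): a three-tier embedding
# store with an LRU-managed "GPU" cache in front of slower tiers.
from collections import OrderedDict

class HierarchicalStore:
    def __init__(self, gpu_capacity, cpu_table, backing_store):
        self.gpu_cache = OrderedDict()       # emulated GPU cache (LRU order)
        self.gpu_capacity = gpu_capacity
        self.cpu_table = cpu_table           # emulated CPU-memory tier
        self.backing_store = backing_store   # emulated SSD/database tier

    def lookup(self, key):
        # Tier 1: GPU embedding cache (fast path for hot embeddings).
        if key in self.gpu_cache:
            self.gpu_cache.move_to_end(key)  # refresh LRU position
            return self.gpu_cache[key]
        # Tier 2: CPU memory; Tier 3: backing store.
        value = self.cpu_table.get(key) or self.backing_store[key]
        # Promote the key, evicting the least recently used entry if full.
        if len(self.gpu_cache) >= self.gpu_capacity:
            self.gpu_cache.popitem(last=False)
        self.gpu_cache[key] = value
        return value

store = HierarchicalStore(
    gpu_capacity=2,
    cpu_table={1: [0.1, 0.2]},
    backing_store={k: [float(k), 0.0] for k in range(10)},
)
for key in [1, 5, 1, 7, 9, 1]:
    print(key, store.lookup(key))
```

Because access to recommendation embeddings is typically highly skewed, even a small top tier can absorb most lookups, which is what makes the large end-to-end inference speedups cited above plausible.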
- Joey Wang (4 papers)
- Yingcan Wei (2 papers)
- Minseok Lee (3 papers)
- Matthias Langer (30 papers)
- Fan Yu (63 papers)
- Jie Liu (492 papers)
- Alex Liu (19 papers)
- Daniel Abel (3 papers)
- Gems Guo (1 paper)
- Jianbing Dong (2 papers)
- Jerry Shi (2 papers)
- Kunlun Li (4 papers)