
Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference (2210.08803v1)

Published 17 Oct 2022 in cs.DC, cs.AI, cs.IR, and cs.LG

Abstract: In this talk, we introduce Merlin HugeCTR. Merlin HugeCTR is an open-source, GPU-accelerated integration framework for click-through rate estimation. It optimizes both training and inference, whilst enabling model training at scale with model-parallel embeddings and data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. In the MLPerf v1.0 DLRM model training benchmark, Merlin HugeCTR achieves a speedup of up to 24.6x on a single DGX A100 (8x A100) over PyTorch on 4x4-socket CPU nodes (4x4x28 cores). Merlin HugeCTR can also take advantage of multi-node environments to accelerate training even further. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton server framework, to leverage the computational capabilities of GPUs for high-speed recommendation model inference. Using this HPS, Merlin HugeCTR users can achieve a 5~62x speedup (batch size dependent) for popular recommendation models over CPU baseline implementations, and dramatically reduce their end-to-end inference latency.
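The abstract describes a GPU embedding cache backed by a hierarchical storage architecture: hot embedding vectors are served from fast device memory, with misses filled from a larger, slower backing tier. The following is a minimal toy sketch of that lookup pattern; the class name, capacity policy, and eviction strategy are illustrative assumptions, not the actual HugeCTR HPS API.

```python
import numpy as np

class TieredEmbeddingStore:
    """Toy sketch of a hierarchical embedding lookup: a small fast cache
    (standing in for GPU memory) backed by a full table (standing in for
    CPU RAM / SSD tiers). Not the real HugeCTR HPS interface."""

    def __init__(self, dim, cache_capacity):
        self.dim = dim
        self.cache_capacity = cache_capacity
        self.gpu_cache = {}      # hot embeddings, limited capacity
        self.backing_store = {}  # full embedding table

    def put(self, key, vector):
        # Insert/update an embedding in the backing tier.
        self.backing_store[key] = vector

    def lookup(self, keys):
        """Return a [len(keys), dim] batch, filling cache misses from the
        backing store; unseen keys fall back to a zero vector."""
        out = np.zeros((len(keys), self.dim), dtype=np.float32)
        for i, k in enumerate(keys):
            vec = self.gpu_cache.get(k)
            if vec is None:  # cache miss: fetch from the slower tier
                vec = self.backing_store.get(k)
                if vec is None:
                    vec = np.zeros(self.dim, dtype=np.float32)
                if len(self.gpu_cache) >= self.cache_capacity:
                    # Naive FIFO-ish eviction; the real system uses a
                    # proper cache replacement policy.
                    self.gpu_cache.pop(next(iter(self.gpu_cache)))
                self.gpu_cache[k] = vec
            out[i] = vec
        return out

# Example usage: repeated keys in a batch hit the cache after the first miss.
store = TieredEmbeddingStore(dim=4, cache_capacity=2)
store.put(10, np.ones(4, dtype=np.float32))
batch = store.lookup([10, 10, 99])  # second '10' is a cache hit
```

The design point this sketch highlights is the one the abstract claims pays off at inference time: recommendation traffic is highly skewed, so keeping the hot subset of a huge embedding table in device memory turns most lookups into fast cache hits.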

Authors (12)
  1. Joey Wang (4 papers)
  2. Yingcan Wei (2 papers)
  3. Minseok Lee (3 papers)
  4. Matthias Langer (30 papers)
  5. Fan Yu (63 papers)
  6. Jie Liu (492 papers)
  7. Alex Liu (19 papers)
  8. Daniel Abel (3 papers)
  9. Gems Guo (1 paper)
  10. Jianbing Dong (2 papers)
  11. Jerry Shi (2 papers)
  12. Kunlun Li (4 papers)
Citations (27)
