Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision (2403.12981v1)
Abstract: Deep neural network (DNN) inference has become an important part of many data-center workloads. This has prompted focused efforts to design ever-faster deep learning accelerators such as GPUs and TPUs. However, an end-to-end DNN-based vision application involves more than DNN inference alone: it also includes input decompression, resizing, sampling, normalization, and data transfer. In this paper, we perform a thorough evaluation of computer vision inference requests performed on a throughput-optimized serving system. We quantify the performance impact of server overheads such as data movement, preprocessing, and message brokers between two DNNs producing outputs at different rates. Our empirical analysis encompasses many computer vision tasks including image classification, segmentation, detection, depth estimation, and more complex processing pipelines with multiple DNNs. Our results consistently demonstrate that end-to-end application performance can easily be dominated by data processing and data movement functions (up to 56% of end-to-end latency for a medium-sized image, and $\sim$ 80% impact on system throughput for a large image), even though these functions have been conventionally overlooked in deep learning system design. Our work identifies important performance bottlenecks in different application scenarios, achieves 2.25$\times$ better throughput compared to prior work, and paves the way for more holistic deep learning system design.
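The abstract's central measurement, what fraction of end-to-end request latency each non-inference stage consumes, can be sketched with a small stage-timing harness. This is an illustrative sketch only, not the paper's actual instrumentation: the stage names mirror the steps the abstract lists (decode, resize, normalize, inference), but the stage bodies here are hypothetical stand-ins, and `profile_pipeline` is an assumed helper, not an API from the paper.

```python
import time

def profile_pipeline(stages, request):
    """Run `stages` (an ordered name -> callable dict) on `request`,
    timing each stage and returning its share of total latency."""
    timings = {}
    data = request
    for name, fn in stages.items():
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    total = sum(timings.values())
    return {name: t / total for name, t in timings.items()}

# Hypothetical stand-in stages. A real serving pipeline would call a JPEG
# decoder, a resize kernel, host-to-device copies, and the DNN itself.
def decode(x):    return list(x)
def resize(x):    return x[: len(x) // 2]
def normalize(x): return [b / 255.0 for b in x]
def infer(x):     return sum(x)

shares = profile_pipeline(
    {"decode": decode, "resize": resize, "normalize": normalize, "infer": infer},
    bytes(range(256)) * 1000,
)
print({k: round(v, 2) for k, v in shares.items()})
```

With real stage implementations, a breakdown like this is what reveals the paper's headline finding: preprocessing and data movement, not the DNN, can dominate the request.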