Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks

Published 18 Apr 2023 in cs.LG and cs.PF | (2304.08925v1)

Abstract: In this paper, we primarily focus on understanding the data preprocessing pipeline for DNN training in the public cloud. First, we run experiments to test the performance implications of the two major data preprocessing methods, which use either raw data or record files. The preliminary results show that data preprocessing is a clear bottleneck, even with the most efficient software and hardware configuration enabled by NVIDIA DALI, a highly optimized data preprocessing library. Second, we identify the potential causes, exercise a variety of optimization methods, and present their pros and cons. We hope this work will shed light on the new co-design of "data storage, loading pipeline" and "training framework" and flexible resource configurations between them so that the resources can be fully exploited and performance can be maximized.

