Exascale Deep Learning for Scientific Inverse Problems (1909.11150v1)
Abstract: We introduce novel communication strategies for synchronous distributed Deep Learning, consisting of decentralized gradient reduction orchestration and computational graph-aware grouping of gradient tensors. These new techniques produce an optimal overlap between computation and communication and result in near-linear scaling (0.93) of distributed training up to 27,600 NVIDIA V100 GPUs on the Summit Supercomputer. We demonstrate our gradient reduction techniques in the context of training a Fully Convolutional Neural Network to approximate the solution of a longstanding scientific inverse problem in materials imaging. Efficient distributed training on a 0.5 PB dataset produces a model capable of atomically accurate reconstruction of materials, reaching a peak performance of 2.15(4) EFLOPS$_{16}$ in the process.
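The abstract's core idea, grouping gradient tensors according to the order in which the computational graph produces them and reducing each group asynchronously so communication overlaps with the remaining backward computation, can be illustrated with a minimal sketch. This is not the paper's implementation (which targets Horovod-style reductions at Summit scale); it is a hedged illustration in plain PyTorch, and the helper names `bucket_parameters` and `allreduce_buckets`, the bucket size, and the single-process group setup are all assumptions introduced here for demonstration only.

```python
# Illustrative sketch (assumed, not the paper's code): bucket gradient tensors
# in reverse graph order and overlap asynchronous all-reduce with backprop.
import torch
import torch.distributed as dist


def init_single_process():
    # Single-process group for demonstration; real runs use one rank per GPU.
    dist.init_process_group(backend="gloo",
                            init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)


def bucket_parameters(params, bucket_bytes=1 << 20):
    """Group parameters into ~bucket_bytes buckets, ordered as their gradients
    become available during the backward pass (reverse of registration order
    for a simple sequential model)."""
    buckets, current, size = [], [], 0
    for p in reversed(list(params)):
        current.append(p)
        size += p.numel() * p.element_size()
        if size >= bucket_bytes:
            buckets.append(current)
            current, size = [], 0
    if current:
        buckets.append(current)
    return buckets


def allreduce_buckets(buckets):
    """Launch one asynchronous all-reduce per bucket, then wait on all of them;
    the launches can overlap with any backward computation still in flight."""
    handles = []
    for bucket in buckets:
        flat = torch.cat([p.grad.flatten() for p in bucket])
        handles.append((bucket, flat, dist.all_reduce(flat, async_op=True)))
    for bucket, flat, handle in handles:
        handle.wait()
        flat /= dist.get_world_size()
        # Scatter the averaged values back into the parameter gradients.
        offset = 0
        for p in bucket:
            n = p.numel()
            p.grad.copy_(flat[offset:offset + n].view_as(p.grad))
            offset += n


if __name__ == "__main__":
    init_single_process()
    model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Linear(16, 1))
    model(torch.randn(4, 8)).sum().backward()
    allreduce_buckets(bucket_parameters(model.parameters()))
```

In a real multi-GPU run the per-bucket reductions would be issued from gradient hooks as soon as each bucket fills, which is what lets communication hide behind the rest of the backward pass; the sketch collapses this into a single call after `backward()` for brevity.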
- Nouamane Laanait
- Joshua Romero
- Junqi Yin
- M. Todd Young
- Sean Treichler
- Vitalii Starchenko
- Albina Borisevich
- Alex Sergeev
- Michael Matheson