Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs (2105.12912v3)
Abstract: Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. With the emergence of heterogeneous high-performance computing (HPC) architectures, GPU-accelerated error-bounded compressors (such as cuSZ and cuZFP) have been developed. However, they suffer from either low performance or low compression ratios. To this end, we propose cuSZ+ to target both high compression ratios and high throughputs. We identify that data sparsity and data smoothness are key factors for high compression throughputs. Our key contributions in this work are fourfold: (1) We propose an efficient compression workflow to adaptively perform run-length encoding and/or variable-length encoding. (2) We derive Lorenzo reconstruction in decompression as multidimensional partial-sum computation and propose a fine-grained Lorenzo reconstruction algorithm for GPU architectures. (3) We carefully optimize each of the cuSZ+ kernels by leveraging state-of-the-art CUDA parallel primitives. (4) We evaluate cuSZ+ using seven real-world HPC application datasets on V100 and A100 GPUs. Experiments show that cuSZ+ improves the compression throughputs and ratios by up to 18.4X and 5.3X, respectively, over cuSZ on the tested datasets.
- Jiannan Tian (30 papers)
- Sheng Di (58 papers)
- Xiaodong Yu (44 papers)
- Cody Rivera (6 papers)
- Kai Zhao (160 papers)
- Sian Jin (32 papers)
- Yunhe Feng (21 papers)
- Xin Liang (75 papers)
- Dingwen Tao (60 papers)
- Franck Cappello (60 papers)
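
Contribution (2) of the abstract states that Lorenzo reconstruction can be written as a multidimensional partial sum. The sketch below illustrates that identity on a small 2D grid of already-quantized integers, and shows why smooth data suits run-length encoding, as in contribution (1). It is a sequential, CPU-only illustration under stated assumptions, not the cuSZ+ kernels: the function names (`lorenzo_deltas_2d`, `reconstruct_2d`, `run_length_encode`) are hypothetical, a first-order Lorenzo predictor is assumed, and a GPU version would presumably replace the sequential scans with parallel prefix-sum primitives.

```python
from itertools import accumulate


def lorenzo_deltas_2d(q):
    """Prediction errors of a first-order 2D Lorenzo predictor.

    q is a list of rows of integers (assumed pre-quantized).
    Out-of-range neighbours are treated as 0.
    """
    rows, cols = len(q), len(q[0])

    def at(i, j):
        return q[i][j] if i >= 0 and j >= 0 else 0

    return [
        [at(i, j) - at(i - 1, j) - at(i, j - 1) + at(i - 1, j - 1)
         for j in range(cols)]
        for i in range(rows)
    ]


def reconstruct_2d(deltas):
    """Invert the predictor with two passes of inclusive prefix sums."""
    # Pass 1: partial sums along each row.
    row_scanned = [list(accumulate(row)) for row in deltas]
    # Pass 2: partial sums along each column (transpose, scan, transpose back).
    col_scanned = [list(accumulate(col)) for col in zip(*row_scanned)]
    return [list(row) for row in zip(*col_scanned)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


# A smooth (planar) field: most prediction errors become zero.
field = [[1, 2, 3],
         [2, 3, 4],
         [3, 4, 5]]
deltas = lorenzo_deltas_2d(field)
restored = reconstruct_2d(deltas)
flat_runs = run_length_encode([d for row in deltas for d in row])
```

Because each pass is an independent scan over rows or columns, the reconstruction has no cell-by-cell dependency chain beyond the scans themselves, which is what makes a partial-sum formulation attractive on parallel hardware. The zero-heavy `deltas` of the smooth field also show why run-length encoding can pay off on such data.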