Machine Learning Techniques for Data Reduction of Climate Applications (2405.00879v1)
Abstract: Scientists conduct large-scale simulations to compute derived quantities of interest (QoI) from primary data. Often, QoI are linked to specific features, regions, or time intervals, so the data can be reduced adaptively without compromising the integrity of the QoI. For many spatiotemporal applications, these QoI are binary in nature and represent the presence or absence of a physical phenomenon. We present a pipelined compression approach that first uses neural-network-based techniques to derive regions where QoI are highly likely to be present. We then employ a Guaranteed Autoencoder (GAE) to compress the data with differential error bounds: GAE uses the QoI information to apply low-error compression only to these regions, and the remaining data is compressed under looser bounds. This yields high overall compression ratios while still meeting the downstream goals of the simulation or data collection. Experimental results are presented for climate data generated by the E3SM simulation model, for downstream quantities such as tropical cyclone and atmospheric river detection and tracking. These results show that our approach is superior to comparable methods in the literature.
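The idea of differential error bounds can be illustrated with a minimal sketch. It is not the GAE method from the paper: a plain uniform quantizer stands in for the autoencoder, and a hand-drawn boolean mask stands in for the neural-network region detector. The function name `quantize_with_region_bounds`, the parameters `tight_eb` and `loose_eb`, and the synthetic field are all assumptions introduced here for illustration.

```python
import numpy as np

def quantize_with_region_bounds(data, qoi_mask, tight_eb, loose_eb):
    """Quantize `data` under a per-point absolute error bound selected by `qoi_mask`.

    Hypothetical stand-in for a region-adaptive compressor: points where
    `qoi_mask` is True get the tight bound, all other points the loose bound.
    """
    # Per-point error bound: tight inside likely-QoI regions, loose elsewhere.
    eb = np.where(qoi_mask, tight_eb, loose_eb)
    # Uniform quantization with bin width 2*eb keeps |x - x_hat| <= eb
    # (up to floating-point rounding).
    codes = np.round(data / (2.0 * eb)).astype(np.int64)
    recon = codes * (2.0 * eb)
    return codes, recon

# Synthetic 2D field and a made-up region of interest (assumptions, not E3SM data).
rng = np.random.default_rng(0)
field = rng.normal(size=(64, 64))
mask = np.zeros(field.shape, dtype=bool)
mask[20:30, 20:30] = True

codes, recon = quantize_with_region_bounds(field, mask, tight_eb=1e-3, loose_eb=1e-1)
err = np.abs(field - recon)
print("max error inside region: ", err[mask].max())
print("max error outside region:", err[~mask].max())
```

In a sketch like this, the integer codes outside the region take far fewer distinct values than those inside, so a lossless entropy coder applied afterwards would spend fewer bits on them. That is the intuition behind applying tight bounds only where the QoI are likely to be present.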