
Spatiotemporally adaptive compression for scientific dataset with feature preservation -- a case study on simulation data with extreme climate events analysis (2401.03317v1)

Published 6 Jan 2024 in cs.CV, cs.NA, and math.NA

Abstract: Scientific discoveries are increasingly constrained by limited storage space and I/O capacities. Data from time-series simulations and experiments often need to be decimated over timesteps to accommodate storage and I/O limitations. In this paper, we propose a technique that addresses storage costs while improving post-analysis accuracy through spatiotemporally adaptive, error-controlled lossy compression. We investigate the trade-off between data precision and temporal output rates, revealing that reducing data precision and increasing timestep frequency lead to more accurate analysis outcomes. Additionally, we integrate spatiotemporal feature detection with data compression and demonstrate that performing adaptive error-bounded compression in higher dimensional space enables greater compression ratios, leveraging the error propagation theory of a transformation-based compressor. To evaluate our approach, we conduct experiments using the well-known E3SM climate simulation code and apply our method to compress variables used for cyclone tracking. Our results show a significant reduction in storage size while enhancing the quality of cyclone tracking analysis, both quantitatively and qualitatively, in comparison to the prevalent timestep decimation approach. Compared to three state-of-the-art lossy compressors lacking feature preservation capabilities, our adaptive compression framework improves perfectly matched cases in tropical cyclone (TC) tracking by 26.4-51.3% at medium compression ratios and by 77.3-571.1% at large compression ratios, with only 5-11% computational overhead.
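
The trade-off between precision and temporal output rate described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a hypothetical one-dimensional signal with a short-lived extreme event, and it swaps in plain uniform scalar quantization for the transformation-based compressor the abstract refers to. The function names, the stride of 6, and the error bounds (0.01 near the event, 0.1 elsewhere) are illustrative assumptions. It is also not a storage-matched comparison; it only shows how decimation can miss a brief extreme that a reduced-precision, full-rate record retains.

```python
import math

def signal(t):
    # Hypothetical smooth background plus a short-lived extreme event near t = 50.
    return math.sin(0.05 * t) + 3.0 * math.exp(-((t - 50.0) ** 2) / 8.0)

def decimate_and_interpolate(values, stride):
    """Keep every `stride`-th sample, then rebuild the rest by linear interpolation."""
    n = len(values)
    kept = list(range(0, n, stride))
    if kept[-1] != n - 1:
        kept.append(n - 1)  # keep the final sample so interpolation covers the full range
    recon = [0.0] * n
    for a, b in zip(kept, kept[1:]):
        for i in range(a, b + 1):
            w = (i - a) / (b - a)
            recon[i] = (1.0 - w) * values[a] + w * values[b]
    return recon

def quantize(values, error_bounds):
    """Uniform scalar quantization with bin width 2*eb, so |x - recon| <= eb per sample."""
    return [round(x / (2.0 * eb)) * (2.0 * eb) for x, eb in zip(values, error_bounds)]

def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

timesteps = list(range(101))
original = [signal(t) for t in timesteps]

# Option 1: temporal decimation (keep 1 of every 6 timesteps at full precision).
decimated = decimate_and_interpolate(original, stride=6)

# Option 2: keep every timestep at reduced precision.
# Adaptive bounds (assumed values): tight near the event, loose elsewhere.
bounds = [0.01 if abs(t - 50) <= 10 else 0.1 for t in timesteps]
quantized = quantize(original, bounds)

err_decimated = max_abs_error(original, decimated)
err_quantized = max_abs_error(original, quantized)
print(f"max error, decimation:   {err_decimated:.3f}")
print(f"max error, quantization: {err_quantized:.3f}")
```

In this toy setting the event peak falls between two retained timesteps, so the decimated reconstruction badly underestimates it, while the quantized series stays within its per-sample bound everywhere. A real pipeline would detect the features in the data rather than hard-coding their location.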

Authors (13)
  1. Qian Gong (28 papers)
  2. Chengzhu Zhang (1 paper)
  3. Xin Liang (75 papers)
  4. Viktor Reshniak (14 papers)
  5. Jieyang Chen (25 papers)
  6. Anand Rangarajan (47 papers)
  7. Sanjay Ranka (39 papers)
  8. Nicolas Vidal (3 papers)
  9. Lipeng Wan (27 papers)
  10. Paul Ullrich (4 papers)
  11. Norbert Podhorszki (20 papers)
  12. Robert Jacob (3 papers)
  13. Scott Klasky (35 papers)
