
Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data (2307.04216v2)

Published 9 Jul 2023 in cs.LG, cs.AI, and eess.IV

Abstract: Lossy compression has become an important technique for reducing data size in many domains. It is especially valuable for large-scale scientific data, whose size can reach several petabytes. Although autoencoder-based models have been successfully used to compress images and videos, such neural networks have not gained wide attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality. The proposed model is tested on publicly available scientific benchmark data and applied to a large-scale, high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising reconstruction quality. 2D simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 spanning 500 years are also compressed at a ratio of 200, with reconstruction error negligible for scientific analysis.
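To make the general idea concrete, the sketch below shows a minimal convolutional autoencoder compressing a 2D scalar field, in the spirit of the autoencoder-based compression the abstract describes. This is not the paper's hierarchical architecture: the layer sizes, latent width, and class name are hypothetical choices for illustration, and the reported ratio is the raw latent-size ratio before any quantization or entropy coding.

```python
# Minimal sketch of autoencoder-based lossy compression for 2D scientific
# fields. Illustrative only; NOT the paper's hierarchical model. All layer
# sizes and names here are hypothetical.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        # Encoder: downsample a 1-channel field by 8x in each dimension.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, latent_channels, kernel_size=3, stride=2, padding=1),
        )
        # Decoder: mirror the encoder with transposed convolutions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 64, 3, stride=2,
                               padding=1, output_padding=1), nn.GELU(),
            nn.ConvTranspose2d(64, 32, 3, stride=2,
                               padding=1, output_padding=1), nn.GELU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2,
                               padding=1, output_padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

x = torch.randn(1, 1, 256, 256)      # one 256x256 scalar field
model = ConvAutoencoder()
z = model.encoder(x)                 # latent tensor: 1 x 4 x 32 x 32
ratio = x.numel() / z.numel()        # naive ratio before entropy coding
print(f"latent shape {tuple(z.shape)}, raw compression ratio ~{ratio:.0f}x")
```

In practice, learned compressors of this kind reach high ratios such as those quoted above by also quantizing the latent tensor and entropy-coding it; the sketch stops at the latent representation itself.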
