A Survey on Error-Bounded Lossy Compression for Scientific Datasets (2404.02840v1)
Abstract: Error-bounded lossy compression has proven effective at significantly reducing the data storage and transfer burden while preserving the fidelity of the reconstructed data. Many error-bounded lossy compressors have been developed over the years for a wide range of parallel and distributed use cases. These compressors are built on distinct compression models and design principles, so each has its own strengths and weaknesses. In this paper we provide a comprehensive survey of emerging error-bounded lossy compression techniques for different use cases, each of which involves processing big data. The key contribution is fourfold. (1) We summarize a taxonomy of lossy compression organized into 6 classic compression models. (2) We survey 10+ compression components/modules commonly used in error-bounded lossy compressors. (3) We survey 10+ state-of-the-art error-bounded lossy compressors and how they combine the various compression modules in their designs. (4) We survey lossy compression for 10+ modern scientific applications and use cases. We believe this survey is useful to multiple communities, including scientific applications, high-performance computing, lossy compression, and big data.