Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability (2306.12141v3)
Abstract: Entropy coding is essential to data compression and to image and video coding. The Range variant of Asymmetric Numeral Systems (rANS) is a modern entropy coder that features superior speed and compression rate. Because rANS is not designed for parallel execution, the conventional approach to parallel rANS partitions the input symbol sequence and encodes each partition with an independent codec, so every additional partition adds overhead. This approach is found in state-of-the-art implementations such as DietGPU. It is unsuitable for content-delivery applications: if the decoder cannot decode all the partitions in parallel, the parallelism is wasted, yet all of the overhead is still transferred. To solve this, we propose Recoil, a parallel rANS decoding approach with decoder-adaptive scalability. We discover that a single rANS-encoded bitstream can be decoded from an arbitrary position if the intermediate state at that position is known. After renormalization, these states also have a smaller upper bound, so they can be stored efficiently. We then split the encoded bitstream using a heuristic that distributes the workload evenly, and store the intermediate states and the corresponding symbol indices as metadata. Splits can then be combined simply by eliminating the extra metadata entries. The main contribution of Recoil is to reduce unnecessary data transfer by adaptively scaling the parallelism overhead to match the decoder's capability. Experiments show that Recoil's decoding throughput is comparable to that of the conventional approach, scaling massively on CPUs and GPUs and greatly outperforming various other ANS-based codecs.
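The central observation, that one rANS bitstream can be entered at any recorded position and that splits merge by dropping metadata, can be illustrated with a minimal sketch. This is not the Recoil implementation: it assumes a single non-interleaved byte-wise rANS stream, fixed-interval split points rather than the workload-balancing heuristic, and full-width stored states rather than the compact post-renormalization states. All names and constants (`SCALE_BITS`, `LOW`, `encode`, `decode_range`, `thin`) are hypothetical choices made for this example.

```python
# Illustrative single-stream rANS; constants and names are assumptions of this sketch.
SCALE_BITS = 8                  # symbol frequencies must sum to 2**SCALE_BITS
TOTAL = 1 << SCALE_BITS
LOW = 1 << 23                   # normalized states lie in [LOW, LOW << 8)

def build_tables(freqs):
    """Return cumulative start per symbol and a slot -> symbol lookup table."""
    assert sum(freqs) == TOTAL
    starts, lookup, acc = [], [], 0
    for sym, f in enumerate(freqs):
        starts.append(acc)
        lookup.extend([sym] * f)
        acc += f
    return starts, lookup

def encode(symbols, freqs, starts, split_every):
    """Encode forward; record (symbol_index, state, byte_offset) at split points."""
    x, out, splits = LOW, bytearray(), []
    for i, s in enumerate(symbols):
        f = freqs[s]
        x_max = ((LOW >> SCALE_BITS) << 8) * f
        while x >= x_max:       # renormalize: spill low bytes to the stream
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << SCALE_BITS) + (x % f) + starts[s]
        if (i + 1) % split_every == 0 or i == len(symbols) - 1:
            splits.append((i, x, len(out)))
    return bytes(out), splits

def decode_range(data, freqs, starts, lookup, entry, stop_index):
    """Decode symbols entry[0] down to stop_index + 1, reading bytes backward."""
    i, x, pos = entry
    result = []
    while i > stop_index:
        slot = x & (TOTAL - 1)
        s = lookup[slot]
        result.append(s)
        x = freqs[s] * (x >> SCALE_BITS) + slot - starts[s]
        while x < LOW:          # renormalize: pull bytes back from the stream
            pos -= 1
            x = (x << 8) | data[pos]
        i -= 1
    result.reverse()
    return result

def decode_all(data, freqs, starts, lookup, splits):
    """Chunks are independent, so this loop could be replaced by a parallel map."""
    out, prev = [], -1
    for entry in splits:
        out += decode_range(data, freqs, starts, lookup, entry, prev)
        prev = entry[0]
    return out

def thin(splits, keep_every):
    """Combine splits by dropping metadata entries; the final entry is always kept."""
    return splits[::-1][::keep_every][::-1]

freqs = [128, 64, 32, 32]
starts, lookup = build_tables(freqs)
symbols = [(i * 7 + i // 3) % 4 for i in range(1000)]
data, splits = encode(symbols, freqs, starts, split_every=100)

# The same bitstream decodes with 10, 2, or 1 entry points.
assert decode_all(data, freqs, starts, lookup, splits) == symbols
assert decode_all(data, freqs, starts, lookup, thin(splits, 5)) == symbols
assert decode_all(data, freqs, starts, lookup, [splits[-1]]) == symbols
```

In this sketch the encoded bytes never change; only the metadata list shrinks, which mirrors the idea of scaling the overhead to what a given decoder can actually use.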