
Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability (2306.12141v3)

Published 21 Jun 2023 in cs.DC, cs.IT, and math.IT

Abstract: Entropy coding is essential to data compression, image and video coding, etc. The Range variant of Asymmetric Numeral Systems (rANS) is a modern entropy coder, featuring superior speed and compression rate. As rANS is not designed for parallel execution, the conventional approach to parallel rANS partitions the input symbol sequence and encodes partitions with independent codecs, and more partitions bring extra overhead. This approach is found in state-of-the-art implementations such as DietGPU. It is unsuitable for content-delivery applications, as the parallelism is wasted if the decoder cannot decode all the partitions in parallel, but all the overhead is still transferred. To solve this, we propose Recoil, a parallel rANS decoding approach with decoder-adaptive scalability. We discover that a single rANS-encoded bitstream can be decoded from any arbitrary position if the intermediate states are known. After renormalization, these states also have a smaller upper bound, which can be stored efficiently. We then split the encoded bitstream using a heuristic to evenly distribute the workload, and store the intermediate states and corresponding symbol indices as metadata. The splits can then be combined simply by eliminating extra metadata entries. The main contribution of Recoil is reducing unnecessary data transfer by adaptively scaling parallelism overhead to match the decoder capability. The experiments show that Recoil decoding throughput is comparable to the conventional approach, scaling massively on CPUs and GPUs and greatly outperforming various other ANS-based codecs.
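The core idea in the abstract can be sketched in a few lines: rANS encodes symbols in reverse, so the encoder's intermediate state after processing the tail of the sequence is exactly the decoder's state just before that position. Recording (state, byte offset) pairs at split points lets independent decoders resume mid-bitstream. The following is a minimal, illustrative sketch with a toy static model and byte-wise renormalization in the style of ryg_rans; the constants, model, and function names are assumptions for the example, not the actual Recoil implementation.

```python
# Minimal rANS with checkpointed intermediate states, illustrating the idea
# behind Recoil: a single rANS bitstream can be decoded from an arbitrary
# position if the decoder state and byte offset there are known. (After
# renormalization the state lies in [L, 256*L), hence the "smaller upper
# bound" the abstract mentions.) All names here are illustrative.

SCALE_BITS = 12          # symbol frequencies sum to 1 << SCALE_BITS
M = 1 << SCALE_BITS
L = 1 << 16              # renormalization lower bound

# Toy static model over symbols 0..3 (frequencies must sum to M).
freqs = [2048, 1024, 512, 512]
cums = [0]
for f in freqs:
    cums.append(cums[-1] + f)

def encode(symbols, checkpoint_at):
    """Encode `symbols`; when crossing an index in `checkpoint_at`, record
    (state, bytes_emitted_so_far). rANS encodes in reverse, so that state
    equals the decoder state just before decoding that index."""
    out = bytearray()
    x = L
    checkpoints = {}
    for i in range(len(symbols) - 1, -1, -1):
        s = symbols[i]
        f, c = freqs[s], cums[s]
        # Renormalize: shift out low bytes so the updated state stays bounded.
        while x >= ((L >> SCALE_BITS) << 8) * f:
            out.append(x & 0xFF)
            x >>= 8
        x = (x // f) << SCALE_BITS | (x % f) + c
        if i in checkpoint_at:
            checkpoints[i] = (x, len(out))
    out.reverse()  # decoder reads from the front
    return x, bytes(out), checkpoints

def decode(x, data, pos, n):
    """Decode n symbols starting from state x at byte offset pos."""
    out = []
    for _ in range(n):
        slot = x & (M - 1)
        s = next(k for k in range(len(freqs)) if cums[k] <= slot < cums[k + 1])
        x = freqs[s] * (x >> SCALE_BITS) + slot - cums[s]
        while x < L and pos < len(data):
            x = (x << 8) | data[pos]
            pos += 1
        out.append(s)
    return out, x, pos

symbols = [0, 1, 0, 2, 3, 0, 1, 0, 0, 2, 1, 3] * 4
split = len(symbols) // 2
x0, data, cps = encode(symbols, {split})

# Partition 1 decodes from the initial state at offset 0; partition 2 decodes
# independently from the checkpointed state and its byte offset.
first, _, _ = decode(x0, data, 0, split)
cx, cbytes = cps[split]
second, _, _ = decode(cx, data, len(data) - cbytes, len(symbols) - split)

assert first + second == symbols
```

In this sketch the two partitions share one bitstream and one metadata entry; dropping a checkpoint simply merges two partitions into one, which is the mechanism the abstract describes for adapting the number of splits to the decoder's parallelism.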

References (25)
  1. Eirikur Agustsson and Radu Timofte. 2017. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
  2. Performance Evaluations of C-Band 5G NR FR1 (Sub-6 GHz) Uplink MIMO on Urban Train. In 2023 IEEE Wireless Communications and Networking Conference (WCNC). 1–6. https://doi.org/10.1109/WCNC55385.2023.10118777
  3. Variational image compression with a scale hyperprior. In International Conference on Learning Representations.
  4. Yann Collet. 2023a. New Generation Entropy coders. Retrieved 2023-04-01 from https://github.com/Cyan4973/FiniteStateEntropy
  5. Yann Collet. 2023b. Zstandard - Real-time data compression algorithm. Retrieved 2023-04-03 from http://facebook.github.io/zstd/
  6. Sebastian Deorowicz. 2020. Silesia compression corpus. Retrieved 2023-04-10 from https://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
  7. Jarek Duda. 2009. Asymmetric numeral systems. arXiv:0902.0271 [cs.IT]
  8. Light Loss-Less Data Compression, with GPU Implementation, Vol. 10048. 281–294. https://doi.org/10.1007/978-3-319-49583-5_22
  9. Fabian Giesen. 2014. Interleaved entropy coders. arXiv:1402.3392 [cs.IT]
  10. Fabian Giesen. 2018. Simple rANS encoder/decoder (arithmetic coding-ish entropy coder). Retrieved 2023-04-10 from https://github.com/rygorous/ryg_rans
  11. Jeff Johnson. 2022. DietGPU: GPU-based lossless compression for numerical data. https://github.com/facebookresearch/dietgpu
  12. Joint Photographic Experts Group. 2022. JPEG - JPEG XL. Retrieved 2023-04-03 from https://jpeg.org/jpegxl/
  13. Ndzip-Gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (St. Louis, Missouri) (SC ’21). Association for Computing Machinery, New York, NY, USA, Article 93, 14 pages. https://doi.org/10.1145/3458817.3476224
  14. GST: GPU-Decodable Supercompressed Textures. ACM Trans. Graph. 35, 6, Article 230 (Dec. 2016), 10 pages. https://doi.org/10.1145/2980179.2982439
  15. Multistage Spatial Context Models for Learned Image Compression. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1–5. https://doi.org/10.1109/ICASSP49357.2023.10095875
  16. Matt Mahoney. 2023. Large Text Compression Benchmark. Retrieved 2023-04-03 from https://mattmahoney.net/dc/text.html
  17. Joint Autoregressive and Hierarchical Priors for Learned Image Compression. In Advances in Neural Information Processing Systems.
  18. An Architecture for Asymmetric Numeral Systems Entropy Decoder - A Comparison with a Canonical Huffman Decoder. J. Signal Process. Syst. 91, 7 (Jul. 2019), 805–817. https://doi.org/10.1007/s11265-018-1421-4
  19. NVIDIA. 2023a. NVCOMP. Retrieved 2023-04-10 from https://developer.nvidia.com/nvcomp
  20. NVIDIA. 2023b. nvJPEG. Retrieved 2023-04-10 from https://developer.nvidia.com/nvjpeg
  21. Adnan Ozsoy and Martin Swany. 2011. CULZSS: LZSS Lossless Data Compression on CUDA. In 2011 IEEE International Conference on Cluster Computing. 403–411. https://doi.org/10.1109/CLUSTER.2011.52
  22. Parallel lossless data compression on the GPU. In 2012 Innovative Parallel Computing (InPar). 1–9. https://doi.org/10.1109/InPar.2012.6339599
  23. Massively-Parallel Lossless Data Decompression. In 2016 45th International Conference on Parallel Processing (ICPP). 242–247. https://doi.org/10.1109/ICPP.2016.35
  24. André Weißenberger and Bertil Schmidt. 2018. Massively Parallel Huffman Decoding on GPUs. In Proceedings of the 47th International Conference on Parallel Processing (Eugene, OR, USA) (ICPP ’18). Association for Computing Machinery, New York, NY, USA, Article 27, 10 pages. https://doi.org/10.1145/3225058.3225076
  25. André Weißenberger and Bertil Schmidt. 2019. Massively Parallel ANS Decoding on GPUs. In Proceedings of the 48th International Conference on Parallel Processing (Kyoto, Japan) (ICPP ’19). Association for Computing Machinery, New York, NY, USA, Article 100, 10 pages. https://doi.org/10.1145/3337821.3337888