
HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression (2404.13372v1)

Published 20 Apr 2024 in eess.IV and cs.CV

Abstract: This paper investigates the challenging problem of learned image compression (LIC) at extremely low bitrates. Previous LIC methods that transmit quantized continuous features often yield blurry and noisy reconstructions due to severe quantization loss, while previous LIC methods based on learned codebooks that discretize the visual space usually give poor-fidelity reconstructions because the limited codewords lack the representational power to capture faithful details. We propose a novel dual-stream framework, HybridFlow, which combines a continuous-feature-based stream and a codebook-based stream to achieve both high perceptual quality and high fidelity at extremely low bitrates. The codebook-based stream benefits from high-quality learned codebook priors to provide clarity in the reconstructed images, while the continuous-feature stream aims to preserve fidelity details. To reach ultra-low bitrates, a masked token-based transformer is further proposed: only a masked portion of the codeword indices is transmitted, and the missing indices are recovered through token generation guided by information from the continuous-feature stream. We also develop a bridging correction network that merges the two streams during pixel decoding for final image reconstruction, where the continuous-stream features rectify biases of the codebook-based pixel decoder to restore fidelity details. Experimental results demonstrate superior performance across several datasets at extremely low bitrates, compared with existing single-stream codebook-based or continuous-feature-based LIC methods.
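The core bitrate-saving idea in the abstract — transmit only a masked subset of codeword indices and regenerate the rest decoder-side — can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the function names, the random masking policy, and the placeholder fill (where HybridFlow instead runs a token-generating transformer guided by the continuous stream) are all illustrative.

```python
import random

def mask_indices(indices, keep_ratio, seed=0):
    """Select the subset of codeword indices that would actually be
    entropy-coded and transmitted; the remaining positions are masked.
    Returns a dict mapping kept position -> codeword index."""
    rng = random.Random(seed)
    n = len(indices)
    keep = max(1, int(n * keep_ratio))
    kept_pos = rng.sample(range(n), keep)
    return {p: indices[p] for p in kept_pos}

def recover(transmitted, n, fill_token=0):
    """Decoder-side reconstruction: transmitted positions are copied
    verbatim; masked positions are filled by a stub (in the paper, a
    masked token-based transformer generates these indices)."""
    return [transmitted.get(p, fill_token) for p in range(n)]

# 64 tokens drawn from a hypothetical 1024-entry codebook.
rng = random.Random(1)
indices = [rng.randrange(1024) for _ in range(64)]
tx = mask_indices(indices, keep_ratio=0.25)   # send only 25% of indices
rec = recover(tx, len(indices))
```

Sending 25% of the indices cuts the token payload to a quarter; the quality of the regenerated 75% then depends entirely on how well the token generator is conditioned, which is where the paper's continuous-feature guidance comes in.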

Authors (6)
  1. Lei Lu (55 papers)
  2. Yanyue Xie (12 papers)
  3. Wei Jiang (341 papers)
  4. Wei Wang (1793 papers)
  5. Xue Lin (92 papers)
  6. Yanzhi Wang (197 papers)
Citations (1)