
Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression (2311.13846v2)

Published 23 Nov 2023 in cs.CV, cs.IT, and math.IT

Abstract: In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the assistance of the Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use the LPM to extract prompts from input images at the encoder side and from hidden features at the decoder side; these prompts are fed as additional information into the Swin Transformer layers of a pre-trained transformer-based image compression model, influencing the allocation of attention regions and bits and thereby changing the model's target compression ratio. To keep the network lightweight, we build the prompt networks from only a few convolutional layers. Extensive experiments show that, compared to methods based on multiple models optimized separately for different target rates, the proposed method achieves the same performance with 80% savings in parameter storage and 90% savings in datasets. Meanwhile, our model outperforms all current variable-rate image compression methods in rate-distortion performance and approaches the state-of-the-art fixed-rate image compression methods trained from scratch.
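The abstract describes the core mechanism in prose: a small convolutional prompt network conditions frozen Swin Transformer layers on the input, steering attention and bit allocation. Below is a minimal, self-contained PyTorch sketch of that idea. It assumes a simplified (non-windowed) attention layer, and the names `LayerAdaptivePrompt` and `PromptedTransformerLayer` are illustrative; this is not the authors' implementation, and the actual compression backbone, entropy model, and layer-wise prompt insertion are omitted.

```python
# Sketch of layer-adaptive prompt injection for a frozen transformer layer.
# Assumptions (not from the paper's code): non-windowed attention, a single
# layer, and prompts prepended only as extra keys/values.
import torch
import torch.nn as nn


class LayerAdaptivePrompt(nn.Module):
    """Maps an image (encoder side) or hidden feature map (decoder side)
    to a small set of prompt tokens, using only a few conv layers to stay
    lightweight, as the paper emphasizes."""

    def __init__(self, in_ch: int, dim: int, num_prompts: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),  # (B, dim, 1, 1)
        )
        self.to_tokens = nn.Linear(dim, num_prompts * dim)
        self.num_prompts, self.dim = num_prompts, dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = self.net(x).flatten(1)  # (B, dim)
        return self.to_tokens(pooled).view(-1, self.num_prompts, self.dim)


class PromptedTransformerLayer(nn.Module):
    """A (conceptually frozen) pre-trained attention layer that also attends
    over prompt tokens, biasing attention and hence bit allocation."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
        # Prepend prompts as extra keys/values; only the image tokens are
        # carried forward, so the output sequence length is unchanged.
        kv = torch.cat([prompts, tokens], dim=1)
        out, _ = self.attn(self.norm(tokens), self.norm(kv), self.norm(kv))
        return tokens + out


if __name__ == "__main__":
    B, C, H, W, dim = 2, 3, 64, 64, 96
    image = torch.randn(B, C, H, W)
    tokens = torch.randn(B, (H // 8) * (W // 8), dim)  # hypothetical token grid
    prompts = LayerAdaptivePrompt(C, dim)(image)
    out = PromptedTransformerLayer(dim)(tokens, prompts)
    print(out.shape)  # torch.Size([2, 64, 96])
```

Because only the prompt network is trained while the backbone stays frozen, a different prompt network (or prompt input) per target rate plausibly yields the paper's parameter-storage savings relative to training one full model per rate.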

Authors (7)
  1. Shiyu Qin (5 papers)
  2. Yimin Zhou (8 papers)
  3. Jinpeng Wang (48 papers)
  4. Bin Chen (547 papers)
  5. Baoyi An (8 papers)
  6. Tao Dai (57 papers)
  7. Shu-Tao Xia (171 papers)
Citations (1)
