Bridging the gap between image coding for machines and humans (2401.10732v1)

Published 19 Jan 2024 in eess.IV and cs.CV

Abstract: Image coding for machines (ICM) aims to reduce the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important that the compression process does not drastically degrade visual quality. Recent work on neural network (NN) based ICM codecs has shown significant coding gains over traditional methods; however, the decompressed images, especially at low bitrates, often contain checkerboard artifacts. We propose an effective decoder finetuning scheme based on adversarial training that significantly enhances the visual quality of ICM codecs while preserving machine analysis accuracy, without adding extra bitcost or parameters at the inference phase. The results show complete removal of the checkerboard artifacts at a negligible cost: a -1.6% relative change in task performance score. In cases where some artifacts are tolerable, such as when machine consumption is the primary target, this technique can enhance both pixel-fidelity and feature-fidelity scores without losing task performance.
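
As a small illustration of how a figure such as the -1.6% relative change can be read, the sketch below computes a relative change in percent between a baseline score and a finetuned score. The function name `relative_change` and both score values are hypothetical, chosen only to show the arithmetic; they are not taken from the paper, and the paper's exact metric definition is not given in the abstract.

```python
def relative_change(baseline: float, finetuned: float) -> float:
    """Return the change of `finetuned` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (finetuned - baseline) / baseline * 100.0


# Hypothetical task scores, used only to illustrate the arithmetic.
# They are NOT results reported in the paper.
baseline_score = 0.500   # assumed score of the codec before decoder finetuning
finetuned_score = 0.492  # assumed score after decoder finetuning

change = relative_change(baseline_score, finetuned_score)
print(f"{change:.1f}%")
```

A negative value means the finetuned codec scores lower than the baseline on the task metric; a relative change is expressed as a fraction of the baseline score rather than as a difference in absolute points.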

Authors (8)
  1. Nam Le (15 papers)
  2. Honglei Zhang (32 papers)
  3. Francesco Cricri (22 papers)
  4. Ramin G. Youvalari (4 papers)
  5. Hamed Rezazadegan Tavakoli (6 papers)
  6. Emre Aksu (16 papers)
  7. Miska M. Hannuksela (6 papers)
  8. Esa Rahtu (78 papers)