
Image Coding for Machines with Edge Information Learning Using Segment Anything (2403.04173v3)

Published 7 Mar 2024 in cs.CV

Abstract: Image Coding for Machines (ICM) is an image compression technique for image recognition. This technique is essential due to the growing demand for image recognition AI. In this paper, we propose a method for ICM that focuses on encoding and decoding only the edge information of object parts in an image, which we call SA-ICM. This is a Learned Image Compression (LIC) model trained using edge information created by Segment Anything. Our method can be used for image recognition models with various tasks. SA-ICM is also robust to changes in input data, making it effective for a variety of use cases. Additionally, our method offers a privacy benefit: it removes human facial information on the encoder's side, thus protecting one's privacy. Furthermore, this LIC model training method can be used to train Neural Representations for Videos (NeRV), which is a video compression model. By training NeRV using edge information created by Segment Anything, it is possible to create a NeRV that is effective for image recognition (SA-NeRV). Experimental results confirm the advantages of SA-ICM, presenting the best performance in image compression for image recognition. We also show that SA-NeRV is superior to ordinary NeRV in video compression for machines. Code is available at https://github.com/final-0/SA-ICM.
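
The abstract does not spell out how the edge information is built or how it enters training, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the segmenter yields one boolean mask per object part, takes the union of the mask boundaries as the edge map, and evaluates distortion only on edge pixels. The function names `mask_boundary`, `edge_map`, and `masked_mse` are hypothetical, and the masked MSE is an assumed stand-in for whatever distortion term the paper actually uses.

```python
import numpy as np


def mask_boundary(mask: np.ndarray) -> np.ndarray:
    """Boundary pixels of one boolean mask (4-connectivity).

    A mask pixel is kept if at least one of its four neighbours lies
    outside the mask. Pixels beyond the image border count as outside,
    so mask pixels touching the border are treated as boundary.
    """
    m = mask.astype(bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    # Interior: the pixel and all four neighbours are inside the mask.
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]
        & padded[2:, 1:-1]
        & padded[1:-1, :-2]
        & padded[1:-1, 2:]
    )
    return m & ~interior


def edge_map(masks, shape) -> np.ndarray:
    """Union of the boundaries of all part masks (assumed edge definition)."""
    edges = np.zeros(shape, dtype=bool)
    for mask in masks:
        edges |= mask_boundary(mask)
    return edges


def masked_mse(original, reconstructed, edges) -> float:
    """Mean squared error restricted to edge pixels (assumed distortion term).

    Works for HxW and HxWxC arrays; returns 0.0 if the edge map is empty.
    """
    diff2 = (original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2
    selected = diff2[edges]
    return float(selected.mean()) if selected.size else 0.0


if __name__ == "__main__":
    # Toy stand-in for a segmenter output: one 3x3 square part in a 5x5 image.
    part = np.zeros((5, 5), dtype=bool)
    part[1:4, 1:4] = True
    edges = edge_map([part], part.shape)
    print(edges.astype(int))
```

In a learned codec such a distortion term would typically be combined with a rate term, but the weighting and exact form are not given in the abstract and are left out here.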

Authors
  1. Takahiro Shindo
  2. Kein Yamada
  3. Taiju Watanabe
  4. Hiroshi Watanabe
Citations (4)
