
Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision (2312.15622v1)

Published 25 Dec 2023 in cs.CV, cs.AI, and cs.MM

Abstract: The accelerated proliferation of visual content and the rapid development of machine vision technologies bring significant challenges in delivering visual data on a gigantic scale, which must be represented effectively to satisfy both human and machine requirements. In this work, we investigate how hierarchical representations derived from an advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are designed as the basic, middle, and enhanced layers, supporting machine intelligence and human visual perception in a progressive fashion. To achieve efficient compression, we propose the layer-wise scalable entropy transformer to reduce the redundancy between layers. Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized for machine analysis performance, human perception experience, and compression ratio. We validate the proposed paradigm's feasibility in face image compression. Extensive qualitative and quantitative experimental results demonstrate the superiority of the proposed paradigm over the latest compression standard, Versatile Video Coding (VVC), in terms of both machine analysis and human perception at extremely low bitrates ($<0.01$ bpp), offering new insights for human-machine collaborative compression.
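
To give a sense of scale for the $<0.01$ bpp regime mentioned in the abstract, the following minimal sketch converts a bits-per-pixel rate into a byte budget. The 1024x1024 resolution and the helper name `byte_budget` are assumptions chosen for illustration; the abstract does not state the image size used.

```python
def byte_budget(width: int, height: int, bpp: float) -> float:
    """Return the compressed size in bytes implied by a bits-per-pixel rate."""
    total_bits = width * height * bpp  # bits available for the whole image
    return total_bits / 8              # 8 bits per byte


# Hypothetical 1024x1024 face image; the resolution is an assumption,
# not a figure taken from the abstract.
budget = byte_budget(1024, 1024, 0.01)
print(f"{budget:.2f} bytes")
```

Under that assumed resolution, a rate of 0.01 bpp leaves on the order of a kilobyte for the entire image, which is why the abstract describes this regime as extremely low bitrate.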

Authors (7)
  1. Qi Mao
  2. Chongyu Wang
  3. Meng Wang
  4. Shiqi Wang
  5. Ruijie Chen
  6. Libiao Jin
  7. Siwei Ma