
Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing (2402.14398v1)

Published 22 Feb 2024 in cs.CV and cs.AI

Abstract: GAN-based image attribute editing first uses GAN inversion to project real images into the latent space of a GAN and then manipulates the corresponding latent codes. Recent inversion methods mainly use additional high-bit features to improve detail preservation, because low-bit codes cannot faithfully reconstruct source images and therefore lose details. However, during editing, existing works fail to accurately complement the lost details and suffer from poor editability. The main reason is that they inject all the lost details indiscriminately at one time, which inherently causes the position and quantity of details to overfit the source images, resulting in inconsistent content and artifacts in edited images. This work argues that details should be gradually injected into both the reconstruction and editing processes in a multi-stage, coarse-to-fine manner for better detail preservation and high editability. Therefore, a novel dual-stream framework is proposed to accurately complement details at each stage. The Reconstruction Stream embeds coarse-to-fine lost details into residual features and then adaptively adds them to the GAN generator. In the Editing Stream, residual features are accurately aligned by our Selective Attention mechanism and then injected into the editing process in a multi-stage manner. Extensive experiments show that the framework outperforms existing methods in both reconstruction accuracy and editing quality.
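
The editing-stream idea described in the abstract (align residual features to the edited content, then inject them stage by stage) can be illustrated with a toy sketch. This is not the paper's implementation: plain dot-product cross-attention stands in for the Selective Attention mechanism, whose details the abstract does not give, and the names `align_residuals`, `inject`, `edit_with_gradual_residuals`, and the `gate` parameter are hypothetical. Features are tiny lists of vectors rather than image feature maps, and the stages are processed independently, whereas in a real generator each stage would feed the next.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def align_residuals(edited_feats, recon_feats, residuals):
    """Assumed stand-in for Selective Attention: each edited position
    gathers residuals from reconstruction positions with similar features."""
    scale = math.sqrt(len(edited_feats[0]))
    dim = len(residuals[0])
    aligned = []
    for q in edited_feats:
        weights = softmax([dot(q, k) / scale for k in recon_feats])
        aligned.append(
            [sum(w * r[d] for w, r in zip(weights, residuals)) for d in range(dim)]
        )
    return aligned


def inject(feats, residuals, gate):
    """Add gated residuals to the features at one stage (hypothetical gate)."""
    return [[f + gate * r for f, r in zip(fv, rv)] for fv, rv in zip(feats, residuals)]


def edit_with_gradual_residuals(stages, gate=1.0):
    """stages: list of (edited_feats, recon_feats, residuals), coarse to fine.
    Residuals are aligned and injected separately at every stage instead of
    all at once."""
    outputs = []
    for edited, recon, res in stages:
        aligned = align_residuals(edited, recon, res)
        outputs.append(inject(edited, aligned, gate))
    return outputs
```

In this toy setting, if an edit swaps two positions in the feature list, the attention weights route each residual to the position where its matching content now sits, instead of adding it back at its original location.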

Authors (6)
  1. Hao Li
  2. Mengqi Huang
  3. Lei Zhang
  4. Bo Hu
  5. Yi Liu
  6. Zhendong Mao