Spatial-Contextual Discrepancy Information Compensation for GAN Inversion (2312.07079v1)
Abstract: Most existing GAN inversion methods either achieve accurate reconstruction but lack editability or offer strong editability at the cost of fidelity. Hence, how to balance the distortion-editability trade-off is a significant challenge for GAN inversion. To address this challenge, we introduce a novel spatial-contextual discrepancy information compensation-based GAN-inversion method (SDIC), which consists of a discrepancy information prediction network (DIPN) and a discrepancy information compensation network (DICN). SDIC follows a "compensate-and-edit" paradigm and successfully bridges the gap in image details between the original image and the reconstructed/edited image. On the one hand, DIPN encodes the multi-level spatial-contextual information of the original and initial reconstructed images and then predicts a spatial-contextual guided discrepancy map with two hourglass modules. In this way, a reliable discrepancy map that models the contextual relationship and captures fine-grained image details is learned. On the other hand, DICN incorporates the predicted discrepancy information into both the latent code and the GAN generator with different transformations, generating high-quality reconstructed/edited images. This effectively compensates for the loss of image details during GAN inversion. Both quantitative and qualitative experiments demonstrate that our proposed method achieves an excellent distortion-editability trade-off at a fast inference speed for both image inversion and editing tasks.
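The "compensate-and-edit" flow described above can be sketched in miniature. This is a hypothetical, toy illustration of the data flow only: the `encoder`, `generator`, `dipn`, and `sdic_invert` names, the list-based "images", and the compensation weighting are all illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of SDIC's "compensate-and-edit" flow.
# All names, shapes, and weights are illustrative assumptions,
# not the authors' code (which uses a StyleGAN backbone).

def encoder(image):
    # Stand-in for a StyleGAN encoder: image -> initial latent code.
    return [0.0] * len(image)

def generator(latent, compensation=None):
    # Stand-in for the generator; when provided, the compensation
    # is injected into intermediate features (DICN's second branch).
    base = [l + 1.0 for l in latent]
    if compensation is not None:
        base = [b + c for b, c in zip(base, compensation)]
    return base  # toy "image"

def dipn(original, reconstructed):
    # DIPN: predicts a spatial-contextual discrepancy map from the
    # original image and the initial reconstruction.
    return [o - r for o, r in zip(original, reconstructed)]

def sdic_invert(image):
    latent = encoder(image)             # initial inversion
    initial = generator(latent)         # initial reconstruction
    discrepancy = dipn(image, initial)  # details lost by inversion
    # DICN: compensate both the latent code and the generator
    # features with (different) transformations of the discrepancy.
    latent = [l + 0.1 * d for l, d in zip(latent, discrepancy)]
    return generator(latent, compensation=discrepancy)

out = sdic_invert([1.0, 2.0, 3.0, 4.0])
```

In this toy run the compensated output lands much closer to the input than the initial reconstruction does, mirroring the paper's claim that discrepancy compensation recovers image details lost by the encoder alone.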
Authors: Ziqiang Zhang, Yan Yan, Jing-Hao Xue, Hanzi Wang