3D-SSGAN: Lifting 2D Semantics for 3D-Aware Compositional Portrait Synthesis (2401.03764v1)
Abstract: Existing 3D-aware portrait synthesis methods can generate impressive high-quality images while preserving strong 3D consistency. However, most of them do not support fine-grained part-level control over the synthesized images. Conversely, some GAN-based 2D portrait synthesis methods achieve clear disentanglement of facial regions, but they cannot preserve view consistency because they lack 3D modeling abilities. To address these issues, we propose 3D-SSGAN, a novel framework for 3D-aware compositional portrait image synthesis. First, a simple yet effective depth-guided 2D-to-3D lifting module maps the generated 2D part features and semantics to 3D. Then, a volume renderer with a novel 3D-aware semantic mask renderer is used to produce the composed face features and corresponding masks. The whole framework is trained end-to-end by discriminating between real and synthesized 2D images and their semantic masks. Quantitative and qualitative evaluations demonstrate the superiority of 3D-SSGAN in controllable part-level synthesis while preserving 3D view consistency.
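To make the rendering step in the abstract more concrete, the following is a minimal sketch of how several semantic parts could be composited along one camera ray to give both a composed feature and a per-part mask value. It is not the authors' implementation: it assumes the standard NeRF quadrature for volume rendering and a density-weighted composition of parts (in the spirit of compositional feature fields), and the function name `render_ray`, its arguments, and the use of scalar features are all hypothetical choices made for illustration. The depth-guided 2D-to-3D lifting module is not modeled here.

```python
import math


def render_ray(part_sigmas, part_feats, deltas):
    """Composite K semantic parts along a single ray (illustrative only).

    part_sigmas: K lists of N non-negative densities, one list per part.
    part_feats:  K lists of N scalar features, one list per part.
    deltas:      N distances between adjacent samples on the ray.
    Returns (feature, masks), where masks holds one value per part.
    """
    num_parts = len(part_sigmas)
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    feature = 0.0
    masks = [0.0] * num_parts

    for i, delta in enumerate(deltas):
        # Assumption: the composed density is the sum of the part densities.
        sigma = sum(part_sigmas[k][i] for k in range(num_parts))
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha  # standard volume rendering weight

        if sigma > 0.0:
            for k in range(num_parts):
                # Each part receives the weight in proportion to its density.
                share = part_sigmas[k][i] / sigma
                feature += weight * share * part_feats[k][i]
                masks[k] += weight * share

        transmittance *= 1.0 - alpha

    return feature, masks


if __name__ == "__main__":
    # Two hypothetical parts: part 0 sits in front of part 1 on this ray.
    sigmas = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
    feats = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
    feature, masks = render_ray(sigmas, feats, [1.0, 1.0, 1.0])
    print(feature, masks)
```

Under these assumptions, the mask values for a ray sum to its total opacity, and a part that lies in front of another receives most of the weight, which is the behavior a 3D-aware semantic mask needs in order to stay consistent across views.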