Few-shot Image Generation via Masked Discrimination (2210.15194v3)
Abstract: Few-shot image generation aims to generate high-quality, diverse images from limited data. However, modern GANs overfit easily when trained on only a few images: the discriminator can memorize all the training samples and guide the generator to replicate them, which severely degrades diversity. Several methods have been proposed to relieve overfitting by adapting GANs pre-trained on large source domains to target domains using limited real samples. This work presents a novel approach that realizes few-shot GAN adaptation via masked discrimination. Random masks are applied to the features that the discriminator extracts from input images. The aim is to encourage the discriminator to judge as realistic a variety of images that share some features with the training samples. Correspondingly, the generator is guided to generate diverse images instead of replicating training samples. In addition, we employ a cross-domain consistency loss for the discriminator to keep the relative distances between generated samples in its feature space. This loss strengthens global image discrimination and guides adapted GANs to preserve more information learned from source domains, yielding higher image quality. The effectiveness of our approach is demonstrated both qualitatively and quantitatively: it achieves higher quality and greater diversity than prior methods on a series of few-shot image generation tasks.
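The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mask shape, the mask ratio, which discriminator layers are masked, or the exact form of the consistency loss. The sketch assumes a per-location spatial mask shared across channels and a softmax-over-cosine-similarity loss compared with KL divergence; the function names (`random_feature_mask`, `apply_mask`, `consistency_loss`) and the `mask_ratio` parameter are hypothetical.

```python
import math
import random


def random_feature_mask(height, width, mask_ratio, rng):
    """Return an H x W binary mask with round(mask_ratio * H * W) zeros.

    Assumption: positions are masked independently of image content.
    """
    n = height * width
    masked = set(rng.sample(range(n), round(mask_ratio * n)))
    return [[0.0 if (r * width + c) in masked else 1.0 for c in range(width)]
            for r in range(height)]


def apply_mask(features, mask):
    """Multiply every channel of a C x H x W feature map by one spatial mask."""
    return [[[v * m for v, m in zip(row, mask_row)]
             for row, mask_row in zip(channel, mask)]
            for channel in features]


def cosine(u, v):
    """Cosine similarity between two flat feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def similarity_distribution(feats, i):
    """Softmax over similarities between sample i and every other sample."""
    sims = [cosine(feats[i], f) for j, f in enumerate(feats) if j != i]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(source_feats, target_feats):
    """Mean KL divergence between target and source similarity distributions.

    Assumption: the KL direction and the softmax form are illustrative only.
    """
    loss = 0.0
    for i in range(len(source_feats)):
        p = similarity_distribution(source_feats, i)
        q = similarity_distribution(target_feats, i)
        loss += sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return loss / len(source_feats)
```

In use, the discriminator would score `apply_mask(features, mask)` rather than the full feature map, so an image that matches a training sample only on the unmasked locations can still be judged realistic. The consistency term is zero when the pairwise similarity structure of generated samples is identical in the source and adapted feature spaces, and grows as that structure drifts.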
Authors: Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan