Large Scale Image Completion via Co-Modulated Generative Adversarial Networks (2103.10428v1)

Published 18 Mar 2021 in cs.CV, cs.GR, and cs.LG

Abstract: Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation. Code is available at https://github.com/zsyzzsoft/co-mod-gan.

Citations (266)

Summary

  • The paper introduces co-modulated GANs that integrate conditional inputs with stochastic style representations to improve image completion for large missing areas.
  • It employs a dual architecture based on StyleGAN2, using a conditional encoder and mapping network to generate diverse and realistic outputs.
  • The proposed P-IDS/U-IDS metrics reliably assess perceptual fidelity in line with human perception, and the method outperforms state-of-the-art baselines.

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

The paper "Large Scale Image Completion via Co-Modulated Generative Adversarial Networks" presents an innovative approach to the longstanding challenge of completing images with large missing regions using GAN architectures. The researchers propose a novel method known as co-modulated GANs that bridges the gap between image-conditional and modulated unconditional generative architectures, demonstrating notable improvements over existing frameworks.

Methodology

Existing conditional GANs struggle with large missing regions because of their limited generative capability. The authors address this by introducing a co-modulation approach, which integrates conditional inputs with stochastic style representations, enhancing both the diversity and the quality of the generated content.

The model's architecture builds upon StyleGAN2, embedding a conditional encoder $\mathcal{E}$ alongside a mapping network $\mathcal{M}$ to produce a style vector that incorporates both the input image and the latent vector. This dual path introduces inherent stochasticity and yields varied outputs without requiring additional losses.
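As a concrete illustration of co-modulation, the following PyTorch sketch forms the style vector as a learned joint affine mapping over the encoder's global feature $\mathcal{E}(y)$ and the mapped latent $w = \mathcal{M}(z)$; this style vector then drives the modulated convolutions as in StyleGAN2. Module names and dimensions are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CoModulation(nn.Module):
    """Joint affine mapping over encoder features and the mapped latent.

    Sketch only: the actual generator applies the resulting style vector
    to every modulated convolution, as in StyleGAN2.
    """
    def __init__(self, enc_dim: int, w_dim: int, style_dim: int):
        super().__init__()
        # Learned affine mapping A over the concatenated representations.
        self.affine = nn.Linear(enc_dim + w_dim, style_dim)

    def forward(self, enc_feat: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # enc_feat: (B, enc_dim) global features from the conditional encoder E(y)
        # w:        (B, w_dim)   style latent from the mapping network M(z)
        return self.affine(torch.cat([enc_feat, w], dim=1))

# Example: combine a 512-d encoder feature with a 512-d latent into one style vector.
co_mod = CoModulation(enc_dim=512, w_dim=512, style_dim=512)
style = co_mod(torch.randn(4, 512), torch.randn(4, 512))  # shape (4, 512)
```

Because the latent branch survives even when the conditional input dominates, sampling different $z$ for the same masked image produces genuinely different completions.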

Proposed Metrics

A critical contribution of this paper is the Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which addresses the previously unmet need for robust quantitative assessment of image completion. Both metrics measure the linear separability of real and completed images in an Inception feature space: a linear classifier is fit to distinguish the two sets, and the harder they are to separate, the higher the score. The authors show that these metrics are scalable, reliable, and well aligned with human perception.
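A minimal sketch of how such scores can be computed, assuming Inception features have already been extracted for paired real and completed images; the use of scikit-learn's LinearSVC and the regularization constant are illustrative assumptions rather than the paper's exact setup:

```python
import numpy as np
from sklearn.svm import LinearSVC

def p_ids_u_ids(real_feats: np.ndarray, fake_feats: np.ndarray):
    """Compute P-IDS/U-IDS-style scores from paired (N, D) feature arrays.

    Rows of real_feats and fake_feats are assumed to correspond pairwise
    (same image, real vs. completed). Higher scores mean the completions
    are harder to tell apart from the real images.
    """
    X = np.concatenate([real_feats, fake_feats])
    y = np.concatenate([np.ones(len(real_feats)), np.zeros(len(fake_feats))])
    svm = LinearSVC(C=0.001).fit(X, y)  # linear decision boundary in feature space

    real_scores = svm.decision_function(real_feats)  # > 0 predicts "real"
    fake_scores = svm.decision_function(fake_feats)

    # U-IDS: average misclassification rate of the linear classifier.
    u_ids = 0.5 * np.mean(real_scores < 0) + 0.5 * np.mean(fake_scores > 0)
    # P-IDS: fraction of pairs where the completion scores more "real"
    # than its corresponding real image.
    p_ids = np.mean(fake_scores > real_scores)
    return p_ids, u_ids
```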

Experimental Results

The paper presents comprehensive experiments across datasets including FFHQ and Places2, showcasing superior performance in image completion and image-to-image translation tasks. In terms of P-IDS, U-IDS, and FID, co-modulated GANs outperform state-of-the-art methods such as DeepFillv2 and RFR, producing clearer, more consistent completions of large missing areas.

Further experiments on datasets like Edges2Shoes and COCO-Stuff demonstrated the effectiveness of the proposed model in both fidelity and diversity, highlighting the model’s versatility in handling different types of image translation tasks.

Implications and Future Work

The work presented in this paper has significant theoretical and practical implications. By successfully introducing stochasticity into conditional GAN architectures, the model not only markedly improves image completion but also opens avenues for other applications in image-to-image translation. Future work could explore co-modulation in other generative tasks, potentially enhancing the capability of GANs across diverse domains.

The introduction of P-IDS/U-IDS could also shift GAN evaluation standards, offering metrics better aligned with human judgments of visual fidelity.

In conclusion, this paper provides a solid foundation and pathway for advancing methods in visual content synthesis, leveraging both conditional and unconditional generative strengths to tackle large-scale completion challenges effectively.