- The paper presents a U-Net based discriminator that leverages segmentation principles to provide both global and detailed per-pixel feedback for improved image coherence in GANs.
- It achieves an average improvement of 2.7 FID points over a BigGAN baseline on FFHQ, CelebA, and a COCO-Animals dataset, demonstrating its effectiveness over conventional discriminator designs.
- The approach requires no modifications to the generator architecture, ensuring compatibility with various GAN frameworks and facilitating future advancements.
A U-Net Based Discriminator for Generative Adversarial Networks
The paper presents a novel approach to enhancing the performance of Generative Adversarial Networks (GANs) by redesigning the discriminator as a U-Net. The primary motivation behind this advancement is to address one of the significant challenges in GANs: synthesizing images that are coherent with real images both globally and locally, in structure as well as in texture.
Key Contributions
- U-Net Based Discriminator: The core of the proposed method is a U-Net architecture for the discriminator. This model extends traditional discriminator designs, which typically act as image-level classifiers, by incorporating principles from segmentation networks. The U-Net discriminator provides the generator with both a global realness score and detailed per-pixel feedback, potentially enhancing the quality of the generated images.
- Per-pixel Consistency Regularization: Leveraging the U-Net architecture's pixel-wise feedback, the authors introduce a regularization technique that encourages the discriminator to focus on semantic and structural differences between real and synthesized images. It builds on the CutMix data augmentation strategy, which pastes portions of fake images into real ones, guiding the discriminator towards more nuanced, locally grounded feedback.
- Performance Improvements: The proposed U-Net GAN model yields substantial gains in the Fréchet Inception Distance (FID) metric compared to the BigGAN baseline. Specifically, an average improvement of 2.7 FID points was achieved across the FFHQ, CelebA, and a new COCO-Animals dataset. Such results demonstrate that the nuanced global and local feedback facilitated by the U-Net discriminator translates into more realistic image generation.
- Orthogonal Enhancements: Importantly, the modifications to the discriminator do not necessitate changes to the generator architecture. This characteristic suggests that these innovations are compatible with a variety of GAN models and can complement other advances in generator architectures, divergence measures, and regularization techniques.
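The two-headed discriminator described above can be sketched at the shape level in NumPy. This is a toy illustration under stated simplifications: learned convolutions are replaced by fixed average pooling and nearest-neighbour upsampling, and the function name `unet_discriminator` is a placeholder, not the paper's implementation; only the structure of the outputs (one global logit plus one per-pixel logit map) follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def down(x):
    # 2x2 average pooling: halves spatial resolution (stand-in for an encoder block)
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up(x):
    # nearest-neighbour upsampling: doubles spatial resolution (stand-in for a decoder block)
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_discriminator(img, depth=3):
    """Toy U-Net discriminator: returns a global realness logit and a
    per-pixel realness map (learned weights omitted for brevity)."""
    skips, x = [], img
    for _ in range(depth):
        skips.append(x)        # remember encoder features for skip connections
        x = down(x)
    global_logit = float(x.mean())          # encoder head: bottleneck -> scalar
    for skip in reversed(skips):
        x = up(x) + skip                    # decoder step fused with its skip
    per_pixel_logits = x.mean(axis=-1)      # H x W map: one logit per pixel
    return global_logit, per_pixel_logits

img = rng.standard_normal((32, 32, 3))
g, m = unet_discriminator(img)
```

The key design point is that both heads share one encoder, so the per-pixel map is informed by the same features that produce the global decision.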
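The CutMix-based regularization can likewise be sketched in NumPy. This is a minimal sketch, not the paper's code: the helper `cutmix` is a hypothetical name, and the consistency loss itself (matching the decoder's output on the mixed image to the mixed per-pixel targets) is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(1)

def cutmix(real, fake, ratio):
    """Paste a box from `fake` into `real` and build the matching
    per-pixel target mask (1 = real pixel, 0 = fake pixel)."""
    h, w = real.shape[:2]
    cut = np.sqrt(1.0 - ratio)          # box side so the fake area is ~ (1 - ratio)
    bh, bw = int(h * cut), int(w * cut)
    y = rng.integers(0, h - bh + 1)
    x = rng.integers(0, w - bw + 1)
    mixed = real.copy()
    mixed[y:y + bh, x:x + bw] = fake[y:y + bh, x:x + bw]
    mask = np.ones((h, w))              # per-pixel real/fake target for the decoder head
    mask[y:y + bh, x:x + bw] = 0.0
    return mixed, mask

real = np.ones((16, 16, 3))             # stand-in for a real image
fake = np.zeros((16, 16, 3))            # stand-in for a generated image
ratio = float(rng.beta(1.0, 1.0))       # mixing ratio sampled as in CutMix
mixed, mask = cutmix(real, fake, ratio)
# consistency target: the decoder's per-pixel output on `mixed` should
# agree with `mask`, keeping the discriminator's feedback locally grounded
```

Training on such mixed images with per-pixel targets is what pushes the discriminator toward semantic and structural cues rather than shortcuts.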
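For context on the reported gains, FID measures the Fréchet distance between Gaussian fits to feature embeddings of real and generated images. A small NumPy sketch of the formula, run here on synthetic features rather than Inception activations, with a symmetric-eigendecomposition matrix square root as an implementation choice:

```python
import numpy as np

def sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)     # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(feats_real, feats_gen):
    """FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2})."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) via the symmetric form (S_g^{1/2} S_r S_g^{1/2})^{1/2},
    # which has the same trace
    root_g = sqrtm_psd(s_g)
    covmean = sqrtm_psd(root_g @ s_r @ root_g)
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(s_r + s_g - 2.0 * covmean))

rng = np.random.default_rng(2)
a = rng.standard_normal((500, 8))       # stand-in "real" features
b = rng.standard_normal((500, 8)) + 0.5 # shifted "generated" features
```

Lower is better: identical feature distributions give an FID near zero, so a 2.7-point average drop indicates generated statistics moving measurably closer to the real ones.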
Implications and Future Directions
The introduction of U-Net based discriminators is a promising direction for improving GAN performance. This approach addresses the balance between global structural realism and local detail, a common issue in conventional GANs where discriminators often prioritize one aspect over the other. By reducing this imbalance, the U-Net architecture allows for the synthesis of more coherent images and has implications for numerous applications in computer vision, from facial image generation to complex scene synthesis.
Future research could investigate the integration of U-Net discriminators with other state-of-the-art generator models beyond BigGAN, such as StyleGAN, to explore further improvements. Additionally, the versatility of the U-Net's pixel-level feedback could be extended to other types of generative models or tasks requiring fine-grained structural feedback.
In conclusion, the proposed U-Net GAN model contributes significantly to generative modeling by blending segmentation techniques with adversarial learning. The result is improved image generation quality and a new avenue for research in high-fidelity synthetic image synthesis.