- The paper introduces CycleGAN, a framework that uses adversarial and cycle consistency losses to enable unpaired image-to-image translation while preserving essential content.
- The method employs dual generators and discriminators to achieve bidirectional mapping, resulting in visually realistic and semantically consistent image transformations.
- Quantitative and perceptual evaluations demonstrate CycleGAN's effectiveness relative to prior methods on tasks such as style transfer, object transfiguration, and season transfer.
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Abstract and Introduction
The paper presents a novel approach for unpaired image-to-image translation, leveraging Cycle-Consistent Adversarial Networks (CycleGAN). Traditional image-to-image translation methods require paired training data, which is often difficult or impractical to obtain. The authors circumvent this requirement by introducing an approach that learns to translate images from a source domain X to a target domain Y without paired examples. This is achieved by combining an adversarial loss with a cycle consistency loss, which compels the network to preserve the essential content of input images during translation.
Methodology
Adversarial Loss and Cycle Consistency
The proposed CycleGAN architecture consists of two generators, $G: X \rightarrow Y$ and $F: Y \rightarrow X$, each paired with a discriminator, $D_Y$ and $D_X$ respectively. The adversarial loss ensures that the output from the generators is indistinguishable from real images in the target domain. Mathematically, for $G$ and $D_Y$, the adversarial loss is expressed as:
$$\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]$$
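For concreteness, below is a minimal PyTorch-style sketch of this adversarial term, assuming `G` and `D_Y` are callables (e.g. `nn.Module` instances) and that `D_Y` outputs probabilities in (0, 1); the names and signatures are illustrative rather than taken from the paper's reference implementation, which in practice replaces this log-likelihood objective with a least-squares loss for training stability.

```python
import torch

def adversarial_loss(G, D_Y, real_x, real_y, eps=1e-8):
    """Log-likelihood form of L_GAN(G, D_Y, X, Y) as written above.

    Assumes D_Y returns probabilities in (0, 1). Treat this as a
    transcription of the formula, not the authors' training recipe.
    """
    fake_y = G(real_x)            # translated image G(x)
    d_real = D_Y(real_y)          # D_Y(y): score for real target images
    d_fake = D_Y(fake_y)          # D_Y(G(x)): score for translated images
    # Expectations approximated by batch means; eps guards against log(0).
    return torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean()
```

In the resulting min-max game, $D_Y$ is trained to maximize this quantity while $G$ is trained to minimize it.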
Given that adversarial training alone is under-constrained and prone to mode collapse, the authors introduce a cycle consistency loss to regularize the mappings. This loss ensures that translating an image to the target domain and back to the source domain reconstructs the original image, expressed as:
$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$
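A corresponding sketch of the cycle consistency term, under the same assumptions about `G` and `F` as above; note that `l1_loss` with its default mean reduction averages over pixels, which differs from the per-image expectation in the formula only by a constant factor.

```python
import torch.nn.functional as nnf  # aliased to avoid clashing with the generator F

def cycle_consistency_loss(G, F, real_x, real_y):
    """L1 reconstruction loss L_cyc(G, F) matching the equation above."""
    forward_cycle = F(G(real_x))    # x -> G(x) -> F(G(x)), should recover x
    backward_cycle = G(F(real_y))   # y -> F(y) -> G(F(y)), should recover y
    return nnf.l1_loss(forward_cycle, real_x) + nnf.l1_loss(backward_cycle, real_y)
```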
The combined objective thus becomes:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda\,\mathcal{L}_{\text{cyc}}(G, F)$$

where $\lambda$ weights the cycle consistency term relative to the adversarial terms. The generators $G$ and $F$ are trained to minimize this objective, while the discriminators $D_X$ and $D_Y$ are trained to maximize it.
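Putting the pieces together, here is a sketch of the combined objective that reuses the `adversarial_loss` and `cycle_consistency_loss` helpers above; the default `lam=10.0` mirrors the weight λ = 10 reported in the paper's experiments, but the function itself is an illustration, not the authors' training code.

```python
def full_objective(G, F, D_X, D_Y, real_x, real_y, lam=10.0):
    """Combined objective L(G, F, D_X, D_Y) from the equation above."""
    loss_gan_g = adversarial_loss(G, D_Y, real_x, real_y)    # G: X -> Y, judged by D_Y
    loss_gan_f = adversarial_loss(F, D_X, real_y, real_x)    # F: Y -> X, judged by D_X
    loss_cycle = cycle_consistency_loss(G, F, real_x, real_y)
    return loss_gan_g + loss_gan_f + lam * loss_cycle
```

In an actual training loop the generator and discriminator parameters are updated in alternation, with the discriminators ascending and the generators descending on their respective terms.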
Results and Evaluation
The authors validate the efficacy of their approach through a series of experiments on different unpaired image translation tasks. These include style transfer between photographs and artistic paintings, object transfiguration, and season transfer. The qualitative results confirm that CycleGAN can produce high-quality and semantically meaningful translations without paired training data.
In quantitative evaluations, the performance of CycleGAN is benchmarked using both perceptual studies and automatic metrics. For instance, in the Cityscapes labels-to-photo task, CycleGAN achieves superior FCN scores compared to baseline methods, indicating better semantic consistency in the generated images. The AMT perceptual studies further reveal that CycleGAN-generated images are significantly more realistic than those produced by competing approaches such as CoGAN and SimGAN.
Implications and Future Technologies
The practical implications of CycleGAN are profound, particularly in domains where paired data is scarce. Applications of this research extend to artistic content creation, data augmentation for machine learning, and enhancements in virtual reality. Theoretically, this work emphasizes the importance of structured loss functions in adversarial training, showcasing how additional constraints like cycle consistency can stabilize training and produce more reliable models.
Future developments in AI could see the adoption of similar cycle-consistent approaches in areas requiring significant stylistic or semantic transformations. Furthermore, improvements in generator architectures could address current limitations related to geometric transformations, broadening the applicability of these methods. An intriguing area for future investigation is the integration of weak or semi-supervised learning signals to improve performance and resolve ambiguities in translation tasks.
Conclusion
The paper introduces a robust framework for unpaired image-to-image translation, effectively harnessing adversarial networks with cycle consistency losses. The CycleGAN model demonstrates competitive results across various challenging tasks, validating the potential of unpaired translation methods. While limitations exist, particularly concerning geometric changes, the proposed approach provides a solid foundation for future exploration and application in both academic and industrial settings.