
Unsupervised Cross-Domain Image Generation (1611.02200v1)

Published 7 Nov 2016 in cs.CV

Abstract: We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

Citations (982)

Summary

  • The paper presents a novel DTN framework that transfers images across domains using a composite loss function combining multiclass GAN, f-constancy, and regularization terms.
  • The approach achieves 90.66% accuracy for SVHN to MNIST transfer and 84.44% for unsupervised domain adaptation, outperforming existing methods.
  • The research demonstrates enhanced image synthesis by mapping photos to emojis with higher fidelity and semantic consistency compared to human-crafted designs.

Unsupervised Cross-Domain Image Generation

This paper, presented by Yaniv Taigman, Adam Polyak, and Lior Wolf from Facebook AI Research, tackles the unsupervised cross-domain image generation problem. The central goal is to transfer a sample from one domain (S) to an analogous sample in another domain (T) using a generative function G, ensuring the output remains consistent under a common function f.

Summary

The proposed framework, termed the Domain Transfer Network (DTN), integrates a generative adversarial network (GAN) loss, an f-constancy term, and a regularizing term to facilitate unsupervised training. The DTN uses a composite loss function to ensure that a generated sample G(x) from domain S matches the distribution of domain T while satisfying f(x) ≈ f(G(x)). This methodology is tested on visual domains, specifically digit datasets (SVHN to MNIST) and face photos mapped to emoji.

Methodology

The DTN architecture distinguishes itself through several key innovations:

  1. Multiclass GAN Loss: Unlike the typical binary GAN loss, this work employs a multiclass GAN loss to handle samples from both the S and T domains comprehensively.
  2. f-Constancy Term: Ensures that f(x) and f(G(x)) are close, maintaining the original feature representation across domains.
  3. Regularization Term: Encourages the identity mapping G(x) = x for samples in domain T.

The network's training incorporates these three loss components to iteratively refine G. For instance, to preserve feature consistency across the transfer, a loss term penalizes deviations between f(x) and f(G(x)) for samples in S, as sketched below.
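Concretely, the generator objective combines the three terms above with scalar weights. The following is a minimal PyTorch-style sketch of how such a compound loss could be assembled; the networks G, D, and f, the three-way discriminator labeling, and the weights alpha and beta are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def dtn_generator_loss(G, D, f, x_s, x_t, alpha=1.0, beta=1.0):
    """Sketch of a DTN-style compound generator loss.

    Assumed inputs (hypothetical, provided elsewhere):
      G   - generator mapping images from S or T into domain T
      D   - discriminator emitting 3 logits per image
            (0 = G(source), 1 = G(target), 2 = real target)
      f   - fixed feature network accepting inputs from either domain
      x_s, x_t - unlabeled batches from S and T
    """
    g_s, g_t = G(x_s), G(x_t)

    # Adversarial term: push both G(x_s) and G(x_t) toward the "real T" class (index 2).
    real_s = torch.full((x_s.size(0),), 2, dtype=torch.long, device=x_s.device)
    real_t = torch.full((x_t.size(0),), 2, dtype=torch.long, device=x_t.device)
    l_gan = F.cross_entropy(D(g_s), real_s) + F.cross_entropy(D(g_t), real_t)

    # f-constancy term: keep f(x) and f(G(x)) close for source samples (f is held fixed).
    l_const = F.mse_loss(f(g_s), f(x_s).detach())

    # Identity regularizer: G should map target-domain samples to themselves.
    l_tid = F.mse_loss(g_t, x_t)

    return l_gan + alpha * l_const + beta * l_tid
```

The discriminator is trained with the opposing multiclass objective (classifying G(source), G(target), and real target samples into their respective classes), alternating with generator updates as in standard GAN training.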

Experimental Results

Digits: SVHN to MNIST

By transferring SVHN digit images to the MNIST domain, the DTN achieves a high classification accuracy of 90.66% on the generated MNIST-like images, significantly outperforming the baseline method. The paper further examines the impact of omitting specific digit classes during training, demonstrating DTN's robustness in generating plausible samples even for unseen classes.
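One common way to obtain such an accuracy figure, and a plausible reading of the protocol here, is to run a classifier trained on real MNIST over the transferred images and score its predictions against the original SVHN labels. The sketch below assumes a trained generator `G`, a pre-trained `mnist_clf`, and a labeled `svhn_loader`, all hypothetical names.

```python
import torch

@torch.no_grad()
def transfer_accuracy(G, mnist_clf, svhn_loader, device="cpu"):
    """Classify DTN-transferred SVHN images with an MNIST-trained classifier
    and compare predictions against the original SVHN labels."""
    correct, total = 0, 0
    for x, y in svhn_loader:
        x, y = x.to(device), y.to(device)
        fake_mnist = G(x)                          # SVHN image mapped into the MNIST domain
        pred = mnist_clf(fake_mnist).argmax(dim=1) # predicted digit on the generated image
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```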

The authors also utilize DTN for unsupervised domain adaptation, achieving superior results (84.44% accuracy) compared to state-of-the-art methods like Domain-Adversarial Neural Networks (DANN) which achieved 73.85%.

Faces: Photos to Emoji

The DTN is applied to transfer face photos to emoji representations. The authors compare the results to emojis crafted by human annotators, finding that the DTN-generated emojis are more distinctive and preserve facial characteristics significantly better. A quantitative evaluation using a retrieval task shows a median rank of 16 for DTN-generated emojis, compared to 16,311 for manually created emojis, indicating a substantial improvement in recognizability and fidelity.
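A median-rank score of this kind can be computed by ranking the gallery of photos by feature distance to each generated emoji and locating the true identity in that ranking. The helper below is a hypothetical NumPy illustration; the feature representation and distance measure used by the authors are not reproduced here.

```python
import numpy as np

def median_retrieval_rank(query_feats, gallery_feats, true_idx):
    """query_feats:   (N, d) features of the generated emoji queries
    gallery_feats: (M, d) features of the photo gallery
    true_idx:      (N,) index of each query's true identity in the gallery
    Returns the median rank (1 = best) at which the true identity is retrieved."""
    ranks = []
    for q, t in zip(query_feats, true_idx):
        dists = np.linalg.norm(gallery_feats - q, axis=1)  # distance to every gallery photo
        order = np.argsort(dists)                          # closest first
        ranks.append(int(np.where(order == t)[0][0]) + 1)  # position of the true match
    return float(np.median(ranks))
```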

Implications and Future Developments

The implications of this research are multifaceted:

  1. Algorithmic Advancements: The novel compound loss function in DTN could inspire new architectures in cross-domain generative models, emphasizing unsupervised learning paradigms.
  2. Visual Domain Applications: The ability to transfer visual domain knowledge without labeled data has broad applications, from creating personalized emojis to various image synthesis tasks.
  3. Unsupervised Domain Adaptation: By demonstrating the efficacy of DTN in domain adaptation, this work opens up new avenues for improving the adaptability of models across diverse data distributions without requiring labeled target data.

Future developments might explore extending DTN to other modalities and more complex domains, such as medical imaging or text-to-image generation. Moreover, improving the scalability and computational efficiency of DTNs could make them more accessible for real-world applications.

Conclusion

The paper on unsupervised cross-domain image generation via Domain Transfer Networks presents a robust framework for mapping between visually distinct domains while maintaining semantic consistency. Through extensive experiments, it underscores the importance of a balanced compound loss function and paves the way for advancing unsupervised learning techniques in generative models. The potential applications and improvements highlighted in this paper mark significant contributions to the field of AI image synthesis and domain adaptation.