- The paper introduces FUNIT, a framework that combines a content encoder, a class encoder, and adversarial training to perform few-shot unsupervised image-to-image translation.
- The approach outperforms baselines on datasets like Animal Faces and North American Birds in both translation accuracy and photorealism.
- FUNIT’s success in limited-data scenarios offers practical insights for applications in areas like medical imaging and wildlife monitoring.
Few-Shot Unsupervised Image-to-Image Translation: A Summary
Introduction
The paper, "Few-Shot Unsupervised Image-to-Image Translation," addresses a limitation of existing image translation techniques: they require extensive data from both source and target classes at training time. Inspired by the human ability to generalize from limited examples, the authors propose the Few-shot UNsupervised Image-to-image Translation (FUNIT) framework. This method translates an image to a target class using only a few example images provided at test time, without having seen any images from that class during training.
Methodology
The FUNIT framework combines adversarial training with a network architecture comprising a content encoder, a class encoder, and a decoder. The content encoder extracts class-invariant features (e.g., object pose and structure), while the class encoder extracts class-specific appearance. At test time, the class encoder produces a latent code for each of the K target-class example images; these codes are averaged into a single class code that modulates the decoder, which is what allows the framework to generalize translation to classes never seen during training.
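To make the architecture concrete, here is a minimal PyTorch-style sketch of the generator's three components. The layer sizes, the AdaIN-style modulation wiring, and all class and function names are illustrative assumptions rather than the paper's exact specification.

```python
# Illustrative sketch of a FUNIT-style generator; dimensions and wiring
# are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps the input image to a class-invariant spatial feature map."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 7, 1, 3), nn.InstanceNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim * 2, 4, 2, 1), nn.InstanceNorm2d(dim * 2), nn.ReLU(),
            nn.Conv2d(dim * 2, dim * 4, 4, 2, 1), nn.InstanceNorm2d(dim * 4), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class ClassEncoder(nn.Module):
    """Maps one class example image to a class-code vector."""
    def __init__(self, in_ch=3, dim=64, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 7, 1, 3), nn.ReLU(),
            nn.Conv2d(dim, dim * 2, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(dim * 2, dim * 4, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(dim * 4, code_dim)

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))

class Decoder(nn.Module):
    """Renders the content features, modulated by the class code (AdaIN-style)."""
    def __init__(self, dim=256, code_dim=64, out_ch=3):
        super().__init__()
        # The class code predicts per-channel scale and shift for the features.
        self.affine = nn.Linear(code_dim, dim * 2)
        self.norm = nn.InstanceNorm2d(dim, affine=False)
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(dim, dim // 2, 5, 1, 2), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(dim // 2, dim // 4, 5, 1, 2), nn.ReLU(),
            nn.Conv2d(dim // 4, out_ch, 7, 1, 3), nn.Tanh(),
        )

    def forward(self, content, class_code):
        scale, shift = self.affine(class_code).chunk(2, dim=1)
        h = self.norm(content)
        h = h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return self.net(h)

class FewShotGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.content_enc = ContentEncoder()
        self.class_enc = ClassEncoder()
        self.decoder = Decoder()

    def forward(self, content_img, class_imgs):
        # class_imgs: (B, K, C, H, W) -- the K few-shot target-class examples.
        b, k, c, h, w = class_imgs.shape
        codes = self.class_enc(class_imgs.view(b * k, c, h, w)).view(b, k, -1)
        class_code = codes.mean(dim=1)           # average the K per-image codes
        content = self.content_enc(content_img)  # class-invariant features
        return self.decoder(content, class_code)
```

Because the class code is computed at forward time from whatever example images are supplied, the same trained generator can be pointed at a previously unseen class with no fine-tuning.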
To achieve few-shot capability, FUNIT is trained as a Generative Adversarial Network (GAN) with a multi-task adversarial discriminator that solves one binary real/fake classification task per source class: for an image of class c, only the discriminator output corresponding to c is penalized. Together with reconstruction and feature-matching losses, this pushes translated outputs to preserve the structure of the content image while adopting the appearance of the target class.
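The following sketch illustrates the multi-task discriminator idea under similar assumptions: a shared convolutional trunk, one patch-level real/fake output per source class, and a hinge loss applied only to the output for the image's own class. The hinge formulation and layer sizes are assumptions for illustration.

```python
# Illustrative multi-task adversarial discriminator; layer sizes and the
# hinge loss are assumptions for the sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskDiscriminator(nn.Module):
    """One real/fake output map per source class; only the map for the
    image's own class contributes to the loss."""
    def __init__(self, num_classes, in_ch=3, dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(dim, dim * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(dim * 2, dim * 4, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        # num_classes parallel binary real/fake tasks, one output channel each.
        self.heads = nn.Conv2d(dim * 4, num_classes, 1)

    def forward(self, x, class_idx):
        # class_idx: (B,) long tensor of source-class indices.
        out = self.heads(self.features(x))  # (B, num_classes, H', W')
        idx = class_idx.view(-1, 1, 1, 1).expand(-1, 1, out.size(2), out.size(3))
        return out.gather(1, idx)           # (B, 1, H', W')

def d_hinge_loss(d, real, fake, class_idx):
    # Penalize only the binary task that matches each image's class.
    loss_real = F.relu(1.0 - d(real, class_idx)).mean()
    loss_fake = F.relu(1.0 + d(fake.detach(), class_idx)).mean()
    return loss_real + loss_fake

def g_hinge_loss(d, fake, class_idx):
    return -d(fake, class_idx).mean()
```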
Experimental Results
Experiments were conducted on datasets including Animal Faces and North American Birds. FUNIT was evaluated against baseline models such as CycleGAN, UNIT, and MUNIT. Results showed that FUNIT substantially outperforms both fair baselines (trained only on source classes, as FUNIT is) and unfair baselines (which additionally have access to target-class data during training), particularly in translation accuracy and photorealistic output quality.
FUNIT's performance metrics underscore this: even in the one-shot setting it achieved higher Top-5 test accuracy than the baselines, showing a superior ability to adapt from minimal examples. Moreover, its translation accuracy, content preservation, and output quality all improved as the number of source classes seen during training increased, indicating that a more diverse training set yields a more generalizable model.
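For reference, the Top-5 translation-accuracy metric can be computed by running a classifier trained on the target classes over the translated outputs: a translation counts as correct when the intended class is among the classifier's five highest-scoring predictions. This short sketch assumes a hypothetical classifier callable and PyTorch tensors.

```python
import torch

@torch.no_grad()
def top5_translation_accuracy(classifier, translated_images, target_labels):
    # classifier: hypothetical model trained on the target classes.
    logits = classifier(translated_images)                # (N, num_classes)
    top5 = logits.topk(5, dim=1).indices                  # (N, 5)
    hits = (top5 == target_labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()
```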
Implications and Future Directions
The introduction of FUNIT opens new avenues for data-efficient training paradigms in image translation, especially in domains where data scarcity is an issue. Its ability to perform under few-shot conditions has significant implications for tasks such as medical imaging, wildlife monitoring, and beyond.
The paper also poses questions for future exploration: enhancing generalization to even more diverse and visually distinct classes, integrating this approach with other few-shot learning paradigms, and further scaling its applications in real-world scenarios.
In summary, while not revolutionary, FUNIT presents a notable advance in unsupervised image-to-image translation by bridging existing translation capabilities with few-shot learning. It establishes a foundation for both theoretical research and practical applications where data is limited.