
One-Shot Unsupervised Cross Domain Translation (1806.06029v2)

Published 15 Jun 2018 in cs.CV

Abstract: Given a single image x from domain A and a set of images from domain B, our task is to generate the analogous of x in B. We argue that this task could be a key AI capability that underlines the ability of cognitive agents to act in the world and present empirical evidence that the existing unsupervised domain translation methods fail on this task. Our method follows a two step process. First, a variational autoencoder for domain B is trained. Then, given the new sample x, we create a variational autoencoder for domain A by adapting the layers that are close to the image in order to directly fit x, and only indirectly adapt the other layers. Our experiments indicate that the new method does as well, when trained on one sample x, as the existing domain transfer methods, when these enjoy a multitude of training samples from domain A. Our code is made publicly available at https://github.com/sagiebenaim/OneShotTranslation

Citations (125)

Summary

  • The paper introduces OST, a novel method that adapts a VAE trained on domain B to a single domain A image using selective backpropagation.
  • It demonstrates that one-shot translation achieves competitive performance against multi-sample baselines like CycleGAN and UNIT.
  • The approach enhances adaptive AI systems and lays a foundation for future research in lifelong unsupervised domain adaptation.

One-Shot Unsupervised Cross Domain Translation: A Study

The paper "One-Shot Unsupervised Cross Domain Translation" by Sagie Benaim and Lior Wolf addresses a novel problem within unsupervised domain translation: the one-shot scenario. The work centers on achieving cross-domain image translation given only a single reference image from an unseen source domain, termed domain A, and a set of images (or a pre-trained model) from a target domain, B.

Problem Definition and Approach

The concept of unsupervised domain translation generally involves learning a mapping between two domains using unpaired data samples from each. This established line of research typically relies on large datasets from both domains. The challenge presented by this paper arises when only a single sample from domain A is available, a case not previously explored to the same extent as zero-shot tasks.

To address this challenge, the authors propose One-Shot Translation (OST), a two-step process. First, a variational autoencoder (VAE) is trained to model domain B. Then, given the new sample x from A, this VAE is cloned and adapted to fit x: the layers close to the image are adapted to fit x directly, while the deeper layers are adapted only indirectly. The critical innovation in this step is a selective backpropagation mechanism that prevents overfitting on x, an issue inherently tied to the one-shot nature of the task.
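The two-phase process above can be illustrated with a deliberately minimal sketch. The toy linear "autoencoder" below, its layer names, and the dependency-free finite-difference training loop are all illustrative assumptions for this sketch, not the authors' implementation; the point is only the selective update: phase 2 clones the domain-B model and fits the single sample x while touching just the image-proximal layers.

```python
import numpy as np
from copy import deepcopy

rng = np.random.default_rng(0)
DIM = 4  # toy "image" dimensionality

def init_params():
    # enc_outer / dec_outer sit "close to the image" and are domain-specific;
    # enc_shared / dec_shared are the deeper layers kept frozen in phase 2.
    return {name: rng.normal(0.0, 0.1, (DIM, DIM))
            for name in ("enc_outer", "enc_shared", "dec_shared", "dec_outer")}

def forward(p, x):
    h = p["enc_shared"] @ (p["enc_outer"] @ x)     # encode
    return p["dec_outer"] @ (p["dec_shared"] @ h)  # decode

def loss(p, x):
    return float(np.sum((forward(p, x) - x) ** 2))  # reconstruction error

def step(p, x, lr, trainable):
    # One gradient step on the reconstruction loss, updating ONLY the layers
    # named in `trainable` -- the "selective backpropagation" of phase 2.
    eps = 1e-5
    for name in trainable:
        base = loss(p, x)
        W = p[name]
        grad = np.zeros_like(W)
        # Finite-difference gradient: slow, but keeps the sketch self-contained.
        for i in range(DIM):
            for j in range(DIM):
                W[i, j] += eps
                grad[i, j] = (loss(p, x) - base) / eps
                W[i, j] -= eps
        p[name] -= lr * grad

# Phase 1: train every layer of the domain-B autoencoder on many B samples.
vae_b = init_params()
for _ in range(50):
    step(vae_b, rng.normal(size=DIM), lr=0.05, trainable=list(vae_b))

# Phase 2: clone for domain A, then fit the single sample x_a while updating
# only the image-proximal layers; the shared layers stay fixed here (the
# simplest form of "only indirectly adapted").
x_a = rng.normal(size=DIM)
vae_a = deepcopy(vae_b)
for _ in range(50):
    step(vae_a, x_a, lr=0.05, trainable=["enc_outer", "dec_outer"])
```

After phase 2, the shared layers of `vae_a` are byte-identical to `vae_b`'s, while only the outer layers have moved toward reconstructing x_a, which is the constraint OST relies on to avoid overfitting a one-sample domain.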

Methodology and Results

The methodology builds upon the VAE framework combined with adversarial training to ensure image quality and feature alignment in domain B. Unlike existing models, the layers shared between the two domains are finely adjusted to accommodate features from both A and B. Experiments demonstrated that OST maintains performance comparable to traditional multi-sample approaches, even when given only the single sample x.
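A generic form of such a VAE-plus-adversarial objective can be sketched as follows; the specific terms, weights, and any additional losses the paper uses are not reproduced here, so this should be read as the standard template the summary alludes to rather than the authors' exact objective:

\[
\mathcal{L} \;=\; \underbrace{\mathbb{E}\,\lVert \hat{x} - x \rVert}_{\text{reconstruction}} \;+\; \lambda_{\mathrm{KL}}\, D_{\mathrm{KL}}\!\left( q(z \mid x) \,\Vert\, \mathcal{N}(0, I) \right) \;+\; \lambda_{\mathrm{adv}}\, \mathcal{L}_{\mathrm{GAN}},
\]

where the first two terms form the usual VAE objective (reconstruction fidelity plus a KL prior on the latent code \(z\)) and the adversarial term pushes translated outputs toward the distribution of domain B.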

Key benchmarks, such as MNIST-to-SVHN translation, demonstrate OST's robustness against established baselines like CycleGAN and UNIT, especially under limited data from A. The implications of successfully executing such one-shot domain translations extend to improved adaptability for autonomous systems that encounter diverse environments sequentially.

Implications and Future Directions

The potential applications for this research are manifold. Beyond multimedia tasks, the ability to translate from minimal exposure positions OST favorably for cognitive AI systems, that is, systems meant to learn and adapt continually despite limited prior exposure to certain domains. Such instances of one-shot learning are a crucial step toward effective lifelong learning systems.

Future developments could refine the model's stability under more varied domain representations, or integrate additional constraints that improve the distinctiveness of translations. Additionally, exploring network architectures beyond VAEs and GANs might uncover more efficient pathways for such constrained translation tasks. Overall, this direction lays out a framework that could eventually lead to broader, more flexible systems capable of navigating the complex dynamics of the natural world through the lens of computational translation and representation.

This research contributes a valuable piece to the rapidly evolving landscape of unsupervised learning and domain adaptation, setting a foundation for subsequent developments in one-shot and few-shot learning paradigms.