Towards Identifiable Unsupervised Domain Translation: A Diversified Distribution Matching Approach (2401.09671v2)
Abstract: Unsupervised domain translation (UDT) aims to find functions that convert samples from one domain (e.g., sketches) to another domain (e.g., photos) without changing the high-level semantic meaning (also referred to as "content"). The translation functions are often sought by matching the probability distributions of the transformed source domain and the target domain. CycleGAN is arguably the most representative approach in this line of work. However, it has been noted in the literature that CycleGAN and its variants can fail to identify the desired translation functions and produce content-misaligned translations. This limitation arises due to the presence of multiple translation functions -- referred to as "measure-preserving automorphism" (MPA) -- in the solution space of the learning criteria. Despite awareness of such identifiability issues, solutions have remained elusive. This study addresses the core identifiability question and introduces an MPA elimination theory. Our analysis shows that MPA is unlikely to exist if multiple pairs of diverse cross-domain conditional distributions are matched by the learning function. Our theory leads to a UDT learner that uses distribution matching over auxiliary variable-induced subsets of the domains -- rather than over the entire data domains as in the classical approaches. To the best of our knowledge, the proposed framework is the first to rigorously establish translation identifiability under reasonable UDT settings. Experiments corroborate our theoretical claims.
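The MPA issue described in the abstract can be illustrated with a toy one-dimensional sketch. This is not the paper's method: the data model, the binary auxiliary label `u`, the map `flip`, and the use of a mean difference as a crude stand-in for a distribution distance are all assumptions made for illustration. The marginal of `x` is symmetric about zero, so `x -> -x` leaves it (nearly) unchanged and marginal matching cannot rule the map out. Conditioned on `u`, the same map visibly changes each sub-distribution.

```python
import random
import statistics


def sample_domain(n, seed=0):
    """Draw (x, u) pairs; u is a hypothetical binary auxiliary label and x | u is Gaussian."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.randrange(2)
        # Means at -1 and +1 with equal weight make the marginal of x symmetric about 0.
        mean = -1.0 if u == 0 else 1.0
        data.append((rng.gauss(mean, 1.0), u))
    return data


def flip(x):
    """Toy MPA of the marginal: negation preserves any distribution symmetric about 0."""
    return -x


def mean_gap(xs, ys):
    """Crude proxy for a distribution distance: absolute difference of sample means."""
    return abs(statistics.fmean(xs) - statistics.fmean(ys))


data = sample_domain(20000)
xs = [x for x, _ in data]

# Matching over the entire domain: the flipped data looks the same as the original.
marginal_gap = mean_gap(xs, [flip(x) for x in xs])

# Matching over auxiliary-variable-induced subsets: the flip is exposed in each subset.
conditional_gaps = {}
for u in (0, 1):
    subset = [x for x, v in data if v == u]
    conditional_gaps[u] = mean_gap(subset, [flip(x) for x in subset])

print("marginal gap:", marginal_gap)
print("conditional gaps:", conditional_gaps)
```

In this sketch the marginal gap stays close to zero while both conditional gaps are large, which mirrors the abstract's claim that matching several diverse conditional distributions leaves less room for an MPA than matching the whole domains.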