Multiple Noises in Diffusion Model for Semi-Supervised Multi-Domain Translation (2309.14394v1)
Abstract: Domain-to-domain translation involves generating a target domain sample given a condition in the source domain. Most existing methods focus on fixed input and output domains, i.e., they only work for specific configurations (e.g., for two domains, either $D_1\rightarrow{}D_2$ or $D_2\rightarrow{}D_1$). This paper proposes Multi-Domain Diffusion (MDD), a conditional diffusion framework for multi-domain translation in a semi-supervised context. Unlike previous methods, MDD does not require defining input and output domains, allowing translation between any partition of domains within a set (such as $(D_1, D_2)\rightarrow{}D_3$, $D_2\rightarrow{}(D_1, D_3)$, $D_3\rightarrow{}D_1$, etc. for 3 domains), without training a separate model for each domain configuration. The key idea behind MDD is to leverage the noise formulation of diffusion models by incorporating one noise level per domain, so that missing domains are naturally modeled as noise. This turns the training task from simple reconstruction into domain translation, where the model relies on less noisy domains to reconstruct noisier ones. We present results on a multi-domain (more than two domains) synthetic image translation dataset with challenging semantic domain inversion.
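The per-domain noise idea lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch illustration of the forward (noising) step under a standard DDPM variance schedule; the tensor layout, the schedule constants, and the `denoiser` call are illustrative assumptions, not the authors' implementation. The point it demonstrates is the one stated in the abstract: each domain receives its own noise level, and missing domains are pushed to the maximum noise level, so the model must reconstruct them from the less noisy domains.

```python
import torch

# Sketch of the per-domain noising idea described in the abstract.
# All names (T, alphas_cumprod, denoiser) are illustrative assumptions,
# not the paper's actual implementation.

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def noise_domains(x, present):
    """x: (B, K, C, H, W) stack of K domain images per sample.
    present: (B, K) bool mask; False marks a missing domain.
    Returns noised inputs, the sampled noise, and per-domain timesteps."""
    B, K = x.shape[:2]
    # One independently sampled noise level per domain (the core idea);
    # missing domains are clamped to t = T-1, i.e. (almost) pure noise.
    t = torch.randint(0, T, (B, K))
    t = torch.where(present, t, torch.full_like(t, T - 1))
    a = alphas_cumprod[t].view(B, K, 1, 1, 1)
    eps = torch.randn_like(x)
    x_t = a.sqrt() * x + (1 - a).sqrt() * eps
    return x_t, eps, t

# Training step (sketch): the denoiser sees every domain at its own noise
# level, so denoising noisier domains amounts to translation from the
# cleaner ones rather than plain reconstruction:
#   x_t, eps, t = noise_domains(x, present)
#   loss = F.mse_loss(denoiser(x_t, t), eps)
```

At inference, the same mechanism presumably covers any domain partition: source domains are held at (or near) noise level zero while target domains start as pure noise and are denoised, so configurations such as $D_2\rightarrow{}(D_1, D_3)$ and $(D_1, D_2)\rightarrow{}D_3$ can share one trained model.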