Unpaired Image-to-Image Translation via Neural Schrödinger Bridge (2305.15086v3)

Published 24 May 2023 in cs.CV, cs.AI, cs.LG, and stat.ML

Abstract: Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. While diffusion models have achieved remarkable progress, they have limitations in unpaired image-to-image (I2I) translation tasks due to the Gaussian prior assumption. The Schrödinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, has risen as an attractive solution to this problem. Yet, to the best of our knowledge, none of the SB models so far have been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schrödinger Bridge (UNSB), which expresses the SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn an SB between unpaired data. We show that UNSB is scalable and successfully solves various unpaired I2I translation tasks. Code: https://github.com/cyclomon/UNSB

References (55)
  1. Mutual information neural estimation. In ICML, 2018.
  2. One-sided unsupervised domain mapping. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/file/59b90e1005a220e2ebc542eb9d950b1e-Paper.pdf.
  3. Diffusion Schrödinger bridge with applications to score-based generative modeling. In NeurIPS, 2021.
  4. The Schrödinger bridge between Gaussian measures has a closed form. In AISTATS, 2023.
  5. Reflected Schrödinger bridge: Density control with path constraints. In 2021 American Control Conference (ACC), pp. 1137–1142. IEEE, 2021.
  6. Reusing discriminators for encoding: Towards unsupervised image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  8168–8177, 2020.
  7. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In ICLR, 2022.
  8. Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge. SIAM Review, 63(2):249–313, 2021.
  9. StarGAN v2: Diverse image synthesis for multiple domains. In CVPR, 2020.
  10. Diffusion posterior sampling for general noisy inverse problems. In ICLR, 2023.
  11. Inversion by direct iteration: An alternative to denoising diffusion for image restoration. arXiv preprint arXiv:2303.11435, 2023.
  12. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2015.
  13. Geometry-consistent generative adversarial networks for one-sided unsupervised domain mapping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  14. Generative adversarial nets. In NeurIPS, 2014.
  15. Entropic neural optimal transport via diffusion processes. In NeurIPS, 2023a.
  16. Entropic neural optimal transport via diffusion processes. In NeurIPS, 2023b.
  17. Building the bridge of Schrödinger: A continuous entropic optimal transport benchmark. In NeurIPS Track on Datasets and Benchmarks, 2023c.
  18. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022.
  19. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.
  20. Denoising diffusion probabilistic models. In NeurIPS, 2020.
  21. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), September 2018.
  22. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  1125–1134, 2017.
  23. Exploring patch-wise semantic relation for contrastive learning in image-to-image translation tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.  18260–18269, June 2022.
  24. Progressive growing of GANs for improved quality, stability, and variation. In ICLR, 2018.
  25. Auto-encoding variational Bayes. In ICLR, 2014.
  26. Neural optimal transport. In ICLR, 2023.
  27. Christian Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. arXiv preprint arXiv:1308.0215, 2013.
  28. AR-DAE: Towards unbiased neural entropy estimation. In ICML, 2020.
  29. Deep generalized Schrödinger bridge. arXiv preprint arXiv:2209.09893, 2022.
  30. I2SB: Image-to-image Schrödinger bridge. arXiv preprint arXiv:2302.05872, 2023.
  31. SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.
  32. Contrastive learning for unpaired image-to-image translation. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (eds.), Computer Vision – ECCV 2020, pp.  319–345, Cham, 2020. Springer International Publishing. ISBN 978-3-030-58545-7.
  33. Multisample flow matching: Straightening flows with minibatch couplings. In ICML, 2023.
  34. Paolo Dai Pra. A stochastic control approach to reciprocal diffusion processes. Applied Mathematics and Optimization, 23(1):313–329, 1991.
  35. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
  36. Unbiased estimation using a class of diffusion processes. Journal of Computational Physics, 472:111643, 2023.
  37. Can push-forward generative models fit multimodal distributions? In NeurIPS, 2022.
  38. Erwin Schrödinger. Sur la théorie relativiste de l’électron et l’interprétation de la mécanique quantique. Annales de l’institut Henri Poincaré, 2(4):269–310, 1932.
  39. Conditional simulation using diffusion Schrödinger bridges. In Uncertainty in Artificial Intelligence, pp. 1792–1802. PMLR, 2022.
  40. Diffusion Schrödinger bridge matching. In NeurIPS, 2023.
  41. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.
  42. Denoising diffusion implicit models. In ICLR, 2021a.
  43. Score-based generative modeling through stochastic differential equations. In ICLR, 2021b.
  44. Dual diffusion implicit bridges for image-to-image translation. In ICLR, 2023.
  45. Transport with support: Data-conditional diffusion bridges. arXiv preprint arXiv:2301.13636, 2023.
  46. Riemannian diffusion Schrödinger bridge. arXiv preprint arXiv:2207.03024, 2022.
  47. Conditional flow matching: Simulation-free dynamic optimal transport. arXiv preprint arXiv:2302.00482, 2023.
  48. Solving Schrödinger bridges via maximum likelihood. Entropy, 23(9):1134, 2021.
  49. Deep generative learning via Schrödinger bridge. In International Conference on Machine Learning, pp. 10794–10804. PMLR, 2021a.
  50. Instance-wise hard negative example generation for contrastive learning in unpaired image-to-image translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.  14020–14029, October 2021b.
  51. Tackling the generative learning trilemma with denoising diffusion GANs. In ICLR, 2022.
  52. Path integral sampler: a stochastic control approach for sampling. arXiv preprint arXiv:2111.15141, 2021.
  53. EGSDE: Unpaired image-to-image translation via energy-guided stochastic differential equations. In NeurIPS, 2022.
  54. The spatially-correlative loss for various image translation tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.  16407–16417, June 2021.
  55. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.

Summary

  • The paper introduces UNSB, a novel method that reformulates the Schrödinger Bridge problem as a sequence of adversarial learning tasks for unpaired image translation.
  • The paper employs a time-conditional neural network with multi-step refinement to progressively generate high-resolution images while mitigating dimensionality challenges.
  • Experimental results demonstrate that UNSB outperforms GAN and diffusion-based methods in FID and KID scores on benchmarks such as Horse2Zebra and Map2Cityscape.

This paper introduces the Unpaired Neural Schrödinger Bridge (UNSB), a novel approach for unpaired image-to-image translation, particularly effective for high-resolution images. The core challenge addressed is the difficulty of traditional diffusion models in unpaired settings due to their fixed Gaussian prior and the limitations of existing Schrödinger Bridge (SB) methods, which struggle with scalability and the curse of dimensionality in high-dimensional data spaces like images.

UNSB formulates the SB problem, which aims to find the most likely stochastic process bridging two arbitrary distributions, as a sequence of adversarial learning problems. This is inspired by the self-similarity property of SBs, which states that an SB restricted to a sub-interval is also an SB. The authors discretize the time interval $[0,1]$ into steps $t_0, \ldots, t_N$ and learn the transitions $p(\mathbf{x}_{t_{i+1}} \mid \mathbf{x}_{t_i})$ sequentially, forming a Markov chain from the source distribution at $t_0$ to the target distribution at $t_N$.
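
In this notation, the end-to-end translation kernel is obtained by composing the learned transitions via the standard Markov factorization, written out here for concreteness:

$$
p(\mathbf{x}_{t_N} \mid \mathbf{x}_{t_0}) = \int \prod_{i=0}^{N-1} p(\mathbf{x}_{t_{i+1}} \mid \mathbf{x}_{t_i}) \, d\mathbf{x}_{t_1} \cdots d\mathbf{x}_{t_{N-1}}
$$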

The key idea is to learn a time-conditional neural network $q_\phi(\mathbf{x}_1 \mid \mathbf{x}_{t_i})$ for each step, which predicts the target image $\mathbf{x}_1$ given an intermediate image $\mathbf{x}_{t_i}$. Learning this mapping is posed as a constrained optimization problem: minimizing an entropy-regularized transport cost between the intermediate distribution and the target distribution, subject to the constraint that the marginal distribution of the predicted target images matches the true target distribution $\pi_1$. This constrained problem is translated into a Lagrangian formulation combining the following terms (a schematic combination is sketched in code after the list):

  1. An Adversarial Loss ($L_{Adv}$): Estimated via adversarial learning, this loss ensures the distribution of predicted target images $q_\phi(\mathbf{x}_1)$ matches the true target distribution $p(\mathbf{x}_1)$. This allows the use of advanced discriminators (like patch-wise discriminators) to better capture the target distribution's characteristics, mitigating the curse of dimensionality compared to methods relying solely on empirical data matching.
  2. An SB Loss ($L_{SB}$): Related to the entropy-regularized transport cost, approximated using mutual information estimation.
  3. A Regularization Loss ($L_{Reg}$): An application-specific loss that enforces consistency or structural similarity between the initial source image $\mathbf{x}_0$ and the predicted target image $\mathbf{x}_1(\mathbf{x}_{t_i})$. This term incorporates inductive biases relevant to the translation task (e.g., using a patch-wise contrastive loss inspired by CUT (2007.08971)).
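
The following PyTorch-style sketch shows how such a Lagrangian combination could be assembled. It is a minimal illustration, not the authors' implementation: the weights `lam_sb` and `lam_reg`, the discriminator interface `D(x, t)`, and the helpers `sb_loss_fn` / `reg_loss_fn` are all assumed placeholders.

```python
def generator_loss(D, x0, x_ti, t_i, x1_pred,
                   sb_loss_fn, reg_loss_fn,
                   lam_sb=1.0, lam_reg=1.0):
    """Schematic UNSB generator objective (illustrative, not official code).

    D           -- time-conditional discriminator scoring target-domain realism
    x0          -- source image batch
    x_ti        -- intermediate samples at time t_i
    x1_pred     -- generator prediction of the target image x_1
    sb_loss_fn  -- stand-in for the entropy-regularized transport term L_SB
    reg_loss_fn -- stand-in for the task-specific consistency term L_Reg
    """
    # L_Adv: non-saturating adversarial term pushing q_phi(x_1) toward p(x_1)
    adv = -D(x1_pred, t_i).mean()
    # L_SB: entropy-regularized transport cost between x_{t_i} and x_1
    sb = sb_loss_fn(x_ti, x1_pred, t_i)
    # L_Reg: structural consistency between the source x_0 and the prediction
    reg = reg_loss_fn(x0, x1_pred)
    # Lagrangian combination of the three terms
    return adv + lam_sb * sb + lam_reg * reg
```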

By combining these components, UNSB aims to learn a mapping that generalizes well beyond the limited samples available in high dimensions. The multi-step generation process, obtained by iteratively sampling from the learned transition kernels $q_\phi(\mathbf{x}_{t_{i+1}} \mid \mathbf{x}_{t_i})$, allows for gradual refinement of the output image, which is beneficial for complex translations.
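
As a concrete illustration of this multi-step inference, a sampling loop might look like the sketch below; `model` (predicting $\mathbf{x}_1$ from $(\mathbf{x}_{t_i}, t_i)$) and the transition sampler `q_sample` are hypothetical stand-ins for the learned kernel.

```python
import torch

@torch.no_grad()
def translate(model, q_sample, x0, nfe=5):
    """Simulate the learned Markov chain from a source image x0 (sketch).

    model    -- time-conditional generator predicting x_1 from (x_t, t)
    q_sample -- draws x_{t_{i+1}} given (x_{t_i}, predicted x_1, t_i);
                a stand-in for the learned stochastic transition kernel
    nfe      -- number of function evaluations, i.e. time steps
    """
    ts = torch.linspace(0.0, 1.0, nfe + 1)   # discretized time grid t_0..t_N
    x = x0
    for i in range(nfe):
        x1_pred = model(x, ts[i])            # predict the target image x_1
        if i < nfe - 1:
            x = q_sample(x, x1_pred, ts[i])  # stochastic intermediate sample
        else:
            x = x1_pred                      # final step returns the output
    return x
```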

For practical implementation, the authors use a single time-conditional DNN for $q_\phi$, taking $(\mathbf{x}_{t_i}, t_i)$ as input, and train it by sampling random time steps. Intermediate samples $\mathbf{x}_{t_i}$ are generated by simulating the learned Markov chain from the source distribution $\pi_0$. The adversarial learning involves training a discriminator alongside the generator.
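
A minimal training step consistent with this description might look as follows. The helpers `sample_t` and `simulate_chain` and the hinge discriminator loss are assumptions for illustration; `gen_loss_fn` could be, e.g., `functools.partial(generator_loss, sb_loss_fn=..., reg_loss_fn=...)` from the earlier sketch.

```python
import torch

def training_step(G, D, opt_G, opt_D, x0, x1_real,
                  sample_t, simulate_chain, gen_loss_fn):
    """One schematic UNSB update on unpaired batches x0 / x1_real (sketch)."""
    t_i = sample_t()                       # random discretized time step
    with torch.no_grad():
        x_ti = simulate_chain(G, x0, t_i)  # intermediate sample x_{t_i}

    # Discriminator update: hinge loss on real targets vs. predictions
    x1_fake = G(x_ti, t_i).detach()
    d_loss = (torch.relu(1.0 - D(x1_real, t_i)).mean()
              + torch.relu(1.0 + D(x1_fake, t_i)).mean())
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator update with the combined L_Adv + L_SB + L_Reg objective
    x1_pred = G(x_ti, t_i)
    g_loss = gen_loss_fn(D, x0, x_ti, t_i, x1_pred)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```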

Experimental results demonstrate UNSB's effectiveness. On toy examples like translating between two concentric spheres, UNSB shows robustness to the curse of dimensionality where other SB/OT methods fail. For high-resolution (256×256) unpaired image-to-image translation tasks (Horse2Zebra, Map2Cityscape, Summer2Winter, Map2Satellite), UNSB achieves superior FID and KID scores compared to various GAN-based methods (CycleGAN (1703.10593), MUNIT (1804.04732), CUT (2007.08971)) and diffusion/SB baselines (NOT (2303.10116), SDEdit (2108.01073), P2P (2208.01626)). Qualitative results show UNSB generates more realistic target-domain images while preserving source structure.

The number of function evaluations (NFE), which corresponds to the number of time steps used in the generation process, impacts quality. While NFE=1 (analogous to a single-step GAN) yields reasonable results, increasing NFE (typically 3-5 steps) consistently improves quality, reflecting the benefits of the multi-step refinement. Ablation studies confirm that the advanced discriminator, regularization, and multi-step generation each contribute positively to the performance. UNSB also demonstrates stochasticity, producing diverse outputs for a given input.
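
In terms of the hypothetical `translate` helper sketched above, the NFE trade-off is simply the choice of how finely the chain is discretized:

```python
# More steps refine the translation at the cost of extra forward passes.
out_1 = translate(model, q_sample, x0, nfe=1)  # single-step, GAN-like
out_5 = translate(model, q_sample, x0, nfe=5)  # multi-step refinement
```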

In summary, UNSB provides a practical framework for applying Schrödinger Bridges to challenging unpaired image-to-image translation problems in high dimensions by addressing the curse of dimensionality through adversarial learning, regularization, and a multi-step generative process. The implementation involves training a time-conditional generator and a discriminator using a combined objective function. While computationally more intensive than single-step methods, it achieves state-of-the-art performance on various unpaired translation benchmarks. The code is available for reproduction.