
LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535v1)

Published 10 Mar 2025 in cs.CV

Abstract: In this paper, we introduce Latent Bridge Matching (LBM), a new, versatile and scalable method that relies on Bridge Matching in a latent space to achieve fast image-to-image translation. We show that the method can reach state-of-the-art results for various image-to-image tasks using only a single inference step. In addition to its efficiency, we also demonstrate the versatility of the method across different image translation tasks such as object removal, normal and depth estimation, and object relighting. We also derive a conditional framework of LBM and demonstrate its effectiveness by tackling the tasks of controllable image relighting and shadow generation. We provide an open-source implementation of the method at https://github.com/gojasper/LBM.

Summary

  • The paper introduces LBM, a novel Latent Bridge Matching method that enables fast image-to-image translation, achieving state-of-the-art results often in a single inference step.
  • LBM operates in the latent space of a VAE and models the translation process as a Brownian bridge stochastic process whose drift is approximated by a trained neural network.
  • This versatile method performs well across various tasks like object removal and relighting, offering significantly faster inference than diffusion models while maintaining competitive quality metrics, though it requires paired training data.

The paper "LBM: Latent Bridge Matching for Fast Image-to-Image Translation" (2503.07535) introduces a novel and efficient image-to-image translation method called Latent Bridge Matching (LBM). LBM leverages bridge matching in a latent space to achieve fast and versatile image translation with state-of-the-art results in a single inference step.

Core Principles of LBM:

  • LBM operates within the latent space of a pre-trained Variational Autoencoder (VAE). Source and target images are encoded into latent representations using the VAE encoder, which reduces computational costs and enables scaling to high-resolution images.
  • The method is based on the bridge matching framework, which aims to find a transport map between the probability distributions of the latent representations of source and target images.
  • Given paired latent codes $(z_0, z_1)$ from source and target images, LBM defines a stochastic interpolant $z_t$ that follows a Brownian bridge between the two latent points: $z_t = (1-t)\,z_0 + t\,z_1 + \sigma\sqrt{t(1-t)}\,\epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$, $\sigma \geq 0$, and $t \in [0, 1]$.
  • The evolution of $z_t$ is governed by the stochastic differential equation (SDE) $\mathrm{d}z_t = \frac{z_1 - z_t}{1-t}\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$.
  • A neural network $v_\theta$ is trained to approximate the drift of this SDE by minimizing the difference between the true drift and the network's prediction.
  • LBM extends to a conditional framework by introducing a conditioning variable $c$ that guides the generation process; the drift approximator then becomes $v_\theta(z_t, t, c)$.
  • Training is accelerated by sampling $t$ from only a few equally spaced timesteps, which are the same timesteps used during inference (a minimal training sketch follows this list).
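
A minimal, hypothetical PyTorch sketch of one training step is shown below. It assumes a diffusers-style VAE `encode` API and a drift network `v_theta(z_t, t)`; the noise level, timestep count, and all names are illustrative assumptions, not the reference implementation at https://github.com/gojasper/LBM.

```python
import torch

SIGMA = 0.1                                   # assumed bridge noise level
TIMESTEPS = torch.linspace(0.0, 1.0, 5)[:-1]  # a few equally spaced t, excluding t = 1

def lbm_training_step(v_theta, vae, x_src, x_tgt, optimizer):
    b = x_src.shape[0]
    # Encode the paired source/target images into the VAE latent space.
    with torch.no_grad():
        z0 = vae.encode(x_src).latent_dist.sample()
        z1 = vae.encode(x_tgt).latent_dist.sample()

    # Draw one of the few equally spaced timesteps per example and form the
    # Brownian-bridge interpolant z_t = (1-t) z0 + t z1 + sigma sqrt(t(1-t)) eps.
    t = TIMESTEPS.to(z0.device)[torch.randint(len(TIMESTEPS), (b,), device=z0.device)]
    t_ = t.view(-1, 1, 1, 1)
    eps = torch.randn_like(z0)
    z_t = (1 - t_) * z0 + t_ * z1 + SIGMA * torch.sqrt(t_ * (1 - t_)) * eps

    # Regress the network onto the SDE drift (z1 - z_t) / (1 - t); a conditional
    # variant would simply take an extra input c, i.e. v_theta(z_t, t, c).
    drift_target = (z1 - z_t) / (1 - t_)
    loss = torch.mean((v_theta(z_t, t) - drift_target) ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the sampled timesteps exclude $t = 1$, the drift target $(z_1 - z_t)/(1-t)$ stays finite; reusing the same discrete timesteps at inference keeps training and sampling consistent.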

Specific Applications:

LBM demonstrates versatility across various image-to-image translation tasks:

  • Object Removal
  • Depth and Surface Normal Estimation
  • Object Relighting/Image Harmonization
  • Controllable Image Relighting
  • Shadow Generation
  • Image Restoration

Performance Comparison:

  • Speed: LBM achieves state-of-the-art results using only a single inference step (or very few steps, e.g., 4), which is significantly faster than iterative diffusion-based methods (see the single-step sketch after this list).
  • Quality: LBM competes with or outperforms state-of-the-art methods, including diffusion models and flow matching models, in terms of quality metrics like FID, foreground MSE (fMSE), PSNR, and SSIM. For example, it outperforms other approaches in most metrics for object removal, ranks among the top models for depth and normal estimation, and outperforms competitors for most metrics in image relighting.
  • The stochastic nature of LBM, due to the Brownian bridge and the noise parameter $\sigma$, is beneficial and leads to better performance than deterministic flow matching.
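
For intuition, single-step inference amounts to one Euler step of the bridge SDE across the full interval $[0, 1]$, followed by VAE decoding. A minimal sketch under the same assumed interfaces as the training example above:

```python
import torch

@torch.no_grad()
def lbm_translate(v_theta, vae, x_src):
    # Encode the source image, take one Euler step of the bridge SDE from
    # t = 0 to t = 1 (z1 ≈ z0 + v_theta(z0, 0) * 1), and decode the result.
    z0 = vae.encode(x_src).latent_dist.sample()
    t0 = torch.zeros(z0.shape[0], device=z0.device)
    z1 = z0 + v_theta(z0, t0)
    return vae.decode(z1).sample
```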

Limitations:

  • A key limitation is that training requires paired source and target images, which can be difficult to obtain for some tasks.