
LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535v1)

Published 10 Mar 2025 in cs.CV

Abstract: In this paper, we introduce Latent Bridge Matching (LBM), a new, versatile and scalable method that relies on Bridge Matching in a latent space to achieve fast image-to-image translation. We show that the method can reach state-of-the-art results for various image-to-image tasks using only a single inference step. In addition to its efficiency, we also demonstrate the versatility of the method across different image translation tasks such as object removal, normal and depth estimation, and object relighting. We also derive a conditional framework of LBM and demonstrate its effectiveness by tackling the tasks of controllable image relighting and shadow generation. We provide an open-source implementation of the method at https://github.com/gojasper/LBM.

Summary

  • The paper introduces LBM, a novel Latent Bridge Matching method that enables fast image-to-image translation, achieving state-of-the-art results often in a single inference step.
  • LBM operates in the latent space of a VAE and models the translation process as a Brownian bridge stochastic process whose drift is approximated by a trained neural network.
  • This versatile method performs well across various tasks like object removal and relighting, offering significantly faster inference than diffusion models while maintaining competitive quality metrics, though it requires paired training data.

The paper "LBM: Latent Bridge Matching for Fast Image-to-Image Translation" (2503.07535) introduces a novel and efficient image-to-image translation method called Latent Bridge Matching (LBM). LBM leverages bridge matching in a latent space to achieve fast and versatile image translation with state-of-the-art results in a single inference step.

Core Principles of LBM:

  • LBM operates within the latent space of a pre-trained Variational Autoencoder (VAE). Source and target images are encoded into latent representations using the VAE encoder, which reduces computational costs and enables scaling to high-resolution images.
  • The method is based on the bridge matching framework, which aims to find a transport map between the probability distributions of the latent representations of source and target images.
  • Given paired latent codes $(z_0, z_1)$ from source and target images, LBM defines a stochastic interpolant $z_t$ that follows a Brownian bridge between the two latent points: $z_t = (1-t)\,z_0 + t\,z_1 + \sigma\sqrt{t(1-t)}\,\epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$, $\sigma \geq 0$, and $t \in [0, 1]$.
  • The evolution of $z_t$ is governed by the stochastic differential equation (SDE) $\mathrm{d}z_t = \frac{z_1 - z_t}{1-t}\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$.
  • A neural network $v_\theta$ is trained to approximate the drift of this SDE by minimizing the difference between the true drift and the network's prediction.
  • LBM extends to a conditional framework by introducing a conditioning variable $c$ that guides the generation process; the drift approximator then becomes $v_\theta(z_t, t, c)$.
  • Training is accelerated by sampling $t$ from only a few equally spaced timesteps, which are the same timesteps used during inference (a minimal training sketch follows this list).
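
A minimal, hypothetical PyTorch sketch of one training step is shown below. It assumes a diffusers-style VAE `encode` API and a drift network `v_theta(z_t, t)`; the noise level, timestep count, and all names are illustrative assumptions, not the reference implementation at https://github.com/gojasper/LBM.

```python
import torch

SIGMA = 0.1                                   # assumed bridge noise level
TIMESTEPS = torch.linspace(0.0, 1.0, 5)[:-1]  # a few equally spaced t, excluding t = 1

def lbm_training_step(v_theta, vae, x_src, x_tgt, optimizer):
    b = x_src.shape[0]
    # Encode the paired source/target images into the VAE latent space.
    with torch.no_grad():
        z0 = vae.encode(x_src).latent_dist.sample()
        z1 = vae.encode(x_tgt).latent_dist.sample()

    # Draw one of the few equally spaced timesteps per example and form the
    # Brownian-bridge interpolant z_t = (1-t) z0 + t z1 + sigma sqrt(t(1-t)) eps.
    t = TIMESTEPS.to(z0.device)[torch.randint(len(TIMESTEPS), (b,), device=z0.device)]
    t_ = t.view(-1, 1, 1, 1)
    eps = torch.randn_like(z0)
    z_t = (1 - t_) * z0 + t_ * z1 + SIGMA * torch.sqrt(t_ * (1 - t_)) * eps

    # Regress the network onto the SDE drift (z1 - z_t) / (1 - t); a conditional
    # variant would simply take an extra input c, i.e. v_theta(z_t, t, c).
    drift_target = (z1 - z_t) / (1 - t_)
    loss = torch.mean((v_theta(z_t, t) - drift_target) ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the sampled timesteps exclude $t = 1$, the drift target $(z_1 - z_t)/(1-t)$ stays finite; reusing the same discrete timesteps at inference keeps training and sampling consistent.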

Specific Applications:

LBM demonstrates versatility across various image-to-image translation tasks:

  • Object Removal
  • Depth and Surface Normal Estimation
  • Object Relighting/Image Harmonization
  • Controllable Image Relighting
  • Shadow Generation
  • Image Restoration

Performance Comparison:

  • Speed: LBM achieves state-of-the-art results using only a single inference step (or very few steps, e.g., 4), which is significantly faster than iterative diffusion-based methods (see the single-step sketch after this list).
  • Quality: LBM competes with or outperforms state-of-the-art methods, including diffusion models and flow matching models, in terms of quality metrics like FID, foreground MSE (fMSE), PSNR, and SSIM. For example, it outperforms other approaches in most metrics for object removal, ranks among the top models for depth and normal estimation, and outperforms competitors for most metrics in image relighting.
  • The stochastic nature of LBM, due to the Brownian bridge and the noise parameter $\sigma$, is beneficial and leads to better performance than deterministic flow matching.
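
For intuition, single-step inference amounts to one Euler step of the bridge SDE across the full interval $[0, 1]$, followed by VAE decoding. A minimal sketch under the same assumed interfaces as the training example above:

```python
import torch

@torch.no_grad()
def lbm_translate(v_theta, vae, x_src):
    # Encode the source image, take one Euler step of the bridge SDE from
    # t = 0 to t = 1 (z1 ≈ z0 + v_theta(z0, 0) * 1), and decode the result.
    z0 = vae.encode(x_src).latent_dist.sample()
    t0 = torch.zeros(z0.shape[0], device=z0.device)
    z1 = z0 + v_theta(z0, t0)
    return vae.decode(z1).sample
```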

Limitations:

  • A key limitation is that training requires paired source and target images, which can be difficult to obtain for some tasks.