Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators
(2512.23463v1)
Published 29 Dec 2025 in cs.CV
Abstract: Image-to-Image (I2I) translation involves converting an image from one domain to another. Deterministic I2I translation, such as in image super-resolution, extends this concept by guaranteeing that each input generates a consistent and predictable output, closely matching the ground truth (GT) with high fidelity. In this paper, we propose a denoising Brownian bridge model with dual approximators (Dual-approx Bridge), a novel generative model that exploits the Brownian bridge dynamics and two neural network-based approximators (one for forward and one for reverse process) to produce faithful output with negligible variance and high image quality in I2I translations. Our extensive experiments on benchmark datasets including image generation and super-resolution demonstrate the consistent and superior performance of Dual-approx Bridge in terms of image quality and faithfulness to GT when compared to both stochastic and deterministic baselines. Project page and code: https://github.com/bohan95/dual-app-bridge
The paper introduces the Dual-approx Bridge, which uses Brownian bridge SDEs with two neural approximators to achieve deterministic image-to-image translation.
It decouples the forward and reverse denoising processes, reducing stochasticity while ensuring faithful, high-quality image reconstructions.
Empirical evaluations demonstrate significant improvements in PSNR, SSIM, and FID across benchmarks, outperforming traditional GAN and diffusion approaches.
Overview
"Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators" (2512.23463) introduces a new generative model for deterministic image-to-image (I2I) translation. The central contribution is the Dual-approx Bridge, which combines the Brownian bridge framework with two independently trained neural network approximators, one for the forward and one for the reverse SDE direction. The result is deterministic, high-fidelity image generation with negligible sample variance, closer agreement with the ground truth (GT), and image quality that rivals or exceeds state-of-the-art stochastic diffusion and GAN-based approaches.
Motivation and Problem Formulation
Deterministic I2I translation is critical in applications that demand one-to-one correspondence between input and output domains, such as super-resolution, inpainting, and denoising. GANs provide deterministic inference, but their outputs often show artifacts or blurriness, especially in tasks with a unique ground truth solution. Conventional diffusion and bridge models, particularly those based on SDEs, provide excellent sample quality but inherit stochasticity: outputs vary across runs, which reduces predictability and reliability in settings that require faithful translation.
The proposed method builds on the Brownian bridge structure, which offers tractable interpolation between paired input and output distributions, while mitigating both the randomness of SDE-based samplers and the blurriness typical of deterministic PF-ODE variants. It does so by training two separate networks for forward and reverse approximation in the denoising process. This decouples latent variable estimation from noise approximation and allows fine details to be reconstructed deterministically.
Model Architecture
The Dual-approx Bridge is founded on Brownian bridge SDEs. The generative procedure involves two separately parameterized neural networks:
Forward Approximator (ϵθ​): Trained to estimate the initial state (X0​) from a noised intermediate state (Xt​). The network receives Xt​ and the corresponding time-step t as input, optionally conditioned on the GT, to predict X0​ by minimizing the deviation from the analytically derived trajectory.
Reverse Approximator (Zϕ​): Trained to estimate the latent white noise variable (z) at each reverse step given the current state and time. This enables precise reconstruction of the denoising path, accounting for the reverse stochastic process.
The sampling process is then performed deterministically via these dual networks, with only a negligible degree of randomness at the initial reverse step (coefficient of order O(1/T​)).
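To make the bridge structure concrete, the following is a minimal sketch of the marginal of a textbook Brownian bridge pinned at two endpoints. It is not taken from the paper's code: the function name `brownian_bridge_sample`, the unit diffusion coefficient, and the choice of which image sits at which endpoint are assumptions made for illustration, and the paper may use a differently scaled variance schedule.

```python
import numpy as np

def brownian_bridge_sample(x0, y, t, T, rng):
    """Draw X_t from a standard Brownian bridge pinned at x0 (time 0) and y (time T).

    Assumption: unit diffusion coefficient, so Var[X_t] = t * (T - t) / T.
    """
    m = t / T                         # interpolation weight in [0, 1]
    mean = (1.0 - m) * x0 + m * y     # linear interpolation between the endpoints
    var = t * (T - t) / T             # vanishes at both endpoints
    z = rng.standard_normal(x0.shape) # white noise driving the intermediate state
    return mean + np.sqrt(var) * z, z

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))   # stand-in for the image at one endpoint
y = np.zeros((4, 4))   # stand-in for the paired image at the other endpoint

x_mid, z = brownian_bridge_sample(x0, y, t=0.5, T=1.0, rng=rng)
x_start, _ = brownian_bridge_sample(x0, y, t=0.0, T=1.0, rng=rng)
x_end, _ = brownian_bridge_sample(x0, y, t=1.0, T=1.0, rng=rng)
```

Because the variance is zero at both ends, the process starts exactly at one image and ends exactly at the other; all randomness lives in the intermediate states, which is what the two approximators are trained to account for.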
Figure 1: Architecture and workflow—forward and reverse dual approximators and sampling dynamics.
Figure 2: Comparison of output variance and quality: (a) SDE-based sampler—high-quality but stochastic, (b) PF-ODE-based sampler—deterministic but blurry output, (c) Dual-approx Bridge—deterministic, sharp, and faithful synthesis.
Training Procedures
Training is conducted independently for the two networks:
The forward approximator is trained with conventional backpropagation to minimize the difference between its prediction and the actual initial state X0​, given Xt​ simulated along the diffusion trajectory.
The reverse approximator is trained to minimize the reconstruction error between estimated noise and true white noise samples, facilitating accurate reverse-time path integration.
This two-network structure departs from prior work that uses a single universal approximator, which does not separate the task of estimating the initial state from the task of estimating the reverse-time noise residual.
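The two objectives described above can be sketched as follows. This is an illustrative outline only: the mean-squared-error form of both losses, the unit-diffusion bridge used to simulate the intermediate state, and the names `dual_losses`, `forward_net`, and `reverse_net` are assumptions, and the toy callables stand in for real neural networks.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two arrays."""
    return float(np.mean((pred - target) ** 2))

def dual_losses(forward_net, reverse_net, x0, y, t, T, rng):
    """Compute one loss per approximator for a single (x0, y, t) example.

    forward_net(x_t, t) is expected to predict the initial state x0.
    reverse_net(x_t, t) is expected to predict the white noise z.
    """
    z = rng.standard_normal(x0.shape)
    m = t / T
    # Simulate the intermediate state on an assumed unit-diffusion bridge.
    x_t = (1.0 - m) * x0 + m * y + np.sqrt(t * (T - t) / T) * z
    loss_forward = mse(forward_net(x_t, t), x0)  # target: initial state
    loss_reverse = mse(reverse_net(x_t, t), z)   # target: white noise
    return loss_forward, loss_reverse

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))
y = np.zeros((4, 4))

# Toy stand-ins for the two networks (hypothetical, not trained models).
identity_net = lambda x_t, t: x_t
zero_net = lambda x_t, t: np.zeros_like(x_t)

loss_f, loss_r = dual_losses(identity_net, zero_net, x0, y, t=0.5, T=1.0, rng=rng)
```

Keeping the two losses separate means each network can be optimized on its own schedule, which matches the independent training described above.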
Empirical Evaluation
The Dual-approx Bridge is systematically evaluated both quantitatively and qualitatively on benchmark I2I translation and super-resolution datasets, including Cityscapes, Edges2Handbags, BSD100, and Urban100. Metrics captured include FID, LPIPS, PSNR, and SSIM.
Ablation studies demonstrate:
Minimal sampling output variance, especially at low-step regimes.
Image quality (FID, LPIPS) can be tuned by the number of sampling steps, with a tradeoff where fewer steps increase faithfulness (PSNR/SSIM) but may reduce fine detail, while more steps increase perceptual quality at the cost of slightly reduced faithfulness.
Faithfulness analysis (the Cityscapes variance table and Figure 3) confirms that Dual-approx Bridge achieves negligible variability (standard deviation of order 10⁻³ or less) in PSNR/SSIM and reconstructions nearly indistinguishable from GT, outperforming Brownian bridge SDE-based and PF-ODE-based samplers by significant absolute margins (e.g., ∼12% improvement in SSIM).
Figure 3: Output variability on Edges2Handbags; SDE samplers are high quality but inconsistent, PF-ODEs are deterministic but lack detail, Dual-approx achieves both.
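One way to quantify this kind of run-to-run variability is to score several outputs for the same input against the GT and report the spread. The sketch below uses the standard PSNR definition; the helper names `psnr` and `psnr_spread` and the synthetic arrays are illustrative assumptions and do not reproduce the paper's evaluation protocol or its numbers.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    err = np.mean((pred - gt) ** 2)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))

def psnr_spread(samples, gt, max_val=1.0):
    """Mean and standard deviation of PSNR over repeated outputs for one input."""
    scores = np.array([psnr(s, gt, max_val) for s in samples])
    return float(scores.mean()), float(scores.std())

rng = np.random.default_rng(0)
gt = np.full((8, 8), 0.5)

# A deterministic sampler returns the same output on every run.
deterministic_runs = [gt + 0.1 for _ in range(5)]
# A stochastic sampler returns a different output on every run.
stochastic_runs = [gt + 0.1 * rng.standard_normal(gt.shape) for _ in range(5)]

det_mean, det_std = psnr_spread(deterministic_runs, gt)
sto_mean, sto_std = psnr_spread(stochastic_runs, gt)
```

A sampler whose outputs are identical across runs gives a spread of essentially zero, while a stochastic sampler gives a strictly positive spread even when its average score is similar.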
Deterministic I2I Tasks: On Cityscapes, Dual-approx Bridge delivers an FID of 48.7 (versus 75–100 for GAN baselines), PSNR of 15.7, and SSIM over 53%, substantially higher than GAN-based counterparts. For BSD100 and Urban100, it consistently matches or surpasses leading GAN and diffusion models across LPIPS and PSNR, with far fewer sampling steps.
Figure 4: Visual comparison on Cityscapes—more structurally faithful and less artifact-prone outputs than SOTA GANs and diffusion models.
Figure 5: Super-resolution results on BSD100; sharper, more accurate recovery of details than baseline methods.
Figure 6: Qualitative results on Urban100, validating artifact-free and faithful reconstruction at both global and local (zoomed-in) levels.
General I2I Translation Tasks: On Edges2Handbags, Dual-approx Bridge achieves a competitive FID (1.36, second only to A-Bridge with SDE sampling). Critically, it is also deterministic, with drastically reduced variance and high consistency across repeated runs.
Figure 7: Edges2Handbags comparison to SOTA methods, showcasing SDE-level visual quality without stochastic variability.
Figure 8: Sampling fidelity variance histogram; Dual-approx variance is an order of magnitude lower than that of A-Bridge (SDE).
Theoretical and Practical Implications
This research challenges a prevailing dichotomy in generative modeling: that faithfulness and high sample quality are in tension due to the necessity of stochasticity for detail recovery in SDE frameworks. The introduction of dual, independently trained approximators allows the model to circumvent the limitations of deterministic ODE-based samplers (which tend toward oversmoothing) while maintaining output uniqueness per input.
Theoretically, the method opens new lines of inquiry into the utility of multi-network architectures for fine-grained control of generative processes along stochastic bridges. Practically, it enables deployment in high-reliability, high-interpretability settings (e.g., medical imaging, restoration, high-precision super-resolution) where every input must reproducibly yield the same high-fidelity output, and stochastic variability is inadmissible.
Future Directions
Potential future directions include the exploration of non-VP (variance preserving) SDE regimes, integration of stochastic control in unpaired I2I or domain adaptation scenarios, and broader adoption in other conditional generative modeling contexts.
Conclusion
The Dual-approx Bridge establishes a deterministic paradigm for Brownian bridge-based I2I translation, combining the faithfulness and consistency requisite for critical applications with the high perceptual quality characteristic of modern diffusion models. The decoupling of forward and reverse SDE approximation via two neural networks yields near-zero output variance and robust detection and reconstruction of fine structural details, empirically validated across a spectrum of datasets and tasks. This work advances the theoretical and practical boundary of deterministic, high-quality generative modeling for image translation.