Iterative Markovian Fitting (IMF)
- Iterative Markovian Fitting is a family of projection algorithms that alternates Markovian and reciprocal projection steps to construct stochastic processes satisfying endpoint constraints.
- The method minimizes the KL divergence to a reference process via alternating projections, with exponential convergence in both discrete and continuous settings.
- IMF has found applications in signal decomposition, time series analysis, generative modeling, and optimal transport.
Iterative Markovian Fitting (IMF) is a family of iterative projection algorithms designed to construct stochastic processes—particularly those solving the Schrödinger Bridge (SB) problem—by alternately enforcing the Markov property and the reciprocal (bridge-matching) structure. Originating in both signal decomposition (0808.2827) and optimal transport contexts (Shi et al., 2023, Gushchin et al., 23 May 2024, Kholkin et al., 3 Oct 2024, 2502.01416, Howard et al., 20 Jun 2025, Sokolov et al., 4 Aug 2025, Silveri et al., 23 Oct 2025), IMF provides efficient convergence and has inspired diverse algorithmic adaptations for generative modeling, time series analysis, and transport across continuous, discrete, and multi-marginal domains.
1. Foundational Principles and Algorithmic Structure
The canonical IMF strategy targets problems of interpolating between prescribed endpoint distributions, typically by minimizing the Kullback–Leibler (KL) divergence to a reference process under marginal constraints. In the SB framework, the optimal path measure satisfies

$$\pi^\star = \underset{\pi:\ \pi_0 = \mu_0,\ \pi_T = \mu_T}{\arg\min}\ \mathrm{KL}(\pi \,\|\, Q),$$

where $Q$ is a reference process (often a Brownian motion or another Markov process) (Shi et al., 2023, Kholkin et al., 3 Oct 2024).
IMF achieves this by alternating two projections:
- Markovian Projection: Given a candidate path measure, project onto the set of Markovian processes by constructing transition kernels or SDE drifts so that the process becomes Markov while preserving the candidate's time marginals.
- Reciprocal (Bridge-Matching) Projection: Replace the intermediate transitions so that the process conditioned on its endpoints matches the reference bridge (e.g., Brownian bridge conditionals).
In discrete-time settings, the iteration alternates

$$\pi^{2n+1} = \mathrm{proj}_{\mathcal{M}}\left(\pi^{2n}\right), \qquad \pi^{2n+2} = \mathrm{proj}_{\mathcal{R}(Q)}\left(\pi^{2n+1}\right),$$

where $\mathcal{M}$ denotes the set of Markov processes and $\mathcal{R}(Q)$ the reciprocal class of the reference $Q$; analogous alternations hold for continuous-time versions (Gushchin et al., 23 May 2024, 2502.01416).
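To make the alternation concrete, the following is a minimal NumPy sketch of discrete-time IMF on a toy problem with three time points and a finite state space; the reference kernels, endpoint marginals, and all names are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 4  # toy state-space size (assumption)

# Reference Markov chain Q over times t = 0, 1, 2 (row-stochastic kernels).
Q01 = rng.random((S, S)); Q01 /= Q01.sum(1, keepdims=True)
Q12 = rng.random((S, S)); Q12 /= Q12.sum(1, keepdims=True)

# Prescribed endpoint marginals.
mu0 = rng.random(S); mu0 /= mu0.sum()
mu2 = rng.random(S); mu2 /= mu2.sum()

# Reference bridge Q(x1 | x0, x2), shape (S, S, S) with axes (x0, x1, x2).
joint = Q01[:, :, None] * Q12[None, :, :]        # Q(x1, x2 | x0)
br = joint / joint.sum(axis=1, keepdims=True)    # normalise over x1

def reciprocal_proj(pi):
    """Keep the endpoint coupling pi(x0, x2); refill the middle with the bridge."""
    return pi.sum(axis=1)[:, None, :] * br

def markov_proj(pi):
    """Project onto Markov chains: factorised transitions, same time marginals."""
    p10 = pi.sum(axis=2); p10 /= p10.sum(axis=1, keepdims=True)  # pi(x1 | x0)
    p21 = pi.sum(axis=0); p21 /= p21.sum(axis=1, keepdims=True)  # pi(x2 | x1)
    p0 = pi.sum(axis=(1, 2))
    return p0[:, None, None] * p10[:, :, None] * p21[None, :, :]

def kl(p, q):
    m = p > 0
    return float((p[m] * np.log(p[m] / q[m])).sum())

# Initialise with the independent endpoint coupling filled by the bridge.
pi = (mu0[:, None] * mu2[None, :])[:, None, :] * br

for n in range(15):
    new = reciprocal_proj(markov_proj(pi))
    print(f"iter {n:2d}  KL(new || old) = {kl(new, pi):.3e}")  # geometric decay
    pi = new
```

In this toy setting both projections have closed forms; in practice the Markovian projection is the learned step (drift or transition fitting) and the reciprocal projection is carried out by sampling reference bridges.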
This procedure generalizes classical Iterative Proportional Fitting (IPF, i.e., Sinkhorn's algorithm) by preserving path structure in addition to the endpoint marginals; combining the two yields Iterative Proportional Markovian Fitting (IPMF), with enhanced stability (Kholkin et al., 3 Oct 2024).
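For contrast, here is a minimal sketch of the classical IPF/Sinkhorn update, which rescales a static coupling to fit one endpoint marginal at a time rather than alternating Markov/reciprocal projections; the cost, marginals, and regularization value are illustrative.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, iters=500):
    """Classical IPF: alternately rescale rows/columns of the Gibbs kernel."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)      # fit the first marginal
        v = nu / (K.T @ u)    # fit the second marginal
    return u[:, None] * K * v[None, :]  # entropic OT coupling

rng = np.random.default_rng(1)
mu = rng.random(6); mu /= mu.sum()
nu = rng.random(6); nu /= nu.sum()
x = np.linspace(0.0, 1.0, 6)
C = (x[:, None] - x[None, :]) ** 2      # squared-distance cost
P = sinkhorn(mu, nu, C)
print(np.abs(P.sum(1) - mu).max(), np.abs(P.sum(0) - nu).max())  # both ~ 0
```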
2. Theoretical Analysis and Convergence Rates
IMF exhibits rapid convergence in the optimization landscape of the SB problem. Recent works provide explicit non-asymptotic exponential convergence rates under mild regularity conditions (Sokolov et al., 4 Aug 2025, Silveri et al., 23 Oct 2025). For instance, in finite-state discrete problems the iterates satisfy a bound of the form

$$\mathrm{KL}\left(\pi^n \,\|\, \pi^\star\right) \le (1 - \delta)^n\, \mathrm{KL}\left(\pi^0 \,\|\, \pi^\star\right),$$

where $\delta > 0$ is a strictly positive constant determined by minimal probabilities of state-space transitions and marginals (Sokolov et al., 4 Aug 2025).
In continuous spaces with strongly log-concave marginals and reference measures (e.g., Langevin diffusions), a similar geometric rate

$$\mathrm{KL}\left(\pi^n \,\|\, \pi^\star\right) \le C \rho^n, \qquad \rho = \rho(L, \alpha, \beta) \in (0, 1),$$

holds, where $L$ is a Lipschitz constant of the reference transition density and $\alpha, \beta$ are curvature parameters tracing to the marginals and bridge potentials (Silveri et al., 23 Oct 2025). For weakly log-concave marginals the rate is modulated by extra constants.
The contraction, grounded in convexity and Lipschitz-gradient properties of the KL divergence, ensures a geometric reduction of the error and makes IMF robust in high-dimensional or multimodal applications.
3. Algorithmic Variants and Extensions
Discrete-Time IMF and Categorical Matching
IMF has discrete-time versions ("D-IMF"), particularly suited for finite or categorical state spaces (2502.01416). Here, the Markovian projection is performed by learning transitions via parametrized models (neural networks, GANs (Gushchin et al., 23 May 2024)), while the bridge-matching step enforces the reference's conditional structure, $\pi^{\mathrm{new}}(x_{0:T}) = \pi(x_0, x_T)\, Q(x_{1:T-1} \mid x_0, x_T)$. The Categorical Schrödinger Bridge Matching (CSBM) algorithm employs factorized update rules and proves KL convergence to the unique SB solution in discrete spaces (2502.01416).
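As a sketch of the factorization that makes categorical matching tractable, the snippet below samples an intermediate state of a D-dimensional categorical chain coordinate-by-coordinate from the per-dimension reference bridge; the uniform-noising kernel and all sizes are assumptions for illustration, not the CSBM parametrization itself.

```python
import numpy as np

rng = np.random.default_rng(2)
D, S = 8, 5  # D categorical coordinates with S categories each (assumption)

def noise_kernel(alpha):
    """Uniform-noising reference kernel: keep the state w.p. alpha, else resample."""
    return alpha * np.eye(S) + (1.0 - alpha) * np.ones((S, S)) / S

Q0t, Qt1 = noise_kernel(0.7), noise_kernel(0.7)  # reference kernels 0 -> t and t -> 1

def sample_bridge(x0, x1):
    """Sample x_t ~ Q(x_t | x_0, x_1). Because the reference factorises over
    coordinates, the bridge does too: each coordinate is a small S-way choice."""
    xt = np.empty(D, dtype=int)
    for d in range(D):
        w = Q0t[x0[d], :] * Qt1[:, x1[d]]        # unnormalised per-coordinate bridge
        xt[d] = rng.choice(S, p=w / w.sum())
    return xt

x0 = rng.integers(0, S, size=D)
x1 = rng.integers(0, S, size=D)
print(sample_bridge(x0, x1))
```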
IPMF and Bidirectional Updates
Efficient practical solutions use bidirectional alternation, projecting forward to fix the initial marginal and then backward to fix the terminal marginal. Alternating forward/backward Markovian-IPF steps stabilizes drift estimation and prevents error accumulation, resulting in improved generation quality (Kholkin et al., 3 Oct 2024).
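The following is a schematic toy version of the bidirectional schedule, reusing the three-time finite-state setup from the sketch in Section 1; the actual IPMF of Kholkin et al. operates on learned SDE drifts, so everything here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
S = 4
Q01 = rng.random((S, S)); Q01 /= Q01.sum(1, keepdims=True)
Q12 = rng.random((S, S)); Q12 /= Q12.sum(1, keepdims=True)
joint = Q01[:, :, None] * Q12[None, :, :]
br = joint / joint.sum(axis=1, keepdims=True)    # reference bridge Q(x1 | x0, x2)
mu0 = rng.random(S); mu0 /= mu0.sum()            # target initial marginal
mu2 = rng.random(S); mu2 /= mu2.sum()            # target terminal marginal

def reciprocal(pi):
    return pi.sum(axis=1)[:, None, :] * br

def forward_markov(pi):
    """Markov chain rebuilt *forward* from the target mu0 (IPF-style reset)."""
    p10 = pi.sum(2); p10 /= p10.sum(1, keepdims=True)   # pi(x1 | x0)
    p21 = pi.sum(0); p21 /= p21.sum(1, keepdims=True)   # pi(x2 | x1)
    return mu0[:, None, None] * p10[:, :, None] * p21[None, :, :]

def backward_markov(pi):
    """Markov chain rebuilt *backward* from the target mu2."""
    p12 = pi.sum(0); p12 /= p12.sum(0, keepdims=True)   # pi(x1 | x2)
    p01 = pi.sum(2); p01 /= p01.sum(0, keepdims=True)   # pi(x0 | x1)
    return p01[:, :, None] * p12[None, :, :] * mu2[None, None, :]

pi = (mu0[:, None] * mu2[None, :])[:, None, :] * br
for _ in range(30):
    pi = reciprocal(forward_markov(pi))    # forward half-step
    pi = reciprocal(backward_markov(pi))   # backward half-step

print(np.abs(pi.sum((1, 2)) - mu0).max(),  # endpoint errors shrink toward 0
      np.abs(pi.sum((0, 1)) - mu2).max())
```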
GAN-based Implementations
For computational efficiency, D-IMF transitions are learned using adversarial approaches such as Denoising Diffusion GANs (DD-GAN), permitting high-quality domain translation with only a few discrete steps (Gushchin et al., 23 May 2024). This enables practical deployment with dramatically fewer network evaluations compared to continuous SDE-based flows.
Multi-Marginal and Tree-Structured Extensions
IMF generalizes to tree-structured optimal transport (Howard et al., 20 Jun 2025), supporting Wasserstein barycentre problems. Here, alternating projections factor along tree edges, leveraging parallelization and preserving marginal constraints for multi-marginal coupling of the vertex distributions.
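As a point of reference for the edge-wise factorization, below is a minimal sketch of the classical fixed-support entropic barycentre iteration on a star tree (one scaling vector per edge, a shared geometric-mean update at the root); it is not the algorithm of Howard et al., and the grid, cost, and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 50, 3                             # grid size and number of leaves (toy)
x = np.linspace(0.0, 1.0, n)
G = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.01)   # Gibbs kernel, shared by edges

A = rng.random((K, n)); A /= A.sum(1, keepdims=True) # leaf marginals
w = np.ones(K) / K                                   # barycentre weights

V = np.ones((K, n))                      # one scaling vector per tree edge
for _ in range(500):
    U = A / (V @ G.T)                    # fit each leaf marginal (rows: edges)
    M = U @ G                            # edge messages arriving at the root
    p = np.prod((V * M) ** w[:, None], axis=0)       # geometric-mean barycentre
    V = p[None, :] / M                   # fit the shared root marginal
print(p.sum())                           # ~ 1 once the projections agree
```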
4. Signal Decomposition and Time–Frequency Analysis
IMF originally appeared as a fast algorithm for intrinsic mode decomposition of time series, providing a convergent and predictable alternative to Empirical Mode Decomposition (EMD) (0808.2827). Control points based on extrema and median-based residue splines enable highly adaptive signal bases, efficiently separating oscillatory components and trends.
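A minimal sketch of the iterative-filtering idea follows, substituting a simple moving-average local mean for the extrema/median-based spline operators of the paper; the window length and the toy signal are assumptions.

```python
import numpy as np

def moving_average(s, half_window):
    """Symmetric moving average with reflective padding: a simple stand-in
    for the adaptive local-mean (residue spline) operator."""
    k = np.ones(2 * half_window + 1) / (2 * half_window + 1)
    return np.convolve(np.pad(s, half_window, mode="reflect"), k, mode="valid")

def extract_mode(s, half_window, inner_iters=30):
    """Iterative filtering: repeatedly subtract the local mean; the residual
    converges to a zero-mean oscillatory component (one intrinsic mode)."""
    f = s.copy()
    for _ in range(inner_iters):
        f = f - moving_average(f, half_window)
    return f

t = np.linspace(0.0, 1.0, 1000)
s = np.sin(2 * np.pi * 40 * t) + 0.5 * t ** 2   # fast oscillation + slow trend
mode1 = extract_mode(s, half_window=12)          # captures the fast component
residual = s - mode1                             # carries the trend; recurse on it
```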
For time–frequency analysis, IMF-based approaches (or Iterative Filtering) yield decompositions preserving Fourier energy and facilitate sparse, local time–frequency representations. The IMFogram (Cicone et al., 2022), built from local amplitude and instantaneous frequency, converges to the spectrogram in stationary limits. These decompositions provide fine local frequency tracking, outperforming windowed methods in many nonstationary settings.
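The local amplitude and instantaneous frequency entering an IMFogram-style representation can be computed from each mode's analytic signal; the following is a standard Hilbert-transform sketch, with the chirp and sampling rate chosen for illustration.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                                   # sampling rate in Hz (assumption)
t = np.arange(0.0, 1.0, 1.0 / fs)
mode = np.sin(2 * np.pi * (30.0 * t + 20.0 * t ** 2))  # chirp-like toy mode

z = hilbert(mode)                             # analytic signal
amplitude = np.abs(z)                         # local amplitude envelope
phase = np.unwrap(np.angle(z))
inst_freq = np.diff(phase) * fs / (2 * np.pi) # instantaneous frequency, in Hz
print(inst_freq[100], inst_freq[-100])        # rises from ~30 Hz to ~70 Hz
```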
5. Applications in Generative Modeling and Domain Translation
IMF and its variants underpin modern advances in generative modeling, enabling sample-efficient stochastic process interpolation between domains. In image-to-image translation, domain adaptation, and latent code transport in neural autoencoders, IMF-based methods enforce both distributional alignment and process optimality (Shi et al., 2023, 2502.01416). GAN-based discrete versions have shown empirical superiority in speed and quality for unpaired domain translation, with competitive or improved Fréchet Inception Distance (FID) and domain-matching metrics.
In multi-marginal optimal transport, tree-structured IMF solvers efficiently compute entropic Wasserstein barycentres for clustering, posterior aggregation, and hierarchical data modeling (Howard et al., 20 Jun 2025). In time series, IMF-based decomposition and filtering offer effective noise reduction and nonstationarity handling.
6. Limitations, Open Problems, and Future Directions
IMF’s convergence is now rigorously understood in both discrete and continuous domains, with explicit exponential rates (Sokolov et al., 4 Aug 2025, Silveri et al., 23 Oct 2025). However, practical challenges remain:
- High-dimensional discrete spaces necessitate factorized transition models, which can discard dependencies between features.
- Continuous SDE-based flows may require many integration steps; efficient discrete approximations such as D-IMF/ASBM (Gushchin et al., 23 May 2024) mitigate this at the cost of discretization errors and step-size selection.
- Widening applicability to multi-modal and hierarchical problems depends on further architectural advances and warm-starting strategies (Howard et al., 20 Jun 2025).
Emerging directions include refining projection operators to improve parallelization, developing "α–IMF" flows to limit iteration count, and extending contraction theory to more complex bridge and reference structures.
7. Summary Table: Algorithmic Regimes and Convergence Properties
| Algorithmic Regime | Structure Assumptions | Convergence Guarantee |
|---|---|---|
| Discrete IMF | Finite state space, minimal transition/marginal probabilities | Exponential: $\mathrm{KL}(\pi^n \Vert \pi^\star) \le (1-\delta)^n\, \mathrm{KL}(\pi^0 \Vert \pi^\star)$ |
| Continuous IMF – Strongly Log-Concave Marginals | Lipschitz ref., uniform convexity | Exponential: $\mathrm{KL}(\pi^n \Vert \pi^\star) \le C\rho^n$, $\rho < 1$ |
| Continuous IMF – Weakly Log-Concave Marginals | Locally convex, bounded gradients | Damped exponential, extra constants |
| GAN-based D-IMF/ASBM | Parametrized transitions, adversarial | Empirical: order-of-mag. speedup |
| Tree-structured IMF | Marginals on vertices, edge structure | Empirical: faster, parallelizable |
All convergence guarantees are traceable to (Sokolov et al., 4 Aug 2025, Silveri et al., 23 Oct 2025); empirical performance (speed and sample quality) is detailed in (Gushchin et al., 23 May 2024, Howard et al., 20 Jun 2025).
IMF constitutes a unified conceptual and algorithmic framework for constructing path measures that optimally interpolate between distributions by harnessing alternating Markovian and reciprocal projections. It has demonstrated theoretically justified exponential convergence, strong computational efficiency, and flexible adaptation, from classical signal decomposition to modern generative modeling and optimal transport. Continued research focuses on expanding its tractability and scalability across increasingly complex domains.