
Markov Space Flow Matching (MarS-FM)

Updated 1 October 2025
  • MarS-FM is a generative modeling technique that extends continuous flow matching to arbitrary Markov state spaces, including discrete, manifold, and Lie group structures.
  • It leverages conditional flow matching and neural generator training to model macroscopic state transitions efficiently, achieving significant speedups in applications like molecular dynamics.
  • Grounded in rigorous theoretical guarantees, MarS-FM ensures stability and convergence, making it a robust tool for high-dimensional probabilistic inference and complex system simulation.

Markov Space Flow Matching (MarS-FM) is a class of generative modeling techniques in which flow matching frameworks are extended to operate over general Markov state spaces and processes. The defining feature of MarS-FM is its ability to model transitions between states in structured or discrete spaces—such as those defined by Markov State Models (MSMs), Riemannian manifolds, categorical spaces, or Lie groups—while inheriting the simulation-free, scalable, and theoretically principled machinery of continuous Flow Matching and Conditional Flow Matching. MarS-FM has become a foundational approach for efficient generative modeling and probabilistic inference in domains ranging from molecular dynamics to probabilistic MCMC acceleration and beyond.

1. Formal Foundations of Markov Space Flow Matching

MarS-FM generalizes the deterministic transport of probability mass via continuous flows to arbitrary state spaces endowed with Markovian or more general generator structures. In this framework, a time-dependent transformation (or flow) $\psi_t$ is parameterized to connect a source distribution $p_0$ to a target $p_1$ through a prescribed probability path $\{p_t\}_{t \in [0,1]}$. For traditional applications in $\mathbb{R}^d$, this aligns with solving the ODE

\frac{d}{dt} \psi_t(x) = u_t(\psi_t(x)), \quad \psi_0(x) = x,

where $u_t$ is the velocity field learned to match the evolution of $p_t$ via the continuity equation.

In MarS-FM, this principle is extended to state spaces $S$ that may be:

  • Discrete (Markov chains, e.g., for text generation or molecular metastable states)
  • Manifold-valued (e.g., geometric configuration spaces of proteins)
  • Hybrid (combining Euclidean and non-Euclidean coordinates, as in SE(3)-equivariant state representations)

The generator formalism underpins this approach. A general time-dependent generator $\mathcal{L}_t$ (see (Lipman et al., 9 Dec 2024)) can combine deterministic flows, stochastic diffusions, and discrete jumps:

\mathcal{L}_t f(x) = \nabla f(x)^\top u_t(x) + \frac{1}{2} \sigma_t^2(x) \cdot \nabla^2 f(x) + \int [f(y) - f(x)] \, Q_t(dy, x)

where $u_t(x)$ is a velocity field, $\sigma_t^2(x)$ the diffusion coefficient, and $Q_t(dy, x)$ a jump kernel. The MarS-FM problem is to fit a neural generator $\mathcal{L}_t^\theta$ that matches the dynamics prescribed by a probability path $p_t$ on such general spaces.
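As a concrete special case of this generator, setting $u_t = 0$ and $\sigma_t = 0$ leaves a pure-jump process, i.e., a continuous-time Markov chain driven by a rate matrix. A minimal Gillespie-style simulator of such a chain, written here purely for illustration (the rate matrix `Q`, state labels, and time horizon are hypothetical, not taken from the cited works):

```python
import numpy as np

def simulate_ctmc(Q, x0, t_end, rng):
    """Gillespie simulation of a continuous-time Markov chain with
    generator (rate) matrix Q -- the pure-jump special case of the
    generator formalism, with no drift or diffusion terms."""
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while True:
        rate = -Q[x, x]                  # total exit rate from state x
        if rate <= 0:
            break                        # absorbing state: nothing to do
        t += rng.exponential(1.0 / rate) # exponential holding time
        if t >= t_end:
            break
        probs = Q[x].copy()
        probs[x] = 0.0                   # jump to a *different* state
        x = rng.choice(len(probs), p=probs / rate)
        path.append((t, x))
    return path

rng = np.random.default_rng(0)
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])             # illustrative two-state chain
path = simulate_ctmc(Q, 0, t_end=5.0, rng=rng)
```

Each row of `Q` sums to zero, so the off-diagonal entries divided by the exit rate form a valid jump distribution.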

Conditional Flow Matching plays a central role: instead of mapping between frames in $\mathbb{R}^d$ directly, MarS-FM leverages interpolating paths (e.g., geodesic or Markov interpolants) conditioned on initial and final states, optimizing:

\mathbb{E}_{t, x_0, x_1} \left[ \| u_t^\theta(x_t) - u_t(x_t \mid x_1) \|^2 \right], \quad x_t = I_t(x_0, x_1)

with $I_t$ a suitable interpolation respecting the Markovian dynamics and state space geometry (Lipman et al., 9 Dec 2024).
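To make the conditional objective concrete, the following sketch evaluates the CFM regression loss for the simplest Euclidean interpolant $I_t(x_0, x_1) = (1-t)x_0 + t x_1$, whose conditional velocity is $x_1 - x_0$. The model `u_theta` and the translation coupling are illustrative assumptions, not part of the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(u_theta, x0, x1, t):
    """Conditional Flow Matching loss for the linear interpolant
    I_t(x0, x1) = (1-t) x0 + t x1; the conditional velocity target
    along this path is simply x1 - x0."""
    t = t[:, None]                      # broadcast time over dimensions
    xt = (1.0 - t) * x0 + t * x1        # sample on the conditional path
    target = x1 - x0                    # u_t(x_t | x0, x1) for this path
    pred = u_theta(t, xt)
    return np.mean(np.sum((pred - target) ** 2, axis=-1))

# A model matching the true velocity of a pure-translation coupling
# incurs zero loss:
shift = np.array([1.0, 2.0])
x0 = rng.normal(size=(8, 2))
x1 = x0 + shift
u_star = lambda t, x: np.broadcast_to(shift, x.shape)
loss = cfm_loss(u_star, x0, x1, rng.uniform(size=8))
# loss == 0.0 up to floating point
```

In MarS-FM the Euclidean interpolant is replaced by a geodesic or Markov interpolant matched to the state space, but the regression structure is the same.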

2. Efficient Conditional Sampling via Markov State Models

A central advancement of MarS-FM lies in its integration with Markov State Models (MSMs) for efficient modeling of complex dynamical processes, particularly in high-dimensional molecular dynamics (Kapuśniak et al., 29 Sep 2025). MSMs partition the state space into metastable regions and construct a transition matrix $\mathsf{T}_{ij}$ empirically estimated from fine-grained simulations:

\mathsf{T}_{ij} = \frac{C_{ij}}{\sum_k C_{ik}}, \quad C_{ij} = |\{x(t) \in S_i : x(t+\tau) \in S_j\}|

MarS-FM models transitions between these metastable states instead of frame-to-frame jumps, eliminating the dominance of short-timescale, uninformative intra-state transitions in training.
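The count-based estimator above takes only a few lines; this is a minimal sketch from a single trajectory of discrete state labels, whereas production MSM pipelines add reversibility constraints and statistical validation:

```python
import numpy as np

def msm_transition_matrix(labels, lag, n_states):
    """Row-normalized MSM transition matrix T_ij = C_ij / sum_k C_ik,
    where C_ij counts visits to state i followed by state j after
    `lag` steps (a minimal maximum-likelihood estimate)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-lag], labels[lag:]):
        counts[a, b] += 1.0
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0               # leave unvisited states as zero rows
    return counts / rows

# Toy trajectory over three metastable states (illustrative data)
traj = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1])
T = msm_transition_matrix(traj, lag=1, n_states=3)
# each visited row of T sums to 1
```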

In practice, MarS-FM draws a source state $S_i$ and a target state $S_j$ from $\mathsf{T}$, selects representative conformations $x_0, x_1$, computes a noisy interpolation $x_s$, and seeks to regress the velocity field $v_\theta(s, x_s; x_0)$ so that samples flow along macroscopic state transitions. The velocity target, $\dot\chi_s^{(1)}$, incorporates kinetic information from a trigonometric interpolation between structure and noise:

\chi_s^{(1)} = \sigma(s)\,\varepsilon + \alpha(s)\,\chi^{(1)}, \quad \alpha(s) = \sin\left(\frac{\pi}{2}s\right), \quad \sigma(s) = \cos\left(\frac{\pi}{2}s\right)

This approach achieves sampling acceleration by over two orders of magnitude compared to traditional MD, while capturing long-timescale events (e.g., folding, unfolding) that are inaccessible to fixed-lag emulators (Kapuśniak et al., 29 Sep 2025).
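The trigonometric schedule and its time derivative (the velocity regression target) can be written out directly. The noise and structure arrays below are illustrative stand-ins, not data from the cited work:

```python
import numpy as np

def trig_interpolant(eps, x1, s):
    """Trigonometric interpolation between noise `eps` and structure `x1`
    with alpha(s) = sin(pi s / 2) and sigma(s) = cos(pi s / 2), together
    with its derivative in s, which is the velocity regression target."""
    a = np.sin(np.pi * s / 2)
    sig = np.cos(np.pi * s / 2)
    xs = sig * eps + a * x1
    # d/ds of the schedule gives the target velocity \dot{chi}_s
    vs = (np.pi / 2) * (-np.sin(np.pi * s / 2) * eps
                        + np.cos(np.pi * s / 2) * x1)
    return xs, vs

eps = np.array([1.0, -1.0])   # hypothetical noise sample
x1 = np.array([0.5, 2.0])     # hypothetical target structure
x_start, _ = trig_interpolant(eps, x1, 0.0)   # pure noise at s = 0
x_end, _ = trig_interpolant(eps, x1, 1.0)     # pure structure at s = 1
```

At the endpoints the schedule recovers pure noise and pure structure, so the learned flow transports noise samples onto conformations of the target metastable state.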

3. Extensions to Non-Euclidean and Structured Spaces

MarS-FM generalizes beyond vector spaces:

  • Lie Groups: When states involve rotations or rigid motions (e.g., protein backbones in SE(3)), flow matching is performed intrinsically on the group manifold using exponential curves as interpolants (Sherry et al., 1 Apr 2025). Here, the fundamental update is $g_t = g_0 \exp(t \log(g_0^{-1} g_1))$ with velocity $u_t(g \mid g_1) = (L_g)_* \log(g^{-1} g_1)/(1-t)$, where $(L_g)_*$ pushes forward the Lie algebra vector.
  • Discrete Markov Chains: For jump processes or categorical data, the FM framework is instantiated using Continuous-Time Markov Chains (CTMCs) and generator matrices/rate parameters, ensuring evolutions are compatible with the structure of the state space (Lipman et al., 9 Dec 2024).
  • Riemannian Manifolds: Flow matching can be defined along geodesics or through tangent bundle dynamics, crucial for data on spheres or other curved geometric domains.
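For the rotation component of such group-valued states, the exponential-curve interpolant $g_t = g_0 \exp(t \log(g_0^{-1} g_1))$ can be sketched with closed-form $\mathfrak{so}(3)$ exponential and logarithm maps (Rodrigues' formula). This is an illustrative implementation, not code from the cited paper:

```python
import numpy as np

def hat(w):
    # so(3) hat map: axis-angle vector in R^3 -> skew-symmetric matrix
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    # Rodrigues' formula: exp maps so(3) onto SO(3)
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def so3_log(R):
    # Principal logarithm SO(3) -> so(3), valid for rotation angle < pi
    th = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if th < 1e-12:
        return np.zeros(3)
    w_hat = (R - R.T) * th / (2.0 * np.sin(th))
    return np.array([w_hat[2, 1], w_hat[0, 2], w_hat[1, 0]])

def group_interpolant(g0, g1, t):
    # Exponential-curve interpolant g_t = g0 exp(t log(g0^{-1} g1))
    return g0 @ so3_exp(t * so3_log(g0.T @ g1))

# Midpoint of the geodesic between two z-axis rotations
g0 = so3_exp(np.array([0.0, 0.0, 0.1]))
g1 = so3_exp(np.array([0.0, 0.0, 1.1]))
g_half = group_interpolant(g0, g1, 0.5)
```

Because the interpolant stays on the group by construction, the learned dynamics never leave the rotation manifold, which is the structure-preservation property the text describes.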

The general approach in all cases is to preserve the compatibility between the flow interpolation and the underlying structure of the state space, ensuring that the learned dynamics respect symmetries, invariants, or flow constraints.

4. Stability, Convergence, and Theoretical Guarantees

MarS-FM inherits and extends the theoretical analysis of continuous flow matching. Under mild regularity assumptions (e.g., Lipschitz continuity), the error in the learned velocity field upper-bounds the divergence between the generated and true data distributions in Wasserstein-2 and $\chi^2$ senses (Dao et al., 2023, Xu et al., 3 Oct 2024):

  • In the latent (or local) context,

W_2^2(p_0, \hat{p}_0) \leq \|\Delta\|^2 + L_g^2 \exp(1 + 2\hat{L}) \int_0^1 \int \|v(z_t, t) - \hat{v}(z_t, t)\|^2 \, dq_t \, dt

where $\Delta$ encodes the autoencoder reconstruction error.

  • In local FM block architectures,

\chi^2(p_N \Vert q) \leq e^{-2\gamma N} \chi^2(p_0 \Vert q) + \left[C_4 / (1 - e^{-2\gamma})\right] \epsilon^{1/2}

when $\epsilon$ bounds the per-block velocity field error.

  • For general Markov processes, the equivalence of “global” and conditional generator-matching losses (via Bregman divergences) holds, and the gradient of the generative modeling loss with respect to the velocity parameterization is invariant under conditional path sampling (Lipman et al., 9 Dec 2024).

Stability of learned flows, especially in the context of non-convex energy landscapes or physically constrained data, is addressed by leveraging Lyapunov function parameterizations, control-theoretic invariance principles, and autonomous vector field design (Sprague et al., 8 Feb 2024).

5. Computational Efficiency, Practical Implementation, and Model Variants

One of the principal motivations for MarS-FM is computational tractability for high-dimensional, structured, or resource-constrained generative modeling:

  • Latent Space Flow Matching: Performing FM in the latent space of pretrained autoencoders yields orders-of-magnitude reductions in function evaluations and enables efficient ODE integration for high-resolution synthesis (Dao et al., 2023).
  • Local Flow Matching: Sequential composition of local FM blocks (editor's term, LFM) offers modularity and amortizes the complexity of global transport over smaller Markovian subflows, improving both training efficiency and distillation to fast generators (Xu et al., 3 Oct 2024).
  • Model-aligned Couplings: Matching training couplings not only via geometric distance (optimal transport) but also by aligning with model capacity can straighten trajectories, improving generation quality at a fixed computational budget (Lin et al., 29 May 2025).
  • Markovian FM for MCMC Acceleration: Embedding continuous flows within MCMC pipelines, where local gradient-based kernels are interleaved with learned non-local flow proposals, results in improved mixing and mode coverage in challenging inference tasks at reduced computational cost (Cabezas et al., 23 May 2024).

Key implementation elements also include classifier-free guidance for conditional generation, flexible ODE solvers (Euler, Heun), and methods for on-the-fly parameter adaptation and tempering.
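The fixed-step Euler and Heun solvers mentioned above can be sketched generically; `u` stands in for any trained velocity model, and the linear-in-time field used in the demo is purely illustrative:

```python
import numpy as np

def sample_ode(u, x0, n_steps=50, method="heun"):
    """Integrate dx/dt = u(t, x) from t = 0 to t = 1 with fixed-step
    Euler or Heun (explicit trapezoidal) updates -- the standard
    samplers for flow matching models."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        k1 = u(t, x)
        if method == "euler":
            x = x + dt * k1
        else:  # Heun: average the slopes at both ends of the step
            k2 = u(t + dt, x + dt * k1)
            x = x + 0.5 * dt * (k1 + k2)
    return x

# Heun is exact for velocity fields linear in t, e.g. u(t, x) = 2t,
# whose flow displaces each coordinate by the integral of 2t over [0, 1]:
u_lin = lambda t, x: np.full_like(x, 2.0 * t)
out = sample_ode(u_lin, np.zeros(3), n_steps=10, method="heun")
# out == [1., 1., 1.] up to floating point
```

Heun roughly doubles the cost per step over Euler but its second-order accuracy usually permits far fewer steps for the same sample quality.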

6. Applications and Empirical Performance

MarS-FM has demonstrated utility across a range of data modalities and domains:

  • Molecular Dynamics: MarS-FM (as an MSM-Emulator) achieves more than two orders of magnitude sampling speedup over classical MD while closely matching reference MD statistics (RMSD, radius of gyration, secondary structure, free energy landscapes) across diverse protein domains, including strict separation between training/test sequences (Kapuśniak et al., 29 Sep 2025).
  • Accelerated Probabilistic Inference: MarS-FM-based samplers combine local and global moves to achieve state-of-the-art target density approximation and mixing efficiency on benchmark Bayesian and physical systems, e.g., field systems, Cox point processes (Cabezas et al., 23 May 2024).
  • Conditional and Structured Generation: By conditioning on class labels, semantic masks, or structural information, MarS-FM frameworks offer competitive FID and recall scores for high-resolution image synthesis, inpainting, and semantic-to-image tasks (Dao et al., 2023).
  • Unsupervised Anomaly Detection: Time-reversed FM variants provide new mechanisms for anomaly detection/localization by constructing displacement paths with “degenerate potential wells” for normal vs. anomalous samples, yielding state-of-the-art AUROC scores on industrial defect datasets (Li et al., 7 Aug 2025).

7. Future Directions and Open Challenges

Future research directions for MarS-FM include:

  • Integrated modeling for molecular complexes and larger biomolecular assemblies
  • Sequence-to-ensemble generative modeling by unifying MarS-FM with structure prediction pipelines
  • Advanced MSM constructions adaptive to temperature or environmental variations
  • Hybrid models combining MSM-driven state transitions with local fine-grained dynamics
  • Extending MarS-FM to more general classes of continuous/discrete Markov processes, including non-reversible and non-stationary systems
  • Scalable, structure-preserving flows on complex manifolds and product spaces using group and geometric representations

Efforts are also directed toward educational resources, open-source codebases, and modular framework designs to foster broader adoption and further theoretical analysis (Lipman et al., 9 Dec 2024).


MarS-FM unifies multiple perspectives on simulation-free generative modeling, encompassing continuous flows, stochastic processes, and Markovian structure. Its integration of efficient learning objectives, flexible state space representations, and rigorous mathematical guarantees positions MarS-FM as a foundational tool in modern generative modeling, particularly for complex, structured, or high-dimensional probabilistic systems.
