Trajectory Distribution Matching (TDM)

Updated 22 November 2025
  • Trajectory Distribution Matching is a framework that models complete path-valued distributions, ensuring consistency across all time points and observed marginals.
  • It applies to time-series generation, conditional generative modeling, and sequential decision processes by matching the full joint distribution rather than individual endpoints.
  • TDM leverages multi-marginal optimal transport, analytical loss functions, and uncertainty estimation to achieve scalable, robust, and interpretable trajectory modeling.

Trajectory Distribution Matching (TDM) is a unified theoretical and algorithmic framework designed for learning and evaluating models over path-valued random processes. The central objective of TDM is to learn stochastic processes or transformation policies such that the entire trajectory distribution—not merely marginal or endpoint distributions—is matched between a model and empirical or semantic targets. TDM finds application in time-series generation, conditional generative modeling (images, videos), sequential decision processes, multi-agent density control, and trajectory similarity analysis. Recent developments have produced practical and theoretically robust algorithms for TDM that scale to high dimensions, handle irregular/sparse data, and support both deterministic and stochastic dynamics.

1. Core Formulation and Mathematical Foundations

TDM addresses the challenge of modeling the full space of trajectory couplings—joint probability laws over all observed time points—rather than reducing dynamics to pairwise or marginal matching. In sequence modeling, given observed data at times $t_0 < \dots < t_M$ with empirical marginals $\{\rho_i\}_{i=0}^M$, the goal is to construct a stochastic process $(x_t)_{t \in [t_0, t_M]}$ such that the joint marginal of $(x_{t_0}, \dots, x_{t_M})$ over the observed time points matches the empirical joint $q(z)$, where $z = (x_{t_0}, \dots, x_{t_M})$.

The formal TDM objective, as realized in Interpolative Multi-Marginal Flow Matching (IMMFM) (Islam et al., 3 Oct 2025), is

$$\mathcal{L}_{\mathrm{IMMFM}}(\theta) = \mathbb{E}_{t, z, x} \left[ \| v_\theta(t, x, c) - u^\circ_t(x \mid z) \|_2^2 + \lambda(t)^2 \| s_\theta(t, x, c) - \nabla_x \log p_t(x \mid z) \|_2^2 \right] + \beta\, \mathbb{E}_{t, z, x} \left\| g_\theta(t, x, c)^2 - \delta_\theta(t, x, z)^2 \right\|_2^2$$

where $v_\theta$ is a neural drift field, $s_\theta$ a score field, $g_\theta$ a data-driven diffusion coefficient, and $u^\circ_t$ and $\nabla_x \log p_t$ are closed-form conditional flow and score targets. The loss decomposes into flow- and score-matching terms plus an uncertainty term.
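The following is a minimal sketch of how such an objective might be evaluated on one minibatch, assuming PyTorch-style networks that map $(t, x, c)$ to tensors of shape (batch, dim); the function and argument names (v_net, s_net, g_net, and delta_sq standing for the squared residual target $\delta_\theta^2$) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an IMMFM-style loss; names and shapes are illustrative.
import torch

def immfm_loss(v_net, s_net, g_net, t, x, c, u_target, score_target, delta_sq, lambda_t, beta):
    """Flow-matching + score-matching + uncertainty terms, averaged over the batch.

    Network outputs and targets are assumed to have shape (batch, dim);
    lambda_t is a per-sample weight of shape (batch,) or a scalar.
    """
    flow_term = ((v_net(t, x, c) - u_target) ** 2).sum(dim=-1)                       # ||v_theta - u_t||^2
    score_term = (lambda_t ** 2) * ((s_net(t, x, c) - score_target) ** 2).sum(dim=-1)
    uncert_term = beta * ((g_net(t, x, c) ** 2 - delta_sq) ** 2).sum(dim=-1)          # uncertainty term
    return (flow_term + score_term + uncert_term).mean()
```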

In multi-agent density control and optimal transport (Duan et al., 8 Oct 2025), the TDM problem is to find a control policy $f_\theta(x, t)$ satisfying the Fokker–Planck equation

$$\partial_t \rho(x,t) + \nabla \cdot \big[\rho(x,t)\, f_\theta(x, t)\big] = \tfrac{1}{2}\sigma^2 \Delta \rho(x,t)$$

with $\rho(x, 0) = p_0(x)$ and $\rho(x, T) = p_T(x)$, while additionally minimizing integrated trajectory-dependent costs such as collision avoidance.
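A generic dynamic formulation consistent with these constraints (the exact running cost used in the cited work may differ) is

$$\min_{\theta}\ \mathbb{E}\left[\int_0^T \tfrac{1}{2}\,\|f_\theta(x_t, t)\|^2 + V\big(x_t, \rho(\cdot, t)\big)\, dt\right] \quad \text{subject to the density constraint above,}$$

where $V$ collects trajectory-dependent penalties such as obstacle or collision costs; with $V \equiv 0$, this reduces to classical dynamic OT for $\sigma = 0$ and to entropy-regularized OT (the Schrödinger bridge) for $\sigma > 0$, consistent with the special cases noted in Section 5.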

TDM is also realized for irregularly sampled time series (Jahn et al., 29 May 2025) by locally learning time-inhomogeneous Markov generators with matching marginals and (optionally) parameterized jump processes for discontinuity modeling.

2. Reference Trajectory Construction and Pathwise Targets

A central challenge in TDM is constructing suitable reference trajectories $p_t(x \mid z)$ that reflect the intrinsic stochasticity, nonlinearity, and irregularity of real data. IMMFM (Islam et al., 3 Oct 2025) utilizes a piecewise-quadratic interpolation

$$\mu_t(z) = x_{t_i} + v_i (t - t_i) + \tfrac{1}{2} \alpha_t (v_i - v_{i+1}) (t - t_i), \qquad \sigma(t) = \sigma_0 (t - t_i)\, \alpha_t$$

with segment velocities $v_i = (x_{t_{i+1}} - x_{t_i}) / (t_{i+1} - t_i)$ and normalized weights $\alpha_t$. This yields conditionally Gaussian paths for which drift and score can be computed analytically:

$$u^\circ_t(x \mid z) = \mu'_t(z) + \frac{\sigma'(t)}{\sigma(t)} \big(x - \mu_t(z)\big), \qquad \nabla_x \log p_t(x \mid z) = \frac{\mu_t(z) - x}{\sigma(t)^2}$$
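Below is a minimal sketch of these targets on a single segment $[t_i, t_{i+1}]$, assuming for simplicity that $\alpha_t$ is held constant over the segment (so the $\alpha_t$-derivative term of $\mu'_t$ vanishes); the signature and the default values of alpha_t and sigma0 are illustrative, not the paper's.

```python
# Illustrative conditional drift/score targets on one segment, with alpha_t fixed.
import numpy as np

def conditional_targets(x_i, v_i, v_ip1, t_i, t, x, alpha_t=1.0, sigma0=0.1):
    """x_i, x: arrays of shape (d,); v_i, v_ip1: segment velocities; requires t > t_i so sigma > 0."""
    dt = t - t_i
    mu = x_i + v_i * dt + 0.5 * alpha_t * (v_i - v_ip1) * dt   # conditional mean mu_t(z)
    mu_prime = v_i + 0.5 * alpha_t * (v_i - v_ip1)             # d(mu_t)/dt with alpha_t held fixed
    sigma = sigma0 * dt * alpha_t                              # conditional std sigma(t)
    sigma_prime = sigma0 * alpha_t                             # d(sigma)/dt with alpha_t held fixed
    u_target = mu_prime + (sigma_prime / sigma) * (x - mu)     # drift target u_t(x | z)
    score_target = (mu - x) / sigma ** 2                       # score target grad_x log p_t(x | z)
    return u_target, score_target
```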

In the context of generative modeling for diffusion-based super-resolution (You et al., 26 Mar 2025) and few-step diffusion models (Luo et al., 9 Mar 2025), reference ODE trajectories are constructed using Probability Flow ODEs (PF-ODE), enabling the definition of loss functions over entire synthetic or real sample paths, not just adjacent pairs or endpoints.

For stochastic and jump-diffusion processes (Jahn et al., 29 May 2025), the local interpolator's law is a Gaussian with analytically computable drift and jump kernels, parameterized so that marginals are always matched to prescribed values. This structure underpins the generator-matching loss for both continuous and discrete events.

3. Multi-Marginal Couplings, Optimal Transport, and Mean-Field Control

Handling irregular, sparse, or high-dimensional sampling requires principled coupling of observed marginals. IMMFM (Islam et al., 3 Oct 2025) builds a multi-marginal optimal transport (OT) coupling

$$q(z) = \frac{\prod_{i=0}^{M-1} \pi^*_{i, i+1}(x_{t_i}, x_{t_{i+1}})}{\prod_{i=1}^{M-1} \rho_i(x_{t_i})}$$

where $\pi^*_{i, i+1}$ minimizes the quadratic cost among all couplings of $\rho_i$ and $\rho_{i+1}$. This construction guarantees that the learned stochastic process matches every observed marginal: $\int q(z) \prod_{j \neq i} dx_{t_j} = \rho_i(x_{t_i})$.
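A rough way to realize such a coupling for equal-size, uniformly weighted empirical marginals is to solve each pairwise quadratic-cost OT problem as an assignment and chain the resulting permutations into tuples $z$; the sketch below is an illustrative simplification of this idea, not the cited implementation.

```python
# Chain pairwise quadratic-cost couplings between consecutive empirical marginals
# into full tuples z = (x_{t_0}, ..., x_{t_M}); an exact assignment is a valid OT
# plan for equal-size samples with uniform weights.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def sample_trajectory_tuples(marginals):
    """marginals: list of (n, d) arrays, one per observation time; returns (n, M+1, d)."""
    n = marginals[0].shape[0]
    order = np.arange(n)                                      # current index of each chained tuple
    tuples = [marginals[0]]
    for rho_i, rho_ip1 in zip(marginals[:-1], marginals[1:]):
        cost = cdist(rho_i, rho_ip1, metric="sqeuclidean")    # quadratic transport cost
        _, col = linear_sum_assignment(cost)                   # optimal permutation coupling pi*
        order = col[order]                                     # follow the coupling forward
        tuples.append(rho_ip1[order])
    return np.stack(tuples, axis=1)
```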

In control and robotics, TDM is cast via coupled mean-field FBSDEs and Hamilton–Jacobi–Bellman equations (Duan et al., 8 Oct 2025), where additional trajectory-dependent costs (e.g., obstacle avoidance) enter the running cost, and the policy depends on the time-evolving density $\rho(t)$, supporting swarm and collision-aware dynamics.

These constructions support models that remain consistent with the full observed distribution across time, in contrast to models that focus solely on pairwise step-to-step transitions or endpoints.

4. Algorithmic Realizations and Optimization Strategies

TDM is implemented by jointly learning model parameters to minimize distributional discrepancy across entire trajectories or associated interpolations. The optimization and parameterization depend on the domain.

  • Continuous-time flow models: Neural parameterizations for drift, score, and diffusion components, optimized via multi-marginal losses as in IMMFM (Islam et al., 3 Oct 2025), with analytic targets for both flow and uncertainty (heteroscedastic regression).
  • Trajectory-controlled diffusion models: In super-resolution (You et al., 26 Mar 2025), a two-stage learning strategy alternates between consistency training along PF-ODE paths and Distribution Trajectory Matching (DTM) loss, aligning the entire PF-ODE paths of synthetic and ground-truth data.
  • Few-step, sampling-adaptive models: Data-free score distillation objectives are used in (Luo et al., 9 Mar 2025), matching intermediate student and teacher trajectory marginals via reverse KL and step-aware objectives, supporting arbitrary $K$-step generation without solving the continuous-time teacher ODE.
  • Distributional kernel similarity: For trajectory similarity (Wang et al., 2023), the kernel mean embedding provides an efficient, dataset-adaptive representation of trajectory distributions in RKHS, yielding a similarity measure $K(\mu_X, \mu_Y)$ that possesses the injectivity (uniqueness) property and linear computational cost (see the sketch after this list).
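As a rough illustration of the distributional-kernel idea, the sketch below embeds each trajectory as the mean of Gaussian-kernel features and compares embeddings via their RKHS inner product; the Gaussian kernel, fixed bandwidth, and naive pairwise evaluation are simplifying assumptions (the cited work also uses the Isolation Kernel with explicit feature maps to reach linear cost).

```python
# Naive kernel mean embedding similarity between two trajectories X (m, d) and Y (n, d).
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    # k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2), evaluated for all pairs
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def tdm_similarity(X, Y, gamma=1.0):
    """K(mu_X, mu_Y) = <mu_X, mu_Y>_RKHS, estimated as mean_{i,j} k(x_i, y_j)."""
    return gaussian_kernel(X, Y, gamma).mean()
```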

Optimization typically leverages stochastic sampling of time points, intervals, and data pairs, and often integrates analytic marginalization to achieve variance reduction, efficient coupling, and stability across irregularly sampled or noisy data.

5. Theoretical Guarantees and Consistency

A distinguishing feature of modern TDM algorithms is rigorous theoretical support for optimality and stability. IMMFM (Islam et al., 3 Oct 2025) establishes that, under mild regularity conditions, stationary points of the constructed objective coincide with those of the underlying SDE flow-matching loss: if $\nabla_\theta \mathcal{L}_{\mathrm{SDE}} = 0$, then $\nabla_\theta \mathcal{L}_{\mathrm{IMMFM}} = 0$. The uncertainty term in the objective has zero gradient at the true drift, so it introduces no bias.

For kernel mean embedding approaches (Wang et al., 2023), TDM similarity is injective for characteristic kernels (e.g., Gaussian or Isolation Kernel), $\mu_{\mathcal{P}_X} = \mu_{\mathcal{P}_Y} \iff \mathcal{P}_X = \mathcal{P}_Y$, and empirical estimates converge at rate $O_p(1/\sqrt{m})$, where $m$ is the trajectory length.
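Concretely, these guarantees concern the standard plug-in estimator of the mean embedding (written here with a generic feature map $\phi$, a notational assumption rather than the cited paper's exact estimator):

$$\hat\mu_{\mathcal{P}_X} = \frac{1}{m}\sum_{j=1}^{m}\phi(x_j), \qquad \widehat{K}(\mu_{\mathcal{P}_X}, \mu_{\mathcal{P}_Y}) = \langle \hat\mu_{\mathcal{P}_X}, \hat\mu_{\mathcal{P}_Y}\rangle = \frac{1}{mn}\sum_{j=1}^{m}\sum_{k=1}^{n} k(x_j, y_k),$$

and the $O_p(1/\sqrt{m})$ rate is the usual RKHS-norm concentration of $\hat\mu_{\mathcal{P}_X}$ around $\mu_{\mathcal{P}_X}$.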

In mean-field control settings (Duan et al., 8 Oct 2025), optimality conditions are derived via the coupled HJB/FBSDE system, and TDM recovers classical OT and entropy-regularized OT (Schrödinger bridge) as special cases.

6. Applications Across Domains

TDM has broad practical impact across several domains:

  • Longitudinal neuroimaging and clinical forecasting: IMMFM (Islam et al., 3 Oct 2025) achieves a 1–4% improvement in Dice coefficient, a 1.5–2.2 dB gain in PSNR, and up to 9% improvement in AD/CN classification (visit 2→4) over previous methods (e.g., Trajectory Flow Matching, MMFM).
  • Time series and jump-diffusion processes: In (Jahn et al., 29 May 2025), TDM matches both smooth trends and jump noise even under heavy subsampling, outperforming regularized trajectory flow methods in Maximum-Mean-Discrepancy by an order of magnitude.
  • Few-step generative modeling (images, video): In (Luo et al., 9 Mar 2025) (text-to-image) and (Sun et al., 8 Aug 2025) (video), TDM delivers state-of-the-art perceptual quality with only 2–4 function evaluations—improving FVD and VBench metrics compared to previous distillation, LCM, and PCM approaches.
  • Trajectory similarity and anomaly mining: Kernel-based TDM (Wang et al., 2023) yields best-in-class performance on trajectory anomaly detection, sub-trajectory detection, and pattern mining at a $10^3$–$10^4\times$ speedup over classical distances.

7. Comparative Advantages and Limitations

TDM fundamentally differs from point-to-point or marginal-based methodologies by capturing the "global" structure of paths. Advantages include:

  • Global distributional matching: Ensures consistency across all time points and observed marginals (multi-marginal OT constraint).
  • Flexible support for irregular, sparse data: Multi-marginal construction and analytic reference paths handle diverse sampling schemes.
  • Computational scalability: Linear or near-linear time algorithms (kernel methods), analytic loss construction, and simulation-free optimization enable application to high-dimensional trajectories.
  • Robustness to stochasticity and jumps: TDM handles both diffusion and jump processes via generator-matching losses.

A plausible implication is that TDM's guarantees and analytic structure make it preferable in domains requiring strict distributional consistency, interpretable interpolation, and adaptation to misspecified, high-dimensional, or sparsely sampled data.

A remaining limitation is that the choice of reference coupling and path construction (e.g., quadratic vs. linear, Gaussian vs. non-Gaussian) may introduce inductive biases, and optimality still depends on the faithfulness of these choices to the underlying process.

