Marginal-Data Transport: Theory & Applications
- Marginal-data transport is a framework that reconstructs latent dynamics—such as velocity fields, joint couplings, or network flows—from partial marginal observations.
- It leverages continuum formulations, discrete multi-marginal models, and RKHS embedding to recover minimum-energy dynamics and other latent structures.
- Applications range from inverse estimation in population flows to adaptive communication strategies, demonstrating advantages in cost efficiency and computational speed.
Marginal-data transport denotes a family of transport problems in which the available observations are marginals, temporal snapshots, aggregate counts, partial endpoint laws, or decoupled coordinatewise samples rather than fully observed trajectories or a completely specified joint coupling. In the cited literature, this includes recovering a minimum-energy velocity field from a time-continuous family of marginals, solving multi-marginal optimal transport with prescribed marginals on selected coordinates or times, estimating latent flows from aggregate observations, projecting scarce coupled data onto richer marginal information, and optimizing network flows under temporal departure–arrival constraints (Nakano, 27 Apr 2026, Yang et al., 2022, Kim et al., 29 Mar 2026, Pathan et al., 2024, Dong et al., 16 Feb 2026). A distinct communications usage applies the phrase to opportunistic movement of data bundles by vehicles under sparse infrastructure, where the transported object is digital data rather than probability mass (Mohammed et al., 2019).
1. Core formulations and observed information
The most direct transport-theoretic usage arises when one is given marginals but not trajectories. In continuum-marginal optimal transport, the input is a time-indexed family of Borel probability measures $(\mu_t)_t$ with densities $(\rho_t)_t$, and the task is to recover the minimum-energy deterministic velocity field whose flow reproduces every marginal (Nakano, 27 Apr 2026). In discrete-time multi-marginal formulations, one instead prescribes some subset of marginals of a high-order coupling tensor and optimizes over the remaining degrees of freedom, often under entropy regularization and graph-structured costs (Haasler et al., 2020).
A second class of problems uses marginal data as incomplete evidence about a latent joint object. Inverse multi-marginal OT for population flows observes aggregated count vectors over time and infers latent transition flows and time-varying cost functions (Yang et al., 2022). Projection-based coupling reconstruction starts from a small coupled sample and larger decoupled marginal samples, then estimates the joint distribution by Wasserstein projection onto the set of couplings with those marginals (Kim et al., 29 Mar 2026). Related Schrödinger bridge formulations on graphs handle endpoint marginals that are only partially known on subsets of nodes or only through moments, and reconstruct the unknown parts of the marginals jointly with the most likely path law (Pathan et al., 2024).
A third class arises on networks and flux spaces. Dynamic multi-commodity minimum-cost flow can be reformulated as a multi-marginal OT problem in which time-slice edge occupancies and commodity-conditioned endpoint data are marginals or bi-marginals of a high-order tensor (Haasler et al., 2021). Temporally flexible transport scheduling treats departure rates and arrival rates themselves as temporal marginals, with nodal capacity limits and either independent or coupled departure–arrival constraints (Dong et al., 16 Feb 2026). In multi-material branched transport, the prescribed source and target data are arbitrary compactly supported vector-valued Radon measures, and the unknown is a matrix-valued flux satisfying the constraint that links it to those source and target measures (Marchese et al., 2018).
| Setting | Observed information | Recovered object |
|---|---|---|
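| Continuum-marginal OT | Time-indexed family of marginals with densities | Minimum-energy velocity field |
| Inverse MOT for aggregates | Aggregated count vectors over time | Latent transition flows, time-varying costs |
| OT projection with mixed data | Small coupled sample and larger decoupled marginal samples | Joint distribution |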
| Incomplete-marginal Schrödinger bridge | Partial endpoint values or moments | Full endpoint marginals and path law |
| Network DA scheduling | Departure and arrival rates over time | Optimal path-time schedule |
2. Continuum-time marginal prescriptions and dynamic recovery
The continuum-marginal formulation is the most explicit statement of transport from marginal data observed over time. The unknown drift $v(t,x)$ drives the ODE
$$\dot X_t = v(t, X_t),$$
and must satisfy the continuity equation
$$\partial_t \rho_t + \nabla \cdot (\rho_t\, v_t) = 0.$$
The optimization problem is
$$\inf_{v \in \mathcal{V}} \int\!\!\int |v(t,x)|^2\, \rho_t(x)\, dx\, dt,$$
where $\mathcal{V}$ denotes admissible velocity fields satisfying the weak continuity equation (Nakano, 27 Apr 2026). This differs from classical Benamou–Brenier OT because the entire density path $(\rho_t)_t$ is prescribed in advance rather than only the initial and terminal densities. The same paper identifies the problem as the continuum limit of two-marginal Benamou–Brenier OT and the deterministic limit of the Nelson problem, with
5
At the population level, the recovered object is unique under the stated assumptions: the problem admits a unique minimizer $v^*$, determined almost everywhere with respect to the prescribed marginals; every optimal path solves
$$\dot X_t = v^*(t, X_t),$$
and the minimizer has gradient structure
$$v^*(t,x) = \nabla_x \phi(t,x)$$
for some potential $\phi$ (Nakano, 27 Apr 2026). This gives an identifiability statement that is stronger than endpoint-only OT: a full continuum of marginals fixes the minimum-energy deterministic dynamics.
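As a concrete check of the gradient structure, consider a one-dimensional path of Gaussian marginals. This is an illustrative setup and is not taken from the cited paper. In one dimension, the continuity equation together with vanishing flux at infinity pins down the velocity, and for Gaussian marginals with mean $m(t)$ and standard deviation $s(t)$ it is the affine field $v(t,x) = m'(t) + \frac{s'(t)}{s(t)}\,(x - m(t))$, which is the $x$-derivative of a quadratic potential. The sketch below verifies the continuity equation by finite differences; the functions `mean` and `std`, the constants, and the grid are all hypothetical choices.

```python
import numpy as np

# Hypothetical marginal path rho_t = N(mean(t), std(t)^2); all choices are illustrative.
def mean(t):
    return 2.0 * t

def std(t):
    return 1.0 + 0.5 * t

MEAN_DOT = 2.0  # time derivative of mean(t)
STD_DOT = 0.5   # time derivative of std(t)

def density(t, x):
    """Gaussian density of the prescribed marginal at time t."""
    z = (x - mean(t)) / std(t)
    return np.exp(-0.5 * z**2) / (std(t) * np.sqrt(2.0 * np.pi))

def velocity(t, x):
    """Affine velocity field; it is the x-derivative of a quadratic potential."""
    return MEAN_DOT + STD_DOT / std(t) * (x - mean(t))

def continuity_residual(t, x, dt=1e-5):
    """Finite-difference estimate of d_t rho + d_x (rho * v) on the grid x."""
    rho_t = (density(t + dt, x) - density(t - dt, x)) / (2.0 * dt)
    flux_x = np.gradient(density(t, x) * velocity(t, x), x)
    return rho_t + flux_x

x = np.linspace(-8.0, 12.0, 4001)
max_residual = max(
    np.max(np.abs(continuity_residual(t, x))) for t in (0.2, 0.5, 0.8)
)
print(f"max |d_t rho + d_x(rho v)| on the grid: {max_residual:.2e}")
```

The printed residual should sit at the level of the finite-difference discretization error, which is what one expects when the field transports the marginals exactly.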
The main computational contribution in this line is an RKHS embedding of the weak continuity equation. Defining the residual
2
the paper constructs an RKHS-valued representer 3 and proves the exact equivalence
4
This yields the penalized objective
5
which is mesh-free and sample-only: it uses only samples, pointwise function values at those samples, and kernel evaluations, with no Eulerian spatial discretization (Nakano, 27 Apr 2026). The paper proves variational exactness in the limiting regime of the penalty parameter and, under an additional structural assumption, convergence of the penalized minimizers.
A related but distinct use of marginal preservation appears in $c$-rectified flow, where $c$ denotes the transport cost. There, one begins with a coupling of two endpoint distributions, forms an interpolation process between them, and modifies its expected velocity by removing only the marginal-preserving component. The resulting ODE
5
preserves the full family of time marginals,
6
while monotonically decreasing the chosen convex transport cost and reaching fixed points that are exactly optimal couplings for that cost (Liu, 2022). This preserves marginals of a current interpolation rather than inferring them from external data, but it makes the continuity-equation viewpoint operational in flow-based optimization.
3. Multi-marginal, graphical, and regression-based transport
Discrete multi-marginal OT generalizes the pairwise coupling problem to a tensor 8 with only a subset of marginals prescribed: 9 Under entropy regularization,
0
the optimizer has multiplicative form 1, where 2 and 3 (Haasler et al., 2020). When the cost decomposes over a factor graph,
4
the Gibbs kernel factorizes as
5
and the problem becomes equivalent to constrained Bayesian marginal inference in a probabilistic graphical model. On trees, the paper derives constrained belief-propagation equations, an Iterative Scaling Belief Propagation algorithm, and a Constrained Norm-Product algorithm, with exactness and global convergence in the tree-structured setting (Haasler et al., 2020).
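The multiplicative form can be illustrated on the smallest nontrivial tree: a three-node chain whose cost is a sum of two edge costs, with marginals prescribed only at the two end nodes. The sketch below is plain iterative scaling written for this note, not a reproduction of the paper's algorithms; `K12`, `K23`, `eps`, and the random marginals are hypothetical. Because the middle node is unconstrained, its scaling vector stays at one, and the updates reduce to matrix-vector products along the chain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
eps = 0.5  # entropic regularization strength (illustrative value)

# Assumed setup: the same squared-distance cost on both edges of the chain 1 - 2 - 3.
points = np.linspace(0.0, 1.0, n)
edge_cost = (points[:, None] - points[None, :]) ** 2
K12 = np.exp(-edge_cost / eps)  # Gibbs kernel on edge (1, 2)
K23 = np.exp(-edge_cost / eps)  # Gibbs kernel on edge (2, 3)

# Prescribed marginals at nodes 1 and 3 only; node 2 is left free.
mu1 = rng.random(n)
mu1 /= mu1.sum()
mu3 = rng.random(n)
mu3 /= mu3.sum()

u1 = np.ones(n)
u3 = np.ones(n)
for _ in range(500):
    # Each update rescales one constrained node so that its marginal matches.
    u1 = mu1 / (K12 @ (K23 @ u3))
    u3 = mu3 / (K23.T @ (K12.T @ u1))

# Full tensor B[i, j, k] = u1[i] * K12[i, j] * K23[j, k] * u3[k].
B = np.einsum("i,ij,jk,k->ijk", u1, K12, K23, u3)
marg1 = B.sum(axis=(1, 2))
marg2 = B.sum(axis=(0, 2))
marg3 = B.sum(axis=(0, 1))

print("max error at node 1:", np.max(np.abs(marg1 - mu1)))
print("max error at node 3:", np.max(np.abs(marg3 - mu3)))
print("inferred marginal at node 2:", np.round(marg2, 4))
```

The inferred middle marginal can also be read off without forming the tensor, as the elementwise product of the two incoming messages `(K12.T @ u1) * (K23 @ u3)`, which is the message-passing view of the same computation.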
Distributional regression supplies another multi-marginal interpretation. Given time-stamped observed distributions 6, the paper seeks a curve 7 in Wasserstein space minimizing
8
where 9 is a family of lifted Euclidean curve templates such as linear or quadratic latent trajectories (Karimi et al., 2021). For linear paths,
0
and for quadratic paths,
1
The corresponding optimization is exactly equivalent to a multi-marginal OT problem over latent trajectory parameters and observed snapshot variables. In discrete form, entropy regularization yields generalized Sinkhorn iterations whose per-iteration complexity is reduced in the linear model by exploiting separability of the cost (Karimi et al., 2021). The method therefore infers a coupling across multiple observed marginals without trajectory identities.
Sample-defined marginals motivate a different reduction. A data-driven linear-programming method approximates
4
approximates the unknown plan as
5
and decomposes transport into local componentwise OT costs 6 plus a global transport LP in the weights 7 (Chen et al., 2017). Under a product-form assumption on component densities and quadratic cost, local costs reduce to sums of one-dimensional OT costs, and adaptive mesh refinement restricts the support of the LP. This makes marginal-data transport possible directly from samples 8 and 9, without analytic marginals.
For discrete two-marginal OT, Genetic Column Generation addresses the complementary problem of sparse exact recovery. Over finite discrete supports, the paper proves a bound on the support size of any extreme point of the Kantorovich polytope and shows that GenCol converges almost surely to an exact optimizer for arbitrary costs and marginals whenever the support budget satisfies a stated lower bound (Friesecke et al., 2023). In this sense, marginal-data transport can be solved exactly while optimizing only over a dynamically updated sparse support.
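The sparsity of extreme points can be seen in a small example. For marginals on $m$ and $n$ points, it is a classical fact about transportation polytopes that a vertex has at most $m + n - 1$ nonzero entries; this is stated here as background and is not quoted from the cited paper. The sketch below uses the north-west corner rule, a textbook construction of one such vertex. It is not GenCol and is not optimal for any particular cost; the function name and the marginals are hypothetical.

```python
def northwest_corner(mu, nu, tol=1e-12):
    """Build one feasible, sparse transport plan by the north-west corner rule."""
    remaining_mu = list(mu)
    remaining_nu = list(nu)
    plan = [[0.0] * len(nu) for _ in mu]
    i = j = 0
    while i < len(mu) and j < len(nu):
        # Ship as much mass as both the current row and column still allow.
        mass = min(remaining_mu[i], remaining_nu[j])
        plan[i][j] = mass
        remaining_mu[i] -= mass
        remaining_nu[j] -= mass
        if remaining_mu[i] <= tol:
            i += 1  # row exhausted, move down
        else:
            j += 1  # column exhausted, move right
    return plan

mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]
plan = northwest_corner(mu, nu)

# Count entries that carry non-negligible mass.
support = sum(1 for row in plan for value in row if value > 1e-12)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(mu))) for j in range(len(nu))]

print("support size:", support, "bound:", len(mu) + len(nu) - 1)
```

The walk visits at most $m + n - 1$ cells because every step moves either one row down or one column right, which is where the support bound comes from in this construction.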
The same multi-marginal viewpoint extends beyond Euclidean geometry. On the Heisenberg group 4, multi-marginal transport with barycentric cost
5
admits, under technical conditions, a unique optimal Kantorovich plan induced by a Monge map over the first variable and factorization through a Wasserstein barycenter (Pass et al., 2020). This shows that marginal-data transport retains a barycentric coupling structure in a sub-Riemannian setting, though the proofs require assumptions absent in Euclidean space.
4. Aggregate, partial, and mixed-data inference
When observations are aggregate counts rather than trajectories, the latent transport object can be learned by inverse multi-marginal OT. In the population-flow setting, the observed data are count vectors
6
over discrete states 7, while the latent object is the sequence of transition flows
8
The paper formulates a convex latent-flow estimation problem with KL terms
9
and shows that, after setting
0
the problem is equivalent to an entropy-regularized MOT problem whose pairwise projections yield the transition flows
1
The proposed algorithms, SBP-ISTC and SBP-ISTA, outperform STAY, CGM, CNP, and SBP-EM on four mobility datasets, with NMAE reported for datasets including Beijing Taxi and San Francisco Cabs (Yang et al., 2022). The transport law is therefore learned from marginals alone, rather than from tracked individuals.
A different reconstruction problem arises when one has both coupled data and separate marginal data. Let
6
be the empirical joint law from coupled observations and
7
the empirical marginals from decoupled samples. The estimator is the Wasserstein projection
8
The paper proves stability bounds
9
derives sample complexity
0
and gives an explicit shadow representation
1
together with an entropic shadow approximation in almost linear time and in parallel (Kim et al., 29 Mar 2026). Here marginal-data transport means extending scarce dependence information to richer coordinatewise marginals.
Partial endpoint knowledge leads to entropy-regularized transport over networks with incomplete marginal information. On a finite directed graph, a path-space law is selected by minimizing relative entropy to a prior, but instead of fully prescribing the initial and terminal marginals, one prescribes either their values on subsets of nodes or only certain moments (Pathan et al., 2024). In the subset-observation case, the endpoint coupling solves a KL projection problem subject to
8
9
and normalization. The optimizer has multiplicative form
00
with a generalized Schrödinger system and explicit reconstruction of the unknown parts of the endpoint marginals (Pathan et al., 2024). In moment-constrained cases, the optimal endpoint coupling becomes an exponential-family tilt of 01.
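A short sketch of why linear constraints produce a multiplicative optimizer may help here. The notation is assumed for illustration and is not the paper's: $q$ is the prior endpoint coupling, $f_k$ are constraint functions with targets $c_k$, and $f_0 \equiv 1$, $c_0 = 1$ encodes normalization.

```latex
% KL projection under linear constraints (assumed notation)
\min_{\pi \ge 0} \; \sum_{i,j} \pi_{ij} \log \frac{\pi_{ij}}{q_{ij}}
\quad \text{subject to} \quad
\sum_{i,j} \pi_{ij}\, f_k(i,j) = c_k, \qquad k = 0, 1, \dots, K.

% Stationarity of the Lagrangian in each entry \pi_{ij}
\log \frac{\pi_{ij}}{q_{ij}} + 1 - \sum_{k=0}^{K} \lambda_k f_k(i,j) = 0
\quad \Longrightarrow \quad
\pi_{ij} = q_{ij} \exp\!\Big( \sum_{k=0}^{K} \lambda_k f_k(i,j) - 1 \Big).

% Subset observations: f_k(i,j) = \mathbf{1}\{i = i_k\} or \mathbf{1}\{j = j_k\}
% for observed nodes only, so the exponential splits into row and column factors
\pi_{ij} = a_i \, q_{ij} \, b_j,
\qquad a_i \text{ constant over unobserved initial nodes}, \quad
b_j \text{ constant over unobserved terminal nodes}.
```

With general moment functions $f_k$ the same stationarity condition gives an exponential-family tilt of the prior, and the multipliers $\lambda_k$ are fixed by matching the prescribed values.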
Monotonicity constraints can also recover sharp information from marginals alone. In directional OT, admissible couplings are
02
Feasibility holds iff 03, and there exists a unique coupling 04 determined entirely from the marginals (Nutz et al., 2020). It is characterized by the minimal cdf property
05
and is optimal for all integrable submodular rewards. In particular, it yields the sharp upper bound for 06 under the monotone treatment effect restriction 07 (Nutz et al., 2020). This is a canonical example of marginal-only transport under a structural support constraint.
5. Network, flux, and temporally constrained transport
Dynamic flow over networks can be cast as multi-marginal OT by treating edge occupancies at each time slice as marginals of a tensor. For a time-expanded network over a finite time horizon, a tensor
10
represents amounts of flow assigned to edge sequences. The single-time projection
11
is the flow over edges at a single time slice, and selected bi-marginals encode commodity-conditioned source and sink data in the multi-commodity setting (Haasler et al., 2021). With graph-structured cost
15
entropy regularization yields generalized Sinkhorn iterations. Exploiting graph sparsity lowers the cost of one sweep in the sparse multi-commodity case, and the paper reports good approximations at least one order of magnitude faster than an LP solver, with roughly two orders of magnitude speed advantage in a sparse grid example (Haasler et al., 2021).
Temporally flexible transport scheduling moves the marginal viewpoint from spatial endpoint distributions to temporal departure and arrival laws. On a line graph, independent DA constraints prescribe
17
while intermediate crossing-time marginals are constrained by
18
On line graphs, feasibility is characterized by a shifted stochastic-order condition,
19
and the independent DA problem admits a unique minimizer under the generalized Monge condition (Dong et al., 16 Feb 2026). Coupled DA constraints instead prescribe a joint departure–arrival law; the resulting problem is an unequal-dimensional OT problem, and under non-degeneracy and a twist condition the optimizer is unique and pure. For general graphs, the paper reduces the problem to a prescribed set of source-sink paths, introduces pathwise couplings, and solves the entropically regularized problem by a graph-structured Sinkhorn method with a linear convergence rate in terms of marginal violation (Dong et al., 16 Feb 2026).
Flux formulations generalize still further. In multi-material transport with arbitrary marginals, the data are compactly supported vector-valued Radon measures
25
and admissible transport is a matrix-valued flux
26
satisfying
27
For discrete graphs the cost is
28
and for arbitrary marginals it is defined by flat relaxation (Marchese et al., 2018). The paper proves existence of minimizers for arbitrary compatible data, finite-cost existence under an admissibility condition on 29, stability under weak-* perturbations of the marginals, and an integral representation
30
for rectifiable transportation networks (Marchese et al., 2018). This is marginal-data transport in Eulerian form, where the marginals are vector-valued source and target measures rather than distributions over trajectories.
A distinct but network-centered usage appears in smart-community communications. There, data produced in a block must be moved to a Smart Community Management Center without a communication backbone between local brokers and the destination, and passing vehicles serve as one-shot data ferries (Mohammed et al., 2019). The online objective is to minimize average overall delay, defined as delivery delay plus waiting delay, by adaptively selecting among threshold, mean, and median hiring rules. Although this usage is not OT, it also treats transport as recovery of a viable dynamics from incomplete infrastructure and limited information.
6. Generative-model reinterpretations and broader significance
In generative modeling, the phrase acquires a task-specific meaning. In few-step 3D flow distillation, Marginal-Data Transport is the target of learning the transport from an intermediate marginal to the data distribution, rather than learning a direct one-shot map from pure noise to data (Zhou et al., 4 Sep 2025). With
33
the primary objective is
34
Because the path integral is intractable to implement, the paper derives two surrogate objectives: Velocity Matching,
35
and Velocity Distillation, a density-level objective whose gradient is equivalent to score distillation up to a scalar factor (Zhou et al., 4 Sep 2025). Applied to TRELLIS, the method reduces the number of sampling steps of each flow transformer to a few, and the paper reports the resulting latencies, speedups, and one-step quality metrics (Zhou et al., 4 Sep 2025).
This machine-learning usage is consistent with a broader theme already present in marginal-preserving rectified flows: transport can be optimized while preserving a family of marginals, and the learned object can be a deterministic time-dependent flow map rather than a single static coupling (Liu, 2022). A plausible synthesis is that marginal-data transport has become a unifying label for problems in which the observable constraints are marginals, while the latent object of interest is richer: a velocity field, a path law, a sparse coupling, a schedule, a flux, or a distilled generator.
Across these literatures, the common structural question is not whether transport exists between two fixed endpoints, but how much dynamical or joint information can be reconstructed when only marginal information is available. The answers differ by regime. In continuum formulations, the object is a minimum-energy drift consistent with all observed marginals (Nakano, 27 Apr 2026). In graphical and multi-marginal formulations, it is a high-order coupling constrained by selected marginals and often reducible by message passing (Haasler et al., 2020). In inverse and mixed-data settings, it is a latent joint law calibrated by aggregates, moments, or decoupled samples (Yang et al., 2022, Kim et al., 29 Mar 2026, Pathan et al., 2024). In constrained one-dimensional or networked settings, it is an extremal or entropy-regularized transport consistent with order, capacity, or scheduling structure (Nutz et al., 2020, Haasler et al., 2021, Dong et al., 16 Feb 2026). The subject is therefore best understood as a transport framework whose primary inputs are marginal observations and whose primary output is a compatible dynamical or joint structure.