Linearized Optimal Transport (LOT)

Updated 24 May 2026

Linearized Optimal Transport is a framework that embeds probability measures into Euclidean space using optimal transport theory.
It reduces nonlinear Wasserstein geometry to a linear structure, enabling efficient variance decomposition and scalable linear learning.
Empirical studies demonstrate high explained variance and competitive accuracy across vision, text, and biomedical applications.

Linearized Optimal Transport (LOT) is a mathematical and algorithmic framework that provides an explicit linear embedding of probability measures into a high-dimensional Euclidean (Hilbert) space via optimal transport theory. By fixing a reference measure, LOT maps each measure to the displacement field of its optimal transport map relative to the reference. This structure yields computational benefits for statistical analysis and machine learning tasks, notably by reducing the nonlinear geometry of Wasserstein space to a linear Euclidean one, enabling efficient variance decomposition and scalable linear learning techniques. Recent research has further extended LOT to settings involving structured data, such as Fused Gromov-Wasserstein distances, and established its effectiveness in high-dimensional and graph-based applications (Wilson et al., 2024).

1. Core Definition and Construction of the LOT Embedding

Given a reference measure $\mu_0 = \sum_{i=1}^n a_i \delta_{x_i}$ on a space $\Omega \subset \mathbb{R}^d$ (with $a_i > 0$, $\sum_i a_i=1$), and a target measure $\mu = \sum_{j=1}^m b_j \delta_{y_j}$, LOT constructs the embedding via the barycentric projection of the optimal transport plan $\gamma^* \in \Pi(a,b)$ solving

$W_2^2(\mu_0, \mu) = \min_{\gamma \in \Pi(a,b)} \sum_{i,j} \gamma_{ij} \|x_i - y_j\|^2.$

For each template point $x_i$, the barycentric projection determines

$T_{\mu \to \mu_0}(x_i) = \frac{1}{a_i} \sum_{j=1}^m \gamma_{ij}^* y_j.$

The LOT embedding is then

$\Phi(\mu) = [\,\sqrt{a_1}(T_{\mu \to \mu_0}(x_1) - x_1)\,;\; \dots; \sqrt{a_n}(T_{\mu \to \mu_0}(x_n)-x_n)\,] \in \mathbb{R}^{nd}.$
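The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the transport plan is already available; the function name `lot_embedding` and the toy data are hypothetical, and the plan used here is simply the obvious optimal one for this tiny example rather than the output of a solver.

```python
import numpy as np

def lot_embedding(x, a, y, gamma):
    """Embed a target measure given a transport plan from the reference.

    x: (n, d) reference support, a: (n,) reference weights,
    y: (m, d) target support, gamma: (n, m) plan whose row sums equal a.
    """
    # Barycentric projection: T(x_i) = (1 / a_i) * sum_j gamma_ij * y_j
    T = (gamma @ y) / a[:, None]
    # Weighted displacement, flattened to a vector in R^{n d}
    return (np.sqrt(a)[:, None] * (T - x)).ravel()

# Toy data: two reference points, each sent straight up to one target point.
x = np.array([[0.0, 0.0], [1.0, 0.0]])
a = np.array([0.5, 0.5])
y = np.array([[0.0, 1.0], [1.0, 1.0]])
gamma = np.diag(a)  # assumed plan: x_i -> y_i carries all of the mass a_i

phi = lot_embedding(x, a, y, gamma)
```

Because this plan is induced by a map, the squared norm of `phi` equals the transport cost $W_2^2(\mu_0, \mu) = 1$ in this example.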

This embedding linearizes the geometry around $\mu_0$: Wasserstein geodesics emanating from $\mu_0$ correspond to straight lines in LOT space. For two measures $\mu$ and $\nu$, the distance in LOT space is the $\ell_2$ norm of the difference of their embeddings,

$d_{\mathrm{LOT}}(\mu, \nu) = \|\Phi(\mu) - \Phi(\nu)\|_2,$

which linearly approximates their 2-Wasserstein distance near $\mu_0$ (Wilson et al., 2024).
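A small sanity check illustrates the LOT distance in a case where it is exact. The sketch assumes two measures that are translated copies of the reference: the optimal map from the reference is then the translation itself, and the Wasserstein distance between the two translates equals the norm of the difference of the shifts. The helper `embed` and the shift vectors are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 2
x = rng.normal(size=(n, d))      # reference support
a = np.full(n, 1.0 / n)          # uniform reference weights

def embed(T, x, a):
    """LOT embedding from the projected positions T(x_i)."""
    return (np.sqrt(a)[:, None] * (T - x)).ravel()

s = np.array([1.0, 0.0])         # shift defining mu = mu_0 translated by s
t = np.array([0.0, 2.0])         # shift defining nu = mu_0 translated by t

# For a translate of the reference, T(x_i) = x_i + shift.
phi_mu = embed(x + s, x, a)
phi_nu = embed(x + t, x, a)

lot_dist = np.linalg.norm(phi_mu - phi_nu)
w2_exact = np.linalg.norm(s - t)  # W_2 between two translates of one measure
```

For general measures the two quantities differ, and the gap typically grows as the measures move away from the reference.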

2. Fréchet Variance Decomposition in Wasserstein Space

For a collection $\{\mu_s\}_{s=1}^N$ with Wasserstein barycenter $\mu_0$, the Fréchet variance is

$\mathrm{Var}_F = \frac{1}{N} \sum_{s=1}^N W_2^2(\mu_0, \mu_s).$

LOT enables a variance decomposition: $\mathrm{Var}_F = \frac{1}{N} \sum_{s=1}^N \|\Phi(\mu_s) - \bar{\Phi}\|^2 + \text{residual}$, where $\bar{\Phi} = \frac{1}{N} \sum_{s=1}^N \Phi(\mu_s)$ is the mean embedding and the residual is nonnegative. The explained term is the trace of the covariance matrix $\Sigma$ of the LOT embeddings. Diagonalizing $\Sigma$ yields eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{nd} \ge 0$, and the fraction of Fréchet variance explained by the first $k$ LOT coordinates is

$\frac{\sum_{\ell=1}^{k} \lambda_\ell}{\mathrm{Var}_F}.$

This quantifies the representational efficiency of the LOT embedding and underpins the use of principal component analysis and related methods in LOT space (Wilson et al., 2024).
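The explained-variance fraction can be computed as follows. This is a sketch under stated assumptions: the Fréchet variance is taken as given (computing it requires the OT costs to the barycenter), the covariance uses a $1/N$ normalization, and the function name, toy embeddings, and the value `frechet_var = 3.0` are hypothetical.

```python
import numpy as np

def explained_fraction(Phi, frechet_var, k):
    """Share of Fréchet variance carried by the top-k LOT principal directions.

    Phi: (N, D) matrix whose rows are LOT embeddings.
    frechet_var: Fréchet variance of the measures (assumed computed elsewhere).
    """
    centered = Phi - Phi.mean(axis=0)
    cov = centered.T @ centered / Phi.shape[0]   # population covariance (1/N)
    eigvals = np.linalg.eigvalsh(cov)[::-1]       # sorted in descending order
    return eigvals[:k].sum() / frechet_var

# Toy embeddings: all variation lies along the first coordinate.
Phi = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]])
frechet_var = 3.0   # hypothetical value, larger than trace(cov) = 2.5
frac = explained_fraction(Phi, frechet_var, k=1)
```

Here the single nonzero eigenvalue is 2.5, so the first coordinate accounts for 2.5 / 3 of the assumed Fréchet variance.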

3. Variance Decomposition in Fused Gromov–Wasserstein (FGW) Space

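The LOT variance decomposition extends to Fused Gromov-Wasserstein (FGW) settings used for structured objects (e.g., graphs). For a reference object $\mu_0$ with node features $x_i$, weights $a_i$, and structure matrix $C^0 \in \mathbb{R}^{n \times n}$, and data $\mu$ with node features $y_j$, weights $b_j$, and structure matrix $C \in \mathbb{R}^{m \times m}$, the squared FGW distance with trade-off parameter $\alpha \in [0,1]$ is

$\mathrm{FGW}_\alpha^2(\mu_0, \mu) = \min_{\gamma \in \Pi(a,b)} (1-\alpha) \sum_{i,j} \gamma_{ij} \|x_i - y_j\|^2 + \alpha \sum_{i,j,k,l} \gamma_{ij} \gamma_{kl} |C^0_{ik} - C_{jl}|^2.$

With barycentric projections in both node and edge domains, the total variance of data $\mu_1, \dots, \mu_N$ splits into a deterministic (explained) part, corresponding to the squared Euclidean norm in the joint LOT embedding $\Phi_\alpha$, and a residual (probabilistic) part from the FGW coupling: $\mathrm{Var}_{\mathrm{FGW}} = \frac{1}{N} \sum_{s=1}^N \|\Phi_\alpha(\mu_s)\|^2 + R$, where $R$ is the residual GW cost (Wilson et al., 2024).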
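The split into explained and residual parts can be checked numerically for a single coupling. The sketch below assumes the common FGW convention in which the node cost is weighted by $1-\alpha$ and the structure cost by $\alpha$, and it uses an arbitrary (not optimized) coupling, since the identity holds for any coupling with positive row sums. The function name `fgw_split` and the random toy data are hypothetical.

```python
import numpy as np

def fgw_split(x, C0, y, C, gamma, alpha):
    """Split the FGW cost of a given coupling into explained and residual parts."""
    a = gamma.sum(axis=1)                            # reference weights
    T = gamma @ y / a[:, None]                       # node barycentric projection
    C_proj = gamma @ C @ gamma.T / np.outer(a, a)    # edge barycentric projection

    def node_cost(p):   # sum_ij gamma_ij * ||p_i - y_j||^2
        return np.sum(gamma * ((p[:, None, :] - y[None, :, :]) ** 2).sum(-1))

    def edge_cost(M):   # sum_ijkl gamma_ij * gamma_kl * (M_ik - C_jl)^2
        diff = (M[:, :, None, None] - C[None, None, :, :]) ** 2
        return np.einsum("ij,kl,ikjl->", gamma, gamma, diff)

    total = (1 - alpha) * node_cost(x) + alpha * edge_cost(C0)
    explained = ((1 - alpha) * np.sum(a * ((T - x) ** 2).sum(-1))
                 + alpha * np.sum(np.outer(a, a) * (C_proj - C0) ** 2))
    residual = (1 - alpha) * node_cost(T) + alpha * edge_cost(C_proj)
    return total, explained, residual

rng = np.random.default_rng(0)
n, m, d = 3, 4, 2
x, y = rng.normal(size=(n, d)), rng.normal(size=(m, d))
C0, C = rng.random((n, n)), rng.random((m, m))
gamma = rng.random((n, m))
gamma /= gamma.sum()                                  # arbitrary valid coupling

total, explained, residual = fgw_split(x, C0, y, C, gamma, alpha=0.5)
```

The explained part is the squared norm of the weighted node and edge displacements; the residual measures how far the coupling is from being induced by a map.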

4. Algorithms and Computational Aspects

The practical pipeline comprises several stages:

A. Reference Barycenter Computation

Initialize support points and weights for $\mu_0$.
For each measure, compute the optimal transport plan to $\mu_0$ (using entropic regularization, e.g. Sinkhorn, for speed).
Update support locations using barycentric updates; repeat until convergence.
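The barycenter loop above can be sketched as a fixed-point iteration. This is a rough illustration, not a tuned solver: it assumes equal weight for every measure, fixed barycenter weights, plain (not log-domain) Sinkhorn with a hand-picked regularization `reg`, and a fixed number of iterations instead of a convergence test. All function names and the toy clouds are hypothetical.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, reg, n_iter=1000):
    """Entropic OT plan between weights a and b via plain Sinkhorn updates."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # row sums equal a after the last update

def free_support_barycenter(clouds, weights, x_init, a, reg=0.5, n_outer=20):
    """Move each support point to the average of its barycentric projections."""
    x = x_init.copy()
    for _ in range(n_outer):
        x_new = np.zeros_like(x)
        for y, b in zip(clouds, weights):
            cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
            gamma = sinkhorn_plan(a, b, cost, reg)
            x_new += gamma @ y / a[:, None]          # barycentric projection
        x = x_new / len(clouds)                      # equal weight per measure
    return x

# Toy data: three translated copies of one small point cloud.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 2))
shifts = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
clouds = [base + s for s in shifts]
weights = [np.full(6, 1.0 / 6) for _ in clouds]
a = np.full(6, 1.0 / 6)

x_bar = free_support_barycenter(clouds, weights, clouds[0], a)
```

Entropic regularization blurs the plans, so the resulting support is somewhat contracted toward its mean; smaller `reg` reduces this effect at the cost of slower and less stable Sinkhorn iterations.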

B. LOT Embedding Generation

For each measure, solve OT to $\mu_0$, compute barycentric projections, and form embedding vectors.

C. Covariance Analysis

Stack LOT embeddings into a matrix, center, and form the covariance.
Diagonalize to obtain principal directions and, via the cumulative eigenvalue fraction, quantify variance explained.

Complexity is dominated by OT computation (per embedding: $O(nm)$ per iteration for Sinkhorn; roughly $O((n+m)^3 \log(n+m))$ for exact LP), barycenter steps (one OT solve per measure, so $N$ solves per iteration), and eigendecomposition ($O((nd)^3)$ for the full $nd \times nd$ covariance, though only the top components need be computed in practice).
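When the number of measures $N$ is much smaller than the embedding dimension $nd$, the leading covariance eigenvalues can be read off the singular values of the centered data matrix without forming the covariance at all. The sketch below assumes a $1/N$ covariance normalization; the function name and the random stand-in embeddings are hypothetical.

```python
import numpy as np

def top_k_variances(Phi, k):
    """Top-k covariance eigenvalues from the centered data matrix, via SVD."""
    centered = Phi - Phi.mean(axis=0)
    # Singular values s of the (N, D) matrix give eigenvalues s**2 / N of the
    # (D, D) covariance, so the covariance itself is never formed.
    s = np.linalg.svd(centered, compute_uv=False)   # returned in descending order
    return (s[:k] ** 2) / Phi.shape[0]

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 200))   # N = 20 stand-in embeddings, dimension 200
lam = top_k_variances(Phi, k=3)

# Cross-check against the dense route (feasible only because D is small here).
centered = Phi - Phi.mean(axis=0)
dense = np.linalg.eigvalsh(centered.T @ centered / Phi.shape[0])[::-1][:3]
```

For very large problems a truncated or randomized SVD would be the natural next step; that choice is outside what the source specifies.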

5. Empirical Results and Observed Effectiveness

Empirical analyses demonstrate strong variance-explaining and classification properties for LOT embeddings across diverse datasets (Wilson et al., 2024):

MNIST (vision):

With a small number of support points $n$, a large share of the population Fréchet variance is already captured, and the share grows as $n$ increases.
Classification accuracy (LightGBM on the LOT embedding) likewise rises with $n$.

IMDB-50000 (text-graph):

Word2Vec clouds embedded; edge weights encode word-order structure.
A large share of variance is explained in the geometry-only LOT setting.
Classification accuracy is competitive in this regime.

Diffusion Tensor MRI (biomedical):

Empirical measures are built from diffusion tensor data.
The LOT embedding explains a moderate share of variance, and an SVM on it classifies gender more accurately than mean-FA baselines.

These results indicate that low-dimensional LOT representations simultaneously achieve high explained variance and competitive accuracy, supporting the practical use of compact LOT-based representations.

6. Practical Guidelines and Limitations

Key recommendations include:

Choose $n$ (template support size, which sets the embedding dimension $nd$) via the “elbow” method on the explained-variance curve; small $n$ often suffices for high coverage in vision and moderate coverage in biomedical tasks.
The FGW trade-off parameter $\alpha$ should be tuned based on whether node geometry or edge structure is more relevant; for many applications, the geometry-only setting (classical LOT) is effective.
Balance embedding dimension against computational cost: lower $n$ yields simpler eigendecomposition and faster downstream learning.
LOT provides only a local linearization; global curvature is neglected—nonlinear OT-PCA or similar tools may be needed for heavily nonlinear datasets.
The barycenter computation remains a major bottleneck for very large-scale datasets, though free-support Sinkhorn algorithms and stochastic Frank-Wolfe methods alleviate this.
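The “elbow” recommendation in the first guideline can be made concrete with a simple heuristic: pick the point of the explained-variance curve that lies farthest from the straight line joining its endpoints. This is one common way to locate an elbow, not a rule taken from the source, and the curve values below are made up purely for illustration.

```python
import numpy as np

def elbow_index(ns, fractions):
    """Index of the curve point farthest from the chord joining its endpoints."""
    ns = np.asarray(ns, dtype=float)
    fr = np.asarray(fractions, dtype=float)
    # Rescale both axes to [0, 1] so neither axis dominates the distance.
    u = (ns - ns[0]) / (ns[-1] - ns[0])
    v = (fr - fr[0]) / (fr[-1] - fr[0])
    # After rescaling, the chord is the line v = u; the distance to it is
    # proportional to |v - u|.
    return int(np.argmax(np.abs(v - u)))

ns = [5, 10, 20, 40, 80]                     # candidate support sizes
fractions = [0.50, 0.80, 0.90, 0.93, 0.95]   # made-up explained-variance curve
best_n = ns[elbow_index(ns, fractions)]
```

On this illustrative curve the heuristic selects $n = 20$, beyond which additional support points add little explained variance.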

Open directions include improved barycenter solvers, direct LOT extensions to unbalanced/partial/multimarginal OT, and adapting LOT for manifold or SPD-valued data (Wilson et al., 2024).

References:

"Fused Gromov-Wasserstein Variance Decomposition with Linear Optimal Transport" (Wilson et al., 2024).
"Linearized Wasserstein Barycenters: Synthesis, Analysis, Representational Capacity, and Applications" (Werenski et al., 2024).
