Linearized Optimal Transport (LOT)

Updated 24 May 2026
  • Linearized Optimal Transport is a framework that embeds probability measures into Euclidean space using optimal transport theory.
  • It reduces nonlinear Wasserstein geometry to a linear structure, enabling efficient variance decomposition and scalable linear learning.
  • Empirical studies demonstrate high explained variance and competitive accuracy across vision, text, and biomedical applications.

Linearized Optimal Transport (LOT) is a mathematical and algorithmic framework that provides an explicit linear embedding of probability measures into a high-dimensional Euclidean (Hilbert) space via optimal transport theory. By fixing a reference measure, LOT maps each measure to the displacement field of its optimal transport map relative to the reference. This structure yields computational benefits for statistical analysis and machine learning tasks, notably by reducing the nonlinear geometry of Wasserstein space to a linear Euclidean one, enabling efficient variance decomposition and scalable linear learning techniques. Recent research has further extended LOT to settings involving structured data, such as Fused Gromov-Wasserstein distances, and established its effectiveness in high-dimensional and graph-based applications (Wilson et al., 2024).

1. Core Definition and Construction of the LOT Embedding

Given a reference measure $\mu_0 = \sum_{i=1}^n a_i \delta_{x_i}$ on a space $\Omega \subset \mathbb{R}^d$ (with $a_i > 0$, $\sum_i a_i = 1$), and a target measure $\mu = \sum_{j=1}^m b_j \delta_{y_j}$, LOT constructs the embedding via the barycentric projection of the optimal transport plan $\gamma^* \in \Pi(a,b)$ solving

$$W_2^2(\mu_0, \mu) = \min_{\gamma \in \Pi(a,b)} \sum_{i,j} \gamma_{ij} \|x_i - y_j\|^2.$$

For each template point $x_i$, the barycentric projection determines

$$T_{\mu \to \mu_0}(x_i) = \frac{1}{a_i} \sum_{j=1}^m \gamma_{ij}^* y_j.$$

The LOT embedding is then

$$\Phi(\mu) = [\,\sqrt{a_1}(T_{\mu \to \mu_0}(x_1) - x_1)\,;\; \dots\,;\; \sqrt{a_n}(T_{\mu \to \mu_0}(x_n) - x_n)\,] \in \mathbb{R}^{nd}.$$

This embedding linearizes Wasserstein space around the reference: geodesics in Wasserstein space through $\mu_0$ correspond to straight lines in LOT space. For two measures $\mu$ and $\nu$, the distance in LOT space is the $\ell_2$ norm of the difference of their embeddings,

$$d_{\mathrm{LOT}}(\mu, \nu) = \|\Phi(\mu) - \Phi(\nu)\|_2,$$

which linearly approximates their 2-Wasserstein distance near $\mu_0$ (Wilson et al., 2024).
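The construction above can be sketched in a few lines of plain Python. This is a minimal illustration under a strong simplifying assumption: both measures are uniform with the same number of points, so an optimal plan is a permutation and can be found by brute force. The helper names (`exact_plan_uniform`, `lot_embedding`, `lot_distance`) are hypothetical and not taken from the source; a practical pipeline would use a dedicated OT solver instead.

```python
import itertools
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def exact_plan_uniform(xs, ys):
    """Exact OT plan between two uniform measures with the same number of points.

    With equal uniform weights an optimal plan is a permutation, so a
    brute-force search is exact (only viable for a handful of points).
    """
    n = len(xs)
    best = min(
        itertools.permutations(range(n)),
        key=lambda s: sum(sq_dist(xs[i], ys[s[i]]) for i in range(n)),
    )
    plan = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(best):
        plan[i][j] = 1.0 / n
    return plan


def barycentric_projection(plan, a, ys):
    """T(x_i) = (1 / a_i) * sum_j plan[i][j] * y_j."""
    d = len(ys[0])
    return [
        tuple(sum(plan[i][j] * ys[j][c] for j in range(len(ys))) / a[i]
              for c in range(d))
        for i in range(len(a))
    ]


def lot_embedding(xs, a, ys):
    """Flattened vector of sqrt(a_i) * (T(x_i) - x_i), of length n * d."""
    plan = exact_plan_uniform(xs, ys)
    T = barycentric_projection(plan, a, ys)
    return [
        math.sqrt(a[i]) * (T[i][c] - xs[i][c])
        for i in range(len(xs))
        for c in range(len(xs[0]))
    ]


def lot_distance(phi_mu, phi_nu):
    """Euclidean distance between two LOT embeddings."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(phi_mu, phi_nu)))


# Reference: three points in the plane with uniform weights.
xs = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
a = [1.0 / 3] * 3
# Two targets that are translates of the reference.
mu = [(x + 1.0, y) for x, y in xs]
nu = [(x, y + 1.0) for x, y in xs]

phi_mu = lot_embedding(xs, a, mu)
phi_nu = lot_embedding(xs, a, nu)
dist = lot_distance(phi_mu, phi_nu)
```

For translates of the reference the optimal map is the translation itself, so in this example the LOT distance coincides with the 2-Wasserstein distance between `mu` and `nu`; in general it is only an approximation.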

2. Fréchet Variance Decomposition in Wasserstein Space

For a collection $\{\mu_k\}_{k=1}^N$ with Wasserstein barycenter $\mu_0$, the Fréchet variance is

$$\mathrm{Var}_F = \frac{1}{N} \sum_{k=1}^N W_2^2(\mu_0, \mu_k).$$

LOT enables a variance decomposition: $\mathrm{Var}_F = \frac{1}{N} \sum_{k=1}^N \|\Phi(\mu_k) - \bar{\Phi}\|^2 + R$, where $\bar{\Phi} = \frac{1}{N} \sum_{k=1}^N \Phi(\mu_k)$ is the mean embedding and $R$ is the residual (unexplained) variance. The explained term is the trace of the covariance matrix $\Sigma$ of the LOT embeddings. Diagonalizing $\Sigma$ yields eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge 0$, and the fraction of Fréchet variance explained by the first $p$ LOT coordinates is

$$\rho_p = \frac{\sum_{i=1}^{p} \lambda_i}{\mathrm{Var}_F}.$$

This quantifies the representational efficiency of the LOT embedding and underpins the use of principal component analysis and related methods in LOT space (Wilson et al., 2024).
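The split into an explained and a residual part can be checked numerically for a single measure. For any coupling $\gamma$ with row sums $a_i$ and barycentric projection $T$, the cross term vanishes and $\sum_{i,j} \gamma_{ij}\|x_i - y_j\|^2 = \sum_i a_i \|T(x_i) - x_i\|^2 + \sum_{i,j} \gamma_{ij} \|T(x_i) - y_j\|^2$; the first term is the squared norm of the LOT embedding. The sketch below uses a hypothetical helper `variance_split` and a small hand-written 1-D example (the monotone coupling, which is optimal on the line).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def variance_split(plan, xs, ys):
    """Split sum_ij plan_ij |x_i - y_j|^2 into explained and residual parts.

    explained = sum_i a_i |T(x_i) - x_i|^2   (squared norm of the LOT embedding)
    residual  = sum_ij plan_ij |T(x_i) - y_j|^2
    where a_i is the i-th row sum of the plan and T its barycentric projection.
    """
    n, m, d = len(xs), len(ys), len(xs[0])
    a = [sum(row) for row in plan]
    T = [
        tuple(sum(plan[i][j] * ys[j][c] for j in range(m)) / a[i]
              for c in range(d))
        for i in range(n)
    ]
    total = sum(plan[i][j] * sq_dist(xs[i], ys[j])
                for i in range(n) for j in range(m))
    explained = sum(a[i] * sq_dist(T[i], xs[i]) for i in range(n))
    residual = sum(plan[i][j] * sq_dist(T[i], ys[j])
                   for i in range(n) for j in range(m))
    return total, explained, residual


# Hypothetical example: two reference points, three target points on a line.
xs = [(0.0,), (4.0,)]
ys = [(-1.0,), (1.0,), (5.0,)]
# Monotone coupling with row sums (0.5, 0.5) and column sums (0.25, 0.25, 0.5);
# the first reference point splits its mass between two targets.
plan = [[0.25, 0.25, 0.0],
        [0.0, 0.0, 0.5]]

total, explained, residual = variance_split(plan, xs, ys)
```

Here the first reference point sends its mass to two targets, so part of the transport cost cannot be represented by a single displacement vector and shows up in `residual`.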

3. Variance Decomposition in Fused Gromov–Wasserstein (FGW) Space

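The LOT variance decomposition extends to Fused Gromov-Wasserstein (FGW) settings used for structured objects (e.g., graphs). For a reference object $(\mu_0, C_0)$ and a data object $(\mu, C)$, where $C_0$ and $C$ are the edge (structure) matrices, the squared FGW distance with trade-off parameter $\alpha \in [0,1]$ is

$$\mathrm{FGW}_\alpha^2 = \min_{\gamma \in \Pi(a,b)} \; (1-\alpha) \sum_{i,j} \gamma_{ij} \|x_i - y_j\|^2 + \alpha \sum_{i,j,k,l} \gamma_{ij} \gamma_{kl} \, |C_0(i,k) - C(j,l)|^2.$$

With barycentric projections in both node and edge domains, the total variance splits into a deterministic (explained) part, corresponding to the squared Euclidean norm of the joint LOT embedding, and a residual (probabilistic) part from the FGW coupling: $\mathrm{FGW}_\alpha^2 = \|\Phi_{\mathrm{FGW}}\|^2 + R_{\mathrm{GW}}$, where $\Phi_{\mathrm{FGW}}$ is the joint embedding and $R_{\mathrm{GW}}$ is the residual GW cost (Wilson et al., 2024).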
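The edge-domain part of this split can be illustrated in the same way as the node-domain part. The sketch below assumes one natural form of the edge projection, $\hat{C}(i,k) = \frac{1}{a_i a_k} \sum_{j,l} \gamma_{ij} \gamma_{kl} C(j,l)$; this form, the helper names `edge_projection` and `gw_split`, and the toy graphs are assumptions for illustration and are not confirmed by the source. Under that assumption the Gromov-Wasserstein cost of any coupling splits exactly into an explained and a residual term.

```python
def edge_projection(plan, C):
    """Barycentric projection of a target edge matrix onto the reference nodes.

    C_hat[i][k] = (1 / (a_i a_k)) * sum_{j,l} plan[i][j] * plan[k][l] * C[j][l],
    with a_i the i-th row sum of the plan (assumed form, see lead-in).
    """
    n, m = len(plan), len(plan[0])
    a = [sum(row) for row in plan]
    return [
        [
            sum(plan[i][j] * plan[k][l] * C[j][l]
                for j in range(m) for l in range(m)) / (a[i] * a[k])
            for k in range(n)
        ]
        for i in range(n)
    ]


def gw_split(plan, C0, C):
    """Split the GW cost of a coupling into explained and residual parts."""
    n, m = len(plan), len(plan[0])
    a = [sum(row) for row in plan]
    C_hat = edge_projection(plan, C)
    total = sum(
        plan[i][j] * plan[k][l] * (C0[i][k] - C[j][l]) ** 2
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    explained = sum(
        a[i] * a[k] * (C0[i][k] - C_hat[i][k]) ** 2
        for i in range(n) for k in range(n)
    )
    residual = sum(
        plan[i][j] * plan[k][l] * (C_hat[i][k] - C[j][l]) ** 2
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    return total, explained, residual


# Hypothetical example: a 2-node reference graph and a 3-node data graph.
C0 = [[0.0, 1.0],
      [1.0, 0.0]]
C = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
# A coupling with row sums (0.5, 0.5); it is not claimed to be FGW-optimal.
plan = [[0.25, 0.25, 0.0],
        [0.0, 0.0, 0.5]]

total, explained, residual = gw_split(plan, C0, C)
```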

4. Algorithms and Computational Aspects

The practical pipeline comprises several stages:

A. Reference Barycenter Computation

  • Initialize support points and weights for $\mu_0$.
  • For each measure, compute the optimal transport plan to $\mu_0$ (using entropic regularization, e.g. Sinkhorn, for speed).
  • Update support locations using barycentric updates; repeat until convergence.
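The barycenter loop in stage A can be sketched as a fixed-point iteration on the support locations. To keep the example exact and dependency-free, it assumes uniform measures of equal size and replaces the Sinkhorn solver mentioned above with a brute-force search over permutations; the function names are hypothetical and the weights are held fixed.

```python
import itertools


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def optimal_assignment(xs, ys):
    """Exact OT between uniform measures of equal size: the best permutation."""
    n = len(xs)
    return min(
        itertools.permutations(range(n)),
        key=lambda s: sum(sq_dist(xs[i], ys[s[i]]) for i in range(n)),
    )


def barycenter_step(xs, measures):
    """One free-support update: x_i <- mean over measures of T_k(x_i)."""
    d = len(xs[0])
    sums = [[0.0] * d for _ in xs]
    for ys in measures:
        sigma = optimal_assignment(xs, ys)
        for i, j in enumerate(sigma):
            for c in range(d):
                sums[i][c] += ys[j][c]
    return [tuple(v / len(measures) for v in row) for row in sums]


def free_support_barycenter(measures, n_iter=20, tol=1e-12):
    """Repeat the barycentric update until the support stops moving."""
    xs = list(measures[0])  # initialise the support at the first measure
    for _ in range(n_iter):
        new_xs = barycenter_step(xs, measures)
        moved = max(sq_dist(p, q) for p, q in zip(xs, new_xs))
        xs = new_xs
        if moved < tol:
            break
    return xs


# Four translated copies of the same three-point shape.
base = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
shifts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
measures = [[(x + sx, y + sy) for x, y in base] for sx, sy in shifts]

barycenter = free_support_barycenter(measures)
```

For translated copies of one shape the iteration settles on the shape shifted by the mean translation, here `(0.5, 0.5)`.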

B. LOT Embedding Generation

  • For each measure, solve OT to $\mu_0$, compute barycentric projections, and form embedding vectors.

C. Covariance Analysis

  • Stack LOT embeddings into a matrix, center, and form the covariance.
  • Diagonalize to obtain principal directions and, via the explained-variance fraction of Section 2, quantify the variance explained.
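Stage C can be sketched without external libraries by using power iteration with deflation, which computes only the leading eigenvalues. The function names, the $1/N$ covariance normalisation, and the toy embeddings are assumptions for illustration; in practice a numerical linear algebra library would be used. The denominator passed to `explained_fraction` is the trace of the covariance here, and the Fréchet variance could be passed instead.

```python
import math


def covariance(embeddings):
    """Covariance matrix (1/N normalisation) of row-stacked LOT embeddings."""
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(row[c] for row in embeddings) / n for c in range(d)]
    centered = [[row[c] - mean[c] for c in range(d)] for row in embeddings]
    return [
        [sum(r[p] * r[q] for r in centered) / n for q in range(d)]
        for p in range(d)
    ]


def top_eigenvalues(cov, k, n_iter=500):
    """Top-k eigenvalues of a symmetric PSD matrix (power iteration + deflation)."""
    d = len(cov)
    A = [row[:] for row in cov]
    eigenvalues = []
    for _ in range(k):
        # Assumes this start vector is not orthogonal to the leading eigenvector.
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(n_iter):
            w = [sum(A[p][q] * v[q] for q in range(d)) for p in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[p] * A[p][q] * v[q] for p in range(d) for q in range(d))
        eigenvalues.append(lam)
        # Deflate: remove the component that was just found.
        A = [[A[p][q] - lam * v[p] * v[q] for q in range(d)] for p in range(d)]
    return eigenvalues


def explained_fraction(eigenvalues, total_variance):
    """Share of total_variance captured by the given leading eigenvalues."""
    return sum(eigenvalues) / total_variance


# Hypothetical embeddings (one per row), for illustration only.
embeddings = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
cov = covariance(embeddings)
eigs = top_eigenvalues(cov, k=2)
trace = sum(cov[i][i] for i in range(len(cov)))
ratio_first = explained_fraction(eigs[:1], trace)
```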

Complexity is dominated by the OT computation for each embedding (cheaper with Sinkhorn than with an exact LP solve), the barycenter steps in each iteration, and the eigendecomposition, though only the top components need be computed in practice.

5. Empirical Results and Observed Effectiveness

Empirical analyses demonstrate strong variance-explaining and classification properties for LOT embeddings across diverse datasets (Wilson et al., 2024):

MNIST (vision):

  • With as few as […] support points, over […] of the population Fréchet variance is captured; at […], over […].
  • Classification accuracy (LightGBM tree on LOT embedding) is […] at […], rising to […] at […].

IMDB-50000 (text-graph):

  • Word2Vec clouds embedded; edge weights encode word-order structure.
  • […] variance explained at […] for […] (geometry-only LOT).
  • Classification accuracy reaches […] in this regime.

Diffusion Tensor MRI (biomedical):

  • Empirical measures in […].
  • […] variance at […], […] SVM accuracy for gender, outperforming mean-FA baselines.

These results indicate that low-dimensional LOT representations simultaneously achieve high explained variance and competitive accuracy, supporting the practical use of compact LOT-based representations.

6. Practical Guidelines and Limitations

Key recommendations include:

  • Choose the template support size $n$ (which sets the embedding dimension) via the “elbow” method on the explained-variance curve; a small $n$ often suffices for high coverage in vision tasks and moderate coverage in biomedical tasks.
  • The FGW trade-off parameter should be tuned based on whether node geometry or edge structure is more relevant; for many applications, the geometry-only setting (classical LOT) is effective.
  • Balance embedding dimension against computational cost: a lower $n$ yields a simpler eigendecomposition and faster downstream learning.
  • LOT provides only a local linearization and neglects global curvature; nonlinear OT-PCA or similar tools may be needed for heavily nonlinear datasets.
  • The barycenter computation remains a major bottleneck for very large-scale datasets, though free-support Sinkhorn algorithms and stochastic Frank-Wolfe methods alleviate this.
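One simple way to operationalise the "elbow" recommendation is to pick the first support size after which the gain in explained variance falls below a threshold. The rule, the threshold `min_gain`, and the curve values below are hypothetical choices for illustration; they are not results or settings from the source.

```python
def elbow_size(sizes, explained, min_gain=0.02):
    """Return the first size whose next step adds less than min_gain."""
    for idx in range(len(sizes) - 1):
        if explained[idx + 1] - explained[idx] < min_gain:
            return sizes[idx]
    return sizes[-1]


# Hypothetical explained-variance curve, for illustration only.
sizes = [5, 10, 20, 40, 80]
explained = [0.55, 0.75, 0.86, 0.87, 0.875]
chosen = elbow_size(sizes, explained)
```

With these made-up values the gain drops to about 0.01 after a support size of 20, so that size is selected; other elbow criteria (for example curvature-based ones) may choose differently.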

Open directions include improved barycenter solvers, direct LOT extensions to unbalanced/partial/multimarginal OT, and adapting LOT for manifold or SPD-valued data (Wilson et al., 2024).


References:

  • "Fused Gromov-Wasserstein Variance Decomposition with Linear Optimal Transport" (Wilson et al., 2024).
  • "Linearized Wasserstein Barycenters: Synthesis, Analysis, Representational Capacity, and Applications" (Werenski et al., 2024).
