Linearized Optimal Transport (LOT)
- Linearized Optimal Transport is a framework that embeds probability measures into Euclidean space using optimal transport theory.
- It reduces nonlinear Wasserstein geometry to a linear structure, enabling efficient variance decomposition and scalable linear learning.
- Empirical studies demonstrate high explained variance and competitive accuracy across vision, text, and biomedical applications.
Linearized Optimal Transport (LOT) is a mathematical and algorithmic framework that provides an explicit linear embedding of probability measures into a high-dimensional Euclidean (Hilbert) space via optimal transport theory. By fixing a reference measure, LOT maps each measure to the displacement field of its optimal transport map relative to the reference. This structure yields computational benefits for statistical analysis and machine learning tasks, notably by reducing the nonlinear geometry of Wasserstein space to a linear Euclidean one, enabling efficient variance decomposition and scalable linear learning techniques. Recent research has further extended LOT to settings involving structured data, such as Fused Gromov-Wasserstein distances, and established its effectiveness in high-dimensional and graph-based applications (Wilson et al., 2024).
1. Core Definition and Construction of the LOT Embedding
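Given a discrete reference measure $\sigma = \sum_{i=1}^{m} a_i \delta_{x_i}$ on $\mathbb{R}^d$ (with $a_i > 0$, $\sum_i a_i = 1$) and a target measure $\mu = \sum_{j=1}^{n} b_j \delta_{y_j}$, LOT constructs the embedding via the barycentric projection of an optimal transport plan $\gamma^*$ solving

$$\gamma^* \in \arg\min_{\gamma \in \Pi(\sigma, \mu)} \sum_{i=1}^{m} \sum_{j=1}^{n} \gamma_{ij} \, \|x_i - y_j\|^2 .$$

For each template point $x_i$, the barycentric projection determines

$$T_\mu(x_i) = \frac{1}{a_i} \sum_{j=1}^{n} \gamma^*_{ij} \, y_j .$$

The LOT embedding is then

$$\phi(\mu) = \big( \sqrt{a_i} \, (T_\mu(x_i) - x_i) \big)_{i=1}^{m} \in \mathbb{R}^{md} .$$

This embedding linearizes Wasserstein geometry around the reference: geodesics in Wasserstein space through $\sigma$ correspond to straight lines in LOT space. For two measures $\mu$ and $\nu$, the distance in LOT space is the Euclidean norm of the difference of their embeddings,

$$d_{\mathrm{LOT}}(\mu, \nu) = \|\phi(\mu) - \phi(\nu)\|_2 ,$$

which linearly approximates their 2-Wasserstein distance near $\sigma$ (Wilson et al., 2024).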
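The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a discrete reference with support `X` and weights `a`, a transport plan that has already been computed by some OT solver, and a square-root weighting of the displacements so that the Euclidean norm of the embedding matches the reference-weighted norm. The function names `lot_embedding` and `lot_distance` are hypothetical.

```python
import numpy as np

def lot_embedding(X, a, Y, plan):
    """Embed a target measure given an OT plan from the reference (X, a) to it.

    X: (m, d) reference support, a: (m,) reference weights,
    Y: (n, d) target support, plan: (m, n) coupling whose row sums equal a.
    """
    T = (plan @ Y) / a[:, None]             # barycentric projection of each x_i
    disp = np.sqrt(a)[:, None] * (T - X)    # weighted displacement field (assumed weighting)
    return disp.ravel()                     # vector in R^{m*d}

def lot_distance(emb_mu, emb_nu):
    """Euclidean distance between two LOT embeddings."""
    return float(np.linalg.norm(emb_mu - emb_nu))

# Toy example: the targets are translates of a two-point reference,
# so the identity matching (a diagonal plan) is optimal.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
a = np.array([0.5, 0.5])
plan = np.diag(a)
emb_mu = lot_embedding(X, a, X + np.array([1.0, 0.0]), plan)
emb_nu = lot_embedding(X, a, X + np.array([0.0, 2.0]), plan)
dist = lot_distance(emb_mu, emb_nu)
```

In this toy case the two targets differ by a pure translation, so the LOT distance coincides with their 2-Wasserstein distance; in general it is only an approximation.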
2. Fréchet Variance Decomposition in Wasserstein Space
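For a collection $\{\mu_k\}_{k=1}^{N}$ with Wasserstein barycenter $\bar{\mu}$, the Fréchet variance is

$$\mathrm{Var}_F = \frac{1}{N} \sum_{k=1}^{N} W_2^2(\mu_k, \bar{\mu}) .$$

LOT enables a variance decomposition: writing $\phi(\mu_k)$ for the LOT embedding of $\mu_k$ relative to the reference $\bar{\mu}$, $\mathrm{Var}_F = \frac{1}{N} \sum_{k=1}^{N} \|\phi(\mu_k) - \bar{\phi}\|^2 + R$, where $\bar{\phi} = \frac{1}{N} \sum_{k=1}^{N} \phi(\mu_k)$ is the mean embedding and $R$ collects the variance not captured by the embedding. The explained term is the trace of the covariance matrix $\Sigma$ of the LOT embeddings. Diagonalizing $\Sigma$ yields eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$, and the fraction of Fréchet variance explained by the first $r$ LOT coordinates is

$$\rho_r = \frac{\sum_{j=1}^{r} \lambda_j}{\mathrm{Var}_F} .$$

This quantifies the representational efficiency of the LOT embedding and underpins the use of principal component analysis and related methods in LOT space (Wilson et al., 2024).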
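As a rough sketch of how the explained fraction might be computed, the snippet below takes a matrix of LOT embeddings and a Fréchet variance that is assumed to have been computed separately (for example, from the OT costs to the barycenter). The function name `explained_variance_curve` and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def explained_variance_curve(embeddings, frechet_variance):
    """Cumulative fraction of Fréchet variance captured by leading LOT coordinates.

    embeddings: (N, D) array, one LOT embedding per row.
    frechet_variance: mean squared W2 distance to the barycenter (computed elsewhere).
    """
    centered = embeddings - embeddings.mean(axis=0)
    cov = centered.T @ centered / embeddings.shape[0]   # population covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]             # eigenvalues, descending
    eigvals = np.clip(eigvals, 0.0, None)               # drop tiny negative round-off
    return np.cumsum(eigvals) / frechet_variance

# Synthetic stand-in data (not from the paper)
rng = np.random.default_rng(0)
E = rng.normal(size=(50, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2, 0.1])
explained = np.trace(np.cov(E.T, bias=True))
# Assume, purely for illustration, that 10% of the Fréchet variance is residual
curve = explained_variance_curve(E, frechet_variance=explained / 0.9)
```

The last entry of `curve` is the total share explained by the embedding; entries before it show how quickly the leading principal directions accumulate that share.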
3. Variance Decomposition in Fused Gromov–Wasserstein (FGW) Space
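The LOT variance decomposition extends to Fused Gromov-Wasserstein (FGW) settings used for structured objects (e.g., graphs). For a reference object $\mathcal{X} = (X, C^X, a)$ with node features $x_i$, structure matrix $C^X$, and node weights $a$, and data objects $\mathcal{Y}_k = (Y_k, C^{Y_k}, b_k)$, the squared FGW distance with trade-off parameter $\alpha \in [0, 1]$ is

$$\mathrm{FGW}_\alpha^2(\mathcal{X}, \mathcal{Y}) = \min_{\gamma \in \Pi(a, b)} \sum_{i,j,k,l} \Big[ (1 - \alpha) \, \|x_i - y_j\|^2 + \alpha \, \big| C^X_{ik} - C^Y_{jl} \big|^2 \Big] \gamma_{ij} \gamma_{kl} .$$

With barycentric projections in both node and edge domains, the total variance splits into a deterministic (explained) part, corresponding to the squared Euclidean norm in the joint LOT embedding, and a residual (probabilistic) part from the FGW coupling: $\mathrm{Var}_{\mathrm{FGW}} = \frac{1}{N} \sum_{k=1}^{N} \|\phi(\mathcal{Y}_k)\|^2 + \frac{1}{N} \sum_{k=1}^{N} R_k$, where $R_k$ is the residual GW cost (Wilson et al., 2024).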
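To make the FGW objective concrete, the sketch below evaluates it for a fixed coupling; it does not optimize over couplings. It assumes the common convention in which a trade-off parameter `alpha` weights the structure term and `1 - alpha` weights the feature term, which may differ from the convention in the cited paper. The function name `fgw_cost` is hypothetical.

```python
import numpy as np

def fgw_cost(X, Y, CX, CY, plan, alpha):
    """Evaluate the FGW objective for a fixed coupling (no optimization).

    X: (m, d), Y: (n, d) node features; CX: (m, m), CY: (n, n) structure matrices;
    plan: (m, n) coupling; alpha in [0, 1] weights the structure term (assumed convention).
    """
    # Feature term: squared Euclidean distances between node attributes
    M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    feature = (M * plan).sum()
    # Structure term: sum_{i,j,k,l} (CX[i,k] - CY[j,l])^2 plan[i,j] plan[k,l],
    # expanded so that no 4-index tensor is materialized
    p, q = plan.sum(axis=1), plan.sum(axis=0)
    structure = (
        p @ (CX ** 2) @ p
        + q @ (CY ** 2) @ q
        - 2.0 * np.sum((CX @ plan @ CY.T) * plan)
    )
    return (1.0 - alpha) * feature + alpha * structure

# Toy check: an object matched to itself by the identity coupling has zero cost
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
C = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
self_cost = fgw_cost(pts, pts, C, C, np.eye(3) / 3.0, alpha=0.5)
```

With `alpha=0` the function reduces to the ordinary transport cost of the coupling, which corresponds to the geometry-only case.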
4. Algorithms and Computational Aspects
The practical pipeline comprises several stages:
A. Reference Barycenter Computation
- Initialize support points and weights for the reference measure.
- For each measure, compute the optimal transport plan to the current reference (using entropic regularization, i.e. Sinkhorn, for speed).
- Update support locations using barycentric updates; repeat until convergence.
B. LOT Embedding Generation
- For each measure, solve OT to the reference, compute barycentric projections, and form embedding vectors.
C. Covariance Analysis
- Stack LOT embeddings into a matrix, center, and form the covariance.
- Diagonalize to obtain principal directions and, via the cumulative eigenvalue fraction, quantify the variance explained.
Complexity is dominated by OT computation (per embedding: 3 for Sinkhorn; 4 for exact LP), barycenter steps (5 per iteration), and eigen-decomposition (6, though only the top components need be computed in practice).
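Stage A can be sketched as follows, under simplifying assumptions: a plain (non-log-domain) Sinkhorn iteration, fixed reference weights, and a fixed iteration count instead of a convergence test. The names `sinkhorn_plan`, `sq_dists`, and `barycenter_step` are hypothetical, and the regularization value is arbitrary. A production pipeline would more likely call an existing OT library.

```python
import numpy as np

def sinkhorn_plan(a, b, M, reg=0.05, n_iter=500):
    """Entropic OT plan between weights a and b for cost matrix M (plain Sinkhorn)."""
    K = np.exp(-M / reg)              # Gibbs kernel; can underflow if M / reg is large
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)

def barycenter_step(X, a, measures, reg=0.05):
    """One free-support update: move each reference point to the average
    of its barycentric projections onto the data measures."""
    new_X = np.zeros_like(X)
    for Y, b in measures:
        plan = sinkhorn_plan(a, b, sq_dists(X, Y), reg)
        new_X += (plan @ Y) / a[:, None]
    return new_X / len(measures)

# Toy run on synthetic point clouds in the unit square
rng = np.random.default_rng(0)
a = np.full(4, 0.25)
X = rng.random((4, 2))
measures = [(rng.random((5, 2)), np.full(5, 0.2)) for _ in range(3)]
for _ in range(10):                   # fixed number of outer iterations (assumption)
    X = barycenter_step(X, a, measures, reg=0.1)
```

Once the reference has settled, stage B reuses the same plans: the barycentric projections computed inside `barycenter_step` are exactly the quantities needed to form each embedding.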
5. Empirical Results and Observed Effectiveness
Empirical analyses demonstrate strong variance-explaining and classification properties for LOT embeddings across diverse datasets (Wilson et al., 2024):
MNIST (vision):
- With as few as 7 support points, over 8 of population Fréchet variance is captured; at 9, over 0.
- Classification accuracy (LightGBM tree on LOT embedding) is 1 at 2, rising to 3 at 4.
IMDB-50000 (text-graph):
- Word2Vec clouds embedded; edge weights encode word-order structure.
- 5 variance explained at 6 for 7 (geometry-only LOT).
- Classification accuracy reaches 8 in this regime.
Diffusion Tensor MRI (biomedical):
- Empirical measures in 9.
- 0 variance at 1, 2 SVM accuracy for gender, outperforming mean-FA baselines.
These results indicate that low-dimensional LOT representations simultaneously achieve high explained variance and competitive accuracy, supporting the practical use of compact LOT-based representations.
6. Practical Guidelines and Limitations
Key recommendations include:
- Choose the template support size (which sets the embedding dimension) via the “elbow” method on the explained variance curve; a small template often suffices for high coverage in vision tasks and moderate coverage in biomedical tasks.
- The FGW trade-off parameter should be tuned based on whether node geometry or edge structure is more relevant; for many applications, the geometry-only setting (classical LOT) is effective.
- Balance embedding dimension against computational cost: a lower dimension yields simpler eigendecomposition and faster downstream learning.
- LOT provides only a local linearization and neglects global curvature; nonlinear OT-PCA or similar tools may be needed for heavily nonlinear datasets.
- The barycenter computation remains a major bottleneck for very large-scale datasets, though free-support Sinkhorn algorithms and stochastic Frank-Wolfe methods alleviate this.
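One simple way to automate the elbow choice mentioned in the recommendations is to pick the point of a cumulative explained-variance curve that lies farthest above the chord joining its endpoints. This is a generic heuristic offered as a sketch, not a procedure taken from the paper; the function name `elbow_index` and the example curve are hypothetical, and the curve is assumed to be increasing and not flat.

```python
import numpy as np

def elbow_index(curve):
    """Index of the point on a cumulative curve farthest above the chord
    joining its first and last points (a common elbow heuristic)."""
    curve = np.asarray(curve, dtype=float)
    x = np.linspace(0.0, 1.0, len(curve))
    y = (curve - curve[0]) / (curve[-1] - curve[0])   # rescale to [0, 1]; assumes a non-flat curve
    return int(np.argmax(y - x))                      # largest gap above the diagonal

# Hypothetical explained-variance curve, for illustration only
curve = [0.40, 0.70, 0.85, 0.90, 0.92, 0.93]
k = elbow_index(curve) + 1   # number of coordinates (or support points) to keep
```

The same function can be applied to a curve indexed by template support size rather than by principal component, since it only looks at the shape of the curve.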
Open directions include improved barycenter solvers, direct LOT extensions to unbalanced/partial/multimarginal OT, and adapting LOT for manifold or SPD-valued data (Wilson et al., 2024).
References:
- "Fused Gromov-Wasserstein Variance Decomposition with Linear Optimal Transport" (Wilson et al., 2024).
- "Linearized Wasserstein Barycenters: Synthesis, Analysis, Representational Capacity, and Applications" (Werenski et al., 2024).