
Alignment via Optimal Transport (AOT)

Updated 26 February 2026
  • Alignment via Optimal Transport (AOT) is a family of methodologies that recast alignment problems as variational optimal transport problems, enabling robust and interpretable matching across domains.
  • The framework leverages entropic regularization, hierarchical decompositions, and mutual information augmentation to handle noisy, partial, and high-dimensional data effectively.
  • AOT has been applied in domain adaptation, protein and network alignment, and LLM preference alignment, demonstrating state-of-the-art performance and practical interpretability.

Alignment via Optimal Transport (AOT) is a family of methodologies that recast alignment—be it across distributions, domains, sequences, graphs, networks, or multimodal signals—as a variational optimal transport (OT) problem. By leveraging the geometric structure of OT, AOT achieves principled, often robust and interpretable correspondences between complex objects or spaces. Recent research advances systematically generalize and adapt OT for a wide array of challenges, including but not limited to distribution shift, noisy or partial overlap, domain adaptation, protein and network alignment, hyperbolic and spatial-temporal signals, and even distributional LLM preference alignment.

1. Mathematical Foundations and Variants of AOT

At its core, AOT formulates alignment as the search for a transport plan $T$ between discrete (or continuous) probability measures $\mu$ and $\nu$, represented as empirical distributions
$$\mu = \sum_{i=1}^n \mu_i\,\delta_{x_i}, \qquad \nu = \sum_{j=1}^m \nu_j\,\delta_{z_j},$$
subject to various constraints, with a cost matrix $C_{ij} = c(x_i, z_j)$ encoding domain-appropriate dissimilarities.

Classical OT: The Kantorovich formulation seeks

$$\min_{T \geq 0} \sum_{i,j} C_{ij} T_{ij} \quad \text{s.t.} \quad T\mathbf{1} = \mu, \;\; T^\top \mathbf{1} = \nu$$
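For small problems, this Kantorovich program can be solved directly as a linear program over the vectorized plan. The following NumPy/SciPy sketch (sizes and data are purely illustrative) sets up exactly the marginal constraints above:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 4, 5
x = rng.normal(size=(n, 2))              # source support points x_i
z = rng.normal(size=(m, 2))              # target support points z_j
C = np.linalg.norm(x[:, None, :] - z[None, :, :], axis=-1)   # C_ij = ||x_i - z_j||

mu = np.full(n, 1.0 / n)                 # source weights
nu = np.full(m, 1.0 / m)                 # target weights

# Equality constraints on the flattened plan T.ravel() (index i*m + j):
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0     # row sums:    sum_j T_ij = mu_i
for j in range(m):
    A_eq[n + j, j::m] = 1.0              # column sums: sum_i T_ij = nu_j
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
T = res.x.reshape(n, m)                  # optimal transport plan
```

The LP view is exact but scales poorly; the entropic solvers of Section 2 trade exactness for speed.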

Partial and Adaptive OT: AOT extends this by replacing the equalities with inequalities,
$$T\mathbf{1} \leq \mu, \qquad T^\top \mathbf{1} \leq \nu, \qquad T_{ij} \geq 0,$$
allowing the optimal transported mass between $\mu$ and $\nu$ to be determined adaptively, filtering out outliers or regions of mismatch without manual tuning of mass constraints (Yang et al., 7 Mar 2025). Entropic regularization $\epsilon \sum_{ij} T_{ij}(\log T_{ij} - 1)$ ensures efficient, smooth solutions.

Anchor/Latent Structure and Hierarchical OT: AOT admits further structure by factorizing the transport plan through a smaller set of learned anchors or via explicit hierarchies,

$$P = P_x\,\operatorname{diag}(u_z)^{-1}\, P_z\, \operatorname{diag}(v_z)^{-1}\, P_y$$

which regularizes alignment and enables cluster-level/block-sparse couplings (Lin et al., 2020, Lee et al., 2019).

Graph, Gromov–Wasserstein, and Fused OT: For structured data (e.g., graphs, proteins, spatial-temporal signals), alignment may combine node-level (Wasserstein) and edge-/structure-level (Gromov-Wasserstein) terms, e.g.,

$$\mathcal{D}_{\text{GOT}}(X, Y) = \min_{T \in \Pi(\mu,\nu)} \sum_{i,j} T_{ij}\Big[\lambda\, c(x_i, y_j) + (1-\lambda) \sum_{i',j'} L\big(C^X_{ii'}, C^Y_{jj'}\big)\, T_{i'j'}\Big]$$

with the fused objectives coupling feature and geometry (Chen et al., 2020, Hu et al., 8 Oct 2025).
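The fused objective can be evaluated directly for any candidate plan. The sketch below uses the squared structure loss $L(a,b) = (a-b)^2$ and illustrative sizes; the function name is ours, not from the cited papers:

```python
import numpy as np

def fused_objective(T, C_feat, CX, CY, lam):
    """Fused Wasserstein / Gromov-Wasserstein objective for a given plan T,
    with squared structure loss L(a, b) = (a - b)^2 (illustrative choice)."""
    feature_term = np.sum(T * C_feat)
    # diff[i, j, i2, j2] = CX[i, i2] - CY[j, j2]; brute-force tensor for small n, m
    diff = CX[:, None, :, None] - CY[None, :, None, :]
    structure_term = np.sum(diff ** 2 * T[:, :, None, None] * T[None, None, :, :])
    return lam * feature_term + (1 - lam) * structure_term

# Identical intra-domain distance matrices and a diagonal coupling
# incur zero structure cost, as expected.
n = 3
CX = CY = np.array([[0.0, 1.0, 2.0],
                    [1.0, 0.0, 1.0],
                    [2.0, 1.0, 0.0]])
T = np.eye(n) / n
C_feat = np.ones((n, n))
val = fused_objective(T, C_feat, CX, CY, lam=0.5)
```

The quadruple tensor is quadratic in both supports; practical GW solvers use the well-known low-rank decomposition of this sum instead.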

Information-Theoretic OT and Mutual Information: InfoOT incorporates mutual information as a regularizer into the OT problem, encouraging cluster-coherent, outlier-robust alignments (Chuang et al., 2022).

1D Monotonic and Sequence OT: For sequence alignment, OT can be further specialized to efficient, linear-time monotonic alignments, sidestepping the need for marginalizations over all possible paths (Kaloga et al., 3 Feb 2025).

Convex 1D OT for Distributional Preference Alignment: In distributional LLM alignment, the first-order stochastic dominance constraint between reward distributions is relaxed to a 1D convex OT minimization that admits a closed-form solution via empirical quantile matching (Melnyk et al., 2024).
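The quantile-matching computation in 1D is just a sort: the $i$-th smallest sample on one side matches the $i$-th smallest on the other. The sketch below also shows one plausible form of a dominance-violation penalty; the function name and exact penalty form are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def quantile_violation(chosen, rejected):
    """Average first-order-stochastic-dominance violation under quantile matching:
    positive part of the rejected-minus-chosen quantile gaps (illustrative form)."""
    return np.mean(np.maximum(0.0, np.sort(rejected) - np.sort(chosen)))

chosen = np.array([2.0, 5.0, 1.0])    # e.g., rewards of preferred responses
rejected = np.array([0.5, 3.0, 4.0])  # rewards of dispreferred responses
# sorted: chosen [1, 2, 5] vs rejected [0.5, 3, 4]; gaps [-0.5, 1, -1]
val = quantile_violation(chosen, rejected)
```

A value of zero means the chosen reward distribution dominates the rejected one at every empirical quantile.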

2. Algorithmic Solutions and Computational Properties

Sinkhorn–Knopp Fixed-Point Iterations: Entropically regularized OT problems are efficiently solved by alternating row/column normalizations on the Gibbs kernel $K = \exp(-C/\epsilon)$, setting

$$u \leftarrow \frac{\mu}{Kv}, \qquad v \leftarrow \frac{\nu}{K^\top u}$$

and forming $T = \operatorname{diag}(u)\, K\, \operatorname{diag}(v)$ (Yang et al., 7 Mar 2025, Chen et al., 2020).
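These alternating normalizations fit in a few lines of NumPy; this is a minimal sketch with an illustrative toy problem, not a production solver (no log-domain stabilization or convergence check):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.5, n_iter=500):
    """Sinkhorn-Knopp fixed-point iterations on the Gibbs kernel K = exp(-C/eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)        # row scaling
        v = nu / (K.T @ u)      # column scaling
    return u[:, None] * K * v[None, :]   # T = diag(u) K diag(v)

rng = np.random.default_rng(0)
C = rng.random((4, 6))                   # toy cost matrix
mu = np.full(4, 1 / 4)                   # uniform source marginal
nu = np.full(6, 1 / 6)                   # uniform target marginal
T = sinkhorn(mu, nu, C)
```

Each iteration costs one matrix-vector product per side, hence $O(nm)$ per sweep; smaller $\epsilon$ gives sharper plans but slower, less stable convergence.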

Unbalanced and Generalized Sinkhorn: Unbalanced OT introduces relaxation parameters (e.g., $\gamma$) controlling marginal deviations, with fixed-point exponents $\omega = \gamma/(\gamma+\epsilon)$ (Janati et al., 2022, Janati et al., 2019).
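The unbalanced variant only changes the scaling updates, damping them by the exponent $\omega$; a sketch under the standard KL-relaxed formulation (parameter values illustrative):

```python
import numpy as np

def unbalanced_sinkhorn(mu, nu, C, eps=0.5, gamma=1.0, n_iter=500):
    """Generalized Sinkhorn with KL-relaxed marginals: the scalings are damped
    by omega = gamma / (gamma + eps). omega -> 1 recovers balanced Sinkhorn;
    omega -> 0 ignores the marginal constraints entirely."""
    K = np.exp(-C / eps)
    omega = gamma / (gamma + eps)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = (mu / (K @ v)) ** omega
        v = (nu / (K.T @ u)) ** omega
    return u[:, None] * K * v[None, :]
```

Small $\gamma$ lets the plan shed mass where matching is costly, which is exactly the behavior the spatio-temporal averaging methods below exploit.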

IPOT, Greedy/Assignment Solvers: In contexts requiring strict one-to-one mappings, as in entity or discrete sequence alignment, assignment is solved via greedy matching or the Hungarian algorithm (Ding et al., 2022, Hu et al., 8 Oct 2025).
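A minimal example of the strict one-to-one case, using SciPy's Hungarian-algorithm implementation (`linear_sum_assignment`); the cost matrix is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Strict one-to-one matching from a dissimilarity matrix.
cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])
rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
total = cost[rows, cols].sum()             # minimal total matching cost
```

Unlike a Sinkhorn plan, the result is a hard permutation, which is what entity-alignment pipelines need when each node may match at most one counterpart.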

Cluster/Anchor Block Decomposition: Hierarchical or anchor-based OT exploits block structure for computationally cheaper per-cluster alignment and improved sample efficiency (Lin et al., 2020, Lee et al., 2019).

Soft-DTW for Temporal Alignment: Where time warping is integral, soft-DTW provides a differentiable surrogate, allowing for quadratic sensitivity to time shifts (Janati et al., 2019, Janati et al., 2022).
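The soft-DTW recursion replaces the hard minimum of classical DTW with a soft-min of temperature `gamma`; below is a direct quadratic-time sketch (a simplified illustration, not the cited implementation):

```python
import numpy as np

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between 1D sequences: DTW's hard min is replaced by
    softmin_gamma(a) = -gamma * log(sum_k exp(-a_k / gamma))."""
    n, m = len(x), len(y)
    D = (x[:, None] - y[None, :]) ** 2          # pairwise squared costs
    R = np.full((n + 1, m + 1), np.inf)         # accumulated-cost table
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = np.array([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]])
            softmin = -gamma * np.logaddexp.reduce(-prev / gamma)
            R[i, j] = D[i - 1, j - 1] + softmin
    return R[n, m]

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.1, 2.2])
val = soft_dtw(x, y, gamma=0.1)
```

Because the soft-min is smooth, the whole alignment cost is differentiable in `x` and `y`, so it can serve as a loss inside gradient-based barycenter or template estimation.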

Closed-Form Solutions in Low-Dimensional Latents: If data distributions in latent space are approximated as Gaussians, the Monge map can be computed analytically, yielding an affine alignment with $O(nk^2 + k^3)$ cost (Struckmeier et al., 2023).
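Under the Gaussian approximation, the affine map has the standard closed form $x \mapsto m_2 + A(x - m_1)$ with $A = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}$; a sketch with illustrative covariances (function names ours):

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def gaussian_monge_matrix(S1, S2):
    """Linear part A of the closed-form Monge map between N(m1, S1) and N(m2, S2).
    np.real strips the negligible imaginary parts sqrtm can return for SPD inputs."""
    S1h = np.real(sqrtm(S1))                      # S1^{1/2}
    S1h_inv = inv(S1h)
    return S1h_inv @ np.real(sqrtm(S1h @ S2 @ S1h)) @ S1h_inv

S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, -0.2], [-0.2, 0.5]])
A = gaussian_monge_matrix(S1, S2)
m1, m2 = np.zeros(2), np.ones(2)
monge = lambda x: m2 + A @ (x - m1)               # affine alignment map
```

By construction $A \Sigma_1 A^\top = \Sigma_2$, so the pushforward of the first Gaussian is exactly the second; the dominant cost is the $k \times k$ matrix square roots.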

Gradient-Based or Block-Coordinate Descent: End-to-end learning of parameters, especially when OT is part of a deep model, employs projected gradient descent with entropic projection, or block-coordinate/Sinkhorn + SGD alternations (Chuang et al., 2022, Yu et al., 26 Feb 2025).

3. Practical Applications Across Domains

AOT has been instantiated, extended, and empirically validated in diverse alignment contexts:

  • Partial and Noisy Domain Adaptation: Adaptive-mass AOT filters spurious matches and outperforms classical OT and partial OT alternatives under class/domain imbalance (Yang et al., 7 Mar 2025).
  • Few-Shot Systematics Mitigation: OT feature alignment robustly adapts to out-of-distribution or “systematic”-contaminated test sets, even with only a handful of unlabelled samples (Hassan et al., 14 Nov 2025).
  • Protein Global and Local Structure Alignment: Both global correspondence (UniOTalign) via FUGW and substructure mapping (PLASMA) via entropic OT yield state-of-the-art performance alongside interpretability and permutation-robustness—not attainable by DP or heuristic methods (Hu et al., 8 Oct 2025, Wang et al., 12 Oct 2025).
  • Entity and Network Alignment: OT-based matching resolves many-to-one conflicts, integrates local and global graph structure, and is scalable to large networks. End-to-end coupling with embedding learning further improves resilience to node/edge attribute noise (Ding et al., 2022, Yu et al., 26 Feb 2025).
  • Graph/NLP Multimodal Alignment: GOT and OT-based cross-domain matching enhance retrieval, VQA, and captioning by explicit entity-level and structure-level regularization, with sparse, interpretable transport plans (Chen et al., 2020, Yuan et al., 2020).
  • Spatio-Temporal Signal Averaging: STA combines soft temporal alignments (soft-DTW) with unbalanced OT for spatial measures, providing sharper, better-aligned barycenters for neural and video data (Janati et al., 2022, Janati et al., 2019).
  • Distributional LLM Preference Alignment: 1D OT with a convex penalty allows tight enforcement of first-order stochastic dominance, leading to state-of-the-art alignment on benchmark models and efficient closed-form evaluation (Melnyk et al., 2024).

4. Theoretical Guarantees and Key Insights

  • Existence and Duality: AOT solutions exist under broad conditions, with duals extending the classical Kantorovich structure but with important modifications (e.g., $\phi, \psi \leq 0$ for adaptive OT) (Yang et al., 7 Mar 2025).
  • Adaptive Mass Allocation: Only “active” (low-cost) entry pairs receive nonzero mass, and AOT solutions saturate source/target margins only where cost is negative, inherently filtering mismatches (Yang et al., 7 Mar 2025).
  • Robustness and Outlier Insensitivity: Variants with mutual information, anchor/bottleneck structure, or soft clustering demonstrate statistical advantages: $O(N^{-1/2})$ convergence rates and improved robustness to contamination, versus the $O(N^{-1/d})$ curse of dimensionality for full unstructured supports (Lin et al., 2020, Chuang et al., 2022).
  • Sample Complexity: In 1D convex AOT (LLMs), violation of the FSD constraint diminishes at rate $O(n^{-1/2})$ (Melnyk et al., 2024).
  • Efficiency: OT-based plans can be computed in $O(nm)$ per iteration (Sinkhorn), $O(k^3)$ (affine latent), or $O(TT'p^2)$ (spatio-temporal), with further reductions via hierarchical or block-structured decompositions (Struckmeier et al., 2023, Janati et al., 2022, Lee et al., 2019).

5. Domain-Specific Methodological Extensions

  • Adaptive Partial Domain Alignment: Custom cost matrices leveraging both feature and label pseudo-probabilities capture intra-class relations and cross-predictive agreement (Yang et al., 7 Mar 2025).
  • Latent/Anchor Transport: Bottlenecking through latent anchors enables interpretable cluster-level alignments and denoising in high-noise/high-dimension regimes (Lin et al., 2020).
  • Mutual Information Augmentation: InfoOT's kernel-smoothed MI term ensures cluster coherence, enables out-of-sample projections, and improves generalization (Chuang et al., 2022).
  • Handling Non-Monotonic or Non-Sequential Structure: GW and unbalanced marginals accommodate gaps, shuffling, and partial matching in sequences, unlike DP (Hu et al., 8 Oct 2025).
  • Hyperbolic Geometry: Alignment extends naturally to gyrovectors on the Poincaré ball, preserving tree/hierarchy-aware alignment (Hoyos-Idrobo, 2020).
  • Spatio-Temporal Averaging: STA yields templates invariant to time/space shifts, addressing core challenges in neural, genomic, or video data analysis (Janati et al., 2022).

6. Empirical Evidence and Performance Highlights

AOT consistently matches or improves upon state-of-the-art baselines across vision, language, bioinformatics, multi-omics, and neural signal decoding:

| Application | AOT Variant | Empirical Highlights | Reference |
|---|---|---|---|
| Partial domain adaptation | Adaptive OT | +2–4% acc. over m-POT | (Yang et al., 7 Mar 2025) |
| Systematics mitigation | Regularized OT | OOD accuracy up to 90% | (Hassan et al., 14 Nov 2025) |
| Protein alignment | FUGW / entropic OT | 69.9% recall, circular-permutation robust | (Hu et al., 8 Oct 2025) |
| Protein substructure | Entropic OT (PLASMA) | 0.95–0.99 ROC AUC | (Wang et al., 12 Oct 2025) |
| Entity/network alignment | OT-guided / JOENA | +16% MRR, 20× speedup | (Yu et al., 26 Feb 2025) |
| Spatio-temporal barycenter | ST-OT (STA) | Sharper, shift-invariant means | (Janati et al., 2022) |
| LLM preference | 1D OT (AOT penalty) | SOTA on AlpacaEval | (Melnyk et al., 2024) |

7. Key Limitations and Open Problems

  • Non-convexity and Initialization: Many structured AOTs are non-convex, requiring prudent initialization, e.g., via clustering or spectral methods (Janati et al., 2022).
  • Hyperparameter Sensitivity: Entropic and unbalanced penalties, number of anchors or clustering parameters may require tuning; adaptive schemes remain an area of development.
  • Scalability for Large $n, m$: Although Sinkhorn and low-rank variants mitigate quadratic scaling, extremely large graphs or distributions pose ongoing computational challenges (Lee et al., 2019).
  • Extension to Multi-marginal and Non-Euclidean Settings: Recent advances address this partially, but automated or principled approaches for many-domain and manifold-valued data remain active topics.
  • Interpretability versus Flexibility: Bottleneck/anchor methods offer superior structure interpretability at the expense of potential fine-grained alignment capacity (Lin et al., 2020). A careful balance is context-dependent.

In summary, Alignment via Optimal Transport offers a principled, highly flexible set of methodologies for aligning diverse data types under broad statistical, geometrical, and algorithmic constraints. It enables robust, interpretable, and scalable alignment in structured and unstructured, partial, noisy, and high-dimensional settings, and continues to be extended into new areas of AI, computational biology, and beyond (Yang et al., 7 Mar 2025, Hassan et al., 14 Nov 2025, Hu et al., 8 Oct 2025, Wang et al., 12 Oct 2025, Janati et al., 2022, Lin et al., 2020, Melnyk et al., 2024).
