MESH: Minimize Entropy of Sinkhorn
- MESH is an algorithmic framework that iteratively reduces the entropy of Sinkhorn’s optimal transport solutions to produce sparse and interpretable couplings.
- It systematically adjusts the cost matrix via normalized gradient steps, thereby concentrating mass on meaningful correspondences with improved computational efficiency.
- The framework has demonstrated significant impact in computational biology, enhancing cross-species cell-type matching through improved clustering accuracy and biological interpretability.
MESH (Minimize Entropy of Sinkhorn) is an algorithmic framework for obtaining sparse and interpretable solutions to entropy-regularized optimal transport (OT) problems. While standard Sinkhorn solvers introduce a strong entropic penalty for computational efficiency and smoothness, MESH systematically minimizes the entropy of the resulting coupling by iteratively modifying either the cost matrix or the entropic regularization, achieving transport plans that are both computationally tractable and highly structured. MESH has been particularly impactful in computational biology, specifically in cross-species cell-type matching, and also connects to foundational theory in variational inference, optimal transport, and information geometry.
1. Entropy-Regularized Optimal Transport and the Sinkhorn Algorithm
Entropy-regularized OT seeks a coupling $P \in \mathbb{R}_{+}^{n \times m}$ between discrete distributions $a$ and $b$ that minimizes
$$\langle P, C \rangle - \varepsilon H(P), \qquad H(P) = -\sum_{ij} P_{ij} \log P_{ij},$$
subject to the marginal constraints $P\mathbf{1}_m = a$, $P^{\top}\mathbf{1}_n = b$, and $P \geq 0$ (Cuturi, 2013). The entropic penalty parameter $\varepsilon > 0$ ensures existence and uniqueness of the solution, as well as smoothness and computational tractability via the Sinkhorn–Knopp matrix scaling scheme
$$P^{\star} = \operatorname{diag}(u)\, K \operatorname{diag}(v), \qquad K = e^{-C/\varepsilon}.$$
This plan has entropy $H(P^{\star})$, which is generically high for non-negligible $\varepsilon$, causing the plan to be diffuse and poorly interpretable in matching tasks where sparse solutions are preferred (Qiao, 30 May 2025).
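The scaling scheme and the plan entropy above can be sketched in a few lines of NumPy (a minimal illustration; the function names and default parameters are ours, not from the cited papers):

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=100):
    """Sinkhorn-Knopp scaling for entropy-regularized OT.
    C: (n, m) cost matrix; a, b: marginal histograms summing to 1.
    Returns the coupling P* = diag(u) K diag(v) with K = exp(-C/eps)."""
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # match column marginals
        u = a / (K @ v)                # match row marginals
    return u[:, None] * K * v[None, :]

def coupling_entropy(P):
    """Shannon entropy H(P) = -sum_ij P_ij log P_ij (0 log 0 := 0)."""
    p = P[P > 0]
    return float(-np.sum(p * np.log(p)))

# Larger eps -> more diffuse (higher-entropy) plan:
C = np.array([[0.0, 1.0], [1.0, 0.0]])
a = b = np.array([0.5, 0.5])
H_smooth = coupling_entropy(sinkhorn(C, a, b, eps=1.0))
H_sharp = coupling_entropy(sinkhorn(C, a, b, eps=0.05))
```

The alternating `u`/`v` updates are exactly the matrix-scaling iterations: each pass projects the current plan onto one of the two marginal constraints.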
2. The MESH Paradigm: Minimizing Sinkhorn Entropy
The central innovation of MESH is to iteratively reduce the entropy of the Sinkhorn-optimal plan by modifying the OT input, typically the cost matrix $C$. Instead of passively accepting the high-entropy coupling induced by a fixed entropic penalty, MESH actively "tilts" $C$ so that the Sinkhorn solution concentrates mass on a small number of correspondences. The update at each iteration is
$$C \leftarrow C - \eta\, \frac{\nabla_C H(P^{\star}(C))}{\lVert \nabla_C H(P^{\star}(C)) \rVert},$$
with learning rate $\eta$, where $P^{\star}(C)$ is the current Sinkhorn plan given $C$ (Qiao, 30 May 2025). Each MESH iteration consists of:
- Computing $P^{\star}(C)$ via Sinkhorn,
- Calculating the entropy $H(P^{\star})$ and its gradient with respect to $C$,
- Normalizing the gradient and taking a descent step to further reduce entropy.
This yields, at convergence, a sparse $P^{\star}$ interpretable as an optimal matching under the adjusted cost $C$.
3. Variational and Geometric Foundations
MESH connects to continuous- and discrete-time mirror descent in the space of couplings with entropic regularization (Srinivasan et al., 14 Oct 2025). In continuous time, the Sinkhorn flow for the entropic OT functional corresponds to mirror descent minimizing the relative entropy $\mathrm{KL}(\pi \,\|\, K)$ on the convex set of couplings $\pi$ with fixed marginals, where $K \propto e^{-c/\varepsilon}$ is the reference Gibbs kernel. This flow possesses strong contraction properties in two natural (mirror-Hessian) metrics, and the entropy decays exponentially along the flow.
A positive spectral gap (Poincaré inequality) and exponential decay rate for entropy are guaranteed if a logarithmic Sobolev inequality (LSI) holds along the flow, with further implications for the design and stabilization of generative models (Srinivasan et al., 14 Oct 2025).
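The exponential rate follows from the standard entropy-dissipation argument (sketched here under the assumption that the flow's entropy production equals the relative Fisher information $I(\pi_t)$, as in classical LSI theory):
$$\frac{d}{dt}\, H(\pi_t) = -I(\pi_t) \;\leq\; -2\alpha\, H(\pi_t) \quad\Longrightarrow\quad H(\pi_t) \;\leq\; e^{-2\alpha t}\, H(\pi_0),$$
where $\alpha$ is the LSI constant and the implication is Grönwall's inequality.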
4. Algorithmic Procedure and Complexity
A standard discrete-time MESH cycle is as follows (Qiao, 30 May 2025):
- Initialize the cost matrix $C$ (possibly with small Gaussian noise).
- Repeat for $K$ steps:
- Compute the plan $P^{\star}(C)$ via Sinkhorn with entropic regularization $\varepsilon$.
- Evaluate the entropy $H(P^{\star})$ and compute $\nabla_C H$ (via autodiff).
- Take a normalized gradient step: $C \leftarrow C - \eta\, \nabla_C H / \lVert \nabla_C H \rVert$.
- Output $P^{\star}(C)$. Total computational complexity is $O(K\, T\, nm)$ for $n$ and $m$ types, where $T$ is the number of inner Sinkhorn iterations and $K$ the number of outer MESH steps. In typical biological datasets ($n$, $m$ on the order of tens), 4–5 MESH steps and 100 Sinkhorn iterations suffice for a stable, sparse solution.
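Putting the cycle together, a toy end-to-end MESH loop might look as follows. One hedge: the paper computes $\nabla_C H$ via automatic differentiation; this self-contained sketch substitutes a finite-difference approximation, which is only viable for very small problems.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=100):
    """Inner solver: Sinkhorn-Knopp matrix scaling."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def entropy(P):
    p = P[P > 0]
    return float(-np.sum(p * np.log(p)))

def mesh(C0, a, b, eps=0.5, n_outer=5, lr=0.2, fd=1e-5):
    """Outer MESH loop: normalized gradient descent on the entropy of
    the Sinkhorn plan with respect to the cost matrix. grad_C H is
    approximated by finite differences here (the paper uses autodiff),
    costing one Sinkhorn solve per cost entry per outer step."""
    C = C0.astype(float).copy()
    for _ in range(n_outer):
        H0 = entropy(sinkhorn(C, a, b, eps))
        G = np.zeros_like(C)
        for i in range(C.shape[0]):
            for j in range(C.shape[1]):
                Cp = C.copy()
                Cp[i, j] += fd
                G[i, j] = (entropy(sinkhorn(Cp, a, b, eps)) - H0) / fd
        C -= lr * G / (np.linalg.norm(G) + 1e-12)  # normalized step
    return sinkhorn(C, a, b, eps), C

# Example: the MESH plan is sharper than the plain Sinkhorn plan.
C0 = np.array([[0.0, 1.0], [1.0, 0.0]])
a = b = np.array([0.5, 0.5])
P_mesh, C_adj = mesh(C0, a, b)
```

In practice the finite-difference inner loops would be replaced by a single reverse-mode autodiff pass through the Sinkhorn iterations, restoring the $O(K\,T\,nm)$ cost stated above.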
For continuous distributions, a mirror-descent discretization with a suitable step size provides a related scheme, driving the dual potentials, and hence the plan, towards minimal entropy.
5. Practical Guidelines and Tuning
Effective use of MESH involves careful annealing of the entropic parameter $\varepsilon$:
- Initialize with a large $\varepsilon$ for smooth optimization and fast convergence.
- Gradually decrease $\varepsilon$ to sharpen the solution and reduce the entropy, stopping above the machine-precision noise floor.
- Warm-start Sinkhorn dual variables between outer loops to halve convergence time (Feydy et al., 2018).
- Monitor the entropy $H(P^{\star})$ and adjust $\varepsilon$ dynamically to maintain numerical stability and consistent gradient magnitudes (raising $\varepsilon$ if gradients become too noisy, lowering it if entropy remains too high).
Stopping criteria can be based on the rate of entropy decay, which under an LSI is bounded as
$$H(\pi_t) \leq e^{-2\alpha t}\, H(\pi_0),$$
where $\alpha$ is the LSI constant associated with the geometry or latent measure (Srinivasan et al., 14 Oct 2025).
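The annealing loop with warm-started duals can be sketched as follows (a minimal illustration: the log-domain warm start, carrying $f = \varepsilon \log u$ across levels, follows the standard Sinkhorn-scaling trick, and the schedule values and tolerance are arbitrary choices of ours):

```python
import numpy as np

def sinkhorn_scalings(C, a, b, eps, u=None, n_iter=50):
    """Sinkhorn scaling with an optional warm-start vector u."""
    K = np.exp(-C / eps)
    u = np.ones_like(a) if u is None else u
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :], u

def entropy(P):
    p = P[P > 0]
    return float(-np.sum(p * np.log(p)))

def annealed_sinkhorn(C, a, b, eps_schedule, tol=1e-3):
    """Decrease eps level by level, carrying the dual potential
    f = eps * log(u) across levels as a warm start. Stops early once
    the entropy decay between levels falls below tol."""
    f = np.zeros_like(a)
    H_prev = np.inf
    for eps in eps_schedule:
        P, u = sinkhorn_scalings(C, a, b, eps, u=np.exp(f / eps))
        f = eps * np.log(u)           # dual potential, eps-independent scale
        H = entropy(P)
        if H_prev - H < tol:          # entropy no longer decaying
            break
        H_prev = H
    return P

# Example: annealing sharpens the plan relative to the coarsest level.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
a = b = np.array([0.5, 0.5])
P = annealed_sinkhorn(C, a, b, [1.0, 0.5, 0.2, 0.1])
```

Because the potential $f$ rather than the raw scaling $u$ is reused, each new level starts close to its fixed point, which is what halves convergence time in the warm-started scheme cited above.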
6. Applications and Empirical Results
MESH has been especially influential in cross-species cell-type matching for evolutionary genomics. In these contexts, sparsity and interpretability of the transport plan are essential (Qiao, 30 May 2025):
- On the task of matching 12 macaque retinal bipolar types, standard Sinkhorn plans are nearly uniform (sparsity score $0$, entropy $4.69$), while OT-MESH yields a near-diagonal, sparse matching (entropy $2.4597$) with high clustering accuracy as measured by ARI.
- In broader tests (including mouse-macaque RGC matching), OT-MESH uncovers both well-established homologies and novel correspondences, some of which have been experimentally validated.
- Across benchmarks, MESH outperforms projection/classifier baselines, achieving lower entropy, greater sparsity, improved accuracy, and efficient runtimes.
MESH’s approach is compatible with large-scale GPU computation via memory-efficient Sinkhorn implementations (Feydy et al., 2018).
7. Theoretical Implications and Extensions
MESH plays a dual role as both an optimization meta-algorithm and a foundation for statistical machine learning:
- By iteratively reshaping the cost or the plan, MESH bridges the geometric bias of Maximum Mean Discrepancy (MMD) metrics and the combinatorial sharpness of OT, with the Sinkhorn divergence interpolating between these regimes (Feydy et al., 2018).
- The entropy-minimization logic underlies recent advances in scalable generative modeling, Schrödinger bridges, and OT-based GANs, where latent-space log-Sobolev structure speeds up inner Sinkhorn iterations and improves training stability (Srinivasan et al., 14 Oct 2025).
- MESH reframes the use of entropic regularization as a tunable bias/variance control instead of a mathematical necessity, allowing practitioners to trade computational efficiency for interpretability and biological plausibility.
In summary, Minimize Entropy of Sinkhorn (MESH) provides a flexible and theoretically grounded approach for producing sparse and interpretable solutions in entropy-regularized OT problems, with algorithmic, statistical, and practical advantages in diverse applications ranging from genomics to generative modeling (Qiao, 30 May 2025, Cuturi, 2013, Srinivasan et al., 14 Oct 2025, Feydy et al., 2018).