Projected Sinkhorn Iterations
- Projected Sinkhorn iterations are iterative algorithms for constrained entropy-regularized optimal transport that combine fixed-point scaling updates with explicit projection steps.
- They integrate hybrid methods like block-coordinate ascent, Newton refinement, and proximal operators to enforce feasibility on constraint sets such as Wasserstein balls or marginal polytopes.
- Advances include detailed convergence analysis, phase transition behavior with sparsity, and applications in inverse problems, robust optimization, and generative modeling.
Projected Sinkhorn iterations are iterative algorithms for entropy-regularized optimal transport and related projection problems that combine Sinkhorn-like fixed-point updates with explicit projection steps onto constraint sets, such as Wasserstein balls, marginal polytopes, or other sets defined by convex or nonlinear constraints. These methods arise naturally in contemporary developments linking optimal transport, matrix scaling, mirror descent in the space of probability measures, and continuous-time projection flows, as well as in scalable algorithms for regularized optimal transport under application-specific or robust constraints.
1. Theoretical Foundations and General Framework
Projected Sinkhorn iterations are motivated by the need to efficiently solve entropy-regularized optimal transport (OT) problems under constraints that may include not just fixed marginals but also geometric, statistical, or application-driven restrictions. The classical entropy-regularized OT problem between marginals $a$ and $b$ with cost matrix $C$ seeks
$$\min_{P \in \Pi(a,b)} \; \langle C, P \rangle - \varepsilon H(P), \qquad \Pi(a,b) = \{P \ge 0 : P\mathbf{1} = a,\; P^\top \mathbf{1} = b\},$$
where $\varepsilon > 0$ controls the regularization and $H(P) = -\sum_{ij} P_{ij}(\log P_{ij} - 1)$ denotes the entropy of the plan. In projected Sinkhorn, the essential innovation is the introduction of a further projection, either in the primal variable $P$ (the transport plan) or in dual variables (potentials, couplings, or scalings), onto a nontrivial feasible set, such as a Wasserstein ball or a polytope defined by data-dependent side information.
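In schematic form, with $\mathcal{K}$ denoting a generic additional feasible set (a Wasserstein ball, a support or capacity constraint, and so on), the problems targeted by projected variants can be abstracted as
$$\min_{P \in \Pi(a,b)\,\cap\,\mathcal{K}} \; \langle C, P \rangle - \varepsilon H(P),$$
where the notation follows the unconstrained formulation above; this display is a schematic abstraction rather than the exact formulation of any single cited work.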
Prominent algorithms include:
- Block-coordinate ascent on the dual problem with projection steps onto constrained sets.
- Modified Sinkhorn iterations where matrix updates include projection operators, or projections are enacted via Bregman proximal mappings.
- Hybrid frameworks coupling Sinkhorn matrix scaling with Newton or proximal steps, with a sparsification and projection structure to maintain computational tractability (Tang et al., 20 Jan 2024, Wu et al., 6 Feb 2025).
- Discretized and continuous-time flow formulations of mirror descent with built-in constraints, yielding dynamics that remain within the feasibility region throughout the evolution (Karimi et al., 2023, Deb et al., 2023).
2. Algorithmic Structure and Update Mechanisms
The canonical projected Sinkhorn iteration alternates between standard Sinkhorn scaling (normalizing the marginals) and projection or proximal steps that maintain satisfaction of the side constraints. Typical steps are as follows (a minimal code sketch appears after this list):
- Sinkhorn update: For a transport plan of the form $P = \operatorname{diag}(u)\, K \,\operatorname{diag}(v)$ with Gibbs kernel $K = e^{-C/\varepsilon}$, apply the alternating scaling updates $u \leftarrow a \oslash (Kv)$, $v \leftarrow b \oslash (K^\top u)$, where $\oslash$ denotes elementwise division.
- Projection step: After each Sinkhorn update or at selected intervals, project the plan $P$ (or the dual variables) back onto a constraint set. Examples include:
- Projection onto a Wasserstein ball $\mathcal{B}_{\mathcal{W}}(x,\epsilon) = \{z : \mathcal{W}(x,z) \le \epsilon\}$ (e.g., in adversarial robustness, accomplished by a dual ascent involving a Lagrange multiplier for the cost constraint and auxiliary variables for barycenter projections (Wong et al., 2019)).
- Projection onto sparse or structured supports (enforced via randomized sparsification and reweighting (Li et al., 2023)).
- Bregman proximal steps with respect to entropy or other convex divergences, yielding algorithms such as PINS, which alternate between log-domain Sinkhorn steps and sparsified Newton refinement (Wu et al., 6 Feb 2025).
- Hybrid or split updates: In some frameworks, the updates involve additional split-operator steps (e.g., Douglas–Rachford, ADMM, or primal-dual splitting) where each term in the objective, including OT cost and all constraints, is handled via its own proximal or projection mapping (Karlsson et al., 2016).
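To make the alternation concrete, the following is a minimal plan-level sketch in NumPy. It assumes a user-supplied projection operator `project_plan` onto the side-constraint set and is a schematic abstraction, not the algorithm of any specific cited work; names such as `project_plan` are illustrative.

```python
import numpy as np

def projected_sinkhorn(C, a, b, project_plan, eps=0.1, n_iter=500, tol=1e-9):
    """Schematic projected Sinkhorn: alternate marginal scalings of the transport
    plan with a user-supplied projection onto an additional constraint set."""
    P = np.exp(-C / eps)                      # initialize from the Gibbs kernel
    P /= P.sum()
    for _ in range(n_iter):
        P *= (a / P.sum(axis=1))[:, None]     # row scaling: match marginal a
        P *= (b / P.sum(axis=0))[None, :]     # column scaling: match marginal b
        P = project_plan(P)                   # projection step for side constraints
        err = max(np.abs(P.sum(axis=1) - a).max(),
                  np.abs(P.sum(axis=0) - b).max())
        if err < tol:
            break
    return P

# Toy usage: a "projection" that zeroes negligible entries to encourage sparse support.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = rng.random((6, 6))
    a = np.full(6, 1 / 6)
    b = np.full(6, 1 / 6)
    P = projected_sinkhorn(C, a, b, project_plan=lambda P: np.where(P > 1e-6, P, 0.0))
    print(P.sum(axis=1), P.sum(axis=0))
```

With `project_plan` as the identity, the loop reduces to plan-level Sinkhorn (alternating KL projections onto the two marginal sets); nontrivial constraints generally require the projection to be taken in the KL geometry, or handled via the dual-variable formulations described above, for the convergence guarantees discussed in Section 3 to apply.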
3. Convergence, Complexity, and Phase Transitions
The convergence behavior of projected Sinkhorn iterations depends crucially on the geometry and density of the constraint set:
- Exponential convergence is guaranteed in dense settings for unconstrained or regular projection problems, provided regularization and conditioning are controlled (Conforti et al., 2023, Berman, 2017). For instance, when the scaling matrix is sufficiently dense, convergence to approximate double stochasticity occurs within a number of iterations, and hence a total running time, that is optimal and matches the time to perform a single matrix-matrix operation (He, 13 Jul 2025).
- Worst-case lower bounds appear for sparser or badly conditioned matrices, for which the number of iterations needed to reach a target error can be substantially larger; this illustrates a phase transition in the Sinkhorn-Knopp algorithm at a critical matrix density (He, 13 Jul 2025). This phenomenon persists even under projection, unless additional structure is exploited.
- Continuous-time and mirror descent formulations: Recent analysis has demonstrated that as the regularization parameter $\varepsilon \to 0$, with the iteration count scaled inversely in $\varepsilon$, the projected Sinkhorn trajectory converges to a continuous Sinkhorn flow that solves a parabolic Monge–Ampère PDE or a Wasserstein mirror gradient flow (Deb et al., 2023, Karimi et al., 2023).
- Sample-based and online settings: Projected Sinkhorn-type methods can be adapted to streaming data, with sample complexity guarantees for regularized OT estimation (Mensch et al., 2020).
4. Applications in Inverse Problems, Robust Optimization, and Machine Learning
Projected Sinkhorn iterations are deployed in a wide array of applications where side constraints or geometric regularization are critical:
- Inverse problems: Integration with splitting frameworks enables the solution of tasks such as limited-angle tomography, where the OT term enforces geometric similarity to a prior and projections ensure data fidelity and regularization constraints are enforced (Karlsson et al., 2016).
- Adversarial robustness: Projected Sinkhorn is used to compute Wasserstein projections in adversarial image attacks, yielding attacks that better model natural invariances than norm balls (Wong et al., 2019). The projection is realized by Sinkhorn-like updates in dual variables with Newton-type corrections and local transport plans for efficiency.
- Generative modeling: Differentiable Sinkhorn divergences, with or without projection, serve as losses for training generative adversarial networks where the projection step may enforce support, sparsity, or metric constraints to reflect domain-specific geometric or statistical structure (Genevay et al., 2017, Scetbon et al., 2020).
- Barycenters and grid-free estimation: Damped (projected) Sinkhorn iterations permit the computation of doubly entropic Wasserstein barycenters, including robust and debiased variants, with global convergence guarantees for both discrete and free-support measures (Chizat et al., 2023).
5. Acceleration Techniques, Sparsification, and Hybrid Methods
Recent work accelerates projected Sinkhorn iterations using Newton-type refinement, sparsification, and proximal operators:
- Sparse Newton acceleration: The Sinkhorn-Newton-Sparse (SNS) approach uses early stopping in standard scaling updates followed by a sparsified Newton method acting on a Lyapunov (dual) potential. The sparsity of the Hessian is exploited so that the per-iteration cost matches that of Sinkhorn (i.e., $O(n^2)$ for an $n \times n$ cost matrix), with dramatically reduced iteration counts and rapid super-exponential local convergence (Tang et al., 20 Jan 2024); a schematic two-stage sketch appears after this list.
- Sparsification via importance sampling: To maintain computational efficiency, only the most informative matrix entries (as determined by natural upper bounds) are sampled and rescaled, with theoretical consistency guarantees (Li et al., 2023).
- Proximal point and splitting extensions: The PINS framework embeds projected Sinkhorn as the inner loop of an entropic proximal point method, employing Newton refinement with sparse Hessians within each subproblem, resulting in global convergence, better accuracy, and increased robustness to regularization parameter choice (Wu et al., 6 Feb 2025).
- Continuous-time stochastic flows: The Sinkhorn flow admits a McKean–Vlasov diffusion representation; projections are respected through the splitting structure of the drift and noise, expressed in dual coordinates determined by the Hessian of the OT potential (Deb et al., 2023).
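As a rough illustration of the two-stage structure behind such hybrids, the sketch below warm-starts with Sinkhorn sweeps on the dual potentials and then takes Newton steps on the entropic dual objective with a thresholded (sparsified) Hessian. It relies only on the standard entropic-OT dual formulas; the threshold `thresh`, the ridge term, and the stage lengths are illustrative assumptions, and this is not the reference SNS or PINS implementation.

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix, diags
from scipy.sparse.linalg import spsolve

def sinkhorn_newton_sparse(C, a, b, eps=0.05, n_sinkhorn=20, n_newton=10, thresh=1e-8):
    """Didactic two-stage scheme: Sinkhorn warm start, then sparsified Newton
    steps on the concave entropic-OT dual in the potentials (f, g)."""
    n, m = C.shape
    f, g = np.zeros(n), np.zeros(m)

    def plan(f, g):
        # Entropic primal-dual relation: P_ij = exp((f_i + g_j - C_ij) / eps).
        return np.exp((f[:, None] + g[None, :] - C) / eps)

    # Stage 1: Sinkhorn sweeps, written as additive updates on the potentials.
    for _ in range(n_sinkhorn):
        f += eps * (np.log(a) - np.log(plan(f, g).sum(axis=1)))
        g += eps * (np.log(b) - np.log(plan(f, g).sum(axis=0)))

    # Stage 2: Newton steps on the dual; small plan entries are dropped so the
    # Hessian solve stays sparse.
    for _ in range(n_newton):
        P = plan(f, g)
        grad = np.concatenate([a - P.sum(axis=1), b - P.sum(axis=0)])
        P_sp = csr_matrix(np.where(P > thresh, P, 0.0))
        H = bmat([[diags(np.asarray(P_sp.sum(axis=1)).ravel()), P_sp],
                  [P_sp.T, diags(np.asarray(P_sp.sum(axis=0)).ravel())]]) / eps
        # Ridge regularization: the dual Hessian is singular along constant shifts.
        delta = spsolve((H + 1e-10 * diags(np.ones(n + m))).tocsc(), grad)
        f += delta[:n]
        g += delta[n:]
    return plan(f, g)
```

The Newton stage solves $(\nabla^2 \Phi)\,\delta = -\nabla\Phi$ for the concave dual $\Phi(f,g) = \langle f, a\rangle + \langle g, b\rangle - \varepsilon \sum_{ij} e^{(f_i + g_j - C_{ij})/\varepsilon}$, so the ascent direction is computed from the (sparsified) plan itself; this is the sense in which Hessian sparsity keeps the per-iteration cost comparable to Sinkhorn.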
6. Interpretations, Generalizations, and Limitations
Projected Sinkhorn iterations admit several interpretations:
- Mirror descent: These iterations implement a mirror gradient method in Wasserstein space, where each projection ensures that the flow remains within a constraint feasible set; the mirror map is defined via the squared Wasserstein distance or relative entropy (Karimi et al., 2023, Deb et al., 2023).
- Dynamic perspective: The discrete iterations approximate a nonlinear parabolic PDE (a parabolic Monge–Ampère or gradient flow equation) that, in the scaling limit, captures the time evolution of the OT potentials and projected marginals (Berman, 2017, Deb et al., 2023).
- Bregman projections: Each Sinkhorn step can be viewed as a Bregman projection (in the KL divergence) onto the polytope of joint measures with prescribed marginals or other constraints, which extends naturally to composite or more general projection sets (Karlsson et al., 2016); see the schematic display following this list.
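For concreteness, the Bregman-projection viewpoint in the discrete setting can be written schematically as follows, with $\mathcal{K}$ a generic additional constraint set (an illustrative abstraction rather than the formulation of a specific cited paper):
$$\begin{aligned}
\mathcal{C}_1 &= \{P \ge 0 : P\mathbf{1} = a\}, \qquad \mathcal{C}_2 = \{P \ge 0 : P^\top \mathbf{1} = b\},\\
P^{k+1/2} &= \operatorname*{arg\,min}_{P \in \mathcal{C}_1} \operatorname{KL}(P \,\|\, P^{k}), \qquad
P^{k+1} = \operatorname*{arg\,min}_{P \in \mathcal{C}_2} \operatorname{KL}(P \,\|\, P^{k+1/2}),\\
\tilde P^{k+1} &= \operatorname*{arg\,min}_{P \in \mathcal{K}} \operatorname{KL}(P \,\|\, P^{k+1}) \quad \text{(projection step)}.
\end{aligned}$$
The first two projections reproduce the row and column scalings of Sinkhorn; when $\mathcal{K}$ is affine the cyclic scheme converges to the KL projection onto the intersection, whereas for general convex $\mathcal{K}$ a Dykstra-type correction is typically required.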
However, the efficacy and convergence rates depend sensitively on problem structure, regularization, and the density of the underlying cost matrix or marginal support. For sparse matrices or highly constrained feasible regions, projected Sinkhorn methods may experience slower convergence unless additional algorithmic structure is introduced (He, 13 Jul 2025). Furthermore, practical acceleration via sparsification and Newton-type steps requires careful control of thresholding and step-size schedules to avoid numerical instability.
In summary, projected Sinkhorn iterations constitute a broad and flexible family of projection-enhanced, entropy-regularized optimal transport algorithms that include classical matrix scaling, deep learning solvers, robust optimization flows, and hybrid Newton–Sinkhorn schemes. Theoretical advances clarify their convergence, scaling, and phase transition behavior as a function of matrix density and regularization, while algorithmic innovations enable efficient deployment in large-scale and constrained OT applications.