Entropy-Regularized Linear Programming
- The paper introduces an entropy-regularized LP that adds a negative Shannon penalty to smooth the feasible region and enforce strict convexity.
- It establishes exponential convergence to LP optima with non-asymptotic error bounds, proved via weak-convexity properties of entropy and realized algorithmically through the Sinkhorn iteration.
- The approach underpins scalable algorithms for optimal transport and machine learning, balancing computational efficiency with trade-offs between accuracy and runtime.
An entropy-regularized linear programming approach augments a standard linear program (LP) with a negative Shannon entropy penalty. This method smooths the polyhedral feasible region, leads to strictly convex objectives, and critically underpins the scalability of algorithms for optimal transport and related large-scale optimization in machine learning. At its core, an entropy penalty enables exponentially fast, quantifiable convergence to LP optima while admitting algorithmic strategies (e.g., Sinkhorn iteration) with favorable computational and parallelization properties. This framework also provides non-asymptotic explicit error bounds, elucidates sharp trade-offs between accuracy and computational effort, and demonstrates fundamental limits on the achievable complexity for certain combinatorial LPs such as the assignment problem (Weed, 2018).
1. Classical Linear Programs and Entropic Penalties
A standard LP in minimization form is
$$\min_{x \in P} \; \langle c, x \rangle,$$
where the polytope $P \subseteq \mathbb{R}^d_{\ge 0}$ is assumed bounded and nonempty, and $c$ is not constant on $P$. The entropy-regularized variant introduces a negative Shannon entropy penalty,
$$H(x) = -\sum_{i=1}^{d} x_i \log x_i,$$
with regularization parameter $\eta > 0$, transforming the objective to
$$\min_{x \in P} \; \langle c, x \rangle - \eta^{-1} H(x).$$
As $\eta \to \infty$, the penalty vanishes, recovering the original LP. For moderate $\eta$, strong convexity of the entropic term facilitates efficient algorithms, notably the Sinkhorn method.
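On the probability simplex the penalized program has a closed-form solution, which makes the smoothing effect concrete: stationarity of the Lagrangian for $\langle c, x\rangle - \eta^{-1}H(x)$ gives $x_i \propto e^{-\eta c_i}$, a softmax. The sketch below (illustrative helper, not from the paper) shows the entropic value approaching the LP optimum as $\eta$ grows.

```python
import math

def entropic_lp_simplex(c, eta):
    """Minimizer of <c,x> - (1/eta)*H(x) over the probability simplex.

    Setting the Lagrangian gradient to zero gives x_i proportional to
    exp(-eta*c_i), i.e. a softmax; shifting by min(c) avoids overflow.
    (Illustrative helper, not from the paper.)"""
    w = [math.exp(-eta * (ci - min(c))) for ci in c]
    z = sum(w)
    return [wi / z for wi in w]

c = [0.3, 0.1, 0.7]  # LP optimum over the simplex is min(c) = 0.1
for eta in (1.0, 10.0, 100.0):
    x = entropic_lp_simplex(c, eta)
    val = sum(ci * xi for ci, xi in zip(c, x))
    print(f"eta={eta:6.1f}  <c,x_eta>={val:.4f}")
```

As $\eta$ increases, the softmax concentrates on the minimizing vertex and the regularized value converges to $\min_i c_i$.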
2. Quantitative Error Bounds and Exponential Convergence
Let $V$ denote the vertex set of $P$ and $V^{\star} \subseteq V$ the set of vertices attaining the LP optimum $\mathrm{OPT} = \min_{x \in P} \langle c, x \rangle$. To relate the entropic and original optima, define the suboptimality gap $\Delta := \min_{v \in V \setminus V^{\star}} \langle c, v \rangle - \mathrm{OPT}$, the $\ell_1$-radius $R_1 := \max_{v \in V} \|v\|_1$, and the entropic radius $R_H := \max_{x, x' \in P} \left( H(x) - H(x') \right)$.
A non-asymptotic convergence theorem establishes an exponential bound of the form
$$0 \;\le\; \langle c, x^{\star}_{\eta} \rangle - \mathrm{OPT} \;\lesssim\; e^{-\eta \Delta},$$
valid for any LP, where $x^{\star}_{\eta}$ denotes the entropic optimizer and the implied constant depends explicitly on $R_1$ and $R_H$. Explicitly, once $\eta$ exceeds a threshold of order $R_H/\Delta$, the suboptimality gap decays at the rate $e^{-\eta \Delta}$ up to such explicit factors.
The proof decomposes $x^{\star}_{\eta}$ as a convex combination of optimal and suboptimal vertices and uses weak convexity properties of entropy. Notably, the exponential rate is optimal and matching lower bounds exist: for a rescaled simplex with suboptimality gap $\Delta$, the rate $e^{-\eta \Delta}$ is tight up to constants. No improvement is possible in the dependence on $\Delta$, $R_1$, and $R_H$ (Weed, 2018).
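The exponential rate is easy to observe numerically on the simplex, where the entropic minimizer is an explicit softmax: successive suboptimality gaps shrink by a factor approaching $e^{-\Delta}$. A small check (illustrative, not from the paper):

```python
import math

def entropic_value(c, eta):
    # Entropic minimizer over the simplex is the softmax of -eta*c,
    # so the primal value <c, x_eta> is available in closed form.
    w = [math.exp(-eta * (ci - min(c))) for ci in c]
    z = sum(w)
    return sum(ci * wi / z for ci, wi in zip(c, w))

c = [0.1, 0.3, 0.7]          # OPT = 0.1, suboptimality gap Delta = 0.2
gaps = [entropic_value(c, eta) - 0.1 for eta in (30.0, 31.0)]
ratio = gaps[1] / gaps[0]
print(ratio, math.exp(-0.2))  # successive gaps shrink by roughly exp(-Delta)
```

The measured decay factor matches $e^{-\Delta}$ closely once $\eta$ is past the threshold regime, consistent with the tightness of the rate.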
3. Limitation: Assignment Problem and Complexity Barriers
Consider the assignment (minimum-cost perfect matching) LP over the Birkhoff polytope $B_n$ of doubly stochastic matrices:
$$\min_{X \in B_n} \langle C, X \rangle, \qquad B_n := \{ X \in \mathbb{R}^{n \times n}_{\ge 0} : X\mathbf{1} = \mathbf{1},\; X^{\top}\mathbf{1} = \mathbf{1} \}.$$
The Birkhoff polytope then has $R_1 = n$ and entropic radius $R_H = n \log n$. The exponential convergence theorem implies that to reach $\varepsilon$-objective accuracy, one must set
$$\eta \;\gtrsim\; \frac{n \log n}{\varepsilon}.$$
Sinkhorn-type algorithms require runtime growing linearly in $\eta$ per run (on the order of $\widetilde{O}(n^2 \eta)$), resulting in a total complexity of order $\widetilde{O}(n^3/\varepsilon)$, which precludes the existence of a near-linear time ($\widetilde{O}(n^2)$) approximation scheme for the assignment problem by entropy-regularized means alone. Furthermore, if $\eta$ falls short of this scaling, recovery of even a constant-factor approximate assignment is impossible (Weed, 2018).
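A minimal Sinkhorn sketch for the assignment LP illustrates the mechanics and the role of $\eta$: the scaled matrix $\mathrm{diag}(u)\, e^{-\eta C}\, \mathrm{diag}(v)$ approaches a permutation matrix only once $\eta$ is large relative to the suboptimality gap. The helper names and the brute-force comparison below are illustrative, not from the paper.

```python
import itertools
import math

def sinkhorn_assignment(C, eta, iters=5000):
    # Entropic penalty: minimize <C,X> - H(X)/eta over doubly stochastic X.
    # The minimizer has the form diag(u) K diag(v) with K = exp(-eta*C);
    # u and v are found by alternately matching row/column sums to 1.
    n = len(C)
    K = [[math.exp(-eta * C[i][j]) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(iters):
        u = [1.0 / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [1.0 / sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(n)]

C = [[0.9, 0.1, 0.4], [0.3, 0.8, 0.2], [0.5, 0.6, 0.1]]
X = sinkhorn_assignment(C, eta=30.0)
# Round to a permutation by taking the largest entry in each row,
# and compare against the exact matching found by brute force.
perm = tuple(max(range(3), key=lambda j: X[i][j]) for i in range(3))
best = min(itertools.permutations(range(3)),
           key=lambda p: sum(C[i][p[i]] for i in range(3)))
print(perm, best)
```

Here $\eta = 30$ comfortably exceeds the instance's gap threshold, so row-wise rounding recovers the optimal matching; for small $\eta$ the iterate stays far from any vertex.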
4. Methodological Components and Key Lemmas
The analysis leverages several fundamental properties of the entropy function. Key results include:
- Weak convexity: For any nonnegative vectors $a, b$ with $\|a\|_1, \|b\|_1 \le R$ and any $\lambda \in [0,1]$,
$$H(\lambda a + (1-\lambda) b) \;\le\; \lambda H(a) + (1-\lambda) H(b) + R\, h(\lambda),$$
with $h(\lambda) := -\lambda \log \lambda - (1-\lambda) \log (1-\lambda)$ the binary entropy.
- Monotonicity and scalar bounds on the binary entropy facilitate the derivation of sharp fixed-point bounds for the convex combination weights in the optimality analysis.
- The analysis holds uniformly for arbitrary LPs, not just for specific instances such as transport polytopes.
These structural insights underpin both the exponential convergence rate and the necessity for large regularization in combinatorially complex LPs.
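The weak-convexity inequality can be spot-checked numerically. The sketch below (an illustrative check, with $R$ taken as the larger $\ell_1$ norm) evaluates both sides on random nonnegative vectors:

```python
import math
import random

def H(x):
    # Shannon entropy extended to nonnegative vectors (0 log 0 := 0).
    return -sum(xi * math.log(xi) for xi in x if xi > 0)

def h(lam):
    # Binary entropy of the mixing weight.
    return H([lam, 1.0 - lam])

random.seed(0)
for _ in range(1000):
    d = random.randint(2, 6)
    a = [random.random() for _ in range(d)]
    b = [random.random() for _ in range(d)]
    lam = random.random()
    R = max(sum(a), sum(b))
    m = [lam * ai + (1 - lam) * bi for ai, bi in zip(a, b)]
    # Weak convexity: mixture entropy exceeds the average entropy
    # by at most R times the binary entropy of the weights.
    assert H(m) <= lam * H(a) + (1 - lam) * H(b) + R * h(lam) + 1e-12
print("weak-convexity inequality verified on 1000 random instances")
```

Equality is approached when $a$ and $b$ have disjoint supports and equal $\ell_1$ norms, which is exactly the regime exploited in the vertex-decomposition argument.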
5. Practical Implications for Machine Learning and Optimal Transport
In large-scale optimal transport (OT) and machine learning, entropy regularization via the Sinkhorn algorithm is widely adopted for computational expedience on GPUs and other parallel hardware. However, $\varepsilon$-objective accuracy requires
$$\eta \;\gtrsim\; \frac{R_H}{\varepsilon}.$$
For OT problems between distributions on $n$ points (couplings normalized to unit mass), typically $R_H = \Theta(\log n)$, whence $\eta \gtrsim \varepsilon^{-1} \log n$ is required. Small $\eta$ yields fast but biased solutions, while increasing $\eta$ reduces the bias at the cost of commensurately more expensive Sinkhorn runs. There is a trade-off: computationally efficient but approximate solutions when $\eta$ is modest, and high-precision solutions only at high computational cost. In coarse ML applications where approximate distances suffice, moderate entropic regularization is typically acceptable. For exact recovery or fine-grained OT, the exponential rate in $\eta \Delta$ governs the achievable bias (Weed, 2018).
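For uniform marginals the exact OT value equals the optimal assignment cost divided by $n$, which makes the bias-versus-$\eta$ trade-off easy to measure on a toy instance (illustrative sketch; `entropic_ot` is a hypothetical helper):

```python
import itertools
import math

def entropic_ot(C, eta, iters=5000):
    # Sinkhorn scaling for entropic OT with uniform marginals:
    # minimize <C,X> - H(X)/eta over couplings with row/col sums 1/n,
    # then report the primal cost <C, X_eta>.
    n = len(C)
    K = [[math.exp(-eta * C[i][j]) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [(1.0 / n) / sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j]
               for i in range(n) for j in range(n))

C = [[0.9, 0.1, 0.4, 0.7], [0.3, 0.8, 0.2, 0.6],
     [0.5, 0.6, 0.1, 0.9], [0.2, 0.4, 0.7, 0.3]]
n = 4
# With uniform marginals the exact OT value is the best assignment cost / n.
opt = min(sum(C[i][p[i]] for i in range(n))
          for p in itertools.permutations(range(n))) / n
gaps = []
for eta in (2.0, 10.0, 50.0):
    gap = entropic_ot(C, eta) - opt
    gaps.append(gap)
    print(f"eta={eta:5.1f}  bias={gap:.5f}")
```

The bias is nonnegative, shrinks monotonically in $\eta$, and stays within the coarse $R_H/\eta$ envelope, matching the trade-off described above.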
A critical lesson is that the entropic radius ($R_H$) and the data-dependent suboptimality gap ($\Delta$) jointly determine the optimal choice of $\eta$. Future work aims to precisely estimate the spectrum of near-optimal vertices (the distribution of the values $\langle c, v \rangle$ and entropies $H(v)$ across vertices $v$) for adaptive parameter tuning.
6. Summary and Outlook
The entropy-regularized LP approach yields exponentially fast, fully explicit convergence to LP optima across arbitrary problem instances, supported by sharp upper and lower bounds. There are foundational limitations for combinatorial LPs: for instance, the assignment problem cannot be solved in near-linear time by entropic smoothing alone. Nonetheless, the method underpins scalable algorithms for large-scale OT and machine learning, where practical trade-offs between accuracy and runtime must be balanced by tuning the regularization parameter $\eta$ in light of intrinsic geometric characteristics of the LP feasible region (Weed, 2018).