LSEMINK Algorithm for Log-Sum-Exp Minimization

Updated 8 March 2026

The LSEMINK algorithm is a modified Newton–Krylov method that minimizes the log-sum-exp function, using Hessian regularization to achieve rapid and robust convergence.
It employs a Krylov subspace strategy with matrix-free operations, which makes it scalable to large problems and effective on ill-conditioned data.
Empirical results indicate that LSEMINK reduces the objective faster than traditional methods in applications such as multinomial logistic regression and geometric programming.

The LSEMINK algorithm is a modified Newton–Krylov method designed for efficient and robust minimization of the log-sum-exp function for a linear model, as encountered in geometric programming and multinomial logistic regression. The central innovation of LSEMINK is a Hessian regularization in the row space of the model, which yields rapid and stable convergence in situations where standard Newton methods may fail because their quadratic models are unbounded below. LSEMINK requires only matrix-vector operations with the data matrix, making it well suited to large-scale, matrix-free environments and to problems with potentially ill-conditioned Hessians (Kan et al., 2023).

1. Problem Structure and Mathematical Foundations

LSEMINK addresses the unconstrained convex minimization problem

$\min_{x\in\mathbb R^n} f(x)\;=\;\log\Bigl\{\sum_{i=1}^m\exp(a_i^T x)\Bigr\}$

where $A=[a_1,\dots,a_m]^T\in\mathbb R^{m\times n}$ is the data matrix and $x\in\mathbb R^n$ is the parameter vector. Common applications include geometric programming and multinomial logistic regression, where this objective arises as a smoothed convex surrogate for maximum-type losses.

The function's gradient admits a closed-form expression:

$\nabla f(x) = A^T p(x), \quad p_i(x) = \frac{\exp(a_i^T x)}{\sum_{j=1}^m \exp(a_j^T x)}$

The Hessian is

$\nabla^2 f(x) = A^T \Lambda(x) A, \quad \Lambda(x) = \mathrm{diag}(p(x)) - p(x) p(x)^T$

which is positive semidefinite but may be singular or nearly singular, for example when $p(x)$ concentrates on a few entries. In that case the local quadratic model can be unbounded below along directions of zero curvature.
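
The formulas above translate directly into matrix-free code. The following is a minimal NumPy sketch, not the authors' implementation: the function names `lse_value_grad` and `lse_hessvec`, the max-shift used for numerical stability, and the random test data are all assumptions made for illustration.

```python
import numpy as np

def lse_value_grad(A, x):
    """Return f(x) = log(sum_i exp(a_i^T x)), its gradient A^T p(x), and p(x)."""
    z = A @ x
    z_max = np.max(z)            # shift by the maximum to avoid overflow in exp
    w = np.exp(z - z_max)
    s = np.sum(w)
    p = w / s                    # softmax probabilities p(x)
    f = z_max + np.log(s)
    return f, A.T @ p, p

def lse_hessvec(A, p, v):
    """Apply the Hessian A^T (diag(p) - p p^T) A to v without forming it."""
    Av = A @ v
    return A.T @ (p * Av - p * (p @ Av))

# Small illustrative problem (random data, not taken from the source).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
f, g, p = lse_value_grad(A, x)
Hv = lse_hessvec(A, p, g)        # Hessian applied to the gradient direction
```

Only products with $A$ and $A^T$ appear, so the same functions work when `A` is a sparse matrix or any object that supports `@` with vectors.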

2. Modified Newton–Krylov Framework

The LSEMINK algorithm modifies the standard Newton update by regularizing the Hessian:

$H_{\mathrm{mod}}(x_k) = \nabla^2 f(x_k) + \beta_k A^T A = A^T [\Lambda(x_k) + \beta_k I_m] A = A^T S_k A$

for some shift parameter $\beta_k > 0$, where $S_k = \Lambda(x_k) + \beta_k I_m \succ 0$. This ensures that the quadratic model

$q_{\mathrm{mod},x_k}(d) = f(x_k) + \nabla f(x_k)^T d + \frac{1}{2} d^T H_{\mathrm{mod}}(x_k) d$

is bounded below on the row space of $A$, the only subspace in which the model varies.

The search direction $d_k$ is defined by the solution of the modified Newton system:

$H_{\mathrm{mod}}(x_k) d_k = -\nabla f(x_k)$

Because $\nabla f(x_k) = A^T p(x_k)$ lies in the row space of $A$, the system is consistent, and its minimum-norm solution also lies in that row space, which keeps the step bounded.
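
A short derivation, using only the definitions above, indicates why the regularized model is well behaved. Since $\Lambda(x_k) \succeq 0$, the matrix $S_k = \Lambda(x_k) + \beta_k I_m$ satisfies $S_k \succeq \beta_k I_m$, so for every $d$

$d^T H_{\mathrm{mod}}(x_k)\, d = (A d)^T S_k (A d) \;\geq\; \beta_k \|A d\|_2^2$

Writing $d = d_R + d_N$ with $d_R \in \mathrm{Row}(A)$ and $d_N \in \mathrm{Null}(A)$ gives $A d = A d_R$ and $\nabla f(x_k)^T d = p(x_k)^T A d = \nabla f(x_k)^T d_R$, hence

$q_{\mathrm{mod},x_k}(d) = q_{\mathrm{mod},x_k}(d_R)$

The model therefore ignores the null-space component and is strictly convex in $d_R$, because $\|A d_R\|_2 > 0$ whenever $d_R \neq 0$.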

3. Krylov Subspace Strategy and Algorithmic Workflow

LSEMINK applies the Conjugate Gradient (CG) method to compute $d_k$ in a matrix-free fashion, relying only on (potentially efficient) matrix-vector multiplications with $A$ and $A^T$. Each CG iteration applies $H_{\mathrm{mod}}$ once, which costs one product with $A$ and one with $A^T$, plus inexpensive vector operations.
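
The CG solve can be sketched as follows. This is a textbook conjugate gradient loop written for illustration under stated assumptions, not the reference implementation: the names `modified_hessvec` and `cg`, the relative-residual stopping rule, and the random data are hypothetical choices.

```python
import numpy as np

def modified_hessvec(A, p, beta, v):
    """Apply H_mod = A^T (diag(p) - p p^T + beta I) A to v, matrix-free."""
    Av = A @ v                                        # one product with A
    return A.T @ (p * Av - p * (p @ Av) + beta * Av)  # one product with A^T

def cg(matvec, b, rel_tol=1e-8, max_iter=100):
    """Textbook conjugate gradient for matvec(d) = b, started from d = 0."""
    d = np.zeros_like(b)
    r = b.copy()                 # residual b - matvec(d) at d = 0
    q = r.copy()                 # current search direction
    rs = r @ r
    b_norm = np.sqrt(rs)
    if b_norm == 0.0:
        return d
    for _ in range(max_iter):
        Hq = matvec(q)
        alpha = rs / (q @ Hq)
        d += alpha * q
        r -= alpha * Hq
        rs_new = r @ r
        if np.sqrt(rs_new) <= rel_tol * b_norm:
            break
        q = r + (rs_new / rs) * q
        rs = rs_new
    return d

# Illustrative data (random, not from the source).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
x = rng.standard_normal(4)
z = A @ x
p = np.exp(z - z.max())
p /= p.sum()
g = A.T @ p                      # gradient of the log-sum-exp objective
beta = 1.0
d = cg(lambda v: modified_hessvec(A, p, beta, v), -g)
```

Starting CG from zero keeps every iterate in the span of the residuals, which all lie in $\mathrm{Row}(A)$ because both the right-hand side and every application of $H_{\mathrm{mod}}$ end with a product by $A^T$.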

Stability and sufficient decrease in the line search are ensured by adaptively increasing $\beta_k$ if the Armijo condition

$f(x_k + d) \leq f(x_k) + \gamma \nabla f(x_k)^T d$

is not satisfied. Once the condition holds, the iterate is updated as $x_{k+1} = x_k + d$; termination is triggered by tolerance parameters on either primal or gradient progress.

Algorithmic summary:

Input: $A, x_0, \beta_0>0, \gamma\in(0,1)$, tolerances.
For $k = 0, 1, 2, \dots$, repeat:
- Evaluate $f(x_k), \nabla f(x_k)$.
- Set $H_{\mathrm{mod}} = A^T [\Lambda(x_k) + \beta_k I_m] A$.
- Apply CG to $H_{\mathrm{mod}} d = -\nabla f(x_k)$, to given tolerance.
- Line search on $f(x_k + d)$; double $\beta_k$ if necessary and repeat CG.
- Check convergence; update $x_{k+1}$.
- Optionally update $\beta_{k+1}$.
Output: approximate solution (Kan et al., 2023).
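
The steps above can be sketched end to end in NumPy. This is a simplified illustration under stated assumptions rather than the published implementation: the function names, the default tolerances, the cap `max_doublings`, and in particular the rule that halves $\beta$ after each accepted step (down to a hypothetical floor `beta_min`) are choices made here; the source only says that $\beta_{k+1}$ may optionally be updated.

```python
import numpy as np

def value_grad(A, x):
    """Stable evaluation of f(x), its gradient A^T p, and p = softmax(A x)."""
    z = A @ x
    w = np.exp(z - z.max())
    p = w / w.sum()
    return z.max() + np.log(w.sum()), A.T @ p, p

def mod_hessvec(A, p, beta, v):
    """Matrix-free product with H_mod = A^T (diag(p) - p p^T + beta I) A."""
    Av = A @ v
    return A.T @ (p * Av - p * (p @ Av) + beta * Av)

def cg(matvec, b, rel_tol=1e-8, max_iter=100):
    """Textbook conjugate gradient for matvec(d) = b, started from d = 0."""
    d, r = np.zeros_like(b), b.copy()
    q, rs = r.copy(), r @ r
    b_norm = np.sqrt(rs)
    if b_norm == 0.0:
        return d
    for _ in range(max_iter):
        Hq = matvec(q)
        alpha = rs / (q @ Hq)
        d += alpha * q
        r -= alpha * Hq
        rs_new = r @ r
        if np.sqrt(rs_new) <= rel_tol * b_norm:
            break
        q, rs = r + (rs_new / rs) * q, rs_new
    return d

def lsemink_sketch(A, x0, beta0=1.0, gamma=1e-4, grad_tol=1e-6,
                   max_outer=100, max_doublings=60, beta_min=1e-8):
    x = np.asarray(x0, dtype=float).copy()
    beta = beta0
    for _outer in range(max_outer):
        f, g, p = value_grad(A, x)
        if np.linalg.norm(g) <= grad_tol:          # gradient-based stopping test
            break
        for _try in range(max_doublings):
            d = cg(lambda v: mod_hessvec(A, p, beta, v), -g)
            if value_grad(A, x + d)[0] <= f + gamma * (g @ d):   # Armijo test
                break
            beta *= 2.0                            # stronger shift, shorter step
        else:
            break                                  # no acceptable step was found
        x = x + d
        beta = max(0.5 * beta, beta_min)           # assumed heuristic, see lead-in
    return x

# Toy problem whose minimizer is x = 0 with f(0) = log(3).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
x_star = lsemink_sketch(A, x0=np.array([2.0, -1.0]))
f_star = value_grad(A, x_star)[0]
```

The toy matrix is chosen so that its rows sum to zero, which makes the objective coercive and puts the minimizer at the origin; on problems without a minimizer the loop would simply run until `max_outer`.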

4. Theoretical Guarantees and Subspace Properties

Under convexity, differentiability with Lipschitz gradients, and coercivity assumptions, the algorithm is globally convergent: the sequence $\{x_k\}$ converges to a global minimizer regardless of initialization or initial $\beta_0$. Key analytical properties include:

- All iterates and search directions remain in $\mathrm{Row}(A)$.
- Descent is always obtained: $\nabla f(x_k)^T d_k < 0$.
- Armijo line search ensures function decrease.
- Monotonic decrease of $f(x)$ and diminishing step norm guarantee stationarity.

The analysis follows the structure presented in (Kan et al., 2023), Theorem 3.1 and Lemmas 5.1–5.4.
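
The row-space and descent properties can be checked numerically on a small rank-deficient example. The sketch below is illustrative only: the random matrix, the value of `beta`, and the use of a dense least-squares solve (instead of CG) to obtain the minimum-norm direction are assumptions, not part of the source.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 2))
A = np.hstack([B, B[:, :1] + B[:, 1:]])    # third column = sum of first two, rank 2
x = 0.1 * (A.T @ rng.standard_normal(6))   # starting point chosen inside Row(A)

z = A @ x
p = np.exp(z - z.max())
p /= p.sum()
g = A.T @ p                                # gradient, always of the form A^T p
beta = 0.5
H_mod = A.T @ (np.diag(p) - np.outer(p, p) + beta * np.eye(6)) @ A

# Minimum-norm solution of the singular but consistent system H_mod d = -g.
d = np.linalg.lstsq(H_mod, -g, rcond=1e-10)[0]

# Orthogonal projector onto Null(A), built from the SVD of A.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(s > 1e-10):].T
P_null = null_basis @ null_basis.T

null_part_g = np.linalg.norm(P_null @ g)             # expected to be near zero
null_part_d = np.linalg.norm(P_null @ d)             # expected to be near zero
null_part_x_next = np.linalg.norm(P_null @ (x + d))  # next iterate stays in Row(A)
slope = g @ d                                        # directional derivative
```

Here `slope` equals $-d^T H_{\mathrm{mod}} d$, which is negative whenever the gradient is nonzero, matching the descent property listed above.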

5. Computational Complexity and Scalability

LSEMINK's per-iteration cost is dominated by matrix-vector products involving $A$ and $A^T$. No explicit Hessian formation or factorization is needed, allowing for scalability to large $n$ or $m$. The method is matrix-free and only requires $O(r_{\max})$ auxiliary vectors per iteration, where $r_{\max}$ is the maximum dimension of the Krylov subspace during CG. Efficiency is retained for cases where $m \ll n$ or $n \ll m$, and the method is effective even when $A$ is rank-deficient.

6. Empirical Performance and Robustness

LSEMINK demonstrates rapid and robust convergence in diverse applications:

- Image Classification (Multinomial Logistic Regression): On MNIST and CIFAR-10, LSEMINK achieves a 1–2 orders of magnitude faster reduction in $f(x)$ at early iterations and converges within roughly 30 seconds on standard hardware (an order of magnitude improvement over CVX solvers). Test accuracy and gradient norm are comparable to those of the best competing methods.
- Geometric Programming: For minimization tasks smoothed by log-sum-exp, including small smoothing parameters ($\eta \to 0$), LSEMINK outperforms standard Newton–CG (which may fail because its quadratic model can be unbounded below) and is substantially faster (15–60×) than CVX/Mosek/SDPT3/SeDuMi. Natural gradient descent is too slow for high-accuracy requirements, while LSEMINK remains robust even near the nonsmooth regime.

Performance characteristics highlight LSEMINK’s excellent initial convergence and its ability to cope with severe ill-conditioning when softmax probabilities are nearly one-hot (Kan et al., 2023).

7. Practical Considerations, Limitations, and Extensibility

Key algorithmic features include matrix-free access to $A$ as the only requirement, no dependence on sparsity or full-rank structure, and a principled approach to handling nearly nonsmooth or nearly singular situations. Memory requirements are light because no dense Hessian is stored. Early stopping, restart, and adaptive $\beta_k$ logic are directly supported.

Limitations are primarily tied to the nature of the objective; the method is tailored to smooth, convex functionals of the log-sum-exp form. A plausible implication is that for non-log-sum-exp targets or objectives lacking this structure, LSEMINK's specific Hessian regularization may be suboptimal.

The invariant subspace property (all iterates in $\mathrm{Row}(A)$) may reduce effective dimensionality and is attractive for machine learning problems with redundant parameterizations or nonphysical latent variables.

Table: Summary of LSEMINK’s Core Properties

Feature	Detail	Reference
Objective	Minimize $f(x)=\log\sum_{i=1}^m \exp(a_i^T x)$	(Kan et al., 2023)
Hessian Regularization	$H_{\mathrm{mod}} = \nabla^2 f + \beta_k A^T A$	[(Kan et al., 2023), Eq 3.1]
Solver	Conjugate Gradient in Krylov subspace, matrix-free	(Kan et al., 2023)
Scalability	Suited for large-scale, rank-deficient, or matrix-free settings	(Kan et al., 2023)
Convergence	Global (convex f), all iterates in $\mathrm{Row}(A)$	(Kan et al., 2023)
Applications	Multinomial logistic regression, geometric programming	(Kan et al., 2023)

LSEMINK offers an efficient, robust, and practical approach for convex log-sum-exp minimization, advancing the state of the art in Newton-type optimization under challenging high-dimensional and ill-conditioned settings (Kan et al., 2023).

References

Kan et al. (2023). LSEMINK: A Modified Newton-Krylov Method for Log-Sum-Exp Minimization.
