Dual Spectral Projected Gradient (DSPG)
- Dual Spectral Projected Gradient (DSPG) is a first-order algorithm designed for solving dual log-determinant semidefinite programming problems with linear constraints and nonsmooth regularizers.
- It utilizes spectral projected gradients with Barzilai–Borwein step-size selection and a nonmonotone line search to efficiently converge to first-order optimality.
- The method scales to high-dimensional settings by reducing per-iteration costs while outperforming interior-point methods in speed and accuracy for sparse covariance and graphical model estimation.
The Dual Spectral Projected Gradient (DSPG) method is a first-order algorithm designed for efficiently solving the dual of log-determinant semidefinite programming (SDP) problems subject to linear equality constraints and nonsmooth convex regularizers, with core applications in sparse Gaussian graphical model selection, covariance estimation, and related high-dimensional inference problems. DSPG generalizes the spectral projected gradient framework of Birgin et al. to log-determinant optimization, enabling the rapid solution of both standard and structured covariance selection SDP instances at scales not tractable by interior-point or conventional first-order methods (Nakagaki et al., 2018, Namchaisiri et al., 2024).
1. Problem Formulation
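The primary domain of DSPG is the regularized log-determinant SDP, which appears frequently in graphical lasso-type estimation. The canonical primal form is
$$\min_{X \in \mathbb{S}^n} \; f(X) := \operatorname{Tr}(CX) - \mu \log\det X + \operatorname{Tr}(\rho |X|) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \; X \succ 0,$$
where $\mathbb{S}^n$ denotes the set of real symmetric $n \times n$ matrices, $C \in \mathbb{S}^n$ encodes sample information (typically the empirical covariance), $\mu > 0$ is a log-barrier parameter, $\rho \in \mathbb{S}^n$ holds the nonnegative element-wise regularization weights, $|X|$ is the element-wise absolute value, and $\mathcal{A}: \mathbb{S}^n \to \mathbb{R}^m$ together with $b \in \mathbb{R}^m$ encodes the linear equality constraints.
The dual problem introduces variables $y \in \mathbb{R}^m$ for the equality constraints and $W \in \mathbb{S}^n$ for the element-wise box constraints $|W| \le \rho$:
$$\max_{y, W} \; g(y, W) := b^T y + \mu \log\det\big(C - \mathcal{A}^T(y) + W\big) + n\mu - n\mu \log\mu \quad \text{s.t.} \quad |W| \le \rho, \; C - \mathcal{A}^T(y) + W \succ 0.$$
The gradient of $g$ is
$$\nabla g(y, W) = \Big( b - \mu\, \mathcal{A}\big((C - \mathcal{A}^T(y) + W)^{-1}\big), \;\; \mu\, (C - \mathcal{A}^T(y) + W)^{-1} \Big).$$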
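The step from primal to dual can be made explicit with a short Lagrangian argument. The derivation below is a sketch, assuming the primal objective has the standard form $\operatorname{Tr}(CX) - \mu\log\det X + \operatorname{Tr}(\rho|X|)$ with constraint $\mathcal{A}(X) = b$, that strong duality holds, and using the identity $\operatorname{Tr}(\rho|X|) = \max_{|W| \le \rho} \operatorname{Tr}(WX)$. The shorthand $S$ is introduced here for illustration only.

$$
\begin{aligned}
\min_{X \succ 0} \; \max_{y,\, |W| \le \rho} \; & \operatorname{Tr}(CX) - \mu \log\det X + \operatorname{Tr}(WX) + y^T\big(b - \mathcal{A}(X)\big) \\
= \max_{y,\, |W| \le \rho} \; & b^T y + \min_{X \succ 0} \big\{ \operatorname{Tr}(SX) - \mu \log\det X \big\}, \qquad S := C - \mathcal{A}^T(y) + W .
\end{aligned}
$$

For $S \succ 0$, setting the derivative $S - \mu X^{-1}$ to zero gives $X = \mu S^{-1}$, so that

$$
\min_{X \succ 0} \big\{ \operatorname{Tr}(SX) - \mu \log\det X \big\} = n\mu - \mu \log\det(\mu S^{-1}) = \mu \log\det S + n\mu - n\mu\log\mu .
$$

Substituting this value yields a dual objective of the form $b^T y + \mu \log\det(C - \mathcal{A}^T(y) + W) + n\mu - n\mu\log\mu$, and the same stationarity condition is what links a dual point to its primal candidate $X = \mu S^{-1}$.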
Extensions accommodate structured penalties such as block-wise, group-wise, or hidden-cluster $\ell_1$-like terms by generalizing the dual with additional dual variables and projections (Namchaisiri et al., 2024, Namchaisiri et al., 2024).
2. Algorithmic Framework
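DSPG is a nonmonotone projected gradient algorithm using Barzilai–Borwein (BB) step-size selection and line search, formulated for the dual SDP:
- Initialization: Select $(y^0, W^0)$ or, for generalized settings, the correspondingly extended tuple of dual variables, so that dual feasibility and $C - \mathcal{A}^T(y^0) + W^0 \succ 0$ are satisfied. Set the algorithmic parameters: tolerance $\varepsilon \ge 0$, line-search constants $\gamma, \tau, \sigma \in (0, 1)$, memory length $M \ge 1$, and BB bounds $0 < \alpha_{\min} < \alpha_{\max}$ with $\alpha^0 \in [\alpha_{\min}, \alpha_{\max}]$.
- Stopping Test: Compute the projected gradient direction:
$$\big(\Delta y^k_{(1)}, \Delta W^k_{(1)}\big) = \Big(\nabla_y g(y^k, W^k), \;\; P_{\mathcal{W}}\big(W^k + \nabla_W g(y^k, W^k)\big) - W^k\Big),$$
where $P_{\mathcal{W}}$ is the projection onto the feasible set (e.g., box and LMI constraints). If $\|(\Delta y^k_{(1)}, \Delta W^k_{(1)})\| \le \varepsilon$, terminate.
- Spectral Step and Search Direction: Compute the BB-scaled search direction
$$\big(\Delta y^k, \Delta W^k\big) = \Big(\alpha^k \nabla_y g(y^k, W^k), \;\; P_{\mathcal{W}}\big(W^k + \alpha^k \nabla_W g(y^k, W^k)\big) - W^k\Big),$$
with $\alpha^k \in [\alpha_{\min}, \alpha_{\max}]$ determined by the BB update.
- Dual Feasibility Safeguard: Ensure $C - \mathcal{A}^T(y^k + \lambda \Delta y^k) + W^k + \lambda \Delta W^k \succ 0$ by restricting step sizes using the minimum eigenvalue of the direction in the transformed metric.
- Nonmonotone Line Search: Seek the maximal $\lambda^k$ (by geometric reduction, e.g., $\lambda \leftarrow \sigma \lambda$) satisfying
$$g\big(y^k + \lambda \Delta y^k, W^k + \lambda \Delta W^k\big) \ge \min_{\max(0,\, k-M+1) \le j \le k} g(y^j, W^j) + \gamma \lambda \big\langle \nabla g(y^k, W^k), (\Delta y^k, \Delta W^k) \big\rangle$$
for globalization.
- Update: Set $(y^{k+1}, W^{k+1}) = (y^k, W^k) + \lambda^k (\Delta y^k, \Delta W^k)$ and update $\alpha^{k+1}$ via the BB-type rule.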
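The steps above can be sketched in NumPy for the simplest special case. This is a minimal illustration, not the authors' implementation: it assumes there are no equality constraints (so the only dual variable is $W$ and the dual objective reduces to $\mu\log\det(C+W) + n\mu - n\mu\log\mu$), that $C$ is positive semidefinite and `rho` has a positive diagonal (so `diag(rho)` is a feasible start), and it uses a single reduction factor `sigma`. The function name `dspg_no_equalities` and all default parameter values are hypothetical choices for this example.

```python
import numpy as np

def dspg_no_equalities(C, rho, mu=1.0, eps=1e-6, gamma=1e-4, tau=0.5,
                       sigma=0.5, M=5, a_min=1e-10, a_max=1e10, max_iter=500):
    """Sketch: maximize mu*logdet(C+W) + n*mu - n*mu*log(mu) s.t. |W| <= rho."""
    n = C.shape[0]

    def proj(V):                       # box projection onto {|W| <= rho}
        return np.clip(V, -rho, rho)

    def g(W):                          # dual objective via Cholesky log-det
        L = np.linalg.cholesky(C + W)  # raises LinAlgError if C + W is not PD
        return 2.0 * mu * np.log(np.diag(L)).sum() + n * mu - n * mu * np.log(mu)

    def grad(W):                       # gradient with respect to W
        return mu * np.linalg.inv(C + W)

    W = np.diag(np.diag(rho)).astype(float)  # assumed feasible starting point
    G, alpha, history = grad(W), 1.0, []
    for _ in range(max_iter):
        history.append(g(W))
        # Stopping test: projected gradient with unit step.
        if np.linalg.norm(proj(W + G) - W) <= eps:
            break
        # Spectral (BB-scaled) search direction.
        D = proj(W + alpha * G) - W
        # Feasibility safeguard: C + W + lam*D = L (I + lam*T) L^T with
        # T = L^{-1} D L^{-T}, so we need 1 + lam * eig_min(T) > 0.
        L = np.linalg.cholesky(C + W)
        T = np.linalg.solve(L, np.linalg.solve(L, D).T)
        theta = np.linalg.eigvalsh(T).min()
        lam = 1.0 if theta >= 0 else min(1.0, -tau / theta)
        # Nonmonotone line search against the worst of the last M values.
        ref, slope = min(history[-M:]), np.sum(G * D)
        while g(W + lam * D) < ref + gamma * lam * slope and lam > 1e-16:
            lam *= sigma
        # Update the iterate and the BB step length.
        W_new = W + lam * D
        G_new = grad(W_new)
        s, z = W_new - W, G_new - G
        p = np.sum(s * z)
        alpha = a_max if p >= 0 else min(a_max, max(a_min, -np.sum(s * s) / p))
        W, G = W_new, G_new
    X = mu * np.linalg.inv(C + W)      # primal candidate recovered from W
    return W, X, history
```

In this special case the quantity `np.sum(rho * np.abs(X)) - np.sum(W * X)` equals the primal-dual objective gap for the returned pair, which gives a cheap check on the result. Handling equality constraints would add the variable $y$ and the operator $\mathcal{A}$ to the objective, gradient, and safeguard.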
Table: Key steps and operations in DSPG (generalized form)
| Step | Operation | Dominant Cost per Iteration |
|---|---|---|
| Gradient Eval | Form $S = C - \mathcal{A}^T(y) + W$, compute $S^{-1}$ | Cholesky factorization, $O(n^3)$ |
| Projections | Box, ball, and custom projections | $O(n^2)$ to $O(n^2 \log n)$ (PAVA, sorting) |
| Line Search | Feasibility safeguard, function evaluations | $O(n^3)$ per trial |
3. Projection Operators and Special Structures
Projection onto box constraints is component-wise truncation: $[P_{\mathcal{W}}(W)]_{ij} = \min\{\max\{W_{ij}, -\rho_{ij}\}, \rho_{ij}\}$. In advanced settings with hidden clustering, an auxiliary dual variable is introduced, constrained to lie in the image of a linear operator applied to a suitably bounded multiplier. Projection onto this set reduces to isotonic regression (order-constrained $\ell_2$-regression), efficiently solved via the pool-adjacent-violators algorithm (PAVA), which runs in $O(N)$ time for $N$ ordered variables (Namchaisiri et al., 2024).
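Both projection primitives mentioned above are short enough to write out. The following pure-Python sketch shows component-wise truncation and a textbook PAVA for the unweighted problem $\min_x \sum_i (x_i - v_i)^2$ subject to $x_1 \le x_2 \le \dots$; it is a generic isotonic regression, not the specific feasible set used in the hidden-clustering model, and the function names `project_box` and `pava` are chosen here for illustration.

```python
def project_box(w, rho):
    """Clip each entry of w to the interval [-rho_i, rho_i]."""
    return [max(-r, min(r, x)) for x, r in zip(w, rho)]

def pava(v):
    """Least-squares projection of v onto nondecreasing sequences."""
    blocks = []  # each block stores [sum of values, number of values]
    for x in v:
        blocks.append([x, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        # Means are compared by cross-multiplication to avoid divisions.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # every entry of a block takes the block mean
    return out
```

For example, `pava([1, 3, 2, 4])` returns `[1.0, 2.5, 2.5, 4.0]`: the violating pair `(3, 2)` is pooled to its mean. Each input value is pushed once and each merge removes one block, which is where the linear running time comes from.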
For generalized regularizers (block, group, multitask), projections onto norm balls or block-wise feasible sets are performed separately for each variable block, often via closed-form or efficient sorting-based routines (Namchaisiri et al., 2024).
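As an illustration of the two kinds of routine mentioned here, the sketch below gives a closed-form projection onto an $\ell_2$-ball and a sorting-based projection onto an $\ell_1$-ball, applied block by block. Which norm ball actually arises depends on the regularizer, so the choice of these two balls is an assumption made for the example, as are the function names; the $\ell_1$ routine is the standard sort-and-threshold scheme and assumes a positive radius.

```python
import math

def project_l2_ball(v, radius):
    """Closed form: rescale v if its Euclidean norm exceeds radius."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)
    return [x * radius / norm for x in v]

def project_l1_ball(v, radius):
    """Sorting-based projection onto {x : sum |x_i| <= radius}, radius > 0."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:       # keep the threshold from the last valid index
            theta = t
    # Soft-threshold the magnitudes and restore the original signs.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]

def project_blocks(blocks, radius, projector=project_l2_ball):
    """Apply the same ball projection independently to each block."""
    return [projector(b, radius) for b in blocks]
```

Because each block is projected independently, the cost is the sum of the per-block costs: linear for the rescaling and dominated by the sort for the $\ell_1$ case.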
4. Convergence Properties
Convergence of DSPG is established under standard conditions: surjectivity of $\mathcal{A}$, strict primal and dual feasibility, and bounded level sets. Key properties include:
- All iterates remain in a compact level set of the dual objective $g$.
- The search direction is a true ascent direction when nonzero.
- The line search always results in a step size $\lambda^k$ bounded from below by a positive minimum.
- BB step-sizes remain within given bounds.
- Either finite termination occurs, or the projected gradient vanishes asymptotically (its norm tends to $0$ as $k \to \infty$), ensuring first-order optimality.
- Under convex-concave structure and dual-primal strong duality (Slater’s condition), the dual optimizer reconstructs the primal optimizer uniquely.
- No global linear convergence rate is claimed; local linear convergence is possible under local strong concavity and smoothness (Nakagaki et al., 2018, Namchaisiri et al., 2024).
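The reconstruction of a primal point from a dual one can be illustrated numerically. The sketch below assumes the dual objective has the form $b^T y + \mu\log\det S + n\mu - n\mu\log\mu$ with $S = C - \mathcal{A}^T(y) + W$, and that $\mathcal{A}$ is given as a list of symmetric matrices `A_mats` with $[\mathcal{A}(X)]_i = \operatorname{Tr}(A_i X)$. Under those assumptions the candidate $X = \mu S^{-1}$ has objective gap $\operatorname{Tr}(CX) + \operatorname{Tr}(\rho|X|) - b^T y - n\mu$. The function name `recover_primal` is hypothetical.

```python
import numpy as np

def recover_primal(C, rho, mu, y, W, A_mats, b):
    """Return X = mu * S^{-1}, the equality residual A(X) - b, and the gap."""
    # S = C - A^T(y) + W, where A^T(y) = sum_i y_i * A_i.
    S = C + W - sum((yi * Ai for yi, Ai in zip(y, A_mats)), np.zeros_like(C))
    X = mu * np.linalg.inv(S)
    # Residual of the linear equality constraints at the candidate X.
    residual = np.array([np.sum(Ai * X) for Ai in A_mats]) - np.asarray(b, dtype=float)
    # Objective gap f(X) - g(y, W); the log-det terms cancel analytically.
    gap = np.sum(C * X) + np.sum(rho * np.abs(X)) - np.dot(b, y) - C.shape[0] * mu
    return X, residual, gap
```

The gap is only a genuine duality gap once the residual is (numerically) zero, since $X = \mu S^{-1}$ need not satisfy the equality constraints before convergence.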
5. Computational Complexity and Scalability
The main per-iteration cost stems from a Cholesky factorization of an $n \times n$ matrix ($O(n^3)$) and the subsequent cost of projections. For structure-exploiting cases (e.g., when $\mathcal{A}$ or the regularization operators are sparse or block-diagonal), this cost can be reduced. Projection operations scale as $O(n^2)$ (component-wise constraints) or near-linearly in the number of entries, up to $O(n^2 \log n)$ when sorting is required, for isotonic regression in hidden clustering models (Namchaisiri et al., 2024). Overall memory requirements are modest, allowing the method to scale to large problem instances ($n$ up to 4000–5000).
6. Numerical Performance
Empirical benchmarks report that DSPG solves standard sparse and structured covariance selection SDPs with $n$ up to 5000 to small primal-dual gaps, outperforming inexact primal-dual interior-point, adaptive spectral projected gradient (ASPG), and Nesterov’s smooth method in wall-clock time, especially for moderate to high-accuracy requirements. For hidden clusters, isotonic projection reduces total runtime by several orders of magnitude compared to direct approaches (Nakagaki et al., 2018, Namchaisiri et al., 2024, Namchaisiri et al., 2024). DSPG is also competitive with or superior to specialized solvers such as QUIC on gene expression and structured multitask data, particularly when extended with block or multitask regularizers.
7. Implementation and Practical Considerations
DSPG is parameterized by a stopping tolerance $\varepsilon$, line-search constants $\gamma$, $\tau$, and $\sigma$, a nonmonotone memory length $M$, and lower and upper bounds $\alpha_{\min}$ and $\alpha_{\max}$ on the BB step. Initializing with a dual-feasible $(y^0, W^0)$ and maintaining the positivity constraint via Cholesky-based step-size control is essential. For large-scale or structured cases, exploiting sparsity in $\mathcal{A}$ and leveraging efficient projection routines (including PAVA and fast sorting for block norms) is critical for performance. A small stopping tolerance on the projected-gradient norm and safeguards against near-singular updates are advised (Namchaisiri et al., 2024, Namchaisiri et al., 2024).
DSPG’s flexibility enables application across a range of log-det SDP problems: standard graphical lasso, hidden-structure precision matrix recovery, multitask graphical model learning, and block/group regularized structure learning. It is particularly suited to problems where moderate to high numerical precision is needed without incurring the cost of explicit KKT system formation, and where the structure of constraints allows efficient projections (Nakagaki et al., 2018, Namchaisiri et al., 2024, Namchaisiri et al., 2024).