Conic Descent Optimization
- Conic Descent is a family of first-order optimization techniques that exploit conic models and geometric duality for both unconstrained and conic-constrained problems.
- CD methods compute effective stepsizes via trial steps and rigorous safeguards, ensuring convergence under smoothness and convexity assumptions.
- Variants like MOCO integrate momentum and memory-efficient strategies to tackle large-scale semidefinite programming with proven O(1/k) convergence rates.
Conic Descent (CD) refers to a family of first-order optimization algorithms designed for both unconstrained and conic-constrained problems. Distinguished by their use of conic models and geometric duality, CD methods deliver approximately optimal stepsizes along search directions and are particularly suited to large-scale, low-storage scenarios in signal processing, machine learning, and semidefinite programming (SDP). CD combines structure-exploiting model construction, rigorous convergence guarantees, and explicit dual certificates, and it extends naturally to momentum and memory-efficient variants.
1. Conic Descent for Unconstrained Optimization
The archetypal Conic Descent method for unconstrained smooth minimization was introduced and rigorously analyzed by Liu and Liu (Liu et al., 2019). At each iterate $x_k$, the method constructs a local conic model of the objective $f$,
$$c_k(d) \;=\; f(x_k) + \frac{g_k^\top d}{1 - b_k^\top d} + \frac{d^\top B_k d}{2\,(1 - b_k^\top d)^2},$$
where $g_k = \nabla f(x_k)$, $b_k$ is the horizon vector, and $B_k$ is a positive definite matrix. The conic model is selected whenever a quadratic model is poorly justified, as detected by a "closeness-to-quadratic" statistic built from $f_{k-1}$, $f_k$, $g_k^\top s_{k-1}$, and $s_{k-1}^\top y_{k-1}$, with $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$.
The trial step $d = -\alpha g_k$ leads to a scalarized conic model in $\alpha$,
$$\phi_k(\alpha) \;=\; f_k - \frac{\alpha\, g_k^\top g_k}{1 + \alpha\, b_k^\top g_k} + \frac{\alpha^2\, g_k^\top B_k g_k}{2\,(1 + \alpha\, b_k^\top g_k)^2},$$
whose stationary point,
$$\alpha_k \;=\; \frac{g_k^\top g_k}{g_k^\top B_k g_k - (b_k^\top g_k)(g_k^\top g_k)},$$
is used as the approximately optimal stepsize, subject to safeguards and projection onto Barzilai-Borwein stepsize bounds when gradient history allows.
The full algorithm incorporates a Zhang–Hager nonmonotone line search and, as fallback, quadratic or derivative-based models if the conic step is invalid. Inner products and updates incur only $O(n)$ storage, and the method is robust to poor curvature information.
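To make the stepsize rule concrete, here is a minimal Python sketch of the scalarized conic-model stepsize with a quadratic-model fallback and Barzilai-Borwein-style clamping; the scalar curvature estimate `B_scale`, the tolerance, and the bound names are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def conic_stepsize(g, b, B_scale, alpha_lo, alpha_hi):
    """Approximately optimal stepsize from the scalarized conic model
    along the trial direction d = -alpha * g (illustrative sketch).

    g        -- current gradient
    b        -- horizon vector of the conic model
    B_scale  -- scalar estimate of g^T B g / g^T g (curvature information)
    alpha_lo, alpha_hi -- Barzilai-Borwein-type safeguard bounds
    """
    gg = g @ g                       # g^T g
    gBg = B_scale * gg               # g^T B g under the scalar approximation
    denom = gBg - (b @ g) * gg       # stationarity condition of the scalar conic model
    if denom <= 1e-12 * gg:          # conic step invalid -> fall back to a quadratic model
        alpha = gg / max(gBg, 1e-12)
    else:
        alpha = gg / denom
    return float(np.clip(alpha, alpha_lo, alpha_hi))
```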
2. Conic Descent for General Conic Programs
CD was extended to general conic-constrained optimization (minimize $f(x)$ subject to $x \in \mathcal{K}$ for a closed convex cone $\mathcal{K}$) with a clear geometric and dual framework (Li et al., 2023). The primal-dual structure motivates an update that alternates between:
- Ray minimization: For the current iterate $x_k$, find the scaling $\eta_k \ge 0$ minimizing $f(\eta x_k)$ over $\eta \ge 0$ and set $\bar x_k = \eta_k x_k$, enforcing complementary slackness $\langle \nabla f(\bar x_k), \bar x_k \rangle = 0$.
- Ray search: At $\bar x_k$, minimize the linear surrogate $\langle \nabla f(\bar x_k), v \rangle$ over normalized rays $v$ of $\mathcal{K}$; this is a Frank–Wolfe subproblem on the dual. Then perform a one-dimensional search in the direction $v_k$.
The update is thus
$$x_{k+1} \;=\; \eta_k x_k + \theta_k v_k,$$
with $\eta_k, \theta_k \ge 0$. This structure allows a unified view of CD as alternating between complementary-slackness and dual-feasibility pushes; a minimal iteration sketch is given below.
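The alternation can be sketched as follows, assuming a user-supplied linear minimization oracle `lmo` that returns a normalized extreme ray of $\mathcal{K}$ for a given gradient; the bounded scalar searches stand in for the exact ray minimization and ray search.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cd_step(x, f, grad, lmo):
    """One conic-descent iteration: ray minimization, then a Frank-Wolfe-style
    ray search along an extreme ray of the cone (illustrative sketch)."""
    # Ray minimization: rescale the current iterate along its own ray, which
    # enforces complementary slackness <grad(eta*x), eta*x> = 0 at the optimum.
    eta = minimize_scalar(lambda t: f(t * x), bounds=(0.0, 1e6), method="bounded").x
    x_bar = eta * x

    # Ray search: linear minimization oracle over normalized cone directions
    # (a Frank-Wolfe subproblem), followed by a one-dimensional search along v.
    v = lmo(grad(x_bar))
    theta = minimize_scalar(lambda t: f(x_bar + t * v), bounds=(0.0, 1e6),
                            method="bounded").x
    return x_bar + theta * v
```

For the positive semidefinite cone, for instance, `lmo` would typically return the outer product of the eigenvector associated with the smallest eigenvalue of the gradient matrix.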
3. Rigorous Convergence Theory and Stopping Criteria
For $L$-smooth and strictly convex $f$ with a convex cone $\mathcal{K}$, CD achieves explicit convergence rates in both the primal gap $f(x_k) - f(x^\star)$ and a dual gap defined through the dual cone $\mathcal{K}^*$; the primal gap decays at the $O(1/k)$ rate, and the dual bound contains an additional nonnegative term (Li et al., 2023). These bounds translate directly into the number of iterations required for a prescribed primal or dual accuracy.
A distinctive feature is an analytic stopping certificate built from $\bar g_k$, the running average of gradients: the resulting quantity monitors the KKT residuals and certifies $\epsilon$-solution status once it drops below $\epsilon$; an illustrative residual check is sketched below.
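As an illustration only, a generic KKT residual of the kind described can be assembled from the averaged gradient; the exact certificate and constants in (Li et al., 2023) may differ, and `dist_to_dual_cone` is an assumed helper.

```python
def kkt_residual(g_avg, x, dist_to_dual_cone):
    """Generic KKT residual for min f(x) s.t. x in K (illustrative sketch):
    dual feasibility asks g_avg to lie in the dual cone K*, and complementary
    slackness asks <g_avg, x> = 0; stop once the combined residual is below eps."""
    return dist_to_dual_cone(g_avg) + abs(g_avg @ x)
```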
For unconstrained (non-conic) problems, global convergence is established under standard smoothness and convexity assumptions, together with an $R$-linear rate for strongly convex objectives (Liu et al., 2019).
4. Momentum and Preconditioning Variants
The MOCO (Momentum Conic Descent) algorithm introduces a heavy-ball-type averaging of gradients: the ray-search steps are driven by a running average of past gradients governed by a momentum parameter, yielding smoothed dual steps and mitigated oscillations (Li et al., 2023). CD and MOCO share the same convergence rate, though MOCO's bound introduces a nonnegative gap term expressing its momentum benefit.
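A minimal sketch of the gradient averaging, assuming an exponentially weighted (heavy-ball-style) running average with an illustrative momentum parameter `beta`; the averaged gradient then drives the ray search in place of the raw gradient.

```python
def momentum_average(g_prev_avg, g_new, beta=0.9):
    """Exponentially weighted (heavy-ball-type) running average of gradients;
    beta and the averaging form are illustrative assumptions."""
    if g_prev_avg is None:           # first iteration: no gradient history yet
        return g_new
    return beta * g_prev_avg + (1.0 - beta) * g_new
```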
Preconditioning is enabled by the dual convergence bounds' dependence on the smoothness constant. Changing variables $x = Pz$ for a well-conditioned matrix $P$ can dramatically reduce the iteration count for a given dual error. The ideal $P$ balances coordinate-wise Lipschitz constants, reducing both the effective smoothness constant and the resulting dual-gap bound.
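A minimal sketch of preconditioning by a diagonal change of variables, assuming coordinate-wise Lipschitz estimates `lips` are available and that the cone is preserved by positive diagonal scaling (e.g., the nonnegative orthant); the scaling choice is illustrative.

```python
import numpy as np

def preconditioned_problem(f, grad, lips):
    """Change of variables x = P z with diagonal P chosen to balance
    coordinate-wise Lipschitz constants (illustrative sketch)."""
    P = np.diag(1.0 / np.sqrt(np.asarray(lips)))   # larger curvature -> smaller scale
    f_z = lambda z: f(P @ z)                        # objective in the new variable z
    grad_z = lambda z: P @ grad(P @ z)              # chain rule; P is symmetric diagonal
    return f_z, grad_z, P
```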
5. Algorithmic Summaries and Key Parameters
In unconstrained smooth optimization, CD employs the following scheme:
- For the first iteration, select the step length heuristically.
- For subsequent iterations, test closeness to quadraticity. If the objective is not locally quadratic, use the conic model and its approximately optimal stepsize $\alpha_k$; otherwise, fall back to quadratic models or finite differences.
- Safeguard key model parameters by clamping them to prescribed intervals, and limit the stepsize to a bounded range (typically tied to Barzilai-Borwein bounds).
- Apply the nonmonotone (Zhang–Hager) line search to validate new iterates, as in the sketch after this list.
- For fallback models, test for gradient collinearity and adapt accordingly (a collinearity threshold, a factor for step increase, and a finite-difference Hessian approximation).
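A minimal sketch of the Zhang–Hager nonmonotone acceptance test referenced above; the backtracking factor, sufficient-decrease constant, and averaging weight `eta` are illustrative defaults.

```python
def zhang_hager_search(f, x, g, d, alpha0, C, Q, eta=0.85, delta=1e-4, shrink=0.5):
    """Backtracking search with the Zhang-Hager nonmonotone reference value C:
    accept alpha once f(x + alpha*d) <= C + delta*alpha*g^T d, then update (C, Q).
    Initialize with C = f(x0), Q = 1 at the first iteration."""
    alpha, gtd = alpha0, g @ d
    while f(x + alpha * d) > C + delta * alpha * gtd and alpha > 1e-16:
        alpha *= shrink                      # backtrack until the nonmonotone Armijo test holds
    f_new = f(x + alpha * d)
    Q_new = eta * Q + 1.0                    # running weight
    C_new = (eta * Q * C + f_new) / Q_new    # weighted average of past function values
    return alpha, C_new, Q_new
```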
In the conic case, at each iteration CD (or MOCO) alternates the explicit ray-minimization and Frank–Wolfe ray-search subproblems on $\mathcal{K}$, with memory and per-iteration cost scaling linearly in the problem dimension, given access to gradients and projections.
6. Memory-Efficient Variants for Large-Scale Semidefinite Programs
Large SDPs, especially in lifted formulations where a vector variable is lifted to a matrix $X = xx^\top$, pose severe memory barriers. The memory-efficient MOCO (Li et al., 2023):
- Works with reduced variables of vector size rather than the full $n \times n$ matrix.
- Maintains a random sketch $S_k = X_k \Omega$ for a fixed Gaussian test matrix $\Omega$, updating it via the rank-one ray-search step whose direction solves the current subproblem (see the sketch below), so that $X_k$ is never formed explicitly.
- Enables recovery of low-rank approximations to $X_k$ using $S_k$ and $\Omega$ with controlled error, reducing the storage cost to $O(nr)$ for a sketch of size $r$.
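A minimal sketch of the sketch bookkeeping, assuming a Nyström-type sketch $S = X\Omega$, rank-one ray-search updates of the form $X \leftarrow \eta X + \theta\, vv^\top$, and a pseudoinverse-based low-rank recovery; the exact sketching and reconstruction used in (Li et al., 2023) may differ in details.

```python
import numpy as np

def init_sketch(n, r, seed=0):
    """Fixed Gaussian test matrix Omega and the sketch S = X @ Omega with X = 0."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, r))
    return Omega, np.zeros((n, r))

def update_sketch(S, Omega, eta, theta, v):
    """Sketch update matching X <- eta*X + theta*v v^T without forming X:
    S <- eta*S + theta * v (v^T Omega)."""
    return eta * S + theta * np.outer(v, v @ Omega)

def recover_low_rank(S, Omega):
    """Nystrom-style low-rank reconstruction X_hat = S (Omega^T S)^+ S^T."""
    return S @ np.linalg.pinv(Omega.T @ S) @ S.T
```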
This scheme preserves convergence for primal and dual certificates and has been validated empirically for matrix completion and phase retrieval tasks, where it achieves solution quality comparable to classic CD and FW but with significantly lower memory usage and runtime (Li et al., 2023).
7. Empirical Performance and Practical Implementation
Numerical experiments (Liu et al., 2019) on unconstrained benchmark suites (80 problems from Andrei's collection and 144 CUTEr problems) and on large-scale SDPs (Li et al., 2023) demonstrate:
- On unconstrained problems, CD matches or outperforms Barzilai-Borwein (BB), spectral BB, CGOPT, and CG_DESCENT in both iteration count and total function/gradient evaluations.
- CD solves all 80 large-scale Andrei problems compared to 76 for its closest fallback variant; in function calls, CD wins on 77% of problems over BB/SBB4.
- In SDPs, memory-efficient MOCO (and MOCO with greedy rank-update steps) achieves the lowest primal error versus time for large-scale matrix completion (dimensions up to 1600) and recovers high-dimensional images in lifted phase retrieval at half the runtime and memory of naive approaches.
Algorithmic cost per iteration involves only a single gradient, several inner products, and sparse matrix-vector updates in standard settings; in memory-optimized SDP variants, the extra cost is limited to operations on the sketch of size $r$.
Safeguard and tuning parameters (the clamping bounds on the conic-model quantities, the stepsize limits, the momentum parameter, and the nonmonotone line-search settings) should be adapted to the problem's scaling to control step lengths, convergence speed, and stability.
References:
- "An Improved Gradient Method with Approximately Optimal Stepsize Based on Conic model for Unconstrained Optimization" (Liu et al., 2019)
- "Conic Descent Redux for Memory-Efficient Optimization" (Li et al., 2023)