Momentum-Conic Descent (MOCO) Optimization
- Momentum-Conic Descent (MOCO) is an advanced optimization method for convex conic programs enhanced by heavy-ball momentum to boost convergence in both primal and dual formulations.
- It employs a geometric ray-search strategy that alternates between ray minimization and a Frank–Wolfe-type subproblem for efficient descent over closed convex cones.
- MOCO integrates preconditioning and memory-efficient sketching techniques, making it highly effective for large-scale semidefinite programming in signal processing and machine learning.
Momentum Conic Descent (MOCO) is an advanced first-order optimization method designed for convex conic programs where the objective is minimized over a closed convex cone. MOCO generalizes the original Conic Descent (CD) algorithm by incorporating a heavy-ball momentum term, yielding enhanced convergence rates and efficiency in both primal and dual formulations. This algorithm is particularly relevant for large-scale semidefinite programming (SDP) problems in signal processing and machine learning, and introduces innovations in stopping criteria, preconditioning, and memory-efficient computation for low-rank solutions (Li et al., 2023).
1. Primal and Dual Formulation of Conic Programs
Consider the convex conic program:

$$\min_{x \in K} f(x),$$

where $K \subseteq \mathbb{R}^n$ is a closed convex cone and $f$ is convex and differentiable. The equivalent unconstrained formulation leverages the indicator function $I_K$:

$$\min_{x} \; f(x) + I_K(x).$$
The Fenchel dual function is expressed as:

$$d(\lambda) = -f^*(\lambda),$$

resulting in the dual problem:

$$\max_{\lambda \in K^*} \; -f^*(\lambda),$$

where $f^*(\lambda) = \sup_x \langle \lambda, x \rangle - f(x)$ is the convex conjugate of $f$, and $K^* = \{\lambda : \langle \lambda, x \rangle \ge 0 \ \forall x \in K\}$ is the dual cone. Strong duality holds under mild regularity conditions such as Slater's condition.
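As a concrete check of this duality, consider a toy instance on the self-dual cone $K = \mathbb{R}^2_+$ with $f(x) = \tfrac{1}{2}\|x-b\|^2$ (a hypothetical example, not from the paper); primal and dual optima coincide and complementary slackness holds:

```python
import numpy as np

# Toy conic program: minimize f(x) = 0.5*||x - b||^2 over the cone K = R^2_+.
# Its Fenchel dual is  max_{lam in K*} -f*(lam)  with  f*(lam) = 0.5*||lam||^2 + <lam, b>,
# and K* = R^2_+ (the nonnegative orthant is self-dual).
b = np.array([1.0, -1.0])

# Primal solution: projection of b onto K.
x_star = np.maximum(b, 0.0)
primal = 0.5 * np.dot(x_star - b, x_star - b)

# Dual solution: maximize -0.5*||lam||^2 - <lam, b> over lam >= 0  =>  lam* = max(-b, 0).
lam_star = np.maximum(-b, 0.0)
dual = -0.5 * np.dot(lam_star, lam_star) - np.dot(lam_star, b)

print(primal, dual)              # strong duality: both equal 0.5
print(np.dot(lam_star, x_star))  # complementary slackness: 0.0
```

Note that at the optimum $\lambda^\star = \nabla f(x^\star) \in K^*$ with $\langle \lambda^\star, x^\star \rangle = 0$, which is exactly the conic KKT condition.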
2. Geometric Ray-Search Intuition and Algorithmic Structure
Every $x \in K \setminus \{0\}$ admits the representation $x = \eta u$, with $u \in K$, $\|u\| = 1$, and scalar $\eta \ge 0$. The algorithm first solves a univariate problem along each ray:

$$\varphi(u) = \min_{\eta \ge 0} f(\eta u).$$

Finding the optimal $u$ reduces to a compact search over directions on the cone. Conic Descent alternates between ray minimization and a Frank–Wolfe-type subproblem for ray search:
- Ray minimization: $\eta_t = \arg\min_{\eta \ge 0} f(\eta x_t)$.
- Ray search: $u_t = \arg\min_{u \in K, \|u\| \le 1} \langle g_t, u \rangle$, with $g_t = \nabla f(\eta_t x_t)$ as the descent direction.
MOCO extends this by incorporating a momentum term via heavy-ball averaging for $g_t$, enhancing descent speed.
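The ray decomposition and the two subproblems can be illustrated on the nonnegative orthant, where both admit closed forms (toy quadratic objective assumed, not the paper's):

```python
import numpy as np

# Illustration of the ray decomposition x = eta * u on K = R^n_+ with the
# hypothetical toy objective f(x) = 0.5 * ||x - b||^2.
b = np.array([2.0, 1.0, -1.0])

def ray_minimize(u, b):
    """min_{eta >= 0} f(eta * u) for unit-norm u; closed form for this quadratic."""
    return max(np.dot(u, b), 0.0)

def ray_search(g):
    """Frank-Wolfe-type subproblem: argmin_{u in K, ||u|| <= 1} <g, u>.
    On the nonnegative orthant, only the negative entries of g can help."""
    neg = np.maximum(-g, 0.0)
    n = np.linalg.norm(neg)
    return neg / n if n > 0 else np.zeros_like(g)

u0 = np.array([1.0, 0.0, 0.0])  # start along the first coordinate ray
eta = ray_minimize(u0, b)       # eta = 2.0
x = eta * u0
g = x - b                       # gradient of f at x: (0, -1, 1)
u1 = ray_search(g)              # new descent direction: (0, 1, 0)
print(eta, u1)
```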
3. Momentum-Conic Descent (MOCO) Algorithm
MOCO iteratively updates both the search direction and scaling using momentum-augmented gradients. The principal steps per iteration are:
- Ray Minimization: $\eta_t = \arg\min_{\eta \ge 0} f(\eta x_t)$, setting $y_t = \eta_t x_t$.
- Momentum Update: $g_t = (1 - \delta_t)\, g_{t-1} + \delta_t\, \nabla f(y_t)$, with $\delta_t \in (0, 1]$.
- Frank–Wolfe Subproblem (Ray Search): $u_t = \arg\min_{u \in K, \|u\| \le 1} \langle g_t, u \rangle$.
- Step-Size Line Search: $\theta_t = \arg\min_{\theta \ge 0} f(y_t + \theta u_t)$.
- Primal Update: $x_{t+1} = y_t + \theta_t u_t = \eta_t x_t + \theta_t u_t$.
At termination after $T$ iterations, the solution is given by $\hat{x} = \eta_T x_T$.
Key MOCO equations:
- Conic dual: $\max_{\lambda \in K^*} -f^*(\lambda)$
- Descent direction: $u_t = \arg\min_{u \in K, \|u\| \le 1} \langle g_t, u \rangle$
- Heavy-ball momentum: $g_t = (1 - \delta_t) g_{t-1} + \delta_t \nabla f(y_t)$
- Primal update: $x_{t+1} = \eta_t x_t + \theta_t u_t$
4. Convergence Rates and Proof Sketches
Convergence analysis for MOCO under strict convexity and Lipschitz gradient conditions shows:
- Primal Rate: $f(x_t) - f(x^\star) = O(1/t)$, where a nonnegative momentum term quantifies the additional reduction relative to momentum-free CD.
- Dual Rate: the dual residual decays as $O(1/\sqrt{t})$.
Thus, an $\epsilon$-approximate KKT point is obtained in $O(1/\epsilon^2)$ iterations.
The proof leverages Bregman-type lower bounds built from linearizations of $f$, and invokes a generalization of Nesterov's lemma to relate primal and dual gaps.
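The $O(1/\epsilon^2)$ iteration count follows directly from inverting the $O(1/\sqrt{t})$ dual rate; a two-line sanity check (assuming a residual bound of the hypothetical form $c/\sqrt{t}$):

```python
import math

# If the dual residual is bounded by c / sqrt(t), it drops below eps
# once t >= (c / eps)^2, i.e., O(1/eps^2) iterations suffice.
def iterations_for_eps(c, eps):
    """Smallest t with c / sqrt(t) <= eps."""
    return math.ceil((c / eps) ** 2)

print(iterations_for_eps(1.0, 0.1))   # 100
print(iterations_for_eps(1.0, 0.01))  # 10000
```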
5. Stopping Criterion and Preconditioning Techniques
Direct computation of the dual residual $\mathrm{dist}(\nabla f(x_t), K^*)$ requires a projection onto $K^*$, which is often computationally expensive. Instead, MOCO reuses the multiplier produced by the ray-search subproblem: one can verify that its optimal value satisfies

$$\min_{u \in K, \|u\| \le 1} \langle g_t, u \rangle = -\mathrm{dist}(g_t, K^*),$$

so the residual is available at no extra cost, with a guaranteed $O(1/\sqrt{t})$ decay rate. Termination is certified when the residual falls below a tolerance $\epsilon$, yielding an $\epsilon$-approximate KKT point.
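On a self-dual cone such as the nonnegative orthant, this identity between the ray-search value and the negated dual residual is easy to check numerically (toy vector; orthant assumed for the closed forms):

```python
import numpy as np

# On the self-dual cone K = K* = R^n_+:
#   min_{u in K, ||u|| <= 1} <g, u> = -dist(g, K*),
# so the stopping certificate comes for free from the subproblem already solved.
g = np.array([0.3, -1.2, 0.5, -0.4])

# Expensive route: project g onto K* and measure the distance.
proj = np.maximum(g, 0.0)
dual_residual = np.linalg.norm(g - proj)

# Cheap route: optimal value of the ray-search subproblem (closed form on orthant).
neg = np.maximum(-g, 0.0)
subproblem_value = -np.linalg.norm(neg)

print(dual_residual, -subproblem_value)  # identical
```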
Preconditioning by a linear change of variables $x = Pz$ can sharply reduce the dual error constant. An appropriately chosen positive-definite $P$ that balances the Hessian of $f$ against the cone geometry minimizes the constant in the dual rate, thereby accelerating convergence.
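A minimal sketch of the effect, assuming a diagonal quadratic and a diagonal preconditioner (which maps the orthant onto itself, so the cone constraint is preserved):

```python
import numpy as np

# Preconditioning x = P z for f(x) = 0.5 x^T A x - b^T x with diagonal A > 0.
# A diagonal P keeps K = R^n_+ invariant while rebalancing the Hessian.
a = np.array([100.0, 1.0, 0.25])  # diagonal of the Hessian A
cond_before = a.max() / a.min()   # condition number ~400

P = 1.0 / np.sqrt(a)              # diagonal preconditioner P = A^{-1/2}
hess_z = P * a * P                # Hessian in z-coordinates: P^T A P = I
cond_after = hess_z.max() / hess_z.min()

print(cond_before, cond_after)    # ~400 -> ~1
```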
6. Memory-Efficient MOCO for SDP with Low-Rank Structure
MOCO adapts to large-scale semidefinite programs (SDPs) of the form

$$\min_{X \succeq 0} f(\mathcal{A}(X)),$$

with $\mathcal{A}$ linear and $f$ having an $L$-Lipschitz gradient. To circumvent storing $X \in \mathbb{S}^n_+$, MOCO maintains:
- A random sketch $S = X\Omega$, using a fixed Gaussian $\Omega \in \mathbb{R}^{n \times r}$ with $r \ll n$.
The affine update for the sketch mirrors the primal update:

$$S_{t+1} = \eta_t S_t + \theta_t\, v_t (v_t^\top \Omega),$$

where $v_t$ is the minimal eigenvector of the momentum-averaged gradient matrix. Each Frank–Wolfe iteration costs roughly one extreme-eigenpair computation via Lanczos, with total memory $O(nr)$. Recovery of an $\epsilon$-accurate $\hat{X}$ from the sketch is controlled by the true rank and the excess singular values, provided the sketch size $r$ exceeds the rank of the solution.
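The sketch maintenance can be illustrated as follows (the sizes $n$, $r$ and the update coefficients are hypothetical; the recovery formula is the standard Nyström reconstruction, which is exact when $\mathrm{rank}(X) \le r$):

```python
import numpy as np

# Maintaining S = X @ Omega under rank-one updates X <- eta*X + theta * v v^T,
# without ever storing the n x n matrix X itself.
rng = np.random.default_rng(0)
n, r = 50, 8                      # ambient dimension and sketch size (assumed)
Omega = rng.standard_normal((n, r))

S = np.zeros((n, r))              # sketch of X = 0
X = np.zeros((n, n))              # kept here only to verify the sketch

for _ in range(5):                # five rank-one conic updates
    v = rng.standard_normal(n)
    eta, theta = 0.9, 0.5
    X = eta * X + theta * np.outer(v, v)          # what we avoid storing
    S = eta * S + theta * np.outer(v, v @ Omega)  # O(n r) sketch update

# Nystrom-style recovery: X_hat = S (Omega^T S)^+ S^T, exact since rank(X) = 5 <= 8.
X_hat = S @ np.linalg.pinv(Omega.T @ S) @ S.T
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # ~0
```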
7. Empirical Performance and Practical Guidelines
Numerical experiments demonstrate MOCO's effectiveness on lifted SDP problems such as matrix completion and phase retrieval:
- For matrix completion (recovery of a low-rank matrix from noisy, partial entries), MOCO and CD have comparable runtime–primal-error profiles, but the greedy-accelerated variant MOCOg outperforms all methods at large problem dimensions.
- For phase retrieval (rank-1 SDP lifted from quadratic measurements), MOCOg and a heuristic step-size variant (MOCOh) match or surpass CDg in visual quality and runtime-loss performance, significantly outperforming standard Frank–Wolfe approaches.
Noteworthy practical observations include:
- Momentum-augmented Frank–Wolfe within the conic framework (MOCO) yields tighter convergence bounds, owing to the extra reduction contributed by the nonnegative momentum term.
- The stopping criterion is efficiently computed and directly certifies dual feasibility.
- Preconditioners significantly reduce dual residual constants, allowing earlier termination.
- The memory-efficient variant using sketching is effective for large-scale SDP with rigorous low-rank recovery.
- Greedy acceleration (a Burer–Monteiro-style step) and heuristic step-size selection (as in MOCOh) expedite convergence without substantial additional memory cost (Li et al., 2023).