Cyclic Accelerated Coordinate Descent (CACD)
- CACD is a deterministic accelerated algorithm that employs fixed cyclic updates with Nesterov momentum to optimize block-wise smooth convex objectives.
- Its fixed cyclic order makes runs reproducible and improves data locality, while performance estimation frameworks clarify its worst-case bounds.
- Extensions like A-CODER adapt CACD for composite and finite-sum problems, offering robust solutions in cache-optimized and structured settings.
Cyclic Accelerated Coordinate Descent (CACD) is a class of first-order optimization algorithms designed for unconstrained minimization of (block-)coordinate-wise smooth convex functions. Unlike classical coordinate descent schemes, which may select coordinates randomly or greedily, CACD performs accelerated updates in a fixed cyclic order over blocks or coordinates. The key innovation is the combination of deterministic cyclic sampling—ensuring reproducibility and favorable data locality—with Nesterov-type acceleration schemes, which theoretically afford faster rates in randomized settings. Despite their practical appeal and widespread use, the theoretical properties of deterministic cyclic accelerated methods remain less understood than their randomized counterparts; recent worst-case analyses and performance estimation paradigms have significantly clarified the attainable rates, revealing both fundamental limitations and sharp upper bounds (Kamri et al., 22 Jul 2025, Wright, 2015, Lin et al., 2023).
1. Algorithmic Structure of CACD
The canonical setup assumes a partition of the variable space $\mathbb{R}^d$ into $p$ blocks, a smooth convex objective $f$ with block-wise Lipschitz constants $L_1, \dots, L_p$, and a cyclic scheduling policy. The CACD iteration maintains a pair of sequences (the “main” and “momentum” points) together with a momentum parameter, and applies at each step a coordinate or block update to the single block selected by the cyclic schedule. Each “cycle” consists of $p$ such updates, one per block. This structure mirrors Nesterov’s acceleration but replaces randomized coordinate selection with a deterministic sweep (Kamri et al., 22 Jul 2025, Wright, 2015).
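The iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the exact scheme analyzed in the cited works: it borrows the three-sequence form commonly used for randomized accelerated coordinate descent and simply replaces the random block index with `k % p`. The names `cacd_sketch`, `grad_block`, `blocks`, and `L` are hypothetical.

```python
import numpy as np

def cacd_sketch(grad_block, blocks, L, x0, num_updates):
    """Cyclic accelerated block coordinate descent (assumed, illustrative form).

    grad_block(y, i) returns the partial gradient of f at y w.r.t. block i.
    blocks[i] is an index array for block i; L[i] is its Lipschitz constant.
    """
    p = len(blocks)
    x = np.asarray(x0, dtype=float).copy()   # "main" sequence
    z = x.copy()                             # "momentum" sequence
    theta = 1.0 / p                          # momentum parameter (assumed schedule)
    for k in range(num_updates):
        i = k % p                            # fixed cyclic order over blocks
        idx = blocks[i]
        y = (1.0 - theta) * x + theta * z    # extrapolated point
        step = grad_block(y, i) / (p * theta * L[i])
        x = y.copy()
        x[idx] -= p * theta * step           # same as y[idx] - grad / L[i]
        z[idx] -= step                       # larger step on the momentum point
        theta = 0.5 * (np.sqrt(theta**4 + 4.0 * theta**2) - theta**2)
    return x

# Usage on a small quadratic f(x) = 0.5 * x^T A x with two coordinate blocks.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
blocks = [np.array([0]), np.array([1])]
L = [A[0, 0], A[1, 1]]
x_final = cacd_sketch(lambda y, i: (A @ y)[blocks[i]], blocks, L,
                      np.array([1.0, 0.0]), num_updates=200)
print(0.5 * x_final @ A @ x_final)
```

With a single block (`p = 1`) the sketch reduces to a standard accelerated gradient method, which is a convenient sanity check before trying multi-block orderings.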
2. Interpolation Conditions and PEP-based Analysis
To rigorously characterize the worst-case performance of CACD and related block coordinate schemes, the Performance Estimation Problem (PEP) framework is employed (Kamri et al., 22 Jul 2025). This approach introduces necessary (and for pairs, sufficient) interpolation inequalities for the class of block-smooth convex functions. These inequalities, imposed over the points and blocks involved, encode both convexity and block-smoothness. The PEP is then instantiated as a finite-dimensional SDP that yields sharp numerical bounds on the worst-case error. Crucially, PEP-based bounds enable the discovery of scale-invariance properties and precise quantification of the dependence on the number of blocks.
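As an illustration of the kind of inequality involved, the condition below follows from convexity together with block-wise smoothness and is the block analogue of the classical smooth-convex interpolation inequality. The notation is an assumption made here and may differ from the cited paper: $f_i$ and $g_i$ denote the function value and gradient at a point $x_i$, $g_i^{(\ell)}$ is the $\ell$-th block of $g_i$, and $L_\ell$ is the Lipschitz constant of block $\ell$.

```latex
f_i \;\ge\; f_j + \langle g_j,\, x_i - x_j \rangle
      + \frac{1}{2L_\ell}\,\bigl\| g_i^{(\ell)} - g_j^{(\ell)} \bigr\|^2
\qquad \text{for all pairs } (i,j) \text{ and all blocks } \ell .
```

Because each such constraint is linear in the function values and in the Gram matrix of the points and gradients, a collection of them can be embedded in a semidefinite program.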
3. Convergence Rates: Deterministic vs. Randomized
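Randomized Accelerated Coordinate Descent (RACD) achieves the classical accelerated rate $O(1/K^2)$ in expectation using random coordinate choices (Fercoq–Qu 2015). Formally, the expected objective gap after $K$ updates is bounded by a quantity that decays as $1/K^2$, with a constant that depends on the block count $p$ and on a bound $R$ on the initial distance to a minimizer (Kamri et al., 22 Jul 2025).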
By contrast, deterministic cyclic CACD provably fails to attain a worst-case $O(1/K^2)$ convergence rate, where $K$ is the number of updates. PEP-based upper bounds for CACD decay at a rate between $O(1/K)$ and $O(1/K^2)$, but the $K^2$-rescaled errors are non-decreasing in $K$, which rules out an accelerated guarantee. The fundamental lower bound for deterministic cyclic methods matches that of gradient descent multiplied by the block count, and no cyclic scheme in the step-size range considered can uniformly beat it (Kamri et al., 22 Jul 2025). Acceleration with cyclic ordering therefore does not match the behavior of the randomized protocol.
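The rescaling argument can be turned into a simple diagnostic for any sequence of worst-case or observed errors. The sketch below is only a heuristic check, and the helper names are hypothetical: a sequence that truly decays as $O(1/K^2)$ keeps $K^2 \cdot \text{err}_K$ bounded, so rescaled errors that keep growing are evidence against an accelerated rate.

```python
import numpy as np

def rescaled_errors(errors):
    """Return K^2 * err_K for K = 1, ..., len(errors)."""
    errors = np.asarray(errors, dtype=float)
    K = np.arange(1, len(errors) + 1)
    return K**2 * errors

def is_nondecreasing(values, tol=1e-12):
    """True if the sequence never drops by more than tol between entries."""
    return bool(np.all(np.diff(values) >= -tol))

# Example: an O(1/K) sequence has rescaled errors that grow linearly in K.
K = np.arange(1, 11)
print(is_nondecreasing(rescaled_errors(1.0 / K)))
```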
4. Structural Phenomena and Scalability
Several key phenomena revealed by PEP analyses are:
- Scale-Invariance: The worst-case performance of cyclic CD with per-block step-sizes, measured in the corresponding scaled norm, depends on the block Lipschitz constants only through the normalized step-sizes; rescaling each block accordingly reduces the problem to the uniform case (Kamri et al., 22 Jul 2025).
- Linear Dependence on Block Count: Empirically, PEP upper bounds for cyclic CD grow linearly (not cubically) with the number of blocks, refining earlier, more pessimistic analytical results.
- Descent Lemma: One can prove a double-sided decrease inequality on the objective across block updates, yielding a bound on the partial-gradient norm. This illuminates the global structure of progress under cyclic updates.
This suggests that while CACD does not admit the same acceleration as in randomized settings, its worst-case performance scales more gracefully with problem dimension than previously believed.
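The scale-invariance property can be checked by hand on a quadratic. The sketch below assumes the objective $f(x) = \tfrac{1}{2} x^\top A x$ with single-coordinate blocks, so that the coordinate-wise Lipschitz constants are the diagonal entries of $A$; the function name `normalize_blocks` is hypothetical. After the change of variables $u_i = \sqrt{L_i}\, x_i$, every coordinate constant equals 1, and a step of size $\gamma_i$ in $x$ corresponds to a step of size $\gamma_i L_i$ in $u$.

```python
import numpy as np

def normalize_blocks(A):
    """Rescale coordinates of f(x) = 0.5 x^T A x so that every
    coordinate-wise Lipschitz constant (diagonal of A) becomes 1."""
    L = np.diag(A).copy()
    D = 1.0 / np.sqrt(L)
    # Hessian expressed in the variables u_i = sqrt(L_i) * x_i
    return A * np.outer(D, D), L

A = np.array([[4.0, 1.0], [1.0, 9.0]])
A_scaled, L = normalize_blocks(A)

# One coordinate step in x with step-size gamma ...
x, i, gamma = np.array([1.0, 2.0]), 0, 0.1
x_new_i = x[i] - gamma * (A @ x)[i]

# ... matches one step in u with the normalized step-size gamma * L[i].
u = np.sqrt(L) * x
u_new_i = u[i] - gamma * L[i] * (A_scaled @ u)[i]
print(x_new_i, u_new_i / np.sqrt(L[i]))
```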
5. Algorithmic Extensions and Composite Optimization
Recent developments include the Accelerated Cyclic Coordinate Dual Averaging with Extrapolation (A-CODER), which extends CACD to composite convex problems, supporting block-separable nonsmooth regularizers via proximal oracles (Lin et al., 2023). In this framework, dual averaging and gradient extrapolation are combined over cyclic block passes, with an adaptive momentum schedule maintaining accelerated rates. For finite-sum problems, variance-reduced variants (VR-A-CODER) deliver state-of-the-art complexity, matching the ergodic rates of Katyusha-type schemes but without stochastic coordinate selection.
A summary of extensions is provided below:
| Variant | Problem Setting | Rate (Convex) |
|---|---|---|
| Classical CACD | smooth, unconstrained | Between $O(1/K)$ and $O(1/K^2)$ (worst-case) |
| RACD (randomized) | smooth, unconstrained | $O(1/K^2)$ (expected) |
| A-CODER | composite, block-separable | Accelerated |
| VR-A-CODER | composite, finite-sum | Matches Katyusha-type ergodic rates |
(Lin et al., 2023, Kamri et al., 22 Jul 2025)
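To make the notion of a proximal oracle for a block-separable regularizer concrete, the sketch below shows a generic proximal block step for an $\ell_1$ penalty. It is not A-CODER itself (there is no dual averaging or extrapolation here); the choice of the $\ell_1$ regularizer and the names `soft_threshold` and `prox_block_step` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_block_step(x, grad_i, idx, L_i, lam):
    """One proximal gradient step on block idx for f(x) + lam * ||x||_1.

    grad_i is the partial gradient of the smooth part f w.r.t. block idx,
    and L_i is the Lipschitz constant of that block.
    """
    x = x.copy()
    x[idx] = soft_threshold(x[idx] - grad_i / L_i, lam / L_i)
    return x

x = np.array([1.0, 1.0])
x_next = prox_block_step(x, grad_i=np.array([0.5]), idx=np.array([0]),
                         L_i=1.0, lam=0.2)
print(x_next)
```

Because the regularizer is separable across blocks, the oracle touches only the block being updated, which is what allows it to be combined with a cyclic sweep.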
6. Practical and Implementation Considerations
CACD and its variants provide fully deterministic schemes, appealing in settings where reproducibility, memory access patterns, or data locality are important (e.g., embedded or cache-optimized systems). Momentum vectors and auxiliary sequences can be maintained efficiently by exploiting sparsity and avoiding redundant calculations. Empirically, CACD often matches or exceeds RACD on well-structured convex problems where the block ordering synergizes with the problem's coupling pattern; RACD may be preferable when heterogeneity or parallelization demands are dominant (Wright, 2015).
Although worst-case rates are weaker for cyclic acceleration, deterministic schemes often converge faster in practical regimes. In adversarially coupled or pathological cases they may slow considerably, but under the standard convexity and block-smoothness assumptions they do not stall outright.
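One standard way to avoid redundant calculations in coordinate methods is to cache a residual and update it incrementally. The sketch below illustrates this for plain (non-accelerated) cyclic coordinate descent on a least-squares objective, chosen here only as an example; the function name `cyclic_cd_least_squares` is hypothetical, and the same caching idea would have to be adapted to the auxiliary sequences of an accelerated variant.

```python
import numpy as np

def cyclic_cd_least_squares(A, b, num_sweeps):
    """Plain cyclic CD on 0.5 * ||A x - b||^2 with a cached residual."""
    m, n = A.shape
    x = np.zeros(n)
    r = -b.astype(float)                 # residual r = A x - b at x = 0
    L = np.sum(A * A, axis=0)            # coordinate Lipschitz constants ||A[:, j]||^2
    for _ in range(num_sweeps):
        for j in range(n):               # fixed cyclic order over coordinates
            if L[j] == 0.0:
                continue                 # empty column: nothing to update
            delta = -(A[:, j] @ r) / L[j]
            x[j] += delta
            r += delta * A[:, j]         # O(m) residual update, no full A @ x
    return x, r

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
x, r = cyclic_cd_least_squares(A, b, num_sweeps=50)
print(x, 0.5 * r @ r)
```

Each coordinate update costs one column inner product and one column update, so a sparse column makes the step proportionally cheaper.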
7. Research Directions and Open Problems
Despite recent advances in worst-case analysis—especially via the PEP approach—the precise convergence behavior of cyclic accelerated schemes remains partially open, especially for non-uniform block structures, dependence on block correlations, and adaptive step-size schedules. The extension of acceleration to nonconvex, large-scale, and composite optimization with deterministic cyclic orderings is receiving increased attention. Further closing the gap between practical performance and theoretical lower bounds, particularly for “intermediate” problem structures between the worst-case and typical cases encountered in applications, represents a central direction for future research (Kamri et al., 22 Jul 2025, Lin et al., 2023).