Cyclic Accelerated Coordinate Descent (CACD)

Updated 30 March 2026
  • CACD is a deterministic accelerated algorithm that employs fixed cyclic updates with Nesterov momentum to optimize block-wise smooth convex objectives.
  • Its fixed cyclic order gives reproducible runs and good data locality, while performance estimation frameworks clarify its worst-case bounds.
  • Extensions like A-CODER adapt CACD for composite and finite-sum problems, offering robust solutions in cache-optimized and structured settings.

Cyclic Accelerated Coordinate Descent (CACD) is a class of first-order optimization algorithms designed for unconstrained minimization of (block-)coordinate-wise smooth convex functions. Unlike classical coordinate descent schemes, which may select coordinates randomly or greedily, CACD performs accelerated updates in a fixed cyclic order over blocks or coordinates. The key innovation is the combination of deterministic cyclic sampling—ensuring reproducibility and favorable data locality—with Nesterov-type acceleration schemes, which theoretically afford faster rates in randomized settings. Despite their practical appeal and widespread use, the theoretical properties of deterministic cyclic accelerated methods remain less understood than their randomized counterparts; recent worst-case analyses and performance estimation paradigms have significantly clarified the attainable rates, revealing both fundamental limitations and sharp upper bounds (Kamri et al., 22 Jul 2025, Wright, 2015, Lin et al., 2023).

1. Algorithmic Structure of CACD

The canonical setup assumes a partition $\mathbb{R}^d = \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_p}$, a smooth convex objective $f:\mathbb{R}^d\to\mathbb{R}$ with block-wise Lipschitz constants $L_1, \dots, L_p$, and a cyclic scheduling policy. The CACD iteration maintains a pair of sequences (“main” and “momentum” points, $x$ and $z$) and a momentum parameter $\theta_i$, and applies at each step $i$ a coordinate or block update for block $\ell = (i \bmod p) + 1$:

\begin{align*}
y_{i-1} &= (1-\theta_{i-1})\, x_{i-1} + \theta_{i-1}\, z_{i-1}, \\
z_i &= z_{i-1} - \frac{\gamma_\ell}{p\, \theta_{i-1}}\, U_\ell \nabla^{(\ell)} f(y_{i-1}), \\
x_i &= y_{i-1} + p\, \theta_{i-1} (z_i - z_{i-1}), \\
\theta_{i} &= \frac{ \sqrt{ \theta_{i-1}^4 + 4 \theta_{i-1}^2 } - \theta_{i-1}^2 }{2},
\end{align*}

where $\gamma_\ell = 1/L_\ell$ and $U_\ell$ selects block $\ell$. Each “cycle” consists of $p$ such updates. This structure mirrors Nesterov’s acceleration but replaces randomized coordinate selection with a deterministic sweep (Kamri et al., 22 Jul 2025, Wright, 2015).
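The four update rules above can be sketched in NumPy for a convex quadratic $f(x) = \tfrac12 x^\top A x - b^\top x$. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `cacd_quadratic` and `objective`, the zero starting point, and the initial momentum $\theta_0 = 1/p$ (a choice borrowed from randomized accelerated schemes; the text above does not specify an initialization) are all assumptions.

```python
import numpy as np

def cacd_quadratic(A, b, blocks, n_cycles, theta0=None):
    """Cyclic accelerated block updates on f(x) = 0.5 x^T A x - b^T x.

    `blocks` is a list of index lists partitioning range(d).
    """
    d, p = A.shape[0], len(blocks)
    # Block Lipschitz constant L_l = largest eigenvalue of the diagonal block A_ll.
    L = [np.linalg.eigvalsh(A[np.ix_(blk, blk)])[-1] for blk in blocks]
    theta = 1.0 / p if theta0 is None else theta0  # assumed initial momentum
    x = np.zeros(d)  # "main" sequence
    z = np.zeros(d)  # "momentum" sequence
    for i in range(1, n_cycles * p + 1):
        ell = i % p  # 0-based counterpart of l = (i mod p) + 1
        blk = blocks[ell]
        y = (1.0 - theta) * x + theta * z
        grad_blk = A[blk] @ y - b[blk]  # block-l partial gradient at y
        z_new = z.copy()
        z_new[blk] -= grad_blk / (L[ell] * p * theta)  # gamma_l = 1 / L_l
        x = y + p * theta * (z_new - z)
        z = z_new
        theta = (np.sqrt(theta**4 + 4.0 * theta**2) - theta**2) / 2.0
    return x

def objective(A, b, x):
    return 0.5 * x @ A @ x - b @ x
```

With a single block (`blocks=[[0, 1, ...]]`, so $p = 1$) the loop reduces to a standard accelerated gradient method, which is a convenient sanity check before trying multi-block partitions.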

2. Interpolation Conditions and PEP-based Analysis

To rigorously characterize the worst-case performance of CACD and related block coordinate schemes, the Performance Estimation Problem (PEP) framework is employed (Kamri et al., 22 Jul 2025). This approach introduces necessary (and for pairs, sufficient) interpolation inequalities for the class $\mathcal{F}_{\mathrm{coord}}$ of block-smooth convex functions:

$$f_i \geq f_j + \langle g_j, x_i - x_j \rangle + \frac{1}{2 L_\ell} \| g_i^{(\ell)} - g_j^{(\ell)} \|^2,$$

for all $i, j$ and $\ell$, where $g_i^{(\ell)}$ denotes the block-$\ell$ part of the gradient $g_i$ at $x_i$. The inner product runs over all coordinates (a block-restricted inner product would already fail for linear functions), while the quadratic term involves only block $\ell$; together they encode both convexity and block-smoothness. The PEP is then instantiated as a finite-dimensional SDP that yields sharp numerical bounds on the worst-case error. Crucially, PEP-based bounds enable the discovery of scale-invariance properties and precise quantification of the dependence on the number of blocks.
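As a quick numerical sanity check of the interpolation inequality, the sketch below samples points on a random convex quadratic $f(x) = \tfrac12 x^\top A x$ and reports the smallest slack. It uses the form with the full inner product $\langle g_j, x_i - x_j \rangle$ and the block-$\ell$ gradient difference, which is the form that follows from applying a single block gradient step to $f(\cdot) - f_j - \langle g_j, \cdot - x_j\rangle$. The helper name `min_interpolation_slack` and the test function are assumptions made for illustration; this is not part of any PEP toolbox.

```python
import numpy as np

def min_interpolation_slack(A, blocks, n_points=15, seed=0):
    """Smallest value of
        f_i - f_j - <g_j, x_i - x_j> - ||g_i^(l) - g_j^(l)||^2 / (2 L_l)
    over sampled pairs (i, j) and blocks l, for f(x) = 0.5 x^T A x with A PSD.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    # L_l = largest eigenvalue of the diagonal block A_ll.
    L = [np.linalg.eigvalsh(A[np.ix_(blk, blk)])[-1] for blk in blocks]
    X = rng.standard_normal((n_points, d))
    F = 0.5 * np.einsum("nd,de,ne->n", X, A, X)  # function values f_i
    G = X @ A                                     # gradients g_i (A symmetric)
    worst = np.inf
    for i in range(n_points):
        for j in range(n_points):
            base = F[i] - F[j] - G[j] @ (X[i] - X[j])
            for blk, L_blk in zip(blocks, L):
                diff = G[i][blk] - G[j][blk]
                worst = min(worst, base - diff @ diff / (2.0 * L_blk))
    return worst
```

A non-negative result (up to rounding) is expected for any positive semidefinite `A`; the slack is exactly zero for pairs with `i == j`.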

3. Convergence Rates: Deterministic vs. Randomized

Randomized Accelerated Coordinate Descent (RACD) achieves the classical $O(1/N^2)$ rate in expectation using random coordinate choices (Fercoq–Qu 2015). Formally,

$$\mathbb{E}[f(x_N) - f^*] \leq \frac{4p^2}{(N-1+2p)^2} R^2,$$

where $N$ is the number of updates, $p$ the block count, and $R^2$ bounds the initial distance (Kamri et al., 22 Jul 2025).
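To get a feel for the randomized bound, the following sketch evaluates it and inverts it for a target accuracy $\varepsilon$: solving $4p^2R^2/(N-1+2p)^2 \le \varepsilon$ gives $N \ge 2pR/\sqrt{\varepsilon} - 2p + 1$. The function names are hypothetical and the code only restates the formula above.

```python
import math

def racd_bound(n_updates, p, r2):
    """Right-hand side of the bound: 4 p^2 R^2 / (N - 1 + 2p)^2."""
    return 4.0 * p**2 * r2 / (n_updates - 1 + 2 * p) ** 2

def updates_for_accuracy(eps, p, r2):
    """Smallest N (at least 1) for which the bound is at most eps."""
    n = math.ceil(2.0 * p * math.sqrt(r2 / eps) - 2 * p + 1)
    return max(n, 1)
```

Note that the bound equals $R^2$ at $N = 1$ for every $p$, and that the required number of updates grows linearly in $p$, so the number of full passes over the blocks needed for a given accuracy is roughly independent of the block count.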

By contrast, deterministic cyclic CACD provably fails to attain a $1/K^2$ worst-case convergence rate. PEP-based upper bounds for CACD decay at a rate between $O(1/K)$ and $O(1/K^2)$, but the $K^2$-rescaled errors are non-decreasing in $K$, which directly rules out an accelerated $O(1/K^2)$ guarantee. The fundamental lower bound for deterministic cyclic methods matches that of gradient descent multiplied by the block count: no cyclic scheme with step-sizes $1/L_\ell$ can uniformly beat $O(1/K)$ (Kamri et al., 22 Jul 2025). Therefore, acceleration with cyclic ordering does not match the behavior of the randomized protocol.

4. Structural Phenomena and Scalability

Several key phenomena revealed by PEP analyses are:

  • Scale-Invariance: The worst-case performance of cyclic CD with per-block steps $\gamma_\ell/L_\ell$, measured in the norm $\|\cdot\|_L$, depends only on $p$ and the normalized steps $\{\gamma_\ell\}$; rescaling each block by $\sqrt{L_\ell}$ reduces the problem to the uniform case $L_1 = \cdots = L_p = 1$ (Kamri et al., 22 Jul 2025).
  • Linear Dependence on Block Count: Empirically, PEP upper bounds for cyclic CD grow linearly (not cubically) with $p$, refining earlier, more pessimistic analytical results.
  • Descent Lemma: For $p=2$ blocks, one can prove a double-sided decrease inequality on the objective, yielding an $O(1/K)$ bound on the partial-gradient norm. This illuminates the global structure of progress under cyclic updates.

This suggests that while CACD does not admit the same acceleration as in randomized settings, its worst-case performance scales more gracefully with problem dimension than previously believed.
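The scale-invariance property can be illustrated on a quadratic $f(x) = \tfrac12 x^\top A x$. Under the change of variables $y^{(\ell)} = \sqrt{L_\ell}\, x^{(\ell)}$, the rescaled function has unit block Lipschitz constants, and plain cyclic CD with steps $\gamma/L_\ell$ on $f$ produces the same iterates as cyclic CD with steps $\gamma$ on the rescaled function. The sketch below is an assumption-laden illustration (helper names are hypothetical, and it uses non-accelerated cyclic CD for brevity), not a proof.

```python
import numpy as np

def block_lipschitz(A, blocks):
    """L_l = largest eigenvalue of each diagonal block of A."""
    return np.array([np.linalg.eigvalsh(A[np.ix_(blk, blk)])[-1] for blk in blocks])

def rescale(A, blocks):
    """Hessian of y -> f(S y), where S scales block l by 1 / sqrt(L_l)."""
    L = block_lipschitz(A, blocks)
    s = np.empty(A.shape[0])
    for blk, L_blk in zip(blocks, L):
        s[blk] = 1.0 / np.sqrt(L_blk)
    return A * np.outer(s, s), s  # S A S and the diagonal of S

def cyclic_cd(A, x0, blocks, steps, n_cycles):
    """Plain cyclic block descent on f(x) = 0.5 x^T A x with per-block steps."""
    x = x0.copy()
    for _ in range(n_cycles):
        for blk, step in zip(blocks, steps):
            x[blk] -= step * (A[blk] @ x)  # block partial gradient step
    return x
```

Running `cyclic_cd` on `A` with steps `gamma / L` and on the rescaled Hessian with steps `gamma`, starting from `x0` and `x0 / s` respectively, should give iterates related by `x = s * y`.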

5. Algorithmic Extensions and Composite Optimization

Recent developments include the Accelerated Cyclic Coordinate Dual Averaging with Extrapolation (A-CODER), which extends CACD to composite convex problems, supporting block-separable nonsmooth regularizers via proximal oracles (Lin et al., 2023). In this framework, dual averaging and gradient extrapolation are combined over cyclic block passes, with an adaptive momentum schedule maintaining accelerated rates. For finite-sum problems, variance-reduced variants (VR-A-CODER) deliver state-of-the-art complexity, matching the ergodic rates of Katyusha-type schemes but without stochastic coordinate selection.
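To make the notion of a proximal oracle for a block-separable regularizer concrete, the sketch below implements block soft-thresholding, the proximal operator of $\lambda \sum_\ell \|x^{(\ell)}\|_2$ (a group-lasso penalty chosen here purely as an example). It is not an implementation of A-CODER; the function names and the choice of regularizer are assumptions.

```python
import numpy as np

def prox_group_l2(v, threshold):
    """Prox of threshold * ||.||_2 at v: shrink the norm of v by `threshold`."""
    norm = np.linalg.norm(v)
    if norm <= threshold:
        return np.zeros_like(v)
    return (1.0 - threshold / norm) * v

def prox_block_separable(x, blocks, lam, step):
    """Apply the prox block by block; separability makes the blocks independent."""
    out = x.copy()
    for blk in blocks:
        out[blk] = prox_group_l2(x[blk], lam * step)
    return out
```

Because the regularizer separates across blocks, a cyclic method only needs to evaluate the prox of the block it is currently updating, which keeps the per-step cost proportional to the block size.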

A summary of extensions is provided below:

| Variant | Problem setting | Rate (convex) |
|---|---|---|
| Classical CACD | smooth, unconstrained | Between $O(1/K)$ and $O(1/K^2)$ (worst-case) |
| RACD (randomized) | smooth, unconstrained | $O(1/K^2)$ (expected) |
| A-CODER | composite, block-separable | $O(1/k^2)$ |
| VR-A-CODER | composite, finite-sum | $O(1/k^2)$ |

(Lin et al., 2023, Kamri et al., 22 Jul 2025)

6. Practical and Implementation Considerations

CACD and its variants provide fully deterministic schemes, appealing in settings where reproducibility, memory access patterns, or data locality are important (e.g., embedded, cache-optimized systems). Momentum vectors and auxiliary sequences can be maintained efficiently by exploiting sparsity and avoiding redundant calculations. Empirically, CACD often matches or exceeds RACD on well-structured convex problems where the block ordering synergizes with the problem's coupling pattern; RACD may be preferable when heterogeneity or parallelization demands are dominant (Wright, 2015).
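One common way to avoid redundant calculations in coordinate methods is to keep an auxiliary quantity up to date incrementally rather than recomputing it. The sketch below shows this for plain (non-accelerated) cyclic coordinate descent on least squares, $f(x) = \tfrac12\|Ax - b\|^2$, where the residual $r = Ax - b$ is updated in $O(n)$ per coordinate step. The function name and problem choice are assumptions for illustration; the same idea carries over to the auxiliary sequences of accelerated variants.

```python
import numpy as np

def cyclic_cd_least_squares(A, b, n_cycles):
    """Cyclic coordinate descent on 0.5 * ||A x - b||^2 with a maintained residual."""
    n, d = A.shape
    x = np.zeros(d)
    r = -b.astype(float)             # residual r = A x - b at x = 0
    col_sq = np.sum(A * A, axis=0)   # coordinate Lipschitz constants ||A[:, j]||^2
    for _ in range(n_cycles):
        for j in range(d):           # fixed cyclic order: contiguous column access
            if col_sq[j] == 0.0:
                continue             # zero column: coordinate has no effect
            delta = -(A[:, j] @ r) / col_sq[j]  # step 1 / L_j along coordinate j
            x[j] += delta
            r += delta * A[:, j]     # O(n) residual update instead of recomputing A @ x
    return x, r
```

Each coordinate step touches a single column of `A` and the residual vector, so the fixed sweep order translates directly into a predictable memory access pattern.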

Although the worst-case guarantees for cyclic acceleration are weaker, deterministic schemes are often observed to converge faster in practical regimes. In adversarially coupled or pathological instances, deterministic cyclic schemes may slow down markedly, but under the standard convexity and block-smoothness assumptions they do not stall outright.

7. Research Directions and Open Problems

Despite recent advances in worst-case analysis—especially via the PEP approach—the precise convergence behavior of cyclic accelerated schemes remains partially open, especially for non-uniform block structures, dependence on block correlations, and adaptive step-size schedules. The extension of acceleration to nonconvex, large-scale, and composite optimization with deterministic cyclic orderings is receiving increased attention. Further closing the gap between practical performance and theoretical lower bounds, particularly for “intermediate” problem structures between the worst-case and typical cases encountered in applications, represents a central direction for future research (Kamri et al., 22 Jul 2025, Lin et al., 2023).
