
Anderson-Accelerated Coordinate Descent

Updated 27 April 2026
  • AA-CD applies Anderson acceleration, a nonlinear extrapolation scheme for fixed-point iterations, to coordinate descent in order to speed up convergence.
  • It integrates proximal updates with nonlinear extrapolation, achieving speedups of 2× to 10× on large-scale convex and composite optimization problems.
  • Objective safeguarding and active manifold identification ensure local linear convergence even in nonsmooth or ill-conditioned scenarios.

Anderson-Accelerated Coordinate Descent (AA-CD) denotes the application of Anderson acceleration, a nonlinear extrapolation technique designed to speed up fixed-point methods, to cyclic coordinate descent and proximal coordinate descent. AA-CD has outperformed both traditional first-order methods and inertially accelerated approaches in practice, particularly on large-scale convex and composite optimization problems central to machine learning and signal processing.

1. Optimization Problem Framework

AA-CD addresses composite convex minimization problems of the form

$$\min_{x\in\mathbb{R}^p} F(x) = f(Ax) + \lambda\sum_{j=1}^p g_j(x_j)$$

where:

  • $A \in \mathbb{R}^{n \times p}$,
  • $f: \mathbb{R}^n \to \mathbb{R}$ is convex and typically $L$-smooth,
  • each $g_j: \mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ is proper, closed, and convex, so that the regularizer is separable across coordinates.

Canonical instances include least-squares ($F(x) = \frac12\|Ax-y\|^2$), Lasso ($F(x) = \frac12\|Ax-y\|^2 + \lambda\|x\|_1$), elastic-net ($F(x) = \frac{1}{2n}\|Ax-y\|^2 + \lambda\|x\|_1 + \frac\rho2\|x\|_2^2$), and sparse logistic regression ($F(x) = \sum_{i}\log(1+e^{-y_i(Ax)_i}) + \lambda\|x\|_1$) (Bertrand et al., 2020).
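As a concrete reading of the composite form, the sketch below evaluates two of the canonical objectives with NumPy. The function names and argument order are illustrative choices and are not taken from the cited papers.

```python
import numpy as np


def lasso_objective(A, y, x, lam):
    """F(x) = 0.5 * ||Ax - y||^2 + lam * ||x||_1."""
    residual = A @ x - y
    return 0.5 * residual @ residual + lam * np.abs(x).sum()


def elastic_net_objective(A, y, x, lam, rho):
    """F(x) = ||Ax - y||^2 / (2n) + lam * ||x||_1 + (rho / 2) * ||x||_2^2."""
    n = A.shape[0]
    residual = A @ x - y
    data_fit = residual @ residual / (2 * n)
    return data_fit + lam * np.abs(x).sum() + 0.5 * rho * (x @ x)
```

In both cases the smooth part plays the role of $f(Ax)$ and the penalty is a sum of one-dimensional terms, which is what makes coordinate-wise proximal updates cheap.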

2. Classical and Proximal Coordinate Descent

Coordinate descent (CD) optimizes the objective by iteratively updating one coordinate (or block) at a time. For a general composite form as above, cyclic coordinate updates are written as

$$x_j^{k+1} = \operatorname{prox}_{\frac{\lambda}{L_j}g_j}\left(x_j^k - \frac{1}{L_j}\nabla_j f(Ax^k)\right)$$

where each $L_j$ is a coordinate-wise Lipschitz constant. A complete sweep produces the fixed-point map $T$ so that $x^{k+1} = T(x^k)$.

Proximal coordinate descent generalizes this framework to nonsmooth settings via component-wise proximal operators, rendering it applicable for constraints and regularizers prevalent in practice (Bertrand et al., 2020, Li et al., 2024).
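To make the coordinate update concrete, here is a minimal sketch of one cyclic epoch specialized to the Lasso, where the proximal operator of the absolute value is soft-thresholding and $L_j = \|A_{:j}\|^2$. The helper names are hypothetical, and the sketch assumes the usual cyclic convention in which each coordinate sees the most recently updated residual.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * |.| evaluated at the scalar z."""
    return np.sign(z) * max(abs(z) - tau, 0.0)


def cd_epoch_lasso(A, y, x, lam):
    """One cyclic pass over all coordinates for 0.5*||Ax - y||^2 + lam*||x||_1."""
    x = x.copy()
    residual = A @ x - y                 # kept in sync with x during the sweep
    lipschitz = (A ** 2).sum(axis=0)     # L_j = squared norm of column j
    for j in range(A.shape[1]):
        if lipschitz[j] == 0.0:
            continue                     # empty column: coordinate has no effect
        old = x[j]
        grad_j = A[:, j] @ residual      # partial derivative of the smooth part
        x[j] = soft_threshold(old - grad_j / lipschitz[j], lam / lipschitz[j])
        if x[j] != old:
            residual += (x[j] - old) * A[:, j]
    return x
```

Updating the residual incrementally keeps the cost of one coordinate step proportional to the number of rows, which is the main reason coordinate descent scales well on large design matrices.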

3. Anderson Acceleration: Principle and Formulation

Anderson acceleration (AA) aims to enhance fixed-point iterations $x^{k+1} = T(x^k)$ by constructing a nonlinear extrapolation over recent iterates. At step $k$, AA forms residuals

$$r^i = T(x^i) - x^i = x^{i+1} - x^i, \qquad i = k-K, \dots, k-1,$$

and computes coefficients $c \in \mathbb{R}^K$ solving

$$\min_{c \in \mathbb{R}^K,\ \mathbf{1}^\top c = 1} \Big\| \sum_{i=1}^{K} c_i\, r^{k-K+i-1} \Big\|_2^2$$

with the extrapolated iterate

$$x_e = \sum_{i=1}^{K} c_i\, x^{k-K+i}.$$

A closed-form solution is available: $c = \frac{(U^\top U)^{-1}\mathbf{1}}{\mathbf{1}^\top (U^\top U)^{-1}\mathbf{1}}$ where $U = [r^{k-K}, \dots, r^{k-1}]$. Safeguarding is enforced by only adopting the AA iterate if objective function descent is achieved. In the context of coordinate descent, this mechanism is triggered every $K$ epochs, utilizing the last $K+1$ iterates (Bertrand et al., 2020, Li et al., 2024).
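The extrapolation step can be sketched in a few lines. The version below assumes the standard formulation in which $U$ holds successive differences of the stored iterates and the coefficients are the normalized solution of $(U^\top U) z = \mathbf{1}$. The function name, the `reg` parameter, and the trace-based scaling of the regularization are assumptions made for numerical stability, not details taken from the cited papers.

```python
import numpy as np


def anderson_extrapolate(iterates, reg=1e-10):
    """Extrapolate from K+1 successive iterates of a fixed-point map.

    Returns the candidate point and the coefficient vector c (sums to 1).
    """
    X = np.asarray(iterates, dtype=float)   # shape (K + 1, p)
    U = np.diff(X, axis=0).T                # columns are x^{i+1} - x^i
    K = U.shape[1]
    gram = U.T @ U
    scale = np.trace(gram)
    if scale == 0.0:
        # All iterates coincide: nothing to extrapolate.
        return X[-1].copy(), np.full(K, 1.0 / K)
    # Small Tikhonov term (an assumption) keeps the K x K solve well posed.
    z = np.linalg.solve(gram + reg * scale * np.eye(K), np.ones(K))
    c = z / z.sum()
    return X[1:].T @ c, c


# Linear iteration x <- T x + b whose fixed point is (10, 2).
T = np.diag([0.9, 0.5])
b = np.array([1.0, 1.0])
xs = [np.zeros(2)]
for _ in range(3):
    xs.append(T @ xs[-1] + b)
x_e, c = anderson_extrapolate(xs)
print("last plain iterate:", xs[-1])
print("extrapolated point:", x_e)
```

For this two-dimensional linear map, three differences are enough for the constrained least-squares problem to cancel both eigen-directions of the error, so the extrapolated point lands on the fixed point up to the effect of the regularization, while the plain iterate is still far from it.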

4. Integration with Coordinate Descent and Algorithm Description

In AA-CD, Anderson acceleration is wrapped around coordinate descent in the following manner:

  1. Perform $K$ epochs of coordinate (or proximal coordinate) descent, storing the latest $K+1$ iterates $x^{k-K}, \dots, x^k$.
  2. Construct the matrix of differences $U = [x^{k-K+1} - x^{k-K}, \dots, x^k - x^{k-1}]$.
  3. Solve the regularized least-squares problem for coefficients $c \in \mathbb{R}^K$ constrained by $\mathbf{1}^\top c = 1$, with a small regularization weight $\gamma \ge 0$:

$$c = \arg\min_{\mathbf{1}^\top c = 1} \|Uc\|_2^2 + \gamma\|c\|_2^2$$

  4. Compute extrapolated candidate $x_e = \sum_{i=1}^{K} c_i\, x^{k-K+i}$.
  5. Update $x^k \leftarrow x_e$ if $F(x_e) < F(x^k)$.

Algorithm 1 ("Online Anderson PCD") in Bertrand et al. (2020) gives detailed pseudocode for these steps, including computational details and safeguarding measures.
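The steps above can be sketched end to end for the Lasso. This is a minimal illustration under stated assumptions, not a reproduction of the cited algorithm: the function names, the default memory `K=5`, the restart of the iterate buffer after each extrapolation, and the trace-scaled regularization `reg` are all choices made here for the example.

```python
import numpy as np


def lasso_objective(A, y, x, lam):
    r = A @ x - y
    return 0.5 * r @ r + lam * np.abs(x).sum()


def cd_epoch(A, y, x, lam, L):
    """One cyclic pass of proximal CD (soft-thresholding) for the Lasso."""
    x = x.copy()
    r = A @ x - y
    for j in range(A.shape[1]):
        if L[j] == 0.0:
            continue
        z = x[j] - A[:, j] @ r / L[j]
        new = np.sign(z) * max(abs(z) - lam / L[j], 0.0)
        r += (new - x[j]) * A[:, j]
        x[j] = new
    return x


def extrapolate(X, reg=1e-10):
    """Anderson candidate from K+1 stored iterates (rows of X)."""
    U = np.diff(X, axis=0).T
    K = U.shape[1]
    G = U.T @ U
    scale = np.trace(G)
    if scale == 0.0:
        return X[-1].copy()
    z = np.linalg.solve(G + reg * scale * np.eye(K), np.ones(K))
    return X[1:].T @ (z / z.sum())


def aa_cd_lasso(A, y, lam, n_epochs=100, K=5):
    """Proximal CD with a safeguarded Anderson step every K epochs."""
    L = (A ** 2).sum(axis=0)
    x = np.zeros(A.shape[1])
    stored = [x.copy()]
    history = [lasso_objective(A, y, x, lam)]
    for _ in range(n_epochs):
        x = cd_epoch(A, y, x, lam, L)
        stored.append(x.copy())
        if len(stored) == K + 1:
            x_e = extrapolate(np.array(stored))
            # Safeguard: keep the candidate only if it lowers the objective.
            if lasso_objective(A, y, x_e, lam) < lasso_objective(A, y, x, lam):
                x = x_e
            stored = [x.copy()]          # start a fresh window of iterates
        history.append(lasso_objective(A, y, x, lam))
    return x, history
```

Because the candidate is accepted only when it decreases the objective, a failed or numerically unstable extrapolation simply falls back to the plain coordinate descent iterate, so the recorded objective values never increase.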

5. Convergence Theory

Quadratic and Symmetric Cases

For linear iterations of the form $x^{k+1} = Tx^k + b$ with $T$ symmetric positive semidefinite and spectral radius $\rho(T) < 1$, Anderson acceleration achieves accelerated linear convergence. The online variant yields an exponential rate modified by the memory parameter $K$.

Non-Symmetric and Composite Settings

For cyclic coordinate descent on quadratics with non-symmetric $T$, sublinear convergence rates are established via polynomial approximation on the numerical range $W(T)$. Symmetrization via forward-backward sweeps enables linear rates up to a multiplicative factor.
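The effect of symmetrization can be checked numerically on a small quadratic $\frac12 x^\top H x - b^\top x$ with $H = A^\top A$. The sketch below relies on the classical identification of a cyclic sweep with a Gauss-Seidel step and builds the iteration matrices of a forward sweep and of a forward-then-backward sweep. All names are hypothetical, and the example only illustrates the spectral structure; it is not the argument used in the cited papers.

```python
import numpy as np


def cd_sweep(H, b, x, order):
    """Exact coordinate minimization of 0.5*x'Hx - b'x along each index in order."""
    x = x.copy()
    for j in order:
        x[j] += (b[j] - H[j] @ x) / H[j, j]
    return x


def sweep_matrices(H):
    """Iteration matrices of a forward sweep and of a forward-then-backward sweep."""
    D = np.diag(np.diag(H))
    lower = np.tril(H, k=-1)
    eye = np.eye(H.shape[0])
    T_fwd = eye - np.linalg.solve(D + lower, H)      # coordinates 1..p
    T_bwd = eye - np.linalg.solve(D + lower.T, H)    # coordinates p..1
    return T_fwd, T_bwd @ T_fwd


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
H = A.T @ A
T_fwd, T_sym = sweep_matrices(H)
eig_sym = np.linalg.eigvals(T_sym)
print("forward sweep matrix symmetric:", np.allclose(T_fwd, T_fwd.T))
print("forward-backward eigenvalues:", np.sort(eig_sym.real))
```

The forward-backward matrix is itself not symmetric, but it is similar to a symmetric positive semidefinite matrix, so its eigenvalues are real and lie in $[0, 1)$ when $H$ is positive definite. That is the structure the symmetric-case analysis needs, at the price of two passes over the coordinates per iteration.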

Nonsmooth/Composite Case

When $f$ and the $g_j$ are sufficiently smooth (locally $C^2$), local contraction of the fixed-point operator $T$ is guaranteed, and asymptotic acceleration of Anderson-accelerated coordinate updates follows. Objective safeguarding secures global convergence.

A sharp local R-linear convergence result is obtained for nonsmooth problems under "active manifold identification": if the PCD operator identifies a smooth submanifold (e.g., with stabilized support patterns) near a critical point $x^*$, then the AA-CD iterates converge R-linearly in a neighborhood of $x^*$ (Li et al., 2024).

6. Empirical Performance and Evaluation

Benchmark experiments for AA-CD have been conducted on a range of regression and classification tasks, including least-squares, Lasso, elastic-net, and sparse logistic regression models using datasets from LIBSVM/OpenML (e.g., rcv1, real-sim, news20, leukemia) at varying regularization strengths.

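Comparative methods include proximal gradient descent (PGD), FISTA, proximal coordinate descent (PCD), inertial CD, and Anderson-accelerated PGD.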

Results demonstrate:

  • PCD outperforms PGD and FISTA on high-dimensional problems.
  • Inertial CD, despite theoretical acceleration, may stall or deteriorate without careful restarts.
  • Anderson-accelerated PGD provides modest improvements over FISTA.
  • AA-CD delivers speedups by factors ranging from 2× to 10× in wall-clock time to a prescribed accuracy threshold, with most pronounced gains in ill-conditioned and low-regularization regimes.
  • Speedup is especially substantial during the phase in which the algorithm has identified the problem's active manifold (Bertrand et al., 2020, Li et al., 2024).

Overhead incurred by AA-CD for solving the least-squares subproblem remains negligible compared to the dominant cost of data matrix operations.

7. Connections, Limitations, and Theoretical Significance

Anderson acceleration provides an extrapolation-based alternative to inertial and Nesterov-type momentum accelerations, with the practical advantage of being line-search-free and simple to implement in coordinate settings. The method leverages fixed-point formulations, which generalize naturally to nonsmooth and composite environments as long as active manifold identification properties hold.

Analytically, the main technical device is the local smoothness of the coordinate descent update map on the active manifold, extending the classical sensitivity and implicit function results from smooth to piecewise-smooth (e.g., $\ell_1$-regularized) settings. When the operator mapping contracts sufficiently near the optimum, the AA scheme ensures local linear acceleration. Objective safeguarding addresses global behavior and ensures stability in the presence of nonsmooth pivots or when leaving the local identification regime.

The empirical and theoretical results collectively situate AA-CD as a robust, efficient acceleration scheme for coordinate-based algorithms on modern large-scale convex and composite machine learning problems (Bertrand et al., 2020, Li et al., 2024).
