Anderson-Accelerated Coordinate Descent
- AA-CD is a method that applies Anderson acceleration to coordinate descent, accelerating fixed-point iterations for improved convergence.
- It integrates proximal updates with nonlinear extrapolation, achieving speedups of 2× to 10× on large-scale convex and composite optimization problems.
- Objective safeguarding secures global convergence, while active manifold identification yields local linear convergence even in nonsmooth or ill-conditioned scenarios.
Anderson-Accelerated Coordinate Descent (AA-CD) applies Anderson acceleration, a nonlinear extrapolation technique designed to speed up fixed-point methods, to cyclic coordinate descent and proximal coordinate descent. In reported experiments, AA-CD outperforms both traditional first-order and inertially accelerated approaches on large-scale convex and composite optimization problems central to machine learning and signal processing.
1. Optimization Problem Framework
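AA-CD addresses composite convex minimization problems of the form
$$\min_{x \in \mathbb{R}^p} F(x) = f(x) + \sum_{j=1}^{p} g_j(x_j),$$
where:
- $x \in \mathbb{R}^p$,
- $f$ is convex and typically $L$-smooth,
- Each $g_j$ is proper, closed, and convex, so that the penalty $g(x) = \sum_{j=1}^{p} g_j(x_j)$ is separable.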
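Canonical instances, for a design matrix $A$ with rows $a_i^\top$, targets $b$, and labels $y_i \in \{-1, 1\}$, include least-squares ($f(x) = \tfrac{1}{2}\|Ax - b\|_2^2$, $g_j = 0$), Lasso ($g_j(x_j) = \lambda |x_j|$), elastic-net ($g_j(x_j) = \lambda |x_j| + \tfrac{\rho}{2} x_j^2$), and sparse logistic regression ($f(x) = \sum_i \log(1 + e^{-y_i a_i^\top x})$, $g_j(x_j) = \lambda |x_j|$) (Bertrand et al., 2020).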
2. Classical and Proximal Coordinate Descent
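Coordinate descent (CD) optimizes the objective by iteratively updating one coordinate (or block) at a time. For a general composite form as above, cyclic coordinate updates are written as
$$x_j \leftarrow \operatorname{prox}_{g_j / L_j}\!\left(x_j - \frac{1}{L_j} \nabla_j f(x)\right), \qquad j = 1, \dots, p,$$
where each $L_j$ is a coordinate-wise Lipschitz constant. A complete sweep produces the fixed-point map $T$ so that $x^{(k+1)} = T(x^{(k)})$.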
Proximal coordinate descent generalizes this framework to nonsmooth settings via component-wise proximal operators, rendering it applicable for constraints and regularizers prevalent in practice (Bertrand et al., 2020, Li et al., 2024).
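As an illustration of one sweep of the fixed-point map, the following is a minimal sketch of cyclic proximal coordinate descent for the Lasso, assuming a dense NumPy design matrix with no zero columns. The function names (`soft_threshold`, `lasso_objective`, `cd_epoch`), the synthetic data, and the choice of regularization strength are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * |.| applied to a scalar (soft-thresholding)."""
    return np.sign(z) * max(abs(z) - tau, 0.0)

def lasso_objective(A, b, x, lam):
    """F(x) = 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def cd_epoch(A, b, x, lam, lipschitz):
    """One cyclic sweep over all coordinates; returns a new vector (x is not modified)."""
    x = x.copy()
    residual = A @ x - b                 # kept up to date so each update costs O(n)
    for j in range(A.shape[1]):
        old = x[j]
        grad_j = A[:, j] @ residual      # partial derivative of the smooth part
        x[j] = soft_threshold(old - grad_j / lipschitz[j], lam / lipschitz[j])
        residual += A[:, j] * (x[j] - old)
    return x

# Synthetic problem (illustrative assumption).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
lam = 0.1 * np.max(np.abs(A.T @ b))      # a fraction of the smallest lam giving x = 0
lipschitz = np.sum(A ** 2, axis=0)       # coordinate-wise constants ||A_j||^2

x = np.zeros(20)
objectives = [lasso_objective(A, b, x, lam)]
for _ in range(30):
    x = cd_epoch(A, b, x, lam, lipschitz)
    objectives.append(lasso_objective(A, b, x, lam))
```

Maintaining the residual incrementally is the usual design choice here: it avoids recomputing the full matrix-vector product after every coordinate update.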
3. Anderson Acceleration: Principle and Formulation
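Anderson acceleration (AA) aims to enhance fixed-point iterations $x^{(k+1)} = T(x^{(k)})$ by constructing a nonlinear extrapolation over recent iterates. At step $k$, AA forms residuals
$$r^{(i)} = x^{(i)} - x^{(i-1)}, \qquad i = k-K+1, \dots, k,$$
and computes coefficients $c \in \mathbb{R}^K$ solving
$$\min_{c \in \mathbb{R}^K,\ \mathbf{1}^\top c = 1} \Big\| \sum_{i=1}^{K} c_i\, r^{(k-K+i)} \Big\|_2,$$
with the extrapolated iterate
$$x_{\mathrm{e}} = \sum_{i=1}^{K} c_i\, x^{(k-K+i)}.$$
A closed-form solution is available: $c = \frac{(U^\top U)^{-1} \mathbf{1}}{\mathbf{1}^\top (U^\top U)^{-1} \mathbf{1}}$, where $U = [r^{(k-K+1)}, \dots, r^{(k)}]$. Safeguarding is enforced by only adopting the AA iterate if objective function descent is achieved. In the context of coordinate descent, this mechanism is triggered every $K$ epochs, utilizing the last $K+1$ iterates (Bertrand et al., 2020, Li et al., 2024).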
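The extrapolation step can be sketched in a few lines of NumPy. The function below forms the matrix of successive differences, solves the small constrained least-squares problem through its normal equations, and returns the weighted combination of iterates. The function name `anderson_extrapolate`, the tiny ridge term `reg`, and the diagonal test iteration are assumptions made for illustration. The demo uses a linear iteration in dimension 3, for which 5 iterates suffice to recover the fixed point up to numerical error.

```python
import numpy as np

def anderson_extrapolate(iterates, reg=1e-12):
    """Return (x_e, c) from K+1 iterates of a fixed-point map, oldest first."""
    X = np.asarray(iterates, dtype=float)      # shape (K + 1, p)
    U = np.diff(X, axis=0).T                   # columns x^(i) - x^(i-1), shape (p, K)
    gram = U.T @ U
    K = gram.shape[0]
    scale = np.trace(gram) / K
    if scale == 0.0:                           # all iterates identical: keep the last one
        c = np.zeros(K)
        c[-1] = 1.0
        return X[-1].copy(), c
    # Assumed safeguard: a tiny ridge keeps the K x K system solvable.
    z = np.linalg.solve(gram + reg * scale * np.eye(K), np.ones(K))
    c = z / z.sum()                            # c = G^{-1} 1 / (1^T G^{-1} 1)
    return c @ X[1:], c                        # weighted sum of the K latest iterates

# Linear fixed-point iteration x <- T x + b with a symmetric PSD T, rho(T) = 0.9.
T = np.diag([0.9, 0.5, 0.1])
b = np.ones(3)
x_star = np.linalg.solve(np.eye(3) - T, b)

iterates = [np.zeros(3)]
for _ in range(4):                             # K = 4 differences from 5 iterates
    iterates.append(T @ iterates[-1] + b)

x_e, c = anderson_extrapolate(iterates)
err_last = np.linalg.norm(iterates[-1] - x_star)
err_extrapolated = np.linalg.norm(x_e - x_star)
print(f"plain: {err_last:.2e}, extrapolated: {err_extrapolated:.2e}")
```

The Gram matrix is nearly singular whenever the extrapolation is close to exact, which is why some form of regularization is commonly added. The scaling used here is one possible choice, not a prescribed one.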
4. Integration with Coordinate Descent and Algorithm Description
In AA-CD, Anderson acceleration is wrapped around coordinate descent in the following manner:
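- Perform $K$ epochs of coordinate (or proximal coordinate) descent, storing the latest $K+1$ iterates $x^{(k-K)}, \dots, x^{(k)}$.
- Construct the matrix of differences $U = [x^{(k-K+1)} - x^{(k-K)}, \dots, x^{(k)} - x^{(k-1)}] \in \mathbb{R}^{p \times K}$.
- Solve the regularized least-squares problem (with a small regularization weight $\gamma \ge 0$) for coefficients $c \in \mathbb{R}^K$ constrained by $\mathbf{1}^\top c = 1$:
$$c = \operatorname*{arg\,min}_{\mathbf{1}^\top c = 1} \; \|U c\|_2^2 + \gamma \|c\|_2^2.$$
- Compute extrapolated candidate $x_{\mathrm{e}} = \sum_{i=1}^{K} c_i\, x^{(k-K+i)}$.
- Update $x^{(k)} \leftarrow x_{\mathrm{e}}$ if the objective satisfies $F(x_{\mathrm{e}}) \le F(x^{(k)})$.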
Algorithm 1 ("Online Anderson PCD") in Bertrand et al. (2020) gives detailed pseudocode for these steps, including computational details and safeguarding measures.
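The steps listed above can be sketched end to end for the Lasso as follows. This is a minimal illustration under stated assumptions, not a reproduction of the published algorithm: the helper names (`objective`, `cd_epoch`, `extrapolate`, `aa_cd`), the ridge term `reg`, the buffer reset after each extrapolation, the default memory size `K=5`, and the synthetic data are all choices made here.

```python
import numpy as np

def objective(A, b, x, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def cd_epoch(A, b, x, lam, lipschitz):
    """One cyclic proximal CD sweep for the Lasso (assumes no zero column in A)."""
    x = x.copy()
    residual = A @ x - b
    for j in range(A.shape[1]):
        old = x[j]
        z = old - A[:, j] @ residual / lipschitz[j]
        x[j] = np.sign(z) * max(abs(z) - lam / lipschitz[j], 0.0)
        residual += A[:, j] * (x[j] - old)
    return x

def extrapolate(iterates, reg=1e-12):
    """Anderson extrapolation from K+1 stored iterates (oldest first)."""
    X = np.array(iterates)
    U = np.diff(X, axis=0).T
    gram = U.T @ U
    K = gram.shape[0]
    scale = np.trace(gram) / K
    if scale == 0.0:
        return X[-1].copy()
    try:
        z = np.linalg.solve(gram + reg * scale * np.eye(K), np.ones(K))
    except np.linalg.LinAlgError:
        return X[-1].copy()                    # fall back to the plain iterate
    return (z / z.sum()) @ X[1:]

def aa_cd(A, b, lam, n_epochs=200, K=5, accelerate=True):
    lipschitz = np.sum(A ** 2, axis=0)
    x = np.zeros(A.shape[1])
    stored = [x.copy()]
    history = [objective(A, b, x, lam)]
    for _ in range(n_epochs):
        x = cd_epoch(A, b, x, lam, lipschitz)
        if accelerate:
            stored.append(x.copy())
            if len(stored) == K + 1:           # every K epochs
                candidate = extrapolate(stored)
                # Safeguard: accept the candidate only if it does not increase F.
                if objective(A, b, candidate, lam) <= objective(A, b, x, lam):
                    x = candidate
                stored = [x.copy()]            # restart the buffer from the current point
        history.append(objective(A, b, x, lam))
    return x, history

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
b = rng.standard_normal(100)
lam = 0.05 * np.max(np.abs(A.T @ b))

x_aa, hist_aa = aa_cd(A, b, lam)
x_cd, hist_cd = aa_cd(A, b, lam, accelerate=False)
```

Because the candidate is accepted only when it does not increase the objective, the recorded objective values are non-increasing regardless of how well the extrapolation behaves.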
5. Convergence Theory
Quadratic and Symmetric Cases
For linear iterations of the form $x^{(k+1)} = T x^{(k)} + b$ with $T$ symmetric positive semidefinite and spectral radius $\rho(T) < 1$, Anderson acceleration achieves an accelerated linear convergence rate. The online variant yields an exponential rate modified by the memory parameter $K$.
Non-Symmetric and Composite Settings
For cyclic coordinate descent on quadratics with non-symmetric $T$, sublinear convergence rates are established via polynomial approximation on the numerical range $W(T)$. Symmetrization via forward–backward sweeps enables linear rates up to a constant factor.
Nonsmooth/Composite Case
When $f$ and $g$ are sufficiently smooth (locally $C^2$), local contraction of the fixed-point operator $T$ is guaranteed, and asymptotic acceleration of Anderson-accelerated coordinate updates follows. Objective safeguarding secures global convergence.
A sharp local R-linear convergence result is obtained for nonsmooth problems under "active manifold identification": if the PCD operator identifies a smooth submanifold (e.g., with stabilized support patterns) near a critical point $x^\star$, then AA-CD iterates satisfy
$$\|x^{(k)} - x^\star\| \le C \rho^{k}$$
for some $C > 0$ and $\rho \in (0, 1)$, where $x^\star = T(x^\star)$ and $T$ is the composed coordinate update map (Li et al., 2024).
6. Empirical Performance and Evaluation
Benchmark experiments for AA-CD have been conducted on a range of regression and classification tasks, including least-squares, Lasso, elastic-net, and sparse logistic regression models using datasets from LIBSVM/OpenML (e.g., rcv1, real-sim, news20, leukemia) at varying regularization strengths.
Comparative methods include:
- Proximal gradient descent (PGD), FISTA (accelerated PGD), and Anderson-accelerated PGD
- Cyclic proximal coordinate descent (PCD), randomized proximal coordinate descent (PRCD), inertial CD
- Anderson-accelerated coordinate descent (AA-CD)
Results demonstrate:
- PCD outperforms PGD and FISTA on high-dimensional problems.
- Inertial CD, despite theoretical acceleration, may stall or deteriorate without careful restarts.
- Anderson-accelerated PGD provides modest improvements over FISTA.
- AA-CD delivers speedups by factors ranging from 2× to 10× in wall-clock time to a prescribed accuracy threshold, with most pronounced gains in ill-conditioned and low-regularization regimes.
- Speedup is especially substantial during the phase in which the algorithm has identified the problem's active manifold (Bertrand et al., 2020, Li et al., 2024).
Overhead incurred by AA-CD for managing the least-squares subproblem (with a memory size $K$ that is typically small) remains negligible compared to the dominant cost of data matrix operations.
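A back-of-the-envelope flop count illustrates why this overhead is small. The cost model below is an assumption for a dense design matrix: the function name `aa_overhead_ratio`, the per-update constant, and the problem sizes are illustrative, and the extra objective evaluations used for safeguarding are not counted.

```python
def aa_overhead_ratio(n_samples, n_features, K):
    """Rough flop ratio: one extrapolation vs. the K dense CD epochs preceding it."""
    # Assumed cost model: each coordinate update touches one column twice (~4n flops).
    cd_flops = K * 4 * n_samples * n_features
    # Gram matrix U^T U, the K x K solve, and the final weighted sum of iterates.
    aa_flops = 2 * K * K * n_features + K ** 3 + 2 * K * n_features
    return aa_flops / cd_flops

# Hypothetical problem size and memory parameter.
ratio = aa_overhead_ratio(n_samples=1_000, n_features=10_000, K=5)
print(f"extrapolation overhead: {ratio:.2%} of the CD cost")
```

Under this model the ratio scales roughly like K divided by four times the number of samples, so it shrinks as the dataset grows.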
7. Connections, Limitations, and Theoretical Significance
Anderson acceleration provides an extrapolation-based alternative to inertial and Nesterov-type momentum accelerations, with the practical advantage of being line-search-free and simple to implement in coordinate settings. The method leverages fixed-point formulations, which generalize naturally to nonsmooth and composite environments as long as active manifold identification properties hold.
Analytically, the main technical device is the local smoothness of the coordinate descent update map on the active manifold, extending the classical sensitivity and implicit function results from smooth to piecewise-smooth (e.g., $\ell_1$) settings. When the operator mapping contracts sufficiently near the optimum, the AA scheme ensures local linear acceleration. Objective safeguarding addresses global behavior and ensures stability in the presence of nonsmooth pivots or when leaving the local identification regime.
The empirical and theoretical results collectively situate AA-CD as a robust, efficient acceleration scheme for coordinate-based algorithms on modern large-scale convex and composite machine learning problems (Bertrand et al., 2020, Li et al., 2024).