Accelerated, Parallel and Proximal Coordinate Descent (1312.5799v2)

Published 20 Dec 2013 in math.OC, cs.DC, cs.NA, math.NA, and stat.ML

Abstract: We propose a new stochastic coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L} R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is an average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of existing accelerated coordinate descent methods. The fact that the method depends on the average degree of separability, and not on the maximum degree of separability, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel stochastic coordinate descent algorithms based on the concept of ESO.

Citations (370)

Summary

  • The paper introduces APPROX, the first algorithm to combine acceleration, parallel processing, and proximal updates in stochastic coordinate descent.
  • The paper achieves an O(1/k²) convergence rate for smooth problems and proposes novel stepsizes based on an improved Expected Separable Overapproximation framework.
  • The paper demonstrates efficient implementation by avoiding full-dimensional vector operations, enhancing parallel performance in high-dimensional convex optimization.

Accelerated, Parallel, and Proximal Coordinate Descent: An Overview

The paper "Accelerated, Parallel, and Proximal Coordinate Descent" introduces a novel stochastic coordinate descent method named APPROX, aiming to optimize the minimization of a sum of convex functions, with each function dependent only on a subset of coordinates. This method is significant due to its comprehensive incorporation of acceleration, parallelization, and proximal concepts simultaneously within the coordinate descent framework, a combination that had not been achieved in existing literature at the time.

Key Contributions

The paper makes several notable contributions within the domain of optimization:

  1. Innovative Algorithm Design: The APPROX algorithm is the first to combine acceleration, parallel processing, and proximal updates in stochastic coordinate descent. This leads to faster convergence than non-accelerated counterparts: specifically, it attains an $O(1/k^2)$ convergence rate for smooth problems without constraints, matching Nesterov's accelerated gradient methods (a minimal illustrative sketch of one such iteration follows this list).
  2. Enhanced Stepsizes: The authors introduce novel stepsizes based on an improved Expected Separable Overapproximation (ESO) framework. These stepsizes depend on the average degree of separability, $\bar{\omega}$, rather than the maximum degree, $\omega$. The new stepsizes can lead to substantial performance gains when $\bar{\omega}$ is markedly smaller than $\omega$.
  3. Efficient Implementation: The APPROX method is designed to avoid full-dimensional vector operations, a key bottleneck in prior accelerated coordinate descent methods. By removing this bottleneck, the algorithm can better exploit modern parallel computing architectures while remaining efficient in high-dimensional spaces.
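
To make the structure of one iteration concrete, below is a minimal sketch of an APPROX-style accelerated, parallel, proximal coordinate descent step applied to a lasso problem. The theta recursion, the conservative choice of ESO parameters ($v_i = L_i$), and all function and variable names are illustrative assumptions following the standard accelerated coordinate descent template; they are not taken verbatim from the paper and do not use its improved stepsizes.

```python
# Illustrative sketch (not the authors' reference implementation) of an
# accelerated, parallel (tau coordinates per step), proximal coordinate
# descent iteration for the lasso problem  0.5*||A x - b||^2 + lam*||x||_1.
import numpy as np

def soft_threshold(a, t):
    """Proximal operator of t*|.| (elementwise soft-thresholding)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def approx_like_lasso(A, b, lam, tau=4, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)   # coordinate-wise Lipschitz constants of the smooth part
    v = L                      # conservative ESO parameters (assumption for this sketch)
    x = np.zeros(n)
    z = x.copy()
    theta = tau / n
    for _ in range(iters):
        y = (1.0 - theta) * x + theta * z
        grad = A.T @ (A @ y - b)                    # full gradient, for readability only
        S = rng.choice(n, size=tau, replace=False)  # random subset of coordinates
        z_new = z.copy()
        for i in S:
            step = tau / (n * theta * v[i])
            # proximal coordinate update on the z-sequence
            z_new[i] = soft_threshold(z[i] - step * grad[i], step * lam)
        x = y + (n * theta / tau) * (z_new - z)     # accelerated combination
        z = z_new
        theta = 0.5 * (np.sqrt(theta ** 4 + 4 * theta ** 2) - theta ** 2)
    return x

# Example usage
A = np.random.default_rng(1).standard_normal((50, 20))
b = A @ np.ones(20) + 0.01
print(approx_like_lasso(A, b, lam=0.1)[:5])
```

Note that the sketch recomputes a full gradient at $y$ for clarity; the paper's point is precisely that a careful implementation can avoid such full-dimensional operations, for example by maintaining residuals incrementally.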

Theoretical and Practical Implications

The theoretical implications of the paper lie primarily in the enhanced understanding and capabilities of coordinate descent methods when extended to high-dimensional convex optimization problems. By integrating acceleration, the method is particularly suited to applications requiring high accuracy and rapid convergence. The new stepsizes, based on a refined ESO condition, also provide a framework that can be adapted to, and can potentially improve, other stochastic coordinate descent methods built on the ESO concept.
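
For reference, the ESO condition underpinning these stepsizes can be sketched in its generic form (the paper's improved, data-dependent choice of the weights $v_i$ is not reproduced here): a smooth function $f$ admits an ESO with respect to a uniform random sampling $\hat{S}$ of coordinates, with parameters $v = (v_1, \dots, v_n)$, if for all $x$ and $h$,

$$\mathbb{E}\big[f(x + h_{[\hat{S}]})\big] \le f(x) + \frac{\mathbb{E}[|\hat{S}|]}{n}\left(\langle \nabla f(x), h\rangle + \tfrac{1}{2}\|h\|_v^2\right),$$

where $h_{[\hat{S}]}$ keeps only the coordinates of $h$ indexed by $\hat{S}$ and $\|h\|_v^2 = \sum_i v_i \|h^{(i)}\|^2$. Smaller admissible weights $v_i$ translate directly into larger safe stepsizes, which is how the improved ESO, depending on $\bar{\omega}$ rather than $\omega$, yields its gains.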

Practically, the APPROX method has implications across various domains where large-scale optimization problems prevail. These include machine learning, data mining, and scientific computing, among others. The ability to parallelize effectively while still achieving rapid convergence opens pathways to apply this method to vast data resources without being hampered by the typical computational burdens associated with high-dimensional datasets.

Future Developments

The intersection of accelerated, parallel, and proximal methods in coordinate descent opens multiple avenues for further research. Future developments could explore:

  • Extensions to non-convex optimization settings, which remain critical in areas such as deep learning.
  • Investigations into dynamic stepsize adjustments based on real-time data characteristics, potentially improving adaptability and efficiency.
  • Further enhancements in parallel and distributed computing environments, maximizing resource utilization in increasingly available large-scale infrastructures.

The APPROX method represents a significant step forward in the field of optimization, innovating upon classical coordinate descent strategies by embedding acceleration and parallel computing principles through a proximal framework. Its contributions to both theory and practice highlight its relevance and potential impact across numerous data-driven disciplines.