
Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization (1409.3257v2)

Published 10 Sep 2014 in math.OC and stat.ML

Abstract: We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain accelerated convergence rate. We also develop a mini-batch version of the SPDC method which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has a better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.

Authors (2)
  1. Yuchen Zhang (112 papers)
  2. Lin Xiao (82 papers)
Citations (259)

Summary

  • The paper introduces the Stochastic Primal-Dual Coordinate (SPDC) method for regularized empirical risk minimization, framing it as a convex-concave saddle-point problem.
  • The algorithm utilizes iterative primal-dual updates with primal extrapolation and a mini-batch adaptation incorporating non-uniform sampling for efficiency and acceleration.
  • SPDC achieves a competitive theoretical convergence rate of O((n + √(κn)) log(1/ε)) and demonstrates strong empirical performance, especially in ill-conditioned scenarios.

Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization

The paper presents a Stochastic Primal-Dual Coordinate (SPDC) method targeting the optimization problem of regularized empirical risk minimization (ERM) for linear predictors, a common objective in machine learning tasks such as classification and regression. The authors frame this problem as a convex-concave saddle-point problem, a formulation that pairs each training example with a dual variable and exposes the structure that primal-dual first-order methods exploit, as shown below.
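
Concretely, in standard notation (a_i the feature vectors, φ_i the per-example loss with convex conjugate φ_i*, and g the regularizer; the paper's exact symbols may differ slightly), the saddle-point reformulation of the ERM objective reads:

```latex
\[
  \min_{x \in \mathbb{R}^d} \; \max_{y \in \mathbb{R}^n} \;
  \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i \, \langle a_i, x \rangle - \phi_i^*(y_i) \bigr)
  \;+\; g(x).
\]
```

Because each dual variable y_i is attached to a single training example, the method can update one randomly chosen coordinate of y per iteration while touching only that example's feature vector.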

Algorithmic Approach

The proposed SPDC method exploits a primal-dual approach by iteratively maximizing over a randomly chosen dual variable and minimizing over the primal variable. Notably, the method incorporates an extrapolation step on the primal variable, which is crucial to achieving an accelerated convergence rate. This step draws on Nesterov’s acceleration techniques, aiming to reduce the burden of slow convergence inherent in first-order methods.
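
As a concrete illustration, the sketch below instantiates one version of this iteration for ridge regression (squared loss with L2 regularization), a case in which both proximal steps have closed forms. The function name, the choice of loss and regularizer, and leaving the step sizes tau, sigma and the extrapolation weight theta as plain inputs are assumptions made for the example; this is a minimal sketch, not the paper's general algorithm.

```python
import numpy as np

def spdc_ridge(A, b, lam, tau, sigma, theta, n_iters, seed=0):
    """Minimal SPDC-style sketch for ridge regression:
    min_x (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + (lam/2)*||x||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    y = np.zeros(n)            # one dual variable per training example
    x_bar = x.copy()           # extrapolated primal point
    u = A.T @ y / n            # running average (1/n) * sum_i y_i * a_i
    for _ in range(n_iters):
        i = rng.integers(n)    # randomly chosen dual coordinate
        # Dual prox step (maximization): closed form for the squared-loss conjugate
        y_new = (sigma * (A[i] @ x_bar - b[i]) + y[i]) / (sigma + 1.0)
        # Primal prox step (minimization): closed form for the L2 regularizer
        grad_est = u + (y_new - y[i]) * A[i]
        x_new = (x - tau * grad_est) / (1.0 + tau * lam)
        # Extrapolation on the primal variable (the acceleration step)
        x_bar = x_new + theta * (x_new - x)
        u += (y_new - y[i]) * A[i] / n   # keep the running average in sync
        x, y[i] = x_new, y_new
    return x
```

With tau, sigma, and theta chosen as the paper's theory prescribes (roughly, the product of the step sizes tied to the largest feature norm and theta slightly below one), each iteration touches only a single row of A.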

Moreover, the authors develop a mini-batch adaptation of the SPDC algorithm, which improves computational efficiency and suits parallel processing environments. They also introduce non-uniform sampling of the dual variables, in which the probability of updating a dual coordinate is weighted by the norm of its associated feature vector. This weighting yields better complexity than uniform sampling on unnormalized data, that is, datasets whose examples have feature norms of widely varying magnitude; a sketch follows below.
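
A minimal sketch of such a weighting, assuming a simple convex combination of the uniform distribution with norm-proportional probabilities (the paper's exact weights and the accompanying per-coordinate step-size adjustments may differ), could look like this:

```python
import numpy as np

def feature_norm_probs(A, mix=0.5):
    """Sampling probabilities that interpolate between uniform sampling
    and sampling proportional to each example's feature norm."""
    n = A.shape[0]
    norms = np.linalg.norm(A, axis=1)
    return (1.0 - mix) / n + mix * norms / norms.sum()

# Usage: draw the dual coordinate with rng.choice(n, p=probs) instead of
# uniformly; the per-coordinate step sizes must then be adjusted so that
# frequently sampled coordinates are not over-weighted.
```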

Theoretical Contributions

Under assumptions of smoothness and strong convexity, the SPDC method is shown to achieve convergence rates competitive with, or superior to, state-of-the-art methods, specifically for composite objectives arising from regularized ERM with high condition numbers. The condition number, which quantifies problem difficulty, drives the complexity estimates: the overall complexity is O((n + √(κn)) log(1/ε)), where κ is the condition number, n is the number of training examples, and ε is the target precision.
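
For context, the standard bounds for the main alternatives in this setting (with per-iteration cost measured in single-example gradient or coordinate updates, and stated in the same notation) situate the SPDC rate as follows:

```latex
\begin{align*}
  \text{accelerated full gradient:} \quad & O\bigl(n\sqrt{\kappa}\,\log(1/\epsilon)\bigr)\\
  \text{SAG / SVRG / SDCA:} \quad & O\bigl((n+\kappa)\,\log(1/\epsilon)\bigr)\\
  \text{SPDC:} \quad & O\bigl((n+\sqrt{\kappa n})\,\log(1/\epsilon)\bigr)
\end{align*}
```

Since √(κn) is the geometric mean of n and κ, it never exceeds (n + κ)/2, and the advantage is most pronounced when κ ≫ n.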

The paper extends its theoretical framework to non-smooth losses and non-strongly-convex regularizers by applying small smoothing and strongly convex perturbations to the relevant convex conjugate functions, preserving broad applicability across loss functions and regularizers.
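
In outline, with δ and μ as small perturbation parameters chosen relative to the target accuracy (notation assumed for this sketch), the idea is to replace the conjugate of a non-smooth loss and a non-strongly-convex regularizer by

```latex
\[
  \tilde{\phi}_i^*(\beta) \;=\; \phi_i^*(\beta) + \frac{\delta}{2}\,\beta^2,
  \qquad
  \tilde{g}(x) \;=\; g(x) + \frac{\mu}{2}\,\|x\|_2^2,
\]
```

so that the perturbed problem is smooth and strongly convex, the linear-rate guarantee applies, and taking δ and μ on the order of ε controls the approximation error.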

Empirical Evaluation

Empirical evaluations highlight the efficiency and robustness of the SPDC method through comparisons against other prominent methods, including accelerated full gradient methods and stochastic dual coordinate ascent. SPDC converges especially quickly in ill-conditioned settings, which the authors attribute to the combination of randomized dual coordinate updates with primal extrapolation, underscoring its utility in large-scale machine learning problems.

Implications and Future Directions

This research contributes to the field of optimization for machine learning models by:

  • Offering a refined method capable of addressing the common pitfalls involved in large-scale, high-dimensional optimization.
  • Demonstrating how stochasticity combined with thoughtfully designed primal-dual interactions can lead to efficient learning algorithms.

Long-term, the SPDC paradigm prompts further exploration into stochastic coordinate techniques and their potential synergies with batch methods. Moreover, it opens questions regarding optimal parameter selection strategies, adaptive enhancements, and the extension to non-linear and non-convex domains integral to modern deep learning systems. The interplay between theoretical convergence results and practical performance continues to be a pillar for advancing computational methodologies in machine learning.