- The paper develops an accelerated Proximal SDCA algorithm that significantly improves convergence for regularized loss minimization.
- It integrates a novel inner-outer iteration framework to efficiently handle both low and high condition number scenarios.
- Experimental results demonstrate fewer epochs and lower runtimes than standard SDCA and accelerated gradient methods such as FISTA.
Overview of "Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization"
This paper introduces an advanced algorithm for minimizing regularized loss functions in machine learning. The authors propose a proximal version of stochastic dual coordinate ascent (SDCA) and accelerate it using an inner-outer iteration framework. The methodology is applicable to fundamental optimization problems like SVMs, logistic regression, ridge regression, Lasso, and multiclass SVMs, achieving improved convergence rates compared to existing methods.
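For concreteness, the regularized loss minimization problem targeted by the paper can be written, using the standard Prox-SDCA notation rather than quoting the paper verbatim, roughly as follows:

```latex
% n training examples x_1,...,x_n, convex loss functions \phi_i,
% regularization parameter \lambda > 0, and a 1-strongly convex regularizer g:
\min_{w \in \mathbb{R}^d} \; P(w) \;=\; \frac{1}{n} \sum_{i=1}^{n} \phi_i\!\left(x_i^{\top} w\right) \;+\; \lambda\, g(w)
```

Particular choices of the losses and regularizer recover the examples above: the squared loss with g(w) = ½‖w‖² gives ridge regression, the logistic loss gives regularized logistic regression, and adding an L1 term to g covers Lasso-style problems. When each φ_i is (1/γ)-smooth, the quantity 1/(λγ) plays the role of the condition number discussed below.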
Key Contributions
- Proximal Stochastic Dual Coordinate Ascent (Prox-SDCA):
- The paper develops a proximal version of SDCA that supports general strongly convex regularizers and smooth loss functions.
- It provides a rigorous analysis establishing convergence guarantees that recover and extend those of standard SDCA in the existing literature.
- The approach efficiently handles problems where the condition number is at most the dataset size, achieving runtimes close to linear in the data size (a minimal coordinate-update sketch is given after this list).
- Acceleration Framework:
- When the condition number surpasses the dataset size, the accelerated SDCA significantly reduces computational complexity.
- This is accomplished by iteratively solving a sequence of more strongly regularized subproblems with Prox-SDCA as the inner solver, so the runtime depends only on roughly the square root of the condition number rather than on the condition number itself.
- The accelerated algorithm achieves state-of-the-art runtimes in high condition number regimes, improving significantly over accelerated gradient descent methods (see the outer-loop schematic after this list).
- Application to Diverse Machine Learning Problems:
- The framework is applied to ridge regression, logistic regression, and Lasso, demonstrating versatility across different problems.
- For non-smooth loss functions, a smoothing technique is incorporated, extending the framework to the hinge loss in SVMs and to multiclass prediction problems.
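As referenced in the first and third bullets above, here is a minimal Prox-SDCA-style sketch for binary classification with the smoothed hinge loss and an optional L1 term in the regularizer. It uses a conservative fixed dual step size rather than the exact coordinate-maximization options analyzed in the paper, and the function and parameter names (`prox_sdca_epochs`, smoothing parameter `gamma`, `l1_strength`) are illustrative assumptions, not the authors' code:

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft-thresholding: the gradient of the conjugate of
    g(w) = 0.5*||w||^2 + tau*||w||_1, which is how an L1 term enters Prox-SDCA."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def smoothed_hinge_grad(margin, gamma):
    """Derivative of the smoothed hinge loss at the margin y_i * <x_i, w>."""
    if margin >= 1.0:
        return 0.0
    if margin <= 1.0 - gamma:
        return -1.0
    return (margin - 1.0) / gamma

def prox_sdca_epochs(X, y, lam, gamma=1.0, l1_strength=0.0, epochs=10, seed=0):
    """Sketch of a Prox-SDCA inner loop.

    X: (n, d) data matrix, y: labels in {-1, +1}, lam: regularization parameter,
    gamma: smoothing parameter of the smoothed hinge loss,
    l1_strength: weight of the optional L1 term (0 gives a plain L2 regularizer).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)          # dual variables, one per example
    v = np.zeros(d)              # running average v = (1/(lam*n)) * sum_i alpha_i * x_i
    w = np.zeros(d)              # primal iterate, w = grad g*(v)
    R2 = max(np.sum(X * X, axis=1).max(), 1e-12)
    step = lam * n * gamma / (R2 + lam * n * gamma)   # conservative fixed step in (0, 1]

    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * X[i].dot(w)
            # Target dual value suggested by the current primal gradient.
            u = -y[i] * smoothed_hinge_grad(margin, gamma)
            delta = step * (u - alpha[i])
            alpha[i] += delta
            v += (delta / (lam * n)) * X[i]
            # Proximal/conjugate step: recover w from v (soft-threshold if L1 is present).
            w = soft_threshold(v, l1_strength / lam) if l1_strength > 0 else v
    return w, alpha
```

Even in this simplified form the essential data flow is visible: a randomly chosen dual coordinate is nudged toward the value suggested by the current primal gradient, the weighted average v is updated in O(d) time, and a proximal/conjugate step (here, soft-thresholding) maps v back to the primal iterate w. A real implementation would use the paper's exact coordinate updates and monitor the duality gap as a stopping criterion.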
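The inner-outer acceleration scheme from the second bullet can be summarized, again only schematically: an outer loop repeatedly asks the inner solver to minimize the original objective plus an extra quadratic term centered at an extrapolated point, then applies a momentum-style update. The parameters `kappa` (added regularization) and `beta` (momentum coefficient), as well as the inner accuracy schedule, are placeholders; the paper derives specific values from the smoothness and strong-convexity constants:

```python
import numpy as np

def accelerated_outer_loop(inner_solver, d, kappa, beta, outer_iters):
    """Schematic inner-outer acceleration wrapper.

    inner_solver(center, kappa) is expected to approximately minimize
    P(w) + (kappa/2) * ||w - center||^2, e.g. by running a few Prox-SDCA epochs
    on the shifted problem. kappa and beta are placeholder parameters here.
    """
    w_prev = np.zeros(d)
    y_t = np.zeros(d)            # extrapolated query point for the inner solver
    for _ in range(outer_iters):
        # Inner stage: solve a better-conditioned, more strongly regularized problem.
        w_new = inner_solver(center=y_t, kappa=kappa)
        # Outer stage: Nesterov-style extrapolation between consecutive iterates.
        y_t = w_new + beta * (w_new - w_prev)
        w_prev = w_new
    return w_prev
```

Because each inner problem carries effective regularization of roughly λ + κ, it is better conditioned and Prox-SDCA solves it cheaply; the outer extrapolation then recovers fast convergence on the original problem, which is where the square-root dependence on the condition number comes from.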
Numerical Results and Implications
The experiments validate the theoretical findings: across multiple datasets, the accelerated method needs fewer epochs to reach a given accuracy. The results indicate that it considerably outperforms both standard SDCA and FISTA in low-regularization settings (i.e., small λ), which are common in practice.
Theoretical and Practical Implications
- Theoretical Advancements:
- The paper extends the analysis of SDCA to more general settings, allowing smoothness and strong convexity to be measured with respect to general (possibly non-Euclidean) norms.
- It integrates acceleration techniques into the SDCA paradigm, retaining logarithmic dependence on the target accuracy while substantially weakening the dependence on the condition number, a significant theoretical contribution (a schematic rate comparison follows this list).
- Practical Utility:
- The proposed methods can be integrated into existing machine learning frameworks, offering improved efficiency for large-scale problems.
- The approach allows practitioners to handle broader classes of loss functions and regularizers without sacrificing computational efficiency.
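For orientation, the runtime comparison behind these claims has roughly the following shape for n examples in d dimensions, (1/γ)-smooth losses, and regularization strength λ; it is stated schematically rather than quoted from the paper, up to constants and logarithmic factors (including the log(1/ε) dependence on the target accuracy ε hidden in the Õ notation):

```latex
\text{Accelerated gradient descent (e.g. FISTA):} \quad
    \tilde{O}\!\left( d\, n \sqrt{\tfrac{1}{\lambda\gamma}} \right) \\
\text{Prox-SDCA:} \quad
    \tilde{O}\!\left( d \left( n + \tfrac{1}{\lambda\gamma} \right) \right) \\
\text{Accelerated Prox-SDCA:} \quad
    \tilde{O}\!\left( d \left( n + \sqrt{\tfrac{n}{\lambda\gamma}} \right) \right)
```

In the regime where 1/(λγ) is much larger than n, the accelerated bound depends on the square root of the condition number rather than on the condition number itself, which is the improvement over plain Prox-SDCA and over accelerated gradient methods described in the contributions above.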
Future Directions
Future work might explore further generalization of the acceleration framework to more complex regularization schemes and different norms. Extending the theoretical results to other stochastic settings and to additional machine learning problems also remains a promising avenue for research.
In conclusion, this paper makes significant advancements in optimization algorithms for machine learning, offering both theoretical insights and practical benefits in solving a wide range of regularized loss minimization problems efficiently.