Stochastic Variance Reduction Methods for Saddle-Point Problems (1605.06398v2)

Published 20 May 2016 in cs.LG and math.OC

Abstract: We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods (such as SVRG or SAGA) to provide the first large-scale linearly convergent algorithms for this class of problems which is common in machine learning. While the algorithmic extension is straightforward, it comes with challenges and opportunities: (a) the convex minimization analysis does not apply and we use the notion of monotone operators to prove convergence, showing in particular that the same algorithm applies to a larger class of problems, such as variational inequalities, (b) there are two notions of splits, in terms of functions, or in terms of partial derivatives, (c) the split does need to be done with convex-concave terms, (d) non-uniform sampling is key to an efficient algorithm, both in theory and practice, and (e) these incremental algorithms can be easily accelerated using a simple extension of the "catalyst" framework, leading to an algorithm which is always superior to accelerated batch algorithms.

Citations (204)

Summary

  • The paper extends stochastic variance reduction methods (SVRG, SAGA) to convex-concave saddle-point problems using monotone operators.
  • The authors propose SVRG and SAGA variants, leveraging non-uniform sampling and Catalyst acceleration for improved performance.
  • These methods provide computationally efficient solutions for large-scale machine learning problems involving saddle points, advancing theoretical understanding via monotone operators.

Stochastic Variance Reduction Methods for Saddle-Point Problems

The paper extends stochastic variance reduction methods, originally developed for convex finite-sum minimization, to convex-concave saddle-point problems. This class of problems arises frequently in machine learning, particularly through Lagrangian or Fenchel duality. The authors present a framework of large-scale, linearly convergent algorithms for this class, filling a significant gap in current computational optimization methodology.

Central to this discussion is the saddle-point problem typified by an objective structure that is convex in one variable and concave in another. Formally, this can be expressed as:

$$\min_{x \in \mathbb{R}^d} \max_{y \in \mathbb{R}^n} \; K(x,y) + M(x,y)$$

Here, K is smooth and M may be non-smooth with easily computable proximal operators. The decomposition of K into several components provides a structured pathway allowing stochastic variance-reduced gradient (SVRG) and SAGA methods to be adapted to these problems.
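As a concrete illustration (not taken from the paper's experiments), the saddle-point form of ℓ2-regularized empirical risk minimization has a bilinear smooth coupling K and a separable, prox-friendly M. The data matrix A, the parameter lam, and the helper names below are hypothetical choices for the sketch:

```python
import numpy as np

# Hypothetical instance (names A, lam, prox_M_x are ours, not the paper's):
# l2-regularized ERM written in saddle-point form min_x max_y K(x,y) + M(x,y),
# with K(x,y) = y^T A x / n (smooth bilinear coupling) and
# M(x,y) = (lam/2)||x||^2 - (1/n) sum_i f_i^*(y_i) (separable, prox-friendly).
rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.standard_normal((n, d))    # one data row a_i per example
lam = 0.1                          # strong-convexity parameter in x

def K(x, y):
    """Smooth bilinear coupling term."""
    return y @ A @ x / n

def grad_K(x, y):
    """Partial derivatives of K; the paper's operator view stacks (d_x K, -d_y K)."""
    return A.T @ y / n, A @ x / n

def prox_M_x(x, step):
    """Proximal operator of the strongly convex part (lam/2)||x||^2."""
    return x / (1.0 + step * lam)

# The prox in y depends on the conjugates f_i^*; for the hinge loss, for
# example, it reduces to a simple coordinate-wise projection.
```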

Key Contributions

  1. Extension to Monotone Operators: The paper shows that the convergence analysis of stochastic variance-reduced algorithms, originally formulated for convex minimization, carries over to saddle-point problems when the updates are cast in terms of monotone operators. The same operator-based argument extends the algorithms' applicability to variational inequalities.
  2. Algorithmic Frameworks: Two primary variants are proposed: SVRG and SAGA for saddle-point problems. Both use a stochastic estimate of the operator built from the gradients of individual components of the objective, enabling substantial computational savings in practice; a minimal sketch of the resulting variance-reduced update appears after this list.
  3. Non-Uniform Sampling: The paper identifies non-uniform sampling as a critical mechanism for the efficiency of the proposed incremental algorithms: sampling components with probabilities proportional to their smoothness (Lipschitz) constants yields better complexity bounds and better empirical performance than uniform sampling.
  4. Catalyst Acceleration: Through a simple extension of the Catalyst framework, in which the incremental algorithm is repeatedly applied to subproblems with an added regularization term, the methods are accelerated to obtain an algorithm that is always superior to accelerated batch algorithms.
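The sketch below is written under our own simplifying assumptions (a bilinear coupling, a quadratic M, hand-picked step size and sampling probabilities) and is not the authors' implementation; it only illustrates the shape of an SVRG-style variance-reduced forward-backward update with non-uniform sampling.

```python
import numpy as np

# Minimal sketch of an SVRG-style variance-reduced forward-backward update for
# a bilinear coupling K(x, y) = (1/n) sum_i y_i a_i^T x, with non-uniform
# sampling proportional to ||a_i||^2. All constants here are illustrative.
rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.standard_normal((n, d))
lam, gam = 0.1, 0.1                # strong convexity/concavity of M (assumed)
step = 0.05                        # illustrative; the paper derives precise choices

probs = np.sum(A**2, axis=1)
probs /= probs.sum()               # non-uniform sampling distribution

def full_operator(x, y):
    """Exact monotone operator B(x, y) = (d_x K, -d_y K) for the bilinear K."""
    return A.T @ y / n, -(A @ x) / n

def prox(x, y, s):
    """Prox step for M(x, y) = (lam/2)||x||^2 - (gam/2)||y||^2 (illustrative M)."""
    return x / (1 + s * lam), y / (1 + s * gam)

x, y = np.zeros(d), np.zeros(n)
for epoch in range(30):
    x_ref, y_ref = x.copy(), y.copy()
    gx_ref, gy_ref = full_operator(x_ref, y_ref)   # full pass at reference point
    for _ in range(n):
        i = rng.choice(n, p=probs)
        w = 1.0 / (n * probs[i])                   # importance weight
        # unbiased, variance-reduced estimate of the operator at (x, y)
        gx = gx_ref + w * A[i] * (y[i] - y_ref[i])
        gy = gy_ref.copy()
        gy[i] -= w * (A[i] @ (x - x_ref))
        x, y = prox(x - step * gx, y - step * gy, step)
```

With a full pass at each reference point (SVRG-style), the memory overhead is O(d + n); the SAGA variant instead keeps a table of per-component gradients and avoids the periodic full pass.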

Practical and Theoretical Implications

Practical Implications

The extension of SVRG and SAGA to saddle-point problems has the potential to reshape the computational landscape for machine learning tasks that involve non-separable losses or regularizers. Notably, the algorithms are well suited to supervised learning scenarios with complex modeling requirements, such as robust optimization, and to convex relaxations for unsupervised learning.

Theoretical Implications

On the theoretical side, the analysis underlines the role of monotone operators in establishing convergence for saddle-point problems, which broadens the results to cover variational inequalities. The proposed framework thereby extends linear convergence guarantees to a wider spectrum of problem classes.

Future Directions

Future research could develop adaptive variants of these methods that do not presuppose knowledge of the strong convexity-concavity constants, akin to the adaptive approaches already available for convex minimization.

The paper’s contributions strengthen the computational toolkit available for a wide variety of machine learning problems, combining theoretical soundness with practical efficiency on modern large-scale datasets.