Global Convergence of ADMM in Nonconvex Nonsmooth Optimization (1511.06324v8)

Published 18 Nov 2015 in math.OC, cs.NA, and math.NA

Abstract: In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, $\phi(x_0,\ldots,x_p,y)$, subject to coupled linear equality constraints. Our ADMM updates each of the primal variables $x_0,\ldots,x_p,y$, followed by updating the dual variable. We separate the variable $y$ from $x_i$'s as it has a special role in our analysis. The developed convergence guarantee covers a variety of nonconvex functions such as piecewise linear functions, $\ell_q$ quasi-norm, Schatten-$q$ quasi-norm ($0<q<1$), minimax concave penalty (MCP), and smoothly clipped absolute deviation (SCAD) penalty. It also allows nonconvex constraints such as compact manifolds (e.g., spherical, Stiefel, and Grassman manifolds) and linear complementarity constraints. Also, the $x_0$-block can be almost any lower semi-continuous function. By applying our analysis, we show, for the first time, that several ADMM algorithms applied to solve nonconvex models in statistical learning, optimization on manifold, and matrix decomposition are guaranteed to converge. Our results provide sufficient conditions for ADMM to converge on (convex or nonconvex) monotropic programs with three or more blocks, as they are special cases of our model. ADMM has been regarded as a variant to the augmented Lagrangian method (ALM). We present a simple example to illustrate how ADMM converges but ALM diverges with bounded penalty parameter $\beta$. Indicated by this example and other analysis in this paper, ADMM might be a better choice than ALM for some nonconvex \emph{nonsmooth} problems, because ADMM is not only easier to implement, it is also more likely to converge for the concerned scenarios.

Citations (1,093)

Summary

  • The paper establishes that under conditions like coercivity and Lipschitz continuity, the ADMM algorithm converges to a stationary point for nonconvex nonsmooth objectives.
  • It introduces an ADMM variant that updates multiple variable blocks in a cyclic or arbitrary order, ensuring convergence through an augmented Lagrangian framework.
  • The convergence guarantees extend ADMM's applicability to practical problems in matrix decomposition and statistical learning, offering robust performance in complex optimization settings.

Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

In the paper "Global Convergence of ADMM in Nonconvex Nonsmooth Optimization," the authors examine the convergence properties of the Alternating Direction Method of Multipliers (ADMM) applied to a broad class of nonconvex and nonsmooth optimization problems. Specifically, the paper addresses the minimization of a nonconvex and potentially nonsmooth objective function subject to coupled linear equality constraints.

Problem Formulation and Convergence Analysis

The authors define the problem of interest as minimizing an objective function $\phi(x_0, \ldots, x_p, y)$ over the variables $x_0, \ldots, x_p$ and $y$, subject to the coupled linear constraint $A_0 x_0 + A_1 x_1 + \cdots + A_p x_p + By = b$. The variable $y$ is treated separately due to its special role in the convergence analysis.
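
Written out, the model and its augmented Lagrangian take the following form (the multiplier $w$ and penalty parameter $\beta > 0$ are standard; the sign convention on the multiplier term is the usual one and may differ superficially from the paper's):

```latex
\begin{aligned}
&\min_{x_0,\ldots,x_p,\,y} \;\; \phi(x_0,\ldots,x_p,y)
\quad \text{subject to} \quad A_0 x_0 + A_1 x_1 + \cdots + A_p x_p + B y = b,\\[4pt]
&\mathcal{L}_\beta(x_0,\ldots,x_p,y,w)
 = \phi(x_0,\ldots,x_p,y)
 + \Big\langle w,\; \sum_{i=0}^{p} A_i x_i + B y - b \Big\rangle
 + \frac{\beta}{2}\Big\| \sum_{i=0}^{p} A_i x_i + B y - b \Big\|^2 .
\end{aligned}
```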

Algorithm Description

ADMM is extended to handle multiple blocks of variables $(x_0, \ldots, x_p)$ and $y$. Each variable block is updated in a cyclic manner, followed by an update of the dual variable $w$, so that every step is either a partial minimization of the augmented Lagrangian over one block or a dual ascent step. The paper also introduces an ADMM variant in which the blocks can be updated in an arbitrary order at each iteration, provided that $x_0$ is updated first and $y$ is updated last.
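
A minimal sketch of this update order, assuming caller-supplied subproblem solvers `solve_x(i, v)` and `solve_y(v)` (hypothetical helper names, not from the paper) that minimize the corresponding block of the augmented Lagrangian with the other blocks held fixed:

```python
import numpy as np

def multiblock_admm(x, y, w, A, B, b, solve_x, solve_y, beta=1.0, iters=500):
    """Sketch of the multi-block ADMM iteration discussed above.

    x        : list of initial blocks [x_0, ..., x_p]
    A        : list of matrices [A_0, ..., A_p]
    solve_x  : solve_x(i, v) returns argmin over x_i of that block's part of
               phi plus (beta/2) * ||A_i x_i + v||^2   (caller-supplied)
    solve_y  : solve_y(v) returns argmin over y of the y-part of phi plus
               (beta/2) * ||B y + v||^2                (caller-supplied)
    """
    p1 = len(x)                               # number of x-blocks, p + 1
    for _ in range(iters):
        # Cyclic sweep over x_0, ..., x_p (other orders are allowed by the
        # paper, as long as the y-block is updated after the x-blocks).
        for i in range(p1):
            v = (sum(A[j] @ x[j] for j in range(p1) if j != i)
                 + B @ y - b + w / beta)      # fixed part of the constraint
            x[i] = solve_x(i, v)
        # y is updated last; its subproblem is the "nice" (smooth) one.
        v = sum(A[j] @ x[j] for j in range(p1)) - b + w / beta
        y = solve_y(v)
        # Dual ascent on the multiplier with step size beta.
        r = sum(A[j] @ x[j] for j in range(p1)) + B @ y - b
        w = w + beta * r
    return x, y, w
```

Each $x_i$- and $y$-update is a partial minimization of $\mathcal{L}_\beta$, and the $w$-update is a gradient ascent step on the dual with step size $\beta$.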

Convergence Conditions and Results

The authors derive theoretical conditions under which the ADMM algorithm is guaranteed to converge. These conditions include:

  • Coercivity: The objective function $\phi(x_0, \ldots, x_p, y)$ must be coercive over the feasible set defined by the linear constraints.
  • Feasibility: The image space of the concatenated matrix $A := [A_0, A_1, \ldots, A_p]$ must be contained in the image space of $B$ (this holds trivially, for example, when $B$ is the identity or any surjective matrix).
  • Lipschitz Continuity: The solution maps arising from the subproblems $\mathrm{argmin}_{y}\, \phi(x, y)$ and $\mathrm{argmin}_{x_i}\, \phi(x, y)$ (with the other blocks held fixed) must be Lipschitz continuous.
  • Objective Regularity: Specific structural properties of the functions involved, such as lower semi-continuity and restricted prox-regularity.

The primary theoretical contribution is establishing that, under these conditions, the sequence generated by ADMM converges to a stationary point of the augmented Lagrangian function. Moreover, if the augmented Lagrangian satisfies the Kurdyka-Łojasiewicz (KŁ) inequality, the entire sequence converges to a single limit point.
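
For reference, the KŁ inequality used here is the standard one from the variational-analysis literature (stated generically, not restated verbatim from the paper): $\mathcal{L}_\beta$ satisfies it at a limit point $\bar z$ if there exist a neighborhood of $\bar z$ and a concave desingularizing function $\psi$ with $\psi(0)=0$ and $\psi' > 0$ such that, for every $z$ in that neighborhood with $\mathcal{L}_\beta(z) > \mathcal{L}_\beta(\bar z)$,

```latex
\psi'\big(\mathcal{L}_\beta(z) - \mathcal{L}_\beta(\bar z)\big)\,
\operatorname{dist}\big(0,\ \partial \mathcal{L}_\beta(z)\big) \;\ge\; 1 .
```

Semi-algebraic functions, which include piecewise-linear penalties, MCP, and SCAD, satisfy this property, so the whole-sequence convergence statement applies in those cases.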

Implications and Applications

This work has significant implications:

  • Practical: The findings give convergence guarantees for ADMM on real-world problems such as matrix decomposition, statistical learning models, and smooth optimization over compact manifolds, offering practitioners assurance that the method behaves reliably in these settings.
  • Theoretical: The assumptions and proofs broaden the scope of ADMM convergence theory, encompassing nonconvex and nonsmooth settings that were previously challenging. In particular, the results show that certain non-Lipschitz and nonconvex functions (e.g., $\ell_q$ quasi-norms, Schatten-$q$ quasi-norms, MCP, and SCAD) are included under the convergence guarantees; a small illustrative sketch follows this list.
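
As one concrete illustration (a hedged sketch under stated assumptions, not code from the paper), consider MCP-regularized least squares, $\min_x \tfrac{1}{2}\|Dx - c\|^2 + \sum_i \mathrm{MCP}_{\lambda,\gamma}(x_i)$, split via the constraint $x - y = 0$ so that the $y$-block carries the smooth quadratic and is updated last. The closed-form MCP proximal step below (firm thresholding) is valid when $\gamma\beta > 1$; the matrix `D`, vector `c`, and all parameter values are illustrative assumptions, and $\beta$ should be taken large enough relative to $\|D^\top D\|$ for the convergence theory to apply.

```python
import numpy as np

def mcp_prox(v, lam, gamma, beta):
    """Elementwise prox of the MCP penalty with step 1/beta (requires gamma*beta > 1)."""
    shrunk = np.sign(v) * np.maximum(np.abs(v) - lam / beta, 0.0) / (1.0 - 1.0 / (gamma * beta))
    return np.where(np.abs(v) <= gamma * lam, shrunk, v)  # beyond gamma*lam, no shrinkage

def admm_mcp_least_squares(D, c, lam=0.2, gamma=3.0, beta=10.0, iters=500):
    """Two-block ADMM for 0.5*||D y - c||^2 + sum_i MCP(x_i) subject to x - y = 0."""
    n = D.shape[1]
    x, y, w = np.zeros(n), np.zeros(n), np.zeros(n)
    M = D.T @ D + beta * np.eye(n)                      # y-subproblem normal matrix
    for _ in range(iters):
        x = mcp_prox(y - w / beta, lam, gamma, beta)    # nonconvex x-block (prox step)
        y = np.linalg.solve(M, D.T @ c + beta * x + w)  # smooth y-block, updated last
        w = w + beta * (x - y)                          # dual ascent on the multiplier
    return y

# Tiny usage example on synthetic data (illustrative only).
rng = np.random.default_rng(0)
D = rng.standard_normal((40, 20)) / np.sqrt(40)
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
c = D @ x_true + 0.01 * rng.standard_normal(40)
print(np.round(admm_mcp_least_squares(D, c), 2))
```

Here the $y$-subproblem is a strongly convex quadratic whose solution map is Lipschitz, and the nonconvex MCP term sits in the $x$-block, which matches the role separation described above.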

Speculation on Future Developments

The paper hints at several directions for future work and potential developments in the field:

  • Relaxation of Assumptions: While the current framework already includes nonconvex and nonsmooth objectives, further relaxation of assumptions may be explored to make ADMM more generally applicable.
  • Inexact ADMM: Extending the analysis to inexact variants of ADMM may yield new insights and allow for the development of more robust algorithms in practice.
  • Dynamic Update Orders: Investigating different update orders for variable blocks, beyond the fixed and cyclic schemes considered, could uncover new efficient methods for diverse applications.

Conclusion

The paper "Global Convergence of ADMM in Nonconvex Nonsmooth Optimization" provides a rigorous and thorough exploration of ADMM's applicability to nonconvex and nonsmooth problems. It establishes comprehensive conditions under which ADMM converges and extends the theoretical boundaries of what ADMM can achieve, both in practical and theoretical domains. The findings underscore the utility of ADMM in addressing complex optimization challenges prevalent in various scientific and engineering fields.