
Global convergence of splitting methods for nonconvex composite optimization (1407.0753v6)

Published 3 Jul 2014 in math.OC, cs.LG, math.NA, and stat.ML

Abstract: We consider the problem of minimizing the sum of a smooth function $h$ with a bounded Hessian, and a nonsmooth function. We assume that the latter function is a composition of a proper closed function $P$ and a surjective linear map $\cal M$, with the proximal mappings of $\tau P$, $\tau > 0$, simple to compute. This problem is nonconvex in general and encompasses many important applications in engineering and machine learning. In this paper, we examine two types of splitting methods for solving this nonconvex optimization problem: alternating direction method of multipliers and proximal gradient algorithm. For the direct adaptation of the alternating direction method of multipliers, we show that, if the penalty parameter is chosen sufficiently large and the sequence generated has a cluster point, then it gives a stationary point of the nonconvex problem. We also establish convergence of the whole sequence under an additional assumption that the functions $h$ and $P$ are semi-algebraic. Furthermore, we give simple sufficient conditions to guarantee boundedness of the sequence generated. These conditions can be satisfied for a wide range of applications including the least squares problem with the $\ell_{1/2}$ regularization. Finally, when $\cal M$ is the identity so that the proximal gradient algorithm can be efficiently applied, we show that any cluster point is stationary under a slightly more flexible constant step-size rule than what is known in the literature for a nonconvex $h$.

Citations (392)

Summary

  • The paper establishes that under appropriate conditions, ADMM with a proximal term converges globally to a stationary point in nonconvex settings.
  • It extends the Proximal Gradient Algorithm with a flexible constant step-size rule, enabling larger steps without sacrificing convergence.
  • These findings provide a robust theoretical foundation for applying splitting methods in diverse engineering and machine learning applications.

An Expert Overview of "Global Convergence of Splitting Methods for Nonconvex Composite Optimization"

The paper "Global Convergence of Splitting Methods for Nonconvex Composite Optimization" by Guoyin Li and Ting Kei Pong addresses the problem of minimizing an objective function expressed as the sum of a smooth part with a bounded Hessian and a nonsmooth part. The nonsmooth component encompasses a composition of a closed proper function and a surjective linear map. Importantly, the proximal maps of the latter are simple to compute. The problem setup is notably nonconvex, covering numerous applications in engineering and machine learning.

To tackle this problem, the authors investigate two types of splitting methods: the Alternating Direction Method of Multipliers (ADMM) and the Proximal Gradient Algorithm. Both are analyzed under assumptions that hold in many real-world applications, such as semi-algebraicity of the functions involved and proximal mappings that are simple to evaluate.

Key Contributions and Theoretical Developments

  1. ADMM for Nonconvex Problems:
    • The analysis focuses on a version of ADMM that includes a proximal term in one of the subproblems (see the sketch after this list).
    • The authors establish that, when the linear map is surjective and the penalty parameter is chosen sufficiently large, any cluster point of the sequence generated by ADMM is a stationary point of the nonconvex problem.
    • Convergence of the entire sequence is proved under the additional assumption that both function components are semi-algebraic, a condition satisfied in many practical situations.
    • Boundedness of the generated sequence is guaranteed under simple sufficient conditions, covering applications such as least squares problems with $\ell_{1/2}$ regularization.
  2. Proximal Gradient Algorithm:
    • When $\mathcal{M}$ is the identity, so that the proximal gradient algorithm can be applied efficiently, the paper shows that any cluster point is stationary under a slightly more flexible constant step-size rule than what was previously documented in the literature for a nonconvex $h$.
    • In practice, the proposed rule permits larger step sizes without compromising convergence to a stationary point.
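
The following is a minimal sketch of a proximal ADMM of this kind, assuming for concreteness that the smooth part is a least-squares loss $h(x) = \tfrac12\|Ax - b\|^2$, so that the $x$-subproblem reduces to a linear solve. The function names, the scaled dual formulation, and the particular update order are illustrative choices rather than the paper's exact scheme; `prox_P` stands for whatever proximal map of the (possibly nonconvex) function $P$ is available.

```python
import numpy as np

def proximal_admm(A, b, M, prox_P, beta=1.0, mu=1.0, iters=500):
    """Sketch of a proximal ADMM for  min_x 0.5*||A x - b||^2 + P(M x).

    Splitting:  min_{x,y} 0.5*||A x - b||^2 + P(y)  s.t.  M x = y,
    with an extra proximal term (mu/2)*||x - x_prev||^2 in the x-update.
    prox_P(v, t) must return argmin_y P(y) + (1/(2t))*||y - v||^2.
    """
    n, m = A.shape[1], M.shape[0]
    x, y, u = np.zeros(n), np.zeros(m), np.zeros(m)   # u is the scaled dual z/beta

    # The x-subproblem is a strongly convex quadratic with a constant
    # coefficient matrix, so it is factored once outside the loop.
    H = A.T @ A + beta * (M.T @ M) + mu * np.eye(n)
    L = np.linalg.cholesky(H)

    for _ in range(iters):
        # x-update: minimize 0.5*||Ax-b||^2 + (beta/2)*||Mx - y + u||^2
        #           + (mu/2)*||x - x_prev||^2  (exact linear solve)
        rhs = A.T @ b + beta * M.T @ (y - u) + mu * x
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # y-update: proximal map of P with parameter 1/beta
        y = prox_P(M @ x + u, 1.0 / beta)
        # dual update (scaled form)
        u = u + M @ x - y
    return x
```

The prefactored Cholesky solve exploits the fact that the coefficient matrix of the $x$-subproblem does not change across iterations; the $\mu$-term is the proximal term mentioned above, and the penalty parameter $\beta$ plays the role that the paper requires to be sufficiently large.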

Practical and Theoretical Implications

The implications of these findings are multi-faceted. Practically, they provide a solid foundation for extending splitting methods to solve nonconvex optimization problems reliably, particularly within the ML space where such structures are ubiquitous. The inclusive scope of the models discussed (like the $\ell_{1/2}$ regularization in least squares problems) hints at broad applicability.
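
As an illustration of how the $\ell_{1/2}$-regularized least squares problem fits this framework, the sketch below combines a proximal gradient loop with a scalar proximal map for $\lambda|y|^{1/2}$, computed numerically by comparing $y = 0$ with the positive stationary points (the closed-form half-thresholding formula from the literature could be used instead). The helper names and the root-finding approach are illustrative assumptions, and the constant step size should be chosen according to the paper's rule relative to the Lipschitz constant of $\nabla h$.

```python
import numpy as np

def prox_half_scalar(v, t, lam):
    """Prox of t*lam*|y|^{1/2}:  argmin_y 0.5*(y - v)^2 + t*lam*|y|^{1/2}.

    For y > 0 the stationarity condition y - |v| + t*lam/(2*sqrt(y)) = 0
    becomes, with s = sqrt(y), the cubic s^3 - |v|*s + t*lam/2 = 0.
    We compare y = 0 against all positive real roots of that cubic.
    """
    a = t * lam
    roots = np.roots([1.0, 0.0, -abs(v), a / 2.0])
    candidates = [0.0] + [s.real ** 2 for s in roots
                          if abs(s.imag) < 1e-8 and s.real > 0]
    best = min(candidates, key=lambda y: 0.5 * (y - abs(v)) ** 2 + a * np.sqrt(y))
    return np.sign(v) * best

def prox_gradient_l_half(A, b, lam, step, iters=1000):
    """Proximal gradient sketch for  min_x 0.5*||A x - b||^2 + lam*sum_i |x_i|^{1/2}."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)          # gradient of the smooth least-squares part
        v = x - step * grad               # forward (gradient) step
        x = np.array([prox_half_scalar(vi, step, lam) for vi in v])  # backward step
    return x
```

Because the scalar proximal map returns exactly zero for small inputs, the iterates tend to be sparse, which reflects the sparsity-inducing behaviour that motivates $\ell_{1/2}$ regularization in the first place.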

In theoretical terms, by showing the convergence of sequences under a set of well-defined conditions and bounds, the paper satisfies a critical requirement for computational methods intended for complex nonconvex landscapes. The work ensures that splitting methods remain robust even as the complexity of the data or functional domains increases.

Future Directions

For future exploration, it would be promising to adapt further splitting methods—particularly those known for convex problems—to the nonconvex setting, either by modifying existing algorithms or proposing new frameworks. Investigating the potential for these methods to be adapted with polynomial-time guarantees remains an intriguing open question. Additionally, addressing situations where the linear map is only injective might pave the way to broader applicability and enhanced computational effectiveness.

In summary, Li and Pong offer significant contributions toward understanding and applying splitting methods in nonconvex composite optimization scenarios. They provide essential theoretical assurance for practitioners aiming to tackle complex real-world problems with structured objective functions.