Accelerated Bregman Proximal Gradient Methods for Relatively Smooth Convex Optimization
Introduction
The paper addresses convex optimization problems that minimize the sum of two convex functions, where one function is differentiable and relatively smooth with respect to a reference convex function, and the other may be nondifferentiable but admits a tractable Bregman proximal step. Its main contribution is a family of accelerated Bregman proximal gradient (ABPG) methods with improved convergence guarantees.
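Concretely, the setting is the composite problem below, with relative smoothness defined in the standard way for this literature; the notation here is chosen for illustration rather than quoted from the paper:

\[
\min_{x}\; \phi(x) := f(x) + \Psi(x),
\]
\[
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + L\, D_h(y, x),
\qquad
D_h(y, x) := h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle,
\]

where D_h is the Bregman distance generated by the reference function h; when the upper bound holds for all x and y, the function f is said to be L-smooth relative to h.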
Methodology and Results
The proposed methods are built on Bregman distances, specifically on a triangle scaling property that yields convergence rates of O(k^{-γ}), where γ is the triangle scaling exponent (TSE) of the Bregman distance. For the Euclidean distance, γ = 2 and the methods recover the O(k^{-2}) rate of Nesterov's accelerated gradient methods. For non-Euclidean Bregman distances, however, the TSE can be considerably smaller (for example, γ ≤ 1). The paper therefore introduces a relaxed notion, the intrinsic TSE, which always equals 2, and uses it to develop adaptive ABPG methods that converge much faster in practice without sacrificing theoretical guarantees.
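For reference, the triangle scaling property takes the following form (paraphrased from the paper's definition; \bar{x} denotes an arbitrary anchor point):

\[
D_h\big((1-\theta)\bar{x} + \theta z,\;(1-\theta)\bar{x} + \theta \tilde{z}\big)
\;\le\; \theta^{\gamma}\, D_h(z, \tilde{z}),
\qquad \forall\, \theta \in [0,1],
\]

and the TSE is the exponent γ for which this uniform bound holds; for the squared Euclidean distance the bound holds with equality at γ = 2.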
Numerical experiments show that the adaptive ABPG methods empirically attain O(k^{-2}) rates across the tested applications, including D-optimal experiment design and Poisson linear inverse problems. On these problems, the methods consistently converge faster than non-accelerated Bregman proximal gradient baselines.
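To make the shape of the accelerated iteration concrete, the following is a minimal Python sketch of the basic (non-adaptive) ABPG loop. The three-sequence structure and the θ-recursion mirror the method described in the paper, but the function names (abpg, bregman_prox), the bisection solver, and the least-squares demo are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def abpg(grad_f, bregman_prox, x0, L, gamma=2.0, iters=200):
    """Schematic ABPG loop (illustrative, not the paper's exact pseudocode).

    bregman_prox(g, z, coef) should return
        argmin_u  <g, u> + Psi(u) + coef * D_h(u, z).
    """
    x, z = x0.copy(), x0.copy()
    theta = 1.0
    for _ in range(iters):
        y = (1.0 - theta) * x + theta * z        # extrapolation point
        z = bregman_prox(grad_f(y), z, theta**(gamma - 1.0) * L)
        x = (1.0 - theta) * x + theta * z        # convex-combination update
        # next theta solves (1 - t) / t**gamma = 1 / theta**gamma on (0, 1)
        lo, hi = 0.0, 1.0
        rhs = 1.0 / theta**gamma
        for _ in range(60):                      # simple bisection
            t = 0.5 * (lo + hi)
            if (1.0 - t) / t**gamma > rhs:
                lo = t
            else:
                hi = t
        theta = 0.5 * (lo + hi)
    return x

# Demo: Euclidean reference function h = 0.5*||.||^2 and Psi = 0, so the
# Bregman proximal step is an explicit gradient step and gamma = 2 gives
# Nesterov-style acceleration on a least-squares toy problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
L = np.linalg.norm(A, 2) ** 2                    # relative-smoothness constant

x_star = abpg(lambda y: A.T @ (A @ y - b),
              lambda g, z, coef: z - g / coef,   # Euclidean Bregman prox
              np.zeros(20), L)
print("objective:", 0.5 * np.linalg.norm(A @ x_star - b) ** 2)
```

With the Euclidean instantiation above, the Bregman proximal step is an explicit gradient step; for non-Euclidean reference functions (for example, Burg or Shannon entropy), bregman_prox would instead solve the corresponding subproblem, often in closed form.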
Implications and Future Directions
The implications of this research are multifaceted. Practically, it offers a framework for minimizing relatively smooth convex functions more efficiently, with potential impact in areas such as signal processing, machine learning, and statistical inference, where such problems frequently arise. Theoretically, it deepens the understanding of the convergence behavior of first-order methods under relaxed smoothness conditions, bridging the gap between classical gradient methods and modern adaptive schemes.
Looking forward, the paper suggests several promising directions for Bregman-based methods. Exploring additional structural properties of Bregman distances could yield further improvements in convergence guarantees. Moreover, integrating these methods into broader optimization settings, such as constrained or stochastic formulations, could extend their applicability and efficacy.
Conclusion
The paper makes a valuable contribution to convex optimization by extending the Bregman proximal gradient framework to the relatively smooth setting. It demonstrates the benefit of adaptive schemes in achieving substantial practical speedups, supported by both theoretical analysis and empirical evidence. Researchers in optimization and related fields will find the insights and methods presented here useful for tackling modern large-scale optimization problems.