- The paper introduces an inexact Hessian condition that enables approximated Newton-type methods while preserving convergence guarantees.
- It proposes tailored trust-region and cubic regularization algorithms that maintain iteration complexity similar to exact methods in non-convex settings.
- Randomized sampling techniques are applied to construct approximate Hessians, significantly reducing computational costs in large-scale machine learning problems.
Overview of "Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information"
This paper develops Newton-type methods for non-convex optimization problems in settings where the Hessian matrix is not computed exactly but only approximated. The authors contribute to the field by examining variants of trust-region and adaptive cubic regularization methods, providing both theoretical convergence guarantees and practical implementation guidance.
Main Contributions
- Hessian Approximation Condition: The paper introduces an inexact Hessian regularity condition that is weaker than those found in prior work. The condition only requires the Hessian to be approximated to within a known error bound, enabling cheap approximations while still providing theoretical convergence guarantees. This supports practical algorithms that match the tight iteration complexities of classical exact-Hessian methods (a schematic statement of the condition follows this list).
- Algorithmic Framework: The authors propose variants of trust-region and adaptive cubic regularization methods tailored to settings with inexact Hessian information. These variants retain the iteration complexity of their exact counterparts, meaning the number of iterations needed to reach an approximate second-order critical point is of the same order despite the approximation (a minimal sketch of one such iteration also appears after this list).
- Randomized Sampling: For the large-scale finite-sum optimization problems ubiquitous in machine learning, the paper applies randomized sampling techniques to construct approximate Hessians. These sampling strategies ensure, with high probability, that the approximate Hessian satisfies the regularity condition.
- Theoretical Analysis: The paper establishes rigorous complexity bounds for the proposed algorithms, similar to those obtained by exact Hessian-based methods. Trust-region methods achieve a complexity of O(max{ε_g^{-2} ε_H^{-1}, ε_H^{-3}}), and cubic regularization methods reach O(max{ε_g^{-2}, ε_H^{-3}}). These results affirm that the gains in computational efficiency do not compromise convergence properties.
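To make the notation above concrete: the inexact-Hessian condition is, in essence, a spectral-norm bound on the approximation error at every iterate, and the target is an (ε_g, ε_H)-approximate second-order critical point. The following is a schematic statement under that reading; the symbols are paraphrased here rather than quoted from the paper.

```latex
% Inexact-Hessian condition: the approximation H_t stays within \epsilon of the
% true Hessian at every iterate x_t (spectral norm)
\| H_t - \nabla^2 f(x_t) \| \le \epsilon \quad \text{for all iterations } t .

% Goal: an (\epsilon_g, \epsilon_H)-approximate second-order critical point
\| \nabla f(x) \| \le \epsilon_g ,
\qquad
\lambda_{\min}\!\big( \nabla^2 f(x) \big) \ge -\epsilon_H .
```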
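As an illustration of how an algorithm of this type consumes an approximate Hessian, below is a minimal Python sketch of one cubic-regularization iteration. It is not the paper's algorithm: the crude inner solver, the fixed inner step size, and the names (`cubic_reg_step`, `arc_iteration`, `sigma`, `eta`) are simplifications assumed here for exposition.

```python
import numpy as np

def cubic_reg_step(grad, hess_approx, sigma, n_inner=50, lr=0.01):
    """Approximately minimize the cubic model
       m(s) = g^T s + 0.5 s^T H s + (sigma/3) ||s||^3
    by plain gradient descent -- a crude stand-in for the Lanczos/Krylov
    subproblem solvers used in practice."""
    s = np.zeros_like(grad)
    for _ in range(n_inner):
        model_grad = grad + hess_approx @ s + sigma * np.linalg.norm(s) * s
        s -= lr * model_grad
    return s

def arc_iteration(f, grad_f, hess_approx_f, x, sigma, eta=0.1):
    """One outer iteration of cubic-regularized Newton with an inexact Hessian.
    hess_approx_f(x) may return any approximation satisfying the error bound."""
    g, H = grad_f(x), hess_approx_f(x)
    s = cubic_reg_step(g, H, sigma)
    pred = -(g @ s + 0.5 * s @ H @ s + sigma / 3.0 * np.linalg.norm(s) ** 3)
    rho = (f(x) - f(x + s)) / max(pred, 1e-12)   # actual vs. predicted decrease
    if rho >= eta:                               # sufficient decrease: accept step
        return x + s, max(sigma / 2.0, 1e-6)     # ... and relax the regularization
    return x, 2.0 * sigma                        # otherwise reject and tighten it
```

A trust-region variant follows the same accept/reject pattern but constrains the step norm, ||s|| ≤ Δ, instead of penalizing ||s||^3.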
Practical Implications
- Sub-sampling Strategies: By sub-sampling the components of the objective, the authors show that one can significantly reduce the computational burden of Hessian evaluations in large-scale problems, and they provide practical guidelines for selecting sample sizes and sampling distributions (a minimal sub-sampling sketch follows this list).
- Machine Learning Applications: The paper addresses challenges typical of big-data environments and suggests that the presented methods could be particularly useful for machine learning applications, where computational cost is a major concern and approximate solutions are often sufficient.
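A minimal sketch of the simplest such strategy, uniform Hessian sub-sampling for a finite-sum objective f(x) = (1/n) Σ_i f_i(x), is given below. The function names and the uniform-sampling choice are assumptions made here for illustration; the paper's actual guidelines on sample sizes and sampling distributions go further.

```python
import numpy as np

def subsampled_hessian(hess_i, x, n, sample_size, rng=None):
    """Uniformly sub-sampled Hessian for f(x) = (1/n) * sum_i f_i(x).

    hess_i(i, x) must return the d x d Hessian of the i-th component f_i at x.
    Averaging over a random index set gives an unbiased estimate of the full
    Hessian whose spectral-norm error shrinks as the sample grows."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(n, size=sample_size, replace=False)
    d = x.shape[0]
    H = np.zeros((d, d))
    for i in idx:
        H += hess_i(i, x)
    return H / sample_size

# Example: for least squares, f_i(x) = 0.5 * (a_i @ x - b_i)**2, the per-sample
# Hessian is the rank-one matrix a_i a_i^T, so one could pass
#   hess_i = lambda i, x: np.outer(A[i], A[i])
```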
Theoretical and Practical Impacts
This work significantly broadens the practical applicability of Newton-type methods for non-convex optimization by accommodating inexact Hessian information, furnishing robust theoretical guarantees, and suggesting efficient computational strategies. The relaxed regularity condition not only simplifies implementation but also retains the attractive convergence properties of second-order methods. These methods also lay the groundwork for future research on optimizing non-convex functions with computationally feasible strategies, anticipating extensions to settings with approximate gradient information and stochastic optimization frameworks.
Future Directions
The authors suggest that future work could explore gradient approximation while still controlling the accuracy of the Hessian approximation. Such extensions might yield more efficient algorithms, even when facing large-scale data or limited computational resources. Distributed optimization frameworks, where communication overhead can be a critical issue, also present an area for potential advances building on the results and methods introduced in this paper.
In summary, this paper provides a comprehensive framework for employing Newton-type methods in settings where computing or storing the exact Hessian is impractical. By leveraging advances in randomized numerical linear algebra, this work presents a significant step forward in the field of optimization, offering both computational efficiency and robust theoretical underpinnings.