Exact and Inexact Subsampled Newton Methods for Optimization (1609.08502v1)

Published 27 Sep 2016 in math.OC and stat.ML

Abstract: The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact). We provide a complexity analysis for this method based on the properties of the CG iteration and the quality of the Hessian approximation, and compare it with a method that employs a stochastic gradient iteration instead of the CG method. We report preliminary numerical results that illustrate the performance of inexact subsampled Newton methods on machine learning applications based on logistic regression.

Citations (168)

Summary

  • The paper analyzes exact subsampled Newton methods that achieve a superlinear rate of convergence in expectation by coordinating the accuracy of the subsampled gradient and Hessian approximations.
  • The paper presents an inexact Newton method that leverages conjugate gradient iterations to efficiently solve linear systems with bounded complexity.
  • The paper validates its methods with logistic regression applications, demonstrating improved efficiency for large-scale machine learning problems.

Exact and Inexact Subsampled Newton Methods for Optimization

The paper under consideration explores subsampled Newton methods for stochastic optimization, covering both exact and inexact variants. Such methods replace the full gradient and Hessian with subsampled approximations, reducing the per-iteration cost of second-order optimization in large-scale machine learning problems.
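
To make the setting concrete, here is a minimal sketch of one exact subsampled Newton iteration for a finite-sum objective F(x) = (1/N) sum_i f_i(x). The callables grad_fn and hess_fn, which return averages over an index set, are hypothetical placeholders rather than the authors' code, and the fixed sample sizes are an assumption: the paper's superlinear-rate result additionally requires the samples to grow across iterations.

```python
import numpy as np

def subsampled_newton_step(x, grad_fn, hess_fn, n_data, rng,
                           grad_sample=1024, hess_sample=256, step=1.0):
    """One exact subsampled Newton step (illustrative sketch).

    grad_fn(x, idx) / hess_fn(x, idx) are assumed to return the average
    gradient / Hessian of the loss over the data points indexed by idx.
    """
    # Independent index sets for the gradient and Hessian estimates.
    g_idx = rng.choice(n_data, size=grad_sample, replace=False)
    h_idx = rng.choice(n_data, size=hess_sample, replace=False)

    g = grad_fn(x, g_idx)        # subsampled gradient estimate
    H = hess_fn(x, h_idx)        # subsampled d x d Hessian estimate

    # "Exact" variant: solve the subsampled Newton system H p = -g directly.
    p = np.linalg.solve(H, -g)
    return x + step * p
```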

Main Contributions

  1. Newton-like Methods: The paper provides a detailed analysis of Newton-type methods that utilize both subsampled gradients and Hessians. The authors adaptively coordinate the approximation levels of these components to achieve a superlinear rate of convergence in expectation. This is an improvement over prior approaches that predominantly aim for linear convergence.
  2. Inexact Newton Methods: An in-depth analysis is conducted on Newton methods that solve the linear systems only approximately. The paper introduces a scheme using the conjugate gradient (CG) method and provides a complexity analysis based on the properties of the CG iteration and the quality of the Hessian approximation. A comparison is made with a method that employs stochastic gradient iterations (SGI) as the inner solver; a sketch of the CG-based step appears after this list.
  3. Numerical Illustrations: Preliminary numerical results based on logistic regression applications highlight the effectiveness of inexact subsampled Newton methods, supporting theoretical claims with empirical evidence.
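
As referenced in item 2, the following is a hedged sketch of the inexact Newton-CG step analyzed in the paper: the gradient is exact, the Hessian is subsampled and accessed only through Hessian-vector products, and the Newton system is solved approximately by a conjugate-gradient loop capped at a small number of iterations. The callables full_grad_fn and hess_vec_fn and the default parameter values are hypothetical; the cap max_cg is the quantity the complexity analysis trades off against the Hessian sample size. The Newton-SGI alternative compared in the paper would replace the inner CG loop with stochastic gradient iterations applied to the same quadratic model.

```python
import numpy as np

def newton_cg_step(x, full_grad_fn, hess_vec_fn, n_data, rng,
                   hess_sample=256, max_cg=10, cg_tol=1e-2, step=1.0):
    """One inexact Newton-CG step (illustrative sketch): exact gradient,
    subsampled Hessian seen only through Hessian-vector products."""
    g = full_grad_fn(x)                        # exact gradient
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:                          # already stationary
        return x

    h_idx = rng.choice(n_data, size=hess_sample, replace=False)
    Hv = lambda v: hess_vec_fn(x, v, h_idx)    # v -> H_S @ v on the sample

    # Conjugate gradient on H_S p = -g, stopped early (the "inexact" part).
    p = np.zeros_like(x)
    r = -g                                     # residual for p = 0
    d = r.copy()
    rs = r @ r
    for _ in range(max_cg):
        Hd = Hv(d)
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if np.sqrt(rs_new) <= cg_tol * g_norm: # relative residual test
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x + step * p
```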

Key Findings

The exact subsampled Newton methods can achieve superlinear convergence in expectation when the accuracy of the gradient and Hessian approximations is coordinated appropriately, while the inexact methods that use CG or SGI as inner solvers show practical promise. Notably, the per-iteration cost of the CG-based inexact method is governed by the number of Hessian-vector products required by the CG solver, each of which involves only the sampled Hessian.
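
The Hessian-vector products are cheap in the logistic-regression setting used for the experiments because they never require forming the d x d Hessian: each product costs two matrix-vector passes over the sampled rows of the data matrix. The sketch below assumes an l2-regularized binary logistic loss with labels in {-1, +1}; the function name and the regularization constant are illustrative, not taken from the paper.

```python
import numpy as np

def logistic_hess_vec(x, v, A, y, idx, lam=1e-3):
    """Subsampled Hessian-vector product for l2-regularized logistic
    regression, computed without forming the Hessian (illustrative sketch).

    A: (N, d) feature matrix, y: labels in {-1, +1}, idx: Hessian sample.
    """
    A_s, y_s = A[idx], y[idx]
    z = y_s * (A_s @ x)                   # margins on the sampled points
    sig = 1.0 / (1.0 + np.exp(-z))
    w = sig * (1.0 - sig)                 # per-sample curvature weights
    # (1/|S|) * A_s^T diag(w) A_s v + lam * v
    return A_s.T @ (w * (A_s @ v)) / len(idx) + lam * v
```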

Implications for Machine Learning and Optimization

The implications of this research extend to both theoretical and practical domains:

  • Theoretical Implications: The convergence conditions set forth for exact Newton methods encourage the exploration of sampling strategies and error analysis in large-scale settings. The complexity analyses of inexact methods provide a foundation for further refinement of iterative solvers in subsampled environments.
  • Practical Implications: The results serve as a reference for implementing scalable second-order optimization methods, particularly in machine learning applications involving very large, high-dimensional datasets.

Future Directions

Explorations of non-uniform sampling are anticipated to yield further improvements. Another promising avenue is the development of hybrid approaches that combine different iterative solvers to enhance convergence rates while maintaining computational tractability. Additional empirical validation across a broader range of datasets would solidify the general applicability of the proposed methods.

In conclusion, this work contributes notably to the growing literature on efficient stochastic optimization techniques in machine learning, with specific advancements in subsampled Newton methodologies. The careful balance between theoretical rigor and practical application makes it a valuable resource for ongoing research and development in this dynamic field.