- The paper provides a comprehensive survey categorizing optimization methods as first-order, high-order, and derivative-free approaches for machine learning.
- It evaluates practical implementations like SGD, Adam, and Hessian-free optimization, highlighting their strengths and limitations in deep learning and reinforcement learning.
- It identifies challenges and future research directions, including optimizing non-convex functions and enhancing gradient estimation in large-scale models.
Overview of Optimization Methods in Machine Learning
The paper, titled "A Survey of Optimization Methods from a Machine Learning Perspective," provides a comprehensive examination of optimization strategies central to ML. As models grow in complexity and datasets in volume, effective optimization becomes ever more important. The survey reviews both well-established and contemporary optimization methods, evaluating their applications, advantages, and limitations within ML.
Fundamental Optimization Methods
The paper categorizes optimization techniques into first-order, high-order, and derivative-free methods.
- First-order Methods:
- Stochastic Gradient Descent (SGD): A widely adopted method owing to its simplicity and efficiency on large datasets, though its noisy gradient estimates and sensitivity to the learning-rate schedule can cause the iterates to oscillate near a minimum.
- Adaptive Methods (e.g., Adam, RMSProp): These algorithms extend SGD by keeping running estimates of past gradients and scaling each parameter's learning rate accordingly, improving convergence in many practical settings.
- Variance Reduction Techniques: Methods such as SAG and SVRG reduce the variance of the stochastic gradient estimate, for instance by combining each sample's gradient with a periodically recomputed full gradient, achieving faster convergence than vanilla SGD (see the sketches after this list).
- High-order Methods:
- Newton and Quasi-Newton Methods: By exploiting curvature information (second derivatives), these methods converge in far fewer iterations, but forming, storing, and inverting the Hessian remains a challenge for large-scale problems; quasi-Newton methods such as L-BFGS approximate the curvature from gradient differences to reduce this overhead.
- Hessian-free Optimization: Avoids forming the Hessian explicitly by solving the Newton system with conjugate gradients, which requires only Hessian-vector products, making second-order updates feasible for neural networks (a sketch follows this list).
- Derivative-free Optimization:
- Coordinate Descent: Optimizes along one coordinate (or block of coordinates) at a time, which is useful when derivatives are unavailable or expensive to compute (see the sketch below).
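To make these first-order updates concrete, here is a minimal NumPy sketch of one SGD step, one Adam step, and one SVRG outer iteration. It is illustrative only: the function names (`sgd_step`, `adam_step`, `svrg_epoch`) and the user-supplied `grad_fn(w, x)` returning a per-sample gradient are assumptions for the sketch, not code from the survey.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Vanilla SGD: move opposite the (stochastic) gradient by a fixed step."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: running moment estimates with bias correction give
    each parameter its own effective step size."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad**2    # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1**t)               # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def svrg_epoch(w, grad_fn, data, lr=0.05, inner_steps=100, seed=0):
    """One SVRG outer iteration: a full gradient at a snapshot point controls
    the variance of the subsequent stochastic steps."""
    rng = np.random.default_rng(seed)
    w_snap = w.copy()
    full_grad = np.mean([grad_fn(w_snap, x) for x in data], axis=0)
    for _ in range(inner_steps):
        x = data[rng.integers(len(data))]
        # unbiased gradient estimate whose variance shrinks near the snapshot
        g = grad_fn(w, x) - grad_fn(w_snap, x) + full_grad
        w = w - lr * g
    return w
```

The SVRG correction term leaves the estimate unbiased while shrinking its variance when `w` stays close to the snapshot, which is what permits a constant step size and faster convergence than vanilla SGD.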
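The Hessian-free idea can be sketched similarly: solve the Newton system H p = -g with conjugate gradients, obtaining Hessian-vector products by finite differences of the gradient so the Hessian is never formed. This is a simplified sketch (no damping, preconditioning, or line search, which practical implementations rely on), and `grad_fn(w)` returning the full gradient is an assumed interface.

```python
import numpy as np

def hessian_free_step(w, grad_fn, eps=1e-4, cg_iters=10, tol=1e-8):
    """One Hessian-free Newton step: conjugate gradients on H p = -g using
    only Hessian-vector products approximated from gradient differences."""
    g = grad_fn(w)

    def hvp(v):
        # H v  ~=  (grad(w + eps * v) - grad(w)) / eps
        return (grad_fn(w + eps * v) - g) / eps

    p = np.zeros_like(w)          # Newton direction being built up
    r = -g.copy()                 # residual of H p = -g at p = 0
    d = r.copy()
    rs_old = r @ r
    for _ in range(cg_iters):
        Hd = hvp(d)
        alpha = rs_old / (d @ Hd + 1e-12)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if rs_new < tol:
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return w + p                  # in practice combined with damping or a line search
```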
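A derivative-free coordinate method can be sketched as a cyclic sweep that probes a small step along each coordinate and keeps it only if the objective decreases. The fixed step size and the function name `coordinate_descent` are illustrative simplifications; practical variants shrink the step adaptively or minimize exactly along each coordinate.

```python
import numpy as np

def coordinate_descent(f, w, step=0.1, sweeps=50):
    """Cyclic coordinate search using only objective evaluations (no derivatives)."""
    for _ in range(sweeps):
        for j in range(len(w)):
            for delta in (step, -step):   # try a move in either direction
                trial = w.copy()
                trial[j] += delta
                if f(trial) < f(w):       # keep the move only if it helps
                    w = trial
                    break
    return w

# usage: minimize a simple separable quadratic
f = lambda w: (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 0.5) ** 2
w_opt = coordinate_descent(f, np.array([0.0, 0.0]))   # approaches (1.0, -0.5)
```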
Applications in Machine Learning
The discussion extends to the deployment of these optimization strategies across various ML sectors:
- Deep Neural Networks (DNNs): DNN training objectives are non-convex and very high-dimensional, so cheap, scalable first-order methods dominate; second-order approaches are nonetheless gaining traction through techniques such as Hessian-free optimization tailored to DNN architectures.
- Reinforcement Learning (RL): Optimization in RL often relies on policy gradient methods, which estimate the gradient of the expected return from sampled trajectories and update the policy parameters directly (a minimal sketch follows this list).
- Meta-Learning: Optimization is used to configure models that can adapt rapidly to new tasks, as in Model-Agnostic Meta-Learning (MAML), which nests a task-level gradient step inside an outer optimization over the shared initialization.
- Variational Inference and MCMC: Techniques like stochastic variational inference blend optimization with probabilistic modeling, enabling efficient approximation of posterior distributions in Bayesian inference.
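As a concrete illustration of the policy gradient idea, below is a minimal REINFORCE-style update for a two-action logistic policy. The data layout (`episodes` as `(state_features, action, return)` triples) and the function name are assumptions made for this sketch, not the survey's notation.

```python
import numpy as np

def reinforce_update(theta, episodes, lr=0.01):
    """One REINFORCE update: ascend the score function weighted by the return.

    For a logistic policy pi(a=1|s) = sigmoid(s @ theta), the gradient of
    log pi(a|s) with respect to theta is (a - pi) * s.
    """
    grad = np.zeros_like(theta)
    for s, a, G in episodes:
        p = 1.0 / (1.0 + np.exp(-s @ theta))   # probability of choosing action 1
        grad += (a - p) * s * G                # score function scaled by the return
    return theta + lr * grad / len(episodes)   # gradient *ascent* on expected return
```

In practice a baseline is subtracted from the return to reduce the variance of this estimator, echoing the variance-reduction theme from the first-order methods above.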
Challenges and Future Directions
The survey highlights ongoing challenges, including optimizing the non-convex objectives of DNNs, training sequential models efficiently on large datasets, and making optimization algorithms effective when data is limited.
Potential areas for future research include developing robust methods for non-convex optimization, further integrating high-order derivative information into stochastic frameworks, and enhancing optimization in sequential and meta-learning contexts.
Conclusion
The exploration of optimization methods presented in this survey underscores their indispensable role in advancing machine learning. As the complexity of models escalates, continued refinement and innovation in optimization strategies remain critical to harnessing the potential of ML technologies effectively.