AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks
(2105.10190v2)
Published 21 May 2021 in cs.LG, cs.NE, and stat.ML
Abstract: Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular information. Two variants of AngularGrad are developed based on the use of Tangent or Cosine functions for computing the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark data sets against state-of-the-art methods reveal a superior performance of AngularGrad. The source code will be made publicly available at: https://github.com/mhaut/AngularGrad.
Overview of "AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks"
The paper "AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks" introduces AngularGrad, an optimization algorithm that addresses limitations of existing gradient-based optimizers for training deep neural networks (DNNs). While optimizers such as Adam are popular for their adaptive moment estimation, they do not exploit the curvature of the optimization landscape efficiently and can suffer from zig-zag optimization trajectories. AngularGrad tackles these issues by leveraging the angular information of consecutive gradients, offering a new perspective on optimization for deep learning.
The primary contribution of AngularGrad is the introduction of angular coefficients that adaptively control the step size for convergence by examining the angular information of consecutive gradients. This approach is novel as it is the first to consider both the magnitude and directional changes (i.e., angular information) of gradients. Two variants of AngularGrad are introduced, which differ in their use of Tangent or Cosine functions to compute gradient angular information. The research demonstrates that AngularGrad achieves the same theoretical regret bound as Adam, while extensive experiments indicate superior performance across a wide range of benchmark datasets and network architectures.
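The update equations themselves are not reproduced in this overview, so the following is only a minimal sketch of the idea in NumPy: a score derived from the cosine of the angle between consecutive gradients scales an otherwise Adam-style step. The function names, the particular coefficient form lam1 * tanh(|cos θ|) + lam2, and all constants are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def cosine_angular_coefficient(g_t, g_prev, lam1=0.5, lam2=0.5, eps=1e-12):
    """Illustrative angular score: close to its maximum when consecutive
    gradients are nearly aligned, smaller when their directions disagree."""
    cos_theta = np.dot(g_t, g_prev) / (
        np.linalg.norm(g_t) * np.linalg.norm(g_prev) + eps
    )
    return lam1 * np.tanh(np.abs(cos_theta)) + lam2

def angular_adam_step(theta, g_t, g_prev, m, v, t,
                      lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update whose step size is modulated by the angular score."""
    m = beta1 * m + (1 - beta1) * g_t           # first-moment estimate
    v = beta2 * v + (1 - beta2) * g_t ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    a_t = cosine_angular_coefficient(g_t, g_prev)
    theta = theta - lr * a_t * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The sketch only illustrates the Cosine flavor; the paper's Tangent variant would derive an analogous score from the tangent of the angle between consecutive gradients.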
Numerical Results and Significance
From a numerical standpoint, AngularGrad shows substantial improvements in convergence speed and accuracy across diverse neural network architectures. Its performance was evaluated on the CIFAR10, CIFAR100, Mini-ImageNet, and ImageNet datasets, with additional tests on fine-grained classification datasets such as Stanford Cars and CUB-200-2011. AngularGrad consistently outperformed established optimizers such as SGDM, RMSprop, and Adam, achieving higher classification accuracy and smoother optimization trajectories. The empirical analysis, including tests on toy functions and the challenging Rosenbrock function, further validates the method's ability to reach the global minimum while avoiding pitfalls common to other optimizers, such as overshooting and stagnation at local minima.
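As a concrete illustration of such a toy-function test, the snippet below traces the angular_adam_step sketch from above on the 2-D Rosenbrock function, whose global minimum sits at (1, 1) inside a narrow curved valley. The starting point, learning rate, and iteration budget are arbitrary illustrative choices, not the paper's experimental settings.

```python
import numpy as np

def rosenbrock_grad(p, a=1.0, b=100.0):
    """Gradient of f(x, y) = (a - x)^2 + b * (y - x^2)^2 (minimum at (a, a^2))."""
    x, y = p
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
    dy = 2.0 * b * (y - x ** 2)
    return np.array([dx, dy])

theta = np.array([-1.5, 2.0])   # arbitrary start point in the Rosenbrock valley
m = np.zeros(2)
v = np.zeros(2)
g_prev = rosenbrock_grad(theta)
for t in range(1, 20001):
    g_t = rosenbrock_grad(theta)
    theta, m, v = angular_adam_step(theta, g_t, g_prev, m, v, t, lr=1e-2)
    g_prev = g_t
print(theta)  # should drift toward the global minimum at (1.0, 1.0)
```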
Theoretical and Practical Implications
Theoretically, AngularGrad offers a compelling approach to optimization by addressing the zig-zag phenomenon commonly associated with high gradient variance during DNN training. The convergence analysis supports its efficacy, establishing a regret bound of O(√T), i.e., an average regret of O(1/√T), matching Adam and other leading contemporary optimizers.
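For reference, the quantity bounded in Adam-style analyses (and, per the paper, matched by AngularGrad) is the standard online regret, with $f_t$ the loss at step $t$ and $\theta^*$ the best fixed parameter in hindsight; the paper's exact constants are not reproduced here:

$$
R(T) = \sum_{t=1}^{T}\big[f_t(\theta_t) - f_t(\theta^*)\big],
\qquad
\theta^* = \arg\min_{\theta \in \mathcal{X}} \sum_{t=1}^{T} f_t(\theta),
\qquad
R(T) = O(\sqrt{T}) \;\Rightarrow\; \frac{R(T)}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \xrightarrow{\,T \to \infty\,} 0.
$$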
In practice, AngularGrad's enhanced convergence path smoothness translates to improved model generalization, crucial for applications demanding high reliability and efficiency. This has significant implications for real-world scenarios where computational resources and time efficiency are critical, such as in autonomous vehicles, medical diagnosis systems, and large-scale image recognition tasks.
Future Directions
Although AngularGrad presents notable advancements, future work could focus on further refining the algorithm for larger and more complex datasets and on exploring its application to other forms of neural networks beyond CNNs, such as recurrent neural networks (RNNs) or transformer architectures. Additionally, integrating AngularGrad with other model refinement techniques, such as ensemble methods or neuroevolutionary strategies, might yield further performance gains.
Overall, AngularGrad stands as a significant contribution to the field of optimization within deep learning, providing a robust framework for further exploration and enhancement of gradient-based optimization methods. The novel utilization of angular information provides a fresh direction for research, potentially inspiring the development of other angle-informed techniques that could revolutionize training dynamics in machine learning models.