Stability of Accuracy for the Training of DNNs Via the Uniform Doubling Condition (2210.08415v3)
Abstract: We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, the training of a DNN is performed via the minimization of a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training results in a decrease of loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in $\mathbb{R}^n$, this doubling condition is formulated using slabs in $\mathbb{R}^n$ and depends on the choice of the slabs. The goal of this paper is twofold. First, to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of training data only. In other words, for a training set $T$ that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time $t_0$ will have high accuracy for all time $t>t_0$. Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU.
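To make the slab-based condition concrete, below is a minimal numerical sketch of what probing a doubling condition on a training set might look like. It uses a simplified, illustrative paraphrase of the condition: for a slab $S$ (the region between two parallel hyperplanes) and its width-doubled version $2S$, require $|T \cap 2S| \le D\,|T \cap S|$ for a uniform constant $D$. The slab parameterization, the factor-2 scaling, and the sampling of random slabs here are assumptions for illustration; the paper's precise uniform doubling condition involves additional parameters and is not reproduced exactly.

```python
import numpy as np

def slab_count(T, nu, center, width):
    """Count points of T inside the slab {x : |nu . x - center| <= width/2}."""
    proj = T @ nu
    return int(np.sum(np.abs(proj - center) <= width / 2))

def doubling_ratio(T, nu, center, width):
    """Ratio |T ∩ 2S| / |T ∩ S| for a slab S and its width-doubled version 2S."""
    small = slab_count(T, nu, center, width)
    large = slab_count(T, nu, center, 2 * width)
    return np.inf if small == 0 else large / small

# Probe random slab directions and centers; a uniform doubling condition asks
# for a single constant D bounding these ratios over all admissible slabs.
rng = np.random.default_rng(0)
T = rng.standard_normal((1000, 2))   # toy training set in R^2
ratios = []
for _ in range(200):
    nu = rng.standard_normal(2)
    nu /= np.linalg.norm(nu)         # unit normal defining the slab direction
    center = rng.uniform(-2.0, 2.0)
    ratios.append(doubling_ratio(T, nu, center, width=0.1))

finite = [r for r in ratios if np.isfinite(r)]
print("empirical doubling constant over sampled slabs:", max(finite))
```

The point of the uniformity result in the paper is visible even in this toy setup: a slab-by-slab condition can only be checked numerically if the bound $D$ does not depend on which slab one happens to sample.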
- Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 2, 1989.
- ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
- Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27, 2014.
- Stability for the training of deep neural networks and other classifiers. Mathematical Models and Methods in Applied Sciences, 31(11):2345–2390, 2021.
- Deep learning. MIT Press, 2016.
- The implicit bias of gradient descent on separable data. The Journal of Machine Learning Research, 19(1):2822–2878, 2018.
- Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning, pages 807–814, 2007.
- Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115, 2021.
- The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In International Conference on Machine Learning, pages 3325–3334. PMLR, 2018.
- Generalization in deep learning. arXiv preprint arXiv:1710.05468, 2017.
- Learning curves for overparametrized deep neural networks: A field theory perspective. Physical Review Research, 3(2):023034, 2021.
- Trained rank pruning for efficient deep neural networks. In 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS), pages 14–17. IEEE, 2019.
- Learning low-rank deep neural networks via singular vector orthogonality regularization and singular value sparsification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 678–679, 2020.
- Restructuring of deep neural network acoustic models with singular value decomposition. In Interspeech, pages 2365–2369, 2013.
- Fast learning of deep neural networks via singular value decomposition. In Pacific Rim International Conference on Artificial Intelligence, pages 820–826. Springer, 2014.
- SVD-based DNN pruning and retraining. Journal of Tsinghua University (Science and Technology), 56(7):772–776, 2016.
- Enhancing accuracy in deep learning using random matrix theory. arXiv preprint arXiv:2310.03165, 2023.
- Deep learning weight pruning with RMT-SVD: Increasing accuracy and reducing overfitting. arXiv preprint arXiv:2303.08986, 2023.
- Boundary between noise and information applied to filtering neural network weight matrices. Physical Review E, 108:L022302, 2023.
- The optimal hard threshold for singular values is $4/\sqrt{3}$. IEEE Transactions on Information Theory, 60(8):5040–5053, 2014.
- Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4480–4488, 2016.
- On mixup training: Improved calibration and predictive uncertainty for deep neural networks. Advances in Neural Information Processing Systems, 32, 2019.