Can we avoid Double Descent in Deep Neural Networks? (2302.13259v4)
Abstract: Finding the optimal size of deep learning models is a timely question with broad impact, especially for energy-efficient training and deployment. Recently, an unexpected phenomenon, ``double descent'', has caught the attention of the deep learning community: as the model's size grows, the test performance first degrades and then improves again. This raises serious questions about the model size needed to maintain high generalization: the model must be sufficiently over-parametrized, but adding too many parameters wastes training resources. Is it possible to find the best trade-off efficiently? Our work shows that the double descent phenomenon can potentially be avoided with proper conditioning of the learning problem, although a definitive answer is still missing. We empirically observe that, with proper regularization, there is hope of dodging double descent even in complex scenarios: a simple $\ell_2$ regularization already contributes positively in this direction.
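As a minimal sketch (not the paper's exact experimental setup), the $\ell_2$ regularization mentioned in the abstract is typically applied in practice as weight decay in the optimizer; the model, learning rate, and decay coefficient below are illustrative assumptions only.

```python
import torch
import torch.nn as nn

# Illustrative small classifier; the paper's actual architectures and
# width sweeps used to probe double descent are not reproduced here.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# For plain SGD, weight_decay implements an l2 penalty on the parameters,
# i.e. it adds (lambda / 2) * ||w||^2 to the objective being minimized.
# The coefficient 5e-4 is a common default, not a value taken from the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

def training_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """One optimization step; the l2 penalty is applied by the optimizer."""
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, sweeping the model width while keeping the weight-decay coefficient fixed is how one would empirically probe whether the regularized test-error curve still exhibits a double-descent bump.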
- Authors: Victor Quétu, Enzo Tartaglione