
Manipulating Sparse Double Descent (2401.10686v1)

Published 19 Jan 2024 in cs.LG

Abstract: This paper investigates the double descent phenomenon in two-layer neural networks, focusing on the role of L1 regularization and representation dimensions. It explores an alternative double descent phenomenon, named sparse double descent. The study emphasizes the complex relationship between model complexity, sparsity, and generalization, and suggests further research into more diverse models and datasets. The findings contribute to a deeper understanding of neural network training and optimization.
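The setup described in the abstract, a two-layer network whose loss includes an L1 penalty so that sparsity interacts with representation dimension, can be illustrated with a short sketch. This is a minimal, assumed example (hidden width, penalty strength, and MNIST-sized inputs are illustrative choices, not the paper's exact configuration); sweeping the width and the L1 coefficient is how one would trace out a sparse double descent curve in practice.

```python
# Minimal sketch (assumed configuration, not the paper's exact setup):
# a two-layer network trained with an L1 penalty added to the loss,
# which drives many weights toward zero as the penalty grows.
import torch
import torch.nn as nn

hidden_width = 512   # representation dimension (illustrative value)
lambda_l1 = 1e-4     # L1 regularization strength (illustrative value)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, hidden_width),  # MNIST-sized inputs assumed
    nn.ReLU(),
    nn.Linear(hidden_width, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def training_step(x, y):
    """One SGD step on a batch, with an explicit L1 penalty on all weights."""
    optimizer.zero_grad()
    logits = model(x)
    data_loss = criterion(logits, y)
    # The L1 term encourages sparsity; varying hidden_width and lambda_l1
    # is the kind of sweep used to study (sparse) double descent.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = data_loss + lambda_l1 * l1_penalty
    loss.backward()
    optimizer.step()
    return loss.item()
```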

Authors (1)
  1. Ya Shi Zhang (3 papers)

