spred: Solving $L_1$ Penalty with SGD (2210.01212v5)

Published 3 Oct 2022 in cs.LG and stat.ML

Abstract: We propose to minimize a generic differentiable objective under an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal is a direct generalization of previous ideas that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, spred, is an exact differentiable solver of the $L_1$ penalty and that the reparametrization trick is completely "benign" for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks for gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) neural network compression, where previous attempts to apply the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
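The abstract describes replacing an $L_1$-penalized weight with a differentiable reparametrization trained by plain SGD with weight decay. The sketch below illustrates one common form of this idea, writing each weight as an elementwise product $W = U \odot V$ so that ordinary $L_2$ weight decay on $U$ and $V$ acts like an $L_1$ penalty on $W$. The layer name, initialization, and hyperparameters here are assumptions chosen for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpredLinear(nn.Module):
    """Linear layer with weight reparametrized as an elementwise product
    W = U * V. With standard L2 weight decay on U and V, this behaves
    like an L1 penalty on W (a sketch of the idea in the abstract,
    not the paper's reference code)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Small random init for both factors; the scale is an assumption.
        self.u = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.v = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        weight = self.u * self.v  # effective weight W = U ⊙ V
        return F.linear(x, weight, self.bias)

# Toy usage: sparse linear regression trained with plain SGD.
# The weight_decay coefficient plays the role of the L1 strength
# on the effective weight W (an assumption of this sketch).
model = SpredLinear(in_features=100, out_features=1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)

x = torch.randn(256, 100)
y = x[:, :5].sum(dim=1, keepdim=True)  # only the first 5 features matter

for step in range(1000):
    opt.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    opt.step()

w = (model.u * model.v).detach()
print("fraction of near-zero effective weights:",
      (w.abs() < 1e-3).float().mean().item())
```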
