
A new Linear Time Bi-level $\ell_{1,\infty}$ projection; Application to the sparsification of auto-encoders neural networks (2407.16293v1)

Published 23 Jul 2024 in cs.LG

Abstract: Projection onto the $\ell_{1,\infty}$ norm ball is an efficient way to enforce structured sparsity, but the complexity of the best existing algorithm is, unfortunately, $\mathcal{O}\big(n m \log(n m)\big)$ for an $n\times m$ matrix. In this paper, we propose a new bi-level projection method and show that its time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m\big)$ for an $n\times m$ matrix. Moreover, we provide a new $\ell_{1,\infty}$ identity with a mathematical proof and experimental validation. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the current fastest algorithm and yields the best sparsity while maintaining the same classification accuracy.
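
The abstract describes the bi-level projection only at a high level. As a rough illustration, the sketch below implements one common bi-level reading of the $\ell_{1,\infty}$ projection of a matrix: first project the vector of row-wise $\ell_\infty$ norms onto the $\ell_1$ ball of the target radius, then clip each row to its new per-row radius. The function names, the sort-based $\ell_1$ projection (which costs $\mathcal{O}(n \log n)$ here; a linear-time pivot method would keep the overall cost at $\mathcal{O}(nm)$), and the toy radius are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of a non-negative vector v onto the l1 ball of
    the given radius (sort-based variant, O(n log n); a linear-time pivot
    method would preserve an overall O(nm) cost for the matrix projection)."""
    if v.sum() <= radius:
        return v.copy()
    u = np.sort(v)[::-1]                       # sort in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - radius) / ks > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)  # soft-threshold level
    return np.maximum(v - theta, 0.0)

def bilevel_l1inf_projection(Y, radius):
    """Hypothetical bi-level l_{1,inf} projection sketch:
    (1) row-wise l_inf norms, (2) project that vector onto the l1 ball,
    (3) clip each row of Y to its new per-row radius."""
    row_inf = np.abs(Y).max(axis=1)                # step 1, O(nm)
    new_radii = project_l1_ball(row_inf, radius)   # step 2, outer l1 projection
    return np.clip(Y, -new_radii[:, None], new_radii[:, None])  # step 3, O(nm)

# Toy usage on a random weight matrix: the sum of row-wise max magnitudes
# (the l_{1,inf} norm) of the result is at most the requested radius.
W = np.random.randn(8, 16)
W_proj = bilevel_l1inf_projection(W, radius=2.0)
print(np.abs(W_proj).max(axis=1).sum())  # <= 2.0 up to floating-point error
```

In this decomposition the per-row clipping dominates at $\mathcal{O}(nm)$, which is consistent with the linear-time claim in the abstract; whole rows whose projected radius reaches zero are pruned, which is what drives the sparsification of autoencoder weight matrices.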

