
Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks (2405.02086v2)

Published 3 May 2024 in cs.LG

Abstract: The $\ell_{1,\infty}$ norm is an efficient structured projection, but the complexity of the best known algorithm is unfortunately $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method whose time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m\big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m\big)$ with full parallel power. We generalize our method to tensors and propose a new multi-level projection whose induced decomposition yields a linear parallel speedup, up to an exponential speedup factor, resulting in a time complexity lower-bounded by the sum of the dimensions instead of their product. We provide a large base of implementations of our framework for bi-level and tri-level projections (matrices and tensors) for various norms, together with parallel implementations. Experiments show that our projection is $2$ times faster than the current fastest Euclidean algorithms while providing the same accuracy and better sparsity in neural network applications.
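
To make the bi-level idea concrete, below is a minimal NumPy sketch of one natural reading of the decomposition described in the abstract: first project the vector of row-wise $\ell_\infty$ norms onto the $\ell_1$ ball, then clip each row at its projected radius. The function names (`bilevel_l1inf_projection`, `project_l1_ball`) and the sort-based $\ell_1$-ball projection are illustrative choices, not the paper's actual implementation; in particular, the $\mathcal{O}(nm)$ and $\mathcal{O}(n+m)$ bounds claimed in the abstract rely on linear-time projection and parallel execution, which this sketch does not attempt to reproduce.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Project a non-negative vector v onto the l1 ball of the given radius.
    Standard sort-based O(n log n) variant; faster (linear-time, parallel)
    algorithms exist and are presumably what the paper builds on."""
    assert radius > 0, "this sketch assumes a strictly positive radius"
    if v.sum() <= radius:
        return v.copy()
    u = np.sort(v)[::-1]                   # magnitudes in decreasing order
    cssv = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u > (cssv - radius) / ks)[0][-1]  # last index meeting the KKT condition
    tau = (cssv[rho] - radius) / (rho + 1.0)           # soft-threshold level
    return np.maximum(v - tau, 0.0)

def bilevel_l1inf_projection(X, radius):
    """Bi-level projection of a matrix X associated with the l1,inf ball:
    (1) outer step: project the vector of row-wise l_inf norms onto the l1 ball;
    (2) inner step: project each row onto the l_inf ball of its new radius (a clip)."""
    row_inf = np.abs(X).max(axis=1)               # l_inf norm of each row, O(nm)
    new_radii = project_l1_ball(row_inf, radius)  # outer l1 projection over n values
    # Clipping each row at +/- new_radii[i] is the exact l_inf-ball projection of that
    # row, and it is independent across rows, hence trivially parallelizable.
    return np.sign(X) * np.minimum(np.abs(X), new_radii[:, None])

# Usage: after the projection, the vector of row maxima lies in the l1 ball.
X = np.random.randn(4, 6)
Y = bilevel_l1inf_projection(X, radius=1.0)
print(np.abs(Y).max(axis=1).sum())  # <= 1.0
```

Only the outer $\ell_1$ projection couples the rows; the inner step decomposes row by row, which is consistent with the parallel speedup the abstract attributes to the induced decomposition.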

