Multi-level projection with exponential parallel speedup; Application to sparse auto-encoder neural networks (2405.02086v2)
Abstract: The $\ell_{1,\infty}$ norm is an efficient structured projection, but the complexity of the best existing algorithm is unfortunately $\mathcal{O}\big(nm \log(nm)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method whose time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}(nm)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}(n + m)$ with full parallel power. We generalize our method to tensors and propose a new multi-level projection whose induced decomposition yields a linear parallel speedup, up to an exponential speedup factor, so that the time complexity is lower-bounded by the sum of the dimensions instead of their product. We provide a large set of implementations of our framework for bi-level and tri-level projections (matrices and tensors) and various norms, together with parallel implementations. Experiments show that our projection is $2$ times faster than the current fastest Euclidean algorithms while providing the same accuracy and better sparsity in neural-network applications.
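To make the bi-level decomposition concrete, below is a minimal NumPy sketch of one plausible instantiation: the row-wise $\ell_\infty$ norms are projected onto the $\ell_1$ ball, and each row is then clipped to its projected maximum. The function names (`project_l1_ball`, `bilevel_l1inf_projection`) and the sort-based inner projection are illustrative assumptions, not the paper's exact algorithm; in particular, a linear-time $\ell_1$-ball projection would be used in practice to attain the stated $\mathcal{O}(nm)$ bound.

```python
# Hypothetical sketch of a bi-level l_{1,inf} projection (assumed reading of
# the abstract, not the authors' verbatim algorithm).
import numpy as np

def project_l1_ball(v, radius):
    """Project a non-negative vector v onto the l1 ball of the given radius.

    Sort-based threshold search, O(n log n); linear-time variants exist
    and would be preferred to match the claimed overall complexity.
    """
    if v.sum() <= radius:
        return v.copy()
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    # Largest k such that u_k exceeds the running threshold.
    k = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    tau = (css[k] - radius) / (k + 1.0)
    return np.maximum(v - tau, 0.0)

def bilevel_l1inf_projection(X, radius):
    """Bi-level surrogate for projecting matrix X onto the l_{1,inf} ball.

    Level 1: the row maxima v_i = max_j |X_ij| are projected onto the
             l1 ball, enforcing sum_i max_j |X_ij| <= radius.
    Level 2: each row is clipped to [-w_i, w_i], i.e. projected onto the
             l_inf ball of radius w_i.
    """
    v = np.abs(X).max(axis=1)                 # row-wise infinity norms, O(nm)
    w = project_l1_ball(v, radius)            # outer l1 projection on n values
    return np.clip(X, -w[:, None], w[:, None])  # independent row clipping, O(nm)
```

Because the level-2 clipping treats rows independently, it parallelizes trivially across rows, which is consistent with the $\mathcal{O}(n+m)$ parallel bound claimed in the abstract.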