Less is More -- Towards parsimonious multi-task models using structured sparsity (2308.12114v3)
Abstract: Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model's memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models optimized for multiple tasks with fewer parameters; such parsimonious models have the potential to match or even outperform their dense counterparts in performance. We introduce channel-wise l1/l2 group sparsity on the shared convolutional layer parameters (weights) of a multi-task learning model. This approach facilitates the removal of extraneous groups, i.e., channels (due to the l1 component), and also penalizes the weights, further improving learning across all tasks (due to the l2 component). We analyze group sparsity in both single-task and multi-task settings on two widely used Multi-Task Learning (MTL) datasets, NYU-v2 and CelebAMask-HQ, each comprising three different computer vision tasks. On both datasets, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how varying the degree of sparsification affects model performance, the overall sparsity percentage, the sparsity patterns, and inference time.
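As a rough illustration of the channel-wise l1/l2 group penalty described above, the minimal PyTorch sketch below treats each output channel of a shared convolutional weight as one group, takes the l2 norm within each group, and sums (l1) across groups, which drives entire channels toward zero. The function name, the coefficient `lam`, and the plain additive-penalty formulation are assumptions made for illustration; the paper's actual training procedure (e.g., a proximal update as in the structured-sparsity optimizers it builds on) may differ.

```python
import torch
import torch.nn as nn


def channelwise_group_sparsity(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """Channel-wise l1/l2 (group-lasso style) penalty over Conv2d layers.

    For each Conv2d weight of shape (out_channels, in_channels, kH, kW),
    one output channel is one group: an l2 norm is taken within the group
    and an l1 norm (a plain sum) across groups.
    """
    penalty = torch.zeros(1, device=next(model.parameters()).device)
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight                                      # (C_out, C_in, kH, kW)
            group_norms = w.flatten(start_dim=1).norm(p=2, dim=1)  # l2 per output channel
            penalty = penalty + group_norms.sum()                  # l1 across channels
    return lam * penalty


# Hypothetical usage: add the penalty for the shared backbone to the
# combined multi-task loss before backpropagation.
# loss = sum(task_losses) + channelwise_group_sparsity(shared_backbone, lam=1e-4)
```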