Multi-Task Dense Prediction via Mixture of Low-Rank Experts (2403.17749v2)
Abstract: Previous multi-task dense prediction methods based on the Mixture of Experts (MoE) have achieved strong performance, but they neglect to explicitly model the global relationships among all tasks. In this paper, we present a novel decoder-focused method for multi-task dense prediction, called Mixture-of-Low-Rank-Experts (MLoRE). To model the global task relationships, MLoRE adds a generic convolution path to the original MoE structure, through which every task feature passes for explicit parameter sharing. Furthermore, to control the parameter count and computational cost brought by increasing the number of experts, we take inspiration from LoRA and adopt a low-rank format of a vanilla convolution in the expert network. Since the low-rank experts have fewer parameters and can be dynamically re-parameterized into the generic convolution, the parameter count and computational cost change little as experts are added. Benefiting from this design, we increase the number of experts and their receptive fields to enlarge the representation capacity, facilitating the learning of multiple dense prediction tasks in a unified network. Extensive experiments on the PASCAL-Context and NYUD-v2 benchmarks show that our MLoRE achieves superior performance compared to previous state-of-the-art methods on all metrics. Our code is available at https://github.com/YuqiYang213/MLoRE.
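The following is a minimal PyTorch sketch of the idea described in the abstract: a shared generic convolution path plus several low-rank expert convolutions combined by a per-task router. The class name `MLoREBlock`, the rank, the expert count, and the pooling-based router are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
# Minimal sketch of a Mixture-of-Low-Rank-Experts block (hypothetical layout).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLoREBlock(nn.Module):
    def __init__(self, channels: int, num_experts: int = 8, rank: int = 16,
                 kernel_size: int = 3, num_tasks: int = 4):
        super().__init__()
        padding = kernel_size // 2
        # Generic path: a plain convolution shared by all tasks, giving every
        # task feature an explicit, always-on parameter-sharing route.
        self.generic = nn.Conv2d(channels, channels, kernel_size, padding=padding)
        # Each expert is a low-rank factorization of a vanilla convolution:
        # a k x k "down" conv to rank r followed by a 1 x 1 "up" conv back to C.
        self.down = nn.ModuleList(
            nn.Conv2d(channels, rank, kernel_size, padding=padding, bias=False)
            for _ in range(num_experts))
        self.up = nn.ModuleList(
            nn.Conv2d(rank, channels, 1, bias=False) for _ in range(num_experts))
        # One lightweight router per task produces soft weights over experts.
        self.routers = nn.ModuleList(
            nn.Linear(channels, num_experts) for _ in range(num_tasks))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Route from globally pooled task features: (B, C) -> (B, E).
        gate = F.softmax(self.routers[task_id](x.mean(dim=(2, 3))), dim=-1)
        out = self.generic(x)
        for e, (down, up) in enumerate(zip(self.down, self.up)):
            # Low-rank expert output, scaled by its routing weight.
            out = out + gate[:, e, None, None, None] * up(down(x))
        return out

    @torch.no_grad()
    def merged_kernel(self, gate: torch.Tensor) -> torch.Tensor:
        # For a fixed routing vector `gate` (length E), each linear expert
        # collapses to one k x k kernel, W_e[o, i] = sum_r up[o, r] * down[r, i],
        # which can be folded into the generic conv's weight.
        w = self.generic.weight.clone()
        for e, (down, up) in enumerate(zip(self.down, self.up)):
            w = w + gate[e] * torch.einsum(
                'or,rikl->oikl', up.weight[:, :, 0, 0], down.weight)
        return w
```

Because the expert path has no nonlinearity between the two factorized convolutions, the weighted expert sum is itself a single convolution, so for a fixed routing it can be merged into the generic path (as in `merged_kernel` above); this is what lets the expert count grow with little change in deployed parameters and computation.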
- Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255, 2020.
- Adabins: Depth estimation using adaptive bins. In IEEE Conf. Comput. Vis. Pattern Recog., pages 4009–4018, 2021.
- Automated search for resource-efficient branched multi-task networks. arXiv preprint arXiv:2008.10292, 2020.
- Exploring relational context for multi-task dense prediction. In Int. Conf. Comput. Vis., pages 15869–15878, 2021.
- Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell., 40(4):834–848, 2017.
- Adamv-moe: Adaptive multi-task vision mixture-of-experts. In Int. Conf. Comput. Vis., pages 17346–17357, 2023.
- Detect what you can: Detecting and representing objects using holistic models and body parts. In IEEE Conf. Comput. Vis. Pattern Recog., pages 1971–1978, 2014.
- Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In Int. Conf. Mach. Learn., pages 794–803. PMLR, 2018.
- Just pick a sign: Optimizing deep multitask models with gradient sign dropout. Adv. Neural Inform. Process. Syst., 33:2039–2050, 2020.
- Mod-squad: Designing mixtures of experts as modular multi-task learners. In IEEE Conf. Comput. Vis. Pattern Recog., pages 11828–11837, 2023.
- Unified scaling laws for routed language models. In Int. Conf. Mach. Learn., pages 4057–4086. PMLR, 2022.
- Diverse branch block: Building a convolution as an inception-like unit. In IEEE Conf. Comput. Vis. Pattern Recog., pages 10886–10895, 2021.
- Repvgg: Making vgg-style convnets great again. In IEEE Conf. Comput. Vis. Pattern Recog., pages 13733–13742, 2021.
- An image is worth 16x16 words: Transformers for image recognition at scale. In Int. Conf. Learn. Represent., 2020.
- M³ViT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design. In Adv. Neural Inform. Process. Syst., volume 35, pages 28441–28457, 2022.
- Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res., 23(1):5232–5270, 2022.
- Nddr-cnn: Layerwise feature fusing in multi-task cnns by neural discriminative dimensionality reduction. In IEEE Conf. Comput. Vis. Pattern Recog., pages 3205–3214, 2019.
- Dynamic task prioritization for multitask learning. In Eur. Conf. Comput. Vis., pages 270–287, 2018.
- Learning to branch for multi-task learning. In Int. Conf. Mach. Learn., pages 3854–3863. PMLR, 2020.
- Strip pooling: Rethinking spatial pooling for scene parsing. In IEEE Conf. Comput. Vis. Pattern Recog., pages 4003–4012, 2020.
- Coordinate attention for efficient mobile network design. In IEEE Conf. Comput. Vis. Pattern Recog., pages 13713–13722, 2021.
- Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Low-rank compression of neural nets: Learning the rank of each layer. In IEEE Conf. Comput. Vis. Pattern Recog., pages 8049–8059, 2020.
- Multi-task learning with attention for end-to-end autonomous driving. In IEEE Conf. Comput. Vis. Pattern Recog., pages 2902–2911, 2021.
- Learning piecewise control strategies in a modular neural network architecture. IEEE Trans. Syst. Man Cybern., 23(2):337–345, 1993.
- Adaptive mixtures of local experts. Neural Comput., 3(1):79–87, 1991.
- Fact: Factor-tuning for lightweight adaptation on vision transformer. In AAAI Conf. Artif. Intell., volume 37, pages 1060–1068, 2023.
- Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In IEEE Conf. Comput. Vis. Pattern Recog., pages 7482–7491, 2018.
- Rethinking self-driving: Multi-task knowledge for better generalization and accident explanation ability. arXiv preprint arXiv:1809.11100, 2018.
- Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 1925–1934, 2017.
- End-to-end multi-task learning with attention. In IEEE Conf. Comput. Vis. Pattern Recog., pages 1871–1880, 2019.
- Polyhistor: Parameter-efficient multi-task adaptation for dense vision tasks. Adv. Neural Inform. Process. Syst., 35:36889–36901, 2022.
- Fully convolutional networks for semantic segmentation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 3431–3440, 2015.
- Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5334–5343, 2017.
- Attentive single-tasking of multiple tasks. In IEEE Conf. Comput. Vis. Pattern Recog., pages 1851–1860, 2019.
- Cross-stitch networks for multi-task learning. In IEEE Conf. Comput. Vis. Pattern Recog., pages 3994–4003, 2016.
- Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
- Vision transformers for dense prediction. In Int. Conf. Comput. Vis., pages 12179–12188, 2021.
- Scaling vision with sparse mixture of experts. Adv. Neural Inform. Process. Syst., 34:8583–8595, 2021.
- Latent multi-task architecture learning. In AAAI Conf. Artif. Intell., volume 33, pages 4822–4829, 2019.
- Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.
- Indoor segmentation and support inference from rgbd images. In Eur. Conf. Comput. Vis., pages 746–760. Springer, 2012.
- Multi-task learning with low rank attribute embedding for person re-identification. In Int. Conf. Comput. Vis., pages 3739–3747, 2015.
- Deep high-resolution representation learning for human pose estimation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5693–5703, 2019.
- Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5227–5237, 2022.
- Generalized low rank models. Found. Trends Mach. Learn., 9(1):1–118, 2016.
- Branched multi-task networks: deciding what layers to share. In Brit. Mach. Vis. Conf., 2019.
- Multi-task learning for dense prediction tasks: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 44(7):3614–3633, 2021.
- Mti-net: Multi-scale task interaction networks for multi-task learning. In Eur. Conf. Comput. Vis., pages 527–543. Springer, 2020.
- Pad-net: Multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing. In IEEE Conf. Comput. Vis. Pattern Recog., pages 675–684, 2018.
- Trace norm regularised deep multi-task learning. arXiv preprint arXiv:1606.04038, 2016.
- Inverted pyramid multi-task transformer for dense scene understanding. In Eur. Conf. Comput. Vis., pages 514–530. Springer, 2022.
- Taskprompter: Spatial-channel multi-task prompting for dense scene understanding. In Int. Conf. Learn. Represent., 2022.
- Taskexpert: Dynamically assembling multi-task representations with memorial mixture-of-experts. In Int. Conf. Comput. Vis., 2023.
- Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Eur. Conf. Comput. Vis., pages 325–341, 2018.
- Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In IEEE Conf. Comput. Vis. Pattern Recog., pages 2636–2645, 2020.
- Joint task-recursive learning for semantic segmentation and depth estimation. In Eur. Conf. Comput. Vis., pages 235–251, 2018.
- Pattern-affinitive propagation across depth, surface normal and semantic segmentation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 4106–4115, 2019.
- A modulation module for multi-task learning with applications in image retrieval. In Eur. Conf. Comput. Vis., pages 401–416, 2018.
- Pattern-structure diffusion for multi-task learning. In IEEE Conf. Comput. Vis. Pattern Recog., pages 4514–4523, 2020.
- Yuqi Yang
- Peng-Tao Jiang
- Qibin Hou
- Hao Zhang
- Jinwei Chen
- Bo Li