
InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning (2402.16848v1)

Published 26 Feb 2024 in cs.LG

Abstract: Jointly learning multiple tasks with a unified model can improve accuracy and data efficiency, but it faces the challenge of task interference, where optimizing one task objective may inadvertently compromise the performance of another. A solution to mitigate this issue is to allocate task-specific parameters, free from interference, on top of shared features. However, manually designing such architectures is cumbersome, as practitioners need to balance the overall performance across all tasks against the higher computational cost induced by the newly added parameters. In this work, we propose InterroGate, a novel multi-task learning (MTL) architecture designed to mitigate task interference while optimizing inference computational efficiency. We employ a learnable gating mechanism to automatically balance the shared and task-specific representations while preserving the performance of all tasks. Crucially, the patterns of parameter sharing and specialization learned dynamically during training become fixed at inference, resulting in a static, optimized MTL architecture. Through extensive empirical evaluations, we demonstrate state-of-the-art (SoTA) results on three MTL benchmarks, using both convolutional and transformer-based backbones, on CelebA, NYUD-v2, and PASCAL-Context.
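
The abstract's central mechanism is a learnable gate that mixes shared backbone features with task-specific ones during training and is frozen into hard, prunable decisions at inference. The PyTorch sketch below illustrates that idea only; the class name GatedTaskBranch, the per-channel sigmoid parameterization, and the 0.5 threshold are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of a learnable gate mixing shared and task-specific features.
# All names and design choices here are illustrative assumptions.
import torch
import torch.nn as nn


class GatedTaskBranch(nn.Module):
    def __init__(self, num_channels: int):
        super().__init__()
        # One learnable logit per channel; sigmoid(logit) controls how much of
        # the task-specific feature vs. the shared feature is used.
        self.gate_logits = nn.Parameter(torch.zeros(num_channels))
        self.task_conv = nn.Conv2d(num_channels, num_channels, 3, padding=1)

    def forward(self, shared_feat: torch.Tensor) -> torch.Tensor:
        task_feat = self.task_conv(shared_feat)
        gate = torch.sigmoid(self.gate_logits).view(1, -1, 1, 1)
        if not self.training:
            # At inference the gates are thresholded to hard 0/1 decisions,
            # yielding a static architecture whose unused branches can be pruned.
            gate = (gate > 0.5).float()
        return gate * task_feat + (1.0 - gate) * shared_feat


# Usage: one gated branch per task on top of a shared backbone feature map.
branch = GatedTaskBranch(num_channels=64)
shared = torch.randn(2, 64, 32, 32)
out = branch(shared)  # shape: (2, 64, 32, 32)
```

Once the gates are binarized, channels whose gate is 0 never touch the task-specific convolution, so that compute can be removed entirely, which matches the "static, optimized MTL architecture" the abstract describes.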

Authors (6)
  1. Babak Ehteshami Bejnordi
  2. Gaurav Kumar
  3. Amelie Royer
  4. Christos Louizos
  5. Tijmen Blankevoort
  6. Mohsen Ghafoorian
