Task Indicating Transformer for Task-conditional Dense Predictions (2403.00327v1)

Published 1 Mar 2024 in cs.CV

Abstract: The task-conditional model is a distinctive stream for efficient multi-task learning. Existing works encounter a critical limitation in learning task-agnostic and task-specific representations, primarily due to shortcomings in global context modeling arising from CNN-based architectures, as well as a deficiency in multi-scale feature interaction within the decoder. In this paper, we introduce a novel task-conditional framework called Task Indicating Transformer (TIT) to tackle this challenge. Our approach designs a Mix Task Adapter module within the transformer block, which incorporates a Task Indicating Matrix through matrix decomposition, thereby enhancing long-range dependency modeling and parameter-efficient feature adaptation by capturing intra- and inter-task features. Moreover, we propose a Task Gate Decoder module that harnesses a Task Indicating Vector and gating mechanism to facilitate adaptive multi-scale feature refinement guided by task embeddings. Experiments on two public multi-task dense prediction benchmarks, NYUD-v2 and PASCAL-Context, demonstrate that our approach surpasses state-of-the-art task-conditional methods.
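To make the Mix Task Adapter idea from the abstract more concrete, below is a minimal PyTorch-style sketch of a task-conditional adapter that inserts a learnable "task indicating" matrix into a shared low-rank bottleneck. This is an assumption-laden illustration, not the authors' implementation: the class name, dimensions, initialization, and residual placement are hypothetical, and the paper's second component, the Task Gate Decoder, is not shown.

```python
# Hypothetical sketch (not the authors' code): a task-conditional adapter with a
# shared low-rank bottleneck and a per-task "task indicating" matrix, loosely
# following the Mix Task Adapter described in the abstract.
import torch
import torch.nn as nn

class TaskIndicatingAdapter(nn.Module):
    def __init__(self, dim: int, rank: int, num_tasks: int):
        super().__init__()
        self.down = nn.Linear(dim, rank)   # shared down-projection (task-agnostic)
        self.up = nn.Linear(rank, dim)     # shared up-projection (task-agnostic)
        # one rank x rank task-indicating matrix per task (assumed form)
        self.task_matrices = nn.Parameter(torch.randn(num_tasks, rank, rank) * 0.02)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # x: (batch, tokens, dim); adapt transformer features conditioned on the task
        h = self.act(self.down(x))
        h = h @ self.task_matrices[task_id]   # task-specific mixing in the bottleneck
        return x + self.up(h)                 # residual, parameter-efficient adaptation

# Example usage (shapes chosen arbitrarily):
# adapter = TaskIndicatingAdapter(dim=384, rank=32, num_tasks=4)
# y = adapter(torch.randn(2, 196, 384), task_id=0)
```

Because only the small per-task matrices are task-specific while the down/up projections are shared, the per-task parameter cost stays low, which is the general motivation behind adapter-style task-conditional designs.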
