DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data (2403.15389v1)

Published 22 Mar 2024 in cs.CV and cs.LG

Abstract: Recently, there has been increased interest in the practical problem of learning multiple dense scene understanding tasks from partially annotated data, where each training sample is labeled for only a subset of the tasks. The missing task labels during training lead to low-quality and noisy predictions, as can be observed in state-of-the-art methods. To tackle this issue, we reformulate partially-labeled multi-task dense prediction as a pixel-level denoising problem and propose a novel multi-task denoising diffusion framework coined DiffusionMTL. It designs a joint diffusion and denoising paradigm to model a potential noisy distribution in the task prediction or feature maps and to generate rectified outputs for the different tasks. To exploit multi-task consistency in denoising, we further introduce a Multi-Task Conditioning strategy, which implicitly utilizes the complementary nature of the tasks to help learn the unlabeled tasks, improving the denoising performance across tasks. Extensive quantitative and qualitative experiments demonstrate that the proposed multi-task denoising diffusion model significantly improves multi-task prediction maps and outperforms state-of-the-art methods on three challenging multi-task benchmarks under two different partial-labeling evaluation settings. The code is available at https://prismformore.github.io/diffusionmtl/.
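For intuition, the sketch below illustrates the general idea in a standard DDPM-style setup: a clean task prediction map is progressively noised by a forward diffusion process, and a small network learns to predict that noise while being conditioned on a shared multi-task feature map (a stand-in for the paper's Multi-Task Conditioning). All names, shapes, and the toy architecture are illustrative assumptions, not the actual DiffusionMTL implementation; timestep embeddings and the full denoising network are omitted for brevity.

```python
import torch
import torch.nn as nn

# Standard DDPM linear noise schedule (Ho et al., 2020).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    """Forward diffusion: corrupt a clean prediction map x0 to step t."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

class ConditionedDenoiser(nn.Module):
    """Toy denoiser that predicts the noise added to one task's map,
    conditioned on shared multi-task features (hypothetical stand-in
    for the paper's Multi-Task Conditioning strategy)."""
    def __init__(self, task_channels, cond_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(task_channels + cond_channels, hidden, 3, padding=1),
            nn.GroupNorm(8, hidden),
            nn.SiLU(),
            nn.Conv2d(hidden, task_channels, 3, padding=1),
        )

    def forward(self, x_t, cond):
        # Conditioning by channel concatenation; the real method may differ.
        return self.net(torch.cat([x_t, cond], dim=1))

# Toy training step: semantic-segmentation logits (19 classes) denoised
# with features shared across tasks as conditioning.
x0 = torch.randn(2, 19, 32, 32)     # "clean" task prediction map
cond = torch.randn(2, 64, 32, 32)   # shared multi-task feature map
t = torch.randint(0, T, (2,))
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, noise)

model = ConditionedDenoiser(task_channels=19, cond_channels=64)
loss = nn.functional.mse_loss(model(x_t, cond), noise)
loss.backward()
```

At inference, one would start from a noisy prediction map (e.g., the backbone's initial, possibly noisy output) and iteratively denoise it; the multi-task conditioning is what lets labeled tasks inform the unlabeled ones during this process.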

