Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation (2307.00097v3)

Published 30 Jun 2023 in cs.CV

Abstract: Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work we ask whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. Our findings reveal two interesting observations that shed light on the impact of prompt tuning on WSSS. First, modifying only the class token of the text prompt has a greater impact on the Class Activation Map (CAM) than arguably more complex strategies that optimize the context. Second, the class token associated with the image ground truth does not necessarily correspond to the category that yields the best CAM. Motivated by these observations, we introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy. Through extensive experiments, we demonstrate that our simple yet efficient approach achieves state-of-the-art (SOTA) performance on a well-known WSSS benchmark. These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.
