
Compositor: Bottom-up Clustering and Compositing for Robust Part and Object Segmentation (2306.07404v3)

Published 12 Jun 2023 in cs.CV

Abstract: In this work, we present a robust approach for joint part and object segmentation. Specifically, we reformulate object and part segmentation as an optimization problem and build a hierarchical feature representation of pixel-, part-, and object-level embeddings to solve it by bottom-up clustering. Pixels are grouped into clusters whose centers are the part-level embeddings, and object masks are then obtained by compositing the resulting part proposals. We show that this bottom-up interaction effectively integrates information from lower to higher semantic levels. Building on it, our approach, Compositor, produces part and object segmentation masks simultaneously while improving mask quality. Compositor achieves state-of-the-art performance on PartImageNet and Pascal-Part, outperforming previous methods in part and object mIoU by around 0.9% and 1.3%, respectively, on PartImageNet, and by around 0.4% and 1.7% on Pascal-Part. It also shows better robustness against occlusion, by around 4.4% on parts and 7.1% on objects. Code will be available at https://github.com/TACJu/Compositor.
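The abstract describes a two-step bottom-up pipeline: pixels are clustered around part-level embeddings, and object masks are composited from the resulting parts. The sketch below illustrates only that data flow and is not Compositor's method. It swaps in plain k-means-style clustering (hard dot-product assignment followed by mean updates) for the paper's learned optimization. It also assumes a fixed, hypothetical `part_to_object` table saying which part belongs to which object. All function names are illustrative.

```python
from typing import List, Sequence

Embedding = Sequence[float]


def similarity(a: Embedding, b: Embedding) -> float:
    """Dot-product similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def assign_pixels(pixels: List[Embedding], centers: List[Embedding]) -> List[int]:
    """Hard-assign each pixel to the most similar part center (its cluster)."""
    return [
        max(range(len(centers)), key=lambda k: similarity(p, centers[k]))
        for p in pixels
    ]


def update_centers(pixels: List[Embedding], assignment: List[int],
                   centers: List[Embedding]) -> List[List[float]]:
    """Move each part center to the mean of its assigned pixels.

    A part with no assigned pixels keeps its previous center.
    """
    dim = len(centers[0])
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, a in zip(pixels, assignment) if a == k]
        if not members:
            new_centers.append(list(old))
            continue
        new_centers.append([sum(p[d] for p in members) / len(members) for d in range(dim)])
    return new_centers


def composite_objects(assignment: List[int], part_to_object: List[int],
                      num_objects: int) -> List[List[bool]]:
    """Build each object mask as the union of the masks of its parts."""
    masks = [[False] * len(assignment) for _ in range(num_objects)]
    for i, part in enumerate(assignment):
        masks[part_to_object[part]][i] = True
    return masks


def bottom_up_segment(pixels, part_centers, part_to_object, num_objects, iters=3):
    """Cluster pixels into parts (k-means style), then composite parts into objects."""
    centers = [list(c) for c in part_centers]
    assignment = assign_pixels(pixels, centers)
    for _ in range(iters):
        centers = update_centers(pixels, assignment, centers)
        assignment = assign_pixels(pixels, centers)
    return assignment, composite_objects(assignment, part_to_object, num_objects)


# Toy example: five 2-D pixel embeddings and three initial part centers.
pixels = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (-1.0, 0.0)]
parts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
part_to_object = [0, 0, 1]  # hypothetical: parts 0 and 1 form object 0
part_ids, object_masks = bottom_up_segment(pixels, parts, part_to_object, num_objects=2)
print(part_ids)      # [0, 0, 1, 1, 2]
print(object_masks)  # [[True, True, True, True, False], [False, False, False, False, True]]
```

Because each object mask here is a union of part masks, part and object predictions stay consistent by construction. That is the intuition behind compositing, although the paper's actual part-to-object step may differ.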
