PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus (2405.16094v2)
Abstract: Amodal segmentation, which aims to predict the complete shapes of partially occluded objects, is an important step toward visual intelligence. Strong prior knowledge is crucial for this task and normally derives from sufficient training, yet limited amodal annotations make better performance hard to achieve. To tackle this problem, we leverage the powerful priors accumulated in a foundation model and propose PLUG, the first SAM-based amodal segmentation approach. Methodologically, we present a novel framework with hierarchical focus to better fit the characteristics of the task and unleash the potential capabilities of SAM. At the region level, because visible and occluded areas are related yet distinct, the inmodal and amodal regions are assigned as the focuses of separate branches to avoid mutual disturbance. At the point level, we introduce the concept of uncertainty to explicitly help the model identify and focus on ambiguous points. Guided by the uncertainty map, a computationally economical point loss is applied to improve the accuracy of predicted boundaries. Experiments on several prominent datasets show that our method outperforms existing methods by large margins; even with fewer total parameters, it still exhibits remarkable advantages.
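The uncertainty-guided point loss described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes uncertainty is measured as closeness of the predicted logit to the decision boundary (as in PointRend-style point selection), and that the loss is binary cross-entropy evaluated only at the `k` most ambiguous points. The function name and the choice of `k` are illustrative.

```python
import numpy as np

def uncertainty_guided_point_loss(logits, target, k=256):
    """Sketch: BCE computed only at the k most uncertain points.

    logits: (B, H, W) raw predictions; target: (B, H, W) binary masks.
    Uncertainty is approximated by -|logit| (largest near probability 0.5),
    so the loss budget is spent on ambiguous boundary points rather than
    on every pixel -- the "computation-economic" idea in the abstract.
    """
    flat_logits = logits.reshape(logits.shape[0], -1)   # (B, H*W)
    flat_target = target.reshape(target.shape[0], -1)
    uncertainty = -np.abs(flat_logits)                  # high near the boundary
    idx = np.argsort(uncertainty, axis=1)[:, -k:]       # k most ambiguous points
    sel_logits = np.take_along_axis(flat_logits, idx, axis=1)
    sel_target = np.take_along_axis(flat_target, idx, axis=1)
    # Binary cross-entropy on the selected points only
    p = 1.0 / (1.0 + np.exp(-sel_logits))
    eps = 1e-7
    bce = -(sel_target * np.log(p + eps) + (1.0 - sel_target) * np.log(1.0 - p + eps))
    return bce.mean()
```

In practice the uncertainty map would come from the model itself (e.g. a dedicated head), but the selection-then-penalize pattern is the same: only a small, targeted subset of points contributes to the gradient.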