BLADE: Box-Level Supervised Amodal Segmentation through Directed Expansion (2401.01642v3)
Abstract: Perceiving the complete shape of occluded objects is essential for human and machine intelligence. The amodal segmentation task is to predict the complete mask of a partially occluded object, but annotating pixel-level ground-truth amodal masks is time-consuming and labor-intensive. Box-level supervised amodal segmentation addresses this challenge by relying solely on ground-truth bounding boxes and instance classes as supervision, removing the need for exhaustive pixel-level annotations. However, current box-level methods produce low-resolution masks with imprecise boundaries and fail to meet the demands of real-world applications. We tackle this problem with a directed expansion approach from visible masks to the corresponding amodal masks. Our approach is a hybrid end-to-end network built around the overlapping region, the area where different instances intersect. Different segmentation strategies are applied to overlapping and non-overlapping regions according to their distinct characteristics. To guide the expansion of visible masks, we introduce a carefully designed connectivity loss for overlapping regions, which leverages correlations with visible masks and facilitates accurate amodal segmentation. Experiments on several challenging datasets show that the proposed method outperforms existing state-of-the-art methods by large margins.
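As a rough illustration of the overlapping region mentioned in the abstract, the sketch below marks pixels covered by two or more bounding boxes. It assumes the region can be approximated from box intersections alone; the abstract does not state how the paper computes it, and the function name `overlap_mask` and the box format `(x0, y0, x1, y1)` with exclusive upper bounds are hypothetical choices made for this example.

```python
def overlap_mask(boxes, height, width):
    """Return a height x width grid of booleans that is True where
    at least two boxes cover the pixel.

    Assumption: boxes are (x0, y0, x1, y1) in pixel coordinates,
    with x1 and y1 exclusive. This is an illustrative stand-in,
    not the paper's definition of the overlapping region.
    """
    coverage = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image so out-of-range boxes are harmless.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                coverage[y][x] += 1
    return [[count >= 2 for count in row] for row in coverage]


# Two 4x4 boxes that share a 2x2 patch in a 6x6 image.
boxes = [(0, 0, 4, 4), (2, 2, 6, 6)]
mask = overlap_mask(boxes, height=6, width=6)
overlap_area = sum(sum(row) for row in mask)
```

In a pipeline of the kind the abstract describes, a mask like this could be used to route pixels to one segmentation strategy or another, with the connectivity loss applied only where the mask is True; that routing is likewise an assumption here.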
Authors: Zhaochen Liu, Zhixuan Li, Tingting Jiang