Towards Bounding-Box Free Panoptic Segmentation (2002.07705v3)

Published 18 Feb 2020 in cs.CV and cs.RO

Abstract: In this work we introduce a new Bounding-Box Free Network (BBFNet) for panoptic segmentation. Panoptic segmentation is an ideal problem for proposal-free methods as it already requires per-pixel semantic class labels. We use this observation to exploit class boundaries from off-the-shelf semantic segmentation networks and refine them to predict instance labels. Towards this goal BBFNet predicts coarse watershed levels and uses them to detect large instance candidates where boundaries are well defined. For smaller instances, whose boundaries are less reliable, BBFNet also predicts instance centers by means of Hough voting followed by mean-shift to reliably detect small objects. A novel triplet loss network helps merging fragmented instances while refining boundary pixels. Our approach is distinct from previous works in panoptic segmentation that rely on a combination of a semantic segmentation network with a computationally costly instance segmentation network based on bounding box proposals, such as Mask R-CNN, to guide the prediction of instance labels using a Mixture-of-Expert (MoE) approach. We benchmark our proposal-free method on Cityscapes and Microsoft COCO datasets and show competitive performance with other MoE based approaches while outperforming existing non-proposal based methods on the COCO dataset. We show the flexibility of our method using different semantic segmentation backbones.

Summary

The paper introduces BBFNet, an innovative architecture that removes bounding-box predictions to reduce computational complexity in panoptic segmentation.
It predicts coarse watershed levels to detect large instances with well-defined boundaries, and uses Hough voting followed by mean-shift to detect small instances.
Experiments on Cityscapes and Microsoft COCO show performance competitive with MoE-based approaches, and BBFNet outperforms existing non-proposal based methods on COCO.

Overview of Bounding-Box Free Panoptic Segmentation

The paper "Towards Bounding-Box Free Panoptic Segmentation" introduces the Bounding-Box Free Network (BBFNet), an architecture for panoptic segmentation. It diverges from earlier approaches by removing the need for bounding-box proposals in instance segmentation, which avoids the computational cost of pairing a semantic segmentation network with a two-stage, proposal-based instance segmentation network such as Mask R-CNN.

Core Methodology

BBFNet is a proposal-free method that builds on the dense per-pixel semantic labels, and the class boundaries they imply, produced by off-the-shelf semantic segmentation networks. It refines these outputs into instance labels in three stages:

Watershed Prediction: BBFNet predicts coarse watershed levels and uses them to detect large instance candidates where boundaries are well defined, refining the class boundaries obtained from the semantic segmentation output.
Hough Voting Mechanism: For smaller instances, whose boundaries are less reliable, BBFNet predicts instance centers by Hough voting. Each pixel votes for a center through a predicted offset, and the votes are clustered with mean-shift, so small objects are detected without bounding boxes.
Triplet Loss Network: A network trained with a triplet loss merges fragmented instances and refines boundary pixels, producing more consistent instance delineations.
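
The voting and clustering step above can be illustrated with a small sketch. This is not the paper's implementation: the function names (`cast_votes`, `mean_shift`, `label_by_mode`), the flat kernel, the bandwidth and the toy offsets are all assumptions chosen for illustration, and the offsets that BBFNet would predict with a network are hard-coded here.

```python
import math

def cast_votes(pixels, offsets):
    """Each pixel votes for a centre at its position plus its predicted offset."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(pixels, offsets)]

def mean_shift(votes, bandwidth, max_iters=30, tol=1e-6):
    """Flat-kernel mean-shift: move each mode to the mean of the votes near it."""
    modes = list(votes)
    for _ in range(max_iters):
        largest_move = 0.0
        updated = []
        for mx, my in modes:
            near = [v for v in votes
                    if math.hypot(v[0] - mx, v[1] - my) <= bandwidth]
            if not near:  # defensive guard; keep the mode where it is
                updated.append((mx, my))
                continue
            cx = sum(v[0] for v in near) / len(near)
            cy = sum(v[1] for v in near) / len(near)
            largest_move = max(largest_move, math.hypot(cx - mx, cy - my))
            updated.append((cx, cy))
        modes = updated
        if largest_move < tol:
            break
    return modes

def label_by_mode(modes, merge_radius):
    """Give pixels whose modes (nearly) coincide the same instance label."""
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if math.hypot(m[0] - c[0], m[1] - c[1]) <= merge_radius:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres

# Toy input: two small 2x2 objects with slightly noisy centre offsets (assumed values).
pixels = [(1, 1), (2, 1), (1, 2), (2, 2), (9, 9), (10, 9), (9, 10), (10, 10)]
offsets = [(0.6, 0.5), (-0.5, 0.4), (0.5, -0.4), (-0.6, -0.5),
           (0.5, 0.6), (-0.4, 0.5), (0.4, -0.5), (-0.5, -0.6)]

votes = cast_votes(pixels, offsets)
modes = mean_shift(votes, bandwidth=1.0)
labels, centres = label_by_mode(modes, merge_radius=0.5)
print(labels)   # one instance label per input pixel
print(centres)  # one estimated centre per detected instance
```

Because every pixel carries its own vote, no box has to enclose the object: pixels are grouped by where their votes converge. In practice the bandwidth would have to be tuned to the expected object size.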

The BBFNet architecture allows for the integration of any semantic segmentation backbone, demonstrating flexibility in its implementation and potential for adaptation to various datasets and tasks.
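
The triplet loss used in the merging stage can be sketched in its standard margin form. The text above does not give the paper's exact formulation, so the Euclidean distance, the margin value, and the hypothetical `should_merge` helper with its threshold are assumptions for illustration only.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin triplet loss on embedding vectors.

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def mean_embedding(embeddings):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return tuple(sum(e[i] for e in embeddings) / n
                 for i in range(len(embeddings[0])))

def should_merge(fragment_a, fragment_b, threshold=0.5):
    """Hypothetical merge rule: merge fragments whose mean embeddings are close."""
    return math.dist(mean_embedding(fragment_a),
                     mean_embedding(fragment_b)) < threshold

# Anchor and positive come from the same instance, negative from another one.
easy = triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))  # negative already far away
hard = triplet_loss((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))  # negative as close as positive

frag_a = [(0.0, 0.1), (0.1, 0.0)]
frag_b = [(0.1, 0.1), (0.0, 0.0)]
frag_c = [(2.0, 2.0), (2.2, 1.8)]
print(easy, hard)
print(should_merge(frag_a, frag_b), should_merge(frag_a, frag_c))
```

Training with such a loss pulls embeddings of pixels from the same instance together and pushes those of different instances apart, which is what makes a distance-based merge decision between fragments meaningful.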

Experimental Evaluation

The paper evaluates BBFNet on Cityscapes and Microsoft COCO. Its performance is competitive with MoE-based approaches, and it outperforms existing non-proposal based methods on the COCO dataset. The reported results highlight:

Competitive PQ (Panoptic Quality) scores on Cityscapes and COCO without relying on a computationally intensive MoE (Mixture-of-Expert) strategy.
Detection of small objects through Hough voting followed by mean-shift clustering, which compensates for their less reliable boundaries.
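
For readers unfamiliar with the metric, the sketch below computes PQ for a single class using its standard definition: predicted and ground-truth segments match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by |TP| + 0.5|FP| + 0.5|FN|. Representing segments as sets of pixel indices and the toy data are assumptions for illustration; benchmark PQ is additionally averaged over classes, which is omitted here.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def panoptic_quality(pred_segments, gt_segments, threshold=0.5):
    """Single-class PQ: matched IoU sum over |TP| + 0.5|FP| + 0.5|FN|."""
    matched_pred = set()
    iou_sum = 0.0
    true_pos = 0
    for gt in gt_segments:
        for idx, pred in enumerate(pred_segments):
            if idx in matched_pred:
                continue
            overlap = iou(pred, gt)
            if overlap > threshold:  # IoU > 0.5 makes the match unique
                matched_pred.add(idx)
                iou_sum += overlap
                true_pos += 1
                break
    false_pos = len(pred_segments) - true_pos  # unmatched predictions
    false_neg = len(gt_segments) - true_pos    # unmatched ground truth
    denom = true_pos + 0.5 * false_pos + 0.5 * false_neg
    return iou_sum / denom if denom else 0.0

# Toy example: one partial match, one perfect match, one miss, one spurious segment.
gt = [{0, 1, 2, 3}, {10, 11, 12, 13}, {20, 21}]
pred = [{0, 1, 2}, {10, 11, 12, 13}, {30, 31}]
pq = panoptic_quality(pred, gt)
print(round(pq, 4))  # (0.75 + 1.0) / (2 + 0.5 + 0.5)
```

The numerator rewards well-aligned masks and the denominator penalises missed and spurious segments, so a fragmented or merged instance lowers PQ even when the semantic labels are correct.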

Implications and Future Directions

The work challenges the assumption that panoptic segmentation requires bounding-box proposals. By removing that dependency, BBFNet avoids the cost of a separate proposal-based instance segmentation network and points towards more efficient and adaptable segmentation frameworks.

In practice, the lower computational cost could make BBFNet attractive for latency-sensitive applications such as autonomous driving and video analysis, although real-time operation would need to be confirmed by runtime measurements.

The paper opens several avenues for future research. Evaluating BBFNet with a wider range of semantic segmentation backbones could broaden its applicability. Integrating more advanced feature extraction techniques and runtime optimization strategies could improve performance while maintaining efficiency.

In summary, BBFNet makes a compelling case for bounding-box free methods in panoptic segmentation, suggesting that such approaches can balance accuracy against computational demand.
