Bottom-up Object Detection by Grouping Extreme and Center Points (1901.08043v3)

Published 23 Jan 2019 in cs.CV

Abstract: With the advent of deep learning, object detection drifted from a bottom-up to a top-down recognition problem. State of the art algorithms enumerate a near-exhaustive list of object locations and classify each into: object or not. In this paper, we show that bottom-up approaches still perform competitively. We detect four extreme points (top-most, left-most, bottom-most, right-most) and one center point of objects using a standard keypoint estimation network. We group the five keypoints into a bounding box if they are geometrically aligned. Object detection is then a purely appearance-based keypoint estimation problem, without region classification or implicit feature learning. The proposed method performs on-par with the state-of-the-art region based detection methods, with a bounding box AP of 43.2% on COCO test-dev. In addition, our estimated extreme points directly span a coarse octagonal mask, with a COCO Mask AP of 18.9%, much better than the Mask AP of vanilla bounding boxes. Extreme point guided segmentation further improves this to 34.6% Mask AP.

Summary

The paper introduces ExtremeNet, a bottom-up detector that groups extreme and center points into bounding boxes without relying on region proposals.
It predicts multi-peak heatmaps for the keypoints and groups them purely geometrically, reaching a bounding box AP of 43.7% on COCO.
The paper shows that extreme point annotations simplify labeling while enabling competitive instance segmentation, improving Mask AP to 34.6% with guided segmentation.

The paper "Bottom-up Object Detection by Grouping Extreme and Center Points," by Xingyi Zhou, Jiacheng Zhuo, and Philipp Krähenbühl, challenges the dominance of top-down approaches in object detection by revisiting the bottom-up framework. Conventional top-down detectors enumerate a near-exhaustive list of candidate regions and classify each one. The paper instead proposes ExtremeNet, which detects object keypoints and groups them into boxes without region proposals.

Main Contributions

The paper introduces ExtremeNet, a bottom-up object detection framework that finds objects by locating four extreme points (top-most, left-most, bottom-most, right-most) and one center point. All five keypoints are predicted by a standard keypoint estimation network. The authors argue that detection then becomes a purely appearance-based keypoint estimation problem followed by geometric grouping, and that this is a competitive alternative to region-based, top-down methods.

Key components of the proposed method include:

Extreme Points Detection: ExtremeNet uses a standard keypoint estimation network to detect the extreme points of objects, producing four multi-peak heatmaps for each object category.
Center Grouping: Detected extreme points are grouped into a bounding box only if they are geometrically aligned, meaning that the geometric center of the box they span has a high-confidence response in the center prediction. This removes the need for classifying regions or learning implicit features.
Heatmaps and Geometric Grouping: In total the network generates five heatmaps per category (four for the extreme points and one for the center). Grouping uses only geometry: a candidate combination of four extreme points is verified by looking up its geometric center on the center heatmap.
Bounding Box and Mask Estimation: The method provides competitive bounding box detection (an AP of 43.7% on COCO test-dev), and the four extreme points also span a coarse octagonal mask. Feeding the extreme points to a guided segmentation method such as DEXTR improves mask accuracy to a Mask AP of 34.6%.
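
The center grouping step described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the 3x3 local-maximum peak test, the two 0.1 thresholds, and the use of the average of the five scores as the box score are all assumptions made for the example. Heatmaps are plain nested lists for a single category, with image coordinates where y grows downward.

```python
from itertools import product

def extract_peaks(heatmap, threshold):
    """Return (x, y, score) for each pixel that is a 3x3 local maximum >= threshold."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            s = heatmap[y][x]
            if s < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            if s >= max(neighbours):
                peaks.append((x, y, s))
    return peaks

def center_group(top, left, bottom, right, center,
                 peak_thresh=0.1, center_thresh=0.1):
    """Brute-force grouping for one category (thresholds are assumed values).

    Returns a list of (x_min, y_min, x_max, y_max, score) boxes.
    """
    tops = extract_peaks(top, peak_thresh)
    lefts = extract_peaks(left, peak_thresh)
    bottoms = extract_peaks(bottom, peak_thresh)
    rights = extract_peaks(right, peak_thresh)
    boxes = []
    for t, l, b, r in product(tops, lefts, bottoms, rights):
        # Reject impossible layouts: top must not lie below bottom,
        # and left must not lie to the right of right.
        if t[1] > b[1] or l[0] > r[0]:
            continue
        # Geometric center of the box spanned by the four points.
        cx = (l[0] + r[0]) // 2
        cy = (t[1] + b[1]) // 2
        c_score = center[cy][cx]
        if c_score >= center_thresh:
            score = (t[2] + l[2] + b[2] + r[2] + c_score) / 5
            boxes.append((l[0], t[1], r[0], b[1], score))
    return boxes

def single_peak_map(size, x, y, score):
    """Hypothetical helper: a size x size heatmap with one non-zero pixel."""
    m = [[0.0] * size for _ in range(size)]
    m[y][x] = score
    return m

# Toy example: one object whose extreme points and center are all detected.
demo_boxes = center_group(
    top=single_peak_map(9, 4, 1, 0.8),
    left=single_peak_map(9, 1, 4, 0.7),
    bottom=single_peak_map(9, 4, 7, 0.9),
    right=single_peak_map(9, 7, 4, 0.6),
    center=single_peak_map(9, 4, 4, 0.9),
)
print(demo_boxes)
```

Enumerating every combination is quartic in the number of peaks per heatmap, so this sketch is only practical when each heatmap yields a modest number of peaks.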

Numerical Results and Evaluation

ExtremeNet was evaluated on the COCO dataset, where it achieves a bounding box AP of 43.7%, outperforming many state-of-the-art one-stage detectors and performing on par with sophisticated two-stage detectors. The paper also reports instance segmentation results derived from the same extreme points. The headline numbers are:

Bounding Box AP: 43.7%
COCO Mask AP (from coarse octagonal mask): 18.9%
Improved COCO Mask AP (using DEXTR): 34.6%

These results are competitive with leading methods such as Mask R-CNN, which is notable given that no COCO mask annotations were used to train the instance segmentation step.
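
To make the coarse octagonal mask concrete, here is a small sketch of one way to turn four extreme points into an octagon. It assumes that each extreme point is extended in both directions along its bounding box edge into a segment one quarter of that edge's length, that the segment is clipped at the box corners, and that the eight segment endpoints are then connected. The function names and the `frac` parameter are hypothetical, and the area helper is only there to compare the octagon with its bounding box.

```python
def octagon_from_extremes(top, left, bottom, right, frac=0.25):
    """Build an octagon from four (x, y) extreme points.

    Each point is widened into a segment of length frac * edge length
    along its box edge (an assumed construction), clipped to the box.
    Vertices are returned clockwise in image coordinates (y grows downward).
    """
    x_min, x_max = left[0], right[0]
    y_min, y_max = top[1], bottom[1]
    half_w = (x_max - x_min) * frac / 2
    half_h = (y_max - y_min) * frac / 2

    def clamp_x(x):
        return min(max(x, x_min), x_max)

    def clamp_y(y):
        return min(max(y, y_min), y_max)

    return [
        (clamp_x(top[0] - half_w), y_min),
        (clamp_x(top[0] + half_w), y_min),
        (x_max, clamp_y(right[1] - half_h)),
        (x_max, clamp_y(right[1] + half_h)),
        (clamp_x(bottom[0] + half_w), y_max),
        (clamp_x(bottom[0] - half_w), y_max),
        (x_min, clamp_y(left[1] + half_h)),
        (x_min, clamp_y(left[1] - half_h)),
    ]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

# Toy example: extreme points at the edge midpoints of an 8 x 8 box.
octagon = octagon_from_extremes(top=(4, 0), left=(0, 4), bottom=(4, 8), right=(8, 4))
octagon_area = polygon_area(octagon)
box_area = 8 * 8
```

In this toy case the octagon cuts off the four box corners, which is why it can approximate an object's shape more tightly than the plain bounding box.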

Implications and Future Directions

The research offers several practical and theoretical implications:

Performance Efficiency: The bottom-up approach avoids enumerating and classifying candidate regions, as top-down approaches do, while performing on par with region-based detectors on COCO.
Segmentation Insight: Extreme points carry more shape information than a bounding box. On their own they span an octagonal mask, and combined with DEXTR they bring Mask AP from 18.9% to 34.6%.
Scalability and Flexibility: Because detection relies only on keypoint estimation and geometric alignment, the same mechanism applies to objects of varying shapes and sizes.
Annotation Efficiency: Given that extreme point annotations are significantly faster to collect compared to full instance segmentation masks, this method can streamline future dataset creation efforts, particularly for fine-grained object detection tasks.
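
As a small illustration of the annotation point, the sketch below derives four extreme points from an existing polygon annotation and recovers the bounding box from them. It is an assumed, simplified procedure: the function names are hypothetical, the polygon is a list of (x, y) vertices in image coordinates (y grows downward), and ties between equally extreme vertices are resolved by taking the first one.

```python
def extreme_points(polygon):
    """Return (top, left, bottom, right) vertices of a polygon."""
    top = min(polygon, key=lambda p: p[1])
    bottom = max(polygon, key=lambda p: p[1])
    left = min(polygon, key=lambda p: p[0])
    right = max(polygon, key=lambda p: p[0])
    return top, left, bottom, right

def bounding_box(top, left, bottom, right):
    """The four extreme points fully determine the axis-aligned box."""
    return left[0], top[1], right[0], bottom[1]

# Toy quadrilateral annotation.
poly = [(2, 0), (5, 3), (3, 6), (0, 2)]
points = extreme_points(poly)
box = bounding_box(*points)
```

The four points determine the box exactly, while also recording where the object touches each side, which is the extra information the octagon mask and guided segmentation rely on.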

Conclusion

Overall, ExtremeNet offers a simple alternative to conventional top-down object detection. It shows that grouping keypoints into bounding boxes with purely geometric rules is competitive on benchmarks like COCO. With further research and optimization, particularly in keypoint accuracy and detection speed, this approach could see broader application in real-world object detection and segmentation.
