ObjectBox: From Centers to Boxes for Anchor-Free Object Detection (2207.06985v1)

Published 14 Jul 2022 in cs.CV

Abstract: We present ObjectBox, a novel single-stage anchor-free and highly generalizable object detection approach. As opposed to both existing anchor-based and anchor-free detectors, which are more biased toward specific object scales in their label assignments, we use only object center locations as positive samples and treat all objects equally in different feature levels regardless of the objects' sizes or shapes. Specifically, our label assignment strategy considers the object center locations as shape- and size-agnostic anchors in an anchor-free fashion, and allows learning to occur at all scales for every object. To support this, we define new regression targets as the distances from two corners of the center cell location to the four sides of the bounding box. Moreover, to handle scale-variant objects, we propose a tailored IoU loss to deal with boxes with different sizes. As a result, our proposed object detector does not need any dataset-dependent hyperparameters to be tuned across datasets. We evaluate our method on MS-COCO 2017 and PASCAL VOC 2012 datasets, and compare our results to state-of-the-art methods. We observe that ObjectBox performs favorably in comparison to prior works. Furthermore, we perform rigorous ablation experiments to evaluate different components of our method. Our code is available at: https://github.com/MohsenZand/ObjectBox.

Authors (3)
  1. Mohsen Zand (11 papers)
  2. Ali Etemad (118 papers)
  3. Michael Greenspan (30 papers)
Citations (43)

Summary

Overview of "ObjectBox: From Centers to Boxes for Anchor-Free Object Detection"

The paper, authored by Mohsen Zand, Ali Etemad, and Michael Greenspan, introduces a novel approach to object detection called ObjectBox. The method represents a significant advancement within the field of anchor-free object detection, presenting a methodology that generalizes across datasets without the need for dataset-specific hyperparameter tuning. ObjectBox diverges from traditional anchor-based methods by focusing solely on object center locations, treating these as the primary positive samples for all object scales, and using these samples for bounding box regression.

Methodological Approach

ObjectBox leverages center-based anchor-free detection by using shape- and size-agnostic anchors defined by object center locations. This is a departure from the anchor-based approach, which typically relies on a predefined set of anchor boxes with fixed sizes and aspect ratios, thus requiring adjustment of several hyperparameters to generalize across datasets. The proposed approach does not impose specific constraints based on the size or shape of objects, which often biases traditional methods towards certain scales.
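To make this concrete, below is a minimal, hypothetical sketch of a center-only label assignment in PyTorch. The function name, tensor layout, and the choice of strides are illustrative assumptions, not the authors' implementation; the point is simply that each object marks only its center cell as positive, and it does so at every feature level regardless of the object's size.

```python
import torch

def assign_centers(gt_boxes, strides, grid_sizes):
    """Illustrative center-based label assignment (not the authors' exact code).

    gt_boxes:   (N, 4) float tensor of ground-truth boxes (x1, y1, x2, y2)
                in image coordinates.
    strides:    list of feature-map strides, e.g. [8, 16, 32].
    grid_sizes: list of (H, W) grid shapes, one per stride.

    Returns one boolean positive mask per feature level. Every object marks
    the cell containing its center as positive at *every* level, independent
    of the object's size or shape.
    """
    centers_x = (gt_boxes[:, 0] + gt_boxes[:, 2]) / 2
    centers_y = (gt_boxes[:, 1] + gt_boxes[:, 3]) / 2

    masks = []
    for stride, (h, w) in zip(strides, grid_sizes):
        cx = (centers_x / stride).long().clamp(0, w - 1)  # center cell column
        cy = (centers_y / stride).long().clamp(0, h - 1)  # center cell row
        mask = torch.zeros(h, w, dtype=torch.bool)
        mask[cy, cx] = True  # only the center cell is a positive sample
        masks.append(mask)
    return masks
```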

For label assignment, ObjectBox considers all objects across various scales as positive samples. This is achieved by computing distances from the center cell's corners to the bounding box's four sides as new regression targets. This strategy ensures that the number of positive samples per object remains independent of the object's size. Furthermore, the authors propose a tailored Intersection over Union (IoU) loss function, termed Scale-invariant Distance IoU (SDIoU), which enhances the accuracy of object detection across varying scales by focusing on the distances from the object center to the bounding box boundaries.
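The regression targets described above can be illustrated with the following rough sketch. The exact corner-to-side pairing and the stride normalization are assumptions made for illustration; consult the released code at the repository linked in the abstract for the authors' precise formulation.

```python
import torch

def regression_targets(gt_box, stride, cell_xy):
    """Sketch of ObjectBox-style regression targets (illustrative only).

    gt_box:  (x1, y1, x2, y2) ground-truth box in image coordinates.
    stride:  stride of the feature level.
    cell_xy: (col, row) index of the object's center cell at that level.

    Targets are distances (in units of the stride) from the center cell's
    two corners to the four sides of the bounding box, so the definition
    stays independent of the object's size.
    """
    x1, y1, x2, y2 = gt_box
    col, row = cell_xy
    # Corners of the center cell in image coordinates.
    cell_x1, cell_y1 = col * stride, row * stride              # top-left corner
    cell_x2, cell_y2 = (col + 1) * stride, (row + 1) * stride  # bottom-right corner

    # Distances from the cell corners to the box sides, normalized by stride.
    dl = (cell_x1 - x1) / stride  # to the left side
    dt = (cell_y1 - y1) / stride  # to the top side
    dr = (x2 - cell_x2) / stride  # to the right side
    db = (y2 - cell_y2) / stride  # to the bottom side
    return torch.tensor([dl, dt, dr, db])
```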

Experimental Evaluation

The authors conducted comprehensive evaluations on the MS-COCO 2017 and PASCAL VOC 2012 datasets to demonstrate the efficacy of ObjectBox. The reported results indicate that ObjectBox outperforms many established state-of-the-art methods, such as RetinaNet and FCOS, with significant improvements in average precision (AP) scores. For instance, using a ResNet-101 backbone, ObjectBox achieves an AP of 46.1%, while with CSPDarknet, it reaches an AP of 46.8%. The method also shows enhanced performance in detecting small (AP_S) and large (AP_L) objects, suggesting its robustness across different object scales.

Through an extensive ablation study, the authors verify the impact of their novel label assignment strategy and the effectiveness of the SDIoU loss. They find that leveraging augmented center locations for regression and maintaining a single prediction per location per scale yields the best results. The paper substantiates that, unlike previous methods that require complex label assignment strategies, ObjectBox's relaxed constraints, coupled with the tailored loss function, lead to superior performance and better generalization.

Implications and Future Outlook

The findings of this paper hold significant implications for real-world applications of object detection in environments where generalization across datasets is crucial. The elimination of hyperparameter tuning associated with anchor box specification considerably reduces the complexity involved in deploying object detection models in diverse scenarios.

In theoretical terms, ObjectBox's success in using a center-based approach opens new avenues for improving anchor-free detection algorithms. The concept of leveraging object center locations to drive object detection, along with the introduction of SDIoU, could inspire further research into developing more efficient and less computationally demanding detection models.

Future exploration could involve extending ObjectBox's framework to integrate temporal information in video object detection or adapting it for real-time applications. Furthermore, exploring its applicability in other domains, such as medical imaging or autonomous vehicles, where objects may appear at varying scales and positions, could be highly beneficial.

Overall, ObjectBox represents a notable contribution to the field of computer vision, specifically in anchor-free object detection methods, distinguishing itself through its unique label assignment approach and refined loss function that together foster improved performance and generalization capabilities.
