Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning (2211.11530v2)

Published 21 Nov 2022 in cs.CV and cs.RO

Abstract: Detecting both known and unknown objects is a fundamental skill for robot manipulation in unstructured environments. Open-set object detection (OSOD) is a promising direction to handle the problem consisting of two subtasks: objects and background separation, and open-set object classification. In this paper, we present Openset RCNN to address the challenging OSOD. To disambiguate unknown objects and background in the first subtask, we propose to use classification-free region proposal network (CF-RPN) which estimates the objectness score of each region purely using cues from object's location and shape preventing overfitting to the training categories. To identify unknown objects in the second subtask, we propose to represent them using the complementary region of known categories in a latent space which is accomplished by a prototype learning network (PLN). PLN performs instance-level contrastive learning to encode proposals to a latent space and builds a compact region centering with a prototype for each known category. Further, we note that the detection performance of unknown objects can not be unbiasedly evaluated on the situation that commonly used object detection datasets are not fully annotated. Thus, a new benchmark is introduced by reorganizing GraspNet-1billion, a robotic grasp pose detection dataset with complete annotation. Extensive experiments demonstrate the merits of our method. We finally show that our Openset RCNN can endow the robot with an open-set perception ability to support robotic rearrangement tasks in cluttered environments. More details can be found in https://sites.google.com/view/openset-rcnn/

Authors (4)

Zhongxiang Zhou (19 papers)
Yifei Yang (50 papers)
Yue Wang (676 papers)
Rong Xiong (115 papers)


Summary

The paper introduces Openset RCNN, which enhances object detection by using classification-free proposals based on spatial cues.
It employs a Prototype Learning Network with instance-level contrastive learning to form discriminative representations for object differentiation.
The research establishes a new OSOD benchmark from GraspNet-1billion, demonstrating superior performance for robotic tasks compared to conventional methods.

Overview of Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning

The paper "Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning" addresses a critical aspect of robotic perception in unstructured environments—detecting both known and unknown objects. This is particularly relevant for robots engaged in manipulation tasks where they encounter diverse and unpredictable objects. The paper proposes Openset RCNN, a framework for open-set object detection (OSOD) that integrates a classification-free region proposal network (CF-RPN) and a prototype learning network (PLN).

The OSOD problem splits into two subtasks: separating objects from the background, and open-set classification, which assigns known objects to their categories and flags everything else as unknown. Traditional object detectors assume a fixed set of known object categories during both training and testing. This closed-set assumption is inadequate for real-world scenarios, where the set of object categories is effectively unbounded. The paper addresses both subtasks by using classification-free object proposals and contrastive learning at the instance level.

Key Contributions

Classification-free Region Proposal Network (CF-RPN): The CF-RPN enhances generalization by relying on object location and shape cues rather than classification, allowing the model to avoid overfitting to the training set categories. It utilizes centerness regression and bounding box refinement without classification, thereby improving the network's ability to propose potential object locations without bias toward known categories.
Prototype Learning Network (PLN): The PLN encodes proposals into a latent space and utilizes instance-level contrastive learning to form a compact, discriminative representation (prototype) for each known category. This allows for effective separation of known from unknown objects based on their latent distances to category prototypes.
Benchmark Creation: The paper proposes a new benchmark for OSOD by reorganizing GraspNet-1billion, a dataset with comprehensive annotations, to facilitate unbiased evaluation of unknown object detection performance. This addresses a significant limitation of commonly used datasets like PASCAL VOC and COCO, which are not fully annotated.
Robotic Applications: The proposed Openset RCNN algorithm has practical applications in robotic systems, particularly for tasks requiring the identification of new objects among clutter. The work demonstrates the utility in robotic rearrangement tasks, suggesting Openset RCNN's potential to support real-time decision-making in robotics.
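
The two core ideas in the list above can be illustrated with a small sketch. It is not the authors' implementation. The first function computes a centerness score using the FCOS-style definition, which is assumed here because the summary only states that CF-RPN regresses centerness. The second function labels a proposal embedding by its distance to per-category prototypes and returns "unknown" when no prototype is close enough. The function names, the Euclidean distance, and the fixed `threshold` are all illustrative assumptions.

```python
import math


def centerness(px, py, box):
    """FCOS-style centerness of point (px, py) for box = (x1, y1, x2, y2).

    Assumption: this formula stands in for whatever centerness target
    CF-RPN actually regresses. It depends only on location and shape.
    """
    x1, y1, x2, y2 = box
    left, right = px - x1, x2 - px
    top, bottom = py - y1, y2 - py
    if min(left, right, top, bottom) < 0:
        return 0.0  # the point lies outside the box
    if max(left, right) == 0 or max(top, bottom) == 0:
        return 0.0  # degenerate box with zero width or height
    ratio = (min(left, right) / max(left, right)) * (
        min(top, bottom) / max(top, bottom)
    )
    return math.sqrt(ratio)


def label_by_prototype(embedding, prototypes, threshold):
    """Return the nearest known category, or "unknown".

    Assumption: prototypes is a dict mapping a category name to a point
    in the latent space, and a proposal is unknown when its Euclidean
    distance to every prototype exceeds a hypothetical threshold.
    """
    best_name, best_dist = None, float("inf")
    for name, proto in prototypes.items():
        dist = math.dist(embedding, proto)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else "unknown"


if __name__ == "__main__":
    # Centerness is highest at the box center and decays toward the edges.
    print(centerness(5, 5, (0, 0, 10, 10)))
    print(centerness(2, 5, (0, 0, 10, 10)))

    # Toy 2-D latent space with two known categories.
    protos = {"mug": (0.0, 0.0), "box": (5.0, 5.0)}
    print(label_by_prototype((0.1, 0.0), protos, threshold=1.0))
    print(label_by_prototype((3.0, 3.0), protos, threshold=1.0))
```

Because the centerness score never consults a class label, a proposal for an unseen object can still score highly. Likewise, the prototype rule treats the region of latent space outside all known-category neighborhoods as unknown, which mirrors the "complementary region" idea in the abstract.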

Numerical Results and Implications

The experimental results are convincing: Openset RCNN exhibits superior performance compared to existing methods like ORE, PROSER, and OpenDet, as evidenced by metrics such as Wilderness Impact (WI), Absolute Open-Set Error (AOSE), mAP of known categories ($\text{mAP}_\mathcal{K}$), and recall of unknown objects ($\text{R}_\mathcal{U}$). These outcomes suggest that classification-free proposals, combined with contrastive learning, significantly boost the model's capability to handle open-set conditions.
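
To make the open-set metrics named above concrete, here is a minimal sketch of three of them. The summary does not give their formulas, so the definitions are the ones commonly used in the open-set detection literature and are assumed to match the paper: WI as the ratio of closed-set precision to open-set precision minus one, AOSE as the count of unknown objects labeled with a known category, and unknown recall as the fraction of unknown objects detected as unknown. The sketch also assumes detections have already been matched one-to-one with ground-truth objects, which skips the IoU matching a real evaluation needs. All function names are hypothetical.

```python
def wilderness_impact(precision_closed, precision_open):
    """WI = P_closed / P_open - 1 (assumed definition; lower is better)."""
    if precision_open <= 0:
        raise ValueError("precision_open must be positive")
    return precision_closed / precision_open - 1.0


def absolute_open_set_error(predicted, truth, unknown="unknown"):
    """Count unknown objects that were assigned a known category."""
    return sum(1 for p, t in zip(predicted, truth) if t == unknown and p != unknown)


def unknown_recall(predicted, truth, unknown="unknown"):
    """Fraction of ground-truth unknown objects predicted as unknown."""
    total = sum(1 for t in truth if t == unknown)
    if total == 0:
        return 0.0  # no unknown objects to recall
    hits = sum(1 for p, t in zip(predicted, truth) if t == unknown and p == unknown)
    return hits / total


if __name__ == "__main__":
    # Toy matched detections: one label per ground-truth object.
    truth = ["mug", "unknown", "unknown", "unknown"]
    predicted = ["mug", "unknown", "box", "mug"]
    print(absolute_open_set_error(predicted, truth))
    print(unknown_recall(predicted, truth))
    print(wilderness_impact(precision_closed=0.8, precision_open=0.64))
```

Under these definitions, a detector improves by lowering WI and AOSE while raising unknown recall. That is the direction of improvement the comparison against ORE, PROSER, and OpenDet refers to.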

The authors' experiments not only validate Openset RCNN's proficiency on conventional datasets like VOC and COCO but also demonstrate its effectiveness on the newly proposed GraspNet OSOD benchmark. The results on GraspNet are particularly compelling as they provide a more accurate reflection of the model's performance due to complete annotations.

Implications and Future Directions

The methodology proposed in this paper contributes to theoretical advancements in the field of open-set recognition and practical robotic perception. By decoupling object detection from fixed category assumptions, Openset RCNN paves the way for more versatile and adaptable robotic systems. In practice, this could reduce failure rates in robotic manipulation tasks involving unknown objects, leading to more autonomous and intelligent systems.

Moving forward, future research could enhance Openset RCNN by exploring alternative architectures or integration with other sensor modalities to further bolster open-set object detection capabilities. Additionally, extending these methodologies to other dynamic environments and fine-tuning for real-time performance could rapidly advance the applicability of these systems in various industrial and consumer domains. The framework set by this paper is a promising step toward developing robots that can more seamlessly adapt to an ever-changing world.
