OpenSet RCNN for Object Detection
- OpenSet RCNN is an object detection framework that identifies known and unknown objects using category-agnostic region proposals and prototype-based learning.
- It replaces traditional RPN with a centerness and IoU regression mechanism, enhancing localization without overfitting to training categories.
- Benchmarking shows that the combined CF-RPN and PLN approach significantly reduces open-set errors while improving unknown-class precision.
OpenSet RCNN is an object detection framework designed for open-set object detection (OSOD), where the aim is to detect both objects from categories present during training (“known”) and objects from novel categories (“unknown”), while robustly separating all objects from background regions. OpenSet RCNN introduces a two-stage architecture that addresses these challenges through classification-free region proposals and instance-level, prototype-based contrastive learning. It is evaluated under a fully annotated benchmarking protocol that enables unbiased measurement of unknown-class performance (Zhou et al., 2022).
1. Architectural Overview and Core Innovations
OpenSet RCNN employs a standard ResNet-50 backbone with Feature Pyramid Network (FPN) to generate multiscale feature maps. The architecture diverges from conventional two-stage detectors through two principal components:
- Classification-Free RPN (CF-RPN): The first stage replaces the traditional object/background classifier with a localization-based, category-agnostic objectness estimator.
- Prototype Learning Network (PLN): The second stage refines region proposals and simultaneously embeds each region in a low-dimensional latent space for prototype-based discrimination between known and unknown objects, in parallel with a standard classification head.
This arrangement is designed to avoid the overfitting to known categories that occurs when conventional region proposal networks (RPNs) rely on category information despite non-exhaustive annotations, and to enable robust identification of objects from categories unseen during training (Zhou et al., 2022).
2. Classification-Free Region Proposal Network (CF-RPN)
The CF-RPN is central to OpenSet RCNN’s ability to propose candidate regions without reliance on object category information. Unlike the conventional RPN, which applies a binary object/background classifier, CF-RPN replaces this with a centerness regression head (following the FCOS style) and an offset-based bounding box regression:
- Centerness score: Encodes the geometric likelihood that an anchor location lies at the center of an object, independent of class.
- IoU regression: Refines the proposal and regresses its predicted Intersection-over-Union (IoU) with any ground-truth box.
The final objectness score per region is computed from the centerness and IoU predictions alone, ensuring that only location and shape cues determine objectness at proposal time. Training employs a composite loss over the CF-RPN heads with empirically tuned weights. Proposals are sampled based on IoU thresholds with ground-truth boxes, so that object-versus-background discrimination is learned without reliance on categories (Zhou et al., 2022).
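The quantities behind a classification-free objectness score can be sketched in a few lines. The centerness target below follows the FCOS definition; the `objectness` function combines centerness and IoU by a geometric mean, which is an assumption made for illustration and is not confirmed by the text above. All function names are hypothetical.

```python
import numpy as np

def centerness_target(l, t, r, b):
    """FCOS-style centerness for a location inside a box, given its
    (positive) distances to the left, top, right and bottom sides."""
    l, t, r, b = (np.asarray(v, dtype=float) for v in (l, t, r, b))
    lr = np.minimum(l, r) / np.maximum(l, r)
    tb = np.minimum(t, b) / np.maximum(t, b)
    return np.sqrt(lr * tb)

def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def objectness(centerness, iou_pred):
    """ASSUMPTION: geometric mean of the two localization-quality scores.
    The exact combination used by CF-RPN is not stated in the text."""
    return np.sqrt(np.asarray(centerness, dtype=float) * np.asarray(iou_pred, dtype=float))
```

Neither score depends on a class label, which is the property the CF-RPN design relies on: a region scores highly only if it is well centered on, and tightly overlaps, some object.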
3. Prototype Learning Network (PLN) and Unknown Object Disambiguation
PLN addresses the open-set disambiguation problem by embedding RoI features into a learned latent space, where each known class is represented by a learned “prototype”.
- Each region’s feature vector is projected to a low-dimensional embedding, enabling similarity comparisons with every prototype via cosine distance.
- A double-margin supervised contrastive loss pulls the embedding of a known-class region toward its matching prototype and pushes it away from the others, with separate positive and negative margins controlling the two effects.
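As an illustration of the double-margin idea, the sketch below uses a simple hinge formulation over cosine distances to class prototypes. This is one common way to write such a loss and is an assumption, not necessarily the exact formulation of Zhou et al.; the margin values `m_pos` and `m_neg` and the function names are hypothetical.

```python
import numpy as np

def cosine_distance(z, prototypes):
    """Cosine distance between embeddings z (N, D) and prototypes (K, D) -> (N, K)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return 1.0 - z @ p.T

def double_margin_loss(z, labels, prototypes, m_pos=0.1, m_neg=0.5):
    """ASSUMED hinge form: pull each embedding to within m_pos of its own
    prototype, push it at least m_neg away from every other prototype."""
    d = cosine_distance(z, prototypes)                               # (N, K)
    is_pos = np.eye(prototypes.shape[0], dtype=bool)[np.asarray(labels)]
    pull = np.maximum(0.0, d - m_pos) * is_pos                       # matching prototype
    push = np.maximum(0.0, m_neg - d) * ~is_pos                      # all other prototypes
    return (pull.sum() + push.sum()) / len(z)
```

With two margins, embeddings that are already close enough to their own prototype, and far enough from the rest, contribute no gradient, which leaves the space between prototype clusters free for unknown objects.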
At inference, a proposal is deemed “unknown” if its distance to the nearest class prototype exceeds a fixed threshold; otherwise, standard softmax classification assigns a label. This mechanism is designed to leverage the complement of the known prototype space for unknown class localization (Zhou et al., 2022).
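The inference rule can be sketched as follows. The function name, the `UNKNOWN` sentinel, and the threshold value used in the usage line are hypothetical; only the rule itself (nearest-prototype cosine distance compared against a threshold, with softmax as the fallback) comes from the description above.

```python
import numpy as np

UNKNOWN = -1  # hypothetical sentinel label for unknown objects

def classify_open_set(embeddings, prototypes, class_probs, threshold):
    """Label each proposal as a known class index or UNKNOWN.

    embeddings:  (N, D) region embeddings
    prototypes:  (K, D) one prototype per known class
    class_probs: (N, K) softmax output of the classification head
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    min_dist = (1.0 - z @ p.T).min(axis=1)   # distance to the nearest prototype
    labels = class_probs.argmax(axis=1)      # closed-set decision
    labels[min_dist > threshold] = UNKNOWN   # override when far from all prototypes
    return labels, min_dist
```

For example, `classify_open_set(emb, protos, probs, threshold=0.5)` returns one label per proposal together with the distances, which can be inspected when choosing the threshold.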
4. Training, Inference, and Loss Functions
Training jointly optimizes a multi-task loss, a weighted sum of the per-stage loss terms with empirically set weights. Hyperparameters for sampling and thresholds are specified for each stage; for example, the PLN loss is applied only to RoIs whose IoU with a ground-truth box exceeds a fixed threshold, to focus learning on meaningful proposals.
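A minimal sketch of the two mechanics described here, IoU gating of a per-RoI loss and a weighted sum of loss terms, is given below. The term names, weights, and the IoU cutoff in the usage note are placeholders chosen for illustration, not values from the paper.

```python
import numpy as np

def gated_mean(per_roi_loss, roi_iou, iou_min):
    """Average a per-RoI loss over RoIs whose IoU with ground truth exceeds iou_min."""
    mask = roi_iou > iou_min
    return float(per_roi_loss[mask].mean()) if mask.any() else 0.0

def total_loss(terms, weights):
    """Weighted sum of named loss terms (names here are hypothetical)."""
    return sum(weights[name] * value for name, value in terms.items())
```

In use, the PLN term would be computed with `gated_mean` and then combined with the other terms, for example `total_loss({"cf_rpn": a, "det": b, "pln": c}, weights)`.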
The inference process executes as follows:
- Extract multi-scale features using the backbone and FPN.
- Generate anchors and score by centerness; top anchors undergo NMS and further refinement.
- For each candidate, compute the objectness score; filter proposals by an objectness threshold.
- Embed each region and compute its distance to every prototype. Proposals whose minimum prototype distance exceeds the unknown threshold are labeled as unknown; remaining boxes are classified via the softmax head.
- Non-maximum suppression (NMS) is performed separately for known and unknown sets (Zhou et al., 2022).
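The last steps of this pipeline (objectness filtering, then NMS run separately on the known and unknown sets) can be sketched as below. This is a simplified illustration under stated assumptions: all known classes are treated as a single group, the objectness score is used for ranking, and the threshold values and function names are hypothetical.

```python
import numpy as np

def nms(boxes, scores, iou_thresh):
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns kept indices, best first."""
    order = np.argsort(-scores)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]   # drop boxes overlapping the kept one
    return keep

def postprocess(boxes, objectness, is_unknown, obj_thresh, iou_thresh):
    """Filter by objectness, then suppress duplicates within each set separately."""
    cand = np.flatnonzero(objectness > obj_thresh)
    results = {}
    for name, flag in (("known", False), ("unknown", True)):
        idx = cand[is_unknown[cand] == flag]
        kept = nms(boxes[idx], objectness[idx], iou_thresh)
        results[name] = idx[np.asarray(kept, dtype=int)].tolist()
    return results
```

Because suppression is done per set, an unknown detection is never removed by an overlapping known detection, and vice versa.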
5. Benchmarking Protocol and Open-Set Metrics
To ensure unbiased open-set evaluation, OpenSet RCNN introduces a rigorous benchmark based on a reorganized GraspNet-1Billion dataset, which is exhaustively annotated:
- 28 classes are set as “known” for training; the remaining 60 classes are held out as “unknown” for testing.
- Two test settings are defined: one with an increasing number of unknown classes, and one with an increasing “Wilderness Ratio” (ratio of unknown to known objects per scene).
The evaluation protocol leverages the following metrics:
- mAP: Mean Average Precision over known categories (COCO style).
- AP: Average precision over unknown classes, enabled by exhaustive ground-truth.
- Wilderness Impact (WI): Measures how much open-set noise degrades known-class precision at a fixed recall.
- Absolute Open-Set Error (AOSE): The number of unknown objects misidentified as belonging to a known class.
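The two open-set metrics can be computed from detection counts, as in the sketch below. The text above defines WI only verbally; the precision-ratio form used here is the one commonly seen in the open-set detection literature and is assumed, not confirmed, to match the benchmark's implementation. Function and argument names are hypothetical.

```python
def wilderness_impact(tp_known, fp_closed, fp_open):
    """WI = P_closed / P_open - 1, with all counts taken at one fixed recall level.

    tp_known:  true positives on known classes
    fp_closed: false positives that also occur in closed-set evaluation
    fp_open:   extra false positives caused by unknown objects
    """
    precision_closed = tp_known / (tp_known + fp_closed)
    precision_open = tp_known / (tp_known + fp_closed + fp_open)
    return precision_closed / precision_open - 1.0

def absolute_open_set_error(matched_gt_is_unknown, predicted_label, unknown_label=-1):
    """Count detections matched to an unknown ground-truth object
    but predicted as one of the known classes."""
    return sum(
        1
        for gt_unknown, pred in zip(matched_gt_is_unknown, predicted_label)
        if gt_unknown and pred != unknown_label
    )
```

A WI of 0 means unknown objects cause no additional false positives; AOSE is a raw count, so it grows with the number of unknown objects in the test set.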
Typical results on the hardest test (GraspNet-OSOD Test 6, WR=3) are summarized as:
| Method | WI ↓ | AOSE ↓ | mAP (known) ↑ | AP (unknown) ↑ |
|---|---|---|---|---|
| Baseline | 0.28 | 254,027 | 61.15 | 0.00 |
| + CF-RPN only | 0.29 | 236,594 | 61.24 | 0.00 |
| + PLN only | 0.39 | 69,339 | 61.97 | 37.30 |
| CF-RPN + PLN (ours) | 0.27 | 53,047 | 62.46 | 49.34 |
The joint design (CF-RPN + PLN) cuts AOSE from 254,027 to 53,047 (roughly a 79% reduction relative to the baseline), raises unknown-class AP from 0.00 to 49.34, and slightly improves known-class mAP (61.15 to 62.46), while also achieving the lowest WI of the four configurations (Zhou et al., 2022).
6. Implementation Details and Reproducibility
OpenSet RCNN uses a ResNet-50 backbone pretrained on ImageNet, integrated with FPN. The prototype embedding dimension, the positive and negative margins of the contrastive loss, and the distance threshold for unknown detection are fixed hyperparameters. Loss weights and sampling hyperparameters are tuned to balance detection accuracy against open-set discrimination. Further low-level implementation details (learning rates, data augmentation, NMS thresholds, batch sampling schemes) are provided in the official code repository (Zhou et al., 2022).
7. Impact and Position in Open-Set Detection Research
OpenSet RCNN advances the state of the art in OSOD by addressing its two subproblems, object proposal and open-set recognition, without bias from non-exhaustive training data. The methodological foundation, comprising classification-free proposals and instance-level contrastive learning with prototypes, separates object from background and known from unknown without overfitting to training categories. Across multiple evaluation settings, OpenSet RCNN achieves lower open-set error, improved unknown-class recall and average precision, and robust performance on known classes. The introduction of an unbiased, fully annotated benchmark enables quantitative and fair assessment, establishing OpenSet RCNN as a reference method for open-set detection in robotics and related domains (Zhou et al., 2022).