Classification-Free RPN for Open-Set Detection
- The paper introduces a classification-free framework that relies on centerness and IoU to score object proposals without class-specific supervision.
- It utilizes a parallel dual-head architecture with RoIAlign refinement to accurately regress box coordinates and estimate proposal quality.
- The method prevents overfitting to annotated classes, offering robust open-set detection by leveraging localization cues in unstructured settings.
A Classification-Free Region Proposal Network (CF-RPN) is a network architecture designed for generating object proposals without using category-specific classification signals. Unlike standard region proposal networks (RPNs), CF-RPNs estimate objectness scores based solely on localization cues such as centerness and predicted intersection-over-union (IoU) with ground-truth, eschewing any class-dependent binary object/background discrimination. Introduced in the context of open-set object detection, the CF-RPN’s objectness formulation is specifically constructed to avoid overfitting to annotated classes, making it suitable for environments with unannotated or unknown categories. The CF-RPN is a core component of the Openset RCNN framework for open-set object detection in unstructured settings (Zhou et al., 2022).
1. Network Architecture
The CF-RPN is built on top of a backbone and feature pyramid. Input images are processed by a ResNet-50 backbone augmented with a Feature Pyramid Network (FPN), which produces a set of multi-scale feature maps. Each FPN level then passes through a shared convolutional layer (ReLU activation, 256 output channels), producing the features consumed by the proposal heads.
Object proposal generation proceeds via two parallel heads:
- Centerness Head: For each spatial location and anchor, a convolution followed by a second convolution and a sigmoid activation yields a scalar centerness score between 0 and 1. This head produces a localization-focused confidence and replaces the foreground-vs-background classifier of standard RPNs.
- Box Regression Head (ltrb): A parallel pair of convolutions produces four values $(l, t, r, b)$ per anchor, encoding the distances from the feature point to the left, top, right, and bottom box sides, respectively. This avoids the center-and-size offset parameterization $(t_x, t_y, t_w, t_h)$ used by standard RPNs.
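As a rough illustration of the ltrb encoding, the sketch below decodes four side distances at a feature point into box corners and computes a centerness value for that point. The centerness formula is the FCOS-style definition and is an assumption here, since this article does not state which definition CF-RPN uses; all function names are hypothetical.

```python
import math

def decode_ltrb(x, y, l, t, r, b):
    """Turn distances from point (x, y) to the four box sides into (x1, y1, x2, y2)."""
    return (x - l, y - t, x + r, y + b)

def encode_ltrb(x, y, box):
    """Inverse of decode_ltrb: distances from (x, y) to the sides of box."""
    x1, y1, x2, y2 = box
    return (x - x1, y - y1, x2 - x, y2 - y)

def fcos_centerness(l, t, r, b):
    """FCOS-style centerness (assumed definition): 1 at the box center, near 0 at the edges."""
    if min(l, t, r, b) <= 0:
        return 0.0  # the point lies on or outside the box
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

box = decode_ltrb(50.0, 40.0, 10.0, 20.0, 30.0, 20.0)
print(box)  # (40.0, 20.0, 80.0, 60.0)
print(fcos_centerness(10.0, 20.0, 30.0, 20.0))  # below 1 because the point is off-center horizontally
```

Because the targets depend only on box geometry, nothing in this encoding requires a class label.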
Anchors are ranked by centerness, and the top-$K$ (typically $K = 2{,}000$ for training and $1{,}000$ for inference) are retained as proposals. Each proposal is further refined via RoIAlign, which extracts fixed-size features from the FPN feature maps. These features are then processed by:
- IoU Regression Head: Predicts the IoU between the proposal and its ground-truth box.
- Standard Box Regression Head: Uses the conventional offsets as in Faster R-CNN.
The final per-proposal objectness combines the predicted centerness and the predicted IoU (centerness × IoU).
2. Objectness Scoring and Loss Formulation
The objectness score omits direct binary classification and instead combines predicted centerness and predicted IoU. For an FPN location and anchor indexed by $i$:
- $c_i$: predicted centerness from the centerness head.
- $\mathrm{IoU}_i$: predicted IoU from the refinement head.
The final objectness score is the product of the two:
$$s_i = c_i \cdot \mathrm{IoU}_i$$
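Training uses four smooth L1 loss terms computed over a sampled set of anchors, one for each head (centerness, ltrb regression, IoU regression, and refinement box regression). The overall CF-RPN loss is a weighted sum of these terms; the weight values are reported in Zhou et al. (2022).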
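To make the loss structure concrete, here is a minimal sketch of a smooth L1 term and a weighted sum over four such terms. The term names, the `beta` transition point, the sample predictions, and the unit weights are all placeholders chosen for illustration, not the values used in the paper.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one scalar pair: quadratic near zero, linear beyond beta."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def mean_smooth_l1(preds, targets, beta=1.0):
    """Average smooth L1 over a batch of sampled predictions."""
    return sum(smooth_l1(p, t, beta) for p, t in zip(preds, targets)) / len(preds)

def weighted_loss(terms, weights):
    """Weighted sum of named loss terms."""
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical per-head predictions and targets (illustrative numbers only).
terms = {
    "centerness": mean_smooth_l1([0.8, 0.3], [1.0, 0.0]),
    "ltrb": mean_smooth_l1([9.0, 21.0, 30.0, 18.0], [10.0, 20.0, 30.0, 20.0]),
    "iou": mean_smooth_l1([0.6], [0.75]),
    "refine": mean_smooth_l1([0.1, -0.2, 0.0, 0.05], [0.0, 0.0, 0.0, 0.0]),
}
# Placeholder weights; the actual values are not reproduced here.
weights = {"centerness": 1.0, "ltrb": 1.0, "iou": 1.0, "refine": 1.0}
total = weighted_loss(terms, weights)
print(total)
```

None of the four terms involves a class label or a background target, which is the property the classification-free design relies on.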
3. Key Differences with Standard RPN
The CF-RPN departs from standard RPN in several critical aspects:
| Feature | Standard RPN | CF-RPN |
|---|---|---|
| Classification Head | Binary object/background | None (classification-free) |
| Objectness Signal | Classification/softmax | Centerness × IoU |
| Anchor Regression | Center/size offsets $(t_x, t_y, t_w, t_h)$ | Distances $(l, t, r, b)$, plus refinement |
| Negative Sampling | May treat unknown objects as background | Avoids negative bias, improved for open-set |
This approach prevents overfitting to training categories and avoids mislabeling unannotated or unknown objects as negative samples during training (Zhou et al., 2022). Objectness prediction becomes category-agnostic, critically supporting open-set settings.
4. Proposal Generation Pipeline
CF-RPN utilizes standard anchors (e.g., 3 scales × 3 aspect ratios at each FPN location). Proposal ranking and refinement proceed as follows:
- All anchors are scored for centerness.
- The regression head predicts $(l, t, r, b)$ distances to form preliminary proposals.
- Top-K proposals by centerness are selected.
- Proposal features are extracted via RoIAlign.
- IoU and regression are performed for each proposal.
- The final objectness is computed as centerness × IoU.
- Proposals whose objectness falls below a threshold are filtered out during inference.
The proposal selection mechanism avoids reliance on class label priors, which is vital for open-set settings.
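The ranking and filtering steps can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the `iou_head` callable stands in for the RoIAlign-based IoU head, the box refinement step is omitted, and the `min_objectness` default is a placeholder because the actual threshold is not given here.

```python
def select_proposals(anchors, iou_head, top_k=1000, min_objectness=0.05):
    """Rank anchors by centerness, score survivors by centerness * IoU, then filter.

    anchors: list of dicts with keys 'box' and 'centerness'.
    iou_head: callable mapping a box to a predicted IoU (hypothetical stand-in).
    """
    # Keep only the top-K anchors by centerness.
    ranked = sorted(anchors, key=lambda a: a["centerness"], reverse=True)[:top_k]
    proposals = []
    for anchor in ranked:
        predicted_iou = iou_head(anchor["box"])
        objectness = anchor["centerness"] * predicted_iou
        if objectness >= min_objectness:  # drop low-objectness proposals
            proposals.append({"box": anchor["box"], "objectness": objectness})
    proposals.sort(key=lambda p: p["objectness"], reverse=True)
    return proposals

anchors = [
    {"box": (0, 0, 10, 10), "centerness": 0.9},
    {"box": (5, 5, 20, 20), "centerness": 0.2},
    {"box": (30, 30, 60, 60), "centerness": 0.7},
]
# Fake IoU predictions keyed by box, used in place of a learned head.
fake_iou = {(0, 0, 10, 10): 0.5, (5, 5, 20, 20): 0.9, (30, 30, 60, 60): 0.8}
result = select_proposals(anchors, fake_iou.get, top_k=2, min_objectness=0.1)
for proposal in result:
    print(proposal)
```

In this toy run the low-centerness anchor is discarded before its IoU is ever evaluated, and the final order follows the combined score rather than centerness alone.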
5. Integration with Openset RCNN and PLN
Within the Openset RCNN architecture, the CF-RPN provides class-agnostic object proposals with their objectness scores. These proposals then pass through subsequent open-set classification and filtering steps:
- Per-proposal features from RoIAlign are passed to the Prototype Learning Network (PLN).
- PLN encodes each feature into a latent embedding and compares it to the known-class prototypes via cosine distance.
- If the distance to every prototype exceeds the unknown threshold, the proposal is labeled “unknown”; otherwise, it is classified into the most similar known category via a softmax over the known classes.
- Known and unknown proposals are non-max suppressed separately (IoU threshold 0.5), and the top 50 of each group are retained for final detection.
This division enables robust distinction between unknown objects and background, leveraging the category-agnostic nature of CF-RPN scoring (Zhou et al., 2022).
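A minimal sketch of the prototype comparison is shown below. It assumes cosine distance is defined as one minus cosine similarity, and it assigns known proposals to the nearest prototype as a simple stand-in for the softmax classifier described above. The prototype vectors, class names, and the threshold of 0.2 are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity (assumed definition); inputs must be non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def label_proposal(embedding, prototypes, unknown_threshold):
    """Return 'unknown' if every prototype is farther than the threshold,
    otherwise the name of the nearest known-class prototype."""
    distances = {name: cosine_distance(embedding, proto)
                 for name, proto in prototypes.items()}
    nearest = min(distances, key=distances.get)
    if distances[nearest] > unknown_threshold:
        return "unknown"
    return nearest

# Hypothetical 2-D prototypes for two known classes.
prototypes = {"cup": [1.0, 0.0], "box": [0.0, 1.0]}
print(label_proposal([0.9, 0.1], prototypes, unknown_threshold=0.2))  # cup
print(label_proposal([1.0, 1.0], prototypes, unknown_threshold=0.2))  # unknown
```

The second embedding sits halfway between both prototypes, so it is far from each and falls into the unknown group instead of being forced into a known class.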
6. Hyperparameters and Training Strategy
CF-RPN training involves the following hyperparameter groups (values not listed here are given in Zhou et al., 2022):
- Sampling: separate anchor-sampling settings for the initial ltrb regression and for the refinement stage, applied across two training stages.
- PLN: two margin parameters, an embedding dimension, and an IoU threshold.
- Unknown threshold: up to $0.23$, determined via validation.
- Loss weights: one each for the CF-RPN, PLN (up to $2$), and softmax classifier losses.
- Inference: top 1,000 anchors by centerness, objectness filtering at a fixed threshold, known/unknown thresholding, separate NMS, and top 50 detections per group.
A pseudocode overview of the inference procedure in the context of Openset RCNN is presented in the original work.
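The final suppression step of inference can be illustrated with a small greedy NMS applied separately to the known and unknown groups, using the IoU threshold of 0.5 and the top-50 cap mentioned above. This is a sketch under simplifying assumptions: known detections are suppressed together regardless of class, and the function names and sample detections are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(detections, iou_threshold=0.5, keep_top=50):
    """Greedy NMS over (box, score) pairs, keeping at most keep_top boxes."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if len(kept) == keep_top:
            break
        if all(box_iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept

def separate_nms(detections, iou_threshold=0.5, keep_top=50):
    """Run NMS independently on known and unknown detections.

    detections: list of (box, score, label); label 'unknown' marks the unknown group.
    """
    known = [(b, s) for b, s, label in detections if label != "unknown"]
    unknown = [(b, s) for b, s, label in detections if label == "unknown"]
    return (greedy_nms(known, iou_threshold, keep_top),
            greedy_nms(unknown, iou_threshold, keep_top))

dets = [
    ((0, 0, 10, 10), 0.9, "cup"),
    ((1, 1, 10, 10), 0.8, "cup"),        # overlaps the first cup heavily
    ((0, 0, 10, 10), 0.7, "unknown"),    # same box, but in the unknown group
    ((50, 50, 60, 60), 0.6, "unknown"),
]
known, unknown = separate_nms(dets)
print(known)
print(unknown)
```

Because the two groups are suppressed independently, an unknown detection is never removed by an overlapping known detection, and vice versa.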
7. Context and Significance in Open-set Object Detection
The CF-RPN addresses a central challenge in open-set object detection (OSOD): the inability of standard proposal mechanisms to separate unknown objects from unannotated background due to reliance on class-based supervision. CF-RPN’s localization-driven scoring is specifically designed to promote generalization to unknown or novel objects and to prevent the systematic exclusion of such instances as negatives. This is particularly critical when evaluating on datasets with incomplete annotations or for real-world robotic perception tasks in unstructured environments (Zhou et al., 2022). CF-RPN underpins the OSOD capability of Openset RCNN, enabling practical open-set perception for robotic rearrangement in cluttered domains.