Classification-Free RPN for Open-Set Detection

Updated 18 March 2026

The paper introduces a classification-free framework that relies on centerness and IoU to score object proposals without class-specific supervision.
It utilizes a parallel dual-head architecture with RoIAlign refinement to accurately regress box coordinates and estimate proposal quality.
The method prevents overfitting to annotated classes, offering robust open-set detection by leveraging localization cues in unstructured settings.

A Classification-Free Region Proposal Network (CF-RPN) is a network architecture designed for generating object proposals without using category-specific classification signals. Unlike standard region proposal networks (RPNs), CF-RPNs estimate objectness scores based solely on localization cues such as centerness and predicted intersection-over-union (IoU) with ground-truth, eschewing any class-dependent binary object/background discrimination. Introduced in the context of open-set object detection, the CF-RPN’s objectness formulation is specifically constructed to avoid overfitting to annotated classes, making it suitable for environments with unannotated or unknown categories. The CF-RPN is a core component of the Openset RCNN framework for open-set object detection in unstructured settings (Zhou et al., 2022).

1. Network Architecture

The CF-RPN architecture is layered upon a backbone and feature pyramid. Input images are processed by ResNet-50 augmented with a Feature Pyramid Network (FPN) to create a set of multi-scale feature maps $\{P_2,P_3,P_4,P_5,P_6\}$ . Each FPN level $\ell$ passes through a shared $3 \times 3$ convolutional layer (with ReLU activation, output channels 256), producing $F_\ell \in \mathbb{R}^{H_\ell \times W_\ell \times 256}$ .

Object proposal generation proceeds via two parallel heads:

Centerness Head: For each spatial location and anchor, a $3 \times 3$ convolution followed by a $1 \times 1$ convolution and sigmoid activation yields the scalar centerness score $c_i \in (0,1)$ . This head produces a localization-focused confidence and replaces the class/foreground-vs-background classifier of standard RPNs.
Box Regression Head (ltrb): A parallel $3 \times 3$ convolution followed by a $1 \times 1$ convolution produces the four values $l, t, r, b$ per anchor, encoding the distances from the feature point to the left, top, right, and bottom box sides, respectively, avoiding the use of the $\Delta x, \Delta y, \Delta w, \Delta h$ parameterization.
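
As an illustration of the ltrb parameterization described above, the following minimal sketch converts the four distances into corner coordinates and back. It assumes image coordinates with the y-axis pointing down and distances expressed in pixels; whether the original implementation normalizes the distances (for example by stride or anchor size) is not stated here, and the function names `decode_ltrb` and `encode_ltrb` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y-axis pointing down


def decode_ltrb(px: float, py: float, ltrb: Tuple[float, float, float, float]) -> Box:
    """Convert distances (l, t, r, b) from a reference point into box corners."""
    l, t, r, b = ltrb
    return (px - l, py - t, px + r, py + b)


def encode_ltrb(px: float, py: float, box: Box) -> Tuple[float, float, float, float]:
    """Inverse of decode_ltrb: distances from the point to the four box sides."""
    x1, y1, x2, y2 = box
    return (px - x1, py - y1, x2 - px, y2 - py)


# A point at (100, 60) that lies 30 px from the left side, 20 px from the top,
# 50 px from the right side and 40 px from the bottom of its box.
example_box = decode_ltrb(100.0, 60.0, (30.0, 20.0, 50.0, 40.0))
# example_box == (70.0, 40.0, 150.0, 100.0)
```

All four distances are non-negative exactly when the reference point lies inside the box, which is what ties this parameterization to a centerness-style score.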

Anchors are ranked by centerness, and the top $K$ (typically $K = 2{,}000$ for training and $1{,}000$ for inference) are retained as proposals. RoIAlign then extracts fixed-size features for each proposal from $P_2$ – $P_5$ , which are processed by two refinement heads:

IoU Regression Head: Predicts $b_i = \hat{\mathrm{IoU}}(\mathrm{box}_i, \mathrm{ground\_truth}) \in (0,1)$ .
Standard Box Regression Head: Uses the conventional $\Delta x, \Delta y, \Delta w, \Delta h$ offsets as in Faster R-CNN.
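
For comparison with the ltrb head, the $\Delta x, \Delta y, \Delta w, \Delta h$ offsets of the second-stage head can be decoded in the usual Faster R-CNN manner: center shifts are scaled by the proposal size and size changes are applied in log space. The sketch below assumes this standard decoding and omits the per-coordinate normalization factors that some implementations apply; `decode_deltas` is a hypothetical name.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode_deltas(proposal: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Apply (dx, dy, dw, dh) offsets to a proposal, Faster R-CNN style."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    dx, dy, dw, dh = deltas
    new_cx = cx + dx * w          # center shift is relative to proposal width
    new_cy = cy + dy * h          # center shift is relative to proposal height
    new_w = w * math.exp(dw)      # size change is predicted in log space
    new_h = h * math.exp(dh)

    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


# Shift an 80 x 60 proposal right by 10% of its width and up by 5% of its height.
refined = decode_deltas((70.0, 40.0, 150.0, 100.0), (0.1, -0.05, 0.0, 0.0))
```

With all-zero offsets the function returns the proposal unchanged, so the head only has to learn corrections to the first-stage box.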

The final per-proposal objectness is $s_i = \sqrt{c_i \cdot b_i}$ .

2. Objectness Scoring and Loss Formulation

The objectness scoring omits direct binary classification and instead relies upon the interplay of predicted centerness and IoU. For an FPN location+anchor $x$ :

$c(x)$ : predicted centerness from the centerness head.
$b(x)$ : predicted IoU from the refinement head.

The final objectness score is

$s(x) = \sqrt{\,c(x)\; \cdot \; b(x)\,}\,.$
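
The score is the geometric mean of the two predictions, so a proposal scores highly only when both cues agree. The short sketch below contrasts it with an arithmetic mean for an imbalanced pair; the input values are made up for illustration.

```python
import math


def objectness(centerness: float, pred_iou: float) -> float:
    """Geometric mean of predicted centerness and predicted IoU, both in (0, 1)."""
    return math.sqrt(centerness * pred_iou)


balanced = objectness(0.6, 0.6)      # both cues moderately confident
imbalanced = objectness(0.9, 0.1)    # high centerness, poor predicted IoU
arithmetic = (0.9 + 0.1) / 2         # what a plain average would give instead

# The geometric mean pulls the imbalanced case down to about 0.3,
# while an arithmetic mean would still report 0.5.
```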

Training uses four smooth $L_1$ loss terms over a sampled set $S$ of anchors ( $|S| = N_s$ ):

$\begin{aligned} L_{\rm ctr} &= \mathrm{smooth}_{L_1}(c_i - c_i^*) \\ L_{\rm box1} &= \mathrm{smooth}_{L_1}([l_i, t_i, r_i, b_i] - \mathrm{target}_{\mathrm{ltrb},i}) \\ L_{\rm iou} &= \mathrm{smooth}_{L_1}(b_i - \mathrm{IoU}_i^*) \\ L_{\rm box2} &= \mathrm{smooth}_{L_1}(\Delta_i - \Delta_i^*) \end{aligned}$

Starred quantities are regression targets. Note that $b_i$ inside the ltrb vector of $L_{\rm box1}$ is the bottom distance, whereas $b_i$ in $L_{\rm iou}$ is the predicted IoU. The overall CF-RPN loss is

$\mathcal{L}_{\rm CF\!-\!RPN} = \lambda_1 L_{\rm ctr} + \lambda_2 L_{\rm box1} + \lambda_3 L_{\rm iou} + \lambda_4 L_{\rm box2}$

with typical weights: $\lambda_1 = \lambda_3 = 0.5$ , $\lambda_2 = 10$ , $\lambda_4 = 2$ .
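
The weighted sum can be sketched for a single sample as follows. This is an illustration under assumptions rather than the authors' implementation: the smooth $L_1$ transition point `beta = 1.0` and the summation over the four box components are assumed, the reduction over the sampled set $S$ is left out, and all input values are invented.

```python
from typing import Sequence


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Quadratic for |x| < beta, linear beyond (beta = 1.0 is an assumption)."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def smooth_l1_vec(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of element-wise smooth L1 terms for a vector-valued prediction."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


def cf_rpn_loss(l_ctr: float, l_box1: float, l_iou: float, l_box2: float,
                weights: Sequence[float] = (0.5, 10.0, 0.5, 2.0)) -> float:
    """lambda_1 * L_ctr + lambda_2 * L_box1 + lambda_3 * L_iou + lambda_4 * L_box2."""
    w1, w2, w3, w4 = weights
    return w1 * l_ctr + w2 * l_box1 + w3 * l_iou + w4 * l_box2


# Invented predictions and targets for one sampled anchor.
l_ctr = smooth_l1(0.8 - 0.6)
l_box1 = smooth_l1_vec([30.0, 20.0, 50.0, 40.0], [30.5, 20.0, 52.0, 40.0])
l_iou = smooth_l1(0.5 - 0.7)
l_box2 = smooth_l1_vec([0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
total = cf_rpn_loss(l_ctr, l_box1, l_iou, l_box2)  # about 16.28
```

In this example the ltrb term dominates the total, which reflects the large weight $\lambda_2 = 10$ applied to it.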

3. Key Differences with Standard RPN

The CF-RPN departs from standard RPN in several critical aspects:

Feature	Standard RPN	CF-RPN
Classification Head	Binary object/background	None (classification-free)
Objectness Signal	Binary classification score	Geometric mean of centerness and predicted IoU, $\sqrt{c \cdot b}$
Anchor Regression	$\Delta x,\Delta y,\Delta w,\Delta h$	$l, t, r, b$ (ltrb), plus $\Delta$ refinement
Negative Sampling	May treat unknown objects as background	No object/background classifier, so unannotated objects are not pushed toward a background label

This approach prevents overfitting to training categories and avoids mislabeling unannotated or unknown objects as negative samples during training (Zhou et al., 2022). Objectness prediction becomes category-agnostic, critically supporting open-set settings.

4. Proposal Generation Pipeline

CF-RPN utilizes standard anchors (e.g., 3 scales × 3 aspect ratios at each FPN location). Proposal ranking and refinement proceed as follows:

All anchors are scored for centerness.
The regression head predicts $l, t, r, b$ to form preliminary proposals.
Top-K proposals by centerness are selected.
Proposal features are extracted via RoIAlign.
IoU and $\Delta$ regression are performed for each proposal.
The final objectness is computed as $s_i = \sqrt{c_i \cdot b_i}$ .
Proposals with $s_i < 0.05$ are filtered out during inference.
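
The ranking and filtering steps above can be sketched as a small function. The callable `predict_iou` is a hypothetical stand-in for RoIAlign followed by the IoU regression head, the $\Delta$ refinement of the boxes is omitted for brevity, and the example values are invented.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def select_proposals(boxes: Sequence[Box],
                     centerness: Sequence[float],
                     predict_iou: Callable[[Box], float],
                     top_k: int = 1000,
                     min_score: float = 0.05) -> List[Tuple[Box, float]]:
    """Keep the top_k boxes by centerness, score them, drop low objectness."""
    # Rank all candidates by centerness and keep the best top_k.
    order = sorted(range(len(boxes)), key=lambda i: centerness[i], reverse=True)
    kept = []
    for i in order[:top_k]:
        b = predict_iou(boxes[i])                 # second-stage IoU estimate
        s = math.sqrt(centerness[i] * b)          # final objectness
        if s >= min_score:                        # inference-time filter
            kept.append((boxes[i], s))
    return kept


boxes = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 20.0, 20.0), (30.0, 30.0, 40.0, 40.0)]
ctr = [0.9, 0.2, 0.6]
fake_iou = {boxes[0]: 0.8, boxes[1]: 0.9, boxes[2]: 0.001}

# With top_k=2 the second box is never scored, and the third is filtered out
# because sqrt(0.6 * 0.001) is below 0.05.
proposals = select_proposals(boxes, ctr, lambda bx: fake_iou[bx], top_k=2)
```

Note that a box with a high predicted IoU can still be discarded at the top-K stage, since only centerness is available before RoIAlign is applied.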

The proposal selection mechanism avoids reliance on class label priors, which is vital for open-set settings.

5. Integration with Openset RCNN and PLN

Within the Openset RCNN architecture, the CF-RPN provides class-agnostic object proposals with their objectness scores. These proposals then pass through subsequent open-set classification and filtering steps:

Per-proposal features from RoIAlign are passed to the Prototype Learning Network (PLN).
PLN encodes each feature $f_i$ to a latent embedding $z_i$ and compares it to $K$ known-class prototypes $P_j$ via cosine distance $D_{ij}$ .
If $\min_j D_{ij} > T_u$ , the proposal is labeled “unknown”; else, it is classified into the most similar known category via a $K$ -way softmax.
Known and unknown proposals are non-max suppressed separately (IoU threshold 0.5), and the top 50 of each group (known and unknown) are retained for final detection.
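
The unknown test on the prototype distances can be sketched as below. The sketch assumes that cosine distance means one minus cosine similarity and that embeddings are non-zero vectors; the threshold 0.2 is simply a value inside the reported range for $T_u$, the two-dimensional prototypes are invented, and the $K$ -way softmax used to label known proposals is not reproduced.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity (assumed definition; inputs must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_unknown(z: Sequence[float],
               prototypes: Sequence[Sequence[float]],
               t_u: float = 0.2) -> bool:
    """True when the embedding is farther than t_u from every known prototype."""
    return min(cosine_distance(z, p) for p in prototypes) > t_u


# Two invented class prototypes in a toy 2-D embedding space.
prototypes = [(1.0, 0.0), (0.0, 1.0)]

near_first = is_unknown((0.9, 0.1), prototypes)   # close to prototype 0
between = is_unknown((1.0, 1.0), prototypes)      # about 0.29 from both
```

An embedding that sits between prototypes without being close to any of them is the case the threshold is meant to catch.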

This division enables robust distinction between unknown objects and background, leveraging the category-agnostic nature of CF-RPN scoring (Zhou et al., 2022).

6. Hyperparameters and Training Strategy

CF-RPN training implements the following key hyperparameter regimes:

Sampling: $N_s=256$ for initial ltrb; $N_s=512$ for refinement; $T_{\rm pos}=0.3/0.7$ , $T_{\rm neg}=0.1/0.3$ , $P_{\rm pos}=1.0/0.5$ for two training stages.
PLN: Margin parameters $m_p=0.05$ , $m_n=0.95$ ; embedding dimension $d_z=256$ ; IoU threshold $T_{\rm iou}=0.5$ .
Unknown threshold: $T_u \approx 0.17$ –$0.23$, determined via validation.
Loss weights: $\alpha=1$ (CF-RPN), $\beta=0.5$ –$2$ (PLN), $\gamma\approx 1$ (softmax classifier).
Inference: Top 1,000 anchors by centerness, objectness filtering at $s_i<0.05$ , thresholding unknown/known, separate NMS, top 50 each.
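
The last inference steps, separate suppression of known and unknown detections followed by a cap of 50 per group, can be sketched with a plain greedy NMS. Which score is used for ranking (objectness or class confidence) is not specified here, so the `score` field is an assumption, as are the function names and example boxes.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[Box, float]                  # (box, score); score choice is assumed


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: Sequence[Det], iou_thr: float = 0.5, keep_top: int = 50) -> List[Det]:
    """Greedy NMS in descending score order, stopping after keep_top survivors."""
    kept: List[Det] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score))
            if len(kept) == keep_top:
                break
    return kept


def finalize(known: Sequence[Det], unknown: Sequence[Det]) -> Tuple[List[Det], List[Det]]:
    """Suppress each group on its own so known boxes never remove unknown ones."""
    return nms(known), nms(unknown)


known = [((0.0, 0.0, 10.0, 10.0), 0.9), ((1.0, 1.0, 10.0, 10.0), 0.8),
         ((20.0, 20.0, 30.0, 30.0), 0.7)]
unknown = [((0.0, 0.0, 10.0, 10.0), 0.6)]
final_known, final_unknown = finalize(known, unknown)
```

In this example the second known box overlaps the first with IoU 0.81 and is suppressed, while the unknown box survives even though it coincides with a known detection, because the two groups are processed independently.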

A pseudocode overview of the inference procedure in the context of Openset RCNN is presented in the original work.

7. Context and Significance in Open-set Object Detection

The CF-RPN addresses a central challenge in open-set object detection (OSOD): the inability of standard proposal mechanisms to separate unknown objects from unannotated background due to reliance on class-based supervision. CF-RPN’s localization-driven scoring is specifically designed to promote generalization to unknown or novel objects and to prevent the systematic exclusion of such instances as negatives. This is particularly critical when evaluating on datasets with incomplete annotations or for real-world robotic perception tasks in unstructured environments (Zhou et al., 2022). CF-RPN underpins the OSOD capability of Openset RCNN, enabling practical open-set perception for robotic rearrangement in cluttered domains.

References

Zhou et al. (2022). Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning.
