Classification-Free RPN for Open-Set Detection

Updated 18 March 2026
  • The paper introduces a classification-free framework that relies on centerness and IoU to score object proposals without class-specific supervision.
  • It utilizes a parallel dual-head architecture with RoIAlign refinement to accurately regress box coordinates and estimate proposal quality.
  • The method prevents overfitting to annotated classes, offering robust open-set detection by leveraging localization cues in unstructured settings.

A Classification-Free Region Proposal Network (CF-RPN) is a network architecture designed for generating object proposals without using category-specific classification signals. Unlike standard region proposal networks (RPNs), CF-RPNs estimate objectness scores based solely on localization cues such as centerness and predicted intersection-over-union (IoU) with ground-truth, eschewing any class-dependent binary object/background discrimination. Introduced in the context of open-set object detection, the CF-RPN’s objectness formulation is specifically constructed to avoid overfitting to annotated classes, making it suitable for environments with unannotated or unknown categories. The CF-RPN is a core component of the Openset RCNN framework for open-set object detection in unstructured settings (Zhou et al., 2022).

1. Network Architecture

The CF-RPN architecture is layered upon a backbone and feature pyramid. Input images are processed by ResNet-50 augmented with a Feature Pyramid Network (FPN) to create a set of multi-scale feature maps $\{P_2, P_3, P_4, P_5, P_6\}$. Each FPN level $\ell$ passes through a shared $3 \times 3$ convolutional layer (with ReLU activation, output channels 256), producing $F_\ell \in \mathbb{R}^{H_\ell \times W_\ell \times 256}$.

Object proposal generation proceeds via two parallel heads:

  • Centerness Head: For each spatial location and anchor, a $3 \times 3$ convolution followed by a $1 \times 1$ convolution and sigmoid activation yields the scalar centerness score $c_i \in (0,1)$. This head produces a localization-focused confidence and replaces the class/foreground-vs-background classifier of standard RPNs.
  • Box Regression Head (ltrb): A parallel $3 \times 3$ convolution followed by a $1 \times 1$ convolution produces the four values $l, t, r, b$ per anchor, encoding the distances from the feature point to the left, top, right, and bottom box sides, respectively, avoiding the use of the $\Delta x, \Delta y, \Delta w, \Delta h$ parameterization.
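
The ltrb encoding described above can be illustrated with a small sketch. It assumes the four distances are expressed in image pixels relative to a feature point `(px, py)`; the function names `decode_ltrb` and `encode_ltrb` are hypothetical and are not taken from the original work, which may additionally normalize the distances.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def decode_ltrb(px: float, py: float, l: float, t: float, r: float, b: float) -> Box:
    """Convert distances from point (px, py) to the four box sides into corners."""
    return (px - l, py - t, px + r, py + b)


def encode_ltrb(px: float, py: float, box: Box) -> Tuple[float, float, float, float]:
    """Inverse mapping: distances from (px, py) to the left, top, right, bottom sides."""
    x1, y1, x2, y2 = box
    return (px - x1, py - y1, x2 - px, y2 - py)


box = decode_ltrb(50.0, 40.0, l=10.0, t=5.0, r=20.0, b=15.0)
print(box)  # (40.0, 35.0, 70.0, 55.0)
```

Encoding and decoding are exact inverses, so a regression target built with `encode_ltrb` is recovered by `decode_ltrb` at the same point.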

Anchors are ranked by centerness, and the top-$K$ (typically $K = 2{,}000$ for training, $1{,}000$ for inference) are retained as proposals. Each proposal is further refined via RoIAlign to obtain fixed-size features over $P_2$ to $P_5$, which are then processed by:

  • IoU Regression Head: Predicts $b_i = \hat{\mathrm{IoU}}(\mathrm{box}_i, \mathrm{ground\_truth}) \in (0,1)$.
  • Standard Box Regression Head: Uses the conventional $\Delta x, \Delta y, \Delta w, \Delta h$ offsets as in Faster R-CNN.
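
For reference, the conventional Faster R-CNN offset parameterization used by the second regression head can be sketched as follows. This is the standard center/size decoding; any per-coordinate scaling weights or clamping that a concrete implementation might apply are omitted, and the function name `apply_deltas` is hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def apply_deltas(box: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Refine a proposal with (dx, dy, dw, dh) offsets relative to its own size."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # The center shifts proportionally to the box size; the size scales exponentially.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


print(apply_deltas((40.0, 35.0, 70.0, 55.0), (0.5, 0.0, 0.0, 0.0)))
# (55.0, 35.0, 85.0, 55.0)
```

Zero offsets leave the proposal unchanged, which is why this form suits a second-stage refinement of an already reasonable box.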

The final per-proposal objectness is $s_i = \sqrt{c_i \cdot b_i}$.

2. Objectness Scoring and Loss Formulation

The objectness scoring omits direct binary classification and instead relies upon the interplay of predicted centerness and IoU. For an FPN location and anchor pair $x$:

  • $c(x)$: predicted centerness from the centerness head.
  • $b(x)$: predicted IoU from the refinement head.

The final objectness score is

$$s(x) = \sqrt{c(x) \cdot b(x)}.$$
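
A minimal sketch of this score follows, with a comparison against the arithmetic mean to show why the geometric mean is a stricter combination. The function name `objectness` and the sample values are illustrative assumptions.

```python
import math


def objectness(centerness: float, iou: float) -> float:
    """Geometric mean of predicted centerness c(x) and predicted IoU b(x)."""
    return math.sqrt(centerness * iou)


# A proposal that is well centered but poorly overlapping scores low:
c, b = 0.9, 0.1
print(round(objectness(c, b), 3))  # 0.3
print(round((c + b) / 2, 3))       # 0.5 (arithmetic mean, for comparison)
```

Because the product vanishes when either factor does, a proposal needs both cues to be high to receive a high score.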

Training is accomplished via four smooth L1 loss terms over a sampled set $S$ of anchors ($|S| = N_s$):

$$
\begin{aligned}
L_{\rm ctr} &= \mathrm{smooth}_{L_1}(c_i - c_i^*) \\
L_{\rm box1} &= \mathrm{smooth}_{L_1}([l_i, t_i, r_i, b_i] - \mathrm{target}_{\mathrm{ltrb},\,i}) \\
L_{\rm iou} &= \mathrm{smooth}_{L_1}(b_i - \mathrm{IoU}_i^*) \\
L_{\rm box2} &= \mathrm{smooth}_{L_1}(\Delta_i - \Delta_i^*)
\end{aligned}
$$

Note that $b_i$ inside the ltrb vector of $L_{\rm box1}$ denotes the bottom distance, whereas $b_i$ in $L_{\rm iou}$ denotes the predicted IoU. The overall CF-RPN loss is

$$\mathcal{L}_{\rm CF\text{-}RPN} = \lambda_1 L_{\rm ctr} + \lambda_2 L_{\rm box1} + \lambda_3 L_{\rm iou} + \lambda_4 L_{\rm box2}$$

with typical weights: $\lambda_1 = \lambda_3 = 0.5$, $\lambda_2 = 10$, $\lambda_4 = 2$.
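
The weighted combination can be sketched in a few lines. The smooth L1 form below uses a transition point `beta = 1.0`, which is an assumption (the text does not state it), and the per-term loss values in the example are made up purely for illustration.

```python
from typing import Dict


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Quadratic near zero, linear beyond |x| >= beta (beta=1.0 is an assumption)."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


# Typical weights quoted in the text: lambda_1 = lambda_3 = 0.5, lambda_2 = 10, lambda_4 = 2.
WEIGHTS: Dict[str, float] = {"ctr": 0.5, "box1": 10.0, "iou": 0.5, "box2": 2.0}


def cf_rpn_loss(terms: Dict[str, float]) -> float:
    """Weighted sum of the four already-averaged loss terms."""
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)


example_terms = {"ctr": 0.2, "box1": 0.05, "iou": 0.1, "box2": 0.3}  # hypothetical values
print(round(cf_rpn_loss(example_terms), 4))  # 1.25
```

With these weights the ltrb regression term dominates the total unless its raw value is much smaller than the others.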

3. Key Differences with Standard RPN

The CF-RPN departs from standard RPN in several critical aspects:

| Feature | Standard RPN | CF-RPN |
| --- | --- | --- |
| Classification Head | Binary object/background | None (classification-free) |
| Objectness Signal | Classification/softmax | Centerness × IoU |
| Anchor Regression | $\Delta x, \Delta y, \Delta w, \Delta h$ | $l, t, r, b$ (ltrb), plus $\Delta$ refinement |
| Negative Sampling | May treat unknown objects as background | Avoids negative bias, improved for open-set |

This approach prevents overfitting to training categories and avoids mislabeling unannotated or unknown objects as negative samples during training (Zhou et al., 2022). Objectness prediction becomes category-agnostic, critically supporting open-set settings.

4. Proposal Generation Pipeline

CF-RPN utilizes standard anchors (e.g., 3 scales × 3 aspect ratios at each FPN location). Proposal ranking and refinement proceed as follows:

  • All anchors are scored for centerness.
  • The regression head predicts $l, t, r, b$ to form preliminary proposals.
  • Top-K proposals by centerness are selected.
  • Proposal features are extracted via RoIAlign.
  • IoU and $\Delta$ regression are performed for each proposal.
  • The final objectness is computed as $s_i = \sqrt{c_i \cdot b_i}$.
  • Proposals with $s_i < 0.05$ are filtered out during inference.
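
The ranking and filtering steps above can be sketched as follows. Box decoding and refinement are left out, and the callable `predict_iou` is a hypothetical stand-in for the RoIAlign feature extraction plus IoU head; only the top-K selection, the score computation, and the 0.05 filter follow the text.

```python
import math
from typing import Callable, List, Tuple


def select_proposals(
    centerness: List[float],
    predict_iou: Callable[[int], float],
    top_k: int = 1000,
    min_score: float = 0.05,
) -> List[Tuple[int, float]]:
    """Return (anchor_index, objectness) pairs that survive ranking and filtering."""
    # Rank all anchors by centerness and keep the top-K.
    order = sorted(range(len(centerness)), key=lambda i: centerness[i], reverse=True)
    kept = []
    for i in order[:top_k]:
        b = predict_iou(i)                   # stand-in for RoIAlign + IoU regression
        s = math.sqrt(centerness[i] * b)     # final objectness
        if s >= min_score:                   # drop proposals with s < 0.05
            kept.append((i, s))
    return kept


ctr = [0.9, 0.2, 0.6, 0.01]      # hypothetical centerness scores
ious = [0.8, 0.5, 0.001, 0.9]    # hypothetical predicted IoUs
result = select_proposals(ctr, lambda i: ious[i], top_k=3)
print([i for i, _ in result])  # [0, 1]
```

In this toy run anchor 3 is cut by the top-K step, and anchor 2 passes the centerness ranking but is removed by the objectness filter because its predicted IoU is near zero.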

The proposal selection mechanism avoids reliance on class label priors, which is vital for open-set settings.

5. Integration with Openset RCNN and PLN

Within the Openset RCNN architecture, the CF-RPN provides class-agnostic object proposals with their objectness scores. These proposals then pass through subsequent open-set classification and filtering steps:

  • Per-proposal features from RoIAlign are passed to the Prototype Learning Network (PLN).
  • PLN encodes each feature $f_i$ to a latent embedding $z_i$ and compares it to $K$ known-class prototypes $P_j$ via cosine distance $D_{ij}$.
  • If $\min_j D_{ij} > T_u$, the proposal is labeled “unknown”; else, it is classified into the most similar known category via a $K$-way softmax.
  • Known and unknown proposals are non-max suppressed separately (IoU threshold 0.5), and the top 50 of each category are retained for final detection.
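
The unknown/known decision can be illustrated with a small sketch. It assumes cosine distance is defined as one minus cosine similarity and that embeddings are nonzero; the threshold `t_u = 0.2` is a hypothetical value inside the range quoted later in the text. For brevity, the known branch returns the nearest prototype instead of running the $K$-way softmax classifier described above.

```python
import math
from typing import Dict, List


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity (assumed definition; vectors must be nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def label_proposal(z: List[float], prototypes: Dict[str, List[float]], t_u: float = 0.2) -> str:
    """Label as "unknown" if even the closest prototype is farther than t_u."""
    distances = {name: cosine_distance(z, p) for name, p in prototypes.items()}
    nearest = min(distances, key=distances.get)
    # Nearest-prototype label is a simplification of the softmax classifier.
    return "unknown" if distances[nearest] > t_u else nearest


protos = {"cup": [1.0, 0.0], "box": [0.0, 1.0]}  # hypothetical 2-D prototypes
print(label_proposal([0.9, 0.1], protos))  # cup
print(label_proposal([1.0, 1.0], protos))  # unknown
```

The second embedding sits halfway between both prototypes, so its minimum distance (about 0.29) exceeds the threshold and it is not forced into a known class.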

This division enables robust distinction between unknown objects and background, leveraging the category-agnostic nature of CF-RPN scoring (Zhou et al., 2022).

6. Hyperparameters and Training Strategy

CF-RPN training implements the following key hyperparameter regimes:

  • Sampling: $N_s = 256$ for initial ltrb; $N_s = 512$ for refinement; $T_{\rm pos} = 0.3/0.7$, $T_{\rm neg} = 0.1/0.3$, $P_{\rm pos} = 1.0/0.5$ for two training stages.
  • PLN: Margin parameters $m_p = 0.05$, $m_n = 0.95$; embedding dimension $d_z = 256$; IoU threshold $T_{\rm iou} = 0.5$.
  • Unknown threshold: $T_u \approx 0.17$–$0.23$, determined via validation.
  • Loss weights: $\alpha = 1$ (CF-RPN), $\beta = 0.5$–$2$ (PLN), $\gamma \approx 1$ (softmax classifier).
  • Inference: Top 1,000 anchors by centerness, objectness filtering at $s_i < 0.05$, thresholding unknown/known, separate NMS, top 50 each.
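
The final suppression step can be sketched with a plain greedy NMS. This is the textbook algorithm, not code from the original work; the function would be called once for the known group and once for the unknown group, using the IoU threshold 0.5 and the cap of 50 detections quoted above.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_thr: float = 0.5, max_keep: int = 50) -> List[Detection]:
    """Greedy NMS: keep the best-scoring box, drop overlapping ones, repeat."""
    keep: List[Detection] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_thr for kept_box, _ in keep):
            keep.append((box, score))
            if len(keep) == max_keep:
                break
    return keep


dets = [((0.0, 0.0, 10.0, 10.0), 0.9),
        ((1.0, 1.0, 11.0, 11.0), 0.8),    # IoU with the first box is 81/119, about 0.68
        ((20.0, 20.0, 30.0, 30.0), 0.7)]
print([score for _, score in nms(dets)])  # [0.9, 0.7]
```

Running suppression separately per group means an unknown detection never suppresses an overlapping known detection, or vice versa.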

A pseudocode overview of the inference procedure in the context of Openset RCNN is presented in the original work.

7. Context and Significance in Open-set Object Detection

The CF-RPN addresses a central challenge in open-set object detection (OSOD): the inability of standard proposal mechanisms to separate unknown objects from unannotated background due to reliance on class-based supervision. CF-RPN’s localization-driven scoring is specifically designed to promote generalization to unknown or novel objects and to prevent the systematic exclusion of such instances as negatives. This is particularly critical when evaluating on datasets with incomplete annotations or for real-world robotic perception tasks in unstructured environments (Zhou et al., 2022). CF-RPN underpins the OSOD capability of Openset RCNN, enabling practical open-set perception for robotic rearrangement in cluttered domains.
