
P2 Detection Head in Object Detection

Updated 7 February 2026
  • P2 detection head is a high-resolution branch that operates on 1/4-scale feature maps to improve small object detection.
  • It integrates into multi-scale architectures like YOLOv8 and MHD-Net, using convolutional blocks and center-sampling for precise localization.
  • Empirical results show enhanced recall and precision in domains such as agronomy and traffic surveillance with moderate computational impact.

The P2 detection head is a specialized high-resolution branch in multi-scale object detection architectures, designed to improve the detection of very small objects. "P2" denotes the feature pyramid level at a stride of 4 pixels relative to the input, which is typically the highest-resolution output of the network's feature aggregation module. Adding a P2 head extends the canonical multi-branch design (conventionally P3, P4, and P5 at strides 8, 16, and 32) by attaching an additional detection layer to the 1/4-scale feature map, explicitly targeting small objects whose features are lost at coarser resolutions. The mechanism has been applied in fields that require precise detection of small instances, such as phenotyping in agronomy and traffic object detection (Chen et al., 28 Jul 2025, Shi et al., 2022).

1. Architectural Integration and Multi-Scale Context

Modern object detectors, notably the YOLO and MHD-Net families, employ feature pyramid architectures to handle the heterogeneity of object scales. The P2 detection head is introduced as an additional detection branch, interfacing with the 1/4-resolution feature map ($F_2$) provided by a feature fusion neck such as a BiFPN. For example, in the improved YOLOv8 architecture (Chen et al., 28 Jul 2025), four detection heads (P2, P3, P4, P5) are attached to successively downsampled outputs at spatial resolutions of 160×160, 80×80, 40×40, and 20×20 for a 640×640 input. The P2 head processes $F_2$, enabling direct access to finer-grained spatial cues crucial for small object localization. Similarly, MHD-Net (Shi et al., 2022) indexes detection heads H1–H5, with H1 denoting the P2 branch at stride 4, and advocates selective head configuration based on dataset-specific object scale distributions.
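The relationship between pyramid level, stride, and grid size described above can be checked with a few lines of arithmetic. The sketch below assumes a square input and a stride of 2^level for level P_level; the function name `pyramid_grid_sizes` is illustrative and does not come from either paper.

```python
def pyramid_grid_sizes(input_size=640, levels=(2, 3, 4, 5)):
    """Return {level_name: (stride, grid_side)} for a square input."""
    sizes = {}
    for level in levels:
        stride = 2 ** level  # P2 -> 4, P3 -> 8, P4 -> 16, P5 -> 32
        sizes[f"P{level}"] = (stride, input_size // stride)
    return sizes


grids = pyramid_grid_sizes()
for name, (stride, side) in grids.items():
    print(f"{name}: stride {stride:>2}, grid {side}x{side}, {side * side} cells")
```

For a 640×640 input the P2 grid has 25,600 cells, compared with 8,400 for P3, P4, and P5 combined, which is why adding this head has a visible effect on inference cost.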

2. Layer-Wise Structure and Head Design

The P2 head reuses the typical convolutional blocks and detection modules found in anchor-free detectors. In YOLOv8s-p2 (Chen et al., 28 Jul 2025), the block is structured as:

  • Input: Feature map $F_2 \in \mathbb{R}^{C \times 160 \times 160}$ (post-BiFPN, $C=256$).
  • Detector head (per grid cell):
      1. $1 \times 1$ convolution with $C/2$ output channels, batch normalization, SiLU activation.
      2. $3 \times 3$ convolution with $C/2$ output channels, batch normalization, SiLU activation.
      3. $1 \times 1$ convolution restoring $C$ channels, batch normalization, SiLU activation.
      4. For each prediction head (objectness, class, box): $1 \times 1$ convolution (output: $N_a \times (1 + n_c + 4)$ channels); $N_a=1$, $n_c$ = number of classes.
  • Output: Raw predictions are followed by a sigmoid activation during inference.

No additional upsampling or lateral fusion is required beyond what the neck provides. The head operates directly on the high-resolution feature, maximizing retention of localized spatial information.
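To make the channel flow of steps 1 to 4 concrete, the following sketch counts the parameters of each layer and reports the output tensor shape. It is an estimate under assumptions that the source does not state: convolutions followed by batch normalization carry no bias, batch normalization contributes a scale and a shift per channel, the final prediction convolution has a bias, and there is a single class (`n_classes=1`). All function names are illustrative.

```python
def conv_bn_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias, plus BN scale and shift."""
    return c_in * c_out * k * k + 2 * c_out


def p2_head_summary(c=256, n_classes=1, n_a=1, grid=160):
    """Parameter counts and output shape for the four-step head."""
    half = c // 2
    stem = (
        conv_bn_params(c, half, 1)       # step 1: 1x1, C -> C/2
        + conv_bn_params(half, half, 3)  # step 2: 3x3, C/2 -> C/2
        + conv_bn_params(half, c, 1)     # step 3: 1x1, C/2 -> C
    )
    out_channels = n_a * (1 + n_classes + 4)  # objectness + classes + box
    pred = c * out_channels + out_channels    # step 4: 1x1 conv with bias
    return {
        "stem_params": stem,
        "pred_params": pred,
        "output_shape": (out_channels, grid, grid),
    }


summary = p2_head_summary()
print(summary)
```

Under these assumptions the head itself is small (roughly 0.2 M parameters); most of the added cost of a P2 branch comes from applying it to 160×160 positions rather than from its weights.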

3. Object-Scale Matching and Head Assignment

Optimal allocation of detection heads to object scales is achieved through empirical distribution analysis. MHD-Net (Shi et al., 2022) formalizes head-object matching by defining, for each head $i$, a coverage ratio $R_i$ indicating the fraction of ground-truth boxes whose area falls within head $i$'s effective receptive range. The scale range for head $i$ is specified as:

$$SR_i = \left\{ s : \left\lceil 2^i \frac{w_o}{w_{in}} \right\rceil^2 \leq s < \left\lceil 2^{i+1} \frac{w_o}{w_{in}} \right\rceil^2 \right\}$$

where $w_o$ is the original image width, $w_{in}$ the input width, and $s$ the area in $\text{pixels}^2$.

Empirical studies indicate that at lower input resolutions, P2/H1 heads capture a significant portion of small objects, while at higher resolutions, their marginal benefit diminishes. MHD-Net recommends selecting two "cross-scale" heads such that their cumulative $R_i$ covers $\geq 99\%$ of true objects, discarding intermediate heads to improve efficiency.
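The scale range and coverage ratio can be computed directly from a list of ground-truth box areas. The sketch below implements the interval exactly as written in the formula above; the function names, the image widths, and the toy list of areas are hypothetical and serve only to illustrate the calculation.

```python
import math


def scale_range(i, w_orig, w_in):
    """Half-open area interval [low, high) in pixels^2 for head index i."""
    ratio = w_orig / w_in
    low = math.ceil(2 ** i * ratio) ** 2
    high = math.ceil(2 ** (i + 1) * ratio) ** 2
    return low, high


def coverage_ratios(areas, heads, w_orig, w_in):
    """Fraction of ground-truth areas that fall inside each head's range."""
    ratios = {}
    for i in heads:
        low, high = scale_range(i, w_orig, w_in)
        hits = sum(1 for s in areas if low <= s < high)
        ratios[i] = hits / len(areas)
    return ratios


# Hypothetical data: 1280-pixel-wide originals resized to a 640-pixel input.
toy_areas = [20, 50, 100, 300, 2000]
ratios = coverage_ratios(toy_areas, heads=(1, 2, 3), w_orig=1280, w_in=640)
print(ratios)
```

Because the ratio $w_o / w_{in}$ scales every interval, lowering the input resolution shifts more ground-truth boxes into the ranges of the finest heads, which matches the observation that P2/H1 matters most at low input resolutions.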

4. Loss Functions, Assignment, and Mathematical Details

All heads, including P2, share common loss terms:

$$L_{\text{total}} = \lambda_{\text{box}} L_{\text{box}} + \lambda_{\text{obj}} L_{\text{obj}} + \lambda_{\text{cls}} L_{\text{cls}}$$

  • $L_{\text{box}}$: Complete IoU (CIoU) loss for bounding box regression.
  • $L_{\text{obj}}$: Binary cross-entropy (BCE) on objectness.
  • $L_{\text{cls}}$: BCE on classification output.
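As an illustration of how these terms combine, the sketch below implements the standard CIoU loss for a single pair of corner-format boxes, a scalar BCE, and the weighted sum. It is not taken from either paper: the weights default to 1.0 only as placeholders because the source does not give values for the $\lambda$ coefficients, and the function names are illustrative.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """1 - CIoU for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Intersection over union
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared centre distance over squared diagonal of the enclosing box
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    # Aspect-ratio consistency term
    v = (4.0 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - (iou - rho2 / c2 - alpha * v)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one target y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def total_loss(l_box, l_obj, l_cls, lam_box=1.0, lam_obj=1.0, lam_cls=1.0):
    """Weighted sum of the three terms; the weights here are placeholders."""
    return lam_box * l_box + lam_obj * l_obj + lam_cls * l_cls


l_box = ciou_loss((10, 10, 22, 20), (12, 11, 24, 22))
print(total_loss(l_box, bce(0.8, 1.0), bce(0.6, 1.0)))
```

The centre-distance term keeps the box loss informative even when predicted and ground-truth boxes do not overlap, which is common for very small targets early in training.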

Dynamic $K$-matching assigns positive samples across all heads, with IoU threshold at $0.5$ (Chen et al., 28 Jul 2025). For anchor-free prediction, P2 employs center-sampling; each grid point is responsible for boxes whose centers lie within a radius $r=2$ cells.
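One way to picture center-sampling is to enumerate the grid cells that become candidate positives for a given ground-truth box. The sketch below assumes a square window of half-width `radius` around the cell containing the box center; the source does not say whether the radius is measured as a square or a circular neighborhood, so this choice is an assumption, as is the function name.

```python
def center_sampled_cells(cx, cy, stride=4, radius=2, grid=160):
    """Cells (i, j) within `radius` cells of the cell holding the box center.

    cx, cy are the box center in input pixels; i indexes x and j indexes y.
    The window is clipped at the borders of the grid.
    """
    ci, cj = int(cx // stride), int(cy // stride)
    cells = []
    for i in range(max(0, ci - radius), min(grid - 1, ci + radius) + 1):
        for j in range(max(0, cj - radius), min(grid - 1, cj + radius) + 1):
            cells.append((i, j))
    return cells


candidates = center_sampled_cells(cx=322.0, cy=101.0)
print(len(candidates), candidates[0], candidates[-1])
```

With a radius of 2 cells on the stride-4 map, an interior box contributes up to 25 candidate cells, which cover a 20×20 pixel region of the input.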

Decoding a prediction at $(i,j)$ on $F_2$ (stride $s=4$) follows:

$$
\begin{aligned}
\Delta x &= \text{sigmoid}(t_x) \\
b_x &= (i + 2\Delta x - 0.5) \cdot s \\
b_w &= (2 \cdot \text{sigmoid}(t_w))^2 \cdot s
\end{aligned}
$$

Analogous expressions apply for $b_y, b_h$.
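The decoding step translates directly into code. The sketch below applies the expressions above to a single grid cell, using the analogous forms for the vertical coordinate and the height; the function names are illustrative.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(i, j, t_x, t_y, t_w, t_h, stride=4):
    """Map raw outputs at grid cell (i, j) to a box in input pixels."""
    b_x = (i + 2.0 * sigmoid(t_x) - 0.5) * stride
    b_y = (j + 2.0 * sigmoid(t_y) - 0.5) * stride
    b_w = (2.0 * sigmoid(t_w)) ** 2 * stride
    b_h = (2.0 * sigmoid(t_h)) ** 2 * stride
    return b_x, b_y, b_w, b_h


# Zero logits place the center in the middle of the cell.
print(decode_box(10, 20, 0.0, 0.0, 0.0, 0.0))  # (42.0, 82.0, 4.0, 4.0)
```

Since the sigmoid is bounded in (0, 1), the center offset stays within (-0.5, 1.5) cells of the grid index, and the width and height as written are bounded by $4s$, that is, 16 pixels on the stride-4 map.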

5. Empirical Efficacy and Performance Impact

P2 heads are empirically validated to enhance detection of small objects under challenging conditions. In the rice spikelet flowering detection task (Chen et al., 28 Jul 2025), augmenting YOLOv8s with a p2 head yields notable improvements:

| Model | mAP@0.5 (%) | Precision (%) | Recall (%) | F1-score (%) | FPS |
|---|---|---|---|---|---|
| Baseline YOLOv8s | 62.8 | 59.2 | 50.7 | 54.6 | 109 |
| +p2 head | 65.9 | 67.6 | 61.5 | 64.4 | 69 |

The largest gains are in recall and precision (ΔR = +10.8 and ΔP = +8.4 percentage points), attributable to the finer grid resolution available for sub-20×20 px objects; mAP rises by 3.1 points and F1-score by 9.8 points. Throughput drops from 109 to 69 FPS, which remains within real-time constraints.
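The reported figures can be cross-checked with simple arithmetic. The sketch below recomputes the F1-score from precision and recall, the deltas between the two rows, and the per-frame latency implied by the FPS values; the numbers are copied from the table above and the dictionary keys are illustrative.

```python
rows = {
    "baseline": {"map": 62.8, "precision": 59.2, "recall": 50.7, "f1": 54.6, "fps": 109},
    "p2": {"map": 65.9, "precision": 67.6, "recall": 61.5, "f1": 64.4, "fps": 69},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


for name, row in rows.items():
    f1 = f1_score(row["precision"], row["recall"])
    latency_ms = 1000.0 / row["fps"]
    print(f"{name}: F1 = {f1:.1f} (reported {row['f1']}), {latency_ms:.1f} ms/frame")

deltas = {k: round(rows["p2"][k] - rows["baseline"][k], 1) for k in ("map", "precision", "recall", "f1")}
print(deltas)
```

The recomputed F1-scores agree with the reported ones to one decimal place, and the FPS values correspond to roughly 9.2 ms per frame for the baseline and 14.5 ms per frame with the P2 head.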

Similar findings emerge in MHD-Net (Shi et al., 2022), where judicious use of the P2 (H1) and P4 (H3) heads, supplemented with a lightweight dilated-convolutional block, results in a ~30–40% reduction in parameters/FLOPs while preserving or surpassing the baseline mean average precision on challenging datasets such as BDD100K and ETFOD-v2.

6. Architectural Variations and Augmentations

The utility of the P2 head can be increased by expanding the receptive field immediately upstream of it. MHD-Net introduces parallel $3\times3$ convolutions with dilation rates 1, 4, and 8, whose outputs are summed element-wise to aggregate multi-context features before the P2 head. This strategy yields a 2.6-point mAP improvement with a negligible increase in parameters and computational demand (Shi et al., 2022). The combination of high-resolution localization and an enlarged receptive field supports robust discrimination of small, context-sensitive instances.
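The effect of the dilation rates on the receptive field follows from the standard formula for a dilated kernel. The sketch below computes the effective kernel size and the padding needed to keep the spatial size unchanged, so that the three branch outputs can be summed; it assumes stride-1 convolutions, and the function names are illustrative rather than taken from MHD-Net.

```python
def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def same_padding(k, dilation):
    """Padding that preserves spatial size for an odd kernel at stride 1."""
    return dilation * (k - 1) // 2


branches = {d: (effective_kernel(3, d), same_padding(3, d)) for d in (1, 4, 8)}
for d, (extent, pad) in branches.items():
    print(f"dilation {d}: covers {extent}x{extent} cells, padding {pad}")
```

Each branch still has only nine weights per input-output channel pair, since dilation changes the sampling pattern and not the number of weights. On the stride-4 map, the dilation-8 branch spans 17 cells, or 68 input pixels, in each direction.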

7. Application Scope and Configuration Trade-Offs

P2 detection heads are especially effective in domains marked by dense, tiny objects recurring at unpredictable locations, such as agricultural monitoring, traffic surveillance, or driver-attention monitoring in crewed vehicles. Empirical findings across studies underscore the importance of aligning the number and scale of heads with the true distribution of target sizes in the data. Excessive specialization via many heads can inflate the computational footprint without commensurate accuracy gains, while insufficient scale diversity can leave small objects underrepresented. Evaluations suggest that, with proper configuration, architectures using a P2 head can nearly match the accuracy of much larger detectors with substantially lower model complexity and higher inference efficiency (Chen et al., 28 Jul 2025, Shi et al., 2022).
