P2 Detection Head in Object Detection

Updated 7 February 2026

P2 detection head is a high-resolution branch that operates on 1/4-scale feature maps to improve small object detection.
It integrates into multi-scale architectures like YOLOv8 and MHD-Net, using convolutional blocks and center-sampling for precise localization.
Empirical results show enhanced recall and precision in domains such as agronomy and traffic surveillance with moderate computational impact.

The P2 detection head is a specialized high-resolution branch in multi-scale object detection architectures, designed to improve the detection of very small objects. "P2" denotes the feature pyramid level at a stride of 4 pixels relative to the input, corresponding to the highest-resolution output typically produced by the network's feature aggregation module. Adding a P2 head extends the canonical multi-branch design (conventionally P3, P4, and P5 at strides 8, 16, and 32) by attaching an additional detection layer to the 1/4-scale feature map, explicitly targeting small objects whose features are lost at coarser resolutions. This mechanism has found practical application in fields requiring precise detection of small instances, such as phenotyping in agronomy and traffic object detection (Chen et al., 28 Jul 2025, Shi et al., 2022).

1. Architectural Integration and Multi-Scale Context

Modern object detectors, notably the YOLO and MHD-Net families, employ feature pyramid architectures to handle the heterogeneity of object scales. The P2 detection head is introduced as an additional detection branch, interfacing with the 1/4-resolution feature map ( $F_2$ ) provided by a feature fusion neck such as a BiFPN. For example, in the improved YOLOv8 architecture (Chen et al., 28 Jul 2025), four detection heads (p2, p3, p4, p5) are attached to successively downsampled outputs at spatial resolutions of 160×160, 80×80, 40×40, and 20×20 for a 640×640 input. The P2 head processes $F_2$ , enabling direct access to finer-grained spatial cues crucial for small object localization. Similarly, MHD-Net (Shi et al., 2022) indexes detection heads H1–H5, with H1 denoting the P2 branch at stride 4, and advocates selective head configuration based on dataset-specific object scale distributions.
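The grid resolutions quoted above follow directly from dividing the input size by each head's stride. The short sketch below reproduces that arithmetic for a 640×640 input; the function name `head_grid_sizes` is illustrative and does not come from either cited paper.

```python
def head_grid_sizes(input_size=640, strides=(4, 8, 16, 32)):
    """Map each head's stride to the side length of its square prediction grid."""
    return {s: input_size // s for s in strides}


grids = head_grid_sizes()                        # stride -> grid side length
cells = {s: g * g for s, g in grids.items()}     # stride -> number of grid cells
p2_share = cells[4] / sum(cells.values())        # fraction of all cells on P2

print(grids)
print(cells)
print(f"P2 share of grid cells: {p2_share:.1%}")
```

For this input size the P2 grid has 25,600 of the 34,000 cells across the four heads, which is why the extra head dominates the per-cell prediction cost.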

2. Layer-Wise Structure and Head Design

The P2 head reuses the typical convolutional blocks and detection modules found in anchor-free detectors. In YOLOv8s-p2 (Chen et al., 28 Jul 2025), the block is structured as:

Input: Feature map $F_2 \in \mathbb{R}^{C \times 160 \times 160}$ (post-BiFPN, $C=256$ ).
Detector head (per grid cell):

1. $1 \times 1$ convolution with $C/2$ output channels, batch normalization, SiLU activation.
2. $3 \times 3$ convolution with $C/2$ output channels, batch normalization, SiLU activation.
3. $1 \times 1$ convolution restoring $C$ channels, batch normalization, SiLU activation.
4. For each prediction head (objectness, class, box): $1 \times 1$ convolution (output: $N_a \times (1 + n_c + 4)$ channels), with $N_a=1$ and $n_c$ the number of classes.

Output: Raw predictions, which are passed through a sigmoid activation during inference.

No additional upsampling or lateral fusion is required beyond what the neck provides. The head operates directly on the high-resolution feature, maximizing retention of localized spatial information.
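As a rough bookkeeping aid, the layer list above can be turned into output shapes and parameter counts. The sketch below is a minimal accounting under stated assumptions: convolutions followed by batch normalization carry no bias, batch normalization contributes two learnable parameters per channel, and the three prediction outputs are counted as one fused $1 \times 1$ convolution with bias. The helper names are hypothetical and the resulting numbers are not reported in the cited paper.

```python
def conv_bn_params(c_in, c_out, k):
    """Parameters of a k x k conv (no bias, assumed) plus BatchNorm scale and shift."""
    return k * k * c_in * c_out + 2 * c_out


def p2_head_summary(c=256, n_classes=1, n_anchors=1, grid=160):
    """Summarize the P2 head described above for a C-channel, grid x grid input."""
    half = c // 2
    out_ch = n_anchors * (1 + n_classes + 4)   # objectness + classes + 4 box terms
    layers = [
        ("1x1 conv-BN-SiLU", conv_bn_params(c, half, 1)),
        ("3x3 conv-BN-SiLU", conv_bn_params(half, half, 3)),
        ("1x1 conv-BN-SiLU", conv_bn_params(half, c, 1)),
        ("1x1 prediction conv", c * out_ch + out_ch),   # weights + bias (assumed)
    ]
    return {
        "layers": layers,
        "params": sum(p for _, p in layers),
        "output_shape": (out_ch, grid, grid),
    }


summary = p2_head_summary(n_classes=1)
for name, params in summary["layers"]:
    print(f"{name:22s}{params:>10,d}")
print("total params:", summary["params"])
print("output shape:", summary["output_shape"])
```

The head itself is small in parameters; its cost comes mainly from applying these layers at every one of the 160×160 grid positions.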

3. Object-Scale Matching and Head Assignment

Optimal allocation of detection heads to object scales is achieved through empirical distribution analysis. MHD-Net (Shi et al., 2022) formalizes head-object matching by defining, for each head $i$ , a coverage ratio $R_i$ indicating the fraction of ground-truth boxes whose area falls within head $i$ 's effective receptive range. The scale range for head $i$ is specified as:

$SR_i = \left\{ s : \left\lceil 2^i \frac{w_o}{w_{in}} \right\rceil^2 \leq s < \left\lceil 2^{i+1} \frac{w_o}{w_{in}} \right\rceil^2 \right\}$

where $w_o$ is the original image width, $w_{in}$ is the network input width, and $s$ is the object area in $\text{pixels}^2$ .

Empirical studies indicate that at lower input resolutions, P2/H1 heads capture a significant portion of small objects, while at higher resolutions, their marginal benefit diminishes. MHD-Net recommends selecting two "cross-scale" heads such that their cumulative $R_i$ covers $\geq99\%$ of true objects, discarding intermediate heads to improve efficiency.
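The scale range $SR_i$ and coverage ratio $R_i$ defined above can be computed from a list of ground-truth box areas. The following sketch applies the formula literally, using integer ceiling division; the function names and the sample areas are hypothetical, and the procedure for choosing the two "cross-scale" heads is not reproduced here because the text does not specify it in enough detail.

```python
def ceil_div(a, b):
    """Ceiling of a / b for positive integers, avoiding float rounding."""
    return -(-a // b)


def scale_range(i, w_o, w_in):
    """Half-open area interval [lo, hi) in pixels^2 assigned to head i."""
    lo = ceil_div(2 ** i * w_o, w_in) ** 2
    hi = ceil_div(2 ** (i + 1) * w_o, w_in) ** 2
    return lo, hi


def coverage_ratios(areas, heads, w_o, w_in):
    """Fraction of ground-truth box areas that fall inside each head's range."""
    if not areas:
        raise ValueError("areas must be non-empty")
    ratios = {}
    for i in heads:
        lo, hi = scale_range(i, w_o, w_in)
        ratios[i] = sum(lo <= a < hi for a in areas) / len(areas)
    return ratios


# Hypothetical box areas (pixels^2), image width equal to the input width.
sample_areas = [5, 10, 20, 100, 5000]
ratios = coverage_ratios(sample_areas, heads=range(1, 6), w_o=640, w_in=640)
print(ratios)
```

Downscaling a larger image (larger $w_o / w_{in}$) shifts every range toward larger original-image areas, which is consistent with the observation that the highest-resolution head matters more at lower input resolutions.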

4. Loss Functions, Assignment, and Mathematical Details

All heads, including P2, share common loss terms:

$L_{\text{total}} = \lambda_{\text{box}} L_{\text{box}} + \lambda_{\text{obj}} L_{\text{obj}} + \lambda_{\text{cls}} L_{\text{cls}}$

$L_{\text{box}}$ : Complete IoU (CIoU) loss for bounding box regression.
$L_{\text{obj}}$ : Binary cross-entropy (BCE) on objectness.
$L_{\text{cls}}$ : BCE on classification output.
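To make the three terms concrete, the sketch below implements a scalar CIoU loss in its standard published form, a scalar BCE, and the weighted sum. Boxes are assumed to be in `(x1, y1, x2, y2)` format, and the default weights of 1.0 are placeholders rather than the values used in the cited work.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one target y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def ciou_loss(box_p, box_g, eps=1e-7):
    """Complete IoU loss for two (x1, y1, x2, y2) boxes."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared distance between box centers.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    # Squared diagonal of the smallest enclosing box.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


def total_loss(l_box, l_obj, l_cls, w_box=1.0, w_obj=1.0, w_cls=1.0):
    """Weighted sum of the three terms; the weights here are placeholders."""
    return w_box * l_box + w_obj * l_obj + w_cls * l_cls


l_box = ciou_loss((10, 10, 20, 22), (11, 10, 21, 20))
print(total_loss(l_box, bce(0.8, 1), bce(0.6, 1)))
```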

Dynamic $K$ -matching assigns positive samples across all heads, with IoU threshold at $0.5$ (Chen et al., 28 Jul 2025). For anchor-free prediction, P2 employs center-sampling; each grid point is responsible for boxes whose centers lie within a radius $r=2$ cells.
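The center-sampling rule can be illustrated by listing the P2 grid cells that qualify as candidates for a given ground-truth box center. This is a sketch under two assumptions not stated in the text: each grid point sits at the center of its cell, and the radius is measured as a per-axis (square) distance of $r$ cells. The dynamic matching step that subsequently selects positives among these candidates is not shown.

```python
def center_sampling_cells(cx, cy, stride=4, grid=160, radius=2):
    """Grid cells (i, j) whose center lies within `radius` cells of (cx, cy).

    cx, cy are the ground-truth box center in input pixels. The distance is
    checked per axis, so the candidate region is a square (assumption).
    """
    limit = radius * stride
    cells = []
    for j in range(grid):
        for i in range(grid):
            px, py = (i + 0.5) * stride, (j + 0.5) * stride  # cell center in pixels
            if abs(px - cx) <= limit and abs(py - cy) <= limit:
                cells.append((i, j))
    return cells


candidates = center_sampling_cells(82, 82)
print(len(candidates), candidates[:5])
```

With a stride of 4, a radius of 2 cells spans only 8 pixels on each side of the center, so the candidate region stays tight around very small objects.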

Decoding a prediction at $(i,j)$ on $F_2$ (stride $s=4$ ) follows:

$\begin{aligned} \Delta x &= \text{sigmoid}(t_x) \\ b_x &= (i + 2\Delta x - 0.5) \cdot s \\ b_w &= (2 \cdot \text{sigmoid}(t_w))^2 \cdot s \end{aligned}$

Analogous expressions apply for $b_y, b_h$ .
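The decoding expressions translate into a few lines of code. In the sketch below, the $y$ and $h$ terms mirror the $x$ and $w$ terms, with the index $j$ used for the vertical axis, which is an assumption about how the analogous expressions are written; the function name `decode_box` is illustrative.

```python
import math


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x, t_y, t_w, t_h, i, j, stride=4):
    """Decode raw outputs at grid position (i, j) into a center-format box in pixels."""
    b_x = (i + 2 * sigmoid(t_x) - 0.5) * stride
    b_y = (j + 2 * sigmoid(t_y) - 0.5) * stride   # assumed analogous to b_x
    b_w = (2 * sigmoid(t_w)) ** 2 * stride
    b_h = (2 * sigmoid(t_h)) ** 2 * stride        # assumed analogous to b_w
    return b_x, b_y, b_w, b_h


print(decode_box(0.0, 0.0, 0.0, 0.0, i=10, j=20))
```

Because the sigmoid is bounded in (0, 1), the decoded center can move at most one cell away from the grid position on either side, and the decoded width and height stay below $4s$, that is, 16 pixels at stride 4 under these expressions.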

5. Empirical Efficacy and Performance Impact

P2 heads are empirically validated to enhance detection of small objects under challenging conditions. In the rice spikelet flowering detection task (Chen et al., 28 Jul 2025), augmenting YOLOv8s with a p2 head yields notable improvements:

Model	mAP@0.5 (%)	Precision (%)	Recall (%)	F1-score (%)	FPS
Baseline YOLOv8s	62.8	59.2	50.7	54.6	109
+p2 head	65.9	67.6	61.5	64.4	69

Performance gains are most prominent in recall and precision (ΔR = +10.8 and ΔP = +8.4 percentage points), attributable to the improved grid resolution for objects smaller than 20×20 px. Inference speed drops from 109 to 69 FPS but remains within real-time constraints.
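The table entries can be cross-checked with a few lines of arithmetic: the F1-scores follow from the reported precision and recall, and the deltas and the relative FPS change follow from the two rows. The dictionary keys below are illustrative labels; the values are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


baseline = {"map": 62.8, "precision": 59.2, "recall": 50.7, "fps": 109}
with_p2 = {"map": 65.9, "precision": 67.6, "recall": 61.5, "fps": 69}

deltas = {k: round(with_p2[k] - baseline[k], 1) for k in ("map", "precision", "recall")}
fps_drop = 1 - with_p2["fps"] / baseline["fps"]

print("F1 baseline:", round(f1_score(baseline["precision"], baseline["recall"]), 1))
print("F1 with P2: ", round(f1_score(with_p2["precision"], with_p2["recall"]), 1))
print("deltas (percentage points):", deltas)
print(f"relative FPS drop: {fps_drop:.1%}")
```

The recomputed F1-scores match the reported 54.6 and 64.4, and the throughput cost of the extra head is a little over a third of the baseline frame rate.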

Similar findings emerge in MHD-Net (Shi et al., 2022), where judicious use of P2 (H1) and P8 (H3) heads, supplemented with a lightweight dilated-convolutional block, results in a ~30–40% reduction in parameters/FLOPs while preserving or surpassing the baseline mean average precision on challenging datasets such as BDD100K and ETFOD-v2.

6. Architectural Variations and Augmentations

Enhancements to P2 utility can include receptive field expansion immediately upstream of the head. MHD-Net introduces parallel $3\times3$ convolutions with dilation rates 1, 4, and 8, whose outputs are element-wise summed to aggregate multi-contextual features before the P2 head. This strategy bestows a +2.6 mAP point improvement with a negligible increase in parameters and computational demand (Shi et al., 2022). The combination of high-resolution localization and enlarged receptive field supports robust discrimination of small, context-sensitive instances.
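The element-wise sum of the three branches requires identical output sizes, which constrains the padding of each dilated convolution. The sketch below computes the effective kernel extent of a $3\times3$ convolution at the three dilation rates and the padding that preserves a 160×160 map at stride 1; it uses only standard convolution arithmetic, and the helper names are illustrative.

```python
def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def same_padding(k, dilation):
    """Per-side padding that keeps the output size equal to the input at stride 1."""
    return (effective_kernel(k, dilation) - 1) // 2


def conv_out_size(n, k, dilation, padding):
    """Output side length of a stride-1 convolution on an n x n input."""
    return n + 2 * padding - effective_kernel(k, dilation) + 1


for d in (1, 4, 8):
    pad = same_padding(3, d)
    print(d, effective_kernel(3, d), pad, conv_out_size(160, 3, d, pad))
```

The three branches cover 3, 9, and 17 pixels per side on the P2 map while each still uses only nine weights per channel pair, which is how the block widens context at a small parameter cost.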

7. Application Scope and Configuration Trade-Offs

P2 detection heads are especially effective in domains marked by dense, tiny objects recurring at unpredictable locations, such as agricultural monitoring, traffic surveillance, and driver-attention monitoring in crewed vehicles. Empirical findings across studies underscore the importance of aligning the number and scale of heads to the true distribution of target sizes present in the data. Adding too many heads can inflate the computational footprint without commensurate accuracy gains, while insufficient scale diversity can leave small objects underrepresented. Evaluations suggest that, with proper configuration, architectures leveraging a P2 head can nearly match the accuracy of much larger detectors with substantially reduced model complexity and higher inference efficiency (Chen et al., 28 Jul 2025, Shi et al., 2022).

References (2)

An Improved YOLOv8 Approach for Small Target Detection of Rice Spikelet Flowering in Field Environments (2025)

Rethinking the Detection Head Configuration for Traffic Object Detection (2022)
