
AC-BiFPN: Augmented Convolutional Bi-directional Feature Pyramid Network

Updated 12 October 2025
  • The paper demonstrates that AC-BiFPN improves detection accuracy by merging multi-scale features using learnable fusion weights and enhanced convolutional modules.
  • It introduces convolutional feature enhancement modules that combine dilated, deformable, and standard convolutions to enrich detail while preserving contextual information.
  • Attention strategies via BiFormer Attention dynamically focus on salient regions, yielding significant improvements in precision across small, medium, and large object detection.

The Augmented Convolutional Bi-directional Feature Pyramid Network (AC-BiFPN) is a specialized neural feature fusion architecture designed for multi-scale object detection, with a particular focus on extracting detailed and contextual information from complex images. AC-BiFPN builds upon the fundamental Bi-directional Feature Pyramid Network (BiFPN) by integrating advanced convolutional operators and attention mechanisms to address scale variation, noise, and occlusion challenges in domains such as medical image analysis and maritime surveillance.

1. Architectural Foundations

AC-BiFPN operates as an encoder backbone in hybrid and end-to-end pipelines, most notably in tasks requiring precise localization and semantic-rich representations. Its core function is to process input images at multiple resolutions and fuse the resulting feature maps in a bi-directional manner, supporting both top-down and bottom-up aggregation pathways. The architecture accepts images resized to various scales, extracts features through hierarchical and parallel convolutional modules, and fuses these maps, ultimately producing a deep multi-scale representation optimized for downstream tasks. Feature fusion employs learnable weights for each pathway, formalized as:

$$P_l^O = \frac{\sum_j w_{lj} \cdot U_j}{\sum_j w_{lj} + \epsilon}$$

where $w_{lj}$ are fusion weights, $U_j$ are incoming feature maps (possibly resized or attention-refined), and $\epsilon$ stabilizes the denominator. This scheme generalizes across levels of the feature pyramid and supports attention integration at critical fusion nodes.
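
A minimal PyTorch sketch of a single weighted fusion node is given below. The ReLU constraint keeping the weights non-negative and the value of $\epsilon$ follow the common fast-normalized-fusion convention and are assumptions, not details fixed by the formula above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusionNode(nn.Module):
    """One BiFPN-style fusion node with learnable per-input weights."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))  # one weight w_lj per incoming map
        self.eps = eps

    def forward(self, inputs):
        # inputs: list of feature maps U_j already resized (and optionally
        # attention-refined) to a common spatial resolution
        w = F.relu(self.w)                              # keep fusion weights non-negative
        fused = sum(w[j] * u for j, u in enumerate(inputs))
        return fused / (w.sum() + self.eps)             # P_l^O per the equation above
```

Normalizing by the weight sum rather than by a softmax keeps each node cheap while still letting the network learn how much every resolution contributes at level $l$.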

2. Convolutional Feature Enhancement

A key innovation of AC-BiFPN is the integration of multi-branch convolutional feature enhancement (CFE) modules, positioned directly after backbone feature extraction. The CFE module addresses limitations in shallow (lack of context) and deep (loss of fine detail) layers by constructing a parallel multi-pathway convolutional system. It leverages standard, dilated, and deformable convolutions with various kernel shapes to adaptively expand receptive fields and sample diverse semantic cues. Three illustrative branches are:

  • $B_1 = DConv_{3 \times 3}\langle Conv_{3 \times 1}\{Conv_{1 \times 3}[Conv_{1 \times 1}(F)]\}\rangle$
  • $B_2 = DConv_{3 \times 3}\langle Conv_{5 \times 1}\{Conv_{1 \times 5}[Conv_{1 \times 1}(F)]\}\rangle$
  • $B_3 = DFConv_{3 \times 3}\langle Conv_{1 \times 3}\{Conv_{3 \times 1}[Conv_{1 \times 1}(F)]\}\rangle$

Outputs are concatenated and fused residually with the input:

$$Y = \text{Concat}(B_1, B_2, B_3) + Conv_{1 \times 1}(F)$$

This multi-scale enrichment ensures preservation of critical details required for small object detection and compensates for context loss at lower levels.
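
A hedged PyTorch sketch of such a CFE block is shown below. The channel widths, the 1×1 projection that maps the concatenated branches back to the input width, and the use of a dilated convolution as a stand-in for the deformable $DFConv_{3 \times 3}$ in branch $B_3$ are assumptions made to keep the example self-contained; the paper's exact configuration may differ.

```python
import torch
import torch.nn as nn

class CFEBranch(nn.Module):
    """One CFE branch: 1x1 reduction, a pair of asymmetric convs, then a dilated 3x3."""
    def __init__(self, channels: int, k: int, dilation: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),                            # Conv_1x1
            nn.Conv2d(channels, channels, kernel_size=(1, k), padding=(0, k // 2)),  # Conv_1xk
            nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0)),  # Conv_kx1
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),                          # dilated DConv_3x3
        )

    def forward(self, x):
        return self.body(x)

class CFE(nn.Module):
    """Multi-branch convolutional feature enhancement with residual 1x1 fusion."""
    def __init__(self, channels: int):
        super().__init__()
        self.b1 = CFEBranch(channels, k=3)
        self.b2 = CFEBranch(channels, k=5)
        self.b3 = CFEBranch(channels, k=3)  # stand-in: the paper uses a deformable DFConv_3x3 here
        self.project = nn.Conv2d(3 * channels, channels, kernel_size=1)  # assumed projection back to C
        self.residual = nn.Conv2d(channels, channels, kernel_size=1)     # Conv_1x1(F) residual path

    def forward(self, f):
        y = torch.cat([self.b1(f), self.b2(f), self.b3(f)], dim=1)
        return self.project(y) + self.residual(f)
```

A quick shape check confirms the residual form: for an input of shape (B, C, H, W), every branch preserves the spatial size through padding, so the concatenation has 3C channels and the 1×1 projection restores C before the residual addition.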

3. Attention-Based Feature Fusion

AC-BiFPN incorporates advanced attention strategies during feature pyramid fusion, specifically via BiFormer Attention (BA). BA implements a bi-level dynamic sparse attention mechanism, subdivided into three stages:

  1. Region Partitioning and Projection: The feature map $F \in \mathbb{R}^{H \times W \times C}$ is divided into $S \times S$ regions, each linearly projected to queries, keys, and values.
  2. Region-to-Region Attention: Mean-pooled region queries and keys construct an affinity graph, selecting the top-$K$ connections for each region.
  3. Token-to-Token Attention: Aggregated keys and values from relevant regions provide inputs for scaled dot-product attention, producing region-refined output embeddings.

The attention operation is formalized as:

$$BA(F) = \text{Softmax}\left( \frac{Q_r K_g^T}{\sqrt{C}} \right) V_g + \text{LCE}(V_r)$$

where $K_g$, $V_g$ are aggregated keys and values, and $\text{LCE}$ denotes local-context embedding. This mechanism enables adaptive focusing on salient image regions and enhances cross-scale context modeling, vital for precise detection under noise and occlusion.
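
The following is a simplified, single-head PyTorch sketch of this bi-level routing attention, kept deliberately close to the three stages above. The region count $S$, the top-$K$ value, the depthwise convolution used for $\text{LCE}$, and applying it to the input rather than to a separate value projection are illustrative assumptions; the reference BiFormer implementation is multi-headed and more memory-efficient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLevelRoutingAttention(nn.Module):
    """Simplified single-head sketch of BiFormer-style bi-level routing attention."""
    def __init__(self, dim: int, num_regions: int = 7, topk: int = 4):
        super().__init__()
        self.S = num_regions
        self.topk = topk
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)
        # depthwise conv as a stand-in for the local-context embedding LCE(.)
        self.lce = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):
        B, C, H, W = x.shape                        # sketch assumes H, W divisible by S
        S = self.S
        h, w = H // S, W // S
        lce = self.lce(x)                           # local-context term, here applied to the input

        # Stage 1: partition into S*S regions and project to queries/keys/values.
        t = x.view(B, C, S, h, S, w).permute(0, 2, 4, 3, 5, 1).reshape(B, S * S, h * w, C)
        q, k, v = self.qkv(t).chunk(3, dim=-1)      # each (B, S*S, h*w, C)

        # Stage 2: region-to-region affinity from mean-pooled queries/keys, keep top-K.
        qr, kr = q.mean(dim=2), k.mean(dim=2)       # (B, S*S, C)
        affinity = qr @ kr.transpose(-1, -2)        # (B, S*S, S*S)
        idx = affinity.topk(self.topk, dim=-1).indices

        # Stage 3: gather keys/values of the routed regions, then token-to-token attention.
        idx_exp = idx[..., None, None].expand(-1, -1, -1, h * w, C)
        kg = torch.gather(k[:, None].expand(-1, S * S, -1, -1, -1), 2, idx_exp)
        vg = torch.gather(v[:, None].expand(-1, S * S, -1, -1, -1), 2, idx_exp)
        kg = kg.reshape(B, S * S, self.topk * h * w, C)
        vg = vg.reshape(B, S * S, self.topk * h * w, C)
        attn = F.softmax((q @ kg.transpose(-1, -2)) * self.scale, dim=-1)
        out = attn @ vg                             # (B, S*S, h*w, C)

        # restore (B, C, H, W) layout and add the local-context embedding
        out = out.reshape(B, S, S, h, w, C).permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return out + lce
```

The top-$K$ routing step is what keeps the attention sparse: each region only attends to tokens inside the $K$ regions it is most affine to, rather than to the full $H \times W$ grid.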

4. Performance Evaluation

Applied to diverse tasks, AC-BiFPN demonstrates marked improvements over conventional CNN-based models. On the SSDD (SAR Ship Detection Dataset), the framework achieves an average precision (AP) of ~0.702, outperforming 25 state-of-the-art detectors. Notable gains are observed for small (AP$_S$ = 0.698), medium (AP$_M$ = 0.733), and large (AP$_L$ = 0.701) targets. Ablation studies reveal that:

  • CFE alone improves AP via enriched representation.
  • BA alone delivers superior refinement.
  • The combined AC-BiFPN yields gains of 19% in AP, 24% in AP$_S$, 11.7% in AP$_M$, and 31% in AP$_L$ compared to Faster R-CNN + FPN.

In medical imaging, integration with a Transformer decoder further enhances report generation performance, with BLEU-1 ≈ 38.2, METEOR ≈ 17.0, ROUGE ≈ 31.0, and CIDEr ≈ 45.8 on the RSNA Intracranial Hemorrhage Detection dataset. These metrics signify improved diagnostic accuracy and text coherence relative to non-pyramid CNN backbones (Bouslimi et al., 9 Oct 2025).

5. Applications

AC-BiFPN’s multi-scale fusion capacity and attention integration target domains with challenging visual landscapes. Key applications include:

Domain        | Task                          | Motivation
Maritime      | Ship detection in SAR images  | Robust detection under clutter and scale variation
Medical       | Radiology report generation   | Accurate small anomaly detection, report coherence
Environmental | Coastal monitoring            | Adaptation to varying resolution and noise

It supports real-time monitoring, search and rescue, illegal activity detection, clinical decision support, and educational platforms for trainees via automated feedback (Meng et al., 18 Jun 2025, Bouslimi et al., 9 Oct 2025).

6. Significance and Implications

The AC-BiFPN framework consolidates convolutional feature enhancement and dynamic attention fusion, yielding superior cross-scale modeling and adaptability compared to static pyramid designs and non-attentive methods. This suggests improvements in robustness, sensitivity, and scalability for both small and large target detection. Its performance in clinical and surveillance settings substantiates its potential to streamline workflows and reduce critical diagnostic delays.

A plausible implication is that further development of AC-BiFPN-inspired architectures may generalize to additional domains requiring granular feature extraction under adverse imaging conditions, as well as enhance multi-modal feature fusion mechanisms beyond current paradigms.
