BiFPN: Adaptive Multi-Scale Feature Fusion

Updated 26 May 2026
  • BiFPN is a neural module that fuses multi-scale features bidirectionally, enhancing representation for detection and segmentation tasks.
  • It employs iterative top-down and bottom-up passes with learnable weighted fusion, reducing parameters and computational costs.
  • BiFPN has been adapted in various domains, from object detection to audio event localization and multi-sensor fusion applications.

A Bidirectional Feature Pyramid Network (BiFPN) is a neural architecture module designed for efficient and adaptive multi-scale feature fusion in deep networks, particularly in detection and segmentation pipelines. BiFPN extends standard feature pyramid approaches by introducing iterative, learnable, bidirectional cross-scale connections, enabling precise and computationally efficient aggregation of information from different spatial resolutions. It was introduced in "EfficientDet: Scalable and Efficient Object Detection" (Tan et al., 2019) and has since been adopted and extended in object, audio, and multi-modal detection research.

1. Topology and Fusion Principles

BiFPN operates on a set of feature maps at multiple spatial scales, typically denoted $\{P_\ell\}$, where each $P_\ell$ is a feature map of a particular resolution derived from a backbone network (e.g., EfficientNet, ResNet, GhostNet). The canonical BiFPN layer is structured as two sequential passes:

  • Top-Down Pass: Propagates semantically strong features from low-resolution (deeper, coarser) levels to higher-resolution (shallower) levels. At each level $\ell$, the upsampled output from level $\ell+1$ is fused with the original $P_\ell$ using weighted combinations.
  • Bottom-Up Pass: Aggregates spatial detail from fine scales up to coarser levels. Each node fuses the top-down output at its level, the lateral backbone feature, and the bottom-up output from the level below, which is downsampled to match.

Each fusion node receives two or more inputs; nodes that would have only a single input are removed for efficiency. Same-level skip connections carry the original input directly to the output node, and the whole bidirectional layer can be stacked repeatedly so that later layers refine the output of earlier ones (Tan et al., 2019, Tang et al., 2024).
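
The two passes can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the EfficientDet implementation: feature maps are 1-D single-channel lists, upsampling is nearest-neighbour, downsampling is stride-2 max pooling, fusion is an equal-weight average standing in for the learned weights, and no convolution follows each fusion. All function names are hypothetical.

```python
from typing import List

Feature = List[float]  # 1-D, single-channel stand-in for a feature map


def upsample2x(x: Feature) -> Feature:
    """Nearest-neighbour upsampling: repeat every element twice."""
    return [v for v in x for _ in range(2)]


def downsample2x(x: Feature) -> Feature:
    """Stride-2 max pooling over non-overlapping pairs (length must be even)."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def fuse(inputs: List[Feature]) -> Feature:
    """Equal-weight average of same-sized inputs (placeholder for learned weights)."""
    n = len(inputs)
    return [sum(vals) / n for vals in zip(*inputs)]


def bifpn_layer(feats: List[Feature]) -> List[Feature]:
    """One bidirectional layer. feats[0] is the finest level, feats[-1] the coarsest."""
    top = len(feats) - 1
    # Top-down pass: the coarsest level would have a single input, so it has
    # no intermediate node and is passed through unchanged.
    td = list(feats)
    for l in range(top - 1, -1, -1):
        td[l] = fuse([feats[l], upsample2x(td[l + 1])])
    # Bottom-up pass: the finest output is the top-down result itself.
    out = list(td)
    for l in range(1, top + 1):
        inputs = [td[l], downsample2x(out[l - 1])]
        if l < top:
            inputs.insert(0, feats[l])  # same-level skip from the original input
        out[l] = fuse(inputs)
    return out


# Three levels with lengths 8, 4, 2 (each level halves the resolution).
pyramid = [[1.0] * 8, [2.0] * 4, [4.0] * 2]
fused = bifpn_layer(pyramid)
```

Because the output has the same shapes as the input, stacking is just repeated application, for example `bifpn_layer(bifpn_layer(pyramid))`.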

2. Learnable Weighted Feature Fusion

BiFPN introduces a normalized, learnable scalar weight for each incoming feature at a fusion node. The fusion at a node with $N$ inputs $\{I_i\}_{i=1}^N$ is computed as:

$$w_i = \mathrm{ReLU}(\alpha_i)$$

$$\hat{w}_i = \frac{w_i}{\epsilon + \sum_{j=1}^N w_j}$$

$$F_\text{out} = \sum_{i=1}^N \hat{w}_i I_i$$

  • $\alpha_i$ are unconstrained learnable parameters.
  • ReLU ensures $w_i \geq 0$ for non-negative fusion weights.
  • $\epsilon$ (e.g., $10^{-4}$) prevents division by zero and aids numerical stability.
  • In the original design, normalization can alternatively use softmax, but ReLU-normalization achieves comparable accuracy with up to 30% lower GPU latency (Tan et al., 2019, Tang et al., 2024).

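This mechanism lets the network learn how much each input scale contributes at every fusion node. Because each weight is a single scalar per input, it is shared across all spatial positions; weighting that adapts to image content or region requires the attention-based variants described in Section 6.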
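
As a minimal sketch of the fusion rule, the following standard-library Python computes the ReLU-normalized weights next to the softmax alternative. The `alphas` values and input vectors are made up for illustration, `eps` defaults to an assumed 1e-4, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def fast_normalized_weights(alphas: Sequence[float], eps: float = 1e-4) -> List[float]:
    """w_i = ReLU(alpha_i); w_hat_i = w_i / (eps + sum_j w_j)."""
    w = [max(0.0, a) for a in alphas]
    denom = eps + sum(w)
    return [wi / denom for wi in w]


def softmax_weights(alphas: Sequence[float]) -> List[float]:
    """Softmax alternative: exp(alpha_i) / sum_j exp(alpha_j)."""
    m = max(alphas)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in alphas]
    s = sum(e)
    return [ei / s for ei in e]


def fuse(inputs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """F_out = sum_i w_hat_i * I_i, element-wise over same-sized inputs."""
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*inputs)]


# Illustrative learned scalars; the negative one is clipped to zero by ReLU.
alphas = [1.0, 3.0, -0.5]
w_hat = fast_normalized_weights(alphas)
w_soft = softmax_weights(alphas)
fused = fuse([[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]], w_hat)
```

Two properties are visible here. The ReLU-normalized weights sum to slightly less than 1 because of `eps`, and an input with a negative `alpha` is switched off entirely, whereas softmax always assigns it a positive weight. If every `alpha` is negative, the denominator reduces to `eps` and the output is zero rather than undefined.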

BiFPN generalizes earlier pyramid fusion networks:

| Feature Fusion Network | Top-Down | Bottom-Up | Learnable Weights | Repeatable | Convolution Type | Node Pruning/Same-Level Skips |
|---|---|---|---|---|---|---|
| FPN | Yes | No | No | No | 3x3 regular | No |
| PANet | Yes | Yes | No | No | 3x3 regular | No |
| NAS-FPN | Yes | Yes | No | Yes | Architecture search-derived | Varies |
| BiFPN | Yes | Yes | Yes | Yes | Depthwise-separable, variant | Yes |

  • BiFPN’s repeated, bidirectional blocks and pruning of single-input nodes set it apart, yielding both improved computational efficiency and feature expressivity (Tan et al., 2019, Tang et al., 2024, Meng et al., 2022).
  • EfficientDet’s version uses only depthwise-separable convolutions at all fusion nodes; some extensions incorporate GhostConv, channel-shuffle, or projected convolutions to further reduce parameters (Li et al., 2023, Xu et al., 2022).
  • Attention-enhanced BiFPN variants introduce content-adaptive fusion at the node level or region-level sparse attention (see Section 6) (Meng et al., 18 Jun 2025).
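
To make the parameter saving from depthwise-separable convolution concrete, the sketch below counts weights for a 3x3 regular convolution and its depthwise-separable counterpart. The channel width of 160 is an illustrative assumption rather than a value stated above, biases and normalization layers are ignored, and the function names are hypothetical.

```python
def regular_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise projection."""
    return k * k * c_in + c_in * c_out


channels = 160  # illustrative width, not taken from the text
regular = regular_conv_params(channels, channels)
separable = depthwise_separable_params(channels, channels)
ratio = separable / regular  # equals 1/c_out + 1/k**2
```

With these assumed sizes the regular convolution has 230,400 weights and the separable one 27,040, a ratio of about 0.117. The ratio approaches 1/9 for a 3x3 kernel as the channel count grows.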

4. Empirical Performance and Complexity

In published comparisons, BiFPN-based necks generally improve accuracy over prior neck designs at equal or lower parameter and FLOP budgets. Representative results include:

| Backbone + Neck | AP / mAP | Params | FLOPs | Notable Datasets |
|---|---|---|---|---|
| EfficientNet-B3 + FPN | 40.3 | 21M | 75B | COCO |
| EfficientNet-B3 + BiFPN (no weights) | 43.9 | ~18.5M | ~50B | COCO |
| EfficientNet-B3 + BiFPN (weights) | 44.4 | ~18.5M | ~50B | COCO |
| YOLOv5n + PANet | 67.6% | 1.77M | 4.2B | Fire Detection |
| YOLOv5n + Light-BiFPN | 68.6% | 1.25M | 3.3B | Fire Detection |
| DETR baseline | 39.9 | --- | --- | COCO |
| DETR++ (with BiFPN) | 41.8 | +few% | ~10% overhead | COCO, RICO |
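
The relative savings implied by these rows can be checked with a short calculation. The figures below are copied from the table; the BiFPN parameter and FLOP counts are approximate in the source, so the percentages are indicative only. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# (params in millions, FLOPs in billions, AP or mAP) copied from the table;
# the "~18.5M" and "~50B" entries are approximate in the source.
ROWS: Dict[str, Tuple[float, float, float]] = {
    "EfficientNet-B3 + FPN": (21.0, 75.0, 40.3),
    "EfficientNet-B3 + BiFPN (weights)": (18.5, 50.0, 44.4),
    "YOLOv5n + PANet": (1.77, 4.2, 67.6),
    "YOLOv5n + Light-BiFPN": (1.25, 3.3, 68.6),
}


def relative_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


def compare(baseline: str, variant: str) -> Dict[str, float]:
    """Percentage change in params and FLOPs, and absolute accuracy change."""
    p0, f0, a0 = ROWS[baseline]
    p1, f1, a1 = ROWS[variant]
    return {
        "params_pct": round(relative_change(p0, p1), 1),
        "flops_pct": round(relative_change(f0, f1), 1),
        "accuracy_delta": round(a1 - a0, 1),
    }


coco = compare("EfficientNet-B3 + FPN", "EfficientNet-B3 + BiFPN (weights)")
fire = compare("YOLOv5n + PANet", "YOLOv5n + Light-BiFPN")
```

Under these figures, the weighted BiFPN row uses about 11.9% fewer parameters and 33.3% fewer FLOPs than the FPN row for +4.1 AP, and Light-BiFPN uses about 29.4% fewer parameters and 21.4% fewer FLOPs than PANet for +1.0 mAP.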

5. Application Domains and Customizations

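Since its original formulation, BiFPN has been adapted for diverse modalities and detection contexts, including object detection, semantic segmentation (Meng et al., 2022), joint sound classification and localization (Healy et al., 2023), and multi-sensor fusion (Le et al., 2024).

Common customizations include lightweight convolution substitutes such as GhostConv and channel shuffle (Li et al., 2023, Xu et al., 2022), attention modules at or around the fusion nodes (Section 6), and additional high-resolution pyramid levels for small-object tasks (Section 7).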

6. Variants: Attention and Enhanced Fusion

Several studies extend the BiFPN concept with explicit nonlocal attention, content-adaptive fusion, or additional context modules:

  • AFBiFPN adds BiFormer region-level and token-level sparse attention at fusion nodes, with reported AP improvements in SAR ship detection, particularly on small and medium object subsets (Meng et al., 18 Jun 2025).
  • CFE + BiFPN enhances local feature diversity before fusion using a multi-branch convolutional preprocess, followed by BiFPN and attention for scale-aware context (Meng et al., 18 Jun 2025).
  • SimAM and Shuffle Attention Mechanisms are stacked with BiFPN in road-crack detection to further boost spatial discriminability and channel selectivity (Tang et al., 2024).
  • Diffusion-conditioned BiFPN (cMini-BiFPN) combines multi-resolution latent denoising with BiFPN-style fusion for robust sensor fusion (Le et al., 2024).

Across these studies, BiFPN's learnable fusion weights are reported to combine well with content-adaptive attention: the scalar weights balance scales, while attention adds local and global context, and the combination further improves detection and recognition metrics (Tang et al., 2024, Meng et al., 18 Jun 2025).

7. Implementation and Practical Considerations

Best practices for BiFPN integration, according to published benchmarks, include:

  • Depthwise-separable convolution at fusion points to minimize parameters and operations (Tan et al., 2019).
  • Channel unification via 1×1 convolution prior to fusion at each scale level (especially when scales have mismatched feature widths) (Ibrahim et al., 2 Apr 2025, Chen et al., 28 Jul 2025).
  • Limiting the number of BiFPN layers for real-time or memory-constrained inference. Typically 1–3 layers are effective; further stacking yields diminishing returns and increases memory use and FLOPs (Chen et al., 28 Jul 2025).
  • ReLU-based fusion normalization is more resource-efficient than softmax for BiFPN weight normalization, with nearly identical empirical performance (Tan et al., 2019).
  • Careful ablation/combination with lightweight backbones (GhostNet, CSPDarkNet, etc.) and attention modules is critical; BiFPN alone without such context may not always yield net gain (Ruiqiang, 2022).
  • Explicit preservation of high-resolution pyramid levels (e.g., P2 at 160×160, or even P3–P9 in segmentation) is central for challenging small-object or dense pixelwise tasks (Meng et al., 2022, Ibrahim et al., 2 Apr 2025, Chen et al., 28 Jul 2025).
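
Two of the points above, channel unification by 1×1 convolution and the cost of keeping high-resolution levels, can be illustrated with a small sketch. It assumes a 640×640 input and a stride of 2^ℓ at level P_ℓ, which is consistent with P2 at 160×160 but is not stated in the text. The weight matrix is made up and the function names are hypothetical.

```python
from typing import List


def level_size(input_size: int, level: int) -> int:
    """Side length of pyramid level P_level, assuming a stride of 2**level."""
    return input_size // (2 ** level)


def pointwise_conv(pixel: List[float], weight: List[List[float]]) -> List[float]:
    """1x1 convolution at one pixel: a linear map from C_in to C_out channels."""
    return [sum(w * x for w, x in zip(row, pixel)) for row in weight]


# Spatial sizes for an assumed 640x640 input.
sizes = {f"P{l}": level_size(640, l) for l in range(2, 6)}
positions = {name: side * side for name, side in sizes.items()}

# Project a 3-channel pixel to 2 channels so it matches another level's width.
weight = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
unified = pointwise_conv([2.0, 4.0, 6.0], weight)
```

Under these assumptions P2 holds 25,600 spatial positions, four times as many as P3, which is why adding a high-resolution level raises memory and compute noticeably even though it helps with small objects.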

References

  • "EfficientDet: Scalable and Efficient Object Detection" (Tan et al., 2019)
  • "Enhancing Road Crack Detection Accuracy with BsS-YOLO: Optimizing Feature Fusion and Attention Mechanisms" (Tang et al., 2024)
  • "Light-YOLOv5: A Lightweight Algorithm for Improved YOLOv5 in Complex Fire Scenarios" (Xu et al., 2022)
  • "DETR++: Taming Your Multi-Scale Detection Transformer" (Zhang et al., 2022)
  • "Revisiting Multi-Scale Feature Fusion for Semantic Segmentation" (Meng et al., 2022)
  • "Enhancing Traffic Sign Recognition On The Performance Based On Yolov8" (Ibrahim et al., 2 Apr 2025)
  • "Fast vehicle detection algorithm based on lightweight YOLO7-tiny" (Li et al., 2023)
  • "An Improved YOLOv8 Approach for Small Target Detection of Rice Spikelet Flowering in Field Environments" (Chen et al., 28 Jul 2025)
  • "Feature Aggregation in Joint Sound Classification and Localization Neural Networks" (Healy et al., 2023)
  • "YOLOv5s-GTB: light-weighted and improved YOLOv5s for bridge crack detection" (Ruiqiang, 2022)
  • "DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation" (Le et al., 2024)
  • "Convolutional Feature Enhancement and Attention Fusion BiFPN for Ship Detection in SAR Images" (Meng et al., 18 Jun 2025)
