
EfficientDet BiFPN: Efficient Feature Fusion

Updated 5 April 2026
  • EfficientDet’s BiFPN is a multi-scale architecture that fuses features bidirectionally using learnable, normalized weights to enhance object detection accuracy.
  • It optimizes efficiency by pruning redundant nodes and employing fast ReLU-based normalization, reducing computation and memory usage.
  • Empirical results show that stacking BiFPN layers improves average precision while lowering parameters and FLOPs, benefiting scalable detection systems.

EfficientDet’s BiFPN, or Bidirectional Feature Pyramid Network, is a multi-scale feature fusion architecture designed to maximize both accuracy and efficiency in modern object detection systems. Introduced in the context of the EfficientDet family of detectors, BiFPN simultaneously optimizes for memory, FLOPs, and parameter count by structuring feature fusion as a repeated, bidirectional, and weight-normalized computation graph. Unlike conventional FPNs or their variants (such as PANet or NAS-FPN), BiFPN incorporates learnable, normalized fusion weights, skip connections, node pruning, and a scalable stacking design. These innovations have established BiFPN as a backbone-agnostic neck, with demonstrated utility across vision benchmarks and notable subsequent extensions for memory-constrained or robustness-critical regimes (Tan et al., 2019; Jain, 2023; Chiley et al., 2022).

1. Architectural Design and Topology

BiFPN takes as input a vector of multi-resolution feature maps from the detection backbone:

$\vec P^{in} = (P^{in}_{l_1}, \ldots, P^{in}_{l_k}),$

and outputs deeply fused, multi-level features:

$\vec P^{out} = f(\vec P^{in})$

such that both top-down and bottom-up cross-level information flows are realized. Standard FPNs perform only a single top-down pass, while PANet introduces a separate bottom-up aggregation. BiFPN integrates these directions into a single, repeated computation graph, subject to the following architectural principles (Tan et al., 2019):

  1. Every BiFPN layer contains both top-down and bottom-up fusion steps.
  2. Nodes with a single input are pruned, eliminating redundant computation.
  3. Skip connections from each resolution’s input to its output are included, retaining backbone information.
  4. BiFPN layers are stacked to a depth $D_{\mathrm{bifpn}}$, with weights not shared between layers.

The canonical configuration uses feature levels $P_3$ to $P_7$ from the backbone at spatial resolutions $1/8$ to $1/128$ of the input.
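
Under these principles, the pruned BiFPN topology over $P_3$–$P_7$ can be written out explicitly as an adjacency map. The sketch below is illustrative (the node names are ours, not from the paper): top-down intermediate nodes exist only for levels 4–6, because a $P_7$ top-down node or a $P_3$ bottom-up intermediate node would each have a single input and is therefore pruned.

```python
# Pruned BiFPN computation graph for levels P3..P7.
# "td" nodes are intermediate top-down fusions; "out" nodes form the
# bottom-up pass. Single-input nodes are pruned per the rules above.
BIFPN_GRAPH = {
    # top-down pass
    "P6_td":  ["P6_in", "P7_in"],
    "P5_td":  ["P5_in", "P6_td"],
    "P4_td":  ["P4_in", "P5_td"],
    # bottom-up pass (P3 has no intermediate node: it would have one input)
    "P3_out": ["P3_in", "P4_td"],
    "P4_out": ["P4_in", "P4_td", "P3_out"],  # skip connection from P4_in
    "P5_out": ["P5_in", "P5_td", "P4_out"],
    "P6_out": ["P6_in", "P6_td", "P5_out"],
    "P7_out": ["P7_in", "P6_out"],
}

# Pruning rule check: every surviving fusion node has at least two inputs.
assert all(len(v) >= 2 for v in BIFPN_GRAPH.values())
```

Note how the three-input `*_out` nodes encode both the bottom-up path and the input-to-output skip connections from principle 3.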

2. Weighted Feature Fusion and Normalization

A principal contribution of BiFPN is its learnable, normalized attention mechanism for fusing multiple feature inputs at each node. Instead of the unweighted sum fusion used by FPN ($O = \sum_i I_i$), BiFPN employs per-edge scalar weights $w_i$ with a non-negativity constraint and normalization. Three fusion strategies were evaluated:

  • Unbounded fusion: $O = \sum_i w_i I_i$, which can destabilize training because the scalar weights $w_i$ are unbounded.
  • Softmax-based fusion: $O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} I_i$, which improves stability at extra compute cost.
  • Fast ReLU-based normalization:

$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} I_i, \quad w_i \geq 0 \text{ (enforced by ReLU)}, \quad \epsilon = 10^{-4}$

This achieves accuracy nearly identical to softmax with 25–30% faster GPU throughput.
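
In code, fast normalized fusion is only a few lines. The sketch below operates on plain Python lists of floats standing in for feature maps; a real implementation would fuse tensors after resizing them to a common shape:

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally-shaped feature maps using ReLU-normalized scalar weights."""
    w = [max(0.0, wi) for wi in weights]   # ReLU keeps each weight non-negative
    total = sum(w) + eps                   # eps avoids division by zero
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / total
        for k in range(len(features[0]))
    ]

# Two inputs with equal weights: the output is (almost) their mean.
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
```

Because the normalized weights sum to slightly less than one (shrunk by $\epsilon$), the fused output is a near-convex combination of its inputs, which is what keeps training stable without the cost of a softmax.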

A typical top-down and bottom-up computation at feature level 6 is as follows:

$P_6^{td} = \mathrm{Conv}\left(\frac{w_1 P_6^{in} + w_2 \cdot \mathrm{Resize}(P_7^{in})}{w_1 + w_2 + \epsilon}\right)$

$P_6^{out} = \mathrm{Conv}\left(\frac{w_1' P_6^{in} + w_2' P_6^{td} + w_3' \cdot \mathrm{Resize}(P_5^{out})}{w_1' + w_2' + w_3' + \epsilon}\right)$
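
The level-6 node can be transcribed directly into code. In this sketch, features are scalars and `resize`/`conv` are identity stubs (our simplification; in a real network they are nearest-neighbor resizing and a depthwise separable convolution):

```python
EPS = 1e-4

def resize(x):  # stub: real code up/down-samples to the target resolution
    return x

def conv(x):    # stub: real code applies a depthwise separable conv + BN + SiLU
    return x

def p6_td(p6_in, p7_in, w1=1.0, w2=1.0):
    """Intermediate top-down node at level 6."""
    return conv((w1 * p6_in + w2 * resize(p7_in)) / (w1 + w2 + EPS))

def p6_out(p6_in, p6_td_val, p5_out, w1=1.0, w2=1.0, w3=1.0):
    """Output node at level 6, including the skip connection from P6_in."""
    return conv(
        (w1 * p6_in + w2 * p6_td_val + w3 * resize(p5_out))
        / (w1 + w2 + w3 + EPS)
    )
```

The three-input output node shows where the skip connection from $P_6^{in}$ enters the bottom-up pass.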

3. Complexity, Efficiency, and Comparative Results

BiFPN’s edge pruning and weight-normalized fusion enable significant reductions in both parameters and FLOPs compared to equally stacked FPN or PANet feature networks. The following table summarizes key empirical results for the backbone ResNet50 under a matched training regime (Tan et al., 2019):

| Model | AP | Params (relative) | FLOPs (relative) |
|---|---|---|---|
| Repeated top-down FPN | 42.29 | 1.00× | 1.00× |
| Repeated FPN+PANet | 44.08 | 1.00× | 1.00× |
| NAS-FPN | 43.16 | 0.71× | 0.72× |
| BiFPN (no weights) | 43.94 | 0.88× | 0.67× |
| BiFPN (fast normalized weights) | 44.39 | 0.88× | 0.68× |

For full object detector variants, EfficientDet-D0 matches YOLOv3’s accuracy with roughly 28× fewer FLOPs (2.5B vs. 71B), and EfficientDet-D4 outperforms AmoebaNet+NAS-FPN in accuracy (AP +1.1) while using substantially fewer FLOPs and parameters. Across the EfficientDet-D0 through D7 models, measured latency is 2–4× lower on GPU and 5–11× lower on CPU relative to comparable detectors (Tan et al., 2019).
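
The headline FLOPs gap can be checked directly from the raw counts quoted above:

```python
yolov3_flops = 71e9   # YOLOv3, from the comparison above
d0_flops = 2.5e9      # EfficientDet-D0
ratio = yolov3_flops / d0_flops
# 71 / 2.5 = 28.4, i.e. roughly a 28x reduction in FLOPs at comparable AP
```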

4. Ablation and Scaling Analysis

Comprehensive ablations disentangle BiFPN’s gains (Tan et al., 2019):

  • Substituting FPN with BiFPN on EfficientNet-B3 improves AP from 40.3 (FPN) to 44.4 (BiFPN) while reducing parameters (from 21M to 12M) and FLOPs (from 75B to 24B).
  • Fusion normalization: softmax- vs. fast-normalized fusion yields a negligible AP delta (within about 0.3 AP), with a roughly 26–31% GPU speedup for the latter.
  • Compound scaling: joint scaling of depth, width, and input resolution yields a superior AP/FLOPs tradeoff curve compared to unidimensional scaling.

The hyperparameters for EfficientDet variants (indexed by the compound scaling coefficient $\phi$) are:

| $\phi$ | $D_{\mathrm{bifpn}}$ | $W_{\mathrm{bifpn}}$ | $D_{\mathrm{class}}$ | $R_{\mathrm{input}}$ |
|---|---|---|---|---|
| 0 | 3 | 64 | 3 | 512 |
| 1 | 4 | 88 | 3 | 640 |
| 2 | 5 | 112 | 3 | 768 |
| 3 | 6 | 160 | 4 | 896 |
| 4 | 7 | 224 | 4 | 1024 |
| 5 | 8 | 288 | 4 | 1280 |
| 6 | 9 | 384 | 5 | 1280 |
| 7 | 10 | 384 | 5 | 1536 |
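
These settings can be captured in a small lookup table. The sketch below simply transcribes the rows above (the dictionary and key names are ours):

```python
# EfficientDet scaling configs, indexed by the compound coefficient phi.
# Values: BiFPN depth, BiFPN channels, head depth, input resolution.
EFFICIENTDET_CONFIGS = {
    0: dict(d_bifpn=3,  w_bifpn=64,  d_class=3, r_input=512),
    1: dict(d_bifpn=4,  w_bifpn=88,  d_class=3, r_input=640),
    2: dict(d_bifpn=5,  w_bifpn=112, d_class=3, r_input=768),
    3: dict(d_bifpn=6,  w_bifpn=160, d_class=4, r_input=896),
    4: dict(d_bifpn=7,  w_bifpn=224, d_class=4, r_input=1024),
    5: dict(d_bifpn=8,  w_bifpn=288, d_class=4, r_input=1280),
    6: dict(d_bifpn=9,  w_bifpn=384, d_class=5, r_input=1280),
    7: dict(d_bifpn=10, w_bifpn=384, d_class=5, r_input=1536),
}

# One regularity visible in the table: BiFPN depth grows linearly with phi.
assert all(cfg["d_bifpn"] == phi + 3 for phi, cfg in EFFICIENTDET_CONFIGS.items())
```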

Depthwise separable convolutions are uniformly applied, with each convolution followed by batch normalization ($\epsilon = 10^{-3}$, decay $= 0.99$) and SiLU (Swish) activation. Focal loss ($\alpha = 0.25$, $\gamma = 1.5$) and a 9-anchor parameterization complete the prediction head.
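
The efficiency of depthwise separable convolutions is easy to quantify: a standard $k \times k$ convolution holds $k^2 C_{in} C_{out}$ weights, while the depthwise + pointwise factorization holds $k^2 C_{in} + C_{in} C_{out}$. A quick comparison at the D0 BiFPN width ($C = 64$):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per channel) + pointwise (1 x 1) pair."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 64)             # 3*3*64*64 = 36864
separable = separable_conv_params(3, 64, 64)  # 576 + 4096 = 4672
saving = standard / separable                 # ~7.9x fewer parameters
```

The roughly 8× parameter saving per fusion convolution is a large part of how stacked BiFPN layers stay cheap.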

5. Variants: Robustness and Memory Efficiency Extensions

BiFPN’s topology and principles admit various extensions:

BiSkFPN (DeepSeaNet context): In underwater object detection under severe visibility noise, a modified BiFPN, termed BiSkFPN, was shown to increase robustness and feature localization (Jain, 2023). Key modifications include an extra deconvolution stream in the top-down path, skip connections from immediate lower-level backbone features, and channel-wise concatenation as the fusion operator, followed by a 1×1 convolution. Quantitatively, BiSkFPN raised mean mAP and feature-map IoU over the baseline BiFPN. The theoretical rationale is that skip connections preserve fine-scale feature detail and that concatenation-based fusion resists degradation under adversarial perturbations.

RevBiFPN (Reversible BiFPN): To address activation memory bottlenecks in deep or wide BiFPN stacks, RevBiFPN replaces each fusion layer with a reversible residual silo (RevSilo). By structuring additive couplings both top-down and bottom-up, the model achieves exact inversion of intermediate feature maps, enabling gradient backpropagation with an activation-memory cost that does not grow with network depth (Chiley et al., 2022). This yields substantial memory savings at high depths and allows models to scale to regimes otherwise infeasible on accelerators.
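
The underlying mechanism is the classic reversible additive coupling (as in RevNets), which RevSilo adapts to bidirectional multi-scale fusion. A minimal scalar sketch (not the exact RevSilo) shows why activations can be recomputed rather than stored:

```python
def rev_forward(x1, x2, f, g):
    """Additive coupling: the outputs determine the inputs exactly."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    """Recompute the inputs from the outputs -- no stored activations needed."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

f = lambda v: 2.0 * v   # stand-ins for arbitrary (non-invertible) sub-networks
g = lambda v: v * v
y1, y2 = rev_forward(1.5, -0.5, f, g)
x1, x2 = rev_inverse(y1, y2, f, g)   # recovers (1.5, -0.5) exactly
```

Note that `f` and `g` need not themselves be invertible; invertibility comes from the coupling structure, which is what lets the backward pass reconstruct intermediates on the fly.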

6. Implementation Considerations and Practical Guidelines

The direct implementation steps for BiFPN are:

  • Construct a bidirectional graph spanning the required input resolutions, including both standard and skip connections.
  • At each fusion edge, insert a learnable scalar weight $w_i$, with edge aggregation performed by fast-normalized fusion.
  • Stack $D_{\mathrm{bifpn}}$ identical BiFPN layers, with all convolutions realized as depthwise separable blocks.
  • Match box/class head widths to BiFPN and process all fused features through these heads.
  • For robust or memory-constrained variants, adapt the basic BiFPN with, respectively, BiSkFPN-style fusion or reversible coupling.
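
The steps above can be put together in a minimal end-to-end sketch of one BiFPN layer over scalar per-level features. Resizing and convolutions are elided and all fusion weights are fixed to 1, so this illustrates the wiring only, not a trainable implementation:

```python
EPS = 1e-4

def fuse(inputs):
    """Fast normalized fusion with unit weights over scalar features."""
    w = [1.0] * len(inputs)   # learnable per-edge weights in a real network
    return sum(wi * v for wi, v in zip(w, inputs)) / (sum(w) + EPS)

def bifpn_layer(p):
    """One BiFPN layer: p maps level (3..7) to a scalar feature."""
    # top-down pass (intermediate nodes; single-input P7/P3 nodes are pruned)
    td = {6: fuse([p[6], p[7]])}
    td[5] = fuse([p[5], td[6]])
    td[4] = fuse([p[4], td[5]])
    # bottom-up pass with skip connections from the layer inputs
    out = {3: fuse([p[3], td[4]])}
    out[4] = fuse([p[4], td[4], out[3]])
    out[5] = fuse([p[5], td[5], out[4]])
    out[6] = fuse([p[6], td[6], out[5]])
    out[7] = fuse([p[7], out[6]])
    return out

features = {level: float(level) for level in range(3, 8)}
fused = bifpn_layer(features)   # stack D_bifpn copies of this in practice
```

Since each node is a near-convex combination of its inputs, every output stays within the range of the input features, and stacking layers simply repeats the same graph with fresh weights.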

The stackability, normalized fusion, and pruning of single-input nodes minimize computation without forgoing localization accuracy. BiFPN’s formalism continues to underpin state-of-the-art detectors and inspires both memory- and robustness-sensitive adaptations (Tan et al., 2019, Jain, 2023, Chiley et al., 2022).
