
Fused-MBConv in EfficientNetV2

Updated 8 February 2026
  • Fused-MBConv is an operator that fuses 1×1 expansion and k×k depthwise convolution into a single operation to enhance training throughput and parameter efficiency.
  • It streamlines the convolutional block by using a fused k×k convolution followed by batch normalization, activation, optional squeeze-and-excitation, and projection.
  • Empirical results indicate that using Fused-MBConv in early network stages reduces training time while slightly increasing parameter counts and boosting overall accuracy.

Fused-MBConv is an operator introduced in the EfficientNetV2 convolutional architecture, designed to optimize training throughput and parameter efficiency by rethinking the structure of the canonical MBConv (Mobile Inverted Bottleneck Convolution) block. Fused-MBConv eliminates the traditional separation between the 1×1 expansion and the k×k depthwise convolution by merging them into a single k×k convolution with increased output channels, followed by batch normalization, activation, optional squeeze-and-excitation (SE), a projection, and a skip connection (when applicable). This fused structure trades slightly higher parameter counts for significantly improved hardware utilization and faster throughput, especially in the early, narrow layers of modern convolutional neural networks (Tan et al., 2021).

1. Block Structure and Mathematical Formulation

The standard MBConv block consists of a sequence of expansion (1×1 convolution), depthwise (spatial k×k) convolution, squeeze-and-excitation, projection (1×1 convolution), and residual addition under shape constraints. In contrast, Fused-MBConv executes the expansion and spatial filtering in a single k×k convolution. The block's structure can be precisely described as:

  • Fused Conv: k×k Conv, input channels C_in, output channels t·C_in, stride s.
  • BatchNorm and Activation
  • Squeeze-and-Excitation (optional):
    • u = GlobalAvgPool(Z_1)
    • e = σ(W_2 Act(W_1 u))
    • Z_2 = e ⊙ Z_1 (channel-wise rescaling)
  • Projection: 1×1 Conv, output channels C_out, BN.
  • Residual: If s = 1 and C_in = C_out, add input.

Mathematically, for input X, expansion ratio t, kernel size k, stride s, and output channels C_out:

  Z_1 = Act(BN(Conv_{k×k}(X))),  Z_2 = SE(Z_1),  Y = BN(Conv_{1×1}(Z_2)), with Y = Y + X when s = 1 and C_in = C_out.

This architecture eliminates the separate 1×1 expansion and k×k depthwise convolutions, integrating both into a single dense k×k convolution, which results in improved hardware efficiency (Tan et al., 2021).
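The stage-by-stage structure above can be traced with a small, pure-Python sketch (the function name and shape-tracing approach are illustrative, not from the paper; "same" padding is assumed, so the kernel size k does not change spatial dimensions):

```python
def fused_mbconv_shapes(h, w, c_in, c_out, k=3, t=4, stride=1):
    """Trace (H, W, C) tensor shapes through a Fused-MBConv block."""
    trace = []
    # 1. Fused k x k conv: expands C_in -> t*C_in and applies spatial
    #    filtering in one dense convolution ("same" padding assumed,
    #    so k itself does not alter the spatial shape).
    h2, w2 = h // stride, w // stride
    trace.append(("fused_conv_bn_act", (h2, w2, t * c_in)))
    # 2. Optional squeeze-and-excitation: global pooling to a vector,
    #    two FC layers, channel-wise rescale -- shape is unchanged.
    trace.append(("squeeze_excite", (h2, w2, t * c_in)))
    # 3. 1x1 projection down to C_out, followed by BN (no activation).
    trace.append(("project_bn", (h2, w2, c_out)))
    # 4. Residual add only when stride is 1 and channels match.
    if stride == 1 and c_in == c_out:
        trace.append(("residual_add", (h2, w2, c_out)))
    return trace

for name, shape in fused_mbconv_shapes(56, 56, 24, 24, k=3, t=1):
    print(f"{name:17s} -> {shape}")
```

Note how the residual branch appears only for stride-1, channel-preserving blocks, matching the condition in the formulation above.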

2. Implementation Workflow and Pseudocode

The Fused-MBConv block can be instantiated with the following pseudocode, which details tensor shapes and parameterization:

  def fused_mbconv(X, C_in, C_out, k, t, s):
      # X: input tensor of shape (N, H, W, C_in)
      Z1 = Act(BN(Conv_kxk(X, out_channels=t*C_in, stride=s)))  # fused expand + spatial conv
      Z2 = SE(Z1)                                               # optional squeeze-and-excitation
      Y  = BN(Conv_1x1(Z2, out_channels=C_out))                 # projection, no activation
      if s == 1 and C_in == C_out:
          Y = Y + X                                             # residual connection
      return Y

For example, in Stage 1 of EfficientNetV2-S, with C_in = 24, C_out = 24, k = 3, t = 1, the fused conv is 3×3×24×24 (5,184 parameters) and the projection is 1×1×24×24 (576 parameters), totaling approximately 5,760 parameters per block. Later stages with t = 4 (e.g., fused conv 3×3×24×96, 20,736 parameters, plus projection 1×1×96×48, 4,608 parameters) total approximately 25,344 per block (Tan et al., 2021).
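The per-block parameter counts quoted above can be reproduced directly (the helper name is illustrative; bias-free convolutions, with BN and SE parameters omitted for brevity):

```python
def fused_mbconv_params(c_in, c_out, k, t):
    """Parameter count of the two convolutions in a Fused-MBConv block."""
    fused = k * k * c_in * (t * c_in)      # fused k x k conv: expand + spatial
    project = 1 * 1 * (t * c_in) * c_out   # 1 x 1 projection
    return fused, project, fused + project

# Stage-1-style block: C_in = C_out = 24, k = 3, t = 1
print(fused_mbconv_params(24, 24, 3, 1))   # (5184, 576, 5760)
# Wider block: C_in = 24, C_out = 48, k = 3, t = 4
print(fused_mbconv_params(24, 48, 3, 4))   # (20736, 4608, 25344)
```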

3. Computational Complexity and Parameter Analysis

Fused-MBConv and MBConv differ in their FLOP and parameter composition. Let k be the kernel size, H×W the spatial dimensions, C_in and C_out the input and output channels, and t the expansion ratio.

  • MBConv:
    • Expand 1×1: H·W·C_in·(t·C_in)
    • Depthwise k×k: H·W·(t·C_in)·k²
    • Project 1×1: H·W·(t·C_in)·C_out
    • Total FLOPs ≈ H·W·t·C_in·(C_in + k² + C_out)
    • Parameters: t·C_in² + k²·t·C_in + t·C_in·C_out
  • Fused-MBConv:
    • Fused k×k: H·W·k²·C_in·(t·C_in)
    • Project 1×1: H·W·(t·C_in)·C_out
    • Total FLOPs ≈ H·W·t·C_in·(k²·C_in + C_out)
    • Parameters: k²·t·C_in² + t·C_in·C_out

Although Fused-MBConv generally incurs a greater parameter and FLOP cost than standard depthwise-separable convolution, the increase is limited in the early stages (where C_in is small) and is offset by improved throughput due to better accelerator utilization (Tan et al., 2021).
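The FLOP formulas above can be encoded directly (counting multiply-accumulates only, ignoring BN, SE, and bias terms; function names are illustrative) to see how the fusing overhead grows with channel width:

```python
def mbconv_flops(h, w, c_in, c_out, k, t):
    expand = h * w * c_in * (t * c_in)          # 1x1 expansion
    depthwise = h * w * (t * c_in) * k * k      # k x k depthwise
    project = h * w * (t * c_in) * c_out        # 1x1 projection
    return expand + depthwise + project

def fused_mbconv_flops(h, w, c_in, c_out, k, t):
    fused = h * w * k * k * c_in * (t * c_in)   # dense k x k conv
    project = h * w * (t * c_in) * c_out        # 1x1 projection
    return fused + project

# Absolute FLOP overhead of fusing for a narrow vs. a wide stage:
for c in (24, 256):
    extra = fused_mbconv_flops(14, 14, c, c, 3, 4) - mbconv_flops(14, 14, c, c, 3, 4)
    print(f"C_in={c}: extra FLOPs from fusing = {extra:,}")
```

The absolute overhead is modest for narrow early stages but grows quadratically with C_in, which is consistent with confining Fused-MBConv to stages where C_in is small.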

Empirical data, e.g., EfficientNet-B4 baseline versus a variant with Fused-MBConv in the early stages:

| Configuration            | Params | FLOPs | Top-1 | Images/sec (TPUv3) |
|--------------------------|--------|-------|-------|--------------------|
| No Fused (all MBConv)    | 19.3M  | 4.5B  | 82.8% | 262                |
| Fused in Stages 1–3 Only | 20.0M  | 7.5B  | 83.1% | 362                |

Fully replacing all MBConv blocks increases parameter count substantially (e.g., 132M) and degrades training efficiency, motivating a hybrid approach (Tan et al., 2021).

4. Neural Architecture Search and Block Selection

Fused-MBConv arose from training-aware neural architecture search (NAS) utilizing a stage-wise, factorized search over operator type, kernel size, expansion ratio, and repeat count. The search reward for a configuration is the weighted product A·S^w·P^v, where A is Top-1 accuracy, S is normalized step time, P is parameter count, w = −0.07, and v = −0.05.
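The weighted-product reward can be evaluated directly; the candidate accuracies, step times, and parameter counts below are hypothetical values chosen only to illustrate the tradeoff:

```python
def search_reward(accuracy, step_time, params, w=-0.07, v=-0.05):
    # A * S^w * P^v: the negative exponents penalize slow steps and large models.
    return accuracy * (step_time ** w) * (params ** v)

# Two hypothetical candidates: B is slightly less accurate but faster and smaller.
a = search_reward(0.838, 1.20, 24e6)
b = search_reward(0.835, 0.80, 22e6)
print(a < b)  # True: the faster, smaller model wins despite lower accuracy
```

With exponents this small, accuracy still dominates; the speed and size terms act as mild tie-breakers, which is how the search can prefer Fused-MBConv where it raises throughput without hurting accuracy.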

The search space included the operator choice {MBConv, Fused-MBConv}, kernel sizes {3×3, 5×5}, and expansion ratios {1, 4, 6}. Empirical search observations demonstrated:

  • Early stages (1–3) consistently favor Fused-MBConv for improved throughput.
  • Later stages (4–7), where C_in is larger, favor conventional MBConv to maintain parameter and FLOP efficiency and exploit depthwise separation.

This hybrid pattern, adopting Fused-MBConv in early blocks and MBConv in later, high-channel-depth blocks, was computationally validated to offer better speed-accuracy tradeoffs than homogeneous block choices (Tan et al., 2021).

5. Empirical Results and Practical Impact

Empirical evaluation of the Fused-MBConv operator within EfficientNetV2 demonstrates:

  • Training step time is reduced by 30–40% when Fused-MBConv is used in early network stages (e.g., EfficientNetV2-S achieves roughly 20 ms/step at 83.9% Top-1, compared to EfficientNet V1's 45 ms/step for similar accuracy).
  • Selective use of Fused-MBConv in stages 1–3 increases throughput by 38% and Top-1 accuracy by approximately 0.3 percentage points compared to an all-MBConv baseline.
  • End-to-end model comparison (EfficientNetV2-S: 83.9% Top-1, 22M params, 8.8B FLOPs, 7h train time) shows superior efficiency relative to earlier architectures (EfficientNet-B7: 84.7% Top-1, 66M params, 38B FLOPs, 139h train time), with EfficientNetV2-M matching or exceeding B7 accuracy with substantially faster training and fewer parameters.
  • Overuse of Fused-MBConv (in later, wide stages) severely increases parameter count and can degrade accuracy, justifying its selective adoption (Tan et al., 2021).
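The step-time figures quoted above imply a concrete throughput gain; a back-of-the-envelope check (not a calculation from the paper):

```python
# Approximate per-step times quoted above for comparable accuracy:
v1_ms = 45.0  # EfficientNet V1 baseline
v2_ms = 20.0  # EfficientNetV2-S with Fused-MBConv in early stages
speedup = v1_ms / v2_ms
print(f"implied step-time speedup: {speedup:.2f}x")  # 2.25x
```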

6. Significance and Architectural Implications

Fused-MBConv represents an evolution in mobile and resource-aware convolutional block design. By merging the expand and depthwise operations into a single dense convolution, it addresses accelerator memory access bottlenecks prevalent in depthwise kernels for early-stage, low-channel layers. The block's design enables modern GPUs/TPUs to operate at higher throughput on these stages with only minor overhead in parameter count, as confirmed by NAS-informed block selection. The significance is particularly evident in training efficiency; EfficientNetV2 models with selectively integrated Fused-MBConv blocks train 3–11× faster end-to-end while preserving or even improving state-of-the-art accuracy across diverse datasets (Tan et al., 2021). The block thus provides a principled, empirically grounded operator that supports both speed and accuracy targets in contemporary convolutional architectures.

References

  • Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller Models and Faster Training. ICML 2021.
