
Wide Fire Module: Optimizing CNN Efficiency

Updated 4 January 2026
  • Wide Fire Module (WFM) is an efficient CNN component that uses group convolutions to optimize parameter usage and computational cost in object detectors.
  • It employs a squeeze stage and dual expand paths—using 1×1 and 3×3 group convolutions—to significantly reduce parameters and MACs compared to traditional Fire modules.
  • Its integration within Fire SSD improves mAP and throughput on edge devices, striking a balance between accuracy and computational efficiency.

The Wide Fire Module (WFM) is an architectural component designed to optimize the trade-off between computational efficiency and representational capacity in convolutional neural networks, particularly for object detection on edge devices. First introduced as the core building block of Fire SSD, a variant of the Single Shot Detector (SSD) tailored for resource-constrained environments, the WFM is a direct evolution of the SqueezeNet Fire module. It achieves significant reductions in both parameter count and multiply–accumulate operations (MACs), while simultaneously improving accuracy, by deploying group convolutions with carefully chosen group sizes in both the $1 \times 1$ and $3 \times 3$ expansion paths (Liau et al., 2018).

1. Structural Definition and Construction

A single Wide Fire Module operates on an input feature map $X \in \mathbb{R}^{H \times W \times C_{in}}$ and transforms it through three stages:

  • Squeeze stage: a $1 \times 1$ convolution reduces the channel dimension to $C_s = C_{in}/4$. Output: $H \times W \times C_s$.
  • Expand stage: two parallel pathways:
    • Path 1 ($1 \times 1$ group convolution): $g_1 = 2$ groups, channels $C_s \to C_{out}/2$, stride 1, no padding, ReLU.
    • Path 2 ($3 \times 3$ group convolution): $g_3 = 16$ groups, channels $C_s \to C_{out}/2$, stride 1, padding 1, ReLU.
  • Concatenation: outputs from both expand paths are concatenated along the channel axis, yielding $H \times W \times C_{out}$, where $C_{out} = C_{in}$.

No pooling or channel shuffling is performed within the WFM. All convolutions are immediately followed by a ReLU activation.
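
As a concrete illustration, here is a minimal PyTorch sketch of this structure, assuming the configuration above ($C_s = C_{in}/4$, $C_{out} = C_{in}$, $g_1 = 2$, $g_3 = 16$); the class name and layout are ours, not the authors' code:

```python
import torch
import torch.nn as nn

class WideFireModule(nn.Module):
    """Illustrative sketch of a Wide Fire Module: squeeze stage,
    dual group-convolution expand paths, and channel concatenation."""

    def __init__(self, in_channels: int, g1: int = 2, g3: int = 16):
        super().__init__()
        squeeze_channels = in_channels // 4   # C_s = C_in / 4
        half_out = in_channels // 2           # each path emits C_out / 2, with C_out = C_in
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        # Path 1: 1x1 group convolution, g1 groups, no padding
        self.expand1x1 = nn.Conv2d(squeeze_channels, half_out, kernel_size=1, groups=g1)
        # Path 2: 3x3 group convolution, g3 groups, padding 1 preserves H x W
        self.expand3x3 = nn.Conv2d(squeeze_channels, half_out, kernel_size=3,
                                   padding=1, groups=g3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.relu(self.squeeze(x))
        # Concatenate both expand paths along the channel axis: C_out = C_in
        return torch.cat([self.relu(self.expand1x1(s)),
                          self.relu(self.expand3x3(s))], dim=1)

wfm = WideFireModule(512)
print(wfm(torch.randn(1, 512, 38, 38)).shape)  # torch.Size([1, 512, 38, 38])
```

Note that $C_{in}$ must be divisible by $4 \cdot g_3$ for the group convolutions to be well formed; the 512-channel configuration used in the running example below satisfies this.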

2. Mathematical and Layerwise Comparison

Relative to the classic Fire module, WFM introduces group convolutions in both expansion paths to decrease model complexity:

  • Fire module expansion: standard $1 \times 1$ and $3 \times 3$ convolutions, each outputting $C_{in}/2$ channels.
  • WFM expansion: identical output channel allocation, but using group convolutions ($g_1$ and $g_3$ groups for the $1 \times 1$ and $3 \times 3$ paths, respectively).

Parameter count for the WFM expand stage (both paths combined):

$$P_{wfm} = \frac{C_s \cdot (C_{out}/2)}{g_1} + \frac{9 \cdot C_s \cdot (C_{out}/2)}{g_3}$$

For the adopted group sizes ($g_1 = 2$, $g_3 = 16$), this structure yields an approximately $9.4\times$ parameter reduction in the expand stage relative to the standard Fire module, independent of the channel configuration.

For example, at Cin=Cout=512C_{in}=C_{out}=512:

  • Squeeze: $512 \rightarrow 128$ (65,536 parameters)
  • Expand, $1 \times 1$ ($g_1 = 2$): 16,384 parameters
  • Expand, $3 \times 3$ ($g_3 = 16$): 18,432 parameters

Total expand: 34,816 parameters, compared to 327,680 for the original Fire module expand stage.
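
These figures follow directly from the parameter formula above; a quick sanity check in Python (function names are illustrative):

```python
def wfm_expand_params(c_in: int, g1: int = 2, g3: int = 16) -> int:
    # C_s = C_in / 4; each expand path outputs C_out / 2, with C_out = C_in
    c_s, half_out = c_in // 4, c_in // 2
    return c_s * half_out // g1 + 9 * c_s * half_out // g3

def fire_expand_params(c_in: int) -> int:
    # Standard Fire module expand stage: ungrouped 1x1 and 3x3 convolutions
    c_s, half_out = c_in // 4, c_in // 2
    return c_s * half_out + 9 * c_s * half_out

print(wfm_expand_params(512))                            # 34816
print(fire_expand_params(512))                           # 327680
print(fire_expand_params(512) / wfm_expand_params(512))  # ~9.41x reduction
```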

3. FLOPs and Computational Efficiency

WFM offers a substantial reduction in multiply–accumulate operations (MACs) within the expand stage:

  • Original Fire MACs:

$$\mathrm{MAC}_{orig} = 5 \cdot H \cdot W \cdot C_s \cdot C_{out}$$

  • WFM MACs:

$$\mathrm{MAC}_{wfm} = H \cdot W \cdot C_s \cdot (C_{out}/2) \cdot \left( \frac{1}{g_1} + \frac{9}{g_3} \right)$$

For $g_1 = 2$ and $g_3 = 16$, this evaluates to approximately $0.106 \cdot \mathrm{MAC}_{orig}$.
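
The constant follows by cancelling the shared $H \cdot W \cdot C_s \cdot C_{out}$ factor between the two expressions; a one-line check:

```python
g1, g3 = 2, 16
# Ratio of WFM expand MACs to original Fire expand MACs (H, W, C_s, C_out cancel)
print(0.5 * (1 / g1 + 9 / g3) / 5)  # 0.10625
```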

Empirical results within the Fire SSD context show that replacing all appended modules with WFMs reduces the total parameter count from 7.94M to 6.77M (a 14.7% decrease) and MACs from 2.59G to 2.43G (a roughly 6% decrease), while mean average precision (mAP) on VOC 2007 rises by 0.4 points (Liau et al., 2018).

4. Integration in Fire SSD Architecture

In Fire SSD, the backbone is based on SqueezeNet with residual connections up to layer "Fire8." The extended feature extractor comprises eight sequential WFMs (Fire9–Fire16) and a $1 \times 1$ convolution (Conv17). For detection, six output scales (feature map resolutions from $38 \times 38$ down to $1 \times 1$) are used:

  • On the $38 \times 38$ and $19 \times 19$ feature maps: two WFMs stacked with a residual connection before the detection heads (see the sketch below).
  • On the $10 \times 10$ and $5 \times 5$ maps: a single WFM precedes detection.
  • On the $3 \times 3$ and $1 \times 1$ maps: no WFM is added.

This placement concentrates the WFM's added expressivity at the high-resolution detection scales while keeping the deeper, low-resolution heads lightweight. The resulting model achieves 70.5% mAP on PASCAL VOC 2007, outpacing SSD+SqueezeNet (64.3%) and SSD+MobileNet (68.0%), while maintaining a compact 28 MB size (7.13M parameters). Throughput is measured at 31.7 FPS on a mainstream low-power CPU and 39.8 FPS on an integrated GPU in FP16 mode.
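
A minimal sketch of the stacked arrangement at the two largest scales, reusing the `WideFireModule` class from the sketch in Section 1; the exact placement of the skip connection is our assumption based on the description above:

```python
import torch.nn as nn

class ResidualWFMStack(nn.Module):
    """Two WFMs with a residual connection, as placed before the
    38x38 and 19x19 detection heads (illustrative sketch; assumes
    WideFireModule from the earlier example is in scope)."""

    def __init__(self, channels: int):
        super().__init__()
        self.wfm1 = WideFireModule(channels)
        self.wfm2 = WideFireModule(channels)

    def forward(self, x):
        # C_out = C_in in each WFM, so the identity skip adds without projection
        return x + self.wfm2(self.wfm1(x))
```

The table below summarizes the ablation of Fire SSD's design choices.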

| Design Choice | Params (M) | MACs (G) | VOC 2007 mAP (%) |
|---|---|---|---|
| Base | 7.94 | 2.59 | 68.5 |
| +WFM | 6.77 | 2.43 | 68.9 |
| +WFM+DRMD | 7.13 | 2.67 | 69.1 |
| +WFM+DRMD+NDM | 7.13 | 2.67 | 70.5 |

5. Ablation Analysis and Optimal Group Size Selection

Experimental ablation supports the chosen group sizes. Within the WFM, $g_1 = 2$ for the $1 \times 1$ path and $g_3 = 16$ for the $3 \times 3$ path are empirically optimal: increasing $g_1$ reduces cross-channel mixing and degrades accuracy, while a smaller $g_3$ yields less computational saving. The adopted configuration balances expressivity across the two expand paths:

$$\frac{C_{1 \times 1}}{g_1} \cdot 1^2 \approx \frac{C_{3 \times 3}}{g_3} \cdot 3^2$$

This design supports improved feature learning without excess resource consumption.
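
For the running 512-channel example ($C_{1 \times 1} = C_{3 \times 3} = 256$), the balance can be checked numerically (a small sketch evaluating the formula above):

```python
c_half, g1, g3 = 256, 2, 16         # each expand path outputs 256 channels at C_in = 512
cost_1x1 = (c_half // g1) * 1 ** 2  # per-group filter cost of the 1x1 path: 128
cost_3x3 = (c_half // g3) * 3 ** 2  # per-group filter cost of the 3x3 path: 144
print(cost_1x1, cost_3x3)           # roughly balanced: 128 vs. 144
```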

6. Practical Impact and Applicability

The WFM's efficiency gains have a pronounced effect on both real-time performance and overall resource requirements, which is particularly critical for edge devices where compute and thermal budgets are stringent. Fire SSD, using the WFM throughout its extended feature extractor and detection heads, is approximately $6\times$ faster than SSD300 with only a 6.7-point decrease in mAP, enabling deployment scenarios (e.g., video surveillance) previously impractical for full-scale CNN object detectors (Liau et al., 2018).

A plausible implication is that the WFM's plug-and-play nature makes it readily adaptable to any Fire-based backbone or multi-scale detection head in the wider family of object detectors where parameter and compute efficiency are required.

7. Conclusion

The Wide Fire Module replaces standard expansion layers with group convolutions of finely tuned cardinalities, yielding superior accuracy-to-cost and speed-to-size ratios for object detection models on edge hardware. The WFM enhances the representational capacity per MAC and per parameter relative to the original Fire module, achieving both substantial reductions in resource consumption and small gains in detection accuracy. Its architecture has direct implications for the development of highly efficient yet accurate CNN models, especially for deployment in constrained environments (Liau et al., 2018).

References

1. Liau, H., Yamini, N., and Wong, Y. L. (2018). Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device. arXiv preprint.
