
Wide Fire Module: Optimizing CNN Efficiency

Updated 4 January 2026
  • Wide Fire Module (WFM) is an efficient CNN component that uses group convolutions to optimize parameter usage and computational cost in object detectors.
  • It employs a squeeze stage and dual expand paths—using 1×1 and 3×3 group convolutions—to significantly reduce parameters and MACs compared to traditional Fire modules.
  • Its integration within Fire SSD improves mAP and throughput on edge devices, striking a balance between accuracy and computational efficiency.

The Wide Fire Module (WFM) is an architectural component designed to optimize the trade-off between computational efficiency and representational capacity in convolutional neural networks, particularly for object detection on edge devices. First introduced as the core building block of Fire SSD, a variant of the Single Shot Detector (SSD) tailored for resource-constrained environments, the WFM is a direct evolution of the SqueezeNet Fire module. It achieves significant reductions in both parameter count and multiply–accumulate operations (MACs), while simultaneously improving accuracy, by deploying group convolutions with carefully chosen group sizes in both the $1 \times 1$ and $3 \times 3$ expansion paths (Liau et al., 2018).

1. Structural Definition and Construction

A single Wide Fire Module operates on an input feature map $X \in \mathbb{R}^{H \times W \times C_{in}}$ and transforms it through three stages:

  • Squeeze stage: a $1 \times 1$ convolution reduces the channel dimension to $C_s = C_{in}/4$. Output: $H \times W \times C_s$.
  • Expand stage: two parallel pathways:
    • Path 1 ($1 \times 1$ group convolution): $g_1 = 2$ groups, channels $C_s \to C_{out}/2$, stride 1, no padding, ReLU.
    • Path 2 ($3 \times 3$ group convolution): $g_3 = 16$ groups, channels $C_s \to C_{out}/2$, stride 1, padding 1, ReLU.
  • Concatenation: outputs from both expand paths are concatenated along the channel axis, yielding $H \times W \times C_{out}$, where $C_{out} = C_{in}$.

No pooling or channel shuffling is performed within the WFM. All convolutions are immediately followed by a ReLU activation.
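
As a concrete illustration, here is a minimal PyTorch sketch of this structure, assuming the configuration above ($C_s = C_{in}/4$, $C_{out} = C_{in}$, $g_1 = 2$, $g_3 = 16$); the class name and layout are ours, not the authors' code:

```python
import torch
import torch.nn as nn

class WideFireModule(nn.Module):
    """Illustrative sketch of a Wide Fire Module: squeeze stage,
    dual group-convolution expand paths, and channel concatenation."""

    def __init__(self, in_channels: int, g1: int = 2, g3: int = 16):
        super().__init__()
        squeeze_channels = in_channels // 4   # C_s = C_in / 4
        half_out = in_channels // 2           # each path emits C_out / 2, with C_out = C_in
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        # Path 1: 1x1 group convolution, g1 groups, no padding
        self.expand1x1 = nn.Conv2d(squeeze_channels, half_out, kernel_size=1, groups=g1)
        # Path 2: 3x3 group convolution, g3 groups, padding 1 preserves H x W
        self.expand3x3 = nn.Conv2d(squeeze_channels, half_out, kernel_size=3,
                                   padding=1, groups=g3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.relu(self.squeeze(x))
        # Concatenate both expand paths along the channel axis: C_out = C_in
        return torch.cat([self.relu(self.expand1x1(s)),
                          self.relu(self.expand3x3(s))], dim=1)

wfm = WideFireModule(512)
print(wfm(torch.randn(1, 512, 38, 38)).shape)  # torch.Size([1, 512, 38, 38])
```

Note that $C_{in}$ must be divisible by $4 \cdot g_3$ for the group convolutions to be well formed; the 512-channel configuration used in the running example below satisfies this.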

2. Mathematical and Layerwise Comparison

Relative to the classic Fire module, WFM introduces group convolutions in both expansion paths to decrease model complexity:

  • Fire module expansion: standard $1 \times 1$ and $3 \times 3$ convolutions, each outputting $C_{in}/2$ channels.
  • WFM expansion: identical output channel allocation, but using group convolutions ($g_1$ and $g_3$ groups for the $1 \times 1$ and $3 \times 3$ paths, respectively).

Parameter count for the WFM expand stage (both paths combined):

$$P_{wfm} = \frac{C_s \cdot (C_{out}/2)}{g_1} + \frac{9 \cdot C_s \cdot (C_{out}/2)}{g_3}$$

For the adopted group sizes ($g_1 = 2$, $g_3 = 16$), this structure yields an approximately $9.4\times$ parameter reduction in the expand stage relative to the standard Fire module, independent of the channel configuration.

For example, at Cin=Cout=512C_{in}=C_{out}=512:

  • Squeeze: $512 \rightarrow 128$ (65,536 parameters)
  • Expand, $1 \times 1$ ($g_1 = 2$): 16,384 parameters
  • Expand, $3 \times 3$ ($g_3 = 16$): 18,432 parameters

Total expand: 34,816 parameters, compared to 327,680 for the original Fire module expand stage.
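
These figures follow directly from the parameter formula above; a quick sanity check in Python (function names are illustrative):

```python
def wfm_expand_params(c_in: int, g1: int = 2, g3: int = 16) -> int:
    # C_s = C_in / 4; each expand path outputs C_out / 2, with C_out = C_in
    c_s, half_out = c_in // 4, c_in // 2
    return c_s * half_out // g1 + 9 * c_s * half_out // g3

def fire_expand_params(c_in: int) -> int:
    # Standard Fire module expand stage: ungrouped 1x1 and 3x3 convolutions
    c_s, half_out = c_in // 4, c_in // 2
    return c_s * half_out + 9 * c_s * half_out

print(wfm_expand_params(512))                            # 34816
print(fire_expand_params(512))                           # 327680
print(fire_expand_params(512) / wfm_expand_params(512))  # ~9.41x reduction
```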

3. FLOPs and Computational Efficiency

WFM offers a substantial reduction in multiply–accumulate operations (MACs) within the expand stage:

  • Original Fire MACs:

$$\mathrm{MAC}_{orig} = 5 \cdot H \cdot W \cdot C_s \cdot C_{out}$$

  • WFM MACs:

$$\mathrm{MAC}_{wfm} = H \cdot W \cdot C_s \cdot (C_{out}/2) \cdot \left( \frac{1}{g_1} + \frac{9}{g_3} \right)$$

For $g_1 = 2$ and $g_3 = 16$, this evaluates to approximately $0.106 \cdot \mathrm{MAC}_{orig}$.
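
The constant follows by cancelling the shared $H \cdot W \cdot C_s \cdot C_{out}$ factor between the two expressions; a one-line check:

```python
g1, g3 = 2, 16
# Ratio of WFM expand MACs to original Fire expand MACs (H, W, C_s, C_out cancel)
print(0.5 * (1 / g1 + 9 / g3) / 5)  # 0.10625
```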

Empirical results within the Fire SSD context show that replacing all appended modules with WFMs reduces the total parameter count from 7.94M to 6.77M (a 14.7% decrease) and MACs from 2.59G to 2.43G (a roughly 6% decrease), while mean average precision (mAP) on VOC 2007 rises by 0.4 points (Liau et al., 2018).

4. Integration in Fire SSD Architecture

In Fire SSD, the backbone is based on SqueezeNet with residual connections up to layer "Fire8." The extended feature extractor comprises eight sequential WFMs (Fire9–Fire16) and a $1 \times 1$ convolution (Conv17). For detection, six output scales (feature map resolutions from $38 \times 38$ down to $1 \times 1$) are used:

  • On the $38 \times 38$ and $19 \times 19$ feature maps: two WFMs stacked with a residual connection before the detection heads (see the sketch below).
  • On the $10 \times 10$ and $5 \times 5$ maps: a single WFM precedes detection.
  • On the $3 \times 3$ and $1 \times 1$ maps: no WFM is added.

This placement concentrates the WFM's added expressivity at the high-resolution detection scales while keeping the deeper, low-resolution heads lightweight. The resulting model achieves 70.5% mAP on PASCAL VOC 2007, outpacing SSD+SqueezeNet (64.3%) and SSD+MobileNet (68.0%), while maintaining a compact 28 MB size (7.13M parameters). Throughput is measured at 31.7 FPS on a mainstream low-power CPU and 39.8 FPS on an integrated GPU in FP16 mode.
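
A minimal sketch of the stacked arrangement at the two largest scales, reusing the `WideFireModule` class from the sketch in Section 1; the exact placement of the skip connection is our assumption based on the description above:

```python
import torch.nn as nn

class ResidualWFMStack(nn.Module):
    """Two WFMs with a residual connection, as placed before the
    38x38 and 19x19 detection heads (illustrative sketch; assumes
    WideFireModule from the earlier example is in scope)."""

    def __init__(self, channels: int):
        super().__init__()
        self.wfm1 = WideFireModule(channels)
        self.wfm2 = WideFireModule(channels)

    def forward(self, x):
        # C_out = C_in in each WFM, so the identity skip adds without projection
        return x + self.wfm2(self.wfm1(x))
```

The table below summarizes the ablation of Fire SSD's design choices.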

| Design Choice | Params (M) | MACs (G) | VOC 2007 mAP (%) |
|---|---|---|---|
| Base | 7.94 | 2.59 | 68.5 |
| +WFM | 6.77 | 2.43 | 68.9 |
| +WFM+DRMD | 7.13 | 2.67 | 69.1 |
| +WFM+DRMD+NDM | 7.13 | 2.67 | 70.5 |

5. Ablation Analysis and Optimal Group Size Selection

Experimental ablation supports the chosen group sizes. Within the WFM, $g_1 = 2$ for the $1 \times 1$ path and $g_3 = 16$ for the $3 \times 3$ path are empirically optimal: increasing $g_1$ reduces cross-channel mixing and degrades accuracy, while a smaller $g_3$ yields less computational saving. The adopted configuration balances expressivity across the two expand paths:

$$\frac{C_{1 \times 1}}{g_1} \cdot 1^2 \approx \frac{C_{3 \times 3}}{g_3} \cdot 3^2$$

This design supports improved feature learning without excess resource consumption.
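
For the running 512-channel example ($C_{1 \times 1} = C_{3 \times 3} = 256$), the balance can be checked numerically (a small sketch evaluating the formula above):

```python
c_half, g1, g3 = 256, 2, 16         # each expand path outputs 256 channels at C_in = 512
cost_1x1 = (c_half // g1) * 1 ** 2  # per-group filter cost of the 1x1 path: 128
cost_3x3 = (c_half // g3) * 3 ** 2  # per-group filter cost of the 3x3 path: 144
print(cost_1x1, cost_3x3)           # roughly balanced: 128 vs. 144
```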

6. Practical Impact and Applicability

The WFM's efficiency gains have a pronounced effect on both real-time performance and overall resource requirements, which is particularly critical for edge devices where compute and thermal budgets are stringent. Fire SSD, using the WFM throughout its extended feature extractor and detection heads, is approximately $6\times$ faster than SSD300 with only a 6.7-point decrease in mAP, enabling deployment scenarios (e.g., video surveillance) previously impractical for full-scale CNN object detectors (Liau et al., 2018).

A plausible implication is that the WFM's plug-and-play nature makes it readily adaptable to any Fire-based backbone or multi-scale detection head in the wider family of object detectors where parameter and compute efficiency are required.

7. Conclusion

The Wide Fire Module replaces standard expansion layers with group convolutions of finely tuned cardinalities, yielding superior accuracy-to-cost and speed-to-size ratios for object detection models on edge hardware. The WFM enhances the representational capacity per MAC and per parameter relative to the original Fire module, achieving both substantial reductions in resource consumption and small gains in detection accuracy. Its architecture has direct implications for the development of highly efficient yet accurate CNN models, especially for deployment in constrained environments (Liau et al., 2018).

References

1. Liau, H., Yamini, N., and Wong, Y. L. (2018). Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device. arXiv preprint.
