Wide Fire Module: Optimizing CNN Efficiency
- Wide Fire Module (WFM) is an efficient CNN component that uses group convolutions to optimize parameter usage and computational cost in object detectors.
- It employs a squeeze stage and dual expand paths—using 1×1 and 3×3 group convolutions—to significantly reduce parameters and MACs compared to traditional Fire modules.
- Its integration within Fire SSD improves mAP and throughput on edge devices, striking a balance between accuracy and computational efficiency.
The Wide Fire Module (WFM) is an architectural component designed to optimize the trade-off between computational efficiency and representational capacity in convolutional neural networks, particularly for object detection on edge devices. First introduced as the core building block of Fire SSD, a variant of the Single Shot Detector (SSD) tailored for resource-constrained environments, the WFM is a direct evolution of the SqueezeNet Fire module. It achieves significant reductions in both parameter count and multiply-accumulate operations (MACs), while simultaneously improving accuracy, by deploying group convolutions with carefully chosen group sizes in both the 1×1 and 3×3 expansion paths (Liau et al., 2018).
1. Structural Definition and Construction
A single Wide Fire Module operates on an input feature map of size $H \times W \times C_{in}$ and transforms it through three stages:
- Squeeze stage: a 1×1 convolution reducing the channel dimension to $s_{1\times1}$. Output: $H \times W \times s_{1\times1}$.
- Expand stage: two parallel pathways:
  - Path 1 ("1×1 group convolution"): $g_{1\times1} = 2$ groups, $e_{1\times1}$ output channels, stride 1, no padding, ReLU.
  - Path 2 ("3×3 group convolution"): $g_{3\times3} = 16$ groups, $e_{3\times3}$ output channels, stride 1, padding 1, ReLU.
- Concatenation: Outputs from both expand paths are concatenated along the channel axis, yielding $H \times W \times C_{out}$, where $C_{out} = e_{1\times1} + e_{3\times3}$.
No pooling or channel shuffling is performed within the WFM. All convolutions are immediately followed by a ReLU activation.
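This construction maps directly onto standard deep-learning primitives. Below is a minimal PyTorch sketch of the module under the conventions above ($g_{1\times1}=2$, $g_{3\times3}=16$); the class name `WideFireModule` and its argument names are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class WideFireModule(nn.Module):
    """Sketch of a Wide Fire Module: a 1x1 squeeze, then parallel 1x1 and
    3x3 grouped expand paths whose outputs are concatenated."""
    def __init__(self, in_ch, squeeze_ch, expand1_ch, expand3_ch,
                 groups1=2, groups3=16):
        super().__init__()
        # Squeeze stage: plain 1x1 convolution reducing the channel count.
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        # Expand path 1: 1x1 group convolution, stride 1, no padding.
        self.expand1 = nn.Conv2d(squeeze_ch, expand1_ch, kernel_size=1,
                                 groups=groups1)
        # Expand path 2: 3x3 group convolution, padding 1 to preserve H x W.
        self.expand3 = nn.Conv2d(squeeze_ch, expand3_ch, kernel_size=3,
                                 padding=1, groups=groups3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # Every convolution is immediately followed by ReLU; the two expand
        # outputs are concatenated along the channel axis.
        return torch.cat([self.relu(self.expand1(x)),
                          self.relu(self.expand3(x))], dim=1)

# Example: 512 -> squeeze 128 -> concat(256, 256) = 512 channels, so the
# module preserves the channel count at this configuration.
wfm = WideFireModule(512, 128, 256, 256)
print(wfm(torch.randn(1, 512, 38, 38)).shape)  # torch.Size([1, 512, 38, 38])
```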
2. Mathematical and Layerwise Comparison
Relative to the classic Fire module, WFM introduces group convolutions in both expansion paths to decrease model complexity:
- Fire module expansion: Standard (ungrouped) 1×1 and 3×3 convolutions, outputting $e_{1\times1}$ and $e_{3\times3}$ channels, respectively.
- WFM expansion: Identical output channel allocation, but using group convolutions ($g_{1\times1} = 2$ and $g_{3\times3} = 16$ groups for the 1×1 and 3×3 paths, respectively).
Parameter count for each expansion path in WFM (weights only):
- 1×1 path: $s_{1\times1} \cdot e_{1\times1} / g_{1\times1}$
- 3×3 path: $9 \cdot s_{1\times1} \cdot e_{3\times3} / g_{3\times3}$
With $e_{1\times1} = e_{3\times3}$, this structure leads to an approximately $89\%$ (about $9\times$) parameter reduction in the expand stage; the exact figure depends on the channel configuration.
For example, at $C_{in} = 512$, $s_{1\times1} = 128$, $e_{1\times1} = e_{3\times3} = 256$:
- Squeeze: $512 \times 128$ ($65,536$ parameters)
- Expand, 1×1 path ($g_{1\times1} = 2$): $16,384$ parameters
- Expand, 3×3 path ($g_{3\times3} = 16$): $18,432$ parameters

Total expand: $34,816$ parameters, compared to $327,680$ for the original Fire module expand stage.
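These figures follow directly from the closed-form parameter expressions above; the short snippet below (plain Python, weights only, biases ignored) reproduces them:

```python
# Parameter counts for the example configuration:
# C_in=512, s=128, e1=e3=256, g1=2 (1x1 path), g3=16 (3x3 path).
C_in, s, e1, e3, g1, g3 = 512, 128, 256, 256, 2, 16

squeeze = C_in * s                   # 65536
expand1 = (s // g1) * e1             # grouped 1x1 path: 16384
expand3 = 9 * (s // g3) * e3         # grouped 3x3 path: 18432
fire_expand = s * e1 + 9 * s * e3    # ungrouped Fire baseline: 327680

print(squeeze, expand1, expand3, expand1 + expand3, fire_expand)
# -> 65536 16384 18432 34816 327680
```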
3. FLOPs and Computational Efficiency
WFM offers a substantial reduction in multiply–accumulate operations (MACs) within the expand stage:
- Original Fire expand MACs: $H \cdot W \cdot s_{1\times1} \cdot (e_{1\times1} + 9 \cdot e_{3\times3})$
- WFM expand MACs: $H \cdot W \cdot s_{1\times1} \cdot (e_{1\times1} / g_{1\times1} + 9 \cdot e_{3\times3} / g_{3\times3})$
For $g_{1\times1} = 2$, $g_{3\times3} = 16$, and $e_{1\times1} = e_{3\times3}$, this simplifies to approximately $10.6\%$ of the original expand-stage MACs.
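As a concrete check, the snippet below evaluates both expressions at an assumed $38\times38$ feature map with the example channel configuration; the spatial term cancels in the ratio, so the percentage holds at any resolution:

```python
# Expand-stage MACs, original Fire vs. WFM, for s=128, e1=e3=256,
# g1=2, g3=16, at an assumed 38x38 feature map.
H, W = 38, 38
s, e1, e3, g1, g3 = 128, 256, 256, 2, 16

fire_macs = H * W * (s * e1 + 9 * s * e3)             # ungrouped expand
wfm_macs = H * W * (s * e1 // g1 + 9 * s * e3 // g3)  # grouped expand

print(f"{wfm_macs / fire_macs:.3f}")  # 0.106 -> roughly 89% fewer MACs
```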
Empirical results within the Fire SSD context show that replacing all appended modules with WFMs reduces the total parameter count from $7.94$M to $6.77$M (a $14.7\%$ decrease) and MACs from $2.59$G to $2.43$G (a $6.2\%$ decrease), while mean average precision (mAP) on VOC 2007 rises by $0.4$ points (Liau et al., 2018).
4. Integration in Fire SSD Architecture
In Fire SSD, the backbone is based on SqueezeNet with residual connections up to layer "Fire8." The extended feature extractor comprises eight sequential WFMs (Fire9–Fire16) and a convolution layer (Conv17). For detection, six output scales (feature map resolutions from $38\times38$ down to $1\times1$ for a $300\times300$ input) are used:
- On the $38\times38$ and $19\times19$ feature maps: two WFMs stacked with a residual connection before the detection heads (see the sketch at the end of this section).
- On $10\times10$ and $5\times5$: a single WFM precedes detection.
- On $3\times3$ and $1\times1$: no WFM is added.
This placement leverages the WFM's efficiency, allocating more expressivity to the larger, higher-resolution feature maps. The resulting model achieves $70.5\%$ mAP on PASCAL VOC 2007, outpacing SSD+SqueezeNet and SSD+MobileNet, while maintaining a compact $28$ MB size ($7.13$M parameters). Throughput is measured at $31.7$ FPS on a mainstream low-power CPU and $39.8$ FPS on an integrated GPU in FP16 mode.
| Design Choice | Params (M) | MACs (G) | VOC 2007 mAP (%) |
|---|---|---|---|
| Base | 7.94 | 2.59 | 68.5 |
| +WFM | 6.77 | 2.43 | 68.9 |
| +WFM+DRMD | 7.13 | 2.67 | 69.1 |
| +WFM+DRMD+NDM | 7.13 | 2.67 | 70.5 |
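As referenced in the scale list above, the two largest detection scales stack two WFMs under a residual connection. Below is a minimal sketch of one plausible wiring, reusing the hypothetical `WideFireModule` class from Section 1; the channel-preserving configuration is what makes an identity shortcut possible.

```python
import torch.nn as nn

class ResidualWFMStack(nn.Module):
    """Two stacked WFMs with an identity shortcut (illustrative wiring;
    assumes the WideFireModule sketch defined earlier)."""
    def __init__(self, ch=512, squeeze=128, expand=256):
        super().__init__()
        # The concatenated expand output (expand + expand) must equal the
        # input channel count for the element-wise shortcut to be valid.
        assert 2 * expand == ch
        self.wfm1 = WideFireModule(ch, squeeze, expand, expand)
        self.wfm2 = WideFireModule(ch, squeeze, expand, expand)

    def forward(self, x):
        return x + self.wfm2(self.wfm1(x))
```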
5. Ablation Analysis and Optimal Group Size Selection
Experimental ablation supports the chosen group sizes. Within the WFM, $g_{1\times1} = 2$ for the 1×1 path and $g_{3\times3} = 16$ for the 3×3 path are empirically optimal. Increasing the group counts further reduces cross-channel mixing and degrades accuracy, while smaller group counts yield less computational saving. The adopted configuration also balances capacity across the two expand paths: with $e_{1\times1} = e_{3\times3}$, the 1×1 path carries $s_{1\times1} \cdot e_{1\times1} / 2$ parameters against $9 \cdot s_{1\times1} \cdot e_{3\times3} / 16$ for the 3×3 path, a nearly equal split ($16,384$ vs. $18,432$ in the example above). This design supports improved feature learning without excess resource consumption.
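The computational side of this trade-off is easy to tabulate from the parameter formula; the loop below does so for the 3×3 path (the accuracy effects are the paper's empirical findings and are not reproduced here):

```python
# 3x3 expand-path parameter count versus group size, at s=128, e3=256.
s, e3 = 128, 256
for g in (1, 2, 4, 8, 16, 32):
    print(f"g={g:2d}: {9 * (s // g) * e3:6d} parameters")
# g=16 yields 18432; doubling to g=32 saves only ~9k more parameters
# while further restricting cross-channel information mixing.
```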
6. Practical Impact and Applicability
The WFM’s efficiency gains have a pronounced effect on both real-time performance and overall resource requirements, which is particularly critical for edge devices where compute and thermal budgets are stringent. Fire SSD, using the WFM throughout the backbone and detection heads, is approximately six times faster than SSD300 with only a $6.7$ mAP-point decrease, enabling deployment scenarios (e.g., video surveillance) previously impractical for full-scale CNN object detectors (Liau et al., 2018).
A plausible implication is that the WFM’s plug-and-play nature makes it readily adaptable for any Fire-based backbone or multi-scale head within the wider family of object detectors where parameter and compute efficiency are required.
7. Conclusion
The Wide Fire Module replaces standard expansion layers with group convolutions of finely tuned cardinalities, yielding superior accuracy-to-cost and speed-to-size ratios for object detection models on edge hardware. The WFM enhances the representational capacity per MAC and per parameter relative to the original Fire module, achieving both substantial reductions in resource consumption and small gains in detection accuracy. Its architecture has direct implications for the development of highly efficient yet accurate CNN models, especially for deployment in constrained environments (Liau et al., 2018).