
HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions (1908.03888v1)

Published 11 Aug 2019 in cs.CV, cs.LG, and eess.IV

Abstract: MobileNets, a class of top-performing convolutional neural network architectures in terms of accuracy and efficiency trade-off, are increasingly used in many resource-aware vision applications. In this paper, we present Harmonious Bottleneck on two Orthogonal dimensions (HBO), a novel architecture unit, specially tailored to boost the accuracy of extremely lightweight MobileNets at the level of less than 40 MFLOPs. Unlike existing bottleneck designs that mainly focus on exploring the interdependencies among the channels of either groupwise or depthwise convolutional features, our HBO improves bottleneck representation while maintaining similar complexity via jointly encoding the feature interdependencies across both spatial and channel dimensions. It has two reciprocal components, namely spatial contraction-expansion and channel expansion-contraction, nested in a bilaterally symmetric structure. The combination of two interdependent transformations performing on orthogonal dimensions of feature maps enhances the representation and generalization ability of our proposed module, guaranteeing compelling performance with limited computational resource and power. By replacing the original bottlenecks in MobileNetV2 backbone with HBO modules, we construct HBONets which are evaluated on ImageNet classification, PASCAL VOC object detection and Market-1501 person re-identification. Extensive experiments show that with the severe constraint of computational budget our models outperform MobileNetV2 counterparts by remarkable margins of at most 6.6%, 6.3% and 5.0% on the above benchmarks respectively. Code and pretrained models are available at https://github.com/d-li14/HBONet.

Authors (3)
  1. Duo Li (31 papers)
  2. Aojun Zhou (45 papers)
  3. Anbang Yao (33 papers)
Citations (37)

Summary

An Analytical Review of HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions

The paper presents "HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions," a novel convolutional neural network (CNN) architecture unit aimed at optimizing efficiency and accuracy, particularly for highly resource-constrained environments (< 40 MFLOPs). This work addresses the limitations of existing lightweight architectures by introducing a dual-faceted bottleneck design that simultaneously considers spatial and channel dimensions—an aspect that has received limited attention in prior architectures like MobileNets and ShuffleNets.

Core Contributions

  1. Harmonious Bottleneck Design: The foundational innovation is the Harmonious Bottleneck (HBO), which incorporates two intertwined processes: spatial contraction-expansion and channel expansion-contraction. Unlike conventional bottleneck designs that largely focus on intrachannel dependencies, the HBO unit manages these dependencies across both spatial and channel dimensions through a bilaterally symmetric structure. This integration allows the bottleneck to enhance representational power with minimal computational overhead.
  2. Integration into MobileNetV2: By replacing standard bottlenecks in the MobileNetV2 architecture with the proposed HBO units, the authors construct HBONet, which demonstrates superior performance metrics. Under severe computational budgets (on the order of 40 MFLOPs), HBONet outperforms its MobileNetV2 counterparts by margins of up to 6.6% on ImageNet classification, 6.3% on PASCAL VOC object detection, and 5.0% on Market-1501 person re-identification.
  3. Performance Gains Across Benchmarks: The experimental results are noteworthy for significant performance improvements. For instance, HBONet achieves a top-1 classification accuracy of 73.1% on ImageNet under 300 MFLOPs, surpassing other state-of-the-art architectures like ShuffleNetV2 and the original MobileNetV2.

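To make the harmonious bottleneck idea concrete, the following is a minimal PyTorch sketch of the structure described above, not the authors' exact implementation (see their repository for that): a MobileNetV2-style channel expansion-contraction bottleneck nested inside a spatial contraction-expansion pair. The module name `HBOSketch`, the use of average pooling for contraction, and bilinear interpolation for expansion are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HBOSketch(nn.Module):
    """Illustrative sketch of a harmonious bottleneck: a channel
    expansion-contraction bottleneck wrapped in a spatial
    contraction-expansion pair (bilaterally symmetric structure)."""

    def __init__(self, in_ch: int, out_ch: int, expand_ratio: int = 2):
        super().__init__()
        mid = in_ch * expand_ratio
        # Channel expansion-contraction: 1x1 expand, 3x3 depthwise, 1x1 contract.
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),  # depthwise
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Spatial contraction: halve the resolution before the costly bottleneck.
        y = F.avg_pool2d(x, 2)
        y = self.bottleneck(y)
        # Spatial expansion: restore the original resolution afterwards.
        return F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False)

x = torch.randn(1, 16, 32, 32)
module = HBOSketch(16, 16)
print(module(x).shape)  # torch.Size([1, 16, 32, 32])
```

Because the depthwise and pointwise convolutions run at a quarter of the input's spatial size, the bottleneck's cost drops roughly fourfold relative to operating at full resolution, which is the core mechanism behind the complexity savings the summary describes.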
Implications and Future Directions

The dual-dimensional bottleneck strategy proposed in this work illustrates a promising direction for CNN architecture optimization. By efficiently managing computational complexity through simultaneous channel and spatial transformations, HBONet provides a balanced trade-off between accuracy and resource consumption. This methodology is particularly valuable for deploying CNNs on mobile and other resource-limited platforms where power efficiency is paramount.
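The quadratic relationship between spatial resolution and convolution cost can be checked with a short back-of-the-envelope calculation. The sketch below counts multiply-adds for a MobileNetV2-style inverted bottleneck at full and at half resolution; the feature-map sizes and expansion ratio are arbitrary illustrative values, not figures from the paper.

```python
def conv_flops(h: int, w: int, c_in: int, c_out: int, k: int = 1, groups: int = 1) -> int:
    """Multiply-adds for a k x k convolution producing an h x w output."""
    return h * w * (c_in // groups) * c_out * k * k

def inverted_bottleneck_flops(h: int, w: int, c: int, t: int) -> int:
    """1x1 expand (ratio t), 3x3 depthwise, 1x1 contract at resolution h x w."""
    mid = c * t
    return (conv_flops(h, w, c, mid)                       # 1x1 channel expansion
            + conv_flops(h, w, mid, mid, k=3, groups=mid)  # 3x3 depthwise
            + conv_flops(h, w, mid, c))                    # 1x1 channel contraction

h = w = 28
full = inverted_bottleneck_flops(h, w, c=32, t=6)
half = inverted_bottleneck_flops(h // 2, w // 2, c=32, t=6)
print(full / half)  # 4.0
```

Every term scales linearly with the number of spatial positions, so halving height and width cuts the bottleneck's cost by exactly 4x, leaving budget that an HBO-style unit can spend on richer transformations at the contracted resolution.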

The theoretical and practical implications of this work suggest several avenues for future research:

  • Broader Application Settings: Extending the HBO architecture to other deep learning models and tasks could validate its adaptability and efficacy in diverse contexts, including more computationally intense environments.
  • Exploration of Complex Contraction-Expansion Dynamics: The practical performance enhancements invite further exploration into more complex or adaptive spatial and channel transformations, potentially involving dynamic or attention-based mechanisms to adjust transformations in real-time.
  • Automated Neural Architecture Search: Integrating the HBO design into NAS techniques can further exploit its efficiency, potentially leading to even more optimized architectures discovered through automated search processes.

The HBONet architecture adds a compelling dimension to the lightweight CNN domain, presenting an innovative approach to balancing computational and representational efficiency. As a methodological advance, its contributions are poised to influence ongoing and future explorations in efficient CNN design and deployment.
