Lite-HRNet: A Lightweight High-Resolution Network (2104.06403v1)

Published 13 Apr 2021 in cs.CV

Abstract: We present an efficient high-resolution network, Lite-HRNet, for human pose estimation. We start by simply applying the efficient shuffle block in ShuffleNet to HRNet (high-resolution network), yielding stronger performance over popular lightweight networks, such as MobileNet, ShuffleNet, and Small HRNet. We find that the heavily-used pointwise (1x1) convolutions in shuffle blocks become the computational bottleneck. We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks. The complexity of channel weighting is linear w.r.t the number of channels and lower than the quadratic time complexity for pointwise convolutions. Our solution learns the weights from all the channels and over multiple resolutions that are readily available in the parallel branches in HRNet. It uses the weights as the bridge to exchange information across channels and resolutions, compensating the role played by the pointwise (1x1) convolution. Lite-HRNet demonstrates superior results on human pose estimation over popular lightweight networks. Moreover, Lite-HRNet can be easily applied to semantic segmentation task in the same lightweight manner. The code and models have been publicly available at https://github.com/HRNet/Lite-HRNet.

Summary

The paper introduces a novel lightweight architecture that replaces costly convolutions with a conditional channel weighting mechanism.
It achieves efficient multi-resolution information exchange, balancing high-resolution accuracy with reduced computational demands.
Empirical results on COCO, MPII, and Cityscapes demonstrate improved mAP, PCKh, and mIoU compared to other lightweight models.

Lite-HRNet: A Lightweight High-Resolution Network

The paper "Lite-HRNet: A Lightweight High-Resolution Network" introduces an efficient architecture for human pose estimation that also extends to semantic segmentation. It targets the main computational bottleneck that appears when the high-resolution network (HRNet) is combined with lightweight shuffle blocks: the heavily used pointwise (1x1) convolutions.

Introduction and Motivation

Human pose estimation relies on high-resolution representations for accurate results. HRNet provides such representations in large models, but deploying it in resource-constrained environments has been challenging. The objective is a model that balances accuracy with computational efficiency, a critical requirement in real-world systems.

Core Contributions

Replacement of Convolutional Operations: The paper identifies the pointwise ($1\times 1$) convolutions in shuffle blocks as the computational bottleneck and replaces them with a lightweight conditional channel weighting unit. The cost of channel weighting is linear in the number of channels, whereas the cost of a pointwise convolution is quadratic.
Efficient Information Exchange Across Resolutions: The weights are learned from all channels across the parallel multi-resolution branches that HRNet already maintains, and are then applied to each branch. They act as a bridge for exchanging information across channels and resolutions, which is the role the pointwise convolutions previously played.
Empirical Validation and Performance: Lite-HRNet is benchmarked against popular lightweight networks such as MobileNet, ShuffleNet, and Small HRNet. It shows superior results in human pose estimation on the COCO and MPII datasets, indicating a better accuracy-complexity trade-off.
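The weighting idea described above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: the function names, the two-layer structure, and the choice to pool each channel down to a single value are assumptions made for brevity, and the released code may differ (for instance, by keeping some spatial structure when computing the weights).

```python
import math


def global_average(feature_map):
    """Mean of a 2-D feature map given as a list of rows."""
    total = sum(sum(row) for row in feature_map)
    count = sum(len(row) for row in feature_map)
    return total / count


def dense(vector, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_resolution_weighting(branches, w1, b1, w2, b2):
    """Scale every channel of every branch by a weight computed from all branches.

    branches: list of branches; each branch is a list of channels and each
              channel is a 2-D list of floats. Branches may differ in size.
    w1, b1, w2, b2: hypothetical parameters of a small two-layer network.
    """
    # One descriptor value per channel, concatenated over all resolutions.
    descriptor = [global_average(ch) for branch in branches for ch in branch]
    # Two small layers (ReLU, then sigmoid) yield one weight per channel.
    hidden = [max(0.0, h) for h in dense(descriptor, w1, b1)]
    weights = [sigmoid(z) for z in dense(hidden, w2, b2)]
    # Apply the weights element-wise: one multiply per activation.
    out, i = [], 0
    for branch in branches:
        scaled = []
        for ch in branch:
            scaled.append([[weights[i] * v for v in row] for row in ch])
            i += 1
        out.append(scaled)
    return out, weights


# Toy input: a 2-channel 2x2 branch and a 1-channel 1x1 branch.
high = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
low = [[[5.0]]]
w1 = [[0.1, 0.1, 0.1], [0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b2 = [0.0, 0.0, 0.0]
scaled, weights = cross_resolution_weighting([high, low], w1, b1, w2, b2)
print(weights)
```

Because every weight depends on the descriptors of all branches, a change in the low-resolution branch can alter how the high-resolution channels are scaled, which is the cross-resolution exchange the paper attributes to its weighting unit.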

Numerical Results and Implications

Human Pose Estimation on COCO: With input sizes of $256\times 192$ and $384\times 288$, Lite-HRNet-30 achieves higher mAP than other lightweight models while requiring fewer GFLOPs.
MPII Dataset Performance: Lite-HRNet-18 and Lite-HRNet-30 improve PCKh scores over MobileNetV2 and ShuffleNetV2, demonstrating the network's adaptability across datasets.
Semantic Segmentation: Beyond pose estimation, Lite-HRNet shows promise in semantic segmentation, notably on the Cityscapes dataset, where it achieves mIoU scores competitive with NAS-based and handcrafted networks.

Theoretical and Practical Implications

Using conditional channel weighting as a substitute for costly pointwise convolutions could inspire similar optimizations in other vision tasks. Beyond lowering computational cost, the approach points toward efficient model designs for edge computing and mobile devices, where resources are limited.
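To make the cost argument concrete, the sketch below counts multiply-accumulate operations (MACs) for a pointwise convolution and for element-wise channel weighting on the same feature map. The function names, the 64x48 map size, and the channel widths are illustrative assumptions, and the count deliberately ignores the (smaller) cost of computing the weights themselves.

```python
def pointwise_conv_macs(height, width, channels):
    """MACs for a 1x1 convolution mapping `channels` inputs to `channels` outputs."""
    return height * width * channels * channels


def channel_weighting_macs(height, width, channels):
    """MACs for multiplying each activation by one weight (element-wise)."""
    return height * width * channels


# Illustrative sizes only: a 64x48 feature map at several channel widths.
for c in (40, 80, 160):
    conv = pointwise_conv_macs(64, 48, c)
    weighting = channel_weighting_macs(64, 48, c)
    print(f"C={c}: 1x1 conv {conv:,} MACs, weighting {weighting:,} MACs, "
          f"ratio {conv // weighting}")
```

Under these assumptions the ratio between the two costs equals the channel count, which is the linear-versus-quadratic gap the abstract describes.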

The method's success suggests broader applicability in domains where computational efficiency must coexist with accuracy. The reduced complexity could benefit real-time applications such as autonomous driving and AR systems.

Future Directions

Directions for further work include refining the channel weighting mechanism and applying it to architectures other than HRNet. Integrating such lightweight designs in other domains could lead to broader adoption and further performance gains.

In conclusion, Lite-HRNet is a meaningful advance in lightweight architectures, balancing accuracy against computational cost. The work lays a foundation for continued research on optimizing high-resolution networks for constrained environments.

GitHub

HRNet/Lite-HRNet: the official PyTorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.