An Analysis of "SBNet: Sparse Blocks Network for Fast Inference"
The paper "SBNet: Sparse Blocks Network for Fast Inference" proposes an approach to reducing the computational cost of deep convolutional neural networks (CNNs) on high-resolution inputs, focusing on applications such as LiDAR-based 3D object detection. SBNet uses computation masks to restrict convolution to a sparse set of rectangular blocks, guiding computation toward regions of interest. This addresses the challenge of achieving practical wall-clock speed-ups while maintaining high accuracy in real-time applications.
Key Innovations and Methodology
SBNet decomposes the input feature map into rectangular blocks and exploits structured sparsity at the block level. The primary innovations can be summarized as follows:
- Tiling-based Sparse Convolution: SBNet implements a tiling-based sparse convolution in which computation masks determine which block locations are active. Inactive blocks are skipped entirely, eliminating redundant computation and focusing resources on regions likely to contain relevant information.
- Integration with Residual Networks: The sparse block approach integrates seamlessly with popular residual architectures (ResNet), increasing inference speed without compromising resolution or model capacity.
- Efficient Gather and Scatter Operations: SBNet uses custom CUDA kernels implementing efficient gather and scatter operations aligned with the block structure. This design choice is fundamental to achieving the practical speed-ups that earlier sparse CNN implementations failed to deliver.
- Fused Kernels: Fusing the reduce-mask-to-indices, gather, transpose, and scatter operations into single kernels minimizes launch and memory-transfer overhead, contributing substantial wall-clock gains for real-time deployment.
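The pipeline above (reduce mask to block indices, gather active tiles with a halo, convolve densely, scatter results back) can be sketched in plain NumPy. This is a minimal single-channel illustration of the idea, not the authors' fused CUDA implementation; the function names and the `block`/`threshold` parameters are hypothetical choices for this sketch.

```python
import numpy as np

def reduce_mask_to_indices(mask, block, threshold=0.0):
    """Return top-left (row, col) coordinates of blocks whose mask
    activity exceeds the threshold (a stand-in for the paper's
    reduce-mask-to-indices step)."""
    H, W = mask.shape
    indices = []
    for i in range(0, H, block):
        for j in range(0, W, block):
            if mask[i:i + block, j:j + block].mean() > threshold:
                indices.append((i, j))
    return indices

def sparse_block_conv(x, mask, kernel, block=8):
    """3x3 'same' convolution (cross-correlation, as in deep learning)
    computed only on active blocks.

    x: (H, W) feature map; mask: (H, W) binary computation mask.
    Each tile is gathered with a 1-pixel halo so the result inside an
    active block matches the dense convolution exactly.
    """
    H, W = x.shape
    pad = 1  # kernel radius for a 3x3 kernel
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for (i, j) in reduce_mask_to_indices(mask, block):
        h = min(block, H - i)   # handle edge blocks
        w = min(block, W - j)
        # Gather: slice out the tile plus its halo.
        tile = xp[i:i + h + 2 * pad, j:j + w + 2 * pad]
        # Dense convolution on the small tile.
        res = np.zeros((h, w))
        for di in range(3):
            for dj in range(3):
                res += kernel[di, dj] * tile[di:di + h, dj:dj + w]
        # Scatter: write the result back at the block's location.
        out[i:i + h, j:j + w] = res
    return out
```

With a fully active mask this reproduces the dense result; with a sparse mask, the cost scales with the number of active blocks rather than the full resolution, which is the source of SBNet's speed-up. The halo overlap is what lets each block be processed independently without boundary artifacts.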
Empirical Evaluation and Results
The authors conducted experiments on LiDAR-based 3D object detection tasks to evaluate the performance of SBNet. Significant speed-ups were reported when employing SBNet compared to dense convolution, particularly when high-resolution data and dense models would have otherwise led to computational inefficiencies. Key results include:
- Speed-ups: SBNet demonstrates average speed-ups of 1.8x and 2.7x when using static road maps and dynamically predicted masks, respectively.
- Detection Accuracy: Remarkably, SBNet maintains high detection accuracy, even showing slight improvements in some configurations compared to dense baselines, highlighting its utility in processing large sparse inputs efficiently.
- Comparison with Existing Methods: SBNet outperforms unstructured sparse methods such as submanifold sparse convolutional networks in both speed and practicality, especially on high-resolution inputs.
Implications and Future Directions
The SBNet framework provides a viable pathway for leveraging sparsity in convolutional layers to reduce computational costs, addressing a crucial bottleneck in deploying deep learning models in resource-constrained or real-time environments. The structured, block-based approach presents several advantages over unstructured sparsity, particularly in tasks where spatial information and computation costs can be optimized by focusing on regions of interest.
Future research could extend the SBNet framework by exploring variable block shapes and sizes to further tailor the sparsity representation to specific domains or task requirements. Combining SBNet with other model compression strategies such as pruning or quantization could yield further efficiency gains. Moreover, since a predicted computation mask acts as a form of hard spatial attention, the approach suggests natural integrations with attention mechanisms for tasks that require context-sensitive computation.
In conclusion, SBNet offers a promising direction for efficient inference in deep learning, combining novel methodological advances with empirical validations that underscore its potential impact in the domain of computer vision and beyond. The ability to maintain performance while achieving real-time compatibility paves the way for broader applicability in industrial and research settings.