
FastDepth: Fast Monocular Depth Estimation on Embedded Systems (1903.03273v1)

Published 8 Mar 2019 in cs.CV and cs.RO

Abstract: Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve similar accuracy as prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors' knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.

Citations (278)

Summary

  • The paper presents an efficient encoder-decoder architecture, FastDepth, that significantly reduces computational complexity for real-time depth estimation on embedded devices.
  • It employs depthwise separable convolutions, simple upsampling, and NetAdapt pruning to optimize performance without sacrificing accuracy.
  • Deployment on the NVIDIA Jetson TX2 demonstrates practical utility by achieving 178 fps and maintaining under 10 W power consumption for robotics applications.

FastDepth: Fast Monocular Depth Estimation on Embedded Systems

The paper "FastDepth: Fast Monocular Depth Estimation on Embedded Systems" presents an efficient solution for real-time depth estimation on embedded platforms, which is critical for various robotic applications. The authors highlight the challenges posed by current state-of-the-art methods, which are often computationally intensive and unsuitable for real-time processing on constrained hardware. This work is centered around developing an efficient encoder-decoder architecture specifically optimized for low-power embedded systems such as micro aerial vehicles.

Key Contributions

  • Efficient Network Architecture: The authors introduce a lightweight encoder-decoder network called FastDepth. The encoder utilizes MobileNet, known for its efficiency due to depthwise separable convolutions, while the decoder employs a design focused on low latency. By using techniques like depthwise decomposition and simple upsampling methods (nearest-neighbor interpolation followed by depthwise separable convolution), the network reduces computational complexity significantly.
  • Network Pruning: A state-of-the-art pruning algorithm, NetAdapt, is employed to further streamline the network, systematically removing redundancies and achieving additional speedup without significant accuracy loss.
  • Deployment on Embedded Platforms: The paper demonstrates the deployment on an NVIDIA Jetson TX2, achieving 178 fps using the GPU and maintaining active power consumption under 10 W, which is practical for real-world robotic applications where resources are shared among multiple tasks.
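The decoder building block described above — nearest-neighbor upsampling followed by a depthwise separable convolution — can be sketched as follows. This is an illustrative NumPy reimplementation under stated assumptions, not the authors' code; the function names and shapes are hypothetical:

```python
import numpy as np

def nearest_neighbor_upsample(x, scale=2):
    """Upsample a (C, H, W) feature map by repeating pixels."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise separable convolution on a (C, H, W) feature map.

    dw_kernels: (C, k, k) -- one spatial filter per input channel
    pw_weights: (C_out, C) -- 1x1 pointwise mixing across channels
    """
    c, h, w = x.shape
    k = dw_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # Depthwise stage: each channel is convolved with its own k x k filter,
    # which is where the big reduction in multiply-adds comes from.
    dw = np.zeros_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                dw[ch, i, j] = np.sum(xp[ch, i:i+k, j:j+k] * dw_kernels[ch])
    # Pointwise stage: a 1x1 convolution mixes information across channels.
    return np.einsum('oc,chw->ohw', pw_weights, dw)

def decoder_upsample_block(x, dw_kernels, pw_weights, scale=2):
    """One decoder stage: nearest-neighbor upsampling, then a
    depthwise separable convolution to refine the upsampled features."""
    return depthwise_separable_conv(
        nearest_neighbor_upsample(x, scale), dw_kernels, pw_weights)
```

Compared with transposed convolutions, this ordering keeps the upsampling itself free of learned parameters, concentrating the (already cheap) computation in the separable convolution.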

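The NetAdapt pruning step can be caricatured as a greedy loop: repeatedly shrink whichever single layer costs the least accuracy until a latency budget is met. The sketch below is a deliberate simplification (NetAdapt itself uses empirical latency lookup tables and short fine-tuning between iterations); all names and callbacks here are hypothetical:

```python
def netadapt_style_prune(layer_channels, measure_latency, measure_accuracy,
                         latency_budget, step=1):
    """Greedy, layer-wise pruning loop in the spirit of NetAdapt.

    layer_channels: list of channel counts per layer
    measure_latency/measure_accuracy: callbacks evaluating a configuration
    Each iteration tries shrinking every layer by `step` channels and keeps
    the single change that best preserves accuracy, until the budget is met.
    """
    channels = list(layer_channels)
    while measure_latency(channels) > latency_budget:
        best = None
        for i in range(len(channels)):
            if channels[i] <= step:
                continue  # cannot shrink this layer further
            trial = channels.copy()
            trial[i] -= step
            acc = measure_accuracy(trial)
            if best is None or acc > best[0]:
                best = (acc, trial)
        if best is None:
            break  # nothing left to prune
        channels = best[1]
    return channels
```

With toy callbacks (latency = total channels, accuracy favoring balanced layers), the loop trims channels one step at a time until the budget is satisfied.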
Numerical and Performance Insights

The proposed solution showcases significant improvements in computational efficiency while maintaining competitive accuracy. FastDepth achieves a δ₁ accuracy of 77.1% on the NYU Depth v2 dataset, aligning closely with existing state-of-the-art methods but with an impressive increase in throughput. The methodology behind optimizing and pruning the network to suit embedded platforms results in FastDepth running an order of magnitude faster than prior approaches, attesting to its suitability for real-time applications.
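The δ₁ figure cited above is the standard depth-estimation accuracy metric: the fraction of pixels whose predicted depth is within a factor of 1.25 of the ground truth. A minimal NumPy implementation:

```python
import numpy as np

def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) < threshold.

    pred, gt: arrays of positive depth values (same shape).
    With threshold=1.25 this is the delta_1 metric; delta_2 and delta_3
    use 1.25**2 and 1.25**3 respectively.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < threshold))
```

For example, a prediction of 2.4 m against a ground truth of 2.0 m passes (ratio 1.2), while 2.6 m fails (ratio 1.3).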

Theoretical and Practical Implications

From a theoretical perspective, this work advances the application domain of neural networks beyond large, powerful computing systems, prompting further exploration into efficient model designs for resource-constrained environments. The practical implications are substantial, particularly in the robotics and autonomous systems sectors, where real-time perception is paramount.

The success of FastDepth suggests potential for broader applications in areas such as mobile computing and edge AI, where power efficiency and processing speed are critical constraints.

Future Directions

Given the demonstrated efficacy of FastDepth on embedded platforms, future work could explore further optimizations through the integration of quantization techniques or advanced neural architecture search methods tailored for specific hardware. Another direction entails expanding the applications of FastDepth to other perception tasks requiring dense outputs, potentially using a similar architectural framework.

Overall, "FastDepth: Fast Monocular Depth Estimation on Embedded Systems" provides a robust framework for deploying deep learning models in real-time applications on constrained hardware, marking a significant step towards making intelligent robotic systems both practical and efficient.