MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching (2108.09770v1)

Published 22 Aug 2021 in cs.CV

Abstract: Recent methods in stereo matching have continuously improved the accuracy using deep models. This gain, however, is attained with a high increase in computation cost, such that the network may not fit even on a moderate GPU. This issue raises problems when the model needs to be deployed on resource-limited devices. For this, we propose two light models for stereo vision with reduced complexity and without sacrificing accuracy. Depending on the dimension of cost volume, we design a 2D and a 3D model with encoder-decoders built from 2D and 3D convolutions, respectively. To this end, we leverage 2D MobileNet blocks and extend them to 3D for stereo vision application. Besides, a new cost volume is proposed to boost the accuracy of the 2D model, making it perform close to 3D networks. Experiments show that the proposed 2D/3D networks effectively reduce the computational expense (27%/95% and 72%/38% fewer parameters/operations in 2D and 3D models, respectively) while upholding the accuracy. Our code is available at https://github.com/cogsys-tuebingen/mobilestereonet.

Citations (66)

Summary

  • The paper introduces lightweight 2D and 3D stereo networks using MobileNet extensions to significantly reduce computational complexity while maintaining competitive accuracy.
  • It employs an innovative interlaced cost volume that effectively aggregates left and right image features for improved stereo matching performance.
  • Experimental results on SceneFlow and KITTI 2015 demonstrate up to 16.6x fewer parameters and 3.9x lower computational demands compared to state-of-the-art models.

An Overview of MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching

The paper, titled "MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching," addresses the computational cost of stereo vision by developing two lightweight models: a 2D and a 3D network. Stereo matching is a key technique in depth perception that leverages disparities between binocular images. With the rapid advances in deep learning, stereo matching accuracy has improved significantly, but at the cost of increased computational demands that often exceed the capacity of a moderate GPU. Consequently, deploying such models on resource-constrained devices remains a formidable challenge.
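As background (not a derivation from the paper), the disparity maps predicted by such networks relate to metric depth through the standard rectified-stereo relation Z = f * B / d, where f is the focal length in pixels and B is the camera baseline. A minimal sketch, with illustrative variable names:

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Standard rectified-stereo relation Z = f * B / d.

    Background formula only; the names and the epsilon guard are
    illustrative, not taken from the paper's code.
    """
    return focal_px * baseline_m / (disparity_px + eps)
```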

Key Contributions and Methodologies

The authors propose an innovative approach using MobileNet blocks to achieve lightweight yet accurate stereo networks. The key methodological contributions of the paper can be highlighted as follows:

  1. Lightweight Architecture Through MobileNet Extensions: By extending MobileNet-V1 and MobileNet-V2 blocks to 3D, the authors build 2D and 3D networks that significantly reduce computational load without compromising accuracy. These blocks rely on depth-wise and point-wise convolutions, which require far fewer operations and parameters than standard convolutions (a minimal 3D block sketch follows this list).
  2. Interlaced Cost Volume Construction: The paper introduces a learnable cost volume that aggregates the left and right unary features by interlacing them, improving stereo matching performance in the 2D model. This makes the cost volume construction itself parameterized, enhancing the network's capacity to learn the matching (see the cost-volume sketch after this list).
  3. Architecture Optimization and Implementation: Through exhaustive experiments, various network components were substituted with MobileNet blocks to arrive at an architecture that balances accuracy and computational efficiency, culminating in 2D-MobileStereoNet and 3D-MobileStereoNet.
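The two ideas above can be made concrete with short PyTorch sketches. The first shows a MobileNet-V2-style inverted residual block built from 3D convolutions, as used in the 3D encoder-decoder; the expansion ratio, kernel size, and normalization choices here are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class InvertedResidual3D(nn.Module):
    """MobileNet-V2-style block with 3D convolutions (illustrative sketch)."""

    def __init__(self, in_ch, out_ch, stride=1, expansion=2):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # point-wise expansion
            nn.Conv3d(in_ch, hidden, kernel_size=1, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # depth-wise 3x3x3 convolution (one filter per channel)
            nn.Conv3d(hidden, hidden, kernel_size=3, stride=stride,
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # point-wise linear projection
            nn.Conv3d(hidden, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm3d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```

The second sketches one plausible reading of the interlaced cost volume: for every candidate disparity, the shifted right features are interleaved with the left features along the channel dimension, so the subsequent learnable filters see matched left/right channels side by side. The interleaving order and the zero-filling of out-of-frame pixels are assumptions, not a verified reimplementation.

```python
def interlaced_cost_volume(left_feat, right_feat, max_disp):
    """Interleave left/right feature channels per disparity (illustrative).

    left_feat, right_feat: [B, C, H, W] unary features.
    Returns a volume of shape [B, 2*C, max_disp, H, W].
    """
    b, c, h, w = left_feat.shape
    volume = left_feat.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        left_slice = left_feat[:, :, :, d:]
        right_slice = right_feat[:, :, :, : w - d]
        # stack then flatten so channels alternate L0, R0, L1, R1, ...
        pair = torch.stack((left_slice, right_slice), dim=2)
        volume[:, :, d, :, d:] = pair.reshape(b, 2 * c, h, w - d)
    return volume
```

In the 2D model, the disparity dimension would then be folded into the channel dimension so that 2D encoder-decoders can filter the volume; that folding step is omitted in the sketch above.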

Experimental Results and Performance Evaluation

The proposed models were evaluated on the SceneFlow and KITTI 2015 datasets. Both models demonstrated a significant reduction in parameters and operations compared to existing methods:

  • 2D-MobileStereoNet: Uses 16.6x fewer parameters than contemporary 2D models such as AutoDispNet-C while maintaining competitive accuracy, a notable reduction in computational demands.
  • 3D-MobileStereoNet: Performs comparably to state-of-the-art 3D models with a drastic decrease in complexity. Compared with GwcNet-gc, it requires 3.9x fewer GigaMACs and only 1.77 million parameters, 3.9x fewer than the baseline (a parameter-counting sketch follows this list).
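For reference, parameter figures like the 1.77 million quoted above are typically obtained by summing tensor sizes over a model; a minimal sketch for a PyTorch model follows. MAC counts additionally depend on the input resolution fed to the network, which the summary above does not restate, so that part is only noted in a comment.

```python
def count_parameters_millions(model):
    """Trainable parameter count in millions for a PyTorch nn.Module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# GigaMAC figures are usually measured with a profiling tool at a fixed
# input resolution rather than computed by hand.
```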

Qualitative evaluations further affirm that the results of these models are visually consistent with the ground truth disparities.

Implications and Future Speculations

The implications of this research are substantial, offering a pathway for deploying stereo vision models on edge devices thanks to their modest memory and computational requirements. The approach of parameterizing the cost volume and leveraging the MobileNet architecture points to future trends in AI for meeting the constraints imposed by embedded systems.

Theoretically, the extension and adaptation of MobileNet blocks, alongside the new cost volume construction strategy, could inform similar adaptations in other areas of deep learning. The paper lays a foundation for low-resource deep networks in varied stereo vision applications, particularly autonomous systems and mobile robotics, where computational efficiency and real-time processing are hard constraints.

As the field evolves, lightweight networks that do not sacrifice performance will likely become a cornerstone of AI systems that are both environmentally and operationally sustainable. Further research could explore integrating these methods with other sensing modalities or extending them to more complex vision tasks.
