- The paper introduces a taxonomy of deep learning methods for omnidirectional images, emphasizing spherical convolutions and tailored projection techniques.
- It examines challenges from distortions in ERP, CP, and tangent projections while presenting geometry-aware filters and graph-based convolutions as key solutions.
- The survey highlights practical applications in VR, autonomous driving, and scene reconstruction, and outlines future research directions such as adversarial robustness and multi-modal fusion.
Deep Learning for Omnidirectional Vision: A Survey and New Perspectives
This paper presents an extensive survey and analysis of recent advancements in deep learning (DL) methods tailored for omnidirectional vision, focusing especially on omnidirectional images (ODIs). Omnidirectional image data, captured with a 360° × 180° field of view, offers a complete view of the surrounding environment, far exceeding the field of view of conventional pinhole cameras. This completeness has attracted growing interest because of its substantial advantages in applications such as autonomous driving and virtual reality.
Key Insights and Contributions
- Omnidirectional Imaging Techniques: The paper introduces the principles of omnidirectional imaging and the challenges posed by the projection formats used to represent ODIs on a plane, such as equirectangular projection (ERP), cubemap projection (CP), and tangent projections. Each projection introduces distortion, which complicates the direct application of traditional image processing techniques (see the ERP coordinate-mapping sketch after this list). The survey emphasizes methods such as spherical convolution and geometry-aware filters developed specifically to handle these distortions.
- Taxonomy of DL Methods: This survey proposes a comprehensive taxonomy that categorizes DL approaches for omnidirectional vision based on several criteria:
  - Convolution Techniques: Discusses planar projection-based and spherical convolution approaches, highlighting works like SphereNet and the adoption of graph-based convolutional networks (GCNs) for ODIs (see the tangent-plane sampling sketch after this list).
  - Applications: Segments the use cases into categories such as scene understanding, manipulation tasks (image generation, completion, and synthesis), and visual quality assessment.
  - Datasets and Training Techniques: The scarcity of labeled ODI data is a significant bottleneck. The paper documents existing datasets and examines training methodologies, including unsupervised learning, semi-supervised learning, and domain adaptation, to bridge the gap caused by limited annotations.
- Practical Applications: The survey examines the impact of ODIs on various applications:
  - VR/AR and Autonomous Driving: ODIs enhance immersive experiences by providing comprehensive environmental data for real-time navigation and interaction. Autonomous driving benefits from ODIs' ability to capture the full surrounding scene, improving situational awareness.
  - Object Detection and Scene Reconstruction: The paper covers room layout estimation and scene reconstruction, leveraging the spatial completeness of ODIs for improved accuracy.
- Future Directions: The authors outline potential future developments and open challenges, advocating for:
  - Adversarial Robustness: Addressing the vulnerability of DL models applied to ODIs to adversarial perturbations.
  - Multi-modal Integration: Combining ODIs with other sensor modalities in smart-city and metaverse applications, which demands efficient data fusion and cross-modal learning.
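To make the distortion issue concrete, here is a minimal sketch of the ERP mapping between image pixels and directions on the unit sphere. This is not code from the paper; the function names and coordinate conventions are illustrative assumptions. The final lines show why a fixed pixel step covers an ever-smaller solid angle toward the poles, which is precisely the distortion that breaks translation-equivariant planar convolutions.

```python
# Minimal sketch (illustrative, not from the paper) of the equirectangular
# (ERP) mapping between pixel coordinates and unit-sphere directions.
import numpy as np

def erp_pixel_to_sphere(u, v, width, height):
    """Map ERP pixel (u, v) to a unit direction (x, y, z).

    Assumed convention: longitude spans [-pi, pi) across the width;
    latitude spans [-pi/2, pi/2] across the height.
    """
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)

def sphere_to_erp_pixel(xyz, width, height):
    """Inverse mapping: unit direction -> ERP pixel coordinates."""
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    lon = np.arctan2(y, x)                   # [-pi, pi)
    lat = np.arcsin(np.clip(z, -1.0, 1.0))   # [-pi/2, pi/2]
    u = (lon + np.pi) / (2.0 * np.pi) * width - 0.5
    v = (np.pi / 2.0 - lat) / np.pi * height - 0.5
    return u, v

# Distortion intuition: one pixel of horizontal step spans a solid angle
# that shrinks as cos(latitude), so ERP content is stretched near the poles.
h, w = 512, 1024
v_rows = np.array([h // 2, h // 8])          # equator row vs. near-pole row
lat = np.pi / 2.0 - (v_rows + 0.5) / h * np.pi
print("horizontal stretch factor per row:", 1.0 / np.cos(lat))
```

Running the last block shows a stretch factor near 1 at the equator and roughly 2.6 at the near-pole row, which is why a planar kernel sees very different effective receptive fields at different latitudes.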
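Building on that mapping, the following is a simplified illustration of the idea behind SphereNet-style distortion-aware convolution: the kernel's sampling grid is laid out on the tangent plane at each sphere location (via inverse gnomonic projection) and then read back from the ERP image, so the kernel's footprint stays consistent on the sphere. The helper names and the nearest-neighbour sampling are simplifying assumptions, not the paper's implementation.

```python
# Simplified sketch of tangent-plane (gnomonic) kernel sampling, the core
# idea behind SphereNet-style distortion-aware convolution. Illustrative
# assumptions: function names, angular step size, nearest-neighbour lookup.
import numpy as np

def tangent_kernel_locations(lat0, lon0, kernel=3, ang_step=np.pi / 64):
    """Return (lat, lon) of a kernel x kernel grid laid out on the tangent
    plane at (lat0, lon0), via the inverse gnomonic projection."""
    ks = np.arange(kernel) - (kernel - 1) / 2.0
    xx, yy = np.meshgrid(ks * ang_step, ks * ang_step)  # tangent-plane coords
    rho = np.sqrt(xx**2 + yy**2)
    c = np.arctan(rho)
    with np.errstate(invalid="ignore", divide="ignore"):
        lat = np.where(rho == 0, lat0,
                       np.arcsin(np.cos(c) * np.sin(lat0)
                                 + yy * np.sin(c) * np.cos(lat0) / rho))
        lon = lon0 + np.arctan2(xx * np.sin(c),
                                rho * np.cos(lat0) * np.cos(c)
                                - yy * np.sin(lat0) * np.sin(c))
    return lat, lon

def sample_erp(img, lat, lon):
    """Nearest-neighbour lookup of ERP pixels at given sphere coordinates."""
    h, w = img.shape[:2]
    u = ((lon + np.pi) % (2 * np.pi)) / (2 * np.pi) * w
    v = (np.pi / 2 - lat) / np.pi * h
    return img[np.clip(v.astype(int), 0, h - 1),
               np.clip(u.astype(int), 0, w - 1)]

# One distortion-aware "convolution tap": near the pole, the 3x3 kernel's
# back-projected footprint covers a wide pixel span in ERP, unlike a
# fixed planar 3x3 window.
img = np.random.rand(256, 512)
weights = np.full((3, 3), 1.0 / 9.0)                    # toy box filter
lat, lon = tangent_kernel_locations(lat0=np.radians(75), lon0=0.0)
patch = sample_erp(img, lat, lon)
print("response:", float(np.sum(weights * patch)))
```

In a full network, these sampling locations would be precomputed per output position and used with bilinear interpolation, so standard learned kernel weights can be applied while the receptive field remains uniform on the sphere.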
This comprehensive survey contributes significantly to the field by encapsulating the state of the art in omnidirectional vision processing and highlighting the implications of the various methodologies. As an informative resource, it identifies key challenges and promising directions for research, encouraging the development of more robust and efficient omnidirectional vision systems. The collection of works and methods catalogued in the paper forms a solid foundation for researchers pursuing advances in this area.