- The paper introduces two novel CNN architectures, UResNet and RectNet, specifically designed for depth estimation on 360° indoor panoramas.
- The authors create the 360D dataset by rendering equirectangular images with accurate ground-truth depth from large-scale 3D indoor scene datasets, both synthetic and real-world scanned.
- Experimental results demonstrate that RectNet outperforms traditional monocular models, effectively handling spherical distortions.
Exploration of Depth Estimation for Omnidirectional Imagery in Indoor Settings
The paper "OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas" addresses the problem of depth estimation from 360-degree images, specifically in indoor environments. The authors present a method to train convolutional neural networks (CNNs) to perform depth estimation directly on omnidirectional images, overcoming the challenges posed by the lack of available training datasets for spherical imagery. The significance of this work lies in adapting existing 3D datasets and creating a new synthetic dataset to bridge this gap, facilitating efficient training of depth estimation models for 360-degree content.
Context and Significance
Traditional monocular depth estimation techniques rely on projective images captured by pinhole cameras. These approaches do not transfer effectively to spherical panoramas because the equirectangular projection introduces latitude-dependent distortion absent from pinhole imagery. The proliferation of omnidirectional media necessitates models trained specifically for 360-degree images, particularly for applications in navigation, 3D scene reconstruction, and virtual reality. The authors argue for the need to train directly on 360-degree datasets and introduce novel methodologies to construct them from existing 3D scene data.
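To make the projection difference concrete, the sketch below (an illustration, not code from the paper) maps each equirectangular pixel to a unit ray direction on the sphere; a pinhole camera would instead map pixels through a fixed intrinsic matrix, which is why filters learned on one projection transfer poorly to the other.

```python
import numpy as np

def equirect_rays(width: int, height: int) -> np.ndarray:
    """Unit ray directions for every pixel of an equirectangular image.

    Each column maps to a longitude in [-pi, pi) and each row to a
    latitude in [pi/2, -pi/2]; distortion grows toward the poles because
    a fixed pixel width spans less solid angle at high latitudes.
    """
    u = (np.arange(width) + 0.5) / width      # normalized column
    v = (np.arange(height) + 0.5) / height    # normalized row
    lon = (u - 0.5) * 2.0 * np.pi             # longitude (azimuth)
    lat = (0.5 - v) * np.pi                   # latitude (elevation)
    lon, lat = np.meshgrid(lon, lat)
    # Spherical-to-Cartesian conversion: y is up, z points forward.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)       # shape (H, W, 3)

rays = equirect_rays(512, 256)
assert np.allclose(np.linalg.norm(rays, axis=-1), 1.0)
```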
Methodology and Contributions
The authors propose a novel approach for generating realistic 360-degree datasets from existing structured 3D data. By rendering equirectangular images and their corresponding depth maps from four large-scale datasets, they create a comprehensive dataset that integrates both synthetic and real-world scanned indoor scenes. This is a crucial step, as it addresses the challenge of acquiring high-quality omnidirectional data with accurate ground-truth depth annotations.
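As a hedged sketch of how such ground truth could be produced (the paper uses its own rendering pipeline; this example assumes the trimesh library and reuses equirect_rays from the sketch above), one can cast one ray per equirectangular pixel into a 3D scene and record the distance to the first surface hit as depth:

```python
import numpy as np
import trimesh

def render_equirect_depth(mesh: trimesh.Trimesh,
                          origin: np.ndarray,
                          width: int = 512,
                          height: int = 256) -> np.ndarray:
    """Ray-cast an equirectangular depth map from a single viewpoint.

    equirect_rays() is defined in the previous sketch; depth is the
    Euclidean distance from the camera center to the first surface hit.
    """
    dirs = equirect_rays(width, height).reshape(-1, 3)
    origins = np.tile(origin, (dirs.shape[0], 1))
    locations, index_ray, _ = mesh.ray.intersects_location(
        ray_origins=origins, ray_directions=dirs, multiple_hits=False)
    depth = np.full(width * height, np.inf)    # rays that miss stay inf
    depth[index_ray] = np.linalg.norm(locations - origins[index_ray], axis=1)
    return depth.reshape(height, width)
```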
Two convolutional neural network architectures are proposed: UResNet and RectNet. UResNet follows a conventional encoder-decoder design with skip connections that preserve spatial detail in the predicted depth maps. RectNet, in contrast, is tailored directly to equirectangular input: it uses dilated convolutions to capture the spherical domain's wide field of view and rectangular convolution filters to compensate for the latitude-dependent distortion of the equirectangular projection.
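A minimal PyTorch sketch of the kind of building block this design suggests (the kernel shapes and dilation rates below are illustrative assumptions, not the paper's published configuration): parallel branches with square, wide rectangular, and dilated kernels widen the horizontal receptive field, and their outputs are concatenated.

```python
import torch
import torch.nn as nn

class RectConvBlock(nn.Module):
    """Parallel square, rectangular, and dilated convolutions.

    Illustrative only: kernel shapes and dilation rates are assumptions.
    Wider-than-tall kernels give extra horizontal context, matching the
    horizontal stretching of equirectangular projections near the poles.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 3
        self.square = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.wide = nn.Conv2d(in_ch, branch_ch, kernel_size=(3, 9),
                              padding=(1, 4))
        self.dilated = nn.Conv2d(in_ch, out_ch - 2 * branch_ch,
                                 kernel_size=3, padding=2, dilation=2)
        self.act = nn.ELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.square(x), self.wide(x), self.dilated(x)],
                        dim=1)
        return self.act(out)

block = RectConvBlock(3, 33)
feats = block(torch.randn(1, 3, 256, 512))  # equirectangular H x W
print(feats.shape)                          # torch.Size([1, 33, 256, 512])
```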
The authors demonstrate through rigorous experimentation that these models significantly outperform existing monocular depth estimation models when applied to omnidirectional imagery. RectNet in particular shows a marked improvement in prediction accuracy, attributed to its design explicitly addressing spherical image properties.
Results and Implications
The paper’s results include strong quantitative and qualitative performance of both proposed networks on a synthesized test set and an unseen dataset (SceneNet), with RectNet achieving the better results owing to its explicit handling of spherical distortions. Comparison against existing monocular models reveals their inadequacy on 360-degree images: methods trained on conventional pinhole imagery, with no adaptation to the equirectangular format, degrade markedly when applied to panoramas.
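For context, papers in this area typically report standard depth error and accuracy measures such as the following (a generic sketch of common metrics from the monocular depth literature, not the paper's exact evaluation code):

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray,
                  eps: float = 1e-6) -> dict:
    """Standard depth metrics over valid (positive) ground-truth pixels."""
    mask = gt > eps
    pred, gt = pred[mask], gt[mask]
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "abs_rel": np.mean(np.abs(pred - gt) / gt),  # absolute relative error
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),  # root mean squared error
        "rmse_log": np.sqrt(np.mean(
            (np.log(pred + eps) - np.log(gt + eps)) ** 2)),
        "delta1": np.mean(ratio < 1.25),             # fraction within 25%
        "delta2": np.mean(ratio < 1.25 ** 2),
        "delta3": np.mean(ratio < 1.25 ** 3),
    }
```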
A critical contribution is the publicly released 360D dataset, which serves as a valuable asset for future research into spherical panorama depth estimation. By offering this dataset, the authors facilitate ongoing research and development in VR, robotics, and other applications of omnidirectional imaging that rely on depth estimation.
Future Directions
The research opens several pathways for future exploration. Firstly, extending the dataset to include outdoor environments could significantly enhance model versatility, as current work focuses solely on indoor environments. The introduction of models capable of handling varying lighting conditions and dynamic scenarios would also be beneficial. Another promising direction is exploring unsupervised and semi-supervised learning strategies to further enhance model robustness across diverse real-world scenarios, potentially utilizing adversarial networks to generate realistic training environments.
In conclusion, the paper makes substantial advances in the field of depth estimation for omnidirectional images. It highlights the need for adapted networks and datasets that cater to the unique properties of spherical media, paving the way for future innovations in understanding and utilizing 360-degree visual data.