IRS: A Large Naturalistic Indoor Robotics Stereo Dataset to Train Deep Models for Disparity and Surface Normal Estimation (1912.09678v2)

Published 20 Dec 2019 in cs.CV and cs.RO

Abstract: Indoor robotics localization, navigation, and interaction heavily rely on scene understanding and reconstruction. Compared to the monocular vision which usually does not explicitly introduce any geometrical constraint, stereo vision-based schemes are more promising and robust to produce accurate geometrical information, such as surface normal and depth/disparity. Besides, deep learning models trained with large-scale datasets have shown their superior performance in many stereo vision tasks. However, existing stereo datasets rarely contain the high-quality surface normal and disparity ground truth, which hardly satisfies the demand of training a prospective deep model for indoor scenes. To this end, we introduce a large-scale synthetic but naturalistic indoor robotics stereo (IRS) dataset with over 100K stereo RGB images and high-quality surface normal and disparity maps. Leveraging the advanced rendering techniques of our customized rendering engine, the dataset is considerably close to the real-world captured images and covers several visual effects, such as brightness changes, light reflection/transmission, lens flare, vivid shadow, etc. We compare the data distribution of IRS with existing stereo datasets to illustrate the typical visual attributes of indoor scenes. Besides, we present DTN-Net, a two-stage deep model for surface normal estimation. Extensive experiments show the advantages and effectiveness of IRS in training deep models for disparity estimation, and DTN-Net provides state-of-the-art results for normal estimation compared to existing methods.

Authors (6)
  1. Qiang Wang (271 papers)
  2. Shizhen Zheng (3 papers)
  3. Qingsong Yan (9 papers)
  4. Fei Deng (35 papers)
  5. Kaiyong Zhao (16 papers)
  6. Xiaowen Chu (108 papers)
Citations (19)

Summary

  • The paper presents the IRS dataset, featuring over 100K stereo RGB images with comprehensive, naturalistic ground truth for disparity and surface normals.
  • It introduces DTN-Net, a two-stage deep learning model that significantly outperforms existing methods in indoor surface normal estimation.
  • Experimental results show that models trained on IRS achieve higher accuracy and better generalization in both disparity and normal estimation tasks for robotics.

Overview of IRS: A Large Naturalistic Indoor Robotics Stereo Dataset

The paper introduces the IRS dataset, a new resource for stereo vision-based scene understanding in indoor robotics. Unlike monocular vision, which usually imposes no explicit geometric constraint, stereo vision matches points across two calibrated views, which makes it more reliable for inferring the accurate disparity and surface normal information that robots need. Despite these advantages, progress on robust deep models in this domain has been held back by the lack of large-scale, high-quality datasets with comprehensive disparity and surface normal ground truth.
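
The geometric link between disparity and surface normals can be made concrete. For a rectified stereo pair, depth follows from disparity as Z = f * B / d, and normals can be estimated from the 3D points obtained by back-projecting each pixel. The minimal pure-Python sketch below assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy, a focal length and baseline supplied by the caller, and forward differences for the local tangent vectors. It illustrates the geometry only and is not the authors' pipeline or DTN-Net.

```python
import math

# Assumption: rectified stereo pair and pinhole camera; every parameter here
# is hypothetical and supplied by the caller.


def disparity_to_depth(disp, focal_px, baseline_m):
    """Depth Z = f * B / d per pixel; non-positive disparity -> infinite depth."""
    return [[focal_px * baseline_m / d if d > 0 else math.inf for d in row]
            for row in disp]


def backproject(u, v, z, fx, fy, cx, cy):
    """Pixel (u, v) with depth z -> 3D point in the camera frame (+z forward)."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normals_from_depth(depth, fx, fy, cx, cy):
    """Unit normals from forward differences; output is (H-1) x (W-1).

    Entries are None where the neighbourhood is invalid (e.g. infinite depth).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for v in range(h - 1):
        row = []
        for u in range(w - 1):
            p = backproject(u, v, depth[v][u], fx, fy, cx, cy)
            pu = backproject(u + 1, v, depth[v][u + 1], fx, fy, cx, cy)
            pv = backproject(u, v + 1, depth[v + 1][u], fx, fy, cx, cy)
            du = tuple(a - b for a, b in zip(pu, p))  # tangent along image x
            dv = tuple(a - b for a, b in zip(pv, p))  # tangent along image y
            n = cross(du, dv)
            length = math.sqrt(sum(c * c for c in n))
            if length == 0 or not math.isfinite(length):
                row.append(None)
                continue
            n = tuple(c / length for c in n)
            if n[2] > 0:  # flip so the normal faces the camera
                n = tuple(-c for c in n)
            row.append(n)
        normals.append(row)
    return normals
```

For example, a constant disparity of 10 px with f = 500 px and B = 0.1 m gives a depth of 5 m everywhere and a flat, camera-facing plane with normal (0, 0, -1). Forward differences keep the sketch short; central differences or local plane fitting are common, more noise-tolerant alternatives.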

Key Contributions

The IRS dataset consists of over 100,000 stereo RGB images paired with high-quality disparity and surface normal maps. The images are synthetic, rendered with a customized engine based on Unreal Engine 4 (UE4), yet the authors describe them as naturalistic. To narrow the gap between synthetic data and real-world complexity, the renderings include visual effects typical of indoor settings, such as brightness changes, light reflection and transmission, lens flare, and vivid shadows.

The paper also introduces DTN-Net, a two-stage deep model for surface normal estimation. Trained on IRS, it reportedly outperforms existing normal estimation methods, which indicates that the dataset can support training accurate normal estimators.

Dataset and Methodology

The authors emphasize that IRS is synthetic yet realistic: advanced rendering techniques let it reproduce complex lighting phenomena and material properties. Compared with existing datasets such as FlyingThings3D, IRS is curated to reflect visual attributes common in indoor scenes, such as objects at close range and textured surfaces.
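
Why close range matters for stereo follows from the rectified-stereo relation between disparity d (pixels), focal length f (pixels), baseline B, and depth Z. The camera values below are illustrative assumptions, not the actual IRS rendering parameters.

```latex
\begin{aligned}
d &= \frac{f B}{Z}, \qquad f = 500\ \text{px},\; B = 0.1\ \text{m} \\
Z = 1\ \text{m} &\;\Rightarrow\; d = \frac{500 \times 0.1}{1} = 50\ \text{px} \\
Z = 5\ \text{m} &\;\Rightarrow\; d = \frac{500 \times 0.1}{5} = 10\ \text{px}
\end{aligned}
```

Because disparity is inversely proportional to depth, a scene dominated by nearby objects spreads over larger disparities than a far-field scene captured with the same camera.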

The authors compare the distributions of disparity and surface normals in IRS with those of existing stereo datasets and argue that IRS better reflects real indoor environments. This targets the domain gap that often appears when models trained on synthetic data are tested on real images.
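
The summary does not say how the distributions were compared. One simple, assumed approach is to bin per-pixel disparities (or normal angles) from each dataset into normalized histograms over shared bin edges and measure the total variation distance between them. The function names and bin edges below are hypothetical illustrations, not the paper's procedure.

```python
import bisect


def normalized_histogram(values, bin_edges):
    """Fraction of values in each half-open bin [edge_i, edge_{i+1})."""
    counts = [0] * (len(bin_edges) - 1)
    for x in values:
        i = bisect.bisect_right(bin_edges, x) - 1
        if 0 <= i < len(counts):  # values outside the edges are ignored
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def total_variation(p, q):
    """Total variation distance: 0 for identical histograms, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Using the same bin edges for every dataset is what makes the histograms comparable; in practice the edges would span the disparity range of interest (for example 0 to 256 px in log-spaced bins), which is again an assumption here.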

Numerical Results and Experiments

The experiments show that IRS is effective for training deep models for both disparity and surface normal estimation. DTN-Net improves on existing architectures and achieves state-of-the-art results in normal estimation. The paper also reports that models trained on IRS generalize better to both synthetic and real-world datasets, with higher accuracy and robustness on the visually rich content of indoor scenes.
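
Surface normal accuracy is commonly reported as the angular error between predicted and ground-truth normals, summarized by the mean, the median, and the fraction of pixels under thresholds such as 11.25°, 22.5°, and 30°. Which of these the paper reports is not stated here, so treat the functions below as a generic, assumed evaluation sketch.

```python
import math
import statistics


def angular_errors_deg(pred, gt):
    """Per-pixel angle in degrees between predicted and ground-truth normals.

    Assumes non-zero vectors; they need not be unit length.
    """
    errors = []
    for p, g in zip(pred, gt):
        dot = sum(a * b for a, b in zip(p, g))
        cos = dot / (math.hypot(*p) * math.hypot(*g))
        cos = max(-1.0, min(1.0, cos))  # clamp against rounding error
        errors.append(math.degrees(math.acos(cos)))
    return errors


def normal_metrics(pred, gt, thresholds=(11.25, 22.5, 30.0)):
    """Mean/median angular error and fraction of pixels below each threshold."""
    errors = angular_errors_deg(pred, gt)
    metrics = {"mean": statistics.fmean(errors),
               "median": statistics.median(errors)}
    for t in thresholds:
        metrics[f"within_{t}"] = sum(e < t for e in errors) / len(errors)
    return metrics
```

Lower mean and median errors and higher threshold fractions indicate better normals; the median is less sensitive than the mean to a few badly wrong pixels.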

The paper further reports that disparity estimation models such as FADNet and GwcNet, when trained on IRS, outperform the same models trained on conventional datasets such as FlyingThings3D when applied to real-world scenes, which highlights the dataset's practical relevance.
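
Disparity accuracy is typically measured by the end-point error (EPE), the mean absolute difference in pixels between predicted and ground-truth disparity, often alongside the fraction of "bad" pixels whose error exceeds a threshold. The summary does not list the metrics used for FADNet and GwcNet, so the sketch below, including its 3-pixel default threshold and the convention that non-positive ground truth marks invalid pixels, is an assumption.

```python
def disparity_metrics(pred, gt, bad_thresh_px=3.0):
    """Return (EPE, bad-pixel fraction) over flattened disparity maps.

    Pixels whose ground-truth disparity is not positive are treated as
    invalid (an assumed convention, e.g. for sparse real-world ground truth).
    """
    errors = [abs(p - g) for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    epe = sum(errors) / len(errors)
    bad = sum(e > bad_thresh_px for e in errors) / len(errors)
    return epe, bad
```

Masking invalid pixels matters most for real-world benchmarks, where ground truth is often sparse; synthetic datasets such as IRS provide dense disparity, so nearly every pixel counts.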

Implications and Future Directions

The IRS dataset contributes significantly to the stereo vision community by providing a robust platform for training deep learning models tailored to indoor robotics applications. It fills a critical gap by offering a large-scale, high-quality resource that incorporates comprehensive ground-truth data for both disparity and normals.

The framework and methodology adopted in constructing IRS open up avenues for even more realistic synthetic data generation, potentially extending into various environments beyond the current indoor focus. Future developments may explore further enhancement of realism, such as real-time lighting adjustments and dynamic object interactions, to better prepare models for diverse operational settings.

In conclusion, this work is likely to advance stereo-based perception for indoor robotics, with IRS serving as a large-scale resource for training and benchmarking stereo vision models.