Overview of TartanAir: A Dataset to Push the Limits of Visual SLAM
The paper, "TartanAir: A Dataset to Push the Limits of Visual SLAM," presents a comprehensive dataset aimed at advancing the field of Visual Simultaneous Localization and Mapping (V-SLAM). The TartanAir dataset is designed to challenge existing algorithms by providing synthetic, photo-realistic simulation environments that include dynamic scenes, varied weather, and lighting conditions. The data collection leverages modern computer graphics to emulate real-world scenarios, thus overcoming limitations of physical data collection environments.
Key Contributions
- Extensive Dataset with Diverse Environments: The dataset comprises 30 simulated environments spanning urban, rural, nature, domestic, public, and sci-fi settings. It contains over 1,000 motion sequences covering a broad spectrum of challenges such as dynamic objects and adverse lighting, amounting to more than 4 TB of data.
- Multi-Modal Data: TartanAir provides stereo RGB images, depth images, segmentation labels, optical flow, camera poses, and simulated LiDAR point clouds, supporting monocular, stereo, and RGB-D SLAM configurations alike (a minimal loading sketch follows this list).
- Automatic Data Collection Pipeline: The authors introduce an automated pipeline covering environment mapping, trajectory sampling, data processing, and verification. This enables large-scale data collection with minimal manual intervention; for instance, dense optical flow ground truth can be derived from rendered depth and camera poses rather than annotated by hand (see the second sketch after this list).
- Benchmarking SLAM Algorithms: Baseline evaluations with state-of-the-art methods such as ORB-SLAM and DSO show that algorithms performing adequately on existing datasets struggle on TartanAir's scenarios, highlighting unsolved challenges in the field.
- Quantitative Metrics for Evaluation: Using Absolute Trajectory Error (ATE), Relative Pose Error (RPE), and Success Rate (SR), the authors provide numerical benchmarks for comprehensive comparison of algorithm performance (a toy ATE computation appears after this list).
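For concreteness, here is a minimal sketch of reading one frame of the multi-modal data. It assumes the file layout used by the public TartanAir release (left-camera PNG images, NumPy depth/segmentation/flow arrays, and a `pose_left.txt` of `tx ty tz qx qy qz qw` lines in NED convention); the sequence path is hypothetical, so adjust names to your local copy.

```python
import numpy as np
from pathlib import Path
from PIL import Image

# Hypothetical sequence directory; substitute any downloaded sequence.
seq = Path("abandonedfactory/Easy/P000")

rgb   = np.asarray(Image.open(seq / "image_left" / "000000_left.png"))  # H x W x 3 uint8
depth = np.load(seq / "depth_left" / "000000_left_depth.npy")           # H x W float32, metres
seg   = np.load(seq / "seg_left" / "000000_left_seg.npy")               # H x W per-pixel labels
flow  = np.load(seq / "flow" / "000000_000001_flow.npy")                # H x W x 2, pixels

# One pose per frame: translation followed by a unit quaternion.
poses = np.loadtxt(seq / "pose_left.txt")                               # N x 7
```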
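The processing stage's derivation of optical flow from geometry can be illustrated with standard pinhole projection. The sketch below is not the authors' code: it assumes a known intrinsics matrix K and a relative pose (R, t) mapping frame-1 camera coordinates into frame 2, and it ignores occlusion and dynamic objects.

```python
import numpy as np

def flow_from_depth(depth, K, R, t):
    """Optical flow induced by camera motion over a static scene.

    depth : (H, W) depth map of frame 1
    K     : (3, 3) camera intrinsics
    R, t  : rotation (3, 3) and translation (3,) taking frame-1
            points into the frame-2 camera coordinate system
    Returns an (H, W, 2) flow field in pixels.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T  # 3 x HW

    pts1 = np.linalg.inv(K) @ pix * depth.reshape(1, -1)  # back-project frame-1 pixels
    pts2 = R @ pts1 + t.reshape(3, 1)                     # transform into frame 2

    proj = K @ pts2
    proj = proj[:2] / proj[2:3]                           # perspective divide
    return (proj - pix[:2]).T.reshape(H, W, 2)            # pixel displacement
```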
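As a reference for the metrics, the following is a minimal NumPy implementation of ATE: rigidly align the estimated positions to the ground truth (Kabsch/Umeyama, without the scale term a monocular evaluation would add) and report the RMSE of the residuals. RPE is computed analogously over relative motions between frame pairs, and SR, roughly, counts the proportion of sequences an algorithm completes without losing track.

```python
import numpy as np

def ate_rmse(gt, est):
    """ATE: RMSE of position error after rigid alignment.

    gt, est : (N, 3) camera positions at matching timestamps.
    """
    mu_g, mu_e = gt.mean(0), est.mean(0)
    G, E = gt - mu_g, est - mu_e

    # Kabsch: rotation minimizing sum ||R e_i - g_i||^2 over all frames.
    U, _, Vt = np.linalg.svd(E.T @ G)
    d = np.sign(np.linalg.det(U @ Vt))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    aligned = E @ R.T + mu_g
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))
```

For monocular trackers, a similarity (Sim(3)) alignment with an extra scale factor replaces the rigid one, since absolute scale is unobservable from a single camera.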
Experimental Evaluation Insights
Baseline results show significant performance degradation under challenging settings such as rain, dynamic objects, and night-time scenes, with notable drops in both SR and accuracy. The results also indicate that stereo algorithms outperform monocular ones in dynamic environments, although both degrade in the most extreme settings.
Implications and Future Directions
TartanAir addresses the risk of current SLAM algorithms overfitting to specific datasets, which often exhibit limited diversity in environments and motion patterns. By taking a simulation-based approach, it paves the way for more robust SLAM solutions that transfer to real-world applications, with potential gains in autonomous navigation and interactive robotics.
Future work could focus on further narrowing the sim-to-real gap by adding more randomness and realism to the synthetic environments. Researchers might also explore domain adaptation techniques to improve the transfer of models trained on synthetic data to physical settings, thereby enhancing real-world applicability.
The dataset also opens interdisciplinary opportunities: its multi-modal ground truth supports other computer vision tasks such as object detection, scene parsing, and optical flow estimation. Advances in these areas may in turn feed back into stronger V-SLAM methods.
In conclusion, TartanAir represents a significant step toward challenging and advancing V-SLAM capabilities. It sets a demanding benchmark for the next generation of visual navigation algorithms and encourages continued exploration of diverse, complex scenarios, giving researchers in robotics and computer vision a valuable tool for improving the effectiveness and reliability of autonomous systems.