- The paper presents a simulation-to-reality approach that trains vision-based RL policies exclusively on synthetic data for collision-free indoor flight.
- It combines heavy randomization of CAD-modeled 3D hallways with a deep convolutional network trained via Monte Carlo policy evaluation to improve generalization.
- Experiments in both simulation and the real world demonstrate longer collision-free flight and more robust performance than baseline methods.
CAD2RL: Training Vision-Based RL Policies in Simulation for Real-World Flight
Safe, efficient indoor flight through unstructured environments is a difficult and essential task in robotic navigation. This paper, authored by Fereshteh Sadeghi and Sergey Levine, addresses the challenge of training vision-based reinforcement learning (RL) policies for collision-free navigation in indoor environments using only simulated data. The core contribution of the proposed CAD2RL algorithm (short for Collision Avoidance via Deep Reinforcement Learning) is its ability to generalize from simulated training data to the real world without requiring any real images during training.
Key Contributions and Methodology
CAD2RL is a deep reinforcement learning approach that combines synthetic 3D environments with deep neural networks to achieve collision avoidance. The primary innovation is training the network exclusively on simulated data and then deploying it directly in real-world scenarios. The key components of the approach are:
- Simulated Training Environment: The training is conducted on a variety of synthetic indoor 3D hallways created using CAD models, where textures, lighting, and furniture placement are randomized to enhance generalization. The training hallways replicate typical indoor environments with varied geometrical structures and layouts.
- Deep Convolutional Neural Network: A fully convolutional neural network processes monocular RGB images and predicts the probability of collision for each candidate flight direction; the safest direction is then converted into a velocity command. The network is first pre-trained on a heuristic free-space detection task and then refined with a deep RL algorithm.
- Monte Carlo Policy Evaluation: To optimize the network, Monte Carlo policy evaluation simulates multiple rollouts for each candidate action from many states and trains the network to predict the resulting long-horizon collision probabilities (a sketch of this labeling loop follows this list).
- Randomization for Generalization: By heavily randomizing the rendering settings of the simulated training environment, the trained policy learns to handle diverse obstacle appearances and lighting conditions, which aids real-world generalization (an illustrative randomization sampler also follows this list).
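The Monte Carlo policy evaluation step can be made concrete with a short sketch. The simulator interface, the action-space size, and `policy_net` below are hypothetical stand-ins for the paper's setup, not the authors' actual code: the simulator is assumed to expose `reset_to(state)` and `step(action) -> (obs, collided)`, and the network is assumed to map an image to one collision probability per discretized flight direction.

```python
# Sketch of Monte Carlo policy evaluation for collision prediction.
# All interfaces (sim, policy_net) are hypothetical stand-ins.
import numpy as np
import torch

NUM_ACTIONS = 41            # discretized flight directions (illustrative)
HORIZON = 100               # long-horizon rollout length
ROLLOUTS_PER_ACTION = 4     # Monte Carlo samples per (state, action) pair

def mc_collision_labels(sim, state, policy_net):
    """Estimate P(collision within HORIZON | state, action) for each action."""
    labels = np.zeros(NUM_ACTIONS)
    for action in range(NUM_ACTIONS):
        collisions = 0
        for _ in range(ROLLOUTS_PER_ACTION):
            sim.reset_to(state)
            obs, collided = sim.step(action)     # take the candidate action first
            for _ in range(HORIZON - 1):
                if collided:
                    break
                # Continue the rollout under the current policy: fly toward
                # the direction with the lowest predicted collision probability.
                with torch.no_grad():
                    probs = policy_net(obs)      # shape: (NUM_ACTIONS,)
                obs, collided = sim.step(int(probs.argmin()))
            collisions += int(collided)
        labels[action] = collisions / ROLLOUTS_PER_ACTION
    return labels  # supervised training targets for the network
```

These Monte Carlo estimates serve as supervised targets, and the process iterates: a better collision predictor yields a better rollout policy, which in turn yields better labels.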
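The per-episode randomization can likewise be sketched as a simple sampler. The `scene` API, parameter names, and value ranges below are illustrative assumptions, not the paper's actual renderer settings:

```python
# Sketch of per-episode rendering randomization over a CAD hallway.
# The `scene` object and all ranges are hypothetical.
import random

def randomize_scene(scene, texture_bank, furniture_bank):
    # Draw random wall/floor/ceiling textures from a large texture bank.
    for surface in scene.surfaces:
        surface.texture = random.choice(texture_bank)
    # Randomize lighting: intensity and placement of light sources.
    for light in scene.lights:
        light.intensity = random.uniform(0.3, 1.5)
        light.position = scene.sample_free_position()
    # Randomize furniture placement to vary obstacle geometry.
    for _ in range(random.randint(0, 8)):
        scene.place(random.choice(furniture_bank),
                    pose=scene.sample_free_pose())
    # Slightly perturb camera parameters to cover real-camera variation.
    scene.camera.fov_deg = random.uniform(60, 90)
```

Sampling a fresh configuration per episode forces the policy to rely on geometric cues that transfer to real hallways rather than memorizing any single appearance.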
Experimental Results
The efficacy of CAD2RL is validated through an extensive empirical evaluation, both in simulation and real-world environments:
- Simulation Experiments: In controlled synthetic test hallways, both with and without furniture, CAD2RL outperformed baseline methods, including a state-of-the-art learning-based method, achieving longer collision-free flights.
- Realistic Simulation Evaluation: When evaluated on a realistically textured 3D mesh of an indoor hallway, CAD2RL exhibited robustness and maintained collision-free flight over longer distances compared to baseline methods.
- Real-World Flight: The algorithm was tested on actual drones navigating through challenging indoor environments, including hallways and rooms in academic buildings, corroborating the generalization from simulation to reality.
Implications and Future Work
The results indicate that training policies entirely in simulation, with extensive randomization of training conditions, can yield effective real-world navigation capabilities for aerial robots. Notably, although CAD2RL occasionally collided in the real world, it outperformed the baseline methods, demonstrating the promise of this approach for training robust vision-based navigation policies.
The implications of this research are substantial for the development of autonomous systems, particularly for applications where collecting real-world training data is prohibitively expensive or unsafe. The approach's dependence on high-quality simulated environments can be both a strength and a limitation. If the simulated environments do not capture the necessary diversity or complexity, the trained policies might underperform in certain real-world scenarios.
Future developments could explore combining simulated training with a limited amount of real-world data, employing domain adaptation techniques, and incorporating additional sensory inputs like depth cameras to further enhance robustness and performance. Moreover, refinements in the simulation itself, including higher fidelity rendering and more complex environmental interactions, could improve generalization even further.
In conclusion, CAD2RL represents a significant step forward in leveraging simulated environments for training real-world autonomous navigation policies, providing a foundation for further advancements in the field of robotic vision and reinforcement learning.