Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
The paper "Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning" by Yuke Zhu et al. addresses two significant challenges in deep reinforcement learning (DRL): lack of generalization capability to new target goals and data inefficiency. The focus of the paper is on the task of target-driven visual navigation in indoor environments. The authors propose an actor-critic model where the policy is a function of both the goal and the current state. This approach aims to improve generalization across targets within a scene and to different scenes.
Key Contributions
- Target-driven Actor-Critic Model: The model feeds target information explicitly into the policy function, unlike conventional DRL models, where the goal is implicitly hard-coded into the network parameters. This allows the model to generalize to new targets without retraining.
- AI2-THOR Framework: To address data inefficiency and the challenge of simulating realistic environments, the authors introduce AI2-THOR, a high-quality 3D simulation framework that supports efficient data collection and interaction modeling, enabling DRL training at scale (see the interaction sketch after this list).
- Simulation to Real-world Transfer: The model trained in the AI2-THOR simulated environment can be transferred to real-world scenarios with minimal fine-tuning. This is demonstrated through a real-robot navigation task, showcasing the practical applicability of the proposed approach.
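For context, an interaction loop with the publicly released `ai2thor` Python package can be as simple as the following; the scene name and action strings follow a recent release of the API, which post-dates the version used in the paper, so treat the exact calls as assumptions.

```python
# A minimal random-walk loop in AI2-THOR; install with `pip install ai2thor`.
# Scene name and action strings follow a recent ai2thor release and are
# assumptions here, not the exact interface used in the paper.
import random
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")   # load one indoor scene
actions = ["MoveAhead", "MoveBack", "RotateLeft", "RotateRight"]

for _ in range(20):
    event = controller.step(action=random.choice(actions))
    frame = event.frame                        # RGB observation as an (H, W, 3) array
    print(event.metadata["lastActionSuccess"], frame.shape)

controller.stop()
```

In a target-driven setup, the frames returned by the simulator would be encoded into the observation and goal features consumed by the actor-critic network sketched earlier.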
Numerical Results and Evaluation
The authors present comprehensive evaluations of the proposed model across several dimensions:
- Comparison with Baselines: The proposed model outperforms standard DRL baselines such as A3C and one-step Q-learning in data efficiency and convergence speed, and it reaches targets with shorter average trajectory lengths, indicating more efficient navigation.
- Generalization Across Targets: When evaluated on new targets within the same scene, the model consistently achieves higher success rates than the baseline models, illustrating the benefit of feeding goal information directly into the policy.
- Generalization Across Scenes: The paper also evaluates generalization to unseen scenes. The results show that a model pre-trained on multiple scenes converges faster and generalizes better to new environments when only its scene-specific layers are fine-tuned; a sketch of this fine-tuning recipe follows this list.
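A minimal sketch of that transfer recipe in PyTorch, assuming a shared feature pathway learned across training scenes and a freshly added scene-specific head; the layer sizes, optimizer, and the supervised placeholder loss are illustrative assumptions (the paper trains with an asynchronous actor-critic objective).

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: 'shared' mimics the generic siamese layers learned
# across training scenes; 'scene_specific' is a freshly added head for a new scene.
shared = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())
scene_specific = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 4))

# Freeze the generic layers; only the scene-specific head is fine-tuned.
for p in shared.parameters():
    p.requires_grad = False

optimizer = torch.optim.RMSprop(scene_specific.parameters(), lr=7e-4)  # lr is illustrative

# One fine-tuning step on dummy data; the cross-entropy loss is a placeholder
# for the actual policy-gradient objective.
obs_feat = torch.randn(8, 2048)
logits = scene_specific(shared(obs_feat))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 4, (8,)))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```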
Implications
- Practical Applications: The ability of the model to generalize across scenes and targets makes it particularly suitable for deployment in real-world applications where robots need to navigate dynamically changing and previously unseen environments. Applications can range from domestic robots performing household tasks to autonomous agents in commercial settings like warehouses and retail stores.
- Scalability and Efficiency: The AI2-THOR framework provides a scalable and efficient platform for training and evaluating DRL models. Its high-quality simulations closely mimic real-world environments, reducing the sim-to-real gap and allowing for practical deployment of trained models with minimal adjustments.
Future Research Directions
- Enhancing Environmental Complexity: Future work could involve increasing the number and diversity of 3D scenes within the AI2-THOR framework to further test the robustness and adaptability of navigation models.
- Expanding Interaction Capabilities: While the current focus is on navigation, extending the model to handle object manipulation and more complex interactions could broaden its applicability to tasks requiring fine-grained physical interaction.
- Improving Simulation Realism: Continued efforts to narrow the gap between simulated and real-world environments will improve transfer learning capabilities, reducing the need for extensive fine-tuning when deploying models in real-world scenarios.
In conclusion, this paper presents a comprehensive framework for target-driven visual navigation that addresses key limitations in current DRL approaches. The integration of target information into the policy function, supported by a robust simulation environment, opens new possibilities for efficient and scalable training of navigation models capable of operating in complex real-world settings.