
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement (1910.04417v4)

Published 10 Oct 2019 in cs.LG, cs.AI, cs.RO, and stat.ML

Abstract: This paper studies Learning from Observations (LfO), an imitation-learning setting with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both state and action supervision, LfO is more practical in leveraging previously inapplicable resources (e.g., videos), yet more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its differences from LfD from both theoretical and practical perspectives. We first prove that, under the modeling approach of GAIL, the gap between LfD and LfO lies precisely in the disagreement between the inverse dynamics models of the imitator and the expert. More importantly, we show that this gap is upper-bounded by a negative causal entropy term, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the conventional LfO method by further bridging the gap to LfD. Extensive empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.
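To make the abstract's central result concrete, the following LaTeX sketch writes out the GAIL-style objectives and the gap between them in occupancy-measure notation. The notation here is assumed for illustration (rho_pi and rho_E denote the imitator's and expert's occupancy measures; P_pi(a|s,s') and P_E(a|s,s') their inverse dynamics models); the precise statement and its conditions are given in the paper itself.

```latex
% LfD (GAIL) matches state-action occupancy measures, while LfO can
% only match state-transition occupancy measures (actions unobserved):
\min_{\pi} \; D_{\mathrm{KL}}\big(\rho_{\pi}(s,a)\,\|\,\rho_{E}(s,a)\big)
  \quad \text{(LfD)}
\min_{\pi} \; D_{\mathrm{KL}}\big(\rho_{\pi}(s,s')\,\|\,\rho_{E}(s,s')\big)
  \quad \text{(LfO)}

% Gap between the two objectives: the disagreement between the
% imitator's and the expert's inverse dynamics models, as the
% abstract characterizes it:
D_{\mathrm{KL}}\big(\rho_{\pi}(s,a)\,\|\,\rho_{E}(s,a)\big)
  - D_{\mathrm{KL}}\big(\rho_{\pi}(s,s')\,\|\,\rho_{E}(s,s')\big)
  = \mathbb{E}_{\rho_{\pi}(s,s')}\!\big[
      D_{\mathrm{KL}}\big(P_{\pi}(a \mid s,s')\,\|\,P_{E}(a \mid s,s')\big)
    \big]
```

Because this disagreement is upper-bounded by a negative causal entropy term, IDDM can in effect augment a conventional LfO objective with a causal-entropy-style bonus and optimize it model-free, without ever querying expert actions.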

Authors (7)
  1. Chao Yang (333 papers)
  2. Xiaojian Ma (52 papers)
  3. Wenbing Huang (95 papers)
  4. Fuchun Sun (127 papers)
  5. Huaping Liu (97 papers)
  6. Junzhou Huang (137 papers)
  7. Chuang Gan (195 papers)
Citations (66)
