Imitation Learning as f-Divergence Minimization
The paper "Imitation Learning as f-Divergence Minimization" presents a novel perspective on imitation learning (IL) by viewing it as a problem of minimizing f-divergences between the trajectory distributions of the learner and the expert. Authored by Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, and Siddhartha Srinivasa, the research introduces a unifying framework that encompasses existing imitation learning algorithms such as Behavior Cloning, GAIL, and DAgger as specific instances of f-divergence minimization.
Problem Formulation and Contributions
Imitation learning traditionally involves a learner attempting to mimic the demonstrated behavior of an expert, often represented as trajectories in a Markov Decision Process (MDP). The authors critique existing techniques such as behavior cloning and generative adversarial imitation learning (GAIL) for their inadequacies in dealing with multi-modal expert demonstrations. They argue that when experts demonstrate diverse, near-optimal solutions, it is often preferable for the learner to commit to a single mode rather than averaging across all of them. The proposed framework frames imitation learning as a divergence minimization problem, specifically within the family of f-divergences.
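Concretely, for a convex generator function f with f(1) = 0, the objective is to drive the learner's trajectory distribution toward the expert's. A minimal sketch of that objective, in notation chosen here rather than taken from the paper, with ρ_exp and ρ_θ denoting the expert's and learner's trajectory distributions:

$$
D_f\left(\rho_{\mathrm{exp}} \,\|\, \rho_\theta\right) = \mathbb{E}_{\tau \sim \rho_\theta}\left[ f\!\left(\frac{\rho_{\mathrm{exp}}(\tau)}{\rho_\theta(\tau)}\right) \right], \qquad \hat{\theta} = \arg\min_{\theta} \; D_f\left(\rho_{\mathrm{exp}} \,\|\, \rho_\theta\right).
$$

Different choices of f recover different divergences, and hence different algorithms.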
The paper makes several key contributions:
- Unifying Framework: The authors introduce a conceptual framework that generalizes imitation learning as a task aiming to minimize the f-divergence between the learner's and the expert's trajectory distributions.
- Divergence-Specific Algorithms: By selecting different f-divergences, the framework recovers well-known IL algorithms: Behavior Cloning with Kullback-Leibler (KL) divergence, GAIL with Jensen-Shannon divergence, and DAgger with Total Variation distance (the corresponding generator functions are sketched below this list).
- Reverse KL for Multi-Modal Demonstrations: The authors highlight the reverse KL (I-projection) divergence for its mode-seeking properties, which let it handle multi-modal demonstrations by collapsing onto a subset of modes rather than interpolating between them, as mode-covering divergences such as forward KL do.
- Empirical Evaluation: The paper presents empirical results showing that reverse KL minimization reproduces one of the expert's modes faithfully, avoiding the unsafe interpolated behaviors that other divergences can produce.
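As a rough guide to the algorithm correspondence noted above, here are standard generator functions, stated in the convention $D_f(p \| q) = \mathbb{E}_q[f(p/q)]$ used in the earlier sketch; the exact forms and constants in the paper's own table may differ slightly:

$$
\begin{aligned}
f(u) &= u \log u && \Rightarrow \text{forward KL (Behavior Cloning)} \\
f(u) &= -\log u && \Rightarrow \text{reverse KL (I-projection)} \\
f(u) &= u \log u - (u+1) \log \tfrac{u+1}{2} && \Rightarrow \text{Jensen-Shannon, up to scale (GAIL)} \\
f(u) &= \tfrac{1}{2}\,|u - 1| && \Rightarrow \text{Total Variation (linked to DAgger)}
\end{aligned}
$$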
Implications and Future Directions
The paper's theoretical and empirical analysis offers insight into how the choice of divergence measure influences the safety and reliability of learned behaviors in imitation learning. The mode-seeking nature of reverse KL is highlighted as particularly advantageous for tasks where learning and replicating a single mode of behavior is sufficient and safer.
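To make the mode-seeking versus mode-covering distinction concrete, here is a small illustrative sketch (not code from the paper): fitting a single Gaussian to a bimodal "expert" density under forward versus reverse KL, with both divergences estimated by numerical integration on a grid. The forward-KL fit spreads across both modes and places mass in the low-density gap between them, while the reverse-KL fit collapses onto one mode.

```python
# Illustrative sketch (not from the paper): fit a single Gaussian to a
# bimodal "expert" density under forward vs. reverse KL.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

# Bimodal expert density: two well-separated modes (e.g. "pass left" vs. "pass right").
expert = 0.5 * norm.pdf(x, loc=-3.0, scale=0.6) + 0.5 * norm.pdf(x, loc=3.0, scale=0.6)

def learner(params):
    mu, log_sigma = params
    return norm.pdf(x, loc=mu, scale=np.exp(log_sigma))

def forward_kl(params):   # KL(expert || learner): mode-covering (M-projection)
    q = learner(params) + 1e-12
    return np.sum(expert * np.log((expert + 1e-12) / q)) * dx

def reverse_kl(params):   # KL(learner || expert): mode-seeking (I-projection)
    q = learner(params) + 1e-12
    return np.sum(q * np.log(q / (expert + 1e-12))) * dx

start = np.array([0.5, 0.0])  # slightly asymmetric start: mean 0.5, sigma 1.0
fwd = minimize(forward_kl, start, method="Nelder-Mead").x
rev = minimize(reverse_kl, start, method="Nelder-Mead").x

print(f"forward KL fit: mean {fwd[0]:.2f}, std {np.exp(fwd[1]):.2f}")  # broad, centred near 0
print(f"reverse KL fit: mean {rev[0]:.2f}, std {np.exp(rev[1]):.2f}")  # narrow, near one mode
```

In a control setting, spreading across modes is not merely suboptimal but can be unsafe: averaging a "pass left" and a "pass right" demonstration can steer into the obstacle between them, which is the kind of unsafe interpolation discussed above.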
From a practical perspective, the ability to tailor an imitation learning algorithm to a given task by selecting the f-divergence is significant for robotics and autonomous systems, which frequently face multi-modal human demonstrations.
Theoretically, this research invites further exploration of the role of divergence properties in other machine learning paradigms, potentially yielding more robust and targeted learning strategies across varied domains. One speculative direction is to use integral probability metrics, such as Maximum Mean Discrepancy (MMD), as alternative distance measures between trajectory distributions.
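As an illustration of that speculative direction (not part of the paper), a quadratic-time empirical estimate of squared MMD between expert and learner samples, with trajectories flattened to fixed-length feature vectors and an RBF kernel, could look like the following; all names and parameters here are hypothetical:

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd_squared(expert_feats, learner_feats, bandwidth=1.0):
    """Biased empirical estimate of the squared MMD between two sample sets."""
    k_ee = rbf_kernel(expert_feats, expert_feats, bandwidth)
    k_ll = rbf_kernel(learner_feats, learner_feats, bandwidth)
    k_el = rbf_kernel(expert_feats, learner_feats, bandwidth)
    return k_ee.mean() + k_ll.mean() - 2.0 * k_el.mean()

# Toy usage: 64 "trajectories" per policy, each flattened to an 8-dimensional feature vector.
rng = np.random.default_rng(0)
expert_feats = rng.normal(loc=1.0, size=(64, 8))
learner_feats = rng.normal(loc=0.0, size=(64, 8))
print(mmd_squared(expert_feats, learner_feats))
```

Because MMD is estimated directly from samples through a kernel, it sidesteps density-ratio estimation, which makes it a natural candidate alternative to f-divergences in this setting.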
Additionally, because the framework only requires that the chosen divergence be estimable from samples, without access to the expert's exact policy, it opens up robust learning settings such as learning from observation, which could be pivotal in scenarios where expert actions are partially observed or entirely unavailable.
The research successfully reframes imitation learning through the lens of probability divergences, sharpening our understanding of existing algorithms and expanding the capacity of AI systems that learn from demonstration.