Investigating the dual-V formulation of ReCOIL

Investigate the dual-V formulation of ReCOIL for off-policy imitation learning, characterizing its theoretical properties and evaluating its practical utility.

Background

ReCOIL is introduced as a discriminator-free, off-policy imitation learning method that matches mixture distributions and is derived from a dual-Q formulation. The authors note that ReCOIL also admits a dual-V form, which they present in the appendix.
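To make "matching mixture distributions" concrete, the sketch below assumes the mixture form we understand ReCOIL to use: the policy's state-action visitation d^π and the expert's visitation d^E are each blended with the same replay-data distribution d^S using a weight β, giving β d^π + (1-β) d^S and β d^E + (1-β) d^S. The function names, the toy discrete distributions, and β = 0.5 are hypothetical illustrations, not the authors' code. The example also shows one reason mixing is useful: because both sides share the replay component, their supports overlap, and the total variation distance between the two mixtures is exactly β times the distance between d^π and d^E.

```python
from typing import Dict, Hashable

# A discrete distribution over (state, action) pairs; hypothetical toy representation.
Dist = Dict[Hashable, float]


def mixture(beta: float, d_main: Dist, d_replay: Dist) -> Dist:
    """Return beta * d_main + (1 - beta) * d_replay over the union of supports.

    Assumed mixture form: the same replay distribution d_replay is mixed into
    both the policy and the expert visitation distributions.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    keys = set(d_main) | set(d_replay)
    return {
        k: beta * d_main.get(k, 0.0) + (1.0 - beta) * d_replay.get(k, 0.0)
        for k in keys
    }


def total_variation(p: Dist, q: Dist) -> float:
    """Total variation distance 0.5 * sum_k |p(k) - q(k)| between discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    beta = 0.5
    d_expert = {("s0", "a1"): 1.0}                     # expert always takes a1
    d_policy = {("s0", "a0"): 1.0}                     # current policy always takes a0
    d_replay = {("s0", "a0"): 0.5, ("s0", "a1"): 0.5}  # hypothetical replay data

    expert_mix = mixture(beta, d_expert, d_replay)
    policy_mix = mixture(beta, d_policy, d_replay)

    # Disjoint supports before mixing; overlapping supports after mixing.
    print(expert_mix[("s0", "a1")], policy_mix[("s0", "a1")])
    # The mixture distance is beta times the original distance.
    print(total_variation(d_policy, d_expert), total_variation(policy_mix, expert_mix))
```

Running it prints `0.75 0.25` and then `1.0 0.5`: the raw policy and expert distributions have disjoint supports, while their mixtures overlap and sit at half the original total variation distance when β = 0.5.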

However, they explicitly defer investigation of the dual-V variant to future work, leaving open the task of analyzing, developing, and evaluating this formulation.
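For orientation, the generic regularized dual forms that the "dual-Q" and "dual-V" labels refer to in the Dual RL framework are sketched below as we understand them. These are the reward-based templates, not the ReCOIL-specific objectives from the appendix; in the imitation setting ReCOIL replaces the reward term with a divergence between the mixture distributions. The notation is our assumption: r is the reward, ρ0 the initial-state distribution, d^O the offline data distribution, α a regularization temperature, f^⋆ the convex conjugate of the f-divergence generator, and f^⋆_p a variant used in the dual-V case to account for nonnegative visitations. The structural difference that makes dual-V worth studying is that it optimizes only over a state-value function V, with no policy inside the objective, whereas dual-Q couples Q with a max over π.

```latex
\begin{aligned}
\text{dual-Q:}\quad & \max_{\pi}\,\min_{Q}\;(1-\gamma)\,\mathbb{E}_{s\sim\rho_0,\,a\sim\pi(\cdot\mid s)}\big[Q(s,a)\big]
+ \alpha\,\mathbb{E}_{(s,a)\sim d^O}\!\left[f^{\star}\!\left(\frac{r(s,a)+\gamma\,\mathbb{E}_{s'\sim p(\cdot\mid s,a),\,a'\sim\pi(\cdot\mid s')}\big[Q(s',a')\big]-Q(s,a)}{\alpha}\right)\right]\\[4pt]
\text{dual-V:}\quad & \min_{V}\;(1-\gamma)\,\mathbb{E}_{s\sim\rho_0}\big[V(s)\big]
+ \alpha\,\mathbb{E}_{(s,a)\sim d^O}\!\left[f^{\star}_{p}\!\left(\frac{r(s,a)+\gamma\,\mathbb{E}_{s'\sim p(\cdot\mid s,a)}\big[V(s')\big]-V(s)}{\alpha}\right)\right]
\end{aligned}
```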

References

We also present the dual-V form for ReCOIL in Appendix [ap:closer] but defer its investigation for future work.

— Dual RL: Unification and New Methods for Reinforcement and Imitation Learning (2302.08560 - Sikchi et al., 2023) in Section 4 (ReCOIL: Imitation Learning from Arbitrary Experience)
