Closing the agent generalization gap via stronger imitation policies
Determine whether increasing the capability of the behavioral cloning (BC) reference policy, for example via more expressive imitation methods such as Generative Adversarial Imitation Learning (GAIL) or stronger architectures such as Diffusion Policies, suffices to close the agent generalization gap observed when policies trained in self-play are evaluated against unseen human drivers in log-replay.
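One way to make the question concrete is to quantify the gap itself: compare the policy's performance when paired with self-play partners against its performance when the other agents are unseen human drivers replayed from logs. The sketch below is a minimal, hypothetical illustration of that measurement; the function name and the example scores are assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch of measuring the agent generalization gap.
# Scores could be, e.g., per-batch goal-reaching rates; the specific
# numbers below are illustrative assumptions.

def generalization_gap(self_play_scores, human_replay_scores):
    """Difference between mean performance with self-play partners
    and mean performance against unseen human drivers in log-replay.
    A larger positive gap indicates worse generalization to humans."""
    mean_self_play = sum(self_play_scores) / len(self_play_scores)
    mean_human_replay = sum(human_replay_scores) / len(human_replay_scores)
    return mean_self_play - mean_human_replay

# Example usage with made-up goal-reaching rates per evaluation batch:
gap = generalization_gap([0.95, 0.93, 0.96], [0.80, 0.78, 0.82])
```

Under this framing, the research question asks whether swapping the BC reference policy for a stronger imitation learner (GAIL, Diffusion Policies) drives this gap toward zero.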
References
Additionally, it is still to be seen if the agent generalization gap can be closed simply by increasing the capability of the BC policy using more complex imitation methods such as GAIL~\citep{ho2016generative} or better architectures such as Diffusion Policies~\citep{chi2023diffusion}.
— Human-compatible driving partners through data-regularized self-play reinforcement learning
(2403.19648 - Cornelisse et al., 28 Mar 2024) in Conclusion and future work