Offline Learning from Demonstrations and Unlabeled Experience (2011.13885v1)
Abstract: Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human teleoperation, scripted policies and other agents on the same robot. Towards data-driven offline robot learning that can use this unlabeled experience, we introduce Offline Reinforced Imitation Learning (ORIL). ORIL first learns a reward function by contrasting observations from demonstrator and unlabeled trajectories, then annotates all data with the learned reward, and finally trains an agent via offline reinforcement learning. Across a diverse set of continuous control and simulated robotic manipulation tasks, we show that ORIL consistently outperforms comparable BC agents by effectively leveraging unlabeled experience.
- Konrad Zolna
- Alexander Novikov
- Ksenia Konyushkova
- Caglar Gulcehre
- Ziyu Wang
- Yusuf Aytar
- Misha Denil
- Nando de Freitas
- Scott Reed
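The three-step pipeline described in the abstract (learn a reward by contrasting demonstrator and unlabeled observations, relabel all data, then run offline RL) can be pictured with a short sketch. The code below is an illustrative reconstruction, not the authors' implementation: the network architecture, optimizer settings, batch sizes, the sigmoid reward mapping, and the data layout are assumptions, and the final offline-RL training step is only indicated by a comment.

```python
# Hypothetical sketch of the ORIL pipeline from the abstract; all
# hyperparameters and data shapes are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores an observation; trained to separate demonstrator from unlabeled data."""
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        # Returns a logit; sigmoid maps it to a reward in [0, 1].
        return self.net(obs).squeeze(-1)

def learn_reward(demo_obs, unlabeled_obs, epochs=10, batch_size=256, lr=3e-4):
    """Step 1: contrast demonstrator observations (label 1) with unlabeled ones (label 0).

    demo_obs, unlabeled_obs: float tensors of shape (N, obs_dim).
    """
    model = RewardModel(demo_obs.shape[-1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    steps_per_epoch = max(len(demo_obs), len(unlabeled_obs)) // batch_size
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            d = demo_obs[torch.randint(len(demo_obs), (batch_size,))]
            u = unlabeled_obs[torch.randint(len(unlabeled_obs), (batch_size,))]
            logits = model(torch.cat([d, u]))
            labels = torch.cat([torch.ones(batch_size), torch.zeros(batch_size)])
            loss = bce(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

@torch.no_grad()
def relabel(transitions, reward_model):
    """Step 2: annotate every transition (demonstrated or not) with the learned reward."""
    obs = torch.stack([t["observation"] for t in transitions])
    rewards = torch.sigmoid(reward_model(obs))
    for t, r in zip(transitions, rewards):
        t["reward"] = r.item()
    return transitions

# Step 3 (not shown): train a policy on the relabeled dataset with any
# offline RL algorithm; the learned rewards stand in for the missing annotations.
```

The key design choice shown here is that demonstrator observations serve as positive examples and all unlabeled observations as negatives, so the classifier's output can be reused as a dense reward for the whole dataset; refinements the paper may apply on top of this basic contrast are omitted.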