Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning (2402.17511v1)
Abstract: Language-conditioned robot behavior plays a vital role in executing complex tasks by associating human commands or instructions with perception and actions. The ability to compose long-horizon tasks based on unconstrained language instructions necessitates the acquisition of a diverse set of general-purpose skills. However, acquiring inherent primitive skills in a coupled and long-horizon environment without external rewards or human supervision presents significant challenges. In this paper, we evaluate the relationship between skills and language instructions from a mathematical perspective, employing two forms of mutual information within the framework of language-conditioned policy learning. To maximize the mutual information between language and skills in an unsupervised manner, we propose an end-to-end imitation learning approach known as Language Conditioned Skill Discovery (LCSD). Specifically, we utilize vector quantization to learn discrete latent skills and leverage skill sequences of trajectories to reconstruct high-level semantic instructions. Through extensive experiments on language-conditioned robotic navigation and manipulation tasks, encompassing BabyAI, LORel, and CALVIN, we demonstrate the superiority of our method over prior works. Our approach exhibits enhanced generalization capabilities towards unseen tasks, improved skill interpretability, and notably higher rates of task completion success.
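As a rough illustration of the vector-quantization step mentioned in the abstract, the following minimal sketch maps continuous latents to discrete skill indices by nearest-neighbor lookup in a codebook. It is not the authors' implementation: the function name `quantize`, the codebook size, and the latent dimension are assumptions chosen for illustration, and the training losses (codebook and commitment terms, instruction reconstruction) are omitted.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each continuous latent to its nearest codebook entry (a discrete skill).

    latents:  array of shape (T, D), one latent per timestep (hypothetical encoder output)
    codebook: array of shape (K, D), one learnable code per skill
    Returns the skill indices, shape (T,), and the quantized latents, shape (T, D).
    """
    # Squared Euclidean distance between every latent and every code: shape (T, K)
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # The discrete skill is the index of the closest code
    skill_ids = np.argmin(dists, axis=1)
    return skill_ids, codebook[skill_ids]

# Illustrative sizes only: 4 skills with 8-dimensional codes (assumed, not from the paper)
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))
# Latents built as slightly perturbed copies of codes 2, 0, and 3
latents = codebook[[2, 0, 3]] + 0.01 * rng.normal(size=(3, 8))
skill_ids, quantized = quantize(latents, codebook)
```

In this toy setup each latent is recovered as the code it was perturbed from, so the resulting index sequence plays the role of the skill sequence that the abstract describes as being used to reconstruct the instruction.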
- Zhaoxun Ju
- Chao Yang
- Hongbo Wang
- Yu Qiao
- Fuchun Sun