Robust Policy Learning via Offline Skill Diffusion (2403.00225v3)

Published 1 Mar 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Skill-based reinforcement learning (RL) approaches have shown considerable promise, especially for solving long-horizon tasks via hierarchical structures. These skills, learned task-agnostically from offline datasets, can accelerate policy learning for new tasks. Yet, applying these skills across domains remains restricted due to their inherent dependency on the datasets, which poses a challenge when attempting to learn a skill-based policy via RL for a target domain that differs from the datasets' domains. In this paper, we present DuSkill, a novel offline skill learning framework that employs a guided diffusion model to generate versatile skills extending beyond the limited skills in the datasets, thereby enhancing the robustness of policy learning for tasks in different domains. Specifically, we devise a guided diffusion-based skill decoder in conjunction with hierarchical encoding to disentangle the skill embedding space into two distinct representations: one encapsulating domain-invariant behaviors and the other delineating the factors that induce domain variations in those behaviors. DuSkill thus enhances the diversity of skills learned offline, accelerating the learning of high-level policies for different domains. Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms on several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.
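
The abstract describes a two-part architecture: a hierarchical encoder that disentangles a skill embedding into a domain-invariant code and a domain-variant code, plus a guided diffusion decoder that generates action sequences conditioned on both. The abstract gives no implementation details, so the PyTorch sketch below only illustrates the general shape of such a design; all module names, layer sizes, the crude timestep embedding, and the guidance scale `w` are illustrative assumptions, and condition dropout is approximated here by zeroing the domain-variant code.

```python
# Minimal sketch (not the paper's actual implementation) of a two-latent
# skill encoder and a classifier-free-guided diffusion skill decoder.
import torch
import torch.nn as nn

class HierarchicalSkillEncoder(nn.Module):
    """Encode a flattened (state, action) sub-trajectory into two latents."""
    def __init__(self, traj_dim: int, z_dim: int = 16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(traj_dim, 128), nn.ReLU())
        self.inv_head = nn.Linear(128, z_dim)  # domain-invariant behavior code
        self.var_head = nn.Linear(128, z_dim)  # domain-variation code

    def forward(self, traj: torch.Tensor):
        h = self.backbone(traj)
        return self.inv_head(h), self.var_head(h)

class GuidedSkillDecoder(nn.Module):
    """Denoising network eps_theta(x_t, t, z_inv, z_var), DDPM-style."""
    def __init__(self, act_dim: int, z_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim + 1 + 2 * z_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, x_t, t, z_inv, z_var):
        t_emb = t.float().unsqueeze(-1) / 1000.0  # crude timestep embedding
        return self.net(torch.cat([x_t, t_emb, z_inv, z_var], dim=-1))

@torch.no_grad()
def cfg_eps(decoder, x_t, t, z_inv, z_var, w: float = 2.0):
    """Classifier-free guidance: blend conditional and unconditional
    noise predictions (condition dropped by zeroing z_var)."""
    eps_cond = decoder(x_t, t, z_inv, z_var)
    eps_uncond = decoder(x_t, t, z_inv, torch.zeros_like(z_var))
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy usage: encode a batch of sub-trajectories, then take one guided
# noise prediction for a batch of noisy action vectors.
enc = HierarchicalSkillEncoder(traj_dim=60)
dec = GuidedSkillDecoder(act_dim=4)
z_inv, z_var = enc(torch.randn(8, 60))
eps = cfg_eps(dec, torch.randn(8, 4), torch.randint(0, 1000, (8,)), z_inv, z_var)
print(eps.shape)  # torch.Size([8, 4])
```

The guided prediction follows the standard classifier-free guidance rule, eps = eps_uncond + w * (eps_cond - eps_uncond), where w > 1 pushes samples toward the conditioned behavior; the actual DuSkill training objective and guidance formulation are specified in the paper itself.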

Authors (3)
  1. Woo Kyung Kim (6 papers)
  2. Minjong Yoo (9 papers)
  3. Honguk Woo (16 papers)
Citations (1)