Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning (2407.20798v1)
Abstract: We introduce Diffusion Augmented Agents (DAAG), a novel framework that leverages large language models, vision language models, and diffusion models to improve sample efficiency and transfer learning in reinforcement learning for embodied agents. DAAG hindsight-relabels the agent's past experience by using diffusion models to transform videos in a temporally and geometrically consistent way so that they align with target instructions, a technique we call Hindsight Experience Augmentation. An LLM orchestrates this autonomous process without requiring human supervision, making the framework well-suited for lifelong learning scenarios. The framework reduces the amount of reward-labeled data needed to 1) finetune a vision language model that acts as a reward detector, and 2) train RL agents on new tasks. We demonstrate the sample-efficiency gains of DAAG in simulated robotics environments involving manipulation and navigation. Our results show that DAAG improves learning of reward detectors, transferring past experience, and acquiring new tasks, key abilities for developing efficient lifelong learning agents. Supplementary material and visualizations are available on our website: https://sites.google.com/view/diffusion-augmented-agents/
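The sketch below illustrates the Hindsight Experience Augmentation loop as described in the abstract: past (possibly failed) episodes are screened by an LLM, visually rewritten by a diffusion model to depict a target instruction, and accepted only if a vision-language reward detector scores them as successes. This is a minimal illustration of the data flow under assumed interfaces; the `llm_relabel`, `diffusion_transform`, and `vlm_reward` callables are hypothetical stand-ins, not the authors' actual implementation.

```python
# Hedged sketch of Hindsight Experience Augmentation (not the authors' code).
# All model interfaces are hypothetical placeholders passed in as callables.
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Episode:
    frames: List[Any]      # video observations (e.g. image arrays)
    actions: List[Any]     # actions taken by the agent
    instruction: str       # task the agent was originally attempting
    success: bool          # whether the original task succeeded


def augment_replay_buffer(
    buffer: List[Episode],
    target_instruction: str,
    llm_relabel: Callable[[str, str], bool],            # assumed: is relabeling plausible?
    diffusion_transform: Callable[[List[Any], str], List[Any]],  # assumed: edit frames toward target task
    vlm_reward: Callable[[List[Any], str], float],      # assumed: finetuned VLM reward detector
) -> List[Episode]:
    """Turn past episodes into synthetic successes for a new target task."""
    augmented: List[Episode] = []
    for ep in buffer:
        # 1. The LLM decides whether this episode is close enough to the
        #    target task to be worth relabeling.
        if not llm_relabel(ep.instruction, target_instruction):
            continue
        # 2. The diffusion model rewrites the video so it depicts the target
        #    task while keeping the trajectory temporally and geometrically
        #    consistent.
        new_frames = diffusion_transform(ep.frames, target_instruction)
        # 3. The VLM reward detector checks that the edited episode now counts
        #    as a success for the target instruction.
        if vlm_reward(new_frames, target_instruction) > 0.5:
            augmented.append(
                Episode(new_frames, ep.actions, target_instruction, True)
            )
    return augmented
```

The augmented episodes can then be appended to the replay buffer used to train the RL agent on the new task, which is how the framework reduces the amount of reward-labeled data it needs.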