
Diffusion Reward: Learning Rewards via Conditional Video Diffusion (2312.14134v3)

Published 21 Dec 2023 in cs.LG, cs.CV, and cs.RO

Abstract: Learning rewards from expert videos offers an affordable and effective way to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that conditioning the diffusion model on expert trajectories yields lower generative diversity. Diffusion Reward is accordingly formalized as the negative of the conditional entropy, which encourages productive exploration of expert-like behaviors. We show the efficacy of our method on robotic manipulation tasks with visual input, both in simulation and in the real world. Moreover, Diffusion Reward can even solve unseen tasks successfully and effectively, largely surpassing baseline methods. Project page and code: https://diffusion-reward.github.io.
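The core idea in the abstract, using the negative conditional entropy of a video diffusion model (conditioned on recent frames) as a reward, can be sketched as follows. This is a hedged illustration, not the paper's implementation: `toy_denoiser` is a stand-in for a pretrained conditional video diffusion model, the conditional entropy is approximated by the average denoising error over sampled noise levels (a common variational-bound proxy), and the mixing weight `alpha` is illustrative rather than the paper's value.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(noisy_frame, history, sigma):
    # Stand-in for a conditional diffusion model: predicts the clean frame
    # from a noisy frame given conditioning history. This toy version just
    # predicts the mean of the history, so frames that resemble the expert
    # conditioning are denoised well and off-distribution frames are not.
    return history.mean(axis=0)

def neg_conditional_entropy(frame, history, n_samples=32, sigmas=(0.1, 0.3, 0.5)):
    # Monte Carlo proxy for -H(frame | history): average the denoising error
    # over noise levels and noise draws. Lower denoising error ~ lower
    # conditional entropy ~ higher reward.
    errors = []
    for sigma in sigmas:
        for _ in range(n_samples):
            noisy = frame + sigma * rng.standard_normal(frame.shape)
            predicted = toy_denoiser(noisy, history, sigma)
            errors.append(np.mean((predicted - frame) ** 2))
    return -float(np.mean(errors))

def diffusion_reward(frame, history, novelty_bonus=0.0, alpha=0.95):
    # Combine the entropy-based term with an exploration bonus (the paper
    # pairs it with an RND-style novelty term); alpha is a hypothetical weight.
    return alpha * neg_conditional_entropy(frame, history) + (1 - alpha) * novelty_bonus
```

With this sketch, a frame close to the expert-conditioned prediction receives a higher reward than an off-distribution frame, which is the mechanism the abstract describes: conditioning on expert trajectories lowers generative diversity, so expert-like states are cheap to denoise and score highly.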


