
Adversarial Environment Design via Regret-Guided Diffusion Models (2410.19715v2)

Published 25 Oct 2024 in cs.LG and cs.AI

Abstract: Training agents that are robust to environmental changes remains a significant challenge in deep reinforcement learning (RL). Unsupervised environment design (UED) has recently emerged to address this issue by generating a set of training environments tailored to the agent's capabilities. While prior works demonstrate that UED has the potential to learn a robust policy, their performance is constrained by the capabilities of the environment generator. To address this limitation, we propose a novel UED algorithm, adversarial environment design via regret-guided diffusion models (ADD). The proposed method guides a diffusion-based environment generator with the agent's regret to produce environments that the agent finds challenging but conducive to further improvement. By exploiting the representation power of diffusion models, ADD can directly generate adversarial environments while maintaining the diversity of training environments, enabling the agent to effectively learn a robust policy. Our experimental results demonstrate that the proposed method successfully generates an instructive curriculum of environments, outperforming UED baselines in zero-shot generalization across novel, out-of-distribution environments. Project page: https://rllab-snu.github.io/projects/ADD
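The regret-guided generation described in the abstract is closely related to classifier guidance for diffusion models, where the sampler's noise prediction is shifted at every denoising step by the gradient of an auxiliary objective. The sketch below is a minimal illustration of that general idea, not the authors' implementation: `denoiser` (a noise-prediction network), `regret_net` (a differentiable estimate of the agent's regret on the environment encoded by the current latent), and the `alphas_cumprod` schedule are all hypothetical placeholders.

```python
import torch

@torch.no_grad()
def regret_guided_sample(denoiser, regret_net, shape, alphas_cumprod,
                         guidance_scale=1.0):
    """Sketch of regret-guided ancestral (DDPM-style) sampling.

    denoiser(x_t, t)   -> predicted noise eps_theta      (hypothetical API)
    regret_net(x_t, t) -> scalar regret estimate per sample (hypothetical API)
    alphas_cumprod     -> 1-D tensor of cumulative products alpha_bar_t
    """
    T = len(alphas_cumprod)
    x = torch.randn(shape)  # start from pure noise
    for t in reversed(range(T)):
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        alpha_t = a_bar / a_bar_prev

        eps = denoiser(x, t)

        # Guidance step: nudge the noise prediction with the gradient of the
        # regret estimate, analogous to classifier guidance (gradient ascent
        # on estimated regret pushes samples toward harder environments).
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            regret = regret_net(x_in, t).sum()
            grad = torch.autograd.grad(regret, x_in)[0]
        eps = eps - guidance_scale * torch.sqrt(1.0 - a_bar) * grad

        # Standard DDPM posterior mean and variance.
        mean = (x - (1.0 - alpha_t) / torch.sqrt(1.0 - a_bar) * eps) \
               / torch.sqrt(alpha_t)
        if t > 0:
            sigma = torch.sqrt((1.0 - a_bar_prev) / (1.0 - a_bar)
                               * (1.0 - alpha_t))
            x = mean + sigma * torch.randn_like(x)
        else:
            x = mean
    return x  # latent environment parameters; decode into a level/layout
```

Setting `guidance_scale` to zero recovers unguided sampling, i.e., environments drawn from the generator's original distribution; larger values trade diversity for higher estimated regret, which matches the challenge-versus-diversity balance the abstract describes.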


