Diffusion Actor-Critic with Entropy Regulator (2405.15177v4)
Abstract: Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, most traditional RL algorithms parameterize the policy as a diagonal Gaussian distribution with learned mean and variance, which limits their ability to acquire complex policies. To address this problem, we propose an online RL algorithm termed diffusion actor-critic with entropy regulator (DACER). This algorithm treats the reverse process of a diffusion model as a novel policy function and leverages the diffusion model's ability to fit multimodal distributions, thereby enhancing the representational capacity of the policy. Since the distribution of the diffusion policy lacks an analytical expression, its entropy cannot be computed in closed form. To mitigate this, we propose a method that estimates the entropy of the diffusion policy using a Gaussian mixture model. Building on the estimated entropy, we learn a parameter $\alpha$ that balances exploration and exploitation; $\alpha$ adaptively regulates the variance of the noise added to the action output by the diffusion model. Experiments on MuJoCo benchmarks and a multimodal task demonstrate that DACER achieves state-of-the-art (SOTA) performance on most MuJoCo control tasks while exhibiting the stronger representational capacity of the diffusion policy.
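The entropy-regulation idea in the abstract can be sketched concretely: sample a batch of actions from the (analytically intractable) diffusion policy, fit a small Gaussian mixture to them, estimate entropy as a Monte Carlo average of the negative mixture log-density, and nudge $\alpha$ toward a target entropy. The sketch below is illustrative only; `gmm_entropy`, the stand-in Gaussian action samples, and the specific $\alpha$ update rule are assumptions, not the paper's exact implementation.

```python
import numpy as np

def gmm_entropy(samples, k=2, iters=50, seed=0):
    """Estimate differential entropy of an empirical action distribution:
    fit a k-component diagonal-covariance Gaussian mixture with EM, then
    return H ~= -E[log p(a)] evaluated on the samples themselves."""
    rng = np.random.default_rng(seed)
    n, d = samples.shape
    mu = samples[rng.choice(n, k, replace=False)]          # init means
    var = np.tile(samples.var(axis=0) + 1e-6, (k, 1))      # init variances
    w = np.full(k, 1.0 / k)                                # mixture weights
    for _ in range(iters):
        # E-step: log-responsibilities under each diagonal Gaussian
        diff = samples[:, None, :] - mu[None, :, :]        # (n, k, d)
        logp = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2)
        logp += np.log(w)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(axis=0) + 1e-9
        w = nk / n
        mu = (r[:, :, None] * samples[:, None, :]).sum(axis=0) / nk[:, None]
        diff = samples[:, None, :] - mu[None, :, :]
        var = (r[:, :, None] * diff**2).sum(axis=0) / nk[:, None] + 1e-6
    # Monte Carlo entropy: -mean log-density under the fitted mixture
    diff = samples[:, None, :] - mu[None, :, :]
    comp = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2) + np.log(w)
    m = comp.max(axis=1, keepdims=True)
    log_pdf = m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))
    return -log_pdf.mean()

# Stand-in for actions sampled from a diffusion policy (here just N(0, 1)).
rng = np.random.default_rng(1)
actions = rng.normal(size=(2000, 1))
H = gmm_entropy(actions)   # true entropy of N(0,1) is 0.5*ln(2*pi*e) ~= 1.42

# Assumed regulator: raise the noise scale alpha when entropy falls below
# a target, lower it when entropy is above (analogous to SAC's temperature).
target_H, alpha, lr = 1.0, 1.0, 3e-3
alpha += lr * (target_H - H)
```

The estimate is slightly optimistic because the mixture is evaluated on the same samples it was fit to; for a held-out estimate, split the batch before fitting.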
Authors: Yinuo Wang, Likun Wang, Yuxuan Jiang, Wenjun Zou, Tong Liu, Xujie Song, Wenxuan Wang, Liming Xiao, Jiang Wu, Jingliang Duan, Shengbo Eben Li