Enhancing data efficiency in reinforcement learning: a novel imagination mechanism based on mesh information propagation (2309.14243v2)
Abstract: Reinforcement learning (RL) algorithms face the challenge of limited data efficiency, particularly when dealing with high-dimensional state spaces and large-scale problems. Most RL methods rely solely on state transition information within the same episode when updating the agent's Critic, which can lead to low data efficiency and sub-optimal training time consumption. Inspired by human-like analogical reasoning abilities, we introduce a novel mesh information propagation mechanism, termed the 'Imagination Mechanism (IM)', designed to significantly enhance the data efficiency of RL algorithms. Specifically, IM enables information generated by a single sample to be effectively broadcast to different states across episodes, instead of being transmitted only within the same episode. This capability enhances the model's comprehension of state interdependencies and facilitates more efficient learning from limited sample information. To promote versatility, we extend IM to function as a plug-and-play module that can be seamlessly integrated into other widely adopted RL algorithms. Our experiments demonstrate that IM consistently boosts four mainstream state-of-the-art RL algorithms, namely SAC, PPO, DDPG, and DQN, by a considerable margin, ultimately leading to superior performance across various tasks. For access to our code and data, please visit https://github.com/OuAzusaKou/imagination_mechanism
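The abstract describes the core idea only at a high level: information from a single transition is broadcast to similar states from other episodes, rather than propagated only along the current trajectory. The following is a minimal, hypothetical sketch of what such cross-episode value propagation could look like; the class name, the Euclidean similarity rule, and all parameters are assumptions for illustration, not the paper's actual method (see the linked repository for that).

```python
import numpy as np

class MeshPropagationBuffer:
    """Hypothetical sketch of IM-style mesh propagation: whenever a new
    transition arrives, its TD target is also used to nudge the value
    estimates of the most similar stored states, regardless of which
    episode those states came from."""

    def __init__(self, capacity=10_000, k=4, alpha=0.1):
        self.states, self.values = [], []
        self.capacity = capacity  # max stored transitions
        self.k = k                # neighbors to broadcast to
        self.alpha = alpha        # propagation step size

    def add(self, state, td_target):
        """Store a transition's state and TD target, then broadcast."""
        if len(self.states) >= self.capacity:
            self.states.pop(0)
            self.values.pop(0)
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(td_target))
        self._broadcast(self.states[-1], self.values[-1])

    def _broadcast(self, state, td_target):
        """Move the k nearest stored states' values toward the new target."""
        if len(self.states) < 2:
            return
        dists = [np.linalg.norm(state - s) for s in self.states[:-1]]
        for i in np.argsort(dists)[: self.k]:
            self.values[i] += self.alpha * (td_target - self.values[i])
```

In an actual Critic update, the propagated values would serve as extra regression targets alongside the usual same-episode bootstrapped targets, which is what lets one sample inform many states at once.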
- Agent57: Outperforming the Atari human benchmark. In International conference on machine learning, pp. 507–517. PMLR, 2020.
- Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
- OpenAI Gym, 2016.
- Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, 9(2):48–57, 2014.
- Challenges of real-world reinforcement learning. arXiv preprint arXiv:1904.12901, 2019.
- GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30:681–694, 2020.
- Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pp. 1861–1870. PMLR, 2018.
- Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
- Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9729–9738, 2020.
- Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
- Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374, 2019.
- Kacper Piotr Kielak. Do recent advancements in model-based deep reinforcement learning really improve data efficiency? 2019.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- CURL: Contrastive unsupervised representations for reinforcement learning. In International Conference on Machine Learning, pp. 5639–5650. PMLR, 2020a.
- Reinforcement learning with augmented data. Advances in neural information processing systems, 33:19884–19895, 2020b.
- Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model. Advances in Neural Information Processing Systems, 33:741–752, 2020.
- Relational complexity influences analogical reasoning ability. iScience, 26(4), 2023.
- Achieving fairness at no utility cost via data reweighing with influence. In International Conference on Machine Learning, pp. 12917–12930. PMLR, 2022.
- Interactive learning from policy-dependent human feedback. In International conference on machine learning, pp. 2285–2294. PMLR, 2017.
- Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.
- Trust region policy optimization. In International conference on machine learning, pp. 1889–1897. PMLR, 2015.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929, 2020.
- Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems, 34:12686–12699, 2021.
- Deterministic policy gradient algorithms. In International conference on machine learning, pp. 387–395. PMLR, 2014.
- Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- Robert J Sternberg. Intelligence, information processing, and analogical reasoning: The componential analysis of human abilities. Lawrence Erlbaum, 1977.
- The development of analogical reasoning processes. Journal of experimental child psychology, 27(2):195–232, 1979.
- Richard S Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in neural information processing systems, 8, 1995.
- Reinforcement learning: An introduction. MIT press, 2018.
- Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE, 2012. doi: 10.1109/IROS.2012.6386109.
- Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 30, 2016.
- When to use parametric models in reinforcement learning? Advances in Neural Information Processing Systems, 32, 2019.
- Fairness-aware unsupervised feature selection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3548–3552, 2021.
- Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021.
- Zihang Wang
- Maowei Jiang