Attention Loss Adjusted Prioritized Experience Replay (2309.06684v2)
Abstract: Prioritized Experience Replay (PER) is a technique in deep reinforcement learning that accelerates neural-network training by preferentially sampling experiences that carry more information. However, the non-uniform sampling used in PER inevitably shifts the state-action distribution and introduces estimation error into the Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved self-attention network with a double-sampling mechanism to fit the hyperparameter that regulates the importance-sampling weights, thereby eliminating the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function based, policy-gradient based and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparative studies verify the advantage and efficiency of the proposed training framework.
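The correction that ALAP targets acts on the importance-sampling weights of standard proportional PER (Schaul et al., cited below). The following is a minimal sketch of that baseline mechanism only, assuming NumPy and hypothetical names (`PrioritizedReplayBuffer`, `alpha`, `beta`); the self-attention network and double-sampling mechanism that ALAP uses to fit the weight-regulating hyperparameter are not reproduced here.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer (baseline, not the ALAP method itself).

    Illustrates the non-uniform sampling P(i) and the importance-sampling
    weights w_i = (N * P(i))^(-beta) whose exponent ALAP learns to regulate.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly the TD error shapes the priority
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New samples receive the current maximum priority so each is replayed at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.data)] ** self.alpha
        probs = prios / prios.sum()                        # P(i)
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # IS weights correct the distribution shift introduced by non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()                           # normalize for stability
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priorities track the magnitude of the TD error of the sampled transitions.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In training, the sampled `weights` scale each transition's loss before the gradient step; a fixed `beta` schedule is the standard choice, whereas ALAP replaces it with a value fitted by the attention network.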
- D. Silver, A. Huang and C. J. Maddison, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
- Z. Xiao, X. Li and L. Wang, “Using convolution control block for Chinese sentiment analysis,” Journal of Parallel and Distributed Computing, vol. 116, pp. 18–26, 2018.
- T. Fan, P. Long, W. Liu, and J. Pan, “Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios,” The International Journal of Robotics Research, vol. 39, no. 7, pp. 856–892, 2020.
- L. Lin, “Self-improving reactive agents based on reinforcement learning, planning and teaching,” Machine Learning, vol. 8, no. 3, pp. 293–321, 1992.
- Z. Bing, “Solving robotic manipulation with sparse reward reinforcement learning via graph-based diversity and proximity,” IEEE Transactions on Industrial Electronics, vol. 70, no. 3, pp. 2759–2769, 2023.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” in International Conference on Learning Representations, 2016.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv:1312.5602, 2013.
- V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
- J. Sharma, P. A. Andersen, O. C. Granmo and M. Goodwin, “Deep q-learning with Q-matrix transfer learning for novel fire evacuation environment,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 12, pp. 7363–7381, 2021.
- G. Tesauro, “Extending Q-learning to general adaptive multi-agent systems,” in Proceedings of the 16th International Conference on Neural Information Processing Systems, pp. 871–878, 2004.
- R. Yang, D. Wang, and J. Qiao, “Policy gradient adaptive critic design with dynamic prioritized experience replay for wastewater treatment process control,” IEEE Transactions on Industrial Informatics, vol. 18, no. 5, pp. 3150–3158, 2022.
- F. Sovrano, A. Raymond, and A. Prorok, “Explanation-aware experience replay in rule-dense environments,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 898–905, 2022.
- K. Gokcesu and H. Gokcesu, “Generalized Huber loss for robust learning and its efficient minimization for a robust statistics,” arXiv:2108.12627, 2021.
- S. Fujimoto, D. Meger, and D. Precup, “An equivalence between loss functions and non-uniform sampling in experience replay,” in Proceedings of the 34th International Conference on Neural Information Processing Systems, vol. 33, pp. 14219–14230, 2020.
- B. Saglam, F. B. Mutlu, D. C. Cicek, and S. S. Kozat, “Actor prioritized experience replay,” arXiv:2209.00532, 2022.
- T. P. Lillicrap, J. J. Hunt, et al., “Continuous control with deep reinforcement learning,” arXiv:1509.02971, 2015.
- R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6379–6390, 2017.
- G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” arXiv:1606.01540, 2016.
- A. W. Moore and C. G. Atkeson, “Prioritized sweeping: Reinforcement learning with less data and less time,” Machine Learning, vol. 13, no. 1, pp. 103–130, 1993.
- D. Andre, N. Friedman, and R. Parr, “Generalized prioritized sweeping,” in Proceedings of the 10th International Conference on Neural Information Processing Systems, pp. 1001–1007, 1997.
- H. van Seijen and R. S. Sutton, “Planning by prioritized sweeping with small backups,” arXiv:1301.2343, 2013.
- X. Tao and A. S. Hafid, “DeepSensing: A novel mobile crowdsensing framework with double deep Q-network and prioritized experience replay,” IEEE Internet of Things Journal, vol. 7, no. 12, pp. 11547–11558, 2020.
- Y. Hou, L. Liu, Q. Wei, X. Xu and C. Chen, “A novel DDPG method with prioritized experience replay,” in IEEE International Conference on Systems, Man, and Cybernetics, pp. 316–321, 2017.
- J. Lu, Y. B. Zhao, Y. Kang, Y. Wang and Y. Deng, “Strategy generation based on DDPG with prioritized experience replay for UCAV,” in International Conference on Advanced Robotics and Mechatronics, pp. 157–162, 2022.
- M. Hessel, J. Modayil, H. van Hasselt, and T. Schaul, “Rainbow: Combining improvements in deep reinforcement learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, pp. 3215–3222, 2018.
- R. Liu and J. Zou, “The effects of memory replay in reinforcement learning,” in Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, pp. 478–485, 2018.
- H. van Seijen and R. S. Sutton, “A deeper look at planning as learning from replay,” in Proceedings of the International Conference on Machine Learning, pp. 2314–2322, 2015.
- K. H. Shen and P. Y. Tsai, “Memory reduction through experience classification for deep reinforcement learning with prioritized experience replay,” in IEEE International Workshop on Signal Processing Systems, pp. 166–171, 2019.
- J. Gao, X. Li, W. Liu and J. Zhao, “Prioritized experience replay method based on experience reward,” in International Conference on Machine Learning and Intelligent Systems Engineering, pp. 214–219, 2021.
- C. Kang, C. Rong, W. Ren, F. Huo, and P. Liu, “Deep deterministic policy gradient based on double network prioritized experience replay,” IEEE Access, vol. 9, pp. 60296–60308, 2021.
- S. Iqbal and F. Sha, “Actor-attention-critic for multi-agent reinforcement learning,” in Proceedings of the 36th International Conference on Machine Learning, pp. 2961–2970, 2019.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv:1706.03762, 2017.
- K. Kanno, and A. Uchida, “Photonic reinforcement learning based on optoelectronic reservoir computing,” arXiv:2202.12896, 2022.