Pausing Policy Learning in Non-stationary Reinforcement Learning (2405.16053v1)

Published 25 May 2024 in cs.LG

Abstract: Real-time inference is a challenge of real-world reinforcement learning due to temporal differences in time-varying environments: the system collects data from the past, updates the decision model in the present, and deploys it in the future. We tackle a common belief that continually updating the decision is optimal to minimize the temporal gap. We propose forecasting an online reinforcement learning framework and show that strategically pausing decision updates yields better overall performance by effectively managing aleatoric uncertainty. Theoretically, we compute an optimal ratio between policy update and hold duration, and show that a non-zero policy hold duration provides a sharper upper bound on the dynamic regret. Our experimental evaluations on three different environments also reveal that a non-zero policy hold duration yields higher rewards compared to continuous decision updates.
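To make the update/hold idea from the abstract concrete, below is a minimal Python sketch of an online learning loop that alternates between an update phase (the policy is trained) and a hold phase (the policy is frozen and only deployed). The drifting-bandit environment, the tabular value update, and the schedule lengths `update_len`/`hold_len` are illustrative assumptions for this sketch, not the paper's algorithm or its theoretically optimal update-to-hold ratio.

```python
import numpy as np


class DriftingBandit:
    """Toy non-stationary environment: a two-armed bandit whose mean
    rewards drift over time, so the best arm slowly changes."""

    def __init__(self, drift=2e-4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.means = np.array([0.2, 0.8])
        self.drift = drift

    def step(self, action):
        reward = float(self.rng.normal(self.means[action], 0.1))
        # Slowly move the two arm means toward (and past) each other.
        self.means += self.drift * np.array([+1.0, -1.0])
        return reward


def run(update_len=50, hold_len=25, horizon=5000, lr=0.1, eps=0.1, seed=0):
    """Alternate between an update phase (epsilon-greedy exploration plus
    value updates) and a hold phase (the current policy is frozen and only
    deployed). Setting hold_len=0 recovers continuous updating."""
    env = DriftingBandit(seed=seed)
    rng = np.random.default_rng(seed + 1)
    q = np.zeros(2)  # live value estimates, updated only while learning
    total, t = 0.0, 0
    while t < horizon:
        # Update phase: explore and learn.
        for _ in range(min(update_len, horizon - t)):
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q))
            r = env.step(a)
            q[a] += lr * (r - q[a])
            total += r
            t += 1
        # Hold phase: pause learning and deploy the frozen greedy policy.
        frozen_q = q.copy()
        for _ in range(min(hold_len, horizon - t)):
            a = int(np.argmax(frozen_q))
            total += env.step(a)
            t += 1
    return total / horizon


if __name__ == "__main__":
    # Compare a non-zero hold duration against continuous updates.
    print("average reward, update/hold schedule:", run(hold_len=25))
    print("average reward, continuous updates  :", run(hold_len=0))
```

Sweeping `hold_len` against the `hold_len=0` baseline in a non-stationary environment like this gives a quick empirical feel for when pausing policy updates helps, which is the effect the paper analyzes formally via the update-to-hold ratio and dynamic regret bounds.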
