An Investigation of Time Reversal Symmetry in Reinforcement Learning (2311.17008v1)
Abstract: One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that exploiting the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application to both proprioceptive and pixel-based observations in off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time-reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance in such settings, but can also improve sample efficiency under the right conditions. Ultimately, we conclude that time symmetry shows promise for enhancing the sample efficiency of reinforcement learning, and we provide guidance on when the environment and reward structures take an appropriate form for TSDA to be employed effectively.
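The core mechanism described above, turning each forward transition into a feasible reverse-time transition and adding it to the agent's experience, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a frictionless, contact-free system whose state concatenates positions and velocities, that time reversal simply negates the velocities, and that the action and reward carry over unchanged; `time_reverse_state` and `tsda_augment` are hypothetical names.

```python
# Minimal sketch of time symmetric data augmentation (TSDA) for a replay buffer.
# Assumptions (not from the paper): state = [positions, velocities], time
# reversal negates velocities, and action/reward are reused unchanged.
import numpy as np

def time_reverse_state(state):
    """Negate the velocity half of a (position, velocity) state vector."""
    pos, vel = np.split(state, 2)
    return np.concatenate([pos, -vel])

def tsda_augment(transition):
    """Map a forward transition (s, a, r, s') to a synthetic reverse-time one."""
    s, a, r, s_next = transition
    # Assumption: the same action drives the reversed motion (as for a force
    # under classical time reversal) and the reward is kept as-is. Both choices
    # depend on the environment dynamics and reward structure.
    return (time_reverse_state(s_next), a, r, time_reverse_state(s))

# Usage: every real transition yields one synthetic transition, doubling the
# data placed in the replay buffer.
replay_buffer = []
transition = (np.array([0.0, 1.0, 0.5, -0.2]),   # s  = [pos, vel]
              np.array([0.3]),                   # a
              1.0,                               # r
              np.array([0.1, 0.9, 0.4, -0.1]))   # s' = [pos, vel]
replay_buffer.append(transition)
replay_buffer.append(tsda_augment(transition))
```

Whether the reversed transition is truly feasible, and whether its reward is valid, is exactly the condition the paper investigates: the sketch only applies in time-reversible settings without friction or contact.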