Diagnosing and exploiting the computational demands of video games for deep reinforcement learning (2309.13181v1)
Abstract: Humans learn by interacting with their environments and perceiving the outcomes of their actions. A landmark in artificial intelligence has been the development of deep reinforcement learning (dRL) algorithms capable of doing the same in video games, on par with or better than humans. However, it remains unclear whether the successes of dRL models reflect advances in visual representation learning, the effectiveness of reinforcement learning algorithms at discovering better policies, or both. To address this question, we introduce the Learning Challenge Diagnosticator (LCD), a tool that separately measures the perceptual and reinforcement learning demands of a task. We use LCD to discover a novel taxonomy of challenges in the Procgen benchmark, and demonstrate that its predictions are highly reliable and can guide algorithmic development. More broadly, LCD reveals multiple failure cases that can occur when dRL algorithms are optimized over entire video game benchmarks like Procgen, and provides a pathway towards more efficient progress.
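The abstract describes LCD as separately quantifying the perceptual and reinforcement learning demands of a task, but gives no procedure here. The following is a minimal illustrative sketch of one way such a decomposition could work, not the authors' actual method: it assumes hypothetical inputs (the return of an agent trained from pixels, the return of the same RL algorithm given ground-truth state so that the perceptual challenge is removed, and oracle/random baselines for normalization). The function name and the normalization scheme are assumptions for illustration only.

```python
def lcd_scores(pixel_return, state_return, oracle_return, random_return):
    """Sketch: split an agent's performance deficit into perceptual vs. RL components.

    pixel_return  -- return of an agent trained end-to-end from pixels
    state_return  -- return of the same RL algorithm given ground-truth state
                     (i.e., with the perceptual challenge removed)
    oracle_return -- upper-bound return (e.g., expert or maximum achievable)
    random_return -- lower-bound return (random policy)
    """
    span = oracle_return - random_return
    if span <= 0:
        raise ValueError("oracle_return must exceed random_return")

    def norm(r):
        # Min-max normalize a return into [0, 1] against the two baselines.
        return (r - random_return) / span

    # Deficit attributable to perception: how much the state-based agent
    # outperforms the pixel-based agent under the same RL algorithm.
    perceptual_demand = norm(state_return) - norm(pixel_return)
    # Residual deficit even with perfect perception: attributed to the
    # difficulty of the reinforcement learning problem itself.
    rl_demand = 1.0 - norm(state_return)
    return perceptual_demand, rl_demand
```

Under this toy decomposition, a task where the state-based agent nearly matches the oracle but the pixel-based agent lags would register as perceptually demanding, while a task where even the state-based agent falls short would register as demanding for the RL algorithm.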
- Lakshmi Narasimhan Govindarajan
- Rex G Liu
- Drew Linsley
- Alekh Karkada Ashok
- Max Reuter
- Thomas Serre
- Michael J Frank