An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning (2302.04419v3)

Published 9 Feb 2023 in cs.LG, cs.AI, and cs.CV

Abstract: Unsupervised object-centric representation (OCR) learning has recently drawn attention as a new paradigm of visual representation. This is because of its potential of being an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Although image-based reinforcement learning (RL) is one of the most important and thus frequently mentioned such downstream tasks, the benefit in RL has surprisingly not been investigated systematically thus far. Instead, most of the evaluations have focused on rather indirect metrics such as segmentation quality and object property prediction accuracy. In this paper, we investigate the effectiveness of OCR pre-training for image-based reinforcement learning via empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as "Does OCR pre-training improve performance on object-centric tasks?" and "Can OCR pre-training help with out-of-distribution generalization?". Our results provide empirical evidence for valuable insights into the effectiveness of OCR pre-training for RL and the potential limitations of its use in certain scenarios. Additionally, this study also examines the critical aspects of incorporating OCR pre-training in RL, including performance in a visually complex environment and the appropriate pooling layer to aggregate the object representations.
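
The abstract does not spell out the agent architecture, but the final point about "the appropriate pooling layer to aggregate the object representations" can be illustrated with a minimal sketch. The sketch below assumes a frozen, pre-trained slot-based OCR encoder (e.g. Slot Attention) that emits a fixed number of object slots per frame, and a small transformer-based pooling layer that aggregates those slots into a single vector for an actor-critic (PPO-style) policy. All class names, dimensions, and the ocr_encoder call are illustrative assumptions, not the paper's implementation.

# Minimal sketch (assumptions only, not the paper's exact architecture):
# a frozen OCR encoder produces (batch, num_slots, slot_dim) slot vectors;
# a transformer pooling layer aggregates them before the RL policy/value heads.
import torch
import torch.nn as nn

class SlotPoolingPolicy(nn.Module):
    def __init__(self, num_slots=7, slot_dim=64, hidden_dim=128, num_actions=4):
        super().__init__()
        # Learnable CLS-style query token that attends over the object slots.
        self.cls = nn.Parameter(torch.zeros(1, 1, slot_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=slot_dim, nhead=4, dim_feedforward=hidden_dim, batch_first=True
        )
        self.pool = nn.TransformerEncoder(layer, num_layers=1)
        self.policy_head = nn.Linear(slot_dim, num_actions)  # actor logits
        self.value_head = nn.Linear(slot_dim, 1)             # critic value

    def forward(self, slots):
        # slots: (batch, num_slots, slot_dim), produced by the frozen OCR model
        tokens = torch.cat([self.cls.expand(slots.size(0), -1, -1), slots], dim=1)
        pooled = self.pool(tokens)[:, 0]  # take the CLS token as the aggregate
        return self.policy_head(pooled), self.value_head(pooled)

# Hypothetical usage with a pre-trained, frozen OCR encoder:
# slots = ocr_encoder(obs)                 # e.g. (B, 7, 64)
# logits, value = SlotPoolingPolicy()(slots)

A transformer pooling layer is one natural choice here because object slots form an unordered set, so the aggregation should be permutation-aware rather than depend on an arbitrary slot ordering; simpler alternatives such as mean or max pooling over slots are also possible.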

