FOCUS: Object-Centric World Models for Robotics Manipulation (2307.02427v2)
Abstract: Understanding the world in terms of objects and the possible interactions with them is an important cognitive ability, especially in robotic manipulation, where many tasks require robot-object interactions. However, learning such a structured world model, which explicitly captures entities and their relationships, remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus derived from the object-centric representation, FOCUS can be deployed on robotic manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and enable consistent exploration of robot-object interactions. Using a Franka Emika robot arm, we also showcase how FOCUS can be adopted in real-world settings.
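To make the idea of an exploration bonus "derived from the object-centric representation" concrete, here is a minimal sketch of one common way such a bonus can be computed: a particle-based (k-nearest-neighbor) entropy estimate over per-object latent states, rewarding states whose object representations are far from previously seen ones. This is a hypothetical illustration under assumed shapes and naming, not the paper's exact formulation.

```python
import numpy as np

def knn_entropy_bonus(object_latents, k=3):
    """Hypothetical intrinsic bonus: k-NN entropy estimate over
    object-centric latent states. Not the exact FOCUS objective.

    object_latents: (batch, num_objects, dim) array of per-object latents.
    Returns one nonnegative bonus per batch element.
    """
    batch, num_objects, dim = object_latents.shape
    # Flatten the object axis so each state is a single feature vector.
    states = object_latents.reshape(batch, num_objects * dim)
    # Pairwise Euclidean distances between states in the batch.
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Sort distances and drop the zero self-distance in column 0.
    dists_sorted = np.sort(dists, axis=1)[:, 1:]
    kth = dists_sorted[:, k - 1]  # distance to the k-th nearest neighbor
    # log(1 + d_k): larger when object states are novel, zero when duplicated.
    return np.log1p(kth)
```

States whose object latents sit far from the rest of the batch receive a larger bonus, which drives the agent toward unfamiliar object configurations.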