
FOCUS: Object-Centric World Models for Robotics Manipulation (2307.02427v2)

Published 5 Jul 2023 in cs.RO and cs.AI

Abstract: Understanding the world in terms of objects and the ways one can interact with them is an important cognitive ability, especially in robotic manipulation, where many tasks require robot-object interactions. However, learning such a structured world model, one that explicitly captures entities and their relationships, remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus derived from the object-centric representation, FOCUS can be deployed on robotic manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and to explore robot-object interactions consistently. Using a Franka Emika robot arm, we also showcase how FOCUS can be adopted in real-world settings.
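The abstract describes the exploration bonus only at a high level: it "stems from the object-centric representation" and encourages object interactions. One plausible form of such a bonus, sketched below purely as an illustration (the function name, the k-nearest-neighbor formulation, and all parameters are assumptions, not the paper's actual method), rewards the agent for reaching object latent states that lie far from previously visited ones:

```python
import numpy as np

def knn_exploration_bonus(object_latents, k=3):
    """Illustrative particle-based novelty bonus (NOT the paper's exact method).

    For each per-object latent vector in the batch, the bonus is the
    log-distance to its k-th nearest neighbor among the other latents:
    latents in sparsely visited regions of the object state space get
    larger rewards, encouraging interaction with objects in new ways.
    """
    # Pairwise Euclidean distances between all latent vectors.
    diffs = object_latents[:, None, :] - object_latents[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Sort each row; index 0 is the zero self-distance, so index k gives
    # the distance to the k-th nearest *other* latent.
    kth = np.sort(dists, axis=1)[:, k]
    # log(1 + d_k) keeps the bonus non-negative and tempers outliers.
    return np.log1p(kth)
```

In this sketch, a latent state far from everything seen so far receives a high bonus, which is the qualitative behavior an object-centric exploration signal would need; the actual FOCUS bonus may differ in both estimator and reward shaping.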
