
Neurosymbolic Grounding for Compositional World Models (2310.12690v2)

Published 19 Oct 2023 in cs.LG, cs.AI, and stat.ML

Abstract: We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CompGen), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed by a neural encoder as well as a vector of composable symbols describing the entity's attributes, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; moreover, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation covering two different forms of CompGen on an established blocks-pushing domain, we show that the framework sets a new state of the art for CompGen in world modeling. Artifacts are available at: https://trishullab.github.io/cosmos-web/
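
To make the abstract's two ingredients concrete, below is a minimal PyTorch sketch of how a neurosymbolic attention mechanism might bind entity encodings (a neural vector paired with a symbolic attribute vector) to a small set of learned interaction rules. This is a hypothetical rendering under our own assumptions: the class name, dimensions, and the soft softmax binding are illustrative choices, not the paper's actual implementation, and in the full system the symbolic vectors would be produced by querying a vision-language foundation model rather than passed in directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeurosymbolicAttention(nn.Module):
    """Hypothetical sketch: bind entities to learned interaction rules.

    Attention scores are computed from the symbolic attribute vectors, so
    rule-entity binding depends on composable symbols; the bound rule is
    then applied to the entity's neural vector.
    """

    def __init__(self, num_rules: int, sym_dim: int, neural_dim: int):
        super().__init__()
        # One learned query per rule, matched against symbolic attributes.
        self.rule_queries = nn.Parameter(torch.randn(num_rules, sym_dim))
        # One small MLP per rule, applied to the neural entity vectors.
        self.rule_mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(neural_dim, neural_dim), nn.ReLU(),
                          nn.Linear(neural_dim, neural_dim))
            for _ in range(num_rules)
        )

    def forward(self, neural: torch.Tensor, symbolic: torch.Tensor) -> torch.Tensor:
        # neural:   (num_entities, neural_dim) vectors from a neural encoder
        # symbolic: (num_entities, sym_dim) attribute vectors; in Cosmos these
        #           would come from a vision-language foundation model.
        scores = symbolic @ self.rule_queries.t()        # (entities, rules)
        binding = F.softmax(scores, dim=-1)              # soft rule selection
        per_rule = torch.stack([mlp(neural) for mlp in self.rule_mlps], dim=1)
        # Mixture of per-rule outputs, weighted by the binding, gives the
        # predicted next-step entity state: (entities, neural_dim).
        return (binding.unsqueeze(-1) * per_rule).sum(dim=1)


# Usage: 5 entities, 3 rules, 8-dim symbolic attributes, 32-dim neural features.
attn = NeurosymbolicAttention(num_rules=3, sym_dim=8, neural_dim=32)
next_state = attn(torch.randn(5, 32), torch.randn(5, 8))
print(next_state.shape)  # torch.Size([5, 32])
```

The softmax here keeps the sketch end-to-end differentiable, matching the abstract's claim; a discrete binding could instead be approximated with a Gumbel-softmax relaxation while preserving differentiability.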
