Generalization properties of contrastive world models (2401.00057v1)

Published 29 Dec 2023 in cs.LG and cs.CV

Abstract: Recent work on object-centric world models aims to factorize representations in terms of objects in a completely unsupervised or self-supervised manner. Such world models are hypothesized to be a key component in addressing the generalization problem. However, while self-supervision has improved performance, out-of-distribution (OOD) generalization has not been systematically and explicitly tested. In this paper, we conduct an extensive study of the generalization properties of a contrastive world model. We systematically test the model under a number of OOD generalization scenarios, such as extrapolation to new object attributes and the introduction of new conjunctions or new attributes. Our experiments show that the contrastive world model fails to generalize under these OOD tests, and that the drop in performance depends on the extent to which the samples are OOD. When visualizing the transition updates and convolutional feature maps, we observe that any change in object attributes (such as previously unseen colors, shapes, or conjunctions of color and shape) breaks down the factorization of object representations. Overall, our work highlights the importance of object-centric representations for generalization and shows that current models are limited in their capacity to learn the representations required for human-level generalization.
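The contrastive world model examined in the abstract is presumably of the C-SWM family (Kipf et al.), which trains an encoder and a transition model with a hinge-based contrastive loss in latent space: the predicted next latent should sit close to the encoded next observation, while latents of unrelated states are pushed beyond a margin. A minimal NumPy sketch of that loss follows; the names (`contrastive_loss`, `delta` for the predicted transition update, margin `gamma`) and the squared-Euclidean energy are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def sq_dist(a, b):
    """Squared Euclidean distance along the last (feature) axis."""
    return np.sum((a - b) ** 2, axis=-1)

def contrastive_loss(z, z_next, z_neg, delta, gamma=1.0):
    """Hinge-based contrastive transition loss (C-SWM-style sketch).

    z      : encoded current states, shape (batch, dim)
    z_next : encoded next states, shape (batch, dim)
    z_neg  : negative samples drawn from other states, shape (batch, dim)
    delta  : predicted transition update T(z, a), shape (batch, dim)
    gamma  : margin that negatives must stay beyond
    """
    # Positive term: predicted next latent should match the true one.
    positive = sq_dist(z + delta, z_next)
    # Negative term: unrelated latents are pushed at least gamma away.
    negative = np.maximum(0.0, gamma - sq_dist(z_neg, z_next))
    return float(np.mean(positive + negative))
```

Under this formulation, the transition updates `delta` visualized in the paper are exactly the quantity whose factorization across object slots breaks down when unseen attributes appear.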

