Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations (2403.07887v2)

Published 2 Feb 2024 in cs.CV and cs.AI

Abstract: Several accounts of human cognition posit that our intelligence is rooted in our ability to form abstract composable concepts, ground them in our environment, and reason over these grounded entities. This trifecta of human thought has remained elusive in modern intelligent machines. In this work, we investigate whether slot representations extracted from visual scenes serve as appropriate compositional abstractions for grounding and reasoning. We present the Neural Slot Interpreter (NSI), which learns to ground object semantics in slots. At the core of NSI is an XML-like schema that uses simple syntax rules to organize the object semantics of a scene into object-centric schema primitives. Then, the NSI metric learns to ground primitives into slots through a structured objective that reasons over the intermodal alignment. We show that the grounded slots surpass unsupervised slots in real-world object discovery and scale with scene complexity. Experiments with a bi-modal object-property and scene retrieval task demonstrate the grounding efficacy and interpretability of correspondences learned by NSI. Finally, we investigate the reasoning abilities of the grounded slots. Vision Transformers trained on grounding-aware NSI tokenizers using as few as ten tokens outperform patch-based tokens on challenging few-shot classification tasks.
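
Below is a minimal, illustrative sketch of the two ingredients the abstract describes: an XML-like scene schema whose `<object>` elements act as object-centric primitives, and a structured score that grounds those primitives in slots via intermodal similarity. Everything here is an assumption inferred from the abstract alone; the schema fields, the `PrimitiveEncoder`, and the `alignment_score` function are hypothetical stand-ins, not the authors' NSI implementation or training objective.

```python
# Hedged sketch of the ideas in the abstract, NOT the authors' code.
# Schema fields, class/function names, and the alignment rule are assumptions.
import torch
import torch.nn as nn
import xml.etree.ElementTree as ET

# An XML-like scene schema: each <object> is a schema primitive holding object semantics.
SCENE_XML = """
<scene>
  <object shape="cube" color="red" material="rubber" size="small"/>
  <object shape="sphere" color="blue" material="metal" size="large"/>
</scene>
"""

def parse_primitives(xml_str):
    """Parse the XML-like schema into a list of attribute dicts (one per object primitive)."""
    root = ET.fromstring(xml_str)
    return [obj.attrib for obj in root.findall("object")]

class PrimitiveEncoder(nn.Module):
    """Embed each primitive's attribute tokens and mean-pool them into one vector (an assumption)."""
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.vocab = {tok: i for i, tok in enumerate(vocab)}
        self.emb = nn.Embedding(len(vocab), dim)

    def forward(self, primitives):
        vecs = []
        for p in primitives:
            ids = torch.tensor([self.vocab[v] for v in p.values()])
            vecs.append(self.emb(ids).mean(dim=0))
        return torch.stack(vecs)                      # (num_primitives, dim)

def alignment_score(slots, prims):
    """Structured intermodal score: each primitive is matched to its best-fitting slot.
    One plausible instantiation of 'reasoning over the intermodal alignment'."""
    sim = prims @ slots.T                             # (num_primitives, num_slots)
    return sim.max(dim=1).values.sum()                # sum of best slot match per primitive

if __name__ == "__main__":
    prims_attrs = parse_primitives(SCENE_XML)
    vocab = sorted({v for p in prims_attrs for v in p.values()})
    prim_enc = PrimitiveEncoder(vocab)
    slots = torch.randn(7, 64)                        # placeholder for slots from a slot encoder
    score = alignment_score(slots, prim_enc(prims_attrs))
    print(float(score))                               # higher = better slot-primitive grounding
```

In the paper's setting, the slots would come from a slot-attention-style visual encoder and the grounding would be learned with a structured objective over many scenes; the random `slots` tensor above only stands in for those representations.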

