Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
51 tokens/sec
GPT-4o
11 tokens/sec
Gemini 2.5 Pro Pro
52 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
10 tokens/sec
DeepSeek R1 via Azure Pro
33 tokens/sec
2000 character limit reached

Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols (2407.13382v1)

Published 18 Jul 2024 in cs.LG

Abstract: We consider the problem of finding spatial configurations of multiple objects in images, e.g., a mobile inspection robot is tasked to localize abandoned tools on the floor. We define the spatial configuration of objects by first-order logic in terms of relations and attributes. A neuro-symbolic program matches the logic formulas to probabilistic object proposals for the given image, provided by language-vision models by querying them for the symbols. This work is the first to combine neuro-symbolic programming (reasoning) and language-vision models (learning) to find spatial configurations of objects in images in an open world setting. We show the effectiveness by finding abandoned tools on floors and leaking pipes. We find that most prediction errors are due to biases in the language-vision model.

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets