Effective Integration of Symbolic Systems with Multi-Modal LLMs

Determine effective mechanisms for integrating and exploiting symbolic reasoning systems together with multi-modal large language models to enable robust multi-modal reasoning in applications such as visual question answering, embodied AI, and spatial intelligence.

Background

The paper surveys neuro-symbolic approaches to enhance the reasoning abilities of LLMs and identifies gaps when extending these methods beyond text-only settings. Many real-world tasks require reasoning across multiple modalities (e.g., text, images, actions), yet current multi-modal reasoning often reduces to language-centric processing, limiting fidelity to how humans reason.

In their discussion of open research directions, the authors explicitly state that effectively exploiting symbolic systems in conjunction with multi-modal LLMs remains unresolved. They highlight a mismatch with human reasoning: geometry problem solving, for example, involves visual manipulations (e.g., drawing auxiliary lines) rather than language alone. This underscores the need for principled neuro-symbolic designs that operate natively over multiple modalities.

References

How to effectively exploit symbolic systems that are integrated with multi-modal LLMs remains an open problem.

Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models (2508.13678 - Yang et al., 19 Aug 2025) in Section 7 (Challenges Open Research Directions), Multi-Modal Reasoning paragraph