Determine whether multimodality confers System 2 competence in large language models

Determine whether multi-modal large language models, such as GPT-4V, attain System 2 competence (deliberate reasoning and planning, in the sense of Kahneman's System 2) by virtue of their added modalities, rather than merely expanding their System 1 reflexive, pattern-completion capabilities.

Background

The paper distinguishes between System 1 (reflexive, pattern-based behaviors) and System 2 (deliberative reasoning and planning) and argues that current LLMs primarily exhibit System 1-like behavior. It notes growing interest in multi-modal LLMs (e.g., GPT-4V) that incorporate vision and other modalities.

Despite these advances, the authors explicitly note that it is unclear whether multimodality translates into genuine System 2 competence. This raises a core question about the cognitive capabilities enabled by multimodal inputs: do they support principled reasoning and planning, or only improved pattern completion?

References

While multi-modality is a great addition that increases the coverage of their System 1 imagination (Figure 1), it is not clear that this gives them System 2 competence.

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks (Kambhampati et al., arXiv:2402.01817, 2 Feb 2024), Section 4, Related Work.