Overall effect of image-aided reasoning on LLM performance
Determine the overall effect of image-aided reasoning on performance relative to language-only reasoning in the mental imagery task adapted from Finke et al. (1989). Here, image-aided reasoning means prompting large language models to generate and iteratively modify intermediate images while following compositional object reconstruction instructions. Also identify the conditions and model configurations under which image generation helps or hinders accuracy.
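One way such a comparison could be set up is as a paired analysis, where each task item is scored once per condition. The sketch below is illustrative only and is not the source paper's method: the function name `paired_effect`, the boolean per-item scoring, and the toy data are all assumptions. It reports accuracy in each condition, the items image generation helped or hurt, and an exact two-sided McNemar test on the discordant pairs.

```python
from math import comb


def paired_effect(language_only, image_aided):
    """Compare two paired lists of per-item correctness (True/False).

    Hypothetical helper: assumes both lists score the same items in the
    same order, one entry per task item.
    """
    if len(language_only) != len(image_aided):
        raise ValueError("both conditions must cover the same items")
    n = len(language_only)
    if n == 0:
        raise ValueError("need at least one item")

    acc_lang = sum(language_only) / n
    acc_img = sum(image_aided) / n

    # Discordant pairs: items where exactly one condition succeeded.
    pairs = list(zip(language_only, image_aided))
    helped = sum(1 for lang, img in pairs if img and not lang)
    hurt = sum(1 for lang, img in pairs if lang and not img)

    # Exact two-sided McNemar test: under the null hypothesis, each
    # discordant pair is equally likely to favor either condition.
    discordant = helped + hurt
    if discordant == 0:
        p_value = 1.0
    else:
        k = min(helped, hurt)
        tail = sum(comb(discordant, j) for j in range(k + 1)) / 2 ** discordant
        p_value = min(1.0, 2 * tail)

    return {
        "acc_language_only": acc_lang,
        "acc_image_aided": acc_img,
        "difference": acc_img - acc_lang,
        "helped": helped,
        "hurt": hurt,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from any paper.
    language_only = [True, True, True, False, True, True, False, True]
    image_aided = [True, False, True, False, False, True, True, False]
    print(paired_effect(language_only, image_aided))
```

Running the same function separately per model or per prompt configuration would give a rough view of where image generation helps or hinders, though with few items the test has little power.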
References
It is unclear what the overall effect of image-aided reasoning is, as the models still found some success (though diminished), and more exploration of its effects is needed \citep{yang2025, wu2024}.
— Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models
(2509.23108 - McCarty et al., 27 Sep 2025) in Subsection "Image-aided Reasoning" (Results)