
Assessing GPT4-V on Structured Reasoning Tasks (2312.11524v1)

Published 13 Dec 2023 in cs.CL, cs.AI, and cs.CV

Abstract: Multi-modality promises to unlock further uses for LLMs. Recently, the state-of-the-art LLM GPT-4 was enhanced with vision capabilities. We carry out a prompting evaluation of GPT-4V and five other baselines on structured reasoning tasks, such as mathematical reasoning, visual data analysis, and code generation. We show that visual Chain-of-Thought, an extension of Chain-of-Thought to multi-modal LLMs, yields significant improvements over the vanilla model. We also present a categorized analysis of scenarios where these models perform well and where they struggle, highlighting challenges associated with coherent multimodal reasoning.

Authors (5)
  1. Mukul Singh (13 papers)
  2. José Cambronero (22 papers)
  3. Sumit Gulwani (55 papers)
  4. Vu Le (26 papers)
  5. Gust Verbruggen (15 papers)
Citations (9)