
Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models (2312.01714v2)

Published 4 Dec 2023 in cs.CL

Abstract: The advancement of LLMs has brought substantial attention to the Chain of Thought (CoT) approach, primarily because it improves LLM performance on complex reasoning tasks. The significance of CoT approaches also extends to the application of LLMs to multi-modal tasks. However, the selection of optimal CoT demonstration examples in multi-modal reasoning remains underexplored for LLMs due to the inherent complexity of multi-modal examples. In this paper, we introduce a novel approach that addresses this challenge by using retrieval mechanisms to dynamically and automatically select demonstration examples based on cross-modal and intra-modal similarities. Furthermore, we employ a stratified sampling method that categorises demonstration examples into groups by type and retrieves examples from each group separately, promoting the diversity of the demonstration examples. Through a series of experiments on two popular benchmark datasets, ScienceQA and MathVista, we demonstrate that our approach significantly improves the performance of GPT-4 by 6% on ScienceQA and 12.9% on MathVista, and enhances the performance of GPT-4V on the two datasets by 2.7%, substantially improving the performance of the most advanced LLMs and LMMs on complex multi-modal reasoning tasks.
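
The retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each example already has a text embedding and an image embedding in a shared vector space, it assumes the four similarity terms (two intra-modal, two cross-modal) are simply summed with equal weight, and the names `cosine`, `retrieve_stratified`, and `k_per_group` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_stratified(query, pool, k_per_group=1):
    """Pick the top-k most similar demonstrations from every group.

    query: dict with "text" and "image" embedding vectors.
    pool:  list of dicts with "id", "group", "text", and "image".
    The equal-weight sum of similarities is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for ex in pool:
        score = (
            cosine(query["text"], ex["text"])      # intra-modal: text to text
            + cosine(query["image"], ex["image"])  # intra-modal: image to image
            + cosine(query["text"], ex["image"])   # cross-modal: text to image
            + cosine(query["image"], ex["text"])   # cross-modal: image to text
        )
        groups[ex["group"]].append((score, ex["id"]))

    selected = []
    for group in sorted(groups):  # fixed group order keeps the result deterministic
        ranked = sorted(groups[group], key=lambda pair: pair[0], reverse=True)
        selected.extend(ex_id for _, ex_id in ranked[:k_per_group])
    return selected
```

Retrieving per group rather than from the whole pool is what keeps the selected demonstrations diverse: a single dominant category cannot fill every slot, even if its examples score highest overall.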

Authors (6)
  1. Bingshuai Liu (4 papers)
  2. Chenyang Lyu (44 papers)
  3. Zijun Min (4 papers)
  4. Zhanyu Wang (22 papers)
  5. Jinsong Su (96 papers)
  6. Longyue Wang (87 papers)
Citations (6)
