
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models (2304.09842v3)

Published 19 Apr 2023 in cs.CL, cs.AI, cs.CV, and cs.LG

Abstract: LLMs have achieved remarkable progress in solving various natural language processing tasks due to emergent reasoning abilities. However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning. Chameleon synthesizes programs by composing various tools (e.g., LLMs, off-the-shelf vision models, web search engines, Python functions, and heuristic-based modules) for accomplishing complex reasoning tasks. At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools to execute to generate the final response. We showcase the effectiveness of Chameleon on two multi-modal knowledge-intensive reasoning tasks: ScienceQA and TabMWP. Chameleon, powered by GPT-4, achieves an 86.54% overall accuracy on ScienceQA, improving the best published few-shot result by 11.37%. On TabMWP, GPT-4-powered Chameleon improves the accuracy by 17.0%, lifting the state of the art to 98.78%. Our analysis also shows that the GPT-4-powered planner exhibits more consistent and rational tool selection via inferring potential constraints from instructions, compared to a ChatGPT-powered planner. The project is available at https://chameleon-LLM.github.io.

Plug-and-Play Compositional Reasoning with LLMs

The paper develops Chameleon, a plug-and-play compositional reasoning framework that extends the capabilities of LLMs by integrating external modules. This design aims to overcome inherent LLM limitations concerning access to up-to-date external data, precise logical and mathematical reasoning, and real-time tool use. The framework's focal point is an LLM-based planner that synthesizes programs composing multiple tool types to address a given reasoning task.

Model Architecture and Approach

The core contribution lies in the modular approach, which diversifies tool use by dynamically composing LLMs with off-the-shelf components such as vision models, web search engines, Python functions, and heuristic-based modules. At the framework's center is an LLM-based planner that judiciously assembles a sequence of tools tailored to the input query, producing a program whose execution over these modules yields the final response.
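As a rough illustration of this planner-and-executor pattern, the control flow might resemble the sketch below. This is not the authors' implementation: the module names, the prompt format, and the call_llm helper are assumptions made for illustration.

```python
from typing import Callable, Dict, List

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., GPT-4); replace with a real API call."""
    raise NotImplementedError

# Registry of plug-and-play modules: each reads and extends a shared context dict.
MODULES: Dict[str, Callable[[dict], dict]] = {
    "Image_Captioner":     lambda ctx: {**ctx, "caption": "..."},    # vision model
    "Knowledge_Retrieval": lambda ctx: {**ctx, "knowledge": "..."},  # retrieval / web search
    "Solution_Generator":  lambda ctx: {**ctx, "solution": call_llm(str(ctx))},
    "Answer_Generator":    lambda ctx: {**ctx, "answer": ctx.get("solution", "")},
}

def plan(query: str) -> List[str]:
    """Ask the LLM planner for an ordered, comma-separated list of module names."""
    prompt = (
        "List the modules to run, in order, choosing from "
        f"{sorted(MODULES)}.\nQuestion: {query}\nModules:"
    )
    return [m.strip() for m in call_llm(prompt).split(",") if m.strip() in MODULES]

def run(query: str) -> str:
    """Execute the planned module sequence and return the final answer."""
    ctx = {"query": query}
    for name in plan(query):
        ctx = MODULES[name](ctx)
    return ctx.get("answer", "")
```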

A distinctive element of the approach is that the planner generates natural-language-like programs that are understandable and easy to modify. Such programs do not demand extensive programming knowledge from users, promoting accessibility and extensibility to a broader array of applications and user scenarios.
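For example, a plan emitted for a diagram-based science question might read as a short, self-explanatory sequence of module names. The plan below is hypothetical; the specific module names are paraphrased from the paper's tool categories rather than quoted from it.

```python
# Hypothetical plan for a diagram-based science question; module names are illustrative.
plan_for_science_question = [
    "Image_Captioner",      # describe the diagram in text
    "Knowledge_Retrieval",  # gather relevant background facts
    "Solution_Generator",   # reason over the caption and retrieved knowledge
    "Answer_Generator",     # map the reasoning to a final answer choice
]
```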

Evaluation on Benchmarks

The system was evaluated on two benchmarks: ScienceQA, a multi-modal question-answering benchmark emphasizing scientific reasoning across diverse contexts, and TabMWP, a mathematical reasoning benchmark requiring precise operations over tabular data. The results highlight the tangible benefits of augmenting LLMs with the presented framework:

  1. ScienceQA: With GPT-4 as the underlying LLM, the framework raised the few-shot accuracy to 86.54%, an improvement of 11.37% over the best previously published few-shot result.
  2. TabMWP: On tabular mathematical reasoning, GPT-4-powered Chameleon reached 98.78% accuracy, a 17.0% gain over the prior state of the art.

Such substantial improvements showcase the framework's efficacy in addressing complex reasoning tasks by seamlessly integrating multi-modal tools for more effective decision-making.
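Read as absolute percentage-point gains, which is how the abstract's phrasing is most naturally interpreted, these numbers imply prior bests of roughly 75.17% on ScienceQA (few-shot) and 81.78% on TabMWP; a quick sanity check:

```python
# Back-of-the-envelope check, treating the reported gains as absolute percentage points.
scienceqa_acc, scienceqa_gain = 86.54, 11.37
tabmwp_acc, tabmwp_gain = 98.78, 17.0

print(round(scienceqa_acc - scienceqa_gain, 2))  # 75.17 -> implied prior best few-shot result
print(round(tabmwp_acc - tabmwp_gain, 2))        # 81.78 -> implied prior state of the art
```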

Implications and Future Directions

By successfully fusing the foundational LLM capabilities with external tools in a modular and adaptable structure, this research indicates a significant step forward in compositional AI. Practically, this framework can be applied to diverse scenarios requiring multi-source reasoning, such as educational tools, analytical systems, and decision-support applications.

Theoretically, the success of this approach suggests avenues for future research in designing more specialized tools and improving the planner's sophistication. This could involve more advanced constraint handling or adaptive learning mechanisms that optimize tool selection dynamically based on task characteristics. Developing new modules that bring domain-specific capabilities, or analyzing the interplay of tool selection strategies, may yield additional gains.

Overall, the paper offers substantial contributions to enhancing LLM capabilities through an innovative plug-and-play approach, setting a foundation for further developments in AI compositional reasoning.

Authors (8)
  1. Pan Lu (42 papers)
  2. Baolin Peng (72 papers)
  3. Hao Cheng (190 papers)
  4. Michel Galley (50 papers)
  5. Kai-Wei Chang (292 papers)
  6. Ying Nian Wu (138 papers)
  7. Song-Chun Zhu (216 papers)
  8. Jianfeng Gao (344 papers)
Citations (243)