
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models (2304.09842v3)

Published 19 Apr 2023 in cs.CL, cs.AI, cs.CV, and cs.LG

Abstract: LLMs have achieved remarkable progress in solving various natural language processing tasks due to emergent reasoning abilities. However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning. Chameleon synthesizes programs by composing various tools (e.g., LLMs, off-the-shelf vision models, web search engines, Python functions, and heuristic-based modules) for accomplishing complex reasoning tasks. At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools to execute to generate the final response. We showcase the effectiveness of Chameleon on two multi-modal knowledge-intensive reasoning tasks: ScienceQA and TabMWP. Chameleon, powered by GPT-4, achieves an 86.54% overall accuracy on ScienceQA, improving the best published few-shot result by 11.37%. On TabMWP, GPT-4-powered Chameleon improves the accuracy by 17.0%, lifting the state of the art to 98.78%. Our analysis also shows that the GPT-4-powered planner exhibits more consistent and rational tool selection via inferring potential constraints from instructions, compared to a ChatGPT-powered planner. The project is available at https://chameleon-LLM.github.io.

Plug-and-Play Compositional Reasoning with LLMs

The paper develops Chameleon, a plug-and-play compositional reasoning framework that extends the capabilities of LLMs by integrating external modules. This design aims to overcome inherent LLM limitations concerning access to up-to-date external data, precise logical and mathematical reasoning, and real-time tool use. The framework's focal point is an LLM-based planner that synthesizes programs composing multiple tool types to address a given reasoning task.

Model Architecture and Approach

The core contribution lies in the modular approach, which diversifies tool use by dynamically composing LLMs with off-the-shelf components such as vision models, web search engines, Python functions, and heuristic-based modules. At the framework's center is an LLM-based planner that judiciously assembles a sequence of tools tailored to the input query, producing a program whose execution over these modules yields the final response.
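As a rough illustration of this planner-and-executor pattern, the control flow might resemble the sketch below. This is not the authors' implementation: the module names, the prompt format, and the call_llm helper are assumptions made for illustration.

```python
from typing import Callable, Dict, List

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., GPT-4); replace with a real API call."""
    raise NotImplementedError

# Registry of plug-and-play modules: each reads and extends a shared context dict.
MODULES: Dict[str, Callable[[dict], dict]] = {
    "Image_Captioner":     lambda ctx: {**ctx, "caption": "..."},    # vision model
    "Knowledge_Retrieval": lambda ctx: {**ctx, "knowledge": "..."},  # retrieval / web search
    "Solution_Generator":  lambda ctx: {**ctx, "solution": call_llm(str(ctx))},
    "Answer_Generator":    lambda ctx: {**ctx, "answer": ctx.get("solution", "")},
}

def plan(query: str) -> List[str]:
    """Ask the LLM planner for an ordered, comma-separated list of module names."""
    prompt = (
        "List the modules to run, in order, choosing from "
        f"{sorted(MODULES)}.\nQuestion: {query}\nModules:"
    )
    return [m.strip() for m in call_llm(prompt).split(",") if m.strip() in MODULES]

def run(query: str) -> str:
    """Execute the planned module sequence and return the final answer."""
    ctx = {"query": query}
    for name in plan(query):
        ctx = MODULES[name](ctx)
    return ctx.get("answer", "")
```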

A distinctive element of the approach is that the planner generates natural-language-like programs that are understandable and easy to modify. Such programs do not demand extensive programming knowledge from users, promoting accessibility and extensibility to a broader array of applications and user scenarios.
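For example, a plan emitted for a diagram-based science question might read as a short, self-explanatory sequence of module names. The plan below is hypothetical; the specific module names are paraphrased from the paper's tool categories rather than quoted from it.

```python
# Hypothetical plan for a diagram-based science question; module names are illustrative.
plan_for_science_question = [
    "Image_Captioner",      # describe the diagram in text
    "Knowledge_Retrieval",  # gather relevant background facts
    "Solution_Generator",   # reason over the caption and retrieved knowledge
    "Answer_Generator",     # map the reasoning to a final answer choice
]
```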

Evaluation on Benchmarks

The system was evaluated on two benchmarks: ScienceQA, a multi-modal question-answering benchmark emphasizing scientific reasoning across diverse contexts, and TabMWP, a mathematical reasoning benchmark requiring precise operations over tabular data. The results highlight the tangible benefits of augmenting LLMs with the presented framework:

  1. ScienceQA: With GPT-4 as the underlying LLM, the framework raised the few-shot accuracy to 86.54%, an improvement of 11.37% over the best previously published few-shot result.
  2. TabMWP: On tabular mathematical reasoning, GPT-4-powered Chameleon reached 98.78% accuracy, a 17.0% gain over the prior state of the art.

Such substantial improvements showcase the framework's efficacy in addressing complex reasoning tasks by seamlessly integrating multi-modal tools for more effective decision-making.
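Read as absolute percentage-point gains, which is how the abstract's phrasing is most naturally interpreted, these numbers imply prior bests of roughly 75.17% on ScienceQA (few-shot) and 81.78% on TabMWP; a quick sanity check:

```python
# Back-of-the-envelope check, treating the reported gains as absolute percentage points.
scienceqa_acc, scienceqa_gain = 86.54, 11.37
tabmwp_acc, tabmwp_gain = 98.78, 17.0

print(round(scienceqa_acc - scienceqa_gain, 2))  # 75.17 -> implied prior best few-shot result
print(round(tabmwp_acc - tabmwp_gain, 2))        # 81.78 -> implied prior state of the art
```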

Implications and Future Directions

By successfully fusing the foundational LLM capabilities with external tools in a modular and adaptable structure, this research indicates a significant step forward in compositional AI. Practically, this framework can be applied to diverse scenarios requiring multi-source reasoning, such as educational tools, analytical systems, and decision-support applications.

Theoretically, the success of this approach suggests avenues for future research in designing more specialized tools and improving the planner's sophistication. This could involve more advanced constraint handling or adaptive learning mechanisms that optimize tool selection dynamically based on task characteristics. Developing new modules that bring domain-specific capabilities, or analyzing the interplay of tool selection strategies, may yield additional gains.

Overall, the paper offers substantial contributions to enhancing LLM capabilities through an innovative plug-and-play approach, setting a foundation for further developments in AI compositional reasoning.

Authors (8)
  1. Pan Lu (42 papers)
  2. Baolin Peng (72 papers)
  3. Hao Cheng (190 papers)
  4. Michel Galley (50 papers)
  5. Kai-Wei Chang (292 papers)
  6. Ying Nian Wu (138 papers)
  7. Song-Chun Zhu (216 papers)
  8. Jianfeng Gao (344 papers)
Citations (243)