
An LLM Compiler for Parallel Function Calling (2312.04511v3)

Published 7 Dec 2023 in cs.CL

Abstract: The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling often require sequential reasoning and acting for each function which can result in high latency, cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. Drawing inspiration from the principles of classical compilers, LLMCompiler enables parallel function calling with three components: (i) a Function Calling Planner, formulating execution plans for function calling; (ii) a Task Fetching Unit, dispatching function calling tasks; and (iii) an Executor, executing these tasks in parallel. LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct. Our code is available at https://github.com/SqueezeAILab/LLMCompiler.

Introduction to LLMCompiler

LLMs have emerged as powerful tools for complex reasoning and problem-solving. These capabilities allow LLMs to execute function calls, incorporating external tools to overcome inherent limitations such as knowledge cutoffs, weak arithmetic, or lack of access to private data. However, existing approaches to multi-function calling are largely sequential: the model reasons about and invokes one function at a time, which inflates latency and cost and can introduce errors. This paper introduces LLMCompiler, a framework for planning multiple function calls and executing them in parallel.
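To make the idea concrete, consider a query whose sub-tasks are independent, such as comparing two companies' market caps. Rather than a sequential chain of reason-act steps, a planner can emit a small dependency graph. The plan below is illustrative only (the exact syntax is defined by the paper's planner prompts); `$1` and `$2` are placeholder variables that later tasks reference:

```
$1 = search("Microsoft market cap")   # no dependencies
$2 = search("Apple market cap")       # independent of $1, can run in parallel
$3 = math("$1 / $2")                  # waits on both $1 and $2
```

Here the two searches can be dispatched simultaneously, and the final step runs once both placeholder values have been substituted with actual results.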

Design and Components of LLMCompiler

The architecture of LLMCompiler draws inspiration from traditional compilers, emphasizing the parallel execution of independent tasks. It comprises three main components (a minimal runnable sketch follows the list):

  1. Function Calling Planner: Decomposes the user query into function calling tasks, identifying their inter-dependencies to construct a Directed Acyclic Graph (DAG).
  2. Task Fetching Unit: Dispatches tasks as soon as their dependencies are satisfied, substituting placeholder variables in downstream tasks with the actual outputs of completed tasks.
  3. Executor: Runs the dispatched tasks in parallel with their associated tools, adhering to the dependencies encoded in the DAG.
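The Python sketch below illustrates how a Task Fetching Unit and Executor might cooperate over a planner-produced DAG. It is a simplified illustration, not the paper's implementation (see the SqueezeAILab/LLMCompiler repository for the real system); the `Task` class, the `run_dag` loop, and the stub `search`/`compare` tools are all hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical, minimal model of LLMCompiler's execution stage.
# Here the full DAG is given up front; the real system can begin
# dispatching while the Planner is still streaming the plan.

@dataclass
class Task:
    name: str
    fn: Callable[..., Any]                          # the tool/function to call
    deps: list[str] = field(default_factory=list)   # prerequisite task names

async def run_dag(tasks: dict[str, Task]) -> dict[str, Any]:
    """Dispatch tasks whose dependencies are met; run each batch in parallel."""
    done: dict[str, Any] = {}
    pending = dict(tasks)
    while pending:
        # Task Fetching Unit: select every task whose inputs are ready.
        ready = [t for t in pending.values() if all(d in done for d in t.deps)]
        if not ready:
            raise RuntimeError("cycle or unsatisfied dependency in the plan")
        # Executor: run the ready batch concurrently, substituting each
        # placeholder dependency with the actual upstream output.
        outputs = await asyncio.gather(
            *(asyncio.to_thread(t.fn, *(done[d] for d in t.deps)) for t in ready)
        )
        for t, out in zip(ready, outputs):
            done[t.name] = out
            del pending[t.name]
    return done

# Stub tools standing in for real function calls.
def search(query: str) -> str:
    return f"[stub] results for {query!r}"

def compare(a: str, b: str) -> str:
    return f"[stub] comparison of ({a}) and ({b})"

if __name__ == "__main__":
    plan = {
        "$1": Task("$1", lambda: search("Microsoft market cap")),
        "$2": Task("$2", lambda: search("Apple market cap")),
        "$3": Task("$3", compare, deps=["$1", "$2"]),  # waits on $1 and $2
    }
    print(asyncio.run(run_dag(plan)))
```

In this sketch, `$1` and `$2` run concurrently and `$3` is dispatched only after both complete; the scheduling loop is what turns the DAG's structure into latency savings.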

Performance Evaluation

LLMCompiler was benchmarked against ReAct, a sequential reasoning-and-acting baseline, and against OpenAI's parallel function calling feature. Evaluations across tasks with different function calling patterns showed:

  • Consistent latency speedups of up to 3.7x, cost savings of up to 6.7x, and accuracy improvements of up to ~9% compared to ReAct.
  • Latency speedups of up to 1.35x over OpenAI's parallel function calling feature, with comparable accuracy.
  • Gains with both closed-source GPT models and open-source models such as LLaMA-2.

Future Directions and Conclusion

LLMCompiler represents a significant advance in executing multi-function calls with LLMs, reducing latency and cost while maintaining or improving accuracy. The parallel orchestration it enables is well suited to LLM-based software development, especially as the field trends toward viewing LLMs within an operating-systems framework. Future work could integrate LLMCompiler with this notion of LLMs as operating systems, further extending the range of complex tasks that can benefit from LLMs.

Authors (7)
  1. Sehoon Kim
  2. Suhong Moon
  3. Ryan Tabrizi
  4. Nicholas Lee
  5. Michael W. Mahoney
  6. Kurt Keutzer
  7. Amir Gholami