TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks (2401.12869v1)

Published 23 Jan 2024 in cs.AI

Abstract: Language models (LMs) can solve tasks such as answering questions about tables or images by writing programs. However, using primitive functions often leads to verbose and error-prone programs, and higher-level functions require expert design. To enable better solutions without human labor, we ask code LMs to curate reusable high-level functions, and use them to write solutions. We present TroVE, a training-free method of inducing a verifiable and efficient toolbox of functions, by generating via using, growing, and periodically trimming the toolbox. On 11 datasets from math, table question answering, and image reasoning tasks, TroVE consistently yields simpler solutions with higher accuracy than baselines using CodeLlama and previous methods using GPT, while using 79-98% smaller toolboxes. TroVE further enables 31% faster and 13% more accurate human verification than baselines. With the same pipeline, it creates diverse functions for varied tasks and datasets, providing insights into their individual characteristics.

Summary

  • The paper introduces a training-free approach that iteratively builds a dynamic toolbox of high-level functions without extra supervision.
  • The methodology combines execution-agreement selection with periodic trimming, enabling 31% faster and 13% more accurate human verification than baselines.
  • TroVE consistently yields simpler, more accurate solutions across diverse tasks while using 79-98% smaller toolboxes, demonstrating broad applicability.

Introduction

Language models (LMs) are increasingly used for code generation, with applications ranging from answering questions about structured data to image reasoning, all built on their ability to compose programs in languages like Python. A persistent challenge remains: solutions written with only low-level, "primitive" functions tend to be verbose and error-prone, while high-level, abstract functions normally require expert design. Both options carry costs, in solution complexity and in the human effort needed to verify the generated code. The paper "TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks" proposes a method to alleviate these issues without human labor.
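
To make the contrast concrete, consider a question like "what was the average revenue in 2020?" over a small table. The snippet below is an illustrative sketch, not from the paper: the toy data and the `column_average` helper are hypothetical, standing in for the kind of high-level function the method aims to induce.

```python
# A toy table as a list of dicts (hypothetical data for illustration).
table = [
    {"year": 2019, "revenue": "100"},
    {"year": 2020, "revenue": "120"},
    {"year": 2020, "revenue": "80"},
]

# Primitive-only solution: every step spelled out, easy to get subtly wrong.
matching = [row for row in table if row["year"] == 2020]
answer = sum(float(r["revenue"]) for r in matching) / len(matching)  # 100.0

# With an induced high-level helper, the solution collapses to one call.
# `column_average` is a hypothetical example of a function a model might induce.
def column_average(table, column, where):
    rows = [r for r in table if all(r[k] == v for k, v in where.items())]
    return sum(float(r[column]) for r in rows) / len(rows)

answer = column_average(table, column="revenue", where={"year": 2020})  # 100.0
```

The one-line call is both shorter to write and faster for a human to verify, which is exactly the efficiency the paper targets.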

Methodology

TroVE is a training-free method that constructs a dynamic toolbox of high-level functions by generating, using, growing, and periodically trimming them over time. It stands apart in requiring no additional training or supervision: the toolbox is built iteratively while solving a stream of questions. The approach rests on three core components: (1) iterative use and expansion of the toolbox across examples, (2) an execution-agreement criterion for selecting among candidate solutions, and (3) periodic trimming to discard low-utility functions; a schematic sketch of the loop follows below. The method was tested on 11 datasets spanning mathematical problem solving, table question answering, and visual reasoning, where TroVE consistently attained higher accuracy and produced lower-complexity solutions with substantially smaller toolboxes than baselines using GPT models and CodeLlama.
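
The loop can be condensed into the following minimal sketch. The plumbing is assumed rather than taken from the paper: `sample_solutions` stands in for the LM sampling candidate programs (with or without new helper definitions), `execute` for the execution harness, and the usage-count trimming rule is a simplification of the paper's utility criterion; only the three-part structure mirrors the description above.

```python
from collections import Counter, defaultdict

def trove_loop(questions, sample_solutions, execute, trim_every=100, min_uses=1):
    """Schematic TroVE-style induction loop (assumed plumbing, see lead-in)."""
    toolbox = {}                   # function name -> source code
    usage = defaultdict(int)       # how often each function was selected
    answers = []

    for step, question in enumerate(questions, start=1):
        # (1) Sample candidate programs: some reuse toolbox functions,
        #     others define new helpers alongside the solution.
        candidates = sample_solutions(question, toolbox)  # [(program, new_funcs)]

        # (2) Execute every candidate; keep the answer with the highest
        #     execution agreement across samples.
        outcomes = [(execute(program, toolbox), program, new_funcs)
                    for program, new_funcs in candidates]
        best_answer, _ = Counter(o[0] for o in outcomes).most_common(1)[0]
        _, program, new_funcs = next(o for o in outcomes if o[0] == best_answer)

        # Grow the toolbox with helpers from the selected solution, and
        # record which existing functions that solution used.
        toolbox.update(new_funcs)
        for name in toolbox:
            if name in program:
                usage[name] += 1
        answers.append(best_answer)

        # (3) Periodically trim low-utility functions.
        if step % trim_every == 0:
            toolbox = {n: src for n, src in toolbox.items() if usage[n] >= min_uses}

    return answers, toolbox
```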

Results

TroVE's results are compelling. Compared with baseline methods, it significantly simplifies the verification process, making it 31% faster and 13% more accurate for human validators. It was also consistently better at generating simpler, more accurate solutions while maintaining a leaner function library. Its ability to cultivate specialized functions transfers across tasks and datasets, suggesting broad applicability. Notably, TroVE remained robust to the ordering of examples, and its periodic toolbox trimming curtailed the proliferation of redundant functions; a sketch of such a trimming step appears below.
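
For concreteness, the trimming step credited here can be pictured as a simple utility filter. The usage-count criterion below is an assumption for illustration, not the paper's exact scoring rule.

```python
def trim_toolbox(toolbox, usage_counts, min_uses=2):
    """Keep only functions used often enough in selected solutions.

    toolbox:      dict of function name -> source code
    usage_counts: dict of function name -> times used (hypothetical statistic)
    """
    kept = {name: src for name, src in toolbox.items()
            if usage_counts.get(name, 0) >= min_uses}
    removed = sorted(set(toolbox) - set(kept))
    return kept, removed

# Example: two helpers earned their keep, one was never used.
toolbox = {"column_average": "...", "parse_date": "...", "unused_helper": "..."}
kept, removed = trim_toolbox(toolbox, {"column_average": 9, "parse_date": 4})
# kept == {"column_average": "...", "parse_date": "..."}; removed == ["unused_helper"]
```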

Implications and Conclusion

The TroVE framework represents a substantial advance in automating function curation for LMs in code generation. It streamlines the creation of expressive, high-level functions without requiring human intervention, and it balances the trade-offs among model performance, solution complexity, and library size. It also promotes a more efficient human verification process, which matters as LMs increasingly become collaborators in coding workflows. The research paves a path toward LMs as autonomous programming agents, able to induce, apply, and manage sophisticated abstractions, and holds considerable promise for streamlining programmatic problem solving with AI assistance.
