Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs (2401.10065v3)

Published 18 Jan 2024 in cs.CL

Abstract: Reasoning is a fundamental component of language understanding. Recent prompting techniques, such as chain of thought, have consistently improved LLMs' performance on various reasoning tasks. Nevertheless, there is still little understanding of what triggers reasoning abilities in LLMs in the inference stage. In this paper, we introduce code prompting, a chain of prompts that transforms a natural language problem into code and directly prompts the LLM using the generated code without resorting to external code execution. We hypothesize that code prompts can elicit certain reasoning capabilities of LLMs trained on text and code and utilize the proposed method to improve conditional reasoning, the ability to infer different conclusions depending on the fulfillment of certain conditions. We find that code prompting exhibits a high-performance boost for multiple LLMs (up to 22.52 percentage points on GPT 3.5, 7.75 on Mixtral, and 16.78 on Mistral) across multiple conditional reasoning datasets. We then conduct comprehensive experiments to understand how code prompts trigger reasoning abilities and which capabilities are elicited in the underlying models. Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement. Furthermore, code prompts improve sample efficiency of in-context learning and facilitate state tracking of variables or entities.

Overview of Code Prompting

The paper investigates a novel approach to enhancing the conditional reasoning abilities of text+code LLMs such as GPT 3.5. In a process termed 'code prompting,' a natural language task is first transformed into code, and the generated code is then used to prompt the LLM directly, without ever being executed. The method leverages the model's ability to understand both textual and code inputs, aiming for performance improvements on tasks that require conditional reasoning.
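
To make the pipeline concrete, the sketch below shows one plausible way to wire the two prompts together. It is a minimal illustration, not the authors' implementation: `complete` is a hypothetical stand-in for whatever text+code LLM client is used, and the prompt wording is invented for this summary rather than taken from the paper.

```python
# Minimal sketch of the two-step code-prompting pipeline (illustrative only).

def complete(prompt: str) -> str:
    """Placeholder for any text+code LLM completion call (hypothetical)."""
    raise NotImplementedError("plug in an actual LLM client here")

def code_prompting(document: str, question: str) -> str:
    # Step 1: ask the model to rewrite the natural-language problem as code,
    # keeping every original sentence as a comment next to the logic it encodes.
    transform_prompt = (
        "Rewrite the following problem as Python code. Keep each original "
        "sentence as a comment above the code that expresses it.\n\n"
        f"Document:\n{document}\n\nQuestion: {question}\n"
    )
    code_version = complete(transform_prompt)

    # Step 2: prompt the model with the generated code (never executed)
    # and ask it to reason over that code to answer the question.
    answer_prompt = (
        f"{code_version}\n\n"
        f"# Reason over the code above and answer the question.\n"
        f"# Question: {question}\n# Answer:"
    )
    return complete(answer_prompt)
```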

Experimental Findings

The experiments show a clear performance improvement when code prompts are used in place of traditional text prompts on reasoning tasks, quantified as a gain of between 2.6 and 7.7 points across the ConditionalQA and BoardgameQA datasets. Notably, code prompts do more than translate text into code: they retain the original natural language as comments within the generated code, which proves crucial for understanding the problem.
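
As a constructed illustration (not an example from the paper), a code prompt for a ConditionalQA-style question might look like the following: each source sentence survives as a comment, and the conditions it states become explicit variables and if-statements that the LLM reads but never runs.

```python
# Constructed example of a generated code prompt (illustrative only).

# "You can claim the childcare grant if you are a full-time student."
is_full_time_student = True
# "You must also have at least one child under 15."
has_child_under_15 = None  # unknown: the scenario does not say

# Question: "Can I claim the childcare grant?"
if is_full_time_student and has_child_under_15:
    answer = "yes"
elif has_child_under_15 is None:
    answer = "yes, conditional on having a child under 15"
else:
    answer = "no"
```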

Investigation into Code Prompt Efficacy

The transformation requires that the generated code not only take the structural form of code but also remain semantically close to the original problem text; it is this alignment between the logic expressed in the code and the semantics of the text that unlocks the improved reasoning. A further finding concerns sample efficiency: code prompts require fewer in-context demonstrations to guide the LLM toward correct reasoning, which makes them particularly advantageous in resource-constrained scenarios.
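
The sample-efficiency point can be pictured with a small helper that assembles a k-shot prompt from solved demonstrations; the function and its demonstration format are assumptions made for this sketch, not part of the paper's code. The reported finding is simply that k can be smaller when the demonstrations are code rather than text.

```python
# Hypothetical helper for building a k-shot code prompt (illustrative only).

def build_few_shot_prompt(demonstrations, new_problem_code, k=1):
    """Concatenate k solved code demonstrations ahead of the new problem."""
    shots = "\n\n".join(
        f"{demo['code']}\n# Answer: {demo['answer']}"
        for demo in demonstrations[:k]
    )
    return f"{shots}\n\n{new_problem_code}\n# Answer:"
```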

Implications and Future Potential

The technique also improves the LLM's ability to track the state of variables and key entities throughout a reasoning task, which suggests an intrinsic advantage for problems involving stateful or conditional information. Looking ahead, the researchers intend to investigate the application of the approach to other reasoning types and other models, potentially broadening its utility across a wider range of LLM applications.
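
One intuition for the state-tracking benefit, sketched here as a constructed example rather than one from the paper: when every rule becomes an assignment, the current state of each entity is spelled out in the prompt instead of being implicit in prose.

```python
# Constructed illustration of explicit state tracking in a code prompt.

# "Alice starts in the kitchen."
alice_location = "kitchen"
# "Alice walks to the garden."
alice_location = "garden"
# "If Alice is in the garden, she picks up the key."
alice_has_key = alice_location == "garden"

# Question: "Does Alice have the key?"
# The value of alice_has_key is explicit at this point in the prompt, so the
# model can read the state off directly instead of inferring it from prose.
```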

The method's main limitation is the need for an intermediate transformation step, which increases the overall processing cost. Because the transformation itself is simple, it holds promise for further optimization, such as outsourcing the task to a smaller, specialized model. Despite this overhead, the research presents a compelling case for code prompting as a way to elevate the reasoning faculties of LLMs in conditional reasoning scenarios.

References (29)
  1. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. Transactions on Machine Learning Research.
  2. FinQA: A dataset of numerical reasoning over financial data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3697–3711, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  3. PaLM: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240):1–113.
  4. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  5. PAL: Program-aided language models. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org.
  6. Towards leveraging LLMs for conditional QA. arXiv preprint arXiv:2312.01143.
  7. Mistral 7B. arXiv preprint arXiv:2310.06825.
  8. BoardgameQA: A dataset for natural language reasoning with contradictory information. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, pages 1–23.
  9. Sharp nearby, fuzzy far away: How neural language models use context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294, Melbourne, Australia. Association for Computational Linguistics.
  10. Large language models are zero-shot reasoners. In Advances in Neural Information Processing Systems, volume 35, pages 22199–22213. Curran Associates, Inc.
  11. Measuring faithfulness in chain-of-thought reasoning. arXiv preprint arXiv:2307.13702.
  12. LogiQA 2.0—an improved dataset for logical reasoning in natural language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:2947–2962.
  13. Rainier: Reinforced knowledge introspector for commonsense question answering. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8938–8958, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  14. Generated knowledge prompting for commonsense reasoning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3154–3169, Dublin, Ireland. Association for Computational Linguistics.
  15. LogiQA: A challenge dataset for machine reading comprehension with logical reasoning. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3622–3628. International Joint Conferences on Artificial Intelligence Organization. Main track.
  16. The magic of IF: Investigating causal reasoning abilities in large language models of code. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9009–9022, Toronto, Canada. Association for Computational Linguistics.
  17. Faithful chain-of-thought reasoning. arXiv preprint arXiv:2301.13379.
  18. Language models of code are few-shot commonsense learners. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1384–1403, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  19. FActScore: Fine-grained atomic evaluation of factual precision in long form text generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12076–12100, Singapore. Association for Computational Linguistics.
  20. OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
  21. Fact-checking complex claims with program-guided reasoning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6981–7004, Toronto, Canada. Association for Computational Linguistics.
  22. Are NLP models really able to solve simple math word problems? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2080–2094, Online. Association for Computational Linguistics.
  23. Interpretation of natural language rules in conversational machine reading. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2087–2097, Brussels, Belgium. Association for Computational Linguistics.
  24. CLUTRR: A diagnostic benchmark for inductive reasoning from text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4506–4515, Hong Kong, China. Association for Computational Linguistics.
  25. ConditionalQA: A complex reading comprehension dataset with conditional answers. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3627–3637, Dublin, Ireland. Association for Computational Linguistics.
  26. Do long-range language models actually use long-range context? In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 807–822, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  27. Elaboration-generating commonsense question answering at scale. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1619–1635, Toronto, Canada. Association for Computational Linguistics.
  28. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, pages 24824–24837. Curran Associates, Inc.
  29. SatLM: Satisfiability-aided language models using declarative prompting. In Proceedings of NeurIPS, pages 1–33.
Authors (5)
  1. Haritz Puerto (11 papers)
  2. Martin Tutek (10 papers)
  3. Somak Aditya (25 papers)
  4. Xiaodan Zhu (94 papers)
  5. Iryna Gurevych (264 papers)
Citations (7)