Test-case-free self-correction for LLM program debugging

Determine whether large language models can debug and self-correct programs using only intermediate runtime execution information without access to correctness-labeled test cases, i.e., establish the feasibility of test-case-free debugging for large language models.

Background

The paper introduces LDB, a framework that enables LLMs to debug generated programs by leveraging runtime execution information, such as execution traces and intermediate variable states after each basic block. LDB iteratively refines programs until they pass visible test cases and is shown to improve performance across multiple benchmarks.
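To make "intermediate variable states" concrete, here is a minimal Python sketch of collecting runtime state with the standard `sys.settrace` hook. It is an illustration only, not LDB's implementation: it records a snapshot per executed line rather than per basic block, and the names `trace_locals` and `buggy_sum_of_squares` are hypothetical.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args) and record local variables before each executed line.

    Returns (result, snapshots), where each snapshot is a tuple of
    (line offset within func, copy of the locals at that point).
    Assumption: line-level granularity stands in for basic blocks.
    """
    snapshots = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only trace the target function's own frame.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            snapshots.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Restore whatever tracer was active before.
        sys.settrace(previous)
    return result, snapshots


def buggy_sum_of_squares(n):
    total = 0
    for i in range(n):  # hypothetical bug: intended range(1, n + 1)
        total += i * i
    return total


result, snapshots = trace_locals(buggy_sum_of_squares, 3)
for offset, state in snapshots:
    print(offset, state)
```

In a setup with test cases, a trace like this would be checked against a known expected output. In the test-case-free setting, only the trace and the task description are available, so the model would have to judge from the intermediate values alone (for example, that `i` starts at 0 and never reaches `n`) whether the program matches its intent.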

In the Limitation section, the authors note that LDB currently requires correct test cases, both to drive execution and to compare the observed behavior against the task description. They explicitly leave open whether LLMs can self-correct solely by inspecting intermediate execution states when no test cases indicate correctness, a setting they refer to as test-case-free debugging.

References

It remains an open question in future study whether LLMs are able to do self-correct by simply looking at its intermediate execution without knowing whether the result is correct or not (a.k.a. test-case-free debugging).

Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step (2402.16906 - Zhong et al., 25 Feb 2024) in Section: Limitation