
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions (2212.10561v3)

Published 20 Dec 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Despite recent success in LLM reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs. With Parsel, we automatically decompose algorithmic tasks into hierarchical natural language function descriptions and then search over combinations of possible function implementations using tests. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis and robotic planning. We find that, using Parsel, LLMs solve more competition-level problems in the APPS dataset, resulting in pass rates over 75\% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. Moreover, with automatically generated tests, we find that Parsel can improve the state-of-the-art pass@1 performance on HumanEval from 67\% to 85\%. We also find that LLM-generated robotic plans using Parsel are more than twice as likely to be considered accurate than directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers. We release our code at https://github.com/ezelikman/parsel

Citations (41)

Summary

  • The paper introduces Parsel, a framework that decomposes tasks hierarchically, raising LLM pass rates on competition-level programming problems by over 75% relative to prior methods.
  • It leverages hierarchical decomposition to generate and validate modular function implementations using pre-defined tests.
  • Parsel demonstrates practical impact in competitive programming and robotics by improving solution accuracy and plan reliability.

Algorithmic Reasoning with Parsel and LLMs

The paper presents Parsel, a novel framework designed to enhance the capability of LLMs in performing hierarchical multi-step reasoning tasks. Such tasks, which include generating complex programs and planning robotic actions, are areas where LLMs have struggled because they produce output in a single linear pass. By decomposing these tasks into smaller, manageable components, Parsel leverages LLMs more effectively to solve algorithmic problems.

Overview of Parsel Framework

Parsel introduces a structured approach that begins by dividing a given task into hierarchical natural language function descriptions. These descriptions are then transformed into implementable components by LLMs through an iterative process. The core of Parsel involves implementing a combinatorial search over possible function implementations, validating these implementations using pre-defined tests to ensure correctness.
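To make the idea of hierarchical natural language function descriptions concrete, here is a minimal illustration (the data layout and names are hypothetical, not the paper's actual specification syntax): each node names a function, describes it in natural language, lists its child functions, and attaches input-output tests used later for validation.

```python
# Hypothetical decomposition of a task into a hierarchy of function
# descriptions. Each node pairs a natural-language description with
# input-output tests and a list of child (helper) functions.
decomposition = {
    "name": "collatz_steps",
    "description": "Return the number of Collatz steps to reach 1 from n.",
    "tests": [({"n": 6}, 8)],  # collatz_steps(n=6) should return 8
    "children": [
        {
            "name": "next_collatz",
            "description": "Return n // 2 if n is even, else 3 * n + 1.",
            "tests": [({"n": 6}, 3), ({"n": 3}, 10)],
            "children": [],
        }
    ],
}

def count_functions(node):
    """Count how many functions the decomposition defines."""
    return 1 + sum(count_functions(c) for c in node["children"])

print(count_functions(decomposition))  # → 2
```

Each leaf is simple enough for an LLM to implement directly, while the root's tests constrain the assembled program as a whole.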

The framework consists of three primary phases:

  1. Decomposition: Parsel decomposes algorithmic tasks into function descriptions using the capabilities of LLMs. This decomposition mirrors how experienced developers break down complex problems into simpler parts.
  2. Implementation: Using an LLM, Parsel generates several candidate implementations for each function. This modular approach allows Parsel to explore multiple combinations efficiently.
  3. Composition and Verification: With the help of a synthesizer, Parsel assembles these implementations and tests their validity using constraints, such as input-output examples.
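The composition-and-verification phase can be sketched as a search over the Cartesian product of candidate implementations, keeping the first combination that passes all input-output constraints. This is a simplified illustration assuming candidates have already been sampled from an LLM; the function names and data are invented for the example, not Parsel's actual API.

```python
from itertools import product

# Candidate implementations for each function, as an LLM might produce
# them. One candidate per function is deliberately buggy.
candidates = {
    "square": [
        "def square(x):\n    return x * x",
        "def square(x):\n    return x + x",              # wrong
    ],
    "sum_of_squares": [
        "def sum_of_squares(xs):\n    return sum(square(x) for x in xs)",
        "def sum_of_squares(xs):\n    return sum(xs)",   # wrong
    ],
}

# Input-output constraints on the top-level function.
tests = [(([1, 2, 3],), 14), (([0, 4],), 16)]

def find_valid_composition(candidates, tests):
    """Try every combination of candidate implementations; return the
    first combination passing all tests, or None if none does."""
    names = list(candidates)
    for combo in product(*(candidates[n] for n in names)):
        namespace = {}
        try:
            for source in combo:
                exec(source, namespace)  # define functions in a shared scope
            if all(namespace["sum_of_squares"](*args) == expected
                   for args, expected in tests):
                return dict(zip(names, combo))
        except Exception:
            continue  # a broken candidate invalidates this combination
    return None

solution = find_valid_composition(candidates, tests)
print(solution is not None)  # → True
```

Because functions are validated in combination rather than in isolation, a correct program can be assembled even when most individual candidates are wrong, at the cost of a search that grows with the product of candidate counts per function.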

Strong Numerical Results

The empirical evaluation of Parsel demonstrates significant improvements in performance. On competition-level problems from the APPS dataset, Parsel achieved pass rates over 75% higher than prior results from directly sampling AlphaCode and Codex, often with a smaller sample budget. With automatically generated tests, Parsel improved the state-of-the-art pass@1 performance on HumanEval from 67% to 85%. In robotic planning tasks, plans generated with Parsel were more than twice as likely to be judged accurate as directly generated plans.

Theoretical and Practical Implications

Theoretically, Parsel represents a shift towards integrating hierarchical decomposition into LLM-based reasoning. By structuring tasks as a hierarchy of connected components, the framework can handle more abstract reasoning, unlocking possibilities for automated synthesis of large-scale solutions. Parsel’s use of constraints for verification also aligns with advancements in formal methods, bridging a gap between high-level reasoning and low-level execution.

Practically, this framework can significantly impact domains requiring complex problem-solving, such as competitive programming, large-scale software synthesis, and automated planning in robotics. Parsel enables human developers to focus on problem-solving rather than syntactic implementation details, which could revolutionize both educational and professional programming environments.

Future Directions

Future research should focus on addressing current limitations, such as recursive function dependencies and the integration of multiple specialized tools. Enhancing Parsel's capability to handle languages underrepresented in training data, as well as leveraging open-source models to avoid reliance on closed APIs, are critical avenues for exploration.

Building on automatic test generation and extending the framework for theorem proving are also promising directions. Moreover, if LLMs could generate functional, domain-specific languages within Parsel, this could unlock even broader applications by tailoring the LLM's generation capabilities to complex niche areas.

By proposing an innovative way to combine structured decomposition with LLM code generation, Parsel opens new pathways to making sophisticated algorithmic tasks more accessible and manageable, potentially reshaping how computational problems are approached and solved.
