
Learning to Reason via Program Generation, Emulation, and Search (2405.16337v3)

Published 25 May 2024 in cs.CL and cs.AI

Abstract: Program synthesis with large language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g. word concatenation). However, not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. Our goal is to extend an LM's program synthesis skills to such tasks and evaluate the results via pseudo-programs, namely Python programs where some leaf function calls are left undefined. To that end, we propose Code Generation and Emulated EXecution (CoGEX). CoGEX works by (1) training LMs to generate pseudo-programs, (2) teaching them to emulate their generated program's execution, including those leaf functions, allowing the LM's knowledge to fill in the execution gaps; and (3) using them to search over many programs to find an optimal one. To adapt the CoGEX model to a new task, we introduce a method for performing program search to find a single program whose pseudo-execution yields optimal performance when applied to all the instances of a given dataset. We show that our approach yields large improvements compared to standard in-context learning approaches on a battery of tasks, both algorithmic and soft reasoning. This result thus demonstrates that code synthesis can be applied to a much broader class of problems than previously considered. Our released dataset, fine-tuned models, and implementation can be found at \url{https://github.com/nweir127/CoGEX}.


The paper "Learning to Reason via Program Generation, Emulation, and Search" proposes a methodology named CoGEX (Code Generation and Emulated Execution). The approach aims to extend the reasoning capabilities of large language models (LMs) from strictly algorithmic tasks to softer reasoning challenges such as commonsense reasoning, moral decision-making, and understanding sarcasm. LMs trained for program synthesis excel at technical computations but are less suited to subjective or nuanced reasoning tasks. CoGEX addresses this by generating what the authors term "pseudo-programs": Python programs in which some leaf function calls are left undefined, allowing the LM to draw on its latent knowledge to fill the gaps during emulated execution.

Methodology

CoGEX operates by training LMs on pseudo-programs, enabling them to generate and "emulate" their execution. This involves creating code scripts that incorporate both definable reasoning processes and placeholders for more ambiguous reasoning steps. The LM predicts the outcomes of these placeholders during execution, simulating how these steps would resolve given the contextual information in the model's knowledge base.
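The pseudo-program idea above can be illustrated with a minimal sketch. The function names (`emulate`, `detect_sarcasm`, `get_tone`) and the lookup table standing in for the LM's latent knowledge are hypothetical, not from the paper: concrete control flow is ordinary Python, while the leaf calls would, in CoGEX, be resolved by the model's emulated execution rather than by real code.

```python
def emulate(leaf_name, *args):
    """Stand-in for LM emulation of an undefined leaf function.

    In CoGEX the model itself predicts these return values; here a
    hand-written lookup table plays that role for illustration.
    """
    knowledge = {
        ("get_tone", "Oh great, another Monday."): "negative",
        ("literal_sentiment", "Oh great, another Monday."): "positive",
    }
    return knowledge[(leaf_name, *args)]

def detect_sarcasm(utterance):
    # Pseudo-program: the structure is concrete Python, but the
    # semantics of the leaf calls are filled in by emulation.
    tone = emulate("get_tone", utterance)
    literal = emulate("literal_sentiment", utterance)
    # Heuristic: sarcasm = literal sentiment clashes with actual tone.
    return tone != literal

print(detect_sarcasm("Oh great, another Monday."))  # prints True
```

The key property is that the program's branching and composition are checkable Python, while the "soft" judgments live entirely in the emulated leaves.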

The authors introduce a program search mechanism named CoTACS, which allows for task adaptation by identifying a general program that best fits a dataset. CoTACS employs a methodical search over potential programs generated by CoGEX to find one that optimizes performance across different data instances without updating LM parameters. This step embodies a crucial shift from solving tasks by instance-specific processing toward a broader application of a single generalizable program.
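The search step can be sketched as follows, assuming candidate programs are represented as plain Python callables standing in for generated pseudo-programs; the function names and the toy even/odd task are illustrative, not the paper's actual benchmarks. The idea is simply to score each candidate by pseudo-executing it on every instance of a small dev set and keep the single best program, with no parameter updates.

```python
def cotacs_search(candidates, dev_set):
    """Return the candidate program whose outputs best match the dev set."""
    def score(program):
        return sum(program(x) == y for x, y in dev_set) / len(dev_set)
    return max(candidates, key=score)

# Toy task: classify an integer as "even" or "odd".
dev_set = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]
candidates = [
    lambda x: "even",                      # trivial constant baseline
    lambda x: "odd" if x % 2 else "even",  # correct program
    lambda x: "even" if x % 2 else "odd",  # inverted program
]

best = cotacs_search(candidates, dev_set)
print(best(7))  # prints odd
```

Once found, the single winning program is reused for all test instances of the task, which is what distinguishes this search from per-instance program generation.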

Results

Experiments spanning a variety of reasoning tasks—ranging from symbolic manipulation to commonsense questions—demonstrate that CoGEX outperforms baseline models, including few-shot examples from off-the-shelf LMs and instruction-tuned Alpaca models. Remarkably, CoGEX shows a significant improvement in less conventional reasoning tasks where traditional program synthesis would be ineffectual.

For instance, on tasks demanding numerical operations such as "sum of large numbers," CoGEX is markedly effective relative to established natural-language reasoning paradigms. Additionally, despite its code-centric process, CoGEX retains strong performance on text-oriented tasks like emotion classification and commonsense reasoning, showcasing its flexibility and broader applicability.

Implications and Future Directions

This approach suggests a new trajectory for AI reasoning development, bridging the gap between hard-coded reasoning processes and the softer, context-dependent inferences needed for human-like cognition. The integration of code generation and emulation allows LMs to apply programmatic reasoning frameworks to widely diverse task domains, effectively broadening the class of problems addressable by program synthesis methods.

Future work could refine these pseudo-programs for deeper contextual understanding and improve program generalization across a wider variety of datasets. Further gains could also come from improving the emulation process the LM performs during pseudo-execution, bringing its reasoning closer to faithful program semantics. This strategy points toward LMs that can operate autonomously across an expanded range of problem-solving contexts.

Overall, CoGEX represents a significant advance in LM reasoning, showing how code generation, paired with emulated execution and program search, can yield more sophisticated, adaptable AI systems.

Authors (5)
  1. Nathaniel Weir (17 papers)
  2. Muhammad Khalifa (24 papers)
  3. Linlu Qiu (14 papers)
  4. Orion Weller (30 papers)
  5. Peter Clark (108 papers)