
Self-Steering Language Models (2504.07081v1)

Published 9 Apr 2025 in cs.CL and cs.AI

Abstract: While test-time reasoning enables LLMs to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure--both how to verify solutions and how to search for them. This paper introduces DisCIPL, a method for "self-steering" LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models. Our approach equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning. When instantiated with a small Follower (e.g., Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. In decoupling planning from execution, our work opens up a design space of highly-parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing LMs.

Summary

  • The paper introduces DisCIPL, a framework that decouples planning and execution to enhance constraint satisfaction and reasoning in language models.
  • It deploys a Planner LM (GPT-4o) to generate probabilistic inference programs that guide Follower LMs (Llama-3.2-1B) using Monte Carlo methods like SMC.
  • Experiments demonstrate that DisCIPL boosts constraint compliance and coherence, enabling smaller models to achieve performance close to larger ones on challenging tasks.

This paper introduces DisCIPL (Distributing Computation via Inference Programs for LLMs), a framework designed to improve the ability of large language models (LMs) to handle complex tasks involving constraints and reasoning, areas where even large models often struggle. The core idea is to decouple planning how to solve a task from executing the solution steps.

Framework Overview:

  1. Planner LM: A capable LM (like GPT-4o) takes the user's task description as input.
  2. Inference Program Generation: The Planner generates a task-specific inference program written in a probabilistic programming language (PPL). This program encodes both how to verify a solution and a plan (search strategy) for finding one.
  3. Follower LM Execution: A population of potentially smaller LMs (the Followers, e.g., Llama-3.2-1B) executes this program. The program guides the Followers' generation process step-by-step, making calls to the Follower model either generatively (sampling) or evaluatively (computing probabilities).
  4. Inference Engine: An engine runs the generated program, coordinating the Follower LMs using specified Monte Carlo inference strategies (like Importance Sampling or Sequential Monte Carlo) within a given computational budget (N).
  5. Feedback Loop: If the program execution results in an error, the error message and traceback are fed back to the Planner, which attempts to correct the program (up to R retries).

Implementation Details:

  • Probabilistic Programming Language (PPL): The paper uses LLaMPPL, a Python-based PPL specifically designed for LMs. LLaMPPL allows programs to define a step() function that extends candidate generations token-by-token or word-by-word and updates their scores based on compliance with constraints or likelihood under the Follower model.
  • Inference Algorithms: The framework utilizes Monte Carlo methods implemented by the LLaMPPL inference engine:
    • Importance Sampling (IS): Generates N full candidates in parallel and selects one based on final scores.
    • Sequential Monte Carlo (SMC): Maintains N parallel candidates ("particles"), iteratively extends them via step(), updates weights based on scores, and periodically resamples to focus computation on promising partial solutions. This adaptive allocation helps improve coherency and handles complex constraints.
    • Rejection Sampling (RS): A simpler baseline where N samples are generated and checked post-hoc.
  • Common Programming Patterns: The Planner is prompted to generate programs using several effective patterns:
    • Step-by-step decomposition: Breaking the task into manageable step() units (e.g., one word, one line of poetry).
    • Prior and Proposal Prompts: Using a general "prior" prompt for fluency and a task-specific "proposal" prompt for constraints, with importance weighting to balance them.
    • Constrained Generation with Weight Correction: Using token masks to enforce hard constraints (like specific words or regular expressions) while correcting the probability distribution.
    • Self-Hinting: Dynamically updating the Follower's prompt context with intermediate results or state information (like remaining budget or character count).
    • Self-Checking: Defining a check() method within the program to verify the final output against constraints, catching generation errors or bugs in the step logic.
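To make the SMC pattern concrete, here is a toy particle loop combining the step-by-step decomposition, weight-based constraint steering, and self-checking patterns above. It is a self-contained sketch under simplifying assumptions: the "Follower" is a mock word sampler rather than an LM, and the weighting rule is a stand-in for a real task constraint; the actual paper expresses this logic as a LLaMPPL program.

```python
import random

# Toy sequential Monte Carlo loop in the spirit of the step()/check()
# patterns above. The "Follower" is a mock word sampler; a real DisCIPL
# program would extend and score particles with an LM via LLaMPPL.

WORDS = ["the", "cat", "sat", "on", "a", "mat"]

def step(particle):
    """Extend one candidate by a word; the weight favors words containing
    the letter 'a' (a stand-in for a task-specific constraint)."""
    word = random.choice(WORDS)
    particle.append(word)
    return 2.0 if "a" in word else 0.5     # incremental importance weight

def check(particle):
    """Post-hoc verification of the final output (self-checking pattern)."""
    return sum("a" in w for w in particle) >= 3

def smc(n_particles=16, n_steps=5, seed=0):
    random.seed(seed)
    particles = [[] for _ in range(n_particles)]
    weights = [1.0] * n_particles
    for _ in range(n_steps):
        for i, p in enumerate(particles):
            weights[i] *= step(p)
        # Resample to focus computation on high-weight partial solutions.
        particles = random.choices(particles, weights=weights, k=n_particles)
        particles = [list(p) for p in particles]   # copy after resampling
        weights = [1.0] * n_particles
    winners = [p for p in particles if check(p)]
    return winners[0] if winners else None
```

Dropping the resampling step and keeping only the final weighted selection recovers importance sampling (IS); dropping the weights entirely and filtering with `check()` alone recovers rejection sampling (RS).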

Experiments and Results:

  • Domains: Evaluated on COLLIE (challenging constrained text generation) and Puzzles (custom dataset with poetry, grant writing, budgeting, itinerary planning).
  • Setup: GPT-4o as Planner, Llama-3.2-1B as Follower. Baselines included Follower-only (standard, beam search), Planner-only (GPT-4o, GPT-4o-mini), and a reasoning model (o1).
  • Validity (Pass@1): DisCIPL significantly boosted the Llama-3.2-1B Follower's ability to satisfy constraints, surpassing both Follower-only and Planner-only baselines on difficult tasks (like sentence-level COLLIE and Puzzles) and approaching o1 performance. On simpler paragraph tasks, it closed the gap between the Follower and Planner.
  • Coherency: SMC generally produced more coherent outputs than IS for similar validity levels, attributed to its resampling mechanism filtering out disfluent but technically valid partial generations.
  • Efficiency: DisCIPL enables smaller models (Followers) guided by programs generated by a larger model (Planner) to achieve performance comparable to or exceeding much larger models, demonstrating effective inference-time scaling.
  • Program Generation: The Planner successfully generated effective check() methods and reasonably good step() logic, though performance gaps existed compared to expert-written programs, especially on Puzzles. Error feedback helped correct initial program bugs.

Conclusion:

The Self-Steering framework (DisCIPL) demonstrates that LMs can effectively generate task-specific inference programs that orchestrate populations of smaller LMs using probabilistic inference techniques like SMC. This approach enhances constraint satisfaction and reasoning capabilities, enabling smaller models to rival larger ones by efficiently utilizing parallel inference-time computation. It automates aspects of inference engineering and offers a flexible way to combine generation, constraint satisfaction, and search.

