Conditioning prompts to generate naturalistic yet verifiable terminal tasks

Determine a conditioning strategy for the language-model prompt that generates task descriptions in the Endless Terminals pipeline, so that the resulting terminal-use tasks read as naturalistic, user-style requests while remaining explicit enough to support automated verification via initial-state and completion tests.

Background

The Endless Terminals pipeline procedurally generates terminal-use tasks paired with privileged ground truth and automated tests to enable reinforcement learning with verifiable outcomes. While this design ensures solvability and objective evaluation, the authors note that the resulting tasks often resemble competitive programming problems rather than the ambiguous, underspecified requests typical of real user interactions.

The paper highlights the tension between producing naturalistic task requests and preserving enough formal specification for automated verification. Achieving both goals within the same generation process is difficult, creating a need for improved prompt conditioning that can yield realistic user-style tasks while still allowing testable verification criteria.
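To make the tension concrete, here is a minimal Python sketch of one possible way to frame the problem: ask the model for a casual request and a separate, precise specification in the same generation, then reject outputs that have no completion checks. This is not the paper's method. The template wording, the JSON keys (`user_request`, `verification_spec`, `initial_state`, `completion`), and the names `build_prompt`, `parse_task`, and `GeneratedTask` are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical template: the wording is an assumption, not the prompt used in the paper.
PROMPT_TEMPLATE = """You are writing a terminal task as a {persona} would ask it.
Write the request casually, in at most {max_sentences} sentences, without spelling out exact formats.
Separately, write a precise specification that automated tests can check.
Return JSON with two keys: "user_request" and "verification_spec".
"verification_spec" must contain "initial_state" and "completion", each a list of shell checks."""


@dataclass
class GeneratedTask:
    user_request: str          # naturalistic text shown to the agent
    initial_state: list[str]   # checks that must hold before the agent acts
    completion: list[str]      # checks that must hold after the agent finishes


def build_prompt(persona: str, max_sentences: int = 3) -> str:
    """Fill the conditioning template with a persona and a length cap."""
    return PROMPT_TEMPLATE.format(persona=persona, max_sentences=max_sentences)


def parse_task(raw: str) -> GeneratedTask:
    """Parse the model's JSON output and reject tasks that cannot be verified."""
    data = json.loads(raw)
    spec = data["verification_spec"]
    task = GeneratedTask(
        user_request=data["user_request"].strip(),
        initial_state=list(spec["initial_state"]),
        completion=list(spec["completion"]),
    )
    if not task.user_request:
        raise ValueError("empty user request")
    if not task.completion:
        raise ValueError("no completion checks; task is not verifiable")
    return task
```

Keeping the request and the specification in separate fields only relocates the difficulty: the casual request must still imply everything the completion checks test, or the task becomes unsolvable from the agent's point of view. Measuring that consistency is part of what the open problem asks for.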

References

Conditioning the generation prompt to produce more naturalistic requests while maintaining sufficient specification for verification remains an open challenge.

— Endless Terminals: Scaling RL Environments for Terminal Agents (2601.16443 - Gandhi et al., 23 Jan 2026) in Discussion (Limitations)
