BesiegeField: A Testbed for Compositional Machine Design
- BesiegeField Environment is a modular simulation testbed for compositional machine design, blending reinforcement learning with physical simulation.
- It integrates terrain generation, construction trees, and hierarchical planning to assess design validity and performance.
- The platform advances agentic reasoning by combining LLM-guided assembly with RL optimization and continuous scene representation.
BesiegeField Environment is a modular, simulation-based testbed designed to evaluate machine design via compositional assembly and to serve as a challenging benchmark for agentic reasoning, planning, and physical interaction. It is deployed atop the part-based machine-building game Besiege, implements reward-driven feedback, and integrates recent methodologies in reinforcement learning (RL), neural scene representation, and agent-based navigation. BesiegeField is referenced as a primary testbed in research on compositional machine design combining symbolic planning and RL in realistic simulation environments (Zhang et al., 16 Oct 2025).
1. Framework Architecture and Simulation Base
BesiegeField leverages the original game Besiege's core mechanics, especially its rigid-body and elastic physics engine and its part-based construction system. The environment is formalized with a construction tree, a structured representation recording the relative attachment and articulation relationships among components. The game mechanics are exposed programmatically, allowing algorithmic and language-model–driven part assembly. Machines are specified by their construction trees, which define the types, attachment points, and placements of standardized parts subject to geometric and functional constraints.
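A construction tree of this kind can be sketched as a simple recursive data structure. The part names, attachment fields, and schema below are illustrative assumptions, not Besiege's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    """A node in a construction tree (hypothetical schema)."""
    part_type: str                      # e.g. "wheel", "wooden_block"
    attach_face: str                    # face of the parent this part attaches to
    rotation: tuple = (0.0, 0.0, 0.0)   # Euler angles in degrees
    children: list = field(default_factory=list)

def count_parts(root: Part) -> int:
    """Total number of parts in the tree (root included)."""
    return 1 + sum(count_parts(c) for c in root.children)

# A minimal four-wheel cart: a core block with a wheeled block on two faces.
cart = Part("starting_block", "none", children=[
    Part("wooden_block", "front", children=[
        Part("wheel", "left"), Part("wheel", "right")]),
    Part("wooden_block", "back", children=[
        Part("wheel", "left"), Part("wheel", "right")]),
])
print(count_parts(cart))  # 7
```

Because the tree is code-like, both programmatic search and LLM generation can emit it directly.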
The simulation base allows agents to build and deploy machines (cars, catapults, etc.) and observe their operational performance in physically simulated scenarios. The evaluation framework is grounded in reward-driven performance, where successful behaviors (e.g., locomotion distance, projectile launch range) define score functions, and construction validity enforces design rules (no unattached parts, no collisions, etc.).
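The validity rules named above (no unattached parts, no collisions) can be sketched as a naive checker over a flat part list; the dictionary schema and the exact-overlap collision test are simplifying assumptions:

```python
def is_valid_design(parts):
    """Check two validity rules on a flat part list (hypothetical schema):
    every non-root part has a parent, and no two parts occupy the same cell."""
    attached = all(p["parent"] is not None for p in parts if p["id"] != 0)
    positions = [p["pos"] for p in parts]
    no_collisions = len(positions) == len(set(positions))  # exact-overlap check only
    return attached and no_collisions

parts = [
    {"id": 0, "parent": None, "pos": (0, 0, 0)},
    {"id": 1, "parent": 0, "pos": (1, 0, 0)},
    {"id": 2, "parent": 0, "pos": (1, 0, 0)},  # overlaps part 1 -> invalid
]
print(is_valid_design(parts))  # False
```

A real checker would test oriented bounding boxes rather than exact position equality, but the gating role in the reward is the same.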
2. Terrain Generation, Morphologies, and Task Distribution
BesiegeField incorporates terrain complexity inspired by the Terrain RL Simulator (Berseth et al., 2018). The system can generate varied terrains—slopes, gaps, walls, steps, cliffs, mixed patterns—by parameterizing distributions over geometric features. Terrain files (in JSON-like syntax) permit specification of parameters such as “GapSpacingMin”, “GapSpacingMax”, “GapWMin”, “GapWMax”, controlling the distribution of configuration elements sampled per episode:
The overall terrain configuration vector $\theta = (\text{GapSpacingMin}, \text{GapSpacingMax}, \text{GapWMin}, \text{GapWMax}, \ldots)$ determines the sampled environment, with difficulty modulated via the size and variability of the parameter ranges.
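A terrain file of this kind might be consumed per episode as follows. Only the parameter names come from the text above; the numeric values and the sampling logic are assumptions for illustration:

```python
import json
import random

# Hypothetical terrain spec in the JSON-like syntax described above.
spec = json.loads("""{
    "GapSpacingMin": 4.0, "GapSpacingMax": 8.0,
    "GapWMin": 0.5, "GapWMax": 2.0
}""")

def sample_gaps(spec, length=40.0, seed=0):
    """Sample (position, width) gap pairs for one episode from the ranges."""
    rng = random.Random(seed)
    gaps, x = [], 0.0
    while True:
        x += rng.uniform(spec["GapSpacingMin"], spec["GapSpacingMax"])
        w = rng.uniform(spec["GapWMin"], spec["GapWMax"])
        if x + w > length:
            break
        gaps.append((x, w))
        x += w
    return gaps

print(sample_gaps(spec))
```

Widening the `[Min, Max]` ranges increases episode-to-episode variability, which is how difficulty is modulated.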
Agent morphologies are diverse, including bipeds, hoppers, dogs, and articulated creatures, each governed by customized actuation models: torque control, desired velocity/position control, and muscle-based control. The agent-environment coupling is realized through observation vectors that integrate both agent states (joint angles, velocities) and local terrain features, supporting behavior generalization under highly variable, stochastic task distributions.
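One way to realize such an observation vector is plain concatenation of agent state with local terrain samples; the field choices and dimensions here are illustrative assumptions:

```python
def build_observation(joint_angles, joint_velocities, terrain_heights):
    """Concatenate agent state with local terrain features (illustrative layout).
    terrain_heights: ground heights sampled at fixed offsets ahead of the agent."""
    return list(joint_angles) + list(joint_velocities) + list(terrain_heights)

obs = build_observation([0.1, -0.2], [0.0, 0.5], [0.0, 0.1, 0.3])
print(len(obs))  # 7
```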
3. Compositional Machine Design Protocols
Compositional machine design in BesiegeField is defined as the hierarchical, structured assembly of machines that meet complex functional requirements under geometric and physical constraints. Construction trees act as code-like specifications stipulating part types, positions, and attachments:
- Cars: Static relational reasoning is tested—ensuring symmetry, correct orientation, and robust wheel placements.
- Catapults: Dynamic relational reasoning is challenged—coordinating mechanical elements to optimize projectile height and distance.
Functional demands (locomotion, manipulation, launching) are mapped to reward metrics evaluated under simulated physics. In reinforcement learning settings, the reward function takes the form

$$R = \text{is\_valid} \times \text{performance},$$

where “is_valid” is a binary indicator (1 if the design passes all physical and construction constraints, 0 otherwise) and “performance” is quantified by metrics specific to the task (for catapults, the product of projectile height and throw distance).
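The reward gating can be written directly; the function name and argument list below are illustrative:

```python
def catapult_reward(is_valid: bool, max_height: float, throw_distance: float) -> float:
    """Verifiable reward sketch: zero for invalid machines, otherwise the
    task-specific performance metric (height x distance for catapults)."""
    if not is_valid:
        return 0.0
    return max_height * throw_distance

print(catapult_reward(True, 3.0, 10.0))   # 30.0
print(catapult_reward(False, 3.0, 10.0))  # 0.0
```

Multiplying by the validity indicator means an invalid machine earns nothing regardless of how far it flings its payload, so validity is never traded away for performance.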
4. Agentic Reasoning, LLMs, and Hierarchical Control
The environment facilitates agentic workflows employing LLMs and hierarchical chain-of-thought (CoT) reasoning. LLMs generate high-level functional requirements, which are recursively translated into syntactic, executable construction trees. In iterative workflows, agents (or multi-agent systems) plan, critique, and edit designs—balancing abstract blueprints and concrete part placements.
Success in BesiegeField necessitates advanced spatial reasoning, strategic assembly (long-horizon compositional planning), and precise instruction following (translation of functional specification to valid construction trees). Empirical benchmarks indicate current LLMs face challenges in placing parts with correct 3D orientation and in maintaining design validity through extended planning horizons.
5. Reinforcement Learning and Optimization Techniques
To address the observed limitations of LLMs, RL protocols are incorporated for policy improvement. Group relative policy optimization (GRPO) is applied, and LoRA parameterizations support efficient finetuning. Training is bootstrapped by a cold-start dataset of pairs of valid machines and expert chain-of-thought trajectories. Optimization is formalized as maximizing the expected verifiable reward,

$$\max_{\pi}\; \mathbb{E}_{x \sim \pi}\,[\,R(x)\,],$$

where $\pi$ is the policy for constructing machines and $R$ is the verifiable reward. This protocol drives the policy toward designs that do not merely satisfy high-level specifications but are both valid and high-performing under the physical simulation and environment rules.
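A minimal sketch of the group-relative baseline at the heart of GRPO, assuming a group of sampled designs scored by the verifiable reward (normalization details vary by implementation):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage sketch: score each sampled design's reward
    relative to the mean and std of its own group of rollouts."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Four designs sampled for one prompt; only the third performed well.
adv = group_relative_advantages([0.0, 10.0, 30.0, 0.0])
print(adv)
```

Because the baseline is computed per group rather than by a learned value network, the method suits sparse, verifiable rewards like the validity-gated score above.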
6. Scene Representation via Continuous Environment Fields
Scene representations and navigation behaviors in BesiegeField can be further enriched by encoding “reaching distance” via neural implicit functions, as outlined in "Learning Continuous Environment Fields via Implicit Functions" (Li et al., 2021). The environment field $E(p, g)$ gives the distance from any scene point $p$ to a goal $g$ along feasible (collision-free) paths, and the implicit network $E_\theta$ is fit by regressing ground-truth reaching distances $d^{*}$ with a loss of the form

$$\mathcal{L} = \mathbb{E}_{p,\,g}\,\big( E_\theta(p, g) - d^{*}(p, g) \big)^2.$$
Training data are generated using analytical methods (e.g., the fast marching method), and the continuous field supports efficient planning, especially in high-dimensional and dynamically changing terrains. For domains involving human agents, a conditional VAE is trained to produce accessible regions (human-plausible locations), ensuring trajectories are both feasible and physically plausible.
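Ground-truth reaching distances of the kind used to supervise the field can be generated on an occupancy grid; the BFS below is a discrete stand-in for the fast marching method named above:

```python
from collections import deque

def reaching_distance_field(grid, goal):
    """Reaching distances to `goal` on an occupancy grid via BFS.
    grid[r][c] == 1 marks an obstacle; distances follow collision-free
    4-connected paths, so blocked regions force detours."""
    rows, cols = len(grid), len(grid[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    gr, gc = goal
    dist[gr][gc] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and dist[nr][nc] == float("inf"):
                dist[nr][nc] = dist[r][c] + 1.0
                queue.append((nr, nc))
    return dist

grid = [[0, 0, 0],
        [1, 1, 0],   # wall forces a detour
        [0, 0, 0]]
d = reaching_distance_field(grid, (0, 0))
print(d[2][0])  # 6.0: the path must go around the wall
```

Note that the distance from the cell directly below the goal is 6, not 2, because the field measures path length around obstacles rather than Euclidean distance; this is exactly the property the implicit network is trained to reproduce.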
7. Challenges, Limitations, and Research Directions
Several open challenges are identified:
- Misalignment between chain-of-thought and physical realization persists, especially in long-horizon compositions.
- LLMs have demonstrable difficulty with spatial reasoning and attachment constraints, leading to invalid designs.
- Agentic RL policies may collapse to narrow design spaces without careful balancing of exploration and exploitation.
- Scene representation via implicit functions depends critically on the quality and diversity of training samples, especially in highly dynamic or chaotic settings, such as those with rapidly shifting obstacles or adversarial behaviors.
- Extending generative models (such as VAEs for accessible regions) to radically new domains or topologies remains nontrivial, requiring additional data conditioning and contextual adaptation.
Future research aims to refine RL protocols, integrate multimodal feedback (visual schematics, physical metrics), and improve scene and agent conditioning to handle the high complexity and unpredictability characteristic of BesiegeField environments. There is also focus on bridging the gap between abstract, hierarchical planning and execution-level precision in part placement and behavior control.
BesiegeField Environment stands as a rigorous platform for benchmarking compositional design, agentic reasoning, and interactive machine construction under physical simulation. It interleaves methods from procedural terrain generation, neural scene representation, language-guided planning, and reinforcement learning, with explicit evaluation protocols for machine validity and functional performance. The testbed catalyzes inquiry at the interface of language, machine design, and physical reasoning, presenting technical challenges essential for progressing toward robust and versatile intelligent agents.