Agentic Design of Compositional Machines
- The paper presents a framework where agentic design integrates LLM-driven strategic planning with rule-based modular assembly to synthesize functional machines.
- It leverages hierarchical assembly and strict constraint satisfaction to ensure geometric validity and enhance machine performance using reinforcement learning.
- The introduction of the BesiegeField testbed benchmarks design quality through simulation-based evaluations, addressing spatial reasoning and error correction challenges.
Compositional machine design is the process of synthesizing functional mechanisms by assembling standardized parts according to explicit construction rules. This paradigm is central both to human engineering and to the ambition of building artificial intelligence systems that can autonomously invent physical machines or artifacts. Recent advances in large language models (LLMs) have enabled research into agentic workflows in which the generation and evaluation of composite machines are formalized as structured, multi-stage reasoning and synthesis processes. This entry surveys the foundational principles, agentic workflows, simulation environments, evaluation metrics, reinforcement learning enhancements, and open challenges in the agentic design of compositional machines, focusing on the developments that accompany the introduction of the BesiegeField testbed (Zhang et al., 16 Oct 2025).
1. Foundations of Compositional Machine Synthesis
Compositional machine design frames the task of building a machine—such as a vehicle or robotic manipulator—as the programmatic combination of a finite set of modular parts. Each part is defined by geometric, kinematic, and functional properties, with construction rules dictating permitted spatial arrangements and functional connections. The composite machine is typically represented as a rooted tree or graph, where nodes correspond to parts and edges represent attachment relationships. The overarching aim is to realize global functions (e.g., locomotion or payload delivery) through local part interactions.
Key aspects include:
- Modularity: Parts are reusable, with clear attachment semantics (e.g., faces, orientations).
- Hierarchical Assembly: Machines are assembled bottom-up through part–subpart composition, formally encoded as a construction tree or similar data structure.
- Constraint Satisfaction: Design rules ensure geometric validity (e.g., absence of collisions), mechanical soundness, and, where applicable, task-specific requirements (e.g., reach, mobility, stability).
- Mapping to Function: The realization of dynamic behaviors (such as launching a projectile via a catapult) emerges from the static arrangement and configuration of parts under the laws of physics.
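As a minimal sketch of the rooted-tree representation described above, the following Python snippet stores each part with a link to its parent and checks that every part is reachable from the root. The `Part` class, its field names, and `find_floating_parts` are illustrative assumptions, not the format used in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    """One node of a construction tree (all field names are hypothetical)."""
    part_id: int
    part_type: str                # e.g. "block" or "wheel"
    parent_id: Optional[int]      # None only for the root part
    face: Optional[str] = None    # face of the parent this part attaches to


def find_floating_parts(parts: list[Part]) -> list[int]:
    """Return ids of parts that cannot be reached from the single root part."""
    children: dict[int, list[int]] = {}
    roots = []
    for p in parts:
        if p.parent_id is None:
            roots.append(p.part_id)
        else:
            children.setdefault(p.parent_id, []).append(p.part_id)
    if len(roots) != 1:
        raise ValueError("expected exactly one root part")

    # Depth-first traversal from the root over parent -> child edges.
    reachable = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(children.get(node, []))
    return sorted(p.part_id for p in parts if p.part_id not in reachable)


machine = [
    Part(0, "block", None),
    Part(1, "wheel", 0, "left"),
    Part(2, "wheel", 0, "right"),
    Part(3, "wheel", 7, "left"),  # parent 7 does not exist, so this part floats
]
print(find_floating_parts(machine))
```

Traversing from the root, rather than only checking that each parent id exists, also catches groups of parts that reference each other in a cycle but never connect to the root.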
2. Role of LLMs in Agentic Design
The agentic approach leverages LLMs as both planners and synthesizers in the compositional design process (Zhang et al., 16 Oct 2025). LLMs must operate at two levels:
- Strategic Reasoning: LLMs generate high-level chains-of-thought in natural language, decomposing the target functionality into actionable subgoals (e.g., “attach four wheels symmetrically to maximize stability”).
- Structured Code Generation: LLMs instantiate these strategies in executable, structured representations such as JSON-based construction trees. These representations encode, for each part, its type, attachment geometry, parent–child linkage, and orientation.
Crucial design competencies include:
- Spatial Reasoning: Determining legal and effective placements for components in three-dimensional space.
- Strategic Assembly: Sequencing the addition of substructures to satisfy both construction constraints and functional goals.
- Instruction Following: Maintaining fidelity to specified objective functions and constraints, such as maximizing forward vehicle displacement or ensuring machine validity under simulation.
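To make the idea of a JSON-based construction tree concrete, here is a small sketch that parses a machine description and checks that each part carries the fields listed above (type, attachment face, parent linkage, orientation). The key names and the `check_file_validity` helper are hypothetical and chosen for illustration; the actual BesiegeField schema may differ.

```python
import json

# Hypothetical schema: these keys are illustrative, not the BesiegeField format.
REQUIRED_FIELDS = {"id", "type", "parent", "face", "rotation"}

machine_json = """
[
  {"id": 0, "type": "block", "parent": null, "face": null, "rotation": [0, 0, 0]},
  {"id": 1, "type": "wheel", "parent": 0, "face": "left", "rotation": [0, 90, 0]},
  {"id": 2, "type": "wheel", "parent": 0, "face": "right", "rotation": [0, -90, 0]}
]
"""


def check_file_validity(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file is well formed."""
    try:
        parts = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not parseable as JSON: {exc.msg}"]
    if not isinstance(parts, list):
        return ["top-level value must be a list of parts"]

    problems = []
    for index, part in enumerate(parts):
        if not isinstance(part, dict):
            problems.append(f"part {index}: expected an object")
            continue
        missing = REQUIRED_FIELDS - set(part)
        if missing:
            problems.append(f"part {index}: missing fields {sorted(missing)}")
    return problems


print(check_file_validity(machine_json))
```

Returning a list of readable problems, rather than a single boolean, makes the result easy to feed back to the model as a correction prompt.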
3. The BesiegeField Testbed and Environment
To systematically study compositional machine design via agentic workflows, BesiegeField (Zhang et al., 16 Oct 2025) was introduced as a research platform built on the Besiege game engine. Core features include:
- Component Library: Nearly 80 part types, each with well-defined mechanical semantics ranging from static blocks and axles to dynamic elements like wheels and boulders.
- Representation Interface: A compact machine description format (tree-based or XML) that records each part, its parent, and its attachment face.
- Physics Simulation: Machines are constructed in silico and evaluated in a full-featured physics engine, testing both static and dynamic behaviors.
- Task Suite: Benchmark tasks require designs for movement (the “Car” task emphasizes symmetry and stability under locomotion) and actuation (the “Catapult” task requires relational reasoning for correct projectile launching), with a variety of terrains and obstacles.
- Objective Metrics: Reward functions combine machine validity (geometric and structural soundness) with performance metrics such as displacement, speed, or projectile range, assessed through simulation.
This environment enables the quantitative assessment of design agents through both syntactic (validity of produced files, absence of geometric violations) and functional (simulation-based reward) criteria.
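One simple way to combine validity and performance into a single reward is to gate the performance term on validity. The sketch below assumes that scheme; the source does not specify the exact combination, and `SimulationResult`, `gated_reward`, and the `invalid_penalty` parameter are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SimulationResult:
    """Outcome of one simulated rollout (field names are hypothetical)."""
    file_valid: bool      # construction file parsed correctly
    machine_valid: bool   # no collisions or illegal attachments
    displacement: float   # performance metric, e.g. distance for the Car task


def gated_reward(result: SimulationResult, invalid_penalty: float = 0.0) -> float:
    """Assumed scheme: invalid machines get a fixed penalty, valid ones their score."""
    if not (result.file_valid and result.machine_valid):
        return invalid_penalty
    # Clamp so that moving backwards is never rewarded below zero.
    return max(result.displacement, 0.0)


print(gated_reward(SimulationResult(True, True, 12.5)))
print(gated_reward(SimulationResult(True, False, 40.0)))
```

Gating prevents a machine that exploits an invalid configuration (for example, overlapping parts that launch it forward) from receiving a high score.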
4. Agentic Workflows: Synthesis and Evaluation
The agentic design workflow can run as a single one-shot pass or as an iterative loop. Given a natural language task description (e.g., “Build a vehicle that moves forward as far as possible on uneven terrain”), the LLM:
- Generates a chain-of-thought that decomposes the task into explicit assembly actions and spatial arrangements.
- Translates the plan into an explicit part-based representation, conforming to the BesiegeField construction grammar.
- Submits the machine for validation—checking for constraints such as collisions, improper part orientations, or invalid attach points.
- Evaluates performance via simulation, recording outcomes (distance traveled, stability, or other rewards).
Multiple passes, or reinforcement learning (Section 6), may be employed to mitigate frequent failure cases such as:
- Invalid orientation: Incorrect facing or rotation of parts, resulting in non-functional assemblies.
- Attachment errors: Illegal or inadvertently disconnected substructures.
- Blueprint deviation: Synthesis diverging from the intended design logic formulated in the chain-of-thought.
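The generate, validate, simulate cycle described above can be sketched as a short feedback loop. This is an illustrative outline under stated assumptions: `propose`, `validate`, and `simulate` are hypothetical callables standing in for the LLM call, the constraint checker, and the physics simulator, and the stubs at the bottom exist only to make the example runnable.

```python
from typing import Callable, Optional


def design_loop(
    task: str,
    propose: Callable[[str, Optional[str]], str],
    validate: Callable[[str], list[str]],
    simulate: Callable[[str], float],
    max_rounds: int = 3,
) -> Optional[tuple[str, float]]:
    """Return the best (design, score) found, or None if no design was valid."""
    feedback: Optional[str] = None
    best: Optional[tuple[str, float]] = None
    for _ in range(max_rounds):
        design = propose(task, feedback)
        errors = validate(design)
        if errors:
            # Invalid designs are never simulated; errors become the next prompt's feedback.
            feedback = "; ".join(errors)
            continue
        score = simulate(design)
        if best is None or score > best[1]:
            best = (design, score)
        feedback = f"score={score}"
    return best


# Stub components for demonstration only.
def propose(task: str, feedback: Optional[str]) -> str:
    return "bad-design" if feedback is None else "good-design"


def validate(design: str) -> list[str]:
    return ["wheel 1 has an invalid orientation"] if design == "bad-design" else []


def simulate(design: str) -> float:
    return 12.5


print(design_loop("Build a vehicle that moves forward", propose, validate, simulate))
```

Setting `max_rounds=1` reduces the loop to the one-shot case, in which the first proposal is either accepted or rejected without revision.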
5. Benchmarking, Metrics, and Failure Analysis
Evaluating agentic compositional design requires precise, quantitative metrics:
| Metric Category | Definition | Example Application |
|---|---|---|
| File Validity | Syntactic correctness and formatting of construction files | JSON or XML parseable, all required fields present |
| Spatial Validity | Absence of inter-part collisions, adherence to allowed orientations | No overlapping bounding boxes |
| Machine Validity | Satisfaction of all construction rules and constraints | All children attached via legal faces, no floating parts |
| Simulation Performance | Functional reward as measured by the physics engine | Car displacement, catapult range |
Benchmarking against these metrics on the BesiegeField tasks reveals that while closed-source models (e.g., Gemini 2.5 Pro) are capable of producing visually sensible designs, substantive error patterns remain. Common failure cases include orientation and collision issues, especially for designs requiring coordinated motion across articulated parts.
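The spatial validity row of the table mentions overlapping bounding boxes. A minimal sketch of such a check with axis-aligned boxes follows; the `Box` type, the tolerance value, and the convention that touching faces do not count as a collision are assumptions made for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box given by its minimum and maximum corners."""
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def boxes_overlap(a: Box, b: Box, tol: float = 1e-6) -> bool:
    """True if the boxes share positive volume; touching faces do not count."""
    return all(
        a.lo[i] < b.hi[i] - tol and b.lo[i] < a.hi[i] - tol
        for i in range(3)
    )


def colliding_pairs(boxes: dict[str, Box]) -> list[tuple[str, str]]:
    """Return every pair of part names whose bounding boxes overlap."""
    return [
        (name_a, name_b)
        for (name_a, box_a), (name_b, box_b) in combinations(boxes.items(), 2)
        if boxes_overlap(box_a, box_b)
    ]


parts = {
    "body": Box((0, 0, 0), (1, 1, 1)),
    "wheel": Box((1, 0, 0), (2, 1, 1)),              # touches body, no overlap
    "axle": Box((0.5, 0.5, 0.5), (1.5, 0.6, 0.6)),   # pierces both body and wheel
}
print(colliding_pairs(parts))
```

The pairwise scan is quadratic in the number of parts, which is acceptable for small machines; larger assemblies would typically use a spatial index instead.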
6. Reinforcement Learning Enhancement
To address the brittleness of purely generative LLM approaches, reinforcement learning (RL) strategies are employed (Zhang et al., 16 Oct 2025). The process is as follows:
- Dataset Curation: A cold-start dataset comprising expert-level designs and associated reasoning chains is assembled.
- RL Finetuning: Models are trained with RL algorithms such as Group Relative Policy Optimization (GRPO), which optimizes expected reward under performance constraints; output diversity is preserved by avoiding early entropy collapse (e.g., via Pass@k advantage estimators).
- Iterative Feedback: Simulation-based reward signals are used to further shape the generative distribution, improving the proportion of valid, high-performing machines over training epochs even when the prompt structure is kept fixed.
Results indicate that RL-driven fine-tuning closes part of the performance gap for open-source models, enhancing output validity and reward attainment through trial-based correction.
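The core idea behind GRPO's learning signal is that each sampled design is scored relative to the other designs sampled for the same prompt. The sketch below shows only that group-relative normalization, not a full training loop; the function name, the `eps` value, and the use of the population standard deviation are assumptions, and implementations differ on such details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumed choice
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Simulation rewards for four designs sampled from the same task prompt:
# two invalid machines (reward 0) and two valid ones.
rewards = [0.0, 0.0, 4.0, 8.0]
print(group_relative_advantages(rewards))
```

When all samples in a group receive the same reward, every advantage is zero and the group contributes no gradient, which is one reason sparse validity rewards make training difficult.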
7. Open Problems and Future Directions
Despite progress, compositional machine design via agentic LLM workflows remains a challenging domain:
- Error Propagation: Even minor spatial reasoning errors can produce critical failures in physical function, highlighting the importance of robust constraint satisfaction and error detection at every design stage.
- Plan–Implementation Fidelity: Misalignment often arises between the high-level blueprint (as articulated in the LLM’s chain-of-thought) and the low-level execution in structured output, reducing effective transfer of design logic to functional assemblies.
- Multimodal Integration: Combining language-based plans with multimodal reasoning (incorporating vision or diagnostic feedback from simulation traces) could enhance part placement, mechanical reliability, and error correction.
- Hierarchical and Modular Design: More principled hierarchical decomposition and compositional strategies—potentially integrating program synthesis, symbolic reasoning, or hybrid methods—are posited as promising avenues to increase the scalability and correctness of agentic design.
- Reward Shaping and Exploration: Techniques from exploration-aware RL, curriculum design, and reward specification are likely critical for scaling to more complex, real-world design tasks where the combinatorial design space and sparse reward signals pose ongoing limitations.
References
- "Agentic Design of Compositional Machines" (Zhang et al., 16 Oct 2025)
This synthesis consolidates the current state of research in agentic compositional machine design, highlighting the integration of LLM-based planning, the utility of structured simulation benchmarks, the necessity of RL augmentation, and the persistent challenges in deploying fully autonomous, robust, and generalizable designer agents in physically grounded synthesis domains.