- The paper introduces an evolutionary framework combining LLMs, VLMs, and RLMs to generate parametric CAD code from natural language prompts.
- It employs a population-based optimization process with selection, crossover, and mutation, guided by VLM and RLM rankings, which improves topology metrics derived from the Euler characteristic.
- Experimental results on the CADPrompt benchmark demonstrate better spatial and topological fidelity than the 3D-Premise and CADCodeVerify baselines.
EvoCAD: Evolutionary CAD Code Generation with Vision LLMs
Introduction
EvoCAD introduces a novel framework for generating computer-aided design (CAD) objects via symbolic code representations, leveraging the synergy between LLMs, vision LLMs (VLMs), and evolutionary optimization. The method addresses the challenge of text-to-CAD generation, where the goal is to produce parametric CAD code from natural language prompts, enabling efficient, high-fidelity 3D object creation. Unlike prior approaches that rely on single-step LLM refinement or human-in-the-loop feedback, EvoCAD employs a population-based evolutionary strategy, integrating VLMs and reasoning LLMs (RLMs) for fitness evaluation and selection. The paper also introduces two topology-based metrics, derived from the Euler characteristic, to assess semantic similarity between generated and ground truth objects, complementing traditional spatial metrics.
Figure 1: Visual illustration of the EvoCAD pipeline, showing initialization from a user prompt, population generation via LLMs, and iterative evolutionary optimization with VLM and RLM feedback.
Methodology
Given a natural language prompt $p$ describing a target object $O$, the objective is to generate CAD code $c$ such that the rendered object $\hat{O} = \phi(c)$, where $\phi$ executes and renders the code, is both geometrically and semantically aligned with $O$. The ground truth object is not available during generation, so candidates must be evaluated solely on how well they align with the prompt.
Evolutionary Generation Pipeline
The EvoCAD pipeline consists of the following stages:
- Initialization: An LLM samples M diverse CADQuery code candidates using the prompt and k randomly selected few-shot examples from CADQuery documentation. Diversity is enforced via non-zero temperature and varied few-shot contexts. Non-compilable code is self-debugged using error messages fed back to the LLM.
- Evaluation: Each candidate is rendered into multiview images. A VLM generates textual descriptions of the objects using chain-of-thought prompting. An RLM then ranks the candidates by comparing these descriptions to the prompt and to each other, performing three independent rankings and averaging the results.
- Selection and Crossover: Rankings are transformed into an exponential probability distribution over candidates, from which parent pairs are sampled. For each selected pair, the LLM receives both parents' CAD code and descriptions together with the prompt, and generates an offspring that combines the parents' strengths while mitigating their weaknesses.
- Mutation: Each offspring is mutated with probability $p_m$ by prompting the LLM to refine and improve its code.
- Elitism: The top-ranked candidate is carried forward unmodified to preserve the best solution.
This process is iterated for N generations, with each cycle expected to improve population fitness.
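The generational loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the model calls are replaced by hypothetical callables (`generate`, `rank`, `crossover`, `mutate`), the rate `decay` of the exponential selection distribution is an assumed parameter because the text does not give it, and parents are sampled with replacement. A toy integer-matching problem stands in for CAD code so the sketch runs without any model.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical stand-ins for the model calls: generate (LLM sampling),
# rank (VLM description + RLM ranking), crossover and mutate (LLM edits).
Generate = Callable[[random.Random], str]
Rank = Callable[[Sequence[str]], List[int]]  # rank per candidate, 0 = best
Crossover = Callable[[str, str], str]
Mutate = Callable[[str], str]


def average_ranks(pop: Sequence[str], rank: Rank, n_rankings: int = 3) -> List[float]:
    """Average several independent rankings (lower is better)."""
    totals = [0.0] * len(pop)
    for _ in range(n_rankings):
        for i, r in enumerate(rank(pop)):
            totals[i] += r
    return [t / n_rankings for t in totals]


def selection_probs(avg_ranks: Sequence[float], decay: float = 1.0) -> List[float]:
    """Exponential weighting (assumed form): p_i proportional to exp(-decay * rank_i)."""
    weights = [math.exp(-decay * r) for r in avg_ranks]
    total = sum(weights)
    return [w / total for w in weights]


def evolve(generate: Generate, rank: Rank, crossover: Crossover, mutate: Mutate,
           pop_size: int = 6, generations: int = 4, p_m: float = 0.5,
           seed: int = 0) -> str:
    rng = random.Random(seed)
    pop = [generate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranks = average_ranks(pop, rank)
        probs = selection_probs(ranks)
        best = min(range(pop_size), key=ranks.__getitem__)
        children = [pop[best]]  # elitism: carry the top candidate over unchanged
        while len(children) < pop_size:
            # Sampled with replacement, so a parent may be paired with itself.
            a, b = rng.choices(pop, weights=probs, k=2)
            child = crossover(a, b)
            if rng.random() < p_m:
                child = mutate(child)
            children.append(child)
        pop = children
    final = average_ranks(pop, rank)
    return pop[min(range(pop_size), key=final.__getitem__)]


# Toy problem (hypothetical): "code" is an integer string and the target is 42.
def toy_generate(rng: random.Random) -> str:
    return str(rng.randint(0, 100))


def toy_rank(pop: Sequence[str]) -> List[int]:
    order = sorted(range(len(pop)), key=lambda i: abs(int(pop[i]) - 42))
    ranks = [0] * len(pop)
    for position, i in enumerate(order):
        ranks[i] = position
    return ranks


def toy_crossover(a: str, b: str) -> str:
    return str((int(a) + int(b)) // 2)


def toy_mutate(c: str) -> str:
    v = int(c)
    return str(v + (1 if v < 42 else -1 if v > 42 else 0))  # step toward 42


if __name__ == "__main__":
    print(evolve(toy_generate, toy_rank, toy_crossover, toy_mutate))
```

Because the elite is always copied forward, the top-ranked candidate is never lost between generations; with a noisy VLM/RLM judge, "top-ranked" is only as reliable as the averaged rankings, which is one motivation for averaging three independent rankings.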
Evaluation Metrics
Traditional metrics such as Point Cloud Distance (PCD), Hausdorff Distance (HDD), Intersection over Union (IoU), and Dice Coefficient (DSC) are used to assess geometric and volumetric alignment. However, these do not capture semantic or structural similarity. EvoCAD introduces two topology-based metrics:
- Topology Error (Terr): Absolute difference in Euler characteristic between ground truth and generated object.
- Topology Correctness (Tcorr): Indicator function for exact match of Euler characteristic, quantifying the percentage of topologically correct samples.
These metrics are computed on watertight objects after normalization and ICP alignment.
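The metrics above can be sketched in plain Python. This is an illustrative sketch, not the benchmark's evaluation code: it assumes meshes and point clouds have already been normalized and ICP-aligned, computes the Euler characteristic of a triangle surface mesh as $\chi = V - E + F$, and assumes PCD is a symmetric Chamfer-style mean nearest-neighbour distance, which may differ from CADPrompt's exact definition. All function names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def euler_characteristic(faces: Iterable[Tuple[int, int, int]]) -> int:
    """chi = V - E + F for a triangle surface mesh given as vertex-index triples."""
    faces = list(faces)
    vertices = {v for face in faces for v in face}
    edges = set()
    for a, b, c in faces:
        for u, w in ((a, b), (b, c), (c, a)):
            edges.add((min(u, w), max(u, w)))  # store each undirected edge once
    return len(vertices) - len(edges) + len(faces)


def topology_metrics(chi_gt: Sequence[int], chi_gen: Sequence[int]) -> Tuple[float, float]:
    """Mean T_err (|chi_gt - chi_gen|) and T_corr (% of exact chi matches)."""
    diffs = [abs(g - p) for g, p in zip(chi_gt, chi_gen)]
    t_err = sum(diffs) / len(diffs)
    t_corr = 100.0 * sum(d == 0 for d in diffs) / len(diffs)
    return t_err, t_corr


def point_cloud_distances(p: Sequence[Point], q: Sequence[Point]) -> Tuple[float, float]:
    """Symmetric Chamfer-style mean distance and Hausdorff distance (brute force)."""
    p_to_q = [min(math.dist(x, y) for y in q) for x in p]
    q_to_p = [min(math.dist(y, x) for x in p) for y in q]
    chamfer = 0.5 * (sum(p_to_q) / len(p_to_q) + sum(q_to_p) / len(q_to_p))
    hausdorff = max(max(p_to_q), max(q_to_p))
    return chamfer, hausdorff


def iou_and_dice(a: Set[Voxel], b: Set[Voxel]) -> Tuple[float, float]:
    """Volumetric IoU and Dice coefficient on sets of occupied voxels."""
    inter = len(a & b)
    return inter / len(a | b), 2 * inter / (len(a) + len(b))


if __name__ == "__main__":
    tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    print(euler_characteristic(tetrahedron))  # closed sphere-like surface
```

A closed tetrahedron surface (4 vertices, 6 edges, 4 faces) gives $\chi = 2$, while a triangulated torus gives $\chi = 0$; this difference of 2 per hole is exactly what $T_{err}$ and $T_{corr}$ are designed to catch and what volumetric overlap scores can miss.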
Experimental Results
Optimization Behavior
EvoCAD was evaluated on the CADPrompt benchmark using GPT-4V and GPT-4o as LLM/VLMs, and o3-mini as the RLM. Population size was set to 6, with 4 generations and a mutation rate of 50%. The method was compared against 3D-Premise and CADCodeVerify baselines.
(Figure 2)
Figure 2: Evolution of topology correctness (Tcorr) and topology error (Terr) across generations, demonstrating consistent improvement and surpassing baselines after the second generation.
EvoCAD shows monotonic improvement in both Tcorr and Terr, outperforming baselines after two generations. GPT-4o yields higher and more consistent topology metrics, while GPT-4V exhibits greater stability in spatial metrics.
Quantitative Comparison
| Method | Tcorr (%) ↑ | Terr ↓ | PCD ↓ | HDD ↓ | IoU (%) ↑ | DSC (%) ↑ |
|---|---|---|---|---|---|---|
| 3D-Premise | 79.9 | 0.579 | 0.0660 | 0.189 | 68.2 | 77.6 |
| CADCodeVerify | 80.5 | 0.629 | 0.0628 | 0.182 | 69.8 | 79.0 |
| EvoCAD-4v | 82.4 (2.3) | 0.446 (0.069) | 0.0626 (0.0007) | 0.180 (0.002) | 69.7 (0.4) | 79.1 (0.2) |
| EvoCAD-4o | 87.2 (0.4) | 0.410 (0.013) | 0.0617 (0.0020) | 0.177 (0.004) | 69.9 (1.0) | 79.4 (0.8) |
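EvoCAD-4o achieves the best mean value on every metric, although its IoU lead over CADCodeVerify is marginal (69.9% vs. 69.8%); EvoCAD-4v is slightly below CADCodeVerify on IoU (69.7%). The improvement is most pronounced on the topology metrics: EvoCAD-4o raises Tcorr from 80.5% (CADCodeVerify) to 87.2% and lowers Terr from 0.579 (3D-Premise) to 0.410.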
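The comparison can be checked mechanically from the table's mean values (standard deviations omitted). The short script below, a convenience sketch rather than part of the paper, identifies the best method per metric and the relative gain of a method over the stronger of the two baselines, respecting whether each metric is higher- or lower-is-better.

```python
from typing import Dict, Tuple

# Mean values from the comparison table; True means higher is better.
TABLE: Dict[str, Tuple[bool, Dict[str, float]]] = {
    "Tcorr (%)": (True, {"3D-Premise": 79.9, "CADCodeVerify": 80.5, "EvoCAD-4v": 82.4, "EvoCAD-4o": 87.2}),
    "Terr": (False, {"3D-Premise": 0.579, "CADCodeVerify": 0.629, "EvoCAD-4v": 0.446, "EvoCAD-4o": 0.410}),
    "PCD": (False, {"3D-Premise": 0.0660, "CADCodeVerify": 0.0628, "EvoCAD-4v": 0.0626, "EvoCAD-4o": 0.0617}),
    "HDD": (False, {"3D-Premise": 0.189, "CADCodeVerify": 0.182, "EvoCAD-4v": 0.180, "EvoCAD-4o": 0.177}),
    "IoU (%)": (True, {"3D-Premise": 68.2, "CADCodeVerify": 69.8, "EvoCAD-4v": 69.7, "EvoCAD-4o": 69.9}),
    "DSC (%)": (True, {"3D-Premise": 77.6, "CADCodeVerify": 79.0, "EvoCAD-4v": 79.1, "EvoCAD-4o": 79.4}),
}
BASELINES = ("3D-Premise", "CADCodeVerify")


def best_method(metric: str) -> str:
    higher, scores = TABLE[metric]
    pick = max if higher else min
    return pick(scores, key=scores.get)


def relative_gain(metric: str, method: str = "EvoCAD-4o") -> float:
    """Relative improvement of `method` over the strongest baseline (positive = better)."""
    higher, scores = TABLE[metric]
    pick = max if higher else min
    base = pick(scores[b] for b in BASELINES)
    return (scores[method] - base) / base if higher else (base - scores[method]) / base


for metric in TABLE:
    print(f"{metric:10s} best={best_method(metric):14s} gain={relative_gain(metric):+.1%}")
```

Run on these values, EvoCAD-4o comes out best on all six metrics, with roughly a 29% relative reduction in Terr but only about a 0.1% relative gain in IoU, which quantifies how much larger the topological gains are than the spatial ones.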
Qualitative Analysis
(Figure 3)
Figure 3: Qualitative comparison of generated objects for representative prompts, showing EvoCAD's superior adherence to prompt semantics and topological correctness (Euler characteristic χ).
EvoCAD consistently generates objects with correct topological features (e.g., number of holes), as indicated by matching Euler characteristics. Prior methods frequently miss or misplace holes, leading to lower Tcorr and higher Terr. EvoCAD also demonstrates robustness in generating diverse and complex CAD objects.
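Why counting holes shows up in the Euler characteristic follows from a standard topological identity. The worked example below assumes each object is watertight, has a single connected boundary surface, and contains no internal voids, so its boundary is a closed orientable surface of genus $g$ (the number of through-holes):

```latex
\begin{aligned}
\chi &= V - E + F = 2 - 2g, \qquad g = \text{number of through-holes (genus)} \\
g = 0,\,1,\,2 &\;\Rightarrow\; \chi = 2,\,0,\,-2 \\
T_{\mathrm{err}} &= \lvert \chi_{\mathrm{gt}} - \chi_{\mathrm{gen}} \rvert = \lvert -2 - 0 \rvert = 2
\quad \text{(two holes requested, one generated)}
\end{aligned}
```

Such a sample also counts as incorrect for $T_{corr}$. For objects with several disconnected bodies the contributions add, $\chi = \sum_i (2 - 2g_i)$, so a missing or extra hole-free component likewise shifts $\chi$ by 2.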
Implications and Future Directions
EvoCAD demonstrates that evolutionary optimization, guided by VLM and RLM feedback, can significantly enhance the semantic and topological fidelity of text-to-CAD generation. The introduction of topology-based metrics addresses a critical gap in evaluating structural similarity, which is essential for engineering and manufacturing applications. The method is model-agnostic and benefits from advances in LLM/VLM architectures, as evidenced by superior results with GPT-4o.
A current limitation is computational cost: every generation requires many LLM, VLM, and RLM API calls, so cost grows with both population size and the number of generations. As inference costs decrease and more efficient models emerge, larger-scale evolutionary optimization should become feasible, which may enable further improvements in object complexity and fidelity.
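To make the cost argument concrete, the function below gives a rough expected call count for one run. The call structure is an assumption for illustration only, not taken from the paper: one LLM call per initial candidate, one VLM description per candidate plus three RLM rankings per evaluation, one crossover call per non-elite offspring, an expected $p_m$ mutation calls per offspring, and a final evaluation after the last generation. Self-debugging calls are ignored, so real counts would be higher.

```python
def expected_api_calls(pop_size: int = 6, generations: int = 4, p_m: float = 0.5,
                       n_rankings: int = 3, elites: int = 1) -> float:
    """Rough expected number of model calls for one run (self-debugging ignored)."""
    init = pop_size                    # one LLM sample per initial candidate
    evaluate = pop_size + n_rankings   # one VLM description each + RLM rankings
    offspring = pop_size - elites      # children needed to refill the population
    reproduce = offspring * (1 + p_m)  # one crossover call each + expected mutations
    # Evaluate and reproduce once per generation, then evaluate the final population.
    return init + generations * (evaluate + reproduce) + evaluate


print(expected_api_calls())             # experimental setting: 6 candidates, 4 generations, 50% mutation
print(expected_api_calls(pop_size=12))  # doubling the population
```

Under these assumptions the experimental setting needs about 81 calls per prompt, and doubling the population to 12 raises that to about 153, so cost scales roughly linearly with population size and with the number of generations.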
Conclusion
EvoCAD presents a robust framework for evolutionary CAD code generation, integrating LLMs, VLMs, and RLMs for population-based optimization. The method achieves superior performance on both spatial and topological metrics, particularly in generating objects with correct structural properties. The topology-based evaluation metrics introduced in this work provide a more nuanced assessment of semantic similarity, complementing existing geometric measures. Future work should focus on scaling evolutionary optimization and extending the approach to more complex design domains, leveraging ongoing advances in multimodal AI models.