EvoCAD: Evolutionary CAD Code Generation with Vision Language Models

Published 13 Oct 2025 in cs.CV, cs.AI, and cs.NE | (2510.11631v1)

Abstract: Combining large language models (LLMs) with evolutionary computation algorithms is a promising research direction that pairs the generative and in-context learning capabilities of LLMs with the strengths of evolutionary algorithms. In this work, we present EvoCAD, a method for generating computer-aided design (CAD) objects through their symbolic representations using vision language models and evolutionary optimization. Our method samples multiple CAD objects, which are then optimized using an evolutionary approach with vision language models and reasoning language models. We assess our method using GPT-4V and GPT-4o, evaluating it on the CADPrompt benchmark dataset and comparing it to prior methods. Additionally, we introduce two new metrics based on topological properties defined by the Euler characteristic, which capture a form of semantic similarity between 3D objects. Our results demonstrate that EvoCAD outperforms previous approaches on multiple metrics, particularly in generating topologically correct objects, which can be efficiently evaluated using our two novel metrics that complement existing spatial metrics.

Summary

  • The paper introduces an evolutionary framework combining LLMs, VLMs, and RLMs to generate parametric CAD code from natural language prompts.
  • It employs a population-based optimization process with selection, crossover, and mutation, and evaluates the results with two new topology metrics based on the Euler characteristic.
  • Experimental results on the CADPrompt benchmark demonstrate superior spatial and topological fidelity compared to established baseline methods.

Introduction

EvoCAD introduces a framework for generating computer-aided design (CAD) objects via symbolic code representations, combining large language models (LLMs), vision language models (VLMs), and evolutionary optimization. The method addresses text-to-CAD generation, where the goal is to produce parametric CAD code from natural language prompts, enabling efficient, high-fidelity 3D object creation. Unlike prior approaches that rely on single-step LLM refinement or human-in-the-loop feedback, EvoCAD employs a population-based evolutionary strategy, integrating VLMs and reasoning language models (RLMs) for fitness evaluation and selection. The paper also introduces two topology-based metrics, derived from the Euler characteristic, to assess semantic similarity between generated and ground truth objects, complementing traditional spatial metrics.

Figure 1: Visual illustration of the EvoCAD pipeline, showing initialization from a user prompt, population generation via LLMs, and iterative evolutionary optimization with VLM and RLM feedback.

Methodology

Problem Formulation

Given a natural language prompt $p$ describing a target object $O$, the objective is to generate CAD code $c$ such that the rendered object $\hat{O} = \phi(c)$ is both geometrically and semantically aligned with $O$. The ground truth object is not available during generation, so candidates must be evaluated solely on how well the rendered object aligns with the prompt.

Evolutionary Generation Pipeline

The EvoCAD pipeline consists of the following stages:

  • Initialization: An LLM samples $M$ diverse CadQuery code candidates using the prompt and $k$ randomly selected few-shot examples from the CadQuery documentation. Diversity is enforced via non-zero temperature and varied few-shot contexts. Non-compilable code is self-debugged using error messages fed back to the LLM.
  • Evaluation: Each candidate is rendered into multiview images. A VLM generates textual descriptions of the objects using chain-of-thought prompting. An RLM then ranks the candidates by comparing these descriptions to the prompt and to each other, performing three independent rankings and averaging the results.
  • Selection and Crossover: Rankings are transformed into an exponential probability distribution for parent selection. The LLM receives the CAD codes, descriptions, and prompt for selected pairs, and is tasked with generating offspring by combining strengths and mitigating weaknesses.
  • Mutation: Offspring are mutated with probability $p_m$ by prompting the LLM to refine and improve the code.
  • Elitism: The top-ranked candidate is carried forward unmodified to preserve the best solution.

This process is iterated for $N$ generations, with each cycle expected to improve population fitness.
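The stages above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `rank_fn`, `crossover_fn`, and `mutate_fn` are hypothetical callables standing in for the RLM ranking and the LLM crossover and mutation prompts, and the temperature `tau` in the exponential weighting is an assumed parameter (the summary only says that rankings are turned into an exponential probability distribution).

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # a candidate, e.g. a CAD code string


def average_ranks(rankings: Sequence[Sequence[int]]) -> List[float]:
    """Average several independent rankings; rankings[r][i] is the rank of candidate i (0 = best)."""
    n = len(rankings[0])
    return [sum(r[i] for r in rankings) / len(rankings) for i in range(n)]


def selection_probs(avg_ranks: Sequence[float], tau: float = 1.0) -> List[float]:
    """Exponential weighting of ranks: a lower (better) rank gets a higher probability.
    The exact form and `tau` are assumptions made for this sketch."""
    weights = [math.exp(-r / tau) for r in avg_ranks]
    total = sum(weights)
    return [w / total for w in weights]


def evolve(
    population: List[T],
    rank_fn: Callable[[List[T]], List[int]],   # hypothetical stand-in for the RLM ranking
    crossover_fn: Callable[[T, T], T],         # hypothetical stand-in for the LLM crossover prompt
    mutate_fn: Callable[[T], T],               # hypothetical stand-in for the LLM mutation prompt
    generations: int,
    p_m: float,
    rng: random.Random,
) -> List[T]:
    for _ in range(generations):
        # Evaluation: three independent rankings, averaged per candidate.
        avg = average_ranks([rank_fn(population) for _ in range(3)])
        probs = selection_probs(avg)
        # Elitism: the top-ranked candidate is carried over unmodified.
        elite = population[min(range(len(population)), key=lambda i: avg[i])]
        children = [elite]
        while len(children) < len(population):
            # Parent selection; in this simplified sketch the two parents may coincide.
            a, b = rng.choices(population, weights=probs, k=2)
            child = crossover_fn(a, b)
            if rng.random() < p_m:
                child = mutate_fn(child)
            children.append(child)
        population = children
    # population[0] is the elite of the last evaluated generation.
    return population
```

Because the elite is copied forward unchanged, the best candidate found by the ranker is never lost between generations, even when crossover or mutation produces worse offspring.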

Evaluation Metrics

Traditional metrics such as Point Cloud Distance (PCD), Hausdorff Distance (HDD), Intersection over Union (IoU), and Dice Coefficient (DSC) are used to assess geometric and volumetric alignment. However, these do not capture semantic or structural similarity. EvoCAD introduces two topology-based metrics:

  • Topology Error ($T_{err}$): Absolute difference in Euler characteristic between ground truth and generated object.
  • Topology Correctness ($T_{corr}$): Indicator function for exact match of Euler characteristic, quantifying the percentage of topologically correct samples.

These metrics are computed on watertight objects after normalization and ICP alignment.
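As an illustration of the two metrics, the sketch below computes the Euler characteristic of a triangle mesh as $\chi = V - E + F$ and then aggregates $T_{err}$ and $T_{corr}$ over a set of (ground truth, generated) pairs. It assumes meshes are given as lists of vertex-index triples and that $T_{err}$ is averaged over the dataset; the function names are hypothetical, and the normalization and ICP alignment steps are not shown.

```python
from typing import Iterable, List, Sequence, Tuple

Face = Tuple[int, int, int]  # a triangle as three vertex indices


def euler_characteristic(faces: Iterable[Face]) -> int:
    """chi = V - E + F for a triangle mesh given as vertex-index triples."""
    vertices, edges, n_faces = set(), set(), 0
    for a, b, c in faces:
        vertices.update((a, b, c))
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))  # undirected edge, counted once
        n_faces += 1
    return len(vertices) - len(edges) + n_faces


def topology_metrics(pairs: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """pairs holds (chi_ground_truth, chi_generated) per sample.
    Returns (T_err averaged over samples, T_corr as a percentage).
    Averaging T_err over the dataset is an assumption of this sketch."""
    errors: List[int] = [abs(t - p) for t, p in pairs]
    t_err = sum(errors) / len(errors)
    t_corr = 100.0 * sum(e == 0 for e in errors) / len(errors)
    return t_err, t_corr


# A tetrahedron is a closed surface without holes: V=4, E=6, F=4.
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
chi_tetra = euler_characteristic(tetrahedron)

# Four samples, one of which has the wrong Euler characteristic.
t_err, t_corr = topology_metrics([(2, 2), (0, 2), (2, 2), (0, 0)])
```

Both metrics depend only on counts of vertices, edges, and faces, which is why they are cheap to evaluate compared with distance-based spatial metrics.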

Experimental Results

Optimization Behavior

EvoCAD was evaluated on the CADPrompt benchmark using GPT-4V and GPT-4o as LLM/VLMs, and o3-mini as the RLM. Population size was set to 6, with 4 generations and a mutation rate of 50%. The method was compared against 3D-Premise and CADCodeVerify baselines.

Figure 2: Evolution of topology correctness ($T_{corr}$) and topology error ($T_{err}$) across generations, demonstrating consistent improvement and surpassing baselines after the second generation.

EvoCAD improves monotonically on both topology metrics ($T_{corr}$ rises and $T_{err}$ falls), outperforming the baselines after two generations. GPT-4o yields higher and more consistent topology metrics, while GPT-4V exhibits greater stability in spatial metrics.

Quantitative Comparison

Method          $T_{corr}$ (%) ↑   $T_{err}$       PCD ↓             HDD ↓           IoU (%) ↑    DSC (%) ↑
3D-Premise      79.9               0.579           0.0660            0.189           68.2         77.6
CADCodeVerify   80.5               0.629           0.0628            0.182           69.8         79.0
EvoCAD-4v       82.4 (2.3)         0.446 (0.069)   0.0626 (0.0007)   0.180 (0.002)   69.7 (0.4)   79.1 (0.2)
EvoCAD-4o       87.2 (0.4)         0.410 (0.013)   0.0617 (0.0020)   0.177 (0.004)   69.9 (1.0)   79.4 (0.8)

EvoCAD-4o achieves the best result on every metric. EvoCAD-4v also outperforms both baselines on every metric except IoU, where it is marginally lower than CADCodeVerify (69.7 vs. 69.8). The improvement in topology metrics is particularly pronounced.
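The table can also be read mechanically. The short sketch below copies the mean values from the table and picks the best method per metric; the dictionary layout and the function name are illustrative choices, and $T_{err}$ is treated as lower-is-better because it is an error measure.

```python
# Mean values copied from the table above (values in parentheses omitted).
RESULTS = {
    "3D-Premise":    {"T_corr": 79.9, "T_err": 0.579, "PCD": 0.0660, "HDD": 0.189, "IoU": 68.2, "DSC": 77.6},
    "CADCodeVerify": {"T_corr": 80.5, "T_err": 0.629, "PCD": 0.0628, "HDD": 0.182, "IoU": 69.8, "DSC": 79.0},
    "EvoCAD-4v":     {"T_corr": 82.4, "T_err": 0.446, "PCD": 0.0626, "HDD": 0.180, "IoU": 69.7, "DSC": 79.1},
    "EvoCAD-4o":     {"T_corr": 87.2, "T_err": 0.410, "PCD": 0.0617, "HDD": 0.177, "IoU": 69.9, "DSC": 79.4},
}

# Direction of each metric; T_err is assumed lower-is-better since it is an error.
HIGHER_IS_BETTER = {"T_corr": True, "T_err": False, "PCD": False,
                    "HDD": False, "IoU": True, "DSC": True}


def best_per_metric(results, higher_is_better):
    """Return {metric: name of the method with the best mean value}."""
    best = {}
    for metric, higher in higher_is_better.items():
        pick = max if higher else min
        best[metric] = pick(results, key=lambda name: results[name][metric])
    return best


best = best_per_metric(RESULTS, HIGHER_IS_BETTER)
for metric, method in best.items():
    print(f"{metric}: {method}")
```

The margins on the spatial metrics are small relative to the values reported in parentheses, so the topology columns carry most of the separation between methods.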

Qualitative Analysis

Figure 3: Qualitative comparison of generated objects for representative prompts, showing EvoCAD's superior adherence to prompt semantics and topological correctness (Euler characteristic $\chi$).

EvoCAD consistently generates objects with correct topological features (e.g., number of holes), as indicated by matching Euler characteristics. Prior methods frequently miss or misplace holes, leading to lower $T_{corr}$ and higher $T_{err}$. EvoCAD also demonstrates robustness in generating diverse and complex CAD objects.
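The link between holes and the Euler characteristic can be made explicit with a standard identity. Assuming the object's boundary is a single closed, connected, orientable surface (one watertight solid with no internal cavities), $\chi$ depends only on the genus $g$, i.e. the number of through-holes:

```latex
\[
  \chi = V - E + F = 2 - 2g
\]
% Worked values under the assumption above:
%   solid block,                   g = 0:  \chi = 2
%   plate with one through-hole,   g = 1:  \chi = 0
%   plate with two through-holes,  g = 2:  \chi = -2
```

Under this assumption, a generated plate that misses one of two requested through-holes has $\chi = 0$ instead of $\chi = -2$, which gives a topology error of 2 for that sample and counts it as topologically incorrect.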

Implications and Future Directions

EvoCAD demonstrates that evolutionary optimization, guided by VLM and RLM feedback, can significantly enhance the semantic and topological fidelity of text-to-CAD generation. The introduction of topology-based metrics addresses a critical gap in evaluating structural similarity, which is essential for engineering and manufacturing applications. The method is model-agnostic and benefits from advances in LLM/VLM architectures, as evidenced by superior results with GPT-4o.

A current limitation is the computational cost associated with large populations and multiple generations, due to the high number of LLM API calls. As inference costs decrease and more efficient models emerge, larger-scale evolutionary optimization will become feasible, enabling further improvements in object complexity and fidelity.

Conclusion

EvoCAD presents a robust framework for evolutionary CAD code generation, integrating LLMs, VLMs, and RLMs for population-based optimization. The method achieves superior performance on both spatial and topological metrics, particularly in generating objects with correct structural properties. The topology-based evaluation metrics introduced in this work provide a more nuanced assessment of semantic similarity, complementing existing geometric measures. Future work should focus on scaling evolutionary optimization and extending the approach to more complex design domains, leveraging ongoing advances in multimodal AI models.
