CAD Code Generation Advances

Updated 9 March 2026

CAD code generation is an automated, data-driven process that produces symbolic, human-editable 2D/3D design models using domain-specific languages.
It integrates large language models, vision-language models, evolutionary strategies, and reinforcement learning to refine design code with topological and geometric precision.
The method supports diverse representations—such as CadQuery, token sequences, and JSON schemas—facilitating post-hoc editing, constraint-based optimization, and industrial applications.

Computer-Aided Design (CAD) code generation encompasses automated, data-driven methods for producing parametric, human-editable 2D or 3D design models as symbolic code sequences, typically expressed in CadQuery, FlexCAD-style token sequences, or JSON-based schemas, from input modalities including text, images, point clouds, or engineering drawings. This research area integrates large language models (LLMs), vision-language models (VLMs), multimodal alignment, evolutionary and reinforcement learning, and topological and geometric validation metrics to enable high-fidelity, editable, and controllable CAD model synthesis, with particular relevance to engineering design and industrial manufacturing.

1. Symbolic CAD Code Representations

CAD code generation relies on producing parametric, interpretable code—far more semantically informative than mesh representations. Three dominant symbolic formats have emerged:

CadQuery (Python DSL): Widely adopted in methods such as EvoCAD (Preintner et al., 13 Oct 2025), CAD-Coder (Guan et al., 26 May 2025), CME-CAD (Niu et al., 29 Dec 2025), and text-to-CadQuery pipelines (Xie et al., 10 May 2025), CadQuery expresses operations such as sketch, extrude, fillet, and boolean directly in a chainable Python API. The resulting programs are fully editable, compact, and human-readable, and they integrate tightly with LLMs proficient in code synthesis.

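import cadquery as cq

# Rectangular plate (50 x 30 x 5) with a through-hole of radius 3
result = (
    cq.Workplane("XY")
    .rect(50, 30)        # sketch the base rectangle
    .extrude(5)          # extrude it into a solid plate
    .faces(">Z")         # select the top face
    .workplane()         # start a new sketch on that face
    .circle(3)           # sketch the hole profile (radius 3)
    .cutThruAll()        # cut the profile through the whole solid
)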

Structured Text/Token Sequences: Approaches such as FlexCAD (Zhang et al., 2024) and SkexGen (Xu et al., 2022) represent the full sketch-extrude hierarchy linearly as token streams. These streams cover primitives (line, arc, circle), loops, faces, sketches, and extrusions, flattened into a single sequence that LLM-based architectures can infill.
Hierarchical and JSON Representations: Hierarchical Neural Coding (Xu et al., 2023) and CADmium (Govindarajan et al., 13 Jul 2025) introduce tree-like or JSON-based schemas capturing design at loop, profile, and solid levels. This supports multi-level control, local editing, and easy round-tripping between code and geometry.

This symbolic emphasis enables post-hoc editing, supports parametric sweeps and constraint-based optimization, and is essential to the integration of reasoning, multimodal alignment, and evolutionary operations.
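To make the relationship between the hierarchical and token-sequence formats concrete, the sketch below flattens a nested sketch-extrude description into a linear token stream. The schema, the tag names (`<sketch>`, `<loop>`, `<extrude>`), and the `flatten` helper are hypothetical illustrations; they are not the vocabularies used by FlexCAD, SkexGen, or CADmium.

```python
import json

# Hypothetical nested description: sketch -> loops -> curves, plus one extrusion.
model = {
    "sketch": {
        "loops": [
            {"curves": [
                {"type": "line", "start": [0, 0], "end": [50, 0]},
                {"type": "line", "start": [50, 0], "end": [50, 30]},
                {"type": "line", "start": [50, 30], "end": [0, 30]},
                {"type": "line", "start": [0, 30], "end": [0, 0]},
            ]},
            {"curves": [{"type": "circle", "center": [25, 15], "radius": 3}]},
        ]
    },
    "extrude": {"distance": 5},
}


def flatten(model):
    """Linearize the hierarchy into a flat list of string tokens."""
    tokens = ["<sketch>"]
    for loop in model["sketch"]["loops"]:
        tokens.append("<loop>")
        for curve in loop["curves"]:
            tokens.append(curve["type"])
            for key, value in curve.items():
                if key == "type":
                    continue
                values = value if isinstance(value, list) else [value]
                tokens.extend(str(v) for v in values)
        tokens.append("</loop>")
    tokens.append("</sketch>")
    tokens += ["<extrude>", str(model["extrude"]["distance"]), "</extrude>"]
    return tokens


tokens = flatten(model)
print(json.dumps(model["extrude"]))  # JSON view of a single node
print(" ".join(tokens))              # linear view of the whole model
```

The nested form keeps each loop addressable for local edits, while the flat form is what a sequence model would consume; in this toy schema both carry the same information.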

2. Core Methodologies: Generative and Optimization Architectures

Recent advances have adopted several architectural paradigms:

Evolutionary-LLM Synergy: EvoCAD (Preintner et al., 13 Oct 2025) is a hybrid approach that maintains populations of CadQuery programs refined via evolutionary operators (initialization, fitness evaluation, crossover, mutation) orchestrated by LLMs and VLMs. Initialization uses few-shot LLM sampling. Fitness combines VLM-driven image captioning with reasoning LLMs that rank candidates by prompt alignment. Parent selection employs rank-based exponential probability. Crossover merges parent code using LLM in-context synthesis, while mutation applies parametric and structural perturbations.
Masked-Inpainting and Hierarchical Control: FlexCAD (Zhang et al., 2024) implements hierarchy-aware token masking, enabling not only sketch-level infilling but control over arbitrary semantic units, from single curves up to entire CAD objects. Users can edit or condition on any span by replacing it with [mask] tokens; the masked span is then reconstructed autoregressively.
Multi-Expert and Reinforcement Learning: CME-CAD (Niu et al., 29 Dec 2025) introduces a heterogeneous multi-expert paradigm: numerous large models independently generate chain-of-thought (CoT) traces and CadQuery code. Collaboration is incentivized via cross-expert KL divergence, hard-negative mining, and a composite reward structure (format, executability, geometric IoU, work-plane alignment). RL algorithms such as GRPO are employed for reward-guided fine-tuning.
Latent Diffusion and Bayesian Flow: GenCAD-3D (Yu et al., 17 Sep 2025) aligns CAD code and geometric (mesh/point cloud) latent embeddings through contrastive learning, then employs a conditional latent diffusion model to sample code sequences, further regularized via synthetic balancing. TGBFN (Zheng et al., 29 Oct 2025) leverages continuous, differentiable Bayesian flow in categorical simplex space to enable program generation with precise, continuous quantitative constraints.
Text-to-Code Pipelines: Text-to-CadQuery (Xie et al., 10 May 2025) and CAD-Llama (Li et al., 7 May 2025) leverage the code-generation strengths of pretrained LLMs, using supervised fine-tuning on large collections of paired natural-language descriptions and CAD code. Chain-of-thought and structured prompt templates, as well as staged reinforcement learning with geometric rewards, are widely integrated for improved reasoning and geometric fidelity (Guan et al., 26 May 2025, Niu et al., 13 Aug 2025).
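The evolutionary loop described for EvoCAD can be sketched in a few lines. In the example below, all LLM/VLM calls are replaced by stubs: `fitness`, `crossover`, and `mutate` are hypothetical stand-ins that operate on a single numeric parameter rather than on CadQuery code. The weighting `exp(-rank / temperature)` is an assumed form, since the text above only states that selection is rank-based with exponential probability, and keeping the best individual unchanged (elitism) is an additional assumption.

```python
import math
import random


def rank_selection_probs(scores, temperature=1.0):
    """Rank-based exponential selection: the best score (rank 0) gets the
    largest weight. The exp(-rank / temperature) form is an assumption."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    weights = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        weights[idx] = math.exp(-rank / temperature)
    total = sum(weights)
    return [w / total for w in weights]


# Stubs standing in for LLM/VLM calls; a "program" is just an extrusion height.
def fitness(program, target=5.0):
    return -abs(program - target)            # higher is better


def crossover(a, b):
    return (a + b) / 2.0                     # an LLM would merge two code strings


def mutate(program, rng):
    return program + rng.uniform(-0.5, 0.5)  # parametric perturbation


def evolve(population, generations, rng):
    for _ in range(generations):
        scores = [fitness(p) for p in population]
        probs = rank_selection_probs(scores)
        elite = max(population, key=fitness)  # assumed elitism: keep the best
        children = [elite]
        while len(children) < len(population):
            a, b = rng.choices(population, weights=probs, k=2)
            children.append(mutate(crossover(a, b), rng))
        population = children
    return population


rng = random.Random(0)
final = evolve([1.0, 3.0, 8.0, 12.0], generations=20, rng=rng)
best = max(final, key=fitness)
print(f"best extrusion height: {best:.2f}")
```

Because selection depends only on rank, the scheme is insensitive to the scale of the fitness scores, which may be convenient when fitness comes from an LLM-produced ranking rather than a calibrated number.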

3. Evaluation Metrics: Geometric, Topological, and Semantic Correctness

CAD code generation models are evaluated using a broad set of criteria, including spatial, topological, and semantic metrics:

Spatial Metrics: Chamfer Distance (CD), Point Cloud Distance (PCD), Hausdorff Distance (HDD), and Volumetric Intersection-over-Union (IoU) are standard. They quantify the geometric proximity of generated and ground-truth models sampled as point clouds or meshes (Preintner et al., 13 Oct 2025, Alrashedy et al., 2024).
Topological Metrics: EvoCAD (Preintner et al., 13 Oct 2025) introduces Euler characteristic-based metrics:
- Topology Error ($T_{err} = |\chi(O) - \chi(\hat{O})|$)
- Topology Correctness ($T_{corr} = \mathbb{1}[\chi(O) = \chi(\hat{O})]$)

Additional indicators such as Discrete Mean Curvature Difference and Sphericity Discrepancy (CADmium (Govindarajan et al., 13 Jul 2025)) provide sensitivity to surface and structural characteristics overlooked by pure spatial overlap.

Executability and Validity: Fraction of output programs compiling to valid, watertight B-Rep solids is essential, especially for downstream editing (Xie et al., 10 May 2025, Guan et al., 26 May 2025).
Text-to-CAD Consistency: For text-prompted cases, alignment is measured using Boolean/instructional scores (“Ver-score”, “VLLM-score”), often with model ensemble voting (Zhang et al., 12 Jun 2025).
Human and VLM Judgments: Shape matching by VLMs and human assessments of realism and text alignment supplement quantitative evaluation (Govindarajan et al., 13 Jul 2025, Zhang et al., 2024).
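The spatial and topological metrics above can be illustrated with a small, dependency-free sketch. It assumes point sets given as coordinate tuples, solids given as sets of occupied voxel indices, and closed surface meshes given as faces of vertex indices. The Chamfer convention used here (sum of the two directional means of squared nearest-neighbor distances) is one of several in use, so values are not directly comparable with those reported in the cited papers.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance,
    summed over both directions (an assumed convention)."""
    def one_way(src, dst):
        return sum(
            min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
            for s in src
        ) / len(src)
    return one_way(a, b) + one_way(b, a)


def voxel_iou(a, b):
    """IoU of two solids given as sets of occupied voxel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def euler_characteristic(faces):
    """chi = V - E + F for a polygonal surface mesh."""
    vertices = {v for face in faces for v in face}
    edges = set()
    for face in faces:
        for i in range(len(face)):
            edges.add(frozenset((face[i], face[(i + 1) % len(face)])))
    return len(vertices) - len(edges) + len(faces)


def topology_error(faces_gt, faces_pred):
    return abs(euler_characteristic(faces_gt) - euler_characteristic(faces_pred))


def topology_correct(faces_gt, faces_pred):
    return int(euler_characteristic(faces_gt) == euler_characteristic(faces_pred))


tetra = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]            # chi = 2
cube = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
        (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]               # chi = 2
n = 3                                                           # 3 x 3 quad torus
torus = [(i * n + j, ((i + 1) % n) * n + j,
          ((i + 1) % n) * n + (j + 1) % n, i * n + (j + 1) % n)
         for i in range(n) for j in range(n)]                   # chi = 0

print(topology_error(cube, torus), topology_correct(cube, torus))
print(topology_error(tetra, cube), topology_correct(tetra, cube))
print(chamfer_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 1)]))
print(voxel_iou({(0, 0, 0), (1, 0, 0)}, {(1, 0, 0), (2, 0, 0)}))
```

The tetrahedron and cube differ geometrically but share an Euler characteristic of 2, while the torus (one through-hole) has 0. This is why the topological metrics complement, rather than replace, spatial overlap measures.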

4. Benchmarks, Datasets, and Comparative Results

Benchmarks such as CADPrompt (Alrashedy et al., 2024, Preintner et al., 13 Oct 2025) provide standardized text-to-CAD tasks with expert-annotated code and mesh pairs. Datasets integrating multi-modal alignments, e.g. CADExpert (images, code, CoT, and annotation (Niu et al., 29 Dec 2025)), FlexCAD (text-sequences (Zhang et al., 2024)), and GenCAD-Code (images and CadQuery (Doris et al., 20 May 2025)), are foundational.

Quantitative results have advanced rapidly:

Method	Topology Corr.	Chamfer Dist.	Validity	IoU	Notable Features
3D-Premise	79.9%	0.0660	–	68.2%	One-shot + VLM-refinement
CADCodeVerify	80.5%	0.0628	–	69.8%	Iterative VLM-LLM Q/A feedback loop
EvoCAD-4o	87.2%	0.0617	–	69.9%	Surpasses baseline topology by 6–7 pp
CME-CAD	–	1.00 (Mean CD)	98.3%	80.7%	Multi-expert RL, industrial drawings
CAD-Coder	–	0.17 (Med CD)	98.55%	–	CoT, RL with geometric reward
CAD-Judge	99%*	0.15	98.62%*	–	Compiler-as-Judge/Review, utility

(*Fine-/coarse primitive F1, DeepCAD test, (Zhou et al., 6 Aug 2025); see source for exact column dependencies.)

EvoCAD (Preintner et al., 13 Oct 2025) improves topology correctness ($T_{corr}$) over baselines (87.2% for EvoCAD-4o vs. 80.5% for CADCodeVerify, a gain of 6.7 percentage points), while maintaining or improving geometric fidelity.

5. Semantic Control, Editability, and Local Geometry Operations

Multiple works address the need for local, user-driven editing and semantic control:

Component- and Hierarchy-Aware Inpainting: FlexCAD (Zhang et al., 2024) and GeoCAD (Zhang et al., 12 Jun 2025) support mask-based local/component-level control—editing a loop, face, or sub-sketch without affecting global model structure. GeoCAD further introduces a comprehensive geometric-instruction captioning pipeline, systematically annotating and controlling 221K local sketch parts.
Hierarchical Control and Tree Codebooks: Architectures featuring multilevel codebooks (Xu et al., 2023, Xu et al., 2022) enable the fixing, swapping, or resampling of design components at arbitrary semantic granularity, e.g., modifying a loop's geometry while preserving global arrangement.
Chain-of-Thought and CoT Verification: The integration of explicit reasoning traces—either for planning multi-stage operations or for verifying compliance with textual instructions—boosts reasoning and geometric correctness. This has been validated both in ablation and aggregate outcomes (Guan et al., 26 May 2025, Niu et al., 13 Aug 2025, Niu et al., 29 Dec 2025).
Interactive Annotation and DXF/2D Support: Approaches such as CAD-Coder (He et al., 13 May 2025) extend beyond 3D to editable 2D sketches annotated with dimensions and tolerances, which supports manufacturing documentation.
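The mask-based local editing described above can be sketched on a toy token stream. The tag vocabulary and the helpers `find_units`, `mask_unit`, and `infill` are hypothetical, and the infilling model is replaced by a stub that returns a fixed span; a trained model would instead decode the masked span conditioned on the surrounding tokens.

```python
MASK = "[mask]"


def find_units(tokens, open_tag, close_tag):
    """Return (start, end) index pairs, end exclusive, for each tagged unit."""
    spans, start = [], None
    for i, tok in enumerate(tokens):
        if tok == open_tag:
            start = i
        elif tok == close_tag and start is not None:
            spans.append((start, i + 1))
            start = None
    return spans


def mask_unit(tokens, span):
    """Replace one semantic unit with a single mask token."""
    start, end = span
    return tokens[:start] + [MASK] + tokens[end:]


def infill(tokens, generate):
    """Replace each mask with the span proposed by the generator."""
    out = []
    for tok in tokens:
        out.extend(generate(out) if tok == MASK else [tok])
    return out


tokens = ("<sketch> <loop> line 0 0 50 0 line 50 0 50 30 line 50 30 0 30 "
          "line 0 30 0 0 </loop> <loop> circle 25 15 3 </loop> </sketch> "
          "<extrude> 5 </extrude>").split()

loops = find_units(tokens, "<loop>", "</loop>")
masked = mask_unit(tokens, loops[1])  # mask only the inner (hole) loop


def stub_generator(prefix):
    # Stub: ignores the prefix and returns a fixed loop with a larger radius.
    return "<loop> circle 25 15 6 </loop>".split()


edited = infill(masked, stub_generator)
print(" ".join(masked))
print(" ".join(edited))
```

Only the masked loop changes; the outer rectangle and the extrusion are carried over token for token, which is the property that makes this style of edit local.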

6. Limitations and Open Problems

Despite rapid advances, open challenges in CAD code generation persist:

Scaling Costs: EvoCAD (Preintner et al., 13 Oct 2025), CME-CAD (Niu et al., 29 Dec 2025), and similar pipelines are constrained by the inference cost of LLM/VLM models, especially for population-based or multi-expert schemes.
Watertightness and Robustness: Non-watertight mesh generation (e.g., from invalid CAD programs) complicates metric computation and downstream editing (Preintner et al., 13 Oct 2025, Govindarajan et al., 13 Jul 2025).
Structural Fidelity on Rare or Complex Topologies: Failure cases arise with unusual loop counts, rare geometric motifs, or operations such as fillet/chamfer/sweep missing from training vocabularies (Zhang et al., 2024, Doris et al., 20 May 2025).
Semantic Alignment and Prompt Sensitivity: Generated models may approximate textual or external geometric constraints without strictly enforcing them, and outputs can be sensitive to prompt phrasing (Govindarajan et al., 13 Jul 2025, Alrashedy et al., 2024).
Reward Hacking and Metric Plateaus: As RL-based pipelines optimize for geometric rewards, reward hacking and metric saturation can occur, limiting practical improvements (Guan et al., 26 May 2025, Zhou et al., 6 Aug 2025).

Ongoing work points toward larger-scale datasets, grammar-constrained code emission, lightweight on-prem or GPU-accelerated inference, constraint-solver integration, and end-to-end sketch/image-to-CAD pipelines with industrial editing fidelity.
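One inexpensive approximation of grammar-constrained emission is to check a generated program statically before executing it. The sketch below uses Python's standard `ast` module to verify that a CadQuery-style method chain only calls operations from an allow-list. The allow-list contents and the `check_program` helper are hypothetical and not taken from any of the cited systems, and a static check of this kind does not guarantee that the program produces a valid solid.

```python
import ast

# Hypothetical allow-list of chainable operations.
ALLOWED_CALLS = {
    "Workplane", "rect", "circle", "extrude", "faces",
    "workplane", "cutThruAll", "fillet", "chamfer",
}


def check_program(source):
    """Return a list of problems; an empty list means the check passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append("imports are not allowed in generated code")
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr
            elif isinstance(func, ast.Name):
                name = func.id
            else:
                name = "<dynamic call>"
            if name not in ALLOWED_CALLS:
                problems.append(f"call to '{name}' is not in the allow-list")
    return problems


good = 'result = Workplane("XY").rect(50, 30).extrude(5)'
bad = 'result = Workplane("XY").rect(50, 30).extrude(5); os.system("ls")'
print(check_program(good))
print(check_program(bad))
print(check_program("result = Workplane("))
```

A filter like this rejects malformed or out-of-vocabulary programs cheaply, leaving the more expensive kernel execution and watertightness checks for candidates that pass.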

7. Directions for Future Research

Emerging trends focus on:

Scalable, Locally Executable Models: To enable large populations (EvoCAD), multi-expert RL (CME-CAD), or real-time design editing (FlexCAD), research is shifting toward lighter, local LLMs and multimodal models (Preintner et al., 13 Oct 2025, Zhang et al., 2024).
Novelty Search, Niching, and Diversity Promotion: Evolutionary strategies benefiting from explicit novelty/niching remain underexplored for CAD code (Preintner et al., 13 Oct 2025).
Hybrid Search and Latent Space Sampling: Retrieval+diffusion hybrids (GenCAD-3D) and Bayesian flows (TGBFN) suggest new axes for conditional and constrained CAD code generation (Yu et al., 17 Sep 2025, Zheng et al., 29 Oct 2025).
Topological Consistency Guarantees: Euler-characteristic and watertightness checks are being extended to metric learning and structural validation (Preintner et al., 13 Oct 2025, Govindarajan et al., 13 Jul 2025).
Integration with Industrial CAD Systems: Pushing toward real-world adoption, future work includes plugins, STEP export fidelity, dynamic parameter optimization, and support for advanced operations and assemblies (Zhang et al., 2024, Niu et al., 29 Dec 2025).
Expanding Benchmark Coverage: Next-generation datasets will include more complex assemblies, free-form domains, and broader annotation of text/image/point/constraint modalities (Zhang et al., 12 Jun 2025, Xu et al., 2024).

The field of CAD code generation is rapidly converging on highly controllable, topologically consistent, and semantically aligned pipelines, supported by advanced LLM/VLM architectures, dataset curation, and comprehensive benchmarks. The symbolic code paradigm provides robust foundations for editability, constraint-driven design, and downstream integration, with continued advances predicated on scalable computation, multimodal alignment, and deeper integration of domain-specific geometric reasoning.