CAD-Coder: Automated Text-to-CAD Generation

Updated 6 February 2026
  • CAD-Coder is a system that programmatically generates parametric CAD models by converting text, images, or point clouds into executable, editable scripts.
  • It employs a multi-stage pipeline combining supervised fine-tuning, chain-of-thought planning, and reinforcement learning to optimize geometric accuracy and code validity.
  • Quantitative evaluations demonstrate that CAD-Coder significantly reduces geometric errors and invalid scripts, making it a robust tool for automated CAD model generation.

CAD-Coder is a system for programmatically generating or reverse engineering Computer-Aided Design (CAD) models—most commonly parametric 2D/3D models—directly from modality-conditioned inputs such as text, images, or point clouds, by outputting executable, editable CAD scripts or command sequences. The CAD-Coder paradigm emphasizes interpretable, parametric, and editable code representations (not meshes or voxels), direct integration with established CAD kernels, and principled learning or synthesis workflows for both generation and validation. Modern CAD-Coder frameworks leverage LLMs as code generators, often augmented with explicit geometric and logic-based validation mechanisms. This article presents a detailed examination of the CAD-Coder concept, methodologies, evaluation metrics, and open challenges, focusing on the canonical framework described in "CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward" (Guan et al., 26 May 2025).

1. CAD-Coder: Representation and Motivation

CAD-Coder reformulates the text-to-CAD problem as the generation of CadQuery scripts—a Python-based parametric design language. Each script is directly executable via the OpenCASCADE kernel, enabling a unified pathway from text to physically valid, editable 3D models. The motivation for this representation derives from several key requirements:

  • Parametric modeling: CadQuery exposes variables, functions, and high-level geometric constructors (e.g., .box(), .extrude(), .circle()), supporting reusable, easily modifiable designs.
  • Immediate geometric validation: Executing a script yields a mesh or explicit failure, enabling direct assessment of code validity without post-processing.
  • Rich modeling vocabulary: Beyond sketch–extrude primitives, CadQuery supports fillets, arrays, booleans, and coordinate transforms, accommodating complex assemblies.
  • LLM compatibility and interpretability: CadQuery’s Python-based DSL leverages pretrained LLMs' code generation strengths and provides readable, debuggable outputs.

This design aligns with the broader trend toward interpretable, parametric, and LLM-driven code generation for next-generation CAD workflows (Guan et al., 26 May 2025).
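The representation described above can be illustrated with a small hand-written script in the CadQuery style. The script is held as a string so this example runs without the cadquery package or the OpenCASCADE kernel installed; the part, its dimensions, and the helper function are invented for this sketch:

```python
# Illustrative CAD-Coder-style output: a parametric CadQuery script held as
# a string. Executing it needs the cadquery package (OpenCASCADE kernel);
# here we only construct and inspect the text, so the example runs anywhere.
script = """\
import cadquery as cq

# Parametric dimensions: editing these values regenerates the model.
width, depth, height = 40.0, 20.0, 10.0
hole_radius = 3.0

result = (
    cq.Workplane("XY")
    .box(width, depth, height)   # base plate
    .faces(">Z").workplane()
    .circle(hole_radius)         # sketch a hole profile on the top face
    .cutThruAll()                # boolean cut through the plate
)
"""

def uses_parametric_constructors(src: str) -> bool:
    """Crude check that the script uses high-level geometric constructors."""
    return ".box(" in src and ".circle(" in src

print(uses_parametric_constructors(script))  # True
```

Because the output is ordinary Python text, validity can be assessed simply by executing it: the kernel either produces a solid or raises, which is exactly the property the geometric reward exploits.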

2. Structure of the CAD-Coder Pipeline

The canonical CAD-Coder framework consists of a multi-stage learning pipeline, with explicit geometric and format-aware rewards, and a chain-of-thought (CoT) prompting scheme for improved procedural reasoning.

2.1 Supervised Fine-Tuning (SFT)

CAD-Coder initially performs full-parameter fine-tuning of an LLM backbone (e.g., Qwen2.5-7B-Instruct) on a high-quality dataset of paired text and CadQuery script exemplars:

$$\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(L,\, C_{gt}) \sim \mathcal{D}_{\mathrm{SFT}}} \sum_{t=1}^{|C_{gt}|} \log \pi_\theta\bigl(c_t \mid c_{<t},\, L\bigr)$$

This stage teaches the model both the CadQuery syntax and the direct mapping from natural language primitives (e.g., "draw a circle") to canonical scripting functions (e.g., .circle()).
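The SFT objective is an ordinary token-level negative log-likelihood over the ground-truth script. A minimal sketch, with hypothetical per-token probabilities standing in for the model's outputs:

```python
import math

def sft_loss(token_probs):
    """Token-level negative log-likelihood: the sum over script tokens of
    -log pi_theta(c_t | c_<t, L), as in the SFT objective."""
    return -sum(math.log(p) for p in token_probs)

# Hypothetical probabilities the model assigns to each token of a
# ground-truth CadQuery script C_gt.
probs = [0.9, 0.8, 0.95]
loss = sft_loss(probs)   # small, since every token is likely
```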

2.2 Reinforcement Learning with Group Reward Policy Optimization (GRPO)

After SFT, reinforcement learning is performed using a group policy-gradient approach. For each prompt (with CoT), multiple CadQuery script candidates are sampled; each candidate is executed and its mesh compared to ground truth using Chamfer Distance (CD):

$$\mathrm{CD}(P, Q) = \frac{1}{|P|}\sum_{x \in P}\min_{y \in Q}\|x - y\|_2^2 + \frac{1}{|Q|}\sum_{y \in Q}\min_{x \in P}\|x - y\|_2^2$$
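The symmetric Chamfer Distance can be computed directly from two point sets. A brute-force O(|P|·|Q|) sketch; a real pipeline would sample points from the executed meshes and use a spatial index such as a KD-tree:

```python
def chamfer_distance(P, Q):
    """Symmetric Chamfer Distance between two point sets: mean squared
    distance to the nearest neighbour, accumulated in both directions."""
    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    term_pq = sum(min(sq(x, y) for y in Q) for x in P) / len(P)
    term_qp = sum(min(sq(x, y) for x in P) for y in Q) / len(Q)
    return term_pq + term_qp

cube = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
shifted = [(x + 1, y, z) for (x, y, z) in cube]
print(chamfer_distance(cube, cube))  # 0.0
```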

A geometric reward $R_i^{\mathrm{geo}}$ (based on CD thresholds and linear scaling) and a format reward $R_i^{\mathrm{fmt}}$ (requiring the presence of CoT and code blocks) are combined:

$$R_i = \lambda_{\mathrm{geo}}\, R_i^{\mathrm{geo}} + \lambda_{\mathrm{fmt}}\, R_i^{\mathrm{fmt}}$$

The GRPO loss applies a clipped advantage objective with a KL penalty:

$$\mathcal{L}_{\mathrm{GRPO}}(\theta) = \mathbb{E}_{L_{\mathrm{cot}},\, \{C_i\} \sim \pi_{\theta_{\mathrm{old}}}}\!\left[ \frac{1}{k}\sum_{i=1}^{k} \frac{1}{|C_i|}\sum_{t=1}^{|C_i|} \min\!\Bigl( r_{i,t}(\theta)\,\hat{A}_{i,t},\ \mathrm{clip}\bigl(r_{i,t}(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\hat{A}_{i,t} \Bigr) - \beta\, D_{\mathrm{KL}}\bigl(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\bigr) \right]$$

with standard notation for token probability ratios $r_{i,t}$, advantages $\hat{A}_{i,t}$, clipping parameter $\epsilon$, and KL regularization weight $\beta$.
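The paper does not spell out the exact CD-threshold scaling or the weights $\lambda_{\mathrm{geo}}$ and $\lambda_{\mathrm{fmt}}$, so the values below are illustrative. The sketch shows the reward combination and the GRPO-style within-group standardization of rewards into advantages:

```python
import statistics

def combined_reward(cd, has_cot, has_code,
                    cd_max=10.0, lam_geo=0.9, lam_fmt=0.1):
    """R_i = lam_geo * R_geo + lam_fmt * R_fmt. Here R_geo decays linearly
    in Chamfer Distance and is clipped at a hypothetical cd_max threshold;
    R_fmt is 1 only when both the CoT block and the code block are present.
    The exact scaling and weights in the paper may differ."""
    r_geo = max(0.0, 1.0 - cd / cd_max)
    r_fmt = 1.0 if (has_cot and has_code) else 0.0
    return lam_geo * r_geo + lam_fmt * r_fmt

def group_advantages(rewards):
    """GRPO-style group baseline: standardize the rewards of the k
    candidates sampled for one prompt."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Chamfer Distances of k = 4 executed candidate scripts for one prompt.
rewards = [combined_reward(cd, True, True) for cd in (0.2, 1.5, 9.0, 30.0)]
advs = group_advantages(rewards)   # best candidate gets the largest advantage
```

Standardizing within the group removes the need for a learned value baseline: candidates are only rewarded for being better than their siblings on the same prompt.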

2.3 Chain-of-Thought (CoT) Planning

Explicit chain-of-thought formatting requires the model to output an explicit <think> … </think> reasoning block, breaking down geometric and procedural planning, before script generation. Empirically, CoT reduces script invalidity by over 50% and biases the reward distribution toward highly accurate geometry.

Example CoT planning:

<think>
1. Decompose into Part A (cube) and Part B (triangular prism).
2. Plan coordinate systems and rotations.
3. Sketch loops for each part with scaling factors.
4. Extrude and union.
</think>

import cadquery as cq
r = partA.union(partB)
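The format reward that enforces this output shape can be checked mechanically. The exact checks used in CAD-Coder are not specified, so this regex-based version, requiring a <think>…</think> block followed by a fenced code block, is an assumption:

```python
import re

def format_reward(output: str) -> float:
    """Binary format reward: 1.0 when the model output contains both an
    explicit <think>...</think> planning block and a fenced code block;
    0.0 otherwise. (Illustrative; the paper's exact checks may differ.)"""
    has_cot = re.search(r"<think>.*?</think>", output, re.DOTALL) is not None
    has_code = re.search(r"```.*?```", output, re.DOTALL) is not None
    return 1.0 if (has_cot and has_code) else 0.0

good = "<think>\n1. Decompose parts.\n</think>\n```python\nimport cadquery as cq\n```"
bad = "```python\nimport cadquery as cq\n```"  # missing the CoT block
```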

3. Dataset Construction and Structure

The dataset underpinning CAD-Coder is derived from the Text2CAD corpus (178,000 samples), expanded and filtered via an automated annotation pipeline:

  • Each Text2CAD command sequence $S$ is converted to multiple CadQuery candidates using DeepSeek-V3.
  • Scripts failing execution are discarded; those producing valid meshes are compared via Chamfer Distance to the reference mesh.
  • The lowest-CD script is retained as $C_{gt}$ for each triplet $(L, C_{gt}, M_{gt})$.
  • The resulting corpus contains 110,000 validated triplets, partitioned by geometric quality (CD thresholds): 8K high-quality, 70K medium, 32K hard.
  • An additional CoT sub-dataset (1.5K) of hard cases is generated and manually refined.

This construction ensures rigorous semantic, syntactic, and geometric validation at data scale (Guan et al., 26 May 2025).
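The best-of-candidates selection in the pipeline above can be sketched as follows. Here `execute_and_measure` is a hypothetical stand-in for running a script through the OpenCASCADE kernel and comparing its mesh to the reference; it is backed by canned results so the sketch runs anywhere:

```python
# Canned results standing in for real execution: None marks a script whose
# execution failed; a float is the Chamfer Distance to the reference mesh.
CANNED = {
    "script_a": None,
    "script_b": 0.42,
    "script_c": 0.07,
}

def execute_and_measure(script, reference_mesh=None):
    """Hypothetical: execute the script, mesh the result, and return its
    Chamfer Distance to the reference mesh, or None on failure."""
    return CANNED[script]

def select_ground_truth(candidates, reference_mesh=None):
    """Discard failing scripts; keep the lowest-CD survivor as C_gt."""
    scored = [(execute_and_measure(c, reference_mesh), c) for c in candidates]
    valid = [(cd, c) for cd, c in scored if cd is not None]
    if not valid:
        return None  # no candidate survives; the sample is dropped
    return min(valid)[1]

best = select_ground_truth(["script_a", "script_b", "script_c"])
print(best)  # script_c
```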

4. Quantitative and Qualitative Evaluation

Evaluation combines geometric fidelity (Chamfer Distance), script validity (Invalidity Ratio), and ablation studies:

| Method | Mean CD ↓ | Median CD ↓ | IR (%) ↓ |
| --- | --- | --- | --- |
| GPT-4o | 133.5 | 45.9 | 93.0 |
| Text2CAD | 29.3 | 0.37 | 3.75 |
| CAD-Coder (full) | 6.54 | 0.17 | 1.45 |

Ablation findings:

  • SFT only: Mean CD 74.6, IR 5.3%
  • GRPO w/o CoT: Mean CD 17.3, IR 4.95%
  • Full (SFT + CoT + GRPO): Mean CD 6.54, IR 1.45%

Qualitatively, CAD-Coder generates complex assemblies and multi-operation geometries with a high degree of correspondence to ground truth. Compared to baselines, it dramatically reduces syntactic failures and significant geometric errors (Guan et al., 26 May 2025).

5. Practical Considerations and Model Deployment

  • Base model: Qwen2.5-7B-Instruct, full-parameter fine-tuned via DeepSpeed ZeRO-2.
  • Training regimen: SFT (3 epochs, 8K high-quality); initial CoT fine-tuning on 1.5K samples (2 epochs); GRPO RL on 150K total samples.
  • Generation/candidate selection: typically $k=8$ scripts per prompt; the best candidate is selected by inferred geometric reward.
  • Evaluation: Mean/median CD, Invalidity ratio, successful script rate. Model generalizes to high-complexity descriptions and rare compositions.
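The evaluation metrics reduce to simple aggregation over per-sample results. A sketch in which each entry is a Chamfer Distance, or None for a script that failed to execute (the data values are hypothetical):

```python
import statistics

def evaluate(cds):
    """Aggregate per-sample results into the reported metrics. Each entry
    is a Chamfer Distance, or None when the generated script failed to
    execute (those count toward the Invalidity Ratio)."""
    valid = [cd for cd in cds if cd is not None]
    return {
        "mean_cd": statistics.mean(valid),
        "median_cd": statistics.median(valid),
        "invalidity_ratio": 100.0 * (len(cds) - len(valid)) / len(cds),
    }

# Five hypothetical test samples; one script failed to execute.
results = evaluate([0.1, 0.2, 5.0, None, 0.3])
```

Reporting both mean and median CD matters because a few gross failures (like the 5.0 above) dominate the mean while leaving the median nearly unchanged.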

6. Limitations and Proposed Extensions

CAD-Coder exhibits several known limitations:

  • Residual spatial alignment errors in assemblies with multiple interacting parts.
  • Occasional confusion between extrusion and cut operations.
  • Susceptibility to sparse Chamfer Distance reward exploitation in thin-walled geometries and internal cavities.

Proposed future directions include:

  • Augmenting reward functions with metrics such as normal consistency and IoU.
  • Expanding input modalities, particularly for sketch/image/text fusion.
  • Enabling interactive, human-in-the-loop correction and iterative refinement.
  • Scaling frameworks to larger LLMs and lightweight adapters for on-device CAD assistants.

These priorities are aligned with the contemporary research trajectory across text-to-CAD and multimodal generation (Guan et al., 26 May 2025).

7. Significance for CAD Research and Engineering

CAD-Coder establishes a template for end-to-end, code-generating CAD automation, advancing the field across accuracy, editability, and integration with standard CAD toolchains. By employing parametric, interpretable scripting (CadQuery) and integrating geometric, syntactic, and reasoning-aware supervision, it achieves a significant reduction in invalid or low-quality outputs relative to prior systems. The modular nature of the pipeline—separating SFT, RL, and CoT reasoning—enables flexible adaptation to emerging multimodal benchmarks and industrial CAD workflows. Its dataset construction pipeline and rigorous metric selection (including reward-based geometric similarity) set a standard for empirical evaluation in this domain (Guan et al., 26 May 2025).
