- The paper introduces an agentic LLM-based pipeline that generates scalable, interpretable CAD programs without relying on real construction-history data.
- It features a two-stage process that first creates diversified part descriptions and then synthesizes validated, executable CadQuery code following 19 design principles.
- The method demonstrates high geometric fidelity with quantitative improvements over existing datasets, enhancing CAD automation and geometric deep learning research.
Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data
Introduction and Motivation
The paper "Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data" (2604.24479) addresses the scarcity of large-scale datasets that record the full construction history of Computer-Aided Design (CAD) models, information that is critical for downstream generative and reconstruction tasks in geometric deep learning. Conventional datasets mostly provide boundary representations (B-Reps) or meshes and discard the procedural information, which limits interpretability and editability. Existing datasets that do include construction timelines are limited in scale or in operation vocabulary: they typically restrict themselves to sketch-and-extrude workflows and omit more advanced design operations such as Booleans, fillets, chamfers, shells, and lofts.
Zero-to-CAD leverages the domain knowledge latent in state-of-the-art LLMs. The central hypothesis is that LLMs, through exposure to engineering and manufacturing text corpora, hold substantial priors on mechanical part structure, and that these priors can be exploited to synthesize editable, executable parametric CAD timelines at scale even without real construction-history data. The paper formalizes CAD sequence synthesis as an agentic search problem: an LLM is deployed within an interactive, feedback-driven CAD environment equipped with tools for code execution and documentation lookup, which grounds generation in geometric validity and operation diversity.
Agentic Synthesis Pipeline
Zero-to-CAD’s architecture involves a two-stage generation protocol implemented atop a robust distributed infrastructure:
- Stage 1 generates semantically diversified part descriptions under structured categorization, leveraging an LLM to sample plausible mechanical specifications across 65 curated categories.
- Stage 2 conditions on these descriptions and synthesizes CadQuery code, with strict adherence to 19 design principles encoded within system prompts, ensuring parametric clarity, geometric complexity, and mechanical plausibility.
The agentic loop interleaves reasoning, tool usage, and iterative refinement, with model-driven invocation of execute-and-validate, documentation lookup, and syntax search tools. Code validation encompasses multi-stage geometric checks including connected solids, minimum B-Rep face count, positive volume, and successful STL/STEP exports, systematically filtering out degenerate or non-manufacturable specimens.
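The execute-and-validate loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `GeometryReport`, `validate`, `synthesize`, the `MIN_FACES` threshold, and the two stub callables are hypothetical names introduced here, and the LLM and CAD kernel are replaced by stand-ins so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class GeometryReport:
    """Hypothetical summary of what the CAD kernel produced for one program."""
    executed: bool      # did the program run without raising?
    solid_count: int    # number of disconnected solid bodies
    face_count: int     # number of B-Rep faces
    volume: float       # enclosed volume
    exported: bool      # did the STL/STEP export succeed?
    error: str = ""     # interpreter or kernel error message, if any


MIN_FACES = 6  # assumed threshold; the source does not state the actual value


def validate(report: GeometryReport) -> List[str]:
    """Return the failed checks; an empty list means the part is accepted."""
    if not report.executed:
        return [f"execution failed: {report.error}"]
    failures = []
    if report.solid_count != 1:
        failures.append(f"expected one connected solid, got {report.solid_count}")
    if report.face_count < MIN_FACES:
        failures.append(f"only {report.face_count} faces, need at least {MIN_FACES}")
    if report.volume <= 0.0:
        failures.append("volume is not positive")
    if not report.exported:
        failures.append("STL/STEP export failed")
    return failures


def synthesize(
    description: str,
    generate: Callable[[str, List[str]], str],
    execute: Callable[[str], GeometryReport],
    max_turns: int = 5,
) -> Tuple[Optional[str], int]:
    """Generate, execute, validate; feed failures back until valid or out of turns."""
    feedback: List[str] = []
    for turn in range(1, max_turns + 1):
        program = generate(description, feedback)
        failures = validate(execute(program))
        if not failures:
            return program, turn
        feedback = failures  # the next draft sees what went wrong
    return None, max_turns


def fake_generate(description: str, feedback: List[str]) -> str:
    # Stand-in for the LLM: the first draft is flawed, the repaired draft is valid.
    return "draft_v2" if feedback else "draft_v1"


def fake_execute(program: str) -> GeometryReport:
    # Stand-in for the CAD kernel: draft_v1 yields two disjoint bodies.
    if program == "draft_v1":
        return GeometryReport(executed=True, solid_count=2, face_count=14,
                              volume=120.0, exported=True)
    return GeometryReport(executed=True, solid_count=1, face_count=18,
                          volume=135.5, exported=True)


program, turns = synthesize("flanged bracket with four bolt holes",
                            fake_generate, fake_execute)
print(program, turns)  # draft_v2 2
```

Returning the failed checks as text, rather than a bare pass/fail flag, is what lets the next generation turn target the specific defect. Documentation lookup and syntax search would be further tools available to the model in the same loop and are omitted here.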
Figure 1: Exemplars from Zero-to-CAD show wide-ranging mechanical parts constructed through diverse, interpretable sequences using an LLM agent with real-time tool access.
Figure 2: Agentic code synthesis rollout demonstrates iterative CadQuery generation and error-driven repair until a geometrically valid sequence is produced.
Dataset Properties and Comparative Analysis
The resultant dataset comprises ~1M executable, readable, editable CAD construction histories, with a curated 100k subset selected via k-means clustering on DINOv3 visual embeddings for maximal coverage and diversity. Each sequence is accompanied by complete artifacts including code, meshes, and metadata, streamlining downstream evaluation and research accessibility.
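One way to picture the clustering-based subset selection is the sketch below. The source only says that k-means on DINOv3 embeddings was used; keeping the sample nearest each centroid, the farthest-point initialization, and the function names are assumptions made for illustration, and toy 2-D vectors stand in for the real embeddings.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all current ones.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def select_representatives(points: List[Vector], k: int) -> List[int]:
    """Assumed selection rule: keep the index of the sample nearest each centroid."""
    centroids = kmeans(points, k)
    picked = {
        min(range(len(points)), key=lambda i: sq_dist(points[i], c))
        for c in centroids
    }
    return sorted(picked)


# Toy "embeddings": three tight groups of three samples each.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
print(select_representatives(embeddings, k=3))  # [0, 3, 6]
```

Picking one sample per cluster spreads the subset across the embedding space instead of oversampling dense regions. That is the coverage-and-diversity goal the paragraph describes.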
Zero-to-CAD distinguishes itself from prior synthetic datasets (e.g., CAD-Recode, DeepCAD) in several dimensions:
- Semantic Interpretability: Programs are not mere transpiled coordinate chains but logically structured, human-editable parametric timelines with descriptive variables, constraints, and references.
- Operation Diversity: Full spectrum of mechanical part features captured beyond the sketch-and-extrude paradigm, including advanced operations essential for real-world engineering.
- Geometric Quality: Face count distributions align closely with real B-Rep datasets such as ABC, minimizing over-simplified geometries and disjoint bodies through strict validation.
- Distributional Alignment: Quantitative analysis via Fréchet distance and k-ball coverage confirms superior geometric alignment to ABC (Fréchet distance 0.164 vs. 0.268 for CAD-Recode; 57.2% vs. 45.3% coverage at k=5), supporting the claim of realistic diversity.
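The coverage figure above can be made concrete with a small sketch. The exact definition of k-ball coverage is not given in this summary, so the version below is an assumption. It follows the common k-nearest-neighbor coverage formulation: a reference sample counts as covered when at least one generated sample falls inside the ball whose radius is the distance to that reference sample's k-th nearest reference neighbor. The function names and the 1-D toy features are illustrative only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_ball_coverage(reference: List[Vector], generated: List[Vector], k: int = 5) -> float:
    """Fraction of reference samples whose k-NN ball contains a generated sample."""
    if len(reference) <= k:
        raise ValueError("need more than k reference samples")
    covered = 0
    for i, r in enumerate(reference):
        # Ball radius: distance to the k-th nearest *other* reference sample.
        neighbor_dists = sorted(
            dist(r, other) for j, other in enumerate(reference) if j != i
        )
        radius = neighbor_dists[k - 1]
        if any(dist(r, g) <= radius for g in generated):
            covered += 1
    return covered / len(reference)


# Toy 1-D features: the reference set has two groups, the generated set hits one.
reference = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
print(k_ball_coverage(reference, [(1.5,)], k=2))           # 0.5
print(k_ball_coverage(reference, [(1.5,), (11.5,)], k=2))  # 1.0
```

Under this reading, a higher value means the synthetic set reaches more regions of the reference distribution. The Fréchet distance reported alongside it is a separate measure and is not sketched here.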
Figure 3: Failure examples show typical generation issues such as thin features, misaligned holes, and globally incoherent compositions.
Figure 4: Visual comparison of Zero-to-CAD, ABC, DeepCAD, and CAD-Recode datasets highlights progression in complexity and parametric interpretability.
Bootstrapping CAD Sequence Generation: Image-to-Sequence Experiments
Zero-to-CAD’s utility as synthetic supervision is validated through an image-to-sequence task: reconstructing editable CadQuery programs from multi-view rendered images. The Qwen3-VL-2B-Instruct VLM is fully fine-tuned on the dataset and evaluated against GPT-5.2 and the base Qwen3-VL-2B.
- On Zero-to-CAD test data, the fine-tuned Qwen3-VL-2B achieves an 82.1% success rate, mean IoU 0.747, and median IoU 0.847, with the top decile reaching near-perfect overlap. This substantially outperforms the proprietary GPT-5.2 models (≤72.2% success, mean IoU ≈0.49).
- On out-of-distribution (OOD) ABC samples, the fine-tuned Qwen3-VL-2B achieves 61.0% success and 0.377 mean IoU, exceeding GPT-5.2 in geometric fidelity and indicating that training on synthetic data transfers to real, human-designed geometry.
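The reported metrics can be illustrated with a short sketch of how success rate and IoU statistics might be aggregated. The source does not say how shapes are discretized or how failed generations enter the averages. Here shapes are assumed to be voxelized into sets of integer cells, failures are recorded as `None`, and IoU statistics are taken over successful samples only. The helper names are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]


def voxel_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection over union of two occupied-voxel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def box(nx: int, ny: int, nz: int, origin: Voxel = (0, 0, 0)) -> Set[Voxel]:
    """Occupied voxels of an axis-aligned block, used here as a toy shape."""
    ox, oy, oz = origin
    return {
        (ox + x, oy + y, oz + z)
        for x in range(nx) for y in range(ny) for z in range(nz)
    }


def summarize(ious: List[Optional[float]]) -> Dict[str, float]:
    """Success rate over all samples; IoU statistics over successful ones (assumed)."""
    valid = [v for v in ious if v is not None]
    if not valid:
        return {"success_rate": 0.0, "mean_iou": 0.0, "median_iou": 0.0}
    return {
        "success_rate": len(valid) / len(ious),
        "mean_iou": mean(valid),
        "median_iou": median(valid),
    }


target = box(4, 4, 2)
results: List[Optional[float]] = [
    voxel_iou(target, box(4, 4, 2)),                    # exact reconstruction
    voxel_iou(target, box(4, 4, 2, origin=(2, 0, 0))),  # right shape, shifted
    voxel_iou(target, box(4, 4, 1)),                    # half the height
    None,                                               # program failed to execute
]
stats = summarize(results)
print(stats["success_rate"], stats["median_iou"])  # 0.75 0.5
```

Reporting the median next to the mean, as the results above do, separates a few badly misaligned reconstructions from the typical case.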
Figure 5: Overview of Image-to-Sequence task, mapping rendered CAD views to executable operation sequences.
Figure 6: Qualitative reconstruction comparison on ABC samples. The fine-tuned model achieves higher fidelity to ground truth than the GPT-5.2 baseline.
Implications and Future Directions
Zero-to-CAD substantially lowers the barrier for training and evaluating CAD sequence models, enabling progress without dependence on proprietary timelines or real-world construction history datasets. It directly amplifies research in interpretable 3D generation, parametric shape editing, and conditional modeling (text, image, point cloud). Strong numerical results validate the premise that multimodal models can be bootstrapped using synthetic data, achieving reconstructive performance competitive with frontier LLMs and demonstrating meaningful generalization to human-designed geometries.
From a theoretical perspective, the study underscores the efficacy of agentic search protocols as a scalable mechanism for extracting actionable priors from large pretrained models. Practical implications include the facilitation of downstream design automation tasks, parametric shape retrieval, and controllable CAD generation. However, broader challenges remain: synthetic data provenance, attribution, and the ultimate upper bound of synthetic-to-real transfer in high-fidelity engineering applications.
Conclusion
Zero-to-CAD (2604.24479) introduces a rigorously validated agentic synthesis pipeline for parametric CAD sequence generation at unprecedented scale, leveraging LLM priors without any real construction-history data. The resulting dataset and models advance the operational and theoretical landscape for CAD AI, bridging the gap between geometric scale and interpretability, and opening new research avenues for multi-modal conditional generation, editable program reconstruction, and latent design intent extraction.