Neural CAD Code Generation
- Neural CAD code generation is a family of techniques that synthesize fully editable, parametric CAD models from multimodal inputs such as text, images, and point clouds.
- It leverages transformer architectures, hierarchical codebooks, and reinforcement learning to generate sequential construction steps and enforce geometric validity.
- The approach ensures semantic clarity and downstream editability by producing CAD scripts compatible with standard engineering tools and APIs.
Neural CAD code generation refers to the synthesis of computer-aided design (CAD) construction sequences, parametric scripts, or symbolic models from structured or unstructured input—including natural language, images, point clouds, or prompts—using neural networks. Unlike mesh or surface generation, neural CAD focuses on producing editable, semantically meaningful representations such as sketch-and-extrude histories, code for CAD scripting APIs (e.g., CadQuery), or interpretable command trees that are compatible with standard engineering tools, enabling downstream editing, fabrication, and collaboration. Recent research achieves this via a range of architectures—transformer-based models, hierarchical codebooks, latent diffusion models, multimodal encoders, and reinforcement learning—leveraging large datasets of paired CAD histories and language.
1. Foundational Challenges and Problem Structure
The goal of neural CAD code generation is to automate the creation of fully parametric, editable models given high-level intent or multimodal cues (Khan et al., 2024). CAD design is inherently a sequential, parametric process: designers construct 2D sketches—using lines, arcs, and circles—then apply 3D operations such as extrude, cut, or fillet, forming a construction history rich in design intent and modifiable parameters. Major challenges include:
- Precise semantics and editability: Unlike mesh-based outputs, the generated representation must encode a sequence of operations and parameters to ensure downstream re-editing and adaptability (Xu et al., 2023, He et al., 13 May 2025).
- Ambiguity in user intent: Mapping free-form or abstract prompts to precise geometric and topological constructs requires disambiguation and parameter inference (Khan et al., 2024).
- Data scarcity: Public datasets pairing natural language with symbolic CAD programs or sketch-extrude histories are limited, and coverage of complex operations (e.g., fillets, lofts, constraints) remains sparse (Khan et al., 2024, Govindarajan et al., 13 Jul 2025).
- Multimodal inputs: Robustness to text, images, point clouds, and design specifications is essential for practical engineering workflows (Alrashedy et al., 2024, Doris et al., 20 May 2025, Jobczyk et al., 2023).
- Validity and geometric correctness: Small errors in command sequencing or parameter values have drastic impacts on geometric feasibility, mesh validity, and downstream manufacturability (Zheng et al., 29 Oct 2025, Tsuji et al., 29 May 2025).
2. Neural Representations and Symbolic Targets
Most leading frameworks target symbolic, executable CAD representations closely tied to industry standards:
- Sketch-and-extrude sequences: A sequence of parametrized 2D primitives and 3D construction steps, often using tokenization and quantization for neural modeling (Khan et al., 2024, Xu et al., 2023, Xu et al., 2022).
- CAD scripting languages: Direct generation of parameterized Python scripts for APIs such as CadQuery, which encode hierarchy, Boolean logic, and constraints, enabling immediate validation and use in downstream CAD kernels (Guan et al., 26 May 2025, Xie et al., 10 May 2025, Zheng et al., 29 Oct 2025).
- JSON histories or tree structures: Hierarchical trees capturing part–profile–loop deconstruction offer fine-grained control and editability (Xu et al., 2023, Govindarajan et al., 13 Jul 2025).
- Hybrid representations: For complex topologies (e.g., non-planar surfaces, holes), hybrid NURBS–primitive schemas further enhance representation power and token efficiency (Usama et al., 9 Nov 2025).
Tokenization strategies often quantize continuous values (coordinates, angles, radii) into discrete bins, with special tokens for end-of-curve, end-of-loop, etc. (Khan et al., 2024, Xu et al., 2022). These approaches facilitate autoregressive decoding and enforce syntactic validity.
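To make this concrete, the sketch below illustrates one common tokenization scheme: continuous sketch parameters normalized to [0, 1] are mapped to discrete bins, with dedicated special tokens marking curve and loop boundaries. The bin count and token IDs are illustrative assumptions, not the exact vocabulary of any cited system.

```python
# Minimal sketch of quantizing continuous sketch parameters into discrete tokens.
# Bin count, special-token IDs, and the command layout are illustrative assumptions.

N_BINS = 256                     # e.g., 8-bit quantization of normalized values
END_OF_CURVE = N_BINS            # special tokens appended after the value bins
END_OF_LOOP = N_BINS + 1
END_OF_SKETCH = N_BINS + 2

def quantize(value: float, lo: float, hi: float, n_bins: int = N_BINS) -> int:
    """Map a continuous value in [lo, hi] to a discrete bin index."""
    t = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(t * n_bins)))

def dequantize(token: int, lo: float, hi: float, n_bins: int = N_BINS) -> float:
    """Recover the bin-center value when decoding tokens back to geometry."""
    return lo + (token + 0.5) / n_bins * (hi - lo)

# Example: tokenize one line segment of a normalized 2D sketch loop.
line = {"x0": 0.10, "y0": 0.10, "x1": 0.90, "y1": 0.10}
tokens = [quantize(line[k], 0.0, 1.0) for k in ("x0", "y0", "x1", "y1")]
tokens.append(END_OF_CURVE)
print(tokens)   # [25, 25, 230, 25, 256]
```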
3. Model Architectures and Learning Pipelines
3.1. Transformer and Hierarchical Models
Autoregressive transformer architectures—stacked encoder–decoder blocks with multi-head self- and cross-attention—dominate contemporary text-to-CAD systems (Khan et al., 2024, Xu et al., 2023, Govindarajan et al., 13 Jul 2025, Xu et al., 2022). Hierarchical models disentangle global part arrangements, profiles, and local curve geometries using vector-quantized codebooks, facilitating conditional control and interpolation (Xu et al., 2023). Code selection and two-stage decoding (code tree sampling, followed by full construction sequence generation) enable diverse, editable outputs.
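As a simplified illustration of the codebook mechanism, the NumPy sketch below shows nearest-neighbor vector quantization: continuous latents (e.g., one per profile or loop) are snapped to discrete codebook indices, which form the code tree that a second-stage decoder expands into a full construction sequence. Sizes and values are arbitrary; no training logic from (Xu et al., 2023) is reproduced.

```python
import numpy as np

# Simplified illustration of vector-quantized codebook lookup: each continuous
# latent vector is snapped to its nearest codebook entry, and the resulting
# indices form the discrete "code tree" a second-stage decoder conditions on.
# Dimensions are arbitrary and chosen only for illustration.

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))      # 64 codes, 16-dim embeddings
latents = rng.normal(size=(5, 16))        # e.g., 5 loop-level latent vectors

# Nearest-neighbor assignment by squared Euclidean distance.
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)              # discrete indices, one per latent
quantized = codebook[codes]               # embeddings passed to the decoder

print(codes)  # e.g., [12 41  7 30  3] -- swapping one index changes local geometry
```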
3.2. LLMs for Code Generation
Fine-tuned, instruction-following LLMs (e.g., Qwen2.5, GPT-4 derivatives) demonstrate strong performance when directly targeting CAD scripting languages (e.g., CadQuery) (Guan et al., 26 May 2025, Xie et al., 10 May 2025, He et al., 13 May 2025). Training with high-quality, executable script–prompt pairs, sometimes annotated via automated LLM pipelines, supports both zero-shot and few-shot capabilities. Chain-of-thought (CoT) planning improves reasoning over multi-step construction and parameter assignment (Guan et al., 26 May 2025, Niu et al., 13 Aug 2025, Niu et al., 29 Dec 2025).
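For reference, the snippet below is a hand-written example of the kind of CadQuery script such models are trained to emit: a parametric plate with a centered through-hole, exported to STEP for downstream use. The part and its dimensions are placeholders, not drawn from any cited dataset or model output.

```python
import cadquery as cq
from cadquery import exporters

# Illustrative target output: a parametric plate with a centered through-hole.
# Dimensions are placeholders; a text-to-CAD model would infer them from the prompt.
length, width, thickness, hole_d = 80.0, 60.0, 10.0, 22.0

result = (
    cq.Workplane("XY")
    .box(length, width, thickness)        # base solid
    .faces(">Z").workplane()              # sketch plane on the top face
    .hole(hole_d)                         # through-hole, preserving the edit history
)

# Export to a neutral format for downstream CAD tools.
exporters.export(result, "plate_with_hole.step")
```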
3.3. Reinforcement and Evolutionary Learning
Several state-of-the-art frameworks employ reinforcement learning post-training to maximize geometric accuracy and code validity. Custom reward functions combine geometric metrics (Chamfer Distance, IoU) with syntax, format, or external LLM evaluations, and effective policy optimization strategies include Group Reward Policy Optimization (GRPO), Trust Region Stretch (TRS), and multi-expert collaborative training (Guan et al., 26 May 2025, Niu et al., 13 Aug 2025, Niu et al., 29 Dec 2025). Evolutionary algorithms further refine candidate programs via crossover and mutation, using vision-language feedback to iteratively improve semantic alignment and topological correctness (Preintner et al., 13 Oct 2025).
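The function below sketches how such a composite reward might be assembled: executable scripts receive a base reward plus geometry terms, while non-executable code is penalized. The weights and the helper callables (execute_script, chamfer_distance, iou) are assumptions for illustration, not the exact reward design of the cited frameworks.

```python
# Hedged sketch of a composite RL reward for generated CAD scripts.
# Weights and helper callables are illustrative assumptions; the cited
# frameworks each define their own reward terms and scaling.

def cad_reward(script, target_points, execute_script, chamfer_distance, iou):
    """Return a scalar reward combining executability with geometric accuracy."""
    try:
        # Run the script in a CAD kernel and sample points from the resulting solid.
        pred_points = execute_script(script)
    except Exception:
        return -1.0                          # invalid / non-executable code is penalized

    cd = chamfer_distance(pred_points, target_points)   # lower is better
    overlap = iou(pred_points, target_points)            # higher is better

    # Validity bonus plus weighted geometric terms (weights are arbitrary here).
    return 1.0 + 2.0 * overlap - 5.0 * cd
```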
3.4. Multimodal and Vision-Language Approaches
Vision-LLMs process rendered images, multi-view projections, or rasterized sketches to generate symbolic CAD code (Alrashedy et al., 2024, Doris et al., 20 May 2025, Jobczyk et al., 2023). Fusion modules, adapters, and multimodal encoders align visual and textual features in shared latent spaces. Automated feedback loops, including question–answer refinement and human-in-the-loop correction, increase sample validity and geometric fidelity (Alrashedy et al., 2024, Badagabettu et al., 2024).
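One plausible shape of such a feedback loop is sketched below; generate_code, render, and critique are hypothetical callables standing in for the code-generating LLM, the CAD kernel plus renderer, and the vision-language judge, respectively.

```python
# Hypothetical refinement loop: generate CAD code, render it, ask a VLM whether the
# render matches the prompt, and feed the critique back into the generator.
# All three callables are placeholders for the components described above.

def refine_cad_code(prompt, generate_code, render, critique, max_rounds=3):
    feedback = None
    code = generate_code(prompt, feedback)
    for _ in range(max_rounds):
        image = render(code)                    # execute the script and rasterize the result
        ok, feedback = critique(prompt, image)  # VLM question-answer check against the prompt
        if ok:
            break
        code = generate_code(prompt, feedback)  # regenerate with the critique as extra context
    return code
```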
4. Datasets, Annotation, and Evaluation
4.1. Datasets and Annotation Pipelines
- DeepCAD: The backbone for many studies, providing ~170K–178K CAD models with full sketch–extrude histories (Khan et al., 2024, Xu et al., 2023).
- Text2CAD: Adds ~660K multi-level text prompts (L0–L3), spanning abstract to expert instructions (Khan et al., 2024).
- partABC, CADExpert, GenCAD-Code, ExeCAD: Curated for NURBS modeling, multi-expert RL, vision-language pairing, or benchmarking (Usama et al., 9 Nov 2025, Niu et al., 29 Dec 2025, Doris et al., 20 May 2025, Niu et al., 13 Aug 2025).
- Automated annotation: VLMs (e.g., LLaVA-NeXT, BLIP2, InternVL3) and LLMs (Mistral-50B, Qwen3 series) are integrated into multi-stage pipelines for image-to-prompt, JSON-to-instruction, and script validation (Khan et al., 2024, Usama et al., 9 Nov 2025).
4.2. Metrics
Comprehensive evaluation employs:
| Type | Metrics / Definitions |
|---|---|
| Parametric Precision | Primitive F1 (predicted primitives matched to ground truth by type, e.g., line/arc/circle/extrusion F1) |
| Geometry | Chamfer Distance (CD), Jensen–Shannon Divergence, IoU |
| Validity | Invalidity Ratio (IR): % of syntactically/geometrically invalid outputs |
| Topology | Euler characteristic match, Sphericity, Mean Curvature |
| Visual Quality | GPT-4V or human judges for alignment with prompt |
| Editability | Human judgment: ease of editing, expression fidelity |
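To ground the geometric metrics in the table above, the snippet below gives minimal NumPy implementations of symmetric Chamfer Distance on sampled point clouds and IoU on voxel grids. Conventions differ across papers (e.g., squared vs. unsquared distances, sampling density), so this reflects one common choice rather than the exact protocol of any benchmark.

```python
import numpy as np

# Minimal reference implementations of two geometry metrics from the table above.
# One common convention is shown; benchmarks differ in distance form and sampling.

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between (N,3) and (M,3) point clouds."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def voxel_iou(vox_a: np.ndarray, vox_b: np.ndarray) -> float:
    """Intersection-over-Union of two boolean voxel grids of equal shape."""
    inter = np.logical_and(vox_a, vox_b).sum()
    union = np.logical_or(vox_a, vox_b).sum()
    return float(inter) / float(union) if union else 1.0
```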
4.3. Benchmarking Results
- Text2CAD (L3 expert prompts): Line F1 81.1%, Arc F1 36.0%, Circle F1 74.3%, Extrusion F1 93.3%, Median CD 0.37e-3, IR 0.9% (Khan et al., 2024).
- CAD-Coder (SFT+CoT+GRPO): Mean CD 6.54, Median CD 0.17, IR 1.45% on Text2CAD; state-of-the-art geometric accuracy and code executability across several test benchmarks (Guan et al., 26 May 2025).
- CME-CAD: Highest reported IoU 80.7%, Mean CD 1.00 mm, Med CD 0.11 mm, Executability 98.3% on CADExpert (Niu et al., 29 Dec 2025).
- Evolutionary models (EvoCAD): Achieve best topology error (0.410), ~87% topology correctness on CADPrompt (Preintner et al., 13 Oct 2025).
5. Controllability, Editability, and Practical Considerations
The ability to control, inspect, and edit generated CAD code is central to usability:
- Hierarchical codebooks: Allow targeted edits at solid, profile, or loop levels (swap a code to change local geometry, adjust parameters for rapid prototype variation) (Xu et al., 2023).
- Hybrid NURBS/primitive schemas: Maintain fidelity while simplifying low-complexity or analytic regions (Usama et al., 9 Nov 2025).
- Interactive feedback loops: Iterative refinement using visual question-answering, code corrections (self-repair), or multi-expert voting enhances structural and geometric alignment (Alrashedy et al., 2024, Tsuji et al., 29 May 2025, Niu et al., 29 Dec 2025).
- Integration with mainstream CAD software: Scripted output in CadQuery, ezdxf, or FreeCAD macros ensures interoperability and downstream parametric editing (He et al., 13 May 2025, Xie et al., 10 May 2025, Badagabettu et al., 2024).
Table: Comparison of Controllability Mechanisms
| Mechanism | Approach | Notable Example |
|---|---|---|
| Code-tree editing | Manual code-token swap at any level | (Xu et al., 2023) |
| RL-guided refinement | Reward for editability, structure | (Guan et al., 26 May 2025, Niu et al., 29 Dec 2025) |
| CoT scaffolding | Explicit step-by-step reasoning | (Niu et al., 13 Aug 2025) |
| Human-in-the-loop | Caption/QA/failure correction | (Alrashedy et al., 2024, Badagabettu et al., 2024) |
6. Limitations, Open Problems, and Research Directions
While neural CAD code generation demonstrates strong empirical gains, open challenges remain:
- Generalization and data diversity: Most datasets (e.g., DeepCAD) overrepresent simple prismatic shapes (boxes, cylinders), with limited coverage of advanced operations (fillets, lofts, constraints, assemblies), restricting out-of-distribution robustness (Khan et al., 2024).
- Propagation of annotation/vision errors: VLM hallucinations in shape description stages or image-based pipelines introduce train-time ambiguity (Khan et al., 2024, Usama et al., 9 Nov 2025).
- Missing symbolic/physical constraints: Constraints such as angle, center-of-mass, or manufacturability criteria are rarely modeled; extending reward functions and code representations to cover these is an active area (Zheng et al., 29 Oct 2025, Niu et al., 13 Aug 2025, Niu et al., 29 Dec 2025).
- Scalability and efficiency: RL post-training (multiple PPO epochs), training of large LLMs, and multimodal fusion models require significant compute (Zheng et al., 29 Oct 2025, Niu et al., 13 Aug 2025).
- Syntactic vs. geometric validity: Self-repair and guided diffusion pipelines partially address the risk of generating non-manifold, infeasible, or invalid structures, but no method achieves perfect validity (Tsuji et al., 29 May 2025, Zheng et al., 29 Oct 2025).
- Multimodal, interactive, real-world support: Extending models to handle ambiguous, underspecified, or visually complex queries—augmented with natural dialogue or direct user correction—remains a core research direction (Badagabettu et al., 2024, Alrashedy et al., 2024).
Emergent areas include multimodal Coalition-of-Expert frameworks for robust design (see CME-CAD (Niu et al., 29 Dec 2025)), NURBS-based pipelines for higher-order surface creation (Usama et al., 9 Nov 2025), evolutionary search for semantic/topological correctness (Preintner et al., 13 Oct 2025), and latent diffusion for cross-modality alignment (Yu et al., 17 Sep 2025).
7. Outlook and Integration into Engineering Practice
Neural CAD code generation is quickly maturing from fundamental research to industrially relevant toolchains. Integration into mainstream CAD editors via script or JSON macro interfaces is already practical (Xie et al., 10 May 2025, He et al., 13 May 2025), and large-scale fine-tuned models now rival, and sometimes surpass, human designers on routine parametric tasks (Govindarajan et al., 13 Jul 2025, Niu et al., 29 Dec 2025). Future research is expected to:
- Expand coverage to advanced modeling operations and assemblies.
- Develop closed-loop systems integrating constraint solvers and visual feedback for functional and logical correctness.
- Enable collaborative, version-controlled, and explainable design iteration entirely through natural multimodal interfaces.
References:
- "Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts" (Khan et al., 2024)
- "Hierarchical Neural Coding for Controllable CAD Model Generation" (Xu et al., 2023)
- "CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward" (Guan et al., 26 May 2025)
- "Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD Generation" (Zheng et al., 29 Oct 2025)
- "Generating CAD Code with Vision-LLMs for 3D Designs" (Alrashedy et al., 2024)
- "CAD-Coder: Text-Guided CAD Files Code Generation" (He et al., 13 May 2025)
- "From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation" (Niu et al., 13 Aug 2025)
- "CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation" (Niu et al., 29 Dec 2025)
- "CAD-Coder: An Open-Source Vision-LLM for Computer-Aided Design Code Generation" (Doris et al., 20 May 2025)
- "EvoCAD: Evolutionary CAD Code Generation with Vision LLMs" (Preintner et al., 13 Oct 2025)
- "SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks" (Xu et al., 2022)
- "Query2CAD: Generating CAD models using natural language queries" (Badagabettu et al., 2024)
- "CADmium: Fine-Tuning Code LLMs for Text-Driven Sequential CAD Design" (Govindarajan et al., 13 Jul 2025)
- "GenCAD-Self-Repairing: Feasibility Enhancement for 3D CAD Generation" (Tsuji et al., 29 May 2025)
- "GenCAD-3D: CAD Program Generation using Multimodal Latent Space Alignment and Synthetic Dataset Balancing" (Yu et al., 17 Sep 2025)
- "NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling" (Usama et al., 9 Nov 2025)
- "Automatic Reverse Engineering: Creating computer-aided design (CAD) models from multi-view images" (Jobczyk et al., 2023)
- "SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations" (Li et al., 2023)
- "Text-to-CadQuery: A New Paradigm for CAD Generation with Scalable Large Model Capabilities" (Xie et al., 10 May 2025)