
BrepLLM: Integrating 3D CAD and Language Models

Updated 26 December 2025
  • BrepLLM is a framework that bridges structured CAD boundary representations with language models, enabling native parsing and reasoning of 3D geometric data.
  • It employs a two-stage training pipeline with adaptive sampling, hierarchical graph encoding, and contrastive pretraining to align geometric and textual semantics.
  • BrepLLM achieves state-of-the-art performance on 3D captioning and classification benchmarks using the novel Brep2Text dataset with over 269K Brep–text pairs.

BrepLLM is a framework for enabling LLMs to natively parse, reason over, and generate knowledge from raw Boundary Representation (Brep) data, directly bridging structured 3D geometric/topological information and natural language. Standard LLMs and vision-LLMs process flat data (text, images, point clouds) as unstructured sequences, making them fundamentally incompatible with the graph-structured, watertight, and parametric characteristics inherent to Breps—the industry standard in Computer-Aided Design (CAD). BrepLLM introduces a two-stage pipeline that unifies adaptive geometric sampling, hierarchical graph encoding, cross-modal contrastive pretraining, and progressive multi-stage LLM fine-tuning, substantially outperforming prior methods on 3D captioning and classification benchmarks involving native Brep objects (Deng et al., 18 Dec 2025).

1. Foundations: Brep Data and Motivation

Boundary Representations (Breps) describe 3D solids via exact parametric surfaces ("faces"), trimmed curves ("edges"), and watertight adjacency graphs. Each face may possess a unique $(u,v)$ parameter domain, curvature, and normal field, while topological relations encode multi-face adjacencies and global assembly. Such structured representations are essential for CAD applications requiring precise geometry, high fidelity, and explicit feature awareness.
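As a rough illustration of this structure, a minimal Brep-like container (all names and fields here are hypothetical, not the paper's data model) might store faces, trimmed edges, and the face-adjacency graph the shared edges induce:

```python
from dataclasses import dataclass, field

@dataclass
class Face:
    surface_type: str   # e.g. "plane", "cylinder", "nurbs"
    uv_domain: tuple    # ((u_min, u_max), (v_min, v_max)) parameter domain

@dataclass
class Edge:
    curve_type: str     # e.g. "line", "arc", "spline"
    faces: tuple        # indices of the two adjacent faces this edge trims

@dataclass
class Brep:
    faces: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def adjacency(self):
        """Face-adjacency pairs implied by shared edges (sorted, deduplicated)."""
        pairs = set()
        for e in self.edges:
            a, b = e.faces
            pairs.add((min(a, b), max(a, b)))
        return sorted(pairs)
```

In a watertight solid every edge is shared by exactly two faces, so the adjacency graph is what graph encoders (Section 2) operate on.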

Prior “CAD-LLM” approaches circumvent Brep processing by relying on procedural command histories (e.g., sequences of sketches and extrusions), which inherently lose access to explicit topology and fine geometric detail, limiting their reasoning and generation capacity. The modality gap thus precludes direct ingestion of structured Brep data by conventional LLMs or even multi-modal transformers, motivating the need for fundamentally new approaches (Deng et al., 18 Dec 2025).

2. Two-Stage Training Pipeline

The BrepLLM architecture is built around a two-stage training regime designed to align Brep geometry/topology with language representations and enable downstream LLM reasoning.

2.1 Cross-Modal Alignment Pre-training

Adaptive UV and Edge Sampling

  • For each face $S$ with area $A_S$, adaptive sampling distributes $N_S$ points over its $(u,v)$ domain $\Omega_S = [u_\text{min}, u_\text{max}] \times [v_\text{min}, v_\text{max}]$:

$$N_S = N_\text{min}^\text{face} + \frac{A_S - A_\text{min}}{A_\text{max} - A_\text{min}} \cdot (N_\text{max}^\text{face} - N_\text{min}^\text{face})$$

An analogous length-based normalization over $\ell_C$ determines the sample count $M_C$ for each edge $C$.

  • Face samples yield a 10-dimensional vector at each $(u_k, v_l)$: 3D point $P(u,v)$, normal $n(u,v)$, curvature $H(u,v)$, binary flag $V$, face type $t$, and normalized area $a$. Edges are encoded as 8-dimensional vectors with analogous geometric and type information.
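The area-proportional allocation above can be sketched as follows (the bounds `n_min` and `n_max` are illustrative defaults, not the paper's values):

```python
def adaptive_face_samples(areas, n_min=16, n_max=64):
    """Allocate per-face sample counts N_S by linearly interpolating between
    n_min and n_max according to each face's normalized area, following
    N_S = N_min + (A_S - A_min) / (A_max - A_min) * (N_max - N_min)."""
    a_min, a_max = min(areas), max(areas)
    span = a_max - a_min
    counts = []
    for a in areas:
        # Degenerate case: all faces have equal area -> minimum budget each.
        frac = (a - a_min) / span if span > 0 else 0.0
        counts.append(round(n_min + frac * (n_max - n_min)))
    return counts
```

The same interpolation, driven by edge length instead of face area, would produce the per-edge counts $M_C$.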

Hierarchical BrepEncoder

  • Face nodes and edge adjacencies form a graph $G = (V, E)$.
  • Node features include:
    • Fine-grained: PointTransformerV3 extracts $F_f^{(i)} \in \mathbb{R}^{32}$ from sampled face points.
    • Edge-conditioned: NNConv derives $F_e^{(i)} \in \mathbb{R}^{32}$ using incident edge geometry.
    • Global topology: EGATConv computes $F_t^{(i)} \in \mathbb{R}^{64}$ over adjacency-graph nodes.
  • Each node token $h_i = [F_t^{(i)} \,\|\, F_e^{(i)} \,\|\, F_f^{(i)}] \in \mathbb{R}^{128}$; an attention-pooled global token $h_\mathrm{cls} \in \mathbb{R}^{128}$ enables full-graph summarization.
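The token assembly step can be sketched in a few lines; the concatenation follows the paper's dimensions, while the softmax attention pooling below is an assumed (generic) form of the attention pool:

```python
import numpy as np

def build_node_tokens(F_t, F_e, F_f):
    """Concatenate topology (n, 64), edge-conditioned (n, 32), and fine-grained
    (n, 32) features into 128-d node tokens h_i, plus a global token h_cls
    obtained by softmax-weighted pooling (a hypothetical pooling choice)."""
    H = np.concatenate([F_t, F_e, F_f], axis=-1)              # (n, 128)
    scores = H @ np.ones(H.shape[-1]) / np.sqrt(H.shape[-1])  # (n,) attention logits
    w = np.exp(scores - scores.max())
    w /= w.sum()                                              # softmax weights
    h_cls = w @ H                                             # (128,) pooled summary
    return H, h_cls
```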

Contrastive Learning Objective

  • $h_\mathrm{cls}$ is projected via an MLP to $\mathbb{R}^D$ and aligned with $\mathbb{R}^D$ text embeddings $z_\text{text}$ (from a frozen CLIP text encoder) using a symmetric InfoNCE loss over batches of paired samples:

$$L_\text{contrastive} = -\frac{1}{2N} \sum_{i=1}^N \left[ \log \frac{\exp(S_{ii}/\tau)}{\sum_{k=1}^N \exp(S_{ik}/\tau)} + \log \frac{\exp(S_{ii}/\tau)}{\sum_{k=1}^N \exp(S_{ki}/\tau)} \right]$$

  • Node tokens $\{h_i\}$ are passed forward for LLM integration; the global token is used solely for pre-training alignment.
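The symmetric loss above is a standard CLIP-style objective; a minimal sketch (with cosine-similarity logits $S_{ij}$ and a temperature default that is illustrative, not the paper's) is:

```python
import numpy as np

def symmetric_infonce(z_geo, z_text, tau=0.07):
    """Symmetric InfoNCE over a batch of N paired (Brep, text) embeddings.
    Averages the Brep->text (rows of S) and text->Brep (columns of S)
    cross-entropy terms, matching L_contrastive in the text."""
    z_geo = z_geo / np.linalg.norm(z_geo, axis=1, keepdims=True)
    z_text = z_text / np.linalg.norm(z_text, axis=1, keepdims=True)
    S = z_geo @ z_text.T / tau                    # (N, N) similarity logits

    def logsoftmax_diag(M):
        M = M - M.max(axis=1, keepdims=True)      # numerical stability
        return np.diag(M) - np.log(np.exp(M).sum(axis=1))

    return -0.5 * (logsoftmax_diag(S).mean() + logsoftmax_diag(S.T).mean())
```

Correctly paired batches should score lower loss than mismatched ones, which is what drives the Brep and text embedding spaces together.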

2.2 Multi-Stage LLM Fine-tuning

  • Node tokens $X_\text{geo} \in \mathbb{R}^{n \times 128}$ are projected into a vision-language backbone (e.g., Phi-2 with a Q-Former).

Stage I: Geometry–Vision Bridging

  • A two-layer MLP maps each node token to $D_\text{qf}$-dimensional inputs, truncated or padded to $T_\text{max} = 128$ tokens.
  • The Q-Former cross-attends to $X_\text{qf}$. Only the MLP and final head are trained initially, exploiting 2D vision-language priors to semantically bridge Brep tokens and language.
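A sketch of the Stage I projection step, under stated assumptions (GELU between the two MLP layers is a common but here assumed choice, and zero-padding is one plausible padding scheme):

```python
import numpy as np

def project_for_qformer(x_geo, W1, W2, t_max=128):
    """Map (n, 128) node tokens through a two-layer MLP to the Q-Former input
    width D_qf, then truncate or zero-pad the sequence to T_max = 128 tokens."""
    h = x_geo @ W1
    # tanh approximation of GELU (activation choice is an assumption)
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))
    x_qf = h @ W2                                  # (n, D_qf)
    n, d = x_qf.shape
    if n >= t_max:
        return x_qf[:t_max]                        # truncate long sequences
    return np.vstack([x_qf, np.zeros((t_max - n, d))])  # pad short ones
```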

Stage II: 3D–Language Alignment (LoRA Tuning)

  • LoRA modules tune select Q-Former and LLM layers, biasing vision-language modules toward genuine 3D semantics.
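As a generic illustration of the LoRA mechanism applied here (rank, scaling, and initialization below follow standard LoRA conventions, not values from the paper):

```python
import numpy as np

class LoRALinear:
    """Low-rank adapter on a frozen weight: y = x (W + (alpha/r) B A)^T.
    Only A and B are trainable; B starts at zero so the adapter is initially
    a no-op and fine-tuning departs smoothly from the pretrained weights."""
    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = W                                      # frozen, (d_out, d_in)
        self.A = rng.normal(0, 0.02, (r, W.shape[1]))   # trainable down-projection
        self.B = np.zeros((W.shape[0], r))              # trainable up-projection
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T
```

Because only the low-rank factors update, selected Q-Former and LLM layers can shift toward 3D semantics without overwriting their vision-language priors.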

Stage III: Mixture-of-Query Experts (MQE)

  • The Stage-II Q-Former query set $E_\text{base}$ is frozen as $Q_\text{base}$.
  • $k$ trainable expert query sets $E_1, \ldots, E_k$ and a sparse router $R(X_\text{geo})$ select the top-$G$ experts per sample.
  • Each selected expert adds a residual $Q_\text{residual}$ to $Q_\text{base}$; only the residual experts and the router are trained, improving geometric diversity and stability.
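The routing-and-residual step can be sketched as follows; the softmax gating over the selected experts is an assumed design detail, and shapes are illustrative:

```python
import numpy as np

def mqe_queries(q_base, experts, router_logits, top_g=2):
    """Mixture-of-Query-Experts: a sparse router picks the top-G expert query
    sets and adds their gate-weighted residuals to the frozen base queries."""
    top = np.argsort(router_logits)[-top_g:]    # indices of top-G experts
    gates = np.exp(router_logits[top])
    gates /= gates.sum()                        # softmax over selected experts only
    q = q_base.copy()                           # Q_base stays frozen
    for g, idx in zip(gates, top):
        q += g * experts[idx]                   # residual Q_residual per expert
    return q
```

Since gradients flow only into the expert residuals and the router, the Stage-II queries remain a stable anchor while experts specialize.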

3. Brep2Text Dataset

BrepLLM introduces Brep2Text, the first large-scale dataset for instruction-tuning LLMs on raw Brep representations, totaling 269,444 Brep–text QA pairs. The dataset construction process:

  • Source: Text2CAD corpus of 134,722 Brep models with detailed, human-written high-level descriptions.
  • For each Brep: two semantic prompt tiers—
    • Abstract (function, category, global shape): e.g., “Identify function.”
    • Beginner (constructive history): e.g., “List the sequence of sketch/extrude operations.”
  • The Qwen-Max LLM is used in reverse to generate natural-language question–answer pairs, with the human-written descriptions serving as ground truth.
  • Dataset split: 200 unique Brep models in the test set (no train/val overlap), with the remainder used for model development (Deng et al., 18 Dec 2025).

4. Experimental Evaluation

BrepLLM achieves superior results compared to point-cloud-based and mesh-based LLMs on both 3D object captioning and generative classification.

4.1 3D Object Captioning

On the Brep2Text dataset, evaluation uses automatic metrics (Qwen-Max, SBERT, SimCSE) and human annotation.

| Model | Input | Qwen-Max | SBERT | SimCSE | Human Prec (%) |
|---|---|---|---|---|---|
| PointLLM-7B | Point cloud | 46.81 | 65.72 | 66.05 | 74.60 |
| ShapeLLM-13B | Point cloud | 51.36 | 68.36 | 70.12 | 73.47 |
| MiniGPT-3D (2.7B) | Point cloud | 56.58 | 71.64 | 73.13 | 79.40 |
| BrepLLM (2.9B) | Brep | 58.89 | 73.05 | 74.46 | 81.85 |

4.2 Generative 3D Object Classification

Prompt styles include identity-inquiry and completion (“What is this?”, “This is an object of…”). Model outputs are validated using Qwen-Max automatic judgment.

| Model | Identity (%) | Completion (%) | Avg (%) |
|---|---|---|---|
| PointLLM-7B | 52.70 | 50.10 | 51.40 |
| ShapeLLM-13B | 52.90 | 53.70 | 53.30 |
| MiniGPT-3D | 55.70 | 54.10 | 54.90 |
| BrepLLM | 57.40 | 56.70 | 57.05 |

4.3 Ablation Studies

Ablations highlight the impact of each pipeline component:

| Component | Result (%) | Condition |
|---|---|---|
| Adaptive UV sampling | +2.05 | Stage I |
| Hierarchical BrepEncoder | +2.78 | Stage I |
| Stages I+II+III (full pipeline) | 57.05 | All stages |
| MQE in Stage III only | 57.05 | Timing |

5. Contributions and Comparative Positioning

BrepLLM is distinguished by several contributions:

  • The first framework for direct Brep-to-LLM reasoning and generation without procedural tokenization.
  • Adaptive UV/length sampling schemes enabling translation of parametric geometry into high-fidelity, information-rich attribute graphs, supporting nuanced geometric and topological modeling.
  • Hierarchical BrepEncoder integrating face, edge, and topology features into both sequence and global tokens—crucial for both local and holistic reasoning.
  • A two-stage regime unifying CLIP-style Brep–text contrastive pretraining with a three-stage progressive LLM tuning culminating in Mixture-of-Query Experts for geometric diversity.
  • Introduction of Brep2Text, the first instruction-tuning dataset pairing raw Brep data with semantically rich text (269,444 pairs), establishing a new benchmark for Brep understanding.
  • State-of-the-art performance on both 3D captioning and generative classification, realized using a compact 2.9B-parameter LLM (Deng et al., 18 Dec 2025).

6. Limitations and Prospects

BrepLLM currently targets single-part, open-CAD objects; multi-body assemblies, kinematic constraints, and in-the-loop constraint solving are outside its present operational domain. Brep2Text annotations are generated via automated LLM prompting; incorporating human-curated multi-turn dialogues and advanced geometric queries (e.g. constraint satisfaction, distance, angle) would widen the framework’s applicability. Potential extensions include scaling to larger LLM backbones, integrating additional modalities (rendered images, point clouds), and expanding Mixture-of-Query Experts for continual specialization, especially for sub-domains such as sheet metal and freeform surfaces (Deng et al., 18 Dec 2025).

A plausible implication is that as BrepLLM and related architectures mature, direct language-driven interaction with exact CAD geometry—without intermediate procedural detours—will become tractable for a wide range of engineering and scientific workflows, enabling high-fidelity, explainable, and semantically grounded geometric reasoning.

