Autoregressive B-rep Holistic Token Sequence
- The paper demonstrates that holistic token sequencing unifies geometry and topology, enabling end-to-end autoregressive CAD model generation.
- It employs advanced tokenization schemes—including geometry, position, and topology tokens—to serialize complex B-rep elements into a coherent sequence.
- The approach leverages a decoder-only Transformer with causal self-attention, achieving high validity and scalability in generating precise CAD solids.
Autoregressive generation with B-rep holistic token sequences is a research direction at the intersection of CAD model generation, sequence modeling, and architecture design for boundary representations (B-rep). B-reps, foundational in CAD and mechanical design, couple geometry and topology tightly, which has historically made them hard to handle with mainstream autoregressive sequence models such as causal Transformers. Holistic token sequence representations and purpose-built tokenization schemes now enable end-to-end, next-token generation of B-rep solids, combining geometric precision and topological consistency in a single scalable pipeline.
1. Background: B-rep Representations and the Sequence Modeling Challenge
Boundary representation (B-rep) encodes a solid model as a collection of parametric faces, bounding edges (curves), and vertices, with explicit adjacency structures linking the entities. In CAD deep learning, prior methods largely relied on decoupled, graph-based representations that process geometry and topology via parallel or hierarchical pipelines. This limited the applicability of sequence modeling frameworks such as Transformers, which expect a serial token structure for autoregressive modeling (Jayaraman et al., 2022). ComplexGen exemplifies this paradigm: it produces vertex, edge, and face sets in parallel, scores relationships via adjacency matrices, and uses a global integer linear program to ensure topological validity, avoiding token sequences and AR decoding altogether (Guo et al., 2022).
The main obstacles were (1) the lack of a one-dimensional, serializable representation compatible with AR decoders, and (2) the need to jointly encode all relevant geometric and topological information such that the sequence itself is sufficient for model reconstruction.
2. Holistic Token Sequence Design: Unified Geometry–Topology Serialization
Recent advances, spearheaded by frameworks such as BrepARG (Li et al., 23 Jan 2026), AutoBrep (Xu et al., 2 Dec 2025), and BrepGPT (Li et al., 27 Nov 2025), established concrete protocols for constructing holistic token sequences from B-reps:
- Token Typing and Block Construction: A minimal, disjoint vocabulary—geometry tokens encoding latent descriptors (from VQ-VAEs or FSQ), position tokens from quantized bounding boxes, and topology tokens such as face indices or topological references—is defined (Li et al., 23 Jan 2026, Xu et al., 2 Dec 2025).
- Hierarchical/Topology-Aware Sequencing: Faces and edges are serialized to promote spatial locality and adjacency; DFS or BFS traversals yield deterministic, proximity-aware orderings, with explicit edge blocks referencing incident faces through assigned indices or sliding window tags (Li et al., 23 Jan 2026, Xu et al., 2 Dec 2025).
- Unified Representation: Each solid is represented as a single token sequence that concatenates per-face and per-edge blocks, where each block is built from positional, geometric, and reference tokens derived from the corresponding B-rep element (Li et al., 23 Jan 2026).
This holistic sequence structure is designed such that a decoder-only AR Transformer, conditioned solely on the causal prefix, can learn the complete joint distribution for next-token prediction.
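The serialization idea above can be illustrated with a toy sketch. Everything in it is hypothetical and not taken from the cited papers: the vocabulary sizes, the offset layout, the `Face` and `Edge` containers, the special tokens, and the block format (bounding-box tokens, then geometry tokens, then index or reference tokens). It only shows how disjoint token ranges and a deterministic BFS over the face-adjacency graph can turn a B-rep into one flat sequence.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical vocabulary: three disjoint id ranges followed by special tokens.
N_POS, N_GEO, N_IDX = 64, 256, 32
POS_OFF, GEO_OFF, IDX_OFF = 0, N_POS, N_POS + N_GEO
START, SEP, END = (IDX_OFF + N_IDX + i for i in range(3))

@dataclass
class Face:
    bbox_q: list   # 6 quantized bounding-box coordinates in [0, N_POS)
    geo: list      # latent geometry codes in [0, N_GEO)

@dataclass
class Edge:
    faces: tuple   # indices of the two incident faces
    bbox_q: list
    geo: list

def bfs_face_order(n_faces, edges, root=0):
    """Deterministic BFS over the face-adjacency graph (assumes n_faces >= 1)."""
    adj = {i: set() for i in range(n_faces)}
    for e in edges:
        a, b = e.faces
        adj[a].add(b)
        adj[b].add(a)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        f = queue.popleft()
        order.append(f)
        for g in sorted(adj[f] - seen):   # sorted => reproducible ordering
            seen.add(g)
            queue.append(g)
    # Faces not reachable from the root are appended in index order.
    return order + [f for f in range(n_faces) if f not in seen]

def serialize(faces, edges):
    """Flatten faces, then edges, into one token list with disjoint id ranges."""
    order = bfs_face_order(len(faces), edges)
    new_idx = {old: new for new, old in enumerate(order)}
    seq = [START]
    for old in order:                      # face blocks: position, geometry, index
        f = faces[old]
        seq += [POS_OFF + q for q in f.bbox_q]
        seq += [GEO_OFF + g for g in f.geo]
        seq.append(IDX_OFF + new_idx[old])
    seq.append(SEP)

    def refs(e):                           # incident faces, in sequence indices
        return sorted(new_idx[i] for i in e.faces)

    for e in sorted(edges, key=refs):      # edge blocks: references, position, geometry
        seq += [IDX_OFF + r for r in refs(e)]
        seq += [POS_OFF + q for q in e.bbox_q]
        seq += [GEO_OFF + g for g in e.geo]
    seq.append(END)
    return seq

# Toy solid fragment: three faces, two edges (face 0 meets face 2, face 2 meets face 1).
faces = [Face([1, 2, 3, 4, 5, 6], [10, 11, 12, 13]) for _ in range(3)]
edges = [Edge((0, 2), [1, 1, 1, 2, 2, 2], [7, 8]),
         Edge((2, 1), [3, 3, 3, 4, 4, 4], [9, 9])]
seq = serialize(faces, edges)
```

Because the edge blocks refer to faces by their position in the sequence rather than by their original ids, a decoder that has already emitted the face blocks can resolve every reference from its causal prefix alone.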
3. Model Architectures: Autoregressive Transformer Training and Inference
The canonical architecture for AR B-rep generation is a multi-layer, decoder-only Transformer with causal masking, which processes the holistic token sequence as its sole input (Li et al., 23 Jan 2026, Xu et al., 2 Dec 2025, Li et al., 27 Nov 2025):
- Token+Positional Embeddings: Each token is embedded into a high-dimensional space; positional encodings (absolute, rotary, or learned) are added to preserve order.
- Causal Self-Attention: All attention layers restrict each position to only observe the prefix, ensuring strict causality.
- Hierarchical Conditioning: Depending on the architecture, additional meta-tokens, user-supplied tokens (for autocompletion), or complexity indicators may be prepended (Xu et al., 2 Dec 2025).
- Training Objective: The sole optimization target is cross-entropy next-token prediction over serialized B-rep sequences, possibly augmented by masked modelling or auxiliary quantization losses if VQ-VAEs are used in tokenization (Li et al., 23 Jan 2026, Li et al., 27 Nov 2025).
- Inference: Generation proceeds purely autoregressively, with token-wise sampling (e.g., top-p with temperature), optionally under constrained decoding masks that enforce known physical or topological constraints (e.g., face completeness before emitting separators) (Li et al., 27 Nov 2025, Li et al., 23 Jan 2026).
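For reference, the training objective named in the list above is the standard autoregressive factorization with a cross-entropy loss. The notation is introduced here for illustration and is not taken from the cited papers: $S = (s_1, \dots, s_T)$ denotes the serialized token sequence and $\theta$ the Transformer parameters.

```latex
p_\theta(S) = \prod_{t=1}^{T} p_\theta\!\left(s_t \mid s_{<t}\right),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(s_t \mid s_{<t}\right).
```

Causal masking is what makes each factor depend only on the prefix $s_{<t}$, so all $T$ terms of the loss can be computed in one forward pass during training.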
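A minimal decoding loop for the AR Transformer (cf. Xu et al., 2 Dec 2025; Li et al., 23 Jan 2026), written as Python in which `ar_transformer`, `sample`, and `post_process` are placeholder callables:

```python
def generate(ar_transformer, sample, post_process, start_tokens, end_token, max_len=4096):
    """Autoregressively decode one holistic B-rep token sequence."""
    seq = list(start_tokens)             # START or meta-tokens
    # max_len is a safety cap added here; it is not part of the original loop.
    while seq[-1] != end_token and len(seq) < max_len:
        logits = ar_transformer(seq)     # next-token logits given the causal prefix
        next_tok = sample(logits)        # e.g. top-p sampling
        seq.append(next_tok)
    return post_process(seq)             # post-process the finished sequence
```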
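The `sample(logits)` step in the loop above could be realized in several ways. The following is a minimal, standard-library sketch of nucleus (top-p) sampling with temperature. The function name `sample_top_p`, its parameters, and the `allowed` set used to mimic a constrained-decoding mask are assumptions made for illustration, not the samplers used by the cited models.

```python
import math
import random

def sample_top_p(logits, top_p=0.9, temperature=1.0, allowed=None, rng=random):
    """Nucleus (top-p) sampling over a list of logits.

    `allowed` is an optional set of token ids; every other id is masked out,
    which is one simple way to express a constrained-decoding mask.
    """
    ids = [i for i in range(len(logits)) if allowed is None or i in allowed]
    if not ids:
        raise ValueError("constraint mask leaves no valid token")
    # Temperature-scaled softmax over permitted tokens (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in ids]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = sorted(((w / total, i) for w, i in zip(weights, ids)), reverse=True)
    # Keep the smallest high-probability prefix whose mass reaches top_p.
    nucleus, cum = [], 0.0
    for p, i in probs:
        nucleus.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Sample within the nucleus, proportionally to the retained probabilities.
    return rng.choices([i for _, i in nucleus],
                       weights=[p for p, _ in nucleus], k=1)[0]

# Usage: token 1 dominates, so a 0.5 nucleus contains only that token.
tok = sample_top_p([0.0, 10.0, 0.0], top_p=0.5)
# With a mask that forbids token 1, the sample must come from {0, 2}.
masked_tok = sample_top_p([0.0, 10.0, 0.0], allowed={0, 2})
```

Applying the mask before the softmax, rather than rejecting invalid samples afterwards, keeps each decoding step to a single draw.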
4. Comparison of Holistic Autoregressive Paradigms and Competing Approaches
The first holistic AR B-rep models—BrepARG, AutoBrep, BrepGPT—depart sharply from prior graph/disjoint pipelines (Guo et al., 2022, Jayaraman et al., 2022), and introduce correctness and efficiency advantages:
| Model | Sequence Representation | Topology Coupling | AR Factorization | Validity/Novelty (%) |
|---|---|---|---|---|
| ComplexGen | No (parallel sets + ILP) | Chain complex constraints | No AR | — / — |
| SolidGen | Indexed B-rep, 3-step AR (V,E,F) | Hierarchical pointers | Hierarchical AR (V,E,F) | 86.7 / 82.5 (Jayaraman et al., 2022) |
| BrepARG | Holistic (faces, edges in 1 sequence) | Face/edge indices | Flat AR on full sequence | 87.6 / 99.8 (Li et al., 23 Jan 2026) |
| AutoBrep | Unified BFS + FSQ/Ref tokens | Sliding T-refs, BFS tags | Flat AR | 70.8 / 99.8 (Xu et al., 2 Dec 2025) |
| BrepGPT | VHP vertex-centric token block | Voronoi-patch, vertex VQ | Flat AR, validity-masked | 83.9 / 97.9 (Li et al., 27 Nov 2025) |
The holistic AR models place geometry and topology in one causal sequence, which enables full joint modeling and scaling to solids with more than 100 faces while keeping most generated solids valid (watertight and manifold). The validity figures in the table are not all directly comparable, because they are reported on different datasets: AutoBrep's 70.8% is measured on ABC-1M, whereas BrepARG's 87.6% and BrepGPT's 83.9% are measured on DeepCAD. Reported novelty ranges from 97.9% to 99.8%, indicating that the models do not simply memorize the training set (Li et al., 23 Jan 2026, Li et al., 27 Nov 2025, Xu et al., 2 Dec 2025).
5. Tokenization Schemes and Geometry–Topology Coupling
A distinguishing technical feature is the fusion of geometric compression (e.g., VQ-VAE, FSQ, DCAE) with topology-aware indexing and references:
- Geometry Tokens: Dense surface and curve samples are projected via encoders into a small grid of discrete codes (e.g., 4 tokens per face, 2 per edge), quantizing local geometry (Xu et al., 2 Dec 2025, Li et al., 23 Jan 2026).
- Position/BBox Tokens: Axis-aligned bounding boxes are uniformly quantized, allowing for coarse recovery of feature extents (Li et al., 23 Jan 2026, Xu et al., 2 Dec 2025).
- Topology Tokens: Unique indices, sliding window tags, next-pointer VHP embeddings, or explicit face pairs represent adjacency within the sequence itself (Li et al., 27 Nov 2025, Li et al., 23 Jan 2026).
- Voronoi Half-Patch: BrepGPT introduces the VHP representation, assigning geometry to Voronoi half-edges and coupling surface, curve, and connectivity tokens into enumerated vertex-based blocks for efficient sequential prediction (Li et al., 27 Nov 2025).
- Sequence Construction: Ordering strategies—DFS (BrepARG), BFS with sliding tags (AutoBrep), vertex blocks (BrepGPT)—optimize locality and disambiguate reference scope.
These schemes are designed so that, at each decoding step, the AR model has sufficient context to determine both geometric detail and topological relations. The solid can then be reconstructed from the sequence alone, up to the quantization error of the geometry and position tokens.
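As a concrete illustration of the position tokens, the sketch below uniformly quantizes an axis-aligned bounding box and measures the round-trip error. The coordinate range of [-1, 1], the 6-bit resolution, and the function names are assumptions chosen for the example; the cited papers may normalize and discretize differently.

```python
def quantize(x, lo=-1.0, hi=1.0, bits=6):
    """Map a coordinate in [lo, hi] to an integer bin in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    t = (min(max(x, lo), hi) - lo) / (hi - lo)   # clamp, then normalize to [0, 1]
    return round(t * levels)

def dequantize(q, lo=-1.0, hi=1.0, bits=6):
    """Map a bin index back to the coordinate it represents."""
    levels = (1 << bits) - 1
    return lo + (q / levels) * (hi - lo)

def bbox_tokens(bbox, **kw):
    """Quantize (xmin, ymin, zmin, xmax, ymax, zmax) into six position tokens."""
    return [quantize(c, **kw) for c in bbox]

bbox = (-0.50, -0.25, 0.0, 0.50, 0.25, 1.0)
tokens = bbox_tokens(bbox)
recovered = [dequantize(q) for q in tokens]
step = 2.0 / ((1 << 6) - 1)                       # bin width for the defaults above
max_err = max(abs(a - b) for a, b in zip(bbox, recovered))
```

With rounding to the nearest bin, the round-trip error per coordinate is at most half a bin width, which is why position tokens alone give only a coarse recovery of feature extents.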
6. Empirical Results: Performance, Scalability, and Applications
Empirical evaluations across benchmarks (DeepCAD, ABC, Furniture) consistently show that holistic AR models achieve superior distributional metrics (COV, MMD, JSD), CAD validity, and computational efficiency:
| Model | Dataset | COV(↑) | MMD(↓) | JSD(↓) | Valid(↑) | Inference Time (s/sample) |
|---|---|---|---|---|---|---|
| BrepARG | DeepCAD | 75.45 | 0.89 | 1.02 | 87.60 | 1.5 |
| BrepARG | ABC | 70.10 | 1.405 | 1.337 | 67.54 | 1.5 |
| AutoBrep | ABC-1M | 71.5 | 1.45 | 0.97 | 70.8 | 0.46 |
| BrepGPT | DeepCAD | 79.3 | 0.96 | 0.84 | 83.9 | — |
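The COV and MMD columns can be read with the following sketch in mind. It assumes the definitions commonly used for point-cloud generative models (coverage as the fraction of reference shapes that are the nearest neighbor of some generated shape, and minimum matching distance as the mean distance from each reference shape to its closest generated shape) with a symmetric Chamfer distance. The exact protocols of the cited papers, such as point counts, normalization, and any scaling of the reported numbers, are not given here, so this is an illustration only.

```python
def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (lists of 3-tuples)."""
    def sq(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    d_ab = sum(min(sq(p, q) for q in b) for p in a) / len(a)
    d_ba = sum(min(sq(q, p) for p in a) for q in b) / len(b)
    return d_ab + d_ba

def cov_mmd(generated, reference, dist=chamfer):
    """Return (coverage, minimum matching distance) for two sets of shapes."""
    d = [[dist(g, r) for r in reference] for g in generated]
    # Coverage: which reference shapes are the nearest neighbor of a generated one.
    matched = {min(range(len(reference)), key=row.__getitem__) for row in d}
    cov = len(matched) / len(reference)
    # MMD: average, over reference shapes, of the distance to the closest generation.
    mmd = sum(min(row[j] for row in d) for j in range(len(reference))) / len(reference)
    return cov, mmd

# Toy data: single-point "shapes" so the distances are easy to check by hand.
reference = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
generated = [[(0.0, 0.0, 0.0)], [(1.1, 0.0, 0.0)]]
cov, mmd = cov_mmd(generated, reference)
```

In this toy case two of the three reference shapes are covered, and the uncovered outlier at x = 5 dominates the MMD, which shows how the two metrics capture diversity and fidelity separately.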
Holistic AR models natively support conditional generation (e.g., class, partial B-rep completion, image/point-cloud/text prompts), outperforming prior multi-stage or diffusion-based methods in both speed and generation quality (Li et al., 23 Jan 2026, Xu et al., 2 Dec 2025, Li et al., 27 Nov 2025).
7. Extensions, Open Problems, and Future Directions
Current holistic AR B-rep models have limitations related to quantization error (especially for fine-featured surfaces), sequence length scaling (industrial assemblies), and the hybridization needed for analytic primitives (screws, threads). Prospective directions include:
- Higher-fidelity codebooks and residual quantization to minimize information loss in geometry tokens (Li et al., 23 Jan 2026, Xu et al., 2 Dec 2025).
- Sparse attention and memory optimizations for very large solids (Li et al., 23 Jan 2026).
- Incorporation of user constraints (e.g., partial face seeds), supporting CAD in-painting, or explicit text/image-driven CAD design (Xu et al., 2 Dec 2025).
- Hybrid analytic-numeric tokenization for parametric primitives beyond point-sampled grids (Xu et al., 2 Dec 2025).
- Further generalization to other domains that require holistic AR modeling of complex topology (e.g., biological polymers with long-range constraints) (Zhang et al., 9 Oct 2025).
A plausible implication is that as holistic token sequence construction and AR generation mature, similar architectures and tokenization strategies will propagate to fields such as biomolecular generation, autoregressive image synthesis with global constraints (Zheng et al., 3 Jul 2025), and code generation conditioned on structural side-information (Zhang et al., 9 Oct 2025).
Key references include "AutoRegressive Generation with B-rep Holistic Token Sequence Representation" (Li et al., 23 Jan 2026), "AutoBrep: Autoregressive B-Rep Generation with Unified Topology and Geometry" (Xu et al., 2 Dec 2025), and "BrepGPT: Autoregressive B-rep Generation with Voronoi Half-Patch" (Li et al., 27 Nov 2025). Their architectural, tokenization, and evaluation protocols now set the state of the art for end-to-end AR CAD model generation.