Multi-Modal Flow Matching for B-Rep Generation

Updated 1 February 2026
  • Multi-Modal Flow Matching for B-Rep Generation is a generative modeling approach that reformulates CAD B-Reps as compositional k-cell particles for flexible, topology-aware synthesis.
  • It utilizes a dual-stage CC-VAE with cross-attention, GCN, and a set transformer to encode geometric features and a rectified flow to deterministically transport latent vectors for both conditional and unconditional generation.
  • The framework supports versatile applications including local inpainting, non-manifold synthesis, and scalable reconstruction with improved validity, cyclomatic complexity, and geometric fidelity.

A multi-modal flow matching framework, as realized in "Flatten The Complex: Joint B-Rep Generation via Compositional k-Cell Particles" (Lu et al., 25 Jan 2026), refers to a generative modeling paradigm for CAD boundary representations (B-Reps) where both geometric and topological components, across dimensions, are encoded as compositional particles. This framework leverages flow matching in latent space, allowing for unconditional, conditional (e.g., single-view, point cloud), and inpainting tasks with enhanced validity, topological flexibility, and robust editability.

1. Compositional k-Cell Particle Representation

Conventional B-Rep modeling adheres to the cell complex structure $\mathcal{C} = \mathcal{V} \cup \mathcal{E} \cup \mathcal{F} \subset \mathbb{R}^3$, explicitly encoding hierarchical relations among 0-cells (vertices), 1-cells (edges), and 2-cells (faces). The framework introduced in (Lu et al., 25 Jan 2026) reorganizes this hierarchy into a set $\mathcal{P} = \{p_i\}_{i=1}^N$ of compositional $k$-cell particles, where each particle $p_i = (x_i, c_i, \mathbf{h}_i)$ consists of:

  • $x_i \in \mathbb{R}^3$: spatial anchor of the cell (e.g., centroid)
  • $c_i \in \{0, 1, 2\}$: cell order (vertex, edge, face)
  • $\mathbf{h}_i \in \mathbb{R}^D$: a learned latent embedding containing both geometric and graph-structural information

Boundary relations are captured over a Spatial Hasse diagram $\mathcal{G} = (\mathcal{N}, \mathcal{I})$, with directed links encoding boundary inclusion ($j \prec_{\mathcal{I}} i$). Higher-order cells reuse the embeddings of their boundary constituents, with geometry decoded for $k > 0$ as

$$\Phi_i = \mathcal{D}_\theta\left(\mathbf{z}_i, \{\Phi_j \mid j \prec_{\mathcal{I}} i\}\right), \tag{1}$$

enabling intrinsic sharing and coupling of local and global B-Rep attributes.
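
The compositional decoding rule can be sketched as a small recursion over the Hasse diagram. The `Particle` container and the toy `decoder` callable below are illustrative stand-ins, not the paper's actual network:

```python
from dataclasses import dataclass, field

@dataclass
class Particle:
    x: tuple                # spatial anchor of the cell (e.g., centroid) in R^3
    c: int                  # cell order: 0 = vertex, 1 = edge, 2 = face
    z: list                 # latent embedding h_i / z_i (D-dimensional)
    boundary: list = field(default_factory=list)  # indices j with j ≺ i

def decode(particles, i, decoder, cache=None):
    """Recursively decode cell i, reusing decoded boundary geometry:
    Phi_i = D(z_i, {Phi_j | j ≺ i}); 0-cells bottom out with no boundary."""
    if cache is None:
        cache = {}
    if i in cache:                      # each cell is decoded exactly once,
        return cache[i]                 # so shared boundaries are shared geometry
    p = particles[i]
    boundary_geo = [decode(particles, j, decoder, cache) for j in p.boundary]
    cache[i] = decoder(p.z, boundary_geo)
    return cache[i]
```

Because decoded boundary cells are cached, an edge and the faces incident on it literally share the same vertex geometry, which is the coupling property the representation is designed for.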

2. CC-VAE: Encoding and Decoding the Particle Set

Cell particle sets are first processed by a two-stage compositional cell variational autoencoder (CC-VAE). The encoder comprises:

  • Local geometric injection via cross-attention from dense surface points to each $x_i$
  • A 2-layer GCN over the Hasse relations $\mathcal{I}$
  • A Set Transformer yielding posterior means and variances $(\mu_i, \sigma_i)$ for each particle

The decoder reconstructs

  • particle positions $\hat{x}_i$
  • cell types $\hat{c}_i$
  • adjacency (link) probabilities $\hat{A}_{ij}$, via a masked focal loss on boundary inclusion

while simultaneously reconstructing the geometric content of each cell via Eq. (1). The total VAE loss combines reconstruction, binary cross-entropy, focal, and Kullback–Leibler terms:

$$\mathcal{L}_{\mathrm{VAE}} = \sum_i \left[ \|\hat{x}_i - x_i\|_2^2 + \mathrm{BCE}(c_i, \hat{c}_i) \right] + \sum_{i<j} \mathrm{FL}(A_{ij}, \hat{A}_{ij}) + \mathcal{L}_{\mathrm{geom}} + \beta\, \mathrm{KL}\big(q(\mathbf{z})\,\|\,\mathcal{N}(0, I)\big)$$
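
The shape of this objective can be sketched numerically. This is a minimal numpy sketch, not the paper's implementation: the focal exponent $\gamma = 2$, the $\beta$ weight, and a softmax-style cell-type head are assumed for illustration, and $\mathcal{L}_{\mathrm{geom}}$ is passed in as an opaque scalar:

```python
import numpy as np

def focal_loss(a, a_hat, gamma=2.0, eps=1e-8):
    # Binary focal loss on boundary-inclusion probabilities:
    # down-weights easy (confident) pairs by (1 - p)^gamma.
    p = np.where(a == 1, a_hat, 1 - a_hat)
    return float(np.mean(-((1 - p) ** gamma) * np.log(p + eps)))

def cc_vae_loss(x, x_hat, c_onehot, c_hat, A, A_hat, mu, logvar,
                geom_loss=0.0, beta=1e-3):
    pos = float(np.sum((x_hat - x) ** 2))                 # position reconstruction
    ce = float(np.mean(-np.sum(c_onehot * np.log(c_hat + 1e-8), axis=-1)))
    iu = np.triu_indices(A.shape[0], k=1)                 # unordered pairs i < j
    link = focal_loss(A[iu], A_hat[iu])                   # boundary-inclusion links
    kl = -0.5 * float(np.mean(1 + logvar - mu**2 - np.exp(logvar)))
    return pos + ce + link + geom_loss + beta * kl
```

With $\mu = 0$ and $\log \sigma^2 = 0$ the KL term vanishes, so a freshly regularized posterior contributes nothing; the remaining terms are pure reconstruction pressure.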

3. Multi-Modal Rectified Flow Matching

After CC-VAE training, every B-Rep is represented as a fixed-length unordered set of latent vectors $\mathcal{Z} = \{\mathbf{z}_i\}$. The set generator then models the generative process as a rectified flow, fitting a displacement-based flow model [Liu et al., 2023] that deterministically transports a Gaussian prior $\pi_0 = \mathcal{N}(0, I)$ to the empirical data law $\pi_1 = p(\mathcal{Z})$ along the linear interpolation path

$$z_t = t z_1 + (1 - t) z_0, \quad t \in [0, 1],$$

with the flow net $v_\theta(z_t, t)$ trained under a mean squared velocity loss:

$$\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t \sim \mathcal{U}[0,1],\, z_0 \sim \pi_0,\, z_1 \sim \pi_1} \left\| v_\theta(z_t, t) - (z_1 - z_0) \right\|_2^2$$

At inference, integrating the ODE $\dot{z}_t = v_\theta(z_t, t)$ from $t = 0$ to $t = 1$ (unconditional), or $\dot{z}_t = v_\theta(z_t, t \mid c)$ for a conditioning variable $c$ (e.g., image or point cloud), synthesizes new B-Rep structures in latent space.
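
The training-pair construction and the inference-time ODE integration can be sketched in a few lines. This is a generic rectified-flow sketch, not the paper's code; the closed-form oracle velocity in the usage below (exact for a single data point) merely stands in for a trained $v_\theta$:

```python
import numpy as np

def rf_pair(z1, rng):
    """Sample (z_t, t, target velocity) for one data latent z1."""
    z0 = rng.standard_normal(z1.shape)      # Gaussian prior sample
    t = rng.uniform()
    zt = t * z1 + (1 - t) * z0              # linear interpolation path
    return zt, t, z1 - z0                   # constant target velocity z1 - z0

def euler_sample(v_theta, shape, steps=50, rng=None):
    """Integrate dz/dt = v_theta(z, t) from t = 0 (prior) to t = 1 (data)."""
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(shape)
    dt = 1.0 / steps
    for k in range(steps):
        z = z + dt * v_theta(z, k * dt)     # explicit Euler step
    return z
```

Training minimizes the squared error between the network's prediction at $(z_t, t)$ and the returned target velocity; sampling only ever calls the network, never the data.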

Conditional generation is handled by a dual-stream transformer backbone based on MM-DiT [SD3], where noisy latent token sequences and frozen condition tokens (e.g., DINOv2 image embeddings, Sonata point cloud features) are simultaneously ingested. The flow model conditions on $c$ for both training and inference.
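
The joint ingestion of latent and condition tokens amounts to attention over the concatenated sequence. The single-head sketch below is a simplified illustration of that pattern (queries from the latent stream only, keys/values from both streams); the real MM-DiT block is multi-head with per-stream projections and modulation:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(lat, cond, Wq, Wk, Wv):
    """One joint-attention step: latent tokens (N, D) attend over the
    concatenation of themselves and frozen condition tokens (M, D), so
    every generated particle can read the conditioning evidence."""
    seq = np.concatenate([lat, cond], axis=0)    # (N + M, D)
    q = lat @ Wq                                 # queries from latents only
    k, v = seq @ Wk, seq @ Wv                    # keys/values from both streams
    att = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return att @ v                               # (N, D) updated latent tokens
```

Keeping the condition tokens frozen (no gradient, no update) matches the description above: the image or point-cloud encoder supplies evidence; only the latent stream is denoised.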

4. Functional Properties and Inference Capabilities

The particle-based set structure and the latent flow allow for:

  • Unconditional generation: sampling from the prior and transporting latents via flow yields diverse B-Rep solids.
  • Conditional generation: incorporating external modalities (single-view images, point clouds) as condition tokens allows the generation of solids consistent with observed evidence.
  • Local inpainting: arbitrary masking of particles (i.e., holding some tokens fixed, such as vertices/edges) during flow sampling enables local or partial completion.
  • Non-manifold synthesis: by restricting the token set (e.g., only 0- and 1-cells), the method can directly synthesize wireframe or other non-manifold structures.
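
Local inpainting via fixed tokens can be sketched as pinning the known particles to their interpolation path at every integration step, so the flow only has freedom over the masked-out remainder. This is a generic masked-sampling sketch under that assumption, not the paper's exact procedure; the oracle velocity in the test stands in for a trained flow net:

```python
import numpy as np

def inpaint_sample(v_theta, z_known, mask, steps=50, rng=None):
    """Flow sampling with some particle latents held fixed: rows where
    mask is True are re-pinned to their known path z_t = t z1 + (1-t) z0
    before every step, so only unmasked particles are freely generated."""
    rng = rng or np.random.default_rng(0)
    z0 = rng.standard_normal(z_known.shape)
    z = z0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        z[mask] = t * z_known[mask] + (1 - t) * z0[mask]  # pin known tokens
        z = z + dt * v_theta(z, t)                        # flow the rest
    z[mask] = z_known[mask]                               # exact at t = 1
    return z
```

The same routine covers the wireframe case: restricting `z_known`/`mask` to 0- and 1-cell tokens and generating the rest (or omitting 2-cell tokens entirely) changes only the token set, not the sampler.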

5. Experimental Results and Advantages over Prior Methodologies

Quantitative benchmarks on DeepCAD and ABC datasets demonstrate that the multi-modal flow matching framework achieves:

| Dataset | 1-NNA ↓ | MMD ↓ | JSD ↓ | Coverage ↑ | Validity ↑ | CC ↑ |
|---------|---------|-------|-------|------------|------------|------|
| ABC | 63.02 | 1.74 | 0.66 | 64.32 | 66.50 % | 12.92 |

Notable advantages include:

  • Higher validity, especially for B-Reps with complex topology (min-$k$ validity remains stable as $k$ increases, whereas other models falter for $k > 7$).
  • Robust cyclomatic complexity (loop structure) and geometric fidelity.
  • Inference-time scaling: increasing the number of tokens beyond the training regime (e.g., from $N = 256$ to $N = 512$ or $1024$) improves validity without retraining.
  • Versatility in tasks—unconditional generation, conditional reconstruction, local inpainting, and direct wireframe synthesis—without architectural modifications.

Qualitative analysis (as shown in their Figures 6–9) confirms sharp feature preservation, watertightness, and editability, as well as the ability to flexibly manipulate and edit B-Rep substructures.

6. Broader Significance and Limitations

By flattening the B-Rep cell complex to a compositional set, jointly learning holistic encoding and decoding with a CC-VAE, and employing rectified flow for generation, this framework overcomes fundamental limitations of hierarchical, cascade-based, or strictly sequential methods:

  • Topology-geometry coupling: explicit shared-latent boundaries enforce geometric and topological consistency across all orders.
  • Parallelism and flexibility: the set-based approach supports global reasoning, unrestricted editing, and scalable synthesis.
  • Failure modes: while validity and editability are improved, the set transformer's computational cost grows as $\mathcal{O}(N^2)$, and excessive masking or very high token counts may challenge decoder capacity.

Future work suggested in (Lu et al., 25 Jan 2026) includes scaling to even larger B-Reps, more advanced masking schemes, and extending the approach to richer input/output modalities and assemblies. The multi-modal flow matching framework currently represents the most holistic, edit-friendly, and contextually robust paradigm for B-Rep generative modeling.
