Multi-Modal Flow Matching for B-Rep Generation
- Multi-Modal Flow Matching for B-Rep Generation is a generative modeling approach that reformulates CAD B-Reps as compositional k-cell particles for flexible, topology-aware synthesis.
- It utilizes a two-stage CC-VAE (cross-attention, a GCN, and a Set Transformer) to encode geometric and topological features, and a rectified flow that deterministically transports latent vectors for both conditional and unconditional generation.
- The framework supports versatile applications including local inpainting, non-manifold synthesis, and scalable reconstruction with improved validity, cyclomatic complexity, and geometric fidelity.
A multi-modal flow matching framework, as realized in "Flatten The Complex: Joint B-Rep Generation via Compositional k-Cell Particles" (Lu et al., 25 Jan 2026), refers to a generative modeling paradigm for CAD boundary representations (B-Reps) in which both geometric and topological components, across all cell dimensions, are encoded as compositional k-cell particles. The framework leverages flow matching in latent space, supporting unconditional generation, conditional generation (e.g., from a single view or a point cloud), and inpainting, with enhanced validity, topological flexibility, and robust editability.
1. Compositional k-Cell Particle Representation
Conventional B-Rep modeling adheres to the cell complex structure, explicitly encoding hierarchical relations among 0-cells (vertices), 1-cells (edges), and 2-cells (faces). The framework introduced in (Lu et al., 25 Jan 2026) reorganizes this hierarchy into a set of compositional k-cell particles, where each particle consists of:
- $p_i$: the spatial anchor of the cell (e.g., its centroid)
- $k_i$: the cell order (vertex, edge, or face)
- $z_i$: a learned latent embedding containing both geometric and graph-structural information
Boundary relations are captured over a Spatial Hasse diagram, with directed links encoding boundary inclusion (each cell linking to the lower-order cells on its boundary). Higher-order cells reuse the embeddings of their boundary constituents, and the geometry of each cell is decoded jointly from its own latent and the latents of its boundary cells (Eq. 1), enabling intrinsic sharing and coupling of local and global B-Rep attributes.
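The minimal Python sketch below illustrates one way such a particle set and its Hasse links could be represented; the class and field names (`CellParticle`, `pos`, `order`, `latent`, `boundary_links`) are illustrative assumptions, not the paper's interfaces.

```python
# Illustrative data structures for a compositional k-cell particle set.
from dataclasses import dataclass, field
import torch

@dataclass
class CellParticle:
    pos: torch.Tensor      # spatial anchor of the cell, e.g., its centroid, shape (3,)
    order: int             # cell order k: 0 = vertex, 1 = edge, 2 = face
    latent: torch.Tensor   # learned embedding with geometric + graph-structural info, shape (d,)

@dataclass
class BRepParticleSet:
    particles: list[CellParticle]
    # Directed Hasse links (i -> j): particle j lies on the boundary of particle i,
    # e.g., an edge links to its endpoint vertices, a face to its bounding edges.
    boundary_links: list[tuple[int, int]] = field(default_factory=list)

    def boundary_of(self, i: int) -> list[int]:
        """Indices of the lower-order cells on the boundary of particle i."""
        return [j for (src, j) in self.boundary_links if src == i]
```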
2. CC-VAE: Encoding and Decoding the Particle Set
Cell particle sets are first processed by a two-stage compositional cell variational autoencoder (CC-VAE). The encoder comprises:
- Local geometric injection via cross-attention from dense surface points to each particle embedding
- A 2-layer GCN over the Hasse relations
- A Set Transformer yielding posterior means and variances for each particle
The decoder reconstructs:
- particle positions
- cell types
- adjacency (link) probabilities, supervised via a masked focal loss on boundary inclusion

The geometric content of each cell is simultaneously reconstructed via Eq. (1). The total VAE loss combines these reconstruction terms, the binary cross-entropy/focal link loss, and a Kullback–Leibler regularizer.
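A schematic PyTorch sketch of how such a composite objective could be assembled follows; the focal-loss form, the loss weights `gamma` and `beta`, and the tensor layout are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cc_vae_loss(pos_pred, pos_gt, type_logits, type_gt,
                link_logits, link_gt, link_mask,
                geo_pred, geo_gt, mu, logvar,
                gamma=2.0, beta=1e-3):
    """Illustrative CC-VAE objective: reconstruction + masked focal link loss + KL."""
    # Particle position and per-cell geometry reconstruction (geometry decoded per Eq. 1).
    pos_loss = F.mse_loss(pos_pred, pos_gt)
    geo_loss = F.mse_loss(geo_pred, geo_gt)
    # Cell-type classification (vertex / edge / face).
    type_loss = F.cross_entropy(type_logits, type_gt)
    # Masked focal loss on boundary-inclusion (Hasse link) probabilities.
    p = torch.sigmoid(link_logits)
    bce = F.binary_cross_entropy_with_logits(link_logits, link_gt, reduction="none")
    focal = ((1 - p) * link_gt + p * (1 - link_gt)) ** gamma * bce
    link_loss = (focal * link_mask).sum() / link_mask.sum().clamp(min=1)
    # Gaussian KL regularization of each particle's posterior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return pos_loss + geo_loss + type_loss + link_loss + beta * kl
```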
3. Multi-Modal Rectified Flow Matching
After CC-VAE training, every B-Rep is represented as a fixed-length, unordered set of latent vectors. The set generator then models the generative process as a rectified flow, fitting a displacement-based flow model [Liu et al., 2023] that deterministically transports a Gaussian prior $z_0 \sim \mathcal{N}(0, I)$ to the empirical data law via the ODE

$$\frac{dz_t}{dt} = v_\theta(z_t, t), \qquad z_t = (1 - t)\,z_0 + t\,z_1,$$

with the flow net trained under the mean-squared velocity loss

$$\mathcal{L}_{\text{flow}} = \mathbb{E}_{t,\,z_0,\,z_1}\,\big\| v_\theta(z_t, t) - (z_1 - z_0) \big\|^2.$$

At inference, integrating from $t = 0$ to $t = 1$ synthesizes new B-Rep structures in latent space, either unconditionally or conditioned on a variable $c$ (e.g., an image or point cloud) by evaluating $v_\theta(z_t, t, c)$.
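The sketch below shows standard rectified-flow training and Euler sampling over the particle latents, assuming a generic `flow_net(z, t, cond)` velocity model; it follows the Liu et al. (2023) formulation and is not the paper's exact implementation.

```python
import torch

def rectified_flow_loss(flow_net, z1, cond=None):
    """One training step: regress the constant displacement z1 - z0 along a straight path."""
    z0 = torch.randn_like(z1)                      # Gaussian prior sample
    t = torch.rand(z1.shape[0], device=z1.device)  # uniform time in [0, 1]
    t_ = t.view(-1, *([1] * (z1.dim() - 1)))
    zt = (1 - t_) * z0 + t_ * z1                   # linear interpolation path
    v_target = z1 - z0                             # straight-line velocity
    v_pred = flow_net(zt, t, cond)
    return torch.mean((v_pred - v_target) ** 2)

@torch.no_grad()
def sample(flow_net, shape, cond=None, steps=50, device="cpu"):
    """Euler integration of dz/dt = v(z, t) from the prior (t = 0) to data (t = 1)."""
    z = torch.randn(shape, device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        z = z + dt * flow_net(z, t, cond)
    return z
```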
Conditional generation is handled by a dual-stream transformer backbone based on MM-DiT [SD3], where noisy latent token sequences and frozen condition tokens (e.g., DINOv2 image embeddings, Sonata point cloud features) are ingested simultaneously. The flow model conditions on $c$ during both training and inference.
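As a rough illustration of how frozen condition tokens can be ingested alongside noisy latent tokens, the sketch below uses a single joint-attention transformer rather than the paper's MM-DiT dual-stream backbone; all dimensions, module choices, and names are assumptions.

```python
import torch
import torch.nn as nn

class ConditionalFlowNet(nn.Module):
    """Illustrative conditional velocity field: noisy particle latents attend jointly
    with frozen condition tokens; only latent-token outputs are read as velocities."""
    def __init__(self, dim=256, cond_dim=768, depth=4, heads=8):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, dim)
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, dim)

    def forward(self, z_t, t, cond_tokens):
        # z_t: (B, N, dim) noisy particle latents; cond_tokens: (B, M, cond_dim), frozen.
        h = z_t + self.time_mlp(t.view(-1, 1)).unsqueeze(1)   # broadcast time embedding
        c = self.cond_proj(cond_tokens)
        x = torch.cat([h, c], dim=1)                          # joint token sequence
        x = self.blocks(x)
        return self.head(x[:, : z_t.shape[1]])                # velocities for latent tokens only
```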
4. Functional Properties and Inference Capabilities
The particle-based set structure and the latent flow allow for:
- Unconditional generation: sampling from the prior and transporting latents via flow yields diverse B-Rep solids.
- Conditional generation: incorporating external modalities (single-view images, point clouds) as condition tokens allows the generation of solids consistent with observed evidence.
- Local inpainting: arbitrary masking of particles (i.e., holding some tokens, such as vertices or edges, fixed) during flow sampling enables local or partial completion (see the sketch after this list).
- Non-manifold synthesis: by restricting the token set (e.g., only 0- and 1-cells), the method can directly synthesize wireframe or other non-manifold structures.
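A minimal sketch of an inpainting-style sampling loop follows, reusing the generic `flow_net` interface assumed above: known particles are pinned to their straight noise-to-data trajectory at every step while masked particles are generated freely. This is an illustrative mechanism, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def inpaint_sample(flow_net, z_known, keep_mask, steps=50, cond=None):
    """Masked flow sampling: keep_mask marks particles whose latents are held fixed."""
    z0 = torch.randn_like(z_known)           # fixed prior draw
    z = z0.clone()
    mask = keep_mask.unsqueeze(-1).float()   # (B, N, 1), broadcast over latent dim
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        t_vec = torch.full((z.shape[0],), t, device=z.device)
        z = z + dt * flow_net(z, t_vec, cond)          # Euler step for all tokens
        # Re-impose the known particles on their straight path at time t + dt.
        z_ref = (1 - (t + dt)) * z0 + (t + dt) * z_known
        z = mask * z_ref + (1 - mask) * z
    return z                                  # kept tokens end exactly at z_known
```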
5. Experimental Results and Advantages over Prior Methodologies
Quantitative benchmarks on the DeepCAD and ABC datasets demonstrate that the multi-modal flow matching framework achieves strong distributional, validity, and topological-complexity metrics; for example, on ABC:
| Dataset | 1-NNA ↓ | MMD ↓ | JSD ↓ | Coverage ↑ | Validity ↑ | CC ↑ |
|---|---|---|---|---|---|---|
| ABC | 63.02 | 1.74 | 0.66 | 64.32 | 66.50 % | 12.92 |
Notable advantages include:
- Higher validity, especially for B-Reps with complex topology: validity remains stable as the topological complexity (e.g., number of faces) increases, whereas competing models falter on larger, more complex solids.
- Robust cyclomatic complexity (loop structure) and geometric fidelity.
- Inference-time scaling: increasing the number of tokens beyond the training regime (e.g., up to $1024$) improves validity without retraining.
- Versatility in tasks—unconditional generation, conditional reconstruction, local inpainting, and direct wireframe synthesis—without architectural modifications.
Qualitative analysis (Figures 6–9 of the paper) confirms sharp feature preservation, watertightness, and editability, as well as the ability to flexibly manipulate B-Rep substructures.
6. Broader Significance and Limitations
By flattening the B-Rep cell complex to a compositional set, jointly learning holistic encoding and decoding with a CC-VAE, and employing rectified flow for generation, this framework overcomes fundamental limitations of hierarchical, cascade-based, or strictly sequential methods:
- Topology-geometry coupling: explicit shared-latent boundaries enforce geometric and topological consistency across all orders.
- Parallelism and flexibility: the set-based approach supports global reasoning, unrestricted editing, and scalable synthesis.
- Failure modes: while validity and editability are improved, the set transformer's computational cost grows with the number of particles, and excessive masking or very high token counts may challenge decoder capacity.
Future work suggested in (Lu et al., 25 Jan 2026) includes scaling to even larger B-Reps, more advanced masking schemes, and extending the approach to richer input/output modalities and assemblies. The multi-modal flow matching framework currently represents the most holistic, edit-friendly, and contextually robust paradigm for B-Rep generative modeling.