Mesh Silksong: Efficient Mesh Generation

Updated 8 July 2025

Mesh Silksong is a compact mesh representation framework that uniquely tokenizes each vertex to eliminate redundancy and enhance auto-regressive generation.
It employs a layered tokenization strategy by splitting mesh topology into self- and between-layer adjacency matrices, significantly improving compression and geometric accuracy.
Empirical evaluations demonstrate state-of-the-art performance with a 22% compression rate and superior preservation of manifold topology, watertightness, and consistent face normals.

Mesh Silksong is a compact and efficient mesh representation framework designed for auto-regressive mesh generation, inspired by the process of silk weaving. Addressing the inefficiencies of prior mesh tokenization techniques that repeatedly encode identical vertices within polygonal meshes, Mesh Silksong uniquely tokenizes each mesh vertex exactly once. The approach splits mesh topology into multiple structured representations to substantially reduce redundancy and simultaneously achieve high geometric integrity, resulting in polygon meshes with manifold topology, watertightness, and consistent face normals. The system achieves a state-of-the-art compression rate of approximately 22%, and experimental evaluation demonstrates superior performance in geometric accuracy and mesh quality compared to earlier methods (2507.02477).

1. Tokenization and Sequential Encoding

Mesh Silksong introduces a "weaving" paradigm for mesh tokenization, characterized by two core principles: (a) each mesh vertex is encoded only once, and (b) topological information is divided into two independently compressed parts. The mesh is first preprocessed, which includes handling non-manifold edges and sorting vertices by "layering." Each vertex is assigned a unique coordinate, $\mathcal{V}_{i}^{L}$, determined by its layer $L$ (graph distance from a designated initial vertex) and its within-layer order $i$. Ordering within each layer follows the counterclockwise arrangement provided by the half-edge data structure at the prior layer $L-1$. This ensures that each vertex appears exactly once in the token stream and reduces redundancy by approximately 50% compared to existing tree-traversal approaches.
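The layer assignment described above can be sketched as a breadth-first traversal of the mesh edge graph. This is a minimal illustration, not the authors' implementation: the function name `bfs_layers` and the octahedron example are hypothetical, and the within-layer order here is simply sorted vertex index, standing in for the half-edge counterclockwise ordering used by the method.

```python
from collections import deque

def bfs_layers(faces, start):
    """Group vertices by graph distance from `start` over the mesh edge graph."""
    # Build an undirected vertex adjacency map from the triangle list.
    neighbors = {}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)

    layer_of = {start: 0}
    layers = [[start]]
    queue = deque([start])
    while queue:
        u = queue.popleft()
        # Assumption: sorted order keeps the sketch deterministic; the method
        # itself orders vertices counterclockwise via half-edges.
        for v in sorted(neighbors[u]):
            if v not in layer_of:
                layer_of[v] = layer_of[u] + 1
                if layer_of[v] == len(layers):
                    layers.append([])
                layers[layer_of[v]].append(v)
                queue.append(v)
    return layers

# Octahedron: vertex 0 is the top, 5 the bottom, 1..4 form the equator.
OCTAHEDRON = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1),
              (5, 2, 1), (5, 3, 2), (5, 4, 3), (5, 1, 4)]
layers = bfs_layers(OCTAHEDRON, start=0)
print(layers)
```

Every vertex lands in exactly one layer, which is what lets the token stream mention each vertex a single time.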

Topological relationships are captured via two adjacency matrices per layer:

Self-Layer Matrix ($\mathcal{S}_l$): Encodes intra-layer vertex connections within layer $l$.
Between-Layer Matrix ($\mathcal{B}_l$): Encodes connections between layer $l$ and layer $l-1$.
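To make the two matrices concrete, the following sketch builds them as dense 0/1 lists from a layer partition and an edge list. The function name `layer_matrices` and the dense representation are assumptions for illustration; the source does not specify how the matrices are stored before compression.

```python
def layer_matrices(layers, edges):
    """Return (S, B) where S[l] is the self-layer matrix of layer l and
    B[l] is the between-layer matrix linking layer l (rows) to layer l-1 (columns)."""
    edge_set = {frozenset(e) for e in edges}
    S, B = [], []
    for l, layer in enumerate(layers):
        # Intra-layer connections: rows and columns both index layer l.
        S.append([[int(frozenset((u, v)) in edge_set) for v in layer]
                  for u in layer])
        # Inter-layer connections: the first layer has no predecessor.
        prev = layers[l - 1] if l > 0 else []
        B.append([[int(frozenset((u, p)) in edge_set) for p in prev]
                  for u in layer])
    return S, B

# Octahedron layered from its top vertex: apex, equator ring, bottom vertex.
layers = [[0], [1, 2, 3, 4], [5]]
edges = [(0, 1), (0, 2), (0, 3), (0, 4),
         (1, 2), (2, 3), (3, 4), (4, 1),
         (5, 1), (5, 2), (5, 3), (5, 4)]
S, B = layer_matrices(layers, edges)
print(S[1])  # ring connectivity inside the equator layer
print(B[2])  # bottom vertex connects to the whole equator
```

Because an edge of a layered mesh joins vertices whose layers differ by at most one, these two matrices per layer cover all connectivity.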

Compression leverages the local structure of mesh connectivity. For $\mathcal{S}_l$, a fixed-size sliding "window" (typically $W=8$) along each row binary-encodes only spatially proximate connections, with extended tokens used for out-of-window links. For $\mathcal{B}_l$, where connectivity typically forms contiguous blocks, compression is formulated as a "stars and bars" problem: for a contiguous run of $y$ connections beginning at position $x$, the token is computed as $x \cdot Y + y - 1$ (with $Y$ an upper bound on consecutive ones).
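A small sketch of both compression steps follows. The run token uses the formula $x \cdot Y + y - 1$ exactly as stated; its inverse follows from integer division. The window encoder is more speculative: placing the window on the $W$ entries immediately right of the diagonal, and returning out-of-window columns as a plain list, are assumptions, since the format of the extended tokens is not given here. All function names are hypothetical.

```python
def encode_run(x, y, Y):
    """Token for a run of `y` consecutive ones starting at column `x` (1 <= y <= Y)."""
    if not 1 <= y <= Y:
        raise ValueError("run length must lie in 1..Y")
    return x * Y + y - 1

def decode_run(token, Y):
    """Invert encode_run: recover (start column, run length)."""
    x, r = divmod(token, Y)
    return x, r + 1

def encode_window(row, i, W=8):
    """Pack the W entries right of the diagonal of self-layer row `i` into a bitmask.

    Returns (mask, extras); `extras` lists connected columns beyond the window.
    """
    mask = 0
    extras = []
    for j in range(i + 1, len(row)):
        if row[j]:
            offset = j - i - 1
            if offset < W:
                mask |= 1 << offset   # in-window link: one bit per column offset
            else:
                extras.append(j)      # out-of-window link: needs an extended token
    return mask, extras

token = encode_run(x=3, y=2, Y=4)
print(token, decode_run(token, Y=4))
print(encode_window([0, 1, 0, 1], i=0, W=2))
```

Since $0 \le y - 1 < Y$, the pair $(x, y)$ is recoverable from a single integer, which is why one token per between-layer row suffices when the connections form one contiguous run.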

Each vertex sequence packs three classes of tokens: coordinate tokens ($V_{(L, i)}$), self-layer tokens ($S_{(L, i)}$), and between-layer tokens ($B_{(L, i)}$). These are accompanied by special control tokens marking connected component boundaries ("C"), inter-layer transitions ("U"), and component endings ("E"). The complete tokenized sequence serves as input to a decoder-only transformer for auto-regressive mesh generation.
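The packing step might look roughly like the sketch below. The exact positions of the control tokens are an assumption (here "C" opens a component, "U" separates consecutive layers, and "E" closes the component), and the string placeholders stand in for real integer token ids. The function name `pack_sequence` is hypothetical.

```python
def pack_sequence(components):
    """Flatten components -> layers -> per-vertex (V, S, B) tokens into one stream."""
    seq = []
    for component in components:
        seq.append("C")                  # assumed: marks the start of a component
        for depth, layer in enumerate(component):
            if depth > 0:
                seq.append("U")          # assumed: marks the step to the next layer
            for v_tok, s_tok, b_tok in layer:
                seq.extend((v_tok, s_tok, b_tok))
        seq.append("E")                  # assumed: marks the end of the component
    return seq

# One component with two layers; token names are placeholders.
component = [
    [("V00", "S00", "B00")],
    [("V10", "S10", "B10"), ("V11", "S11", "B11")],
]
sequence = pack_sequence([component])
print(sequence)
```

Each vertex contributes its three tokens once, so sequence length grows with the vertex count plus a small number of control tokens.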

2. Compression Efficiency and Redundancy Reduction

Mesh Silksong eliminates the vertex token repetition that prior methods incur when covering adjacent triangles, and adopts block-wise indexing for token packing. This reduces the average vertex token count from 3 (as in previous systems such as BPT) to 2 per vertex. As a result, the overall compression ratio improves from 0.26 to approximately 0.22, meaning that, for a typical triangular mesh with average vertex degree near 6, only about 2 tokens are required for each face. This is a state-of-the-art mesh compression rate, and it translates directly into improved representational efficiency and reduced network capacity usage.
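As a rough check of the per-face figure, assume the compression ratio is measured against a naive encoding that spends nine coordinate tokens on every triangle (three vertices times three coordinates). This baseline is an assumption; it is not stated explicitly in the text above. Under it:

$$
0.26 \times 9 \approx 2.3 \ \text{tokens per face (BPT)}, \qquad 0.22 \times 9 \approx 2.0 \ \text{tokens per face (Mesh Silksong)}.
$$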

3. Preservation of Geometric and Topological Properties

A significant focus of Mesh Silksong is the maintenance of crucial geometric attributes:

Manifold Topology: Non-manifold edges are addressed via an edge-graph partitioning algorithm, supported by rules such as "max cycle or max chain first" and breadth-first search (BFS) processing, to guarantee all vertices are arranged as a single, connected, consistent component conforming to manifold properties.
Watertightness: The dual-structure tokenization (separating self- and between-layer connectivity) simplifies the identification of holes (e.g., incorrect or missing edge indicators), enabling straightforward post-processing for correction and repair. This is particularly important for applications with watertightness requirements, such as 3D printing.
Consistent Face Normals: Vertex ordering derived from layer assignments and traversal ensures a uniform, counterclockwise ordering in triangle face construction. During decoding, this yields face normals computed from cross-products that are spatially consistent and correctly oriented, a property quantified in evaluations by superior Normal Consistency (NC) and absolute Normal Consistency (|NC|) scores.
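The three properties above can be checked on any triangle list with simple edge counting. The sketch below uses standard definitions rather than anything specific to Mesh Silksong: an edge is manifold if at most two faces share it, a mesh is watertight if every edge is shared by exactly two faces, and winding is consistent if no directed edge is used twice. Function names are hypothetical.

```python
from collections import Counter

def directed_edges(faces):
    """Yield each face's three edges in the face's own winding order."""
    for a, b, c in faces:
        yield (a, b)
        yield (b, c)
        yield (c, a)

def is_edge_manifold(faces):
    """No undirected edge is shared by more than two faces."""
    counts = Counter(frozenset(e) for e in directed_edges(faces))
    return all(n <= 2 for n in counts.values())

def is_watertight(faces):
    """Every undirected edge is shared by exactly two faces (no boundary edges)."""
    counts = Counter(frozenset(e) for e in directed_edges(faces))
    return all(n == 2 for n in counts.values())

def is_consistently_oriented(faces):
    """Each directed edge occurs once, so adjacent faces traverse a shared edge oppositely."""
    counts = Counter(directed_edges(faces))
    return all(n == 1 for n in counts.values())

# Closed octahedron with uniform winding.
OCTAHEDRON = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1),
              (5, 2, 1), (5, 3, 2), (5, 4, 3), (5, 1, 4)]
flipped = [(0, 2, 1)] + OCTAHEDRON[1:]   # reverse the winding of one face
opened = OCTAHEDRON[1:]                  # remove one face, leaving a hole

print(is_watertight(OCTAHEDRON), is_consistently_oriented(OCTAHEDRON))
print(is_consistently_oriented(flipped), is_watertight(opened))
```

Flipping a single face leaves the mesh watertight but breaks orientation, while removing a face breaks watertightness, so the checks are independent of one another.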

4. Empirical Evaluation

Mesh Silksong's empirical evaluation demonstrates improvements over several established baselines such as EdgeRunner, BPT, and TreeMeshGPT. Measured on standard test sets:

Geometric Accuracy: The method achieves lower Chamfer Distance (CD) and Hausdorff Distance (HD) and higher NC and |NC| compared to prior work.
Mesh Detail: The Face Ratio (FR), defined as the ratio of generated face count to ground-truth face count, typically exceeds that of baselines, indicating the ability to produce detailed meshes.
Training and Ablation: Ablation studies with progressively balanced versus naive sampling regimes reveal that the balanced strategy better handles the long-tailed distribution of mesh complexity encountered in practical datasets.
Qualitative Results: Generated meshes display robust geometry, intricate detail, and an absence of common defects, especially the non-manifold artifacts frequently observed in competing methods.
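For reference, the distance metrics named above can be sketched on sampled point sets as follows. This is a brute-force illustration using one common convention (unsquared distances, per-set mean for Chamfer); the evaluation's exact variants, normalization, and sampling density are not specified here and may differ. Function names are hypothetical.

```python
import math

def _nearest(p, points):
    """Distance from point p to its nearest neighbour in `points`."""
    return min(math.dist(p, q) for q in points)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    return (sum(_nearest(p, b) for p in a) / len(a)
            + sum(_nearest(q, a) for q in b) / len(b))

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    return max(max(_nearest(p, b) for p in a),
               max(_nearest(q, a) for q in b))

def face_ratio(generated_faces, reference_faces):
    """Generated face count divided by ground-truth face count."""
    return len(generated_faces) / len(reference_faces)

generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(generated, reference))
print(hausdorff_distance(generated, reference))
```

Chamfer distance averages the error over the surface, whereas Hausdorff distance reports the single worst deviation, so the two together distinguish uniformly small error from a localized defect.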

5. Applications and Industrial Relevance

The compact, geometrically robust representation in Mesh Silksong is tailored for several real-world applications:

Entertainment and Design: In industries such as video games, film, and virtual reality, high-fidelity mesh generation reduces manual workload while achieving accurate, artistically relevant 3D assets.
Technical Domains: The watertight, manifold-conforming outputs with consistent normal orientation make the approach suitable for technical workflows, including UV unwrapping, 3D printing, and physical simulation, where mesh defects can critically compromise downstream tasks.
Precision Modeling: The explicit management of connected components via control tokens allows for the faithful capture of fine-grained details, supporting applications such as intricate facial feature modeling or the creation of delicate accessory parts.

6. Broader Context and Methodological Advancements

Mesh Silksong represents a methodological advance in mesh representation and generation, combining strict redundancy minimization with algorithmically clear mechanisms for enforcing geometric properties. Its layered vertex encoding, adjacency matrix compression, and integrated sequence packing provide a unified tokenization approach that addresses both computational efficiency and geometric correctness. By demonstrating improvements on a suite of established metrics, Mesh Silksong provides a foundation for ongoing research into more scalable, high-integrity auto-regressive 3D generation architectures.


References (1)

Mesh Silksong: Auto-Regressive Mesh Generation as Weaving Silk (2025)