FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation (2511.15618v1)

Published 19 Nov 2025 in cs.CV

Abstract: Autoregressive models can generate high-quality 3D meshes by sequentially producing vertices and faces, but their token-by-token decoding results in slow inference, limiting practical use in interactive and large-scale applications. We present FlashMesh, a fast and high-fidelity mesh generation framework that rethinks autoregressive decoding through a predict-correct-verify paradigm. The key insight is that mesh tokens exhibit strong structural and geometric correlations that enable confident multi-token speculation. FlashMesh leverages this by introducing a speculative decoding scheme tailored to the commonly used hourglass transformer architecture, enabling parallel prediction across face, point, and coordinate levels. Extensive experiments show that FlashMesh achieves up to a 2× speedup over standard autoregressive models while also improving generation fidelity. Our results demonstrate that structural priors in mesh data can be systematically harnessed to accelerate and enhance autoregressive generation.

Summary

The paper introduces a novel predict-correct-verify paradigm that speculatively decodes multiple mesh tokens to accelerate autoregressive 3D mesh synthesis.
It employs lightweight SP-Block and HF-Block modules to exploit structural correlations for parallel token prediction, improving efficiency by up to 2×.
Experimental results show that FlashMesh achieves lower Chamfer Distance and higher tokens per second, boosting quality and speed for real-time 3D applications.

FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation

The paper "FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation" (2511.15618) introduces a framework for efficient, high-quality autoregressive mesh generation. Autoregressive models can produce high-fidelity 3D meshes by emitting vertices and faces one token at a time, but this token-by-token decoding makes inference slow, which limits their use in interactive and large-scale scenarios. FlashMesh targets that bottleneck.

Introduction to FlashMesh

FlashMesh rethinks autoregressive decoding through a predict-correct-verify paradigm tailored to mesh synthesis. It exploits the structural and geometric correlations among mesh tokens to speculate several tokens at once with confidence, without compromising the fidelity of the generated mesh. The approach mirrors speculative decoding in large language models (LLMs) but adapts it to the hierarchical structure of mesh data.

Figure 1: Overview of the FlashMesh framework showing the speculative decoding approach and hierarchical fusion.

Predict-Correct-Verify Paradigm

Predict Stage: The speculative decoding strategy is introduced within the hourglass transformer architecture, enabling parallel prediction across face, point, and coordinate levels. FlashMesh employs lightweight modules, specifically the SP-Block and HF-Block, which leverage hierarchical feature compression to predict multiple future tokens concurrently, facilitating rapid inference.
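The three levels named above (face, point, coordinate) can be made concrete with a small tokenization sketch. This is an illustration only, not the paper's tokenizer: the bin count of 128, the [-1, 1] coordinate range, and the function names quantize, flatten_faces, and group_tokens are all assumptions chosen for the example.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[Vertex, Vertex, Vertex]


def quantize(value: float, bins: int = 128) -> int:
    """Map a coordinate in [-1, 1] to an integer token in [0, bins - 1].

    The bin count and coordinate range are assumptions for this sketch.
    """
    clipped = max(-1.0, min(1.0, value))
    return min(bins - 1, int((clipped + 1.0) / 2.0 * bins))


def flatten_faces(faces: List[Face], bins: int = 128) -> List[int]:
    """Flatten triangles into one token sequence: 9 coordinate tokens per face."""
    tokens: List[int] = []
    for face in faces:            # face level
        for vertex in face:       # point level
            for coord in vertex:  # coordinate level
                tokens.append(quantize(coord, bins))
    return tokens


def group_tokens(tokens: List[int]):
    """Regroup a flat token list into points (3 tokens) and faces (3 points)."""
    points = [tuple(tokens[i:i + 3]) for i in range(0, len(tokens), 3)]
    return [tuple(points[i:i + 3]) for i in range(0, len(points), 3)]


triangle: Face = ((-1.0, -1.0, -1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
tokens = flatten_faces([triangle])
faces = group_tokens(tokens)
```

Under this layout a plain autoregressive decoder needs nine sequential steps per triangle, which is the cost that multi-token prediction across the three levels aims to reduce.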

Figure 2: The Hierarchical Fusion Block (HF-Block) integrates speculative multi-token predictions with cached contextual information to optimize token prediction accuracy.

Correct Stage: The correction mechanism ensures consistency in vertex sharing and geometric coherence among the predictions. This corrective process refines the speculative outputs by employing structural priors specific to mesh geometry.
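One way to picture vertex-sharing consistency is a snapping step: if a speculated point lands very close to a vertex that has already been generated, treat it as that vertex. The sketch below is a hypothetical illustration of the idea, not the correction rule used in FlashMesh; the function name snap_to_shared, the tolerance tol, and the use of Chebyshev distance on quantized coordinates are assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]  # quantized (x, y, z) coordinate tokens


def snap_to_shared(predicted: List[Point], known: List[Point], tol: int = 1) -> List[Point]:
    """Replace each predicted point with the closest known vertex within `tol`.

    Distance is the largest per-axis difference (Chebyshev distance).
    Points with no known vertex within `tol` are returned unchanged.
    This heuristic is an assumption made for illustration.
    """
    corrected: List[Point] = []
    for p in predicted:
        best, best_dist = None, tol + 1
        for q in known:
            d = max(abs(a - b) for a, b in zip(p, q))
            if d < best_dist:
                best, best_dist = q, d
        corrected.append(best if best is not None else p)
    return corrected


known_vertices = [(10, 10, 10), (20, 20, 20)]
speculated = [(10, 11, 10), (50, 50, 50)]
fixed = snap_to_shared(speculated, known_vertices, tol=1)
```

Here the first speculated point is one quantization step away from an existing vertex and is snapped onto it, while the second is far from every known vertex and is kept as a new point.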

Verify Stage: Verification runs in a single forward pass of the backbone network, which checks the speculated tokens so that the accepted output stays consistent with what the autoregressive backbone would produce, while retaining the gains in speed and quality.
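The verification idea can be sketched with the greedy acceptance rule common in speculative decoding for language models: keep the longest prefix of drafted tokens that matches the backbone's own choices, and substitute the backbone's token at the first mismatch. Whether FlashMesh uses exactly this rule is not stated here, so treat it as an assumption. The names verify_draft and backbone_next are hypothetical, and the toy backbone is a stand-in for a real model, which would score all draft positions in one batched forward pass rather than in a loop.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    backbone_next: Callable[[List[int]], int],
) -> List[int]:
    """Return the tokens to commit after checking `draft` against the backbone.

    `backbone_next(context)` is a hypothetical callable returning the
    backbone's greedy next token for the given context.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        target = backbone_next(context)
        if tok != target:
            # First disagreement: commit the backbone's token and stop.
            accepted.append(target)
            return accepted
        accepted.append(tok)
        context.append(tok)
    return accepted


def toy_backbone(context: List[int]) -> int:
    """Stand-in model: the next token is the last token plus one, modulo 10."""
    return (context[-1] + 1) % 10


committed = verify_draft([1, 2], [3, 4, 9, 6], toy_backbone)
```

In this toy run the first two drafted tokens are accepted, the third is replaced by the backbone's choice, and the fourth is discarded, so three tokens are committed from one verification step.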

Experimental Results

In extensive experiments, FlashMesh improves on conventional autoregressive mesh generation in both speed and quality. It offers up to a 2× speedup over existing models such as Meshtron, reflected in higher tokens per second, while also improving mesh fidelity, as evidenced by lower Chamfer Distance.

Figure 3: FlashMesh delivers superior performance compared to baseline methods in terms of speed and quality across various metrics.
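For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. Chamfer Distance has several conventions (squared or unsquared distances, sums or means); this version uses the mean squared nearest-neighbour distance in both directions, which is an assumption and may differ from the paper's exact definition. The function names and the sample point sets are made up for illustration and are not results from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Symmetric Chamfer Distance between two non-empty point sets.

    Uses mean squared nearest-neighbour distance in each direction
    (one of several conventions). Brute force, O(len(a) * len(b)).
    """
    def one_way(src: Sequence[Point3], dst: Sequence[Point3]) -> float:
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Decoding throughput: generated tokens divided by wall-clock time."""
    return num_tokens / seconds


def speedup(tps_fast: float, tps_baseline: float) -> float:
    """Ratio of two throughputs; 2.0 means twice as fast as the baseline."""
    return tps_fast / tps_baseline


# Made-up point sets, sampled from two hypothetical mesh surfaces.
generated: List[Point3] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference: List[Point3] = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(generated, reference)
```

Lower Chamfer Distance means the generated surface lies closer to the reference, while tokens per second and the derived speedup describe decoding speed only.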

Architectural Contributions

Speculative Decoding Strategy: FlashMesh's ability to predict multiple tokens in parallel across hierarchical mesh levels optimizes generation efficiency.
Structure-Aware Correction: Ensures vertex-sharing consistency and geometric coherence, preventing misalignment between adjacent faces during generation.
Practical Application and Implications: The advancements proposed by FlashMesh have substantial implications for real-time 3D content creation in virtual reality and gaming industries, offering scalable solutions for generating high-quality meshes in dynamic environments.

Conclusion

FlashMesh significantly advances the field of autoregressive mesh generation by introducing a principled speculative approach that respects the structural hierarchy and dependencies of mesh data. This results in both improved efficiency and fidelity. FlashMesh not only addresses existing bottlenecks in autoregressive mesh modeling but also sets the stage for future explorations into hybrid decoding strategies and robust geometric priors, promising further enhancements in 3D mesh generation technologies.

Future work could integrate more explicit geometric constraints and hybrid decoding strategies to improve robustness against the early prediction errors to which autoregressive models are prone.