
MeshPad: Interactive 3D Mesh Editing

Updated 22 January 2026
  • MeshPad is a generative framework for interactive 3D mesh creation, using user-provided 2D sketches to guide localized deletion and addition operations.
  • It employs an autoregressive Transformer and a vertex-aligned speculator to accelerate token generation and ensure consistent, fine-grained mesh edits.
  • Empirical results show MeshPad outperforms previous methods on metrics like Chamfer Distance, FID, and LPIPS, delivering superior quality and interactive runtimes.

MeshPad is a generative framework for interactive 3D mesh creation and editing, directly conditioned on user-provided 2D sketches. By decomposing editing into deletion and addition operations, MeshPad achieves fine-grained, region-specific manipulation of artist-designed triangle meshes, enabling construction of complex 3D forms through an iterative sketch-based interface. The system combines an autoregressive triangle-sequence Transformer with a vertex-aligned speculative prediction module, leading to consistent edits and interactive runtimes while outperforming previous sketch-to-mesh methods in both quantitative and perceptual evaluations (Li et al., 3 Mar 2025).

1. Motivation and Problem Setting

Traditional generative mesh synthesis techniques, including MeshGPT and MeshAnything, are capable of producing plausible 3D models from input prompts. However, these methods lack support for localized, region-specific editing; they typically require full mesh regeneration for adjustments, disrupting workflow continuity and discarding unedited details. Artistic workflows, in contrast, often depend on iterative manipulation—sketching, modifying, and refining certain regions without altering the remainder. MeshPad addresses this gap by enabling interactive, sketch-conditioned edits such that only specified mesh regions are changed while the rest of the structure is preserved. Sketches serve as a highly expressive and familiar modality for conveying 3D intent, permitting precise yet intuitive content manipulation.

2. Mesh Representation and Tokenization

MeshPad represents a triangular mesh $\mathcal{M}$ as an ordered set of faces:

$$\mathcal{M} = \{\,\mathcal{F}_1,\mathcal{F}_2,\dots,\mathcal{F}_N\}, \quad \mathcal{F}_k = \{v_{k,1},v_{k,2},v_{k,3}\},\ v_{k,i} \in \mathbb{R}^3.$$

To facilitate generative modeling, a tokenizer $\mathcal{T}$ encodes $\mathcal{M}$ into a sequence of discrete tokens:

$$S = [\,\mathtt{<start>},\,v_x^1,\,v_y^1,\,v_z^1,\,v_x^2,\,\dots,\,\mathtt{<split>},\,\dots,\,\mathtt{<end>}],$$

where each vertex coordinate ($v_x^i$, $v_y^i$, $v_z^i$) is quantized to a vocabulary index and structural control tokens (e.g., $\mathtt{<split>}$) delimit triangle fans or connectivity groups. The inverse mapping $\mathcal{T}^{-1}(S)$ reconstructs the mesh. This discrete, autoregressive representation enables sequence-based Transformer architectures to operate on 3D geometric content.
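The encode/decode round trip described above can be illustrated with a toy tokenizer. This is only a sketch under stated assumptions, not MeshPad's tokenizer: the bin count `N_BINS`, the `[-1, 1]` coordinate range, the control-token ids, and the choice to emit one `<split>` between every pair of faces (instead of grouping triangle fans) are all made up for illustration.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[Vertex, Vertex, Vertex]

N_BINS = 128  # assumed quantization resolution per axis
START, SPLIT, END = N_BINS, N_BINS + 1, N_BINS + 2  # assumed control-token ids


def quantize(c: float) -> int:
    """Map a coordinate in [-1, 1] to a bin index in [0, N_BINS - 1]."""
    c = min(max(c, -1.0), 1.0)
    return min(int((c + 1.0) / 2.0 * N_BINS), N_BINS - 1)


def dequantize(t: int) -> float:
    """Return the centre of bin t, mapped back to [-1, 1]."""
    return (t + 0.5) / N_BINS * 2.0 - 1.0


def encode(mesh: List[Face]) -> List[int]:
    """Flatten faces into <start>, coordinate tokens, <split>, ..., <end>."""
    tokens = [START]
    for k, face in enumerate(mesh):
        if k > 0:
            tokens.append(SPLIT)  # simplification: one group per face
        for vertex in face:
            tokens.extend(quantize(c) for c in vertex)
    tokens.append(END)
    return tokens


def decode(tokens: List[int]) -> List[Face]:
    """Inverse mapping: rebuild faces from every run of nine coordinate tokens."""
    mesh, coords = [], []
    for t in tokens:
        if t in (START, SPLIT, END):
            continue
        coords.append(dequantize(t))
        if len(coords) == 9:
            mesh.append(tuple(tuple(coords[i:i + 3]) for i in (0, 3, 6)))
            coords = []
    return mesh
```

Decoding recovers each coordinate only up to half a bin width (`1 / N_BINS` here), which is the quantization loss that a fixed vocabulary implies.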

3. Generative Transformer Architecture

MeshPad employs an Open Pre-trained Transformer (OPT) as its core backbone. The input sequence comprises (a) sketch tokens extracted by a frozen RADIO image encoder, (b) tokens representing the unedited mesh region ($S_k$), and (c) previously generated tokens for newly added mesh regions ($S'_r$):

  • Addition (autoregressive):

$$P(S_r^{\prime(i+1)} \mid S_k,\mathcal{I},S_r^{\prime(1\ldots i)}) = \mathrm{OPT}(S_k,\mathcal{I},S_r^{\prime(1\ldots i)}),$$

producing the output sequence $S'_r$ that is integrated with the preserved mesh to yield $\mathcal{M}' = \mathcal{M}_k \cup \mathcal{T}^{-1}(S'_r)$.

  • Deletion (classification): OPT, equipped with bi-directional attention, labels each vertex token for retention or removal. A small classification head, pooling the three coordinate-token embeddings per vertex, outputs $\Pr(\text{keep}\mid v)$ for each vertex $v$.

This dual-mode model supports both sequence generation for mesh addition and classification for localized deletion.
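The deletion head's per-vertex pooling can be sketched as follows. The pooling operator and the head's depth are not specified here, so this example assumes mean pooling followed by a single linear layer and a sigmoid; `keep_probabilities`, `weights`, and `bias` are hypothetical names.

```python
import math
from typing import List, Sequence


def keep_probabilities(
    token_embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Pool each vertex's (x, y, z) token embeddings and score Pr(keep | v).

    Assumes mean pooling and one linear layer; the real head may differ.
    """
    if len(token_embeddings) % 3 != 0:
        raise ValueError("expected three coordinate tokens per vertex")
    probs = []
    for i in range(0, len(token_embeddings), 3):
        triple = token_embeddings[i:i + 3]
        # Mean over the three coordinate-token embeddings, per hidden dimension.
        pooled = [sum(column) / 3.0 for column in zip(*triple)]
        logit = sum(w * h for w, h in zip(weights, pooled)) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs
```

The function returns one probability per vertex rather than per token, which matches the per-vertex labelling that the deletion mode requires.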

4. Edit Operations and Workflow

MeshPad interprets sketch edits as a partition of sketch strokes and mesh regions into those to be kept ($\mathcal{I}_k$, $\mathcal{M}_k$) and those to be added/removed ($\mathcal{I}_r$, $\mathcal{M}_r$). The deletion operation predicts the removable vertices $\mathcal{V}'_r$ via a threshold on $p_v$, then:

$$\mathcal{M}'_r = \{\mathcal{F} \in \mathcal{M} \mid \exists\,v \in \mathcal{F}: v \in \mathcal{V}'_r\},\quad \mathcal{M}'_k = \mathcal{M} \setminus \mathcal{M}'_r.$$

Supervision utilizes binary cross-entropy loss per vertex. The addition operation, conditioned on the preserved mesh region and the current sketch, auto-regressively generates token sequences for new geometry, with cross-entropy loss against ground-truth tokens.
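The face partition in the deletion step can be written out directly. In this sketch, faces are triples of integer vertex ids, $p_v$ is read as the keep probability, and the threshold defaults to 0.5; all three are assumptions made for the example, and `split_mesh` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Face = Tuple[int, int, int]  # assumed: faces index into a shared vertex list


def split_mesh(
    faces: List[Face],
    p_keep: Dict[int, float],
    threshold: float = 0.5,
) -> Tuple[List[Face], List[Face]]:
    """Return (kept_faces, removed_faces).

    A face is removed as soon as any one of its vertices is marked removable,
    mirroring the "there exists v in F" condition in the set definition.
    """
    removable = {v for v, p in p_keep.items() if p < threshold}
    removed = [f for f in faces if any(v in removable for v in f)]
    kept = [f for f in faces if not any(v in removable for v in f)]
    return kept, removed
```

Because a single removable vertex is enough to drop a face, the removed region is always at least as large as the set of faces made entirely of removable vertices.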

This "delete-then-add" paradigm enables iteration and localized refinement, in alignment with artistic workflows.

5. Vertex-Aligned Speculative Prediction

To accelerate the autoregressive decoding of mesh tokens, MeshPad introduces a vertex-aligned "speculator" head: a multilayer perceptron applied at vertex token positions. Given the hidden state $E_x$ at an $x$-coordinate token $V'_x$, the speculator predicts the corresponding $y$ and $z$ tokens:

$$P(V'_{y,z}) = \mathrm{Speculator}(E_x, V'_x).$$

This enables decoding in units of vertices (3 tokens per vertex) rather than single tokens, so the $(y, z)$ coordinates are generated speculatively in a single pass. Joint training ensures the transformer's hidden states adapt to this structure. Empirically, the approach achieves a $\sim2.2\times$ increase in token generation throughput (from $\sim60.7$ T/s to $\sim131$ T/s on an A100 GPU) without quality degradation.
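The effect of vertex-aligned decoding on the number of backbone passes can be seen in a toy loop. Everything here is a stand-in: `toy_backbone` and `toy_speculator` are hypothetical stubs that return fixed values rather than model outputs, and `END` is an assumed end-of-sequence marker. The loop only shows the control flow of one backbone pass per vertex.

```python
from typing import Callable, List, Tuple

END = -1  # assumed end-of-sequence marker


def decode_by_vertex(
    backbone_step: Callable[[List[int]], Tuple[int, int]],
    speculator: Callable[[int, int], Tuple[int, int]],
    max_vertices: int,
) -> Tuple[List[int], int]:
    """Generate (x, y, z) triples with one backbone pass per vertex."""
    tokens: List[int] = []
    passes = 0
    for _ in range(max_vertices):
        hidden, x_tok = backbone_step(tokens)  # expensive transformer pass
        passes += 1
        if x_tok == END:
            break
        y_tok, z_tok = speculator(hidden, x_tok)  # cheap head, no extra pass
        tokens.extend([x_tok, y_tok, z_tok])
    return tokens, passes


def toy_backbone(tokens: List[int]) -> Tuple[int, int]:
    """Stub: emit x-tokens 0, 1, 2, then END; 'hidden' is just a number."""
    if len(tokens) >= 9:
        return 0, END
    return len(tokens), len(tokens) // 3


def toy_speculator(hidden: int, x_tok: int) -> Tuple[int, int]:
    """Stub: derive y and z tokens from the x token."""
    return x_tok + 10, x_tok + 20


tokens, passes = decode_by_vertex(toy_backbone, toy_speculator, max_vertices=8)
```

In this toy run, three vertices take four backbone passes (three vertices plus the terminating step), whereas token-by-token decoding of the same nine tokens plus the end marker would take ten.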

6. Training Objectives and Evaluation Metrics

MeshPad is trained in a self-supervised manner, using:

  • Addition head: standard cross-entropy over the token vocabulary for next-token prediction.
  • Deletion head: binary cross-entropy per vertex,

$$\mathcal{L}_{\rm del} = -\sum_{v}\bigl[\ell_v\log p_v + (1-\ell_v)\log(1-p_v)\bigr].$$

  • Speculator: cross-entropy for the two predicted $(y, z)$ tokens; the OPT head is used for control tokens and $x$-coordinates.
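The deletion loss above is an unnormalized sum of per-vertex binary cross-entropy terms. A minimal stdlib sketch follows, where `labels` holds the ground-truth $\ell_v \in \{0, 1\}$ and `probs` holds the predicted $p_v$; the clamping constant `eps` is an addition for numerical safety and is not part of the formula.

```python
import math
from typing import Sequence


def deletion_loss(
    labels: Sequence[int],
    probs: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Sum of per-vertex binary cross-entropy terms (no averaging)."""
    total = 0.0
    for label, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0); not in the formula
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total
```

For two vertices predicted at $p_v = 0.5$ with labels 1 and 0, each term contributes $\log 2$, so the loss is $2\log 2$.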

Evaluation leverages multiple metrics:

  • Chamfer Distance:

$$L_{\rm chamfer} = \sum_{i}\min_{j}\lVert x_i - y_j\rVert^2 + \sum_{j}\min_{i}\lVert y_j - x_i\rVert^2$$

  • FID (on shaded renderings).
  • CLIP/LPIPS similarity between input sketches and synthesised meshes.
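The Chamfer Distance formula in the list above can be evaluated by brute force on small point sets. This sketch follows the summed, squared-distance form exactly as written; how the evaluation samples points from meshes, and whether it averages over points, is not stated here, so `chamfer` is only an illustration of the formula.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def chamfer(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Symmetric sum of squared nearest-neighbour distances, O(|X| * |Y|)."""
    x_to_y = sum(min(sq_dist(x, y) for y in ys) for x in xs)
    y_to_x = sum(min(sq_dist(y, x) for x in xs) for y in ys)
    return x_to_y + y_to_x
```

The measure is symmetric in its two arguments and is zero when both point sets coincide.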

7. Empirical Results and Analysis

On the ShapeNet test set, MeshPad demonstrates substantial improvements over prior art:

  • Chamfer Distance (×10⁻³): LAS 22.46, SENS 8.95, MeshPad 6.20, a reduction of more than 22% versus the best previous method (about 31% relative to SENS by the figures listed here).
  • FID: LAS 47.1, SENS 81.9, MeshPad 9.4.
  • LPIPS/CLIP: MeshPad achieves the lowest LPIPS and highest CLIP similarity to the input.

User studies with 35 participants rate MeshPad at 4.3/5 for mesh quality and 4.2–4.3/5 for edit consistency, exceeding baseline methods (2.7–3.5). Binary preference tests select MeshPad over LAS/SENS (and MeshAnythingV2 post-processing variants) 83–96% of the time, both for generation and editing.

8. Strengths, Limitations, and Future Prospects

MeshPad’s two-stage editing operation (deletion plus addition) enables maintenance of unedited mesh regions and incremental, part-wise construction. The vertex-aligned speculator yields interactive runtimes measured in seconds per edit, compatible with creative iterative workflows. Sketch-based conditioning provides precise, accessible user control over mesh geometry.

Key limitations include a Transformer-imposed cap on sequence length, restricting mesh complexity to approximately 768 faces and precluding extremely large-scale scene synthesis. The reliance on a fixed token vocabulary and quantization may reduce capacity for ultra-fine geometric detail. Prospective directions include hierarchical or sparse mesh representations (e.g., tile-based meshes) to address scalability while preserving interactivity (Li et al., 3 Mar 2025).
