PolyDiff: Diffusion on Polytopes

Updated 15 December 2025
  • PolyDiff is a diffusion-based generative framework for discrete data on polytopes, modeling categorical images and 3D meshes with analytic OU dynamics.
  • It employs diffusion on the probability simplex and transformer-based denoising to jointly capture geometric and combinatorial structures.
  • Empirical results show significant FID and JSD improvements over existing methods, establishing its efficacy in generating high-quality mesh and shape data.

PolyDiff refers to a family of diffusion-based generative models for discrete or piecewise-linear structures on polytopes, most notably for direct generation or reconstruction of categorical, polygonal, or mesh data. As a methodological framework, PolyDiff models leverage diffusion processes either on the probability simplex or in the space of quantized mesh coordinates, enabling coherent modeling of both geometric and combinatorial structures in data such as categorical images or 3D polygonal meshes (Floto et al., 2023, Alliegro et al., 2023).

1. Diffusion on the Probability Simplex

The foundational principle of PolyDiff is the formulation of diffusion on the probability simplex $S^k$, the set of categorical probability distributions over $k+1$ categories. The continuous-time forward process is defined in an unconstrained latent space $y \in \mathbb{R}^k$, where $y$ follows a mean-reverting Ornstein–Uhlenbeck (OU) process:

$$dY_t = -\theta Y_t\, dt + \sigma\, dW_t, \qquad \theta > 0,\; \sigma > 0$$

Transitioning to the simplex is achieved by the additive-logistic (softmax) map $X_t = \sigma(Y_t)$, with

$$\sigma_i(y) = \begin{cases} \dfrac{e^{y_i}}{1+\sum_{j=1}^{k} e^{y_j}}, & i = 1, \ldots, k \\[6pt] \dfrac{1}{1+\sum_{j=1}^{k} e^{y_j}}, & i = k+1 \end{cases}$$

This ensures $0 \leq X_{t,i} \leq 1$ and $\sum_{i=1}^{k+1} X_{t,i} = 1$.

By Itô's lemma, the induced process on the simplex itself is

$$dX_t = f(X_t, t)\, dt + G(X_t, t)\, dW_t$$

with closed-form drift $f$ and diffusion matrix $G$ (see (Floto et al., 2023) for explicit forms). The process converges to a logistic-normal stationary law as $t \to \infty$. The OU process is particularly well suited because of its tractable marginals, direct control over the noising speed via $\theta$, and analytically computable transition kernels.
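
To make the forward pass concrete, below is a minimal NumPy sketch: the exact OU transition kernel in latent space (one payoff of the analytically computable marginals), followed by the additive-logistic map onto the simplex. The helper names and parameter values are illustrative, not taken from the papers.

```python
import numpy as np

def ou_transition(y0, t, theta=1.0, sigma=1.0, rng=None):
    """Exact OU kernel: Y_t | Y_0 ~ N(e^{-theta t} y0, sigma^2 (1 - e^{-2 theta t}) / (2 theta))."""
    rng = rng or np.random.default_rng(0)
    mean = np.exp(-theta * t) * y0
    var = sigma**2 * (1.0 - np.exp(-2.0 * theta * t)) / (2.0 * theta)
    return mean + np.sqrt(var) * rng.standard_normal(y0.shape)

def to_simplex(y):
    """Additive-logistic map R^k -> S^k: append the implicit (k+1)-th logit 0, then softmax."""
    z = np.concatenate([y, np.zeros(1)])
    e = np.exp(z - z.max())            # numerically stabilized softmax
    return e / e.sum()

y0 = np.array([6.0, -6.0])             # latent point near a simplex vertex (k = 2)
x0 = to_simplex(y0)                    # approximately (1, 0, 0): a categorical "corner"
xt = to_simplex(ou_transition(y0, t=0.5))
print(x0, xt, xt.sum())                # both stay on the simplex; coordinates sum to 1
```

As $t$ grows, the latent mean decays toward zero and the pushed-forward samples approach the logistic-normal stationary law.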

2. PolyDiff for 3D Polygonal Mesh Generation

PolyDiff is also instantiated as a diffusion model for generating 3D polygonal meshes directly in their native discrete, topological representation (Alliegro et al., 2023). Each triangle mesh is described by a vertex set $V = \{v_i \in \mathbb{R}^3\}$ and a face set $F$ of vertex-index triples. Coordinates are quantized via

$$Z(x) = \left\lfloor \frac{x - x_{\min}}{x_{\max} - x_{\min}}\, (C-1) \right\rfloor, \qquad C = 256,$$

forming a tensor $T \in \mathbb{Z}^{m \times 3 \times 3}$, where $m$ is the number of faces. The forward diffusion applies categorical noise independently to each of the $D = m \times 3 \times 3$ tensor elements using a stochastic matrix $Q_t$ with a noise schedule $\beta_t$.
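
A minimal sketch of the quantization step follows, assuming a global min/max normalization over all coordinates (the exact normalization convention is an implementation detail not fixed by the formula above); the function names are ours.

```python
import numpy as np

def quantize(x, x_min, x_max, C=256):
    """Z(x) = floor((x - x_min) / (x_max - x_min) * (C - 1)), clipped into [0, C-1]."""
    z = np.floor((x - x_min) / (x_max - x_min) * (C - 1)).astype(np.int64)
    return np.clip(z, 0, C - 1)       # guards float round-off at x == x_max

def mesh_to_tensor(vertices, faces, C=256):
    """Gather per-face vertex coordinates into the (m, 3, 3) integer tensor T."""
    tri = vertices[faces]             # (m faces, 3 vertices, 3 coordinates)
    return quantize(tri, vertices.min(), vertices.max(), C)

# Toy example: one triangle.
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
F = np.array([[0, 1, 2]])
T = mesh_to_tensor(V, F)              # shape (1, 3, 3), integer bins in {0, ..., 255}
```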

The reverse process is modeled using a transformer-based denoising network (U-ViT). The network's output is a categorical distribution over quantized bins for each triangle-face-coordinate slot. This architecture enables joint modeling of geometry and mesh topology, as the transformer observes all faces and shared vertex references.

3. Training and Objective Functions

For simplex-based PolyDiff, training uses continuous-time score matching. Let $s_\theta(x, t)$ be a time-dependent score network approximating $\nabla_x \log p_t(x)$; the denoising objective is

$$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, x_t} \left[ \lambda(t)\, \| s_\theta(x_t, t) - \nabla_{x_t} \log p_{0t}(x_t \mid x_0) \|^2 \right],$$

where $p_{0t}$ is the logistic-normal transition kernel.
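
Schematically, this objective can be rendered in a few lines of PyTorch; here `score_net`, `target_score` (the analytic score of the logistic-normal kernel, precomputed from $x_0$), and `lam` are placeholder names, not identifiers from the paper.

```python
import torch

def dsm_loss(score_net, xt, t, target_score, lam):
    """lambda(t) * || s_theta(x_t, t) - grad_{x_t} log p_{0t}(x_t | x_0) ||^2, batch-averaged."""
    pred = score_net(xt, t)                            # s_theta(x_t, t)
    sq_err = ((pred - target_score) ** 2).sum(dim=-1)  # squared L2 norm per sample
    return (lam * sq_err).mean()                       # Monte Carlo estimate of the expectation
```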

For mesh PolyDiff, each discrete variable is supervised with a cross-entropy loss:

$$\mathcal{L}(\theta) = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{q(\mathbf{x}_0),\, q(\mathbf{x}_t \mid \mathbf{x}_0)} \left[ -\sum_{d=1}^{D} \sum_{c=0}^{C-1} \mathbf{1}[x_{t-1}^d = c] \log p_\theta(x_{t-1}^d = c \mid \mathbf{x}_t, t) \right]$$

No explicit topological constraints are imposed; empirical results indicate that the transformer's global view yields self-consistent face–vertex associations.
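
Because the indicator/log structure above is exactly a cross-entropy, the per-step loss reduces to a standard library call. A sketch, under the assumption that the network emits one $C$-way logit vector per tensor slot:

```python
import torch.nn.functional as F

def discrete_diffusion_loss(logits, x_prev):
    """Cross-entropy between p_theta(x_{t-1}^d | x_t, t) and the true bins x_{t-1}.

    logits: (B, D, C) network outputs; x_prev: (B, D) integer bin indices.
    """
    return F.cross_entropy(logits.transpose(1, 2), x_prev)  # mean over batch and the D slots
```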

4. Sampling, Inference, and Algorithmic Workflow

For PolyDiff on the simplex, generative sampling simulates the reverse SDE

$$dX_t = \left[ f(X_t, t) - \tfrac{1}{2} \nabla \cdot (GG^\top)(X_t, t) - GG^\top s_\theta(X_t, t) \right] dt + G(X_t, t)\, d\bar{W}_t$$

backwards in time, starting from a sample drawn from the logistic-normal stationary law. The final categorical sample is obtained by an $\arg\max$ over the simplex coordinates.
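
A schematic Euler–Maruyama integrator for this reverse SDE is sketched below; every argument (`score_fn`, `f`, `G`, `div_GGT`) is a placeholder for the corresponding learned or closed-form quantity, and the step convention is one common choice rather than the paper's exact sampler.

```python
import numpy as np

def reverse_sde_sample(score_fn, f, G, div_GGT, x_T, t_end=1.0, n_steps=500, rng=None):
    """Integrate the reverse SDE from t = t_end down to t = 0, then take the argmax."""
    rng = rng or np.random.default_rng(0)
    dt = t_end / n_steps
    x, t = x_T.copy(), t_end
    for _ in range(n_steps):
        GGt = G(x, t) @ G(x, t).T
        drift = f(x, t) - 0.5 * div_GGT(x, t) - GGt @ score_fn(x, t)
        x = x - drift * dt + G(x, t) @ rng.standard_normal(x.shape) * np.sqrt(dt)
        t -= dt
    return np.argmax(x)   # categorical sample: index of the largest simplex coordinate
```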

In mesh PolyDiff, sampling starts from a uniformly random quantized tensor and applies TT reverse diffusion steps, iteratively updating each categorical variable through the transformer's predictions, followed by dequantization and reconstruction of the mesh (V,F)(V, F). No explicit post hoc topology repair is needed.
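
In pseudocode form, the mesh sampler is a plain ancestral loop; `model` and the tensor shapes below are illustrative assumptions, not the released implementation.

```python
import torch

@torch.no_grad()
def sample_mesh_tensor(model, T_steps, D, C=256, device="cpu"):
    """Start from uniform categorical noise and iteratively sample x_{t-1} ~ p_theta(. | x_t, t)."""
    x = torch.randint(0, C, (1, D), device=device)           # uniformly random quantized tensor
    for t in reversed(range(1, T_steps + 1)):
        logits = model(x, torch.tensor([t], device=device))  # (1, D, C) per-slot logits
        probs = logits.softmax(dim=-1)
        x = torch.multinomial(probs.view(-1, C), 1).view(1, D)
    return x  # dequantize and reshape to (m, 3, 3) to recover (V, F)
```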

5. Empirical Evaluation and Comparisons

PolyDiff demonstrates substantial improvements over previous mesh and shape generation baselines, as established in experiments on ShapeNet classes (chairs, tables, benches, displays). Metrics include JSD, FID (computed on renderings with Inception-v3 features), MMD, and 1-NNA; lower is better for JSD and FID.

| Class   | PolyDiff JSD | BSPNet JSD | PolyDiff FID | PolyGen FID | BSPNet FID |
|---------|--------------|------------|--------------|-------------|------------|
| Chair   | 14.7         | 22.8       | 41.1         | 48.3        | 73.9       |
| Table   | 13.1         | 21.0       | 26.2         | 46.2        | –          |
| Bench   | 86.6         | 102.0      | 49.7         | –           | 81.9       |
| Display | 69.0         | 61.1       | 42.6         | 56.0        | –          |

Dashes indicate values not reported for that baseline.

The average FID improvement relative to prior art is approximately 18.2 points, with a corresponding JSD gain of approximately 5.8 (Alliegro et al., 2023). Ablations confirm the necessity of discrete (rather than continuous) diffusion for coherent mesh outputs.

On categorical image data (MNIST quantized to $k = 3$ levels), PolyDiff on the simplex yields high-fidelity digit reconstructions from discrete corners without post hoc thresholding (Floto et al., 2023).

6. Extensions to General Polytopes and Limitations

The PolyDiff methodology extends to any convex polytope $P$ with a smooth, invertible map $T: \mathbb{R}^d \to P$. For instance, mapping the latent OU process through a coordinate-wise sigmoid yields diffusion over the unit cube $[0,1]^d$, with analytic "sigmoid-Gaussian" kernels and tractable score functions. This generality undergirds PolyDiff's applicability to bounded image generation and other domains where the target data resides on a polytope.
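
The unit-cube case is simple enough to state in full: push the exact OU marginal through a coordinate-wise sigmoid. A small NumPy sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d, theta, sigma, t = 4, 1.0, 1.0, 0.5
y0 = rng.standard_normal(d)                                  # latent starting point
mean = np.exp(-theta * t) * y0                               # exact OU marginal mean
var = sigma**2 * (1.0 - np.exp(-2.0 * theta * t)) / (2.0 * theta)
yt = mean + np.sqrt(var) * rng.standard_normal(d)
xt = 1.0 / (1.0 + np.exp(-yt))                               # "sigmoid-Gaussian" sample in (0, 1)^d
```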

Limitations include slower sampling relative to feed-forward models, restriction to single-object mesh generation (for the mesh instantiation), and potential instability near simplex boundaries if the raw score (rather than the reverse-SDE term) is predicted. Accelerated samplers (e.g., Karras et al., Consistency Models) can address the first limitation; scene-level mesh generation remains an open avenue (Alliegro et al., 2023, Floto et al., 2023).

7. Relations and Connections

PolyDiff constitutes an explicit extension of denoising diffusion probabilistic models (DDPMs) [Ho et al., 2020], leveraging structured discrete (categorical) diffusion as in D3PM [Austin et al., 2021] and transformer-based denoising backbones (U-ViT) [Bao et al., 2022]. It is distinguished by its direct handling of simplex and polytopic supports, precise geometric/topological modeling for mesh data, and tractable reverse processes on polytopes via analytic transition kernels derived from OU dynamics.

A plausible implication is that PolyDiff establishes a generalizable pattern for bridging continuous diffusion formulations and discrete- or combinatorial-structure generation by exploiting smooth invertible mappings and analytically tractable SDEs. This provides a rigorous foundation for further research in probabilistic modeling on constrained spaces, particularly in domains requiring native support for categorical, mesh, or other polytope-constrained data (Floto et al., 2023, Alliegro et al., 2023).
