PolyDiff: Diffusion on Polytopes
- PolyDiff is a diffusion-based generative framework for discrete data on polytopes, modeling categorical images through analytic OU dynamics on the simplex and 3D meshes through discrete categorical diffusion.
- It employs diffusion on the probability simplex and transformer-based denoising to jointly capture geometric and combinatorial structures.
- Empirical results show significant FID and JSD improvements over existing methods, establishing its efficacy in generating high-quality mesh and shape data.
PolyDiff refers to a family of diffusion-based generative models for discrete or piecewise-linear structures on polytopes, most notably for direct generation or reconstruction of categorical, polygonal, or mesh data. As a methodological framework, PolyDiff models leverage diffusion processes either on the probability simplex or in the space of quantized mesh coordinates, enabling coherent modeling of both geometric and combinatorial structures in data such as categorical images or 3D polygonal meshes (Floto et al., 2023, Alliegro et al., 2023).
1. Diffusion on the Probability Simplex
The foundational principle of PolyDiff is the formulation of diffusion on the probability simplex $\Delta^{d-1}$, the set of probability distributions over $d$ categories. The continuous-time forward process is defined in an unconstrained latent space $\mathbb{R}^{d-1}$, where $y_t$ follows a mean-reverting Ornstein–Uhlenbeck (OU) process:

$$dy_t = -\theta\, y_t\, dt + \sigma\, dW_t.$$

Transitioning to the simplex is achieved by the additive-logistic (softmax) map $x = \phi(y)$ with

$$x_i = \frac{e^{y_i}}{1 + \sum_{j=1}^{d-1} e^{y_j}} \quad (1 \le i \le d-1), \qquad x_d = \frac{1}{1 + \sum_{j=1}^{d-1} e^{y_j}},$$

ensuring $x_i > 0$ and $\sum_{i=1}^{d} x_i = 1$.
Through Itô's lemma, the induced process on the simplex itself is

$$dx_t = \mu(x_t)\, dt + \Sigma(x_t)^{1/2}\, dW_t,$$

with closed-form drift $\mu$ and diffusion matrix $\Sigma$ (see (Floto et al., 2023) for explicit forms). The process converges to a logistic-normal stationary law as $t \to \infty$. The OU process is particularly well suited because of its tractable marginals, direct control over the noising speed via $\theta$, and analytically computable transition kernels.
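The forward construction above can be sketched in a few lines of numpy: simulate the latent OU SDE with Euler–Maruyama and push each iterate through the additive-logistic map. The parameter values, step counts, and function names here are illustrative choices, not the paper's implementation.

```python
import numpy as np

def additive_logistic(y):
    """Map y in R^(d-1) to the interior of the simplex Delta^(d-1)."""
    e = np.exp(y)
    denom = 1.0 + e.sum(axis=-1, keepdims=True)
    x_head = e / denom                  # first d-1 coordinates
    x_last = 1.0 / denom                # remaining probability mass
    return np.concatenate([x_head, x_last], axis=-1)

def ou_forward(y0, theta=1.0, sigma=1.0, T=5.0, n_steps=1000, rng=None):
    """Euler-Maruyama simulation of dy = -theta*y dt + sigma dW."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    y = y0.copy()
    for _ in range(n_steps):
        y += -theta * y * dt + sigma * np.sqrt(dt) * rng.standard_normal(y.shape)
    return y

# A near-one-hot starting point (large logit on class 0) diffuses toward the
# logistic-normal stationary law on the simplex.
y0 = np.array([6.0, -6.0, -6.0])        # d = 4 categories -> latent dim 3
x0 = additive_logistic(y0)              # concentrated near a simplex corner
xT = additive_logistic(ou_forward(y0))  # spread over the simplex interior
```

Note that every iterate stays strictly inside the simplex by construction; no projection step is needed.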
2. PolyDiff for 3D Polygonal Mesh Generation
PolyDiff is also instantiated as a diffusion model for generating 3D polygonal meshes directly in their native discrete, topological representation (Alliegro et al., 2023). Each triangle mesh $\mathcal{M} = (V, F)$ is described by a vertex set $V$ and a face set $F$ constructed from vertex indices. Vertex coordinates are quantized into $K$ discrete bins, so that each mesh becomes a tensor of bin indices (one entry per face, vertex, and coordinate axis) whose leading dimension is $N$, the number of faces. The forward diffusion applies categorical noise independently to every element of this tensor using a stochastic matrix $Q_t$, with a noise schedule $\beta_t$.
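A minimal sketch of one categorical noising step, assuming a uniform-resampling transition matrix $Q_t$ (keep each symbol with probability $1-\beta_t$, otherwise resample uniformly over the $K$ bins); the tensor shape and bin count are illustrative:

```python
import numpy as np

def uniform_q(K, beta):
    """Stochastic matrix Q_t: keep a symbol w.p. 1-beta, else resample uniformly."""
    return (1.0 - beta) * np.eye(K) + beta * np.ones((K, K)) / K

def noise_step(tokens, beta, K, rng):
    """Apply one categorical noising step independently to each tensor entry."""
    Q = uniform_q(K, beta)
    out = tokens.copy()
    flat = out.ravel()
    for i, v in enumerate(flat):
        flat[i] = rng.choice(K, p=Q[v])   # sample next symbol from row Q[v]
    return out

rng = np.random.default_rng(0)
K = 8                                          # quantization bins (illustrative)
faces = rng.integers(0, K, size=(5, 3, 3))     # 5 faces, 3 vertices, 3 coords
noised = noise_step(faces, beta=0.2, K=K, rng=rng)
```

Iterating this step with a schedule $\beta_t \to 1$ drives the tensor toward the uniform categorical distribution, the stationary law of the discrete forward process.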
The reverse process is modeled using a transformer-based denoising network (U-ViT). The network's output is a categorical distribution over quantized bins for each triangle-face-coordinate slot. This architecture enables joint modeling of geometry and mesh topology, as the transformer observes all faces and shared vertex references.
3. Training and Objective Functions
For simplex-based PolyDiff, training utilizes continuous-time score matching. Let $s_\psi(x, t)$ be a time-dependent score network approximating $\nabla_x \log p_t(x)$; the denoising objective is

$$\mathcal{L}(\psi) = \mathbb{E}_{t,\, x_0,\, x_t}\Big[ \lambda(t)\, \big\| s_\psi(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \big\|^2 \Big],$$

where $p_t(x_t \mid x_0)$ is the logistic-normal transition kernel and $\lambda(t)$ a weighting function.
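Because the OU transition kernel is Gaussian in the latent space, the denoising target is available in closed form there. The sketch below computes the objective for the latent process (with a trivial stand-in network and unit weighting); the function names and parameters are illustrative:

```python
import numpy as np

def ou_kernel(y0, t, theta=1.0, sigma=1.0, rng=None):
    """Sample y_t | y_0 from the analytic OU transition kernel and return
    the conditional score grad_y log N(y_t; mean, var)."""
    rng = rng or np.random.default_rng(0)
    mean = y0 * np.exp(-theta * t)
    var = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * t))
    yt = mean + np.sqrt(var) * rng.standard_normal(y0.shape)
    score = -(yt - mean) / var
    return yt, score

def dsm_loss(score_net, y0, t):
    """Denoising score-matching loss against the analytic conditional score."""
    yt, target = ou_kernel(y0, t)
    return np.mean((score_net(yt, t) - target) ** 2)

# A trivial "network" that predicts zero everywhere, purely for illustration.
loss = dsm_loss(lambda y, t: np.zeros_like(y), np.array([1.0, -1.0]), t=0.5)
```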
For mesh PolyDiff, each discrete variable is supervised with a cross-entropy loss between the network's predicted categorical distribution and the clean quantized value:

$$\mathcal{L}_{\mathrm{CE}} = \mathbb{E}_{t,\, X_0,\, X_t}\Big[ -\sum_{n,v,c} \log p_\psi\big( (X_0)_{n,v,c} \mid X_t, t \big) \Big],$$

where $n$, $v$, $c$ index faces, vertices, and coordinates respectively. No explicit topological constraints are imposed; empirical results indicate that the transformer's global view yields self-consistent face-vertex associations.
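A short numpy sketch of this per-slot cross-entropy, assuming the denoiser emits one logit vector over $K$ bins for every face/vertex/coordinate slot (shapes are illustrative):

```python
import numpy as np

def mesh_ce_loss(logits, targets):
    """Mean cross-entropy over every face/vertex/coordinate slot.
    logits: (N, 3, 3, K) network outputs; targets: (N, 3, 3) bin indices."""
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(logp, targets[..., None], axis=-1)[..., 0]
    return -picked.mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((5, 3, 3, 8))    # random "predictions"
targets = rng.integers(0, 8, size=(5, 3, 3))  # clean quantized values
loss = mesh_ce_loss(logits, targets)
```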
4. Sampling, Inference, and Algorithmic Workflow
For PolyDiff on the simplex, generative sampling involves simulating the reverse SDE

$$dx_t = \big[ \mu(x_t) - \Sigma(x_t)\, \nabla_x \log p_t(x_t) \big]\, dt + \Sigma(x_t)^{1/2}\, d\bar{W}_t$$

backwards in time, starting from a sample from the logistic-normal stationary law. The final categorical sample is obtained by an $\arg\max$ over the simplex coordinates.
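The same procedure can be carried out in the latent space, where the reverse SDE has OU drift minus a score-correction term, with the result mapped to the simplex at the end. The sketch below substitutes the exact score of the stationary latent law for a learned network; all names and parameters are illustrative:

```python
import numpy as np

def additive_logistic(y):
    """Map a latent vector to the interior of the simplex."""
    e = np.exp(y)
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)

def reverse_sde_sample(score_fn, d, theta=1.0, sigma=1.0, T=5.0,
                       n_steps=500, rng=None):
    """Integrate the latent reverse SDE from t=T down to t=0, starting at the
    OU stationary law N(0, sigma^2 / (2*theta)), then map to the simplex."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    y = np.sqrt(sigma**2 / (2 * theta)) * rng.standard_normal(d)
    for k in range(n_steps):
        t = T - k * dt
        # reverse drift: forward drift minus the score-correction term
        drift = -theta * y - sigma**2 * score_fn(y, t)
        y = y - drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(d)
    return additive_logistic(y)

# Stand-in for a learned score: the exact score of the stationary latent law,
# i.e. -2*theta*y/sigma^2 with theta = sigma = 1.
stationary_score = lambda y, t: -2.0 * y
x = reverse_sde_sample(stationary_score, d=3)
category = int(np.argmax(x))              # final categorical sample
```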
In mesh PolyDiff, sampling starts from a uniformly random quantized tensor and applies reverse diffusion steps, iteratively updating each categorical variable through the transformer's predictions, followed by dequantization and reconstruction of the mesh $\mathcal{M} = (V, F)$. No explicit post hoc topology repair is needed.
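A simplified version of this ancestral loop, with a stand-in denoiser in place of the transformer (function names, shapes, and step count are all illustrative, not the paper's sampler):

```python
import numpy as np

def reverse_mesh_sample(denoise_fn, shape, K, n_steps, rng=None):
    """Start from uniform categorical noise and repeatedly re-sample every
    slot from the denoiser's predicted categorical distribution."""
    rng = rng or np.random.default_rng(0)
    x = rng.integers(0, K, size=shape)
    for t in reversed(range(n_steps)):
        probs = denoise_fn(x, t)                 # (*shape, K) categorical probs
        flat = probs.reshape(-1, K)
        x = np.array([rng.choice(K, p=p) for p in flat]).reshape(shape)
    return x

# Stand-in denoiser predicting the uniform distribution over bins.
K = 8
uniform_denoiser = lambda x, t: np.full(x.shape + (K,), 1.0 / K)
tokens = reverse_mesh_sample(uniform_denoiser, shape=(4, 3, 3), K=K, n_steps=3)
```

In the actual model the denoiser is the U-ViT transformer conditioned on the full noisy tensor, which is what allows the sampled face-vertex structure to come out self-consistent.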
5. Empirical Evaluation and Comparisons
PolyDiff demonstrates substantial improvement over previous mesh and shape generation baselines, as established in experiments on ShapeNet classes (chairs, tables, benches, displays). Metrics include JSD, FID (via Inception-v3 renderings), MMD, and 1-NNA.
| Class | PolyDiff JSD | BSPNet JSD | PolyDiff FID | PolyGen FID | BSPNet FID |
|---|---|---|---|---|---|
| Chair | 14.7 | 22.8 | 41.1 | 48.3 | 73.9 |
| Table | 13.1 | 21.0 | 26.2 | 46.2 | - |
| Bench | 86.6 | 102.0 | 49.7 | - | 81.9 |
| Display | 69.0 | 61.1 | 42.6 | 56.0 | - |
PolyDiff achieves consistent average FID and JSD improvements relative to prior art (Alliegro et al., 2023). Ablations confirm that discrete (rather than continuous) diffusion is necessary for coherent mesh outputs.
On categorical image data (quantized MNIST), PolyDiff on the simplex yields high-fidelity digit reconstructions from discrete corners without post hoc thresholding (Floto et al., 2023).
6. Extensions to General Polytopes and Limitations
The PolyDiff methodology extends to any convex polytope $\mathcal{P}$ equipped with a smooth, invertible map $\varphi: \mathbb{R}^d \to \operatorname{int}(\mathcal{P})$. For instance, mapping the latent OU process through a coordinate-wise sigmoid yields diffusion over the unit cube $[0,1]^d$, with analytic “sigmoid-Gaussian” kernels and tractable score functions. This generality underpins PolyDiff's applicability to bounded image generation and other domains where the target data resides on a polytope.
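The unit-cube instance is easy to sketch: push the latent OU process through a coordinate-wise sigmoid, so every iterate lies strictly inside $[0,1]^d$. Parameters and step counts below are illustrative:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def cube_forward(y0, theta=1.0, sigma=1.0, T=2.0, n_steps=500, rng=None):
    """Latent OU process mapped through a coordinate-wise sigmoid,
    yielding a diffusion on the unit cube [0, 1]^d."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    y = y0.astype(float).copy()
    path = [sigmoid(y)]
    for _ in range(n_steps):
        y = y - theta * y * dt + sigma * np.sqrt(dt) * rng.standard_normal(y.shape)
        path.append(sigmoid(y))
    return np.array(path)

# Two coordinates started near opposite faces of the cube diffuse inward.
path = cube_forward(np.array([3.0, -3.0]))
```

The same recipe applies for any polytope with a tractable $\varphi$: the boundary constraint is enforced by the map itself, never by projection.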
Limitations include slower sampling relative to feed-forward models, restriction to single-object mesh generation (for the mesh instantiation), and potential instability near simplex boundaries if the raw score (rather than the reverse-SDE term) is predicted. Accelerated samplers (e.g., Karras et al., Consistency Models) can address the first limitation; scene-level mesh generation remains an open avenue (Alliegro et al., 2023, Floto et al., 2023).
7. Relations and Connections
PolyDiff constitutes an explicit extension of denoising diffusion probabilistic models (DDPMs) [Ho et al., 2020], leveraging discrete (categorical) diffusion as in Structured Diffusion [Austin et al., 2021] and transformer-based masked modeling (U-ViT) [Bao et al., 2022]. Distinctions include its direct handling of simplex and polytopic support, precise geometric/topological modeling for mesh data, and tractable reverse processes on polytopes via analytic transition kernels derived from OU dynamics.
A plausible implication is that PolyDiff establishes a generalizable pattern for bridging continuous diffusion formulations and discrete- or combinatorial-structure generation by exploiting equivariant mappings and analytically tractable SDEs. This provides a rigorous foundation for further research in probabilistic modeling on constrained spaces—particularly in domains requiring native support for categorical, mesh, or other polytope-constrained data (Floto et al., 2023, Alliegro et al., 2023).