PolyDiff: Diffusion on Polytopes
- PolyDiff is a diffusion-based generative framework for discrete data on polytopes, modeling categorical images through analytic OU dynamics on the simplex and 3D meshes through discrete categorical diffusion.
- It employs diffusion on the probability simplex and transformer-based denoising to jointly capture geometric and combinatorial structures.
- Empirical results show significant FID and JSD improvements over existing methods, establishing its efficacy in generating high-quality mesh and shape data.
PolyDiff refers to a family of diffusion-based generative models for discrete or piecewise-linear structures on polytopes, most notably for direct generation or reconstruction of categorical, polygonal, or mesh data. As a methodological framework, PolyDiff models leverage diffusion processes either on the probability simplex or in the space of quantized mesh coordinates, enabling coherent modeling of both geometric and combinatorial structures in data such as categorical images or 3D polygonal meshes (Floto et al., 2023, Alliegro et al., 2023).
1. Diffusion on the Probability Simplex
The foundational principle of PolyDiff is the formulation of diffusion on the probability simplex $\Delta^{d-1}$, the set of probability distributions over $d$ categories. The continuous-time forward process is defined in an unconstrained latent space $\mathbb{R}^{d-1}$, where $y_t$ follows a mean-reverting Ornstein–Uhlenbeck (OU) process:

$$dy_t = -\theta\, y_t\, dt + \sigma\, dW_t.$$

Transitioning to the simplex is achieved by the additive-logistic (softmax) map $x = \phi(y)$ with

$$x_i = \frac{e^{y_i}}{1 + \sum_{j=1}^{d-1} e^{y_j}} \quad (1 \le i \le d-1), \qquad x_d = \frac{1}{1 + \sum_{j=1}^{d-1} e^{y_j}},$$

ensuring $x_i > 0$ and $\sum_{i=1}^{d} x_i = 1$.
Through Itô's lemma, the induced process on the simplex itself is

$$dx_t = \mu(x_t)\, dt + \Sigma(x_t)^{1/2}\, dW_t,$$

with closed-form drift $\mu$ and diffusion matrix $\Sigma$ (see (Floto et al., 2023) for explicit forms). The process converges to a logistic-normal stationary law as $t \to \infty$. The OU process is particularly well suited because of its tractable marginals, direct control over the noising speed via $\theta$, and analytically computable transition kernels.
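The forward construction above can be sketched in a few lines of numpy: simulate the latent OU SDE with Euler–Maruyama and push each iterate through the additive-logistic map. The parameter values, step counts, and function names here are illustrative choices, not the paper's implementation.

```python
import numpy as np

def additive_logistic(y):
    """Map y in R^(d-1) to the interior of the simplex Delta^(d-1)."""
    e = np.exp(y)
    denom = 1.0 + e.sum(axis=-1, keepdims=True)
    x_head = e / denom                  # first d-1 coordinates
    x_last = 1.0 / denom                # remaining probability mass
    return np.concatenate([x_head, x_last], axis=-1)

def ou_forward(y0, theta=1.0, sigma=1.0, T=5.0, n_steps=1000, rng=None):
    """Euler-Maruyama simulation of dy = -theta*y dt + sigma dW."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    y = y0.copy()
    for _ in range(n_steps):
        y += -theta * y * dt + sigma * np.sqrt(dt) * rng.standard_normal(y.shape)
    return y

# A near-one-hot starting point (large logit on class 0) diffuses toward the
# logistic-normal stationary law on the simplex.
y0 = np.array([6.0, -6.0, -6.0])        # d = 4 categories -> latent dim 3
x0 = additive_logistic(y0)              # concentrated near a simplex corner
xT = additive_logistic(ou_forward(y0))  # spread over the simplex interior
```

Note that every iterate stays strictly inside the simplex by construction; no projection step is needed.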
2. PolyDiff for 3D Polygonal Mesh Generation
PolyDiff is also instantiated as a diffusion model for generating 3D polygonal meshes directly in their native discrete, topological representation (Alliegro et al., 2023). Each triangle mesh $\mathcal{M} = (V, F)$ is described by a vertex set $V$ and a face set $F$ constructed from vertex indices. Vertex coordinates are quantized into $K$ discrete bins, so that each mesh becomes a tensor of bin indices (one entry per face, vertex, and coordinate axis) whose leading dimension is $N$, the number of faces. The forward diffusion applies categorical noise independently to every element of this tensor using a stochastic matrix $Q_t$, with a noise schedule $\beta_t$.
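A minimal sketch of one categorical noising step, assuming a uniform-resampling transition matrix $Q_t$ (keep each symbol with probability $1-\beta_t$, otherwise resample uniformly over the $K$ bins); the tensor shape and bin count are illustrative:

```python
import numpy as np

def uniform_q(K, beta):
    """Stochastic matrix Q_t: keep a symbol w.p. 1-beta, else resample uniformly."""
    return (1.0 - beta) * np.eye(K) + beta * np.ones((K, K)) / K

def noise_step(tokens, beta, K, rng):
    """Apply one categorical noising step independently to each tensor entry."""
    Q = uniform_q(K, beta)
    out = tokens.copy()
    flat = out.ravel()
    for i, v in enumerate(flat):
        flat[i] = rng.choice(K, p=Q[v])   # sample next symbol from row Q[v]
    return out

rng = np.random.default_rng(0)
K = 8                                          # quantization bins (illustrative)
faces = rng.integers(0, K, size=(5, 3, 3))     # 5 faces, 3 vertices, 3 coords
noised = noise_step(faces, beta=0.2, K=K, rng=rng)
```

Iterating this step with a schedule $\beta_t \to 1$ drives the tensor toward the uniform categorical distribution, the stationary law of the discrete forward process.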
The reverse process is modeled using a transformer-based denoising network (U-ViT). The network's output is a categorical distribution over quantized bins for each triangle-face-coordinate slot. This architecture enables joint modeling of geometry and mesh topology, as the transformer observes all faces and shared vertex references.
3. Training and Objective Functions
For simplex-based PolyDiff, training utilizes continuous-time score matching. Let $s_\psi(x, t)$ be a time-dependent score network approximating $\nabla_x \log p_t(x)$; the denoising objective is

$$\mathcal{L}(\psi) = \mathbb{E}_{t,\, x_0,\, x_t}\Big[ \lambda(t)\, \big\| s_\psi(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \big\|^2 \Big],$$

where $p_t(x_t \mid x_0)$ is the logistic-normal transition kernel and $\lambda(t)$ a weighting function.
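Because the OU transition kernel is Gaussian in the latent space, the denoising target is available in closed form there. The sketch below computes the objective for the latent process (with a trivial stand-in network and unit weighting); the function names and parameters are illustrative:

```python
import numpy as np

def ou_kernel(y0, t, theta=1.0, sigma=1.0, rng=None):
    """Sample y_t | y_0 from the analytic OU transition kernel and return
    the conditional score grad_y log N(y_t; mean, var)."""
    rng = rng or np.random.default_rng(0)
    mean = y0 * np.exp(-theta * t)
    var = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * t))
    yt = mean + np.sqrt(var) * rng.standard_normal(y0.shape)
    score = -(yt - mean) / var
    return yt, score

def dsm_loss(score_net, y0, t):
    """Denoising score-matching loss against the analytic conditional score."""
    yt, target = ou_kernel(y0, t)
    return np.mean((score_net(yt, t) - target) ** 2)

# A trivial "network" that predicts zero everywhere, purely for illustration.
loss = dsm_loss(lambda y, t: np.zeros_like(y), np.array([1.0, -1.0]), t=0.5)
```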
For mesh PolyDiff, each discrete variable is supervised with a cross-entropy loss between the network's predicted categorical distribution and the clean quantized value:

$$\mathcal{L}_{\mathrm{CE}} = \mathbb{E}_{t,\, X_0,\, X_t}\Big[ -\sum_{n,v,c} \log p_\psi\big( (X_0)_{n,v,c} \mid X_t, t \big) \Big],$$

where $n$, $v$, $c$ index faces, vertices, and coordinates respectively. No explicit topological constraints are imposed; empirical results indicate that the transformer's global view yields self-consistent face-vertex associations.
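A short numpy sketch of this per-slot cross-entropy, assuming the denoiser emits one logit vector over $K$ bins for every face/vertex/coordinate slot (shapes are illustrative):

```python
import numpy as np

def mesh_ce_loss(logits, targets):
    """Mean cross-entropy over every face/vertex/coordinate slot.
    logits: (N, 3, 3, K) network outputs; targets: (N, 3, 3) bin indices."""
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(logp, targets[..., None], axis=-1)[..., 0]
    return -picked.mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((5, 3, 3, 8))    # random "predictions"
targets = rng.integers(0, 8, size=(5, 3, 3))  # clean quantized values
loss = mesh_ce_loss(logits, targets)
```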
4. Sampling, Inference, and Algorithmic Workflow
For PolyDiff on the simplex, generative sampling involves simulating the reverse SDE

$$dx_t = \big[ \mu(x_t) - \Sigma(x_t)\, \nabla_x \log p_t(x_t) \big]\, dt + \Sigma(x_t)^{1/2}\, d\bar{W}_t$$

backwards in time, starting from a sample from the logistic-normal stationary law. The final categorical sample is obtained by an $\arg\max$ over the simplex coordinates.
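The same procedure can be carried out in the latent space, where the reverse SDE has OU drift minus a score-correction term, with the result mapped to the simplex at the end. The sketch below substitutes the exact score of the stationary latent law for a learned network; all names and parameters are illustrative:

```python
import numpy as np

def additive_logistic(y):
    """Map a latent vector to the interior of the simplex."""
    e = np.exp(y)
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)

def reverse_sde_sample(score_fn, d, theta=1.0, sigma=1.0, T=5.0,
                       n_steps=500, rng=None):
    """Integrate the latent reverse SDE from t=T down to t=0, starting at the
    OU stationary law N(0, sigma^2 / (2*theta)), then map to the simplex."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    y = np.sqrt(sigma**2 / (2 * theta)) * rng.standard_normal(d)
    for k in range(n_steps):
        t = T - k * dt
        # reverse drift: forward drift minus the score-correction term
        drift = -theta * y - sigma**2 * score_fn(y, t)
        y = y - drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(d)
    return additive_logistic(y)

# Stand-in for a learned score: the exact score of the stationary latent law,
# i.e. -2*theta*y/sigma^2 with theta = sigma = 1.
stationary_score = lambda y, t: -2.0 * y
x = reverse_sde_sample(stationary_score, d=3)
category = int(np.argmax(x))              # final categorical sample
```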
In mesh PolyDiff, sampling starts from a uniformly random quantized tensor and applies reverse diffusion steps, iteratively updating each categorical variable through the transformer's predictions, followed by dequantization and reconstruction of the mesh $\mathcal{M} = (V, F)$. No explicit post hoc topology repair is needed.
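A simplified version of this ancestral loop, with a stand-in denoiser in place of the transformer (function names, shapes, and step count are all illustrative, not the paper's sampler):

```python
import numpy as np

def reverse_mesh_sample(denoise_fn, shape, K, n_steps, rng=None):
    """Start from uniform categorical noise and repeatedly re-sample every
    slot from the denoiser's predicted categorical distribution."""
    rng = rng or np.random.default_rng(0)
    x = rng.integers(0, K, size=shape)
    for t in reversed(range(n_steps)):
        probs = denoise_fn(x, t)                 # (*shape, K) categorical probs
        flat = probs.reshape(-1, K)
        x = np.array([rng.choice(K, p=p) for p in flat]).reshape(shape)
    return x

# Stand-in denoiser predicting the uniform distribution over bins.
K = 8
uniform_denoiser = lambda x, t: np.full(x.shape + (K,), 1.0 / K)
tokens = reverse_mesh_sample(uniform_denoiser, shape=(4, 3, 3), K=K, n_steps=3)
```

In the actual model the denoiser is the U-ViT transformer conditioned on the full noisy tensor, which is what allows the sampled face-vertex structure to come out self-consistent.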
5. Empirical Evaluation and Comparisons
PolyDiff demonstrates substantial improvement over previous mesh and shape generation baselines, as established in experiments on ShapeNet classes (chairs, tables, benches, displays). Metrics include JSD, FID (via Inception-v3 renderings), MMD, and 1-NNA.
| Class | PolyDiff JSD | BSPNet JSD | PolyDiff FID | PolyGen FID | BSPNet FID |
|---|---|---|---|---|---|
| Chair | 14.7 | 22.8 | 41.1 | 48.3 | 73.9 |
| Table | 13.1 | 21.0 | 26.2 | 46.2 | - |
| Bench | 86.6 | 102.0 | 49.7 | - | 81.9 |
| Display | 69.0 | 61.1 | 42.6 | 56.0 | - |
PolyDiff achieves consistent average FID and JSD improvements relative to prior art (Alliegro et al., 2023). Ablations confirm that discrete (rather than continuous) diffusion is necessary for coherent mesh outputs.
On categorical image data (quantized MNIST), PolyDiff on the simplex yields high-fidelity digit reconstructions from discrete corners without post hoc thresholding (Floto et al., 2023).
6. Extensions to General Polytopes and Limitations
The PolyDiff methodology extends to any convex polytope $\mathcal{P}$ equipped with a smooth, invertible map $\varphi: \mathbb{R}^d \to \operatorname{int}(\mathcal{P})$. For instance, mapping the latent OU process through a coordinate-wise sigmoid yields diffusion over the unit cube $[0,1]^d$, with analytic “sigmoid-Gaussian” kernels and tractable score functions. This generality underpins PolyDiff's applicability to bounded image generation and other domains where the target data resides on a polytope.
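The unit-cube instance is easy to sketch: push the latent OU process through a coordinate-wise sigmoid, so every iterate lies strictly inside $[0,1]^d$. Parameters and step counts below are illustrative:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def cube_forward(y0, theta=1.0, sigma=1.0, T=2.0, n_steps=500, rng=None):
    """Latent OU process mapped through a coordinate-wise sigmoid,
    yielding a diffusion on the unit cube [0, 1]^d."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    y = y0.astype(float).copy()
    path = [sigmoid(y)]
    for _ in range(n_steps):
        y = y - theta * y * dt + sigma * np.sqrt(dt) * rng.standard_normal(y.shape)
        path.append(sigmoid(y))
    return np.array(path)

# Two coordinates started near opposite faces of the cube diffuse inward.
path = cube_forward(np.array([3.0, -3.0]))
```

The same recipe applies for any polytope with a tractable $\varphi$: the boundary constraint is enforced by the map itself, never by projection.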
Limitations include slower sampling relative to feed-forward models, restriction to single-object mesh generation (for the mesh instantiation), and potential instability near simplex boundaries if the raw score (rather than the reverse-SDE term) is predicted. Accelerated samplers (e.g., Karras et al., Consistency Models) can address the first limitation; scene-level mesh generation remains an open avenue (Alliegro et al., 2023, Floto et al., 2023).
7. Relations and Connections
PolyDiff constitutes an explicit extension of denoising diffusion probabilistic models (DDPMs) [Ho et al., 2020], leveraging discrete (categorical) diffusion as in Structured Diffusion [Austin et al., 2021] and transformer-based masked modeling (U-ViT) [Bao et al., 2022]. Distinctions include its direct handling of simplex and polytopic support, precise geometric/topological modeling for mesh data, and tractable reverse processes on polytopes via analytic transition kernels derived from OU dynamics.
A plausible implication is that PolyDiff establishes a generalizable pattern for bridging continuous diffusion formulations and discrete- or combinatorial-structure generation by exploiting equivariant mappings and analytically tractable SDEs. This provides a rigorous foundation for further research in probabilistic modeling on constrained spaces—particularly in domains requiring native support for categorical, mesh, or other polytope-constrained data (Floto et al., 2023, Alliegro et al., 2023).