Gaussian-Softmax Diffusion: Joint Data Modeling

Updated 19 July 2025
  • Gaussian-Softmax Diffusion is a method that injects Gaussian noise into logit space and uses softmax to create a continuous relaxation of discrete variables.
  • It enables simultaneous modeling of continuous geometric parameters and categorical variables, making it ideal for tasks like CAD sketch synthesis.
  • Empirical results demonstrate significant improvements in generative quality, including a FID reduction from 16.04 to 7.80 and lower NLL scores.

Gaussian-Softmax Diffusion is a generative modeling framework for handling joint continuous and discrete data, in which logits representing discrete variables are perturbed by Gaussian noise and then projected onto the probability simplex using the softmax function. This methodology enables a principled and permutation-invariant approach to simultaneous modeling of both continuous geometric parameters and categorical variables, with particular utility for combinatorial-structured data such as CAD sketches. First introduced as the core innovation in SketchDNN (Chereddy et al., 15 Jul 2025), Gaussian-Softmax diffusion establishes a continuous relaxation for discrete class labels, carefully balancing gradual information destruction and recovery over the course of the diffusion process. This mechanism yields notable improvements over prior discrete and continuous generative models, as quantified by substantial gains in Fréchet Inception Distance (FID) and negative log-likelihood (NLL) on benchmark datasets.

1. Foundations of Gaussian-Softmax Diffusion

At the core of Gaussian-Softmax diffusion is a mechanism for representing and diffusing discrete variables—particularly one-hot vectors—via a process that replaces abrupt categorical transitions with a probabilistic, continuous relaxation. This is achieved by introducing Gaussian noise in logit (log-probability) space and then mapping the result back to the simplex:

$$y_t = \mathrm{softmax}\left( \sqrt{\bar{\alpha}_t}\,\log y'_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon \right)$$

where

  • $y'_0 = k\,y_0 + (1-k)/D \cdot \mathbf{1}$ is a smoothed form of the original one-hot vector $y_0$, with smoothing constant $k \approx 0.99$ and $D$ the number of classes,
  • $\bar{\alpha}_t$ is the cumulative noise schedule,
  • $\varepsilon \sim \mathcal{N}(0, I)$ is standard Gaussian noise.

Instead of “hard” one-hot class assignments, this scheme allows for “blended” or superposed class probabilities that evolve continuously as noise is added or removed, forming what may be termed a Gaussian-Softmax distribution (Editor's term).
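To make this concrete, the following is a minimal NumPy sketch of the forward label-noising step defined above. Function and argument names (forward_diffuse_labels, alpha_bar_t, and so on) are illustrative choices, not identifiers from the SketchDNN paper.

```python
import numpy as np

def forward_diffuse_labels(y0_onehot, alpha_bar_t, k=0.99, rng=None):
    """Sample y_t = softmax(sqrt(abar_t) * log y'_0 + sqrt(1 - abar_t) * eps)
    for a batch of one-hot label vectors (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    D = y0_onehot.shape[-1]
    # Label smoothing so that log y'_0 stays finite: y'_0 = k * y_0 + (1 - k)/D.
    y0_smooth = k * y0_onehot + (1.0 - k) / D
    eps = rng.standard_normal(y0_onehot.shape)
    logits = np.sqrt(alpha_bar_t) * np.log(y0_smooth) + np.sqrt(1.0 - alpha_bar_t) * eps
    # Softmax projection back onto the probability simplex (numerically stabilized).
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

# Example: one 4-class label (class 2) at a moderate noise level.
y0 = np.eye(4)[[2]]
print(forward_diffuse_labels(y0, alpha_bar_t=0.5))
```

As $\bar{\alpha}_t \to 0$ the logits are pure noise and the sample carries essentially no information about the original class; as $\bar{\alpha}_t \to 1$ it stays close to the smoothed one-hot vector.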

2. Forward and Reverse Processes: Mathematical Structure

The forward, or noising, process begins with a near-one-hot vector $y'_0$ and iteratively perturbs it in logit space by adding scaled Gaussian noise, mapping the result to the simplex using the softmax function. The density induced by this process across the simplex is:

$$p(y \mid \mu, \sigma^2 I) = \frac{\prod_{i=1}^{D} y_i}{Z(\sigma)} \exp\left( -\frac{1}{2\sigma^2} \left\| \tilde{N}(y) - \mu' \right\|_\perp^2 \right)$$

where $Z(\sigma)$ is a normalization constant, $\mu'$ is a shifted logit vector, and $\|\cdot\|_\perp$ denotes the norm on the subspace orthogonal to the all-ones vector (enforcing invariance to constant shifts, consistent with the softmax’s invariance property).
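For illustration, the unnormalized log-density above can be evaluated as in the sketch below. It assumes that $\tilde{N}(y)$ is the componentwise logarithm of $y$ and that $\|\cdot\|_\perp$ means centering both vectors before taking the Euclidean norm (projecting out the all-ones direction); the constant $Z(\sigma)$ is omitted. These are assumptions filling in details not spelled out here.

```python
import numpy as np

def gaussian_softmax_logdensity_unnorm(y, mu, sigma):
    """Unnormalized log-density of a simplex point y under the Gaussian-softmax
    distribution with logit-space mean mu and scale sigma (illustrative sketch).

    Assumptions: N~(y) = log(y) componentwise, and ||.||_perp is the Euclidean
    norm taken after removing the mean along the class axis. Z(sigma) is dropped.
    """
    logits = np.log(y)
    # Project logits and mean onto the subspace orthogonal to the all-ones vector.
    logits_perp = logits - logits.mean(axis=-1, keepdims=True)
    mu_perp = mu - mu.mean(axis=-1, keepdims=True)
    quad = -0.5 / sigma**2 * np.sum((logits_perp - mu_perp) ** 2, axis=-1)
    # The prod_i y_i factor contributes sum_i log y_i.
    return logits.sum(axis=-1) + quad
```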

The reverse, or denoising, transition follows a similar pattern. In logit space, the update at time $t-1$ is:

$$y_{t-1} = \mu_{t-1}(y_t, y_0) + \sigma_{t-1}\,\varepsilon$$

where

$$\mu_{t-1}(y_t, y_0) = \frac{ \sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})\,\log y_t + \sqrt{\bar{\alpha}_{t-1}}\,\sqrt{1-\alpha_t}\,\log y_0 }{ 1 - \bar{\alpha}_t }$$

with all operations performed in logit space before projecting onto the simplex through softmax. This gently sharpens the probability distribution, guiding it progressively back toward a definite, nearly one-hot categorical assignment.
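A minimal sketch of one reverse update, written directly from the two equations above, is given below; all names are illustrative, and y0_pred stands for the network's estimate of the clean (smoothed one-hot) labels.

```python
import numpy as np

def reverse_step_labels(y_t, y0_pred, alpha_t, alpha_bar_t, alpha_bar_prev,
                        sigma_prev, rng=None):
    """One reverse (denoising) step for the label component (illustrative sketch).

    Forms the posterior-style mean mu_{t-1} in logit space from the noisy labels
    y_t and the predicted clean labels y0_pred, adds Gaussian noise scaled by
    sigma_prev, and projects back to the simplex with softmax.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = (np.sqrt(alpha_t) * (1.0 - alpha_bar_prev) * np.log(y_t)
          + np.sqrt(alpha_bar_prev) * np.sqrt(1.0 - alpha_t) * np.log(y0_pred)) \
         / (1.0 - alpha_bar_t)
    logits = mu + sigma_prev * rng.standard_normal(y_t.shape)
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)
```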

3. Handling Heterogeneity and Permutation Invariance in CAD Sketches

CAD sketches are composed of varied geometric primitives—each with distinct parameterizations—and are permutation-invariant with respect to their components. Gaussian-Softmax diffusion addresses both issues:

  • Heterogeneity: Each primitive’s identity (discrete type) and its parameters (continuous values) are jointly represented; the blended probabilities mean a primitive need not abruptly commit to a type until the final denoising steps, enabling smooth transitions and improved sample diversity.
  • Permutation invariance: By applying the forward and reverse diffusion processes independently across primitive instances (i.e., factorized diffusion) and omitting positional encodings in the transformer-based denoiser, the architecture preserves invariance to the order of primitives—a critical property for representing CAD sketches.

This combination allows SketchDNN to unify the modeling of both high-variance, continuous geometric attributes and categorical structure in a coherent, tractable framework (Chereddy et al., 15 Jul 2025).
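A generic sketch of such an order-insensitive denoiser is shown below: a transformer encoder that simply omits positional encodings, applied to the set of (noised) primitive tokens. This is a hypothetical stand-in rather than the SketchDNN architecture; the dimensions, layer counts, and the way the timestep is injected are all assumptions.

```python
import torch
import torch.nn as nn

class PermutationInvariantDenoiser(nn.Module):
    """Illustrative denoiser over a set of sketch primitives.

    Each primitive token concatenates its noised class probabilities and its
    continuous parameters. With no positional encodings, the encoder is
    permutation equivariant, so the induced sketch-level model is invariant to
    the order of primitives.
    """
    def __init__(self, n_classes, n_params, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(n_classes + n_params + 1, d_model)  # +1 for the timestep feature
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes + n_params)  # predict clean labels and parameters

    def forward(self, primitives, t):
        # primitives: (batch, n_primitives, n_classes + n_params); t: (batch,)
        t_feat = t[:, None, None].expand(-1, primitives.shape[1], 1).float()
        h = self.embed(torch.cat([primitives, t_feat], dim=-1))
        h = self.encoder(h)  # no positional encodings: primitive order carries no information
        return self.head(h)
```

Permuting the primitives of an input sketch simply permutes the corresponding outputs, which, combined with the factorized diffusion over primitives, yields the permutation invariance described above.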

4. Variance Schedule Augmentation and Noise Scheduling

Introducing noise via the softmax-projected logit space can distort the stochastic process and erode class identity too quickly if not managed carefully. To address this, an augmented variance schedule is employed:

$$\bar{b}_t = \frac{f(\bar{\alpha}_t)^2}{f(\bar{\alpha}_t)^2 + f(k)^2}, \qquad \text{where } f(x) = \log\left( \frac{1-x}{(D-1)x + 1} \right).$$

This schedule ensures that the decay of the main class probability (i.e., the chance that the label remains unchanged) is controlled and gradual across steps, which enables the reverse process to recover high-confidence labels without abrupt transitions or vanishing gradients.
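A small sketch of this schedule computation (the function name and the example cumulative schedule are illustrative):

```python
import numpy as np

def augmented_variance_schedule(alpha_bar, D, k=0.99):
    """Compute b_bar_t = f(abar_t)^2 / (f(abar_t)^2 + f(k)^2) with
    f(x) = log((1 - x) / ((D - 1) * x + 1))  (illustrative sketch)."""
    f = lambda x: np.log((1.0 - x) / ((D - 1) * x + 1.0))
    return f(alpha_bar) ** 2 / (f(alpha_bar) ** 2 + f(k) ** 2)

# Example: a linearly decaying cumulative schedule over 10 steps with D = 20 classes.
abar = np.linspace(0.99, 0.01, 10)
print(augmented_variance_schedule(abar, D=20))
```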

5. Empirical Results and Implications

Gaussian-Softmax diffusion within SketchDNN demonstrates tangible gains in generative quality for CAD datasets, particularly SketchGraphs. Notably:

  • Fréchet Inception Distance (FID) is improved from 16.04 (previous best) to 7.80.
  • Negative Log-Likelihood (NLL) is decreased from 84.8 to 81.33 bits per sketch.

These improvements substantiate that the soft blending and controlled noising/denoising of discrete structures foster both higher fidelity in generation and a more faithful learned data distribution.

6. Relation to Broader Diffusion and Simplex Diffusion Methods

Gaussian-Softmax diffusion closely aligns with recent advances in simplex diffusion (Floto et al., 2023), where continuous diffusion is performed in a latent space before projecting onto the simplex via softmax, as well as with invertible reparameterizations for discrete variables using Gaussian noise and softmax transformations (Potapczynski et al., 2019). The approach is particularly distinguished by its targeted application to mixed continuous-discrete structured data, making use of the softmax not only as a classifier output but as a geometric tool to maintain simplex constraints throughout the generative process.

7. Significance and Extension

By providing a mathematically grounded, permutation-invariant, and empirically effective method for continuous-discrete generative modeling, Gaussian-Softmax diffusion represents a substantive advance in structured data generation. Its principles may be extended to other settings involving hybrid discrete-continuous variables, with potential impacts spanning computer-aided design, graph generation, and any domain where categorical labels arise alongside continuous attributes in a compositional setting.


Aspect | Technique/Formula | Purpose
Forward process (discrete vars) | $y_t = \mathrm{softmax}(\sqrt{\bar{\alpha}_t}\,\log y'_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon)$ | Blend noise into class logits, project to simplex
Reverse (denoising) update | $y_{t-1} = \mu_{t-1}(y_t, y_0) + \sigma_{t-1}\,\varepsilon$ (in logit space) | Gradual sharpening/recovery of class confidence
Variance schedule augmentation | $\bar{b}_t = f(\bar{\alpha}_t)^2 / (f(\bar{\alpha}_t)^2 + f(k)^2)$, with $f(x) = \log\left(\frac{1-x}{(D-1)x + 1}\right)$ | Maintain controlled probability decay for class labels

In summary, Gaussian-Softmax diffusion leverages noise injection in logit space and softmax projection to produce a flexible, robust framework for modeling discrete categorical variables in generative modeling pipelines involving continuous-diffusion processes, with state-of-the-art performance demonstrated in structured generation tasks such as CAD sketch synthesis (Chereddy et al., 15 Jul 2025).