Interval Score Matching for Text-to-3D

Updated 9 March 2026

ISM is a score-based distillation framework for text-to-3D generation that overcomes SDS limitations by using deterministic DDIM inversion and interval score matching.
It provides consistent supervision signals that reduce over-smoothing and improve reconstruction fidelity through multi-step score alignment.
ISM facilitates faster convergence and robust 3D asset generation, forming the basis for advanced methods like Trajectory Score Matching (TSM).

Interval Score Matching (ISM) is a score-based distillation framework for text-to-3D generation that addresses the limitations of Score Distillation Sampling (SDS) by employing deterministic Denoising Diffusion Implicit Models (DDIM) inversion and interval-based score matching. ISM produces consistent supervision signals for optimizing 3D representations directly with pretrained 2D diffusion models, resulting in improved reconstruction fidelity, reduced over-smoothing, and greater training efficiency compared to prior approaches. ISM was introduced in the LucidDreamer system and has influenced the development of further generalizations such as Trajectory Score Matching (TSM) (Liang et al., 2023, Miao et al., 2024).

1. Motivation and Conceptual Foundations

Score Distillation Sampling (SDS), first used in DreamFusion, distills knowledge from text-conditioned 2D diffusion models (such as DDPMs) into 3D neural fields or explicit representations. SDS optimizes a reconstruction-based objective: for a rendered image $x_0 = \mathrm{render}(\theta, c)$ (with 3D parameters $\theta$ and camera $c$), SDS adds noise to obtain $x_t$ and attempts to match the predicted denoising direction to the true score of the forward diffusion kernel $q(x_t|x_0)$:

$\mathcal{L}_\text{SDS}(\theta) = \mathbb{E}_{t,c} [ \omega(t) \|\epsilon_\phi(x_t, t, y) - \epsilon(x_t, t)\|^2 ],$

where $\epsilon_\phi$ is the diffusion model's predicted noise (conditioned on prompt $y$ ), $\epsilon(x_t, t)$ is the ground truth score, and $\omega(t)$ is a time-dependent weighting. Due to stochastic noise draws and the need for faithful reconstructions at high noise levels, SDS exhibits semantically inconsistent pseudo-ground-truths that, when averaged over many steps, lead to over-smoothing of geometry and texture.
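The stochasticity described above can be illustrated with a minimal sketch. It assumes the common practical form of SDS in which the target is the sampled Gaussian noise, and it uses a hypothetical `toy_eps_model` in place of a pretrained diffusion network; the schedule `alpha` and all numeric values are illustrative only.

```python
import numpy as np

def toy_eps_model(x, t, cond):
    """Hypothetical stand-in for the noise predictor eps_phi (not a real model)."""
    guidance = 0.0 if cond is None else 0.1
    return np.tanh(x) * (t / 1000.0) + guidance

def sds_residual(x0, t, alpha, cond, rng, weight=1.0):
    """Weighted SDS residual for one random noise draw.

    alpha[t] is assumed to be the cumulative noise-schedule term at step t.
    """
    eps = rng.standard_normal(x0.shape)                       # fresh random noise
    x_t = np.sqrt(alpha[t]) * x0 + np.sqrt(1.0 - alpha[t]) * eps
    return weight * (toy_eps_model(x_t, t, cond) - eps)

# Illustrative schedule: alpha[0] = 1 (clean image), decreasing afterwards.
alpha = np.concatenate(([1.0], np.linspace(0.999, 0.01, 1000)))
x0 = np.full(4, 0.5)                                          # placeholder "rendered image"

# Same image, same timestep, two different noise draws.
r1 = sds_residual(x0, 500, alpha, "a prompt", np.random.default_rng(0))
r2 = sds_residual(x0, 500, alpha, "a prompt", np.random.default_rng(1))
print(np.abs(r1 - r2).max())
```

The two residuals differ even though the rendered image and timestep are identical, which is the source of the inconsistent supervision that ISM aims to remove.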

ISM introduces deterministic DDIM inversion and interval score matching to address these challenges. Rather than targeting a denoised $x_0$ from a noisy $x_t$ , ISM encourages the matching of predicted diffusion scores across a fixed interval in the DDIM trajectory. DDIM inversion removes the randomness in noisy latent generation, yielding supervision that is consistently aligned across views and time steps. The objective targets the difference between the score at a later timestep $t$ (conditioned on text) and an earlier timestep $s$ (unconditional), regularizing the learning of 3D parameters via interval consistency (Liang et al., 2023).

2. Formalism and Loss Function

The ISM loss for a rendered view $x_0 = g(\theta, c)$ is defined as follows. Let $(x_s, x_t)$ be noisy latents at inversion steps $s$ and $t$ ($0 < s < t$), computed by deterministic DDIM inversion. The denoiser outputs $\epsilon_\phi(x_u, u, y)$ (conditioned) and $\epsilon_\phi(x_u, u, \emptyset)$ (unconditioned), and $\omega(t)$ is a reweighting function. The squared-error loss and gradient are:

$\mathcal{L}_\text{ISM}(\theta) = \mathbb{E}_{t,c} \bigl[ \omega(t) \bigl\| \epsilon_\phi(x_t, t, y) - \epsilon_\phi(x_s, s, \emptyset) \bigr\|^2 \bigr]$

$\nabla_\theta \mathcal{L}_\text{ISM}(\theta) = \mathbb{E}_{t,c}\bigl[ \omega(t)\;(\epsilon_\phi(x_t, t, y) - \epsilon_\phi(x_s, s, \emptyset)) \;\frac{\partial x_0}{\partial \theta} \bigr].$

Here, the unconditional score $\epsilon_\phi(x_s, s, \emptyset)$ serves as the “pseudo-ground-truth”, and the text-conditioned score $\epsilon_\phi(x_t, t, y)$ at $x_t$ is the prediction that is matched against it.
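The gradient expression can be sketched numerically as follows. This is only an illustration under stated assumptions: `toy_eps_model` is a hypothetical stand-in for the pretrained denoiser, the renderer is assumed linear ($x_0 = A\theta$, so $\partial x_0 / \partial \theta = A$), and the latents `x_s` and `x_t` are placeholder values rather than the output of an actual DDIM inversion.

```python
import numpy as np

def toy_eps_model(x, t, cond):
    """Hypothetical stand-in for the noise predictor eps_phi (not a real model)."""
    guidance = 0.0 if cond is None else 0.1
    return np.tanh(x) * (t / 1000.0) + guidance

def ism_gradient(x_s, x_t, s, t, cond, jacobian, weight=1.0):
    """Return (gradient w.r.t. theta, residual) for one sample of the ISM loss."""
    # Conditional score at the later step minus unconditional score at the earlier step.
    residual = toy_eps_model(x_t, t, cond) - toy_eps_model(x_s, s, None)
    # Vector-Jacobian product: push the weighted residual through d x0 / d theta.
    return jacobian.T @ (weight * residual), residual

# Assumed linear renderer x0 = A @ theta, so the Jacobian is simply A.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x_s = np.array([0.20, -0.10, 0.10])   # placeholder latent at step s
x_t = np.array([0.25, -0.05, 0.15])   # placeholder latent at step t

grad, residual = ism_gradient(x_s, x_t, s=200, t=250, cond="a prompt", jacobian=A)
print(grad.shape, residual.shape)
```

Note that no random noise appears anywhere in this computation: given fixed latents, the residual is fully determined by the two denoiser evaluations.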

The key differences from SDS are:

Latent Generation: ISM latents are created by deterministic DDIM inversion, eliminating random noise.
Supervision Interval: ISM uses interval score matching $(x_s, x_t)$ rather than reconstructing all the way from $x_t$ to $x_0$ .
Score Alignment: ISM matches conditional and unconditional scores at different timesteps, exploiting multi-step denoising quality and enhanced stability (Liang et al., 2023, Miao et al., 2024).

3. DDIM Inversion and Trajectory Consistency

The DDIM inversion procedure traverses “upwards” along the time axis, deterministically mapping the rendered image $x_0$ to noisier latents $x_s$ and $x_t$ :

Starting from $x_0$ , apply inverted DDIM updates for $u = 1, \dots, s$ (producing $x_s$ ), then continue for $u = s+1, \dots, t$ to compute $x_t$ .
Inversion is based on the deterministic reverse-sampling update of DDIM, applied in the opposite direction (from $x_{u-1}$ to $x_u$). The sampling update is:

$x_{u-1} = \sqrt{\alpha_{u-1}} \left( \frac{x_u - \sqrt{1-\alpha_u}\,\epsilon_\phi(x_u, u, \emptyset)}{\sqrt{\alpha_u}} \right) + \sqrt{1-\alpha_{u-1}}\,\epsilon_\phi(x_u, u, \emptyset),$

where $\alpha_u$ are the cumulative noise schedule terms.
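For reference, the corresponding upward step can be written out explicitly. Solving the update above for $x_u$ requires $\epsilon_\phi(x_u, u, \emptyset)$, which is not yet available when moving from $x_{u-1}$ to $x_u$. Assuming the usual DDIM inversion approximation $\epsilon_\phi(x_u, u, \emptyset) \approx \epsilon_\phi(x_{u-1}, u-1, \emptyset)$, one obtains:

$x_u = \sqrt{\alpha_u} \left( \frac{x_{u-1} - \sqrt{1-\alpha_{u-1}}\,\epsilon_\phi(x_{u-1}, u-1, \emptyset)}{\sqrt{\alpha_{u-1}}} \right) + \sqrt{1-\alpha_u}\,\epsilon_\phi(x_{u-1}, u-1, \emptyset).$

With an inversion stride larger than one, the same expression applies with $u-1$ replaced by the previous visited step.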

This approach ensures that for a given $(\theta, c)$ and prompt, both $x_s$ and $x_t$ are consistent across evaluations. The ISM algorithm may use large inversion strides without material impact on the supervision quality, providing computational efficiency (Liang et al., 2023). The interval length $\delta_T = t-s$ and the inversion stride $\delta_S$ act as key hyperparameters controlling granularity and speed.
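A minimal sketch of strided DDIM inversion is given below. It assumes the standard approximation in which the noise prediction is evaluated at the starting point of each step; `toy_eps_model`, the schedule `alpha`, and the chosen step indices are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def toy_eps_model(x, t, cond):
    """Hypothetical stand-in for the noise predictor eps_phi (not a real model)."""
    guidance = 0.0 if cond is None else 0.1
    return np.tanh(x) * (t / 1000.0) + guidance

def ddim_invert_step(x, u_from, u_to, alpha, eps_model):
    """One deterministic inversion step from u_from up to u_to (u_to > u_from)."""
    e = eps_model(x, u_from, None)                            # unconditional prediction
    x0_pred = (x - np.sqrt(1.0 - alpha[u_from]) * e) / np.sqrt(alpha[u_from])
    return np.sqrt(alpha[u_to]) * x0_pred + np.sqrt(1.0 - alpha[u_to]) * e

def ddim_invert(x, start, end, stride, alpha, eps_model):
    """Invert from step `start` to step `end`, moving `stride` steps at a time."""
    u = start
    while u < end:
        nxt = min(u + stride, end)
        x = ddim_invert_step(x, u, nxt, alpha, eps_model)
        u = nxt
    return x

# Illustrative cumulative schedule with alpha[0] = 1 (clean image).
alpha = np.concatenate(([1.0], np.linspace(0.999, 0.01, 1000)))
x0 = np.array([0.5, -0.3, 0.8])                               # placeholder rendered image

s, t, stride = 200, 250, 50
x_s = ddim_invert(x0, 0, s, stride, alpha, toy_eps_model)     # x_0 -> x_s
x_t = ddim_invert(x_s, s, t, t - s, alpha, toy_eps_model)     # x_s -> x_t in one step
```

Because no random numbers are involved, repeating the call with the same inputs returns exactly the same `x_s` and `x_t`.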

4. Pseudo-Ground-Truth Inconsistency and ISM Limitations

Despite the determinism of DDIM inversion, ISM still suffers from two main sources of error:

Linearization Error: Each inversion step approximates $\epsilon_\phi(x_u, u) \approx \epsilon_\phi(x_{u-1}, u-1)$ , accumulating small deviations from the exact diffusion trajectory.
Target Drift: The “pseudo-ground-truth” $\epsilon_\phi(x_s, s, \emptyset)$ varies depending on the choice of $s$ and the path taken, as accumulated errors differ across inversion runs.

These errors manifest as local blurring or inconsistency in the resulting 3D asset, especially in high-detail or ambiguous regions. When discrepancies are large, the supervision signal becomes an average of incompatible pseudo-GTs, leading to the smoothing out of geometry or textures (Miao et al., 2024).
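One way to make the linearization error tangible is a round-trip check: invert $x_0$ up to step $t$, then run the DDIM sampling update back down to step 0 and measure the mismatch. The sketch below does this with a hypothetical `toy_eps_model` and an illustrative schedule, so the magnitudes say nothing about a real diffusion model; it only shows that the mismatch is nonzero and depends on the stride.

```python
import numpy as np

def toy_eps_model(x, t):
    """Hypothetical unconditional noise predictor (not a real model)."""
    return np.tanh(x) * (t / 1000.0)

def ddim_move(x, u_from, u_to, alpha):
    """Deterministic DDIM update in either direction.

    The noise prediction is always evaluated at the starting point (x, u_from),
    which is exact for sampling but only an approximation for inversion.
    """
    e = toy_eps_model(x, u_from)
    x0_pred = (x - np.sqrt(1.0 - alpha[u_from]) * e) / np.sqrt(alpha[u_from])
    return np.sqrt(alpha[u_to]) * x0_pred + np.sqrt(1.0 - alpha[u_to]) * e

def round_trip_error(x0, t, stride, alpha):
    """Invert 0 -> t, sample t -> 0, and return the max absolute mismatch."""
    steps = list(range(0, t + 1, stride))     # assumes t is a multiple of stride
    x = x0
    for a, b in zip(steps[:-1], steps[1:]):   # inversion (upward)
        x = ddim_move(x, a, b, alpha)
    down = steps[::-1]
    for a, b in zip(down[:-1], down[1:]):     # sampling (downward)
        x = ddim_move(x, a, b, alpha)
    return float(np.max(np.abs(x - x0)))

alpha = np.concatenate(([1.0], np.linspace(0.999, 0.01, 1000)))
x0 = np.array([0.5, -0.3, 0.8])

err_coarse = round_trip_error(x0, 200, 100, alpha)
err_fine = round_trip_error(x0, 200, 10, alpha)
print(err_coarse, err_fine)
```

In this toy setting the coarser stride accumulates a larger mismatch; whether the same holds to a meaningful degree for a given pretrained model is an empirical question.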

A summary of ISM's strengths and limitations compared to SDS is given below:

Method	Latent Generation	Score Supervision	Key Limitations
SDS	Random-noise DDPM	One-step, x₀ reconstruction	Over-smoothing, noisy pseudo-GTs
ISM	DDIM inversion	Interval, (x_s, x_t)	Drift in pseudo-GT, accumulated error

5. Generalization: Connection to Trajectory Score Matching (TSM)

Trajectory Score Matching (TSM) generalizes ISM by introducing an intermediate time $\mu \in (s, t)$. After inversion to $x_s$, two further inversion paths are taken from $x_s$: one to $x_\mu$ and one to $x_t$ (both lie at noisier timesteps than $s$, so these paths add noise rather than denoise). The TSM loss is:

$\mathcal{L}_\text{TSM}(\theta) = \mathbb{E}_{t,c}\bigl[\omega(t)\|\epsilon_\phi(x_t, t, y)-\epsilon_\phi(x_\mu, \mu, \emptyset)\|^2\bigr].$

ISM is the special case with $\mu = s$ . For any intermediate $\mu \in (s, t)$ , the accumulated drift between $x_\mu$ and $x_t$ is strictly smaller than the drift between $x_s$ and $x_t$ , reducing pseudo-ground-truth inconsistency and increasing stability. Consequently, TSM produces sharper and more consistent outputs when compared to ISM due to its reduced error propagation (Miao et al., 2024).
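The relationship between the two losses can be sketched as follows. The code assumes that both $x_\mu$ and $x_t$ are obtained from $x_s$ by a single DDIM inversion step each, which is a simplification for illustration; `toy_eps_model`, the schedule, and the placeholder latent are hypothetical.

```python
import numpy as np

def toy_eps_model(x, t, cond):
    """Hypothetical stand-in for the noise predictor eps_phi (not a real model)."""
    guidance = 0.0 if cond is None else 0.1
    return np.tanh(x) * (t / 1000.0) + guidance

def ddim_invert_step(x, u_from, u_to, alpha):
    """One deterministic inversion step using the unconditional prediction."""
    e = toy_eps_model(x, u_from, None)
    x0_pred = (x - np.sqrt(1.0 - alpha[u_from]) * e) / np.sqrt(alpha[u_from])
    return np.sqrt(alpha[u_to]) * x0_pred + np.sqrt(1.0 - alpha[u_to]) * e

def tsm_residual(x_s, s, mu, t, cond, alpha):
    """eps(x_t, t, y) - eps(x_mu, mu, None), with both latents reached from x_s."""
    x_t = ddim_invert_step(x_s, s, t, alpha)
    # mu == s means no extra path is needed: x_mu is x_s itself (the ISM case).
    x_mu = x_s if mu == s else ddim_invert_step(x_s, s, mu, alpha)
    return toy_eps_model(x_t, t, cond) - toy_eps_model(x_mu, mu, None)

alpha = np.concatenate(([1.0], np.linspace(0.999, 0.01, 1000)))
x_s = np.array([0.4, -0.2, 0.6])              # placeholder latent at step s
s, mu, t = 200, 225, 250

r_tsm = tsm_residual(x_s, s, mu, t, "a prompt", alpha)
r_ism = tsm_residual(x_s, s, s, t, "a prompt", alpha)   # mu = s recovers ISM
```

Setting `mu = s` reproduces the ISM residual exactly, while an intermediate `mu` changes only the unconditional pseudo-ground-truth term.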

6. Training Algorithm and Practical Implementation

The ISM training procedure for text-to-3D distillation proceeds as follows (paraphrased version, omitting constants):

Sample a camera $c$ and render $x_0$ from current 3D parameters $\theta$ .
Sample $t$ uniformly and set $s = t - \delta_T$ .
Perform accelerated DDIM inversion with stride $\delta_S$ to obtain $x_s$ from $x_0$ .
Continue inversion one more step of size $\delta_T$ to get $x_t$ from $x_s$ .
Evaluate conditional ( $\epsilon_\phi(x_t, t, y)$ ) and unconditional ( $\epsilon_\phi(x_s, s, \emptyset)$ ) scores.
Compute the ISM gradient:

$g = \omega(t)\bigl(\epsilon_\phi(x_t, t, y) - \epsilon_\phi(x_s, s, \emptyset)\bigr)\frac{\partial \mathrm{render}(\theta, c)}{\partial \theta}$

Update parameters: $\theta \leftarrow \theta - \eta g$ .

This procedure is agnostic to the type of 3D representation (e.g., NeRF, 3D Gaussian Splatting), and hyperparameters $(\delta_T, \delta_S)$ can be tuned to trade off speed, sharpness, and stability (Liang et al., 2023).
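The steps above can be sketched as a toy optimization loop. Everything here is an assumption made for illustration: the "renderer" is a random linear map per camera, `toy_eps_model` replaces the pretrained diffusion model, $\omega(t) = 1$, the latents are treated as constants when forming the gradient, and the hyperparameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.concatenate(([1.0], np.linspace(0.999, 0.01, 1000)))  # illustrative schedule

def toy_eps_model(x, t, cond):
    """Hypothetical stand-in for the noise predictor eps_phi (not a real model)."""
    guidance = 0.0 if cond is None else 0.1
    return np.tanh(x) * (t / 1000.0) + guidance

def invert(x, start, end, stride):
    """Strided deterministic DDIM inversion from step `start` to step `end`."""
    u = start
    while u < end:
        nxt = min(u + stride, end)
        e = toy_eps_model(x, u, None)
        x0_pred = (x - np.sqrt(1.0 - alpha[u]) * e) / np.sqrt(alpha[u])
        x = np.sqrt(alpha[nxt]) * x0_pred + np.sqrt(1.0 - alpha[nxt]) * e
        u = nxt
    return x

# Assumed linear renderers: one matrix per camera, x0 = A @ theta.
cameras = [rng.standard_normal((6, 3)) for _ in range(4)]
theta = rng.standard_normal(3)
theta_init = theta.copy()
delta_T, delta_S, lr = 50, 200, 0.01

for step in range(20):
    A = cameras[int(rng.integers(len(cameras)))]   # sample a camera
    x0 = A @ theta                                 # render
    t = int(rng.integers(delta_T + 1, 981))        # sample t so that s > 0
    s = t - delta_T
    x_s = invert(x0, 0, s, delta_S)                # accelerated inversion to x_s
    x_t = invert(x_s, s, t, delta_T)               # one more step to x_t
    residual = toy_eps_model(x_t, t, "a prompt") - toy_eps_model(x_s, s, None)
    g = A.T @ residual                             # ISM gradient with omega(t) = 1
    theta = theta - lr * g                         # gradient step
```

Swapping the linear renderer for a differentiable NeRF or 3D Gaussian Splatting renderer would change only the rendering line and the way the residual is pushed back to $\theta$.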

7. Empirical Results and Impact

Experiments with LucidDreamer (ISM + 3D Gaussian Splatting) demonstrate notable improvements over SDS-based methods in terms of geometric accuracy, detail preservation, training efficiency, and user preference. Specifically:

Qualitatively, ISM distills fine geometry (e.g., hair strands, clothing folds) where SDS and variants produce over-smoothed models.
User studies report LucidDreamer (ISM) as the most preferred method, with an average ranking of 1.25 (lower is better) compared to DreamFusion (3.28), Magic3D (3.44), ProlificDreamer (2.37), and others.
ISM achieves faster convergence (e.g., ∼5 hours on A100 vs. 10–15 hours for SDS-based pipelines at equal batch size and settings).
Larger inversion strides $\delta_S$ speed up inversion with negligible impact on fidelity; varying $\delta_T$ alters the trade-off between local detail and global structure.

ISM's improvements validate deterministic inversion and interval-based supervision as mechanisms for robust 3D distillation from 2D diffusion priors. Successors such as TSM further ameliorate pseudo-ground-truth drift and yield enhanced stability (Liang et al., 2023, Miao et al., 2024).

ISM’s theoretical and practical advances over SDS are foundational in the current landscape of score-based text-to-3D model distillation, providing a template for further innovation in trajectory-level score supervision and robust 3D synthesis pipelines.


References

Liang et al. (2023). LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching.

Miao et al. (2024). Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching.
