
Splannequin: Frozen 3D Scene Synthesis

Updated 5 December 2025
  • Splannequin is a dual-detection regularization framework that synthesizes frozen 3D scenes from monocular Mannequin-Challenge videos by addressing ghosting and blur artifacts.
  • It detects hidden and defective 3D Gaussian states and applies temporally anchored regularization to ensure robust, artifact-free freeze-time rendering.
  • The method integrates seamlessly into dynamic Gaussian splatting pipelines with zero inference overhead, demonstrating significant improvements in IQA metrics and user preferences.

Splannequin is a regularization framework designed for synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos, allowing user-controlled selection of “frozen” timestamps with minimal artifacts. Unlike standard dynamic scene reconstruction methods that focus on accurate motion modeling, Splannequin explicitly addresses the “freeze-time” challenge: rendering artifact-free static scenes from dynamic, sparsely supervised monocular sequences, where common approaches suffer from ghosting and blur due to ill-supervised 3D primitives. Splannequin introduces a dual-detection anchoring method that detects two ill-posed states of 3D Gaussians and applies temporally anchored regularization, integrating seamlessly into any dynamic Gaussian splatting pipeline with zero inference overhead (Chien et al., 4 Dec 2025).

1. Problem Setup: Freezing Monocular Mannequin-Challenge Footage

MC videos consist of casual, single-camera recordings in which actors strive to remain stationary, though slight micro-motions often persist. The task is, given $N$ training images $\{(I_n, R_n, b_n, t_n)\}$, where $I_n$ is the input image, $R_n$ the camera matrix, $b_n$ any additional metadata, and $t_n$ the timestamp, to generate bullet-time sequences $\{\hat I(R, b, t^\star)\}$ at arbitrary, user-selected timestamps $t^\star$. A core difficulty arises because dynamic 3D scene reconstruction approaches are inherently trained along the (typically) diagonal trajectory formed by the camera path in space–time, while "freeze-time" rendering queries require slicing the learned representation horizontally at $t = t^\star$. As a result, many Gaussians are never or only weakly supervised at the desired $t^\star$, producing ghosting and blur artifacts in conventional methods (Chien et al., 4 Dec 2025).
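The supervision gap can be made concrete with a toy computation (all numbers here are invented for illustration, not from the paper): training samples lie along a diagonal camera-time trajectory, so only frames whose timestamps fall within a narrow band around a chosen $t^\star$ directly supervise the frozen slice.

```python
# Toy illustration (invented numbers): training samples form a diagonal path in
# (camera position, time), so only frames whose timestamps land near a chosen
# t* provide direct supervision for the frozen slice t = t*.
N = 100
train_samples = [(n / (N - 1), n / (N - 1)) for n in range(N)]  # diagonal path

t_star, window = 0.5, 0.05   # hypothetical supervision band around t*
supervising = [s for s in train_samples if abs(s[1] - t_star) <= window]
print(f"{len(supervising)}/{N} training frames fall near t* = {t_star}")
# 10/100 training frames fall near t* = 0.5
```

With 100 frames, only about a tenth of them land near any particular $t^\star$; every Gaussian whose appearance depends on the remaining frames is at risk of drifting there.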

2. Dynamic Gaussian Splatting for Video and Naive Freeze-Time Rendering

Dynamic Gaussian splatting models scenes with a set of $K$ canonical 3D Gaussians $\{G_k\}$, each parameterized by a static mean $\boldsymbol{\mu}_k$ and covariance $\Sigma_k$. Temporal variation is captured using a deformation MLP $f_\theta$, such that for any time $t$:

$(\Delta\mu_{k,t}, \Delta\Sigma_{k,t}) = f_\theta(\mu_k, t)$

and the time-dependent primitive is

$G_k(t) = (\mu_k + \Delta\mu_{k,t},\ \Sigma_k + \Delta\Sigma_{k,t})$

Training minimizes a photometric reconstruction loss,

$\mathcal L_{\rm recon} = \sum_{n=1}^N \ell\bigl(\hat I(R_n, b_n, t_n), I_n\bigr)$

where $\hat I$ is rendered via differentiable rasterization of the current set of deformed Gaussians at $t_n$. At inference, naive freeze-time rendering substitutes $t = t^\star$; however, since many Gaussians are either unobserved or poorly supervised at $t^\star$ (due to occlusions, frustum exclusion, or camera path sparsity), this leads to artifacts such as floating, blurred, or ghost blobs (Chien et al., 4 Dec 2025).
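The deformation query above can be sketched with a toy MLP; the names `DeformationMLP` and `freeze_time_params` are illustrative, not the paper's API, and the 6-parameter covariance encoding is an assumption.

```python
# Minimal sketch of querying a dynamic Gaussian splatting model at a frozen
# time t*. Names and the 6-dim covariance parameterization are illustrative.
import torch
import torch.nn as nn

class DeformationMLP(nn.Module):
    """Maps (canonical mean, time) to offsets (delta mu, delta covariance)."""
    def __init__(self, hidden=64, cov_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + cov_dim),   # 3 for delta mu, 6 for delta Sigma
        )

    def forward(self, mu, t):
        t_col = torch.full((mu.shape[0], 1), float(t))
        out = self.net(torch.cat([mu, t_col], dim=1))
        return out[:, :3], out[:, 3:]         # (delta mu_{k,t}, delta Sigma_{k,t})

def freeze_time_params(mu, cov, f_theta, t_star):
    """Deform every canonical Gaussian to the user-chosen frozen time t*."""
    d_mu, d_cov = f_theta(mu, t_star)
    return mu + d_mu, cov + d_cov

K = 100
mu = torch.randn(K, 3)                # canonical means
cov = torch.randn(K, 6)               # e.g. 6 free parameters of Sigma_k
mu_star, cov_star = freeze_time_params(mu, cov, DeformationMLP(), t_star=0.5)
print(mu_star.shape, cov_star.shape)  # torch.Size([100, 3]) torch.Size([100, 6])
```

Rasterizing the deformed set $(\mu_k + \Delta\mu_{k,t^\star},\ \Sigma_k + \Delta\Sigma_{k,t^\star})$ is exactly the naive freeze-time render the paper critiques.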

3. Ill-Supervised Gaussian Detection: Hidden and Defective States

Splannequin introduces automatic detection of two ill-supervised Gaussian states at each training time $t$:

  • Hidden State: A Gaussian $G_k(t)$ is “hidden” if its projected center is outside the camera frustum at $t$. Formally,

$s_{\rm hidden}(k,t) = \begin{cases} 1, & \text{if the projected center of } G_k(t) \text{ is outside the frustum} \\ 0, & \text{otherwise} \end{cases}$

  • Defective State: A Gaussian $G_k(t)$ is “defective” if it is inside the frustum but receives negligible photometric supervision, measured by the per-primitive gradient norm:

$s_{\rm defective}(k,t) = \begin{cases} 1, & \|\partial \mathcal L_{\rm recon} / \partial \psi_k(t)\| \leq \epsilon \ (\epsilon \approx 10^{-9}) \\ 0, & \text{otherwise} \end{cases}$

A Gaussian is “well-supervised” only if both indicators are zero. These detector criteria allow systematic identification of Gaussians prone to drift at freeze-time surfaces (Chien et al., 4 Dec 2025).
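The two detectors can be sketched as follows. The clip-space frustum test is an assumed convention (the paper does not specify one), and the gradient norms are taken as given rather than computed from a real rendering loss.

```python
# Sketch of the two detectors (assumed conventions, not the paper's code):
# "hidden" tests whether a Gaussian center projects outside the camera frustum;
# "defective" thresholds the per-primitive gradient norm of the photometric loss.
import torch

def hidden_state(mu_t, world_to_clip):
    """s_hidden(k, t): 1 if the projected center is outside the frustum."""
    homo = torch.cat([mu_t, torch.ones(mu_t.shape[0], 1)], dim=1)  # (K, 4)
    clip = homo @ world_to_clip.T
    ndc = clip[:, :3] / clip[:, 3:4].clamp(min=1e-8)
    inside = (ndc.abs() <= 1.0).all(dim=1) & (clip[:, 3] > 0)
    return (~inside).float()

def defective_state(grad_norms, eps=1e-9):
    """s_defective(k, t): 1 if the gradient norm is negligible (<= eps)."""
    return (grad_norms <= eps).float()

mu_t = torch.tensor([[0.0, 0.0, 0.5],    # projects inside the toy frustum
                     [0.0, 0.0, -2.0]])  # projects outside -> hidden
world_to_clip = torch.eye(4)             # toy projection for illustration
s_h = hidden_state(mu_t, world_to_clip)
s_d = defective_state(torch.tensor([1e-3, 0.0]))
well_supervised = (s_h == 0) & (s_d == 0)
print(s_h.tolist(), s_d.tolist(), well_supervised.tolist())
# [0.0, 1.0] [0.0, 1.0] [True, False]
```

Only the first Gaussian passes both tests; the second would be flagged for anchoring.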

4. Temporally Anchored Regularization Loss

When hidden or defective Gaussians are detected, Splannequin applies a temporally anchored regularization. Let $\psi_k(t)$ denote the full parameter vector of Gaussian $k$ at time $t$ (position, covariance, opacity, and spherical harmonic coefficients). For a specified discrepancy measure $\mathcal D$ (L1 or L2 distance variants),

$\mathcal D(\psi_k(t), \psi_k(t_{\rm ref})) = \begin{cases} \|\psi_k(t) - \psi_k(t_{\rm ref})\|_1 & \text{(L1)} \\ \|\psi_k(t) - \psi_k(t_{\rm ref})\|_2^2 & \text{(L2)} \end{cases}$

with a confidence weight,

$\phi(t, t_{\rm ref}) = \exp(-\tau |t - t_{\rm ref}|), \quad \tau > 0$

the anchored losses are constructed as follows:

  • Hidden-Gaussian Anchoring (for $s_{\rm hidden}(k,t) = 1$): Sample a reference time $t_{\rm ref} < t$ at which the Gaussian is well-supervised, applying

$\mathcal L_{\rm hidden}^{(k,t)} = \phi(t, t_{\rm ref})\, \mathcal D\bigl(\psi_k(t), \psi_k(t_{\rm ref})\bigr)$

  • Defective-Gaussian Anchoring (for $s_{\rm defective}(k,t) = 1$): Sample a well-supervised reference time $t_{\rm ref} > t$ and regularize similarly,

$\mathcal L_{\rm defective}^{(k,t)} = \phi(t, t_{\rm ref})\, \mathcal D\bigl(\psi_k(t), \psi_k(t_{\rm ref})\bigr)$

  • Full Objective: The combined training loss is

$\mathcal L = \mathcal L_{\rm recon} + \lambda_{\rm hidden} \sum_{k,t} \mathcal L_{\rm hidden}^{(k,t)} + \lambda_{\rm defective} \sum_{k,t} \mathcal L_{\rm defective}^{(k,t)}$

with $\lambda_{\rm hidden} = \lambda_{\rm defective} = 10$ and confidence decay $\tau = 5$ used in practice.

This approach effectively anchors drift-prone Gaussians to nearby, well-supervised past (hidden) or future (defective) reference states, limiting their freedom to introduce artifacts at poorly supervised timestamps (Chien et al., 4 Dec 2025).
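The anchored losses can be sketched directly from the formulas above, using the stated $\lambda_{\rm hidden} = \lambda_{\rm defective} = 10$ and $\tau = 5$; the tensor shapes and the placeholder photometric loss value are illustrative.

```python
# Sketch of the anchored regularization: discrepancy D (L1 or squared L2),
# confidence weight phi with tau = 5, and the combined objective with
# lambda_hidden = lambda_defective = 10. Shapes and the placeholder
# photometric loss value are illustrative.
import torch

def discrepancy(psi_t, psi_ref, mode="l2"):
    """D(psi_k(t), psi_k(t_ref)) per Gaussian: L1 norm or squared L2 norm."""
    diff = psi_t - psi_ref
    return diff.abs().sum(dim=-1) if mode == "l1" else (diff ** 2).sum(dim=-1)

def anchored_loss(psi_t, psi_ref, t, t_ref, tau=5.0, mode="l2"):
    """phi(t, t_ref) * D(...), summed over the detected Gaussians."""
    phi = torch.exp(-tau * (t - t_ref).abs())
    return (phi * discrepancy(psi_t, psi_ref, mode)).sum()

K, P = 4, 10                       # 4 ill-supervised Gaussians, 10-dim psi
psi_t = torch.randn(K, P)
psi_ref = psi_t + 0.1              # nearby well-supervised reference state
t = torch.full((K,), 0.6)
t_ref = torch.full((K,), 0.5)

l_recon = torch.tensor(0.123)      # placeholder for the photometric loss
lam = 10.0
total = (l_recon
         + lam * anchored_loss(psi_t, psi_ref, t, t_ref, mode="l2")
         + lam * anchored_loss(psi_t, psi_ref, t, t_ref, mode="l1"))
print(round(total.item(), 2))
```

Because $|t - t_{\rm ref}| = 0.1$ here, every term is discounted by $\phi = e^{-0.5} \approx 0.61$; an anchor further away would contribute proportionally less.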

5. Integration and Implementation Details

Splannequin is architecture-agnostic and incurs zero inference overhead. The dual-detection loss terms are added directly to existing dynamic Gaussian splatting systems, requiring no change to network structure or rendering procedures. During training, at each iteration:

  • Two random (view, $t$) pairs are sampled.
  • Each Gaussian is classified as hidden/defective/well-supervised.
  • For each ill-supervised primitive, a suitable reference time $t_{\rm ref}$ is sampled and the anchor loss is computed.

At inference, “freeze-time” rendering remains unchanged: the original dynamic model is queried at $t = t^\star$, yielding high throughput (e.g., 280 FPS on an RTX 4090). The training framework is based on PyTorch and standard progressive densification, with 30,000 iterations per scene. Anchored regularization begins at iteration 10,000 with the L2 distance and switches to the L1 distance at iteration 20,000; anchor losses are computed every 10 iterations using randomly sampled anchors (Chien et al., 4 Dec 2025).
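The schedule above can be sketched as a small helper; the function name and return convention are illustrative, not from the paper's code.

```python
# Sketch of the stated training schedule: anchored regularization starts at
# iteration 10,000 using the L2 discrepancy, switches to L1 at 20,000, and is
# evaluated every 10 iterations. Name and return shape are illustrative.
def anchor_config(iteration, start=10_000, switch=20_000, every=10):
    """Return (apply_anchor_loss, discrepancy_mode) for this iteration."""
    if iteration < start or iteration % every != 0:
        return False, None
    return True, "l2" if iteration < switch else "l1"

print(anchor_config(5_000))    # (False, None): before anchoring begins
print(anchor_config(12_340))   # (True, 'l2')
print(anchor_config(12_345))   # (False, None): not a multiple of 10
print(anchor_config(25_000))   # (True, 'l1')
```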

6. Experimental Evaluation and Quantitative Gains

Benchmarks:

  • Real-world: 10 MC-style videos (2,869 input frames, 361 freeze-time clips, 640×360 resolution, <10% consistent visibility).
  • Synthetic: 10 Blender scenes (2,400 frames, 300 freeze-time renders, perfect ground-truth).

Compared baselines: 4DGaussians, D-3DGS, SC-GS (with matched hyperparameters).

Metrics:

  • Reference (synthetic): PSNR (↑), SSIM (↑), LPIPS (↓), FVD (↓).
  • No-reference (real): CQA (composition), TOPIQ-NR, CLIP-IQA, MUSIQ, HyperIQA, and COVER (semantic, technical, aesthetic, overall).

Results:

  • PSNR (↑): 28.03 → 28.85 (baseline → +Splannequin)
  • SSIM (↑): 0.81 → 0.83
  • LPIPS (↓): 0.09 → 0.08
  • FVD (↓): 98.93 → 82.73
  • CQA: +26.4%
  • COVER (overall): +6.6% (aesthetic +95.6%)
  • COVER (technical), D-3DGS + Splannequin: +339.9%

In real-world evaluation, Splannequin significantly improves all IQA metrics and dramatically reduces ghosting and blur. A user study with 23 participants reported 96% preference for Splannequin-rendered clips for visual appeal and 80% preference for “more perfectly frozen” scenes. Ablations show severe degradation if either loss term is removed: removing the hidden loss drops COVER overall by 1072% and CQA by 162%, and removing the defective loss drops COVER by 1027% and CQA by 779%. Omitting the confidence distance weighting results in over-smoothed frames (Chien et al., 4 Dec 2025).

7. Significance, Limitations, and Future Directions

Splannequin demonstrates that dual-state anchored regularization for dynamic Gaussian splatting robustly mitigates freeze-time artifacts in monocular MC videos, enabling artifact-free, high-fidelity, user-selectable time slices at arbitrary points, with no architectural or inference penalties. These results extend the practical utility of dynamic Gaussian pipelines to the MC “freezing” regime, previously a source of substantive ghosting artifacts and fidelity loss. The approach’s simplicity and compatibility with any dynamic-GS method suggest broad applicability.

A plausible implication is that future work could target more challenging scenarios with even sparser temporal supervision, or extend the dual-detection regularization approach to other temporal 3D representations beyond Gaussians. The strong quantitative and user-study gains motivate further exploration of adaptive or semantic-guided anchoring, as well as more refined measures of per-Gaussian supervision.

For comprehensive implementation details, architecture-agnostic integration steps, and source code, see the project page: https://chien90190.github.io/splannequin/ (Chien et al., 4 Dec 2025).
