Splannequin: Frozen 3D Scene Synthesis
- Splannequin is a dual-detection regularization framework that synthesizes frozen 3D scenes from monocular Mannequin-Challenge videos by addressing ghosting and blur artifacts.
- It detects hidden and defective 3D Gaussian states and applies temporally anchored regularization to ensure robust, artifact-free freeze-time rendering.
- The method integrates seamlessly into dynamic Gaussian splatting pipelines with zero inference overhead, demonstrating significant improvements in IQA metrics and user preferences.
Splannequin is a regularization framework designed for synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos, allowing user-controlled selection of “frozen” timestamps with minimal artifacts. Unlike standard dynamic scene reconstruction methods that focus on accurate motion modeling, Splannequin explicitly addresses the “freeze-time” challenge: rendering artifact-free static scenes from dynamic, sparsely supervised monocular sequences, where common approaches suffer from ghosting and blur due to ill-supervised 3D primitives. Splannequin introduces a dual-detection anchoring method that detects two ill-posed states of 3D Gaussians and applies temporally anchored regularization, integrating seamlessly into any dynamic Gaussian splatting pipeline with zero inference overhead (Chien et al., 4 Dec 2025).
1. Problem Setup: Freezing Monocular Mannequin-Challenge Footage
MC videos consist of casual, single-camera recordings in which actors strive to remain stationary, though slight micro-motions often persist. The task is, given training tuples $\{(I_i, P_i, m_i, t_i)\}_{i=1}^{N}$ — where $I_i$ is the input image, $P_i$ the camera matrix, $m_i$ any additional metadata, and $t_i$ the timestamp — to generate bullet-time sequences at arbitrary, user-selected timestamps $t^*$. A core difficulty arises because dynamic 3D scene reconstruction approaches are inherently trained along the (typically) diagonal trajectory formed by the camera path in space–time, while “freeze-time” rendering queries require slicing the learned representation horizontally at $t = t^*$. As a result, many Gaussians are never or only weakly supervised at the desired $t^*$, producing ghosting and blur artifacts in conventional methods (Chien et al., 4 Dec 2025).
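The diagonal-versus-horizontal distinction can be made concrete with a toy example (view indices and timestamps below are hypothetical, chosen only for illustration): training supervises each view only at its own capture time, while a freeze-time render queries every view at one shared timestamp.

```python
# The camera path traces a diagonal in (view, time) space: each training
# view i is supervised only at its own capture timestamp t_i. Freeze-time
# rendering instead queries every view at one user-chosen timestamp t_star.
frames = [(i, i / 10.0) for i in range(5)]          # (view index, timestamp t_i)
t_star = 0.2                                        # user-selected freeze time
freeze_queries = [(i, t_star) for i, _ in frames]   # horizontal slice at t_star
# Only the view captured exactly at t_star ever received direct supervision there.
supervised_views = [i for i, t in frames if abs(t - t_star) < 1e-9]
```

All other views must extrapolate their learned representation to `t_star`, which is precisely where ill-supervised Gaussians produce ghosting.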
2. Dynamic Gaussian Splatting for Video and Naive Freeze-Time Rendering
Dynamic Gaussian splatting models scenes with a set of canonical 3D Gaussians $\{G_k\}_{k=1}^{K}$, each parameterized by a static mean $\mu_k$ and covariance $\Sigma_k$. Temporal variation is captured using a deformation MLP $F_\theta$, such that for any time $t$:

$$(\Delta\mu_k(t),\ \Delta\Sigma_k(t)) = F_\theta(\mu_k, t),$$

and the time-dependent primitive is

$$G_k(t) = \big(\mu_k + \Delta\mu_k(t),\ \Sigma_k + \Delta\Sigma_k(t)\big).$$

Training minimizes a photometric reconstruction loss,

$$\mathcal{L}_{\mathrm{photo}} = \sum_i \big\| \hat{I}(P_i, t_i) - I_i \big\|_1,$$

where $\hat{I}(P_i, t_i)$ is rendered via differentiable rasterization of the current set of deformed Gaussians at $t_i$. At inference, naive freeze-time rendering substitutes $t = t^*$; however, since many Gaussians are either unobserved or poorly supervised at $t^*$ (due to occlusions, frustum exclusion, or camera-path sparsity), this leads to artifacts such as floating, blurred, or ghost blobs (Chien et al., 4 Dec 2025).
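A minimal sketch of the naive freeze-time query, using a hypothetical closed-form stand-in for the deformation MLP (the real pipeline uses a learned network and differentiable rasterization):

```python
import math

def deform(mu, t):
    # Toy stand-in for the deformation MLP F_theta: returns a smooth,
    # time-dependent offset of the canonical mean (purely illustrative).
    return [m + 0.1 * math.sin(t + i) for i, m in enumerate(mu)]

def render_at_time(canonical_means, t):
    # Naive freeze-time query: every canonical Gaussian is deformed to the
    # same user-selected timestamp t, whether or not it was supervised there.
    return [deform(mu, t) for mu in canonical_means]

canonical = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
frozen = render_at_time(canonical, t=0.5)
```

The failure mode described above is structural: nothing in this query distinguishes well-supervised Gaussians from those whose deformation at `t` was never constrained by any training image.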
3. Ill-Supervised Gaussian Detection: Hidden and Defective States
Splannequin introduces automatic detection of two ill-supervised Gaussian states at each training time $t$:
- Hidden State: A Gaussian is “hidden” if its projected center lies outside the camera frustum at time $t$. Formally,

$$s_{\mathrm{hidden}}(k,t) = \begin{cases} 1, & \text{if the projected center of } G_k(t) \text{ is outside the frustum} \\ 0, & \text{otherwise.} \end{cases}$$
- Defective State: A Gaussian is “defective” if it lies inside the frustum but receives negligible photometric supervision, as measured by the per-primitive gradient norm:

$$s_{\mathrm{defective}}(k,t) = \begin{cases} 1, & \text{if } s_{\mathrm{hidden}}(k,t) = 0 \text{ and } \big\|\nabla_{G_k}\mathcal{L}_{\mathrm{photo}}\big\| < \epsilon \\ 0, & \text{otherwise,} \end{cases}$$

for a small threshold $\epsilon$.
A Gaussian is “well-supervised” only if both indicators are zero. These criteria allow systematic identification of Gaussians prone to drift at freeze-time queries (Chien et al., 4 Dec 2025).
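The dual detector can be sketched as a simple classifier. This is a simplified illustration: the frustum test below only checks 2D image bounds (a full test would also check near/far clipping planes), and the gradient threshold `eps` is an assumed value, not one from the paper.

```python
def classify_gaussian(center_px, grad_norm, width, height, eps=1e-4):
    # Hidden: projected center falls outside the image bounds at time t
    # (simplified stand-in for a full frustum test).
    x, y = center_px
    if not (0.0 <= x < width and 0.0 <= y < height):
        return "hidden"
    # Defective: visible, but the per-primitive photometric gradient is
    # negligible; eps is an illustrative threshold.
    if grad_norm < eps:
        return "defective"
    return "well-supervised"
```

Only primitives classified as hidden or defective receive the anchored regularization described next; well-supervised ones are left to the photometric loss.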
4. Temporally Anchored Regularization Loss
When hidden or defective Gaussians are detected, Splannequin applies a temporally anchored regularization. Let $\Theta_k(t)$ denote the full parameter vector of Gaussian $k$ at time $t$ (position, covariance, opacity, and spherical-harmonic coefficients). For a discrepancy measure $d(\cdot,\cdot)$ (L1 or L2 distance variants), with a confidence weight $w(t, t_{\mathrm{ref}})$ that decays with the temporal distance $|t - t_{\mathrm{ref}}|$, the anchored losses are constructed as follows:
- Hidden-Gaussian Anchoring (for $s_{\mathrm{hidden}}(k,t) = 1$): Sample a past reference time $t_{\mathrm{ref}} < t$ at which the Gaussian is well-supervised, applying

$$\mathcal{L}_{\mathrm{hidden}} = w(t, t_{\mathrm{ref}})\, d\big(\Theta_k(t), \Theta_k(t_{\mathrm{ref}})\big).$$

- Defective-Gaussian Anchoring (for $s_{\mathrm{defective}}(k,t) = 1$): Sample a well-supervised future reference time $t_{\mathrm{ref}} > t$, and regularize similarly,

$$\mathcal{L}_{\mathrm{defective}} = w(t, t_{\mathrm{ref}})\, d\big(\Theta_k(t), \Theta_k(t_{\mathrm{ref}})\big).$$

- Full Objective: The combined training loss is

$$\mathcal{L} = \mathcal{L}_{\mathrm{photo}} + \lambda_{\mathrm{hidden}}\,\mathcal{L}_{\mathrm{hidden}} + \lambda_{\mathrm{defective}}\,\mathcal{L}_{\mathrm{defective}},$$

with the loss weights and confidence-decay rate fixed to the values used in practice.
This approach effectively anchors drift-prone Gaussians to nearby, well-supervised past (hidden) or future (defective) reference states, limiting their freedom to introduce artifacts at poorly supervised timestamps (Chien et al., 4 Dec 2025).
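A minimal sketch of one anchored-loss term. The exponential form of the confidence weight and its decay rate are illustrative assumptions here, not the paper's exact choice; the L1/L2 switch mirrors the distance variants described above.

```python
import math

def anchor_loss(theta_t, theta_ref, t, t_ref, decay=1.0, use_l1=True):
    # Confidence weight: decays with the temporal distance between the query
    # time t and the well-supervised reference time t_ref (assumed
    # exponential form for illustration).
    w = math.exp(-decay * abs(t - t_ref))
    # Discrepancy d between the Gaussian's parameter vectors at t and t_ref.
    if use_l1:
        d = sum(abs(a - b) for a, b in zip(theta_t, theta_ref))
    else:
        d = sum((a - b) ** 2 for a, b in zip(theta_t, theta_ref))
    return w * d
```

The decaying weight means a drift-prone Gaussian is pulled most strongly toward reference states that are temporally close, which is what keeps the anchoring from over-constraining genuine motion.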
5. Integration and Implementation Details
Splannequin is architecture-agnostic and incurs zero inference overhead. The dual-detection loss terms are added directly to existing dynamic Gaussian splatting systems, requiring no change to network structure or rendering procedures. During training, at each iteration:
- Two random (view, timestamp) pairs are sampled.
- Each Gaussian is classified as hidden/defective/well-supervised.
- For each ill-supervised primitive, a suitable reference time is sampled and the anchor loss computed.
At inference, “freeze-time” rendering remains unchanged: the original dynamic model is queried at $t = t^*$, yielding high throughput (e.g., 280 FPS on an RTX 4090). The training framework is based on PyTorch with standard progressive densification, using 30,000 iterations per scene. Anchored regularization begins at iteration 10,000 with the L2 distance and switches to the L1 distance at iteration 20,000; the anchor losses are computed every 10 iterations using randomly sampled anchors (Chien et al., 4 Dec 2025).
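The reported schedule can be sketched as a small helper (the function name and `None`-for-skip convention are assumptions for illustration):

```python
def anchor_distance(iteration):
    # Mirrors the reported schedule: no anchoring before iteration 10,000;
    # L2 anchoring from 10,000; switch to L1 at 20,000. The anchor loss is
    # only evaluated on every 10th iteration.
    if iteration < 10_000 or iteration % 10 != 0:
        return None
    return "l2" if iteration < 20_000 else "l1"
```

A training loop would call this each iteration and skip the anchor terms whenever it returns `None`, so the added cost stays a small fraction of total training time.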
6. Experimental Evaluation and Quantitative Gains
Benchmarks:
- Real-world: 10 MC-style videos (2,869 input frames, 361 freeze-time clips, 640×360 resolution, <10% consistent visibility).
- Synthetic: 10 Blender scenes (2,400 frames, 300 freeze-time renders, perfect ground-truth).
Compared baselines: 4DGaussians, D-3DGS, SC-GS (with matched hyperparameters).
Metrics:
- Reference (synthetic): PSNR (↑), SSIM (↑), LPIPS (↓), FVD (↓).
- No-reference (real): CQA (composition), TOPIQ-NR, CLIP-IQA, MUSIQ, HyperIQA, and COVER (semantic, technical, aesthetic, overall).
Results:
| Metric | Baseline | +Splannequin |
|---|---|---|
| PSNR | 28.03 | 28.85 |
| SSIM | 0.81 | 0.83 |
| LPIPS | 0.09 | 0.08 |
| FVD | 98.93 | 82.73 |
| CQA | — | +26.4% |
| COVER (overall) | — | +6.6% (aesthetic +95.6%) |
| D-3DGS+Splannequin COVER (tech) | — | +339.9% |
In real-world evaluation, Splannequin significantly improves all IQA metrics and dramatically reduces ghosting and blur. A user study with 23 participants reported 96% preference for Splannequin-rendered clips for visual appeal and 80% preference for “more perfectly frozen” scenes. Ablations show severe degradation if either detection-loss term is removed: removing the hidden loss drops COVER overall by 1072% and CQA by 162%, while removing the defective loss drops COVER by 1027% and CQA by 779%. Omitting the confidence distance weighting produces over-smoothed frames (Chien et al., 4 Dec 2025).
7. Significance, Limitations, and Future Directions
Splannequin demonstrates that dual-state anchored regularization for dynamic Gaussian splatting robustly mitigates freeze-time artifacts in monocular MC videos, enabling artifact-free, high-fidelity, user-selectable time slices at arbitrary points, with no architectural or inference penalties. These results extend the practical utility of dynamic Gaussian pipelines to the MC “freezing” regime, previously a source of substantive ghosting artifacts and fidelity loss. The approach’s simplicity and compatibility with any dynamic-GS method suggest broad applicability.
A plausible implication is that future work could target more challenging scenarios with even sparser temporal supervision, or extend the dual-detection regularization approach to other temporal 3D representations beyond Gaussians. The strong quantitative and user-study gains motivate further exploration of adaptive or semantic-guided anchoring, as well as more refined measures of per-Gaussian supervision.
For comprehensive implementation details, architecture-agnostic integration steps, and source code, see the project page: https://chien90190.github.io/splannequin/ (Chien et al., 4 Dec 2025).