- The paper introduces SplatWeaver, a framework that adaptively allocates Gaussian primitives using cardinality experts to capture scene complexity effectively.
- The technique leverages a high-frequency prior from wavelet transforms and neighbor-conditioned predictions to ensure geometric consistency and detailed rendering.
- Experiments show up to 70% fewer Gaussians than AnySplat alongside higher PSNR and SSIM, demonstrating substantial efficiency gains across diverse novel view synthesis benchmarks.
SplatWeaver: Adaptive Allocation of Gaussian Primitives for Generalizable Novel View Synthesis
Introduction
The paper "SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis" (2605.07287) addresses a key limitation of current generalizable 3D Gaussian Splatting (3DGS) methods: they typically distribute Gaussian primitives uniformly or in fixed numbers across image pixels or voxels. Such allocation ignores spatial variation in scene complexity, leaving redundant primitives in homogeneous regions and too little capacity in regions with intricate geometry or texture. SplatWeaver instead allocates Gaussian primitives adaptively in a feed-forward, complexity-aware manner, producing compact yet expressive 3D scene representations without per-scene optimization.
Methodology
Cardinality Gaussian Experts and Adaptive Allocation
SplatWeaver's central contribution is a set of cardinality Gaussian experts. Each expert is specialized to predict a fixed number of Gaussian primitives, from 0 up to M, so there are M + 1 experts; the default M = 3 balances allocation expressivity against routing complexity. Allocation operates at the pixel level: a router assigns each pixel's features to the most appropriate expert, so that primitives are distributed according to local scene content. The router is built on the Gumbel-Softmax relaxation, which makes the discrete expert selection differentiable within a single feed-forward pass.
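The routing step can be sketched in plain Python as follows. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation: the router logits are hypothetical stand-ins for the output of a learned network, there is no autograd, and the straight-through behavior (hard one-hot in the forward pass, soft probabilities for gradients) is described in comments rather than implemented. The names `gumbel_softmax`, `route_pixel`, and `count_gaussians` are illustrative.

```python
import math
import random

M = 3  # maximum cardinality: experts emit 0, 1, ..., M Gaussians (M + 1 experts)


def gumbel_softmax(logits, tau, rng):
    """Soft expert probabilities from the Gumbel-Softmax relaxation."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def route_pixel(logits, tau, rng):
    """Select one cardinality expert for a pixel.

    Returns (hard one-hot, soft probabilities). With a straight-through
    estimator, the forward pass would use the one-hot vector while
    gradients flow through the soft probabilities.
    """
    if len(logits) != M + 1:
        raise ValueError("expected one logit per cardinality expert")
    soft = gumbel_softmax(logits, tau, rng)
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return hard, soft


def count_gaussians(one_hots):
    # Expert i emits exactly i Gaussians, so the total is the sum of chosen indices
    return sum(one_hot.index(1.0) for one_hot in one_hots)


# Hypothetical router logits for a smooth pixel and a highly textured pixel
rng = random.Random(0)
pixel_logits = [[3.0, 0.5, -1.0, -2.0], [-2.0, -1.0, 0.5, 3.0]]
choices = [route_pixel(logits, tau=0.5, rng=rng)[0] for logits in pixel_logits]
print(count_gaussians(choices))  # stochastic: depends on the sampled Gumbel noise
```

Because expert k emits exactly k primitives, the total Gaussian count is the sum of the selected expert indices, which is what makes a global primitive budget straightforward to express.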
Beyond this assignment step, the framework uses feature aggregation and neighbor-conditioned parameter prediction. Instead of predicting the full parameter set for each primitive, the experts predict only spatial positions and latent features; the remaining parameters (scale, rotation, opacity, and spherical-harmonic (SH) color coefficients) are inferred by attention-based aggregation over spatial neighbors, which promotes physical coherence and local geometric consistency.
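A toy version of neighbor-conditioned prediction is sketched below. It assumes k-nearest neighbors in 3D position space, scaled dot-product attention that uses the raw latent features as both keys and values (a real model would presumably apply learned projections), and hypothetical linear heads that decode only opacity and an isotropic scale; rotation and SH coefficients could be decoded the same way. None of these specific choices is confirmed by the source.

```python
import math


def knn(positions, idx, k):
    """Indices of the k nearest other primitives to positions[idx] (brute force)."""
    anchor = positions[idx]
    dists = sorted((math.dist(anchor, p), j) for j, p in enumerate(positions) if j != idx)
    return [j for _, j in dists[:k]]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over its neighbors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * c for q, c in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Convex combination of the neighbors' value vectors
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]


def decode_attributes(agg, w_opacity, w_scale):
    """Hypothetical linear heads: opacity through a sigmoid, isotropic scale through exp."""
    o = sum(a * b for a, b in zip(agg, w_opacity))
    s = sum(a * b for a, b in zip(agg, w_scale))
    return 1.0 / (1.0 + math.exp(-o)), math.exp(s)


# Hypothetical expert outputs: 3D positions and 2-D latent features per primitive
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
features = [[0.2, 0.1], [0.3, -0.1], [0.0, 0.4], [1.0, 1.0]]
nbrs = knn(positions, 0, k=2)
neighbor_feats = [features[j] for j in nbrs]
agg = attend(features[0], neighbor_feats, neighbor_feats)  # keys = values = raw features
opacity, scale = decode_attributes(agg, w_opacity=[1.0, -0.5], w_scale=[0.3, 0.3])
```

Conditioning on neighbors rather than predicting each primitive in isolation means nearby primitives share information, which is the intuition behind the claimed gain in local geometric consistency.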
High-Frequency Prior Guidance
SplatWeaver integrates a high-frequency prior by extracting features via a discrete wavelet transform (DWT) of the input images. The resulting high-frequency energy map acts as an auxiliary signal, guiding the distribution of Gaussian experts through both feature injection and a dedicated regularization loss. Pixels ranked by high-frequency content are softly supervised to prioritize higher-cardinality experts in complex areas and lower-cardinality experts in smooth regions. This mechanism biases the allocation towards a "dense where complex, sparse where smooth" behavior, aligning the predicted Gaussian distribution with the structural demands of the scene.
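The high-frequency energy map can be illustrated with a single-level Haar DWT. The sketch assumes a grayscale image with even dimensions, defines energy as the sum of squared detail coefficients (LH² + HL² + HH²) per 2×2 cell, and maps each cell's energy rank to a target expert through equal-size quantile bins. The quantile mapping and all function names are hypothetical; the paper's exact wavelet, ranking rule, and upsampling of the half-resolution map back to pixels may differ.

```python
def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT of an even-sized grayscale image.

    img is a list of rows; returns (LL, LH, HL, HH) at half resolution.
    Sub-band naming conventions differ between libraries.
    """
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            LL[i // 2][j // 2] = (a + b + c + d) / 2  # local average (low-pass)
            LH[i // 2][j // 2] = (a + b - c - d) / 2  # top minus bottom
            HL[i // 2][j // 2] = (a - b + c - d) / 2  # left minus right
            HH[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal detail
    return LL, LH, HL, HH


def hf_energy(img):
    """High-frequency energy per 2x2 cell: LH^2 + HL^2 + HH^2."""
    _, LH, HL, HH = haar_dwt2(img)
    return [[lh * lh + hl * hl + hh * hh for lh, hl, hh in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(LH, HL, HH)]


def target_cardinality(energy, M=3):
    """Hypothetical rank-to-expert mapping: equal-size energy quantiles -> 0..M."""
    flat = [e for row in energy for e in row]
    order = sorted(range(len(flat)), key=flat.__getitem__)  # ascending energy
    targets = [0] * len(flat)
    for rank, idx in enumerate(order):
        targets[idx] = min(M, rank * (M + 1) // len(flat))
    return targets


# Smooth left half, high-contrast right half
img = [[0, 0, 0, 10],
       [0, 0, 10, 0],
       [0, 0, 0, 10],
       [0, 0, 10, 0]]
energy = hf_energy(img)  # [[0.0, 100.0], [0.0, 100.0]]
print(target_cardinality(energy))  # high-energy cells receive the larger targets
```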
Training and Regularization
The training objective is a composite loss comprising:
- MSE and perceptual losses for rendering supervision,
- a Huber loss and a depth MSE loss for camera-pose and geometric consistency,
- a routing regularization term, applied during early training and driven by the high-frequency map,
- and a Gaussian budget-control term that discourages superfluous allocations (by default targeting 0.3× the number of input pixels).
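The composite objective can be sketched as a weighted sum. Only the 0.3 budget ratio comes from the description above; the cross-entropy form of the routing term, the squared-hinge budget penalty on the expected number of Gaussians per pixel, the weights `w_reg` and `w_budget`, and the `reg_steps` cutoff are all hypothetical. Rendering and geometry terms are passed in as precomputed scalars, and the routing targets would come from the high-frequency map.

```python
import math


def routing_reg(probs, targets):
    """Mean cross-entropy between per-pixel routing probabilities and target experts."""
    eps = 1e-12  # guards log(0)
    return -sum(math.log(p[t] + eps) for p, t in zip(probs, targets)) / len(probs)


def budget_penalty(probs, budget_ratio=0.3):
    """One-sided squared hinge on expected Gaussians per pixel above the budget."""
    # Expected cardinality of a pixel: sum over experts k of k * P(expert k)
    expected = sum(sum(k * pk for k, pk in enumerate(p)) for p in probs)
    return max(0.0, expected / len(probs) - budget_ratio) ** 2


def total_loss(render_terms, geometry_terms, probs, targets, step,
               reg_steps=5000, w_reg=0.1, w_budget=1.0):
    """Hypothetical composite objective; all weights and reg_steps are illustrative."""
    loss = sum(render_terms) + sum(geometry_terms)  # e.g. MSE + perceptual, Huber + depth MSE
    if step < reg_steps:  # routing regularization only during early training
        loss += w_reg * routing_reg(probs, targets)
    return loss + w_budget * budget_penalty(probs)


# Two pixels routed mostly to the 0- and 3-Gaussian experts; targets from the HF map
probs = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.2, 0.8]]
targets = [0, 3]
print(total_loss([0.05, 0.20], [0.10, 0.03], probs, targets, step=100))
```

Penalizing the expected rather than the sampled count keeps the budget term differentiable with respect to the routing probabilities.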
The model builds on a VGGT (Visual Geometry Grounded Transformer) backbone and targets diverse uncalibrated, in-the-wild captures. Ablation studies indicate that performance is robust to the choice of hyperparameters.
Experimental Results
Scene Representation Efficiency and Fidelity
SplatWeaver is evaluated on benchmarks including DL3DV, RealEstate10K, and Mip-NeRF 360, under both sparse- and dense-view settings. It consistently outperforms state-of-the-art generalizable methods and pruning-based frameworks on PSNR, SSIM, and LPIPS, reaching higher PSNR than AnySplat with up to 70% fewer Gaussians, along with substantially lower storage and rendering latency.
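The storage side of these savings is easy to estimate. The sketch assumes uncompressed float32 storage and the standard 3DGS parameterization with degree-3 spherical harmonics (3 position + 3 scale + 4 quaternion + 1 opacity + 48 SH values = 59 floats per Gaussian); this summary does not state SplatWeaver's SH degree, and the 8-view 448×448 input is hypothetical. The one-Gaussian-per-pixel baseline is illustrative only, since AnySplat's actual count depends on its own merging scheme.

```python
SH_DEGREE = 3  # assumption: standard 3DGS color model
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 3 * (SH_DEGREE + 1) ** 2  # pos, scale, quat, opacity, SH


def storage_mb(num_gaussians, bytes_per_float=4):
    """Uncompressed storage in megabytes (1 MB = 10**6 bytes)."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * bytes_per_float / 1e6


views, height, width = 8, 448, 448  # hypothetical input
pixels = views * height * width
dense = pixels                      # one Gaussian per pixel
budgeted = int(0.3 * pixels)        # 0.3x pixel-count budget
print(FLOATS_PER_GAUSSIAN)          # 59
print(f"{storage_mb(dense):.1f} MB vs {storage_mb(budgeted):.1f} MB")  # 378.9 MB vs 113.7 MB
```

Under these assumptions the 0.3× budget removes 70% of the primitives relative to a one-per-pixel baseline, and storage shrinks in direct proportion to the count.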
Qualitative comparisons show that SplatWeaver preserves fine structures and textures more accurately and robustly than query-based methods (C3G, TokenGS) and pruning-based methods (EcoSplat), especially in scenes with large intra-frame variability or limited view coverage. The adaptive budget mechanism also exhibits emergent behavior: primitive density adjusts automatically to both view and scene complexity, without manual intervention.
Pose Estimation and Geometric Consistency
The method also improves camera pose accuracy (AUC@10 and AUC@30) over both AnySplat and the base VGGT, which the authors attribute to stronger geometric priors induced by sparse, complexity-aware primitive assignment. In ablations, both neighbor-conditioned prediction and high-frequency prior guidance yield notable gains in geometric fidelity and detail rendering.
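For reference, one common definition of pose AUC is sketched below: each image pair's error is the maximum of its rotation and translation-direction angular errors, and AUC@τ is the area under the fraction-of-pairs-below-threshold curve on [0, τ], normalized by τ. Implementations differ in how they discretize this curve (for example, histogram bins or trapezoidal integration), so this closed-form step-function version is an assumption and may not match the paper's protocol exactly. Translation-direction errors are taken as given inputs, and the sample errors are hypothetical.

```python
import math


def rotation_error_deg(R_a, R_b):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    # trace(R_a^T R_b) equals the elementwise dot product of the two matrices
    tr = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp for round-off
    return math.degrees(math.acos(cos_angle))


def pose_auc(rot_err_deg, trans_err_deg, threshold):
    """Normalized area under the accuracy-vs-threshold curve on [0, threshold].

    Per-pair error is max(rotation, translation-direction) angular error. For
    a step-function curve, each pair contributes (threshold - e) / threshold
    when e <= threshold and 0 otherwise.
    """
    errs = [max(r, t) for r, t in zip(rot_err_deg, trans_err_deg)]
    return sum(max(0.0, threshold - e) for e in errs) / (threshold * len(errs))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotation_error_deg(identity, rot_z_90))  # 90 degrees, up to round-off

# Hypothetical per-pair errors in degrees
rot = [1.0, 4.0, 12.0, 40.0]
trans = [2.0, 3.0, 8.0, 25.0]
print(pose_auc(rot, trans, 10.0), pose_auc(rot, trans, 30.0))
```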
Theoretical and Practical Implications
SplatWeaver offers a framework for complexity-aware, budget-controlled 3DGS, narrowing the gap between explicit, scene-adaptive representations and the efficiency demands of real-time or large-scale novel view synthesis. Its cardinality-expert design brings the Mixture-of-Experts (MoE) paradigm to adaptive 3D scene representation, enabling scalable, differentiable, and physically motivated primitive allocation directly from uncalibrated images.
On the theoretical side, using frequency priors to regularize expert routing is an effective way to connect classical signal-processing priors with learned neural allocation. The observed positive alignment between high-frequency maps and Gaussian density suggests a general principle for structure-aware resource distribution in neural reconstruction pipelines.
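One way to quantify such alignment, offered here as an assumption rather than as the paper's analysis, is Spearman's rank correlation between per-pixel high-frequency energy and allocated Gaussian count. Average ranks handle the many ties that discrete counts produce; the data below are hypothetical.

```python
import math


def average_ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman's rho is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical per-pixel high-frequency energy and allocated Gaussian counts
energy = [0.1, 0.5, 0.9, 2.0, 0.05]
counts = [0, 1, 1, 3, 0]
print(spearman(energy, counts))  # positive when denser allocation tracks HF energy
```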
In practice, the approach extends directly to real-world, in-the-wild captures, including those with unknown or noisy camera poses. Reducing the primitive count without sacrificing rendering fidelity directly benefits mobile and web-based 3D graphics deployment, bandwidth-efficient scene transmission, and on-device AR/VR applications.
Future Directions
Potential developments include hierarchical expert routing for ultra-large scenes, further compression of primitive attributes via learned quantization, and extension to dynamic or monocular video sequences. Beyond vanilla 3DGS, SplatWeaver embodies a blueprint for progressive neural scene abstraction where explicit allocation policies can be driven by task or user-level criteria (e.g., region-of-interest rendering or semantic-aware allocation).
Conclusion
SplatWeaver presents a robust feed-forward solution for adaptive Gaussian allocation in generalizable novel view synthesis, characterized by its cardinality expert routing, high-frequency-guided regularization, and neighbor-conditioned prediction modules. The framework achieves superior scene fidelity with markedly fewer primitives, improving both the efficiency and expressivity of 3DGS-based representations. These findings advocate for further integration of dynamic neural network and frequency-domain principles in high-fidelity, scalable scene synthesis pipelines (2605.07287).