SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis

Published 8 May 2026 in cs.CV | (2605.07287v1)

Abstract: Generalizable novel view synthesis aims to render unseen views from uncalibrated input images without requiring per-scene optimization. Recent feed-forward approaches based on 3D Gaussian Splatting have achieved promising efficiency and rendering quality. However, most of them assign a fixed number of Gaussians to each pixel or voxel, ignoring the spatially varying complexity of real-world scenes. Such uniform allocation often wastes Gaussian primitives in smooth regions while providing insufficient capacity for fine structures, complex geometry, and high-frequency details. This motivates us to predict region-dependent primitive cardinalities rather than impose a fixed primitive budget everywhere, enabling a more expressive yet compact 3D scene representation. Therefore, we propose SplatWeaver, a generalizable novel view synthesis framework that is able to dynamically allocate Gaussian primitives over different regions in a feed-forward manner. Specifically, SplatWeaver introduces cardinality Gaussian experts and a pixel-level routing scheme, wherein each expert specializes in producing a specific number of primitives from 0 to M, and the routing scheme coordinates these experts to adaptively determine how many Gaussian primitives should be allocated to each spatial location. Moreover, SplatWeaver incorporates a high-frequency prior with an attendant guidance module and routing regularization to stabilize expert selection and promote complexity-aware allocation. By leveraging high-frequency structural cues, the routing process is encouraged to assign more Gaussian primitives to fine structures, complex geometry, and textured regions, while suppressing redundant primitives in smooth areas. Extensive experiments across diverse scenarios show that SplatWeaver consistently outperforms state-of-the-art methods, delivering more faithful novel-view renderings with fewer Gaussian primitives.


Authors (4)

Summary

The paper introduces SplatWeaver, a framework that adaptively allocates Gaussian primitives using cardinality experts to capture scene complexity effectively.
The technique uses a high-frequency prior derived from a discrete wavelet transform, together with neighbor-conditioned parameter prediction, to promote geometric consistency and detailed rendering.
Experiments report up to 70% fewer Gaussians than AnySplat together with improved PSNR and SSIM across diverse novel view synthesis benchmarks.

SplatWeaver: Adaptive Allocation of Gaussian Primitives for Generalizable Novel View Synthesis

Introduction

The paper "SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis" (2605.07287) addresses a limitation of current generalizable 3D Gaussian Splatting (3DGS) methods, which typically assign a fixed number of Gaussian primitives to each pixel or voxel. Such uniform allocation ignores spatial variation in scene complexity: it produces redundant primitives in homogeneous regions and leaves too little capacity in areas with intricate geometry or texture. SplatWeaver instead introduces a feed-forward, complexity-aware paradigm that adaptively allocates Gaussian primitives, yielding compact yet expressive 3D scene representations without per-scene optimization.

Methodology

Cardinality Gaussian Experts and Adaptive Allocation

SplatWeaver's central contribution is the design of cardinality Gaussian experts. Each expert is specialized to emit a fixed number of Gaussian primitives, from 0 to $M$, giving $M+1$ experts in total; the authors use $M=3$ to balance allocation expressivity against routing complexity. Allocation operates at the pixel level: a router assigns each pixel's features to the most appropriate expert, so that representational capacity follows local scene content. The router is built on Gumbel-Softmax, which makes expert selection discrete in the forward pass while keeping it differentiable for training.
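The routing step can be sketched as follows. This is a minimal, forward-pass-only illustration in plain Python, assuming each pixel already has $M+1$ router logits (in the real model these would presumably come from a learned head on backbone features); the names `gumbel_softmax`, `route_pixels`, and the temperature `tau` are hypothetical. Expert $i$ emits exactly $i$ primitives, so the selected index is the pixel's cardinality. Gradient flow through the hard choice (e.g., a straight-through estimator) requires an autodiff framework and is only noted in a comment.

```python
import math
import random

M = 3  # largest cardinality; experts emit 0, 1, ..., M primitives (M + 1 experts)


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return (soft probabilities, hard one-hot) for one pixel's router logits."""
    # Perturb each logit with Gumbel(0, 1) noise g = -log(-log(u)), u ~ U(0, 1),
    # then scale by the temperature tau.
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    z = sum(exps)
    soft = [e / z for e in exps]
    # Hard selection: one-hot on the argmax. In an autodiff framework the
    # straight-through estimator would use hard + (soft - stop_gradient(soft)).
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return soft, hard


def route_pixels(router_logits, tau=1.0, seed=0):
    """Pick one cardinality expert per pixel and return per-pixel primitive counts."""
    rng = random.Random(seed)
    counts = []
    for logits in router_logits:
        _, hard = gumbel_softmax(logits, tau, rng)
        # Expert index i emits exactly i Gaussian primitives.
        counts.append(hard.index(1.0))
    return counts
```

For example, `route_pixels([[50.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 50.0]])` returns `[0, 3]`: with such a large logit margin the bounded Gumbel noise cannot flip the argmax, so a pixel the router deems smooth receives no primitives and a detailed one receives the maximum of three.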

The framework goes beyond naive assignment through feature aggregation and neighbor-conditioned parameter prediction. Rather than predicting the full parameter set for every primitive, experts predict only spatial positions and latent features; the remaining parameters (scale, rotation, opacity, and spherical-harmonic (SH) color coefficients) are inferred through attention-based aggregation over spatial neighbors, which promotes physical coherence and local geometric consistency.
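Neighbor-conditioned prediction can be illustrated with a small sketch. The text only says that the remaining parameters are inferred by attention over spatial neighbors, so every concrete choice here is an assumption: neighbors are the $k$ nearest primitives by predicted position, attention is plain scaled dot-product over raw latent features (a real model would use learned query/key/value projections), and `decode_opacity` is a hypothetical linear head standing in for the actual parameter decoders.

```python
import math


def knn(positions, i, k):
    """Indices of the k primitives nearest to primitive i (including i itself)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(positions[i], p)), j)
             for j, p in enumerate(positions)]
    return [j for _, j in sorted(dists)[:k]]


def aggregate_neighbors(positions, feats, k=4):
    """Scaled dot-product attention of each primitive over its k spatial neighbors."""
    dim = len(feats[0])
    out = []
    for i in range(len(positions)):
        nbrs = knn(positions, i, k)
        # Query = own latent feature; keys and values = neighbor latent features.
        scores = [sum(q * kv for q, kv in zip(feats[i], feats[j])) / math.sqrt(dim)
                  for j in nbrs]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Attention-weighted (convex) combination of neighbor features.
        out.append([sum(weights[n] * feats[j][c] for n, j in enumerate(nbrs))
                    for c in range(dim)])
    return out


def decode_opacity(agg_feat, weight, bias=0.0):
    """Hypothetical linear head plus sigmoid mapping an aggregated feature to opacity."""
    z = sum(a * b for a, b in zip(agg_feat, weight)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Because the attention weights form a convex combination, each aggregated feature stays within the range of its neighbors' features, which is one simple way such a design can encourage locally consistent parameters.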

High-Frequency Prior Guidance

SplatWeaver integrates a high-frequency prior extracted with a discrete wavelet transform (DWT) of the input images. The resulting high-frequency energy map serves as an auxiliary signal that guides expert selection in two ways: it is injected as a feature, and it drives a dedicated regularization loss. Pixels are ranked by high-frequency content and softly supervised to prefer higher-cardinality experts in complex areas and lower-cardinality experts in smooth regions. This biases allocation toward a "dense where complex, sparse where smooth" behavior, aligning the predicted Gaussian distribution with the structural demands of the scene.
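A minimal sketch of the high-frequency prior and its rank-based supervision follows. It assumes a single-level Haar wavelet on a grayscale image, takes the sum of squared detail coefficients as the energy, and maps energy ranks to experts with equal-size quantile bins plus label smoothing. None of these specific choices (nor the names `haar_hf_energy`, `rank_targets`, `routing_reg`) is confirmed by the source, which states only that a DWT energy map softly supervises routing.

```python
import math


def haar_hf_energy(img):
    """Single-level 2D Haar DWT; return per-pixel high-frequency energy.

    img: list of rows (grayscale) with even height and width. The detail
    subbands are at half resolution, so each 2x2 block's energy is copied
    back to its four pixels (nearest-neighbor upsampling).
    """
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            # Orthonormal Haar detail coefficients; LL = (a+b+c+d)/2 is discarded.
            lh = (a + b - c - d) / 2.0
            hl = (a - b + c - d) / 2.0
            hh = (a - b - c + d) / 2.0
            e = lh * lh + hl * hl + hh * hh
            for dy in (0, 1):
                for dx in (0, 1):
                    energy[y + dy][x + dx] = e
    return energy


def rank_targets(energies, num_experts, smooth=0.1):
    """Map each pixel's energy rank to a soft target over cardinality experts.

    Equal-size rank bins (assumed scheme): the smoothest 1/num_experts of the
    pixels target expert 0, the most detailed target expert num_experts - 1.
    """
    n = len(energies)
    order = sorted(range(n), key=lambda i: energies[i])
    targets = [None] * n
    for rank, i in enumerate(order):
        k = min(rank * num_experts // n, num_experts - 1)
        # Label smoothing keeps the supervision soft rather than one-hot.
        t = [smooth / num_experts] * num_experts
        t[k] += 1.0 - smooth
        targets[i] = t
    return targets


def routing_reg(probs, targets):
    """Mean cross-entropy between router probabilities and soft rank targets."""
    eps = 1e-12
    return sum(-sum(t * math.log(p + eps) for t, p in zip(tt, pp))
               for tt, pp in zip(targets, probs)) / len(probs)
```

On a 2×4 image whose left 2×2 block is flat and whose right block is a checkerboard, the left pixels get zero energy and target the lowest-cardinality expert, while the right pixels get the highest energy and target the top expert.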

Training and Regularization

The training objective is a composite loss comprising:

MSE and perceptual losses for rendering supervision,
Huber and depth MSE losses for camera-pose and geometric consistency,
a routing regularization term (applied only in early training and driven by the high-frequency map),
and a Gaussian budget control term that discourages superfluous allocations (targeting 0.3× the input pixel count by default).
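The composite objective can be sketched as a weighted sum of the terms listed above. The 0.3× budget ratio comes from the text; the squared-hinge budget penalty, the expected-count proxy, the loss weights, and the `reg_steps` cutoff for the routing term are illustrative assumptions, and all function names are hypothetical.

```python
def huber(residuals, delta=1.0):
    """Mean Huber loss: quadratic for |r| <= delta, linear beyond."""
    total = 0.0
    for r in residuals:
        a = abs(r)
        total += 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)
    return total / len(residuals)


def expected_count(probs):
    """Differentiable count proxy for one pixel: sum_i i * p_i (expert i emits i)."""
    return sum(i * p for i, p in enumerate(probs))


def budget_penalty(expected_counts, num_pixels, ratio=0.3):
    """Penalize total expected primitives above ratio * num_pixels (assumed form)."""
    budget = ratio * num_pixels
    excess = max(0.0, sum(expected_counts) - budget)
    return (excess / budget) ** 2


def total_loss(terms, step, weights=None, reg_steps=5000):
    """Weighted sum of precomputed scalar loss terms.

    terms: dict with any of 'mse', 'perceptual', 'pose_huber', 'depth_mse',
    'routing', 'budget'. The routing term is switched off after reg_steps,
    mirroring its early-training-only use.
    """
    w = {'mse': 1.0, 'perceptual': 0.05, 'pose_huber': 1.0,
         'depth_mse': 1.0, 'routing': 0.1, 'budget': 1.0}  # hypothetical weights
    if weights:
        w.update(weights)
    if step >= reg_steps:
        w['routing'] = 0.0
    return sum(w[k] * v for k, v in terms.items())
```

Computing the count from the router's probabilities rather than its hard choices keeps the budget term differentiable, which is one plausible reason to formulate it this way; with 10 pixels the budget is 3 primitives, so an expected total of 6 yields a penalty of 1.0.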

The model builds on a VGGT (Visual Geometry Grounded Transformer) backbone for multi-view geometry and is trained for diverse uncalibrated, in-the-wild scenarios. Ablation studies indicate that performance is robust to hyperparameter choices.

Experimental Results

Scene Representation Efficiency and Fidelity

SplatWeaver is evaluated on benchmarks including DL3DV, RealEstate10K, and Mip-NeRF 360, under both sparse- and dense-view settings. It consistently outperforms state-of-the-art generalizable methods and pruning-based frameworks on PSNR, SSIM, and LPIPS, achieving higher PSNR than AnySplat with up to 70% fewer Gaussians, along with lower storage cost and rendering latency.

Qualitative comparisons show that SplatWeaver preserves fine structures and textures more accurately and robustly than query-based methods (C3G, TokenGS) and pruning methods (EcoSplat), especially in scenes with large intra-frame variability or limited view coverage. The adaptive budget mechanism also exhibits emergent behavior: primitive density adjusts automatically to both the input views and scene complexity, without manual intervention.

Pose Estimation and Geometric Consistency

The method also improves camera pose accuracy (AUC@10 and AUC@30) over both AnySplat and the base VGGT, which the authors attribute to the stronger geometric priors induced by sparse, complexity-aware primitive assignment. In ablations, neighbor-conditioned prediction and high-frequency prior guidance each contribute notable gains in geometric fidelity and detail rendering.
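The pose AUC metric is not defined here; the sketch below follows a convention common in feed-forward pose evaluation, where each image pair's error is the larger of its relative rotation error and translation-direction error, and AUC@τ averages the fraction of pairs below each integer threshold from 1 to τ degrees. The paper's exact protocol may differ (for instance in how translation sign ambiguity is handled), and the function names are hypothetical.

```python
import math


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    # trace(R_pred^T R_gt) equals the sum of elementwise products.
    tr = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(c))


def translation_error_deg(t_pred, t_gt):
    """Angle between translation directions (scale-invariant), in degrees."""
    dot = sum(a * b for a, b in zip(t_pred, t_gt))
    norm = math.sqrt(sum(a * a for a in t_pred)) * math.sqrt(sum(b * b for b in t_gt))
    c = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(c))


def pose_auc(max_errors_deg, tau):
    """AUC@tau: mean over t = 1..tau of the fraction of pairs with error < t.

    Histogram-style binning with edges 0, 1, ..., tau; the last bin also
    includes an error exactly equal to tau.
    """
    n = len(max_errors_deg)
    acc = 0.0
    for t in range(1, tau + 1):
        hits = sum(1 for e in max_errors_deg if e < t or (t == tau and e == tau))
        acc += hits / n
    return acc / tau
```

Each pair would contribute `max(rotation_error_deg(...), translation_error_deg(...))` to `max_errors_deg`. A pair with a 5° error, for example, counts toward thresholds 6 through 10 only, so on its own it scores AUC@10 = 0.5.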

Theoretical and Practical Implications

SplatWeaver provides a framework for complexity-aware, budget-controlled 3DGS, narrowing the gap between explicit, scene-adaptive representations and the efficiency requirements of real-time or large-scale novel view synthesis. Its cardinality expert design brings the Mixture-of-Experts (MoE) paradigm to adaptive 3D scene representations, enabling scalable, differentiable, and physically motivated primitive allocation directly from uncalibrated images.

On a theoretical level, using frequency priors to regularize expert routing is an effective way to bridge signal-processing priors and learned neural allocation. The positive alignment between high-frequency maps and Gaussian density suggests a general principle for structure-aware resource distribution in neural reconstruction pipelines.

In practical terms, the approach applies directly to a range of real-world, in-the-wild capture scenarios, including cases with unknown or noisy camera poses. Reducing the primitive count without sacrificing rendering fidelity benefits mobile and web-based 3D graphics deployment, bandwidth-efficient scene transmission, and on-device AR/VR applications.

Future Directions

Potential developments include hierarchical expert routing for ultra-large scenes, further compression of primitive attributes via learned quantization, and extension to dynamic scenes or monocular video sequences. Beyond vanilla 3DGS, SplatWeaver suggests a blueprint for progressive neural scene abstraction in which explicit allocation policies are driven by task- or user-level criteria (e.g., region-of-interest rendering or semantic-aware allocation).

Conclusion

SplatWeaver presents a robust feed-forward solution for adaptive Gaussian allocation in generalizable novel view synthesis, built on cardinality expert routing, high-frequency-guided regularization, and neighbor-conditioned prediction. The framework achieves higher scene fidelity with markedly fewer primitives, improving both the efficiency and the expressivity of 3DGS-based representations. These findings support further integration of dynamic neural networks and frequency-domain principles into high-fidelity, scalable scene synthesis pipelines (2605.07287).
