Scalable Micro-Macro Wavelet Gaussian Splatting
- SMW-GS is a scalable 3D reconstruction framework that combines multi-scale micro-macro projection with wavelet-based sampling to capture fine details and broad contextual cues.
- It employs discrete wavelet transforms to decompose CNN feature maps into multiple frequency sub-bands, enhancing texture representation and resilience to lighting variations.
- A point-statistics-guided partitioning strategy ensures consistent supervision across large-scale scenes, yielding improved PSNR, SSIM, and efficient rendering performance.
Scalable Micro-macro Wavelet-based Gaussian Splatting (SMW-GS) is an advanced framework for 3D scene reconstruction and rendering that addresses scalability, appearance modeling, and efficiency at large scale. SMW-GS is characterized by the explicit integration of multi-scale spatial sampling (“micro-macro projection”), frequency-domain feature decomposition (“wavelet-based sampling”), and a point-statistics-guided approach for scalable and robust large-scale processing. These innovations distinguish SMW-GS from earlier single-scale and solely spatial-domain Gaussian splatting techniques, enabling high-fidelity, consistent reconstructions even in large, unconstrained, and visually complex environments.
1. Micro-macro Projection: Multi-Scale Feature Sampling
The micro-macro projection mechanism underpins SMW-GS's ability to robustly capture both fine-grained details (micro) and broad contextual appearance (macro) for each 3D Gaussian. Each Gaussian point is projected onto 2D feature maps derived from a convolutional neural network (CNN).
- Micro-projection samples features within a narrow conical frustum about the Gaussian's 2D projected location, using a set of learnable spatial offsets; the sampled features are averaged. This captures local, fine-scale appearance variation critical for representing texture and high-frequency structure.
- Macro-projection employs a broader frustum whose radius adapts to the 3D distance between the Gaussian center and the camera:

$$ r \propto \lVert \mu - o \rVert_2, $$

where $\mu$ is the Gaussian center and $o$ the camera center. Learnable scaling factors applied to this distance-dependent radius generate the macro sampling locations. This projection gathers context features, aggregating illumination and environmental cues across a wider region.
Both micro and macro features are concatenated, creating a per-Gaussian appearance descriptor that mixes local and contextual cues. This scheme alleviates oversmoothing, enhances diversity across Gaussians, and improves rendering especially in varied lighting or complex geometry.
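The two-scale sampling scheme can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `micro_macro_features`, the nearest-neighbour lookup, and the toy offsets are assumptions (the actual method samples learned CNN feature maps, typically with bilinear interpolation).

```python
import numpy as np

def micro_macro_features(feat_map, uv, depth, micro_offsets, macro_scales):
    """Sample micro (local) and macro (context) features for one Gaussian.

    feat_map:      (H, W, C) CNN feature map
    uv:            (2,) projected 2D location of the Gaussian center
    depth:         distance between the Gaussian center and the camera
    micro_offsets: (K, 2) learnable pixel offsets for the narrow micro frustum
    macro_scales:  (K, 2) learnable scaling factors; the macro radius grows
                   with depth, so distant Gaussians see a wider context
    """
    H, W, _ = feat_map.shape

    def sample(points):
        # nearest-neighbour lookup with border clamping (bilinear in practice)
        ij = np.clip(np.round(points).astype(int), 0, [H - 1, W - 1])
        return feat_map[ij[:, 0], ij[:, 1]]  # (K, C)

    micro = sample(uv + micro_offsets).mean(axis=0)         # fine local detail
    macro = sample(uv + depth * macro_scales).mean(axis=0)  # broad context
    return np.concatenate([micro, macro])                   # per-Gaussian descriptor

# toy usage: an 8x8 feature map with 4 channels
fm = np.random.rand(8, 8, 4)
desc = micro_macro_features(fm, np.array([4.0, 4.0]), depth=2.0,
                            micro_offsets=np.random.randn(5, 2) * 0.5,
                            macro_scales=np.random.randn(5, 2) * 0.5)
print(desc.shape)  # (8,)
```

Concatenating rather than averaging the two descriptors keeps the local and contextual signals separable for the downstream decoder.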
2. Wavelet-based Sampling: Frequency-Aware Feature Decomposition
To further enhance expressiveness and robustness under large-scale or unconstrained imaging conditions, SMW-GS integrates frequency-domain analysis via discrete wavelet decomposition of the CNN feature maps. Each feature map undergoes a multi-level 2D Discrete Wavelet Transform (DWT), yielding four frequency sub-bands per level:

$$ \{F_{LL},\, F_{LH},\, F_{HL},\, F_{HH}\} = \mathrm{DWT}(F), $$

where the subscripts denote the application of the low-pass ($L$) and high-pass ($H$) filters along each spatial axis. Micro- and macro-projections are then performed within each frequency band, and the sampled features are linearly aggregated:

$$ f = \sum_{b \in \{LL,\, LH,\, HL,\, HH\}} w_b \, f_b, $$

with learnable weights $w_b$. The final multi-scale, frequency-aware refined feature vector for each Gaussian concatenates the aggregated features across decomposition levels:

$$ F_{\text{refined}} = \bigoplus_{j=1}^{J} f^{(j)}, $$

where $\oplus$ denotes concatenation and $J$ is the number of DWT levels.
Wavelet decomposition enables SMW-GS to represent high-frequency texture, edges, and low-frequency appearance, addressing appearance variations due to lighting, material properties, or environmental conditions.
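The decomposition and band-wise aggregation above can be sketched with a single-level Haar DWT (the simplest wavelet, using an unnormalized averaging/difference convention); the fixed weights stand in for the learnable $w_b$, and the paper's exact wavelet family and normalization are not specified here.

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2D Haar DWT, returning the four sub-bands.

    x: (H, W) single feature channel with even H and W.
    Returns (LL, LH, HL, HH), each of shape (H//2, W//2).
    """
    a, b = x[0::2, :], x[1::2, :]          # pair rows: low/high pass vertically
    lo_r, hi_r = (a + b) / 2, (a - b) / 2

    def cols(y):
        c, d = y[:, 0::2], y[:, 1::2]      # pair columns: low/high pass horizontally
        return (c + d) / 2, (c - d) / 2

    LL, LH = cols(lo_r)
    HL, HH = cols(hi_r)
    return LL, LH, HL, HH

# band-wise sampled features are aggregated with learnable weights w_b;
# fixed toy weights are used here
x = np.arange(16, dtype=float).reshape(4, 4)
bands = haar_dwt2(x)
w = np.array([0.4, 0.2, 0.2, 0.2])
fused = sum(wi * b for wi, b in zip(w, bands))
print(fused.shape)  # (2, 2)
```

For this linear ramp input, the HH band is exactly zero: the diagonal high-frequency content is absent, which is precisely the kind of per-band structure the learnable weights can exploit.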
3. Large-scale Scene Promotion: Point-Statistics-Guided Partitioning and Supervision
For tractable and consistent large-scale scene reconstruction, SMW-GS employs a scene partitioning and supervision strategy designed to ensure uniform, dense, and content-relevant learning across all regions and Gaussians.
- Initial Partition divides the COLMAP-derived point cloud into overlapping blocks, based on spatial quantiles and boundary expansion. Each GPU may process a subset of blocks.
- Point-Statistics-Guided (PSG) Camera Assignment consists of two stages:
- Visibility-aware assignment: For each point, the algorithm ensures a minimum number of supervising cameras, using a greedy approach to maximize camera coverage for points with insufficient views.
  - Content-relevant augmentation: For each scene block, rendered images are compared with and without the block's points; if the structural similarity index (SSIM) drops by more than a set threshold, the corresponding camera is assigned to that block for supervision.
- Rotational Block Training iteratively rotates GPU-block assignments across training epochs to maintain global consistency, prevent overfitting in memory-limited settings, and ensure each Gaussian gets multi-contextual supervision.
This strategy guarantees sufficient per-Gaussian supervision and appearance consistency even in large, visually diverse urban or natural scenes.
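A simplified sketch of the two core pieces, quantile-based partitioning and greedy visibility-aware camera assignment, is given below. Both function names, the 1-D (x-axis only) partition, and the coverage heuristic are illustrative assumptions; the actual method partitions in multiple dimensions and adds the SSIM-based augmentation stage.

```python
import numpy as np

def quantile_partition(points, n_blocks, overlap=0.1):
    """Split a point cloud into overlapping blocks along the x-axis
    using quantile boundaries (a 1-D sketch of the initial partition)."""
    qs = np.quantile(points[:, 0], np.linspace(0, 1, n_blocks + 1))
    blocks = []
    for lo, hi in zip(qs[:-1], qs[1:]):
        pad = overlap * (hi - lo)          # boundary expansion
        mask = (points[:, 0] >= lo - pad) & (points[:, 0] <= hi + pad)
        blocks.append(np.where(mask)[0])
    return blocks

def greedy_camera_assignment(vis, min_views=2):
    """vis: (P, C) boolean visibility matrix (point p seen by camera c).
    Greedily pick cameras until every point has >= min_views selected
    cameras, or no remaining camera can help under-covered points."""
    P, _ = vis.shape
    chosen, cover = [], np.zeros(P, dtype=int)
    while True:
        need = cover < min_views
        if not need.any():
            break
        gains = vis[need].sum(axis=0)      # needy points each camera would add
        gains[chosen] = -1                 # never re-pick a chosen camera
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break                          # remaining points cannot be covered
        chosen.append(best)
        cover += vis[:, best]
    return chosen

pts = np.random.rand(100, 3) * 10
blocks = quantile_partition(pts, n_blocks=4)
vis = np.random.rand(100, 6) > 0.5
cams = greedy_camera_assignment(vis, min_views=1)
```

The greedy loop mirrors classic set-cover heuristics: each iteration picks the camera that supervises the most still-under-covered points, which is what gives every Gaussian its minimum view count with few cameras.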
4. Technical Framework and Mathematical Formulation
SMW-GS integrates the above mechanisms into the classic 3D Gaussian Splatting (3DGS) rendering pipeline. Each Gaussian is parameterized by a center $\mu$, a covariance $\Sigma$, an opacity $\alpha$, and appearance features $f$.
- The per-pixel color is computed by depth-ordered alpha blending:

$$ C = \sum_{i} c_i \, \alpha_i \prod_{j<i} (1 - \alpha_j), $$

where $c_i$ is the color and $\alpha_i$ the opacity derived from the fused appearance features of the $i$-th Gaussian along the ray.
- Multi-scale and multi-frequency features, derived via micro-macro projection and wavelet sampling, are fed through a neural decoder (e.g., a hierarchical MLP) to predict color and opacity for each Gaussian.
- Partition assignments and PSG supervision ensure distributed, scalable, and uniform backpropagation at all scales.
- Empirically, the method achieves a significant increase in PSNR and SSIM, and a reduction in LPIPS compared to baselines, along with efficient rendering speeds.
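The alpha-blending formula above is straightforward to implement per pixel. This is a minimal single-pixel sketch (real renderers composite tiles of splatted Gaussians in parallel); `composite` and the toy inputs are assumptions for illustration.

```python
import numpy as np

def composite(colors, alphas, depths):
    """Depth-ordered alpha blending for one pixel:
    C = sum_i c_i * a_i * prod_{j<i} (1 - a_j), evaluated front to back."""
    order = np.argsort(depths)                 # nearest Gaussian first
    c, a = colors[order], alphas[order]
    # transmittance before each Gaussian: prod_{j<i} (1 - a_j)
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - a)[:-1]])
    return (c * (a * transmittance)[:, None]).sum(axis=0)

# two Gaussians: a fully opaque red one in front hides the green one behind it
cols = np.array([[0.0, 1.0, 0.0],   # green, far
                 [1.0, 0.0, 0.0]])  # red, near
pix = composite(cols, np.array([0.8, 1.0]), np.array([5.0, 1.0]))
print(pix)  # [1. 0. 0.]
```

Tracking accumulated transmittance front to back also allows early termination once it falls near zero, one of the main sources of 3DGS rendering speed.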
5. Empirical Evaluation and Scalability
SMW-GS has been validated on both classical and large-scale benchmarks:
- Datasets: Phototourism, UrbanScene3D, Mill-19, MatrixCity.
- Quantitative Results: Yields a 1.4–2 dB PSNR improvement over prior methods (GS-W, CR-NeRF, WildGaussians), with sharper, more accurate geometry. In expansive urban scenes, SMW-GS outperforms CityGaussian and Momentum-GS by wide margins (e.g., 26.59 dB PSNR vs. 21.62 dB), and maintains stability under strong lighting variations.
- Efficiency: Delivers comparable or superior rendering speed and reduced model storage relative to decomposition-based baselines.
- Qualitative Observations: Preserves fine geometric detail, maintains appearance consistency across large blocks and illumination transitions, and demonstrates robustness for depth recovery.
6. Applications and Impact
SMW-GS is particularly effective for:
- Urban-scale 3D mapping: Provides consistent, high-fidelity reconstructions for city environments, VR/AR applications, digital twins, urban planning, and heritage documentation.
- Immersive content creation: Efficient, detail-preserving real-time rendering enables novel view synthesis and appearance editing (relighting, style transfer, time-of-day effects).
- Scientific and engineering domains: Scalable, frequency-aware modeling is directly beneficial for robotics, remote sensing, environmental monitoring, and simulation pipelines requiring robust 3D perception.
The explicit separation of global, refined, and intrinsic features, together with scalable partitioned training, marks a significant advance in 3D representation and lays the groundwork for robust, generalizable real-world 3D modeling.
7. Implications and Prospects
SMW-GS sets a new standard for high-capacity, detail-preserving, and scalable 3D Gaussian Splatting algorithms.
- The integration of micro-macro and wavelet approaches enables explicit multi-scale, multi-frequency appearance modeling, supporting both structural fidelity and contextual robustness.
- Large-scale scene promotion, via PSG partitioning and rotational training, addresses previous shortcomings of local-only methods and fragmentation-induced inconsistency.
- The architecture and methodology are extensible to even larger environments, integration with foundation models, and diverse downstream tasks such as editing and simulation.
Future efforts may include transient occlusion handling, further efficiency enhancements, and semantic-aware modeling using pre-trained visual representations.
| Component | Technique | Impact |
|---|---|---|
| Micro-macro Sampling | Jittered, multi-scale projection | Robust detail/context capture |
| Wavelet Decomposition | DWT on feature maps with band-wise sampling | Frequency-aware, resilient features |
| Scene Promotion | PSG partitioning, rotational block training | Scalable, balanced, consistent learning |
| Performance | Outperforms scale-aware and city-scale baselines | Large, realistic scene reconstruction |
| Applicability | Urban mapping, VR/AR, editing, simulation | High-fidelity, real-world deployment |