Mobile-GS: Mobile 3D Gaussian Splatting
- Mobile-GS is a methodology that enables efficient 3D Gaussian Splatting on mobile devices through a two-stage pipeline combining offline compression and on-device inference.
- It integrates depth-aware, order-independent blending with a compact neural module for view-dependent enhancements, improving occlusion handling and rendering quality.
- Mobile-GS leverages techniques like neural vector quantization and contribution-based pruning to achieve substantial model size reduction while maintaining competitive PSNR and SSIM.
Mobile-GS denotes a family of methodologies, systems, and metrics enabling 3D Gaussian Splatting (3DGS) or related computationally intensive tasks to operate efficiently on resource-constrained mobile platforms. Current research frames Mobile-GS both as an algorithmic advance in neural scene rendering and as a practical strategy for rapid, high-fidelity 3D reconstructions in robotics, extended reality, and real-time graphics. This article focuses on Mobile-GS as introduced in recent systems with explicit mobile and edge device deployment in mind, particularly the depth-aware, order-independent, and compressed 3DGS pipeline defined in (Du et al., 12 Mar 2026), while also connecting to real-time object-centric variations for robotics (Schieber et al., 5 Sep 2025).
1. Pipeline Overview and Motivation
Mobile-GS targets the prohibitive computational cost and memory requirements of classical 3D Gaussian Splatting when executed on mobile hardware. The approach introduces a two-stage pipeline:
- Offline Training/Compression (typically on desktop GPUs):
- A high-fidelity "teacher" 3DGS model is constructed with fine appearance basis (e.g., 3rd-order spherical harmonics).
- A "student" model comprising anisotropic Gaussian primitives is initialized and optimized to match the teacher's colors and depths under a novel, order-independent compositing strategy.
- Appearance features are distilled, quantized, and pruned for size and inference efficiency.
- Online Inference (on-device, mobile GPU):
- All rendering is executed via a Vulkan-based compute pipeline, utilizing order-independent, parallel blending and compact neural modules to achieve real-time frame rates at full mobile resolution.
This pipeline concretely addresses two central bottlenecks in mobile deployment: the need to sort millions of Gaussians by depth (an O(N log N) operation, with N ≫ 10⁴–10⁵ in typical scenes) and the prohibitive storage footprint of uncompressed neural or analytic radiance fields (Du et al., 12 Mar 2026).
2. Mathematical Foundation
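Each Gaussian primitive is parameterized by a center $\mu_i \in \mathbb{R}^3$, a covariance matrix $\Sigma_i$ (encoded as rotation and scale), appearance features (spherical harmonics coefficients) that yield a view-dependent color $c_i$, and an opacity $o_i$. The spatial density is given by:
$$G_i(x) = \exp\left(-\tfrac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\right)$$
Classical rendering involves projecting Gaussians to each pixel, sorting by depth, and compositing via alpha blending:
$$C = \sum_{i=1}^{N} T_i \, \alpha_i \, c_i$$
with transmittance $T_i = \prod_{j<i} (1 - \alpha_j)$ and alpha $\alpha_i = o_i \, \hat{G}_i$, where $\hat{G}_i$ is the integrated density along the ray.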
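To make the sorting dependency concrete, the following minimal sketch composites one pixel front to back. It assumes each overlapping Gaussian has already been reduced to a depth, an alpha, and an RGB color; the function name `composite_sorted` and the sample values are illustrative, not taken from the paper.

```python
def composite_sorted(samples):
    """Front-to-back alpha blending for a single pixel.

    samples: iterable of (depth, alpha, (r, g, b)) in arbitrary order.
    Returns the blended color and the leftover transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # product of (1 - alpha_j) over all nearer splats
    # The sort is the step that Mobile-GS aims to avoid on mobile GPUs.
    for _depth, alpha, rgb in sorted(samples, key=lambda s: s[0]):
        for k in range(3):
            color[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance


samples = [
    (2.0, 0.5, (0.0, 0.0, 1.0)),  # far splat, blue
    (1.0, 0.5, (1.0, 0.0, 0.0)),  # near splat, red
]
pixel, leftover = composite_sorted(samples)
print(pixel, leftover)  # (0.5, 0.0, 0.25) 0.25
```

Because each contribution is scaled by the transmittance of everything in front of it, swapping the two depths would swap the red and blue shares, which is why the classical formulation cannot skip the sort.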
Mobile-GS replaces explicit sorting with a depth-aware, order-independent rule:
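- For each overlapping Gaussian $i$ with color $c_i$, compute its alpha $\alpha_i$ and a learnable, depth-conditioned weight $w_i = w(d_i)$, where $d_i$ is the Gaussian's depth.
- Accumulate in parallel: $S_c = \sum_i \alpha_i w_i c_i$ and $S_w = \sum_i \alpha_i w_i$.
- The pixel color is the normalized weighted sum $C = S_c / S_w$.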
This approach is commutative and supports efficient, atomic accumulation on GPU, leading to O(N) per-pixel complexity (Du et al., 12 Mar 2026).
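A minimal sketch of such an order-independent rule follows, written as a generic normalized weighted sum. The exponential `depth_weight` is a hypothetical stand-in for the learned, depth-conditioned weight, and the exact accumulation used in Mobile-GS (for example, any background term) may differ.

```python
import math


def depth_weight(depth, sharpness=1.0):
    # Hypothetical stand-in for the learned weight: nearer splats count more.
    return math.exp(-sharpness * depth)


def composite_order_independent(samples, eps=1e-8):
    """Blend (depth, alpha, rgb) samples without sorting them."""
    sum_color = [0.0, 0.0, 0.0]
    sum_weight = 0.0
    for depth, alpha, rgb in samples:  # any order; sums are commutative
        w = alpha * depth_weight(depth)
        sum_weight += w
        for k in range(3):
            sum_color[k] += w * rgb[k]
    return tuple(c / (sum_weight + eps) for c in sum_color)


samples = [
    (2.0, 0.5, (0.0, 0.0, 1.0)),  # far, blue
    (1.0, 0.5, (1.0, 0.0, 0.0)),  # near, red
    (3.0, 0.8, (0.0, 1.0, 0.0)),  # farthest, green
]
forward = composite_order_independent(samples)
backward = composite_order_independent(list(reversed(samples)))
```

Here `forward` and `backward` agree up to floating-point rounding, since only sums are accumulated. On a GPU the two running sums could be updated with atomic additions, one thread per Gaussian-pixel pair.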
3. Neural View-Dependent Enhancement
Order-independent compositing introduces transparency artifacts when geometry overlaps due to loss of rendering order. Mobile-GS addresses this by introducing a compact neural module that conditions per-Gaussian contributions on the view direction and geometric attributes:
- A shared MLP ingests the camera-to-Gaussian vector, scale, rotation, and first-order SH coefficients, outputting a feature $f_i$.
- Two heads predict $\phi_i$ (modulating the weight $w_i$) and $\psi_i$ (additional opacity scaling).
- Enhanced compositing replaces $w_i$ with the modulated weight $\tilde{w}_i$ and $\alpha_i$ with the rescaled opacity $\tilde{\alpha}_i$, improving occlusion boundaries and view-dependent effects.
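The structure of such a module can be sketched as a shared trunk with two scalar heads. Everything below is an assumption for illustration: the layer sizes, the random initialization, the exponential and sigmoid output activations, and the class name `ViewDependentModule` are not taken from the paper.

```python
import math
import random


def linear(x, weights, bias):
    # Dense layer: one output per weight row.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ViewDependentModule:
    """Shared MLP trunk with a weight head and an opacity head (hypothetical sizes)."""

    def __init__(self, in_dim=22, hidden=8, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(rng, hidden, in_dim)
        self.weight_head = init_layer(rng, 1, hidden)
        self.opacity_head = init_layer(rng, 1, hidden)

    def __call__(self, view_vec, scale, rotation, sh1):
        x = list(view_vec) + list(scale) + list(rotation) + list(sh1)
        feature = [max(0.0, h) for h in linear(x, *self.trunk)]  # ReLU
        weight_mod = math.exp(linear(feature, *self.weight_head)[0])  # > 0
        logit = linear(feature, *self.opacity_head)[0]
        opacity_scale = 1.0 / (1.0 + math.exp(-logit))  # in (0, 1)
        return weight_mod, opacity_scale


module = ViewDependentModule()
weight_mod, opacity_scale = module(
    view_vec=(0.0, 0.0, 1.0),         # camera-to-Gaussian direction
    scale=(0.1, 0.2, 0.05),           # per-axis scale
    rotation=(1.0, 0.0, 0.0, 0.0),    # unit quaternion
    sh1=[0.5] * 12,                   # 4 first-order SH coefficients x 3 channels
)
```

In this sketch the two outputs would multiply the depth weight and the alpha of the Gaussian before accumulation; whether Mobile-GS combines them multiplicatively is an assumption here.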
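The model is trained with a loss comprising RGB, distillation (from the teacher 3DGS), and log-depth terms:
$$\mathcal{L} = \lambda_{\text{rgb}} \mathcal{L}_{\text{rgb}} + \lambda_{\text{distill}} \mathcal{L}_{\text{distill}} + \lambda_{\text{depth}} \mathcal{L}_{\text{depth}}$$
with scalar weights $\lambda_{\text{rgb}}$, $\lambda_{\text{distill}}$, and $\lambda_{\text{depth}}$; the weight values and the specific form of each loss term are given in (Du et al., 12 Mar 2026).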
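As a rough illustration of how the three terms combine, the sketch below uses mean absolute error for every term and placeholder weights. Both choices are assumptions: the paper's actual loss forms and weight values are not reproduced here, and the function names are hypothetical.

```python
import math


def mean_abs(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def total_loss(pred_rgb, gt_rgb, teacher_rgb, pred_depth, teacher_depth,
               lam_rgb=1.0, lam_distill=1.0, lam_depth=0.1, eps=1e-6):
    """Weighted sum of RGB, distillation, and log-depth terms (placeholder weights)."""
    l_rgb = mean_abs(pred_rgb, gt_rgb)            # student vs. ground-truth image
    l_distill = mean_abs(pred_rgb, teacher_rgb)   # student vs. teacher 3DGS render
    l_depth = mean_abs(                           # compare depths in log space
        [math.log(d + eps) for d in pred_depth],
        [math.log(d + eps) for d in teacher_depth],
    )
    return lam_rgb * l_rgb + lam_distill * l_distill + lam_depth * l_depth


loss = total_loss(
    pred_rgb=[0.5, 0.5], gt_rgb=[1.0, 0.0], teacher_rgb=[0.5, 0.5],
    pred_depth=[2.0, 4.0], teacher_depth=[2.0, 4.0],
)
print(loss)  # 0.5
```

Working in log space makes the depth term penalize relative rather than absolute depth error, so distant geometry does not dominate the gradient.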
4. Model Compression and Mobile Optimizations
To achieve a model size viable for mobile deployment (single-digit megabytes):
- First-Order Spherical Harmonics Distillation: The appearance representation is distilled from a 3rd-order SH basis (16 coefficients per color channel) to a 1st-order SH basis (4 coefficients per color channel) via a pixel-wise distillation loss, reducing storage by approximately two thirds without significant loss in visual fidelity.
- Neural Vector Quantization (NVQ): High-dimensional attribute vectors are split into sub-vectors, quantized into learned codebooks (via K-means), and Huffman-encoded. At inference, a small MLP reconstructs attributes. This leads to 80–90% overall storage reduction.
- Contribution-Based Pruning: Gaussians with low opacity and small scale are identified using quantile thresholds over the attribute distributions. Candidates that consistently under-contribute are removed after accumulating votes over several pruning steps, yielding a further 30–50% reduction in active primitives with negligible PSNR loss.
- Vulkan-based Rasterization: The tile-based rasterizer is replaced with a fullscreen compute shader that efficiently bins and composites Gaussians by bounding sphere, minimizing API state changes and enabling branch-free weight calculations.
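The vote-based pruning in the list above can be sketched as follows. This is a simplified reading, assuming a Gaussian becomes a candidate when both its opacity and its largest scale fall below the same quantile of their distributions; the quantile level, the vote count, and the function names are hypothetical.

```python
def quantile(values, q):
    # Simple order-statistic quantile (no interpolation).
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def vote_step(opacities, max_scales, votes, q=0.25):
    """Add one vote to every Gaussian that is both faint and small."""
    opacity_thr = quantile(opacities, q)
    scale_thr = quantile(max_scales, q)
    for i, (o, s) in enumerate(zip(opacities, max_scales)):
        if o < opacity_thr and s < scale_thr:
            votes[i] += 1
    return votes


def keep_indices(votes, min_votes):
    # Gaussians that reached the vote budget are pruned; the rest are kept.
    return [i for i, v in enumerate(votes) if v < min_votes]


opacities = [0.9, 0.05, 0.8, 0.02, 0.7, 0.6, 0.95, 0.85]
max_scales = [0.5, 0.01, 0.4, 0.3, 0.6, 0.02, 0.7, 0.8]
votes = [0] * len(opacities)
for _ in range(3):  # in training, attributes would change between steps
    vote_step(opacities, max_scales, votes)
kept = keep_indices(votes, min_votes=3)
print(kept)  # [0, 2, 3, 4, 5, 6, 7]
```

Only Gaussian 1 is both faint and small in every step, so it alone collects three votes. Requiring repeated votes guards against removing a primitive that is only temporarily weak during optimization.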
The resulting pipeline achieves ≈4.6 MB total model size (codebooks, MLP weights, Gaussian geometry) and a GPU memory footprint of ≈6 MB including buffers (Du et al., 12 Mar 2026).
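The codebooks counted in this model size come from the vector-quantization step. A minimal sketch of sub-vector quantization follows, assuming plain K-means codebooks and direct table lookup for decoding; the small reconstruction MLP and the Huffman coding stage described for Mobile-GS are omitted, and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def kmeans(vectors, k, iters=10, seed=0):
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centers, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster went empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def split(vec, sub_dim):
    return [vec[i:i + sub_dim] for i in range(0, len(vec), sub_dim)]


def train_codebooks(attrs, sub_dim, k):
    # One codebook per sub-vector position.
    columns = zip(*(split(a, sub_dim) for a in attrs))
    return [kmeans(list(col), k) for col in columns]


def encode(vec, codebooks, sub_dim):
    return [nearest(cb, sv) for cb, sv in zip(codebooks, split(vec, sub_dim))]


def decode(codes, codebooks):
    out = []
    for cb, code in zip(codebooks, codes):
        out.extend(cb[code])
    return out


rng = random.Random(1)
attrs = [[rng.random() for _ in range(4)] for _ in range(32)]
codebooks = train_codebooks(attrs, sub_dim=2, k=4)
codes = encode(attrs[0], codebooks, sub_dim=2)   # two small integers
restored = decode(codes, codebooks)              # approximate 4-D attribute
```

Each 4-dimensional attribute is stored as two codebook indices instead of four floats, which is where the bulk of the storage saving comes from in this kind of scheme.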
5. Experimental Results
Extensive experiments demonstrate that Mobile-GS achieves both high-fidelity rendering and real-time performance:
- Desktop (RTX 3090): 1098 FPS on unbounded scenes with 24.82 dB PSNR, 0.856 SSIM, and 4.8 MB storage.
- Mobile (Snapdragon 8 Gen 3): 116 FPS (cold), 74 FPS (steady-state, post-throttling) at 1600×1063, PSNR 27.12 dB, SSIM 0.807. The model size is ≈4.6 MB, outperforming quantized 3DGS and SortFreeGS baselines by 17% in PSNR and 14× in size reduction.
- Ablations: Removing order-independent rendering increases PSNR marginally (+0.2 dB) at a 4–5× speed penalty. Omitting neural view dependence drops PSNR by 0.5 dB and leads to transparency artifacts. Excluding NVQ enlarges model size to >120 MB.
A user study with 30 volunteers on standard benchmarks (Mip-NeRF 360, Tanks and Temples, Deep Blending) reported preference for Mobile-GS renderings in 64–79% of trials (Du et al., 12 Mar 2026).
6. Application in Real-Time Object-Centric Robotics
Mobile-GS also describes a variant pipeline for high-speed, point-of-interest (PoI) 3D reconstruction of individual objects on mobile robots, as demonstrated in "CoRe-GS" (Schieber et al., 5 Sep 2025):
- Coarse-to-Refined Staging: An initial semantic GS representation is constructed in 3,000 iterations, yielding per-splat class labels; refinement focuses on Gaussians constituting the PoI with color-based pruning to remove "floaters."
- Efficiency: End-to-end training time is reduced from ≈2000 s (full GS) to ≈460 s, achieving similar or better object-masked PSNR/SSIM compared to baselines on the NeRDS 360 and SCRREAM datasets.
- Color-Based Filtering: A per-view furthest-color algorithm computes removal thresholds for Gaussians that don't match the PoI color profile, periodically cleaning the scene for efficient, mask-centric optimization.
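One possible reading of the color-based filtering step is sketched below. It is an interpretation rather than the CoRe-GS algorithm itself: it assumes the per-view threshold is the distance from the mean PoI color to the furthest PoI pixel color, and that Gaussians whose color lies beyond that distance are removed. All names and sample values are hypothetical.

```python
import math


def furthest_color_threshold(poi_pixels):
    """Mean PoI color and the distance to the furthest PoI pixel color."""
    mean = tuple(sum(channel) / len(poi_pixels) for channel in zip(*poi_pixels))
    radius = max(math.dist(p, mean) for p in poi_pixels)
    return mean, radius


def filter_gaussians(gaussian_colors, poi_pixels, slack=1.0):
    # Keep Gaussians whose color lies within the PoI color radius for this view.
    mean, radius = furthest_color_threshold(poi_pixels)
    return [i for i, c in enumerate(gaussian_colors)
            if math.dist(c, mean) <= slack * radius]


poi_pixels = [(0.8, 0.1, 0.1), (1.0, 0.0, 0.0), (0.9, 0.2, 0.1)]  # reddish object
gaussian_colors = [
    (0.9, 0.1, 0.1),     # matches the object
    (0.1, 0.9, 0.1),     # green floater
    (0.95, 0.05, 0.05),  # matches the object
]
kept = filter_gaussians(gaussian_colors, poi_pixels)
print(kept)  # [0, 2]
```

Running such a filter periodically during refinement, once per view, would discard stray "floaters" whose color is inconsistent with the masked object.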
Limitations include dependence on semantic segmentation quality and static scene assumptions; current implementations still require desktop-class GPUs (Schieber et al., 5 Sep 2025).
7. Limitations and Future Perspectives
Remaining challenges for Mobile-GS include:
- Training remains computationally intensive; present approaches require ≈1.5 h/model on high-end GPUs, precluding on-device retraining or adaptation.
- The methodology is strictly scene-specific; no provisions exist for zero-shot or dynamic-scene rendering. Generalizing to such settings requires meta-learning or online adaptation architectures.
- Quantization may introduce subtle color drifts in high-frequency details. Learned entropy models or hybrid quantization could improve this trade-off.
- Further acceleration may be realized through hardware-accelerated, order-independent blending or foveated rendering (region-of-interest prioritization).
- For robotics settings, future directions suggested include on-device neural segmentation for PoI masking, adaptive iteration budgets for early exit, and cooperative multi-agent partial map sharing (Du et al., 12 Mar 2026, Schieber et al., 5 Sep 2025).
References
- "Mobile-GS: Real-time Gaussian Splatting for Mobile Devices" (Du et al., 12 Mar 2026)
- "CoRe-GS: Coarse-to-Refined Gaussian Splatting with Semantic Object Focus" (Schieber et al., 5 Sep 2025)