Mobile-GS: Mobile 3D Gaussian Splatting

Updated 13 March 2026

Mobile-GS is a methodology that enables efficient 3D Gaussian Splatting on mobile devices through a two-stage pipeline combining offline compression and on-device inference.
It integrates depth-aware, order-independent blending with a compact neural module for view-dependent enhancements, improving occlusion handling and rendering quality.
Mobile-GS leverages techniques like neural vector quantization and contribution-based pruning to achieve substantial model size reduction while maintaining competitive PSNR and SSIM.

Mobile-GS denotes a family of methodologies, systems, and metrics enabling 3D Gaussian Splatting (3DGS) or related computationally intensive tasks to operate efficiently on resource-constrained mobile platforms. Current research frames Mobile-GS both as an algorithmic advance in neural scene rendering and as a practical strategy for rapid, high-fidelity 3D reconstructions in robotics, extended reality, and real-time graphics. This article focuses on Mobile-GS as introduced in recent systems with explicit mobile and edge device deployment in mind, particularly the depth-aware, order-independent, and compressed 3DGS pipeline defined in (Du et al., 12 Mar 2026), while also connecting to real-time object-centric variations for robotics (Schieber et al., 5 Sep 2025).

1. Pipeline Overview and Motivation

Mobile-GS targets the prohibitive computational cost and memory requirements of classical 3D Gaussian Splatting when executed on mobile hardware. The approach introduces a two-stage pipeline:

Offline Training/Compression (typically on desktop GPUs):

- A high-fidelity "teacher" 3DGS model is constructed with a fine appearance basis (e.g., 3rd-order spherical harmonics).
- A "student" model comprising anisotropic Gaussian primitives is initialized and optimized to match the teacher's colors and depths under a novel, order-independent compositing strategy.
- Appearance features are distilled, quantized, and pruned for size and inference efficiency.

Online Inference (on-device, mobile GPU):
- All rendering is executed via a Vulkan-based compute pipeline, utilizing order-independent, parallel blending and compact neural modules to achieve real-time frame rates at full mobile resolution.

This pipeline concretely addresses two central bottlenecks in mobile deployment: the need to sort millions of Gaussians by depth (an O(N log N) operation, with N ≫ 10⁴–10⁵ in typical scenes) and the prohibitive storage footprint of uncompressed neural or analytic radiance fields (Du et al., 12 Mar 2026).

2. Mathematical Foundation

Each Gaussian primitive $G_i$ is parameterized by a center $p_i \in \mathbb{R}^3$, a covariance matrix $\Sigma_i \in \mathbb{R}^{3\times3}$ (encoded as rotation and scale), appearance features $Y_i$ (spherical harmonics coefficients), and an opacity $o_i$. The spatial density is given by:

$G_i(x) = \exp\left[ -(x - p_i)^T \Sigma_i^{-1}(x - p_i) \right].$
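The density above can be evaluated without inverting $\Sigma_i$ explicitly. The sketch below assumes the common factorization $\Sigma_i = R S S^T R^T$, with $R$ built from a unit quaternion and $S$ a diagonal scale matrix. The function names and the (w, x, y, z) quaternion convention are illustrative choices, not taken from the source.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def gaussian_density(x, center, quat, scale):
    """Evaluate exp(-(x - p)^T Sigma^{-1} (x - p)), assuming Sigma = R S S^T R^T."""
    R = quat_to_rotmat(quat)
    d = [x[k] - center[k] for k in range(3)]
    # Sigma^{-1} = R S^{-2} R^T, so the quadratic form equals ||S^{-1} R^T d||^2.
    local = [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]  # R^T d
    quad = sum((local[c] / scale[c]) ** 2 for c in range(3))
    return math.exp(-quad)
```

Working in the Gaussian's local frame keeps the evaluation to a rotation and three divisions, which is also why rotation and scale are a convenient storage format for the covariance.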

Classical rendering involves projecting Gaussians to each pixel, sorting by depth, and compositing via alpha blending:

$C = \sum_{i=1}^N T_{i-1} a_i c_i,$

with transmittance $T_i = \prod_{j=1}^i (1 - a_j)$ and alpha $a_i = 1 - \exp(-\Lambda_i)$, where $\Lambda_i$ is the integrated density along the ray.
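As a point of reference, classical front-to-back compositing for a single pixel can be sketched in a few lines. The tuple layout `(depth, alpha, rgb)` and the function name are assumptions made for illustration.

```python
def composite_sorted(samples):
    """Front-to-back alpha blending for one pixel.

    samples: list of (depth, alpha, (r, g, b)) tuples in any order.
    Returns (color, residual_transmittance).
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_0 = 1: nothing has occluded the ray yet
    for _, alpha, rgb in sorted(samples, key=lambda s: s[0]):  # the costly sort
        for k in range(3):
            color[k] += transmittance * alpha * rgb[k]  # T_{i-1} * a_i * c_i
        transmittance *= 1.0 - alpha  # T_i = T_{i-1} * (1 - a_i)
    return color, transmittance
```

Swapping the depths of two samples changes the result, which is exactly the order dependence that forces the per-frame sort.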

Mobile-GS replaces explicit sorting with a depth-aware, order-independent rule:

For each overlapping Gaussian, compute its alpha $a_i$ and a learnable, depth-conditioned weight $w_i$.
Accumulate in parallel:
- $S_c = \sum_{i=1}^N w_i a_i c_i$ (weighted color)
- $S_w = \sum_{i=1}^N w_i a_i$ (weight normalizer)
- $T_N = \prod_{i=1}^N (1 - a_i)$ (residual transmittance)
The pixel color is

$C = (1 - T_N)\,\frac{S_c}{S_w} + T_N\, c_{bg},$

where $c_{bg}$ is the background color.

This approach is commutative and supports efficient, atomic accumulation on GPU, leading to O(N) per-pixel complexity (Du et al., 12 Mar 2026).
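The commutativity claim can be checked with a small sketch. It assumes a weighted-blending form in which each Gaussian contributes to a weighted color sum, a weight sum, and a transmittance product. The exponential depth falloff `exp(-beta * depth)` is a hypothetical stand-in for the learned weight, and the function and parameter names are illustrative.

```python
import math

def blend_order_independent(samples, background=(0.0, 0.0, 0.0), beta=1.0):
    """Order-independent blending for one pixel (illustrative weight function).

    samples: iterable of (depth, alpha, (r, g, b)) tuples; no sorting is done.
    """
    weighted_color = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    transmittance = 1.0
    for depth, alpha, rgb in samples:
        w = math.exp(-beta * depth)  # assumed stand-in for the learned weight
        for k in range(3):
            weighted_color[k] += w * alpha * rgb[k]
        weight_sum += w * alpha
        transmittance *= 1.0 - alpha
    if weight_sum == 0.0:
        return list(background)  # nothing covers this pixel
    coverage = 1.0 - transmittance
    return [
        coverage * weighted_color[k] / weight_sum + transmittance * background[k]
        for k in range(3)
    ]
```

Because every accumulator is a plain sum or product, the loop body maps onto atomic adds and multiplies, and the result is the same (up to floating-point rounding) for any traversal order.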

3. Neural View-Dependent Enhancement

Order-independent compositing introduces transparency artifacts when geometry overlaps due to loss of rendering order. Mobile-GS addresses this by introducing a compact neural module that conditions per-Gaussian contributions on the view direction and geometric attributes:

A shared MLP $f_\theta$ ingests the camera-to-Gaussian vector, scale, rotation, and first-order SH coefficients, outputting a feature $h_i$.
Two heads predict $\phi_i$ (modulating the depth-conditioned weight $w_i$) and $\psi_i$ (additional opacity scaling).
Enhanced compositing replaces $w_i$ with $\phi_i w_i$ and $a_i$ with $\psi_i a_i$, improving occlusion boundaries and view-dependent effects.
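A minimal sketch of such a two-headed module follows. Everything specific here is an assumption for illustration: the 22-dimensional input (3 view + 3 scale + 4 quaternion + 12 first-order SH values), the 16-unit hidden layer, the softplus and sigmoid output activations, and the random, untrained parameters.

```python
import math
import random

def make_layer(rng, n_in, n_out):
    """Randomly initialized dense layer: (weights, biases)."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # numerically stable

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(seed=0, n_in=22, n_hidden=16):
    rng = random.Random(seed)
    return {
        "trunk": make_layer(rng, n_in, n_hidden),
        "weight_head": make_layer(rng, n_hidden, 1),
        "opacity_head": make_layer(rng, n_hidden, 1),
    }

def view_dependent_factors(params, cam_pos, center, scale, quat, sh1):
    """Return (weight modulation, opacity scaling) for one Gaussian."""
    d = [c - p for c, p in zip(center, cam_pos)]
    norm = math.sqrt(sum(v * v for v in d)) or 1.0
    x = [v / norm for v in d] + list(scale) + list(quat) + list(sh1)
    h = [max(0.0, v) for v in linear(params["trunk"], x)]  # shared feature (ReLU)
    weight_mod = softplus(linear(params["weight_head"], h)[0])   # kept positive
    opacity_scale = sigmoid(linear(params["opacity_head"], h)[0])  # in (0, 1)
    return weight_mod, opacity_scale
```

Sharing one trunk across all Gaussians keeps the parameter count independent of scene size, so the module adds only a fixed number of weights to the stored model.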

The model is trained with a loss comprising RGB, distillation (from the teacher 3DGS), and log-depth terms:

$\mathcal{L} = \lambda_{\mathrm{rgb}} \mathcal{L}_{\mathrm{rgb}} + \lambda_{\mathrm{distill}} \mathcal{L}_{\mathrm{distill}} + \lambda_{\mathrm{depth}} \mathcal{L}_{\mathrm{depth}},$

with scalar weights $\lambda_{\mathrm{rgb}}$, $\lambda_{\mathrm{distill}}$, and $\lambda_{\mathrm{depth}}$; the specific form of each loss term is given in (Du et al., 12 Mar 2026).
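To make the structure of a three-term objective concrete, the sketch below combines an RGB term, a teacher-distillation term, and a log-depth term. The L1 form of each term, the `eps` guard, and the argument names are assumptions; the source does not reproduce the paper's exact loss definitions or weight values here, so the weights are left as required arguments.

```python
import math

def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def total_loss(render, target, teacher, depth, teacher_depth,
               lam_rgb, lam_distill, lam_depth, eps=1e-6):
    """Weighted sum of RGB, distillation, and log-depth terms (assumed L1 forms)."""
    loss_rgb = l1(render, target)        # student vs. ground-truth pixels
    loss_distill = l1(render, teacher)   # student vs. teacher 3DGS render
    loss_depth = l1(
        [math.log(d + eps) for d in depth],
        [math.log(d + eps) for d in teacher_depth],
    )                                    # compare depths in log space
    return lam_rgb * loss_rgb + lam_distill * loss_distill + lam_depth * loss_depth
```

Comparing depths in log space penalizes relative rather than absolute error, so distant geometry does not dominate the depth term.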

4. Model Compression and Mobile Optimizations

To achieve a model size viable for mobile deployment (single-digit megabytes):

First-Order Spherical Harmonics Distillation: The appearance representation is distilled from a 3rd-order SH basis (16 coefficients per color channel) to a 1st-order SH basis (4 coefficients per color channel) via a pixel-wise distillation loss, reducing storage by approximately two thirds without significant loss in visual fidelity.
Neural Vector Quantization (NVQ): High-dimensional attribute vectors are split into sub-vectors, quantized into learned codebooks (via K-means), and Huffman-encoded. At inference, a small MLP reconstructs attributes. This leads to 80–90% overall storage reduction.
Contribution-Based Pruning: Gaussians with low opacity and small scale are identified using quantile thresholds over the attribute distributions. Candidates that consistently under-contribute are removed after accumulating votes over several pruning steps, yielding a further 30–50% reduction in active primitives with negligible PSNR loss.
Vulkan-based Rasterization: The tile-based rasterizer is replaced with a fullscreen compute shader that efficiently bins and composites Gaussians by bounding sphere, minimizing API state changes and enabling branch-free weight calculations.
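The voting scheme behind contribution-based pruning can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the simple index-based quantile, the 0.25 quantile level, and the use of a single scalar scale per Gaussian are all assumptions.

```python
def quantile(values, q):
    """Simple quantile: element at position floor(q * (n - 1)) of the sorted list."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, int(q * (len(ordered) - 1))))
    return ordered[index]

def vote_step(opacities, scales, votes, q=0.25):
    """Add one vote to every Gaussian that is both low-opacity and small."""
    opacity_threshold = quantile(opacities, q)
    scale_threshold = quantile(scales, q)
    for i, (opacity, scale) in enumerate(zip(opacities, scales)):
        if opacity <= opacity_threshold and scale <= scale_threshold:
            votes[i] += 1

def surviving_indices(votes, min_votes):
    """Keep Gaussians that were flagged fewer than min_votes times."""
    return [i for i, v in enumerate(votes) if v < min_votes]
```

In a training loop, `vote_step` would be called at each pruning step with the current attribute values, and `surviving_indices` applied once enough steps have accumulated. Requiring repeated votes avoids deleting a Gaussian that is only temporarily small or transparent during optimization.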

The resulting pipeline achieves ≈4.6 MB total model size (codebooks, MLP weights, Gaussian geometry) and a GPU memory footprint of ≈6 MB including buffers (Du et al., 12 Mar 2026).
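The codebook part of the NVQ step described in this section can be illustrated with plain sub-vector quantization. The sketch covers only the split, K-means, and index-lookup stages; the Huffman coding and the small decoder MLP are omitted. Function names, the fixed iteration count, and the seeded initialization are assumptions.

```python
import random

def nearest(centers, point):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(centers[j], point)))

def kmeans(points, k, iters=10, seed=0):
    """Basic Lloyd iterations with seeded random initialization."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(centers, p)].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # leave empty clusters where they are
                centers[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centers

def quantize(vectors, sub_dim, k):
    """Split each vector into sub-vectors and learn one codebook per slot."""
    n_sub = len(vectors[0]) // sub_dim
    codebooks, codes = [], []
    for s in range(n_sub):
        subs = [tuple(v[s * sub_dim:(s + 1) * sub_dim]) for v in vectors]
        codebook = kmeans(subs, k)
        codebooks.append(codebook)
        codes.append([nearest(codebook, p) for p in subs])
    return codebooks, codes

def reconstruct(codebooks, codes, i):
    """Rebuild vector i by concatenating its selected codewords."""
    out = []
    for codebook, indices in zip(codebooks, codes):
        out.extend(codebook[indices[i]])
    return out
```

After quantization each Gaussian stores only one small integer per sub-vector slot, while the codebooks are shared across the whole scene, which is where most of the storage saving comes from.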

5. Experimental Results

Extensive experiments demonstrate that Mobile-GS achieves both high-fidelity rendering and real-time performance:

Desktop (RTX 3090): 1098 FPS on unbounded scenes with 24.82 dB PSNR, 0.856 SSIM, and 4.8 MB storage.
Mobile (Snapdragon 8 Gen 3): 116 FPS (cold), 74 FPS (steady-state, post-throttling) at 1600×1063, PSNR 27.12 dB, SSIM 0.807. The model size is ≈4.6 MB, outperforming quantized 3DGS and SortFreeGS baselines by 17% in PSNR and 14× in size reduction.
Ablations: Removing order-independent rendering increases PSNR marginally (+0.2 dB) at a 4–5× speed penalty. Omitting neural view dependence drops PSNR by 0.5 dB and leads to transparency artifacts. Excluding NVQ enlarges model size to >120 MB.

A user study with 30 volunteers on standard benchmarks (Mip-NeRF 360, Tanks and Temples, Deep Blending) reported a preference for Mobile-GS renderings in 64–79% of trials (Du et al., 12 Mar 2026).

6. Application in Real-Time Object-Centric Robotics

Mobile-GS also describes a variant pipeline for high-speed, object-centric reconstruction of points of interest (PoIs) on mobile robots, as demonstrated in "CoRe-GS" (Schieber et al., 5 Sep 2025):

Coarse-to-Refined Staging: An initial semantic GS representation is constructed in 3,000 iterations, yielding per-splat class labels; refinement focuses on Gaussians constituting the PoI with color-based pruning to remove "floaters."
Efficiency: End-to-end training time is reduced from ≈2000 s (full GS) to ≈460 s, achieving similar or better object-masked PSNR/SSIM compared to baselines on the NeRDS 360 and SCRREAM datasets.
Color-Based Filtering: A per-view furthest-color algorithm computes removal thresholds for Gaussians that don't match the PoI color profile, periodically cleaning the scene for efficient, mask-centric optimization.
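One plausible reading of the color-based filtering step is sketched below. It is hypothetical and not taken from the CoRe-GS code: the threshold is assumed to be the distance from the mean PoI color to the furthest PoI color in a view, and any Gaussian whose color lies beyond that distance is flagged as a floater. All function names are illustrative.

```python
import math

def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def furthest_color_threshold(poi_colors):
    """Mean PoI color and the distance to the PoI color furthest from it."""
    mean = [sum(channel) / len(poi_colors) for channel in zip(*poi_colors)]
    threshold = max(color_distance(c, mean) for c in poi_colors)
    return mean, threshold

def floater_indices(gaussian_colors, poi_colors):
    """Indices of Gaussians whose color falls outside the PoI color profile."""
    mean, threshold = furthest_color_threshold(poi_colors)
    return [i for i, c in enumerate(gaussian_colors)
            if color_distance(c, mean) > threshold]
```

In a per-view setting, `poi_colors` would come from the pixels inside the semantic mask for that view, and the flagged indices would be removed at the periodic cleaning step.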

Limitations include dependence on semantic segmentation quality and static scene assumptions; current implementations still require desktop-class GPUs (Schieber et al., 5 Sep 2025).

7. Limitations and Future Perspectives

Remaining challenges for Mobile-GS include:

Training remains computationally intensive; present approaches require ≈1.5 h/model on high-end GPUs, precluding on-device retraining or adaptation.
The methodology is strictly scene-specific; no provisions exist for zero-shot or dynamic-scene rendering. Generalizing to such settings requires meta-learning or online adaptation architectures.
Quantization may introduce subtle color drifts in high-frequency details. Learned entropy models or hybrid quantization could improve this trade-off.
Further acceleration may be realized through hardware-accelerated, order-independent blending or foveated rendering (region-of-interest prioritization).
For robotics settings, future directions suggested include on-device neural segmentation for PoI masking, adaptive iteration budgets for early exit, and cooperative multi-agent partial map sharing (Du et al., 12 Mar 2026, Schieber et al., 5 Sep 2025).

References

"Mobile-GS: Real-time Gaussian Splatting for Mobile Devices" (Du et al., 12 Mar 2026)
"CoRe-GS: Coarse-to-Refined Gaussian Splatting with Semantic Object Focus" (Schieber et al., 5 Sep 2025)
