PocketGS: On-Device 3D Gaussian Splatting
- PocketGS is a mobile-compatible 3D Gaussian Splatting system that delivers high-fidelity, real-time scene modeling using resource-efficient, on-device training.
- It co-designs three operators: 𝒢 for geometry-prior construction, ℐ for anisotropic Gaussian initialization, and 𝒯 for hardware-aligned differentiable optimization, to meet stringent mobile constraints.
- Empirical evaluations show that PocketGS matches or surpasses workstation 3DGS pipelines in perceptual quality at comparable or lower runtime, while keeping peak memory usage below 3 GB.
PocketGS is a fully on-device 3D Gaussian Splatting (3DGS) training system designed to enable high-fidelity, efficient 3D scene modeling directly on resource-constrained mobile devices such as smartphones. By jointly addressing the stringent requirements of minute-scale training budgets, strict peak-memory caps (below 3 GB), and hardware-accelerated differentiable optimization, PocketGS delivers real-time novel-view synthesis that matches or surpasses workstation-grade 3DGS pipelines in perceptual fidelity. The paradigm is underpinned by three co-designed operators: 𝒢 for geometry-prior construction, ℐ for prior-conditioned anisotropic Gaussian initialization, and 𝒯 for hardware-aligned differentiable optimization. This design enables capture-to-rendering workflows entirely on-device, as substantiated by empirical comparisons and ablation studies (Guo et al., 24 Jan 2026).
1. On-Device 3DGS: Motivations and Constraints
PocketGS directly targets the limitations of executing 3DGS training on mobile devices, which differ markedly from desktop environments where memory, compute, and time budgets are largely unconstrained. On devices such as the iPhone 15, PocketGS achieves:
- End-to-end training within 5 minutes (500 iterations ≈ 4 minutes on Apple A16).
- Peak memory usage under 3 GB, covering both geometry prior formation and parameter optimization.
- Correct backpropagation of gradients solely through GPU-side operations, avoiding expensive CPU-GPU synchronizations.
Naive application of desktop 3DGS fails under these constraints, confronting three primary contradictions:
- Input-Recovery Contradiction: Mobile RGB-D scans yield noisy, sparse geometric inputs; naively densifying these within the training loop inflates computational and memory demands.
- Initialization-Convergence Contradiction: Isotropic Gaussian seeding requires excessive iterations to organize primitives onto scene surfaces—unacceptable when runtime is tightly bounded.
- Hardware-Differentiability Contradiction: Tile-based deferred rendering on mobile GPUs obscures blending state, thus hindering correct and efficient backpropagation without prohibitive overhead.
PocketGS resolves these issues through the co-design of 𝒢, ℐ, and 𝒯, aligning algorithmic and hardware constraints.
2. Operator 𝒢: Geometry-Prior Construction
Operator 𝒢 synthesizes a geometry-faithful, memory-efficient dense point cloud prior to initialize the scene, comprising the following subsystems:
2.1 Information-Gated Frame Subsampling
To reduce processing load without sacrificing geometric diversity, PocketGS selects keyframes by:
- Displacement gate: admitting frames whose translation relative to the last keyframe exceeds a fixed threshold measured in centimeters.
- Sharpness gate: scoring frames by an approximate image-gradient energy, e.g. $E(I) = \sum_p \lVert \nabla I(p) \rVert^2$.
- Windowing: within every 8-frame window, a candidate frame replaces the current best only if it passes both the displacement and sharpness gates.
This gating strictly bounds both bundle-adjustment (BA) and multi-view stereo (MVS) workloads, conservatively admitting only 8–15 keyframes for typical captures.
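The gating logic above can be sketched as follows. This is an illustrative NumPy implementation, not the paper's code; the displacement threshold `min_disp` and the window size are assumed placeholder values.

```python
import numpy as np

def select_keyframes(poses, sharpness, window=8, min_disp=0.1):
    """Information-gated keyframe selection (illustrative sketch).

    poses: (N, 3) camera translations; sharpness: (N,) gradient-energy
    scores. `min_disp` (meters) and `window` are assumed values, not
    the paper's exact thresholds.
    """
    keep = []
    last_t = None  # translation of the last admitted keyframe
    for start in range(0, len(poses), window):
        best = None
        for i in range(start, min(start + window, len(poses))):
            # Displacement gate: require enough motion since the last keyframe.
            if last_t is not None and np.linalg.norm(poses[i] - last_t) < min_disp:
                continue
            # Sharpness gate: keep the sharpest admissible frame in the window.
            if best is None or sharpness[i] > sharpness[best]:
                best = i
        if best is not None:
            keep.append(best)
            last_t = poses[best]
    return keep
```

Admitting at most one frame per window is what bounds the downstream BA and MVS workloads.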
2.2 GPU-Native Global Bundle Adjustment
Refinement of ARKit poses and sparse points employs a robust, fully GPU-based Schur-complement solver to minimize the Huber-regularized reprojection loss

$E(\{T_i\}, \{X_j\}) = \sum_{(i,j)} \rho\left(\lVert \pi(T_i X_j) - u_{ij} \rVert^2\right)$

where $\pi$ denotes camera projection, $\rho$ is the Huber loss, $T_i$ are camera poses, $X_j$ sparse 3D points, and $u_{ij}$ the observed keypoints. The Hessian blocks are partitioned and inverted in parallel, avoiding CPU-GPU round-trips and yielding a clean, high-precision sparse point cloud.
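The Schur-complement trick that makes point elimination cheap can be shown on a toy normal-equation system. This is a minimal NumPy sketch with synthetic block sizes (2 cameras of 6 parameters, 5 points of 3 parameters), not the paper's solver; in real BA the point block is block-diagonal, so its inverse parallelizes per-point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy normal equations: 2 camera blocks (6 params each) and
# 5 point blocks (3 params each); values are synthetic.
nc, npts = 2 * 6, 5 * 3
A = rng.standard_normal((nc + npts, nc + npts))
H = A @ A.T + np.eye(nc + npts)   # SPD Hessian approximation
b = rng.standard_normal(nc + npts)

Hcc, Hcp = H[:nc, :nc], H[:nc, nc:]
Hpc, Hpp = H[nc:, :nc], H[nc:, nc:]
bc, bp = b[:nc], b[nc:]

# In real BA, Hpp is block-diagonal (3x3 per point), so inverting it
# is embarrassingly parallel; here we invert the whole block for brevity.
Hpp_inv = np.linalg.inv(Hpp)

# Schur complement: eliminate points, solve the small camera system,
# then back-substitute the point updates.
S = Hcc - Hcp @ Hpp_inv @ Hpc
rhs = bc - Hcp @ Hpp_inv @ bp
dx_c = np.linalg.solve(S, rhs)
dx_p = Hpp_inv @ (bp - Hpc @ dx_c)

# Reference: direct solve of the full system.
dx_full = np.linalg.solve(H, b)
```

The reduced camera system is tiny (12x12 here), which is why this structure maps well onto a single GPU dispatch.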
2.3 Single-Reference Cost-Volume MVS
Dense reconstruction relies on a single-reference plane-sweep MVS, selecting the optimal reference frame to maximize:
$S_{\mathrm{ref}} = \exp\left(-\frac{(b - b_\mathrm{target})^2}{2\sigma_b^2}\right) \max\left(\frac{\alpha}{\alpha_\min}, 1\right)$
where $b$ is the baseline and $\alpha$ the viewing angle, and depths are sampled between lower and upper quantiles of the existing sparse depths. Census-transform matching costs with Semi-Global Matching produce depth maps, which are fused into the dense point cloud after discarding pixels with confidence below $0.4$.
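The reference-frame score can be sketched directly from the formula above. The parameter values (`b_target`, `sigma_b`, `alpha_min`) are illustrative assumptions, not the paper's constants.

```python
import math

def ref_score(baseline, angle, b_target=0.2, sigma_b=0.05, alpha_min=5.0):
    """S_ref = exp(-(b - b_target)^2 / (2 sigma_b^2)) * max(alpha/alpha_min, 1).

    `baseline` in meters, `angle` in degrees; all defaults are assumed
    placeholder values for illustration.
    """
    gaussian = math.exp(-((baseline - b_target) ** 2) / (2.0 * sigma_b ** 2))
    return gaussian * max(angle / alpha_min, 1.0)
```

The frame maximizing this score over all candidates becomes the single sweep reference, keeping the cost volume to one depth map's worth of memory.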
3. Operator ℐ: Prior-Conditioned Gaussian Initialization
Operator ℐ addresses initialization-convergence inefficiency by directly embedding local surface statistics into each Gaussian's parameterization:
3.1 Local Covariance and Normal Estimation
For each point $p_i$ in the dense point cloud, the local covariance is computed from its $k$ nearest neighbors:

$\Sigma_i = \frac{1}{k} \sum_{p_j \in \mathcal{N}_k(p_i)} (p_j - \bar{p}_i)(p_j - \bar{p}_i)^{\top}$

where $\bar{p}_i$ is the neighborhood centroid. The eigenvector of the smallest eigenvalue of $\Sigma_i$ yields the surface normal $n_i$ at $p_i$.
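This PCA-based normal estimation can be sketched as below. The brute-force neighbor search and `k=8` are illustrative simplifications; a real pipeline would use a spatial index.

```python
import numpy as np

def estimate_normals(points, k=8):
    """Per-point covariance + normal via PCA over k nearest neighbors.

    Brute-force O(N^2) neighbor search for clarity; a k-d tree or GPU
    grid would be used in practice. `k=8` is an assumed value.
    """
    normals = np.empty_like(points)
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(d)[:k]]          # k nearest (includes p itself)
        centered = nbrs - nbrs.mean(axis=0)
        cov = centered.T @ centered / len(nbrs)   # local 3x3 covariance
        # eigh returns eigenvalues in ascending order; the eigenvector
        # of the smallest eigenvalue is the surface normal direction.
        _, v = np.linalg.eigh(cov)
        normals[i] = v[:, 0]
    return normals
```

For points sampled from a plane, the recovered normals align (up to sign) with the plane normal.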
3.2 Disc-Like Covariance Seeding
Tangential and normal scales are derived from the eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3$ of the local covariance, e.g. $s_{t,1} = \sqrt{\lambda_1}$, $s_{t,2} = \sqrt{\lambda_2}$, $s_n = \sqrt{\lambda_3}$, so each seed is flat along the normal. The Gaussian covariance is parameterized as

$\Sigma = R \, \mathrm{diag}(s_{t,1}^2, s_{t,2}^2, s_n^2) \, R^{\top}$

where $R$ aligns the local $z$-axis to the normal $n_i$. Opacity logits are initialized to a fixed constant, and all scales are optimized in log space for numerical stability. This anisotropic, surface-aligned seeding substantially reduces convergence time and conditions the model for rapid high-fidelity reconstruction.
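Constructing the disc-like covariance from a normal and two scales can be sketched as follows. The frame construction (picking a tangent basis via cross products) is a standard technique assumed here, not taken from the paper.

```python
import numpy as np

def disc_covariance(normal, s_tangent, s_normal):
    """Sigma = R diag(s_t^2, s_t^2, s_n^2) R^T, with R rotating the
    local z-axis onto the estimated surface normal (sketch)."""
    n = normal / np.linalg.norm(normal)
    # Build an orthonormal tangent frame (t1, t2, n); the auxiliary
    # axis is chosen to avoid near-parallel cross products.
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, a); t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    R = np.stack([t1, t2, n], axis=1)         # columns: tangent, tangent, normal
    S2 = np.diag([s_tangent**2, s_tangent**2, s_normal**2])
    return R @ S2 @ R.T
```

With the normal along $z$, a large tangential scale and a small normal scale yield a flat disc in the $xy$-plane, which is exactly the surface-aligned seed the operator targets.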
4. Operator 𝒯: Hardware-Aligned Differentiable Splatting
Operator 𝒯 enables correct and efficient differentiable rendering on tile-based mobile GPUs:
4.1 Unrolled Alpha-Compositing with Forward Replay Cache
Manually unrolling the front-to-back alpha-compositing equation

$C = \sum_{i} c_i \, \alpha_i \, T_i, \qquad T_i = \prod_{j < i} (1 - \alpha_j),$

PocketGS stores a minimal per-pixel replay cache and counter buffer, permitting the correct gradient computation:

$\frac{\partial C}{\partial \alpha_i} = c_i T_i - \frac{1}{1 - \alpha_i} \sum_{j > i} c_j \alpha_j T_j$
This obviates the need for full splat lists or framebuffer readbacks.
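A minimal forward/backward pair illustrates why a small replay cache suffices. This sketch assumes the cache holds the final transmittance (one interpretation of the replay cache; the paper's exact cache layout is not specified here): the backward pass walks splats back-to-front, recovering each pre-blend transmittance by dividing the cached value back out, so no per-splat list or framebuffer readback is needed.

```python
import numpy as np

def composite_forward(colors, alphas):
    """Front-to-back alpha compositing. Returns the pixel color and the
    final transmittance (the assumed per-pixel 'replay cache')."""
    C, T = np.zeros(3), 1.0
    for c, a in zip(colors, alphas):
        C += T * a * c
        T *= (1.0 - a)
    return C, T

def composite_backward(colors, alphas, T_final, dC):
    """Analytic gradients recovered back-to-front by replaying the
    transmittance: T_i = T_{i+1} / (1 - alpha_i). `dC` is dL/dC."""
    n = len(alphas)
    d_c, d_a = np.zeros((n, 3)), np.zeros(n)
    T_next, behind = T_final, np.zeros(3)   # behind = sum_{j>i} c_j a_j T_j
    for i in range(n - 1, -1, -1):
        T_i = T_next / (1.0 - alphas[i])
        d_c[i] = dC * alphas[i] * T_i
        d_a[i] = dC @ (colors[i] * T_i - behind / (1.0 - alphas[i]))
        behind += colors[i] * alphas[i] * T_i
        T_next = T_i
    return d_c, d_a
```

The gradients match finite differences of the forward pass, confirming the replay recurrence implements the unrolled derivative above.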
4.2 Index-Mapped Gradient Scattering
Rendering requires parameter vectors to be arranged by depth. Gaussians are sorted on-GPU, producing an index mapping $\sigma$ from render order to storage order. Gradients are gathered in depth order during the forward pass and scattered back through $\sigma$ on the backward pass:

$\nabla \theta_{\sigma(i)} \leftarrow \nabla \tilde{\theta}_i$

where $\tilde{\theta}$ denotes the depth-sorted parameters.
Index mapping preserves optimizer state alignment and allows seamless backpropagation without CPU intervention.
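The gather/scatter pattern is easy to see with array indexing. This NumPy sketch uses toy depths and scalar parameters; on-device this would be an in-place GPU sort plus indexed writes.

```python
import numpy as np

# Toy example: three Gaussians with scalar parameters, sorted by depth.
depths = np.array([0.9, 0.1, 0.5])
params = np.array([[1.0], [2.0], [3.0]])

order = np.argsort(depths)          # index map sigma: render slot -> storage slot
sorted_params = params[order]       # gather for front-to-back rendering

# Backward pass produces gradients in render (depth) order...
grads_sorted = np.array([[10.0], [20.0], [30.0]])

# ...which are scattered back so the optimizer sees storage order.
grads = np.empty_like(params)
grads[order] = grads_sorted
```

Because the scatter restores storage order, the Adam moment buffers never need to be re-sorted, which is the "optimizer state alignment" property noted above.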
4.3 On-GPU Adam Updates
All Adam optimizer moments and parameter updates execute within a single GPU kernel, eliminating host-device synchronization costs. Parameters are updated in logit space (opacity), log space (scales), and tangent space (rotations) for numerical stability, with all arithmetic in FP16.
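A single fused update step looks like the following. This is a plain NumPy sketch of a standard Adam step applied to an opacity logit; the hyperparameters are conventional defaults, not the paper's values, and the real kernel would process all parameters in one dispatch.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update (the kind fused into a single GPU
    kernel on-device). `theta` may live in logit/log/tangent space."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Opacity is stored as a logit; the sigmoid is applied only at render time.
logit, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
logit, m, v = adam_step(logit, np.array([1.0]), m, v, t=1)
opacity = 1.0 / (1.0 + np.exp(-logit))
```

Keeping the moments `m` and `v` resident in GPU memory alongside the parameters is what lets the whole update run without any host round-trip.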
5. End-to-End Mobile Capture-to-Rendering Workflow
The PocketGS workflow comprises:
- Capture 50 seconds of video with ARKit pose tracking.
- Information-gated keyframe selection (typically 8–15 frames).
- GPU-native BA to refine camera trajectories and initial sparse points.
- Single-reference MVS to produce a dense point cloud (operator 𝒢).
- ℐ-based initialization to seed surface-aligned, anisotropic Gaussians.
- Adam-optimized differentiable splatting (operator 𝒯) for 500 iterations on-device.
- Real-time rendering of the final Gaussian scene on the mobile device.
A companion mobile application implements the pipeline in Swift+Metal. All operations, including BA, MVS, initialization, and training, are fully on-device.
6. Experimental Results and Comparative Analysis
PocketGS is empirically benchmarked against two workstation 3DGS pipelines:
- 3DGS-SFM-WK: COLMAP SfM sparse prior + vanilla 3DGS.
- 3DGS-MVS-WK: COLMAP dense prior + vanilla 3DGS.
All systems are restricted to a 500-iteration budget and equivalent resolution. The following table summarizes performance on LLFF, NeRF-Synthetic, and MobileScan datasets (Guo et al., 24 Jan 2026):
| Dataset | Method | PSNR↑ | SSIM↑ | LPIPS↓ | Time (s) | #Gaussians |
|---|---|---|---|---|---|---|
| LLFF | 3DGS-SFM-WK | 21.01 | 0.641 | 0.405 | 108.0 | 18k |
| LLFF | 3DGS-MVS-WK | 19.53 | 0.637 | 0.387 | 313.1 | 40k |
| LLFF | PocketGS | 23.54 | 0.791 | 0.222 | 105.4 | 33k |
| NeRF-Syn | 3DGS-SFM-WK | 21.75 | 0.800 | 0.243 | 83.7 | 12k |
| NeRF-Syn | 3DGS-MVS-WK | 24.47 | 0.887 | 0.128 | 532.1 | 50k |
| NeRF-Syn | PocketGS | 24.32 | 0.858 | 0.144 | 101.4 | 47k |
| MobileScan | 3DGS-SFM-WK | 21.16 | 0.687 | 0.398 | 112.8 | 23k |
| MobileScan | 3DGS-MVS-WK | 20.85 | 0.781 | 0.281 | 534.5 | 165k |
| MobileScan | PocketGS | 23.67 | 0.791 | 0.225 | 255.2 | 168k |
PocketGS satisfies memory constraints on MobileScan (geometry prior peak: 1.19–2.22 GB, full training peak: 1.82–2.65 GB, all below the 3 GB threshold).
7. Ablation Studies and Operator Contribution
Ablations on MobileScan validate the necessity of every PocketGS operator:
| Variant | PSNR↑ | SSIM↑ | LPIPS↓ | Time (s) |
|---|---|---|---|---|
| Full PocketGS | 23.67 | 0.791 | 0.225 | 255.2 |
| w/o ℐ (anisotropic init) | 22.49 | 0.770 | 0.253 | 319.5 |
| w/o Global BA | 23.45 | 0.752 | 0.232 | 251.1 |
| w/o MVS | 21.07 | 0.646 | 0.414 | 124.8 |
Key findings:
- Removing operator ℐ (anisotropic initialization) forces isotropic seeds, resulting in a 1.2 dB PSNR decrease and 25% increased runtime.
- Excluding global BA significantly degrades structural similarity (SSIM: 0.791 to 0.752).
- Omitting MVS severely degrades reconstruction quality (PSNR: 23.67 to 21.07, LPIPS: 0.225 to 0.414).
This demonstrates that 𝒢's lightweight MVS prior is critical to the scene-fidelity ceiling, that ℐ's anisotropic seeding accelerates convergence, and that the full operator co-design enables stable on-device differentiable training under memory and runtime constraints.
Through precise co-design of geometry prior construction, prior-conditioned initialization, and hardware-aligned optimization, PocketGS establishes a new paradigm for high-fidelity, efficient, fully on-device 3DGS training and rendering under stringent mobile hardware constraints (Guo et al., 24 Jan 2026).