Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful representation for high-quality rendering across a wide range of applications. However, its high computational demands and large storage costs pose significant challenges for deployment on mobile devices. In this work, we propose a mobile-tailored real-time Gaussian Splatting method, dubbed Mobile-GS, that enables efficient Gaussian Splatting inference on edge devices. Specifically, we first identify alpha blending as the primary computational bottleneck, since it relies on a time-consuming depth sort of the Gaussians. To solve this issue, we propose a depth-aware order-independent rendering scheme that eliminates the need for sorting, thereby substantially accelerating rendering. Although order-independent rendering improves speed, it may introduce transparency artifacts in regions with overlapping geometry because no explicit rendering order is enforced. To address this problem, we propose a neural view-dependent enhancement strategy that models view-dependent effects more accurately by conditioning on the viewing direction, the 3D Gaussian geometry, and appearance attributes. In this way, Mobile-GS achieves both high-quality and real-time rendering. Furthermore, to facilitate deployment on memory-constrained mobile platforms, we introduce first-order spherical harmonics distillation, a neural vector quantization technique, and a contribution-based pruning strategy, which together reduce the number of Gaussian primitives and compress the 3D Gaussian representation with the assistance of neural networks. Extensive experiments demonstrate that Mobile-GS achieves real-time rendering and a compact model size while preserving high visual quality, making it well-suited for mobile applications.
Explain it Like I'm 14
Mobile-GS: Real‑time 3D “splatting” on phones, explained simply
What is this paper about?
This paper introduces Mobile‑GS, a faster and much smaller way to show 3D scenes on phones and other mobile devices in real time. It builds on a technique called 3D Gaussian Splatting, which represents a scene using lots of soft, colored “blobs” in 3D that blend together to form an image. The challenge is that normal 3D Gaussian Splatting looks great but is heavy and slow for phones. Mobile‑GS makes it fast, light, and still good-looking.
What questions are the researchers trying to answer?
The team focused on three simple questions:
- How can we render (draw) 3D Gaussian scenes on a phone without slow steps that phones struggle with?
- How can we keep the picture quality high while using much less memory and storage?
- How can we reduce the number of 3D “blobs” we need without making the scene look worse?
How did they do it? (With everyday analogies)
To make this work on mobile, they combined four key ideas.
- Depth‑aware order‑independent rendering:
- Analogy: Imagine making a collage from many transparent stickers (the 3D blobs). Normally you must sort the stickers from closest to farthest before placing them, or the picture looks wrong. That sorting takes a lot of time.
- What they did: They designed a new way to blend the stickers without sorting. Their method automatically gives “louder voices” (more weight) to nearby blobs and “quieter voices” to faraway blobs, so the final picture looks right without the slow sorting step.
- Neural view‑dependent enhancement (a small helper network):
- Analogy: If you don’t sort stickers, sometimes things look a bit see‑through in the wrong places.
- What they did: They added a tiny neural network that adjusts how see‑through each blob is based on where the camera is and the blob’s shape/color. This fixes the “unwanted see‑through” and keeps details crisp when you move around.
- First‑order spherical harmonics distillation (simpler color rules):
- Analogy: Instead of storing a huge set of rules for how each blob’s color changes as you look at it from different angles, they train a compact set of simpler rules by learning from a stronger “teacher” model.
- Result: Fewer numbers to store, faster lookups, and nearly the same appearance.
- Neural vector quantization + pruning (compress and tidy up):
- Analogy (compression): Group similar blobs’ attributes into a small dictionary of common patterns and store just tiny “pointers” to those patterns, like using abbreviations instead of full words.
- Analogy (pruning): If some blobs barely affect the final picture (they’re very small or very transparent), remove them, like trimming branches that don’t add to the shape of a tree.
- Result: The scene files get much smaller without noticeably hurting quality.
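To make the sorting bottleneck concrete, the Python sketch below contrasts standard sorted alpha blending with a sort-free, depth-weighted blend for a single pixel. It is a minimal sketch under stated assumptions: the blend follows the general shape of weighted blended order-independent transparency, and the per-Gaussian weight `alpha / depth` is a hypothetical stand-in for Mobile-GS's depth-aware weight, which also involves Gaussian scale and learnable factors. `Fragment`, `sorted_alpha_blend`, and `depth_weighted_oit` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One Gaussian's contribution at a single pixel (hypothetical structure)."""
    color: tuple  # (r, g, b) in [0, 1]
    alpha: float  # opacity after the 2D Gaussian falloff, in [0, 1)
    depth: float  # view-space depth, must be > 0


def sorted_alpha_blend(frags, background=(0.0, 0.0, 0.0)):
    """Reference 3DGS-style compositing: sort near-to-far, then blend front to back."""
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for f in sorted(frags, key=lambda f: f.depth):
        for c in range(3):
            out[c] += transmittance * f.alpha * f.color[c]
        transmittance *= 1.0 - f.alpha
    return tuple(out[c] + transmittance * background[c] for c in range(3))


def depth_weighted_oit(frags, background=(0.0, 0.0, 0.0)):
    """Sort-free compositing in the style of weighted blended OIT.

    Only sums and a product are accumulated, so the result does not depend on
    fragment order. The weight alpha / depth is a hypothetical stand-in for
    the paper's depth-aware weight, not its actual Eq. 3.
    """
    weighted = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    transmittance = 1.0
    for f in frags:
        w = f.alpha / f.depth  # inverse depth: nearer Gaussians get larger weights
        for c in range(3):
            weighted[c] += w * f.color[c]
        weight_sum += w
        transmittance *= 1.0 - f.alpha  # global transmittance, an order-free product
    if weight_sum == 0.0:
        return tuple(background)
    return tuple((1.0 - transmittance) * weighted[c] / weight_sum
                 + transmittance * background[c] for c in range(3))


near_red = Fragment((1.0, 0.0, 0.0), alpha=0.8, depth=1.0)
far_blue = Fragment((0.0, 0.0, 1.0), alpha=0.8, depth=4.0)
print([round(v, 3) for v in sorted_alpha_blend([far_blue, near_red])])  # [0.8, 0.0, 0.16]
print([round(v, 3) for v in depth_weighted_oit([far_blue, near_red])])  # [0.768, 0.0, 0.192]
```

Because the sort-free version only accumulates sums and a product, Gaussians can be added in any order (or in parallel). The price is that its output only approximates the sorted result; here it shifts some weight from the near red Gaussian to the far blue one.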
What did they find, and why is it important?
- Much faster on phones: On a mobile device with a Snapdragon 8 Gen 3 chip, Mobile‑GS reaches about 116 frames per second at 1600×1063 resolution, which is smooth, real‑time performance.
- Much smaller storage: The same scenes that used to take hundreds of megabytes now fit in around 4–5 MB with Mobile‑GS.
- High visual quality: Even with these speed and size cuts, the images look comparable to the original (heavy) method. On a powerful desktop GPU, it can render at over 1000 FPS in some cases, showing how efficient the approach is.
- Clear bottleneck identified: They showed that the old “sorting” step was the biggest time sink, and removing it delivers a big speed boost.
This matters because it makes advanced 3D rendering practical on everyday devices like smartphones and AR headsets, enabling smoother apps, games, and AR experiences.
What’s the bigger impact?
- Better mobile AR/VR: Faster, lighter rendering means more realistic scenes in AR glasses or phone-based AR without overheating or lag.
- Lower data and power use: Smaller scene files and simpler calculations help save storage and battery life.
- Wider access: Developers can bring high‑quality 3D experiences to more users, even on devices that aren’t super powerful.
In short, Mobile‑GS shows how to keep the eye‑catching quality of modern 3D techniques while cutting the heavy parts that slow phones down. It’s a practical step toward richer, real‑time 3D visuals on devices we use every day.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a focused list of unresolved issues that future work could address, grouped by theme to aid actionable follow-up research.
Rendering formulation and visual quality
- Lack of theoretical guarantees for the depth-aware order-independent compositing: no analysis of when it matches standard alpha blending, preserves occlusion correctness, or violates energy conservation.
- Unclear behavior and sensitivity of the depth-aware weighting term (Eq. 3): the exact formulation is ambiguous in the paper, and the impact of its components (inverse depth, scale, learnable factors) and their ranges on artifacts and stability is not characterized.
- Reliance on a view-dependent opacity MLP to correct transparency artifacts may be brittle: failure modes in scenes with strong specularities, translucency, refractive materials, thin structures, or heavy occlusions are not systematically evaluated.
- No evaluation of temporal stability along camera trajectories: potential popping/flicker introduced by view-dependent opacity modulation and order-free blending remains unquantified (no temporal metrics or user studies).
- Impact of sorting removal on anti-aliasing and silhouette sharpness is unexplored; edge halos or over-blur due to scale-weighted contributions are not analyzed.
- Physical plausibility is not verified: whether the weighted compositing conserves radiance or introduces systematic color/contrast bias is unknown.
- Handling of semi-transparent and participating media (smoke, glass, foliage) under the proposed OIT-like scheme is not demonstrated or bounded.
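The view-dependent opacity MLP at the center of several of these questions can be pictured with a small forward pass. This is a sketch only: the 13-dimensional input (viewing direction, scale, rotation quaternion, base color), the 16-unit hidden layer, the ReLU and sigmoid activations, and the choice to multiply the MLP output into the stored opacity are assumptions, not the published architecture, and `TinyOpacityMLP` and `view_dependent_opacity` are hypothetical names. A harness like this could also be used to probe temporal stability, for instance by sweeping the camera along a path and checking how smoothly the predicted opacity changes.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyOpacityMLP:
    """Hypothetical 2-layer MLP mapping per-Gaussian features to a scalar in (0, 1)."""

    def __init__(self, in_dim: int = 13, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, features):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)  # ReLU
                  for row, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


def view_dependent_opacity(mlp, base_opacity, cam_pos, mean, scale, rotation, color):
    """Modulate a Gaussian's stored opacity by an MLP conditioned on view and attributes."""
    d = [m - c for m, c in zip(mean, cam_pos)]
    norm = math.sqrt(sum(v * v for v in d)) or 1.0  # guard against a zero vector
    view_dir = [v / norm for v in d]
    # 3 (view dir) + 3 (scale) + 4 (quaternion) + 3 (base color) = 13 inputs
    features = view_dir + list(scale) + list(rotation) + list(color)
    return base_opacity * mlp(features)


mlp = TinyOpacityMLP()
gaussian = dict(mean=(0.0, 0.0, 5.0), scale=(0.2, 0.1, 0.05),
                rotation=(1.0, 0.0, 0.0, 0.0), color=(0.8, 0.4, 0.1))
for cam in [(0.0, 0.0, 0.0), (5.0, 0.0, 5.0)]:
    print(cam, view_dependent_opacity(mlp, 0.9, cam, **gaussian))
```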
Compression and quantization
- NVQ codebooks are trained per scene; transferability of codebooks across scenes or feasibility of a global/shared codebook is not studied.
- Runtime decode overheads on mobile (lookup + MLP decoders) are not profiled in isolation; cache strategies and worst-case latency under heavy reuse are unspecified.
- Lower-precision regimes (e.g., 8-bit weights/activations for MLPs and attributes, integer dot products) and their quality/speed trade-offs are not evaluated.
- Error behavior of compression under extreme viewpoints and lighting (e.g., grazing angles, strong highlights) is not reported.
- Interaction between quantization error and order-independent blending (compounded artifacts) is not analyzed.
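For readers less familiar with codebook quantization, the sketch below shows the basic encode and decode steps the compression questions above refer to: split an attribute vector into sub-vectors, replace each with the index of its nearest codeword in its own codebook, and rebuild it by concatenating codewords. It is a simplified, assumed version of vector quantization; learning the codebooks (e.g., with K-Means) and the MLP decoders used by the paper's neural vector quantization are omitted, and the helper names and toy codebooks are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def split(vec: Sequence[float], num_sub: int) -> List[Vector]:
    """Split an attribute vector into `num_sub` equal-length sub-vectors."""
    if len(vec) % num_sub:
        raise ValueError("vector length must be divisible by num_sub")
    step = len(vec) // num_sub
    return [list(vec[i:i + step]) for i in range(0, len(vec), step)]


def nearest(codebook: Sequence[Vector], sub: Vector) -> int:
    """Index of the codeword closest to `sub` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], sub)))


def encode(vec: Sequence[float], codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """One codebook per sub-vector; only the integer indices need to be stored."""
    return [nearest(cb, s) for cb, s in zip(codebooks, split(vec, len(codebooks)))]


def decode(indices: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Concatenate the selected codewords (the paper additionally uses MLP decoders)."""
    out: Vector = []
    for cb, k in zip(codebooks, indices):
        out.extend(cb[k])
    return out


# Two codebooks with 4 codewords each: a 6-float attribute becomes two 2-bit indices.
codebooks = [
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
idx = encode([0.9, 1.1, 1.0, 0.0, 0.1, 0.95], codebooks)
print(idx, decode(idx, codebooks))  # [1, 3] [1.0, 1.0, 1.0, 0.0, 0.0, 1.0]
```

With two 4-entry codebooks, the six FP32 values above (192 bits) are stored as two 2-bit indices, not counting the codebooks themselves. In this picture, the per-scene vs. shared codebook question is whether the same `codebooks` could serve several scenes.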
Pruning strategy
- Hyperparameter sensitivity of contribution-based pruning (quantile threshold τ, vote interval I_prune, vote threshold v) across datasets and scenes is not quantified; no auto-tuning or adaptive schedule is proposed.
- Pruning relies only on opacity and scale; incorporation of additional signals (e.g., view-frequency visibility, gradient magnitude, contribution variance across views, photometric error) is unexplored.
- No analysis of test-time adaptive LOD or dynamic pruning for foveation/ROI rendering on mobile.
- Scalability and stability of pruning on very large scenes (multi-million Gaussians) not characterized.
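A vote-based pruning loop of the kind described can be sketched as follows. The quantile threshold, vote interval, and vote threshold mentioned above appear as `tau`, `vote_interval`, and `vote_threshold`; their defaults, the contribution score (opacity times largest axis scale), and the `PruneVoter` class are assumptions for illustration, since the exact criterion is not reproduced here.

```python
def quantile(values, q):
    """Linearly interpolated q-quantile, 0 <= q <= 1 (numpy's default 'linear' method)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def contribution(opacity, scale):
    """Hypothetical contribution score: opacity times the largest axis scale."""
    return opacity * max(scale)


class PruneVoter:
    """Vote-based pruning: remove Gaussians that repeatedly fall in the lowest tau-quantile."""

    def __init__(self, num_gaussians, tau=0.1, vote_interval=100, vote_threshold=3):
        self.votes = [0] * num_gaussians
        self.tau = tau
        self.vote_interval = vote_interval
        self.vote_threshold = vote_threshold

    def step(self, iteration, opacities, scales):
        """Cast votes every `vote_interval` iterations; return indices of Gaussians to keep."""
        if iteration % self.vote_interval == 0:
            scores = [contribution(o, s) for o, s in zip(opacities, scales)]
            cutoff = quantile(scores, self.tau)
            for i, score in enumerate(scores):
                if score < cutoff:
                    self.votes[i] += 1
        return [i for i, v in enumerate(self.votes) if v < self.vote_threshold]

    def compact(self, keep):
        """Drop vote counters for pruned Gaussians so indices stay aligned."""
        self.votes = [self.votes[i] for i in keep]


opacities = [0.01] + [0.5] * 9
scales = [(0.1, 0.1, 0.1)] * 10
voter = PruneVoter(num_gaussians=10)
for it in (100, 200, 300):
    keep = voter.step(it, opacities, scales)
print(keep)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Requiring several votes means a Gaussian is removed only if it stays in the low-contribution tail across multiple checks, so a single noisy iteration does not prune it. The sensitivity question above then amounts to how `keep` changes as these three knobs vary.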
Training and distillation
- Dependence on the teacher (Mini-Splatting): how student quality scales with different teacher strengths, or whether teacher-free training can reach similar performance, is not evaluated.
- Depth distillation uses log-error but the teacher depth is acknowledged as noisy; there is no assessment of resulting geometry accuracy (no ground-truth depth metrics) or robustness to teacher errors.
- Training cost/efficiency trade-offs (e.g., with/without multi-view regularization, different distillation weights) are not comprehensively explored, especially for resource-limited training setups.
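The depth-distillation term is described as a scale-invariant loss on log depth. One well-known loss with that property is the scale-invariant log error of Eigen et al. (2014); the sketch below implements that formulation as an assumption about the general form, not as the paper's exact loss, and the function name is hypothetical.

```python
import math


def scale_invariant_log_depth_loss(pred, teacher, lam=1.0):
    """Scale-invariant log-depth error in the style of Eigen et al. (2014).

    d_i  = log(pred_i) - log(teacher_i)
    loss = mean(d^2) - lam * mean(d)^2
    With lam = 1 the loss is unchanged if `pred` is multiplied by any positive
    constant; lam = 0 reduces to plain mean squared log error.
    """
    if len(pred) != len(teacher) or not pred:
        raise ValueError("pred and teacher must be non-empty and of equal length")
    if min(pred) <= 0 or min(teacher) <= 0:
        raise ValueError("depths must be positive")
    d = [math.log(p) - math.log(t) for p, t in zip(pred, teacher)]
    n = len(d)
    mean = sum(d) / n
    return sum(x * x for x in d) / n - lam * mean * mean


teacher = [1.0, 2.5, 4.0, 7.5]
print(scale_invariant_log_depth_loss([2 * t for t in teacher], teacher))  # ~0: global scale ignored
print(scale_invariant_log_depth_loss([1.0, 2.0, 3.0, 4.0], teacher))      # > 0: relative error
```

With `lam=1`, multiplying the student's depth by a constant adds the same offset to every `d_i`, which the mean subtraction cancels. This is why a teacher depth with an unknown global scale can still supervise relative geometry, though it does nothing to correct the per-pixel teacher noise noted above.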
Mobile deployment and evaluation
- Energy consumption, thermal behavior, and sustained performance (throttling) on mobile are not measured; FPS alone may not reflect deployability in real applications.
- Portability across diverse mobile GPUs/OS stacks (Adreno variants, Mali, Apple GPUs; Vulkan vs. Metal/OpenGL ES) is untested; only Snapdragon 8 Gen 3 is reported.
- Resolution and stereo/VR scaling on device (1080p, 1440p, 4K, dual-eye) are not benchmarked; memory bandwidth limits and performance headroom are unknown.
- End-to-end latency (motion-to-photon), CPU-GPU overlap, and pipeline scheduling on mobile are not analyzed—critical for AR/VR use cases.
- Fairness and parity of mobile baselines are unclear: competing methods were quantized via Huffman but may lack equally optimized Vulkan kernels; the impact on relative performance is not disentangled.
Scalability and generalization
- Generalization to dynamic scenes (time-varying geometry/appearance) is not addressed; how order-independent blending and compression adapt over time is unknown.
- Behavior on large-scale outdoor or city scenes, or asset-heavy environments (LOD/streaming/out-of-core), is not reported.
- Robustness to camera calibration/pose errors and imperfect COLMAP reconstructions is not evaluated.
- Background handling (C_bg) for unbounded scenes is under-specified (constant vs. learned vs. environment map), and its impact on quality is not analyzed.
Reproducibility and implementation details
- Numerical stability of the transmittance $T = \prod_i (1 - \alpha_i)$ with many Gaussians (underflow) and the potential need for log-domain accumulation are not discussed.
- Atomic accumulation, memory contention, and tiling strategies for “parallel per-Gaussian blending” on tile-based mobile GPUs are not described; race-condition mitigation and precision modes are unspecified.
- Storage accounting details are unclear: whether the reported MB includes MLP weights, codebooks, and all runtime buffers; peak memory and persistent cache sizes are not provided.
- The mobile Vulkan implementation and custom kernels are not yet released; reproducibility and portability of the on-device results cannot be verified.
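The numerical-stability concern about the transmittance product can be shown in a few lines. The sketch compares naive multiplication with log-domain accumulation using `math.log1p`; both helper names are hypothetical, and whether Mobile-GS needs this on device depends on how many Gaussians overlap a pixel and on the float precision its shaders use, which the summary does not specify.

```python
import math


def transmittance_direct(alphas):
    """Naive T = prod(1 - alpha_i); can underflow to 0.0 and loses tiny alphas."""
    t = 1.0
    for a in alphas:
        t *= 1.0 - a
    return t


def log_transmittance(alphas):
    """log T = sum(log(1 - alpha_i)), accumulated with log1p for precision."""
    total = 0.0
    for a in alphas:
        if a >= 1.0:
            return -math.inf  # a fully opaque fragment blocks all light
        total += math.log1p(-a)
    return total


many = [0.5] * 2000
print(transmittance_direct(many))  # 0.0 (0.5**2000 underflows even in float64)
print(log_transmittance(many))     # about -1386.29, i.e. -2000 * ln 2
print(transmittance_direct([1e-18]), log_transmittance([1e-18]))  # 1.0 vs. about -1e-18
```

In FP32 or FP16, which mobile shaders commonly use, underflow happens after far fewer factors than in this double-precision example, so a renderer might keep log T in the accumulator and exponentiate only when the final T is needed.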
Practical Applications
Immediate Applications
Below are specific use cases that can be deployed now, leveraging Mobile-GS’s depth-aware order-independent rendering, SH distillation, neural vector quantization, and pruning.
- On-device AR scene playback for consumer apps
- Sectors: software, mobile AR/VR, media
- Tools/products/workflows: Vulkan-based viewer SDK for Android; ARCore/ARKit integration to place high-fidelity reconstructed scenes; capture (multi-view photos/video) → desktop training/distillation/quantization → 4–8 MB package → mobile playback at 60–120 FPS
- Assumptions/dependencies: static or mostly static scenes; a Vulkan-capable device with a modern mobile GPU (e.g., Snapdragon 8 Gen 3); training done off-device
- AEC site capture visualization on tablets/phones
- Sectors: architecture, engineering, construction (AEC); real estate
- Tools/products/workflows: “Capture-to-Viewer” pipeline for walk-throughs during design reviews and client presentations with offline playback; Unity/Unreal plug-ins for IVI/tablet apps
- Assumptions/dependencies: not survey-grade geometry (visualization-focused); good multi-view coverage; desktop training time (~1–2 h/scene) before deployment
- Low-bandwidth 3D content distribution for e-commerce and real estate
- Sectors: retail/e-commerce, real estate, marketing
- Tools/products/workflows: 3D scene “listings” for showrooms/venues; CDN-friendly GS packages (≈3–8 MB) embedded in mobile apps; in-app quick-load virtual tours
- Assumptions/dependencies: server-side preprocessing; app-side Vulkan or engine integration; quality depends on capture coverage/lighting
- Digital heritage and museum AR guides
- Sectors: culture/heritage, education
- Tools/products/workflows: offline AR exhibits and interactive guides at 60–100+ FPS; kiosk/handheld viewers; scene packs shared over local networks
- Assumptions/dependencies: static exhibits; controlled lighting preferred; content cleared for offline distribution
- Edge-first visualization for field service and inspection
- Sectors: industrial/field service, utilities, insurance
- Tools/products/workflows: technician captures site, uploads for training, receives compressed scene for offline reference and report attachment; claim/inspection apps with embedded viewers
- Assumptions/dependencies: scene privacy requirements; on-device playback reduces cloud dependency but initial training still needed
- Privacy-preserving content workflows
- Sectors: policy/compliance, enterprise IT, defense
- Tools/products/workflows: on-device/air-gapped playback of sensitive environments; small packages minimize data egress; audit logs for content movement
- Assumptions/dependencies: internal compute resources for training; data governance for 3D assets; device security hardening
- Mobile robotics operator UI and drone pilot situational playback
- Sectors: robotics, drones, public safety
- Tools/products/workflows: mission review on phones/tablets using compact reconstructions; fast scrubbing and viewpoint changes at 60–120 FPS
- Assumptions/dependencies: pre-built scenes (not live); coverage from robot/drone video; not a SLAM/localization module
- Research baselines for efficient 3DGS on edge devices
- Sectors: academia, software
- Tools/products/workflows: reproducible code to benchmark OIT-style splatting, SH distillation, and NVQ on mobile GPUs; curriculum materials for graphics courses
- Assumptions/dependencies: Vulkan-capable testbeds; comparable datasets (Mip-NeRF360, Tanks&Temples)
- Automotive IVI scene viewers
- Sectors: automotive/infotainment
- Tools/products/workflows: in-vehicle displays for venue previews, dealership showcases, and branded experiences with low storage footprint
- Assumptions/dependencies: IVI GPU/API support (Vulkan); static scenes; safety policies for in-vehicle use
Long-Term Applications
These opportunities require further research, engineering, or ecosystem alignment (e.g., dynamic scenes, standards, broader device support).
- Live telepresence and remote assistance via Gaussian streaming
- Sectors: telecommunications, field service, healthcare, manufacturing
- Tools/products/workflows: continuous capture → incremental training/updates → streaming deltas of compressed Gaussians to mobile clients; bidirectional annotations
- Assumptions/dependencies: near-real-time/incremental training and robust update codecs; dynamic-scene handling; low-latency networks; artifact suppression in occlusions
- On-device or near-device incremental training and scene updates
- Sectors: AR/VR, robotics, developer tools
- Tools/products/workflows: NPU/GPU-accelerated distillation/quantization on phones/edge boxes; background on-device refinement after capture
- Assumptions/dependencies: mobile training acceleration, thermal management, memory budgets; energy-aware schedulers
- Standardized compressed 3DGS format and delivery pipeline
- Sectors: software, content distribution, standards bodies
- Tools/products/workflows: interoperable bitstream combining NVQ codebooks + entropy coding + metadata; transcoding tools; quality-of-service ladders (multi-bitrate GS)
- Assumptions/dependencies: community consensus and IP clearance; browser/WebGPU and engine support; conformance tests
- Seamless SLAM + GS fusion for persistent AR with proper occlusion
- Sectors: AR navigation, education, gaming
- Tools/products/workflows: SLAM for tracking + Mobile-GS for photorealistic background rendering and occlusion-aware compositing
- Assumptions/dependencies: tight latency budgets; coherent scale/pose alignment; dynamic object handling
- Mobile/edge simulation assets for autonomy and synthetic data
- Sectors: automotive, robotics, defense
- Tools/products/workflows: generate fast photorealistic novel views for perception testing on portable rigs; scenario libraries using compact GS assets
- Assumptions/dependencies: domain fidelity, lighting realism, and dynamic actors; licensing for real-world captures
- Medical and clinical AR visualization (patient education, planning)
- Sectors: healthcare
- Tools/products/workflows: mobile AR viewers for patient-specific anatomy or procedural rehearsals with small storage footprint
- Assumptions/dependencies: regulatory compliance (HIPAA/GDPR), validation of geometric/photometric accuracy for clinical use; medical data pipelines
- Energy-efficient 3D experiences on AR glasses and low-power devices
- Sectors: hardware, wearables, energy
- Tools/products/workflows: hardware blocks or drivers for order-independent splatting; FP16/INT8-optimized MLPs and decoders
- Assumptions/dependencies: vendor support (Vulkan/Metal/WebGPU); thermal envelopes; battery-aware runtime
- Web delivery via WebGPU and cross-platform engines
- Sectors: web software, media
- Tools/products/workflows: WebGPU port of the rendering path; bundlers to ship GS scenes as URL-loadable assets; progressive refinement
- Assumptions/dependencies: WebGPU maturity across browsers; shader translation; security sandboxing for performance
- 3D social media posts and UGC authoring tools
- Sectors: social/media, creator economy
- Tools/products/workflows: capture apps with automated training in the cloud and instant share of GS assets; in-app editing (crop, relight, annotate)
- Assumptions/dependencies: moderation/IP management of 3D captures; scalable training infrastructure; device diversity
- Insurance, real estate, and claims automation with 3D evidence
- Sectors: finance/insurance, real estate
- Tools/products/workflows: policyholder-guided capture → automated packaging → mobile investigator playback; document links to 3D scenes
- Assumptions/dependencies: chain-of-custody and tamper evidence; guidelines for admissibility; scene completeness
Cross-cutting assumptions and dependencies
- Scene characteristics: best for static or slowly changing environments; transparency/occlusion artifacts can persist without further modeling.
- Capture quality: multi-view coverage, calibration, and lighting strongly affect fidelity.
- Training pipeline: current workflow assumes desktop/GPU training and teacher-student distillation; model quality depends on teacher quality.
- Device support: Vulkan and modern mobile GPUs; ports needed for Metal (iOS) and WebGPU (web).
- Performance variance: published FPS on Snapdragon 8 Gen 3 may not generalize to mid-tier devices; thermal throttling and battery constraints apply.
- Storage/quality trade-offs: codebook size, pruning thresholds, and SH order must be tuned per use case.
- Compliance and IP: privacy, consent, and rights management for captured spaces; sector-specific regulations (e.g., healthcare).
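To make the SH-order trade-off concrete, the sketch below evaluates first-order (degree-1) spherical harmonics for one Gaussian and counts uncompressed floats per Gaussian at different SH degrees. The basis constants, coefficient ordering, and the +0.5 offset with clamping mirror the convention of the original 3DGS reference code as we understand it; the function names and the 11-float geometry budget (position, scale, quaternion, opacity) are assumptions for illustration.

```python
import math

SH_C0 = 0.28209479177387814  # Y_0^0 = 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # degree-1 prefactor = sqrt(3) / (2 * sqrt(pi))


def eval_sh_degree1(coeffs, view_dir):
    """RGB color from first-order SH.

    coeffs: four (r, g, b) triples ordered [DC, y-term, z-term, x-term]
    view_dir: non-zero direction from the camera to the Gaussian center
    """
    n = math.sqrt(sum(v * v for v in view_dir))
    x, y, z = (v / n for v in view_dir)
    rgb = []
    for c in range(3):
        value = (SH_C0 * coeffs[0][c]
                 - SH_C1 * y * coeffs[1][c]
                 + SH_C1 * z * coeffs[2][c]
                 - SH_C1 * x * coeffs[3][c])
        rgb.append(max(value + 0.5, 0.0))  # offset and clamp, 3DGS-style convention
    return tuple(rgb)


def floats_per_gaussian(sh_degree):
    """Uncompressed float count: position 3 + scale 3 + rotation 4 + opacity 1 + SH."""
    return 3 + 3 + 4 + 1 + 3 * (sh_degree + 1) ** 2


print(floats_per_gaussian(3), floats_per_gaussian(1))  # 59 23
```

At degree 3, SH coefficients take 48 of the 59 floats per Gaussian (about 81%); dropping to degree 1 cuts the uncompressed count to 23 before any quantization or pruning, which is why first-order SH distillation matters for storage.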
Glossary
- 3D Gaussian Splatting (3DGS): A scene representation that uses anisotropic 3D Gaussian primitives for efficient, high-quality rendering and reconstruction. "3D Gaussian Splatting (3DGS) (Kerbl et al., 2023) is a recently introduced technique for high-quality 3D reconstruction that represents scenes as a set of anisotropic 3D Gaussian primitives."
- A-buffer: A per-pixel fragment storage method that enables order-independent handling of transparency by storing and later sorting fragments. "store and sort fragment lists using A-buffers (Carpenter, 1984)."
- alpha blending: A compositing technique that combines fragments using opacity, typically requiring depth sorting for correct results. "we first identify alpha blending as the primary computational bottleneck"
- anisotropic (3D Gaussian): Having direction-dependent scales; in 3DGS, Gaussians can have different extents along axes to better fit geometry. "anisotropic 3D Gaussian primitives."
- codebook: The set of representative vectors (codewords) used to quantize attribute sub-vectors in vector quantization. "The codebook size directly influences both rendering quality and storage cost."
- contribution-based pruning: A strategy that removes Gaussians with persistently low impact based on opacity and scale statistics. "we also propose a contribution-based pruning strategy"
- depth attenuation factor: A multiplicative term that reduces a Gaussian’s contribution with increasing distance to the camera. "acts as a depth attenuation factor"
- depth-aware order-independent rendering: A rendering scheme that eliminates depth sorting by weighting Gaussian contributions using depth and scale. "we propose a depth-aware order-independent rendering scheme that eliminates the need for sorting"
- depth peeling: A multi-pass transparency method that extracts successive depth layers to composite semi-transparent surfaces. "known as depth peeling (Bavoil & Myers, 2008)"
- depth sorting: Ordering fragments by depth (often near-to-far) before compositing to ensure correct transparency. "this depth-sorting process introduces multiple challenges"
- differentiable rasterizer: A rasterization pipeline whose outputs are differentiable with respect to scene parameters, enabling gradient-based optimization. "leverages a tile-based differentiable rasterizer to render novel views."
- entropy-based compression: Compression that exploits symbol probability distributions to reduce bitstream size. "This entropy-based compression technique significantly reduces the bitstream size"
- entropy encoding: A lossless compression approach (e.g., Huffman, arithmetic coding) that assigns shorter codes to more frequent symbols. "and entropy encoding (Chen et al., 2024a; Niedermayr et al., 2023)."
- first-order spherical harmonics distillation: Knowledge distillation that compresses higher-order SH appearance into first-order coefficients guided by a teacher model. "first-order spherical harmonics distillation"
- global transmittance: The accumulated fraction of light that passes through all N Gaussians along a ray. "$T = \prod_{j=1}^{N} (1 - \alpha_j)$ represents the global transmittance"
- Huffman coding: A classic entropy coding algorithm that assigns variable-length codes to symbols based on frequency. "we apply Huffman coding to encode sequences at the end of training."
- inverse depth: The reciprocal of depth; often used to emphasize nearer surfaces in weighting or optimization. "we utilize the inverse depth to reduce the contributions of the distant 3D Gaussians."
- k-buffer: An OIT method that stores only the first k depth layers per pixel to approximate transparency without full sorting. "k-buffer methods similarly have different depth layers"
- K-Means clustering: A clustering algorithm used here to group attribute sub-vectors into clusters whose centroids serve as the codewords for codebook quantization. "by K-Means (Hamerly & Elkan, 2003)."
- Monte Carlo rendering: A stochastic rendering approach that estimates integrals (e.g., light transport) via random sampling. "commonly used in Monte Carlo rendering"
- multi-layer perceptron (MLP): A feedforward neural network used to predict view-conditioned weights/opacity for Gaussians. "we design a lightweight multi-layer perceptron (MLP) that predicts the view-dependent opacity scalar for each Gaussian."
- near-to-far order: A specific depth ordering used for correct alpha compositing, drawing nearer fragments before farther ones. "in the near-to-far order."
- Neural Radiance Field (NeRF): A neural volumetric representation that models view-dependent radiance for novel view synthesis. "Neural Radiance Field (Mildenhall et al., 2021) is the first to leverage volume rendering"
- neural vector quantization: A neural-aided VQ scheme that quantizes sub-vectors with multiple codebooks and decodes compact features via small MLPs. "we introduce a neural vector quantization technique to quantize 3D Gaussian parameters"
- novel view synthesis: Rendering images from unseen viewpoints given a learned 3D representation. "high-quality novel view synthesis"
- opacity: The alpha value representing how much a fragment blocks light; used in blending and pruning criteria. "predicts the view-dependent opacity scalar for each Gaussian."
- Order-Independent Transparency (OIT): Techniques that composite transparency without explicitly sorting fragments by depth. "approximate compositing, known as Order-Independent Transparency (OIT)."
- order-independent rendering: Rendering that aggregates contributions without requiring a specific depth order. "Order-independent rendering enables efficient Gaussian compositing."
- quantile operator: A statistical operator that selects a threshold based on a chosen quantile of a distribution. "$Q_\tau(\cdot)$ denotes the $\tau$-quantile operator"
- radiance field: A function describing emitted light as a function of position and direction; the target of real-time rendering in 3DGS. "real-time radiance field rendering."
- scale-invariant depth distillation loss: A loss on log-depth that transfers depth cues from a teacher while being insensitive to global scale. "we also propose a scale-invariant depth distillation loss"
- SO(3): The Lie group of 3D rotations used to parameterize Gaussian orientations. "rotation parameter $r_i \in SO(3)$"
- spherical harmonics (SH): Orthogonal basis functions on the sphere for compactly representing view-dependent appearance. "the original 3DGS uses the third-order spherical harmonic (SH) function to represent appearance"
- stochastic transparency: An OIT approach that samples fragments probabilistically to produce plausible transparency without sorting. "stochastic transparency, commonly used in Monte Carlo rendering"
- tile-based rendering: A GPU strategy that processes images in tiles to improve locality and performance. "eliminates the tile-based rendering and the 3D Gaussian sorting process"
- transmittance: The fraction of light transmitted through a medium or set of fragments; complements opacity in compositing. "Modeling transmittance remains a longstanding challenge in computer graphics"
- vector quantization: A compression method that represents vectors by the nearest codeword in a codebook. "vector quantization (Wang et al., 2024b; Liu et al., 2024; Papantonakis et al., 2024; Xie et al., 2024)"
- view-dependent effects: Appearance changes with viewing direction (e.g., specular highlights), modeled here via SH and MLPs. "especially for view-dependent effects."
- Vulkan: A low-overhead, cross-platform graphics and compute API used for mobile deployment. "using Vulkan 2.0, a modern, cross-platform graphics and compute API."
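The Huffman coding step referenced in the glossary can be illustrated on a stream of codebook indices. This sketch only computes code lengths, which is enough to count bits, and does not emit an actual bitstream; the index stream is made up and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a lone symbol still needs one bit per occurrence
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in the tree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(symbols):
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


# Skewed (made-up) stream of codebook indices: 16 symbols from a 4-entry codebook.
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
print(huffman_code_lengths(indices))  # {0: 1, 1: 2, 2: 3, 3: 3}
print(encoded_bits(indices), "bits vs", 2 * len(indices), "bits at fixed 2-bit width")
```

For this skewed stream, Huffman coding needs 28 bits versus 32 bits for fixed 2-bit indices, which matches the stream's entropy of 1.75 bits per symbol; the gain grows as index usage becomes more uneven.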