Towards Next-Generation SLAM: A Survey on 3DGS-SLAM Focusing on Performance, Robustness, and Future Directions
Abstract: Traditional Simultaneous Localization and Mapping (SLAM) systems often face limitations including coarse rendering quality, insufficient recovery of scene details, and poor robustness in dynamic environments. 3D Gaussian Splatting (3DGS), with its efficient explicit representation and high-quality rendering capabilities, offers a new reconstruction paradigm for SLAM. This survey comprehensively reviews key technical approaches for integrating 3DGS with SLAM. We analyze performance optimization of representative methods across four critical dimensions: rendering quality, tracking accuracy, reconstruction speed, and memory consumption, delving into their design principles and breakthroughs. Furthermore, we examine methods for enhancing the robustness of 3DGS-SLAM under challenging conditions such as motion blur and dynamic scenes. Finally, we discuss future challenges and development trends in this area. This survey aims to provide a technical reference for researchers and foster the development of next-generation SLAM systems characterized by high fidelity, efficiency, and robustness.
Explain it Like I'm 14
What is this paper about?
This paper is a “survey,” which means it reviews and explains many recent research works on a topic instead of presenting just one new method. The topic is SLAM (Simultaneous Localization and Mapping), a technology that lets robots, drones, and phones figure out where they are while building a map of the world around them.
The paper focuses on a new way to make SLAM maps look realistic and run fast, called 3D Gaussian Splatting (3DGS). Think of 3DGS as painting a 3D scene using thousands of tiny, soft blobs (like semi-transparent bubbles) to create a detailed, lifelike picture. The authors explain how researchers are combining 3DGS with SLAM to make next-generation systems that are both accurate and visually impressive.
What questions does the paper try to answer?
The authors look at how 3DGS can improve SLAM and organize the discussion around four big goals:
- Rendering quality: How good and realistic do the maps look?
- Tracking accuracy: How well does the system keep track of where the camera/robot is?
- Reconstruction speed: How fast can the system build the map?
- Memory consumption: How much computer memory does it use?
They also ask: How do these systems stay robust when things get tricky, such as when images are blurred by fast camera motion or when people and cars are moving through the scene?
How did the researchers study it?
Because this is a survey, the authors didn’t build just one system. Instead, they collected and analyzed many papers that use 3DGS in SLAM. They compared the methods using the four goals above and explained what ideas and tricks each method uses to do better.
To make the topic easier to understand, the paper also explains how 3DGS works and how it fits into a typical SLAM pipeline.
Here’s the idea in everyday language:
- 3DGS basics:
- Initialization: Start with a rough 3D point cloud (a set of dots in space) made from multiple pictures. Each point becomes a “Gaussian”—a soft blob with position, size, color, and transparency.
- Projection: Imagine holding a camera up to the scene. Each blob gets projected onto the camera’s image, like casting shadows of the blobs onto the screen.
- Rendering: The image is split into small tiles (a grid of 16×16-pixel squares). For each tile, the blobs that affect it are sorted by depth (front to back) and then blended together. This blending makes the final picture look smooth and realistic (see the code sketch after the pipeline overview below).
- Optimization: The system compares the rendered image to the real photo and tweaks each blob (moving it slightly, adjusting its size or color). If an area needs more detail, a blob can split into smaller blobs; if an area is over-covered, redundant or nearly transparent blobs can be merged or removed. Over time, the scene becomes sharp and accurate.
- SLAM pipeline with 3DGS:
- Initialization: Pick starting frames and roughly estimate the camera’s position.
- Tracking: For each new frame, figure out where the camera is by matching what the 3DGS scene would look like with what the camera actually sees.
- Mapping: Update and improve the 3DGS blobs so the scene becomes more detailed and realistic.
- Loop closure: If the camera returns to a place it has seen before, the system recognizes it and “tightens” the map to remove drift (small errors that build up over time).
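To make the rendering and tracking steps concrete, here is a minimal sketch (NumPy, with toy inputs; the function names and shapes are our own, not from any particular system) of per-pixel front-to-back alpha blending and the photometric error that tracking minimizes. Real systems run the blending per tile on the GPU and backpropagate through a differentiable rasterizer.

```python
import numpy as np

def blend_pixel(colors, alphas):
    """Composite depth-sorted (front-to-back) splats covering one pixel.

    colors: (N, 3) RGB of each splat at this pixel
    alphas: (N,)  opacity of each splat at this pixel, in [0, 1]
    Returns the pixel color C = sum_i c_i * a_i * T_i,
    where T_i = prod_{j<i} (1 - a_j) is the cumulative transparency.
    """
    C = np.zeros(3)
    T = 1.0  # transmittance accumulated from the splats in front
    for c, a in zip(colors, alphas):
        C += c * a * T
        T *= 1.0 - a
        if T < 1e-4:  # early termination once the pixel is nearly opaque
            break
    return C

def photometric_error(rendered, observed):
    """L1 photometric loss between the rendered view and the camera image.

    Tracking perturbs the camera pose to reduce this error; mapping
    instead perturbs the blobs' position/size/color/opacity.
    """
    return np.abs(rendered - observed).mean()

# Toy usage: three splats sorted front-to-back over a single pixel.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
alphas = np.array([0.5, 0.5, 1.0])
print(blend_pixel(colors, alphas))  # front red dominates: ~[0.5, 0.25, 0.25]
```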
What did they find, and why is it important?
The survey highlights big trends and lessons from many 3DGS-SLAM systems:
- High-quality visuals and real-time speed can coexist: 3DGS often achieves the realism of advanced neural methods (like NeRF) but renders much faster, which is crucial for live applications (AR/VR, robots).
- Tracking can use images directly: Many systems line up the camera’s view with the rendered 3DGS view, improving tracking accuracy even in scenes with low texture or tricky light.
- Speed improvements come from clever GPU rendering: Splitting images into tiles and blending blobs front-to-back allows fast, parallel processing (see the tile-binning sketch after this list).
- Memory can be managed smartly: Splitting and merging blobs, pruning blobs that don’t matter, and compressing color/features keep memory usage under control—important for large buildings or outdoor maps.
- Robustness is improving: New methods handle motion blur, moving objects, and sensor noise by combining extra sensors (like depth cameras, IMUs, LiDAR), filtering out dynamic parts, or adapting blob properties.
- The ecosystem is growing fast: There are many new systems (for example, SplaTAM, GS-SLAM, MonoGS, Photo-SLAM, RTG-SLAM, Loopy-SLAM, and more) tackling different needs—some focus on speed, others on visuals, others on loop closure or dynamic scenes.
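To illustrate the tile-based trick mentioned above, the following sketch (NumPy; names and shapes are illustrative, not taken from a specific implementation) bins projected splats into 16×16-pixel tiles and depth-sorts each tile's list, which is what lets a GPU blend every tile in parallel.

```python
import numpy as np

TILE = 16  # tile edge length in pixels

def bin_and_sort(centers_px, radii_px, depths, img_w, img_h):
    """Assign each projected splat to every tile its footprint overlaps.

    centers_px: (N, 2) projected 2D centers in pixels
    radii_px:   (N,)   conservative screen-space radius per splat
    depths:     (N,)   view-space depth (smaller = closer)
    Returns {tile_index: [splat ids sorted near-to-far]}.
    """
    tiles_x = (img_w + TILE - 1) // TILE
    tiles_y = (img_h + TILE - 1) // TILE
    bins = {}
    for i, ((cx, cy), r) in enumerate(zip(centers_px, radii_px)):
        # Tile range covered by this splat's screen-space bounding box.
        tx0 = max(int((cx - r) // TILE), 0)
        tx1 = min(int((cx + r) // TILE), tiles_x - 1)
        ty0 = max(int((cy - r) // TILE), 0)
        ty1 = min(int((cy + r) // TILE), tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault(ty * tiles_x + tx, []).append(i)
    # Sort each tile's list front-to-back so blending can terminate early.
    for ids in bins.values():
        ids.sort(key=lambda i: depths[i])
    return bins
```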
This matters because combining 3DGS with SLAM could give us maps that are both beautiful and practical, ready for real-world use where timing, accuracy, and clarity all matter.
What could this research lead to?
The paper points to exciting future directions:
- Mobile and embedded devices: Making 3DGS-SLAM run smoothly on phones, AR headsets, and small robots with limited power.
- Dynamic, everyday scenes: Better handling of people, cars, and other moving objects without breaking the map.
- Large-scale spaces: Cities and campuses require smarter memory use and faster updates.
- Sparse views and tough lighting: Improving performance when the camera doesn’t see much or conditions aren’t ideal.
- Semantics and interaction: Understanding not just shapes and colors, but what objects are—chairs, doors, roads—and making maps that are useful for tasks like navigation and editing digital twins.
In short, this survey shows how 3D Gaussian Splatting is pushing SLAM toward the next generation: systems that are accurate, fast, realistic, and reliable—opening doors for safer robots, smoother AR/VR, and better 3D experiences in the real world.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise list of unresolved issues the paper highlights or implies but does not fully address, framed to guide actionable future research:
- Lack of unified evaluation protocol: establish standardized, reproducible benchmarks covering rendering quality (e.g., PSNR/SSIM/LPIPS), tracking accuracy (ATE/RPE), speed (FPS/latency), and memory (bytes per Gaussian/map size) under identical hardware and software settings (a minimal metrics sketch follows this list).
- Missing head-to-head comparisons: run controlled ablations of representative 3DGS-SLAM systems on the same datasets with fixed hyperparameters, GPU/CPU, and build settings to quantify trade-offs across the four core dimensions.
- Unclear scalability laws: derive and validate empirical/theoretical scaling of runtime, memory, and accuracy with scene size, Gaussian count, and camera trajectory complexity; identify thresholds where splitting/merging policies break down.
- Online initialization without SfM: design robust monocular or sparse-view initialization that does not rely on offline SfM, including scale recovery and pose-Gaussian joint bootstrapping under limited parallax.
- Rolling-shutter and motion-blur modeling: incorporate camera readout and blur-aware rasterization/likelihoods into 3DGS-SLAM, and evaluate on dedicated datasets with ground-truth RS parameters.
- Photometric calibration and exposure handling: model per-frame exposure, white balance, gamma, and vignetting within the 3DGS loss; quantify gains in tracking and rendering under auto-exposure cameras.
- Dynamic scene modeling at the Gaussian level: develop consistent strategies for segmenting, tracking, and updating “dynamic Gaussians,” including object re-identification, background-foreground disentanglement, and policies for addition/removal over time.
- Loop closure in dynamic environments: create GS-native place recognition descriptors and back-end optimization that can detect loops and correct drift despite transient/moving objects.
- Probabilistic uncertainty in GS-SLAM: represent and propagate pose/map uncertainty (e.g., separating shape covariance from epistemic uncertainty), enabling confidence-aware tracking, robust data association, and risk-aware planning.
- Multi-sensor fusion beyond RGB-D: formalize integration of IMU, LiDAR, and event cameras in Gaussian parameter estimation (e.g., geometry priors from LiDAR, asynchronous event alignment) with rigorous time-sync and calibration procedures.
- Semantic integration on Gaussian clouds: build scalable pipelines for instance-level semantics, affordances, and panoptic mapping directly on GS maps; define metrics for semantic consistency and usefulness for downstream tasks (planning, interaction).
- Map compression and streaming: design lossy/lossless codecs for Gaussian maps with view-adaptive streaming and on-device memory budgets; evaluate fidelity-latency trade-offs for AR/VR and robotics.
- Embedded and mobile deployment: quantify energy, memory, and latency on mobile GPUs/NPUs/CPUs; propose mixed-precision, kernel fusion, and scheduling strategies for real-time operation without discrete GPUs.
- Physical reflectance and illumination: go beyond spherical harmonics to handle specular/transparent surfaces, outdoor illumination changes, and shadows; test whether physically based shading improves robustness without prohibitive cost.
- Occlusion and depth-ordering correctness: analyze tile-based rasterization failure modes (e.g., extreme parallax, thin structures), propose ordering-correct schemes or error bounds, and measure their impact on tracking/rendering.
- Lifelong and continual mapping: address catastrophic forgetting when maps are updated over long periods; introduce regularization/replay strategies and multi-session map management with seasonal/time-of-day changes.
- Dataset coverage gaps: curate large-scale, diverse benchmarks (indoor/outdoor, industrial, highly dynamic, low light, adverse weather) with ground truth for pose, geometry, and semantics, including event/IMU streams.
- Failure-mode analysis and recovery: characterize common GS-SLAM breakdowns (e.g., pose spikes, opacity degeneracy, Gaussian explosion) and design safe fallback strategies and automatic recovery mechanisms.
- Hybrid representations and interoperability: define criteria and mechanisms to switch or fuse 3DGS with meshes/voxels/implicit fields; provide conversion tools and interfaces to traditional SLAM for planning and control.
- Optimization theory: study identifiability and convergence of joint pose–Gaussian optimization under photometric losses; assess gradient landscape, sample complexity, and sensitivity to noise/outliers.
- Camera/lens model fidelity: integrate lens distortion, rolling shutter, and non-pinhole models into projection Jacobians; quantify benefits and costs across datasets and hardware.
- Auto-tuning of GS hyperparameters: develop principled schedulers for splitting/merging thresholds, opacity pruning, learning rates, and SH order that adapt online to scene content and resource constraints.
- Multi-agent GS-SLAM: investigate distributed mapping and collaborative loop closure with Gaussian map merging, conflict resolution, and communication-efficient synchronization.
- Formal safety guarantees: provide runtime monitors and certification procedures that bound localization drift and map error for safety-critical robotics using GS-SLAM.
- Robustness under adverse conditions: benchmark and improve performance in rain, fog, dust, smoke, and glare; explore sensor fusion and specialized rendering losses tailored to these conditions.
- Privacy-preserving photorealistic mapping: design de-identification or selective detail suppression in GS maps; evaluate differential privacy or on-device processing to meet regulatory requirements.
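As a concrete starting point for the first gap above, here is a minimal sketch (NumPy; illustrative only) of two of the named metrics: PSNR for rendering quality and ATE RMSE for tracking accuracy. A full protocol would add SSIM/LPIPS, RPE, trajectory alignment (e.g., Umeyama/Sim(3) for monocular scale), and fixed hardware/software settings.

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((rendered - reference) ** 2)
    mse = max(mse, 1e-12)  # avoid log(0) for identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ate_rmse(est_positions, gt_positions):
    """Absolute trajectory error (RMSE over translational residuals).

    Assumes the estimated trajectory is already aligned to ground truth;
    monocular systems additionally need scale/Sim(3) alignment first.
    est_positions, gt_positions: (T, 3) camera positions per frame.
    """
    residuals = np.linalg.norm(est_positions - gt_positions, axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))
```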
Practical Applications
Immediate Applications
Based on the survey’s synthesis of 3D Gaussian Splatting (3DGS) integrated with SLAM—emphasizing rendering fidelity, tracking accuracy, speed, memory efficiency, and robustness in motion blur/dynamic scenes—the following use cases can be deployed with today’s methods and hardware.
- Photorealistic AR anchoring and mixed reality overlays (Sector: software/XR, media)
- What: Low-latency, high-fidelity mapping for persistent AR anchors and occlusion-correct overlays in indoor/outdoor spaces.
- Why 3DGS-SLAM: Explicit, photorealistic maps with fast rendering (e.g., RTG-SLAM, Photo-SLAM, MonoGS, SplaTAM) reduce drift and visual mismatch.
- Potential tools/workflows: A Unity/Unreal plugin that ingests live Gaussian maps via gsplat; mobile SDKs for XR headsets or RGB-D smartphones; loop closure via Loopy-SLAM/LoopSplat for multi-session consistency.
- Assumptions/dependencies: GPU acceleration (CUDA), well-calibrated intrinsics/extrinsics, indoor lighting stability or robust photometric tracking, and dynamic-object filtering to maintain anchor stability.
- Mobile robotics navigation in low-texture or dynamic environments (Sector: robotics/logistics)
- What: Reliable pose estimation and dense mapping for AMRs/AGVs in warehouses, hospitals, and retail.
- Why 3DGS-SLAM: Strong photometric tracking and view-consistent maps improve robustness where feature-based SLAM fails; loop closure (Loopy-SLAM/LoopSplat) stabilizes long runs; GS-ICP/G2S-ICP support geometric alignment.
- Potential tools/workflows: ROS2 node that publishes poses and a splat map; on-the-fly export to voxel/ESDF for planners; integration with GS-ORB-SLAM/CG-SLAM for hybrid feature + photometric tracking.
- Assumptions/dependencies: Edge GPU (Jetson/embedded), sensor synchronization, robust initialization, and policies for moving-object filtering (e.g., MotionGS, NEDS-SLAM for dynamic scenes).
- Drone-based inspection and mapping of infrastructure (Sector: energy, construction, public safety)
- What: Real-time high-fidelity recon for bridges, plants, power lines, and buildings, including quick damage triage.
- Why 3DGS-SLAM: Fusion with inertial/LiDAR (e.g., MM3DGS-SLAM, LVI-GS, GS-LIVO) improves robustness under aggressive motion/low texture; fast, explicit splats ease on-site verification.
- Potential tools/workflows: Jetson-based payloads that produce splat maps and export to mesh/point clouds; offline refinement with Gaussian splitting/merging; ICP-based re-localization (GS-ICP).
- Assumptions/dependencies: Regulatory flight constraints, variable illumination, GPS degradation (reliance on VIO/LI), weather and vibration tolerance.
- As-built capture and BIM coordination (Sector: AEC)
- What: Rapid on-site “as-is” capture for clash detection and field coordination.
- Why 3DGS-SLAM: RGB-D pipelines (e.g., GS-SLAM, SplaTAM) provide dense, photorealistic maps faster than meshing pipelines while preserving high-frequency details.
- Potential tools/workflows: Field app that scans rooms/halls and exports to BIM/CAD (mesh, point cloud, or Gaussian → mesh); loop closure enables multi-room/global consistency.
- Assumptions/dependencies: Controlled scanning paths to limit motion blur, accurate extrinsics for multi-sensor rigs, export interoperability (e.g., glTF/OpenUSD).
- Real estate, insurance, and facilities documentation (Sector: finance/insurtech, real estate, property management)
- What: Quick 3D capture for virtual tours, claims assessment, and asset inventories.
- Why 3DGS-SLAM: Monocular/RGB-D systems (MonoGS, Photo-SLAM) deliver photorealistic results with lower compute than NeRF and faster turnaround.
- Potential tools/workflows: Mobile capture app with cloud processing; automated room/asset segmentation via semantics-enabled pipelines (e.g., SGS-SLAM) for claims auditing.
- Assumptions/dependencies: Privacy and consent management for interior data, bandwidth for uploads, and consistent lighting.
- Cultural heritage digitization and museum curation (Sector: culture/heritage, education)
- What: In-situ photorealistic digitization of small-to-medium artifacts and rooms with minimal equipment.
- Why 3DGS-SLAM: High-fidelity textures with efficient optimization; event-based sensing offers options for low-light/high-dynamic-range scenes.
- Potential tools/workflows: Portable rigs that capture and refine Gaussian maps; artifact-level semantic tagging for curation.
- Assumptions/dependencies: Lighting constraints, reflective surfaces, on-site compute or efficient offline refinement.
- Consumer robotics (robot vacuums, home assistive robots) (Sector: consumer electronics)
- What: Better obstacle understanding, AR floorplans, and persistent home maps.
- Why 3DGS-SLAM: Compact, high-detail maps with improved tracking in weak-texture areas; memory-optimized approaches (e.g., MemGS, CompactGS) fit embedded constraints.
- Potential tools/workflows: On-device splat mapping with periodic cloud sync; occupancy extraction for planning.
- Assumptions/dependencies: Low-power GPU/DSP, privacy (local processing), illumination handling, and robust loop closure in long-term deployments.
- Remote telepresence and assistance (Sector: field service, manufacturing)
- What: Live, photorealistic scene streaming for remote experts and training.
- Why 3DGS-SLAM: Tile-based rasterization and explicit splats (RTG-SLAM) enable fast rendering and adaptive scene updates; loop closure maintains global consistency in long sessions.
- Potential tools/workflows: Edge capture + cloud render; adaptive Gaussian splitting/merging to balance bandwidth and fidelity; web viewer built on gsplat/GPU rasterization.
- Assumptions/dependencies: Network bandwidth/latency, compression/streaming of Gaussian parameters, device heterogeneity.
- Academic benchmarking and teaching (Sector: academia/education)
- What: Course labs and research prototyping with state-of-the-art SLAM and neural rendering.
- Why 3DGS-SLAM: Open-source systems (e.g., SplaTAM, GS-SLAM, MonoGS) enable reproducible, real-time experiments across RGB/RGB-D/IMU/LiDAR configurations.
- Potential tools/workflows: Modular pipelines to swap trackers, loop-closers (Loopy-SLAM/LoopSplat), and memory optimizers (MemGS); automated evaluation suites.
- Assumptions/dependencies: GPU availability, dataset licenses, and standardized output formats for fair comparisons.
- Indoor navigation and safety audits for public buildings (Sector: public sector/policy, facilities)
- What: Rapid mapping for evacuation planning, accessibility audits, and safety inspections.
- Why 3DGS-SLAM: Fast creation of highly realistic maps supporting human-in-the-loop assessment; robust re-localization across revisits.
- Potential tools/workflows: Handheld RGB-D scanners; automated route-checking with extracted geometry; persistent, loop-closed building maps.
- Assumptions/dependencies: Data governance (privacy/security), standardized export to safety tools, staff training.
Long-Term Applications
The survey identifies research directions (e.g., robustness under dynamics/motion blur, scalability, memory) that enable the following applications as algorithms and hardware mature.
- City-scale, persistent AR clouds and live digital twins (Sector: urban planning, telecom, XR)
- What: Large-scale, multi-session maps with photorealistic rendering for navigation, advertising, and planning.
- Why 3DGS-SLAM: Efficient explicit representation with high fidelity; memory-focused methods (e.g., MemGS) and robust loop closure (LoopSplat/Loopy-SLAM) are prerequisites.
- Dependencies: Cross-session/global map fusion, standardized 3DGS interchange (e.g., alignment with glTF/OpenUSD), privacy-preserving data pipelines, and edge–cloud streaming.
- Autonomous driving and outdoor robotics under heavy dynamics (Sector: automotive, logistics)
- What: Real-time mapping with moving-object awareness and strong localization in traffic and clutter.
- Why 3DGS-SLAM: Dynamic-scene handling (MotionGS), multi-sensor fusion (IMU/LiDAR; e.g., MM3DGS-SLAM, LVI-GS, GS-LIVO), and robust photometric tracking promise improved resilience.
- Dependencies: Real-time semantic instance understanding (e.g., SGS-SLAM), rolling-shutter and illumination robustness, safety certification, compute efficiency on automotive-grade SoCs.
- Multi-robot cooperative mapping with Gaussian exchange (Sector: robotics/defense/public safety)
- What: Teams of agents share sparse/compact Gaussian primitives to accelerate joint mapping and re-localization.
- Why 3DGS-SLAM: Explicit primitives are communication-friendly; ICP variants (GS-ICP/G2S-ICP) support inter-agent alignment.
- Dependencies: Time-sync, bandwidth-adaptive compression, distributed loop closure, and conflict resolution for map merges.
- On-device SLAM for AR glasses and ultra-low-power edge (Sector: consumer XR, wearables)
- What: Always-on, privacy-preserving mapping on head-worn devices.
- Why 3DGS-SLAM: Tile-based rasterization is hardware-friendly; system-level accelerators (e.g., GauSPU-like concepts) could enable dedicated splat units.
- Dependencies: Specialized hardware, VRAM constraints, thermals, and compact semantic layers; incremental/online Gaussian optimization without cloud.
- Interactive, editable digital twins with semantics and physics (Sector: industrial metaverse, AEC, manufacturing)
- What: Live editing, object-level semantics, and physics-driven “what-if” scenarios using high-fidelity maps.
- Why 3DGS-SLAM: Photorealistic geometry+appearance with semantic SLAM (SGS-SLAM) enables object retrieval and scene editing; loop closure maintains consistency over time.
- Dependencies: Stable long-term semantics, bidirectional conversions (Gaussian ↔ mesh), and integration with simulation engines.
- Image-guided surgery and clinical AR navigation (Sector: healthcare)
- What: Sub-centimeter-accurate AR overlays for minimally invasive procedures and intraoperative guidance.
- Why 3DGS-SLAM: High-fidelity mapping and fast rendering could reduce registration errors vs. classical sparse maps.
- Dependencies: Medical-grade sensor fusion (RGB-D/IMU), rigorous validation, sterility/workflow integration, and regulatory approvals; robustness to specular tissues and motion.
- Standards and governance for 3DGS data (Sector: policy, cybersecurity)
- What: Regulatory and technical standards for storage, exchange, and redaction of photorealistic maps.
- Why 3DGS-SLAM: Rich textures raise privacy risks; explicit Gaussians enable selective obfuscation/redaction policies.
- Dependencies: Interoperability standards, privacy-by-design toolchains (on-device processing, encryption), and audit mechanisms.
- Disaster response mapping in degraded conditions (Sector: emergency response)
- What: Rapid reconstruction in smoke, low light, or high-speed motion using event cameras and multi-sensor rigs.
- Why 3DGS-SLAM: Event-frame compatibility and inertial/LiDAR fusion could maintain tracking when standard cameras fail.
- Dependencies: Ruggedized hardware, event–frame fusion maturity, resilient loop closure, and operational protocols.
- Learning from 3DGS maps for robot manipulation and autonomy (Sector: robotics/AI)
- What: Using high-fidelity maps as training targets for grasping, placement, or navigation policies; sim-to-real transfer.
- Why 3DGS-SLAM: Continuous, photorealistic geometry aids contact reasoning and visual domain realism vs. sparse maps.
- Dependencies: Differentiable interfaces to planners/policies, real-time updates during interaction, and consistent scale/metric accuracy.
- Federated, privacy-preserving SLAM for homes and workplaces (Sector: consumer/enterprise)
- What: Local mapping with encrypted sharing of minimal splat updates for opt-in services (e.g., shared AR).
- Why 3DGS-SLAM: Compact explicit representation and memory-aware methods support selective, bandwidth-efficient sharing.
- Dependencies: Federated protocols, differential privacy techniques for textures, and user-consent frameworks.
These applications stem from the survey’s core findings: 3DGS-SLAM’s explicit, differentiable, and GPU-friendly representation bridges the fidelity–efficiency gap; robust tracking (photometric + geometric), loop closure, and multi-sensor integration expand operational envelopes; memory optimizations and tile-based rasterization enable real-time deployment. Feasibility depends on compute budgets, sensor calibration and synchronization, semantic/dynamic-scene maturity, data governance, and standardization for interoperable workflows.
Glossary
- 3D Gaussian Splatting (3DGS): An explicit scene representation using anisotropic 3D Gaussians rendered by splatting for fast, high-quality view synthesis. "3D Gaussian Splatting (3DGS), with its efficient explicit representation and high-quality rendering capabilities, offers a new reconstruction paradigm for SLAM."
- affine Jacobian: The Jacobian matrix of an affine projection, describing local derivatives of the camera projection. "where J is the affine Jacobian of the projection."
- alpha blending: A compositing technique that combines colors using opacity in a front-to-back order. "The color C of a pixel is obtained by front-to-back alpha blending of the projected Gaussians:"
- axis-aligned feature planes: Orthogonal 2D feature grids aligned with axes used to parameterize 3D features efficiently. "such as the axis-aligned feature planes in ESLAM or the neural point cloud representations in Point-SLAM."
- backpropagation: Gradient-based parameter update via reverse-mode differentiation through the rendering pipeline. "computed and backpropagated to update each Gaussian’s parameters"
- CUDA: NVIDIA’s parallel computing platform for GPU-accelerated processing of per-tile/pixel kernels. "this approach is efficiently parallelized on CUDA."
- cumulative transparency: The accumulated transmittance along a ray from previously blended elements. "T_i is the cumulative transparency from preceding Gaussians."
- differentiable projection: A projection step formulated to allow gradients to flow to 3D parameters and poses. "2)Differentiable Projection:"
- differentiable rasterization: A rasterization process designed to be differentiable so that rendering is amenable to gradient-based learning. "renders views via differentiable rasterization and iteratively refines the geometry through adaptive optimization."
- end-to-end mapping: Joint learning from inputs to outputs without manual intermediate stages, enabling unified optimization. "enable learnable, differentiable, end-to-end mapping."
- explicit geometric representations: Directly stored scene primitives such as points, voxels, or meshes, as opposed to neural functions. "explicit geometric representations (e.g., point clouds, voxels, meshes)"
- Gaussian primitive: A 3D Gaussian used as a basic rendering element in the scene representation. "Point Cloud and Gaussian Primitive Initialization"
- Gaussian splat: The projected image-space footprint of a 3D Gaussian primitive used for splatting-based rendering. "each 3D Gaussian splat G_i is initialized with parameters"
- graph-based paradigm: A SLAM formulation that optimizes a pose/landmark graph with loop-closure constraints. "the ORB-SLAM series established the classic graph-based paradigm by integrating loop closure and rigorous keyframe management."
- implicit neural representations: Coordinate-conditioned neural functions (rather than explicit geometry) used to represent scenes. "implicit neural representations excel at detail synthesis"
- keyframe management: The selection and maintenance of representative frames to support robust mapping and loop closure. "rigorous keyframe management."
- loop closure: Recognizing previously visited places to add constraints that correct accumulated trajectory error. "integrating loop closure and rigorous keyframe management."
- multi-layer perceptrons (MLPs): Feedforward neural networks used as function approximators for scene representations. "tend to use large voxel hashes or multi-layer perceptrons (MLPs) for scene representation"
- multi-scale SSIM: A perceptual image similarity metric (Structural SIMilarity) computed at multiple scales and used as a loss. "multi-scale SSIM loss"
- Neural Radiance Fields (NeRF): An implicit neural volumetric model that maps coordinates and view directions to view-dependent radiance and density. "Neural Radiance Fields (NeRF) and its variants"
- positive semi-definite: A matrix property where all quadratic forms are non-negative; required for valid covariance matrices (see the sketch after this glossary). "To ensure Σ_i is positive semi-definite, it is reparameterized via a rotation R_i and scale matrix S_i:"
- pose drift: The gradual accumulation of error in estimated camera poses over time. "causing pose drift."
- quaternion: A four-parameter representation of 3D rotation used for stable optimization of orientations. "R_i is generated from a learnable quaternion q_i=(q_w,q_x,q_y,q_z)."
- rasterization (tile-based parallel rasterization): Converting projected primitives into pixels using a tiled, parallel scheme for efficiency. "3DGS uses a tile-based parallel rasterization to avoid costly per-pixel iteration."
- spherical harmonics: Basis functions on the sphere used to compactly model view-dependent color. "color is typically represented by spherical harmonics"
- structure-from-motion (SfM): Recovery of camera poses and sparse 3D structure from multiple overlapping images. "using structure-from-motion (SfM) to generate a sparse point cloud"
- tightly coupled architectures: SLAM systems that jointly optimize tracking and mapping within a single integrated framework. "finally to tightly coupled architectures (e.g., the ORB-SLAM series and DSO)."
- view frustum: The pyramidal volume visible to the camera used to cull off-screen elements. "first prunes Gaussians lying outside the view frustum."
- view transformation matrix: A matrix that maps world coordinates into the camera’s view space for projection. "Given a view transformation matrix W"
- voxel hash: A sparse voxel storage scheme that uses hashing to index occupied voxel blocks efficiently. "large voxel hashes"
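Several glossary entries above (positive semi-definite, quaternion, view transformation matrix, affine Jacobian) fit together in one pair of formulas. The sketch below (NumPy; the variable names and the pinhole Jacobian are illustrative assumptions, not code from the paper) builds Σ = R S Sᵀ Rᵀ from a learnable quaternion and scales, then projects it to the 2D screen-space covariance Σ' = J W Σ Wᵀ Jᵀ used for splatting.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion (q_w, q_x, q_y, q_z), normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_3d(q, scales):
    """Sigma = R S S^T R^T -- positive semi-definite by construction."""
    R = quat_to_rotmat(q)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

def project_covariance(sigma3d, W_rot, J):
    """2D screen-space covariance Sigma' = J W Sigma W^T J^T.

    W_rot: (3, 3) rotation part of the view transformation matrix W
    J:     (2, 3) affine Jacobian of the camera projection
    """
    return J @ W_rot @ sigma3d @ W_rot.T @ J.T

# Toy usage with an identity view and a pinhole Jacobian at depth z.
q = np.array([1.0, 0.0, 0.0, 0.0])           # no rotation
sigma = covariance_3d(q, scales=np.array([0.05, 0.02, 0.01]))
fx = fy = 500.0
x, y, z = 0.1, -0.2, 2.0                       # Gaussian center in view space
J = np.array([[fx / z, 0.0, -fx * x / z**2],   # derivatives of u = fx*x/z
              [0.0, fy / z, -fy * y / z**2]])  # derivatives of v = fy*y/z
print(project_covariance(sigma, np.eye(3), J))
```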