
GPU-Native SDF Perception

Updated 12 March 2026
  • The paper introduces GPU-native SDF perception, combining block-sparse TSDF/ESDF and neural SDF decoding to achieve real-time spatial reasoning.
  • It details custom CUDA kernel parallelism and memory coalescence techniques that eliminate CPU bottlenecks, resulting in sub-millisecond update latencies and high collision recall.
  • The work demonstrates practical applications in robotics, control, and 3D reconstruction, enabling rapid planning and high-throughput mapping under dynamic conditions.

GPU-native signed distance field (SDF) perception refers to geometric understanding systems that maintain a continuous or discretized implicit signed distance function entirely on the GPU. These systems underpin real-time spatial reasoning in robotics, vision, and autonomous navigation by enabling high-throughput geometric queries (collision checks, free-space lookups, and gradient-based planning) without CPU round-trips or offline preprocessing. Advances in this area span explicit volumetric grids, analytic geometric priors, and neural network SDFs, using fused memory layouts, batched MLP evaluations, and fully parallelized kernels for orders-of-magnitude acceleration in mapping, control, and reconstruction tasks.

1. Core Methodologies in GPU-Native SDF Perception

GPU-native SDF perception pipelines operationalize the SDF as either explicit voxel grids (TSDFs/ESDFs) or learned neural fields, manipulating all elements—sparse hash-based layouts, query batching, distance transforms—entirely on GPU memory with parallel CUDA kernels.

  • Volumetric Block-Sparse TSDF/ESDF: Systems such as nvblox and cuRoboV2 store the SDF as spatially hashed, block-sparse grids, each block an 8×8×8 voxel array, with hash-table allocation and per-voxel fusion implemented in kernel launches per frame. SDF updates are triggered by sensor depth integration, and Euclidean signed distance fields (ESDFs) are constructed via multi-pass parallel banding algorithms. The entire mapping, querying, and updating loop—from depth frame to up-to-date ESDF—is executed by dedicated GPU kernels, supporting real-time integration rates (e.g., 0.4 ms/frame for TSDF, ~1 ms for ESDF) (Millane et al., 2023, Sundaralingam et al., 5 Mar 2026).
  • Neural SDF Decoding: Neural SDF approaches such as Gate-SDF, GPU-SDF, iSDF, and GSurf represent the SDF as an implicit function $f_\theta(x): \mathbb{R}^3 \to \mathbb{R}$ parameterized by neural networks (MLPs). Queries for arbitrary 3D points are batched and forwarded through the network in parallel, exploiting large tensor operations and GPU matrix-multiply primitives. Adaptive detail is realized by focusing network capacity through keyframe replay, stratified or uncertainty-guided sampling, and local feature encoding (e.g., multi-resolution hash grids, Instant-NGP style) (Zhao et al., 7 Mar 2026, Feng et al., 27 Feb 2026, Ortiz et al., 2022, Xu et al., 2024).
  • End-to-End GPU Processing: Across both explicit and neural SDFs, principal design choices for GPU-native pipelines include: (1) elimination of CPU-GPU roundtrips (all fusions and queries in device memory), (2) fixed-shape batched kernels replacing per-loop host iteration, (3) memory coalescence (block-contiguous layouts, linear addressing), and (4) atomicity minimization (block-level conflict resolution, per-voxel unique threads).
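The block-sparse layout described above can be sketched on the CPU to make the data structure concrete. This is a minimal illustration, not the nvblox or cuRobo implementation: the class and method names (`BlockSparseSDF`, `query_batch`) are invented for the example, and the per-point loop stands in for what would be a single GPU kernel launch with one thread per query.

```python
BLOCK = 8  # voxels per block edge, matching the 8x8x8 blocks described above

class BlockSparseSDF:
    """Spatially hashed, block-sparse SDF grid with batched point queries."""

    def __init__(self, voxel_size, default=1.0):
        self.voxel_size = voxel_size
        self.default = default   # distance returned for unallocated space
        self.blocks = {}         # (bx, by, bz) -> flat list of 512 voxels

    def _locate(self, p):
        # World point -> (block hash key, linear voxel index inside the block).
        v = [int(c // self.voxel_size) for c in p]
        key = tuple(c // BLOCK for c in v)
        lx, ly, lz = (c % BLOCK for c in v)
        return key, (lz * BLOCK + ly) * BLOCK + lx  # contiguous linear address

    def set(self, p, d):
        key, idx = self._locate(p)
        block = self.blocks.setdefault(key, [self.default] * BLOCK**3)
        block[idx] = d

    def query_batch(self, points):
        # On the GPU this loop is one kernel launch, one thread per point,
        # reading the same block-contiguous memory layout.
        out = []
        for p in points:
            key, idx = self._locate(p)
            block = self.blocks.get(key)
            out.append(block[idx] if block else self.default)
        return out
```

The hash-table-plus-contiguous-block split is what enables both sparsity (no memory spent on unobserved space) and coalesced reads (neighboring voxels of one block are adjacent in memory).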

2. SDF Representations: Analytical, Volumetric, and Neural

  • Analytical SDFs: Used for ground-truth supervision and as geometric priors, analytic SDFs define the distance to objects or regions (e.g., $s_\mathrm{guide}(p) = c + |x|\tan\alpha - r(p)$ for a gate frustum) and can be min-combined with data-driven SDFs for hybrid fusion (Zhao et al., 7 Mar 2026, Sundaralingam et al., 5 Mar 2026).
  • Volumetric Grids (TSDF/ESDF): The SDF is discretized into surfel-aligned blocks; TSDF values are computed via fusion from projective depth, with running-weight averaging, truncation, and spatial confidence fields. ESDFs are constructed from the TSDF using parallelized sweep algorithms (e.g., PBA+), yielding distance-to-surface maps at reduced grid resolution for efficient querying (Millane et al., 2023, Sundaralingam et al., 5 Mar 2026).
  • Neural SDFs: The SDF is realized as a continuous function. GPU-native neural SDF variants use positional encoding, shallow MLPs, or hash-grid encodings to efficiently represent $f_\theta(x)$, accepting either a raw 3D query or its locally encoded embedding. The batch of points to be evaluated is determined by the requirements of planning (e.g., tens of thousands of trajectory rollout points for MPC) (Zhao et al., 7 Mar 2026, Feng et al., 27 Feb 2026, Ortiz et al., 2022).
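The analytic gate-frustum prior quoted above can be written out directly. The sketch below assumes $r(p)$ is the radial distance from the gate's x-axis and picks arbitrary values for $c$ and $\alpha$; the paper's exact parameterization and sign convention may differ. It also shows min-fusion with a second, data-driven SDF, as described for hybrid pipelines.

```python
import math

def s_guide(p, c=0.5, alpha=math.radians(20.0)):
    # Analytic gate-frustum prior: s_guide(p) = c + |x| * tan(alpha) - r(p),
    # with r(p) taken as radial distance from the x-axis (an assumption).
    x, y, z = p
    r = math.hypot(y, z)
    return c + abs(x) * math.tan(alpha) - r

def fused_sdf(p, data_sdf):
    # Hybrid fusion: min-combining two SDFs, as in the analytic-prior bullet.
    return min(s_guide(p), data_sdf(p))
```

Because both terms are cheap, closed-form expressions, the analytic prior adds negligible cost to a batched GPU query and can backstop the learned field in regions with little training data.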

3. Loss Functions and Training Paradigms

  • Supervised and Self-Supervised Losses:
    • Analytical SDF losses: Direct L1 regression between network SDF and analytic SDF (for points sampled within the camera frustum or environment).
    • Reconstruction: Auxiliary image or color loss, often via a depth decoder or volume rendering.
    • Eikonal regularization: Penalizes deviation of $\|\nabla f_\theta(x)\|_2$ from unity, enforcing proper SDF gradient behavior (Feng et al., 27 Feb 2026, Xu et al., 2024, Ortiz et al., 2022).
    • Edge, geometric, and normal field losses: Encourage preservation of sharp features and faithful surface orientation (Feng et al., 27 Feb 2026).
    • Uncertainty-guided or active sampling: Losses modulated by geometric or predicted uncertainty, focusing learning on regions where priors are weak or ambiguous (Feng et al., 27 Feb 2026).
    • Self-supervised bounds: For real-time continual SDF field learning from raw depth, the loss is formulated as an upper bound based on min-distance to observed surfaces, with additional gradient and surface consistency regularizers (Ortiz et al., 2022).
  • Hybrid Training Schedules:
    • Two-stage regimes: Neural SDF system pre-trained in simulation with denoising objectives and analytic SDFs, then fine-tuned on real data with partial or full freezing of decoder components (Zhao et al., 7 Mar 2026).
    • Direct Gaussian supervision: Supervision using positions and normals of splatted 3D Gaussian disks for rapid convergence and robust mesh generation (Xu et al., 2024).
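The Eikonal regularizer above is simple enough to sketch end to end. The version below estimates the gradient of an arbitrary scalar field by central finite differences; a neural implementation would use autodiff instead, and the function names here are illustrative only.

```python
import math

def grad_fd(f, p, h=1e-4):
    # Central-difference gradient of a scalar field f at 3D point p.
    g = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        g.append((f(hi) - f(lo)) / (2 * h))
    return g

def eikonal_loss(f, points):
    # Mean squared deviation of the gradient norm from unity over a batch,
    # i.e. the penalty on || grad f(x) ||_2 deviating from 1.
    total = 0.0
    for p in points:
        g = grad_fd(f, p)
        total += (math.sqrt(sum(c * c for c in g)) - 1.0) ** 2
    return total / len(points)

# A true SDF has unit gradient norm almost everywhere, hence near-zero loss:
sphere = lambda p: math.sqrt(sum(c * c for c in p)) - 1.0
```

A field that is merely a scaled distance (e.g., $2\,f(x)$) has gradient norm 2 and incurs a loss of 1 per sample, which is exactly the behavior the regularizer is meant to penalize.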

4. GPU Kernel Parallelism and Implementation Details

Representative pipelines feature key design choices for high-throughput operation:

  • Batched SDF Queries: All planning, mapping, and rendering kernels (e.g., 10⁵–10⁸ points per control step) are launched as large matrix multiplications and activation batches, or as voxel-aligned sweeps, achieving full occupancy and throughput (e.g., 80%+ occupancy, 2–3 ms per 10⁵ SDF queries) (Zhao et al., 7 Mar 2026, Millane et al., 2023, Sundaralingam et al., 5 Mar 2026).
  • Memory Structure and Coalescence: Block-sparse allocation and per-block/voxel layouts (hash-table and contiguous 8³ blocks) allow for efficient neighbor lookups and no waste in unallocated regions. Hash-grid encoders for neural fields deliver fine-grained local encoding with minimal compute overhead (Millane et al., 2023, Feng et al., 27 Feb 2026, Sundaralingam et al., 5 Mar 2026).
  • Custom CUDA Kernels: Key architectural elements are implemented as custom kernels: block allocation (atomic CAS for hash), site seeding and propagation (PBA+), fragment sorting and blending for Gaussians, and hybrid gather/scatter strategies for minimal atomics and memory contention (Millane et al., 2023, Sundaralingam et al., 5 Mar 2026, Xu et al., 2024).
  • JIT Compilation and Framework Integration: Many pipeline stages are fused with XLA/JAX or PyTorch CUDA graphs, enabling single-step kernel launch per control or mapping timestep, and reducing CPU synchronization overhead (Zhao et al., 7 Mar 2026, Feng et al., 27 Feb 2026).
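The "fixed-shape batched kernels" idea from the list above reduces, for neural SDFs, to evaluating the whole query batch as a few matrix multiplications. The toy two-layer MLP below makes that shape explicit in pure Python; on the GPU each `matmul` maps onto a single GEMM kernel. The weights are arbitrary placeholders, not a trained SDF.

```python
def matmul(a, b):
    # Dense matrix product; stands in for a batched GPU GEMM.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def batched_sdf(points, w1, w2):
    # points: B x 3, w1: 3 x H, w2: H x 1  ->  B SDF values.
    # The batch dimension B is fixed per launch, so kernel shapes never vary.
    h = relu(matmul(points, w1))
    return [row[0] for row in matmul(h, w2)]
```

Because every control step issues the same shapes, these calls can be captured once into a CUDA graph or a JIT-compiled XLA program and replayed with no host-side iteration.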

5. SDF Perception in Downstream Planning and Control

SDF perception modules feed directly into planning and control pipelines:

  • Sampling-Based Model Predictive Control (MPPI): SDF values and distances condition cost functions or constraints for each rollout in a control horizon. For example, in Gate-SDF, given $M$ rollouts of $K$ steps, all $M \cdot K$ SDF values are computed in a single batch and used to produce collision and guidance costs in real time (Zhao et al., 7 Mar 2026).
  • Collision Detection and Clearance: Dense ESDFs provide subcentimeter-accurate queries for $10^6$–$10^8$ points per planning cycle, supporting deterministic, kHz-scale collision-checking in whole-body or arm manipulation and humanoid locomotion. The high recall rates (≥99%) support safety-critical applications (Sundaralingam et al., 5 Mar 2026, Millane et al., 2023).
  • Surface Reconstruction and Scene Understanding: Neural and analytic SDF fields yield high-fidelity surfaces used for downstream 3D reconstruction, appearance modeling, and scene segmentation, including architectures that combine regularization, edge-awareness, and multi-view consistency to handle weakly observed or uncertain regions (Feng et al., 27 Feb 2026, Xu et al., 2024).
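The MPPI pattern above, flattening $M$ rollouts of $K$ steps into one query batch and folding the results back into per-rollout costs, can be sketched as follows. The soft hinge penalty on a safety margin is illustrative, not the exact cost used in Gate-SDF, and `sdf` here is any callable.

```python
def rollout_costs(rollouts, sdf, margin=0.1, weight=100.0):
    # rollouts: M x K x 3 trajectory points; sdf: point -> signed distance.
    flat = [p for traj in rollouts for p in traj]   # one M*K batch
    d = [sdf(p) for p in flat]                      # single batched query
    K = len(rollouts[0])
    costs = []
    for m in range(len(rollouts)):
        # Penalize points whose clearance falls below the safety margin.
        c = sum(weight * max(0.0, margin - di) for di in d[m * K:(m + 1) * K])
        costs.append(c)
    return costs
```

Keeping the query as a single batch is what lets the whole horizon evaluation run as one GPU launch rather than $M \cdot K$ host-side calls.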

6. Performance Metrics and Comparative Evaluation

Empirical assessments confirm the computational and functional benefits of GPU-native SDF perception:

| System   | SDF Query Throughput  | SDF Update Latency          | Memory Usage  | Collision Recall (%) |
|----------|-----------------------|-----------------------------|---------------|----------------------|
| cuRoboV2 | 10⁶–10⁸ queries/step  | 0.52 ms (TSDF), 1 ms (ESDF) | 0.82–1.63 GB  | 97–99.7              |
| nvblox   | 6–7 Gqueries/s        | 0.4 ms (TSDF), 1.9 ms (ESDF)| 1.63–11.87 GB | 92–97                |
| Gate-SDF | 10⁵ queries/step      | 2 ms (MLP), 1 ms (encoder)  | --            | --                   |
| iSDF     | ~30 Hz end-to-end     | 33 ms/frame                 | 1 MB          | --                   |
| GSurf    | 20–60 fps rendering   | ~0.8 h training             | --            | --                   |

Volumetric systems (cuRoboV2, nvblox) achieve 7–10× speedup and 2–8× less memory usage over prior art without sacrificing collision recall (Sundaralingam et al., 5 Mar 2026, Millane et al., 2023). Neural SDF methods yield reconstruction errors of 2–6 cm (absolute SDF), outperforming traditional voxel grids on both accuracy and memory efficiency (Ortiz et al., 2022). Gate-SDF enables real-time, fully onboard drone flight at 5–10 m/s velocities with robust operation under pose perturbations up to ±0.6 m and ±60° (Zhao et al., 7 Mar 2026). Neural reconstruction methods with hash encoding and direct supervision reduce training time by up to 10× while preserving surface detail (Xu et al., 2024, Feng et al., 27 Feb 2026).

7. Applications and Limitations

GPU-native SDF perception underpins a range of high-performance spatial AI tasks, including fully onboard drone flight, whole-body manipulation and humanoid locomotion planning, high-throughput mapping, and dense 3D surface reconstruction and scene understanding.

Common limitations include challenges in preserving fine details under weak supervisory signals, memory consumption in large-scale or high-resolution volumetric settings, and the requirement for careful uncertainty modeling or hybrid analytic-prior fusion in highly dynamic scenes (Feng et al., 27 Feb 2026, Sundaralingam et al., 5 Mar 2026). A plausible implication is that future systems will combine the strengths of explicit sparse grid representations (for deterministic, scalable queries), neural SDFs (for compactness and surface completion), and GPU-specific architectural optimizations for minimal latency and maximal throughput.
