RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection

Published 3 Apr 2026 in cs.CV and cs.AI | (2604.02903v1)

Abstract: Long-range 3D object detection remains challenging because LiDAR observations become highly sparse and fragmented in the far field, making reliable context modeling difficult for existing detectors. To address this issue, recent state space model (SSM)-based methods have improved long-range modeling efficiency. However, their effectiveness is still limited by generic serialization strategies that fail to preserve meaningful contextual neighborhoods in sparse scenes. To overcome this limitation, we propose RayMamba, a geometry-aware plug-and-play enhancement for voxel-based 3D detectors. RayMamba organizes sparse voxels into sector-wise ordered sequences through a ray-aligned serialization strategy, which preserves directional continuity and occlusion-related context for subsequent Mamba-based modeling. It is compatible with both LiDAR-only and multimodal detectors, while introducing only modest overhead. Extensive experiments on nuScenes and Argoverse 2 demonstrate consistent improvements across strong baselines. In particular, RayMamba achieves up to 2.49 mAP and 1.59 NDS gain in the challenging 40–50 m range on nuScenes, and further improves VoxelNeXt on Argoverse 2 from 30.3 to 31.2 mAP.


Authors (5)

Summary

The paper introduces a ray-aligned serialization approach that organizes sparse voxel features into sector-wise, 1D sequences aligned with LiDAR sensor geometry.
The method integrates a SectorMamba3D block for independent sequence modeling across angular sectors, yielding improved mAP and NDS on nuScenes and AV2 benchmarks.
Empirical results show enhanced long-range 3D object detection with marginal increases in parameters and latency, demonstrating its potential for real-time autonomous systems.

RayMamba: Sensor-Geometry-Aware Serialization for Long-Range 3D Object Detection

Introduction

Long-range 3D object detection with LiDAR is constrained by severe point cloud sparsity and incomplete returns at extended distances, caused predominantly by occlusion and distance-dependent point dropout. These sensor effects challenge conventional convolutional and Transformer-based 3D detection backbones, whose locally or globally aggregated context cannot robustly compensate for the sparsity and fragmentation of distant objects. The paper "RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection" (2604.02903) introduces an alternative serialization paradigm, RayMamba, which explicitly aligns sequence construction with LiDAR sensor geometry: sparse voxel features are organized into sector-wise, directionally contiguous 1D sequences for Mamba-based sequence modeling.

Figure 1: Distant objects in LiDAR-based perception are often sparsely represented due to occlusion and distance, highlighting the far-field observation challenge.

The RayMamba approach addresses the core issue with generic serialization schemes—such as Hilbert or Z-order curves—that fail to preserve semantically meaningful neighborhoods under high sparsity. Instead, the ray-aligned serialization strategy partitions the space into angular sectors and imposes a top-down, azimuth-continuous ordering, directly mirroring the spatial and occlusion structure encountered in LiDAR-based perception.
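To make the neighborhood problem concrete, the following is a toy 2D illustration (not taken from the paper) of how a Z-order (Morton) curve can separate two adjacent voxels that lie along the same viewing direction when the grid is sparse. The function name `morton2d`, the bit width, and the three example voxels are assumptions chosen for illustration; the paper's comparisons focus on Hilbert ordering in 3D.

```python
def morton2d(x, y, bits=8):
    """Interleave the bits of non-negative integers x and y (Z-order code)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)      # x bits go to even positions
        code |= ((y >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return code


# Two adjacent voxels on the same ray along +x, plus one unrelated voxel on +y.
sparse_voxels = [(7, 0), (8, 0), (0, 7)]
z_order = sorted(sparse_voxels, key=lambda v: morton2d(*v))
print(z_order)  # [(7, 0), (0, 7), (8, 0)]
```

In this small example the unrelated voxel (0, 7) lands between the two neighbors (7, 0) and (8, 0) in the serialized sequence, which is the kind of mixing that a sector-wise, azimuth-continuous ordering is meant to avoid.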

Methodological Innovations

The RayMamba framework augments sparse voxel-based detectors using a plug-and-play module that consists of two principal components: Ray-Aligned Serialization and SectorMamba3D sequence modeling.

Ray-Aligned Serialization first transforms voxel indices into polar coordinates with respect to the ego vehicle, partitioning voxels into independent azimuthal sectors, then sorts each sector's voxels by height, and within each height stratum, by azimuth. This serialization is realized through a precomputed sector template facilitating efficient runtime queries and CUDA-based parallel reordering.
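The ordering described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes azimuths on the fly rather than using a precomputed sector template, it runs on the CPU rather than as a CUDA kernel, and it assumes that "by height" means highest stratum first. The function name `ray_aligned_serialize` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def ray_aligned_serialize(voxels, ego_xy=(0.0, 0.0), sector_deg=60.0):
    """Group voxels into azimuthal sectors and order each sector.

    voxels: list of (x, y, z) voxel coordinates, z being the height index.
    Returns {sector_id: [positions in `voxels`, in serialized order]}.
    """
    n_sectors = int(round(360.0 / sector_deg))
    sectors = defaultdict(list)
    for i, (x, y, z) in enumerate(voxels):
        # Azimuth of the voxel as seen from the ego vehicle, in [0, 360).
        az = math.degrees(math.atan2(y - ego_xy[1], x - ego_xy[0])) % 360.0
        sector_id = int(az // sector_deg) % n_sectors
        # Sort key: height stratum first (highest first, an assumption),
        # then azimuth within the stratum; i breaks exact ties.
        sectors[sector_id].append((-z, az, i))
    return {s: [i for _, _, i in sorted(items)]
            for s, items in sorted(sectors.items())}


voxels = [(10, 0, 0), (10, 1, 2), (10, 2, 2), (-5, 5, 1), (10, 1, 0)]
print(ray_aligned_serialize(voxels))  # {0: [1, 2, 0, 4], 2: [3]}
```

With the default 60° step, the four voxels in front of the vehicle fall into sector 0 and are ordered by height stratum and then azimuth, while the voxel at 135° forms its own sequence in sector 2.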

Figure 2: Ray-aligned ordering (blue) preserves geometric coherence in sparse sequences, in contrast to Hilbert ordering (purple) that mixes spatially unrelated regions in far-field contexts.

Figure 3: The ray-aligned serialization strategy partitions BEV space into angular sectors, applying vertical and angular ordering to ensure directionally coherent and occlusion-consistent sequences.

The serialized sequences are subsequently processed by the SectorMamba3D block, which independently applies Mamba-based state space modeling within each sector. This design is motivated by the observation that, under severe sparsity, sector-wise continuity encodes occlusion relationships and directional context more effectively than any global space-filling curve. After sequence modeling, the enhanced features are mapped back to their original spatial (sparse voxel) indices for downstream detection head fusion.
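The data flow of this step (gather per sector, model each sequence independently, scatter back) can be sketched as follows. The recurrence used here is a toy fixed-coefficient linear scan standing in for Mamba; it has none of Mamba's input-dependent selectivity or gating, and scalar features replace feature vectors for brevity. The names `scan_sector`, `sector_wise_model`, and the coefficients `a`, `b`, `c` are hypothetical, and the sector-to-order mapping is assumed to be given.

```python
def scan_sector(seq, a=0.5, b=1.0, c=1.0):
    """Toy linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, out = 0.0, []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out


def sector_wise_model(features, order_by_sector):
    """Run the scan independently per sector and scatter results back.

    features: one scalar feature per voxel, in original voxel order.
    order_by_sector: {sector_id: [voxel positions in serialized order]}.
    """
    enhanced = list(features)  # voxels in no sector keep their input value
    for order in order_by_sector.values():
        # The hidden state starts from zero for every sector, so no
        # context leaks across sector boundaries.
        ys = scan_sector([features[i] for i in order])
        for i, y in zip(order, ys):
            enhanced[i] = y  # map back to the original voxel position
    return enhanced


features = [1.0, 2.0, 4.0, 8.0, 16.0]
order_by_sector = {0: [1, 2, 0, 4], 2: [3]}
print(sector_wise_model(features, order_by_sector))
# [3.5, 2.0, 5.0, 8.0, 17.75]
```

Because each sector is scanned with its own fresh state, the output at a voxel depends only on voxels that precede it within the same sector's sequence.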

Figure 4: RayMamba blocks are inserted into sparse 3D convolutional backbones. Ray-Aligned Serialization and SectorMamba3D enable sector-wise, geometry-aware sequence modeling followed by reintegration into voxel space.

Notably, RayMamba eschews explicit radial sorting: the ablation shows that adding distance-based ordering either improves overall performance only marginally or degrades it, reinforcing the hypothesis that azimuthal and vertical continuity are the critical context carriers in far-range conditions.

Empirical Results

RayMamba is evaluated on the nuScenes and Argoverse 2 (AV2) benchmarks by integrating it into CenterPoint (LiDAR-only), MV2DFusion (LiDAR + camera), and VoxelNeXt (fully sparse). The consistent performance gains support both the modularity and the generality of the method.

nuScenes: RayMamba achieves substantial gains over strong baselines. With CenterPoint, it improves the 40–50 m range by +2.49 mAP and +1.59 NDS over the baseline, while also increasing overall mAP and NDS (+1.0 and +0.9, respectively). For MV2DFusion, the approach delivers consistent improvements across all ranges.
AV2: Applied to VoxelNeXt, RayMamba increases overall mAP from 30.3 to 31.2, with pronounced improvement for challenging detection classes and across the farthest range segment (50–150 m), demonstrating geometric serialization efficacy even under extreme data sparsity.

A comprehensive ablation on the angular sector step ($\Delta\theta$) and serialization granularity reveals a trade-off between contextual fidelity (smaller sectors) and computational efficiency, with $\Delta\theta=60^\circ$ offering the best performance/efficiency balance in the default configuration.
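As a rough illustration of what the sector step controls, the sketch below counts how many sectors a given $\Delta\theta$ produces and how many voxels fall into each. The default $\Delta\theta=60^\circ$ yields 360/60 = 6 sectors; the 120° comparison value and the sample azimuths are hypothetical and not taken from the paper's ablation. The helper name `sector_stats` is likewise an assumption.

```python
def sector_stats(azimuths_deg, step_deg):
    """Return (number of sectors, voxel count per sector) for a sector step."""
    n_sectors = int(round(360.0 / step_deg))
    counts = [0] * n_sectors
    for az in azimuths_deg:
        counts[int((az % 360.0) // step_deg) % n_sectors] += 1
    return n_sectors, counts


azimuths = [5, 50, 65, 130, 200, 355]  # hypothetical voxel azimuths in degrees
print(sector_stats(azimuths, 60))   # (6, [2, 1, 1, 1, 0, 1])
print(sector_stats(azimuths, 120))  # (3, [3, 2, 1])
```

A smaller step produces more, shorter sequences that each cover a narrower range of directions, at the cost of more independent sequence-modeling passes.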

Figure 5: RayMamba improves qualitative detection on challenging long-range, occluded targets, compared with baseline predictions.

Comparative analysis with Hilbert-curve-based serialization substantiates the claim that geometry-aware (ray-aligned) ordering yields superior context modeling in sparse regimes, especially in the most challenging near-boundary and high-occlusion scenarios.

Efficiency Considerations

RayMamba introduces modest resource overhead. The sector template is computed offline, so its cost is amortized, and runtime sorting is parallelized, which together keep the added latency marginal. The module adds just 0.40M parameters to CenterPoint and 2.47M to MV2DFusion, with only minimal throughput reduction, while yielding marked improvements in long-range detection accuracy. These results support the method's suitability for deployment in latency-sensitive, real-time perception stacks.

Theoretical and Practical Implications

RayMamba advances the understanding that both the serialization strategy and the inductive bias induced by respecting sensor geometry are pivotal for effective sequence modeling in sparse 3D perception. This approach bridges the gap between classical geometry-inspired processing and data-driven neural modeling, underscoring that neural architectures must be co-designed with an explicit acknowledgment of the sensor's physical characteristics to maintain context and spatial coherence under severe data drop-out.

On the practical side, RayMamba’s compatibility with both unimodal and multimodal 3D object detection frameworks, coupled with its marginal runtime and parameter overhead, positions it as a robust candidate for deployment in large-scale autonomous driving systems seeking to improve safety via better long-range hazard anticipation.

Future Directions

The current bottleneck lies in the sector-wise Mamba's serial execution: as sector granularity increases, so does the overhead, limiting deployment scalability. Future research may explore parallelization within sector-wise modeling modules or integrate dynamic, adaptive sector partitioning schemes sensitive to evolving scene context. Expanding RayMamba to other sensor modalities or domain adaptation scenarios is a natural avenue, as is its potential synergy with camera- or radar-centric self-attention layers.

Conclusion

RayMamba demonstrates that sensor-geometry-aware serialization is critical for maximal exploitation of sparse, occluded, long-range LiDAR data in 3D object detection. The ray-aligned sector-wise ordering enables directionally coherent, context-preserving state space modeling, outperforming generic serialization methods both quantitatively and qualitatively. By bridging efficient sparse computation and geometry-sensitive sequence construction, this work sets a paradigm for future architectures that combine physical sensor priors with neural context aggregation for robust autonomous perception.
