X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD (2405.02187v1)
Abstract: We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters through Taylor series expansion within the complex domain. Our system allows for the real-time calculation of not only gradients but also higher-order derivatives. This facilitates the use of high-order optimizers to achieve better accuracy and faster convergence. Building on X-SLAM, we implemented end-to-end optimization frameworks for two important tasks: camera relocalization in wide outdoor scenes and active robotic scanning in complex indoor environments. Comprehensive evaluations on public benchmarks and intricate real scenes underscore the improvements in the accuracy of camera relocalization and the efficiency of robotic navigation achieved through our task-aware optimization. The code and data are available at https://gapszju.github.io/X-SLAM.
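For readers unfamiliar with CSFD, the idea named in the abstract can be illustrated on a scalar function. Expanding f in a Taylor series around x with a purely imaginary step ih gives f(x + ih) = f(x) + ih f'(x) - (h^2/2) f''(x) + ..., so Im(f(x + ih)) / h approximates f'(x) with an O(h^2) error and without the subtractive cancellation of ordinary finite differences. The sketch below is a minimal illustration of that general technique, not the X-SLAM implementation: the names `complex_step_derivative` and `toy_function`, the test point, and the step size are all assumptions chosen for the example, and the toy function merely stands in for a differentiable pipeline.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) as Im(f(x + i*h)) / h (complex-step finite difference).

    Because no subtraction of nearly equal numbers occurs, h can be made
    extremely small without losing precision. (Hypothetical helper name.)
    """
    return f(complex(x, h)).imag / h


def toy_function(z):
    """Analytic stand-in for a differentiable function: f(z) = exp(z) * sin(z)."""
    return cmath.exp(z) * cmath.sin(z)


x0 = 1.5  # arbitrary evaluation point, assumed for illustration

# Complex-step estimate of the derivative at x0.
approx = complex_step_derivative(toy_function, x0)

# Closed-form derivative for comparison: f'(x) = exp(x) * (sin(x) + cos(x)).
exact = math.exp(x0) * (math.sin(x0) + math.cos(x0))

# Ordinary forward difference with a conventional step, for contrast.
fd_step = 1e-8
forward_diff = (toy_function(x0 + fd_step).real - toy_function(x0).real) / fd_step

print("complex-step error:", abs(approx - exact))
print("forward-diff error:", abs(forward_diff - exact))
```

The function being differentiated must accept complex inputs and be analytic along the evaluated path, which is why the example uses `cmath` rather than `math` inside `toy_function`.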