
VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization (2003.07289v5)

Published 12 Mar 2020 in cs.CV and eess.IV

Abstract: Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and to deal with degraded or missing input are less well studied. In particular, we note that previous approaches towards deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of the naive approaches to feature space fusion through summation or concatenation which do not take into account the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works directly adapting the objective function of vanilla variational auto-encoder, we show how camera localization can be accurately estimated through an unbiased objective function based on importance weighting. Our model is extensively evaluated on RGB-D datasets and the results prove the efficacy of our model. The source code is available at https://github.com/kaichen-z/VMLoc.
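The two ingredients named in the abstract, Product-of-Experts fusion and an importance-weighted objective, can be illustrated with a short sketch. This is not the authors' implementation (see their repository for that). It assumes each modality encoder outputs a diagonal Gaussian over the latent space, and the function names `poe_gaussian` and `log_mean_exp` are hypothetical. For Gaussian experts, the product is again Gaussian, with precision equal to the sum of the experts' precisions and mean equal to the precision-weighted average of their means. The importance-weighted bound, in the style of importance weighted autoencoders, is the log of the average of K importance weights, computed here with the usual max-shift for numerical stability.

```python
import math


def poe_gaussian(mus, logvars, include_prior=True):
    """Fuse diagonal Gaussian experts by multiplying their densities.

    mus, logvars: one list of floats per modality, all of equal length.
    include_prior: if True, a standard normal N(0, I) is added as an extra
    expert (an assumption of this sketch, common in PoE-style models).
    Returns the mean and log-variance of the fused Gaussian.
    """
    dim = len(mus[0])
    if include_prior:
        mus = [[0.0] * dim] + list(mus)
        logvars = [[0.0] * dim] + list(logvars)

    fused_mu, fused_logvar = [], []
    for d in range(dim):
        # Precision is the inverse variance: exp(-logvar).
        precisions = [math.exp(-lv[d]) for lv in logvars]
        total = sum(precisions)
        # Precision-weighted average of the expert means.
        mean = sum(m[d] * p for m, p in zip(mus, precisions)) / total
        fused_mu.append(mean)
        # Fused variance is 1 / total precision.
        fused_logvar.append(-math.log(total))
    return fused_mu, fused_logvar


def log_mean_exp(log_weights):
    """Stable log of the mean of exp(log_weights).

    With log_weights[k] = log p(y, z_k | x) - log q(z_k | x) for K samples
    z_k drawn from q, this is the importance-weighted bound estimate.
    """
    m = max(log_weights)
    total = sum(math.exp(w - m) for w in log_weights)
    return m + math.log(total / len(log_weights))


# Hypothetical encoder outputs for an image and a depth input (1-D latent).
rgb_mu, rgb_logvar = [1.0], [0.0]
depth_mu, depth_logvar = [3.0], [0.0]

mu, logvar = poe_gaussian([rgb_mu, depth_mu], [rgb_logvar, depth_logvar],
                          include_prior=False)
```

With equal variances the fused mean sits midway between the two experts, and the fused variance is halved. A missing or degraded modality can be handled in this sketch by leaving its expert out of the lists, since a product over fewer experts is still a valid Gaussian.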

