
Learning Structure-from-Motion with Graph Attention Networks (2308.15984v3)

Published 30 Aug 2023 in cs.CV and cs.LG

Abstract: In this paper we tackle the problem of learning Structure-from-Motion (SfM) through the use of graph attention networks. SfM is a classic computer vision problem that is solved through iterative minimization of reprojection errors, referred to as Bundle Adjustment (BA), starting from a good initialization. In order to obtain a good enough initialization for BA, conventional methods rely on a sequence of sub-problems (such as pairwise pose estimation, pose averaging or triangulation) which provide an initial solution that can then be refined using BA. In this work we replace these sub-problems by learning a model that takes as input the 2D keypoints detected across multiple views, and outputs the corresponding camera poses and 3D keypoint coordinates. Our model takes advantage of graph neural networks to learn SfM-specific primitives, and we show that it can be used for fast inference of the reconstruction for new and unseen sequences. The experimental results show that the proposed model outperforms competing learning-based methods, and challenges COLMAP while having lower runtime. Our code is available at https://github.com/lucasbrynte/gasfm/.
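To make the quantity that Bundle Adjustment minimizes concrete, the following is a minimal NumPy sketch of the reprojection error for a set of cameras, 3D points and 2D keypoint observations. It is an illustration only and is not taken from the authors' repository: the function names, the 3x4 projection-matrix parameterization and the `(camera index, point index, x, y)` observation layout are all assumptions made for this example.

```python
import numpy as np


def reprojection_residuals(cameras, points, observations):
    """Return the 2D residual (projected minus observed) for each observation.

    cameras:      array of shape (m, 3, 4), one projection matrix per view
                  (hypothetical layout chosen for this sketch).
    points:       array of shape (n, 3), 3D keypoint coordinates.
    observations: iterable of (camera_index, point_index, x, y) tuples.
    """
    residuals = []
    for cam_idx, pt_idx, x, y in observations:
        # Homogeneous 3D point [X, Y, Z, 1].
        point_h = np.append(points[pt_idx], 1.0)
        # Project into the image and dehomogenize.
        projected = cameras[cam_idx] @ point_h
        uv = projected[:2] / projected[2]
        residuals.append(uv - np.array([x, y]))
    return np.asarray(residuals)


def mean_reprojection_error(cameras, points, observations):
    """Mean Euclidean distance between projected and observed keypoints."""
    residuals = reprojection_residuals(cameras, points, observations)
    return float(np.mean(np.linalg.norm(residuals, axis=1)))


if __name__ == "__main__":
    # One camera P = [I | 0] observing a single point at depth 2.
    cameras = np.hstack([np.eye(3), np.zeros((3, 1))])[None]
    points = np.array([[1.0, 2.0, 2.0]])
    observations = [(0, 0, 0.5, 1.0)]
    print(mean_reprojection_error(cameras, points, observations))
```

A BA solver would treat `cameras` and `points` as unknowns and iteratively reduce the sum of squared residuals. In the setting the abstract describes, a learned model predicts these unknowns from the 2D keypoints, and the prediction can then serve as the initialization for BA.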
