Consistent and Asymptotically Statistically-Efficient Solution to Camera Motion Estimation (2403.01174v1)

Published 2 Mar 2024 in cs.CV

Abstract: Given 2D point correspondences between an image pair, inferring the camera motion is a fundamental issue in the computer vision community. The existing works generally set out from the epipolar constraint and estimate the essential matrix, which is not optimal in the maximum likelihood (ML) sense. In this paper, we dive into the original measurement model with respect to the rotation matrix and normalized translation vector and formulate the ML problem. We then propose a two-step algorithm to solve it: In the first step, we estimate the variance of measurement noises and devise a consistent estimator based on bias elimination; In the second step, we execute a one-step Gauss-Newton iteration on manifold to refine the consistent estimate. We prove that the proposed estimate owns the same asymptotic statistical properties as the ML estimate: The first is consistency, i.e., the estimate converges to the ground truth as the point number increases; The second is asymptotic efficiency, i.e., the mean squared error of the estimate converges to the theoretical lower bound -- Cramer-Rao bound. In addition, we show that our algorithm has linear time complexity. These appealing characteristics endow our estimator with a great advantage in the case of dense point correspondences. Experiments on both synthetic data and real images demonstrate that when the point number reaches the order of hundreds, our estimator outperforms the state-of-the-art ones in terms of estimation accuracy and CPU time.

Authors (7)

  1. Guangyang Zeng
  2. Qingcheng Zeng
  3. Xinghan Li
  4. Biqiang Mu
  5. Jiming Chen
  6. Ling Shi
  7. Junfeng Wu

Summary

  • The paper introduces a two-step algorithm based on maximum likelihood estimation that delivers consistent and asymptotically statistically-efficient camera motion estimates.
  • The paper employs a noise variance estimator and a one-step Gauss-Newton iteration on rotation matrices and normalized translations to eliminate bias and refine estimates.
  • The method outperforms state-of-the-art techniques in accuracy and computational efficiency, demonstrating real-time applicability in visual odometry and SLAM.

Consistent and Asymptotically Statistically-Efficient Solution to Camera Motion Estimation

Introduction to the Problem

Camera Motion Estimation (CME) is the process of estimating the relative movement between two camera poses, given a pair of images. This task is pivotal in numerous computer vision applications such as visual odometry, Structure-from-Motion (SfM), and Simultaneous Localization and Mapping (SLAM). The conventional approach estimates the essential matrix from the epipolar geometry constraint, which, although popular, is not optimal in the maximum likelihood (ML) sense because it departs from the original measurement model defined over rotation matrices and normalized translation vectors.
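The conventional pipeline the paper contrasts against can be sketched as follows. This is a minimal eight-point-style least-squares estimate of the essential matrix (a textbook baseline, not the paper's method), assuming normalized homogeneous image coordinates:

```python
import numpy as np

def eight_point_essential(x1, x2):
    """Estimate the essential matrix E from normalized homogeneous
    points x1, x2 (each N x 3) by solving x2_i^T E x1_i = 0 in least
    squares, then projecting E onto the essential manifold (two equal
    singular values, one zero)."""
    # Each correspondence gives one linear equation in the entries of E:
    # kron(x2, x1) . vec_rowmajor(E) = x2^T E x1 = 0.
    A = np.stack([np.kron(p2, p1) for p1, p2 in zip(x1, x2)])  # N x 9
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)          # right singular vector of smallest value
    # Enforce the essential-matrix structure via SVD projection.
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0
    return U @ np.diag([s, s, 0.0]) @ Vt
```

With noiseless correspondences the recovered E satisfies the epipolar constraint exactly (up to scale and sign); with noise, this algebraic solution is exactly the kind of non-ML estimate the paper improves upon.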

Unique Contribution

This paper introduces a two-step algorithm that provides consistent and asymptotically statistically-efficient estimates for CME directly from the original measurement model. This approach not only formalizes the ML problem with respect to rotation matrices and normalized translation vectors but also proposes a method to solve it optimally in the asymptotic regime where the number of point correspondences grows large.

Methodology Overview

  1. Noise Variance Estimation: The algorithm first devises a consistent estimator of the measurement-noise variance by computing the maximum eigenvalue of a specifically derived matrix. This estimate is crucial for the subsequent bias elimination.
  2. Bias Elimination and Estimate Refinement: Using the estimated noise variance, the algorithm performs bias elimination and subsequently refines these estimates through a one-step Gauss-Newton (GN) iteration on the manifold that encompasses rotation matrices (SO(3)) and normalized translations (2-sphere).
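The "one-step Gauss-Newton iteration on a manifold" in step 2 can be illustrated generically. The sketch below shows the technique on a simple point-alignment residual over SO(3) (an illustration of manifold Gauss-Newton, not the paper's exact cost or update): the increment is computed in the tangent space and retracted back onto the manifold via the exponential map.

```python
import numpy as np

def hat(w):
    # Skew-symmetric matrix such that hat(w) @ v == np.cross(w, v).
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(w):
    # Rodrigues' formula: matrix exponential of hat(w).
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def gn_step_so3(R, A, B):
    """One Gauss-Newton step on SO(3) for residuals r_i = R a_i - b_i.
    The update R <- R expm(hat(dw)) is parameterized in the tangent
    space, using d/dw (R expm(hat(w)) a)|_{w=0} = -R hat(a)."""
    J = np.vstack([-R @ hat(a) for a in A])               # stacked Jacobians
    r = np.concatenate([R @ a - b for a, b in zip(A, B)])  # stacked residuals
    dw, *_ = np.linalg.lstsq(J, -r, rcond=None)            # normal-equation step
    return R @ expm_so3(dw)
```

Because the consistent first-step estimate already lies close to the ground truth, a single such Gauss-Newton step suffices to reach the asymptotic accuracy of the ML estimate; that is the rationale behind the paper's two-step design.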

Key Theoretical Insights

  • The proposed algorithm achieves consistency, meaning the estimates converge to the true values as the number of point correspondences increases.
  • It is asymptotically statistically-efficient, indicating that the mean squared error of the estimates asymptotically attains the Cramer-Rao lower bound, representing the theoretical limit of estimation accuracy.
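For intuition about these two properties, the toy Monte Carlo below illustrates the general statistical notions (not the paper's estimator): the sample mean of N Gaussian measurements is consistent and attains its Cramér-Rao bound sigma^2/N, so its mean squared error decays as 1/N.

```python
import numpy as np

# Consistency and asymptotic efficiency on the simplest possible example:
# the sample mean of N Gaussian measurements has MSE equal to the
# Cramér-Rao bound sigma^2 / N, so the error vanishes as N grows --
# the same asymptotic behavior the paper proves for camera motion.
rng = np.random.default_rng(42)
sigma, trials = 0.5, 2000
results = []
for n in (10, 100, 1000):
    estimates = rng.normal(0.0, sigma, (trials, n)).mean(axis=1)
    mse = float(np.mean(estimates ** 2))
    results.append((n, mse))
    print(f"N={n:5d}  MSE={mse:.2e}  CRB={sigma**2 / n:.2e}")
```

The empirical MSE tracks the bound at every sample size, mirroring how the paper's estimator approaches the Cramér-Rao bound as the number of point correspondences grows.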

Practical Implications and Performance

  • The algorithm demonstrates superior estimation accuracy and computational efficiency, outperforming several state-of-the-art methods, especially as the number of point correspondences becomes large.
  • Given its linear time complexity, the algorithm promises real-time applicability in dense point correspondence scenarios, a significant advantage for applications demanding swift computation.
  • Extensive experimentation on synthetic data and real images validates the algorithm's robustness and efficacy under increasing data volumes and varying conditions.

Future Research Directions

  • The analysis underscores the importance of avoiding degenerate configurations, such as coplanar points, that challenge the underlying assumptions of the algorithm. This insight may guide future research toward more robust feature selection mechanisms or integrating additional sensor data to enhance estimation reliability across diverse scenarios.

Conclusion

The paper successfully addresses a fundamental issue in CME by leveraging the original measurement model for maximum likelihood estimation. The resulting algorithm, notable for its theoretical rigor and practical efficiency, sets a new benchmark for achieving high-accuracy camera motion estimates in computer vision tasks.
