Consistent and Asymptotically Statistically-Efficient Solution to Camera Motion Estimation (2403.01174v1)

Published 2 Mar 2024 in cs.CV

Abstract: Given 2D point correspondences between an image pair, inferring the camera motion is a fundamental issue in the computer vision community. The existing works generally set out from the epipolar constraint and estimate the essential matrix, which is not optimal in the maximum likelihood (ML) sense. In this paper, we dive into the original measurement model with respect to the rotation matrix and normalized translation vector and formulate the ML problem. We then propose a two-step algorithm to solve it: In the first step, we estimate the variance of measurement noises and devise a consistent estimator based on bias elimination; In the second step, we execute a one-step Gauss-Newton iteration on manifold to refine the consistent estimate. We prove that the proposed estimate owns the same asymptotic statistical properties as the ML estimate: The first is consistency, i.e., the estimate converges to the ground truth as the point number increases; The second is asymptotic efficiency, i.e., the mean squared error of the estimate converges to the theoretical lower bound -- Cramer-Rao bound. In addition, we show that our algorithm has linear time complexity. These appealing characteristics endow our estimator with a great advantage in the case of dense point correspondences. Experiments on both synthetic data and real images demonstrate that when the point number reaches the order of hundreds, our estimator outperforms the state-of-the-art ones in terms of estimation accuracy and CPU time.

Authors (7)

  1. Guangyang Zeng
  2. Qingcheng Zeng
  3. Xinghan Li
  4. Biqiang Mu
  5. Jiming Chen
  6. Ling Shi
  7. Junfeng Wu

Summary

  • The paper introduces a two-step algorithm based on maximum likelihood estimation that delivers consistent and asymptotically statistically-efficient camera motion estimates.
  • The paper employs a noise variance estimator and a one-step Gauss-Newton iteration on rotation matrices and normalized translations to eliminate bias and refine estimates.
  • The method outperforms state-of-the-art techniques in accuracy and computational efficiency, demonstrating real-time applicability in visual odometry and SLAM.

Consistent and Asymptotically Statistically-Efficient Solution to Camera Motion Estimation

Introduction to the Problem

Camera Motion Estimation (CME) is the process of estimating the relative movement between two camera poses, given a pair of images. This task is pivotal in numerous computer vision applications such as visual odometry, Structure-from-Motion (SfM), and Simultaneous Localization and Mapping (SLAM). The conventional approach estimates the essential matrix from the epipolar geometry constraint, which, although popular, is not optimal in the maximum likelihood (ML) sense because it departs from the original measurement model defined over rotation matrices and normalized translation vectors.
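The conventional pipeline the paper contrasts against can be sketched as follows. This is a minimal eight-point-style least-squares estimate of the essential matrix (a textbook baseline, not the paper's method), assuming normalized homogeneous image coordinates:

```python
import numpy as np

def eight_point_essential(x1, x2):
    """Estimate the essential matrix E from normalized homogeneous
    points x1, x2 (each N x 3) by solving x2_i^T E x1_i = 0 in least
    squares, then projecting E onto the essential manifold (two equal
    singular values, one zero)."""
    # Each correspondence gives one linear equation in the entries of E:
    # kron(x2, x1) . vec_rowmajor(E) = x2^T E x1 = 0.
    A = np.stack([np.kron(p2, p1) for p1, p2 in zip(x1, x2)])  # N x 9
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)          # right singular vector of smallest value
    # Enforce the essential-matrix structure via SVD projection.
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0
    return U @ np.diag([s, s, 0.0]) @ Vt
```

With noiseless correspondences the recovered E satisfies the epipolar constraint exactly (up to scale and sign); with noise, this algebraic solution is exactly the kind of non-ML estimate the paper improves upon.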

Unique Contribution

This paper introduces a two-step algorithm that provides consistent and asymptotically statistically-efficient estimates for CME directly from the original measurement model. This approach not only formalizes the ML problem with respect to rotation matrices and normalized translation vectors but also proposes a method to solve it optimally in the asymptotic regime where the number of point correspondences grows large.

Methodology Overview

  1. Noise Variance Estimation: The algorithm first devises a consistent estimator of the measurement-noise variance by computing the maximum eigenvalue of a specifically derived matrix. This estimate is crucial for the subsequent bias elimination.
  2. Bias Elimination and Estimate Refinement: Using the estimated noise variance, the algorithm performs bias elimination and subsequently refines these estimates through a one-step Gauss-Newton (GN) iteration on the manifold that encompasses rotation matrices (SO(3)) and normalized translations (2-sphere).
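The "one-step Gauss-Newton iteration on a manifold" in step 2 can be illustrated generically. The sketch below shows the technique on a simple point-alignment residual over SO(3) (an illustration of manifold Gauss-Newton, not the paper's exact cost or update): the increment is computed in the tangent space and retracted back onto the manifold via the exponential map.

```python
import numpy as np

def hat(w):
    # Skew-symmetric matrix such that hat(w) @ v == np.cross(w, v).
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(w):
    # Rodrigues' formula: matrix exponential of hat(w).
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def gn_step_so3(R, A, B):
    """One Gauss-Newton step on SO(3) for residuals r_i = R a_i - b_i.
    The update R <- R expm(hat(dw)) is parameterized in the tangent
    space, using d/dw (R expm(hat(w)) a)|_{w=0} = -R hat(a)."""
    J = np.vstack([-R @ hat(a) for a in A])               # stacked Jacobians
    r = np.concatenate([R @ a - b for a, b in zip(A, B)])  # stacked residuals
    dw, *_ = np.linalg.lstsq(J, -r, rcond=None)            # normal-equation step
    return R @ expm_so3(dw)
```

Because the consistent first-step estimate already lies close to the ground truth, a single such Gauss-Newton step suffices to reach the asymptotic accuracy of the ML estimate; that is the rationale behind the paper's two-step design.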

Key Theoretical Insights

  • The proposed algorithm achieves consistency, meaning the estimates converge to the true values as the number of point correspondences increases.
  • It is asymptotically statistically-efficient, indicating that the mean squared error of the estimates asymptotically attains the Cramer-Rao lower bound, representing the theoretical limit of estimation accuracy.
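For intuition about these two properties, the toy Monte Carlo below illustrates the general statistical notions (not the paper's estimator): the sample mean of N Gaussian measurements is consistent and attains its Cramér-Rao bound sigma^2/N, so its mean squared error decays as 1/N.

```python
import numpy as np

# Consistency and asymptotic efficiency on the simplest possible example:
# the sample mean of N Gaussian measurements has MSE equal to the
# Cramér-Rao bound sigma^2 / N, so the error vanishes as N grows --
# the same asymptotic behavior the paper proves for camera motion.
rng = np.random.default_rng(42)
sigma, trials = 0.5, 2000
results = []
for n in (10, 100, 1000):
    estimates = rng.normal(0.0, sigma, (trials, n)).mean(axis=1)
    mse = float(np.mean(estimates ** 2))
    results.append((n, mse))
    print(f"N={n:5d}  MSE={mse:.2e}  CRB={sigma**2 / n:.2e}")
```

The empirical MSE tracks the bound at every sample size, mirroring how the paper's estimator approaches the Cramér-Rao bound as the number of point correspondences grows.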

Practical Implications and Performance

  • The algorithm demonstrates superior estimation accuracy and computational efficiency, outperforming several state-of-the-art methods, especially as the number of point correspondences becomes large.
  • Given its linear time complexity, the algorithm promises real-time applicability in dense point correspondence scenarios, a significant advantage for applications demanding swift computation.
  • Extensive experimentation on synthetic data and real images validates the algorithm's robustness and efficacy under increasing data volumes and varying conditions.

Future Research Directions

  • The analysis underscores the importance of avoiding degenerate configurations, such as coplanar points, that challenge the underlying assumptions of the algorithm. This insight may guide future research toward more robust feature selection mechanisms or integrating additional sensor data to enhance estimation reliability across diverse scenarios.

Conclusion

The paper successfully addresses a fundamental issue in CME by leveraging the original measurement model for maximum likelihood estimation. The resulting algorithm, notable for its theoretical rigor and practical efficiency, sets a new benchmark for achieving high-accuracy camera motion estimates in computer vision tasks.
