Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
129 tokens/sec
GPT-4o
28 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Epipolar Geometry based Learning of Multi-view Depth and Ego-Motion from Monocular Sequences (1812.11922v3)

Published 23 Dec 2018 in cs.RO and cs.CV

Abstract: Deep approaches to predict monocular depth and ego-motion have grown in recent years due to their ability to produce dense depth from monocular images. The main idea behind them is to optimize the photometric consistency over image sequences by warping one view into another, similar to direct visual odometry methods. One major drawback is that these methods infer depth from a single view, which might not effectively capture the relation between pixels. Moreover, simply minimizing the photometric loss does not ensure proper pixel correspondences, which is a key factor for accurate depth and pose estimations. In contrast, we propose a 2-view depth network to infer the scene depth from consecutive frames, thereby learning inter-pixel relationships. To ensure better correspondences, thereby better geometric understanding, we propose incorporating epipolar constraints to make the learning more geometrically sound. We use the Essential matrix obtained using Nist'er's Five Point Algorithm, to enforce meaningful geometric constraints, rather than using it as training labels. This allows us to use lesser no. of trainable parameters compared to state-of-the-art methods. The proposed method results in better depth images and pose estimates, which capture the scene structure and motion in a better way. Such a geometrically constrained learning performs successfully even in cases where simply minimizing the photometric error would fail.

Citations (9)

Summary

We haven't generated a summary for this paper yet.