EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images (1804.02379v1)

Published 6 Apr 2018 in cs.CV

Abstract: Light field cameras capture both the spatial and the angular properties of light rays in space. Due to its property, one can compute the depth from light fields in uncontrolled lighting environments, which is a big advantage over active sensing devices. Depth computed from light fields can be used for many applications including 3D modelling and refocusing. However, light field images from hand-held cameras have very narrow baselines with noise, making the depth estimation difficult. Many approaches have been proposed to overcome these limitations for the light field depth estimation, but there is a clear trade-off between the accuracy and the speed in these methods. In this paper, we introduce a fast and accurate light field depth estimation method based on a fully-convolutional neural network. Our network is designed by considering the light field geometry and we also overcome the lack of training data by proposing light field specific data augmentation methods. We achieved the top rank in the HCI 4D Light Field Benchmark on most metrics, and we also demonstrate the effectiveness of the proposed method on real-world light-field images.

Citations (234)

Summary

  • The paper introduces EPINET, a fully-convolutional network that leverages epipolar geometry to achieve rapid and precise depth estimation from light field images.
  • It employs a multi-stream architecture to process angular sub-aperture images separately, significantly enhancing depth prediction accuracy over previous methods.
  • A novel data augmentation strategy respecting geometric constraints boosts training efficiency and delivers top benchmark performance on the HCI 4D Light Field Benchmark.

EPINET: A Fully-Convolutional Neural Network Utilizing Epipolar Geometry for Depth Estimation from Light Field Images

This paper introduces EPINET, a fully convolutional neural network architecture designed to perform depth estimation on images captured by light field cameras. These cameras are distinguished by their ability to record both the spatial and angular properties of light, thus enabling depth computation in scenarios with uncontrolled lighting. This key advantage positions light field cameras above active sensing devices constrained to controlled environments.

The paper identifies a significant limitation of hand-held light field cameras: their narrow baseline and associated noise, which complicates accurate depth estimation. Earlier methods to address these challenges have suffered from a trade-off between speed and accuracy. The authors propose a novel deep learning approach that effectively balances these conflicting requirements.

The architecture of EPINET is methodically constructed to harness the geometric attributes of light field images. The network is composed of multi-stream and merging components that separately process sub-aperture images from four distinct angular directions and subsequently integrate these to produce a unified representation for depth estimation. This structure enables the network to efficiently handle direction-specific features, leading to enhanced depth prediction reliability.
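The multi-stream idea can be illustrated at the data level. The sketch below, using NumPy, shows one plausible way to slice a 4D light field into the four directional view stacks (horizontal, vertical, and the two diagonals through the center view) that such a network would feed to its separate streams; the grid sizes and the `directional_stacks` helper are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def directional_stacks(lf):
    """Split a light field (U, V, H, W) of sub-aperture views into four
    directional view stacks through the center view: horizontal, vertical,
    and the two diagonals. Each stack would feed one network stream.
    Illustrative sketch only; not the paper's exact preprocessing."""
    U, V, H, W = lf.shape
    assert U == V and U % 2 == 1, "expects an odd, square angular grid"
    c = U // 2
    horizontal = lf[c, :]                                # row through center
    vertical = lf[:, c]                                  # column through center
    diag_main = lf[np.arange(U), np.arange(U)]           # top-left to bottom-right
    diag_anti = lf[np.arange(U), U - 1 - np.arange(U)]   # top-right to bottom-left
    return horizontal, vertical, diag_main, diag_anti

# Example: a 5x5 angular grid of 8x8 grayscale views
lf = np.random.rand(5, 5, 8, 8)
h, v, d1, d2 = directional_stacks(lf)
print(h.shape)  # (5, 8, 8)
```

Each stack shares the center view, so the streams see consistent but direction-specific parallax cues before the merging layers fuse them.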

A significant barrier to applying machine learning in this domain is the scarcity of annotated light field datasets. To mitigate this, the authors propose specialized data augmentation methods that respect the geometric constraints inherent in light field imaging, including shifting the viewpoint, rotating, scaling, and modifying color parameters. These transformations enlarge the training set without breaking the epipolar structure on which accurate depth estimation depends.
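The key constraint is that an augmentation must transform the angular and spatial dimensions consistently, or the epipolar structure is destroyed. The NumPy sketch below illustrates two such light-field-aware augmentations under that assumption: a 90-degree rotation that rotates the view grid together with every sub-aperture image, and a viewpoint shift that crops a smaller angular window from a denser grid. Both functions are hypothetical illustrations, not the paper's exact implementation.

```python
import numpy as np

def rotate_lf_90(lf):
    """Rotate a light field (U, V, H, W) by 90 degrees consistently:
    the angular grid of views must rotate together with each
    sub-aperture image, otherwise epipolar lines no longer match up."""
    lf = np.rot90(lf, k=1, axes=(0, 1))  # rotate the grid of viewpoints
    lf = np.rot90(lf, k=1, axes=(2, 3))  # rotate each image the same way
    return lf

def shift_viewpoint(lf, out_views, du, dv):
    """Viewpoint augmentation: crop an out_views x out_views angular
    window at offset (du, dv), yielding a new valid light field whose
    central view differs from the original's."""
    return lf[du:du + out_views, dv:dv + out_views]

# Example: shrink a 9x9 grid to 7x7 at offset (1, 1), then rotate
lf = np.random.rand(9, 9, 16, 16)
aug = rotate_lf_90(shift_viewpoint(lf, 7, 1, 1))
print(aug.shape)  # (7, 7, 16, 16)
```

Plain image-level augmentations (per-view flips or crops applied independently) would violate the geometry; coupling the angular and spatial transforms is what makes these augmentations safe for depth training.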

Empirically, the method performs strongly. The network achieved top performance on the HCI 4D Light Field Benchmark across most relevant metrics, including bad pixel ratio and mean square error, while substantially surpassing previous methods in runtime, delivering precise sub-pixel depth maps rapidly. By reducing computational cost without loss of detail, EPINET becomes practical for real-world application scenarios.

In examining the broader implications, EPINET exhibits substantial potential applications ranging from 3D modeling to enhanced photography and augmented reality. It introduces a framework for leveraging convolutional neural networks in light field image processing that could inspire future research directions. The paper hints at further improvements by integrating additional cues such as photometric information and material properties, potentially leading to even more robust depth estimation.

In summary, this work makes significant strides in deep-learning-based light field depth estimation. It resolves the longstanding trade-off between depth accuracy and processing speed, and contributes an architectural advance through its epipolar geometry-based design. The proposed data augmentation strategy further underscores the network's adaptability across datasets and scenarios. Future work could diversify the training data and incorporate multimodal information, advancing the overall efficacy and versatility of light field depth estimation methods.