- The paper introduces DDFFNet, an end-to-end deep learning model that reduces depth error by over 75% compared to traditional methods.
- The paper details a novel dataset generated from a light-field camera and RGB-D sensor, offering 720 images across 12 indoor scenes for robust training.
- The paper demonstrates near real-time performance, processing frames in 0.6 seconds on an NVIDIA Pascal Titan X GPU, showcasing its practical applicability.
Deep Depth From Focus: A Technical Overview
The paper "Deep Depth From Focus" presents a pioneering approach to the classical challenge of depth from focus (DFF) in computer vision. DFF involves reconstructing a pixel-accurate disparity map using a stack of images captured at varying optical focus settings. However, DFF is an ill-posed problem, exacerbated in low-textured areas where traditional sharpness estimation proves unreliable. This paper introduces "Deep Depth From Focus (DDFF)" as the first end-to-end learning solution to this problem, leveraging deep neural networks to outperform conventional methods.
Methodology and Dataset
To meet the high data demands of deep learning, the authors deployed a light-field camera combined with a co-calibrated RGB-D sensor to create a novel and extensive dataset. This setup allows focal stacks to be synthesized digitally from a single photographic exposure, avoiding the inconsistent illumination and motion artifacts that arise when focus is adjusted manually across multiple exposures. The resulting dataset, DDFF 12-Scene, comprises 720 images across 12 indoor environments with ground-truth depth maps. By dramatically increasing data availability (the dataset is 25 times larger than previous benchmarks), the authors made it feasible to train machine learning models for DFF on real-world data.
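Digital refocusing from a light field typically follows a shift-and-sum principle: each sub-aperture view is shifted in proportion to its angular offset and the views are averaged, producing one image per focal setting. The sketch below illustrates only that general principle; the array layout, the helper name `synthesize_focal_stack`, and the `disparities` parameter are illustrative assumptions, not the dataset's actual format or the authors' code.

```python
import numpy as np
from scipy.ndimage import shift

def synthesize_focal_stack(subapertures, disparities):
    """Shift-and-sum refocusing sketch.

    subapertures: array of shape (U, V, H, W, 3) holding the light-field
                  sub-aperture views (U x V angular grid).
    disparities:  iterable of disparity values (pixels per unit angular
                  offset) at which to refocus; one output slice per value.
    """
    subapertures = np.asarray(subapertures, dtype=np.float64)
    U, V, H, W, _ = subapertures.shape
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0  # angular center of the grid
    stack = []
    for d in disparities:
        refocused = np.zeros((H, W, 3), dtype=np.float64)
        for u in range(U):
            for v in range(V):
                # Shift each view toward the chosen focal plane, then accumulate.
                dy, dx = d * (u - uc), d * (v - vc)
                refocused += shift(subapertures[u, v], (dy, dx, 0), order=1)
        stack.append(refocused / (U * V))
    return np.stack(stack)  # shape: (len(disparities), H, W, 3)
```

Each refocused slice is sharpest at a different depth plane, which is exactly the per-pixel cue a DFF method exploits.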
Proposed DDFFNet Architecture
The paper presents DDFFNet, an auto-encoder-style convolutional neural network (CNN) that produces disparity maps from focal stack inputs. The encoder mirrors the well-established VGG-16 network for robust feature extraction, while the decoder applies mirrored operations to restore the input resolution. Several architectural variations are explored, including different upsampling methods and concatenation strategies aimed at sharpening edges in the predicted disparity maps.
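As a rough illustration of the encoder-decoder idea, the PyTorch sketch below pairs a VGG-16 feature extractor with a mirrored upsampling decoder. It is not the authors' exact architecture: the handling of the focal stack (slices concatenated along the channel axis), the decoder widths, and the `stack_size` parameter are simplifying assumptions made for this sketch.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class FocalStackDisparityNet(nn.Module):
    """Encoder-decoder sketch: a VGG-16 encoder over a focal stack
    (slices concatenated along the channel axis) and a mirrored
    upsampling decoder that regresses a one-channel disparity map."""

    def __init__(self, stack_size=10):
        super().__init__()
        encoder = vgg16(weights=None).features  # convolutional part of VGG-16
        # Replace the first conv so it accepts stack_size * 3 input channels.
        encoder[0] = nn.Conv2d(stack_size * 3, 64, kernel_size=3, padding=1)
        self.encoder = encoder

        def up_block(c_in, c_out):
            # Bilinear upsampling followed by a convolution, mirroring the
            # encoder's five pooling stages.
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        self.decoder = nn.Sequential(
            up_block(512, 512),
            up_block(512, 256),
            up_block(256, 128),
            up_block(128, 64),
            up_block(64, 32),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),  # disparity map
        )

    def forward(self, focal_stack):
        # focal_stack: (batch, stack_size * 3, H, W), H and W divisible by 32
        return self.decoder(self.encoder(focal_stack))
```

A forward pass on a `(1, stack_size * 3, 256, 256)` tensor returns a `(1, 1, 256, 256)` disparity map; how the stack is fused and how features are upsampled are exactly the kind of design decisions the paper's architectural variants investigate.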
Experimental Results
Extensive comparisons with state-of-the-art DFF methods show significant performance improvements for DDFFNet on metrics such as mean squared error (MSE) and root mean square (RMS) error. DDFFNet reduces depth error by over 75% compared to classical approaches while achieving near real-time computation of 0.6 seconds per frame on an NVIDIA Pascal Titan X GPU. The paper also benchmarks against depth-from-light-field approaches, such as Lytro-generated depth and adaptations of DFLF methods, underscoring that DDFFNet generalizes well and remains accurate beyond the primary DFF comparison.
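The exact evaluation protocol is defined by the paper's benchmark; the snippet below is only a minimal sketch of how the two reported error measures relate for a pair of disparity maps, with `valid_mask` as an assumed convention for ignoring pixels that lack valid ground truth.

```python
import numpy as np

def disparity_errors(pred, gt, valid_mask=None):
    """Compute MSE and RMS error between predicted and ground-truth
    disparity maps, optionally restricted to pixels with valid depth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid_mask is None:
        valid_mask = np.isfinite(gt)  # assumed convention: NaN marks missing depth
    diff = pred[valid_mask] - gt[valid_mask]
    mse = np.mean(diff ** 2)
    return {"MSE": mse, "RMS": np.sqrt(mse)}
```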
Implications and Future Directions
The implications of this research are manifold, suggesting applications in areas that require precise depth estimation, such as robotics, augmented reality, and advanced imaging systems. The introduction of DDFFNet marks a substantial step towards a practical and accurate solution for DFF, and it invites deeper exploration of end-to-end learning for other ill-posed dense prediction tasks, in the spirit of prior work on optical flow estimation and semantic segmentation.
Future developments may focus on improving the network's ability to generalize across different camera systems and on further refining depth estimation accuracy in challenging conditions. The preliminary results on a mobile depth-from-focus dataset suggest that the presented methods can adapt to varied data sources, indicating potential for broader practical deployment across different computational platforms.
The rigorous exploration and documentation of architectural variations offer valuable insights for subsequent AI research, advocating for continued experimentation with CNN designs to address inherently complex vision tasks.