
Synthetic Depth-of-Field with a Single-Camera Mobile Phone (1806.04171v1)

Published 11 Jun 2018 in cs.CV and cs.GR

Abstract: Shallow depth-of-field is commonly used by photographers to isolate a subject from a distracting background. However, standard cell phone cameras cannot produce such images optically, as their short focal lengths and small apertures capture nearly all-in-focus images. We present a system to computationally synthesize shallow depth-of-field images with a single mobile camera and a single button press. If the image is of a person, we use a person segmentation network to separate the person and their accessories from the background. If available, we also use dense dual-pixel auto-focus hardware, effectively a 2-sample light field with an approximately 1 millimeter baseline, to compute a dense depth map. These two signals are combined and used to render a defocused image. Our system can process a 5.4 megapixel image in 4 seconds on a mobile phone, is fully automatic, and is robust enough to be used by non-experts. The modular nature of our system allows it to degrade naturally in the absence of a dual-pixel sensor or a human subject.

Citations (167)

Summary

  • The paper introduces a system that synthesizes shallow depth-of-field on mobile phones by fusing dual-pixel data with a three-stage U-Net segmentation network.
  • The method employs edge-aware filtering to refine segmentation and disparity maps, effectively emulating optical defocus in a single-camera setup.
  • The system processes a 5.4-megapixel image in about four seconds, demonstrating its feasibility for fully on-device mobile photography.

Overview of "Synthetic Depth-of-Field with a Single-Camera Mobile Phone"

The paper "Synthetic Depth-of-Field with a Single-Camera Mobile Phone" presents a computational photography system that synthesizes shallow depth-of-field images using a single-camera mobile phone. This research is significant for its practical application: mobile phones lack the large-aperture optics that DSLR cameras use to produce a pronounced depth-of-field effect, and this system recovers that look computationally.

Traditionally, achieving shallow depth-of-field in photography requires large-aperture lenses to isolate the subject against a blurred background. Mobile phones, due to their small apertures and short focal lengths, capture almost everything in focus. The system in this paper circumvents this limitation by combining image segmentation with depth estimation from dual-pixel (DP) auto-focus hardware, enabling advanced imaging features without additional hardware such as multiple cameras or depth sensors.
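The dual-pixel views act as a two-sample stereo pair with a roughly 1 mm baseline, so disparities span only a few pixels. A toy block-matching sketch illustrates the idea; the search range, patch size, and SSD cost are illustrative choices, not the paper's actual matcher (which also applies edge-aware refinement):

```python
import numpy as np

def dp_disparity(left, right, max_shift=3, patch=5):
    """Toy 1-D block matching between the two dual-pixel views.
    Because the DP baseline is tiny, we only search integer shifts in
    [-max_shift, max_shift] along each row, scoring candidates with
    sum-of-squared-differences. Didactic stand-in for the paper's method."""
    h, w = left.shape
    half = patch // 2
    disp = np.zeros((h, w))
    for y in range(half, h - half):
        for x in range(half + max_shift, w - half - max_shift):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best, best_cost = 0, np.inf
            for s in range(-max_shift, max_shift + 1):
                cand = right[y - half:y + half + 1,
                             x + s - half:x + s + half + 1]
                cost = np.sum((ref - cand) ** 2)
                if cost < best_cost:
                    best, best_cost = s, cost
            disp[y, x] = best
    return disp
```

On a synthetic pair where the right view is the left view shifted by two pixels, the recovered disparity in the valid interior is exactly two, which is the sanity check one would run before trusting any refinement stage.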

System Design and Implementation

The proposed system is designed to perform effectively on a mobile platform with the following key components:

  1. Person Segmentation Network: The system utilizes a neural network to perform semantic segmentation of people and their accessories in images. The design leverages a three-stage U-Net architecture to ensure both efficiency and high segmentation accuracy, tuned specifically for mobile processing requirements.
  2. Disparity Estimation from Dual-Pixels: The DP auto-focus feature available in modern smartphones is exploited to derive depth information. The dual-pixel data, providing a small baseline stereo view, is processed to calculate disparity fields that represent the depth in the scene.
  3. Edge-Aware Filtering and Rendering: The segmentation masks and disparity maps are refined using edge-aware filtering techniques. The rendering algorithm simulates lens effects by mapping depth information to blur radii, emulating optical defocus in a spatially varying manner.
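The depth-to-blur mapping in step 3 can be sketched as below. The linear radius model, the `scale` and `max_radius` constants, and the naive square-window gather are all illustrative assumptions; the paper's renderer uses a more careful defocus kernel and occlusion handling:

```python
import numpy as np

def disparity_to_blur_radius(disparity, focus_disparity, scale=8.0, max_radius=12):
    """Thin-lens-style mapping: pixels at the focus disparity stay sharp,
    and blur grows with distance from the focal plane. `scale` and
    `max_radius` are illustrative tuning constants, not paper values."""
    radius = np.abs(disparity - focus_disparity) * scale
    return np.clip(radius, 0, max_radius)

def render_defocus(image, disparity, focus_disparity, person_mask=None):
    """Naive gather-style synthetic defocus: each output pixel averages a
    square neighborhood sized by its blur radius. A person mask, when
    given, forces the segmented subject to stay sharp."""
    h, w = image.shape[:2]
    radii = disparity_to_blur_radius(disparity, focus_disparity)
    if person_mask is not None:
        radii = np.where(person_mask > 0.5, 0.0, radii)
    out = np.empty_like(image, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            r = int(round(radii[y, x]))
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = image[y0:y1, x0:x1].mean(axis=(0, 1))
    return out
```

A quick check of the model: a scene entirely on the focal plane passes through unchanged, while a bright pixel off the focal plane spreads into its neighborhood and dims.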

Performance and Results

The system processes a 5.4-megapixel image in approximately four seconds on a mobile phone, demonstrating its practicality for fully on-device use. Its effectiveness is validated through extensive testing, including user studies comparing its output against the methods of Shen et al. and Barron et al., in which users preferred the proposed method's image quality.

The deployment of this technique in real-world applications, as exemplified by its use in Google's "Portrait Mode," illustrates the practical impact of the research on consumer technology, rendering DSLR-like photographic quality accessible on widely available mobile platforms.

Implications and Future Directions

The implications of this research extend beyond just enhancing photographic capabilities. The ability to generate separated foreground and background layers could find applications in AR, VR, and AI training datasets that require depth information or segmentation.

Future developments could broaden segmentation beyond people to animals and other common photographic subjects. Additionally, as computational photography matures, rendering could blend photorealistic defocus with more stylized, non-realistic effects, diversifying the ways photographers interact with digital images.

In conclusion, this paper significantly contributes to the field of computational photography, demonstrating a robust application of machine learning and image processing techniques to overcome physical limitations in mobile camera hardware. As technology advances, the foundations laid by this research can spur future innovations in mobile imaging and beyond.
