- The paper introduces a CNN-based method that fuses sparse light field and dense video data to achieve 30 fps light field capture.
- It uses spatio-temporal flow estimation and appearance synthesis networks to accurately align and reconstruct detailed visual scenes.
- Experimental results show higher PSNR and SSIM scores than existing interpolation and super-resolution methods, and the reconstructed video supports post-production refocusing and viewpoint adjustment.
Light Field Video Capture Using a Learning-Based Hybrid Imaging System
The paper presents an innovative approach to capturing light field video using a hybrid imaging system that pairs a 3 fps light field camera with a standard 30 fps video camera. Because bandwidth constraints prevent current consumer light field cameras from delivering high frame-rate light field video, the proposed system uses a convolutional neural network (CNN)-based approach to synthesize a full frame-rate light field video. The result enables applications such as post-production refocusing and viewpoint adjustment on the video content.
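To make the hybrid setup concrete, here is a minimal sketch of how the two streams interleave: at 30 fps video and 3 fps light field capture, every tenth video frame has a matching captured light field sample, and the nine frames in between must be synthesized. The indices and helper name are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (not the paper's implementation): at a 10:1 frame
# ratio, light field "keyframes" coincide with every tenth video frame.
VIDEO_FPS, LF_FPS = 30, 3
STRIDE = VIDEO_FPS // LF_FPS  # 10

def is_keyframe(video_frame_idx: int) -> bool:
    """True where a captured light field frame exists; elsewhere the
    CNN pipeline must synthesize the light field from neighbors."""
    return video_frame_idx % STRIDE == 0
```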
Overview and Methodology
The central contribution of this paper is a novel methodology for constructing a full 30 fps light field video by fusing the spatial and angular information of the light field camera with the high temporal resolution of the standard video camera. The system is built around an end-to-end learning framework comprising two core components:
- Spatio-Temporal Flow Estimation: This component uses CNNs to estimate flow fields that connect the angular views of the sparse light field sequence with the dense temporal samples of the video frames. The network estimates disparities and temporal flows and uses them to align and warp the input frames toward each desired light field view (see the warping sketch after this list).
- Appearance Estimation: After warping, the appearance estimation network synthesizes the final pixel values to generate the complete light field frames. This step combines the warped images to maximize visual fidelity, ensuring that static regions and occlusions are represented accurately.
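The following is a minimal sketch of the warping step described above: a disparity map shifts pixels along the angular baseline toward a target view while a temporal flow accounts for scene motion. It assumes PyTorch; the tensor layout and function names are illustrative, not the authors' actual implementation.

```python
# Sketch of backward warping a video frame toward a target light field view
# using an estimated disparity map plus temporal flow (assumed shapes below).
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp `frame` (N,C,H,W) by per-pixel displacements `flow` (N,2,H,W)."""
    n, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype),
        torch.arange(w, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(frame.device)  # (1,2,H,W)
    coords = grid + flow  # displaced sampling locations
    # Normalize to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (N,H,W,2)
    return F.grid_sample(frame, sample_grid, align_corners=True)

def warp_to_view(frame, disparity, temporal_flow, du, dv):
    """Disparity scaled by the angular offset (du, dv) of the target view,
    composed with temporal flow, gives the spatio-temporal displacement."""
    angular_flow = torch.cat((disparity * du, disparity * dv), dim=1)
    return backward_warp(frame, angular_flow + temporal_flow)
```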
The researchers used paired convolutional architectures for disparity and temporal flow estimation, designed to balance computational efficiency against output quality in image detail and motion continuity.
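As a rough illustration of what such a paired design can look like, here is a tiny encoder-decoder in PyTorch; the layer counts, channel widths, and input stacking are assumptions for the sketch, not the paper's architecture.

```python
import torch.nn as nn

class FlowNetSketch(nn.Module):
    """Tiny encoder-decoder; all sizes are illustrative assumptions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

# One network regresses disparity (out_ch=1); its twin regresses temporal
# flow (out_ch=2). Inputs here stack two RGB frames channel-wise.
disp_net = FlowNetSketch(in_ch=6, out_ch=1)
flow_net = FlowNetSketch(in_ch=6, out_ch=2)
```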
Experimental Results
The paper presents a comprehensive set of experiments underscoring the performance and advantages of the proposed hybrid system and methodology. The authors demonstrate improved frame interpolation and view synthesis, outperforming existing video interpolation and light field super-resolution methods. Key measures include significant gains in PSNR and SSIM, indicating high reconstruction accuracy, and ablations over the network architecture and training data show how each design choice contributes to consistent scene detail across the captured light field content.
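For reference, here is a minimal sketch of how the two reported metrics, PSNR and SSIM, are typically computed between a reconstructed view and its ground truth. It uses scikit-image; the function name and array conventions are assumptions for illustration.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_view(reconstructed: np.ndarray, reference: np.ndarray):
    """Both inputs are HxWx3 float arrays with values in [0, 1]."""
    psnr = peak_signal_noise_ratio(reference, reconstructed, data_range=1.0)
    ssim = structural_similarity(reference, reconstructed,
                                 channel_axis=-1, data_range=1.0)
    return psnr, ssim
```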
Implications and Future Prospects
This research has significant implications for democratizing light field videography, making it feasible for consumer-grade equipment to capture and manipulate high-quality scene reconstructions. The proposed paradigm enables new post-production capabilities in video editing, such as focal plane adjustment and dynamic aperture modifications.
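One of those capabilities, post-production refocusing, is classically realized by shift-and-add over the angular views of a light field frame. Below is a minimal sketch of that idea; the array layout, the `alpha` refocusing parameter, and the integer-shift simplification are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def refocus(light_field: np.ndarray, alpha: float) -> np.ndarray:
    """light_field: (U, V, H, W, 3) grid of angular views, values in [0, 1].
    Shifts each view proportionally to its angular offset, then averages;
    `alpha` selects the synthetic focal plane."""
    u_n, v_n, h, w, _ = light_field.shape
    uc, vc = (u_n - 1) / 2.0, (v_n - 1) / 2.0
    out = np.zeros((h, w, 3))
    for u in range(u_n):
        for v in range(v_n):
            # Integer shifts for simplicity; real implementations interpolate.
            du = int(round(alpha * (u - uc)))
            dv = int(round(alpha * (v - vc)))
            out += np.roll(light_field[u, v], shift=(du, dv), axis=(0, 1))
    return out / (u_n * v_n)
```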
Looking forward, advances might center on refining the networks to handle larger motion, diversifying the training data to improve reliability across conditions, and possibly moving toward single-camera architectures through new sensor designs. Integrating the approach into existing AR/VR systems could also offer immersive visual experiences through dynamic, real-time rendering and interaction.
Conclusion
By synthesizing temporal and spatial data through a learning-based approach, this paper addresses a pivotal limitation of contemporary light field technology. It lays the groundwork not only for improved video synthesis but also for advancing computational photography more broadly, equipping consumer-level devices with near-professional capabilities to capture and manipulate complex visual scenes.