Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks (1604.03650v1)

Published 13 Apr 2016 in cs.CV

Abstract: As 3D movie viewing becomes mainstream and Virtual Reality (VR) market emerges, the demand for 3D contents is growing rapidly. Producing 3D videos, however, remains challenging. In this paper we propose to use deep neural networks for automatically converting 2D videos and images to stereoscopic 3D format. In contrast to previous automatic 2D-to-3D conversion algorithms, which have separate stages and need ground truth depth map as supervision, our approach is trained end-to-end directly on stereo pairs extracted from 3D movies. This novel training scheme makes it possible to exploit orders of magnitude more data and significantly increases performance. Indeed, Deep3D outperforms baselines in both quantitative and human subject evaluations.

Citations (418)

Summary

  • The paper presents Deep3D, a deep convolutional neural network for fully automatic conversion of 2D images and video into stereoscopic 3D, trained on a diverse dataset of 27 movies.
  • Deep3D supports multiple 3D output formats, including anaglyph, side-by-side for modern hardware like Oculus, and GIF, to accommodate various viewing preferences and technologies.
  • This automatic conversion method has significant implications for reducing production costs in the entertainment industry and extends applicability to areas like virtual and augmented reality.

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

This page summarizes the supplementary material for "Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks," which details the methods and datasets behind the paper's approach to 2D-to-3D video conversion. The work sits within computer vision and leverages deep convolutional neural networks (CNNs) for automated 3D content generation, making a meaningful contribution to multimedia applications.
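
Although this summary does not reproduce the architecture, the paper's central mechanism is a differentiable selection layer: the network predicts, for each pixel, a softmax distribution over discrete candidate disparities, and the right view is rendered as the probability-weighted sum of horizontally shifted copies of the left view. A minimal NumPy sketch of that rendering step is below; the shapes, disparity convention, and wrap-around border handling are illustrative simplifications, not the authors' exact implementation.

```python
import numpy as np

def render_right(left, disp_probs):
    """Synthesize the right view from the left view and a per-pixel
    distribution over D candidate disparities.

    left:       (H, W, 3) float array, the left-eye frame.
    disp_probs: (D, H, W) array; disp_probs[d] is the per-pixel probability
                of disparity d, assumed to sum to 1 over d (softmax output).
    """
    D = disp_probs.shape[0]
    right = np.zeros_like(left)
    for d in range(D):
        # Shift the left view horizontally by d pixels; np.roll wraps at
        # the image border, a simplification of proper edge handling.
        shifted = np.roll(left, shift=-d, axis=1)
        right += disp_probs[d][..., None] * shifted
    return right
```

Because every operation here is differentiable with respect to disp_probs, a pixel-wise reconstruction loss between the rendered and ground-truth right frames can drive training end-to-end on stereo pairs with no depth-map supervision, which is the training scheme the abstract describes.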

Overview of Formats

The paper presents its 3D image and video outputs in several standard formats: anaglyph 3D, side-by-side 3D, and GIF. Each format targets a different viewing mode:

  • Anaglyph 3D: Encodes the stereo pair via color filtering so it can be viewed through red-blue glasses, the traditional, low-cost route to 3D viewing.
  • Side-by-Side 3D: Places the left and right views next to each other at half horizontal resolution, maximizing compatibility with modern 3D hardware such as Oculus devices and contemporary 3D televisions.
  • GIF: Offers a simple way to suggest the 3D effect on standard displays without extra equipment, for example by rapidly alternating the two views.

These formats are key to broadening the accessibility of 3D visual content by addressing a wide range of viewing preferences and technological capabilities.
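
For concreteness, the following sketch assembles all three outputs from a stereo pair using Pillow and NumPy. The file names are hypothetical placeholders, and the red-cyan channel assignment and alternating-frame GIF are conventional recipes for these formats rather than details taken from the paper.

```python
import numpy as np
from PIL import Image

left = np.asarray(Image.open("left.png").convert("RGB"))
right = np.asarray(Image.open("right.png").convert("RGB"))

# Anaglyph: red channel from the left view, green and blue from the right,
# matching the filters in red-blue glasses.
anaglyph = np.stack([left[..., 0], right[..., 1], right[..., 2]], axis=-1)
Image.fromarray(anaglyph).save("anaglyph.png")

# Side-by-side: squeeze each view to half its horizontal resolution and
# concatenate, preserving the original frame width.
h, w, _ = left.shape
def half_width(img):
    return np.asarray(Image.fromarray(img).resize((w // 2, h)))
sbs = np.concatenate([half_width(left), half_width(right)], axis=1)
Image.fromarray(sbs).save("side_by_side.png")

# Wiggle GIF: alternate the two views to suggest depth on any display.
Image.fromarray(left).save(
    "wiggle.gif", save_all=True,
    append_images=[Image.fromarray(right)], duration=100, loop=0)
```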

3D Movie Clips

Beyond static images, the research extends to dynamic content by generating 3D movie clips, further demonstrating the robustness of the Deep3D network. The output is available in anaglyph and side-by-side formats, encoded with the x264 encoder (producing H.264/AVC video), a prevalent choice for video compression that balances quality with broad compatibility across players such as VLC.
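
As an illustration of that encoding step, the snippet below shells out to FFmpeg with the libx264 encoder. The frame pattern, frame rate, and quality settings are hypothetical common defaults, not values reported by the authors.

```python
import subprocess

# Encode numbered side-by-side frames to H.264 with the x264 encoder.
subprocess.run([
    "ffmpeg", "-framerate", "24",
    "-i", "sbs_%05d.png",    # hypothetical frame naming pattern
    "-c:v", "libx264",       # x264 encoder producing H.264/AVC
    "-pix_fmt", "yuv420p",   # widest compatibility with players such as VLC
    "-crf", "18",            # quality/size trade-off
    "sbs_3d.mp4",
], check=True)
```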

Dataset Utilization

A pivotal component of this work is the dataset comprising 27 distinct movies, with 18 earmarked for training and 9 for testing the CNN’s efficacy. The curated collection represents a diverse array of cinematic content, ranging from action and fantasy films to science fiction and historical narratives. This selection underscores the importance of diversity in training data to enhance the generalizability and versatility of the 3D conversion algorithm.

Specifically, the dataset includes well-known titles such as "Guardians of the Galaxy" and "Mad Max: Fury Road" for training, and "Gravity" and "The Hobbit: The Desolation of Smaug" for testing. These selections emphasize the algorithm's capability to handle varied visual styles and cinematic themes, essential for the broad application of the conversion technology in real-world scenarios.
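
Training pairs of this kind are typically recovered by splitting each stereo frame into its two views. The sketch below assumes the frames were exported in half-width side-by-side layout; that layout, like the file name, is an assumption for illustration rather than a detail given in this summary.

```python
import numpy as np
from PIL import Image

def stereo_pair(frame_path):
    """Split a side-by-side 3D frame into (left, right) views; the left
    half serves as network input, the right half as supervision target."""
    frame = np.asarray(Image.open(frame_path).convert("RGB"))
    h, w, _ = frame.shape
    return frame[:, : w // 2], frame[:, w // 2 :]

left, right = stereo_pair("movie_frame_00001.png")  # hypothetical path
```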

Implications and Future Outlook

This research offers significant practical implications, particularly in the entertainment industry, where there is an ongoing demand for 3D content. The ability to automate 2D-to-3D conversion using convolutional neural networks could potentially reduce production costs and timescales associated with traditional 3D film production processes. Moreover, the methodology could be extended to other domains such as virtual reality and augmented reality, where depth perception is a critical component.

Theoretically, this research contributes to the understanding of how deep learning models can be applied to spatial transformations of visual content. The insights gained from this paper could propel further advancements in how CNNs are utilized for tasks beyond traditional image recognition, potentially spurring developments in cross-domain applications involving depth inference.

As research progresses, future developments might explore enhancements in model architectures, training strategies, and dataset collections to increase the fidelity and realism of the converted 3D content. Additionally, advancements could focus on optimizing computational efficiency to better facilitate real-time applications and expand the reach of this technology across various platforms and devices.
