Multi-View Image Generation from a Single-View
The paper "Multi-View Image Generation from a Single-View" introduces a novel approach in computer vision for synthesizing multi-view images from a single view of a clothing item. The authors propose VariGANs, a model that combines variational inference with Generative Adversarial Networks (GANs), aimed at tackling the inherent challenges in generating realistic images from a variety of perspectives based on just one view.
Methodology
VariGANs are designed to overcome limitations of previous methods, which struggled to model global appearance or produced images with artifacts. The core of VariGANs is a two-phase generator: a coarse generator uses variational inference to produce a low-resolution image capturing the basic shape and color of the object, and a fine generator then uses adversarial learning to add realistic details and refine the result to high resolution.
The model comprises three primary components:
- Coarse Image Generator: Uses variational inference to approximate the conditional distribution \( p(\hat{I}_{v_j} \mid I_{v_i}, z) \), optimized by maximizing a variational lower bound. It generates a low-resolution image depicting basic features such as shape and contour (a sketch of this objective follows the list).
- Fine Image Generator: Refines the coarse image by adding detailed textures and colors, using a skip-connection network (similar to the U-Net architecture) so that fine details remain consistent with the input view (see the generator sketch after this list).
- Conditional Discriminator: Assesses the realism of generated images and enforces consistency with the input view by contrasting generated (image, condition) pairs against real images of the target view (see the adversarial-loss sketch after this list).
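To make the coarse phase concrete, here is a minimal sketch of a conditional-VAE objective of the kind described, written in PyTorch. The `encoder` and `decoder` callables, the L1 reconstruction term, and the assumption that the target view is already downsampled to the coarse resolution are all illustrative choices, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def coarse_elbo_loss(encoder, decoder, src_view, tgt_view):
    """Negative evidence lower bound for p(I_tgt | I_src, z).

    encoder: maps (tgt_view, src_view) -> (mu, logvar) of q(z | I_tgt, I_src)
    decoder: maps (z, src_view) -> low-resolution reconstruction of I_tgt
    tgt_view is assumed already downsampled to the coarse resolution.
    """
    mu, logvar = encoder(tgt_view, src_view)
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)
    recon = decoder(z, src_view)
    # Reconstruction term; L1 is an assumed choice that favors crisp
    # low-resolution shape and color
    recon_loss = F.l1_loss(recon, tgt_view)
    # KL divergence between q(z | ...) and the standard normal prior,
    # summed over latent dimensions and averaged over the batch
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    return recon_loss + kl
```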
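The fine generator's skip connections can be sketched in a few layers. The two-level depth and channel widths below are assumptions chosen for brevity (the paper's network is deeper), but the skip mechanism is the same idea: encoder features bypass the bottleneck so low-level detail from the input survives refinement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGenerator(nn.Module):
    """Refines a coarse image into a detailed one, conditioned on the input view."""
    def __init__(self, ch=64):
        super().__init__()
        # 6 input channels: coarse RGB image + conditioning RGB view
        self.enc1 = nn.Conv2d(6, ch, 4, stride=2, padding=1)
        self.enc2 = nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)
        # The skip connection doubles the channels entering the last layer
        self.dec2 = nn.ConvTranspose2d(ch * 2, 3, 4, stride=2, padding=1)

    def forward(self, coarse, cond):
        x = torch.cat([coarse, cond], dim=1)
        e1 = F.leaky_relu(self.enc1(x), 0.2)
        e2 = F.leaky_relu(self.enc2(e1), 0.2)
        d1 = F.relu(self.dec1(e2))
        # U-Net-style skip: concatenate encoder features so low-level
        # detail from the input view flows directly to the decoder
        return torch.tanh(self.dec2(torch.cat([d1, e1], dim=1)))
```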
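Finally, a hedged sketch of the conditional adversarial objective: the discriminator scores (image, input view) pairs, which is what enforces consistency with the source view. The non-saturating BCE formulation is a common choice assumed here, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, real, fake, cond):
    # Real target-view images paired with the condition should score high...
    d_real = disc(torch.cat([real, cond], dim=1))
    # ...and generated images paired with the same condition should score low.
    d_fake = disc(torch.cat([fake.detach(), cond], dim=1))
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def generator_adv_loss(disc, fake, cond):
    # The generator is rewarded when the discriminator accepts its output.
    d_fake = disc(torch.cat([fake, cond], dim=1))
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
```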
Experimental Results
The authors validate their approach on two clothing datasets, MVC and DeepFashion, which contain front, side, and back views of each item. VariGANs generate plausible, realistic multi-view images and achieve notably higher SSIM and Inception Score than conditional VAE and GAN baselines (a sketch of the SSIM computation follows).
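For reference, SSIM between a generated view and its ground truth can be computed with scikit-image's standard implementation. The batching helper below is a hypothetical convenience, and the library's default window settings are assumed rather than the paper's exact evaluation protocol.

```python
import numpy as np
from skimage.metrics import structural_similarity

def mean_ssim(generated, ground_truth):
    """Average SSIM over paired batches of HxWx3 uint8 images."""
    scores = [
        structural_similarity(g, t, channel_axis=-1)
        for g, t in zip(generated, ground_truth)
    ]
    return float(np.mean(scores))
```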
Implications and Future Directions
The paper opens up several practical applications, especially in e-commerce, where multiple product views are essential but costly to produce. The implications extend into AR/VR and digital content creation domains, where automatic generation of multi-view images can substantially reduce manual effort and costs.
Theoretically, the approach integrates adversarial learning with variational inference to tackle a complex conditional image generation problem. Future work could explore more refined models to further improve image detail, resolve remaining artifacts, and generalize across more types of deformable objects.
This fusion of GANs with variational inference could guide developments in other areas of AI, such as style transfer and image synthesis beyond clothing, ultimately broadening the scope and capabilities of generative models in visual computing.