Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis
The paper "Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis" introduces a novel approach to generating person images conditioned on arbitrary poses, focusing on the challenges posed by large geometric transformations. The proposed Soft-Gated Warping-GAN seeks to overcome the significant spatial displacements and appearance variations induced by arbitrary pose manipulations. The framework is particularly aimed at heavy occlusions and large viewpoint changes, which plain convolutional models typically handle poorly.
Soft-Gated Warping-GAN Architecture
The Warping-GAN is a two-stage architecture. The first stage synthesizes a target part segmentation map from the target pose; this map serves as a region-level spatial layout that keeps generated images coherent with higher-level structural constraints. In the second stage, a soft-gated warping-block learns feature-level mappings that transfer textures from the original image into the regions defined by the synthesized segmentation map. The soft gate lets the model adjust the degree of transformation to the specific target pose, improving texture rendering and reducing common artifacts such as blurry boundaries and missing appearance details.
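The soft-gating idea described above can be sketched in a few lines of NumPy. The function name `soft_gate` and the `gate_logits` input are illustrative stand-ins for the paper's learned gating head, not its actual implementation:

```python
import numpy as np

def soft_gate(original, warped, gate_logits):
    """Blend an identity path and a warped path with a sigmoid gate.

    In the real model, gate_logits would be produced by a small learned
    head conditioned on the pose; here it is any array (or scalar)
    broadcastable against the feature maps.
    """
    gate = 1.0 / (1.0 + np.exp(-gate_logits))  # sigmoid -> degree of transformation in [0, 1]
    return gate * warped + (1.0 - gate) * original
```

A logit of zero yields an equal blend of the two paths, while a large positive logit passes the warped features through almost unchanged, which is how the gate can modulate how aggressively textures are transformed per pose.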
Technical Contributions
The primary contributions of the paper center on the soft-gated warping-block and the integration of part segmentation maps. The warping-block is lightweight and versatile, and can be inserted into various network architectures. Its differentiable transformation grid aligns feature maps effectively, minimizing the misalignments that lead to low-quality image synthesis. The ability to adapt feature warping to the semantic parts of the image stands out as a key advantage over earlier methods.
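A minimal NumPy sketch of such a differentiable transformation grid, assuming a simple 2x3 affine warp rather than the paper's pose-conditioned transformation: `affine_grid` builds normalized sampling coordinates and `bilinear_sample` interpolates the feature map at those locations, so the sampling step is (piecewise) smooth and gradients can flow through it during training.

```python
import numpy as np

def affine_grid(theta, H, W):
    # Sampling grid in normalized [-1, 1] coordinates, transformed by a 2x3 affine theta.
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3) homogeneous coords
    return coords @ theta.T                                 # (H, W, 2) sampling (x, y)

def bilinear_sample(feat, grid):
    # Piecewise-linear lookup of feat (C, H, W) at the grid's sampling locations.
    C, H, W = feat.shape
    x = (grid[..., 0] + 1) * (W - 1) / 2                    # normalized -> pixel coords
    y = (grid[..., 1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    wx, wy = x - np.floor(x), y - np.floor(y)               # interpolation weights
    return (feat[:, y0, x0] * (1 - wx) * (1 - wy) + feat[:, y0, x1] * wx * (1 - wy)
            + feat[:, y1, x0] * (1 - wx) * wy + feat[:, y1, x1] * wx * wy)
```

This mirrors the grid-sampling primitive popularized by spatial transformer networks (available in deep learning frameworks as, e.g., PyTorch's `torch.nn.functional.affine_grid` and `grid_sample`), which is the standard building block for this kind of differentiable warping.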
Experimental Evaluation
The authors conducted extensive experiments to validate the effectiveness of their model on two large datasets: DeepFashion and Market-1501. The Warping-GAN demonstrated superior performance over several state-of-the-art methods, including PG2, BodyROI7, and DSCF. Quantitative metrics such as structural similarity (SSIM) and Inception Score (IS), alongside human perceptual studies, confirm the model's ability to generate more visually coherent and realistic images across a wide range of poses.
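For reference, SSIM compares two images through statistics of luminance, contrast, and structure. A simplified single-window version can be sketched as below; the metric as typically reported averages this over local Gaussian windows (e.g., via `skimage.metrics.structural_similarity`), and the constants follow the original SSIM formulation:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    # Single-window SSIM over the whole image; the standard metric instead
    # averages this quantity over local 11x11 Gaussian-weighted windows.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2  # stabilizing constants
    mx, my = x.mean(), y.mean()                                  # luminance statistics
    vx, vy = x.var(), y.var()                                    # contrast statistics
    cov = ((x - mx) * (y - my)).mean()                           # structure statistic
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))
```

Identical images score 1.0, and the score decreases as luminance, contrast, or structure diverge, which is why SSIM is a common proxy for perceived fidelity in image synthesis benchmarks.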
Implications and Future Outlook
This paper's contributions extend beyond mere performance improvements; they also offer a robust methodology for tackling pose-dependent synthesis problems, which are prevalent in applications like virtual reality, animation, and fashion design. The strategic use of part segmentation maps and feature warping presents a significant step forward in overcoming the spatial misalignment challenges previously limiting generative models' capabilities.
Looking forward, the flexibility of the warping-block could be harnessed in broader contexts, potentially being adapted for other domains where alignment and transformation are critical. Additionally, future research could integrate more advanced parsing mechanisms or extend the approach to multi-view synthesis to further enhance the realism and applicability of generated images.
Overall, this work clearly demonstrates the potential of coupling segmentation-based guidance with adaptive feature transformation to achieve high-quality results in pose-guided image synthesis, setting a precedent for future advancements in the field.