- The paper introduces a unified framework that leverages 3D body mesh recovery to disentangle pose and shape for enhanced image synthesis.
- It employs a Liquid Warping Block in a multi-stream GAN to preserve image details and ensure consistent identity and texture transfer.
- Experimental results on the iPER dataset demonstrate strong performance across human motion imitation, appearance transfer, and novel view synthesis.
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer, and Novel View Synthesis
The paper presents a framework that addresses three complex tasks in human image synthesis with a single model: human motion imitation, appearance transfer, and novel view synthesis. The proposed framework, Liquid Warping GAN, couples 3D body mesh recovery with a warping-based generator so that all three tasks reduce to one synthesis problem, avoiding separate task-specific pipelines.
Core Contributions
Central to the approach is a 3D body mesh recovery module that goes beyond earlier 2D keypoint-based methods by disentangling pose from shape. Where 2D keypoints encode only joint positions, the recovered mesh captures joint rotations and a personalized body shape, so a reference pose can be transferred onto a source body without distorting that body's proportions.
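As a concrete illustration of this disentanglement, the sketch below uses the third-party smplx package (not the paper's code) with downloaded SMPL model files; the zero tensors are placeholders for parameters that, in the paper, are regressed from images by a pretrained estimator.

```python
import torch
import smplx

# Load a neutral SMPL body model ('models/' is a placeholder path to the
# downloaded SMPL model files).
model = smplx.create('models/', model_type='smpl', gender='neutral')

betas = torch.zeros(1, 10)         # shape coefficients of the SOURCE person
body_pose = torch.zeros(1, 69)     # axis-angle body pose of the REFERENCE frame
global_orient = torch.zeros(1, 3)  # root orientation of the REFERENCE frame

# Because shape and pose are separate inputs, motion imitation reduces to
# pairing the source's betas with the reference's pose parameters.
output = model(betas=betas, body_pose=body_pose, global_orient=global_orient)
vertices = output.vertices  # (1, 6890, 3): the source body in the reference pose
```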
Liquid Warping GAN integrates a Liquid Warping Block (LWB) to propagate source information, such as texture, style, color, and identity, into the synthesized output. Because the LWB preserves this information in both image space and feature space, it produces more realistic and identity-consistent results than warping in either space alone.
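A minimal PyTorch sketch of the feature-space half of this idea follows, assuming the mesh-based transformation flow has already been converted into a normalized sampling grid (see the conversion sketch in the Methodology section); warped source features are aggregated into the target stream by addition:

```python
import torch
import torch.nn.functional as F

def liquid_warping_block(src_feat, tsf_feat, grid):
    """Fuse source-stream features into the target stream.

    src_feat, tsf_feat: (B, C, H, W) feature maps from the two streams.
    grid: (B, H, W, 2) normalized sampling grid derived from the
          mesh-based transformation flow, with values in [-1, 1].
    """
    warped = F.grid_sample(src_feat, grid, align_corners=True)  # bilinear sampling
    return warped + tsf_feat  # additive aggregation into the target stream

# Toy usage with random tensors standing in for real features and flow.
src = torch.randn(1, 64, 32, 32)
tsf = torch.randn(1, 64, 32, 32)
grid = torch.rand(1, 32, 32, 2) * 2 - 1
fused = liquid_warping_block(src, tsf, grid)  # (1, 64, 32, 32)
```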
Methodology
- Body Mesh Recovery: Utilizes the SMPL (Skinned Multi-Person Linear) model for 3D body mesh reconstruction, capturing body shape and joint rotations precisely.
- Flow Composition Module: Computes a transformation flow that aligns the source and reference inputs by matching correspondences between their projected 3D meshes (see the grid-conversion sketch after this list).
- Liquid Warping GAN: Employs a multi-stream generator architecture featuring:
  - G_BG: Synthesizes a realistic background from the masked input image.
  - G_SID: A denoising auto-encoder stream that reconstructs the source image and extracts the features the LWB reuses.
  - G_TSF: Synthesizes the output image under the desired condition, receiving source features, from one or several source images, propagated through the LWB.
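The transformation flow is ultimately consumed by bilinear sampling. The sketch below, assuming a hypothetical convention in which the flow stores absolute source-pixel coordinates for every target pixel, converts it into the normalized grid that torch.nn.functional.grid_sample expects:

```python
import torch

def flow_to_grid(src_coords, height, width):
    """Convert absolute source-pixel coordinates into the normalized
    [-1, 1] grid expected by torch.nn.functional.grid_sample
    (align_corners=True convention, matching the earlier LWB sketch).

    src_coords: (B, H, W, 2) with (x, y) in pixel units, giving, for each
    target pixel, the source location it should sample from.
    """
    grid = src_coords.clone()
    grid[..., 0] = 2.0 * grid[..., 0] / max(width - 1, 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / max(height - 1, 1) - 1.0
    return grid
```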
Experimental Results
The paper introduces the Impersonator (iPER) dataset to evaluate the framework under diverse body shapes, clothing, and viewpoints. Extensive experiments demonstrate that the single model surpasses previous task-specific solutions. It achieves robust performance through:
- Preserving body shape consistency even when imitating poses from differently sized reference subjects.
- Maintaining high fidelity in texture, face identity, and finer details, effectively tackling occlusions and challenging angles.
- Excelling in both self-imitation and cross-imitation evaluations.
Ablation studies further confirm the advantage of the LWB over alternative schemes such as early concatenation and one-shot feature warping, both in perceptual quality and in identity retention.
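To make the compared schemes concrete, the hypothetical helpers below contrast them in PyTorch; net and decoder stand in for arbitrary sub-networks and are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

# (1) Early concatenation: stack the source image with the conditioning map
# at the input layer and let the network learn the alignment implicitly.
def early_concat(src_img, cond, net):
    return net(torch.cat([src_img, cond], dim=1))

# (2) Feature warping: warp the source features once at a single layer,
# then decode from the warped result alone.
def feature_warping(src_feat, grid, decoder):
    return decoder(F.grid_sample(src_feat, grid, align_corners=True))

# (3) LWB: repeat the warp-and-add step of the earlier liquid_warping_block
# sketch at several feature levels, re-injecting source detail throughout
# the decoder rather than at one point.
```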
Implications and Future Directions
By unifying three tasks in one model, the framework advances both the theory and the practice of human image synthesis, and it opens avenues for applications in virtual environments, real-time avatar control, and beyond. Future work could strengthen temporal coherence for video-based applications and explore integration with interactive systems.
Overall, Liquid Warping GAN marks a strong step toward unifying these synthesis tasks, harnessing 3D body modeling and GANs to advance this multifaceted domain.