
Dense Intrinsic Appearance Flow for Human Pose Transfer (1903.11326v1)

Published 27 Mar 2019 in cs.CV

Abstract: We present a novel approach for the task of human pose transfer, which aims at synthesizing a new image of a person from an input image of that person and a target pose. We address the issues of limited correspondences identified between keypoints only and invisible pixels due to self-occlusion. Unlike existing methods, we propose to estimate dense and intrinsic 3D appearance flow to better guide the transfer of pixels between poses. In particular, we wish to generate the 3D flow from just the reference and target poses. Training a network for this purpose is non-trivial, especially when the annotations for 3D appearance flow are scarce by nature. We address this problem through a flow synthesis stage. This is achieved by fitting a 3D model to the given pose pair and project them back to the 2D plane to compute the dense appearance flow for training. The synthesized ground-truths are then used to train a feedforward network for efficient mapping from the input and target skeleton poses to the 3D appearance flow. With the appearance flow, we perform feature warping on the input image and generate a photorealistic image of the target pose. Extensive results on DeepFashion and Market-1501 datasets demonstrate the effectiveness of our approach over existing methods. Our code is available at http://mmlab.ie.cuhk.edu.hk/projects/pose-transfer

Authors (3)
  1. Yining Li (29 papers)
  2. Chen Huang (88 papers)
  3. Chen Change Loy (288 papers)
Citations (168)

Summary

  • The paper introduces a novel Dense Intrinsic Appearance Flow method that precisely estimates dense 3D appearance flow for superior human pose transfer.
  • It employs a dual-path U-Net and flow regression to seamlessly handle spatial discrepancies and occlusions in human image synthesis.
  • The system achieves improved SSIM and Fashion Inception Score, highlighting its effectiveness in photorealistic image generation for virtual and augmented reality.

Dense Intrinsic Appearance Flow for Human Pose Transfer

The paper introduces a novel technique for the human pose transfer task, leveraging a methodology termed "Dense Intrinsic Appearance Flow." The research is situated in the domain of computer vision with implications for image synthesis and manipulation, where the objective is to generate a high-quality, photorealistic image of a person in a new target pose based on an existing image.

The approach distinguishes itself from previous work by estimating dense 3D appearance flow between the reference and target poses, allowing more accurate pixel transfer across poses. The paper proposes a pipeline in which a 3D human model is fitted to each pose pair; this fitting enables the computation of the dense appearance flow needed to train a feedforward neural network.
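The core of the flow-synthesis idea can be sketched in a few lines: pose one set of 3D mesh vertices two ways, project both to the image plane, and take the per-vertex displacement as the appearance flow. The pinhole camera and all names below are illustrative assumptions, not the paper's exact fitting procedure.

```python
import numpy as np

def project(vertices, focal=500.0, center=(128.0, 128.0)):
    """Pinhole projection of (N, 3) vertices to (N, 2) pixel coordinates."""
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    u = focal * x / z + center[0]
    v = focal * y / z + center[1]
    return np.stack([u, v], axis=1)

def synthesize_flow(verts_ref, verts_tgt):
    """2D flow per vertex: where each reference-view point moves to in
    the target view. Visibility handling (z-buffering) is omitted here."""
    return project(verts_tgt) - project(verts_ref)

# Toy example: two vertices shifted 0.2 units right in 3D.
ref = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 2.0]])
tgt = ref + np.array([0.2, 0.0, 0.0])
flow = synthesize_flow(ref, tgt)   # horizontal flow of 50 px at this depth
```

Rasterizing such per-vertex flows over the mesh surface yields the dense ground-truth flow maps used for supervision.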

Key Contributions and Methodology

  • Training with Synthesized Ground-truths: Due to the scarcity of annotated 3D appearance flow data, the research involves generating synthesized ground-truths by projecting a 3D model onto the 2D plane. This projection yields the dense appearance flow maps necessary for training the network.
  • Architecture: The proposed system integrates flow synthesis, feature warping, and image generation components. It employs a dual-path U-Net for encoding the input image and target pose separately, which is crucial for handling large spatial discrepancies and non-rigid deformations inherent in pose transfer tasks.
  • Flow Regression Module: This module, comprising a U-Net architecture, predicts the 3D appearance flow and a visibility map using only the pair of input and target poses. The visibility map handles occlusions, augmenting the efficacy of the synthesis.
  • End-to-End Training: The system is trained end-to-end with adversarial and reconstruction losses, enabling the generation of high-quality images while preserving high-frequency details essential in human imagery, such as clothing textures and facial details.
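The feature-warping stage described above can be sketched as follows. Real systems use differentiable bilinear sampling; this minimal version uses nearest-neighbor lookup for clarity, and all names and conventions (flow as per-target-pixel offsets back into the source) are illustrative assumptions.

```python
import numpy as np

def warp_features(feat, flow, visibility):
    """feat: (H, W, C) source features. flow: (H, W, 2) giving, for each
    target pixel, the (dy, dx) offset back into the source. visibility:
    (H, W) in {0, 1}, zeroing pixels that are occluded in the source."""
    H, W, _ = feat.shape
    out = np.zeros_like(feat)
    for y in range(H):
        for x in range(W):
            sy = int(round(y + flow[y, x, 0]))
            sx = int(round(x + flow[y, x, 1]))
            if 0 <= sy < H and 0 <= sx < W:
                out[y, x] = feat[sy, sx] * visibility[y, x]
    return out

# Toy check: a flow that samples each pixel from one column to the right.
feat = np.arange(16, dtype=float).reshape(4, 4, 1)
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0
vis = np.ones((4, 4))
warped = warp_features(feat, flow, vis)
```

In the full system this warping is applied to encoder feature maps rather than raw pixels, and the visibility map lets the generator hallucinate occluded regions instead of copying invalid source content.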

Performance and Results

The system demonstrates superior performance over existing methodologies in generating human images under transformed poses. Evaluations on the DeepFashion and Market-1501 datasets show improved quantitative measures, including the Structural Similarity Index (SSIM) and the Fashion Inception Score, alongside an attribute retention rate that emphasizes the preservation of distinct clothing attributes.
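For reference, SSIM compares luminance, contrast, and structure between two images. The sketch below computes a global SSIM over whole grayscale images rather than the usual sliding Gaussian window; constants follow the standard choice for a dynamic range of L = 1.0.

```python
import numpy as np

def ssim_global(a, b, L=1.0):
    """Global SSIM between two grayscale images in [0, L]."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # standard stabilizers
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

img = np.random.default_rng(0).random((32, 32))
score_same = ssim_global(img, img)   # identical images score ~1.0
```

Higher SSIM indicates a generated image that is structurally closer to the ground-truth target-pose photograph.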

Implications and Future Work

The implications of this research are particularly relevant for applications in video synthesis, virtual reality, and digital fashion, where realistic human rendering in arbitrary poses is critically valuable. The proposed methodology also has the potential to enhance data augmentation processes used in person re-identification tasks. Future work could explore extending this approach to more complex and dynamic scenes, incorporating elements of temporal coherence for video-based applications, or adapting the model for real-time use cases given the inherent computational demands of 3D model generation and flow estimation.

This paper provides a thorough investigation into leveraging dense appearance flows for pose transfer, highlighting the importance of robust and accurate modeling of pixel correspondence in novel viewpoints. It marks a substantial contribution to the field, with potential avenues for further research and application expansion in artificial intelligence and human-computer interaction domains.