
Progressive Pose Attention Transfer for Person Image Generation (1904.03349v3)

Published 6 Apr 2019 in cs.CV

Abstract: This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose. The generator of the network comprises a sequence of Pose-Attentional Transfer Blocks that each transfers certain regions it attends to, generating the person image progressively. Compared with those in previous works, our generated person images possess better appearance consistency and shape consistency with the input images, thus significantly more realistic-looking. The efficacy and efficiency of the proposed network are validated both qualitatively and quantitatively on Market-1501 and DeepFashion. Furthermore, the proposed architecture can generate training images for person re-identification, alleviating data insufficiency. Codes and models are available at: https://github.com/tengteng95/Pose-Transfer.git.

Citations (292)

Summary

  • The paper introduces a progressive pose attention transfer GAN using a cascade of Pose-Attentional Transfer Blocks to transform person images to target poses.
  • Evaluations on the Market-1501 and DeepFashion datasets report better image quality and efficiency than previous methods, measured with SSIM, IS, and PCKh.
  • Images generated by the method can augment training data for person re-identification, and the authors suggest the approach may extend to pose transfer for other non-rigid objects.

Progressive Pose Attention Transfer for Person Image Generation

The paper "Progressive Pose Attention Transfer for Person Image Generation" introduces a novel generative adversarial network (GAN) aimed at the task of pose transfer in person image generation. The primary objective of this work is to transform the pose of an individual in an input image to a target pose while preserving appearance and shape consistency to produce visually realistic results.

The proposed generator relies on a sequence of Pose-Attentional Transfer Blocks (PATBs). Each block attends to specific regions and transfers them, so the final person image is built up progressively. This multi-stage approach contrasts with previous methods that perform the transfer in a single step, and it is intended to handle the variability and complexity of pose and view transformations more effectively. The GAN framework incorporates two discriminators, one judging the appearance consistency and one judging the shape consistency of the generated image.
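The progressive transfer described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes each block derives a sigmoid attention mask from the pose code, uses the mask to gate a residual update of the image code, and then concatenates the updated image code into the pose code. The 1x1 channel mixing that stands in for convolution layers, the tensor shapes, and all names (`patb`, `run_cascade`, `mix_channels`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mix_channels(weight, features):
    """Stand-in for a conv layer (assumption): a 1x1 convolution.

    weight: (C_out, C_in), features: (C_in, H, W) -> (C_out, H, W)
    """
    return np.einsum("oc,chw->ohw", weight, features)


def patb(image_code, pose_code, w_image, w_pose):
    """One hypothetical Pose-Attentional Transfer Block.

    image_code: (C, H, W), pose_code: (2C, H, W)
    """
    pose_feat = mix_channels(w_pose, pose_code)      # (C, H, W)
    mask = sigmoid(pose_feat)                        # attention values in (0, 1)
    image_feat = mix_channels(w_image, image_code)   # (C, H, W)
    # The mask selects which regions of the image code get updated;
    # the residual connection keeps everything else unchanged.
    new_image_code = mask * image_feat + image_code
    # The pose code is refreshed with the updated image code (channel concat).
    new_pose_code = np.concatenate([pose_feat, new_image_code], axis=0)
    return new_image_code, new_pose_code, mask


def run_cascade(image_code, pose_code, weights):
    """Apply the blocks in sequence and collect each block's attention mask."""
    masks = []
    for w_image, w_pose in weights:
        image_code, pose_code, mask = patb(image_code, pose_code, w_image, w_pose)
        masks.append(mask)
    return image_code, pose_code, masks


# Toy sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
C, H, W, T = 4, 8, 8, 3
image_code = rng.standard_normal((C, H, W))
pose_code = rng.standard_normal((2 * C, H, W))
weights = [
    (0.1 * rng.standard_normal((C, C)), 0.1 * rng.standard_normal((C, 2 * C)))
    for _ in range(T)
]
final_image_code, final_pose_code, masks = run_cascade(image_code, pose_code, weights)
```

Returning the per-block masks makes it possible to inspect which regions each stage attends to, which is the kind of interpretability the summary attributes to the attention mechanism.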

The contribution of this paper is twofold: firstly, the introduction of a progressive pose attention transfer scheme that allows for efficient and effective pose transformations; secondly, the design of a cascade of PATBs that utilize attention mechanisms to guide the pose transfer smoothly. The paper asserts that this mechanism aids in breaking down the complexity of transforming from one pose to another across a manifold of possible poses and views, particularly under large variations and deformations.

To evaluate their approach, the authors conducted extensive experiments on the Market-1501 and DeepFashion datasets. They report the Structural Similarity Index (SSIM), the Inception Score (IS), and a PCKh-based metric (Percentage of Correct Keypoints, normalized by head size) that they propose for explicitly assessing shape consistency. The experiments indicate that the method produces higher-quality images and is more computationally efficient than existing techniques.
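To make the shape-consistency metric concrete, here is a minimal sketch of a PCKh-style score. It assumes the common convention that a keypoint counts as correct when it lies within `alpha` times the head size of its target location (with `alpha = 0.5` as the usual default), and that keypoints missing from the target pose are skipped. The function name, the `None` convention for missing joints, and the toy coordinates are illustrative assumptions; the paper's exact evaluation protocol may differ.

```python
import math


def pckh(predicted, target, head_size, alpha=0.5):
    """Fraction of annotated target keypoints matched within alpha * head_size.

    predicted, target: lists of (x, y) tuples, or None where a joint is missing.
    """
    threshold = alpha * head_size
    correct = 0
    counted = 0
    for pred, gt in zip(predicted, target):
        if gt is None:
            continue  # joint not annotated in the target pose, so skip it
        counted += 1
        # A joint the detector failed to find counts as incorrect.
        if pred is not None and math.dist(pred, gt) <= threshold:
            correct += 1
    return correct / counted if counted else 0.0


# Toy example: keypoints detected on a generated image vs. the target pose.
target = [(10, 10), (20, 30), (40, 50), None]
predicted = [(11, 10), (20, 45), None, (5, 5)]
score = pckh(predicted, target, head_size=20)
# With a threshold of 10 pixels, 1 of the 3 annotated joints is matched.
```

In practice the predicted keypoints would come from running a pose estimator on the generated image, so the score reflects how closely the generated person follows the target pose.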

Furthermore, the researchers apply their technique to enhance datasets for person re-identification by generating additional training data, addressing the issue of data insufficiency. They observed that re-identification performance on reduced datasets significantly improved with augmentation through their method, as verified using different backbone networks.

The paper also discusses implications of the pose transfer technique beyond person image generation. The authors suggest that the progressive pose-attentional transfer network could be applied to generating other non-rigid objects and adapted to improve GAN-based image generation more generally. They also note that the attention mechanisms make the network more interpretable, since the attention masks offer insight into which regions the network transfers at each stage of generation.

Valuable directions for future work include applying progressive attention transfer in other domains and making the framework more adaptable and robust across a broader range of input conditions.

In summary, this paper offers a systematic framework for pose transfer using attention mechanisms, presenting a robust architecture that improves image quality and efficiency in person image generation tasks. This contributes to advancing the field of generative models with applications in image synthesis and beyond.