Human Pose Transfer with Augmented Disentangled Feature Consistency (2107.10984v4)

Published 23 Jul 2021 in cs.CV

Abstract: Deep generative models have made great progress in synthesizing images with arbitrary human poses and in transferring the pose of one person to another. Although many methods have been proposed to generate images with high visual fidelity, the main challenge remains and stems from two fundamental issues: pose ambiguity and appearance inconsistency. To alleviate these limitations and improve the quality of the synthesized images, we propose a pose transfer network with augmented Disentangled Feature Consistency (DFC-Net) to facilitate human pose transfer. Given a pair of images containing the source and target person, DFC-Net extracts pose information from the source and static information from the target, then synthesizes an image of the target person in the desired pose taken from the source. Moreover, DFC-Net leverages disentangled feature consistency losses in adversarial training to strengthen transfer coherence, and integrates a keypoint amplifier to enhance pose feature extraction. Building on the disentangled feature consistency losses, we further propose a novel data augmentation scheme that introduces unpaired support data with augmented consistency constraints to improve the generality and robustness of DFC-Net. Extensive experimental results on Mixamo-Pose and EDN-10k demonstrate that DFC-Net achieves state-of-the-art performance on pose transfer.
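
To make the idea of a disentangled feature consistency loss more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the pose encoder, static encoder, and generator can be stood in for by fixed random linear maps, and that consistency is measured with a mean absolute (L1) difference; the abstract specifies neither choice. All names (`pose_encoder`, `static_encoder`, `generator`, `consistency_losses`) and dimensions are hypothetical, and the adversarial loss and keypoint amplifier are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_POSE, D_STATIC = 64, 8, 16  # hypothetical sizes for flattened images and features

# Hypothetical stand-ins for learned networks: fixed random linear maps.
W_pose = rng.normal(size=(D_POSE, D_IMG)) / np.sqrt(D_IMG)
W_static = rng.normal(size=(D_STATIC, D_IMG)) / np.sqrt(D_IMG)
W_gen = rng.normal(size=(D_IMG, D_POSE + D_STATIC)) / np.sqrt(D_POSE + D_STATIC)


def pose_encoder(img: np.ndarray) -> np.ndarray:
    """Map a flattened image to a pose feature vector (assumed linear here)."""
    return W_pose @ img


def static_encoder(img: np.ndarray) -> np.ndarray:
    """Map a flattened image to a static (appearance) feature vector."""
    return W_static @ img


def generator(pose_feat: np.ndarray, static_feat: np.ndarray) -> np.ndarray:
    """Synthesize a flattened image from concatenated pose and static features."""
    return W_gen @ np.concatenate([pose_feat, static_feat])


def consistency_losses(source_img: np.ndarray, target_img: np.ndarray):
    """Return the synthesized image and the two feature consistency terms.

    The pose term compares the pose feature re-extracted from the synthesized
    image with the source's pose feature; the static term does the same for the
    target's static feature. The L1 distance is an assumption of this sketch.
    """
    pose_src = pose_encoder(source_img)
    static_tgt = static_encoder(target_img)
    synthesized = generator(pose_src, static_tgt)
    pose_loss = float(np.mean(np.abs(pose_encoder(synthesized) - pose_src)))
    static_loss = float(np.mean(np.abs(static_encoder(synthesized) - static_tgt)))
    return synthesized, pose_loss, static_loss


if __name__ == "__main__":
    source = rng.normal(size=D_IMG)  # image supplying the pose
    target = rng.normal(size=D_IMG)  # image supplying the appearance
    _, pose_loss, static_loss = consistency_losses(source, target)
    print(f"pose consistency: {pose_loss:.4f}, static consistency: {static_loss:.4f}")
```

In a trained model both terms would be minimized jointly with the adversarial objective. Because the consistency terms need only a source and a target image, and no ground-truth output image, this sketch also hints at why such losses could be applied to unpaired support data.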
