Structure-Aware Human Body Reshaping with Adaptive Affinity-Graph Network (2404.13983v2)

Published 22 Apr 2024 in cs.CV

Abstract: Given a source portrait, the automatic human body reshaping task aims to edit it into an aesthetically pleasing body shape. Because the technology is widely used in media, several methods have been proposed, mainly focusing on generating optical flow to warp the body shape. However, these previous works consider only the local transformation of individual body parts (arms, torso, and legs) and ignore the global affinity between them, which limits their ability to ensure consistency and quality across the entire body. In this paper, we propose a novel Adaptive Affinity-Graph Network (AAGN), which extracts the global affinity between different body parts to enhance the quality of the generated optical flow. Specifically, AAGN introduces two main designs. (1) An Adaptive Affinity-Graph (AAG) Block leverages the properties of a fully connected graph: it represents different body parts as nodes in an adaptive fully connected graph and captures the affinities between all pairs of nodes to obtain a global affinity map. This design better preserves consistency between body parts. (2) Since high-frequency details are crucial for photo aesthetics, a Body Shape Discriminator (BSD) is designed to extract information from both the high-frequency and spatial domains. In particular, an SRM filter is used to extract high-frequency details, which are combined with spatial features as input to the BSD. With this design, the BSD guides the Flow Generator (FG) to attend to fine details rather than performing rigid pixel-level fitting. Extensive experiments on the BR-5K dataset demonstrate that our framework significantly enhances the aesthetic appeal of reshaped photos, surpassing all previous work and achieving state-of-the-art results on all evaluation metrics.
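The abstract names three mechanisms (a global affinity graph over body parts, an SRM high-frequency branch, and flow-based warping) without giving their equations. The NumPy sketch below illustrates one plausible reading of each; it is not the authors' implementation, and several choices are assumptions: (1) affinities are computed as a row-wise softmax over scaled dot products of projected node features, i.e. a self-attention / non-local step swapped in as a stand-in for the AAG block, with hypothetical projection matrices `w_q`, `w_k`, `w_v`; (2) the high-frequency branch uses a single 5x5 SRM kernel (the "KV" residual filter from Fridrich and Kodovsky's rich models), and the spatial and residual maps are simply stacked as channels; (3) the flow is applied by backward bilinear warping with per-pixel `(dx, dy)` offsets. All function names are hypothetical.

```python
import numpy as np

# (1) Global affinity over body-part nodes (assumed attention-style form).
def affinity_graph(nodes, w_q, w_k, w_v):
    """nodes: (N, d) array, one feature vector per body part.
    Returns (updated_nodes, affinity), where affinity is an (N, N)
    row-stochastic matrix connecting every node to every other node."""
    q, k, v = nodes @ w_q, nodes @ w_k, nodes @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])           # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    affinity = np.exp(scores)
    affinity /= affinity.sum(axis=1, keepdims=True)  # softmax over each row
    return nodes + affinity @ v, affinity            # residual node update

# (2) High-frequency residual with one SRM kernel ("KV", rows sum to zero).
SRM_KV = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=float) / 12.0

def srm_residual(gray):
    """Correlate a 2-D grayscale image with SRM_KV (edge padding) and
    return a same-size high-frequency residual map."""
    h, w = gray.shape
    padded = np.pad(gray, 2, mode="edge")
    out = np.zeros((h, w))
    for dy in range(5):
        for dx in range(5):
            out += SRM_KV[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def bsd_input(gray):
    """Stack spatial and high-frequency maps as a (2, H, W) input
    (channel concatenation is an assumption)."""
    return np.stack([gray, srm_residual(gray)], axis=0)

# (3) Backward warping of an image by a dense optical-flow field.
def warp(image, flow):
    """flow: (2, H, W) array holding (dx, dy) per pixel.
    out[y, x] = image[y + dy, x + dx], sampled bilinearly with
    source coordinates clamped to the image border."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    ax, ay = sx - x0, sy - y0                        # bilinear weights
    top = (1 - ax) * image[y0, x0] + ax * image[y0, x1]
    bottom = (1 - ax) * image[y1, x0] + ax * image[y1, x1]
    return (1 - ay) * top + ay * bottom
```

For instance, calling `affinity_graph` on a `(5, d)` array of part features returns a `(5, 5)` map whose rows sum to 1, so each part's update draws on every other part; this dense, all-pairs coupling is what distinguishes a global affinity from per-part local transforms. Because every row of the SRM kernel sums to zero, flat regions produce no residual, leaving only edges and texture for the discriminator branch. In a trained model the projections, discriminator, and flow would all be learned; here they are plain arrays.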
