
SketchBodyNet: A Sketch-Driven Multi-faceted Decoder Network for 3D Human Reconstruction

Published 10 Oct 2023 in cs.CV and cs.GR | (2310.06577v1)

Abstract: Reconstructing 3D human shapes from 2D images has received increasing attention because it underpins many high-level 3D applications. Compared with natural images, freehand sketches can depict a wide variety of shapes far more flexibly, which makes them a promising input for 3D human reconstruction. The task is highly challenging, however. Sketches are sparse and abstract, so they add arbitrariness, inaccuracy, and a lack of image detail to the already severely ill-posed problem of 2D-to-3D reconstruction. Although current methods have achieved great success in reconstructing 3D human bodies from a single-view image, they do not work well on freehand sketches. In this paper, we propose a novel sketch-driven multi-faceted decoder network, termed SketchBodyNet, to address this task. Specifically, the network consists of a backbone and three separate attention decoder branches. Each decoder applies a multi-head self-attention module to obtain enhanced features, followed by a multi-layer perceptron. The three decoders predict the camera, shape, and pose parameters, respectively, which are then passed to the SMPL model to reconstruct the corresponding 3D human mesh. During training, existing 3D meshes are projected via the camera parameters into 2D synthetic sketches with joints, which are combined with the freehand sketches to optimize the model. To verify our method, we collect a large-scale dataset of about 26k freehand sketches and their corresponding 3D meshes, covering various human body poses from 14 different angles. Extensive experiments show that SketchBodyNet achieves superior performance in reconstructing 3D human meshes from freehand sketches.
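The decoder layout described in the abstract (shared backbone features feeding three attention branches, each ending in a multi-layer perceptron) can be illustrated with a small dependency-free Python sketch. This is not the authors' implementation. The random weights stand in for learned parameters, the random feature tokens stand in for backbone output, and the mean pooling, the layer sizes, and every function name are assumptions made for illustration. The output sizes are also assumptions: 10 shape and 72 pose values follow the common SMPL convention, and 3 camera values follow a weak-perspective camera (scale plus 2D translation), none of which the abstract specifies.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rng, rows, cols):
    """Random weights standing in for learned parameters (assumption)."""
    bound = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

def self_attention(tokens, w, num_heads):
    """Multi-head scaled dot-product self-attention over a token sequence."""
    q, k, v = (matmul(tokens, w[name]) for name in ("q", "k", "v"))
    d_head = len(q[0]) // num_heads
    merged = [[] for _ in tokens]
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        qh = [r[cols] for r in q]
        kh = [r[cols] for r in k]
        vh = [r[cols] for r in v]
        kt = [list(c) for c in zip(*kh)]  # transpose of this head's keys
        scores = matmul(qh, kt)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        # Concatenate each head's output along the feature axis.
        for t, out in enumerate(matmul(weights, vh)):
            merged[t].extend(out)
    return matmul(merged, w["o"])

def decoder_branch(tokens, w, num_heads):
    """One branch: self-attention, mean pooling (assumed), two-layer MLP."""
    attended = self_attention(tokens, w, num_heads)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    hidden = [max(0.0, x) for x in matmul([pooled], w["fc1"])[0]]  # ReLU
    return matmul([hidden], w["fc2"])[0]

def make_branch(rng, d_model, d_hidden, d_out):
    """Build one branch's weights: four attention projections plus the MLP."""
    w = {name: rand_matrix(rng, d_model, d_model) for name in ("q", "k", "v", "o")}
    w["fc1"] = rand_matrix(rng, d_model, d_hidden)
    w["fc2"] = rand_matrix(rng, d_hidden, d_out)
    return w

rng = random.Random(0)
D_MODEL, NUM_HEADS, NUM_TOKENS = 16, 4, 6  # illustrative sizes (assumption)
OUT_DIMS = {"camera": 3, "shape": 10, "pose": 72}  # assumed output sizes
branches = {name: make_branch(rng, D_MODEL, 32, dim) for name, dim in OUT_DIMS.items()}

# Stand-in for the backbone's feature tokens extracted from a sketch.
features = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(NUM_TOKENS)]
params = {name: decoder_branch(features, w, NUM_HEADS) for name, w in branches.items()}
print({name: len(vec) for name, vec in params.items()})
# prints {'camera': 3, 'shape': 10, 'pose': 72}
```

Each branch holds its own attention and MLP weights, so the camera, shape, and pose predictions share only the backbone features. In a full pipeline the three output vectors would be handed to an SMPL implementation to produce the mesh, a step this sketch leaves out.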

