Semi-supervised Dense Keypoints Using Unlabeled Multiview Images

Published 20 Sep 2021 in cs.CV | (2109.09299v2)

Abstract: This paper presents a new end-to-end semi-supervised framework for learning a dense keypoint detector from unlabeled multiview images. A key challenge lies in finding exact correspondences between dense keypoints across views, since the inverse of the keypoint mapping can be neither analytically derived nor differentiated. This prevents the direct use of existing multiview supervision approaches for sparse keypoints, which rely on exact correspondences. To address this challenge, we derive a new probabilistic epipolar constraint that encodes two desired properties. (1) Soft correspondence: we define a matchability, which measures the likelihood that a point matches its corresponding point in the other image, thus relaxing the requirement of exact correspondences. (2) Geometric consistency: every point in the continuous correspondence fields must collectively satisfy multiview consistency. We formulate the probabilistic epipolar constraint as an average of epipolar errors weighted by the matchability, thereby generalizing the point-to-point geometric error to a field-to-field geometric error. This generalization facilitates learning a geometrically coherent dense keypoint detection model from a large number of unlabeled multiview images. Additionally, to prevent degenerate cases, we employ a distillation-based regularization using a pretrained model. Finally, we design a new neural network architecture, made of twin networks, that effectively minimizes the probabilistic epipolar errors of all possible correspondences between two views by building affinity matrices. Our method outperforms existing methods, including non-differentiable bootstrapping, in terms of keypoint accuracy, multiview consistency, and 3D reconstruction accuracy.
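
The idea of averaging epipolar errors through a matchability weight can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the softmax-over-feature-distance form of the matchability, the `temperature` parameter, and the use of point-to-epipolar-line distance as the error are all assumptions made for illustration. The abstract specifies only that the constraint is a matchability-weighted average of epipolar errors.

```python
import numpy as np

def to_homogeneous(pts):
    """Append a column of ones: (N, 2) pixel coordinates -> (N, 3)."""
    return np.hstack([pts, np.ones((pts.shape[0], 1))])

def epipolar_distances(F, pts1, pts2):
    """d[i, j] = distance from pts2[j] to the epipolar line F @ pts1[i]."""
    lines = to_homogeneous(pts1) @ F.T               # (N, 3), row i is l_i = F x_i
    num = np.abs(lines @ to_homogeneous(pts2).T)     # (N, M), |l_i . x'_j|
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    return num / np.maximum(den, 1e-12)

def soft_matchability(feat1, feat2, temperature=0.1):
    """Hypothetical matchability: row-wise softmax over negative squared
    distances between per-point features (an assumed form)."""
    d2 = ((feat1[:, None, :] - feat2[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def probabilistic_epipolar_error(F, pts1, pts2, W):
    """Average of all pairwise epipolar distances, weighted by matchability W."""
    d = epipolar_distances(F, pts1, pts2)
    return float((W * d).sum() / W.sum())

# Fundamental matrix of a rectified pair (pure horizontal translation):
# the epipolar constraint reduces to y' = y.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
pts1 = np.array([[10.0, 5.0], [20.0, 8.0]])
pts2 = np.array([[3.0, 5.0], [7.0, 8.0]])

err_exact = probabilistic_epipolar_error(F, pts1, pts2, np.eye(2))
err_uniform = probabilistic_epipolar_error(F, pts1, pts2, np.full((2, 2), 0.5))
```

With a one-hot matchability the expression reduces to the ordinary point-to-point epipolar error, which is zero here because matched points share a row. With a uniform matchability, mismatched pairs contribute and the error rises to 1.5 pixels, so lowering the weighted error requires the matchability to concentrate on geometrically consistent pairs.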
