Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach

Published 3 Jul 2023 in cs.CV and cs.LG (arXiv:2307.01004v2)

Abstract: We introduce a novel one-stage, end-to-end multi-person 2D pose estimation algorithm, called Joint Coordinate Regression and Association (JCRA), that produces human pose joints and their associations without requiring any post-processing. The proposed algorithm is fast, accurate, effective, and simple. Its one-stage, end-to-end network architecture significantly improves inference speed. We also devise a symmetric network structure for both the encoder and the decoder, which ensures high accuracy in identifying keypoints. The architecture outputs part positions directly through a transformer network, which significantly improves performance. Extensive experiments on the MS COCO and CrowdPose benchmarks demonstrate that JCRA outperforms state-of-the-art approaches in both accuracy and efficiency. Moreover, JCRA achieves 69.2 mAP and runs inference 78% faster than previous state-of-the-art bottom-up algorithms. The code for this algorithm will be made publicly available.
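The abstract's central claim is that the network emits each person's joints already grouped, so no post-processing is needed. Bottom-up methods, by contrast, first detect every keypoint in the image and then assign them to people with an extra grouping step (for example part affinity fields or associative embedding). The minimal Python sketch below illustrates the difference under an assumed, DETR-style output format in which each of Q queries predicts one person as K normalized (x, y) joints plus a confidence score. The abstract does not specify JCRA's output format, so this format, the 0.5 threshold, and the names `Pose` and `decode_queries` are hypothetical; this is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pose:
    """One detected person (hypothetical container, not from the paper)."""
    score: float
    joints: list[tuple[float, float]]  # K joints in pixel coordinates


def decode_queries(scores, keypoints, img_w, img_h, threshold=0.5):
    """Convert raw per-query predictions into a ranked list of people.

    Assumes scores[q] is the person confidence of query q and keypoints[q]
    is that query's K joints as (x, y) pairs normalized to [0, 1].
    Each query already owns a full skeleton, so the joint-to-person
    association is simply the query index: no heatmap peak extraction
    and no grouping step are needed.
    """
    if len(scores) != len(keypoints):
        raise ValueError("scores and keypoints must have one entry per query")

    poses = []
    for score, joints in zip(scores, keypoints):
        if score < threshold:
            continue  # low confidence: treat this query as "no person"
        # Rescale normalized coordinates to pixel coordinates.
        pixel_joints = [(x * img_w, y * img_h) for x, y in joints]
        poses.append(Pose(score, pixel_joints))

    # Rank people by confidence, the order ranking-based metrics such as AP use.
    poses.sort(key=lambda p: p.score, reverse=True)
    return poses


# Usage: three queries, two joints each, on a 640 x 480 image.
scores = [0.9, 0.1, 0.7]
keypoints = [
    [(0.5, 0.5), (0.25, 0.75)],
    [(0.1, 0.1), (0.2, 0.2)],
    [(0.0, 1.0), (1.0, 0.0)],
]
for pose in decode_queries(scores, keypoints, 640, 480):
    print(pose.score, pose.joints)
# 0.9 [(320.0, 240.0), (160.0, 360.0)]
# 0.7 [(0.0, 480.0), (640.0, 0.0)]
```

Here the 0.1 query is discarded and the remaining two people come out ranked by score. Because association is carried by the query index, decoding reduces to a filter and a rescale rather than a search over candidate keypoint groupings.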
