Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach
Abstract: We introduce a novel one-stage, end-to-end multi-person 2D pose estimation algorithm, Joint Coordinate Regression and Association (JCRA), that produces human pose joints and their associations without any post-processing. The proposed algorithm is fast, accurate, effective, and simple. The one-stage end-to-end network architecture significantly improves the inference speed of JCRA. We also devise a symmetric network structure for the encoder and decoder, which ensures high accuracy in identifying keypoints. JCRA directly outputs part positions via a transformer network, resulting in a significant improvement in performance. Extensive experiments on the MS COCO and CrowdPose benchmarks demonstrate that JCRA outperforms state-of-the-art approaches in both accuracy and efficiency. In particular, JCRA achieves 69.2 mAP and runs inference 78% faster than previous state-of-the-art bottom-up algorithms. The code for this algorithm will be made publicly available.
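To illustrate what "joints and associations without post-processing" can look like in practice, the following is a minimal sketch and not the authors' implementation. It assumes a DETR-style set-prediction output in which each decoder query regresses all joints of one person in normalized coordinates, together with a person confidence score. Under that assumption, association is implicit (all joints from one query belong to the same person), so no heatmap decoding or keypoint grouping is needed. The function name `decode_poses`, its parameters, and the threshold value are hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float]
Pose = List[Joint]


def decode_poses(
    query_coords: List[Pose],
    query_scores: List[float],
    image_size: Tuple[int, int],
    score_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> List[Pose]:
    """Convert per-query regression outputs into pixel-space poses.

    query_coords: one entry per query, each a list of (x, y) joints
                  normalized to [0, 1] (assumed output format).
    query_scores: one person-confidence score per query.
    image_size:   (width, height) of the input image in pixels.
    """
    width, height = image_size
    poses: List[Pose] = []
    for coords, score in zip(query_coords, query_scores):
        # Keep only queries that the model considers to be a person.
        if score < score_threshold:
            continue
        # All joints of this query belong to the same person, so scaling
        # to pixels is the only step left; no grouping is performed.
        poses.append([(x * width, y * height) for x, y in coords])
    return poses


# Toy example: two queries with three joints each; the second is low-confidence.
coords = [
    [(0.5, 0.25), (0.25, 0.5), (0.75, 0.5)],
    [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3)],
]
scores = [0.9, 0.1]
poses = decode_poses(coords, scores, image_size=(128, 192))
print(poses)
```

In this sketch the only step after the network is a confidence filter and a rescale to pixel coordinates, which is the sense in which a direct-regression design avoids the grouping stage used by bottom-up methods.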