Meta-Point Learning and Refining for Category-Agnostic Pose Estimation (2403.13647v1)
Abstract: Category-agnostic pose estimation (CAPE) aims to predict keypoints for arbitrary classes given a few support images annotated with keypoints. Existing methods rely only on the features extracted at support keypoints to predict or refine the keypoints on the query image, but a few support feature vectors are local and inadequate for CAPE. Considering that humans can quickly perceive potential keypoints of arbitrary objects, we propose a novel framework for CAPE based on such potential keypoints (named meta-points). Specifically, we maintain learnable embeddings to capture inherent information of various keypoints, which interact with image feature maps to produce meta-points without any support. The produced meta-points can serve as meaningful potential keypoints for CAPE. Due to the inevitable gap between inherency and annotation, we finally utilize the identities and details offered by support keypoints to assign and refine meta-points to the desired keypoints in the query image. In addition, we propose a progressive deformable point decoder and a slacked regression loss for better prediction and supervision. Our novel framework not only reveals the inherency of keypoints but also outperforms existing methods of CAPE. Comprehensive experiments and in-depth studies on the large-scale MP-100 dataset demonstrate the effectiveness of our framework.
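The two stages described in the abstract (producing meta-points from learnable embeddings without support, then assigning them to support keypoints) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the attention form, the matching rule, or the decoder, so the sketch assumes plain dot-product attention with a soft-argmax readout and a greedy one-to-one matching. The function names `produce_meta_points` and `assign_meta_points` are hypothetical, and the progressive deformable point decoder and slacked regression loss are not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def produce_meta_points(embeddings, feature_map):
    """Assumed mechanism: each embedding attends over every feature-map cell,
    and its meta-point is the attention-weighted mean of the normalized cell
    centers (a soft-argmax). No support information is used here."""
    h, w = len(feature_map), len(feature_map[0])
    cells = [(r, c) for r in range(h) for c in range(w)]
    points = []
    for emb in embeddings:
        weights = softmax([dot(emb, feature_map[r][c]) for r, c in cells])
        x = sum(wt * (c + 0.5) / w for wt, (_, c) in zip(weights, cells))
        y = sum(wt * (r + 0.5) / h for wt, (r, _) in zip(weights, cells))
        points.append((x, y))
    return points


def assign_meta_points(meta_feats, support_feats):
    """Assumed matching rule: each support keypoint, in order, takes the most
    similar meta-point that is still unused. Requires at least as many
    meta-points as support keypoints."""
    unused = set(range(len(meta_feats)))
    assignment = []
    for feat in support_feats:
        best = max(unused, key=lambda i: dot(meta_feats[i], feat))
        unused.remove(best)
        assignment.append(best)
    return assignment
```

In this sketch, `produce_meta_points` returns one `(x, y)` pair in normalized image coordinates per embedding, and `assign_meta_points` returns, for each support keypoint, the index of the meta-point chosen for it. A refinement step would then adjust the selected meta-points using support details, which the abstract mentions but does not describe.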