Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation (2405.11448v1)
Abstract: In practical applications of human pose estimation, low-resolution inputs occur frequently, and existing state-of-the-art models perform poorly on them. This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model. However, applying knowledge distillation to networks with different input resolutions raises two challenges: feature-size mismatch and class-number mismatch. To address these issues, we propose a novel cross-domain knowledge distillation (CDKD) framework. In this framework, we construct a scale-adaptive projector ensemble (SAPE) module to spatially align feature maps between models with different input resolutions. It adopts a projector ensemble to map low-resolution features into multiple common spaces and adaptively merges them based on multi-scale information to match high-resolution features. Additionally, we construct a cross-class alignment (CCA) module to resolve the mismatch in class numbers. Combined with an easy-to-hard training (ETHT) strategy, the CCA module further enhances distillation performance. Extensive experiments on two common benchmark datasets, MPII and COCO, demonstrate the effectiveness and efficiency of our approach. The code is made available in the supplementary material.
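As a rough illustration of the projector-ensemble idea described in the abstract, the following NumPy sketch projects a low-resolution student feature map through several linear projectors, merges the results with input-dependent softmax weights, and upsamples the merged map to the teacher's spatial size before computing a mean-squared-error distillation loss. This is not the authors' SAPE implementation: the function names, the global-average-pooled gate (a stand-in for the paper's multi-scale information), the nearest-neighbour upsampling, and all shapes are assumptions made for illustration only.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def projector_ensemble(student_feat, projectors, gate, factor):
    """Hypothetical projector ensemble (an assumption, not the paper's module).

    student_feat: (C_s, H, W) low-resolution feature map.
    projectors:   list of K matrices of shape (C_t, C_s), acting like 1x1 convs.
    gate:         (K, C_s) matrix that scores each projector from pooled features.
    factor:       integer ratio between teacher and student spatial sizes.
    """
    pooled = student_feat.mean(axis=(1, 2))              # (C_s,) global context
    weights = softmax(gate @ pooled)                      # (K,) merge weights
    projected = [np.einsum("tc,chw->thw", p, student_feat) for p in projectors]
    merged = sum(w * p for w, p in zip(weights, projected))  # (C_t, H, W)
    return upsample_nearest(merged, factor), weights


def feature_distill_loss(aligned, teacher_feat):
    """Mean-squared error between aligned student and teacher features."""
    return float(np.mean((aligned - teacher_feat) ** 2))


# Toy shapes chosen only for the example.
rng = np.random.default_rng(0)
C_s, C_t, H, W, K, factor = 8, 16, 4, 3, 3, 2
student = rng.standard_normal((C_s, H, W))
teacher = rng.standard_normal((C_t, H * factor, W * factor))
projectors = [0.1 * rng.standard_normal((C_t, C_s)) for _ in range(K)]
gate = rng.standard_normal((K, C_s))

aligned, weights = projector_ensemble(student, projectors, gate, factor)
loss = feature_distill_loss(aligned, teacher)
print(aligned.shape, loss)
```

In a real training setup the projectors and gate would be learned jointly with the student, and the teacher features would come from a frozen high-resolution model; here they are random arrays so the sketch stays self-contained.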