HOKEM: Human and Object Keypoint-based Extension Module for Human-Object Interaction Detection (2306.14260v1)

Published 25 Jun 2023 in cs.CV and cs.LG

Abstract: Human-object interaction (HOI) detection, which captures the relationships between humans and objects, is an important task in the semantic understanding of images. When human and object keypoints extracted from an image are processed with a graph convolutional network (GCN) to detect HOI, it is crucial to extract appropriate object keypoints regardless of object type and to design a GCN that accurately captures the spatial relationships between keypoints. This paper presents the human and object keypoint-based extension module (HOKEM), an easy-to-use extension module that improves the accuracy of conventional detection models. The proposed object keypoint extraction method is simple yet accurately represents the shapes of various objects. Moreover, the proposed human-object adaptive GCN (HO-AGCN), which introduces adaptive graph optimization and an attention mechanism, accurately captures the spatial relationships between keypoints. Experiments on the V-COCO HOI dataset showed that HOKEM boosts the accuracy of an appearance-based model by a large margin.
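The abstract names two mechanisms inside HO-AGCN, adaptive graph optimization and an attention step over keypoints, without detailing either. The sketch below is a minimal PyTorch illustration of how such a layer is commonly built, not the paper's implementation: the class names (AdaptiveGraphConv, KeypointAttention), the learnable-residual adjacency, the node-softmax attention, and the toy 17+9 keypoint setup are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class AdaptiveGraphConv(nn.Module):
    """One graph-convolution layer over human/object keypoints.

    A fixed adjacency A encodes a predefined keypoint graph; a learnable
    residual B lets the topology adapt during training (a stand-in for
    the paper's adaptive graph optimization).
    """

    def __init__(self, in_ch, out_ch, num_nodes, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)               # (V, V), fixed prior
        self.B = nn.Parameter(torch.zeros(num_nodes, num_nodes))  # learned offset
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):                                  # x: (N, V, C_in)
        adj = self.A + self.B                              # adapted graph
        return torch.relu(adj @ self.proj(x))              # message passing -> (N, V, C_out)


class KeypointAttention(nn.Module):
    """Lightweight node attention: reweights keypoints by learned importance."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Linear(channels, 1)

    def forward(self, x):                                  # x: (N, V, C)
        w = torch.softmax(self.score(x), dim=1)            # (N, V, 1), over nodes
        return x * w


# Toy usage: 17 human keypoints plus 9 object keypoints with 2-D coordinates.
V = 17 + 9
A = torch.eye(V)                         # placeholder adjacency; a real module
layer = AdaptiveGraphConv(2, 64, V, A)   # would encode the skeleton/shape graph
attn = KeypointAttention(64)
out = attn(layer(torch.randn(8, V, 2)))  # -> (8, 26, 64)
```

Adding a learnable residual B on top of a fixed adjacency is the standard way adaptive GCNs realize graph optimization: the predefined human-object graph acts as a prior, while training remains free to discover extra connections between human and object keypoints.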
