Skeleton-Based Human Action Recognition with Noisy Labels
Abstract: Understanding human actions from body poses is critical for assistive robots that share space with humans, since it lets them make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences are time-consuming, and the resulting labels are often noisy. If not effectively addressed, label noise degrades model training and lowers recognition quality. Despite its importance, label noise in skeleton-based action recognition has been overlooked so far. In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas, to serve as an initial benchmark. Our observations reveal that these baselines yield only marginal performance when dealing with sparse skeleton data. Consequently, we introduce a novel methodology, NoiseEraSAR, which integrates global sample selection, co-teaching, and Cross-Modal Mixture-of-Experts (CM-MOE) strategies to mitigate the adverse impact of label noise. Our approach outperforms these baselines on the established benchmark, setting a new state of the art. The source code for this study is accessible at https://github.com/xuyizdby/NoiseEraSAR.
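To make the co-teaching component named in the abstract more concrete, the following is a minimal sketch of the general small-loss exchange idea behind co-teaching: two networks each pick the samples they find easiest (lowest loss) and hand that selection to their peer for the next update. It is not the NoiseEraSAR implementation. The function names, the `noise_rate_estimate` parameter, and the toy loss values are all assumptions made for illustration.

```python
from typing import List, Tuple


def small_loss_indices(losses: List[float], keep_ratio: float) -> List[int]:
    """Return the indices of the lowest-loss samples, in ascending index order."""
    # Keep at least one sample so the peer always has something to train on.
    n_keep = max(1, int(keep_ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def co_teaching_exchange(
    losses_a: List[float],
    losses_b: List[float],
    noise_rate_estimate: float,  # hypothetical: assumed fraction of noisy labels
) -> Tuple[List[int], List[int]]:
    """Return (indices network A trains on, indices network B trains on)."""
    keep_ratio = 1.0 - noise_rate_estimate
    picked_by_a = small_loss_indices(losses_a, keep_ratio)
    picked_by_b = small_loss_indices(losses_b, keep_ratio)
    # Cross-update: each network learns from its peer's selection,
    # so the two networks do not simply reinforce their own mistakes.
    return picked_by_b, picked_by_a


# Toy per-sample losses for one mini-batch of four samples (made-up values).
losses_a = [0.1, 2.3, 0.4, 1.9]
losses_b = [0.2, 0.3, 2.5, 2.0]
train_a_on, train_b_on = co_teaching_exchange(losses_a, losses_b, 0.5)
print(train_a_on, train_b_on)
```

In a real training loop the per-sample losses would come from two separately initialized models evaluated on the same mini-batch, and the keep ratio is commonly scheduled over epochs rather than fixed.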