Boosting Gesture Recognition with an Automatic Gesture Annotation Framework
Abstract: Training a real-time gesture recognition model relies heavily on annotated data, but manual annotation is costly and demands substantial human effort. To address this challenge, we propose a framework that automatically annotates gesture classes and identifies their temporal ranges. The framework consists of two key components: (1) a novel annotation model that leverages the Connectionist Temporal Classification (CTC) loss, and (2) a semi-supervised learning pipeline that lets the model improve by training on its own predictions, known as pseudo labels. These high-quality pseudo labels can also be used to improve the accuracy of other downstream gesture recognition models. To evaluate the framework, we conducted experiments on two publicly available gesture datasets. Our ablation study shows that the annotation model design surpasses the baseline in both gesture classification accuracy (3-4% improvement) and localization accuracy (71-75% improvement). We also show that the pseudo-labeled dataset produced by the proposed framework boosts the accuracy of a pre-trained downstream gesture recognition model by 11-18%. We believe this annotation framework has considerable potential to improve the training of downstream gesture recognition models using unlabeled datasets.
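To make the idea of annotating gesture classes together with their temporal ranges concrete, the following is a minimal sketch, not the annotation model described in the abstract. It assumes a model that already outputs per-frame class probabilities with index 0 reserved for the CTC blank, merges consecutive frames that share the same non-blank argmax into one segment, and keeps only segments whose mean confidence passes a threshold so they could serve as pseudo labels. The function name `greedy_ctc_segments`, the `BLANK` constant, the `min_conf` threshold, and the toy probabilities are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

BLANK = 0  # assumed index of the CTC blank class (an assumption, not from the paper)

Segment = Tuple[int, int, int, float]  # (label, start_frame, end_frame_inclusive, mean_conf)


def greedy_ctc_segments(frame_probs: List[List[float]],
                        min_conf: float = 0.5) -> List[Segment]:
    """Merge runs of identical non-blank argmax frames into labeled segments."""
    segments: List[Segment] = []
    current_label = BLANK
    start = 0
    confs: List[float] = []
    for t, probs in enumerate(frame_probs):
        # Greedy decoding: take the most probable class at each frame.
        label = max(range(len(probs)), key=probs.__getitem__)
        if label != current_label:
            # A run just ended; record it unless it was a blank run.
            if current_label != BLANK:
                segments.append((current_label, start, t - 1, sum(confs) / len(confs)))
            current_label, start, confs = label, t, []
        confs.append(probs[label])
    # Flush a run that extends to the final frame.
    if current_label != BLANK:
        segments.append((current_label, start, len(frame_probs) - 1,
                         sum(confs) / len(confs)))
    # Keep only confident segments as candidate pseudo labels.
    return [s for s in segments if s[3] >= min_conf]


# Toy input: 6 frames, 3 classes (blank, gesture 1, gesture 2).
frames = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.80, 0.10, 0.10],
    [0.30, 0.30, 0.40],
    [0.90, 0.05, 0.05],
]
pseudo_labels = greedy_ctc_segments(frames, min_conf=0.5)
for label, start, end, conf in pseudo_labels:
    print(f"gesture {label}: frames {start}-{end} (mean confidence {conf:.2f})")
```

In this toy input, gesture 1 spans frames 1 to 2 and is kept, while the single-frame gesture 2 prediction falls below the threshold and is discarded. Networks trained with CTC often emit narrow spikes rather than full-length activations, so run merging of this kind is only illustrative of how class and range can be read off frame-level outputs; it says nothing about how the proposed model achieves its localization accuracy.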