
Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion (1901.04622v1)

Published 15 Jan 2019 in cs.CV

Abstract: Gesture recognition is a hot topic in computer vision and pattern recognition, which plays a vitally important role in natural human-computer interface. Although great progress has been made recently, fast and robust hand gesture recognition remains an open problem, since the existing methods have not well balanced the performance and the efficiency simultaneously. To bridge it, this work combines image entropy and density clustering to exploit the key frames from hand gesture video for further feature extraction, which can improve the efficiency of recognition. Moreover, a feature fusion strategy is also proposed to further improve feature representation, which elevates the performance of recognition. To validate our approach in a "wild" environment, we also introduce two new datasets called HandGesture and Action3D datasets. Experiments consistently demonstrate that our strategy achieves competitive results on Northwestern University, Cambridge, HandGesture and Action3D hand gesture datasets. Our code and datasets will release at https://github.com/Ha0Tang/HandGestureRecognition.

Authors (4)
  1. Hao Tang (379 papers)
  2. Hong Liu (395 papers)
  3. Wei Xiao (100 papers)
  4. Nicu Sebe (270 papers)
Citations (101)

Summary

Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion

The paper "Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion" addresses the challenge of achieving both speed and accuracy in hand gesture recognition, a crucial component in human-computer interaction interfaces. The authors propose a novel method that balances computational efficiency and robust performance by employing a two-pronged approach: extracting key frames and applying a feature fusion strategy.

Key Frames Extraction

The extraction of key frames is pivotal to the proposed method. The authors introduce a technique that leverages image entropy and density clustering to identify representative frames from a hand gesture video. The process involves calculating the local entropy for each frame to determine its information content and subsequently mapping these values into a two-dimensional space. Local extreme points (maxima and minima) in this space are identified, reflecting frames with significant discriminative power. These points are then clustered using a density clustering method, which is favored over traditional clustering techniques like K-means due to its ability to detect local feature clusters. This not only streamlines the dataset by removing redundant information but also reduces computational overhead significantly.
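The pipeline above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it uses a global histogram entropy per frame (the paper computes local entropy), takes local extrema of the entropy curve as candidates, and stands in for the paper's density clustering with a small density-peaks-style selection over normalized (time, entropy) points. All function names and the `dc` cutoff parameter are assumptions for illustration.

```python
import numpy as np

def frame_entropy(frame, bins=32):
    """Shannon entropy (bits) of a grayscale frame's intensity histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def local_extrema(values):
    """Indices where the entropy curve has a strict local max or min."""
    idx = []
    for i in range(1, len(values) - 1):
        if (values[i] > values[i - 1] and values[i] > values[i + 1]) or \
           (values[i] < values[i - 1] and values[i] < values[i + 1]):
            idx.append(i)
    return idx

def density_peak_centers(pts, n_centers, dc):
    """Density-peaks-style center pick: high local density (rho) combined
    with large distance (delta) to any denser point marks a cluster center."""
    d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
    rho = (d < dc).sum(axis=1) - 1          # neighbors within cutoff dc
    delta = np.empty(len(pts))
    for i in range(len(pts)):
        higher = rho > rho[i]
        delta[i] = d[i][higher].min() if higher.any() else d[i].max()
    return np.argsort(rho * delta)[-n_centers:]

def select_key_frames(video, n_key=4, dc=0.2):
    """Key frames from a (T, H, W) uint8 clip: entropy -> extrema -> clustering."""
    ent = np.array([frame_entropy(f) for f in video])
    cand = local_extrema(ent) or list(range(len(video)))
    if len(cand) <= n_key:
        return sorted(cand)
    # Normalize (time, entropy) so both axes contribute comparably.
    pts = np.stack([np.array(cand) / len(video),
                    ent[cand] / (ent.max() + 1e-9)], axis=1)
    centers = density_peak_centers(pts, n_key, dc)
    return sorted(int(cand[c]) for c in centers)
```

Because the candidate set is restricted to entropy extrema before clustering, only a handful of frames ever reach the feature-extraction stage, which is where the efficiency gain comes from.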

Feature Fusion Strategy

The second component of the method is a feature fusion strategy that enhances recognition accuracy. This technique combines appearance features, derived from individual key frames, with motion features, captured from their temporal sequence. The authors employ a variety of established descriptors such as SURF for appearance and LBP-TOP for motion, allowing for a comprehensive representation of the gesture dynamics. The fusion process involves weighting these features according to their respective classification rates, ensuring an optimal balance that enhances the robustness of the gesture recognition system.
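A weighted late-fusion step of this kind can be sketched as follows. This is a hedged illustration, not the paper's exact formulation: the appearance and motion streams (e.g., SURF- and LBP-TOP-based classifiers) each produce a class-score vector, and each vector is weighted in proportion to that stream's standalone classification rate before summing. The function names and the normalization scheme are assumptions.

```python
import numpy as np

def fuse_scores(app_scores, mot_scores, app_acc, mot_acc):
    """Weighted late fusion of appearance and motion class scores.

    app_scores, mot_scores: per-class score vectors from the two streams.
    app_acc, mot_acc: each stream's standalone classification rate,
    used to derive fusion weights that sum to 1.
    """
    w_app = app_acc / (app_acc + mot_acc)
    w_mot = mot_acc / (app_acc + mot_acc)
    return w_app * np.asarray(app_scores, dtype=float) + \
           w_mot * np.asarray(mot_scores, dtype=float)

def predict(app_scores, mot_scores, app_acc, mot_acc):
    """Fused prediction: index of the highest combined class score."""
    return int(np.argmax(fuse_scores(app_scores, mot_scores, app_acc, mot_acc)))
```

Deriving the weights from held-out classification rates means a stronger stream dominates the decision without the weaker stream being discarded, which is what gives the fused representation its robustness.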

Experimental Validation

The proposed approach is validated on multiple datasets, including the widely used Cambridge and Northwestern University datasets, as well as two newly introduced datasets named HandGesture and Action3D. On these datasets, the method demonstrates substantial improvements in recognition accuracy over existing approaches, achieving rates as high as 98.23% on the Cambridge dataset and 96.89% on the Northwestern dataset. Importantly, these high accuracies come with improved computational efficiency, as evidenced by shorter processing times than contemporary methods.

Implications and Future Directions

The dual focus on speed and accuracy suggests significant practical implications for real-time systems, such as interactive gaming, virtual reality interfaces, and assistive technologies for individuals with disabilities. The integration of key frame extraction with adaptive feature weighting offers a flexible framework that could be extended to other domains of pattern recognition beyond hand gestures.

For future work, further improvements could involve adaptive methods that dynamically adjust parameters such as the number of key frames or the feature weights based on environmental context or user feedback. Moreover, integrating depth information or leveraging recent advances in deep learning architectures could further refine the recognition system.

In conclusion, this paper presents a well-founded approach that contributes meaningfully to the field of gesture recognition by addressing key challenges in computational efficiency and recognition robustness through innovative techniques in data reduction and feature representation.