Resource-Efficient Gesture Recognition using Low-Resolution Thermal Camera via Spiking Neural Networks and Sparse Segmentation (2401.06563v1)
Abstract: This work proposes a novel approach to hand gesture recognition using an inexpensive, low-resolution (24 x 32) thermal sensor processed by a Spiking Neural Network (SNN), followed by Sparse Segmentation and feature-based gesture classification via Robust Principal Component Analysis (R-PCA). Unlike standard RGB cameras, the proposed system is insensitive to lighting variations, and it is significantly less expensive than the high-frequency radars, time-of-flight cameras and high-resolution thermal sensors previously used in the literature. Crucially, this paper shows that using the recently proposed Monostable Multivibrator (MMV) neural networks as a new class of SNN reduces memory and compute complexity by more than one order of magnitude compared to deep learning approaches, while reaching a top gesture recognition accuracy of 93.9% on a 5-class thermal camera dataset acquired in a car cabin. Our dataset is released to support future research.
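To make the sparse segmentation idea concrete, the following is a minimal sketch of Robust PCA by principal component pursuit, applied to synthetic 24 x 32 frames. It is not the authors' pipeline: the function names, the default parameters, the synthetic "warm patch" data, and the assumption that frames are stacked as columns of one matrix (static background as the low-rank part, the hand as the sparse part) are all illustrative choices.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise shrinkage: the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(x, tau):
    """Shrink singular values: the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def robust_pca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split m into low_rank + sparse by alternating directions.

    The defaults for lam and mu follow common choices from the R-PCA
    literature; they are assumptions, not values taken from the paper.
    """
    rows, cols = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(rows, cols))
    if mu is None:
        mu = rows * cols / (4.0 * np.abs(m).sum())
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(max_iter):
        low_rank = singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low_rank, sparse

# Synthetic stand-in for a thermal clip: a static background near 25 degrees
# plus a 3 x 3 patch that is 5 degrees warmer and moves from frame to frame.
H, W, T = 24, 32, 40
rng = np.random.default_rng(0)
background = 25.0 + rng.normal(0.0, 0.5, size=(H, W))
frames = np.repeat(background[None], T, axis=0)           # shape (T, H, W)
corners = [((3 * t) % (H - 3), (5 * t) % (W - 3)) for t in range(T)]
for t, (r, c) in enumerate(corners):
    frames[t, r:r + 3, c:c + 3] += 5.0

M = frames.reshape(T, H * W).T                            # one frame per column
L, S = robust_pca(M)

# The strongest sparse response in each frame marks the warm patch.
hottest = [np.unravel_index(np.argmax(np.abs(S[:, t])), (H, W)) for t in range(T)]
```

In this toy setting the low-rank term absorbs the static background and the sparse term isolates the moving patch, so thresholding `S` would give a per-frame segmentation mask. Real sensor data adds noise and slow temperature drift, which this sketch does not model.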