mmWave Radar SB Recognition
- mmWave radar-based SB recognition uses high-frequency FMCW sensing to detect the subtle phase and Doppler changes produced by micro-movements.
- The approach integrates advanced signal processing with deep learning and sensor fusion to achieve high accuracy in detecting gestures, bruxism, speech vibrations, and blockages.
- Applications span healthcare, communications, and VR, with benchmarks indicating accuracies above 95% and reliable real-time performance.
Millimeter-wave (mmWave) radar-based SB (skeleton-based or scatterer/blockage, depending on context) recognition leverages high-frequency frequency-modulated continuous wave (FMCW) radar to sense and characterize micro-movements and dynamic interactions in various application domains. This approach exploits precise phase and Doppler extraction possible at 60–81 GHz, enabling robust recognition of fine-grained activities such as hand skeletal gestures, physiological micro-motions (e.g., bruxism), speech vibrations, or blockage events in communications. Recent advances incorporate deep learning, multitask inference, and radar-IMU sensor fusion for enhanced performance, privacy, and versatility across application verticals from healthcare to wireless networking and VR interfaces (Basak et al., 2024, Shen et al., 7 Dec 2025, Demirhan et al., 2021, Lv et al., 23 Jan 2025).
1. Physical and Mathematical Foundations of mmWave FMCW Radar for SB Recognition
SB recognition via mmWave radar exploits the physical principle that periodic or non-periodic micro-movements (e.g., jaw, hand, device vibrations, moving scatterers) induce minute phase and frequency modulations in the radar’s intermediate-frequency (IF) or beat signal. The FMCW signal model for transmitted chirps is:
$$s_{TX}(t) = A_T \cos\left(2\pi f_c t + \pi S t^2\right),$$
where $f_c$ is typically 60–81 GHz, $S$ is the chirp slope, and $A_T$ the amplitude. A reflected signal from a moving target at range $R(t)$ introduces a round-trip delay $\tau = 2R(t)/c$ and phase shift, which after mixing yields a beat signal with frequency $f_b = S\tau = 2SR/c$ and phase $\phi_b = 2\pi f_c \tau = 4\pi R/\lambda$. Small movements (on the order of μm–mm) modulate the IF phase, which is highly sensitive owing to the short wavelength ($\lambda \approx 3.7$–$5$ mm for $f_c = 60$–$81$ GHz) (Basak et al., 2024, Lv et al., 23 Jan 2025).
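The beat-frequency and phase relations above can be checked numerically. The sketch below simulates one IF chirp with hypothetical parameters (the 77 GHz carrier, 30 MHz/µs slope, sampling rate, and 0.5 m target range are illustrative assumptions, not values from the cited systems) and recovers $f_b$ from the range FFT:

```python
import numpy as np

# Hypothetical FMCW parameters (illustrative, not from a specific device).
c = 3e8                  # speed of light, m/s
f_c = 77e9               # carrier frequency, Hz
S = 30e12                # chirp slope, Hz/s (30 MHz/us)
fs = 10e6                # IF sampling rate, Hz
N = 1024                 # samples per chirp
R = 0.50                 # target range, m

tau = 2 * R / c                      # round-trip delay
f_b = S * tau                        # predicted beat frequency: 100 kHz here
phi_b = 2 * np.pi * f_c * tau        # predicted beat phase (mod 2*pi)

# Synthesize the IF (beat) signal and locate its spectral peak.
t = np.arange(N) / fs
if_signal = np.exp(1j * (2 * np.pi * f_b * t + phi_b))
spectrum = np.abs(np.fft.fft(if_signal))
peak_bin = int(np.argmax(spectrum[: N // 2]))
f_est = peak_bin * fs / N            # FFT bin index -> beat frequency
```

The estimate `f_est` lands within one FFT bin (fs/N ≈ 9.8 kHz) of the analytic $f_b = 2SR/c$, which is the usual range-FFT quantization limit.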
Phase unwrapping and difference operations suppress static clutter, and displacement or velocity can be precisely derived:
- Displacement: $\Delta d = \dfrac{\lambda}{4\pi}\,\Delta\phi$
- Instantaneous Doppler: $f_D = \dfrac{1}{2\pi}\dfrac{d\phi}{dt} = \dfrac{2v}{\lambda}$
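Both relations can be exercised on a synthetic phase track. The sketch below assumes a 60 GHz carrier ($\lambda = 5$ mm) and a 200 Hz phase sampling rate (both illustrative) and recovers a 20 μm, 8 Hz micro-vibration from its phase modulation:

```python
import numpy as np

lam = 3e8 / 60e9          # wavelength at 60 GHz: 5 mm
frame_rate = 200.0        # phase samples per second (assumed)

# Simulated micro-motion: 20 um sinusoidal vibration at 8 Hz.
t = np.arange(0, 1.0, 1.0 / frame_rate)
d_true = 20e-6 * np.sin(2 * np.pi * 8.0 * t)

phi = 4 * np.pi * d_true / lam        # phase induced by displacement
phi_unwrapped = np.unwrap(phi)        # no-op here (no 2*pi wraps), kept for generality

d_est = lam * phi_unwrapped / (4 * np.pi)                # Delta d = lambda * Delta phi / (4 pi)
f_doppler = np.gradient(phi_unwrapped, t) / (2 * np.pi)  # f_D = (1/2pi) dphi/dt
v_est = f_doppler * lam / 2                              # v = f_D * lambda / 2
```

The reconstructed displacement matches the 20 μm ground truth, and the peak velocity agrees with the analytic $2\pi \cdot 8 \cdot 20\,\mu\text{m/s}$ up to finite-difference error.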
This foundation generalizes across domains—fine earpiece vibrations for speech (Basak et al., 2024), jaw oscillations for bruxism (Shen et al., 7 Dec 2025), skeletal hand motion (Lv et al., 23 Jan 2025), and moving objects for blockage recognition (Demirhan et al., 2021).
2. Signal Processing and Feature Engineering Approaches
Signal acquisition begins with streaming IF I/Q samples from the radar array, followed by dimensionality reduction using multi-dimensional FFTs to extract range, Doppler, and angle features. Pre-processing then targets clutter removal, phase unwrapping, and frame selection for the target region (e.g., face, hand, phone, scatterer). Examples of pre-processing steps:
- Windowing (e.g., Hanning) to reduce range-FFT sidelobes (Shen et al., 7 Dec 2025)
- Range-FFT to locate reflection peaks (Basak et al., 2024, Shen et al., 7 Dec 2025)
- Phase extraction: $\phi[n] = \arctan\left(Q[n]/I[n]\right)$ at the selected range bin
- Phase differencing: $\Delta\phi[n] = \phi[n] - \phi[n-1]$ to suppress drift (Shen et al., 7 Dec 2025)
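A minimal numpy sketch of this pre-processing chain on synthetic I/Q chirps (array sizes, sampling rate, and the injected micro-motion are illustrative assumptions):

```python
import numpy as np

n_samples, n_chirps = 256, 64
fs = 5e6
target_bin = 40                       # range bin where the target reflects

rng = np.random.default_rng(0)
t = np.arange(n_samples) / fs
f_b = target_bin * fs / n_samples     # beat frequency aligned with target_bin
# Slow phase modulation across chirps models micro-motion; noise models clutter.
motion_phase = 0.3 * np.sin(2 * np.pi * np.arange(n_chirps) / 16)
iq = np.array([
    np.exp(1j * (2 * np.pi * f_b * t + m)) for m in motion_phase
]) + 0.05 * (rng.standard_normal((n_chirps, n_samples))
             + 1j * rng.standard_normal((n_chirps, n_samples)))

# 1) Hanning window to suppress range-FFT sidelobes, 2) range FFT per chirp.
win = np.hanning(n_samples)
range_fft = np.fft.fft(iq * win, axis=1)

# 3) Locate the reflection peak, 4) extract per-chirp phase at that bin.
peak = int(np.argmax(np.abs(range_fft).mean(axis=0)[: n_samples // 2]))
phase = np.unwrap(np.angle(range_fft[:, peak]))

# 5) Phase differencing suppresses static clutter and slow drift.
dphase = np.diff(phase)
```

The recovered per-chirp phase tracks the injected micro-motion to within the noise floor, and the peak search returns the true target bin.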
Feature extraction strategies depend on the modality:
- Statistical descriptors: mean absolute phase difference, variance, kurtosis, spectral entropy, and energy in target bands (e.g., 5–10 Hz for bruxism) (Shen et al., 7 Dec 2025)
- Count-based metrics: number of local extrema or threshold-crossing events in the phase-differenced signal (Shen et al., 7 Dec 2025)
- Heatmap formation: generating 2D range–Doppler or range–angle maps for spatial/temporal skeletal analysis (Lv et al., 23 Jan 2025)
- Sequential feature stacking: aggregation of multiple time-windowed radar maps or features for temporal context (Lv et al., 23 Jan 2025, Demirhan et al., 2021)
Error correction is frequently applied at the statistical or filtering level to suppress hardware artifacts and environmental interference (Basak et al., 2024, Shen et al., 7 Dec 2025).
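Several of the descriptors above can be sketched directly. The example below computes statistical, spectral, and count-based features from a synthetic phase-differenced signal; the 100 Hz sampling rate, 7 Hz "grinding" tone, and crossing threshold are illustrative assumptions:

```python
import numpy as np

fs = 100.0
rng = np.random.default_rng(1)
t = np.arange(0, 10, 1 / fs)
# Synthetic "grinding" signal: a 7 Hz oscillation (inside the 5-10 Hz band) plus noise.
dphase = 0.2 * np.sin(2 * np.pi * 7.0 * t) + 0.02 * rng.standard_normal(t.size)

def spectral_entropy(x: np.ndarray) -> float:
    """Shannon entropy of the normalized power spectrum."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    p = psd / psd.sum()
    return float(-(p * np.log2(p + 1e-12)).sum())

def band_energy(x: np.ndarray, fs: float, lo: float, hi: float) -> float:
    """Fraction of spectral energy inside [lo, hi] Hz (e.g., 5-10 Hz for bruxism)."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return float(psd[mask].sum() / psd.sum())

features = {
    "mean_abs": float(np.mean(np.abs(dphase))),
    "variance": float(np.var(dphase)),
    "kurtosis": float(np.mean((dphase - dphase.mean()) ** 4) / np.var(dphase) ** 2),
    "spec_entropy": spectral_entropy(dphase),
    "band_5_10": band_energy(dphase, fs, 5.0, 10.0),
    # Count-based metric: threshold crossings of the phase-differenced signal.
    "crossings": int(np.sum(np.abs(np.diff(np.sign(dphase - 0.1))) > 0)),
}
```

For this synthetic input, nearly all spectral energy falls in the 5–10 Hz target band, illustrating why such band-energy features separate grinding from idle frames.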
3. Machine Learning and SB Recognition Architectures
Recognition frameworks differ by task, ranging from classical machine learning to large-scale neural networks:
- Random Forest classifier (bruxism): operates on an 11-dimensional, per-session feature vector; achieves 96.1% test accuracy, with precision, recall, and F1 reported at comparably high levels (Shen et al., 7 Dec 2025).
- Two-stage skeleton-based deep pipeline (gesture): Stage I uses a Transformer for 3D hand-joint regression from stacked radar-IMU features; Stage II uses a ResNet50 to classify rendered “skeleton images” for gesture types, reaching an in-domain gesture accuracy of 90.8% (Lv et al., 23 Jan 2025).
- CNN-LSTM sequence model (blockage): spatial feature extraction on each frame via a multi-layer CNN, temporal modeling with an LSTM, and binary classification with a dense/sigmoid head; overall test accuracy 95–97%, F1 90–93% for 1-s-ahead blockage prediction (Demirhan et al., 2021).
- LoRA-adapted LLM (speech): Low-Rank Adaptation (LoRA) fine-tuning of OpenAI Whisper-large-v2 (1.5B params) on upsampled, denoised radar-derived audio, following synthetic and real radar speech domain adaptation (Basak et al., 2024).
Model training and validation are performed through protocols such as cross-validation (Shen et al., 7 Dec 2025), ablation across sensor modalities (Lv et al., 23 Jan 2025), and staged fine-tuning (Basak et al., 2024) to address both class balance and domain gap.
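The LoRA mechanism used for the speech model can be illustrated in isolation: the frozen weight $W$ is augmented by a trainable low-rank product scaled by $\alpha/r$, so that at initialization (with $B = 0$) the adapted layer reproduces the pretrained one. A minimal numpy sketch with illustrative shapes (the real adapters sit inside Whisper's attention projections):

```python
import numpy as np

d_out, d_in, r, alpha = 64, 64, 4, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus scaled low-rank correction; only A and B would be trained.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
y0 = lora_forward(x)
# With B zero-initialized, the adapted layer matches the frozen layer exactly,
# so fine-tuning starts from the pretrained model's behavior.
```

Only $A$ and $B$ ($2rd$ parameters per layer instead of $d^2$) receive gradients, which is what makes adapting a 1.5B-parameter model to scarce radar-derived audio tractable.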
4. Application Domains and Performance Benchmarks
mmWave radar-based SB recognition spans a spectrum of real-world tasks:
| Application | Task Type | Primary Features/Approach | Accuracy / F1 | Key Reference |
|---|---|---|---|---|
| Bruxism monitoring | Binary (grind/no-grind) | 11 statistical and spectral features, Random Forest | 96.1% (Acc) | (Shen et al., 7 Dec 2025) |
| Hand gesture | Multi-class (8 classes) | Transformer-based pose, skeleton ResNet | 90.8% (Acc), >93% (F1) | (Lv et al., 23 Jan 2025) |
| Device blockage | Binary (blockage) | CNN+LSTM on radar maps | 95–97% (Acc), 90–93% (F1) | (Demirhan et al., 2021) |
| Speech recognition | Sentence ASR | LoRA-adapted Whisper on denoised radar “audio” | 44.74% (Wacc), 62.52% (Cacc) | (Basak et al., 2024) |
In gesture recognition, multifactor evaluations (cross-person, cross-scene, and cross-hand transfer) indicate performance decay without few-shot calibration: zero-shot cross-person accuracy falls well below the 90.8% in-domain figure but recovers substantially with one-shot fine-tuning (Lv et al., 23 Jan 2025). For bruxism, the confusion matrix confirms low false positive/negative rates, and for speech ASR, accuracy is bandwidth- and SNR-limited but remains above random and lipreading baselines within a 1.25 m operating range (Basak et al., 2024, Shen et al., 7 Dec 2025).
5. Challenges, Limitations, and Mitigation Strategies
- Spatial resolution and multipath: mmWave radar’s spatial granularity restricts small feature (finger, fine vibration) recovery; multipath from clutter increases signal ambiguity (Lv et al., 23 Jan 2025, Shen et al., 7 Dec 2025).
- Noise: Low SNR (2–5 dB for speech eavesdropping) and bandwidth constraints (<1.5 kHz audio recovery) degrade performance (Basak et al., 2024).
- Population/diversity generalization: Small subject cohorts risk anatomical overfitting; zero-shot transfer is suboptimal without adaptation (Lv et al., 23 Jan 2025, Shen et al., 7 Dec 2025).
- Environmental interference: Competing facial movements, nearby objects, and dynamic occluders introduce spurious modulations (Shen et al., 7 Dec 2025).
- Data scarcity: No large public mmWave speech, bruxism, or gesture datasets; addressed via synthetic data generation (filtered audio plus noise) (Basak et al., 2024).
- Model and system burden: Large models (Whisper, ResNet50) have inference cost; multi-antenna arrays and sensor fusion raise hardware complexity (Basak et al., 2024, Lv et al., 23 Jan 2025).
Countermeasures and improvements include beamforming, sensor fusion (e.g., radar-IMU for compensating head motion), context prompting/LLM priming, advanced denoising and super-resolution mechanisms, and on-device closed-loop feedback for privacy and robustness (Basak et al., 2024, Shen et al., 7 Dec 2025, Lv et al., 23 Jan 2025).
6. Deployment and Future Directions
Deployment considerations for mmWave radar-based SB recognition center on non-invasiveness, privacy, and real-time operation. Systems can be wall- or ceiling-mounted (bruxism), head-mounted (gestures), or physically integrated near devices (speech, blockage) (Basak et al., 2024, Shen et al., 7 Dec 2025, Lv et al., 23 Jan 2025, Demirhan et al., 2021). Privacy is intrinsic (RF-only, no images); further measures (beam shaping, raw I/Q suppression, event-only sharing) address user concerns (Shen et al., 7 Dec 2025).
Research aims include:
- Multi-antenna spatial selectivity to target or exclude anatomical regions (Shen et al., 7 Dec 2025)
- Supervised and self-supervised pre-training for generalization across subjects and contexts (Lv et al., 23 Jan 2025)
- Hybrid sensing (radar plus LiDAR/vision) for multimodal SB recognition (Demirhan et al., 2021)
- On-device low-latency inferencing and quantization for real-time feedback (Lv et al., 23 Jan 2025)
- Adversarial/defensive measures for privacy (vibration dampers, jamming, coatings) in security-sensitive scenarios (e.g., speech eavesdropping) (Basak et al., 2024)
A plausible implication is that as radar bandwidth, multi-antenna configurations, and algorithmic paradigms advance, millimeter-wave radar-based SB recognition will gain accuracy and robustness across healthcare, communications, HCI, and security contexts, while demanding ongoing vigilance for privacy risks and unintended side-channel leakage.