
GesFi: WiFi Gesture Recognition Framework

Updated 15 January 2026
  • GesFi is a WiFi-based gesture recognition framework employing latent domain mining to autonomously discover intrinsic domain factors from CSI data without relying on physical labels.
  • It integrates a denoising and visualization pipeline with iterative clustering and adversarial alignment to significantly improve cross-domain gesture recognition accuracy.
  • Deployed on commodity WiFi hardware and evaluated on multiple datasets, it reports accuracy gains of up to 78 percentage points over baseline domain-adaptation methods.

GesFi is a WiFi-based gesture recognition framework that approaches domain generalization through WiFi latent domain mining. Unlike preceding approaches that rely on physical labels such as user location or device orientation, GesFi autonomously discovers the intrinsic domain factors responsible for distributional shifts in Channel State Information (CSI) data, which supports generalization to unseen target environments. The system integrates a denoising and visualization pipeline with iterative clustering and adversarial alignment to learn invariant features, and it reports higher cross-domain recognition accuracy than prior methods (Zhang et al., 7 Jan 2026).

1. Principles of WiFi Latent Domain Mining

GesFi fundamentally departs from conventional domain adaptation by eschewing explicit physical-domain labels in favor of latent domains extracted from data statistics. The key premise is that physical labels (e.g., locations, orientations) poorly capture the nuanced factors that induce distribution changes in CSI during wireless gesture sensing. Instead, GesFi employs unsupervised clustering of learned feature representations, augmented with class-wise adversarial learning, to find latent domain factors more tightly coupled to real-world CSI variations. This mitigates two principal pitfalls: classification conflict (merging gesture classes with overlapping distributions) and manifold distortion (misalignment of distant domains distorting true CSI geometry).

2. Data Processing and Standardization Pipeline

The acquisition and preprocessing pipeline in GesFi ensures that raw WiFi CSI is transformed into high-fidelity input images suitable for deep learning:

  • CSI Ratio Denoising: Raw CSI at subcarrier $f$ and time $t$ is modeled as $R = HS + \mathcal{N}$, with $H = H_s + H_d$ separating static and dynamic (gesture-induced) contributions. To suppress oscillator phase noise $\theta_n$, the CSI from adjacent antennas, $H_1(f,t)$ and $H_2(f,t)$, is divided: $H_q(f,t) = H_1(f,t)/H_2(f,t)$. For small $\Delta d = d_2 - d_1$, $H_q$ approximates a Möbius transform of the true gesture phase, robustly filtering hardware noise.
  • Short-Time Fourier Transform (STFT) for Doppler Analysis: The instantaneous phase $\mathcal{P}(f,t) = \angle H_q(f, t)$ is high-pass filtered to remove static multipath. STFT is then computed to extract Doppler Frequency Shifts (DFS): $DFS(f, \omega) = STFT\{H_q(f, \cdot)\}(t, \omega)$.
  • Visualization and Fusion: Heatmaps of $\mathcal{P}(f,t)$ and $DFS(f,\omega)$ for each antenna pair are concatenated into multi-channel images (resolution typically $224 \times 224$), providing a standardized input to a ResNet-18 backbone.
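The ratio and STFT steps above can be illustrated on synthetic data. The following is a minimal NumPy sketch for a single subcarrier, not the authors' implementation: the sampling rate, Doppler frequency, path amplitudes, Hann window, and the helper names `csi_ratio` and `stft_magnitude` are all assumptions chosen for the demonstration, and subtracting the mean stands in for the high-pass filter that removes static multipath.

```python
import numpy as np

def csi_ratio(h1, h2, eps=1e-12):
    """Divide CSI of one antenna by that of an adjacent antenna (hypothetical helper)."""
    return h1 / (h2 + eps)

def stft_magnitude(x, win_len=128, hop=32):
    """Magnitude STFT with a Hann window; returns (win_len, n_frames), zero frequency centred."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.fftshift(np.fft.fft(frames, axis=0), axes=0))

# Synthetic single-subcarrier CSI (all values below are illustrative assumptions).
fs, n_samples, win_len = 1000.0, 1024, 128
f_doppler = 62.5                                   # gesture-induced Doppler shift in Hz
t = np.arange(n_samples) / fs
rng = np.random.default_rng(0)
theta_n = rng.uniform(-np.pi, np.pi, n_samples)    # phase noise shared by both antennas
z = np.exp(2j * np.pi * f_doppler * t)             # dynamic path component
h1 = (1.0 + 0.30 * z) * np.exp(1j * theta_n)       # static + dynamic, antenna 1
h2 = (0.8 + 0.05 * z) * np.exp(1j * theta_n)       # static + dynamic, antenna 2

h_q = csi_ratio(h1, h2)                            # shared phase noise cancels here
dynamic = h_q - h_q.mean()                         # crude static removal (assumption)
spec = stft_magnitude(dynamic, win_len=win_len)
freqs = np.fft.fftshift(np.fft.fftfreq(win_len, d=1.0 / fs))
peak_hz = freqs[spec.mean(axis=1).argmax()]        # dominant Doppler frequency
print(f"dominant Doppler bin: {peak_hz:.2f} Hz")
```

Because both antennas share the same oscillator, the random phase term appears in numerator and denominator and drops out of the ratio, which is why the Doppler component remains visible in the spectrogram even though the raw phase of either antenna is unusable.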

3. Latent Domain Discovery and Gesture Semantic Suppression

GesFi discovers $K$ latent domains via an iterative two-step scheme:

  • Pre-Learning Gesture Discrimination: A feature extractor $h_f$ and bottleneck $h_b^p$ are trained with a cross-entropy gesture classifier $h_c^p$, minimizing $\mathcal{L}_{super} = \mathbb{E}_{(x,y_g)}[-y_g \cdot \log h_c^p(h_b^p(h_f(x)))]$.
  • Pseudo-Labeling and Clustering: Domain centroids $\tilde{\mu}_k$ are initialized from softmax logits of a domain-classifier head $h_c^l$. Samples are assigned domain labels $y_d$ by proximity in bottleneck space (Euclidean distance), and centroids are iteratively updated: $y_d = \arg \min_k D(h_b^l(h_f(x)), \mu_k)$.
  • Class-wise Adversarial Learning: During clustering, a gradient-reversal layer $\mathcal{R}_{\lambda_1}$ and adversarial gesture classifier $h_{adv}^l$ are used. Minimizing $\mathcal{L}_{lad}$ while maximizing the reversed-gradient $\mathcal{L}_{adv}$ decouples gesture semantics from domain discrimination, preventing semantic conflict.
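The pseudo-labeling step, nearest-centroid assignment followed by a centroid update, can be sketched as below. This is a simplified illustration under stated assumptions: the bottleneck features are held fixed (in GesFi they change as the network trains), the initial centroids are passed in directly rather than derived from a domain-classifier head, and the function names are hypothetical.

```python
import numpy as np

def assign_domains(features, centroids):
    """Label each sample with the index of its nearest centroid (Euclidean distance)."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def update_centroids(features, labels, centroids):
    """Move each centroid to the mean of its members; empty clusters keep their centroid."""
    updated = centroids.copy()
    for k in range(len(centroids)):
        members = features[labels == k]
        if len(members):
            updated[k] = members.mean(axis=0)
    return updated

def mine_latent_domains(features, init_centroids, n_iter=20):
    """Alternate assignment and update until the centroids stop moving."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign_domains(features, centroids)
        updated = update_centroids(features, labels, centroids)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Toy bottleneck features: three well-separated blobs standing in for K = 3 latent domains.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.concatenate([c + 0.5 * rng.standard_normal((50, 2)) for c in blob_centers])
init = features[[0, 50, 100]]          # one seed sample per blob (assumption for the demo)
labels, centroids = mine_latent_domains(features, init)
print(np.bincount(labels))
```

The resulting `labels` play the role of the pseudo domain labels $y_d$; no location or orientation annotation is involved at any point.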

4. Adversarial Alignment for Robust Generalization

After clustering, GesFi aligns features across latent domains using adversarial domain discrimination:

  • Domain-Adversarial Loss: The feature extractor $h_f$, bottleneck $h_b^d$, and gesture classifier $h_c^d$ are trained to minimize

$$\mathcal{L}_{ges} = \mathbb{E}_{(x,y_g)}[-y_g \cdot \log h_c^d(h_b^d(h_f(x)))]$$

while simultaneously confusing the latent domain discriminator $h_{dadv}$ by maximizing its prediction loss with rebalanced class weights $w_d$. Gradient reversal ($\mathcal{R}_{\lambda_2}$) is used to force feature invariance to domain factors.

  • Training Strategy: Training alternates between updating the feature extractor/gesture classifier and maximizing domain discrimination, iterating latent mining and alignment for 10–20 epochs until convergence.
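The min-max training described in this section matches the standard domain-adversarial formulation, which can be written out as follows. This is a sketch under assumptions rather than the paper's exact objective: the loss name $\mathcal{L}_{dadv}$, the parameter symbols $\theta_f$ (feature extractor and bottleneck) and $\theta_d$ (discriminator), the learning rate $\eta$, and the placement of the weight $w_d$ inside the expectation are introduced here for illustration.

```latex
% Gradient reversal: identity in the forward pass, sign-flipped and scaled gradient in the backward pass
\mathcal{R}_{\lambda_2}(z) = z, \qquad
\frac{\partial \mathcal{R}_{\lambda_2}}{\partial z} = -\lambda_2 I

% Weighted latent-domain discrimination loss (assumed form)
\mathcal{L}_{dadv} = \mathbb{E}_{(x, y_d)}\left[ -w_d \, y_d \cdot \log h_{dadv}\big(h_b^d(h_f(x))\big) \right]

% Resulting updates: the discriminator descends on L_dadv, the features ascend on it
\theta_d \leftarrow \theta_d - \eta \, \frac{\partial \mathcal{L}_{dadv}}{\partial \theta_d}, \qquad
\theta_f \leftarrow \theta_f - \eta \left( \frac{\partial \mathcal{L}_{ges}}{\partial \theta_f}
  - \lambda_2 \, \frac{\partial \mathcal{L}_{dadv}}{\partial \theta_f} \right)
```

Inserting $\mathcal{R}_{\lambda_2}$ between the bottleneck and $h_{dadv}$ produces the minus sign in the second update automatically, so a single backward pass trains the discriminator to separate latent domains while pushing the features toward domain invariance.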

5. System Deployment and Evaluation

GesFi was implemented using commodity WiFi transceivers in multiple configurations:

  • Single-Pair Mode: 1 transmitter, 1 receiver (3 antennas) yield one antenna-pair CSI ratio.
  • Multi-Pair Mode: 1 transmitter, 3 receivers (each 3 antennas), producing six antenna pairs.
  • Datasets: Widar3.0, ARIL, XRF55, and real-world data (details in following table).
| Dataset    | Subjects / Environments       | Gestures | Total Samples | Hardware Config |
| Widar3.0   | 9 users / 3 environments      | 6        | 12,750        | Single / Multi  |
| ARIL       | 1 user / 16 positions         | 6        | 1,392         | Single          |
| XRF55      | 39 subjects / 4 scenes        | 8        | 6,240         | Multi           |
| Real-World | 2 users, uncontrolled traffic | 6        | 450           | Multi           |
  • Training Protocol: ResNet-18 backbone, $K=3$ latent domains, Adam optimizer (lr $2\times 10^{-3}$, batch 32, 50 epochs), pre-learning for 2 epochs, followed by alternating latent mining/adversarial alignment.
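For reference, the reported hyperparameters can be collected in one configuration object. The sketch below is only a convenient way to hold the values listed above; the class name, the field names, and the input-size field (taken from the 224×224 image resolution mentioned earlier) are hypothetical and do not come from a published GesFi codebase.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class GesFiTrainConfig:
    """Hyperparameters as reported in the text; all names here are illustrative."""
    backbone: str = "resnet18"
    input_size: int = 224            # side length of the multi-channel input images
    num_latent_domains: int = 3      # K
    optimizer: str = "adam"
    learning_rate: float = 2e-3
    batch_size: int = 32
    epochs: int = 50
    prelearn_epochs: int = 2         # gesture pre-learning before latent mining starts

cfg = GesFiTrainConfig()
print(asdict(cfg))
```

Freezing the dataclass prevents accidental changes to the settings between the pre-learning and alternating phases of a run.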

6. Benchmarking and Quantitative Performance

GesFi demonstrated substantial advances in cross-domain gesture recognition accuracy:

  • Widar3.0 (Multi-Pair): Improved cross-location/environment accuracy by up to 78 percentage points (98.82% vs. 20.73% for the baseline).
  • Widar3.0 (Single-Pair)/ARIL: Outperformed MetaFormer/one-shot and Wi-Learner by up to 18% and 26%, respectively.
  • XRF55: Cross-env accuracy 62.15% (vs. WiGRUNT 55.92%), cross-user 67.18% (vs. 63.47%).
  • Real-World Generalization: In-domain 62.89% (baseline 54.44%), cross-location 46.00% (baseline 38.89%), cross-orientation 42.89% (baseline 38.00%).
  • Training Data Sensitivity: Maintained >90% cross-location accuracy with only 20% of the Widar3.0 data. Increasing source-domain diversity (from 1 to 3 domains) improved cross-orientation accuracy by 34%.
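The accuracy pairs listed above are easier to compare as absolute gains in percentage points. The short sketch below computes them from the reported figures only; the dictionary keys are descriptive labels introduced here, and no numbers beyond those in the list are used.

```python
# (GesFi accuracy %, comparison accuracy %) exactly as reported in the list above.
reported = {
    "Widar3.0 multi-pair, cross-location/environment": (98.82, 20.73),
    "XRF55 cross-environment": (62.15, 55.92),
    "XRF55 cross-user": (67.18, 63.47),
    "Real-world in-domain": (62.89, 54.44),
    "Real-world cross-location": (46.00, 38.89),
    "Real-world cross-orientation": (42.89, 38.00),
}

# Absolute gain in percentage points, rounded to two decimals.
gains = {name: round(gesfi - other, 2) for name, (gesfi, other) in reported.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
```

The Widar3.0 multi-pair gap of about 78 points is the source of the headline figure; the gains on XRF55 and the real-world data are in the single digits.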

7. Contributions, Limitations, and Interpretive Insights

GesFi identified and systematically addressed two core limitations of conventional domain-adversarial approaches in WiFi gesture recognition:

  • Classification Conflict: Source error increases when adversarial alignment blends gesture classes with overlapping CSI.
  • Manifold Distortion: Physical domain-based alignment can misrepresent the geometry of the CSI manifold, impairing transfer performance.

By introducing WiFi latent domain mining, which iteratively clusters representations and erases gesture semantics during domain discovery, GesFi achieves tighter generalization bounds. This suggests that adversarial learning benefits substantially when domain factors are derived directly from CSI statistics rather than imposed through heuristic physical labels.

GesFi's implementation consistently outperformed state-of-the-art domain-adaptation frameworks without access to target-domain data. Its modular design supports both single-pair and multi-pair hardware, and the generality of the pipeline was shown across multiple public datasets and uncontrolled real-world settings.

A plausible implication is the broader applicability of latent domain mining strategies in other sensor-based domain generalization tasks, where physical labels are incomplete proxies for underlying distributional shifts (Zhang et al., 7 Jan 2026).

References (1)
